Automate Real Estate Listing Scraper 🏠🤖 with ScrapeGraph AI and Google Sheets

Built by Davide Boizza
Created on June 13, 2026

Description

This workflow automates the process of scraping real estate property listings from websites using ScrapeGraph AI, extracting structured data, and saving it to a Google Sheet. It is designed to handle paginated listing pages and can be adapted to any real estate site that uses URL parameters for pagination.

NOTE:
This workflow has been tested with Immobiliare.it, the #1 real estate website in Italy. However, it is designed to be adaptable: by modifying the pagination parameter and the listing URL pattern, you can use it with any real estate website that structures its listings with URL-based pagination.

Business Use Cases:

Real estate market intelligence
Lead generation for agencies
Price trend analysis
Property comparison dashboards
CRM enrichment
Competitor monitoring

Key Advantages

1. ✅ Fully Automated Lead Collection

Automatically collects real estate listings without manual browsing.

2. ✅ AI-Powered Extraction

Uses AI instead of rigid selectors:

More resilient to website layout changes
Handles dynamic content better
Reduces maintenance effort

3. ✅ Structured Data Output

The defined JSON schema ensures:

Clean database-ready data
Standardized fields
Easy integration with CRM or analytics tools

4. ✅ Pagination Scalability

Can easily scale:

Increase number of pages
Change city
Adapt to different portals

5. ✅ Duplicate Prevention

The Google Sheets node matches rows on the listing URL to:

Avoid duplicates
Update existing records

6. ✅ Modular Architecture

The workflow is modular and reusable:

URL generation logic is independent
Extraction schema is customizable
Storage layer can be replaced (CRM, database, Airtable, etc.)

7. ✅ Cost & Time Efficiency

Eliminates manual data entry
Saves research time
Enables automated market monitoring

How it works

The workflow is structured in two main phases:

Listing URL Discovery
The user provides a base URL, the maximum number of pages to scrape, and the pagination parameter name (e.g., pag for Immobiliare.it).
A Code node generates a list of page URLs by appending the pagination parameter.
Each page URL is processed through the ScrapegraphAI node, which extracts all individual listing URLs.
An Information Extractor node (powered by Google Gemini) filters and validates the extracted URLs based on a defined structure.
A Wait node introduces a delay between requests to avoid rate limiting.
A Loop Over Items node ensures all generated page URLs are processed.
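
The page-URL generation step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workflow's actual Code node (its source is not shown in this description): the function name build_page_urls, the example search URL, and the choice to number pages from 1 are all hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_page_urls(url: str, max_pages: int, page_param: str) -> list[str]:
    """Return one URL per results page by setting the pagination parameter.

    Assumption: pages are numbered 1..max_pages. Query parameters already
    present in the base URL are preserved.
    """
    parts = urlsplit(url)
    # Drop any existing pagination parameter so it is not duplicated.
    base_query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != page_param
    ]
    urls = []
    for page in range(1, max_pages + 1):
        query = urlencode(base_query + [(page_param, str(page))])
        urls.append(
            urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
        )
    return urls


# Hypothetical search URL, for illustration only.
pages = build_page_urls(
    "https://www.example.com/vendita-case/torino/?criterio=prezzo", 3, "pag"
)
```

Here pages holds three URLs ending in pag=1, pag=2, and pag=3, each of which would then be passed to the ScrapegraphAI node in turn.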

Data Extraction & Storage
All collected listing URLs are aggregated and split into individual items.
A second loop processes each listing URL through another ScrapegraphAI node, which extracts detailed property data (title, description, price, area, bedrooms, bathrooms, floor, rooms, balcony, terrace, cellar, heating, air conditioning, image URLs) based on a JSON schema.
The extracted data is then written to a Google Sheet using the Google Sheets node, with each listing stored in a new row and deduplicated based on the listing URL.
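
The deduplication behavior described above (update the row if the listing URL is already present, otherwise add a new row) can be illustrated with a small in-memory sketch. It does not call Google Sheets; the list named sheet stands in for the sheet's rows, and the column name url and the function upsert_by_url are assumptions made for the example.

```python
def upsert_by_url(rows: list[dict], listing: dict, key: str = "url") -> str:
    """Update the row whose key column matches, otherwise append a new row."""
    for row in rows:
        if row.get(key) == listing[key]:
            row.update(listing)  # overwrite existing fields with fresh values
            return "updated"
    rows.append(dict(listing))  # copy so later changes to `listing` don't leak in
    return "appended"


sheet: list[dict] = []  # stand-in for the sheet's rows
first = upsert_by_url(sheet, {"url": "https://www.example.com/annunci/1", "price": 250000})
second = upsert_by_url(sheet, {"url": "https://www.example.com/annunci/2", "price": 180000})
# Same URL again with a new price: the existing row is updated, not duplicated.
third = upsert_by_url(sheet, {"url": "https://www.example.com/annunci/1", "price": 245000})
```

Running the workflow twice over the same listings therefore refreshes existing rows instead of growing the sheet.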

The workflow is fully automated and can scale to handle multiple listing pages and hundreds of individual property URLs.

Set up steps

To use this workflow, follow these steps:

Import the workflow into your n8n instance.

Configure credentials:
ScrapegraphAI: Add your API key for ScrapegraphAI.
Google Gemini (PaLM): Add your Google Gemini API credentials.
Google Sheets OAuth2: Authenticate with the Google account where you want to store the data.

Prepare your target Google Sheet:
Create a new Google Sheet (or clone this template).
Note the Sheet ID (from the URL) and the sheet name (tab name) where data should be written.
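
If you are unsure which part of the URL is the Sheet ID, it is the segment that follows /spreadsheets/d/. A minimal sketch of pulling it out programmatically (the helper name extract_sheet_id and the ID in the example URL are made up):

```python
import re
from typing import Optional


def extract_sheet_id(sheet_url: str) -> Optional[str]:
    """Return the document ID from a Google Sheets URL, or None if absent."""
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", sheet_url)
    return match.group(1) if match else None


# Made-up ID, for illustration only.
sheet_id = extract_sheet_id(
    "https://docs.google.com/spreadsheets/d/1AbC_dEf-GhIjKlMnOpQrStUvWxYz0123456789/edit#gid=0"
)
```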

Customize the input parameters:
In the Set params node, define:
url: The base URL of the listing page (without pagination parameters).
max_pages: The number of pages to scrape.
page_format_value: The query parameter used for pagination (e.g., pag for Immobiliare.it).

Adjust the listing URL structure (if needed):
In the Extract individual URL node, update the system prompt to match the URL pattern of the target website (e.g., https://www.xxx.it/xxx/xxxx).
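
To make "matching the URL pattern" concrete, here is a deterministic sketch of the same idea. Note that the workflow itself delegates this filtering to the Gemini-powered Information Extractor through its system prompt; the regular expression below is a swapped-in technique used only for illustration, and the pattern, domain, and function name are hypothetical.

```python
import re

# Hypothetical pattern: listing pages look like
# https://www.example.it/annunci/<numeric id>/
LISTING_URL = re.compile(r"^https://www\.example\.it/annunci/\d+/?$")


def filter_listing_urls(candidates: list[str]) -> list[str]:
    """Keep URLs that match the listing pattern, dropping duplicates in order."""
    seen = set()
    result = []
    for url in candidates:
        url = url.strip()
        if LISTING_URL.match(url) and url not in seen:
            seen.add(url)
            result.append(url)
    return result


candidates = [
    "https://www.example.it/annunci/12345/",
    "https://www.example.it/agenzie/roma/",    # not a listing page
    "https://www.example.it/annunci/12345/",   # duplicate
    " https://www.example.it/annunci/67890/",  # stray whitespace
]
listing_urls = filter_listing_urls(candidates)
```

When adapting the workflow to another portal, the equivalent change is describing that portal's listing URL shape in the system prompt.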

Review the output schema:
In the Extract data node, you can modify the JSON schema to match the fields you want to extract from each listing.
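
As a starting point, a schema covering the fields listed in "How it works" might look like the sketch below. The property names, the types, and the required list are assumptions; the description does not show the actual schema or the exact format the ScrapegraphAI node expects, so standard JSON Schema keywords are used here.

```python
import json

# Assumed field names and types; adjust them to the target website.
listing_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"},
        "price": {"type": "number"},
        "area": {"type": "number"},
        "bedrooms": {"type": "integer"},
        "bathrooms": {"type": "integer"},
        "floor": {"type": "string"},
        "rooms": {"type": "integer"},
        "balcony": {"type": "boolean"},
        "terrace": {"type": "boolean"},
        "cellar": {"type": "boolean"},
        "heating": {"type": "string"},
        "air_conditioning": {"type": "boolean"},
        "image_urls": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "price"],
}

# Serialize to JSON text that can be pasted into the node's schema field.
schema_json = json.dumps(listing_schema, indent=2)
```

Adding or removing a field here usually also means adding or removing the matching column in the Google Sheet.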

Update the Google Sheet node:
Set the correct Document ID and Sheet Name in the Update real estate listings node.
Ensure the column mapping matches your sheet structure.
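
A mismatch between extracted fields and sheet columns is an easy mistake to make at this step. The sketch below shows one way to check it outside n8n: it orders a record by the sheet's header row and fails loudly when a field has no matching column. The header names, the sample record, and the function to_row are hypothetical.

```python
def to_row(record: dict, headers: list[str]) -> list[str]:
    """Order a record's values by the sheet's header row.

    Raises ValueError if the record has a field with no matching column.
    Missing fields become empty cells; lists are joined into one cell.
    """
    unknown = set(record) - set(headers)
    if unknown:
        raise ValueError(f"fields with no matching column: {sorted(unknown)}")
    row = []
    for column in headers:
        value = record.get(column, "")
        if isinstance(value, list):
            value = ", ".join(str(item) for item in value)
        row.append(str(value))
    return row


headers = ["url", "title", "price", "image_urls"]
row = to_row(
    {
        "url": "https://www.example.com/annunci/1",
        "title": "Bilocale",
        "price": 250000,
        "image_urls": ["a.jpg", "b.jpg"],
    },
    headers,
)
```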

Activate the workflow and click Execute Workflow to start scraping.

Nodes Used (4)

Code
n8n-nodes-base.code
Google Gemini Chat Model
@n8n/n8n-nodes-langchain.lmChatGoogleGemini
Google Sheets
n8n-nodes-base.googleSheets
Information Extractor
@n8n/n8n-nodes-langchain.informationExtractor