The SearchGraph combines search engine capabilities with ScrapeGraphAI’s extraction power. It automatically searches the web, finds relevant pages, and extracts structured data.

Overview

This example demonstrates how to:
  • Perform web searches and extract results
  • Combine search with intelligent data extraction
  • Use schemas for structured search results
  • Configure search parameters

Basic Search Example

Search the web and extract structured information:
import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SearchGraph

load_dotenv()

# Define the configuration for the graph
openai_key = os.getenv("OPENAI_API_KEY")

graph_config = {
    "llm": {
        "api_key": openai_key,
        "model": "openai/gpt-4o",
    },
    "max_results": 2,
    "verbose": True,
}

# Create the SearchGraph instance and run it
search_graph = SearchGraph(
    prompt="List me Chioggia's famous dishes",
    config=graph_config
)

result = search_graph.run()
print(result)

Search with Custom Schema

Define a schema to get structured, validated results:
import os
from typing import List
from dotenv import load_dotenv
from pydantic import BaseModel, Field
from scrapegraphai.graphs import SearchGraph
from scrapegraphai.utils import convert_to_csv, convert_to_json, prettify_exec_info

load_dotenv()

# Define the output schema for the graph
class Dish(BaseModel):
    name: str = Field(description="The name of the dish")
    description: str = Field(description="The description of the dish")

class Dishes(BaseModel):
    dishes: List[Dish]

# Define the configuration for the graph
openai_key = os.getenv("OPENAI_API_KEY")

graph_config = {
    "llm": {
        "api_key": openai_key,
        "model": "openai/gpt-4o"
    },
    "max_results": 2,
    "verbose": True,
}

# Create the SearchGraph instance and run it
search_graph = SearchGraph(
    prompt="List me Chioggia's famous dishes",
    config=graph_config,
    schema=Dishes
)

result = search_graph.run()
print(result)

# Get graph execution info
graph_exec_info = search_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))

# Save to json and csv
convert_to_csv(result, "result")
convert_to_json(result, "result")

Step-by-Step Breakdown

1. Import dependencies

import os
from typing import List
from dotenv import load_dotenv
from pydantic import BaseModel, Field
from scrapegraphai.graphs import SearchGraph

load_dotenv()
Import SearchGraph and Pydantic for schema definition.
2. Define your schema

class Dish(BaseModel):
    name: str = Field(description="The name of the dish")
    description: str = Field(description="The description of the dish")

class Dishes(BaseModel):
    dishes: List[Dish]
Create Pydantic models to structure the search results.
3. Configure search parameters

graph_config = {
    "llm": {
        "api_key": openai_key,
        "model": "openai/gpt-4o"
    },
    "max_results": 2,  # Number of search results to process
    "verbose": True,
}
Set max_results to control how many search results to scrape.
4. Create and run search graph

search_graph = SearchGraph(
    prompt="List me Chioggia's famous dishes",
    config=graph_config,
    schema=Dishes
)

result = search_graph.run()
The graph automatically searches, finds relevant pages, and extracts data.

How SearchGraph Works

1. Execute search: Uses a search engine (like Google) to find relevant web pages based on your prompt.
2. Select results: Retrieves the top N results (configured by max_results).
3. Scrape pages: Visits each search result and extracts the page content.
4. Extract data: Uses AI to extract information matching your prompt and schema from all pages.
5. Aggregate results: Combines data from multiple sources into a single structured response.
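The steps above can be sketched in plain Python. The function names here are illustrative stand-ins for the pipeline stages, not ScrapeGraphAI internals:

```python
from typing import Callable, List

def search_pipeline(
    prompt: str,
    search: Callable[[str], List[str]],       # returns candidate URLs
    scrape: Callable[[str], str],             # returns page text
    extract: Callable[[str, str], dict],      # (prompt, text) -> structured data
    max_results: int = 2,
) -> List[dict]:
    """Illustrative sketch of the search -> select -> scrape -> extract flow."""
    urls = search(prompt)[:max_results]            # steps 1-2: search, keep top N
    pages = [scrape(url) for url in urls]          # step 3: scrape each result
    extracted = [extract(prompt, p) for p in pages]  # step 4: extract per page
    return extracted                               # step 5: caller merges these
```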

Configuration Options

| Option | Description |
| --- | --- |
| max_results | Number of search results to scrape |
| verbose | Print detailed execution logs |
| headless | Run the browser without a visible window |

Expected Output

With the schema defined above:
{
    "dishes": [
        {
            "name": "Sarde in Saor",
            "description": "Traditional Venetian sweet and sour sardines with onions, pine nuts, and raisins"
        },
        {
            "name": "Risotto di Gò",
            "description": "Creamy risotto made with gò, a local lagoon fish"
        },
        {
            "name": "Moleche",
            "description": "Soft-shell crabs, a seasonal delicacy from the Venice lagoon"
        }
    ]
}
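Because the result is a plain dict, you can validate it against the same Pydantic schema you passed to the graph (Dish and Dishes from the example above), catching missing or malformed fields early:

```python
from typing import List
from pydantic import BaseModel, Field

class Dish(BaseModel):
    name: str = Field(description="The name of the dish")
    description: str = Field(description="The description of the dish")

class Dishes(BaseModel):
    dishes: List[Dish]

# A result dict as returned by search_graph.run()
result = {
    "dishes": [
        {"name": "Sarde in Saor", "description": "Sweet and sour sardines"},
    ]
}

# Raises pydantic.ValidationError if structure or types don't match
validated = Dishes.model_validate(result)
print(validated.dishes[0].name)
```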

Search for Links Only

For more control over which links to scrape, use SearchLinkGraph to retrieve the result URLs first:
from scrapegraphai.graphs import SearchLinkGraph

search_link_graph = SearchLinkGraph(
    prompt="List me the best AI tools",
    config=graph_config,
)

# Returns URLs first
links = search_link_graph.run()
print("Found links:", links)

# Then you can scrape specific links
from scrapegraphai.graphs import SmartScraperMultiGraph

scraper = SmartScraperMultiGraph(
    prompt="Extract tool name, description, and pricing",
    source=links[:3],  # Scrape top 3 results
    config=graph_config,
)

result = scraper.run()

Common Use Cases

Research

Gather information on topics from multiple sources automatically

Market Analysis

Research competitors, pricing, and market trends

News Aggregation

Collect and structure news articles on specific topics

Product Discovery

Find and compare products across different websites

Deep Search

Search and follow links recursively with DepthSearchGraph:
from scrapegraphai.graphs import DepthSearchGraph

depth_graph = DepthSearchGraph(
    prompt="Find information about Python web scraping libraries",
    config={
        "llm": {
            "api_key": os.getenv("OPENAI_API_KEY"),
            "model": "openai/gpt-4o",
        },
        "max_results": 3,
        "max_depth": 2,  # Follow links up to 2 levels deep
        "verbose": True,
    }
)

result = depth_graph.run()
This explores links within the initial search results for more comprehensive data.

Export Results

Save your search results in different formats:
from scrapegraphai.utils import convert_to_csv, convert_to_json

# Run the search
result = search_graph.run()

# Export to different formats
convert_to_csv(result, "dishes_output")  # Creates dishes_output.csv
convert_to_json(result, "dishes_output")  # Creates dishes_output.json

# Pretty print execution info
from scrapegraphai.utils import prettify_exec_info

exec_info = search_graph.get_execution_info()
print(prettify_exec_info(exec_info))

Performance Tips

Limit search results: Start with max_results: 2-3 for faster execution and lower token usage.
Use specific prompts: More specific search queries lead to more relevant results and better extraction.
Enable caching: Set up caching to avoid re-scraping the same pages during development.
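One simple way to avoid repeat runs during development is a small disk cache keyed on the prompt. This helper is a sketch, not a ScrapeGraphAI feature; it works with any zero-argument callable that returns a JSON-serializable result:

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".search_cache")

def cached_run(prompt: str, run_fn):
    """Return a cached result for this prompt if one exists on disk,
    otherwise call run_fn(), store the result, and return it."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(prompt.encode()).hexdigest()[:16]
    path = CACHE_DIR / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = run_fn()
    path.write_text(json.dumps(result))
    return result
```

Usage: `result = cached_run("List me Chioggia's famous dishes", search_graph.run)`.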

Comparison with Basic Scraping

| Feature | SmartScraperGraph | SearchGraph |
| --- | --- | --- |
| Input | Specific URLs | Natural language query |
| Search | No | Yes, automatic |
| Use Case | Known websites | Discovery and research |
| Speed | Faster | Slower (search + scrape) |
| Results | Single/few sources | Multiple sources |

Next Steps

Custom Schemas

Learn to structure search results with schemas

Multi-Page Scraping

Scrape specific URLs after searching

Troubleshooting

Issue: No results returned
  • Make your search query more specific
  • Increase max_results to search more pages
  • Check if search engine is accessible
Issue: Irrelevant results
  • Refine your prompt to be more specific
  • Use a schema to filter unwanted data
  • Adjust search parameters
Issue: Slow performance
  • Reduce max_results
  • Enable headless: True mode
  • Use a faster model for extraction
Issue: Rate limiting
  • Add delays between requests
  • Reduce the number of results
  • Use a different search engine
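For rate limiting, one approach is to wrap graph.run() in a small retry helper with exponential backoff. This helper is a sketch, not part of ScrapeGraphAI:

```python
import time

def run_with_backoff(run_fn, retries: int = 3, base_delay: float = 2.0):
    """Retry a zero-arg callable, doubling the delay after each failure."""
    for attempt in range(retries):
        try:
            return run_fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt))  # 2s, 4s, 8s, ...
```

Usage: `result = run_with_backoff(search_graph.run)`.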