Get up and running with ScrapeGraphAI in under 5 minutes. This guide will walk you through creating your first scraper using the SmartScraperGraph pipeline.
The most common scraping pipeline is SmartScraperGraph, which extracts information from a single page given a natural language prompt and a source URL.
Here’s a complete example using Ollama with a local Llama model:
1. Ensure Ollama is running
Make sure you have Ollama installed and a model downloaded:
```bash
ollama pull llama3.2
ollama serve
```
2. Create your scraper script
Create a new Python file (e.g., scraper.py) with the following code:
```python
from scrapegraphai.graphs import SmartScraperGraph
import json

# Define the configuration for the scraping pipeline
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
        "model_tokens": 8192,
        "format": "json",
    },
    "verbose": True,
    "headless": False,
}

# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
    prompt="Find information about the founders and what the company does",
    source="https://scrapegraphai.com/",
    config=graph_config
)

# Run the pipeline
result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))
```
3. Run the scraper
Execute your script:
```bash
python scraper.py
```
You’ll see the scraper fetch the page, process it with the LLM, and return structured data:
```json
{
    "description": "ScrapeGraphAI transforms websites into clean, organized data for AI agents and data analytics.",
    "founders": [
        {
            "name": "Marco Vinciguerra",
            "role": "Founder & Software Engineer",
            "linkedin": "https://www.linkedin.com/in/marco-vinciguerra-7ba365242/"
        },
        {
            "name": "Lorenzo Padoan",
            "role": "Founder & Product Engineer",
            "linkedin": "https://www.linkedin.com/in/lorenzo-padoan-4521a2154/"
        }
    ]
}
```
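Because `run()` returns a plain Python dict, you can post-process the result like any other data. A minimal sketch, assuming the output shape shown above (the actual keys depend entirely on your prompt and the page):

```python
# Sample result matching the structure of the output above
result = {
    "description": "ScrapeGraphAI transforms websites into clean, organized data "
                   "for AI agents and data analytics.",
    "founders": [
        {"name": "Marco Vinciguerra", "role": "Founder & Software Engineer"},
        {"name": "Lorenzo Padoan", "role": "Founder & Product Engineer"},
    ],
}

# Pull out just the founder names for downstream use
founder_names = [f["name"] for f in result.get("founders", [])]
print(founder_names)
```

Using `.get()` with a default guards against the LLM occasionally omitting a key from its structured output.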
For production use cases, you might prefer OpenAI’s models:
```python
import os
import json
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph

load_dotenv()

# Define the configuration with OpenAI
graph_config = {
    "llm": {
        "api_key": os.getenv("OPENAI_API_KEY"),
        "model": "openai/gpt-4o-mini",
    },
    "verbose": True,
    "headless": False,
}

# Create and run the scraper
smart_scraper_graph = SmartScraperGraph(
    prompt="Extract the first article with its title and summary",
    source="https://www.wired.com",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))
```
Model Selection: gpt-4o-mini offers a great balance between performance and cost. For more complex scraping tasks, consider using gpt-4o or gpt-4-turbo.
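If you switch models depending on the task, it can help to build the config in one place. A hypothetical helper (not part of ScrapeGraphAI's API; the `model` strings follow the same `provider/name` format used in the examples above):

```python
import os

def make_openai_config(model: str = "openai/gpt-4o-mini", verbose: bool = True) -> dict:
    """Build a graph_config dict for an OpenAI model (hypothetical helper)."""
    return {
        "llm": {
            "api_key": os.getenv("OPENAI_API_KEY"),
            "model": model,
        },
        "verbose": verbose,
        "headless": False,
    }

# Use a stronger model for a complex, multi-field extraction
config = make_openai_config("openai/gpt-4o")
```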