Overview
OmniScraperGraph is a scraping pipeline that automates extracting information, both text and images, from web pages. It uses vision-capable language models to describe and analyze images alongside the text content.
Class Signature
```python
class OmniScraperGraph(AbstractGraph):
    def __init__(
        self,
        prompt: str,
        source: str,
        config: dict,
        schema: Optional[Type[BaseModel]] = None,
    )
```
Constructor Parameters

prompt (str, required)
The natural language prompt describing what information to extract from the page, including text and images.

source (str, required)
The source to scrape. Can be:
- A URL starting with http:// or https://
- A local directory path for offline HTML files

config (dict, required)
Configuration parameters for the graph. Must include:
- llm: LLM configuration with vision support (e.g., {"model": "openai/gpt-4o"})

Optional parameters:
- max_images (int): Maximum number of images to process (default: 5)
- verbose (bool): Enable detailed logging
- headless (bool): Run browser in headless mode
- additional_info (str): Extra context for the LLM
- loader_kwargs (dict): Parameters for page loading
- storage_state (str): Browser state file path

schema (Type[BaseModel], default: None)
Optional Pydantic model defining the expected output structure.
Attributes

- prompt: The user's extraction prompt.
- source: The source URL or local directory path.
- config: Configuration dictionary for the graph.
- schema: Optional output schema for structured data extraction.
- llm_model: The configured language model instance (must support vision).
- max_images: Maximum number of images to process and analyze.
- input_key: Either "url" or "local_dir", set automatically based on the source type.
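The input_key selection can be sketched as a simple check on the source string. Note that detect_input_key below is a hypothetical helper for illustration, not part of the library's API:

```python
def detect_input_key(source: str) -> str:
    """Hypothetical sketch: derive the input key from the source string."""
    if source.startswith(("http://", "https://")):
        return "url"        # remote page fetched over HTTP
    return "local_dir"      # offline HTML files on disk

print(detect_input_key("https://en.wikipedia.org/wiki/Chioggia"))  # url
print(detect_input_key("./saved_pages"))                           # local_dir
```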
Methods
run()
Executes the scraping process, including image analysis, and returns the answer.

Returns: the extracted information, including text and image descriptions, or "No answer found." if extraction fails.
Basic Usage
```python
from scrapegraphai.graphs import OmniScraperGraph

graph_config = {
    "llm": {
        "model": "openai/gpt-4o",  # Vision-capable model
        "api_key": "your-api-key"
    },
    "max_images": 5
}

omni_scraper = OmniScraperGraph(
    prompt="List all the attractions in Chioggia and describe their pictures.",
    source="https://en.wikipedia.org/wiki/Chioggia",
    config=graph_config
)

result = omni_scraper.run()
print(result)
```
Vision-Capable Models
OmniScraperGraph requires LLM models with vision capabilities:
OpenAI GPT-4o

```python
config = {
    "llm": {
        "model": "openai/gpt-4o",
        "api_key": "your-api-key"
    },
    "max_images": 10
}
```

Anthropic Claude with Vision

```python
config = {
    "llm": {
        "model": "anthropic/claude-3-opus-20240229",
        "api_key": "your-api-key"
    },
    "max_images": 8
}
```

Google Gemini

```python
config = {
    "llm": {
        "model": "google_genai/gemini-pro-vision",
        "api_key": "your-api-key"
    },
    "max_images": 5
}
```
Structured Output with Schema
```python
from pydantic import BaseModel, Field
from typing import List

class ImageDescription(BaseModel):
    url: str = Field(description="Image URL")
    description: str = Field(description="What the image shows")
    relevance: str = Field(description="How it relates to the query")

class Attraction(BaseModel):
    name: str = Field(description="Attraction name")
    description: str = Field(description="Text description")
    images: List[ImageDescription] = Field(description="Related images")

class Attractions(BaseModel):
    attractions: List[Attraction]
    summary: str = Field(description="Overall summary")

omni_scraper = OmniScraperGraph(
    prompt="Extract attractions with their descriptions and images",
    source="https://en.wikipedia.org/wiki/Chioggia",
    config=graph_config,
    schema=Attractions
)

result = omni_scraper.run()
```
Graph Workflow
The OmniScraperGraph uses the following node pipeline:
FetchNode → ParseNode → ImageToTextNode → GenerateAnswerOmniNode
- FetchNode: Fetches the web page content
- ParseNode: Parses HTML and extracts image URLs (parse_urls=True)
- ImageToTextNode: Analyzes images using vision model
- GenerateAnswerOmniNode: Combines text and image descriptions to answer the prompt
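The four-node flow can be pictured as functions passing a shared state dictionary down the chain. This is an illustrative sketch with stubbed content, not the library's actual node implementation:

```python
import re

def fetch_node(state):
    # Stub: a real FetchNode would download the page at state["source"]
    state["doc"] = '<p>Chioggia canal</p><img src="canal.jpg"><img src="bridge.jpg">'
    return state

def parse_node(state):
    # Extract image URLs and plain text (the real node runs with parse_urls=True)
    state["img_urls"] = re.findall(r'<img src="([^"]+)"', state["doc"])
    state["parsed_doc"] = re.sub(r"<[^>]+>", " ", state["doc"]).strip()
    return state

def image_to_text_node(state, max_images=5):
    # Stub: a real ImageToTextNode sends each image to the vision model
    state["img_desc"] = [f"description of {u}" for u in state["img_urls"][:max_images]]
    return state

def generate_answer_omni_node(state):
    # Stub: the real node prompts the LLM with text plus image descriptions
    state["answer"] = f"{state['parsed_doc']} ({len(state['img_desc'])} images analyzed)"
    return state

state = {"source": "https://example.com"}
for node in (fetch_node, parse_node, image_to_text_node, generate_answer_omni_node):
    state = node(state)
print(state["answer"])
```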
Controlling Image Processing
Limit Number of Images
```python
config = {
    "llm": {"model": "openai/gpt-4o"},
    "max_images": 3  # Process only the first 3 images
}

omni_scraper = OmniScraperGraph(
    prompt="Describe the main product images",
    source="https://example.com/product",
    config=config
)
```

Process More Images

```python
config = {
    "llm": {"model": "openai/gpt-4o"},
    "max_images": 20  # Process up to 20 images
}

omni_scraper = OmniScraperGraph(
    prompt="Analyze all gallery images and categorize them",
    source="https://example.com/gallery",
    config=config
)
```
Use Cases
- E-commerce Product Analysis: Extract product info with image descriptions
- Real Estate Listings: Scrape property details with photo analysis
- Art Gallery Cataloging: Document artwork with descriptions
- News Articles: Extract articles with image context
- Travel Guides: Collect destination info with visual descriptions
Advanced Usage
E-commerce Product Scraping
```python
from pydantic import BaseModel, Field
from typing import List

class ProductImage(BaseModel):
    url: str
    shows: str = Field(description="What the image shows")
    angle: str = Field(description="Camera angle or perspective")

class Product(BaseModel):
    name: str
    price: float
    description: str
    images: List[ProductImage]
    features_visible: List[str] = Field(description="Features visible in images")

config = {
    "llm": {"model": "openai/gpt-4o"},
    "max_images": 8,
    "additional_info": "Focus on product features visible in images"
}

omni_scraper = OmniScraperGraph(
    prompt="Extract product information including detailed analysis of all product images",
    source="https://example.com/product/12345",
    config=config,
    schema=Product
)

result = omni_scraper.run()
```
Real Estate Listings
```python
from pydantic import BaseModel, Field
from typing import List

class RoomImage(BaseModel):
    room_type: str = Field(description="Type of room shown")
    description: str = Field(description="What's visible in the image")
    condition: str = Field(description="Condition/quality assessment")
    features: List[str] = Field(description="Notable features visible")

class Property(BaseModel):
    address: str
    price: str
    bedrooms: int
    bathrooms: int
    description: str
    room_images: List[RoomImage]
    property_highlights: str

config = {
    "llm": {"model": "openai/gpt-4o"},
    "max_images": 15,
    "additional_info": "Analyze room conditions and features from images"
}

omni_scraper = OmniScraperGraph(
    prompt="Extract complete property information with detailed room analysis from images",
    source="https://example.com/property/listing",
    config=config,
    schema=Property
)

result = omni_scraper.run()
```
Art Gallery Cataloging
```python
from pydantic import BaseModel, Field
from typing import List, Optional

class Artwork(BaseModel):
    title: str
    artist: str
    year: Optional[str] = None
    medium: str
    image_description: str = Field(description="Detailed description of the artwork from image")
    style: str = Field(description="Artistic style identified from image")
    colors: List[str] = Field(description="Dominant colors")
    subject: str = Field(description="Subject matter")

class Exhibition(BaseModel):
    name: str
    artworks: List[Artwork]
    curator_notes: Optional[str] = None

config = {
    "llm": {"model": "anthropic/claude-3-opus-20240229"},
    "max_images": 10,
    "additional_info": "Provide detailed art analysis including style, technique, and composition"
}

omni_scraper = OmniScraperGraph(
    prompt="Catalog all artworks with detailed visual analysis",
    source="https://example.com/exhibition",
    config=config,
    schema=Exhibition
)

result = omni_scraper.run()
```
Accessing Results
```python
result = omni_scraper.run()

# Get the answer
print("Answer:", result)

# Access the full state
final_state = omni_scraper.get_state()
answer = final_state.get("answer")
img_descriptions = final_state.get("img_desc")
img_urls = final_state.get("img_urls")
parsed_doc = final_state.get("parsed_doc")

print(f"\nProcessed {len(img_urls)} images")
print("\nImage Descriptions:")
for i, desc in enumerate(img_descriptions, 1):
    print(f"{i}. {desc}")

# Execution info
exec_info = omni_scraper.get_execution_info()
for node_info in exec_info:
    print(f"{node_info['node_name']}: {node_info['exec_time']:.2f}s")
    print(f"Tokens: {node_info['total_tokens']}")
    print(f"Cost: ${node_info['total_cost_USD']:.4f}")
```
Cost Considerations
Vision model API calls are typically more expensive than text-only calls:

```python
# Estimate cost
final_state = omni_scraper.get_state()
num_images = len(final_state.get("img_urls", []))

# OpenAI GPT-4 Vision pricing (example)
per_image_cost = 0.01  # Approximate
estimated_image_cost = num_images * per_image_cost

print(f"Processed {num_images} images")
print(f"Estimated image analysis cost: ${estimated_image_cost:.2f}")
```
- Limit max_images: Process only necessary images to reduce cost
- Use appropriate models: GPT-4o for quality, gpt-4o-mini for speed
- Provide context: Use additional_info to guide image analysis
- Image quality: Higher-resolution images produce better analysis
- Test with small batches: Start with a few images and scale up
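The trade-off between max_images and spend can be sketched with a small estimator. The per-image price below is a placeholder, not a published rate, and the helper itself is illustrative rather than part of the library:

```python
def estimate_image_cost(num_images_on_page: int,
                        max_images: int = 5,
                        per_image_cost: float = 0.01) -> float:
    """Estimate image-analysis spend: only min(found, max_images) are processed."""
    processed = min(num_images_on_page, max_images)
    return processed * per_image_cost

# A page with 40 images under different caps
for cap in (3, 5, 20):
    print(f"max_images={cap}: ${estimate_image_cost(40, max_images=cap):.2f}")
```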
Error Handling
```python
try:
    result = omni_scraper.run()

    if result == "No answer found.":
        print("Extraction failed")

        # Check whether images were found
        final_state = omni_scraper.get_state()
        img_urls = final_state.get("img_urls", [])

        if not img_urls:
            print("No images found on the page")
        else:
            print(f"Found {len(img_urls)} images but analysis failed")
    else:
        print(f"Success: {result}")
except Exception as e:
    print(f"Error during scraping: {e}")
```
Comparison with SmartScraperGraph
| Feature | OmniScraperGraph | SmartScraperGraph |
|---|---|---|
| Text Extraction | Yes | Yes |
| Image Analysis | Yes | No |
| LLM Requirement | Vision-capable | Any |
| Cost | Higher | Lower |
| Use Case | Visual content | Text content |
| Speed | Slower | Faster |
Limitations
- Requires vision-capable LLM models
- More expensive than text-only scraping
- Slower due to image processing
- Image quality affects analysis accuracy
- Some images may fail to load or process
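Several of these limitations can be softened by pre-filtering which images reach the vision model. filter_image_urls below is a hypothetical helper, not a library function; in practice you would apply similar logic through max_images and your prompt:

```python
def filter_image_urls(urls, max_images=5, skip_suffixes=(".svg", ".gif", ".ico")):
    """Drop icons/animations that rarely analyze well, then cap the count."""
    kept = [u for u in urls if not u.lower().endswith(skip_suffixes)]
    return kept[:max_images]

urls = ["a.jpg", "spinner.gif", "logo.svg", "b.png", "c.jpg"]
print(filter_image_urls(urls, max_images=2))  # ['a.jpg', 'b.png']
```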