Overview

Google Gemini provides cutting-edge AI models with industry-leading context windows, well suited to scraping large documents and complex websites.

Key Features:
  • Massive Context: Up to 2M tokens (Gemini 2.0 Pro)
  • Multimodal: Process text, images, and video
  • Fast: Gemini Flash optimized for speed
  • Cost-Effective: Competitive pricing
  • Latest Tech: Google’s newest AI capabilities

Prerequisites

Step 1: Get API Key

  1. Go to Google AI Studio
  2. Click “Create API Key”
  3. Copy your API key

Step 2: Install ScrapeGraphAI

pip install scrapegraphai
playwright install

Step 3: Set Environment Variable

export GOOGLE_API_KEY="your-api-key"
Or create a .env file:
GOOGLE_API_KEY=your-api-key

Basic Configuration

import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph

load_dotenv()

graph_config = {
    "llm": {
        "api_key": os.getenv("GOOGLE_API_KEY"),
        "model": "google_genai/gemini-2.0-flash-latest",
    },
    "verbose": True,
    "headless": False,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="Extract all product information",
    source="https://example.com",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(result)

Available Models

Model names take a provider prefix followed by the model name: google_genai/ for the Gemini API (Google AI Studio) and google_vertexai/ for Vertex AI, for example google_genai/gemini-2.0-flash-latest.

Configuration Options

Temperature

Control response randomness:
graph_config = {
    "llm": {
        "api_key": os.getenv("GOOGLE_API_KEY"),
        "model": "google_genai/gemini-2.0-flash-latest",
        "temperature": 0,  # Deterministic (recommended for scraping)
    },
}
  • 0: Deterministic, consistent results
  • 0.5: Balanced
  • 1.0: Creative, varied responses
Use temperature: 0 for web scraping to ensure consistent data extraction.

Max Tokens

Limit response length:
graph_config = {
    "llm": {
        "api_key": os.getenv("GOOGLE_API_KEY"),
        "model": "google_genai/gemini-2.0-flash-latest",
        "max_tokens": 4000,  # Limit output
    },
}

Safety Settings

Control content filtering:
graph_config = {
    "llm": {
        "api_key": os.getenv("GOOGLE_API_KEY"),
        "model": "google_genai/gemini-2.0-flash-latest",
        "safety_settings": [
            {
                "category": "HARM_CATEGORY_HARASSMENT",
                "threshold": "BLOCK_NONE"
            },
            {
                "category": "HARM_CATEGORY_HATE_SPEECH",
                "threshold": "BLOCK_NONE"
            },
        ],
    },
}
Adjust safety settings if legitimate content is being blocked during scraping.

Complete Examples

import os
import json
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

load_dotenv()

graph_config = {
    "llm": {
        "api_key": os.getenv("GOOGLE_API_KEY"),
        "model": "google_genai/gemini-2.0-flash-latest",
        "temperature": 0,
    },
    "verbose": True,
    "headless": False,
}

smart_scraper = SmartScraperGraph(
    prompt="Extract the main article with title, author, and date",
    source="https://www.wired.com",
    config=graph_config,
)

result = smart_scraper.run()
print(json.dumps(result, indent=4))

# Get execution info
graph_exec_info = smart_scraper.get_execution_info()
print(prettify_exec_info(graph_exec_info))

Using Vertex AI

For enterprise deployments, use Google Cloud Vertex AI:

Step 1: Enable Vertex AI

  1. Go to Google Cloud Console
  2. Enable Vertex AI API
  3. Set up authentication

Step 2: Install Google Cloud SDK

pip install google-cloud-aiplatform

Step 3: Authenticate

gcloud auth application-default login

Step 4: Configure ScrapeGraphAI

graph_config = {
    "llm": {
        "model": "google_vertexai/gemini-2.0-flash",
        "project_id": "your-gcp-project-id",
        "location": "us-central1",
        "temperature": 0,
    },
}

Cost Optimization

Gemini Flash models are faster and cheaper:
"model": "google_genai/gemini-2.0-flash-latest"  # Faster, cheaper
Reduce token usage for simple tasks:
"max_tokens": 2000  # Limit response length
Run the browser headless to speed up page loads:
"headless": True  # Run browser in background
Implement caching for frequently scraped pages (lru_cache keys on the URL and only caches within a single process, so the scraper must be built per URL inside the cached function):
import functools

@functools.lru_cache(maxsize=100)
def scrape_cached(url):
    scraper = SmartScraperGraph(
        prompt="Extract all product information",
        source=url,
        config=graph_config,
    )
    return scraper.run()

Troubleshooting

Error: Invalid API key
Solution:
  1. Verify API key at Google AI Studio
  2. Ensure key is active and not expired
  3. Check environment variable is set:
echo $GOOGLE_API_KEY
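Besides the shell check, you can fail fast inside Python before any request is made. A minimal sketch (the helper name is illustrative, not part of ScrapeGraphAI):

```python
import os

def require_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the named API key, failing fast with a clear message if unset."""
    key = os.getenv(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to a .env file")
    return key
```

Call it once at startup so a missing key surfaces as a clear error instead of an opaque authentication failure mid-run.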
Error: 429 Rate limit exceeded
Solution: Implement retry logic:
from tenacity import retry, wait_exponential, stop_after_attempt

@retry(wait=wait_exponential(min=1, max=60), stop=stop_after_attempt(5))
def scrape_with_retry():
    return scraper.run()
Error: Content blocked due to safety settings
Solution: Adjust safety settings:
"safety_settings": [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
]
Error: Model not found
Solution: Use the correct model name with the provider prefix:
# Correct
"model": "google_genai/gemini-2.0-flash-latest"

# Wrong
"model": "gemini-2.0-flash-latest"  # Missing prefix

Advantages of Gemini

Massive Context

Process up to 2M tokens in a single request, making it well suited to scraping entire websites or long documents.
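Before sending a very large page, a rough pre-flight estimate helps decide whether it fits in one request. This sketch uses the common ~4-characters-per-token heuristic for English text; use a real tokenizer for exact counts:

```python
def fits_in_context(text: str, context_limit: int = 2_000_000,
                    chars_per_token: float = 4.0) -> bool:
    """Rough estimate of whether text fits within the model's context window."""
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens <= context_limit
```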

Multimodal

Can process text, images, and video together for richer data extraction.

Fast

Gemini Flash models offer industry-leading speed for real-time scraping.

Cost-Effective

Competitive pricing compared to other providers, especially for large contexts.

Best Practices

Use Latest Models

Always use the latest model versions:
"model": "google_genai/gemini-2.0-flash-latest"

Set Temperature to 0

For consistent scraping:
"temperature": 0

Enable Verbose Mode

During development:
"verbose": True

Handle Rate Limits

Implement exponential backoff for production use.
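If you would rather not depend on tenacity, a hand-rolled backoff with jitter looks roughly like this (in practice, catch the client's specific rate-limit exception rather than bare Exception):

```python
import random
import time

def run_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0):
    """Call fn, retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries; surface the error
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay + random.uniform(0, delay / 2))
```

Usage: run_with_backoff(scraper.run) retries the scrape on transient failures, doubling the wait each attempt and adding jitter so parallel scrapers do not retry in lockstep.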

Next Steps

Advanced Configuration

Learn about proxy rotation and browser settings

Groq

Explore ultra-fast Groq inference