Overview

Ollama allows you to run large language models locally on your machine. This means:
  • Free: No API costs
  • Private: Your data never leaves your computer
  • Fast: No network latency
  • Offline: Works without internet connection
Perfect for development, testing, or privacy-sensitive projects.

Prerequisites

Step 1: Install Ollama

Download and install Ollama from ollama.ai:
# Download from ollama.ai or use Homebrew
brew install ollama
Step 2: Start the Ollama Server

ollama serve
The server will start on http://localhost:11434
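Before running a scrape, it can save time to confirm the server is actually reachable. A minimal health check using only the standard library (the function name is ours; the base URL matches the default above):

```python
import urllib.request
import urllib.error

def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        # Ollama's root endpoint replies "Ollama is running" with status 200.
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

If this returns False, start the server with ollama serve and try again.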
Step 3: Pull a Model

Download a model (e.g., Llama 3.2):
ollama pull llama3.2
First-time download may take a few minutes depending on model size.
Step 4: Install ScrapeGraphAI

pip install scrapegraphai
playwright install

Basic Configuration

from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
        "model_tokens": 4096,
    },
    "verbose": True,
    "headless": False,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="Find some information about the founders.",
    source="https://scrapegraphai.com/",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(result)
This example is from: examples/smart_scraper_graph/ollama/smart_scraper_ollama.py

Configuration Options

Custom Base URL

If Ollama is running on a different host or port:
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "base_url": "http://192.168.1.100:11434",  # Remote Ollama server
        "temperature": 0,
    },
}

JSON Format Mode

Force JSON output (required for some models):
graph_config = {
    "llm": {
        "model": "ollama/mistral",
        "temperature": 0,
        "format": "json",  # Ollama needs format specified
    },
}
Some Ollama models require "format": "json" for structured output. If you get parsing errors, add this option.
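Even with "format": "json", a local model occasionally wraps its answer in extra prose. A small defensive parser can often salvage the object (a sketch, not part of the ScrapeGraphAI API; the function name is made up here):

```python
import json
import re

def extract_json(text: str):
    """Best-effort extraction of the first JSON object embedded in model output."""
    try:
        return json.loads(text)  # clean case: the whole string is JSON
    except json.JSONDecodeError:
        pass
    # Fall back to the outermost {...} span, e.g. when the model adds prose.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        return json.loads(match.group(0))
    raise ValueError("no JSON object found in model output")
```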

Embeddings Configuration

Use local embeddings for better RAG performance:
graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "temperature": 0,
    },
}
# Pull the embedding model
ollama pull nomic-embed-text
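The embedding model turns page chunks and your prompt into vectors, so the most relevant chunks can be retrieved by similarity. A pure-Python cosine-similarity sketch of that idea (the toy 3-dimensional vectors are made up; real nomic-embed-text vectors have hundreds of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for illustration only.
query = [0.1, 0.9, 0.2]
chunk_about_founders = [0.1, 0.8, 0.3]
chunk_about_pricing = [0.9, 0.1, 0.0]

# The founders chunk points in nearly the same direction as the query.
print(cosine_similarity(query, chunk_about_founders) >
      cosine_similarity(query, chunk_about_pricing))  # True
```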

Complete Example

from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
        "model_tokens": 4096,
    },
    "verbose": True,
    "headless": False,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="Find some information about the founders.",
    source="https://scrapegraphai.com/",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(result)

# Get execution info
graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))

Available Models

View all available models:
ollama list
Popular models for scraping:
| Model            | Size | Context | RAM Needed |
|------------------|------|---------|------------|
| llama3.2:1b      | 1B   | 128K    | ~2GB       |
| llama3.2         | 7B   | 128K    | ~8GB       |
| llama3.3:70b    | 70B  | 128K    | ~40GB      |
| mistral          | 7B   | 128K    | ~8GB       |
| gemma2           | 9B   | 128K    | ~6GB       |
| qwen:14b         | 14B  | 32K     | ~10GB      |
| codellama        | 7B   | 16K     | ~8GB       |
| nomic-embed-text | -    | 8K      | ~1GB       |
Pull any model:
ollama pull <model-name>
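The RAM column can be sanity-checked with a back-of-the-envelope estimate: Ollama typically serves models quantized to roughly 4-5 bits per parameter, so the weights alone occupy about params × bits / 8 gigabytes; the figures for smaller models include extra headroom for context and runtime. A sketch (the 4.5-bit constant is our assumption, not an Ollama guarantee):

```python
def weight_size_gb(params_billion: float, bits_per_param: float = 4.5) -> float:
    """Approximate in-memory size of quantized model weights, in GB."""
    return params_billion * bits_per_param / 8

# 70B parameters at ~4.5 bits each -> roughly 39 GB of weights,
# which lines up with the ~40GB figure in the table above.
print(f"70B: ~{weight_size_gb(70):.0f} GB")
```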

Performance Tips

Ollama automatically uses GPU if available. Verify with:
ollama ps
For NVIDIA GPUs, ensure CUDA is installed. For Apple Silicon, Metal is used automatically.
For long documents, increase model_tokens:
"llm": {
    "model": "ollama/llama3.2",
    "model_tokens": 128000,  # Maximum context
}
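ScrapeGraphAI splits long pages internally, but the idea behind model_tokens is easy to see with a naive splitter (a sketch; ~4 characters per token is a rough heuristic, and a real tokenizer would be more accurate):

```python
def chunk_text(text: str, max_tokens: int = 4096, chars_per_token: int = 4) -> list[str]:
    """Split text into pieces that should each fit within max_tokens."""
    max_chars = max_tokens * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# 50,000 characters at 1024 tokens (~4096 chars) per chunk -> 13 chunks.
pieces = chunk_text("word " * 10000, max_tokens=1024)
print(len(pieces))  # 13
```

A larger model_tokens means fewer chunks per page, so fewer LLM calls per scrape.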
Ollama keeps models in memory for 5 minutes by default. Increase this:
# Set to 1 hour
export OLLAMA_KEEP_ALIVE=1h
ollama serve
For basic scraping, use smaller models:
"model": "ollama/llama3.2:1b"  # Fast and efficient

Troubleshooting

Error: Connection refused to http://localhost:11434
Solution: Ensure Ollama is running:
ollama serve

Error: model 'llama3.2' not found
Solution: Pull the model first:
ollama pull llama3.2

Error: System runs out of RAM
Solution: Use a smaller model:
"model": "ollama/llama3.2:1b"  # Only 2GB RAM

Error: Failed to parse JSON response
Solution: Add the format parameter:
"llm": {
    "model": "ollama/mistral",
    "format": "json",  # Force JSON output
}

Advantages of Ollama

  • Free: No API costs - run unlimited scraping jobs
  • Private: Your data never leaves your machine
  • Fast: No network latency, especially with GPU acceleration
  • Offline: Works without an internet connection

Next Steps

  • OpenAI: Compare with cloud-based OpenAI models
  • Advanced Config: Learn about proxy rotation and browser settings