Overview

The XAI class provides integration with xAI’s Grok language models using an OpenAI-compatible API. It wraps LangChain’s ChatOpenAI class with xAI-specific configuration.
Grok is xAI’s conversational AI model, known for its real-time knowledge and unique personality. It’s designed to be helpful, truthful, and maximally curious.

Class Definition

from scrapegraphai.models import XAI

class XAI(ChatOpenAI):
    """
    A wrapper for the ChatOpenAI class (xAI uses an OpenAI-compatible API) that
    provides default configuration and could be extended with additional methods.
    
    Args:
        llm_config (dict): Configuration parameters for the language model.
    """
Source: scrapegraphai/models/xai.py:8

Constructor

XAI(**llm_config)

Parameters

model
string
required
xAI model identifier. Available options:
  • grok-beta: The main Grok model
  • grok-vision-beta: Grok with vision capabilities
Check xAI documentation for the latest model versions.
api_key
string
required
Your xAI API key. Sign up at x.ai to get access.
The api_key parameter is automatically converted to openai_api_key internally for compatibility with the ChatOpenAI interface.
temperature
float
default: 0.7
Controls randomness in responses. Range: 0.0 to 2.0.
  • Lower values (0.0-0.3): More focused and deterministic
  • Medium values (0.4-0.9): Balanced creativity and coherence
  • Higher values (1.0-2.0): More creative and varied
max_tokens
int
Maximum number of tokens to generate in the response.
streaming
bool
default: false
Enable streaming responses for real-time output.
**kwargs
any
Additional parameters supported by LangChain’s ChatOpenAI class, including:
  • top_p: Nucleus sampling parameter
  • frequency_penalty: Reduce repetition
  • presence_penalty: Encourage topic diversity
  • timeout: Request timeout in seconds
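Because XAI forwards its keyword arguments to ChatOpenAI unchanged, extra parameters like these can be placed directly in the llm config. A minimal sketch (the values here are illustrative, not recommendations):

```python
# Extra ChatOpenAI keyword arguments sit alongside the standard keys
# in the same llm config dict; XAI passes them all through.
graph_config = {
    "llm": {
        "model": "grok-beta",
        "api_key": "your-xai-api-key",
        "temperature": 0.7,
        "top_p": 0.9,              # nucleus sampling
        "frequency_penalty": 0.2,  # reduce repetition
        "timeout": 60,             # request timeout in seconds
    }
}

# All keys, standard and extra, travel together to the model constructor.
print(sorted(graph_config["llm"]))
```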

Implementation Details

The XAI class automatically configures the OpenAI base URL to point to xAI’s API:
def __init__(self, **llm_config):
    if "api_key" in llm_config:
        llm_config["openai_api_key"] = llm_config.pop("api_key")
    llm_config["openai_api_base"] = "https://api.x.ai/v1"
    
    super().__init__(**llm_config)
Source: scrapegraphai/models/xai.py:18

This design:
  1. Maps api_key to openai_api_key for consistency
  2. Sets the base URL to https://api.x.ai/v1
  3. Inherits all LangChain ChatOpenAI functionality
  4. Maintains OpenAI-compatible interface
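The config normalization above can be reproduced in isolation. A minimal sketch of the same logic as a standalone function (a hypothetical helper for illustration, not part of the library):

```python
def normalize_xai_config(llm_config: dict) -> dict:
    """Mimic XAI.__init__'s handling: rename api_key and pin the base URL."""
    config = dict(llm_config)  # copy, so the caller's dict is not mutated
    if "api_key" in config:
        config["openai_api_key"] = config.pop("api_key")
    config["openai_api_base"] = "https://api.x.ai/v1"
    return config

normalized = normalize_xai_config({"model": "grok-beta", "api_key": "sk-test"})
print(normalized)
# {'model': 'grok-beta', 'openai_api_key': 'sk-test', 'openai_api_base': 'https://api.x.ai/v1'}
```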

Usage Examples

Basic Usage with SmartScraperGraph

from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.models import XAI

graph_config = {
    "llm": {
        "model": "grok-beta",
        "api_key": "your-xai-api-key",
        "temperature": 0.5
    },
    "verbose": True
}

scraper = SmartScraperGraph(
    prompt="Extract all news headlines and their categories",
    source="https://example.com/news",
    config=graph_config
)

result = scraper.run()
print(result)

Direct Model Usage

from scrapegraphai.models import XAI
from langchain_core.messages import HumanMessage

# Initialize the model
llm = XAI(
    model="grok-beta",
    api_key="your-xai-api-key",
    temperature=0.7,
    max_tokens=2000
)

# Use with LangChain
messages = [
    HumanMessage(content="Explain the key principles of web scraping ethics")
]

response = llm.invoke(messages)
print(response.content)

Streaming Responses

from scrapegraphai.models import XAI
from langchain_core.messages import HumanMessage

llm = XAI(
    model="grok-beta",
    api_key="your-xai-api-key",
    streaming=True
)

messages = [HumanMessage(content="Describe modern web scraping techniques")]

print("Grok's response: ", end="")
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)
print()

Real-Time Data Extraction

from scrapegraphai.graphs import SmartScraperGraph

# Grok has access to real-time information
graph_config = {
    "llm": {
        "model": "grok-beta",
        "api_key": "your-xai-api-key",
        "temperature": 0.3
    }
}

scraper = SmartScraperGraph(
    prompt="Extract trending topics and provide context about current events",
    source="https://news.example.com",
    config=graph_config
)

result = scraper.run()
print(result)

With Structured Output

from scrapegraphai.graphs import SmartScraperGraph
from pydantic import BaseModel, Field
from typing import List

class NewsArticle(BaseModel):
    headline: str = Field(description="Article headline")
    category: str = Field(description="News category")
    timestamp: str = Field(description="Publication time")
    summary: str = Field(description="Brief summary")

class NewsList(BaseModel):
    articles: List[NewsArticle]

graph_config = {
    "llm": {
        "model": "grok-beta",
        "api_key": "your-xai-api-key",
        "temperature": 0.0  # Deterministic for structured output
    }
}

scraper = SmartScraperGraph(
    prompt="Extract all news articles with metadata",
    source="https://example.com/news",
    config=graph_config,
    schema=NewsList
)

result = scraper.run()
for article in result.articles:
    print(f"Headline: {article.headline}")
    print(f"Category: {article.category}")
    print(f"Time: {article.timestamp}")
    print(f"Summary: {article.summary}")
    print("---")

Multi-Source Aggregation

from scrapegraphai.graphs import SmartScraperGraph
from typing import List, Dict

def aggregate_news(sources: List[str]) -> List[Dict]:
    """Aggregate news from multiple sources using Grok."""
    graph_config = {
        "llm": {
            "model": "grok-beta",
            "api_key": "your-xai-api-key",
            "temperature": 0.4
        }
    }
    
    results = []
    for source in sources:
        scraper = SmartScraperGraph(
            prompt="Extract top stories with context and relevance",
            source=source,
            config=graph_config
        )
        results.append({
            "source": source,
            "data": scraper.run()
        })
    
    return results

sources = [
    "https://techcrunch.com",
    "https://theverge.com",
    "https://arstechnica.com"
]

aggregated = aggregate_news(sources)
for item in aggregated:
    print(f"\nFrom {item['source']}:")
    print(item['data'])

Configuration Best Practices

Temperature Settings by Use Case

# For factual data extraction
config = {
    "llm": {
        "model": "grok-beta",
        "api_key": "your-key",
        "temperature": 0.0  # Maximum precision
    }
}

# For content analysis with insights
config = {
    "llm": {
        "model": "grok-beta",
        "api_key": "your-key",
        "temperature": 0.5  # Balanced
    }
}

# For creative content generation
config = {
    "llm": {
        "model": "grok-beta",
        "api_key": "your-key",
        "temperature": 0.9  # More creative
    }
}

Performance Optimization

from scrapegraphai.models import XAI

# Optimize for speed
llm = XAI(
    model="grok-beta",
    api_key="your-key",
    max_tokens=500,  # Limit response length
    timeout=30  # Fast timeout
)

# Optimize for quality
llm = XAI(
    model="grok-beta",
    api_key="your-key",
    temperature=0.1,  # Low variance
    max_tokens=3000  # Detailed responses
)

Advanced Features

Custom System Prompts

from scrapegraphai.models import XAI
from langchain_core.messages import SystemMessage, HumanMessage

llm = XAI(
    model="grok-beta",
    api_key="your-xai-api-key"
)

messages = [
    SystemMessage(
        content="You are a data extraction specialist. Always return valid JSON with proper field types."
    ),
    HumanMessage(
        content="Extract product information from this HTML: <html>...</html>"
    )
]

response = llm.invoke(messages)
print(response.content)

Conversation Memory

from scrapegraphai.models import XAI
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

llm = XAI(
    model="grok-beta",
    api_key="your-xai-api-key"
)

# Multi-turn conversation
conversation = [
    SystemMessage(content="You are helping with web scraping tasks."),
    HumanMessage(content="I need to scrape product prices from an e-commerce site."),
]

# First response
response1 = llm.invoke(conversation)
conversation.append(AIMessage(content=response1.content))

# Follow-up
conversation.append(
    HumanMessage(content="How do I handle pagination?")
)

response2 = llm.invoke(conversation)
print(response2.content)

Error Handling and Retries

from scrapegraphai.graphs import SmartScraperGraph
import time
from typing import Optional

def scrape_with_fallback(
    url: str,
    prompt: str,
    max_retries: int = 3
) -> Optional[dict]:
    """Scrape with exponential backoff and error handling."""
    
    for attempt in range(max_retries):
        try:
            graph_config = {
                "llm": {
                    "model": "grok-beta",
                    "api_key": "your-xai-api-key",
                    "timeout": 60
                }
            }
            
            scraper = SmartScraperGraph(
                prompt=prompt,
                source=url,
                config=graph_config
            )
            
            return scraper.run()
            
        except Exception as e:
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt
                print(f"Attempt {attempt + 1} failed: {e}")
                print(f"Retrying in {wait_time}s...")
                time.sleep(wait_time)
            else:
                print(f"All attempts failed: {e}")
                return None

result = scrape_with_fallback(
    "https://example.com",
    "Extract main content"
)

Batch Processing

from scrapegraphai.models import XAI
from langchain_core.messages import HumanMessage
import concurrent.futures

def process_prompt(prompt: str) -> str:
    """Process a single prompt."""
    llm = XAI(
        model="grok-beta",
        api_key="your-xai-api-key"
    )
    response = llm.invoke([HumanMessage(content=prompt)])
    return response.content

prompts = [
    "Summarize this article: ...",
    "Extract email addresses from: ...",
    "List product features: ...",
    "Identify key dates: ..."
]

# Process in parallel
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(process_prompt, prompts))

for i, result in enumerate(results):
    print(f"\nResult {i+1}:")
    print(result)

Comparison with Other Models

XAI vs Other Providers

Feature           | XAI Grok        | OpenAI GPT-4    | DeepSeek
Real-time data    | Yes             | Limited         | No
Personality       | Unique, curious | Professional    | Technical
API compatibility | OpenAI-like     | Native          | OpenAI-like
Pricing           | Competitive     | Premium         | Budget
Best for          | Current events  | General purpose | Code/tech

When to Use XAI Grok

Use XAI Grok when:
  • You need real-time or current information
  • You want a conversational, curious AI personality
  • You are scraping news or trending content
  • You require contextual understanding of recent events
  • You want an OpenAI-compatible API with unique features
Consider alternatives when:
  • You need maximum accuracy for technical tasks
  • Budget is the primary concern
  • You require specialized domain knowledge
  • You need vision capabilities (in that case, use grok-vision-beta rather than grok-beta)

Environment Variables

For security best practices, use environment variables:
import os
from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "grok-beta",
        "api_key": os.getenv("XAI_API_KEY"),
        "temperature": 0.5
    }
}

scraper = SmartScraperGraph(
    prompt="Extract content",
    source="https://example.com",
    config=graph_config
)
Set the environment variable:
export XAI_API_KEY="your-xai-api-key-here"
Or use a .env file:
# .env
XAI_API_KEY=your-xai-api-key-here
from dotenv import load_dotenv
import os

load_dotenv()

api_key = os.getenv("XAI_API_KEY")

Common Use Cases

News Aggregation

from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "grok-beta",
        "api_key": "your-key",
        "temperature": 0.4
    }
}

scraper = SmartScraperGraph(
    prompt="Extract headlines with context about why they're significant",
    source="https://news.example.com",
    config=graph_config
)

result = scraper.run()

Social Media Analysis

from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "grok-beta",
        "api_key": "your-key"
    }
}

scraper = SmartScraperGraph(
    prompt="Analyze trending topics and sentiment",
    source="https://social-platform.com/trending",
    config=graph_config
)

trends = scraper.run()

Research Assistant

from scrapegraphai.models import XAI
from langchain_core.messages import HumanMessage

llm = XAI(
    model="grok-beta",
    api_key="your-key",
    temperature=0.6
)

research_query = HumanMessage(
    content="""Analyze this research paper abstract and:
    1. Identify key findings
    2. List methodologies used
    3. Suggest related research areas
    
    Abstract: ...
    """
)

response = llm.invoke([research_query])
print(response.content)

Related Documentation

  • Models Overview: all available custom models
  • DeepSeek: alternative cost-effective LLM
  • SmartScraperGraph: main scraping graph using LLMs
  • Configuration: detailed configuration guide