Overview

The DeepSeek class provides integration with DeepSeek’s language models using an OpenAI-compatible API. It wraps LangChain’s ChatOpenAI class with DeepSeek-specific configuration.
DeepSeek offers competitive pricing and strong performance for reasoning tasks, making it an excellent alternative to OpenAI models.

Class Definition

from scrapegraphai.models import DeepSeek

class DeepSeek(ChatOpenAI):
    """
    A wrapper for the ChatOpenAI class (DeepSeek uses an OpenAI-like API) that
    provides default configuration and could be extended with additional methods.
    
    Args:
        llm_config (dict): Configuration parameters for the language model.
    """
Source: scrapegraphai/models/deepseek.py:8

Constructor

DeepSeek(**llm_config)

Parameters

model (string, required)
DeepSeek model identifier. Common options:
  • deepseek-chat: General-purpose chat model
  • deepseek-coder: Specialized for coding tasks

api_key (string, required)
Your DeepSeek API key. Get one from the DeepSeek Platform.
The api_key parameter is automatically converted to openai_api_key internally for compatibility with the ChatOpenAI interface.

temperature (float, default: 0.7)
Controls randomness in responses. Range: 0.0 to 2.0.
  • Lower values (0.0-0.3): More deterministic, factual
  • Medium values (0.4-0.9): Balanced creativity
  • Higher values (1.0-2.0): More creative, varied

max_tokens (int)
Maximum number of tokens to generate in the response.

streaming (bool, default: false)
Enable streaming responses for real-time output.

**kwargs (any)
Additional parameters supported by LangChain's ChatOpenAI class, including:
  • top_p: Nucleus sampling parameter
  • frequency_penalty: Reduce repetition
  • presence_penalty: Encourage topic diversity
  • timeout: Request timeout in seconds
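Because these keyword arguments pass straight through to ChatOpenAI, they can sit alongside the DeepSeek-specific options in a single config dict. A sketch with illustrative values (not tuning recommendations):

```python
# Illustrative values only; any keyword argument ChatOpenAI accepts can be passed through.
llm_config = {
    "model": "deepseek-chat",
    "api_key": "your-deepseek-api-key",
    "temperature": 0.5,
    "top_p": 0.9,              # nucleus sampling
    "frequency_penalty": 0.2,  # discourage repetition
    "timeout": 30,             # seconds before the request fails
}
```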

Implementation Details

The DeepSeek class automatically configures the OpenAI base URL to point to DeepSeek’s API:
def __init__(self, **llm_config):
    if "api_key" in llm_config:
        llm_config["openai_api_key"] = llm_config.pop("api_key")
    llm_config["openai_api_base"] = "https://api.deepseek.com/v1"
    
    super().__init__(**llm_config)
Source: scrapegraphai/models/deepseek.py:18

This design:
  1. Maps api_key to openai_api_key for consistency with other models
  2. Sets the base URL to https://api.deepseek.com/v1
  3. Inherits all LangChain ChatOpenAI functionality
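The key remapping in step 1 can be illustrated in isolation with plain dictionaries (a standalone sketch that mimics the constructor's config handling, not the library code itself):

```python
def normalize_config(llm_config: dict) -> dict:
    """Mimic DeepSeek.__init__: rename api_key and pin the DeepSeek base URL."""
    config = dict(llm_config)  # copy to avoid mutating the caller's dict
    if "api_key" in config:
        config["openai_api_key"] = config.pop("api_key")
    config["openai_api_base"] = "https://api.deepseek.com/v1"
    return config

cfg = normalize_config({"model": "deepseek-chat", "api_key": "sk-..."})
# cfg now carries openai_api_key and openai_api_base, and no api_key
```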

Usage Examples

Basic Usage with SmartScraperGraph

from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.models import DeepSeek

graph_config = {
    "llm": {
        "model": "deepseek-chat",
        "api_key": "your-deepseek-api-key",
        "temperature": 0.5
    },
    "verbose": True
}

scraper = SmartScraperGraph(
    prompt="Extract all product names and prices",
    source="https://example.com/products",
    config=graph_config
)

result = scraper.run()
print(result)

Direct Model Usage

from scrapegraphai.models import DeepSeek

# Initialize the model
llm = DeepSeek(
    model="deepseek-chat",
    api_key="your-deepseek-api-key",
    temperature=0.7,
    max_tokens=2000
)

# Use with LangChain
from langchain_core.messages import HumanMessage

messages = [
    HumanMessage(content="Explain web scraping in simple terms")
]

response = llm.invoke(messages)
print(response.content)

Coding Tasks with DeepSeek Coder

from scrapegraphai.models import DeepSeek

# Use the specialized coding model
coder = DeepSeek(
    model="deepseek-coder",
    api_key="your-api-key",
    temperature=0.3  # Lower temperature for more precise code
)

from langchain_core.messages import HumanMessage

code_prompt = HumanMessage(
    content="Write a Python function to parse HTML tables into a list of dictionaries"
)

response = coder.invoke([code_prompt])
print(response.content)

Streaming Responses

from scrapegraphai.models import DeepSeek
from langchain_core.messages import HumanMessage

llm = DeepSeek(
    model="deepseek-chat",
    api_key="your-api-key",
    streaming=True
)

messages = [HumanMessage(content="Tell me about web scraping best practices")]

for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)

Multi-Source Scraping

from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "deepseek-chat",
        "api_key": "your-deepseek-api-key"
    }
}

websites = [
    "https://news.ycombinator.com",
    "https://reddit.com/r/programming",
    "https://dev.to"
]

results = []
for url in websites:
    scraper = SmartScraperGraph(
        prompt="Extract the top 5 trending topics",
        source=url,
        config=graph_config
    )
    results.append(scraper.run())

for i, result in enumerate(results):
    print(f"\nResults from {websites[i]}:")
    print(result)

With Structured Output

from scrapegraphai.graphs import SmartScraperGraph
from pydantic import BaseModel, Field
from typing import List

class Product(BaseModel):
    name: str = Field(description="Product name")
    price: float = Field(description="Product price in USD")
    rating: float = Field(description="Product rating out of 5")

class ProductList(BaseModel):
    products: List[Product]

graph_config = {
    "llm": {
        "model": "deepseek-chat",
        "api_key": "your-api-key",
        "temperature": 0.0  # Deterministic for structured output
    }
}

scraper = SmartScraperGraph(
    prompt="Extract all products with their details",
    source="https://example.com/shop",
    config=graph_config,
    schema=ProductList
)

result = scraper.run()
print(result)  # Validated Pydantic model

Configuration Best Practices

Temperature Settings

# For factual extraction (product data, tables, etc.)
config = {
    "llm": {
        "model": "deepseek-chat",
        "api_key": "your-key",
        "temperature": 0.0  # Maximum precision
    }
}

# For content summarization
config = {
    "llm": {
        "model": "deepseek-chat",
        "api_key": "your-key",
        "temperature": 0.5  # Balanced
    }
}

# For creative tasks (generating descriptions, etc.)
config = {
    "llm": {
        "model": "deepseek-chat",
        "api_key": "your-key",
        "temperature": 0.9  # More creative
    }
}
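The three presets above can be collapsed into a small helper. This is a convenience sketch; the task labels and temperature values simply mirror the presets in this guide:

```python
# Hypothetical helper; task labels and temperatures mirror the presets above.
TASK_TEMPERATURES = {
    "extraction": 0.0,     # factual extraction: maximum precision
    "summarization": 0.5,  # balanced
    "creative": 0.9,       # generating descriptions, etc.
}

def deepseek_config(api_key: str, task: str = "extraction") -> dict:
    """Build a graph_config llm section with a temperature suited to the task."""
    return {
        "llm": {
            "model": "deepseek-chat",
            "api_key": api_key,
            "temperature": TASK_TEMPERATURES[task],
        }
    }
```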

Cost Optimization

from scrapegraphai.models import DeepSeek

# Limit tokens to control costs
llm = DeepSeek(
    model="deepseek-chat",
    api_key="your-key",
    max_tokens=500,  # Limit response length
    timeout=30  # Fail fast on slow requests
)

# Use caching for repeated queries
# (on newer LangChain versions the import is: from langchain_community.cache import InMemoryCache)
from langchain.cache import InMemoryCache
from langchain.globals import set_llm_cache

set_llm_cache(InMemoryCache())

llm = DeepSeek(
    model="deepseek-chat",
    api_key="your-key"
)

# Subsequent identical queries will use cache

Advanced Features

Custom System Prompts

from scrapegraphai.models import DeepSeek
from langchain_core.messages import SystemMessage, HumanMessage

llm = DeepSeek(
    model="deepseek-chat",
    api_key="your-api-key"
)

messages = [
    SystemMessage(content="You are an expert at extracting structured data from HTML. Always return valid JSON."),
    HumanMessage(content="Extract product data from: <html>...</html>")
]

response = llm.invoke(messages)

Error Handling

from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.models import DeepSeek
from openai import APIError, RateLimitError
import time

llm = DeepSeek(
    model="deepseek-chat",
    api_key="your-api-key",
    max_retries=3
)

def scrape_with_retry(url: str, max_attempts: int = 3):
    for attempt in range(max_attempts):
        try:
            scraper = SmartScraperGraph(
                prompt="Extract main content",
                source=url,
                config={"llm": {"model": "deepseek-chat", "api_key": "your-key"}}
            )
            return scraper.run()
        except RateLimitError:
            wait_time = 2 ** attempt  # Exponential backoff
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
        except APIError as e:
            print(f"API error: {e}")
            if attempt == max_attempts - 1:
                raise
        except Exception as e:
            print(f"Unexpected error: {e}")
            raise
    
    raise Exception("Max retry attempts reached")
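The retry pattern above generalizes to any callable. A standalone sketch of the same exponential-backoff idea, decoupled from scraping:

```python
import time

def with_retry(fn, max_attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call fn(), retrying with exponential backoff on the given exception types."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Usage: `with_retry(lambda: scraper.run(), retry_on=(RateLimitError,))` retries only on rate limits and re-raises anything else immediately.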

Comparison with Other Models

DeepSeek vs OpenAI

Feature             DeepSeek                      OpenAI GPT-4
API Compatibility   OpenAI-compatible             Native
Pricing             Lower cost                    Higher cost
Reasoning           Strong                        Very strong
Coding Tasks        Excellent (deepseek-coder)    Excellent
Context Window      Varies by model               Up to 128k
Best For            Cost-sensitive, coding        Maximum quality

When to Use DeepSeek

Use DeepSeek when:
  • Cost efficiency is important
  • Working on coding/technical content
  • Need good reasoning at lower price
  • Processing large volumes of data
  • API compatibility with OpenAI is beneficial
Consider alternatives when:
  • Maximum accuracy is critical
  • Need specific OpenAI features (function calling, vision, etc.)
  • Working with highly specialized domains

Environment Variables

For security, use environment variables for API keys:
import os
from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "deepseek-chat",
        "api_key": os.getenv("DEEPSEEK_API_KEY"),
        "temperature": 0.5
    }
}

scraper = SmartScraperGraph(
    prompt="Extract content",
    source="https://example.com",
    config=graph_config
)
Set the environment variable:
export DEEPSEEK_API_KEY="your-api-key-here"
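Failing fast when the variable is unset avoids silently passing None as the key. A small helper for that (hypothetical, not part of the library):

```python
import os

def require_env(name: str) -> str:
    """Return the environment variable's value, or raise a clear error if it is unset."""
    value = os.getenv(name)
    if not value:
        raise EnvironmentError(f"Missing required environment variable: {name}")
    return value

# api_key = require_env("DEEPSEEK_API_KEY")
```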

Related Pages

  • Models Overview: all available custom models
  • SmartScraperGraph: main scraping graph using LLMs
  • Configuration: detailed configuration guide
  • XAI: alternative LLM provider