## Overview

The DeepSeek class provides integration with DeepSeek's language models through an OpenAI-compatible API. It wraps LangChain's ChatOpenAI class with DeepSeek-specific configuration.

DeepSeek offers competitive pricing and strong performance on reasoning tasks, making it a practical alternative to OpenAI models.
## Class Definition

```python
from scrapegraphai.models import DeepSeek

class DeepSeek(ChatOpenAI):
    """
    A wrapper for the ChatOpenAI class (DeepSeek uses an OpenAI-like API) that
    provides default configuration and could be extended with additional methods.

    Args:
        llm_config (dict): Configuration parameters for the language model.
    """
```

Source: `scrapegraphai/models/deepseek.py:8`
## Constructor

### Parameters

- `model` (str): DeepSeek model identifier. Common options:
  - `deepseek-chat`: General-purpose chat model
  - `deepseek-coder`: Specialized for coding tasks
- `api_key` (str): Your DeepSeek API key. Get one from the DeepSeek Platform. The `api_key` parameter is automatically converted to `openai_api_key` internally for compatibility with the ChatOpenAI interface.
- `temperature` (float): Controls randomness in responses. Range: 0.0 to 2.0.
  - Lower values (0.0-0.3): More deterministic, factual
  - Medium values (0.4-0.9): Balanced creativity
  - Higher values (1.0-2.0): More creative, varied
- `max_tokens` (int): Maximum number of tokens to generate in the response.
- `streaming` (bool): Enable streaming responses for real-time output.
- Additional parameters supported by LangChain's ChatOpenAI class, including:
  - `top_p`: Nucleus sampling parameter
  - `frequency_penalty`: Reduces repetition
  - `presence_penalty`: Encourages topic diversity
  - `timeout`: Request timeout in seconds
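As an illustration, these pass-through parameters can be combined in a single config dictionary. The values below are placeholders, not recommendations:

```python
# Illustrative only: placeholder API key and arbitrary parameter values.
llm_config = {
    "model": "deepseek-chat",
    "api_key": "your-deepseek-api-key",
    "temperature": 0.4,
    "top_p": 0.9,              # nucleus sampling
    "frequency_penalty": 0.2,  # discourage repetition
    "presence_penalty": 0.1,   # encourage topic diversity
    "timeout": 30,             # request timeout in seconds
}
print(sorted(llm_config))
```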
## Implementation Details

The DeepSeek class automatically configures the OpenAI base URL to point to DeepSeek's API:

```python
def __init__(self, **llm_config):
    if "api_key" in llm_config:
        llm_config["openai_api_key"] = llm_config.pop("api_key")
    llm_config["openai_api_base"] = "https://api.deepseek.com/v1"
    super().__init__(**llm_config)
```

Source: `scrapegraphai/models/deepseek.py:18`
This design:

- Maps `api_key` to `openai_api_key` for consistency with other models
- Sets the base URL to `https://api.deepseek.com/v1`
- Inherits all LangChain ChatOpenAI functionality
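The key-mapping step can be reproduced in isolation to see its effect; `normalize_config` below is a standalone sketch of the same logic, not the library code itself:

```python
def normalize_config(llm_config: dict) -> dict:
    """Sketch of the key mapping DeepSeek.__init__ performs (illustrative)."""
    cfg = dict(llm_config)  # avoid mutating the caller's dict
    if "api_key" in cfg:
        cfg["openai_api_key"] = cfg.pop("api_key")
    cfg["openai_api_base"] = "https://api.deepseek.com/v1"
    return cfg

cfg = normalize_config({"api_key": "sk-test", "model": "deepseek-chat"})
print(cfg["openai_api_base"])  # https://api.deepseek.com/v1
```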
## Usage Examples

### Basic Usage with SmartScraperGraph

```python
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.models import DeepSeek

graph_config = {
    "llm": {
        "model": "deepseek-chat",
        "api_key": "your-deepseek-api-key",
        "temperature": 0.5
    },
    "verbose": True
}

scraper = SmartScraperGraph(
    prompt="Extract all product names and prices",
    source="https://example.com/products",
    config=graph_config
)

result = scraper.run()
print(result)
```
### Direct Model Usage

```python
from scrapegraphai.models import DeepSeek
from langchain_core.messages import HumanMessage

# Initialize the model
llm = DeepSeek(
    model="deepseek-chat",
    api_key="your-deepseek-api-key",
    temperature=0.7,
    max_tokens=2000
)

# Use with LangChain
messages = [
    HumanMessage(content="Explain web scraping in simple terms")
]

response = llm.invoke(messages)
print(response.content)
```
### Coding Tasks with DeepSeek Coder

```python
from scrapegraphai.models import DeepSeek
from langchain_core.messages import HumanMessage

# Use the specialized coding model
coder = DeepSeek(
    model="deepseek-coder",
    api_key="your-api-key",
    temperature=0.3  # Lower temperature for more precise code
)

code_prompt = HumanMessage(
    content="Write a Python function to parse HTML tables into a list of dictionaries"
)

response = coder.invoke([code_prompt])
print(response.content)
```
### Streaming Responses

```python
from scrapegraphai.models import DeepSeek
from langchain_core.messages import HumanMessage

llm = DeepSeek(
    model="deepseek-chat",
    api_key="your-api-key",
    streaming=True
)

messages = [HumanMessage(content="Tell me about web scraping best practices")]

for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)
```
### Multi-Source Scraping

```python
from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "deepseek-chat",
        "api_key": "your-deepseek-api-key"
    }
}

websites = [
    "https://news.ycombinator.com",
    "https://reddit.com/r/programming",
    "https://dev.to"
]

results = []
for url in websites:
    scraper = SmartScraperGraph(
        prompt="Extract the top 5 trending topics",
        source=url,
        config=graph_config
    )
    results.append(scraper.run())

for i, result in enumerate(results):
    print(f"\nResults from {websites[i]}:")
    print(result)
```
### With Structured Output

```python
from scrapegraphai.graphs import SmartScraperGraph
from pydantic import BaseModel, Field
from typing import List

class Product(BaseModel):
    name: str = Field(description="Product name")
    price: float = Field(description="Product price in USD")
    rating: float = Field(description="Product rating out of 5")

class ProductList(BaseModel):
    products: List[Product]

graph_config = {
    "llm": {
        "model": "deepseek-chat",
        "api_key": "your-api-key",
        "temperature": 0.0  # Deterministic for structured output
    }
}

scraper = SmartScraperGraph(
    prompt="Extract all products with their details",
    source="https://example.com/shop",
    config=graph_config,
    schema=ProductList
)

result = scraper.run()
print(result)  # Validated Pydantic model
```
## Configuration Best Practices

### Temperature Settings

```python
# For factual extraction (product data, tables, etc.)
config = {
    "llm": {
        "model": "deepseek-chat",
        "api_key": "your-key",
        "temperature": 0.0  # Maximum precision
    }
}

# For content summarization
config = {
    "llm": {
        "model": "deepseek-chat",
        "api_key": "your-key",
        "temperature": 0.5  # Balanced
    }
}

# For creative tasks (generating descriptions, etc.)
config = {
    "llm": {
        "model": "deepseek-chat",
        "api_key": "your-key",
        "temperature": 0.9  # More creative
    }
}
```
### Cost Optimization

```python
from scrapegraphai.models import DeepSeek

# Limit tokens to control costs
llm = DeepSeek(
    model="deepseek-chat",
    api_key="your-key",
    max_tokens=500,  # Limit response length
    timeout=30  # Fail fast on slow requests
)

# Use caching for repeated queries
from langchain.cache import InMemoryCache
from langchain.globals import set_llm_cache

set_llm_cache(InMemoryCache())

llm = DeepSeek(
    model="deepseek-chat",
    api_key="your-key"
)
# Subsequent identical queries will use the cache
```
## Advanced Features

### Custom System Prompts

```python
from scrapegraphai.models import DeepSeek
from langchain_core.messages import SystemMessage, HumanMessage

llm = DeepSeek(
    model="deepseek-chat",
    api_key="your-api-key"
)

messages = [
    SystemMessage(content="You are an expert at extracting structured data from HTML. Always return valid JSON."),
    HumanMessage(content="Extract product data from: <html>...</html>")
]

response = llm.invoke(messages)
```
### Error Handling

```python
import time

from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.models import DeepSeek
from openai import APIError, RateLimitError

llm = DeepSeek(
    model="deepseek-chat",
    api_key="your-api-key",
    max_retries=3
)

def scrape_with_retry(url: str, max_attempts: int = 3):
    for attempt in range(max_attempts):
        try:
            scraper = SmartScraperGraph(
                prompt="Extract main content",
                source=url,
                config={"llm": {"model": "deepseek-chat", "api_key": "your-key"}}
            )
            return scraper.run()
        except RateLimitError:
            wait_time = 2 ** attempt  # Exponential backoff
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
        except APIError as e:
            print(f"API error: {e}")
            if attempt == max_attempts - 1:
                raise
        except Exception as e:
            print(f"Unexpected error: {e}")
            raise
    raise Exception("Max retry attempts reached")
```
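The retry loop doubles its wait time on each rate-limit hit; the resulting schedule can be checked in isolation (pure Python, no API calls):

```python
def backoff_schedule(max_attempts: int, base: int = 2) -> list:
    """Seconds to wait before each retry attempt, doubling each time."""
    return [base ** attempt for attempt in range(max_attempts)]

print(backoff_schedule(4))  # [1, 2, 4, 8]
```

Capping the wait (e.g. `min(base ** attempt, 60)`) is a common refinement when many attempts are allowed.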
## Comparison with Other Models

### DeepSeek vs OpenAI

| Feature | DeepSeek | OpenAI GPT-4 |
| --- | --- | --- |
| API Compatibility | OpenAI-compatible | Native |
| Pricing | Lower cost | Higher cost |
| Reasoning | Strong | Very strong |
| Coding Tasks | Excellent (deepseek-coder) | Excellent |
| Context Window | Varies by model | Up to 128k |
| Best For | Cost-sensitive, coding | Maximum quality |
### When to Use DeepSeek

Choose DeepSeek when:

- Cost efficiency is important
- You are working on coding/technical content
- You need good reasoning at a lower price
- You are processing large volumes of data
- API compatibility with OpenAI is beneficial

Consider alternatives when:

- Maximum accuracy is critical
- You need specific OpenAI features (function calling, vision, etc.)
- You are working with highly specialized domains
## Environment Variables

For security, use environment variables for API keys:

```python
import os

from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "deepseek-chat",
        "api_key": os.getenv("DEEPSEEK_API_KEY"),
        "temperature": 0.5
    }
}

scraper = SmartScraperGraph(
    prompt="Extract content",
    source="https://example.com",
    config=graph_config
)
```

Set the environment variable:

```shell
export DEEPSEEK_API_KEY="your-api-key-here"
```
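Note that `os.getenv` returns `None` when the variable is unset, which only surfaces later as a confusing authentication error. A small guard (a hypothetical helper, not part of ScrapeGraphAI) fails earlier with a clearer message:

```python
import os

def require_env(name: str) -> str:
    """Return an environment variable's value, or fail loudly if unset."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

os.environ["DEEPSEEK_API_KEY"] = "demo-key"  # for demonstration only
print(require_env("DEEPSEEK_API_KEY"))  # demo-key
```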
## See Also

- **Models Overview**: All available custom models
- **SmartScraperGraph**: Main scraping graph using LLMs
- **Configuration**: Detailed configuration guide
- **XAI**: Alternative LLM provider