Overview
SpeechGraph is a scraping pipeline that combines web scraping with text-to-speech conversion. It scrapes a web page, generates a text answer to your prompt, and converts that answer into an audio file.
Features
Scrape and summarize web content
Automatic text-to-speech conversion
Customizable voice and audio settings
Save audio in MP3 format
Powered by OpenAI’s TTS models
SpeechGraph requires an OpenAI API key and generates audio with OpenAI's TTS models (tts-1 in most examples on this page). Ollama and other local models are not supported for audio generation.
Parameters
The SpeechGraph constructor accepts the following parameters:
SpeechGraph(
    prompt: str,                        # Natural language description of what to extract
    source: str,                        # URL or path to local HTML file
    config: dict,                       # Configuration dictionary
    schema: Optional[BaseModel] = None  # Pydantic schema for structured output
)
Configuration Options
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| llm | dict | Required | LLM model configuration (OpenAI) |
| tts_model | dict | Required | Text-to-speech model configuration |
| output_path | str | "output.mp3" | Path where audio file will be saved |
| verbose | bool | False | Enable detailed logging |
| headless | bool | True | Run browser in headless mode |
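The options above can be assembled with a small helper so that the two required entries are never forgotten. This is only an illustrative sketch: `build_speech_config` is a hypothetical function, not part of scrapegraphai, and its defaults simply mirror the table and the examples on this page.

```python
from typing import Any, Dict


def build_speech_config(
    api_key: str,
    llm_model: str = "openai/gpt-4o",
    tts_model: str = "tts-1",
    voice: str = "alloy",
    output_path: str = "output.mp3",
    verbose: bool = False,
    headless: bool = True,
) -> Dict[str, Any]:
    """Return a config dict shaped like the ones on this page (hypothetical helper)."""
    if not api_key:
        # Both required entries ("llm" and "tts_model") carry the API key.
        raise ValueError("An OpenAI API key is required for llm and tts_model")
    return {
        "llm": {"api_key": api_key, "model": llm_model},
        "tts_model": {"api_key": api_key, "model": tts_model, "voice": voice},
        "output_path": output_path,
        "verbose": verbose,
        "headless": headless,
    }


# "sk-placeholder" is a dummy value, not a real key.
config = build_speech_config("sk-placeholder", voice="nova")
print(config["tts_model"]["voice"])  # nova
print(config["output_path"])         # output.mp3
```

The returned dictionary can then be passed as the `config` argument of the constructor.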
Usage Example
import os

from dotenv import load_dotenv
from scrapegraphai.graphs import SpeechGraph
from scrapegraphai.utils import prettify_exec_info

load_dotenv()

# Define audio output path
FILE_NAME = "website_summary.mp3"
curr_dir = os.path.dirname(os.path.realpath(__file__))
output_path = os.path.join(curr_dir, FILE_NAME)

# Define the configuration
openai_key = os.getenv("OPENAI_API_KEY")
graph_config = {
    "llm": {
        "api_key": openai_key,
        "model": "openai/gpt-4o",
        "temperature": 0.7,
    },
    "tts_model": {
        "api_key": openai_key,
        "model": "tts-1",
        "voice": "alloy",  # Options: alloy, echo, fable, onyx, nova, shimmer
    },
    "output_path": output_path,
}

# Create the SpeechGraph instance
speech_graph = SpeechGraph(
    prompt="Make a detailed audio summary of the projects.",
    source="https://perinim.github.io/projects/",
    config=graph_config,
)

# Run the graph
result = speech_graph.run()
print(result)  # Prints the text that was converted to speech

# Get execution info
graph_exec_info = speech_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
Voice Options
OpenAI TTS offers 6 different voices:
| Voice | Description | Best For |
|-------|-------------|----------|
| alloy | Neutral, balanced | General purpose |
| echo | Male, clear | Professional content |
| fable | British, expressive | Storytelling |
| onyx | Deep, authoritative | News, announcements |
| nova | Female, warm | Friendly content |
| shimmer | Soft, gentle | Calm narration |
graph_config = {
    "llm": {"api_key": openai_key, "model": "openai/gpt-4o"},
    "tts_model": {
        "api_key": openai_key,
        "model": "tts-1",
        "voice": "nova",  # Choose your preferred voice
    },
    "output_path": "output.mp3",
}
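A mistyped voice name is easy to miss until the TTS request is made, so it can help to check it before building the configuration. The sketch below validates against the six voices listed above; `SUPPORTED_VOICES` and `validate_voice` are hypothetical names introduced here, not part of scrapegraphai or the OpenAI SDK.

```python
# The six voices listed on this page (assumed to be the full set for this example).
SUPPORTED_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def validate_voice(voice: str) -> str:
    """Return the normalized voice name, or raise ValueError if it is not listed."""
    normalized = voice.strip().lower()
    if normalized not in SUPPORTED_VOICES:
        raise ValueError(
            f"Unknown voice {voice!r}; expected one of: {', '.join(SUPPORTED_VOICES)}"
        )
    return normalized


print(validate_voice(" Nova "))  # nova
```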
TTS Models
OpenAI offers two TTS models:
tts-1: Optimized for speed (recommended)
tts-1-hd: Higher quality but slower
"tts_model": {
    "api_key": openai_key,
    "model": "tts-1-hd",  # Use HD for higher quality
    "voice": "alloy",
}
Custom Output Path
Specify where to save the generated audio file:
import os

openai_key = os.getenv("OPENAI_API_KEY")

# Option 1: save in the current working directory
output_path = os.path.join(os.getcwd(), "my_summary.mp3")

# Option 2: save in a specific directory (replaces the value above)
output_path = "/path/to/audio/summary.mp3"

graph_config = {
    "llm": {"api_key": openai_key, "model": "openai/gpt-4o"},
    "tts_model": {"api_key": openai_key, "model": "tts-1", "voice": "alloy"},
    "output_path": output_path,
}
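This page does not say whether SpeechGraph creates missing directories for `output_path`, so a cautious approach is to create them beforehand. The following is a minimal sketch under that assumption; `prepare_output_path` is a hypothetical helper, and the `.mp3` check only reflects the fact that audio is saved in MP3 format.

```python
import tempfile
from pathlib import Path


def prepare_output_path(path: str) -> str:
    """Create the parent directory if needed and return an absolute path string."""
    target = Path(path).expanduser().resolve()
    if target.suffix.lower() != ".mp3":
        raise ValueError(f"Expected an .mp3 file name, got {target.name!r}")
    target.parent.mkdir(parents=True, exist_ok=True)
    return str(target)


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    output_path = prepare_output_path(f"{tmp}/audio/summaries/my_summary.mp3")
    print(Path(output_path).parent.is_dir())  # True
```

The returned string can be used directly as the "output_path" value in `graph_config`.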
How It Works
Fetch: Downloads the web page content
Parse: Parses HTML into structured text
Generate: Creates text answer based on your prompt
Convert: Converts text to speech using OpenAI TTS
Save: Saves audio file to specified path
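The five stages above can be pictured as a chain of functions. The sketch below is a toy model only, not SpeechGraph's actual implementation: every function is a stand-in invented for illustration, `fetch` takes raw HTML instead of downloading a URL, `generate` replaces the LLM call with simple truncation, and `convert` returns placeholder bytes rather than real audio.

```python
import tempfile
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def fetch(source: str) -> str:
    # Stand-in: treat `source` as raw HTML instead of downloading a page.
    return source


def parse(html: str) -> str:
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)


def generate(text: str, prompt: str) -> str:
    # Stand-in for the LLM call: echo the prompt and the first 20 words.
    return f"{prompt}: " + " ".join(text.split()[:20])


def convert(answer: str) -> bytes:
    # Stand-in for the TTS call: placeholder bytes, not real audio.
    return answer.encode("utf-8")


def save(audio: bytes, output_path: str) -> None:
    Path(output_path).write_bytes(audio)


def run_pipeline(source: str, prompt: str, output_path: str) -> str:
    answer = generate(parse(fetch(source)), prompt)
    save(convert(answer), output_path)
    return answer  # the text is returned; the file is a side effect


html = "<html><body><h1>Projects</h1><script>x=1</script><p>Two demos.</p></body></html>"
with tempfile.TemporaryDirectory() as tmp:
    out = str(Path(tmp) / "demo.mp3")
    print(run_pipeline(html, "Summary", out))  # Summary: Projects Two demos.
    print(Path(out).exists())                  # True
```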
The run() method returns the text content that was converted to speech:
result = speech_graph.run()
# Returns: String containing the text summary
# Side effect: Saves audio file to output_path
print(result)  # The text that was converted
# Audio file saved at: output_path
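Because the audio file is a side effect rather than a return value, it can be worth confirming that both outputs exist after a run. The helper below is a hypothetical sketch (`check_run_output` is not a scrapegraphai function); the demo uses a placeholder file in place of real audio.

```python
import os
import tempfile


def check_run_output(text, output_path: str) -> int:
    """Return the audio file size in bytes, raising if either output is missing."""
    if not text:
        raise ValueError("run() returned no text")
    if not os.path.isfile(output_path):
        raise FileNotFoundError(f"No audio file at {output_path}")
    size = os.path.getsize(output_path)
    if size == 0:
        raise ValueError(f"Audio file at {output_path} is empty")
    return size


# Demo with 128 placeholder bytes standing in for a generated MP3.
with tempfile.TemporaryDirectory() as tmp:
    fake_audio = os.path.join(tmp, "output.mp3")
    with open(fake_audio, "wb") as fh:
        fh.write(b"\x00" * 128)
    print(check_run_output("Some summary text", fake_audio))  # 128
```

In real use the call would look like `check_run_output(result, graph_config["output_path"])` right after `speech_graph.run()`.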
Example: Podcast Summary
import os

from scrapegraphai.graphs import SpeechGraph

graph_config = {
    "llm": {
        "api_key": os.getenv("OPENAI_API_KEY"),
        "model": "openai/gpt-4o",
        "temperature": 0.8,  # More creative
    },
    "tts_model": {
        "api_key": os.getenv("OPENAI_API_KEY"),
        "model": "tts-1",
        "voice": "echo",  # Professional male voice
    },
    "output_path": "podcast_summary.mp3",
}

speech_graph = SpeechGraph(
    prompt="Create an engaging podcast-style summary of this article, highlighting the key insights and why they matter.",
    source="https://example.com/article",
    config=graph_config,
)

result = speech_graph.run()
# Rough estimate that assumes about 150 spoken words per minute
print(f"Audio saved! Duration: ~{len(result.split()) / 150:.1f} minutes")
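The duration printed above divides the word count by 150, a common rule of thumb for speech rate and not a value reported by the TTS service. A small function makes the arithmetic explicit; `estimate_duration_minutes` is a hypothetical helper and the 150 words-per-minute default is an assumption carried over from the example.

```python
def estimate_duration_minutes(text: str, words_per_minute: int = 150) -> float:
    """Estimate spoken duration from word count (rule of thumb, not a measurement)."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return len(text.split()) / words_per_minute


sample = "word " * 300  # 300 words
print(f"~{estimate_duration_minutes(sample):.1f} minutes")  # ~2.0 minutes
```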
Error Handling
try:
    result = speech_graph.run()
    if result:
        print("Audio generated successfully!")
        print(f"Text: {result}")
        print(f"Audio saved to: {graph_config['output_path']}")
    else:
        print("No content generated")
except ValueError as e:
    print(f"Audio generation error: {e}")
except Exception as e:
    print(f"Error during scraping: {e}")
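Network and API failures are often transient, so one option is to retry a failed run a few times with a growing delay. The wrapper below is a generic sketch, not a scrapegraphai feature: `run_with_retries` is a hypothetical name, it catches `Exception` broadly because the specific exception types are not documented here, and each retry repeats the LLM and TTS calls, which adds cost.

```python
import time
from typing import Callable, Optional


def run_with_retries(
    run: Callable[[], Optional[str]],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> str:
    """Call `run` until it returns non-empty text, doubling the delay between tries."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            result = run()
            if result:
                return result
            last_error = RuntimeError("No content generated")
        except Exception as exc:  # broad on purpose, see the note above
            last_error = exc
        if attempt < attempts:
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"Giving up after {attempts} attempts") from last_error


# Demo with a fake run function that fails twice, then succeeds.
calls = {"n": 0}


def flaky_run() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "summary text"


print(run_with_retries(flaky_run, base_delay=0.0))  # summary text
```

With a real graph the call would be `run_with_retries(speech_graph.run)`.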
Cost Considerations:
OpenAI GPT-4: ~$0.01-0.03 per request (depending on content length)
OpenAI TTS: ~$0.015 per 1,000 characters
Average article: $0.02-0.10 total cost
Audio Quality:
tts-1: Good quality, fast generation (~1 second per 1,000 characters)
tts-1-hd: Better quality, slower (~2-3 seconds per 1,000 characters)
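The figures above translate into a quick back-of-the-envelope estimate for the TTS step. The sketch below uses only the approximate numbers quoted on this page (about $0.015 per 1,000 characters, about 1 second per 1,000 characters for tts-1 and 2 to 3 seconds for tts-1-hd); `estimate_tts` is a hypothetical helper, it applies the single quoted rate to both models, and actual pricing and speed may differ.

```python
# Approximate figures quoted on this page; treat them as rough assumptions.
TTS_COST_PER_1K_CHARS = 0.015  # USD
SECONDS_PER_1K_CHARS = {
    "tts-1": (1.0, 1.0),     # (low, high)
    "tts-1-hd": (2.0, 3.0),
}


def estimate_tts(text: str, model: str = "tts-1") -> dict:
    """Estimate TTS cost and generation time from the character count."""
    if model not in SECONDS_PER_1K_CHARS:
        raise ValueError(f"Unknown model {model!r}")
    thousands = len(text) / 1000
    low, high = SECONDS_PER_1K_CHARS[model]
    return {
        "characters": len(text),
        "cost_usd": round(thousands * TTS_COST_PER_1K_CHARS, 4),
        "seconds_low": round(thousands * low, 2),
        "seconds_high": round(thousands * high, 2),
    }


print(estimate_tts("a" * 4000, model="tts-1-hd"))
# {'characters': 4000, 'cost_usd': 0.06, 'seconds_low': 8.0, 'seconds_high': 12.0}
```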
Use Cases
Content Accessibility: Convert articles to audio for visually impaired users
Commuting: Create audio summaries for listening while driving
Learning: Generate audio study materials from documentation
Podcasting: Create audio content from written sources
News Briefings: Daily audio summaries of news articles
Local HTML Files
You can also convert local HTML files to audio:
speech_graph = SpeechGraph(
prompt = "Summarize this document in an engaging way" ,
source = "/path/to/local/document.html" ,
config = graph_config,
)
result = speech_graph.run()
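Since `source` can be either a URL or a local path, a typo in a file path may only surface once the graph runs. A small pre-check can catch it earlier. The sketch below is an illustration under that assumption; `resolve_source` is a hypothetical helper and does not reflect how SpeechGraph itself distinguishes the two cases.

```python
import os
import tempfile
from urllib.parse import urlparse


def resolve_source(source: str) -> str:
    """Return an http(s) URL unchanged, or the absolute path of an existing file."""
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return source
    if os.path.isfile(source):
        return os.path.abspath(source)
    raise FileNotFoundError(f"Not a URL and not an existing file: {source}")


# Demo with a temporary HTML file.
with tempfile.TemporaryDirectory() as tmp:
    page = os.path.join(tmp, "document.html")
    with open(page, "w", encoding="utf-8") as fh:
        fh.write("<html><body><p>Hello</p></body></html>")
    print(resolve_source(page) == os.path.abspath(page))  # True

print(resolve_source("https://example.com/article"))  # https://example.com/article
```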
Related Graphs:
SmartScraperGraph: Text-only scraping without audio
OmniScraperGraph: Scrape with image analysis