
Overview

SpeechGraph is a pipeline that scrapes a web page, answers a given prompt from the extracted content, and generates an audio file of the answer. It combines web scraping with text-to-speech capabilities.

Class Signature

class SpeechGraph(AbstractGraph):
    def __init__(
        self,
        prompt: str,
        source: str,
        config: dict,
        schema: Optional[Type[BaseModel]] = None,
    )

Constructor Parameters

prompt
str
required
The natural language prompt describing what information to extract and convert to speech.
source
str
required
The source to scrape. Can be:
  • A URL starting with http:// or https://
  • A local directory path for offline HTML files
config
dict
required
Configuration parameters for the graph. Must include:
  • llm: LLM configuration (e.g., {"model": "openai/gpt-4o"})
  • tts_model: Text-to-speech model configuration
Optional parameters:
  • output_path (str): Path to save the audio file (default: "output.mp3")
  • verbose (bool): Enable detailed logging
  • headless (bool): Run browser in headless mode
  • additional_info (str): Extra context for the LLM
schema
Type[BaseModel]
default:"None"
Optional Pydantic model defining the expected output structure.

Attributes

prompt
str
The user’s extraction prompt.
source
str
The source URL or local directory path.
config
dict
Configuration dictionary for the graph.
schema
Optional[Type[BaseModel]]
Optional output schema for structured data extraction.
llm_model
object
The configured language model instance.
input_key
str
Either "url" or "local_dir" based on the source type.
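The rule behind input_key can be sketched as a small helper (the function name `infer_input_key` is illustrative, not part of the library's API):

```python
def infer_input_key(source: str) -> str:
    """Mimic the source-type rule: web URL vs. local directory path."""
    if source.startswith(("http://", "https://")):
        return "url"
    return "local_dir"

print(infer_input_key("https://en.wikipedia.org/wiki/Chioggia"))  # url
print(infer_input_key("./saved_pages"))                           # local_dir
```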

Methods

run()

Executes the scraping process, generates audio, and returns the text answer.
def run(self) -> str
return
str
The extracted information as a string. The audio file is saved to disk.
Raises:
  • ValueError: If no audio was generated from the text.

Basic Usage

from scrapegraphai.graphs import SpeechGraph

graph_config = {
    "llm": {
        "model": "openai/gpt-4o",
        "api_key": "your-openai-key"
    },
    "tts_model": {
        "api_key": "your-openai-key",
        "model": "tts-1",
        "voice": "alloy"
    },
    "output_path": "summary.mp3"
}

speech_graph = SpeechGraph(
    prompt="List all the attractions in Chioggia and generate an audio summary.",
    source="https://en.wikipedia.org/wiki/Chioggia",
    config=graph_config
)

result = speech_graph.run()
print(result)  # Prints the text
# Audio is saved to summary.mp3

Text-to-Speech Configuration

Using OpenAI TTS

config = {
    "llm": {
        "model": "openai/gpt-4o",
        "api_key": "your-api-key"
    },
    "tts_model": {
        "api_key": "your-api-key",
        "model": "tts-1",        # or "tts-1-hd" for higher quality
        "voice": "alloy"         # alloy, echo, fable, onyx, nova, shimmer
    },
    "output_path": "output.mp3"
}

speech_graph = SpeechGraph(
    prompt="Summarize the key points",
    source="https://example.com/article",
    config=config
)

Available Voices

OpenAI TTS offers six voice options:
  • alloy: Neutral and balanced
  • echo: Male voice
  • fable: British accent
  • onyx: Deep male voice
  • nova: Female voice
  • shimmer: Soft female voice
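Since a typo in the voice name surfaces only at TTS time, it can help to validate it when building the config. A minimal sketch (the helper `make_tts_config` is hypothetical, not part of the library):

```python
OPENAI_TTS_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}

def make_tts_config(api_key: str, voice: str = "alloy", hd: bool = False) -> dict:
    """Build a tts_model config dict, failing fast on an unknown voice."""
    if voice not in OPENAI_TTS_VOICES:
        raise ValueError(f"Unknown voice {voice!r}; choose one of {sorted(OPENAI_TTS_VOICES)}")
    return {
        "api_key": api_key,
        "model": "tts-1-hd" if hd else "tts-1",
        "voice": voice,
    }

config = {
    "llm": {"model": "openai/gpt-4o"},
    "tts_model": make_tts_config("your-api-key", voice="nova"),
}
```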

Advanced Usage

Custom Output Path

import os
from datetime import datetime

# Generate timestamped filename
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
output_path = f"./audio_summaries/summary_{timestamp}.mp3"

# Ensure directory exists
os.makedirs("./audio_summaries", exist_ok=True)

config = {
    "llm": {"model": "openai/gpt-4o"},
    "tts_model": {"model": "tts-1", "voice": "nova"},
    "output_path": output_path
}

speech_graph = SpeechGraph(
    prompt="Create a brief audio summary",
    source="https://example.com",
    config=config
)

result = speech_graph.run()
print(f"Audio saved to: {output_path}")

High-Quality Audio

config = {
    "llm": {"model": "openai/gpt-4o"},
    "tts_model": {
        "model": "tts-1-hd",     # High-definition audio
        "voice": "onyx",
        "speed": 1.0             # Playback speed (0.25 to 4.0)
    },
    "output_path": "hq_summary.mp3"
}

Graph Workflow

The SpeechGraph uses the following node pipeline:
FetchNode → ParseNode → GenerateAnswerNode → TextToSpeechNode
  1. FetchNode: Fetches the web page content
  2. ParseNode: Parses and chunks the content
  3. GenerateAnswerNode: Extracts information based on the prompt
  4. TextToSpeechNode: Converts the answer to audio
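Conceptually, each node transforms a shared state dictionary and passes it on. The four-step chain above can be illustrated with stand-in functions (the real nodes fetch, parse, call the LLM, and call the TTS API; these stubs only show the data flow):

```python
def fetch_node(state: dict) -> dict:          # stand-in: would download the page
    return {**state, "doc": "<html>...</html>"}

def parse_node(state: dict) -> dict:          # stand-in: would parse and chunk the HTML
    return {**state, "chunks": [state["doc"]]}

def generate_answer_node(state: dict) -> dict:  # stand-in: would call the LLM
    return {**state, "answer": f"Answer to: {state['prompt']}"}

def text_to_speech_node(state: dict) -> dict:   # stand-in: would call the TTS API
    return {**state, "audio": state["answer"].encode()}

state = {"prompt": "Summarize the page"}
for node in (fetch_node, parse_node, generate_answer_node, text_to_speech_node):
    state = node(state)

print(state["answer"])  # Answer to: Summarize the page
```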

Use Cases

  1. Accessibility: Convert web content to audio for visually impaired users
  2. Learning: Create audio summaries of educational content
  3. News Briefings: Generate audio news summaries
  4. Podcast Generation: Create podcast episodes from articles
  5. Audiobooks: Convert written content to audio format

Example: News Briefing

from typing import List
from pydantic import BaseModel, Field

class NewsBrief(BaseModel):
    headline: str = Field(description="Main headline")
    summary: str = Field(description="Brief summary in 2-3 sentences")
    key_points: List[str] = Field(description="3-5 key points")

config = {
    "llm": {"model": "openai/gpt-4o"},
    "tts_model": {
        "model": "tts-1",
        "voice": "nova"
    },
    "output_path": "news_brief.mp3",
    "additional_info": "Create a concise news briefing suitable for audio"
}

speech_graph = SpeechGraph(
    prompt="Create a news briefing with headline, summary, and key points",
    source="https://example.com/news-article",
    config=config,
    schema=NewsBrief
)

result = speech_graph.run()
print("Text version:", result)
print("Audio saved to: news_brief.mp3")

Example: Educational Summary

config = {
    "llm": {"model": "openai/gpt-4o"},
    "tts_model": {
        "model": "tts-1",
        "voice": "alloy"
    },
    "output_path": "lesson.mp3",
    "additional_info": "Explain concepts clearly as if teaching a student"
}

speech_graph = SpeechGraph(
    prompt="Explain the key concepts of quantum computing in simple terms",
    source="https://example.com/quantum-computing-intro",
    config=config
)

result = speech_graph.run()
print("Lesson audio created: lesson.mp3")

Accessing Results

result = speech_graph.run()

# Get the text answer
print("Text:", result)

# Access full state
final_state = speech_graph.get_state()
text_answer = final_state.get("answer")
audio_bytes = final_state.get("audio")

print(f"Text length: {len(text_answer)} characters")
print(f"Audio size: {len(audio_bytes)} bytes")

# Execution info
exec_info = speech_graph.get_execution_info()
for node_info in exec_info:
    print(f"{node_info['node_name']}: {node_info['exec_time']:.2f}s")

Error Handling

try:
    result = speech_graph.run()
    print(f"Success! Text: {result}")
    print(f"Audio saved to: {config['output_path']}")
    
except ValueError as e:
    if "No audio generated" in str(e):
        print("Failed to generate audio from text")
    else:
        raise
        
except Exception as e:
    print(f"Error during processing: {e}")

Cost Considerations

OpenAI TTS pricing (as of 2024):
  • tts-1: $0.015 per 1,000 characters
  • tts-1-hd: $0.030 per 1,000 characters

# Estimate cost from the text returned by speech_graph.run()
text_length = len(result)
cost_per_char = 0.015 / 1000  # for tts-1
estimated_cost = text_length * cost_per_char
print(f"Estimated TTS cost: ${estimated_cost:.4f}")
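The same arithmetic generalizes to a small helper covering both models (rates hard-coded from the 2024 price list above; verify current pricing before relying on it):

```python
TTS_RATES_PER_1K = {"tts-1": 0.015, "tts-1-hd": 0.030}

def estimate_tts_cost(text: str, model: str = "tts-1") -> float:
    """Estimated USD cost of synthesizing `text` with the given TTS model."""
    return len(text) * TTS_RATES_PER_1K[model] / 1000

print(f"${estimate_tts_cost('x' * 2000):.4f}")              # $0.0300
print(f"${estimate_tts_cost('x' * 2000, 'tts-1-hd'):.4f}")  # $0.0600
```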

Performance Tips

  1. Use tts-1 for faster generation and lower cost
  2. Use tts-1-hd for higher quality audio when needed
  3. Keep prompts concise to reduce text length and TTS costs
  4. Use additional_info to guide the LLM toward audio-friendly output
  5. Test different voices to find the best fit for your use case

Limitations

  • Audio generation adds processing time and cost
  • Maximum text length depends on TTS provider limits
  • Audio quality depends on the TTS model used
  • Requires OpenAI API key with TTS access
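On the text-length limitation: OpenAI's speech endpoint caps input at roughly 4,096 characters (check the current API docs), so a very long answer would need to be split and synthesized in parts before being concatenated. A minimal splitter on sentence boundaries (a sketch; single sentences longer than the limit are not subdivided):

```python
def split_for_tts(text: str, limit: int = 4096) -> list[str]:
    """Split text into chunks under `limit` chars, preferring sentence ends."""
    chunks, current = [], ""
    for sentence in text.replace("\n", " ").split(". "):
        piece = sentence if sentence.endswith(".") else sentence + ". "
        if current and len(current) + len(piece) > limit:
            chunks.append(current.strip())
            current = ""
        current += piece
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Each chunk could then be sent to the TTS API separately and the resulting audio segments joined.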