Overview
ScrapeGraphAI lets you build custom scraping pipelines by composing nodes into directed acyclic graphs (DAGs). This gives you control over each step of the scraping workflow, so you can build pipelines tailored to your needs.
Using BaseGraph
The BaseGraph class is the foundation for creating custom scraping workflows. It manages the execution flow of interconnected nodes.
Basic Structure
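As a rough illustration of the structure described above (a set of nodes, a list of directed edges, and an entry point), here is a minimal sketch in plain Python. It does not use ScrapeGraphAI itself: the `execution_order` helper and the node names `fetch`, `parse`, and `generate` are hypothetical, and the standard-library `graphlib` module stands in for whatever ordering logic BaseGraph applies internally.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(nodes: list[str], edges: list[tuple[str, str]], entry_point: str) -> list[str]:
    """Return one valid run order for a DAG, or raise CycleError if it has a cycle."""
    if entry_point not in nodes:
        raise ValueError(f"entry point {entry_point!r} is not among the nodes")
    # Map every node to the set of nodes that must run before it.
    predecessors: dict[str, set[str]] = {name: set() for name in nodes}
    for source, target in edges:
        predecessors[target].add(source)
    # static_order() raises CycleError when the edges do not form a DAG.
    return list(TopologicalSorter(predecessors).static_order())


# Hypothetical three-step pipeline: fetch a page, parse it, generate an answer.
order = execution_order(
    nodes=["fetch", "parse", "generate"],
    edges=[("fetch", "parse"), ("parse", "generate")],
    entry_point="fetch",
)
print(order)
```

Keeping the edges as plain `(source, target)` pairs makes the acyclic requirement easy to check before any node runs.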
Node Configuration
Input Expressions
Nodes use boolean expressions to define their input requirements:
- Single input: input="url"
- OR logic: input="url | local_dir" (accepts either)
- AND logic: input="user_prompt & parsed_doc" (requires both)
- Complex: input="user_prompt & (relevant_chunks | parsed_doc | doc)"
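To make the semantics of these expressions concrete, the sketch below checks whether the keys present in a state dictionary satisfy an input expression. It is an illustration only, not the library's parser: the `is_satisfied` function is hypothetical, and it assumes that `&` binds more tightly than `|` when no parentheses are given.

```python
import re


def is_satisfied(expression: str, state: dict) -> bool:
    """Return True when the keys present in `state` satisfy `expression`."""
    # Split the expression into identifiers, operators, and parentheses.
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*|[&|()]", expression)
    pos = 0

    def parse_or() -> bool:
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse the right side so tokens are consumed
            value = value or rhs
        return value

    def parse_and() -> bool:
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected token {token!r}")
        # A bare identifier is satisfied when the state contains that key.
        return token in state

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


complex_expr = "user_prompt & (relevant_chunks | parsed_doc | doc)"
print(is_satisfied(complex_expr, {"user_prompt": "summarize", "doc": "<html>"}))
print(is_satisfied(complex_expr, {"doc": "<html>"}))
```

The first call has both a prompt and one of the accepted document keys; the second lacks `user_prompt`, so the AND requirement fails.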
Node Config Dictionary
Each node accepts a node_config dictionary for customization.
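A node_config is an ordinary Python dictionary. The sketch below uses only the three keys named in the Best Practices list on this page (`verbose`, `chunk_size`, `timeout`). The values are illustrative, and which node types read which key is an assumption to check against the node you are configuring.

```python
# Illustrative values only; not every node necessarily reads every key.
node_config = {
    "verbose": True,     # detailed logging while developing
    "chunk_size": 4096,  # assumed to be sized against the LLM's token limit
    "timeout": 30,       # assumed to be in seconds
}

print(sorted(node_config))
```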
Graph Execution
The execute() method returns a tuple:
- state: Final state dictionary with all outputs
- execution_info: List of execution metrics per node
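The two-part return value can be illustrated with a toy executor that runs nodes in sequence, merges each node's output into a shared state, and records one metrics entry per node. This is a sketch under stated assumptions, not the library's implementation: `run_pipeline`, the `fetch` and `parse` functions, and the `node_name` and `exec_time` fields are all hypothetical.

```python
import time
from typing import Callable

# A node takes the current state and returns the keys it adds or updates.
Node = Callable[[dict], dict]


def run_pipeline(nodes: list[tuple[str, Node]], initial_state: dict) -> tuple[dict, list[dict]]:
    """Run nodes in order and return (final state, per-node execution metrics)."""
    state = dict(initial_state)  # copy so the caller's dictionary is not mutated
    execution_info: list[dict] = []
    for name, node in nodes:
        start = time.perf_counter()
        state.update(node(state))
        execution_info.append({"node_name": name, "exec_time": time.perf_counter() - start})
    return state, execution_info


def fetch(state: dict) -> dict:
    # Stand-in for a real fetch: fabricates a document from the URL.
    return {"doc": f"<html>content of {state['url']}</html>"}


def parse(state: dict) -> dict:
    # Stand-in for a real parser: upper-cases the fetched document.
    return {"parsed_doc": state["doc"].upper()}


state, execution_info = run_pipeline(
    [("fetch", fetch), ("parse", parse)],
    {"url": "https://example.com"},
)
print(state["parsed_doc"])
print([entry["node_name"] for entry in execution_info])
```

Unpacking the tuple at the call site, as in the last lines, keeps the scraped outputs and the metrics separate.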
Using GraphBuilder
The GraphBuilder class generates graph configurations automatically from a natural-language description.
Dynamic Graph Creation
Visualizing Graphs
Convert your graph to a visual diagram.
Graphviz must be installed on your system. Download it from graphviz.org/download.
Adding Nodes Dynamically
You can append nodes to an existing graph.
Complete Example
Here’s a full example combining all concepts:
~/workspace/source/examples/custom_graph/openai/custom_graph_openai.py
Best Practices
- Entry Point: Always ensure the first node in the nodes list matches the entry_point parameter
- Error Handling: Wrap graph execution in try-except blocks to handle node failures
- Verbose Mode: Enable verbose: True during development for detailed logging
- Chunk Size: Adjust chunk_size based on your LLM’s token limits
- Timeouts: Set appropriate timeout values to prevent hanging requests
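The first two practices can be sketched as small helpers. Everything here is hypothetical scaffolding rather than library code: `check_entry_point` and `safe_execute` are illustrative names, and `FailingGraph` is a stand-in object that only mimics the `execute()` method described earlier.

```python
import logging

logger = logging.getLogger("custom_graph")


def check_entry_point(nodes: list, entry_point) -> None:
    """Raise ValueError unless the first node in `nodes` is the entry point."""
    if not nodes or nodes[0] is not entry_point:
        raise ValueError("entry_point must be the first node in the nodes list")


def safe_execute(graph, inputs: dict):
    """Call graph.execute(inputs); on failure, log the error and return an empty result."""
    try:
        return graph.execute(inputs)
    except Exception as exc:
        logger.error("Graph execution failed: %s", exc)
        return None, []


class FailingGraph:
    """Stand-in for a graph whose node raises during execution."""

    def execute(self, inputs: dict):
        raise RuntimeError("fetch node timed out")


state, execution_info = safe_execute(FailingGraph(), {"url": "https://example.com"})
print(state, execution_info)
```

Returning `(None, [])` on failure keeps the same two-part shape as a successful run, so callers can always unpack the result and then check whether `state` is `None`.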
Next Steps
- Learn how to create custom nodes
- Explore Burr integration for advanced workflow tracking
- Check out troubleshooting tips for common issues
