Overview
Burr is a state machine framework for building and orchestrating applications. The ScrapeGraphAI integration brings state management, execution tracking, and visualization to your scraping graphs. It enables you to:
- Track execution state across all nodes
- Visualize graph execution in real-time
- Debug workflow issues with detailed logs
- Resume failed executions from checkpoints
- Monitor performance metrics
Installation
Install ScrapeGraphAI with Burr support:
```bash
pip install scrapegraphai[burr]
```
Burr requires Python 3.9 or higher.
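A quick check that the active interpreter meets this requirement:

```python
import sys

# Burr needs Python 3.9+; fail fast with a clear message otherwise.
if sys.version_info < (3, 9):
    raise RuntimeError("Burr requires Python 3.9 or higher")
print("Python version OK:", sys.version.split()[0])
```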
Basic Usage
Enable Burr tracking in any ScrapeGraphAI graph:
Configure Burr Parameters
```python
from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "model": "openai/gpt-4o",
        "api_key": "your-api-key",
    },
    "verbose": True,
}

burr_config = {
    "project_name": "my_scraping_project",
    "app_instance_id": "scraper-001",
}

scraper = SmartScraperGraph(
    prompt="Extract product information including price and description",
    source="https://example.com/products",
    config=graph_config,
    use_burr=True,
    burr_config=burr_config,
)
```
Execute with Tracking
```python
result = scraper.run()
print(result)
```
BurrBridge Architecture
The BurrBridge class converts ScrapeGraphAI nodes into Burr actions:
~/workspace/source/scrapegraphai/integrations/burr_bridge.py
```python
from scrapegraphai.integrations import BurrBridge
from scrapegraphai.graphs import BaseGraph
from scrapegraphai.nodes import FetchNode, ParseNode, GenerateAnswerNode

# Create your graph
# (fetch_node, parse_node, and generate_node are assumed to be
# constructed beforehand, as in the full example further below)
graph = BaseGraph(
    nodes=[fetch_node, parse_node, generate_node],
    edges=[
        (fetch_node, parse_node),
        (parse_node, generate_node),
    ],
    entry_point=fetch_node,
)

# Wrap with BurrBridge
burr_config = {
    "project_name": "custom_graph_project",
    "app_instance_id": "custom-001",
    "inputs": {},  # Optional initial inputs
}

bridge = BurrBridge(graph, burr_config)

# Execute with tracking
initial_state = {
    "user_prompt": "Extract main content",
    "url": "https://example.com",
}

final_state = bridge.execute(initial_state=initial_state)
print(final_state["answer"])
```
Configuration Options
Burr Config Dictionary
```python
burr_config = {
    # Project name for grouping related executions
    "project_name": "my_scraping_project",

    # Unique identifier for this application instance
    "app_instance_id": "scraper-001",

    # Optional: initial inputs to pass to the graph
    "inputs": {
        "custom_param": "value",
    },
}
```
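A small helper (illustrative only, not part of ScrapeGraphAI) can apply sensible defaults so that every run gets a unique instance ID:

```python
import uuid

def make_burr_config(project_name, app_instance_id=None, inputs=None):
    """Hypothetical helper: build a burr_config dict with sensible defaults."""
    return {
        "project_name": project_name,
        # Generate a unique instance ID when none is given, so concurrent
        # runs never collide in the tracking UI.
        "app_instance_id": app_instance_id or f"scraper-{uuid.uuid4()}",
        "inputs": inputs or {},
    }

cfg = make_burr_config("my_scraping_project")
```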
Custom Hooks
Burr uses hooks to track execution lifecycle events:
```python
from typing import Any

from burr.core import Action, State
from burr.lifecycle import PostRunStepHook, PreRunStepHook


class CustomLoggingHook(PostRunStepHook, PreRunStepHook):
    """Custom hook for detailed logging."""

    def pre_run_step(
        self,
        *,
        state: State,
        action: Action,
        **future_kwargs: Any,
    ):
        print(f"Starting action: {action.name}")
        print(f"State keys: {list(state.__dict__.keys())}")

    def post_run_step(
        self,
        *,
        state: State,
        action: Action,
        **future_kwargs: Any,
    ):
        print(f"Finished action: {action.name}")
        print(f"Updated state keys: {list(state.__dict__.keys())}")
```
The `PrintLnHook` is included by default in `BurrBridge`, providing basic execution logging.
Node Bridge Implementation
The `BurrNodeBridge` class adapts ScrapeGraphAI nodes to Burr actions:

```python
# Simplified from scrapegraphai/integrations/burr_bridge.py;
# parse_boolean_expression is a helper defined in the same module.
from burr.core import Action, State


class BurrNodeBridge(Action):
    """Bridge class to convert a base graph node to a Burr action."""

    def __init__(self, node):
        super().__init__()
        self.node = node

    @property
    def reads(self) -> list[str]:
        """Input keys the node reads from state."""
        return parse_boolean_expression(self.node.input)

    @property
    def writes(self) -> list[str]:
        """Output keys the node writes to state."""
        return self.node.output

    def run(self, state: State, **run_kwargs) -> dict:
        """Execute the wrapped node with inputs taken from Burr state."""
        node_inputs = {key: state[key] for key in self.reads if key in state}
        return self.node.execute(node_inputs, **run_kwargs)

    def update(self, result: dict, state: State) -> State:
        """Merge the node's outputs back into Burr state."""
        return state.update(**result)
```
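The `input` strings on nodes (for example `"user_prompt & parsed_doc"`) are boolean expressions over state keys, which `reads` resolves into a list of keys. As a rough sketch of that key extraction (a simplified stand-in, not the library's actual implementation):

```python
import re

def parse_boolean_expression_sketch(expression: str) -> list:
    """Extract state-key identifiers from an input expression such as
    "user_prompt & (parsed_doc | doc)", ignoring '&', '|', and parentheses."""
    keys = re.findall(r"\w+", expression)
    # Preserve first-seen order while dropping duplicates.
    return list(dict.fromkeys(keys))

print(parse_boolean_expression_sketch("user_prompt & (parsed_doc | doc)"))
# → ['user_prompt', 'parsed_doc', 'doc']
```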
Real-World Example
Here’s a complete example with custom graph and Burr tracking:
```python
import os

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

from scrapegraphai.graphs import BaseGraph
from scrapegraphai.integrations import BurrBridge
from scrapegraphai.nodes import (
    FetchNode,
    ParseNode,
    RAGNode,
    GenerateAnswerNode,
)

load_dotenv()

# Configuration
graph_config = {
    "llm": {
        "api_key": os.getenv("OPENAI_APIKEY"),
        "model": "gpt-4o",
    },
}

llm_model = ChatOpenAI(**graph_config["llm"])
embedder = OpenAIEmbeddings(api_key=llm_model.openai_api_key)

# Define nodes
fetch_node = FetchNode(
    input="url",
    output=["doc"],
    node_config={"verbose": True, "headless": True},
)

parse_node = ParseNode(
    input="doc",
    output=["parsed_doc"],
    node_config={"chunk_size": 4096, "verbose": True},
)

rag_node = RAGNode(
    input="user_prompt & parsed_doc",
    output=["relevant_chunks"],
    node_config={
        "llm_model": llm_model,
        "embedder_model": embedder,
        "verbose": True,
    },
)

generate_node = GenerateAnswerNode(
    input="user_prompt & relevant_chunks",
    output=["answer"],
    node_config={"llm_model": llm_model, "verbose": True},
)

# Create graph
graph = BaseGraph(
    nodes=[fetch_node, parse_node, rag_node, generate_node],
    edges=[
        (fetch_node, parse_node),
        (parse_node, rag_node),
        (rag_node, generate_node),
    ],
    entry_point=fetch_node,
)

# Burr integration
burr_config = {
    "project_name": "product_scraper",
    "app_instance_id": "run-001",
}

bridge = BurrBridge(graph, burr_config)

# Execute with tracking
initial_state = {
    "user_prompt": "List all products with their prices",
    "url": "https://example.com/shop",
}

final_state = bridge.execute(initial_state=initial_state)
print(final_state["answer"])
```
Visualization in Burr UI
The Burr UI provides:
Execution Graph View
- Visual representation of your node graph
- Highlights currently executing nodes
- Shows data flow between nodes
State Inspector
- View state at any point in execution
- Inspect input/output for each node
- Track state changes over time
Timeline View
- Chronological execution history
- Time spent in each node
- Identify performance bottlenecks
Error Tracking
- Detailed error messages and stack traces
- State at time of failure
- Easy debugging with state replay
Advanced Features
State Persistence
Burr automatically persists execution state:
```python
from burr import tracking

# Configure persistent tracking
tracker = tracking.LocalTrackingClient(
    project="my_project",
    storage_dir="./burr_tracking",
)

burr_config = {
    "project_name": "persistent_scraper",
    "app_instance_id": "persistent-001",
}
```
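Conceptually, what persistence gives you is the state reached after each completed action, so a failed run can resume from its last checkpoint. A minimal stdlib sketch of the idea (not Burr's actual storage format):

```python
import json
import tempfile
from pathlib import Path

def save_checkpoint(state: dict, step: str, directory: str) -> Path:
    """Write the state reached after `step` to disk."""
    path = Path(directory)
    path.mkdir(parents=True, exist_ok=True)
    out = path / f"{step}.json"
    out.write_text(json.dumps(state))
    return out

def load_checkpoint(step: str, directory: str) -> dict:
    """Reload the state saved after `step` to resume a run."""
    return json.loads((Path(directory) / f"{step}.json").read_text())

tracking_dir = tempfile.mkdtemp()
save_checkpoint({"url": "https://example.com", "doc": "<html>...</html>"}, "fetch_node", tracking_dir)
resumed = load_checkpoint("fetch_node", tracking_dir)
```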
Multiple Graph Instances
Track multiple concurrent scraping jobs:
```python
import uuid

# Create a unique instance ID per job
for url in urls:
    instance_id = f"scraper-{uuid.uuid4()}"
    scraper = SmartScraperGraph(
        prompt="Extract data",
        source=url,
        config=graph_config,
        use_burr=True,
        burr_config={
            "project_name": "bulk_scraper",
            "app_instance_id": instance_id,
        },
    )
    result = scraper.run()
```
Spawning Child Applications
Burr supports hierarchical execution tracking:
```python
import uuid

from burr.core import ApplicationContext

# Inside a running Burr action, the parent application's context is available
context = ApplicationContext.get()
child_bridge = BurrBridge(child_graph, {
    "project_name": context.app_id,
    "app_instance_id": f"child-{uuid.uuid4()}",
})
```
Troubleshooting
Burr UI Not Starting
```bash
# Check if Burr is installed
pip show burr

# Reinstall if needed
pip install --upgrade scrapegraphai[burr]

# Launch with a custom port
burr --port 8080
```
State Not Updating
Ensure node outputs match expected state keys:
```python
# In your custom node
def execute(self, state: dict) -> dict:
    # Make sure to update state with the correct output keys
    state.update({self.output[0]: result})
    return state
```
Missing Execution Data
Verify `use_burr=True` is set:
```python
# Correct
scraper = SmartScraperGraph(
    ...,
    use_burr=True,  # ✓
    burr_config={...},
)

# Incorrect - no tracking
scraper = SmartScraperGraph(
    ...,
    burr_config={...},  # ✗ Missing use_burr=True
)
```
Burr tracking adds a small overhead to each execution. For high-throughput production scenarios, consider:
- Disabling tracking after development
- Using sampling (track only N% of executions)
- Configuring local storage vs. remote tracking
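ScrapeGraphAI has no built-in sampling switch, but a simple gate (names here are illustrative) can decide per run whether to enable tracking:

```python
import random

TRACK_FRACTION = 0.1  # track roughly 10% of executions

def should_track(fraction: float = TRACK_FRACTION) -> bool:
    """Decide per-run whether to enable Burr tracking."""
    return random.random() < fraction

# Pass the result as use_burr when constructing the graph:
# scraper = SmartScraperGraph(..., use_burr=should_track(), ...)
```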
Best Practices
- Unique Instance IDs: Use UUIDs or timestamps for `app_instance_id`
- Project Organization: Group related scraping jobs under the same `project_name`
- Development vs. Production: Enable Burr in development, disable it in production
- Storage Management: Regularly clean old tracking data to save disk space
- Error Handling: Let Burr capture errors naturally - avoid swallowing exceptions
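For the first practice, a small helper (illustrative, not part of either library) can generate instance IDs that sort chronologically and never collide:

```python
import uuid
from datetime import datetime, timezone

def new_instance_id(prefix: str = "scraper") -> str:
    """Combine a UTC timestamp with a short UUID fragment so IDs
    sort chronologically and remain unique across concurrent runs."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"{prefix}-{stamp}-{uuid.uuid4().hex[:8]}"

print(new_instance_id())
```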