Overview
ScrapeGraphAI includes an optional telemetry system that helps the development team understand usage patterns and improve the library. This page explains what data is collected, how to opt out, and privacy considerations.
What Data is Collected
The telemetry system collects anonymous usage data when graphs are executed. The collection logic lives in scrapegraphai/telemetry/telemetry.py.
Data Points
Collected Data Details
- user_prompt: The prompt you provide to extract data
- json_schema: Schema used for structured extraction (if provided)
- website_content: The content extracted from the website
- llm_response: The LLM’s generated response
- llm_model: Name of the LLM model used
- url: The source URL being scraped
- anonymous_id: A random UUID (not linked to your identity)
- version: ScrapeGraphAI library version
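For illustration, the fields listed above can be modeled as a single event record. This is a minimal sketch, not the library's code: the class name TelemetryEvent, the to_json helper, and the assumption that the payload is a flat JSON object with exactly these keys are hypothetical. Check telemetry.py for the real structure.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TelemetryEvent:
    """Hypothetical model of one telemetry event; field names follow the list above."""

    user_prompt: str
    website_content: str
    llm_response: str
    llm_model: str
    url: str
    anonymous_id: str
    version: str
    json_schema: Optional[dict] = None  # only present for structured extraction

    def to_json(self) -> str:
        # Serialize as a flat JSON object (assumed wire format).
        return json.dumps(asdict(self))


event = TelemetryEvent(
    user_prompt="List the article titles",
    website_content="<html>...</html>",
    llm_response='{"titles": ["A", "B"]}',
    llm_model="openai/gpt-4o-mini",
    url="https://example.com",
    anonymous_id=str(uuid.uuid4()),
    version="0.0.0",
)
print(event.to_json())
```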
What is NOT Collected
- Personally identifiable information (PII)
- API keys or credentials
- IP addresses
- System information
- Local file paths
- Error details or stack traces
Telemetry data is sent asynchronously in a background thread, so it never blocks a scrape. Failed sends are ignored and never interrupt execution.
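The fire-and-forget pattern described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the names send_in_background and _post_json are hypothetical, the endpoint and the 2-second timeout come from this page, and the injectable post parameter exists only to make the sketch testable. The real module may be structured differently.

```python
import json
import logging
import threading
import urllib.request
from typing import Callable

logger = logging.getLogger(__name__)

# Endpoint and timeout as documented on this page.
TELEMETRY_URL = "https://sgai-oss-tracing.onrender.com/v1/telemetry"
TIMEOUT_SECONDS = 2


def _post_json(url: str, payload: dict) -> None:
    """POST a JSON payload, giving up after TIMEOUT_SECONDS."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=TIMEOUT_SECONDS):
        pass


def send_in_background(
    payload: dict,
    url: str = TELEMETRY_URL,
    post: Callable[[str, dict], None] = _post_json,
) -> threading.Thread:
    """Fire-and-forget send: the caller never waits and never sees an exception."""

    def worker() -> None:
        try:
            post(url, payload)
        except Exception as exc:  # any failure is logged at debug level and dropped
            logger.debug("telemetry send failed: %s", exc)

    # daemon=True so a pending send never keeps the interpreter alive at exit
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

Because the worker is a daemon thread, a send still in flight when the script exits is simply dropped, which is consistent with failures being ignored.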
How to Opt-Out
There are three ways to disable telemetry.
Method 1: Environment Variable (Recommended)
Set the environment variable before running your script.
Method 2: Configuration File
Create or edit ~/.scrapegraphai.conf and turn telemetry off there.
The anonymous_id is automatically generated on first use. You can keep it or remove it when disabling telemetry.
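As a rough illustration of Method 2, the opt-out can be written with Python's configparser. This sketch assumes the file is INI-style with settings under [DEFAULT]; the key name telemetry_enabled and the helper write_opt_out are assumptions, so confirm the key against telemetry.py before relying on it.

```python
import configparser


def write_opt_out(config_path: str, keep_anonymous_id: bool = True) -> None:
    """Set telemetry_enabled = False in an INI-style config file (assumed format)."""
    config = configparser.ConfigParser()
    config.read(config_path)  # a missing file is silently treated as empty

    defaults = config["DEFAULT"]
    defaults["telemetry_enabled"] = "False"  # assumed key name
    if not keep_anonymous_id:
        # Optionally drop the stored ID as well, as the note above allows.
        defaults.pop("anonymous_id", None)

    with open(config_path, "w", encoding="utf-8") as fh:
        config.write(fh)
```

Usage: `write_opt_out(os.path.expanduser("~/.scrapegraphai.conf"))`, or pass `keep_anonymous_id=False` to remove the ID at the same time.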
Method 3: Programmatic
Disable telemetry in your code.
Telemetry Implementation
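At a high level, the implementation has to reconcile the three opt-out methods into a single enabled flag. The sketch below shows one way to do that. The precedence order (on by default, then the config file, then the environment variable, then a programmatic call), the variable name SCRAPEGRAPHAI_TELEMETRY_ENABLED, the config key telemetry_enabled, and the function names resolve_telemetry_enabled and disable_telemetry are all assumptions for illustration, not the library's confirmed API.

```python
import configparser
import os
from typing import Mapping, Optional

ENV_VAR = "SCRAPEGRAPHAI_TELEMETRY_ENABLED"  # assumed variable name
_FALSY = {"0", "false", "no", "off"}

_programmatic_override: Optional[bool] = None  # set by disable_telemetry()


def disable_telemetry() -> None:
    """Hypothetical programmatic opt-out (Method 3); wins over every other source."""
    global _programmatic_override
    _programmatic_override = False


def _parse_bool(value: str) -> bool:
    # Only an explicit "false"-like value turns telemetry off.
    return value.strip().lower() not in _FALSY


def resolve_telemetry_enabled(
    config_path: str, environ: Mapping[str, str] = os.environ
) -> bool:
    enabled = True  # telemetry is on by default
    config = configparser.ConfigParser()
    config.read(config_path)  # Method 2: a missing file leaves the default
    if "telemetry_enabled" in config["DEFAULT"]:
        enabled = _parse_bool(config["DEFAULT"]["telemetry_enabled"])
    if ENV_VAR in environ:  # Method 1: the environment overrides the file
        enabled = _parse_bool(environ[ENV_VAR])
    if _programmatic_override is not None:  # Method 3: code overrides everything
        enabled = _programmatic_override
    return enabled
```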
Rate Limiting
Telemetry has built-in rate limiting.
Conditional Collection
Telemetry only sends data when all required fields are present.
Error Handling
Telemetry never crashes your application.
Privacy Considerations
Data Transmission
- Sent over HTTPS to https://sgai-oss-tracing.onrender.com/v1/telemetry
- A 2-second timeout prevents hanging
- Failures are logged but don’t affect execution
Data Storage
The anonymous ID stored locally in ~/.scrapegraphai.conf:
- Is randomly generated
- Is not linked to your identity
- Helps group sessions for understanding usage patterns
- Can be removed by deleting the config file (this also removes any opt-out setting stored there)
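The anonymous ID lifecycle above (randomly generated, stored locally, recreated after the file is deleted) can be sketched with configparser and uuid.uuid4. The function name load_or_create_anonymous_id is hypothetical, and the sketch assumes the ID sits under [DEFAULT] in an INI-style file.

```python
import configparser
import uuid


def load_or_create_anonymous_id(config_path: str) -> str:
    """Return the stored anonymous_id, creating and saving a random UUID4 if absent."""
    config = configparser.ConfigParser()
    config.read(config_path)  # a missing file is treated as empty
    defaults = config["DEFAULT"]
    if "anonymous_id" not in defaults:
        # uuid4 is purely random: nothing about the user or machine goes into it
        defaults["anonymous_id"] = str(uuid.uuid4())
        try:
            with open(config_path, "w", encoding="utf-8") as fh:
                config.write(fh)
        except OSError:
            pass  # an unwritable home directory just means a fresh ID next run
    return defaults["anonymous_id"]
```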
Sensitive Data Protection
Verifying Telemetry Status
Check whether telemetry is enabled.
Docker and Production Environments
Docker
Disable telemetry in your Dockerfile.
Docker Compose
Disable telemetry in docker-compose.yml.
Kubernetes
Disable telemetry in deployment.yaml.
CI/CD
Disable telemetry in your workflow file, .github/workflows/scrape.yml.
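Whether you use Docker, Docker Compose, Kubernetes, or CI/CD, the environment-variable opt-out only works if the variable actually reaches the Python process. A hypothetical startup guard, run first in the container entrypoint or CI job, can fail fast when it is missing. This sketch assumes the variable is SCRAPEGRAPHAI_TELEMETRY_ENABLED; the helper names are illustrative, so check telemetry.py for the name your version reads.

```python
import os
from typing import Mapping

ENV_VAR = "SCRAPEGRAPHAI_TELEMETRY_ENABLED"  # assumed variable name


def telemetry_opted_out(environ: Mapping[str, str] = os.environ) -> bool:
    """True when the environment explicitly disables telemetry."""
    return environ.get(ENV_VAR, "").strip().lower() in {"0", "false", "no", "off"}


def require_telemetry_opt_out(environ: Mapping[str, str] = os.environ) -> None:
    """Raise at startup if the deployment forgot to set the opt-out variable."""
    if not telemetry_opted_out(environ):
        raise RuntimeError(f"{ENV_VAR} is not set to 'false' in this environment")
```

Calling `require_telemetry_opt_out()` at the top of the entrypoint turns a missing variable in the Dockerfile, manifest, or workflow into an immediate error. It checks only the environment, not ~/.scrapegraphai.conf.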
Complete Opt-Out Example
Here’s a complete script, scraper_no_telemetry.py, with telemetry disabled.
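A minimal sketch of such a script sets the opt-out variable before scrapegraphai is imported, then runs a SmartScraperGraph. The variable name SCRAPEGRAPHAI_TELEMETRY_ENABLED, the model string, and the helper name run_scrape are assumptions; adapt graph_config to your LLM provider.

```python
import os

# Opt out before scrapegraphai is imported so the setting is in place whenever
# the telemetry module reads it. Assumed variable name; confirm in telemetry.py.
os.environ["SCRAPEGRAPHAI_TELEMETRY_ENABLED"] = "false"


def run_scrape(prompt: str, url: str, api_key: str):
    """Run a single SmartScraperGraph with telemetry disabled via the environment."""
    # Deferred import: guarantees the opt-out above happens first.
    from scrapegraphai.graphs import SmartScraperGraph

    graph_config = {
        "llm": {
            "api_key": api_key,
            "model": "openai/gpt-4o-mini",  # example model string; use your own
        },
    }
    graph = SmartScraperGraph(prompt=prompt, source=url, config=graph_config)
    return graph.run()
```

Usage: `result = run_scrape("List all article titles", "https://example.com", os.environ["OPENAI_API_KEY"])`.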
FAQ
Is telemetry enabled by default?
Yes, telemetry is enabled by default but can be easily disabled using any of the methods above.
Can telemetry identify me?
No. Only an anonymous random UUID is used, which is not linked to any personal information.
Does telemetry slow down scraping?
No. Telemetry is sent asynchronously in a background thread with a 2-second timeout, and failures are ignored.
Where is telemetry data sent?
Data is sent to https://sgai-oss-tracing.onrender.com/v1/telemetry over HTTPS.
Can I see what data is being sent?
Yes. You can inspect the payload in the source code at scrapegraphai/telemetry/telemetry.py, around line 80.
Does disabling telemetry affect functionality?
No. Disabling telemetry has no impact on ScrapeGraphAI’s functionality. All features work identically.
Next Steps
- Review troubleshooting tips for common issues
- Build custom graphs with privacy in mind
- Explore integrations for production deployments
