Diffbot
Diffbot's AI-powered web scraping and knowledge graph crawler.
What is Diffbot?
Diffbot operates an AI-powered web scraping infrastructure that builds a comprehensive knowledge graph from the public web. Unlike traditional crawlers, Diffbot uses computer vision and NLP to understand page structure and extract structured data automatically.
Diffbot's technology is used by companies worldwide for competitive intelligence, lead generation, and data enrichment. Its knowledge graph contains billions of entities extracted from web pages, making it one of the largest commercial web data products.
The crawler identifies itself as "Diffbot" and visits pages to extract structured information like product details, article content, organization data, and person profiles. This data feeds into commercial APIs used by thousands of businesses.
User-Agent Strings
These are the known user-agent patterns used by Diffbot. Use them to identify this crawler in your server logs or configure robots.txt rules.
robots.txt example:
User-agent: Diffbot
Disallow: /private/
Allow: /
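Before deploying rules like the ones above, it can help to check how a parser reads them. The following is a minimal sketch using Python's standard `urllib.robotparser`. The `example.com` URLs and the `SomeOtherBot` name are placeholders, not values from Diffbot's documentation, and Python's parser applies the first matching rule, which may differ from how a given crawler resolves `Allow` and `Disallow` conflicts.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this page, one directive per line.
ROBOTS_TXT = """\
User-agent: Diffbot
Disallow: /private/
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Placeholder URLs used only for illustration.
private_allowed = parser.can_fetch("Diffbot", "https://example.com/private/report.html")
public_allowed = parser.can_fetch("Diffbot", "https://example.com/products/widget")
other_allowed = parser.can_fetch("SomeOtherBot", "https://example.com/private/report.html")

print(private_allowed, public_allowed, other_allowed)
```

Under these rules the parser should report the `/private/` path as blocked for Diffbot and everything else as allowed. Agents without their own group are unaffected, because the file defines no `User-agent: *` section.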
How to Manage Diffbot
Block if you don't want structured data extraction from your pages.
Diffbot extracts product info, pricing, and organization data.
Low to moderate crawl rates.
Use Switch to identify Diffbot and serve limited content if desired.
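Independent of any particular tool, identifying the crawler in server logs comes down to matching the "Diffbot" token in the User-Agent field. Below is a minimal sketch that assumes the combined log format, where the user agent is the last quoted field. The sample log lines and the `ExampleAgent` string are invented for illustration and are not Diffbot's actual user-agent string. User-agent headers can also be spoofed, so a match is a hint rather than proof.

```python
import re
from typing import Iterable, Optional

# Case-insensitive match on the token the crawler uses to identify itself.
DIFFBOT_TOKEN = re.compile(r"diffbot", re.IGNORECASE)

# Assumption: combined log format, user agent is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def is_diffbot(user_agent: Optional[str]) -> bool:
    """Return True if the user-agent string contains the Diffbot token."""
    return bool(user_agent) and DIFFBOT_TOKEN.search(user_agent) is not None


def count_diffbot_hits(log_lines: Iterable[str]) -> int:
    """Count log lines whose user-agent field mentions Diffbot."""
    hits = 0
    for line in log_lines:
        match = LAST_QUOTED_FIELD.search(line)
        if match and is_diffbot(match.group(1)):
            hits += 1
    return hits


# Hypothetical log lines; the user-agent values are illustrative only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /products/1 HTTP/1.1" '
    '200 512 "-" "ExampleAgent/1.0 (compatible; Diffbot/0.1)"',
    '203.0.113.9 - - [01/Jan/2025:00:00:02 +0000] "GET / HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

print(count_diffbot_hits(SAMPLE_LOG))
```

The same `is_diffbot` check could gate a limited-content response in application code, although the exact hook depends on the web framework in use.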
Start managing Diffbot today
Switch detects, tracks, and lets you build custom journeys for Diffbot and 35+ other AI agents and crawlers. Set up in five minutes.
Related Agents
AI2Bot
Commercial Crawlers · Allen AI
Allen Institute for AI's research crawler for academic AI development.
Amazonbot
Commercial Crawlers · Amazon
Amazon's web crawler powering Alexa, Amazon search, and AI services.
Applebot-Extended
Commercial Crawlers · Apple
Apple's AI training token controlling how Applebot data is used for Apple Intelligence.
Bytespider
Commercial Crawlers · ByteDance
ByteDance's web crawler for TikTok AI and LLM training data.
CCBot
Commercial Crawlers · Common Crawl
Common Crawl's open-source web archive used by multiple AI companies for training.
ClaudeBot
Commercial Crawlers · Anthropic
Anthropic's web crawler collecting training data for Claude models.