How to Block Diffbot
A complete guide to blocking the Diffbot crawler from your website using robots.txt, server configuration, and Switch journey workflows.
Should You Block Diffbot?
Diffbot collects data for AI model training. Blocking it stops Diffbot from collecting your content for its AI products, and it does not affect your search visibility.
Blocking AI-training crawlers is a common step for sites that want to control how their content is used in AI training.
Blocking Methods
1. robots.txt
Effectiveness: High for cooperative crawlers. Add a Disallow rule for Diffbot's user-agent token in your robots.txt file. This is the standard, cooperative method that well-behaved crawlers respect.
2. Server-side UA filtering
Effectiveness: High. Configure your web server (nginx, Apache) or CDN (Cloudflare) to reject requests whose User-Agent header matches Diffbot's patterns. This blocks the request at the server or edge, before your application processes it. It relies on Diffbot identifying itself in the User-Agent header.
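The same User-Agent check can be sketched in Python as WSGI middleware. This is a stand-in for the nginx, Apache, or Cloudflare rule, not a replacement: middleware runs inside your application process, whereas a server or CDN rule rejects the request earlier. It assumes Diffbot's User-Agent header contains the substring "diffbot" (matched case-insensitively); `BLOCKED_UA_SUBSTRINGS` and the class name are hypothetical.

```python
from typing import Callable, Iterable

# Hypothetical pattern list (assumption: Diffbot's UA contains "diffbot").
# Extend it with the exact strings you observe in your own logs.
BLOCKED_UA_SUBSTRINGS = ("diffbot",)


def is_blocked_user_agent(user_agent: str) -> bool:
    """Return True if the User-Agent contains any blocked substring."""
    ua = user_agent.lower()
    return any(pattern in ua for pattern in BLOCKED_UA_SUBSTRINGS)


class BlockDiffbotMiddleware:
    """Wrap a WSGI app and answer 403 Forbidden to matching crawlers."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if is_blocked_user_agent(user_agent):
            body = b"Forbidden"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
            )
            return [body]
        # Not a blocked crawler: hand the request to the wrapped application.
        return self.app(environ, start_response)
```

To use it, wrap an existing WSGI application with `app = BlockDiffbotMiddleware(app)`. Substring matching keeps the rule tolerant of version numbers in the UA string.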
3. Switch Journey Workflows
Effectiveness: Highest, with granular, real-time control. Create a custom journey in Switch that detects Diffbot and routes it to a block action, challenge, redirect, or modified content, without touching your server configuration.
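Conceptually, a journey like this is an ordered set of conditional rules: match the visitor, then pick an action. The Python sketch below illustrates that idea only. It is not Switch's API or configuration format; `Rule`, `Action`, `route`, and the example paths are all hypothetical, and it assumes Diffbot is detected by a "diffbot" substring in the User-Agent.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes mirroring the options described above."""
    ALLOW = "allow"
    BLOCK = "block"
    CHALLENGE = "challenge"
    REDIRECT = "redirect"
    MODIFIED_CONTENT = "modified_content"


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: applies when the UA contains `ua_substring`
    and the request path starts with `path_prefix`."""
    ua_substring: str
    path_prefix: str
    action: Action


# Hypothetical journey; rules are checked in order and the first match wins.
RULES = [
    Rule("diffbot", "/private/", Action.BLOCK),
    Rule("diffbot", "/checkout", Action.CHALLENGE),
    Rule("diffbot", "/pricing", Action.REDIRECT),
    Rule("diffbot", "/", Action.MODIFIED_CONTENT),
]


def route(user_agent: str, path: str, rules: list = RULES) -> Action:
    """Return the action of the first rule matching this request."""
    ua = user_agent.lower()
    for rule in rules:
        if rule.ua_substring in ua and path.startswith(rule.path_prefix):
            return rule.action
    # No rule matched: let the request through unchanged.
    return Action.ALLOW
```

For example, `route("Mozilla/5.0 (compatible; Diffbot/0.1)", "/private/report")` selects `Action.BLOCK`, while the same path requested by a regular browser falls through to `Action.ALLOW`.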
robots.txt — Block Diffbot
Add the following to your robots.txt file (at the root of your domain) to block Diffbot:
User-agent: Diffbot
Disallow: /

User-agent: diffbot
Disallow: /
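Before deploying these rules, you can sanity-check them with Python's standard-library `urllib.robotparser`. This is a minimal sketch: the URL is a placeholder, and Python's parser is a simplified implementation, so treat it as an approximation of how real crawlers evaluate robots.txt rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules, one directive per line.
ROBOTS_TXT = """\
User-agent: Diffbot
Disallow: /

User-agent: diffbot
Disallow: /
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network fetch is needed.
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Diffbot", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://example.com/blog/post.html")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Diffbot should come out blocked, while Googlebot, which has no matching group, stays allowed, consistent with the point that blocking Diffbot leaves search crawlers untouched.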
robots.txt — Allow with Restrictions
Alternatively, allow Diffbot on most pages while blocking specific directories:
User-agent: Diffbot
Disallow: /private/
Allow: /

User-agent: diffbot
Disallow: /private/
Allow: /
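The restricted variant can be checked the same way with `urllib.robotparser`. Only the first group is reproduced here; this parser matches user-agent names case-insensitively, so the lowercase duplicate behaves identically. The example paths are placeholders.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Diffbot
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch() also accepts a bare path instead of a full URL.
for path in ("/", "/blog/post.html", "/private/report.pdf"):
    verdict = "allowed" if parser.can_fetch("Diffbot", path) else "blocked"
    print(f"{path}: {verdict}")
```

One design caveat: Python's parser applies the first matching rule in file order, while crawlers following RFC 9309 apply the most specific (longest) matching rule. For this file both approaches agree, because `/private/` is listed first and is also the longer match.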
Diffbot User-Agent Strings
Use these patterns to identify Diffbot in your server logs or firewall rules:
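To see how often Diffbot visits, you can scan your access logs for matching User-Agent values. This sketch assumes the nginx/Apache "combined" log format, where the User-Agent is the last double-quoted field, and assumes a "diffbot" substring identifies the crawler. The sample log lines are invented for illustration and do not reproduce Diffbot's real User-Agent string.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: "combined" log format; the UA is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')
# The first quoted field is the request line, e.g. "GET /path HTTP/1.1".
REQUEST_PATH = re.compile(r'"[A-Z]+ (\S+)')


def diffbot_hits(lines: Iterable[str], pattern: str = "diffbot") -> Counter:
    """Count requested paths on lines whose User-Agent contains `pattern`."""
    hits: Counter = Counter()
    for line in lines:
        ua_match = LAST_QUOTED_FIELD.search(line)
        if not ua_match or pattern not in ua_match.group(1).lower():
            continue
        path_match = REQUEST_PATH.search(line)
        hits[path_match.group(1) if path_match else "?"] += 1
    return hits


# Invented sample lines for illustration only.
sample = [
    '203.0.113.5 - - [01/Jan/2024:00:00:00 +0000] "GET /blog/post.html HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; Diffbot/0.1)"',
    '198.51.100.7 - - [01/Jan/2024:00:00:01 +0000] "GET /about HTTP/1.1" '
    '200 300 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"',
]
print(diffbot_hits(sample))
```

In practice you would pass an open log file (for example `open("access.log")`) instead of the sample list, since file objects iterate line by line.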
Frequently Asked Questions
Does blocking Diffbot affect my Google search rankings?
No. Blocking Diffbot does not affect your Google search rankings. Only blocking Googlebot impacts Google Search visibility.
Does Diffbot respect robots.txt?
Yes, Diffbot respects robots.txt directives. Adding a Disallow rule for its user-agent will prevent it from crawling blocked paths.
Can I allow Diffbot on some pages but not others?
Yes. Use robots.txt to disallow specific directories, or use Switch journey workflows for granular page-level control with conditional logic.
Go beyond robots.txt
Switch detects Diffbot in real-time and lets you build custom journey workflows — block, challenge, redirect, or serve modified content. No server changes required.