Definition

What is robots.txt?

A text file at a website's root that tells crawlers which pages they can and cannot access.

robots.txt is a plain text file placed at the root of a website (e.g., example.com/robots.txt) that tells web crawlers which pages or sections they should or shouldn't access. It follows the Robots Exclusion Protocol (standardized as RFC 9309), a voluntary convention that cooperative crawlers respect.

The file uses simple directives: User-agent specifies which crawler the rules apply to, Disallow blocks access to specific paths, Allow permits access to paths within an otherwise disallowed directory, and Crawl-delay suggests a minimum wait between requests (a non-standard extension that some major crawlers, including Googlebot, ignore). A Sitemap directive points crawlers to your XML sitemap.
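Putting those directives together, a minimal robots.txt (the specific bot names and paths here are illustrative) might look like this:

```
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Crawl-delay: 10

# Stricter rules for a specific crawler
User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
```

Rules are grouped by User-agent; a crawler uses the most specific group that matches its name, falling back to the `*` group.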

Important limitation: robots.txt is advisory, not enforced. Well-behaved crawlers (Googlebot, GPTBot, ClaudeBot) respect it, but malicious scrapers and many automated browser agents simply ignore it. For enforceable access control, you need server-side solutions or tools like Switch that can block or challenge non-compliant visitors.
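Because the protocol is voluntary, checking the rules is something each crawler does for itself before fetching a page. A sketch of that check using Python's standard-library parser (the bot name, paths, and rules are hypothetical; note that this parser applies the first matching rule, so Allow lines are listed before the Disallow they carve an exception from):

```python
from urllib import robotparser

# Hypothetical robots.txt content for example.com
rules = """\
User-agent: *
Allow: /private/public-page.html
Disallow: /private/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.modified()   # mark the rules as freshly loaded; parse() alone doesn't set this
rp.parse(rules)

# A compliant crawler asks before each fetch:
print(rp.can_fetch("GPTBot", "https://example.com/private/secret.html"))       # False
print(rp.can_fetch("GPTBot", "https://example.com/private/public-page.html"))  # True
print(rp.can_fetch("GPTBot", "https://example.com/index.html"))                # True
```

A crawler that never runs a check like this is unconstrained, which is exactly the gap server-side enforcement fills.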

How Switch Helps

Switch complements robots.txt by enforcing access control against crawlers that ignore it, and by behaviorally detecting agents that don't send identifiable user-agent strings.
