14 agents

Commercial Crawlers

Crawlers operated by AI companies to collect training data for large language models. They scrape web content at scale, usually on their own schedule rather than in response to a specific user request.

AI2Bot

Allen AI

Allen Institute for AI's research crawler for academic AI development.

UA: AI2Bot, ai2bot

Amazonbot

Amazon

Amazon's web crawler powering Alexa, Amazon search, and AI services.

UA: Amazonbot

Applebot-Extended

Apple

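Not a separate crawler: a robots.txt token that controls whether content crawled by Applebot may be used to train Apple's AI models, including those behind Apple Intelligence.

robots.txt token: Applebot-Extended (no User-Agent of its own; requests arrive as Applebot)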

Bytespider

ByteDance

ByteDance's web crawler for TikTok AI and LLM training data.

UA: Bytespider, bytespider, Bytedance

CCBot

Common Crawl

Common Crawl's crawler, which builds an open web archive that many AI companies use as training data.

UA: CCBot, ccbot

ClaudeBot

Anthropic

Anthropic's web crawler collecting training data for Claude models.

UA: ClaudeBot, claudebot, Claude-Web, anthropic-ai, Anthropic

cohere-ai

Cohere

Cohere's web crawler for enterprise AI and language model training.

UA: cohere-ai, CohereBot

DeepSeekBot

DeepSeek

DeepSeek's web crawler for their open-source large language models.

UA: DeepSeekBot, deepseek

Diffbot

Diffbot

Diffbot's AI-powered web scraping and knowledge graph crawler.

UA: Diffbot, diffbot

Google-Extended

Google

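Not a separate crawler: a robots.txt token that controls whether content crawled by Googlebot may be used to train Google's Gemini models.

robots.txt token: Google-Extended (no User-Agent of its own; requests arrive as Googlebot)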
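Google-Extended and Applebot-Extended are honored through robots.txt rules, not by matching request headers: Google and Apple document them as product tokens with no User-Agent of their own. The sketch below uses Python's standard `urllib.robotparser` on a hypothetical robots.txt to show the effect of disallowing both tokens. It is only an illustration of how rule groups are selected; `urllib.robotparser` is a simplified parser, not the one Google or Apple actually run.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com: block the two AI-training tokens,
# allow every other agent (including Googlebot and Applebot) everywhere.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
# parse() also records a read time, so can_fetch() answers instead of
# refusing every URL as "not yet read".
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/articles/1"
for agent in ("Googlebot", "Google-Extended", "Applebot", "Applebot-Extended"):
    verdict = "allowed" if parser.can_fetch(agent, url) else "disallowed"
    print(f"{agent}: {verdict}")
```

Run as written, it reports Googlebot and Applebot as allowed and both -Extended tokens as disallowed: blocking the token opts content out of AI training without stopping the underlying crawler from fetching pages.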

GPTBot

OpenAI

OpenAI's crawler collecting training data for its GPT foundation models, which power ChatGPT.

UA: GPTBot, gptbot

ICC-Crawler

NICT

The research crawler of Japan's National Institute of Information and Communications Technology (NICT), collecting multilingual web data for AI research.

UA: ICC-Crawler

Meta-WebIndexer

Meta

Meta's web indexer, used to improve search results and answers in Meta AI.

UA: Meta-WebIndexer, meta-webindexer

webzio

Webz.io

Webz.io's data extraction crawler, whose collected web data AI companies use for training.

UA: webzio
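Each UA line lists substrings to look for in a request's User-Agent header, not complete header values. A minimal sketch of case-insensitive matching against those substrings follows; `AGENT_TOKENS` and `classify_user_agent` are hypothetical names, and the approach assumes bots identify themselves honestly, since User-Agent strings are trivially spoofed. Applebot-Extended and Google-Extended are left out because they are robots.txt tokens that never appear in a User-Agent header.

```python
from typing import Optional

# Hypothetical lookup table built from the UA lines above. Lowercase
# duplicates (ai2bot, gptbot, ...) are dropped because matching is
# case-insensitive. Applebot-Extended and Google-Extended are omitted:
# they are robots.txt tokens, not User-Agent strings.
AGENT_TOKENS: dict[str, tuple[str, ...]] = {
    "AI2Bot": ("AI2Bot",),
    "Amazonbot": ("Amazonbot",),
    "Bytespider": ("Bytespider", "Bytedance"),
    "CCBot": ("CCBot",),
    "ClaudeBot": ("ClaudeBot", "Claude-Web", "anthropic-ai", "Anthropic"),
    "cohere-ai": ("cohere-ai", "CohereBot"),
    "DeepSeekBot": ("DeepSeekBot", "deepseek"),
    "Diffbot": ("Diffbot",),
    "GPTBot": ("GPTBot",),
    "ICC-Crawler": ("ICC-Crawler",),
    "Meta-WebIndexer": ("Meta-WebIndexer",),
    "webzio": ("webzio",),
}


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the directory name of the first agent whose token occurs
    anywhere in user_agent (case-insensitive), or None if none match."""
    ua = user_agent.lower()
    for name, tokens in AGENT_TOKENS.items():
        if any(token.lower() in ua for token in tokens):
            return name
    return None


# Illustrative header values, not verbatim strings sent by these vendors.
print(classify_user_agent("Mozilla/5.0 (compatible; GPTBot/1.2)"))  # GPTBot
print(classify_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # None
```

Plain substring matching keeps the table easy to maintain, but broad tokens such as `Anthropic` or `deepseek` can match unrelated strings; the dictionary order decides the winner only when a header contains tokens from two agents.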

Manage all commercial crawlers with Switch

Detect, track, and build custom response journeys for every commercial crawler visiting your site. Five-minute setup.