Bytespider
ByteDance's web crawler for TikTok AI and LLM training data.
What is Bytespider?
Bytespider is ByteDance's web crawler, used to collect training data for the company's AI models and services, including TikTok's recommendation algorithms and its large language model (LLM) products. ByteDance is one of the world's largest technology companies, operating TikTok and several AI research divisions.
The crawler has been observed requesting pages at high rates, and ByteDance provides only limited documentation of how the collected data is used. It identifies itself with "Bytespider" or "Bytedance" in the user-agent string.
Bytespider has faced scrutiny for its aggressive crawling behavior and the geopolitical implications of data collection by a China-headquartered company. Site owners should make informed decisions about allowing access based on their content policies and audience.
User-Agent Strings
Bytespider's requests carry the "Bytespider" (or "Bytedance") token in the user-agent string. Match on that token to identify this crawler in your server logs, and use "Bytespider" as the User-agent value in robots.txt rules.
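A minimal Python sketch of the log check, assuming an access log in the common Apache/Nginx "combined" format, where the user-agent is the last double-quoted field on each line. The function names and the sample log line are hypothetical illustrations; the two tokens are the ones named above. Matching the broader "Bytedance" token may also catch other ByteDance agents, and because the user-agent header is self-reported, a match only shows that a request claims to be Bytespider.

```python
import re

# Tokens named on this page; matched case-insensitively anywhere in the user-agent.
BYTESPIDER_TOKENS = re.compile(r"bytespider|bytedance", re.IGNORECASE)

# Assumption: combined log format, where the user-agent is the final quoted field.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def user_agent_of(log_line: str) -> str:
    """Extract the user-agent from a combined-format log line ("" if absent)."""
    match = LAST_QUOTED_FIELD.search(log_line)
    return match.group(1) if match else ""


def is_bytespider(user_agent: str) -> bool:
    """Return True if the user-agent string contains a Bytespider token."""
    return BYTESPIDER_TOKENS.search(user_agent) is not None


# Hypothetical, shortened sample line for illustration only.
sample = ('203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /blog HTTP/1.1" '
          '200 5120 "-" "Mozilla/5.0 (compatible; Bytespider)"')
print(is_bytespider(user_agent_of(sample)))  # True
```

The same two functions can be applied line by line to a full log file to filter out Bytespider traffic for review.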
robots.txt example:
User-agent: Bytespider
Disallow: /private/
Allow: /
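To check that these rules behave as intended before deploying them, you can feed them to Python's standard-library `urllib.robotparser`. This is a sketch: `example.com` and the two paths are placeholders, and the result reflects how Python's parser reads the file, not a guarantee of how Bytespider itself interprets it. Python's parser applies the first matching rule in file order, while some crawlers use the longest matching path instead; for this particular file both approaches give the same answer.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the example above, one directive per line.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URLs; only the path part is compared against the rules.
for url in ("https://example.com/private/report.html",
            "https://example.com/blog/post"):
    print(url, parser.can_fetch("Bytespider", url))
# https://example.com/private/report.html False
# https://example.com/blog/post True
```

Agents not named in the file (for example a hypothetical "ExampleBot") are allowed everywhere, because the file has no `User-agent: *` group.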
How to Manage Bytespider
Consider geopolitical and data policy implications before allowing access.
Can crawl aggressively; use Switch to monitor crawl rates and patterns.
Block in robots.txt if you don't want ByteDance to use your content for AI training.
Separate from the TikTok social crawler; manage the two independently.
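Independently of any monitoring product, a rough crawl-rate check can be run on raw access logs. The sketch below counts Bytespider-tagged requests per minute, assuming the "combined" log format (timestamp in square brackets, user-agent as the last quoted field). The function name and sample lines are hypothetical, and what counts as "too fast" is up to you.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: combined log format, e.g. [10/Oct/2024:13:55:01 +0000] ... "user-agent"
TIMESTAMP = re.compile(r"\[([^\]]+)\]")
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')
BYTESPIDER_TOKENS = re.compile(r"bytespider|bytedance", re.IGNORECASE)


def bytespider_requests_per_minute(log_lines):
    """Count Bytespider-tagged requests, keyed by the minute they arrived in."""
    counts = Counter()
    for line in log_lines:
        ua = LAST_QUOTED_FIELD.search(line)
        ts = TIMESTAMP.search(line)
        # Skip lines without a timestamp/user-agent or from other clients.
        if not ua or not ts or not BYTESPIDER_TOKENS.search(ua.group(1)):
            continue
        when = datetime.strptime(ts.group(1), "%d/%b/%Y:%H:%M:%S %z")
        counts[when.replace(second=0)] += 1  # truncate to the minute
    return counts


# Hypothetical sample lines for illustration only.
logs = [
    '203.0.113.7 - - [10/Oct/2024:13:55:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Bytespider"',
    '203.0.113.7 - - [10/Oct/2024:13:55:40 +0000] "GET /b HTTP/1.1" 200 512 "-" "Bytespider"',
    '198.51.100.2 - - [10/Oct/2024:13:55:42 +0000] "GET /c HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.7 - - [10/Oct/2024:13:56:05 +0000] "GET /d HTTP/1.1" 200 512 "-" "Bytespider"',
]
for minute, n in sorted(bytespider_requests_per_minute(logs).items()):
    print(minute.strftime("%H:%M"), n)
# 13:55 2
# 13:56 1
```

Minutes whose counts stay well above your normal traffic are a signal to tighten robots.txt rules or rate limits for this agent.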
Start managing Bytespider today
Switch detects, tracks, and lets you build custom journeys for Bytespider and 35+ other AI agents and crawlers. Set up in five minutes.
Related Agents
TikTok
Social Crawlers | ByteDance
ByteDance's crawler for TikTok link previews and embed generation.
AI2Bot
Commercial Crawlers | Allen AI
Allen Institute for AI's research crawler for academic AI development.
Amazonbot
Commercial Crawlers | Amazon
Amazon's web crawler powering Alexa, Amazon search, and AI services.
Applebot-Extended
Commercial Crawlers | Apple
Apple's AI training token controlling how Applebot data is used for Apple Intelligence.
CCBot
Commercial Crawlers | Common Crawl
Common Crawl's crawler, which builds the open web archive used by multiple AI companies for training.
ClaudeBot
Commercial Crawlers | Anthropic
Anthropic's web crawler collecting training data for Claude models.