Glossary

Key terms and concepts related to AI agents, web crawlers, bot management, and the agentic web — explained for site managers and developers.

Agent Fingerprinting

Identifying AI agents through a combination of technical signals, such as request headers, IP ranges, and TLS characteristics, rather than the user-agent string alone.
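
A minimal sketch of how several signals might be combined into a rough likelihood score; the signals, weights, and header checks here are hypothetical, not a production detector:

```python
# Hypothetical fingerprinting sketch: combine several request signals
# into a crude agent-likelihood score instead of trusting the
# user-agent string alone. Weights are illustrative.

def fingerprint_score(headers: dict, ip_in_known_bot_range: bool) -> int:
    """Return a rough score; higher means more likely an automated agent."""
    score = 0
    ua = headers.get("User-Agent", "").lower()
    if any(token in ua for token in ("bot", "crawler", "gptbot")):
        score += 2  # self-declared automation in the user-agent
    if "Accept-Language" not in headers:
        score += 1  # real browsers almost always send this header
    if ip_in_known_bot_range:
        score += 3  # request IP falls in a range the bot operator publishes
    return score

# A request declaring itself GPTBot, from a published IP range:
print(fingerprint_score({"User-Agent": "GPTBot/1.0"}, ip_in_known_bot_range=True))
```

In practice, sites weight signals like these and act on the combined score rather than on any single check.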

Agentic Web

The emerging paradigm where AI agents autonomously browse, interact with, and transact on websites.

AI Search Engine

A search platform that uses AI to generate direct answers with citations instead of traditional link results.

AI Training Crawler

A web crawler that collects web content used to train AI systems such as large language models.

Bot Detection

Techniques for distinguishing automated visitors from human users on a website.
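
One common technique, reverse-DNS verification, confirms that a visitor claiming to be a known crawler really originates from that operator's network. A stdlib-only sketch (the suffixes shown are ones Google documents for Googlebot; everything else is illustrative):

```python
import socket

def claims_to_be_googlebot(user_agent: str) -> bool:
    """The user-agent is self-reported, so a claim alone proves nothing."""
    return "googlebot" in user_agent.lower()

def verify_reverse_dns(ip: str,
                       expected_suffixes=(".googlebot.com", ".google.com")) -> bool:
    """Resolve ip -> hostname, check its suffix, then confirm hostname -> ip."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)          # reverse lookup
        if not host.endswith(expected_suffixes):
            return False
        return ip in socket.gethostbyname_ex(host)[2]  # forward-confirm
    except OSError:
        return False
```

The forward-confirmation step matters: anyone can point reverse DNS for their own IP at a lookalike hostname, but they cannot make Google's DNS resolve that hostname back to their IP.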

Bot Management

The practice of detecting, classifying, and controlling automated traffic on a website.

Browser Agent

An AI system that controls a real web browser to browse, interact with, and complete tasks on websites.

Content Gate

A technique that prevents automated scripts from accessing page content by requiring JavaScript execution.
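
Illustratively, a gated page might ship an empty shell and fill it in client-side, so fetchers that never execute JavaScript see nothing (markup and names are placeholders):

```html
<!-- The page ships with no content in the HTML itself. -->
<div id="article"></div>
<script>
  // A script-blind scraper never runs this, so #article stays empty for it.
  document.getElementById("article").textContent = "Full article text…";
</script>
```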

Crawl Budget

The number of pages a search engine will crawl on your site within a given time period.

llms.txt

A proposed standard file (analogous to robots.txt): a markdown file at a site's root that gives AI language models a concise site summary and links to key content.
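
An illustrative /llms.txt, following the proposal's markdown shape of a title, a short summary, and link sections (all names and URLs are placeholders):

```markdown
# Example Site

> A short summary of what this site offers and who it serves.

## Docs

- [Getting started](https://example.com/docs/start): setup guide
- [API reference](https://example.com/docs/api): endpoint details
```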

robots.txt

A text file at a website's root that tells crawlers which pages they can and cannot access.
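
An illustrative robots.txt (GPTBot is a real crawler token; the paths are placeholders):

```text
# Block one AI crawler entirely:
User-agent: GPTBot
Disallow: /

# Allow everyone else, except an admin area:
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml
```

Note that robots.txt is advisory: well-behaved crawlers honor it, but nothing technically enforces it.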

Structured Data

Machine-readable markup (like JSON-LD) that helps search engines and AI agents understand page content.
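
For example, a page might embed an Article description as JSON-LD in its HTML head (all values illustrative):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example headline",
  "datePublished": "2024-01-15",
  "author": { "@type": "Person", "name": "Jane Doe" }
}
</script>
```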

User-Agent String

An HTTP header that identifies the software making a web request, such as a browser or crawler.
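
For example, a browser and a declared AI crawler might send headers like these (values illustrative of the common format):

```text
# A desktop browser:
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36

# A declared AI crawler, with a URL identifying its operator:
User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.0; +https://openai.com/gptbot
```

Because the header is self-reported, it is trivially spoofed, which is why identification usually combines it with other signals.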

Web Crawler

An automated program that systematically browses the web to discover and index content.
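
The core loop can be sketched in a few lines; the "web" here is an in-memory stub standing in for real HTTP fetches and link extraction:

```python
from collections import deque

# Stub web: each page maps to the links it contains.
FAKE_WEB = {
    "/": ["/a", "/b"],
    "/a": ["/b"],
    "/b": [],
}

def crawl(start: str, web=FAKE_WEB) -> list:
    """Breadth-first crawl: frontier queue + visited set."""
    visited, order = set(), []
    frontier = deque([start])
    while frontier:
        page = frontier.popleft()
        if page in visited:
            continue                        # never fetch a page twice
        visited.add(page)
        order.append(page)                  # "index" the page
        frontier.extend(web.get(page, []))  # discover its outgoing links
    return order

print(crawl("/"))  # → ['/', '/a', '/b']
```

Real crawlers add politeness delays, robots.txt checks, and crawl-budget limits around this same frontier/visited structure.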

Web Scraping

The automated extraction of data from websites, typically at scale.
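
A minimal stdlib-only sketch of the extraction step; real scrapers typically use parsers like BeautifulSoup or lxml, and the markup and class names here are made up:

```python
from html.parser import HTMLParser

class PriceExtractor(HTMLParser):
    """Collect the text of every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())
            self._in_price = False

p = PriceExtractor()
p.feed('<span class="price">$9.99</span><span class="name">Widget</span>')
print(p.prices)  # → ['$9.99']
```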