SEO

AI Bots Are Crawling Your Website Right Now: Which Ones Should You Actually Let In?

By Jeroen · 11 min read

Last Updated: March 2026

TL;DR: Five major AI bots are crawling your website right now. Google's AI crawler sends the most traffic back relative to crawl volume. PerplexityBot crawls heavily but delivers high-quality referral traffic. GPTBot is expensive to host but essential for ChatGPT visibility. ClaudeBot currently sends almost no traffic but is improving rapidly, so blocking it now is a long-term visibility risk. Meta-ExternalAgent sends zero traffic. Block it. For all others, allow by default and monitor.


Why AI Bots Are Different from Google's Crawler

Google's main crawler (Googlebot) crawls your site, indexes your pages, and returns traffic through organic search results. The relationship is direct and well-understood. AI bots operate differently.

AI crawlers like GPTBot and ClaudeBot collect content for training datasets and real-time retrieval. When a user asks ChatGPT or Perplexity a question, the AI system may cite your content and send a user to your site. But the ratio between how much a bot crawls versus how much traffic it actually refers back to you varies enormously between platforms. Some AI bots crawl hundreds of pages for every single referral visit they generate. One sends zero referral visits regardless of how much it crawls.

Understanding these crawl-to-refer ratios is the foundation of a rational AI bot strategy. Blocking a high-referral bot to save server resources is a bad trade. Allowing a zero-referral bot to crawl freely is a waste. The data from SEOmator's 2026 GEO Data Report makes the decision clear for each bot. This connects directly to your broader answer engine optimization strategy.

The Crawl-to-Refer Ratios for Every Major AI Bot

The crawl-to-refer ratio measures how many page crawls each bot performs for every one referral visit it sends back to your site. A ratio of 5:1 means the bot crawls 5 pages for every 1 visit it refers. A ratio of 23,951:1 means the bot crawls nearly 24,000 pages for every referral visit generated.
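
The math is simple division: pages crawled by a bot over a given period, divided by the referral visits that bot's platform sent back over the same period. Here is a minimal Python sketch of the calculation; the crawl and referral counts are hypothetical placeholders, not SEOmator's benchmark figures, and you would substitute numbers from your own server logs and analytics:

# Minimal sketch: compute crawl-to-refer ratios from your own counts.
# The numbers below are made-up placeholders; pull real figures from
# your server logs (crawls) and your analytics platform (referrals).
bot_stats = {
    # bot name: (pages crawled, referral visits), over the same period
    "GoogleOther": (500, 100),
    "PerplexityBot": (2200, 20),
    "GPTBot": (3800, 3),
    "Meta-ExternalAgent": (900, 0),
}

for bot, (crawls, referrals) in bot_stats.items():
    if referrals == 0:
        print(f"{bot}: {crawls} crawls, 0 referrals (no ratio to compute)")
    else:
        print(f"{bot}: roughly {round(crawls / referrals)}:1")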

| AI Bot | Platform | Crawl-to-Refer Ratio | Recommendation |
| --- | --- | --- | --- |
| GoogleOther | Google Gemini | 5:1 | Allow. Best ROI of any AI crawler. |
| PerplexityBot | Perplexity AI | 111:1 | Allow. High crawl cost, but referral quality is high. |
| GPTBot | ChatGPT (OpenAI) | 1,276:1 | Allow for now. ChatGPT visibility requires it. |
| ClaudeBot | Claude (Anthropic) | 23,951:1 | Allow. Ratio is improving. Blocking now risks future visibility. |
| Meta-ExternalAgent | Meta AI | No referral mechanism | Block. Zero traffic return. |

Source: SEOmator GEO Data Report 2026.

"The robots.txt decisions you make today will determine your AI search visibility for the next 2-3 years. Businesses that block AI crawlers now will be invisible when AI search becomes the primary discovery channel. The cost of allowing crawlers is minimal. The cost of blocking them is permanent."

Greg Sterling, Co-founder of Near Media
Different AI bots crawl your site at different rates. Every bot that sends referral traffic back is worth letting through your robots.txt without restriction.

GoogleOther: The AI Crawler Worth Fully Trusting

GoogleOther is Google's AI crawler, separate from the standard Googlebot. It feeds Google Gemini and AI Overviews. With a 5:1 crawl-to-refer ratio, it is the most efficient AI crawler by a significant margin. For every 5 pages GoogleOther crawls, your site gets one referral visit from a Google AI surface.

Blocking GoogleOther is one of the clearest self-inflicted visibility mistakes a small business can make. Some sites have added GoogleOther blocks in robots.txt based on confusion between AI training crawlers and standard search crawlers. Do not do this. GoogleOther is the AI crawler most directly tied to local business visibility in Google Search and Google Maps AI features.

A complete local SEO audit checks robots.txt for accidental blocks on GoogleOther as part of the technical review. These accidental blocks are more common than you would expect.

PerplexityBot and GPTBot: Allow Both, Watch Carefully

PerplexityBot's 111:1 crawl-to-refer ratio looks expensive, but the traffic quality justifies the cost. Perplexity users are actively researching before making a decision. A visit from a Perplexity citation is typically higher intent than an average organic search visit. BOL Agency's GEO and AEO research found that AI-referred traffic converts at higher rates than standard organic traffic for local service businesses, because the user has already received a recommendation before clicking through.

GPTBot's 1,276:1 ratio is harder to justify on pure traffic math. For every 1,276 pages GPTBot crawls, it sends one referral visit. But the alternative, blocking GPTBot, means your business does not appear in ChatGPT answers at all. ChatGPT has over 100 million active users. The potential visibility loss from blocking GPTBot almost certainly outweighs the server cost of allowing it, especially for a small business site that GPTBot will not crawl aggressively.

ClaudeBot: The Long Game

ClaudeBot's 23,951:1 ratio is the worst of any bot that still has a referral mechanism. Right now, allowing ClaudeBot costs significant crawl bandwidth for almost no return traffic. But the key phrase is "right now."

Anthropic's Claude is growing rapidly as a consumer AI product. The crawl-to-refer ratio for ClaudeBot is expected to improve substantially as Claude's web-connected features expand and user adoption increases. Blocking ClaudeBot today to save server resources means that when the ratio improves and Claude becomes a meaningful traffic source, your site's content will not be indexed and you will be invisible on that platform.

The cost of allowing ClaudeBot on a typical small business site is minimal. The cost of blocking it and losing future visibility on a growing AI platform is not. Allow it, monitor the ratio quarterly, and reassess if crawl volume becomes a resource problem.

Meta-ExternalAgent: Block It

Meta-ExternalAgent is the one unambiguous block. It crawls your site and sends zero referral traffic back. There is no Meta AI product that currently sends search referral traffic to websites. The bot collects data for Meta's internal AI training and research, and your site gets nothing in return.

Blocking Meta-ExternalAgent in your robots.txt has no downside. There is no Meta AI search surface that will recommend your business to users and send them to your site. Block it with confidence.

How to Update Your robots.txt for AI Bots

Updating robots.txt is a five-minute technical change with long-term consequences. Most website platforms let you edit robots.txt directly. For Astro sites deployed on Cloudflare Pages, the robots.txt file lives in the /public directory. The DEV Community's Astro SEO guide covers the specific setup for static site deployments.

Here is the robots.txt configuration based on the crawl-to-refer ratio data:

# Standard search crawlers
User-agent: Googlebot
Allow: /

# Google AI crawler (Gemini, AI Overviews): best AI referral ratio
User-agent: GoogleOther
Allow: /

# Perplexity: allow, high referral quality
User-agent: PerplexityBot
Allow: /

# ChatGPT: allow, essential for ChatGPT visibility
User-agent: GPTBot
Allow: /

# Claude (Anthropic): allow, ratio improving
User-agent: ClaudeBot
Allow: /

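# anthropic-ai: additional Anthropic user agent, keep aligned with ClaudeBot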
User-agent: anthropic-ai
Allow: /

# Meta: block, zero referral mechanism
User-agent: Meta-ExternalAgent
Disallow: /

This configuration allows all high-value AI crawlers while blocking the one bot with zero return value. Fibr AI's LLM content optimization guide recommends reviewing your robots.txt configuration quarterly as new AI platforms launch and existing platforms update their crawler user agent strings.
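
One way to sanity-check the deployed file is to run it through Python's built-in robots.txt parser, which answers the same allow-or-block question a crawler would. The sketch below assumes your site is at example.com; swap in your own domain. It flags any bot whose access does not match the recommendations above:

# Minimal sketch: verify a live robots.txt against the recommended bot policy.
# Replace example.com with your own domain before running.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()  # fetch and parse the live file

expected = {
    "GoogleOther": True,          # should be allowed
    "PerplexityBot": True,        # should be allowed
    "GPTBot": True,               # should be allowed
    "ClaudeBot": True,            # should be allowed
    "Meta-ExternalAgent": False,  # should be blocked
}

for agent, should_allow in expected.items():
    allowed = parser.can_fetch(agent, "https://example.com/")
    flag = "OK" if allowed == should_allow else "MISMATCH - check robots.txt"
    print(f"{agent}: {'allowed' if allowed else 'blocked'} ({flag})")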

Our website building service configures robots.txt correctly for AI crawler visibility from day one. If you already have a site and want to check whether your current configuration is blocking valuable AI crawlers, our local SEO audit includes a full technical review. Get your free audit.

How to Check Which AI Bots Are Crawling Your Site

Most business owners have no idea which AI bots are visiting their site or how often. Here's how to find out:

  1. Server access logs. If you have access to raw server logs (most hosting platforms provide this), search for user agent strings: GPTBot, PerplexityBot, ClaudeBot, anthropic-ai, GoogleOther, and meta-externalagent. This shows exactly which bots are crawling, how often, and which pages they visit most.
  2. Google Search Console. Under Settings, then Crawl Stats, Google now separates GoogleOther (AI crawler) from standard Googlebot. Check how many pages GoogleOther is crawling per day. If the number is zero, you may have an accidental robots.txt block.
  3. Cloudflare analytics. If your site is on Cloudflare (including Cloudflare Pages), the bot traffic analytics in the Security section show AI crawler activity with user agent breakdowns. This is the easiest method for sites already on Cloudflare.
  4. Third-party bot monitoring tools. Services like Vercel Analytics and some WordPress plugins now include AI bot traffic as a separate category. Check your analytics platform's documentation for AI crawler reporting features.

If you discover that no AI bots are crawling your site, the most common cause is a restrictive robots.txt or a server configuration that returns 403 errors to non-standard user agents. Both are fixable within minutes.
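
For the server-log method in step 1, a short script can do the counting for you. The Python sketch below assumes a standard access log (one request per line, with the user agent string included) saved as access.log; adjust the file path and the bot list for your own hosting setup:

# Minimal sketch: count AI crawler requests in a raw access log.
# Assumes each log line contains the request's user agent string.
from collections import Counter

AI_BOTS = ["GPTBot", "PerplexityBot", "ClaudeBot",
           "anthropic-ai", "GoogleOther", "meta-externalagent"]

hits = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        lowered = line.lower()
        for bot in AI_BOTS:
            if bot.lower() in lowered:
                hits[bot] += 1

for bot in AI_BOTS:
    print(f"{bot}: {hits[bot]} requests")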

The Business Impact: Why This Matters for Small Businesses

AI search is not a future trend. It's happening now. Over 100 million people use ChatGPT monthly. Perplexity is growing rapidly as a Google alternative. Google AI Overviews appear on a significant percentage of search results pages. When someone asks an AI "who is the best plumber in Oakland?" and your business doesn't appear, that's a lead you lost to a competitor who allowed their site to be crawled and indexed.

For small businesses, the stakes are proportionally higher than for large companies. A national brand with existing media coverage will appear in AI answers regardless of their robots.txt configuration. A local business with limited online presence needs every advantage. Allowing AI crawlers, publishing structured content, and maintaining a complete Google Business Profile are the three things that determine whether AI search sends you customers or sends them to competitors.

The businesses that configure their AI crawler access correctly today will compound that advantage over the next 2-3 years as AI search market share grows. The ones that block crawlers or ignore AI search entirely will face an increasingly expensive catch-up game. For more on how to position your business for AI search visibility, see our answer engine optimization services and our guide to ranking in AI search.

What to Do Next

Here's your action plan, in priority order:

  1. Check your robots.txt right now. Visit yourdomain.com/robots.txt in your browser. If you see any "Disallow" rules for GPTBot, GoogleOther, PerplexityBot, or ClaudeBot, remove them immediately.
  2. Update robots.txt with the configuration above. Copy the recommended configuration from this article and replace your current robots.txt. This takes less than 5 minutes on any platform.
  3. Monitor crawl activity monthly. Check server logs or Google Search Console to verify AI bots are actually crawling your site after the change.
  4. Get an SEO audit. Our 200+ factor audit checks robots.txt configuration, structured data, and AI search readiness as part of the technical review. Request your free audit to see where you stand.

Frequently Asked Questions

Should I block all AI bots to save server resources?

No. Blocking all AI bots eliminates your business from AI search results on ChatGPT, Perplexity, and Gemini. The server resource cost of AI crawlers is generally minor on a typical small business site. The only AI bot worth blocking unconditionally is Meta-ExternalAgent, which has no referral mechanism and sends no traffic back to your site regardless of crawl volume.

What happens if I block GPTBot? Will my business disappear from ChatGPT?

Blocking GPTBot prevents OpenAI from indexing your content for ChatGPT retrieval. Your business will not appear in ChatGPT answers for queries where your content would otherwise be cited. For most small businesses, the visibility risk of blocking GPTBot outweighs the server resource savings. ChatGPT has over 100 million active users. Allow GPTBot.

Does my robots.txt affect Google's regular search crawl?

Only if you add rules specifically for Googlebot. Adding rules for GPTBot, PerplexityBot, ClaudeBot, or Meta-ExternalAgent has no effect on Google's standard search crawler. Google's AI crawler (GoogleOther) is separate from Googlebot and can be managed independently. Be careful not to confuse them when editing robots.txt.

How do I check if AI bots are crawling my site?

Check your server access logs and filter by user agent strings: GPTBot, PerplexityBot, ClaudeBot, anthropic-ai, GoogleOther, and meta-externalagent. Google Search Console shows crawl data for GoogleOther separately from Googlebot under the crawl stats report. Most hosting platforms include server log access or have analytics plugins that surface bot traffic by user agent.

Will AI crawlers slow down my website?

For a typical small business site with 10-50 pages, AI crawler traffic is negligible. These bots typically crawl a few pages per day, not thousands. The performance impact on your server is effectively zero. Large sites with millions of pages may need to manage crawl rates, but that does not apply to local business websites.

How often should I review my robots.txt for AI crawlers?

Quarterly. New AI platforms launch regularly, and existing platforms update their crawler user agent strings. A quarterly check ensures you're allowing all valuable bots and blocking any new ones that offer no return. Set a calendar reminder to review robots.txt at the start of each quarter.



About the Author: Jeroen is the founder of Voxel Phase, an SEO and automation agency serving small businesses in the Bay Area. He specializes in technical SEO, AI search visibility, and building content systems that get cited by ChatGPT, Perplexity, and Gemini.

Our local SEO audit includes a full technical review of your robots.txt, schema markup, and AI crawler configuration. Serving businesses in San Francisco, Oakland, the Bay Area, San Jose, and Sacramento. Get your free audit.

AI bots · robots.txt · GPTBot · ClaudeBot · Perplexity · GEO · technical SEO · AI search
