AI crawlers operated by companies behind ChatGPT, Gemini, Perplexity and other tools collect data from websites to train their AI models. While this can help improve AI systems, you might not want your website’s content used without permission.

In this guide, you’ll learn how to block AI bots from crawling your website easily and safely.

    What Are AI Bots?

    AI bots are automated crawlers that visit websites to gather data. Unlike search engine bots such as Googlebot or Bingbot (which index pages to show in search results), AI bots typically collect data to train artificial intelligence models.

    Common AI Crawlers

    Here are a few well-known AI bots you might want to block:

    Bot Name – Company / Purpose
    GPTBot – Used by OpenAI (ChatGPT, GPT models)
    CCBot – Used by Common Crawl (large-scale web data collection)
    Anthropic-ai – Used by Anthropic (Claude AI)
    Google-Extended – Used by Google to collect data for Gemini
    FacebookBot / Meta-ExternalAgent – Used by Meta for AI research
    Amazonbot – Used by Amazon for AI and data indexing

    Why You Might Want to Block AI Bots

    Here are some reasons website owners choose to block them:

    • ❌ You don’t want your content used to train AI models
    • 🔒 You want to protect your original articles and images
    • 💰 You’re running a membership or paid content site
    • 📉 You want to control how your content appears online

    Blocking AI bots gives you more control over your website’s data.

    How to Block AI Bots Using robots.txt

    The easiest way to block AI bots is by editing your website’s robots.txt file.

    📂 What is robots.txt?

    It’s a small text file in your website’s root folder (e.g., yourwebsite.com/robots.txt) that tells bots what they can or cannot access.

    Open your robots.txt file and add the following lines:

    User-agent: GPTBot
    Disallow: /
    
    User-agent: ChatGPT-User
    Disallow: /
    
    User-agent: Google-Extended
    Disallow: /
    
    User-agent: Gemini
    Disallow: /
    
    User-agent: PerplexityBot
    Disallow: /
    
    User-agent: ClaudeBot
    Disallow: /
    
    User-agent: AnthropicAI
    Disallow: /
    
    User-agent: CCBot
    Disallow: /
    
    User-agent: XaiCrawler
    Disallow: /
    
    User-agent: Copilot
    Disallow: /
    
    User-agent: Meta-ExternalAgent
    Disallow: /
    
    User-agent: Amazonbot
    Disallow: /
    
    User-agent: Applebot-Extended
    Disallow: /

    If you’re using:

    • WordPress – Use an SEO plugin like Rank Math, Yoast SEO, or All in One SEO to edit robots.txt.
    • Blogger – Go to Settings → Crawlers and indexing → Enable custom robots.txt, then paste the code above.
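    Before relying on the rules above, you can sanity-check them with Python’s standard-library robots.txt parser. This is a minimal sketch: the rules string below is a shortened placeholder, and example.com stands in for your own domain.

    ```python
    from urllib.robotparser import RobotFileParser

    # A snippet of the rules added above (placeholder, not your live file)
    rules = """User-agent: GPTBot
    Disallow: /
    """

    parser = RobotFileParser()
    parser.parse(rules.splitlines())

    # GPTBot is disallowed everywhere; agents without a matching rule are allowed
    print(parser.can_fetch("GPTBot", "https://example.com/post"))     # False
    print(parser.can_fetch("Googlebot", "https://example.com/post"))  # True
    ```

    To check your live file instead, call set_url("https://yourwebsite.com/robots.txt") followed by read() on the parser.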

    ⚠️ Important Notes

    Not all bots respect robots.txt — some may still crawl your site. To fully protect your data, you may need server-level blocks using .htaccess or a firewall rule.

    Server-Level Blocking with .htaccess

    If you’re using an Apache server, you can block bots at the server level. Add this code to your .htaccess file:

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (GPTBot|ChatGPT|Google-Extended|Gemini|PerplexityBot|ClaudeBot|AnthropicAI|CCBot|XaiCrawler|Copilot) [NC]
    RewriteRule .* - [F,L]

    This denies access to these bots completely — even if they ignore robots.txt.
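    The [NC] flag makes the match case-insensitive, and the pattern matches anywhere in the User-Agent header. You can reproduce the same check in Python to see which user-agent strings the rule would catch (a sketch for illustration only; on a live site the matching is done by Apache’s mod_rewrite, not this script):

    ```python
    import re

    # Same alternation as the RewriteCond above, case-insensitive like [NC]
    blocked = re.compile(
        r"GPTBot|ChatGPT|Google-Extended|Gemini|PerplexityBot|ClaudeBot|"
        r"AnthropicAI|CCBot|XaiCrawler|Copilot",
        re.IGNORECASE,
    )

    def is_blocked(user_agent: str) -> bool:
        """Return True if this User-Agent string would be denied by the rule."""
        return blocked.search(user_agent) is not None

    print(is_blocked("Mozilla/5.0 (compatible; GPTBot/1.0)"))     # True
    print(is_blocked("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # False
    ```

    Note that ordinary search crawlers such as Googlebot are not caught, since the pattern requires the full token Google-Extended.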

    Block AI Bots Using Cloudflare Firewall Rules

    You can also block AI bots through Cloudflare. In your Cloudflare dashboard, open AI Crawl Control and block the bots you want. Make sure search engine crawlers remain allowed, since they are essential for indexing your website.

    Final Thoughts

    Blocking AI bots is an easy way to protect your website’s content from being used without permission. By updating your robots.txt or .htaccess, you can control which bots can crawl your site.


    Frequently Asked Questions (FAQ)

    What is the easiest way to block AI bots from crawling my website?

    The easiest way is to add Disallow rules for known AI user agents like GPTBot and Google-Extended in your robots.txt file.

    Does blocking AI bots in robots.txt guarantee they won't crawl my site?

    No, robots.txt is a polite request. While major players like OpenAI and Google respect it, malicious or improperly configured bots may ignore it. For stricter enforcement, use server-level blocking via .htaccess or a web application firewall (WAF).

    Should I block all AI bots?

    It depends on your goals. If you want to protect proprietary content, blocking them is smart. However, blocking them might prevent your site from being cited as a source in AI-generated search summaries (like Google SGE).


    Written by Abhishek Dey Roy

    Abhishek Dey Roy is an SEO Consultant & Digital Strategist helping businesses scale online. He specializes in technical SEO, content strategy, and web performance optimization.
