As AI products such as ChatGPT, Gemini, and Claude expand, the companies behind them rely on massive amounts of web data for training. If you want to protect your intellectual property, blocking their crawlers is a critical step.
In this guide, we'll show you how to identify common AI crawlers and implement effective blocking strategies using robots.txt, .htaccess, and Cloudflare.
Why Block AI Bots?
While search engine bots (like Googlebot) index your content to drive traffic, AI bots often harvest data to train models that might eventually compete with your site. Reasons to block include:
- Intellectual Property Protection: Keep your original research and writing out of AI training datasets.
- Resource Conservation: Reduce the server load caused by aggressive crawling.
- Revenue Protection: Keep readers from bypassing your premium content through AI-generated summaries.
1. Blocking via robots.txt
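The standard way to "politely" ask bots not to crawl your site is the robots.txt file at the root of your domain. Compliance is voluntary: reputable crawlers honor it, but it cannot technically stop a bot that chooses to ignore it. Blocking Google-Extended does not remove your site from Google Search; it only tells Google not to use your content to train and ground its Gemini models. Here is a recommended blocklist for 2026: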
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: CCBot
Disallow: /
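You can check how a common robots.txt parser reads this file with Python's built-in `urllib.robotparser` module. The sketch below parses the rules above and asks whether each user-agent token may fetch a sample page. The URL `https://example.com/articles/` and the `check_access` helper are placeholders for illustration. `RobotFileParser` matches on the product token (for example `GPTBot`), so pass tokens rather than full User-Agent header strings.

```python
from urllib.robotparser import RobotFileParser

# The blocklist from the robots.txt section above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: CCBot
Disallow: /
"""


def check_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Hypothetical helper: map each user-agent token to whether it may fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    tokens = ["GPTBot", "Google-Extended", "ClaudeBot", "PerplexityBot", "CCBot", "Googlebot"]
    results = check_access(ROBOTS_TXT, tokens, "https://example.com/articles/")
    for agent, allowed in results.items():
        print(f"{agent:15} {'allowed' if allowed else 'blocked'}")
```

Running it should report the five AI tokens as blocked and Googlebot as allowed: no group in the file applies to Googlebot, and without a `User-agent: *` group the parser defaults to allowing access.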
2. Server-Level Blocking (.htaccess)
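For stricter enforcement, you can block bots by their User-Agent string directly at the server level. On Apache with mod_rewrite enabled, the rules below answer matching requests with 403 Forbidden, so the bot never receives your content. Keep in mind that the User-Agent string is self-reported and easy to spoof, so this stops crawlers that identify themselves but ignore robots.txt, not scrapers that disguise themselves as browsers. Google-Extended is left out of the pattern because it is only a robots.txt token: Google crawls with its regular Googlebot user agent, so that string never appears in request headers.
# Return 403 Forbidden to known AI crawler user agents (requires mod_rewrite)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ChatGPT|ClaudeBot|PerplexityBot|CCBot) [NC]
RewriteRule .* - [F,L]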
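To sanity-check a pattern like this before deploying it, the Python sketch below applies the same kind of test Apache does for a RewriteCond regex with the [NC] flag: an unanchored, case-insensitive search over the User-Agent header. The pattern, the `is_blocked` helper, and the sample user-agent strings are illustrative assumptions, not exact production crawler strings.

```python
import re

# Hypothetical pattern modeled on the RewriteCond line. re.search is unanchored,
# so "ChatGPT" also matches "ChatGPT-User"; IGNORECASE mirrors Apache's [NC] flag.
AI_BOT_PATTERN = re.compile(r"(GPTBot|ChatGPT|ClaudeBot|PerplexityBot|CCBot)", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the rule would answer this User-Agent with 403 Forbidden."""
    return AI_BOT_PATTERN.search(user_agent) is not None


# Illustrative, shortened User-Agent strings mapped to the expected outcome.
SAMPLES = {
    "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)": True,
    "Mozilla/5.0 (compatible; ChatGPT-User/1.0)": True,
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)": True,
    "ccbot/2.0": True,  # lowercase still matches because of IGNORECASE
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)": False,
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/128.0": False,
}

if __name__ == "__main__":
    for ua, expected in SAMPLES.items():
        result = is_blocked(ua)
        assert result == expected, ua
        print(f"{'403' if result else 'ok '}  {ua}")
```

Because the match is a substring search, keep each alternative specific enough that it cannot appear inside an ordinary browser or search-engine User-Agent string.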
3. Cloudflare AI Crawl Control
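If you use Cloudflare, you can block all known AI bots with a single toggle. In the dashboard, go to Security > Bots and enable AI crawler blocking (Cloudflare has renamed and moved this control over time, so the exact label and location may differ in your account). For most users this is the easiest method, and because Cloudflare can identify bots by signals beyond the User-Agent string, it is harder to evade than the methods above.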