
Robots.txt blocks — Contents

  • Important information
  • The robots.txt settings
    • Reasons to block AI crawlers
      • List of AI crawlers blocked
    • Reasons to block SEO crawlers
      • List of SEO crawlers blocked
    • Adjusting the default blocklist

Robots.txt blocks

Published on November 17, 2024
Revised on November 20, 2024

Robots.txt is a standard that search engine crawlers and other bots use to determine which pages they are blocked from accessing.
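
To illustrate the format, a robots.txt file is a plain-text file served from your site’s root (for example, https://example.com/robots.txt) that pairs User-agent lines with Allow and Disallow rules. Below is a minimal, generic sketch of the syntax, not output from TSF:

# Ask all crawlers to keep out of the /private/ directory.
User-agent: *
Disallow: /private/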

Important information

Robots.txt is not a security measure, and it can’t be used to prevent access to your website. Bots can choose to ignore robots.txt, and some bots don’t even look for it at all.

Moreover, even if robots.txt blocks a page from being crawled, search engines may still index its URL when another site links to it. Therefore, rely on the noindex directive instead. The SEO Framework provides toggles for this at “SEO Settings → Robots Settings → Indexing,” and you can tune them independently for each page.
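
For reference, the noindex directive typically reaches crawlers as a robots meta tag in the page’s head. A minimal illustration of the tag itself follows; TSF outputs it for you when you use the toggles above:

<meta name="robots" content="noindex">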

Lastly, never use robots.txt to hide sensitive information. If you don’t want something to be public, either keep it off the internet or protect it with proper authentication and authorization mechanisms. You can ask your hosting provider for help implementing these.

The robots.txt settings

In TSF v5.1, we introduced Robots.txt Settings. You can find them at “SEO Settings → Robots Settings → Robots.txt.”

TSF does not target any specific bots by default. However, we provide two lists of bots that you can block; you can find both lists below.

These settings are disabled by default because we believe these bots can provide value. However, we understand that some users may want to block them from crawling their websites to save resources or to prevent scraping.

Reasons to block AI crawlers

AI crawlers are bots that scrape your content to train their AI models. These crawlers aren’t used for indexing your pages on search engines, but blocking them may prevent AI chatbots from referencing your website.

If you write creatively, consider blocking these bots to prevent them from learning your writing style and using it to generate content for others. Still, your creative writing can indirectly bring joy to others through the AI models. We cannot solve this dilemma, but we can provide a checkbox.

However, if you want to share facts with the world, consider allowing these bots to crawl your website to help them train and provide accurate information. This way, everyone can benefit from your knowledge, even though you may not be credited for it.

List of AI crawlers blocked

The list below shows what the latest version of The SEO Framework blocks when blocking AI crawlers. It isn’t exhaustive because not all bots look for robots.txt. We’ll update the list as we learn more about bots that fit this category; a sketch of the resulting robots.txt rules follows the list.

  • Amazonbot by Amazon
  • Applebot-Extended by Apple
  • CCBot by Common Crawl
  • ClaudeBot by Anthropic
  • GPTBot by OpenAI
  • Google-Extended by Google
  • GoogleOther by Google
  • Meta-ExternalAgent by Meta
  • FacebookBot by Meta
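
As an illustration of what such blocking looks like in the file itself, a robots.txt group along these lines would apply to agents from the list above. This is a generic sketch of robots.txt semantics; the exact file TSF generates may differ:

# Illustrative only: block a few of the AI user agents listed above.
User-agent: GPTBot
User-agent: CCBot
User-agent: ClaudeBot
Disallow: /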

Reasons to block SEO crawlers

SEO crawler bots work by analyzing your website’s structure, content, and backlinks to provide SEO insights. These bots are not necessary for indexing your website; you can safely block them without affecting your ranking.

Notably, these bots can provide anyone with insights into your website’s SEO performance, which can be used to compete against you.

If you don’t use any of the SEO services these bots facilitate, you should consider blocking them to prevent them from scraping your content. This way, you can save resources and prevent your content from being used to provide SEO insights to others.

We would have blocked SEO crawlers by default if doing so didn’t impact the historical data that users onboarding with these SEO services might find useful.

List of SEO crawlers blocked

The list below shows what the latest version of The SEO Framework blocks when blocking SEO crawlers. This list isn’t exhaustive because not all bots look for robots.txt. We’ll update the list as we learn more about bots that fit this category.

  • AhrefsBot by Ahrefs
  • AhrefsSiteAudit by Ahrefs
  • barkrowler by Babbar
  • DataForSeoBot by DataForSEO
  • dotbot by Moz
  • rogerbot by Moz
  • SemrushBot by SEMrush
  • SiteAuditBot by SEMrush
  • SemrushBot-BA by SEMrush

Adjusting the default blocklist

TSF provides a filter to adjust the blocklist. You can use this filter to remove bots from the blocklist or add new bots to it.

The filter is called the_seo_framework_robots_blocked_user_agents. It receives two arguments: the blocklist and the type of bots being blocked ('ai' or 'seo'). The blocklist is an associative array where the key is the bot’s user agent name, and the value is an array with the bot’s information.

Below is an example snippet where we remove two bots from each blocklist and add one hypothetical example bot to each list.

add_filter(
	'the_seo_framework_robots_blocked_user_agents',
	function ( $agents, $type ) {

		switch ( $type ) {
			case 'ai':
				// Remove Amazonbot and Applebot-Extended from the AI blocklist,
				// allowing them to crawl your site again, while still blocking others.
				unset( $agents['Amazonbot'], $agents['Applebot-Extended'] );

				// Add "ExampleBot" to the AI agent blocklist.
				$agents['ExampleBot'] = [
					'by'   => 'Example AI',
					'link' => 'https://example.com/aibot',
				];
				break;
			case 'seo':
				// Remove AhrefsBot and AhrefsSiteAudit from the SEO blocklist,
				// allowing them to crawl your site again, while still blocking others.
				unset( $agents['AhrefsBot'], $agents['AhrefsSiteAudit'] );

				// Add "ExampleBot" to the SEO agent blocklist.
				$agents['ExampleBot'] = [
					'by'   => 'Example SEO',
					'link' => 'https://example.com/seobot',
				];
				break;
		}

		return $agents;
	},
	10,
	2,
);
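
You can place a snippet like this in your child theme’s functions.php file or in a small custom plugin. To verify the result, visit your site’s /robots.txt afterward; WordPress serves this file dynamically unless a physical robots.txt file exists in your site root.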

If you want to learn more about filters, you can read our guide on using filters.
