Robots.txt is a standard that search engine crawlers and other well-behaved bots consult to determine which parts of your website they should not crawl.
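For example, a minimal robots.txt rule set that asks one (hypothetical) bot to stay away from the entire site, while leaving all other bots unrestricted, could look like this:

User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: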
Important information
Robots.txt is not a security measure, and it can’t be used to prevent access to your website. Bots can choose to ignore robots.txt, and some bots don’t even look for it at all.
Moreover, a page blocked via robots.txt can still end up indexed by search engines when another site links to it. If you want to keep a page out of search results, rely on the noindex directive instead. The SEO Framework provides toggles for this at “SEO Settings → Robots Settings → Indexing,” and you can tune it independently for each page.
Lastly, you should never use robots.txt to hide sensitive information. If you don’t want something to be public, either keep it off the internet or protect it with proper authentication and authorization. You can ask your hosting provider for help implementing these.
The robots.txt settings
In TSF v5.1, we introduced Robots.txt Settings. You can find them at “SEO Settings → Robots Settings → Robots.txt.”
TSF does not target any specific bots by default. However, we provide two lists of bots that you can choose to block: AI crawlers and SEO crawlers. You can find both lists below.
These settings are disabled by default because we believe these bots can provide value. However, we understand that some users may want to block them from crawling their websites to save resources or to prevent scraping.
Reasons to block AI crawlers
AI crawlers are bots that scrape your content to train their AI models. These crawlers aren’t used for indexing your pages on search engines, but blocking these crawlers may prevent AI chatbots from referencing your website.
If you write creatively, consider blocking these bots to prevent them from learning your writing style and using it to generate content for others. Still, your creative writing can indirectly bring joy to others through the AI models. We cannot solve this dilemma, but we can provide a checkbox.
However, if you want to share facts with the world, consider allowing these bots to crawl your website to help them train and provide accurate information. This way, everyone can benefit from your knowledge, even though you may not be credited for it.
List of AI crawlers blocked
The list below shows what the latest version of The SEO Framework blocks when blocking AI crawlers. It isn’t exhaustive because not all bots look for robots.txt. We’ll update the list as we learn more about bots that fit this category.
- Amazonbot by Amazon
- Applebot-Extended by Apple
- CCBot by Common Crawl
- ClaudeBot by Anthropic
- GPTBot by OpenAI
- Google-Extended by Google
- GoogleOther by Google
- Meta-ExternalAgent by Meta
- FacebookBot by Meta
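The exact directives depend on your TSF version and any other robots.txt rules active on your site, but enabling the AI-crawler block generally adds one entry per listed user agent, along these lines (only two of the bots shown):

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# …one block per blocked user agent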
Reasons to block SEO crawlers
SEO crawler bots work by analyzing your website’s structure, content, and backlinks to provide SEO insights. These bots are not necessary for indexing your website; you can safely block them without affecting your ranking.
Notably, these bots can provide anyone with insights into your website’s SEO performance, which can be used to compete against you.
If you don’t use any of the SEO services these bots facilitate, you should consider blocking them to prevent them from scraping your content. This way, you can save resources and prevent your content from being used to provide SEO insights to others.
We would have blocked SEO crawlers by default, but doing so would affect the historical data that users onboarding with these SEO services may find useful.
List of SEO crawlers blocked
The list below shows what the latest version of The SEO Framework blocks when blocking SEO crawlers. This list isn’t exhaustive because not all bots look for robots.txt. We’ll update the list as we learn more about bots that fit this category.
- AhrefsBot by Ahrefs
- AhrefsSiteAudit by Ahrefs
- barkrowler by Babbar
- DataForSeoBot by DataForSEO
- dotbot by Moz
- rogerbot by Moz
- SemrushBot by SEMrush
- SiteAuditBot by SEMrush
- SemrushBot-BA by SEMrush
Adjusting the default blocklist
TSF provides a filter to adjust both blocklists. You can use this filter to remove bots from a blocklist or add new bots to it.
The filter is called the_seo_framework_robots_blocked_user_agents. It receives two arguments: the blocklist and the type of bots being blocked ('ai' or 'seo'). The blocklist is an associative array where the key is the bot’s user-agent name, and the value is an array with the bot’s information.
Below is an example snippet where we remove two bots from being affected by each blocklist and add one example bot to every list.
add_filter(
	'the_seo_framework_robots_blocked_user_agents',
	function ( $agents, $type ) {
		switch ( $type ) {
			case 'ai':
				// Remove Amazonbot and Applebot-Extended from the AI blocklist,
				// allowing them to crawl your site again, while still blocking others.
				unset( $agents['Amazonbot'], $agents['Applebot-Extended'] );

				// Add "ExampleBot" to the AI agent blocklist.
				$agents['ExampleBot'] = [
					'by'   => 'Example AI',
					'link' => 'https://example.com/aibot',
				];
				break;

			case 'seo':
				// Remove Ahrefs from the SEO blocklist,
				// allowing them to crawl your site again, while still blocking others.
				unset( $agents['AhrefsBot'], $agents['AhrefsSiteAudit'] );

				// Add "ExampleBot" to the SEO agent blocklist.
				$agents['ExampleBot'] = [
					'by'   => 'Example SEO',
					'link' => 'https://example.com/seobot',
				];
				break;
		}

		return $agents;
	},
	10,
	2,
);
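You can place this snippet in a small custom plugin or your (child) theme’s functions.php file; the filter runs whenever TSF builds its robots.txt output.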
If you want to learn more about filters, you can read our guide on using filters.