Robots.txt is a standard that search engine crawlers and other well-behaved bots consult to determine which parts of your website they should not crawl.
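For example, a minimal robots.txt rule set that asks one (hypothetical) bot to stay away from the entire site, while leaving all other bots unrestricted, could look like this:

User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: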
Important information
Robots.txt is not a security measure, and it can’t be used to prevent access to your website. Bots can choose to ignore robots.txt, and some bots don’t even look for it at all.
Moreover, a page blocked via robots.txt can still end up indexed by search engines when another site links to it. If you want to keep a page out of search results, rely on the noindex directive instead. The SEO Framework provides toggles for this at “SEO Settings → Robots Settings → Indexing,” and you can tune it independently for each page.
Lastly, you should never use robots.txt to hide sensitive information. If you don’t want something to be public, either keep it off the internet or protect it with proper authentication and authorization. You can ask your hosting provider for help implementing these.
The robots.txt settings
In TSF v5.1, we introduced Robots.txt Settings. You can find them at “SEO Settings → Robots Settings → Robots.txt.”
TSF does not target any specific bots by default. However, we provide two lists of bots that you can choose to block: AI crawlers and SEO crawlers. You can find both lists below.
These settings are disabled by default because we believe these bots can provide value. However, we understand that some users may want to block them from crawling their websites to save resources or to prevent scraping.
Reasons to block AI crawlers
AI crawlers are bots that scrape your content to train their AI models. These crawlers aren’t used for indexing your pages on search engines, but blocking these crawlers may prevent AI chatbots from referencing your website.
If you write creatively, consider blocking these bots to prevent them from learning your writing style and using it to generate content for others. Still, your creative writing can indirectly bring joy to others through the AI models. We cannot solve this dilemma, but we can provide a checkbox.
However, if you want to share facts with the world, consider allowing these bots to crawl your website to help them train and provide accurate information. This way, everyone can benefit from your knowledge, even though you may not be credited for it.
List of AI crawlers blocked
The list below shows what the latest version of The SEO Framework blocks when blocking AI crawlers. It isn’t exhaustive because not all bots look for robots.txt. We’ll update the list as we learn more about bots that fit this category.
- Amazonbot by Amazon
- Applebot-Extended by Apple
- CCBot by Common Crawl
- ClaudeBot by Anthropic
- GPTBot by OpenAI
- Google-Extended by Google
- GoogleOther by Google
- Meta-ExternalAgent by Meta
- FacebookBot by Meta
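The exact directives depend on your TSF version and any other robots.txt rules active on your site, but enabling the AI-crawler block generally adds one entry per listed user agent, along these lines (only two of the bots shown):

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# …one block per blocked user agent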
Reasons to block SEO crawlers
SEO crawler bots work by analyzing your website’s structure, content, and backlinks to provide SEO insights. These bots are not necessary for indexing your website; you can safely block them without affecting your ranking.
Notably, these bots can provide anyone with insights into your website’s SEO performance, which can be used to compete against you.
If you don’t use any of the SEO services these bots facilitate, you should consider blocking them to prevent them from scraping your content. This way, you can save resources and prevent your content from being used to provide SEO insights to others.
We would have blocked SEO crawlers by default, but doing so would affect the historical data that users onboarding with these SEO services may find useful.
List of SEO crawlers blocked
The list below shows what the latest version of The SEO Framework blocks when blocking SEO crawlers. This list isn’t exhaustive because not all bots look for robots.txt. We’ll update the list as we learn more about bots that fit this category.
- AhrefsBot by Ahrefs
- AhrefsSiteAudit by Ahrefs
- barkrowler by Babbar
- DataForSeoBot by DataForSEO
- dotbot by Moz
- rogerbot by Moz
- SemrushBot by SEMrush
- SiteAuditBot by SEMrush
- SemrushBot-BA by SEMrush
Adjusting the default blocklist
TSF provides a filter to adjust both blocklists. You can use this filter to remove bots from a blocklist or add new bots to it.
The filter is called the_seo_framework_robots_blocked_user_agents. It receives two arguments: the blocklist and the type of bots being blocked ('ai' or 'seo'). The blocklist is an associative array where the key is the bot’s user-agent name, and the value is an array with the bot’s information.
Below is an example snippet where we remove two bots from being affected by each blocklist and add one example bot to every list.
add_filter(
	'the_seo_framework_robots_blocked_user_agents',
	function ( $agents, $type ) {
		switch ( $type ) {
			case 'ai':
				// Remove Amazonbot and Applebot-Extended from the AI blocklist,
				// allowing them to crawl your site again, while still blocking others.
				unset( $agents['Amazonbot'], $agents['Applebot-Extended'] );

				// Add "ExampleBot" to the AI agent blocklist.
				$agents['ExampleBot'] = [
					'by'   => 'Example AI',
					'link' => 'https://example.com/aibot',
				];
				break;

			case 'seo':
				// Remove Ahrefs from the SEO blocklist,
				// allowing them to crawl your site again, while still blocking others.
				unset( $agents['AhrefsBot'], $agents['AhrefsSiteAudit'] );

				// Add "ExampleBot" to the SEO agent blocklist.
				$agents['ExampleBot'] = [
					'by'   => 'Example SEO',
					'link' => 'https://example.com/seobot',
				];
				break;
		}

		return $agents;
	},
	10,
	2,
);
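You can place this snippet in a small custom plugin or your (child) theme’s functions.php file; the filter runs whenever TSF builds its robots.txt output.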
If you want to learn more about filters, you can read our guide on using filters.