A sitemap is an XML-formatted file that lists the links of your site, optionally with discrete information, like timestamps. It isn’t easily readable for humans, but robots, like Googlebot, can quickly digest it.
We can attach an XSL-formatted stylesheet to the sitemap so that it’s easily readable for human beings. However, robots do not make use of the stylesheet.
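To make the format concrete, here is a minimal sketch in Python that builds a sitemap with the standard library. The URLs and the build_sitemap helper are made up for illustration; the namespace is the one defined by the sitemaps.org protocol. This is not how TSF generates its sitemap, only a picture of what the resulting file contains.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML for an iterable of (url, lastmod or None) pairs."""
    # Serialize the sitemap namespace as the default namespace (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod is not None:
            # The timestamp is optional "discrete information".
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages on a hypothetical site.
xml_text = build_sitemap([
    ("https://example.com/", "2024-01-15T09:30:00+00:00"),
    ("https://example.com/contact/", None),
])
print(xml_text)
```

Each page becomes one url element with a loc child; the lastmod child only appears when a timestamp is known.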
Let’s get this out right away: A sitemap does not directly contribute to ranking; sitemaps only help search engines index pages more quickly. Most sites without a sitemap perform just as well in search as sites with one in the same sector. And, if you manage content via a dynamic CMS like WordPress, you probably do not need a sitemap at all.
Below, we’ll dive into the intricate, layman-friendly details so that you can make an informed decision on whether you should use a sitemap. After that, we’ll explain how our sitemap works with your WordPress website.
As always, we composed this article using reliable evergreen sources, such as Google’s documentation and Bing’s documentation, and we never rely on pseudo-science or hacks spouted by self-proclaimed SEO gurus.
In this article, we cover search engines that have a 98% combined market share.
Why should I use a sitemap?
You may want to consider using a sitemap if:
- Your site is new and search engines aren’t aware of its hierarchy yet.
- Your site is large and has over 500 pages.
- Your site has many old, deeply tucked away pages.
- Your site does not have a natural hierarchy.
- Your site publishes time-sensitive, expiring content, such as news.
You don’t need a sitemap if:
- Your site is small and well-established.
- Your site has proper internal linking, such as menus, archives, and breadcrumbs.
Some SEOs advocate using sitemaps for indexing images and videos. The sitemap is antiquated and limited in this regard, and we recommend moving forward using Schema.org structured data, instead.
Your site is new: discovering all pages
Search engine crawlers (also known as “spiders”) first need to discover your site before the search engine can index it. As they crawl over your pages, they’ll need to process each page to discover more links. Then, they’ll crawl those linked pages, process those pages, discover more links, and so forth.
So, when your site is new, you can help the crawler discover all important links instantly by handing them a sitemap. This way, most of your pages will get indexed quickly.
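The discovery loop described above can be sketched as a breadth-first walk over a site’s links. The link graph below is hypothetical, and real crawlers schedule their work very differently; the sketch only shows why deeply nested or unlinked pages take longer to find without a sitemap.

```python
from collections import deque


def discover_by_crawling(links, start):
    """Breadth-first link discovery; returns {url: number of hops from start}."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        # A crawler must fetch and process `page` before it sees these links.
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical site: each key links to the pages in its list.
site_links = {
    "/": ["/blog/", "/about/"],
    "/blog/": ["/blog/post-1/"],
    "/blog/post-1/": ["/blog/post-2/"],
    "/orphan/": [],  # exists, but nothing links to it
}

crawled = discover_by_crawling(site_links, "/")
print(sorted(crawled.items(), key=lambda item: item[1]))
print("/orphan/" in crawled)
```

In this example, /blog/post-2/ is only found after three rounds of fetching and processing, and /orphan/ is never found at all. A sitemap that lists every page hands the crawler all of these URLs in a single request.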
Your site is large: prioritized crawling
Search engines keep a record of all internal and external links of and to your website. The search engines assign a hidden priority to each page (depending on the number of backlinks and indexing state) and periodically crawl those pages accordingly. For instance, your homepage will be crawled more often than your contact page.
As your site grows larger, search engines won’t necessarily crawl your pages at a proportionally increased rate. So, when you add new pages, it might take longer for the crawler to discover those via internal linking. This is where a sitemap can truly help.
Larger sites often come with larger sitemaps. However, keep in mind that a bigger sitemap takes more time for search engines to parse. It’s better to have fewer items in the sitemap when you plan on posting often. For the same reason, it’s also better to have fewer sitemaps — with The SEO Framework, you’ll get two sitemaps at most: a standard “base” sitemap and a Google News sitemap.
There’s no harm in leaving pages out of the sitemap: once a search engine is aware of a page’s existence, it’ll keep the page indexed (when indexable) and will always crawl it periodically.
Your site is old: maintaining intent
When your website has old, forgotten, isolated content with few backlinks, then search engines might hide that content from their search index. This is because modern search engines try to infer your intent.
Simply put, CMSs like WordPress can generate pages for you. Add /search/any+search+query to your site’s URL, et voilà, there’s a new page. Because most sites nowadays can create pages on-the-fly, there’s an infinite number of pages around the web, and not all of those are useful.
This is where intent comes in: Did you link to the page? How often do others link to that page? Is the link trustworthy? Is the content duplicated? Does your sitemap link to the page?
Your site is unnatural: substitute bad hierarchy
We covered above that search engines discover links automatically from your content. When your website does not link to pages via menus, breadcrumbs, archives, footers, sidebars, or via the content, then you may require a sitemap.
However, if a visitor can’t navigate your site easily, then why would a search engine? Do you want your business to rely solely on an overlord like Google? If you require a sitemap because there’s a lack of internal linking, then you’re doing it wrong or are otherwise being creative and are making the life of your visitors and yourself needlessly difficult.
Your site may also be unnatural because it’s dynamically rendered via JavaScript (animations, REST, etc.). This doesn’t mean it’s challenging to navigate for human beings, but it takes longer for search engines to process because they need to parse your site in an actual browser. A sitemap will then become an essential tool for indexing your pages quickly.
Your site publishes news: indexing ASAP
News publishers need to get their pages indexed as soon as possible. To help search engines, you can attach <lastmod> timestamp values to each link in the sitemap. Aptly, these timestamps indicate when pages were last modified. Search engine crawlers read these, and when a timestamp is more recent than what they logged before, they’ll recrawl the page more quickly.
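The comparison a crawler makes can be pictured with a few lines of Python. How search engines actually weigh <lastmod> is not public, so treat this as a sketch of the idea only; the needs_recrawl function and the timestamps are assumptions made for the example.

```python
from datetime import datetime


def needs_recrawl(sitemap_lastmod, last_logged):
    """True when the sitemap reports a change newer than what was logged before.

    Both arguments are ISO 8601 strings with a UTC offset, the format
    commonly used for <lastmod> values.
    """
    return datetime.fromisoformat(sitemap_lastmod) > datetime.fromisoformat(last_logged)


# The page changed after the crawler last logged it: recrawl sooner.
print(needs_recrawl("2024-03-02T08:00:00+00:00", "2024-03-01T12:00:00+00:00"))

# Nothing changed since the last visit: no hurry.
print(needs_recrawl("2024-03-01T12:00:00+00:00", "2024-03-01T12:00:00+00:00"))
```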
If your site is connected to Google News, then a Google News sitemap will help you tremendously. Google crawls this sitemap type and its content with a greater priority than anything else. To speed up parsing, Google wants you to limit the Google News sitemap to content published within the past two days, with at most 1 000 links. We provide a Google News sitemap via our Articles extension.
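The two limits mentioned above (two days, 1 000 links) amount to a simple filter. The sketch below is not the Articles extension’s code; the post dictionaries and the news_sitemap_entries function are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=2)  # only content from the past two days
MAX_LINKS = 1000             # at most 1 000 links


def news_sitemap_entries(posts, now):
    """Keep recent posts, newest first, capped at MAX_LINKS."""
    recent = [
        post for post in posts
        if timedelta(0) <= now - post["published"] <= MAX_AGE
    ]
    recent.sort(key=lambda post: post["published"], reverse=True)
    return recent[:MAX_LINKS]


now = datetime(2024, 3, 10, 12, 0, tzinfo=timezone.utc)
posts = [
    {"url": "https://example.com/stale/", "published": now - timedelta(days=3)},
    {"url": "https://example.com/fresh/", "published": now - timedelta(hours=5)},
]
print([post["url"] for post in news_sitemap_entries(posts, now)])
```

Only the five-hour-old post makes it into the list; the three-day-old one is left to the base sitemap and internal links.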
Do I need a sitemap with WordPress?
Talking about 99.99% of WordPress sites? No. WordPress does not benefit from sitemaps much. Let’s go over the benefits of the sitemap again, so that you can decide whether you need one.
Discovering all pages: When you start with WordPress, you won’t accumulate hundreds of pages instantly. You’d probably write at most one blog post a day (perhaps, you feel jazzy and publish an about page), which is something search engines can easily keep up with.
Prioritized crawling: WordPress makes it easy for search engines by spawning links to tag and category pages everywhere. Those help search engines find your latest pages and posts eventually, often within a few days. Nevertheless, if your site is over 500 pages large, a sitemap will probably help accelerate this process a bit.
Maintaining intent: If you use an (at least, our) SEO plugin and adequately categorize your pages and organize your menus so that even a human could navigate your website easily, then intent is kept, and you won’t need a sitemap.
Substitute bad hierarchy: If your site’s hierarchy is terrible, then you probably won’t need a sitemap because no one will visit anyway. However, if you use complex protocols that cause search engines to process your site slowly, then yes, you probably need a sitemap; otherwise, you still won’t need one.
Indexing ASAP: Only when your content is of utmost evanescent importance (news or the cure for disease X), then you’ll require fast indexing. Regardless, you’re greedy, so you probably will ignore everything else postulated on this page and want a sitemap anyway. If you’re managing a news publisher’s site, then you genuinely need a sitemap and will pardon our rudeness from the sentence before.
The sitemap of The SEO Framework
The base sitemap we bring in the TSF plugin is small and consists of one page only. We use one page for the reasons described above: Search engines easily and rapidly process it.
Only posts and pages are included in the sitemap of The SEO Framework; you won’t find archives, attachments, or author pages therein. We exclude those because most themes already properly provide internal linking, and because including them poses technical difficulties.
The sitemaps from TSF are virtually generated via PHP and aren’t stored on your website’s drive. This feature adds support for WordPress Multisite environments, where multiple WordPress websites can work from one directory — storing the sitemap on the drive would otherwise prevent outputting unique sitemaps for each site.
Multiple sitemaps
TSF’s sitemaps are dynamic and have unlimited sitemap endpoints, accessible from /sitemap.xml. You can add anything to the endpoints, like /sitemap.xmlanything.
We added this feature to automatically support any random translation plugin because that adds /es/ or ?lang=es to WordPress URLs. You can learn more about this in our KB article about translation plugin compatibility.
We also added the /sitemap_index.xml endpoint to support misconfigured hosts that cater to Yoast SEO. If you can access /sitemap.xml, your website is configured correctly in this regard, and you can ignore the /sitemap_index.xml one. We had to add this because Yoast still spreads misinformation about proper NGINX configurations. We expound on this in the NGINX support section.
Updating the sitemap
TSF caches the sitemap’s generated content in the database for 604 800 seconds (1 week). It refreshes whenever you:
- Update or publish any type of post or page;
- Update the permalink settings;
- Or update the SEO settings.
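The caching behavior described above can be sketched as a time-limited cache with explicit invalidation. TSF stores its cache in the WordPress database via PHP; the Python class below, including its name and methods, is only an illustration of the logic.

```python
import time

CACHE_TTL = 604_800  # one week, in seconds


class SitemapCache:
    """Keeps generated sitemap content until it expires or is invalidated."""

    def __init__(self, generate, clock=time.time):
        self._generate = generate  # callable that builds the sitemap
        self._clock = clock        # injectable clock, handy for testing
        self._content = None
        self._expires_at = 0.0

    def get(self):
        # Regenerate when nothing is stored or the stored copy has expired.
        if self._content is None or self._clock() >= self._expires_at:
            self._content = self._generate()
            self._expires_at = self._clock() + CACHE_TTL
        return self._content

    def invalidate(self):
        """Call after a post or page update, a permalink change, or an SEO settings change."""
        self._content = None


cache = SitemapCache(generate=lambda: "<urlset>...</urlset>")
cache.get()         # generates and stores the sitemap
cache.get()         # served from the cache for up to a week
cache.invalidate()  # e.g. after publishing a post
cache.get()         # generated again
```

A page-caching plugin sits in front of this logic, which is why it can keep serving a stale copy even after the sitemap’s own cache has been refreshed.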
If the sitemap doesn’t update accordingly, you might have a caching plugin enabled that interferes with the sitemap’s cache. Please see if you can exclude the sitemap, or flush it periodically otherwise.
Excluding posts and pages
You can exclude posts and pages from the sitemap by applying noindex to them.
Translation plugins
Translation plugins get a unique sitemap per language. For more information, please refer to this article.
NGINX support
NGINX is a web server meant to improve performance and scalability with statically loaded files. Since WordPress always serves its pages dynamically (even when using a caching plugin), NGINX must proxy to a PHP process, which negates NGINX’s performance benefits and adds complexity. Custom configurations of NGINX can prevent the proxy from calling WordPress, which is why you might be experiencing issues with NGINX.
NGINX’s configuration language is very powerful and straightforward […], but often people coming from other servers are not sure how things work in NGINX and just copy and paste whatever they see from a blog that seems to fill their needs. — NGINX
NGINX must be configured correctly to display dynamically created pages, including TSF’s sitemaps. We recommend relying on WordPress’s proposed NGINX configuration. This configuration will work as-is with most WordPress setups.
If you don’t know how to fix NGINX configuration issues, please reach out to your hosting provider.
The sitemap redirects
If the /sitemap.xml endpoint redirects to /wp-sitemap.xml or /sitemap_index.xml, and perhaps even shows a 404-error because of that, you’ll have to remove all sitemap-related configurations from your NGINX configuration file, which might’ve been erroneously added by your hosting provider or NGINX installation script.
After you’ve removed the misconfigurations, you may experience some of the other issues listed below. You’ll need to fix those as well.
The sitemap shows a 404-error
Some NGINX configurations handle .xml endpoints independently; therefore, those can’t resolve via WordPress’s /index.php file. Because TSF relies on WordPress to display its sitemaps virtually, the sitemaps will resolve with a 404-error. To fix this issue permanently, you must remove the xml-part from any location ~* .([...]|xml)$ configurations.
You can work around this issue by adding -alt to the sitemap’s endpoint and submitting that URL to Google and Bing’s sitemap reports; for example: https://example.com/sitemap.xml-alt. However, we recommend forgoing this workaround and fixing the cause instead.
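To see why the workaround works, consider how such a location rule matches request paths. NGINX’s ~* modifier performs a case-insensitive regular-expression match; the Python sketch below imitates that with a made-up extension list (css, js, xml), since your actual configuration will list different extensions.

```python
import re

# Hypothetical stand-in for a rule such as: location ~* \.(css|js|xml)$
# The extension list is invented for this example.
STATIC_FILE_RULE = re.compile(r"\.(css|js|xml)$", re.IGNORECASE)


def caught_by_static_rule(path):
    """True when the path ends in one of the listed extensions."""
    return STATIC_FILE_RULE.search(path) is not None


for path in ("/sitemap.xml", "/SITEMAP.XML", "/sitemap.xml-alt", "/blog/"):
    print(path, caught_by_static_rule(path))
```

Paths ending in .xml are caught by the rule and never reach WordPress, while /sitemap.xml-alt no longer ends in .xml and falls through to /index.php. Removing xml from the rule has the same effect for /sitemap.xml itself.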
The sitemap shows a blank page
If you notice that the sitemap shows a blank page, inspect the source (CTRL+U on Windows) of the sitemap and see if anything’s there. If you find something, disabling the styling of the sitemap should fix the issue immediately. There are multiple causes for this issue.
- You may’ve accessed the sitemap from a domain that differs from the one you’ve set in the General Settings of WordPress. Since XSL (XML stylesheet) files are executable scripts, they’re subject to CORS-related directives. Make sure you’re accessing the sitemap from the same domain as configured in WordPress; even the subdomain (www.) and the protocol (https://) must match.
- On NGINX, you may’ve configured .xsl endpoints incorrectly. Apply the same fixes for NGINX listed above, but then for xsl instead of xml.
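The “same domain” requirement from the first cause above means that protocol, host, and port must all be equal. A minimal Python sketch of that comparison follows; the same_origin helper and the example URLs are assumptions for illustration, not part of TSF or WordPress.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that browsers compare."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(configured_site_url, visited_url):
    return origin(configured_site_url) == origin(visited_url)


site_url = "https://www.example.com"  # as set in WordPress's General Settings

print(same_origin(site_url, "https://www.example.com/sitemap.xml"))  # matches
print(same_origin(site_url, "https://example.com/sitemap.xml"))      # www. is missing
print(same_origin(site_url, "http://www.example.com/sitemap.xml"))   # protocol differs
```

Only the first URL shares an origin with the configured site URL, so only there would the browser apply the stylesheet.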
If disabling the sitemap’s stylesheet doesn’t resolve the issue, then you may’ve set the sitemap’s query-limit too high; we recommend lowering it to 3000 or below. And if even that doesn’t resolve the issue, then you may be experiencing plugin or theme conflicts.
The sitemap is locked
The sitemap locking mechanism helps mitigate DoS attacks. This feature only works with sitemap transient caching enabled. Essentially, the sitemap locks itself when two or more threads try to generate the sitemap concurrently.
The lock’s duration is equal to PHP’s max-execution-time set by the server, or 3 minutes — whichever takes less time. The lock releases early when the sitemap’s generation is completed; so, the timer only indicates the “worst-case scenario.”
When a locked sitemap is presented to a search engine crawler, TSF will also hint that the search engine should try again later by sending an HTTP 503 status error.
The lock should resolve itself. When it doesn’t, you should consider lowering the sitemap’s query-limit or otherwise enabling sitemap prerendering. Sitemap prerendering also mitigates issues when search engines time out or face a locked sitemap.
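The locking rules above can be summarized in a short Python sketch. It is a simplified, single-process model, not TSF’s PHP implementation: the class and function names are hypothetical, the response body is made up, and treating a non-positive execution time as “no limit” is an assumption of this sketch.

```python
import time

MAX_LOCK_SECONDS = 180  # 3 minutes


def lock_duration(max_execution_time):
    """The shorter of the server's execution limit and 3 minutes."""
    if max_execution_time <= 0:
        # Assumption of this sketch: no limit configured, so use the cap.
        return MAX_LOCK_SECONDS
    return min(max_execution_time, MAX_LOCK_SECONDS)


class SitemapLock:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._locked_until = 0.0

    def acquire(self, max_execution_time):
        now = self._clock()
        if now < self._locked_until:
            return False  # another request is still generating the sitemap
        self._locked_until = now + lock_duration(max_execution_time)
        return True

    def release(self):
        # Generation finished early, so the worst-case timer no longer applies.
        self._locked_until = 0.0


def serve_sitemap(lock, generate, max_execution_time=30):
    """Return a (status code, body) pair for a sitemap request."""
    if not lock.acquire(max_execution_time):
        return 503, "Sitemap is locked; try again later."
    try:
        return 200, generate()
    finally:
        lock.release()


print(lock_duration(30), lock_duration(300))
print(serve_sitemap(SitemapLock(), lambda: "<urlset>...</urlset>"))
```

A request that arrives while the lock is held gets the 503 response instead of starting a second, equally expensive generation run.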
Inspecting via Google Search Console
TSF prevents indexing of its sitemaps because they aren’t helpful landing pages for human visitors. So, when you inspect the sitemap via the URL inspector, Google Search Console will report that it’s not indexable and perhaps report some other errors. You can safely ignore these notices.
You’ll find a “Sitemaps” report under the “Index” tab in the sidebar menu at Google Search Console. There, you can submit, manage, and inspect your sitemaps properly.
Using Core Sitemaps
WordPress 5.5 empowers every WordPress website with a native sitemap. You can activate it by deactivating TSF’s sitemap, provided that another plugin or theme hasn’t disabled it.
We don’t believe the Core Sitemaps are beneficial for most WordPress sites. We stubbornly kept our sitemap simple; it’s easier for us to maintain and faster for search engines to process. Search engines crawl your pages more quickly using TSF’s sitemap, no matter your website’s size.
Nevertheless, when using Core Sitemaps in combination with TSF, most Sitemap Settings (on the SEO Settings page) will affect the sitemap. Since TSF v5.0.5, the availability of these settings dynamically changes, depending on which sitemap you choose.