A sitemap is an XML-formatted file that lists the links of your site, optionally with additional information, like timestamps. It isn’t easily readable for humans, but robots, like Googlebot, can quickly digest it.
We can attach an XSL-formatted stylesheet to the sitemap so that it’s easily readable for human beings. However, robots do not make use of the stylesheet.
Let’s get this out of the way: a sitemap does not directly contribute to ranking; sitemaps only help search engines index pages more quickly. Most sites without a sitemap perform just as well in search as comparable sites with one. And, if you manage content via a dynamic CMS like WordPress, you probably do not need a sitemap at all.
Below, we’ll walk through the details in layman’s terms so that you can make an informed decision on whether you should use a sitemap. After that, we’ll explain how our sitemap works with your WordPress website.
As always, we composed this article using reliable evergreen sources, such as Google’s documentation and Bing’s documentation, and we never rely on pseudo-science or hacks spouted by self-proclaimed SEO gurus.
In this article, we cover search engines that have a 98% combined market share.
Why should I use a sitemap?
You may want to consider using a sitemap if:
- Your site is new and search engines aren’t aware of its hierarchy yet.
- Your site is large and has over 500 pages.
- Your site has many old, deeply tucked away pages.
- Your site does not have a natural hierarchy.
- Your site publishes time-sensitive, expiring content, such as news.
You don’t need a sitemap if:
- Your site is small and well-established.
- Your site has proper internal linking, such as menus, archives, and breadcrumbs.
Some SEOs advocate using sitemaps for indexing images and videos. The sitemap is antiquated and limited in this regard, and we recommend moving forward using Schema.org structured data, instead.
Your site is new: discovering all pages
Search engine crawlers (also known as “spiders”) first need to discover your site before the search engine can index it. As they crawl over your pages, they’ll need to process each page to discover more links. Then, they’ll crawl those linked pages, process those pages, discover more links, and so forth.
So, when your site is new, you can help the crawlers discover all important links instantly by handing them a sitemap. This way, most of your pages will get indexed quickly.
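The discovery process described above can be pictured as a breadth-first walk over your internal links. The following is only a minimal sketch, assuming a made-up site structure (`site_links`) and a hypothetical helper (`crawl_rounds`); real crawlers are far more involved. It shows that a deeply nested page takes several hops to find by crawling, while a sitemap lists every URL up front.

```python
from collections import deque

def crawl_rounds(links, start):
    """Breadth-first discovery: map each page to its number of link hops from start."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # only count the first (shortest) route
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

# Hypothetical site structure: each page lists the pages it links to.
site_links = {
    "/": ["/blog/", "/about/"],
    "/blog/": ["/blog/post-1/"],
    "/blog/post-1/": ["/blog/post-2/"],
}

by_crawling = crawl_rounds(site_links, "/")

# A sitemap hands over every URL at once, regardless of link depth.
sitemap_urls = sorted(by_crawling)
```

Here, `/blog/post-2/` is only found after the crawler has processed three other pages in a row; with a sitemap, it is known after a single fetch.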
Your site is large: prioritized crawling
Search engines keep a record of all internal links of, and external links to, your website. They assign a priority to each page (depending on the number of backlinks and indexing state), and periodically crawl the pages accordingly. For instance, your homepage will be crawled more often than your contact page.
As your site grows larger, search engines won’t necessarily crawl your pages at a proportionally increased rate. So, when you add new pages, it might take longer for the crawler to discover those via internal linking. This is where a sitemap can truly help.
Larger sites often come with larger sitemaps. However, keep in mind that a bigger sitemap takes more time for search engines to parse. It’s better to have fewer items in the sitemap when you plan on posting often. For the same reason, it’s also better to have fewer sitemaps — with The SEO Framework, you’ll get two sitemaps at most: a standard “base” sitemap and a Google News sitemap.
There’s no harm in leaving pages out of the sitemap: once a search engine is aware of a page’s existence, it’ll keep the page indexed (when indexable) and will keep crawling it periodically.
Your site is old: maintaining intent
When your website has old, forgotten, and isolated content with few backlinks, search engines might drop that content from their index. This is because modern search engines try to determine your intent.
Simply put, CMSs like WordPress can generate pages for you. Add /search/any+search+query to your site’s URL, et voilà, there’s a new page. Because most sites nowadays can create pages on-the-fly, there’s an infinite number of pages around the web, and not all of those are useful.
This is where intent comes in: Did you link to the page? How often do others link to that page? Is the link trustworthy? Is the content duplicated? Does your sitemap link to the page?
Your site is unnatural: making sense with hierarchy
We covered above that search engines discover links automatically from your content. When your website does not link to pages via menus, breadcrumbs, archives, footers, sidebars, or via the content, then you may require a sitemap.
However, if a visitor can’t navigate your site easily, then why would a search engine be able to? Do you want your business to rely solely on an overlord like Google? If you require a sitemap because there’s a lack of internal linking, then you’re doing it wrong, or you’re otherwise being creative and making life needlessly difficult for your visitors and yourself.
Your site publishes news: indexing ASAP
News publishers need to get their pages indexed as soon as possible. To help search engines, you can attach <lastmod> timestamp values to each link in the sitemap. As the name suggests, these timestamps indicate when pages were last modified. Search engine crawlers read these, and when a timestamp is more recent than what they logged before, they’ll recrawl the page more quickly.
When you ping search engines to tell them that the sitemap is updated, they will immediately discover page changes via the timestamps, and they’ll add those pages high up in their crawling queue.
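To make the timestamps concrete, here is a minimal sketch that builds a sitemap with a `<lastmod>` value per link, using Python’s standard library. The function name `build_sitemap` and the example URL are assumptions for illustration; this is not how TSF generates its sitemap (that happens in PHP). The namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """entries: iterable of (url, timezone-aware last-modified datetime)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, modified in entries:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url
        # W3C datetime format in UTC, e.g. 2024-01-02T03:04:05+00:00
        stamp = modified.astimezone(timezone.utc).isoformat(timespec="seconds")
        ET.SubElement(node, "lastmod").text = stamp
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical page and modification time.
xml_text = build_sitemap([
    ("https://example.com/", datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)),
])
```

A crawler that previously logged an older timestamp for `https://example.com/` would see the newer `<lastmod>` value and could schedule a recrawl.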
If your site is connected to Google News, then a Google News sitemap will help you tremendously. Google crawls this sitemap type and its content with a greater priority than anything else. To speed up parsing, they want you to limit the Google News sitemap to content published within the past two days, with at most 1 000 links. We provide a Google News sitemap via our Articles extension.
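The two limits mentioned above (two days, 1 000 links) amount to a simple filter. The sketch below only illustrates that selection rule; the function name `news_sitemap_entries` and the input shape are assumptions, and it does not reflect how the Articles extension is implemented.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=2)  # content published within the past two days
MAX_LINKS = 1000             # at most 1 000 links

def news_sitemap_entries(articles, now):
    """articles: iterable of (url, published datetime). Returns newest first, capped."""
    recent = [(url, published) for url, published in articles
              if now - published <= MAX_AGE]
    recent.sort(key=lambda item: item[1], reverse=True)
    return recent[:MAX_LINKS]
```

Sorting newest-first before capping means that, on a very busy site, the oldest articles drop out first.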
The sitemap of The SEO Framework
The base sitemap we bring in the TSF plugin is small and consists of one page only. We use one page for the reasons described above: search engines can process it easily and rapidly.
Only posts and pages are included in the sitemap of The SEO Framework; you won’t find archives, attachments, or author pages therein. We exclude those because most themes already provide proper internal linking for them, and because of technical difficulties.
The sitemaps from TSF are virtually generated via PHP and aren’t stored on your website’s drive. This feature adds support for WordPress Multisite environments, where multiple WordPress websites can work from one directory — storing the sitemap on the drive would otherwise prevent outputting unique sitemaps for each site.
Updating the sitemap
TSF caches the sitemap’s generated content in the database for 604 800 seconds (1 week). It refreshes whenever you:
- Update or publish any type of post or page;
- Update the permalink settings;
- Or update the SEO settings.
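The caching behavior above boils down to "expire after one week, or flush on relevant changes". The following sketch shows that general pattern with an in-memory cache; the class `SitemapCache` and its methods are hypothetical names, and TSF itself stores its cache in the WordPress database via PHP, so treat this as an illustration of the idea only.

```python
import time

WEEK_IN_SECONDS = 604_800

class SitemapCache:
    """Keeps generated sitemap content until it expires or gets flushed."""

    def __init__(self, generate, clock=time.time):
        self._generate = generate  # callable that builds the sitemap content
        self._clock = clock        # injectable clock, which eases testing
        self._content = None
        self._expires_at = 0.0

    def get(self):
        if self._content is None or self._clock() >= self._expires_at:
            self._content = self._generate()
            self._expires_at = self._clock() + WEEK_IN_SECONDS
        return self._content

    def flush(self):
        """Call when a post or page, the permalink settings, or the SEO settings change."""
        self._content = None
```

A page-caching plugin that sits in front of this would keep serving its own stale copy, which is why excluding the sitemap from such a plugin matters.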
If the sitemap doesn’t update accordingly, you might have a caching plugin enabled that interferes with the sitemap’s cache. Please see if you can exclude the sitemap from that plugin’s cache; otherwise, flush that cache periodically.
Whenever the sitemap is updated, TSF can ping Google and Bing, whereafter they’ll crawl and process the sitemap immediately. The pinging occurs at most once every 3 600 seconds to prevent spamming the search engines. The Google News sitemap does not have a rate-limiter for pinging.
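The rate limit on pinging can be sketched as a small throttle. The class name `PingThrottle` is hypothetical, and the sketch deliberately leaves out the actual network request; it only shows the "at most once every 3 600 seconds" decision.

```python
class PingThrottle:
    """Allows a ping only if the previous one was at least min_interval seconds ago."""

    def __init__(self, min_interval=3600):
        self.min_interval = min_interval
        self._last_ping = None

    def should_ping(self, now):
        if self._last_ping is not None and now - self._last_ping < self.min_interval:
            return False  # too soon; skip this ping
        self._last_ping = now
        return True
```

With this rule, publishing ten posts within a few minutes results in a single ping rather than ten.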
Excluding posts and pages
You can exclude posts and pages from the sitemap by applying noindex to them.
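Taken together with the rule that only posts and pages are included, the selection can be pictured as a filter. This is a rough sketch with made-up field names (`type`, `noindex`, `url`); it isn’t TSF’s actual query.

```python
INCLUDED_TYPES = {"post", "page"}  # archives, attachments, and author pages are left out

def sitemap_candidates(items):
    """Return the URLs of posts and pages that aren't marked noindex."""
    return [
        item["url"]
        for item in items
        if item["type"] in INCLUDED_TYPES and not item.get("noindex", False)
    ]

# Hypothetical content items.
items = [
    {"url": "/hello-world/", "type": "post"},
    {"url": "/thank-you/", "type": "page", "noindex": True},
    {"url": "/author/jane/", "type": "author"},
]
urls = sitemap_candidates(items)
```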
Translation plugins get a unique sitemap per language. For more information, please refer to this article.
NGINX is a web server meant to improve performance and scalability with statically loaded files. Since WordPress always serves its pages dynamically (even when using a caching plugin), NGINX must proxy to a PHP process, which negates NGINX’s performance benefits and adds complexity. This proxy can be overwritten by other custom configurations in NGINX, which is why you might be experiencing issues with NGINX.
NGINX’s configuration language is very powerful and straightforward […], but often people coming from other servers are not sure how things work in NGINX and just copy and paste whatever they see from a blog that seems to fill their needs. — NGINX
NGINX must be configured correctly to display dynamically created pages, including TSF’s sitemaps. We recommend relying on WordPress’s proposed NGINX configuration. This configuration will work as-is with most WordPress setups.
If you don’t know how to fix NGINX configuration issues, please reach out to your hosting provider.
The sitemap redirects
If the /sitemap.xml endpoint redirects to /sitemap_index.xml, and perhaps even shows a 404-error because of that, you’ll have to remove all sitemap-related configurations from your NGINX configuration file, which might’ve been erroneously added by your hosting provider or NGINX installation script.
After you’ve removed the misconfigurations, you may experience some of the other issues listed below. You’ll need to fix those as well.
The sitemap shows a 404-error
Some NGINX configurations handle .xml endpoints independently; therefore, those can’t resolve via WordPress’s /index.php file. Because TSF relies on WordPress to display its sitemaps virtually, the sitemaps will resolve with a 404-error. To fix this issue permanently, you must remove the xml-part from any location ~* .([...]|xml)$ configurations.
You can work around this issue by adding -alt to the sitemap’s endpoint and submitting that URL to Google’s and Bing’s sitemap reports; for example: https://example.com/sitemap.xml-alt. However, we recommend forgoing this workaround and fixing the cause instead.
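To see why such a location rule catches the sitemap, it can help to test the regular expression outside of NGINX. The sketch below uses Python’s `re` module with a hypothetical extension list (`css|js|xml`), since the actual list differs per server; the case-insensitive flag mirrors the `~*` modifier.

```python
import re

# Hypothetical patterns; the real extension list in your configuration will differ.
static_location = re.compile(r"\.(css|js|xml)$", re.IGNORECASE)  # before the fix
fixed_location = re.compile(r"\.(css|js)$", re.IGNORECASE)       # xml-part removed

def handled_as_static(pattern, path):
    """True when the location pattern would claim the request path."""
    return pattern.search(path) is not None
```

Under these assumptions, `/sitemap.xml` matches the first pattern and never reaches WordPress, while it passes through once the xml-part is removed. The `-alt` endpoint doesn’t end in `.xml`, so it doesn’t match either pattern, which is presumably why the workaround helps.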
The sitemap shows a blank page
If you notice that the sitemap shows a blank page, inspect the source (CTRL+U on Windows) of the sitemap and see if anything’s there. If you find something, disabling the styling of the sitemap should fix the issue immediately. There are multiple causes for this issue.
- You may’ve accessed the sitemap from a domain that differs from the one you’ve set in the General Settings of WordPress. Since XSL (XML stylesheet) files are executable scripts, they’re subject to CORS-related directives. Make sure you’re accessing the sitemap from the same domain as configured in WordPress; even the subdomain (www.) and the protocol (https://) must match.
- On NGINX, you may’ve configured .xsl endpoints incorrectly. Apply the same NGINX fixes listed above, but for .xsl endpoints.
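The domain-matching requirement from the first cause can be checked with a quick origin comparison. This is a minimal sketch with hypothetical helper names (`origin`, `same_origin`); it compares protocol, host, and port, which is what browsers consider when deciding whether the stylesheet may load.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Reduce a URL to its (protocol, host, port) triple."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(configured_url, visited_url):
    """True when the visited URL shares the origin of the configured site URL."""
    return origin(configured_url) == origin(visited_url)
```

For example, with `https://www.example.com` configured in WordPress, visiting `https://example.com/sitemap.xml` (no `www.`) or `http://www.example.com/sitemap.xml` (different protocol) counts as a different origin.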
If disabling the sitemap’s stylesheet doesn’t resolve the issue, then you may’ve set the sitemap’s query limit too high; we recommend lowering it to 3000 or below. And if even that doesn’t resolve the issue, then you may be experiencing plugin or theme conflicts.
Inspecting via Google Search Console
TSF prevents indexing of its sitemaps because they aren’t helpful landing pages for human visitors. So, when you inspect the sitemap via the URL inspector, Google Search Console will report that it’s not indexable, and in some cases will even display other errors. You can safely ignore these notices.
At Google Search Console, in the sidebar menu, you’ll find a “Sitemaps” report under the “Index” tab. There, you can submit, manage, and inspect your sitemaps properly.
Using Core Sitemaps
WordPress 5.5 gives every WordPress website native sitemaps, called Core Sitemaps. You can activate them by deactivating TSF’s sitemap, provided that another plugin or theme hasn’t disabled them.
We don’t believe the Core Sitemaps are beneficial for most WordPress sites. We stubbornly kept our sitemap simple; it’s easier for us to maintain and faster for search engines to process. Search engines crawl your pages more quickly using TSF’s sitemap, no matter your website’s size.
We were unpleasantly surprised by their sudden integration, so as of The SEO Framework v4.1, Core Sitemaps don’t listen to the indexing states brought by TSF. Therefore, you may notice errors appearing in Google Search Console’s sitemap report.