🗺️ The Ultimate Guide to XML Sitemaps
An XML sitemap acts as a roadmap of your website that leads search engines to all your important pages.
This guide explains how to properly structure a sitemap.xml file, which tags are actually
used by Google, and the strict rules you must follow.
1. What is an XML Sitemap?
An XML (Extensible Markup Language) sitemap is a file that lists the essential URLs of a site along with metadata about each URL (such as when it was last updated). Its purpose is to help web crawlers, like Googlebot or Bingbot, discover and index pages more intelligently and efficiently.
2. Why are Sitemaps Important?
While search engines can usually discover pages by following internal links, a sitemap tells the crawler directly which canonical pages you want it to know about. It does not guarantee indexing, but it is especially valuable if:
- Your site is very large: It helps crawlers find and prioritize new or recently updated content.
- Your site is new: It helps search engines discover your pages faster when you don't have many backlinks yet.
- Your site has isolated pages: Pages that aren't well-linked internally (orphan pages) can still be found.
3. XML Sitemap Syntax & Example
An XML sitemap must be encoded in UTF-8. Here is the standard structure of a basic sitemap containing two URLs:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.rankosaur.com/</loc>
    <lastmod>2026-02-15T12:00:00+00:00</lastmod>
  </url>
  <url>
    <loc>https://www.rankosaur.com/features/</loc>
    <lastmod>2026-03-01T08:30:00+00:00</lastmod>
  </url>
</urlset>
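The structure above can also be generated programmatically. The following is a minimal sketch using Python's standard library; the function name `build_sitemap` and the `(loc, lastmod)` input shape are assumptions made for illustration, not part of the sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return UTF-8 encoded sitemap XML for an iterable of (loc, lastmod) pairs.

    `build_sitemap` is a hypothetical helper; lastmod may be None to omit the tag.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL text for us.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    # encoding="UTF-8" makes tostring() return bytes with an XML declaration.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


sitemap_xml = build_sitemap([
    ("https://www.rankosaur.com/", "2026-02-15T12:00:00+00:00"),
    ("https://www.rankosaur.com/features/", "2026-03-01T08:30:00+00:00"),
])
print(sitemap_xml.decode("utf-8"))
```

Building the tree with an XML library rather than string concatenation avoids broken output when a URL contains characters that need escaping, such as `&` in a query string.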
4. Important Sitemap Tags
Not all tags defined in the original sitemap protocol are useful today. Here is what you need to know:
- <loc> (Required): The absolute URL of the page. It must begin with the protocol (e.g., https://).
- <lastmod> (Highly Recommended): The date the page was last modified, preferably in W3C Datetime format: either a date (YYYY-MM-DD) or a full timestamp with a time zone offset (e.g., 2026-02-15T12:00:00+00:00). Google uses this value to decide whether a page needs to be recrawled, but only when it is consistently and verifiably accurate, so update it only when the page content actually changes.
- <changefreq> (Ignored by Google): How frequently the page is likely to change. Google has stated that it ignores this tag because it is often inaccurate.
- <priority> (Ignored by Google): The priority of this URL relative to other URLs on your site. Google has explicitly stated that it does not use this tag.
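As an illustration of the two accepted <lastmod> forms, here is a small sketch that formats a Python `datetime` either as a date or as a full timestamp with an offset. The helper names `w3c_date` and `w3c_datetime` are hypothetical, and treating naive datetimes as UTC is an assumption you may need to change for your own data.

```python
from datetime import datetime, timezone


def w3c_date(dt):
    """Date-only form: YYYY-MM-DD."""
    return dt.date().isoformat()


def w3c_datetime(dt):
    """Full form with an explicit UTC offset, seconds precision."""
    if dt.tzinfo is None:
        # Assumption: timestamps without a time zone are treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.replace(microsecond=0).isoformat()


modified = datetime(2026, 2, 15, 12, 0, 0, tzinfo=timezone.utc)
print(w3c_date(modified))      # 2026-02-15
print(w3c_datetime(modified))  # 2026-02-15T12:00:00+00:00
```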
5. Sitemap Index Files (For Large Sites)
A single sitemap can only hold a maximum of 50,000 URLs and cannot exceed 50 MB (uncompressed). If your site exceeds these limits, you must split your URLs across multiple sitemaps and group them using a Sitemap Index file.
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.rankosaur.com/sitemap-pages.xml</loc>
    <lastmod>2026-03-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.rankosaur.com/sitemap-products.xml</loc>
    <lastmod>2026-03-02</lastmod>
  </sitemap>
</sitemapindex>
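To make the splitting step concrete, the sketch below divides a URL list into chunks of at most 50,000 and builds an index that points at one child sitemap per chunk. The helper names, the `/product/` URLs, and the `sitemap-products-N.xml` file names are invented for illustration. The sketch only enforces the URL count, so a real generator would also need to check the 50 MB size limit of each file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of `urls`, each holding at most `size` items."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap_index(sitemap_locs):
    """Return UTF-8 encoded sitemap index XML listing the given sitemap URLs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locs:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# Hypothetical data: 120,000 product URLs need three child sitemaps.
urls = [f"https://www.rankosaur.com/product/{i}/" for i in range(120_000)]
chunks = list(chunk(urls))
child_locs = [
    f"https://www.rankosaur.com/sitemap-products-{n}.xml"
    for n in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(child_locs)
print([len(c) for c in chunks])  # [50000, 50000, 20000]
```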
6. Crucial Rules & Best Practices
⚠️ Golden Rule: Your sitemap should ONLY contain canonical, indexable URLs that
return a 200 OK status code.
- No Errors or Redirects: Never include URLs that return an error such as 404 (Not Found) or 410 (Gone), or a redirect such as 301 (Permanent Redirect) or 302 (Temporary Redirect).
- No Non-Indexable Pages: Do not include URLs that have a noindex meta tag or are blocked by your robots.txt file.
- No Duplicate Content: Do not include paginated pages, parameter URLs (unless canonicalized to themselves), or alternate versions of a page. Only the canonical URL belongs in the sitemap.
- Submit it: Always submit your sitemap in Google Search Console and Bing Webmaster Tools so you can monitor indexing errors.
- Link it in robots.txt: Add the following line to your robots.txt file (conventionally at the bottom) so that any crawler reading the file can find your sitemap:
Sitemap: https://www.rankosaur.com/sitemap.xml
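The inclusion rules above can be expressed as a simple filter. The sketch below assumes you already have crawl data for each page (status code, declared canonical, noindex flag, robots.txt status); the `Page` record and its field names are hypothetical and would map onto whatever your crawler or CMS exports.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical crawl record for one URL."""
    url: str
    status: int
    canonical: str
    noindex: bool = False
    blocked_by_robots: bool = False


def belongs_in_sitemap(page):
    """True only for indexable, self-canonical URLs that return 200 OK."""
    return (
        page.status == 200
        and page.canonical == page.url
        and not page.noindex
        and not page.blocked_by_robots
    )


pages = [
    Page("https://www.rankosaur.com/", 200, "https://www.rankosaur.com/"),
    Page("https://www.rankosaur.com/old/", 301, "https://www.rankosaur.com/old/"),
    Page("https://www.rankosaur.com/features/?ref=nav", 200,
         "https://www.rankosaur.com/features/"),
    Page("https://www.rankosaur.com/account/", 200,
         "https://www.rankosaur.com/account/", noindex=True),
]
sitemap_urls = [p.url for p in pages if belongs_in_sitemap(p)]
print(sitemap_urls)  # ['https://www.rankosaur.com/']
```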
Pro Tip: Rank-O-Saur allows you to quickly read the live robots.txt of
any website and immediately open the linked sitemaps with a single click.
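If you want to do the same lookup by hand, the Sitemap lines can be pulled out of a robots.txt file with a few lines of Python. This is a rough sketch that works on text you have already downloaded; the function name is hypothetical, it is not a description of how Rank-O-Saur works internally, and it assumes that `#` only ever starts a comment.

```python
def sitemap_urls_from_robots(robots_txt):
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Assumption: everything after "#" is a comment and can be dropped.
        line = line.split("#", 1)[0].strip()
        # Split on the first colon only, so the URL's own colons stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


robots_txt = """\
User-agent: *
Disallow: /admin/

# Sitemaps
Sitemap: https://www.rankosaur.com/sitemap.xml
sitemap: https://www.rankosaur.com/sitemap-products.xml
"""
print(sitemap_urls_from_robots(robots_txt))
```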