XML Sitemaps: The Complete Guide

An XML sitemap acts as a roadmap of your website that leads search engines to all your important pages. This guide explains how to properly structure a sitemap.xml file, which tags are actually used by Google, and the strict rules you must follow.

1. What is an XML Sitemap?

An XML (Extensible Markup Language) sitemap is a file that lists the essential URLs of a site along with metadata about each URL (such as when it was last updated). Its purpose is to help web crawlers, like Googlebot or Bingbot, discover and index pages more intelligently and efficiently.

2. Why are Sitemaps Important?

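While search engines can usually discover pages by following internal links, a sitemap tells the crawler about all your canonical pages directly. It is a strong hint rather than a guarantee: listed URLs can be discovered sooner, but they are not certain to be crawled or indexed. A sitemap is especially useful if: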

  • Your site is very large: It helps crawlers find and prioritize new or recently updated content.
  • Your site is new: It helps search engines discover your pages faster when you don't have many backlinks yet.
  • Your site has isolated pages: Pages that aren't well-linked internally (orphan pages) can still be found.

3. XML Sitemap Syntax & Example

An XML sitemap must be encoded in UTF-8. Here is the standard structure of a basic sitemap containing two URLs:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.rankosaur.com/</loc>
    <lastmod>2026-02-15T12:00:00+00:00</lastmod>
  </url>
  <url>
    <loc>https://www.rankosaur.com/features/</loc>
    <lastmod>2026-03-01T08:30:00+00:00</lastmod>
  </url>
</urlset>
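
The structure above can also be generated programmatically. The following is a minimal sketch using Python's standard library; the function name build_sitemap and the (loc, lastmod) input shape are assumptions made for illustration, not part of the sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as UTF-8 bytes for (loc, lastmod) pairs.

    build_sitemap is a hypothetical helper; lastmod may be None to omit the tag.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL automatically.
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        ("https://www.rankosaur.com/", "2026-02-15T12:00:00+00:00"),
        ("https://www.rankosaur.com/features/", "2026-03-01T08:30:00+00:00"),
    ])
    print(xml_bytes.decode("utf-8"))
```

Building the file with an XML library rather than string concatenation keeps the output well-formed even when a URL contains characters that need escaping, such as an ampersand in a query string.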

4. Important Sitemap Tags

Not all tags defined in the original sitemap protocol are useful today. Here is what you need to know:

  • <loc> (Required): The absolute URL of the page. It must begin with the protocol (e.g., https://).
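  • <lastmod> (Highly Recommended): The date the page was last modified, in W3C Datetime format. This can be a date only (YYYY-MM-DD) or a full timestamp with a time zone (e.g., 2026-02-15T12:00:00+00:00). Google uses this value as a recrawl signal only if it is consistently and verifiably accurate, so update it only when the page content actually changes.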
  • <changefreq> (Ignored by Google): How frequently the page is likely to change. Google stated they ignore this tag because it is often inaccurate.
  • <priority> (Ignored by Google): The priority of this URL relative to other URLs on your site. Google has explicitly stated they do not use this tag.
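
As an illustration of the <lastmod> format, here is a small sketch that turns a Python datetime into a W3C Datetime string. The helper name to_lastmod is hypothetical, and treating naive datetimes as UTC is an assumption you may need to adapt to your own data.

```python
from datetime import datetime, timezone


def to_lastmod(dt, date_only=False):
    """Format a datetime as a W3C Datetime string for <lastmod>.

    to_lastmod is a hypothetical helper for illustration.
    """
    if date_only:
        return dt.date().isoformat()  # YYYY-MM-DD
    if dt.tzinfo is None:
        # Assumption: datetimes without a time zone are treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Drop microseconds to keep the value compact.
    return dt.replace(microsecond=0).isoformat()


if __name__ == "__main__":
    print(to_lastmod(datetime(2026, 2, 15, 12, 0, 0)))
    print(to_lastmod(datetime(2026, 3, 2, 8, 30), date_only=True))
```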

5. Sitemap Index Files (For Large Sites)

A single sitemap can hold at most 50,000 URLs and must not exceed 50 MB (uncompressed). If your site exceeds either limit, you must split your URLs across multiple sitemaps and group them using a Sitemap Index file, which lists the location of each individual sitemap:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.rankosaur.com/sitemap-pages.xml</loc>
    <lastmod>2026-03-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.rankosaur.com/sitemap-products.xml</loc>
    <lastmod>2026-03-02</lastmod>
  </sitemap>
</sitemapindex>
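
Splitting a large URL list and producing the matching index can be sketched as follows. The helper names chunk_urls and build_index, and the sitemap-N.xml file naming, are assumptions for illustration. The sketch only enforces the 50,000 URL limit, so the 50 MB size limit would still need a separate check.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_locs):
    """Return sitemap index XML (UTF-8 bytes) listing the given sitemap URLs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locs:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical URL list, used only to show the chunking.
    urls = [f"https://www.rankosaur.com/p/{i}" for i in range(120_000)]
    chunks = chunk_urls(urls)
    # Hypothetical file naming scheme: sitemap-1.xml, sitemap-2.xml, ...
    locs = [f"https://www.rankosaur.com/sitemap-{n}.xml"
            for n in range(1, len(chunks) + 1)]
    print(len(chunks), "sitemaps")
    print(build_index(locs).decode("utf-8"))
```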

6. Crucial Rules & Best Practices

Golden Rule: Your sitemap should ONLY contain canonical, indexable URLs that return a 200 OK status code.
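
The Golden Rule can be expressed as a simple filter over crawl data. This is only a sketch: the PageRecord fields below are a hypothetical record shape (for example, from your own crawler export), and "indexable" is modeled here as "no noindex directive and not blocked by robots.txt".

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    """Hypothetical crawl record; field names are assumptions for illustration."""
    url: str
    status: int              # HTTP status code returned for the URL
    noindex: bool            # True if the page carries a noindex directive
    blocked_by_robots: bool  # True if robots.txt disallows the URL
    canonical: str           # URL declared in the page's canonical tag


def is_sitemap_eligible(page):
    """Return True only for canonical, indexable URLs that return 200 OK."""
    return (
        page.status == 200
        and not page.noindex
        and not page.blocked_by_robots
        and page.canonical == page.url
    )


if __name__ == "__main__":
    pages = [
        PageRecord("https://www.rankosaur.com/", 200, False, False,
                   "https://www.rankosaur.com/"),
        PageRecord("https://www.rankosaur.com/old/", 301, False, False,
                   "https://www.rankosaur.com/old/"),
    ]
    print([p.url for p in pages if is_sitemap_eligible(p)])
```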

  1. No Errors or Redirects: Never include URLs that return an error such as 404 (Not Found) or 410 (Gone), or that redirect with a 301 (Permanent Redirect) or 302 (Temporary Redirect).
  2. No Non-Indexable Pages: Do not include URLs that have a noindex meta tag or are blocked by your robots.txt file.
  3. No Duplicate Content: Do not include paginated pages, parameter URLs (unless canonicalized to themselves), or alternate versions of a page. Only the canonical URL belongs in the sitemap.
  4. Submit it: Always submit your sitemap in Google Search Console and Bing Webmaster Tools so you can monitor indexing errors.
  5. Link it in robots.txt: Add the following line to your robots.txt file (it can appear anywhere in the file) so that any crawler reading robots.txt can find your sitemap:
    Sitemap: https://www.rankosaur.com/sitemap.xml
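
To check that the Sitemap line is picked up, you can parse the robots.txt content with Python's standard urllib.robotparser module (its site_maps() method requires Python 3.8 or later). This is a minimal sketch: the wrapper name sitemaps_from_robots and the sample Disallow path are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared in a robots.txt string.

    sitemaps_from_robots is a hypothetical wrapper around the standard parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    sample = (
        "User-agent: *\n"
        "Disallow: /cart/\n"  # hypothetical rule, only for the example
        "\n"
        "Sitemap: https://www.rankosaur.com/sitemap.xml\n"
    )
    print(sitemaps_from_robots(sample))
```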

Pro Tip: Rank-O-Saur allows you to quickly read the live robots.txt of any website and immediately open the linked sitemaps with a single click.

About the Author

Christoph Hein

Head of SEO at Popken Fashion Group & independent Search Consultant

Christoph has spent 10+ years in search, currently steering organic strategy for 5 fashion brands across 13 countries and more than 30 domains. Alongside his in-house and consulting work, he founded niche content portals such as Angelmagazin.de and BaristaCompass.com, and built the Rank-O-Saur extension to make technical SEO audits effortless. Every guide here is grounded in hands-on, data-driven practice rather than theory.