🗺️ The Ultimate Guide to XML Sitemaps

An XML sitemap acts as a roadmap of your website that leads search engines to all your important pages. This guide explains how to properly structure a sitemap.xml file, which tags are actually used by Google, and the strict rules you must follow.

1. What is an XML Sitemap?

An XML (Extensible Markup Language) sitemap is a file that lists the essential URLs of a site along with metadata about each URL (such as when it was last updated). Its purpose is to help web crawlers such as Googlebot and Bingbot discover those pages and crawl them more efficiently.

2. Why are Sitemaps Important?

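While search engines can usually discover pages by following internal links, a sitemap tells crawlers directly about every canonical page you want found. It does not guarantee that those pages will be crawled or indexed, but it makes them much harder to miss. A sitemap is especially crucial if:
  1. Your site is large, so crawlers may overlook some new or recently updated pages.
  2. Your site is new and has few external links pointing to it.
  3. Your site has pages that are isolated or poorly linked from the rest of the site.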

3. XML Sitemap Syntax & Example

An XML sitemap must be encoded in UTF-8. Here is the standard structure of a basic sitemap containing two URLs:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.rankosaur.com/</loc>
    <lastmod>2026-02-15T12:00:00+00:00</lastmod>
  </url>
  <url>
    <loc>https://www.rankosaur.com/features/</loc>
    <lastmod>2026-03-01T08:30:00+00:00</lastmod>
  </url>
</urlset>
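
If you generate this file from code rather than by hand, a minimal Python sketch using the standard library's xml.etree.ElementTree could look like the following. It assumes your URLs and lastmod strings are already available as (loc, lastmod) pairs; build_sitemap and SITEMAP_NS are illustrative names, not part of any sitemap tool, and the default_namespace option requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as UTF-8 bytes for (loc, lastmod) pairs.

    Pass lastmod=None to omit the <lastmod> tag for that URL.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes &, < and > in text, so query strings stay valid XML
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod is not None:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    # default_namespace emits xmlns="..." on <urlset> instead of an ns0: prefix
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True,
                       default_namespace=SITEMAP_NS)
```

Writing the returned bytes to sitemap.xml gives a compact version of the structure above; whitespace between tags is optional in the protocol.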

4. Important Sitemap Tags

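Not all tags defined in the original sitemap protocol are useful today. Here is what you need to know:
  1. <loc> (required): the full, absolute URL of the page.
  2. <lastmod> (recommended): when the page's content last changed significantly. Google uses it only if it is consistently and verifiably accurate.
  3. <changefreq> and <priority>: ignored by Google, so you can leave them out.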
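
The <lastmod> values in this guide use the W3C Datetime format: either a full timestamp with a time-zone offset or a plain date. The hypothetical helpers below are one way to produce both forms in Python; they assume you already track each page's last significant change as a timezone-aware datetime, and the UTC normalization is a stylistic choice, not a protocol requirement.

```python
from datetime import datetime, timezone


def w3c_datetime(dt):
    """Format an aware datetime as a W3C Datetime timestamp for <lastmod>.

    The value is converted to UTC and truncated to whole seconds,
    e.g. 2026-02-15T12:00:00+00:00.
    """
    if dt.tzinfo is None:
        # Without an offset the timestamp is ambiguous, so reject it
        raise ValueError("lastmod datetimes must be timezone-aware")
    return dt.astimezone(timezone.utc).isoformat(timespec="seconds")


def w3c_date(dt):
    """Format the date-only form (YYYY-MM-DD), which is also valid W3C Datetime."""
    return dt.date().isoformat()
```

Only emit a value that reflects a real content change; regenerating every timestamp on each deploy makes <lastmod> untrustworthy to crawlers.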

5. Sitemap Index Files (For Large Sites)

A single sitemap can hold at most 50,000 URLs and cannot exceed 50 MB (52,428,800 bytes) uncompressed. If your site exceeds either limit, you must split your URLs across multiple sitemaps, which you can then group under a single Sitemap Index file and submit together.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.rankosaur.com/sitemap-pages.xml</loc>
    <lastmod>2026-03-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.rankosaur.com/sitemap-products.xml</loc>
    <lastmod>2026-03-02</lastmod>
  </sitemap>
</sitemapindex>
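
Splitting URLs by these limits and then listing the resulting files in an index can be sketched in Python as follows. This is an approximation under stated assumptions: the size check counts only escaped <url><loc>...</loc></url> entries plus a guessed fixed WRAPPER_BYTES allowance, so it ignores <lastmod> tags and indentation; a production generator should measure the bytes it actually writes. All names here (split_urls, build_sitemap_index, WRAPPER_BYTES) are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000             # per-sitemap URL limit
MAX_BYTES = 50 * 1024 * 1024  # per-sitemap size limit (uncompressed)
WRAPPER_BYTES = 200           # assumed allowance for the XML declaration and <urlset> tags


def entry_size(loc):
    """Approximate UTF-8 size of one <url><loc>...</loc></url> entry."""
    return len(f"<url><loc>{escape(loc)}</loc></url>".encode("utf-8"))


def split_urls(urls, max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """Greedily group URLs so that no group breaks either sitemap limit."""
    chunks, current, size = [], [], WRAPPER_BYTES
    for loc in urls:
        n = entry_size(loc)
        # Close the current group if adding this URL would exceed a limit
        if current and (len(current) >= max_urls or size + n > max_bytes):
            chunks.append(current)
            current, size = [], WRAPPER_BYTES
        current.append(loc)
        size += n
    if current:
        chunks.append(current)
    return chunks


def build_sitemap_index(sitemap_urls):
    """Return UTF-8 sitemap index XML listing each child sitemap's <loc>."""
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
    # <lastmod> is optional in an index and is omitted in this sketch
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True,
                       default_namespace=SITEMAP_NS)
```

For example, split_urls(all_urls) returns one list per child sitemap; after writing each list to its own file (a naming scheme such as sitemap-1.xml, sitemap-2.xml is assumed here), pass those files' URLs to build_sitemap_index.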

6. Crucial Rules & Best Practices

⚠️ Golden Rule: Your sitemap should ONLY contain canonical, indexable URLs that return a 200 OK status code.

  1. No Errors or Redirects: Never include URLs that return errors such as 404 (Not Found) or 410 (Gone), or redirects such as 301 (Permanent Redirect) or 302 (Temporary Redirect).
  2. No Non-Indexable Pages: Do not include URLs that have a noindex meta tag or are blocked by your robots.txt file.
  3. No Duplicate Content: Do not include paginated pages, parameter URLs (unless canonicalized to themselves), or alternate versions of a page. Only the canonical URL belongs in the sitemap.
  4. Submit it: Always submit your sitemap in Google Search Console and Bing Webmaster Tools so you can monitor indexing errors.
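  5. Link it in robots.txt: Add the following line to your robots.txt file (by convention at the bottom, although the directive is valid anywhere in the file) so that any crawler that reads robots.txt can find your sitemap. Use the full, absolute URL: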
    Sitemap: https://www.rankosaur.com/sitemap.xml
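
The rules above can be expressed as a single filter over crawl data. The sketch below assumes a hypothetical PageAudit record produced by your own crawler; its field names are illustrative, a page without a canonical tag is treated as self-canonical, and canonicals are compared as exact strings (a real check would normalize URLs first).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageAudit:
    """Hypothetical per-URL crawl result; field names are illustrative."""
    url: str
    status: int               # HTTP status returned for url itself (redirects not followed)
    canonical: Optional[str]  # href of <link rel="canonical">, if present
    noindex: bool             # page carries a noindex directive
    blocked_by_robots: bool   # url is disallowed by robots.txt


def sitemap_eligible(page: PageAudit) -> bool:
    """True if the page is 200 OK, indexable, crawlable, and self-canonical."""
    # Assumption: a missing canonical tag counts as self-referencing
    self_canonical = page.canonical is None or page.canonical == page.url
    return (
        page.status == 200
        and not page.noindex
        and not page.blocked_by_robots
        and self_canonical
    )


def sitemap_urls(pages):
    """Keep only the URLs that belong in the sitemap, preserving order."""
    return [p.url for p in pages if sitemap_eligible(p)]
```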

Pro Tip: Rank-O-Saur allows you to quickly read the live robots.txt of any website and immediately open the linked sitemaps with a single click.
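
For a quick scripted check, Python's standard urllib.robotparser can extract Sitemap lines from a robots.txt body through its site_maps() method (Python 3.8+). This is a minimal sketch, not how Rank-O-Saur works internally; sitemaps_from_robots is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return the Sitemap: URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap lines
    return parser.site_maps() or []
```

To inspect a live site instead of a string, call parser.set_url("https://www.rankosaur.com/robots.txt") and parser.read() before site_maps(); that performs a network request.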