🕵️ The Ultimate Guide to Meta Robots Tags

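The Meta Robots tag gives you page-level control over how search engines index your content and present it to users. It is one of the most important tools an SEO professional has for keeping low-value pages out of the index and preventing index bloat. It does not, however, save crawl budget on its own: a crawler still has to fetch a page to read its tag.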

📑 Table of Contents

1. What is a Meta Robots Tag?
2. HTML Code Example
3. Core Directives (Index vs. Follow)
4. Advanced Directives
5. Meta Robots vs. robots.txt
6. Common Mistakes to Avoid

1. What is a Meta Robots Tag?

A meta robots tag is a piece of code placed in the <head> section of a web page. It provides instructions to web crawlers (like Googlebot) regarding whether that specific page should be added to the search engine's index and whether the links on that page should be followed.

2. HTML Code Example

The tag uses the name="robots" attribute to target all crawlers. The content attribute contains the specific instructions (directives) separated by commas.

```html
<!DOCTYPE html>
<html>
<head>
    <title>Internal Search Results</title>
    <meta name="robots" content="noindex, follow">
</head>
<body>
    <!-- Page content -->
</body>
</html>
```

Pro Tip: You can target specific bots by changing the name attribute. For example, <meta name="googlebot" content="noindex"> tells only Google not to index the page, while other search engines such as Bing may still index it.
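To see how a crawler might combine the generic robots tag with a bot-specific one, here is a minimal Python sketch built on the standard library's html.parser. It assumes a crawler obeys both name="robots" and a tag carrying its own name, and that directives are comma-separated and case-insensitive. The class and function names and the example bot names are illustrative only, not part of any search engine's API.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots"> plus one bot-specific name.

    Simplified sketch: only the name and content attributes are read.
    """

    def __init__(self, bot_name: str) -> None:
        super().__init__()
        self.bot_name = bot_name.lower()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        name = attr_map.get("name", "").lower()
        # "robots" applies to every crawler; a bot's own name applies to that bot only.
        if name in ("robots", self.bot_name):
            content = attr_map.get("content", "")
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def directives_for(html: str, bot_name: str) -> list[str]:
    """Return the directives a (hypothetical) crawler named bot_name would apply."""
    parser = MetaRobotsParser(bot_name)
    parser.feed(html)
    return parser.directives


page = """
<head>
  <meta name="robots" content="noindex, follow">
  <meta name="googlebot" content="nosnippet">
</head>
"""
print(directives_for(page, "googlebot"))  # ['noindex', 'follow', 'nosnippet']
print(directives_for(page, "bingbot"))    # ['noindex', 'follow']
```

The googlebot tag only adds to what Google sees; every other crawler in this sketch reads just the generic robots tag.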

3. Core Directives (Index vs. Follow)

The most common values used in the meta robots tag revolve around indexing and link following:

index: Allows the search engine to index the page. (This is the default behavior, so it doesn't actually need to be specified).
noindex: Explicitly tells the search engine not to index the page. If the page is already in the search results, it will be dropped once the crawler fetches the page again and processes the tag.
follow: Tells the crawler to follow the links on the page to discover new URLs and pass link equity. (This is also default behavior).
nofollow: Tells the crawler not to follow any links on this page. (Note: To nofollow a single specific link, use the rel="nofollow" attribute directly on the <a> tag instead).
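Because index and follow are the defaults, only the negative directives actually change anything. The hypothetical helper below sketches that logic in Python, assuming a "no" directive always overrides its positive counterpart.

```python
def effective_behavior(directives: list[str]) -> dict[str, bool]:
    """Resolve index/follow behavior from a list of meta robots directives.

    Assumption for this sketch: index and follow apply by default, and a
    "no" directive overrides its positive counterpart if both appear.
    """
    normalized = {d.strip().lower() for d in directives}
    return {
        # Indexing is allowed unless noindex is present.
        "index": "noindex" not in normalized,
        # Links are followed unless nofollow is present.
        "follow": "nofollow" not in normalized,
    }


print(effective_behavior([]))                     # {'index': True, 'follow': True}
print(effective_behavior(["noindex", "follow"]))  # {'index': False, 'follow': True}
print(effective_behavior(["index", "nofollow"]))  # {'index': True, 'follow': False}
```

Writing "index, follow" explicitly therefore produces the same result as writing nothing at all.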

4. Advanced Directives

Google supports several other directives that control how your page is presented in search results:

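noarchive: Historically prevented Google from showing a "Cached" link in the search results. Google removed cached links in 2024 and no longer uses this directive, though other search engines may still honor it.
nosnippet: Prevents a text snippet or video preview from being shown in the search results. The title link still appears, and a static image thumbnail may still be shown if one is available.
max-snippet:[number]: Specifies the maximum text length (in characters) of a snippet. A value of 0 is equivalent to nosnippet, and -1 means there is no limit.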
max-image-preview:[setting]: Specifies the maximum size of an image preview to be shown (options: none, standard, large).
notranslate: Tells Google not to offer a translation of this page in the search results.
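Value-bearing directives use a name:value form inside the same comma-separated content attribute as the flag directives. This Python sketch (a hypothetical parse_directives helper, not Google's actual parser) shows one way to read them into a dictionary and reject an unknown max-image-preview setting.

```python
VALID_IMAGE_PREVIEW = {"none", "standard", "large"}


def parse_directives(content: str) -> dict[str, object]:
    """Parse a content string such as "max-snippet:50, notranslate".

    Flag directives map to True; name:value directives map to their value.
    """
    result: dict[str, object] = {}
    for raw in content.split(","):
        token = raw.strip().lower()
        if not token:
            continue
        # Split on the first colon only: "max-snippet:50" -> ("max-snippet", "50").
        name, sep, value = token.partition(":")
        name, value = name.strip(), value.strip()
        if not sep:
            result[name] = True
        elif name == "max-snippet":
            # Character count: 0 behaves like nosnippet, -1 means no limit.
            result[name] = int(value)
        elif name == "max-image-preview":
            if value not in VALID_IMAGE_PREVIEW:
                raise ValueError(f"unknown max-image-preview setting: {value!r}")
            result[name] = value
        else:
            result[name] = value
    return result


print(parse_directives("max-snippet:50, max-image-preview:large, notranslate"))
# {'max-snippet': 50, 'max-image-preview': 'large', 'notranslate': True}
```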

5. Meta Robots vs. robots.txt

This is one of the most common and most dangerous mix-ups in technical SEO:

robots.txt is about crawling. Meta Robots is about indexing.

⚠️ Critical Warning: If you add a noindex tag to a page but also block that URL path in your robots.txt file, Google will never see the noindex tag! The bot is blocked from crawling the page, so it cannot read the <head>. The URL can still appear in the search results as a "URL only" result with no description, either because it was already indexed or because other pages link to it.

If your goal is to remove a page from Google, ensure the page is allowed to be crawled in robots.txt so Google can read the noindex command.
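The ordering problem can be modeled with Python's urllib.robotparser: a crawler checks robots.txt first, and only a page it actually fetches can reveal a noindex tag. The crawler_outcome function and its page_directives argument are hypothetical stand-ins for fetching and parsing a real page; this is a simplified model, not how any search engine is implemented.

```python
from urllib.robotparser import RobotFileParser


def crawler_outcome(robots_txt: str, url: str, page_directives: list[str],
                    user_agent: str = "Googlebot") -> str:
    """Model the crawl-then-index order.

    page_directives is a hypothetical input: the meta robots directives the
    page would serve if it were fetched.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        # Blocked: the page is never fetched, so its <head> (and any noindex) is never read.
        return "blocked: noindex never seen"
    if "noindex" in (d.strip().lower() for d in page_directives):
        return "crawled: noindex honored"
    return "crawled: eligible for indexing"


robots = """
User-agent: *
Disallow: /private/
"""
print(crawler_outcome(robots, "https://example.com/private/old-page", ["noindex"]))
# blocked: noindex never seen
print(crawler_outcome(robots, "https://example.com/old-page", ["noindex"]))
# crawled: noindex honored
```

Removing the Disallow line is what lets the second outcome happen for a page under /private/.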

6. Common Mistakes to Avoid

Using noindex on paginated pages: Do not add noindex to pages like /blog/page/2/. Google has indicated that links on pages left on noindex for a long time may eventually stop being followed, so articles linked only from deeper pagination pages can be crawled less often or never discovered at all.
Conflicting Tags: Make sure multiple meta robots tags don't give conflicting instructions (e.g., one plugin generating index and another generating noindex). When rules conflict, Google applies the more restrictive one, so the page will not be indexed.
X-Robots-Tag Headers: Robots directives can also be sent in an X-Robots-Tag HTTP response header, which is useful for PDFs and other non-HTML files. Header and meta tag directives are combined, so a page with an index meta tag but a noindex X-Robots-Tag header will not be indexed.
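The last two points reduce to one rule of thumb: gather every directive the page sends, from all meta robots tags and X-Robots-Tag headers, and a single noindex anywhere keeps it out of the index. A minimal Python sketch, assuming header values carry no bot-name prefix (real headers can also be scoped to one crawler, e.g. googlebot: noindex); is_indexable is a hypothetical helper.

```python
def is_indexable(meta_contents: list[str], x_robots_headers: list[str]) -> bool:
    """Combine every meta robots content value and X-Robots-Tag header value.

    Sketch assumptions: no bot-name prefixes in header values, and conflicts
    resolve to the more restrictive rule, so one noindex anywhere wins.
    """
    directives: set[str] = set()
    for value in meta_contents + x_robots_headers:
        directives.update(d.strip().lower() for d in value.split(","))
    return "noindex" not in directives


# Two plugins emitting conflicting meta tags: noindex wins.
print(is_indexable(["index, follow", "noindex"], []))  # False
# index in the HTML, noindex in the HTTP header: not indexed.
print(is_indexable(["index"], ["noindex"]))            # False
print(is_indexable(["index, follow"], []))             # True
```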