🕵️ The Ultimate Guide to Meta Robots Tags
The Meta Robots tag gives you page-level control over how search engines index and serve your content to users. It is one of the most important tools an SEO professional has for keeping low-value pages out of the index and reducing index bloat.
1. What is a Meta Robots Tag?
A meta robots tag is a piece of code placed in the <head> section of a web page. It
provides instructions to web crawlers (like Googlebot) regarding whether that specific page should be
added to the search engine's index and whether the links on that page should be followed.
2. HTML Code Example
The tag uses the name="robots" attribute to target all crawlers. The content
attribute contains the specific instructions (directives) separated by commas.
<!DOCTYPE html>
<html>
<head>
<title>Internal Search Results</title>
<meta name="robots" content="noindex, follow">
</head>
<body>
<!-- Page content -->
</body>
</html>
Pro Tip: You can target a specific bot by changing the name attribute. For example,
<meta name="googlebot" content="noindex"> tells only Google not to index the
page, while other search engines such as Bing may still index it.
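To audit which robots instructions a page actually carries, the tags can be pulled out of the HTML programmatically. The following is a minimal sketch using Python's standard html.parser module; the class name RobotsMetaCollector, the helper collect_robots_meta, and the sample markup are illustrative assumptions, not part of any SEO tool.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collect the content of <meta> tags addressed to crawlers."""

    def __init__(self, bot_names=("robots", "googlebot")):
        super().__init__()
        # Tag names are compared case-insensitively.
        self.bot_names = {name.lower() for name in bot_names}
        self.found = []  # (name, content) tuples in document order

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        name = attr_map.get("name", "").lower()
        if name in self.bot_names:
            self.found.append((name, attr_map.get("content", "")))


def collect_robots_meta(html, bot_names=("robots", "googlebot")):
    """Return every (name, content) pair aimed at the given bot names."""
    collector = RobotsMetaCollector(bot_names)
    collector.feed(html)
    return collector.found


# Hypothetical page: one generic tag, one Google-specific tag.
sample = """
<html><head>
<meta name="robots" content="noindex, follow">
<meta name="googlebot" content="noindex">
<meta name="description" content="Internal search results">
</head><body></body></html>
"""
print(collect_robots_meta(sample))
```

Passing a different bot_names tuple (for example ("robots", "bingbot")) shows what another crawler would see on the same page.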
3. Core Directives (Index vs. Follow)
The most common values used in the meta robots tag revolve around indexing and link following:
- index: Allows the search engine to index the page. (This is the default behavior, so it doesn't actually need to be specified.)
- noindex: Explicitly tells the search engine not to index the page. If the page is already in the search results, it will be removed upon the next crawl.
- follow: Tells the crawler to follow the links on the page to discover new URLs and pass link equity. (This is also default behavior.)
- nofollow: Tells the crawler not to follow any links on this page. (Note: To nofollow a single specific link, use the rel="nofollow" attribute directly on the <a> tag instead.)
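Because index and follow are defaults, only the negative directives change anything. A minimal sketch of that logic, assuming a hypothetical helper named parse_core_directives that receives the raw content attribute:

```python
def parse_core_directives(content):
    """Return (indexable, follow_links) for a meta robots content string.

    Both values default to True, mirroring the default crawler behavior
    when no directive is present.
    """
    directives = {part.strip().lower() for part in content.split(",") if part.strip()}
    # "none" is a shorthand for "noindex, nofollow"; it is not covered in the
    # list above but is included here so the sketch does not misread it.
    indexable = "noindex" not in directives and "none" not in directives
    follow_links = "nofollow" not in directives and "none" not in directives
    return indexable, follow_links


print(parse_core_directives("noindex, follow"))
print(parse_core_directives(""))
```

The first call reports a page that stays out of the index while its links are still followed; the second shows the defaults for a page with no directives at all.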
4. Advanced Directives
Google supports several other powerful directives to control how your search snippets appear:
- noarchive: Prevents a "Cached" link from being shown in the search results. (Google has since retired its cached-page feature, so this mainly matters for other search engines.)
- nosnippet: Prevents a text snippet or video preview from being shown in the search results (a static title will still appear).
- max-snippet:[number]: Specifies the maximum text length (in characters) of a snippet.
- max-image-preview:[setting]: Specifies the maximum size of an image preview to be shown (options: none, standard, large).
- notranslate: Tells Google not to offer a translation of this page in the search results.
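Several of these directives take a value after a colon, which makes typos easy to miss. The sketch below shows one way to split a content string into name/value pairs and flag obviously invalid settings; the function names parse_snippet_directives and find_problems are assumptions made for illustration, and the checks cover only the two valued directives listed above.

```python
ALLOWED_IMAGE_PREVIEWS = {"none", "standard", "large"}


def parse_snippet_directives(content):
    """Split a content string into a {directive: value} dictionary.

    Directives without a value (such as noarchive) map to True.
    """
    rules = {}
    for part in content.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, sep, value = part.partition(":")
        rules[name.strip()] = value.strip() if sep else True
    return rules


def find_problems(rules):
    """Return a list of human-readable problems found in parsed rules."""
    problems = []
    preview = rules.get("max-image-preview")
    if preview is not None and preview not in ALLOWED_IMAGE_PREVIEWS:
        problems.append(f"unknown max-image-preview setting: {preview}")
    snippet = rules.get("max-snippet")
    if snippet is not None:
        # A missing value parses as True; anything else must be an integer.
        if snippet is True or not snippet.lstrip("-").isdigit():
            problems.append(f"max-snippet is not a number: {snippet}")
    return problems


rules = parse_snippet_directives("max-snippet:50, max-image-preview:large, noarchive")
print(rules)
print(find_problems(rules))
```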
5. Meta Robots vs. robots.txt
This is the most common and dangerous mix-up in technical SEO:
robots.txt is about crawling.
Meta Robots is about indexing.
⚠️ Critical Warning: If you add a noindex tag to a page, but then block
that URL path in your robots.txt file, Google will never see the noindex
tag! The bot is blocked from crawling the page, so it cannot read the
<head>. If the page was already indexed, it might remain in the search results as
a "URL only" result.
If your goal is to remove a page from Google, ensure the page is allowed to be crawled
in robots.txt so Google can read the noindex command.
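The conflict described in the warning above can be checked offline. This is a rough sketch using Python's standard urllib.robotparser; the function name is_noindex_hidden, the sample rules, and the example.com URLs are hypothetical. Note that the standard-library parser uses simple prefix matching and does not replicate every detail of how Google interprets robots.txt (wildcards, for instance), so treat the result as a first-pass check only.

```python
from urllib.robotparser import RobotFileParser


def is_noindex_hidden(robots_txt, url, meta_content, user_agent="Googlebot"):
    """Return True when a page carries noindex but is blocked from crawling.

    In that situation the crawler never fetches the page, so it cannot
    read the noindex directive in the <head>.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = not parser.can_fetch(user_agent, url)
    directives = {part.strip().lower() for part in meta_content.split(",")}
    return blocked and "noindex" in directives


# Hypothetical robots.txt that blocks internal search pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""

print(is_noindex_hidden(ROBOTS_TXT, "https://example.com/search/results", "noindex, follow"))
print(is_noindex_hidden(ROBOTS_TXT, "https://example.com/blog/post", "noindex, follow"))
```

A True result means the Disallow rule should be lifted (at least until the page has dropped out of the index) so the crawler can read the tag.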
6. Common Mistakes to Avoid
- Using noindex on paginated pages: Do not add noindex to pages like /blog/page/2/. This can cause Google to stop crawling deeper into your site architecture, preventing new articles from being discovered.
- Conflicting Tags: Ensure you don't have multiple meta robots tags giving conflicting instructions (e.g., one plugin generating index and another generating noindex). Google will usually default to the most restrictive option.
- X-Robots-Tag Headers: Remember that robots directives can also be sent via the HTTP header (useful for PDFs or non-HTML files). If a page has an index meta tag but a noindex X-Robots-Tag HTTP header, it will not be indexed.
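The "most restrictive wins" behavior across multiple meta tags and the X-Robots-Tag header can be modeled in a few lines. This is a simplified sketch, assuming a hypothetical helper is_indexable that is handed the content strings already extracted from the page and the response; it does not handle header values that carry a user-agent prefix (such as "googlebot: noindex").

```python
def is_indexable(meta_contents, x_robots_tag=None):
    """Return False if any source asks for the page to be kept out of the index.

    meta_contents: content strings from every meta robots tag on the page.
    x_robots_tag:  value of the X-Robots-Tag response header, if present.
    """
    sources = list(meta_contents)
    if x_robots_tag:
        sources.append(x_robots_tag)
    for content in sources:
        directives = {part.strip().lower() for part in content.split(",")}
        # A single restrictive directive overrides any number of "index" values.
        if "noindex" in directives or "none" in directives:
            return False
    return True


# Two plugins emitting conflicting tags: the restrictive one wins.
print(is_indexable(["index, follow", "noindex"]))
# An "index" meta tag combined with a "noindex" header is still not indexable.
print(is_indexable(["index"], x_robots_tag="noindex"))
```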