robots.txt Validator & Tester

Check in seconds whether a robots.txt allows or blocks a URL for Googlebot, AI crawlers, or any custom user-agent. The validator runs in your browser; the optional URL fetch tries a direct request first and falls back to our own first-party proxy only if your browser blocks it (CORS).

Step 1: Load your robots.txt. Fetch it from a URL, upload a file, or paste the rules.

Step 2: Test a URL. Enter the URL or path to test and a user-agent.

How to use the robots.txt tester

Load the rules. Enter a domain and fetch its robots.txt, upload a file, or simply paste the directives into the box.
Enter a URL or path. Type the path you want to check, e.g. /blog/post, or paste a full URL.
Pick a user-agent. Use a quick-pick button or type any crawler name (e.g. GPTBot) to see how that specific bot is treated.
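
Because the tester accepts either a path or a full URL, the input has to be reduced to the part that robots.txt rules are compared against. How the tool does this internally is not documented here; the following is a minimal Python sketch, assuming the match target is the path plus any query string. The function name to_match_target is hypothetical.

```python
from urllib.parse import urlsplit

def to_match_target(url_or_path: str) -> str:
    """Return the path (plus query string, if any) to compare against rules.

    Assumption: scheme, host and fragment play no part in rule matching.
    """
    parts = urlsplit(url_or_path.strip())
    path = parts.path or "/"          # a bare domain means the root path
    if not path.startswith("/"):
        path = "/" + path
    return path + ("?" + parts.query if parts.query else "")
```

With this sketch, both "/blog/post" and "https://example.com/blog/post" reduce to "/blog/post", so either form of input leads to the same verdict.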

How robots.txt matching works

Search engines like Google don't apply robots.txt rules in the order they appear in the file. They first select the group whose User-agent is the most specific match for the crawler, then, within that group, apply the matching rule with the longest path pattern. If an Allow and a Disallow rule are equally specific, the Allow wins. The * wildcard matches any sequence of characters (including none), and $ anchors the pattern to the end of the URL. This validator applies the same logic.
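
The path-matching step described above can be sketched in a few lines of Python. This is an illustration of the stated rules (longest matching pattern wins, Allow wins a tie, * and $ are supported), not the validator's actual source code; the names pattern_to_regex and is_allowed are hypothetical, and the rules are assumed to be (directive, pattern) pairs taken from the already-selected group.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # '*' matches any run of characters (including none); the rest is literal.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))

def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply the longest matching rule; Allow wins when lengths are equal."""
    best_len = -1
    best_allow = True                 # no matching rule means allowed
    for directive, pattern in rules:
        if not pattern:               # an empty pattern matches nothing
            continue
        # re.match anchors at the start of the path, like a prefix match.
        if pattern_to_regex(pattern).match(path) is None:
            continue
        allow = directive.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and allow):
            best_len, best_allow = len(pattern), allow
    return best_allow

rules = [
    ("Disallow", "/private/"),
    ("Allow", "/private/public/"),
    ("Disallow", "/*.pdf$"),
]
print(is_allowed(rules, "/private/notes"))        # False
print(is_allowed(rules, "/private/public/page"))  # True
print(is_allowed(rules, "/docs/file.pdf"))        # False
print(is_allowed(rules, "/docs/file.pdf?x=1"))    # True
```

The last two calls show the effect of $: the pattern /*.pdf$ blocks a URL that ends in .pdf but not one that continues with a query string.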

Want the full picture? Read our complete guide to robots.txt syntax, pattern matching and blocking AI crawlers.

Frequently asked questions

How does the tool decide if a URL is blocked?

It follows Google's matching rules: the most specific (longest) matching User-agent group is chosen, then the matching rule with the longest path pattern wins. On a tie between Allow and Disallow, Allow wins. The * wildcard and the $ end anchor are supported.

How does fetching a robots.txt from a URL work?

The validator itself runs entirely in your browser. The optional fetch tries a direct request first; if your browser blocks it, as it does for most cross-site requests (CORS), the tool falls back to our own first-party proxy on rankosaur.com, which retrieves the file server-side and processes it transiently without storing it. You can always paste or upload the file instead.
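
The direct-then-proxy order can be pictured as a simple fallback. The tool does this in the browser; the sketch below only shows the control flow in Python, with both fetchers passed in as plain functions. The names load_robots_txt, direct and via_proxy are hypothetical, and the proxy's real endpoint is not documented on this page.

```python
from typing import Callable

# A fetcher takes a robots.txt URL and returns the file's text.
Fetcher = Callable[[str], str]

def load_robots_txt(url: str, direct: Fetcher, via_proxy: Fetcher) -> tuple[str, str]:
    """Try the direct request first; use the proxy only if it fails.

    Returns the file text and which route delivered it.
    """
    try:
        return direct(url), "direct"
    except Exception:
        # In a browser, a CORS-blocked request surfaces as a failed fetch.
        return via_proxy(url), "proxy"
```

Keeping the proxy as the second step means a file that the browser can read directly never leaves the browser.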

Can I test AI crawlers like GPTBot or ClaudeBot?

Yes. Type any user-agent, such as GPTBot, ClaudeBot, Google-Extended or CCBot, or use the quick-pick buttons to check whether that specific bot is allowed or blocked.
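
To see why a named AI crawler can get a different verdict from other bots, it helps to look at group selection. Below is a minimal Python sketch under two assumptions that this page does not confirm: a group matches when the crawler name starts with the group's user-agent token (case-insensitively), and the * group is used only when no named group matches. The function names and the sample file are hypothetical.

```python
def parse_groups(text: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lower-cased user-agent token to its (directive, pattern) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current: list[str] = []    # tokens of the group being read
    in_rules = False           # True once the current group has a rule line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()    # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent" and value:
            if in_rules:       # a User-agent line after rules starts a new group
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow") and current:
            in_rules = True
            for token in current:
                groups[token].append((field, value))
    return groups

def select_group(groups: dict[str, list[tuple[str, str]]], crawler: str) -> list[tuple[str, str]]:
    """Pick the rules of the longest matching token, else the '*' group."""
    name = crawler.lower()
    matches = [token for token in groups if token != "*" and name.startswith(token)]
    if matches:
        return groups[max(matches, key=len)]
    return groups.get("*", [])

SAMPLE = """
User-agent: *
Disallow: /admin/

User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /
"""

groups = parse_groups(SAMPLE)
print(select_group(groups, "GPTBot"))     # [('disallow', '/')]
print(select_group(groups, "Googlebot"))  # [('disallow', '/admin/')]
```

In this sample, GPTBot and ClaudeBot share a group that blocks the whole site, while every other crawler falls through to the * group and is only kept out of /admin/.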

Does a Disallow rule remove a page from Google?

No. Disallow only stops crawling; a blocked URL can still be indexed if other pages link to it. To remove a page from the index, use a noindex tag instead, and don't block the page in robots.txt, or Google can't crawl it to see the tag.