How to Write a robots.txt File

Write a correct robots.txt: where it goes, how User-agent, Disallow and Allow work, why Disallow does not hide a page from search, and how to list your sitemap.

By CodingEagles

A robots.txt file tells web crawlers which parts of your site they may request and which they should leave alone. It is a plain text file at the root of your domain, and it follows a small, strict format. The robots.txt generator builds one from a few choices, but writing it correctly means knowing what each line does and, just as important, what robots.txt cannot do.

Where it lives and how it reads

Crawlers look for the file in exactly one place per host: /robots.txt at the root, such as https://example.com/robots.txt. A copy in any subfolder is ignored, and each subdomain (blog.example.com, for instance) needs its own file. Serve it as plain text (Content-Type: text/plain).
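To make the "one place per host" rule concrete, here is a minimal Python sketch that derives the robots.txt URL governing any page. The function name robots_url is hypothetical and used only for illustration; it relies solely on the standard library's urllib.parse.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to page_url (illustrative helper)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/shop/item?id=1"))  # https://example.com/robots.txt
print(robots_url("https://blog.example.com/2024/post"))  # https://blog.example.com/robots.txt
```

Whatever the depth of the page, the result is always the root-level file on that page's own host.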

The file is a series of groups. Each group opens with one or more User-agent lines naming the crawlers it applies to, followed by the rules for them. A minimal file that lets everyone in looks like this:

User-agent: *
Disallow:

The * matches every crawler, and an empty Disallow means nothing is off-limits. To shut everyone out instead, you give Disallow a slash:

User-agent: *
Disallow: /
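One way to confirm how these two files behave is to feed them to Python's standard urllib.robotparser module. This is only a sketch: the helper parser_for and the bot name AnyBot are made up for the example, and real crawlers use their own parsers.

```python
from urllib.robotparser import RobotFileParser


def parser_for(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string (illustrative helper)."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


allow_all = parser_for("User-agent: *\nDisallow:\n")
block_all = parser_for("User-agent: *\nDisallow: /\n")

# "AnyBot" is a placeholder crawler name.
print(allow_all.can_fetch("AnyBot", "https://example.com/page"))  # True
print(block_all.can_fetch("AnyBot", "https://example.com/page"))  # False
```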

Disallow, Allow and matching specific bots

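Within a group, Disallow lists paths a crawler should not fetch, and Allow carves out exceptions inside a disallowed area. Paths are matched as prefixes from the start of the URL path, so Disallow: /cart blocks /cart and everything under it, and also any other path that begins with those characters, such as /cartoon. End the path with a slash (Disallow: /cart/) to limit the rule to that directory. When an Allow and a Disallow both match a URL, major crawlers follow the longer, more specific path.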
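The sketch below checks prefix matching and an Allow exception with Python's standard urllib.robotparser. The paths and the bot name AnyBot are invented for the example. One assumption to note: Python's parser applies the first matching rule in file order, while crawlers that follow RFC 9309 pick the longest matching path, so the Allow line is listed first here to give the same answer under both approaches.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block the cart, but leave its help page crawlable.
ROBOTS_TXT = """
User-agent: *
Allow: /cart/help
Disallow: /cart
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

for path in ["/cart", "/cart/checkout", "/cartoon", "/cart/help", "/products"]:
    allowed = rp.can_fetch("AnyBot", "https://example.com" + path)
    print(f"{path:15} {'allowed' if allowed else 'blocked'}")
```

Note that /cartoon comes out blocked: it starts with /cart, which is exactly the prefix behavior that a trailing slash avoids.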

You can target individual crawlers by giving them their own group. A common pattern blocks one aggressive bot while leaving the rest free:

User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow: /admin/
Disallow: /cart/

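A crawler obeys the most specific group that names it and falls back to the * group only if none does. Groups are not combined: in the file above, AhrefsBot follows its own group and ignores the * rules entirely.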
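As a rough check of group selection, the same two-group file can be run through Python's standard urllib.robotparser. Googlebot stands in here for "any crawler without its own group"; the URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """
User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow: /admin/
Disallow: /cart/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# AhrefsBot has its own group, which blocks the whole site.
print(rp.can_fetch("AhrefsBot", "https://example.com/products"))     # False
# A crawler with no named group falls back to the * group.
print(rp.can_fetch("Googlebot", "https://example.com/products"))     # True
print(rp.can_fetch("Googlebot", "https://example.com/admin/users"))  # False
```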

The trap: robots.txt is not a hiding place

The most common mistake is using Disallow to keep a page out of search results. It does not do that. Disallow only asks well-behaved crawlers not to fetch the page; the URL can still be indexed if something links to it, often showing up with no description because the crawler never read it.

To actually keep a page out of results, do the opposite of what feels natural: allow it to be crawled and add a noindex directive, either a robots meta tag (<meta name="robots" content="noindex">) or an X-Robots-Tag: noindex HTTP header, so the crawler reads the instruction and drops the page. For anything genuinely private, robots.txt is no protection at all, since it is public and advisory; put those pages behind authentication.
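To verify that a page really carries a noindex instruction, something like the following Python sketch could inspect a response you have already fetched. The names is_noindexed and _RobotsMetaFinder are hypothetical, and the check is deliberately simplified: it looks only at a plain X-Robots-Tag header and a meta tag named robots, and ignores crawler-specific variants.

```python
from html.parser import HTMLParser


class _RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def is_noindexed(headers: dict, html: str) -> bool:
    """Return True if the header or the markup asks crawlers not to index."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    finder = _RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in header_directives or "noindex" in finder.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindexed({}, page))                                      # True
print(is_noindexed({"X-Robots-Tag": "noindex"}, "<p>report</p>"))  # True
print(is_noindexed({"Content-Type": "text/html"}, "<p>home</p>"))  # False
```

Remember that the crawler can only see either signal if robots.txt lets it fetch the page in the first place.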

Crawl-delay and sitemaps

Two optional lines round out most files. Crawl-delay asks a crawler to wait a set number of seconds between requests; Bing and some others honor it, but Googlebot ignores it and sets its own crawl rate based on how your server responds. A Sitemap line gives crawlers the absolute URL of your sitemap, which helps them discover your pages:

Sitemap: https://example.com/sitemap.xml

You can list more than one sitemap, one per line.
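Both lines can be read back with Python's standard urllib.robotparser, which is a convenient way to confirm they are written correctly. The file content below is an invented example, and the site_maps() method assumes Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """
User-agent: *
Crawl-delay: 10
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Seconds the * group asks crawlers to wait between requests.
print(rp.crawl_delay("AnyBot"))  # 10
# Every Sitemap line in the file, in order.
print(rp.site_maps())
```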

Publishing it

Save the file as robots.txt and upload it to your web root so it answers at /robots.txt. Test it by visiting that URL directly, and check it with the robots.txt report or tester in your search engine’s webmaster tools before relying on it.

A robots.txt is one of several small files and records that configure how the outside world treats your domain. If you also handle email for it, the same publish-and-verify discipline applies to your SPF record and DMARC record, which control who can send mail as your domain. To build your crawler rules, open the robots.txt generator, pick your mode, and copy the file.

Frequently asked questions

Where do I put robots.txt?
It must live at the root of your domain and be served at /robots.txt, for example https://example.com/robots.txt. Crawlers only check that location. A robots.txt placed in a subfolder is ignored.
Does Disallow remove a page from Google?
No. Disallow stops compliant crawlers from fetching a page, but a blocked URL can still be indexed if other pages link to it. To keep a page out of results, allow crawling and add a noindex meta tag or header, or require authentication.
Can I block one crawler but allow others?
Yes. Each group starts with one or more User-agent lines naming the crawlers it covers. Add a group for the bot you want to restrict, such as User-agent: AhrefsBot followed by Disallow: /, and a separate User-agent: * group for everyone else.
