A robots.txt file tells web crawlers which parts of your site they may request and which they should leave alone. It is a plain text file at the root of your domain, and it follows a small, strict format. The robots.txt generator builds one from a few choices, but writing it correctly means knowing what each line does and, just as important, what robots.txt cannot do.
Where it lives and how it reads
Crawlers look for the file at exactly one place: /robots.txt at the root of the host, such as https://example.com/robots.txt. Each host is treated separately, so a subdomain such as blog.example.com needs its own file. A copy in any subfolder is ignored. Serve it as plain text.
The file is a series of groups. Each group opens with one or more User-agent lines naming the crawlers it applies to, followed by the rules for them. A minimal file that lets everyone in looks like this:
User-agent: *
Disallow:
The * matches every crawler, and an empty Disallow means nothing is off-limits. To shut everyone out instead, you give Disallow a slash:
User-agent: *
Disallow: /
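One way to confirm how these two files are read is Python's standard urllib.robotparser module. The sketch below parses each file from a string, with no network access. The crawler name ExampleBot and the /products/ URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


allow_all = parse_robots("User-agent: *\nDisallow:\n")
block_all = parse_robots("User-agent: *\nDisallow: /\n")

# "ExampleBot" and this URL are hypothetical.
url = "https://example.com/products/"
print(allow_all.can_fetch("ExampleBot", url))  # True
print(block_all.can_fetch("ExampleBot", url))  # False
```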
Disallow, Allow and matching specific bots
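Within a group, Disallow lists paths a crawler should not fetch, and Allow carves out exceptions inside a disallowed area. Paths are matched as prefixes from the start of the URL path, so Disallow: /cart blocks /cart and everything under it, and also any other path that begins with the same characters, such as /cartoons. Add a trailing slash (Disallow: /cart/) to limit the rule to the contents of that directory.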
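As a rough illustration of prefix matching and an Allow exception, the sketch below runs a small rule set through Python's urllib.robotparser. The /account/ paths and the ExampleBot name are hypothetical. Allow is listed before Disallow on purpose: urllib.robotparser applies the first rule that matches, while major search crawlers apply the longest matching rule, and this ordering gives the same answer under both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: /account/ is blocked except for its help page,
# and /cart is blocked as a bare prefix (no trailing slash).
rules = """\
User-agent: *
Allow: /account/help
Disallow: /account/
Disallow: /cart
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

paths = ["/cart", "/cart/checkout", "/cartoons", "/account/settings", "/account/help"]
for path in paths:
    verdict = "allowed" if parser.can_fetch("ExampleBot", path) else "blocked"
    print(f"{path}: {verdict}")
# /cart: blocked
# /cart/checkout: blocked
# /cartoons: blocked
# /account/settings: blocked
# /account/help: allowed
```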
You can target individual crawlers by giving them their own group. A common pattern blocks one aggressive bot while leaving the rest free:
User-agent: AhrefsBot
Disallow: /
User-agent: *
Disallow: /admin/
Disallow: /cart/
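A crawler obeys only the most specific group that names it, and falls back to the * group if none does. Groups are not combined: a crawler with its own group ignores the * rules entirely, so repeat any shared rules inside that group.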
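The group selection described above can be checked with a short script. This is a minimal sketch using Python's urllib.robotparser on the two-group file from this section; ExampleBot stands in for any crawler that has no group of its own, and the URLs are made up.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow: /admin/
Disallow: /cart/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# AhrefsBot matches its own group and is shut out everywhere.
print(parser.can_fetch("AhrefsBot", "/blog/"))         # False
# ExampleBot (hypothetical) has no group, so the * group applies.
print(parser.can_fetch("ExampleBot", "/blog/"))        # True
print(parser.can_fetch("ExampleBot", "/admin/users"))  # False
```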
The trap: robots.txt is not a hiding place
The most common mistake is using Disallow to keep a page out of search results. It does not do that. Disallow only asks well-behaved crawlers not to fetch the page; the URL can still be indexed if something links to it, often showing up with no description because the crawler never read it.
To actually keep a page out of results, do the opposite of what feels natural: allow it to be crawled and add a noindex robots meta tag or an X-Robots-Tag HTTP header, so the crawler reads the instruction and drops the page. For anything genuinely private, robots.txt is no protection at all, since it is public and advisory; put those pages behind authentication.
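In practice the instruction takes one of two forms: a tag such as <meta name="robots" content="noindex"> in the page head, or a response header such as X-Robots-Tag: noindex. The sketch below is one possible way to check whether a page you have already fetched carries either. The function name has_noindex and the helper class are made up for this example, and the check is deliberately simple: it does not handle directives scoped to a single crawler.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            self.directives.append((attr.get("content") or "").lower())


def has_noindex(headers: dict, html: str) -> bool:
    """Return True if the X-Robots-Tag header or a robots meta tag says noindex."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noindex" in directive for directive in finder.directives)


page = '<html><head><meta name="robots" content="noindex"></head></html>'
print(has_noindex({}, page))                          # True
print(has_noindex({"X-Robots-Tag": "noindex"}, ""))   # True
print(has_noindex({}, "<html><head></head></html>"))  # False
```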
Crawl-delay and sitemaps
Two optional lines round out most files. Crawl-delay asks a crawler to wait a set number of seconds between requests; Bing and some others honor it, but Googlebot ignores it and sets its own crawl rate based on how your server responds. A Sitemap line gives crawlers the absolute URL of your sitemap, which helps them discover your pages:
Sitemap: https://example.com/sitemap.xml
You can list more than one sitemap, one per line.
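To see how a parser picks up these two lines, here is a small sketch with Python's urllib.robotparser. The 10-second delay, the ExampleBot name, and the second sitemap URL are assumptions made for the example. The site_maps() method needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Crawl-delay: 10
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Delay in seconds for a crawler that falls under the * group.
print(parser.crawl_delay("ExampleBot"))  # 10

# Every Sitemap line in the file, in order.
print(parser.site_maps())
# ['https://example.com/sitemap.xml', 'https://example.com/sitemap-news.xml']
```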
Publishing it
Save the file as robots.txt and upload it to your web root so it answers at /robots.txt. Test it by visiting that URL directly, and check it with your search console’s robots.txt report or testing tool before relying on it.
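The direct visit can also be scripted. The sketch below, using only Python's standard library, works out the robots.txt URL for any page on a site and checks that the file answers with status 200 as plain text. The function names robots_url and check_published are made up for this example, and the sample page URL is hypothetical.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    return urljoin(page_url, "/robots.txt")


def check_published(page_url: str, timeout: float = 10.0) -> bool:
    """Fetch robots.txt and report whether it is served as text/plain.

    urlopen raises urllib.error.HTTPError if the file is missing (404).
    """
    with urlopen(robots_url(page_url), timeout=timeout) as response:
        content_type = response.headers.get_content_type()
        return response.status == 200 and content_type == "text/plain"


print(robots_url("https://example.com/shop/item?id=7"))
# https://example.com/robots.txt
```

Calling check_published with the address of your own site performs the live request; it is left out of the example so the block runs without network access.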
A robots.txt is one of several small files and records that configure how the outside world treats your domain. If you also handle email for it, the same publish-and-verify discipline applies to your SPF record and DMARC record, which control who can send mail as your domain. To build your crawler rules, open the robots.txt generator, pick your mode, and copy the file.