Punycode is a way to write a Unicode domain name using only the characters DNS allows: the letters a to z, the digits 0 to 9 and the hyphen. A label with accents or non-Latin script is converted into an ASCII string that starts with xn--, so münchen becomes xn--mnchen-3ya. The Punycode converter encodes any domain into this form and decodes it back to readable Unicode.
Here is why the encoding exists and how to read it.
DNS only speaks ASCII
The domain name system was designed decades ago around a narrow character set. A label can hold letters, digits and hyphens, and nothing else. That was fine when domains were English words, but it left out most of the world’s scripts: no accents, no Cyrillic, no Arabic, no Chinese.
Rather than rebuild DNS, the internet standards added a translation layer. A domain with non-ASCII characters, an internationalized domain name or IDN, is encoded into a plain ASCII label that the existing DNS can store and resolve. Punycode is that encoding, and the IDNA standard is the set of rules around it.
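The letters, digits and hyphens rule can be checked in a few lines. The following is a minimal sketch, not a full hostname validator: the name `is_ldh_label` is made up for illustration, and it deliberately ignores the other DNS constraints such as label length and the position of hyphens.

```python
import re

# Only ASCII letters, digits and the hyphen are accepted.
LDH_LABEL = re.compile(r"[A-Za-z0-9-]+")

def is_ldh_label(label: str) -> bool:
    """Return True if the label uses only letters, digits and hyphens.

    Hypothetical helper for illustration; real validation has more rules.
    """
    return LDH_LABEL.fullmatch(label) is not None

print(is_ldh_label("example"))         # True
print(is_ldh_label("münchen"))         # False
print(is_ldh_label("xn--mnchen-3ya"))  # True
```

The Unicode label fails the check, while its encoded form passes, which is why the encoded form is what DNS stores.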
How the xn-- form is built
Punycode separates a label into two parts. The basic ASCII characters are kept as they are, in their original order. The non-ASCII characters are encoded into a compact suffix of letters and digits, appended after a hyphen. IDNA then adds the xn-- prefix to the whole label so software recognises it as an encoded name.
Take münchen:
münchen → xn--mnchen-3ya
The mnchen part is the ASCII letters in order. The 3ya after the final hyphen is the encoded information that says where the ü goes and which character it is. The algorithm, set out in RFC 3492, is reversible, so decoding xn--mnchen-3ya gives back münchen exactly.
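The round trip can be reproduced with Python's built-in `punycode` codec, which applies the RFC 3492 algorithm without the prefix. This is a small sketch for experimenting, assuming the prefix is added and stripped by hand; it skips the normalisation and validation that the IDNA rules require.

```python
label = "münchen"

# The "punycode" codec produces the RFC 3492 output without any prefix.
encoded = label.encode("punycode").decode("ascii")
print(encoded)      # mnchen-3ya

# The xn-- prefix is added separately to mark the label as encoded.
ace_label = "xn--" + encoded
print(ace_label)    # xn--mnchen-3ya

# Decoding strips the prefix and reverses the algorithm.
decoded = ace_label[len("xn--"):].encode("ascii").decode("punycode")
print(decoded)      # münchen
```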
Each label of a domain is encoded on its own. In café.example.com, only café becomes xn--caf-dma; the example and com labels are already ASCII and pass through unchanged.
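Label-by-label handling can be sketched as follows. The function names `to_ascii` and `to_unicode` are hypothetical, chosen for this example, and the code assumes the input is already lower-case and normalised, which a production IDNA library would take care of.

```python
def to_ascii(domain: str) -> str:
    """Encode each non-ASCII label on its own; ASCII labels pass through."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)

def to_unicode(domain: str) -> str:
    """Decode each xn-- label on its own; other labels pass through."""
    labels = []
    for label in domain.split("."):
        if label.lower().startswith("xn--"):
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)

print(to_ascii("café.example.com"))           # xn--caf-dma.example.com
print(to_unicode("xn--caf-dma.example.com"))  # café.example.com
```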
Where you actually meet it
You rarely type Punycode, but it surfaces in several places:
- Registrars and DNS zones. When you register an IDN, the registrar stores the xn-- form. Your DNS records use it too.
- TLS certificates. A certificate for an international domain lists the ASCII Punycode name in its subject and SAN fields.
- Email headers and logs. Mail servers and server logs often show the encoded label rather than the Unicode one.
- Browser address bars. Browsers display the Unicode form when it is safe, and fall back to the raw xn-- form when a label mixes scripts in a way that could be deceptive.
That last point is the one worth understanding.
The homograph problem
Many characters in different scripts look almost identical. A Cyrillic а is hard to tell from a Latin a. An attacker can register a domain that uses one of these lookalikes so the Unicode display looks exactly like a well-known brand, while the actual domain is different. This is a homograph attack.
Punycode is not the cause; it is part of the cure. Because the real domain is always the ASCII xn-- string, you can decode a suspicious link and see the genuine characters underneath. If a domain that displays as a familiar name decodes to a label that mixes characters from different scripts, that is a red flag. Browsers automate part of this by refusing to show the Unicode form for risky labels.
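Decoding a suspicious label and naming each character makes the lookalikes visible. The sketch below is only a rough illustration: `describe_label` and `rough_scripts` are hypothetical helpers, and taking the first word of the Unicode character name as the "script" is a crude stand-in for the proper script data that browsers use.

```python
import unicodedata

def describe_label(ace_label: str) -> list:
    """Decode one xn-- label and pair each character with its Unicode name."""
    text = ace_label[len("xn--"):].encode("ascii").decode("punycode")
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in text]

def rough_scripts(ace_label: str) -> set:
    """Crude heuristic: use the first word of each character name as its script."""
    return {name.split()[0] for _, name in describe_label(ace_label)}

# Build a lookalike label: Cyrillic "а" (U+0430) followed by Latin "pple".
suspicious = "xn--" + "\u0430pple".encode("punycode").decode("ascii")

for ch, name in describe_label(suspicious):
    print(repr(ch), name)

# More than one script in a single label is worth a closer look.
print(len(rough_scripts(suspicious)) > 1)   # True
```

The first character prints as CYRILLIC SMALL LETTER A while the rest are Latin letters, which is the mixed-script signal described above.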
Reading and writing Punycode
When you need the ASCII form to publish a record or configure a server, or you want to check what an xn-- domain really says, paste it into the Punycode converter. Encoding turns a Unicode domain into its xn-- labels, and decoding turns an xn-- domain back into the characters it stands for. The conversion runs in your browser, so the domains you check are never sent anywhere, which is the safe way to inspect a link you are not sure about.