What Is Punycode

Punycode is a way to write a Unicode domain name using only the characters DNS allows: the letters a to z, the digits 0 to 9 and the hyphen. A label with accents or non-Latin script is converted into an ASCII string that starts with xn--, so münchen becomes xn--mnchen-3ya. The Punycode converter encodes any domain into this form and decodes it back to readable Unicode.

Here is why the encoding exists and how to read it.

DNS only speaks ASCII

The domain name system was designed in the 1980s around a narrow character set. A label can hold letters, digits and hyphens, and nothing else. That was fine when domains were English words, but it left out most of the world’s scripts: no accents, no Cyrillic, no Arabic, no Chinese.

Rather than rebuild DNS, the internet standards added a translation layer. A domain with non-ASCII characters, an internationalized domain name or IDN, is encoded into a plain ASCII label that the existing DNS can store and resolve. Punycode is that encoding, and the IDNA standard is the set of rules around it.
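The letters, digits and hyphens rule is simple enough to check in a few lines. The Python sketch below is only an illustration: the helper name is_ldh_label is made up for this example, and the 63-character cap is the general DNS label limit, not something specific to Punycode.

```python
import re

# Letters, digits and hyphens only; 1 to 63 characters (the DNS label limit).
# The character class is spelled out so accented letters do not match.
LDH_LABEL = re.compile(r"[A-Za-z0-9-]{1,63}")

def is_ldh_label(label: str) -> bool:
    """Return True if the label uses only characters plain DNS accepts."""
    return LDH_LABEL.fullmatch(label) is not None

print(is_ldh_label("example"))         # True
print(is_ldh_label("münchen"))         # False: ü is outside the allowed set
print(is_ldh_label("xn--mnchen-3ya"))  # True: the encoded form passes
```

The encoded label passes the same check as any ordinary ASCII label, which is why existing DNS software needs no changes to carry it.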

How the xn-- form is built

Punycode separates a label into two parts. The basic ASCII characters are kept as they are. The non-ASCII characters are encoded into a compact suffix of letters and digits, and the whole label is given the xn-- prefix so software recognises it.

Take münchen:

münchen   →   xn--mnchen-3ya

The mnchen part is the ASCII letters in order. The 3ya after the final hyphen is the encoded information that says where the ü goes and which character it is. The algorithm, set out in RFC 3492, is reversible, so decoding xn--mnchen-3ya gives back münchen exactly.
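The round trip can be reproduced with Python's built-in punycode codec, which implements the raw RFC 3492 algorithm without the prefix. The sketch below adds and strips xn-- by hand; encode_label and decode_label are illustrative names, and the code skips the normalisation and validity checks that full IDNA processing would apply.

```python
ACE_PREFIX = "xn--"

def encode_label(label: str) -> str:
    """Encode one label with raw Punycode and add the xn-- prefix."""
    if label.isascii():
        return label  # already ASCII, nothing to encode
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def decode_label(label: str) -> str:
    """Strip the xn-- prefix and decode the rest back to Unicode."""
    if not label.lower().startswith(ACE_PREFIX):
        return label  # not an encoded label
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

print(encode_label("münchen"))         # xn--mnchen-3ya
print(decode_label("xn--mnchen-3ya"))  # münchen
```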

Each label of a domain is encoded on its own. In café.example.com, only café becomes xn--caf-dma; the example and com labels are already ASCII and pass through unchanged.
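For a whole domain, Python's built-in idna codec splits on the dots and converts each label separately. This is a minimal sketch with illustrative function names. Note that the built-in codec follows the older IDNA 2003 rules; the third-party idna package implements IDNA 2008 and may treat some characters differently.

```python
def to_ascii(domain: str) -> str:
    """Convert a Unicode domain to its ASCII form, label by label."""
    return domain.encode("idna").decode("ascii")

def to_unicode(domain: str) -> str:
    """Convert an ASCII domain back to Unicode, label by label."""
    return domain.encode("ascii").decode("idna")

print(to_ascii("café.example.com"))           # xn--caf-dma.example.com
print(to_unicode("xn--caf-dma.example.com"))  # café.example.com
```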

Where you actually meet it

You rarely type Punycode, but it surfaces in several places:

Registrars and DNS zones. When you register an IDN, the registrar stores the xn-- form. Your DNS records use it too.
TLS certificates. A certificate for an international domain lists the ASCII Punycode name in its subject and SAN fields.
Email headers and logs. Mail servers and server logs often show the encoded label rather than the Unicode one.
Browser address bars. Browsers display the Unicode form when it is safe, and fall back to the raw xn-- form when a label mixes scripts in a way that could be deceptive.

That last point is the one worth understanding.

The homograph problem

Many characters in different scripts look almost identical. A Cyrillic а is hard to tell from a Latin a. An attacker can register a domain that uses one of these lookalikes so the Unicode display looks exactly like a well-known brand, while the actual domain is different. This is a homograph attack.

Punycode is not the cause, and it is a useful diagnostic. Because the name DNS actually resolves is always the ASCII xn-- string, you can convert a suspicious link and see what is underneath. If a domain that displays as a familiar name turns out to have an xn-- form, or decodes to a label that mixes scripts, that is a red flag. Browsers automate part of this by refusing to show the Unicode form for risky labels.
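A rough version of the mixed-script check can be sketched in Python. The helper scripts_in_label is hypothetical, and it uses the first word of each character's Unicode name as a stand-in for its script. That is a heuristic, not the real Unicode Script property, and it is far simpler than the rules browsers apply. It also cannot catch a spoof written entirely in one lookalike script.

```python
import unicodedata

def scripts_in_label(label: str) -> set:
    """Return rough script names for the letters in a label.

    Accepts either the Unicode form or the xn-- form of a single label.
    """
    if label.lower().startswith("xn--"):
        label = label.encode("ascii").decode("idna")
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]  # e.g. LATIN, CYRILLIC
        for ch in label
        if ch.isalpha()
    }

# A label that looks like "apple" but starts with Cyrillic а (U+0430).
spoof = "\u0430pple".encode("idna").decode("ascii")
print(sorted(scripts_in_label(spoof)))    # ['CYRILLIC', 'LATIN']
print(sorted(scripts_in_label("apple")))  # ['LATIN']
```

More than one script in a single label does not prove an attack, since some legitimate names mix scripts, but it is a reasonable signal to look more closely.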

Reading and writing Punycode

When you need the ASCII form to publish a record or configure a server, or you want to check what an xn-- domain really says, paste it into the Punycode converter. Encoding turns a Unicode domain into its xn-- labels, and decoding turns an xn-- domain back into the characters it stands for. The conversion runs in your browser, so the domains you check are never sent anywhere, which is the safe way to inspect a link you are not sure about.

Frequently asked questions

What is Punycode?

Punycode is a way to represent Unicode characters in a domain name using only ASCII letters, digits and the hyphen. A label with non-ASCII characters is converted into a string that starts with xn--, so münchen becomes xn--mnchen-3ya. It is defined in RFC 3492 and lets international domains travel through the existing DNS unchanged.

Why does a domain start with xn--?

The xn-- prefix is the ACE prefix, short for ASCII-Compatible Encoding. It marks a label as Punycode so software knows to decode the rest back to Unicode for display. If you see xn-- in a domain, certificate or email header, the label holds non-ASCII characters underneath, such as accented letters or another script.

Is Punycode a security risk?

Punycode itself is just an encoding, but attackers use lookalike characters from other scripts to build domains that resemble real ones, a homograph attack. Most browsers defend against this by showing the xn-- form when a label mixes scripts. Decoding a suspicious xn-- domain lets you see the actual characters it contains.
