Hi!
I'm happy to announce that I just merged support for Unicode domain names (IDNA) in WebURL.
It's available right now, on the main branch, and it's a big milestone for a lot of reasons.
Note: IDNA support is on main. There will be a release in the ~next few weeks which includes it, but for now you should use a branch-based dependency for IDNA.
With this post, I'd like to discuss what IDNA is, why you should support it, how WebURL supports it, and why I think that's such a big deal.
What is IDNA?
Firstly, IDNA stands for "Internationalizing Domain Names for Applications". It is defined by Unicode Technical Standard 46, and this is how they describe it:
One of the great strengths of domain names is universality. The URL http://Apple.com goes to Apple's website from anywhere in the world, using any browser. [...]

Initially, domain names were restricted to ASCII characters. This was a significant burden on people using other characters. Suppose, for example, that the domain name system had been invented by Greeks, and one could only use Greek characters in URLs. Rather than apple.com, one would have to write something like αππλε.κομ. An English speaker would not only have to be acquainted with Greek characters, but would also have to pick those Greek letters that would correspond to the desired English letters. One would have to guess at the spelling of particular words, because there are not exact matches between scripts.

Most of the world’s population faced this situation until recently, because their languages use non-ASCII characters. A system was introduced in 2003 for internationalized domain names (IDN). This system is called Internationalizing Domain Names for Applications. [...]
In a nutshell, it's Unicode - in domain names. And it's good for the reasons Unicode is good. It means you can have URLs like:
- http://招商银行.中国 (China Merchant Bank)
- http://中国移动.中国 (China Mobile)
- https://국립중앙도서관.한국 (National Library of South Korea)
- http://스타벅스코리아.com (Starbucks South Korea)
- https://日本語.jp (Japan Registry Services)
- http://全国温泉ガイド.jp (JRS National Hot Spring Guide)
Each top-level domain (TLD) sets its own limits on which characters may be registered - for example, JPRS, responsible for Japan's .jp country-code TLD, have decided that only ASCII, Kanji, Hiragana, and Katakana symbols may be used in a .jp domain. Many TLDs restrict or forbid Emoji in domain names, but some are less proscriptive, such as the .fm and .ws TLDs:
- https://💩.la (Panic Blog » The World’s First Emoji Domain)
- https://
.fm ('It is what it is' - it's complicated)
- http://🦄.ws ('Emojizing' - a link shortener service)
More information: Wikipedia - Emoji Domain
So no Emoji URLs for me? 🥺
Fret not! IDNA applies to the entire domain - including subdomains! So you can totally have emoji subdomains on regular, ASCII domains you already own. And they will work in URLs.
Sadly, none of these point anywhere, but they're all technically valid! Alternatively, you could add localized variants of existing subdomains, such as support domains:
- https://help.example.com/
- https://ヘルプ.example.com/
- https://帮助.example.com/
- https://يساعد.example.com/
- https://עֶזרָה.example.com/
- etc...
How does WebURL support IDNA?
Now that WebURL supports IDNA, all of the above URL strings can be successfully parsed by the WebURL.init?(String) initializer, and the .hostname property setter accepts Unicode domains. Previously, these operations would fail.
let url = WebURL("https://국립중앙도서관.한국/")! // ✅ works

var otherURL = WebURL("https://example.com/")!
otherURL.hostname = "日本語.jp" // ✅ works
Now, URLs do not really support Unicode - it turns out to be quite important that URL strings are always plain ASCII, and DNS is far more restrictive even than that. So how does it work now that URL components can have Unicode contents?
Similarly to the way WebURL normalizes other URL components (such as by collapsing .. segments in paths or adding percent-encoding), Unicode domains are normalized using the ToASCII algorithm defined by UTS46, with parameters defined by the URL Standard. The algorithm normalizes, case-folds, and applies compatibility mappings to the domain, before encoding it as ASCII using an encoding format known as Punycode.
All of that happens automatically when you perform either of the above operations. After parsing a URL or setting its hostname, you will find that it has been converted to ASCII.
Unicode | ASCII |
---|---|
http://招商银行.中国 | http://xn--czrx92avj3aruk.xn--fiqs8s/ |
https://국립중앙도서관.한국/ | https://xn--zb0b2h01ozygv9j7lgn8g.xn--3e0b707e/ |
https://we❤️swift.fm | https://xn--weswift-z98d.fm/ |
https://🛍.example.com/ | https://xn--878h.example.com/ |
let url = WebURL("https://招商银行.中国/")!
print(url.hostname) // "xn--czrx92avj3aruk.xn--fiqs8s"
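To make the final Punycode step less mysterious, here is a minimal sketch of the encoder described in RFC 3492, written in plain Swift with no dependencies. It is not WebURL's implementation: the names `punycodeEncode` and `asciiLabel` are made up for this illustration, the input is assumed to be already mapped and normalized (none of the UTS46 processing is done here), and the RFC's overflow checks are left out on the assumption of a 64-bit `Int` and short labels.

```swift
/// Encodes a single label as Punycode (RFC 3492), without the "xn--" prefix.
/// Assumes the label has already been mapped and normalized.
/// Overflow checks from the RFC are omitted (assumes 64-bit Int, short labels).
func punycodeEncode(_ label: String) -> String {
    let base = 36, tMin = 1, tMax = 26, skew = 38, damp = 700

    // Maps a digit value to its character: 0...25 -> a...z, 26...35 -> 0...9.
    func digit(_ value: Int) -> Character {
        return Character(UnicodeScalar(UInt8(value < 26 ? value + 97 : value + 22)))
    }

    // Bias adaptation (RFC 3492, section 6.1).
    func adapt(_ delta: Int, _ numPoints: Int, _ firstTime: Bool) -> Int {
        var delta = firstTime ? delta / damp : delta / 2
        delta += delta / numPoints
        var k = 0
        while delta > ((base - tMin) * tMax) / 2 {
            delta /= base - tMin
            k += base
        }
        return k + ((base - tMin + 1) * delta) / (delta + skew)
    }

    let input = label.unicodeScalars.map { Int($0.value) }

    // ASCII code points are copied to the output first, in their original order.
    var output = ""
    for scalar in label.unicodeScalars where scalar.isASCII {
        output.unicodeScalars.append(scalar)
    }
    let basicCount = output.unicodeScalars.count
    if basicCount > 0 { output.append("-") }

    var n = 128, delta = 0, bias = 72, handled = basicCount
    while handled < input.count {
        // The smallest code point that has not been handled yet.
        let m = input.filter { $0 >= n }.min()!
        delta += (m - n) * (handled + 1)
        n = m
        for c in input {
            if c < n { delta += 1 }
            if c == n {
                // Write delta as a variable-length integer in base 36.
                var q = delta
                var k = base
                while true {
                    let t = k <= bias ? tMin : (k >= bias + tMax ? tMax : k - bias)
                    if q < t { break }
                    output.append(digit(t + (q - t) % (base - t)))
                    q = (q - t) / (base - t)
                    k += base
                }
                output.append(digit(q))
                bias = adapt(delta, handled + 1, handled == basicCount)
                delta = 0
                handled += 1
            }
        }
        delta += 1
        n += 1
    }
    return output
}

/// Leaves ASCII labels alone and prefixes encoded labels with "xn--".
func asciiLabel(_ label: String) -> String {
    let isASCII = label.unicodeScalars.allSatisfy { $0.isASCII }
    return isASCII ? label : "xn--" + punycodeEncode(label)
}

let host = "caf\u{00E9}.fr".split(separator: ".")
    .map { asciiLabel(String($0)) }
    .joined(separator: ".")
print(host) // xn--caf-dma.fr
```

In practice you would never call something like this yourself. WebURL runs the full ToASCII algorithm (mapping, normalization, validation, and then Punycode) whenever it parses a URL or sets a hostname.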
There are a couple of interesting things to point out:
- Unicode text is normalized and case-folded before Punycode.

  We all know that Unicode is complex. The strings caf\u{00E9}.fr and cafe\u{0301}.fr do not contain the same bytes, nor do they even contain the same scalars, but something called "Unicode Canonical Equivalence" says we should treat these as the same string.

  So how does that apply to domains? Do routers and caches need to check canonical equivalence to tell if two domains are the same? Does SSL certificate validation depend on Unicode canonical equivalence?

  No. With IDNA, canonically-equivalent strings (or strings which differ only in case) produce the same ASCII result. It doesn't matter whether the hostname is caf\u{00E9}.fr, cafe\u{0301}.fr, or CAFE\u{0301}.fr - they all return http://xn--caf-dma.fr/. All of your expectations about how URLs and ASCII strings work are maintained, and you can more-or-less forget that it represents Unicode contents.

- IDNA is applied per-label.

  Notice how 🛍.example.com became xn--878h.example.com in our subdomain example? IDNA applies to each segment of a domain individually (known as a "label"), so any code or routing rules which expect to see *.example.com will continue to see that.

  The theme here is compatibility - essentially, the Unicode portions (and only those portions) end up as funny-looking ASCII segments, and everything "just works".

- URL and URLSession.

  Because IDNA is designed to support legacy systems, the Unicode -> ASCII conversion is all handled by WebURL. And since WebURL has excellent interoperability with Foundation, you can now make requests to Unicode domains using URLSession and other Foundation APIs.

  let page = try! String(contentsOf: WebURL("http://招商银行.中国")!) // ✅ Works
  let (data, _) = try await URLSession.shared.data(from: WebURL("https://日本語.jp")!) // ✅

  The same applies to our WebURL-native port of async-http-client, which performs true, web-compatible URL processing throughout the entire request process. It now also supports IDNA.

- And it back deploys.

  WebURL supports all Apple platforms, Linux, and Windows. It is a pure package implementation, so you can guarantee IDNA support for users no matter which OS they're running. It only requires a Swift 5.3+ compiler to build, so anybody should be able to use it.
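If you'd like to see the first two points for yourself, the following sketch uses only Foundation. It is a rough stand-in, not the UTS46 mapping: lowercasing each label and applying NFC happens to be enough for the café example, but the real mapping table does considerably more. The helper names `scalars` and `roughlyMapped` are invented for this illustration.

```swift
import Foundation

// Returns the Unicode scalar values of a string, so exact contents can be compared.
// (Swift's String == already uses canonical equivalence, which would hide the difference.)
func scalars(_ s: String) -> [UInt32] {
    return s.unicodeScalars.map { $0.value }
}

// A rough stand-in for UTS46 mapping: split into labels, lowercase each one,
// then apply NFC. This is an assumption made for illustration only.
func roughlyMapped(_ domain: String) -> [String] {
    return domain.split(separator: ".").map {
        $0.lowercased().precomposedStringWithCanonicalMapping
    }
}

let variants = ["caf\u{00E9}.fr", "cafe\u{0301}.fr", "CAFE\u{0301}.fr"]
let mapped = variants.map { roughlyMapped($0).map(scalars) }

// The raw strings do not contain the same scalars...
print(scalars(variants[0]) == scalars(variants[1])) // false
// ...but after mapping, every variant has identical labels, scalar for scalar.
print(mapped[0] == mapped[1] && mapped[1] == mapped[2]) // true
```

Because the labels are identical before they reach the Punycode step, the ASCII results are identical too, and the plain "fr" label passes through untouched.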
Why is it such a big deal for WebURL to support IDNA?
IDNA has had a bit of a rough launch. The first IDN TLDs were approved in 2010, and since then progress has been... a bit mixed:
It's a little underwhelming, but there are some technical issues which may shed light on why that is. I'd like to draw your attention to this infographic by IDN World Report (a joint research project from the EU, Coordination Center, and UNESCO):
And specifically to this part:
Currently, software support for IDNA is poor. As is tradition for anything URL-related, there are multiple, incompatible revisions of the standard, with some domains producing different results depending on which revision you use, or being valid in one but not the other.
Yeah. This again. Uhhhhhh
We see these differences in browsers - German speakers can't have 'ß' in domains (Chrome turns it into 'ss' but Safari turns it into 'xn--zca'), and URLs such as https://www.👨🦰.tk/ only work in Chrome (but should be considered invalid). The current state of the browsers is that Safari is fully compliant with the latest version of the standard, followed by Firefox, and Chrome (the browser with by far the largest market share worldwide) is the least compliant. That's why the major browser developers have decided to make alignment on IDNA an Interop-22 priority item for the web.
But take another look at the devices in that picture. Seem familiar?
Hey - wait a second! Servers? Smartphones? That's our house!!!
It even has a notch!
So yeah - besides browsers, smartphones have, of course, been the primary driver of increased internet connectivity across the entire world over the last decade while IDNA was seeking greater adoption - and yet, core system frameworks such as Foundation on iOS have lacked support. That should underscore just how poor the IDNA experience has been across the industry so far - browsers may or may not work (can you expect that somebody will be able to open an IDN URL from an email? Who knows? Probably not), and if you used an iPhone or iPad, you could basically guarantee that nothing would support IDNA unless the developer went to an exceptional amount of effort to specifically support it. And the same is true for Swift on Server.
When you consider all of this, I think it's clear that IDNA hasn't been given a fair shake. Maybe it'll catch on, or maybe it won't - but that should be decided by its ability to make life better for the people it is designed to include, not by unreliable software killing the project with a lack of interoperability.
And now?
It was recently announced that Foundation.URL will gain support for IDNA with the next major OS release, expected later this year, which is fantastic news.
WebURL offers something extra - not only the latest URL standard, but now also backwards deployable support for IDNA, building on all the work we've done for interoperability with existing Foundation.URL code-bases. And because it's actually a Swift-native implementation, we can offer more advanced APIs for analyzing and rendering Unicode domains.
And it's available today. Right now.
So that's why I think this is a big deal. With package-based IDNA support that plugs in to existing applications, there is no longer any reason for Swift applications to not support IDNA.
At the start of this year, IDNA support across the industry was pretty thin. But now, with a renewed focus from browser vendors to align their implementations to the latest standard, and with IDNA support coming in Apple's system libraries, and backwards-deployable with WebURL, IDNA's future is looking much more promising.
I encourage you all to check whether your applications can support IDNA, and it's yet another reason to try out WebURL in your apps.