WebURL 0.4.0 released! Now with IDN support

0.4.0

This is a big one!

  • WebURL now supports Internationalised Domain Names (IDNs).
  • The URL host parser is now exposed as API, so you can parse hostnames like URLs do.
  • There is a new Domain type, which supports rich processing of domains/IDNs.

IDN support was the missing piece. Now that it is done, we can say:

:tada: WebURL fully conforms to the WHATWG URL Standard :tada:

[GitHub]


:globe_with_meridians: Internationalised Domain Names

WebURL now supports Internationalised Domain Names (IDNs):

import WebURL

WebURL("http://中国移动.中国")
// ✅ "http://xn--fiq02ib9d179b.xn--fiqs8s/"

WebURL("https://🛍.example.com/")
// ✅ "https://xn--878h.example.com/"

This may look strange if you are unfamiliar with IDNs. In order to be compatible with existing internet infrastructure, Unicode text in domains needs special compatibility processing, resulting in an encoded string with the distinctive "xn--" prefix. This processing is called IDNA. If somebody wants to register the domain "中国移动.中国", they instead register "xn--fiq02ib9d179b.xn--fiqs8s", and behind the scenes, everything works just like it always did with plain, non-Unicode domains -- importantly, we don't need internet routing infrastructure or applications to process hostnames differently to how they normally would. This encoded version is not very helpful to humans, but browsers and applications can detect these domains and present them in Unicode (we have APIs for that; more info below).

For more information about IDNs see IDN World Report.
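To make the "xn--" form a little less mysterious, here is a minimal sketch of the Punycode step (RFC 3492) that turns one Unicode label into ASCII. It is not WebURL's implementation, and `punycodeEncode` is a hypothetical name. Full IDNA also maps, normalises and validates the text before encoding, and none of that is shown here.

```swift
/// Encodes one label with Punycode (RFC 3492). Illustrative only:
/// real IDNA processing maps, normalises and validates the label first.
func punycodeEncode(_ label: String) -> String {
    let base = 36, tMin = 1, tMax = 26, skew = 38, damp = 700
    let input = label.unicodeScalars.map { Int($0.value) }

    // Maps a digit value 0...35 to "a"..."z", "0"..."9".
    func digit(_ d: Int) -> Character {
        Character(UnicodeScalar(UInt8(d < 26 ? d + 97 : d - 26 + 48)))
    }
    // Bias adaptation, RFC 3492 section 6.1.
    func adapt(_ delta: Int, _ numPoints: Int, _ first: Bool) -> Int {
        var delta = first ? delta / damp : delta / 2
        delta += delta / numPoints
        var k = 0
        while delta > ((base - tMin) * tMax) / 2 {
            delta /= base - tMin
            k += base
        }
        return k + (base - tMin + 1) * delta / (delta + skew)
    }

    // ASCII code points are copied through first, followed by a delimiter.
    var output = ""
    for scalar in label.unicodeScalars where scalar.isASCII {
        output.unicodeScalars.append(scalar)
    }
    let basicCount = input.filter { $0 < 128 }.count
    var handled = basicCount
    if basicCount > 0 { output.append("-") }

    var n = 128, delta = 0, bias = 72
    while handled < input.count {
        // The next code point to insert is the smallest one not yet handled.
        let m = input.filter { $0 >= n }.min()!
        delta += (m - n) * (handled + 1)
        n = m
        for c in input {
            if c < n { delta += 1 }
            if c == n {
                // Write delta as a variable-length base-36 integer.
                var q = delta
                var k = base
                while true {
                    let t = k <= bias ? tMin : (k >= bias + tMax ? tMax : k - bias)
                    if q < t { break }
                    output.append(digit(t + (q - t) % (base - t)))
                    q = (q - t) / (base - t)
                    k += base
                }
                output.append(digit(q))
                bias = adapt(delta, handled + 1, handled == basicCount)
                delta = 0
                handled += 1
            }
        }
        delta += 1
        n += 1
    }
    return output
}

print("xn--" + punycodeEncode("\u{4E2D}\u{56FD}")) // "xn--fiqs8s" (中国)
print("xn--" + punycodeEncode("caf\u{E9}"))        // "xn--caf-dma" (café)
```

The scalars are written as escapes so the result does not depend on how an editor normalises the source text.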

Browsers are making an increased effort this year to align their own IDNA implementations (Safari/WebKit already conforms), and it has been announced that Apple's next major operating system releases will include support in Foundation URL. WebURL now implements this part of the URL Standard as well: it is available today, and it fully backwards-deploys. It's important that URLs work consistently for everybody, and WebURL can help with that.

What's more - since this processing happens in the URL type, it works with our existing Foundation interop:

import WebURL
import Foundation
import WebURLFoundationExtras

let (data, _) = try await URLSession.shared.data(for: WebURL("http://全国温泉ガイド.jp")!)
// ✅ Works

let convertedToURL = URL(WebURL("http://全国温泉ガイド.jp")!)!
// ... continue processing 'convertedToURL' as you normally would

Developers have been asking for better IDN support across the industry for years - at this stage of adoption, most IDNs are in China, so Chinese developers in particular have been wanting to work with these kinds of URLs. I'm especially pleased that WebURL is now able to offer it to any Swift application.

:open_book: Host Parsing API

IDN support as the standard requires is great and all, but it isn't enough.

URLs are designed to be universal - infinitely customisable. There are some "special" schemes which the standard knows about, such as http:, and while their hosts have semantic meaning (they are network addresses, hence we should use IDNA, detect IPv4 addresses, etc), generally, for other schemes, the host is just an opaque string and is not interpreted.

That's the correct model, but frequently we are processing URLs which are very HTTP-like, and we would like to support the same network addresses, in the same way, as an HTTP URL. For instance, suppose we were writing an application to handle ssh: URLs - the standard would only parse IPv6 addresses out for us, and everything else would just be an opaque string.

WebURL("ssh://karl@somehost/")!.host
// 😐 .opaque, "somehost"

WebURL("ssh://karl@abc.أهلا.com/")!.host
// 😕 .opaque, "abc.%D8%A3%D9%87%D9%84%D8%A7.com"

WebURL("ssh://karl@192.168.0.1/")!.host
// 🤨 .opaque, "192.168.0.1"
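The percent-encoded string in the second result is just the UTF-8 bytes of the Unicode label, escaped one by one. As a rough sketch of that step (not WebURL's code; `percentEncodeOpaque` is a hypothetical name, and the check for forbidden host characters is left out):

```swift
/// Percent-encodes C0 control bytes and every byte above 0x7E,
/// leaving the remaining ASCII untouched. No validation is performed.
func percentEncodeOpaque(_ input: String) -> String {
    let hex = Array("0123456789ABCDEF")
    var result = ""
    for byte in input.utf8 {
        if byte < 0x20 || byte > 0x7E {
            result.append("%")
            result.append(hex[Int(byte >> 4)])
            result.append(hex[Int(byte & 0x0F)])
        } else {
            result.append(Character(UnicodeScalar(byte)))
        }
    }
    return result
}

// "أهلا" written as scalar escapes to avoid bidirectional-text surprises.
print(percentEncodeOpaque("abc.\u{623}\u{647}\u{644}\u{627}.com"))
// "abc.%D8%A3%D9%87%D9%84%D8%A7.com"
```

Nothing here knows that the string is a domain, which is why no IDNA processing or IPv4 detection happens for these hosts.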

Request libraries generally need to write their own parsers to handle this, but it is difficult to match the host parser for HTTP URLs exactly... unless, of course, you are the URL host parser :thinking:...

So with 0.4.0, WebURL's Host type exposes the URL host parser directly to your applications. Not only is this great for processing URLs of any scheme, it's also useful for hostnames provided via command-line interfaces or configuration files. Being able to guarantee the host is interpreted the same way as it would be in an http: URL is a very useful property, just by itself.

WebURL.Host("EXAMPLE.com", scheme: "http")
// 😍 .domain, Domain { "example.com" }

WebURL.Host("abc.أهلا.com", scheme: "http")
// 🤩 .domain, Domain { "abc.xn--igbi0gl.com" }

WebURL.Host("192.168.0.1", scheme: "http")
// 🥳 .ipv4Address, IPv4Address { 192.168.0.1 }
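For the configuration-file case, a small sketch might look like the following. `describeHost` is a hypothetical helper, and the code assumes that the initializer is failable and that the cases can be matched as shown, so check the documentation before relying on it.

```swift
import WebURL

/// Hypothetical helper: classifies a host string from a config file
/// the same way an http: URL would interpret it.
func describeHost(_ input: String) -> String {
    // Assumption: the initializer returns nil when the host is invalid.
    guard let host = WebURL.Host(input, scheme: "http") else {
        return "invalid"
    }
    switch host {
    case .domain: return "domain"
    case .ipv4Address: return "ipv4"
    default: return "other"
    }
}

print(describeHost("EXAMPLE.com"))
print(describeHost("192.168.0.1"))
```

Branching on the parsed host, rather than on the raw string, means the application and the URL parser cannot disagree about what kind of host it is.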

:duck: Domain API

Exposing the host parser is great and all, but it also isn't enough.

Previously, we only had types for IPv4 and IPv6 addresses, and domains were represented as Strings. Now, domains have their own type - WebURL.Domain, which is guaranteed to contain a validated, normalised domain from the URL host parser, and can be a useful place to house APIs which operate on domains.

WebURL.Domain("example.com")  // ✅ "example.com"
WebURL.Domain("localhost")    // ✅ "localhost"
WebURL.Domain("api.أهلا.com")  // ✅ "api.xn--igbi0gl.com"
WebURL.Domain("xn--caf-dma")  // ✅ "xn--caf-dma" ("café")

WebURL.Domain("in valid")     // ✅ nil (spaces are not allowed)
WebURL.Domain("xn--cafe-yvc") // ✅ nil (invalid IDN)
WebURL.Domain("192.168.0.1")  // ✅ nil (not a domain)

The most important API right now is render, which builds a result using an encapsulated algorithm. There is opportunity for renderers to produce any kind of result - for example, they might perform spoof-checking to guard against confusable text, or they might use a database to shorten domains to their most important section, or they might have special formatting for particular domains. You can create a renderer by conforming to the WebURL.Domain.Renderer protocol.

WebURL comes with an uncheckedUnicodeString renderer, so you can recover the Unicode form of a domain. This renderer does not perform any spoof-checking, so is not recommended for use in UI.

let domain = WebURL.Domain("xn--fiq02ib9d179b.xn--fiqs8s")!
domain.render(.uncheckedUnicodeString)
// ✅ "中国移动.中国"

And with that, I'm happy with WebURL's host story. It provides rich, detailed information about the hosts defined in the URL Standard and gives you the means to easily and robustly process them. Please try it out and leave feedback!

:gift: Bonus: Spoof-checked renderer prototype

It is important that applications use spoof checking when displaying domains in Unicode form. We have a proof-of-concept renderer which ports much of Chromium's IDN spoof-checking logic. It works on my Mac, but deploying it can be a pain because it depends on the ICU library for its implementation of UAX39.

// Non-IDNs.
WebURL.Domain("paypal.com")?.render(.checkedUnicodeString) // ✅ "paypal.com"
WebURL.Domain("apple.com")?.render(.checkedUnicodeString)  // ✅ "apple.com"

// IDNs.
WebURL.Domain("a.أهلا.com")?.render(.checkedUnicodeString)   // ✅ "a.أهلا.com"
WebURL.Domain("你好你好")?.render(.checkedUnicodeString)     // ✅ "你好你好"

// Spoofs.
WebURL.Domain("раγpal.com")?.render(.checkedUnicodeString) // ✅ "xn--pal-vxc83d5c.com"
WebURL.Domain("аpple.com")?.render(.checkedUnicodeString)  // ✅ "xn--pple-43d.com"

It would be great to turn this into a maintained, easily-deployable package. I'm too busy right now, so it remains a prototype, but maybe one day? Or if anybody else would like to get involved, they can use it as a starting point.

Bugfixes

  • Fixed a crash when appending an empty array of form params (#140). Thanks to adam-fowler for the report. Sorry it took so long to get into a release.