IDN/Punycode in URL

Sure. I haven't had nearly as much time to work on it as I would have liked, but I'm back at it now. You can check out my progress on GitHub at karwa/base, on the url branch (in "Sources/URL"), although I will warn that it is pretty rough. I'm mostly concerned with getting the algorithm correct; after that I'll worry about cleaning up the various utility functions I added along the way.

The main things I still have to do are:

  • Finishing host parsing: IPv4 and IPv6 addresses are done, and match libc's inet_aton and inet_pton. I spent a little more time on them since they're a nicely encapsulated set of functionality. Only "opaque host" and domain parsing (including IDN/punycode stuff) remain.
  • Serialisation
  • Figuring out the API (including the name - I've called it XURL as a placeholder; I don't think it would be wise to have a second type called URL with different behaviour, and we all know that the 'X' makes it cool)
  • Lots and lots and lots of tests
  • Optimising the layout. Basically the only way to tell if a URL is valid is to parse it and see if it fails, which could result in loads of heap allocations depending on the lengths of the Strings. I'd like to see if it's possible to use "shared strings" to have them share a single allocation, and maybe have some kind of in-line storage for paths/query strings without many components.
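
To make that last point a bit more concrete, here is a rough sketch of the "single allocation" idea. It is not what's in the repository: the `SharedStorageURL` type and its deliberately naive splitter (it only understands "scheme://host/path") are made up for illustration. The point is just that the components can be ranges into one String, handed out as Substrings that share its storage, rather than separate Strings.

```swift
// Hypothetical illustration only; not the layout or parser used in karwa/base.
struct SharedStorageURL {
    // The whole URL lives in one String, so at most one heap allocation.
    let storage: String
    // Each component is just a range into `storage`.
    let schemeRange: Range<String.Index>
    let hostRange: Range<String.Index>
    let pathRange: Range<String.Index>

    // Substrings share the storage of their base String; no copies are made here.
    var scheme: Substring { storage[schemeRange] }
    var host: Substring { storage[hostRange] }
    var path: Substring { storage[pathRange] }

    /// Naive splitter for "scheme://host/path". A real parser would follow
    /// the URL standard's state machine; this only demonstrates the layout.
    init?(_ input: String) {
        // The scheme is everything before the first ":" and must be non-empty.
        guard let colon = input.firstIndex(of: ":"), colon != input.startIndex else {
            return nil
        }
        let afterColon = input.index(after: colon)
        // Require "//" immediately after the colon.
        guard input[afterColon...].starts(with: "//") else { return nil }
        let hostStart = input.index(afterColon, offsetBy: 2)
        // The host runs up to the next "/", or to the end if there is no path.
        let hostEnd = input[hostStart...].firstIndex(of: "/") ?? input.endIndex

        storage = input
        schemeRange = input.startIndex..<colon
        hostRange = hostStart..<hostEnd
        pathRange = hostEnd..<input.endIndex
    }
}

if let url = SharedStorageURL("https://example.com/a/b") {
    print(url.scheme, url.host, url.path)
}
```

The trade-off is that any mutation has to rewrite the shared string and shift the ranges after it, so whether this wins in practice would need measuring.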

The main parsing algorithm is essentially complete, save for a couple of clearly-marked TODOs (e.g. query parameters).
