The implementation is more or less contained in one file. I offered to contribute the implementation to swift-system a while ago, but I think we agreed that things such as IP addresses should go in some kind of lower-level, platform-independent library.
That said, even if there were a common data type defined somewhere, WebURL would still implement its own parsing and serialisation routines because these are defined by the URL standard.
--
As for splitting things into separate modules:
I split IDNA into its own package product because it's huge and implements a self-contained standard. It's fully documented, and you can use it via the "_WebURLIDNA" product, although I'm not promising a stable API for it at the moment (hence the leading underscore).
When it comes to IP addresses, percent-encoding, and other "utilities" like that, the problem is that I can see such a module quickly expanding to cover almost all of the library.
For instance, take the host parser: a user gives you some string (e.g. on the command line or via a configuration file), and you want to interpret it the same way the URL parser does, to find out whether it's a domain, an IPv4 address, or an IPv6 address, and, if it's a domain, to perform IDNA compatibility processing to normalise it. You seem to know that you want IP addresses, but I can imagine a lot of other applications wanting a general host parser.
It's a non-trivial operation, so it's good for a library to handle it for you. For instance, the hostname "0x𝟕f.1" contains U+1D7D5 MATHEMATICAL BOLD DIGIT SEVEN. Because of the way the URL standard's host parser algorithm works, the string goes through IDNA before it is parsed as an IP address, so it gets mapped to "0x7f.1" (with a normal "7"), which is then successfully parsed as 127.0.0.1. Check it in the reference parser.
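For reference, the final step (turning "0x7f.1" into 127.0.0.1) follows the URL standard's IPv4 parser, which accepts hexadecimal and octal parts and lets the last part fill all of the remaining bytes. The following is a minimal Swift sketch of that algorithm, not WebURL's implementation: `parseIPv4`, `parseIPv4Number`, and `dottedDecimal` are hypothetical names, and the sketch skips the host parser's preceding "ends in a number" check as well as validation-error reporting.

```swift
/// Parses one dot-separated part using the URL standard's IPv4 number rules:
/// a "0x"/"0X" prefix means hexadecimal, another leading "0" means octal,
/// and anything else is decimal. Returns nil on failure.
func parseIPv4Number(_ part: Substring) -> UInt64? {
    guard !part.isEmpty else { return nil }
    var digits = part
    var radix = 10
    if digits.count >= 2, digits.hasPrefix("0x") || digits.hasPrefix("0X") {
        digits = digits.dropFirst(2)
        radix = 16
    } else if digits.count >= 2, digits.hasPrefix("0") {
        digits = digits.dropFirst()
        radix = 8
    }
    // A bare prefix such as "0x" means zero.
    if digits.isEmpty { return 0 }
    // Only ASCII digits of the chosen radix are allowed (no signs, no spaces).
    let allValid = digits.allSatisfy { ch in
        guard ch.isASCII, let value = ch.hexDigitValue else { return false }
        return value < radix
    }
    guard allValid else { return nil }
    // Values too large for UInt64 return nil; they would fail the range checks anyway.
    return UInt64(digits, radix: radix)
}

/// Parses an IPv4 host string into a 32-bit address, or returns nil.
func parseIPv4(_ input: String) -> UInt32? {
    var parts = input.split(separator: ".", omittingEmptySubsequences: false)
    // One trailing dot is tolerated, e.g. "127.0.0.1.".
    if parts.count > 1, parts.last?.isEmpty == true {
        parts.removeLast()
    }
    guard parts.count <= 4 else { return nil }

    var numbers: [UInt64] = []
    for part in parts {
        guard let number = parseIPv4Number(part) else { return nil }
        numbers.append(number)
    }

    let count = numbers.count
    let last = numbers.removeLast()
    // Every part except the last must fit in a single byte.
    guard numbers.allSatisfy({ $0 <= 255 }) else { return nil }
    // The last part fills all remaining bytes: 4 bytes if it stands alone, 1 byte if there are 4 parts.
    guard last < (UInt64(1) << (8 * (5 - count))) else { return nil }

    var address = last
    for (index, number) in numbers.enumerated() {
        address += number << (8 * (3 - index))
    }
    return UInt32(address)
}

/// Formats a 32-bit address in dotted-decimal notation.
func dottedDecimal(_ address: UInt32) -> String {
    "\(address >> 24).\((address >> 16) & 0xFF).\((address >> 8) & 0xFF).\(address & 0xFF)"
}

if let address = parseIPv4("0x7f.1") {
    print(dottedDecimal(address))  // prints "127.0.0.1"
}
```

The "0x7f" part contributes the top byte (127), and the trailing "1" is treated as a 24-bit value covering the remaining three bytes, which is why the result is 127.0.0.1 rather than an error.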
In fact, it gets even worse: you can percent-encode that mathematical 7, and it still parses to 127.0.0.1 (Reference parser). So the host string "0x%f0%9d%9f%95f.1" goes on an enormous journey: it needs to be percent-decoded, then we figure out that the result contains Unicode text, so we send it through IDNA, and then we figure out that the result is some ancient form of IPv4 address and parse that. That's why I say it's really not trivial.
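The first leg of that journey (percent-decoding, UTF-8 decoding, then mapping the Unicode text) can be roughly sketched in Swift too. This is an approximation under stated assumptions, not WebURL's implementation: `hexValue`, `percentDecode`, and `mapHostForIPv4Check` are hypothetical names, and Foundation's NFKC normalisation (`precomposedStringWithCompatibilityMapping`) plus lowercasing stands in for UTS #46 IDNA mapping. Real IDNA processing does considerably more (disallowed characters, Punycode, label validation), but it agrees with NFKC for this particular character.

```swift
import Foundation

/// Value of a single ASCII hex digit byte, or nil.
func hexValue(_ byte: UInt8) -> UInt8? {
    switch byte {
    case UInt8(ascii: "0")...UInt8(ascii: "9"): return byte - UInt8(ascii: "0")
    case UInt8(ascii: "a")...UInt8(ascii: "f"): return byte - UInt8(ascii: "a") + 10
    case UInt8(ascii: "A")...UInt8(ascii: "F"): return byte - UInt8(ascii: "A") + 10
    default: return nil
    }
}

/// Percent-decodes a string into raw bytes. Malformed escapes such as "%zz"
/// are passed through unchanged rather than treated as errors.
func percentDecode(_ input: String) -> [UInt8] {
    let bytes = Array(input.utf8)
    var result: [UInt8] = []
    var i = 0
    while i < bytes.count {
        if bytes[i] == UInt8(ascii: "%"), i + 2 < bytes.count,
           let high = hexValue(bytes[i + 1]), let low = hexValue(bytes[i + 2]) {
            result.append(high << 4 | low)
            i += 3
        } else {
            result.append(bytes[i])
            i += 1
        }
    }
    return result
}

/// Hypothetical stand-in for the host parser's early steps:
/// percent-decode, UTF-8 decode, then map Unicode text to a normalised form.
func mapHostForIPv4Check(_ raw: String) -> String {
    // Invalid UTF-8 sequences become U+FFFD.
    let decoded = String(decoding: percentDecode(raw), as: UTF8.self)
    // Pure-ASCII hosts only need lowercasing in this approximation.
    guard decoded.unicodeScalars.contains(where: { !$0.isASCII }) else {
        return decoded.lowercased()
    }
    // NFKC maps U+1D7D5 MATHEMATICAL BOLD DIGIT SEVEN to an ordinary "7".
    // Real IDNA (UTS #46) processing does much more than this.
    return decoded.precomposedStringWithCompatibilityMapping.lowercased()
}

print(mapHostForIPv4Check("0x%f0%9d%9f%95f.1"))  // prints "0x7f.1"
```

The bytes F0 9D 9F 95 are the UTF-8 encoding of U+1D7D5, so the decoded string is "0x𝟕f.1", and only after the mapping step does it look like something an IPv4 parser would accept.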
That's all lots of fun, but the part where it gets serious is this: if a browser (or any other conforming URL implementation) is going to see the string 0x𝟕f.1 (with the mathematical 7), or its percent-encoded version, and think "oh, that's 127.0.0.1", then that's a really valuable thing for applications to know. There is a whole class of vulnerabilities, known as Server-Side Request Forgery (SSRF), which involves tricking servers into making requests to internal services.
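To illustrate why the parsed result matters, here is a hypothetical check an application might run after interpreting a host the same way the URL parser does. `isInternalIPv4` is an assumed name, the address is taken as a 32-bit value with the first octet in the high byte, and the list of ranges (0.0.0.0/8, loopback, RFC 1918 private networks, link-local) is illustrative rather than a complete SSRF filter. Comparing the raw host string against "127.0.0.1" would miss "0x𝟕f.1" entirely, whereas checking the parsed address catches it however it was spelled.

```swift
/// Returns true if the address falls in a range that usually points at
/// internal infrastructure. Illustrative only: not a complete SSRF filter.
func isInternalIPv4(_ address: UInt32) -> Bool {
    // (network, prefix length) pairs; the first octet is the high byte.
    let internalRanges: [(network: UInt32, prefixLength: UInt32)] = [
        (0x0000_0000, 8),   // 0.0.0.0/8 "this network"
        (0x0A00_0000, 8),   // 10.0.0.0/8 private
        (0x7F00_0000, 8),   // 127.0.0.0/8 loopback
        (0xA9FE_0000, 16),  // 169.254.0.0/16 link-local
        (0xAC10_0000, 12),  // 172.16.0.0/12 private
        (0xC0A8_0000, 16),  // 192.168.0.0/16 private
    ]
    return internalRanges.contains { range in
        // Keep only the network bits, then compare with the range's network address.
        let mask = UInt32.max << (32 - range.prefixLength)
        return address & mask == range.network
    }
}

// 0x7F00_0001 is 127.0.0.1, the address "0x𝟕f.1" ends up as.
print(isInternalIPv4(0x7F00_0001))  // prints "true"
```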
But all of this processing? It's not part of the IP address parser; you need the full URL host parser for that. That's why I think such a module would quickly grow to encompass the host parser as well, which is basically half the library.
By the way, WebURL already exposes the host parser as API, so you can parse hosts from anywhere exactly like the URL parser would:
```swift
import WebURL

WebURL.Host("0x%f0%9d%9f%95f.1", scheme: "http")
// .ipv4Address(127.0.0.1)
```
It's part of the WebURL module, and I don't see a problem with that.
So at the moment I'm not inclined to split things like IP addresses and percent-encoding out into separate modules. I try to expose as many things as I can as API, with generics, so you can use the parts that you need without costly and annoying data type conversions. In this particular case, I've tried to make the file more or less self-contained, but I'm not inclined to break the library apart for a fully à la carte experience.