Thanks for the comments!
I really appreciate scrutinising the API. URLs are complicated, and I'm sure we've all had moments where we found a URL library's API awkward or confusing. I'm not arrogant enough to think I can just dream up the best API all by myself, and I really believe it makes a better product if everyone who is interested gets involved. Especially when you're deep in the technical details and standards, it can be easy to forget how to make something that's easy for everyone to use.
Right; that's the reason the WHATWG standard was created. Browsers didn't follow the standards, all acted differently, and URL libraries such as cURL and Python's `urllib` all stopped strictly following the standards.
- The standards didn't match the web
- That meant browsers couldn't "fix" their URL handling without breaking the web
- Developers expect things to work like they do on the web. All of the major libraries decided in favour of some number of "web compatibility" hacks over strict standards compliance
And this goes beyond user input - you can get "technically invalid" URLs in server responses, including HTTP headers, databases, etc., and browsers support weird URLs in all of those places (they have to). So it's really important; otherwise some services and data will work on the web (the most important place of all) but start failing once you switch to a new technology or add a new client. It can even be a source of exploits.
People tend to think that URLs are a "solved" problem, but they're really not. The fact that browsers are finally codifying what it means to be compatible with the web and aligning their implementations is (unfortunately) cutting-edge standardisation work.
As for parsing modes, one of the goals of the new standard is this:
- Ensure the combination of parser, serializer, and API guarantee idempotence. For example, a non-failure result of a parse-then-serialize operation will not change with any further parse-then-serialize operations applied to it. Similarly, manipulating a non-failure result through the API will not change from applying any number of serialize-then-parse operations to it.
This basically rules out special parsing modes or `.normalize()` functions. A URL has no meaningful information other than its String representation, and every mutation must leave it in a completely normalised state. It's a really beautiful feature of the standard, and it makes everything extremely predictable.
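To illustrate, here's a small sketch of what that guarantee means in practice. The exact normalised form of this particular input is my assumption; the idempotence check at the end is the property the standard promises:

```swift
import WebURL

// A "messy" URL: mixed-case scheme, an explicit default port,
// a ".." path segment, and an unescaped space in the query.
let first = WebURL("HTTPS://EXAMPLE.com:443/a/../b?q=hello world")!

// The parser normalises everything in a single pass, so re-parsing
// the serialised form must be a no-op: parse-then-serialize is idempotent.
let second = WebURL(first.serialized())!
precondition(first.serialized() == second.serialized())
```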
We do! There's a `.formParams` write-through view, which supports dynamic member lookup to get and set key-value pairs (API Reference):
```swift
let url = WebURL("https://example.com/currency/convert?amount=20&from=EUR&to=USD")!
url.formParams.amount // "20"
url.formParams.from // "EUR"
url.formParams.get("to") // "USD"

url.formParams.amount = "56"
url.formParams.from = "USD"
url.formParams.to = "Pound Sterling"
// url = "https://example.com/currency/convert?amount=56&from=USD&to=Pound+Sterling"
```
You can also use the `append` function or the `+=` operator to append a dictionary:
```swift
url.formParams += [
  "format": "json",
  "client": "app"
]
// url = "https://example.com/currency/convert?amount=56&from=USD&to=Pound+Sterling&client=app&format=json"
```
Things I'd like to add to this:
- Query sorting, for more effective caching
- Percent-encoding, not just form-encoding
- Unicode-aware key lookup (requires the stdlib to expose lower-level Unicode comparison APIs):
```swift
let url = WebURL("http://example.com?jalape\u{006E}\u{0303}os=2")!
url.serialized() // "http://example.com/?jalapen%CC%83os=2"

// Key lookup needs to search through percent-encoding,
// which means it can't be Unicode-aware unless we allocate
// a String for each key-value pair:
url.formParams.get("jalape\u{006E}\u{0303}os") // "2"
url.formParams.get("jalape\u{00F1}os") // nil

// If you iterate, you will allocate a String each time,
// so you will use Unicode-aware comparison.
url.formParams.allKeyValuePairs.first(where: { $0.0 == "jalape\u{00F1}os" }) // ("jalapeños", "2")
```
AFAIK this would be unprecedented in any URL library, but I also think it makes sense for Swift users. Unicode normalization shouldn't be a thing Swift developers need to care about.
Well, the whole concept of usernames and passwords in URLs is officially deprecated, so it didn't seem worth adding an extra type.
We do have a couple of nice things that are related to this, though:
- A `Host` enum (via `.host`), which gives you direct access to an IPv4/IPv6 address, as the URL parser interpreted it. That means you don't need to re-parse the hostname to make a network connection, and can guarantee your behaviour matches what the URL means. The IP address types have full APIs and are easy to convert to an `in_addr`/`in6_addr` or NIO `SocketAddress` (as shown by the `async-http-client` port)
- An `Origin` type (via `.origin`). So if you're using an `Authority` to mean "security domain", you can instead use something resembling the web's security model.
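To make the `Host` point concrete, here's a rough sketch of branching on the parser's interpretation of the hostname. The case names (`.ipv4Address`, `.ipv6Address`, `.domain`) and the optionality of `.host` are my recollection of the API, so treat this as a sketch and check the reference docs:

```swift
import WebURL

let url = WebURL("https://192.168.0.1:8080/status")!

// Branch on what the URL parser decided the host *is*, rather than
// re-parsing the hostname string yourself.
switch url.host {
case .ipv4Address(let address)?:
    print("connect directly to IPv4 address:", address)
case .ipv6Address(let address)?:
    print("connect directly to IPv6 address:", address)
case .domain(let name)?:
    print("resolve via DNS:", name)
default:
    break // opaque or empty hosts, or a URL with no host at all
}
```

The benefit is exactly what the bullet above describes: the address you connect to is guaranteed to be the one the URL actually means.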
I agree, I don't really love this name. I was thinking about exposing a `Scheme` enum, with cases for special schemes like `.http`, `.https`, `.file`, and an `.other(String)` case for the rest. We use a similar thing internally, so it already exists, and that might be a good place for a `var defaultPort: Int?` property.
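A hypothetical sketch of what that could look like (nothing here is in the library's public API today; the cases and port numbers just follow the WHATWG standard's list of special schemes):

```swift
// Hypothetical Scheme enum - not part of WebURL's public API today.
enum Scheme: Equatable {
    case http, https, ws, wss, ftp, file
    case other(String)

    /// The default port implied by a special scheme, per the WHATWG URL standard.
    /// `file` and non-special schemes have no default port.
    var defaultPort: Int? {
        switch self {
        case .http, .ws:    return 80
        case .https, .wss:  return 443
        case .ftp:          return 21
        case .file, .other: return nil
        }
    }
}
```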