Announcing WebURL - a new URL type for Swift

I didn't have a chance to dig very deep into this yet, but it looks promising, great work :+1:
Especially on the fuzzing side, great to see this! :slight_smile:

I have some past experience designing such a type-safe URI/URL type from Akka HTTP's Uri (guide here). You may be interested in our tests there, which cover some of the strict/relaxed parsing mode cases that were tricky to get right.


Minor questions and feedback from skimming the types:

A common thing the real world forced onto us back in Akka was a strict and a relaxed parsing mode. It turns out the real world doesn't respect the specifications all that well. Have you given this some thought already?

You could do a bit more with the query property. It's just a String today, which is fairly boring and leaves users to deal with it themselves. I'd suggest doing a bit more work for the users of the lib here and offering a Query type (see Akka HTTP 10.2.7 - akka.http.scaladsl.model.Uri.Query) that is a specialized linear collection type. The nice thing then is that people can construct a Uri/WebURL using APIs like url.query = [("bla", "bla"), ("two", "2")], or by adding query elements with url.query.add(...), etc. Perhaps most importantly, it allows for url.query.last("name") -> String?, url.query.all("q") -> [String] and similar lookup APIs.
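To make the shape of that suggestion concrete, here is a rough sketch in Swift. Everything in it (the `Query` name, `add`, `last`, `all`) is hypothetical and only mirrors the idea above; it is not WebURL's API and not a port of Akka's implementation.

```swift
// Hypothetical sketch: `Query` and its methods are illustrative names only.
struct Query: ExpressibleByArrayLiteral {
    // Pairs are kept in insertion order, and duplicate keys are allowed.
    private(set) var pairs: [(key: String, value: String)]

    init(arrayLiteral elements: (String, String)...) {
        pairs = elements.map { (key: $0.0, value: $0.1) }
    }

    /// Appends a key-value pair to the end of the query.
    mutating func add(_ key: String, _ value: String) {
        pairs.append((key: key, value: value))
    }

    /// The value of the last pair with the given key, if any.
    func last(_ key: String) -> String? {
        pairs.last(where: { $0.key == key })?.value
    }

    /// Every value stored under the given key, in order.
    func all(_ key: String) -> [String] {
        pairs.filter { $0.key == key }.map { $0.value }
    }
}

var query: Query = [("q", "swift"), ("page", "1")]
query.add("q", "url")
print(query.last("q") ?? "nil")  // url
print(query.all("q"))            // ["swift", "url"]
```

Keeping the pairs in an ordered array rather than a dictionary is what makes repeated keys and "last wins" lookups possible.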

I also didn't see an Authority in the public API. That's a nice thing to expose, and the password and user could move into it. For reference: Akka HTTP 10.2.7 - akka.http.scaladsl.model.Uri.Authority (one could even include the userinfo).

Nicely done with the portOrKnownPort though. If I may nitpick, avoiding Or in names could be nice. We called this effectivePort in Akka; in general "effective..." is a nice naming pattern I think :slight_smile: You could also extend the list of known ports a bit, feel free to use this list as reference.


Thanks for the comments!

I really appreciate people scrutinising the API. URLs are complicated, and I'm sure we've all had moments where we found a URL library's API awkward or confusing. I'm not arrogant enough to think I can just dream up the best API all by myself, and I really believe it makes a better product if everyone who is interested gets involved. Especially when you're deep in the technical details and standards, it can be easy to forget how to make something that's easy for everyone to use.


Right; that's the reason the WHATWG standard was created. Browsers didn't follow the standards, all acted differently, and URL libraries such as cURL and Python's urllib all stopped strictly following the standards.

  1. The standards didn't match the web
  2. That meant browsers couldn't "fix" their URL handling without breaking the web
  3. Developers expect things to work like they do on the web. All of the major libraries decided in favour of some number of "web compatibility" hacks over strict standards compliance

And this goes beyond user input - you can get "technically invalid" URLs in server responses, including HTTP headers, databases, etc., and browsers support weird URLs in all of those places (they have to). So web compatibility is really important: without it, some services and data will work on the web (the most important place of all) but then start failing once you switch to a new technology or add a new client. It can even be a source of exploits.

People tend to think that URLs are a "solved" problem, but they're really not. The fact that browsers are finally codifying what it means to be compatible with the web and aligning their implementations is (unfortunately) cutting-edge standardisation work.

As for parsing modes, one of the goals of the new standard is this:

  • Ensure the combination of parser, serializer, and API guarantee idempotence. For example, a non-failure result of a parse-then-serialize operation will not change with any further parse-then-serialize operations applied to it. Similarly, manipulating a non-failure result through the API will not change from applying any number of serialize-then-parse operations to it.

This basically rules out special parsing modes or .normalize() functions. A URL has no meaningful information other than its String representation, and every mutation must leave it in a completely normalised state. It's a really beautiful feature of the standard, and it makes everything extremely predictable.
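As a rough illustration of what that guarantee means, here is a toy sketch. `parseThenSerialize` is a made-up stand-in, not the WHATWG algorithm or anything from WebURL: it only lowercases the scheme and host and adds a root path when one is missing, and it does not understand ports, queries or fragments. The point is just the fixed-point property: once a string has been through it, running it again changes nothing.

```swift
// Toy stand-in for "parse, then serialize". NOT the WHATWG algorithm:
// it only lowercases the scheme and host and adds "/" when the path is empty.
func parseThenSerialize(_ input: String) -> String? {
    // Split "scheme://rest" at the first colon.
    guard let colon = input.firstIndex(of: ":") else { return nil }
    let scheme = input[..<colon].lowercased()
    let afterColon = input[input.index(after: colon)...]
    guard !scheme.isEmpty, afterColon.hasPrefix("//") else { return nil }
    let rest = afterColon.dropFirst(2)

    // Split the remainder into the host and everything from the first "/" on.
    let hostEnd = rest.firstIndex(of: "/") ?? rest.endIndex
    let host = rest[..<hostEnd].lowercased()
    guard !host.isEmpty else { return nil }
    let path = rest[hostEnd...]

    return scheme + "://" + host + (path.isEmpty ? "/" : String(path))
}

let once = parseThenSerialize("HTTP://Example.COM")!
let twice = parseThenSerialize(once)!
print(once)           // http://example.com/
print(once == twice)  // true
```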


We do! There's a .formParams write-through view, which supports dynamic member lookup to get and set key-value pairs (API Reference):

var url = WebURL("https://example.com/currency/convert?amount=20&from=EUR&to=USD")!

url.formParams.amount    // "20"
url.formParams.from      // "EUR"
url.formParams.get("to") // "USD"

url.formParams.amount = "56"
url.formParams.from   = "USD"
url.formParams.to     = "Pound Sterling"
// url = "https://example.com/currency/convert?amount=56&from=USD&to=Pound+Sterling"

You can also use the append function or += operator to append a dictionary:

url.formParams += [
  "format": "json",
  "client": "app"
]
// url = "https://example.com/currency/convert?amount=56&from=USD&to=Pound+Sterling&client=app&format=json"
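For anyone wondering where the "+" in "Pound+Sterling" comes from, that is form-encoding (application/x-www-form-urlencoded). Below is a minimal sketch of the encoding step, assuming the rules from the WHATWG URL Standard: ASCII letters, digits and `*-._` pass through, a space becomes "+", and every other byte is percent-encoded. `formEncode` is a hypothetical helper written for this post, not a WebURL function.

```swift
// Sketch of form-encoding a single key or value.
// `formEncode` is a hypothetical helper, not part of WebURL.
func formEncode(_ string: String) -> String {
    let hexDigits = Array("0123456789ABCDEF")
    var result = ""
    for byte in string.utf8 {
        switch byte {
        case UInt8(ascii: "a")...UInt8(ascii: "z"),
             UInt8(ascii: "A")...UInt8(ascii: "Z"),
             UInt8(ascii: "0")...UInt8(ascii: "9"),
             UInt8(ascii: "*"), UInt8(ascii: "-"),
             UInt8(ascii: "."), UInt8(ascii: "_"):
            // Left as-is.
            result.append(Character(Unicode.Scalar(byte)))
        case UInt8(ascii: " "):
            // Spaces become "+" in form-encoding.
            result.append("+")
        default:
            // Everything else is written as %XX, one escape per UTF-8 byte.
            result.append("%")
            result.append(hexDigits[Int(byte >> 4)])
            result.append(hexDigits[Int(byte & 0x0F)])
        }
    }
    return result
}

print(formEncode("Pound Sterling"))  // Pound+Sterling
print(formEncode("a&b=c"))           // a%26b%3Dc
```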

Things I'd like to add to this:

  • Query sorting, for more effective caching
  • Percent-encoding, not just form-encoding
  • Unicode-aware key lookup (requires the stdlib to expose lower-level Unicode comparison APIs):
    let url = WebURL("http://example.com?jalape\u{006E}\u{0303}os=2")!
    url.serialized() // "http://example.com/?jalapen%CC%83os=2"
    
    // Key lookup needs to search through percent-encoding,
    // which means it can't be Unicode-aware unless we allocate
    // a String for each key-value pair:
    url.formParams.get("jalape\u{006E}\u{0303}os") // "2"
    url.formParams.get("jalape\u{00F1}os") // nil
    
    // If you iterate, you will allocate a String each time,
    // so you will use Unicode-aware comparison.
    url.formParams.allKeyValuePairs.first(where: { $0.0 == "jalape\u{00F1}os" }) // ("jalapeños", "2")
    
    AFAIK this would be unprecedented in any URL library, but I also think it makes sense for Swift users. Unicode normalization shouldn't be a thing Swift developers need to care about.
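The difference between the two kinds of comparison can be seen with the standard library alone. This small sketch uses no WebURL APIs: Swift's String equality is based on canonical equivalence, while a byte-level comparison (which is effectively what a search through percent-encoded text does) sees two different sequences.

```swift
let decomposed = "jalape\u{006E}\u{0303}os"  // "n" followed by a combining tilde
let precomposed = "jalape\u{00F1}os"         // a single "ñ" scalar

// String equality uses canonical equivalence, so these are equal.
print(decomposed == precomposed)                         // true

// The UTF-8 bytes differ, so a byte-level search does not match.
print(decomposed.utf8.elementsEqual(precomposed.utf8))   // false
```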

Well, the whole concept of usernames and passwords in URLs is officially deprecated, so it didn't seem worth adding an extra type.

We do have a couple of nice things that are related to this, though:

  • A Host enum (via .host), which gives you direct access to an IPv4/IPv6 address, as the URL parser interpreted it. That means you don't need to re-parse the hostname to make a network connection, and can guarantee your behaviour matches what the URL means. The IP address types have full APIs and are easy to convert to an in_addr/in6_addr or NIO SocketAddress (as shown by the async-http-client port)

  • An Origin type (via .origin). So if you're using an Authority to mean "security domain", you can instead use something resembling the web's security model.

I agree, I don't really love this name. I was thinking about exposing a Scheme enum, with cases for special schemes like .http, .https, .file, and a .other(String) case for the rest. We use a similar thing internally, so it already exists, and that might be a good place for a var defaultPort: Int? property.
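Roughly what I have in mind, as a sketch only. The names here (`Scheme`, `defaultPort`, `effectivePort`) are hypothetical and this is not the enum we use internally; the port numbers are the defaults the URL standard assigns to its special schemes.

```swift
// Hypothetical sketch; not WebURL's actual API.
enum Scheme: Equatable {
    case http, https, ws, wss, ftp, file
    case other(String)

    /// Schemes are matched case-insensitively and stored lowercased.
    init(_ string: String) {
        let lowercased = string.lowercased()
        switch lowercased {
        case "http":  self = .http
        case "https": self = .https
        case "ws":    self = .ws
        case "wss":   self = .wss
        case "ftp":   self = .ftp
        case "file":  self = .file
        default:      self = .other(lowercased)
        }
    }

    /// The default port for schemes that have one.
    var defaultPort: Int? {
        switch self {
        case .http, .ws:    return 80
        case .https, .wss:  return 443
        case .ftp:          return 21
        case .file, .other: return nil
        }
    }
}

/// The explicit port if there is one, otherwise the scheme's default.
func effectivePort(scheme: Scheme, explicitPort: Int?) -> Int? {
    explicitPort ?? scheme.defaultPort
}

print(effectivePort(scheme: Scheme("HTTPS"), explicitPort: nil) ?? -1)   // 443
print(effectivePort(scheme: Scheme("http"), explicitPort: 8080) ?? -1)   // 8080
```

With something like this, the port fallback reads as a property of the scheme rather than a special case on the URL type.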
