Thanks for the comments!
I really appreciate scrutinising the API. URLs are complicated, and I'm sure we've all had moments where we found a URL library's API awkward or confusing. I'm not arrogant enough to think I can just dream up the best API all by myself, and I really believe it makes a better product if everyone who is interested gets involved. Especially when you're deep in the technical details and standards, it can be easy to forget how to make something that's easy for everyone to use.
Right; that's the reason the WHATWG standard was created. Browsers didn't follow the standards, all acted differently, and URL libraries such as cURL and Python's `urllib` all stopped strictly following the standards.
- The standards didn't match the web
- That meant browsers couldn't "fix" their URL handling without breaking the web
- Developers expect things to work like they do on the web. All of the major libraries decided in favour of some number of "web compatibility" hacks over strict standards compliance
And this goes beyond user input - you can get "technically invalid" URLs in server responses, including HTTP headers, databases, etc., and browsers support weird URLs in all of those places (they have to). So it's really important; otherwise some services and data will work on the web (the most important place of all) but start failing once you switch to a new technology or add a new client. It can even be a source of exploits.
People tend to think that URLs are a "solved" problem, but they're really not. The fact that browsers are finally codifying what it means to be compatible with the web and aligning their implementations is (unfortunately) cutting-edge standardisation work.
As for parsing modes, one of the goals of the new standard is this:
- Ensure the combination of parser, serializer, and API guarantee idempotence. For example, a non-failure result of a parse-then-serialize operation will not change with any further parse-then-serialize operations applied to it. Similarly, manipulating a non-failure result through the API will not change from applying any number of serialize-then-parse operations to it.
This basically rules out special parsing modes or `.normalize()` functions. A URL has no meaningful information other than its String representation, and every mutation must leave it in a completely normalised state. It's a really beautiful feature of the standard, and it makes everything extremely predictable.
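To illustrate, here's a small sketch of what that guarantee means in practice. The exact normalised form of this particular input is my assumption; the idempotence check at the end is the property the standard promises:

```swift
import WebURL

// A "messy" URL: mixed-case scheme, an explicit default port,
// a ".." path segment, and an unescaped space in the query.
let first = WebURL("HTTPS://EXAMPLE.com:443/a/../b?q=hello world")!

// The parser normalises everything in a single pass, so re-parsing
// the serialised form must be a no-op: parse-then-serialize is idempotent.
let second = WebURL(first.serialized())!
precondition(first.serialized() == second.serialized())
```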
We do! There's a `.formParams` write-through view, which supports dynamic member lookup to get and set key-value pairs (API Reference):
```swift
let url = WebURL("https://example.com/currency/convert?amount=20&from=EUR&to=USD")!
url.formParams.amount // "20"
url.formParams.from // "EUR"
url.formParams.get("to") // "USD"

url.formParams.amount = "56"
url.formParams.from = "USD"
url.formParams.to = "Pound Sterling"
// url = "https://example.com/currency/convert?amount=56&from=USD&to=Pound+Sterling"
```
You can also use the `append` function or the `+=` operator to append a dictionary:
```swift
url.formParams += [
  "format": "json",
  "client": "app"
]
// url = "https://example.com/currency/convert?amount=56&from=USD&to=Pound+Sterling&client=app&format=json"
```
Things I'd like to add to this:
- Query sorting, for more effective caching
- Percent-encoding, not just form-encoding
- Unicode-aware key lookup (requires the stdlib to expose lower-level Unicode comparison APIs):
```swift
let url = WebURL("http://example.com?jalape\u{006E}\u{0303}os=2")!
url.serialized() // "http://example.com/?jalapen%CC%83os=2"

// Key lookup needs to search through percent-encoding,
// which means it can't be Unicode-aware unless we allocate
// a String for each key-value pair:
url.formParams.get("jalape\u{006E}\u{0303}os") // "2"
url.formParams.get("jalape\u{00F1}os") // nil

// If you iterate, you will allocate a String each time,
// so you will use Unicode-aware comparison.
url.formParams.allKeyValuePairs.first(where: { $0.0 == "jalape\u{00F1}os" }) // ("jalapeños", "2")
```
AFAIK this would be unprecedented in any URL library, but I also think it makes sense for Swift users. Unicode normalization shouldn't be a thing Swift developers need to care about.
Well, the whole concept of usernames and passwords in URLs is officially deprecated, so it didn't seem worth adding an extra type.
We do have a couple of nice things that are related to this, though:
- A `Host` enum (via `.host`), which gives you direct access to an IPv4/IPv6 address, as the URL parser interpreted it. That means you don't need to re-parse the hostname to make a network connection, and can guarantee your behaviour matches what the URL means. The IP address types have full APIs and are easy to convert to an `in_addr`/`in6_addr` or NIO `SocketAddress` (as shown by the `async-http-client` port)
- An `Origin` type (via `.origin`). So if you're using an `Authority` to mean "security domain", you can instead use something resembling the web's security model.
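To make the `Host` point concrete, here's a rough sketch of branching on the parser's interpretation of the hostname. The case names (`.ipv4Address`, `.ipv6Address`, `.domain`) and the optionality of `.host` are my recollection of the API, so treat this as a sketch and check the reference docs:

```swift
import WebURL

let url = WebURL("https://192.168.0.1:8080/status")!

// Branch on what the URL parser decided the host *is*, rather than
// re-parsing the hostname string yourself.
switch url.host {
case .ipv4Address(let address)?:
    print("connect directly to IPv4 address:", address)
case .ipv6Address(let address)?:
    print("connect directly to IPv6 address:", address)
case .domain(let name)?:
    print("resolve via DNS:", name)
default:
    break // opaque or empty hosts, or a URL with no host at all
}
```

The benefit is exactly what the bullet above describes: the address you connect to is guaranteed to be the one the URL actually means.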
I agree, I don't really love this name. I was thinking about exposing a `Scheme` enum, with cases for special schemes like `.http`, `.https`, `.file`, and an `.other(String)` case for the rest. We use a similar thing internally, so it already exists, and that might be a good place for a `var defaultPort: Int?` property.
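A hypothetical sketch of what that could look like (nothing here is in the library's public API today; the cases and port numbers just follow the WHATWG standard's list of special schemes):

```swift
// Hypothetical Scheme enum - not part of WebURL's public API today.
enum Scheme: Equatable {
    case http, https, ws, wss, ftp, file
    case other(String)

    /// The default port implied by a special scheme, per the WHATWG URL standard.
    /// `file` and non-special schemes have no default port.
    var defaultPort: Int? {
        switch self {
        case .http, .ws:    return 80
        case .https, .wss:  return 443
        case .ftp:          return 21
        case .file, .other: return nil
        }
    }
}
```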