Back `URL(string:)` with `NSURLComponents` instead of `NSURL`

From the URLComponents documentation:

This structure parses and constructs URLs according to RFC 3986. Its behavior differs subtly from that of the URL structure, which conforms to older RFCs. However, you can easily obtain a URL value based on the contents of a URLComponents value or vice versa.

Of course I understand why NSURL's implementation is stuck using older RFCs: there’s legacy Cocoa code going all the way back to 2001 (or maybe even earlier) that relies on the old behavior, and that we can’t break. But does that really apply to the Swift URL struct? At least conceptually, the Swift value types are a clean break from their Objective-C counterparts in many ways, particularly in the String type and the substantially different way it treats characters, indices and ranges. Therefore, doesn’t it make more sense for our URL struct to use the most current RFC available to us in initialization, by using NSURLComponents' -initWithString: method as the backing instead of NSURL's? For those who truly need the older RFCs, the NSURL class is still available to Swift and easily bridgeable to URL.

As things stand currently, it seems to me, given the documentation, that URLComponents(string:)?.url is a more “correct” way to create a URL than URL(string:), but I’ve almost never seen anyone actually use it, to the point that I’ve actually received confused comments in the past from high-reputation users on Stack Overflow for using it in answers instead of URL(string:).
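For anyone who wants to try this pattern today, here is a minimal sketch. The helper name `makeURLViaComponents` is my own invention for illustration and is not part of Foundation; it simply wraps the `URLComponents(string:)?.url` expression mentioned above.

```swift
import Foundation

// Hypothetical helper (not a Foundation API): parse the string with
// URLComponents first, then ask the components for a URL value.
// Returns nil if URLComponents rejects the string.
func makeURLViaComponents(_ string: String) -> URL? {
    return URLComponents(string: string)?.url
}

// Usage: an ordinary, well-formed URL string.
if let url = makeURLViaComponents("http://www.foo.com/index.html") {
    print(url.absoluteString)
}
```

For well-formed strings this should behave the same as URL(string:); the two only diverge on edge cases.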

The work to change the implementation would be trivial, should we decide to do it.

What do you think?


Could you give us some examples of how the behavior differs? I am not entirely familiar with the RFCs in question, so some motivating examples would be great. Thanks!


I’m not an expert on the topic, but I have managed to find a couple of URL strings that the two methods will parse differently:

let urlString = "http://www.foo.com/file;.html"

URLComponents will escape the semicolon here, whereas URL will not:

print(URLComponents(string: urlString)?.url as Any)
print(URL(string: urlString) as Any)

outputs:

Optional(http://www.foo.com/file%3B.html)
Optional(http://www.foo.com/file;.html)

Additionally, URL will attempt to parse a URL string that contains a [ in the path, even though that character is not valid there:

let urlString = "http://www.foo.com/file[.html"

print(URLComponents(string: urlString)?.url as Any)
print(URL(string: urlString) as Any)

outputs:

nil
Optional(http://www.foo.com/file%5B.html)

EDIT: URLComponents seems to be a lot better about sanity-checking in general; it rejects blatantly incorrect URIs which URL does not, such as:

[ and ] other than in an IPv6 address: http://www.foo.com/file[.html
non-numerics in the port number: http://www.foo.com:1~2
two @s in the user portion: http://user@user@www.foo.com/
colons where they don’t belong: http://www.f:oo.com/
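A quick way to check these cases yourself is sketched below. `compareParsers` is a hypothetical helper written for this post, and I have deliberately left out the expected output, since the results may depend on the Foundation version and platform you run it on.

```swift
import Foundation

// Hypothetical helper: reports whether each initializer accepts the string.
// `components` is true when URLComponents(string:)?.url is non-nil;
// `url` is true when URL(string:) is non-nil.
func compareParsers(_ string: String) -> (components: Bool, url: Bool) {
    let viaComponents = URLComponents(string: string)?.url != nil
    let viaURL = URL(string: string) != nil
    return (viaComponents, viaURL)
}

// The four questionable strings from the list above.
let candidates = [
    "http://www.foo.com/file[.html",
    "http://www.foo.com:1~2",
    "http://user@user@www.foo.com/",
    "http://www.f:oo.com/",
]

for candidate in candidates {
    let result = compareParsers(candidate)
    print("\(candidate): URLComponents=\(result.components), URL=\(result.url)")
}
```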

Notice how the board software gives up on parsing the above URLs as soon as it hits the incorrect part, since they’re invalid URLs—but URL(string:) will try to parse them anyway.

There’s probably more; I’m far from an expert on the subject. For me the bottom line, though, is the fact that RFC 3986 is the current standard for URIs, and that the older RFCs 2732, 2396, and 1808 are all marked obsolete as per the specification. It is not considered acceptable to have String follow an obsolete version of the Unicode standard; why should this be any different?


Thanks for the examples. I am definitely +1 on this, as I’ve actually written some URL extensions that parse links with Unicode characters in them, etc., not knowing that URLComponents can handle this better.

Not sure how much code-breaking change this is, however, as it can break some assumptions about URL’s parsing…

In my opinion, Swift is still young enough (and the URL struct younger still) that we can get away with it. Compatibility with existing binaries isn’t something we have to worry about yet, although it will be soon, and since we have a migrator, we could easily just have it insert a comment before each use of URL(string:), saying something like “// Consider switching to NSURL(string:) if you rely on legacy RFC 2396 behavior”, to take care of the rare corner cases in which the old behavior may be preferable. Most of us can just delete the comment; those for whom it matters will be reminded to deal with it.
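To illustrate what opting back in might look like, here is a rough sketch, assuming the pitch were accepted. `makeLegacyURL` is a hypothetical name of my own, and going through `absoluteURL` is just one way to get a URL value out of an NSURL (on Apple platforms you can also cast with `as URL?`).

```swift
import Foundation

// Hypothetical helper: parse explicitly with NSURL, then obtain a URL value.
// NSURL's absoluteURL property is typed as URL?, so no cast is needed.
func makeLegacyURL(_ string: String) -> URL? {
    return NSURL(string: string)?.absoluteURL
}

// Usage: code that depends on NSURL's parsing would call the helper
// instead of URL(string:).
if let url = makeLegacyURL("http://www.foo.com/index.html") {
    print(url.absoluteString)
}
```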

I’d argue that the cost of being stuck until the end of time with an implementation of something as fundamental as URL parsing which was already obsolete on launch day is higher, myself. String isn’t stuck only handling UCS-2 just because that’s what NSString was originally written for, after all.


While I agree with that, to be honest, I moved to a full-Swift codebase starting with Swift 2.x and I regretted it, because every single year I need to go through a major code upgrade and re-test. While this case may be rare, we should take into account that there are many developers who are quite angry that code they wrote 3 years back no longer works in the newest Xcode…

While I can understand frustration, anger is misplaced. Apple promised from day one that incompatible changes would be coming until they declared otherwise. We don’t even have ABI stability yet, and people are upset that Apple has been true to its promise of API and language breakage?


“Angry” was a strong word; I should have used something along the lines of “annoyed”. I fully understand that it takes time to get the language to a stable state, but it was marketed as the language to develop in, and there are indeed many people annoyed with the breaking releases. Nevertheless, this is quite off topic.

The issue with this particular change is that it can manifest in ways that are hard to anticipate: the code still compiles, but the behavior changes in something that’s used in almost every single app out there.

Don’t get me wrong, I am all for this change, but I’m just trying to imagine its consequences, as URLs are not just used for HTTP links, but also for phone numbers, email addresses (mailto), deep-linking, etc…

Well, the thing about it is that RFC 3986 was released in 2005. Given that Swift 3 with the URL value type was released over a decade later in 2016, it’s not an unreasonable assumption that most people using URL(string:) in 2018 are not expecting or desiring to parse URL strings via the 1990s-era ruleset. So I could argue that the behavior has already been changed in a way that most users would not expect, and this pitch is mainly to change it back. So in my estimation, this change would not be anywhere near as source-breaking as it may seem.

At any rate, if there are any Swift developers out there who are depending on 1990s URL-parsing behavior, I’d expect that they’ll probably be aware of it, and will do the appropriate thing if the migrator inserts a fixme warning about the change.
