[Back from revision] Foundation URL Improvements

rex-remind · May 27, 2022, 2:36am

Sometimes I find it difficult to parse everything you're attempting to explain. So at risk of sounding a bit dumb, here it goes.

My understanding is @Karl believes that leaving evaluation of URL("trash") to potentially crash at runtime is confusing, and there's not much benefit over a developer explicitly typing !.

My opinion is that this is actually dangerous, and I'll provide an example a bit below. My question in response, though, is: why make things even harder on the developer and end user? Yes, we can't protect a developer from not proofreading, but we can protect them from writing code that crashes an app, an app that may perform many other functions that are very important and valuable to the end user.

I'll now provide my example of why what you're arguing for is dangerous. A pattern I've seen in apps is that a server generates a URL string and sends it in a push notification to an application, which then ingests and uses the URL. I have personally witnessed an app on another, less friendly mobile platform crash because of implicit crashing behavior when parsing an incorrectly formatted URL string; by contrast, the iOS app did not crash but reported errors that triggered a page. Soon after, the server bug was fixed. However, guess which users complained and which didn't? Now imagine this app is vitally important for some arbitrary reason. Would you really choose the code that implicitly crashes in this scenario over the code that asks the developer to fail softly and log an error?
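As a minimal sketch of the "softly fail and log" handling described here: the function name, the `https`-only rule, and the `log` parameter are assumptions made for illustration, not part of any real app or of Foundation.

```swift
import Foundation

/// Hypothetical handler for a URL string delivered in a push payload.
/// Returns a usable URL, or nil after logging, instead of trapping.
func deepLinkURL(fromPayload raw: String,
                 log: (String) -> Void = { print($0) }) -> URL? {
    // URL(string:) is failable. Depending on the Foundation version it may
    // return nil or percent-encode unusual input, so the parts the app
    // relies on (scheme and host) are checked explicitly as well.
    guard let url = URL(string: raw),
          url.scheme == "https",
          let host = url.host, !host.isEmpty else {
        log("Ignoring malformed URL from server: \(raw)")
        return nil
    }
    return url
}
```

A caller would simply skip the navigation when this returns nil, while the logged message is what would surface the server bug.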

xwu · May 27, 2022, 12:21pm

rex-remind:

I'll now provide my example of why what you're arguing for is dangerous. A pattern I've seen in apps is that a server generates a URL string and sends it in a push notification to an application, which then ingests and uses the URL. I have personally witnessed an app on another, less friendly mobile platform crash because of implicit crashing behavior when parsing an incorrectly formatted URL string; by contrast, the iOS app did not crash but reported errors that triggered a page. Soon after, the server bug was fixed. However, guess which users complained and which didn't? Now imagine this app is vitally important for some arbitrary reason. Would you really choose the code that implicitly crashes in this scenario over the code that asks the developer to fail softly and log an error?

The discussion was about literals.
I see @stackotter has replied to you in another thread about the same point.

rex-remind · May 27, 2022, 4:51pm

The discussion was about literals.

Thank you for clarifying; that assuages part of my concern. However, @stackotter's reply contains:

A URL literal would be required to be known at compile time, meaning that the compiler can throw an error if the URL is invalid

If I have followed correctly, the second part of that statement is actually not the case. The compiler isn't checking whether the URL is invalid; that's still checked at runtime, and compile-time evaluation is left for a hypothetical future direction. I get now why @Karl stated that this is confusing: it's already confusing to explain on this forum. I also still stand by my remark that not crashing by default is preferable, for the general reasons I provided.

xwu · May 27, 2022, 6:23pm

As I said,

rex-remind · May 27, 2022, 8:45pm

Would a URL to a typosquatter's domain crash the app?

If yes I can see how one might have the opinion that "we cannot adequately protect the user from the pitfalls of not proofreading what they’ve typed" therefore just crash the app whenever one of those pitfalls occurs.

If no, then that seems like a cherry-picked example for arguing a general principle of "we cannot adequately protect the user from the pitfalls of not proofreading what they’ve typed", but there's a clear and different counter-example in that we can protect users from crashing their app, which contradicts the assertion of "cannot ... protect". We can also help protect, or at least assist, users from confusion in API usage.

xwu · May 27, 2022, 9:32pm

A typosquatter can steal your users' credit card info; an app that doesn't run cannot. In Swift, fatal errors are one mechanism by which safety is achieved--as has been said before, crashing is safer and more desirable than continuing execution in an unanticipated state.

rex-remind · May 27, 2022, 10:11pm

So a typosquatter's URL will crash the app? I'm asking because I'm not sure, I'd assume the typosquatter would still need to squat on a technically valid URL.

Karl · May 27, 2022, 11:28pm

I don't think anybody realistically hopes for protection against typos - when I call URL("http://aple.com"), I don't expect it to return nil because I forgot the "p". I don't expect it to read my mind.

And more importantly, if it did not return nil, I would not assume the result to be typo-free.
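To illustrate with today's failable initializer, `URL(string:)`, rather than the pitched one, a misspelled but well-formed string parses without complaint:

```swift
import Foundation

// A misspelled but syntactically valid URL is accepted; the parser has no
// way of knowing that "apple.com" was intended.
let typo = URL(string: "http://aple.com")
print(typo?.host ?? "nil")  // prints "aple.com"
```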

That said, the failable version does detect and reject invalid syntax, so I think that is part of what developers expect when constructing a URL from a string (literal or not). We also know they would like mistakes (these kinds of mistakes) to be detected at build-time.

This feature is limited to build-time strings, and removes the optional, and so has the appearance that some kind of build-time processing is taking place. I feel that is the impression this and other pitches give. But no - no build-time URL processing or validation is taking place whatsoever, and it traps at runtime if it happens to be executed with an invalid string.

I still think that is misleading. Yes, developers should test every code path, but sometimes they don't, and I don't think they deserve this sort of deterioration in developer experience for those code paths. The balance is different for each API, but IMO removing the optional on this API needs to meet a higher bar. There needs to be some amount of build-time checking to make up for it.
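A rough sketch of the behavior being described, under stated assumptions: the `trappingLiteral:` label and the `helpPage` function are made up for illustration, and this is not the pitch's actual implementation. It only shows how an initializer can be restricted to build-time strings and still do all of its validation at run time.

```swift
import Foundation

extension URL {
    /// Hypothetical initializer, for illustration only. It accepts only a
    /// StaticString (so the value is known at build time), but the check
    /// itself still happens when the code runs.
    init(trappingLiteral literal: StaticString) {
        guard let url = URL(string: literal.description) else {
            fatalError("Invalid URL literal: \(literal)")
        }
        self = url
    }
}

func helpPage(isBeta: Bool) -> URL {
    if isBeta {
        // This literal is only evaluated on the beta path. If it were
        // malformed, the code would still compile and would trap the first
        // time this branch ran.
        return URL(trappingLiteral: "https://example.com/beta/help")
    }
    return URL(trappingLiteral: "https://example.com/help")
}
```

Nothing about the call site distinguishes a checked literal from an unchecked one, which is the source of the confusion.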

idrougge · May 27, 2022, 11:29pm

I think that, from a correctness perspective, a misspelt but valid URL is distinct from a misspelt but invalid URL. It is a common pattern for Swift programs to crash on invalid forms (under certain circumstances, such as literals which are known at compile time), but a compiler can never tell if you intended to write foo.org as opposed to foo.com. That distinction can’t even be caught by the most advanced compiler or type system, but it is very much possible to catch n o n s e n se as an invalid URL.

Karl · May 28, 2022, 12:02am

It's an interesting way to word it.

Yes, a compiler couldn't tell you. But then - why are we even asking a compiler? Compilers turn source code into executable code, they don't validate URLs or check domain names! ...do they?

The benefit of having a compiler do this (via compile-time evaluation) is that the results can be statically guaranteed. It's just an optimisation - we simulated the result and can constant-fold a bunch of logic.

But if we can't do that - because the library is part of an ABI-stable SDK, or the standards are not stable enough for that kind of guarantee, or the library is too complex, or whatever other reason - why are we still asking the compiler?

I said before that I think linting is the way to go here: build-time input validation delivered by packages but not evaluated by the compiler. Libraries would tag functions/initializers (@lintable?), and the compiler would gather all the inputs it manages to constant-fold and pass them to a package plugin. The plugin checks them, and the IDE can show a little indicator at the call-site to show that the tool is happy with the value; if something can't pass the build-time tool, it might even be worth failing the build.

That would actually be build-time validation, in a way that can be delivered in a realistic timeframe (right?), which scales to ABI-stable SDKs, unstable standards, and other complex libraries. It wouldn't change anything in the type system - you'd still deal in optionals like today, but you'd have that extra level of checking at build-time with no configuration needed.
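A very rough sketch of what the plugin side of such a tool might look like. Everything here is hypothetical: the `FoldedInput` type, the `lintURLInputs` entry point, and the stand-in validity rule are assumptions for illustration, and no such compiler or package-plugin interface exists today.

```swift
import Foundation

/// Hypothetical record of a string the compiler managed to constant-fold
/// at a call site tagged for linting.
struct FoldedInput {
    let file: String
    let line: Int
    let value: String
}

/// Hypothetical plugin entry point: returns one diagnostic per rejected input.
func lintURLInputs(_ inputs: [FoldedInput]) -> [String] {
    inputs.compactMap { input -> String? in
        // Stand-in validity rule for this sketch: the string must parse
        // and must carry a scheme.
        if let url = URL(string: input.value), url.scheme != nil {
            return nil
        }
        return "\(input.file):\(input.line): warning: '\(input.value)' is not a valid absolute URL"
    }
}

let diagnostics = lintURLInputs([
    FoldedInput(file: "Main.swift", line: 3, value: "https://example.com"),
    FoldedInput(file: "Main.swift", line: 9, value: "n o n s e n se"),
])
diagnostics.forEach { print($0) }
```

The call sites themselves would keep using the optional-returning initializer; only the build gains an extra check.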

Even if it only applied to strings, I think a system like that could carry us a long way.

xwu · May 28, 2022, 5:24am

It is distinguishable, certainly. My argument continues to be that (for a literal, which is the topic of this debate) it is a distinction at best of minimal significance:

Either the URL string is correct, meaning it references the author's intended resource, or it is incorrect. If there is one salient way in which to subdivide incorrect URL strings, I'd argue it's whether they're resolvable to some (unintended) resource or unresolvable entirely—these have very different security implications. For the end user, if a resource doesn't load due to a typo, I can hardly imagine they care very much whether it was nonetheless a syntactically valid typo that caused no resource to be found.

As you say, a compiler cannot know the author's intention. And as I've said above, nothing and no one but the author can know, whether at runtime or at build time, and regardless of what syntax is used. Notably, what is impossible to do for anyone or anything else boils down to merely proofreading the typed-out literal by the author.

It is true that whether a URL string is valid could conceivably be validated at build time, and invalid URLs obviously cannot reference the author's intended resource. Certainly, no one is arguing against the Swift compiler performing such validation when the features are available to implement it.

That it is inarguably possible to make such an improvement, however, doesn't mean that it is therefore a showstopper not to do so. This is because it does not follow that since having more build-time validation is desirable and ensuring the validity of URL strings is about all we can expect of the compiler, this limitation therefore defines relevant thresholds of URL "correctness" for users of the API. Indeed, as @Karl astutely points out, it does not even lead to the inevitable conclusion that it is the compiler that has to have the starring role at compile time.

It is an intriguing idea to have a linter involved because such a tool could not only call out invalid URLs, but it can also flag valid URLs that don't resolve to any resource (at least in the moment)—and it can also decrease the effort required for authors to ensure that URLs which do resolve actually reference the intended resource. As I argue here, these latter advantages are going to be more meaningful to users than catching mere syntactic invalidity.

stackotter · May 29, 2022, 12:02pm

That is correct, this pitch would not add the functionality required to add the compile-time URL initialiser afaik. However, it would not be adding an unsafe version as you suggested.

This comes back to my main issue with this proposal (and I'm not alone). That is: it does not add any benefit except for future directions. If these future directions were to be implemented at a later point, we might realise that the feature isn't actually designed in the most ergonomic way but it would be too late to change it.

stackotter · May 29, 2022, 12:09pm

Sorry, I got confused, I thought I was replying to the compile time constants proposal. My bad

aerobounce · June 30, 2022, 7:46pm

The improvements are now in! I thought this deserves mentioning for people who don't know yet (like me until a few mins ago).
This is just a link to one of them.

https://developer.apple.com/documentation/foundation/url/3988464-init

tera · February 2, 2024, 1:19am

Conceptually, if we were to start over, shouldn't URL be a protocol (with a common functionality) and particular types (FileURL, DirectoryURL, WebURL) conforming to that protocol and adding unique per type functionality?

// A minimal sketch:
protocol URL {
	var scheme: String { get }
}
protocol FileOrDirectoryItemURL: URL {
	var parent: DirectoryURL { get }
}
struct FileURL: FileOrDirectoryItemURL {
	func data() async -> Data { ... }
}
struct DirectoryURL: FileOrDirectoryItemURL {
	func children() async -> [FileOrDirectoryItemURL] { ... }
}
struct WebURL: URL {
	var queryItems: [QueryItem] { ... }
}

George · February 2, 2024, 6:21am

Would other URL schemes be supported... for instance are "ipfs://..." or "custom-mock-url-protocol://..." valid URL literals?
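For reference, and only as a check of the existing failable initializer (it says nothing about the protocol design sketched above), `URL(string:)` does not restrict which schemes it accepts:

```swift
import Foundation

// Both strings use schemes Foundation has no special knowledge of; they
// still parse, because the scheme only has to be syntactically valid.
for candidate in ["ipfs://example/readme", "custom-mock-url-protocol://host/path"] {
    let scheme = URL(string: candidate)?.scheme ?? "nil"
    print("\(candidate) -> scheme: \(scheme)")
}
```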