[Back from revision] Foundation URL Improvements

That’s a good point. But couldn’t that potentially be fixed? I get why the parsing behavior of URL can’t change, but this has always felt like a bug to me. I can’t think of any other common Swift type where values that represent the same domain value are non-equal because they were constructed differently. (Aside from NaN.)
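A minimal sketch of one case that fits this description, assuming the equality behaviour in question is the one involving relative URLs and their base (the example.com addresses are placeholders):

```swift
import Foundation

// Two routes to the same absolute address.
let base = URL(string: "https://example.com/docs/")!
let absolute = URL(string: "https://example.com/docs/intro")!
let relative = URL(string: "intro", relativeTo: base)!

// Both resolve to the same absolute string...
print(absolute.absoluteString == relative.absoluteString)

// ...but == also takes the base URL and relative string into account,
// so the two values may not compare equal.
print(absolute == relative)

// Normalising with absoluteURL first gives a comparison on the resolved form.
print(absolute == relative.absoluteURL)
```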

I’m making a somewhat broader claim: extension URL: ExpressibleByStringLiteral isn’t a mere workaround that is desirable for the side effect of “tighter guarantees” for overloading. It is the semantic expression, in code form and with the appropriate syntactic niceties, of your statement: “URLs really are, fundamentally, strings.”

But since you brought it up - your logic seems to suggest that all types which possibly can be expressed as strings should conform to the ExpressibleByStringLiteral protocol.

Not quite; my logic is, as above, that types which “really are, fundamentally, strings” should conform to that protocol—or else nothing could really conform to that protocol.
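For reference, a rough sketch of what such a conformance looks like. To avoid adding a retroactive conformance to Foundation's URL, it uses a hypothetical wrapper type, `LiteralURL`, which is not part of any pitch; the trap-on-invalid behaviour is likewise an assumption about how a literal initializer would handle bad input.

```swift
import Foundation

/// Hypothetical wrapper used only for illustration.
struct LiteralURL: ExpressibleByStringLiteral {
    let url: URL

    // StaticString restricts the input to literals written in source.
    init(stringLiteral value: StaticString) {
        // Nothing is validated at build time: an invalid literal
        // only traps when this line actually runs.
        guard let url = URL(string: value.description) else {
            preconditionFailure("Invalid URL literal: \(value)")
        }
        self.url = url
    }
}

let home: LiteralURL = "https://www.swift.org/documentation/"
print(home.url.host ?? "no host")
```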

One potential hint—though not necessarily pathognomonic—is that a type both can be expressed as a string and can be converted (via an unlabeled converting initializer, typically) from another type that does conform to ExpressibleBy but represents different data as strings. Serious consideration should be given in that case as to whether such a type should conform.

It is not exactly parallel, but let’s keep in mind that removeFirst, popFirst and dropFirst live side-by-side and we trust users to use the one they want. It is not fatal to have related APIs that differ only in error handling.
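As a quick illustration of how those three differ, here is a sketch on an ArraySlice (popFirst is only available on collections that are their own SubSequence, so it is not offered on Array itself):

```swift
var queue: ArraySlice<Int> = [1, 2, 3]

// popFirst: mutating, returns an optional, nil when the collection is empty.
let popped = queue.popFirst()

// removeFirst: mutating, returns the element, traps when the collection is empty.
let removed = queue.removeFirst()

// dropFirst: non-mutating, returns a new slice and leaves `queue` alone.
let rest = queue.dropFirst()
let remainingAfterDrop = Array(queue)

queue.removeAll()
let poppedFromEmpty = queue.popFirst()   // no trap on an empty collection
```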

I would call your attention back to our earlier discussion.

As you say, even if syntactically valid, there are scheme-specific requirements too—and it is unlikely that a beginner (or, for that matter, perhaps anyone but the most expert in URL-related matters) is asking specifically for compile-time checking of valid URL syntax as per RFC-this-and-that but not of scheme-specific requirements. Meanwhile, on the other hand, it’d likely be too strict to enforce requirements on URL initializers other than that it’s initialized from a syntactically valid URL string.

Yet even if all users could handle that distinction, a URL with a typo in it can just as well point to a typosquatter’s domain as it can be syntactically invalid or fail to meet a scheme-specific requirement. And if a user actually tries to use that URL, it could be far more devastating if it resolves to an unintended and possibly malicious endpoint than if it fails to be a valid URL at all.

My conclusion from this is that we cannot adequately protect the user from the pitfalls of not proofreading what they’ve typed, whether with compile-time or runtime checks, and regardless of syntax. A user who today dutifully writes guard let url = URL(string: foo) else ... is just as misled into the illusion of having thereby averted tragedy as one who assumes a literal is checked at compile time.

Further, the lack of compile-time evaluation features today doesn’t automatically make things worse—indeed, if we did have compile-time evaluation of URL syntax, those users could be more misled into assuming, when they’ve addressed whatever the compiler has told them to fix in their URL string, it is also correct in other respects which the compiler cannot ever check (the most critical one being “does this URL point to the resource I have in mind instead of a different resource I may have actually typed?”).

I suppose you could conclude that therefore URL should never conform to ExpressibleBy.... However, while good API designs promote correct use and steer away from incorrect use, the bar cannot be that we make even unreasonable misuses impossible.

Strings have long been associated with lack of checking in terms of their contents—take the punny term “stringly typed,” for instance. Not only are users accustomed to typing literal strings with contents that need to adhere to some other syntax that isn’t checked by the compiler, in fact there is actually no situation at all today where Swift looks inside string contents for adherence to non-Swift rules. One might say that these contents are “stringly valued” (sorry). When regexes gain compile-time parsing, they will also gain a new literal type with distinct delimiters. As you say above so pithily, URLs are “fundamentally, strings,” and I do not think it is reasonable to design on the assumption that ordinary users will be naturally misled to assume text in quotation marks without any internal syntax highlighting is going to be adequately checked (for whatever level of adequate) at compile time for some non-Swift syntax simply because the literal is not surrounded by parens or followed by !. That would be counter to everything they’ve done with strings to date.


I chose isNotDirectory to signify that's all this API cares about (whether a URL points to a directory or not). IMO isFile might cause more ambiguity because, as you mentioned, an item can be neither a directory nor a regular file. There are other types of "disk items" such as symbolicLink, socket, namedPipe, etc. (see URLFileResourceType). It'd be confusing if a developer has to choose between isDirectory and isFile when the URL is pointing to a symbolic link.
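To make the "neither a directory nor a regular file" point concrete, here is a small sketch that classifies an on-disk item using existing resource-value keys. The `ItemKind` enum and `itemKind(at:)` helper are hypothetical names for illustration, not part of the pitched API.

```swift
import Foundation

// Hypothetical classification, coarser than URLFileResourceType.
enum ItemKind { case directory, regularFile, symbolicLink, other }

func itemKind(at url: URL) throws -> ItemKind {
    let values = try url.resourceValues(
        forKeys: [.isDirectoryKey, .isRegularFileKey, .isSymbolicLinkKey])
    // Check for a symbolic link first so it is reported as a link
    // rather than as whatever it points to.
    if values.isSymbolicLink == true { return .symbolicLink }
    if values.isDirectory == true { return .directory }
    if values.isRegularFile == true { return .regularFile }
    return .other   // sockets, named pipes, devices, ...
}

let kind = try? itemKind(at: FileManager.default.temporaryDirectory)
print(String(describing: kind))
```

A two-way isDirectory/isFile split has no obvious answer for the `.symbolicLink` and `.other` cases, which is the ambiguity described above.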

I agree with you that the closure approach isn't that much better, and I'm in the process of rethinking this. On the other hand, having access to cwd is very important because Swift is a general-purpose language. For example, one might argue having access to the current working directory is crucial to scripting (at least as of now), and the concerns over thread-safety don't apply to simple scripts. I don't think we should exclude this API just because it might lead to unexpected results in some environments.


I love these ideas. Thanks for the suggestions!


One consideration that should be highlighted: to my knowledge, pthread_chdir_np and pthread_fchdir_np do not exist on non-Darwin platforms, and a sensible implementation of withCurrentDirectory would eventually end up making those calls.
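For comparison, a portable sketch of a closure-based helper built on FileManager. The signature is an assumption (the thread does not spell one out), and unlike the pthread_chdir_np approach this changes the working directory for the whole process, so it carries exactly the thread-safety concern raised earlier.

```swift
import Foundation

struct DirectoryChangeError: Error { let path: String }

// Hypothetical signature; process-wide, NOT per-thread.
func withCurrentDirectory<T>(_ path: String, _ body: () throws -> T) throws -> T {
    let fm = FileManager.default
    let previous = fm.currentDirectoryPath
    guard fm.changeCurrentDirectoryPath(path) else {
        throw DirectoryChangeError(path: path)
    }
    // Restore the old directory even if `body` throws.
    defer { _ = fm.changeCurrentDirectoryPath(previous) }
    return try body()
}

let listing = try? withCurrentDirectory(NSTemporaryDirectory()) {
    try FileManager.default.contentsOfDirectory(atPath: ".")
}
print(listing?.count ?? 0)
```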

I'm concerned about the conflict with init(_: FilePath). As you note, today this causes undesired behavior where URL("https://www.apple.com") creates a relative file path URL with two components, "https:" and "www.apple.com". However, your proposed change would break anybody currently relying on the interpretation of a string literal as a file path, for example URL("/usr/bin"). (As a FilePath that's file:///usr/bin.) Similarly, URL("My Documents") currently succeeds, translating to My%20Documents.


Sometimes I find it difficult to parse everything you're attempting to explain. So at risk of sounding a bit dumb, here it goes.

My understanding is @Karl believes that leaving evaluation of URL("trash") to potentially crash at runtime is confusing, and there's not much benefit over a developer explicitly typing !.

My opinion is that this is actually dangerous, and I'll provide an example a bit below. My question in response, though, is: why make things even harder on the developer and end user? Yes, we can't protect a developer from not proofreading, but we can protect them from writing code that crashes an app, an app that may perform many other functions that are very important and valuable to the end user.

I'll provide my example now of why what you're arguing is dangerous. A pattern I've seen in apps is that a server generates a URL string and sends it in a push notification to an application, which then ingests and uses the URL. I have personally witnessed the app on another, less friendly mobile platform crash because of implicit crashing behavior when parsing an incorrectly formatted URL string. By contrast, the iOS app did not crash but reported errors that triggered a page, and the server bug was fixed soon after. However, guess which users complained and which didn't? Now imagine this app is vitally important for some arbitrary reason. Would you really choose the code that implicitly crashes in this scenario, or the code that asks the developer to fail softly and log an error?
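A sketch of the soft-failure handling described here, for a URL string that arrives at runtime. The payload shape, the "url" key, the https-only rule, and the `DeepLinkError` type are all assumptions made for illustration.

```swift
import Foundation

enum DeepLinkError: Error, Equatable {
    case missingField
    case malformed(String)
}

// Hypothetical handler for a server-supplied URL string.
func deepLink(fromPayload payload: [String: String]) -> Result<URL, DeepLinkError> {
    guard let raw = payload["url"] else { return .failure(.missingField) }
    // Requiring a scheme and host rejects junk even on platforms where
    // URL(string:) is lenient about unusual characters.
    guard let url = URL(string: raw), url.scheme == "https", url.host != nil else {
        return .failure(.malformed(raw))
    }
    return .success(url)
}

switch deepLink(fromPayload: ["url": "not a url"]) {
case .success(let url):
    print("Opening \(url)")
case .failure(let error):
    // Report and carry on instead of trapping.
    print("Ignoring bad deep link: \(error)")
}
```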

The discussion was about literals.
I see @stackotter has replied to you in another thread about the same point.


Thank you for clarifying; that assuages part of my concern. However, @stackotter's reply contains:

A URL literal would be required to be known at compile time, meaning that the compiler can throw an error if the URL is invalid

If I have followed correctly, the second clause here is actually not the case. The compiler isn't checking whether the URL is invalid; that's still checked at runtime, and compile-time evaluation is left as a hypothetical future direction. I get now why @Karl stated that this is confusing; it's already confusing to explain on this forum. I also still stand by my remark that not crashing by default is preferable, for the general reasons I provided.

As I said,


Would a URL to a typosquatter's domain crash the app?

If yes, I can see how one might hold the opinion that "we cannot adequately protect the user from the pitfalls of not proofreading what they’ve typed" and that we should therefore just crash the app whenever one of those pitfalls occurs.

If no, then that seems like a cherry-picked example for arguing a general principle of "we cannot adequately protect the user from the pitfalls of not proofreading what they’ve typed". There's a clear and different counter-example: we can protect users from crashing their app, which contradicts the assertion of "cannot ... protect". We can also help protect users from, or at least assist them with, confusion in API usage.

A typosquatter can steal your users' credit card info; an app that doesn't run cannot. In Swift, fatal errors are one mechanism by which safety is achieved--as has been said before, crashing is safer and more desirable than continuing execution in an unanticipated state.


So a typosquatter's URL will crash the app? I'm asking because I'm not sure; I'd assume the typosquatter would still need to squat on a technically valid URL.

I don't think anybody realistically hopes for protection against typos - when I call URL("http://aple.com"), I don't expect it to return nil because I forgot the "p". I don't expect it to read my mind.

And more importantly, if it did not return nil, I would not assume the result to be typo-free.
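A short sketch of that point using today's failable initializer: the typo'd address parses without complaint, so a successful parse says nothing about whether the URL is the intended one.

```swift
import Foundation

let typo = "http://aple.com"   // meant "apple.com"

if let url = URL(string: typo) {
    // Parsing succeeds: the string is syntactically fine,
    // it just names a different host.
    print("Parsed host:", url.host ?? "none")
} else {
    print("Rejected as invalid")
}
```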

That said, the failable version does detect and reject invalid syntax, so I think that is part of what developers expect when constructing a URL from a string (literal or not). We also know they would like mistakes (these kinds of mistakes) to be detected at build-time.

This feature is limited to build-time strings, and removes the optional, and so has the appearance that some kind of build-time processing is taking place. I feel that is the impression this and other pitches give. But no - no build-time URL processing or validation is taking place whatsoever, and it traps at runtime if it happens to be executed with an invalid string.

I still think that is misleading. Yes, developers should test every code path, but sometimes they don't, and I don't think they deserve this sort of deterioration in developer experience for those code paths. The balance is different for each API, but IMO removing the optional on this API needs to meet a higher bar. There needs to be some amount of build-time checking to make up for it.


I think that, from a correctness perspective, a misspelt but valid URL is distinct from a misspelt but invalid URL. It is a common pattern for Swift programs to crash on invalid forms (under certain circumstances, such as literals which are known at compile time), but a compiler can never tell if you intended to write foo.org as opposed to foo.com. That distinction can’t even be caught by the most advanced compiler or type system, but it is very much possible to catch n o n s e n se as an invalid URL.

It's an interesting way to word it.

Yes, a compiler couldn't tell you. But then - why are we even asking a compiler? Compilers turn source code into executable code, they don't validate URLs or check domain names! ...do they?

The benefit of having a compiler do this (via compile-time evaluation) is that the results can be statically guaranteed. It's just an optimisation - we simulated the result and can constant-fold a bunch of logic.

But if we can't do that - because the library is part of an ABI-stable SDK, or the standards are not stable enough for that kind of guarantee, or the library is too complex, or whatever other reason - why are we still asking the compiler?

I said before that I think linting is the way to go here. Build-time input validation delivered by packages but not evaluated by the compiler. Libraries would tag functions/initializers (@lintable?), and the compiler would gather all the inputs it manages to constant-fold and pass them to a package plugin. The plugin checks them, and the IDE can show a little ✅ at the call-site to show that the tool is happy with the value, and if something can't pass the build-time tool, it might even be worth failing the build.
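Purely as an illustration of the checking half of that idea, here is a sketch of what such a tool might run over the gathered constants. Everything in it is hypothetical: there is no @lintable attribute or plugin hook today, and `LintFinding` and `lintURLLiterals` are invented names. The rule applied (parseable and has a scheme) is also just an assumed example.

```swift
import Foundation

struct LintFinding: Equatable {
    let literal: String
    let message: String
}

// Hypothetical check over string constants collected at build time.
func lintURLLiterals(_ literals: [String]) -> [LintFinding] {
    literals.compactMap { literal -> LintFinding? in
        guard let components = URLComponents(string: literal) else {
            return LintFinding(literal: literal, message: "cannot be parsed as a URL")
        }
        guard components.scheme != nil else {
            return LintFinding(literal: literal, message: "has no scheme")
        }
        return nil   // the tool is happy with this value
    }
}

for finding in lintURLLiterals(["https://www.swift.org", "n o n s e n se"]) {
    print("warning: \"\(finding.literal)\" \(finding.message)")
}
```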

That would actually be build-time validation, in a way that can be delivered in a realistic timeframe (right?), which scales to ABI-stable SDKs, unstable standards, and other complex libraries. It wouldn't change anything in the type system - you'd still deal in optionals like today, but you'd have that extra level of checking at build-time with no configuration needed.

Even if it only applied to strings, I think a system like that could carry us a long way.


It is distinguishable, certainly. My argument continues to be that (for a literal, which is the topic of this debate) it is a distinction at best of minimal significance:

Either the URL string is correct, meaning it references the author's intended resource, or it is incorrect. If there is one salient way in which to subdivide incorrect URL strings, I'd argue it's whether they're resolvable to some (unintended) resource or unresolvable entirely—these have very different security implications. For the end user, if a resource doesn't load due to a typo, I can hardly imagine they care very much whether it was nonetheless a syntactically valid typo that caused no resource to be found.

As you say, a compiler cannot know the author's intention. And as I've said above, nothing and no one but the author can know, whether at runtime or at build time, and regardless of what syntax is used. Notably, what is impossible for anyone or anything else to do boils down, for the author, to merely proofreading the typed-out literal.

It is true that whether a URL string is valid could conceivably be validated at build time, and invalid URLs obviously cannot reference the author's intended resource. Certainly, no one is arguing against the Swift compiler performing such validation when the features are available to implement it.

That it is inarguably possible to make such an improvement, however, doesn't mean that it is therefore a showstopper not to do so. This is because it does not follow that since having more build-time validation is desirable and ensuring the validity of URL strings is about all we can expect of the compiler, this limitation therefore defines relevant thresholds of URL "correctness" for users of the API. Indeed, as @Karl astutely points out, it does not even lead to the inevitable conclusion that it is the compiler that has to have the starring role at compile time.

It is an intriguing idea to have a linter involved because such a tool could not only call out invalid URLs, but it can also flag valid URLs that don't resolve to any resource (at least in the moment)—and it can also decrease the effort required for authors to ensure that URLs which do resolve actually reference the intended resource. As I argue here, these latter advantages are going to be more meaningful to users than catching mere syntactic invalidity.


That is correct, this pitch would not add the functionality required to add the compile-time URL initialiser afaik. However, it would not be adding an unsafe version as you suggested.

This comes back to my main issue with this proposal (and I'm not alone). That is: it does not add any benefit except for future directions. If these future directions were to be implemented at a later point, we might realise that the feature isn't actually designed in the most ergonomic way but it would be too late to change it.


Sorry, I got confused, I thought I was replying to the compile time constants proposal. My bad 🤦‍♂️


The improvements are now in! I thought this deserves mentioning for people who don't know yet (like me until a few mins ago).
Here is a link to one of them.

https://developer.apple.com/documentation/foundation/url/3988464-init
