[Back from revision] Foundation URL Improvements

The changes are great work, all while retaining compatibility.
I like the elegant DirectoryHint and new initializers. It's also great to hear future directions.

One thing: is there any particular reason variadic parameters should be avoided? (I don't see them often in the Swift stdlib or in other areas of Foundation.) For example:

extension URL {
    @available(macOS 9999, iOS 9999, tvOS 9999, watchOS 9999, *)
    public mutating func append<S: StringProtocol>(paths: S..., directoryHint: DirectoryHint = .inferFromPath)

    @available(macOS 9999, iOS 9999, tvOS 9999, watchOS 9999, *)
    public func appending<S: StringProtocol>(paths: S..., directoryHint: DirectoryHint = .inferFromPath)
}
let photoURL = baseURL.appending(paths: "Photos", "\(id.first!)", id)
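
For reference, here is a minimal sketch of how such a variadic overload could be layered on today's API. The name `appendingSketch(paths:)` is hypothetical and only stands in for the spelling asked about above; it forwards to the existing `appendingPathComponent(_:)` and ignores `DirectoryHint` entirely. It also makes one limitation visible: every variadic argument must share the single type `S`, so one call cannot mix `String` and `Substring` arguments.

```swift
import Foundation

extension URL {
    // Hypothetical helper, not part of the pitch: appends each path in order
    // by forwarding to the existing appendingPathComponent(_:).
    func appendingSketch<S: StringProtocol>(paths: S...) -> URL {
        paths.reduce(self) { url, path in
            url.appendingPathComponent(String(path))
        }
    }
}

// Assumed sample values for illustration.
let baseURL = URL(string: "https://example.com/media")!
let id = "abc123"

// All three arguments are inferred as String here.
let photoURL = baseURL.appendingSketch(paths: "Photos", "\(id.first!)", id)
print(photoURL.absoluteString)
```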

Edit: Personally, the improvements regarding lstat are very, very welcome. Thank you.

2 Likes

Looking great!

I agree with most of what has been said above, but I want to add something regarding the appends:

Adding one or more query items is quite a common operation. Is there a reason a .appending(queryItems: [URLQueryItem]) was not added?
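
To make the request concrete, here is a rough sketch of what such a method could do using the existing `URLComponents` API. The name `appendingSketch(queryItems:)` and its optional return type are assumptions made for illustration, not the pitched design.

```swift
import Foundation

extension URL {
    // Hypothetical helper: returns a copy with the given items added after
    // any existing query items, or nil if the URL cannot be decomposed.
    func appendingSketch(queryItems: [URLQueryItem]) -> URL? {
        guard var components = URLComponents(url: self, resolvingAgainstBaseURL: true) else {
            return nil
        }
        components.queryItems = (components.queryItems ?? []) + queryItems
        return components.url
    }
}

// Assumed sample URL for illustration.
let search = URL(string: "https://example.com/search?q=swift")!
let paged = search.appendingSketch(queryItems: [URLQueryItem(name: "page", value: "2")])
print(paged?.absoluteString ?? "nil")
```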

1 Like

I agree with what most people said: this looks really good. One question regarding the methods that take strings (particularly the appending() family): would it make sense to make them generic, i.e. instead of hard-coding String, use something like <S: StringProtocol>, so that callers can avoid explicit string conversions (as in your UUID path example)?

I understand that this isn't necessarily something we want for every API that uses strings, but given the strong relationship between URL components and strings (i.e. the expectation is that a lot of URLs will be computed from components represented as, e.g., Substrings), this might be a useful addition.
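
As a small illustration of the Substring case, assuming a hypothetical generic wrapper named `appendingSketch(component:)` around the existing `appendingPathComponent(_:)`: the pieces produced by `split` are `Substring` values and can be passed without an explicit `String(...)` at the call site.

```swift
import Foundation

extension URL {
    // Hypothetical generic wrapper; the conversion to String happens once,
    // inside the helper, instead of at every call site.
    func appendingSketch<S: StringProtocol>(component: S) -> URL {
        appendingPathComponent(String(component))
    }
}

// Assumed sample input for illustration.
let rawPath = "users/42/profile"
var url = URL(string: "https://api.example.com/v1")!

// split(separator:) yields [Substring]; no conversion is needed here.
for part in rawPath.split(separator: "/") {
    url = url.appendingSketch(component: part)
}
print(url.absoluteString)
```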

1 Like

According to SE-0213:

  • URL("...") will use init(stringLiteral:)
  • URL.init("...") will use init(_:)

So maybe the unlabelled initializer isn't needed?
Or could it be repurposed for non-literal strings and substrings?
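
The SE-0213 rule can be observed with a small stand-in type. `Tag` below is hypothetical and exists only to record which initializer ran; it says nothing about how URL itself would implement either initializer.

```swift
// Hypothetical type that records which initializer was selected.
struct Tag: ExpressibleByStringLiteral {
    let source: String

    init(stringLiteral value: String) {
        source = "literal"
    }

    init(_ value: String) {
        source = "unlabeled"
    }
}

let fromLiteral = Tag("x")          // SE-0213: treated like "x" as Tag
let fromExplicitInit = Tag.init("x") // explicit .init selects init(_:)

let runtimeString = "x"
let fromVariable = Tag(runtimeString) // not a literal, so init(_:)

print(fromLiteral.source, fromExplicitInit.source, fromVariable.source)
```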

When given a "file:///" scheme, do these new initializers use directory hinting?


According to the documentation and open-source implementation, FilePath normalizes itself by removing any trailing separators, so the DirectoryHint.inferFromPath option may not be valid.

Note that once URL itself conforms to ExpressibleByStringLiteral, this will no longer be true: writing URL("https://www.apple.com") will be exactly the same as writing "https://www.apple.com" as URL.

Maybe I'm missing something, but FilePath("file:///usr/bin/swift") is equivalent to "file:///usr/bin/swift" as FilePath, and this seems actively desirable since the two forms are often confused by users (the reason why this was made a special-cased rule in the language to begin with).

"file:///usr/bin/swift" is not a file path, so this would conflict with FilePath's own string literal support.

And having FilePath's string literal initialisers support both URLs and file paths would be ambiguous on Windows: in "x:/foo" as FilePath, is x: a Windows drive letter or a URL scheme? If the former, this should succeed; if the latter, this should fail (because you clearly cannot create a file path from a non-file URL).


IMO, StaticString initialisers aren't really worth the fuss. I know I suggested ExpressibleByStringLiteral in the previous pitch, but that was more of a "if you've decided to do A, you might as well do B" comment; not an endorsement that I think A is actually a good idea in the first place.

I did, in fact, consider it a long time ago for WebURL, but I don't really think it's what people really want -- it looks like a feature that people have occasionally asked for (compile-time checking), but it really isn't. It looks so close that IMO, it's misleading to the point where it could actually be harmful.

What's the use-case?

Firstly, it is very rare that you make a request directly from a pure string literal. I can't remember the last time I saw code which actually looked like this:

URLSession.shared.data(from: "https://example.com/file", delegate: nil)

Most of the time, you will be writing something like this instead. I'm not saying there is no benefit in any of these cases, but let's be honest... it's marginal at best.

var endpoint = URL("https://api.example.com/...")
var endpoint = "https://api.example.com/..." as URL
var endpoint: URL = "https://api.example.com/..."

The most compelling use-case I can think of, where people might actually be making requests from string literals is in scripts. Building a request API optimised for scripting is interesting, but I think it probably wouldn't look much like URLSession, and would probably include an overload for specifying URLs as non-static Strings anyway (script-centric APIs are all about getting the type system out of the way for the sake of convenience).

But doesn't it help you write correct code?

So what do we actually gain? Basically, the ability to omit the ! force-unwrap. And to be clear - as mentioned in the pitch, it does not remove any runtime errors or crashes, or even catch a single bad URL earlier than you could today with the force-unwrap. It just hides them. That's literally all it does.

There are Swift developers who make it their mission to avoid all force-unwraps, even though the community has repeatedly discussed it and found that whole philosophy to be a fundamental misunderstanding of the language. It tends to come from the idea that they want their apps to never crash - which is understandable, of course nobody wants their apps to crash. But lots of functionality in Swift can fail at runtime (e.g. accessing an invalid Array index), and the only way to fully guard against it is to test your code (as it always was). Stuff like this will still fail at runtime if you don't test all code paths:

let endpoint: URL
if userLocaleIsFrench {
  endpoint = "https://api.example.com/..."
} else {
  endpoint = "https://api.fr .example.com/..." // whoops! typo!
}

Again - this feature would just hide the potential crash, not remove it. Is that really a better situation? Some might say yes, others may disagree; what I think we should be able to accept is that the benefits (if there are any) are not obvious. It is not a clear improvement. Is it obvious by looking at that code that it will fail at runtime?
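
A minimal sketch of that failure mode, using a hypothetical `Endpoint` type in place of URL. Its validation rule (non-empty, no whitespace) is a deliberately simplistic assumption and is not Foundation's parser. The point is only that the literal initializer runs the same check as the failable one, but at run time and with a trap instead of a recoverable nil.

```swift
// Hypothetical stand-in for a type with both a failable initializer and a
// non-failable literal initializer.
struct Endpoint: ExpressibleByStringLiteral {
    let text: String

    // Recoverable validation (simplistic rule, assumed for illustration).
    init?(validating text: String) {
        guard !text.isEmpty, !text.contains(where: \.isWhitespace) else { return nil }
        self.text = text
    }

    // Same validation, but a failure is a run-time trap.
    init(stringLiteral value: String) {
        guard let checked = Endpoint(validating: value) else {
            preconditionFailure("invalid endpoint literal: \(value)")
        }
        self = checked
    }
}

func endpoint(userLocaleIsFrench: Bool) -> Endpoint {
    if userLocaleIsFrench {
        return "https://api.example.com/..."
    } else {
        // Compiles without complaint; traps only if this branch executes.
        return "https://api.fr .example.com/..."
    }
}

let tested = endpoint(userLocaleIsFrench: true) // the only branch exercised here
let recoverable = Endpoint(validating: "https://api.fr .example.com/...")
print(tested.text, recoverable == nil)
```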

It is true - things that are ExpressibleByStringLiteral, or which include non-failable StaticString initialisers, do not need to accept every string value, and it is within the protocol's semantics for them to fail at runtime. However - the reason people ask for this feature is because they want a particular failure behaviour, which this initialiser can't and won't provide. They do sometimes find typos in URLs, and they want those to be flagged even in branches which are not actually tested (which... :neutral_face:, really - there's no substitute for tests). That is something that only build-time checking can provide. This thing gives you a false sense of security; it's just as misguided as those who proclaim that all force-unwraps are evil or "code smells".

FWIW, the approach I'm planning to take for WebURL is to create a linting plugin. It hasn't been a priority so far, but my idea is to either create one of the new-fangled command plugins, or add some kind of integration with an existing linting tool. It's not compile-time checking, so it still requires an extra command/tool, but it is build-time checking, so IMO it's closer to what people actually want.

Sorry, I'm now very confused, or we are talking past each other. I thought we were talking about FilePath's own string literal support. What is the conflict to which you refer?

It is desirable when a string literal is passed to FilePath(...) that it would be parsed according to FilePath's rules, and when a string literal is passed to URL(...) that it would be parsed according to URL's rules. If "file:///usr/bin/swift" is not a file path, then an error should arise when someone writes FilePath("file:///usr/bin/swift"). This, I believe, is what already happens with FilePath at this present moment in time and if this proposal is adopted as-is—am I incorrect in that?

I am not suggesting that we should make FilePath support anything it does not currently. Since you do bring it up, by the by, I'm not sure I would oppose making it understand "file:///...", as it's unlikely there would be any broken behaviors with an added code path in the corresponding FilePath initializer that's basically if string.hasPrefix("file:///") { self.init(URL(string)); return }. But to be clear, this is not what I've been talking about.


No, not only that. If it were only that, then I too would find the idea of conforming to ExpressibleBy... more take-it-or-leave-it.

What we gain is more crucially, in my opinion, the advantage I discuss above—i.e., that passing a string literal to URL in any of the syntaxes you list (be it ... as URL or URL(...) or let x: URL = ...) causes the string literal to be parsed as per URL's rules, regardless of what other overloads exist or may be added in the future. With ExpressibleByStringLiteral conformance, this becomes an ironclad guarantee that users can lean on without shenanigans with @_disfavoredOverload or other trickery.

In essence, I’m arguing that even though ExpressibleBy... was originally conceived as a “bag of syntax” protocol to support a language feature, which we now try to avoid, with subsequent evolution we’ve endowed it with the narrow but crucial semantic significance that I describe above, which URL could benefit from both because of the many converting initializers that clash and because it actually meets this semantic requirement; namely—

When a type Foo is ExpressibleByBarLiteral, then a user is correct to expect a Foo to be directly initialized from a bar literal without (notionally) conversion via an intermediate type and thus lossiness due to what such an intermediate type can represent.

[See pitch for StaticBigInt for how we will make this come to fruition for third-party types expressible by integer literals; there is also a longstanding extant bug with respect to ExpressibleByFloatLiteral since it falls short in this respect.]

I don't disagree entirely with your point in that it correctly identifies a common misunderstanding, but nor can I agree with it.

Avoiding all force unwraps is misguided in no small part because it does not avoid all crashes, as Swift's standard library APIs enforce preconditions at runtime pervasively and these are nowhere marked with !—a point which you also allude to. Indeed, even APIs that return nil can also have preconditions!

We're not serving the misguided user by being reticent to redesign APIs in order to cater to that mistaken belief; if they go on not checking their own preconditions before calling APIs, they're going to have many problems pervasively.

The criteria here are, in my view, about the same as with any other API under design where we have to determine if it should be failable: Is it reasonable to expect the user to check the input beforehand as a precondition, or is the cost of checking it sufficiently complex that it'd require a nontrivial chunk of what the API itself is going to have to do anyway (indeed, at the limit, it could be impossible to determine success or failure without actually performing the operation)? If the latter, we should return nil; if the former, we can make it a precondition.

Literal values are generally at the polar opposite from something like, say, opening a file for which success or failure cannot be determined until the actual operation is performed. It is certainly nice, from a diagnostic standpoint, when the compiler can help the user troubleshoot their own input—it would be equally nice if the compiler could tell the user that they’re guaranteed to be asking for an out-of-bounds array index. I would enthusiastically welcome such an advance.

But in the absence of such diagnostic niceties, it is a precondition failure par excellence to pass an invalid value that’s literally literal which can be diagnosed at compile time. I’d go so far as to say that even if determining whether a string value obtained at runtime is a valid value of type T isn’t easy to do and could arguably call for a failable API, that doesn’t mean that the same applies for literals known at compile time.

Besides, the user should be verifying that any literal URLs aren’t just syntactically valid but point to the resource they want at least at the time of writing, rather than some typosquatter’s domain, say.

The statement is somewhat stronger: For any non-builtin type that conforms to an ExpressibleBy... protocol, there's (currently) no option other than failing at runtime for invalid literal inputs. Indeed, since few are designing replacements for the builtin types, any other nontrivial type that conforms will not accept every possible literal value [ExpressibleByNilLiteral excepted, of course].

If either of these is not acceptable, then you're arguing that more or less no non-builtin type should conform to ExpressibleBy... protocols at all—which, then, why vend this protocol at all, since we have the _ExpressibleByBuiltin... protocols for internal use? [As a predominantly “bag of syntax” protocol (with the narrow semantic point I raise above), there are hardly any useful algorithms that could be written generic over ExpressibleByStringLiteral that couldn’t be generic over StringProtocol if we stipulate that no non-builtin conformances should exist.]

Reasoning syllogistically:

  • Premise: If URL shouldn’t conform to ExpressibleBy... (due to the reasons above), then no non-builtin type should conform to ExpressibleBy...
    • This implies the contrapositive: if any non-builtin type should conform to the protocol, then URL should conform
  • Premise: The public ExpressibleBy protocols were designed to allow conformance by at least some non-builtin type.
  • Conclusion: URL should conform to ExpressibleBy...

Right, that is what I started with:

The point I was making is that URLs can be expressed as string literals in some contexts, but not others, and that it might be worth adding a label to the FilePath.init(URL) initializer so that URLs expressed as string literals could also be used there. So you would be able to write:

FilePath(url: "file:///usr/bin/swift")

As I said, it's more of a niche case, really more about symmetry than utility - if URL.init(FilePath) is gaining a label, I think it would be nice to add a label to the reverse.

Perhaps it is a case of talking past each other - I'm not sure if you were disagreeing with me or rephrasing what I already said about how FilePath("file:///usr/bin/swift") (no label) should be interpreted.


Oh, if only converting URLs to file paths were really that simple. There's a lot more to it, I'm afraid, and FilePath would do well to stay clear of it.


So you're saying that the benefit is disambiguation in case a bunch more URL-from-String style initialisers are added. That motivation is not mentioned in the pitch, and honestly, if that is really the motivation, I'd say it's worth re-evaluating whether those future additions are a good idea.

URLs really are, fundamentally, strings. If the API is so complex that we need literals to give tighter guarantees about which rules the parser will be using, we may have designed the wrong API.

As for @_disfavoredOverload, I don't see it as such a big problem as to warrant this. It's unpleasant, but it exists for a reason and hopefully will be de-underscored soon :neutral_face:. And deprecated APIs are automatically disfavoured IIRC.


This isn't really what I'm talking about: my concern is that developers often ask for compile-time checking, and this would certainly look like it, but it wouldn't really be compile-time checking.

We certainly can make this a precondition, but the "should we?" part needs to take other factors into consideration, such as what a novice developer would make of it. Developers (especially from dynamic languages) often hear that Swift is really great for compile-time checking of code, but when they first start using the language, they don't really understand what that means or what the limitations are.

The value is known at compile-time, but the diagnosis is at run-time. That's why I think it's possibly misleading.

Sure, and there are other syntactically-valid URLs which are ill-formatted or forbidden by the URL's scheme (e.g. https:///apple.com, which is the example I gave previously; in case it isn't clear, the hostname is empty, and the HTTP spec says that is invalid).

I've suggested that perhaps some of those should also be grounds for a runtime error. It would possibly be easier to debug why requests to that URL fail, so I guess I'd weakly be in favour of adding some scheme-specific checks, but you could reasonably argue that it would be too strict.


Again, that isn't really what I'm arguing. My argument is not specific to the ExpressibleByStringLiteral conformance and not generalisable to all types. My argument is that, in the context of what developers have asked for on these forums for this particular type, I don't think it is appropriate to add an initialiser which accepts only a compile-time string, is not failable, and validates its contents at run-time. To understand this argument, it is important to point out that it will live next to a failable initialiser which accepts a string whose value is only known at run-time and does allow for recoverable failure. It's all about the context, and considering the API holistically.

But since you brought it up - your logic seems to suggest that all types which possibly can be expressed as strings should conform to the ExpressibleByStringLiteral protocol. You've excluded built-in types, but that seems to be an artificial limitation - in this respect, the only difference between a built-in type and a library type is that the former is subject to the authorial judgement of the standard library developers and core team, whilst library types are subject to the authorial judgement of their respective creators. So since you're making such a broad argument, let's remove that limitation, which immediately suggests that types such as Int and Bool also should conform to ExpressibleByStringLiteral.

API design does not simply fall out of abstract logic. Not every type which can conform should conform to any particular protocol - including the ExpressibleBy... family of protocols. Despite what the US Supreme Court thinks, API design is a creative process, and like all creative processes, requires an understanding of the target audience (which may range from expert developers, all the way to users who have literally never written any sort of software before in their lives).

My personal understanding of the target audience is formed by observing the comments on these forums and elsewhere, mentoring junior colleagues, etc, and seeing the kinds of things they ask for. Based on that, I think this feature is likely to lead to confusion with actual compile-time/build-time checking, and does not counter that with significant-enough benefits even when used correctly. In other words, it's not worth it IMO.

I agree that expressing URLs with string literals in production code is of limited utility, but I’ve found it quite helpful in tests. There the type can often be inferred, and being able to elide the URL(string:) and the force-unwrap cuts down on visual noise, making it easier to see that the test itself is correct.

Compare:

XCTAssertEqual(someURL, URL(string: "https://www.example.com")!)

to:

XCTAssertEqual(someURL, "https://www.example.com")
1 Like

That's reasonable, although there are a couple of points worth mentioning:

  1. You don't need the force-unwrap

  2. URL(string: "...") is too verbose, even for the failable initialiser. I think everybody would benefit if it was just URL("...")

  3. You could create a custom XCTAssertEqualURLs method to remove even the initialiser (although the method name becomes longer, so it might not be worth it)

  4. Foundation's URL model is overly complex, so the whole idea of "equal URLs" needs some qualification and possibly warrants the XCTest framework bundling an XCTAssertEqualURLs function anyway:

    let urlA = URL(string: "http://example.com/a/b/c")!
    let urlB = URL(string: "/a/b/c", relativeTo: URL(string: "http://example.com")!)!
    let urlC = URL(string: "b/c", relativeTo: URL(string: "http://example.com/a/")!)!
    
    // All of these URLs have the same .absoluteString.
    
    urlA.absoluteString == urlB.absoluteString // true
    urlB.absoluteString == urlC.absoluteString // true
    
    // But they are not interchangeable.
    
    urlA == urlB // false (!)
    urlB == urlC // false (!)
    URL(string: urlB.absoluteString) == urlB // false (!)
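
One possible shape for such a comparison, sketched under the assumption that "equal" should mean "same absolute string". The helper name `haveSameAbsoluteString` is hypothetical; a real assertion might want a stricter or looser notion of equivalence.

```swift
import Foundation

// Hypothetical helper: compares the resolved string forms, ignoring how
// each URL was constructed (base URL plus relative part, or absolute).
func haveSameAbsoluteString(_ lhs: URL, _ rhs: URL) -> Bool {
    lhs.absoluteString == rhs.absoluteString
}

let urlA = URL(string: "http://example.com/a/b/c")!
let urlB = URL(string: "/a/b/c", relativeTo: URL(string: "http://example.com")!)!
let urlC = URL(string: "b/c", relativeTo: URL(string: "http://example.com/a/")!)!

print(haveSameAbsoluteString(urlA, urlB), haveSameAbsoluteString(urlB, urlC))
```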
    

That’s a good point. But couldn’t that potentially be fixed? I get why the parsing behavior of URL can’t change, but this has always felt like a bug to me. I can’t think of any other common Swift type where values that represent the same domain value are non-equal because they were constructed differently. (Aside from NaN.)

I’m making a bit of a broader claim, which is that extension URL: ExpressibleByStringLiteral isn’t some mere workaround desirable for the side effect of “tighter guarantees” for overloading but is the semantic expression in code form and offers the appropriate syntactic niceties of your statement: “URLs really are, fundamentally, strings.”

But since you brought it up - your logic seems to suggest that all types which possibly can be expressed as strings should conform to the ExpressibleByStringLiteral protocol.

Not quite; my logic is, as above, that types which “really are, fundamentally, strings” should conform to that protocol—or else nothing could really conform to that protocol.

One potential hint—though not necessarily pathognomonic—is that a type both can be expressed as a string and can be converted (via an unlabeled converting initializer, typically) from another type that does conform to ExpressibleBy but represents different data as strings. Serious consideration should be given in that case as to whether such a type should conform.

It is not exactly parallel, but let’s keep in mind that removeFirst, popFirst and dropFirst live side-by-side and we trust users to use the one they want. It is not fatal to have related APIs that differ only in error handling.
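
For readers who don't have the three in mind, a short reminder of how they differ, using standard library behaviour only:

```swift
var queue = [1, 2, 3]

// removeFirst(): mutating, returns the element, traps on an empty collection.
let removed = queue.removeFirst()

// popFirst(): mutating, returns nil when empty. It is available on
// collections that are their own SubSequence, such as ArraySlice.
var slice = queue[...]
let popped = slice.popFirst()

// dropFirst(): non-mutating, returns a subsequence, never fails.
let rest = queue.dropFirst()

var emptySlice = ArraySlice<Int>()
let poppedFromEmpty = emptySlice.popFirst()
let droppedFromEmpty = emptySlice.dropFirst()

print(removed, popped as Any, Array(rest), poppedFromEmpty as Any, droppedFromEmpty.isEmpty)
```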

I would call your attention back to our earlier discussion.

As you say, even if syntactically valid, there are scheme-specific requirements too—and it is unlikely that a beginner (or, for that matter, perhaps anyone but the most expert in URL-related matters) is asking specifically for compile-time checking of valid URL syntax as per RFC-this-and-that but not of scheme-specific requirements. Meanwhile, on the other hand, it’d likely be too strict to enforce requirements on URL initializers other than that it’s initialized from a syntactically valid URL string.

Yet even if all users could handle that distinction, a URL with a typo in it can just as well point to a typosquatter’s domain as it can be syntactically invalid or fail to meet a scheme-specific requirement. And if a user actually tries to use that URL, it could be far more devastating if it resolves to an unintended and possibly malicious endpoint than if it fails to be a valid URL at all.

My conclusion from this is that we cannot adequately protect the user from the pitfalls of not proofreading what they’ve typed, whether with compile-time or runtime checks, and regardless of syntax. A user who today dutifully writes guard let url = URL(string: foo) else ... is also misled into an illusion of having thereby averted tragedy like the one who assumes a literal is checked at compile time.

Further, the lack of compile-time evaluation features today doesn’t automatically make things worse—indeed, if we did have compile-time evaluation of URL syntax, those users could be more misled into assuming, when they’ve addressed whatever the compiler has told them to fix in their URL string, it is also correct in other respects which the compiler cannot ever check (the most critical one being “does this URL point to the resource I have in mind instead of a different resource I may have actually typed?”).

I suppose you could conclude that therefore URL should never conform to ExpressibleBy.... However, while good API designs promote correct use and steer away from incorrect use, the bar cannot be that we make even unreasonable misuses impossible.

Strings have long been associated with lack of checking in terms of their contents—take the punny term “stringly typed,” for instance. Not only are users accustomed to typing literal strings with contents that need to adhere to some other syntax that isn’t checked by the compiler, in fact there is actually no situation at all today where Swift looks inside string contents for adherence to non-Swift rules. One might say that these contents are “stringly valued” (sorry). When regexes gain compile-time parsing, they will also gain a new literal type with distinct delimiters. As you say above so pithily, URLs are “fundamentally, strings,” and I do not think it is reasonable to design on the assumption that ordinary users will be naturally misled to assume text in quotation marks without any internal syntax highlighting is going to be adequately checked (for whatever level of adequate) at compile time for some non-Swift syntax simply because the literal is not surrounded by parens or followed by !. That would be counter to everything they’ve done with strings to date.

1 Like

I chose isNotDirectory to signify that's all this API cares about (whether a URL points to a directory or not). IMO isFile might cause more ambiguity because, as you mentioned, an item can be neither a directory nor a regular file. There are other types of "disk items" such as symbolicLink, socket, namedPipe, etc. (see URLFileResourceType). It'd be confusing if a developer has to choose between isDirectory and isFile when the URL is pointing to a symbolic link.
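
For comparison, the existing API already models this as a two-way choice. The sketch below uses only today's `isDirectory:` parameter and `hasDirectoryPath`, not the pitched DirectoryHint; the path is an assumed sample and nothing is read from disk.

```swift
import Foundation

// Assumed sample path; these initializers do not touch the file system
// when isDirectory is given explicitly.
let base = URL(fileURLWithPath: "/tmp/example", isDirectory: true)

let asDirectory = base.appendingPathComponent("Photos", isDirectory: true)
let notDirectory = base.appendingPathComponent("Photos", isDirectory: false)

// The only recorded distinction is "directory" versus "not a directory";
// a symbolic link, socket, or named pipe would all use the second form.
print(asDirectory.hasDirectoryPath, notDirectory.hasDirectoryPath)
print(asDirectory.absoluteString)
```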

I agree with you that the closure approach isn't that much better, and I'm in the process of rethinking this. On the other hand, having access to cwd is very important because Swift is a general-purpose language. For example, one might argue having access to the current working directory is crucial to scripting (at least as of now), and the concerns over thread-safety don't apply to simple scripts. I don't think we should exclude this API just because it might lead to unexpected results in some environments.

2 Likes

I love these ideas. Thanks for the suggestions!

3 Likes

One consideration that should be highlighted: to my knowledge, pthread_chdir_np and pthread_fchdir_np do not exist on non-Darwin platforms, yet a sensible implementation of withCurrentDirectory would eventually result in those calls.
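
For contrast, a portable fallback can be sketched with FileManager alone, with the caveat that it changes the working directory for the whole process and is therefore not thread-safe. The function name `withCurrentDirectorySketch` and the error type are hypothetical; this is not the pitched implementation.

```swift
import Foundation

// Hypothetical error for this sketch.
struct DirectoryChangeFailed: Error {
    let path: String
}

// Hypothetical sketch: switches the process-wide working directory, runs
// the body, then restores the previous directory. Not thread-safe.
func withCurrentDirectorySketch<T>(_ path: String, _ body: () throws -> T) throws -> T {
    let fileManager = FileManager.default
    let previous = fileManager.currentDirectoryPath
    guard fileManager.changeCurrentDirectoryPath(path) else {
        throw DirectoryChangeFailed(path: path)
    }
    defer { _ = fileManager.changeCurrentDirectoryPath(previous) }
    return try body()
}

let before = FileManager.default.currentDirectoryPath
let inside = try withCurrentDirectorySketch(NSTemporaryDirectory()) {
    FileManager.default.currentDirectoryPath
}
print(inside, FileManager.default.currentDirectoryPath == before)
```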

I'm concerned about the conflict with init(_: FilePath). As you note, today this causes undesired behavior where URL("https://www.apple.com") creates a relative file path URL with two components, "https:" and "www.apple.com". However, your proposed change would break anybody currently relying on the interpretation of a string literal as a file path, for example URL("/usr/bin"). (As a FilePath that's file:///usr/bin.) Similarly, URL("My Documents") currently succeeds, translating to My%20Documents.

1 Like

Sometimes I find it difficult to parse everything you're attempting to explain. So at risk of sounding a bit dumb, here it goes.

My understanding is @Karl believes that leaving evaluation of URL("trash") to potentially crash at runtime is confusing, and there's not much benefit over a developer explicitly typing !.

My opinion is this is actually dangerous, and I'll provide an example a bit below. My question to your response though is, why make things even harder on the developer and end user? Yes we can't protect a developer from not proofreading, but we can protect them from writing code that crashes an app, an app that may perform many other functions that are very important and valuable to the end user.

I'll now provide my example of why what you're arguing is dangerous. A pattern I've seen in apps is that a server generates a URL string and sends it in a push notification to an application, which then ingests and uses the URL. I have witnessed an app on another, less friendly mobile platform crash because of implicit crashing behavior when parsing an incorrectly formatted URL string. By contrast, the iOS app did not crash but reported errors that triggered a page, and the server bug was fixed soon after. However, guess which users complained and which didn't? Now imagine this app is vitally important for some arbitrary reason. Would you really choose the code that implicitly crashes in this scenario over the code that asks the developer to fail softly and log an error?
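
The soft-failure pattern described here might look roughly like the following. The function name, the https-only rule, and the logging via print are all assumptions made for illustration; the point is that a run-time string goes through the failable initializer and a bad payload is reported rather than trapped on.

```swift
import Foundation

// Hypothetical handler for a URL string delivered in a push payload.
func destination(fromPayload payload: String?) -> URL? {
    guard let raw = payload,
          let url = URL(string: raw),
          url.scheme == "https" // assumed policy for this sketch
    else {
        // Stand-in for real error reporting.
        print("Ignoring unusable URL in payload: \(payload ?? "<missing>")")
        return nil
    }
    return url
}

let accepted = destination(fromPayload: "https://example.com/inbox/42")
let wrongScheme = destination(fromPayload: "ftp://example.com/file")
let missing = destination(fromPayload: nil)
print(accepted?.path ?? "nil", wrongScheme == nil, missing == nil)
```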

The discussion was about literals.
I see @stackotter has replied to you in another thread about the same point.

The discussion was about literals.

Thank you for clarifying; that assuages part of my concern. However, @stackotter's reply contains:

A URL literal would be required to be known at compile time, meaning that the compiler can throw an error if the URL is invalid

If I have followed correctly, the second phrase here is actually not the case. The compiler isn't checking whether the URL is invalid; that's still checked at runtime, and compile-time evaluation is left for a hypothetical future direction. I get now why @Karl stated that this is confusing: it's already confusing to explain on this forum. I also still stand by my remark that not crashing by default is preferable, for the general reasons I provided.

As I said,

1 Like