Foundation URL Improvements

This has always bothered me as well, because what if the path doesn't actually exist on disk at this point? Or what if the path isn't readable from the current process permissions? Or what if the items on disk are later replaced by others (e.g. file could turn into a directory).

It seems like building a path url shouldn't depend on the current state of the filesystem indeed.

10 Likes

Just as a note: Darwin actually has thread-local versions of the backing API for this. Perhaps on Darwin we could do something along those lines?

2 Likes

We are looking into Sendable conformance for a lot of types - I think it would be remiss of us to not make URL adhere to Sendable. However there are some complicating factors with that particular case that don't let it be as easy as I would like.

6 Likes

That would be good, but as a platform abstraction library, I think it's important that this is safe and has predictable behaviour on all systems. For example, for the platform functions, Microsoft says:

Multithreaded applications and shared library code should not use the GetCurrentDirectory function and should avoid using relative path names. The current directory state written by the SetCurrentDirectory function is stored as a global variable in each process, therefore multithreaded applications cannot reliably use this value without possible data corruption from other threads that may also be reading or setting this value.

This limitation also applies to the SetCurrentDirectory and GetFullPathName functions. The exception being when the application is guaranteed to be running in a single thread, for example parsing file names from the command line argument string in the main thread prior to creating any additional threads.

Using relative path names in multithreaded applications or shared library code can yield unpredictable results and is not supported.

It would be pretty harsh if this worked on Darwin, but is just an unsafe, mutable global on Windows (the kind of thing I think we're hoping to eliminate with the future concurrency work in Swift 6+). I think it needs to be safe everywhere.

In terms of semantics, isolating working directories at the thread level doesn't make a whole lot of sense for Swift's concurrency model, because tasks can be suspended and resumed on any thread by their executors. It would be really confusing if you're, say, using an async file API (like Foundation might offer), but your current working directory keeps changing across awaits. I think Task-local is the way to go here.
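To make the task-local idea concrete, here is a minimal sketch. The `WorkingDirectory` type and its members are made up for illustration and are not Foundation API, and storing the directory as a `String` path is my own assumption, chosen so the value is trivially `Sendable`.

```swift
import Foundation

// Hypothetical namespace for illustration; nothing here is Foundation API.
enum WorkingDirectory {
    // Stored as a String path (an assumption of this sketch) so the value is
    // Sendable without relying on URL's own conformance.
    @TaskLocal static var path: String = "/"

    // Resolve a relative path against the task's directory. Passing
    // isDirectory explicitly avoids consulting the file system.
    static func resolve(_ relativePath: String) -> URL {
        URL(fileURLWithPath: path, isDirectory: true)
            .appendingPathComponent(relativePath, isDirectory: false)
    }
}

// The binding lasts only for the closure, and it travels with the task
// rather than with whichever thread happens to run it.
let resolved = WorkingDirectory.$path.withValue("/work/project") {
    WorkingDirectory.resolve("notes.txt")
}
```

Because the value is bound to the task, a suspension inside the closure would not change what `resolve` sees, which is the property thread-local storage cannot give you.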

6 Likes

Doing that will definitely take a decent amount of work, but it is probably worth researching whether it can be done at all. Since the intrinsics for this are thread-based, we would need to make sure that thread-local storage is donated to the task somehow (and then cleared from that thread?); I am not sure how that can be done just yet.

Alternatively it could be done as a closure: func withCurrentDirectory<R>(_ apply: (URL) throws -> R) rethrows -> R. That would ensure that within that scope it is used correctly.
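A rough sketch of what that closure-based shape might look like, using the signature above. The body is only my guess at one possible implementation: it reads the process-wide directory once through `FileManager` and hands the closure a fixed snapshot. It does not make the underlying read thread-safe on platforms where that state is a mutable global.

```swift
import Foundation

// Hypothetical helper using the signature suggested above.
func withCurrentDirectory<R>(_ apply: (URL) throws -> R) rethrows -> R {
    // Read the process-wide directory exactly once; everything inside the
    // closure then works from this snapshot instead of re-reading it.
    let snapshot = URL(
        fileURLWithPath: FileManager.default.currentDirectoryPath,
        isDirectory: true
    )
    return try apply(snapshot)
}

// Usage: build a URL relative to the snapshot without touching the disk.
let config = withCurrentDirectory { cwd in
    cwd.appendingPathComponent("config.json", isDirectory: false)
}
```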

3 Likes

These additions and changes look great and I hope that some of them can be marked as @_alwaysEmitIntoClient. :crossed_fingers:

3 Likes

New to Swift, but what if instead of using

we used an optional enum that defaults to .web, and you would add something like .file for file paths?

init(_ string: StaticString, .file)

It’s probably out of scope for this, but my biggest issue with URL is its strict parsing. The number of times a feature has broken because we suddenly got a URL with spaces that URL refuses to parse is not small, while other platforms are fine because their URL parsing succeeds :sweat:
I would love to see improvements in this area.

Ah.. I didn't know Swift favors String by default. Thank you Becca!

Why do these require Collection instead of Sequence? Also, have you considered requiring C.Element to conform to StringProtocol rather than specifically being a String?
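For illustration, a generalised version could look roughly like this. `appendingComponents` is a hypothetical name I picked to avoid implying anything about the pitched API; the point is only that a single pass over a `Sequence` of any `StringProtocol` elements is enough.

```swift
import Foundation

extension URL {
    // Hypothetical method, not the pitched API: accepts any Sequence whose
    // elements are String, Substring, or another StringProtocol type.
    func appendingComponents<S: Sequence>(_ components: S) -> URL
    where S.Element: StringProtocol {
        var result = self
        for component in components {
            // isDirectory is passed explicitly so nothing is looked up on disk.
            result.appendPathComponent(String(component), isDirectory: false)
        }
        return result
    }
}

// Substrings from split(separator:) work without converting to [String] first.
let base = URL(string: "https://example.com/api")!
let usersURL = base.appendingComponents("v1/users/42".split(separator: "/"))
```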

4 Likes

The way I see it, Foundation.URL’s job is to follow the standard exactly. In fact, it should explicitly state that in the documentation.

Anything valid under that standard should be a legal URL, and anything invalid under that standard should not. Further correction, like escaping spaces, should only be used by initializers and methods that explicitly perform it.
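As a sketch of what "explicitly perform it" could mean: the initializer below is hypothetical (the `escapingSpacesIn:` label is mine), and it only handles literal spaces. The correction is opt-in and visible at the call site, while the plain initializer is left alone.

```swift
import Foundation

extension URL {
    // Hypothetical initializer, not part of Foundation: the caller asks for
    // spaces to be percent-encoded before the string is parsed.
    init?(escapingSpacesIn string: String) {
        self.init(string: string.replacingOccurrences(of: " ", with: "%20"))
    }
}

let escaped = URL(escapingSpacesIn: "https://example.com/my file.txt")
```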

If other platforms don’t do that, other platforms are (by that standard) wrong. Any complaints about that should be directed to WHATWG, which Apple is actually a founding member of.

1 Like

Correct. The alternative would be using the existing failable initializer init?(string:).

2 Likes

Would it be possible to apply these changes while simultaneously encouraging that users transition to Swift System, especially if they aren’t relying on resource value caching? I think that’d be the best of both worlds.

1 Like

It seems to me that this update should match what's happening elsewhere in Foundation these days, with a replacement for URLComponents that mimics FormatStrings.

e.g. URL would match up with Date.ParseStrategy in this code, and the format argument would be whatever is the new equivalent of URLComponents. isLenient would take care of the spaces issue mentioned above. (Nothing I can think of would match up with the Date initializer.)

try Date(
  "2022-11-15 @ 7 in the jolly old PM",
  strategy: Date.ParseStrategy(
    format: """
      \(year: .defaultDigits)-\(month: .defaultDigits)-\(day: .defaultDigits)
      @ \(hour: .defaultDigits(clock: .twelveHour, hourCycle: .oneBased))
      in the jolly old \(dayPeriod: .standard(.abbreviated))
      """,
    timeZone: .current,
    isLenient: false
  )
)

2 Likes

The convention is:

  • Swift.assert(_:_:file:line:) is used for serious errors by the programmer
  • Swift.precondition(_:_:file:line:) is used for serious errors by other programmers (as the name implies, mainly violated preconditions)
  • Swift.fatalError(_:file:line:) (and forced unwrapping) is used for serious errors that aren’t the result of programmer error.

By requiring StaticString, the new initializer forces invalid input to fall into the first two categories. It arguably belongs in the second category, but the benefit of skipping assertion checks entirely in release builds probably justifies that.
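A minimal sketch of such an initializer, for reference. The `staticString:` label is hypothetical and only there to keep this distinct from whatever spelling the pitch ends up with, and I've used `preconditionFailure` here; swapping in `assertionFailure` would give the "skipped in release builds" behavior discussed above.

```swift
import Foundation

extension URL {
    // Hypothetical spelling. StaticString can only come from a literal, so a
    // parse failure here is a mistake in the source code, not bad user input.
    init(staticString: StaticString) {
        guard let url = URL(string: staticString.description) else {
            preconditionFailure("Invalid URL literal: \(staticString)")
        }
        self = url
    }
}

// No optional to unwrap at the call site.
let home = URL(staticString: "https://swift.org")
```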

Personally, I’ve been using Swift.Optional.unsafelyUnwrapped for hard-coded URLs, which has very similar behavior to assertions.

1 Like

Foundation URL is not built to conform to the WHATWG URL Standard.

It isn't mentioned in the Swift URL docs, but NSURL's documentation mentions that it is built to conform to RFC-1738. URLComponents/NSURLComponents actually conforms to a different standard, RFC-3986, so there can be behaviour differences between them. URLs have been in a pretty sorry state for a long time, in terms of predictable behaviour and standardisation.

The WHATWG standard is something entirely different. It's the web's model of URLs - which is really what everybody wants and expects anyway, but there wasn't any formal description of how it should work, and browsers and URL libraries all behaved differently.

But it is very different. As a frequent contributor to that standard (even the latest commit is one of my contributions), I don't think it's feasible to retrofit on to Foundation URL's design (I have tried) -- or at least, it's a substantial enough difference that it's a very awkward fit, but also offers significant opportunity to rethink the design at a more fundamental level.

9 Likes

I was wondering about that. In that case, it should explicitly conform to RFC 1738 and follow that.

I believe one of the problems with the old RFCs is that the standards themselves are ambiguous.

One of the other problems is that literally nobody follows the spec, so it can lead to some surprising results that are incompatible with other implementations following different standards.

When I say "URLs have been in a pretty sorry state for a long time", I mean it. It's kind of shocking when you think of how much critical infrastructure depends on them.

4 Likes

I’ve come to view Foundation as a bit of a backwards-compatible minefield: a lot of it is extremely powerful and useful, but a lot of it is also the victim of poor/obsolete design decisions, immense technical debt, and poorly-documented or surprising behavior.

I think it’d be helpful if Apple started pushing the modern Swift packages as the first resort for their niche, even in code that is specific to the Apple ecosystem. Sort of like how everyone is encouraged to migrate to SwiftUI when feasible.

In this case, that means actively trying to redirect people to use WebURL once that is stable.

3 Likes

Thank you for bringing this up!


Is international domain name support on the table (RFC 5890 / 5891, etc.)?
As of today this code doesn't work:

let urlString = "https://www.ελ:443/public/?x=1"
let url = URL(string: urlString)!

And if you start with the URL string, you first have to parse it into components manually, then feed the components individually to the URLComponents machinery to finally get the resulting URL:

let urlString = "https://www.ελ:443/public/?x=1"
// let url = URL(string: urlString)! // this doesn't work

// as a workaround:
// parse urlString manually, and derive scheme / host / port / path / query:
// let scheme = ... some function of urlString
// let host = ... some function of urlString
// let port = ... some function of urlString
// let path = ... some function of urlString
// let query = ... some function of urlString

var c = URLComponents()
c.scheme = scheme // https
c.host = host // "www.ελ"
c.port = port // 443
c.path = path // "/public/"
c.query = query // "x=1"
let url = c.url!
print(url) // "https://www.%CE%B5%CE%BB:443/public/?x=1"
print(url.host!) // "www.ελ"

This is nice and cool. Can you go one step further?

// After
let photoURL = baseURL.appending(paths: ["Photos", "\(id.first!)", id])

// After after
let photoURL = baseURL + "Photos" + id.first! + id

Using a universal URL type is a double-edged sword and leads to a non-type-safe API. There are APIs that only accept file URLs, and if you feed them an http URL you'll get a runtime error. Obviously it would be nicer if that were a compile-time error. Example:

 let webUrl = URL(string: "https://file-examples-com.github.io/uploads/2017/11/file_example_MP3_700KB.mp3")!
 let player = try! AVAudioPlayer(contentsOf: webUrl)
 // ?!?!  [plugin] AddInstanceForFactory: No factory registered 
 // for id <CFUUID 0x6000031f85c0> F8BB1C28-BAE8-11D6-9C31-00039315CD46

I'd seriously consider splitting the two worlds, while still providing a way to support both types of resources for those APIs that can. And do this in a type-safe manner.
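To illustrate the kind of split I mean, here is a small sketch. `FileURL` and `loadAudio(from:)` are hypothetical names, not existing API; the wrapper can only be constructed from a file URL, so an API that takes it cannot be handed an http URL by accident.

```swift
import Foundation

// Hypothetical wrapper: a value of this type always holds a file URL.
struct FileURL {
    let url: URL

    // Fails for anything that is not a file URL.
    init?(_ url: URL) {
        guard url.isFileURL else { return nil }
        self.url = url
    }

    // Builds a file URL from a path without consulting the disk.
    init(path: String) {
        self.url = URL(fileURLWithPath: path, isDirectory: false)
    }
}

// A hypothetical file-only API states its requirement in the signature.
func loadAudio(from file: FileURL) -> String {
    "loading \(file.url.lastPathComponent)"
}

let web = URL(string: "https://example.com/song.mp3")!
let rejected = FileURL(web)                    // nil: not a file URL
let local = FileURL(path: "/music/song.mp3")
let status = loadAudio(from: local)
```

APIs that really can handle both kinds of resource would keep taking a plain `URL`.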


URL caching is rather mysterious and not obvious. From the docs:

What if the cached resource has changed since it was cached? Will the cache be properly invalidated? Does cache invalidation depend on how the change is made?

Does that mean the cache is very, very temporary, and the cached entry will not be there on the next callout of the event loop, say, some 0.01 seconds after the resource was placed in the cache?!?!