Foundation URL Improvements

I believe the goal of making the language easier to use through some of these ergonomic decisions is on point.

That said, the URL struct seems to be taking on responsibilities that are unrelated to what a URL is. Bringing FileManager functionality over to URL seems to offer little beyond convenient syntax for reaching common directories through static members. The recently added URL.lines is another example.

@Karl makes some great points about using dedicated tools for the job. If FilePath is a better tool for file and directory work, let’s normalize that instead of making URL fill too many roles and do none of them well.

In my opinion, and if I understand this correctly, the codebases of many projects would not be affected as heavily as this proposal implies. Perhaps an example would better communicate your motivation for not pursuing this alternative.

To be clear, I am in no way suggesting that we break everything in order to innovate, only that we guide developers toward better tools for the job over time, perhaps starting with an opt-in.

On another subject, I would love to see a refinement of, or a replacement for, the array of URLQueryItem values used to modify a URL’s query.

Ideally, URL would gain methods like set, append and remove, along with a URLQuery struct that lets you iterate over the query items as a sequence and look up a value by key in O(1).

This would enrich the concept of a URL as a specialized string manipulator, which is its main responsibility.
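
To make the idea concrete, here is a minimal sketch of what such a type might look like. URLQuery and its set, append and remove methods are hypothetical names taken from the suggestion above, not existing Foundation API; the sketch builds only on URLComponents and URLQueryItem, and it assumes that "set" means "replace the first item with that name, or append one".

```swift
import Foundation

/// Hypothetical type, not part of Foundation: an ordered list of query
/// items plus a name index for O(1) average-case lookup by key.
struct URLQuery: Sequence {
    private(set) var items: [URLQueryItem] = []
    // Maps each name to the position of its first item.
    private var indexByName: [String: Int] = [:]

    init(_ components: URLComponents) {
        for item in components.queryItems ?? [] {
            append(item.name, item.value)
        }
    }

    /// Adds a new item at the end, keeping any existing duplicates.
    mutating func append(_ name: String, _ value: String?) {
        if indexByName[name] == nil { indexByName[name] = items.count }
        items.append(URLQueryItem(name: name, value: value))
    }

    /// Replaces the value of the first item named `name`, or appends one.
    mutating func set(_ name: String, to value: String?) {
        if let index = indexByName[name] {
            items[index].value = value
        } else {
            append(name, value)
        }
    }

    /// Removes every item named `name` and rebuilds the index (O(n)).
    mutating func remove(_ name: String) {
        items.removeAll { $0.name == name }
        indexByName.removeAll()
        for (index, item) in items.enumerated() where indexByName[item.name] == nil {
            indexByName[item.name] = index
        }
    }

    /// Value of the first item named `name`, if any.
    subscript(name: String) -> String? {
        indexByName[name].flatMap { items[$0].value }
    }

    func makeIterator() -> IndexingIterator<[URLQueryItem]> {
        items.makeIterator()
    }
}

var components = URLComponents(string: "https://example.com/search?q=swift&page=1")!
var query = URLQuery(components)
query.set("page", to: "2")
query.append("sort", "new")
query.remove("q")
components.queryItems = Array(query)
```

Writing the result back through URLComponents.queryItems keeps percent-encoding in Foundation's hands, which is why the sketch never touches the query string directly.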

7 Likes

I very warmly welcome the first public Foundation pitch (who would guess we'd read "FOU-NNNN" one day?). I hope to see more in the future :clap:

if there are other straightforward changes we should consider to help

I wonder, @icharleshu, if URL will become Sendable eventually.

16 Likes

Do I understand correctly that these are runtime assertions? Thus URL("invalid URL") would still compile and type-check as non-optional URL without any compile-time errors?

Also Comparable.

It sounds weird - like, surely everybody would have noticed already if URLs didn't conform to Comparable - but it's true! It doesn't conform!

print(URL(string: "http://example-a.com")! < URL(string: "http://example-b.com")!)
// ❌ Referencing operator function '<' on 'Comparable' requires that 'URL' conform to 'Comparable'

I find it really interesting that this isn't a bigger deal.
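
In the meantime, a workaround that seems reasonable is to sort by an explicit string key instead of relying on a conformance. This is only a sketch, and comparing absoluteString is an assumption about what ordering is wanted, not something the pitch defines:

```swift
import Foundation

let urls = [
    URL(string: "http://example-b.com")!,
    URL(string: "http://example-a.com")!,
]

// Order URLs by their absolute string form rather than via Comparable.
let sortedURLs = urls.sorted { $0.absoluteString < $1.absoluteString }
```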

1 Like

Very excited about this! A huge portion of the static URL initializers that I encounter are just immediately force-unwrapped (e.g. URL(string: "https://forums.swift.org")!), so making the non-optional initializer the default for StaticString cases like this will be a huge improvement.
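
For anyone who wants this behaviour today, a rough approximation can be written as an extension. The staticString: label is hypothetical and chosen only to avoid colliding with whatever the pitch ends up shipping, and the choice of preconditionFailure as the trap is an assumption:

```swift
import Foundation

extension URL {
    /// Hypothetical stand-in for the pitched initializer: accepts only
    /// string literals and traps at runtime if the literal is not a valid URL.
    init(staticString: StaticString) {
        guard let url = URL(string: staticString.description) else {
            preconditionFailure("Invalid URL literal: \(staticString)")
        }
        self = url
    }
}

// No force-unwrap at the call site; the result is a non-optional URL.
let forums = URL(staticString: "https://forums.swift.org")
```

Note that this is still a runtime check: an invalid literal compiles fine and only fails when the initializer runs.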

5 Likes

This has always bothered me as well, because what if the path doesn't actually exist on disk at this point? Or what if the path isn't readable from the current process permissions? Or what if the items on disk are later replaced by others (e.g. file could turn into a directory).

It seems like building a path URL indeed shouldn't depend on the current state of the filesystem.

10 Likes

Just a note: Darwin actually has thread-local versions of the backing API for this. Perhaps on Darwin we could do something along those lines?

2 Likes

We are looking into Sendable conformance for a lot of types, and I think it would be remiss of us not to make URL adhere to Sendable. However, there are some complicating factors in that particular case that keep it from being as easy as I would like.

6 Likes

That would be good, but as a platform abstraction library, I think it's important that this is safe and has predictable behaviour on all systems. For example, for the platform functions, Microsoft says:

Multithreaded applications and shared library code should not use the GetCurrentDirectory function and should avoid using relative path names. The current directory state written by the SetCurrentDirectory function is stored as a global variable in each process, therefore multithreaded applications cannot reliably use this value without possible data corruption from other threads that may also be reading or setting this value.

This limitation also applies to the SetCurrentDirectory and GetFullPathName functions. The exception being when the application is guaranteed to be running in a single thread, for example parsing file names from the command line argument string in the main thread prior to creating any additional threads.

Using relative path names in multithreaded applications or shared library code can yield unpredictable results and is not supported.

It would be pretty harsh if this worked on Darwin, but is just an unsafe, mutable global on Windows (the kind of thing I think we're hoping to eliminate with the future concurrency work in Swift 6+). I think it needs to be safe everywhere.

In terms of semantics, isolating working directories at the thread level doesn't make a whole lot of sense for Swift's concurrency model, because tasks can be suspended and resumed on any thread by their executors. It would be really confusing if you're, say, using an async file API (like Foundation might offer), but your current working directory keeps changing across awaits. I think Task-local is the way to go here.
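
As a rough illustration of the task-local idea, here is a sketch built on the standard library's @TaskLocal. WorkingDirectory and resolve are hypothetical names, the value is stored as a String path so the sketch does not depend on URL being Sendable, and nothing here changes the process-wide working directory:

```swift
import Foundation

enum WorkingDirectory {
    /// Hypothetical task-local base directory; defaults to the root.
    @TaskLocal static var path: String = "/"
}

/// Resolves a relative path against the task-local base directory
/// instead of the process-wide current directory.
func resolve(_ relative: String) -> URL {
    URL(fileURLWithPath: WorkingDirectory.path, isDirectory: true)
        .appendingPathComponent(relative)
}

// The binding is visible only inside this scope (and in child tasks
// created within it), whichever thread they happen to run on.
let resolved = WorkingDirectory.$path.withValue("/srv/data") {
    resolve("notes.txt")
}
```

Because the binding follows the task rather than the thread, it would stay stable across suspension points, which is the property that thread-local storage cannot offer.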

6 Likes

Doing that will definitely take a decent amount of work, but it is probably worth researching whether it can be done at all. Since the intrinsics for this are thread-based, we would need to make sure that thread-local storage is donated to the task somehow (and then cleared from that thread?); I am not sure how that can be done just yet.

Alternatively it could be done as a closure: func withCurrentDirectory<R>(_ apply: (URL) throws -> R) rethrows -> R. That would ensure that within that scope it is used correctly.
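
Only the signature above comes from the suggestion; the body below is one possible reading of it, written as a sketch. It assumes the function takes a single snapshot of the process's current directory and hands that snapshot to the closure, so everything inside the scope resolves against the same base:

```swift
import Foundation

/// Sketch only: reads the current directory once and passes it to `apply`.
/// It does not make the underlying read itself thread-safe.
func withCurrentDirectory<R>(_ apply: (URL) throws -> R) rethrows -> R {
    let snapshot = URL(
        fileURLWithPath: FileManager.default.currentDirectoryPath,
        isDirectory: true
    )
    return try apply(snapshot)
}

// Every path built inside the closure uses the same base directory,
// even if another thread changes the working directory in the meantime.
let logFile = withCurrentDirectory { cwd in
    cwd.appendingPathComponent("output.log")
}
```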

3 Likes

These additions and changes look great, and I hope that some of them can be marked as @_alwaysEmitIntoClient. :crossed_fingers:

3 Likes

New to Swift, but what if instead of using.

we used an optional enum that defaults to .web, and you would add something like .file for file paths?

init(_ string: StaticString, .file)

It’s probably out of scope for this pitch, but my biggest issue with URL is its strict parsing. The number of times a feature has broken because we suddenly received a URL with spaces that URL refuses to parse is not small, while other platforms are fine because their URL parsing succeeds :sweat:
I would love to see improvements in this area.
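
For reference, the kind of workaround we end up writing looks roughly like this. lenientURL(from:) is a hypothetical helper, and it only handles the space case; anything else is still left to URL(string:) to accept or reject:

```swift
import Foundation

/// Hypothetical helper: percent-encodes spaces before parsing and leaves
/// every other validity decision to URL(string:).
func lenientURL(from raw: String) -> URL? {
    URL(string: raw.replacingOccurrences(of: " ", with: "%20"))
}

let repaired = lenientURL(from: "https://example.com/my file?q=hello world")
```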

Ah.. I didn't know Swift favors String by default. Thank you Becca!

Why do these require Collection instead of Sequence? Also, have you considered requiring C.Element to conform to StringProtocol rather than specifically being a String?
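
To show what I mean, here is a sketch of a looser signature. appendingComponents is a hypothetical name and not the pitched method; it is built only on the existing appendingPathComponent(_:), and it accepts any single-pass Sequence whose elements are strings or substrings:

```swift
import Foundation

extension URL {
    /// Hypothetical helper: appends each element as one path component.
    func appendingComponents<S: Sequence>(_ components: S) -> URL
    where S.Element: StringProtocol {
        components.reduce(self) { partial, component in
            partial.appendingPathComponent(String(component))
        }
    }
}

let base = URL(string: "https://example.com")!
// split(separator:) yields Substring values, which a String-only
// requirement would force callers to convert first.
let nested = base.appendingComponents("a/b/c".split(separator: "/"))
```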

4 Likes

The way I see it, Foundation.URL’s job is to follow the standard exactly. In fact, it should explicitly state that in the documentation.

Anything valid under that standard should be a legal URL, and anything invalid under that standard should not. Further correction, like escaping spaces, should only be used by initializers and methods that explicitly perform it.

If other platforms don’t do that, other platforms are (by that standard) wrong. Any complaints about that should be directed to WHATWG, which Apple is actually a founding member of.

1 Like

Correct. The alternative would be using the existing failable initializer init?(string:).

2 Likes

Would it be possible to apply these changes while simultaneously encouraging that users transition to Swift System, especially if they aren’t relying on resource value caching? I think that’d be the best of both worlds.

1 Like

It seems to me that this update should match what's happening elsewhere in Foundation these days, with a replacement for URLComponents that mimics FormatStrings.

e.g. URL would match up with Date.ParseStrategy in this code, and the format argument would be whatever is the new equivalent of URLComponents. isLenient would take care of the spaces issue mentioned above. (Nothing I can think of would match up with the Date initializer.)

try Date(
  "2022-11-15 @ 7 in the jolly old PM",
  strategy: Date.ParseStrategy(
    format: """
      \(year: .defaultDigits)-\(month: .defaultDigits)-\(day: .defaultDigits)
      @ \(hour: .defaultDigits(clock: .twelveHour, hourCycle: .oneBased))
      in the jolly old \(dayPeriod: .standard(.abbreviated))
      """,
    timeZone: .current,
    isLenient: false
  )
)

2 Likes

The convention is:

  • Swift.assert(_:_:file:line:) is used for serious errors by the programmer
  • Swift.precondition(_:_:file:line:) is used for serious errors by other programmers (as the name implies, mainly violated preconditions)
  • Swift.fatalError(_:file:line:) (and forced unwrapping) is used for serious errors that aren’t the result of programmer error.

By requiring StaticString, the new initializer forces invalid input to fall into the first two categories. It arguably belongs in the second category, but the benefit of skipping assertion checks entirely in release builds probably justifies that.

Personally, I’ve been using Swift.Optional.unsafelyUnwrapped for hard-coded URLs, which has very similar behavior to assertions.
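
As a sketch of how these options differ for a hard-coded URL (hardCodedURL is a hypothetical helper, not the pitched initializer): assert is evaluated only in debug builds, and unsafelyUnwrapped is likewise checked in debug builds but unchecked under -O.

```swift
import Foundation

/// Hypothetical helper combining the two debug-only checks described above.
func hardCodedURL(_ literal: StaticString) -> URL {
    let url = URL(string: literal.description)
    // Debug builds stop here with a readable message; release builds skip it.
    assert(url != nil, "Invalid URL literal: \(literal)")
    // Checked in debug builds, undefined behaviour in -O if the value is nil.
    return url.unsafelyUnwrapped
}

let swiftForums = hardCodedURL("https://forums.swift.org")
```

Swapping the assert for a precondition would keep the check in ordinary release builds as well, at the cost of the extra branch.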

1 Like