Foundation URL Improvements

Oh, and one more thing:

    /// The working directory of the current process.
    /// Calling this property will issue a `getcwd` syscall.
    public static var currentDirectory: URL

I really wish we wouldn't do this. I discussed some of the reasons on this swift-system PR. Basically: the current working directory is process-wide, mutable state, making it unsuitable in multithreaded environments. Many OS vendors (including Apple) have specific APIs for thread-local working directories to work around this.

Now, there are reasons why swift-system might not care about that. Its job is to represent the system APIs as they are, warts and all - but IMO it is dangerous to expose in a high-level library such as Foundation.

Introducing a Task-local current directory is probably out of scope for this proposal, but we shouldn't make things worse by adding sugar for dangerous APIs.

13 Likes

Glad to here such useful improvements!

I was wondering if there was an initializer that allows explicitly setting the encoding strategy of URL components. I had encountered some old-styled APIs that took ; as query separators:

https://example.com/portal/notices;columnsId=123;pageNo=1

I didn't find a way to correctly initialize such a URL in Swift at that time, so I had to deploy a simple proxy for such API. Is such URL allowed today?

In fact, I don't think FilePath has better cross-platform support at the time. FilePath is still a multi-platform API that acts very differently across platforms (between Windows and other OSs).

I had proposed to generalize FilePath in swift-system#67, which will also allow existing FilePath APIs to be used in remote paths (like scp arguments and object storage services). I believe we shouldn't replace any existing APIs with FilePath until it has a stable cross-platform behavior. And once FilePath is generalized, it should also become an individual target instead of remaining a part of SystemPackage.

1 Like

Common Directories as Static URL Properties

I second Karl on this not being a good idea, including because that will introduce a hidden, implicit dependency on the FileSystem. (this is similar to how I feel like Date.now introduces a hidden implicit dependency on the hardware clock for example).
While I understand the appeal (of making it easy to call using the static member lookup aka "dot-leading" syntax), this would have unexpected repercussions imho.

URL should be DTO-like, i.e. just a model object holding data, but not relying on querying external things (like the FileSystem and FileManager) to return its value. Not only because of thread-safety concerns as mentioned by Karl, but for consistency with the URL type's goal and design being a data holder only.

The fact that URL and FileManager are decoupled today is for a good reason imho, with FileManager encompassing the interaction with the file system and being able to check the presence of files on disk, and rely on the File System implementation (macOS vs Windows vs Linux, etc) of the currently mounted root, etc… while URL should be agnostic to all this and should not interact with hardware, system or disk.


That being said, the rest of the suggestions all look like welcome changes, and thanks for opening the discussion on this!

11 Likes

These look great! How about also adding something like this:

public func appending(query: String, value: String?) -> URL
2 Likes

That would be nice, wouldn't it?

Unfortunately, URL.appendingPathComponent just forwards to NSURL.appendingPathComponent, which makes blocking calls to the filesystem. So does URL(fileURLWithPath:) (here).

But I do agree that a URL type should be a pure model type, and that the results of URL manipulation should never depend on the state of the filesystem. That's why I also objected to the recent URL.lines async addition, which continues this trend of blurred lines -- you can request data (and text lines) using a URL, but the URL API should be kept pure and focussed on its task of... basically, string manipulation.

It would be impolite to link it here, but there are alternatives if you'd prefer a true model type for URLs.

13 Likes

I believe the goal of making the language easier through some of these ergonomics decision is on-point.

That said, the URL struct seems to be crowding over responsibilities which are not related to what a URL is. Bringing over FileManager functionality to URL seems to only benefit in the easy syntax access of the directories through static methods. The recently added URL.lines is also an example.

@Karl makes some great points about the use of dedicated tools for the job. If FilePath is a better tool for handling any file directory jobs, let’s normalize that instead of making URL fit too many roles and not doing them all as good.

In my opinion, and if I understand it as I think I am, the codebase of many projects would not be so heavily affected as this proposal implies. Maybe an example would communicate ideas better as your motivation of not going through this alternative.

I clarify in no way I’m saying to make everything break in order to innovate, but to guide developers into implementing better tools for their job over time. Maybe even at first, with optional opt-in.

In another subject, I would love to see a refinement or replacement to the use of an array urlqueryitems to modify a URL’s query.

Ideally, they could be methods like set, append and remove in URL and a URLQuery struct that allows going over, as a sequence, through the query items and even get their value from a O(1) key access.

This would enrich the concept of an URL as a specialized string manipulator which is its main responsibility.

7 Likes

I very warmly welcome the first public Foundation pitch (who would guess we'd read "FOU-NNNN" one day?). I hope to see more in the future :clap:

if there are other straightforward changes we should consider to help

I wonder, @icharleshu, if URL will become Sendable eventually.

16 Likes

Do I understand correctly that these are runtime assertions? Thus URL("invalid URL") would still compile and type-check as non-optional URL without any compile-time errors?

Also Comparable.

It sounds weird - like, surely everybody would have noticed already if URLs didn't conform to Comparable - but it's true! It doesn't conform!

print(URL(string: "http://example-a.com")! < URL(string: "http://example-b.com")!)
// ❌ Referencing operator function '<' on 'Comparable' requires that 'URL' conform to 'Comparable'

I find it really interesting that this isn't a bigger deal.

1 Like

Very excited about this! A huge portion of static URL initializers that I encounter are just immediately force-unwrapped (e.g. URL("https://forums.swift.org")!) so making this the default in StaticString cases like this will be a huge improvement.

5 Likes

This has always bothered me as well, because what if the path doesn't actually exist on disk at this point? Or what if the path isn't readable from the current process permissions? Or what if the items on disk are later replaced by others (e.g. file could turn into a directory).

It seems like building a path url shouldn't depend on the current state of the filesystem indeed.

10 Likes

just as a note: Darwin actually has thread local versions of the backing API for this. Perhaps on Darwin we can do something along those lines?

2 Likes

We are looking into Sendable conformance for a lot of types - I think it would be remiss of us to not make URL adhere to Sendable. However there are some complicating factors with that particular case that don't let it be as easy as I would like.

6 Likes

That would be good, but as a platform abstraction library, I think it's important that this is safe and has predictable behaviour on all systems. For example, for the platform functions, Microsoft says:

Multithreaded applications and shared library code should not use the GetCurrentDirectory function and should avoid using relative path names. The current directory state written by the SetCurrentDirectory function is stored as a global variable in each process, therefore multithreaded applications cannot reliably use this value without possible data corruption from other threads that may also be reading or setting this value.

This limitation also applies to the SetCurrentDirectory and GetFullPathName functions. The exception being when the application is guaranteed to be running in a single thread, for example parsing file names from the command line argument string in the main thread prior to creating any additional threads.

Using relative path names in multithreaded applications or shared library code can yield unpredictable results and is not supported.

It would be pretty harsh if this worked on Darwin, but is just an unsafe, mutable global on Windows (the kind of thing I think we're hoping to eliminate with the future concurrency work in Swift 6+). I think it needs to be safe everywhere.

In terms of semantics, isolating working directories at the thread level doesn't make a whole lot of sense for Swift's concurrency model, because tasks can be suspended and resumed on any thread by their executors. It would be really confusing if you're, say, using an async file API (like Foundation might offer), but your current working directory keeps changing across awaits. I think Task-local is the way to go here.

6 Likes

Doing that will definitely take a decent amount of work to accomplish, but is probably something that should be researched if it can even be done. Since the intrinsics to do so are thread based we need to make sure that thread local storage is donated to the task somehow (and then cleared from that thread?); I am not sure how that can be done just yet.

Alternatively it could be done as a closure: func withCurrentDirectory<R>(_ apply: (URL) throws -> R) rethrows -> R. That would ensure that within that scope it is used correctly.

3 Likes

These additions and changes look great and I hope that some of them can be marked as alwaysEmitIntoClient. :crossed_fingers:

3 Likes

New too Swift but what if instead of using.

we used an optional enum that defaults to .web and you would add something like .file for file path's?

init(_ string: StaticString, .file)

It’s probably out of scope for this but my biggest issue with URL is its strict parsing. The amount of times a feature breaks because suddenly we get a url with spaces that URL refuses to parse it’s not small. While other platforms are fine because their url parsing success :sweat:
I would love to see improvements on this aspect.

Ah.. I didn't know Swift favors String by default. Thank you Becca!