[API Review] FilePath Syntactic APIs

I think that if you consistently separate Root from Component everywhere, then consistent use of components in the FilePath APIs would not cause ambiguity, and you wouldn't have to preface every use of the word "component" by "relative." Alternatively, it would be reasonable to just call it "relative component" everywhere.

4 Likes
I had a long detailed thing about Path APIs and my experience with working on them. Then I realized this is about the System package. Read more here if you're interested.

I am very excited to see this API get proposed, as the lack of a good path API has been one of Swift's most obvious sore points. String isn't powerful enough, and URL is just completely the wrong thing to use.

In my own code, I've developed my own path type, and based on that experience I have a couple of points of feedback on this proposal as it stands:

  • I think the name FilePath is the wrong name. Paths are useful beyond their application to a file. A URL, for example, has a .path, but that isn't necessarily a file path. Paths can be used to refer to traversals through named nodes, or as a resource identifier. I think it would be a mistake to see this API introduced as FilePath, because it implies unsuitability for all other applications of paths, leaving developers to wonder if 1) they're using the wrong thing or 2) they should be using something inferior (String/URL).

  • Because of the applicability of paths to stuff beyond the filesystem, I don't think a proper Path type should have any built-in notions of how the path applies to the file system. Just like a Date has no intrinsic knowledge of calendars nor timezones, a Path shouldn't have any intrinsic knowledge of volumes, drives, roots, file system separators, and so on. Instead, a Path should be interpreted by another object (eg, FileManager), at which point those considerations can be applied. This eliminates the need for over-burdening the public API with use-case specific information.

  • One thing I haven't seen (or may have missed) is using a distinct type for relative paths. In my own code, I have a Path protocol, with two concrete adopters: AbsolutePath and RelativePath. Having distinct types makes it very easy to know what I need to be providing to an API. Appending always takes a RelativePath. Looking things up on the filesystem always takes an AbsolutePath. Pushing a current directory can take either. Having a RelativePath type makes working with consistent structures much easier. For example, if I'm working with lots of content bundles in an application, I can construct some standard relative paths (ex ./Contents/Resources, ./Contents/Frameworks, ./Info.plist) and then quickly apply them to all of my available bundles.

With this approach, some things become clearer:

  • The need for any sort of Root component is eliminated, because rootedness is only applicable to an AbsolutePath
  • PathComponents becomes a pretty straight-forward enum with three cases: .this (.), .up (..), and .item(String). A Path is little more than an array of PathComponents.
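To make the shape of that design concrete, here is a minimal sketch of how I read it. Every name here (PathComponent, Path, RelativePath, AbsolutePath, appending, text) is hypothetical and exists neither in System nor in the standard library; the sketch assumes "/" as the only separator and does no normalization.

```swift
// Hypothetical sketch of the three-case component enum described above.
enum PathComponent: Equatable {
    case this            // "."
    case up              // ".."
    case item(String)    // any other name

    init(_ text: String) {
        switch text {
        case ".": self = .this
        case "..": self = .up
        default: self = .item(text)
        }
    }

    var text: String {
        switch self {
        case .this: return "."
        case .up: return ".."
        case .item(let name): return name
        }
    }
}

// A path is little more than an array of components.
protocol Path {
    var components: [PathComponent] { get }
}

struct RelativePath: Path, Equatable {
    var components: [PathComponent]

    init(_ text: String) {
        components = text.split(separator: "/").map { PathComponent(String($0)) }
    }
}

struct AbsolutePath: Path, Equatable {
    var components: [PathComponent]

    // Assumption: the input starts at the root, so the leading "/" is implied.
    init(_ text: String) {
        components = text.split(separator: "/").map { PathComponent(String($0)) }
    }

    // Appending only accepts a RelativePath, so "absolute + absolute"
    // is rejected by the type checker instead of at run time.
    func appending(_ other: RelativePath) -> AbsolutePath {
        var copy = self
        copy.components += other.components
        return copy
    }

    var text: String {
        "/" + components.map { $0.text }.joined(separator: "/")
    }
}

// One standard relative path applied to several bundles.
let resources = RelativePath("./Contents/Resources")
let bundles = [AbsolutePath("/Applications/A.app"), AbsolutePath("/Applications/B.app")]
let resourceDirs = bundles.map { $0.appending(resources).text }
```

Rootedness lives only in the AbsolutePath type here, which is why no Root component shows up in the enum.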

I'm intrigued that this API exists, as I've not really had occasion to use the System package. My big piece of feedback is...

What would it take to get this considered for the Standard Library as an actual Path API, and not as part of the System package? Paths are useful for much much more than measly file system operations. :smiley:

3 Likes

What's the distinction that you perceive between these two things?

(This isn't to say that there's no difference, but rather I'm interested in which differences are relevant to you.)

Visibility → The system package (so far) is not an "on by default" package and must be manually imported. This makes it far less likely to actually be used and gives the impression that it's not universally useful. This is in contrast to URL, which lives in the standard library and is therefore implied to be universally useful. Since URL (the current unsatisfactory Path type) is in the standard library, why not put a more appropriate and broadly applicable Path type there as well?

Applicability → The package name "system" implies that it's referring to the operating system. Paths are useful beyond their usage by an operating system's filesystem.

Layering → We have path-like types in a lower layer, so why would this API live at a different layer?

1 Like

A point of clarification: URL does not live in the standard library; it's in Foundation, which--like System--must be explicitly imported. This feeds into your layering point as well; System is a lower-level module¹ than Foundation, and is the lowest-level module providing a path-like thing as far as I know.

¹ I don't think that there's an explicit dependency of Foundation on System yet, but Foundation absolutely sits above the C-language API that System binds for Swift, and I expect that this dependency will exist in the fullness of time.

9 Likes

One way to do this would be to have the choice of path styles be based on the root—which is really a statement about string conversion:

enum Root {
  case posix // "/"
  case dos(String) // "C:\", limiting to Character possible too
  case unc(server: String, share: String) // "\\server\share\"
}

// in practice they won't actually be Strings, since we'd want to have only one underlying buffer

One downside of the Root+Components model that I'm only thinking of now is that it doesn't allow for DOS "C:relative\path" paths unless you consider "C:" a root, which would result in a world where "has a root" ≠ "is absolute". Fortunately these paths are pretty much useless in a modern world so I think "not supported by System.FilePath" is a valid answer, but that would at least have to be documented.

The other place where this falls down is for parsing, where "foo\bar/baz" is two components in a Unix path but three components in a Windows path. I'm not sure whether the default behavior should be "the current OS" (what you want up until it isn't), "always POSIX" (more restricted and so more likely to be caught in testing), or "always Windows" (more generous behavior but could cause problems when someone finally puts a backslash in a path), but you should always be able to override it with an initializer parameter.
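A rough sketch of the parsing difference, with the style passed as an overridable parameter. PathStyle, current, and splitComponents are made-up names for illustration, not anything in System, and the "current OS" default is just one of the three options listed above.

```swift
// Hypothetical: which characters count as separators when parsing.
enum PathStyle {
    case posix
    case windows

    // One possible default: follow the platform being compiled for.
    static var current: PathStyle {
        #if os(Windows)
        return .windows
        #else
        return .posix
        #endif
    }
}

// Splits a path string into components; the caller can always override the style.
func splitComponents(_ text: String, style: PathStyle = .current) -> [String] {
    let isSeparator: (Character) -> Bool
    switch style {
    case .posix:
        isSeparator = { $0 == "/" }                  // backslash is an ordinary character
    case .windows:
        isSeparator = { $0 == "/" || $0 == "\\" }    // both are separators
    }
    return text.split(whereSeparator: isSeparator).map { String($0) }
}

let input = "foo\\bar/baz"
let posixParts = splitComponents(input, style: .posix)       // ["foo\\bar", "baz"]
let windowsParts = splitComponents(input, style: .windows)   // ["foo", "bar", "baz"]
```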

I'd absolutely shy away from thread-local or task-local storage here; aside from slowing down otherwise CPU-and-memory-bound operations, I don't think it's a good idea to introduce a situation where different parts of a process are implicitly dealing with paths differently.

[EDIT: I went through this whole exercise without thinking about relative paths. Back to the drawing board…though I think the idea of having an underlying Form enum containing all the roots plus relativePOSIX and relativeDOS isn't the worst.]

The main other use I can think of is as part of a URL. What else are you thinking of?

  1. As far as I can tell, this FilePath doesn't expand ~ automatically, is that correct?

    This is how NSURL, Ruby's File and Pathname work, but personally, I think it's very non-intuitive, and bad UX. I've run into several pieces of software that misbehave because the developer forgot to call expandingTildeInPath. I don't think that a currency path type should distinguish between expanded and intact tildes, it's just a trap.

  2. I'm not a fan of starts(with:) and ends(with:). My guess is that you're trying to distinguish these from the naming conventions established by Swift.String, such as hasPrefix. However, I think we should use domain language instead, like:

    • "/usr".isParent(of: "/usr/bin")
    • "/usr/bin".isChild(of: "/usr")
  3. How does stem work with multiple extensions, like foo.tar.gz? Is the result foo.tar, or just foo?
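For what it's worth, the domain-language spelling in point 2 could be a thin layer over a component-wise starts(with:). This is only a sketch: isParent(_:of:) and isChild(_:of:) are the hypothetical names from the suggestion, written as free functions over plain strings, "parent" is read as "any lexical ancestor", and both inputs are assumed to be absolute POSIX-style paths.

```swift
// Splits on "/" only; an assumption made to keep the sketch small.
func pathComponents(_ path: String) -> [String] {
    path.split(separator: "/").map { String($0) }
}

// True when `parent` is a strict lexical ancestor of `child`.
func isParent(_ parent: String, of child: String) -> Bool {
    let parentParts = pathComponents(parent)
    let childParts = pathComponents(child)
    // Sequence.starts(with:) compares whole components, not characters.
    return childParts.count > parentParts.count && childParts.starts(with: parentParts)
}

func isChild(_ child: String, of parent: String) -> Bool {
    isParent(parent, of: child)
}

let usrOwnsBin = isParent("/usr", of: "/usr/bin")          // true
let usrOwnsUsrlocal = isParent("/usr", of: "/usrlocal")    // false, unlike a string hasPrefix check
```

The second example is the case where comparing components and comparing characters disagree.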

What about recognizing or normalizing separators? \ is part of a component on Unix but a separator on Windows.

This proposal does, along with C++17 and C#.

Yes, that is the world we live in. Rust does use the term "root" to only mean the \ separator and "prefix" to refer to anything before it. But, even there, \foo\bar is not absolute.

My knee-jerk reaction is to support these, but I don't have a good intuition of Windows best practices. @compnerd?

However, we don't support legacy DOS devices at the library level (though they would pass through to a syscall). The latter combined with separator normalization does mean that you can't use a trailing separator to distinguish between a legacy device and a folder/file named after a legacy device, but that's probably for the better. This is also consistent with C#, FWIW, though they don't strip a trailing separator.

Or, in the future we stick a bit on FilePath and introduce UnixPath and WindowsPath, likely conforming to some common protocol. Or, perhaps even make one a typealias depending on platform, haven't thought that through though.

2 Likes

No, System is a library for systems level programming. Shell expansion happens, well, in the shell prior to a syscall, so we wouldn't want to automatically expand these. We will be adding support for accessing the environment in the future which would include variable and tilde expansion. But that should remain an explicit operation, as its behavior is dependent on the environment.

Nope, they're from BidirectionalCollection. No strings involved.

foo.tar. Some API for traversing the "components" of a component is future work (which I'll call out explicitly in the next version of the proposal). I'll also update the comment to show this example, thanks for pointing that out.
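As a sketch of that rule (not the actual implementation): the stem is everything before the last dot. The free function stem(of:) and its treatment of a leading-dot name such as .bashrc as having no extension are assumptions made for illustration.

```swift
// Returns the component without its final extension.
func stem(of component: String) -> String {
    // Assumption: a name with no dot, or whose only dot is the first
    // character, has no extension and is returned unchanged.
    guard let dot = component.lastIndex(of: "."), dot != component.startIndex else {
        return component
    }
    return String(component[..<dot])
}

let archiveStem = stem(of: "foo.tar.gz")   // "foo.tar"
```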

2 Likes

@Michael_Ilseman

But that should remain an explicit operation, as its behavior is dependent on the environment.

Why is that? It seems needlessly error-prone to me. What usecase is there for distinguishing ~/Desktop from /Users/User/Desktop? A failure to expand the first into the second is just a programming error (IMO), so why would we want to allow it to go unfixed?

Yep, makes sense, I figured you were trying to distance yourself from the terminology used by String. But I think you missed this part:

The result of this expansion completely depends on which user executes the expansion. ~/Desktop will expand to /Users/User1/Desktop when executed as User1 (I'm not sure if the expansion would actually read from the USER environment variable, but that would make sense), and to /Users/User2/Desktop when executed as User2. A call to setuid could change the current user between multiple expansion calls, which would lead to different results. This may not be desirable in a lot of cases. I personally would prefer a library implementing this to be explicit about it, and definitely would not expect such a low-level library as Swift System to implement this at all.

6 Likes

You can have a file or folder named ~. Auto-expansion would yield surprising behavior in programmatic code. I do agree that a UI/CLI would normally want to expand the tilde prior to making any syscalls.

Tilde expansion is dependent on environment variables, which can be modified during program execution. We want programmatic consistency during purely syntactic operations, such as creating a path.

Even worse, reading the value of an environment variable can be a data race. We don't want to make FilePath.init's behavior dependent on what other threads in the process are doing.
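To illustrate what "explicit" could look like, here is a hedged sketch in which the caller supplies the home directory, so the expansion itself stays purely syntactic and never reads the environment. expandingTilde(in:home:) is a hypothetical helper, not an existing or planned System API, and it deliberately leaves ~user forms alone.

```swift
// Expands a leading "~" using a home directory provided by the caller.
func expandingTilde(in path: String, home: String) -> String {
    if path == "~" { return home }
    if path.hasPrefix("~/") {
        // Drop only the tilde; the "/" that follows it is kept.
        return home + String(path.dropFirst(1))
    }
    // A "~" anywhere else is an ordinary file or folder name.
    return path
}

let asUser1 = expandingTilde(in: "~/Desktop", home: "/Users/User1")   // "/Users/User1/Desktop"
let asUser2 = expandingTilde(in: "~/Desktop", home: "/Users/User2")   // "/Users/User2/Desktop"
let literal = expandingTilde(in: "./~/Desktop", home: "/Users/User1") // unchanged
```

The same input gives different results per user, which is the reason for keeping the step visible at the call site.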

@lukasa, how would server-side programming view automatic and implicit expansion of tilde?

Clients of System are encouraged to wrap FilePath with a type ensuring/asserting on their notion of "canonical". This is highly use-case dependent, as mentioned in the proposal.
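A sketch of what such a wrapper might look like. CanonicalPath is a made-up type that wraps a plain string rather than FilePath to stay self-contained, and its rule set (absolute, no "." or ".." components, no doubled or trailing separators) is only one possible notion of "canonical".

```swift
// Hypothetical client-side wrapper that enforces one notion of "canonical".
struct CanonicalPath {
    let text: String

    // Fails instead of silently repairing the input.
    init?(_ text: String) {
        guard text.hasPrefix("/") else { return nil }   // must be absolute
        if text != "/" {
            let parts = text.dropFirst().split(separator: "/", omittingEmptySubsequences: false)
            for part in parts where part.isEmpty || part == "." || part == ".." {
                return nil   // doubled or trailing separator, or a dot component
            }
        }
        self.text = text
    }
}

let accepted = CanonicalPath("/usr/bin")
let rejected = CanonicalPath("/usr/../bin")
```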

6 Likes
➜  test  mkdir -p \~/Desktop
➜  test  cd \~/Desktop
➜  ~  pwd
/Users/avi/test/~/Desktop

Yes, in the shell one must escape the ~, but in a Swift script or program, I would not expect to have to do so. Shell expansion should not be part of a low-level API.

4 Likes

As you said upthread, the answer is contextual and depends very much on where you use it, but in general it would be considered surprising.

In general, shell expansion on paths is not something we'd expect to see in the lowest-layer path API. It's important to have an API that says "treat this the way a shell would", but the more magic you allow in your path API the more security risk you open yourself up to. In this context ~ is effectively a magic spelling of ${HOME}, and I don't think anyone would propose that arbitrary environment variable injection into paths is a good idea either.

So I agree with the rest of the thread: expanding ~ at this API level is an anti-feature.

9 Likes

I wish I had the time and energy to review the proposal and remember what is good about Path.swift.

Certainly, as usual with my open source, I designed it for me; however, there are some nice things:

  • Always absolute paths (avoids a whole category of potentially devastating bugs where the developer doesn't realize a path they have is relative but tries to use it anyway)
  • Chainable syntax (typically you have to do a series of operations, though this is perhaps less relevant if the API does not provide copy, move, delete, etc.). Due to how Swift works, this also leads to a single try, which is pleasant.
  • Separate functions for copying (moving, etc.) files into directories versus to files (again, prevents common bugs where a directory exists that you didn't expect, shell scripts suffer here all the time and NSFileManager does the same as the shell)
  • Always normalizing paths (I see this is done)
  • Codable implementation can take relative paths (avoids bugs where paths change, eg. app launches after reinstall, or username change, edge case bugs, but I aim for APIs that are as robust as possible)

There are some other nice features that an official API shouldn't have (e.g. operator /, @dynamicMember paths), so I didn't mention them.

In general I was quite thoughtful about every aspect; however, I wrote this years ago, so I cannot remember all the little details.

I will try to watch this proposal and be useful, thanks.

Another recommendation: don't aim to cover 100% of what people will ask for; doing the basics will suffice. The community will then fill in the gaps with extensions, some of which will be great and can then become future proposals.

4 Likes

Do you have some concrete examples of this? Most of what I've come up with would be better handled by a lexicallyStarts(with:). There is a point where we don't want to duplicate every API with a lexicallyFoo variant, but if lexicallyContains is the right tool and covers the important cases, we can definitely add it.

My biggest objection to that method is simply its name. Its name does not, IMO, clearly communicate what it does. This minimises the odds that it will occur to users to call that function when we want them to call it.

In this instance I think I'm proposing a special-case, where an alternative name would be preferred solely to make it more discoverable for users.

Version 2 of this proposal can be found here. This thread was tremendously helpful and I want to thank everyone who took the time to review and comment on this API.

Changes:

  • Spin off FilePath.Root from Component
    • Provides a much clearer separation of API
    • Allows for many corner cases to be handled by the type system
    • FilePath.Root can in the future be a namespace for Windows root analysis
  • FilePath.ComponentView is now also a RangeReplaceableCollection
    • Standard Swift algorithms operate over homogeneous components of a path
  • FilePath.Component has a Kind enum, illustrating mutual exclusivity
  • relativePath was renamed removingRoot()
    • Consistent with other "removing" APIs
    • More precise on Windows (where a rooted path can be relative)
  • Rename basename/dirname to lastComponent and removingLastComponent()
    • Include @available(*, unavailable, renamed:) entries for discoverability
    • removeLast is now removeLastComponent (it doesn't remove a root)
  • append overhaul:
    • append now only takes Components, so there are no roots involved
    • append overload taking a String for common stringy treatment of paths
      • Ignores a leading separator if needed
      • Will be preferred for string literal arguments
    • push is introduced for the common cd-like semantics
      • aka join in Python, push in Rust, Combine in C#, append in C++17
  • -ing variants of everything introduced for expression chaining
    • __consuming and __owned annotations added to make these efficient
  • Add lexicallyResolving(), a secure-ish append over untrusted subpaths
  • Add CTypes empty enum to serve as a namespace for C typealiases
    • PlatformChar and PlatformUnicodeEncoding are nested inside CTypes
    • Allows us to add more C types without polluting global namespace
  • Added deferred/future-work section about working with paths from another platform
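
To illustrate the cd-like push and the "secure-ish" resolve from the list above, here is a self-contained sketch of how I read their semantics. SketchPath is a stand-in type, not FilePath; it assumes "/" separators, and its lexicallyResolving ignores any root on the subpath and returns nil when ".." would climb out of the base. Those details are my assumptions, so check the proposal text for the real behavior.

```swift
// Stand-in type for illustration only; not System's FilePath.
struct SketchPath: Equatable {
    var isRooted: Bool
    var components: [String]

    init(_ text: String) {
        isRooted = text.hasPrefix("/")
        components = text.split(separator: "/").map { String($0) }
    }

    var text: String {
        (isRooted ? "/" : "") + components.joined(separator: "/")
    }

    // cd-like semantics: a rooted argument replaces the receiver,
    // a relative argument is appended to it.
    mutating func push(_ other: SketchPath) {
        if other.isRooted {
            self = other
        } else {
            components += other.components
        }
    }

    // Collapses "." and ".." in the subpath without touching the file system,
    // and refuses any result that would escape from the receiver.
    func lexicallyResolving(_ subpath: SketchPath) -> SketchPath? {
        var resolved: [String] = []
        for component in subpath.components {
            switch component {
            case ".":
                continue
            case "..":
                if resolved.isEmpty { return nil }   // would leave the base
                resolved.removeLast()
            default:
                resolved.append(component)
            }
        }
        var result = self
        result.components += resolved
        return result
    }
}

var working = SketchPath("/usr/local")
working.push(SketchPath("bin"))        // appended: /usr/local/bin
let afterRelativePush = working.text
working.push(SketchPath("/tmp"))       // replaced: /tmp
let afterRootedPush = working.text

let base = SketchPath("/var/www")
let served = base.lexicallyResolving(SketchPath("static/../index.html"))
let escaped = base.lexicallyResolving(SketchPath("../../etc/passwd"))
```

Because the check is purely lexical, a symlink inside the base can still point elsewhere, which is presumably why it is described as only secure-ish.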

8 Likes