SE-0529: Add FilePath to the Standard Library

[quote="John_McCall, post:34, topic:86194"]
Also, I'm not sure if any system I/O APIs actually provide an async resolve.
[/quote]

As far as I am aware, no. At least on Linux there is no way to perform path resolution without potentially blocking on networked filesystem operations.

I don't think it's particularly relevant, but I think you can achieve it by io_uring'ing an open/openat with O_PATH (neither read nor write access) and then getting the path from that, all as linked I/Os so the descriptor is closed again immediately. So essentially a linked open + get path + close io_uring chain.

Apart from that, the usual file I/O thread pool offload of course does the trick too (with some caveats regarding pool size and potential growth).
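A minimal sketch of that offload, assuming Dispatch's global concurrent queue stands in for a dedicated file I/O pool (so the pool-size caveats apply in full). The helper name resolveOffloaded is hypothetical. It runs the blocking POSIX realpath(3) call on a Dispatch worker thread and resumes the awaiting task through a checked continuation, so the task's executor thread is not the one that blocks.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif
import Dispatch

/// Hypothetical helper: offloads the blocking realpath(3) call to a
/// Dispatch worker thread instead of blocking the caller's executor.
func resolveOffloaded(_ path: String) async -> String? {
    await withCheckedContinuation { (continuation: CheckedContinuation<String?, Never>) in
        DispatchQueue.global(qos: .utility).async {
            // realpath(3) may block, e.g. on a networked file system.
            guard let resolved = realpath(path, nil) else {
                continuation.resume(returning: nil)
                return
            }
            // realpath allocated the buffer; free it after copying it out.
            defer { free(resolved) }
            continuation.resume(returning: String(cString: resolved))
        }
    }
}

let root = await resolveOffloaded("/")
print(root ?? "unresolved")
```

The worker thread still blocks; the sketch only moves the blocking off the cooperative pool, which is why pool growth under many concurrent resolutions remains the open question.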

May I suggest we defer it to a separate proposal to give the team time to investigate if there are platform-specific non-blocking options we can leverage?

It parses properly. The "Windows path styles" section of the proposal calls out that / is treated as \ on conventional (non-verbatim) Windows paths, and the ComponentView normalization rule formalizes it. So let p: FilePath = "/usr/local/bin" on Windows decomposes to anchor \ with components [usr, local, bin] and renders back with \ separators. The reference implementation has a playground if you want to try paths across platforms: milseman/SE-0529-FilePath on GitHub (Reference implementation for SE-0529 Add FilePath to the Standard Library).
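To illustrate the normalization rule, here is a deliberately simplified sketch. It is not the reference implementation: the WindowsPathSketch type is hypothetical, it only recognizes an optional drive letter and an optional root separator, and it leaves verbatim (\\?\) and UNC forms out of scope entirely.

```swift
/// Hypothetical, heavily simplified model of a conventional Windows path.
/// Verbatim (\\?\) and UNC forms are deliberately out of scope.
struct WindowsPathSketch {
    var anchor: String        // "\", "C:\", "C:", or "" when relative
    var components: [String]

    init(_ raw: String) {
        // Conventional (non-verbatim) paths treat "/" exactly like "\".
        var rest = Substring(String(raw.map { $0 == "/" ? "\\" : $0 }))
        var root = ""
        // Optional drive letter, e.g. "C:".
        if rest.count >= 2, rest.first!.isLetter,
           rest[rest.index(after: rest.startIndex)] == ":" {
            root = String(rest.prefix(2))
            rest = rest.dropFirst(2)
        }
        // Optional root separator.
        if rest.first == "\\" {
            root += "\\"
            rest = rest.drop(while: { $0 == "\\" })
        }
        anchor = root
        components = rest.split(separator: "\\").map { String($0) }
    }

    /// Renders the path back with the native "\" separator.
    var string: String {
        anchor + components.joined(separator: "\\")
    }
}

let p = WindowsPathSketch("/usr/local/bin")
print(p.anchor, p.components, p.string)
```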

A future direction is a multi-platform path library (probably not in the Swift module) that would include failable conversions between platform-specific paths. Similarly, protocols could abstract over or generalize path handling where possible. Another (or the same) future direction is path parsing functionality over Span-like types. That might be spelled using a generic wrapper type as you mentioned, or some other way.

Either way, these are distinct from the role that FilePath performs as a currency type passed all the way from command-line scripts down to the kernel itself (and through every layer in between).

Yes

There's a little nuance worth pulling apart. A build system that's itself a Swift program running on some machine uses FilePath for filesystem work on that machine, same as any other program. This is fine.

The case to avoid is converting arbitrary compile-time path literals into runtime paths under the assumption they will somehow be meaningful at run time. This is especially dubious under cross compilation, but it's still dubious even if it's for the same platform: those files may simply not exist at run time at that location.

Yes, perhaps it's worth an additional clarifying sentence in the Introduction or the top of Proposed Solution to make this clearer.

You can, and it works. If you were to write, say:

```swift
let assets = #"C:\foo\bar"#
// or
let assets = #"\\?\UNC\server\share\foo\bar"#
```

you'd end up with a very different looking path on Linux or Darwin that would likely be meaningless. This is a fundamental constraint of supporting multiple platforms in a programming language and standard library.

FilePath is the platform-level path type for the running process: it's the value you hand to the OS to open a file. Under cross-compilation the running process is the target, so a binary compiled for Linux runs on Linux and opens files via Linux paths. Having FilePath track the host's syntax in macros and plugins would produce values the target's kernel can't use, which defeats the type.

Again, it sounds like the underlying need maps to the future work above: a multi-platform path library where you can construct and manipulate, say, a Linux path explicitly from a macOS process, with failable conversions where they make sense. That's the right home for "I want to reason about a path with rules other than my host's." I'll brush up the Future Directions description to make that role clearer.
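As a rough illustration of what "failable conversions where they make sense" could mean, here is a hypothetical function that converts a relative POSIX path into Windows syntax. It is only a sketch under stated assumptions: it refuses absolute POSIX paths (there is no drive to map them onto) and components containing characters reserved in Windows file names, and it ignores other Windows rules such as reserved device names like CON or NUL.

```swift
/// Hypothetical failable conversion of a relative POSIX path into Windows syntax.
/// Returns nil when there is no faithful Windows equivalent.
/// Simplified: ignores reserved device names such as CON or NUL.
func windowsRelativePath(fromPOSIX path: String) -> String? {
    // An absolute POSIX path has no drive letter to map onto.
    guard !path.hasPrefix("/") else { return nil }
    // Characters legal in POSIX names but reserved in Windows names.
    let reserved: Set<Character> = ["<", ">", ":", "\"", "\\", "|", "?", "*"]
    let components = path.split(separator: "/")
    let hasReserved = components.contains { component in
        component.contains(where: { reserved.contains($0) })
    }
    if hasReserved { return nil }
    return components.joined(separator: "\\")
}

print(windowsRelativePath(fromPOSIX: "docs/notes.txt") ?? "no equivalent")
print(windowsRelativePath(fromPOSIX: "/etc/hosts") ?? "no equivalent")
```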


Should this type be defined in the Runtime module that we now have in the standard library?

What parts of the standard library aren’t used at runtime?

Runtime doesn't strike me as the right fit here. That module right now only holds APIs for backtracing and symbolication; it's designed to house low-level details about program execution and their interactions with the Swift core. FilePath would sit oddly next to those.

Also, if we added more I/O features in the future (either within the stdlib or in some other toolchain module), it would be weird to tell clients "you also need to import the module that primarily contains backtracing stuff to get the FilePath type".

Runtime is also not implicitly imported the way Swift is, and the Swift module itself can't use types from it, so putting FilePath there would rule out any other use of it in the standard library, such as the API that SE-0513 would like to add to CommandLine.


I agree with making the unlabeled String initializer on FilePath failable, just like FilePath.Anchor.init?(_:String) and FilePath.Component.init?(_:String). The Span-based inits should be failable as well. All of them will reject strings/spans containing NUL.
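A minimal sketch of that init behavior on a hypothetical stand-in type (PathSketch, which stores a String rather than platform code units): the unlabeled String init returns nil on NUL, while the literal init keeps trapping. Note the usage passes String variables rather than literals, because Swift treats T("literal") as literal coercion for a type conforming to ExpressibleByStringLiteral, which would route through the trapping init instead.

```swift
/// Hypothetical stand-in for FilePath, showing only the init behavior discussed:
/// the unlabeled String init is failable, the literal init traps.
struct PathSketch: ExpressibleByStringLiteral, Equatable {
    private(set) var storage: String

    /// Returns nil if the input contains NUL, which the kernel cannot accept.
    init?(_ string: String) {
        guard !string.utf8.contains(0) else { return nil }
        storage = string
    }

    /// Literals keep trapping: an ill-formed literal is a programmer error.
    init(stringLiteral value: String) {
        guard let path = PathSketch(value) else {
            preconditionFailure("path literal contains NUL")
        }
        self = path
    }
}

let input: String = "/usr/local/bin"
let hostile: String = "/tmp/evil\0suffix"
let ok = PathSketch(input)        // non-nil
let bad = PathSketch(hostile)     // nil: contains NUL
let literal: PathSketch = "/etc/hosts"
```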

Beyond that, future work is more validation API to handle degenerate path forms. Such paths are syntactically valid from the kernel's perspective, even if dubious or semantically meaningless. It's the role of future and higher-level API to reason about path validity, which may involve application-specific concerns or configuration details.


I am open to dropping isRelative, especially since there are multiple different kinds of relativity on Windows (foo\bar, C:foo\bar, \foo\bar).
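To make the distinction concrete, a simplified classifier over those forms (plus the fully qualified C:\foo\bar) might look like the following. The WindowsRelativity enum and classify function are hypothetical; the sketch treats / like \ as conventional paths do, and it ignores UNC and verbatim forms (a \\server\share path would be misreported as root-relative).

```swift
/// Hypothetical classification of conventional Windows path forms.
enum WindowsRelativity: Equatable {
    case fullyQualified   // C:\foo\bar
    case driveRelative    // C:foo\bar  (relative to drive C:'s current directory)
    case rootRelative     // \foo\bar   (relative to the current drive's root)
    case relative         // foo\bar    (relative to the current directory)
}

func classify(_ path: String) -> WindowsRelativity {
    let chars = Array(path)
    func isSeparator(_ c: Character) -> Bool { c == "\\" || c == "/" }
    let hasDrive = chars.count >= 2 && chars[0].isLetter && chars[1] == ":"
    if hasDrive {
        return chars.count >= 3 && isSeparator(chars[2]) ? .fullyQualified : .driveRelative
    }
    if let first = chars.first, isSeparator(first) {
        return .rootRelative
    }
    return .relative
}

print(classify(#"C:foo\bar"#))
```

Only the first case is unambiguously absolute; a single isRelative property would lump the other three together.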

I think this should be future work, especially since we'd have to tackle whether this only works for individual components, collections of components, or if this is a path join operation (in which case we have to decide what to do if the RHS has an anchor in it).
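To show why the RHS-anchor question matters, here is one possible answer sketched on a hypothetical POSIX-only JoinPath type: join semantics in the style of Python's pathlib, where an anchored (absolute) right-hand side replaces the left-hand side. This is only one of the options mentioned, not a recommendation; a component-only or sequence-append operator would instead reject or fail on that second case.

```swift
/// Hypothetical POSIX-only path used to illustrate join semantics for `/`.
struct JoinPath: Equatable, CustomStringConvertible {
    var isAbsolute: Bool
    var components: [Substring]

    init(_ string: String) {
        isAbsolute = string.hasPrefix("/")
        components = string.split(separator: "/")
    }

    var description: String {
        (isAbsolute ? "/" : "") + components.joined(separator: "/")
    }

    /// Join: an absolute right-hand side replaces the left-hand side;
    /// otherwise its components are appended.
    static func / (lhs: JoinPath, rhs: JoinPath) -> JoinPath {
        if rhs.isAbsolute { return rhs }
        var result = lhs
        result.components += rhs.components
        return result
    }
}

print(JoinPath("/usr") / JoinPath("bin"))
print(JoinPath("/usr") / JoinPath("/etc"))
```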

That code snippet would invoke String's conformance to ExpressibleByStringInterpolation in order to make a string to pass to FilePath's init. Nothing we can do to prevent it outside of making it impossible to construct paths from strings (which I don't think we want to prevent).

Hi all, I thought I'd post a TL;DR of this discussion thread to sync up on what has changed and what hasn't:

Changes

We're arguing for the following API changes:

  • FilePath.init(_ string: String) becomes failable, matching the corresponding inits on Anchor and Component. The Span-based init becomes failable as well. Both reject input containing NUL. init(stringLiteral:) keeps its trapping behavior for ill-formed literals.
  • resolve() is annotated @available(*, noasync).
  • nullTerminatedCodeUnits is removed and replaced by a closure-based withCString(_:) tracking String.withCString(_:). The body's pointer parameter is UnsafePointer<FilePath.CodeUnit>, so it is wide on Windows. String already establishes via withCString(encodedAs:_:) that this name is not narrow-only.
  • isRelative is removed; isAbsolute remains. There are several distinct kinds of relative on Windows (foo\bar, C:foo\bar, \foo\bar), which a single negated property obscures.
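A closure-based withCString(_:) can forward to the matching String API. The sketch below uses a hypothetical CStringPath type that stores a String (the proposed type stores code units directly); it shows how the pointer's element type widens to UTF-16 on Windows by forwarding to withCString(encodedAs:_:), while staying narrow elsewhere.

```swift
/// Hypothetical path wrapper illustrating the closure-based withCString shape.
struct CStringPath {
    #if os(Windows)
    typealias CodeUnit = UInt16   // wide (UTF-16) on Windows
    #else
    typealias CodeUnit = CChar    // narrow on POSIX platforms
    #endif

    var storage: String

    /// The pointer is only valid for the duration of `body`.
    func withCString<Result>(
        _ body: (UnsafePointer<CodeUnit>) throws -> Result
    ) rethrows -> Result {
        #if os(Windows)
        return try storage.withCString(encodedAs: UTF16.self, body)
        #else
        return try storage.withCString(body)
        #endif
    }
}

// Count code units up to (not including) the NUL terminator.
let path = CStringPath(storage: "/tmp/x")
let unitCount = path.withCString { ptr -> Int in
    var n = 0
    while ptr[n] != 0 { n += 1 }
    return n
}
```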

Future work

The proposal's existing Future Directions cover several deferred items already. Based on this thread, we want to clarify or expound on the following:

  • The multi-platform path library direction: failable cross-platform conversions and possibly a generic-over-storage path type, distinct from the currency-type role FilePath is playing here.
  • Async non-blocking and sync non-blocking variants of resolve(), both pending a stdlib I/O pool design.
  • Richer validation API for degenerate path forms. The new failable inits reject only NUL; further validation requires application-specific configuration.
  • A / operator for append, with the component-vs-sequence-vs-join question called out explicitly.
  • Two language-level wishes that would shape future FilePath API:
    • a standard story for null-terminated pointer types (e.g. a var cString instead of closure-based withCString)
    • an effects system for blocking I/O that would let resolve() carry the corresponding annotation.

Kept, with rationale

resolve() stays in the proposal, synchronous and blocking, marked @available(*, noasync). Correct path resolution is hard to get right and easy to get wrong: it interacts with symlinks, .. segments, mount points, Darwin volfs anchors, and Windows reparse points. Without a stdlib implementation from day one, developers will write their own incorrect or racy versions. Async non-blocking is a much-needed but longer-term variant that depends on an I/O pool. Async+blocking would be worse to ship than sync+blocking, since it would silently block executors.
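The @available(*, noasync) attribute (SE-0340) is what lets the compiler diagnose direct calls from async contexts. A minimal POSIX-only sketch, assuming realpath(3) as the resolution primitive; resolveSketch is a hypothetical name, and it does none of the proposal's handling of volfs anchors or reparse points.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Hypothetical synchronous resolver. Marked noasync so that calling it
/// directly from an async function is diagnosed at compile time.
@available(*, noasync)
func resolveSketch(_ path: String) -> String? {
    // realpath(3) follows symlinks and removes "." and ".." segments,
    // potentially blocking on the file system while it does so.
    guard let resolved = realpath(path, nil) else { return nil }
    defer { free(resolved) }
    return String(cString: resolved)
}

// Fine: this is synchronous top-level code, not an async context.
let resolved = resolveSketch("/")
```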

We're keeping resolve() as the proposed name. We'll add a new Alternatives Considered subsection for the alternative names proposed (resolveByBlocking(), resolveFromFileSystem(), sync/async pairs, namespaced free functions) for LSG visibility. We're still recommending the original name, but are curious how the LSG views this issue.

Character is kept for separator and driveLetter. The intended usage is more print()-like than low-level parser-like. Code that parses path bytes by hand is already platform-encoding-specific enough to zero-extend the separator (U+002F or U+005C) via separator.utf8.first!. Better support for custom parsers is future work.
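For example, a hand-written parser over raw UTF-8 path bytes can derive the separator byte from a Character once, up front. The splitComponents function is hypothetical and POSIX-style; the separator argument stands in for a Character-typed separator property.

```swift
/// Hypothetical hand-written parser: splits raw path bytes on the byte
/// derived from a Character-typed separator.
func splitComponents(_ bytes: [UInt8], separator: Character) -> [[UInt8]] {
    // U+002F and U+005C are ASCII, so their single UTF-8 byte is the whole scalar.
    let separatorByte: UInt8 = separator.utf8.first!
    // With UTF-16 code units (Windows), one would zero-extend instead:
    // let wideSeparator = UInt16(separatorByte)
    return bytes.split(separator: separatorByte).map { Array($0) }
}

let parts = splitComponents(Array("/usr/local/bin".utf8), separator: "/")
let names = parts.map { String(decoding: $0, as: UTF8.self) }
```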

Cross-platform parsing of string literals is by design. A POSIX literal parses correctly on Windows and renders with \. Some literal forms remain unavoidably platform-specific, and that is a constraint of multi-platform support rather than a defect of FilePath.

FilePath tracks the target platform under cross-compilation, not the host. It is the path type for the running process, intended to be handed to the OS. Use cases wanting host syntax in macros and plugins are better served by the multi-platform path library above.


If a value of this type isn't portable across platforms, should it really be called FilePath rather than something like OSPath, to make that clear and avoid name-squatting on a cross-platform version?


Can this pass the length to the closure? Otherwise any call to a bounded function such as strn... or strl... requires calling strlen, even though the information was already available.
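One way such a variant could look, sketched under the hypothetical name withCStringAndLength and shown on String to stay self-contained: the length is already known, so it is handed to the body next to the pointer, and bounded calls need no separate strlen. For simplicity this copies through String.utf8CString, which an implementation with its own NUL-terminated storage would avoid; it also assumes no interior NUL, which the failable inits guarantee for paths.

```swift
extension String {
    /// Hypothetical variant of withCString that also passes the length in
    /// code units, excluding the NUL terminator.
    func withCStringAndLength<Result>(
        _ body: (UnsafePointer<CChar>, Int) throws -> Result
    ) rethrows -> Result {
        // utf8CString is a NUL-terminated ContiguousArray<CChar> copy.
        let cString = utf8CString
        return try cString.withUnsafeBufferPointer { buffer in
            try body(buffer.baseAddress!, buffer.count - 1)
        }
    }
}

let reportedLength = "/var/log".withCStringAndLength { _, count in count }
```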


I mostly haven't been following, but I found the amount of platform specificity around anchors and suffixes concerning, not because I don't think they're useful, but because they never seem to come with a guarantee of completeness. The ..namedfork suffix was designed to support multiple forks, not just the resource fork, even if it didn't turn out that way. /.nofollow is relatively new; who's to say what Apple will add next year? //?/ is a whole reserved namespace of things Microsoft could add.

These two examples also behave differently: Apple ships Swift with the OS and can update FilePath in the same release as a new anchor feature, whereas Windows apps bundle a version of the stdlib that applies no matter which version of the OS is running. And… well, I was going to say "what about the upcoming BSD port", but I suppose this becomes one more thing involved in porting to a new platform.

The API I expected was

  • file system representation
  • syntactic dirname and basename
  • support for "classic" Windows paths
  • trailing / support
  • maybe support for //<component>/ as a root, since "iterate components" starting with ? would be weird
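For comparison, the purely syntactic dirname/basename with trailing / support could be sketched as below, following POSIX dirname(3)/basename(3) conventions on the edge cases tested. These free functions are hypothetical and POSIX-only: they collapse repeated separators and do not cover Windows paths or a //<component>/ root.

```swift
/// Hypothetical syntactic basename: last component, ignoring trailing "/".
func syntacticBasename(_ path: String) -> String {
    let parts = path.split(separator: "/")
    if let last = parts.last { return String(last) }
    // "" -> ".", and any all-slash path -> "/".
    return path.hasPrefix("/") ? "/" : "."
}

/// Hypothetical syntactic dirname: everything before the last component.
func syntacticDirname(_ path: String) -> String {
    var parts = path.split(separator: "/")
    let absolute = path.hasPrefix("/")
    guard !parts.isEmpty else { return absolute ? "/" : "." }
    parts.removeLast()
    if parts.isEmpty { return absolute ? "/" : "." }
    return (absolute ? "/" : "") + parts.joined(separator: "/")
}

print(syntacticDirname("/a/b/"), syntacticBasename("/a/b/"))
```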

I'm not sure what we get by treating all the anchors as special here, in the lowest-level API on the platform. Sure, you can't decompose /a/b/c into b/c relative to resolve(/a). But you can't do that with /.nofollow anyway; /.nofollow/a/b/c isn't equivalent to b/c relative to resolve(/.nofollow/a).

Given that no one else has mentioned this and that the general design has been around in the System package for a while, I figure I must be missing something, or at least incorrect in my evaluation of tradeoffs. But I'd like to see that written down in the proposal, because I found the current explanation of the "complex" anchors (and suffixes) unmotivated.
