Pitch: Add FilePath to the standard library

Hello, I'd like to pitch adding FilePath to the stdlib. This topic has come up in a number of threads and discussions on the forums and I think it makes sense for us to seriously consider this direction.

You can view the latest full draft in the S-E PR.


Add FilePath to the Standard Library

  • Proposal: TBD
  • Authors: Michael Ilseman
  • Review Manager: TBD
  • Status: Pitch
  • Implementation: TBD
  • Review: TBD

Introduction

We propose adding FilePath and its syntactic operations to the Swift standard library. These operations inspect, decompose, and modify paths without making system calls. The API has been shipping in the swift-system package since version 0.0.2 and has been refined through two review cycles.

Motivation

Swift has no standard representation for file system paths. The swift-system package introduced FilePath in System 0.0.1, and it has since gained a comprehensive set of syntactic operations for platform-correct path manipulation. But because FilePath lives in an external package, it cannot be depended on by the standard library or the Swift runtime, nor can it appear in API in toolchain libraries such as Foundation.

Every new API that needs to name a file path faces the same dilemma SE-0513 faces: whether to fall back to String.

String is a poor fit for paths. It does not capture the structure of a file path, as paths have an optional root, a sequence of components, and things like per-component stem/extension decomposition. Path string representations are platform-specific: the separator is / on Unix and \ on Windows, and Windows paths have complex root forms (drive letters, UNC paths, device paths). Path encodings are also platform-specific, and paths need not be valid Unicode. On Unix, paths are null-terminated byte sequences that may not be valid UTF-8; on Windows, they are null-terminated UInt16 sequences that may not be valid UTF-16.

FilePath addresses all of these concerns. It stores the path in its native platform encoding, provides a rich set of syntactic operations that are consistent across platforms, and enables strongly-typed programming with paths.

Proposed solution

We propose adding FilePath, FilePath.Root, FilePath.Component, and FilePath.ComponentView to the Swift module, along with their syntactic operations for decomposition, mutation, lexical normalization, and string conversion.

var path: FilePath = "/tmp/archive.tar.gz"

path.extension               // "gz"
path.stem                    // "archive.tar"
path.lastComponent           // "archive.tar.gz" as a FilePath.Component
path.removingLastComponent() // "/tmp"
path.isAbsolute              // true
path.root                    // "/"

path.starts(with: "/tmp")   // true
path.starts(with: "/tm")    // false (component-aware)

// Protecting against path traversal
let base: FilePath = "/var/www/static"
base.lexicallyResolving("../../etc/passwd")  // nil

// In-place mutation
var config: FilePath = "/etc/nginx/nginx.conf"
config.extension = "bak"     // "/etc/nginx/nginx.bak"

// Iterating components
for component in path.components {
    print(component, component.kind)
}

Detailed design

FilePath

FilePath stores a null-terminated sequence of platform characters (CChar on Unix, UInt16 on Windows). It normalizes directory separators on construction: trailing separators in the relative portion are stripped, repeated separators are coalesced, and on Windows forward slashes are normalized to backslashes.

/// A file path is a null-terminated sequence of bytes that represents
/// a location in the file system.
///
/// The file path is stored in the file system's native encoding:
/// UTF-8 on Unix and UTF-16 on Windows.
///
/// File paths are a currency type across many APIs.
///
/// Example:
///
///     let path: FilePath = "/tmp/foo.txt"
///     if path.isAbsolute && path.extension == "txt" {
///         // ...
///     }
///
public struct FilePath: Sendable {
  /// Creates an empty file path.
  public init()

  /// Creates a file path from a string.
  public init(_ string: String)
}

extension FilePath:
  Hashable, Codable,
  CustomStringConvertible, CustomDebugStringConvertible,
  ExpressibleByStringLiteral
{
  /// A textual representation of the file path.
  ///
  /// If the content of the path isn't well-formed Unicode,
  /// this replaces invalid bytes with U+FFFD. See `String.init(decoding:)`.
  public var description: String { get }

  /// A textual representation of the file path, suitable for debugging.
  public var debugDescription: String { get }

  /// Create a file path from a string literal.
  public init(stringLiteral: String)
}

FilePath.Root

FilePath.Root represents the root of a path. On Unix, this is simply /. On Windows, it can include volume and server/share information in several syntactic forms.

extension FilePath {
  /// Represents a root of a file path.
  ///
  /// On Unix, a root is simply the directory separator `/`.
  ///
  /// On Windows, a root contains the entire path prefix up to and including
  /// the final separator.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/`
  /// * Windows:
  ///   * `C:\`
  ///   * `C:`
  ///   * `\`
  ///   * `\\server\share\`
  ///   * `\\?\UNC\server\share\`
  ///   * `\\?\Volume{12345678-abcd-1111-2222-123445789abc}\`
  public struct Root: Sendable { }
}

extension FilePath.Root:
  Hashable,
  CustomStringConvertible, CustomDebugStringConvertible,
  ExpressibleByStringLiteral
{
  /// A textual representation of the path root.
  ///
  /// If the content of the path root isn't well-formed Unicode,
  /// this replaces invalid bytes with U+FFFD. See `String.init(decoding:)`.
  public var description: String { get }

  /// A textual representation of the path root, suitable for debugging.
  public var debugDescription: String { get }

  /// Create a file path root from a string literal.
  ///
  /// Precondition: `stringLiteral` is non-empty and is a root.
  public init(stringLiteral: String)

  /// Create a file path root from a string.
  ///
  /// Returns `nil` if `string` is empty or is not a root.
  public init?(_ string: String)
}

FilePath.Component

FilePath.Component represents a single non-root component of a path. Components are always non-empty and do not contain a directory separator.

extension FilePath {
  /// Represents an individual, non-root component of a file path.
  ///
  /// Components can be one of the special directory components (`.` or `..`)
  /// or a file or directory name. Components are never empty and never
  /// contain the directory separator.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/tmp"
  ///     let file: FilePath.Component = "foo.txt"
  ///     file.kind == .regular           // true
  ///     file.extension                  // "txt"
  ///     path.append(file)               // path is "/tmp/foo.txt"
  ///
  public struct Component: Sendable {
    /// Whether a component is a regular file or directory name, or a special
    /// directory `.` or `..`
    public enum Kind: Sendable {
      /// The special directory `.`, representing the current directory.
      case currentDirectory

      /// The special directory `..`, representing the parent directory.
      case parentDirectory

      /// A file or directory name
      case regular
    }

    /// The kind of this component
    public var kind: Kind { get }
  }
}

extension FilePath.Component:
  Hashable,
  CustomStringConvertible, CustomDebugStringConvertible,
  ExpressibleByStringLiteral
{
  /// A textual representation of the path component.
  ///
  /// If the content of the path component isn't well-formed Unicode,
  /// this replaces invalid bytes with U+FFFD. See `String.init(decoding:)`.
  public var description: String { get }

  /// A textual representation of the path component, suitable for debugging.
  public var debugDescription: String { get }

  /// Create a file path component from a string literal.
  ///
  /// Precondition: `stringLiteral` is non-empty and has only one component in it.
  public init(stringLiteral: String)

  /// Create a file path component from a string.
  ///
  /// Returns `nil` if `string` is empty, a root, or has more than one component
  /// in it.
  public init?(_ string: String)
}

Stem and extension

Components may be decomposed into their stem and optional extension (.txt, .o, .app, etc.). FilePath provides convenience APIs for dealing with the stem and extension of the last component.

For full code listing, see the S-E PR
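As a rough illustration of stem/extension decomposition, the sketch below splits a single component string at its last dot. stemAndExtension is a hypothetical helper, not the proposed API, and it assumes the rules swift-system documents: a leading dot (as in .hidden) does not start an extension, and the special directories . and .. have no extension. The S-E PR remains the authority on edge cases.

```swift
/// Hypothetical helper: splits one path component into its stem and
/// optional extension. Assumes a dot at index 0 (".hidden") does not start
/// an extension, and that "." and ".." have no extension.
func stemAndExtension(_ component: String) -> (stem: String, ext: String?) {
    // Special directory components are never decomposed.
    if component == "." || component == ".." { return (component, nil) }
    // No dot, or only a leading dot: the whole component is the stem.
    guard let dot = component.lastIndex(of: "."), dot != component.startIndex else {
        return (component, nil)
    }
    let stem = String(component[..<dot])
    let ext = String(component[component.index(after: dot)...])
    return (stem, ext)
}
```

For example, "archive.tar.gz" yields the stem "archive.tar" and the extension "gz", matching the proposed-solution example earlier in this pitch.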

FilePath.ComponentView

FilePath.ComponentView is a BidirectionalCollection and RangeReplaceableCollection of the non-root components that comprise a path.

For full code listing, see the S-E PR

Basic queries

extension FilePath {
  /// Returns whether `other` is a prefix of `self`, only considering
  /// whole path components.
  ///
  /// Example:
  ///
  ///     let path: FilePath = "/usr/bin/ls"
  ///     path.starts(with: "/")              // true
  ///     path.starts(with: "/usr/bin")       // true
  ///     path.starts(with: "/usr/bin/ls")    // true
  ///     path.starts(with: "/usr/bin/ls///") // true
  ///     path.starts(with: "/us")            // false
  ///
  public func starts(with other: FilePath) -> Bool

  /// Returns whether `other` is a suffix of `self`, only considering
  /// whole path components.
  ///
  /// Example:
  ///
  ///     let path: FilePath = "/usr/bin/ls"
  ///     path.ends(with: "ls")             // true
  ///     path.ends(with: "bin/ls")         // true
  ///     path.ends(with: "usr/bin/ls")     // true
  ///     path.ends(with: "/usr/bin/ls///") // true
  ///     path.ends(with: "/ls")            // false
  ///
  public func ends(with other: FilePath) -> Bool

  /// Whether this path is empty
  public var isEmpty: Bool { get }
}
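The component-aware behavior of starts(with:) and ends(with:) can be modelled by comparing arrays of components rather than characters. PathSketch below is a hypothetical Unix-only stand-in built on String, not the proposed type; it is only meant to reproduce the doc-comment examples above.

```swift
/// Hypothetical stand-in: a Unix path as a root flag plus its components.
struct PathSketch: Equatable {
    var isRooted: Bool
    var components: [Substring]

    init(_ string: String) {
        isRooted = string.hasPrefix("/")
        // Empty pieces from repeated or trailing "/" are dropped, mirroring
        // FilePath's separator normalization.
        components = string.split(separator: "/")
    }

    /// Prefix test over whole components: "/us" is not a prefix of "/usr/bin".
    func starts(with other: PathSketch) -> Bool {
        isRooted == other.isRooted && components.starts(with: other.components)
    }

    /// Suffix test over whole components. A rooted `other` can only be a
    /// suffix if it is the entire path.
    func ends(with other: PathSketch) -> Bool {
        guard other.components.count <= components.count else { return false }
        if other.isRooted { return self == other }
        return components.suffix(other.components.count).elementsEqual(other.components)
    }
}
```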

Absolute and relative paths

Windows roots are more complex than Unix roots and can take several syntactic forms. The presence of a root does not imply the path is absolute on Windows. For example, C:foo refers to foo relative to the current directory on the C drive, and \foo refers to foo at the root of the current drive. Neither is absolute (i.e. "fully-qualified" in Windows terminology).

For full code listing, see the S-E PR
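A rough string-level sketch of the distinction, assuming the common rule that a Windows path is fully qualified only when it starts with two separators (UNC and device paths) or with a drive letter, a colon, and a separator. isFullyQualifiedWindowsPath is a hypothetical helper; it does not validate the rest of the root and is not the proposed isAbsolute implementation.

```swift
/// Hypothetical helper: a rough check for whether a Windows path string is
/// fully qualified (absolute). It only inspects the first few characters.
func isFullyQualifiedWindowsPath(_ path: String) -> Bool {
    let chars = Array(path)
    // Windows accepts both "\" and "/" as directory separators.
    func isSeparator(_ c: Character) -> Bool { c == "\\" || c == "/" }
    guard chars.count >= 2 else { return false }
    if isSeparator(chars[0]) {
        // "\\server\share\..." (UNC) and "\\?\..." (device) paths start with
        // two separators; a single leading "\" is relative to the current drive.
        return isSeparator(chars[1])
    }
    // "C:\..." is fully qualified; "C:foo" is relative to drive C's cwd.
    return chars.count >= 3 && chars[0].isASCII && chars[0].isLetter
        && chars[1] == ":" && isSeparator(chars[2])
}
```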

Path decomposition

Paths can be decomposed into their optional root and their (potentially empty) relative components.

For full code listing, see the S-E PR
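For Unix paths, this decomposition amounts to an optional "/" root plus the non-empty pieces between separators. decompose and recompose are hypothetical helpers over String, shown only to illustrate the shape of the result.

```swift
/// Hypothetical helper: splits a Unix path into its optional root and its
/// (possibly empty) relative components.
func decompose(_ path: String) -> (root: String?, components: [String]) {
    let root: String? = path.hasPrefix("/") ? "/" : nil
    // Empty pieces from repeated or trailing separators are dropped.
    let components = path.split(separator: "/").map(String.init)
    return (root, components)
}

/// Rebuilds a path string from the pieces produced by `decompose`.
func recompose(root: String?, components: [String]) -> String {
    (root ?? "") + components.joined(separator: "/")
}
```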

Lexical operations

FilePath supports lexical operations (i.e. operations that do not consult the file system to follow symlinks) such as normalization of . and .. components.

extension FilePath {
  /// Whether the path is in lexical-normal form, that is `.` and `..`
  /// components have been collapsed lexically (i.e. without following
  /// symlinks).
  ///
  /// Examples:
  /// * `"/usr/local/bin".isLexicallyNormal == true`
  /// * `"../local/bin".isLexicallyNormal   == true`
  /// * `"local/bin/..".isLexicallyNormal   == false`
  public var isLexicallyNormal: Bool { get }

  /// Collapse `.` and `..` components lexically (i.e. without following
  /// symlinks).
  ///
  /// Examples:
  /// * `/usr/./local/bin/.. => /usr/local`
  /// * `/../usr/local/bin   => /usr/local/bin`
  /// * `../usr/local/../bin => ../usr/bin`
  public mutating func lexicallyNormalize()

  /// Returns a copy of `self` in lexical-normal form, that is `.` and `..`
  /// components have been collapsed lexically (i.e. without following
  /// symlinks). See `lexicallyNormalize`
  public func lexicallyNormalized() -> FilePath
}
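Lexical normalization is essentially a stack walk over components. The sketch below is a Unix-only model on String with hypothetical function names; it assumes the input is already separator-normalized (as a FilePath always is) and reproduces the doc-comment examples above.

```swift
/// Hypothetical helper: collapses "." and ".." in a Unix path string without
/// touching the file system (so symlinks are never followed).
func lexicallyNormalized(_ path: String) -> String {
    let isRooted = path.hasPrefix("/")
    var stack: [Substring] = []
    for component in path.split(separator: "/") {
        switch component {
        case ".":
            continue                      // "." never changes the location
        case "..":
            if let last = stack.last, last != ".." {
                stack.removeLast()        // "a/.." cancels out
            } else if !isRooted {
                stack.append(component)   // leading ".." in a relative path is kept
            }
            // In a rooted path, ".." at the root is dropped: "/.." is "/".
        default:
            stack.append(component)
        }
    }
    let relative = stack.joined(separator: "/")
    return isRooted ? "/" + relative : relative
}

/// Assumes `path` is already separator-normalized.
func isLexicallyNormal(_ path: String) -> Bool {
    lexicallyNormalized(path) == path
}
```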

FilePath also provides API to protect against arbitrary path traversal from untrusted subpaths:

extension FilePath {
  /// Create a new `FilePath` by resolving `subpath` relative to `self`,
  /// ensuring that the result is lexically contained within `self`.
  ///
  /// `subpath` will be lexically normalized (see `lexicallyNormalize`) as
  /// part of resolution, meaning any contained `.` and `..` components will
  /// be collapsed without resolving symlinks. Any root in `subpath` will be
  /// ignored.
  ///
  /// Returns `nil` if the result would "escape" from `self` through use of
  /// the special directory component `..`.
  ///
  /// This is useful for protecting against arbitrary path traversal from an
  /// untrusted subpath: the result is guaranteed to be lexically contained
  /// within `self`. Since this operation does not consult the file system to
  /// resolve symlinks, any escaping symlinks nested inside of `self` can still
  /// be targeted by the result.
  ///
  /// Example:
  ///
  ///     let staticContent: FilePath = "/var/www/my-website/static"
  ///     let links: [FilePath] =
  ///       ["index.html", "/assets/main.css", "../../../../etc/passwd"]
  ///     links.map { staticContent.lexicallyResolving($0) }
  ///       // ["/var/www/my-website/static/index.html",
  ///       //  "/var/www/my-website/static/assets/main.css",
  ///       //  nil]
  public func lexicallyResolving(_ subpath: FilePath) -> FilePath?
}
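A minimal sketch of the containment check, assuming Unix syntax and a base path that is already normalized. resolveLexically(_:under:) is a hypothetical helper, not the proposed method: it walks the subpath's components with a stack and gives up as soon as a .. would pop past the start of the subpath. Like the proposed API, it never consults the file system, so escaping symlinks inside the base are not detected.

```swift
/// Hypothetical helper: resolves an untrusted `subpath` under `base`,
/// returning nil if ".." would escape `base`. Purely lexical.
func resolveLexically(_ subpath: String, under base: String) -> String? {
    var stack: [Substring] = []
    // Any root in `subpath` is ignored: splitting drops the leading "/".
    for component in subpath.split(separator: "/") {
        switch component {
        case ".":
            continue
        case "..":
            // Popping past the start of `subpath` would leave `base`.
            guard !stack.isEmpty else { return nil }
            stack.removeLast()
        default:
            stack.append(component)
        }
    }
    guard !stack.isEmpty else { return base }
    let separator = base.hasSuffix("/") ? "" : "/"
    return base + separator + stack.joined(separator: "/")
}
```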

Modifying paths

extension FilePath {
  /// If `prefix` is a prefix of `self`, removes it and returns `true`.
  /// Otherwise returns `false`.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/usr/local/bin"
  ///     path.removePrefix("/usr/bin")   // false
  ///     path.removePrefix("/us")        // false
  ///     path.removePrefix("/usr/local") // true, path is "bin"
  ///
  public mutating func removePrefix(_ prefix: FilePath) -> Bool

  /// Append a `component` on to the end of this path.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/tmp"
  ///     let sub: FilePath = "foo/./bar/../baz/."
  ///     for comp in sub.components.filter({ $0.kind != .currentDirectory }) {
  ///       path.append(comp)
  ///     }
  ///     // path is "/tmp/foo/bar/../baz"
  ///
  public mutating func append(_ component: FilePath.Component)

  /// Append `components` on to the end of this path.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/"
  ///     path.append(["usr", "local"])     // path is "/usr/local"
  ///     let otherPath: FilePath = "/bin/ls"
  ///     path.append(otherPath.components) // path is "/usr/local/bin/ls"
  ///
  public mutating func append<C: Collection>(_ components: C)
    where C.Element == FilePath.Component

  /// Append the contents of `other`, ignoring any spurious leading separators.
  ///
  /// A leading separator is spurious if `self` is non-empty.
  ///
  /// Example:
  ///
  ///     var path: FilePath = ""
  ///     path.append("/var/www/website") // "/var/www/website"
  ///     path.append("static/assets")    // "/var/www/website/static/assets"
  ///     path.append("/main.css")        // "/var/www/website/static/assets/main.css"
  ///
  public mutating func append(_ other: String)

  /// Non-mutating version of `append(_:Component)`.
  public func appending(_ other: Component) -> FilePath

  /// Non-mutating version of `append(_:C)`.
  public func appending<C: Collection>(
    _ components: C
  ) -> FilePath where C.Element == FilePath.Component

  /// Non-mutating version of `append(_:String)`.
  public func appending(_ other: String) -> FilePath

  /// If `other` does not have a root, appends each component of `other`. If
  /// `other` has a root, replaces `self` with `other`.
  ///
  /// This operation mimics traversing a directory structure (similar to the
  /// `cd` command), where pushing a relative path will append its components
  /// and pushing an absolute path will first clear `self`'s existing
  /// components.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/tmp"
  ///     path.push("dir/file.txt") // path is "/tmp/dir/file.txt"
  ///     path.push("/bin")         // path is "/bin"
  ///
  public mutating func push(_ other: FilePath)

  /// Non-mutating version of `push()`
  public func pushing(_ other: FilePath) -> FilePath

  /// In-place mutating variant of `removingLastComponent`.
  ///
  /// If `self` only contains a root, does nothing and returns `false`.
  /// Otherwise removes `lastComponent` and returns `true`.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/usr/bin"
  ///     path.removeLastComponent() == true  // path is "/usr"
  ///     path.removeLastComponent() == true  // path is "/"
  ///     path.removeLastComponent() == false // path is "/"
  ///
  @discardableResult
  public mutating func removeLastComponent() -> Bool

  /// Remove the contents of the path, keeping the null terminator.
  public mutating func removeAll(keepingCapacity: Bool = false)

  /// Reserve enough storage space to store `minimumCapacity` platform
  /// characters.
  public mutating func reserveCapacity(_ minimumCapacity: Int)
}
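The prefix and last-component operations are easiest to see on a root-plus-components model. ComponentPath is a hypothetical Unix-only stand-in built on String, not the proposed type; its removePrefix only succeeds when the root and whole leading components match, and its removeLastComponent refuses to remove a root, following the doc comments above.

```swift
/// Hypothetical stand-in: a Unix path as a root flag plus components.
struct ComponentPath {
    var isRooted: Bool
    var components: [String]

    init(_ string: String) {
        isRooted = string.hasPrefix("/")
        components = string.split(separator: "/").map(String.init)
    }

    var string: String {
        (isRooted ? "/" : "") + components.joined(separator: "/")
    }

    /// Removes `prefix` only if it matches the root and whole leading components.
    mutating func removePrefix(_ prefix: ComponentPath) -> Bool {
        guard isRooted == prefix.isRooted,
              components.starts(with: prefix.components) else { return false }
        isRooted = false  // whatever remains is relative to the removed prefix
        components.removeFirst(prefix.components.count)
        return true
    }

    /// Does nothing and returns false when no non-root component remains.
    @discardableResult
    mutating func removeLastComponent() -> Bool {
        guard !components.isEmpty else { return false }
        components.removeLast()
        return true
    }
}
```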

Rationale: removeLastComponent does not return the component, as components are slices of FilePath's underlying storage. Returning a removed component would trigger a copy-on-write copy.

Rationale: We do not propose append taking a FilePath since appending absolute paths is problematic. Silently ignoring a root (loose stringy semantics) is commonly expected when given a string literal, so we provide an overload of append taking a String, which is far more convenient than splitting components out by hand. Silently ignoring a root is surprising and undesirable in programmatic/strongly-typed use cases, so we provide push, which has similar semantics to operations from other languages (Rust's push, C#'s Combine, Python's join, and C++17's append). This allows programmatic use cases to explicitly choose semantics by calling either other.push(myPath) or other.append(myPath.components).
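To illustrate the two semantics side by side, here is a hypothetical Unix-only stand-in that wraps a String (assumed to be separator-normalized already). Its append treats a leading separator on the argument as spurious, while push lets a rooted argument replace the path, matching the doc-comment examples above. PathString is not the proposed type.

```swift
/// Hypothetical stand-in contrasting string-flavored append with push.
struct PathString {
    var value: String

    /// A leading separator on `other` is spurious when `self` is non-empty,
    /// so the result always stays under the current path.
    mutating func append(_ other: String) {
        let pieces = other.split(separator: "/")
        if value.isEmpty {
            // An empty path adopts `other`, including any root.
            value = (other.hasPrefix("/") ? "/" : "") + pieces.joined(separator: "/")
            return
        }
        for piece in pieces {
            if !value.hasSuffix("/") { value.append("/") }
            value.append(contentsOf: piece)
        }
    }

    /// `cd`-like semantics: a rooted `other` replaces `self` entirely.
    mutating func push(_ other: String) {
        if other.hasPrefix("/") {
            value = other
        } else {
            append(other)
        }
    }
}
```

With this model, appending "/bin" to "/tmp" yields "/tmp/bin", while pushing "/bin" onto "/tmp" yields "/bin".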

Paths and strings

FilePath, FilePath.Component, and FilePath.Root can be decoded/validated into a Swift String.

FilePath, FilePath.Component, and FilePath.Root each provide String.init(decoding:) (lossy, replacing invalid bytes with U+FFFD) and String.init?(validating:) (returning nil on invalid encoding), interpreting content as UTF-8 on Unix and UTF-16 on Windows. Each type also provides a convenience .string property equivalent to the decoding initializer.

For full code listing, see the S-E PR
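Both conversions can be expressed with standard library facilities alone: String(decoding:as:) for the lossy form, and transcode(_:from:to:stoppingOnError:into:) to detect ill-formed input for the validating form. The helpers below are hypothetical and are not the proposed initializers; they are generic over the code-unit encoding so they cover both UTF-8 (Unix) and UTF-16 (Windows) storage.

```swift
/// Hypothetical helper: lossy decoding; ill-formed code units become U+FFFD.
func lossyString<E: Unicode.Encoding>(_ units: [E.CodeUnit], as encoding: E.Type) -> String {
    String(decoding: units, as: encoding)
}

/// Hypothetical helper: validating decoding; returns nil if `units` is not
/// well-formed in `encoding`.
func validatedString<E: Unicode.Encoding>(_ units: [E.CodeUnit], as encoding: E.Type) -> String? {
    // `transcode` returns true if it encountered an encoding error; the
    // transcoded output itself is discarded.
    let hadError = transcode(units.makeIterator(), from: encoding, to: UTF32.self,
                             stoppingOnError: true, into: { _ in })
    return hadError ? nil : String(decoding: units, as: encoding)
}
```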

Rationale: While we strongly encourage the use of strong types for handling paths and path operations, systems programming has a long history of using weakly typed strings as paths. These properties enable more rapid prototyping and easier testing while being far more discoverable and ergonomic than the corresponding String initializers. This API (anti)pattern is to be used sparingly.

Separator normalization

FilePath normalizes directory separators on construction and maintains this invariant across mutations. In the relative portion of the path, FilePath strips trailing separators and coalesces repeated separators.

  FilePath("/a/b/") == "/a/b"
  FilePath("a///b") == "a/b"
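To make the Unix rule concrete, here is a minimal sketch that applies it to a plain String. normalizeSeparators is a hypothetical helper, not part of the proposed API; it assumes Unix syntax and ignores Windows roots and platform-encoded storage entirely.

```swift
/// Hypothetical helper: coalesces repeated separators and strips trailing
/// separators from a Unix path string, keeping a lone root "/".
func normalizeSeparators(_ path: String) -> String {
    let isRooted = path.hasPrefix("/")
    // Splitting drops the empty pieces that repeated, leading, and trailing
    // separators would otherwise produce.
    let relative = path.split(separator: "/").joined(separator: "/")
    return isRooted ? "/" + relative : relative
}
```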

Windows accepts either forward slashes (/) or backslashes (\) as directory separators, though the platform's preferred separator is backslash. On Windows, FilePath normalizes forward slashes to backslashes on construction. Separators after a UNC server/share or DOS device path's volume are treated as part of the root.

  FilePath("C:/foo/bar/") == #"C:\foo\bar"#
  FilePath(#"\\server\share\folder\"#) == #"\\server\share\folder"#
  FilePath(#"\\server\share\"#) == #"\\server\share\"#
  FilePath(#"\\?\volume\"#) == #"\\?\volume\"#

Rationale: Normalization provides a simpler and safer internal representation. A trailing slash can give the false impression that the last component is a directory, leading to correctness and security hazards.

Source compatibility

All changes are additive.

Existing users of SystemPackage.FilePath or System.FilePath may encounter ambiguity if they also have the standard library's FilePath in scope. The migration strategy is described below.

ABI compatibility

This proposal is purely an extension of the ABI of the standard library and does not change any existing features.

On Darwin, System.FilePath currently has ABI commitments. Migration from the System module on Darwin can be handled through ABI-level redirection so that existing binaries linked against System.FilePath continue to work.

Implications on adoption

Adopters will need a toolchain that includes this change. The type cannot be back-deployed to older runtimes without additional work.

For existing users of swift-system, SystemPackage (the SwiftPM package) can use #if conditionals to provide initializers and conversions between SystemPackage.FilePath and Swift.FilePath on toolchain versions that include this change, enabling a smooth source-compatible migration path. System (the Darwin framework) can perform ABI migration, redirecting the existing System.FilePath symbol to the standard library implementation, preserving binary compatibility for existing Darwin binaries.

Future directions

Platform string APIs and CInterop

swift-system defines a CInterop namespace with typealiases for platform-specific character types (PlatformChar, PlatformUnicodeEncoding) and provides withPlatformString/init(platformString:) APIs on FilePath, FilePath.Component, and FilePath.Root. These are important escape hatches for C interoperability. For now, these APIs remain in swift-system. Bringing them into the standard library would require a notion of what a "platform string" is at the standard library level, which is a larger design question.

SystemString

swift-system internally uses a SystemString type that handles the underlying storage for FilePath. This type may be independently useful as a public type for working with null-terminated platform-encoded strings.

Operations that consult the file system

Operations such as resolving symlinks, checking existence, and enumerating directory contents require system calls. These remain in swift-system and are not part of this proposal.

RelativePath and AbsolutePath

Libraries and tools built on top of FilePath often elevate some notion of "canonical" path to the type level. This design space includes lexically-normalized absolute paths, semantically-normal paths (expanding symlinks and environment variables), and equivalency-normal paths (Unicode normalization, case-folding). Each tool may have a slightly different notion of "absolute" (e.g. whether ~ counts). We are deferring these types until the design space is better understood. Libraries and tools can define strongly-typed wrappers over FilePath that check their preconditions on initialization.
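As an example of such a wrapper, here is a hypothetical type that only admits absolute, lexically-normal Unix paths, checked once at initialization so later code can rely on the invariant. It wraps String rather than FilePath, and it treats ~ as relative, which is just one of the choices a tool might make.

```swift
/// Hypothetical wrapper: an absolute, lexically-normal Unix path.
struct AbsoluteNormalPath: Hashable {
    let value: String

    init?(_ string: String) {
        // Unrooted spellings, including "~/bin", are rejected.
        guard string.hasPrefix("/") else { return nil }
        if string != "/" {
            // Reject empty components (repeated or trailing "/"), ".", and "..".
            let parts = string.dropFirst().split(separator: "/", omittingEmptySubsequences: false)
            let isNormal = parts.allSatisfy { !$0.isEmpty && $0 != "." && $0 != ".." }
            guard isNormal else { return nil }
        }
        value = string
    }
}
```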

Windows root analysis

Windows roots can be decomposed further into their syntactic form (traditional DOS vs. DOS device syntax) and their volume information (drive letter, UNC server/share). APIs for this decomposition could be added in the future.

Paths from other platforms

A cross-platform application targeting a specific platform (e.g. a script that manages files on a remote Linux server) might want to construct and manipulate paths with the semantics of a platform other than the host. This could be addressed by explicit UnixPath and WindowsPath types conforming to a common protocol.
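One possible shape for that direction, purely as a sketch: a hypothetical PathSyntax protocol supplies the separator rules, and a hypothetical generic ForeignPath splits components accordingly, so a host on any platform can manipulate either flavor. Roots are ignored for brevity; a real design would need the full Windows root grammar.

```swift
/// Hypothetical protocol: the syntax rules a path flavor must supply.
protocol PathSyntax {
    static var preferredSeparator: Character { get }
    static func isSeparator(_ c: Character) -> Bool
}

enum UnixSyntax: PathSyntax {
    static let preferredSeparator: Character = "/"
    static func isSeparator(_ c: Character) -> Bool { c == "/" }
}

enum WindowsSyntax: PathSyntax {
    static let preferredSeparator: Character = "\\"
    // Windows accepts both "\" and "/" as separators.
    static func isSeparator(_ c: Character) -> Bool { c == "\\" || c == "/" }
}

/// Hypothetical generic path whose component splitting follows `Syntax`,
/// independent of the host platform. Roots are not modelled.
struct ForeignPath<Syntax: PathSyntax> {
    var components: [Substring]

    init(_ string: String) {
        components = string.split(whereSeparator: Syntax.isSeparator)
    }

    /// Rejoins components with the flavor's preferred separator.
    var string: String {
        components.joined(separator: String(Syntax.preferredSeparator))
    }
}
```

Note that a backslash is an ordinary file name character under Unix syntax, so ForeignPath<UnixSyntax> treats a\b/c as two components.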

Alternatives considered

Do more: bring all of swift-system into the toolchain

As discussed in system-in-the-toolchain, it may also make sense to have a System module in the toolchain for low-level OS interfaces and low-level currency types (like FileDescriptor, Errno, etc).

FilePath is different from these other types in that it spans the entire tech stack, from kernel-level programming to high-level scripts and automation. Note that we are not pulling in syscalls such as FilePath.stat; those will remain in System/SystemPackage.

Add FilePath to a separate standard library module

FilePath could live in a new module (e.g. FilePaths, Path, Files, ...) that ships with the toolchain but requires an explicit import. Currency types lose much of their value when they require an import. String, Array, Int, and Result are all in the Swift module; it is our (weakly held) opinion that FilePath should be too.

Use Foundation's URL

Foundation's URL is designed for URI semantics, including scheme parsing and percent-encoding. File system paths and URIs have different structure and different invariants. For example, URL(fileURLWithPath:) and URL.appendingPathComponent make blocking file system calls, which is surprising for what appears to be a pure data type. On Unix, paths containing bytes that are not valid UTF-8 cannot survive conversion to a file:// URL, which requires percent-encoding. Foundation also sits high in the dependency stack; the Swift runtime and toolchain components cannot depend on it.

Do nothing

Every new API that needs to name a file will continue using String, perpetuating the loss of structure, platform correctness, and type safety that FilePath was designed to address.

Acknowledgments

Thanks to Saleem Abdulrasool for co-authoring the original FilePath syntactic operations design and implementation in swift-system. Thanks to the participants in the System-in-toolchain discussion and the SE-0513 review for helping clarify that FilePath specifically belongs in the standard library.

Am I correct in understanding that as a result FilePath isn’t able to represent the distinction between /foo/bar/baz and /foo/bar/baz/? This is probably by design since it has been in swift-system for so long but that distinction does occasionally matter for things like command line arguments, for example in rsync (no comment on whether that’s a good thing).

I've been known to say "the Swift standard library needs a FilePath type", so I'm definitely in favour of moving it.

This would neatly solve the problem we're currently dealing with in SE-0513.

I think this is a great idea. It has come up many times, and lots of libraries have implemented similar types, so it would be great to have it standardised in the standard library.

I don't feel strongly about it, but it would be interesting to add "Do less" to the Alternatives Considered: could FilePath's API surface be reduced to a smaller subset, and what would that look like? Does the full API surface of FilePath need to be in the stdlib?

A much easier-to-demonstrate example is ls, where ls /path/to/symlink gives just the symlink while ls /path/to/symlink/ enumerates the contents of the directory the symlink is pointing to.

Yes, FilePath normalizes trailing separators, so /foo/bar/baz and /foo/bar/baz/ are the same FilePath. This is by design and has been the behavior in swift-system since 0.0.2.

This is not just an rsync convention. POSIX path resolution specifies that a trailing slash is resolved as if . were appended, so /foo/bar/ behaves like /foo/bar/. (with an extra . component). In practice, open("/foo/bar/", O_RDONLY) will fail with ENOTDIR if bar is a regular file, whereas open("/foo/bar", O_RDONLY) would succeed. So the trailing slash carries real meaning at the syscall level: it acts as a runtime assertion that the final component is a directory.

Whether a given path names a directory is a property of file system state at the instant of the syscall, and that state can change between calls (e.g. an NFS volume modified or unmounted between syscalls). FilePath does not (and cannot) encode transient file system state. The current design normalizes trailing separators on storage, which means equality is byte equality, hashing is byte hashing, and component iteration doesn't need to account for trailing separators at every call site. The cost is that the POSIX directory assertion is lost on construction.

An alternative would be to preserve the distinction, perhaps by encoding some/path/ as some/path/. on POSIX platforms rather than stripping the trailing slash. Lexical normalization could then drop that trailing . component. This would preserve the runtime error semantics through to syscalls. We would then need to decide whether some/path, some/path/, and some/path/. all produce the same hash value (i.e. what equality means across these forms).

+1 on the overall pitch. It is good to see this separated out from the larger System in the toolchain pitch. FilePath is an important cross platform currency type for many APIs. The fact that we don't have this type yet has caused API design problems across the ecosystem. So let's solve this.

A few comments while reading the proposal:

Enum exhaustiveness

public enum Kind: Sendable

I assume this enum is @frozen, like the Darwin version currently is, right?

Accessing raw bytes

I wasn't able to find it but is there a way to get to the raw bytes of the FilePath for the cases where they can't be represented as valid Strings?

Implications on adoption

For existing users of swift-system, SystemPackage (the SwiftPM package) can use #if conditionals to provide initializers and conversions between SystemPackage.FilePath and Swift.FilePath on toolchain versions that include this change, enabling a smooth source-compatible migration path. System (the Darwin framework) can perform ABI migration, redirecting the existing System.FilePath symbol to the standard library implementation, preserving binary compatibility for existing Darwin binaries.

Can't SystemPackage just introduce a compiler-conditional typealias instead? For example:

#if compiler(>=6.X) // Assuming this lands in 6.X
public typealias FilePath = Swift.FilePath
#else
public struct FilePath { ... }
#endif

Yes, in that this is a non-extensible enum.

A bit of background: from System's perspective, enums should be avoided in API if we care in the slightest about layout equivalence and/or extensibility. I do not believe either applies to Kind here, and that is why it is frozen even in System. The same rationale should apply to the stdlib.

Right, that's part of the withPlatformString and CInterop unsafe pointer layer in future directions. That being said, we could (should) have Span-based views of paths and subpaths instead anyways. A question is how we want to parameterize/distinguish over UInt8 vs UInt16 in the API. For example, do we want to pull over something like System's PlatformChar.

Yes, this is a good point. If the API is unchanged as part of the migration to the stdlib, a typealias might be the smoothest solution. If there are minor API changes, we still might want to make it a typealias and just have the API diff be present on System. Major API changes might argue for keeping both types around in some capacity (even if via deprecation) with conversions between them.

It seems to me like preserving this is valuable, and it would be unfortunate if folks needed to fall back to strings for APIs where this matters (is this what swift-system does today?). Maybe we can add a hasTrailingSeparator Bool to the type, which gets set to false on canonicalization? This would still give us byte-equality on the type and be expressive enough to represent all three forms: /foo/bar (canonical); /foo/bar/ (hasTrailingSeparator); and /foo/bar/. (has an additional component).
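A sketch of that suggestion, using a hypothetical String-backed stand-in rather than FilePath: separator-normalized content plus a trailing-separator flag, with synthesized Hashable so the three spellings stay distinct, and a canonicalization step that clears the flag.

```swift
/// Hypothetical stand-in: normalized content plus a flag recording whether
/// the original spelling ended in a separator.
struct TrailingAwarePath: Hashable {
    var normalized: String
    var hasTrailingSeparator: Bool

    init(_ string: String) {
        let isRooted = string.hasPrefix("/")
        let relative = string.split(separator: "/").joined(separator: "/")
        normalized = isRooted ? "/" + relative : relative
        // A lone root "/" does not count as a trailing separator.
        hasTrailingSeparator = string.hasSuffix("/") && !relative.isEmpty
    }

    /// Canonicalization drops the directory assertion.
    func canonicalized() -> TrailingAwarePath {
        var copy = self
        copy.hasTrailingSeparator = false
        return copy
    }

    /// Spelling to pass to a syscall, restoring the trailing separator.
    var syscallSpelling: String {
        hasTrailingSeparator ? normalized + "/" : normalized
    }
}
```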

Assuming ABI constraints allow for it, I'd be inclined to just #if os(Windows) the thing and avoid additional typealiases or helper modules.

+1 yes please!

Regardless of what happens to System, I think it makes a lot of sense to put FilePath in the toolchain & in the Swift module specifically.

FilePath is much more cross-platform than the rest of System purports to be. And we desperately need a common currency type for file paths, which have got to be one of the most commonly used concepts in programming. I’ve experienced a real reluctance to use FilePath in cross-platform (Linux + Darwin) code right now due to the difficulty around System vs. SystemPackage.

Two questions about conformances:

  1. Can we fix the Codable representation of FilePath to be a string? Right now it’s really weird. Maybe the Decodable conformance can handle the old representation to help people migrate? If this isn’t possible due to compatibility concerns, is it possible to deprecate & eventually remove a conformance? I feel a deprecated/removed Codable conformance would be better than the current situation.
  2. Can FilePath and FilePath.Component adopt RawRepresentable? There are a wide variety of generic functions and types that work on RawRepresentables with String raw values. swift-configuration is one I’m thinking about specifically, but they’re sort of everywhere.
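For reference, a String-backed RawRepresentable conformance and the kind of generic API it enables might look like the following. PathValue and lookup are hypothetical names, not swift-configuration API. One caveat the sketch glosses over: on Unix a path's bytes need not be valid UTF-8, so a String raw value cannot round-trip every path.

```swift
/// Hypothetical stand-in for a path type; only the conformance matters here.
struct PathValue: RawRepresentable {
    var rawValue: String
    // A non-failable init satisfies the protocol's `init?(rawValue:)`.
    init(rawValue: String) { self.rawValue = rawValue }
}

/// A generic API written once against any String-backed RawRepresentable.
func lookup<Key: RawRepresentable>(_ key: Key, in table: [String: Int]) -> Int?
    where Key.RawValue == String {
    table[key.rawValue]
}
```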

I see value in bringing FilePath into the standard library, though I still have concerns about its cross-platform nature.

Not being able to run the test suite for "foreign" spellings IMO makes it too easy to break another platform. At least for testing, we should be able to construct foreign paths. I'm also wondering how this would work for some projects, e.g. SPM, which tries to have a single way to represent paths across platforms.

Very broadly speaking, Windows is the odd one out and every other modern platform uses POSIX-style paths, right? Are there any other counter-examples? That's not to say we should ignore Windows of course; just trying to get a sense of the problem scope.

It would not be outside the realm of possibility to have two types and a platform-specific typealias, something like:

struct WindowsFilePath {}
struct POSIXFilePath {}
#if os(Windows)
typealias FilePath = WindowsFilePath
#else
typealias FilePath = POSIXFilePath
#endif

Thus allowing me to say "this is a native file path, either is fine" or "this is explicitly a POSIX file path". But do we expect there to be a common need for e.g. a FreeBSD user to construct and manipulate Windows paths, or a Windows user to construct and manipulate Linux paths?
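To make the idea concrete, here is a sketch in which both flavors share one generic implementation and differ only in a hypothetical PathSyntax description of their separators. Root handling (drive letters, UNC and device paths) is deliberately omitted, so this is far simpler than real Windows parsing. A side benefit is that a POSIX host can construct and test Windows spellings, which speaks to the concern about foreign paths in test suites. PathSyntax, SyntacticPath and rendered are illustrative names only.

```swift
/// Hypothetical description of a path flavor's separator rules.
protocol PathSyntax {
    /// Characters accepted as separators when parsing.
    static var separators: Set<Character> { get }
    /// Separator used when rendering.
    static var preferredSeparator: Character { get }
}

enum POSIXSyntax: PathSyntax {
    static let separators: Set<Character> = ["/"]
    static let preferredSeparator: Character = "/"
}

enum WindowsSyntax: PathSyntax {
    // Windows generally accepts "/" as well as "\" when parsing.
    static let separators: Set<Character> = ["\\", "/"]
    static let preferredSeparator: Character = "\\"
}

/// One generic implementation shared by both flavors. Roots are ignored here:
/// a leading "C:" is just another component, and a leading "/" is dropped.
struct SyntacticPath<Syntax: PathSyntax> {
    var components: [Substring]

    init(_ string: String) {
        components = string.split(whereSeparator: { Syntax.separators.contains($0) })
    }

    /// Renders the components with the flavor's preferred separator.
    var rendered: String {
        components.joined(separator: String(Syntax.preferredSeparator))
    }
}

typealias POSIXFilePath = SyntacticPath<POSIXSyntax>
typealias WindowsFilePath = SyntacticPath<WindowsSyntax>

#if os(Windows)
typealias FilePath = WindowsFilePath
#else
typealias FilePath = POSIXFilePath
#endif
```

Code that only wants "a native path" spells FilePath, while a test suite on any host can exercise WindowsFilePath or POSIXFilePath directly.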

Depends on your definition of modern. AIUI, Windows/DOS use \, OpenVMS (last release ~14 months ago) uses . and brackets (e.g. SYS$DISK:[DIR1.DIR2]FILE.TXT;1), RISC OS (last release ~21 months ago) uses . (e.g. $.Documents.File), and z/OS (last release ~4 months ago) uses . as its dataset name separator (e.g. USER.DATA.FILE).

I'd say that we should assume that there are plenty of non-/ separator options.

I'm definitely ignoring MS-DOS and compatibles. :slight_smile: Swift doesn't build for any of those other platforms, but Embedded Swift conceivably could. So an open question is whether we try to support them out the door or say "FilePath supports these kinds of paths, if you want to support OpenVMS we take PRs."

Right, I think that we should ensure that FilePath has enough flexibility out the door to support that, and I think the biggest concern is the inability to form foreign spellings. I don't think we need to add support for these other platforms to get FilePath introduced.

I asked our security professionals and we have concerns for logic bugs. In general, developers really want to hear that there is a way to safely work with paths as strings that are detached from the filesystem, but this is significantly misleading.

Even considering only lexical operations, there are discrepancies between what FilePath does and what the actual file system does: HFS/HFS+/APFS are known for jarring equivalences, and XNU supports path roots other than /, to name two examples. Most of the details aren't very exciting, but for a little bit of harmless fun, can you guess what program, if any, macOS will start if you put /usr/bin/ßh in the terminal?
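The flavor of discrepancy described here can be shown without touching a filesystem. This sketch compares strings byte-for-byte (roughly what a purely syntactic type does) and with a hypothetical comparison that applies full Unicode uppercase mapping as a stand-in for a case-insensitive filesystem. It makes no claim about what any particular filesystem, or macOS, actually does with these names; the function names are illustrative.

```swift
/// Byte-for-byte comparison of the UTF-8 encodings: roughly what a purely
/// syntactic path type compares.
func bytewiseEqual(_ a: String, _ b: String) -> Bool {
    a.utf8.elementsEqual(b.utf8)
}

/// Hypothetical stand-in for a case-insensitive filesystem: apply the full
/// Unicode uppercase mapping (which turns "ß" into "SS") and then compare.
/// Real filesystems each use their own folding and normalization rules.
func caseMappedEqual(_ a: String, _ b: String) -> Bool {
    a.uppercased() == b.uppercased()
}
```

Note that Swift's own String == already differs from byte equality: "caf\u{E9}" == "cafe\u{301}" is true, because String compares by Unicode canonical equivalence even though the UTF-8 bytes differ.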

As correctly noted, this is even worse when you start actually trying to resolve file paths to files. The bottom line is that there are virtually no reliable inferences that you can make over file paths without querying the file system. Of particular interest:

  • Opening the same path twice may open two different files because most path components can change arbitrarily between two calls to open.
  • Opening two different paths can open the same file for a lot of reasons:
    • arbitrary changes between the two operations
    • the file system compares strings differently than FilePath
    • links, mountpoints, etc
  • Testing that filePath starts with prefixPath does not guarantee that the opened file resolves inside of prefixPath (and the existence of starts(with:) is probably an attractive nuisance).
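The last bullet is easy to reproduce purely lexically. In this sketch the helpers components(_:) and lexicallyNormalized(_:) are hypothetical and assume absolute, /-separated paths: a component-wise prefix check accepts a path that climbs out with .., while normalizing first rejects it. Even the normalized check proves nothing once symlinks or mount points inside the prefix are involved, which is the point of the bullet.

```swift
/// Hypothetical helper: splits an absolute, "/"-separated path into components.
func components(_ path: String) -> [Substring] {
    path.split(separator: "/")
}

/// Hypothetical helper: resolves "." and ".." purely lexically, without
/// consulting the filesystem. Assumes an absolute path, so ".." at the root
/// stays at the root.
func lexicallyNormalized(_ path: String) -> [Substring] {
    var result: [Substring] = []
    for component in components(path) {
        switch component {
        case ".":
            break                    // current directory: drop it
        case "..":
            _ = result.popLast()     // parent directory: drop the previous component
        default:
            result.append(component)
        }
    }
    return result
}
```

For example, /srv/www/../../etc/passwd passes a naive component-wise starts(with:) check against /srv/www, but normalizes to etc/passwd.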

Some of these properties are useful on read-only filesystems and conceptual file systems. For instance, if you load a zip file in memory, it’s safe to use something like FilePath to identify files inside of it. If you are routing URLs, it’s safe to use something like FilePath to decide what the request does. However, it’s clear that FilePath will be used for, well, file paths on mutable file systems, and this design does not address the most common mistakes.

File path mishandling could well be the #2 cause of security bugs, right after memory safety. It’s likely that involving some security people will result in designs that push the state of the art. Given how long it takes to refresh APIs, I’m fairly confident this is a good use of time.

I think this is a greeeeeat idea! Many Foundation APIs, such as Subprocess, are moving towards using FilePath as a currency type instead of String. We currently have issues with where SystemPackage lives: on Darwin, we use import System directly, whereas on other platforms, we need to add swift-system as a dependency. This discrepancy forces us to write

#if canImport(System)
import System
#else
import SystemPackage
#endif

everywhere. It would be great if we could use FilePath directly from the stdlib.

My off-the-cuff sense here is that these are all [very interesting!] facts about filesystems / kernels, and not file paths per se. I’ll grant that many people probably imagine those things to be inseparable or in fact identical, but however common that misconception may be, it is a misconception, and I don’t think there’s anything to be done about it.

I think it’s somewhat indicative that you list TOCTOU problems alongside more complex or subtle problems. I would consider “symlinks make reasoning about outcomes with filepaths hard” to be equally true as “[mutable] filesystems are shared mutable state, so you can never really know what’s going to happen when you try to access a path”.

It might be useful for there to be a library that aggressively checks with the filesystem to try to reduce the incidence of the situations you described†, but I don’t think that’s a reasonable default method of operating for a currency type like FilePath, and even the most aggressive version of that idea wouldn’t entirely eliminate those problems (except on readonly filesystems, as you mentioned).

(The /usr/bin/ßh example was quite jarring!)

† Symlinks, mountpoints, (and reparse points on Windows, which are similar to the foregoing), TOCTOU and path → node parsing/canonicalization
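As a small example of the kind of filesystem query mentioned above, this sketch asks the kernel whether two spellings currently name the same file by comparing st_dev and st_ino from stat(2). The helper name refersToSameFile is hypothetical. The answer is only a snapshot and is itself subject to TOCTOU; code that cares would open the file once and then fstat the descriptor it actually holds.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Hypothetical helper: do two path spellings currently name the same file?
/// Compares device and inode numbers reported by stat(2), which follows
/// symlinks. Returns nil if either path cannot be stat'ed. The answer can be
/// stale by the time the caller acts on it.
func refersToSameFile(_ a: String, _ b: String) -> Bool? {
    var infoA = stat()
    var infoB = stat()
    guard stat(a, &infoA) == 0, stat(b, &infoB) == 0 else { return nil }
    return infoA.st_dev == infoB.st_dev && infoA.st_ino == infoB.st_ino
}
```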
