[API Review] FilePath Syntactic APIs - Version 2

Hello everyone, I created a new thread for version 2 of the FilePath APIs. I received a lot of valuable insight in the other thread and have incorporated it into this design, so I thought it would be cleaner to start a new thread.

You can find the full version with extended rationale and future directions in the gist. Below is an excerpt, trimmed due to forum character limits.

FilePath Syntactic Operations

Introduction

FilePath appeared in System 0.0.1 with a minimal API. This proposal adds API for syntactic operations, which are performed on the structure of the path and thus do not consult the file system or make any system calls. These include inspecting the structure of paths, modifying paths, and accessing individual components.

Additionally, this proposal greatly expands Windows support and enables writing platform-agnostic path manipulation code.

Design

Windows support

Furthering Swift's push for Windows and System's initial Windows support, this proposal updates FilePath to work well on Windows in addition to Unix platforms.

The proposed API is designed to work naturally and intuitively with both Unix-style and Windows paths. Most of the concepts and even terminology are shared across platforms, though there are some minor differences (e.g. this proposal uses the word "absolute" to refer to what is formally called "fully-qualified" on Windows).

Introducing FilePath.Root and FilePath.Component

FilePath.Root represents the root of a path. On Unix, this is simply /, but on Windows it can include volume and server/share information.

extension FilePath {
  /// Represents a root of a file path.
  ///
  /// On Unix, a root is simply the directory separator `/`.
  ///
  /// On Windows, a root contains the entire path prefix up to and including
  /// the final separator.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/`
  /// * Windows:
  ///   * `C:\`
  ///   * `C:`
  ///   * `\`
  ///   * `\\server\share\`
  ///   * `\\?\UNC\server\share\`
  ///   * `\\?\Volume{12345678-abcd-1111-2222-123445789abc}\`
  public struct Root { }
}

FilePath.Component represents a single non-root component of a path. A component can be a file or directory name or one of the special directories (. or ..). Components are always non-empty and do not contain a directory separator.

extension FilePath {
  /// Represents an individual, non-root component of a file path.
  ///
  /// Components can be one of the special directory components (`.` or `..`)
  /// or a file or directory name. Components are never empty and never
  /// contain the directory separator.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/tmp"
  ///     let file: FilePath.Component = "foo.txt"
  ///     file.kind == .regular           // true
  ///     file.extension                  // "txt"
  ///     path.append(file)               // path is "/tmp/foo.txt"
  ///
  public struct Component {
    /// Whether a component is a regular file or directory name, or a special
    /// directory `.` or `..`
    public enum Kind {
      /// The special directory `.`, representing the current directory.
      case currentDirectory

      /// The special directory `..`, representing the parent directory.
      case parentDirectory

      /// A file or directory name
      case regular
    }

    /// The kind of this component
    public var kind: Kind { get }

    /// Whether this component is either special directory `.` or `..`.
    public var isSpecialDirectory: Bool { get }
  }
}

These can conveniently be created from a string literal and can be printed, just like FilePath.

(See the gist for CustomStringConvertible, CustomDebugStringConvertible, ExpressibleByStringLiteral conformances for Root and Component.)

FilePath.ComponentView

FilePath.ComponentView is a BidirectionalCollection and RangeReplaceableCollection of the non-root components that comprise a path. The Index type is an opaque wrapper around FilePath's underlying storage index.

extension FilePath {
  /// A bidirectional, range replaceable collection of the non-root components
  /// that make up a file path.
  ///
  /// ComponentView provides access to standard `BidirectionalCollection`
  /// algorithms for accessing components from the front or back, as well as
  /// standard `RangeReplaceableCollection` algorithms for modifying the
  /// file path using component or range of components granularity.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/./home/./username/scripts/./tree"
  ///     let scriptIdx = path.components.lastIndex(of: "scripts")!
  ///     path.components.insert("bin", at: scriptIdx)
  ///     // path is "/./home/./username/bin/scripts/./tree"
  ///
  ///     path.components.removeAll { $0.kind == .currentDirectory }
  ///     // path is "/home/username/bin/scripts/tree"
  ///
  public struct ComponentView: BidirectionalCollection, RangeReplaceableCollection { }

  /// View the non-root components that make up this path.
  public var components: ComponentView { get set }
}

A FilePath can be created from a given root and components. FilePath.init(root:_:ComponentView.SubSequence) is a more efficient overload that can directly access the underlying storage, which already has normalized separators between components.

extension FilePath {
  /// Create a file path from a root and a collection of components.
  public init<C: Collection>(root: Root?, _ components: C)
    where C.Element == Component

  /// Create a file path from a root and any number of components.
  public init(root: Root?, components: Component...)

  /// Create a file path from an optional root and a slice of another path's
  /// components.
  public init(root: Root?, _ components: ComponentView.SubSequence)
}

Basic queries

extension FilePath {
  /// Returns whether `other` is a prefix of `self`, only considering
  /// whole path components.
  ///
  /// Example:
  ///
  ///     let path: FilePath = "/usr/bin/ls"
  ///     path.starts(with: "/")              // true
  ///     path.starts(with: "/usr/bin")       // true
  ///     path.starts(with: "/usr/bin/ls")    // true
  ///     path.starts(with: "/usr/bin/ls///") // true
  ///     path.starts(with: "/us")            // false
  ///
  public func starts(with other: FilePath) -> Bool

  /// Returns whether `other` is a suffix of `self`, only considering
  /// whole path components.
  ///
  /// Example:
  ///
  ///     let path: FilePath = "/usr/bin/ls"
  ///     path.ends(with: "ls")             // true
  ///     path.ends(with: "bin/ls")         // true
  ///     path.ends(with: "usr/bin/ls")     // true
  ///     path.ends(with: "/usr/bin/ls///") // true
  ///     path.ends(with: "/ls")            // false
  ///
  public func ends(with other: FilePath) -> Bool

  /// Whether this path is empty
  public var isEmpty: Bool { get }
}
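
To make the "whole path components" rule concrete, here is a minimal sketch on plain Unix-style strings. It is not System's implementation: `unixComponents` and `pathStarts` are hypothetical helpers invented for illustration, and Windows roots are not handled.

```swift
// Hypothetical helper, not part of System: splits a Unix-style path string
// into its non-empty components, so repeated and trailing "/" are ignored.
func unixComponents(_ path: String) -> [Substring] {
  return path.split(separator: "/")
}

// Sketch of component-wise prefix matching on Unix-style path strings.
func pathStarts(_ path: String, with prefix: String) -> Bool {
  // A rooted prefix can only match a rooted path, and vice versa.
  guard path.hasPrefix("/") == prefix.hasPrefix("/") else { return false }
  // Compare whole components, never partial names such as "us" vs "usr".
  return unixComponents(path).starts(with: unixComponents(prefix))
}

print(pathStarts("/usr/bin/ls", with: "/usr/bin"))
print(pathStarts("/usr/bin/ls", with: "/us"))
```

Comparing component arrays rather than raw characters is what makes `"/us"` fail to match even though it is a textual prefix of `"/usr/bin/ls"`.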

Windows roots are more complex: they can take several different syntactic forms and can carry additional information such as a drive letter or server/share information. Moreover, the presence of a root does not mean that the path is absolute (i.e. "fully-qualified" in Windows-speak).

For example, C:foo refers to foo relative to the current directory on the C drive, and \foo refers to foo at the root of the current drive. Neither of those is absolute, i.e. fully-qualified, even though they have roots.

extension FilePath {
  /// Returns true if this path uniquely identifies the location of
  /// a file without reference to an additional starting location.
  ///
  /// On Unix platforms, absolute paths begin with a `/`. `isAbsolute` is
  /// equivalent to `root != nil`.
  ///
  /// On Windows, absolute paths are fully qualified paths. `isAbsolute` is
  /// _not_ equivalent to `root != nil` for traditional DOS paths
  /// (e.g. `C:foo` and `\bar` have roots but are not absolute). UNC paths
  /// and device paths are always absolute. Traditional DOS paths are
  /// absolute only if they begin with a volume or drive followed by
  /// a `:` and a separator.
  ///
  /// NOTE: This does not perform shell expansion or substitute
  /// environment variables; paths beginning with `~` are considered relative.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/usr/local/bin`
  ///   * `/tmp/foo.txt`
  ///   * `/`
  /// * Windows:
  ///   * `C:\Users\`
  ///   * `\\?\UNC\server\share\bar.exe`
  ///   * `\\server\share\bar.exe`
  public var isAbsolute: Bool { get }

  /// Returns true if this path is not absolute (see `isAbsolute`).
  ///
  /// Examples:
  /// * Unix:
  ///   * `~/bar`
  ///   * `tmp/foo.txt`
  /// * Windows:
  ///   * `bar\baz`
  ///   * `C:Users\`
  ///   * `\Users`
  public var isRelative: Bool { get }
}
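
The Windows rules in the `isAbsolute` documentation can be sketched as a small string classifier. This is only an illustration under simplifying assumptions: `isFullyQualifiedWindowsPath` is a hypothetical name, it expects backslash separators only, and System's real parser handles more forms than this.

```swift
// Hypothetical classifier for illustration only, not System's parser.
// Assumes separators have already been normalized to backslashes.
func isFullyQualifiedWindowsPath(_ path: String) -> Bool {
  let chars = Array(path)
  // UNC and device paths start with two backslashes and are always absolute.
  if chars.count >= 2, chars[0] == "\\", chars[1] == "\\" {
    return true
  }
  // Traditional DOS paths need a drive letter, a colon, and a separator.
  if chars.count >= 3, chars[0].isASCII, chars[0].isLetter,
     chars[1] == ":", chars[2] == "\\" {
    return true
  }
  // Everything else (`C:foo`, `\foo`, `foo\bar`) is relative, even when
  // it has a root.
  return false
}

print(isFullyQualifiedWindowsPath(#"C:\Users\"#))
print(isFullyQualifiedWindowsPath(#"C:Users\"#))
```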

Path decomposition and analysis

Paths can be decomposed into their (optional) root and their (potentially empty) components.

extension FilePath {
  /// Returns the root of a path if there is one, otherwise `nil`.
  ///
  /// On Unix, this will return the leading `/` if the path is absolute
  /// and `nil` if the path is relative.
  ///
  /// On Windows, for traditional DOS paths, this will return
  /// the path prefix up to and including a root directory or
  /// a supplied drive or volume. Otherwise, if the path is relative to
  /// both the current directory and current drive, returns `nil`.
  ///
  /// On Windows, for UNC or device paths, this will return the path prefix
  /// up to and including the host and share for UNC paths or the volume for
  /// device paths followed by any subsequent separator.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/foo/bar => /`
  ///   * `foo/bar  => nil`
  /// * Windows:
  ///   * `C:\foo\bar                => C:\`
  ///   * `C:foo\bar                 => C:`
  ///   * `\foo\bar                  => \`
  ///   * `foo\bar                   => nil`
  ///   * `\\server\share\file       => \\server\share\`
  ///   * `\\?\UNC\server\share\file => \\?\UNC\server\share\`
  ///   * `\\.\device\folder         => \\.\device\`
  ///
  /// Setting the root to `nil` will remove the root and setting a new
  /// root will replace the root.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/foo/bar"
  ///     path.root = nil // path is "foo/bar"
  ///     path.root = "/" // path is "/foo/bar"
  ///
  /// Example (Windows):
  ///
  ///     var path: FilePath = #"\foo\bar"#
  ///     path.root = nil         // path is #"foo\bar"#
  ///     path.root = "C:"        // path is #"C:foo\bar"#
  ///     path.root = #"C:\"#     // path is #"C:\foo\bar"#
  ///
  public var root: FilePath.Root? { get set }

  /// Creates a new path containing just the components, i.e. everything
  /// after `root`.
  ///
  /// Returns self if `root == nil`.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/foo/bar => foo/bar`
  ///   * `foo/bar  => foo/bar`
  ///   * `/        => ""`
  /// * Windows:
  ///   * `C:\foo\bar                  => foo\bar`
  ///   * `foo\bar                     => foo\bar`
  ///   * `\\?\UNC\server\share\file   => file`
  ///   * `\\?\device\folder\file.exe  => folder\file.exe`
  ///   * `\\server\share\file         => file`
  ///   * `\                           => ""`
  ///
  public func removingRoot() -> FilePath
}

Setters allow for in-place mutation. FilePath.root's setter allows making a path relative or absolute, and even allows switching root representations on Windows.

A common decomposition of a path is between its last non-root component and everything prior to this (e.g. basename and dirname in C).

extension FilePath {
  /// Returns the final component of the path.
  /// Returns `nil` if the path is empty or only contains a root.
  ///
  /// Note: Even if the final component is a special directory
  /// (`.` or `..`), it will still be returned. See `lexicallyNormalize()`.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/usr/local/bin/ => bin`
  ///   * `/tmp/foo.txt    => foo.txt`
  ///   * `/tmp/foo.txt/.. => ..`
  ///   * `/tmp/foo.txt/.  => .`
  ///   * `/               => nil`
  /// * Windows:
  ///   * `C:\Users\                    => Users`
  ///   * `C:Users\                     => Users`
  ///   * `C:\                          => nil`
  ///   * `\Users\                      => Users`
  ///   * `\\?\UNC\server\share\bar.exe => bar.exe`
  ///   * `\\server\share               => nil`
  ///   * `\\?\UNC\server\share\        => nil`
  ///
  public var lastComponent: Component? { get }

  /// Creates a new path with everything up to but not including
  /// `lastComponent`.
  ///
  /// If the path only contains a root, returns `self`.
  /// If the path has no root and only includes a single component,
  /// returns an empty FilePath.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/usr/bin/ls => /usr/bin`
  ///   * `/foo        => /`
  ///   * `/           => /`
  ///   * `foo         => ""`
  /// * Windows:
  ///   * `C:\foo\bar.exe                 => C:\foo`
  ///   * `C:\                            => C:\`
  ///   * `\\server\share\folder\file.txt => \\server\share\folder`
  ///   * `\\server\share\                => \\server\share\`
  public func removingLastComponent() -> FilePath
}

(See the gist for dirname and basename unavailable-renamed declarations to aid discoverability.)

Components may be decomposed into their stem and (optional) extension (.txt, .o, .app, etc.). FilePath gains convenience APIs for dealing with the stem and extension of the last component, if it exists. FilePath also gains a setter for extension for easy and efficient in-place mutation.

extension FilePath.Component {
  /// The extension of this file or directory component.
  ///
  /// If `self` does not contain a `.` anywhere, or only
  /// at the start, returns `nil`. Otherwise, returns everything after the dot.
  ///
  /// Examples:
  ///   * `foo.txt    => txt`
  ///   * `foo.tar.gz => gz`
  ///   * `Foo.app    => app`
  ///   * `.hidden    => nil`
  ///   * `..         => nil`
  ///
  public var `extension`: String? { get }

  /// The non-extension portion of this file or directory component.
  ///
  /// Examples:
  ///   * `foo.txt => foo`
  ///   * `foo.tar.gz => foo.tar`
  ///   * `Foo.app => Foo`
  ///   * `.hidden => .hidden`
  ///   * `..      => ..`
  ///
  public var stem: String { get }
}

extension FilePath {
  /// The extension of the last component (a file or directory name).
  ///
  /// If `lastComponent` is `nil` or one of the special path components
  /// `.` or `..`, `get` returns `nil` and `set` does nothing.
  ///
  /// If `lastComponent` does not contain a `.` anywhere, or only
  /// at the start, `get` returns `nil` and `set` will append a
  /// `.` and `newValue` to `lastComponent`.
  ///
  /// Otherwise `get` returns everything after the last `.` and `set` will
  /// replace the extension.
  ///
  /// Examples:
  ///   * `/tmp/foo.txt                 => txt`
  ///   * `/Applications/Foo.app/        => app`
  ///   * `/Applications/Foo.app/bar.txt => txt`
  ///   * `/tmp/foo.tar.gz              => gz`
  ///   * `/tmp/.hidden                 => nil`
  ///   * `/tmp/.hidden.                => ""`
  ///   * `/tmp/..                      => nil`
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/tmp/file"
  ///     path.extension = "txt"  // path is "/tmp/file.txt"
  ///     path.extension = "o"    // path is "/tmp/file.o"
  ///     path.extension = nil    // path is "/tmp/file"
  ///     path.extension = ""     // path is "/tmp/file."
  ///
  public var `extension`: String? { get set }

  /// The non-extension portion of the last component (a file or directory
  /// name).
  ///
  /// Returns `nil` if `lastComponent` is `nil`
  ///
  ///   * `/tmp/foo.txt                 => foo`
  ///   * `/Applications/Foo.app/        => Foo`
  ///   * `/Applications/Foo.app/bar.txt => bar`
  ///   * `/tmp/.hidden                 => .hidden`
  ///   * `/tmp/..                      => ..`
  ///   * `/                            => nil`
  public var stem: String? { get }
}

FilePath.extension's setter allows for convenient in-place reassigning or adding/removing of an extension.
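
The stem/extension rules documented above can be sketched for a single component as follows. This is a minimal illustration on `String`, not the proposed implementation; `splitExtension` is a hypothetical helper name.

```swift
// Hypothetical helper mirroring the documented rules for one component.
func splitExtension(_ component: String) -> (stem: String, ext: String?) {
  // Special directories never have an extension.
  if component == "." || component == ".." {
    return (component, nil)
  }
  // No dot at all, or a dot only at the start, means there is no extension.
  guard let dot = component.lastIndex(of: "."),
        dot != component.startIndex
  else {
    return (component, nil)
  }
  // Everything before the last dot is the stem, everything after it is the
  // extension (which may be empty, as in ".hidden.").
  let stem = String(component[..<dot])
  let ext = String(component[component.index(after: dot)...])
  return (stem, ext)
}

print(splitExtension("foo.tar.gz"))
print(splitExtension(".hidden"))
```

Splitting at the last dot is what gives `foo.tar.gz` the stem `foo.tar` and the extension `gz`.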

Lexical operations

FilePath supports lexical (i.e. does not call into the file system to e.g. follow symlinks) operations such as normalization of special directory components (. and ..).

extension FilePath {
  /// Whether the path is in lexical-normal form, that is `.` and `..`
  /// components have been collapsed lexically (i.e. without following
  /// symlinks).
  ///
  /// Examples:
  /// * `"/usr/local/bin".isLexicallyNormal == true`
  /// * `"../local/bin".isLexicallyNormal   == true`
  /// * `"local/bin/..".isLexicallyNormal   == false`
  public var isLexicallyNormal: Bool { get }

  /// Collapse `.` and `..` components lexically (i.e. without following
  /// symlinks).
  ///
  /// Examples:
  /// * `/usr/./local/bin/.. => /usr/local`
  /// * `/../usr/local/bin   => /usr/local/bin`
  /// * `../usr/local/../bin => ../usr/bin`
  public mutating func lexicallyNormalize()

  /// Returns a copy of `self` in lexical-normal form, that is `.` and `..`
  /// components have been collapsed lexically (i.e. without following
  /// symlinks). See `lexicallyNormalize`
  public var lexicallyNormal: FilePath { get }
}
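
For readers who want to see the collapsing rule spelled out, here is a rough sketch of lexical normalization on Unix-style strings. It is an assumption-laden illustration rather than System's algorithm: `lexicallyNormalized` is a hypothetical free function and Windows roots are ignored.

```swift
// Hypothetical sketch of lexical normalization for Unix-style path strings.
func lexicallyNormalized(_ path: String) -> String {
  let isRooted = path.hasPrefix("/")
  var stack: [Substring] = []
  for part in path.split(separator: "/") {
    switch part {
    case ".":
      continue                      // "." never changes the location
    case "..":
      if let last = stack.last, last != ".." {
        stack.removeLast()          // ".." cancels the preceding name
      } else if !isRooted {
        stack.append(part)          // keep leading ".." in relative paths
      }
      // In a rooted path, ".." at the root is dropped: "/.." is "/".
    default:
      stack.append(part)
    }
  }
  return (isRooted ? "/" : "") + stack.joined(separator: "/")
}

print(lexicallyNormalized("/usr/./local/bin/.."))
print(lexicallyNormalized("../usr/local/../bin"))
```

Because nothing here touches the file system, a `..` that follows a symlink is collapsed textually, which is exactly the caveat the documentation calls out.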

FilePath provides API to help protect against arbitrary path traversal from untrusted sub-paths:

extension FilePath {
  /// Create a new `FilePath` by resolving `subpath` relative to `self`,
  /// ensuring that the result is lexically contained within `self`.
  ///
  /// `subpath` will be lexically normalized (see `lexicallyNormalize`) as
  /// part of resolution, meaning any contained `.` and `..` components will
  /// be collapsed without resolving symlinks. Any root in `subpath` will be
  /// ignored.
  ///
  /// Returns `nil` if the result would "escape" from `self` through use of
  /// the special directory component `..`.
  ///
  /// This is useful for protecting against arbitrary path traversal from an
  /// untrusted subpath: the result is guaranteed to be lexically contained
  /// within `self`. Since this operation does not consult the file system to
  /// resolve symlinks, any escaping symlinks nested inside of `self` can still
  /// be targeted by the result.
  ///
  /// Example:
  ///
  ///     let staticContent: FilePath = "/var/www/my-website/static"
  ///     let links: [FilePath] =
  ///       ["index.html", "/assets/main.css", "../../../../etc/passwd"]
  ///     links.map { staticContent.lexicallyResolving($0) }
  ///       // ["/var/www/my-website/static/index.html",
  ///       //  "/var/www/my-website/static/assets/main.css",
  ///       //  nil]
  public func lexicallyResolving(_ subpath: FilePath) -> FilePath?
}
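
The containment check can be sketched on Unix-style strings like this. It is a simplified illustration, not the proposed implementation: `lexicallyResolved(base:subpath:)` is a hypothetical free function, and `base` is assumed to be already normalized.

```swift
// Hypothetical sketch: resolve `subpath` under `base`, or return nil if the
// result would lexically escape `base`.
func lexicallyResolved(base: String, subpath: String) -> String? {
  var kept: [Substring] = []
  // Splitting on "/" drops any leading separator, so a root in `subpath`
  // is ignored.
  for part in subpath.split(separator: "/") {
    switch part {
    case ".":
      continue
    case "..":
      // Popping past the start of `subpath` would escape `base`.
      guard !kept.isEmpty else { return nil }
      kept.removeLast()
    default:
      kept.append(part)
    }
  }
  guard !kept.isEmpty else { return base }
  let tail = kept.joined(separator: "/")
  return base.hasSuffix("/") ? base + tail : base + "/" + tail
}

let staticContent = "/var/www/my-website/static"
print(lexicallyResolved(base: staticContent, subpath: "/assets/main.css") ?? "nil")
print(lexicallyResolved(base: staticContent, subpath: "../../etc/passwd") ?? "nil")
```

As with the real API, this only guarantees lexical containment; a symlink inside `base` that points elsewhere is not detected.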

Modifying paths

FilePath supports common mutation operations.

extension FilePath {
  /// If `prefix` is a prefix of `self`, removes it and returns `true`.
  /// Otherwise returns `false`.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/usr/local/bin"
  ///     path.removePrefix("/usr/bin")   // false
  ///     path.removePrefix("/us")        // false
  ///     path.removePrefix("/usr/local") // true, path is "bin"
  ///
  public mutating func removePrefix(_ prefix: FilePath) -> Bool

  /// Append a `component` on to the end of this path.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/tmp"
  ///     let sub: FilePath = "foo/./bar/../baz/."
  ///     for comp in sub.components.filter({ $0.kind != .currentDirectory }) {
  ///       path.append(comp)
  ///     }
  ///     // path is "/tmp/foo/bar/../baz"
  ///
  public mutating func append(_ component: FilePath.Component)

  /// Append `components` on to the end of this path.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/"
  ///     path.append(["usr", "local"])     // path is "/usr/local"
  ///     let otherPath: FilePath = "/bin/ls"
  ///     path.append(otherPath.components) // path is "/usr/local/bin/ls"
  ///
  public mutating func append<C: Collection>(_ components: C)
    where C.Element == FilePath.Component

  /// Append the contents of `other`, ignoring any spurious leading separators.
  ///
  /// A leading separator is spurious if `self` is non-empty.
  ///
  /// Example:
  ///
  ///     var path: FilePath = ""
  ///     path.append("/var/www/website") // "/var/www/website"
  ///     path.append("static/assets") // "/var/www/website/static/assets"
  ///     path.append("/main.css") // "/var/www/website/static/assets/main.css"
  ///
  public mutating func append(_ other: String)

  /// Non-mutating version of `append(_:Component)`.
  public func appending(_ other: Component) -> FilePath

  /// Non-mutating version of `append(_:C)`.
  public func appending<C: Collection>(
    _ components: C
  ) -> FilePath where C.Element == FilePath.Component

  /// Non-mutating version of `append(_:String)`.
  public func appending(_ other: String) -> FilePath

  /// If `other` does not have a root, append each component of `other`. If
  /// `other` has a root, replaces `self` with other.
  ///
  /// This operation mimics traversing a directory structure (similar to the
  /// `cd` command), where pushing a relative path will append its components
  /// and pushing an absolute path will first clear `self`'s existing
  /// components.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/tmp"
  ///     path.push("dir/file.txt") // path is "/tmp/dir/file.txt"
  ///     path.push("/bin")         // path is "/bin"
  ///
  public mutating func push(_ other: FilePath)

  /// Non-mutating version of `push()`
  public func pushing(_ other: FilePath) -> FilePath

  /// In-place mutating variant of `removingLastComponent`.
  ///
  /// If `self` only contains a root, does nothing and returns `false`.
  /// Otherwise removes `lastComponent` and returns `true`.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/usr/bin"
  ///     path.removeLastComponent() == true  // path is "/usr"
  ///     path.removeLastComponent() == true  // path is "/"
  ///     path.removeLastComponent() == false // path is "/"
  ///
  @discardableResult
  public mutating func removeLastComponent() -> Bool

  /// Remove the contents of the path, keeping the null terminator.
  public mutating func removeAll(keepingCapacity: Bool = false)

  /// Reserve enough storage space to store `minimumCapacity` platform
  /// characters.
  public mutating func reserveCapacity(_ minimumCapacity: Int)
}
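
The difference between `append(_: String)` and `push` is easy to miss, so here is a small sketch of the two behaviors on Unix-style strings. Both functions are hypothetical stand-ins written for illustration; they assume already-normalized input and are not the proposed implementations.

```swift
// Hypothetical sketch of `append(_: String)`: leading separators in `other`
// are spurious when `path` is non-empty, so they are ignored.
func appending(_ other: String, to path: String) -> String {
  if path.isEmpty { return other }
  let trimmed = String(other.drop(while: { $0 == "/" }))
  if trimmed.isEmpty { return path }
  return path.hasSuffix("/") ? path + trimmed : path + "/" + trimmed
}

// Hypothetical sketch of `push`: a rooted `other` replaces the path, like
// `cd /bin`; a relative `other` is appended.
func pushing(_ other: String, onto path: String) -> String {
  if other.hasPrefix("/") { return other }
  return appending(other, to: path)
}

print(appending("/main.css", to: "/var/www/website/static/assets"))
print(pushing("/bin", onto: "/tmp/dir/file.txt"))
```

The same rooted argument therefore nests under the path with `append` but replaces it with `push`.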

Paths and strings

Just like FilePath, FilePath.Component and FilePath.Root can be decoded/validated into a Swift String.

extension String {
  /// Creates a string by interpreting the path component's content as UTF-8 on
  /// Unix and UTF-16 on Windows.
  ///
  /// - Parameter component: The path component to be interpreted as
  ///   `CTypes.PlatformUnicodeEncoding`.
  ///
  /// If the content of the path component isn't a well-formed Unicode string,
  /// this initializer replaces invalid bytes with U+FFFD.
  /// This means that, depending on the semantics of the specific file system,
  /// conversion to a string and back to a path component
  /// might result in a value that's different from the original path component.
  public init(decoding component: FilePath.Component)

  /// Creates a string from a path component, validating its contents as UTF-8
  /// on Unix and UTF-16 on Windows.
  ///
  /// - Parameter component: The path component to be interpreted as
  ///   `CTypes.PlatformUnicodeEncoding`.
  ///
  /// If the content of the path component isn't a well-formed Unicode string,
  /// this initializer returns `nil`.
  public init?(validating component: FilePath.Component)

  /// On Unix, creates the string `"/"`
  ///
  /// On Windows, creates a string by interpreting the path root's content as
  /// UTF-16.
  ///
  /// - Parameter root: The path root to be interpreted as
  ///   `CTypes.PlatformUnicodeEncoding`.
  ///
  /// If the content of the path root isn't a well-formed Unicode string,
  /// this initializer replaces invalid bytes with U+FFFD.
  /// This means that on Windows,
  /// conversion to a string and back to a path root
  /// might result in a value that's different from the original path root.
  public init(decoding root: FilePath.Root) {
    self.init(_decoding: root)
  }

  /// On Unix, creates the string `"/"`
  ///
  /// On Windows, creates a string from a path root, validating its contents as
  /// UTF-16.
  ///
  /// - Parameter root: The path root to be interpreted as
  ///   `CTypes.PlatformUnicodeEncoding`.
  ///
  /// On Windows, if the content of the path root isn't a well-formed Unicode
  /// string, this initializer returns `nil`.
  public init?(validating root: FilePath.Root) {
    self.init(_validating: root)
  }
}

FilePath, FilePath.Component, and FilePath.Root gain convenience properties for viewing their content as Strings.

extension FilePath {
  /// Creates a string by interpreting the path’s content as UTF-8 on Unix
  /// and UTF-16 on Windows.
  ///
  /// This property is equivalent to calling `String(decoding: path)`
  public var string: String
}

extension FilePath.Component {
  /// Creates a string by interpreting the component’s content as UTF-8 on Unix
  /// and UTF-16 on Windows.
  ///
  /// This property is equivalent to calling `String(decoding: component)`.
  public var string: String
}

extension FilePath.Root {
  /// On Unix, this returns `"/"`.
  ///
  /// On Windows, interprets the root's content as UTF-16.
  ///
  /// This property is equivalent to calling `String(decoding: root)`.
  public var string: String
}

Separators are always normalized

FilePath now normalizes directory separators on construction and maintains this invariant across mutations. In the relative portion of the path, FilePath will strip trailing separators and coalesce repeated separators.

  FilePath("/a/b/") == "/a/b"
  FilePath("a///b") == "a/b"

Windows accepts either forward slashes (/) or backslashes (\) as directory separators, though the platform's preferred separator is backslash. On Windows, FilePath normalizes forward slashes to backslashes on construction. Backslashes after a UNC server/share or DOS device path's volume are treated as part of the root.

  FilePath("C:/foo/bar/") == #"C:\foo\bar"#
  FilePath(#"\\server\share\folder\"#) == #"\\server\share\folder"#
  FilePath(#"\\server\share\"#) == #"\\server\share\"#
  FilePath(#"\\?\volume\"#) == #"\\?\volume\"#
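
For the Unix case, the stripping and coalescing rule described in this section can be sketched in a few lines. This is only an illustration on `String`: `normalizingSeparators` is a hypothetical name, and the Windows root handling described above is deliberately left out.

```swift
// Hypothetical sketch of separator normalization for Unix-style strings.
func normalizingSeparators(_ path: String) -> String {
  // split(separator:) omits empty pieces by default, which both coalesces
  // repeated separators and drops trailing ones.
  let body = path.split(separator: "/").joined(separator: "/")
  // Keep a single leading separator if the path had a root.
  return path.hasPrefix("/") ? "/" + body : body
}

print(normalizingSeparators("/a/b/"))
print(normalizingSeparators("a///b"))
```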

Wide and narrow characters

Unix paths are represented as contiguous CChars in memory and are converted to a String by validating as UTF-8. Windows paths are represented as contiguous UInt16s in memory and are converted to a String by validating as UTF-16. Either platform may have invalid Unicode content, which only affects the conversion to Swift's Unicode-correct String type (i.e. it does not affect the semantics of other FilePath operations).

To avoid polluting the global namespace with more typealiases now and in the future, this proposal introduces CTypes to hold typealiases to (often weakly-typed) types imported from C. CModeT was defined as a global typealias in System 0.0.1 and is now nested inside CTypes as CTypes.Mode, alongside CTypes.Char.

/// A namespace for C and platform types
public enum CTypes {
  #if os(macOS) || os(iOS) || os(watchOS) || os(tvOS)
  /// The C `mode_t` type.
  public typealias Mode = UInt16
  #elseif os(Windows)
  /// The C `mode_t` type.
  public typealias Mode = Int32
  #else
  /// The C `mode_t` type.
  public typealias Mode = UInt32
  #endif

  /// The C `char` type
  public typealias Char = CChar
}

To aid readability and make it easier to write code agnostic to the platform's character-width, introduce typealiases for the platform's preferred character and Unicode encoding.

extension CTypes {
  #if os(Windows)
  /// The platform's preferred character type. On Unix, this is an 8-bit C
  /// `char` (which may be signed or unsigned, depending on platform). On
  /// Windows, this is `UInt16` (a "wide" character).
  public typealias PlatformChar = UInt16
  #else
  /// The platform's preferred character type. On Unix, this is an 8-bit C
  /// `char` (which may be signed or unsigned, depending on platform). On
  /// Windows, this is `UInt16` (a "wide" character).
  public typealias PlatformChar = CTypes.Char
  #endif

  #if os(Windows)
  /// The platform's preferred Unicode encoding. On Unix this is UTF-8 and on
  /// Windows it is UTF-16. Native strings may contain invalid Unicode,
  /// which will be handled by either error-correction or failing, depending
  /// on API.
  public typealias PlatformUnicodeEncoding = UTF16
  #else
  /// The platform's preferred Unicode encoding. On Unix this is UTF-8 and on
  /// Windows it is UTF-16. Native strings may contain invalid Unicode,
  /// which will be handled by either error-correction or failing, depending
  /// on API.
  public typealias PlatformUnicodeEncoding = UTF8
  #endif
}

String, FilePath, FilePath.Component, and FilePath.Root gain "escape hatch" APIs for C interoperability using these typealiases.

(See the gist for init(platformString:), init?(validatingPlatformString:), and withPlatformString on String, FilePath, Component, and Root.)

Rejected or deferred alternatives

Deferred: Introduce RelativePath and AbsolutePath

See the gist.

Dropped: basename and dirname names and setters

See the gist.

Rejected: "Root" only refers to a separator, does not include Windows volumes

See the gist.

Deferred: Ability to work with paths from another platform

See the gist.

Source and ABI stability impact

API changes are strictly additive.

Separator normalization does not affect the semantics of path operations. It can change how paths are printed, compared, and hashed (this proposal argues these changes are for the better).

Deprecations

A handful of APIs have been deprecated in favor of better-named alternatives.

@available(*, deprecated, renamed: "CTypes.Mode")
public typealias CModeT = CTypes.Mode

extension FilePath {
  @available(*, deprecated, renamed: "init(validating:)")
  public init?(validatingUTF8 path: FilePath)

  @available(*, deprecated, renamed: "init(platformString:)")
  public init(cString: UnsafePointer<CChar>)

  @available(*, deprecated, renamed: "withPlatformString(_:)")
  public func withCString<Result>(
    _ body: (UnsafePointer<CChar>) throws -> Result
  ) rethrows -> Result
}
extension String {
  @available(*, deprecated, renamed: "init(validating:)")
  public init?(validatingUTF8 path: FilePath)
}

I think there's still a use case for a lexicallyResolving that doesn't prevent escapes, such as resolving relative references in HTML files (href="../article.html"). Given that you can build the "safe" one on top of the "unsafe" one (normalize, check for leading .., go ahead), not having the "unsafe" API makes it harder to implement that behavior. (Not that it's so hard to implement, I guess.)

Even without that, though, "Any root in subpath will be ignored" seems like a bad idea. If the API doesn't support relative escapes, passing an absolute rooted path should be a programmer error or result in nil.


One thing missing from the proposal, but shown in examples: FilePath.init?(_ path: String). It's worth discussing this because of an example brought up in a previous thread: "foo/bar\baz/" is three components in a DOS path and two components in a POSIX path.


EDIT: but this is really good, it looks like an API that will be precise and usable and possible-to-implement-efficiently.


What would be the difference between that resolve and an append?

I originally had the name lexicallyRebasing, treating this as providing a chroot-like behavior. I noticed that many static sites generate paths starting with /, hence that behavior. @lukasa, what do you think?

Which examples, and what would the semantics of this be?

Yes, and //server/share/file.txt is a root and 3 components on Unix and a root and one component on Windows, C:/foo is a relative path with two components on Unix and an absolute path with one on Windows, etc. Are you asking for an API that will fail or otherwise alert you to differences in platform interpretation?

I did bring up working with platform-specific representations here, but there could be room for something that told you whether a path was identical on both platforms. This wouldn't be an invariant we could maintain across APIs and separator normalization would still fire.


…uh, I guess only that it would accept absolute paths. Maybe that's not a good enough reason to take it.

Sorry, yes, you've discussed that the differences exist, and that they're being subset out, but the proposal doesn't actually seem to include the initializer. The only initializers that take String are on Root and Component.

I've been trying to figure out consistent naming and terminology. It might be that:

  • append denotes concatenation of path components
  • push will do the cd-like directory traversal with paths
  • resolve will do the safely-nested-within kind of append

Ah, I see the confusion. FilePath already has an initializer taking a String: FilePath.swift.

I haven't decided where proposals such as this end up in the repo (perhaps under a proposals folder or grouped by version), but it might be clearer if I retroactively made one for System 0.0.1 at the same time.


Ah, I guess what I want specified is the parsing behavior that will be added to that initializer, even if it’s just “follows the current OS with no way to override today”.

Good idea, but the convention for pluralization is to use it only when there is also a protocol with a singular name.

e.g.
Publishers & Publisher
Subscribers & Subscriber
Subscriptions & Subscription

Pluralizing is a workaround for not being able to nest enums inside of a protocol. Someday it won't be necessary—the plural names are incorrect. Only use the "s" if you're backed into this particular corner.

In general I think the specified behaviour is the right one. For the web, you'll most commonly see either absolute or relative paths. In the case of an absolute path, we're essentially chrooting the path and so it makes sense to strip the leading /. In the case of a relative path, you need a two step process where you use the absolute path as perceived by the client of the current resource, append that path, and then chroot again.

In either case, you'd overwhelmingly want to invoke this method with an absolute path, not a relative one.


I wanted to provide some more rationale and start more discussion on the naming conventions employed.

Naming lexical and non-lexical operations.

The proposed APIs are syntactic only, meaning they don't touch the file system. For resolving .. in this manner, I put the word lexical in the name (similarly to C++17). These operations are fast approximations that might give the wrong answer in the presence of symlinks. This allows me to reserve the name without lexical in it for the correct-but-slow alternative (again, this is what C++17 does).

Thus, we have lexicallyNormalize() and lexicallyResolve(), so that in the future we may have normalize() and resolve(). We haven't designed the file-system APIs yet, but it's likely they will hang off of FilePath as well, though they will all be throwing.

I am totally open to argument/debate. In general, I'd like the simpler name to be the safest/most-correct operation we can do. Additionally, it's easier and more friendly to deprecate longer names in favor of new shorter ones than the other way around, so using "lexically" buys us some breathing room.

Naming append, push, and currently-called resolve

When it comes to manipulating FilePath by adding things on to the end, I can see 3 kinds of desired semantics:

  1. String-like concatenation (+), called append
  2. Directory traversal (cd), called push
  3. Subpaths with guaranteed containment (chroot), called resolve

Python's join and C#'s Combine are the equivalents of our push: each will clear the path if joined with an absolute path. This is the cause of widespread confusion and Stack Overflow posts asking why someone can't join /mysubpath/foo onto the end of a path. And as one commenter pointed out:

so we want to get this right.

1. append

Our definitions of append do not take a FilePath, but rather take Components which cannot be root. Thus, the type system guarantees we are never programmatically appending an absolute path.

  public mutating func append(_ component: FilePath.Component)
  public mutating func append<C: Collection>(_ components: C)
    where C.Element == FilePath.Component

For the common string literal scenario, we do provide an overload of append taking a String. This would be preferred over the string literal conversion of Component ("foo/bar" would be otherwise rejected because it has more than one component in it). Its behavior mimics the common sense expected behavior of stringy path operations, like C#'s (later-added) Join.

  /// Append the contents of `other`, ignoring any spurious leading separators.
  ///
  /// A leading separator is spurious if `self` is non-empty.
  ///
  /// Example:
  ///   var path: FilePath = ""
  ///   path.append("/var/www/website") // "/var/www/website"
  ///   path.append("static/assets") // "/var/www/website/static/assets"
  ///   path.append("/main.css") // "/var/www/website/static/assets/main.css"
  ///
  public mutating func append(_ other: String)
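The doc-comment example above can be traced with a small string-based model. This is a sketch under the assumption of Unix-style separators only; `appending(_:to:)` is a hypothetical free function, not the proposed method, and it ignores Windows roots entirely.

```swift
// Hypothetical string-based model of the proposed `append(_: String)`.
func appending(_ other: String, to path: String) -> String {
    // An empty path takes `other` verbatim, so a leading "/" becomes the root.
    guard !path.isEmpty else { return other }
    // Otherwise leading separators in `other` are spurious and are dropped.
    let trimmed = String(other.drop(while: { $0 == "/" }))
    guard !trimmed.isEmpty else { return path }
    if path.hasSuffix("/") {
        return path + trimmed
    }
    return path + "/" + trimmed
}

var path = ""
path = appending("/var/www/website", to: path)
let step1 = path
path = appending("static/assets", to: path)
let step2 = path
path = appending("/main.css", to: path)
let step3 = path
```

The three steps reproduce the values listed in the doc comment, and the last one shows the leading separator of "/main.css" being treated as spurious rather than as a root.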

2. push

This takes a FilePath and implements cd-like semantics where pushing an absolute path will reassign self to other.

  /// If `other` does not have a root, append each component of `other`. If
  /// `other` has a root, replaces `self` with other.
  ///
  /// This operation mimics traversing a directory structure (similar to the
  /// `cd` command), where pushing a relative path will append its components
  /// and pushing an absolute path will first clear `self`'s existing
  /// components.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/tmp"
  ///     path.push("dir/file.txt") // path is "/tmp/dir/file.txt"
  ///     path.push("/bin")         // path is "/bin"
  ///
  public mutating func push(_ other: FilePath)

This is Rust's push, C++17's append, Python's join, and C#'s Combine. It is the most commonly implemented semantics in other languages/libraries, but we have the advantage of leveraging the type system for our definition of append.

push isn't the perfect name, but I think it's preferable to something like navigate(to:), which might carry connotations of adjusting the current working directory or causing other effects.
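For contrast with append, the cd-like behavior can be sketched the same way. Again this is a Unix-only model on plain strings with a hypothetical `pushing(_:onto:)` function; how relative roots combine on Windows is not covered here.

```swift
// Hypothetical Unix-only model of `push`; Windows roots are out of scope.
func pushing(_ other: String, onto path: String) -> String {
    // An absolute `other` replaces the path wholesale, like `cd /bin`.
    if other.hasPrefix("/") { return other }
    if other.isEmpty { return path }
    if path.isEmpty { return other }
    if path.hasSuffix("/") {
        return path + other
    }
    return path + "/" + other
}

var current = "/tmp"
current = pushing("dir/file.txt", onto: current)
let afterRelative = current
current = pushing("/bin", onto: current)
let afterAbsolute = current
```

The only difference from the append model is the first line of the function: a rooted argument wins instead of having its leading separator dropped.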

3. resolve

This takes a subpath and ensures that the result is fully contained within the directory structure starting at self. Note that in practice, we're only adding resolving, as the non-mutating form makes the most sense (see below), but I'm using the word resolve as the root for terminology.

  /// Create a new `FilePath` by resolving `subpath` relative to `self`,
  /// ensuring that the result is lexically contained within `self`.
  ///
  /// `subpath` will be lexically normalized (see `lexicallyNormalize`) as
  /// part of resolution, meaning any contained `.` and `..` components will
  /// be collapsed without resolving symlinks. Any root in `subpath` will be
  /// ignored.
  ///
  /// Returns `nil` if the result would "escape" from `self` through use of
  /// the special directory component `..`.
  ///
  /// This is useful for protecting against arbitrary path traversal from an
  /// untrusted subpath: the result is guaranteed to be lexically contained
  /// within `self`. Since this operation does not consult the file system to
  /// resolve symlinks, any escaping symlinks nested inside of `self` can still
  /// be targeted by the result.
  ///
  /// Example:
  ///
  ///     let staticContent: FilePath = "/var/www/my-website/static"
  ///     let links: [FilePath] =
  ///       ["index.html", "/assets/main.css", "../../../../etc/passwd"]
  ///     links.map { staticContent.lexicallyResolving($0) }
  ///       // ["/var/www/my-website/static/index.html",
  ///       //  "/var/www/my-website/static/assets/main.css",
  ///       //  nil]
  public func lexicallyResolving(_ subpath: FilePath) -> FilePath?

I originally had the name "rebase", which AFAICT would be me inventing a bespoke term. resolve carries more connotation as to what this is doing, and it sounds heavier/more-involved than append. The order of arguments flows left-to-right similarly to append and push, making it discoverable in code completion as a more secure alternative.

I hesitate to use the word "safe" in the name, as "safe" usually means memory safety in Swift (though a library like System could redefine terminology, I'd prefer to not cross that bridge just for this use case). I'm avoiding "secure" as well as this is not the safest/most secure variant: it would be better to call realpath or something similar on the result to guarantee a non-escape (at least at the time of the operation, some bad actor might add a symlink later).

Since this is done lexically, and the containment is just lexical containment (i.e. the subpath may actually point to a symlink that escapes containment), this is currently named lexicallyResolving(_ subpath: FilePath). Note that the self portion is not lexically normalized in this API (it doesn't have to be and we can avoid correctness errors), but subpath has to be. An alternative could be resolve(lexicalSubpath subpath: FilePath), which might be more discoverable.
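As a rough illustration of the documented behavior (root in subpath ignored, subpath normalized, self left as given, nil on escape), here is a string-based model. It assumes Unix-style separators, and the free function and its `within:` label are hypothetical stand-ins for the proposed method.

```swift
// Hypothetical string-based model of `lexicallyResolving`; Unix-style only.
func lexicallyResolving(_ subpath: String, within base: String) -> String? {
    var stack: [Substring] = []
    // Splitting on "/" drops empty pieces, which also ignores any root in `subpath`.
    for component in subpath.split(separator: "/") {
        switch component {
        case ".":
            continue
        case "..":
            // Popping past the start of `subpath` would escape `base`.
            guard !stack.isEmpty else { return nil }
            stack.removeLast()
        default:
            stack.append(component)
        }
    }
    // `base` is used as given; only `subpath` is normalized.
    guard !stack.isEmpty else { return base }
    let joined = stack.joined(separator: "/")
    if base.hasSuffix("/") {
        return base + joined
    }
    return base + "/" + joined
}

let staticContent = "/var/www/my-website/static"
let links = ["index.html", "/assets/main.css", "../../../../etc/passwd"]
let resolved = links.map { lexicallyResolving($0, within: staticContent) }
```

With these inputs `resolved` matches the three results in the doc comment. Because only the stack built from `subpath` is ever popped, a `..` can never remove a component of `base`, which is why `base` itself does not need normalizing.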

There is no in-place mutating variant. I didn’t think that would be as useful as most of the time self is a constant such as a server root which is applied to many subpaths. Additionally, it would have to return a Bool signaling whether it “worked” and there would have to be some kind of sane well-defined fall-back behavior. I did put the ownership annotations on it so that it would chain efficiently and its storage could be consumed.


This looks great and I think the choice of making join safe via the inputs is even better than throwing :ok_hand:

What happens on Windows for the following case?
The docs fragment seems to me to assume Unix roots.

var path: FilePath = "tmp"
path.push("C:")
path.push("\\bar")

I don't use Windows, so IDK what to expect, but it is a corner case.

@compnerd is currently working out the precise semantics for these operations, but I would expect to see

var path: FilePath = "tmp" // path is "tmp"
path.push("C:")            // path is "C:"
path.push("\\bar")         // path is "C:\bar"

That is, relative roots have some well-defined combining vs replacing behavior.

I'm very excited by this!

My one comment is that maybe we should consider using throwing initializers for initialization from String. It is much simpler to write try? to convert a throwing initializer to a fallible one than it is to make up a custom error to throw in the case of nil. I also suspect it will be helpful in debugging to know why a particular String did not yield a valid FilePath.

First, FilePath's string initializer cannot fail (as in we treat path content as a superset of valid Unicode), deferring things like restricted character sets to the relevant syscall and/or specific file system involved. Path separators are interpreted for normalization and component operations, the . for extensions, etc., on an API-by-API basis. Otherwise, FilePath faithfully transports the bytes to the syscall.

We need a non-failable non-throwing labeled initializer to use for anything that can be created by a string literal, such as FilePath, Component, and Root. Component and Root will currently trap at runtime if their constraints are not met (component has to be a non-empty non-root single component and root has to be a non-empty root). We want to eventually promote these to compile time checks.

There could be an argument that the unlabeled initializers from String for Component and Root be failable or throwing, while the string literals would remain trapping (until promoted to static asserts). I like this idea.

I don't know if throwing a bespoke error would really convey that much more safety and rationale. So far throws nicely conveys an intuition that the OS or file system is consulted (e.g. it could imply that the file/directory exists).


Done.

extension FilePath.Component {
  /// Create a file path component from a string.
  ///
  /// Returns `nil` if `string` is empty, a root, or has more than one component
  /// in it.
  public init?(_ string: String)

  /// Creates a file path component by copying bytes from a null-terminated
  /// platform string.
  ///
  /// Returns `nil` if `platformString` is empty, is a root, or has more than
  /// one component in it.
  ///
  /// - Parameter platformString: A pointer to a null-terminated platform string.
  ///
  public init?(platformString: UnsafePointer<CTypes.PlatformChar>)
}

extension FilePath.Root {
  /// Create a file path root from a string.
  ///
  /// Returns `nil` if `string` is empty or is not a root.
  public init?(_ string: String)

  /// Creates a file path root by copying bytes from a null-terminated platform
  /// string.
  ///
  /// Returns `nil` if `platformString` is empty or is not a root.
  ///
  /// - Parameter platformString: A pointer to a null-terminated platform string.
  ///
  public init?(platformString: UnsafePointer<CTypes.PlatformChar>)
}

The literal initializers will still trap for now, to be promoted to static assert in the future.


I don't know if that is the only possible situation, but I do want to explore a better name. Unlike enums-as-values which have strong rationale for needing singular, this enum-as-namespace is uninhabited. For some definition of submodules, this could instead be a submodule like a Shims submodule, for which I don't see a plural name being an issue.

Something about e.g. CType.Mode feels off, perhaps because I read CType as the type of C and subvocalize the "dot". But, I could imagine it instead reading as "The C type mode".

Just C.Mode would be a more Swifty name for the previous CModeT, but I'm unsure about stealing the name C. I can't think of a good word other than "type", but if there is such a word, that would be better.


This does not seem intuitive to me at all. I don't know of any precedent for throws = system call, and it seems very limiting to set this precedent (i.e. only system calls can propagate error information via throwing).

I admit that the additional information might be limited, but it is certainly non-zero (for instance, FilePath.Component.init(String) can differentiate between an empty string, a root, or having multiple components). The big win of throws would be making it easier to compose these APIs into higher level APIs. One existing case study is String(data:encoding:): I can't tell you how many times I've manually created a DataIsNotUTF8Error or something analogous for use in a higher-level type which had a throwing initializer which happened to decode a String from Data. Having folks who want fallible-initializer syntax use try? seems significantly simpler.
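To illustrate the composition argument, here is a sketch of a throwing validator and the two ways a caller could consume it. `ComponentError` and `validatedComponent(_:)` are hypothetical and model a component as a plain Unix-style string; nothing like them is in the proposal.

```swift
// Hypothetical error type and validator; neither is part of the proposal.
enum ComponentError: Error, Equatable {
    case empty
    case isRoot
    case multipleComponents
}

// Unix-style validation of a single path component.
func validatedComponent(_ string: String) throws -> String {
    if string.isEmpty { throw ComponentError.empty }
    if string.allSatisfy({ $0 == "/" }) { throw ComponentError.isRoot }
    if string.contains(where: { $0 == "/" }) {
        throw ComponentError.multipleComponents
    }
    return string
}

// Callers who only want an optional can discard the reason with `try?`.
let accepted = try? validatedComponent("main.css")
let rejected = try? validatedComponent("static/assets")

// Callers who want diagnostics (e.g. for logging) keep the reason.
var reason: ComponentError? = nil
do {
    _ = try validatedComponent("/")
} catch let error as ComponentError {
    reason = error
} catch {
    reason = nil
}
```

Going from throwing to optional is a single `try?`, whereas going the other way requires inventing an error like `ComponentError` at every call site.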

That said, the fallible initializer for FilePath already exists, so maybe this is a separate, complementary proposal.

Out of curiosity, how were the clients of that higher level initializer able to reason about or otherwise adapt to your custom error? Or were they using try?/! at the top level?

Mostly it was used for reporting. Namely, I would be initializing a type from some unvalidated input and then performing some (possibly throwing) operations on that type. Often, this would happen repeatedly in my program and I desired to have something better than "something failed" for the specific inputs that failed. So in answer to your question it is often effectively try? but with logging.


The leading contender (thanks @lorentey!) is CInterop. This nicely names what it is used for rather than what it is. It also has the nice side effect of punting the "is a namespace pluralized like a module or like a value type" down the road. Updated to that and it looks much nicer.