[API Review] FilePath Syntactic APIs

Hello everyone, I'd like to open up review of new FilePath APIs to a wide audience. The forums impose a character limit, so see the full text with extended rationale and future directions here.

FilePath Syntactic Operations

Introduction

FilePath appeared in System 0.0.1 with a minimal API. This proposal adds API for syntactic operations, which are performed on the structure of the path and thus do not consult the file system or make any system calls. These include inspecting the structure of paths, modifying paths, and accessing individual components.

Additionally, this proposal greatly expands Windows support and enables writing platform-agnostic path manipulation code.

Future Work: Operations that consult the file system, e.g. resolving symlinks.

Pull request

Design

Windows support

Furthering Swift's push for Windows and System's initial Windows support, this proposal updates FilePath to work well on Windows in addition to Unix platforms.

The proposed API is designed to work naturally and intuitively with both Unix-style and Windows paths. Most of the concepts and even terminology are shared across platforms, though there are some minor differences (e.g. this proposal uses the word "absolute" to refer to what is formally called "fully-qualified" on Windows).

Introducing FilePath.Component

FilePath.Component represents a single component of a path. A component can be a root (which must occur at the front of the path), a special directory (. or ..), or a relative path component such as a directory or a file. Components are always non-empty and do not contain a directory separator.

extension FilePath {
  /// Represents an individual component of a file path.
  ///
  /// Components can be one of the special directory components (`.` or `..`), a root
  /// (e.g. `/`), or a file or directory name. Components are never empty and non-root
  /// components never contain the directory separator.
  public struct Component: Hashable {
    /// Whether this component is the root of its path.
    public var isRoot: Bool { get }

    /// Whether this component is the special directory `.`, representing the current directory.
    public var isCurrentDirectory: Bool { get }

    /// Whether this component is the special directory `..`, representing the parent directory.
    public var isParentDirectory: Bool { get }

    /// Whether this component is either special directory `.` or `..`.
    public var isSpecialDirectory: Bool { get }
  }
}

FilePath.Component can conveniently be created from a string literal and can be printed, just like FilePath.

extension FilePath.Component: CustomStringConvertible, CustomDebugStringConvertible, ExpressibleByStringLiteral {
  /// A textual representation of the path component.
  ///
  /// If the content of the path component isn't a well-formed Unicode string,
  /// this replaces invalid bytes with U+FFFD. See `String.init(decoding:)`.
  public var description: String { get }

  /// A textual representation of the path component, suitable for debugging.
  ///
  /// If the content of the path component isn't a well-formed Unicode string,
  /// this replaces invalid bytes with U+FFFD. See `String.init(decoding:)`.
  public var debugDescription: String { get }

  /// Create a file path component from a string literal.
  ///
  /// Precondition: `stringLiteral` is non-empty and has only one component in it.
  public init(stringLiteral: String)

  /// Create a file path component from a string.
  ///
  /// Precondition: `string` is non-empty and has only one component in it.
  public init(_ string: String)
}

FilePath.ComponentView

FilePath.ComponentView is a BidirectionalCollection of the components that comprise a path. The Index type is an opaque wrapper around FilePath's underlying storage index.

extension FilePath {
  /// A bidirectional collection of the components that make up a file path.
  ///
  /// If the path has a root, it will be the first component. All other components will be part
  /// of the relative path.
  public struct ComponentView: BidirectionalCollection { }

  /// View the components that make up this path.
  public var components: ComponentView
}

FilePath can be created from a component or a collection of components. FilePath.init(_:ComponentView.SubSequence) is a more performant overload that can directly access the underlying storage, which already has normalized separators between components.

extension FilePath {
  /// Create a file path from a collection of components.
  public init<C>(_ components: C) where C: Collection, C.Element == Component

  /// Create a file path from a single component.
  public init(_ component: Component)

  /// Create a file path from a slice of another path's components.
  public init(_ components: ComponentView.SubSequence)
}

Future Work: (see gist)

Basic queries

extension FilePath {
  /// Returns whether `other` is a prefix of `self`, only considering
  /// whole path components.
  ///
  /// Example:
  ///
  ///     let path: FilePath = "/usr/bin/ls"
  ///     path.starts(with: "/")              // true
  ///     path.starts(with: "/usr/bin")       // true
  ///     path.starts(with: "/usr/bin/ls")    // true
  ///     path.starts(with: "/usr/bin/ls///") // true
  ///     path.starts(with: "/us")            // false
  public func starts(with other: FilePath) -> Bool

  /// Returns whether `other` is a suffix of `self`, only considering
  /// whole path components.
  ///
  /// Example:
  ///
  ///     let path: FilePath = "/usr/bin/ls"
  ///     path.ends(with: "ls")             // true
  ///     path.ends(with: "bin/ls")         // true
  ///     path.ends(with: "usr/bin/ls")     // true
  ///     path.ends(with: "/usr/bin/ls///") // true
  ///     path.ends(with: "/ls")            // false
  public func ends(with other: FilePath) -> Bool

  /// Whether this path is empty
  public var isEmpty: Bool { get }
}
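To make the whole-component rule concrete, here is a minimal sketch that works on plain strings rather than FilePath. It assumes Unix-style paths only, and `toyComponents`, `toyStarts`, and `toyEnds` are hypothetical helpers invented for illustration; they are not part of the proposal.

```swift
// Hypothetical helper: split a Unix-style path string into components.
// Repeated and trailing "/" are dropped; a leading "/" is modeled as a root component.
func toyComponents(_ path: String) -> [String] {
  var parts = path.split(separator: "/").map { String($0) }
  if path.hasPrefix("/") { parts.insert("/", at: 0) }
  return parts
}

// Prefix test over whole components, mirroring the documented `starts(with:)` examples.
func toyStarts(_ path: String, with prefix: String) -> Bool {
  toyComponents(path).starts(with: toyComponents(prefix))
}

// Suffix test over whole components, mirroring the documented `ends(with:)` examples.
func toyEnds(_ path: String, with suffix: String) -> Bool {
  toyComponents(path).reversed().starts(with: toyComponents(suffix).reversed())
}

print(toyStarts("/usr/bin/ls", with: "/usr/bin/ls///")) // true
print(toyStarts("/usr/bin/ls", with: "/us"))            // false
print(toyEnds("/usr/bin/ls", with: "bin/ls"))           // true
print(toyEnds("/usr/bin/ls", with: "/ls"))              // false
```

Comparing component arrays instead of raw characters is what makes "/us" fail to match "/usr".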

Windows roots are more complex: they can take several different syntactic forms and can carry additional information such as a drive letter or server/share information. Moreover, the presence of a root does not mean that the path is absolute (i.e. "fully-qualified" in Windows-speak).

For example, C:foo refers to foo relative to the current directory on the C drive, and \foo refers to foo at the root of the current drive. Neither of those is absolute, i.e. fully-qualified, even though both have roots.

extension FilePath {
  /// Returns true if this path uniquely identifies the location of
  /// a file without reference to an additional starting location.
  ///
  /// On Unix platforms, absolute paths begin with a `/`. `isAbsolute` is equivalent
  /// to `root != nil`.
  ///
  /// On Windows, absolute paths are fully qualified paths. UNC paths and device paths
  /// are always absolute. Traditional DOS paths are absolute if they begin with a volume or drive
  /// followed by a `:` and a separator.
  ///
  /// NOTE: This does not perform shell expansion or substitute
  /// environment variables; paths beginning with `~` are considered relative.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/usr/local/bin`
  ///   * `/tmp/foo.txt`
  ///   * `/`
  /// * Windows:
  ///   * `C:\Users\`
  ///   * `\\?\UNC\server\share\bar.exe`
  ///   * `\\server\share\bar.exe`
  public var isAbsolute: Bool { get }

  /// Returns true if this path is not absolute (see `isAbsolute`).
  ///
  /// Examples:
  /// * Unix:
  ///   * `~/bar`
  ///   * `tmp/foo.txt`
  /// * Windows:
  ///   * `bar\baz`
  ///   * `C:Users\`
  ///   * `\Users`
  public var isRelative: Bool { get }
}
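The distinction between "has a root" and "is absolute" on Windows can be sketched with a simplified classifier over plain strings. This is only an approximation of the rules described above: it assumes single-letter drives and treats any path beginning with two separators as UNC or device. `toyIsFullyQualified` is a hypothetical name, not proposed API.

```swift
// Hypothetical, simplified check for a fully-qualified Windows path.
func toyIsFullyQualified(_ path: String) -> Bool {
  // Windows accepts "/" as a separator, so fold it into "\" first.
  let chars: [Character] = path.map { $0 == "/" ? "\\" : $0 }

  // UNC paths and device paths both begin with two separators.
  if chars.count >= 2, chars[0] == "\\", chars[1] == "\\" { return true }

  // Traditional DOS path: drive letter, ":", then a separator.
  return chars.count >= 3
    && chars[0].isASCII && chars[0].isLetter
    && chars[1] == ":"
    && chars[2] == "\\"
}

print(toyIsFullyQualified(#"C:\Users\"#))              // true
print(toyIsFullyQualified(#"\\server\share\bar.exe"#)) // true
print(toyIsFullyQualified(#"C:Users\"#))               // false: relative to the current directory on C
print(toyIsFullyQualified(#"\Users"#))                 // false: relative to the current drive
```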

Path decomposition and analysis

Paths can be decomposed into their (optional) root and (potentially empty) relative portion. Or, they can be decomposed into their (optional) final relative component (basename) and the directory of that component (dirname).

extension FilePath {
  /// Returns the root directory of a path if there is one, otherwise `nil`.
  ///
  /// On Unix, this will return the leading `/` if the path is absolute
  /// and `nil` if the path is relative.
  ///
  /// On Windows, for traditional DOS paths, this will return
  /// the path prefix up to and including a root directory or
  /// a supplied drive or volume. Otherwise, if the path is relative to
  /// both the current directory and current drive, returns `nil`.
  ///
  /// On Windows, for UNC or device paths, this will return the path prefix
  /// up to and including the host and share for UNC paths or the volume for
  /// device paths followed by any subsequent separator.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/foo/bar => /`
  ///   * `foo/bar  => nil`
  /// * Windows:
  ///   * `C:\foo\bar                => C:\`
  ///   * `C:foo\bar                 => C:`
  ///   * `\foo\bar                  => \`
  ///   * `foo\bar                   => nil`
  ///   * `\\server\share\file       => \\server\share\`
  ///   * `\\?\UNC\server\share\file => \\?\UNC\server\share\`
  ///   * `\\.\device\folder         => \\.\device\`
  ///
  /// Setting the root to `nil` will remove the root and setting a new
  /// root will replace the root. Passing a non-root to the setter will trap.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/foo/bar"
  ///     path.root = nil // path is "foo/bar"
  ///     path.root = "/" // path is "/foo/bar"
  ///
  /// Example (Windows):
  ///
  ///     var path: FilePath = #"\foo\bar"#
  ///     path.root = nil         // path is #"foo\bar"#
  ///     path.root = "C:"        // path is #"C:foo\bar"#
  ///     path.root = #"C:\"#     // path is #"C:\foo\bar"#
  ///
  public var root: FilePath.Component? { get set }

  /// Gets or sets the relative portion of the path (everything after root).
  ///
  /// Examples:
  /// * Unix:
  ///   * `/foo/bar => foo/bar`
  ///   * `foo/bar  => foo/bar`
  ///   * `/        => ""`
  /// * Windows:
  ///   * `C:\foo\bar                  => foo\bar`
  ///   * `foo\bar                     => foo\bar`
  ///   * `\\?\UNC\server\share\file   => file`
  ///   * `\\?\device\folder\file.exe  => folder\file.exe`
  ///   * `\\server\share\file         => file`
  ///   * `\                           => ""`
  ///
  ///
  /// Setting a relative path replaces everything after the root with `newValue`.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/foo/bar"
  ///     path.relativePath = "tmp/file.txt" // path is "/tmp/file.txt"
  ///     path.relativePath = ""             // path is "/"
  ///
  public var relativePath: FilePath { get set }

  /// Returns the final relative component of the path.
  /// Returns `nil` if the path is empty or only contains a root.
  ///
  /// Note: Even if the final relative component is a special directory
  /// (`.` or `..`), it will still be returned. See `lexicallyNormalize()`.
  ///
  /// Examples:
  /// * Unix:
  ///   * `/usr/local/bin/ => bin`
  ///   * `/tmp/foo.txt    => foo.txt`
  ///   * `/tmp/foo.txt/.. => ..`
  ///   * `/tmp/foo.txt/.  => .`
  ///   * `/               => nil`
  /// * Windows:
  ///   * `C:\Users\                    => Users`
  ///   * `C:Users\                     => Users`
  ///   * `C:\                          => nil`
  ///   * `\Users\                      => Users`
  ///   * `\\?\UNC\server\share\bar.exe => bar.exe`
  ///   * `\\server\share               => nil`
  ///   * `\\?\UNC\server\share\        => nil`
  ///
  /// Setting the basename to `nil` pops off the last relative
  /// component, otherwise it will replace it with `newValue`.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/usr/bin/ls"
  ///     path.basename = "cat" // path is "/usr/bin/cat"
  ///     path.basename = nil   // path is "/usr/bin"
  ///
  public var basename: FilePath.Component? { get set }

  /// Creates a new path with everything up to but not including the `basename`.
  ///
  /// If the path only contains a root, returns `self`.
  /// If the path has no root and only includes a single component,
  /// returns an empty FilePath.
  ///
  /// Examples:
  ///  * `/usr/bin/ls => /usr/bin`
  ///  * `/foo        => /`
  ///  * `/           => /`
  ///  * `foo         => ""`
  ///
  /// Setting the `dirname` replaces everything before `basename` with `newValue`.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/usr/bin/ls"
  ///     path.dirname = "/bin" // path is "/bin/ls"
  ///     path.dirname = ""     // path is "ls"
  ///
  public var dirname: FilePath { get set }
}

extension FilePath.ComponentView {
  /// The root component, if it exists. See `FilePath.root`.
  public var root: FilePath.Component? { get }

  /// The portion of this path after the root.
  public var relativeComponents: SubSequence { get }

  /// The final relative component of the path. Returns `nil` if the path is empty or only
  /// contains a root. See `FilePath.basename`.
  public var basename: FilePath.Component?

  /// The portion of this path with everything up to but not including the `basename`.
  public var dirname: SubSequence
}
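As a rough illustration of the two decompositions (root plus relative portion, and dirname plus basename), here is a small model built on plain strings. It assumes Unix-style paths only, and `ToyPath` is a hypothetical type used purely for this sketch.

```swift
// Hypothetical model: a path is an optional root plus a list of relative components.
struct ToyPath {
  var hasRoot: Bool
  var relative: [String]

  init(_ string: String) {
    hasRoot = string.hasPrefix("/")
    relative = string.split(separator: "/").map { String($0) }
  }

  // The final relative component, or nil if the path is empty or only a root.
  var basename: String? { relative.last }

  // Everything up to but not including the basename.
  var dirname: String {
    let dir = relative.dropLast().joined(separator: "/")
    return (hasRoot ? "/" : "") + dir
  }
}

let ls = ToyPath("/usr/bin/ls")
print(ls.dirname, ls.basename ?? "nil")        // /usr/bin ls
print(ToyPath("/").dirname)                    // /
print(ToyPath("/").basename ?? "nil")          // nil
```

Because the root is kept separate from the relative components, a path that is only a root has no basename and is its own dirname.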

Rationale: (see gist)

Future Work: (see gist)

Setters allow for in-place mutation. FilePath.root's setter allows making a path relative or absolute, and even allows switching root representations on Windows. FilePath.relativePath's setter allows a path to point to a new location from the same root (if there is one). FilePath.basename's setter allows a path to point to a peer entry.

Components may be decomposed into their stem and (optional) extension (.txt, .o, .app, etc.). FilePath gains convenience APIs for dealing with the stem and extension of the basename, if it exists.

extension FilePath.Component {
  /// The extension of this file or directory component.
  ///
  /// If `self` does not contain a `.` anywhere, or only
  /// at the start, returns `nil`. Otherwise, returns everything after the last `.`.
  ///
  /// Examples:
  ///   * `foo.txt    => txt`
  ///   * `foo.tar.gz => gz`
  ///   * `Foo.app    => app`
  ///   * `.hidden    => nil`
  ///   * `..         => nil`
  ///
  public var `extension`: String? { get }

  /// The non-extension portion of this file or directory component.
  ///
  /// Examples:
  ///   * `foo.txt => foo`
  ///   * `Foo.app => Foo`
  ///   * `.hidden => .hidden`
  ///   * `..      => ..`
  ///
  public var stem: String { get }
}

extension FilePath {
  /// The extension of the last file or directory component.
  ///
  /// If `basename` is `nil` or one of the special path components
  /// `.` or `..`, `get` returns `nil` and `set` does nothing.
  ///
  /// If `basename` does not contain a `.` anywhere, or only
  /// at the start, `get` returns `nil` and `set` will append a
  /// `.` and `newValue` to `basename`.
  ///
  /// Otherwise `get` returns everything after the last `.` and `set` will
  /// replace the extension.
  ///
  /// Examples:
  ///   * `/tmp/foo.txt                 => txt`
  ///   * `/Applications/Foo.app/        => app`
  ///   * `/Applications/Foo.app/bar.txt => txt`
  ///   * `/tmp/foo.tar.gz              => gz`
  ///   * `/tmp/.hidden                 => nil`
  ///   * `/tmp/.hidden.                => ""`
  ///   * `/tmp/..                      => nil`
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/tmp/file"
  ///     path.extension = "txt" // path is "/tmp/file.txt"
  ///     path.extension = "o"   // path is "/tmp/file.o"
  ///     path.extension = nil   // path is "/tmp/file"
  ///     path.extension = ""    // path is "/tmp/file."
  ///
  public var `extension`: String? { get set }

  /// The non-extension portion of the last file or directory component.
  ///
  /// Returns `nil` if `basename` is `nil`.
  ///
  ///   * `/tmp/foo.txt                 => foo`
  ///   * `/Applications/Foo.app/        => Foo`
  ///   * `/Applications/Foo.app/bar.txt => bar`
  ///   * `/tmp/.hidden                 => .hidden`
  ///   * `/tmp/..                      => ..`
  ///   * `/                            => nil`
  public var stem: String? { get }
}

FilePath.extension's setter allows for convenient in-place reassigning or adding/removing of an extension.
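The stem/extension rules above (last dot wins, a leading dot does not start an extension, special directories have none) can be sketched on plain strings as follows. `toySplitExtension` is a hypothetical helper written for illustration and is not part of the proposal.

```swift
// Hypothetical helper: split a single path component into stem and extension.
func toySplitExtension(_ component: String) -> (stem: String, ext: String?) {
  // The special directories `.` and `..` have no extension.
  if component == "." || component == ".." { return (component, nil) }

  // Use the last dot; a dot at the very start does not begin an extension.
  guard let dot = component.lastIndex(of: "."), dot != component.startIndex else {
    return (component, nil)
  }
  let stem = String(component[..<dot])
  let ext = String(component[component.index(after: dot)...])
  return (stem, ext)
}

print(toySplitExtension("foo.tar.gz")) // (stem: "foo.tar", ext: Optional("gz"))
print(toySplitExtension(".hidden"))    // (stem: ".hidden", ext: nil)
print(toySplitExtension(".hidden."))   // (stem: ".hidden", ext: Optional(""))
```

Note that a trailing dot yields an empty (but non-nil) extension, which matches the `/tmp/.hidden.` example.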

Rationale: (see gist)

Lexical operations

FilePath supports lexical operations, i.e. operations that do not call into the file system (for example, to follow symlinks), such as normalizing special directory components (. and ..) and forming relative paths.

extension FilePath {
  /// Whether the path is in lexical-normal form, that is `.` and `..` components have
  /// been collapsed lexically (i.e. without following symlinks).
  ///
  /// Examples:
  /// * `"/usr/local/bin".isLexicallyNormal == true`
  /// * `"../local/bin".isLexicallyNormal   == true`
  /// * `"local/bin/..".isLexicallyNormal   == false`
  public var isLexicallyNormal: Bool { get }

  /// Collapse `.` and `..` components lexically (i.e. without following symlinks).
  ///
  /// Examples:
  /// * `/usr/./local/bin/.. => /usr/local`
  /// * `/../usr/local/bin   => /usr/local/bin`
  /// * `../usr/local/../bin => ../usr/bin`
  public mutating func lexicallyNormalize()

  /// Returns a copy of `self` in lexical-normal form, that is `.` and `..` components
  /// have been collapsed lexically (i.e. without following symlinks). See `lexicallyNormalize()`.
  public var lexicallyNormal: FilePath { get }
}
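One way to picture lexical normalization is as a stack of components. The sketch below assumes Unix-style paths and plain strings; `toyLexicallyNormalized` is a hypothetical helper, and the actual implementation may well differ.

```swift
// Hypothetical helper: collapse `.` and `..` without touching the file system.
func toyLexicallyNormalized(_ path: String) -> String {
  let isRooted = path.hasPrefix("/")
  var stack: [String] = []

  for part in path.split(separator: "/") {
    let name = String(part)
    switch name {
    case ".":
      continue                     // `.` never changes the location
    case "..":
      if let last = stack.last, last != ".." {
        stack.removeLast()         // cancel the previous ordinary component
      } else if !isRooted {
        stack.append("..")         // a relative path keeps its leading `..`
      }                            // directly under the root, `..` is dropped
    default:
      stack.append(name)
    }
  }
  return (isRooted ? "/" : "") + stack.joined(separator: "/")
}

print(toyLexicallyNormalized("/usr/./local/bin/..")) // /usr/local
print(toyLexicallyNormalized("/../usr/local/bin"))   // /usr/local/bin
print(toyLexicallyNormalized("../usr/local/../bin")) // ../usr/bin
```

Under this model, a path is lexically normal exactly when normalizing it leaves it unchanged.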

Modifying paths

FilePath supports common mutation operations.

extension FilePath {
    /// If `prefix` is a prefix of `self`, removes it and returns `true`. Otherwise
    /// returns `false`.
    ///
    /// Example:
    ///
    ///     var path: FilePath = "/usr/local/bin"
    ///     path.stripPrefix("/usr/bin")   // false
    ///     path.stripPrefix("/us")        // false
    ///     path.stripPrefix("/usr/local") // true, path is "bin"
    ///
    public mutating func stripPrefix(_ prefix: FilePath) -> Bool

    /// Push each component of `other`. If `other` has a root, replaces `self` with
    /// `other`.
    ///
    /// Example:
    ///
    ///     var path: FilePath = "/tmp"
    ///     path.append("dir/file.txt") // path is "/tmp/dir/file.txt"
    ///     path.append("/bin")         // path is "/bin"
    ///
    public mutating func append(_ other: FilePath)

    /// If `other` is a relative path component, pushes it onto the end of `self`.
    /// If `other` is a root, replaces `self` with `other`.
    ///
    /// Example:
    ///
    ///     var path: FilePath = "/tmp"
    ///     path.pushLast("dir")      // path is "/tmp/dir"
    ///     path.pushLast("file.txt") // path is "/tmp/dir/file.txt"
    ///     path.pushLast("/")        // path is "/"
    ///
    public mutating func pushLast(_ other: FilePath.Component)

    /// Remove the last component of this file path. If the path is
    /// root or empty, does nothing and returns false.
    ///
    /// To see the component that will be popped, use `basename`. `popLast()` is
    /// equivalent to setting `basename` to `nil`.
    ///
    /// Examples:
    /// * `"/".popLast()        == false // path is "/"`
    /// * `"/foo/bar".popLast() == true  // path is "/foo"`
    ///
    public mutating func popLast() -> Bool

    /// Remove the contents of the path, keeping the null terminator.
    public mutating func removeAll(keepingCapacity: Bool = false)

    /// Reserve enough storage space to store `minimumCapacity` `PlatformChar`s.
    public mutating func reserveCapacity(_ minimumCapacity: Int)
}
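The documented `append` behavior, where a rooted argument replaces the receiver, can be modeled on plain strings like this. It assumes Unix-style paths, and `toyAppending` is a hypothetical helper for illustration only.

```swift
// Hypothetical helper modeling the documented `append(_:)` semantics.
func toyAppending(_ base: String, _ other: String) -> String {
  if other.hasPrefix("/") { return other } // a rooted `other` replaces `self`
  if base.isEmpty { return other }
  if other.isEmpty { return base }
  return base.hasSuffix("/") ? base + other : base + "/" + other
}

var path = "/tmp"
path = toyAppending(path, "dir/file.txt")
print(path) // /tmp/dir/file.txt
path = toyAppending(path, "/bin")
print(path) // /bin
```

This mirrors how a shell treats `cd /tmp` followed by `cd /bin`: an absolute argument names a location on its own, regardless of the starting point.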

Rationale: (see gist)

Paths and strings

Just like FilePath, FilePath.Component can be decoded/validated into a Swift String.

extension String {
  /// Creates a string by interpreting the path component's content as UTF-8 on Unix
  /// and UTF-16 on Windows.
  ///
  /// - Parameter component: The path component to be interpreted as `PlatformUnicodeEncoding`.
  ///
  /// If the content of the path component isn't a well-formed Unicode string,
  /// this initializer replaces invalid bytes with U+FFFD.
  /// This means that, depending on the semantics of the specific file system,
  /// conversion to a string and back to a path component
  /// might result in a value that's different from the original path.
  public init(decoding component: FilePath.Component)

  /// Creates a string from a path component, validating its contents as UTF-8 on Unix
  /// and UTF-16 on Windows.
  ///
  /// - Parameter component: The path component to be interpreted as `PlatformUnicodeEncoding`.
  ///
  /// If the content of the path component isn't a well-formed Unicode string,
  /// this initializer returns `nil`.
  public init?(validating component: FilePath.Component)
}
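The difference between the repairing (`decoding:`) and failing (`validating:`) conversions can be shown with the standard library alone. The sketch below uses raw UTF-8 bytes in place of a path component; `toyDecode` and `toyValidate` are hypothetical helpers, and the round-trip check is just one simple way to detect that repair happened.

```swift
// Repairing conversion: invalid sequences are replaced with U+FFFD.
func toyDecode(_ bytes: [UInt8]) -> String {
  String(decoding: bytes, as: UTF8.self)
}

// Validating conversion: returns nil if any repair would have been needed.
func toyValidate(_ bytes: [UInt8]) -> String? {
  let repaired = String(decoding: bytes, as: UTF8.self)
  // Valid UTF-8 round-trips byte for byte; repaired input does not.
  return Array(repaired.utf8) == bytes ? repaired : nil
}

let valid: [UInt8] = [0x66, 0x6F, 0x6F]   // "foo"
let invalid: [UInt8] = [0x66, 0xFF, 0x6F] // 0xFF never appears in valid UTF-8

print(toyDecode(invalid) == "f\u{FFFD}o")       // true
print(toyValidate(invalid) == nil)              // true
print(Array(toyDecode(invalid).utf8) == invalid) // false: the round trip is lossy
```

The last line is the reason the documentation warns that converting to a string and back may produce a different path component.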

FilePath and FilePath.Component gain convenience properties for viewing their content as Strings.

extension FilePath {
  /// Creates a string by interpreting the path's content as UTF-8 on Unix
  /// and UTF-16 on Windows.
  ///
  /// This property is equivalent to calling `String(decoding: path)`.
  public var string: String

  /// Creates an array of strings representing the components of this
  /// path. Interprets the file path's content as UTF-8 on Unix and UTF-16 on Windows.
  ///
  /// If the content of the path isn't a well-formed Unicode string,
  /// this replaces invalid bytes with U+FFFD.
  public var componentStrings: [String]
}

extension FilePath.Component {
  /// Creates a string by interpreting the component's content as UTF-8 on Unix
  /// and UTF-16 on Windows.
  ///
  /// This property is equivalent to calling `String(decoding: component)`.
  public var string: String
}

Rationale: (see gist)

Separators are always normalized

FilePath now normalizes directory separators on construction and maintains this invariant across mutations. In the relative portion of the path, FilePath will strip trailing separators and coalesce repeated separators.

  FilePath("/a/b/") == "/a/b"
  FilePath("a///b") == "a/b"
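For the Unix case, the normalization rule for the relative portion can be sketched on plain strings as follows. `toyNormalizeSeparators` is a hypothetical helper; it deliberately ignores Windows roots and any platform-specific meaning of a leading `//`.

```swift
// Hypothetical helper: coalesce repeated separators and strip trailing ones.
func toyNormalizeSeparators(_ path: String) -> String {
  // `split` omits empty pieces, which removes repeated and trailing "/".
  let body = path.split(separator: "/").joined(separator: "/")
  return path.hasPrefix("/") ? "/" + body : body
}

print(toyNormalizeSeparators("/a/b/")) // /a/b
print(toyNormalizeSeparators("a///b")) // a/b
print(toyNormalizeSeparators("/"))     // /
```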

Rationale: (see gist)

Windows accepts either forward slashes (/) or backslashes (\) as directory separators, though the platform's preferred separator is backslash. On Windows, FilePath normalizes forward slashes to backslashes on construction. Backslashes after a UNC server/share or DOS device path's volume are treated as part of the root.

  FilePath("C:/foo/bar/") == #"C:\foo\bar"#
  FilePath(#"\\server\share\folder\"#) == #"\\server\share\folder"#
  FilePath(#"\\server\share\"#) == #"\\server\share\"#
  FilePath(#"\\?\volume\"#) == #"\\?\volume\"#

Future Work: (see gist)

Wide and narrow characters

Unix paths are represented as contiguous CChars in memory and convert to a String by validating as UTF-8. Windows paths are represented as contiguous UInt16s in memory and are converted to a String by validating as UTF-16. Either platform may have invalid Unicode content, which only affects the conversion to Swift's Unicode-correct String type (i.e. it does not affect the semantics of other FilePath operations).

To aid readability and make it easier to write code agnostic to the platform's character-width, we introduce typealiases for the platform's preferred character and Unicode encoding.

/// The platform's preferred character type. On Unix, this is an 8-bit `CChar` (which
/// may be signed or unsigned, depending on platform). On Windows, this is
/// `UInt16` (a "wide" character).
#if os(Windows)
public typealias PlatformChar = UInt16
#else
public typealias PlatformChar = CChar
#endif

/// The platform's preferred Unicode encoding. On Unix this is UTF-8 and on Windows
/// it is UTF-16. Native strings may contain invalid Unicode,
/// which will be handled by either error-correction or failing, depending on API.
#if os(Windows)
public typealias PlatformUnicodeEncoding = UTF16
#else
public typealias PlatformUnicodeEncoding = UTF8
#endif
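To illustrate why a character-width-agnostic spelling is useful, the following sketch measures the same string as a null-terminated narrow (UTF-8) and wide (UTF-16) string using only the standard library. `toyCodeUnitCount` is a hypothetical helper; it is generic over the code unit type, which is roughly the role `PlatformChar` plays at the type level.

```swift
// Hypothetical helper: count code units before the null terminator.
func toyCodeUnitCount<Unit: FixedWidthInteger>(_ pointer: UnsafePointer<Unit>) -> Int {
  var count = 0
  while pointer[count] != 0 { count += 1 }
  return count
}

let name = "caf\u{E9}" // "café" with a precomposed é

// Narrow: 8-bit UTF-8 code units, as on Unix.
let narrow = name.withCString(encodedAs: UTF8.self) { toyCodeUnitCount($0) }
// Wide: 16-bit UTF-16 code units, as on Windows.
let wide = name.withCString(encodedAs: UTF16.self) { toyCodeUnitCount($0) }

print(narrow) // 5
print(wide)   // 4
```

The same logical name occupies a different number of code units depending on the encoding, so code that assumes one width will not port cleanly to the other platform.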

String, FilePath, and FilePath.Component gain "escape hatch" APIs for C interoperability using these typealiases.

extension String {
  /// Creates a string by interpreting the null-terminated platform string as
  /// UTF-8 on Unix and UTF-16 on Windows.
  ///
  /// - Parameter platformString: The null-terminated platform string to be
  ///  interpreted as `PlatformUnicodeEncoding`.
  ///
  /// If the content of the platform string isn't well-formed Unicode,
  /// this initializer replaces invalid bytes with U+FFFD.
  /// This means that, depending on the semantics of the specific platform,
  /// conversion to a string and back might result in a value that's different
  /// from the original platform string.
  public init(platformString: UnsafePointer<PlatformChar>)

  /// Creates a string by interpreting the null-terminated platform string as
  /// UTF-8 on Unix and UTF-16 on Windows.
  ///
  /// - Parameter platformString: The null-terminated platform string to be
  ///  interpreted as `PlatformUnicodeEncoding`.
  ///
  /// If the content of the platform string isn't well-formed Unicode,
  /// this initializer returns `nil`.
  public init?(validatingPlatformString: UnsafePointer<PlatformChar>)

  /// Calls the given closure with a pointer to the contents of the string,
  /// represented as a null-terminated platform string.
  ///
  /// - Parameter body: A closure with a pointer parameter
  ///   that points to a null-terminated platform string.
  ///   If `body` has a return value,
  ///   that value is also used as the return value for this method.
  /// - Returns: The return value, if any, of the `body` closure parameter.
  ///
  /// The pointer passed as an argument to `body` is valid
  /// only during the execution of this method.
  /// Don't try to store the pointer for later use.
  public func withPlatformString<Result>(
    _ body: (UnsafePointer<PlatformChar>) throws -> Result
  ) rethrows -> Result
}

extension FilePath {
  /// Creates a file path by copying bytes from a null-terminated platform string.
  ///
  /// - Parameter platformString: A pointer to a null-terminated platform string.
  public init(platformString: UnsafePointer<PlatformChar>)

  /// Calls the given closure with a pointer to the contents of the file path,
  /// represented as a null-terminated platform string.
  ///
  /// - Parameter body: A closure with a pointer parameter
  ///   that points to a null-terminated platform string.
  ///   If `body` has a return value,
  ///   that value is also used as the return value for this method.
  /// - Returns: The return value, if any, of the `body` closure parameter.
  ///
  /// The pointer passed as an argument to `body` is valid
  /// only during the execution of this method.
  /// Don't try to store the pointer for later use.
  public func withPlatformString<Result>(
    _ body: (UnsafePointer<PlatformChar>) throws -> Result
  ) rethrows -> Result
}

extension FilePath.Component {
  /// Creates a file path component by copying bytes from a null-terminated platform string.
  ///
  /// - Parameter platformString: A pointer to a null-terminated platform string.
  public init(platformString: UnsafePointer<PlatformChar>)

  /// Calls the given closure with a pointer to the contents of the file path component,
  /// represented as a null-terminated platform string.
  ///
  /// If this is not the last component of a path, an allocation will occur in order to
  /// add the null terminator.
  ///
  /// - Parameter body: A closure with a pointer parameter
  ///   that points to a null-terminated platform string.
  ///   If `body` has a return value,
  ///   that value is also used as the return value for this method.
  /// - Returns: The return value, if any, of the `body` closure parameter.
  ///
  /// The pointer passed as an argument to `body` is valid
  /// only during the execution of this method.
  /// Don't try to store the pointer for later use.
  public func withPlatformString<Result>(
    _ body: (UnsafePointer<PlatformChar>) throws -> Result
  ) rethrows -> Result
}

Future Work: (see gist)

Rejected or deferred alternatives

Deferred: Introduce RelativePath and AbsolutePath

(see gist)

Considering: Alternate names to basename, dirname, and popLast()

(see gist)

Considering: "Root" only refers to a separator, does not include Windows volumes

(see gist)

Source and ABI stability impact

API changes are strictly additive.

Separator normalization does not affect the semantics of path operations. It can change how paths are printed, compared, and hashed (this proposal argues these changes are for the better).

Deprecations

A handful of APIs have been deprecated in favor of better-named alternatives.

extension FilePath {
  @available(*, deprecated, renamed: "init(validating:)")
  public init?(validatingUTF8 path: FilePath)

  @available(*, deprecated, renamed: "init(platformString:)")
  public init(cString: UnsafePointer<CChar>)

  @available(*, deprecated, renamed: "withPlatformString(_:)")
  public func withCString<Result>(
    _ body: (UnsafePointer<CChar>) throws -> Result
  ) rethrows -> Result
}
extension String {
  @available(*, deprecated, renamed: "init(validating:)")
  public init?(validatingUTF8 path: FilePath)
}
38 Likes

Thanks for the write up, the API makes sense to me.

What's your view on a variadic FilePath.init overload? I.e. should it allow FilePath("/usr", "bin") in addition to FilePath(["/usr", "bin"])? I wouldn't mind if it doesn't, but I'm somewhat used to the variadic version in TSC and I wonder if there's a specific reasoning for omitting the variadic version here?

1 Like

Given the many APIs out there that still take URLs, would it be valuable to include a url computed property on a FilePath?
ie:

extension FilePath {
    var url: URL { URL(fileURLWithPath: self.string) }
}

*bike shedding

I've never liked the names basename and dirname for those properties.

basename sounds OK, but I wonder if something like fileName would be clear that this is just the name of the file at the filepath?
I would also prefer directoryName over dirname. Swift often uses full words over abbreviations and this isn't too different from the original dirname to cause any confusion IMO.

I understand there is prior art and I am accustomed to the original names so I'm not too attached to making them different. Just thought I'd bring it up and see where it goes.

6 Likes

These are more or less mutually exclusive. That makes me wonder if we should instead represent this with an enum like:

public extension FilePath.Component {
  enum Kind {
    case normal, rootDirectory, currentDirectory, parentDirectory
  }
  var kind: Kind { get }
}

(You could even make an argument for having FilePath.Component itself be an enum with an associated value, but I'm guessing this wouldn't work out as well as one would like.)

Are these top-level types, or are they nested inside FilePath? I'm worried that as top-level types, these names may not give enough context to understand what they're for.

Correct me if I'm wrong, but even on Windows you can use POSIX filesystem APIs that take UnsafePointer<CChar>, right? If so, I'm not sure these deprecations are appropriate.

8 Likes

Given how common some multi-part extensions like .tar.gz are in the real world, what are your thoughts on including ways to get/set all of a file's extensions at once?

In reality .tgz and .tar.gz are the same, but with this API you couldn't tell that a .tar.gz existed without inspecting the base name directly, or both the extension and the stem.

2 Likes

Just found the "Considering: Alternate names to basename, dirname, and popLast()" section in the proposal gist.

Given the type is already FilePath, I don't think that fileName would be too confusing and since both Rust and C++17 are headed in this direction I think this may be something people just get used to over time.

I do really like the name parent as a replacement for dirname. There is prior art in other areas already like Python's pathlib and I don't think people would be confused by that spelling.

If we one day have typed file paths (ie: File, Directory, Socket, Character, Block, etc) then the confusion around a fileName property should also decrease since all those types' paths will include a file name.

3 Likes

Overall a huge +1 to this proposal. Having built a FilePath library myself (because I hate the un-swifty Foundation.FileManager APIs), I am really looking forward to first-class path support in Swift! The syntactic APIs are a great start before diving into platform path operations, and I don't really have any problems with the designs, just a few comments which I stated above.

3 Likes

I found append's behaviour here surprising:

I would have expected this to result in /tmp/bin. Replacing the whole path seems a bit at odds for a method called append or is there some particular reason it should behave this way?

6 Likes

I am an iOS/macOS/watchOS developer, so the first question that comes to my mind is: why use FilePath instead of URL? As far as I know, Apple's documentation recommends URL as the replacement for file paths, which are String-typed.

Someday URL will also be in System, right? And URL is a superset of FilePath. So...

1 Like

One concern might be that since FilePath and FilePath.Component are ExpressibleByStringLiteral, there might be some ambiguity between overloads taking either. Components have the additional (currently runtime) precondition that any literal is just one component. I don't know how a variadic init over FilePath.Components... would interact with an init over FilePath when given a single string literal. Perhaps @xedin or @hborla can help with that.

Otherwise I think it's a neat idea.

1 Like

The type checker will prefer a non-variadic overload over a variadic one, so I don't think there would be an ambiguity even if the argument labels are the same:

struct S {
  init(arg: String) {}
  init(arg: String...) {}
}

let s = S(arg: "hello!") // calls S.init(arg: String)
5 Likes

Since the root can only exist at the start of a path, I'm not convinced it should be a Component. Having a FilePath be a (Root?, [Component]) makes the "one root at the start" rule part of the type system. The downside, of course, is that it makes the path representation a little more complicated, necessitating more APIs (or more complicated APIs).

This might turn out to be a bad idea, but at the very least it'd be good to put in Alternatives Considered.
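A rough sketch of the (Root?, [Component]) shape, to show how the "one root at the start" rule falls out of the types. All names here (SplitRoot, SplitComponent, SplitPath) are hypothetical stand-ins, and the rendering assumes Unix-style separators only.

```swift
// Hypothetical types for illustration only; none of these names come from the proposal.
struct SplitRoot: Hashable { var text: String }        // e.g. "/" on Unix
struct SplitComponent: Hashable { var text: String }   // never a root, never contains "/"

struct SplitPath: Hashable {
    var root: SplitRoot?                // at most one root, and only at the front
    var components: [SplitComponent]    // can be appended, removed, or reordered freely

    /// Unix-only rendering: the root followed by the components joined with "/".
    var description: String {
        (root?.text ?? "") + components.map(\.text).joined(separator: "/")
    }
}

let ls = SplitPath(root: SplitRoot(text: "/"),
                   components: [SplitComponent(text: "bin"), SplitComponent(text: "ls")])
// ls.description is "/bin/ls"
```

Because a root can never appear in the components array, generic mutations of that array cannot produce a path with a misplaced root. The cost is the extra type and the extra API surface mentioned above.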

2 Likes

Swift System is a library to enable systems level programming and it exists at a lower layer than Foundation, so it cannot refer to types from Foundation.

In Apple's SDK, there is an entire SDK layer that exists below Cocoa whose APIs are expressed in C, e.g. POSIX, XPC, and anything in libSystem.dylib. If you import Darwin, you will get a bunch of weakly typed C APIs where paths are expressed as UnsafePointer<CChar>. Swift System is aiming to provide a better story for working at this layer of the OS which traditionally has been C only -- i.e. it doesn't link against the Objective-C runtime, much less Foundation. Some of System's users and potential users can't even link against Foundation.

Yes, albeit not in System directly. In Apple's SDK, there is a cross-import overlay, where this functionality is defined if you have imported both System (i.e. the binary version of System in the SDK) and Foundation together.

extension URL {
  /// Creates a URL from a file path by copying and validating its UTF-8 contents.
  ///
  /// This initializer does not try to repair ill-formed UTF-8 code unit
  /// sequences. If any are found, the result of the initializer is `nil`.
  ///
  public init?(_ path: FilePath)

  /// Creates a URL from a file path by copying and validating its UTF-8 contents.
  ///
  /// This initializer does not try to repair ill-formed UTF-8 code unit
  /// sequences. If any are found, the result of the initializer is `nil`.
  ///
  public init?(_ path: FilePath, isDirectory: Bool)
}

extension FilePath {
  /// Creates a file path from a URL
  ///
  /// The result is nil if `url` doesn't refer to a local file.
  public init?(_ url: URL)
}

We want to include these in System's open source repo, but we'll need to figure out the tooling for that. There is an issue tracking it.

10 Likes

That's definitely something that would be interesting, but I'm not sure exactly how we'd want that to look. The nice thing about extension is that non-pathological extensions fit in String's small-form, meaning there is no allocation for any extension <= 15 UTF-8 code units.

For an extensions property, if we made it an Array<String> it would always trigger an allocation. This is unfortunate because in practice even compound extensions like .tar.gz would fit in a String without allocation. We might want to consider a BidirectionalCollection view of the extensions themselves (where Element == String). Adding a custom type just for the extensions does carry some weight in binary size and API size, but that could be ameliorated if/when Swift adds support for opaque types with bound associated types. I'm not sure if that would exclude us from having a setter, though.

The POSIX variants do not support Unicode, limiting you to ASCII files only. If you have any Unicode-enabled files, you would not be able to access them. As such, we really should be using Unicode throughout on Windows (and already do so in all the core libraries).

2 Likes

Unfortunately, modeling things as enums doesn't work out as well as one would like :-). On Windows, roots are much more complex than a simple separator, so they would carry a payload, meaning that non-special-directory components are effectively slices of the path. Future perf work includes coming up with a small-component representation, which is also better done if we use a struct now.

We could go with the Kind approach. The reason I shied away from it is I tend to find the boolean properties make better code at the use sites than comparing enum cases and they are more consistent with the compound properties like isSpecialDirectory. Enums also carry with them the extensibility problem, where they are either open and we lose total switching, or they cannot be extended in the same way that we can add new computed properties. I originally also had static properties (which would enable switching over them like an open enum), but in practice I found myself always using the string literals "." and ".." instead. The only exception would be where type context was just ambiguous enough that literals wouldn't work, but not too ambiguous so that the type name could be omitted à la .parentDirectory.

We do miss out on expressing the mutual exclusion and total switching though. I did end up preferring using the if statements inside the implementation of lexicallyNormalize(), but FWIW that's using a pretty low-level algorithm (and I definitely don't want to retain the path as I'm trying to mutate it).
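For readers comparing the two spellings, here is a small sketch of both on a toy component type. KindSketchComponent and its Kind cases are hypothetical. Only isSpecialDirectory is a name taken from the discussion, and roots are left out to keep the example short.

```swift
// Hypothetical stand-in for FilePath.Component; roots are omitted for brevity.
struct KindSketchComponent {
    var text: String

    enum Kind { case currentDirectory, parentDirectory, regular }

    // Enum spelling: cases are mutually exclusive and can be switched over exhaustively.
    var kind: Kind {
        switch text {
        case ".":  return .currentDirectory
        case "..": return .parentDirectory
        default:   return .regular
        }
    }

    // Boolean spelling: reads well at use sites, and more properties can be added later.
    var isCurrentDirectory: Bool { text == "." }
    var isParentDirectory: Bool { text == ".." }
    var isSpecialDirectory: Bool { isCurrentDirectory || isParentDirectory }
}

let dotDot = KindSketchComponent(text: "..")
// dotDot.kind is .parentDirectory, and dotDot.isSpecialDirectory is true
```

The enum version gives the compiler-checked exhaustiveness that the boolean version lacks, while the boolean version composes more naturally into compound properties.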

edit: Whoops, I missed this part:

These are top level, as in they exist alongside CChar. We could consider having an enum Platform {} namespace so that they would be Platform.Char and Platform.UnicodeEncoding, which might be nice if we add more of these. A downside would be "taking" the name Platform from anything downstream of System (including future versions of System). In a different PR for a different feature, I did consider having an enum CTypes {} instead of top-level CModeT, which I liked, but that's less likely to be used downstream.

1 Like

I was thinking of the "file" in "FilePath" as being more analogous to the "file" in "file system", but I do see your point.

C++17 and Rust's fileName do have different semantics from Unix's basename. It is nice to be able to decompose a path into something like (parent, lastRelativeComponent?), which would have to model (dirname, basename?) semantics in the case of a trailing . or ...

Lacking a better name for very specific semantics, I used the existing Unix names (standardized in POSIX, XPG, etc.) in lieu of baseName and directoryName. I didn't want to put much distance between the API and the Unix concepts, and a name like directoryName could add to confusion, since the dirname of /usr/bin/ls/. is a file. I do wonder if there is a use for both deconstruction semantics and Rust's API, though.

I'm very interested in hashing this detail out more. It also intersects a bit with the naming of popLast(), since it's only the (currently called) basename that is popped. It also intersects with another interesting area:

I had a very early prototype that did this. Originally I tried to make the ComponentView RangeReplaceable, but that is a very bad idea for a heterogeneous collection where a root can only occur at the beginning. It was a hazard to use generically (and who doesn't like to call shuffle on their path components). To correct this I did actually make a Root, RelativeComponent, and RelativeComponentView where "relative" is in the name because it would be odd if "/bin/ls".components didn't have / in it.

In the end I found I really preferred calling sensible top-level mutating methods on FilePath vs the generic RangeReplaceable methods on a view, and it's nice that a simple walk over the components of a path was sufficient to reconstitute the path.

However, the approach of separating them out does have a lot of benefits, including a type system solution to pushLast()'s special-case semantics for when a root is pushed. It could also make the leading alternative name for basename (lastRelativeComponent) less unwieldy if we can drop "relative". Then, popLast() can be popLastComponent() (or removeLastComponent(), etc).

append() would still need the special semantics for when you're appending an absolute path. RelativePath would help with that, but does carry some downsides without struct subtyping and adding a FilePathProtocol:

This is the "normal" behavior of this operation across the languages I looked at. I use C++17 as an example of a standardized cross-platform library for paths, Rust as an example of a fresh take on systems programming for Unix and Windows, and C# to help guide intuition for Windows specifically. C++17's append, Rust's push, and C#'s combine implement this, though C# did provide an alternative that does not. For scripting, Python's os.path.join has these same semantics. I'm not familiar with all of the rationale, but I can guess at some of it.

When using string literals, it does seem odd and is a source of confusion in those languages. But for programmatic use cases, it would be very surprising if appending another strongly typed path just ignored a root. Additionally, the paths .. and /.. are two very different paths and should probably not be appended the same. This also models the use case of treating a path as a stack of subpaths that we are pushing on to in a similar way that cd does (cd /usr/bin; cd /bin results in being in /bin).
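The rule described here can be sketched on plain strings. This is a Unix-only illustration with a hypothetical free function named appending(_:to:). It is not FilePath's implementation and ignores Windows roots entirely.

```swift
/// Hypothetical Unix-only sketch of "an absolute argument replaces the base".
/// Paths are plain strings here; this is not how FilePath is implemented.
func appending(_ other: String, to base: String) -> String {
    if other.hasPrefix("/") { return other }   // rooted: the base is dropped, like `cd /bin`
    if base.isEmpty { return other }           // nothing to join onto
    return base.hasSuffix("/") ? base + other : base + "/" + other
}

let replaced = appending("/bin", to: "/tmp")   // "/bin"
let extended = appending("bin", to: "/tmp")    // "/tmp/bin"
```

This matches the cd analogy: a relative argument extends the current location, and a rooted one moves there outright.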

Another potential (Windows-specific) rationale would be that appending a rooted relative path can assume the specified drive of the base. If we do keep the existing semantics, I should revisit the precise Windows behavior (@compnerd, thoughts?).

We need to define some behavior in this situation, and no choice is a clear-cut winner for me. We could

  1. Trap if the argument is absolute
  2. Silently ignore the root (surprising in programmatic use)
  3. Silently drop the base (surprising in string literal use)
  4. Return a bool describing what we "silently" did (either 2 or 3 above) for those studious users that would check it.

Or, if we go with @jrose's suggestion and that causes the type system to solve the pushLast() special behavior for us, we could consider dropping this in exchange for something like appendComponents<C: Collection>(_: C) where C.Element == Component.

Thanks! Specifically, we have:

extension FilePath {
  init(_: String) { ... }

  struct Component: ExpressibleByStringLiteral {...}

  // Proposed:
  init(_: Component...) { ... }
}

And in the future we might add:

struct SystemString: ExpressibleByStringLiteral { ... }
extension FilePath {
  init(_: SystemString)
}

Since string literals always contain validly encoded Unicode, it's ok to prefer the String init to the SystemString init, but we definitely do not want to prefer the variadic one in any case (especially since components cannot have directory separators in them).

So it sounds like this would work. I do wonder if it might be clearer with an argument label that the variadics would be joined with a directory separator: FilePath(components: "/", "bin", "ls") == "/bin/ls" or FilePath(joining: "/", "bin", "ls") == "/bin/ls".
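As a sketch of how the labeled variadic form could read, here is a toy version using stand-in types. JoinComponent and JoinPath are hypothetical. Only the joining: label and the expected result "/bin/ls" come from the post above, and the joining logic assumes Unix separators.

```swift
// Hypothetical toy types; only the `joining:` label is taken from the discussion.
struct JoinComponent: ExpressibleByStringLiteral {
    var text: String
    init(stringLiteral value: String) { text = value }
}

struct JoinPath: ExpressibleByStringLiteral, Equatable {
    var text: String
    init(stringLiteral value: String) { text = value }

    /// Joins the components with "/" (Unix only), without doubling the
    /// separator when the previous piece already ends in one (such as a root).
    init(joining components: JoinComponent...) {
        var result = ""
        for component in components {
            if !result.isEmpty && !result.hasSuffix("/") { result += "/" }
            result += component.text
        }
        text = result
    }
}

let joined = JoinPath(joining: "/", "bin", "ls")
// joined == "/bin/ls"
```

The label also sidesteps the overload question for a single literal, since JoinPath("x") and JoinPath(joining: "x") are spelled differently at the call site.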

1 Like

I had no idea os.path.join worked this way and now I'm terrified how many bugs I've left behind over the years :flushed:

It makes sense when you think of /bin being a path with a root and not a component that you add and which comes with a separator for padding.

I just wonder how obvious that is going to be for users of the API. Maybe throwing if you append a rooted path would be safer?
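For comparison, the throwing alternative might look roughly like this on plain strings. RootedPathError and strictAppending(_:to:) are hypothetical names for a Unix-only sketch, not proposed API.

```swift
// Hypothetical error type and function sketching "throw when the argument is rooted".
struct RootedPathError: Error {
    var path: String
}

func strictAppending(_ other: String, to base: String) throws -> String {
    // Refuse rooted arguments instead of silently replacing or ignoring anything.
    guard !other.hasPrefix("/") else { throw RootedPathError(path: other) }
    if base.isEmpty { return other }
    return base.hasSuffix("/") ? base + other : base + "/" + other
}

let ok = try? strictAppending("bin", to: "/tmp")        // Optional("/tmp/bin")
let rejected = try? strictAppending("/bin", to: "/tmp") // nil, because "/bin" is rooted
```

The trade-off is that every call site has to handle the error, even when the argument is known to be relative.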

Edit: If it behaves as it does, mirroring Python's name join might be better than append.

2 Likes