[API Review] FilePath Syntactic APIs - Version 2

I wanted to provide some more rationale and start more discussion on the naming conventions employed.

Naming lexical and non-lexical operations.

The proposed APIs are syntactic only, meaning they don't touch the file system. For operations that resolve .. in this manner, I put the word lexical in the name (similar to C++17). These operations are fast approximations that might give the wrong answer in the presence of symlinks. This allows me to reserve the name without lexical in it for the correct-but-slow alternative (again, this is what C++17 does).

Thus, we have lexicallyNormalize() and lexicallyResolve(), so that in the future we may have normalize() and resolve(). We haven't designed the file-system APIs yet, but it's likely they will hang off of FilePath as well, though they will all be throwing.
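
To make the symlink caveat concrete, here is a minimal sketch of lexical normalization on plain Unix-style strings. The name toyLexicallyNormalized is hypothetical and this is not the proposed implementation; returning "." for an empty relative result is an assumption borrowed from C++17's lexically_normal.

```swift
// Hypothetical helper, not part of the proposal: collapse `.` and `..`
// purely by looking at the text of the path, never at the file system.
func toyLexicallyNormalized(_ path: String) -> String {
    let isAbsolute = path.hasPrefix("/")
    var stack: [Substring] = []
    for component in path.split(separator: "/") {
        switch component {
        case ".":
            continue
        case "..":
            if let last = stack.last, last != ".." {
                stack.removeLast()          // cancel the previous component
            } else if !isAbsolute {
                stack.append(component)     // keep leading ".." on relative paths
            }
            // On an absolute path, ".." at the root stays at the root.
        default:
            stack.append(component)
        }
    }
    let body = stack.joined(separator: "/")
    if isAbsolute { return "/" + body }
    // Assumption: an empty relative result is spelled ".".
    return body.isEmpty ? "." : body
}

print(toyLexicallyNormalized("/a/link/../b"))
```

If /a/link happens to be a symlink to /x/y, the file system would resolve /a/link/../b to /x/b, while the lexical answer above is /a/b. That gap is what the word lexical in the name is meant to signal.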

I am totally open to argument/debate. In general, I'd like the simpler name to be the safest/most-correct operation we can do. Additionally, it's easier and more friendly to deprecate longer names in favor of new shorter ones than the other way around, so using "lexically" buys us some breathing room.

Naming append, push, and currently-called resolve

When it comes to manipulating FilePath by adding things on to the end, I can see 3 kinds of desired semantics:

  1. String-like concatenation (+), called append
  2. Directory traversal (cd), called push
  3. Subpaths with guaranteed containment (chroot), called resolve

Python's join and C#'s Combine are the equivalent of our push: they will clear the path if joined with an absolute path. This is the cause of widespread confusion and Stack Overflow posts asking why someone can't join /mysubpath/foo onto the end of a path. And as one commenter pointed out:

so we want to get this right.

1. append

Our definitions of append do not take a FilePath, but rather take Components which cannot be root. Thus, the type system guarantees we are never programmatically appending an absolute path.

  public mutating func append(_ component: FilePath.Component)
  public mutating func append<C: Collection>(_ components: C)
    where C.Element == FilePath.Component

For the common string literal scenario, we do provide an overload of append taking a String. This would be preferred over the string literal conversion of Component ("foo/bar" would otherwise be rejected because it has more than one component in it). Its behavior mimics the common-sense behavior people expect from stringy path operations, like C#'s (later-added) Join.

  /// Append the contents of `other`, ignoring any spurious leading separators.
  ///
  /// A leading separator is spurious if `self` is non-empty.
  ///
  /// Example:
  ///   var path: FilePath = ""
  ///   path.append("/var/www/website") // "/var/www/website"
  ///   path.append("static/assets") // "/var/www/website/static/assets"
  ///   path.append("/main.css") // "/var/www/website/static/assets/main.css"
  ///
  public mutating func append(_ other: String)
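
As a rough illustration of the documented behavior, here is a sketch on plain strings rather than FilePath. The function toyAppend is hypothetical and only models the "spurious leading separator" rule described in the doc comment; it makes no claim about how the real overload is implemented.

```swift
// Hypothetical stand-in for the proposed append(_: String), using String
// as the path storage and "/" as the only separator.
func toyAppend(_ base: inout String, _ other: String) {
    if base.isEmpty {
        // With nothing to append to, a leading separator is meaningful:
        // it makes the result absolute.
        base = other
        return
    }
    // `base` is non-empty, so leading separators in `other` are spurious.
    let trimmed = other.drop(while: { $0 == "/" })
    if trimmed.isEmpty { return }
    if !base.hasSuffix("/") { base.append("/") }
    base.append(contentsOf: trimmed)
}

var path = ""
toyAppend(&path, "/var/www/website")
toyAppend(&path, "static/assets")
toyAppend(&path, "/main.css")
print(path)
```

The three calls mirror the example in the doc comment: the first leading separator is kept because the path is empty, and the one on /main.css is dropped.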

2. push

This takes a FilePath and implements cd-like semantics where pushing an absolute path will reassign self to other.

  /// If `other` does not have a root, append each component of `other`. If
  /// `other` has a root, replaces `self` with other.
  ///
  /// This operation mimics traversing a directory structure (similar to the
  /// `cd` command), where pushing a relative path will append its components
  /// and pushing an absolute path will first clear `self`'s existing
  /// components.
  ///
  /// Example:
  ///
  ///     var path: FilePath = "/tmp"
  ///     path.push("dir/file.txt") // path is "/tmp/dir/file.txt"
  ///     path.push("/bin")         // path is "/bin"
  ///
  public mutating func push(_ other: FilePath)

This is Rust's push, C++17's append, Python's join, and C#'s Combine. It is the most commonly implemented semantics in other languages/libraries, but we have the advantage of leveraging the type system for our definition of append.

push isn't the perfect name, but I think it's preferable to something like navigate(to:), which might carry connotations of adjusting the current working directory or causing other effects.
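
For contrast with append, a sketch of the cd-like rule on plain strings. toyPush is a hypothetical name and the body is only a model of the documented semantics, assuming "/" as the separator and a leading "/" as the root.

```swift
// Hypothetical stand-in for the proposed push(_: FilePath).
func toyPush(_ base: inout String, _ other: String) {
    if other.hasPrefix("/") {
        // `other` has a root: it replaces `base` entirely, like `cd /bin`.
        base = other
        return
    }
    if other.isEmpty { return }
    // `other` is relative: add its components onto the end of `base`.
    if !base.isEmpty && !base.hasSuffix("/") { base.append("/") }
    base.append(other)
}

var path = "/tmp"
toyPush(&path, "dir/file.txt")   // path is "/tmp/dir/file.txt"
toyPush(&path, "/bin")           // path is "/bin"
print(path)
```

The only difference from the append sketch is the first branch: a rooted argument discards what was there instead of being glued on.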

3. resolve

This takes a subpath and ensures that the result is fully contained within the directory structure starting at self. Note that in practice, we're only adding resolving, as the non-mutating form makes the most sense (see below), but I'm using the word resolve as the root for terminology.

  /// Create a new `FilePath` by resolving `subpath` relative to `self`,
  /// ensuring that the result is lexically contained within `self`.
  ///
  /// `subpath` will be lexically normalized (see `lexicallyNormalize`) as
  /// part of resolution, meaning any contained `.` and `..` components will
  /// be collapsed without resolving symlinks. Any root in `subpath` will be
  /// ignored.
  ///
  /// Returns `nil` if the result would "escape" from `self` through use of
  /// the special directory component `..`.
  ///
  /// This is useful for protecting against arbitrary path traversal from an
  /// untrusted subpath: the result is guaranteed to be lexically contained
  /// within `self`. Since this operation does not consult the file system to
  /// resolve symlinks, any escaping symlinks nested inside of `self` can still
  /// be targeted by the result.
  ///
  /// Example:
  ///
  ///     let staticContent: FilePath = "/var/www/my-website/static"
  ///     let links: [FilePath] =
  ///       ["index.html", "/assets/main.css", "../../../../etc/passwd"]
  ///     links.map { staticContent.lexicallyResolving($0) }
  ///       // ["/var/www/my-website/static/index.html",
  ///       //  "/var/www/my-website/static/assets/main.css",
  ///       //  nil]
  public func lexicallyResolving(_ subpath: FilePath) -> FilePath?

I originally had the name "rebase", which AFAICT would be me inventing a bespoke term. resolve carries more connotation as to what this is doing, and it sounds heavier/more-involved than append. The order of arguments flows left-to-right similarly to append and push, making it discoverable in code completion as a more secure alternative.

I hesitate to use the word "safe" in the name, as "safe" usually means memory safety in Swift (though a library like System could redefine terminology, I'd prefer to not cross that bridge just for this use case). I'm avoiding "secure" as well, as this is not the safest/most secure variant: it would be better to call realpath or something similar on the result to guarantee a non-escape (at least at the time of the operation; some bad actor might add a symlink later).

Since this is done lexically, and the containment is just lexical containment (i.e. the subpath may actually point to a symlink that escapes containment), this is currently named lexicallyResolving(_ subpath: FilePath). Note that the self portion is not lexically normalized in this API (it doesn't have to be and we can avoid correctness errors), but subpath has to be. An alternative could be resolve(lexicalSubpath subpath: FilePath), which might be more discoverable.
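
The containment rule can be sketched on plain strings as follows. toyLexicallyResolving is a hypothetical name, and the body is a model of the documented behavior under the assumption of "/" separators, not the proposed implementation. Note that base is used verbatim while subpath is normalized component by component.

```swift
// Hypothetical stand-in for lexicallyResolving(_:). Returns nil when the
// normalized subpath would climb above `base` via "..".
func toyLexicallyResolving(_ base: String, _ subpath: String) -> String? {
    var kept: [Substring] = []
    // Splitting drops empty pieces, so any root in `subpath` is ignored.
    for component in subpath.split(separator: "/") {
        switch component {
        case ".":
            continue
        case "..":
            if kept.isEmpty { return nil }   // would escape from `base`
            kept.removeLast()
        default:
            kept.append(component)
        }
    }
    if kept.isEmpty { return base }
    let separator = base.hasSuffix("/") ? "" : "/"
    return base + separator + kept.joined(separator: "/")
}

let staticContent = "/var/www/my-website/static"
let links = ["index.html", "/assets/main.css", "../../../../etc/passwd"]
for link in links {
    print(toyLexicallyResolving(staticContent, link) ?? "nil")
}
```

As with the real API, this only guarantees lexical containment: a symlink inside staticContent that points elsewhere is not detected.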

There is no in-place mutating variant. I didn’t think that would be as useful as most of the time self is a constant such as a server root which is applied to many subpaths. Additionally, it would have to return a Bool signaling whether it “worked” and there would have to be some kind of sane well-defined fall-back behavior. I did put the ownership annotations on it so that it would chain efficiently and its storage could be consumed.
