I spent an unreasonable amount of time (and tokens) grappling with what FilePath should do with respect to trailing slashes as well as lexical normalization. I thought I'd share some findings so far.
Here's a broad language survey that covers C++17, Rust, and Python's pathlib in great detail, as well as Go, Java NIO, Node.js, Zig, and Haskell.
Here are the synthesized tables providing an overview by area.
Based on this research, I'm proposing some changes from the current swift-system behavior. This is for POSIX paths; Windows will be addressed separately.
Trailing separators
I think that FilePath should preserve a trailing slash if given a trailing slash. Trailing slashes carry meaning in rsync, .gitignore, shell completion, and other application-level conventions. As for stat, a FilePath instance shouldn't destroy information that downstream consumers need without explicit API calls.
Python's pathlib strips trailing separators on construction, destroying the information, and this is largely seen as a mistake that they've been unable to walk back. Swift-system currently strips trailing separators the same way. This is the last opportunity to change that before standard library ABI commitments make it more permanent.
Rust stores verbatim bytes, so the trailing separator was never lost. The nightly has_trailing_sep / with_trailing_sep / trim_trailing_sep APIs add ergonomic accessors for information that was always there but previously required raw byte inspection. Haskell has had equivalent first-class APIs (hasTrailingPathSeparator, addTrailingPathSeparator, dropTrailingPathSeparator) from the start. I think we should provide the same three APIs.
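To make the three accessors concrete, here is a minimal sketch on a toy type. `SketchPath`, its `raw` storage, and the method names `withTrailingSeparator()` and `trimmingTrailingSeparator()` are all hypothetical stand-ins for illustration, not existing or proposed swift-system API, and the sketch only handles POSIX-style `/`.

```swift
// Hypothetical toy type for illustration only; this is not swift-system's FilePath.
struct SketchPath {
    var raw: String

    /// True when the path ends in "/" and is not just the root "/".
    var hasTrailingSeparator: Bool {
        raw.count > 1 && raw.hasSuffix("/")
    }

    /// Returns a copy ending in a separator. The empty path is left alone,
    /// since appending "/" to it would turn it into the root.
    func withTrailingSeparator() -> SketchPath {
        if raw.isEmpty || raw.hasSuffix("/") { return self }
        return SketchPath(raw: raw + "/")
    }

    /// Returns a copy with trailing separators removed, keeping the root "/" intact.
    func trimmingTrailingSeparator() -> SketchPath {
        var copy = raw
        while copy.count > 1 && copy.hasSuffix("/") { copy.removeLast() }
        return SketchPath(raw: copy)
    }
}
```

The point is that all three are cheap, explicit operations over information the path already stores, so nothing is lost unless the caller asks for it.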
We would still normalize repeated separators on construction (a///b to a/b), since that's purely an encoding-level concern with no semantic content. FilePath is a COW type, meaning it will be copying the bytes over, so it might as well do semantics-preserving normalization that preserves path algebra and speeds up component iteration, equality, and hashing.
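As a rough sketch of that construction-time step, the function below collapses runs of `/` in a single pass while leaving a single trailing separator in place. The name `collapsingRepeatedSeparators` is hypothetical, and the sketch deliberately ignores the POSIX special case where exactly two leading slashes may be implementation-defined.

```swift
// Hypothetical helper for illustration; works on UTF-8 bytes, POSIX "/" only.
func collapsingRepeatedSeparators(_ path: String) -> String {
    let separator = UInt8(ascii: "/")
    var bytes: [UInt8] = []
    bytes.reserveCapacity(path.utf8.count)
    for byte in path.utf8 {
        // Skip a separator that directly follows another separator.
        if byte == separator && bytes.last == separator { continue }
        bytes.append(byte)
    }
    return String(decoding: bytes, as: UTF8.self)
}

let collapsed = collapsingRepeatedSeparators("a///b//")
// collapsed is "a/b/": interior runs collapse, one trailing separator survives.
```

Because the bytes are being copied anyway, this costs one comparison per byte and no extra pass.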
That being said, this raises a question about substitutability: what should Equatable and Hashable mean?
Equality
I weakly believe FilePath("foo/bar") and FilePath("foo/bar/") should compare equal and hash identically, even though their stored bytes differ (one ends in a /). This means that insertion and retrieval in a Dictionary can change the presence of a trailing slash, and that will be confusing, but less confusing than the alternative, where the two spellings are distinct keys.
Code that needs to distinguish the two forms can use an explicit hasTrailingSeparator API. Similarly, code doing niche symlink handling can explicitly add or remove the trailing separator as needed.
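Here is a minimal sketch of what that equality could look like, assuming repeated separators were already collapsed at construction. `SketchPath` and `comparisonKey` are hypothetical names for illustration, not swift-system API.

```swift
// Hypothetical toy type for illustration only; this is not swift-system's FilePath.
struct SketchPath: Hashable {
    var raw: String

    var hasTrailingSeparator: Bool { raw.count > 1 && raw.hasSuffix("/") }

    /// The form used for comparison: `raw` minus one trailing "/" (the root stays "/").
    private var comparisonKey: Substring {
        hasTrailingSeparator ? raw.dropLast() : raw[...]
    }

    static func == (lhs: SketchPath, rhs: SketchPath) -> Bool {
        lhs.comparisonKey == rhs.comparisonKey
    }

    func hash(into hasher: inout Hasher) {
        // Hash exactly what == compares, so equal values hash identically.
        hasher.combine(comparisonKey)
    }
}

var sizes: [SketchPath: Int] = [:]
sizes[SketchPath(raw: "foo/bar/")] = 1
sizes[SketchPath(raw: "foo/bar")] = 2   // same key as above, so this updates the entry
```

After these two assignments the dictionary holds one entry, reachable through either spelling. The key you get back when iterating may not have the same trailing separator as the one you last looked up with, which is the confusing part mentioned above; `hasTrailingSeparator` is how a caller would tell the spellings apart.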
This is, I believe, the right default for a currency type: the common case (these name the same thing) should be easy, and the uncommon case (I care about the trailing separator specifically) should be possible. But this is obviously a place where there's no perfect solution; we're just trying to find the right solution for Swift.
Normalization
I've also come to believe that lexicallyNormalized() should not resolve .. by default, at least for POSIX-style paths.
Languages are split on this. Rust's component iterator and Python's pathlib both preserve .. components. Go's filepath.Clean and C++17's lexically_normal() resolve them lexically. I think preserving .. is the right default for Swift: lexical .. resolution is only correct in the absence of symlinks, and a method that silently gives the wrong answer in their presence is an attractive nuisance.
I think we should still offer .. resolution, but as an opt-in rather than the default sense of "normal". Similarly, normalization could preserve the trailing separator by default (consistent with construction). For example, perhaps we instead have (but with a better name):
func normalize(dropTrailingSeparators: Bool = false, lexicallyResolveDotDot: Bool = false)
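To illustrate how the two flags might interact, here is a sketch written as a free function over `String` rather than a FilePath method. The extra `path` parameter, the choice to always drop `.` components, and the `"."` result for an otherwise empty relative path are assumptions made for this example, not settled behavior.

```swift
// Hypothetical sketch for illustration; POSIX "/" only, purely lexical (no file system access).
func normalize(
    _ path: String,
    dropTrailingSeparators: Bool = false,
    lexicallyResolveDotDot: Bool = false
) -> String {
    let isAbsolute = path.hasPrefix("/")
    let hadTrailingSeparator = path.count > 1 && path.hasSuffix("/")
    var components: [Substring] = []
    for component in path.split(separator: "/") {   // split drops empty components
        if component == "." { continue }
        if lexicallyResolveDotDot && component == ".." {
            if let last = components.last, last != ".." {
                components.removeLast()             // "a/.." cancels out
                continue
            }
            if isAbsolute { continue }              // "/.." stays at the root
        }
        components.append(component)
    }
    var result = (isAbsolute ? "/" : "") + components.joined(separator: "/")
    if result.isEmpty { result = "." }
    if hadTrailingSeparator && !dropTrailingSeparators && !result.hasSuffix("/") {
        result += "/"
    }
    return result
}

let preserved = normalize("a//b/../c/")
let resolved = normalize("a//b/../c/", dropTrailingSeparators: true, lexicallyResolveDotDot: true)
// preserved is "a/b/../c/" and resolved is "a/c".
```

With both defaults, the only changes are the ones with no semantic content; the symlink-sensitive step happens only when the caller opts in.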
Note: as I said earlier, I believe we must ship a full resolution API alongside any lexical normalizing API that handles .., and give the full resolution function the better name.
Question: Drop interior or trailing . on construction
A related question is whether construction should also drop interior . components. Currently swift-system preserves them: FilePath("foo/./bar") stores the . as a component. Rust's component iterator (upon which equality and hashing are based) silently skips interior . (but not leading .), and Python removes them on construction.
There's an argument for stripping them eagerly: . components have no semantic content in a path (unlike .., which does), and removing them at construction simplifies components, iteration, and comparison without losing information. If we're copying the bytes over anyways, and also normalizing repeated separators (a//b -> a/b), then this is a good time to drop the dots.
If we do strip . on construction, note that FilePath.ComponentView's RRC append operations could reintroduce them as part of its path algebra. We still want an explicit normalization API, and one could argue that normalizing the . is now done defensively if you don't know where a path came from. If we don't strip ., we have a more consistent normalization story for ., but we're encouraging developers to call that normalization function in more places in code. At this point, we'd probably establish a term like "canonical" instead of "normal" for on-construction operations.
My very weak current opinion is that we keep interior . on construction (like C++17 and Rust) but treat it as an actual component for iteration, equality, and hashing (like C++17 and existing swift-system, but unlike Rust). This preserves the path algebra: RRC operations don't need to worry about silently reintroducing something that construction would have removed. Explicit normalization is then the tool for cleaning up . components, whether they came from construction or from mutations.
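The difference between the two iteration models can be sketched with plain strings. Both function names are hypothetical, the second is only a rough approximation of the Rust-like behavior described above, and neither models the root as a component.

```swift
// Hypothetical helpers for illustration; POSIX "/" only, root is not modeled as a component.

/// Treats "." as an ordinary component, as proposed above.
func componentsKeepingDot(_ path: String) -> [String] {
    path.split(separator: "/").map { String($0) }
}

/// Rust-like approximation: skips "." except as the first component of a relative path.
func componentsSkippingInteriorDot(_ path: String) -> [String] {
    let isAbsolute = path.hasPrefix("/")
    return componentsKeepingDot(path).enumerated().filter {
        $0.element != "." || ($0.offset == 0 && !isAbsolute)
    }.map { $0.element }
}

let kept = componentsKeepingDot("foo/./bar")
let skipped = componentsSkippingInteriorDot("foo/./bar")
// kept is ["foo", ".", "bar"]; skipped is ["foo", "bar"].
```

Under the first model "foo/./bar" and "foo/bar" have different component lists and so compare unequal until explicitly normalized; under the second they are indistinguishable by iteration even though their stored bytes differ.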
I'd like to hear the community's thoughts on where the right line is.