This is already handled by the pitch (and the existing System.FilePath):
Yes, we may want verbatim mode for the reasons @allevato mentioned as well as other reasons. What I'm asking is what your concern is in addition to @allevato's reasons. Specifically, around separators. When you say:
A path component is a bag of bytes passed to the file system, and the existing System.FilePath, the first version of the pitched FilePath in this thread, as well as any theoretical second pitch discussed so far preserves this. A full path is not passed to the file system. It is parsed by the kernel which honors the contract of the syscall.
We're not proposing to modify the bytes of any component here.
That may be true on Darwin, but on Windows it's not the case. The whole path is passed to the filesystem layer so it can implement reparse points and filters.
Because I could see this being equally surprising either way, would there be any value in exploring making FilePath not conform to Equatable and Hashable, and instead requiring the API user to express their expectations explicitly? I don't have a (good) name in mind, but I could imagine, for example, FilePath vending Hashable "views" whose implementations of Equatable and Hashable take trailing slashes into account (or not), much like String vended views before it conformed directly to Collection.
ETA: I could also imagine making FilePath generic on its trailing-slash semantics (using a marker type) so it could conform directly while being explicit, e.g. FilePath<WithTrailingSlash> vs. FilePath<WithoutTrailingSlash>, with convenience conversions between the two.
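To make the marker-type idea a little more concrete, here is a minimal sketch. Everything in it is hypothetical: `MarkedPath`, `TrailingSlashPolicy`, and the two marker enums are illustrative names, and the path is a `String` stored verbatim rather than FilePath's real storage. The policy only decides which form of the text participates in `==` and `hash(into:)`.

```swift
// Hypothetical sketch: `MarkedPath`, `TrailingSlashPolicy`, and the marker enums
// are illustrative names, not part of any pitched or shipping API.

/// Decides which form of the path text participates in == and hash(into:).
protocol TrailingSlashPolicy {
    static func key(for path: String) -> String
}

/// Equality treats "/tmp/foo" and "/tmp/foo/" as different paths.
enum WithTrailingSlash: TrailingSlashPolicy {
    static func key(for path: String) -> String { path }
}

/// Equality ignores a single trailing slash (the root "/" is left alone).
enum WithoutTrailingSlash: TrailingSlashPolicy {
    static func key(for path: String) -> String {
        guard path.count > 1, path.hasSuffix("/") else { return path }
        return String(path.dropLast())
    }
}

struct MarkedPath<Policy: TrailingSlashPolicy>: Hashable {
    /// The path text, stored verbatim regardless of the policy.
    let text: String

    init(_ text: String) { self.text = text }

    /// Convenience conversion between policies; the stored text is unchanged.
    init<Other: TrailingSlashPolicy>(_ other: MarkedPath<Other>) { self.text = other.text }

    static func == (lhs: Self, rhs: Self) -> Bool {
        Policy.key(for: lhs.text) == Policy.key(for: rhs.text)
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(Policy.key(for: text))
    }
}
```

Because both policies share the same verbatim storage, converting between them costs nothing, and the chosen equality is spelled out in the type at every use site.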
This isn't the case on Windows either, not even for reparse points. A reparse point is an attribute on a file or directory that the filesystem has already located through normal separator-based traversal. What is passed to the file system is a subpath, not the whole path, and that subpath is a sequence of components separated by \, not an opaque bag of bytes.
Again, we might want a verbatim storage mode for other reasons. I'm not arguing against the conclusion, just trying to unpack the rationale. The rationale is relevant even if we do a verbatim mode as we will have to decide what the "default" or unlabeled initializers do: verbatim mode, separator-canonicalized, further canonicalized, etc.
We don't have to provide an unlabeled initializer. It might be safer to require the caller to acknowledge what kind of transformation (if any) will happen to the input by requiring a label on all overloads.
Sure, but we'd have to either pick a behavior for init(stringLiteral:) or drop the ExpressibleByStringLiteral conformance. I don't think we can kick the can far enough down the road to not think about separator coalescing.
We're getting into esoteric possibilities, but couldn't a third-party filesystem, reparse point, or filter decide to parse that subpath however it wants? \\?\C:\MountPoint\.\\..\.\\. might be semantically meaningful to a filesystem mounted at C:\MountPoint.
Yup, I think it would be consistent with having only labeled initializers to drop this conformance. I'd be curious how often and in what scenarios users construct file paths out of whole cloth from literals. My uninformed guess is that it's not like a string or integer type where it's a sine qua non.
Oh, I can point to a bunch in my own projects. I write a lot of tools that do things like Subprocess.run(.path("/usr/bin/xcrun"), ...), which invokes FilePath.init(stringLiteral:). The tooling/scripting case shouldn't be underestimated here, and I think it would be a loss ergonomically to lose that conformance.
My gut feeling is that if we decide we want to keep it, the safest/least risky option is to give it verbatim behavior. I would wager that most folks are going to write literal paths like "/usr/bin/xcrun", not "/usr///sbin/../bin/./xcrun", so verbatim is already correct, and if for some reason they do write something nonstandard, they probably know what they're doing and can ask for it to be explicitly canonicalized.
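A minimal sketch of that split, assuming a hypothetical `LiteralPath` type and a hypothetical labeled initializer `init(coalescingSeparators:)`; only POSIX-style `/` separators are handled. The literal initializer stores its input untouched, and canonicalization happens only when the caller asks for it by label.

```swift
// Hypothetical sketch: `LiteralPath` and `init(coalescingSeparators:)` are
// illustrative names. Only POSIX-style "/" separators are considered.

struct LiteralPath: Equatable, ExpressibleByStringLiteral {
    /// The path text exactly as the caller wrote it.
    let text: String

    /// Literals are stored verbatim: no separator coalescing, no "." or ".." handling.
    init(stringLiteral value: String) {
        self.text = value
    }

    /// Explicit, labeled opt-in: collapse every run of "/" into a single "/".
    init(coalescingSeparators value: String) {
        var result = ""
        var previousWasSeparator = false
        for character in value {
            let isSeparator = character == "/"
            if !(isSeparator && previousWasSeparator) {
                result.append(character)
            }
            previousWasSeparator = isSeparator
        }
        self.text = result
    }
}
```

With this split, the common case ("/usr/bin/xcrun") comes out the same either way, and anything unusual in a literal survives until the caller explicitly asks for it to be canonicalized.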
It's unsurprising that "/tmp/foo" != "/tmp/foo/". Whether it's surprising that FilePath("/tmp/foo") != FilePath("/tmp/foo/") depends on whether you expect that FilePath(_:) is a normalizing operation.
Unlike /tmp////foo and /tmp/foo, which (to the best of my knowledge) are always interchangeable on all platforms worth considering, /tmp/foo and /tmp/foo/ are not always interchangeable (if nothing else, /tmp/foo/ can open foo if it's a directory, but not if it's a file).
If you want to treat these paths as equal keys but FilePath says they aren't, you have the option to normalize paths before you use them (with operations that you have already pitched). If you don't want to treat these paths as equal but FilePath says they are, you need to write a FilePath wrapper whose equality also checks for a trailing slash.
What about interior . components? My understanding is that we could similarly coalesce away any . component that is not in leading position. Adding one anywhere except the start of a path that doesn't already begin with one is a no-op. Upon construction, a trailing . component is dropped but the slash is preserved. Appending a . component would add a trailing slash if there wasn't one already, but not the . itself.
That is, a/.///./b/. is a/b/ and consists of two components: [a, b].
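Here is a rough sketch of those rules for `/`-separated paths, under two assumptions of my own: `..` is left alone, and a path with no components (such as the root) carries no separate trailing-slash flag. `NormalizedPath` and `normalize(_:)` are hypothetical names.

```swift
// Hypothetical sketch of the rules above for "/"-separated paths.
// `NormalizedPath` and `normalize(_:)` are illustrative names; ".." is not touched.

struct NormalizedPath: Equatable {
    var isAbsolute: Bool
    var components: [String]
    var hasTrailingSeparator: Bool

    /// Renders the normalized form, e.g. "a/b/".
    var text: String {
        let root = isAbsolute ? "/" : ""
        let trailing = hasTrailingSeparator ? "/" : ""
        return root + components.joined(separator: "/") + trailing
    }
}

func normalize(_ path: String) -> NormalizedPath {
    let isAbsolute = path.hasPrefix("/")
    // Splitting on "/" omits empty pieces, which coalesces runs of separators.
    let pieces = path.split(separator: "/").map(String.init)

    var components: [String] = []
    for (index, piece) in pieces.enumerated() {
        // A "." survives only as the leading component of a relative path.
        let isLeadingRelativeDot = index == 0 && !isAbsolute
        if piece == "." && !isLeadingRelativeDot { continue }
        components.append(piece)
    }

    // An explicit trailing "/" or a dropped trailing "." both leave a trailing separator.
    let endsInSeparator = path.hasSuffix("/") || (pieces.count > 1 && pieces.last == ".")
    return NormalizedPath(isAbsolute: isAbsolute,
                          components: components,
                          hasTrailingSeparator: endsInSeparator && !components.isEmpty)
}
```

Under these rules `normalize("a/.///./b/.")` yields the components `["a", "b"]` with a trailing separator, rendered as `a/b/`. `/tmp////foo` and `/tmp/foo` normalize to the same value, while `/tmp/foo` and `/tmp/foo/` stay distinct.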
Thinking about this for longer, I am increasingly in favor of storing the trailing separator in a Bool, and adding methods to normalize it away:
init(_: String) /// Keep everything
init(normalizing: String) /// Remove consecutive separators and . components, or ask the OS if it provides special methods
init(normalizingAndRemovingTrailingSeparator: String) /// Does what it says on the tin, name provisional
let hasTrailingSeparator: Bool /// Controls whether e.g. String(_:) adds a trailing slash
func removingTrailingSeparator() -> FilePath /// Simply sets the above Bool to `false`, so that String(_:) does not add it.
Then, it is natural that FilePath("/path/to/") != FilePath("/path/to"), but FilePath("/path/to/").removingTrailingSeparator() == FilePath("/path/to")
Maybe we could even have func addingTrailingSeparator() -> FilePath for those extra special subprocess calls.
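A minimal sketch of the `Bool`-based design, assuming a hypothetical `SketchPath` that stands in for FilePath, keeps its body as a plain `String`, and only understands `/`. The synthesized `Hashable` conformance compares the `Bool` as well, so `SketchPath("/path/to/") != SketchPath("/path/to")` follows without any special casing.

```swift
// Hypothetical sketch: `SketchPath` stands in for FilePath, keeps its body as a
// plain String, and only understands "/" as a separator.

struct SketchPath: Hashable, CustomStringConvertible {
    /// The path as given, minus one trailing "/" (which moves into the Bool).
    private(set) var body: String
    /// Controls whether `description` appends a trailing "/".
    private(set) var hasTrailingSeparator: Bool

    /// Keep everything: the trailing "/" is recorded rather than discarded.
    init(_ string: String) {
        if string.count > 1, string.hasSuffix("/") {
            body = String(string.dropLast())
            hasTrailingSeparator = true
        } else {
            body = string
            hasTrailingSeparator = false
        }
    }

    /// Simply clears the flag, so `description` no longer adds the "/".
    func removingTrailingSeparator() -> SketchPath {
        var copy = self
        copy.hasTrailingSeparator = false
        return copy
    }

    /// Sets the flag, except for the root "/", which already ends in a separator.
    func addingTrailingSeparator() -> SketchPath {
        var copy = self
        copy.hasTrailingSeparator = body != "/"
        return copy
    }

    var description: String {
        hasTrailingSeparator ? body + "/" : body
    }
}
```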
While I am sympathetic to both views on what == should do with a trailing-slash, equality should generally imply substitutability. If your leading thesis (great research btw!) is that the trailing slash carries meaning that must not be lost, then it should not be ignored by the equality operator. There will be a foot-gun either way; preserving substitutability seems less surprising to me.
I would very much like to fix the current situation, but I'm not sure how feasible it is. @kperryua would be the more authoritative person here.
My preferred (but breaking) scenario: for encoding, we prefer strings when possible, but fall back to bytes for invalid Unicode or upon explicit request by the user. For decoding, we accept either strings or bytes.
Under that scenario, if the new FilePath's Codable conformance were to encode as a string, but then it were to be decoded by an older version of the conformance, it would be a kind of binary compatibility break. We could provide an opt-in to encode as the old version but then it's an imperfect solution. Instead, it might have to be the case that the new version is opt-in.
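A rough sketch of the preferred encoding scheme, using Foundation's `JSONEncoder`/`JSONDecoder` for the demonstration. `PathBytes` and `JobMessage` are hypothetical; `PathBytes` stores raw UTF-8 bytes on every platform (real Windows paths are UTF-16 code units), and the explicit opt-in to byte encoding is not modeled. This does nothing for the compatibility concern above: a decoder that only understands the byte form would still fail on the string form.

```swift
import Foundation

// Hypothetical sketch: `PathBytes` stands in for FilePath and always stores
// UTF-8 bytes, which is a simplification of the real per-platform storage.
struct PathBytes: Codable, Equatable {
    var bytes: [UInt8]

    init(_ bytes: [UInt8]) { self.bytes = bytes }

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        // String(decoding:as:) repairs invalid UTF-8 with U+FFFD, so an exact
        // round trip means the bytes were valid and can be written as a string.
        let string = String(decoding: bytes, as: UTF8.self)
        if Array(string.utf8) == bytes {
            try container.encode(string)
        } else {
            try container.encode(bytes)
        }
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        // Accept either representation when decoding.
        if let string = try? container.decode(String.self) {
            bytes = Array(string.utf8)
        } else {
            bytes = try container.decode([UInt8].self)
        }
    }
}

/// A keyed wrapper so the demo doesn't depend on top-level JSON fragments.
struct JobMessage: Codable, Equatable {
    var input: PathBytes
}
```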
I'd like to hear a little more about the impact the existing conformance has on you.
I'd like to hear more of the community's opinion on this. My personal use of RawRepresentable has been in a fairly narrow sense: FileDescriptor's raw value is a 32-bit integer because it literally is one. Similarly, an ASCII bitset is a 128-bit integer. In these cases, foo.rawValue doesn't allocate or copy memory and leans hard on the "raw"ness. But I know the protocol is used more broadly, e.g. String-backed enums (which are integer indices into string tables).
For FilePath, a String raw value would be the most useful choice, but it's not lossless for paths with invalid Unicode and it would allocate and copy its contents over. A more "raw" type could be _SystemString, i.e. FilePath's storage type, which is a COW bag of null-terminated UInt8/UInt16 (depending on the platform). That would be the more faithful "raw" type, but it is pretty niche (and currently internal). The null-termination invariant also means it's not usable for Root or any Component, though.
The conversion being lossy would be a deal breaker for it to be RawValue = String. The documentation states:
With a RawRepresentable type, you can switch back and forth between a custom type and an associated RawValue type without losing the value of the original RawRepresentable type.
So if I have this,
let path: FilePath = /* some non-UTF-8 file path */
let samePath = FilePath(rawValue: path.rawValue)!
// ^ this *must* always succeed
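To make the lossiness concrete: the standard library's `String(decoding:as:)` repairs ill-formed UTF-8 by substituting U+FFFD, so a `String` raw value would silently change such a path's bytes and the round trip above could not hold. The helper `survivesStringRoundTrip(_:)` is illustrative only.

```swift
// Illustrative check (not part of any pitched API): does a byte sequence
// survive a trip through String unchanged?
func survivesStringRoundTrip(_ bytes: [UInt8]) -> Bool {
    let string = String(decoding: bytes, as: UTF8.self)
    return Array(string.utf8) == bytes
}

let validPath: [UInt8] = Array("/tmp/foo".utf8)
let invalidPath: [UInt8] = [0x2F, 0x66, 0xFF]  // "/f" followed by a stray 0xFF byte

print(survivesStringRoundTrip(validPath))    // true
print(survivesStringRoundTrip(invalidPath))  // false
// 0xFF is replaced by U+FFFD, whose UTF-8 encoding is EF BF BD.
print(Array(String(decoding: invalidPath, as: UTF8.self).utf8))  // [47, 102, 239, 191, 189]
```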
To avoid introducing a new sort-of-a-string-bag-of-bytes type as part of this proposal, what if we kept the raw value type opaque?
extension FilePath: RawRepresentable {
public init?(rawValue: some Collection<UInt8>) {
// copy the argument into whatever internal representation makes sense
}
public var rawValue: some Collection<UInt8> {
// return that internal representation
}
}
Then _SystemString or something else could still be used internally, and all we'd promise to clients is that it's a collection of bytes.
One option is to drop starts(with:) and ends(with:) entirely, and add the missing non-mutating removingPrefix(_:) -> FilePath?. That is, we're providing an inverse of resolve-beneath and the query that starts(with:) was answering can be path.removingPrefix(base) != nil. For ends(with:) we could provide a removingSuffix I suppose, but I'm wondering if it would be useful beyond symmetry.
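A minimal sketch of `removingPrefix(_:)` as a component-wise operation, assuming a hypothetical `ComponentPath` stand-in for FilePath that only splits on `/`. The method name follows the suggestion above; the body is just one possible implementation. Comparing components rather than characters is what keeps `/usr/binaries` from counting as beneath `/usr/bin`.

```swift
// Hypothetical sketch: `ComponentPath` stands in for FilePath and only splits on "/".
struct ComponentPath: Equatable {
    var isAbsolute: Bool
    var components: [String]

    init(_ string: String) {
        isAbsolute = string.hasPrefix("/")
        components = string.split(separator: "/").map(String.init)
    }

    init(isAbsolute: Bool, components: [String]) {
        self.isAbsolute = isAbsolute
        self.components = components
    }

    /// Returns the relative remainder of `self` after `prefix`,
    /// or nil if `self` does not start with `prefix` component-wise.
    func removingPrefix(_ prefix: ComponentPath) -> ComponentPath? {
        guard isAbsolute == prefix.isAbsolute,
              components.starts(with: prefix.components) else { return nil }
        let remainder = Array(components.dropFirst(prefix.components.count))
        return ComponentPath(isAbsolute: false, components: remainder)
    }
}
```

With this shape, the query `starts(with:)` used to answer becomes `path.removingPrefix(base) != nil`.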
Adding ExpressibleByStringInterpolation on FilePath.Component would in effect turn
let component: FilePath.Component = "\(stem).\(ext)"
into the equivalent of
let component: FilePath.Component = .init(stringLiteral: "\(stem).\(ext)")
Both trap if the result isn't a valid component. The concern is that when dynamic values are involved, we'd prefer the developer to write something that makes the fallibility visible. Currently, that's written as:
let component: FilePath.Component = .init("\(stem).\(ext)")!
// or
guard let component = FilePath.Component("\(stem).\(ext)") else { ... }
I'm inclined to leave the ExpressibleByStringInterpolation conformance out.
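A minimal sketch of how leaving the conformance out keeps fallibility visible, assuming a hypothetical `PathComponent` stand-in for FilePath.Component with simplified POSIX validity rules (non-empty, no `/`, no NUL). Because it conforms to `ExpressibleByStringLiteral` but not `ExpressibleByStringInterpolation`, an interpolated literal can't be assigned to it directly, so dynamic input has to go through the failable initializer.

```swift
// Hypothetical sketch: `PathComponent` stands in for FilePath.Component, and its
// validity rules (non-empty, no "/", no NUL) are a POSIX-only simplification.
struct PathComponent: ExpressibleByStringLiteral, CustomStringConvertible {
    let text: String

    /// Failable: dynamic strings force the caller to handle invalid input.
    init?(_ string: String) {
        guard !string.isEmpty, !string.contains("/"), !string.contains("\0") else {
            return nil
        }
        text = string
    }

    /// Literals are validated too, but a bad literal is a programmer error, so trap.
    init(stringLiteral value: String) {
        guard let component = PathComponent(value) else {
            preconditionFailure("invalid path component literal: \(value)")
        }
        self = component
    }

    var description: String { text }
}

let stem = "report", ext = "txt"
// let name: PathComponent = "\(stem).\(ext)"  // does not compile: no interpolation conformance
guard let name = PathComponent("\(stem).\(ext)") else { fatalError("invalid component") }
print(name)
```

One subtlety: under SE-0213, `PathComponent("xcrun")` with a plain (non-interpolated) literal argument is treated as literal coercion and goes through the trapping `init(stringLiteral:)`, not the failable initializer.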
That's an elegant solution, but I think RawRepresentable is mostly used with common/currency types as the RawValue. Offering an opaque RawValue feels a bit off to me as I mostly use the protocol to get a simpler/more easily transferable value, pass that along in code paths/storage where my custom type is inconvenient, and recreate my custom type in some other code again.
I retract my suggestion about RawRepresentable. I had been thinking that since we can have FilePath.init(_: String) we should be able to have RawRepresentable<String>, but I hadn't considered that we also need a non-optional rawValue: String. I don't think the conformance makes sense given the documentation @allevato quoted.
My main usages of codable file paths are in a server application:
Reading config from JSON files. The config is defined as a Codable struct. I've had to resort to decoding a String and adding a computed var which converts to FilePath to keep the JSON reasonably human-authorable.
Passing messages between processes. I have a manager process which fetches jobs and inputs, then passes them to a worker process. The worker is sandboxed and is informed of both the job to do and the local location of the inputs via JSON. This JSON then needs to contain FilePaths. In this case it's much less important that the JSON is human-readable, but I've still opted for en/decoding String and using a computed var for FilePath just in case anyone wants to take a peek.
In other words, the impact is, well, just that I don't use their Codable conformances. It's not the end of the world. But I do think the current state is an attractive nuisance. IMO it would be better to remove the conformance (or somehow warn on its use) than to have it in there. Codable conformances are just so easy to get locked into. Obviously Swift System has encountered this problem here, but it spreads to every codebase which conforms a FilePath-containing struct to Codable without closely checking the format.
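For reference, a minimal sketch of the workaround described above, using Foundation's `JSONDecoder`. `StandInFilePath` is a hypothetical placeholder for System's FilePath so the example needs no package dependency, and `WorkerConfig` and its key are illustrative. Synthesized `Codable` only covers stored properties, so the computed `inputPath` never reaches the JSON.

```swift
import Foundation

// Hypothetical placeholder for System's FilePath, so the sketch has no package dependency.
struct StandInFilePath: Equatable {
    var string: String
    init(_ string: String) { self.string = string }
}

// Illustrative config type: the path is stored as a String to keep the JSON readable.
struct WorkerConfig: Codable {
    var inputDirectory: String

    /// Computed conversion used by the rest of the program; not encoded.
    var inputPath: StandInFilePath { StandInFilePath(inputDirectory) }
}

let json = Data(#"{"inputDirectory": "/var/jobs/inputs"}"#.utf8)
let config = try JSONDecoder().decode(WorkerConfig.self, from: json)
print(config.inputPath.string)  // /var/jobs/inputs
```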