SE-0529: Add FilePath to the Standard Library

We seem to have silently shifted topics from syntactic to filesystem-based APIs. So to be clear: do you agree that an API that syntactically maps ./wwwroot/../../../etc/passwd to ../../etc/passwd is necessary and appropriate for the standard library to expose? I would argue it is, because the alternative is clients hand-rolling insecure checks like guard path.components.first != ".claude" else { throw EPERM() }.
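To make the syntactic mapping concrete, here is a minimal sketch that works on plain strings with "/" separators. The names lexicallyNormalized and escapesRoot are hypothetical and are not part of the proposal; the point is only that the mapping needs no filesystem access. It deliberately ignores symlinks, so it is suitable for validating untrusted input, not for predicting what the kernel will open.

```swift
/// Hypothetical helper: collapses "." and ".." components using only the
/// text of the path. It never consults the filesystem.
func lexicallyNormalized(_ path: String) -> String {
    let isAbsolute = path.hasPrefix("/")
    var stack: [Substring] = []
    // split(separator:) drops empty components, so "a//b" behaves like "a/b".
    for component in path.split(separator: "/") {
        switch component {
        case ".":
            continue
        case "..":
            if let last = stack.last, last != ".." {
                stack.removeLast()        // cancel the previous component
            } else if !isAbsolute {
                stack.append(component)   // a leading ".." must be kept
            }
            // For an absolute path, ".." at the root stays at the root.
        default:
            stack.append(component)
        }
    }
    let joined = stack.joined(separator: "/")
    if isAbsolute { return "/" + joined }
    return joined.isEmpty ? "." : joined
}

/// Hypothetical check: does the path, once normalized, point outside the
/// directory it is meant to be relative to?
func escapesRoot(_ path: String) -> Bool {
    let normalized = lexicallyNormalized(path)
    return normalized.hasPrefix("/")
        || normalized == ".."
        || normalized.hasPrefix("../")
}

print(lexicallyNormalized("./wwwroot/../../../etc/passwd"))  // ../../etc/passwd
print(escapesRoot("./wwwroot/../../../etc/passwd"))          // true
print(escapesRoot("wwwroot/index.html"))                     // false
```

A check on the first component alone misses the first example entirely, which is the kind of hand-rolled mistake I am worried about.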

That’s not the same as resolve() against the filesystem, which does not solve the same problem and which I believe we are both dubious of. My first impression is that resolve() actually creates more TOCTOU problems than it solves—its existence literally creates a new point on the timeline between path construction and path usage. That’s why I asked @Michael_Ilseman for more context.

To expand on why I am dubious of resolve(): there is no such thing as the “true” path of a file, thanks to platform-specific filesystem features like NTFS directory junctions, Linux UnionFS, HFS+ directory hardlinks, APFS data volumes, and probably several others that I’m unaware of. So that can’t be the purpose of resolve(), especially if resolve() admits paths that name nonexistent files. If its purpose is solely to resolve symbolic links within the path, then I argue 1) that’s a classic TOCTOU vector and 2) see all the aforementioned features that filesystems have grown since the 1980s.

It makes sense to me that a hypothetical Filesystem.open(_: FilesystemPath) could return an object that can vend its own resolved path, that is, the path that the kernel resolved the input path to at the time of opening the file. This collapses the middle and final points on the timeline, and allows clients to safely perform additional checks on the resolved path without the risk of the file moving underneath them. Though it’s still unsafe to use this resolved path for further filesystem operations! So this “resolved path” type ought to be distinct from FilesystemPath in the type system, and the standard library ought to make it easy for the client to perform subsequent operations on the handle, rather than the resolved path:

// Administrator is responsible for the security of the
// directory named by the TEMP_ROOT environment variable.
//
// This isn’t something the stdlib can help secure; it’s
// part of the launch environment outside the
// process’s control.
let tempRootPath = FilesystemPath(getenv("TEMP_ROOT"))
let tempDirPath = tempRootPath.appending(["my_prog"])

// Trying to create the program’s own temporary
// directory will cause the kernel to *atomically*
// resolve the path and either succeed or fail.
// (This API is imaginary and platform-dependent.)
let tempDir: FilesystemHandle = try! System.makeDirectoryAt(tempDirPath, permissions: [.ownerRead, .ownerWrite, .ownerExecute])

// To safely create temporary files within tempDir,
// we *must* use operations on the handle.
// Recomputing a path from tempDirPath exposes
// us to someone swapping $TEMP_ROOT/my_prog with 
// a symlink!
let tempFilePath = FilesystemPath("file.dat")

// Again, this is hypothetical platform-dependent API
// that only accepts *relative* paths, which is a
// property that can be validated syntactically.
let tempFile: FilesystemHandle = try! tempDir.makeRelativeFile(tempFilePath)

// We can log the resolved path of tempFile so the
// sysadmin can find it later. This may or may not
// be the only path by which the file can be reached.
print("Opened temporary file: \(tempFile.resolvedPath)")
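For a rough idea of what the handle-relative file creation step could sit on top of, here is a sketch using the POSIX openat call on Darwin or Linux. The helper makeFile(named:inDirectory:) is hypothetical, covers only the file-creation step, and does not attempt the atomic create-and-open of the directory that the imaginary makeDirectoryAt above assumes.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

/// Hypothetical helper: creates `name` inside the directory referred to by
/// `dirFD` without re-resolving the directory's path. Returns the new file
/// descriptor, or nil on failure.
func makeFile(named name: String, inDirectory dirFD: Int32) -> Int32? {
    // Syntactic validation only: accept a single relative component.
    guard !name.isEmpty, !name.contains("/"), name != ".", name != ".." else {
        return nil
    }
    // O_EXCL makes creation fail if the name already exists, even as a
    // symlink; O_NOFOLLOW additionally refuses a symlink as the final
    // component. The lookup starts at dirFD, not at a recomputed path.
    let flags = O_WRONLY | O_CREAT | O_EXCL | O_NOFOLLOW
    let fd = openat(dirFD, name, flags, 0o600)
    return fd >= 0 ? fd : nil
}
```

A caller would open the directory once (for example with open(path, O_RDONLY | O_DIRECTORY)), keep that descriptor, and pass it to makeFile for every file it creates, so that swapping the directory's path for a symlink afterwards has no effect on where the files land.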

We seem to have silently shifted topics from syntactic to filesystem-based APIs. So to be clear: do you agree that an API that syntactically maps ./wwwroot/../../../etc/passwd to ../../etc/passwd is necessary and appropriate for the standard library to expose?

It seems appropriate to me, but not strictly required in the first iteration. It's less important than making FilePath available with basic, uncontroversial functionality. As I understand it, there are reservations about making a syntactic resolution API available without also adding a filesystem resolution API. If that's the consensus opinion, then I'd be happy to wait and would prioritise getting FilePath in as soon as possible.

That’s not the same as resolve() against the filesystem, which does not solve the same problem and which I believe we are both dubious of. My first impression is that resolve() actually creates more TOCTOU problems than it solves: its existence literally creates a new point on the timeline between path construction and path usage.

Yes, agreed on all counts.

To expand on why I am dubious of resolve(): there is no such thing as the “true” path of a file, thanks to platform-specific filesystem features like NTFS directory junctions, Linux UnionFS, HFS+ directory hardlinks, APFS data volumes, and probably several others that I’m unaware of. So that can’t be the purpose of resolve(), especially if resolve() admits paths that name nonexistent files. If its purpose is solely to resolve symbolic links within the path, then I argue 1) that’s a classic TOCTOU vector and 2) see all the aforementioned features that filesystems have grown since the 1980s.

Agreed.

It makes sense to me that a hypothetical Filesystem.open(_: FilesystemPath) could return an object that can vend its own resolved path, that is, the path that the kernel resolved the input path to at the time of opening the file. This collapses the middle and final points on the timeline, and allows clients to safely perform additional checks on the resolved path without the risk of the file moving underneath them. Though it’s still unsafe to use this resolved path for further filesystem operations!

Yup, agreed. On UNIX at least, opening a file or directory doesn't 'lock it in place', so getting a path from a file descriptor isn't any better than fileSystemResolve(path).

So this “resolved path” type ought to be distinct from FilesystemPath in the type system, and the standard library ought to make it easy for the client to perform subsequent operations on the handle

100%. The API design matters a lot here, and doing it correctly should be the easiest option. I will point out that this can be difficult to achieve: on a lot of systems (macOS, for example), the default file descriptor limit is very low (256 per process), so keeping every to-be-operated-on handle open can be at odds with minimising file descriptor usage for maximum compatibility, especially under concurrency. Nevertheless, agreed once again.

SE-0529 has been accepted with the modifications already discussed in this review.

Thank you all for your thoughtful contributions throughout this review.

John McCall
Review Manager.
