Pitch: Add FilePath to the standard library

I was surprised by the behavior of startsWith; while it is documented, I think it should have another name.

Otherwise, +1: the more great currency types we have, the better.

I think it’s a great idea to have a type per kind of path (Windows, Unix, etc.) and a platform type alias. It reminds me of recent discussions about byte order. It seems obvious to me that some programs will need both, but I have never personally had a use for that so far.

This is great overall. I need to check whether there are any surprises in the specific APIs, but taking a piece of System rather than the whole thing is a great idea! Thank you for splitting it out of the System package like that 💯

I just want to clarify one point: landing FilePath in the stdlib / toolchain in some module (whether "Swift", or "Paths", or even a module named "System") does not mean that System doesn't also belong in the toolchain in some form or capacity. What it does do is give us greater clarity on what System is and it untangles a cross-cutting (i.e. tech stack transcending) concept from the rest of the very nitty-gritty low-level concerns of System.

Would it be possible to do that work? In particular, I’m imagining something like implicitly inserting import System (or a System. qualifier) when the deployment target is older than the Swift/OS version where this proposal lands?

Are you planning a follow-up proposal to bring the URL/FilePath conversion APIs to Foundation?

Would it make sense to expose a FilePath.Root() initializer on Unix-y systems, or do you see that as something that would lead to non-portable code in a bad way?

Why are all the remove* methods mutating-only? Have you considered removing* methods to go with them?

I concur. Perhaps something like “isChildOf(_:)” or “hasParent(_:)”?
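Part of why the name matters is that "starts with" can mean either a string prefix or a component prefix. A minimal C sketch of the difference for POSIX-style paths, assuming '/' is the only separator; `has_component_prefix` is a hypothetical helper, not FilePath's API, and it does no . or .. handling:

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: true if `prefix` names the same leading path
// components as `path`. Assumes POSIX-style '/' separators; repeated
// separators are skipped, and "." / ".." are NOT interpreted.
bool has_component_prefix(const char *path, const char *prefix) {
    // An absolute prefix only matches an absolute path, and vice versa.
    if ((path[0] == '/') != (prefix[0] == '/'))
        return false;
    for (;;) {
        // Skip separators on both sides.
        while (*path == '/') path++;
        while (*prefix == '/') prefix++;
        if (*prefix == '\0')
            return true;   // every prefix component matched
        if (*path == '\0')
            return false;  // path ran out of components first
        // Compare the next whole component on each side.
        size_t plen = strcspn(path, "/");
        size_t qlen = strcspn(prefix, "/");
        if (plen != qlen || memcmp(path, prefix, plen) != 0)
            return false;
        path += plen;
        prefix += qlen;
    }
}
```

A string-prefix test says /usr/library starts with /usr/lib; the component-wise check does not. Because the check is purely lexical, /usr/lib/../bin still counts as starting with /usr/lib.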

I don't agree with this, or with the characterization of FilePath as a "currency type" similar to Array and Int.

I think, in general, a notion and representation of a path is closely tied to a specific API. The platform, zip files, HTTP, etc. each use their own notion of a path. On Windows, even within a single platform, the kernel internally uses a different notion of a path than user applications,[1] and for user applications, there are (as far as I know) two representations of paths: the 16-bit strings used in the Win32 API, and the 8-bit strings used in the C standard library.

FilePath is chiefly useful for directly interacting with the platform's file system. I can't think of an application that would use FilePath without also using System or a thin wrapper over it. I feel like, if the platform's file system is abstracted away, FilePath would lose its usefulness.

There are probably some good reasons to abstract away the platform's file system. One reason is correctness and security: as fclout argued, it's potentially dangerously error-prone to directly interact with the file system. So I think, ideally, the file system would be encapsulated in a safer abstraction suited for the application domain (unless the domain is directly interacting with the system, such as a command-line tool). For example, an app framework could have an API for accessing bundled resources and storing user data, or a web framework could have an API for storing user-uploaded files. Maybe databases can also be seen as a safer abstraction over file systems.

In contrast, Array and Int are used across a wide set of APIs regardless of abstraction level. So, I think it makes sense for FilePath to remain closely tied to System.


It's possible on Windows for a path traversal vulnerability to involve not .., but device names like COM1 and AUX, which similarly exist in every directory prior to Windows 11.[2] Path traversal vulnerabilities using Windows device names are classified as CWE-67. FilePath.lexicallyResolving should probably handle this as well.
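To make "handle this" concrete, a simplified C sketch of a per-component check that a lexical resolver could apply. It assumes only the classic names CON, PRN, AUX, NUL, COM1 to COM9, and LPT1 to LPT9, matched case-insensitively with any extension ignored (so aux.txt counts). Trailing spaces or dots are not handled, and Microsoft's file-naming documentation should be treated as the authority for the full list. `is_dos_device_name` is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

// Case-insensitive comparison of the first n chars of s against an
// upper-case ASCII literal.
static bool eq_upper(const char *s, const char *lit, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (toupper((unsigned char)s[i]) != lit[i])
            return false;
    return true;
}

// Hypothetical helper: true if `component` (one path component, no
// separators) is a classic reserved DOS device name, with or without an
// extension. Simplified: superscript-digit variants, trailing spaces/dots,
// and Windows 11 behavior changes are not modeled.
bool is_dos_device_name(const char *component) {
    // The device name is the part before the first '.', so "aux.txt"
    // is treated like "AUX".
    size_t n = strcspn(component, ".");
    if (n == 3)
        return eq_upper(component, "CON", 3) || eq_upper(component, "PRN", 3) ||
               eq_upper(component, "AUX", 3) || eq_upper(component, "NUL", 3);
    if (n == 4 && (eq_upper(component, "COM", 3) || eq_upper(component, "LPT", 3)))
        return component[3] >= '1' && component[3] <= '9';  // COM1..9, LPT1..9
    return false;
}
```

A resolver could reject a relative subpath when any component satisfies this check, much as it rejects a .. that escapes the base.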


  1. ↩︎

  2. ↩︎

Fully agree we need to be able to model Windows paths from POSIX systems and vice versa. I know a lot of people are fans of Python’s pathlib library, which does a great job of modelling that.

Thank you! Yes, please, definitely +1, we need a type to hold file paths in the stdlib.

The testability concerns that @compnerd raised are real though. It would be really great to be able to run a test suite with Windows paths on Linux/macOS.

Sorry for the delay in response, I wanted to think very carefully about some of the details here and do some more background research.

I think it's now very clear to me that if we are to propose or add any kind of lexical resolution method, we should also add the "real" resolution method(s) and bless them with the better name. Such methods would be throwing, as they consult the kernel and file system. While the pitch is careful to put "lexical" in the names, we should move full resolution out of future work and into the present.

This is a subtle but very important point. Just as Windows (Win32) paths have their own custom roots and root information, so too do Darwin (XNU) paths. I went ahead and spelunked in the open source XNU implementation and produced this analysis: XNU paths · GitHub

I think that FilePath should model these as part of the Root on Darwin and that full-resolution methods should take parameters governing whether to set e.g. /.nofollow/. The default should probably be that it is set on resolution.

This means that resolution status is two bits: was it resolved in the past and does it have the nofollow path prefix. Linux can only support the former: at some time in the past it was fully resolved, but that status is dependent on file system state.

We'll want to think through the mutation story a little bit here, but in general mutations clear the was-it-resolved bit.

We absolutely do want a Hashable (and similarly Comparable and Equatable) conformance with sensible semantics, noting that it is not the goal of these conformances to solve the path-equivalence problem. They are still purely syntactic and the developer can resolve them before sticking them in a Dictionary.
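As an illustration of purely syntactic equality, a C sketch that canonicalizes only redundant and trailing separators before comparing two POSIX-style paths, and deliberately leaves . and .. alone because the component before a .. may be a symlink. The helper names and exact rules are assumptions for illustration, not FilePath's documented behavior; a hash would be computed over the same canonical form.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: copies `path` into `out` (capacity `cap`),
// collapsing runs of '/' and dropping a trailing '/' unless the whole
// path is "/". "." and ".." are left untouched. POSIX leaves a leading
// "//" implementation-defined; this sketch collapses it like any run.
// Returns false if `out` is too small.
bool canonicalize_separators(const char *path, char *out, size_t cap) {
    if (cap == 0)
        return false;
    size_t n = 0;
    for (const char *p = path; *p != '\0'; p++) {
        if (*p == '/' && n > 0 && out[n - 1] == '/')
            continue;  // skip a repeated separator
        if (n + 1 >= cap)
            return false;  // no room for this char plus the terminator
        out[n++] = *p;
    }
    if (n > 1 && out[n - 1] == '/')
        n--;  // drop trailing separator, but keep a lone root "/"
    out[n] = '\0';
    return true;
}

// Hypothetical helper: syntactic equality after separator
// canonicalization only. Over-long paths compare unequal (simplification).
bool paths_syntactically_equal(const char *a, const char *b) {
    char ca[4096], cb[4096];
    if (!canonicalize_separators(a, ca, sizeof ca) ||
        !canonicalize_separators(b, cb, sizeof cb))
        return false;
    return strcmp(ca, cb) == 0;
}
```

Under these rules /usr//lib/ equals /usr/lib, but a/../b and b stay distinct; a developer who wants them merged would resolve both first.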

Yes, I think we want to put more work into resolve-beneath operations here for any method directly on FilePath.

That being said, FilePath.Components is an RRC (i.e. algebraic) view of the syntactic components of a path, and thus has the generic Collection algorithms implemented syntactically. I think this is fine.


Another thing I found was that we probably want to parse named forks correctly on Darwin, and similarly /.vol/ prefixes.

Yes, named forks were one of the few other things I had in mind. This is actually kind of a blind spot for file paths. If you do mkdir -p forks/..namedfork; cd forks; touch ..namedfork/rsrc:

  • from inside forks, you can access ..namedfork/rsrc;
  • from the outside (e.g. if your path looks like forks/..namedfork/rsrc), the filesystem will try to find a resource fork on forks and path resolution fails.

As a result, you can create a file that cannot be opened with an absolute path.

NTFS has named forks too: they call them “alternate data streams” and the syntax is path\to\file.ext:streamname. Unaware applications are often open to serious security bugs if they conflate the stream name with the extension.
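A sketch of that conflation in C, for a single final component with no directory part: a naive "after the last dot" extension check sees txt in evil.exe:notes.txt, while the primary file is evil.exe. Splitting at the first colon before looking for the extension avoids that. `split_stream` and `naive_extension` are hypothetical helpers, and stream-type suffixes such as :$DATA are not interpreted.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical result type: views into the original component string.
struct ads_split {
    size_t name_len;     // length of the file-name part (before the first ':')
    const char *stream;  // start of the stream name, or NULL if no ':'
    const char *ext;     // start of the extension within the name, or NULL
    size_t ext_len;      // length of that extension
};

// Naive: everything after the last '.' in the whole component.
const char *naive_extension(const char *component) {
    const char *dot = strrchr(component, '.');
    return dot ? dot + 1 : NULL;
}

// Hypothetical helper: split at the first ':' into name and stream, then
// take the extension from the name part only.
struct ads_split split_stream(const char *component) {
    struct ads_split s = {0};
    const char *colon = strchr(component, ':');
    s.name_len = colon ? (size_t)(colon - component) : strlen(component);
    s.stream = colon ? colon + 1 : NULL;
    // Search backwards for '.' inside the name part, never the stream part.
    for (size_t i = s.name_len; i > 0; i--) {
        if (component[i - 1] == '.') {
            s.ext = component + i;
            s.ext_len = s.name_len - i;
            break;
        }
    }
    return s;
}
```

For evil.exe:notes.txt, naive_extension reports txt while split_stream reports the name evil.exe with extension exe and stream notes.txt.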

I would really like to see what a file path design built around openat looks like. I imagine that instead of starting with a prefix FilePath that you append components to, you would start with a PathBase (representing a directory FD) that lets you open files or other PathBases based on path components relative to them, or something like that. There would probably be more emphasis on open files than file paths.
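A rough C sketch of that shape on POSIX, assuming openat(2) with O_DIRECTORY and O_NOFOLLOW: the "base" is a directory file descriptor, and a relative path is opened one component at a time, refusing absolute paths, .. components, and symlinks along the way. `open_beneath` is a hypothetical name; on Linux 5.6 and later, openat2(2) with RESOLVE_BENEATH performs a related, less strict check inside the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

// Hypothetical "PathBase"-style helper: opens `relpath` beneath the
// directory `base_fd`, one component at a time, rejecting absolute paths,
// ".." components, and symlinks anywhere along the way.
// `flags` applies to the final component only and must not include
// O_CREAT (no mode argument is passed). Returns a new fd or -1 with errno.
int open_beneath(int base_fd, const char *relpath, int flags) {
    if (relpath[0] == '/') {
        errno = EXDEV;  // the errno Linux openat2(RESOLVE_BENEATH) uses for escapes
        return -1;
    }
    char buf[4096];
    size_t len = strlen(relpath);
    if (len >= sizeof buf) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(buf, relpath, len + 1);

    int dir = fcntl(base_fd, F_DUPFD_CLOEXEC, 0);  // owned copy, replaced as we descend
    if (dir < 0)
        return -1;
    char *save = NULL;
    char *comp = strtok_r(buf, "/", &save);
    while (comp != NULL) {
        char *next = strtok_r(NULL, "/", &save);
        if (strcmp(comp, "..") == 0) {
            close(dir);
            errno = EXDEV;
            return -1;
        }
        if (strcmp(comp, ".") != 0) {
            // Intermediate components must be real directories.
            int fl = (next != NULL) ? (O_RDONLY | O_DIRECTORY) : flags;
            int fd = openat(dir, comp, fl | O_NOFOLLOW | O_CLOEXEC);
            int saved = errno;
            close(dir);
            if (fd < 0) {
                errno = saved;
                return -1;
            }
            dir = fd;
        }
        comp = next;
    }
    // An empty path (or only ".") yields a duplicate of the base itself.
    return dir;
}
```

Refusing every symlink is stricter than necessary, but it keeps the guarantee easy to state: each component is resolved relative to a directory the caller already holds.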

I would be extremely grateful for this change.

For apple/container and apple/containerization, the bulk of our file path manipulation was done with String and URL, and neither is comfortable to use. The URL type has a lot of evolutionary bulk, which has made it confusing to use, and if you need to work with relative paths, it’s not really the right type.

We plan to move everything related to filesystem paths to FilePath in these projects over the coming months, and since we make liberal use of the Linux static SDK, having FilePath as “batteries included” would work really well for us.

My preference is that the core FilePath be structural only: lexical path manipulation, zero OS interaction.

The types that interact with the OS via filesystem paths should be near at hand and ergonomic, and it’d be great to have an intuitive correspondence between FilePath-based operations and FileDescriptor-based counterparts for TOCTOU-safe work without having to dive down to ...at(2) system calls.

I don't think that my suggestion would be that big of a departure from what apple/container does (I haven't looked at containerization yet): you already carry around directory URLs and try to open things relative to them. That's exactly what openat is very good at, except that you would pass around a base directory file descriptor ("PathBase" in my 5-line pitch) instead of a URL.

I would like to see how that plays out because there's several TOCTOU bugs in apple/container that API design can prevent. For instance, BuildFile.resolvePath gives you either a path to Dockerfile or Containerfile based on which one exists when it runs. By the time you get around to opening the file, which one exists can have changed from under your feet. Or, in DirectoryWatcher, you check if the file exists and is a directory before opening it, but the check is unnecessary at best because that state can change between the time you check and the time you open it. Ideally, you would open the file and verify that what you opened is a directory. These two-step checks are encouraged by APIs that put the focus on file paths rather than file descriptors.
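For comparison, a C sketch of the descriptor-first versions of those two patterns, assuming POSIX open(2), openat(2), and fstat(2): open first and inspect what was actually opened, and try each candidate name in order while keeping the first descriptor that opens. The helper names are hypothetical and not taken from apple/container.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: open first, then check what was opened.
// The check applies to the object we now hold, so it cannot go stale.
int open_directory(const char *path) {
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    if (!S_ISDIR(st.st_mode)) {
        close(fd);
        errno = ENOTDIR;
        return -1;
    }
    return fd;
}

// Hypothetical helper: try each candidate (relative to dir_fd) in order
// and return the first descriptor that opens, storing its index in *which.
// The caller reads through the returned fd instead of reopening by name.
int open_first(int dir_fd, const char *const names[], size_t count, size_t *which) {
    for (size_t i = 0; i < count; i++) {
        int fd = openat(dir_fd, names[i], O_RDONLY | O_CLOEXEC);
        if (fd >= 0) {
            *which = i;
            return fd;
        }
        if (errno != ENOENT)
            return -1;  // a real error (permissions, I/O), not "missing"
    }
    errno = ENOENT;
    return -1;
}
```

Passing O_DIRECTORY to open(2) folds the directory check into the open itself; the fstat form is shown because it generalizes to other checks. In the common case, open_first makes one call per candidate instead of an existence check followed by an open.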

I'm not saying that this is greatly significant for the container project (in general I think these just lead to a slightly worse error than if the first check failed), but the implications of moving FilePath as is to the standard library are that:

  • it is good enough for almost everybody;
  • SDK API surface is encouraged to vend FilePaths.

This is significant for a lot of projects. We see a lot of security bugs caused by mishandling of file paths. I would take FilePath with the changes that Michael is on board with over FilePath without them, but I think that there's a safer and faster design around the corner that we are missing out on in favor of old habits.

I think that you are trying to avoid syscalls for performance reasons. I want to point out as well that checking if a file exists before trying to open it is at least twice as many syscalls as just opening it.

No promises on whether I'll have any time to do that soon, but I'd be interested in seeing what it would take to move container to use less of FileManager.

I'm not saying that this is greatly significant for the container project

Those examples, maybe not, but this one I think illustrates your point quite clearly.

I think that you are trying to avoid syscalls for performance reasons.

Not at all. It’s because the code was developed by a team with developers of varying experience with this particular security concern, and it takes time (and ideally contributions from those with significant experience) to evolve it into a better state.

I think that we agree that it should be easy to have types right at hand that allow us to think in terms of paths but act in terms of fds, but differ on what the concerns of “the most basic FilePath” are. My opinion on that is less strong than my desire to be able to use FilePath without import System(Package)?

one thing that always infuriated me about swift-system’s FilePath.Component is the way it is ExpressibleByStringLiteral but not ExpressibleByStringInterpolation.

i understand why an API designer might be tempted to outlaw string interpolations in this domain, but i find it just gets in the way in practice and offers little in real safety benefit to compensate.

My preference is that the core FilePath be structural only, lexical path manipulation, zero OS interaction.

My first answer to “why this preference?” is “what if I want to reason about filesystem pathnames that don’t correspond to any mounted filesystem?”

The same problem exists in reverse though: the results might be incorrect if you do not consult the file system.

Consider C:\Application\Data\Private\Link\.. as an example. If you normalize this, you end up with C:\Application\Data\Private, but Link might be a junction that puts you at D:\. As such, C:\Application\Data\Private\Link\.. should give you D:\, but lexical processing gives you C:\Application\Data\Private, and you can suddenly end up leaking data.

Or, if you want something more Windows-specific: a minifilter driver could alter file access via IRP_MJ_DIRECTORY_CONTROL, so early normalization may leak content that the minifilter driver hides. There is also no guarantee of determinism: the filtering can be applied per process, per thread, or per call.

The fundamental difference in the Windows FS model is that no user-mode path resolution is correct: you risk incorrect results if you do not round-trip through the kernel at each point.

I agree that even if it's pared down entirely to filesystem use cases, you still need to have file paths that don't correspond to any file, if for no other reason than that it's how you create files.

It's also useful to be able to work with paths as they are typically represented in archives and URLs, but this comes with the very important caveat that FilePath as proposed is a native file path, and native file paths may have important differences from (say) a file path that you would find in a tar archive, or from the subtleties a command-line tool expects. Using FilePath in that way is probably wrong at least on some platforms (and therefore should probably not be encouraged).

In general, I think I'm supportive of more ergonomic tools to build relative paths, parse components from a native path of some flavor, and serialize as a path of some flavor. We should generally not use strings to decide if a path contains another one (or at least this is very annoying to do reliably in a cross-platform way).
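For instance, a relative path can be built component-wise instead of by string slicing. A C sketch, assuming both inputs are absolute, lexically normalized POSIX paths; `relative_path` is hypothetical, and since it never consults the file system, the symlink and junction caveats raised earlier in the thread still apply.

```c
#include <stdbool.h>
#include <string.h>

// Appends `len` bytes of `s` to `out`, tracking the used length in *n.
// Returns false if the bytes plus a terminating NUL would not fit.
static bool append(char *out, size_t cap, size_t *n, const char *s, size_t len) {
    if (*n + len >= cap)
        return false;
    memcpy(out + *n, s, len);
    *n += len;
    out[*n] = '\0';
    return true;
}

// Hypothetical helper: computes a relative path leading from directory
// `from` to `to`. Both must be absolute and normalized (no ".", "..", or
// repeated separators). Returns false if `out` is too small.
bool relative_path(const char *from, const char *to, char *out, size_t cap) {
    if (cap == 0)
        return false;
    out[0] = '\0';
    size_t n = 0;
    // Skip the leading components the two paths share.
    const char *f = from, *t = to;
    for (;;) {
        while (*f == '/') f++;
        while (*t == '/') t++;
        size_t fl = strcspn(f, "/"), tl = strcspn(t, "/");
        if (fl == 0 || fl != tl || memcmp(f, t, fl) != 0)
            break;
        f += fl;
        t += tl;
    }
    // One ".." for each remaining component of `from`.
    while (*f != '\0') {
        if (n > 0 && !append(out, cap, &n, "/", 1)) return false;
        if (!append(out, cap, &n, "..", 2)) return false;
        f += strcspn(f, "/");
        while (*f == '/') f++;
    }
    // Then whatever remains of `to`.
    if (*t != '\0') {
        if (n > 0 && !append(out, cap, &n, "/", 1)) return false;
        if (!append(out, cap, &n, t, strlen(t))) return false;
    }
    // Same directory: use ".".
    if (n == 0 && !append(out, cap, &n, ".", 1))
        return false;
    return true;
}
```

From /usr/local/bin to /usr/share/doc this yields ../../share/doc; comparing whole components also keeps /a/bc and /a/b from being treated as related.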

Not really answering anything, just throwing wood into the fire and documenting some things from an offline conversation. From my perspective: I know that Python's os.path Windows file path handling changes pretty frequently and pretty much gives up on any sort of resolution if you're not actually on Windows, but there are plenty of places where Python has been very useful, even if it doesn't get it right 100% of the time (something something, systems can be complete or consistent, something something). Python also has the newer pathlib library, which separates the notion of a "pure path" from a "concrete path": a pure path is for computation, while a concrete path has I/O operations and makes syscalls. I think a similar split could help alleviate some of this.

Consider something like: C:\Application\Data\Private\Link\.. . If you normalize this, you end up with C:\Application\Data\Private, but Link might be a junction and puts you at D:\. As such, C:\Application\Data\Private\Link\.. should give you D:\, but lexical processing gives you C:\Application\Data\Private and you suddenly can end up leaking data.

I don't know about the minifilter, but the junction behavior seems to be generally consistent with symlinks on Unix/Linux. I set up the directory structure (minus the private subdir) on Windows and pointed it at my CMake directory because I don't have a D: drive.

```
C:\Application\Data>dir
 Volume in drive C is Windows
 Volume Serial Number is 90F9-FA94

 Directory of C:\Application\Data

02/27/2026  01:34 PM    <DIR>          .
02/27/2026  01:23 PM    <DIR>          ..
02/27/2026  01:34 PM    <JUNCTION>     Link [C:\Program Files\CMake]
               0 File(s)              0 bytes
               3 Dir(s)  598,418,132,992 bytes free

C:\Application\Data>cd C:\Application\Data\Link\..\

C:\Application\Data>dir
 Volume in drive C is Windows
 Volume Serial Number is 90F9-FA94

 Directory of C:\Application\Data

02/27/2026  01:34 PM    <DIR>          .
02/27/2026  01:23 PM    <DIR>          ..
02/27/2026  01:34 PM    <JUNCTION>     Link [C:\Program Files\CMake]
               0 File(s)              0 bytes
               3 Dir(s)  598,418,132,992 bytes free
```

Jumping back up one in the command prompt takes you back to C:\Application\Data.

To make sure it's not the command prompt being clever, here is a little C program:

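```c
#include <stdio.h>
#include <windows.h>

int main(void) {
    char buffer[MAX_PATH + 1] = {0};
    // Go through the junction, then back up one level with "..".
    const char* directory = "C:\\Application\\Data\\Link\\..\\";
    // Explicit ANSI (-A) variants, since the strings are char*.
    if (!SetCurrentDirectoryA(directory)) {
        fprintf(stderr, "SetCurrentDirectoryA failed: %lu\n", GetLastError());
        return 1;
    }
    if (GetCurrentDirectoryA(sizeof buffer, buffer) == 0) {
        fprintf(stderr, "GetCurrentDirectoryA failed: %lu\n", GetLastError());
        return 1;
    }
    printf("Current directory: %s\n", buffer);
    return 0;
}
```

```
PS C:\Application\Data> clang .\hello.c
PS C:\Application\Data> .\a.exe
Current directory: C:\Application\Data
PS C:\Application\Data>
```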

I know that programs mess up symlinks all of the time, and hard-links are even worse, but I think that as long as you stay away from trying to actually resolve things,

That all said, if you ask the kernel to return the filepath to the file from a file handle, I would expect that to completely ignore symlinks and basically just walk the FS tree from the inode to the root (or however it wants to store that data). Using GetFinalPathNameByHandleA on Windows, struct kinfo_file on FreeBSD, and the /proc/self/fd dance on Linux should (and does seem to) return the fully resolved filepath, which can be confusing if you're using your current directory, which may not be fully resolved.

It looks like Foundation does use this, which raises some questions about how we want this type to interface with Foundation's behavior. Maybe we need to design separate types for "resolved" vs "unresolved" file paths? Unresolved file paths can textually collapse things all day long, but can't be compared with resolved file paths, which actually need to map to something on disk. Looking at Python again, pathlib does kind of have this notion, with PurePath being a purely syntactic thing and Path being an actual "concrete" path. You can use pure paths to construct paths for other operating systems, but concrete paths only work on the OS that you're on, since they involve making syscalls.

If I'm understanding the (fascinating!) discussion here, we're suggesting that there should be some distinction between an abstract path (e.g., a CLI input from the user, or the program looking for a config file in a standard location) and a "resolved" path (which is acquired by comparing an abstract path, component-by-component, with the actual current content of the filesystem).

But isn't that in and of itself just creating a TOCTOU bug? Unless a resolved path holds open whatever it once referred to, there's no guarantee that any of the filesystem checks done during resolution is still valid when it comes time to use such a path?

I can understand that certain manipulations of abstract paths are non-obviously security-sensitive; I just don't see how anything can make that any safer. Is this problem just completely intractable? What would a safe filesystem API (that also allows for things like CLI arguments specifying absolute or relative paths) look like?

(edit) I'm also comparing to Rust, whose Path more or less matches the current FilePath, design-wise — it's an abstract path that can be manipulated without reference to the filesystem. And maybe my duckduckgo-fu is weak, but I'm not finding a ton of people screaming about the insecurity of path handling in Rust?

"resolved" path (which is acquired by comparing an abstract path, component-by-component, with the actual current content of the filesystem).

But isn't that in and of itself just creating a TOCTOU bug? Unless a resolved path holds open whatever it once referred to, there's no guarantee that any of the filesystem checks done during resolution is still valid when it comes time to use such a path?

Correct.
