Hi
The next feature I'm working on for WebURL
is file URL <-> path conversion. This thread is to let you know what's happening, what the challenges are, and to discuss if there are any issues with the approach. I would be particularly grateful for any feedback on how I intend to handle Windows paths, as it's the trickiest platform and the one I have the least experience with.
The goal is that you can create a file URL from a local path, give it to an application which supports both local and remote resources, and that application will be able to reverse the transformation to end up at exactly the same local path.
The most challenging aspect of this is Windows support, as it has 3 different kinds of paths (DOS paths, UNC paths, and DOS device paths), each of which has sub-types with their own sets of challenges. I'm basing the implementation on Chromium's utilities, because there is a general consensus among participants in the URL Standard to align with Chrome when it comes to handling files on Windows. However, Chrome's implementation is not extremely well tested, so some aspects of its behaviour are not very well defined, and other aspects make it impractical for direct exposure to developers as API (e.g. you can create file URLs with hostnames on any platform, but those URLs will fail to round-trip back to paths on any platform but Windows). So some adjustments will have to be made and test coverage will have to be expanded.
I'd like for WebURL to incorporate modern best practices for security for reliability, and that means some kinds of paths won't be supported, sometimes depending on the platform:
-
Relative paths will not be supported.
URLs cannot express relative paths - for some URL
file:///foo
, there is no way to say thatfoo
should be resolved relative to the working directory of application processing the URL. This means that when creating a file URL from a path, we would need to resolve it in to an absolute path.Automatically resolving such paths has the potential to leak internal server/application details - for instance, if some part of the path contains data derived from user input, it may be possible to escape the intended filesystem subtree so that the resolved path points to an internal configuration file. This is known as a directory traversal attack.
In order to help mitigate these kinds of errors, it will not be possible to create a file URL from a relative path. No automatic resolution will be performed. Similarly, when creating a path from a file URL, if the result would be interpreted by the operating system as a relative path, the function will fail and throw an error instead.
If developers need to create a file URL from a relative path, they can use libraries such as
swift-system
or utilities provided by the operating system, such asrealpath
, to resolve the path relative to whichever base directory is appropriate. Those libraries can also provide utilities to check that the resulting path does not escape the intended subtree. I feel that there is value in keeping this as an explicit, separate step for applications which deal in relative paths.On POSIX, this is relatively straightforward. For path -> URL conversion, we reject paths which do not begin with a "/". For the reverse transformation (URL -> path), we don't need to do anything, since file URLs are assumed to have absolute paths and always start with a "/".
On Windows, things are more complicated. Fortunately, of the 3 kinds of paths, only DOS-style paths can be relative. Unfortunately, DOS-style paths themselves support 3 kinds of 'relativeness':
example relative to foo\bar current directory, current volume \foo\bar current volume D:foo\bar current directory on given volume Note that even though the last example includes a volume, the path after it does not begin with a '/', so it is considered relative to the current directory of the D: volume (every volume can have a different "current directory" on Windows). Leaving out that slash is a common source of bugs.
I intend that creating a file URL from any of these 3 paths, or creating any of these paths from a file URL, would fail on Windows. This means that, for instance, the URL
file:///foo/bar
can be turned in to a path on POSIX systems, but not on Windows systems. -
Creating a POSIX path from a file URL with a hostname will not be supported.
There is no obvious interpretation of a file host on POSIX systems, where remote filesystems are typically mapped to the local filesystem. We could just ignore the host, but RFC-8089 discourages this and Chrome stopped doing it because it could lead to spoofing issues (e.g.
file://accounts.google.com/some/local/file
andfile:///some/local/file
would display the same content). File hosts will be supported on Windows, where they will result in UNC paths. -
Creating a path from a URL containing percent-encoded directory separators will not be supported.
For instance,
file:///some/folder/..%2F..%2F..%2F..%2F../etc/passwd
is a valid URL. However, when we create a file path from a URL, we need to decode all percent-encoding (percent-encoding has no meaning in FS paths). This would result in the path/some/folder/../../../../../etc/passwd
which again, can leave users open to directory traversal attacks. We're following Chrome here, which determined that this was a security risk.Note that percent-encoded backslashes (%5C) are allowed on POSIX systems, as the OS does not consider them to be path separators. Windows supports both, so we need to block both on Windows.
It's worth noting that this behaviour is more restrictive than Foundation's URL type, which automatically resolves relative paths using the current process' working directory and allows percent-encoded directory separators. For the reasons given above, I consider both of these to be security risks.
Examples of Foundation doing things I think it probably shouldn't be doing
// Example: Foundation automatically resolves relative paths.
import Foundation
// CWD = /home/myservice/
let templateName = "../../../etc/passwd"
URL(fileURLWithPath: "templates/\(templateName)").absoluteURL // file:///etc/passwd
// Example: Foundation decodes pct-encoded separators
import Foundation
let url = URL(string: "file:///some/folder/..%2F..%2F..%2F..%2F../etc/passwd")!
print(url) // file:///some/folder/..%2F..%2F..%2F..%2F../etc/passwd
url.withUnsafeFileSystemRepresentation {
print(String(cString: $0!)) // /some/folder/../../../../../etc/passwd
}
(Note that in the latter case, .standardizedFileURL
will decode and simplify the URL to file:///etc/passwd
before it gets to the OS, but the regular .standardized
will not. So there is technically a way around it, but you need to know precisely what you need and go looking for it).
My hope is that this strikes a pragmatic balance, allowing as many file paths as we can reasonably and reliably support, while learning from some of the security risks that browsers have encountered and improving on the APIs that Swift developers currently have at their disposal. Each of these failure conditions will throw descriptive errors, so if they occur in your application, you'll have some guidance on what went wrong and how you can fix it.
Currently, implementation is blocked by some changes I'd like to make to the URL Standard in order to better support UNC hostnames, but I may release a POSIX-only update if it takes too long to make progress on that issue.
So that's it. What do you think - are these safeguards too restrictive? Have you ever been affected by a directory traversal attack? Or a bug stemming from Foundation's automatic relative path resolution? Do you know of any other edge-cases to consider? It's all useful signal, so I'd appreciate the feedback.
Thanks for reading!