Foundation.URL sidesteps this nicely: rather than appending path A to path B, you resolve path B relative to path A. With that framing, the situation where B is absolute is less of a surprise. Unfortunately, I have no idea how to spell this as a mutating operation on A…but if someone comes up with a good name I think append
is worth replacing.
I’m excited to have low-level Swift-native path operators!
Is there some situation where path.isAbsolute == path.isRelative
?
If not, it makes more sense to me to keep only one of these (I would suggest isRelative
since it’s the diminutive case), or — more clunkily — to have this return an enum { .absolute, .relative }
.
It’s not Swift library or Cocoa standard to have two opposing flags for the same underlying value.
-Wil
Just want to make sure that you're aware of GitHub - mxcl/Path.swift: Delightful, robust, cross-platform and chainable file-pathing functions.
It does a very similar job, is also cross-platform, and has been through several iterations.
I think there is probably a lot of knowledge and experience which has been embedded in those updates.
Correct, Windows is interestingly different here. It has the notion of 26 volume roots (A - Z). Each one is a separate stack, and you explicitly change between them. However, the operation occurs on the appropriate volume. A change of directory operation is always rooted at the currently selected volume, but an absolute path (which includes the volume, as opposed to a root-relative path which is not absolute) performs the operation on the specified volume without changing the volume selection.
I can see see why it may be confusing for some to append a "/bin" to concatenate the string literal, and I can see the use of it too - you may have a path which you are not guaranteed is path separator delimited and wish to append a component to it.
I feel like the current behaviour makes sense, though I could also see an argument for appending an absolute path to an existing path to be an invalid operation and perhaps we should make this operation throw in that case. (Seems that @finestructure made the same comment)
Relative paths are an interesting subset of paths (not covered by this initial proposal). There are operations which could benefit from a more thorough modelling of the behaviour of relative paths. By keeping both here, we have the opportunity to extend this in the future with a richer vocabulary for working with relative paths. In fact, if you look at swift-tools-support-core, you will find that it found it useful to distinguish AbsolutePath
and RelativePath
.
This "duplication" is an expression that relative paths are relative paths rather than paths which are not absolute.
Maybe .appendRelative(...)
or .append(relative: ...)
?
Although it should probably still throw if the path is absolute.
In general I think this looks really great! I haven’t had time to dive really deeply into it yet, so I apologise for this in advance if it’s addressed, but I do have a specific feature I’d like us to consider adding here.
Path traversal attacks are one of the most common logical security flaws in software. They regularly affect server code, and also sometimes crop up in somewhat unexpected places elsewhere. These are usually triggered by resolving a file path derived from remote input and making one of two mistakes:
- Not validating that the resolved path is within one or more valid directories, or;
- Not normalizing the resolved path in the way that the OS will, e.g. forgetting to collapse
..
or mishandling drive letters
While we can’t defend against users forgetting to do this, we can make it easier for users to avoid these pitfalls by having a FilePath.contains(otherPath:)
or FilePath.isContainedWithin(parent:)
or some similar method. This would automatically normalize the paths and then perform the appropriate comparison. This would make it much easier for programmers to defend against this surprisingly common bug form in a genuinely effective way.
It's a bit hard challange for me the Swift language
I like the idea of throwing if we keep the API as-is. If we can avoid that with a different API, that would be best, especially for any future APIs that interact with the file system. C+17 has those operations as free functions in the filesystem
namespace, while Rust has them methods on Path
. If we do end up putting those operations directly on FilePath
as methods, it would be nice if throws
was a syntactic hint that an operation consults the filesystem (in addition to being called out in the documentation).
But, I do like the idea and also like calling it something other than append
. I went with append
because Swift uses it for similar operations (albeit over homogeneous collections) and there was precedent in C++17.
Beyond broad precedent in other languages, I think we do want some operation that has this cd
-liked behavior. The operation models the path as a stack that we push batches of values onto, where there's a special sentinel value that clears the stack prior to pushing the rest of the batch. I'm leaning heavily towards a name that has "push" in it, like Rust's. Hopefully a better name there will avoid issues with "combine" or "join" in other languages, for which there is significant user confusion.
public mutating func push<C: Collection>(_ c: C) where C.Element == Component
If we can split root off from Component
, then there is no issue -- we've hoisted the "special sentinel value" consideration into the programmatic level (users explicitly clear the path prior to a push). If we cannot, then at least there will be a deliberate choice between components
and relativeComponents
(which was future work, but could be justified if it eases this problem).
Another observation is that this behavior seems to be the most surprising when using strings or string literals. I wonder if we should have an overload with looser semantics for a String that will ignore a leading slash.
Yes, I also saw this in Java, though the word "resolve" often carries connotations of resolving symlinks as in pathlib.
That was definitely considered and debated. I think it depends on how firmly reified the opposing values are in the given domain. On one extreme, use of the word "not" is a strong hint that you should not name the opposing values (e.g. isNotEmpty
would be avoided).
I feel like relative and absolute paths are strongly established concepts on their own. Even though they can be defined as "not absolute" and "not relative", they tend to be used and documented with this established terminology. It's also not always clear which should be the canonical name, e.g. FilePath.relativePath
and a ComponentView.relativeComponents
exists.
(This is all under the assumption that "absolute" means "fully-qualified" on Windows, otherwise they are not opposites.)
For example, the components of a relative path can be treated generically as a RangeReplaceableCollection
as it is effectively homogenous, which is not true of absolute paths. And again, we'd have to be explicit about Windows roots (which can be present on a relative path); such a type would just be the components beyond the root.
I am definitely aware of that! It will be more relevant as a comparison when we tackle APIs that interact with the file system.
I am very interested in guarding against this and any way that we (reasonably) can. For example, we normalize separators on construction which includes dropping any trailing separator. If that means rsync
needs to model different behavior by adding a Bool
rather than munging it into the path representation, that's for the better.
(Similarly, it's worth it even if it makes super sketchy Windows practices involving directories named after DOS devices more explicit).
Beyond that, we are very explicit what normalizations are lexical (i.e. do not consult the file system) and which are not (those are future work).
Would this be a lexical normalization or would this be realpath
? There are multiple levels of "normality" that might be desired by a tool (taken from the gist):
Generally it is sufficient to use lexical normalization. OWASP's guidance uses the Java normalize()
function as an example of something to use to combat path traversal, and that's really what we're shooting for.
The presence of a symlink is always going to be a bit of a wrinkle, but in general while symlinks do allow escaping from the specific directory tree they don't allow arbitrary path traversal. They glue a different directory tree into this one. You're only at risk if for some reason you symlinked /
into your directory tree, and that's a pretty bad idea to begin with.
This is definitely not sufficient to fully protect oneself, but it's a good tradeoff of performance (no syscalls!) and safety. If we make it easy to do, we can incentivise users to actually do it.
I know you're exploring other options as well, but throwing on append
strikes me as a little heavy handed. If appending a rooted path is disallowed, what about returning a boolean like SetAlgebra.insert(_:)
and Collection.formIndex(_:offsetBy:limitedBy)
?
If appending a rooted path is disallowed, what about returning a boolean
We need to define some behavior in this situation, and no choice is a clear cut winner for me. We could
- Trap if the argument is absolute
- Silently ignore the root (surprising in programmatic use)
- Silently drop the base (surprising in string literal use)
- Return a bool describing what we "silently" did (either 2 or 3 above) for those studious users that would check it.
So this would be a "do nothing, return false" variant.
Ya I think I'm just voting that throwing feels like a disproportionate response for a special case of a syntactic operation. Do we have any guildelines written down for when to throw errors vs return an Optional/Bool?
Update: I'm working on a new prototype and some API changes that explore a bolder, but I think cleaner, solution.
-
Root
is a separate type fromComponent
, i.e. components are non-root folder/file names- This also makes it clearer for future Windows root parsing/analysis APIs
-
FilePath.init
s taking components now have an optional root parameter with argument label.- In my use so far, these more explicit initializers seem to improve clarity.
- Also adding
FilePath.init(root: Root?, components: Component...)
-
FilePath.ComponentView
is similarly a view of the relative portion- It now conforms to
RangeReplaceableCollection
for less common but useful things like insertion-in-the-middle - It no longer has
root
,relativeComponents
, ordirname
, as roots are irrelevant to this view - It no longer has
basename
, which is now just.last
- It now conforms to
-
FilePath.components
might be renamed toFilePath.relativeComponents
, to avoid confusion -
FilePath.Component
has aKind
enum and we dropisParent/CurrentDirectory()
-
append
/pushLast
overhaul:-
pushLast(_:Component)
is renamed toappend(_:Component)
, which cannot be a root - We add
append(_:String)
which can treat a leading slash as a separator instead of a root, if applicable - Previous
append(_:FilePath)
is renamed toFilePath.push(_:FilePath)
- This implements the
cd
-like stack semantics, where pushing a root clears anything before it in the stack.
- This implements the
-
- Some renames:
stripPrefix -> removePrefix
,popLast -> removeLast
Some questions I'm considering:
- Should
FilePath.Component
be renamedFilePath.RelativeComponent
?- That seems onerous: there's nothing "absolute" about components themselves. There's more so the question of root-or-not.
- Should
ComponentView
beRelativeComponentView
?- It's rare to type the name itself and this is a view of the components of the relative portion
- Better name for
FilePath.push()
's cd-like, file system navigating semantics?-
navigate(to:)
sounds like something is happening beyond a syntactic modification of the path
-
- We're going with
lexicallyNormalize()
andisLexicallyNormal
, which mirrors other words likecanonicalize()
andisCanonical
, but there could be an argument forisLexicallyNormalized
. - What happens to
componentStrings
?- It's convenient (even more so when root is separated out), but is the name confusing?
- Alternatively, it could be dropped or deferred.
- The
dirname
/basename
naming debate-
basename
is now simplyrelativeComponents.last
, but makes sense to keep if we keepdirname
-
dirname
has precise semantics, while something likeparent
would mislead (worse) for a path like/bin/ls/..
- Swiftifying the name to
directoryName
would establish a false intuition that it is a directory, or even the directory immediately above (bin
inusr/bin/ls
) and would put distance from its precise semantics
-
- Non-global-namespace for
PlatformChar
andPlatformUnicodeEncoding
- Don't necessarily want to "steal" the name "Platform" from downstream code, but it makes sense for System
- We're likely to add a
enum CTypes {}
soon anyways, so we could host them there.
- Consider adding
lexicallyContains
or similar as @lukasa requested - Ability to construct Windows-style paths on Unix and vice-versa
- This would be useful for scripts that output paths as text for different platforms
- While these could be print methods, they wouldn't allow for Windows root behavior on Unix
- E.g.
\\server\share\
is just a root, soremoveLast
would return false on Windows
- E.g.
- Internally, we test by having the choice be contextual, driven by thread-local storage. This has some problems if we make it general use:
- Thread local storage would be incompatible with task-based concurrency
- Even if we use (proposed future) task-local storage, we have to worry about escaping paths
- Storing a bit on FilePath complicates initialization and conversion
- E.g. what should a string literal use?
Update: I'm working on a new prototype and some API changes that explore a bolder, but I think cleaner, solution.
I very much like these changes. The whole API is impressive so far, and separating the FilePath.Root
concept into its own type makes a lot of sense to me. I also like the rename of popLast
, since pop
to me implies removing and returning a portion of the collection.
Non-global-namespace for
PlatformChar
andPlatformUnicodeEncoding
Has hosting these in FilePath
been considered? FilePath.PlatformChar
is a bit wordy in a vacuum, but makes sense to me as "the smallest unit of a file-path string on the current platform that I can type on a keyboard." FilePath.PlatformUnicodeEncoding
is similarly descriptive. Does Windows or Unix use different unicode encodings for different contexts? If not, is it possible that Swift may in future support a platform that does? Nesting these under FilePath
expresses the file-path context for these abstractions now and in the future.
We might rename these to FilePath.Char
and FilePath.UnicodeEncoding
for brevity. Similar to how Swift's Int
type has an architecture-dependent size, FilePath.Char
would have a platform-dependent underlying type, and therefore size. FilePath.UnicodeEncoding
seems also to fit that paradigm to me.