URL Path on Windows

What are the intended semantics of the various path‐related methods of URL? Which methods are supposed to use a URL path separator (i.e. /), and which are supposed to use the platform path separator (i.e. \ on Windows)?

I ask because the behaviour of URL(fileURLWithPath: x).path == x changed between Swift 5.2 and Swift 5.3, where x is a Windows path containing \. Originally it returned true, but now it returns false, because the left side ends up with / instead. None of the documentation specifies what it is supposed to do, so I’m wondering whether the change was a bug or a bug fix.

Everything should return a URL-friendly path. The only time you should get back a path that you can use on Windows is when you explicitly convert it via fileSystemRepresentation. The URL representation should always use /; only when you convert to the file system representation do you get a path with \.
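A minimal sketch of that split, assuming a Foundation that provides `URL.withUnsafeFileSystemRepresentation(_:)` (Darwin Foundation and swift-corelibs-foundation both do). `fileSystemPath(of:)` is a hypothetical helper, not a Foundation API, and decoding the bytes as UTF-8 is an assumption about the platform's file system encoding. The example path is POSIX-style; per the reply above, on Windows `path` would be expected to use `/` and only the file system representation `\`.

```swift
import Foundation

/// Hypothetical helper (not part of Foundation): copies a file URL's file
/// system representation into a String. The bytes are decoded as UTF-8,
/// which is an assumption about the platform's file system encoding.
func fileSystemPath(of url: URL) -> String? {
    url.withUnsafeFileSystemRepresentation { pointer in
        // pointer is nil when the URL has no file system representation.
        pointer.map { String(cString: $0) }
    }
}

// URL-form accessors such as `path` are expected to use "/" everywhere;
// the file system representation is the form to hand to the OS.
let url = URL(fileURLWithPath: "/tmp/example.txt")
print(url.path)
print(fileSystemPath(of: url) ?? "no file system representation")
```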


So then the fact that init(fileURLWithPath:) still interprets \ as a path separator is a bug?

Are there plans to expose the \ variants as String instead of just UnsafePointer<Int8>? The following smells very bad and I don’t even know if the encoding is right, but it is the best I can piece together after several minutes of searching the web.

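import Foundation

// Build a URL for this source file from its C-string (file system) representation.
let thisFile = #filePath.withCString { pointer in
  URL(
    fileURLWithFileSystemRepresentation: pointer,
    isDirectory: false,
    relativeTo: nil
  )
}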
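For comparison, a sketch of the same thing without the C-string round trip, relying on `URL(fileURLWithPath:isDirectory:)` accepting a native path string (later replies in this thread indicate it accepts both `\` and `/` on Windows). It assumes Swift 5.3 or later for `#filePath`; the variable names are illustrative.

```swift
import Foundation

// The approach from the post: go through the C string.
let viaRepresentation = #filePath.withCString { pointer in
    URL(fileURLWithFileSystemRepresentation: pointer, isDirectory: false, relativeTo: nil)
}

// Simpler alternative: let Foundation parse the native path string directly.
// On Windows this is expected to accept "\" as well as "/".
let viaPath = URL(fileURLWithPath: #filePath, isDirectory: false)

print(viaRepresentation.path)
print(viaPath.path)
```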

The parser should interpret \ and / as equivalent in URLs with "special" schemes (file/ftp/http/https/ws/wss, maybe also gopher) on all platforms. However, the parsed and normalised URL should always use /. Unfortunately URL appears to have platform-specific behaviour about how it interprets \ through both initializers:

(backslash is not being interpreted as a path separator on macOS, Xcode 12, Swift 5.3)

But Safari 14 behaves properly and accepts them both:

(Live URL Viewer)
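A simplified, standalone sketch of that rule (not how Foundation's `URL` works internally): for one of the special schemes listed above, every `\` before the query or fragment is rewritten to `/`. `normalizingBackslashes(_:)` is a hypothetical helper; it leaves out `gopher`, which the post marks as uncertain, and ignores everything else a real parser handles (hosts, credentials, percent-encoding, drive letters).

```swift
// Schemes treated as "special" in this sketch (taken from the post above).
let specialSchemes: Set<String> = ["file", "ftp", "http", "https", "ws", "wss"]

/// Hypothetical helper: rewrites "\" to "/" in a special-scheme URL string,
/// leaving any query ("?...") or fragment ("#...") untouched.
func normalizingBackslashes(_ input: String) -> String {
    guard let colon = input.firstIndex(of: ":") else { return input }
    guard specialSchemes.contains(input[..<colon].lowercased()) else { return input }

    let rest = input[colon...]
    let end = rest.firstIndex(where: { $0 == "?" || $0 == "#" }) ?? rest.endIndex
    let head = rest[..<end].map { (c: Character) -> Character in c == "\\" ? "/" : c }
    return String(input[..<colon]) + String(head) + String(rest[end...])
}

print(normalizingBackslashes(#"file:///C:\some\path"#))          // file:///C:/some/path
print(normalizingBackslashes(#"https://example.com\a\b?q=\x"#))  // https://example.com/a/b?q=\x
print(normalizingBackslashes(#"urn:a\b"#))                       // urn:a\b (not a special scheme)
```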

In general, Windows hacks should apply to both platforms. For example, you also should not be able to pop beyond a Windows drive letter root, but URL still lets you (at least on macOS):


(No, I don't know why it keeps those .. components around, either*).

(Live URL Viewer)


* In fact, your browser and any servers or other applications will likely condense file:///../.. into file:///, further warping the understanding of which resource you're actually trying to point to. Let's hope this doesn't result in reading sensitive configuration files or destroying user data! 🤞🍀

(Live URL Viewer)
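A standalone sketch of the behaviour described above, modelled loosely on the WHATWG URL Standard's path-shortening rule for `file` URLs (this is not Foundation's implementation). `resolvingDotSegments(_:)` and `isWindowsDriveLetter(_:)` are hypothetical helpers, and the input is assumed to be an absolute URL path starting with `/`. A `..` never pops past the root, which gives the `file:///../..` to `file:///` condensing mentioned in the footnote, and never pops a lone drive-letter segment such as `C:`.

```swift
/// Hypothetical helper: true for a drive-letter segment such as "C:".
func isWindowsDriveLetter(_ segment: Substring) -> Bool {
    guard segment.count == 2, let letter = segment.first else { return false }
    return letter.isASCII && letter.isLetter && segment.last == ":"
}

/// Hypothetical helper: resolves "." and ".." in an absolute URL path,
/// refusing to pop a leading drive letter.
func resolvingDotSegments(_ path: String) -> String {
    precondition(path.hasPrefix("/"), "expects an absolute URL path")
    // Drop the empty segment produced by the leading "/".
    let segments = Array(path.split(separator: "/", omittingEmptySubsequences: false).dropFirst())
    var stack: [Substring] = []
    for (index, segment) in segments.enumerated() {
        switch segment {
        case ".":
            break
        case "..":
            // Stop at the root, and never pop a drive letter that is the only segment left.
            if let last = stack.last, !(stack.count == 1 && isWindowsDriveLetter(last)) {
                stack.removeLast()
            }
        default:
            stack.append(segment)
        }
        // A trailing "." or ".." names a directory, so keep a trailing "/".
        if index == segments.count - 1 && (segment == "." || segment == "..") {
            stack.append("")
        }
    }
    return "/" + stack.joined(separator: "/")
}

print(resolvingDotSegments("/C:/a/../.."))  // /C:/
print(resolvingDotSegments("/a/../.."))     // /
```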


Sorry, I should have been clearer. Precisely this: when given a path via init(fileURLWithPath:), it will normalize the input into URL form. However, all non-FSR return values should be in URL form, and converting to the FSR should give you the path to pass to the system.


Can you use the UTF-8 code page with Swift for Windows? Can it be set in the Swift runtime with _setmbcp(_MB_CP_UTF8), and/or enforced with a GetACP() == CP_UTF8 precondition?

Foundation for Windows uses UTF-8 in APIs such as CFStringFileSystemEncoding() and NSURL.init(fileURLWithFileSystemRepresentation:isDirectory:relativeTo:).

An open-source System.FilePath might also be useful.

Yes and no. Your input can be UTF-8, but you cannot use UTF-8 for the file system; everything must be UTF-16. Foundation silently takes care of the conversion for you.

Foundation will do what it needs to do internally to make its API work, including converting paths to other encodings, but you should never have to.

As @compnerd noted: you can pass a string you receive from the user to URL(fileURLWithPath:) to turn it into a URL, and on Windows that can be a string that is a valid Windows path when typed. If you need to pass it to Win32/WinRT API, you can use .fileSystemRepresentation, which on Windows is defined to be the UCS-2 encoding expected by the SomethingSomethingW family of functions.
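For illustration only, since Foundation is supposed to do this for you: a sketch of what turning a path into a wide-character string amounts to, using the standard library's `String.withCString(encodedAs:_:)`. `withWidePath(_:_:)` and `wideLength(_:)` are hypothetical helpers; the pointer has the NUL-terminated UTF-16 shape that the `W`-suffixed Win32 functions take, but no Win32 API is called, so it runs on any platform.

```swift
/// Hypothetical helper: exposes `path` as a NUL-terminated UTF-16 buffer.
/// The pointer is only valid inside `body`.
func withWidePath<R>(_ path: String, _ body: (UnsafePointer<UInt16>) throws -> R) rethrows -> R {
    try path.withCString(encodedAs: UTF16.self, body)
}

/// Counts UTF-16 code units up to (not including) the terminating NUL.
func wideLength(_ pointer: UnsafePointer<UInt16>) -> Int {
    var count = 0
    while pointer[count] != 0 { count += 1 }
    return count
}

// "𝄞" lies outside the BMP, so it takes two UTF-16 code units (a surrogate pair).
let path = #"C:\Users\Zoë\𝄞.txt"#
print(withWidePath(path, wideLength), path.utf16.count)
```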

Swift System has abstractions for concepts shared by OSes, but for the more abstract concept of 'this is a location on disk where a file or folder can be read or written' in the core libraries, the currency type Foundation-and-higher levels use is effectively Foundation.URL when the scheme is file, just like on Darwin.

Public APIs taking/returning a fileSystemRepresentation: UnsafePointer<Int8> all seem to be using UTF-8 on Windows. Users of these APIs would need to convert to/from the ANSI code page, unless the UTF-8 code page will be a requirement.

There are also some internal APIs taking/returning a wideFileSystemRepresentation: UnsafePointer<UInt16>. Are these experimental?

Yes, this was the key piece of information that made sense out of everything for me. I hadn’t realized the initializer accepted either separator, and thought I needed to manually normalize them to whichever it expected. But everything appears to be working as intended, the design is sound, and nothing seems to be missing.

Just for final clarification about the platform differences, both of these are supposed to hold, right?

import Foundation

#if os(Windows)
let url = URL(fileURLWithPath: #"C:\some\path"#)
assert(url.lastPathComponent == "path") // The backslashes were treated as separators.
#else // UNIX
let url = URL(fileURLWithPath: #"/some/path/with/a/\/in/it"#)
assert(url.pathComponents.reversed()[2] == #"\"#) // The backslash wasn't a separator.
#endif

I think that is supposed to be correct. If it doesn't hold, please file a bug.