Portability of Swift Package Manager

I managed to come back around to take a look at getting swift-package-manager working on Windows. It seems that some things have gotten better and some things are moving in a direction of being easier to work with.

One of the challenges of porting s-p-m to Windows that remains frustrating is the path representation. swift-tools-support-core has a set of types that wrap the file system representation of a path and perform operations on it. The problem is that a ton of logic has been baked into one particular representation.

Paths on Windows are far more complicated. As an example, the following paths probably (though not definitively, without inspecting the particular disk and machine) refer to the same file:

  • \??\S:\BinaryCache\Release\Windows-x86_64\swift-package-manager\bin\swift-build.exe
  • \\UNC\S:\BinaryCache\Release\Windows-x86_64\swift-package-manager\bin\swift-build.exe
  • \BinaryCache\Release\Windows-x86_64\swift-package-manager\bin\swift-build.exe
  • S:/BinaryCache/Release\Windows-x86_64\swift-package-manager/bin/swift-build.exe
  • \\?\S:\BinaryCache\Release\Windows-x86_64\swift-package-manager\bin\swift-build.exe
  • \\.\Devices\HardDisk1\BinaryCache\Release\Windows-x86_64\swift-package-manager\bin\swift-build.exe
  • \\UNC\localhost\S:\BinaryCache\Release\Windows-x86_64\swift-package-manager\bin\swift-build.exe
  • S:\Binary~1\Release\Window~1\swift-~1\bin\swift-~1.exe
  • S:\Binary~1\Release\Window~1\swift-~4\bin\swift-~2.exe

And there are probably other representations as well (I can think of a couple more). Each of these behaves slightly differently and has different limitations.
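To make the differences between those spellings a bit more concrete, here is a minimal sketch that classifies a Windows path by its prefix using plain string matching. `WindowsPathKind` and `classify` are hypothetical names invented for this illustration; they are not part of swift-tools-support-core, and the sketch does not resolve anything against a real file system.

```swift
import Foundation

// Hypothetical classification of Windows path spellings; the names are
// illustrative and not part of swift-tools-support-core.
enum WindowsPathKind {
    case win32FileNamespace  // \\?\  (skips Win32 path normalization)
    case deviceNamespace     // \\.\
    case ntObjectNamespace   // \??\
    case unc                 // \\server\share\...
    case driveAbsolute       // S:\...
    case driveRelative       // S:foo (relative to the current directory on S:)
    case rooted              // \foo  (relative to the current drive)
    case relative            // foo\bar
}

func classify(_ path: String) -> WindowsPathKind {
    // This sketch matches these two prefixes verbatim, backslashes only.
    if path.hasPrefix(#"\\?\"#) { return .win32FileNamespace }
    if path.hasPrefix(#"\??\"#) { return .ntObjectNamespace }

    // For the remaining forms, treat "/" and "\" as the same separator.
    let p = path.replacingOccurrences(of: "/", with: "\\")
    if p.hasPrefix(#"\\.\"#) { return .deviceNamespace }
    if p.hasPrefix(#"\\"#) { return .unc }

    // A leading ASCII letter followed by ":" is a drive designator.
    let head = Array(p.prefix(3))
    if head.count >= 2, head[0].isASCII, head[0].isLetter, head[1] == ":" {
        return head.count == 3 && head[2] == "\\" ? .driveAbsolute : .driveRelative
    }
    return p.hasPrefix("\\") ? .rooted : .relative
}
```

Even this toy version needs eight cases, and each case implies different rules for joining, normalizing, and length limits.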

Following the previous attempts, I continued down the path of replacing one function at a time, but finding and altering each one is getting really frustrating. Eventually, we will end up with multiple code paths for the different representations. Worse, these code paths are conditionally compiled, so changing one side could break another, and adding new interfaces could break other targets. Although some of that can be caught by CI, iteration becomes a problem: not only do you need specific tests for each target, you also need multiple builds to ensure that everything is tested in all the combinations. This gets increasingly difficult and makes the port hard to support. Additionally, using platform-dependent APIs everywhere makes it harder to change things for a platform that you may not be familiar with.

There is, however, a universally accepted encoding of these paths: URLs. The idea is that you can represent the path as a URL consistently and then convert to the file system representation before calling into the system. This would mean that we could have a uniform implementation and representation that works across all platforms.

I would like to suggest that we keep the interfaces as is but switch the implementation to use URLs and then convert to the file system representation when making calls to the system. This really would simplify the code base and make it easier to port.
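As a rough illustration of that suggestion (not the actual TSC code), a path type could keep a path-like interface while storing a Foundation `URL`, and only produce the file system representation at the system-call boundary. `URLBackedPath` and its members are hypothetical names; `URL.withUnsafeFileSystemRepresentation` is the existing Foundation call that does the conversion.

```swift
import Foundation

// Hypothetical stand-in for an AbsolutePath whose storage is a file URL.
// The type and member names are illustrative, not the actual TSC API.
struct URLBackedPath {
    private let url: URL

    init(_ path: String) {
        // isDirectory is passed explicitly so construction does not depend on
        // what currently exists on disk.
        self.url = URL(fileURLWithPath: path, isDirectory: false)
    }

    private init(url: URL) { self.url = url }

    func appending(component: String) -> URLBackedPath {
        URLBackedPath(url: url.appendingPathComponent(component, isDirectory: false))
    }

    var basename: String { url.lastPathComponent }

    // Convert to the platform's file system representation only at the point
    // where a system call would be made.
    func withFileSystemRepresentation<T>(
        _ body: (UnsafePointer<Int8>) throws -> T
    ) rethrows -> T? {
        try url.withUnsafeFileSystemRepresentation { (pointer) -> T? in
            guard let pointer = pointer else { return nil }
            return try body(pointer)
        }
    }
}

let path = URLBackedPath("/a/b").appending(component: "c.txt")
let native = path.withFileSystemRepresentation { String(cString: $0) }
print(path.basename, native ?? "<no representation>")
```

The callers only ever see `appending(component:)` and `basename`, so the storage could be swapped without touching them.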

CC: @Aciid

CC: @Torust (since he seemed interested in helping see s-p-m on Windows)

How much does Foundation’s URL type already cover here? I was thinking it might make sense to standardise on that, either as an implementation detail or eventually as the interface type - it’s what I’d generally expect to use in cross-platform Swift code.

I would say a good 95% is a good starting point. Although in all likelihood, it is probably closer to 99.9%. The URL path was updated for all RFC insanity associated with path representations and has associated tests.

I quickly started a URL-based implementation to replace PathImpl. A few issues I've noticed so far:

  • 10.11 becomes the minimum deployment target for macOS.
  • URL requires knowing whether a path is to a folder or file at construction time. This is an issue for e.g. appending c/d to /a/b when /a/b has been standardised to file:///a/b rather than file:///a/b/; appending to the former results in /a/b/d rather than /a/b/c/d.
  • Rerooted file systems are kind of a weird concept to handle with URL since URL assumes absolute paths – should still be possible, however.
  • NSString.standardizingPath doesn't handle relative paths of the type "../abc/.." and a few other edge cases – we might still need a custom solution for that.
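The folder-versus-file issue in the second bullet can be reproduced with a small sketch. This one uses `URL(string:relativeTo:)`, which follows RFC 3986 relative resolution; the exact wrong result depends on which API is used, so the output here need not match the one quoted in the bullet.

```swift
import Foundation

// Two spellings of the same location: without and with a trailing slash.
let asFile = URL(string: "file:///a/b")!
let asDirectory = URL(string: "file:///a/b/")!

// RFC 3986 relative resolution drops the last segment of the base unless the
// base ends in "/".
let viaFile = URL(string: "c/d", relativeTo: asFile)!.absoluteString
let viaDirectory = URL(string: "c/d", relativeTo: asDirectory)!.absoluteString

print(viaFile)       // file:///a/c/d
print(viaDirectory)  // file:///a/b/c/d

// hasDirectoryPath reports which reading a URL carries.
print(asFile.hasDirectoryPath, asDirectory.hasDirectoryPath)  // false true
```

Either way, a URL-backed path type has to decide up front whether `/a/b` is a directory, which the current string-based types never need to know.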

I do wonder whether the TSC test suite is over-specified for what SPM requires from AbsolutePath/RelativePath, so it might be an easier approach to replace AbsolutePath and RelativePath with URL in SPM than to try and reach feature parity for the purposes of the test suite.

One thing I feel very strongly about here is that we must retain a very performant implementation. Path manipulation is a very common operation for build systems, and we've frequently seen it show up in performance profiles. A strict requirement of any changes we make here is that the performance must be fairly close to our existing implementation (ideally better, as the current one still has known room for improvement).

IIRC, we already have some performance tests for the Path class. I would suggest that, as changes are made, we run the performance tests against both the old and new representations and compare.
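For a quick first impression before running the real performance tests, a throwaway comparison could look like the sketch below. `joinWithString` and `joinWithURL` are hypothetical stand-ins for the existing implementation and a URL-based one; wall-clock timing like this is noisy and is no substitute for the existing test suite.

```swift
import Foundation

// Hypothetical micro-benchmark; the two join functions stand in for the
// existing string-based implementation and a URL-based replacement.
func joinWithString(_ base: String, _ component: String) -> String {
    base.hasSuffix("/") ? base + component : base + "/" + component
}

func joinWithURL(_ base: String, _ component: String) -> String {
    URL(fileURLWithPath: base, isDirectory: true)
        .appendingPathComponent(component, isDirectory: false)
        .path
}

// Runs `body` `iterations` times and returns the elapsed wall-clock seconds.
func measure(iterations: Int, _ body: () -> Void) -> TimeInterval {
    let start = Date()
    for _ in 0..<iterations { body() }
    return Date().timeIntervalSince(start)
}

let iterations = 10_000
var sink = 0  // accumulates result lengths so the work is actually used
let stringTime = measure(iterations: iterations) {
    sink &+= joinWithString("/a/b", "c").utf8.count
}
let urlTime = measure(iterations: iterations) {
    sink &+= joinWithURL("/a/b", "c").utf8.count
}
print("string: \(stringTime)s, url: \(urlTime)s, sink: \(sink)")
```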

In general, I am highly ambivalent about the stated need to move to URLs to solve Windows path problems. I have worked on many cross-platform, Windows-supporting code bases that did not need to move to a URL representation just to solve the Windows path issues.

Swift/LLVM uses paths internally, right? (not URLs). Is it wise to interpret these edge-cases differently to how the compiler itself will interpret them? What about things like IndexstoreDB and SourceKit?

It seems to me that paths are like ASCII - fast, simple to use, but with lots of corner cases and portability issues that can give weird results. URLs are like Unicode - slower, and more awkward, but gives more accurate results.

That's correct... it isn't actually likely to cause problems, but it does indicate that using URLs is not a requirement to solve the problem. LLVM has its own filesystem/path abstractions that have grown over the years to accommodate Windows (to a sufficient degree to support heavy use of Clang on Windows).

It's not even more accurate, IMHO. I think it's actively harmful to conflate a general purpose URL with situations where it must be a file path.

I didn't mean to imply that URLs are the only solution. However, the file system abstractions would need to be designed with portability in mind. The construction of the path and the handling of the path needs to be careful about the differences on different platforms. How do you deal with something like Macintosh HD:Users:compnerd:Desktop:file.txt vs /Volumes/Macintosh HD/Users/compnerd/Desktop/file.txt vs \\?\Volume{a731a2e0-b5a7-4511-9562-029098723ab1}\Users\compnerd\Desktop\file.txt. URLs have already solved the problem. If you want to implement another abstraction that clearly works across all the platforms, that is entirely plausible. However, the tradeoff is the amount of effort that requires.

To be abundantly clear: my opinion is that the current state of the path representation is not designed such that it would enable a stable port to Windows and that trying to effectively construct a layer to mimic the existing implementation for Windows will make the port needlessly complicated and onerous to maintain. I am not tied to the implementation being Foundation or URLs, but would like a technical solution which can accommodate the different behaviors of different platforms in a uniform manner.

Is it possible that SwiftPM is usable in some capacity on Windows (perhaps with awkward limitations, like you can only use it on certain drives), prior to this support being in place?

That to me is an acceptable compromise versus adopting an abstraction that potentially degrades performance for all users, while a long term solution to meeting performance goals & full Windows support is developed.

I don't think that's really viable. The disk layouts vary widely between development and release scenarios, and the tool quickly becomes a problem to use if you limit it to certain layouts. In fact, the differences between the CI and development environments alone would mean that it starts to fall apart.

Please don’t bump this. There is currently no way to conditionally import a dependency based on the deployment target, which means clients are forced to either drop SwiftPM or bump their targets too, even if SwiftPM is only necessary for some features. For now it is better to mark everything in the module with @available than to bump the minimum target in the manifest.

@Karl, I think that the people who work on swift-package-manager are the ones who would suffer if we went with the approach of let’s treat paths as the compiler does. The abstraction of a path there is:

#include <stddef.h>  /* size_t */

struct Path {
  const char *ptr;  /* start of the path bytes */
  size_t length;    /* number of bytes at ptr */
};

Most others will treat paths similarly, I imagine. The compiler and other programs will use different APIs as appropriate, but usually do not go much beyond open and unlink. I don’t think that swift package manager can deal with files at that level. The package is inherently about path management, and needs to have a different view than the compiler.

That’s a fair point, I was just wondering about the potential issues stemming from mismatched interpretations of paths (SwiftPM thinks it’s one thing, but the compiler thinks it’s another). Maybe it’s not such a big deal for the compiler specifically, but I wonder about other projects in the tool chain such as IndexStoreDB/SourceKit.

I would imagine those projects would like to know if two differently-written paths resolve to the same file. Maybe your editor will request information about the file at some path, due to how the project was opened, but then the index that gets built during compilation uses some other spelling. It seems like this would be a pervasive problem. I don’t know how/if they handle that today.

I completely share Daniel’s performance concerns. I understand that Windows paths are more complicated than POSIX paths are, and I appreciate that there is an engineering disadvantage to having two separate implementations of some of the AbsolutePath/RelativePath methods.

However, on macOS, URLs are bridged to Foundation objects, which means that URL operations allocate lots of Objective-C objects. Because of the bridging, the compiler can’t optimize this away. Historically, both NSString and NSURL have been too slow to use when lots of paths are involved (such as in build systems).

So despite the engineering costs of having separate implementations of the path abstraction, I don’t think we can accept a significant performance hit. If the URL-based implementation performs as well as the current one (ideally better, since as Daniel said the current path implementation hasn’t been optimized yet), then that seems like a solution. Otherwise, I think we would need to accept the engineering cost of a forked implementation, at least for now.

Wait, what? Doesn’t Swift Foundation use CFURL directly?

Can you give a concrete example of what this looks like in practice?

I normally build on S:\BinaryCache\Release\Windows-x86_64 on Windows or /Users/compnerd/BinaryCache/Release/Windows-x86_64/ on Linux. The CI machines build in C:\s\4\b, the other bots build on T: which is actually a remapping of S:\....\ which may be a remapping of C:\.... Installation by users actually builds in C:\.... or D:\... because the Windows installation is on another drive.

I can vouch for the weird fun of this one just from my experience working on the Windows build last year - at that time the compiler itself was having exactly the kind of path problems that SPM does. Remapped drives only worked if they had assigned drive letters. Shared directories (I was using a Parallels virtual machine) just plain failed, even with a drive letter mapping (because "realpath" resolution yielded a UNC path). Various tools repeatedly hit the legacy pathname length limit (something I'm also surprised doesn't happen more often on macOS, really...), and so on. In the end, there was a long stretch of time where a very specific setup was required to get the build past a certain stage, and it was all about keeping paths short and in their simplest possible form.

It's also not fair to imagine that Windows is the only platform that has esoteric path handling under various circumstances - it's just the most egregious and unrepentant one. Look at how APFS on Catalina deals with firmlinks, or check out how many obscure inode type bits there used to be in <sys/stat.h>. And does anyone remember /..namedfork/rsrc? 🙂

URL is the de-facto "golden standard" for working with paths because, as Saleem pointed out, it has solved the various problems already. It makes very little sense (in this context, at least) to try to nail down an alternate abstraction; most of the facilities such an abstraction would end up exposing are exactly what's available on URL (of course URLs also have features even Windows paths have no use for, but after all it's also an Internet resource specifier).

To address some of the concerns that've been raised:

  • Performance - I don't believe it necessarily follows that using URL is likely to incur a large penalty versus the current "purpose-built" implementation. This may be true, but there are plenty of ways to mitigate such overhead. Switch implementations on a per-platform basis so only Windows takes the performance hit (if there is one even when there's no Objective-C bridging to deal with). Investigate in depth where any bottlenecks are or will be and aggressively optimize around them. Dynamically switch implementations at runtime based on in vivo benchmarking. And so on. As long as the "need to know if the URL refers to a directory at creation time" issue is addressed, I don't see this as a serious hurdle, especially over time.
  • Deployment target - Clearly there's considerable resistance to raising SPM's minimum OS requirement (speaking solely for myself, I don't see 10.11 as a particularly arduous requirement at this point, though I know there are still those who'd be affected). As with performance, swapping implementations on a per-platform basis may be the easiest short-term solution to the issue, even if it imposes a greater maintenance burden on the code itself (a burden I don't pretend to minimize - but one I do think is bearable). Over time, I dare say the question would eventually solve itself.
  • Other solutions - Sure, the problem has been "solved" (hahahahaha....) without URLs by projects like LLVM and Python. At the same time, I've run into plenty of cases where clang can be easily confused by Windows paths, so I'm not sure I'd say something other than URL is likely to be any more robust - on the other hand, I would put the battle-hardened time-tested expertly-maintained (well, certainly in the case of s-c-f anyhow - the good ol' Objective-C version does have a touch of bit rot here and there) code in [Core]Foundation up against any other implementation and see how many unit tests are passing when they come out the other side.
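The "switch implementations on a per-platform basis" idea from the performance and deployment target bullets could be sketched roughly as follows. All names here (`PathBackend`, `StringPathBackend`, `URLPathBackend`, `DefaultPathBackend`) are hypothetical and exist only for this illustration; nothing like this is claimed to be in SwiftPM or TSC today.

```swift
import Foundation

// Hypothetical backend protocol; none of these names exist in SwiftPM or TSC.
protocol PathBackend {
    static func join(_ base: String, _ component: String) -> String
}

// Stand-in for the existing string-based implementation (POSIX-style only).
struct StringPathBackend: PathBackend {
    static func join(_ base: String, _ component: String) -> String {
        base.hasSuffix("/") ? base + component : base + "/" + component
    }
}

// Stand-in for a URL-based implementation.
struct URLPathBackend: PathBackend {
    static func join(_ base: String, _ component: String) -> String {
        URL(fileURLWithPath: base, isDirectory: true)
            .appendingPathComponent(component, isDirectory: false)
            .path
    }
}

// Only Windows pays whatever the URL-based backend costs.
#if os(Windows)
typealias DefaultPathBackend = URLPathBackend
#else
typealias DefaultPathBackend = StringPathBackend
#endif

print(DefaultPathBackend.join("/a/b", "c"))
```

The cost of this approach is the one raised earlier in the thread: the two backends are conditionally compiled, so each one needs its own tests and its own builds.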

To sum up, I don't think URLs are a panacea for the paths problem (well, I kinda do in some respects, actually...), and I definitely don't think they're a drop-in solution for all of SPM's Windows compatibility woes. But I do believe using URL is a reasonable and sensible move when looking at the long term. If opportunities for improving the performance, compatibility, and/or competitive edge of Foundation in general appear in the process, so much the better.

I am not at all saying SwiftPM should actually work at the default target (of 10.10?). I’m pretty sure its runtime behaviour relies on it precisely matching the system anyway. I’m only asking that it be done using @available in the source instead of platforms in the manifest, at least until there is a way to use such a package under a runtime condition. Otherwise it can impose restrictions on the entire package graph, even if it is a barely touched node. I have learned the hard way that for now only top-level packages should set deployment targets in the manifest.
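A minimal sketch of what "using @available in the source" might look like, assuming some URL-based helper really does need macOS 10.11. The function names are hypothetical, and which Foundation calls actually require 10.11 is not something this sketch establishes.

```swift
import Foundation

// Hypothetical helper that is assumed to need a newer OS than the package's
// default deployment target.
@available(macOS 10.11, *)
func urlLastComponent(_ path: String) -> String {
    URL(fileURLWithPath: path, isDirectory: false).lastPathComponent
}

// Callers check availability at run time, so the manifest does not have to
// raise the minimum target for the whole package graph.
func lastComponent(_ path: String) -> String {
    if #available(macOS 10.11, *) {
        return urlLastComponent(path)
    }
    // Fallback for older systems: plain string splitting.
    return path.split(separator: "/").last.map { String($0) } ?? path
}

print(lastComponent("/a/b/c.txt"))
```

With this shape the manifest has no `platforms` entry for the requirement, and clients that never reach the annotated code are unaffected.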