It's been two years since I originally suggested that we start down the path of requiring Swift on the host when building a Swift compiler and toolchain. Since then, the gap between the "builds only with C++ code" and "builds with a host Swift toolchain" configurations has grown considerably: macros, Embedded Swift, driver features, and many new optimization passes are written in Swift. On the other hand, we're not really testing the "builds only with C++ code" path much, because the official CI builds for Windows, Linux, and Darwin are all built with host Swift toolchains.
The details of the "how" haven't changed much since my August 2022 post on the matter, but I think it's time to remove the ability to build the Swift compiler without a host Swift toolchain. I think there are roughly three steps:
Introduce a CMake error when building the Swift compiler without a host Swift toolchain, only on main. This will flush out remaining places where we aren't building with a host Swift toolchain.
Remove the ability to disable the use of swift-syntax in the compiler build.
Start deleting the C++ implementations that are only used when there is no host Swift toolchain, because this code is now dead. Give a little cheer for each such case, because our maintenance effort has been reduced.
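As a rough illustration of what step 1 amounts to, here is a minimal sketch of a configure-time gate. The real check would live in the compiler's CMake build, not in Python; the names `require_host_swift` and `ConfigurationError` and the error message are hypothetical and only stand in for the idea of failing early when no host Swift compiler is found.

```python
import shutil


class ConfigurationError(Exception):
    """Hypothetical error type: the build cannot be configured."""


def require_host_swift(which=shutil.which):
    """Return the path to a host `swiftc`, or fail the configuration.

    `which` is injectable so the check can be exercised without a real
    toolchain installed; by default it searches PATH.
    """
    path = which("swiftc")
    if path is None:
        # Assumption: the real build would emit a fatal configuration error here.
        raise ConfigurationError(
            "building the Swift compiler requires a host Swift toolchain"
        )
    return path
```

Failing at configure time, rather than silently falling back to the C++-only path, is what would flush out the remaining builds that don't yet have a host toolchain.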
After this, there's still a long road for the various C++ implementations to be replaced by their newer Swift counterparts. The driver is fairly close, but the parser requires a lot more engineering effort to switch. However, taking the steps above gives us a path back to having only a single implementation for these compiler components... one that's written in Swift, with all we've learned over the years.
My reflex is to be all for this. However, I'm concerned about the bootstrap process for new platforms and architectures. Is it possible to cross-build the compiler for a new platform if someone wants to start adding support for it?
Great news. As more and more of the compiler is rewritten in Swift, has there been any serious benchmarking of these Swift components? If so, how is Swift performance looking compared to the C++ versions, in this admittedly niche domain of implementing the compiler itself?
You'd probably want to start with porting the stdlib first, and once the C++ Driver is removed, you'd need to port libdispatch and swift-corelibs-foundation next, as the swift-driver uses them. Finally, you would cross-compile the rest of the compiler for the new platform.
At least in my experience, 5.10 was the last compiler that would cleanly build without a Swift compiler preinstalled, so this doesn't change my experience that much.
That said, I'd like to ask that we amend the requirement so that the Swift source files in the Swift compiler must build with Swift 5.10, so that we can avoid the multi-hour, multi-stage toolchain-build-hopping dance, and that anything that doesn't compile with the 5.10 compiler can be reverted aggressively. At least until we have a reliable cross-compiling story and can easily bring up new compilers on new platforms from one that has an existing compiler.
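To make the "5.10 is the floor" idea concrete, here is a small sketch of the kind of version gate a bootstrap script might apply to the host compiler. It assumes the `swiftc --version` output contains text of the form `Swift version 5.10.1`; the constant and function names are hypothetical, and reading the request as "host compiler must be at least 5.10" is my interpretation rather than an agreed policy.

```python
import re

# Assumption: 5.10 is the oldest host compiler the sources must build with.
MINIMUM_HOST_SWIFT = (5, 10)


def parse_swift_version(version_output):
    """Extract (major, minor) from `swiftc --version`-style output."""
    match = re.search(r"Swift version (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find a Swift version in the output")
    return (int(match.group(1)), int(match.group(2)))


def host_swift_is_new_enough(version_output, minimum=MINIMUM_HOST_SWIFT):
    """Compare numerically, so that 5.9 sorts before 5.10."""
    return parse_swift_version(version_output) >= minimum
```

Comparing integer tuples instead of strings matters here: as text, "5.9" sorts after "5.10", which is exactly the wrong answer for this check.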
Totally. I just wanted to be clear that the implication is that bringing up new platforms involves cross-compiling, which makes it important that the Swift build process has reasonable cross-compilation support (right now, I think if we're honest, it needs some work).
Yes and no? The new driver has been enabled most everywhere for years now, and did not contribute to any measurable performance degradation. We see compile-time performance get worse when we end up having to run both the new and old parsers (e.g., macro expansion is only implemented using the new parser), because we're literally doing twice the work to be in the in-between state.
We have done benchmarking and optimization of the new Swift parser vs. the C++ parser, but that doesn't tell you the impact of completely replacing the C++ parser because they aren't doing quite the same things. The only thing that tells you the impact of that change is to do it---once it's done, completely and correctly, and we're not yet there with replacing the C++ parser.
The end result of reimplementation of these components in Swift should not regress the user experience, and compile-time performance is part of that story. But it's too early to speculate.
Sure, that's fine. We've already been doing this in practice.
Our CI jobs still use a 5.9 compiler which has contributed to some minor monstrosities in SwiftPM. If we're going to require 5.10, let's make sure CI is using it too!
Good to hear, especially since swift-driver also uses Foundation.
I assume these differences are extensively measured by some compiler engineers; it would be good to share that info publicly.
Part of the story with Swift's C++ interoperability is that new code in a mixed codebase can mostly be written in Swift, and such benchmark info can help make that case. Even if there is a performance regression, the other Swift benefits can make up for it in most situations, depending on how much slower it is.
I started watching Herb Sutter's 2022 talk about his Cpp2 experiment to completely reform C++ while maintaining its design goals and backwards compatibility. He likens his cppfront translator to the original Cfront translator from C++ to C, and the talk is now the seventh most-viewed video on the CppCon channel. Clearly, the C++ committee sees new languages like Rust, Go, Zig, and Swift taking over and this is their move to compete.
I think these compiler component benchmarks might be a good place to compare the two languages as they stand today, aimed at those contemplating Swift in their C++ codebase.
How close are we to building a WebAssembly version of the compiler? We could offer that as a fallback option for platforms which don't already have a native toolchain, as it abstracts over the CPU architecture and OS.
There are WebAssembly interpreters which can even run on embedded microcontrollers, so it seems a reasonable expectation that any platform capable of building and running the compiler + LLVM + standard library + supplementary libraries is able to run a Wasm interpreter. It is also possible to compile Wasm ahead-of-time to native instructions for improved performance.
Not close at all, as this has to be done for all dependencies of the compiler first, mainly LLVM and Clang, neither of which supports WebAssembly as the host platform.
Some one-off prototypes built by third parties off their own forks may exist, but upstream LLVM and Clang never supported Wasm as a host platform.
Adding support for a new host platform to a project of LLVM's size is a big endeavor and requires a serious commitment from people willing to maintain it on an ongoing basis.
For Swift that would also mean ongoing maintenance in Swift's fork of LLVM if we were willing to adopt it.
That's true, although I think LLVM is the primary toolchain used to produce Wasm, isn't it? So given that it is always going to have strong Wasm target support (and that is already maintained), Wasm host support might not be as burdensome as it would be for a completely new platform.
In any event, it's unfortunate that Wasm doesn't seem like an option right now. Maybe one day...
I wouldn't agree with that conjecture. Clang has somewhat "strong" support for targeting embedded Wasm (e.g. lagging behind Binaryen in some features), but can't build for WASI out of the box. The WebAssembly GitHub org has to maintain their own distribution of LLVM and Clang that one is meant to rely on when using wasi-libc. SwiftWasm had to adopt, port, and maintain some of those changes for Swift to be able to use WASI as a target.
Thus, supporting an embedded target platform has little overlap with supporting a full-fledged host platform, especially since WASI doesn't overlap enough with POSIX to make porting of the LLVM and Clang codebases easy. There's no process spawning in WASI, and multi-threading is not trivial either, never mind the numerous limitations in file system APIs.
This topic has been on my mind lately given my recent[1] experiences[2] getting Swift going on Gentoo, primarily because the idiomatic Gentoo approach to packages presents effectively a worst-case scenario for bootstrapping Swift:
It is strongly preferred that packages build from source (ideally on the machine they're going to run on), and cross-compilation wouldn't be a preferred solution;
It is atypical (though possible!) for multiple versions of a package to be installable simultaneously, so
It's quite uncommon for a version of a package to depend on an earlier version of the same package unless absolutely necessary.
(Of course, Gentoo is... niche... and I wouldn't expect (nor suggest) that the Swift project at large consider it a meaningful use-case, but it is interesting to design with the potential of this in mind.)
I was initially concerned, but with a bit of thought, this seems pretty reasonably surmountable:
So long as there's a guarantee that Swift 5.10 can be relied upon to bootstrap future compilers, one possible path forward on the platform:
Split the current swift-5.10.1 package (which builds Swift 5.10.1, installs it into a versioned dir, and symlinks some binaries into /usr/bin) into swift-toolchain-5.10.1 (which just builds and installs) and swift-5.10.1 (which depends on swift-toolchain-5.10.1, and just symlinks)
Since swift-toolchain-<vers> is standalone, I can slot it to allow multiple major.minor versions to be installed simultaneously
Since swift-<vers> is not, it will continue to only allow one version at a time
Future versions of swift-<vers> can then depend on swift-toolchain-5.10.1 and swift-toolchain-<vers>
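The package split described above can be sketched as a tiny dependency model. This is only an illustration of the naming and slotting scheme, not an actual ebuild: the function names are hypothetical, and the version "6.0.0" used in the usage note below is a made-up example of a future release.

```python
# The toolchain package that every later version bootstraps from.
BOOTSTRAP_TOOLCHAIN = "swift-toolchain-5.10.1"


def slot_for(version):
    """Slot toolchains by major.minor so several can be installed at once."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}"


def dependencies_for(version):
    """Packages that the thin, symlink-only `swift-<version>` depends on.

    Every version needs its own toolchain package; versions other than the
    bootstrap one also need the 5.10.1 toolchain.
    """
    own_toolchain = f"swift-toolchain-{version}"
    deps = [BOOTSTRAP_TOOLCHAIN]
    if own_toolchain != BOOTSTRAP_TOOLCHAIN:
        deps.append(own_toolchain)
    return deps
```

With this model, `dependencies_for("5.10.1")` yields just the 5.10.1 toolchain, while a hypothetical `dependencies_for("6.0.0")` yields both the 5.10.1 toolchain and its own, each living in a separate slot.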
Of course, these details are pretty Gentoo-specific so no need to pay too close attention, but this is at least a vote of confidence that any platform which the Swift 5.10 line can be updated to support can bootstrap itself without needing cross-compilation. (Which, again, is a sort of worst-case scenario for support.)
So, +1 from me. (Especially if we're comfortable marking Swift 5.10.x as a sort of LTS release, given the special nature of this change.)
I've used Gentoo before (10+ years ago though, so take this with a healthy handful of salt), but I seem to remember the compilers being prebuilt binaries. I can't think of a modern C/C++ compiler that builds without an existing C/C++ compiler. I've certainly written a joke compiler in bash/grep/sed/awk, but that was a joke project, and I'm certain that a C/C++ compiler written that way would be absolute nightmare fuel and you'd still have to get a bash/grep/sed/awk binary to build it. Recent versions of clang certainly require something with C++17 support to build.
How do other languages like, say, Go, bootstrap themselves in this environment without using an initial binary blob?
Great question! It depends on the package. Gentoo packages can provide binary versions; it's just that the binary is expected to be an alternative to source-based installation unless the source is not otherwise available. Binaries are typically provided for popular packages that would otherwise be onerous to build on user machines (e.g., Gentoo has both www-client/firefox for building from source and www-client/firefox-bin for a prebuilt version).
In terms of bootstrapping compilers specifically:
C and C++ are easy because the base Gentoo images typically used for bootstrapping a system come with GCC preinstalled, so you can build your OS to include whatever you want
(I realize that my previous wording was overly assertive and somewhat misleading, so I've edited it down a bit. The context I had in mind was the wider field of all Gentoo packages, not specifically compilers.)
Swift could do the same, sans prebuilt binary (unless Gentoo were a fully-supported platform and the Swift project were willing to host binary builds somewhere). Bootstrapping from 5.10 is certainly an option.
(It does appear that the Go and Rust ebuilds are written such that you can technically have multiple versions installed simultaneously; it's just that the last-installed version would overwrite files from another version. For Rust, it overwrites symlinks in /usr/bin; for Go, it appears that the whole installation is overwritten? I could write the Swift package like Rust, though I don't love it; the "toolchain" approach at least keeps those symlinks appropriately versioned. Edit: Looks like Rust on Gentoo has an eselect module to allow selecting between one of multiple versions to activate. That might be an even better approach.)
Yeah, the challenge with Gentoo (and Arch for that matter) is that I don't know what is available on the system. Since they're rolling release distros, I can't produce a binary that will work with an arbitrary Gentoo/Arch system because I don't know which version of each library is installed on that particular installation.
With Ubuntu/Debian/Fedora, each version has a specific set of library versions that are kept ABI stable for a given release (in some cases, more than one), which means that we can build something for that release (hence why we have packages for each version of each distro).
A potentially better world for this purpose would be to package all of the toolchain dependencies in the toolchain tarball and rpath everything to those libraries instead of the libraries on the system. This way we don't run into issues with ABI, but it does mean a bigger download. This isn't unheard of, though; the Android NDK for Linux does exactly this for this problem. Then we could get away with having two packages, an x86_64-linux and an arm64-linux download, that would work on any of the distros. It's on my list of things to do...