[build-script] Status Update: Moving towards eliminating build-script-impl and splitting the toolchain and stdlib build

Hey everyone!

Status Update: Eliminating build-script-impl.

I just want to give an update on an ongoing effort that Dario
Rexin, I, and other collaborators are putting work into. The big
picture is that we are trying to start eliminating
build-script-impl from the build system, doing it incrementally
over time. We are doing that by finishing the migration of
build-script-impl products to build-script products, starting at
the front and going backwards. We are also using our knowledge to
make our build easier to cross compile. Towards that end,
recently:

  1. ArtemC added support for having a build-script product that
    ran /before/ build-script-impl and used that to implement support
    for Early Swift Driver. c2dc8e3d0748597e2c964a48aaeb9c20426f618f.

  2. Dario Rexin migrated cmark to use a build-script product. They
    did it in a manner that has allowed us to create a generic way
    of calling cmake based products that we are building
    upon. 3c19cc432dab76bf44e6c36f039e2bb16277db62.

  3. I changed the generic cmake product implementation to always
    build products in a cross compiled fashion with cmake. CMake's
    behavior changes when one cross compiles vs when one compiles
    normally. So by taking this step, we are making it so that our
    custom cmake code that is being invoked by build-script is
    always dealing with one CMake behavior, the cross compilation
    behavior. This will make it significantly easier to maintain our
    ability to cross compile, reducing maintenance costs. NOTE: This
    works since we just cross compile on the host for the host (in
    Linux cross compilation terminology, our build and host are the
    same). d40e4f31c9568e3d2ce47f63408c80eb6f28af48.

  4. Dario is currently working on migrating LLVM from
    build-script-impl to a build-script product. This is in progress:
    [Build] Make LLVM a build-script product by drexin (apple/swift
    Pull Request #38507).

  5. I have recently split the build-script-impl phase into 2
    phases, a toolchain build phase (e.g. building LLVM, Swift,
    LLDB) and a library build phase (e.g. building
    swift-corelibs-libdispatch, swift-corelibs-foundation,
    xctest). Doing this will allow us to start shaving down
    build-script-impl internally as well. I did this because I am
    currently working on splitting the toolchain and stdlib parts of
    swift's build and this is where I would need to place building
    the libswift/standalone stdlib. I will talk about that in more
    detail below. 4f149d07ce7f85ed2277a3a3825b1c717899b266.
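
To make item 3 concrete: CMake sets CMAKE_CROSSCOMPILING to true whenever CMAKE_SYSTEM_NAME is set explicitly, even if it names the machine you are already on. The following is a minimal Python sketch of a "cross compile on the host for the host" configure line. The helper names are hypothetical and this is not necessarily how build-script passes these settings; it only illustrates the idea.

```python
import platform


def host_cross_compile_args(system=None, processor=None):
    """Return cmake -D arguments that force cross compilation mode.

    Setting CMAKE_SYSTEM_NAME explicitly makes CMake treat the build as a
    cross compile, even when the target is the same as the host.
    """
    # Default to the machine we are running on, i.e. build == host.
    system = system or platform.system()         # e.g. "Linux", "Darwin"
    processor = processor or platform.machine()  # e.g. "x86_64", "arm64"
    return [
        "-DCMAKE_SYSTEM_NAME={}".format(system),
        "-DCMAKE_SYSTEM_PROCESSOR={}".format(processor),
    ]


def cmake_configure_command(source_dir, build_dir, extra_args=()):
    """Assemble (but do not run) a cmake configure command line."""
    return (["cmake", "-S", source_dir, "-B", build_dir]
            + host_cross_compile_args()
            + list(extra_args))


if __name__ == "__main__":
    # The paths here are placeholders for illustration only.
    print(" ".join(cmake_configure_command("cmark", "build/cmark")))
```

Because every product goes through the same helper, the custom cmake code only ever sees the cross compilation code path.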

LibSwift and splitting the toolchain and stdlib builds

Background: One thing that we have wanted to do for some time is
to split the stdlib build from the toolchain build. This will
make it easier to cross compile the stdlib in general and will
let us begin migrating the stdlib to use standard Swift cmake
support. It will also then allow us to begin migrating the
toolchain cmake to be pure LLVM cmake. This will enable us to
eliminate a bunch of the custom CMake code that we maintain as a
project, allowing us to instead just rely on standard cmake
maintained by other projects, reducing maintenance burden and
making it easier to bring up new engineers.

That being said, we needed a driver for this work and I have used
bringing up the build system support for libswift as that driver.
For those who are unaware, libswift is a new static library in the
optimizer that is being written in pure Swift. The work requires us
to build libswift using the just built swift, then link a new
swiftc reusing the artifacts from a stage 1 swift build, and then
finally compile the stdlib with that. To implement that I am
taking the following steps:

  • Building upon (5) above, I am currently working on a
    build-script product that builds the stdlib as a separate
    invocation of Swift's cmake and that builds against the Swift in
    the just built toolchain. This will let us begin to split swift's
    cmake into two at the CMake level and transition the stdlib to
    standard swift cmake (and then delete a ton of nasty code!). I
    am going to send a follow-on post with more information on how
    this will change how people interact with the build outputs. I am
    hoping to get this done ASAP.

  • Once that is complete, I am going to be able to re-use the same
    code from the first part to create a sort of Stage 1.5 swift
    invocation that builds libswift (using the just built swift),
    relinks swiftc with libswift/the dependencies and then installs
    that in the just built toolchain (the stage 1 compiler and, if we
    are bootstrapping, the stdlib will be in a subdirectory in the
    just built toolchain). From the perspective of the stdlib build,
    nothing has changed at all since it has already been split out,
    meaning we don't need to introduce any cmake hacks to build the
    stdlib with the libswift-ified stage1.5 compiler.
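
As a rough picture of where these new pieces slot in, here is a small Python sketch of the phase ordering: the two new products sit between the toolchain phase and the library phase from (5). The phase names and the helper are hypothetical, chosen only for illustration, and build-script's real internals may look different.

```python
# Hypothetical phase names; not build-script's actual identifiers.
TOOLCHAIN_PHASE = "build-script-impl: toolchain"   # LLVM, Swift, LLDB
LIBRARY_PHASE = "build-script-impl: libraries"     # libdispatch, foundation, xctest

PHASES = [
    "build-script products that run before build-script-impl",
    TOOLCHAIN_PHASE,
    LIBRARY_PHASE,
    "build-script products that run after build-script-impl",
]


def insert_after_toolchain(phases, new_products):
    """Return a new phase list with new_products placed right after the toolchain phase."""
    idx = phases.index(TOOLCHAIN_PHASE) + 1
    return phases[:idx] + list(new_products) + phases[idx:]


plan = insert_after_toolchain(PHASES, [
    "stage 1.5: build libswift and relink swiftc",
    "standalone stdlib build against the just built toolchain",
])

if __name__ == "__main__":
    for step in plan:
        print(step)
```

The stdlib product is identical whether or not the stage 1.5 step ran before it, which is the point of splitting it out first.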

Michael

This is fantastic! I'm so looking forward to the world without build-script-impl. Maintaining it wasn't fun at all, especially from the perspective of supporting multiple platforms.

Could you elaborate on the bootstrapping steps here (esp. what is built first and how many times different things are built) and what is required from the host? I read through the post a couple of times, but I am still not super clear about the build order. Maybe having a graphviz diagram labelling the different stages would help?

FYI, I posted the promised larger document with more details on the splitting of the toolchain/stdlib builds here: [build-script] Splitting the toolchain and stdlib build (more detail).

TLDR: On macOS we will still only build everything once due to the system stdlib. On other platforms without system stdlibs, we will need to build a stage 1 stdlib.

It works like this:

  1. swift builds the toolchain. We install this into the install_dir under ./stage1_swift/

  2. On a machine where we do not have a system stdlib, we build the stdlib (a). This is done using a standalone stdlib implementation that is installed into ./stage1_swift/.

  3. A separate swift invocation uses the just built compiler to build libswift. We use cmake exports to pass in the relevant C++ libraries to relink with libswift to create stage2 binaries. Importantly, in cmake land these stage2 binaries will have different cmake names (e.g. swift-frontend-stage2), but we will ensure that when they are installed in the toolchain they are given the appropriate names. We will also copy out binaries that we need from the stage1_swift part of the just built toolchain. I think we are going to need to be able to serialize into the cmake lists the components that individual targets belong to. But once I implement that, it should be pretty easy to just automagically create install rules.

  4. The split stdlib just builds normally using the just built toolchain (b). Little does it know about the work we have gone through previously to get things set up so that it doesn't have to care. Then the stdlib installs itself into the just built toolchain.

(a) We can avoid this on systems that have a system stdlib by forcing libswift to use the most conservative ABI assumptions about the stdlib (I imagine with time given which machines are supported we can make it more aggressive).

(b) As mentioned in note (a), on macOS we will actually use the system stdlib to run the libswiftified swiftc. On other platforms without system stdlibs, we will use the stage 1 stdlib built in step 2.
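
The renaming in step 3 could look something like the sketch below, written in Python for brevity even though the real logic would live in cmake install rules. The "-stage2" suffix comes from the example name above; the helper names and the "bin" subdirectory are assumptions made for illustration.

```python
import os

# Suffix taken from the example cmake name swift-frontend-stage2.
STAGE2_SUFFIX = "-stage2"


def installed_name(cmake_target):
    """Strip the stage suffix so the installed binary gets its usual name."""
    if cmake_target.endswith(STAGE2_SUFFIX):
        return cmake_target[:-len(STAGE2_SUFFIX)]
    return cmake_target


def install_path(install_dir, cmake_target):
    """Hypothetical destination of a binary inside the just built toolchain."""
    # "bin" is an assumed layout, not something stated in the post.
    return os.path.join(install_dir, "bin", installed_name(cmake_target))


if __name__ == "__main__":
    print(install_path("install_dir", "swift-frontend-stage2"))
```

From the stdlib build's point of view the installed toolchain then only contains a binary with the usual name, regardless of which stage produced it.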

[Putting things in my own words to make sure I'm getting this correctly]

On macOS, the build looks like:

  1. A stage 1 toolchain (this excludes libswift) is built using the host's C++ toolchain (e.g. via Xcode).
  2. The stage 1 toolchain is used to build libswift, which links to the system-wide Swift stdlib. Other .o and .a files from the stage 1 toolchain build are linked with this libswift to create a stage 2 toolchain.
  3. The stage 2 toolchain is used to compile a stage 2 stdlib directly, skipping a stage 1 stdlib thanks to the system-wide Swift stdlib. libswift does not link to the stage 2 stdlib.

On platforms other than macOS, which do not have a system-wide Swift stdlib:

  1. A stage 1 toolchain (this excludes libswift) is built using a host C++ toolchain.
  2. The stage 1 toolchain is used to build a stage 1 stdlib.
  3. The stage 1 toolchain is used to build libswift, which links to the stage 1 stdlib. Other .o and .a files from the stage 1 toolchain build are linked with this libswift to create a stage 2 toolchain.
  4. The stage 2 toolchain is used to compile a stage 2 stdlib. libswift does not link to the stage 2 stdlib.

Is the above description correct?

Almost, but there is a bit of potential variability (I have not set everything in stone yet). The only thing that may change is that I may want to build libswift using the host compiler rather than the stage 1 compiler. The reason the split here is good is that it means we do not have as many weird variations per platform. The code run in each phase is the same; we just add some additional phases on non-Darwin.
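
Since a graphviz diagram was asked for earlier in the thread, here is a short Python sketch that emits the build order, as restated in the two lists above, in DOT form. The node names are informal labels rather than real target names, and the edges encode the current plan, which may still change as noted.

```python
def stage_edges(has_system_stdlib):
    """Return (producer, consumer) edges for the staged build."""
    edges = [
        ("host C++ toolchain", "stage 1 toolchain"),
        ("stage 1 toolchain", "libswift"),
        ("stage 1 toolchain", "stage 2 toolchain"),  # reused .o and .a files
        ("libswift", "stage 2 toolchain"),
        ("stage 2 toolchain", "stage 2 stdlib"),
    ]
    if has_system_stdlib:
        # macOS: libswift links against the system-wide stdlib.
        edges.append(("system stdlib", "libswift"))
    else:
        # Other platforms: an extra phase builds a stage 1 stdlib first.
        edges.append(("stage 1 toolchain", "stage 1 stdlib"))
        edges.append(("stage 1 stdlib", "libswift"))
    return edges


def to_dot(name, edges):
    """Render the edges as a graphviz digraph."""
    lines = ["digraph {} {{".format(name)]
    for src, dst in edges:
        lines.append('  "{}" -> "{}";'.format(src, dst))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot("macos", stage_edges(True)))
    print(to_dot("other", stage_edges(False)))
```

The shared edges are the same in both graphs; only the source of the stdlib that libswift links against differs.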

This sounds really great. I have 2 questions:

  1. Could this potentially enable building the stdlib using a nightly toolchain?

Let's say I download a nightly build, then check out the repository at that exact revision. Could I just build the stdlib and build/run the associated tests, without having to build LLVM, Clang, etc? Because that would be amaaaaazing.

I wonder. It seems like we're adding an awful lot of complexity to the build system that might not be worth it. We have stable builds on enough systems that getting a machine with a swift compiler installed isn't all that much hassle, and from there you can cross-compile to build a native toolchain for your target platform. You only need to do that once, and then you can skip all of this multi-stage compilation and build using that native toolchain, just like macOS would do with its toolchain. Is that right?

So it seems to me that this process is only essential the very first time you bring up a new platform, and only if you literally cannot find a single machine capable of running a previously-built, packaged toolchain.

Probably a controversial opinion, but - is it maybe time to just say that you need a Swift toolchain to build a Swift toolchain? And allow even mandatory parts of the compiler to be written in Swift?

You can already do that. There is a preset that builds the stdlib standalone.

So I have thought about this. The problem is sometimes you do need to be able to support a bootstrap (even if it may seem that we do not today). Once one can no longer bootstrap, it becomes a pain in the butt to be able to bootstrap again. That being said, I think we only need one platform that blocks CI to maintain this, so we will probably just do it on Linux (and Windows probably).

Let's say I download a nightly build, then check out the repository at that exact revision. Could I just build the stdlib and build/run the associated tests, without having to build LLVM, Clang, etc? Because that would be amaaaaazing.

I do this all the time, it is how I cross-compile the Swift toolchain for Android. Take a look at the build preset he mentioned, which will build the stdlib from source with a prebuilt Swift toolchain and run the stdlib tests alone.

is it maybe time to just say that you need a Swift toolchain to build a Swift toolchain? And allow even mandatory parts of the compiler to be written in Swift?

I agree. We already have prebuilt Swift toolchains available for Win/Mac/Linux and even iPadOS and Android, which is what, 99% of the market already? For the few stragglers like the BSDs or Haiku that don't have a native Swift compiler build yet, I don't think it's too much to ask them to cross-compile the Swift stdlib and compiler the first time.

@Michael_Gottesman Note that in the bootstrapping process we need to build libswift twice.
The first libswift is built with a compiler which doesn't have libswift (which contains optimizations), so the first compiler built with libswift will have bad compile-time performance.
Only the second compiler build with libswift will have the full compile-time performance.

That should be fine. I just don't want to do it as part of the initial bootstrapping process. Big picture I think based off of our discussions:

  1. We need to build the first libswift with the just built toolchain. This is so we can use the c++ importer.
  2. We /are/ ok with requiring that on macOS libswift always builds against the oldest stabilized stdlib that we support. This ensures that we do not need to build the stdlib multiple times (a requirement since otherwise we would balloon PR time on the macOS PR tests, which are some of the slower PR jobs).
  3. Not as part of the first version of this but via iterated work, we bootstrap libswift as well. This means that we build libswift a second time using the initial libswift we compiled. The nice thing is that given how I am setting this up, it should be pretty trivial beyond some changes around where we install things. We just run another of the libswift build-script products.
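
As a toy model of item 3, the Python sketch below runs a hypothetical libswift product a given number of times and records which compiler built each round. It does not build anything; the names are made up for illustration, and it only captures the point made above that the first libswift is compiled without libswift's optimizations.

```python
def bootstrap_rounds(rounds):
    """Model `rounds` successive runs of a libswift build-script product.

    Returns one dict per round describing which compiler built libswift and
    whether that compiler already contained libswift's optimizations.
    """
    history = []
    compiler = "stage 1 swiftc"  # built without libswift
    has_libswift = False
    for n in range(1, rounds + 1):
        produced = "swiftc + libswift (round {})".format(n)
        history.append({
            "built_with": compiler,
            "produces": produced,
            # libswift is only compiled with its own optimizations when the
            # compiler that built it already links libswift.
            "full_compile_time_performance": has_libswift,
        })
        compiler, has_libswift = produced, True
    return history


if __name__ == "__main__":
    for entry in bootstrap_rounds(2):
        print(entry)
```

With two rounds, only the second compiler is built from a libswift that was itself compiled with libswift, which matches the "build libswift twice" observation.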