As building larger applications on Windows becomes more viable, one thing I've noticed is that the build process does not feel very efficient. When building code with SPM, CPU utilisation is often low during the build. CMake does do better (citation needed; anecdata), with CPU utilisation a bit higher, but it remains spiky.
I spent some time over the holidays pondering what a world more attuned to building Swift code bases could look like: something that could use the cores more effectively. I ended up playing around with the idea of a Ninja build generator that would integrate with swift-driver. The result of this experiment is antimony.
While incomplete, nearly to the point of uselessness, it is interesting enough as an exploration point. I was able to use it to test building Swift/Win32's core library. While that is more of a toy than a production code base, it is not an entirely facetious application: it requires dependencies (e.g. cassowary, swift-collections, swift-com, etc.). The following results are more of a baseline, as I have not bothered doing anything beyond the most naïve approach:
| CPU | SPM Time (sec) | antimony Time (sec) | Delta (antimony / SPM) |
|-----|----------------|---------------------|------------------------|
| Snapdragon (TM) 8cx Gen2 @ 3.15 GHz | 91.3644129 | 81.144729 | 0.888143714 |
| Intel(R) Core(TM) i7-4785T CPU @ 2.20GHz | 81.7764491 | 60.591287 | 0.740938102 |
| Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz | 40.136513 | 17.949565 | 0.447212865 |
| 13th Gen Intel(R) Core(TM) i9-13900K | 26.8264414 | 7.015942 | 0.261530849 |

(Delta is the ratio of the antimony time to the SPM time, so lower is better.)
Binary Sizes for SwiftWin32.dll on x86_64-unknown-windows-msvc:

| Build System | Size (bytes) |
|--------------|--------------|
| SPM | 10,443,264 |
| antimony | 2,921,472 |
While these numbers are far from scientific, they do cover a reasonable set of machines. The time measurements were performed using Measure-Command under PowerShell. The SPM builds were run twice to keep any network operations from interfering with the measurement:
swift package reset & swift build & swift package clean
Measure-Command { swift build --skip-update --product SwiftWin32 }
The same checkouts were used with antimony as:
sb gen out\debug
Measure-Command { ninja -C out\debug }
One thing that I noticed was that antimony was able to saturate the CPU far better than either SPM or CMake.
SPM: [CPU utilisation graph]
antimony: [CPU utilisation graph]
I have some ideas about how I would refine the PoC into something more usable, leaning on my experience working with Bazel, Buck, CMake, and GN. However, that is a long and arduous road, and before setting down that path I wanted to see whether the concept is interesting to others in the community as well. I would likely want a design that grants the developer more control, explicitly supports cross-compilation, and is mixed-host (i.e. supports building build tools and host content).
Build times with SPM can be quite long, especially on Windows machines in environments where an anti-virus program is required and cannot be configured accordingly (it could be helpful to measure times in such an environment; what improvements would one expect in this case? Does the build system use fewer files on disk? …but maybe this is a silly question; one might simply be lost with a bad virus scanner). And from my understanding SPM is missing some important features. So indeed, improvements would be very useful, and the use of a simplex solver sounds like a good idea.
The question I would have is whether such improvements could be incorporated into SPM, or is SPM somehow inherently limited in some respect? (From a naive standpoint SPM is just the interface; "any" tools could work in the background.)
And if this leads to an alternative build system, one question would be how easy it would be to use for "easy" cases (would it be suitable for "beginners"?), and another would be whether this build system could be easily composed with SPM, both in the case of an SPM project where a dependency is to be built with the new build system, and vice versa (and recursively so).
Can you elaborate on the difference in binary size? Which build system's output is "correct"?
And kudos for exploring this. swift build is a bit sluggish on all platforms - I've certainly had plenty of time too, while waiting for builds to complete, to stare forlornly at the CPU usage chart in iStatMenus. Some healthy competition - or even just some insightful experiments that can ultimately improve swift build - is very welcome!
I suspect the code size aspect is features that SwiftPM has beyond being a build system, like fetching repositories over HTTPS, implementing non-build subcommands, etc, but the job saturation thing is definitely interesting. llbuild is supposed to be competitive with Ninja for job management (its proof-of-concept was a drop-in Ninja replacement), so either SwiftPM + the driver aren’t generating jobs well, or there’s a lot of weird overhead getting them scheduled, or Antimony is taking shortcuts that aren’t strictly valid (like not properly handling incremental builds). It definitely seems worth investigating given that gap, though! I would hope the result can get folded back into SwiftPM though rather than introducing another build system.
I don't think that the antivirus matters, for two reasons:
1. I work entirely on a dedicated volume that is excluded from the antivirus, as it is meant strictly for development use.
2. I actually ran these tests on machines using DevDrives, which automatically excludes the volume from AV interference as well.
Actually, quite the opposite: antimony generates more files, since it produces static libraries and dynamic libraries even as intermediates.
SPM does not grant you control over the linkage model, nor does it use static or dynamic libraries; it uses only object libraries. The simplex solver is a dependency of Swift/Win32, not of antimony itself.
I think that what I have in mind might be quite a bit different: namely, removing Package.swift in favour of per-directory build files with explicit file lists and target definitions. This would be closer in some senses to Bazel's BUILD files.
The bigger thing, though, is the fundamental removal of the concept of object libraries from the model. The user would have to define the library and whether it is static or dynamic, and that library would be created. Additionally, the separation of static and dynamic libraries, forcing multiple builds of each where necessary, is core to the model.
The design decision that is important is supporting native paths: the model explicitly supports native (Windows-style) paths, e.g. with drive letters and backslashes.
I do not know whether there are concerns around completely changing the model from something closer to Xcode's to something closer to CMake's or Bazel's, namely working in terms of library and executable targets rather than the Xcode model of targets and products. This would effectively remove targets and products and deal only with what are currently deemed "products".
I think that it's reasonably easy to start with, but I am also likely a bad judge of that. To that end, building an executable would look roughly like this: a short per-directory build file that defines the executable target, lists its source files explicitly, and names its dependencies.
As to composability: I expect that this would fall into the same problem that Bazel and Buck have, in that they do not interact with other build systems. You could do the same thing they do and overlay a build file, but there wouldn't be a good way to interact with the other system. Although, in complete fairness, SPM does not interact with other build systems either: I cannot simply take a CMake project and use it with SPM, nor can I take an autotools project and use that with SPM.
The correct output is from antimony. The binary size difference is due to the proper use of static libraries rather than object libraries, which allows for proper dead code elimination. SPM does not support (static or dynamic) libraries; it uses object libraries only, as it is designed with the Mach-O format and ld64 in mind and does not always take the needs of ELF, COFF, and WASM into consideration.
Sorry if it was unclear: the code size is for the products being built, not the tool itself. While tool size is important, I am not particularly concerned about it currently.
But you rightly point out something that was implied: I was not looking at dependency management, only the build aspect. SPM does not really fetch code over HTTPS per se; it delegates that to git, which it simply invokes rather than packaging it or using a library.
The "non-build subcommands" are relatively tiny (and antimony does have a small one that accidentally leaked in the prototype - one to format the definitions). The build command itself generates a manifest and delegates to llbuild (which is dynamically linked).
In that sense, both tools do something similar in both of those cases: antimony assumes that something has already cloned the dependencies where desired, and both delegate to other tools to do the building.
I think that you may have missed what antimony does. All it does is take the jobs that the driver says it will execute and write them out to a Ninja specification. It does close to nothing itself other than reading the configuration to determine the libraries and executables; everything else is the result of the driver.
I don't know how much of it is "weird overhead" and how much of it is the model that SPM uses for building. IMO there is a lot more complexity in the SPM model of writing out a manifest that will be re-processed by llbuild to generate the commands.
I think that I would be happier with that, assuming we can easily change the design of SPM without having to worry about the implications for interactions with Xcode, as this would fundamentally change the product/target model.
> Can you elaborate on the difference in binary size? Which build system's output is "correct"?
SPM does object linking instead of library linking. Object linking pretty well copies the entire object into the resulting shared object/executable, regardless of whether all of the contents are used. I haven't looked at how the objects are merged, whether they're dumped into a big pot at the end and left for the linker to figure out what to do, or whether they're merged based on dependency graph info, but because objects get fully resolved in an object link, object linking is a good way to hide duplicate copies of the same symbol without ODR violations since the linker will resolve calls to that symbol with calls directly into the object that it's merging with. With static archives, most linkers will search for unresolved symbols and only extract the ones they need, so you effectively end up with automatic dead-code-elimination. With object linking, you don't get that. Then, since the linker actually ends up doing more work, you end up with longer link jobs too.
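To make that difference concrete, here is a minimal sketch (the module and file names are illustrative, not something from the thread): two files built into a library that an executable links against, where only one of them is actually referenced.

```swift
// Greeting.swift -> Greeting.o, a member of a static archive (or, under
// SPM's object-library model, an object passed straight to the linker).
public func greet() -> String { "hello" }

// Legacy.swift -> Legacy.o; nothing in the program references anything here.
public func legacyCodePath() -> String { "never called" }

// main.swift, in the executable target:
// print(greet())
//
// Static-archive link: the linker extracts only Greeting.o, the member that
// satisfies the unresolved `greet` symbol; Legacy.o stays out of the binary,
// which is the automatic dead-code elimination described above.
// Object link: both Greeting.o and Legacy.o are copied into the binary,
// whether or not their contents are used.
```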
Now, the challenge with static archives and the Linux linker is that, due to the link algorithm, you need to put the static archives in order from depender to dependee (e.g. if liba.a depends on libb.a, you need to ensure that libb.a comes later in the link command: ld liba.a libb.a; and if you have cyclic dependencies between static archives, you may need to list a library multiple times). lld and ld64 both have different algorithms that don't require that behaviour, though.
Bazel also does library linking by default (when targeting Apple platforms), and one issue we encountered was that it was possible to accidentally drop protocol conformances when nothing else referenced those symbols. For example, if a type is extended to conform to a protocol in a separate .swift file and that protocol conformance is only used via dynamic casts, then there would be no symbolic references to anything in the corresponding .o file in the static archive, and the linker would drop it from the linkage. (On Apple platforms this issue is resolved by using the -ObjC flag, which also forces any object with Swift metadata to be included, but the problem existed on Linux unless we did the equivalent force-load there.)
Does the Windows linker have different behavior in this case, or is it exposed to the same risk?
Oops, I definitely misread that as “code size of the build system”. Now it’s clear that SwiftPM is just missing features on Windows. Thanks for explaining!
I don't know enough about the new driver and the "new" incremental system to know if there are any problems with that; back in the Swift 4 days, the incremental build information wasn't precise enough to plan jobs up front, so the compiler declared that it would work on every file in every build and then chose to skip some if it didn't actually need them. The driver also tries to compile multiple files in one frontend process, rather than using one process per file, to avoid some of the shared costs of…imports, mostly, but this too can lead to uneven CPU usage. So you'd need someone familiar with the new driver and incremental build to say whether this up-front plan is reliably faster.
But it is empirically faster, so it could also be that SwiftPM / the driver is too smart for its own good.
I think that there might be some cases where Windows is subject to this risk. I would like to fix this, but the problem that has plagued me on that front is the difficulty of coming up with a reduced synthetic test case (I tried recently and was unable to reproduce the failing case). Windows is not as susceptible to this problem because the protocol conformances are coalesced and then referenced by the grouping for the registrar (SwiftRT-COFF.cpp).
It basically comes down to identifying the conformance in the SIL and ensuring that we mark the symbols as linker used, which is a portable enough technique that we should be able to apply it irrespective of the object file format.
It's not so much "resolved" there as "this is the best we could do with the linker we have". It would be more ideal if ld allowed -u symbol options to be present in the autolink commands, or there were some other way to precisely mark specific symbols in object files as being required so that they don't get dropped during static linking. When I last looked into this, I vaguely recall there being a way to do that with ELF object files, though I forget the details. We ideally wouldn't force all metadata to be loaded all the time, only that for which the protocol and/or type metadata is actually used dynamically.
In the "hermetically sealed" mode that @kubamracek developed for embedded Swift, we take all of the LLVM IR for a program together, and we keep conformance metadata based on which conformances' type and/or protocol descriptors have a use. This has the obvious drawback of needing to basically do an LTO-style build, but if a linker natively supported that sort of reverse dependency of the conformance on the type and protocol, we could take advantage of it.
The following test case exhibits the issue for me on Linux, so it would be interesting to know how it behaves on Windows: BUILD.bazel · GitHub
By putting the conformance to CustomStringConvertible in a separate file, which the standard library only looks up dynamically via String.init(describing:), this code will print MyType() instead of I got called! if library linking is used instead of object linking (the latter being controlled in Bazel via the alwayslink = True attribute).
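For reference, a minimal Swift sketch of the scenario described (the file names here are illustrative; the actual test case is in the linked gist):

```swift
// MyType.swift (in the library target)
public struct MyType {}

// MyType+Description.swift: the conformance lives in its own file, so the
// conformance record is the only interesting thing in the resulting object
// file, and nothing in the program references that file symbolically.
extension MyType: CustomStringConvertible {
  public var description: String { "I got called!" }
}

// main.swift (in the executable): the conformance is reached only
// dynamically, through String(describing:)'s runtime lookup.
print(String(describing: MyType()))

// Object linking (alwayslink = True in Bazel): prints "I got called!"
// Static-archive linking: the linker never extracts MyType+Description.o,
// the conformance record is dropped, and this prints "MyType()".
```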
Ah, that is not ideal - at least unless we can separate the idea for the metadata pruning from the hermetically sealed binaries.
@kateinoigakukun worked on a GSoC project to enable LTO, which ran into some (non-technical) issues that prevented the work from being fully realized. But I think that doing something like that might be possible with certain restrictions (e.g. it must use lld). A full, thick-LTO-style link would likely be cost prohibitive, but doing it for just the metadata references might be possible. We could do a thin-style LTO with callbacks into the compiler and then a thick LTO for the metadata.
Conceptually at least, stripping metadata shouldn't be dependent on any kind of special hermetically sealed mode. In a more typical build, you'd have to be mindful of public linkage for public types, but there are still plenty of private/internal/package-level types and protocols that can be effectively stripped.
Another thought I had along these lines was to introduce our own pre-linker, which could work on object files and dead-strip symbols that we know are dead based on Swift's semantics but which we can't generally communicate portably to linkers. I wonder if we could also solve @allevato's reverse problem of dynamically-used symbols getting stripped by stacking up artificial relocations that refer to the necessary symbols, like if there was a dummy symbol whose value was address of T: P conformance - address of T: P conformance + address of U: Q conformance - address of U: Q conformance and so on. I wouldn't expect a linker to anticipate that sort of technically-redundant set of relocations, so they probably wouldn't try to optimize them out.
Improving static linking on Windows is certainly on the roadmap (and hopefully @Alex_L might even be looking into some of that soon!). The nice thing about it, particularly with antimony, is that the developer is responsible for acting as the oracle and telling the build system what is best. You can mix dynamically and statically linked libraries so that re-used code is shared and non-shared code is statically linked.
Overall, I believe that the numbers here represent a baseline rather than a roofline, and there are still more opportunities to improve performance.
I think that using lld and the callback into the compiler as an LTO mode would be better. Rather than creating additional layers of tooling, reusing existing paths makes it easier to integrate with other systems. There is already a mechanism for injecting a pre-link step in the form of an LTO plugin; why would that not be preferable here?
Could we not just mark the symbol as llvm.linker.used and get the same preservation behaviour? What am I not considering? (It may be overly pessimistic in terms of retained symbols, but that can become an optimization that the linker performs by means of an "LTO" plugin.)