I spent a little bit of time this weekend looking into how to reduce the size of an installed Swift toolchain, and I thought it'd be worth starting a discussion here to see what ideas people have for improvements in this area. While I don't think shrinking the toolchain is necessarily a top priority, it does substantially benefit anyone who has a slow internet connection or is managing multiple installs.
Some ideas I've thought of so far:
Eliminate the duplicated SwiftPM subcommands: swift-build, swift-package, etc. are 5 single-file executables that all statically link the same libraries. It looks like we should be able to cut ~12% of the OSS toolchain size on some platforms by combining them into a single binary.
Similarly, I think it should be possible to fold swift-api-digester into swift-frontend alongside the modulewrap and symbolgraph tools to save another 10% or so. I haven't tried this yet though.
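To make the single-binary idea concrete, here is a minimal sketch of the common "multicall" pattern (the approach BusyBox uses). There is one executable, each tool name is installed as a symlink to it, and the binary dispatches on the name it was invoked under. The `Tool` enum, the `run…` stubs, and the exit code 64 are hypothetical placeholders, not SwiftPM's actual code. The tool list is illustrative.

```swift
// Hypothetical sketch of a multicall entry point; not SwiftPM's real implementation.

// One case per tool name that would be installed as a symlink to the shared binary.
enum Tool: String {
    case swiftBuild = "swift-build"
    case swiftPackage = "swift-package"
    case swiftTest = "swift-test"
    case swiftRun = "swift-run"
}

// Placeholder entry points standing in for each subcommand's real logic.
func runBuild(_ args: [String]) -> Int32 { 0 }
func runPackage(_ args: [String]) -> Int32 { 0 }
func runTest(_ args: [String]) -> Int32 { 0 }
func runRun(_ args: [String]) -> Int32 { 0 }

// Map argv[0] (possibly a full path such as /usr/bin/swift-build) to a tool.
func resolveTool(invokedAs argv0: String) -> Tool? {
    let name = argv0.split(separator: "/").last.map(String.init) ?? argv0
    return Tool(rawValue: name)
}

// Dispatch to the subcommand selected by the invocation name.
// Returns 64 (EX_USAGE in the BSD sysexits convention) for unknown names.
func dispatch(_ argv: [String]) -> Int32 {
    guard let argv0 = argv.first, let tool = resolveTool(invokedAs: argv0) else {
        return 64
    }
    let rest = Array(argv.dropFirst())
    switch tool {
    case .swiftBuild: return runBuild(rest)
    case .swiftPackage: return runPackage(rest)
    case .swiftTest: return runTest(rest)
    case .swiftRun: return runRun(rest)
    }
}
```

In the binary's `main.swift`, this would be wired up as `exit(dispatch(CommandLine.arguments))`, with Foundation (or Glibc/Darwin) imported for `exit`. The statically linked libraries are then paid for once, and each additional tool name costs only a symlink.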
I saw your commit for the SwiftPM change, and I think it's a good improvement.
However, I think the most effective space-saving improvements would be those that reduce the duplication of all of SwiftPM's fetched material. My computer warned me about running out of space the other day, so I searched the drive for SwiftPM checkouts (both those made by SwiftPM proper and those made by Xcode). Just by deleting those, I gained about 30 GB of space. It was the same half a dozen repositories cloned in full 40 times over across the file system.
If the edge cases involving simultaneous builds can be solved, it should be possible to massively reduce drive usage and continue to improve speed at the same time.
[global cache]/Repositories/[package] would hold the repositories.
When working from pins, any new clones are only made as shallow clones.
When resolving anew, any new clones are made as deep clones, and any existing shallow clones that are touched are turned into deep clones.
[global cache]/Checkouts/[package]/[commit] would hold the working copies of the sources, and each would only be created the first time it is needed. There would only ever be one for any package-commit combination, no matter how many packages elsewhere on the file system depend on it.
[global cache]/Products/[package]/[commit] would hold the build artifacts for that commit (before any cross-module optimizations). Just like sources, there would only ever be one for any package-commit combination, no matter how many packages elsewhere on the file system depend on it.
Dates of last use could be tracked, and the clear cache command could be parameterized to remove anything not touched in x amount of time, or to remove the oldest things until the cache has shrunk to x size.
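The layout and policies in the list above could be modelled roughly as follows. This is a minimal sketch under stated assumptions. `GlobalCache`, `CacheEntry`, the clone-depth helpers, and the eviction functions are all hypothetical names. Paths are plain strings, and timestamps are abstract integer seconds. The sketch does not touch the file system and does not address the simultaneous-build edge cases.

```swift
// Hypothetical model of the proposed global cache; none of these types exist in SwiftPM.
struct GlobalCache {
    let root: String

    // [global cache]/Repositories/[package]: one clone per package.
    func repository(_ packageName: String) -> String {
        "\(root)/Repositories/\(packageName)"
    }
    // [global cache]/Checkouts/[package]/[commit]: one working copy per package-commit pair.
    func checkout(_ packageName: String, commit: String) -> String {
        "\(root)/Checkouts/\(packageName)/\(commit)"
    }
    // [global cache]/Products/[package]/[commit]: build artifacts before cross-module optimization.
    func products(_ packageName: String, commit: String) -> String {
        "\(root)/Products/\(packageName)/\(commit)"
    }
}

enum ResolutionMode { case fromPins, fresh }
enum CloneDepth { case shallow, deep }

// Working from pins only needs the pinned commits, so a shallow clone suffices.
func depthForNewClone(_ mode: ResolutionMode) -> CloneDepth {
    mode == .fromPins ? .shallow : .deep
}

// A fresh resolution needs full history, so a touched shallow clone must be
// deepened (for example with `git fetch --unshallow`).
func needsDeepening(existing: CloneDepth, mode: ResolutionMode) -> Bool {
    existing == .shallow && mode == .fresh
}

struct CacheEntry {
    let path: String
    let sizeInBytes: UInt64
    let lastUsed: UInt64  // abstract seconds; an assumption for illustration
}

// Entries not touched within `maxAge` seconds of `now`.
// The `now > lastUsed` check short-circuits, so the subtraction cannot underflow.
func staleEntries(_ entries: [CacheEntry], now: UInt64, maxAge: UInt64) -> [CacheEntry] {
    entries.filter { now > $0.lastUsed && now - $0.lastUsed > maxAge }
}

// Oldest-first eviction until the total size fits within `limit` bytes.
func entriesToEvict(_ entries: [CacheEntry], limit: UInt64) -> [CacheEntry] {
    var total = entries.reduce(UInt64(0)) { $0 + $1.sizeInBytes }
    var evicted: [CacheEntry] = []
    for entry in entries.sorted(by: { $0.lastUsed < $1.lastUsed }) {
        if total <= limit { break }
        evicted.append(entry)
        total -= entry.sizeInBytes
    }
    return evicted
}
```

Keying checkouts and products by commit rather than by the depending package is what allows the 40 duplicate clones described earlier to collapse into one. Both eviction modes reduce to ordering entries by last use, so the two parameterizations of the clear-cache command could share a single timestamp record.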
Maybe this would make a good GSoC project for someone?