SwiftPM: 2x faster resolves, 3x smaller disk footprint

Me too, I’m trying to do my part and pushed a PR to address the crash in URLSession w/ libcurl.


Updated numbers: 5 runs each on macOS and Linux, results merged into percentiles as before. Most importantly, disk space usage is reduced thanks to symlinks, and `swift package update` is optimised so that the operation completes in 2-3 seconds for large projects.

Projects that show bigger .build sizes do so due to dependencies with submodules, Swift prebuilts, or, in SwiftLint's case, a binaryTarget dependency.

Cold resolve (shared SPM cache + .build/ + Package.resolved wiped - the CI scenario)

| Project | Deps | zip p50 | zip p75 | zip p99 | git p50 | git p75 | git p99 | Faster |
|---|---|---|---|---|---|---|---|---|
| spi-server | 67 | 62s | 67s | 70s | 92s | 93s | 96s | 1.3-1.5x |
| swiftpm-large-project | 48 | 51s | 54s | 55s | 93s | 97s | 104s | 1.7-2.0x |
| penny-bot | 47 | 42s | 47s | 49s | 76s | 77s | 86s | 1.6-2.0x |
| container | 29 | 29s | 30s | 31s | 40s | 40s | 44s | 1.3-1.5x |
| swift-composable-architecture | 17 | 15s | 17s | 18s | 16s | 16s | 19s | 0.9-1.3x |
| SwiftLint | 9 | 12s | 13s | 14s | 12s | 14s | 16s | 0.9-1.3x |
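For readers wondering how the Faster ranges relate to the percentiles: one plausible reading (an assumption on my part, not stated above) is that the low end is git p50 over zip p99 and the high end is git p99 over zip p50, rounded to one decimal. A quick check against three rows of the cold-resolve table:

```python
# One plausible derivation of the "Faster" column (an assumption, not stated
# in the post): low end = git p50 / zip p99, high end = git p99 / zip p50.
rows = {
    "spi-server":            (62, 70, 92, 96),   # zip p50, zip p99, git p50, git p99
    "swiftpm-large-project": (51, 55, 93, 104),
    "container":             (29, 31, 40, 44),
}
for name, (zip_p50, zip_p99, git_p50, git_p99) in rows.items():
    lo = round(git_p50 / zip_p99, 1)
    hi = round(git_p99 / zip_p50, 1)
    print(f"{name}: {lo}-{hi}x")
# spi-server: 1.3-1.5x
# swiftpm-large-project: 1.7-2.0x
# container: 1.3-1.5x
```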

Warm resolve (.build/ wiped, shared caches retained)

| Project | Deps | zip p50 | zip p75 | zip p99 | git p50 | git p75 | git p99 | Faster |
|---|---|---|---|---|---|---|---|---|
| spi-server | 67 | 9s | 10s | 10s | 18s | 19s | 20s | 1.8-2.2x |
| swiftpm-large-project | 48 | 8s | 9s | 10s | 31s | 32s | 33s | 3.1-4.1x |
| penny-bot | 47 | 6s | 7s | 10s | 14s | 14s | 17s | 1.4-2.8x |
| container | 29 | 3s | 3s | 4s | 18s | 18s | 21s | 4.5-7.0x |
| swift-composable-architecture | 17 | 2s | 2s | 3s | 4s | 5s | 5s | 1.3-2.5x |
| SwiftLint | 9 | 4s | 5s | 5s | 5s | 6s | 7s | 1.0-1.8x |

swift package update (on warm .build/)

| Project | Deps | zip p50 | zip p75 | zip p99 | git p50 | git p75 | git p99 | Faster |
|---|---|---|---|---|---|---|---|---|
| spi-server | 67 | 3s | 3s | 4s | 12s | 14s | 14s | 3.0-4.7x |
| swiftpm-large-project | 48 | 3s | 3s | 3s | 9s | 9s | 10s | 3.0-3.3x |
| penny-bot | 47 | 2s | 3s | 3s | 9s | 10s | 11s | 3.0-5.5x |
| container | 29 | 1s | 1s | 1s | 4s | 4s | 5s | 4.0-5.0x |
| swift-composable-architecture | 17 | 1s | 1s | 2s | 2s | 3s | 3s | 1.0-3.0x |
| SwiftLint | 9 | 0s | 1s | 1s | 1s | 1s | 2s | 1.0-2.0x |

.build/ disk usage

| Project | Deps | Source archives | Git | Reduction |
|---|---|---|---|---|
| spi-server | 67 | 1 MB | 1,496 MB | 1,496x |
| swiftpm-large-project | 48 | 193 MB | 1,830 MB | 9x |
| penny-bot | 47 | 1 MB | 1,437 MB | 1,437x |
| container | 29 | 163 MB | 893 MB | 5x |
| swift-composable-architecture | 17 | 1 MB | 209 MB | 209x |
| SwiftLint | 9 | 207 MB | 341 MB | 2x |

The large project:

```
$ du -sh .build-SA/*
4.0K	.build-SA/artifacts
964K	.build-SA/checkouts
82M	.build-SA/prebuilts
3.3M	.build-SA/repositories
193M	.build-SA/shallow-clones
408K	.build-SA/source-archives
36K	.build-SA/workspace-state.json
```

Breakdown:

```
3.3M	.build-SA/repositories/swift-lmdb-933a2802
166M	.build-SA/shallow-clones/github.com/apple/swift-protobuf/1.36.1-a008af1a
4.3M	.build-SA/shallow-clones/github.com/swift-otel/swift-otel/1.0.5-1a56b3a8
23M	.build-SA/shallow-clones/github.com/swift-server/swift-kafka-client/1.0.0-alpha.9-434af114
```

Current SPM:

```
$ du -sh .build-git/*
4.0K	.build-git/artifacts
840M	.build-git/checkouts
82M	.build-git/prebuilts
995M	.build-git/repositories
28K	.build-git/workspace-state.json
```
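The gap between the two layouts comes down to per-project checkouts being symlinks into a shared store, so each clone's bytes are counted only once. A toy sketch of that accounting (directory names mimic the listing above; the layout is illustrative, not SwiftPM's actual implementation):

```python
# Sketch: why symlinking checkouts into a shared store shrinks .build/.
# The directory layout below is illustrative only.
import os
import tempfile

root = tempfile.mkdtemp()
store = os.path.join(root, "shallow-clones", "foo")
checkouts = os.path.join(root, "checkouts")
os.makedirs(store)
os.makedirs(checkouts)

# 1 MiB of real data lives exactly once, in the shared store.
with open(os.path.join(store, "blob"), "wb") as f:
    f.write(b"\0" * (1 << 20))

# The per-project "checkout" is just a symlink into the store.
os.symlink(store, os.path.join(checkouts, "foo"))

store_bytes = os.path.getsize(os.path.join(store, "blob"))
# A symlink's own size is the length of its target path, a few dozen bytes.
link_bytes = os.lstat(os.path.join(checkouts, "foo")).st_size
print(store_bytes, link_bytes)
```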

> If someone sneaks extra code into GitHub archives specifically, the current way would catch it and the new way won't.

Considering the archives are built entirely by GitHub's API without involvement from the repository owners, it wouldn't be possible to sneak malware into an archive without changing the Git hash (which my approach will catch), or by compromising the code of GitHub itself. The latter would be an unprecedented disaster, and would affect all platforms that rely on the archiving method, as well as compromise all secrets and private repositories hosted there.

Yes, that's the concern! GitHub itself may be compromised, and it would be a disaster! That's why we have things like checksums and lockfiles, to mitigate the scope of the disaster!

I understand that checking something that "will never be wrong" like this feels superfluous. But we should not bake into our tools that GitHub can do no wrong, just because they're currently the biggest git host on the internet and owned by a large company. Like the language itself, it's our responsibility to make the thing people reach for first easy, safe, and performant.

If there's really no answer for this, perhaps it can be an option skipGitChecksByDownloadingArchive or similar. But I would not want it to be the default behavior, just like integer overflow is checked unless you specifically request it not be.
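For concreteness, the pin-and-verify check being argued for could be sketched like this (the lockfile wiring and all names here are hypothetical, not SwiftPM API):

```python
# Sketch of pin-and-verify for downloaded archives, assuming a SHA-256
# digest is recorded in the lockfile on first resolve. Names are made up.
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

archive = b"bytes of swift-foo-1.2.3.zip"   # stand-in for the real download
pinned = digest(archive)                     # recorded in the lockfile

# A later resolve recomputes the digest and compares before trusting it:
fetched = b"bytes of swift-foo-1.2.3.zip"
assert digest(fetched) == pinned, "archive checksum mismatch - refusing to unpack"

# Any tampering with the archive bytes changes the digest:
tampered = b"bytes of swift-foo-1.2.3.zip plus extra code"
assert digest(tampered) != pinned
```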


GitSafeSleepHub's code quality and project management have been on the decline for years now.

Just one example: https://github.com/actions/runner/issues/3792

tl;dr: Their "safe sleep" script was neither safe (it occasionally hung indefinitely) nor actually sleeping (it spun instead, keeping one thread at 100% usage). And they ignored the PR fixing the first part for 1.5 years.

There is an existing flag that will abort if the tag's sha has changed in case it is different from what was previously recorded:

  --resolver-fingerprint-checking <resolver-fingerprint-checking>
                          (default: strict)

Please stop bringing up tag/commit integrity checking as a response to commit/contents integrity concerns.


could we mitigate these problems by recording a hash of the zip file contents in the Package.resolved file? it would not protect against the case where GitHub is compromised at the exact moment you resolve the package for the very first time, but i would expect it to be an effective defense in the sense that if GitHub is compromised then everyone who happens to hold a lockfile from before it was compromised would be instantly alerted.
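for illustration, a pin carrying such a hash might look like this. the archiveChecksum key and every value below are made up; nothing like it exists in the current Package.resolved format:

```json
{
  "identity" : "swift-example",
  "kind" : "remoteSourceControl",
  "location" : "https://github.com/example/swift-example.git",
  "state" : {
    "archiveChecksum" : "sha256:placeholder-digest-recorded-at-first-resolve",
    "revision" : "placeholder-commit-hash",
    "version" : "1.0.0"
  }
}
```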

it’s not 100 percent airtight, but it would be a meaningful improvement, and personally, i’m getting frustrated with package registries being floated as the “more desirable” solution. registries are technically sound but they’re economically irrational, and i feel that the only real world impact that SE-0292 has had on the (open source) Swift ecosystem is by serving as a distraction that inhibits progress in this area.


They are only economically irrational if you are not the one paying for the operational cost of the git server.

nearly all of the packages people are complaining about with respect to resolution performance are popular open-source libraries hosted on GitHub with long git histories that make them heavy to download.

I almost forgot that GitHub already acts as a package registry: Introduction to GitHub Packages - GitHub Docs

So there is even less reason to optimize usage of GitHub's git server instead of creating a good package registry.

Not sure what you're saying, GitHub doesn't support the Swift Registry specification. They do host a lot of packages for other languages, but actual Swift Registry support never materialized due to some issue in the initial registry proposal process. Right now the biggest actual registry I know about is the one hosted by Tuist (which uses the Swift Package Index as the trigger source to rebuild, since they don't take direct uploads). Unfortunately, one big limitation of registries in SPM is that swift-syntax prebuilts don't work with it. Whether it's more likely to fix that issue or add support for another cache type, I'm not sure.


Just one more counterargument to the "package registries are economically irrational" claim.

Also Forgejo (which Codeberg uses) supports the current Swift package registry: Swift Packages Repository | Forgejo – Beyond coding. We forge.

Not sure I track. We also have tons of private repos which we can't publish there, and building, running, and maintaining our own registry is a non-zero cost. It'd be nice if that was faster out of the box.


I just tested the one on Codeberg, uploading worked fine but when I try to incorporate that package into one of my other projects, I get this error:

```
error: DecodingError.keyNotFound: Key 'name' not found in keyed decoding container. Path: metadata.author. Debug description: No value associated with key CodingKeys(stringValue: "name", intValue: nil) ("name").
```

I didn't add metadata during upload as it is optional. Same error with Swift 6.2.3 and 6.3.0 on Linux Mint 22.3.

Package I'm trying to add: Cyberbeni.LruCache - Codeberg.org

cc @dschaefer2

edit: This seems like a Forgejo issue, docs say that field is required: Documentation
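Based on the error path (metadata → author → name), a minimal metadata payload along these lines might satisfy the decoder; the field names are inferred from the error message rather than taken from the registry spec:

```json
{
  "author" : {
    "name" : "Cyberbeni"
  }
}
```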


See the Background: how we got here, especially the expand section.

It might be fine to use a package registry for a single project, or in an organisation whose entire code base has no GitHub dependencies; but when there are inter-dependencies and ~50 repositories to upload and manage, it becomes very non-trivial and decidedly not out of the box.