SPM Support for Binaries Distribution

BalestraPatrick · June 10, 2019, 7:44pm

Hello!

It's very exciting to finally see SPM being integrated in Xcode 11. Over the last few days I've seen it mentioned multiple times that SPM packages currently support only source code and unit tests. Support for assets and binaries needs to be explored by the community. Taking inspiration from some ongoing Twitter discussions (here and here), I've decided to start this forum thread to gather ideas on what it would take to support binary distribution in SPM.

Adding binary support to SPM would definitely increase adoption among bigger projects, which may have many dependencies and could otherwise suffer from longer build times, for example.

What do you think? What are the dependencies and restrictions of adding binaries distribution to SPM?

Braden_Scothern · June 10, 2019, 7:53pm

There were a few of us at try! Swift on Friday that started talking through a proposal and its behaviors. We will hopefully have a pitch for this very soon.

(We are currently getting it up on GitHub so we can keep working on it now that we are no longer together in person.)

Aciid · June 11, 2019, 12:32am

There are two orthogonal areas that we need to tackle for "binary support" in SwiftPM:

1. Local/Remote build caching for your package dependencies
2. Distributing closed-source prebuilt binaries as Swift packages

I think @Braden_Scothern is talking about 2, whereas @BalestraPatrick is talking about 1?

Braden_Scothern · June 11, 2019, 12:34am

Ya we started #2 and I’ve had some thoughts about #1 but am not actively working on it.

Thanks for helping clarify @Aciid

hartbit · June 11, 2019, 5:25am

I'm fairly sure @BalestraPatrick is talking about #2 as well. @Braden_Scothern, do you have the GitHub pitch available to look at?

tmpz · June 11, 2019, 8:21am

Hi all,

Could someone take the time to explain to me the difference between #1 and #2 and how they are orthogonal?

bielikb · June 11, 2019, 8:39am

Hi,

I might not give you the best answer here, but imho:
#1 is about building and caching open-source packages, similar to what Carthage does, so that packages don't have to be rebuilt after every project cleanup.
#2 is about supporting closed-source binaries in SwiftPM. This is essential for integrators and vendors alike, since binary support would remove the need for a third-party dependency manager, e.g. Carthage or CocoaPods.

tmpz · June 11, 2019, 9:35am

I have been maintaining Carthage and a distributed cache manager for Carthage (GitHub - tmspzz/Rome: Carthage cache for S3, Minio, Ceph, Google Storage, Artifactory and many others) and I don't see any difference between the two cases.

Whether a binary is vendored or produced from source shouldn't make any difference. The goal is to never build the same thing over and over again. For example, once Alamofire is built by the CI running on Alamofire's repo, that's pretty much it. No one in the world should have to rebuild that commit ever again.

Ideally whether it's a fresh clone of a project (think CI) or a simple re-run, what is actually built should be the minimum possible subset.

At this level of granularity I therefore don't see a difference between vendored binaries and others. If a binary is available, grab it and be done with it. If not, build from source. If no source, fail.
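
The fallback order described above (binary first, then source, then fail) can be sketched in a few lines of Swift. This is only an illustration under stated assumptions: the `Resolution` and `NothingToBuild` types and the `resolve` function are hypothetical names invented for this sketch and are not part of SwiftPM, Carthage, or Rome.

```swift
// Hypothetical sketch: none of these types are SwiftPM API.
enum Resolution: Equatable {
    case usePrebuilt(String)      // location of a cached or vendored binary
    case buildFromSource(String)  // location of the sources
}

struct NothingToBuild: Error, Equatable {
    let package: String
}

/// Prefer an existing binary, fall back to the sources, otherwise fail.
func resolve(package: String, binary: String?, source: String?) throws -> Resolution {
    if let binary = binary {
        return .usePrebuilt(binary)
    }
    if let source = source {
        return .buildFromSource(source)
    }
    throw NothingToBuild(package: package)
}
```

In this sketch a vendored closed-source binary and a cached build product take the same path; they differ only in whether `source` is available as a fallback.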

bielikb · June 11, 2019, 9:54am

I agree with the above,

yet the main distinction here is that Swift Packages != xcframeworks, so they require different handling, distribution, and caching strategies. There are also some constraints, e.g. an xcframework can't depend on a Swift package, etc.
I'm happy for the SwiftPM engineers and the people from try! Swift to bring clarity to the two points mentioned earlier.

Best,
Boris.

BalestraPatrick · June 11, 2019, 12:54pm

As @tmpz explained, I am actually talking about both. I don't see a big technical difference between the two. Sure, open-source package binaries would need to be hosted somewhere on the open internet (GitHub, Artifactory, you name it), but we should also support closed-source binaries that could be stored on any arbitrary storage. This would allow building a package only once from source and storing it forever as a binary (with related artifacts and metadata if needed), in the form of an XCFramework for example.

hartbit · June 11, 2019, 1:22pm

The difference is that #2 can be done first, before #1, and that without #2, users who rely on dependencies vendored solely as pre-built binaries can't replace CocoaPods/Carthage with SwiftPM.

tmpz · June 11, 2019, 1:44pm

Thanks for the explanation. I would consider #2 the base for #1 then, definitely not orthogonal problems imho.

Aciid · June 11, 2019, 2:04pm

These might look like the same problem on the surface, but they come with different sets of problems. Let me try to give some context:

Local/Remote build caching for your package dependencies

This basically means the sources of the dependencies are available to clients, but they don't want to rebuild them after performing dependency resolution. One solution could be: SwiftPM builds and stores the build products of the dependencies in some shared location and shares them across all projects on the system that use this dependency (when possible). SwiftPM would be responsible for things like building with the right client-specific specialization and cache eviction. This mostly doesn't even require manifest API additions.

Alamofire is a bit of a trivial example as it doesn't have further dependencies. Imagine a package Foo that depends on package Bar from 1.0.0 up to, but not including, 2.0.0 (.upToNextMajor(from: "1.0.0")). You don't really know what package Bar is going to resolve at for your clients and since SemVer means API stability but not ABI stability, it's not safe to cache build products of Bar at a particular version in that range. This is a good thing. Package authors who provide source shouldn't really need to concern themselves with ABI stability. The package manager can handle all these things for you and cache the build products accordingly.
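
For concreteness, a manifest for the Foo package described above might look roughly like the following. The package layout and the repository URL are placeholders assumed for illustration; only the `.upToNextMajor(from: "1.0.0")` requirement comes from the example itself.

```swift
// swift-tools-version:5.0
import PackageDescription

let package = Package(
    name: "Foo",
    products: [
        .library(name: "Foo", targets: ["Foo"]),
    ],
    dependencies: [
        // Accepts any Bar release >= 1.0.0 and < 2.0.0.
        // The URL is a placeholder, not a real repository.
        .package(url: "https://example.com/Bar.git", .upToNextMajor(from: "1.0.0")),
    ],
    targets: [
        .target(name: "Foo", dependencies: ["Bar"]),
    ]
)
```

Which version of Bar a client actually gets is decided when the client's whole graph is resolved, so the same Foo release can end up compiled against different Bar versions in different projects.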

Distributing closed-source prebuilt binaries as Swift packages

In this case, we have different problems to solve. We need a manifest API to declare how binary artifacts are laid out and where to fetch them from. The build system can't do much at the client side. The package author is (mostly) responsible for making sure that the right artifacts are provided. Another problem is if these packages vendor or declare dependency on another package, this effectively means SwiftPM can only resolve to a single version if the other package appears multiple times in the graph (because of the same ABI issue). Or, you could be in a worse situation if the prebuilt binary statically links a package that you're also using in your app. Sure, you could use this for build caching but it'll be much better to have a separate feature for that. There is also a security aspect, since it's more difficult to inspect binaries than sources.
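
For reference, no such manifest API existed when this thread was written; one later shipped in SwiftPM 5.3 (SE-0272) as `binaryTarget`. The sketch below shows that later form only as an illustration of what "declare where to fetch them from" can look like. The package name, URL, and checksum are placeholders, not real values.

```swift
// swift-tools-version:5.3
import PackageDescription

let package = Package(
    name: "VendorSDK",
    products: [
        .library(name: "VendorSDK", targets: ["VendorSDK"]),
    ],
    targets: [
        // Placeholder URL and checksum. The checksum is the SHA-256 of the
        // zip archive, as printed by `swift package compute-checksum`.
        .binaryTarget(
            name: "VendorSDK",
            url: "https://example.com/VendorSDK-1.0.0.xcframework.zip",
            checksum: "<sha256-of-the-zip-archive>"
        ),
    ]
)
```

The checksum is one answer to the security concern raised above: the client can verify that the downloaded archive is the one the package author published, even though it cannot inspect the contents as source.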

Braden_Scothern · June 11, 2019, 2:25pm

I just got the github link and will post it later today (I’m not the owner of it on github).

Braden_Scothern · June 11, 2019, 2:29pm

I am hoping to have time to work on #1 after finishing #2. It will depend on other proposals that I am also starting since try! swift that I want to share soon as well.

tmpz · June 11, 2019, 2:45pm

Thanks for the detailed explanation! I have a few more questions, if you don't mind.

You don't really know what package Bar is going to resolve at for your clients

Why is that? Once the dependency graph is resolved, this question is answered.

and since SemVer means API stability but not ABI stability, it's not safe to cache build products of Bar at a particular version in that range.

Can you elaborate on this? I think it's perfectly fine to cache build products per toolchain version in case the binary is not ABI stable. In case it is, then there's no issue.

In this case, we have different problems to solve. We need a manifest API to declare how binary artifacts are laid out and where to fetch them from.

Why not use the same format to describe the cache? I would not assume the cache is system-global or even lives on the same system where SwiftPM is running. I would have a two-level cache system, remote and local (system-global). This is particularly important to achieve what is, in my opinion, the most important goal: never build a binary again if it has already been built somewhere in the world.
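
A minimal sketch of the two-level lookup described above, with the toolchain version folded into the cache key as suggested earlier in this post. All names here (`CacheKey`, `InMemoryStore`, `TwoLevelCache`) are hypothetical and exist only for this illustration; real stores would be a directory on disk and a remote service such as S3 rather than in-memory dictionaries.

```swift
// Hypothetical sketch: these types are illustrative, not SwiftPM API.
struct CacheKey: Hashable {
    let package: String
    let version: String    // resolved version, e.g. "1.4.2"
    let toolchain: String  // compiler version the product was built with
}

// Stand-in for a real store (local directory, S3 bucket, ...).
struct InMemoryStore {
    private var artifacts: [CacheKey: String] = [:]

    func fetch(_ key: CacheKey) -> String? {
        return artifacts[key]
    }

    mutating func store(_ artifact: String, for key: CacheKey) {
        artifacts[key] = artifact
    }
}

struct TwoLevelCache {
    var local: InMemoryStore
    var remote: InMemoryStore

    /// Look in the local cache, then the remote cache, and build only on a miss in both.
    mutating func artifact(for key: CacheKey, build: () -> String) -> String {
        if let hit = local.fetch(key) {
            return hit
        }
        if let hit = remote.fetch(key) {
            local.store(hit, for: key)  // keep a local copy for next time
            return hit
        }
        let built = build()
        remote.store(built, for: key)   // share the result with everyone else
        local.store(built, for: key)
        return built
    }
}
```

Because the toolchain is part of the key, a product built with one compiler version is never handed to a client using another; that client simply misses the cache and builds its own entry.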

Another problem is if these packages vendor or declare dependency on another package, this effectively means SwiftPM can only resolve to a single version if the other package appears multiple times in the graph

To my understanding this is the case already even for source.

Or, you could be in a worse situation if the prebuilt binary statically links a package that you're also using in your app.

I see the problem here. Wouldn't it work to just distribute it as a relocatable object file, with no linking performed?

Sure, you could use this for build caching but it'll be much better to have a separate feature for that.

I would prefer to use the same building blocks, with all information made explicit and accessible. Any piece of information that is not in the manifest but is private knowledge of SwiftPM will effectively prevent others from building tools that supplement SwiftPM.

FranzBusch · June 11, 2019, 8:09pm

Regarding #2: at try! Swift San Jose, @ddunbar, @Braden_Scothern and I started an initial draft for binary dependencies. So far we have laid out some syntax for defining a binary target and using binary dependencies, and addressed some security concerns. The proposal still needs some work before we can pitch it, but we hope to get it to a presentable level soon.

SDGGiesbrecht · June 11, 2019, 8:42pm

I don’t think that should ever become a general assumption about how packages will be used.

I understand that it is useful to speed up development‐time tasks, so it would probably be a good idea in --configuration debug.

Ultimately though, a package’s products will be executed many more times than they are built or linked. I would much rather that --configuration release always built the entire dependency tree from scratch and applied an as‐yet hypothetical whole package optimization. For release builds the compiler could optimize across module and even package boundaries. Inlining could be done all the way through to the private methods of a dependency. Generic functions could be specialized once and provided even for types declared in a sub‐sub‐sub‐client. Dependencies could be dead stripped even to the point of removing entire sub‐dependencies, even dropping dynamic library products that would go unused. While no work has gone into this yet (that I know of), this direction has been hinted at since the earliest days of the package manager. And all these wonderful improvements would be blocked by a design that assumes a model of pre‐built dependencies.

(Closed‐source is closed‐source. None of what I said really applies to option 2.)

tmpz · June 11, 2019, 9:18pm

Great ideas. I totally agree that caching should be opt-in.

ddunbar · June 13, 2019, 6:15pm

This is a bit of a tangent, but note that the way we would typically approach this in LLVM is that the "build" of the initial package would just produce an intermediate form. That form could still be cached (and "never ever built again") and would capture much of the slow compilation work, while still allowing full program optimization to be done at link time. See also things like ThinLTO.