[PITCH] Support for binary dependencies

I would love to have conditional dependencies... but it is actually source dependencies I want to make conditional (and to be able to do the same on a per-target basis). I wouldn’t like a world where people are (inadvertently) encouraged to switch to binaries in order to get conditions because the new binary API has them but the existing source API doesn’t yet. I see conditionals as valuable, but I think they belong in a separate proposal (which could come either before or after this one).


A few SwiftPM contributors (@abertelrud, @Aciid, @ddunbar, @rballard, and me) met to discuss this proposal and came away with the following observations:

  • Binary targets should correspond closely to source-based targets, which means their output would be object files as opposed to dynamic/static libraries. It would also mean that the name of a binary target is equal to its module name. This improves consistency with existing targets and eliminates problems posed by static libraries (such as linking the same one into multiple dynamic images in the same process).
  • A binaryTarget declaration could specify a hash of the binary artifact to ensure the downloaded file matches the package manifest author's expectation (see the sketch after this list).
  • If an opt-in mechanism is used, adding a new binary target to an existing package requires a major version bump, because it will require any clients to change their manifests to include the opt-in.
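
As a rough illustration of that second point, a declaration could look something like this (the exact parameter names are hypothetical; the pitch doesn't settle the spelling):

.binaryTarget(
    name: "CoolSDK",                              // must equal the module name
    url: "https://example.com/CoolSDK-1.0.0.zip", // hypothetical artifact URL
    checksum: "6d98c8..."                         // hash the downloaded file must match
)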

The current proposal is also quite vague about the archives and formats (e.g. can the archive be a tarball?); it would be good to make this more concrete.


I agree that this seems orthogonal to this proposal. That should be a separate proposal that supports any kind of package, especially since this proposal only introduces binary targets; strictly speaking, there are no binary dependencies.


While I understand the goal of aligning this with the outputs of source-based targets, I think it would severely limit community adoption, if I understand it correctly. It would basically mean that we expect a .a file as the binary format. A lot of binary-distributed frameworks in the iOS world contain resources, and with XCFrameworks that would be allowed as well. Nevertheless, I see that we will have linking problems that result in duplicate symbols, e.g.:

Target A (source):

  • Depends on B and C

Target B (binary/dynamic):

  • Depends on C

Target C (source)

@NeoNacho Is our goal to prohibit this kind of scenario?
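
For concreteness, a manifest sketch of that scenario (target names from above; the binaryTarget spelling is hypothetical). Since the prebuilt B was already linked against C, and A links C from source as well, C's symbols end up in the process twice:

targets: [
    .target(name: "A", dependencies: ["B", "C"]),
    .binaryTarget(name: "B",                        // prebuilt dynamic library, already contains C
                  url: "https://example.com/B.zip", // hypothetical artifact URL
                  checksum: "<hash of B.zip>"),
    .target(name: "C"),                             // built from source again, linked into A
]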

What problem do we want to solve with this? The author of the package could swap the binary out at any time, and we would record a binary hash in the .resolved file anyhow. Could you elaborate a bit here, please?

That's completely true, and I haven't thought about the SemVer implications yet, though I think requiring a major version bump when a new binary dependency is added is not the worst outcome.

My first implementation would have been based on ZIP archives, but supporting different formats might make sense. I will add a section to the proposal.


Then I would really like to see resources work with source targets first (or at the same time). Otherwise it will unnecessarily end up encouraging everyone to switch to binaries.


We would expect one (or more) object files (.o), which should be more flexible, because the product can then determine the linkage again, similar to how it does for source-based targets. It would mean that vendors of existing frameworks would likely need a separate build to support SwiftPM, but I don't think it is necessarily a goal that vendors can reuse existing artifacts.
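
To illustrate "the product can then determine the linkage": in today's manifests that decision already lives on the product rather than the target, and binary targets made of .o files would slot into the same mechanism:

// Existing SwiftPM API: the product picks static vs. dynamic linkage.
products: [
    .library(name: "MyLib", type: .dynamic, targets: ["MyLib"]), // or .static; omit `type` to let the build system decide
]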

Regarding resources, since Swift packages don't support them at all, I don't think this proposal should concern itself with that at the moment.

I believe this is a common practice in many package managers (e.g. Homebrew) to make it more difficult to inject a malicious binary: an attacker would need to compromise both the server that provides the binary artifacts and the GitHub repo that provides the package manifest. A secondary reason is that the server providing the binary might be out of the package author's control, and this way we can ensure that the expected binary is used.

Agreed, but this could also be worth spelling out in the proposal, so that everyone is on the same page.


Thanks for expanding on these points. I will work the above-mentioned points into the proposal over the next few days; I just need to find time for it.

@Ryan_Wilson It would be interesting to get your feedback on the .o file approach from a Firebase point of view. Could your whole suite of frameworks work with just .o files? What frameworks rely on resources?

A few of the Firebase frameworks rely on resources: InAppMessaging, Firestore's gRPC dependency, and some of the FirebaseML frameworks.

Also, almost all of the Firebase frameworks share public headers as part of their frameworks.

I think it's possible to keep support for binaries orthogonal to support for resources.

By using .o files (or possibly a single pre-linked .o file) for binary support, we can allow resources for binary-based packages to be specified the same way as for source-based packages (which will be covered by a separate proposal) and let the build system assemble the final artifact. For some clients on some platforms, this might mean constructing an embedded framework; in other cases, it might mean statically linking in the code and copying the resources into the client. The tricky thing there will be to make sure that the code can access the resources even though it has already been compiled, but that should be solvable if there is a standard approach for doing this (defined in the resource proposal).
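
Purely as a hypothetical sketch of what such a "standard approach" could look like (the real shape would be defined by the resources proposal; nothing here is proposed API), the build system could generate an accessor that records wherever it actually placed a target's resources:

// Hypothetical generated code, one per target with resources.
import Foundation

extension Bundle {
    // Filled in by the build system with the location it chose for this
    // target's resources (an embedded framework, a folder next to the
    // executable, etc.), so already-compiled code can still find them.
    static let packageResources = Bundle(path: "/path/chosen/at/build/time")!
}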

Supporting binaries through .o files rather than a finished runtime artifact lets us support both static and dynamic linking into the client, and avoids making the binary package have to bake in assumptions about the nature of the runtime artifact. And by using pre-linking (using ld -r) and vending just a single .o, the names of private externs and other possibly sensitive symbols can be removed from the binary artifact before it's distributed.
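
As a sketch of that prelinking step, driven from Swift (object file names are hypothetical; assumes Apple's ld, where -r merges the inputs into one relocatable object and -x omits non-global symbol names from its symbol table):

// prelink.swift: a vendor-side packaging step, not part of SwiftPM itself
import Foundation

let ld = Process()
ld.executableURL = URL(fileURLWithPath: "/usr/bin/ld")
ld.arguments = ["-r", "-x",       // merge inputs + drop non-global symbol names
                "Foo.o", "Bar.o", // the target's compiled objects (hypothetical)
                "-o", "MyLib.o"]  // the single .o to distribute
try ld.run()
ld.waitUntilExit()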

There are some issues to figure out, but I don't think that using .o files would preclude inclusion of resources.


Supporting binaries through .o files rather than a finished runtime artifact lets us support both static and dynamic linking into the client, and avoids making the binary package have to bake in assumptions about the nature of the runtime artifact.

Will you explain why that is? The build tools support frameworks whether the included library is statically or dynamically linked.

It seems a loss of useful abstraction if library providers can no longer wrap the binaries, headers, and resources into a single entity.

Even though the linker supports the presence of a static library inside a framework, nothing happens by default with the resources inside such a framework. So they would still need to be copied from the framework to the client, with the resulting possibility of name collisions etc.

It seems to me that the proper abstraction here is not at the artifact level, but at the package target level, where there will be a defined way to provide resources etc (in a separate proposal).

This also avoids having to provide both a dynamic and a static version of the framework, since .o files could be linked as either static or dynamic (which an already-linked binary cannot).

Taking a step back, if the goal of binary packages is to avoid having to expose source code, then it seems that providing .o files instead of source is the right level to solve the problem, since it leaves as much flexibility as possible for the build system.


Aren't there problems with what symbols are expected to be in the .o files?
What about debug information? Symbolication information?

Thanks for posting this proposal! Here is my feedback:

The allowsBinary attribute and security impact

The allowsBinary attribute and its associated requirements are a good idea in theory, but I think the practical implications would make this too inconvenient as it is currently proposed, for a number of reasons:

  1. It's very unlikely a developer actually wants to think about applying this attribute to specific subparts of the dependency graph, because the position of the binary dependencies in the graph is not the interesting part. It's the fact that they are in the graph at all.

  2. It introduces a heavy burden on consumers of a binary package, who have to adopt the allowsBinary flag up through the entire dependency chain, which could be several levels deep. This could make it more difficult for package authors to add useful (perhaps optional) functionality that depends on binary packages, for fear of breaking clients. Further, because the allowsBinary attribute is on the package dependency while binary targets are a kind of target, a dependency on a package doesn't necessarily mean the binary target is being used by the client, so the allowsBinary requirement is overbearing: it gates on the potential for use rather than actual use (see the sketch after this list).

  3. Lastly, there is a bit of a security hole: because allowsBinary effectively grants trust to the entire dependency subgraph to which it is applied, the addition of previously-unknown binary packages within that subgraph will NOT require any action from the developer of the root package, which is undesired. For example, consider the dependency chain App => A => B => C => D => E. App may trust A by marking its dependency allowsBinary, but if evil package E adds a dependency on a malware package M, allowsBinary wouldn't actually protect App from this new package being added to the graph, i.e. nothing about the proposed design would actively alert them.
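
For reference, point 2 assumes the attribute sits on the package dependency roughly like this (the exact spelling is taken loosely from the pitch and may differ):

// Every package along the chain to the binary target would need this flag,
// even if the client never actually uses the binary target.
dependencies: [
    .package(url: "https://github.com/Core/libCore.git", from: "1.0.0", allowsBinary: true),
]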

I propose an alternative design based on the primary goal of forcing the developer to consciously acknowledge the addition of a package containing binary dependencies to the graph, which would solve all of these problems and also be less intrusive for both package consumers and developers.

Instead of making allowsBinary part of the manifest, we should rely on the SwiftPM configuration file introduced by SE-0219.

By default, all packages in the dependency graph which contain binary targets would be blacklisted unless specifically whitelisted in the configuration file.

Whitelisting of a particular package could be managed using the following commands:

# Add a package potentially containing binary targets to the whitelist
$ swift package config binary-package-whitelist --add \
      https://github.com/Core/libCore.git

# Remove a package potentially containing binary targets from the whitelist
$ swift package config binary-package-whitelist --remove \
      https://github.com/Core/libCore.git

# Set the whitelist to include all possible packages (wildcard)
$ swift package config binary-package-whitelist --add-all

# Remove all entries from the whitelist
$ swift package config binary-package-whitelist --clear

The config file might look like:

{
  ...
  "binary-package-whitelist": [
    "https://github.com/Core/libCore.git"
  ],
  "version": 2
}

Package resolution could also interactively prompt the user to add packages containing binary targets to the whitelist, to reduce the overhead of maintaining the whitelist manually.

$ swift build
WARNING: https://github.com/evilcorp/intrusive-analytics.git contains binary targets, do you want to add it to the whitelist? [y/n]

Note that the presence of a package in the whitelist would NOT apply to its dependencies, unlike the allowsBinary attribute. Every single package containing binary targets would be required to be present in the whitelist in order for the root package to be resolved. This ensures that newly added binary package dependencies are always consciously whitelisted when they are added.

Binary packages will likely become quite popular, and we run the risk of culturally normalizing addition of the allowsBinary attribute: eventually people may just add it to their packages pre-emptively to avoid "blocking" downstream packages from adding binary package dependencies of their own, or copy-paste boilerplate from installation instructions without a second thought. But if we force acknowledgement on a package-by-package basis, we would go a lot further in keeping developers safe from unwanted content making it into their dependency graph.

In summary, allowsBinary forces you to trust an entire subgraph (and people often won't take the time to audit that). A whitelist-based approach focuses you on whether to trust the specific individual packages that actually have security impact, and forces you to acknowledge them more consciously, because you'd have to do so on a case-by-case basis.


A related thought: a similar problem exists with regard to software licenses, in terms of controlling and acknowledging attributes of the content in the dependency graph. For many developers, certain licenses, such as the GPL or a proprietary or non-OSI-approved license, may be problematic.

Having a manifest API flag for each of these conditions (allowsBinary, allowsGPL, etc.) could lead to an unwieldy explosion of APIs for solving problems of this nature, while a configuration-file-based approach seems far more scalable and has greater flexibility of configuration (e.g. for a license restriction, you'd probably want to blacklist certain license types rather than whitelist certain packages).

Artifact conditions and architectures

I recognize the need to conditionalize binary artifacts based on architecture, but this opens a massive can of worms once we start thinking about multiple platforms and especially Linux.

More generally, it's not so much the architecture we need to be concerned about, but the ABI (i.e. what an LLVM target triple roughly describes). For example, macOS and Mac Catalyst binaries are both x86_64 architecture, but they have different, incompatible ABIs (described by the LLVM target triples x86_64-apple-macos vs. x86_64-apple-ios-macabi) and can't be linked together.

I've had a lot of experience with this specific issue as it pertains to cross-platform build systems, and I think trying to map a single architecture meaningfully across platforms is simply not possible. The precise meaning of the two proposed values "arm" and "x86" has also not been defined: for example, 32-bit vs. 64-bit, how incompatible sub-architectures and variants come into play, etc.

For Apple's platforms we have 6 different architectures across our ABIs: armv7k, arm64, arm64e, i386, x86_64, and x86_64h. There is no way to map those 1:1 with those of other platforms, like ARM / ARM64 / X86 / X64 on Windows and armeabi-v7a / arm64-v8a / x86 / x86_64 on Android, or the dozens of Linux ABIs which are often incompatible even within the same architecture.

@lukasa's first reply and @compnerd's post do an excellent job of explaining how difficult this issue is on Linux in particular, especially for ARMv7.

I would lean towards taking an approach where we eliminate the architecture conditional from the manifest and instead rely on the binary artifact packaging described in the Binary Target Artifact Format section of the proposal. For each major platform family:

Apple

We use XCFrameworks. A single XCFramework can support any or all of the ABIs relevant for Apple OSes (macOS, iOS, tvOS, watchOS) as well as the special cases of Mac Catalyst and DriverKit. Done.

Windows

Nest artifacts inside architecture folders using the same naming convention as Microsoft does: ARM, ARM64, X86, X64. Done.

Android

As with Windows, nest artifacts inside architecture folders using the platform naming convention: armeabi-v7a, arm64-v8a, x86, x86_64. Done.

Note that armeabi (armv5), mips, and mips64 have been obsolete since NDK r17; let's not worry about those.
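
For illustration, an unpacked Android artifact using that convention might look like this (layout and names invented):

MyLib.artifact/
    armeabi-v7a/
        MyLib.o
    arm64-v8a/
        MyLib.o
    x86/
        MyLib.o
    x86_64/
        MyLib.o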

Linux

Given the difficulty of dealing with the Linux issue, I almost wonder if we should defer binary target support on Linux specifically. It's incredibly uncommon for developers to rely on binary-only dependencies on Linux (given that such dependencies are proprietary), and for everything else there's always the system's package manager (apt, yum, etc.).

I would like to see a compelling argument for why we absolutely need binary packages on Linux when we have the system package manager to handle the vast majority (if not close to 100%) of those use cases, including for closed-source proprietary packages (of which there are few to begin with).


The overall approach I described above would also be compatible with the suggestion @NeoNacho mentioned regarding use of object files as the binary artifacts (which I agree makes sense).

Nitpicks/other

One of the use cases listed is: "A large company has an internal team which wants to deliver a Swift package for use in their iOS applications, but for security reasons cannot publish the source code."

As someone with a bit of a security background, I find this a little bothersome. I would prefer it say "for business reasons". Calling it security-related encourages the fallacy of security by obscurity, which is a harmful idea.


Debug information is stored as DWARF in the .o files. .dSYM files and other post-linking products can then be created based on this information so that the debugging information doesn't have to be included in the linked binary, but with the default toolchain it's all there in the .o files.

What wouldn't be there would be the list of libraries and other linker inputs, but that would be in the package manifest just as if source files had been provided.

.o files are really the closest drop-in replacement we have for source files, leaving the other parts of the package as close to source form as possible.


I like the whitelisting idea. What do you think of combining it with allowsBinary and using it as the entry point to trigger the prompt?

In the fullness of time I’d like to see the package manager be able to distribute pre-compiled binaries of source-available packages on Linux expressly to allow us to enable compiling awkward packages with weird build dependencies in SwiftPM. However, I agree that much more functionality is required to enable that to work, so for now I am in agreement that we should simply punt on Linux support.

I don't think we should have the allowsBinary flag at all, but could you elaborate on why you think combining it with the whitelist would be useful?

My concern is with a global whitelist. If I'm not mistaken, that is what your proposed solution uses: as a consumer, all my projects would now consult this list and silently use it.

allowsBinary could still be useful as a per-package explicit opt-in to the global whitelist.

Not sure it's the best way though.

Big +1 on the whitelist system: convenient yet flexible. Also, I love the pragmatic approach to architectures. I might be biased because I use SwiftPM only for Apple platforms, but using the new XCFramework format would be awesome.