[PITCH] Support for binary dependencies

A few of the Firebase frameworks rely on resources - InAppMessaging, Firestore's gRPC dependency, and some of the FirebaseML frameworks.

Also, almost all of the Firebase frameworks share public headers as part of their frameworks.

I think it's possible to keep support for binaries orthogonal to support for resources.

By using .o files (or possibly a single pre-linked .o file) for binary support, we can allow resources for binary-based packages to be specified the same way as for source-based packages (which will be covered by a separate proposal) and let the build system assemble the final artifact. For some clients on some platforms, this might mean constructing an embedded framework; in other cases, it might mean statically linking in the code and copying the resources into the client. The tricky thing there will be to make sure that the code can access the resources even though it has already been compiled, but that should be solvable if there is a standard approach for doing this (defined in the resource proposal).

Supporting binaries through .o files rather than a finished runtime artifact lets us support both static and dynamic linking into the client, and avoids making the binary package have to bake in assumptions about the nature of the runtime artifact. And by pre-linking (via ld -r) and vending just a single .o, the names of private externs and other possibly sensitive symbols can be removed from the binary artifact before it's distributed.
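As a rough sketch of that pre-linking step (assuming Apple's toolchain and hypothetical file names; exact flags vary elsewhere):

# Combine all of the package's object files into one relocatable object
$ ld -r -o MyLibrary.o build/*.o

# Strip the names of non-global (private) symbols before distribution
$ strip -x MyLibrary.o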

There are some issues to figure out, but I don't think that using .o files would preclude inclusion of resources.

Supporting binaries through .o files rather than a finished runtime artifact lets us support both static and dynamic linking into the client, and avoids making the binary package have to bake in assumptions about the nature of the runtime artifact.

Will you explain why that is? The build tools support frameworks whether the included library is statically or dynamically linked.

It seems a loss of useful abstraction if library providers can no longer wrap the binaries, headers, and resources into a single entity.

Even though the linker supports the presence of a static library inside a framework, nothing happens by default with the resources inside such a framework. So they would still need to be copied from the framework to the client, with the resulting possibility of name collisions etc.

It seems to me that the proper abstraction here is not at the artifact level, but at the package target level, where there will be a defined way to provide resources etc (in a separate proposal).

This also avoids having to provide both a dynamic and a static version of the framework, since .o files could be linked as either static or dynamic (which an already-linked binary cannot).
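To illustrate that flexibility (a sketch using Apple's tools and a hypothetical MyLibrary.o; the build system would do the equivalent internally):

# The same object file can be packaged either way:
$ libtool -static -o libMyLibrary.a MyLibrary.o        # static library
$ clang -dynamiclib -o libMyLibrary.dylib MyLibrary.o  # dynamic library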

Taking a step back, if the goal of binary packages is to avoid having to expose source code, then it seems that providing .o files instead of source is the right level to solve the problem, since it leaves as much flexibility as possible for the build system.

Aren't there problems with what symbols are expected to be in the .o files?
What about debug information? Symbolication information?

Thanks for posting this proposal! Here is my feedback:

The allowsBinary attribute and security impact

The allowsBinary attribute and its associated requirements are a good idea in theory, but I think the practical implications would make this too inconvenient as it is currently proposed, for a number of reasons:

  1. It's very unlikely a developer actually wants to think about applying this attribute to specific subparts of the dependency graph, because the position of the binary dependencies in the graph is not the interesting part. It's the fact that they are in the graph at all.

  2. It introduces a heavy burden on consumers of a binary package, who have to adopt the allowsBinary flag up through the entire dependency chain, which could be several levels deep. This could make package authors reluctant to add useful (perhaps optional) functionality that depends on binary packages, for fear of breaking clients. Further, because the allowsBinary attribute is on the package dependency while binary targets are a kind of target, a dependency on that package doesn't necessarily mean the client actually uses that target. The allowsBinary requirement is therefore overbearing: it gates the potential for use rather than actual use.

  3. Lastly, there is a bit of a security hole: because allowsBinary effectively grants trust to the entire dependency subgraph to which it is applied, the addition of previously-unknown binary packages within that subgraph will NOT require any action from the developer of the root package, which is undesired. For example, consider the dependency chain App => A => B => C => D => E. App may trust A by marking its dependency allowsBinary, but if evil package E adds a dependency on a malware package M, allowsBinary wouldn't actually protect App from this new package being added to the graph, i.e. nothing about the proposed design would actively alert them.

I propose an alternative design based on the primary goal of forcing the developer to consciously acknowledge the addition to the graph of any package containing binary dependencies. This would solve all of these problems and also be less intrusive for both package consumers and developers.

Instead of making allowsBinary part of the manifest, we should rely on the SwiftPM configuration file introduced by SE-0219.

By default, all packages in the dependency graph which contain binary targets would be blacklisted unless specifically whitelisted in the configuration file.

Whitelisting of a particular package could be managed using the following commands:

# Add a package potentially containing binary targets to the whitelist
$ swift package config binary-package-whitelist --add \
      https://github.com/Core/libCore.git

# Remove a package potentially containing binary targets from the whitelist
$ swift package config binary-package-whitelist --remove \
      https://github.com/Core/libCore.git

# Set the whitelist to include all possible packages (wildcard)
$ swift package config binary-package-whitelist --add-all

# Remove all entries from the whitelist
$ swift package config binary-package-whitelist --clear

The config file might look like:

{
  ...
  "binary-package-whitelist": [
    "https://github.com/Core/libCore.git"
  ],
  "version": 2
}

Package resolution could also interactively prompt the user to add packages containing binary targets to the whitelist, reducing the overhead of maintaining the whitelist manually.

$ swift build
WARNING: https://github.com/evilcorp/intrusive-analytics.git contains binary targets, do you want to add it to the whitelist? [y/n]

Note that the presence of a package in the whitelist would NOT apply to its dependencies, unlike the allowsBinary attribute. Every single package containing binary targets would be required to be present in the whitelist in order for the root package to be resolved. This ensures that newly added binary package dependencies are always consciously whitelisted when they are added.

Binary packages will likely become quite popular, and we run the risk of culturally normalizing addition of the allowsBinary attribute - eventually people may just add it to their packages pre-emptively to avoid "blocking" downstream packages from adding binary package dependencies of their own, or copy-paste boilerplate from installation instructions without a second thought. But if we force acknowledgement on a package-by-package basis, we would go a lot further in keeping developers safe from unwanted content making it into their dependency graph.

In summary, allowsBinary forces you to trust an entire subgraph (and people often won't take the time to audit that). A whitelist-based approach focuses your attention on whether to trust the specific individual packages that carry the greater security impact, and forces you to acknowledge them more consciously because you'd have to do so on a case-by-case basis.


A related thought is also that I think a similar problem exists with regard to software licenses in terms of controlling and acknowledging attributes of the content in the dependency graph. For many developers, certain licenses such as the GPL, or a proprietary or non-OSI approved license, may be problematic.

Having a manifest API flag for each of these conditions (allowsBinary, allowsGPL, etc.) could lead to an unwieldy explosion of APIs for solving problems of this nature, while a configuration-file-based approach seems far more scalable and has greater flexibility of configuration (i.e. for a license restriction, you'd probably want to blacklist certain license types rather than whitelist certain packages).

Artifact conditions and architectures

I recognize the need to conditionalize binary artifacts based on architecture, but this opens a massive can of worms once we start thinking about multiple platforms and especially Linux.

More generally, it's not so much the architecture we need to be concerned about, but the ABI (roughly what an LLVM target triple describes). For example, macOS and Mac Catalyst binaries are both x86_64 architecture, but they have different, incompatible ABIs (described by the LLVM target triples x86_64-apple-macos and x86_64-apple-ios-macabi) and can't be linked together.
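To make the incompatibility concrete, a sketch using clang's -target flag (an -isysroot pointing at the appropriate SDK may also be needed):

# Both objects are x86_64, but they target different ABIs and
# cannot be linked into the same binary:
$ clang -target x86_64-apple-macos10.15 -c foo.c -o foo_macos.o
$ clang -target x86_64-apple-ios13.1-macabi -c foo.c -o foo_catalyst.o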

I've had a lot of experience with this specific issue as it pertains to cross-platform build systems, and I think trying to map a single architecture value meaningfully across platforms is simply not possible. The precise meaning of the two proposed values "arm" and "x86" has also not been defined: for example, 32-bit vs. 64-bit, or how incompatible sub-architectures and variants come into play.

For Apple's platforms we have 6 different architectures across our ABIs: armv7k, arm64, arm64e, i386, x86_64, and x86_64h. There is no way to map those 1:1 with those of other platforms, like ARM / ARM64 / X86 / X64 on Windows and armeabi-v7a / arm64-v8a / x86 / x86_64 on Android, or the dozens of Linux ABIs which are often incompatible even within the same architecture.

@lukasa's first reply and @compnerd's post do an excellent job of explaining how difficult this issue is on Linux, especially for ARMv7.

I would lean towards taking an approach where we eliminate the architecture conditional from the manifest and instead rely on the binary artifact packaging described in the Binary Target Artifact Format section of the proposal. For each major platform family:

Apple

We use XCFrameworks. A single XCFramework can support any or all of the ABIs relevant for Apple OSes (macOS, iOS, tvOS, watchOS) as well as the special cases of Mac Catalyst and DriverKit. Done.
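For example, per-ABI builds of a library can be bundled into a single distributable artifact (paths hypothetical):

$ xcodebuild -create-xcframework \
      -framework build/ios/MyLib.framework \
      -framework build/macos/MyLib.framework \
      -output MyLib.xcframework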

Windows

Nest artifacts inside architecture folders using the same naming convention as Microsoft does: ARM, ARM64, X86, X64. Done.

Android

Similar to Windows, nest inside architecture folders using the platform naming convention: armeabi-v7a, arm64-v8a, x86, x86_64. Done.

Note that armeabi (armv5), mips and mips64 are obsolete since NDK r17; let's not worry about those.
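A hypothetical Android artifact following that convention might be laid out like this (names illustrative only):

MyLib-android/
├── armeabi-v7a/libMyLib.a
├── arm64-v8a/libMyLib.a
├── x86/libMyLib.a
└── x86_64/libMyLib.a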

Linux

Given the difficulty in dealing with the Linux issue, I almost wonder if we should defer binary target support on Linux specifically. It's incredibly uncommon for developers to rely on binary-only dependencies on Linux (binary-only distribution generally implies a proprietary library), and for everything else there's always the system's package manager (apt, yum, etc.).

I would like to see a compelling argument for why we absolutely need binary packages on Linux when we have the system package manager to handle the vast majority (if not close to 100%) of those use cases, including for closed-source proprietary packages (of which there are few to begin with).


The overall approach I described above would also be compatible with the suggestion @NeoNacho mentioned regarding use of object files as the binary artifacts (which I agree makes sense).

Nitpicks/other

One of the use cases listed is: "A large company has an internal team which wants to deliver a Swift package for use in their iOS applications, but for security reasons cannot publish the source code."

As someone with a bit of a security background, this bothers me a little. I would prefer it say "for business reasons". Calling it security-related encourages the fallacy of security by obscurity, which is a harmful idea.

Debug information is stored as DWARF in the .o files. .dSYM files and other post-linking products can then be created based on this information so that the debugging information doesn't have to be included in the linked binary, but with the default toolchain it's all there in the .o files.
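For instance, with the standard tools (paths hypothetical):

# The DWARF embedded in a distributed object file can be inspected directly
$ dwarfdump --debug-info MyLibrary.o

# After the client links its binary, dsymutil follows the debug map back to
# the object files to produce a standalone .dSYM
$ dsymutil MyApp.app/Contents/MacOS/MyApp -o MyApp.dSYM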

What wouldn't be there would be the list of libraries and other linker inputs, but that would be in the package manifest just as if source files had been provided.

.o files are really the closest drop-in replacement we have for source files, leaving the other parts of the package as close to source form as possible.

I like the whitelisting idea. What do you think of combining it with allowsBinary and using it as the entry point to trigger the prompt?

In the fullness of time I’d like to see the package manager be able to distribute pre-compiled binaries of source-available packages on Linux expressly to allow us to enable compiling awkward packages with weird build dependencies in SwiftPM. However, I agree that much more functionality is required to enable that to work, so for now I am in agreement that we should simply punt on Linux support.

I don't think we should have the allowsBinary flag at all, but could you elaborate on why you think combining it with the whitelist would be useful?

My concern is with a global whitelist. If I'm not mistaken, this is the case with your proposed solution. As a consumer, all of my projects would now consult this list and silently use it.

allowsBinary could still be useful as a per-package explicit opt-in to the global whitelist.

Not sure it's the best way though.

Big +1 on the whitelist system, convenient but flexible. Also I love the pragmatic approach to architectures. Might be biased because I use SPM only for Apple platforms but using the new XCFramework format would be awesome.

This summary is exactly why I think it's wrong to think of allowsBinary as a security feature and design it with that in mind. If you're going to audit every update to every part of your dependency graph then you'll be well aware of any added dependencies, because you'll have to audit them too. In the much more likely situation where you're not going to do that, whether the malware is in a binary or the source code is essentially irrelevant. You need to trust your dependencies for better reasons than the source being theoretically, but not practically, inspectable. Consider all the unintentionally introduced security issues that survive for years or decades in open source software.

I don't quite understand how moving the opt-in to a configuration file solves this in the general case. Each package needs to be able to work and be tested in isolation, so now the author has to add the new binary to the configuration file of every package in the chain. That seems like a very similar burden to me.

Sorry if I wasn't clear enough -- I don't mean that the whitelist should be global to the user's machine, I meant that the whitelist should apply to the entire dependency graph of a specific root package. The config file is stored at .swiftpm/config in your repository's root directory (again, see SE-0219).

Is that what you were concerned about? I can update my post to try and clarify.

It distributes the load better.

With the allowsBinary flag, for A => B => C => D => E where E adds a dependency on binary package F, E must add the flag, followed by D, followed by C, followed by B, finally followed by A.

With the whitelist approach, if E adds the dependency on binary package F and the new version of E is compatible with the version constraints in the package dependency graph, A can begin using it immediately, without B, C, D having to do anything. That's very valuable.

Think of it in terms of a dependency graph; this allows better "parallelization" of effort.

Ah right, that makes sense.

I think I would prefer the whitelist to be on a binary artifact vs. a package basis, though, since one package can have multiple binary targets. Maybe it could even be combined with my idea of using URL prefix matches for opt-ins, such that one can optionally trust all binaries coming from a certain domain.
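In the spirit of the commands sketched earlier, that could look something like (hypothetical syntax):

# Trust every binary artifact served from a given URL prefix
$ swift package config binary-artifact-whitelist --add \
      https://artifacts.example.com/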

Yes, I like that.

Do we want to require that the binary artifacts always come from a remote URL, though? What if a developer wants to store the binary artifact inside the binary package's repo and reference it using a relative path? Whitelisting becomes a little more complicated if that is possible.

Not quite sure, the current proposal requires it, but I was actually thinking it might be good to also allow referencing artifacts from the repository. I don't think anyone has brought it up as something they would like to see on the thread so far.

That clears it up. Thanks!
