[PITCH] Support for binary dependencies

A few of the Firebase frameworks rely on resources - InAppMessaging, Firestore's gRPC dependency, and some of the FirebaseML frameworks.

Also, almost all of the Firebase frameworks share public headers as part of their frameworks.

I think it's possible to keep support for binaries orthogonal to support for resources.

By using .o files (or possibly a single pre-linked .o file) for binary support, we can allow resources for binary-based packages to be specified the same way as for source-based packages (which will be covered by a separate proposal) and let the build system assemble the final artifact. For some clients on some platforms, this might mean constructing an embedded framework; in other cases, it might mean statically linking in the code and copying the resources into the client. The tricky thing there will be to make sure that the code can access the resources even though it has already been compiled, but that should be solvable if there is a standard approach for doing this (defined in the resource proposal).

Supporting binaries through .o files rather than a finished runtime artifact lets us support both static and dynamic linking into the client, and avoids making the binary package have to bake in assumptions about the nature of the runtime artifact. And by pre-linking (via ld -r) and vending just a single .o, the names of private externs and other possibly sensitive symbols can be removed from the binary artifact before it's distributed.
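As a rough sketch of that pre-linking step (assuming Apple's toolchain and hypothetical file names; exact flags vary elsewhere):

# Combine all of the package's object files into one relocatable object
$ ld -r -o MyLibrary.o build/*.o

# Strip the names of non-global (private) symbols before distribution
$ strip -x MyLibrary.o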

There are some issues to figure out, but I don't think that using .o files would preclude inclusion of resources.

Supporting binaries through .o files rather than a finished runtime artifact lets us support both static and dynamic linking into the client, and avoids making the binary package have to bake in assumptions about the nature of the runtime artifact.

Will you explain why that is? The build tools support frameworks whether the included library is statically or dynamically linked.

It seems a loss of useful abstraction if library providers can no longer wrap the binaries, headers, and resources into a single entity.

Even though the linker supports the presence of a static library inside a framework, nothing happens by default with the resources inside such a framework. So they would still need to be copied from the framework to the client, with the resulting possibility of name collisions etc.

It seems to me that the proper abstraction here is not at the artifact level, but at the package target level, where there will be a defined way to provide resources etc (in a separate proposal).

This also avoids having to provide both a dynamic and a static version of the framework, since .o files could be linked as either static or dynamic (which an already-linked binary cannot).
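To illustrate that flexibility (a sketch using Apple's tools and a hypothetical MyLibrary.o; the build system would do the equivalent internally):

# The same object file can be packaged either way:
$ libtool -static -o libMyLibrary.a MyLibrary.o        # static library
$ clang -dynamiclib -o libMyLibrary.dylib MyLibrary.o  # dynamic library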

Taking a step back, if the goal of binary packages is to avoid having to expose source code, then it seems that providing .o files instead of source is the right level to solve the problem, since it leaves as much flexibility as possible for the build system.

Aren't there problems with what symbols are expected to be in the .o files?
What about debug information? Symbolication information?

Thanks for posting this proposal! Here is my feedback:

The allowsBinary attribute and security impact

The allowsBinary attribute and its associated requirements are a good idea in theory, but I think the practical implications would make this too inconvenient as it is currently proposed, for a number of reasons:

  1. It's very unlikely a developer actually wants to think about applying this attribute to specific subparts of the dependency graph, because the position of the binary dependencies in the graph is not the interesting part. It's the fact that they are in the graph at all.

  2. It introduces a heavy burden on consumers of a binary package, who have to adopt the allowsBinary flag up through the entire dependency chain, which could be several levels deep. This could make package authors reluctant to add useful (perhaps optional) functionality that depends on binary packages, for fear of breaking clients. Further, because the allowsBinary attribute is on the package dependency while binary targets are a kind of target, a dependency on that package doesn't necessarily mean the client actually uses that target. The allowsBinary requirement is therefore overbearing: it gates the potential for use rather than actual use.

  3. Lastly, there is a bit of a security hole: because allowsBinary effectively grants trust to the entire dependency subgraph to which it is applied, the addition of previously-unknown binary packages within that subgraph will NOT require any action from the developer of the root package, which is undesired. For example, consider the dependency chain App => A => B => C => D => E. App may trust A by marking its dependency allowsBinary, but if evil package E adds a dependency on a malware package M, allowsBinary wouldn't actually protect App from this new package being added to the graph, i.e. nothing about the proposed design would actively alert them.

I propose an alternative design based on the primary goal of forcing the developer to consciously acknowledge the addition to the graph of any package containing binary dependencies. This would solve all of these problems and also be less intrusive for both package consumers and developers.

Instead of making allowsBinary part of the manifest, we should rely on the SwiftPM configuration file introduced by SE-0219.

By default, all packages in the dependency graph which contain binary targets would be blacklisted unless specifically whitelisted in the configuration file.

Whitelisting of a particular package could be managed using the following commands:

# Add a package potentially containing binary targets to the whitelist
$ swift package config binary-package-whitelist --add \
      https://github.com/Core/libCore.git

# Remove a package potentially containing binary targets from the whitelist
$ swift package config binary-package-whitelist --remove \
      https://github.com/Core/libCore.git

# Set the whitelist to include all possible packages (wildcard)
$ swift package config binary-package-whitelist --add-all

# Remove all entries from the whitelist
$ swift package config binary-package-whitelist --clear

The config file might look like:

{
  ...
  "binary-package-whitelist": [
    "https://github.com/Core/libCore.git"
  ],
  "version": 2
}

Package resolution could also interactively prompt the user to add packages containing binary targets to the whitelist, reducing the overhead of maintaining the whitelist manually.

$ swift build
WARNING: https://github.com/evilcorp/intrusive-analytics.git contains binary targets, do you want to add it to the whitelist? [y/n]

Note that the presence of a package in the whitelist would NOT apply to its dependencies, unlike the allowsBinary attribute. Every single package containing binary targets would be required to be present in the whitelist in order for the root package to be resolved. This ensures that newly added binary package dependencies are always consciously whitelisted when they are added.

Binary packages will likely become quite popular, and we run the risk of culturally normalizing addition of the allowsBinary attribute - eventually people may just add it to their packages pre-emptively to avoid "blocking" downstream packages from adding binary package dependencies of their own, or copy-paste boilerplate from installation instructions without a second thought. But if we force acknowledgement on a package-by-package basis, we would go a lot further in keeping developers safe from unwanted content making it into their dependency graph.

In summary, allowsBinary forces you to trust an entire subgraph (and people often won't take the time to audit that). A whitelist-based approach focuses your attention on whether to trust the specific individual packages that carry the greater security impact, and forces you to acknowledge them more consciously because you'd have to do so on a case-by-case basis.


A related thought is also that I think a similar problem exists with regard to software licenses in terms of controlling and acknowledging attributes of the content in the dependency graph. For many developers, certain licenses such as the GPL, or a proprietary or non-OSI approved license, may be problematic.

Having a manifest API flag for each of these conditions (allowsBinary, allowsGPL, etc.) could lead to an unwieldy explosion of APIs for solving problems of this nature, while a configuration-file-based approach seems far more scalable and has greater flexibility of configuration (i.e. for a license restriction, you'd probably want to blacklist certain license types rather than whitelist certain packages).

Artifact conditions and architectures

I recognize the need to conditionalize binary artifacts based on architecture, but this opens a massive can of worms once we start thinking about multiple platforms and especially Linux.

More generally, it's not so much the architecture we need to be concerned about, but the ABI (roughly what an LLVM target triple describes). For example, macOS and Mac Catalyst binaries are both x86_64 architecture, but they have different, incompatible ABIs (described by the LLVM target triples x86_64-apple-macos and x86_64-apple-ios-macabi) and can't be linked together.
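To make the incompatibility concrete, a sketch using clang's -target flag (an -isysroot pointing at the appropriate SDK may also be needed):

# Both objects are x86_64, but they target different ABIs and
# cannot be linked into the same binary:
$ clang -target x86_64-apple-macos10.15 -c foo.c -o foo_macos.o
$ clang -target x86_64-apple-ios13.1-macabi -c foo.c -o foo_catalyst.o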

I've had a lot of experience with this specific issue as it pertains to cross-platform build systems, and I think trying to map a single architecture value meaningfully across platforms is simply not possible. The precise meaning of the two proposed values "arm" and "x86" has also not been defined: for example, 32-bit vs. 64-bit, or how incompatible sub-architectures and variants come into play.

For Apple's platforms we have 6 different architectures across our ABIs: armv7k, arm64, arm64e, i386, x86_64, and x86_64h. There is no way to map those 1:1 with those of other platforms, like ARM / ARM64 / X86 / X64 on Windows and armeabi-v7a / arm64-v8a / x86 / x86_64 on Android, or the dozens of Linux ABIs which are often incompatible even within the same architecture.

@lukasa's first reply and @compnerd's post do an excellent job of explaining how difficult this issue is on Linux, especially for ARMv7.

I would lean towards taking an approach where we eliminate the architecture conditional from the manifest and instead rely on the binary artifact packaging described in the Binary Target Artifact Format section of the proposal. For each major platform family:

Apple

We use XCFrameworks. A single XCFramework can support any or all of the ABIs relevant for Apple OSes (macOS, iOS, tvOS, watchOS) as well as the special cases of Mac Catalyst and DriverKit. Done.
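For example, per-ABI builds of a library can be bundled into a single distributable artifact (paths hypothetical):

$ xcodebuild -create-xcframework \
      -framework build/ios/MyLib.framework \
      -framework build/macos/MyLib.framework \
      -output MyLib.xcframework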

Windows

Nest artifacts inside architecture folders using the same naming convention as Microsoft does: ARM, ARM64, X86, X64. Done.

Android

Similar to Windows, nest inside architecture folders using the platform naming convention: armeabi-v7a, arm64-v8a, x86, x86_64. Done.

Note that armeabi (armv5), mips and mips64 are obsolete since NDK r17; let's not worry about those.
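A hypothetical Android artifact following that convention might be laid out like this (names illustrative only):

MyLib-android/
├── armeabi-v7a/libMyLib.a
├── arm64-v8a/libMyLib.a
├── x86/libMyLib.a
└── x86_64/libMyLib.a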

Linux

Given the difficulty in dealing with the Linux issue, I almost wonder if we should defer binary target support on Linux specifically. It's incredibly uncommon for developers to rely on binary-only dependencies on Linux (binary-only distribution generally implies a proprietary library), and for everything else there's always the system's package manager (apt, yum, etc.).

I would like to see a compelling argument for why we absolutely need binary packages on Linux when we have the system package manager to handle the vast majority (if not close to 100%) of those use cases, including for closed-source proprietary packages (of which there are few to begin with).


The overall approach I described above would also be compatible with the suggestion @NeoNacho mentioned regarding use of object files as the binary artifacts (which I agree makes sense).

Nitpicks/other

One of the use cases listed is: "A large company has an internal team which wants to deliver a Swift package for use in their iOS applications, but for security reasons cannot publish the source code."

As someone with a bit of a security background, this bothers me a little. I would prefer it say "for business reasons". Calling it security-related encourages the fallacy of security by obscurity, which is a harmful idea.

Debug information is stored as DWARF in the .o files. .dSYM files and other post-linking products can then be created based on this information so that the debugging information doesn't have to be included in the linked binary, but with the default toolchain it's all there in the .o files.
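For instance, with the standard tools (paths hypothetical):

# The DWARF embedded in a distributed object file can be inspected directly
$ dwarfdump --debug-info MyLibrary.o

# After the client links its binary, dsymutil follows the debug map back to
# the object files to produce a standalone .dSYM
$ dsymutil MyApp.app/Contents/MacOS/MyApp -o MyApp.dSYM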

What wouldn't be there would be the list of libraries and other linker inputs, but that would be in the package manifest just as if source files had been provided.

.o files are really the closest drop-in replacement we have for source files, leaving the other parts of the package as close to source form as possible.

I like the whitelisting idea. What do you think of combining it with allowsBinary and using it as the entry point to trigger the prompt?

In the fullness of time I’d like to see the package manager be able to distribute pre-compiled binaries of source-available packages on Linux expressly to allow us to enable compiling awkward packages with weird build dependencies in SwiftPM. However, I agree that much more functionality is required to enable that to work, so for now I am in agreement that we should simply punt on Linux support.

I don't think we should have the allowsBinary flag at all, but could you elaborate on why you think combining it with the whitelist would be useful?

My concern is with a global whitelist. If I'm not mistaken, this is the case with your proposed solution. As a consumer, all of my projects would now consult this list and silently use it.

allowsBinary could still be useful as a per-package explicit opt-in to the global whitelist.

Not sure it's the best way though.

Big +1 on the whitelist system, convenient but flexible. Also I love the pragmatic approach to architectures. Might be biased because I use SPM only for Apple platforms but using the new XCFramework format would be awesome.

This summary is exactly why I think it's wrong to think of allowsBinary as a security feature and design it with that in mind. If you're going to audit every update to every part of your dependency graph then you'll be well aware of any added dependencies, because you'll have to audit them too. In the much more likely situation where you're not going to do that, whether the malware is in a binary or the source code is essentially irrelevant. You need to trust your dependencies for better reasons than the source being theoretically, but not practically, inspectable. Consider all the unintentionally introduced security issues that survive for years or decades in open source software.

I don't quite understand how moving the opt-in to a configuration file solves this in the general case. Each package needs to be able to work and be tested in isolation, so now the author has to add the new binary to the configuration file of every package in the chain. That seems like a very similar burden to me.

Sorry if I wasn't clear enough -- I don't mean that the whitelist should be global to the user's machine, I meant that the whitelist should apply to the entire dependency graph of a specific root package. The config file is stored at .swiftpm/config in your repository's root directory (again, see SE-0219).

Is that what you were concerned about? I can update my post to try and clarify.

It distributes the load better.

With the allowsBinary flag, for A => B => C => D => E where E adds a dependency on binary package F, E must add the flag, followed by D, followed by C, followed by B, finally followed by A.

With the whitelist approach, if E adds the dependency on binary package F and the new version of E is compatible with the version constraints in the package dependency graph, A can begin using it immediately, without B, C, D having to do anything. That's very valuable.

Think of it in terms of a dependency graph; this allows better "parallelization" of effort.

Ah right, that makes sense.

I think I would prefer the whitelist to be on a binary artifact vs. a package basis, though, since one package can have multiple binary targets. Maybe it could even be combined with my idea of using URL prefix matches for opt-ins, such that one can optionally trust all binaries coming from a certain domain.
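In the spirit of the commands sketched earlier, that could look something like (hypothetical syntax):

# Trust every binary artifact served from a given URL prefix
$ swift package config binary-artifact-whitelist --add \
      https://artifacts.example.com/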

Yes, I like that.

Do we want to require that the binary artifacts always come from a remote URL, though? What if a developer wants to store the binary artifact inside the binary package's repo and reference it using a relative path? Whitelisting becomes a little more complicated if that is possible.

Not quite sure, the current proposal requires it, but I was actually thinking it might be good to also allow referencing artifacts from the repository. I don't think anyone has brought it up as something they would like to see on the thread so far.

That clears it up. Thanks!
