[Pitch] SwiftPM Support for Binary Static Library Dependencies

Hi everyone, I have a follow-on proposal to SE-0272 and SE-0305 that I would like to pitch to the community. For those who prefer a plain markdown file, it's available here.

Binary Static Library Dependencies

Introduction

Swift continues to grow as a cross-platform language supporting a wide variety of use cases, from programming embedded devices to server-side development, across a multitude of operating systems.
However, currently SwiftPM supports linking against binary dependencies on Apple platforms only.
This proposal aims to make it possible to provide static library dependencies exposing a C interface on non-Apple platforms.

Swift-evolution thread:

Motivation

The Swift Package Manager’s binaryTarget type lets packages vend libraries that either cannot be built in Swift Package Manager for technical reasons,
or for which the source code cannot be published for legal or other reasons.

In the current version of SwiftPM, binary targets support the following:

  • Libraries in an Xcode-oriented format called XCFramework, and only for Apple platforms, introduced in SE-0272.
  • Executables through the use of artifact bundles introduced in SE-0305.

We aim here to bring a subset of the XCFramework capabilities to non-Apple platforms in a safe way.

While this proposal is specifically focused on binary static library dependencies without unexpected unresolved external symbols on non-Apple platforms,
it tries to do so in a way that will not prevent broader future support for static libraries and dynamically linked libraries.

Proposed solution

This proposal extends artifact bundles introduced by SE-0305 to include a new kind of artifact type to represent a binary library dependency: staticLibrary.
The artifact manifest would encode the following information for each variant:

  • The static library to pass to the linker.
    On Apple and Linux platforms, this would be .a files and on Windows it would be a .lib file.
  • Enough information to be able to use the library's API in the package's source code,
    i.e., headers and module maps for libraries exporting a C-based interface.

Additionally, we propose the addition of an auditing tool that can validate that the library artifact is safe to use across the Linux-based platforms supported by the Swift project.
Such a tool would ensure that people do not accidentally distribute artifacts that require dependencies that are not met on the various deployment platforms.

Detailed design

This section describes the changes to artifact bundle manifests in detail, the semantic impact of the changes on SwiftPM's build infrastructure, and describes the operation of the auditing tool.

Artifact Manifest Semantics

The artifact manifest JSON format for a static library is described below:

{
    "schemaVersion": "1.0",
    "artifacts": {
        "<identifier>": {
            "version": "<version number>",
            "type": "staticLibrary",
            "variants": [
                {
                    "path": "<relative-path-to-library-file>",
                    "headerPaths": ["<relative-path-to-header-directory-1>", ...],
                    "moduleMapPath": "<path-to-module-map>",
                    "supportedTriples": ["<triple1>", ... ]
                },
                ...
            ]
        },
        ...
    }
}

The additions are:

  • The staticLibrary artifact type that indicates this binary artifact is not an executable but rather a static library to link against.
  • The headerPaths field specifies directory paths relative to the root of the artifact bundle that contain the header interfaces to the static library.
    These are forwarded along to the Swift compiler (or the C compiler) using the usual search path arguments.
    Each of these directories can optionally contain a module.modulemap file that will be used for importing the API into Swift code.
  • The optional moduleMapPath field specifies a custom module map to use if the header paths do not contain the module definitions or to provide custom overrides.

As with executable binary artifacts, the path field represents the relative path to the binary from the root of the artifact bundle,
and the supportedTriples field provides information about the target triples supported by this variant.
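
As a concrete illustration of the schema above, here is a small Python sketch that builds a manifest for a made-up library and picks the variant matching a target triple. The artifact name `example`, the version, all file paths, and the `select_variant` helper are assumptions for illustration only. Matching triples by plain string comparison is a simplification and not a description of what SwiftPM does.

```python
import json

# Hypothetical manifest for a library called "example"; every name, path,
# and version below is made up for illustration.
MANIFEST = {
    "schemaVersion": "1.0",
    "artifacts": {
        "example": {
            "version": "1.2.3",
            "type": "staticLibrary",
            "variants": [
                {
                    "path": "example-linux-x86_64/libexample.a",
                    "headerPaths": ["example-linux-x86_64/include"],
                    "moduleMapPath": "example-linux-x86_64/include/module.modulemap",
                    "supportedTriples": ["x86_64-unknown-linux-gnu"],
                },
                {
                    # moduleMapPath is optional, so this variant omits it.
                    "path": "example-windows-x86_64/example.lib",
                    "headerPaths": ["example-windows-x86_64/include"],
                    "supportedTriples": ["x86_64-unknown-windows-msvc"],
                },
            ],
        }
    },
}


def select_variant(manifest, artifact_id, triple):
    """Return the first variant listing `triple`, or None if there is none.

    Assumption: triples are compared as plain strings; a real
    implementation would normalize them first.
    """
    artifact = manifest["artifacts"][artifact_id]
    if artifact["type"] != "staticLibrary":
        return None
    for variant in artifact["variants"]:
        if triple in variant["supportedTriples"]:
            return variant
    return None


if __name__ == "__main__":
    # Serialized, this is what a manifest following the schema could contain.
    print(json.dumps(MANIFEST, indent=4))
    print(select_variant(MANIFEST, "example", "x86_64-unknown-linux-gnu")["path"])
```

Listing several entries in `supportedTriples` lets one variant serve more than one triple without duplicating the library file.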

Auditing tool

Without proper auditing it would be very easy to provide binary static library artifacts that call into unresolved external symbols that are not available on the runtime platform, e.g., due to missing linkage to a system dynamic library.

We propose the introduction of a new tool that can validate the "safety" of a binary library artifact across the platforms it supports and the corresponding runtime environment.

In this proposal we restrict ourselves to static libraries that do not have any external dependencies beyond the C standard library and runtime.
To achieve this we need to be able to validate this property across the three object file formats used in static libraries on our supported platforms: Mach-O on Apple platforms, ELF on Linux-based platforms, and COFF on Windows.
All three formats express references to external symbols as relocations which reside in a single section of each object file.

We propose adding llvm-objdump to the toolchain to provide the capability to inspect relocations across all three supported object file formats. The tool would run llvm-objdump on every object file in the static library and construct a complete list of symbols defined and referenced across the entire library.
Additionally, the tool would construct a simple C compiler invocation to derive a default linker invocation.
This would be used to determine the libraries linked by default in a C program; these libraries would then be scanned to contribute to the list of defined symbols.
The tool would then check that the referenced symbols list is a subset of the set of defined symbols and emit an error otherwise.
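
The subset check described above can be sketched in a few lines of Python. This is only a model of the set logic, not the proposed tool: the `ObjectSymbols` type, the `audit` function, and the sample symbol names are hypothetical, and the sketch assumes the per-object symbol lists have already been extracted (for example from llvm-objdump output).

```python
from dataclasses import dataclass, field


@dataclass
class ObjectSymbols:
    """Symbols of one object file, as an inspection tool could report them."""
    name: str
    defined: set = field(default_factory=set)
    referenced: set = field(default_factory=set)


def audit(objects, default_library_symbols):
    """Return the sorted list of referenced symbols that nothing defines.

    `objects` are the members of the static library being audited;
    `default_library_symbols` is the set of symbols defined by the
    libraries a default C link would pull in.
    """
    defined = set(default_library_symbols)
    referenced = set()
    for obj in objects:
        defined |= obj.defined
        referenced |= obj.referenced
    # The library passes only if `referenced` is a subset of `defined`.
    return sorted(referenced - defined)


if __name__ == "__main__":
    # Hypothetical inputs; real ones would come from the archive members.
    libc = {"malloc", "free", "memcpy"}
    objects = [
        ObjectSymbols("a.o", defined={"ex_open"}, referenced={"malloc", "ex_parse"}),
        ObjectSymbols("b.o", defined={"ex_parse"}, referenced={"memcpy", "deflate"}),
    ]
    unresolved = audit(objects, libc)
    if unresolved:
        print("error: unresolved external symbols:", ", ".join(unresolved))
```

Note that a reference from one object file to a symbol defined in another member of the same library (`ex_parse` here) is resolved, because the check runs over the whole library rather than per object.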

This would be sufficient to guarantee that all symbols from the static library would be available at runtime for statically linked executables or for ones running on the build host.
To ensure maximum runtime compatibility we would also provide a Linux-based Docker image that uses the oldest supported glibc for a given Swift version.
As glibc is backwards compatible, a container running the audit on a given static library would ensure that the version of glibc on any runtime platform would be compatible with the binary artifact.
This strategy has been successfully employed in the Python community with manylinux.
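
To make the reasoning about the oldest glibc concrete, here is a toy Python model. The export tables, version numbers, symbol names, and helper functions are all invented for illustration and are not real glibc data. The sketch assumes that newer versions only add symbols, which is the backwards-compatibility property the argument relies on.

```python
# Illustrative export tables keyed by glibc version; these are toy data,
# not real glibc symbol lists.
GLIBC_EXPORTS = {
    (2, 17): {"malloc", "free", "memcpy"},
    (2, 29): {"malloc", "free", "memcpy", "newer_function"},
}


def oldest_baseline(exports):
    """Pick the oldest glibc version, which is the one to audit against."""
    return min(exports)


def missing_on(version, referenced, exports):
    """Referenced symbols that the given glibc version does not export."""
    return sorted(set(referenced) - exports[version])


def runs_everywhere(referenced, exports):
    """True if all references resolve against the oldest baseline.

    Assumption: newer versions only add symbols, so passing on the
    oldest version implies passing on all of them.
    """
    return not missing_on(oldest_baseline(exports), referenced, exports)


if __name__ == "__main__":
    print(runs_everywhere({"malloc", "memcpy"}, GLIBC_EXPORTS))
    print(missing_on((2, 17), {"malloc", "newer_function"}, GLIBC_EXPORTS))
```

An audit run only against the newer table would accept `newer_function` and miss the failure on older systems, which is why the container pins the oldest supported version.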

Security

This proposal brings the security implications outlined in SE-0272 to non-Apple platforms,
namely that a malicious attacker having access to both the server hosting the artifact and the git repository that vends the Package Manifest could provide a malicious library.
Users should exercise caution when onboarding binary dependencies.

Impact on existing packages

No current package should be affected by this change since this is only an additive change in enabling SwiftPM to use binary target library dependencies on non-Apple platforms.

Future directions

Support Swift static libraries

To do this we would extend the static library binary artifact manifest to provide a .swiftinterface file that can be consumed by the Swift compiler to import the Swift APIs.
Additionally we would extend the auditing tool to validate the usage of Swift standard library and runtime symbols, e.g., from libSwiftCore.

Extend binary compatibility guarantees

This proposal limits itself to providing facilities for binary compatibility only with the C standard library and runtime.
In the future we could provide a system to allow binary artifact distributors to specify additional linkage dependencies for their binary artifacts.
These would be used to customize the operation of the audit tool and perform automatic linking of them in any client target that depends on the binary artifact, in the same way CMake propagates link dependencies transitively.

Add support for dynamically linked dependencies

On Windows dynamic linking requires an import library which is a small static library that contains stubs for symbols exported by the dynamic library.
These stubs are roughly equivalent to a PLT entry in an ELF executable, but are generated during the build of the dynamic library and must be provided to clients of the library for linking purposes.
Similarly on Linux and Apple platforms binary artifact maintainers may wish to provide a dynamic library stub to improve link performance.
To support these use cases the library binary artifact manifest schema could be extended with facilities to provide both a link-time and a runtime dependency.

18 Likes

Overall, looks good!

Why is supportedTriples an array? Is that for multi-arch Mach-Os? I'll note that the existing XCFramework based solution does not support multi-arch Mach-Os and instead requires a distinct file per architecture. It might be better to be consistent, just so there aren't multiple ways of doing things.

Also, can you provide a concrete example of this JSON schema in the pitch?

Do we actually document Swift's required glibc version anywhere? FWIW, I'd like to bump that to 2.29 in Swift 6.2 so that we can rely on posix_spawn_file_actions_addchdir being present. Swift Build requires that function, and once SwiftPM switches to it as its default build engine, it means we can no longer support any Linux distribution with an older glibc, including Amazon Linux 2 (which is going EoL in July).

I'm not aware of it being documented. Probably a question for the @platform-steering-group.

Yeah, sadly this is not documented anywhere; the answer is that, at any time, it's whatever the minimum version on the oldest supported Linux distribution is. (Put another way: we express support for Linux distributions, not glibc versions per se.)

This is still useful for triple aliases, even if we disregard multi-arch object files. I.e. wasm32-unknown-wasi and wasm32-unknown-wasip1 are equivalent to each other. Library authors might want to specify both in the metadata file so that library consumers could use either triple alias.

This is tangential, but glibc supports the concept of (static) version checking and we might want to try to expose that in Swift for the benefit of code that uses glibc.

Jake, DM me? I may be able to help you here. :smiley:

What's the definition of "equivalent"? Equivalent in that LLVM canonicalizes one triple to the other? Or they are distinct but link-compatible?

The existing artifact bundle format for executables supports this, so I thought it would be worth staying consistent. There is also no particular reason why multi-arch Mach-Os couldn't be supported, and as @Max_Desiatov noted, it's also useful for triple aliases.

1 Like

I was going to do a survey of the various platforms before the proposal goes into review. However, would it be worthwhile to formally track this for both glibc and musl? This would make it slightly safer for folk that want to create a very minimal Linux setup.

We would obviously be interested to hear your thoughts on how this pitch intersects with this one, which predates it:

The mentioned proposal tries to solve a somewhat different problem, although admittedly there is some overlap. Specifically, the RLP proposal is about a mechanism for distributing and evolving components of a single application for known deployment platforms, and it punts all ABI stability concerns to the user. Additionally, the RLP system isn't suitable for using a binary target dependency in anything but the final executable product.

Our proposal is trying to solve a somewhat different problem, namely general distribution of binary artifacts that expose a C interface as dependencies for both executable products and library products. As a result, our proposal must address the ABI stability problems that the RLP proposal says are the concern of the organization deploying the final application. Because Swift doesn't have a stable ABI on Linux, we haven't explored support for it in our proposal.

3 Likes

I very much prefer the ideas behind this pitch over the RLP's. Admittedly my main focus is distributing and linking dynamic libraries/dependencies rather than static ones, but this at least lays down the necessary groundwork required to improve the developing/deploying experience of Swift libraries for everyone.

1 Like

I'm not sure if LLVM operates on that level; the respective Clang and Swift drivers parse and normalize triples. IIRC SwiftPM also does something to triples, as it vendors the triple parser from Swift Driver.

IIUC wasm32-unknown-wasi and wasm32-unknown-wasip1 are not distinct and are fully interchangeable. But I could foresee that a library built for wasm32-unknown-none is link-compatible with wasm32-unknown-wasi, which is clearly a distinct triple.

Just as well I could see someone producing an arm64-linux-none library with Embedded Swift that only makes syscalls and has no libc symbols, and it would still link fine with either arm64-linux-gnu or arm64-linux-musl.

While I welcome this type of feature, I think the pitch would benefit from an explicit example of a library that you would like to enable, especially since it is limited to C API interfacing and doesn't support the Swift ABI yet.

I love this proposal - it is a godsend to every one of us using Swift beyond Apple platforms.

Why does this proposal require looking for unresolved symbols at all? It seems that this feature would run into a long list of caveats, a long implementation, and ongoing code maintenance, and the sole purpose would be an early warning.

For instance, what if I wanted to link against a library that has dependencies on another library (both included)? It is not clear if this would be supported, and if it is, we would find ourselves fighting against the baked-in rules with no recourse but to come back here and plead to relax that rule.

7 Likes

So this is principally my fault, as I've argued strongly for adopting this model.

The reason to do this is to ensure that developers can feel confident that the package they have built is going to be able to actually be installed by users. To that end, the intention is that the tool should warn you that you have accidentally linked a library other than the appropriate libc.

Nobody is intending to enforce that you pass this checking: it's strictly advisory. But I believe it is really, really important that we offer tooling that enables developers to do quick integrity checks.

Supporting interdependencies of binaries is a good feature to call out, but the solution is for the tool to be able to line those symbols up.

The intent with the pitch is not to gate the build on the artifact satisfying the (stringent) compatibility requirements, but rather to provide out-of-band tooling that makes it easy to validate that an artifact will work on a supported deployment platform (i.e. won't miss any dependencies, etc.). As the ecosystem moves forward there is scope to relax the rules a bit as we standardize the expectations around platforms on which Swift can be deployed.

Let me know where I have been unclear in the proposal so that I can clarify.

I'd argue that this is quite unfortunate. In the ideal world I'd like to see this as a hard requirement. Too many times this has caused issues that are hard to diagnose and hard to debug. Making this advisory and easy to skip creates a huge footgun that will only cause more problems down the road.

At the same time, relaxing this requirement in the future when we find appropriate ways to deal with it is easy. In general, relaxing hard requirements and making things more flexible is easy, but the opposite is quite hard.

2 Likes

I partially disagree with this. I think other ecosystems, e.g. the Python ecosystem, have successfully shown that this is doable with an advisory tool. I agree that it would be great if we had full safety at build time that something is going to run wherever it is deployed. However, I think achieving this is incredibly hard. I personally really don't want perfect to be the enemy of good here. I have needed multiple times in the past to use a static library that just uses libc symbols, and it is really unfortunate that this is supported today on Darwin via XCFrameworks and not on any other platform. Let's take the first step and build on top of that.

5 Likes

I am very sympathetic to Max’s concern, but I think the key is here:

The goal of this tool is to move this issue from “hard to debug” to “easy to debug”. It should also be a hook to support detecting other misuse in future, such as incorrect architecture targets.

2 Likes