Package Manager Source Archive Dependencies

Last month, I submitted a pitch for a package registry service. Thanks again to everyone for your thoughtful feedback so far!

Designing and implementing a package registry is a huge undertaking, beyond the scope of a single Swift Evolution proposal. Our first pitch attempted to split things out between client and server, starting with a server API. However, there are open questions about how everything should work, which can't be addressed as effectively with this framing. (We've already had to split out security considerations into their own separate thread.)

So instead, we're going to try an incremental approach, whereby each proposal can stand alone as a complete new feature. To that end, this pitch lays the necessary groundwork for a registry while also providing functionality that's useful independently.

As always, I look forward to hearing what everyone thinks, and hearing your ideas for how we can make this even better.


Introduction

Swift Package Manager added support for binary dependencies with SE-0272. This proposal extends that functionality to support non-binary source dependencies as well.

Motivation

Swift Package Manager requires a source dependency to be hosted in a Git repository with a package manifest located in its root. This can cause problems for projects with a different directory structure or that use a version control system other than Git.

Proposed solution

Provide an alternative mechanism for downloading source dependencies that uses Zip as a file format and HTTP as a transport mechanism.

This proposal describes new PackageDescription APIs for declaring source archive dependencies in a package manifest as well as a new swift package archive subcommand for generating a source archive for a package.

Goals of this proposal

We believe the proposed changes mitigate several important barriers to adopting Swift Package Manager. In addition, this proposal lays the foundation for interactions with any package registry service that may be proposed in the future.

Non-goals of this proposal

Source archives are not a replacement for Git repositories. They should be used primarily to work around existing limitations. For example:

  • Packages managed by an incompatible source control system
  • Packages located in a non-root directory
  • Packages stored in a large Git repository, whether because of unrelated files or commit history
  • Packages generated dynamically

Detailed Design

New PackageDescription APIs

The Package.Dependency type adds the following static methods:

extension Package.Dependency {
    /// Declares a source archive with the given url.
    public static func archive(
        name: String,
        url: String,
        checksum: String
    ) -> Package.Dependency

    /// Declares a source archive with the given path on disk.
    public static func archive(
        name: String,
        path: String
    ) -> Package.Dependency
}

These methods can be called in the dependencies field of a package manifest to declare source archive dependencies.

dependencies: [
   .archive(name: "LinkedList",
            url: "https://github.com/mona/LinkedList/archive/1.2.0.zip",
            checksum: "1feec3d8d144814e99e694cd1d785928878d8d6892c4e59d12569e179252c535"),
   .archive(name: "Local",
            path: "/path/to/Local.zip")
]

For dependency resolution, source archives act like dependencies with an exact version specifier. They are, therefore, more likely to result in unresolvable dependency graphs. Although a source archive is unversioned, lacking an associated commit reference, the integrity checksum may be used as a substitute revision identifier.
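To illustrate why exact pins are harder to resolve, here's a minimal sketch using a deliberately simplified model of version requirements. The `Version`, `Requirement`, and `resolve` names are hypothetical and don't reflect Swift Package Manager's actual resolver; the point is only that an exact requirement leaves a single candidate to agree on.

```swift
// Hypothetical, simplified model of version requirements (not SwiftPM's resolver).
struct Version: Comparable {
    let major: Int
    let minor: Int
    let patch: Int

    static func < (lhs: Version, rhs: Version) -> Bool {
        (lhs.major, lhs.minor, lhs.patch) < (rhs.major, rhs.minor, rhs.patch)
    }
}

enum Requirement {
    case upToNextMajor(from: Version)
    case exact(Version)

    func isSatisfied(by version: Version) -> Bool {
        switch self {
        case .upToNextMajor(let lower):
            return version >= lower && version.major == lower.major
        case .exact(let pinned):
            return version == pinned
        }
    }
}

/// Returns the newest available version that satisfies every requirement, if any.
func resolve(_ requirements: [Requirement], available: [Version]) -> Version? {
    available.sorted().last(where: { candidate in
        requirements.allSatisfy { $0.isSatisfied(by: candidate) }
    })
}

let v1_2_0 = Version(major: 1, minor: 2, patch: 0)
let v1_3_0 = Version(major: 1, minor: 3, patch: 0)
let available = [v1_2_0, v1_3_0]

// Two ranges overlap, so a shared version exists.
let flexible = resolve([.upToNextMajor(from: v1_2_0), .upToNextMajor(from: v1_3_0)],
                       available: available)

// An exact pin on 1.2.0 can't coexist with a requirement of 1.3.0 or later.
let pinned = resolve([.exact(v1_2_0), .upToNextMajor(from: v1_3_0)],
                     available: available)
```

In this model, `flexible` resolves to 1.3.0 while `pinned` has no solution, which is the situation a shared transitive dependency declared as a source archive could run into.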

When Swift Package Manager downloads a source archive for the first time, it compares the result of the swift package compute-checksum subcommand with the value provided in the package manifest. If the values are different, the build fails with an error.
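As a rough sketch of that check, the following compares a SHA-256 digest of the downloaded bytes against the manifest's value. It assumes CryptoKit is available (or swift-crypto's `Crypto` module on other platforms); `sha256Hex`, `verifyArchive`, and `ArchiveError` are hypothetical names, not Swift Package Manager's own implementation.

```swift
import Foundation
#if canImport(CryptoKit)
import CryptoKit
#else
import Crypto // Assumption: swift-crypto is available on non-Apple platforms.
#endif

// Hypothetical error type for this sketch.
enum ArchiveError: Error {
    case checksumMismatch(expected: String, actual: String)
}

/// Returns the lowercase hexadecimal SHA-256 digest of the given data.
func sha256Hex(of data: Data) -> String {
    SHA256.hash(data: data).map { String(format: "%02x", $0) }.joined()
}

/// Throws if the archive's digest differs from the checksum declared in the manifest.
func verifyArchive(_ data: Data, expectedChecksum: String) throws {
    let actual = sha256Hex(of: data)
    guard actual == expectedChecksum.lowercased() else {
        throw ArchiveError.checksumMismatch(expected: expectedChecksum, actual: actual)
    }
}
```

A build would call something like `verifyArchive` once after the first download and stop with an error if it throws.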

New swift package archive subcommand

Swift package source archives are Zip files. You can generate a source archive by running swift package archive in the root directory of a package.

SYNOPSIS
	swift package archive [--output=<file>]

OPTIONS
	-o <file>, --output=<file>
		Write the archive to <file>.
		If unspecified, the package is written to `\(PackageName).zip`.

For example:

$ tree -a -L 1
LinkedList
├── .git
├── Package.swift
├── README.md
├── Sources
└── Tests
$ swift package archive
# Created LinkedList.zip

By default, the filename of the generated archive is the name of the package with a .zip extension (for example, "LinkedList.zip"). This can be configured with the --output option:

$ swift package archive --output="Package.zip"
# Created Package.zip

The archive subcommand behaves equivalently to git-archive(1) using the zip format with its default compression level. Therefore, the following command produces equivalent output to the previous example:

$ git archive --format zip --output LinkedList.zip HEAD

If desired, this behavior may be changed in future tool versions.
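One way to picture the subcommand is as a thin wrapper around that Git invocation. The sketch below is an assumption about how this could be wired up, not the actual implementation; `gitArchiveArguments`, `archivePackage`, and `ArchiveFailure` are hypothetical names, and it expects `git` to be on the `PATH`.

```swift
import Foundation

// Hypothetical error type for this sketch.
struct ArchiveFailure: Error {
    let status: Int32
}

/// Builds the `git archive` arguments matching the command shown above.
/// When no output is given, the archive is named after the package.
func gitArchiveArguments(packageName: String, output: String? = nil) -> [String] {
    let destination = output ?? "\(packageName).zip"
    return ["archive", "--format", "zip", "--output", destination, "HEAD"]
}

/// Runs `git archive` in the package's root directory.
func archivePackage(named packageName: String,
                    at directory: URL,
                    output: String? = nil) throws {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/env")
    process.arguments = ["git"] + gitArchiveArguments(packageName: packageName, output: output)
    process.currentDirectoryURL = directory
    try process.run()
    process.waitUntilExit()
    guard process.terminationStatus == 0 else {
        throw ArchiveFailure(status: process.terminationStatus)
    }
}
```

Keeping the argument construction separate from the process invocation makes the default naming rule easy to check without touching the file system.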

Note: git-archive ignores files with the export-ignore Git attribute. By default, this ignores hidden files and directories, including .git and .build. Delegating this behavior to Git has the benefit of built-in support for any Zip archives provided by code hosting providers like GitHub.

To generate the integrity checksum for a source archive, use the existing compute-checksum subcommand:

$ swift package compute-checksum LinkedList.zip
1feec3d8d144814e99e694cd1d785928878d8d6892c4e59d12569e179252c535

When publishing a source archive, the package's owner should provide the computed checksum alongside the Zip file to make it safer and easier for other developers to use it as a dependency. If a precomputed checksum isn't available, the developer is responsible for verifying the contents of the source archive and computing a checksum themselves.

Security

Adding external dependencies to a project increases the attack surface area of your software. However, much of the associated risk can be mitigated.

To better understand the security implications of this proposal, and of Swift dependency management more broadly, we employ the STRIDE mnemonic below:

Spoofing

An attacker could interpose a proxy between you and the source archive host to intercept credentials for that host and use them to impersonate the user in subsequent requests.

The impact of this attack is potentially high, depending on the scope and level of privilege associated with these credentials. However, the use of secure connections over HTTPS goes a long way to mitigate the overall risk.

Swift Package Manager could further mitigate this risk by taking the following measures:

  • Enforcing HTTPS for all dependency URLs
  • Resolving dependency URLs using DNS over HTTPS (DoH)
  • Requiring dependency URLs with Internationalized Domain Names (IDNs) to be represented as Punycode
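As a sketch of the first and third measures, a manifest loader could reject dependency URLs that aren't HTTPS or that contain non-ASCII characters (which forces IDNs to be written as Punycode). `validateDependencyURL` and `DependencyURLError` are hypothetical names; nothing like this is part of the proposed API, and DNS over HTTPS isn't covered here.

```swift
import Foundation

// Hypothetical error type for this sketch.
enum DependencyURLError: Error {
    case nonASCII        // IDNs must already be written as Punycode
    case malformed
    case insecureScheme  // anything other than https
}

/// Returns the parsed URL if it's ASCII-only and uses HTTPS; throws otherwise.
func validateDependencyURL(_ string: String) throws -> URL {
    // Check the raw string so the result doesn't depend on how the parser treats IDNs.
    guard string.unicodeScalars.allSatisfy({ $0.isASCII }) else {
        throw DependencyURLError.nonASCII
    }
    guard let components = URLComponents(string: string),
          components.host != nil,
          let url = components.url else {
        throw DependencyURLError.malformed
    }
    guard components.scheme?.lowercased() == "https" else {
        throw DependencyURLError.insecureScheme
    }
    return url
}
```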

Tampering

An attacker could interpose a proxy between you and the source archive host to construct and send Zip files containing malicious code.

Although the impact of such an attack is potentially high, the risk is largely mitigated by the use of cryptographic checksums to verify the integrity of downloaded source archives.

$ echo "$(swift package compute-checksum Package.zip) *Package.zip" | \
    shasum -a 256 -c -
Package.zip: OK

Integrity checks alone can't guarantee that a package isn't a forgery; an attacker could compromise the website of the host and provide a valid checksum for a malicious package. However, a checksum database can provide a tamper-proof system for associating artifacts with valid checksums.

Repudiation

A compromised host could serve a malicious package with a valid checksum and be unable to deny its involvement in constructing the forgery.

This threat is unique and specific to binary and source artifacts; Git repositories can have their histories audited, and individual commits may be cryptographically signed by authors. Unless you can establish a direct connection between an artifact and a commit in a source tree, there's no way to determine the provenance of that artifact.

A transparent log of checksums or the use of digital signatures may provide non-repudiation guarantees. We look forward to considering possible remediation strategies using a package registry in a future proposal.

Information disclosure

An attacker could scrape public code repositories for Package.swift files that use hardcoded credentials in dependency URLs, and attempt to reuse those credentials to impersonate the user.

dependencies: [
  .archive(name: "TopSecret",
           url: "https://<token>:x-oauth-basic@github.com/Mona/TopSecret/archive/1.0.0.zip",
           checksum: "2c4a4ce92225fb766447c1757abb916e13f68eba0459f1287ee62e4941d89bbf")
]

This kind of attack can be mitigated on an individual basis by using an unauthenticated URL and setting a mirror.

$ swift package config set-mirror \
    --original-url https://github.com/Mona/TopSecret/archive/1.0.0.zip \
    --mirror-url https://<token>:x-oauth-basic@github.com/Mona/TopSecret/archive/1.0.0.zip
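Conceptually, a mirror is a lookup from the URL written in the manifest to the URL that's actually fetched. The following is a minimal sketch of that idea; `MirrorConfiguration` and its methods are hypothetical and only loosely modeled on the `set-mirror` command above, and `abc123` stands in for a real token.

```swift
// Hypothetical model of mirror lookup; not SwiftPM's own configuration type.
struct MirrorConfiguration {
    private var mirrors: [String: String] = [:]

    init() {}

    /// Records that requests for `originalURL` should go to `mirrorURL` instead.
    mutating func setMirror(_ mirrorURL: String, forOriginal originalURL: String) {
        mirrors[originalURL] = mirrorURL
    }

    /// Returns the mirror if one is set; otherwise the URL as written in the manifest.
    func effectiveURL(for originalURL: String) -> String {
        mirrors[originalURL] ?? originalURL
    }
}

var configuration = MirrorConfiguration()
let original = "https://github.com/Mona/TopSecret/archive/1.0.0.zip"
let authenticated = "https://abc123:x-oauth-basic@github.com/Mona/TopSecret/archive/1.0.0.zip"
configuration.setMirror(authenticated, forOriginal: original)
```

The manifest keeps the unauthenticated URL, and the credentials live only in local configuration that isn't committed to the repository.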

The risk could be mitigated for all users if Swift Package Manager forbids the use of hardcoded credentials in Package.swift files.
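A check like that could be as small as inspecting the user and password components of each dependency URL. This is only a sketch under that assumption; `hasEmbeddedCredentials` is a hypothetical helper, and `abc123` stands in for a real token.

```swift
import Foundation

/// Returns true if the URL carries a user name or password component.
/// URLs that can't be parsed are reported as having no credentials in this sketch.
func hasEmbeddedCredentials(_ urlString: String) -> Bool {
    guard let components = URLComponents(string: urlString) else { return false }
    return components.user != nil || components.password != nil
}

let withToken = "https://abc123:x-oauth-basic@github.com/Mona/TopSecret/archive/1.0.0.zip"
let withoutToken = "https://github.com/Mona/TopSecret/archive/1.0.0.zip"
```

Manifest loading could then fail with an error whenever `hasEmbeddedCredentials` returns true for a declared dependency.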

Denial of service

An attacker could scrape public code repositories for Package.swift files that declare source archive dependencies and launch a denial-of-service attack in an attempt to reduce the availability of those resources.

The likelihood of this attack is generally low but could be used in a targeted way against resources known to be important or expensive to distribute.

This threat can be mitigated by obfuscating dependency URLs, such that they can't be pattern matched from source code.

func rot13(_ string: String) -> String {
    String(string.unicodeScalars.map { unicodeScalar in
        var value = unicodeScalar.value
        switch unicodeScalar {
        case "A"..."M", "a"..."m": value += 13
        case "N"..."Z", "n"..."z": value -= 13
        default: break
        }

        return Character(Unicode.Scalar(value)!)
    })
}

dependencies: [
  .archive(name: "TopSecret",
           url: rot13("uggcf://tvguho.pbz/Zban/GbcFrperg/nepuvir/1.0.0.mvc"),
           //       ^ "https://github.com/Mona/TopSecret/archive/1.0.0.zip"
           checksum: "2c4a4ce92225fb766447c1757abb916e13f68eba0459f1287ee62e4941d89bbf")
]

Important: Never store credentials in code, even if they're obfuscated.

Escalation of privilege

There are no known privilege escalation threats arising from downloading and resolving dependencies. However, even authentic packages from trusted creators can contain malicious code.

Code analysis tools can help, to some degree, as can system permissions and other OS-level security features. But developers are ultimately the ones responsible for the code they ship to users.

Impact on existing packages

Current packages won't be affected by this change, as they'll continue to be able to specify and download dependencies using Git. Swift developers can opt in to source archives on a per-dependency basis.

Alternatives considered

Use of tar or other archive formats

Swift Package Manager currently uses Zip archives for binary dependencies, which is reason enough to use it again here.

We briefly considered tar as an archive format but concluded that its behavior of preserving symlinks and executable bits served no useful purpose in the context of package management, and instead raised concerns about portability and security.

Use of digital signatures

SE-0272 includes discussion about the use of digital signatures for binary dependencies, concluding that they were unsuitable because of complexity around transitive dependencies. However, it's unclear what specific objections were raised in that proposal. We didn't see any inherent tension with the example provided, and no further explanation was given.

Without understanding the context of this decision, we decided it was best to abide by this determination and instead discuss adding this functionality in a future proposal. For the reasons outlined in the preceding Security section, we believe that digital signatures may offer additional guarantees of authenticity and non-repudiation beyond what's possible with checksums alone.

Future directions

The functionality described in this proposal lays the groundwork for future integration with package registries: specifically, the ability to generate source archives, verify their integrity, and download them over HTTP independently of Git.

I think this observation means that if we add this feature, its use must be very carefully considered by package authors. Today we say this about exact in the SwiftPM documentation:

Specifying exact version requirements are usually not recommended

The same would apply here and it doesn't seem ideal to add a new feature that we immediately have to advise against using.

I agree that this is an important consideration, but I wouldn't characterize this as a feature to be advised against using. Like an exact dependency specification, it shouldn't be the first thing you reach for, but it's there in case you need it.

Thanks for putting this pitch together @mattt!

I think @NeoNacho's question here is prescient: if we're planning to build package registry support on top of this interface, I think this proposal doesn't get us where we want to go. In particular, the absence of versioning support makes it very hard to meaningfully adopt this in a Package.swift due to the inevitable dependency graph issues.

In particular this is difficult for libraries to adopt, and if libraries can't adopt this then we also need to address how this interacts with the possibility of getting the same library from multiple sources. This pitch does not discuss how this interacts with git-based dependency specifications, but I think it will have to, as the odds of large packages having the same dependency specified both in git and in an archive are very high.

If these problems are unsolvable, we may need to consider an alternative way to specify this information. For example, perhaps we could point at a manifest URL instead, where the manifest URL contains information about where to find both source archives and git repositories for a given package.

When was this subcommand added? It doesn't seem to work on my machine.

bash-3.2$ swift --version
Apple Swift version 5.2.4 (swiftlang-1103.0.32.9 clang-1103.0.32.53)
Target: x86_64-apple-darwin19.4.0
bash-3.2$ touch LinkedList.zip
bash-3.2$ swift package compute-checksum LinkedList.zip
error: expected arguments: _format, clean, config, generate-xcodeproj, experimental-api-diff, reset, update, dump-package, tools-version, unedit, show-dependencies, describe, edit, resolve, init, fetch, completion-tool

edit: It works in 5.3 :)

To be clear, this isn't the interface for a package registry, but rather part of its implementation. A package registry would share plumbing with source archive dependencies: both features would use the same code to fetch Zip files and verify their integrity. Versioning support would be provided by the registry.

To put it another way: The current system relies on Git for both transport (cloning over HTTP or SSH) and version resolution (listing and checking out tags). This proposal offers an alternative transport mechanism (fetching and decompressing a Zip file over HTTP), in anticipation of a future, alternative mechanism for resolving versions through a registry interface (getting available versions from a REST API call over HTTP).
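To make that split concrete, here's a rough sketch that models the two responsibilities as separate protocols. All of these names (`VersionSource`, `SourceTransport`, `SourceArchive`) are hypothetical and exist only for illustration; they aren't proposed API.

```swift
import Foundation

/// Answers "which versions exist?" (today: listing Git tags; later: a registry API call).
protocol VersionSource {
    func availableVersions(of package: String) -> [String]
}

/// Answers "where does the source for this version come from?"
/// (today: a Git checkout; proposed: a Zip file over HTTP).
protocol SourceTransport {
    func location(of package: String, version: String) -> URL?
}

/// A source archive dependency offers exactly one "version",
/// identified here by its integrity checksum.
struct SourceArchive: VersionSource, SourceTransport {
    let name: String
    let url: URL
    let checksum: String

    func availableVersions(of package: String) -> [String] {
        package == name ? [checksum] : []
    }

    func location(of package: String, version: String) -> URL? {
        (package == name && version == checksum) ? url : nil
    }
}

let archive = SourceArchive(
    name: "LinkedList",
    url: URL(string: "https://github.com/mona/LinkedList/archive/1.2.0.zip")!,
    checksum: "1feec3d8d144814e99e694cd1d785928878d8d6892c4e59d12569e179252c535"
)
```

Under this framing, a registry would supply a richer `VersionSource` while reusing the same transport and integrity checks.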

A source archive dependency wouldn't be appropriate for a library if there's any chance that the dependency could be shared as a common transitive dependency with another library. The guidance for when you should and shouldn't use a source archive dependency is consistent with the guidance for specifying dependencies generally: Be only as specific as you need to be.

As proposed, a source archive dependency is functionally equivalent to a Git dependency specified with an exact version. You'd get the same result if you downloaded the Zip file, initialized a Git repository, committed its contents, pushed it to a URL, and specified a conventional dependency that pointed to the single Git commit.

What you're describing is exactly what I had in mind for a package registry :) For example, GET /{namespace}/{package}/{version} would have links to the source archive and package manifest.

How can the package registry do this if the Package.swift file is specifying a hash for the source archive?

Sure, but what I'm getting at here is that this model is fundamentally unusable for a library. If a library were to do this it would essentially guarantee that one of its consumers will encounter dependency hell. Libraries pinning to exact versions of dependencies is entirely un-scalable. This therefore becomes a feature that can only safely be adopted by non-public libraries, or by applications.

I don't mind having that, but I think we should be calling it out.

That's fab, I think what I'm not yet getting is how this proposal is a useful stepping stone to that one.

As a user-facing feature, source archive dependencies are orthogonal to a package registry. They're related only by virtue of their shared tooling around HTTP, Zip files, and integrity checks. You could either specify a source archive dependency and opt out of dependency resolution OR specify a dependency that supports dependency resolution through a REST API instead of Git.

I disagree that an exact dependency is inherently untenable for non-public libraries. The risk is proportional to the likelihood of one being shared as a transitive dependency, and may be negligible. It all depends on how small and tightly coupled each individual component is.

All of that said, I agree that it's something that should be called out. My point is to say that I think the guidance is less binary than you're suggesting, and would best be described as a balancing test rather than any hard rules.

Thanks for writing the pitch!

I think using such a feature as a stepping stone towards implementation of a package registry makes a lot of sense, as it would make it possible to evaluate various aspects of having a package registry without requiring the full-blown implementation. That said, I don't quite (at least yet) believe that this feature will be very useful on its own, and it might introduce a long-term maintenance burden for the package manager. What do you think about adding this functionality as a private API in PackageDescription (maybe via the new @_spi attribute) so we can play with the feature for evaluation purposes but don't actually ship it as a user-facing feature?

That's a valid concern, and one that I share to some degree. I don't have evidence to back this up yet, but I suspect that this feature would indeed advance interests in Swift Package Manager adoption in larger organizations with unique constraints and a mixed technology ecosystem. (If you're reading this and work at a company that fits this description, I'd love to hear more!)

I can think of a few situations where source archive distribution could allow for novel uses of Swift Package Manager, like packages that are created dynamically. You could have a developer that distributes Swift packages with code generated from a protobuf, or a company that embeds license information alongside distributed code. A CI/CD system could regenerate libraries each time a build goes out, replacing the corresponding integrity checksum for the dependency in the package manifest.

These are hypothetical, and insufficient justification for adding a new feature prospectively. But they speak to the idea that there could be use cases that we don't yet appreciate. ("The absence of evidence is not the evidence of absence", yada yada)

Sounds good! That sounds like a totally reasonable approach.

I can see how non-binary source archive support in SwiftPM would share the same plumbing as a package registry's download functionality, but I am not sure adding source archive dependencies to the package manifest is a necessary intermediate step.

Here is how I envision integration between SwiftPM and package registry would work. Suppose this is how we declare a dependency served by package registry:

dependencies: [
    .package(id: "github.com/mona/LinkedList", from: "1.0.0"),
]

After using package registry API(s) to resolve to the correct version 1.2.0, SwiftPM downloads the source archive for github.com/mona/LinkedList version 1.2.0 from package registry. Package registry provides the archive and checksum, etc. which SwiftPM needs to verify (per Swift Package Registry Service - Security proposal). Once authenticity is confirmed, SwiftPM can extract the source and use it as if it were a cloned Git repository.

So IMO SwiftPM would need to be able to work with non-binary source archive, but adding support for source archive dependency to package manifest is not required.

I also have questions on this:

I would think creating source archive and computing checksum should be done by package registry (or the "authority") rather than individual package owners, so if/when package registry becomes available it would take over the quoted process. Is that the assumption? Or, do we expect package registry to support some sort of upload API?

This would be really helpful for some generated code I work with. Essentially we have a collection of dozens of micro services that each has its own protobuf definitions and generated packages. Currently we generate all the packages and then generate a Package.swift file that includes all the individual packages.

The other languages, though, can generate packages just for the services that change, which is usually only 1 at a time. They then upload individual packages to the GitHub package registry. To do that with Swift right now we'd need a unique git repo for each service, which we don't want to do. Plus the trickiness of pulling and pushing a second git repo in CI is not fun.

I needed this today to access code that is only available in a tgz file and found this thread looking for a solution. This is a niche request but still completely valid for when it is required. Thanks for the very detailed write up Mattt.