Last month, I submitted a pitch for a Swift package registry service, and the feedback we received from the community was excellent. (Thank you all so much!) We’re working to incorporate that feedback into a revised draft, which we hope to share soon.
In particular, I’d like to thank @lukasa for his feedback about the proposed use of digital signatures for package verification. Because security is a core value proposition of this proposal, I wanted to spin up a new thread to focus on that specifically (as suggested by @rballard).
I’m going to start by sharing what I’ve learned since posting the original pitch last month, and will then describe some ideas I had for alternative approaches to security. My hope is that these ideas can serve as a good starting point for a discussion on this topic.
The story so far
A primary goal of the proposed registry service is to provide strong guarantees that the package you downloaded is authentic. One approach is built on trust: If you assume that a registry always sends you exactly what you ask for, you only need to verify the sender (though it wouldn’t hurt to verify the contents anyway).
Modern information security relies on public-key cryptography to verify claims of identity. Broadly speaking, there are two approaches to certificate trust:
- A centralized, hierarchical public-key infrastructure (PKI) scheme, in which certificate authorities (CAs) issue certificates that prove the ownership of public keys. This is the approach taken by TLS, which is used by HTTPS.
- A decentralized, distributed “web of trust” whereby individuals vouch for the identity of one another by signing each other’s keys directly. This is the approach taken by PGP.
The original proposal relies on both TLS and PGP for security: TLS to verify the identity of the registry’s domain (e.g. github.com) and PGP to verify the registry as the creator of the package archive. My thinking was that this “belt and suspenders” approach would offer more security than relying on either one alone. Instead, it turned out to be more of a “weak link in the chain”.
“PGP considered harmful”
A few days after I pitched this on the forums, Filippo Valsorda, who is in charge of cryptography and security on the Go team at Google, reached out to caution against using PGP. Although a decentralized “web of trust” model can work in theory, PGP itself has some serious issues in practice. An article titled “The PGP Problem” provides a good summary of these issues.
A quick aside to help clarify any confusion around PGP naming:
- PGP (“Pretty Good Privacy”): A piece of software developed by Phil Zimmermann in 1991.
- OpenPGP: An open standard for PGP’s functionality, codified by IETF RFC 4880.
- GPG (GnuPG or “GNU Privacy Guard”): A popular piece of software that implements the OpenPGP standard.
To its credit, PGP remains a popular security standard, and in many cases, it’s better than nothing. For most systems, “pretty good” is good enough. For example, I’ll continue to sign my Git commits with GPG (until GitHub adds support for something better). However, there appears to be a consensus among security researchers that new systems can do better by choosing a modern, focused tool for their security needs.
“We don’t use the ‘B’ word” (Blockchain)
I had the privilege of chatting with Filippo and his colleague, Katie Hockman, who explained the approach they took with Go’s checksum database. The basic idea is that, when a new version of a module is published, a checksum of its contents is added to a public, append-only log.
Fun Fact: The terms “package” and “module” in Swift have the opposite meanings in Go.
This approach is very different from the one initially proposed for the package registry. Instead of trusting that a registry is on the up-and-up, a transparent log allows you to be inherently distrustful. Because the entire, immutable history of package releases is made public, anyone can independently audit the log and verify that the checksum for a release hasn’t changed.
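To make the auditing idea concrete, here is a minimal Python sketch of an append-only checksum log and an auditor that replays it. The Record, ChecksumLog, and audit names and the record format are hypothetical illustrations, not the API of Go’s checksum database. A real log would also be backed by a Merkle tree, so clients could check compact proofs instead of replaying every entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    package: str   # e.g. "mona/LinkedList" (hypothetical identifier format)
    version: str   # e.g. "1.2.0"
    checksum: str  # hex-encoded checksum of the release contents


class ChecksumLog:
    """Hypothetical append-only log: entries can be added but never edited or removed."""

    def __init__(self) -> None:
        self._records: list[Record] = []
        self._index: dict[tuple[str, str], str] = {}

    def append(self, record: Record) -> int:
        key = (record.package, record.version)
        if key in self._index:
            # A published release is immutable, so logging it a second time is rejected.
            raise ValueError(f"{record.package}@{record.version} is already logged")
        self._index[key] = record.checksum
        self._records.append(record)
        return len(self._records) - 1  # position of the new entry in the log

    def entries(self) -> tuple[Record, ...]:
        # Hand out an immutable copy so callers can't rewrite history in place.
        return tuple(self._records)


def audit(entries) -> dict[tuple[str, str], str]:
    """Replay a published log and fail if any release ever appears with a different checksum."""
    seen: dict[tuple[str, str], str] = {}
    for record in entries:
        key = (record.package, record.version)
        if key in seen and seen[key] != record.checksum:
            raise ValueError(f"checksum for {key} changed: {seen[key]} -> {record.checksum}")
        seen[key] = record.checksum
    return seen
```

The registry enforces immutability when writing, but the point of publishing the log is that anyone can run audit on their own copy instead of taking the registry’s word for it.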
Adopting this approach, we could do away with the /mona/LinkedList/1.0.0.zip.asc endpoint entirely. Instead, the response for the Zip archive could provide the checksum for the package release contents in a header (distinct from Digest, which is for the Zip file itself; Content-Checksum, perhaps?). The checksum can then be verified against the registry’s checksum database. Counter-intuitively, this simpler approach offers stronger guarantees than a digital signature, which can only attest to the identity of who archived a package, not what’s inside.
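One way to checksum a release’s contents rather than its Zip bytes is to hash each file, then hash a sorted summary of those per-file hashes. The sketch below is modeled on Go’s "h1:" directory hash (the dirhash package in golang.org/x/mod). The function names content_checksum and verify_release are hypothetical, and a Swift registry would need to specify its own scheme. Because the hash ignores Zip metadata such as timestamps, entry order, and compression, two archives containing identical files produce the same checksum.

```python
import base64
import hashlib
import io
import zipfile


def content_checksum(archive: bytes) -> str:
    """Hash the files inside a Zip archive, independent of how the archive was built.

    Modeled on Go's "h1:" dirhash: each file contributes a line
    "<sha256 hex>  <name>\\n", lines are ordered by file name, and the
    resulting summary is hashed once more.
    """
    summary = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        names = sorted(info.filename for info in zf.infolist() if not info.is_dir())
        for name in names:
            if "\n" in name:
                raise ValueError(f"file name contains a newline: {name!r}")
            file_hash = hashlib.sha256(zf.read(name)).hexdigest()
            summary.update(f"{file_hash}  {name}\n".encode())
    return "h1:" + base64.b64encode(summary.digest()).decode()


def verify_release(archive: bytes, advertised: str) -> None:
    """Raise if the archive's contents don't match a checksum from the registry.

    `advertised` could come from a response header such as the hypothetical
    Content-Checksum, or from the registry's checksum database.
    """
    actual = content_checksum(archive)
    if actual != advertised:
        raise ValueError(f"content checksum mismatch: expected {advertised}, got {actual}")
```

A client would compare the header value against the checksum database entry for the same release, then run verify_release on the downloaded bytes.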
Client-side, the security model is trust on first use (TOFU). When you add a third-party dependency to your own Swift project, SPM would download the package, verify its integrity based on the registry-provided checksum, and then store that checksum in a new field in Package.resolved.
// Straw-man proposal
{
  "object": {
    "pins": [
      {
        "package": "LinkedList",
        "url": "https://swift.pkg.github.com/mona/LinkedList",
        "state": {
          "checksum": ["sha256", "ed008d5af44c1d0ea0e3668033cae9b695235f18b1a99240b7cf0f3d9559a30d"],
          "version": "1.2.0"
        }
      }
    ]
  },
  "version": 2
}
Later, when your CI downloads those dependencies during a build, it can use the checksum in the checked-in Package.resolved file to verify that the contents of the package haven't changed for that version. TOFU operates under the principle that "you can't fool everyone all of the time". There's a chance that you download a forged package the first time you resolve a new dependency, but it will likely be caught the next time you or someone else attempts to validate the checksum of the forged package. (TUF, discussed later on, offers stronger guarantees than TOFU.)
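Here is a minimal sketch of that TOFU check, written against the straw-man Package.resolved format above and operating on the parsed JSON. The function name check_pin is hypothetical, and so is recording a checksum into a pin that lacks one; this is not how SwiftPM behaves today. For simplicity, the sketch hashes the downloaded bytes directly. In practice it would use whatever content checksum the registry defines.

```python
import hashlib


def check_pin(resolved: dict, package: str, version: str, contents: bytes) -> bool:
    """Trust on first use: pin the checksum the first time, verify it every time after.

    Returns True if a new checksum was recorded and False if an existing one matched.
    Raises ValueError if the downloaded contents don't match the pinned checksum.
    """
    actual = ["sha256", hashlib.sha256(contents).hexdigest()]
    for pin in resolved["object"]["pins"]:
        if pin["package"] == package and pin["state"]["version"] == version:
            pinned = pin["state"].get("checksum")
            if pinned is None:
                # First use: trust what we downloaded and remember it.
                pin["state"]["checksum"] = actual
                return True
            if pinned != actual:
                raise ValueError(
                    f"{package} {version}: checksum mismatch, "
                    f"expected {pinned[1]}, got {actual[1]}"
                )
            return False
    raise KeyError(f"{package} {version} is not pinned in Package.resolved")
```

On a developer machine the first call records the pin. On CI, where Package.resolved is checked in, every later download must match it.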
Note: Transparency logs and digital signatures are complementary. If digital signatures were something we wanted to support, there’s no technical reason why we couldn’t have a checksum database and signatures (either using OpenPGP or something like signify or minisign).
Go’s checksum database is powered by a piece of software called Trillian, which is written in Go and sponsored by Google. Filippo noted that there’s interest in making Trillian available for deployment in other language ecosystems, so that may be an option for Swift.
Seeing the Forest for the (Merkle) Trees
At the heart of this append-only log are Merkle Trees. If you’ve heard of them before, there’s a good chance it was in the context of Bitcoin (though Katie was quick to point out that Merkle Trees ≭ blockchain).
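To show what sits inside such a log, here is a minimal Python sketch of the Merkle tree hash defined in RFC 6962 (Certificate Transparency), the construction that Trillian and Go’s checksum database build on. Leaves are hashed with a 0x00 prefix and interior nodes with a 0x01 prefix, so a leaf can’t be passed off as an interior node. The function names are illustrative. The sketch computes only the root, not the inclusion or consistency proofs a real client would verify.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # RFC 6962: leaf hashes are domain-separated with a 0x00 prefix.
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # RFC 6962: interior nodes are domain-separated with a 0x01 prefix.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(entries: list[bytes]) -> bytes:
    """Merkle Tree Hash (MTH) of a list of log entries, per RFC 6962."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()  # hash of the empty tree
    if n == 1:
        return leaf_hash(entries[0])
    # Split at k, the largest power of two strictly less than n.
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))
```

Changing any single entry changes the root, which is why publishing one small hash is enough to commit the registry to its entire history.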
But you know what else uses Merkle Trees? Git.
So I got to thinking, “Could we use Git as a transparency log?” It’s not as far-out as you might think. Rustaceans do this for Crates.io. CocoaPods actually does this, too.
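Git’s object model is itself a Merkle tree. A blob’s ID is the SHA-1 of a short header plus its contents, and a tree’s ID is the SHA-1 of its entries, each of which embeds a child’s ID. The sketch below reproduces that hashing for blobs and for flat trees of regular files (mode 100644), showing why changing one file changes every ID above it. The function names are illustrative, and the sketch deliberately ignores subdirectories and Git’s newer SHA-256 object format.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    # Every Git object is hashed as "<type> <size>\0" followed by its body.
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def blob_id(contents: bytes) -> str:
    return git_object_id("blob", contents)


def flat_tree_id(files: dict[str, bytes]) -> str:
    """Tree ID for a directory of regular files only (no subdirectories)."""
    body = b""
    for name in sorted(files, key=lambda n: n.encode()):
        # Each entry: "<mode> <name>\0" followed by the 20-byte binary blob ID.
        body += b"100644 " + name.encode() + b"\0" + bytes.fromhex(blob_id(files[name]))
    return git_object_id("tree", body)
```

Commits extend the same idea by hashing the root tree ID together with their parent commit IDs, which makes Git history tamper-evident.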
I started to sketch out how this might work with a simple Bash script that runs through the steps of adding a new package release.
$ tree .
.
└── 2e
    └── 80
        └── 6e
            └── 827387c39e8c594053f5c90701dbb5b5aa00a83812cb0452b5695e45be
                ├── 1.2.0.zip
                └── 1.2.0.zip.asc

4 directories, 3 files
# Use git ls-files to look up a package's release archives (by its content-addressable path)
$ git ls-files -- "$(checksum "$package")/*.zip"
2e/80/6e/827387c39e8c594053f5c90701dbb5b5aa00a83812cb0452b5695e45be/1.2.0.zip
# Verify that the release archive hasn't been modified (a single commit means it was never changed after being added)
$ git log --follow "$(checksum "$package")/1.2.0.zip" --oneline
792276c (HEAD -> master) sha-256=c584882ef498cc6e043e0f100134483dfa04ea2d5921f56
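The script above relies on a checksum helper that isn’t shown. Here is one way such a helper might map a package to its content-addressable path, sketched in Python: hash the package identifier with SHA-256 and split the hex digest into 2/2/2/58-character segments, matching the layout in the tree output. Which string gets hashed, and normalizing it first with Unicode NFC plus case folding, are assumptions. The helper names are hypothetical. Since SHA-256 hex digits are effectively uniform, packages spread evenly across the shard directories.

```python
import hashlib
import unicodedata
from pathlib import PurePosixPath


def package_path(identifier: str) -> PurePosixPath:
    """Hypothetical content-addressable path, e.g. 2e/80/6e/<58 hex chars>."""
    # Assumption: normalize so different spellings of the same name map to one path.
    normalized = unicodedata.normalize("NFC", identifier.casefold())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return PurePosixPath(digest[:2], digest[2:4], digest[4:6], digest[6:])


def release_archive_path(identifier: str, version: str) -> PurePosixPath:
    """Hypothetical location of a release archive within the repository."""
    return package_path(identifier) / f"{version}.zip"
```

With this scheme, looking up a release requires only the package name and version, and the directory layout reveals nothing about the name until you already know it.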
Some highlights of this proof-of-concept:
- Using Git provides a familiar, trustworthy mechanism for auditing integrity (it’s one thing to be technically secure; it’s another to have consensus, understanding, and trust that it is secure)
- Using SHA-256 for content-addressable mapping of packages shards them evenly across the filesystem, protects against homograph attacks, and provides a possible mechanism for privacy / anonymity
- Using Git notes allows changes to metadata without affecting the history of the release artifact
- Using Git LFS keeps the repository slim and storage-agnostic
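To illustrate the Git LFS point: with LFS, the repository stores only a small text pointer in place of each archive, and the archive itself lives in whatever storage backs the LFS server. Here is a sketch that builds and checks such a pointer, following the Git LFS v1 pointer format (version, oid, and size lines). The function names are hypothetical. Because the pointer records the archive’s SHA-256 and size, Git’s history still captures any change to the release artifact.

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(data: bytes) -> str:
    """Build the pointer file Git would store in place of `data`."""
    oid = hashlib.sha256(data).hexdigest()
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(data)}\n"


def parse_lfs_pointer(text: str) -> tuple[str, int]:
    """Return (sha256 hex, size) from a pointer file."""
    fields = dict(line.split(" ", 1) for line in text.splitlines())
    if fields.get("version") != LFS_SPEC:
        raise ValueError("not a Git LFS v1 pointer")
    algorithm, oid = fields["oid"].split(":", 1)
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    return oid, int(fields["size"])


def matches_pointer(data: bytes, pointer: str) -> bool:
    """Check a downloaded archive against the pointer committed to the repository."""
    oid, size = parse_lfs_pointer(pointer)
    return len(data) == size and hashlib.sha256(data).hexdigest() == oid
```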
I’d be interested to hear any thoughts you have, especially about how this compares to, for example, Go’s checksum database.
Securing the Software Supply Chain
In the course of my research, I learned about The Update Framework (TUF), as well as its sister project in-toto. Together, these frameworks can go a long way toward improving package and supply chain security.
As I understand it, transparent logs and TUF provide different guarantees and can be implemented together. This article by Trishank Karthik Kuppusamy and Marina Moore has a great explanation of these two approaches and how they relate.
I’m still looking into how TUF and transparency logs can fit together in the context of the package registry proposal. So if you have any ideas about what that might look like, I’d love to hear them.
To summarize:
- There are practical reasons to avoid OpenPGP in a new specification
- A checksum database / transparent log is a simple and effective strategy for securing the package ecosystem that doesn't (directly) require public-key cryptography
- Frameworks like TUF can provide even stronger security guarantees