Swift Package Registry Service

Thanks for your feedback, @yim_lee! I'm excited to work with you and the rest of the team on this.

I think package discovery is important, too, and I'm really excited for how registries can help with that. In that respect, it may be useful to draw a distinction between our goals for registries generally and for the registry API specification specifically.

So far, I've tried to narrowly scope this to the essential functionality for integration with SPM. There may be some more things it needs to do in order to get there, such as providing the package manifest separately from the archive (discussed more below). But beyond that, I think additional features, like search, would best be explored in separate proposals.

It's easy to add new functionality, but much harder to remove or change functionality once it's defined.

Yes, this is absolutely something we're considering. From the spec:

I originally punted on this for lack of strong conventions about how this should work. However, I'd be very happy to add this in if we can settle on a good model for deprecation.

For an extreme comparison, I timed cloning vs. downloading + unzipping apple/swift and found Git to take ~1 minute and ~500 MB of transfer, compared to ~10 seconds and ~30 MB for the archive. But for a more realistic / practical benchmark, I plan to run this for the top 100 or so packages to get a sense of performance differences in the aggregate.

1 Like

I think this kind of analysis is interesting, but keep in mind that the cost of fetching the entire Git repository during dependency resolution is somewhat amortized over time (unless the user deletes the .build directory). Downloading just the zip files will definitely be cheaper than downloading Git repositories, but the resolver will have to download them for every resolution that happens over time. Of course, this can be mitigated by adding local caching, but then swiftpm has to manage that.

I believe an efficient way of performing dependency resolution is exposing an endpoint that returns a map of version -> hash of the package manifest(s), and another endpoint that serves the file contents for a given hash. This would be a really good optimization, since manifest contents often don't change between versions and swiftpm would only need to perform a minimal number of download operations during resolution.
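
As a purely illustrative sketch (the endpoint shape and field names here are hypothetical, not something from the proposal), such a version map might look like the following. I've used the SHA-256 of an empty file for both entries to show the interesting case: two versions sharing a manifest hash tells the client it can reuse what it already fetched.

{
    "versions": {
        "1.1.0": {
            "Package.swift": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
        },
        "1.1.1": {
            "Package.swift": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
        }
    }
}

A second endpoint, keyed by hash, would then serve the manifest contents themselves.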

3 Likes

Thanks so much for the kind words, Rick. I'm very much looking forward to working together on Package Manager once again.

You raise some excellent points in your reply, and I'll try my best to respond to your concerns:

My intention was for each registry to constitute its own name registry, such that any package is identified by its fully-qualified name within that registry (e.g. github.com/mona/LinkedList or mona.dev/LinkedList or coolpackages.io/github.com/mona/LinkedList).

Unfortunately, I don't think limiting acceptable namespaces in registries helps us avoid the Morning Star / Evening Star problem. As soon as folks start pointing to registries for packages, we lose the ability for the url in a Package.swift dependency specification to uniquely identify a package.

There's a good chance that we'll need to solve package identity before we're able to support non-.git URLs. For the first iteration, we may be limited to adding transparent support, such that SPM translates .git URLs to use registry endpoints when available.
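
To make "transparent support" a little more concrete, here is a rough sketch (illustrative only, not actual Swift Package Manager code) of the kind of URL translation I have in mind, using the swift.pkg.github.com layout from this proposal's examples:

// Illustrative only: derive a registry endpoint from a GitHub .git URL,
// assuming the swift.pkg.github.com layout used in this proposal's examples.
func registryURL(forGitURL gitURL: String) -> String? {
    let prefix = "https://github.com/"
    let suffix = ".git"
    guard gitURL.hasPrefix(prefix), gitURL.hasSuffix(suffix) else { return nil }
    let path = gitURL.dropFirst(prefix.count).dropLast(suffix.count)
    return "https://swift.pkg.github.com/\(path)"
}

// registryURL(forGitURL: "https://github.com/mona/LinkedList.git")
// returns "https://swift.pkg.github.com/mona/LinkedList"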

I apologize if this wasn't clear in the specification. Each registry is responsible for providing its own authentication scheme. For GitHub's Swift package registry, the only people who can publish or unpublish a version are those who own the repo on GitHub.com (or have the relevant packages:write scope permissions).

If sketchypackages.io wants to name-squat a popular package name or fling the doors open to let anyone do anything without any permissions... well, there's indeed nothing stopping them from doing so. But then again, folks make a choice to use a registry, and they're unlikely to pick one that they can't trust.

Apologies for the length of this specification. It's certainly a substantial proposal, but I think a lot of its word count is an attempt to define explicit behavior for HTTP APIs, which are notoriously hard to pin down.

We're only 1 day into this thread, but things seem to be under control for now. If that changes, I certainly wouldn't be opposed to breaking out discussion for any individual topic.

Did you have any specific concerns about the security model? Or was there anything you'd like more details about?

I think we can take a lot of inspiration from what the Yarn package manager does with its offline cache and plug'n'play — especially as we start to consider first-class scripting support for Swift.

That's a great idea! It'd be the easiest thing in the world to add that field to the response for package releases:

{
    "releases": {
        "1.1.1": {
            "url": "https://swift.pkg.github.com/mona/LinkedList/1.1.1",
            "checksum": ["sha256", "1179902b126096145c8feebca4c153f81506c3d86acc45109480d36838d1445e"]
        }
    }
}

In fact, that could be a clever solution to the identity problem identified by @rballard and others.

A few questions about implementation:

  • Would the existence of Package@swift-4.swift or other swift-tools-version variants affect behavior in any way? (My guess would be "No.")
  • Any preference in hashing algorithm? (SHA-256?)

1 Like

Thanks for the links. This made me think: what if we actually went ahead and used a per-user cache for holding package sources, built on llbuild2's new file-backed CAS implementation? (We would also need a small caching layer to look up things in the CAS database, but I believe @David_M_Bryson is already working on that.) The idea would be that swiftpm fetches the package sources directly into the CAS and reads from it during dependency resolution. We might end up fetching more versions than actually needed, but they will be automatically de-duplicated and shared across all packages on the user's machine. And at the end of the resolution, the checkout too can be done from the CAS, so no network operation is needed there. In the future, we might even be able to skip creating checkouts of the sources, as we would be able to read them directly from the CAS during the build (provided the llbuild2 + swiftpm-on-llbuild2 experiment works out).
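
To sketch the shape of that idea (a toy model only; this is not llbuild2's actual CAS API), the key property is that blobs are stored under their own content hash:

import CryptoKit
import Foundation

// Toy model of a file-backed content-addressable store: blobs are keyed
// by their SHA-256 digest, so identical package sources are stored once
// and shared across every package on the user's machine. Assumes `root`
// already exists on disk.
struct FileBackedCAS {
    let root: URL

    func put(_ data: Data) throws -> String {
        let digest = SHA256.hash(data: data)
            .map { String(format: "%02x", $0) }
            .joined()
        let destination = root.appendingPathComponent(digest)
        if !FileManager.default.fileExists(atPath: destination.path) {
            try data.write(to: destination)
        }
        return digest
    }

    func get(_ digest: String) throws -> Data {
        try Data(contentsOf: root.appendingPathComponent(digest))
    }
}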

That's an interesting idea, but I think there are deeper problems with identity and name clashes that might be worth discussing separately. Some of us (cc @johannesweiss) once discussed introducing a reverse-domain identifier in the package manifest (e.g. something like dev.mona.LinkedList) which is used as the identity. That identifier could also be used by the Swift compiler to namespace the modules, so you avoid module name clashes (plus you would have some way of disambiguating if needed). However, this is certainly not an easy task and requires a lot of work in the compiler.

I am starting to think that a per-user cache is a better approach, and it also simplifies the spec. However, if we do end up using this approach, I would expect the server to return the names and hashes of all package manifests present in the package. Using SHA-256 for content hashing makes sense.

5 Likes

NPM’s proposal is a very interesting link, thanks. What I think neither you nor NPM has done is explain exactly what attacker is being defeated here. “If an attacker has interposed a proxy between me and the registry” raises some interesting questions. Given that this API runs entirely over HTTPS, how are attackers supposed to do that? If an attacker is capable of achieving that privileged network position, how are you handling key distribution to avoid them simply intercepting and delivering their own key?

Package signing is not a priori unreasonable, but doing so without a clear idea of what attacker you’re worried about is. Additionally, explaining how the registry is distributing and updating keys is also vital. How are keys updated? Can keys be revoked? How do the answers to these questions affect the threat model?

I’d really like to see this explored much more deeply. Right now the document assumes that package signing has value over-and-above HTTPS without explaining what that value is, why HTTPS is not providing it, and what the intended usage model is. I’m very nervous about adding cryptographic features simply because we can without this kind of justification.

3 Likes

HTTPS isn't a panacea, and neither is PGP signing, but working together they improve the overall security of the system. That's the philosophy of "defense in depth". While the extra layer may seem unnecessary, consider that what we're sending — packages — contains executable code, which deserves the highest level of scrutiny.

There are at least a few different ways that attackers can work around TLS / HTTPS. For example, developers sometimes install trusted root certificates to their system so that they can do things like inspect network traffic. By design, those can be used to undermine transport-level security, and can be exploited as an attack vector.

Or forget HTTPS for a moment. Consider what @Aciid is proposing with a local package cache: An attacker could swap out real packages with malicious forgeries by way of some privilege escalation on the filesystem. Keeping detached signatures for all of those cached packages would be a good way to prevent that from happening.

PGP has robust processes and infrastructure for issuing, sharing, and revoking keys, which are described in its documentation. For the purposes of the specification, it should be sufficient to link to PGP as a standard, much like we do for HTTPS / TLS. But I'm exploring how to strike the right balance and provide enough context around those references.

1 Like

Please don’t mistake what I’m saying here: I’m not saying “don’t sign packages”. I’m saying that the pitch should clearly explain how package signing works, from beginning to end, with a clear description of what attacks it prevents or mitigates. This would need to include an assessment of why HTTPS isn’t valuable.

As an example of why I’m proposing this, consider your last paragraph:

If the attacker has escalated to user-level privilege, they are exactly as privileged as the user is. Presumably you’re allowing users to control which keys they trust: in that case, an attacker can simply add their own key to the trusted chain. Additionally, if the attacker has achieved privilege escalation, they already have privilege escalation: they can just write any other binary on disk. While macOS has some mitigations against this kind of attack, an attacker who has already achieved privilege escalation is exceedingly hard to defend against, and package signing is unlikely to save you.

Again, I must stress that I’m not saying package signing doesn’t have value. I am saying that it is incumbent upon this pitch to clearly address what attack scenarios are mitigated and how. Adding cryptography to a protocol should not be done simply because it’s nice to have, it should be done with clear and reasoned intent. I’m just asking for the pitch to show its working.

Sorry, I don’t think I posed the question clearly enough, so let me rephrase: how does the package manager interact with the PGP ecosystem to manage keys and do verification?

There are lots of possible answers here. Let’s outline some:

  1. The package manager uses any existing gpg installation on the box to manage verification. It does not download signatures or attempt to update them in any way, it just attempts to validate. Missing keys are not errors, and pass silently.
  2. As (1) but missing keys do not pass silently, the package manager emits warnings.
  3. As (1) but missing keys are errors. Users are required to perform out-of-band steps to obtain those keys.
  4. As (2), but the package manager will attempt to download any key from SKS and then present it to the user asking them to validate.
  5. The package manager ships with a known-trusted key.
  6. The package manager can be “configured” with a registry which includes some out-of-band system to communicate a trusted key at setup.
  7. The package manager does nothing, users are required to take manual steps to perform verification.
  8. This pitch declares this out of scope: it says that registries must sign, but places no rules upon the package manager about what to do with this information.

Each of these is very different! They provide different levels of defense, they have different weaknesses and strengths, and they trade off availability and ease of use in different ways. IMO this pitch should address what, if anything, it requires a) from package managers and b) from registries.

2 Likes

We're in agreement here. I've already made changes to my draft of the proposal locally that expand on what package signing does and how it works, and I'm continuing to refine that to be responsive to feedback from you and others. Given that this is a key feature of the registries, it deserves a more satisfactory explanation of how it works.

That's an excellent question, and one that I look forward to addressing in the follow-up proposal for how Swift Package Manager integrates with package registries. And please don't read that as brushing off your concerns — I agree completely that these details matter, and I sincerely appreciate your raising these points.

A specification has to balance several competing interests: specificity vs. flexibility, brevity vs. exhaustiveness, and the needs of providers vs. consumers. Being too specific about implementation details risks invalidating equally viable (or even better) alternatives.

I agree that more can be done to articulate the motivation and value of the security features introduced by this proposal, but I want to make sure we're doing that in a way that doesn't micromanage implementations.

2 Likes

First off, thank you @mattt for the proposal! Apologies if this question is already answered, but how would this proposal expect to solve authentication of clients to the registry for "pulling" packages? It seems like the authentication scheme used by each registry can be different, yet it's not covered how it would be expected to be used. I am thinking along the lines of enterprises that already use a package registry of some sort for custom/private packages, and who may not have an SSH-level authentication pattern or host on GitHub.

Packages on GitHub will work great because the github.com domain is in the public DNS and authentication can be done with SSH keys using the right URL scheme. But when using HTTPS for private packages, what is the expectation for how the client (Package.swift) specifies how it authenticates to the repository? Should a "registries" list be supported in the Package.swift file so that the client (SPM) can use the information provided when resolving packages? This would be similar to how npm, Ruby, and Rust all work today.

When I initially read through this proposal, my mind immediately went to how JFrog would be an adopter of this and how it would benefit the enterprise-based companies I work with. JFrog has a "universal" repository solution, meaning they support a large number of repository types (npm, Ruby, Rust, Maven, Docker, etc.), which are then secured through enterprise-level authentication (SAML, OpenID, etc.). When a client registers the repository, they need to specify the URL, username, and an API token. When the client (resolver) searches for a package at the URL, it uses the token and authentication scheme to authenticate requests. The problem with authentication is that not everyone uses the same style, but the most common, to name a few, are:

  1. Custom HTTP header + API key (in the case of JFrog, it's X-JFrog-Art-Api)
  2. SSH-based (GitHub)
  3. HTTP basic auth
  4. Some form of access token (using the HTTP header Authorization: Bearer <value>)

The point / question is that if registries like you describe are expected to be widely adopted by enterprises (which is where I think there is a really strong need for this proposal), then registry registration in Package.swift needs to consider how to support a generic authentication model; see the sketch below.
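
For illustration, here is a rough model of what a generic, client-side registry list with pluggable authentication could look like. None of these types exist in SwiftPM today; the sketch is only meant to show that the HTTP-based styles above can be captured by one configuration shape, with secrets referenced out-of-band rather than stored in the manifest:

import Foundation

// Hypothetical configuration model; the cases mirror the HTTP-based
// authentication styles listed above. Secrets are referenced by
// environment variable rather than stored inline.
enum RegistryAuthentication {
    case none
    case basic(username: String, passwordVariable: String)
    case bearerToken(tokenVariable: String)
    case customHeader(name: String, valueVariable: String)
}

struct RegistryConfiguration {
    let url: URL
    let authentication: RegistryAuthentication
}

// Example: a JFrog Artifactory registry using its custom header, and a
// GitHub registry using a bearer token. URLs and variable names are
// made up for illustration.
let registries = [
    RegistryConfiguration(
        url: URL(string: "https://artifactory.example.com/api/swift/swift-local")!,
        authentication: .customHeader(name: "X-JFrog-Art-Api",
                                      valueVariable: "ARTIFACTORY_API_KEY")
    ),
    RegistryConfiguration(
        url: URL(string: "https://swift.pkg.github.com")!,
        authentication: .bearerToken(tokenVariable: "GITHUB_TOKEN")
    ),
]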

There are more features that an enterprise would need, but support for this kind of customization is the most important to get started (MVP). Thanks!

1 Like

This isn't related to this proposal but rather to SwiftPM integration, so I didn't comment on it before, but I think separating the description of dependencies from the location of the registry (using something equivalent to npm's config set registry) is more flexible.

2 Likes

Thanks for sharing this perspective, @kylep! I found JFrog's approach to artifact management to be quite helpful in shaping my own understanding of this problem space, and believe that the proposed solution is compatible with the use case you're describing.

In the proposal, I leave the matter of authentication up to the server:

While GitHub.com will likely use the same OAuth 2 system they use throughout their API, an enterprise system (like GitHub Enterprise) may use something like SAML or client certificate authentication over an internal network. For JFrog, it may well be the X-JFrog-Art-Api key you alluded to.

Those permissions determine who can access, publish, or unpublish a package release. As for how a registry accesses the code itself to publish or serve a release, that's another detail left to the server. The API call to publish a package includes an optional URL parameter, which you can use to specify where code is hosted. That could include a hard-coded credential or rely on SSH-based authentication between the registry and the code host. Again, something left to the server.

The question of how a client (Swift Package Manager) would interact with a registry like JFrog is something still to be determined in a future proposal. I'll be sure to keep this in mind as we figure out how that will work. A few other folks on this thread (including @tachyonics' post above) have also proposed registry lists, so that's a strategy we'll definitely consider.

Thanks @mattt, I agree this proposal is big. It sounds like in its current form you are just proposing the server side of the registry for publishing, not how the client (Swift Package Manager) interacts with it. This server or service would cover the search, publish, and unpublish side without describing a "publishing" client, which is fine and something I agree with (this is how JFrog structures it).

I agree as well. I have learned a lot about this space lately, as I was trying to find a similar solution to support secured Carthage binary dependencies. I was able to come up with quite a clever solution that leverages JFrog's generic repository pattern and REST API.

It seems as though the final solution is going to have multiple parts, right?

  1. Server registry specification (this proposal)
  2. Client-side (Swift Package Manager) read model for the registry in #1
  3. (Optional) Client-side (Swift Package Manager) write model for the registry in #1

Makes sense. However, it means we also have to change the url in our Package.swift when using branches/commit hashes:

- .package(url: "https://swift.pkg.github.com/apple/swift-argument-parser", from: "0.0.1")
+ .package(url: "https://github.com/apple/swift-argument-parser.git", .branch("test"))

Which feels weird to me as an end user. Besides, from my understanding of the proposal, a package is not tied to one repository url (as we can tag with the url parameter), so it might not always be obvious to the end user what the repository url should be.

My question was: can we easily determine the source code a release is attached to? As it seems we can release anything (a tag, a branch, a commit), having just the repository url and the tag doesn't seem like enough when releasing a branch or a commit.

Correct, though I'm not convinced that Swift Package Manager should have anything to do with publishing to service registries.

As discussed earlier in the thread, that listing under "Impact to existing packages" was not intended to be an actual proposal for how to integrate with Swift Package Manager. We still need to figure out what that will look like in Package.swift, and this will certainly be a consideration.

In your question, does "we" refer to the client or the server? For the server, there are optional commit, branch, tag, path, and url parameters. For the client, I'd like to find a way to incorporate this information into the detached signature as a form of metadata. But the more likely solution is to include a commit reference in the releases endpoint, alongside the url field for each release.

"1.1.1" {
  "url": "https://swift.pkg.github.com/mona/LinkedList/1.1.1",
  "commit": "d8978910a0934c21ea08e3c9a031a5baa967f5b1"
}

100% agree with you. I only listed it because someone is going to need to build these, but it should be up to the registry to prescribe them. Ideally many will be API-based, so tools like bash scripting or Fastlane can be used to call the endpoints when the producer determines they are ready to publish.

This reason seems weak -- I would drop it from the list.

I disagree with this. Git is slow compared to a dedicated package registry. Additionally, for the reasons I listed about how the client (user) should pull these dependencies, the registry should try to decouple from the original Git repository as much as possible. As I had been discussing, I wouldn't want to give a consumer access to both the Git repository (source) and the registry; if I am an enterprise, I will likely have very different auth models for source code and packages.

6 Likes

The relation between the two is very tight in SwiftPM’s current model. It is reasonable to want to enable a package to exist without an associated repository, but it should be a non‐goal to decouple the two. If I’m using a package from a registry, and I want to temporarily insert some print statements to help debug my client code, swift package edit foo should just work. If I then discover the problem is actually with the package and not with my client code, it should be possible to directly check in and push the changes like I can now (provided the package also exists in repository form and I have the necessary permissions).

While Git may never be quite as fast, I don’t think the status quo is a very good indication of how big of a difference there is between the two. There is a lot of low‐hanging fruit in the area of performance that could go a long way to making SwiftPM faster in ways that also benefit Git‐based packages (which are likely to continue to be heavily used at least for team‐local packages). Some examples:

  • Store clones once in a global location instead of duplicated in every .build directory. This would mean Git only has to fetch it once even if many of your packages depend on it.
  • Do shallow cloning for pins, .exact dependencies, and other fully‐resolved constraints. When SwiftPM can know ahead of time exactly what checkout it needs, there is no real reason for it to fetch the entire repository. This would mean until you alter the dependency graph, it would only be fetching a tiny fraction of what it fetches for you now.

These have both been discussed extensively, and I haven’t seen anyone oppose them. From what I can tell, the only reason they haven’t been implemented yet is that they just haven’t reached the top of anyone’s priority list.

So yes, Git is a little slower than a direct download, but no, the difference is not “significant”. At least not when compared to the real slowdowns in the current implementation.

The speed and efficiency arguments are weak. The security and durability arguments are much stronger (along with discoverability).

4 Likes

I agree.

My focus is not so much on having a search endpoint that each package registry must implement; rather, I think there needs to be a standard way to find out what packages a registry has, such that it's possible to look for a package across multiple registries. This could be done with a package registry webhook like you suggested earlier (subscribing to registry change events), a "list packages" API as @tachyonics mentioned (polling periodically for registry updates), an actual search endpoint (performing a search on each registry), etc.
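
Purely as an illustration (no such endpoint exists in the current proposal), a minimal "list packages" response could be as simple as:

{
    "packages": [
        {
            "name": "mona/LinkedList",
            "latest": "1.1.1"
        }
    ]
}

A client could poll this periodically and diff against its last snapshot to detect changes across registries.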

3 Likes