Swift Package Registry Service

There's a lot to like in this! Looks like some great steps forward.

One thing that I think @tachyonics touched on but that doesn't seem to be addressed is how this is going to affect dependency resolution. One of the problems of the current state of things is that each dependency needs to be completely cloned rather than just a specific version. The reason for this, of course, is that SwiftPM needs to be able to see every tag/release and parse the manifest for every release to resolve dependencies correctly. From how I've read it, SwiftPM would need to download a ton of ZIP archives to try and find compatible versions of dependencies? Which definitely doesn't seem like an improvement. I guess explicitly declaring that the package registry can provide either dependencies or manifests for individual versions might help, but either way I see this and a future SwiftPM proposal to integrate it having to be tightly coupled.

6 Likes

Thanks for this feedback. I agree that this could be better articulated in the proposal.

Source archives and their signatures are produced by the package registry. The signature certifies that the archive was created by the registry at a particular time. (I'm still looking at how to reasonably tie GPG signatures to the commit hash; if anyone has any ideas, I'd love to hear them.)

A signature defends against man-in-the-middle attacks. This post from the npm blog has a great write-up about a similar approach they're taking:

If an attacker has interposed a proxy between you and the registry, they can tamper with both the package JSON document that advertises the shasum and the tarball itself. This attacker could create a tarball with unexpected content, generate an integrity field for it, then construct a packument advertising this poisoned tarball. An npm client would trust the packument and therefore also trust the tarball.

The specification only asks that a registry check for a file named Package.swift. Having access to the Swift compiler seemed like a burdensome requirement to me, and since it wasn't strictly necessary for registries to function, I left that out. However, a registry MAY (and likely SHOULD) validate Package.swift, and it'd make sense to say that in the specification.
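
For registries that choose to validate, here's a minimal server-side sketch, assuming the registry has unpacked the archive and has a Swift toolchain on hand (swift package dump-package is SwiftPM's existing command for parsing a manifest into JSON):

import Foundation

/// Hypothetical registry-side check on an unpacked source archive.
func validateManifest(at packageDir: URL) throws -> Bool {
    // The only check the spec currently asks for: the file exists.
    let manifest = packageDir.appendingPathComponent("Package.swift")
    guard FileManager.default.fileExists(atPath: manifest.path) else {
        return false
    }

    // Optional stricter check: ask SwiftPM itself to parse the manifest.
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/env")
    process.arguments = ["swift", "package", "dump-package",
                         "--package-path", packageDir.path]
    try process.run()
    process.waitUntilExit()
    return process.terminationStatus == 0
}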

What should happen? :man_shrugging:
What would happen is that Swift Package Manager would have nondeterministic behavior when installing or updating packages.

For the purposes of this proposal, I don't think a registry introduces anything new, so I don't think we have to address it here. But this is something that we should consider in a wider discussion about Package.swift manifests.

No, correct — a registry only serves package releases, which is incompatible with dependency specifications on commit or branch references.

Yes, you'd need to make an API call after tagging your release. GitHub and other registries will likely make that an automatic process.

Our intention isn't to avoid a separate release process as much as a parallel release process. Our proposed solution creates a 1:1 correspondence between release and git versions. Whereas with other solutions, like RubyGems or CocoaPods, you can get into a situation where a version metadata field disagrees with a git tag.

Sorry, I'm not sure I understand your question. The concept of a package registry release is separate from any platform-specific features on GitHub.

Correct. The solution we're proposing is for SPM to download release archives to retrieve manifest files. I suspect that downloading several Zip files would still be faster than direct Git access, and I wanted to get a baseline for performance before adding any optimizations.

An earlier draft of this proposal included an endpoint for fetching Package.swift files for releases separately, but I punted on that for a few reasons:

  • I don't have a good sense of how to reconcile multiple / different tools versions. For example, if a package has Package.swift and Package@swift-4.swift, which one do I serve? And if I parameterize this call to include a tools_version parameter, what happens when the requested version isn't available (e.g. request 5.2, but have 5.1 and 5.3)? (One possible selection rule is sketched after this list.)
  • There could be better ways to optimize dependency resolution, such as serving cached JSON serializations of Package.swift or creating an endpoint for solving dependency graphs server-side.
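
For what it's worth, here's a minimal sketch of one possible answer to the tools-version question, mirroring (as I understand it) SwiftPM's own selection rule: serve the versioned manifest with the highest tools version that doesn't exceed the requested one, falling back to the plain Package.swift. The types here are hypothetical, for illustration only:

/// A tools version (e.g. 5.2), simplified to major/minor for this sketch.
struct ToolsVersion: Comparable {
    let major: Int, minor: Int

    static func < (lhs: ToolsVersion, rhs: ToolsVersion) -> Bool {
        (lhs.major, lhs.minor) < (rhs.major, rhs.minor)
    }
}

/// Given the manifests present in a release (tools version parsed from names
/// like "Package@swift-4.swift"; nil for plain "Package.swift"), pick the
/// best match for the requested tools version.
func selectManifest(
    available: [(toolsVersion: ToolsVersion?, filename: String)],
    requested: ToolsVersion
) -> String? {
    // Prefer the highest versioned manifest not exceeding `requested`;
    // e.g. a 5.2 request against {5.1, 5.3} variants is served the 5.1 file.
    let versioned = available
        .compactMap { entry in entry.toolsVersion.map { ($0, entry.filename) } }
        .filter { $0.0 <= requested }
    if let best = versioned.max(by: { $0.0 < $1.0 }) {
        return best.1
    }
    // Otherwise fall back to the unversioned Package.swift, if present.
    return available.first(where: { $0.toolsVersion == nil })?.filename
}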

There are a few different ways this could be done, and I think the follow-up proposal for SPM integration will give us a good opportunity to evaluate the best option. Any server-side optimizations would be additive changes (no breaking changes to the API).

6 Likes

If it helps, there are currently no packages known to swiftpm.co with more than 300 semantic versions and only 20 with 100 or more.

2 Likes

While the example with randomness is worth a good laugh, non‐constant does not necessarily mean non‐deterministic.

A real‐world example would be these sorts of things. (The need for some of those will be removed in 5.3 due to SE‐0273, but the web toolchain’s rejection of manifests with dynamic libraries will still be an issue.)

That would break a lot of stuff for the reasons shown above. Yes, ideally we would have all the tools we need in declarative form in the manifest. But the years have shown that we repeatedly need to resort to dynamism between the time features become available and the time the manifest API catches up to directly support them and the bugs are worked out. It happened when the generated projects began to support iOS, tvOS and watchOS. It happened when Xcode integrated SwiftPM. It happened when toolchains appeared for Android, for Windows, and then for the web. And I suspect we haven’t seen the last of it.

2 Likes

First I must say this looks really awesome and well thought-out. I think we can all agree that a Swift Package Registry Service can provide features and capabilities that SwiftPM cannot on its own, but as different organizations have different requirements for a package registry, taking a centralized approach is difficult. The idea of defining a minimal API spec that all package registries must implement yet offering flexibility for individuals to include custom APIs sounds compelling to me. This way it doesn't matter which or how many registries one uses, as long as there is a common API that SwiftPM can work with.

Besides some of the reasons you listed (e.g., security, efficiency, etc.), what are other goals that you would like to achieve with a package registry? Package discovery is important IMO:

While I agree that search is a complex problem that is best left as an implementation detail, I think we should at least have an API for it--perhaps make it optional as with the "unpublish release" API.

Speaking of unpublish/delete, have you considered deprecation instead of deletion? It's not uncommon for people to depend on exact versions and making a version unavailable would break their builds. Deprecated versions may lead to warnings in builds but at least they continue to work and give people time to fix things.

This is also interesting in that:

  • Potential issues with dependency resolution seem to be a general concern.
  • I am curious what the performance impact of downloading archives vs. direct git access would be.
  • An endpoint for fetching the manifest could be useful and deserves further discussion.

These are out of scope for this post though and are more suitable for the SwiftPM integration proposal.

Anyway, I think the proposal looks great overall and I'm eager to see how it evolves!

8 Likes

This is phenomenal; I’m very excited about this pitch and the hard work that’s gone into developing it. Thank you! It’s going to take some time to dig into all the details, but I think the overall arc of this is on the right track.

My first two questions are around package identity and unpublishing packages.

Identity

In order to have a portable package graph without registry-specific lock-in or hidden cross-registry package name/identity conflicts, we should ensure that a given package can be uniquely identified everywhere it is referenced in a package dependency graph. Today that is done with a fully-specified git host URL (albeit with some bugs and edge cases that we still need to work out, as others called out above). In this model, the “global namespace authority” is effectively outsourced to domain name ownership. This pitch’s “Namespace and package name resolution” section allows a registry to support an alternate namespace with short names like mona/LinkedList or LinkedList. While short package names are highly desirable for ergonomics, especially for e.g. Swift scripts, I don’t think Swift packages should allow dependencies to be specified in this way without some sort of globally-acceptable namespacing that isn’t controlled by one vendor-specific registry.

(Our current URL-based identity does suffer from domain lock-in already, but we should be careful not to make the problem worse).

It is possible that we could get the benefits of short names in “root” contexts without needing to solve the global namespace problem. E.g. any entity that can’t be a depended-upon node in a package dependency graph could allow use of short names; so a hypothetical Swift script could specify a registry to use and then import dependencies by short name, while a versioned package that could be a non-root node in a package dependency graph would not be allowed to do this. Even in this case, I think a registry protocol supporting short names would also need to be able to provide the canonical identity (aka git URL) of the package being referenced, to allow clients to detect additional references to the same package in their dependency graph. (I think this might be what @jechris was suggesting earlier in this thread).

I’d suggest we leave alternate namespacing out of this initial pitch, require that registries respect the canonical package URL, and discuss anything further in a seperate pitch.

Unpublishing packages

One of the things I really like about your pitch is how you use a “pull” model instead of a “push” model for publishing new package versions. This seems to sidestep the complicated questions around how you establish secure permissions & ownership for publishing package versions; with this model, anyone can notify the registry to check for new versions, and the source repository remains the source of truth for the versions available and their content.

Re: the identity questions above, this pitch doesn’t explain how short names for a new repository would get sensibly claimed under the “pull” model & avoid name-squatting. But I suppose that’s a problem which can be solved in a registry-specific manner, or can be addressed in a separate namespacing/identity pitch.

Is the intent behind the unpublish command that it will work the same way as publish: anyone could ask the registry to unpublish, but it’s really just a permissionless notification for the registry to go look and see if the repository has been deleted? I have some questions about specific challenges here (and especially about the “alternate release” redirect functionality you mention), and I wonder if this gets complicated enough that unpublish should be broken out into a separate pitch.

More topics

This is a very meaty proposal and it’s honestly a little overwhelming to try to dig into all of it at once. Might this be best broken out into smaller additive proposals (or at least separate discussion threads)? E.g. the whole topic of signatures and security deserves its own in-depth conversation. What do you think?

Thanks for your feedback, @yim_lee! I'm excited to work with you and the rest of the team on this.

I think package discovery is important, too, and I'm really excited for how registries can help with that. In that respect, it may be useful to draw a distinction between our goals for registries generally and for the registry API specification specifically.

So far, I've tried to narrowly scope this to the essential functionality for integration with SPM. There may be some more things it needs to do in order to get there, such as providing the package manifest separately from the archive (discussed more below). But beyond that, I think additional features, like search, would best be explored in separate proposals.

It's easy to add new functionality, but much harder to remove or change functionality once it's defined.

Yes, this is absolutely something we're considering. From the spec:

I originally punted on this for lack of strong conventions about how this should work. However, I'd be very happy to add this in if we can settle on a good model for deprecation.
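
As a strawman for discussion (entirely hypothetical; the spec defines no such fields today), deprecation could be extra metadata on a release entry rather than a removal:

import Foundation

/// Hypothetical shape for a release entry that supports deprecation instead
/// of deletion. Field names are placeholders, not part of the current spec.
struct Release: Codable {
    let url: URL
    let checksum: [String]           // e.g. ["sha256", "1179…"]
    let deprecated: Deprecation?     // nil for releases in good standing

    struct Deprecation: Codable {
        let since: Date              // when the release was deprecated
        let message: String?         // e.g. "1.x has a security flaw; use 2.x"
        let replacement: String?     // suggested substitute, e.g. "2.0.1"
    }
}

A client resolving to a deprecated version would still build, but could surface the message as a warning.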

For an extreme comparison, I timed cloning vs. downloading + unzipping apple/swift and found Git to take ~1 minute compared to ~10 seconds with ~500MB of transfer compared to ~30MB. But for a more realistic / practical benchmark, I plan to run this for the top 100 or so packages to get a sense of performance differences in the aggregate.

1 Like

I think this kind of analysis is interesting but also keep in mind that the cost of fetching the entire git repository during dependency resolution is somewhat amortized over time (unless the user deletes the .build directory). Downloading just the zip files will definitely be cheaper when compared to downloading git repositories but the resolver will have to download them for every resolution that might happen over time. Of course, this can be mitigated by adding local caching but then swiftpm has to manage that.

I believe an efficient way of performing dependency resolution is exposing an endpoint that returns a map of version -> hash of the package manifest(s) and another endpoint that serves the file contents given a hash. This would be a really good optimization, since manifest contents often don't change between versions and swiftpm would only need to perform a minimal number of download operations during resolution.
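
To make that concrete, here's a rough client-side sketch of the two endpoints (paths and response shapes are hypothetical, not part of the current spec):

import Foundation

/// Hypothetical response for GET /{package}/manifests: a map of
/// release version to the SHA-256 of that release's Package.swift.
struct ManifestIndex: Codable {
    let versions: [String: String]
}

/// Fetch the manifest for every version while downloading each distinct
/// manifest body only once (both endpoints are placeholders, not in the spec).
func fetchManifests(baseURL: URL, package: String) async throws -> [String: Data] {
    let session = URLSession.shared
    let indexURL = baseURL.appendingPathComponent("\(package)/manifests")
    let (indexData, _) = try await session.data(from: indexURL)
    let index = try JSONDecoder().decode(ManifestIndex.self, from: indexData)

    var bodiesByHash: [String: Data] = [:]      // de-duplicated downloads
    var manifestsByVersion: [String: Data] = [:]
    for (version, hash) in index.versions {
        if bodiesByHash[hash] == nil {
            // Hypothetical content-addressed endpoint: GET /blobs/{sha256}
            let blobURL = baseURL.appendingPathComponent("blobs/\(hash)")
            let (body, _) = try await session.data(from: blobURL)
            bodiesByHash[hash] = body
        }
        manifestsByVersion[version] = bodiesByHash[hash]
    }
    return manifestsByVersion
}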

3 Likes

Thanks so much for the kind words, Rick. I'm very much looking forward to working together on Package Manager once again.

You raise some excellent points in your reply, and I'll try my best to respond to your concerns:

My intention was for each registry to constitute its own name registry, such that any package is identified by its fully-qualified name within that registry (e.g. github.com/mona/LinkedList or mona.dev/LinkedList or coolpackages.io/github.com/mona/LinkedList).

Unfortunately, I don't think limiting acceptable namespaces in registries helps us avoid the Morning Star / Evening Star problem. As soon as folks start pointing to registries for packages, we lose the ability for the url in a Package.swift dependency specification to uniquely identify a package.

There's a good chance that we'll need to solve package identity before we're able to support non-.git URLs. For the first iteration, we may be limited to adding transparent support, such that SPM translates .git URLs to use registry endpoints when available.

I apologize if this wasn't clear in the specification. Each registry is responsible for providing its own authentication scheme. For GitHub's Swift package registry, the only people who can publish or unpublish a version are those who own the repo on GitHub.com (or have the relevant packages:write scope permissions).

If sketchypackages.io wants to name-squat a popular package name or fling the doors open to let anyone do anything without any permissions... well, there's indeed nothing stopping them from doing so. But then again, folks make a choice to use a registry, and they're unlikely to pick one that they can't trust.

Apologies for the length of this specification. It's certainly a substantial proposal, but I think a lot of its word count is an attempt to define explicit behavior for HTTP APIs, which are notoriously hard to pin down.

We're only 1 day into this thread, but things seem to be under control for now. If that changes, I certainly wouldn't be opposed to breaking out discussion for any individual topic.

Did you have any specific concerns about the security model? Or was there anything you'd like more details about?

I think we can take a lot of inspiration from what the Yarn package manager does with its offline cache and plug'n'play — especially as we start to consider first-class scripting support for Swift.

That's a great idea! It'd be the easiest thing in the world to add that field to the response for package releases:

{
    "releases": {
        "1.1.1": {
            "url": "https://swift.pkg.github.com/mona/LinkedList/1.1.1",
            "checksum": ["sha256", "1179902b126096145c8feebca4c153f81506c3d86acc45109480d36838d1445e"]
        }
    }
}

In fact, that could be a clever solution to the identity problem identified by @rballard and others.

A few questions about implementation:

  • Would the existence of Package@swift-4.swift or other tools-versions variants affect behavior in any way? (My guess would be, "No")
  • Any preference in hashing algorithm? (SHA-256?)
1 Like

Thanks for the links. This made me think: what if we actually go ahead and use a per-user cache for holding package sources using llbuild2's new file-backed CAS implementation? (We would also need a small caching layer to look up things in the CAS database, but I believe @David_M_Bryson is already working on that.) The idea would be that swiftpm will fetch the package sources directly into the CAS and read from it during dependency resolution. We might end up fetching more versions than actually needed, but they will be automatically de-duplicated and shared across all packages on the user's machine. And at the end of the resolution, the checkout too can be done from the CAS, so there is no network operation needed there. In the future, we might be able to even skip creating checkouts of the sources, as we would be able to directly read them from the CAS during the build (provided the llbuild2 + swiftpm-on-llbuild2 experiment works out).
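
To be clear about the mechanism (I'm not assuming anything about llbuild2's actual CAS API here), the core idea is just content addressing; a minimal sketch:

import Foundation
import Crypto   // swift-crypto (CryptoKit provides the same API on Apple platforms)

/// A toy content-addressable store: blobs are keyed by the SHA-256 of their
/// contents, so identical package sources are stored exactly once and shared
/// by every resolution on the machine. Assumes `root` already exists.
struct ContentAddressableStore {
    let root: URL   // e.g. ~/.swiftpm/cas (hypothetical location)

    @discardableResult
    func put(_ data: Data) throws -> String {
        let key = SHA256.hash(data: data)
            .map { String(format: "%02x", $0) }
            .joined()
        let path = root.appendingPathComponent(key)
        if !FileManager.default.fileExists(atPath: path.path) {
            try data.write(to: path)   // skip the write if the blob already exists
        }
        return key
    }

    func get(_ key: String) throws -> Data {
        try Data(contentsOf: root.appendingPathComponent(key))
    }
}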

That's an interesting idea but I think there are deeper problems with identity and name clashes that might be worth discussing separately. Some of us (cc @johannesweiss) once discussed introducing a reverse-domain identifier in the package manifest which is used as the identity. And that can also be used by the swift compiler to namespace the modules so you avoid module name clashes (+ you would have some way of disambiguating if needed). However, this is certainly not an easy task and requires a lot of work in the compiler.
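
That is, something along these lines in the manifest (entirely hypothetical; no identity parameter exists in PackageDescription today):

import PackageDescription

// Hypothetical manifest syntax; the `identity` parameter does not exist today.
let package = Package(
    name: "LinkedList",
    identity: "dev.mona.LinkedList",   // reverse-domain, globally unique,
                                       // usable for module namespacing
    targets: [
        .target(name: "LinkedList")
    ]
)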

I am starting to think that a per-user cache is a better approach and that also simplifies the spec. However, if we do end up using this approach I would expect that the server returns names and hashes of all package manifests present in the package. Using SHA-256 for content hashing makes sense.

5 Likes

NPM’s proposal is a very interesting link, thanks. What I think neither you nor NPM has done is explain exactly what attacker is being defeated here. “If an attacker has interposed a proxy between me and the registry” raises some interesting questions. Given that this API runs entirely over HTTPS, how are attackers supposed to do that? If an attacker is capable of achieving that privileged network position, how are you handling key distribution to avoid them simply intercepting and delivering their own key?

Package signing is not a priori unreasonable, but doing so without a clear idea of what attacker you’re worried about is. Additionally, explaining how the registry is distributing and updating keys is also vital. How are keys updated? Can keys be revoked? How do the answers to these questions affect the threat model?

I’d really like to see this explored much more deeply. Right now the document assumes that package signing has value over-and-above HTTPS without explaining what that value is, why HTTPS is not providing it, and what the intended usage model is. I’m very nervous about adding cryptographic features simply because we can without this kind of justification.

3 Likes

HTTPS isn't a panacea and neither is PGP signing, but working together they improve the overall security of the system. That's the philosophy of "defense in depth". While that may seem unnecessary, consider that what we're sending — packages — contains executable code, which deserves the highest level of scrutiny.

There are at least a few different ways that attackers can work around TLS / HTTPS. For example, developers sometimes install trusted root certificates to their system so that they can do things like inspect network traffic. By design, those can be used to undermine transport-level security, and can be exploited as an attack vector.

Or forget HTTPS for a moment. Consider what @Aciid is proposing with a local package cache: An attacker could swap out real packages with malicious forgeries by way of some privilege escalation on the filesystem. Keeping detached signatures for all of those cached packages would be a good way to prevent that from happening.
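
Concretely, checking a cached archive against its detached signature could be as simple as shelling out to gpg; a sketch with illustrative paths, using the standard gpg --verify invocation:

import Foundation

/// Verify a cached package archive against its detached PGP signature,
/// using the standard gpg --verify <signature> <file> invocation.
/// Returns true only if gpg verifies the signature with a trusted key.
func verifyDetachedSignature(archive: URL, signature: URL) throws -> Bool {
    let gpg = Process()
    gpg.executableURL = URL(fileURLWithPath: "/usr/bin/env")
    gpg.arguments = ["gpg", "--verify", signature.path, archive.path]
    try gpg.run()
    gpg.waitUntilExit()
    return gpg.terminationStatus == 0
}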

PGP has robust processes and infrastructure for issuing, sharing, and revoking keys, which are described in documentation. For the purposes of the specification, it should be sufficient to link to PGP as a standard, much like we do for HTTPS / TLS. But I'm exploring ways to strike the right balance to provide enough context for those references.

1 Like

Please don’t mistake what I’m saying here: I’m not saying “don’t sign packages”. I’m saying that the pitch should clearly explain how package signing works, from beginning to end, with a clear description of what attacks it prevents or mitigates. This would need to include an assessment of why HTTPS isn’t valuable.

As an example of why I’m proposing this, consider your last paragraph:

If the attacker has privilege escalation to user level privilege, they are exactly as privileged as users are. Presumably you’re allowing users to control which keys they trust: in that case, an attacker can simply add their own key to the trusted chain. Additionally, if the attacker has privilege escalation they already have privilege escalation. They can just write any other binary on disk. While macOS has some mitigations against this kind of attack, an attacker who has already achieved privilege escalation is exceedingly hard to defend against, and package signing is unlikely to save you.

Again, I must stress that I’m not saying package signing doesn’t have value. I am saying that it is incumbent upon this pitch to clearly address what attack scenarios are mitigated and how. Adding cryptography to a protocol should not be done simply because it’s nice to have, it should be done with clear and reasoned intent. I’m just asking for the pitch to show its working.

Sorry, I don’t think I posed the question clearly enough, let me rephrase: how does the package manager interact with the PGP ecosystem to manage keys and do verification?

There are lots of possible answers here. Let’s outline some:

  1. The package manager uses any existing gpg installation on the box to manage verification. It does not download signatures or attempt to update them in any way, it just attempts to validate. Missing keys are not errors, and pass silently.
  2. As (1) but missing keys do not pass silently, the package manager emits warnings.
  3. As (1) but missing keys are errors. Users are required to perform out-of-band steps to obtain those keys.
  4. As (2), but the package manager will attempt to download any key from SKS and then present it to the user asking them to validate.
  5. The package manager ships with a known-trusted key.
  6. The package manager can be “configured” with a registry which includes some out-of-band system to communicate a trusted key at setup.
  7. The package manager does nothing, users are required to take manual steps to perform verification.
  8. This pitch declares this out of scope: it says that registries must sign, but places no rules upon the package manager about what to do with this information.

Each of these is very different! They provide different levels of defense, they have different weaknesses and strengths, they trade off availability and ease of use in different ways. IMO this pitch should address what, if anything, it requires from a) package managers and b) registries.

2 Likes

We're in agreement here. I've already made changes to my draft of the proposal locally that expand on what package signing does and how it works, and I'm continuing to refine that to be responsive to feedback from you and others. Given that this is a key feature of the registries, it deserves a more satisfactory explanation of how it works.

That's an excellent question, and one that I look forward to addressing in the follow-up proposal for how Swift Package Manager integrates with package registries. And please don't read that as brushing off your concerns — I agree completely that these details matter, and sincerely appreciate your raising these points.

A specification has to balance several competing interests: specificity vs. flexibility, brevity vs. exhaustivity, the needs of providers and consumers. Being too specific about implementation details risks invalidating equally viable (or even better) alternatives.

I agree that more can be done to articulate the motivation and value of the security features introduced by this proposal, but I want to make sure we're doing that in a way that doesn't micromanage implementations.

2 Likes

First off, thank you @mattt for the proposal! Apologies if this question is already answered, but how would this proposal expect to solve authentication of clients to the registry for "pulling" packages? It seems like the authentication scheme used by each registry can be different, yet it's not covered how it would be expected to be used. I am thinking along the lines of enterprises that use a package registry of some sort already for custom/private packages and who may not have an SSH-level authentication pattern or host on GitHub.

Packages on GitHub will work great because the github.com domain is in the public DNS and authentication can be done with SSH keys with the right URL scheme, but when using HTTPS for private packages, what is the expectation for how the client (Package.swift) specifies how to authenticate to the repository? Should a "registries" list be supported in the Package.swift file so that the client (SPM) can use the information provided when resolving packages? This would be similar to how npm, Ruby, and Rust all work today.
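
For illustration, I mean something along these lines (entirely hypothetical; none of these types exist in PackageDescription or SwiftPM today), with the actual secrets resolved out-of-band rather than committed to the manifest:

// Entirely hypothetical; none of these types exist in PackageDescription today.
struct Registry {
    enum Credential { case environment(String), keychain(String) }
    enum Authentication {
        case httpHeader(name: String, credential: Credential)  // e.g. X-JFrog-Art-Api
        case basic(credential: Credential)
        case bearerToken(credential: Credential)
    }
    let url: String
    let authentication: Authentication
}

// Registries declared alongside (not inside) the dependency declarations.
let registries = [
    Registry(url: "https://artifactory.example.com/api/swift/swift-local",
             authentication: .httpHeader(name: "X-JFrog-Art-Api",
                                         credential: .environment("ARTIFACTORY_API_KEY"))),
    Registry(url: "https://swift.pkg.github.com",
             authentication: .bearerToken(credential: .keychain("github-registry"))),
]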

When I initially read through this proposal my mind immediately went to how JFrog would be an adopter of this and how it will benefit the enterprise-based companies I work with. JFrog has a "universal" repository solution, meaning they support a large number of repository types (npm, Ruby, Rust, Maven, Docker, etc.) which are then secured through enterprise-level authentication (SAML, OpenID, etc.). When a client registers the repository they need to specify the URL, username, and an API token. When the client (resolver) searches for a package at the URL it will use the token and authentication scheme to authenticate requests. The problem with authentication is that not everyone uses the same style, but to name a few of the most common (a sketch of how these look on the wire follows the list):

  1. Custom HTTP Header + API Key (in case of JFrog it's X-JFrog-Art-Api)
  2. SSH based (GitHub)
  3. HTTP basic auth
  4. Some form of Access Token based (using http header Authorization: Bearer <value>)
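
import Foundation

/// Attach registry credentials to an outgoing request. The cases correspond
/// to items 1, 3, and 4 above; SSH (item 2) never rides over HTTP. The enum
/// is hypothetical, but the header plumbing is real Foundation API.
enum RegistryAuth {
    case customHeader(name: String, key: String)   // e.g. X-JFrog-Art-Api
    case basic(user: String, password: String)
    case bearer(token: String)
}

func authorize(_ request: inout URLRequest, with auth: RegistryAuth) {
    switch auth {
    case .customHeader(let name, let key):
        request.setValue(key, forHTTPHeaderField: name)
    case .basic(let user, let password):
        let credentials = Data("\(user):\(password)".utf8).base64EncodedString()
        request.setValue("Basic \(credentials)", forHTTPHeaderField: "Authorization")
    case .bearer(let token):
        request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
    }
}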

The point / question is that if a registry like you described is expected to be widely adopted by enterprises (which is where I think there is a really strong need for this proposal), then registry registration in the Package.swift needs to consider how to support a generic authentication model.

There are more features that an enterprise would need, but support for customization is the most important to get started (MVP). Thanks!

1 Like

This isn't related to this proposal but rather to SwiftPM integration, so I didn't comment on it before, but I think separating the description of dependencies from the location of the registry (and using something equivalent to npm's config set registry) is more flexible.

2 Likes

Thanks for sharing this perspective, @kylep! I found JFrog's approach to artifact management to be quite helpful in shaping my own understanding of this problem space, and believe that the proposed solution is compatible with the use case you're describing.

In the proposal, I leave the matter of authentication up to the server:

While GitHub.com will likely use the same OAuth 2 system they use throughout their API, an enterprise system (like GitHub Enterprise) may use something like SAML or client certificate authentication over an internal network. For JFrog, it may well be that X-JFrog-Art-Api key you alluded to.

Those permissions determine who can access, publish, or unpublish a package release. As for how a registry accesses the code itself to publish or serve a release, that's another detail left to the server. The API call to publish a package includes an optional URL parameter, which you can use to specify where code is hosted. That could include a hard-coded credential or rely on SSH-based authentication between the registry and the code host. Again, something left to the server.

The question of how a client (Swift Package Manager) would interact with a registry like JFrog is something still to be determined in a future proposal. I'll be sure to keep this in mind as we figure out how that will work. A few other folks on this thread (including @tachyonics' post above) have also proposed registry lists, so that's a strategy we'll definitely consider.

Thanks @mattt, I agree this proposal is big. It sounds like in its current form you are just proposing the server side of the registry for publishing and not how the client (Swift Package Manager) interacts with it. This server or service would cover the searching, publishing, and unpublishing side and doesn't describe a "publishing" client, which is fine and I agree with (this is how JFrog structures it).

I agree as well; I have learned a lot about this space lately as I was trying to find a similar solution to support secured Carthage binary dependencies. I was able to come up with a quite clever solution that leverages JFrog's generic repository pattern and REST API.

It seems as though the final solution is going to have multiple parts, right?

  1. Server Registry specification (this proposal)
  2. Client side (Swift Package Manager) read model for the registry in #1
  3. (Optional) Client side (Swift Package Manager) write model for the registry in #1

Makes sense. However it means we'd also have to change the url in our Package.swift when using branches/hashes:

- .package(url: "https://swift.pkg.github.com/apple/swift-argument-parser", from: "0.0.1")
+ .package(url: "https://github.com/apple/swift-argument-parser.git", .branch("test"))

Which feels weird to me as an end-user. Besides, from my understanding of the proposal a package is not tied to one repository url (as we can tag with the url parameter), so it might not always be obvious to the end-user what the repository url should be.

My question was: can we easily determine the source code a release is attached to? As it seems we can release anything (a tag, a branch, a commit), having just the repository url and the tag doesn't seem enough when releasing a branch or a commit.