Swift Package Registry Service

theoriginalbit · June 5, 2020, 3:18am

Could this be something solved with adding registries to your machine in a similar way you can add git remotes? Something to the effect of swift package source add https://github.com/[org]/packages or swift package source add https://some-package-registry.com. Then swift package search would only need to discover through the registered registries.
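To make the idea concrete, here is a minimal sketch of a per-machine list of registries that a search could iterate over. The `RegistrySources` type, its methods, and the `lookup` closure are all hypothetical; neither `swift package source add` nor `swift package search` exists in this form, and a real implementation would presumably persist the list somewhere.

```swift
import Foundation

/// Hypothetical store of registries a user has added, similar to a list of git remotes.
struct RegistrySources {
    private(set) var urls: [URL] = []

    /// Adds a registry, ignoring duplicates.
    /// Returns false if the string is not an absolute http(s) URL with a host.
    @discardableResult
    mutating func add(_ string: String) -> Bool {
        guard let url = URL(string: string),
              let scheme = url.scheme, ["http", "https"].contains(scheme),
              url.host != nil else { return false }
        if !urls.contains(url) { urls.append(url) }
        return true
    }

    /// Runs a lookup against each registered registry, in order, and collects the hits.
    /// `lookup` stands in for whatever search API a registry might offer.
    func search(_ term: String, lookup: (URL, String) -> [String]) -> [String] {
        urls.flatMap { lookup($0, term) }
    }
}
```

With something like this, discovery only ever touches the registries the user has explicitly added.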

yeradis · June 5, 2020, 5:48am

I think that registry entries should be declared inside the package itself, so that new people, or even ourselves in the future, will not need to remember which registry is missing in order to find packages to use.

I'm mainly thinking of when you work for different organizations, when you set up a new PC, when team members join, etc.

lukasa · June 5, 2020, 6:41am

This proposal generally looks really good! While I have a few nits here and there I don’t think it’s really worth diving into them: they mostly derive from having applied what is clearly a more-general framework to a specific case. None of these nits make the proposal meaningfully worse, they just provide functionality that’s of minimal value (e.g. creating releases from custom-formatted tags).

Otherwise, this looks great. It’s straightforward to implement, allowing alternative interesting registry designs. It’s easy to understand. And it’s limited in scope. There’s a lot to like there!

There are some concerns around package deduplication that we really do have to address, but as others upthread have noted, these are a more fundamental problem for SwiftPM anyway.

One final note: the specification talks at length about how to retrieve digital signatures, but does not discuss who created them. I’d like to see clarity on what exactly the threat model is for these signatures: what attacks are they intended to defend against, and how?

cukr · June 5, 2020, 9:08am

Would the registry try to parse the Package.swift file (for example to check which dependencies it has for security auditing), or just check that it exists, and accept it even if it's empty or filled with garbage?

If it will parse it, what should happen when the dependency list is not constant?

.target(
        name: "foo",
        dependencies: [Bool.random() ? "A" : "B"]
)

jechris · June 5, 2020, 9:49am

That's a very exciting proposal, thank you!

There are one or two things I'm not sure I understood completely:

How can we fetch a package at a specific branch/commit? Would we need to fall back to its repository URL?
How does PUT work? Taking GitHub as an example, do we need to make our tag on the repository and then call PUT /mona/LinkedList/1.1.1?tag=v1.1.1? Is it something that's going to be automatic with platforms like GitHub? Otherwise I don't really see how it avoids the separate release process.
In {/namespace*}/{package}/{version} shouldn't we have a URL pointing to the release resource (https://github.com/mona/LinkedList/releases/tag/v1.1.1 or maybe the hash itself)? It seems to me that otherwise we will lose information about what the release refers to.

0xTim · June 5, 2020, 12:16pm

There's a lot to like in this! Looks like some great steps forward.

One thing that I think @tachyonics touched on but that doesn't seem to be addressed is how this is going to affect dependency resolution. One of the problems of the current state of things is that each dependency needs to be completely cloned rather than just a specific version. The reason for this of course is that SwiftPM needs to be able to see every tag/release and parse the manifest for every release to resolve dependencies correctly. From how I've read it, SwiftPM would need to download a ton of ZIP archives to try and find compatible versions of dependencies? Which definitely doesn't seem like an improvement. I guess explicitly declaring that the package registry can provide either dependencies or manifest for individual versions might help, but either way I see this and a future SwiftPM proposal to integrate this having to be tightly coupled.

mattt · June 5, 2020, 1:36pm

Thanks for this feedback. I agree that this could be better articulated in the proposal.

Source archives and their signatures are produced by the package registry. The signature certifies that the archive was created by the registry at a particular time. (I'm still looking at how to reasonably tie GPG signatures to the commit hash; if anyone has any ideas, I'd love to hear them.)

A signature defends against man-in-the-middle attacks. This post from the npm blog has a great write-up about a similar approach they're taking:

If an attacker has interposed a proxy between you and the registry, they can tamper with both the package JSON document that advertises the shasum and the tarball itself. This attacker could create a tarball with unexpected content, generate an integrity field for it, then construct a packument advertising this poisoned tarball. An npm client would trust the packument and therefore also trust the tarball.

The specification only asks that a registry check for a file named Package.swift. Having access to the Swift compiler seemed like a burdensome requirement to me, and since it wasn't strictly necessary for registries to function, I left that out. However, a registry MAY (and likely SHOULD) validate Package.swift, and it'd make sense to say that in the specification.

What should happen?
What would happen is that Swift Package Manager would have nondeterministic behavior when installing or updating packages.

For the purposes of this proposal, I don't think a registry introduces anything new, so I don't think we have to address it here. But this is something that we should consider in a wider discussion about Package.swift manifests.

No, correct — a registry only serves package releases, which is incompatible with dependency specifications on commit or branch references.

Yes, you'd need to make an API call after tagging your release. GitHub and other registries will likely make that an automatic process.
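As a rough illustration of that API call, the sketch below builds the URL for a request like PUT /mona/LinkedList/1.1.1?tag=v1.1.1. The path and query come from the question above; the `publishURL` helper and the idea of passing a registry base URL are assumptions, and authentication is left out entirely.

```swift
import Foundation

/// Builds the URL for a publish request such as
/// PUT /mona/LinkedList/1.1.1?tag=v1.1.1 (path and query taken from the thread).
/// `registry` is an assumed base URL; a registry may lay this out differently.
func publishURL(registry: String, namespace: String, package: String,
                version: String, tag: String) -> URL? {
    guard var components = URLComponents(string: registry) else { return nil }
    components.path = "/\(namespace)/\(package)/\(version)"
    components.queryItems = [URLQueryItem(name: "tag", value: tag)]
    return components.url
}
```

A release script or CI step could call this right after pushing the tag, then send the request with whatever HTTP client and credentials the registry requires.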

Our intention isn't to avoid a separate release process as much as a parallel release process. Our proposed solution creates a 1:1 correspondence between release and git versions. With other solutions, like RubyGems or CocoaPods, you can get into a situation where a version metadata field disagrees with a git tag.

Sorry, I'm not sure I understand your question. The concept of a package registry release is separate from any platform-specific features on GitHub.

Correct. The solution we're proposing is for SPM to download release archives to retrieve manifest files. I suspect that downloading several Zip files would still be faster than direct Git access, and I wanted to get a baseline for performance before adding any optimizations.

An earlier draft of this proposal included an endpoint for fetching Package.swift files for releases separately, but I punted on that for a few reasons:

I don't have a good sense of how to reconcile multiple / different tools versions. For example, if a package has Package.swift and Package@swift-4.swift, which one do I serve? And if I parameterize this call to include a tools_version parameter, what happens when the requested version isn't available (e.g. request 5.2, but have 5.1 and 5.3)?
There could be better ways to optimize dependency resolution, such as serving cached JSON serializations of Package.swift or creating an endpoint for solving dependency graphs server-side.

There are a few different ways this could be done, and I think the follow-up proposal for SPM integration will give us a good opportunity to evaluate the best option. Any server-side optimizations would be additive changes (no breaking changes to the API).
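To illustrate the tools-version question, here is a sketch of one possible policy: serve the newest manifest whose tools version does not exceed the requested one, and report nothing when none qualifies. The `ToolsVersion` type and `manifestToServe` function are hypothetical, and this is only one candidate answer, not something the proposal specifies.

```swift
/// A minimal tools version (major.minor only), for illustration.
struct ToolsVersion: Comparable, Hashable {
    let major: Int
    let minor: Int

    static func < (lhs: ToolsVersion, rhs: ToolsVersion) -> Bool {
        (lhs.major, lhs.minor) < (rhs.major, rhs.minor)
    }
}

/// One possible policy: pick the highest available version that is <= the requested one.
/// Returns nil when every available manifest is newer than the request.
func manifestToServe(requested: ToolsVersion, available: [ToolsVersion]) -> ToolsVersion? {
    available.filter { $0 <= requested }.max()
}
```

Under this policy, a request for 5.2 against manifests for 5.1 and 5.3 would get 5.1, and a request for 5.0 would get nothing, which the registry would have to report as an error of some kind.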

finestructure · June 5, 2020, 4:10pm

If it helps, there are currently no packages known to swiftpm.co with more than 300 semantic versions and only 20 with 100 or more.

SDGGiesbrecht · June 5, 2020, 7:19pm

mattt:

cukr:
If it will parse it, what should happen when the dependency list is not constant?
.target(
  name: "foo",
  dependencies: [Bool.random() ? "A" : "B"]
)
What should happen?
What would happen is that Swift Package Manager would have nondeterministic behavior when installing or updating packages.

While the example with randomness is worth a good laugh, non‐constant does not necessarily mean non‐deterministic.

A real‐world example would be these sorts of things. (The need for some of those will be removed in 5.3 due to SE‐0273, but the web toolchain’s rejection of manifests with dynamic libraries will still be an issue.)

That would break a lot of stuff for the reasons shown above. Yes, ideally we would have all the tools we need in declarative form in the manifest. But the years have shown that we repeatedly need to resort to dynamism between the time features become available and the time the manifest API catches up to directly support them and the bugs are worked out. It happened when the generated projects began to support iOS, tvOS and watchOS. It happened when Xcode integrated SwiftPM. It happened when toolchains appeared for Android, for Windows, and then for the web. And I suspect we haven’t seen the last of it.
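A small sketch of what "non-constant but deterministic" can mean: the dependency list below varies with the environment, yet the same environment always yields the same list. It is written as a plain function so it runs outside a manifest; the `TARGETING_WEB` variable and the dependency names are hypothetical.

```swift
/// Returns the dependency names a manifest might declare for a given environment.
/// `TARGETING_WEB`, "Core", and "DynamicHelpers" are made-up names for illustration.
func dependencyNames(environment: [String: String]) -> [String] {
    var names = ["Core"]
    // Leave out a dependency that the (hypothetical) web toolchain cannot build.
    if environment["TARGETING_WEB"] != "true" {
        names.append("DynamicHelpers")
    }
    return names
}
```

In a real manifest the input would typically come from `ProcessInfo.processInfo.environment`, which is why a registry that evaluates Package.swift once, in its own environment, could record a different dependency list than the one a client sees.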

yim_lee · June 5, 2020, 9:29pm

First I must say this looks really awesome and well thought-out. I think we can all agree that a Swift Package Registry Service can provide features and capabilities that SwiftPM cannot on its own, but as different organizations have different requirements for a package registry, taking a centralized approach is difficult. The idea of defining a minimal API spec that all package registries must implement yet offering flexibility for individuals to include custom APIs sounds compelling to me. This way it doesn't matter which or how many registries one uses, as long as there is a common API that SwiftPM can work with.

Besides some of the reasons you listed (e.g., security, efficiency, etc.), what are other goals that you would like to achieve with a package registry? Package discovery is important IMO:

While I agree that search is a complex problem that is best left as an implementation detail, I think we should at least have an API for it--perhaps make it optional as with the "unpublish release" API.

Speaking of unpublish/delete, have you considered deprecation instead of deletion? It's not uncommon for people to depend on exact versions and making a version unavailable would break their builds. Deprecated versions may lead to warnings in builds but at least they continue to work and give people time to fix things.

mattt:

0xTim:

One thing that I think @tachyonics touched on but that doesn't seem to be addressed is how this is going to affect dependency resolution. One of the problems of the current state of things is that each dependency needs to be completely cloned rather than just a specific version. The reason for this of course is that SwiftPM needs to be able to see every tag/release and parse the manifest for every release to resolve dependencies correctly. From how I've read it, SwiftPM would need to download a ton of ZIP archives to try and find compatible versions of dependencies?

Correct. The solution we're proposing is for SPM to download release archives to retrieve manifest files. I suspect that downloading several Zip files would still be faster than direct Git access, and I wanted to get a baseline for performance before adding any optimizations.

An earlier draft of this proposal included an endpoint for fetching Package.swift files for releases separately, but I punted on that for a few reasons:

I don't have a good sense of how to reconcile multiple / different tools versions. For example, if a package has Package.swift and Package@swift-4.swift, which one do I serve? And if I parameterize this call to include a tools_version parameter, what happens when the requested version isn't available (e.g. request 5.2, but have 5.1 and 5.3)?

There could be better ways to optimize dependency resolution, such as serving cached JSON serializations of Package.swift or creating an endpoint for solving dependency graphs server-side.

There are a few different ways this could be done, and I think the follow-up proposal for SPM integration will give us a good opportunity to evaluate the best option. Any server-side optimizations would be additive changes (no breaking changes to the API).

This is also interesting in that:

Potential issues with dependency resolution seem to be a general concern.
I am curious what the performance impact of downloading archives vs. direct git access would be.
An endpoint for fetching the manifest could be useful and deserves further discussion.

These are out of scope for this post though and are more suitable for the SwiftPM integration proposal.

Anyway, I think the proposal looks great overall and I'm eager to see how it evolves!

rballard · June 5, 2020, 9:38pm

This is phenomenal; I’m very excited about this pitch and the hard work that’s gone into developing it. Thank you! It’s going to take some time to dig into all the details, but I think the overall arc of this is on the right track.

My first two questions are around package identity and unpublishing packages.

Identity

In order to have a portable package graph without registry-specific lock-in or hidden cross-registry package name/identity conflicts, we should ensure that a given package can be uniquely identified everywhere it is referenced in a package dependency graph. Today that is done with a fully-specified git host URL (albeit with some bugs and edge cases that we still need to work out, as others called out above). In this model, the “global namespace authority” is effectively outsourced to domain name ownership. This pitch’s “Namespace and package name resolution” section allows a registry to support an alternate namespace with short names like mona/LinkedList or LinkedList. While short package names are highly desirable for ergonomics, especially for e.g. Swift scripts, I don’t think Swift packages should allow dependencies to be specified in this way without some sort of globally-acceptable namespacing that isn’t controlled by one vendor-specific registry.

(Our current URL-based identity does suffer from domain lock-in already, but we should be careful not to make the problem worse).

It is possible that we could get the benefits of short names in “root” contexts without needing to solve the global namespace problem. E.g. any entity that can’t be a depended-upon node in a package dependency graph could allow use of short names; so a hypothetical Swift script could specify a registry to use and then import dependencies by short name, while a versioned package that could be a non-root node in a package dependency graph would not be allowed to do this. Even in this case, I think a registry protocol supporting short names would also need to be able to provide the canonical identity (aka git URL) of the package being referenced, to allow clients to detect additional references to the same package in their dependency graph. (I think this might be what @jechris was suggesting earlier in this thread).

I’d suggest we leave alternate namespacing out of this initial pitch, require that registries respect the canonical package URL, and discuss anything further in a separate pitch.

Unpublishing packages

One of the things I really like about your pitch is how you use a “pull” model instead of a “push” model for publishing new package versions. This seems to sidestep the complicated questions around how you establish secure permissions & ownership for publishing package versions; with this model, anyone can notify the registry to check for new versions, and the source repository remains the source of truth for the versions available and their content.

Re: the identity questions above, this pitch doesn’t explain how short names for new repositories would get sensibly claimed under the “pull” model & avoid namesquatting. But I suppose that’s a problem which can be solved in a registry-specific manner, or can be addressed in a separate namespacing/identity pitch.

Is the intent behind the unpublish command that it will work the same way as publish: anyone could ask the registry to unpublish, but it’s really just a permissionless notification for the registry to go look and see if the repository has been deleted? I have some questions about specific challenges here (and especially about the “alternate release” redirect functionality you mention), and I wonder if this gets complicated enough that unpublish should be broken out into a separate pitch.

More topics

This is a very meaty proposal and it’s honestly a little overwhelming to try to dig into all of it at once. Might this be best broken out into smaller additive proposals (or at least separate discussion threads)? E.g. the whole topic of signatures and security deserves its own in-depth conversation. What do you think?

mattt · June 5, 2020, 10:11pm

Thanks for your feedback, @yim_lee! I'm excited to work with you and the rest of the team on this.

I think package discovery is important, too, and I'm really excited for how registries can help with that. In that respect, it may be helpful to draw a distinction between our goals for registries generally and for the registry API specification specifically.

So far, I've tried to narrowly scope this to the essential functionality for integration with SPM. There may be some more things it needs to do in order to get there, such as providing the package manifest separately from the archive (discussed more below). But beyond that, I think additional features, like search, would best be explored in separate proposals.

It's easy to add new functionality, but much harder to remove or change functionality once it's defined.

Yes, this is absolutely something we're considering. From the spec:

I originally punted on this for lack of strong conventions about how this should work. However, I'd be very happy to add this in if we can settle on a good model for deprecation.

For an extreme comparison, I timed cloning vs. downloading + unzipping apple/swift and found Git to take ~1 minute compared to ~10 seconds with ~500MB of transfer compared to ~30MB. But for a more realistic / practical benchmark, I plan to run this for the top 100 or so packages to get a sense of performance differences in the aggregate.

Aciid · June 5, 2020, 10:25pm

I think this kind of analysis is interesting but also keep in mind that the cost of fetching the entire git repository during dependency resolution is somewhat amortized over time (unless the user deletes the .build directory). Downloading just the zip files will definitely be cheaper when compared to downloading git repositories but the resolver will have to download them for every resolution that might happen over time. Of course, this can be mitigated by adding local caching but then swiftpm has to manage that.

I believe an efficient way of performing dependency resolution is exposing an endpoint that returns a map of version -> hash of the package manifest(s) and another endpoint that serves the file contents given a hash. This will be a really good optimization since manifest contents often don't change between versions and swiftpm only needs to perform a minimal number of download operations during resolution.
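Here is a minimal sketch of how a client might use those two endpoints. The `index` dictionary stands in for the version -> hash response and the `fetch` closure stands in for the contents-by-hash endpoint; both shapes, and the `resolveManifests` name, are assumptions for illustration.

```swift
/// `index` maps each version to the hash of its manifest;
/// `fetch` returns the manifest contents for a given hash.
/// Both are hypothetical stand-ins for the two endpoints described above.
func resolveManifests(index: [String: String],
                      fetch: (String) -> String) -> [String: String] {
    var contentsByHash: [String: String] = [:]
    var manifestsByVersion: [String: String] = [:]
    for (version, hash) in index {
        if contentsByHash[hash] == nil {
            // Only one download per distinct manifest, however many versions share it.
            contentsByHash[hash] = fetch(hash)
        }
        manifestsByVersion[version] = contentsByHash[hash]
    }
    return manifestsByVersion
}
```

If three versions share two distinct manifests, `fetch` runs twice rather than three times, and the saving grows with the number of releases whose manifest did not change.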

mattt · June 5, 2020, 10:45pm

Thanks so much for the kind words, Rick. I'm very much looking forward to working together on Package Manager once again.

You raise some excellent points in your reply, and I'll try my best to respond to your concerns:

My intention was for each registry to constitute its own name registry, such that any package is identified by its fully-qualified name within that registry (e.g. github.com/mona/LinkedList or mona.dev/LinkedList or coolpackages.io/github.com/mona/LinkedList).

Unfortunately, I don't think limiting acceptable namespaces in registries helps us avoid the Morning Star / Evening Star problem. As soon as folks start pointing to registries for packages, we lose the ability for the url in a Package.swift dependency specification to uniquely identify a package.

There's a good chance that we'll need to solve package identity before we're able to support non-.git URLs. For the first iteration, we may be limited to adding transparent support, such that SPM translates .git URLs to use registry endpoints when available.
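A sketch of what that transparent translation could look like, assuming the client has a table from git hosts to registry hosts. The `registryURL(forGitURL:registryHosts:)` function and the table are hypothetical; the only grounded detail is the swift.pkg.github.com host that appears elsewhere in this thread.

```swift
import Foundation

/// Maps a git dependency URL to a registry URL when a registry is known for its host.
/// Returns nil when no registry applies, in which case a client would keep using git.
/// `registryHosts` is a hypothetical lookup table, e.g. ["github.com": "swift.pkg.github.com"].
func registryURL(forGitURL gitURL: String, registryHosts: [String: String]) -> URL? {
    guard let components = URLComponents(string: gitURL),
          let host = components.host,
          let registryHost = registryHosts[host] else { return nil }

    // Drop a trailing ".git" so the path matches the registry's package path.
    var path = components.path
    if path.hasSuffix(".git") { path.removeLast(4) }

    var result = URLComponents()
    result.scheme = "https"
    result.host = registryHost
    result.path = path
    return result.url
}
```

Because the manifest still names the .git URL, package identity stays what it is today and the registry only changes where the bytes come from.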

I apologize if this wasn't clear in the specification. Each registry is responsible for providing its own authentication scheme. For GitHub's Swift package registry, the only people who can publish or unpublish a version are those who own the repo on GitHub.com (or have the relevant packages:write scope permissions).

If sketchypackages.io wants to name-squat a popular package name or fling the doors open to let anyone do anything without any permissions... well, there's indeed nothing stopping them from doing so. But then again, folks make a choice to use a registry, and they're unlikely to pick one that they can't trust.

Apologies for the length of this specification. It's certainly a substantial proposal, but I think a lot of its word count is an attempt to define explicit behavior for HTTP APIs, which are notoriously hard to pin down.

We're only 1 day into this thread, but things seem to be under control for now. If that changes, I certainly wouldn't be opposed to breaking out discussion for any individual topic.

Did you have any specific concerns about the security model? Or was there anything you'd like more details about?

mattt · June 5, 2020, 11:01pm

I think we can take a lot of inspiration from what the Yarn package manager does with its offline cache and plug'n'play — especially as we start to consider first-class scripting support for Swift.

That's a great idea! It'd be the easiest thing in the world to add that field to the response for package releases:

{
    "releases": {
        "1.1.1": {
            "url": "https://swift.pkg.github.com/mona/LinkedList/1.1.1",
            "checksum": ["sha256", "1179902b126096145c8feebca4c153f81506c3d86acc45109480d36838d1445e"]
        }
    }
}

In fact, that could be a clever solution to the identity problem identified by @rballard and others.

A few questions about implementation:

Would the existence of Package@swift-4.swift or other tools-versions variants affect behavior in any way? (My guess would be, "No")
Any preference in hashing algorithm? (SHA-256?)
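For reference, a client could decode a response shaped like the example above with `Codable`. The field names come from that JSON; the `ReleaseListing` type and `decodeListing` helper are hypothetical, and the checksum is read as a two-element array of algorithm name and hex digest, which is an interpretation of the example rather than a documented format.

```swift
import Foundation

/// Mirrors the example response above; field names come from that JSON.
struct ReleaseListing: Decodable {
    struct Release: Decodable {
        let url: URL
        /// Assumed layout: algorithm name first, then the hex digest.
        let checksum: [String]
    }
    let releases: [String: Release]
}

let exampleResponse = """
{
    "releases": {
        "1.1.1": {
            "url": "https://swift.pkg.github.com/mona/LinkedList/1.1.1",
            "checksum": ["sha256", "1179902b126096145c8feebca4c153f81506c3d86acc45109480d36838d1445e"]
        }
    }
}
"""

/// Decodes a release listing from JSON text.
func decodeListing(_ text: String) throws -> ReleaseListing {
    try JSONDecoder().decode(ReleaseListing.self, from: Data(text.utf8))
}
```

Keying releases by version string keeps the response easy to extend with further per-release fields later.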

Aciid · June 6, 2020, 1:57am

Thanks for the links. This made me think what if we actually go ahead and use a per-user cache for holding package sources using llbuild2's new file-backed CAS implementation (we would also need a small caching layer to look up things in the CAS database but I believe @David_M_Bryson is already working on that). The idea would be that swiftpm will fetch the package sources directly into the CAS and read from it during the dependency resolution. We might end up fetching more versions than actually needed but they will be automatically de-duplicated and shared across all packages on the user's machine. And at the end of the resolution, the checkout too can be done from the CAS so there is no network operation needed there. In the future, we might be able to even skip creating checkouts of the sources as we would be able to directly read them from CAS during the build (provided llbuild2 + swiftpm-on-llbuild2 experiment works out).

That's an interesting idea but I think there are deeper problems with identity and name clashes that might be worth discussing separately. Some of us (cc @johannesweiss) once discussed introducing a reverse-domain identifier in the package manifest which is used as the identity. And that can also be used by the swift compiler to namespace the modules so you avoid module name clashes (+ you would have some way of disambiguating if needed). However, this is certainly not an easy task and requires a lot of work in the compiler.

I am starting to think that a per-user cache is a better approach and that also simplifies the spec. However, if we do end up using this approach I would expect that the server returns names and hashes of all package manifests present in the package. Using SHA-256 for content hashing makes sense.

lukasa · June 6, 2020, 1:30pm

mattt:

Source archives and their signatures are produced by the package registry. The signature certifies that the archive was created by the registry at a particular time. (I'm still looking at how to reasonably tie GPG signatures to the commit hash; if anyone has any ideas, I'd love to hear them.)

A signature defends against man-in-the-middle attacks. This post from the npm blog has a great write-up about a similar approach they're taking:

If an attacker has interposed a proxy between you and the registry, they can tamper with both the package JSON document that advertises the shasum and the tarball itself. This attacker could create a tarball with unexpected content, generate an integrity field for it, then construct a packument advertising this poisoned tarball. An npm client would trust the packument and therefore also trust the tarball.

NPM’s proposal is a very interesting link, thanks. What I think neither you nor NPM have done is explained exactly what attacker is being defeated here. “If an attacker has interposed a proxy between me and the registry” raises some interesting questions. Given that this API runs entirely over HTTPS, how are attackers supposed to do that? If an attacker is capable of achieving that privileged network position, how are you handling key distribution to avoid them simply intercepting and delivering their own key?

Package signing is not a priori unreasonable, but doing so without a clear idea of what attacker you’re worried about is. Additionally, explaining how the registry is distributing and updating keys is also vital. How are keys updated? Can keys be revoked? How do the answers to these questions affect the threat model?

I’d really like to see this explored much more deeply. Right now the document assumes that package signing has value over-and-above HTTPS without explaining what that value is, why HTTPS is not providing it, and what the intended usage model is. I’m very nervous about adding cryptographic features simply because we can without this kind of justification.

mattt · June 6, 2020, 2:14pm

HTTPS isn't a panacea and neither is PGP signing, but working together they improve the overall security of the system. That's the philosophy of "defense in depth". While that may seem unnecessary, consider that what we're sending (packages) contains executable code, which deserves the highest level of scrutiny.

There are at least a few different ways that attackers can work around TLS / HTTPS. For example, developers sometimes install trusted root certificates to their system so that they can do things like inspect network traffic. By design, those can be used to undermine transport-level security, and can be exploited as an attack vector.

Or forget HTTPS for a moment. Consider what @Aciid is proposing with a local package cache: An attacker could swap out real packages with malicious forgeries by way of some privilege escalation on the filesystem. Keeping detached signatures for all of those cached packages would be a good way to prevent that from happening.

PGP has robust processes and infrastructure for issuing, sharing, and revoking keys, which are described in documentation. For the purposes of the specification, it should be sufficient to link to PGP as a standard, much like we do for HTTPS / TLS. But I'm exploring ways to strike the right balance to provide enough context for those references.

lukasa · June 6, 2020, 2:33pm

Please don’t mistake what I’m saying here: I’m not saying “don’t sign packages”. I’m saying that the pitch should clearly explain how package signing works, from beginning to end, with a clear description of what attacks it prevents or mitigates. This would need to include an assessment of why HTTPS isn’t valuable.

As an example of why I’m proposing this, consider your last paragraph:

If the attacker has privilege escalation to user level privilege, they are exactly as privileged as users are. Presumably you’re allowing users to control which keys they trust: in that case, an attacker can simply add their own key to the trusted chain. Additionally, if the attacker has privilege escalation they already have privilege escalation. They can just write any other binary on disk. While macOS has some mitigations against this kind of attack, an attacker who has already achieved privilege escalation is exceedingly hard to defend against, and package signing is unlikely to save you.

Again, I must stress that I’m not saying package signing doesn’t have value. I am saying that it is incumbent upon this pitch to clearly address what attack scenarios are mitigated and how. Adding cryptography to a protocol should not be done simply because it’s nice to have, it should be done with clear and reasoned intent. I’m just asking for the pitch to show its working.

Sorry, I don’t think I posed the question clearly enough, let me rephrase: how does the package manager interact with the PGP ecosystem to manage keys and do verification?

There are lots of possible answers here. Let’s outline some:

1. The package manager uses any existing gpg installation on the box to manage verification. It does not download signatures or attempt to update them in any way, it just attempts to validate. Missing keys are not errors, and pass silently.
2. As (1) but missing keys do not pass silently, the package manager emits warnings.
3. As (1) but missing keys are errors. Users are required to perform out-of-band steps to obtain those keys.
4. As (2), but the package manager will attempt to download any key from SKS and then present it to the user asking them to validate.
5. The package manager ships with a known-trusted key.
6. The package manager can be “configured” with a registry which includes some out-of-band system to communicate a trusted key at setup.
7. The package manager does nothing, users are required to take manual steps to perform verification.
8. This pitch declares this out of scope: it says that registries must sign, but places no rules upon the package manager about what to do with this information.

Each of these is very different! They provide different levels of defense, they have different weaknesses and strengths, they trade off availability and ease of use in different ways. IMO this pitch should address what it requires a) from package managers, and b) from registries.
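To show how differently the first three options in the list above behave, here is a small sketch that models only the "missing key" decision. The type and case names are illustrative, not part of the pitch or of SwiftPM, and real verification (calling gpg, fetching keys) is deliberately left out.

```swift
/// What a package manager might do when it has no key to verify a signature with.
/// The cases correspond to the first three options in the list above.
enum MissingKeyPolicy {
    case passSilently   // missing keys are not errors
    case warn           // missing keys produce a warning
    case fail           // missing keys are errors
}

enum VerificationOutcome: Equatable {
    case accepted
    case acceptedWithWarning(String)
    case rejected(String)
}

/// Decides the outcome for a package whose signing key is not available locally.
func handleMissingKey(for package: String, policy: MissingKeyPolicy) -> VerificationOutcome {
    switch policy {
    case .passSilently:
        return .accepted
    case .warn:
        return .acceptedWithWarning("no trusted key for \(package)")
    case .fail:
        return .rejected("no trusted key for \(package)")
    }
}
```

The same unsigned-or-unverifiable package is installed quietly, installed with a warning, or refused depending on one setting, which is why the choice belongs in the written threat model.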

mattt · June 6, 2020, 4:44pm

We're in agreement here. I've already made changes to my draft of the proposal locally that expand on what package signing does and how it works, and I'm continuing to refine that to be responsive to feedback from you and others. Given that this is a key feature of the registries, it deserves a more satisfactory explanation of how it works.

That's an excellent question, and one that I look forward to addressing in the follow-up proposal for how Swift Package Manager integrates with package registries. And please don't read that as brushing off your concerns: I agree completely that these details matter, and sincerely appreciate your raising these points.

A specification has to balance several competing interests: specificity vs. flexibility, brevity vs. exhaustivity, the needs of providers and consumers. Being too specific about implementation details risks invalidating equally viable (or even better) alternatives.

I agree that more can be done to articulate the motivation and value of the security features introduced by this proposal, but I want to make sure we're doing that in a way that doesn't micromanage implementations.