URLs as Swift Package Identifiers

As discussed with @ktoso, yes, they do: for example, Rust, NPM, and Haskell.

I am arguing that this is the best option proposed so far, and also good enough. If we all concluded that this is indeed the case and proceeded with opaque IDs, then there's a separate discussion to have: how do we assign the opaque IDs? Can anybody register any string? Do IDs need to start with a reverse-DNS name whose domain you must prove you own when registering? Something else? But in my mind that's entirely a second step. The first discussion to have is whether the location of the code and the package ID are linked. In my mind, they should not be.

AFAIU, the code for all the dependencies can get downloaded from the registry (so we don't need to git clone anymore). And you can mirror that registry into your corp network.

No. At build time, SwiftPM knows (without internet access) that opaque ID 1 and opaque ID 2 are the same package when their byte representations are equal. The registry would just make sure that only one package is registered under a given name (two registries could, in theory, have the same package ID). Again, this is much like Rust's crates.io (the package name is tokio, and crates.io uses https://crates.io/crates/tokio if you want more information about it) or Haskell's Stackage (a package name is, for example, bytestring, and Hackage uses https://hackage.haskell.org/package/bytestring if you want more info).
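To make the "same package if and only if the bytes are equal" rule concrete, here is a minimal Swift sketch. The `Dependency` type and the sample IDs are hypothetical, not SwiftPM API; the point is only that de-duplication needs nothing beyond exact comparison and no network access.

```swift
// Hypothetical model of a dependency edge as seen by the resolver.
struct Dependency {
    let packageID: String   // opaque ID, compared byte-for-byte
    let requestedBy: String // the package that declared this dependency
}

// De-duplicate a graph's dependencies purely by opaque ID.
// Two IDs name the same package if and only if their UTF-8 bytes are equal.
func uniquePackages(in deps: [Dependency]) -> [String] {
    var seen = Set<[UInt8]>()
    var result: [String] = []
    for dep in deps {
        if seen.insert(Array(dep.packageID.utf8)).inserted {
            result.append(dep.packageID)
        }
    }
    return result
}

let deps = [
    Dependency(packageID: "swift-nio", requestedBy: "app"),
    Dependency(packageID: "swift-log", requestedBy: "app"),
    Dependency(packageID: "swift-nio", requestedBy: "swift-nio-ssl"),
]
print(uniquePackages(in: deps)) // ["swift-nio", "swift-log"]
```

Comparing `utf8` bytes rather than using `String`'s `==` is deliberate in this sketch: Swift's `==` on `String` treats canonically equivalent Unicode sequences as equal, which is already a form of normalization.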

You would have your internal mirrors and internal registry that always gives you all the code from inside of your corp network.

Isn't the main claimed advantage of the URL the "trust model" it gives you? (I don't buy that there's actually any more trust, but we discussed that before.) I'll quote @mattt:

And the URI-based proposal is not supposed to make "moving packages around" easy:

What you @Jon_Shier seem to be pitching is essentially opaque IDs where the opaque ID happens to be in the form of a URL. With one addition: That registries can also hold aliases. Is that correct?

Personally, at this stage I don't think it's helpful to discuss the shape of the opaque package ID (I think we should first decide that we want opaque package IDs). What I mean by that is that I personally don't mind whether SwiftNIO is called:

  • swift-nio, as in Rust (Crates), Haskell (Hackage/Stackage), JavaScript NPM, ...
  • com.apple.swift-nio, as in the JVM ecosystem, I think (with initial proof that you own apple.com)
  • apple/swift-nio like Swift Package Index seems to use
  • afb757d66ed2e63824e8aff4984d6f97c43d85ea (just a hash)
  • https://github.com/apple/swift-nio

What I do care about is that package IDs are comparable by a simple string comparison. If we agreed on that, I'd be very happy. Of course, I also don't think that URIs make good opaque IDs, because the location may change (while the ID can't), and people may think it's a URI rather than an opaque ID (so they may assume that https://github.com/apple/swift-nio == https://github.com/apple/swift-nio/ or similar).
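A small Swift sketch of that concern: strings a reader would likely consider "the same URL" are four different IDs under plain string comparison, and treating them as URLs instead invites ad-hoc normalization. The sample URLs and the `naiveNormalize` helper are illustrative only, not rules anyone has proposed.

```swift
let declared = [
    "https://github.com/apple/swift-nio",
    "https://github.com/apple/swift-nio/",
    "https://GitHub.com/apple/swift-nio",
    "https://github.com/apple/swift-nio.git",
]

// As opaque IDs, only exact string matches count as the same package.
print(Set(declared).count) // 4

// Treating them as URLs invites rules like these, and every tool
// (SwiftPM, every registry, every mirror) would have to agree on them exactly.
func naiveNormalize(_ s: String) -> String {
    var t = s.lowercased() // also lowercases the path, which may be case-sensitive
    if t.hasSuffix("/") { t.removeLast() }
    if t.hasSuffix(".git") { t.removeLast(4) }
    return t
}
print(Set(declared.map(naiveNormalize)).count) // 1
```

Even this tiny normalizer is arguably wrong (lowercasing the whole string also changes the path), which illustrates how easy it is to start parsing IDs that should only ever be compared.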

Isn't this a counterexample? Assume your package graph contains both https://gitHub.com/apple/swift-nio and https://gitLab.com/apple/swift-nio. I think we agree that SwiftPM needs to de-dupe and find the "correct" SwiftNIO. However:

  • Registry 1 claims: I have a package, canonically hosted at https://gitHub.com/apple/swift-nio with an old alias https://gitLab.com/apple/swift-nio.
  • Registry 2 claims: I have a package, canonically hosted at https://gitLab.com/apple/swift-nio with an old alias https://gitHub.com/apple/swift-nio.
  • Registry 3 claims: I have a package, canonically hosted at https://gitHub.com/apple/swift-nio, no aliases
  • Registry 4 claims: I have a package, canonically hosted at https://gitLab.com/apple/swift-nio

So two registries agree that SwiftNIO has 2 package IDs, but they disagree on which one is canonical. The other two registries see SwiftNIO at only one place. Similar situations can even happen with non-malicious registries if, for example, one registry can't reach (or doesn't properly implement) the "forwarding". It's a distributed system of mutually distrusting parties; we can't easily reach one definitively correct view of the world (there's a blockchain joke somewhere in here).
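The four claims above can be modeled in a short Swift sketch to show that there is no single answer to pick. The `RegistryClaim` type is hypothetical; the code collects each registry's canonical URL and its full set of IDs for the package, then counts how many distinct views exist.

```swift
// Hypothetical shape of what a registry asserts about one package.
struct RegistryClaim {
    let registry: String
    let canonical: String
    let aliases: Set<String>
    var allIDs: Set<String> { aliases.union([canonical]) }
}

let gh = "https://gitHub.com/apple/swift-nio"
let gl = "https://gitLab.com/apple/swift-nio"

let claims = [
    RegistryClaim(registry: "Registry 1", canonical: gh, aliases: [gl]),
    RegistryClaim(registry: "Registry 2", canonical: gl, aliases: [gh]),
    RegistryClaim(registry: "Registry 3", canonical: gh, aliases: []),
    RegistryClaim(registry: "Registry 4", canonical: gl, aliases: []),
]

// Distinct answers to "what is the canonical ID?" and "which IDs are this package?"
let canonicalViews = Set(claims.map(\.canonical))
let identityViews = Set(claims.map(\.allIDs))

print("canonical candidates:", canonicalViews.count) // 2
print("distinct identity views:", identityViews.count) // 3
```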

What I'm saying is that the trust model is more or less "I trust the registry" (if I have multiple, I probably need an order of how much I trust them) in both the URI-based and the opaque ID system. The opaque ID system is, however, much simpler and solves all the common cases.

I don't think they do. The one thing that URLs can give you is that when you first register, you prove that your code is hosted at a certain URL. Opaque IDs can do a very similar thing if we (at a later stage) decide that to register an opaque ID, you need to prove ownership of a certain domain, website, or similar. I believe that's what they do in the JVM ecosystem: I think you can only register com.corp.package if you provably own package.corp.com at the time you register the package. Conveniently, GitHub would then allow everybody (through GitHub Pages) to claim io.github.user-name without having to pay for a domain etc. The crucial thing is that there is only ever one ID, and it is immutable.
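A rough Swift sketch of such a registration rule, under a hypothetical assumption taken from the example above: the domain to prove is the ID's dot-separated labels in reverse order (so com.corp.package requires package.corp.com). Real ecosystems' rules are more involved, and none of these functions are a proposed SwiftPM or registry API.

```swift
// Hypothetical rule: reverse the ID's labels to get the domain that must be proven.
func domainToProve(forPackageID id: String) -> String? {
    let labels = id.split(separator: ".", omittingEmptySubsequences: false)
    guard labels.count >= 2, labels.allSatisfy({ !$0.isEmpty }) else { return nil }
    return labels.reversed().joined(separator: ".")
}

// Registration succeeds only if the registrant has proven ownership of that domain.
func canRegister(_ id: String, provenDomains: Set<String>) -> Bool {
    guard let domain = domainToProve(forPackageID: id) else { return false }
    return provenDomains.contains(domain)
}

print(domainToProve(forPackageID: "com.corp.package") ?? "invalid")    // package.corp.com
print(domainToProve(forPackageID: "io.github.user-name") ?? "invalid") // user-name.github.io
print(canRegister("com.corp.package", provenDomains: ["package.corp.com"]))  // true
print(canRegister("com.other.package", provenDomains: ["package.corp.com"])) // false
```

Only the registration step looks at the domain in this sketch; afterwards, the ID is just an immutable string compared byte-for-byte, so a later move of the domain or the code doesn't change it.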

That's great. But it's not related to URIs vs. opaque IDs, right? And once you swift package update your packages, there's again no more trust than what "opaque IDs + trusted registries" gives you.

The same would apply if we used opaque IDs, right? In both the URI and the opaque ID case, we usually download the code from the registry. In both cases, we can also get information about the source host and download the code manually, directly from there.

Right, which adds extra complexity and I fail to see the benefit. We also need to trust these intermediate registry proxies.


@johannesweiss I really appreciate your feedback, but I'm having trouble understanding some of your concerns, which don't seem to relate to what we're proposing. I think there's a misunderstanding that's causing us to talk past one another...

Based on your reply to @Jon_Shier, I think one point of misunderstanding may be that our proposal doesn't involve a list of registries like in Maven or other systems.

Where are registries #3 and #4 coming from in your hypothetical?

Here's how I understand this scenario to play out according to our proposal:

  1. A package graph contains both github.com/apple/swift-nio (:octopus:) and gitlab.com/apple/swift-nio (:fox_face:).
  2. Swift Package Manager sends a HEAD request to github.com/apple/swift-nio (:octopus:), and the server responds with a redirect to the registry interface at swift.pkg.github.com/github.com/apple/swift-nio (:octopus::package:)
  3. Swift Package Manager sends a GET request to swift.pkg.github.com/github.com/apple/swift-nio (:octopus::package:), which returns a package registry response that says gitlab.com/apple/swift-nio (:fox_face:) is the canonical URL for the package.
  4. Swift Package Manager now knows that it can relabel any dependency nodes in its resolution graph with github.com/apple/swift-nio (:octopus:) to instead be gitlab.com/apple/swift-nio (:fox_face:).

So long as there isn't a circular reference, where gitlab.com/apple/swift-nio (:fox_face:) points to github.com/apple/swift-nio (:octopus:) as the canonical version, then there's no problem. There could be any number of registry interfaces out there (corporate networks, personal domains, mirrors, malicious websites), but those don't matter — we don't consult them.
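Steps 3 and 4, plus the caveat about circular references, can be sketched in Swift as follows. The `canonicalURL` dictionary is a stand-in for registry responses (an assumption for illustration; no HTTP requests are made), and `canonicalize` follows "canonical URL" pointers until it reaches a URL with no entry, reporting a cycle if it sees a URL twice.

```swift
// Hypothetical error for a chain of canonical-URL pointers that loops.
enum CanonicalizationError: Error {
    case cycle([String])
}

// Follow "canonical URL" pointers until a URL has no entry or points to itself.
func canonicalize(_ url: String, canonicalURL: [String: String]) throws -> String {
    var current = url
    var visited = [url]
    while let next = canonicalURL[current], next != current {
        if visited.contains(next) {
            throw CanonicalizationError.cycle(visited + [next])
        }
        visited.append(next)
        current = next
    }
    return current
}

// Relabel every dependency node with its canonical URL; duplicates collapse.
func relabel(_ nodes: [String], canonicalURL: [String: String]) throws -> Set<String> {
    try Set(nodes.map { try canonicalize($0, canonicalURL: canonicalURL) })
}

// Stand-in for the response in step 3: the GitHub entry names GitLab as canonical.
let responses = ["github.com/apple/swift-nio": "gitlab.com/apple/swift-nio"]
let graph = ["github.com/apple/swift-nio", "gitlab.com/apple/swift-nio"]
let relabeled = try relabel(graph, canonicalURL: responses)
print(relabeled) // ["gitlab.com/apple/swift-nio"]
```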

Would it help to describe this proposal as an "alternative download interface" instead of using the word "registry"?


How would swift.pkg.github.com know about the new package URL at gitlab.com though? And what if gitlab.com has their own registry (e.g., swift.pkg.gitlab.com)?


From my response to @abertelrud:


That would be no problem at all. GitLab would communicate with Swift Package Manager to list versions, fetch manifests, and download source archives.

I do agree with this; in the absence of any other information, github.com/mona/LinkedList and gitlab.com/mona/LinkedList should be treated as unrelated packages (and indeed an internal corp registry should happily be able to mirror both and resolve package graphs accordingly). But we were discussing the capability for a registry to explicitly declare that they are equivalent packages and de-dupe them in a package graph.

Essentially this does feel like what is being proposed where the identifier of a package is a canonical URI of the form 'https://home-registry/registry-specific-package-identifier'.

For these reasons, I'd prefer the package identifier not to be a URI; ultimately, however, I think the proposal is workable. I was primarily proposing something like github.com/apple/swift-nio, or apple/swift-nio homed on github.com, as the package identifier rather than https://github.com/apple/swift-nio (so specifically not a location/URI, but carrying the same information), rather than any huge fundamental difference from what has been proposed.


Yes, I totally did, and I do apologise. I have now re-read the proposal and can confirm that it doesn't mention that you can configure additional registries (which I was just assuming to be true, apologies). However, it also doesn't say that you cannot add registries, which I find surprising, especially given that it mentions search (at the very bottom).

I think that would help. Also would it help to explicitly spell out "No, you can't configure your own registries."

Knowing what I know now, I think I have much graver concerns.


Assumptions:

  • As a user, I cannot manually configure registries.
  • To benefit from the package "registry" feature, the selected git host for a certain package will ALSO need to (correctly) implement the SwiftPM registry protocol.

If the above is correct, then I have a bunch of questions:

  • How does the bootstrapping work? I understand that Github was involved and are willing to implement this. What are the timelines? What do we do before Github has implemented this?
  • How does one enable registry support in Github? Is it on or off by default?
  • Once Github has implemented the feature, how do we get bugfixes if they mis-implemented something? Will the code be open-source at least?
  • What about the people who don't have (or can't have) a Github account?
  • Are we expecting people that use say Gitlab to also create a repository on Github just so their package benefits from the registry feature?
  • If it is the case that to start with everybody should have a Github repository to benefit from this feature, do the package authors now also need to convince all of their users to enter https://github.com/my-org/my-package-just-here-for-the-registry despite the fact that they're actually hosted at https://my-skunk-works-git-host.com/swifty-awesome/swifty-awesome?
  • What if Github or another git host decommissions its SwiftPM package service for some reason? Are we then expecting everybody to move elsewhere? If yes, how does the forwarding now work given that the registry service was just shut down?

Your first assumption is incorrect, and that's where the intermediate registry comes in.

By default, Swift Package Manager will try to connect to package source hosts using the registry interface, falling back to Git if that's unavailable (and the user hasn't disabled this fallback behavior). If a mirror is configured for that package, SPM does the same thing but with the mirrored URL.

If a user specifies an intermediate registry (swift package config set-registry-proxy https://internal.example.com/), Swift Package Manager will only consult that configured URL through the registry interface.
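The resolution order described in the last two paragraphs can be sketched as a pure Swift function. Every type, case, and parameter name here is hypothetical (this is not SwiftPM's actual code), and `hostSupportsRegistry` stands in for probing the host's registry interface.

```swift
// Hypothetical model of where the resolver may fetch a package from, in order.
enum FetchStrategy: Equatable {
    case registry(endpoint: String, package: String) // use the registry interface
    case git(url: String)                            // clone with Git, as today
}

// Hypothetical client-side configuration.
struct ResolverConfig {
    var registryProxy: String? = nil    // intermediate registry, if configured
    var mirrors: [String: String] = [:] // original URL -> mirror URL
    var allowGitFallback = true
}

// Returns the strategies to try, in order, for one package URL.
func strategies(for packageURL: String,
                config: ResolverConfig,
                hostSupportsRegistry: (String) -> Bool) -> [FetchStrategy] {
    // With an intermediate registry configured, only that registry is consulted.
    if let proxy = config.registryProxy {
        return [.registry(endpoint: proxy, package: packageURL)]
    }
    // Otherwise, apply a mirror if one is configured for this package.
    let url = config.mirrors[packageURL] ?? packageURL
    var result: [FetchStrategy] = []
    if hostSupportsRegistry(url) {
        result.append(.registry(endpoint: url, package: packageURL))
    }
    // The escalator becomes stairs: fall back to Git unless the user disabled it.
    if config.allowGitFallback {
        result.append(.git(url: url))
    }
    return result
}

let config = ResolverConfig(mirrors: ["https://github.com/apple/swift-nio": "https://git.example.com/swift-nio"])
let plan = strategies(for: "https://github.com/apple/swift-nio", config: config) { _ in false }
print(plan)
```

With a mirror configured and no registry interface available, the sketch yields a single Git fallback against the mirror URL, which matches how things work without the registry feature.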

If you have a moment, please take a look at this test harness I set up, which lets you try out the registry interface yourself today. It uses mirrors instead of an intermediate registry, but the effect is the same.


One of the ways I've described our proposal is to relate this line from Mitch Hedberg:

I like an escalator, because it can never break. It can only become stairs.

If the registry interface is available, Swift Package Manager will use that. If not, it can fall back on how things work right now. Our proposal is designed so that users can benefit from stronger security guarantees and better performance with little to no changes in their package manifest.

Let me know if you think the questions you raised here are still valid given this change in assumptions, and I'll do my best to address them.

Opaque package identifiers are preferred... Using opaque identifiers implies location is separate from package identity.

As someone who works on an artifact registry service (AWS CodeArtifact), I agree that it is important to separate package location from package identity.

A section that details how the registries configuration works.

Yes, if SwiftPM extracts the package registry host based on the package URL then there needs to be an alternative configuration mechanism to use a registry at a different location, such as an Artifactory or CodeArtifact registry that isn’t at the same host name as the git repository. Specifically, our customers would request a way to configure SwiftPM to fetch all packages from the configured registries.


As I said in my reply to @yim_lee:


This is addressed in my original post on this thread:


Intermediate registries aren't part of the actual proposal, though, are they? They are listed in Future Directions right now: https://github.com/apple/swift-evolution/blob/64922bdf9e218a1673937a35c7d79d33f29dde99/proposals/0292-package-registry-service.md#intermediate-registry-proxies


They were put under Future Directions in an attempt to scope the work (they aren't a necessary feature for early adopters, since you can do the same with mirrors and other mechanisms), but we could certainly incorporate them into the main proposal before submitting for second review.


Fantastic, I'm glad to hear that, and sorry I missed it. I thought the intermediate registry proxies were just meant for corp and other special environments, not something general users would use. But of course, you're right: we could just have everybody configure the intermediate registry proxy to something like https://swiftpackageindex.com (assuming they're willing to implement it).

Isn't that similar to every user having configured exactly one registry?

New assumption:

  • all users fall into exactly two camps:
    1. the ones that do not have a config set-registry-proxy
    2. the ones that have a config set-registry-proxy

If my assumption is correct, then I have two sets of responses:

  • For users in camp 1 (no registry proxy): all the questions in my previous post, which assumed no configured registry, still apply.
  • For users in camp 2 (registry proxy configured):
    Don't all users in camp 2 essentially have just one single registry that decides everything? If that's the case what is the problem that URIs solve?
    • There's no extra trust (because we have to trust the registry proxy).
    • Location is in the package ID (the URI) but it doesn't actually apply.
    • If there is essentially just one single configured registry, what would be the problem with opaque IDs?

What I understand right now (I may be wrong again, of course) is that to solve the bootstrapping issue, we'd start with everybody in camp 2, with everybody configuring their registry proxy to something that works from day 1. Is that accurate?

Users can start using the registry interface today by setting up a registry service (like my reference implementation) and using mirrors / intermediate registry to resolve through that.

Once GitHub adds support, users can start taking advantage of it for their dependencies hosted on github.com without any additional configuration (just pass the --enable-package-registry option). I don't know and couldn't speak for what GitHub's plans are, as far as timing, whether it's opt-in, or the nature of how they would support it.

The story is the same for dependencies hosted elsewhere: If the host supports the registry interface (whether that's gitlab.com or my-skunk-works-git-host.com), Swift Package Manager would try to resolve using that.

By configuring to resolve everything through a single, chosen registry, you are indeed trusting them to report available versions and provide source archives. As I mentioned before, checksums allow us to verify the integrity of downloaded code.

When going directly through a single, configured registry in this way, the URIs effectively become opaque identifiers. I believe the advantage of using URIs is that it supports a seamless migration from our current URL-based system to the proposed registry interface ("escalators don't break, they become stairs").

If your registry serves up github.com/Alamofire/Alamofire v 5.4.0, it's clear what URL you could go to out-of-band to audit that code, find documentation, and see a changelog.

The one other benefit that I'd like to note about URIs, that I haven't discussed so far, is that URIs can establish relationships between forked repositories. If a CVE is issued for github.com/mona/LinkedList, it follows that forks of that repository are likely to have the same vulnerability. We can determine shared history and therefore shared vulnerability through GitHub's website in a way that opaque identifiers can't.

Hmm, this reads like we'll have to have precisely one registry to start with. And likely forever. I just don't see all relevant hosting providers (correctly) implementing this and we even reduce their incentives to implement it because to bootstrap everybody will use the registry proxies.

I don't think the mirror setup works. I don't think we should start circulating a huge mirrors file covering all repositories. Users would then essentially be doing the job of a registry by distributing a large file that always has to be kept up to date (similar to how the HOSTS.TXT file was circulated on the ARPANET before we had DNS).

If we accept that the current proposal will essentially be used as one global registry (called the "registry proxy") + URIs as opaque identifiers, then I think we made progress in the discussion.

We can then start to argue if URIs are good or bad opaque IDs. My opinion is that they are bad opaque IDs because users don't think they are opaque, de-duping becomes hard or impossible, the location doesn't mean anything, ...

(This paragraph is a later edit of the post.) As others have pointed out, "registry proxies" aren't actually part of the proposal, but I'm still operating as if they were, because without them we can't bootstrap the registry ecosystem and we're locked into GitHub (if and when they even implement it). Without registry proxies, I think my assumptions are actually correct; see the questions at the end of this post.

With opaque IDs, the registry would also know where the code lives, so it can query whatever attributes it wants to. And yes, it should definitely query GitHub's security information.


To be clear: you wouldn't configure github.com or swift.pkg.github.com as your intermediate registry client-side. As soon as GitLab or another host implemented Swift registry support, that would be a few more packages that benefit from speed and security improvements of that interface.

Again, that's where the intermediate registry configuration comes in. Mirrors are really for one-offs; an intermediate registry is for handling all package routing. Edit: The mirrors approach used in the test harness I linked to is temporary scaffolding.

That's not guaranteed. A requirement of enterprise artifact repositories like CodeArtifact and Artifactory is that they can store artifacts without getting access to the original source code.


Yes, I understand. I was more thinking of configuring https://mattts-implementation.registry or https://swiftpackageindex.com or whatever the community settles on. Likely there'll be one that turns out to be "the one" that everybody configures to keep everything working and benefit from the faster downloads etc. My assumption is that swift.pkg.github.com won't actually be there when SwiftPM's registry feature launches.

I'm not super convinced they'll all implement it. And without all the relevant ones implementing it, I need a registry proxy.

Yes, I understand. I just phrased it badly. Bottom line is that everybody will need a registry proxy IMHO.

Yes, they don't need access to serve you the artefacts. But from what I understand, they do have metadata, so they will still know where it's from (and they can know where it's actually from, unlike the URIs where it may or may not be there). So you can easily imagine some cron-job like thing that periodically checks if new versions have to be put on the bad list for security vulnerabilities.

Thanks for clarifying. I'm actually not too worried about this; in fact, I'm encouraged by what will come of competition. Since our proposal works without configuring a registry, any such service would necessarily be a value-add in some respect. Broadly speaking, I don't think a central registry will be faster than connecting to the code host directly (though you could imagine region-specific offerings). Other services may try to differentiate on being more secure, more auditable with respect to licensing, or having integrations with enterprise software. I don't think this would be winner-take-all.

And besides, the server API is designed to be simple enough that anyone can host a registry themselves with static files.
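To illustrate the "static files" point: if every endpoint is a plain GET of a fixed path, a registry can be pre-generated as a directory tree and served by any web server. The path layout and the `Release` type below are hypothetical and made up for this sketch; they are not copied from the proposal, whose actual endpoints may differ.

```swift
// Hypothetical release record; the checksum is a placeholder, not real data.
struct Release {
    let version: String
    let checksum: String // e.g. a hex digest of the source archive
}

// Build a map of relative file path -> file contents for one package.
func staticRegistryFiles(package: String, releases: [Release]) -> [String: String] {
    var files: [String: String] = [:]
    // Index listing every available version, one per line.
    files["\(package)/index"] = releases.map(\.version).joined(separator: "\n")
    for release in releases {
        // Per-release metadata; the manifest and source archive would sit alongside it.
        files["\(package)/\(release.version)/checksum"] = release.checksum
    }
    return files
}

let files = staticRegistryFiles(
    package: "github.com/mona/LinkedList",
    releases: [Release(version: "1.0.0", checksum: "abc123"),
               Release(version: "1.1.0", checksum: "def456")])
for path in files.keys.sorted() { print(path) }
```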

And that's totally fine. Swift Package Manager can continue to resolve dependencies on gitlab.com by cloning Git repositories as it does now.

Maybe at first, when this feature is in its early stages. But that scaffolding can be removed once GitHub and other hosts implement registry support.

My point is that URIs encode the inherent relationships between projects in a way that opaque identifiers can't. A registry could associate equivalent metadata, but that would be an additional step.


But do they, in a way that’s reliable and useful? If I understand correctly, you’re suggesting to associate forks based on URI alone, right? Wouldn’t that open the door to an unrelated CVE-ridden repository with a matching name triggering false alarms?

If that’s the case then I think that’s probably an example why I have the nagging feeling that URIs aren’t great as ids. It’s just too tempting to parse them when all that should be done is compare them.

Sorry, that's not exactly what I'm saying. You can't know the relationship between github.com/mona/LinkedList and github.com/octocorp/LinkedList from the URIs alone. But the resources located by those URLs can tell you what they are, independently of the registry. An opaque identifier would require an additional piece of metadata to make that connection.
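The difference between parsing a URI and asking the resource it locates can be sketched in Swift. The `RepositoryMetadata` type and its `parent` field are hypothetical stand-ins for whatever a host such as GitHub reports about forks, and the sample data is illustrative; the name-matching function shows the false positive raised above.

```swift
// Hypothetical host-reported metadata: the repository this one was forked from.
struct RepositoryMetadata {
    let parent: String?
}

// Naive: call two repositories related if their last path component matches.
func relatedByName(_ a: String, _ b: String) -> Bool {
    a.split(separator: "/").last == b.split(separator: "/").last
}

// Metadata-based: walk the fork chain reported by the host.
func isFork(_ url: String, of upstream: String,
            metadata: [String: RepositoryMetadata]) -> Bool {
    var current = metadata[url]?.parent
    var steps = 0
    while let parent = current, steps < 100 { // step limit guards against looping data
        if parent == upstream { return true }
        current = metadata[parent]?.parent
        steps += 1
    }
    return false
}

// Illustrative data: octocat's copy is a fork; octocorp's merely shares the name.
let metadata: [String: RepositoryMetadata] = [
    "github.com/mona/LinkedList": RepositoryMetadata(parent: nil),
    "github.com/octocat/LinkedList": RepositoryMetadata(parent: "github.com/mona/LinkedList"),
    "github.com/octocorp/LinkedList": RepositoryMetadata(parent: nil),
]
print(relatedByName("github.com/mona/LinkedList", "github.com/octocorp/LinkedList")) // true
print(isFork("github.com/octocorp/LinkedList", of: "github.com/mona/LinkedList", metadata: metadata)) // false
print(isFork("github.com/octocat/LinkedList", of: "github.com/mona/LinkedList", metadata: metadata)) // true
```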

Every external Swift package dependency is already identified by URL. Our proposal is to continue this tradition, rather than invent a new layer of abstraction.


Right, I see what you mean now, thanks for clearing that up!
