URLs as Swift Package Identifiers

What are "those languages" in this sentence? JVM languages, e.g. Java, Scala, Kotlin, Clojure, etc.?

There very much is a blessed default "standard" repository that pretty much all of them use: Maven Central. Even build tools other than Maven (Gradle, Ivy, sbt, lein, ...) still default to Maven Central and resolve against it, because it is simply the good default to go to. There are alternatives, such as JCenter, but to use one of those you'd usually add a single line to your package definition to use it in addition to or instead of Maven Central.

The same would be true here I imagine, we'd have some default related to swift.org I guess, and others can be configured.

I've not tracked this thread in depth, but wanted to chime in to clear up this confusion about such registries in the real world -- yes they exist and work just like that.

4 Likes

This really doesn't seem to be likely, given everything that's been discussed around the registry proposal and the fact that Mattt's been working with GitHub to define it. At best it seems that Swift can expect GitHub to host an integrated version of a Swift registry (if it's accepted), but I'm not sure that would play the same role as Maven Central. If Apple is planning on an official Swift registry they should announce that fact now, as it affects many of the decisions we're talking about. With a centralized registry at the "top" of the stack, opaque identifiers become much more trustable. Without one, which seems likely, Swift is in a different situation than the other languages.

2 Likes

To be clear, I don't think it matters if there would be a swift one or not, nor do I know if there will be.

What matters is that one or several repositories are likely to be the "default", be it the GitHub one, a Swift one or some other community-maintained one (e.g. "Sundell's favorite packages" :wink:).

I'm just drawing a parallel to Maven Central: some default can exist while the entire system remains completely distributed. If you don't like Maven Central, you don't have to use it at all, and the same would be true of repositories here.

Pretty much, since it could be that "good default".

There is no "top" in the model at all. Everything is completely and fully distributed and repositories are "equal", other than how the user decides to configure them (e.g. order in configs might matter). There are "good defaults" and it does not mean that it is in any way "the one and only authoritative source of identifiers".

4 Likes

It seems odd to me to describe a system with a de facto authority as “fully distributed”, but whatever. I don’t think allowing an authority to organically arrive at some point in the future is a good idea. The registry proposal doesn’t envision such a future, GitHub doesn’t seem to want such a future, and the community won’t be best served by such a future.

2 Likes

Sure it is - a shared default does not mean a system isn't distributed/decentralized, because you can always choose to not use the default and use whatever you want instead.

This is in the same way git is distributed: even though most people use github as their upstream, nothing mandates doing so. And at any point, without contacting github, you may decide to move upstream somewhere else and just start doing that.

Anyway, we derailed from the core discussion so let's get back to that :slight_smile:

5 Likes

@johannesweiss @tachyonics I think you both identify some important points. However, we appear to be reaching different conclusions, and I'm not sure if it's due to disagreement or misunderstanding on one or more points.

To help identify where we disagree, here's my understanding of your concerns and my responses to them. Please tell me if I've mischaracterized your positions or if you have any objections / questions to my points:

"What if a mirror redirects github.com to mailicio.us?"

Swift Package Manager lets you override the location of packages with mirrors (SE-0219). This allows developers to point external dependencies to forks across the dependency graph. As you've identified, the trade-off is that it introduces context for understanding where a dependency is located.

Our proposal associates SHA256 checksums to versioned source archives in Package.resolved. If you do a fresh build of your project (such as in a CI) and a host tries to send you something else, Swift Package Manager would raise an error. This is an improvement over current behavior, which can't protect against external dependency code changing over time (whether because it's hosted elsewhere, or the version was re-tagged in the original repository).

As we write in the Security section of our proposal, and previously discussed in a separate thread:

Package.resolved provides a Trust on first use (TOFU) security model that can offer strong guarantees about the integrity of dependencies over time. A registry can further improve on this model by implementing a transparent log or some comparable, tamper-proof system for associating artifacts with valid checksums.
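To make the trust-on-first-use idea concrete, here's a minimal Swift sketch of the check. The type and function names (`PinKey`, `PinStore`, `check`) are made up for illustration and are not SwiftPM's actual implementation; the checksums are passed in as plain strings rather than computed, so the sketch stays dependency-free.

```swift
// Hypothetical model of a Package.resolved-style pin store.
struct PinKey: Hashable {
    let package: String   // e.g. "github.com/mona/LinkedList"
    let version: String   // e.g. "1.1.0"
}

enum PinCheck: Equatable {
    case trustedOnFirstUse                         // no pin existed; checksum recorded
    case verified                                  // checksum matches the recorded pin
    case mismatch(expected: String, actual: String)
}

struct PinStore {
    var checksums: [PinKey: String] = [:]

    // Records the checksum the first time a version is seen;
    // on every later fetch it only compares against the record.
    mutating func check(_ key: PinKey, checksum: String) -> PinCheck {
        guard let expected = checksums[key] else {
            checksums[key] = checksum
            return .trustedOnFirstUse
        }
        if expected == checksum {
            return .verified
        }
        return .mismatch(expected: expected, actual: checksum)
    }
}

var store = PinStore()
let key = PinKey(package: "github.com/mona/LinkedList", version: "1.1.0")
let first = store.check(key, checksum: "abc123")     // first resolve: pin is recorded
let second = store.check(key, checksum: "abc123")    // fresh CI build, same archive
let tampered = store.check(key, checksum: "def456")  // host serves something else
```

The point of the sketch is that the third call is an error no matter which host or mirror served the archive.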

As I said in a previous reply, tooling — whether that's a new subcommand or making the output of build and resolve more verbose — could go a long way to helping us understand what's being downloaded from where.

"What if I'm on a corporate network and can't access the Internet?"

We've pitched an enhancement that allows users to configure an intermediate registry proxy. Whereas a mirror lets you redirect a single dependency, an intermediate proxy would route all packages directly. A corporate CI server, for example, could configure an intermediate proxy to an internal package registry to serve cached versions of external packages.

If you're concerned about downloading code at all as part of the build process, the alternative is to vendor all of your external dependencies. This is currently a manual process, but we could add functionality to do this automatically (for example, a swift package vendor subcommand that resolves and downloads dependencies to Vendor/Sources/ and rewrites Package.swift to declare dependencies using local paths).

"Why github.com/mona/LinkedList and not 'mona/LinkedList, hosted on GitHub'?"

For example, what @tachyonics described as:

It's tempting to treat GitHub and GitLab as interchangeable, but they're not. To use an analogy: we can't assume mona/LinkedList is the same on GitHub and GitLab, much the same as we can't assume that @mona on Twitter is the same person on Instagram. (To extend the analogy: our proposal is more like Mastodon than Twitter)

Packaging systems in other languages demonstrate that there's more than one way to solve the problem of external dependency resolution. A lot of things would be easier if all identifiers went through a central registry, but that's not how things work currently, and I don't think that's how things should work going forward. The decisions we made in our proposal were informed by facts specific to Swift and Swift Package Manager; given an alternative set of requirements, we might make different decisions.

Or, as @Jon_Shier put it:

3 Likes

As discussed with @ktoso, yes, they do. For example Rust, NPM, Haskell.

I am arguing that this is the best option proposed so far, and also good enough. If we all came to the conclusion that this is indeed the case and we proceeded with opaque IDs, then there's a separate discussion to have: how are we assigning the opaque IDs? Can anybody register any string? Do they need to start with a reverse DNS name that you need to prove you own when registering? Something else? But that's entirely a second step in my mind. The first discussion to have is whether the location of the code and the package ID are linked. In my mind they should not be linked.

AFAIU, the code for all the dependencies can get downloaded from the registry (so we don't need to git clone anymore). And you can mirror that registry into your corp network.

No, at build time, SwiftPM knows (without internet) that opaque ID 1 and opaque ID 2 are the same package when their byte representation is equal. The registry would just make sure that there is of course only one package registered with a given name (sure, two registries could in theory have the same package ID). Again, much like Rust's crates.io (package name is tokio and Crates uses https://crates.io/crates/tokio if you want more information about it), Haskell's stackage (package name for example is bytestring and Hackage uses https://hackage.haskell.org/package/bytestring if you want more info).

You would have your internal mirrors and internal registry that always gives you all the code from inside of your corp network.

Isn't the main claimed advantage of the URL the "trust model" it gives you? (I don't buy that there's actually any more trust, but we discussed that before.) I'll quote @mattt:

And the URI-based proposal is not supposed to make "moving packages around" easy:

What you @Jon_Shier seem to be pitching is essentially opaque IDs where the opaque ID happens to be in the form of a URL. With one addition: That registries can also hold aliases. Is that correct?

Personally, at this stage I don't think it's helpful to discuss the shape of the opaque package ID (I think we should first decide that we want opaque package IDs). What I mean by that is that I personally don't mind if SwiftNIO is called:

  • swift-nio, as in Rust (Crates), Haskell (Hackage/Stackage), JavaScript NPM, ...
  • com.apple.swift-nio, as I think in the JVM ecosystem (with initial proof that you own apple.com)
  • apple/swift-nio like Swift Package Index seems to use
  • afb757d66ed2e63824e8aff4984d6f97c43d85ea (just a hash)
  • https://github.com/apple/swift-nio

What I do care about is that the package name is comparable by a simple string comparison. If we agreed on that, then I'd be very happy. Of course, I also don't think that URIs make good opaque IDs because location may change (and the ID now can't) and people may think it's a URI and not an opaque ID (so they may assume that https://github.com/apple/swift-nio == https://github.com/apple/swift-nio/ or so).
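To illustrate the comparison concern, here's a small Swift sketch. Both function names are hypothetical, and the normalization rules are an assumption made up for the example, not SwiftPM's actual canonicalization. Note that it compares UTF-8 bytes explicitly, because Swift's `==` on `String` uses Unicode canonical equivalence rather than byte equality.

```swift
// Opaque identifiers: identity is byte-for-byte equality, nothing else.
func sameOpaquePackage(_ a: String, _ b: String) -> Bool {
    return a.utf8.elementsEqual(b.utf8)
}

// URL-shaped identifiers invite normalization. These rules are an
// illustrative assumption; lowercasing the whole string in particular
// is a simplification (only the host is case-insensitive in general).
func normalizedURLIdentity(_ raw: String) -> String {
    var s = raw.lowercased()
    for scheme in ["https://", "http://"] where s.hasPrefix(scheme) {
        s.removeFirst(scheme.count)
    }
    while s.hasSuffix("/") { s.removeLast() }
    if s.hasSuffix(".git") { s.removeLast(4) }
    return s
}

let plain = "https://github.com/apple/swift-nio"
let trailingSlash = "https://github.com/apple/swift-nio/"

// As opaque IDs these are two different packages...
let opaqueEqual = sameOpaquePackage(plain, trailingSlash)
// ...while a reader who sees URLs expects them to be the same one.
let urlEqual = normalizedURLIdentity(plain) == normalizedURLIdentity(trailingSlash)
```

Whichever rule set is picked, every tool in the ecosystem has to implement exactly the same one, which is the burden a plain string comparison avoids.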

Isn't this a counter example? Assume your package graph contains both https://gitHub.com/apple/swift-nio and https://gitLab.com/apple/swift-nio. I think we agree that SwiftPM needs to de-dupe and find the "correct" SwiftNIO. However,

  • Registry 1 claims: I have a package, canonically hosted at https://gitHub.com/apple/swift-nio with an old alias https://gitLab.com/apple/swift-nio.
  • Registry 2 claims: I have a package, canonically hosted at https://gitLab.com/apple/swift-nio with an old alias https://gitHub.com/apple/swift-nio.
  • Registry 3 claims: I have a package, canonically hosted at https://gitHub.com/apple/swift-nio, no aliases
  • Registry 4 claims: I have a package, canonically hosted at https://gitLab.com/apple/swift-nio

So two registries agree that SwiftNIO has 2 package IDs but they disagree which one is the canonical one. The other two registries see SwiftNIO at only one place. Similar situations can even happen with non-malicious registries if for example one registry can't reach (or doesn't properly implement) the "forwarding". It's a distributed system of mutually distrusting parties, we can't easily reach one definitely correct view of the world (there's a blockchain joke somewhere in here).
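The four-registry scenario can be sketched in Swift as follows. `RegistryClaim` and `canonicalCandidates` are names invented for this illustration, and the model assumes a client that consults several registries, which is the premise of this hypothetical rather than something the proposal specifies.

```swift
// Hypothetical model of what each registry asserts about a package.
struct RegistryClaim {
    let registry: String
    let canonical: String
    let aliases: Set<String>
}

// Collects every canonical identifier that some registry asserts
// for `id`, whether `id` appears as the canonical URL or as an alias.
func canonicalCandidates(for id: String, in claims: [RegistryClaim]) -> Set<String> {
    var result = Set<String>()
    for claim in claims where claim.canonical == id || claim.aliases.contains(id) {
        result.insert(claim.canonical)
    }
    return result
}

let hub = "https://github.com/apple/swift-nio"
let lab = "https://gitlab.com/apple/swift-nio"
let claims = [
    RegistryClaim(registry: "Registry 1", canonical: hub, aliases: [lab]),
    RegistryClaim(registry: "Registry 2", canonical: lab, aliases: [hub]),
    RegistryClaim(registry: "Registry 3", canonical: hub, aliases: []),
    RegistryClaim(registry: "Registry 4", canonical: lab, aliases: []),
]

// More than one candidate means the registries disagree and the
// client has no neutral way to pick the "correct" canonical URL.
let candidates = canonicalCandidates(for: hub, in: claims)
let isAmbiguous = candidates.count > 1
```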

What I'm saying is that the trust model is more or less "I trust the registry" (if I have multiple, I probably need an order how much I trust them) in both the URI-based and the opaque ID system. The opaque ID system is however much simpler and solves all the common cases.

I don't think they do. The one thing that URLs can give you is that when you first register, you prove that your code is hosted at a certain URL. Opaque IDs can do a very similar thing if we (at a later stage) decide that to register an opaque ID, you need to prove ownership of a certain domain/website/... . That's I believe what they do in the JVM ecosystem, I think you can only register com.corp.package if you provably own package.corp.com at the time of registration of the package. Conveniently, Github would then allow everybody (through Github pages) to claim io.github.user-name without having to pay money for a domain etc. The crucial thing is that there is only ever one ID that is immutable and doesn't change.

That's great. But it's not related to URI vs. opaque IDs though, right? And once you swift package update your packages, there's again no more trust than what "opaque IDs+trusted registries" gives you too.

The same would apply if we use opaque IDs, right? In both the URI and the opaque ID case, we usually download the code from the registry. In both cases we can also get information about the source host and download the code manually, directly from there.

Right, which adds extra complexity and I fail to see the benefit. We also need to trust these intermediate registry proxies.

3 Likes

@johannesweiss I really appreciate your feedback, but I'm having trouble understanding some of your concerns, which don't seem to relate to what we're proposing. I think there's a misunderstanding that's causing us to talk past one another...

Based on your reply to @Jon_Shier, I think one point of misunderstanding may be that our proposal doesn't involve a list of registries like in Maven or other systems.

Where are registries #3 and #4 coming from in your hypothetical?

Here's how I understand this scenario to play out according to our proposal:

  1. A package graph contains both github.com/apple/swift-nio (:octopus:) and gitlab.com/apple/swift-nio (:fox_face:).
  2. Swift Package Manager sends a HEAD request to github.com/apple/swift-nio (:octopus:), and the server responds with a redirect to the registry interface at swift.pkg.github.com/github.com/apple/swift-nio (:octopus::package:)
  3. Swift Package Manager sends a GET request to swift.pkg.github.com/github.com/apple/swift-nio (:octopus::package:), which returns a package registry response that says gitlab.com/apple/swift-nio (:fox_face:) is the canonical URL for the package.
  4. Swift Package Manager now knows that it can relabel any dependency nodes in its resolution graph with github.com/apple/swift-nio (:octopus:) to instead be gitlab.com/apple/swift-nio (:fox_face:).

So long as there isn't a circular reference, where gitlab.com/apple/swift-nio (:fox_face:) points to github.com/apple/swift-nio (:octopus:) as the canonical version, then there's no problem. There could be any number of registry interfaces out there (corporate networks, personal domains, mirrors, malicious websites), but those don't matter — we don't consult them.
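The steps above, including the circular-reference caveat, can be sketched in Swift like this. It's a simplified model under stated assumptions: `resolveCanonical` and the `lookup` closure are hypothetical names, and `lookup` stands in for the HEAD + GET round trip by returning the canonical URL a registry reports (or nil when there is nothing to follow).

```swift
enum CanonicalResolution: Equatable {
    case canonical(String)   // the identifier to relabel graph nodes with
    case cycle([String])     // the chain of identifiers that looped back
}

// Follows "canonical URL" pointers from `start` until they stop,
// reporting a cycle instead of looping forever.
func resolveCanonical(_ start: String, lookup: (String) -> String?) -> CanonicalResolution {
    var visited = [start]
    var current = start
    while let next = lookup(current), next != current {
        if visited.contains(next) {
            return .cycle(visited + [next])
        }
        visited.append(next)
        current = next
    }
    return .canonical(current)
}

// Happy path: GitHub's registry says GitLab is canonical.
let redirects = ["github.com/apple/swift-nio": "gitlab.com/apple/swift-nio"]
let resolved = resolveCanonical("github.com/apple/swift-nio") { redirects[$0] }

// Problem case: each host points at the other.
let circular = [
    "github.com/apple/swift-nio": "gitlab.com/apple/swift-nio",
    "gitlab.com/apple/swift-nio": "github.com/apple/swift-nio",
]
let looped = resolveCanonical("github.com/apple/swift-nio") { circular[$0] }
```

In the circular case the sketch just reports the loop; what Swift Package Manager should actually do then (error out, or treat the two as unrelated packages) is left open here.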

Would it help to describe this proposal as an "alternative download interface" instead of using the word "registry"?

1 Like

How would swift.pkg.github.com know about the new package URL at gitlab.com though? And what if gitlab.com has their own registry (e.g., swift.pkg.gitlab.com)?

2 Likes

From my response to @abertelrud:


That would be no problem at all. GitLab would communicate with Swift Package Manager to list versions, fetch manifests, and download source archives.

I do agree with this; in the absence of any other information, github.com/mona/LinkedList and gitlab.com/mona/LinkedList should be treated as unrelated packages (and indeed an internal corp registry should happily be able to mirror both and resolve package graphs accordingly). But we were discussing the capability for a registry to explicitly declare that they are equivalent packages and de-dupe them in a package graph.

Essentially this does feel like what is being proposed where the identifier of a package is a canonical URI of the form 'https://home-registry/registry-specific-package-identifier'.

I'd prefer the package identifier not to be a URI, for these reasons; ultimately, however, I think the proposal is workable. I was primarily proposing using something like github.com/apple/swift-nio or apple/swift-nio, homed on github.com as the package identifier rather than https://github.com/apple/swift-nio (so specifically not a location/URI but carrying the same information), rather than arguing for huge fundamental differences from what has been proposed.

4 Likes

Yes, I totally did. I do apologise. I did now re-read the proposal and I can confirm that it doesn't mention that you can configure additional registries (which I was just assuming to be true, apologies). However, it also doesn't mention that you cannot add registries. Which especially given that it mentions search (at the very bottom) I find surprising.

I think that would help. It would also help to explicitly spell out "No, you can't configure your own registries."

Knowing what I know now, I think I have much graver concerns.


Assumptions:

  • As a user, I cannot manually configure registries.
  • To benefit from the package "registry" feature, the selected git host for a certain package will ALSO need to (correctly) implement the SwiftPM registry protocol.

If the above is correct, then I have a bunch of questions:

  • How does the bootstrapping work? I understand that Github was involved and are willing to implement this. What are the timelines? What do we do before Github has implemented this?
  • How does one enable registry support in Github? Is it on or off by default?
  • Once Github has implemented the feature, how do we get bugfixes if they mis-implemented something? Will the code be open-source at least?
  • What about the people who don't have (or can't have) a Github account?
  • Are we expecting people that use say Gitlab to also create a repository on Github just so their package benefits from the registry feature?
  • If it is the case that to start with everybody should have a Github repository to benefit from this feature, do the package authors now also need to convince all of their users to enter https://github.com/my-org/my-package-just-here-for-the-registry despite the fact that they're actually hosted at https://my-skunk-works-git-host.com/swifty-awesome/swifty-awesome?
  • What if Github or another git host decommissions its SwiftPM package service for some reason? Are we then expecting everybody to move elsewhere? If yes, how does the forwarding now work given that the registry service was just shut down?

1 Like

Your first assumption is incorrect, and that's where the intermediate registry comes in.

By default, Swift Package Manager will try to connect to package source hosts using the registry interface, falling back to Git if that's unavailable (and the user hasn't disabled this fallback behavior). If a mirror is configured for that package, SPM does the same thing but with the mirrored URL.

If a user specifies an intermediate registry (swift package config set-registry-proxy https://internal.example.com/), Swift Package Manager will only consult that configured URL through the registry interface.
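Here's a rough Swift sketch of that decision order. Everything in it is a simplified model, not Swift Package Manager's real code: `PackageSource`, `ResolutionConfig`, `chooseSource`, and the `hostSupportsRegistry` probe are all hypothetical names introduced for the example.

```swift
enum PackageSource: Equatable {
    case registry(String)   // URL consulted through the registry interface
    case git(String)        // URL cloned with Git
    case unavailable        // no registry support and fallback disabled
}

struct ResolutionConfig {
    var registryProxy: String? = nil      // as set by set-registry-proxy
    var mirrors: [String: String] = [:]   // original URL -> mirror URL
    var allowGitFallback = true
}

func chooseSource(
    for packageURL: String,
    config: ResolutionConfig,
    hostSupportsRegistry: (String) -> Bool
) -> PackageSource {
    // A configured intermediate registry is the only thing consulted.
    if let proxy = config.registryProxy {
        return .registry(proxy)
    }
    // Otherwise apply any mirror, then try the registry interface.
    let effectiveURL = config.mirrors[packageURL] ?? packageURL
    if hostSupportsRegistry(effectiveURL) {
        return .registry(effectiveURL)
    }
    // Fall back to Git unless the user disabled that behavior.
    return config.allowGitFallback ? .git(effectiveURL) : .unavailable
}

let url = "https://github.com/mona/LinkedList"
let viaRegistry = chooseSource(for: url, config: ResolutionConfig()) { _ in true }
let viaGit = chooseSource(for: url, config: ResolutionConfig()) { _ in false }
let viaProxy = chooseSource(
    for: url,
    config: ResolutionConfig(registryProxy: "https://internal.example.com/")
) { _ in false }
```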

If you have a moment, please take a look at this test harness I set up, which lets you try out the registry interface yourself today. It uses mirrors instead of an intermediate registry, but the effect is the same.


One of the ways I've described our proposal is to relate this line from Mitch Hedberg:

I like an escalator, because it can never break. It can only become stairs.

If the registry interface is available, Swift Package Manager will use that. If not, it can fall back on how things work right now. Our proposal is designed so that users can benefit from stronger security guarantees and better performance with little to no changes in their package manifest.

Let me know if you think the questions you raised here are still valid given this change in assumptions, and I'll do my best to address them.

Opaque package identifiers are preferred...Using opaque identifiers implies location is separate from package identity.

As someone who works on an artifact registry service provider (AWS CodeArtifact), I agree that it is important to separate package location and package identity.

A section that details how the registries configuration works.

Yes, if SwiftPM extracts the package registry host based on the package URL then there needs to be an alternative configuration mechanism to use a registry at a different location, such as an Artifactory or CodeArtifact registry that isn’t at the same host name as the git repository. Specifically, our customers would request a way to configure SwiftPM to fetch all packages from the configured registries.

2 Likes

As I said in my reply to @yim_lee:


This is addressed in my original post on this thread:

2 Likes

Intermediate registries aren't part of the actual proposal, though, are they? They are listed in Future Directions right now: https://github.com/apple/swift-evolution/blob/64922bdf9e218a1673937a35c7d79d33f29dde99/proposals/0292-package-registry-service.md#intermediate-registry-proxies

1 Like

They were put under Future Directions in an attempt to scope the work (they aren't a necessary feature for early adopters, since you can do the same with mirrors and other mechanisms), but we could certainly incorporate them into the main proposal before submitting for second review.

2 Likes

Fantastic, I'm glad to hear that, and sorry I missed it. I thought the intermediate registry proxies were just meant for corp and other special environments and not something general users would use. But of course, you're right, we could just make everybody configure that intermediate registry proxy to something like https://swiftpackageindex.com or so (assuming they're willing to implement it).

Isn't that similar to every user having configured exactly one registry?

New assumption:

  • all users fall into exactly two camps:
    1. the ones that do not have a config set-registry-proxy
    2. the ones that have a config set-registry-proxy

If my assumption is correct, then I have two sets of responses:

  • For users in camp 1 (no registry proxy): All the questions in my previous post, which assumed no configured registry, still apply.
  • For users in camp 2 (registry proxy configured):
    Don't all users in camp 2 essentially have just one single registry that decides everything? If that's the case what is the problem that URIs solve?
    • There's no extra trust (because we have to trust the registry proxy).
    • Location is in the package ID (the URI) but it doesn't actually apply.
    • If there is essentially just one single configured registry, what would be the problem with opaque IDs?

What I understand right now (I may be wrong again, of course) is that to solve the bootstrapping issue, we'd start with everybody in camp 2 and have everybody configure their registry proxy to something that works from day 1. Is that accurate?

Users can start using the registry interface today by setting up a registry service (like my reference implementation) and using mirrors / intermediate registry to resolve through that.

Once GitHub adds support, users can start taking advantage of it for their dependencies hosted on github.com without any additional configuration (just pass the --enable-package-registry option). I don't know and couldn't speak for what GitHub's plans are, as far as timing, whether it's opt-in, or the nature of how they would support it.

The story is the same for dependencies hosted elsewhere: If the host supports the registry interface (whether that's gitlab.com or my-skunk-works-git-host.com), Swift Package Manager would try to resolve using that.

By configuring to resolve everything through a single, chosen registry, you are indeed trusting them to report available versions and provide source archives. As I mentioned before, checksums allow us to verify the integrity of downloaded code.

When going directly through a single, configured registry in this way, the URIs effectively become opaque identifiers. I believe the advantage of using URIs is that it supports a seamless migration from our current URL-based system to the proposed registry interface ("escalators don't break, they become stairs").

If your registry serves up github.com/Alamofire/Alamofire v 5.4.0, it's clear what URL you could go to out-of-band to audit that code, find documentation, and see a changelog.

The one other benefit that I'd like to note about URIs, that I haven't discussed so far, is that URIs can establish relationships between forked repositories. If a CVE is issued for github.com/mona/LinkedList, it follows that forks of that repository are likely to have the same vulnerability. We can determine shared history and therefore shared vulnerability through GitHub's website in a way that opaque identifiers can't.

Hmm, this reads like we'll have to have precisely one registry to start with. And likely forever. I just don't see all relevant hosting providers (correctly) implementing this and we even reduce their incentives to implement it because to bootstrap everybody will use the registry proxies.

I don't think the mirror setup works. I don't think we should start to circulate a huge mirrors file for all repositories. Then the users are essentially doing the job of a registry by distributing a large file that always has to be kept up to date (similar to how the hosts file was circulated on the ARPANET (is that what it was called?) before we had DNS).

If we accept that the current proposal will essentially be used as one global registry (called the "registry proxy") + URIs as opaque identifiers, then I think we made progress in the discussion.

We can then start to argue if URIs are good or bad opaque IDs. My opinion is that they are bad opaque IDs because users don't think they are opaque, de-duping becomes hard or impossible, the location doesn't mean anything, ...

(This paragraph is a later edit of the post.) As others have pointed out, "registry proxies" aren't actually proposed, but I'm still operating as if they were. Without them we can't bootstrap the registry ecosystem and we're locked into Github [if/when they even implement it]. Why? See the questions at the end of this post: without registry proxies, I think my assumptions are actually correct.

With opaque IDs, the registry would also know where the code lives. So it can query whatever attributes it wants too. And yes, it should definitely query Github's security information.

1 Like