URLs as Swift Package Identifiers

Cavelle_Benjamin · January 30, 2021, 10:24pm

Why not use GPG as proof of ownership, or at least the SHA of, say, a zip archive of the repo?
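To make the checksum half of this concrete, here is a minimal sketch assuming the registry records a SHA-256 digest of the release archive at publish time; `archiveChecksum` and `verifyArchive` are hypothetical names. A digest pins the exact archive contents, but on its own it does not say who published them.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// archiveChecksum returns the hex-encoded SHA-256 digest of a package archive
// (for example, the bytes of a zip export of the repository).
func archiveChecksum(archive []byte) string {
	sum := sha256.Sum256(archive)
	return hex.EncodeToString(sum[:])
}

// verifyArchive reports whether a downloaded archive matches the checksum
// that (hypothetically) was recorded when the release was published.
func verifyArchive(archive []byte, published string) bool {
	return archiveChecksum(archive) == published
}

func main() {
	archive := []byte("abc") // stand-in for real zip bytes
	sum := archiveChecksum(archive)
	fmt.Println(sum)
	fmt.Println(verifyArchive(archive, sum))           // true
	fmt.Println(verifyArchive([]byte("abd"), sum))     // false
}
```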

If the job to be done is downloading a package from a registry, with the following UX considerations:

Least amount of cognitive overhead
Least number of steps
Least amount of total time
Fast resolution of issues

There may be a smaller list of alternatives to consider.

drewbenson · January 31, 2021, 5:44am

Any plan where changing the URL of the package changes its identifier is a bad plan.

drewbenson · January 31, 2021, 5:59am

Signing. me.michelf can host a public key, and all of Michael's packages can be signed with his private key.

tachyonics · February 3, 2021, 8:46pm

Options for preventing this have previously been floated on this thread. We could build it into the expectations of a well-behaving registry that the registry must verify that someone publishing the package com.utilities.Utilities actually does own the domain utilities.com. There are other variations on registry-level identity verification that could also be used.
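The mapping step of such a registry-side check can be sketched as follows, assuming identifiers have the shape `<reversed domain>.<package name>`. `ownerDomain` is a hypothetical helper, and how the publisher then proves control of the domain is deliberately left open.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ownerDomain maps a reverse-DNS package identifier to the domain whose
// ownership a registry would check before accepting a publish. The last
// component is the package name; the components before it are the domain
// labels in reverse order.
func ownerDomain(id string) (string, error) {
	parts := strings.Split(strings.ToLower(id), ".")
	if len(parts) < 3 {
		return "", errors.New("need at least <tld>.<domain>.<package>")
	}
	for _, p := range parts {
		if p == "" {
			return "", fmt.Errorf("empty component in %q", id)
		}
	}
	labels := parts[:len(parts)-1]
	// Reverse the labels in place: [com utilities] -> [utilities com].
	for i, j := 0, len(labels)-1; i < j; i, j = i+1, j-1 {
		labels[i], labels[j] = labels[j], labels[i]
	}
	return strings.Join(labels, "."), nil
}

func main() {
	domain, err := ownerDomain("com.utilities.Utilities")
	fmt.Println(domain, err) // utilities.com <nil>
}
```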

Jon_Shier · February 3, 2021, 10:03pm

I don't believe reverse DNS that ties into the actual DNS system meets the core team's definition of an opaque identifier; otherwise they would've been okay with regular URLs, which we could normalize into similar notation. Of course, reverse DNS that's simply registered somewhere could meet that definition. In any case, requiring every package to own a unique domain seems like both a high bar and untenable in the long run, given the ongoing cost of registration.
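For illustration, here is one hypothetical normalization (the rule is an assumption, not something proposed in this thread): reverse the host labels, then append the path segments with a trailing `.git` dropped. The output still embeds `github.com`, i.e. the hosting location, which is exactly the property at issue.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeURL turns a repository URL into a reverse-DNS-style string:
// reversed, lower-cased host labels followed by the non-empty path segments.
func normalizeURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Host == "" {
		return "", fmt.Errorf("no host in %q", raw)
	}
	hostLabels := strings.Split(strings.ToLower(u.Hostname()), ".")
	var out []string
	for i := len(hostLabels) - 1; i >= 0; i-- {
		out = append(out, hostLabels[i])
	}
	for _, seg := range strings.Split(strings.TrimSuffix(u.Path, ".git"), "/") {
		if seg != "" {
			out = append(out, seg)
		}
	}
	return strings.Join(out, "."), nil
}

func main() {
	id, err := normalizeURL("https://github.com/apple/swift-nio.git")
	fmt.Println(id, err) // com.github.apple.swift-nio <nil>
}
```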

tachyonics · February 3, 2021, 10:55pm

I think the core team has been clear on this point and the difference between identities that are opaque with regards to hosting location compared to hosting location specific URLs.

Maybe. But it might be the best compromise to get to 1) package identifiers independent of hosting location, 2) unique package identifiers, and 3) a distributed package ecosystem.

tomerd · February 3, 2021, 11:06pm

Yes. The core idea is that the package identifiers are not used to deduce where the package is hosted.

The concrete identifier format (e.g. simple string, namespace/name pair, reverse DNS) is open to discussion as there are tradeoffs for the various options, and the forums are a great way to discuss this and inform the design.

John_McCall · February 3, 2021, 11:08pm

I wouldn't say it categorically doesn't meet our requirements, as long as there isn't some technical requirement imposed on the domain that would recreate the hosting problem, like adding special TXT records to DNS. It would have to be understood that organizations don't get to stably repudiate packages; for example, if my illicit shell company creates the package com.rjmccall.MoneyLaundr, and I realize too late that I shouldn't have published that under my real name, I have to recognize that renaming it might cause problems for my clients.

A random UUID would side-step that problem entirely, of course.
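For comparison, a random version-4 UUID (RFC 4122 layout) can be generated with the standard library alone; `newPackageUUID` is a hypothetical name. Such an identifier carries no domain or owner, so there is nothing to repudiate, but it is also not human-readable.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newPackageUUID returns a random RFC 4122 version-4 UUID string.
func newPackageUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newPackageUUID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```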

tomerd · February 3, 2021, 11:12pm

The way systems like Maven use DNS is solely to prove ownership at the time a "namespace" is registered, not to define where the package is hosted. As such, a DNS-based identity scheme (it does not actually have to be reverse DNS) could work if we use it in a similar fashion: proof of ownership when registering a namespace, then an opaque string for dependency resolution.
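The two phases can be sketched as a toy in-memory registry. Everything here is hypothetical: `Registry`, the account names, and the `verifyOwnership` callback, which stands in for whatever ownership proof a registry would actually use. The proof is consulted only in `ClaimNamespace`; `Publish` and `Versions` treat identifiers as opaque strings.

```go
package main

import (
	"errors"
	"fmt"
)

// Registry models the Maven-style flow: ownership is proven once per
// namespace, after which identifiers are plain strings.
type Registry struct {
	verifyOwnership func(namespace, account string) bool // stand-in for a DNS-based proof
	owners          map[string]string                     // namespace -> account
	releases        map[string][]string                   // "namespace.name" -> versions
}

func NewRegistry(verify func(namespace, account string) bool) *Registry {
	return &Registry{verifyOwnership: verify, owners: map[string]string{}, releases: map[string][]string{}}
}

// ClaimNamespace runs the one-time ownership check.
func (r *Registry) ClaimNamespace(namespace, account string) error {
	if owner, taken := r.owners[namespace]; taken && owner != account {
		return fmt.Errorf("namespace %q already claimed", namespace)
	}
	if !r.verifyOwnership(namespace, account) {
		return errors.New("ownership proof failed")
	}
	r.owners[namespace] = account
	return nil
}

// Publish checks the account against the stored claim; no DNS involved.
func (r *Registry) Publish(account, namespace, name, version string) error {
	if owner, ok := r.owners[namespace]; !ok || owner != account {
		return fmt.Errorf("%s does not own namespace %q", account, namespace)
	}
	id := namespace + "." + name
	r.releases[id] = append(r.releases[id], version)
	return nil
}

// Versions looks the identifier up as an opaque key.
func (r *Registry) Versions(id string) []string { return r.releases[id] }

func main() {
	r := NewRegistry(func(ns, acct string) bool { return ns == "com.utilities" && acct == "alice" })
	fmt.Println(r.ClaimNamespace("com.utilities", "alice"))
	fmt.Println(r.ClaimNamespace("com.utilities", "mallory"))
	fmt.Println(r.Publish("alice", "com.utilities", "Utilities", "1.0.0"))
	fmt.Println(r.Publish("mallory", "com.utilities", "Utilities", "6.6.6"))
	fmt.Println(r.Versions("com.utilities.Utilities"))
}
```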

iabudiab · February 4, 2021, 12:08am

Well, it seems, at least to me, that the discussion is going in circles, which is why I would suggest taking a step back and looking at this through a series of questions/use cases from the opposite direction: a reverse brainstorming, if you will.

As far as I can gather, the main points to consider are:

A decentralised model
Package identity can't be tied to its location
- => URLs as the package identifier are unacceptable (according to the core team)
- => Hence, opaque IDs are preferred
Namespacing is required, that is not location-based, to prevent conflicts

So here are some questions off the top of my head to bounce around. Some may seem trivial, but I am just thinking out loud here.

Assuming a decentralised package registry model:

Does hosting a public repository, for example on GitHub, mean that it is a published package?
Does the act of forking said repository mean publishing the same package again?
- If so, is it the same package at another location? or a new package?
- For example, is the announced ElasticSearch fork by AWS the same package as the original?
What if we use <org>/<repo> as <namespace>/<package>, i.e. OctoCorp/LinkedList as the ID?
- Assuming GitHub supports the Registry API, then there is no conflict within GitHub with this model
- A fork is then by definition a new package
What about the same <namespace>/<package> on another registry, e.g. GitLab?
- How should this work in SwiftPM? Should SwiftPM know about all the registries out there and scan each one of them for the package ID?
- Or should the "to scan" registries be configured beforehand?
What about mirroring, i.e. requiring <namespace>/<package> from a local registry that doesn't have said package?
- Again, should it just 404 or search for it in all the registries?
- How do we even know about all the registries?
What about hosting an internal registry for corporate use?
- What should happen if my internal namespace shadows another one in a configured registry?
Assume two registries are somehow configured and we require a package which exists in both. What should SwiftPM do?
- Demand manual intervention to specify the registry explicitly, as in <registry-id>/<namespace>/<package>, e.g. GitHub/OctoCorp/LinkedList
- Or try to resolve the graph by itself? How?
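The "demand manual intervention" option can be pinned down with a small sketch, assuming IDs of the form `<namespace>/<package>` with an optional `<registry-id>/` prefix and a hypothetical in-memory index per configured registry (`PackageRef`, `RegistryIndex`, `parseRef` and `resolve` are all illustrative names). An unprefixed ID resolves only if exactly one registry has it; otherwise resolution fails and asks for an explicit prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// PackageRef is a parsed identifier: <namespace>/<package>, optionally
// prefixed with a registry name as in <registry-id>/<namespace>/<package>.
type PackageRef struct {
	Registry, Namespace, Name string
}

func parseRef(s string) (PackageRef, error) {
	parts := strings.Split(s, "/")
	for _, p := range parts {
		if p == "" {
			return PackageRef{}, fmt.Errorf("empty component in %q", s)
		}
	}
	switch len(parts) {
	case 2:
		return PackageRef{Namespace: parts[0], Name: parts[1]}, nil
	case 3:
		return PackageRef{Registry: parts[0], Namespace: parts[1], Name: parts[2]}, nil
	}
	return PackageRef{}, fmt.Errorf("expected [registry/]namespace/package, got %q", s)
}

// RegistryIndex is a hypothetical view of a registry: the set of
// "namespace/package" keys it can serve.
type RegistryIndex struct {
	Name     string
	Packages map[string]bool
}

// resolve returns the registry to fetch from. An unprefixed ID must exist
// in exactly one configured registry; ambiguity is reported, not guessed.
func resolve(ref PackageRef, registries []RegistryIndex) (string, error) {
	key := ref.Namespace + "/" + ref.Name
	var hits []string
	for _, r := range registries {
		if ref.Registry != "" && r.Name != ref.Registry {
			continue
		}
		if r.Packages[key] {
			hits = append(hits, r.Name)
		}
	}
	switch len(hits) {
	case 0:
		return "", fmt.Errorf("%s not found", key)
	case 1:
		return hits[0], nil
	}
	return "", fmt.Errorf("%s is ambiguous (%s); prefix it with a registry id", key, strings.Join(hits, ", "))
}

func main() {
	registries := []RegistryIndex{
		{Name: "GitHub", Packages: map[string]bool{"OctoCorp/LinkedList": true, "OctoCorp/Queue": true}},
		{Name: "GitLab", Packages: map[string]bool{"OctoCorp/LinkedList": true}},
	}
	for _, id := range []string{"OctoCorp/Queue", "OctoCorp/LinkedList", "GitLab/OctoCorp/LinkedList"} {
		ref, err := parseRef(id)
		if err != nil {
			fmt.Println(err)
			continue
		}
		where, err := resolve(ref, registries)
		fmt.Println(id, "->", where, err)
	}
}
```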

Then there is the perspective of the package maintainer/publisher.

Do we really want to force a package maintainer to register a domain, or to host some key or well-known file, etc.?
Or to prove ownership in some other way, e.g. Maven-style when registering a namespace?

IMHO:

We shouldn't focus on preventing global ID conflicts in a decentralised model, but rather on defining a clean resolution strategy.
We should focus on making publishing packages as easy as possible, because that is how a diverse and flourishing ecosystem emerges.

mmarston · February 4, 2021, 12:14am

I see merit in what you are suggesting here. But I also see that the existing proposal afforded GitHub a clear, simple path to establishing proof of ownership for package owners publishing to GitHub Packages (since the URL-as-package-identity provided a clear mapping to a GitHub repository). It remains to be seen whether GitHub would be willing to take on the responsibility of verifying proof of ownership using a DNS-based identity scheme.

The discussion of the desired registry protocol can and should proceed whether or not any party is committed to implementing and operating the end result. And we absolutely should focus on a solution that meets the long term needs of the Swift community. But we should also recognize that GitHub may take some time to re-evaluate their plans for SwiftPM support in GitHub Packages based on the move away from URLs and so the registry proposal process may lose some momentum. (I'm assuming that GitHub currently bears most of the cost of serving SwiftPM packages today and is a driving force behind the registry protocol).

tomerd · February 4, 2021, 12:20am

Yes. The point made above was a direct reply to the claim that reverse DNS does not meet the core team's definition of an opaque identifier, which is not correct if we use reverse DNS in a similar fashion to Maven.

+1

Karl · February 4, 2021, 12:34am

I was wondering about that, but I suppose it would be too easy to spoof, and you'd have no way to independently verify if a package was who it claimed to be. It would give packages a portable identity, though.

Is it crazy to wonder about multiple package ID formats? So we might recognise reverse-DNS as a special ID that is portable and can only be used by domain holders. Those kinds of packages would basically get some badge or something to indicate "we know com.apple.swiftnio is linked to apple.com, and we have some way of verifying this registry listing with them independently so we know you're not getting some knockoff package info".

We might also allow for registry-local IDs. They would have to be visually distinct, and they wouldn't get any kind of special badge because we don't know who github.com:apple/swiftnio actually is, and if they move one day to bitbucket.com:apple/swiftnio, we might have to assume they're different packages.
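A sketch of how a client might tell the two formats apart, assuming reverse-DNS IDs look like `com.apple.swiftnio` and registry-local IDs look like `<host>:<owner>/<name>`. The `IDKind` type and `classify` function are hypothetical, and `strings.Cut` needs Go 1.18 or later.

```go
package main

import (
	"fmt"
	"strings"
)

// IDKind distinguishes the two hypothetical identifier formats.
type IDKind int

const (
	Invalid       IDKind = iota
	ReverseDNS           // e.g. com.apple.swiftnio: portable, checkable against apple.com
	RegistryLocal        // e.g. github.com:apple/swiftnio: only meaningful on that host
)

func (k IDKind) String() string {
	return [...]string{"invalid", "reverse-DNS", "registry-local"}[k]
}

// classify decides which format an identifier uses. Registry-local IDs are
// "<host>:<owner>/<name>"; anything with a colon that doesn't match is invalid.
func classify(id string) IDKind {
	if strings.ContainsAny(id, " \t") {
		return Invalid
	}
	if host, rest, found := strings.Cut(id, ":"); found {
		owner, name, ok := strings.Cut(rest, "/")
		if ok && host != "" && owner != "" && name != "" && !strings.Contains(name, "/") {
			return RegistryLocal
		}
		return Invalid
	}
	if strings.Contains(id, "/") {
		return Invalid
	}
	parts := strings.Split(id, ".")
	if len(parts) < 3 {
		return Invalid
	}
	for _, p := range parts {
		if p == "" {
			return Invalid
		}
	}
	return ReverseDNS
}

func main() {
	for _, id := range []string{"com.apple.swiftnio", "github.com:apple/swiftnio", "bitbucket.com:apple/swiftnio", "swiftnio"} {
		fmt.Printf("%-30s %v\n", id, classify(id))
	}
}
```

Because `github.com:apple/swiftnio` and `bitbucket.com:apple/swiftnio` are different strings, a client comparing IDs would treat them as different packages, matching the caveat in the post.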

mmarston · February 4, 2021, 12:49am

I think these are important points, but it seems to me that so many posts on this thread focus on the publishing aspect, and there hasn't been enough discussion around the user experience of a developer who consumes published packages.

I'd say making it as easy as possible to consume packages is just as important as, if not more important than, making it easy to publish them.

For example, preventing ID conflicts puts more work on registry providers and publishers, but resolving conflicts puts more work on the consumer. Addressing and resolving a potential conflict when someone attempts to publish a package means the problem is taken care of up front rather than requiring every consumer that encounters the conflict to resolve it on the client side.

What is the vision for the out-of-box experience when a user installs SwiftPM? Does it come pre-configured with one or more registries? For example, npm is preconfigured to fetch from npmjs.org, pip from pypi.org, gem from rubygems.org, cargo from crates.io, etc. Or does the installation process require the user to choose a registry?

The community members commenting in this thread don't seem to mind configuring a list of registries, choosing which ones they trust, and setting a priority order. But we need to think about the average developer and the new developer too.

iabudiab · February 4, 2021, 7:35am

I totally agree. However, I don't think that

is necessarily true. You've described a solution already: just ship SwiftPM with one preconfigured default registry, and provide extra UX on top, e.g. make switching the default registry a no-brainer. SwiftPM could be preconfigured with GitLab, Bitbucket, etc., but with one default like GitHub. Make adding new registries a no-brainer, e.g. swift registry add mycorp https://octocorp.com. Each compliant registry could be preconfigured with the same defaults and check for namespace collisions with the default, and so on.

A conflict can only arise if more than one registry is used, which is where a strategy for resolving it is required. It can live completely on the registry side or in SwiftPM. But of course, there are scenarios where user intervention is inevitable, which I would categorise as not something an average developer will face very often.

I can't speak for anybody else. It's not that a great consumer experience is any less important, but rather that it is easier to achieve, IMHO, once the main pain points are taken care of.

Here is how I thought about it:

Let's say the default registry is GitHub and we're using <namespace>/<package> as ID.
Each registry takes care of its own <namespace> collision
Every developer can just start consuming packages like before, .package(id: "OctoCorp/LinkedList")
Everyone can publish packages like before: push a tag to the repository.
A local mirror would also fetch from the default, i.e. GitHub
An internal corporate registry could check for <namespace> collisions with the default registry
- Either preventing me from using the package before I resolve the conflict, or until I explicitly define the resolution strategy (which is to be defined)
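The namespace collision check and the "refuse until resolved" behaviour could look roughly like this, assuming each registry exposes its set of namespaces and the user records explicit choices in a `pins` map. All names, including the `"internal"`/`"default"` labels, are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// namespaceCollisions lists namespaces present in both the internal registry
// and the default registry, in sorted order.
func namespaceCollisions(internal, defaults map[string]bool) []string {
	var out []string
	for ns := range internal {
		if defaults[ns] {
			out = append(out, ns)
		}
	}
	sort.Strings(out)
	return out
}

// registryFor picks where a namespace is fetched from. A colliding namespace
// is refused until the user records an explicit choice in pins.
func registryFor(ns string, internal, defaults map[string]bool, pins map[string]string) (string, error) {
	if choice, ok := pins[ns]; ok {
		return choice, nil
	}
	inInternal, inDefault := internal[ns], defaults[ns]
	switch {
	case inInternal && inDefault:
		return "", fmt.Errorf("namespace %q exists in both registries; pin one explicitly", ns)
	case inInternal:
		return "internal", nil
	case inDefault:
		return "default", nil
	}
	return "", fmt.Errorf("namespace %q not found", ns)
}

func main() {
	internal := map[string]bool{"OctoCorp": true, "Payments": true}
	defaults := map[string]bool{"OctoCorp": true, "Apple": true}
	fmt.Println(namespaceCollisions(internal, defaults)) // [OctoCorp]
	_, err := registryFor("OctoCorp", internal, defaults, nil)
	fmt.Println(err)
	r, _ := registryFor("OctoCorp", internal, defaults, map[string]string{"OctoCorp": "internal"})
	fmt.Println(r) // internal
}
```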

Jon_Shier · February 4, 2021, 6:59pm

I haven't had time to put all my thoughts together on this, but I think we should all be aware of today's news: JFrog is shutting down a variety of their hosted registries, including Bintray. To my mind this is all the evidence we should need that Swift shouldn't rely on any third party service, whatever the outcome of these discussions. Either an Apple / Swift service should form the root of our package ecosystem and be SPM's default, or nothing should, and the system should be fully and formally federated.

iabudiab · February 10, 2021, 4:54pm

Here is another relevant bit of news and a must-read regarding Package IDs without proper namespacing: Dependency Hijacking Software Supply Chain Attack Hits More Than 35 Organizations

And validating credentials in some form should probably be enforced.

tomerd · February 10, 2021, 10:55pm

One very interesting way to deal with supply chain attacks is a trust system based on research!rsc: Transparent Logs for Skeptical Clients. SE-0292 mentions this kind of system in its security section, and IMO this could be an additional layer to enforce identity authenticity, beyond requiring a process to prove namespace ownership.

gahms · February 11, 2021, 8:31am

Do I understand correctly that the primary reason the core team considers URLs as package identifiers to be unacceptable is that they would not allow me to force a fork of a package?

For example:
If I use package A, which in turn uses package B, but I want to use a specific fork of package B, I would not be able to do so if package A refers to package B by URL. The URL would point to a specific fork and force that fork to be used, no matter my wishes to point it elsewhere.

The underlying assumption here being that if a URL is used as identifier it must be assumed that the URL is not only used to identify the package B but also used to download the package.

marwan-at-work · February 12, 2021, 4:36pm

Just to bring in an example from a different language of how this was solved:

In Go, the issue with forks is solved by the replace clause which works well when you want to fork a library: Go Modules Reference - The Go Programming Language

It works even better because you can use the replace clause with a local filesystem path. This is helpful when you want to work on a library locally instead of upstream while keeping its identifier as the upstream URL. This is very common when you are developing your program and, at the same time, one of its dependencies, to test out a contribution you want to make or because it's simply one of your other libraries.
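To connect this to the fork scenario in the previous post, here is a toy model of replace-style overrides, with hypothetical `Module` and `sources` names and `example.com` identities. It mirrors one property of Go's replace directive: only the main (root) module's replacements take effect, and a dependency keeps its identity while its source location changes, to a fork URL or a local path.

```go
package main

import "fmt"

// Module is a hypothetical manifest: an identity, its dependencies, and
// replace-style overrides mapping dependency IDs to fork URLs or local paths.
type Module struct {
	ID       string
	Requires []string
	Replace  map[string]string
}

// sources resolves every module reachable from root to a source location.
// Only the root module's overrides apply; overrides declared by dependencies
// are ignored. Identities (the map keys) never change.
func sources(root string, mods map[string]Module, defaultLocation func(id string) string) map[string]string {
	overrides := mods[root].Replace
	out := map[string]string{}
	var visit func(id string)
	visit = func(id string) {
		if _, seen := out[id]; seen {
			return
		}
		loc := defaultLocation(id)
		if r, ok := overrides[id]; ok {
			loc = r
		}
		out[id] = loc
		for _, dep := range mods[id].Requires {
			visit(dep)
		}
	}
	visit(root)
	return out
}

func main() {
	mods := map[string]Module{
		"example.com/app": {ID: "example.com/app", Requires: []string{"example.com/a"},
			Replace: map[string]string{"example.com/b": "../b-local"}},
		"example.com/a": {ID: "example.com/a", Requires: []string{"example.com/b"},
			Replace: map[string]string{"example.com/b": "https://example.com/other-fork-of-b"}},
	}
	byID := func(id string) string { return "https://" + id }
	got := sources("example.com/app", mods, byID)
	for _, id := range []string{"example.com/app", "example.com/a", "example.com/b"} {
		fmt.Println(id, "=>", got[id])
	}
}
```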

As for packages with conflicting names, Go also solves this by letting you rename an import path identifier:

```go
import (
	"github.com/user1/pkg1"
	user2pkg1 "github.com/user2/pkg1"
)

// You can refer to the first import as "pkg1" and the second as "user2pkg1".
```

I'm admittedly not too familiar with SwiftPM's details, but could it potentially do something similar, maybe even at the manifest level rather than in the Swift source files?