URLs as Swift Package Identifiers

benrimmington · January 21, 2021, 6:41pm

I think this will depend on how (and when) the namespace ownership is verified:

e.g. by continuous integration testing.
e.g. by SwiftPM or Xcode (when resolving dependencies).
e.g. by the Package Registry Service (when publishing a release).

Having swift package edit for registry-based dependencies may also help.

I'd like to have support for import com.apple.NIO (or import com.apple::NIO) but I've no idea how difficult this would be to implement.

(There was a pitch for module selectors, where the :: separator was originally suggested.)

The import NIO statement would still be supported (with a warning and fix-it?) but the mangled Swift names would be derived from com.apple.NIO.*.

As I understand it, the recommendation is to use the same UpperCamelCase name for a package and its product. Your com.example.tools name should be com.example.Tools (if possible).

Another solution is to use com.example::Tools, if that separator is valid in path components. On macOS, that would be valid in the Terminal, but you'd have to use com.example//Tools in the Finder.

jechris · January 21, 2021, 6:58pm

Giving my opinion as an end user: I really don't want to have to claim an id of any kind in any global registry.
I really like the fact that right now SPM just uses URLs to identify a package. It makes things simpler than tools like CocoaPods, especially as a package maintainer.

It also lets me easily see where the source code should come from and make a comparison if I want to. With an opaque id I must first find where the source code should be coming from.

rauhul · January 21, 2021, 8:08pm

Also chiming in as an end user. While I agree with @jechris that claiming an id would be annoying, I am also very jealous of how simple it is to declare dependencies in a Cargo.toml file. There's just so much less ceremony than in a Package.swift.

John_McCall · January 21, 2021, 8:36pm

Yes, I think this is key. The fact that GitHub does this well is great for users, but making this a requirement, along with the expectation that the previous host has to maintain resources at the old URL forever, is unrealistic. Companies and hosts come and go. It's inappropriate for us to build this design around GitHub and its current technical capabilities.

A namespace like a domain name does not suffer from this problem because there are no technical reasons to ever change the namespace for an existing package. Even if a package changes ownership — e.g. because a new organization took it over, or conversely its old maintainers decided to abandon it — there's no reason that has to be reflected in the namespace. We can simply make it a condition of participating in the package system that you aren't allowed to "repudiate" an existing package that was registered in your namespace, even if it no longer has anything to do with you or your organization. Since it doesn't actually require resources or cooperation from the current organization, it's just the price of history.

If people find reverse DNS confusing, I don't think there's a fundamental reason not to use forward DNS. The point is just that (1) it's a string without any special technical significance, (2) different people and organizations are naturally likely to use different strings, and (3) there's an existing authority for determining ownership of strings so that the Swift project isn't caught in the crossfire.

Anyway, I don't think this needs to be a heavy burden. The goal of namespacing is to avoid the problem of having to make package names globally unique. If a package doesn't have a namespace, then we can just rely on its base name being unique, and generally things will work out. We can encourage using namespaces in various ways, like making the use of a namespace a condition of being added to various package lists and registries. Ben's post lays out what seems like a reasonable basis for gradual migration.

The analogue of this in your proposal requires URIs to be primarily treated as opaque strings that must be looked up in a registry for correctness. That leads to exactly the problems I brought up in my earlier post: it's drastically simpler to ignore that extra step and treat the URI as authoritative.

mattt · January 21, 2021, 9:37pm

Sorry, how do you mean? I don't see anything about this kind of solution requiring that URIs be treated as opaque strings or not.

It may be simpler for Swift Package Manager, but it makes everything else much more difficult.

If packages have opaque identifiers, then they need a registry to be resolved. That means additional configuration from the user, and additional tooling so that users can locate their dependencies.

If packages have opaque identifiers, then we need a name registry and verification process. That's more work for the registries and for maintainers.

We considered opaque identifiers early in the process, but rejected them in favor of URIs because they introduced a lot of friction and complexity for no discernible benefit. We already have a namespaced identity scheme for packages, and it's what we've been using from the start.

John_McCall · January 21, 2021, 10:01pm

The analogue under your proposal of registering a non-existent package just to forward it to the real thing would be to add a reference to some registry that says that https://github.com/my/package.git is actually found at https://github.com/mynew/package.git. But if you have to look up the URI in a registry in order to check whether it's been forwarded, you've basically implemented 9/10ths of what a non-URI proposal would need, and the URI can really just be any unique string that identifies the package.
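To make the forwarding lookup concrete, here is a minimal sketch. The table, its single entry, and the `resolveLocation` function are hypothetical illustrations, not part of SwiftPM or of any proposal in this thread. Note that the lookup key could be any unique string; nothing in it depends on the key being a URL.

```swift
// Hypothetical forwarding table: old package URL -> new package URL.
// The entry and the function name are illustrative assumptions, not SwiftPM API.
let forwards: [String: String] = [
    "https://github.com/my/package.git": "https://github.com/mynew/package.git"
]

/// Follows forwarding entries until no further entry exists.
/// Stops early if a URL repeats, so a cycle cannot loop forever.
func resolveLocation(of url: String, using table: [String: String]) -> String {
    var current = url
    var seen: Set<String> = [current]
    while let next = table[current], !seen.contains(next) {
        seen.insert(next)
        current = next
    }
    return current
}

let forwarded = resolveLocation(of: "https://github.com/my/package.git", using: forwards)
print(forwarded)
```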

Anyway, I think I understand your perspective on this.

mmarston · January 21, 2021, 10:05pm

That was one of the most illuminating statements in this whole forum discussion, helping me see the proposal in a different light from what I had assumed based on my prior experience with things that have used the word "registry".

I share this opinion.

We can continue to discuss the relative merits of URL-based vs opaque identifiers. But I think it is a stretch to say there is "no discernible benefit" (however, maybe that is true from GitHub's perspective).

mmarston · January 21, 2021, 10:45pm

I’ve seen several comments about not having a central authority, and that seems to be one of the underlying assumptions leading to the proposal to use URLs as package identifiers (that, along with ease of migration). I’m relatively new to the discussion around a SwiftPM registry so I don’t have the benefit of any prior discussion and decisions with regard to why not have a central authority.

Is it a philosophical objection against central authority? Is it to avoid having the cost of running and administering a central registry fall on the organization that operates the central registry? I realize there are a few modern package ecosystems that take a decentralized approach (e.g. go-lang), but most have a central registry (e.g. Maven Central, PyPI, RubyGems, NuGet Gallery, npmjs, crates.io).

If there is no central registry, I wouldn't be surprised if some company such as JFrog sets up an intermediate registry proxy and will call that the "central SwiftPM registry" (like they've done with gocenter.io).

mattt · January 22, 2021, 12:27am

I wouldn't say that I have a philosophical objection to a central authority, per se. It's more questioning the necessity, utility, and viability of this approach for this particular problem.

As I wrote above, reverse-DNS may have been the right decision in the '90s, but that decision was predicated on the conditions at the time. Is it still a good idea in the 2020s? Maybe. But enough has changed that it's worth questioning base assumptions.

The examples you cite are interesting. Java, Python, Ruby, and JavaScript were all first developed in the '90s, and their respective package registries / managers / indexes started in the early '00s. This was a time when SVN and SourceForge were the default, before Git and GitHub (and CI/CD and AWS and Heroku, the iPhone, and so on).

I think most of us would agree that a lot has changed since then. That's not to say that any of those are bad solutions or the wrong solutions — I just think we would do well to adjust our priors as part of our technical decision making process.

For example, the central registries you listed are many things at once:

A package / namespace registry
A searchable index
A CDN

Doing all of this has enormous costs, which require donations or a commercial offering. It also creates a single point of failure for things like Denial-of-Service attacks, lapses in security, or downstream consequences.

The example of moving from SVN to Git teaches us that it's easy to centralize a decentralized system, but it's much harder to decentralize something that's centralized. One of the interesting things about Swift Package Manager is that it's been decentralized from the very start. As such, we don't have the same constraints as other languages / ecosystems had when they chose their respective solutions.

Do we need to have a central registry acting to serve me code on GitHub, or could I cut out the middleman and get it myself? Do we need a separate registry for namespaces and package names, or can we delegate all of that to DNS?

I wouldn't either! In fact, we're already seeing ambitious new projects from the community that are stepping up to fill these roles:

https://swiftpackageregistry.com

To quote something I wrote earlier in the thread:

mmarston · January 22, 2021, 1:34am

The reasons that you've given for discounting these registries actually support them in my mind. You've stated that these are old and that a lot has changed. Yes, they are old, but they aren't all as old as you make them out to be [1]. To me, the fact that they are old indicates their approach has stood the test of time.

And yes, SVN and SourceForge were the default before Git and GitHub. To me that just goes to show that source control systems and source hosting providers change over time. So for longevity it seems best to have a plan for a durable registry that doesn't assume git and GitHub as eternal constants.

Do you want a thriving Swift community in 2040? Following the pattern of registries that have thrived for nearly 20 years seems like a good way to prepare the Swift package community to thrive for the next 20 years.

These problems don't go away by having a distributed system. Instead the community becomes dependent on many different hosting providers to operate their services with a high security and availability bar.

If my dependency graph includes packages hosted by 4 different providers then my ability to reliably perform builds in my CI system is now dependent on the availability of all 4. The availability of the whole is no better than that of the provider with the lowest availability.
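As a rough illustration of that arithmetic, the sketch below assumes four hypothetical providers with made-up availability figures and independent outages. Under those assumptions the combined availability is the product of the individual ones, which can never exceed the weakest provider's figure.

```swift
// Assumed, illustrative availability figures for four hypothetical hosting providers.
let providerAvailability: [Double] = [0.999, 0.9995, 0.995, 0.9999]

// If outages are independent, a build that needs all providers succeeds only when
// every one of them is up, so the availabilities multiply.
let combined = providerAvailability.reduce(1.0, *)
let weakest = providerAvailability.min()!

print("weakest provider: \(weakest)")
print("all providers together: \(combined)")
```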

Isn't that an example of the opposite, going from a centralized source control system (SVN) to a distributed/decentralized one (Git)?

[1] From the history I could find, PyPI launched in 2000, RubyGems in 2004, Maven Central somewhere in between, npmjs in 2010, NuGet in 2010 and crates.io in 2014.

Max_Desiatov · January 22, 2021, 11:11am

I think what @mattt means is that GitHub's approach with forks and pull requests, and alternative hostings like GitLab wouldn't be possible with SVN. Git decentralized the whole source control approach to make it more composable. People can then aggregate ("centralize") their repositories on an ad hoc basis as they find convenient. Solutions centralized by design are inherently inflexible in comparison.

I also would like to voice my concern that centralized registries and opaque identifiers resolution would put similar constraints on SwiftPM and the community. I was initially skeptical about the decentralized approach when SwiftPM was introduced, but in hindsight I admire the work of people who designed it this way. It's a big improvement after coming from the JavaScript world with NPM, where package management is much more painful.

So far I haven't seen convincing arguments for SwiftPM to discard all of the benefits it gained from decentralized design and to convert to a centralized approach mid-flight. Situations with redirects and proxies can be resolved as technical details within the same framework of the SE-0292 proposal. In comparison, switching to centralized registries in an established ecosystem just for the sake of it would be quite painful.

SlaunchaMan · January 22, 2021, 1:12pm

This is a good solution for the case where you want to move a package from one URL to another. But it does presume that you still have access to the old location. What happens if you create a Swift package at some new host which then fails? If you can't access the original URL to update it then you effectively lose the package. And even if the service doesn't go away, what's stopping someone from making a package hosting service, then altering the deal to raise the price once there are some popular packages on it? Those packages are now held captive behind the hosting company. I think this is my biggest concern when it comes to URLs as identifiers.

mattt · January 22, 2021, 2:46pm

Right — they're old enough that it makes sense to make sure the decision still makes sense today, but not old enough to be timeless and benefit from the Lindy effect. If you wanted a solution that has stood the test of time, look no further than the web itself: DNS, HTTP, and URLs.

Indeed they do. And you've seen plenty of long-standing projects like LLVM and GraphViz migrate to these new hosts. Again, I think this actually cuts against your point; why should we expect a centralized registry to be immune to the same fate?

Consider the counterfactual: What if the communities thrived in spite of their centralized registry? Based on my own experience working in the Ruby and JavaScript ecosystems, these registries feel increasingly out-of-step with emerging best practices.

Developers have different bars for security and availability, and a distributed system allows for solutions to meet those individual needs. In a CI context, you would absolutely want to use your own registry instead of relying on multiple sources. But if you're just playing around, you don't need that.

We, as developers, constitute a distributed system, and our consensus has moved from one technology preference to another. My point was that those of us who have made that transition from SVN to Git saw first-hand the difference moving from a centralized to a decentralized technology.

jechris · January 22, 2021, 6:55pm

Reading the thread again, I wonder why @iabudiab's proposal was discarded. Using something like:

.package("swift", owner: "apple")

We would avoid hard-to-understand reverse-DNS ids and URLs but still have a meaningful identity.

If we stay on the decentralised path (which I hope), we would also add the registry:

.package("swift", owner: "apple", source: "github.com")

It would then be up to each package provider/registry to decide who can publish under “apple”, although it naturally fits with what GitHub & co do today (username/organization).

Some people might argue that a decentralised approach lacks protection because anyone can claim .package("swift", owner: "apple", source: "bitbucket.org"). I would say that today nothing stops you from taking “apple” as a user/organisation name and publishing (malicious) code under it.

Although having .package("apple/swift", source: ...) or .package("apple@swift", source: ...) would be cool, I think.
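A minimal sketch of how such a declaration could be modelled follows. `PackageReference` and its `identity` property are hypothetical and not part of SwiftPM; the sketch assumes the source is part of the identity, so the same owner and name on another host count as a different package.

```swift
// Hypothetical model of the proposed declaration; none of this is SwiftPM API.
struct PackageReference: Hashable {
    let name: String
    let owner: String
    let source: String   // host or registry, e.g. "github.com"

    /// Identity that includes the source, so the same owner/name pair
    /// on two different hosts is treated as two different packages.
    var identity: String { "\(source)/\(owner)/\(name)" }
}

let official = PackageReference(name: "swift", owner: "apple", source: "github.com")
let lookalike = PackageReference(name: "swift", owner: "apple", source: "bitbucket.org")

print(official.identity)       // github.com/apple/swift
print(official == lookalike)   // false
```

Whether the source should be part of the identity is exactly the open design question: including it guards against lookalikes on other hosts, but it also makes a fork on another host a distinct package.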

haikuty · January 23, 2021, 12:49am

Haven't had a chance to read the whole proposal here, but followed a link from @mattt's tweet re centralized vs decentralized. I favor decentralized. Note that a primary registry can still arise if a community agrees to use one.

Would it be possible for a big company to have a private package registry for their internal code (on a private intranet server that remote offices can connect to) and also use some modules from a public registry without support for a distributed registry model? Because that seems like a must-have basic requirement for larger companies (though many fork repos internally and maintain their own private fork, of course).

haikuty · January 23, 2021, 1:08am

Good text from Twitter (@mxcl) that seemed like it should be part of this thread:

@ mattt @ JonBash I went with decentralized with SwiftPM for a variety of reasons but off the top of my head being able to swap out a dependency for a fork easily is important if you want a community that has the tooling available to fix issues in their dependencies.

@ mattt Centralized repos lead to stagnant packages with no clear resolution going forward. With URLs forks can emerge and replace stagnant packages.

@ mattt You can easily enough build a centralized system on top of a decentralized one also. Which would be my suggestion going forwards. Hopefully the base that is SwiftPM means it cannot be screwed up into something bad going forwards no matter what Apple do.

@ mattt Homebrew has become bad IMO now, and I took some of those lessons to SwiftPM. A good base cannot be twisted by the future maintainers into something that will in the long term be useless. Fingers crossed anyway.

@ mattt I came to this having tried and become repeatedly frustrated at just how difficult it is to fix your deps in all other languages. Have you ever tried? Just using a local copy of a dep so you can debug is usually frustrating. No wonder only 2% of users submits PRs to projects.

@ mattt The process needs to be trivial, people need a very simple method for fixing deps or the adage “fork it and fix it yourself” is actually just a slap in everybody’s face. Forking is easy, but what exactly do you do next? SwiftPM was initially designed so this was easy.

NeoNacho · January 23, 2021, 1:26am

I don't think anybody is suggesting that overriding dependencies with forks via URLs wouldn't be possible anymore. I would even say that's one of the main reasons why the deduplication aspects of package identity are so important. If https://github.com/apple/swift-nio and https://github.com/me/swift-nio become truly different, we would actually break forking entirely.

John_McCall · January 23, 2021, 2:27am

Right, exactly. You should be able to build with your homegrown fork of any arbitrary package you want without having to edit every package that depends on it to replace the URL.

Currently, deduplication is broken because the package-uniqueness system is broken. That has to be fixed.
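The relationship between identity, deduplication, and fork overrides can be sketched as follows. This is a simplified illustration under a stated assumption: identity is taken to be the lowercased last path component of the URL with any ".git" suffix removed. The `identity(ofURL:)` helper and the override table are hypothetical, not SwiftPM API.

```swift
// Hypothetical sketch: identity here is the last path component of the URL,
// lowercased, with any ".git" suffix removed. This is a simplifying assumption.
func identity(ofURL url: String) -> String {
    var last = url.split(separator: "/").last.map { String($0) } ?? url
    if last.hasSuffix(".git") { last = String(last.dropLast(4)) }
    return last.lowercased()
}

// Dependency URLs as declared by two different packages in the graph (illustrative).
let declared = [
    "https://github.com/apple/swift-nio",
    "https://github.com/apple/swift-nio.git",
]

// A root-level override naming a fork; it applies to everything sharing the identity.
let forkURL = "https://github.com/me/swift-nio"
let overrides = [identity(ofURL: forkURL): forkURL]

// Deduplicate by identity, then substitute the fork where an override exists.
var chosenSources: [String: String] = [:]
for url in declared {
    let id = identity(ofURL: url)
    chosenSources[id] = overrides[id] ?? chosenSources[id] ?? url
}
print(chosenSources)
```

Because both declared URLs and the fork map to the same identity, a single override at the root replaces the dependency everywhere without editing any of the depending packages.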

Frizlab · January 26, 2021, 12:41pm

Reading this whole thread, my understanding is that two notions are being mixed up.

AFAICT there is the notion of package ID, and the notion of package source.
These must be, IMHO, totally independent, otherwise private forks and/or local package overrides would not be possible, or would be complex to set up.
The package ID should be defined inside the repository (currently we have the package name; we can say it’s its ID).

To me the optimal solution would be to define a dependency on a package like so (based on @iabudiab's proposal, also mentioned by @jechris):

/* Note the id here is totally arbitrary. I used a reverse-DNS identifier
 * because conflicts are close to non-existent w/ this, but the ID could be
 * any Swift string.
 * It **must** match the id at the given source though, and is not optional. */
.package(id: "com.apple.combine", source: "https://github.com/apple/combine")

We can currently define a package dependency this way (using name instead of id), but name is optional.
I think the id in the package definition must not be optional, redundant though it seems, because (1) it is the only way for the compiler to verify that the package it got from the given source is the expected one, and (2) it is easier for the reader of the source code to get the id of the package directly from the Package.swift than to follow the source of the package.

To retrieve the package, a local configuration could be applied to force the use of a specific package registry and/or proxies, or GitHub could reply “I have a registry for this, you can use it”, and the client would use it or not depending on a local configuration (the default would be to use it).
When the package is resolved using the determined source, the compiler would verify the retrieved package has indeed the expected ID. If it does not, compilation fails.
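The verification step described above might look roughly like this. All types and the `verify` function are hypothetical; SwiftPM has no `id:` parameter, and the sketch only models the check that the id declared inside the fetched repository matches the one the dependent manifest expects.

```swift
// Hypothetical types; SwiftPM has no `id:` parameter as of this discussion.
struct Dependency {
    let id: String       // identifier the depending manifest expects
    let source: String   // where to fetch the package from
}

struct FetchedPackage {
    let declaredID: String  // identifier found inside the fetched repository
}

struct IdentityMismatch: Error, Equatable {
    let expected: String
    let found: String
}

/// Fails resolution when the fetched package does not declare the expected id.
func verify(_ fetched: FetchedPackage, satisfies dependency: Dependency) throws {
    guard fetched.declaredID == dependency.id else {
        throw IdentityMismatch(expected: dependency.id, found: fetched.declaredID)
    }
}

let dependency = Dependency(id: "com.apple.combine", source: "https://github.com/apple/combine")

// A private fork at another source still passes, as long as it declares the same id.
try verify(FetchedPackage(declaredID: "com.apple.combine"), satisfies: dependency)

do {
    try verify(FetchedPackage(declaredID: "com.example.other"), satisfies: dependency)
} catch let error as IdentityMismatch {
    print("resolution failed: expected \(error.expected), found \(error.found)")
}
```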

mattt · January 26, 2021, 1:19pm

I don't think that's necessary. Swift Package Manager doesn't currently make such a distinction, and yet forks and overrides still work.

What you're proposing here is essentially to make the name parameter (added by SE-0226) in dependency declarations mandatory. I think @tonyarnold has the best take on this:

For id to work, you need a centralized registry to coordinate who gets what identifiers, which we don't have. URIs provide the same uniqueness guarantees without requiring a central naming authority or any changes to package manifests.