Any plans for SwiftPM to include owner name in package identity?

today, SPM does not really have a great way of dealing with very generic, common package names. for example, one might have a package basics that would appear in a Package.resolved file like:

{
    "identity" : "basics",
    "kind" : "remoteSourceControl",
    "location" : "https://github.com/tayloraswift/basics.git",
    "state" : {
        "revision" : "980c9b0d117e7ac955d5caf38306f9c38e0dcaa3",
        "version" : "0.10.4"
    }
},

this doesn’t really cause problems within a single build tree, since colliding package names are already forbidden. (although one might become very frustrated when trying to add a dependency on a package with a name conflict.) but when building “bird’s eye” package tooling that analyzes many packages from many authors, it becomes necessary to assign a more-unique identity to such packages, like tayloraswift.basics.

an extreme solution would be to forget about the identity field and just use the git repo URL as the package identity. but the URL can take a lot of non-canonical forms, and it is not straightforward to discard irrelevant URL features in a general manner, especially if packages are mirrored across multiple hosting providers.

even today, there is no public, centralized package registry that we can kick the problem to, and no plans from the language team to create/fund one, so it is motivating to come up with an appropriate decentralized rule for identifying packages until such a thing exists.

i propose we settle on using the penultimate URL path component as the scope component of a package identifier. therefore all of the following package URLs would coalesce to the same package identifier tayloraswift.basics:

https://github.com/tayloraswift/basics
https://gitlab.com/tayloraswift/basics
https://mycompany.com/mirrored-repos/tayloraswift/basics
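to make the rule concrete, here is a minimal sketch of how such an identifier could be derived. the function name `scopedIdentity(for:)` is hypothetical and not part of SwiftPM, and the sketch assumes a `scheme://host/path` location; anything else simply gets no scoped identity.

```swift
import Foundation

// Hypothetical helper, not a SwiftPM API: derives "scope.name" from the
// last two path components of a "scheme://host/path" package location.
func scopedIdentity(for location: String) -> String? {
    // Only URL-style locations are handled in this sketch.
    guard let schemeEnd = location.range(of: "://") else { return nil }
    // The first piece is the host; the remaining pieces are path components.
    let pieces = location[schemeEnd.upperBound...]
        .split(separator: "/")
        .map { String($0) }
    let path = Array(pieces.dropFirst())
    // A scope needs at least two path components: owner and package name.
    guard path.count >= 2 else { return nil }
    let scope = path[path.count - 2]
    var name = path[path.count - 1]
    // Treat "basics.git" and "basics" as the same package name.
    if name.hasSuffix(".git") { name.removeLast(4) }
    return "\(scope).\(name)"
}

let examples = [
    "https://github.com/tayloraswift/basics",
    "https://gitlab.com/tayloraswift/basics",
    "https://mycompany.com/mirrored-repos/tayloraswift/basics",
]
for location in examples {
    print(scopedIdentity(for: location) ?? "(no scope)")
}
```

under these assumptions all three locations map to the same identifier, while a URL with a single path component, or an scp-style SSH location, yields no scoped identity at all.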

WDYT?

That seems like a poor assumption for anything that isn’t a repository manager. For my personal repos, the penultimate component is “source”. For dependencies specified by SSH, there may not even be a penultimate component.

I realize this is a longstanding problem, but adding semantics on top of URLs isn’t going to cut it, because SwiftPM doesn’t control URLs. It’s bad enough that Package.swift has to be at the root of a repo.


would it be sufficient to special-case local and SSH dependencies to receive some implicit scope such as _local, and reserve scoped package identities for remote URL dependencies?

i’ll note that the Swift Package Index is already effectively using GitHub username as a package scope, which is the penultimate path component of GitHub URLs. so this is really just codifying existing practice.

I don’t mean local dependencies; I mean dependencies I host myself (Belkadan Software - Source). I realize this is relatively unusual in our current day and age, but if you want parent directories to mean something, I think you should have a closed list of hosts where they do, rather than an open list minus some cases where they don’t.
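As a rough sketch of what a closed list could look like: the host list and the name `ownerScope(for:)` below are assumptions made up for illustration, not anything SwiftPM ships, and the sketch only accepts locations shaped exactly like `scheme://host/owner/repo`.

```swift
import Foundation

// Assumption for illustration: hosts whose URLs are known to look like
// "host/owner/repo". This list is hypothetical.
let hostsWithOwnerPaths: Set<String> = ["github.com", "gitlab.com"]

// Hypothetical helper: returns an owner scope only for allow-listed hosts.
func ownerScope(for location: String) -> String? {
    guard let schemeEnd = location.range(of: "://") else { return nil }
    let pieces = location[schemeEnd.upperBound...]
        .split(separator: "/")
        .map { String($0) }
    // Expect exactly host/owner/repo; anything else gets no scope.
    guard pieces.count == 3, hostsWithOwnerPaths.contains(pieces[0]) else {
        return nil
    }
    return pieces[1]
}

print(ownerScope(for: "https://github.com/tayloraswift/basics") ?? "(no scope)")
print(ownerScope(for: "https://example.com/source/basics") ?? "(no scope)")
```

With this shape, a self-hosted URL whose parent directory happens to be "source" is never given a scope by accident; it just stays unscoped.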

Really, though, I think the whole URL is a much better identifier, with special rules to do just enough canonicalization (ignore “.git”, ignore protocol, maybe translate the non-URL form of git SSH to the URL form). That’ll have a much smaller long tail of bugs than trying to break up URLs into pieces, and it will be much more obvious when things don’t match that should (rather than the other way around). But I think you’ve gone down this route before, and it has its own problems?
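Roughly what I have in mind, as a sketch rather than a proposal for actual SwiftPM behavior: `canonicalLocation(_:)` is a hypothetical name, and dropping the `user@` part is an extra assumption of mine so that the SSH and HTTPS forms of the same repository compare equal.

```swift
import Foundation

// Hypothetical helper: reduces a package location to a comparison key by
// applying a few narrow canonicalization rules to the whole URL.
func canonicalLocation(_ location: String) -> String {
    var key = location
    if let schemeEnd = key.range(of: "://") {
        // URL form: ignore the protocol.
        key = String(key[schemeEnd.upperBound...])
    } else if let colon = key.firstIndex(of: ":") {
        // scp-like SSH form "user@host:path": turn the first ":" into "/".
        key.replaceSubrange(colon...colon, with: "/")
    }
    // Assumption: drop a "user@" prefix that appears before the first "/".
    if let at = key.firstIndex(of: "@"),
       let slash = key.firstIndex(of: "/"),
       at < slash {
        key = String(key[key.index(after: at)...])
    }
    // Ignore trailing slashes and a trailing ".git".
    while key.hasSuffix("/") { key.removeLast() }
    if key.hasSuffix(".git") { key.removeLast(4) }
    return key
}

let forms = [
    "https://github.com/tayloraswift/basics.git",
    "ssh://git@github.com/tayloraswift/basics.git",
    "git@github.com:tayloraswift/basics.git",
]
for form in forms {
    print(canonicalLocation(form))
}
```

The three GitHub forms collapse to one key, but the host stays part of that key, so a copy on another host remains a different package unless something else says otherwise.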

If we want to continue special-casing GitHub for display purposes, or “GitHub and this other list of repository hosting companies”, that seems fine to me. I just wouldn’t want that to affect behavior more than it already does. It’s the difference between “SwiftPM makes GitHub more convenient because it’s the common case” and “SwiftPM makes arbitrary URLs more brittle because they’re not the common case”.

(I don’t think “mirrored across hosts” is a good argument for equivalence, by the way, because the states of the repositories could be very different, so it matters which one is picked as canonical.)

that’s a fair point.

as git is a version control system, the state of the repository is already identified by the git tags in each mirror. it’s possible that some mirrors might be ahead of others and contain tags others do not, but arguably that is a feature, not a bug.

this is why encoding hostnames in package identities is problematic, because hostname is a DNS concept that is used for a lot of things besides identifying resources. for example, selecting a mirror that is colocated in the same AWS region as a CI runner.

To me that sounds like the same kind of issue as “I want to use this fork of a package throughout my package graph instead of the default”. I don’t remember if SwiftPM has an answer for this yet.

What is actually wrong with URLs? Other than GitHub allowing with and without .git?

the problem with URLs is that many packages can be built in multiple “ways” that may be optimal for a particular build infrastructure. these build modes often involve using environment variables to rewrite the URLs as Package.swift is evaluated. one visible example of this is the Apple open source packages, which rewrite URLs into local file system dependencies when building on the swift CI. encoding the full URL as a package identifier causes many duplicate packages to appear in index tooling.

It sounds like a coincidence that that even works with any scheme, then.

For comparison I’ll point you to Cargo’s “patch” mechanism. Note that they special-case crates.io, but allow arbitrary URLs as IDs as well.

https://doc.rust-lang.org/cargo/reference/overriding-dependencies.html