URLs as Swift Package Identifiers

NeoNacho · January 20, 2021, 7:02pm

Intermediate registries aren't part of the actual proposal, though, are they? They are listed in Future Directions right now: https://github.com/apple/swift-evolution/blob/64922bdf9e218a1673937a35c7d79d33f29dde99/proposals/0292-package-registry-service.md#intermediate-registry-proxies

mattt · January 20, 2021, 7:07pm

They were put under Future Directions in an attempt to scope the work (they aren't a necessary feature for early adopters, since you can do the same with mirrors and other mechanisms), but we could certainly incorporate them into the main proposal before submitting for second review.

johannesweiss · January 20, 2021, 7:09pm

Fantastic, I'm glad to hear that, and sorry I missed it. I thought the intermediate registry proxies were just meant for corporate and other special environments, not something general users would use. But of course, you're right, we could just make everybody configure that intermediate registry proxy to something like https://swiftpackageindex.com or so (assuming they're willing to implement it).

Isn't that similar to every user having configured exactly one registry?

New assumption:

all users fall into exactly two camps:
1. the ones that do not have a config set-registry-proxy
2. the ones that have a config set-registry-proxy

If my assumption is correct, then I have two sets of responses:

For users in camp 1 (no registry proxy): All the questions in the previous post that assumed no configured registry over here.
For users in camp 2 (registry proxy configured):
Don't all users in camp 2 essentially have just one single registry that decides everything? If that's the case what is the problem that URIs solve?
- There's no extra trust (because we have to trust the registry proxy).
- Location is in the package ID (the URI) but it doesn't actually apply.
- If there is essentially just one single configured registry, what would be the problem with opaque IDs?

What I understand right now (may be wrong again ofc) is that to solve the bootstrapping issue, we'd start with everybody in camp 2 and have everybody configure their registry proxy to something that works from day 1. Is that accurate?

mattt · January 20, 2021, 7:21pm

Users can start using the registry interface today by setting up a registry service (like my reference implementation) and using mirrors / intermediate registry to resolve through that.

Once GitHub adds support, users can start taking advantage of it for their dependencies hosted on github.com without any additional configuration (just pass the --enable-package-registry option). I don't know and couldn't speak for what GitHub's plans are, as far as timing, whether it's opt-in, or the nature of how they would support it.

The story is the same for dependencies hosted elsewhere: If the host supports the registry interface (whether that's gitlab.com or my-skunk-works-git-host.com), Swift Package Manager would try to resolve using that.

By configuring to resolve everything through a single, chosen registry, you are indeed trusting them to report available versions and provide source archives. As I mentioned before, checksums allow us to verify the integrity of downloaded code.

When going directly through a single, configured registry in this way, the URIs effectively become opaque identifiers. I believe the advantage of using URIs is that it supports a seamless migration from our current URL-based system to the proposed registry interface ("escalators don't break, they become stairs").

If your registry serves up github.com/Alamofire/Alamofire v 5.4.0, it's clear what URL you could go to out-of-band to audit that code, find documentation, and see a changelog.

The one other benefit that I'd like to note about URIs, that I haven't discussed so far, is that URIs can establish relationships between forked repositories. If a CVE is issued for github.com/mona/LinkedList, it follows that forks of that repository are likely to have the same vulnerability. We can determine shared history and therefore shared vulnerability through GitHub's website in a way that opaque identifiers can't.

johannesweiss · January 20, 2021, 7:42pm

Hmm, this reads like we'll have to have precisely one registry to start with. And likely forever. I just don't see all relevant hosting providers (correctly) implementing this and we even reduce their incentives to implement it because to bootstrap everybody will use the registry proxies.

I don't think the mirror setup works. I don't think we should start to circulate a huge mirrors file for all repositories. Then essentially the users are doing the job of a registry by distributing a large file that always has to be kept up to date (similar to how the hosts file was circulated on the ARPANET, if that's what it was called, before we had DNS).

If we accept that the current proposal will essentially be used as one global registry (called the "registry proxy") + URIs as opaque identifiers, then I think we made progress in the discussion.

We can then start to argue if URIs are good or bad opaque IDs. My opinion is that they are bad opaque IDs because users don't think they are opaque, de-duping becomes hard or impossible, the location doesn't mean anything, ...

(this paragraph is a later edit of the post): As others have pointed out, "registry proxies" aren't actually proposed, but I'm still operating as if they were. Without them we can't bootstrap the registry ecosystem and we're locked into GitHub [if/when they even implement]. Why? See the questions at the end of this post, because without registry proxies I think my assumptions are actually correct.

With opaque IDs, the registry would also know where the code lives, so it can query whatever attributes it wants to. And yes, it should definitely query GitHub's security information.

mattt · January 20, 2021, 8:00pm

To be clear: you wouldn't configure github.com or swift.pkg.github.com as your intermediate registry client-side. As soon as GitLab or another host implemented Swift registry support, that would be a few more packages that benefit from speed and security improvements of that interface.

Again, that's where the intermediate registry configuration comes in. Mirrors are really for one-offs; an intermediate registry is for handling all package routing. Edit: The mirrors approach used in the test harness I linked to is temporary scaffolding.

That's not guaranteed. A requirement of enterprise artifact repositories like CodeArtifact and Artifactory is that they can store artifacts without getting access to the original source code.

johannesweiss · January 20, 2021, 8:12pm

Yes, I understand. I was more thinking of configuring https://mattts-implementation.registry or https://swiftpackageindex.com or whatever the community settles on. Likely there'll be one that turns out to be "the one" that everybody configures to keep everything working and benefit from the faster downloads etc. My assumption is that swift.pkg.github.com won't actually be there when SwiftPM's registry feature launches.

I'm not super convinced they'll all implement it. And without all the relevant ones implementing it, I need a registry proxy.

Yes, I understand. I just phrased it badly. Bottom line is that everybody will need a registry proxy IMHO.

Yes, they don't need access to serve you the artefacts. But from what I understand, they do have metadata, so they will still know where it's from (and they can know where it's actually from, unlike the URIs where it may or may not be there). So you can easily imagine some cron-job like thing that periodically checks if new versions have to be put on the bad list for security vulnerabilities.

mattt · January 20, 2021, 8:22pm

Thanks for clarifying. I'm actually not too worried about this; in fact, I'm encouraged by what will come of competition. Since our proposal works without configuring a registry, any such service would necessarily be a value-add in some respect. Broadly speaking, I don't think a central registry will be faster than connecting to the code host directly (though you could imagine region-specific offerings). Other services may try to differentiate on being more secure, more auditable wrt/ licensing, or having integrations with enterprise software. I don't think this would be a winner-take-all.

And besides, the server API is designed to be simple enough that anyone can host a registry themselves with static files.

And that's totally fine. Swift Package Manager can continue to resolve dependencies on gitlab.com by cloning Git repositories as it does now.

Maybe at first, when this feature is in its early stages. But that scaffolding can be removed once GitHub and other hosts implement registry support.

My point is that URIs encode the inherent relationships between projects in a way that opaque identifiers can't. A registry could associate equivalent metadata, but that would be an additional step.

finestructure · January 20, 2021, 9:10pm

But do they, in a way that’s reliable and useful? If I understand correctly, you’re suggesting to associate forks based on URI alone, right? Wouldn’t that open the door to an unrelated CVE-ridden repository with a matching name triggering false alarms?

If that’s the case then I think that’s probably an example why I have the nagging feeling that URIs aren’t great as ids. It’s just too tempting to parse them when all that should be done is compare them.

mattt · January 20, 2021, 9:15pm

Sorry, that's not exactly what I'm saying. You can't know the relationship between github.com/mona/LinkedList and github.com/octocorp/LinkedList from the URIs alone. But the resources located by those URLs can tell you what they are, independently of the registry. An opaque identifier would require an additional piece of metadata to make that connection.

Every external Swift package dependency is already identified by URL. Our proposal is to continue this tradition, rather than invent a new layer of abstraction.

finestructure · January 20, 2021, 9:33pm

Right, I see what you mean now, thanks for clearing that up!

benrimmington · January 20, 2021, 11:43pm

It might be possible to gradually migrate to reverse-DNS namespaces.

The package manifest would have a new namespace parameter:
```
 // swift-tools-version:6.0

 import PackageDescription

 let package = Package(
+  namespace: "com.apple",
   name: "swift-nio",
```
The namespace argument would be:
- required by the Package Registry Service;
- optional for existing Git-only packages.

External dependencies, on other packages and their products, would have fully-qualified names:

 dependencies: [
   .package(
+    name: "com.apple.swift-nio",
     url: "https://github.com/apple/swift-nio.git",
     from: "3.0.0"
   ),
 ]

 dependencies: [
   .product(
-    name: "NIO",
-    package: "swift-nio"
+    name: "com.apple.NIO",
+    package: "com.apple.swift-nio"
   ),
 ]

Internal dependencies, on targets in the same package, wouldn't need to be qualified.

Content negotiation would redirect to a registry endpoint base URL, containing a fully-qualified package name as its last path component.

Package dependencies could also use registries directly, in which case the fully-qualified name would be derived from the URL:

 dependencies: [
   .package(
-    name: "com.apple.swift-nio",
-    url: "https://github.com/apple/swift-nio.git",
+    url: "https://swift.pkg.github.com/com.apple.swift-nio",
     from: "3.0.0"
   ),
 ]

When a registry proxy is configured, the url argument could either be omitted (i.e. name only) or have ~~a custom scheme~~ the last path component only:

 dependencies: [
   .package(
-    url: "https://swift.pkg.github.com/com.apple.swift-nio",
+    url: "com.apple.swift-nio",
     from: "3.0.0"
   ),
 ]

swift package config set-registry-proxy 'https://swift.pkg.github.com/'

A local file-system registry could be supported, if the endpoints were changed:

/com.apple.swift-nio/index.{html,json}
/com.apple.swift-nio/3.0.0/archive.zip
/com.apple.swift-nio/3.0.0/index.{html,json}
/com.apple.swift-nio/3.0.0/Package{,@swift-4.2}.swift

ETA: relative URL instead of custom scheme; index.{html,json} instead of release{,s}.json.

mattt · January 21, 2021, 12:48am

Thank you so much for thinking more about this and sharing your solution. In my opinion, this is the only viable alternative that I've seen so far.

I'll take some more time to review this tonight, but here are some initial thoughts #hottakes:

benrimmington:

The package manifest would have a new namespace parameter:

 // swift-tools-version:6.0

 import PackageDescription

 let package = Package(
+  namespace: "com.apple",
   name: "swift-nio",

I wonder if having this in source would cause problems for forks. Consider the following scenario:

1. Alice forks github.com/mona/LinkedList
2. Alice makes changes to the package, including the namespace so that she can use her fork
3. Later, Alice submits changes upstream, forgetting about the diff in Package.swift
4. Mona accepts the PR, overwriting the old namespace
5. ???

Another scenario:

1. Bob publishes a tutorial with sample code for how to publish your first package
2. Carol forks the project and tries to publish the package, but forgets to change the namespace argument
3. ???

benrimmington:

External dependencies, on other packages and their products, would have fully-qualified names:

 dependencies: [
   .package(
+    name: "com.apple.swift-nio",
     url: "https://github.com/apple/swift-nio.git",
     from: "3.0.0"
   ),
 ]

 dependencies: [
   .product(
-    name: "NIO",
-    package: "swift-nio"
+    name: "com.apple.NIO",
+    package: "com.apple.swift-nio"
   ),
 ]

Would this impact the import statement at all? Is it still import NIO or import com.apple.NIO? If the former, how would we resolve module name collisions?

Taking a step back: One of the main arguments I've heard in favor of reverse-DNS is that it has a long history of use in Java, going back to the late '90s. Back then, there were only a handful of TLDs. From Wikipedia:

The initial set of generic top-level domains, defined by RFC 920 in October 1984, was a set of "general purpose domains": com, edu, gov, mil, org. The net domain was added with the first implementation of these domains.

It occurs to me that we have many more TLDs today (at least 310 according to DNSimple). If a requirement of the design is that identifiers not be confused as URLs, this may be a problem since popular packages overlap with generic TLDs. If Alice publishes a package named tools at example.com (com.example.tools), Mallory could register example.tools and create a subdomain com that resolves to https://com.example.tools.

If the trend for more gTLDs continues, does this start to become a problem?
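
To make the ambiguity concrete, here is a minimal sketch in Swift. The `sampleTLDs` set and the `couldBeReadAsHostname` function are hypothetical names made up for illustration, and the set is a tiny hand-picked sample, not the real list of TLDs. It only shows that a reverse-DNS identifier whose last label happens to be a TLD is also a syntactically plausible hostname.

```swift
/// Hypothetical, tiny sample of TLDs; a real check would consult the full list.
let sampleTLDs: Set<String> = ["com", "org", "net", "dev", "tools"]

/// Returns true when the last label of a dotted identifier is a TLD in the
/// sample, i.e. when the identifier could also be read as a hostname.
func couldBeReadAsHostname(_ identifier: String) -> Bool {
    guard let lastLabel = identifier.split(separator: ".").last else { return false }
    return sampleTLDs.contains(lastLabel.lowercased())
}

print(couldBeReadAsHostname("com.example.tools"))    // last label "tools" is in the sample
print(couldBeReadAsHostname("com.apple.swift-nio"))  // last label "swift-nio" is not
```

Whether this matters in practice depends on whether identifiers are ever handed to something that treats them as hostnames.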

ktoso · January 21, 2021, 4:30am

Oh, the reverse-DNS style isn't really concerned with what exactly the domains are; it only uses them as a simple proof of ownership.

Not sure why the TLD attack concern, but I can walk you through how claiming a namespace works in maven central, so a secure registry which takes care of namespaces and verifies ownership would do that or something similar. (There can be registries which don't care and don't check of course, but that's a repo I would not use at work for example etc.)

Say I want to publish artifacts under so.kto, which looks silly, but I own the kto.so domain, so I might as well use that as the example. I'd have to:

1. make an account, using an email in that domain (and confirm it),
2. apply to "claim the so.kto namespace",
3. prove that I "own that namespace". This step involves a human being actually checking things. In Maven Central's case, an actual Sonatype employee looks at whether my claim seems legit and checks whether the account really seems to be who it claims to be:
   - they check if you registered using an email address in the same domain
   - they may ask you to put some TXT record into the domain
   - they may confirm that the project you're intending to publish is also owned by the same entity (e.g. we want to publish com.apple.swift-nio, is the repo being published apple/swift-nio on github?)
   - if it's a github domain (com.github.ktoso) they'd check if the name and email you signed up with match the email on the github account etc.
4. get approved; the account now has rights to publish to so.kto and I can push my artifacts, yay!

The same with com.google or com.apple etc. There has to be sufficient proof that the person signing up "represents the domain". The more known the domain the more checks of course.

Then such repositories rely on the initial owner requesting "oh, and Bob also needs to publish here, please allow him to". So the repository maintainer checks stuff and adds permissions. Some of that can be automated; in Maven Central it isn't, because human checks are involved, but it could be. It's up to the repo how to deal with this.

There can of course be other repos which don't check, and allow anyone to publish com.google stuff. The solution for secure builds is simple: don't ever use repos you don't trust. Some organizations don't use any public internet repos at all, and only import artifacts they trust into an internal mirror repository.

So it really isn't so much about "the domain" as about the domain being a nice starting point for a tree. Say I got access rights for com.apple.swift, so some Apple/Swift admin gave it to me. It does not mean I can publish to com.apple.sparkle though. So the reverse-DNS scheme allows namespace/domain owners to further divide the namespace given their organizational requirements.

No one can usually register the "com" namespace, or just a full TLD, in such registries; a namespace has to have at least two parts. I guess one could try, but by now that ship has sailed: there are already com.something namespaces claimed, and such a person would have to prove that all of them belong to them specifically, which they can't. So a lower-level domain can't be claimed once there's already stuff published under it. In practice this prevents weird take-over attacks.
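
The rules described above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: `NamespaceRegistry`, `claim`, `grant`, and `canPublish` are hypothetical names, the account names are made up, and the human verification steps are not modeled at all. It is not how Maven Central is implemented.

```swift
/// Hypothetical in-memory model of namespace claims (illustration only).
struct NamespaceRegistry {
    var owners: [String: String] = [:]  // namespace -> account

    /// True when `name` equals `namespace` or is nested under it.
    private func isWithin(_ name: String, _ namespace: String) -> Bool {
        return name == namespace || name.hasPrefix(namespace + ".")
    }

    mutating func claim(_ namespace: String, by account: String) -> Bool {
        // A bare TLD such as "com" can't be claimed; require at least two parts.
        guard namespace.split(separator: ".").count >= 2 else { return false }
        // Reject claims that overlap a namespace held by a different account.
        for (existing, owner) in owners where owner != account {
            if isWithin(existing, namespace) || isWithin(namespace, existing) { return false }
        }
        owners[namespace] = account
        return true
    }

    /// An account with rights over a namespace can delegate a nested part of it.
    mutating func grant(_ namespace: String, to account: String, by admin: String) -> Bool {
        guard canPublish(namespace, as: admin) else { return false }
        owners[namespace] = account
        return true
    }

    func canPublish(_ package: String, as account: String) -> Bool {
        return owners.contains { $0.value == account && isWithin(package, $0.key) }
    }
}

var registry = NamespaceRegistry()
print(registry.claim("com", by: "mallory"))                  // rejected: single part
print(registry.claim("com.apple", by: "apple-admin"))        // accepted
print(registry.claim("com.apple.swift", by: "mallory"))      // rejected: overlaps com.apple
print(registry.grant("com.apple.swift", to: "ktoso", by: "apple-admin"))
print(registry.canPublish("com.apple.swift.nio", as: "ktoso"))     // under the granted namespace
print(registry.canPublish("com.apple.sparkle.core", as: "ktoso"))  // outside it
```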

John_McCall · January 21, 2021, 6:24am

I apologize if any of what I'm about to say has been said before; I haven't had time to catch up on the entire discussion.

We could use any unique string to identify/namespace a package. The advantage of using a URL is that the URL can also tell you where to find the package. However, if we unconditionally trust that URL to tell us where to find the package, projects can never change anything about how they host packages without changing the identity of those packages. That seems like a problem, a large enough one that I would state its converse as a basic requirement:

Projects must be able to change the basic details of how and where they host packages without changing the fundamental identity of those packages.

That means that the identity-URL must actually just be a string that identifies the package; at best, it's a string which will be treated as a URL in the default case where we don't have any better information in some global registry.

Once we acknowledge that, I think the advantage of a URL turns into a disadvantage. When writing a new tool, the temptation will always be to just trust the URL rather than doing the more correct registry lookup. Even in registry-aware tools, if something goes wrong with loading registries, tools will fall back on trusting the URL. In either case, projects that change how they host packages will be locked into a worse experience for their clients.

URLs also have a number of unique disadvantages, like that they contain quite a lot of information and structure that's primarily useful for web clients and servers rather than adding any disambiguation for the project. We would probably feel strongly pressured to hard-code shortenings for presumptively-common URL schemas like https://github.com/(\w+)/(\w+).git into Swift's symbol mangling to avoid drastic build time / binary size impact. That seems inappropriate; Swift as a project should not be directly favoring one hosting strategy over another.

URLs are also not naturally embeddable in source code, e.g. as qualifiers on import directives. It seems appropriate for us to consider the potential for such things rather than ruling them out.

jawbroken · January 21, 2021, 7:12am

This seems overstated. The package URL has to be https://, and the normalisation rules for package identity remove both https:// and the .git, so there would be no need to include those in the mangled symbol, leaving you with github.com/(\w+)/(\w+). This is roughly what you would end up with in a reverse-dns system except you haven't confused beginners by writing parts of it backwards.

In general I would say that a big advantage to using URLs is that they work better for beginners. If they're looking at a package in their browser they can probably* just copy the URL straight into their Package.swift and start using it. And vice versa, if they're looking at Package.swift for a project they can probably* just copy the URL into their browser to view a dependency. They can push their code to a remote repository and start using it as a dependency without having to invent and register a unique identifier with a registry, or prove they own a domain, or whatever. I still remember it being unfriendly to have to invent reverse-DNS namespaces for Java code at a time when I didn't even own a domain.

You can freely choose the parsing rules inside such a qualifier, or specify the URL as a string like in the package manifest, so I don't see how it rules anything out.

* There are caveats discussed in this thread, hence "probably", but this will work almost all of the time without having to know where to go to search a particular package registry for some other identifier.

mattt · January 21, 2021, 11:57am

I'm familiar with this process, but I'm glad to have your explanation of how claiming a namespace works in Maven Central for the benefit of everyone on this thread. See also @benrimmington's previous reply with some good ideas about how we might do this if we adopt a similar identity scheme.

Reverse-DNS was a good solution when it emerged in the '90s (give or take), and I wanted to think about whether the underlying assumptions still hold in 2021 and beyond. Does reverse-DNS work with 50x more valid top-level domains (6 in 1998 vs. 310 today)? Does the proliferation of gTLDs like .tools pose UX or security concerns? I don't know — this was mostly just me thinking out loud.

mattt · January 21, 2021, 12:35pm

Thank you for weighing in on this thread, @John_McCall. I'm very happy to respond to any concerns you and the rest of the Core Team have.

Our proposal absolutely allows for packages to relocate. If your package is hosted at github.com/mona/LinkedList, it could be renamed (github.com/mona/SwiftLinkedList), moved to a new org (github.com/OctoCorp/LinkedList), or an entirely new domain (mona.dev/LinkedList). All a maintainer / registry needs to do is establish a relationship between these packages, using HTTP redirects and/or rel="canonical" links. These are the same semantics we have on the web, and they've worked well.

So long as these redirections form a directed, acyclic graph, Swift Package Manager can unambiguously resolve old and new locations for packages.

What our proposal does require is that packages remain available at existing URLs once published. This ensures that dependent packages don't break when one of their dependencies moves.

For example, suppose a dependency graph contains both github.com/mona/LinkedList and mona.dev/LinkedList, which are the same package. This can resolve in one of two ways:

- mona.dev/LinkedList redirects to github.com/mona/LinkedList; Swift Package Manager treats github.com/mona/LinkedList as the package identity.
- github.com/mona/LinkedList lists mona.dev/LinkedList as the canonical location of the package; Swift Package Manager treats mona.dev/LinkedList as the package identity.

If a package's dependency graph contains only mona.dev/LinkedList, then everything works as expected; this is the base case.

If a package's dependency graph contains only github.com/mona/LinkedList, that continues to work, even if mona.dev/LinkedList is set as the canonical location. This may or may not be communicated to the user ("Info: github.com/mona/LinkedList has been moved to mona.dev/LinkedList; please update your package manifest accordingly"). SPM could even add tooling to automate updating dependency declarations.
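
As a rough sketch of the resolution described above, the forwarding relationships can be modeled as a dictionary and followed until they stop. Everything here is an assumption for illustration: `canonicalIdentity`, `ResolutionError`, and the `forwarding` table are hypothetical, and a real client would discover these links over HTTP (redirects or rel="canonical") rather than from a local table.

```swift
/// Hypothetical error for a forwarding chain that loops back on itself.
enum ResolutionError: Error { case cycle(String) }

/// Follows forwarding links (redirect or rel="canonical") from `location`
/// until reaching a location that forwards nowhere.
func canonicalIdentity(of location: String, forwarding: [String: String]) throws -> String {
    var current = location
    var visited: Set<String> = [location]
    while let next = forwarding[current] {
        // Seeing a location twice means the graph is not acyclic.
        guard visited.insert(next).inserted else { throw ResolutionError.cycle(next) }
        current = next
    }
    return current
}

// Assumed setup: the GitHub location names mona.dev as canonical.
let forwarding = ["github.com/mona/LinkedList": "mona.dev/LinkedList"]
let old = try canonicalIdentity(of: "github.com/mona/LinkedList", forwarding: forwarding)
let new = try canonicalIdentity(of: "mona.dev/LinkedList", forwarding: forwarding)
print(old == new)  // both declarations resolve to one identity
```

The cycle check is what the acyclic requirement buys: without it, two locations that point at each other would never settle on an identity.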

Any other identity scheme is going to have to deal with these same problems. Adopting reverse-DNS, for example, only shifts the problem of relocation to renaming. The reason we chose to use URLs in our proposal is that we can leverage the familiar, robust infrastructure of URLs and HTTP semantics rather than reinvent the wheel with an additional layer of abstraction.

Technically, our proposal identifies packages using URIs not URLs. I've used these terms interchangeably because the distinction hasn't been important.

Essentially, package URIs are URLs without the scheme (https://); our proposal normalizes to remove the .git as well. What you end up with is package identities like github.com/mona/LinkedList or mona.dev/LinkedList. To properly namespace packages in another scheme, you'd encode about as much information (com.github.mona.LinkedList or dev.mona.LinkedList). All things being equal, I'd rather go with URLs, which are more useful and familiar, and don't require us to invent a new layer of abstraction.
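
A minimal sketch of that comparison, assuming only the two normalization steps mentioned here (dropping the scheme and a trailing .git); the proposal's actual rules may cover more cases. The function names `packageIdentity` and `reverseDNSName` are hypothetical.

```swift
import Foundation

/// Hypothetical helper: strips the scheme and a trailing ".git" from a package URL.
func packageIdentity(from url: String) -> String {
    var identity = url
    // Drop the scheme, e.g. "https://".
    if let schemeRange = identity.range(of: "://") {
        identity = String(identity[schemeRange.upperBound...])
    }
    // Drop a trailing slash, then a trailing ".git".
    if identity.hasSuffix("/") { identity.removeLast() }
    if identity.hasSuffix(".git") { identity.removeLast(4) }
    return identity
}

/// Hypothetical helper: the reverse-DNS spelling of the same identity,
/// with the host labels reversed and the path components appended.
func reverseDNSName(for identity: String) -> String {
    let parts = identity.split(separator: "/").map(String.init)
    guard let host = parts.first else { return identity }
    let reversedHost = host.split(separator: ".").reversed().map(String.init)
    return (reversedHost + parts.dropFirst()).joined(separator: ".")
}

let identity = packageIdentity(from: "https://github.com/mona/LinkedList.git")
print(identity)                          // github.com/mona/LinkedList
print(reverseDNSName(for: identity))     // com.github.mona.LinkedList
```

Both spellings carry the same components; only the order and the separators differ.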

The same would be true of other schemes like reverse-DNS. You can't currently import com.github.mona.LinkedList.

In the original post on this thread, we describe how our proposal dovetails nicely with this proposal by Rahul Malik, Ankit Aggarwal, and David Hart:

mattt:

Importing external modules without a package manifest

Developers enjoy using Swift as a scripting language, but wish there were an easier way to import external dependencies in a standalone file, without the overhead of the package manifest and directory structure. Various solutions have been explored through community projects like swift-sh, Beak, and Marathon, and discussions on the Swift forums.

For example, running the command swift path/to/file.swift with the following file would automatically resolve and download the LinkedList dependency before building and running the script executable.
// Proposed syntax by Rahul Malik, Ankit Aggarwal, and David Hart
// See: https://forums.swift.org/t/swiftpm-support-for-swift-scripts/33126
@package(url: "https://github.com/mona/LinkedList", from: "1.1.0")
import LinkedList

// Variation with hypothetical module aliasing
@package(url: "https://github.com/OctoCorp/linkedlist", from: "0.1.0")
import LinkedList as OctoCorpLinkedList

print("Hello, world!")

Using anything other than URLs / URIs would require Swift scripts to additionally specify a registry to resolve the package identity.

@package(name: "com.github.mona.LinkedList", from: "1.1.0", registry: "https://swift.pkg.github.com")
import LinkedList

hisekaldma · January 21, 2021, 3:45pm

I don’t think this is as trivial as ”all you need to do.” Both HTTP redirects and rel="canonical" headers need to be supported by the hosting service. Sure, GitHub might add support for it, but if you use a service that doesn’t support it, you’re basically stuck there. And just like with vanity URLs, once you realize that you need it, it’s too late.

I’m starting to come around to URLs as package identifiers (@jawbroken’s argument that they’re easier for beginners is extremely compelling to me), but I think there needs to be a better solution for redirects.

What if we made it possible to redirect to a new URL in the package itself?

let package = Package(
    movedTo: "https://mona.dev/LinkedList"
)

mattt · January 21, 2021, 5:35pm

That's a good point, and I appreciate your bringing it up. While the process of redirection is technically straightforward, there are real logistical considerations to account for.

We could add a portability requirement to the server specification, but without an enforcement mechanism for what would be an out-of-band process, that's not much better than using an honor system.

Since I'm working on this proposal with GitHub, they could publicly commit to this, which would cover the vast majority of existing packages.

But even if you have a host that doesn't support forwarding, there's a clever workaround. Your example of movedTo: reminded me of how one of my colleagues at Heroku registered the gem bundle.

You really mean gem install bundler. It's okay. I'll fix it for you this one last time...

Bundler is a Ruby dependency manager, but its command-line executable is named bundle. The bundle gem is a virtuous typosquat; it contains no code itself, but has a runtime dependency on bundler. (For a sense of how often this happens, the bundle gem has >5 million downloads, ~1,000 times per day, on average, over the past decade).

Back to Swift: if you were really stuck on a host, you could publish a new (final) release of your project that depends on the new package. We could support this without any changes to SPM, and could provide this directly in a new API in the future.