SE-0292: Package Registry Service

mattt · December 18, 2020, 7:04pm

Following up on my last message:

I was curious to see how compatible the infrastructure of AWS CodeArtifact is with the semantics of the proposed Swift package registry interface, so I went ahead and implemented a proof-of-concept web application using aws-sdk calls.

I'm happy to report that most concepts in CodeArtifact appear to map cleanly onto a counterpart in Swift. Here's my first approximation of how their respective terminology lines up:

| AWS CodeArtifact | Swift Package Registry |
| --- | --- |
| Domain | URL top-level domain or subdomain |
| Repository | Package |
| Package | Release |
| Package version | Release version number |
| Package version revision | N/A (releases are immutable) |
| Upstream repository | Code host repository |
| Asset | Source archive (`.zip` file) |
| Package namespace | Leading path components in package URL |

If nothing else, I hope this code helps you and your team contextualize our ideas for the Swift registry. I look forward to working with y'all to help bring Swift package support to AWS.

mattt · December 18, 2020, 8:35pm

Correct. For example, if a package dependency is specified with the URL of ssh://git@github.com:mona/LinkedList.git, the URL template parameter {package} would be github.com/mona/linkedlist. The GET request made to the package registry would therefore be made to https://registry.example.com/github.com/mona/linkedlist.
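To make that mapping concrete, here is a minimal sketch of how a dependency URL might be reduced to the `{package}` parameter and then expanded into a registry request URL. The function names `canonicalIdentifier(for:)` and `registryURL(for:registry:)` are hypothetical, and the string handling is deliberately simplified (it ignores ports, for one). It is not the actual implementation in Swift Package Manager.

```swift
import Foundation

/// Hypothetical, simplified canonicalization (not SwiftPM's real logic).
/// Ports are not handled: every ":" left after the scheme is treated as an
/// SCP-style path separator.
func canonicalIdentifier(for location: String) -> String {
    var rest = Substring(location)

    // 1. Drop the scheme ("ssh://", "https://", ...), if present.
    if let scheme = rest.range(of: "://") {
        rest = rest[scheme.upperBound...]
    }

    // 2. Drop userinfo ("git@") from the authority, which ends at the first "/".
    let authorityEnd = rest.firstIndex(of: "/") ?? rest.endIndex
    if let at = rest[..<authorityEnd].lastIndex(of: "@") {
        rest = rest[rest.index(after: at)...]
    }

    // 3. Treat the SCP-style "host:path" separator as "/".
    var identifier = rest.replacingOccurrences(of: ":", with: "/")

    // 4. Drop a trailing ".git" and fold case.
    if identifier.hasSuffix(".git") {
        identifier.removeLast(4)
    }
    return identifier.lowercased()
}

/// Hypothetical helper: substitutes the identifier for `{package}` by
/// appending it to the registry's base URL string.
func registryURL(for identifier: String, registry: String) -> URL? {
    URL(string: registry + "/" + identifier)
}

let identifier = canonicalIdentifier(for: "ssh://git@github.com:mona/LinkedList.git")
print(identifier)
if let url = registryURL(for: identifier, registry: "https://registry.example.com") {
    print(url.absoluteString)
}
```

Under these assumptions, `https://github.com/mona/LinkedList` reduces to the same identifier as the SSH form, which is the property the registry relies on.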

From the perspective of a registry, the most important thing is that these identifiers are case- and diacritic-insensitive. I think the exact behavior of URL canonicalization is ultimately an implementation detail of Swift Package Manager.
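As a rough illustration of what case- and diacritic-insensitive matching could look like on the registry side, the sketch below uses Foundation's string comparison options. The helper name `identifiersMatch(_:_:)` and the sample identifiers are assumptions for illustration only; a real registry may normalize identifiers differently.

```swift
import Foundation

/// Hypothetical helper: treats two package identifiers as equal when they
/// differ only in letter case or diacritical marks.
func identifiersMatch(_ lhs: String, _ rhs: String) -> Bool {
    lhs.compare(rhs, options: [.caseInsensitive, .diacriticInsensitive]) == .orderedSame
}

// Differ only in case.
print(identifiersMatch("github.com/mona/LinkedList", "GitHub.com/Mona/linkedlist"))
// Differ only in a diacritic.
print(identifiersMatch("example.com/mona/café", "example.com/mona/cafe"))
// Differ in an actual letter.
print(identifiersMatch("example.com/mona/a", "example.com/mona/b"))
```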

That said, it's an important detail, and I'd certainly welcome your help in making the implementation as robust as possible.

Ready to dive into the weeds for a bit?

Parsing is always a minefield, but I think the stakes are relatively low for this particular application. Let's take your example of "https://test1@test2@test3/":

You (correctly) point out that CanonicalPackageIdentity incorrectly parses the userinfo component, so the result is an identifier of "test2@test3". This identifier is then used to construct URL(string: "https://test2@test3"), which either returns nil because it's an invalid URL, or fails to resolve. As far as I can tell, that's no worse than having a typo in your URL.

Could you help me understand the practical impact of this kind of bug? What's the worst that could happen if canonicalization works incorrectly?

Thanks again for pointing that out! I'll open a PR to fix that. I'd love nothing more than for you or anyone else to find a dozen more test cases that we don't handle correctly.

I was actually on the fence about this. At one point, my implementation only removed the default port for the specified scheme (e.g. 22 for SSH, 80 for HTTP, 443 for HTTPS, 9418 for Git). Can you think of any reason why we should / shouldn't do that instead? IIRC, I got as far as the protocols supported by Git, but decided against it after thinking about which other SCM protocols to support.
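For reference, the earlier approach described above (removing a port only when it is the default for the URL's scheme) could be sketched roughly as follows. The helper name `removingDefaultPort(from:)` is hypothetical, the port table contains only the four schemes mentioned in this post, and `URLComponents` stands in for whatever parsing the real implementation uses.

```swift
import Foundation

/// Hypothetical helper: strips the port only if it is the default one for
/// the URL's scheme; any other port is considered significant and kept.
func removingDefaultPort(from location: String) -> String {
    // Default ports for the schemes named in the post.
    let defaultPorts: [String: Int] = ["ssh": 22, "http": 80, "https": 443, "git": 9418]

    guard var components = URLComponents(string: location),
          let scheme = components.scheme?.lowercased(),
          let port = components.port,
          defaultPorts[scheme] == port
    else {
        // Unparseable input, no explicit port, or a non-default port.
        return location
    }
    components.port = nil
    return components.string ?? location
}

print(removingDefaultPort(from: "https://example.com:443/mona/LinkedList"))
print(removingDefaultPort(from: "https://example.com:8443/mona/LinkedList"))
```

The trade-off is that the table has to enumerate every supported scheme, which is the concern about other SCM protocols raised above.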

Karl:

It replaces tildes (~) with the username component from the URL (if it has one), so using the example from the code comments:
ssh://mona@example.com/~/LinkedList.git → example.com/~mona/LinkedList
I have never seen this behaviour before; is there any precedent for it? Also, is the component after replacement ~mona or just mona? Given that the code comments appear to be the authoritative documentation for this "canonicalisation" transform, it's important that it is accurate.

I can't find my original source, but I recall this being some behavior of SCP that was inherited by Git; I also recall being surprised that ~mona was the expanded form. Anyway, we can take it out if a reasonable citation eludes us.
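To pin down the behavior under discussion, here is a small sketch that reproduces the example from the code comments, expanding a leading `~` to `~mona`. The function name `expandingTilde(in:)` is hypothetical, and this only mirrors the documented example; it does not settle whether the behavior has precedent in SCP or Git.

```swift
import Foundation

/// Hypothetical sketch of the tilde expansion described in the code comments:
/// a leading "/~/" path segment becomes "/~<user>/" when the URL has a user.
func expandingTilde(in location: String) -> String? {
    guard let components = URLComponents(string: location),
          let host = components.host
    else { return nil }

    var path = components.path
    if let user = components.user, path.hasPrefix("/~/") {
        path = "/~\(user)/" + String(path.dropFirst(3))
    }
    // Drop a trailing ".git", as in the documented example.
    if path.hasSuffix(".git") {
        path.removeLast(4)
    }
    return host + path
}

if let expanded = expandingTilde(in: "ssh://mona@example.com/~/LinkedList.git") {
    print(expanded)
}
```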

At least for the registry interface, I'm confident that these go through Foundation.URL.init(string:), which performs its own validation. For repositories, I'd have to check.

But I take your point that this could use another validation step at the end.

You're totally right — that was an oversight on my part. I'll tack that onto the aforementioned PR.

You're correct in pointing out this distinction. "Actually" was the wrong word here; what I meant was "effectively". For the purpose of canonicalizing package identities, so that we can match ones that are the same save for insignificant differences, I think our treatment of paths as file URLs is acceptable. Identifiers for local packages (which have file paths or file:/// URLs) are treated differently from those for remote dependencies.

IDNs are the next big thing I'd like to tackle. I have a (perhaps overstated) fear of homograph attacks. I'll take a closer look at how disallowed characters are / should be handled.

Despite their complexity and baggage, I still think URLs are the best solution to the problem of identifying packages. I think the idea that https://github.com/mona/LinkedList and ssh://git@github.com:mona/LinkedList.git are the same aligns well with user expectation, and that the process of HTTP content negotiation provides a convenient mechanism for allowing users to upgrade to a more secure system without changing their code.

We can (and should) take every opportunity to make our canonicalization function robust and correct, and limit the impact of anything that manages to slip through the cracks. That said, I don't think adopting URLs here poses a significant security or logistical threat.

0xTim · December 18, 2020, 10:04pm

I've forked the benchmark to create a 'real-world' app with a large number of dependencies, if people want to use it for testing.

shahzadmajeed · December 21, 2020, 8:42pm

Could companies or developers implement that interface themselves, so that they can run their own private registry service? Basically, what if people don't want to use GitHub?

mattt · December 21, 2020, 9:25pm

Yes! Because this is an open standard, any developer can build and deploy a Swift package registry that hosts projects of their choosing.

shahzadmajeed · December 21, 2020, 10:27pm

Thanks for your reply, Matt. I am curious whether a Swift package itself can act as a registry service, where you have multiple local sub-packages in a top-level package and the top-level package chooses which ones should be distributed publicly or privately. That way you don't even need to implement a separate registry service.

mattt · December 22, 2020, 11:31am

I'm not sure if I understand what you're describing, but I'll point you to this thread, which discusses different ways that Swift Package Manager could support projects containing multiple packages, including the package registry interface.

mattt · January 11, 2021, 5:57pm

Following up on this: You can find the OpenAPI specification here.

I'm very interested to hear your thoughts about my responses to your posts in this thread. Please also take a look at this thread discussing the decision to use URIs / URLs to identify packages.

mattt · January 11, 2021, 7:39pm

I just submitted a PR to apple/swift-package-manager with fixes to the issues you identified, as well as some new logic for normalizing Windows paths. If you have time to take a look, I'd really appreciate any feedback you have.

tomerd · June 1, 2021, 6:16pm

This first review for SE-0292 has concluded and the proposal was returned for revision.

Thank you to everyone for the feedback and contributions to this proposal.