I was curious to see how compatible the infrastructure of AWS CodeArtifact is with the semantics of the proposed Swift package registry interface, so I went ahead and implemented a proof-of-concept web application using aws-sdk calls.
I'm happy to report that most concepts in CodeArtifact appear to map cleanly onto a counterpart in Swift. Here's my first approximation of how their respective terminology lines up:
| AWS CodeArtifact | Swift Package Registry |
| --- | --- |
| Domain | URL top-level domain or subdomain |
| Repository | Package |
| Package | Release |
| Package version | Release version number |
| Package version revision | N/A (releases are immutable) |
| Upstream repository | Code host repository |
| Asset | Source archive (.zip file) |
| Package namespace | Leading path components in package URL |
If nothing else, I hope this code helps you and your team contextualize our ideas for the Swift registry. I look forward to working with y'all to help bring Swift package support to AWS.
Correct. For example, if a package dependency is specified with the URL of ssh://git@github.com:mona/LinkedList.git, the URL template parameter {package} would be github.com/mona/linkedlist. The GET request made to the package registry would therefore be made to https://registry.example.com/github.com/mona/linkedlist.
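To make the mapping concrete, here is a minimal sketch of how a dependency URL could be reduced to the `{package}` parameter. The function name `packageIdentifier(from:)` and its rules are my own assumptions for illustration; this is not the canonicalization code in Swift Package Manager, and it ignores many edge cases.

```swift
import Foundation

/// Hypothetical helper: reduces a package dependency URL to a registry identifier.
func packageIdentifier(from location: String) -> String {
    var rest = location

    // Drop the scheme ("ssh://", "https://", ...), if present.
    if let scheme = rest.range(of: "://") {
        rest = String(rest[scheme.upperBound...])
    }

    // Split the authority (everything before the first "/") from the path.
    let slash = rest.firstIndex(of: "/") ?? rest.endIndex
    var authority = String(rest[..<slash])
    let path = String(rest[slash...])

    // Drop userinfo such as "git@".
    if let at = authority.lastIndex(of: "@") {
        authority = String(authority[authority.index(after: at)...])
    }

    // Treat an SCP-style "host:owner" separator as a path separator,
    // but leave a numeric port such as ":443" alone.
    if let colon = authority.firstIndex(of: ":") {
        let host = String(authority[..<colon])
        let tail = String(authority[authority.index(after: colon)...])
        if !tail.isEmpty && !tail.allSatisfy({ $0.isNumber }) {
            authority = host + "/" + tail
        }
    }

    var identifier = authority + path
    if identifier.hasSuffix("/") { identifier.removeLast() }
    if identifier.hasSuffix(".git") { identifier.removeLast(4) }
    return identifier.lowercased()
}

let identifier = packageIdentifier(from: "ssh://git@github.com:mona/LinkedList.git")
let registryURL = URL(string: "https://registry.example.com/\(identifier)")
print(identifier)  // github.com/mona/linkedlist
```

With these rules, https://github.com/mona/LinkedList reduces to the same identifier, so both forms would resolve to the same registry endpoint.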
From the perspective of a registry, the most important thing is that these identifiers are case- and diacritic-insensitive. I think the exact behavior of URL canonicalization is ultimately an implementation detail of Swift Package Manager.
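As a rough illustration of what case- and diacritic-insensitive matching could look like on the registry side, here is a small sketch using Foundation's string folding. The helper name `foldedIdentifier(_:)` is hypothetical, and a real registry might well choose a different normalization.

```swift
import Foundation

/// Hypothetical helper: folds case and diacritics so that identifiers
/// differing only in those respects compare equal.
func foldedIdentifier(_ identifier: String) -> String {
    identifier.folding(options: [.caseInsensitive, .diacriticInsensitive], locale: nil)
}

let a = foldedIdentifier("github.com/Mona/LinkedLïst")
let b = foldedIdentifier("github.com/mona/linkedlist")
print(a == b)
```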
That said, it's an important detail, and I'd certainly welcome your help in making the implementation as robust as possible.
Ready to dive into the weeds for a bit?
Parsing is always a minefield, but I think the stakes are relatively low for this particular application. Let's take your example of "https://test1@test2@test3/":
You (correctly) point out that CanonicalPackageIdentity incorrectly parses the userinfo component, so the result is an identifier of "test2@test3". This identifier is then used to construct URL(string: "https://test2@test3"), which either returns nil because it's an invalid URL, or fails to resolve. As far as I can tell, that's no worse than having a typo in your URL.
Could you help me understand the practical impact of this kind of bug? What's the worst that could happen if canonicalization works incorrectly?
Thanks again for pointing that out! I'll open a PR to fix that. I'd love nothing more than for you or anyone else to find a dozen more test cases that we don't handle correctly.
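For anyone following along, here is a minimal sketch of the difference between splitting the authority at the first "@" and at the last one. Both helpers are hypothetical and written only for this example; they are not the CanonicalPackageIdentity code, and splitting at the last "@" is just one possible way to handle this input.

```swift
/// Hypothetical: treats everything up to the FIRST "@" as userinfo.
func hostAfterFirstAt(_ authority: String) -> String {
    guard let at = authority.firstIndex(of: "@") else { return authority }
    return String(authority[authority.index(after: at)...])
}

/// Hypothetical: treats everything up to the LAST "@" as userinfo.
func hostAfterLastAt(_ authority: String) -> String {
    guard let at = authority.lastIndex(of: "@") else { return authority }
    return String(authority[authority.index(after: at)...])
}

let authority = "test1@test2@test3"
print(hostAfterFirstAt(authority))  // test2@test3
print(hostAfterLastAt(authority))   // test3
```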
I was actually on the fence about this. At one point, my implementation only removed the default port for the specified scheme (e.g. 22 for SSH, 80 for HTTP, 443 for HTTPS, 9418 for Git). Can you think of any reason why we should / shouldn't do that instead? IIRC, I got as far as the protocols supported by Git, but decided against it after thinking about which other SCM protocols to support.
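To show what the scheme-specific alternative might look like, here is a sketch that removes a port only when it matches the default for the URL's scheme. The `defaultPorts` table and the `removingDefaultPort(from:)` helper are assumptions for illustration, covering only the four schemes mentioned above.

```swift
import Foundation

// Default ports for the schemes named above (illustrative, not exhaustive).
let defaultPorts: [String: Int] = ["ssh": 22, "http": 80, "https": 443, "git": 9418]

/// Hypothetical helper: drops the port only if it is the scheme's default.
func removingDefaultPort(from url: URL) -> URL {
    guard var components = URLComponents(url: url, resolvingAgainstBaseURL: false),
          let scheme = components.scheme?.lowercased(),
          let port = components.port,
          defaultPorts[scheme] == port
    else { return url }
    components.port = nil
    return components.url ?? url
}

let stripped = removingDefaultPort(from: URL(string: "https://github.com:443/mona/LinkedList")!)
let kept = removingDefaultPort(from: URL(string: "https://github.com:8443/mona/LinkedList")!)
print(stripped.absoluteString)  // https://github.com/mona/LinkedList
print(kept.absoluteString)      // https://github.com:8443/mona/LinkedList
```

The trade-off is that a non-default port stays part of the identity, so the table of known schemes has to be maintained somewhere.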
I can't find my original source, but I recall this being some behavior of SCP that was inherited by Git; I also recall being surprised that ~mona was the expanded form. Anyway, we can take it out if a reasonable citation eludes us.
At least for the registry interface, I'm confident that these go through Foundation.URL.init(string:), which performs its own validation. For repositories, I'd have to check.
But I take your point that this could use another validation step at the end.
You're totally right — that was an oversight on my part. I'll tack that onto the aforementioned PR.
You're correct in pointing out this distinction. "Actually" was the wrong word here; what I meant was "effectively". For the purpose of canonicalizing package identities, so that we can match ones that are the same save for insignificant differences, I think our treatment of paths as file URLs is acceptable. Identifiers for local packages (which have file paths or file:/// URLs) are treated differently than remote dependencies.
IDNs are the next big thing I'd like to tackle. I have a (perhaps overstated) fear of homograph attacks. I'll take a closer look at how disallowed characters are / should be handled.
Despite their complexity and baggage, I still think URLs are the best solution to the problem of identifying packages. I think the idea that https://github.com/mona/LinkedList and ssh://git@github.com:mona/LinkedList.git are the same aligns well with user expectation, and that the process of HTTP content negotiation provides a convenient mechanism for allowing users to upgrade to a more secure system without changing their code.
We can (and should) take every opportunity to make our canonicalization function robust and correct, and limit the impact of anything that manages to slip through the cracks. That said, I don't think adopting URLs here poses a significant security or logistical threat.
Could that interface be implemented by companies or developers themselves, so that they can have their own private registry service? Basically, what if people don't want to use GitHub?
Thanks for your reply, Matt. I am curious whether a Swift package itself can act as a registry service, where you have multiple local sub-packages in a top-level package and the top-level package chooses which ones should be distributed publicly or privately. That way you don't even need to implement a separate registry service.
I'm not sure if I understand what you're describing, but I'll point you to this thread, which discusses different ways that Swift Package Manager could support projects containing multiple packages, including the package registry interface.
Following up on this: You can find the OpenAPI specification here.
I'm very interested to hear your thoughts about my responses to your posts in this thread. Please also take a look at this thread discussing the decision to use URIs / URLs to identify packages.
I just submitted a PR to apple/swift-package-manager with fixes to the issues you identified, as well as some new logic for normalizing Windows paths. If you have time to take a look, I'd really appreciate any feedback you have.