Swift Package Registry Service

kylep · June 6, 2020, 9:05pm

First off, thank you @mattt for the proposal! Apologies if this question has already been answered, but how does this proposal expect to handle authentication of clients to the registry for "pulling" packages? It seems that each registry can use a different authentication scheme, yet the proposal doesn't cover how a client would be expected to use it. I am thinking of enterprises that already use a package registry of some sort for custom/private packages and that may not have an SSH-level authentication pattern or a host on GitHub.

Packages on GitHub will work great because the github.com domain is in the public DNS and authentication can be done with SSH keys given the right URL scheme. But when using HTTPS for private packages, how is the client (Package.swift) expected to specify how it authenticates to the repository? Should a "registries" list be supported in the Package.swift file so that the client (SPM) can use that information when resolving packages? This would be similar to how npm, Ruby, and Rust all work today.

When I initially read through this proposal, my mind immediately went to how JFrog would be an adopter of this and how it would benefit the enterprise companies I work with. JFrog has a "universal" repository solution, meaning they support a large number of repository types (npm, Ruby, Rust, Maven, Docker, etc.), which are then secured through enterprise-level authentication (SAML, OpenID, etc.). When a client registers the repository, they need to specify the URL, username, and an API token. When the client (resolver) searches for a package at the URL, it uses the token and authentication scheme to authenticate requests. The problem with authentication is that not everyone uses the same scheme. To name a few of the most common:

Custom HTTP Header + API Key (in case of JFrog it's X-JFrog-Art-Api)
SSH based (GitHub)
HTTP basic auth
Some form of Access Token based (using http header Authorization: Bearer <value>)
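
As a rough illustration of what a "generic authentication model" on the client could look like, here is a minimal Swift sketch that maps the HTTP-based schemes listed above to request headers. The `RegistryAuth` type and its cases are hypothetical and do not come from the proposal; SSH is left out because it is handled at the transport level rather than through headers.

```swift
import Foundation

/// Hypothetical model of the HTTP-based authentication styles listed above.
/// None of these names come from the proposal.
enum RegistryAuth {
    case customHeader(name: String, apiKey: String)   // e.g. X-JFrog-Art-Api
    case basic(username: String, password: String)    // HTTP basic auth
    case bearer(token: String)                         // Authorization: Bearer <value>

    /// HTTP headers a client would attach to each registry request.
    var headers: [String: String] {
        switch self {
        case let .customHeader(name, apiKey):
            return [name: apiKey]
        case let .basic(username, password):
            // Basic auth sends base64("username:password").
            let credentials = Data("\(username):\(password)".utf8).base64EncodedString()
            return ["Authorization": "Basic \(credentials)"]
        case let .bearer(token):
            return ["Authorization": "Bearer \(token)"]
        }
    }
}
```

A client could store one such value per registry URL and merge `headers` into every request it sends to that registry.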

The point / question is that if a registry like the one you described is expected to be widely adopted by enterprises (which is where I think there is a really strong need for this proposal), then registry registration in Package.swift needs to consider how to support a generic authentication model.

There are more features that an enterprise would need, but support for customization is the most important to get started (MVP). Thanks!

tachyonics · June 6, 2020, 9:10pm

This isn't related to this proposal but rather to SwiftPM integration, so I didn't comment on it before. Still, I think separating the description of dependencies from the location of the registry (and using something equivalent to npm's config set registry) is more flexible.

mattt · June 6, 2020, 11:34pm

Thanks for sharing this perspective, @kylep! I found JFrog's approach to artifact management to be quite helpful in shaping my own understanding of this problem space, and believe that the proposed solution is compatible with the use case you're describing.

In the proposal, I leave the matter of authentication up to the server:

While GitHub.com will likely use the same OAuth 2 system they use throughout their API, an enterprise system (like GitHub Enterprise) may use something like SAML or client certificate authentication over an internal network. For JFrog, it may well be that X-JFrog-Art-Api key you alluded to.

Those permissions determine who can access, publish, or unpublish a package release. As for how a registry accesses the code itself to publish or serve a release, that's another detail left to the server. The API call to publish a package includes an optional URL parameter, which you can use to specify where code is hosted. That could include a hard-coded credential or rely on SSH-based authentication between the registry and the code host. Again, something left to the server.

The question of how a client (Swift Package Manager) would interact with a registry like JFrog is something still to be determined in a future proposal. I'll be sure to keep this in mind as we figure out how that will work. A few other folks on this thread (including @tachyonics' post above) have also proposed registry lists, so that's a strategy we'll definitely consider.

kylep · June 7, 2020, 12:49am

Thanks @mattt, I agree this proposal is big. It sounds like in its current form you are proposing just the server side of the registry for publishing, not how the client (Swift Package Manager) interacts with it. This server or service would cover the search, publish, and unpublish side and doesn't describe a "publishing" client, which is fine and which I agree with (this is how JFrog structures it).

I agree as well. I have learned a lot about this space lately while trying to find a similar solution to support secured Carthage binary dependencies. I was able to come up with a solution that is quite clever and that leverages JFrog's generic repository pattern and REST API.

It seems as though the final solution is going to have multiple parts, right?

1. Server Registry specification (this proposal)
2. Client side (Swift Package Manager) read model to service registry of #1
3. (Optional) Client side (Swift Package Manager) write model to service registry of #1

jechris · June 7, 2020, 9:59am

Makes sense. However, it means we also have to change the URL in our Package.swift when using branches/hashes:

- .package(url: "https://swift.pkg.github.com/apple/swift-argument-parser", from: "0.0.1")
+ .package(url: "https://github.com/apple/swift-argument-parser.git", .branch("test"))

Which feels weird to me as an end user. Besides, from my understanding of the proposal, a package is not tied to one repository URL (as we can tag with the url parameter), so it might not always be obvious to the end user what the repository URL should be.

My question was: can we easily determine the source code a release is attached to? It seems we can release anything (a tag, a branch, a commit), so having just the repository URL and the tag doesn't seem enough when releasing a branch or a commit.

mattt · June 7, 2020, 12:51pm

Correct, though I'm not convinced that Swift Package Manager should have anything to do with publishing to service registries.

jechris:

Makes sense. However, it means we also have to change the URL in our Package.swift when using branches/hashes:
- .package(url: "https://swift.pkg.github.com/apple/swift-argument-parser", from: "0.0.1")
+ .package(url: "https://github.com/apple/swift-argument-parser.git", .branch

As discussed earlier in the thread, that listing under "Impact to existing packages" was not intended to be an actual proposal for how to integrate with Swift Package Manager. We still need to figure out what that will look like in Package.swift, and this will certainly be a consideration.

In your question, does "we" refer to the client or the server? For the server, there are optional commit, branch, tag, path, and url parameters. For the client, I'd like to find a way to incorporate this information into the detached signature as a form of metadata. But the more likely solution is to include a commit reference in the releases endpoint, alongside the url field for each release.

"1.1.1": {
  "url": "https://swift.pkg.github.com/mona/LinkedList/1.1.1",
  "commit": "d8978910a0934c21ea08e3c9a031a5baa967f5b1"
}
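
To make the idea concrete, a client could read such an entry roughly as follows. This is only a sketch: the `Release` type, the surrounding JSON object keyed by version, and the optionality of `commit` are assumptions for illustration, not part of the proposal.

```swift
import Foundation

/// Hypothetical shape of one entry in the releases listing: the `url`
/// field plus the `commit` reference suggested above.
struct Release: Codable, Equatable {
    let url: URL
    let commit: String?   // assumed optional: a registry might omit it
}

// Assumed envelope: a JSON object keyed by version string.
let json = """
{
  "1.1.1": {
    "url": "https://swift.pkg.github.com/mona/LinkedList/1.1.1",
    "commit": "d8978910a0934c21ea08e3c9a031a5baa967f5b1"
  }
}
"""

let releases = try JSONDecoder().decode([String: Release].self, from: Data(json.utf8))

// With the commit in hand, the client can tie the release back to source code.
if let release = releases["1.1.1"], let commit = release.commit {
    print("1.1.1 was cut from commit \(commit)")
}
```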

kylep · June 7, 2020, 9:36pm

100% agree with you. I only listed it because someone is going to need to build these, but it should be up to the registry to prescribe. Ideally many will be API based, so tools like bash scripting or Fastlane can be used to call the endpoints when the producer determines they are ready to publish.

chmaynard · June 7, 2020, 9:44pm

This reason seems weak -- I would drop it from the list.

kylep · June 7, 2020, 10:32pm

I disagree with this. Git is slow compared to a dedicated package registry. Additionally, for the reasons I listed about how the client (user) should pull these dependencies, the registry should try to decouple from the original Git repository as much as possible. As I had been discussing, I wouldn't want to give a consumer access to both the Git repository (source) and the registry; if I am an enterprise, I will likely have very different auth models for source code and packages.

SDGGiesbrecht · June 7, 2020, 11:27pm

The relation between the two is very tight in SwiftPM’s current model. It is reasonable to want to enable a package to exist without an associated repository, but it should be a non‐goal to decouple the two. If I’m using a package from a registry, and I want to temporarily insert some print statements to help debug my client code, swift package edit foo should just work. If I then discover the problem is actually with the package and not with my client code, it should be possible to directly check in and push the changes like I can now (provided the package also exists in repository form and I have the necessary permissions).

While Git may never be quite as fast, I don’t think the status quo is a very good indication of how big of a difference there is between the two. There is a lot of low‐hanging fruit in the area of performance that could go a long way to making SwiftPM faster in ways that also benefit Git‐based packages (which are likely to continue to be heavily used at least for team‐local packages). Some examples:

Store clones once in a global location instead of duplicated in every .build directory. This would mean Git only has to fetch it once even if many of your packages depend on it.
Do shallow cloning for pins, .exact dependencies, and other fully‐resolved constraints. When SwiftPM can know ahead of time exactly what checkout it needs, there is no real reason for it to fetch the entire repository. This would mean until you alter the dependency graph, it would only be fetching a tiny fraction of what it fetches for you now.

These have both been discussed extensively, and I haven’t seen anyone oppose them. From what I can tell, the only reason they haven’t been implemented yet is that they just haven’t reached the top of anyone’s priority list.
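
The shallow-cloning idea could be sketched as a simple decision rule. The types below are simplified, hypothetical stand-ins and are not SwiftPM's actual API; the only real command referenced is `git clone --depth 1 --branch <tag>`, which fetches a single tagged checkout.

```swift
/// Hypothetical, simplified stand-in for a dependency requirement.
enum Requirement {
    case pinned(version: String)   // already recorded in Package.resolved
    case exact(version: String)    // .exact("1.2.3")
    case range(from: String)       // from: "1.0.0", needs the tag list to resolve
    case branch(String)            // moving target until it is pinned
}

enum FetchStrategy: Equatable {
    case shallow(tag: String)      // e.g. `git clone --depth 1 --branch <tag>`
    case full                      // regular clone with full history
}

/// Fully-resolved constraints only need one checkout, so a shallow clone
/// is enough; everything else falls back to a full clone in this sketch.
func fetchStrategy(for requirement: Requirement) -> FetchStrategy {
    switch requirement {
    case .pinned(let version), .exact(let version):
        return .shallow(tag: version)
    case .range, .branch:
        return .full
    }
}
```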

So yes, Git is a little slower than a direct download, but no, the difference is not “significant”. At least not when compared to the real slowdowns in the current implementation.

The speed and efficiency arguments are weak. The security and durability arguments are much stronger (along with discoverability).

yim_lee · June 8, 2020, 6:09am

I agree.

My focus is not so much around having a search endpoint that each package registry must implement, but I think there needs to be a standard way to find out what packages a registry has such that it's possible to look for a package across multiple registries. This could be done with a package registry webhook like you suggested earlier (subscribes to registry change events), a "list packages" API as @tachyonics mentioned (polls periodically for registry updates), or an actual search endpoint (performs search on each registry), etc.

mattt · June 8, 2020, 2:31pm

I think it'd be helpful to have some specific use cases in mind to guide development of this functionality. What kind of stories would you like to make possible for someone using Swift Package Manager w/registry integration? For example:

As a developer, I want to write a small script in Swift. The script downloads a website, extracts text from HTML, and uses text-to-speech to generate a narration of that content. To do this, I add import statements for the Alamofire and Markup packages and the AVFoundation framework. Swift Package Manager automatically finds the necessary packages from a list of registries I'd configured, and either uses a locally-cached version if available or downloads the latest version.

Karl · June 8, 2020, 5:04pm

Is there a way for developers to trigger non-fatal errors (warnings) in package resolution?

For example, let's say I have a popular package hosted on a registry. Now I want to move active development to a new host; I'd like to keep the old registry entry available, so as not to break existing clients, but I'd like to warn them to update their registry because I won't be pushing new releases on the previous host.

As far as I can tell, there is only the ability to set fatal errors ("problem" fields) in the version information, and the metadata request does not include any kind of "deprecated" flag or anything that is required to trigger a warning message in clients.

yim_lee · June 8, 2020, 5:11pm

Let's simplify the use-case. Suppose I want to find a package for parsing HTML and I don't know about the SwiftDocOrg/Markup package. I would do a web search on "swift html parser", sort through the relevant results, then choose one based on certain criteria that I might have (e.g., most popular, most active).

If I had to do the same for a NodeJS package, I would simply search for "html parser" on https://www.npmjs.com. Its search results page provides at-a-glance useful information such as last release date, quality score, etc. I'd probably still need to click through the results and read the package details, but the contents are centralized and focused so the process is easier.

We could achieve a similar user experience if we had a Swift Package Registry, but what about when there is more than one? Should I search on each registry or fall back to doing web searches?

If we were to support a search command, we would need a common search API that registries should implement (if they don't, they opt out of their packages being discoverable).

There also seems to be interest in the community in creating a package index (e.g., https://swiftpackageindex.com). If it gets information by crawling, then perhaps having a list-packages API would do. Or it could subscribe to change events through registry webhooks.

Again, I agree that it should be up to the registry implementation to decide whether it wants to add a search feature, but there are opportunities for the community to build tools/services that would provide that capability, and I am wondering how we can facilitate that integration.

mattt · June 8, 2020, 7:00pm

I agree that the argument for speed and efficiency in the proposal is underdeveloped, but I maintain that the difference is indeed significant.

In an effort to back this up with real evidence, I constructed a benchmark comparing the performance of doing a full git clone, shallow clone, and downloading and unzipping a Zip archive from GitHub. You can find the results and source code in this gist.

Here's a chart summarizing the results:

Here's another chart that breaks results down by package:

In these initial findings, curl && unzip was consistently about twice as fast as doing a full git clone (and in extreme cases, up to an order of magnitude faster). A shallow clone for a particular version was also faster than a regular clone, but not quite as fast as downloading an archive.

These results are for a fast computer on a fast Internet connection. I'd be interested to collect additional results for other combinations of platform, hardware, and location. I suspect that downloading archives will be as much as 3x to 5x faster in other geographies, due to their proximity to globally-distributed CDNs that host Zip files but don't host Git repositories.

SDGGiesbrecht · June 8, 2020, 10:28pm

You got a full clone of swift-package-manager in less than two seconds?!? I envy Portland’s connection speeds.

I won’t waste a whole workday running it for every one of the packages in your list, but just for fun, I timed that one package using all the three variants. With your verbatim commands in Saskatoon, Canada, this is what happens for me:

Full clone: 23 s
Shallow clone: 3 s
CURL: 3 s

So I guess each of our perceptions is accurate for our respective locations, and the internet just isn’t homogeneous.

stevapple · June 10, 2020, 1:37am

I’m looking forward to this movement, but there’s one long-standing problem: using URLs in Package.swift greatly reduces simplicity and makes mistakes more likely, since the URL is long and contains duplicated parts, whether you use Git or a registry.

Here’s my suggestion on simplifying Package.swift:

let package = Package(
      name: "MyPackage",
      git: "https://github.com",
      dependencies: [
        .package(identifier: "apple/swift-argument-parser", from: "0.0.1"),
      ]
)

let package = Package(
      name: "MyPackage",
      registry: "https://swift.pkg.github.com",
      dependencies: [
        .package(identifier: "apple/swift-argument-parser", from: "0.0.1"),
      ]
)

Such syntax will make the dependencies clearer and significantly reduce the effort to switch between registries or between Git and a registry. If someone would like to use a package from another registry or Git host, they can still use .package(url:). We can then easily tell when a package comes from a source other than the default one.

0xTim · June 10, 2020, 8:26am

So my concern with this is that, whilst this is great if you're pulling in a single dependency, what happens when you have multiple dependencies that aren't explicitly declared (say, two of your dependencies sharing a dependency)? Going off these results (and I get they're just early benchmarks!) you'd only need to curl two versions of a dependency before it's quicker to do a full clone. E.g. resolving a Vapor 3 package, which will have ~20 dependencies, all of which will be several releases old. This could significantly slow down what is already a slow process.
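
The break-even point can be worked out with a little arithmetic. The sketch below assumes each archive download costs the same and a full clone is paid only once; the function name is made up, and the inputs are the rough figures reported earlier in the thread (archives about twice as fast as a full clone in one run, 23 s versus 3 s in another).

```swift
/// Number of archive downloads of the same package after which a single
/// full clone would have been cheaper. Assumes a constant cost per archive.
func breakEvenDownloads(cloneSeconds: Double, archiveSeconds: Double) -> Double {
    return cloneSeconds / archiveSeconds
}

// Archive roughly twice as fast as a full clone: break-even at 2 downloads.
print(breakEvenDownloads(cloneSeconds: 2, archiveSeconds: 1))

// 23 s full clone vs 3 s archive: break-even at roughly 7.7 downloads.
print(breakEvenDownloads(cloneSeconds: 23, archiveSeconds: 3))
```

So how quickly repeated downloads lose to a clone depends heavily on the connection: on a slow link to the Git host, archives stay ahead for many more versions.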

The other option would be to change it depending on whether you have a Package.resolved. If you do, you already know which versions of your packages you want, so that could speed things up. If you don't, do a full clone to get every version. This already sounds like it's getting coupled with a SwiftPM proposal to integrate with this, so whilst we're at it, it would be interesting to see how this plays with the llbuild2 work going on.

stevapple · June 10, 2020, 10:46am

Can we host the dependency info in the registry? This may add some work like designing the API and parser, but it can really solve the problem. By retrieving dependency info from the registry, SwiftPM can build the dependency graph by curl before downloading the archives. Dependency problems can also be detected in this stage.
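
A minimal sketch of that idea, assuming each release's dependency list is static and can be served as plain metadata. The `DependencyIndex` type, the dictionary standing in for registry responses, and the package names are all made up for illustration; versions are left out for brevity.

```swift
/// Hypothetical metadata a registry could serve: the direct dependencies
/// of each package, by identifier.
typealias DependencyIndex = [String: [String]]

/// Walks the index breadth-first and returns every package reachable from
/// `root`, without downloading a single archive.
func transitiveDependencies(of root: String, in index: DependencyIndex) -> [String] {
    var seen: Set<String> = [root]
    var queue = [root]
    var result: [String] = []
    while !queue.isEmpty {
        let package = queue.removeFirst()
        for dependency in index[package] ?? [] where !seen.contains(dependency) {
            seen.insert(dependency)
            result.append(dependency)
            queue.append(dependency)
        }
    }
    return result
}

// Stand-in for registry responses.
let index: DependencyIndex = [
    "app": ["web", "logging"],
    "web": ["nio", "logging"],
    "nio": [],
    "logging": [],
]
print(transitiveDependencies(of: "app", in: index))  // ["web", "logging", "nio"]
```

Only after the full list is known would the client start fetching archives, so a shared dependency such as "logging" is downloaded once.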

cukr · June 10, 2020, 12:52pm

It breaks down when dependencies aren't constant. The registry cannot know which dependencies would be resolved by SwiftPM on the client machine. It's not a weird edge case; even Apple's own packages like sourcekit-lsp do that.