Swift Package Registry Service

This reason seems weak -- I would drop it from the list.

I disagree with this. Git is slow compared to a dedicated package registry. Additionally, for the reasons I listed about how the client (user) should pull these dependencies, the registry should try to decouple from the original Git repository as much as possible. As I discussed earlier, I wouldn't want to give a consumer access to both the Git repository (source) and the registry; if I am an enterprise, I will likely have very different auth models for source code and packages.

6 Likes

The relation between the two is very tight in SwiftPM’s current model. It is reasonable to want to enable a package to exist without an associated repository, but it should be a non‐goal to decouple the two. If I’m using a package from a registry, and I want to temporarily insert some print statements to help debug my client code, swift package edit foo should just work. If I then discover the problem is actually with the package and not with my client code, it should be possible to directly check in and push the changes like I can now (provided the package also exists in repository form and I have the necessary permissions).

While Git may never be quite as fast, I don’t think the status quo is a very good indication of how big of a difference there is between the two. There is a lot of low‐hanging fruit in the area of performance that could go a long way to making SwiftPM faster in ways that also benefit Git‐based packages (which are likely to continue to be heavily used at least for team‐local packages). Some examples:

  • Store clones once in a global location instead of duplicating them in every .build directory. This would mean Git only has to fetch a repository once, even if many of your packages depend on it.
  • Do shallow cloning for pins, .exact dependencies, and other fully‐resolved constraints. When SwiftPM can know ahead of time exactly what checkout it needs, there is no real reason for it to fetch the entire repository. This would mean that, until you alter the dependency graph, it would only be fetching a tiny fraction of what it fetches now (a sketch follows below).
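
For example, an .exact requirement like the one below tells SwiftPM the precise tag it needs before anything is fetched (the package URL and version here are just placeholders):

import PackageDescription

let package = Package(
    name: "Example",
    dependencies: [
        // A fully-resolved constraint: SwiftPM knows before fetching
        // that it only needs the checkout tagged 0.0.1, so a shallow
        // fetch of that one tag would suffice instead of full history.
        .package(url: "https://github.com/apple/swift-argument-parser.git", .exact("0.0.1")),
    ],
    targets: [
        .target(name: "Example"),
    ]
)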

These have both been discussed extensively, and I haven’t seen anyone oppose them. From what I can tell, the only reason they haven’t been implemented yet is that they just haven’t reached the top of anyone’s priority list.

So yes, Git is a little slower than a direct download, but no, the difference is not “significant”. At least not when compared to the real slowdowns in the current implementation.

The speed and efficiency arguments are weak. The security and durability arguments are much stronger (along with discoverability).

4 Likes

I agree.

My focus is not so much on having a search endpoint that each package registry must implement; rather, I think there needs to be a standard way to find out what packages a registry has, such that it's possible to look for a package across multiple registries. This could be done with a package registry webhook like you suggested earlier (subscribes to registry change events), a "list packages" API as @tachyonics mentioned (polls periodically for registry updates), an actual search endpoint (performs a search on each registry), etc.
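
For illustration, here is a minimal shape such a "list packages" response could take, modeled in Swift. Everything here (the endpoint path, pagination scheme, and field names) is hypothetical rather than part of any specification:

import Foundation

// Hypothetical payload for a paginated "list packages" endpoint,
// e.g. GET /packages?since=<cursor>, that an index could poll.
struct PackageListPage: Codable {
    let packages: [PackageSummary]
    let nextCursor: String?    // nil once the poller has caught up
}

struct PackageSummary: Codable {
    let identifier: String     // e.g. "apple/swift-argument-parser"
    let latestVersion: String
    let updatedAt: Date
}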

3 Likes

I think it'd be helpful to have some specific use cases in mind to guide development of this functionality. What kind of stories would you like to make possible for someone using Swift Package Manager w/registry integration? For example:

As a developer, I want to write a small script in Swift. The script downloads a website, extracts text from HTML, and uses text-to-speech to generate a narration of that content. To do this, I add import statements for the Alamofire and Markup packages and the AVFoundation framework. Swift Package Manager automatically finds the necessary packages from a list of registries I'd configured, and either uses a locally-cached version if available or downloads the latest version.

Is there a way for developers to trigger non-fatal errors (warnings) in package resolution?

For example, let's say I have a popular package hosted on a registry. Now I want to move active development to a new host. I'd like to keep the old registry entry available, so as not to break existing clients, but I'd like to warn them to update their registry because I won't be pushing new releases to the previous host.

As far as I can tell, there is only the ability to set fatal errors ("problem" fields) in the version information, and the metadata request does not include any kind of "deprecated" flag or anything else that could trigger a warning message in clients.
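
For what it's worth, the kind of non-fatal signal being asked for might look something like this in the release metadata. This is a purely hypothetical extension, not something the current specification defines:

import Foundation

// Hypothetical additions to the release metadata: a non-fatal
// "deprecated" flag and a pointer to the package's new home, which
// clients could surface as a warning instead of an error.
struct ReleaseMetadata: Codable {
    let version: String
    let problem: String?         // the existing fatal-error field described above
    let deprecated: Bool?        // hypothetical: warn, but keep resolving
    let successorRegistry: URL?  // hypothetical: where new releases will live
}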

3 Likes

Let's simplify the use-case. Suppose I want to find a package for parsing HTML and I don't know about the SwiftDocOrg/Markup package. I would do a web search on "swift html parser", sort through the relevant results, then choose one based on certain criteria that I might have (e.g., most popular, most active).

If I had to do the same for a NodeJS package, I would simply search for "html parser" on https://www.npmjs.com. Its search results page provides at-a-glance useful information such as last release date, quality score, etc. I'd probably still need to click through the results and read the package details, but the contents are centralized and focused so the process is easier.

We could achieve a similar user experience if we had a Swift Package Registry, but what about when there is more than one? Should I search each registry individually, or fall back to doing web searches?

If we were to support a search command, we would need a common search API that registries should implement (if they don't, they opt out of their packages being discoverable).
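
Such a common search API could be tiny. For illustration, one hypothetical shape (the path and every field name here are made up):

import Foundation

// Hypothetical: GET /search?q=html+parser
struct SearchResponse: Codable {
    let results: [SearchResult]
}

struct SearchResult: Codable {
    let identifier: String     // e.g. "SwiftDocOrg/Markup"
    let summary: String?
    let latestVersion: String
    let lastReleaseDate: Date  // the at-a-glance info mentioned above
}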

There also seems to be interest in the community in creating a package index (e.g., https://swiftpackageindex.com). If it gets its information by crawling, then perhaps having a list-packages API would do. Or it could subscribe to change events through registry webhooks.

Again, I agree that it should be up to the registry implementation to decide whether it wants to add a search feature, but there are opportunities for the community to build tools/services that would provide that capability, and I am wondering how we can facilitate that integration.

4 Likes

I agree that the argument for speed and efficiency in the proposal is underdeveloped, but I maintain that the difference is indeed significant.

In an effort to back this up with real evidence, I constructed a benchmark comparing the performance of doing a full git clone, shallow clone, and downloading and unzipping a Zip archive from GitHub. You can find the results and source code in this gist.

(The gist linked above includes a chart summarizing the results, and another that breaks the results down by package.)

In these initial findings, curl && unzip was consistently about twice as fast as doing a full git clone (and in extreme cases, up to an order of magnitude faster). A shallow clone for a particular version was also faster than a regular clone, but not quite as fast as downloading an archive.

These results are for a fast computer on a fast Internet connection. I'd be interested to collect additional results for other combinations of platform, hardware, and location. I suspect that downloading archives will be as much as 3x to 5x faster in other geographies, due to their proximity to globally-distributed CDNs that host Zip files but don't host Git repositories.
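
For anyone who wants to try a rough version of this locally, here is a minimal Swift sketch of the comparison. It is not the gist's actual benchmark; the repository URL and the exact commands are illustrative:

import Foundation

// Runs a shell command and returns the elapsed wall-clock seconds.
func time(_ command: String) throws -> TimeInterval {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/bin/sh")
    process.arguments = ["-c", command]
    let start = Date()
    try process.run()
    process.waitUntilExit()
    return Date().timeIntervalSince(start)
}

let repo = "https://github.com/apple/swift-package-manager"
let variants = [
    ("full clone", "git clone \(repo) full"),
    ("shallow clone", "git clone --depth 1 \(repo) shallow"),
    ("curl + unzip", "curl -sSL \(repo)/archive/main.zip -o main.zip && unzip -q main.zip"),
]

for (name, command) in variants {
    let seconds = try time(command)
    print("\(name): \(seconds) s")
}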

9 Likes

You got a full clone of swift-package-manager in less than two seconds?!? I envy Portland’s connection speeds.

I won’t waste a whole workday running it for every one of the packages in your list, but just for fun, I timed that one package using all three variants. Running your commands verbatim in Saskatoon, Canada, this is what I get:

  • Full clone: 23 s
  • Shallow clone: 3 s
  • curl: 3 s

So I guess each of our perceptions is accurate for our respective location, and the internet just isn’t homogeneous.

2 Likes

I’m looking forward to this direction, but there’s one long-standing problem: using full URLs in Package.swift hurts simplicity and makes mistakes easier, since each URL is long and contains duplicated parts, whether you use Git or a registry.

Here’s my suggestion on simplifying Package.swift:

let package = Package(
    name: "MyPackage",
    git: "https://github.com",
    dependencies: [
        .package(identifier: "apple/swift-argument-parser", from: "0.0.1"),
    ]
)

let package = Package(
    name: "MyPackage",
    registry: "https://swift.pkg.github.com",
    dependencies: [
        .package(identifier: "apple/swift-argument-parser", from: "0.0.1"),
    ]
)

Such syntax would make the dependencies clearer and significantly reduce the effort of switching between registries, or between Git and a registry. Anyone who wants to use a package from another registry or Git host can still use .package(url:), and we can easily distinguish packages that come from a source other than the default one.

4 Likes

So my concern with this is that, whilst it's great if you're pulling in a single dependency, what happens when you have multiple dependencies that aren't explicitly declared (say, two of your dependencies sharing a dependency)? Going off these results (and I get they're just early benchmarks!), you'd only need to curl two versions of a dependency before it's quicker to do a full clone. E.g. resolving a Vapor 3 package, which will have ~20 dependencies, all of which will be several releases old. This could significantly slow down what is already a slow process.

The other option would be to change the behaviour depending on whether you have a Package.resolved. If you do, you already know which versions of your packages you want, so that could speed things up. If you don't, do a full clone to get every version. This already sounds like it's getting coupled with a SwiftPM proposal to integrate with this :sweat_smile: So whilst we're at it, it would be interesting to see how this plays with the llbuild2 work going on.

1 Like

Can we host the dependency info in the registry? This would add some work, like designing the API and a parser, but it could really solve the problem. By retrieving dependency info from the registry, SwiftPM could build the dependency graph with curl before downloading any archives. Dependency conflicts could also be detected at this stage.
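
A sketch of what that could look like from the client side, assuming the registry exposed per-release dependency metadata. The endpoint path and JSON shape are assumptions, not part of the proposal:

import Foundation

// Hypothetical per-release dependency info served by the registry,
// letting SwiftPM explore the dependency graph before fetching archives.
struct ReleaseDependencies: Codable {
    struct Entry: Codable {
        let identifier: String   // e.g. "vapor/vapor"
        let requirement: String  // e.g. "4.0.0..<5.0.0"
    }
    let version: String
    let dependencies: [Entry]
}

// Hypothetical endpoint: GET /{package}/{version}/dependencies
func fetchDependencies(registry: URL, package: String,
                       version: String) async throws -> ReleaseDependencies {
    let url = registry.appendingPathComponent("\(package)/\(version)/dependencies")
    let (data, _) = try await URLSession.shared.data(from: url)
    return try JSONDecoder().decode(ReleaseDependencies.self, from: data)
}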

1 Like

It breaks down when dependencies aren't constant. The registry cannot know which dependencies would be resolved by SwiftPM on the client machine. It's not a weird edge case; even Apple's own packages like sourcekit-lsp do this.
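
To make that concrete, here is a simplified version of the pattern sourcekit-lsp's manifest uses: the dependency list is computed when Package.swift is evaluated, so no registry can know the outcome statically.

// swift-tools-version:5.3
import PackageDescription
import Foundation

// The dependency graph depends on the environment of the machine
// evaluating the manifest.
let dependencies: [Package.Dependency]
if ProcessInfo.processInfo.environment["SWIFTCI_USE_LOCAL_DEPS"] == nil {
    dependencies = [
        .package(url: "https://github.com/apple/indexstore-db.git", .branch("main")),
    ]
} else {
    dependencies = [
        .package(path: "../indexstore-db"),
    ]
}

let package = Package(
    name: "Example",
    dependencies: dependencies,
    targets: [
        .target(name: "Example"),
    ]
)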

1 Like

Got it. I think it’s worth further discussion. Package registry support is really a huge update, so any decision should be well considered and made carefully.

Yeah, I don't know how adopting this approach without any other optimizations would compare. I think the performance characteristics depend on the implementation of the dependency resolver. In practice it could be much slower, or it could be a wash — like I said before, we really won't know until we benchmark it.

@Aciid discussed some alternative strategies earlier in the thread, and I think those are worth considering. Because the specific affordance in the registry API would depend on the decision we make there, I wanted to keep that separate from the main specification.

I don't know of any current functionality like this in Swift Package Manager, but that could be added as part of supporting package registries. The specification discusses the possibility of adding removal reasons and security advisories for packages, either of which could be communicated to the user when they run swift package update.


One of the motivations for adopting a package registry is to mitigate this effect. You could delete the original source code repository without affecting the availability of existing packages.

I hope this wasn't covered yet; I can't possibly read the whole thread :-) But the thing that jumped out at me is that I think you are inventing a new header here? If so, I think at the minimum it should be X-Accept-Version until it is in an official RFC.

Also, HTTP infrastructure (gateways, proxies, etc.), AFAIK, is not required to (and often doesn't) preserve arbitrary headers. It became a little less of an issue with HTTPS, but it still exists (because backends terminate the SSL at the front and use caching proxies and such behind), especially on mobile networks.

I think the safer way to deal with versioning is the common approach of versioning in the API URL or, maybe even better, in the content payload.

2 Likes

Nah, BCP 178 suggests deprecating that practice. With that said, it would be sensible to consider the guidance in RFC 7231 § 8.3.1 (which incidentally also suggests not prefixing the header with X-).

I agree that misbehaving intermediaries do not preserve arbitrary headers, but RFC 7230 § 3.2.1 does require them to:

A proxy MUST forward unrecognized header fields unless the field-name is listed in the Connection header field (Section 6.1) or the proxy is specifically configured to block, or otherwise transform, such fields.

It is definitely more common to version elsewhere though, and that will be more resilient to poorly-implemented HTTP infrastructure. Whether that's a good enough reason to change the behaviour is not cut-and-dried, in my view: I can see compelling arguments either way.

3 Likes

What are the arguments against the more resilient option (or for the less-resilient option)?

Setting the compatibility bar as low as possible sounds like it should be an (implicit) design goal. You can imagine there might be Swift developers in developing countries - or even developed countries, TBH - who have no choice but to use less-than-excellent infrastructure.

Fair enough. A MIME type parameter might be another option. Something like Accept/Content-type: application/json+swiftregistry-v1 or application/json...;version=1.

(though my favorite is versioning the actual content, i.e. XML with a proper versioning attribute, if even necessary. Oops, we do JSON, sigh ;-) )

The argument against is broadly that the other options are just a bit harder to implement. Putting things in the URL is fine, but it leads to multiple URLs for the same resource, which is suboptimal (particularly for caching). Putting things in the payload is better, but it can make parsing a bit more annoying, as you cannot properly parse the document until you've done at least a partial parse to tell you what you actually want.

So the question becomes: should we not do the ideal thing because we're worried about breakage? I think the worry is largely hypothetical: I am not aware of intermediaries that are behaving this way today. But I'm aware that it's a risk.

Yup, and this has the advantage that IANA is pretty liberal with MIME type registrations. This also behaves really well with caches (everything just automatically works, as it's accepted that Accept/Content-Type are fields that affect caching).
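
As a concrete sketch of that style of negotiation, a client request might look like this in Swift. The media type string is just the one floated above, not a registered type, and the registry URL is a placeholder:

import Foundation

var request = URLRequest(url: URL(string: "https://registry.example.com/apple/swift-argument-parser")!)
// The API version rides in a media-type parameter, so ordinary caches
// (which already key on Accept/Content-Type) handle it correctly.
request.setValue("application/json;version=1", forHTTPHeaderField: "Accept")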

1 Like