SE-0292 (2nd review): Package Registry Service

This proposal looks good to me; it seems to cover a wide range of needs. I do have a few notes.

The JSON format means that if anyone wants to use a scope named "default", it will not be possible to express a specific registry URL for it. Presumably we should simply forbid the scope name "default" and require all registries to ignore it? Alternatively, the .swiftpm/config/registries.json file should mark the default registry with a sigil that is not a valid scope name (perhaps *?).
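To make the collision concrete, here is a minimal sketch, assuming the registries file is modeled as a flat dictionary from scope name to registry URL. The `[default]` key, the `registryURL(forScope:in:)` helper, and the example.com URLs are hypothetical and not part of the proposal.

```swift
// Hypothetical sigil for the fallback registry. It is assumed that square
// brackets are never allowed in a scope name, so it cannot collide with one.
let defaultRegistryKey = "[default]"

/// Returns the registry configured for `scope`, falling back to the default entry.
func registryURL(forScope scope: String, in registries: [String: String]) -> String? {
    registries[scope] ?? registries[defaultRegistryKey]
}

// Illustrative configuration: a scope literally named "default" now coexists
// with the fallback entry instead of overwriting it.
let registries = [
    defaultRegistryKey: "https://registry.example.com",
    "default": "https://default-scope.example.com",
]
```

With a plain `default` key, the two entries above would occupy the same slot in the JSON object, which is the ambiguity described here.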

Nit: The URLs in the section "Set-mirror option for package identifiers" are ill-formed: they begin https:/// (an extra slash).

There has been a discussion up-thread about using .netrc files for credentials. I think this is a good idea and would go further and say that URLs with userinfo components should not be added to the registry file at all (error on configuration). This substantively mitigates the need to police against secrets leaking in these files.
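A rough sketch of what "error on configuration" could look like, assuming the check runs when a registry URL is written to the config file. The `validateRegistryURL` function and the `RegistryURLError` type are hypothetical names; the HTTPS requirement reflects the secure-connection requirement mentioned elsewhere in this thread.

```swift
import Foundation

// Hypothetical error type for this sketch.
enum RegistryURLError: Error, Equatable {
    case malformed
    case containsCredentials
}

/// Accepts an HTTPS registry URL and rejects any URL that carries a
/// userinfo component (user or user:password before the host).
func validateRegistryURL(_ string: String) throws -> URL {
    guard let components = URLComponents(string: string),
          components.scheme == "https",
          components.host != nil,
          let url = components.url else {
        throw RegistryURLError.malformed
    }
    if components.user != nil || components.password != nil {
        throw RegistryURLError.containsCredentials
    }
    return url
}
```

Rejecting at configuration time means a token can never reach the file in the first place, rather than relying on later scanning.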

Regarding package name length, it's worth noting that even if this proposal does not impose one, there will be a practical upper bound on name length, gated implicitly by the maximum size of the HTTP request target allowed by the union of proxies and origin server running the registry. That size is substantially larger than 128 characters (tending to cluster around 2000 for the full request-target, implying a length of around 1000 is almost certainly practical).

10 Likes

From the proposal

A formal specification for the package registry interface is provided alongside this proposal.

This was a problem with the last proposal, as well - there are no links to this specification that I could find from the document. It's really important that we consider both of these as being up for review together, IMO. I can't really review this until I see them both.


Just based on what is here, though: I like the idea of having package identities which are not tied to URLs. It's shockingly convenient to be able to say "this package depends on apple.swiftnio" and have things somehow... just work. You could imagine how great this would be if/when scripts are able to declare package dependencies.

One thing I am concerned about is privacy. How does SwiftPM know which registries contain which packages? Does it send the package IDs in plaintext? Because broadcasting your dependencies to every registry you have configured, fishing for one that claims to have it, is really not very private at all.

It would be great if there was a way to send an obfuscated version of the package ID, such that only the registry that holds the package would be able to know which package you're actually referring to. I'm not a crypto expert, so I'm not sure if that's possible. My concern here is less about GitHub, and more about oppressive governments who might spy on your internet traffic and be very interested in which of their residents are making use of things such as encryption technology hosted abroad.

Also, as I mentioned in the previous review, we should work to reject the user info component entirely as soon as possible. GitHub's strategy of embedding OAuth tokens in there is seriously flawed and they need to stop that ASAP - they actually recommend that you put the token in the username component! :man_facepalming:

2 Likes

Unicode case folding is context-insensitive and language-independent unless you specifically opt-in to Turkic mappings. [1] [2]

Would you have any concerns with the following transformations if we called it out more explicitly?

{Name} -> NFKC -> XID_ filter -> locale-independent case folding
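As a sketch of that pipeline (not the proposal's implementation), the three steps could be chained as below. Two assumptions are worth flagging: the "XID_ filter" is interpreted here as dropping scalars that lack the XID_Continue property, and the standard library's locale-independent `lowercased()` stands in for full Unicode case folding, which it only approximates. The function name is hypothetical.

```swift
import Foundation

/// Hypothetical helper: compatibility-normalize, filter, then fold case.
func normalizedPackageName(_ name: String) -> String {
    // Step 1: NFKC (compatibility decomposition followed by canonical composition).
    let nfkc = name.precomposedStringWithCompatibilityMapping

    // Step 2: keep only scalars with the XID_Continue property (assumed meaning of the filter).
    var filtered = String.UnicodeScalarView()
    for scalar in nfkc.unicodeScalars where scalar.properties.isXIDContinue {
        filtered.append(scalar)
    }

    // Step 3: locale-independent lowercasing, used as an approximation of case folding.
    return String(filtered).lowercased()
}
```

Because step 3 never consults a locale, `"I"` maps to `"i"` regardless of where the code runs.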


Swift Package Manager should be able to resolve a mixed package graph with identifiers and URLs, because the API responses from the registry provide mappings between the two. If a transitive dependency switched from URLs to IDs, everything would continue to work as expected.

Developers would continue to be able to resolve external dependencies using Git. The mappings provided in the API responses to GET /{scope}/{package} and GET /identifiers{?url} provide a migration path for adopting package identifiers.

Thanks for pointing that out. Our original format for package identifiers was @mona/LinkedList, but the Swift core team made a request to change that shortly before it came up again for review. As such, I haven't had much opportunity to think through the full implications of that change.

You're correct to point out the need for "default" to be spelled in such a way that it's not a valid scope name. I think [default] would be a nice alternative, but I'm happy to bikeshed that.

According to the current HEAD of main on apple/swift-package-manager (bbcfe08):

// --netrc-file option only supported on macOS >=10.13

Until .netrc is supported on all platforms, I don't think we can remove support for URL-encoded credentials.

Maximum URL sizes in HTTP payloads were my primary motivation for enforcing a size. Given the inconsistent practical limits in client and server HTTP libraries, I think it makes sense to enforce such a limit in the spec.

Aside from that, putting reasonable upper bounds on the size simplifies other implementation details. It's nice for setting column constraints in the database (and avoiding TOAST). It also protects against ReDoS attacks.

128 is indeed more conservative than what could safely fit in an HTTP payload, but comfortably large enough to accommodate the existing ecosystem. For instance, the longest name in SwiftPackageIndex/PackageList is 61 characters. If anyone can make the case for allowing names longer than the proposed 128, I'd be very interested to hear it.

Apologies for the inconvenience here. I couldn't think of a good way to durably link between these documents while they were still in-flight. The server specification and OpenAPI document will be located in the @apple/swift-package-manager repository. (Direct link to the spec).

Answering your questions individually:

  • Swift Package Manager doesn't know about the existence of registries beyond what's configured. And it doesn't know what those registries contain, except by virtue of requesting each one and either getting a 200 or a 404.
  • Our proposal requires client and server to communicate over a secure HTTPS connection. The package ID is encoded in the URL path, which is encrypted.
  • Each package is requested individually, in isolation from one another. If a default registry and a scoped registry were configured, the scoped registry would know only what packages you requested for that scope, and the default registry would know only about packages without that scope.

This is guaranteed by the transport-level security provided by HTTPS. If you wanted stronger guarantees about privacy, you could configure an intermediate proxy server, mirror packages on a registry using an obfuscated identifier, or vendor all dependencies to avoid fetching external dependencies entirely.

2 Likes

Perhaps I didn't express the issue sufficiently clearly. It does not matter what we do here; it can be absolutely "perfect," but unless every registry uses the exact same transformations (and they will not, because sites like GitHub are free to determine their own requirements for user names and repository names), then any transformation applied by SwiftPM, however sensible, will create opportunities for malicious actors to squat the transformed name on a registry where there is a discrepancy between the transformed name and the original name from the perspective of the registry's transforms.

For example, a Turkish repository may (very sensibly) opt into Turkic mappings. Then SwiftPM, by applying a locale-independent case folding, will create a security problem where none need exist.

2 Likes

Sorry for misunderstanding your concern. Two follow-up points to try to get on the same page:

I tried to translate that thought experiment into code, but I'm having trouble understanding what you mean:

import Foundation

// Package name declared as dependency
let name = "I" // LATIN CAPITAL LETTER I (U+0049)

// Registry with locale-independent case folding
let correct = name.folding(options: .caseInsensitive, locale: nil) // LATIN SMALL LETTER I (U+0069)

// Registry with locale-dependent case folding
let incorrect = name.folding(options: .caseInsensitive, locale: Locale(identifier: "tr_TR")) // LATIN SMALL LETTER DOTLESS I (U+0131)

// Registry returning arbitrary, incorrect string
let arbitrary = "foo"

// Swift Package Manager validating name of package returned by registry 
name.compare(correct, options: .caseInsensitive, locale: nil) == .orderedSame // true
name.compare(incorrect, options: .caseInsensitive, locale: nil) == .orderedSame // false
name.compare(arbitrary, options: .caseInsensitive, locale: nil) == .orderedSame // false

In our proposal, package names aren't necessarily derived from repository names, so I'm not sure how GitHub's (or any other hosts') naming policies for repositories would impact how package identities are resolved.

Small correction: we do have NSString.precomposedStringWithCompatibilityMapping in Foundation.

I may be mis-remembering something about the previous revision (and so I'd love to have your pushback) but here's my recollection: In the previous version there was concern that because the package id was a URL that it was difficult to move a package from one host to another. All users of the package would have had to change the URL.

While this revision changes the format of the package-id from a URL (and this is definitely a step forward), to move a package from one host to another it is still effectively necessary that each user of the package must change their configuration in order to specify a new URL to map the scope to a new registry, or to specify a new default registry. The mechanism has changed, but to move a package the user(s) still need to change from one URL to another. The difference between these two approaches seems to me to be quite marginal, and I don't really see how (this particular) aspect has improved.

Edit: the situation is better if we end up with a well-run meta-registry that points off to other hosting providers or child registries. For instance, if we end up with a successful and authoritative swift-package registry that makes it equally easy to host at GitHub, gitlab, etc. I guess my working assumption is that this won't happen, and that we'll simply have registries at GitHub, etc. As @NeoNacho points out above, the design does allow such a meta-registry, but I haven't heard anything to make me suspect that one would be created and will prevail: perhaps that's just a lack of optimism on my part :rofl:.

Yes, I agree this would be a useful requirement. But recall that there was a lot of pushback on the previous version, because reviewers did not believe that simply allowing a chain of redirects between hosts was a sufficient mechanism to deal with the problem of moving a package. Now we're back to saying that's acceptable after all?

That's probably more illustrative of the problem than you thought. From the docs:

locale
The locale to use for the folding operation. Pass nil to use the system locale.

So you're using the system locale, which means if this is a Turkish registry server using a Turkish system locale you'll get the Turkish behavior (dotless i). Probably not what you intended.

The reality is that it's very easy for bugs like this to creep in. A unit test wouldn't have caught that bug. Almost nobody tests their code outside of their system locale. It's almost a given that some registry implementations will have that bug.

1 Like

For what it's worth, I wrote the docs for that symbol, so I'm familiar with that behavior. The takeaway is that a registry getting that specific implementation detail wrong wouldn't prevent Swift Package Manager from catching it on the client-side. And I'm confident that a PR to apple/swift-package-manager will receive more code review than some code I wrote in a Swift Playground, especially now that it's been brought to everyone's attention. :wink:

2 Likes

Package identity

Hyphens may not occur at the beginning or end, nor consecutively within a scope.

Are there some older accounts without these conditions?

For example, https://github.com/lorem--ipsum has consecutive hyphens.
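For reference, the quoted hyphen rule can be sketched as a small validator. The quoted sentence only covers hyphens, so the restriction to ASCII letters and digits for the remaining characters is an assumption, and `isValidScope` is a hypothetical name.

```swift
/// Checks the hyphen rule quoted above: no leading, trailing, or consecutive hyphens.
/// Assumption: every other character must be an ASCII letter or digit.
func isValidScope(_ scope: String) -> Bool {
    guard let first = scope.first, let last = scope.last,
          first != "-", last != "-" else {
        return false // empty, or starts/ends with a hyphen
    }
    var previous: Character? = nil
    for character in scope {
        if character == "-" {
            if previous == "-" { return false } // consecutive hyphens
        } else if !(character.isASCII && (character.isLetter || character.isNumber)) {
            return false
        }
        previous = character
    }
    return true
}
```

Under this reading, an account name such as `lorem--ipsum` would be rejected as a scope, which is the case the question is asking about.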

A valid package name matches the following regular expression pattern:

\A\p{XID_Start}\p{XID_Continue}{0,127}\z

Can we require C99 extended identifiers, if restrictions are added to package and target names?
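The quoted pattern can also be checked without a regex engine, using the Unicode properties exposed by the Swift standard library. This is only a sketch of the rule as quoted (one XID_Start scalar followed by up to 127 XID_Continue scalars); `isValidPackageName` is a hypothetical name, and it counts Unicode scalars, which is an assumption about how the `{0,127}` quantifier is meant to be applied.

```swift
/// Mirrors \A\p{XID_Start}\p{XID_Continue}{0,127}\z over Unicode scalars.
func isValidPackageName(_ name: String) -> Bool {
    let scalars = Array(name.unicodeScalars)
    // At least one scalar, and no more than 1 + 127 in total.
    guard let first = scalars.first, scalars.count <= 128 else { return false }
    return first.properties.isXIDStart
        && scalars.dropFirst().allSatisfy { $0.properties.isXIDContinue }
}
```

Note that a leading digit or underscore fails the XID_Start check, and a hyphen fails XID_Continue, so `Linked-List` would not be a valid name under this pattern.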


New PackageDescription API

Package.Dependency.VersionBasedRequirement is a new type that provides the same interface as Package.Dependency.Requirement for version-based requirements, but excluding branch-based and commit-based requirements.

I'd prefer a Range<Version> parameter. The range arguments are succinct.

dependencies: [
  .package("apple.NIO", "2.0.0"..<"3.0.0"), // upToNextMajor(from:)
  .package("apple.NIO", "2.0.0"..<"2.1.0"), // upToNextMinor(from:)
  .package("apple.NIO", "2.0.0"..<"2.0.1"), // exact()
]
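To illustrate the equivalence claimed in those comments, here is a self-contained sketch with a simplified `Version` type. This is not SwiftPM's `Version`; it ignores pre-release and build metadata, and the free functions are hypothetical stand-ins for the existing requirement helpers.

```swift
/// Simplified semantic version (major.minor.patch only) for illustration.
struct Version: Comparable, ExpressibleByStringLiteral {
    let major: Int
    let minor: Int
    let patch: Int

    init(_ major: Int, _ minor: Int, _ patch: Int) {
        self.major = major
        self.minor = minor
        self.patch = patch
    }

    // Missing or non-numeric components default to 0 in this sketch.
    init(stringLiteral value: String) {
        let parts = value.split(separator: ".").map { Int($0) ?? 0 }
        self.init(parts.count > 0 ? parts[0] : 0,
                  parts.count > 1 ? parts[1] : 0,
                  parts.count > 2 ? parts[2] : 0)
    }

    static func < (lhs: Version, rhs: Version) -> Bool {
        (lhs.major, lhs.minor, lhs.patch) < (rhs.major, rhs.minor, rhs.patch)
    }
}

// Each helper expressed as a half-open range of versions.
func upToNextMajor(from v: Version) -> Range<Version> { v..<Version(v.major + 1, 0, 0) }
func upToNextMinor(from v: Version) -> Range<Version> { v..<Version(v.major, v.minor + 1, 0) }
func exact(_ v: Version) -> Range<Version> { v..<Version(v.major, v.minor, v.patch + 1) }
```

One caveat with the range spelling of `exact()`: once pre-release identifiers are modeled, a half-open range up to the next patch version may admit more than the single release, so the two forms are only equivalent under the simplification used here.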

Registry configuration subcommands

This proposal adds a new swift package-registry subcommand for managing the registry used for all packages and/or packages in a particular scope.

This could be implemented as a symlink, if apple/swift-package-manager#3276 is merged.

But would a swift package config set-registry alternative be possible?

1 Like

Make sure to call _ = Locale.current first if that's what the implementation plans on using.

1 Like

This is supported on all platforms that matter. SPM doesn't back-deploy on macOS beyond 1 release, so will never need to run before 10.13, and the * means that it will safely run everywhere else.

@lukasa My point in bringing up that particular line of code was to highlight that (AFAICT) only macOS is supported. Do you know the current status of .netrc support on Linux and Windows?

Ah shoot, I missed the #if os(macOS) guard just above it. Yeah, that appears to only be implemented on macOS due to limitations in the implementation of swift-corelibs-foundation. This restriction is...frustrating, and IMO we should aim for it to be lifted either by patching SCF or by patching SwiftPM to no longer require it.

2 Likes

It appears that some older GitHub usernames may not satisfy these conditions. I don't know exactly when that policy came into effect, but I wouldn't expect any such grandfathered usernames to be a problem.

We opted to go with something closer to the proposed C++ Identifier Syntax using Unicode Standard Annex 31, but C99 extended identifiers are certainly an option.

I don't have any strong opinions, so I'll defer to whatever @tomerd, @yim_lee, et al. have to say. We'll need to finagle the new API anyway to restrict to versioned releases, so perhaps this would be an easier way to accomplish that.

Yes, that spelling would work just as well. Again, I don't have any strong feelings, so I'm happy to defer to whatever the Swift Package Manager / Core Team prefer.

2 Likes

My primary concern is related to this; has this proposal thought through how a de-centralised eco-system of registries would actually work?

So say I go out and put all my packages in a custom public registry rather than what Github is going to provide. I put in my documentation that users should either set the registry for my scope or the default registry to my custom registry.

The first problem a consumer of packages from my registry would encounter - I'm assuming - is a failure where SwiftPM wouldn't know of a registry where any of my dependencies are.

Whose problem is this-

  1. No one's; we are not expecting to solve the problem of partial registries. Any registry out there will effectively need to at least proxy the world from the packages it hosts down. This seems like a large dis-incentive for custom registries but we should call it out if this is our expectation.
  2. Mine as the package owner; through documentation/README (in the absence of any other mechanism) will need to detail how to configure dependencies. This seems very fragile and also a dis-incentive for custom registries as adding a new dependency from a custom registry could cause resolution to fail for consumers.
  3. My package's consumer; they need to work through my package's dependency closure to determine where all packages are hosted. This just seems like a potential world of pain.

Outside of option 1 which is a solution, my feeling is that a package in a registry should be able to declare a registry of last resort for any scopes its dependencies use that aren't the hosting registry. This would have to be able to be ignored/disabled for situations where greater control over dependencies is required (such as corporate isolated builds).

There are likely other options but I think we should aim to ensure that consumers only need to configure where their dependencies are hosted when custom registries are involved rather than managing the entire dependency closure.

1 Like

I don't think there's anything wrong with Swift, SPM, etc. being hosted on GitHub. Although there are still very legitimate grievances that could be had with the overwhelming influence of GitHub in the industry.

We can't stop network effects from taking their course, but I think we should "do our part" and not accelerate them, by not outsourcing package URL resolution to GitHub. I know others can configure other package repositories, but realistically, not many will. GitHub's flywheel will just keep spinning.

2 Likes

+1 on the idea of using netrc and banning credentials in the config.

And addressing the corelib-foundation gap is the right approach: we should design for the ideal experience and safety and deal with implementation limitations for what they are.

Overall I see this as a reasonable proposal. I have a number of questions/concerns some of which I've raised outside of this thread and I'll raise them again to get input from the community. I'd love to see these addressed either in this proposal or in a future one after this is approved.

  1. Registry Only - Some users will want an option to only fetch from a registry (i.e. don't fall back to downloading from a URL specified in the Package.swift). This is important for use in organizations that want all package downloads to come from a private registry. I'd expect this to be a config option that a user can set. If the client doesn't find a package in the registry then fail instead of fetching by URL.
  2. Packages without an ID - How can I use a registry to fetch packages that only have a URL and don't (yet?) have a new-style ID (scope + name)? The GET /identifiers{?url} API doesn't seem to help with this. This is related to the first item. Ideally, even packages that do not have an ID could be added to and fetched from a registry so that an organization that wants to fetch all packages from a private registry would have a way to handle open source packages in their dependency graph that don't have an ID.
  3. Credentials - There has already been some discussion around avoiding passwords in URLs and whether .netrc works on non-mac platforms. But .netrc assumes the user has one set of credentials per host name while users may need to use different credentials with different private registries hosted by the same service provider (e.g. different Azure Artifacts registries hosted on pkgs.dev.azure.com). I'd suggest modeling username and password as separate CLI parameters when configuring a registry and storing them as separate fields. Ideally a project-level config file could specify which registries to use and credentials for each registry URL could be read from a user-level config file. See this comment where I compare how credentials are handled in npm. I'd prefer if the swift client supported something like NuGet credential providers but that would definitely warrant a separate proposal.
  4. List available files - For future alternative formats (such as binary XCFramework distributions) and for mirroring support I'd like if the registry protocol provided a list of files available for each release. (For previous discussion see my comment and mattt's reply).
  5. JSON manifest - When publishing a release to a registry I'd suggest converting the imperative Package.swift into a declarative data model (preferably JSON). Perhaps this is something that swift package archive-source could do? I realize that today Package.swift files can have non-deterministic behavior so this is probably better addressed in a separate proposal. (For previous discussion see this comment and this one).
  6. Publishing - This proposal calls out that publishing is out of scope. But developers using a private registry service (such as AWS CodeArtifact, Azure Artifacts, JFrog Artifactory or Sonatype Nexus) would benefit from having a standard registry API and CLI command for publishing. (This could easily be a separate proposal)
  7. Registering Package URLs - How does a package owner register primary and alternative URLs for a package? Should these be specified in Package.swift so that when a package is published (whether by a push or pull model) the registry becomes aware of them (and it may require some form of verification)?
  8. Centralized vs Decentralized - It appears to me that this registry design works best with a central public registry. It seems likely, whether intentional or not, that GitHub will become the de-facto public registry. If so, will GitHub allow package authors to distribute packages in GitHub public registry even if the package uses a different source code hosting provider? Or if the Swift community's intent is for a decentralized model then I'll echo Simon's question:

Although the original proposal had this kind of fallback behavior, the revised proposal doesn't. A package specified by ID won't resolve through a URL if that ID isn't known to the registry.

It is true that depending on a package by ID can pull in transitive dependencies declared by URL. Is this behavior something you think should be configurable?

Our proposal provides for registries to serve packages only by ID. Assignment of a package to an identifier is a function of the registry, so a package without an ID is one that the registry doesn't know about. The GET /identifiers{?url} endpoint is for reconciling URL-based dependencies with their IDs in a mixed dependency graph.

If a user or organization wants to control how URL-based dependencies are routed (for example, to resolve through an internal server), they can use the existing swift package config set-mirror subcommand to map the original URL to a mirrored location.

Alternatively, you could use a registry to proxy packages that lack a package identifier by configuring a custom registry on a scope named, for example, mirror and translating the normalized URL into a package name, like mirror.github_mona_LinkedList.
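As a sketch of that translation, assuming "normalized" means dropping the scheme and any trailing `.git`, and assuming the host is reduced to its first label (so `github.com` becomes `github`, matching the example name). The `mirroredIdentifier` function and its rules are hypothetical; a real mapping would also need to handle characters that aren't valid in package names.

```swift
import Foundation

/// Hypothetical mapping from a repository URL to an identifier in a `mirror` scope.
func mirroredIdentifier(for urlString: String, scope: String = "mirror") -> String? {
    guard let components = URLComponents(string: urlString),
          let host = components.host,
          let hostLabel = host.split(separator: ".").first else {
        return nil
    }
    // Drop a trailing ".git" so both clone URL spellings map to the same name.
    var path = components.path
    if path.hasSuffix(".git") { path.removeLast(4) }

    let segments = path.split(separator: "/").map(String.init)
    guard !segments.isEmpty else { return nil }

    // Join host label and path segments with underscores.
    let name = ([String(hostLabel)] + segments).joined(separator: "_")
    return "\(scope).\(name)"
}
```

Reducing the host to one label keeps names short but is lossy (two hosts sharing a first label would collide), so a production mapping would likely need something more careful.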

I'm unfamiliar with the kind of authentication providers you describe, so thanks for sharing that.

The structure of the registries JSON file allows for additional information to be stored alongside each registry URL. If we wanted to extend this command to support additional authentication information, it could look like this:

  { 
      "registries": {
        "example": {
            "url": "https://example.com",
+           "login": "mona"
        }
      }
  }
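On the client side, an optional field like this could be read without breaking existing files. The sketch below assumes the layout shown above; the `RegistryEntry` and `RegistryConfiguration` types and the `decodeRegistryConfiguration` function are hypothetical names, and `login` is the illustrative field from the example rather than anything the proposal defines.

```swift
import Foundation

// Hypothetical model of one entry in the registries file.
struct RegistryEntry: Codable, Equatable {
    let url: String
    let login: String? // optional, so files without it still decode
}

// Hypothetical model of the whole file.
struct RegistryConfiguration: Codable {
    let registries: [String: RegistryEntry]
}

/// Decodes the registries JSON from a string.
func decodeRegistryConfiguration(_ json: String) throws -> RegistryConfiguration {
    try JSONDecoder().decode(RegistryConfiguration.self, from: Data(json.utf8))
}

// Illustrative input: one entry with a login, one without.
let sampleJSON = """
{
  "registries": {
    "example": { "url": "https://example.com", "login": "mona" },
    "other": { "url": "https://other.example.com" }
  }
}
"""
```

Because `login` is optional, synthesized decoding treats a missing key as `nil`, so older configuration files remain valid.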

In my linked response, I offered that Link headers would support this sort of agent-driven content negotiation (that is, having a server list all of the available artifacts for a release). Would this approach be sufficient for the use cases you have in mind?

This is a complicated issue that has an impact beyond the registry itself, so I agree that this would best be discussed in a separate proposal.

There are a lot of questions around how exactly this should be supported, and I look forward to developing this in a follow-up proposal.

As mentioned in your previous point, publishing is out of scope for this proposal. I see this question of registering associated URLs as something to be determined as part of the publishing process. For now, the specification states that a registry should tell the client about repository URLs associated with the package.

Our proposal allows a default registry to be configured with the option to specify a custom registry for any scopes. A project can consult only a single registry for packages in a particular scope. As such, it doesn't matter if two registries disagree about the resolution of a particular package ID. By configuring a registry for your project (or globally in the user's home directory), you're telling Swift Package Manager to trust that registry to resolve packages falling within that configured scope.
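A minimal sketch of that selection rule: each identifier is routed to exactly one registry, the scoped one if configured and otherwise the default. The `PackageIdentifier` and `RegistryRouter` types and the example.com URLs are hypothetical; the `/{scope}/{package}` path shape comes from the endpoint mentioned earlier in the thread, and the `scope.name` identifier format is assumed from the examples in this discussion.

```swift
/// Hypothetical parsed form of an identifier such as "mona.LinkedList".
struct PackageIdentifier {
    let scope: String
    let name: String

    init?(_ string: String) {
        // Split at the first dot: everything before is the scope, the rest is the name.
        guard let dot = string.firstIndex(of: ".") else { return nil }
        let scope = String(string[..<dot])
        let name = String(string[string.index(after: dot)...])
        guard !scope.isEmpty, !name.isEmpty else { return nil }
        self.scope = scope
        self.name = name
    }
}

/// Hypothetical router: at most one registry is ever consulted per scope.
struct RegistryRouter {
    var defaultRegistry: String?
    var scopedRegistries: [String: String] = [:]

    /// Builds the URL for GET /{scope}/{package} on the single selected registry.
    func releasesURL(for identifier: PackageIdentifier) -> String? {
        guard let base = scopedRegistries[identifier.scope] ?? defaultRegistry else {
            return nil // no registry configured for this scope
        }
        return "\(base)/\(identifier.scope)/\(identifier.name)"
    }
}
```

Since the lookup returns a single base URL, a package in a scoped registry is never also requested from the default registry, which is why disagreement between registries doesn't arise.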

Since the overwhelming majority of public packages are hosted on GitHub, I agree that a GitHub-hosted package registry would likely become a popular default. However, the proposal doesn't preclude other service providers from creating their own package registries, or individual developers from using those offerings. We describe a few example applications of custom registries in our proposal.

Custom registries can serve a variety of purposes:

  • Private dependencies: Users may configure a custom registry for a particular scope to incorporate private packages with those fetched from a public registry.
  • Geographic colocation: Developers working under adverse networking conditions can host a mirror of official package sources on a nearby network.
  • Policy enforcement: A corporate network can enforce quality or licensing standards, so that only approved packages are available through a custom registry.
  • Auditing: A custom registry may analyze or meter access to packages for the purposes of ranking popularity or charging licensing fees.