URLs as Swift Package Identifiers

As long as the URI transformations are sufficiently rigorous. What you have already listed would fix all the oddities and glitches I have encountered so far, but I’m not an expert in the nuances of URIs.

While we are at it, it would be nice if we could explicitly codify the fact that the right side of this one is supposed to be accepted by SwiftPM and compliant tools (yes, I’m looking at you, Xcode):

I would even say it deserves to be the recommended form of the two for use in the manifest or on the command line. Unfortunately, the intent has so far been poorly defined, so one form or the other tends to cause crashes, sometimes with patch versions disagreeing about which one.


I like using URLs as identifiers.

There is also one fairly common use case that hasn't been mentioned in the OP: when you have to modify a transitive dependency to make your own code work, and you want to avoid forking every dependency that depends on it just to use your version until the change has been merged. For this I assume mirror-url could be used.


This is a big caveat, and I'm actually somewhat concerned about this. I cannot find the proper references to cite in a pinch, but URI handling is notoriously difficult, and it differs among even major browsers.

One example: It's not specified here what case- and normalization-insensitive comparisons are to be used, for instance; this is nontrivial because, if I recall correctly, RFCs having to do with URIs (or, at least, Punycode) make use of a normalization and case folding algorithm that's not otherwise commonly used, and while ICU data includes that information, the corresponding case folding APIs are not exposed in Swift.
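To make the concern concrete, here is a minimal sketch of what a case- and normalization-insensitive comparison might look like. It assumes Foundation's general-purpose lowercasing and NFKC normalization as stand-ins; that is an approximation, not the mapping used by the URI or Punycode RFCs, and `comparisonKey` is a hypothetical helper.

```swift
import Foundation

/// Builds an approximate comparison key: lowercase first, then apply NFKC.
/// Assumption for illustration only; this is NOT the IDNA/Punycode mapping.
func comparisonKey(_ identifier: String) -> String {
    return identifier.lowercased().precomposedStringWithCompatibilityMapping
}

let precomposed = "github.com/mona/caf\u{00E9}"   // "é" as a single scalar
let decomposed = "GitHub.com/Mona/cafe\u{0301}"   // "e" followed by a combining accent

// A byte-wise comparison treats the two spellings as different strings.
let sameBytes = Array(precomposed.utf8) == Array(decomposed.utf8)

// After lowercasing and normalizing, the keys are byte-for-byte identical.
let sameKey = Array(comparisonKey(precomposed).utf8) == Array(comparisonKey(decomposed).utf8)

print(sameBytes, sameKey)
```

Any two tools that disagree on this key function would disagree on whether the two spellings name the same package, which is exactly the kind of ambiguity described above.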

None of this is about pursuing perfection for its own sake. Rather, I would expect that any discrepancy or ambiguity in mapping URIs to a code package could become a security exploit, allowing unintended code to be downloaded, compiled, and/or executed.

For that reason, I would greatly urge thorough consideration of conventions or standards other than URIs for this purpose. Moreover, if we are to move forward with URIs in this role, then the applicable transformations should not be hidden from the discussion under a disclosure triangle and would need to be vetted in detail.


Personally I’m strongly against such syntax. The @package annotation, as an interface of SwiftPM, is okay to be ignored in environments other than scripts (maybe with a note message). A modification on import syntax, however, would lead to great inconsistency across code environments. How to deal with such syntax in Swift packages? In single-file compilation process? If we simply ignore it, the code may break, which is not what we’d like to see.


URLs seem like the perfect answer; they solve one of the biggest pain points of other package managers.

I love seeing the proxy-registry. Would it also be possible to configure registries for different targets, instead of one config? I think that would be a great addition, since it allows flexibility between your targets/machines/registries and networks (separation of release builds and release artifacts).


I do use the SPM quite a bit, but have not been following this proposal, so take what I say with some salt.

My observation is more general. My background is decentralised systems (e.g. https://ensembles.io and LLVS, the Low-Level Versioned Store, at mentalfaculty/LLVS on GitHub). Every decentralized system I have been involved with has included some form of global identification. Not including any form of global identification introduces great challenges.

I have seen these problems often in practice. To give an example most will have some familiarity with, Apple's Core Data sync went with a system with no global identification of objects. This means every app adopting it has to introduce complex de-duplication logic of its own.

What does this have to do with SPIs? The danger here is that the proposal will go down the same path. Yes, it may at first seem easier to just treat every repo as independent, with every package unrelated and thus identified by a different URL. This is effectively the choice that Apple made in their sync architecture.

The cost comes when you try to relate one package with another in some way. Is it the same package? Can one package be substituted for another? I don't know how the repo manager works internally, but I assume these are the types of questions it has to know the answer to. I fear that by ignoring the existence of forks etc, which are extremely prevalent in our world, we are severely limiting the potential solution.

Adopting a reverse-DNS style identifier such as the one suggested here would fundamentally change the package ecosystem from a federated, decentralized model into one with a central naming authority and governing body.

I actually think the opposite is true here. The point of using reverse DNS, rather than just an arbitrary name (eg LinkedList), is exactly to have a system of generating unique identifiers in a decentralised manner which are extremely unlikely to clash. We want to avoid having a centralised authority. This approach has been used with success in many cases, including UUIDs in decentralised storage (eg Git), and the registration of reverse DNS names for apps.


I suppose one argument against using URIs as identifier is that it ties identity to location, possibly making it harder for mirroring and caching. And what about moving a package between hosting providers, or even accounts within a single hosting provider (which happens relatively often)? Should that trigger a name change?

On the other hand easier adoption is a strong argument.

But as others have mentioned, using URIs as identifiers relies on everyone using the same normalisation. One of the big problems with that is that bugs here are a pretty silent failure until you hit an edge case way down the line. And the problem is deceptively simple enough that you're tempted into rolling your own, missing something crucial in the process.

For instance, in the Swift Package Index we use URL normalisation and a unique index for our packages table and even though we control all the pieces ourselves we've had some duplicates slip through if I remember correctly.

Maybe a middle ground could be a small package that provides canonical URI normalisation, sort of like symbol mangling isn't something you'd implement yourself?

That still leaves the question what if the rules were to change or be expanded in the future. Wouldn't there need to be some sort of mechanism - like normalisation versioning? - that ensures every part of the chain is using the same rules?

Having said all that, it may be worth looking at the error case when things don't line up. In the case of the SPI it wasn't really all that critical: a duplicate isn't the end of the world. So, what happens in this case here if URIs collide or don't collapse to the same identity? Are there going to be any critical silent errors? Or is it just a matter of having to review urls and fix any inconsistencies manually?

Given that tooling around registries is probably going to be heavily automated, maybe URI normalisation would lead to subtle and hard to debug issues with cause and effect far removed from each other?

Is it perhaps feasible to use URI normalised identity as a fallback in the absence of a reverse DNS identity being defined? That could fix both the adoption issue while also providing stable identifiers.

Speaking purely in terms of the Swift Package Index, I think a true package identifier would be helpful in trying to weed out duplicates if/when we start ingesting from multiple sources, some of which might include mirrored packages.


I haven't been following this proposal closely and don't know if these points were discussed somewhere else. However, I would like to address some things I picked up along the way working mainly with the JVM, JS and Python ecosystems.

  • I would suggest taking a look at existing specs that try to solve this problem, like package-url.

  • Many package managers split the package identification from its location via a repository/registry concept.

I suppose one argument against using URIs as identifier is that it ties identity to location, possibly making it harder for mirroring and caching.

This. Everywhere I've worked so far had its own internal registry/repository and/or mirror for all the packages that were used, either for caching and faster pull times or for compliance reasons (more on this later).

A package repository represents the complete Bill of Materials (BoM) in a defined scope, whether it's the whole company or some single project.

Tying the package identity to one specific URL would break many such flows.

  • The URL alone does not identify a package without a version number attached to it, i.e. https://github.com/OctoCorp/linkedlist is just a URL pointing to all the versions of the package.

  • Compliance/Security: IMHO, one of the main goals should be to focus on all the required automation around packages that would follow, especially in relation to regulation, security and compliance purposes.

Almost all of the clients we've had required a very detailed and fine-grained Bill of Materials (BoM) describing the project. For example, they wanted to know about each package, its origin, its version, the complete licence text, the complete copyright information, even SHA checksums and whether there are any known CVEs for the specified version. (Side note: we are using CycloneDX as our preferred choice for an SBoM format.)

Thus packages should be scanned against a list of allowed licenses and against CVE databases, should include license information, etc., which means a package should be globally identifiable regardless of its location. This, however, doesn't mean a centralised model. See: Comment 7 by drewmccormack


I guess what I'm trying to say is that package identification should not be conceptualised without taking all of these use cases into consideration.


This was discussed at some length by @Karl in the review thread:

I don't want to diminish the potential for ambiguity or complexity in parsing URIs. However, I do think it's important to focus on how package identifiers would be used.

The list of URI transformations in the proposal is specific to Swift Package Manager, and is a way to reduce the incidence of duplicate nodes in the dependency graph. When the user runs swift build --enable-package-registry, SPM takes each canonicalized URI, prepends the https:// scheme, and sends a HEAD request to the resulting URL to see whether the server supports the Swift package registry interface. If the server responds accordingly, SPM attempts to resolve that dependency through the registry interface; otherwise, it falls back to Git (unless the user opts out by setting --disable-repository-fallback).
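The flow described above can be sketched roughly as follows. The canonicalization rules here are an illustrative subset chosen for this example (drop the scheme, drop userinfo, treat SCP-style `host:path` as `host/path`, drop a trailing slash and `.git`, lowercase); they are assumptions, not the proposal's actual list, and `canonicalIdentity` and `registryProbe` are hypothetical names. No request is sent.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Hypothetical, simplified canonicalization (an assumed subset of rules).
func canonicalIdentity(for location: String) -> String {
    var identity = location.lowercased()
    var hadScheme = false
    // Drop a scheme such as "https://" or "ssh://".
    if let scheme = identity.range(of: "://") {
        identity = String(identity[scheme.upperBound...])
        hadScheme = true
    }
    // Drop userinfo such as "git@", but only if it precedes the first "/".
    if let at = identity.firstIndex(of: "@"),
       !identity[..<at].contains(where: { $0 == "/" }) {
        identity = String(identity[identity.index(after: at)...])
    }
    // Treat SCP-style "host:path" as "host/path" when there was no scheme.
    if !hadScheme, let colon = identity.firstIndex(of: ":") {
        identity.replaceSubrange(colon...colon, with: "/")
    }
    // Drop trailing slashes and a ".git" suffix.
    while identity.hasSuffix("/") { identity.removeLast() }
    if identity.hasSuffix(".git") { identity.removeLast(4) }
    return identity
}

/// Builds (but does not send) the HEAD request used to probe for registry support.
func registryProbe(for identity: String) -> URLRequest? {
    guard let url = URL(string: "https://" + identity) else { return nil }
    var request = URLRequest(url: url)
    request.httpMethod = "HEAD"
    return request
}

// Both spellings reduce to the same identity under these assumed rules.
let fromHTTPS = canonicalIdentity(for: "https://github.com/mona/LinkedList.git")
let fromSSH = canonicalIdentity(for: "git@github.com:mona/LinkedList.git")
let probe = registryProbe(for: fromHTTPS)
```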

Any dependency URLs declared in Package.swift were put there by the developers authoring that package. Any transitive dependencies were put there by the maintainers of those direct dependencies. Would you say that the potential for security exploits due to URI parsing ambiguity is distinct from the risks inherent in importing 3rd-party code?


Swift Package Manager is currently unable to answer these kinds of questions. If the same package is declared with different URLs (only the last path component is considered, case-insensitive), there's no way to reconcile them. There are various heuristics that could be implemented to determine if the contents of two repositories are equivalent (e.g. comparing their histories), but we aren't doing that yet.
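As a rough illustration of why last-path-component identity cannot reconcile or distinguish packages, consider this sketch. `legacyIdentity` is a hypothetical helper, the second URL is made up for the example, and stripping a `.git` suffix is an added assumption; only "last path component, case-insensitive" comes from the description above.

```swift
/// Hypothetical model of the current behavior: last path component, lowercased.
func legacyIdentity(for location: String) -> String {
    var last = location.split(separator: "/").last.map { String($0) } ?? location
    // Assumption: a ".git" suffix is not part of the identity.
    if last.hasSuffix(".git") { last.removeLast(4) }
    return last.lowercased()
}

// Two unrelated repositories end up with the same identity...
let mona = legacyIdentity(for: "https://github.com/mona/LinkedList.git")
let other = legacyIdentity(for: "https://gitlab.example.com/octocorp/linkedlist")

// ...so the identity alone cannot say whether they are the same package or a clash.
let collide = (mona == other)
```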

The proposed registry interface is better equipped, thanks to HTTP's semantics for relocating and canonicalizing resources. A server can respond with a 301 Moved Permanently or a Link: <...> rel="canonical" header field to establish equivalence between the requested resource and what's returned.
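A client could pick up such a hint by reading the Link header. The following is a deliberately simplified parser sketch: it assumes no commas or semicolons appear inside URLs or quoted parameter values, and `LinkValue` and `parseLinkHeader` are hypothetical names rather than SwiftPM API.

```swift
import Foundation

struct LinkValue {
    let target: String
    let parameters: [String: String]
}

/// Simplified Link header parser (assumes no "," or ";" inside URLs or quoted values).
func parseLinkHeader(_ header: String) -> [LinkValue] {
    return header.components(separatedBy: ",").compactMap { (entry: String) -> LinkValue? in
        let parts = entry.components(separatedBy: ";")
            .map { $0.trimmingCharacters(in: .whitespaces) }
        // The first part must be the "<target>" reference.
        guard let first = parts.first, first.hasPrefix("<"), first.hasSuffix(">") else {
            return nil
        }
        let target = String(first.dropFirst().dropLast())
        var parameters: [String: String] = [:]
        for part in parts.dropFirst() {
            let pair = part.split(separator: "=", maxSplits: 1).map { String($0) }
            guard pair.count == 2 else { continue }
            // Parameter names are case-insensitive; values may be quoted.
            parameters[pair[0].lowercased()] =
                pair[1].trimmingCharacters(in: CharacterSet(charactersIn: "\""))
        }
        return LinkValue(target: target, parameters: parameters)
    }
}

let canonicalLinks = parseLinkHeader("<https://github.com/mona/LinkedList>; rel=\"canonical\"")
let canonicalTarget = canonicalLinks.first { $0.parameters["rel"] == "canonical" }?.target
```

The same parser would also read other relation types, such as a mirror advertised with rel=duplicate.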

I agree with you that we want to avoid having a centralized authority. However, I believe that's the only way (other than using random collision-free identifiers like UUIDs) to ensure coordination of namespaces. The example you cite with app names is instructive: Apple is the gatekeeper for who can publish apps under which identifiers. Without a central name registry, everyone would have equal claim to packages published under, for example, the org.swift namespace.

The proposed registry specification is designed explicitly with caching and mirroring in mind.

4.4. Fetch source archive

A client MAY send a GET request for a URI matching the expression /{package}/{version} to retrieve a release's source archive. A client SHOULD set the Accept header to application/vnd.swift.registry.v1+zip and SHOULD append the .zip extension to the requested URI.

GET /github.com/mona/LinkedList/1.1.1.zip HTTP/1.1
Host: packages.example.com
Accept: application/vnd.swift.registry.v1+zip

If a release is found for the requested URI, a server SHOULD respond with a status code of 200 (OK) and the Content-Type header application/zip. Otherwise, a server SHOULD respond with a status code of 404 (NOT FOUND).

HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: public, immutable
Content-Type: application/zip
Content-Disposition: attachment; filename="LinkedList-1.1.1.zip"
Content-Length: 2048
Content-Version: 1
Digest: sha-256=a2ac54cf25fbc1ad0028f03f0aa4b96833b83bb05a14e510892bb27dea4dc812
ETag: e61befdd5056d4b8bafa71c5bbb41d71
Link: <https://mirror-japanwest.example.com/mona-LinkedList-1.1.1.zip>; rel=duplicate; geo=jp; pri=10; type="application/zip"
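On the client side, the request in section 4.4 could be assembled along these lines. This is a sketch only: `sourceArchiveRequest` and `looksLikeArchive` are hypothetical helpers, the registry base URL is taken from the example's Host header, and nothing is sent over the network.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Builds the GET request for /{package}/{version}.zip with the recommended Accept header.
func sourceArchiveRequest(registry: String, package: String, version: String) -> URLRequest? {
    guard let url = URL(string: "\(registry)/\(package)/\(version).zip") else { return nil }
    var request = URLRequest(url: url)
    request.httpMethod = "GET"
    request.setValue("application/vnd.swift.registry.v1+zip", forHTTPHeaderField: "Accept")
    return request
}

/// Checks the two response properties the excerpt calls out: 200 and application/zip.
func looksLikeArchive(statusCode: Int, contentType: String?) -> Bool {
    return statusCode == 200 && contentType == "application/zip"
}

let archiveRequest = sourceArchiveRequest(
    registry: "https://packages.example.com",
    package: "github.com/mona/LinkedList",
    version: "1.1.1"
)
```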

Furthermore, a package's identity and its location are only related by default.

The URI canonicalization described in the specification is specific to Swift Package Manager. On the server side, the package URI can be treated as a string (with the same security considerations as any user-provided input). The only requirement of the server is that it treat this package identifier string as case- and normalization-insensitive.

Thanks for sharing this. I think the details about how purls are decoded are instructive and demonstrate the feasibility of using a URL-compatible identifier.

What we're proposing here is closer to what Go does for module paths.

import (
	"fmt"
	"github.com/mona/linkedlist"
) 

We've designed the registry specification with this exact use case in mind.

4.2.1. Package release metadata data standards

A server MAY include metadata fields in its package release response. It is RECOMMENDED that package metadata be represented in JSON-LD according to a structured data standard. For example, this response using the Schema.org SoftwareSourceCode vocabulary:

{
  "@context": ["http://schema.org/"],
  "@type": "SoftwareSourceCode",
  "name": "LinkedList",
  "description": "One thing links to another.",
  "keywords": ["data-structure", "collection"],
  "version": "1.1.1",
  "codeRepository": "https://github.com/mona/LinkedList",
  "license": "https://www.apache.org/licenses/LICENSE-2.0",
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "name": "Swift",
    "url": "https://swift.org"
  },
  "author": {
      "@type": "Person",
      "@id": "https://example.com/mona",
      "givenName": "Mona",
      "middleName": "Lisa",
      "familyName": "Octocat"
  }
}
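For tooling that needs fields such as license and origin, a client might decode a subset of this response with Codable. This is a sketch under assumptions: `PackageMetadata` is a hypothetical type that reads only some of the fields shown above, and real JSON-LD processing involves more than plain JSON decoding.

```swift
import Foundation

/// Hypothetical model covering a subset of the Schema.org fields in the example.
struct PackageMetadata: Decodable {
    let type: String
    let name: String
    let version: String
    let codeRepository: URL
    let license: URL
    let keywords: [String]

    enum CodingKeys: String, CodingKey {
        case type = "@type"
        case name, version, codeRepository, license, keywords
    }
}

func decodeMetadata(from json: String) throws -> PackageMetadata {
    return try JSONDecoder().decode(PackageMetadata.self, from: Data(json.utf8))
}

let sampleResponse = """
{
  "@context": ["http://schema.org/"],
  "@type": "SoftwareSourceCode",
  "name": "LinkedList",
  "keywords": ["data-structure", "collection"],
  "version": "1.1.1",
  "codeRepository": "https://github.com/mona/LinkedList",
  "license": "https://www.apache.org/licenses/LICENSE-2.0"
}
"""

if let metadata = try? decodeMetadata(from: sampleResponse) {
    print("\(metadata.name) \(metadata.version) from \(metadata.codeRepository)")
}
```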

Future directions

Security auditing

The response for listing package releases could be updated to include information about security advisories.

{
    "releases": { /* ... */ },
    "advisories": [{
        "cve": "CVE-20XX-12345",
        "cwe": "CWE-400",
        "package_name": "github.com/mona/LinkedList",
        "vulnerable_versions": "<=1.0.0",
        "patched_versions": ">1.0.0",
        "severity": "moderate",
        "recommendation": "Update to version 1.0.1 or later.",
        /* additional fields */
    }]
}

Swift Package Manager could communicate this information to users when installing or updating dependencies or as part of a new swift package audit subcommand.

This feature isn't designed yet, so that's certainly a direction we could take. I look forward to hashing out the exact behavior in a future pitch thread.


Forgive me if I misread the core team’s prompt. But the following text reads to me like the topic here is about much more than how Swift package identifiers will be used in this proposal:

  1. Package identities are a critical component of this proposal and the Swift package ecosystem. When the core team discussed the various opinions brought up in the review regarding using URLs as identities compared to name-spaced identities @Douglas_Gregor brought up the point that using a stable and unique name-spaced package identity would also provide the infrastructure to resolve module name conflicts (e.g. Namespacing of packages/modules, especially regarding SwiftNIO ), which would be a great benefit to the ecosystem. This topic needs to be further explored in a dedicated forum thread in preparation to the next revision.

An identity for the ecosystem which is to be used as infrastructure to solve other problems, which is what the core team wants to talk about, would necessarily be more than what you’re talking about here.

And the problem I have is this: if URIs with the proposed heuristics applied are to be that identity for the ecosystem, then they do not meet the requirements for an identifier at face value, those being (it would seem to me, perhaps naively):

  • Every package should have one and only one identifier
  • Every identifier should correspond to one and only one package

The heuristics proposed here for URIs partly reduce the number of identifiers for one package, but as others mention, if the package moves from GitHub to GitLab, then the heuristics are not enough.

What causes me heartburn, though, is that if normalization rules are not chosen correctly (i.e., consistently with the tools that will actually resolve the URIs and fetch the code), the same identifier could identify more than one package. This may not be an issue for what you propose here, but it does not seem to me to be a suitable identity for the ecosystem. And if that’s what the core team wants out of it, I think we need to discuss alternatives.


I don’t want to muddy the waters too much with a lot of to-ing and fro-ing, so I will try to keep it short.

My main point here is that by having no way to recognize associations between packages, you may be inadvertently restricting future features, even if they are not needed now. Perhaps falling back on the URL as an identifier where a reverse-DNS identifier is not present is a good option: it is friendly for users now, but allows package devs to prepare already by labeling their package with a unique id. Then there will never be a need to fall back to heuristics in future.

A smaller point: Apple is not really a gatekeeper for app bundle ids. It is true they will enforce it when submitting to their App Store, but bundle ids can be created by anyone without registration. You can release a Mac app with any id you like, and it can be installed on a macOS system. Collisions are very unlikely if devs do stick to the reverse DNS format. Worrying about reverse dns collisions is like worrying about uuid collisions: in practice it doesn’t happen.


That's how I understand their feedback as well, and how I discuss it in the "Future directions" section of our updated proposal. There I describe how the proposed package identifier scheme could be used to support module name collision resolution and importing external modules without a package manifest. Are there any other problems that I need to consider as part of this proposal? Do you have any questions or concerns about the solutions as described?

The heuristics for package URL canonicalization are not a solution to that problem; it is addressed by other parts of the proposal.

If a package maintainer wants to use their own domain to identify their package, the registry makes this easy to do.

For example, if the SSWG wanted to identify SwiftNIO as swift.org/server/swift-nio, they could do that by adding a Link header in the HTTP response for that URL:

HTTP/1.1 200 OK
Content-Type: text/html
Link: <https://github.com/apple/swift-nio>; rel="service"

(Alternatively, the server can implement the package registry interface directly, and use GitHub / GitLab / whatever URLs for source archives.)

On the client-side, Swift Package Manager could establish a connection between swift.org/server/swift-nio and github.com/apple/swift-nio, using that information to resolve any transitive dependencies that use the GitHub URL.

It's similar to how DNS works generally. Some people have a website hosted on Medium.com and use an @icloud.com email address. Some people purchase their own domain and either self-host their website and email directly, or delegate to another provider using CNAME / ALIAS / MX records.
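One way a client might track the connection between swift.org/server/swift-nio and github.com/apple/swift-nio is a small equivalence table. This is a sketch, not SwiftPM's implementation: `IdentityGraph` is a hypothetical type, and choosing the declared identity as the representative is an assumption made for the example.

```swift
/// Hypothetical table that groups identities known to name the same package.
struct IdentityGraph {
    private var representative: [String: String] = [:]

    /// Follows recorded links to the representative identity (case-insensitive).
    func find(_ identity: String) -> String {
        var current = identity.lowercased()
        while let next = representative[current], next != current {
            current = next
        }
        return current
    }

    /// Records that `identity` is served by `location`, e.g. from a Link header.
    mutating func link(_ identity: String, servedBy location: String) {
        let a = find(identity), b = find(location)
        if a != b { representative[b] = a }
    }
}

var graph = IdentityGraph()
graph.link("swift.org/server/swift-nio", servedBy: "github.com/apple/swift-nio")

// A transitive dependency declared with the GitHub URL now maps to the same node.
let resolvedDirect = graph.find("swift.org/server/swift-nio")
let resolvedTransitive = graph.find("github.com/apple/swift-nio")
```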

I'm curious to hear your thoughts about this, as it would help me better understand your concerns.

What's being proposed is the equivalent of submitting to the App Store. For a package to be shared, it would need to be pushed to a registry, which means enforcing identifiers.

That's assuming that actors aren't intentionally trying to cause collisions in an attempt to forge packages. Without a central naming authority, I'm unable to see how reverse-DNS can provide a workable solution.


Could reverse-DNS for SwiftPM be achieved by registering a new .well-known service?

For example, https://www.example.com/.well-known/swift-package-collection might be a JSON file from SE-0291, listing the packages which are allowed to use the com.example. prefix. (I'm not sure what happens about the www part.)


A .well-known service is an option worth considering for associating named resources like packages with a domain. You could use it for authentication, similar to Apple Pay, or as a meta resource, like associated domains for iOS and macOS apps.

There are a few different ways that .well-known could be used, and I'd very much welcome a more developed proposal that could be workshopped and debated.

This could work for individuals or a single organization with a small number of packages. However, enumerating a complete list of packages would be infeasible for hosts like GitHub or GitLab.

Another consideration is that requiring a host to enumerate a complete list of packages could leak information about the existence of a package.

My understanding is that you'd need to host .well-known directories on subdomains as well.


Within manifests, package identity could be improved as you've suggested, possibly by updating Target.Dependency.product(name:package:condition:) to accept a package URL.

There are also subcommands which take package names:

  • swift package edit
  • swift package resolve
  • swift package unedit
  • swift package update

Should these subcommands be updated to accept package URLs? Would this affect a package which has already been put into editable mode?

Xcode has an alternative to swift package edit. A package dependency can be overridden, by adding a local package with the same name:

Should this be replaced by proper edit/unedit commands within Xcode? Or should it fall back to using the Git remote URLs of the local package?

I'll try to give a more detailed example, but I only discovered RFC 8615 recently (for this thread).

If the package manager/registry wants to verify that a com.apple::swift-nio-* package or com.apple::NIO* module is allowed to use the com.apple::* namespace, it could make a request to:

https://apple.com/.well-known/swift-package-collection

This can redirect to a subdomain:

https://www.apple.com/.well-known/swift-package-collection

If the response is an error (e.g. 404 Not Found), then the com.apple::* namespace isn't restricted (which might give a warning/error by default).

If the response is a package collection, then the com.apple::* namespace is restricted to package URLs in the collection.
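The check described above could be sketched as follows. Everything here is hypothetical: the function and type names are made up, the collection is reduced to a set of package URLs, and redirects (such as the one to the www subdomain) are assumed to be followed by the HTTP client rather than modeled.

```swift
import Foundation

/// Maps a reverse-DNS namespace such as "com.apple" to its .well-known URL.
func wellKnownURL(forNamespace namespace: String) -> URL? {
    let host = namespace.split(separator: ".").reversed().joined(separator: ".")
    return URL(string: "https://\(host)/.well-known/swift-package-collection")
}

enum NamespacePolicy {
    case unrestricted
    case restricted(Set<String>)
}

/// An error response leaves the namespace unrestricted; a collection restricts it.
func policy(statusCode: Int, collectionPackageURLs: Set<String>) -> NamespacePolicy {
    return (200..<300).contains(statusCode) ? .restricted(collectionPackageURLs) : .unrestricted
}

func isAllowed(_ packageURL: String, under policy: NamespacePolicy) -> Bool {
    switch policy {
    case .unrestricted:
        return true
    case .restricted(let allowed):
        return allowed.contains(packageURL)
    }
}

let appleURL = wellKnownURL(forNamespace: "com.apple")
let applePolicy = policy(statusCode: 200,
                         collectionPackageURLs: ["https://github.com/apple/swift-nio"])
```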

Alternatives:

GitHub or GitLab wouldn't need to put the packages of other organizations/users in a collection.

The com.apple::* namespace would only be used for public, open-source packages. Perhaps the org.swift::* or com.apple.opensource::* namespaces are more suitable? (Existing closed-source system frameworks have com.apple.* bundle identifiers.)


Thanks for taking the time to think about this, Ben. I really appreciate having something concrete to discuss and debate the merits of.

At least for now, external dependencies resolved through a registry can't be edited, so this has no impact on edit or unedit. It's not clear whether that functionality can/should be added in the future, but that's our starting point.

For resolve, our plan is to continue to use the name argument initially and add support for package URLs later. We may want to bolster that command to provide more information about how a dependency is resolved. I started to sketch that out in this gist (I use the spelling swift package discover, but this could be folded into resolve instead).

Same for update as for resolve; we can ship registry support as-is and add package URL arguments later.

Overall, I think this is a workable approach to using reverse-DNS identifiers. I appreciate your thinking through how this might work.

What do you see as the benefits of a reverse-DNS approach over the URI approach being proposed?


I think packages should be identified by URL, for the addressability and compatibility reasons in your proposal.

I'm not so sure about the future direction, where module name collisions are resolved by introducing aliases in the manifest. Is it sufficient to alias only the products of direct dependencies? Can there also be collisions due to indirect dependencies?

The benefits of reverse-DNS namespaces are:

  • they are chosen by the original author, whereas aliases would be an ad hoc solution.

  • they can appear in documentation to promote usage, thereby avoiding collisions by default.

  • they can be extended to Xcode SDKs, e.g. the closed-source com.apple::System and open-source org.swift::System (SystemPackage FAQ).

I've suggested the :: separator, so that the namespace is still obvious in the case of submodules.

The goal of this discussion is for us to arrive at a package identity scheme that would help address these technical requirements:

  • SwiftPM must be able to dedupe packages, since otherwise we end up with duplicate symbols (build issues) and violate the assumption that there is a single copy of the code in the memory space (runtime issues).
  • Using unique and unambiguous package identifiers would give SwiftPM the infrastructure to generate unique and unambiguous module names. This would allow us to solve a long-standing issue in Swift where modules from different packages can collide (e.g., different packages that vend a Utilities module).

Suppose we set aside implementation details for a moment and consider the two proposed approaches in more generic terms:

  • URLs as identifiers => location-based identifiers
  • Reverse-DNS identifiers => opaque identifiers

Let’s take a closer look at how SwiftPM would make use of each identifier type and list the pros and cons.

Location-based package identifiers

Besides identifying a package, this type of identifier (presumably a URL/URI) also provides a way of locating either:

  • The registry that hosts the package: The server at the specified host component of the location either confirms it is a package registry or redirects SwiftPM to the associated registry. SwiftPM will then interact with the registry via the proposed registry APIs.
  • The Git repository of the package: If a registry cannot be located at this location, SwiftPM will attempt to treat it as a Git address and perform Git operations against that location.

Pros

  • Easier transition. The model is closer to how SwiftPM works today and existing packages can take advantage of registry support without any modifications (e.g., configuration can be omitted since registries can be inferred from URL/URIs), increasing adoption of the registry from day one.
  • URLs inherently provide a mechanism for ownership verification. It is much more difficult to “steal” an identity other than through network-level attacks. Name-squatting and social engineering around URL mapping are possible, but difficult.
  • Automatically falling back to Git helps mitigate broken registries (the escalator-to-stairs analogy). This has some complications when it comes to deduplication, since the Git protocol is not aware of “redirection” the way the registry HTTP protocol is.
  • URL/URIs are unique identifiers that we can use to generate unique module names. This works if we can also reliably deduplicate packages.

Cons

  • Binding identity with location makes reliable deduplication hard.
    • This is especially problematic for renamed or transferred repositories. It is fairly common for a project to start out as someone’s personal repository and later move to an organization account, which would change its identity. Another common example is a package that starts in an internal corporate setting and later moves to the public space. Some SCM hosting providers support both the old and new URLs for Git clones, which means we could end up fetching the same package under different identities if a package graph includes both URL/URIs. One real-world example of this problem is SR-11338 (“Package resolution fails with "the Package.resolved file is most likely severely out-of-date and is preventing correct resolution; delete the resolved file and try again"”, Issue #4673 in apple/swift-package-manager on GitHub). The proposal suggests working around such issues using the SwiftPM URL mirroring feature and/or setting up intermediate proxies.
    • Unclear solution for reconciling different transports (SSH vs. HTTPS) in the general case. Mapping rules adopted by one SCM hosting provider may not be applicable to others. For example, a private BitBucket instance can even use custom configuration to affect this.
  • Coupling identity and location can be confusing from a UX point of view, especially when a package is renamed or moved. Go works around this by introducing “vanity URLs” (see “Vanity import paths in Go” by Márk Sági-Kazár), but that requires domain ownership, and verification would add a layer of complexity. Vanity URLs also require you to think about them up front, which is unlikely since there is no strong technical reason to define one when just starting out; it is only when you want to move the package that you realize you needed a vanity URL to begin with.
  • Since URL/URIs include the hostname, using them as identifiers brings lock-in to a certain SCM hosting provider, especially if we also use the identifier to generate module names, so that it becomes part of module import statements.
  • Artifact management systems that are not coupled with a source control system (e.g. Artifactory, CodeArtifact) are key to the behind-the-firewall/enterprise use cases: they are relied upon instead of the Internet for both public and private packages. Using URL/URIs as identifiers will force such systems to treat the URL/URIs as opaque strings, which is unnatural and likely to slow adoption of the registry API.

Opaque package identifiers

These identifiers are for package identification only. The list of registries is configured separately.

SwiftPM will query each of the registries based on the configured order of preference until it finds one that hosts the given package identifier. SwiftPM will then interact with the registry via the proposed APIs.

If no registry knows about the package identifier, SwiftPM will throw an error.
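The lookup order described above can be sketched as follows. This is a minimal in-memory model: `Registry`, `PackageNotFound`, and `locate` are hypothetical names, the identifiers are made up, and a real `hosts` check would be a network call through the proposed registry APIs.

```swift
/// Hypothetical registry model; a real one would query a server.
struct Registry {
    let name: String
    let hostedIdentifiers: Set<String>

    func hosts(_ identifier: String) -> Bool {
        return hostedIdentifiers.contains(identifier)
    }
}

struct PackageNotFound: Error {
    let identifier: String
}

/// Returns the first registry, in configured order, that hosts the identifier.
func locate(_ identifier: String, in registries: [Registry]) throws -> Registry {
    for registry in registries where registry.hosts(identifier) {
        return registry
    }
    throw PackageNotFound(identifier: identifier)
}

// Registries are listed in order of preference.
let registries = [
    Registry(name: "internal", hostedIdentifiers: ["com.example.utilities"]),
    Registry(name: "public", hostedIdentifiers: ["com.example.utilities", "org.example.linkedlist"]),
]
```

Because the first match wins, the configured order decides which copy is used when more than one registry hosts the same identifier.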

Pros

  • Proven system used in other ecosystems (e.g., Maven, npm). It is easy to reason about and has a short learning curve.
  • Unique and unambiguous identifiers that we can use to dedupe packages and generate module names from.
  • Location is separate from identity which makes moving package sources around a non-issue.
  • Works naturally with enterprise artifact management solutions (e.g., Artifactory, CodeArtifact).

Cons

  • Harder transition from the current URL-based system. Given that we need to keep supporting URLs in existing packages, how do we support mixed references in the same graph? One approach would be to ask the registry for a set of URL aliases associated with a package identifier, so that SwiftPM can match up URL references and identifier references to the same package. We would also want to be able to ask the reverse question, i.e., whether a given URL is associated with a package identifier in the registry. Some registries, such as Artifactory, might not have URL information.

  • This type of identifier typically comprises a namespace/group and a package name/path, and reverse-DNS is a common naming scheme for the namespace/group (e.g., Maven). Public registries would need to develop a way to make sure the package author/publisher owns the DNS name in question. This obviously adds complexity to implementing and operating a registry, and it might also be a burden for individuals and smaller companies who don’t want to spend the time and money on purchasing a domain. One solution would be to use username.github.io (or similar).

  • Name-squatting and identity-hijacking are potential security risks. A malicious actor may operate a registry with “poisoned” packages and “convince” users via social engineering to configure their SwiftPM to look up packages there instead of (or with priority over) the well-known registries. Some aspects of the social engineering could be mitigated by designing the registries configuration file in a way that makes it easy to express specific search paths (e.g. io.oddballs.* → the oddball registry that I am not sure I should trust). Further, this could be largely mitigated by supply-chain protection solutions such as transparent logs (mentioned in the original proposal), which are going to be required in either identity scheme. As a side note, most supply-chain protection solutions employ a trust-on-first-use (TOFU) scheme, so we would also need to be able to entertain take-down requests and come up with policies and non-technical processes around this problem, but this too is required in either identity scheme.
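
As a rough illustration of the scoped search paths mentioned in the last point, the sketch below pins identifier patterns to specific registries. The configuration shape, the registry URLs, and the `registries_for` helper are all hypothetical; the actual configuration format is left open in this post.

```python
from fnmatch import fnmatchcase


def registries_for(identifier, scopes, default_registries):
    """Return the registries to query for `identifier`.

    `scopes` is an ordered list of (pattern, registry) pairs. The first matching
    pattern wins and limits the lookup to that single registry; identifiers that
    match no pattern fall back to the default search order.
    """
    for pattern, registry in scopes:
        if fnmatchcase(identifier, pattern):
            return [registry]
    return list(default_registries)


# Hypothetical configuration: only io.oddballs.* packages may come from the
# registry the user is unsure about; everything else uses the defaults.
scopes = [("io.oddballs.*", "https://registry.oddballs.example")]
defaults = ["https://registry.internal.example", "https://registry.public.example"]
```

Under this design choice, the less-trusted registry is never consulted for identifiers outside its scope, so it cannot shadow packages such as `org.swift.swift-nio` that the user expects to come from the default registries.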

Based on this comparison, the main drawback of location-based identifiers is that SwiftPM cannot reliably dedupe packages with them, so users must set up proxies and/or elaborate mirroring rules to work around such issues, which pushes the complexity to the end-user side. Opaque identifiers, on the other hand, seem to be a more well-rounded option, but they impose quite a few implementation burdens on registry providers, pushing the complexity there. They are also likely to slow adoption, since they require a gradual transition from the existing URL-based model to a different one.

Proposed changes to the current design

  • Opaque package identifiers are preferred because they satisfy all of the technical requirements listed at the beginning of this post. Using opaque identifiers implies location is separate from package identity, and no assumption or deduction of location is made from an identifier—a configuration is used to define registries from which packages are resolved.
  • A section that details how the registries configuration works (e.g., how registries are defined, the priority, the impact on the existing mirroring feature, etc.) should be added. IMO we could start with something simple, such as a file with a list of registry URLs where priority follows the ordering in the file, and leave refinement of the configuration for future proposals.
  • Reverse-DNS is one possible scheme for assigning opaque identifiers, and it helps with proving ownership. We should leave the discussion open for alternatives, and for whether registries will be required to adopt a specific scheme or are free to choose their preferred style.
  • We should review the registry APIs to see if any of them are impacted by a transition to opaque identifiers. It seems to me on the surface that there should be minimal impact to the APIs. We should also consider if more API(s) need to be added, as mentioned in a section above.
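
One shape the additional alias lookups could take (identifier to URLs, and URL back to identifier, as raised under Cons) is sketched below in Python. The `AliasIndex` class and its method names are hypothetical and are not part of the proposed registry API; URLs are compared as exact strings here, so any normalization would have to happen before they reach the index.

```python
class AliasIndex:
    """In-memory stand-in for a registry's identifier <-> URL alias data."""

    def __init__(self):
        self._urls_by_id = {}
        self._id_by_url = {}

    def register(self, identifier, urls):
        """Associate every URL in `urls` with `identifier`."""
        self._urls_by_id.setdefault(identifier, set()).update(urls)
        for url in urls:
            self._id_by_url[url] = identifier

    def aliases(self, identifier):
        """Forward question: which URLs are known for this identifier?"""
        return set(self._urls_by_id.get(identifier, ()))

    def identifier_for(self, url):
        """Reverse question: is this URL associated with an identifier? None if not."""
        return self._id_by_url.get(url)


index = AliasIndex()
index.register(
    "org.swift.swift-nio",
    ["https://github.com/apple/swift-nio", "https://github.com/apple/swift-nio.git"],
)
```

With both directions available, a graph that mixes a URL reference and an identifier reference to the same package could be reconciled by mapping each URL through `identifier_for` first.
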
8 Likes

Thanks for sharing this, @yim_lee. Responding to your points:

Our proposal doesn't bind identity with location. It starts with that assumption, since that's the case most of the time, but provides several mechanisms for the user, package maintainer, and registry to decouple that relationship.

  • A user can specify a mirror URL for a package URL
  • A user can specify a custom registry proxy, which has direct control over how package URLs are located
  • A package maintainer can specify a canonical location for the package
  • A registry may redirect a URL to a different location

Our proposed scheme doesn't create vendor lock-in. A package maintainer can specify a rel="canonical" link in the response for GET /{package}. Swift Package Manager could use this information for deduplication and communicate that to the user (e.g. '"github.com/apple/swift-nio" has moved to "swift.org/server/swift-nio"')
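
As a rough sketch of how a client might pick up that canonical location, the Python below extracts a rel="canonical" target from a Link header value. The header layout follows the general Link header syntax and is an assumption about what a registry would send; `canonical_link` is a hypothetical helper, and the parsing is deliberately simplified (it does not handle commas or semicolons inside quoted parameter values).

```python
import re


def canonical_link(link_header):
    """Return the target of the first rel="canonical" link, or None if absent."""
    # Each link looks like: <target>; param=value; param="value"
    for target, params in re.findall(r"<([^>]*)>((?:\s*;\s*[^,;]+)*)", link_header):
        for rel in re.findall(r'rel\s*=\s*"?([^";,]+)"?', params, flags=re.I):
            # A rel value may list several space-separated relation types.
            if "canonical" in rel.lower().split():
                return target
    return None


# Hypothetical response header for GET /github.com/apple/swift-nio
header = (
    '<https://swift.org/server/swift-nio>; rel="canonical", '
    '<https://github.com/apple/swift-nio>; rel="alternate"'
)
new_location = canonical_link(header)
```

A client could compare `new_location` with the URL it requested and, when they differ, use the canonical one for deduplication and report the move to the user.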

Go uses URIs to identify modules, and that doesn't seem to have been an issue for artifact repositories.

https://www.jfrog.com/confluence/display/JFROG/Go+Registry

https://search.gocenter.io

In response to feedback from the Amazon CodeArtifact team in a previous thread, I created a proof-of-concept that implements Swift package registry support using the AWS SDK:

For what it's worth, newer languages like Go, Rust, and Deno use URLs to identify packages.

This is a bit of a subjective claim. I think you could say the same about URLs — especially since that's how Swift already specifies external dependencies.

Sorry, but I don't understand why an opaque identifier is unique or unambiguous, but a URI isn't. Can you provide an example of a situation that works for opaque identifiers but not URIs?

I haven't worked extensively with Maven, but my impression is that this naming scheme is not always unambiguous. For instance, Google's Best Practices for Java Libraries guide lists a few examples of naming conventions across different projects:

Examples in open source

Case 1 - Keep Java package name and Maven ID

  • Guava
  • Hibernate
  • Joda Time

Case 2 - Keep Java package name, rename Maven ID

  • guava vs guava-jdk5
    • This technically wasn't a new major version, but it is an example of case 2
      that has caused a lot of problems.
  • javax.servlet:javax.servlet-api:3.1.0 vs javax.servlet:servlet-api:2.5

Case 4 - Rename both Java package and Maven ID

  • Square has established this approach as a policy for its Java libraries.
    • OkHttp (com.squareup.okhttp -> com.squareup.okhttp3)
    • Retrofit (com.squareup.retrofit -> com.squareup.retrofit2)
  • Apache Commons Lang (org.apache.commons.lang -> org.apache.commons.lang3)
  • RxJava (rx (version 1.x) -> io.reactivex (version 2.x))
  • JDOM (org.jdom -> org.jdom2)
  • jdeferred (org.jdeferred -> org.jdeferred2)

Case 5 - Bundle old and new packages in the existing Maven ID

  • JUnit (junit.framework (versions 1.x-3.x) -> org.junit (version 4))

I think this is actually an anti-feature.

In our off-thread discussions about this, you've used the example of moving from GitLab to GitHub (or vice-versa). If you believe that these services are interchangeable, this is an innocuous change. However, you might feel differently if a package moves to a service you're not familiar with, perhaps hosted in another country, say China or Russia — or indeed to the US. This speaks to my primary objection to using an opaque package identifier.

Incorporating external dependencies is, fundamentally, a matter of trust.

With URI-based package identifiers, the model is simple: A user says "I trust the package hosted on GitHub.com located at the path /mona/LinkedList, and its external dependencies." Addressability plays an important role in trust as well. If your project depends on [github.com/mona/LinkedList](http://github.com/mona/LinkedList), you can go to that URL and find the source history.

As I understand it, the alternative you've described has a very different trust model. A user not only has to decide what packages to trust, but also what registries to trust, and in what order. The package "org.swift.swift-nio" may appear trustworthy, but lacking fundamental addressability, it's unclear how this can be independently verified.


Our proposal for identifying packages by URI satisfies both of the stated requirements.

We've described in detail how everything works in the proposal, the service interface description, and the OpenAPI specification. We've provided a working implementation and a benchmark harness that anyone can use to try out registries themselves, today. We wrote a reference implementation for the registry server and a proof-of-concept of how artifact repositories can add Swift registry support.

If you feel strongly about an alternative solution — one that we considered earlier in the design process, but ultimately rejected — then I think it'd be very helpful to get into specifics before concluding that to be a better option.

This would not be a trivial drop-in change. Content negotiation and HTTP semantics are a core component of this proposal, and moving from URIs to opaque identifiers would require substantial modification to both the server specification and the client implementation.

4 Likes