URLs as Swift Package Identifiers

mattt · December 28, 2020, 10:27pm

Last June, Apple and GitHub announced that the GitHub Package Registry will support Swift packages. A few months later, Bryan Clark started a thread to gather ideas about a standard package registry API that could be implemented by anyone, not only GitHub.

For much of this year, I’ve had the privilege of working with Bryan, as well as Whitney Imura and many other great folks at GitHub, to define a draft specification for a Swift package registry service. I posted our initial pitch on June 4th, which we then developed into a formal proposal that was reviewed from December 8 – 17.

Based on feedback from the review thread and other input from stakeholders, we prepared the following revision for our original proposal. You can find the latest version of the proposal here.

As announced in this thread, the Swift core team reviewed our updated proposal and returned it for further revision, seeking stronger consensus around the use of URLs as package identifiers.

[Returned for revision] SE-0292: Package Registry Service

The review of SE-0292: Package Registry Service has concluded and the core team has returned it for revision.

The feedback to the idea of defining an open-standard Package Registry HTTP based API, and implementing support for it in SwiftPM as an alternative to resolving dependencies via git, was very positive. However, there were a number of follow-on questions and concerns that came up during the review that the core team feels require further refinement:

Package identities are a critical component of this proposal and the Swift package ecosystem. When the core team discussed the various opinions brought up in the review regarding using URLs as identities compared to name-spaced identities, @Douglas_Gregor brought up the point that using a stable and unique name-spaced package identity would also provide the infrastructure to resolve module name conflicts (e.g. Namespacing of packages/modules, especially regarding SwiftNIO), which would be a great benefit to the ecosystem. This topic needs to be further explored in a dedicated forum thread in preparation to the next revision.

I'd like to use this thread to discuss the merits and tradeoffs of this approach as well as any alternatives that we might consider.

Proposed Solution: URLs as package identifiers

We believe that using URLs as package identifiers is intuitive and familiar for developers, and will best solve the immediate and future needs of this project.

Here is some relevant discussion from our updated proposal:

Changes to Swift Package Manager

Package identity

Currently, the identity of a package is computed from the last path component of its effective URL (which can be changed with dependency mirroring and forking). However, this approach can lead to conflation of distinct packages with similar names as well as duplication of the same package under different names.

We propose to instead use a normalized form of a package dependency's declared location as that package's canonical identity. Not only would this resolve the aforementioned naming ambiguities, but it would allow for package registries to be adopted by package consumers with minimal changes to their code.

For the purposes of package resolution, package identities are case-insensitive (for example, mona ≍ MONA) and normalization-insensitive (for example, n + ◌̃ ≍ ñ). In addition, the following operations are performed to mitigate insignificant variations in how a dependency may be declared in a package manifest:

The list of URI transformations

Removing the scheme component, if present:
https://github.com/mona/LinkedList → github.com/mona/LinkedList
Removing the userinfo component (preceded by @), if present:
git@github.com/mona/LinkedList → github.com/mona/LinkedList
Removing the port subcomponent, if present:
github.com:443/mona/LinkedList → github.com/mona/LinkedList
Replacing the colon (:) preceding the path component in "scp-style" URLs:
git@github.com:mona/LinkedList.git → github.com/mona/LinkedList
Expanding the tilde (~) to the provided user, if applicable:
ssh://mona@github.com/~/LinkedList.git → github.com/~mona/LinkedList
Removing percent-encoding from the path component, if applicable:
github.com/mona/%F0%9F%94%97List → github.com/mona/🔗List
Removing the .git file extension from the path component, if present:
github.com/mona/LinkedList.git → github.com/mona/LinkedList
Removing the trailing slash (/) in the path component, if present:
github.com/mona/LinkedList/ → github.com/mona/LinkedList
Removing the fragment component (preceded by #), if present:
github.com/mona/LinkedList#installation → github.com/mona/LinkedList
Removing the query component (preceded by ?), if present:
github.com/mona/LinkedList?utm_source=forums.swift.org → github.com/mona/LinkedList
Adding a leading slash (/) for file:// URLs and absolute file paths:
file:///Users/mona/LinkedList → /Users/mona/LinkedList
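
To make the list above more concrete, here is a minimal sketch of how a subset of these transformations might be applied with plain string handling. The names canonicalLocation and sameIdentity are hypothetical and are not SwiftPM API. The sketch skips tilde expansion, applies percent-decoding to the whole string rather than only the path, treats an all-digit segment after a colon as a port, and uses lowercased() as a stand-in for proper case folding.

```swift
import Foundation

/// Hypothetical helper (not SwiftPM API): applies a simplified subset of the
/// transformations listed above to a declared dependency location.
func canonicalLocation(_ location: String) -> String {
    var s = location

    // Remove the fragment (#...) and then the query (?...), if present.
    if let hash = s.firstIndex(of: "#") { s = String(s[..<hash]) }
    if let mark = s.firstIndex(of: "?") { s = String(s[..<mark]) }

    // Remove the scheme, if present. "file:///Users/..." keeps its leading slash.
    if let scheme = s.range(of: "://") { s = String(s[scheme.upperBound...]) }

    // Remove userinfo: everything up to an "@" that precedes the first "/".
    var firstSlash = s.firstIndex(of: "/") ?? s.endIndex
    if let at = s[..<firstSlash].lastIndex(of: "@") {
        s = String(s[s.index(after: at)...])
    }

    // A colon before the first "/" is treated as a port when followed only by
    // digits, and as the scp-style path separator otherwise.
    firstSlash = s.firstIndex(of: "/") ?? s.endIndex
    if let colon = s[..<firstSlash].firstIndex(of: ":") {
        let afterColon = s[s.index(after: colon)..<firstSlash]
        if !afterColon.isEmpty && afterColon.allSatisfy({ $0.isASCII && $0.isNumber }) {
            s.removeSubrange(colon..<firstSlash)
        } else {
            s.replaceSubrange(colon...colon, with: "/")
        }
    }

    // Remove percent-encoding (simplification: applied to the whole string).
    s = s.removingPercentEncoding ?? s

    // Remove trailing slashes, then the .git extension.
    while s.count > 1 && s.hasSuffix("/") { s.removeLast() }
    if s.hasSuffix(".git") { s.removeLast(4) }

    return s
}

/// Hypothetical comparison: lowercased() approximates case-insensitivity, and
/// Swift's String equality already compares by Unicode canonical equivalence.
func sameIdentity(_ a: String, _ b: String) -> Bool {
    canonicalLocation(a).lowercased() == canonicalLocation(b).lowercased()
}

print(canonicalLocation("git@github.com:mona/LinkedList.git"))
```

A real implementation would need to settle the edge cases this sketch glosses over, such as which case folding algorithm applies and how an scp-style path that begins with digits is distinguished from a port.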

Future directions

Module name collision resolution

Swift Package Manager cannot build a project
if any of the following are true:

Two or more packages in the project are located by URLs with the same (case-insensitive) last path component (for example, github.com/mona/LinkedList and github.com/OctoCorp/linkedlist)

Two or more packages in the project declare the same name in their package manifest (for example, let package = Package(name: "LinkedList"))

Two or more modules provided by packages in the project have the same name (let package = Package(products: [.library(name: "LinkedList")]))

This proposal directly addresses point #1 and lays the foundation for resolving #2 and #3.
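
As a rough illustration of point #1, the sketch below groups two dependency URLs first by an approximation of the current identity (the lowercased last path component) and then by their full location. The helper lastPathComponentIdentity is hypothetical and only approximates what Swift Package Manager does today.

```swift
// Hypothetical helper: approximates the current identity computation,
// which looks only at the last path component, case-insensitively.
func lastPathComponentIdentity(_ url: String) -> String {
    let last = url.split(separator: "/").last.map(String.init) ?? url
    let name = last.hasSuffix(".git") ? String(last.dropLast(4)) : last
    return name.lowercased()
}

let dependencies = [
    "https://github.com/mona/LinkedList",
    "https://github.com/OctoCorp/linkedlist",
]

// Any group with more than one member is a collision under the current scheme.
let groups = Dictionary(grouping: dependencies, by: lastPathComponentIdentity)
let collisions = groups.filter { $0.value.count > 1 }

// Keyed by full location instead, the two packages remain distinct.
let fullURLIdentities = Set(dependencies.map { $0.lowercased() })

print(collisions.keys.sorted(), fullURLIdentities.count)
```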

Consider the following package manifest, which Swift Package Manager currently fails to resolve for all three of the reasons listed above (assume both external dependencies are named "LinkedList" and contain a library product named "LinkedList").
// swift-tools-version:5.3
import PackageDescription

let package = Package(name: "Greeter",
                      dependencies: [
                        .package(name: "LinkedList",
                                 url: "https://github.com/mona/LinkedList",
                                 from: "1.1.0"),
                        .package(name: "LinkedList",
                                 url: "https://github.com/OctoCorp/linkedlist",
                                 from: "0.1.0")
                      ],
                      targets: [
                        .target(
                            name: "Greeter",
                            dependencies: [
                                .product(name: "LinkedList",
                                         package: "LinkedList"), // github.com/mona/LinkedList
                                .product(name: "LinkedList",
                                         package: "LinkedList") // github.com/OctoCorp/linkedlist
                            ])
                      ])
By adopting URLs as unique package identifiers, the existing name parameter in dependency .package declarations becomes redundant, and could instead be used to label nodes in the dependency graph.
-                        .package(name: "LinkedList",
+                        .package(name: "LinkedList-Mona",
                                  url: "https://github.com/mona/LinkedList",
                                  from: "1.1.0"),
-                        .package(name: "LinkedList",
+                        .package(name: "LinkedList-OctoCorp",
                                  url: "https://github.com/OctoCorp/linkedlist",
                                  from: "0.1.0")
These package names (or, alternatively, the package URI) could then be used in .product declarations to reference a particular package's module.

To resolve name collisions in package modules, a dictionary literal might be used to assign aliases for that module within the declared target.
                             dependencies: [
-                                .product(name: "LinkedList",
-                                         package: "LinkedList"),
-                                .product(name: "LinkedList",
-                                         package: "LinkedList")
+                                "MonaLinkedList": .product(name: "LinkedList",
+                                                           package: "LinkedList-Mona"),
+                                "OctoCorpLinkedList": .product(name: "LinkedList",
+                                                               package: "LinkedList-OctoCorp")
Following from this example, source files in the Greeter target would be able to import both modules according to their assigned aliases.
import MonaLinkedList
import OctoCorpLinkedList
Importing external modules without a package manifest

Developers enjoy using Swift as a scripting language, but wish there were an easier way to import external dependencies in a standalone file, without the overhead of the package manifest and directory structure. Various solutions have been explored through community projects like swift-sh, Beak, and Marathon, and discussions on the Swift forums.

For example, running the command swift path/to/file.swift with the following file would automatically resolve and download the LinkedList dependency before building and running the script executable.
// Proposed syntax by Rahul Malik, Ankit Aggarwal, and David Hart
// See: https://forums.swift.org/t/swiftpm-support-for-swift-scripts/33126
@package(url: "https://github.com/mona/LinkedList", from: "1.1.0")
import LinkedList

// Variation with hypothetical module aliasing
@package(url: "https://github.com/OctoCorp/linkedlist", from: "0.1.0")
import LinkedList as OctoCorpLinkedList

print("Hello, world!")
While the functionality described by this proposal isn't required for such a feature, a registry provides the same benefits of speed, efficiency, and security as it would in the context of a package.

Intermediate registry proxies

By default, the identity of the package is the same as its location. Whether a package is declared with a URL of https://github.com/mona/linkedlist or git@github.com:mona/linkedlist.git, Swift Package Manager will — unless configured otherwise — attempt to fetch that dependency by consulting github.com, which may respond with a Git repository or a source archive (or perhaps 404 Not Found).

A user can currently specify an alternate location for a package by setting a [dependency mirror][SE-0219] for that package's URL.
$ swift package config set-mirror \
--original-url https://github.com/mona/linkedlist \
--mirror-url https://github.com/octocorp/swiftlinkedlist
Dependency mirroring allows for package dependencies to be rerouted on an individual basis. However, this approach doesn't scale well for large numbers of dependencies.

Swift Package Manager could implement a complementary feature that allows users to specify one or more registry proxy URLs that would be consulted (in order) when resolving dependencies through the package registry interface.

For example, a build server that doesn't allow external network connections may specify an internal registry URL to manage all package dependency requests.
$ swift package config set-registry-proxy https://internal.example.com/
When one or more proxy URLs are configured in this way, resolving a package dependency with the URL https://github.com/mona/linkedlist results in a GET request to https://internal.example.com/github.com/mona/linkedlist.
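
The mapping from a declared dependency URL to the proxied request URL could be sketched as follows. The function name proxiedRequestURLs is hypothetical, and the sketch only strips the scheme rather than applying the full set of URI transformations.

```swift
import Foundation

/// Hypothetical helper: builds the URLs that would be requested, in order,
/// for a dependency when one or more registry proxies are configured.
func proxiedRequestURLs(for packageURL: String, proxies: [String]) -> [String] {
    // Simplification: derive the identity by dropping the scheme only.
    var identity = packageURL
    if let scheme = identity.range(of: "://") {
        identity = String(identity[scheme.upperBound...])
    }

    return proxies.map { proxy in
        // Ensure exactly one slash between the proxy base and the identity.
        let base = proxy.hasSuffix("/") ? proxy : proxy + "/"
        return base + identity
    }
}

print(proxiedRequestURLs(for: "https://github.com/mona/linkedlist",
                         proxies: ["https://internal.example.com/"]))
```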

A registry proxy decouples package identity from package location entirely, which could unlock a variety of compelling use cases:

Geographic colocation: Developers working under adverse networking conditions can host a mirror of official package sources on a nearby network.

Policy enforcement: A corporate network can enforce quality or licensing standards, so that only approved packages are available.

Auditing: A registry may analyze or meter access to packages for the purposes of ranking popularity or charging licensing fees.

Finally, I'd like to call attention to two points from the registry specification:

The registry service interface is simple: all of the endpoints (with the exception of the publish endpoint, which has since been removed) can be satisfied by a static file server. (See § 4. Endpoints)
Maintainers can host a package at their own domain and delegate to an external registrar with a custom Link header. (See § 4.1.1. Content negotiation)

Alternative suggestion: reverse-DNS identifiers

The feedback shared by the Swift core team included the following example of an alternative reverse-DNS package identifier:


// Proposed by SE-0292
.package(url: "https://github.com/apple/swift-nio", from: "2.0.0") // --> git, upgradable to registry
.package(path: "/code/swift-nio", from: "2.0.0") // --> local, maybe override (based on name)

// Alternative reverse-DNS proposal
.package(url: "https://github.com/apple/swift-nio", from: "2.0.0") // --> backwards compatibility, always git
.package(path: "/code/swift-nio", from: "2.0.0") // --> backwards compatibility, local, maybe override (based on name)
.package(identifier: "com.apple.swift-nio", from: "2.0.0") // --> always registry
.package(identifier: "com.apple.swift-nio", url: "https://github.com/apple/swift-nio", from: "2.0.0") // --> always git
.package(identifier: "com.apple.swift-nio", path: "/code/swift-nio", from: "2.0.0") // --> local, means override

In our proposal, we considered this and other schemes for identifying packages, but ultimately rejected them in favor of URLs. Package.swift manifests have always declared external dependencies using URLs, so this step represents a formalization of long-standing tradition rather than a departure from existing behavior.

Truth be told, I'm unable to articulate a reasonable case for preferring reverse-DNS identifiers over URLs in Swift Package Manager.

The feedback I've received so far stated a preference for this naming scheme, but didn't include much information about how this would work or reasons why it would be a better solution. Rather than try to guess at the reasoning or try (and fail) to make the case myself, I think it'd help to start by articulating some of the reasons why we ultimately decided against them.

Technical arguments against reverse-DNS identifiers

Identifying a package by a URL is simple and intuitive. By default, Swift Package Manager tries to fetch the package at the provided URL, either by downloading through the registry interface or cloning its Git repository. If a mirror is configured, Swift Package Manager will go to the mirrored URL instead. The proposal also suggests a future enhancement, where all package requests are immediately routed through one or more registry endpoints.

Reverse-DNS identifiers, on the other hand, lack fundamental addressability. You can't, for example, navigate to the package identifier com.github.mona.LinkedList from a browser or curl. Instead, you need to first configure a registry. The same is true for Swift Package Manager itself: either the user needs to configure a registry, or Swift Package Manager needs to hardcode a predefined list. (If so, who makes the cut? And if it's a ranked list, how is that order determined?) Finally, relying on registries for resolving package identifiers creates a point of failure; if a registry goes down, there's no way to fall back to Git like there is when you use URLs.

Practical arguments against reverse-DNS package identifiers

URLs have the benefit of gradual adoption. The proposed package registry can be used with existing packages without any code changes. (In fact, you can already try it out for yourself today!)

By contrast, a reverse-DNS identifier would require all packages in a project's dependency graph to update to a new syntax and new swift-tools-version before seeing the benefit of the registry interface. Such a move would be disruptive, requiring the entire ecosystem to opt in to this new system. Network effects being what they are, it would be difficult to convince package maintainers to migrate to a system without first being able to demonstrate the utility of such a move (which requires others to migrate first). So we should expect that this transition would take some time, on the scale of years. If the Swift 2 → 3 migration or any of the changes to the PackageDescription API are any indication, we should expect this transition to be painful and confusing, too.

Also, we shouldn't take as a given that the project would survive the transition — there are plenty of examples of languages and libraries losing developer market share after updating to a new major API version, and never getting it back. If the upgrade path is too painful, users may start looking elsewhere, whether that's CocoaPods or some other solution (perhaps even a fork of SPM).

Logistical arguments against reverse-DNS package identifiers

Adopting a reverse-DNS style identifier such as the one suggested here would fundamentally change the package ecosystem from a federated, decentralized model into one with a central naming authority and governing body. Such an organization would need to answer the following questions:

How are unique identifiers assigned to packages?
What's the application process? How do you determine who gets to register a namespace?
What is the process for renaming packages or transferring ownership?
Who is responsible for maintaining and funding the infrastructure for this organization?
What packages can be registered? Are there any prohibitions on names or functionality?
How are disputes between maintainers adjudicated? How about among registries?
How would you respond to a DMCA takedown request or other government order? How about a cease and desist order or other private civil actions?

We're encouraged by the positive response from the Swift core team and the community at large, and look forward to addressing the outstanding concerns about package identity. I'm genuinely interested in identifying and building consensus around the best solution for managing package identity now and in the future.

SDGGiesbrecht · December 28, 2020, 11:16pm

As long as the URI transformations are sufficiently rigorous. What you have already listed would fix all the oddities and glitches I have encountered so far, but I’m not an expert in the nuances of URIs.

While we are at it, it would be nice if we could explicitly codify the fact that the right side of this one is supposed to be accepted by SwiftPM and compliant tools—Yes, I’m looking at you Xcode.—:

I would even say it deserves to be the recommended of the two for use in the manifest or on the command line. Unfortunately, so far the intent has been poorly defined, and so one form or the other tends to cause crashes, sometimes with disagreement one way or the other between patch versions.

Cyberbeni · December 28, 2020, 11:45pm

I like using URLs as identifiers.

There is also this one use case that is fairly common but hasn't been mentioned in the OP: when you have a transitive dependency that you have to modify to make your own code work and you want to avoid forking all your dependencies that have this as a dependency in order to use your version until it has been merged -- for this I assume mirror-url could be used.

xwu · December 29, 2020, 3:37am

This is a big caveat, and I'm actually somewhat concerned about this. I cannot find the proper references to cite in a pinch, but URI handling is notoriously difficult, and it differs among even major browsers.

One example: It's not specified here what case- and normalization-insensitive comparisons are to be used. This is nontrivial because, if I recall correctly, RFCs having to do with URIs (or, at least, Punycode) make use of a normalization and case folding algorithm that's not otherwise commonly used, and while ICU data includes that information, the corresponding case folding APIs are not exposed in Swift.

None of this is about pursuing perfection for its own sake. Rather, I would expect that any discrepancy or ambiguity in mapping URIs to a code package could be a potential security exploit allowing for unintended code to be downloaded, compiled, and/or executed.

For that reason, I would greatly urge thorough consideration of conventions or standards other than URIs for this purpose. Moreover, if we are to move forward with URIs in this role, then the applicable transformations should not be hidden from the discussion under a disclosure triangle and would need to be vetted in detail.

stevapple · December 29, 2020, 6:43am

Personally I’m strongly against such syntax. The @package annotation, as an interface of SwiftPM, is okay to be ignored in environments other than scripts (maybe with a note message). A modification on import syntax, however, would lead to great inconsistency across code environments. How to deal with such syntax in Swift packages? In single-file compilation process? If we simply ignore it, the code may break, which is not what we’d like to see.

tritter · December 29, 2020, 6:50am

URLs seem like the perfect answer; they solve one of the biggest pain-points of other package managers.

I love seeing the proxy-registry. Would it also be possible to configure registries for different targets, instead of one config? I think that would be a great addition, as it allows flexibility between your targets/machines/registries and networks. (Separation of release builds and release artifacts)

drewmccormack · December 29, 2020, 10:41am

I do use the SPM quite a bit, but have not been following this proposal, so take what I say with some salt.

My observation is more general. My background is decentralised systems (eg https://ensembles.io, GitHub - mentalfaculty/LLVS: Low-Level Versioned Store). Every decentralized system I have been involved with has included some form of global identification. Not including any form of global identification introduces great challenges.

I have seen these problems often in practice. To give an example most will have some familiarity with, Apple's Core Data sync went with a system with no global identification of objects. This means every app adopting it has to introduce complex de-duplication logic of its own.

What does this have to do with SPIs? The danger here is that the proposal will go down the same path. Yes, it may at first seem easier to just treat every repo as independent, with every package unrelated and thus identified by a different URL. This is effectively the choice that Apple made in their sync architecture.

The cost comes when you try to relate one package with another in some way. Is it the same package? Can one package be substituted for another? I don't know how the repo manager works internally, but I assume these are the types of questions it has to know the answer to. I fear that by ignoring the existence of forks etc, which are extremely prevalent in our world, we are severely limiting the potential solution.

Adopting a reverse-DNS style identifier such as the one suggested here would fundamentally change the package ecosystem from a federated, decentralized model into one with a central naming authority and governing body.

I actually think the opposite is true here. The point of using reverse DNS, rather than just an arbitrary name (eg LinkedList), is exactly to have a system of generating unique identifiers in a decentralised manner which are extremely unlikely to clash. We want to avoid having a centralised authority. This approach has been used with success in many cases, including UUIDs in decentralised storage (eg Git), and the registration of reverse DNS names for apps.

finestructure · December 29, 2020, 11:38am

I suppose one argument against using URIs as identifier is that it ties identity to location, possibly making it harder for mirroring and caching. And what about moving a package between hosting providers, or even accounts within a single hosting provider (which happens relatively often)? Should that trigger a name change?

On the other hand easier adoption is a strong argument.

But as others have mentioned, using URIs as identifiers relies on everyone using the same normalisation. One of the big problems with that is that bugs here are a pretty silent failure until you hit an edge case way down the line. And the problem is deceptively simple enough that you're tempted into rolling your own, missing something crucial in the process.

For instance, in the Swift Package Index we use URL normalisation and a unique index for our packages table and even though we control all the pieces ourselves we've had some duplicates slip through if I remember correctly.

Maybe a middle ground could be a small package that provides canonical URI normalisation, sort of like symbol mangling isn't something you'd implement yourself?

That still leaves the question what if the rules were to change or be expanded in the future. Wouldn't there need to be some sort of mechanism - like normalisation versioning? - that ensures every part of the chain is using the same rules?

Having said all that, it may be worth looking at the error case when things don't line up. In the case of the SPI it wasn't really all that critical: a duplicate isn't the end of the world. So, what happens in this case here if URIs collide or don't collapse to the same identity? Are there going to be any critical silent errors? Or is it just a matter of having to review urls and fix any inconsistencies manually?

Given that tooling around registries is probably going to be heavily automated, maybe URI normalisation would lead to subtle and hard to debug issues with cause and effect far removed from each other?

Is it perhaps feasible to use URI normalised identity as a fallback in the absence of a reverse DNS identity being defined? That could fix both the adoption issue while also providing stable identifiers.

Speaking purely in terms of the Swift Package Index, I think a true package identifier would be helpful in trying to weed out duplicates if/when we start ingesting from multiple sources, some of which might include mirrored packages.

iabudiab · December 29, 2020, 1:43pm

I haven't been following this proposal closely and don't know if these points were discussed somewhere else. However, I would like to address some things I picked up along the way working mainly with JVM, JS and Python ecosystems.

I would suggest taking a look at existing specs that try to solve this problem, like package-url.
Many package managers split the package identification from its location via a repository/registry concept.

I suppose one argument against using URIs as identifier is that it ties identity to location, possibly making it harder for mirroring and caching.

This. Everywhere I've worked so far, had its own internal registry/repository and/or mirror for all the packages that were used, either for caching and faster pull times or for compliance reasons (more on this later).

A package repository represents the complete Bill of Materials (BoM) in a defined scope, whether it's the whole company or some single project.

Tying the package identity to one specific URL would break many such flows.

The URL alone does not identify a package without a version number attached to it, i.e. https://github.com/OctoCorp/linkedlist is just a URL pointing to all the versions of the package.
Compliance/Security: IMHO, one of the main goals should be all the required automation around packages that would follow, especially in relation to regulation, security and compliance purposes.

Almost all of the clients we've had required a very detailed and fine-grained Bill of Materials (BoM) describing the project. For example, they wanted to know about each package, its origin, its version, the complete licence text, the complete copyright information, even SHA checksums and whether there are any known CVEs for the specified version. (side note: we are using CycloneDX as our preferred choice for a SBoM format.)

Thus the packages should be scanned against a list of allowed licenses, should be scanned against CVE databases, include license information etc. which means a package should be globally identifiable regardless of its location. This however doesn't mean a centralised model. See: Comment 7 by drewmccormack

I guess what I'm trying to say is, that package identification should not be conceptualised without taking all of these use cases into consideration.

mattt · December 29, 2020, 2:07pm

This was discussed at some length by @Karl in the review thread:

SE-0292: Package Registry Service

There's a lot to say about the deficiencies of URLs, but luckily I can recommend a couple of videos instead (one short, one long).

HOW FRCKN' HARD IS IT TO UNDERSTAND A URL?! - uXSS CVE-2018-6128 (15 mins). Explains a universal cross-site scripting bug that affected WebKit on iOS. I haven't looked in to the bug in detail, but either there were different URL parsers in the OS which saw different hosts for the same URL, or there was an idempotency bug (parsing -> serialising -> parsing the URL changed its meaning).

A new era of SSRF (47 mins). A now famous talk by Orange Tsai about how to exploit quirks in different URL parsers. It includes this slide, demonstrating how each of the common parsers used by Python sees a different host from the same URL. It's so beautiful I'm thinking about getting a poster made of it:

I don't want to diminish the potential for ambiguity or complexity in parsing URIs. However, I do think it's important to focus on how package identifiers would be used.

The list of URI transformations in the proposal is specific to Swift Package Manager, and is a way to reduce the incidence of duplicate nodes in the dependency graph. When the user does swift build --enable-package-registry, SPM takes each canonicalized URI, prepends the https:// scheme and sends a HEAD request to the resulting URL to see if the server supports the Swift package registry interface. If the server responds accordingly, SPM attempts to resolve that dependency through the registry interface; otherwise, it falls back to Git (unless the user opts out by setting --disable-repository-fallback).
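
A minimal sketch of that decision flow, with the HEAD request replaced by a closure so it can run without a network. The type ResolutionStrategy, the function resolutionStrategy, and the supportsRegistry closure are all hypothetical names for illustration; none of this is SwiftPM's actual implementation.

```swift
/// Hypothetical outcome of resolving a single dependency.
enum ResolutionStrategy: Equatable {
    case registry(String) // resolve through the registry interface at this URL
    case git              // fall back to cloning the Git repository
    case unresolved       // no registry support and fallback is disabled
}

/// Hypothetical sketch of the flow described above. `supportsRegistry` stands
/// in for the HEAD request and reports whether the server at the given URL
/// advertises registry support.
func resolutionStrategy(
    for canonicalIdentity: String,
    allowRepositoryFallback: Bool = true,
    supportsRegistry: (String) -> Bool
) -> ResolutionStrategy {
    // Prepend the https:// scheme to the canonicalized identity.
    let probeURL = "https://" + canonicalIdentity
    if supportsRegistry(probeURL) {
        return .registry(probeURL)
    }
    // Corresponds to the --disable-repository-fallback opt-out.
    return allowRepositoryFallback ? .git : .unresolved
}

// Usage with a stub that pretends only github.com supports the registry.
let strategy = resolutionStrategy(for: "github.com/mona/LinkedList") { url in
    url.hasPrefix("https://github.com/")
}
print(strategy)
```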

Any dependency URLs declared in Package.swift were put there by the developers authoring that package. Any transitive dependencies were put there by the maintainers of those direct dependencies. Would you say that the potential for security exploits due to URI parsing ambiguity is distinct from those inherent from importing 3rd-party code?

mattt · December 29, 2020, 2:48pm

Swift Package Manager is currently unable to answer these kinds of questions. If the same package is declared with different URLs (only the last path component is considered, case-insensitive), there's no way to reconcile them. There are various heuristics that could be implemented to determine if the contents of two repositories are equivalent (e.g. comparing their histories), but we aren't doing that yet.

The proposed registry interface is better equipped, thanks to HTTP's semantics for relocating and canonicalizing resources. A server can respond with a 301 Moved Permanently or a Link: <...>; rel="canonical" header field to establish equivalence between the requested resource and what's returned.
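
As a rough sketch of how a client might read such a header, the hypothetical helper below extracts link targets for a given relation type from a Link header value. It is deliberately naive: it splits on commas (so it assumes URLs contain none) and matches the rel parameter as a single value rather than a space-separated list.

```swift
import Foundation

/// Hypothetical helper: returns the targets of all links in a `Link` header
/// value whose `rel` parameter equals the given relation type.
func linkTargets(in header: String, rel: String) -> [String] {
    var targets: [String] = []
    // Naive split into individual links; assumes URLs contain no commas.
    for entry in header.split(separator: ",") {
        let parts = entry.split(separator: ";").map {
            $0.trimmingCharacters(in: .whitespaces)
        }
        // The first part is the target, wrapped in angle brackets.
        guard let first = parts.first, first.hasPrefix("<"), first.hasSuffix(">") else {
            continue
        }
        let target = String(first.dropFirst().dropLast())

        // Remaining parts are parameters such as rel="canonical".
        let relations = parts.dropFirst().compactMap { param -> String? in
            let pair = param.split(separator: "=", maxSplits: 1).map(String.init)
            guard pair.count == 2, pair[0].lowercased() == "rel" else { return nil }
            return pair[1].trimmingCharacters(in: CharacterSet(charactersIn: "\""))
        }
        if relations.contains(rel) { targets.append(target) }
    }
    return targets
}

print(linkTargets(in: "<https://github.com/mona/LinkedList>; rel=\"canonical\"",
                  rel: "canonical"))
```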

I agree with you that we want to avoid having a centralized authority. However, I believe that's the only way (other than using random collision-free identifiers like UUIDs) to ensure coordination of namespaces. The example you cite with app names is instructive: Apple is the gatekeeper for who can publish apps under which identifiers. Without a central name registry, everyone would have equal claim to packages published under, for example, the org.swift namespace.

The proposed registry specification is designed explicitly with caching and mirroring in mind.

4.4. Fetch source archive

A client MAY send a GET request for a URI matching the expression /{package}/{version} to retrieve a release's source archive. A client SHOULD set the Accept header to application/vnd.swift.registry.v1+zip and SHOULD append the .zip extension to the requested URI.
GET /github.com/mona/LinkedList/1.1.1.zip HTTP/1.1
Host: packages.example.com
Accept: application/vnd.swift.registry.v1+zip
If a release is found for the requested URI, a server SHOULD respond with a status code of 200 (OK) and the Content-Type header application/zip. Otherwise, a server SHOULD respond with a status code of 404 (NOT FOUND).
HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: public, immutable
Content-Type: application/zip
Content-Disposition: attachment; filename="LinkedList-1.1.1.zip"
Content-Length: 2048
Content-Version: 1
Digest: sha-256=a2ac54cf25fbc1ad0028f03f0aa4b96833b83bb05a14e510892bb27dea4dc812
ETag: e61befdd5056d4b8bafa71c5bbb41d71
Link: <https://mirror-japanwest.example.com/mona-LinkedList-1.1.1.zip>; rel=duplicate; geo=jp; pri=10; type="application/zip"
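For illustration, the request and the success check from section 4.4 could be modeled like this in Swift. ArchiveRequest and both function names are hypothetical; the Host header is omitted because it depends on which registry is being contacted.

```swift
/// Minimal description of an HTTP request, independent of any networking API.
struct ArchiveRequest: Equatable {
    let method: String
    let path: String
    let headers: [String: String]
}

/// Builds the request for a release's source archive: the path follows
/// /{package}/{version} with the .zip extension appended.
func sourceArchiveRequest(package: String, version: String) -> ArchiveRequest {
    ArchiveRequest(
        method: "GET",
        path: "/\(package)/\(version).zip",
        headers: ["Accept": "application/vnd.swift.registry.v1+zip"]
    )
}

/// True when a response looks like the successful case described in the spec.
func isSourceArchive(status: Int, contentType: String?) -> Bool {
    status == 200 && contentType == "application/zip"
}
```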

Furthermore, a package's identity and its location are only related by default.

The URI canonicalization described in the specification is specific to Swift Package Manager. On the server-side, the package URI can be treated as a string (with the universal security considerations as any user-provided input). The only requirement of the server is that it treat this package identifier string as case- and normalization-insensitive.
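If a registry were written in Swift, that one requirement could be met with a small key type. This is only a sketch, and PackageIdentifierKey is a hypothetical name. It relies on the fact that Swift's String equality and hashing already use Unicode canonical equivalence, so only case needs explicit folding; in most other languages you would normalize explicitly (for example to NFC) before comparing.

```swift
/// Dictionary key that treats package identifiers as case-insensitive.
/// Normalization-insensitivity comes from Swift's String semantics, which
/// compare and hash by Unicode canonical equivalence.
struct PackageIdentifierKey: Hashable {
    private let folded: String

    init(_ identifier: String) {
        folded = identifier.lowercased()
    }
}

// Usage: releases stored under one spelling are found under another.
var releasesByPackage: [PackageIdentifierKey: [String]] = [:]
releasesByPackage[PackageIdentifierKey("github.com/mona/LinkedList")] = ["1.1.1"]
let lookedUp = releasesByPackage[PackageIdentifierKey("GITHUB.COM/Mona/linkedlist")]
```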

Thanks for sharing this. I think the details about how purls are decoded are instructive, and demonstrate the feasibility of using a URL-compatible identifier.

What we're proposing here is closer to what Go does for module paths.

import (
	"fmt"
	"github.com/mona/linkedlist"
)

We've designed the registry specification with this exact use case in mind.

4.2.1. Package release metadata data standards

A server MAY include metadata fields in its package release response. It is RECOMMENDED that package metadata be represented in JSON-LD according to a structured data standard. For example, this response using the Schema.org SoftwareSourceCode vocabulary:
{
  "@context": ["http://schema.org/"],
  "@type": "SoftwareSourceCode",
  "name": "LinkedList",
  "description": "One thing links to another.",
  "keywords": ["data-structure", "collection"],
  "version": "1.1.1",
  "codeRepository": "https://github.com/mona/LinkedList",
  "license": "https://www.apache.org/licenses/LICENSE-2.0",
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "name": "Swift",
    "url": "https://swift.org"
  },
  "author": {
      "@type": "Person",
      "@id": "https://example.com/mona",
      "givenName": "Mona",
      "middleName": "Lisa",
      "familyName": "Octocat"
  }
}
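A client could read a response like this with Codable. The sketch below decodes a handful of the fields from the example; ReleaseMetadata and decodeReleaseMetadata are hypothetical names, and most fields are optional because the spec only says a server MAY include them.

```swift
import Foundation

/// A subset of the Schema.org SoftwareSourceCode fields shown above.
struct ReleaseMetadata: Decodable {
    let type: String
    let name: String?
    let version: String?
    let codeRepository: String?
    let keywords: [String]?

    enum CodingKeys: String, CodingKey {
        case type = "@type"   // JSON-LD keys start with "@", so map them explicitly
        case name, version, codeRepository, keywords
    }
}

/// Decodes metadata from a JSON string; keys not modeled above are ignored.
func decodeReleaseMetadata(_ json: String) throws -> ReleaseMetadata {
    try JSONDecoder().decode(ReleaseMetadata.self, from: Data(json.utf8))
}

let sampleMetadataJSON = """
{
  "@context": ["http://schema.org/"],
  "@type": "SoftwareSourceCode",
  "name": "LinkedList",
  "keywords": ["data-structure", "collection"],
  "version": "1.1.1",
  "codeRepository": "https://github.com/mona/LinkedList"
}
"""
```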

Future directions

Security auditing

The response for listing package releases could be updated to include information about security advisories.
{
    "releases": { /* ... */ },
    "advisories": [{
        "cve": "CVE-20XX-12345",
        "cwe": "CWE-400",
        "package_name": "github.com/mona/LinkedList",
        "vulnerable_versions": "<=1.0.0",
        "patched_versions": ">1.0.0",
        "severity": "moderate",
        "recommendation": "Update to version 1.0.1 or later.",
        /* additional fields */
    }]
}
Swift Package Manager could communicate this information to users when installing or updating dependencies or as part of a new swift package audit subcommand.
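To give a feel for what an audit check might involve, here is a speculative sketch in Swift. The format of vulnerable_versions isn't specified anywhere yet, so the single "operator followed by a version" grammar below is purely my assumption, as are the names Version and isAffected. Pre-release and build metadata are ignored.

```swift
/// Three-component version, compared component by component.
struct Version: Comparable {
    let major: Int
    let minor: Int
    let patch: Int

    init?(_ string: String) {
        let parts = string.split(separator: ".", omittingEmptySubsequences: false)
        guard parts.count == 3,
              let major = Int(parts[0]),
              let minor = Int(parts[1]),
              let patch = Int(parts[2]) else { return nil }
        self.major = major
        self.minor = minor
        self.patch = patch
    }

    static func < (lhs: Version, rhs: Version) -> Bool {
        (lhs.major, lhs.minor, lhs.patch) < (rhs.major, rhs.minor, rhs.patch)
    }
}

/// Evaluates a constraint such as "<=1.0.0" (assumed grammar) against a version.
/// Returns nil when the constraint can't be parsed.
func isAffected(_ version: Version, by constraint: String) -> Bool? {
    // Two-character operators come first so "<=" isn't read as "<".
    let operators: [(String, (Version, Version) -> Bool)] = [
        ("<=", { $0 <= $1 }),
        (">=", { $0 >= $1 }),
        ("<", { $0 < $1 }),
        (">", { $0 > $1 }),
        ("=", { $0 == $1 }),
    ]
    for (symbol, compare) in operators where constraint.hasPrefix(symbol) {
        guard let bound = Version(String(constraint.dropFirst(symbol.count))) else {
            return nil
        }
        return compare(version, bound)
    }
    return nil
}
```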

This feature isn't designed yet, so that's certainly a direction we could take. I look forward to hashing out the exact behavior in a future pitch thread.

xwu · December 29, 2020, 4:33pm

Forgive me if I misread the core team’s prompt. But the following text reads to me like the topic here is about much more than how Swift package identifiers will be used in this proposal:

Package identities are a critical component of this proposal and the Swift package ecosystem. When the core team discussed the various opinions brought up in the review regarding using URLs as identities compared to name-spaced identities @Douglas_Gregor brought up the point that using a stable and unique name-spaced package identity would also provide the infrastructure to resolve module name conflicts (e.g. Namespacing of packages/modules, especially regarding SwiftNIO ), which would be a great benefit to the ecosystem. This topic needs to be further explored in a dedicated forum thread in preparation to the next revision.

An identity for the ecosystem which is to be used as infrastructure to solve other problems, which is what the core team wants to talk about, would necessarily be more than what you’re talking about here.

And the problem I have is if we are to use URIs to which the proposed heuristics are applied to be that identity for the ecosystem, then it does not meet the requirements for an identifier at face value, those being (it would seem to me, perhaps naively):

Every package should have one and only one identifier
Every identifier should correspond to one and only one package

The heuristics proposed here for URIs partly reduce the number of identifiers for one package, but as others mention, if the package moves from GitHub to GitLab, then the heuristics are not enough.

What causes me heartburn, though, is that if normalization rules are not chosen correctly (i.e., consistently with the tools that will actually resolve the URIs and fetch the code), the same identifier could identify more than one package. This may not be an issue for what you propose here, but it does not seem to me to be a suitable identity for the ecosystem. And if that’s what the core team wants out of it, I think we need to discuss alternatives.

drewmccormack · December 29, 2020, 4:54pm

Don’t want to muddy the waters too much with a lot of to-ing and fro-ing, so will try to keep it short.

My main point here is that by having no way to recognize associations between packages, you may be inadvertently restricting future features, even if not needed now. Perhaps falling back on the URL as an identifier where a reverse-DNS identifier is not present is a good option: it is friendly for users now, but allows package devs to already prepare by labeling their package with a unique id. Then there will never be a need to fall back to heuristics in future.

A smaller point: Apple is not really a gatekeeper for app bundle ids. It is true they will enforce it when submitting to their App Store, but bundle ids can be created by anyone without registration. You can release a Mac app with any id you like, and it can be installed on a macOS system. Collisions are very unlikely if devs do stick to the reverse-DNS format. Worrying about reverse-DNS collisions is like worrying about UUID collisions: in practice it doesn’t happen.

mattt · December 29, 2020, 6:51pm

xwu:

Forgive me if I misread the core team’s prompt. But the following text reads to me like the topic here is about much more than how Swift package identifiers will be used in this proposal:

Package identities are a critical component of this proposal and the Swift package ecosystem. When the core team discussed the various opinions brought up in the review regarding using URLs as identities compared to name-spaced identities @Douglas_Gregor brought up the point that using a stable and unique name-spaced package identity would also provide the infrastructure to resolve module name conflicts (e.g. Namespacing of packages/modules, especially regarding SwiftNIO ), which would be a great benefit to the ecosystem. This topic needs to be further explored in a dedicated forum thread in preparation to the next revision.

An identity for the ecosystem which is to be used as infrastructure to solve other problems, which is what the core team wants to talk about, would necessarily be more than what you’re talking about here.

That's how I understand their feedback as well, and it's what I address in the "Future directions" section of our updated proposal. There I describe how the proposed package identifier scheme could be used to support module name collision resolution and importing external modules without a package manifest. Are there any other problems that I need to consider as part of this proposal? Do you have any questions or concerns about the solutions as described?

The heuristics for package URL canonicalization are not a solution to those problems; they are addressed by other parts of the proposal.

If a package maintainer wants to use their own domain to identify their package, the registry makes this easy to do.

For example, if the SSWG wanted to identify SwiftNIO as swift.org/server/swift-nio, they could do that by adding a Link header in the HTTP response for that URL:

HTTP/1.1 200 OK
Content-Type: text/html
Link: <https://github.com/apple/swift-nio>; rel="service"

(Alternatively, the server can implement the package registry interface directly, and use GitHub / GitLab / whatever URLs for source archives.)

On the client-side, Swift Package Manager could establish a connection between swift.org/server/swift-nio and github.com/apple/swift-nio, using that information to resolve any transitive dependencies that use the GitHub URL.
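A toy version of that client-side bookkeeping, in Swift, might look like the following. PackageAliases is a hypothetical type, and using the Link target as the representative identity is an arbitrary choice for the sketch, not something the proposal specifies.

```swift
/// Records "this identity points at that one" facts learned from responses,
/// and resolves any identity to a single representative.
struct PackageAliases {
    private var target: [String: String] = [:]

    init() {}

    /// Call when a response for `alias` carries a Link header pointing at `other`.
    mutating func record(alias: String, pointsTo other: String) {
        target[alias.lowercased()] = other.lowercased()
    }

    /// Follows recorded links until none remain, guarding against cycles.
    func resolve(_ identity: String) -> String {
        var current = identity.lowercased()
        var seen: Set<String> = [current]
        while let next = target[current], !seen.contains(next) {
            current = next
            seen.insert(next)
        }
        return current
    }
}

// Usage: both spellings end up at the same node in the dependency graph.
var knownAliases = PackageAliases()
knownAliases.record(alias: "swift.org/server/swift-nio", pointsTo: "github.com/apple/swift-nio")
let viaVanityURL = knownAliases.resolve("swift.org/server/swift-nio")
let viaGitHubURL = knownAliases.resolve("github.com/apple/swift-nio")
```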

It's similar to how DNS works generally. Some people have a website hosted on Medium.com and use an @icloud.com email address. Some people purchase their own domain and either self-host their website and email directly, or delegate to another provider using CNAME / ALIAS / MX records.

I'm curious to hear your thoughts about this, as it would help me better understand your concerns.

What's being proposed is the equivalent of submitting to the App Store. For a package to be shared, it would need to be pushed to a registry, which means enforcing identifiers.

That's assuming that actors aren't intentionally trying to cause collisions in an attempt to forge packages. Without a central naming authority, I'm unable to see how reverse-DNS can provide a workable solution.

benrimmington · December 29, 2020, 7:52pm

Could reverse-DNS for SwiftPM be achieved by registering a new .well-known service?

For example, https://www.example.com/.well-known/swift-package-collection might be a JSON file from SE-0291, listing the packages which are allowed to use the com.example. prefix. (I'm not sure what happens about the www part.)

mattt · December 29, 2020, 9:23pm

A .well-known service is an option worth considering for associating named resources like packages with a domain. You could use it for authentication, similar to Apple Pay, or as a meta resource, like associated domains for iOS and macOS apps.

There are a few different ways that .well-known could be used, and I'd very much welcome a more developed proposal that could be workshopped and debated.

This could work for individuals or a single organization with a small number of packages. However, enumerating a complete list of packages would be infeasible for hosts like GitHub or GitLab.

Another consideration is that requiring a host to enumerate a complete list of packages could leak information about the existence of a package.

My understanding is that you'd need to host .well-known directories on subdomains as well.

benrimmington · January 3, 2021, 10:37pm

Within manifests, package identity could be improved as you've suggested, possibly by updating Target.Dependency.product(name:package:condition:) to accept a package URL.

There are also subcommands which take package names:

swift package edit
swift package resolve
swift package unedit
swift package update

Should these subcommands be updated to accept package URLs? Would this affect a package which has already been put into editable mode?

Xcode has an alternative to swift package edit. A package dependency can be overridden, by adding a local package with the same name:

Editing a Package Dependency as a Local Package

Should this be replaced by proper edit/unedit commands within Xcode? Or fall back to using the git remote URLs of the local package?

I'll try to give a more detailed example, but I only discovered RFC 8615 recently (for this thread).

If the package manager/registry wants to verify that a com.apple::swift-nio-* package or com.apple::NIO* module is allowed to use the com.apple::* namespace, it could make a request to:

https://apple.com/.well-known/swift-package-collection

This can redirect to a subdomain:

https://www.apple.com/.well-known/swift-package-collection

If the response is an error (e.g. 404 Not Found), then the com.apple::* namespace isn't ~~restricted~~ available (which might give a warning/error by default).

If the response is a package collection, then the com.apple::* namespace is restricted to package URLs in the collection.
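A sketch of the two steps above in Swift, with the network request left out. The function and enum names are made up for the example, and the swift-package-collection suffix is just the one suggested in this post, not a registered well-known URI.

```swift
/// Maps a reverse-DNS namespace such as "com.apple" to the well-known URL
/// that would be queried. Returns nil for malformed namespaces.
func wellKnownCollectionURL(forNamespace namespace: String) -> String? {
    let labels = namespace.split(separator: ".", omittingEmptySubsequences: false)
    guard labels.count >= 2, !labels.contains(where: { $0.isEmpty }) else {
        return nil
    }
    // Reverse-DNS order is the host name's labels reversed.
    let host = labels.reversed().joined(separator: ".")
    return "https://\(host)/.well-known/swift-package-collection"
}

enum NamespaceCheck: Equatable {
    case namespaceUnavailable   // error response, e.g. 404 Not Found
    case allowed                // package URL is listed in the collection
    case notListed              // collection exists but omits the package URL
}

/// `listedPackageURLs` is nil when the request failed, otherwise the package
/// URLs found in the returned collection.
func checkNamespace(packageURL: String, listedPackageURLs: Set<String>?) -> NamespaceCheck {
    guard let listed = listedPackageURLs else { return .namespaceUnavailable }
    return listed.contains(packageURL) ? .allowed : .notListed
}
```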

Alternatives:

The well-known URIs registry needs a specification of the format, which is why I suggested using a package collection, but a simpler format is possible.
GitHub uses a DNS TXT record to verify an organization's domain.

GitHub or GitLab wouldn't need to put the packages of other organizations/users in a collection.

The com.apple::* namespace would only be used for public, open-source packages. Perhaps the org.swift::* or com.apple.opensource::* namespaces are more suitable? (Existing closed-source system frameworks have com.apple.* bundle identifiers.)

mattt · January 4, 2021, 5:40pm

Thanks for taking the time to think about this, Ben. I really appreciate having a concrete proposal to discuss and debate on its merits.

At least for now, external dependencies resolved through a registry can't be edited, so this has no impact on edit or unedit. It's not clear whether that functionality can/should be added in the future, but that's our starting point.

For resolve, our plan is to continue to use the name argument initially, and add support for package URLs later. We may want to bolster that command to provide more information about how a dependency is resolved. I started to sketch that out in this gist (I use the spelling swift package discover, but this could be folded into resolve instead).

Same for update as for resolve; we can ship registry support as-is and add package URL arguments later.

Overall, I think this is a workable approach to using reverse-DNS identifiers. I appreciate your thinking through how this might work.

What do you see as the benefits of a reverse-DNS approach over the URI approach being proposed?

benrimmington · January 4, 2021, 7:27pm

I think packages should be identified by URL, for the addressability and compatibility reasons in your proposal.

I'm not so sure about the future direction, where module name collisions are resolved by introducing aliases in the manifest. Is it sufficient to alias only the products of direct dependencies? Can there also be collisions due to indirect dependencies?

The benefits of reverse-DNS namespaces are:

they are chosen by the original author, whereas aliases would be an ad hoc solution.
they can appear in documentation to promote usage, thereby avoiding collisions by default.
they can be extended to Xcode SDKs, e.g. the closed-source com.apple::System and open-source org.swift::System (SystemPackage FAQ).

I've suggested the :: separator, so that the namespace is still obvious in the case of submodules.

yim_lee · January 15, 2021, 5:59pm

The goal of this discussion is for us to arrive at a package identity scheme that would help address these technical requirements:

SwiftPM must be able to dedupe packages, since otherwise we end up with duplicate symbols (build issues) and violate the assumption that there is a single copy of the code in the memory space (runtime issues).
Using unique and unambiguous package identifiers would give SwiftPM the infrastructure to generate unique and unambiguous module names. This would allow us to solve a long-standing issue in Swift where modules from different packages could collide (e.g., different packages that vend a Utilities module).

Suppose we set aside implementation details for a moment and consider the two proposed approaches in more generic terms:

URLs as identifiers => location-based identifiers
Reverse-DNS identifiers => opaque identifiers

Let’s take a closer look at how SwiftPM would make use of each identifier type and list the pros and cons.

Location-based package identifiers

Besides identifying a package, this type of identifier (presumably a URL/URI) also provides a way of locating either:

The registry that hosts the package: The server at the specified host component of the location either confirms it is a package registry or redirects SwiftPM to the associated registry. SwiftPM will then interact with the registry via the proposed registry APIs.
The Git repository of the package: If a registry cannot be located at this location, SwiftPM will attempt to treat it as a Git address and perform Git operations against that location.

Pros

Easier transition. The model is closer to how SwiftPM works today and existing packages can take advantage of registry support without any modifications (e.g., configuration can be omitted since registries can be inferred from URL/URIs), increasing adoption of the registry from day one.
URLs inherently provide a mechanism for ownership verification. It is much more difficult to “steal” an identity by means other than network-level attacks. Name-squatting and social engineering around URL-mapping is possible, but difficult.
Automatically falling back to Git helps mitigate broken registries (the escalator to stairs analogy). This has some complications when it comes to deduplication, since the Git protocol is not aware of “redirection” the same way the registry HTTP protocol is.
URL/URIs are unique identifiers that we can use to generate unique module names. This works if we can also reliably deduplicate packages.

Cons

Binding identity with location makes reliable deduplication hard.
- This is especially problematic for renamed or transferred repositories. It is fairly common that a project starts out as someone’s personal repository and later transitions to an organization account, which changes its identity. Another common example is a package that starts in an internal corporate setting and later moves to the public space. Some SCM hosting providers support both the old and new URLs for Git clones, which means we could end up fetching the same package under different identities if a package graph includes both URLs. One real-world example of this problem is SR-11338 (swiftlang/swift-package-manager issue #4673: Package resolution fails with "the Package.resolved file is most likely severely out-of-date and is preventing correct resolution; delete the resolved file and try again"). The proposal suggests working around such issues using the SwiftPM URL mirroring feature and/or setting up intermediate proxies.
- Unclear solution for reconciling different transports (SSH vs. HTTPS) in the general case. Mapping rules adopted by one SCM hosting provider may not be applicable to others. For example, a private BitBucket instance can even do custom configuration to affect this.
Coupling identity and location can be confusing from the UX point of view, especially when a package is renamed or moved. Go works around this by introducing “vanity URLs” (vanity import paths), but that requires domain ownership, and verification would add a layer of complexity. Vanity URLs also require you to think about them upfront, which is unlikely since there is no strong technical reason to define one when just starting out; it’s when you want to move the package that you realize you needed a vanity URL to begin with.
Since URL/URIs include the hostname, using it as an identifier brings a lock-in to a certain SCM hosting provider, especially if we would also use the identifier to generate module names so it will become part of module import statements.
Artifact management systems that are not coupled with a source control system (e.g. Artifactory, CodeArtifact) are key to the behind-the-firewall/enterprise use cases: they are relied upon instead of the Internet for both public and private packages. Using URL/URIs as identifiers will force such systems to use the URL/URIs as opaque strings, which is unnatural and likely to slow adoption of the registry API.

Opaque package identifiers

These identifiers are for package identification only. The list of registries is configured separately.

SwiftPM will query each of the registries based on the configured order of preference until it finds one that hosts the given package identifier. SwiftPM will then interact with the registry via the proposed APIs.

If no registry knows about the package identifier, SwiftPM will throw an error.
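The lookup described here is simple enough to sketch in Swift. Registry, ResolutionError, and locateRegistry are hypothetical names, and the hosts closure stands in for whatever network query the registry API would actually provide.

```swift
/// A configured registry. `hosts` stands in for a network query asking the
/// registry whether it knows a given package identifier.
struct Registry {
    let url: String
    let hosts: (String) -> Bool
}

enum ResolutionError: Error, Equatable {
    case packageNotFound(String)
}

/// Queries registries in their configured order of preference and returns
/// the first one that hosts the identifier; throws if none does.
func locateRegistry(for identifier: String, in registries: [Registry]) throws -> Registry {
    guard let match = registries.first(where: { $0.hosts(identifier) }) else {
        throw ResolutionError.packageNotFound(identifier)
    }
    return match
}
```

Because the first match wins, the order of the configured list is effectively the trust order, which is why the configuration format matters as much as the identifier scheme.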

Pros

Proven system used in other ecosystems (e.g., Maven, npm). It is easy to reason about and has a short learning curve.
Unique and unambiguous identifiers that we can use to dedupe packages and generate module names from.
Location is separate from identity which makes moving package sources around a non-issue.
Works naturally with enterprise artifact management solutions (e.g., Artifactory, CodeArtifact).

Cons

Harder transition from the current URL-based system. Given that we need to keep supporting URLs in existing packages, how do we support mixed references to the same graph? One approach would be to ask the registry for a set of URL aliases associated with a package identifier, so that SwiftPM can match up URL references and identifier references to the same package. We would also want to be able to ask the reverse question—i.e., for a given URL, is it associated with a package identifier in the registry. Some registries such as Artifactory might not have URL information.
This type of identifier typically comprises a namespace/group and a package name/path, and reverse-DNS is a common naming scheme for the namespace/group (e.g., Maven). Public registries would need to develop a way to make sure the package author/publisher owns the DNS name in question (e.g., Maven). This obviously adds complexity to implementing and operating a registry, and for individuals and smaller companies it might be a burden as well, since they may not want to spend the time and money on purchasing a domain. One solution would be to use username.github.io (or similar).
Name-squatting and identity-hijacking are potential security risks. A malicious actor may operate a registry with “poisoned” packages and “convince” users via social engineering to configure their SwiftPM to look up packages there instead of (or with priority over) the well-known registries. Some aspects of the social engineering could be mitigated by designing the registries configuration file in a way that makes it easy to express a specific search path (e.g. io.oddballs.* → the oddball registry that I am not sure I should trust). Further, this could be largely mitigated by supply-chain protection solutions such as transparent logs (mentioned in the original proposal), which are going to be required in either identity scheme. As a side note, most supply-chain protection solutions employ a trust-on-first-use (TOFU) scheme, so we would also need to be able to entertain take-down requests and come up with policies and non-technical processes around this problem, but this too is required in either identity scheme.

Based on this comparison, the main drawback with location-based identifiers is that SwiftPM cannot reliably dedupe packages with them, requiring that users set up proxies and/or elaborate mirroring rules to work around such issues, and pushing the complexity to the end-user side. On the other hand, opaque identifiers seem to be a more well-rounded option, but they place quite a few implementation burdens on the registry providers, pushing the complexity there. They are also likely to slow adoption, as they require a gradual transition from the existing URL-based model to a different one.

Proposed changes to the current design

Opaque package identifiers are preferred because they satisfy all of the technical requirements listed at the beginning of this post. Using opaque identifiers implies location is separate from package identity, and no assumption or deduction of location is made from an identifier—a configuration is used to define registries from which packages are resolved.
A section that details how the registries configuration works (e.g., how registries are defined, the priority, impact on the existing mirroring feature, etc.) should be added. IMO we could start with something simple such as a file with a list of registry URLs and drive priority based on ordering in the file and leave refinement of the configuration for future proposals.
Reverse-DNS is one possible scheme for assigning opaque identifiers which helps with proving ownership. We should leave the discussion open for alternatives and whether or not there will be requirements on registries to adopt a specific scheme, or registries are free to choose their preferred style.
We should review the registry APIs to see if any of them are impacted by a transition to opaque identifiers. It seems to me on the surface that there should be minimal impact to the APIs. We should also consider if more API(s) need to be added, as mentioned in a section above.