SE-0292: Package Registry Service

I've mentioned it before, and I'm going to mention it again - there simply isn't enough time to give this an adequate review.

A formal specification for the package registry interface is provided alongside this proposal. In addition, an OpenAPI (v3) document and a reference implementation are provided for the convenience of developers interested in building their own package registry.

Where? I don't see them linked in the proposal document or announcement post. I spent a couple of days considering and researching the proposal before I even found that document. I don't have the time to review this proposal + the formal specification in 8 days, together with participating in the multiple ongoing concurrency reviews and proposals soon to be reviewed, plus everything else I have to do in life. I don't know if this kind of crunch and lack of work-life balance is normal at Apple, but it's really not acceptable to force it on the community and I'm going to continue to speak out against it.

It's not only bad for the community (both for its members and as a whole), but I fear this effort to rush proposals through before the end of the year and without proper scrutiny is going to lead to a reduction in quality. I hope the core team will accept responsibility for the consequences.


That being said, here are a couple of points:

1. Ban credentials in URLs

Firstly, RFC3986 deprecated the password component 15 years ago:

Use of the format "user:password" in the userinfo field is deprecated. Applications should not render as clear text any data after the first colon (":") character found within a userinfo subcomponent unless the data after the colon is the empty string (indicating no password). Applications may choose to ignore or reject such data when it is received as part of a reference and should reject the storage of such data in unencrypted form. The passing of authentication information in clear text has proven to be a security risk in almost every case where it has been used.

The same RFC warns that credentials present a major spoofing risk:

Because the userinfo subcomponent is rarely used and appears before the host in the authority component, it can be used to construct a URI intended to mislead a human user by appearing to identify one (trusted) naming authority while actually identifying a different authority hidden behind the noise. For example

ftp://cnn.example.com&story=breaking_news@10.0.0.1/top_story.htm

might lead a human user to assume that the host is 'cnn.example.com', whereas it is actually '10.0.0.1'. Note that a misleading userinfo subcomponent could be much longer than the example above.

Secondly, major browsers have stopped supporting credentials in URLs:

  • Internet Explorer stopped it back in IE6 (yes, in at least this respect, IE6 was ahead of its time. You do not get to say that very often).
  • Safari stopped supporting them in iOS 11. I can't find any official announcement or release notes (or even which version of Safari shipped in iOS 11), but a web search shows that customers noticed. I just tested it by visiting https://guest:guest@jigsaw.w3.org/HTTP/Basic/ (a W3C test server), and even though the URL contains credentials, Safari seems to drop them and asks you to enter them manually.
  • Chrome is trying their best (status and spec discussions). I think the current state is that they ignore credentials in subresource requests but allow them for top-level navigation, or something like that.

I saw the GitHub examples in the proposal, which was a bit surprising (why are GitHub helping to resurrect this relic?). I eventually found this post from 2012 explaining it.

Now, I'm not a web developer, and I'm not going to pretend to know anything about OAuth and the best practices for using it, but this seems like a pretty irresponsible feature for GitHub to be supporting. Here's what they say in the blog post:

If you’re cloning inside a script and need to avoid the prompts, you can add the token to the clone URL:

git clone https://<token>@github.com/owner/repo.git

or

git clone https://<token>:x-oauth-basic@github.com/owner/repo.git

Note: Tokens should be treated as passwords. Putting the token in the clone URL will result in Git writing it to the .git/config file in plain text. Unfortunately, this happens for HTTP passwords, too. We decided to use the token as the HTTP username to avoid colliding with credential helpers available for OS X, Windows, and Linux.

So they tell you that this feature is designed for scripts, but to watch out because your token will be stored in the git config file in plaintext -- ignoring the fact that the token is already stored as plaintext in the script (or in our case, the package manifest). Doesn't make sense to me.

A quick look at the OAuth spec shows that tokens can be put into URL components (they use the query component as an example), but it seems to discourage doing so:

Because of the security weaknesses associated with the URI method (see Section 5), including the high likelihood that the URL containing the access token will be logged, it SHOULD NOT be used unless it is impossible to transport the access token in the "Authorization" request header field or the HTTP request entity-body. Resource servers MAY support this method.

This method is included to document current use; its use is not recommended, due to its security deficiencies (see Section 5) and also because it uses a reserved query parameter name, which is counter to URI namespace best practices, per "Architecture of the World Wide Web, Volume One" [W3C.REC-webarch-20041215].

And here is the Section 5 they refer to:

Don't pass bearer tokens in page URLs: Bearer tokens SHOULD NOT be passed in page URLs (for example, as query string parameters). Instead, bearer tokens SHOULD be passed in HTTP message headers or message bodies for which confidentiality measures are taken. Browsers, web servers, and other software may not adequately secure URLs in the browser history, web server logs, and other data structures. If bearer tokens are passed in page URLs, attackers might be able to steal them from the history data, logs, or other unsecured locations.

Given all of this, I would propose that we outright ban credentials in registry URLs, and I'd actually go one step further and ban them from all package URLs.
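To make the proposed ban concrete, here is a minimal sketch of the check a client could run before using a registry or package URL. It assumes Foundation's URLComponents is an acceptable parser for this purpose; the function name isCredentialFree is hypothetical, and SwiftPM does not perform such a check today.

```swift
import Foundation

/// Hypothetical check: true only if `string` parses as a URL and carries
/// no userinfo at all (no user name and no password).
func isCredentialFree(_ string: String) -> Bool {
    guard let components = URLComponents(string: string) else {
        return false // unparseable input is rejected, not waved through
    }
    // `user` is set for both "token@host" and "user:password@host".
    return components.user == nil && components.password == nil
}

// A client could refuse credential-bearing URLs before making any request.
for candidate in [
    "https://github.com/mona/LinkedList",
    "https://guest:guest@jigsaw.w3.org/HTTP/Basic/",
] {
    print(candidate, isCredentialFree(candidate) ? "accepted" : "rejected")
}
```

Rejecting rather than silently stripping the userinfo keeps the failure visible to whoever wrote the URL.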

2. Underspecified components

The proposal mentions certain conditions for registry URLs, but not all components are specified. Can registry URLs contain query strings?

Note: I had more to say here, but after discovering the "formal specification", my questions have changed. Where does the package name in the GET /{package} actually come from? Is its character set limited? Why do the examples in that document all say things like GET /github.com/mona/HashMap? How does the client know that the package sources are hosted on GitHub? Is github.com part of the package name?

Lots of questions. No time 🤷. Give us a proper review duration and I'll write them up.

3. My suggestion

I've spent the last ~8 months diving deep into URLs (it's not all I've been doing, but it's one of the things), and the closer you look at them, the worse they look.

There's a lot to say about the deficiencies of URLs, but luckily I can recommend a couple of videos instead (one short, one long).

  1. HOW FRCKN' HARD IS IT TO UNDERSTAND A URL?! - uXSS CVE-2018-6128 (15 mins). Explains a universal cross-site scripting bug that affected WebKit on iOS. I haven't looked into the bug in detail, but either there were different URL parsers in the OS which saw different hosts for the same URL, or there was an idempotency bug (parsing -> serialising -> parsing the URL changed its meaning).

  2. A new era of SSRF (47 mins). A now-famous talk by Orange Tsai about how to exploit quirks in different URL parsers. It includes a slide demonstrating how each of the common parsers used by Python sees a different host from the same URL. It's so beautiful I'm thinking about getting a poster made of it.

FWIW, we have similar issues...

import Foundation

// Output as observed at the time of this review:
if let url = URL(string: "https://test1@test2@test3/") {
    print(url.host as Any)              // Optional("test2@test3")
    print(url.standardized.host as Any) // Optional("test2@test3")
}

Safari and most other browsers will consider the host to be test3. The older URL spec that Foundation's URL type follows left this case ambiguous. Newer URL standards have tightened their definitions, so Foundation has been left with nonstandard behaviour that could open the door for misunderstandings and exploits. Presumably SwiftPM would use Foundation's URL support and inherit its quirks.
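For comparison, the rule browsers follow (the WHATWG URL Standard) puts the host after the last "@" in the authority. The sketch below only illustrates that rule on an authority string that has already been isolated; splitAuthority is a hypothetical name, and this is deliberately not a URL parser (no percent-encoding, IDNA, IPv6 literals, or validation).

```swift
/// Splits an authority such as "user@host:port" at the LAST "@".
/// Everything before it is userinfo; everything after it is host[:port].
/// Illustration only: no percent-encoding, IDNA, IPv6 or validation.
func splitAuthority(_ authority: String) -> (userinfo: String?, hostAndPort: String) {
    guard let at = authority.lastIndex(of: "@") else {
        return (nil, authority)
    }
    let userinfo = String(authority[..<at])
    let hostAndPort = String(authority[authority.index(after: at)...])
    return (userinfo, hostAndPort)
}

let parts = splitAuthority("test1@test2@test3")
print(parts.hostAndPort)      // test3
print(parts.userinfo ?? "-")  // test1@test2

// The RFC 3986 spoofing example resolves the same way.
print(splitAuthority("cnn.example.com&story=breaking_news@10.0.0.1").hostAndPort) // 10.0.0.1
```

A conforming WHATWG parser additionally percent-encodes the earlier "@", so the username becomes test1%40test2.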

And then you consider lower levels - I've heard that Foundation's networking is built on cURL (swift-corelibs-foundation, used on Linux, implements URLSession on top of libcurl), but how does cURL parse URLs? Does it always agree with Foundation? Again, see Orange Tsai's talk about the cURL maintainers' approach to this. There are plenty of times when your URL library says the host is x, but then you make the request and it goes out to some other host y.

There have been so many attempts to standardise URLs over the decades, and none of them have really worked. Applications diverged and added special behaviours (or had bugs), which people relied upon, which caused them to spread, which defeated the standardisation effort. The WHATWG has had to reduce its ambitions with the latest spec: it's now a living document (things can change at any time as new quirks are discovered), and one of the main goals now is just to document reality, not to enforce best practices. In fact, this is literally a quote from the latest standard:

The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms.

So if we are going to build something better than our current solution - more secure, more robust - we should take another look at whether we even need to use URLs at all. Let's consider the information packed inside a URL, and whether we actually need it:

Component     Needed?
Scheme        No - always https
Credentials   No - bad idea, officially deprecated
Hostname      Yes
Port          Maybe - always 443?
Path          Yes (maybe?) - I assume the package name comes from here somehow
Query         No - I assume they are not supported
Fragment      No - doesn't even get sent to the server

So out of everything URLs can do, and all the quirks they have to support, we basically only need 2 things: the hostname of the registry, and a package name (and a port if we want to support nonstandard ports).

Just ask for that data as 2 parameters, keep it separate, and you'll avoid all of those funky URL issues and their associated security vulnerabilities. There are no lower levels (like cURL) which can misinterpret which part is the host, and since we're now explicitly taking a package name instead of a path component (or wherever else that package name comes from), we can add semantic meaning as required (e.g. case insensitivity):

.package(name: "mona/LinkedList", registry: "GitHub.com", from: "1.1.0")
                     |
                    This is an opaque string, not a 'path'

It means that instead of automagically "upgrading" normal dependencies to use the registry, users would have to change their package manifests, but it would give us a much more robust system than we have now or the one proposed.
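Here is a minimal sketch of what "two parameters" could look like on the client side. Everything in it is an assumption for illustration: the type RegistryPackageReference, the ASCII-only host rule, and the "scope/name" naming rule are hypothetical. The point is that the request URL is assembled from already-validated parts, so the host can never be re-derived by parsing user input, and case-insensitive name matching is an explicit rule rather than a side effect of URL handling.

```swift
import Foundation

/// Hypothetical package reference: registry host and package name are kept
/// as separate values. The naming rules are assumptions for this sketch.
struct RegistryPackageReference: Equatable {
    let registryHost: String  // stored lowercased, e.g. "github.com"
    let port: Int             // 443 unless stated otherwise
    let name: String          // opaque "scope/name", e.g. "mona/LinkedList"

    init?(registryHost: String, port: Int = 443, name: String) {
        // Assumed rule: ASCII-only hostnames (IDNs would need IDNA first).
        guard !registryHost.isEmpty,
              registryHost.allSatisfy({ $0.isASCII && ($0.isLetter || $0.isNumber || $0 == "-" || $0 == ".") }),
              (1...65535).contains(port)
        else { return nil }

        // Assumed rule: exactly "scope/name"; each part starts with a letter
        // or digit and otherwise uses only ASCII letters, digits, "-", "_", ".".
        let parts = name.split(separator: "/", omittingEmptySubsequences: false)
        guard parts.count == 2 else { return nil }
        for part in parts {
            guard let first = part.first, first.isASCII, first.isLetter || first.isNumber,
                  part.allSatisfy({ $0.isASCII && ($0.isLetter || $0.isNumber || "-_.".contains($0)) })
            else { return nil }
        }

        self.registryHost = registryHost.lowercased()
        self.port = port
        self.name = name
    }

    /// Identity check that treats package names case-insensitively.
    func isSamePackage(as other: RegistryPackageReference) -> Bool {
        registryHost == other.registryHost && port == other.port
            && name.lowercased() == other.name.lowercased()
    }

    /// Builds the request URL from validated parts only.
    func requestURL() -> URL? {
        var components = URLComponents()
        components.scheme = "https"
        components.host = registryHost
        components.port = (port == 443) ? nil : port
        components.path = "/" + name
        return components.url
    }
}

if let ref = RegistryPackageReference(registryHost: "GitHub.com", name: "mona/LinkedList") {
    print(ref.requestURL()?.absoluteString ?? "invalid") // https://github.com/mona/LinkedList
}
```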

In general, an enthusiastic +1 for this proposal.

One issue worth considering is whether and how registry mirrors will be permitted. Taking China as the obvious example, there are several mirrors set up by corporations or universities, e.g.:

Note the mirror deployment instructions for Rust crates, which USTC follows. Will this be considered as part of this proposal, given its benefits but also its adverse security implications?

This last sentence especially seems open to interpretation. Does it mean that completely different URLs might be accepted as equivalent under some configuration?

3.6. Package name resolution
...
Each external package is uniquely identified by the canonical URL of its source code. Therefore, a package is a shared dependency of two packages if and only if both of them declare an external dependency with the same URL.
...
A client MAY use other techniques to determine that two dependencies are equivalent, such as comparing their contents, structure or history.

What is your evaluation of the proposal?

Huge +1 on that! A package registry service is very important as an alternative to git, and I think the general idea and design are great.

Is the problem being addressed significant enough to warrant a change to Swift?

Of course! Registry support is another huge leap forward for SwiftPM, and it will speed up the build process in various conditions.

Does this proposal fit well with the feel and direction of Swift?

I think yes. The proposal defines a viable and fully functional model for SwiftPM to work with an HTTP registry, but I do have some security concerns. The proposal specifies an endpoint for retrieving Package.swift over HTTPS, and that manifest is then executed on the local machine. How do we protect it from attacks like MitM, which could tamper with the manifest code? Should we require manifests to be encrypted during transfer?

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

I think it’s certainly a great one! The proposal empowers SwiftPM and fits its model quite well, but security needs extra care, especially when downloading code (including manifests) from the Internet, since Swift executes manifests rather than just parsing them. Also, I think the proposal would be better if it included a section on mirroring a registry, as people in certain network conditions will certainly need this.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

I’ve been tracking this proposal since it first appeared on the forum! I’m very glad to see it fully drafted. I’m pointing out these potential problems from the perspective of both a user and someone who manages a registry.

Thanks for taking the time to review and share your thoughts, @Karl. Responding to your individual points:

I agree that this is a topic for further consideration, but I don't think that this is particularly relevant to this proposal. Both package registries and conventional, repository-based dependencies will share a common authentication mechanism through Swift Package Manager. The only reason I brought up hardcoded credentials in my response to @0xTim was to describe an existing strategy for accessing non-public external packages.

For context, Swift Package Manager only recently added support for .netrc files [1]. Any decision to ban credentials in URLs should take a considered approach, where deprecation warnings and documentation give SPM users an opportunity to migrate to a better solution.

Packages are uniquely identified by a canonicalized form of a URL that locates their source code. The exact behavior of this is described by the CanonicalPackageIdentity, which was introduced in this PR to apple/swift-package-manager on November 18, 2020.

That's not entirely correct. As I said before, and as the proposal describes, packages are identified by URLs. We use URLs as identifiers not just because a URL's components map to the parameters we need, but because they locate a server resource. As I discussed in my response to @StanTwinB:

From the beginning, Swift Package Manager has taken a federated approach to identity. I think that's a good thing. In contrast to other systems, there's no centralized naming authority for packages. You aren't requesting a package named mona/LinkedList that happens to be hosted on GitHub, you're requesting the package github.com/mona/LinkedList.

By using canonicalized URLs to identify packages, we're saying that https://github.com/mona/LinkedList and ssh://git@github.com:mona/LinkedList.git are the same, and we use HTTP content negotiation to upgrade requests to use a faster, more secure registry interface, when available.

Getting back to the concerns you raised about edge cases in parsing URLs: I'm having trouble imagining a scenario where this ambiguity could be exploited. If an invalid or ambiguous URL is provided, it will either fail to resolve in an HTTP request or it will cause dependency resolution to fail. Do you have a specific concern about how these URLs could cause problems for the package ecosystem?

@yonihemi @stevapple Thanks for weighing in! Responding to your points:

We imagine two primary mechanisms for mirroring — one on the client, and one on the server.

Client-side, you can use swift package config set-mirror (as described above) to route individual packages to a different endpoint. For example, if your project included github.com/mona/LinkedList as a direct or transitive dependency, you could set a mirror to resolve and download it through another server (perhaps geographically closer or within an internal network).

In the future, we could also add support for Swift Package Manager to set blanket policies on how package URLs are routed, rather than specifying mirrors individually. However, we consider this to be out of scope for this proposal.

Server-side, a registry may use Link headers in their response to designate alternative download locations / mirrors. From the specification:

4.4.2. Download locations

A server MAY specify mirrors or multiple download locations using Link header fields with a duplicate relation, as described by RFC 6249. A client MAY use this information to determine its preferred strategy for downloading.

HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: public, immutable
Content-Type: application/zip
Content-Disposition: attachment; filename="LinkedList-1.1.1.zip"
Content-Length: 2048
Content-Version: 1
Digest: sha-256=a2ac54cf25fbc1ad0028f03f0aa4b96833b83bb05a14e510892bb27dea4dc812
ETag: e61befdd5056d4b8bafa71c5bbb41d71
Link: <https://mirror-japanwest.example.com/mona-LinkedList-1.1.1.zip>; rel=duplicate; geo=jp; pri=10; type="application/zip"
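A client consuming these headers could collect the rel=duplicate entries and try them in priority order. Below is a minimal sketch; the Mirror type, the duplicateMirrors function, and the second mirror host in the usage are hypothetical, and the parser is deliberately simplified (it assumes no commas or semicolons inside quoted values). It relies on RFC 6249's convention that lower pri values are preferred. A real client should use a complete Link header (RFC 8288) parser.

```swift
import Foundation

/// A mirror advertised via `Link: <...>; rel=duplicate`.
struct Mirror: Equatable {
    let url: String
    let priority: Int  // RFC 6249 "pri"; lower values are preferred
    let geo: String?
}

/// Simplified parser for a Link header value: link-values separated by ",",
/// parameters by ";", with no "," or ";" inside quoted strings.
func duplicateMirrors(fromLinkHeader value: String) -> [Mirror] {
    var mirrors: [Mirror] = []
    for linkValue in value.split(separator: ",") {
        let parts = linkValue.split(separator: ";").map { $0.trimmingCharacters(in: .whitespaces) }
        guard let target = parts.first, target.hasPrefix("<"), target.hasSuffix(">") else { continue }

        // Collect key=value parameters, stripping optional quotes.
        var params: [String: String] = [:]
        for param in parts.dropFirst() {
            let pair = param.split(separator: "=", maxSplits: 1).map(String.init)
            guard pair.count == 2 else { continue }
            params[pair[0].lowercased()] = pair[1].trimmingCharacters(in: CharacterSet(charactersIn: "\""))
        }
        guard params["rel"] == "duplicate" else { continue }

        mirrors.append(Mirror(url: String(target.dropFirst().dropLast()),
                              priority: params["pri"].flatMap { Int($0) } ?? Int.max,
                              geo: params["geo"]))
    }
    return mirrors.sorted { $0.priority < $1.priority }
}

let header = "<https://mirror-japanwest.example.com/mona-LinkedList-1.1.1.zip>; rel=duplicate; geo=jp; pri=10; type=\"application/zip\""
for mirror in duplicateMirrors(fromLinkHeader: header) {
    print(mirror.url, mirror.priority, mirror.geo ?? "-")
}
```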

Yes. For example, projects on GitHub may be renamed or moved to other organizations, such that github.com/mona/LinkedList and github.com/Octogroup/LinkedList-Swift are equivalent. Over HTTP, this relationship is indicated by a redirect (303 status code). Swift Package Manager could use this information to reconcile nodes in the dependency graph to a single package identity.
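One way to picture that reconciliation: record each observed redirect as an edge from one identity to another, then follow the edges to a single final identity, collapsing graph nodes that end up at the same place. This is a hypothetical sketch (resolveIdentity and its maxHops limit are illustrative, not SwiftPM API); it refuses redirect loops and overly long chains rather than guessing.

```swift
/// Hypothetical: follows recorded redirects to one identity.
/// Returns nil on a redirect loop or a chain longer than `maxHops`.
func resolveIdentity(_ identity: String,
                     redirects: [String: String],
                     maxHops: Int = 10) -> String? {
    var current = identity
    var seen: Set<String> = [identity]
    while let next = redirects[current] {
        // A repeated identity means the redirects form a loop.
        guard seen.insert(next).inserted, seen.count <= maxHops + 1 else {
            return nil
        }
        current = next
    }
    return current
}

// Two dependency declarations that should collapse into one graph node.
let redirects = ["github.com/mona/linkedlist": "github.com/octogroup/linkedlist-swift"]
let declared = ["github.com/mona/linkedlist", "github.com/octogroup/linkedlist-swift"]
let nodes = Set(declared.compactMap { resolveIdentity($0, redirects: redirects) })
print(nodes.count) // 1
```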

Package registries can provide greater security guarantees than the current approach of fetching source code directly from repositories.

The package registry requires all communication to occur over secure HTTPS connections, which goes a long way to mitigating man-in-the-middle attacks. Swift Package Manager currently allows repository URLs to use insecure HTTP URLs, but most code hosts — including GitHub — automatically upgrade HTTP requests to HTTPS, so the potential security impact is limited. We could, as a separate measure, consider generating warnings or errors for external repository-based dependencies specified with an insecure protocol.

We use checksums as an additional mechanism for ensuring that the package we're downloading is authentic, and hasn't been tampered with.

All of that said, and to your point — a secure connection and verified checksum doesn't guarantee that the package being downloaded isn't itself nefarious. For that reason, package manifests are evaluated within a sandbox, which limits their effect on the system (this is true currently, and is unaffected by the registry).

I think that’s a very important one. At least for the current proposal, we should allow users to mirror a whole registry with a command such as:

$ swift package config set-mirror \
--original-url https://github.com \
--mirror-url https://localhost:8080/github.com/

which would be prefix-matched and substituted.
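Here is a sketch of how such prefix matching might behave, as distinct from the per-package mirrors described earlier in the thread. The function applyMirror is hypothetical. Two details are easy to get wrong and are handled explicitly: a bare host prefix must end at a path boundary (otherwise "https://github.com" would also capture "https://github.com.example.net/..."), and joining a mirror URL that ends in "/" with a remainder that starts with "/" should not produce "//".

```swift
/// Hypothetical prefix-based mirror substitution; the longest match wins.
func applyMirror(to url: String, mirrors: [(original: String, mirror: String)]) -> String {
    func matches(_ prefix: String) -> Bool {
        guard url.hasPrefix(prefix) else { return false }
        // Require a path boundary right after the prefix.
        let rest = url.dropFirst(prefix.count)
        return prefix.hasSuffix("/") || rest.isEmpty || rest.hasPrefix("/")
    }
    guard let best = mirrors.filter({ matches($0.original) })
                            .max(by: { $0.original.count < $1.original.count })
    else { return url }

    var mirror = best.mirror
    let rest = url.dropFirst(best.original.count)
    // Avoid a doubled "/" when both sides supply one.
    if mirror.hasSuffix("/") && rest.hasPrefix("/") { mirror.removeLast() }
    return mirror + String(rest)
}

let table = [(original: "https://github.com", mirror: "https://localhost:8080/github.com/")]
print(applyMirror(to: "https://github.com/mona/LinkedList", mirrors: table))
// https://localhost:8080/github.com/mona/LinkedList
```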

Also, maybe we should work out a basic way that a mirror can use to sync with the origin. I’m very eager to build such a mirror for Chinese users because both traffic to GitHub and Swift.org can be terrible here and we definitely need one.

I agree that this is an important issue, and I think we should explore our options for how best to support this use case in the future.

Until we settle on a long-term solution, there are a few different workarounds that don't require any changes to Swift Package Manager:

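# /etc/hosts entries are "address hostname"
$ echo "10.0.0.1 github.com" | sudo tee -a /etc/hosts
$ sudo ifconfig lo0 10.0.0.1 alias
# ipfw was removed in OS X 10.10; newer macOS releases use pf instead
# the server on port 8080 must present a TLS certificate trusted for github.com
$ sudo ipfw add fwd 127.0.0.1,8080 tcp from me to 10.0.0.1 dst-port 443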

Sure - I mention it because this proposal already limits which kinds of URLs get the registry treatment. It could also be done later.

Ah, that's interesting - I was wondering about this. I saw some mention of “canonical” URLs in the proposal, but I couldn't find a definition for what that meant and AFAIK it's not a standard thing. So I think that needs to be part of this proposal, and just by itself needs a lot of careful attention.

Is this the {package} part of the GET /{package} request? Because if so, I think it's important for registry implementors to know exactly how that name is generated.

Additionally, I have several issues with the implementation.

  • It tries to parse the URL string by itself, and I cannot overstate how bad an idea that is.

  • It has similar flaws to Foundation’s implementation (e.g. treating the first “@” as the userinfo/hostname separator, when it should be the last “@” in the authority component).

  • It keeps the hostname but drops the port. A "host" is a combination of hostname + port, and both are necessary IMO, but that's straying outside of my area so I'll defer to others about whether this is okay.

  • It replaces tildes (~) with the username component from the URL (if it has one), so using the example from the code comments:

    ssh://mona@example.com/~/LinkedList.git → example.com/~mona/LinkedList
    

    I have never seen this behaviour before; is there any precedent for it? Also, is the component after replacement ~mona or just mona? Given that the code comments appear to be the authoritative documentation for this "canonicalisation" transform, it's important that it is accurate.

  • Percent-decoding happens as one of the last steps, but the string isn't validated after decoding. What if I included percent-encoded spaces or newlines? If that gets inserted as-is into a GET /{package} request, the request then contains unvalidated user input and can be manipulated quite easily.

    Maybe this would get rejected by some other layer of SPM, but I don't think so - these are just opaque strings in an invented format, so this construction stage is the only validation that happens and no URL-parser would get invoked on them (I think). A proper analysis of that would require a larger effort, digging through the SPM codebase.

  • It is platform-dependent. Backslashes will be treated as path separators on Windows, but not on OSX or Linux. If we are sending this to a server and it is expecting to look up a particular canonical package identity string, that string cannot depend on the platform that is requesting the package.

  • "my conceit here is that a file path is actually a file:/// URL with an implicit scheme." No it isn't, and that kind of thinking is the path to buggy and brittle code. URL paths implement an abstract model of a tree of nodes, but they carry basically no meaning beyond "whatever the server wants to do with this string". That's the drawback of being "universal".

    There are all kinds of ways that this shows up in practice. For example, the backslash behaviour you implemented for file URLs is actually a property of file paths (which are necessarily OS and filesystem-specific). On Windows, it is considered to be a path separator. On Linux/macOS, it may be used to escape a space (e.g. /some/path/folder\ with\ spaces/). In URLs with special schemes, backslash and forwardslash are equivalent everywhere (e.g. http:\\www.example.com\some\path is the same as http://www.example.com/some/path).

  • From the code comments:

    /// Swift Package Manager takes additional steps to canonicalize URLs
    /// to resolve insignificant differences between URLs.
    /// For example,
    /// the URLs `https://example.com/Mona/LinkedList` and `git@example.com:mona/linkedlist`
    /// are equivalent, in that they both resolve to the same source code repository,
    /// despite having different scheme, authority, and path components.
    

    I will need more time to process it, but as it stands I am pretty suspicious of this, just at a conceptual level.

  • I think some additional processing of the hostname may be required for this to work as intended. Hostnames with disallowed characters get processed differently based on their schemes (https: will get transformed by IDNA, ssh: will get percent-encoded), but the canonical identity has no knowledge of the scheme its hostname came from, so equivalent hostnames may be represented differently in this canonical form.

The question is whether they are truly the most robust way to represent a Swift package. I would argue that they are not. They are far too complex for our needs, and if we didn't use them, our infrastructure would be more robust with fewer points of possible failure.

Additionally, one of those things is not accurate:

  • Copy-pasting the URL into your browser may take you to an entirely different site. See my examples in the previous post - Foundation's URL does not even match Safari on Apple's own platform. Also, I did investigate how that uXSS vulnerability was fixed, and it turns out the issue was indeed a mismatch between WebKit's URL type and Foundation (or CF, to be precise).

And the others do not require URLs:

  • TLS is a transport-level technology and does not require URLs.

  • We can still use DNS without URLs.

Maybe it will, maybe it won't. It is certainly plausible that it could be exploited (I suppose it depends on what you mean by 'exploited'; perhaps it's more accurate to say that while spoofing usually means tricking a human being, this is more akin to spoofing your code and infrastructure). Even if it just manipulates requests to inject headers, isn't that bad enough? I'm not sure if there are any explicit safeguards against these kinds of things, except for whatever protections might accidentally fall out of the implementation.

A conclusive answer would take more time than I currently have, and clearly defined parameters for what we consider to be unacceptable manipulation of the process by a specially-crafted package.

Okay, well I think it might be worth considering, as part of this new registry system, whether that is still a good idea. For example, does it align with users' expectations? Are there other ways of assigning an identity to a package which are also compatible with not having a central naming authority and don't require taking on all of the baggage of URLs? And how do those solutions align with users' expectations?

For example, we already have issues with things like case-sensitivity. IDNA transformations are another source of difficulty. Basically, deciding whether 2 URLs are the same is non-trivial in Foundation's model (which doesn't do IDNA at construction), and we have no way of telling whether URLs which differ could point to the same package. We might be able to do something about that.

P.S: I just want to add that I really appreciate you and the other proposal authors taking this on. It’s an important development for SPM which is why I’m trying to really interrogate every detail.

+1. A huge thank you to everyone involved in this proposal.

Apologies if I missed this, but I didn’t see an affordance for “configure this machine to only fetch packages & source code through my company registry & git mirror”. Is that a supported use case in this proposal?

This would be similar to having a ~/.config/pip/pip.conf file with ‘index-url = <my company's PyPI mirror>’.

This would be beneficial for audited / regulated teams.

Following up on my last message:

I was curious to see how compatible the infrastructure of AWS CodeArtifact was to the semantics of the proposed Swift package registry interface, so I went ahead and implemented a proof-of-concept web application using aws-sdk calls:

I'm happy to report that most concepts in CodeArtifact appear to map cleanly onto a counterpart in Swift. Here's my first approximation of how their respective terminology lines up:

AWS CodeArtifact           Swift Package Registry
Domain                     URL top-level domain or subdomain
Repository                 Package
Package                    Release
Package version            Release version number
Package version revision   N/A (releases are immutable)
Upstream repository        Code host repository
Asset                      Source archive (.zip file)
Package namespace          Leading path components in package URL

If nothing else, I hope this code helps you and your team contextualize our ideas for the Swift registry. I look forward to working with y'all to help bring Swift package support to AWS. 😃

Correct. For example, if a package dependency is specified with the URL of ssh://git@github.com:mona/LinkedList.git, the URL template parameter {package} would be github.com/mona/linkedlist. The GET request made to the package registry would therefore be made to https://registry.example.com/github.com/mona/linkedlist.
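To make that mapping easier to follow, here is a heavily simplified sketch that reproduces the two examples above. It is hypothetical and is not SwiftPM's CanonicalPackageIdentity: it does no percent-decoding, IDNA processing, "~" expansion, IPv6 handling, diacritic folding, or final validation. It does take the host from after the last "@", as discussed earlier in the thread.

```swift
import Foundation

/// Hypothetical, simplified canonicalization for illustration only.
func simplifiedIdentity(_ location: String) -> String {
    var rest = Substring(location)

    // Drop a scheme such as "https://" or "ssh://", if present.
    if let schemeEnd = rest.range(of: "://") {
        rest = rest[schemeEnd.upperBound...]
    }

    // The authority ends at the first "/" (or at the end of the string).
    let slash = rest.firstIndex(of: "/") ?? rest.endIndex
    var authority = rest[..<slash]
    var path = String(rest[slash...])

    // Userinfo, if any, ends at the LAST "@" in the authority.
    if let at = authority.lastIndex(of: "@") {
        authority = authority[authority.index(after: at)...]
    }

    // "host:digits" is a port (dropped); "host:path" is scp-style syntax.
    if let colon = authority.firstIndex(of: ":") {
        let suffix = authority[authority.index(after: colon)...]
        if !suffix.allSatisfy({ $0.isASCII && $0.isNumber }) {
            path = "/" + String(suffix) + path
        }
        authority = authority[..<colon]
    }

    // Drop trailing "/" and a ".git" suffix, then fold case.
    while path.hasSuffix("/") { path.removeLast() }
    if path.hasSuffix(".git") { path.removeLast(4) }
    return (String(authority) + path).lowercased()
}

print(simplifiedIdentity("ssh://git@github.com:mona/LinkedList.git")) // github.com/mona/linkedlist
print(simplifiedIdentity("https://github.com/mona/LinkedList"))       // github.com/mona/linkedlist
```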

From the perspective of a registry, the most important thing is that these identifiers are case- and diacritic-insensitive. I think the exact behavior of URL canonicalization is ultimately an implementation detail of Swift Package Manager.

That said, it's an important detail, and I'd certainly welcome your help in making the implementation as robust as possible.

Ready to dive into the weeds for a bit? 😉

Parsing is always a minefield, but I think the stakes are relatively low for this particular application. Let's take your example of "https://test1@test2@test3/":

You (correctly) point out that CanonicalPackageIdentity incorrectly parses the userinfo component, so the result is an identifier of "test2@test3". This identifier is then used to construct URL(string: "https://test2@test3"), which either returns nil because it's an invalid URL, or fails to resolve. As far as I can tell, that's no worse than having a typo in your URL.

Could you help me understand the practical impact of this kind of bug? What's the worst that could happen if canonicalization works incorrectly?

👏 Thanks again for pointing that out! I'll open a PR to fix that. I'd love nothing more than for you or anyone else to find a dozen more test cases that we don't handle correctly.

I was actually on the fence about this. At one point, my implementation only removed the default port for the specified scheme (e.g. 22 for SSH, 80 for HTTP, 443 for HTTPS, 9418 for Git). Can you think of any reason why we should / shouldn't do that instead? IIRC, I got as far as the protocols supported by Git, but decided against it after thinking about which other SCM protocols to support.
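For reference, the "drop only the default port" alternative could look like the sketch below. The default-port table uses the four schemes listed above; hostIdentity and significantPort are hypothetical helpers, and parsing is delegated to URLComponents rather than done by hand (which also means scp-style locations like git@github.com:mona/LinkedList.git are out of scope here). Keeping a non-default port preserves the hostname + port pairing that identifies a server.

```swift
import Foundation

/// Returns the explicit port only when it differs from the scheme's default.
func significantPort(scheme: String, port: Int?) -> Int? {
    let defaultPorts = ["ssh": 22, "http": 80, "https": 443, "git": 9418]
    guard let port = port else { return nil }
    return defaultPorts[scheme.lowercased()] == port ? nil : port
}

/// Hypothetical: lowercased host plus any non-default port.
func hostIdentity(_ string: String) -> String? {
    guard let components = URLComponents(string: string),
          let scheme = components.scheme,
          let host = components.host, !host.isEmpty
    else { return nil }
    if let port = significantPort(scheme: scheme, port: components.port) {
        return "\(host.lowercased()):\(port)"
    }
    return host.lowercased()
}

print(hostIdentity("https://example.com:443/mona/LinkedList") ?? "invalid")  // example.com
print(hostIdentity("https://example.com:8443/mona/LinkedList") ?? "invalid") // example.com:8443
```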

I can't find my original source, but I recall this being some behavior of SCP that was inherited by Git; I also recall being surprised that ~mona was the expanded form. Anyway, we can take it out if a reasonable citation eludes us.

At least for the registry interface, I'm confident that these go through Foundation.URL.init(string:), which performs its own validation. For repositories, I'd have to check.

But I take your point that this could use another validation step at the end.
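A final validation step could be as small as the sketch below, run after percent-decoding and before the identity is placed into a GET /{package} request. The rejected set is an assumption: whitespace and control characters (which could split a request line or inject headers), "?", "#", "%" and "\" (which could change how the path is interpreted, or behave differently across platforms), and ".." path segments. isSafeIdentity is a hypothetical name.

```swift
import Foundation

/// Hypothetical last-step check on a canonical identity, applied AFTER
/// percent-decoding. The forbidden characters are assumptions.
func isSafeIdentity(_ identity: String) -> Bool {
    guard !identity.isEmpty,
          !identity.split(separator: "/").contains("..")
    else { return false }
    let forbidden = CharacterSet.whitespacesAndNewlines
        .union(.controlCharacters)
        .union(CharacterSet(charactersIn: "?#%\\"))
    return identity.unicodeScalars.allSatisfy { !forbidden.contains($0) }
}

let decoded = "github.com/mona/Linked%0D%0AList".removingPercentEncoding ?? ""
print(isSafeIdentity(decoded)) // false
```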

You're totally right — that was an oversight on my part. I'll tack that onto the aforementioned PR.

You're correct in pointing out this distinction. "Actually" was the wrong word here; what I meant was "effectively". For the purpose of canonicalizing package identities, so that we can match ones that are the same save for insignificant differences, I think our treatment of paths as file URLs is acceptable. Identifiers for local packages (which have file paths or file:/// URLs) are treated differently than remote dependencies.

IDNs are the next big thing I'd like to tackle. I have a (perhaps overstated) fear of homograph attacks. I'll take a closer look at how disallowed characters are / should be handled.
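As a first, conservative step before full IDNA handling, a client could at least flag hosts that are internationalized, either as raw Unicode or as Punycode ("xn--") labels, so they are never compared as plain strings. This hypothetical sketch (needsIDNAReview) only detects such hosts; it does not perform IDNA mapping or detect confusable characters.

```swift
/// Hypothetical pre-check: flags hosts that are internationalized, either
/// as raw Unicode or as Punycode ("xn--") labels, for IDNA/homograph review.
func needsIDNAReview(host: String) -> Bool {
    let hasNonASCII = host.unicodeScalars.contains { !$0.isASCII }
    let hasPunycodeLabel = host.lowercased()
        .split(separator: ".")
        .contains { $0.hasPrefix("xn--") }
    return hasNonASCII || hasPunycodeLabel
}

// "g\u{0456}thub.com" uses CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I.
print(needsIDNAReview(host: "github.com"))         // false
print(needsIDNAReview(host: "g\u{0456}thub.com"))  // true
```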

Despite their complexity and baggage, I still think URLs are the best solution to the problem of identifying packages. I think the idea that https://github.com/mona/LinkedList and ssh://git@github.com:mona/LinkedList.git are the same aligns well with user expectation, and that the process of HTTP content negotiation provides a convenient mechanism for allowing users to upgrade to a more secure system without changing their code.

We can (and should) take every opportunity to make our canonicalization function robust and correct, and limit the impact of anything that manages to slip through the cracks. That said, I don't think adopting URLs here poses a significant security or logistical threat.

I've forked the benchmark to create a 'real' world app with a large number of dependencies if people want to use it to test

Could that interface be implemented by companies/developers themselves, so that they can have their own private registry service? Basically, what if people don't want to use GitHub?

Yes! Because this is an open standard, any developer can build and deploy a Swift package registry that hosts projects of their choosing.

Thanks for your reply, Matt. I am curious whether a Swift package itself could act as a registry service, where you have multiple local sub-packages in a top-level package, and the top-level package chooses which ones should be distributed publicly or privately. That way you wouldn't even need to implement a separate registry service.

I'm not sure if I understand what you're describing, but I'll point you to this thread, which discusses different ways that Swift Package Manager could support projects containing multiple packages, including the package registry interface.

Following up on this: You can find the OpenAPI specification here.

I'm very interested to hear your thoughts about my responses to your posts in this thread. Please also take a look at this thread discussing the decision to use URIs / URLs to identify packages.

I just submitted a PR to apple/swift-package-manager with fixes to the issues you identified, as well as some new logic for normalizing Windows paths. If you have time to take a look, I'd really appreciate any feedback you have.

This first review for SE-0292 has concluded and the proposal was returned for revision.

Thank you to everyone for the feedback and contributions to this proposal.