I've mentioned it before, and I'm going to mention it again - there simply isn't enough time to give this an adequate review.
A formal specification for the package registry interface is provided alongside this proposal. In addition, an OpenAPI (v3) document and a reference implementation are provided for the convenience of developers interested in building their own package registry.
Where? I don't see them linked in the proposal document or announcement post. I spent a couple of days considering and researching the proposal before I even found that document. I don't have the time to review this proposal + the formal specification in 8 days, together with participating in the multiple ongoing concurrency reviews and proposals soon to be reviewed, plus everything else I have to do in life. I don't know if this kind of crunch and lack of work-life balance is normal at Apple, but it's really not acceptable to force it on the community and I'm going to continue to speak out against it.
It's not only bad for the community (both for its members and as a whole), but I fear this effort to rush proposals through before the end of the year and without proper scrutiny is going to lead to a reduction in quality. I hope the core team will accept responsibility for the consequences.
That being said, here are a couple of points:
1. Ban credentials in URLs
Firstly, RFC 3986 deprecated the password component 15 years ago:
Use of the format "user:password" in the userinfo field is deprecated. Applications should not render as clear text any data after the first colon (":") character found within a userinfo subcomponent unless the data after the colon is the empty string (indicating no password). Applications may choose to ignore or reject such data when it is received as part of a reference and should reject the storage of such data in unencrypted form. The passing of authentication information in clear text has proven to be a security risk in almost every case where it has been used.
The same RFC warns that credentials present a major spoofing risk:
Because the userinfo subcomponent is rarely used and appears before the host in the authority component, it can be used to construct a URI intended to mislead a human user by appearing to identify one (trusted) naming authority while actually identifying a different authority hidden behind the noise. For example
ftp://cnn.example.com&story=breaking_news@10.0.0.1/top_story.htm
might lead a human user to assume that the host is 'cnn.example.com', whereas it is actually '10.0.0.1'. Note that a misleading userinfo subcomponent could be much longer than the example above.
Secondly, major browsers have stopped supporting credentials in URLs:
- Internet Explorer stopped it back in IE6 (yes, in at least this respect, IE6 was ahead of its time. You do not get to say that very often).
- Safari stopped supporting them in iOS 11. I can't find any official announcement or release notes (or even which version of Safari shipped in iOS 11), but a web search shows that customers noticed. I just tested it by visiting https://guest:guest@jigsaw.w3.org/HTTP/Basic/ (a W3C test server), and even though the URL contains credentials, Safari seems to drop them and asks you to enter them manually.
- Chrome is trying their best (status and spec discussions). I think the current state is that they ignore credentials in subresource requests but allow them for top-level navigation, or something like that.
I saw the GitHub examples in the proposal, which was a bit surprising (why are GitHub helping to resurrect this relic?). I eventually found this post from 2012 explaining it.
Now, I'm not a web developer, and I'm not going to pretend to know anything about OAuth and the best practices for using it, but this seems like a pretty irresponsible feature for GitHub to be supporting. Here's what they say in the blog post:
If you’re cloning inside a script and need to avoid the prompts, you can add the token to the clone URL:
git clone https://<token>@github.com/owner/repo.git
or
git clone https://<token>:x-oauth-basic@github.com/owner/repo.git
Note: Tokens should be treated as passwords. Putting the token in the clone URL will result in Git writing it to the .git/config file in plain text. Unfortunately, this happens for HTTP passwords, too. We decided to use the token as the HTTP username to avoid colliding with credential helpers available for OS X, Windows, and Linux.
So they tell you that this feature is designed for scripts, but to watch out because your token will be stored in the git config file in plaintext -- ignoring the fact that the token is already stored as plaintext in the script (or in our case, the package manifest). Doesn't make sense to me.
A quick look at the OAuth spec shows that tokens can be put into URL components (they use the query component as an example), but the spec discourages it:
Because of the security weaknesses associated with the URI method (see Section 5), including the high likelihood that the URL containing the access token will be logged, it SHOULD NOT be used unless it is impossible to transport the access token in the "Authorization" request header field or the HTTP request entity-body. Resource servers MAY support this method.
This method is included to document current use; its use is not recommended, due to its security deficiencies (see Section 5) and also because it uses a reserved query parameter name, which is counter to URI namespace best practices, per "Architecture of the World Wide Web, Volume One" [W3C.REC-webarch-20041215].
And here is the Section 5 they refer to:
Don't pass bearer tokens in page URLs: Bearer tokens SHOULD NOT be
passed in page URLs (for example, as query string parameters).
Instead, bearer tokens SHOULD be passed in HTTP message headers or
message bodies for which confidentiality measures are taken.
Browsers, web servers, and other software may not adequately
secure URLs in the browser history, web server logs, and other
data structures. If bearer tokens are passed in page URLs,
attackers might be able to steal them from the history data, logs,
or other unsecured locations.
Given all of this, I would propose that we outright ban credentials in registry URLs, and I'd actually go one step further and ban them from all package URLs.
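As a rough sketch of what such a ban could look like on the client side, here is a check that rejects any URL carrying a userinfo component. The names RegistryURLError and checkedRegistryURL are made up for illustration and are not part of SwiftPM or the proposal. It also relies on Foundation's URLComponents to find the userinfo, so it is only as trustworthy as that parser.

```swift
import Foundation

/// Hypothetical error type, for illustration only.
enum RegistryURLError: Error {
    case unparseable
    case containsCredentials
}

/// Hypothetical helper: returns the URL only if it carries no userinfo.
/// Anything Foundation cannot parse is rejected as well.
func checkedRegistryURL(_ string: String) throws -> URL {
    guard let components = URLComponents(string: string),
          let url = components.url else {
        throw RegistryURLError.unparseable
    }
    // Reject both "user@host" and "user:password@host".
    if components.user != nil || components.password != nil {
        throw RegistryURLError.containsCredentials
    }
    return url
}

for candidate in ["https://github.com/mona/LinkedList",
                  "https://token:x-oauth-basic@github.com/owner/repo.git"] {
    do {
        _ = try checkedRegistryURL(candidate)
        print("accepted: \(candidate)")
    } catch {
        print("rejected: \(candidate) (\(error))")
    }
}
```

Rejecting outright, instead of silently stripping the credentials, means a token pasted into a manifest gets noticed by its author and not quietly sent somewhere.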
2. Underspecified components
The proposal mentions certain conditions for registry URLs, but not all components are specified. Can registry URLs contain query strings?
Note: I had more to say here, but after discovering the "formal specification", my questions have changed. Where does the package name in the GET /{package} actually come from? Is its character set limited? Why do the examples in that document all say things like GET /github.com/mona/HashMap? How does the client know that the package sources are hosted on GitHub? Is github.com part of the package name?
Lots of questions. No time. Give us a proper review duration and I'll write them up.
3. My suggestion
I've spent the last ~8 months diving deep into URLs (it's not all I've been doing, but it's one of the things), and the closer you look at them, the worse they look.
There's a lot to say about the deficiencies of URLs, but luckily I can recommend a couple of videos instead (one short, one long).
- HOW FRCKN' HARD IS IT TO UNDERSTAND A URL?! - uXSS CVE-2018-6128 (15 mins). Explains a universal cross-site scripting bug that affected WebKit on iOS. I haven't looked into the bug in detail, but either there were different URL parsers in the OS which saw different hosts for the same URL, or there was an idempotency bug (parsing -> serialising -> parsing the URL changed its meaning).
- A new era of SSRF (47 mins). A now famous talk by Orange Tsai about how to exploit quirks in different URL parsers. It includes this slide, demonstrating how each of the common parsers used by Python sees a different host from the same URL. It's so beautiful I'm thinking about getting a poster made of it:
FWIW, we have similar issues...
import Foundation
let url = URL(string: "https://test1@test2@test3/")!
print(url.host) // Optional("test2@test3")
print(url.standardized.host) // Optional("test2@test3")
Safari and most other browsers will consider the host to be test3. The older URL spec that Foundation's URL type follows left this case ambiguous. Newer URL standards have tightened their definitions, so Foundation has been left with nonstandard behaviour that could open the door for misunderstandings and exploits. Presumably SwiftPM would use Foundation's URL support and inherit its quirks.
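To make the browser behaviour concrete: as far as I understand the WHATWG URL standard, the last "@" in the authority is the one that ends the userinfo, which is why browsers land on test3. Here is a minimal sketch of just that rule. The function name splitAuthority is made up for illustration, and it deliberately ignores ports, IPv6 literals and percent-encoding, so treat it as a demonstration of the rule and not as a URL parser.

```swift
/// Hypothetical helper: splits an authority string at the LAST "@",
/// mirroring the rule browsers apply. Everything before that "@" is
/// treated as userinfo, everything after it as the host (and port).
/// Simplified: no port handling, no IPv6 literals, no percent-encoding.
func splitAuthority(_ authority: String) -> (userinfo: String?, hostAndPort: String) {
    guard let at = authority.lastIndex(of: "@") else {
        // No "@" at all: the whole authority is the host.
        return (nil, authority)
    }
    let userinfo = String(authority[..<at])
    let hostAndPort = String(authority[authority.index(after: at)...])
    return (userinfo, hostAndPort)
}

let parts = splitAuthority("test1@test2@test3")
print(parts.hostAndPort)     // test3
print(parts.userinfo ?? "")  // test1@test2
```

A parser that splits on the first "@" instead would report test2@test3 as the host, which is exactly the kind of disagreement the talks above exploit.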
And then you consider lower levels - I've heard that Foundation's networking is built on cURL, but how does cURL parse URLs? Does it always agree with Foundation? Again, see Orange Tsai's talk about the cURL maintainers' approach to this. There are plenty of times when your URL library says the host is x, but then you make the request and it goes out to some other host y.
There have been so many attempts to standardise URLs over the decades, and none of them have really worked. Applications diverged and added special behaviours (or had bugs), which people relied upon, which caused them to spread, which defeated the standardisation effort. The WHATWG has had to reduce its ambitions with the latest spec: it's now a living document (things can change at any time as new quirks are discovered), and one of the main goals now is just to document reality, not to enforce best practices. In fact, this is literally a quote from the latest standard:
The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms.
So if we are going to build something better than our current solution - more secure, more robust - we should take another look at whether we even need to use URLs at all. Let's consider the information packed inside a URL, and whether we actually need it:
| Component | Needed? |
|---|---|
| Scheme | always https |
| Credentials | bad idea, officially deprecated |
| Hostname | yes |
| Port | always 443? |
| Path | maybe? I assume the package name comes from here somehow |
| Query | I assume they are not supported |
| Fragment | doesn't even get sent to the server |
So out of everything URLs can do, and all the quirks they have to support, we basically only need 2 things: the hostname of the registry, and a package name (and a port if we want to support nonstandard ports).
Just ask for that data as 2 parameters, keep it separate, and you'll avoid all of those funky URL issues and their associated security vulnerabilities. There are no lower levels (like cURL) which can misinterpret which part is the host, and since we're now explicitly taking a package name instead of a path component (or wherever else that package name comes from), we can add semantic meaning as required (e.g. case insensitivity):
.package(name: "mona/LinkedList", registry: "GitHub.com", from: "1.1.0")
                |
                This is an opaque string, not a 'path'
It means that instead of automagically "upgrading" normal dependencies to use the registry, users would have to change their package manifests, but it would give us a much more robust system than we have now or the one proposed.
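To give an idea of how little is needed once the host and the package name are kept separate, here is a minimal sketch of a value type for that pair. Everything in it is an assumption on my part: the type name RegistryPackageReference, the allowed hostname characters, the default port, and the choice to compare hosts case-insensitively while leaving the name untouched. None of it comes from the proposal.

```swift
/// Hypothetical value type, for illustration only.
struct RegistryPackageReference: Hashable {
    let host: String   // registry hostname, stored lowercased
    let port: Int      // assumed default: 443
    let name: String   // opaque package name, never parsed as a path

    init?(name: String, registry: String, port: Int = 443) {
        // Assumed rule: hostnames are ASCII letters, digits, "-" and "." only,
        // so "@", ":", "/", "?" and "#" can never be smuggled in.
        let allowed = Set("abcdefghijklmnopqrstuvwxyz0123456789-.")
        let host = registry.lowercased()
        guard !host.isEmpty, host.allSatisfy({ allowed.contains($0) }) else { return nil }
        guard !name.isEmpty, (1...65535).contains(port) else { return nil }
        self.host = host
        self.port = port
        self.name = name
    }
}

let a = RegistryPackageReference(name: "mona/LinkedList", registry: "GitHub.com")
let b = RegistryPackageReference(name: "mona/LinkedList", registry: "github.com")
print(a == b)  // true
print(RegistryPackageReference(name: "mona/LinkedList", registry: "test2@test3") == nil)  // true
```

Because the host is validated on its own, there is nothing left for a lower layer to reinterpret: a string like test2@test3 is simply not a host, and the package name is never handed to a URL parser at all.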