Pitch [stdlib]: Command Line Argument Parsing

That's true. And furthermore, the big selling point for most people is the compiler integration that autogenerates it for most cases.

Even without that, Codable in its pure form is about "visiting" the properties of a type and being able to work with them. As pointed out, the specific decoders are separate concepts.

A long way of saying that for me Codable (without the encoders/decoders) feels more like a language feature. Then you have the encoders/decoders as libraries (in Foundation in this case). This and other pitches don't seem like that at all, as is shown by using "batteries included".

Don't get me wrong, I'm on board with making things easier and getting more batteries. I just think that if we just accept specific things in the stdlib we don't really solve the problem. We should get to a point where SPM + an index is easy enough to work with that nobody really wants to pollute the stdlib with more stuff.

3 Likes

For the purpose of this discussion, this is a distinction without a difference. These packages would still be constrained by the standard library evolution process, limited by its backwards compatibility requirements, unable to be versioned separately, and will strangle out substantial amounts of alternative development effort. The fact that they are not part of the default namespace (the only real difference here) does not really address any of the core concerns raised above.

Discoverability never solves this problem. There is always going to be a subset of people who want the standard library to grow to include more features. I say this from experience in the Python community, where discoverability is better but people still regularly propose to expand the standard library.

Discoverability does reduce the frequency of this issue, however, and is a worthy effort to consider. Again, though, it takes many forms. A motivated community member could begin maintaining a community-supported list or equivalent system that could help, perhaps beginning by scraping GitHub for Package.swift files.

As the comments below have demonstrated, Codable is a bad example here because it is agnostic to the JSON implementation. This means it encourages alternative implementations: if you don't like Foundation's JSONEncoder then you can go ahead and write your own, and the entire Swift ecosystem will keep working. That's great, it's really powerful! But it's not comparable to this proposal.
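To make the "agnostic to the implementation" point concrete, here is a minimal sketch. The `Package` type and its fields are hypothetical, invented only for illustration; the encoders and decoders are the ones Foundation ships. The type itself never mentions JSON or property lists, so a third-party encoder could be swapped in the same way.

```swift
import Foundation

// Hypothetical model type; the compiler synthesizes the Codable conformance.
struct Package: Codable, Equatable {
    var name: String
    var stars: Int
}

let package = Package(name: "swift-nio", stars: 1)

// The same conformance is driven by two unrelated encoders.
let jsonData = try JSONEncoder().encode(package)
let plistData = try PropertyListEncoder().encode(package)

// Each format round-trips back to an equal value.
let fromJSON = try JSONDecoder().decode(Package.self, from: jsonData)
let fromPlist = try PropertyListDecoder().decode(Package.self, from: plistData)
print(fromJSON == package, fromPlist == package)
```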

IMO, the Swift standard library is at its best when it provides "currency" data types and protocols. These are things that provide the boundary between different components, ensuring that different frameworks and libraries can be glued together without worrying too much about expressing the transformation. To me, command line parsing does not achieve this.

3 Likes

Well, most of these concerns are really non-issues in a lot of cases. I'd rather my SHA1, compression, JSON, ... libraries moved slowly and didn't break things. And I don't find competition in the space of e.g. SHA1 digests to be meaningful. I'd rather we had collaboration than competition. The only controversial entry here is probably HTTP networking, which is also Python's biggest failure. But Go doesn't seem to have this problem, so it seems to be more of a problem with how Python was managed than with the model itself.

In turn, your answer doesn't address any of the issues that I raised, namely our software dependency problem [1]. What happens when libraries disappear, when sole maintainers get bored or move on, when malware is added? (All of those have happened in the NPM world.) More generally, you are asking production apps and services to rely on and trust the work of tens (or hundreds) of unpaid volunteers.

[1] research!rsc: Our Software Dependency Problem

1 Like

That's nice. Many do. Crypto libraries are not all born equal: quite aside from the risks of attack, there are substantial and meaningful performance differences between crypto libraries that mean that something perfectly suitable in one context would be unusably slow in another.

This happens to large standard libraries too. Some of the corners of those libraries grow musty and old over time as developers get bored and move on, and while those corners are technically part of the larger whole, if no remaining developer feels comfortable in that codebase then that code rots just as surely as the abandoned open source thing. The difference is only that the rot is slower, and less noticeable, because you cannot trivially see from a git history that the subproject is essentially abandoned.

I am doing no such thing. Your characterisation of my position is wrong on several fronts.

Firstly, I do not believe open source maintainers should be working for free. I have been involved, for years, in efforts to monetise open source software development for maintainers, and my opinions on the matter are public and well documented. I consider it a profound and embarrassing failure of our industry that we have failed to compensate open source maintainers appropriately.

Secondly, I am not asking that production apps and services rely on or trust the work of unpaid volunteers. It is not mandatory that a production app use a library: you can write your own. If you would like to depend on something that has only got paid maintainers, there is a very simple solution: hire some developers and pay them to write it for you.

Thirdly, it is disingenuous to claim that if something is in the standard library it is not supported by unpaid volunteers. The Swift open source project already contains substantial amounts of work done by unpaid volunteers. As Swift grows and has increasingly large numbers of non-Apple contributors, unless we can resolve the "paying for OSS contribution" problem you will inevitably have volunteer contributors in that mass. Providing more work for the stdlib team to do does not resolve this problem.

You are right that I have not addressed the dependency problem, but putting things into the standard library doesn't address it either. All it does is change the tradeoffs.

4 Likes

Some work has already been done in this area, though very limited. A REST API built on Vapor exists in the vapor-community/PackageCatalogAPI repository on GitHub (a replacement for the IBM Swift Package Catalog), and is hosted at https://package.vapor.cloud. An example for getting the package information looks like this: https://package.vapor.cloud/packages/apple/swift-nio. This API also supports package searching. A front-end for the API is probably the next step in the process.

2 Likes

How many though? If we survey Python code on GitHub, what percentage do you think would be using something other than the builtin hashlib to generate hashes? More than 0.1%? My guess would be much lower, but I would be interested to hear your perspective.

Well, we now have the hindsight and can learn from Python's mistakes. For example Go (granted a much younger language at 9 years old) doesn't have these problems. And even Python managed to clean up its library considerably recently (compared to the v2 situation).

I'm sorry, I didn't mean to imply that you are some kind of cartoon villain. I'm saying that the free market approach will lead to this, even if this is not the intention. Obviously this is just my opinion, I don't claim to have prescience.

So to develop a game or a TODO app I need to write my own compression and networking libraries? Or hire a couple developers to do it instead? This probably works great if you are Apple or Google, but it is not really sustainable if you are an indie developer or a small company. But the reality is even worse as it turns out even big companies do that, here is for example what the Nest app is using: Analysis of the Nest app for iOS

Sure, but since I'm using Swift, I'm already trusting that any code I download from swift.org is (to a reasonable degree) reliable, performant and trustworthy. There are a lot of resources put in testing, QA, etc. I'd rather not expand this trust to 10 more solo volunteers that are working on 3rd party libraries part time.

My view is that it solves it to a very reasonable degree. Let's try an example. Let's say I want to use gzip compression in my app. What options do I have? These are the top 5 results I got:

Meanwhile Python and Go have reasonably maintained implementations in their standard libraries:

Which situation do you find preferable, Swift's or Python/Go?

3 Likes

This is not an example that works too well for you because it's exactly my point. The vast majority of users use hashlib, even in circumstances where they should be using something better. For example, most web frameworks in Python use PBKDF2-based password hashing because that's what hashlib has, even though scrypt or Argon2 would be better choices.

The standard library acts as a dragging force in this way.

Go is a great example here, because it has a large and very good standard library. However, the reason for this is just straightforwardly the result of investment. Go's core standard library has been heavily invested in, and as a result has a lot of great quality implementations in it.

However, I am profoundly uncomfortable with assuming that that investment will continue if we keep adding things to the Swift standard library. Each extra thing in the stdlib is a requirement for extra ongoing maintenance investment, and if that investment is not made then there are ongoing harms associated with it. Building up a wider ecosystem of third-party packages distributes that risk, and makes it easier to supplement.

I don't understand your argument here. My understanding of your argument is that, by and large, third-party packages are of low quality and under maintained, and that is unacceptable for production services. Given that you're arguing in favour of putting things in the standard library, my assumption (and it is only an assumption, please correct it if it's wrong) is that you believe the things in a language standard library are of high quality and maintained.

But I don't think the second case follows from the first. If there is someone who can maintain a stdlib quality implementation, why can't they do that outside the stdlib? What is magic about the standard library that does not apply to third party packages?

I think this is the fundamental source of our disagreement: I don't think the standard library has any special status in terms of being well maintained or high quality by virtue of being the standard library. In the case of what the Swift standard library currently covers, I believe it is both well-maintained and high quality. But there's no guarantee of that continuing.

I think this is the fundamental source of our disagreement. But I want to hit on something for a moment, because it's key. We both believe that the Swift standard library is good software right now. Your response to that appears to be to want to make the standard library cover more things, because that will mean more things have good quality implementations.

I have the opposite belief. I believe writing good software is hard, and the fact that software is in a standard library doesn't make it good. My belief is that the bigger we make the standard library, the more we dilute the focus of the standard library teams and run the risk of adding something bad, and once we add something bad it is very hard to undo that mistake. In a world of stable ABIs this is even harder, as you can only ever roll the ABI forwards, not backwards.

Your view is inherently optimistic: mine is inherently pessimistic.

I think the comparison is unfair, and represents a cherry-picked example, and here's why: no-one has proposed gzip for the Swift standard library.

Until someone does that, we're arguing in the world of highly abstract counterfactuals. It's like me pointing out that Go has a first-class X509 library and Swift doesn't, as well as not having any great third-party X509 libraries. Which of those situations is better? Clearly Go's, as having one good library is better than having zero good libraries.

But let me provide a much better example: asynchronous I/O. Python has asynchronous I/O both in its standard library, in the form of asyncio, and outside it, with a wide range of implementations from Twisted to Trio. Swift has asynchronous I/O outside its standard library in the form of SwiftNIO. Java has both, with NIO and Netty. Which situation would you prefer?

I choose this example deliberately because in my view there is no clear answer. SwiftNIO is not unsupported or fly-by-night. It does not attract less contribution or attention because it's outside the standard library. I would strongly contest the idea that it would be any better than it currently is by being in the standard library: in fact, I'll make the case that it has grown and improved much more quickly by being outside the standard library than inside it.

I don't think this situation is black and white. There are times when things should go into the standard library. But I think the assumption that things are better inside the stdlib than outside is not universally valid.

7 Likes

I just want to add a general +1. I think SwiftNIO is a great example for another reason: it's not part of the stdlib, but its development is directly supported by Apple. That gives the best of both worlds: a team focused on just one library, which has one focus, and the backing of Swift's creator.

Unfortunately, this model doesn't scale, because there are only so many such libraries that require a dedicated development team and that Apple is willing to support. A gzip library is essentially a one-off. Another example might be a port of libsodium to Swift, but I don't know if anyone's working on something like that.

Actually they have. :wink: (and it was fairly recent)

This is why I've proposed a Boost-like library/working group which would work as an umbrella for the various functionalities that would be useful to have a single "blessed" implementation but that don't belong in the standard library.

Things like CLI parsing, compression, crypto, etc could all be contained there. We get the benefits of a blessed library which is easily discoverable and is still subject to an evolution process, thus garnering wide support of the community and reducing the risk of it falling into disrepair (it's still open source and ideally the community at large would be dependent on it which would bolster its development) while also avoiding bloating the standard library and becoming locked into the stdlib development cycle.

2 Likes

Fair enough, allow me to rephrase.

No-one has seriously proposed gzip for the standard library. In this model I would define a “serious proposal” as one where the proposer has volunteered themselves to write and maintain that code.

Proposals of the form “the stdlib should do X” where the proposer does not also volunteer to contribute to doing X are not super serious. It’s very easy to think that things I don’t have to do would be good ideas and should be done!

Sorry, I’m on my phone so am unnecessarily brief. I don’t think proposals without code are bad: getting an idea of community interest is worthwhile. They just also have a low barrier to entry: floating an idea is not the same as committing time and effort to it.

1 Like

Hey! Author of the pitch here and just wanted to say I've been watching the conversation and working on a response. I think there are a LOT of different threads and ideas here to address so I want to be thoughtful (and thankful) regarding everyone's responses.

I also recognize that, in the absence of an implementation or proposed API, it is harder to understand just how much this pitch proposes adding to the standard library, which is affecting the quality of our discussion, so I hope to clarify that as well.

I work a lot on tools and scripts written in Swift and started this proposal because I feel like many people including myself would benefit from these "batteries" being included and that by going through the evolution process that we could collectively discuss the merits of existing solutions and land on something really great for everyone.

Regarding this comment, I originally volunteered to implement and would not mind maintaining this solution but wanted to make sure we were in alignment on where this is going before working on it.

4 Likes

Sure! I apologise if it read as though that comment was a criticism of this proposal: it was intended as a note on the gzip proposal it was in reply to.

I don't want to drag this too much, I think you were spot on about the source of our disagreement (optimistic vs pessimistic).

I just want to say that Python started with a really big standard library, including crazy stuff like "Font Manager interface for SGI workstations". A Swift stdlib could be more conservative (closer to Go than Python) and skip the more controversial stuff like async IO etc where it is more difficult to pick a winner.

On the other hand, part of the success of Go's stdlib in my opinion is the more centralized development of the language. The core team is in charge of everything: language, libraries, tooling etc. With Swift the development is more fragmented, with different groups in charge of different areas. So maybe a standard library effort won't be as successful here indeed.

1 Like

Agreed: I think we've both made our points reasonably clear, and I'm certainly happy to agree to disagree. Fundamentally the direction here will reflect the opinions of the community, and the discussion here will hopefully help a lot of folks crystallise how they feel on this issue.

1 Like

You make a point. It would be well for the community to begin to align on the principles that guide the development of the Swift language, its standard library, and its community. For example, I love how the Python community is guided by a set of common principles like:

There should be one-- and preferably only one --obvious way to do it.

Adding standardized and feature-complete libraries to the standard library provides a default and obvious way to do something. Open Sourcing the Swift language, making it available for Linux development, and targeting server application development means that the language is seeking to address different application environments and UIs. For many (most) Linux applications the CLI is the primary UI, and yet Swift doesn't have a feature-complete natural way to do things.

From my observation, it appears that the Swift language and community are taking a batteries-not-included approach to the compiler and standard library, which is okay - it's one way of doing things. This method has served other languages well (JS takes this approach). This approach makes things easier on the language core-contributors and keeps the code base small; however, it often adds to the developer fatigue of the language's community. Lacking one (or a few) standardized and excellent ways to do things, the community often develops tens or hundreds of ways to do the same routine things, all of them of varying degrees of quality and functionality. That means we waste a lot of people's time on something that a few of us could have done, or that all of us together could have done to a much higher quality.

While I agree with the sentiment "the standard library is where packages go to die" (a mangled quote that I credit to Kenneth Reitz), I prefer the batteries-included approach. Having a culture of integrating the everyday things that significant portions of a community want into the standard library creates a joint code base that the community can grow and develop together (fewer rather than many). There are always third-party tools that might do things better, but the standard library provides a baseline of functionality that gets routine tasks done - especially for things like CLI parsing in a language that is targeting server app development.

If not the standard library, there should be a central coalescing location for discovering high-quality packages that do common things. If these projects are structured and managed for the ease of contribution with shorter development cycles, this "blessed" community listing could help us save wasted time repeating work that has already been done and gets us to higher quality and richer functionality faster.

11 Likes

So I just played with a simple extension on Array:

extension Array where Element == String {
    func parseKeyValue(prefix: String = "--", suffix: String = "", valueSeparator: String = "=", removePrefix: Bool = true, removeSuffix: Bool = true, separateArguments: Bool = true, endIndicator: String = "--", unlabeled: inout [String]) -> Dictionary<String, String>
}

This parses a command line held in an array into a dictionary, where a key is anything that matches the prefix and suffix (an empty string matches anything).

If there is a prefix and there is an argument that has no prefix, it gets put into the "unlabeled" array (this is for e.g. file paths following the options). All parameters to parseKeyValue() are optional. So the common case is:

CommandLine.arguments.parseKeyValue()

Examples of syntax this can parse are:

["--flag", "--key", "value", "--key2=value", "--flag2", "george"]
    -> ["flag": "", "key": "value", "key2": "value", "flag2": "george"]

["file.txt", "--flag", "--foo=bar=baz", "heyooo!"]
    -> ["flag": "", "foo": "bar=baz"] unlabeled: ["file.txt", "heyooo!"]

Note that you can specify valueSeparator = "" to turn off support for having the key and the value in a single argument (like "--key2=value" above), or set separateArguments = false to turn off aggregating a value in a separate argument into the previous key (like "--key", "value" above), making any such value end up in the unlabeled array instead.

If you change the prefix, you can have a single dash for a key, or a slash like on Windows.

The end indicator (by default "--") indicates a string that, if an argument fully matches it, will cause parsing to stop, that argument to be skipped, and all arguments following it will be appended to the "unlabeled" array. That lets us parse loose file names that start with a "--":

["file1.txt", "--", "--dashedfile.txt"]
    -> [:] unlabeled: ["file1.txt", "--dashedfile.txt"]
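For readers who want to experiment, the behaviour described above could be sketched roughly as follows. This is not the poster's implementation (only the signature was shared); it is one possible reading of the description. In particular, splitting on the first occurrence of valueSeparator, checking the prefix and suffix against the key part only, and letting a separate value attach only to a key that had no inline value are assumptions inferred from the examples.

```swift
import Foundation  // for String.range(of:)

extension Array where Element == String {
    func parseKeyValue(prefix: String = "--", suffix: String = "",
                       valueSeparator: String = "=",
                       removePrefix: Bool = true, removeSuffix: Bool = true,
                       separateArguments: Bool = true,
                       endIndicator: String = "--",
                       unlabeled: inout [String]) -> [String: String] {
        var result = [String: String]()
        var pendingKey: String? = nil  // last key still waiting for a separate value
        var index = startIndex
        while index < endIndex {
            let argument = self[index]
            index += 1
            // An exact match of the end indicator stops parsing; the rest is unlabeled.
            if !endIndicator.isEmpty && argument == endIndicator {
                unlabeled.append(contentsOf: self[index...])
                break
            }
            // Split "key=value" at the first separator (assumption).
            var keyPart = argument
            var inlineValue: String? = nil
            if !valueSeparator.isEmpty, let range = argument.range(of: valueSeparator) {
                keyPart = String(argument[..<range.lowerBound])
                inlineValue = String(argument[range.upperBound...])
            }
            let isKey = (prefix.isEmpty || keyPart.hasPrefix(prefix))
                && (suffix.isEmpty || keyPart.hasSuffix(suffix))
            if isKey {
                var key = Substring(keyPart)
                if removePrefix { key = key.dropFirst(prefix.count) }
                if removeSuffix { key = key.dropLast(suffix.count) }
                result[String(key)] = inlineValue ?? ""
                pendingKey = (inlineValue == nil && separateArguments) ? String(key) : nil
            } else if let key = pendingKey {
                // A bare argument becomes the value of the preceding key.
                result[key] = argument
                pendingKey = nil
            } else {
                unlabeled.append(argument)
            }
        }
        return result
    }
}

var loose = [String]()
let options = ["file.txt", "--flag", "--foo=bar=baz", "heyooo!"].parseKeyValue(unlabeled: &loose)
print(options, loose)
```

Under these assumptions the sketch reproduces the three examples given above. Note that with an empty prefix every argument counts as a key, so separate-argument values are never attached in that mode.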

Missing features

The only syntax which I mentioned above that this doesn't cover is combined single-letter flags, like in zip -xvzf myFile.zip *. I'm reluctant to add support for that, as the current feature set seems comparatively consistent (simple string parsing to create a dictionary from otherwise-formatted key-value-pairs). That said, one could use "--" as the prefix, and then -xvzf would be appended to the unlabeled array and one could manually unpack it from that.
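The manual unpacking mentioned here might look something like the sketch below. The helper name expandCombinedFlags is hypothetical, and the rule it applies (a single leading dash followed by at least one character is a flag cluster) is an assumption, not something the pitch specifies.

```swift
// Hypothetical helper: pulls "-xvzf"-style clusters out of the unlabeled
// arguments and returns the individual flag letters plus everything else.
func expandCombinedFlags(in unlabeled: [String]) -> (flags: [Character], rest: [String]) {
    var flags = [Character]()
    var rest = [String]()
    for argument in unlabeled {
        // One dash, not two, and at least one letter after it.
        if argument.hasPrefix("-") && !argument.hasPrefix("--") && argument.count > 1 {
            flags.append(contentsOf: argument.dropFirst())
        } else {
            rest.append(argument)
        }
    }
    return (flags, rest)
}

let (flags, rest) = expandCombinedFlags(in: ["-xvzf", "myFile.zip"])
print(flags, rest)
```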

It also doesn't know anything about your valid keys and values, and therefore the caller is responsible for rejecting dictionary keys it doesn't know, and it will associate any value that doesn't have a prefix with the previous key, even if that key is just a flag ("--warnings-as-errors", "file.txt" would result in ["warnings-as-errors": "file.txt"] unlabeled: [] instead of ["warnings-as-errors": ""] unlabeled: ["file.txt"]).
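As a rough illustration of that caller-side responsibility, the sketch below post-processes a parsed dictionary. The option names and the two sets are hypothetical; the input mirrors the "--warnings-as-errors" example above, plus one made-up unknown key.

```swift
// Hypothetical parse result: the flag has swallowed a file name,
// and "colour" is not an option this tool knows about.
var parsedArguments = ["warnings-as-errors": "file.txt", "colour": "auto"]
var unlabeled = [String]()

let valueOptions: Set<String> = ["output"]             // options that take a value
let flagOptions: Set<String> = ["warnings-as-errors"]  // options that never take one

// Reject keys the tool does not know.
let unknownKeys = parsedArguments.keys
    .filter { !valueOptions.contains($0) && !flagOptions.contains($0) }
    .sorted()

// Hand back any value that a pure flag absorbed by accident.
for flag in flagOptions {
    if let absorbed = parsedArguments[flag], !absorbed.isEmpty {
        unlabeled.append(absorbed)
        parsedArguments[flag] = ""
    }
}

print(unknownKeys, parsedArguments, unlabeled)
```

One limitation of fixing this up after the fact is that the original position of the recovered argument among the other unlabeled ones is lost.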

Anyway, this is just a proposal for a single method that could be added to the array to provide the basics of command line argument parsing (and anyone who needs more can get a library). It would still be your app's responsibility to actually evaluate the dictionary or print out syntax information.

Or alternately, what about a more parameterized approach, like:

protocol RemainderConstructable {
    associatedtype RemainderElement
    init(remainder: inout ArraySlice<RemainderElement>)
    var key: String { get }
}

struct StringRemainderConstructed: RemainderConstructable {
    typealias RemainderElement = String
    
    var key = ""
    var value: String?
    
    init(remainder: inout ArraySlice<RemainderElement>) {
        if let foundKey = remainder.first, foundKey.hasPrefix("--") {
            self.key = foundKey
            remainder = remainder.dropFirst()
        }
        if let value = remainder.first, (self.key.isEmpty || !value.hasPrefix("--")) {
            self.value = value
            remainder = remainder.dropFirst()
        }
    }
}

extension Array {
    func dictionaryFromRemainder<T: RemainderConstructable>(unlabeled: inout [T]) -> [String: T]
        where Element == T.RemainderElement {
        var result = [String: T]()
        var remainder = ArraySlice(self)
        while !remainder.isEmpty {
            let parsed = T(remainder: &remainder)
            if parsed.key.isEmpty {
                unlabeled.append(parsed)
            } else {
                result[parsed.key] = parsed
            }
        }
        return result
    }

    func dictionaryFromRemainder<T: RemainderConstructable>() -> [String: T] where Element == T.RemainderElement {
        var unlabeled = [T]()
        return dictionaryFromRemainder(unlabeled: &unlabeled)
    }
}

var unlabeled2 = [StringRemainderConstructed]()
let parsed: [String: StringRemainderConstructed] = ["file.txt", "--foo", "bar", "--baz", "--boff"].dictionaryFromRemainder(unlabeled: &unlabeled2)
print("parsed = \(parsed) unlabeled = \(unlabeled2)")

The standard library would contain 3 or so structs like StringRemainderConstructed, which all consume one or more elements from the ArraySlice given, but in different ways (each implements a different command line syntax). Then you get back a dictionary of these objects that you can use to read your arguments.
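As a hedged illustration of what a second such struct might look like, here is a sketch for a "--key=value" syntax in which a value is never taken from the following argument. The name EqualsSeparatedConstructed is hypothetical, and the protocol is restated only so the snippet stands alone.

```swift
// Restated from the post above so this snippet compiles on its own.
protocol RemainderConstructable {
    associatedtype RemainderElement
    init(remainder: inout ArraySlice<RemainderElement>)
    var key: String { get }
}

// Hypothetical conformer: consumes exactly one element per instance.
struct EqualsSeparatedConstructed: RemainderConstructable {
    typealias RemainderElement = String

    var key = ""
    var value: String?

    init(remainder: inout ArraySlice<String>) {
        guard let first = remainder.first else { return }
        remainder = remainder.dropFirst()
        if first.hasPrefix("--"), let separator = first.firstIndex(of: "=") {
            // "--foo=bar": key keeps its dashes, as in StringRemainderConstructed.
            key = String(first[..<separator])
            value = String(first[first.index(after: separator)...])
        } else if first.hasPrefix("--") {
            key = first      // a bare flag
        } else {
            value = first    // an unlabeled argument
        }
    }
}

var remainder = ArraySlice(["--foo=bar", "--baz", "file.txt"])
var items = [EqualsSeparatedConstructed]()
while !remainder.isEmpty {
    items.append(EqualsSeparatedConstructed(remainder: &remainder))
}
print(items)
```

Because it satisfies the same protocol, a type like this could be passed through dictionaryFromRemainder in place of StringRemainderConstructed.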

This is still a little clunky, but isn't quite as hard to describe as my previous huge function, and sounds like it might be useful for other uses besides argument parsing.

I wonder if this approach could somehow be used with enums.

Sorry, I didn't follow up all the comments here, as it seems like discussion went to some metaphysical field :slight_smile:

I'm very into this proposal, as I think it is one of the essential parts of Swift becoming a mainstream language for writing command line tools and scripts. An argument parser is not required to be part of the stdlib itself, but it should be part of macOS, like libdispatch/Foundation/etc. Other solutions would add complexity to setup, greatly decreasing the attractiveness of using Swift instead of, for example, Python. The key here is that it doesn't need to be ideal, just good enough and working out of the box; someday we could slooowly migrate to some new awesome version if needed (as with optparse -> argparse).

I've used the argument parser from SPMUtility, and I liked it. With some improvements we could get a friendly and typesafe interface to write compact parsers. I'm also using dmulholl/argparser (an argument-parsing library for Swift, on GitHub): the whole parser fits in one file that depends only on Foundation, and it can be copied into a project if downloading dependencies is not what you want.

I've seen a comment here about argument parsers in other languages not being widely used. I completely disagree: Python's argparse is widely used, with over a thousand occurrences of import argparse in the Chromium repo and, much less but still, about 130 in the Swift repo; Perl's Getopt is used in 190 files in WebKit, plus about 40 uses of argparse.

Of course, there are some other fields where Swift needs to improve its API, but we'd better get started :)

1 Like

Since this thread was originally opened, the SPMUtility package and its argument parser have been pulled out into their own repository, swift-tools-support-core.

I'm not sure the complexity is that much more significant nowadays. Swift Package Manager makes it straightforward to depend on other packages like swift-tools-support-core, and the tooling is even better now that Xcode 11 has built-in support for Swift packages.
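For reference, a manifest that pulls in swift-tools-support-core might look roughly like the sketch below. The package and target name MyTool is hypothetical, and the version requirement and the SwiftToolsSupport-auto product name are assumptions that should be checked against the repository's own Package.swift before use.

```swift
// swift-tools-version:5.1
import PackageDescription

let package = Package(
    name: "MyTool",  // hypothetical tool name
    dependencies: [
        // Version requirement is an assumption; pick one that matches a published tag.
        .package(url: "https://github.com/apple/swift-tools-support-core.git", from: "0.0.1"),
    ],
    targets: [
        // Product name is an assumption; verify it in the dependency's manifest.
        .target(name: "MyTool", dependencies: ["SwiftToolsSupport-auto"]),
    ]
)
```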

Now, one area that's less well-supported where command line argument parsing would be frequently used is users who want to write scripts in Swift, since scripts run through the interpreter don't have easy access to package dependencies. The community has already come up with some pretty clever approaches to address that, like mxcl's swift-sh (written up on NSHipster), and someone could champion that or a solution like it as a language feature to make it official.

I guess a good question to the Swift team (specifically, maintainers of the repository like @Aciid) would be: with swift-tools-support-core now available, do you consider the question "should Swift provide a standard argument parsing library?" to already be answered? And if so, and if the answer to the nested question is "yes", to what degree would it be possible for the design/implementation to evolve and for people to contribute to that?

1 Like