Can we talk about the API surface?

dabrahams · March 21, 2020, 6:48pm

I'm very happy to see the ArgumentParser library; it's much-needed and is going to make coding in Swift much better for everybody. The aggressive use of property wrappers and mirrors to avoid repeating needless boilerplate is perfect for this use case.

That said, I can't get over some of the design choices on the API surface, primarily naming, which I think could be improved a lot, so I wanted to bring up some of the issues here. Starting with the introductory example,

import ArgumentParser

struct Repeat: ParsableCommand {

The name ParsableCommand implies that an instance of this thing is a command that can be parsed, but that's not the case at all. Digging in the documentation for a case that actually uses an instance method, I find parseOrExit(). That makes the type, if anything, a parser, rather than a thing that is parseable. And you can see the harm of thinking of it as a command in Repeat.main(), which reads like it's going to… repeat “main.”

Now, in programming there are many uses for components that “parse commands,” for which this library would be wholly inappropriate: an interpreted shell such as ZSH or a handler for a network protocol such as AMP or IMAP are examples that spring to mind. Also there is a large category of things called “commands” that have no overlap with what this is doing. No, this is a parser for a particular category of command syntax, typically known as a “command line.” Those four characters, “Line,” cost little and make the domain quite clear. This leads me to the much more straightforward-sounding name CommandLineParser. And, what do you know, the standard library already has a pseudosubmodule enum called CommandLine for facilities related to command lines. So not only is the use of “command line” precedented, but it even suggests a home for some of these components, in extensions to CommandLine.

Now, I fully appreciate the idea that something is being declared, here, and we might want to emphasize its declarative nature in the naming. Personally I don't think we lose much by saying we're “declaring a command line parser,” but if you don't like that, clearly we're declaring a “command line syntax.” AFAICT there's no sense in which we're declaring a thing that is to be parsed, if we're not “parsing the syntax.”

    @Flag(help: "Include a counter with each repetition.")
    var includeCounter: Bool

    @Option(name: .shortAndLong, help: "The number of times to repeat 'phrase'.")
    var count: Int?

    @Argument(help: "The phrase to repeat.")
    var phrase: String

As a first-time reader of this code, the differences between a @Flag, an @Option, and an @Argument weren't obvious to me. These terms are all commonly used interchangeably to mean the same thing: the general category of tokens that go on a command line after the first one, which typically names an executable. Given that this library is ArgumentParser, is there something special about the @Argument, such that it gets “parsed” and the others don't? I promise you if I work on a command line tool of any substance, by the graces of this good library I'll spend a very small percentage of my effort actually writing the command-line specification, and by the time I have to come back and add new… options(?)… I'll have forgotten the differences.

It turns out that (I think):

@Flag is a labeled argument with no parameters
@Option is an argument pair consisting of a label and a parameter
@Argument is a positional argument, with no label.

[I'm using “label” here in a way analogous to Swift's “argument label” distinction, which is followed by a colon; on the command line labels are indicated by a leading dash or pair of dashes.]
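To make the three-way distinction concrete, here is a minimal sketch in plain Swift that sorts raw command-line strings into the three kinds listed above. The `Token` enum, the `classify` function, and the `labelsTakingParameter` parameter are hypothetical names invented for this illustration; they are not part of ArgumentParser, and the sketch ignores details such as `=`-joined values and short-flag grouping.

```swift
// Hypothetical model of the three kinds of command-line input discussed above.
// None of these names come from ArgumentParser.
enum Token: Equatable {
    case flag(label: String)                       // label only, e.g. --include-counter
    case option(label: String, parameter: String)  // label plus a value, e.g. --count 3
    case positional(String)                        // bare value, e.g. hello
}

/// Classifies raw arguments, given the set of labels known to take a parameter.
func classify(_ arguments: [String], labelsTakingParameter: Set<String>) -> [Token] {
    var tokens: [Token] = []
    var index = 0
    while index < arguments.count {
        let argument = arguments[index]
        if argument.hasPrefix("-") {
            // Strip the leading dash or pair of dashes to get the label.
            let label = String(argument.drop(while: { $0 == "-" }))
            if labelsTakingParameter.contains(label), index + 1 < arguments.count {
                tokens.append(.option(label: label, parameter: arguments[index + 1]))
                index += 1  // skip the parameter that was just consumed
            } else {
                tokens.append(.flag(label: label))
            }
        } else {
            tokens.append(.positional(argument))
        }
        index += 1
    }
    return tokens
}

let tokens = classify(["--include-counter", "--count", "3", "hello"],
                      labelsTakingParameter: ["count"])
```

Note that the classifier cannot tell a flag from an option by looking at the token alone; it needs the declaration (here, `labelsTakingParameter`) to know whether the next string belongs to the label.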

Most sources I can find agree that these terms are—and will always be—ill defined, though the one precedent I can find that attempts to draw a distinction calls @Option a “flag” and @Flag a “switch.” Surely we can do better just by picking terms that are more explicit, e.g.

@Singular is a labeled argument with no parameters
@Parameterized is an argument pair consisting of a label and a parameter
@Positional is a positional argument, with no label.

? These of course are not perfect, but at least they give a hint as to the differences. I'm not at all attached to these, have had a few other ideas, and hope that together we can come up with even better ones, so please take them as a starting point rather than a straw man.

    func run() throws {
        let repeatCount = count ?? .max

        for i in 1...repeatCount {
            if includeCounter {
                print("\(i): \(phrase)")
            } else {
                print(phrase)
            }
        }
    }
}

Repeat.main()

A few things about this:

Presumably run doesn't need to be labeled throws?
I am unconvinced that there's a big win in gaining unqualified access to the parsed members at the point where the command is actually being executed. If the command structure is complicated enough that we are accessing the value of many parsed parameters, the program structure is probably also complicated, and at that point I imagine qualifying access (e.g. args.repeatCount, args.includeCounter) is actually of great readability benefit. If there are only a few accesses to the parameters, qualifying access does little harm.
Asking people to satisfy a requirement (run()) and then invoke a wrapper (main) around something that does a simple check and calls the requirement seems like a frivolous bit of complexity for examples like this, and makes it easy to forget to call main. Was something like this considered instead?
```
CommandSyntax.parseAndRun { args in
    let repeatCount = args.count ?? .max

    for i in 1...repeatCount {
        if args.includeCounter {
            print("\(i): \(args.phrase)")
        } else {
            print(args.phrase)
        }
    }
}
```
I think I understand the motivation for run when it comes to complicated systems with multiple subcommands, though I haven't explored alternatives and there may yet be a better approach.
Calling the thing I am supposed to invoke “main” seems like an unnecessary reference back to C from a Swift world where users don't write main() functions, and it's not particularly descriptive of what's happening. What's the rationale?
I think run() is probably what is motivating calling these things “commands” instead of “parsers,” but to me it seems to cause more trouble for the design than the benefit it brings.

Thanks for listening,
Dave

nnnnnnnn · March 22, 2020, 5:47am

Thanks for your comments here, Dave — we don’t consider the API naming final at this point, so it’s very helpful to hear this feedback!

The intent behind the ParsableCommand name is that Repeat is a command that can be parsed from the command-line arguments, like you might parse an integer from a string. All of the actual parser machinery is internal to the library at this point. There are a handful of static methods (including parseOrExit) that create a ParsableCommand instance by parsing an array of strings that you provide, or by parsing CommandLine.arguments when you don’t pass an array.
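The "parse an integer from a string" analogy can be sketched in plain Swift. `RepeatSketch`, its `ParseError`, and its hand-written `parse(_:)` are hypothetical stand-ins for illustration only; the library's real parsing machinery is internal and works differently.

```swift
// Hypothetical stand-in for the idea; not the library's actual protocol or parser.
struct RepeatSketch: Equatable {
    var count: Int?
    var phrase: String

    struct ParseError: Error {}

    /// Builds an instance from raw arguments of the form: [--count N] PHRASE
    static func parse(_ arguments: [String]) throws -> RepeatSketch {
        var count: Int? = nil
        var rest = arguments[...]
        if rest.first == "--count" {
            rest = rest.dropFirst()
            guard let text = rest.first, let value = Int(text) else { throw ParseError() }
            count = value
            rest = rest.dropFirst()
        }
        // Exactly one positional value must remain.
        guard rest.count == 1, let phrase = rest.first else { throw ParseError() }
        return RepeatSketch(count: count, phrase: phrase)
    }
}

let number = Int("42")                                            // a string parsed into an Int
let command = try? RepeatSketch.parse(["--count", "2", "hello"])  // strings parsed into a command value
```

In both cases the result of parsing is a plain value, and invalid input produces a failure rather than a partially filled instance.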

We definitely found the same ill-definedness in our own research. There were a variety of names considered for these concepts before release; we felt the current ones are both succinct and clear once you understand what each represents. Since there isn’t an industry-wide consensus around these names, we’ve tried to define a set that works well as a group and is easy to work with.

The one that I personally go back and forth on is @Argument, which is certainly a bit overloaded. Since the library is called “ArgumentParser,” it stands to reason that everything it parses is an argument. At some point that property wrapper was named @PositionalArgument, and was shortened to match the other two names more closely. Perhaps it would be better to reverse that decision, though I still don’t like the mismatch that creates with @Option and @Flag.

In any case, hopefully any users that don’t readily understand the current set will turn to the library’s documentation, which we’re always working to improve. (Your input is of course welcome there as well!)

About your last several points, I’d ask you to delve further into the way the commands and subcommands are declared when using the library. Making it straightforward to build nested commands by just defining types is a key part of the design and is behind a lot of the decisions you’re asking about here.

Looking at the unqualified member access question and the closure-based parseAndRun suggestion — having run() be an instance method of a command means that the author only needs to reason with the actual properties that are relevant to that command. By making even the simplest commands adhere to this model, things don’t get more complex as you move from a single command to having a nested tree of subcommands.

Note in particular that main() isn’t necessarily performing a trivial check, and isn’t necessarily calling run() on the same type. The library does the work of figuring out which command or subcommand to run and guaranteeing that the command-line tool’s user has supplied all the correct inputs.
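As a rough illustration of why `main()` may end up calling `run()` on a different type, here is a minimal sketch of subcommand dispatch in plain Swift. The `SketchCommand` protocol and the `Math` and `Add` types are hypothetical and much simpler than the library's real protocols; in particular, this version returns a string instead of printing or exiting, and does no input validation.

```swift
// Hypothetical protocol and types for illustration; these are not ArgumentParser's.
protocol SketchCommand {
    static var name: String { get }
    static var subcommands: [SketchCommand.Type] { get }
    init(arguments: [String])
    func run() -> String
}

extension SketchCommand {
    // By default a command has no subcommands.
    static var subcommands: [SketchCommand.Type] { [] }

    /// Walks down the subcommand tree, then runs whichever command matched.
    static func main(_ arguments: [String]) -> String {
        if let first = arguments.first,
           let match = subcommands.first(where: { $0.name == first }) {
            return match.main(Array(arguments.dropFirst()))
        }
        return Self(arguments: arguments).run()
    }
}

struct Add: SketchCommand {
    static let name = "add"
    let values: [Int]
    init(arguments: [String]) { values = arguments.compactMap { Int($0) } }
    func run() -> String { "sum: \(values.reduce(0, +))" }
}

struct Math: SketchCommand {
    static let name = "math"
    static var subcommands: [SketchCommand.Type] { [Add.self] }
    init(arguments: [String]) {}
    func run() -> String { "usage: math add <numbers>" }
}

// The call goes through Math, but the run() that executes belongs to Add.
let output = Math.main(["add", "1", "2", "3"])
```

Each command's `run()` only sees its own properties, which is the point made above about not needing to switch or cast on the parse result.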

scanon · March 22, 2020, 2:01pm

One consideration for me is that @Argument, @Option and @Flag all seem more closely tied to argument parsing than @Singular, @Parameterized and @Positional, which are all extremely generic terms.

Nothing about @Singular makes me think it has anything to do with argument parsing; I can think of lots of other purposes that someone might want to overload @Parameterized for, which means that I expect people would have to use the qualified names more often with the alternative proposal here than with Nate's existing names. That's not a deal breaker, but it's worth considering (I personally kind of like the clarity of @ArgumentParser.Xxxxx, but I'm more tolerant of explicit naming than some people are).

jberry · March 22, 2020, 2:23pm

I don’t love that the names @Argument, @Option, and @Flag are already so generic. I rather like the idea of making them a bit longer and more explicit. Calling them all arguments has an appeal, especially as this is the argument parser, after all!

Given that, I like @PositionalArgument and @BooleanArgument. It’s less clear what a replacement for @Option would be to fit in... perhaps just plain @Argument would serve, however, as the down to earth workhorse of the bunch.

anandabits · March 22, 2020, 2:54pm

Agree. While proposed names aren't perfect I prefer them to the suggested change, which I find even more foreign. I understand the intent of the suggestion but I think it makes more sense to stick with terminology that is used colloquially in the command line domain.

I agree with Dave that prioritizing concision over explicitness will make it harder for infrequent users to remember which is which and creates some cognitive friction in reading code when you're not immediately familiar with the distinctions. Option-click in Xcode is ready to remind you, but readers are not always using Xcode.

Agree, I think the primary point of potential confusion is between @Argument and @Option. @Flag seems more clearly established (in my mind anyway). What if we had @Argument and @LabeledArgument? I think this is much clearer than @PositionalArgument and @Option, especially in Swift where we are very familiar with "arguments" and "labeled arguments". It's not as concise, but is immediately clear.

I think trying to fit all of the attributes into single word names is great if it works out, but should not be prioritized above clarity.

BigSur · March 22, 2020, 3:51pm

@argument, @option, and @flag are well designed and easy to remember: single words, but concise and expressive. No need to change.

dabrahams · March 22, 2020, 11:23pm

Yeah, it's not like that intention was lost on me. But everything one puts inside Repeat, aside from the run() method, is about how the command line should be formed by, and described to, users. I can't imagine, having already gathered so much code to describe the command syntax in this one type, wanting to keep any substantial command logic here. Separation of concerns, and all that.

That said, I see two ways to look at choosing this name:

The library is about parsing command line syntax and dispatching to one or more handlers, so the thing called Repeat is basically a grammar for that syntax. In that sense, it can be parsed, but there's really no point in saying a grammar can be parsed, so I'd be considering names like CommandLine.Syntax if I was taking this angle (and I'd be renaming run() somehow appropriately, maybe onParsed(), dispatch(), recognized()…)
The thing we're defining really is a “command:” it's a perfect example of the command pattern. But then, is it parseable? When viewed as a command, the parsing has already happened. If you want to introduce words that explain the declarations that end up forming the bulk of the code, CommandLine.Command seems like a great way to do that.

That's true because of the command line context. The notion of a “command line argument” is well-established and well-defined, and these have already been determined for us by the OS by the time our application starts running, and yes, they include the things (flags/switches/options/whatever) that start with hyphens. That's why I am fully in agreement with your discomfort about the use of @Argument.

As you may know, I strongly believe comprehensibility is more important than aesthetic concerns. In fact, IMO there's only a point in striving for consistency because sometimes it serves comprehensibility—and this isn't one of those cases.

That said, I am still hopeful we can reach a solution that combines consistency with comprehensibility. How bad would it be if all the names were clear and a bit longer? It's not like this is a DSL for a domain like regular expressions, linear algebra, or user interface construction, where information density is important because we need to be able to easily discern relationships among the elements. These things basically don't interact with one another, and the property wrapper name will typically get a whole line unto itself:

@StandaloneFlag
@ParameterFlag
@PositionalParameter

I used “Parameter” here because the things being declared are not "arguments:" the arguments are the things passed, and parsed, on the command-line. This may seem like a trivial difference, in part because many people don't know the difference between parameters and arguments, but Swift has already made conscious terminology decisions on the basis of this distinction: in function signatures we have “argument labels” and “parameter names.”
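For readers less familiar with the distinction, a small plain-Swift example of the terminology follows. The function `repeated(_:times:)` is made up for illustration.

```swift
// "times" is the argument label (written at the call site);
// "count" is the parameter name (used inside the function body).
func repeated(_ phrase: String, times count: Int) -> [String] {
    Array(repeating: phrase, count: count)
}

// "hello" and 2 are the arguments: the values actually passed for the parameters.
let lines = repeated("hello", times: 2)
```

By the same reasoning, a property declared in a command type plays the role of a parameter, and the strings a user types on the command line are the arguments.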

I'd rather not give up that easily. Being 100% honest, by the time I started writing this post I had already forgotten the difference between @Option and @Flag, and I was just using these names yesterday.

Thanks, I will. Now that I have some name ideas I'm comfortable with, I'd like to see how some of the more sophisticated examples look when using them. Are there specific examples you'd suggest for me?

glessard · March 23, 2020, 7:16pm

The current @Argument can be defaulted, therefore cannot be called required.

glessard · March 23, 2020, 7:19pm

I agree completely with this. I have chosen to use parseOrExit() and then pass the result to the function that runs the tool. That feels much more correct to me -- for some definition of correct that I happen to like.

Doug · March 28, 2020, 3:55pm

Could @Option be called something along the lines of @LabelledArgument? That would then make the difference between the first two more obvious.

@Argument
@LabelledArgument
@Flag

Edit: @NamedArgument might make sense given the name: parameter in its initialiser.

dabrahams · April 4, 2020, 7:09pm

@nnnnnnnn Gently bumping this thread. Everybody's busy and the world is upside-down, but I think the questions and ideas I've raised here are worthy of a more substantial exchange. If you disagree, let me know and I won't press further.

deggert · April 5, 2020, 9:36pm

Great to have this discussion.

I agree. But the problem is that Repeat is two things: It’s the grammar, but it’s also the result of parsing. CommandLine.Command seems like a good name then, while CommandLine.Syntax doesn’t seem like a good fit.

The original thinking was:

anything that gets passed on the command line is an argument
something that starts with -- is an option

With that we had @Option and @PositionalArgument, and the latter was (unfortunately?) shortened to @Argument where it should probably have been shortened to @Positional? @Flag is certainly odd here (although I didn’t think about it at the time).

Question: Should the property wrapper be a noun or an adjective? If we can make them adjectives, we can leave off Parameter and Flag etc. -- all of which are always going to be odd.

I think that “Parameter” and “Argument” are still very close and e.g. @PositionalValue might be easier to understand, although I’m not happy with that wording either.

And I find both @StandaloneFlag and @ParameterFlag to be just as confusing as the current names. The properties that you would use this on are not flags.

All of this comes from the duality: All of these are both descriptions of how to parse and the properties that will hold the parsed values.

As others have said: It’s great to get this discussion going, and there’s certainly room for improvement.

nnnnnnnn · April 6, 2020, 6:54pm

@dabrahams Happy to continue discussions!

As it stands right now, I'm not convinced that the proposed alternatives are an improvement over the ones currently in the library. This is particularly the case since we're still exploring what the exact API surface area needs to be, with the potential for both adding to and removing from the group of property wrappers that we've been discussing.

I'd missed this question before, my apologies! When it comes to the points in your first post, I'd suggest looking at examples or adoptions that work with multiple subcommands, as that informs the design of the Parsable... protocols. Here are a few suggestions:

Commands and Subcommands from the documentation, plus the math command-line tool it describes.
The swift-package command from the adoption of ArgumentParser in SwiftPM (full PR here).
The swift-format tool, which has migrated to a new command-based syntax while keeping its old syntax available as a legacy command.

In all of these, note that the tool's code doesn't need to perform any switching or casting of the result of parsing the command-line arguments. The library handles all of that, so writing each part of a deeply-nested command is as straightforward as writing a simple command like Repeat.

BigSur · April 6, 2020, 10:10pm

I like the current naming convention.
A command-line app is invoked as command args... -opts value -flag, which is the common usage pattern for a CLI.
@argument for value only, @option for key and value, @flag for key only: these describe those scenarios well.
Stay using them @aof .