Require parameter names when referencing to functions

flowtoolz · July 19, 2019, 7:21pm

Love it!

I often run into stuff like this:

Would that also get fixed?

Jumhyn · July 19, 2019, 8:04pm

Another reason to support the bare name only for unlabeled arguments is that once argument labels make their way to local variable declarations, it would be pretty annoying to have to write:

let foo(_:_:_:) = bar(arg1:arg2:arg3)

suyashsrijan · July 19, 2019, 8:42pm

I have a PR open to fix this ([Typechecker] Allow initialising a variable with a function with the same name as the variable by theblixguy · Pull Request #23695 · apple/swift · GitHub), just need to address some comments. I’ll try to find some time soon.

Nevin · July 20, 2019, 1:45am

We already have the *option* to provide argument labels when desired.

The ability to omit them is a convenience which people are able to use when they want to, but they are not forced to do so.

Any line of argument related to compilation time can immediately be rebutted with “if the unlabeled version makes an expression slow to compile, then add labels”.

• • •

The matter of:

let foo = foo(x)

Seems a separate issue to me. It should simply work.

karim · July 20, 2019, 1:49am

That’s a poor argument. The design of APIs should not have to consider compilation performance. They should be designed for call-site clarity. If the optimal call-site clarity calls for no labels, then that is the correct API design.

Nevin · July 20, 2019, 2:19am

That is exactly my point. The pitch in this thread would *prohibit* people from writing the best, clearest code in many situations.

The argument for doing so is a weak one, related to the complexity of the compiler.

The counterargument is straightforward: if there is a situation where the compiler is too slow, we already have the ability to work around it in exactly the way the pitch would force us to use everywhere.

The pitch does not gain us anything, it only takes things away.

wadetregaskis · July 21, 2019, 8:40am

It seems a bit odd to allow code to link that would not actually re-compile to the same result. Meaning, if you refer to an unambiguous foo in some library, and that library later adds a second foo (differing only in argument labels), presumably that's binary compatible because the originally referenced foo remains with unchanged mangled name? If so, then new compilation of that exact same client code, against the updated library, should intuitively also work fine (and have the same behaviour; resolve to the same foo as before).

I don't actually care about the source compatibility - in principle it's fine to me that a new version of a library might require some source changes to adopt (no different from enums changing, for example), but if that's true, then logically the library should break ABI compatibility at the same time.

Otherwise, I do like the approach of requiring user intervention only upon actual ambiguity. I don't like the idea of a passive-aggressive compiler, i.e. one that just runs really slowly if it turns out I wrote something unclear to it. That's making a significant developer pain point (compile times) even harder to fix.

If I wrote let fn = count in the presence of func count() and func countWhere(_ condition: ...), nobody would find that ambiguous - I clearly meant the first one. I don't see why it should be any different if the 'where' part happens to be on the other side of the opening parenthesis. If I meant to refer to the other, I should have written its actual name - the compiler is being very nice to me if it let me use the abbreviation count at some point, and that niceness should immediately end as soon as there's any ambiguity (to a human - it's no good the compiler being a genius at divining the only technically valid option if anyone actually reading the code doesn't know what the code's doing). The only question in my mind is whether the compiler should have been so nice to me to begin with.

If there's some other identifier under the name count in scope (e.g. a variable), I'd actually expect the compiler to just error out. It's confusing to a reader to have duplicates like that, and burdening every reader with arbitrary rules on conflict resolution (even if it then makes the compiler's life easy) seems far worse than just requiring unambiguous names.

In the max case, it's not apparent to me why there's two versions in the standard library to begin with - why does the two-argument case need to be specialised over the N-argument one?

And in playing with max briefly, it seems apparent that it's already seriously broken - e.g. this doesn't even compile:

[(3, 5, 7), (2, 6, 1), (1, 0, 2)].map(max)

Yet this does:

[(3, 5), (2, 6), (1, 0)].map(max)

That makes no sense.

toph42 · July 21, 2019, 2:25pm

Wait, what? How?

phoneyDev · July 21, 2019, 3:42pm

Not sure how this is supposed to be affected by default parameters.

Avi · July 21, 2019, 4:00pm

Default parameters only affect call sites, and this pitch is not about call sites.

beccadax · July 21, 2019, 9:23pm

We don't have the option to provide the lack-of-argument-labels when that is desired. For instance, the problem with count(where:) did not concern performance when you actually wanted count(where:); it was about performance when you wanted plain count. The compiler had to consider the possibility that you might have been talking about count(where:), and there was no way to rule out that possibility.

Thinking about it a little, I am somewhat concerned about this. It would be nice if code completions and diagnostics could notice that a candidate matches only in its uncalled state and fill it in with the parameter labels only instead of inserting placeholders for the parameters.

(But keep in mind that point-free style is a mildly advanced feature and users who don't know about it can always write a closure with an explicit call.)

Sure, but map(max(_:_:)) is actually more clear than map(max). The latter could be referring to a different max, like the self.max() and self.max(by:) that would be available in an extension on any Collection type. The former clearly doesn't refer to either of those.

(This is especially important for something like max(_:_:) because it's actually kind of weird that it's a candidate at all.)

Let's be clear here: Omitting the labels doesn't always improve clarity—it improves brevity. Improving brevity can sometimes improve clarity by allowing you to omit unimportant details, but sometimes—arguably more often—it hinders clarity by making the code vague and ambiguous, making it impossible to be completely certain what it means without actually type-checking it. And we already encourage making the names and labels brief, so you're not talking about adding much extra text.

I invite you to consider an example that's very similar to Nevin's:

a.map(String.init)

Do you know which initializer this calls? It ends up depending on the type of a.Element, and in many cases it's ambiguous and already disallowed by the compiler. Here's the list generated by referencing String.init in a file that imports Foundation (I've deleted entries with Builtin types):

init()
init(_ c: Character)
init(_ cocoaString: NSString)
init(_ content: Substring.UnicodeScalarView)
init(_ scalar: Unicode.Scalar)
init(_ substring: __shared Substring)
init(_ unicodeScalars: String.UnicodeScalarView)
init(_ utf16: String.UTF16View)
init(_ utf8: String.UTF8View)
init(_cocoaString: AnyObject)
init(_sel: Selector)
init(cString: UnsafePointer<CChar>)
init(cString: UnsafePointer<UInt8>)
init(contentsOf url: __shared URL) throws
init(contentsOf url: __shared URL, encoding enc: String.Encoding) throws
init(contentsOfFile path: __shared String) throws
init(contentsOfFile path: __shared String, encoding enc: String.Encoding) throws
init(extendedGraphemeClusterLiteral value: Self.StringLiteralType)
init(format: __shared String, arguments: __shared [CVarArg])
init(format: __shared String, locale: __shared Locale?, arguments: __shared [CVarArg])
init(from decoder: Decoder) throws
init(repeating repeatedValue: Character, count: Int)
init(repeating repeatedValue: String, count: Int)
init(stringInterpolation: DefaultStringInterpolation)
init(stringLiteral value: String)
init(unicodeScalarLiteral value: Self.ExtendedGraphemeClusterLiteralType)
init(utf16CodeUnits: UnsafePointer<unichar>, count: Int)
init(utf16CodeUnitsNoCopy: UnsafePointer<unichar>, count: Int, freeWhenDone flag: Bool)
init?(_ codeUnits: Substring.UTF16View)
init?(_ codeUnits: Substring.UTF8View)
init?(bytesNoCopy bytes: UnsafeMutableRawPointer, length: Int, encoding: String.Encoding, freeWhenDone flag: Bool)
init?(cString: UnsafePointer<CChar>, encoding enc: String.Encoding)
init?(data: __shared Data, encoding: String.Encoding)
init?(utf8String bytes: UnsafePointer<CChar>)
init?(validatingUTF8 cString: UnsafePointer<CChar>)

Give yourself a minute to soak in this list.

Did you realize that map(String.init) could unsafely read arbitrary amounts of memory if there's an UnsafePointer? Did you realize that the specifics of this behavior can actually vary slightly depending on whether the result type is optional?

Did you realize that map(String.init) could perform blocking network I/O if it ran on a URL and there happened to be a try on an enclosing expression?

Did you realize that map(String.init) could call underscored initializers that aren't even listed in the documentation?

Did you realize that you can't use map(String.init) for the vast majority of types (init(describing:) and init(reflecting:) could both match most types), and it will only work on types that conform to LosslessStringConvertible, TextOutputStreamable, or CustomStringConvertible? (Recall that CustomStringConvertible should almost never be something you constrain a generic parameter to—it's supposed to be a hook that the string formatting machinery uses, not a behavior-changing conformance.)

And that one used to be even worse—in Swift 4.2 and earlier, most calls to String.init ended up resolving to String.init(stringInterpolationSegment:), which is not supposed to be called directly at all. Many, many code bases were calling it without realizing they were. Alamofire was one of them.

This lack of clarity is frankly dangerous. It's the kind of thing that could not only cause bugs, but bugs with CVE numbers. In the face of that reality, I think the minor gains in brevity it offers have to yield.

It sure would be nice to know that a map onto String can only perform network I/O if you actually see String.init(contentsOf:) in the source code, wouldn't it?

Nevin · July 22, 2019, 12:44am

Thank you for writing this, it makes clear what problem the pitch intends to address, in a way that I did not glean from the earlier description.

michelf · July 22, 2019, 2:02am

What about a rule like this one:

When there is no overload, the short form can be used to match the function regardless of the parameter labels.
In the presence of overloads, the short form can only be used to match functions with no parameter label.

The first part is to keep things convenient when there can be no complication, for instance when you use your own uniquely named function as a callback or selector.

The second part mimics the expectations of those designing an API: with no label you match by the number of parameters and their types (which can be easily inferred), whereas with labels you are expected to actually have to write the labels at the call site (the labels are never inferred at a call site).

I think these rules would work nicely to avoid the scary behaviors of map(String.init). The API was designed with the idea that labels couldn’t be omitted, but, while true at a regular call site, that assumption doesn’t hold when resolving to a function type. This broken assumption is the danger here.

xwu · July 22, 2019, 4:45am

This has already been suggested, and @jrose points out a major problem above:

DevAndArtist · July 22, 2019, 9:41am

Correct me if I‘m wrong but isn‘t this the current behavior? If there is no overloads but the function has parameter labels it would create an issue when the user only uses the base name and if in the future one would introduce another member with the same base name such as the count example.

I think the most appropriate solution is what Jordan mentioned in the end of his post.

Alejandro_Martinez · July 22, 2019, 9:50am

I'm in favour of this change for functions with more than one parameter in theory, as it always seemed as weird that Swift allowed it.

But is true that having to write init(_:) or max(_:_:) when it's clear would make Swift code uglier that it is now.

I also think that Nevin point is really important:

that change is still causing trouble in today's Swift. So I hope we can make sure we're not causing any unwanted side effects.

Ultimately I think I would support an implementation like Jordan's suggestion:

bzamayo · July 22, 2019, 10:00am

Would a change like this impact trailing closure syntax? Requiring items.first(where:) { $0.isHidden } versus items.first { $0.isHidden } would not be very nice.

michelf · July 22, 2019, 11:36am

This is an already existing problem though. The difference with today is there's some chance the compiler will automatically switch to the new function and not tell you. This is probably fine if the labels are the same since the functions will have been designed for overloading. If the label differs that's problematic and a good thing the compiler tells you.

Another source incompatibility when adding overloads: minimum version requirements. If a new better-matching overload with the same labels appears but requires wrapping into an if #available block, that is source-breaking too:

func test(_ a: Any) {}

@available(macOS 10.99, *)
func test(_ a: Int) {}

test(1) // error: 'test' is only available on OS X 10.99 or newer

Yet, we don't force all call sites to be written like test(1 as Any) just in case a new overload appears in the future. That'd be madness. My feelings are similar towards the labels: they should be optional when there's no risk of confusion.

This non-specificity is especially useful in your own code where you have a tendency to refactor things and refine the types & labels used as you improve things in your code.

Perhaps a solution could be a warning (with a fixit) about not specifying the labels when referring to functions across module boundaries.

Jumhyn · July 22, 2019, 12:42pm

Like default parameters, trailing closure syntax is a call-site concern, which (I hope/assume) would be unaffected by this pitch.

Jumhyn · July 22, 2019, 12:48pm

Yet along a related axis, we don’t allow functions to be called with or without argument labels as desired by a client—they are considered a necessary part of the name of the function for the purpose of calling, so why should referencing be any different? That, to me, is a closer parallel than adding a purely type-based overload versus an “overload” based on argument labels. A library author may falsely believe they are safely adding this overload since it has a different argument label than existing methods, without realizing that that they may break clients which are referencing the function (or even if they are aware of the issue, they have no way to address it without changing the base name).

This is also why I’m not super compelled by the “non-specificity is especially useful in your own code” argument. Why wouldn’t the same non-specificity be useful at the call site? You already break all those usages when you change the argument label, so this change doesn’t seem like it would introduce significantly more code churn than already exists today while API details are still in flux.