SE-0207: Add a containsOnly algorithm to Sequence

hartbit · April 13, 2018, 2:25pm

+1. I would go with containsOnly(where:) for consistency and slight brevity

hartbit · April 13, 2018, 2:27pm

I'm also slightly worried that containsOnly could be read as the collection containing only one element. I would strongly urge going with containsExclusively() and containsExclusively(where:) to remove all ambiguity.

Paul_Cantrell · April 13, 2018, 10:39pm

I did a survey of naming conventions across a variety of languages:

Language	Predicate true for at least one	Predicate true for all	At least one equals given value	All elements equal given value
C#	`Any`	`All`	`Contains`	–
C++	`any_of`	`all_of`	–	–
Clojure	`some`	`every?`	–	–
F#	`exists`	`forall`	`contains`	–
Haskell	`any`	`all`	`elem`	–
Java	`anyMatch`	`allMatch`	`contains`	–
JavaScript	`some`	`every`	`includes`	–
Kotlin	`any`	`all`	`contains`	–
Matlab	`any`	`all`	`ismember`	–
PHP	`array_some`	`array_every`	`in_array`	–
Python	`any`	`all`	… `in` …	–
R	`any`	`all`	…`%in%` …	`all.equal`
Ruby	`any?`	`all?`	`include?`	–
Rust	`any`	`all`	`contains`	–
Scala	`exists`	`forall`	`contains`	–

This includes nearly all the most popular & “most loved” languages in the latest Stack Overflow survey that have closures or other predicate-like constructs.

Some observations:

For the first function:
- 10 use the word “any”
- 3 use “some”
- 2 use “exists”
For the second function:
- 12 use the word “all”
- 3 use the word “every”
For the third function:
- 6 use the word “contains”
- 3 use “in”
- 2 use “include(s)”
- 1 uses “elem”
No language strives for parallel naming between the first and third.
Only one language implements an “all elements equal a given value” function — at least that I could find. If any of these other languages have one and I missed it, please let me know and I'll update the table. (Rust does have an all_equal, but it seems to mean all equal to each other, not to a given value)

mbrandonw · April 14, 2018, 3:20am

It's more than a convention, it's the only way to define it. The reason is that you'd want this to hold:

(xs + ys).containsOnly(p) == (xs.containsOnly(p) && ys.containsOnly(p))

In particular, if ys = [], then

(xs + []).containsOnly(p) == (xs.containsOnly(p) && [].containsOnly(p))

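Since xs + [] is just xs, the left side reduces to xs.containsOnly(p), and the right side can only agree with it for every xs if [].containsOnly(p) is true. So the only sensible choice is [].containsOnly(p) == true, no matter what p is.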
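A minimal sketch of that argument in code. The method name containsOnly(where:) is the proposal's working spelling, defined here as a hypothetical extension purely for illustration; the implementation ("no element fails the predicate") is one possible way to write it, not necessarily what the standard library would ship.

```swift
extension Sequence {
    // Hypothetical spelling from the proposal, defined only for this sketch.
    func containsOnly(where predicate: (Element) throws -> Bool) rethrows -> Bool {
        // "Every element satisfies p" is the same as "no element fails p".
        return try !contains { try !predicate($0) }
    }
}

let isEven: (Int) -> Bool = { $0 % 2 == 0 }
let xs = [2, 4, 6]
let ys: [Int] = []

// The identity from the post: checking the concatenation agrees with
// checking both halves separately.
let combined = (xs + ys).containsOnly(where: isEven)
let separate = xs.containsOnly(where: isEven) && ys.containsOnly(where: isEven)
print(combined == separate)            // true

// An empty sequence has no element that fails the predicate.
print(ys.containsOnly(where: isEven))  // true
```

With this definition the empty case falls out of the implementation without any special handling.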

Paul_Cantrell · April 14, 2018, 5:15am

One more tidbit of midnight-oil-burning research: Earlier in the conversation, someone (Ben Cohen, I think?) expressed skepticism about renaming the existing contains(where:) to bring it into line with other proposed naming schemes.

Curious how widely the method is used in practice, I scanned all the projects in the swift-source-compat-suite for calls to contains(where:) to assess the impact of a hypothetical name change, in case we were to decide that renaming the existing method would indeed yield the best overall naming structure.

Here are the results:

525,291 LOC total in the compat suite
73 unique calls to contains(where:) (including its trailing closure form)
That is an incidence of 1 usage per ~7200 LOC.
5 of those calls are used in definitions of other collection helper methods:
- 3 named all (equivalent to this proposal)
- 1 named any (aliasing the existing contains(where:) method)
- 1 named none (worth a proposal?)

Here are those 73 call sites, in case anyone is curious.
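For readers wondering what such helper methods look like, here is a rough sketch of how all, any, and none can each be written on top of the existing contains(where:). The exact signatures are assumptions for illustration; the projects in the suite may spell theirs differently.

```swift
extension Sequence {
    // Hypothetical helpers; none of these names are standard library API.

    // True if at least one element satisfies the predicate (a plain alias).
    func any(_ predicate: (Element) throws -> Bool) rethrows -> Bool {
        return try contains(where: predicate)
    }

    // True if no element fails the predicate.
    func all(_ predicate: (Element) throws -> Bool) rethrows -> Bool {
        return try !contains { try !predicate($0) }
    }

    // True if no element satisfies the predicate.
    func none(_ predicate: (Element) throws -> Bool) rethrows -> Bool {
        return try !contains(where: predicate)
    }
}

let sizes = [3, 5, 8]
print(sizes.any { $0 > 7 })   // true
print(sizes.all { $0 > 7 })   // false
print(sizes.none { $0 > 9 })  // true
```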

palimondo · April 14, 2018, 5:50am

The 3 options offered there by the core team represent a false choice and shouldn’t be setting a precedent without considering the wider picture!

We have to recognize that the source of tension in the name pitched in the proposal stems from the original sin of contains. This added half of the natural API and was named in isolation to be “Swifty” without regard to prior art in other languages. The natural desire for symmetry forces the new proposed dual method into a rather strange place: the suffix extension containsOnly. This shrinks the available design space to the pigeonhole of grammaticality of argument labels in a futile effort to restore fluidity and clarity at the point of use. I think the remaining names are quite schwifty!

At this point, we are dealing solely with symptoms of self-inflicted wounds.

Additional pressure for the whole naming process comes from the looming ABI stability. I don’t fully understand all its implications, but numerous anecdotes dropped around the forum about “having to live with current APIs forever” and a need to “get things right the first time” aren’t exactly calming... Did I miss some explainer of what exactly happens to the standard library from an evolution perspective? Are we really about to launch for Proxima Centauri and what we have on board now is all there is?!

If so, I should be panicking much more…

Jens · April 14, 2018, 12:29pm

Despite, or perhaps because of, all the energy spent on trying to ensure clarity at the point of use, it looks like we might end up getting:

[].containsOnly(elementsWhere: gradeIsAPlus) // true(!)
[].contains(where: gradeIsAPlus) // false

instead of simply:

[].all(gradeIsAPlus) // true(!)
[].any(gradeIsAPlus) // false

bzamayo · April 14, 2018, 12:57pm

The rabbit hole of laying blame on contains is really distracting. contains is a perfect name for its operation and I would fiercely oppose changing it to any or anything else.

If all (or allSatisfy etc) is deemed a better name for the containsOnly algorithm, then that's fine. We could have all and contains sit alongside each other with reasonable harmony and no confusion.

FWIW my personal opinion remains in support of something with a contains basename whether that is containsOnly, containsAll, or whatever.

pyrtsa · April 14, 2018, 1:30pm

Contains as a word is perfect for the Equatable case of things.contains(something), and I’d say merely good enough for the one taking a predicate (I’m talking about contains(where:)).

If we didn’t have the resiliency burden of having to support existing (albeit rare) uses of contains(where:), I’d be all +1 for the triple of

things.contains(something)
things.any(isSomehow)  // deprecating contains(where:)
things.all(isSomehow)  // the new method

…Term-of-artness considered, and as a plus, there’s no difference in function name at call site between trailing closure use (things.all { $0.isBig }) and ordinary function arguments (things.all(isBig)).

Following this thinking, the question is: can we afford deprecating an established name (contains(where:)) in the stdlib? I’d say yes, because the fixit is straightforward and we have no problem supporting the deprecated name over some Swift versions, but YMMV.
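To make the deprecation path concrete, here is a sketch of how an old name can keep working while steering users to a new one. Both any(_:) and legacyContains(where:) are hypothetical names invented for this example: the real contains(where:) can only be re-annotated inside the standard library, so a stand-in plays its role here.

```swift
extension Sequence {
    // The proposed replacement name (hypothetical, not standard library API).
    func any(_ predicate: (Element) throws -> Bool) rethrows -> Bool {
        return try contains(where: predicate)
    }

    // Stand-in for the old name. Calls still compile, but the compiler
    // emits a deprecation warning and offers a rename fix-it.
    @available(*, deprecated, renamed: "any(_:)")
    func legacyContains(where predicate: (Element) throws -> Bool) rethrows -> Bool {
        return try contains(where: predicate)
    }
}

let grades = ["A+", "B", "A+"]
let hasAPlus = grades.any { $0 == "A+" }
print(hasAPlus)  // true
```

Whether that warning is an acceptable cost for existing users is exactly the judgement call under discussion.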

JoeyKL · April 14, 2018, 6:24pm

I think .all(equal:) and .all(satisfy:) are far and away the best options here. They both use the term of art and form a grammatical English phrase.

Ben_Cohen · April 16, 2018, 4:32pm

First off, thank you for doing this research. It's really valuable to back up any discussions we have about things like the impact of renames with data from the compatibility suite (and/or sometimes a GitHub search or other similar data).

I might be reading too much into your post, as you don't say it explicitly, but I think you are suggesting this is fairly modest usage and therefore not a problem from a renaming perspective. I take the opposite view: that to find 73 uses of a function in the compatibility suite demonstrates fairly extensive usage. And if you look at the results in your link (thanks for that, too!) it shows that the usage is spread across a number of different projects. So this deprecation will flag in a significant number of users' projects.

And users are, in my view, tired of these kinds of rename-related deprecations and want to see fewer of them, even when a migration will help them through fairly painlessly or when the deprecation has a slow burn. We saw that recently with the flatMap rename, where there was a fair amount of social media snark about the rename when 4.1 was released.

I still think that rename was worthwhile, because of the active confusion that the overloaded flatMap was causing. But that is not the case here: this would be purely a rename for consistency/preference. My belief continues to be that renames at this stage in Swift's development should be reserved only for cases where the current name can be demonstrated to be causing active harm.

This is just my personal view, rather than the official stance of the core team, but it's a view I am keen for this forum to adopt.

Paul_Cantrell · April 16, 2018, 5:52pm

That is reading a bit too much into it. My main intent is to replace assumptions with data. If a renaming decision hinges on developer impact, let's try to actually measure the impact.

My own take is that 1 use per 7200 LOC is neither “not a problem” nor “fairly extensive,” but somewhere in the murky middle of judgement calls.

17 of 59 projects use it, about 29%. Among the projects that use it, median occurrences per project is 2, max is 19, half have only 1:

19 SwiftLint
10 GRDB
10 Sourcery
8 Kingfisher
5 R
4 CoreStore
4 AsyncNinja
3 siesta
2 SwifterSwift
1 ReSwift
1 NetService
1 ReactiveCocoa
1 ProcedureKit
1 ReactiveSwift
1 Kickstarter
1 vapor
1 SwiftGraph
0 (42 projects)

Again, somewhere in the murky middle between “rare” and “extensive.”
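The summary figures can be reproduced from the per-project counts with a few lines. This is only a sanity check of the arithmetic, using the numbers quoted in this thread as inputs.

```swift
// Per-project call counts as listed above; the 42 projects with zero calls are left out.
let counts = [19, 10, 10, 8, 5, 4, 4, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1]
let totalProjects = 59
let totalLOC = 525_291

let totalCalls = counts.reduce(0, +)       // 73
let usingProjects = counts.count           // 17
let share = Double(usingProjects) / Double(totalProjects) * 100

// The count of using projects is odd, so the median is the middle element.
let ordered = counts.sorted()
let median = ordered[ordered.count / 2]    // 2

// Integer division: lines of code per call site.
let locPerCall = totalLOC / totalCalls     // 7195

print(totalCalls, usingProjects, Int(share.rounded()), median, locPerCall)
// 73 17 29 2 7195
```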

Certainly agreed. There’s clearly a tradeoff here not to be undertaken lightly.

For comparison, there are 1965 occurrences of either flatMap or compactMap in the suite (which still lags the compactMap rename for many projects) — though I'm just doing a hacked-up regex search, and thus can’t distinguish what fraction of those flatMaps are nil-filtering vs array-joining.

(An informal manual inspection of a small random sample suggests it’s on the general order of half and half, i.e. an order of magnitude more frequent than contains(where:).)

My counterpoint to this is that many on this thread (including me) make the case that contains(where:) is in fact doing active harm:

- It is a name unprecedented in any other language for something for which there is already a well-established (different) term of art.
- It therefore has poor discoverability and is likely to confuse at the point of use.
- Adding this new feature has brought new attention to the old method’s problems, and maintaining consistency with the old method leads us to naming options for the new ones that have failed to achieve consensus.

Given all that, my take is that going with containsOnly(…) instead of all or allMatch is just throwing good money after bad, and it’s better to bite the bullet now instead of letting a mistake spawn even worse mistakes that we’ll have to live with indefinitely.

Again, however, I really do appreciate that there's a tradeoff here, that deprecations have a cost, and that this is a judgement call. I also appreciate that I'm advocating a choice whose psychological cost I don’t personally have to pay, and you do! Some of the arguments against containsOnly got a bit … aggressive, so please know that I really do appreciate that the decisions you make are tough ones, even if I don’t always remember to say so. (I think many others here appreciate that too.)

Ben_Cohen · April 16, 2018, 6:22pm

None of the above would qualify as active harm by my (personal, subjective) definition. Consistency with other languages, consistency with newly introduced methods, increasing discoverability should all be ruled out as possible reasons to rename an existing long-standing method. We have to draw a line. Users have lost patience with these kinds of changes.

I don't think it's defensible that there is confusion at the call site. There isn't another interpretation of what contains(where:) does when you see it used – unlike, for example, elementsEqual where it is clearly reasonable to think it does something other than what it does.

Tino · April 16, 2018, 9:06pm

That's surely true -- as long as this is meant as "some (or many) users" (in contrast to "all users").

There is a constant struggle of compatibility against other aspects (progress, simplicity, elegance...), and everyone has his own priorities:

There is merit in being able to build a ten year old program with a brand new compiler, and there's also merit in having a lean language and stdlib without ten years of cruft.

This thread now touches questions that are much more fundamental than adding or renaming some methods, and I think those questions could fill a whole section on their own, without ever getting final answers.

Swift can't make everyone happy, but I think being bold and without fear of breaking things to replace them with something better is part of Apple's DNA.
So I strongly hope that Swift will continue to accept big changes (as long as those don't happen on a quarterly basis ;-), because we can build tools to lessen the pain of migration, but it's very hard to build tools to drive progress.

If there's agreement that a change would make Swift a better language, imho that shouldn't be ruled out lightly.

(oh, and I like that "any/all" thingy ;-)

timv · April 17, 2018, 12:32am

all(satisfy:) only forms a grammatical English phrase if you don't use the trailing closure syntax, and array.all { ... } can easily be mistaken for some sort of filter. So if we decide to go with "all" then I think something needs to be added to the base name of that function, rather than to the argument label, to disambiguate it from filter. I'm a bigger fan of a name that matches the existing contains functions though.

palimondo · April 17, 2018, 12:40am

The name of the method shouldn’t be judged in isolation. Type information and documentation must also be considered. All three together form the fundamental pillars of the API Design Guidelines. Trailing closure syntax always removes half of the name by stripping argument labels, and it is therefore up to the user to make the call site clear and readable.

timv · April 17, 2018, 1:06am

I'm judging it based on the likelihood of people using the trailing closure syntax with this method regardless of what its final name will be. I think it's a reasonable goal to try to minimize the ambiguity of both ways a method can be (and commonly is) written, rather than just one.

That's a pretty generous "half". The standard library's higher-order functions that have an argument label might lose some of their readability like an English sentence when using the trailing closure syntax, but I can't think of any that become harder to understand by using this syntax.

Paul_Cantrell · April 17, 2018, 5:15am

I imagine that line should come when the cost of making a particular change exceeds the cost of living with the problem it fixes — total cost, for language maintainers and users, current and future.

I stand by my bullet points above. I’ve made my case for handling this problem a bit more holistically than the proposal proposes, and hopefully given some useful data regardless of whether you buy my argument, so I’ll leave it there and let the core team do its work.

hisekaldma · April 17, 2018, 9:44am

What is your evaluation of the proposal?
I already have all(match:) and any(match:) in several projects. If the standard library adds something close to them, I will remove my own versions. If the standard library adds containsOnly(where:), I will probably never remember what it's called, and just keep using my own all(match:) instead. Just like I'm currently using my own any(match:) instead of contains(where:).

Does this proposal fit well with the feel and direction of Swift?
No. The API Design Guidelines clearly state that an API should embrace precedent when naming things. The established precedent in other languages is to have functions named all and any.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
I've used functions named any and all in several languages.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?
Read the proposal and most of the thread.

lancep · April 17, 2018, 4:28pm

This is a great point. The method in question returns a Bool, which should make it obvious that this is not doing a filter operation.

It's also worth noting that you can't use trailing closure syntax in the condition of an if statement. The closure must be in the parentheses of the function call.

if array.all(match: { $0 % 2 == 0 }) { // works
  print("All even")
}

if array.all { $0 % 2 == 0 } { // Error: Trailing closure requires parentheses for disambiguation in this context
  print("again")
}

The second example above gives you a fixit that puts back in the argument label for the closure.
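A small runnable sketch of both points. The all(match:) helper is a hypothetical extension using the label from the example above; the parenthesized form at the end is one further way to keep the trailing closure inside an if condition.

```swift
extension Sequence {
    // Hypothetical helper; the label matches the example above.
    func all(match predicate: (Element) throws -> Bool) rethrows -> Bool {
        return try !contains { try !predicate($0) }
    }
}

let array = [2, 4, 6]

// Outside a statement condition the trailing closure form parses fine,
// and the result types show which operation is which.
let allEven: Bool = array.all { $0 % 2 == 0 }
let evens: [Int] = array.filter { $0 % 2 == 0 }
print(allEven, evens)

// Inside an `if` condition, wrapping the whole call in parentheses
// removes the ambiguity with the body of the statement.
if (array.all { $0 % 2 == 0 }) {
    print("All even")
}
```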