`count(where:)` on Sequence

anthonylatsis · March 21, 2018, 4:20pm

I don't think it would be appropriate for count(of:) to return 1 or 0 in NSCountedSet, but having it behave like count(for:) is redundant. Substituting count(for:) with count(of:) is not a good idea as well, since it changes the method's semantics. But actually, maybe it's not that bad of an idea.

By the way, we don't have an isUnique property for sequences...What do you think? It sounds ambiguous in the context of NSCountedSet though.

Ben_Cohen · March 21, 2018, 4:23pm

This is an important point. One of the criteria we've discussed before for justifying addition to the standard library is that it helps people avoid common correctness or performance traps.

To run down the full list, I think this qualifies on a number of levels:

Readability: array.count(where: predicate) is much more readable at a glance than the composed alternatives using lazy.filter or reduce.
Commonality: this is always a subjective one, but I think it's clear this is a common need, and not tied to a specific domain.
Performance: This implementation avoids a performance pitfall of creating an intermediate array. There's also an argument that the Equatable version should be a customization point since some collections can implement it more efficiently (cue @jrose expressing concern we have too many of those).

It doesn't qualify on grounds of being hard to implement, and also (probably) doesn't qualify on grounds of needing access to internals to be more efficient.

I see this as similar to isEmpty, which is trivially composable as startIndex != endIndex but is more readable and avoids people accidentally writing the often-linear count > 0.

edit: ha, I mean startIndex == endIndex of course. Not so trivially composable after all!

Paul_Cantrell · March 21, 2018, 4:40pm

And count == 0, of course. It’s always a relief to know other people are human too!

And it does prove the point that even tiny cognitive burdens such as figuring out whether == means “empty” or “not empty” can introduce bugs that clearly named conveniences can help avoid.

Ben_Cohen · March 21, 2018, 4:42pm

It probably also demonstrates that isEmpty is the wrong polarity, like so many people keep saying and that I've been clinging to denial about...

anthonylatsis · March 21, 2018, 5:10pm

I would say the polarity of a boolean property is a situation similar to that meme about the blue-or-golden dress: if you get it wrong it's either a coincidence or a way of thinking rather than the polarity being incorrect :)

soroush · March 21, 2018, 5:35pm

Examples here are Set forwarding to contains, and the MultiSet/CountedSet example above, right? Anything I'm missing?

A little off-topic, but I think a lot of that is that ! (meaning "not") is a lot harder to read than people want to admit. C.f. the containsOnly/all function, which is "just" two ! operators + contains, which yields absolutely inscrutable code.

Ben_Cohen · March 21, 2018, 5:47pm

The ranges could also return 1 or 0 depending on whether the element was within bounds. And maybe RangeExpression could too depending on what RangeExpression.contains(_: Bound) should mean (as being discussed here). Also Dictionary could optimize to first check the key, then check the value.

dwaite · March 22, 2018, 7:48am

Seems more appropriate to have on Collection, as Sequence only has a property to (under)estimate the count.

Letan · March 22, 2018, 7:53am

I think that's becuase count needs to be O(1). where as count(where:) can/will? be O(n). So there doesn't seem to be a need to limit ourselves, unless you have another reason?

Edit: also sequence can be infinite

dwaite · March 22, 2018, 7:55am

Count is not defined to be O(1) on Collection - it is O(1) on RandomAccessCollection. https://developer.apple.com/documentation/swift/collection/2950091-count

Letan · March 22, 2018, 8:03am

Correct, I shouldn't be so hasty when reading. Thinking about it more I do think we should probably be more consistent with Sequence as there would've been a valid reason count isn't defined for it.

swiftyserver · March 22, 2018, 11:01am

Ran a small test. Below is a slightly more performant version that uses @_inlineable. This is about 60% faster than a lazy.filter implementation.

extension Sequence {
    @_inlineable public func count(where predicate: (Element) throws -> Bool) rethrows -> Int {
        return try reduce(0) { try predicate($1) ? $0+1 : $0 }
    }
}

soroush · March 22, 2018, 1:22pm

count is unavailable on Sequence because some sequences are single-pass only, and iterating through it to see how many elements you have would destroy the sequence, and properties aren't supposed to mutate a type. I believe this is why underestimatedCount exists — it doesn't consume the sequence to give you a value, and you can use it for stuff like Array.reserveCapacity.

So the reduce implementation is 2x faster than for loop? Or it's 2x faster than a not @_inlineable reduce?

anthonylatsis · March 22, 2018, 5:08pm

Here's what I got for the average time over 5k random arrays of 10 integers.

0.00184147862333509 // loop
0.00237991754542921 // reduce
0.00132116151556829 // inline reduce

So the loop is ~22% faster than reduce, inline reduce is ~44% faster than reduce and ~28% faster than loop

NilStack · March 23, 2018, 3:00am

Awesome!
agree that count(where:) is the better one.

Chris_Lattner3 · April 1, 2018, 12:08am

+1 to filling in obvious missing functionality, thanks @soroush!

-Chris

salutis · July 13, 2018, 10:48pm

Any news on this one?

soroush · July 13, 2018, 11:07pm

I have a PR open for both the proposal and the implementation, and we are just waiting for the review period to start.

salutis · July 13, 2018, 11:09pm

Fantastic, thanks @soroush!

Jon_Hull · August 5, 2018, 6:25am

I just had a place today where this really would have helped. I hope it comes up for review soon!