`count(where:)` on Sequence

collection

(Ben Cohen) #22

This is an important point. One of the criteria we’ve discussed before for justifying addition to the standard library is that it helps people avoid common correctness or performance traps.

To run down the full list, I think this qualifies on a number of levels:

Readability: array.count(where: predicate) is much more readable at a glance than the composed alternatives using lazy.filter or reduce.
Commonality: this is always a subjective one, but I think it’s clear this is a common need, and not tied to a specific domain.
Performance: This implementation avoids a performance pitfall of creating an intermediate array. There’s also an argument that the Equatable version should be a customization point since some collections can implement it more efficiently (cue @jrose expressing concern we have too many of those).

It doesn’t qualify on grounds of being hard to implement, and also (probably) doesn’t qualify on grounds of needing access to internals to be more efficient.

I see this as similar to isEmpty, which is trivially composable as startIndex != endIndex but is more readable and avoids people accidentally writing the often-linear count > 0.

edit: ha, I mean startIndex == endIndex of course. Not so trivially composable after all!


Add average to FloatingPoint arrays
(Paul Cantrell) #23

And count == 0, of course. It’s always a relief to know other people are human too!

And it does prove the point that even tiny cognitive burdens such as figuring out whether == means “empty” or “not empty” can introduce bugs that clearly named conveniences can help avoid.


(Ben Cohen) #24

It probably also demonstrates that isEmpty is the wrong polarity, like so many people keep saying and that I’ve been clinging to denial about…


(Anthony Latsis) #25

I would say the polarity of a boolean property is a situation similar to that meme about the blue-or-golden dress: if you get it wrong it’s either a coincidence or a way of thinking rather than the polarity being incorrect :)


(Soroush Khanlou) #26

:tada:

Examples here are Set forwarding to contains, and the MultiSet/CountedSet example above, right? Anything I’m missing?

A little off-topic, but I think a lot of that is that ! (meaning “not”) is a lot harder to read than people want to admit. C.f. the containsOnly/all function, which is “just” two ! operators + contains, which yields absolutely inscrutable code.


(Ben Cohen) #27

The ranges could also return 1 or 0 depending on whether the element was within bounds. And maybe RangeExpression could too depending on what RangeExpression.contains(_: Bound) should mean (as being discussed here). Also Dictionary could optimize to first check the key, then check the value.


Count(of:) and contains(other:) for Collection
(David Waite) #28

Seems more appropriate to have on Collection, as Sequence only has a property to (under)estimate the count.


(Letanyan Arumugam) #29

I think that’s becuase count needs to be O(1). where as count(where:) can/will? be O(n). So there doesn’t seem to be a need to limit ourselves, unless you have another reason?

Edit: also sequence can be infinite


(David Waite) #30

Count is not defined to be O(1) on Collection - it is O(1) on RandomAccessCollection. https://developer.apple.com/documentation/swift/collection/2950091-count


(Letanyan Arumugam) #31

Correct, I shouldn’t be so hasty when reading. Thinking about it more I do think we should probably be more consistent with Sequence as there would’ve been a valid reason count isn’t defined for it.


(Michael Sanford) #32

Ran a small test. Below is a slightly more performant version that uses @_inlineable. This is about 60% faster than a lazy.filter implementation.

extension Sequence {
    @_inlineable public func count(where predicate: (Element) throws -> Bool) rethrows -> Int {
        return try reduce(0) { try predicate($1) ? $0+1 : $0 }
    }
}

(Soroush Khanlou) #33

count is unavailable on Sequence because some sequences are single-pass only, and iterating through it to see how many elements you have would destroy the sequence, and properties aren’t supposed to mutate a type. I believe this is why underestimatedCount exists — it doesn’t consume the sequence to give you a value, and you can use it for stuff like Array.reserveCapacity.

So the reduce implementation is 2x faster than for loop? Or it’s 2x faster than a not @_inlineable reduce?


(Anthony Latsis) #34

Here’s what I got for the average time over 5k random arrays of 10 integers.

0.00184147862333509 // loop
0.00237991754542921 // reduce
0.00132116151556829 // inline reduce

So the loop is ~22% faster than reduce, inline reduce is ~44% faster than reduce and ~28% faster than loop


(Peng Guo) #35

Awesome!
agree that count(where:) is the better one.


(Chris Lattner) #36

+1 to filling in obvious missing functionality, thanks @soroush!

-Chris


(Rudolf Adamkovič) #37

Any news on this one?


(Soroush Khanlou) #38

I have a PR open for both the proposal and the implementation, and we are just waiting for the review period to start.


(Rudolf Adamkovič) #39

Fantastic, thanks @soroush!


(Jon Hull) #40

I just had a place today where this really would have helped. I hope it comes up for review soon!


(Soroush Khanlou) #41

The review starts tomorrow!