`count(where:)` on Sequence

I don't think it would be appropriate for count(of:) to return 1 or 0 in NSCountedSet, but having it behave like count(for:) is redundant. Substituting count(for:) with count(of:) is not a good idea as well, since it changes the method's semantics. But actually, maybe it's not that bad of an idea.

By the way, we don't have an isUnique property for sequences...What do you think? It sounds ambiguous in the context of NSCountedSet though.

This is an important point. One of the criteria we've discussed before for justifying addition to the standard library is that it helps people avoid common correctness or performance traps.

To run down the full list, I think this qualifies on a number of levels:

Readability: array.count(where: predicate) is much more readable at a glance than the composed alternatives using lazy.filter or reduce.
Commonality: this is always a subjective one, but I think it's clear this is a common need, and not tied to a specific domain.
Performance: This implementation avoids a performance pitfall of creating an intermediate array. There's also an argument that the Equatable version should be a customization point since some collections can implement it more efficiently (cue @jrose expressing concern we have too many of those).

It doesn't qualify on grounds of being hard to implement, and also (probably) doesn't qualify on grounds of needing access to internals to be more efficient.

I see this as similar to isEmpty, which is trivially composable as startIndex != endIndex but is more readable and avoids people accidentally writing the often-linear count > 0.

edit: ha, I mean startIndex == endIndex of course. Not so trivially composable after all!

17 Likes

And count == 0, of course. It’s always a relief to know other people are human too!

And it does prove the point that even tiny cognitive burdens such as figuring out whether == means “empty” or “not empty” can introduce bugs that clearly named conveniences can help avoid.

4 Likes

It probably also demonstrates that isEmpty is the wrong polarity, like so many people keep saying and that I've been clinging to denial about...

I would say the polarity of a boolean property is a situation similar to that meme about the blue-or-golden dress: if you get it wrong it's either a coincidence or a way of thinking rather than the polarity being incorrect :)

1 Like

:tada:

Examples here are Set forwarding to contains, and the MultiSet/CountedSet example above, right? Anything I'm missing?

A little off-topic, but I think a lot of that is that ! (meaning "not") is a lot harder to read than people want to admit. C.f. the containsOnly/all function, which is "just" two ! operators + contains, which yields absolutely inscrutable code.

The ranges could also return 1 or 0 depending on whether the element was within bounds. And maybe RangeExpression could too depending on what RangeExpression.contains(_: Bound) should mean (as being discussed here). Also Dictionary could optimize to first check the key, then check the value.

3 Likes

Seems more appropriate to have on Collection, as Sequence only has a property to (under)estimate the count.

I think that's becuase count needs to be O(1). where as count(where:) can/will? be O(n). So there doesn't seem to be a need to limit ourselves, unless you have another reason?

Edit: also sequence can be infinite

2 Likes

Count is not defined to be O(1) on Collection - it is O(1) on RandomAccessCollection. https://developer.apple.com/documentation/swift/collection/2950091-count

Correct, I shouldn't be so hasty when reading. Thinking about it more I do think we should probably be more consistent with Sequence as there would've been a valid reason count isn't defined for it.

Ran a small test. Below is a slightly more performant version that uses @_inlineable. This is about 60% faster than a lazy.filter implementation.

extension Sequence {
    @_inlineable public func count(where predicate: (Element) throws -> Bool) rethrows -> Int {
        return try reduce(0) { try predicate($1) ? $0+1 : $0 }
    }
}

count is unavailable on Sequence because some sequences are single-pass only, and iterating through it to see how many elements you have would destroy the sequence, and properties aren't supposed to mutate a type. I believe this is why underestimatedCount exists — it doesn't consume the sequence to give you a value, and you can use it for stuff like Array.reserveCapacity.

So the reduce implementation is 2x faster than for loop? Or it's 2x faster than a not @_inlineable reduce?

Here's what I got for the average time over 5k random arrays of 10 integers.

0.00184147862333509 // loop
0.00237991754542921 // reduce
0.00132116151556829 // inline reduce

So the loop is ~22% faster than reduce, inline reduce is ~44% faster than reduce and ~28% faster than loop

2 Likes

Awesome!
agree that count(where:) is the better one.

+1 to filling in obvious missing functionality, thanks @soroush!

-Chris

5 Likes

Any news on this one?

I have a PR open for both the proposal and the implementation, and we are just waiting for the review period to start.

4 Likes

Fantastic, thanks @soroush!

I just had a place today where this really would have helped. I hope it comes up for review soon!