`count(where:)` on Sequence

collection

(Paul Cantrell) #17

.lazy is indeed a wonderful and powerful tool! I’d like to see it more widely known and used, and also supported in more situations in the stdlib. No argument there!

I’d respectfully disagree with this. The point of convenience methods is not merely to hide complexity from beginners, but also:

  • to make code more readable and information-bearing by reducing the verbose repetition of common patterns, and
  • to reduce the surface area for bugs inherent in such repetitions.

Understanding the more verbose form does not mitigate these problems; on the contrary, it exacerbates them. A conditioned “oh, I know that pattern” response leads to skimming, and thus to code blindness. Ergonomics doesn’t just save keystrokes; it prevents bugs.

There are in fact very few methods that we strictly need on Sequence and Collection; most of them are just conveniences (semantically, at least). Why do we need .count when you can just do .reduce(0) { c, _ in c + 1 }? Even if compiler magic made it so there were no performance difference, .count would still pull its weight. This:

let hasDuplicates = (items.count != Set(items).count)

…is simply more readable than this:

let hasDuplicates = (items.reduce(0) { c, _ in c + 1 } != Set(items).reduce(0) { c, _ in c + 1 })

…and this version has a subtle bug that is very hard to spot in all the visual noise:

let hasDuplicates = (items.reduce(0) { c, _ in c + 1 } != Set(items).reduce(0) { _, c in c + 1 })

None of that is to say a word against .lazy, long may it live! It is to say rather that the fact that a new method is purely a convenience does not make it undesirable; rather, we should think about the tradeoff between having crisper, more readable code versus having a smaller API footprint. In this case, that tradeoff squarely favors count(where:).


(Soroush Khanlou) #18

I’m not sure I follow here — which protocol should it go on if not Sequence? Set’s conformance would always return 1 or 0, but I think that’s actually okay?

It also raises another interesting question: what should the NSCountedSet (or any multiset) implementation return? With a naive implementation, NSCountedSet currently returns 1 (the cast to Int is because NSCountedSet isn’t generic (yet?)):

let cs = NSCountedSet(array: [1, 1, 2, 3, 3, 3])

cs.count // => 3

Array(cs) // => [3, 1, 2]

let m = cs.count({ $0 as? Int == 1}) // => 1

I think the correct answer here is to return that there are 2 elements that pass the test?


(Anthony Latsis) #19

Well, formally it’s okay, but in practice it’s kind of redundant. It becomes a second contains(Element).

A set should return 1 or 0, by definition, could you clarify why 2 elements?


(Soroush Khanlou) #20

Because NSCountedSet isn’t generic, it’s hard to show. But imagine NSCountedSet were generic, and we constructed one that was generic over an Equatable type, like Int. If that were the case, we’d get count(of:), right?

 let cs = NSCountedSet(array: [1, 1, 2, 3, 3, 3]) // => imagine this is an NSCountedSet<Int>

// inherited from this pitch
cs.count(of: 1) // => 1 

// part of the NSCountedSet API
cs.count(for: 1) // => 2

I think these two methods should return the same thing, otherwise it would be very confusing for end users.


(Anthony Latsis) #21

I don’t think it would be appropriate for count(of:) to return 1 or 0 in NSCountedSet, but having it behave like count(for:) is redundant. Substituting count(for:) with count(of:) is not a good idea as well, since it changes the method’s semantics. But actually, maybe it’s not that bad of an idea.

By the way, we don’t have an isUnique property for sequences…What do you think? It sounds ambiguous in the context of NSCountedSet though.


(Ben Cohen) #22

This is an important point. One of the criteria we’ve discussed before for justifying addition to the standard library is that it helps people avoid common correctness or performance traps.

To run down the full list, I think this qualifies on a number of levels:

Readability: array.count(where: predicate) is much more readable at a glance than the composed alternatives using lazy.filter or reduce.
Commonality: this is always a subjective one, but I think it’s clear this is a common need, and not tied to a specific domain.
Performance: This implementation avoids a performance pitfall of creating an intermediate array. There’s also an argument that the Equatable version should be a customization point since some collections can implement it more efficiently (cue @jrose expressing concern we have too many of those).

It doesn’t qualify on grounds of being hard to implement, and also (probably) doesn’t qualify on grounds of needing access to internals to be more efficient.

I see this as similar to isEmpty, which is trivially composable as startIndex != endIndex but is more readable and avoids people accidentally writing the often-linear count > 0.

edit: ha, I mean startIndex == endIndex of course. Not so trivially composable after all!


Add average to FloatingPoint arrays
(Paul Cantrell) #23

And count == 0, of course. It’s always a relief to know other people are human too!

And it does prove the point that even tiny cognitive burdens such as figuring out whether == means “empty” or “not empty” can introduce bugs that clearly named conveniences can help avoid.


(Ben Cohen) #24

It probably also demonstrates that isEmpty is the wrong polarity, like so many people keep saying and that I’ve been clinging to denial about…


(Anthony Latsis) #25

I would say the polarity of a boolean property is a situation similar to that meme about the blue-or-golden dress: if you get it wrong it’s either a coincidence or a way of thinking rather than the polarity being incorrect :)


(Soroush Khanlou) #26

:tada:

Examples here are Set forwarding to contains, and the MultiSet/CountedSet example above, right? Anything I’m missing?

A little off-topic, but I think a lot of that is that ! (meaning “not”) is a lot harder to read than people want to admit. C.f. the containsOnly/all function, which is “just” two ! operators + contains, which yields absolutely inscrutable code.


(Ben Cohen) #27

The ranges could also return 1 or 0 depending on whether the element was within bounds. And maybe RangeExpression could too depending on what RangeExpression.contains(_: Bound) should mean (as being discussed here). Also Dictionary could optimize to first check the key, then check the value.


Count(of:) and contains(other:) for Collection
(David Waite) #28

Seems more appropriate to have on Collection, as Sequence only has a property to (under)estimate the count.


(Letanyan Arumugam) #29

I think that’s becuase count needs to be O(1). where as count(where:) can/will? be O(n). So there doesn’t seem to be a need to limit ourselves, unless you have another reason?

Edit: also sequence can be infinite


(David Waite) #30

Count is not defined to be O(1) on Collection - it is O(1) on RandomAccessCollection. https://developer.apple.com/documentation/swift/collection/2950091-count


(Letanyan Arumugam) #31

Correct, I shouldn’t be so hasty when reading. Thinking about it more I do think we should probably be more consistent with Sequence as there would’ve been a valid reason count isn’t defined for it.


(Michael Sanford) #32

Ran a small test. Below is a slightly more performant version that uses @_inlineable. This is about 60% faster than a lazy.filter implementation.

extension Sequence {
    @_inlineable public func count(where predicate: (Element) throws -> Bool) rethrows -> Int {
        return try reduce(0) { try predicate($1) ? $0+1 : $0 }
    }
}

(Soroush Khanlou) #33

count is unavailable on Sequence because some sequences are single-pass only, and iterating through it to see how many elements you have would destroy the sequence, and properties aren’t supposed to mutate a type. I believe this is why underestimatedCount exists — it doesn’t consume the sequence to give you a value, and you can use it for stuff like Array.reserveCapacity.

So the reduce implementation is 2x faster than for loop? Or it’s 2x faster than a not @_inlineable reduce?


(Anthony Latsis) #34

Here’s what I got for the average time over 5k random arrays of 10 integers.

0.00184147862333509 // loop
0.00237991754542921 // reduce
0.00132116151556829 // inline reduce

So the loop is ~22% faster than reduce, inline reduce is ~44% faster than reduce and ~28% faster than loop


(Peng Guo) #35

Awesome!
agree that count(where:) is the better one.


(Chris Lattner) #36

+1 to filling in obvious missing functionality, thanks @soroush!

-Chris