`count(where:)` on Sequence

soroush · March 20, 2018, 2:17pm

While Swift's Sequence models brings a lot of niceties that we didn't have access to in Objective-C, like map and filter, there are other useful operations on Sequences that the standard library doesn't support yet. I'd like to pitch one today: count(where:), which counts the number of objects in a Sequence that passes some test.

Usage

While this behavior can currently be approximated with a filter and a count, that approach creates an intermediate array which it immediately discards. This is a bit wasteful, especially for long sequences:

// not as good
[1, 2, 3, -1, -2].filter({ $0 > 0 }).count // => 3

Simple usage:

// good
[1, 2, 3, -1, -2].count(where: { $0 > 0 }) // => 3

I use count(where:) in my code regularly, and I think it'd make a nice addition to the standard library.

Naming

In earlier versions of Swift, first and first(where:) would collide, causing bad errors. This is a potential problem here as well (because count exists on Collection). However, I think count(where:) is the best name for this function, and if it causes bad diagnostics, we should treat those as bugs to be fixed. However, I'm interested in hearing any other names for this function.

Equatable version

One thing that we should discuss is if there should be a version (perhaps called count(of:)?) that works on Sequences whose Element is Equatable and returns the count of all objects that are equal to the parameter. I've never had a use for this function, but I would be open to including it in the proposal if other people want it.

Proposal and Implementation

If this pitch is regarded well, I'd be happy to write a more detailed proposal and open a pull request with an implementation and tests!

A sample implementation

extension Sequence {
    func count(where predicate: (Element) throws -> Bool) rethrows -> Int {
        var count = 0
        for element in self {
            if try predicate(element) {
                count += 1
            }
        }
        return count
    }
}

extension Sequence where Element: Equatable {
    func count(of elementToTest: Element) -> Int {
        var count = 0
        for element in self {
            if element == elementToTest {
                count += 1
            }
        }
        return count
    }
}

anthonylatsis · March 20, 2018, 3:14pm

It's hard to believe this hasn't been discussed previously though.

If there is a question about adding count(of: Element), it's worth mentioning that it should be more convenient to require it at least in some protocol refining Collection, otherwise it would be a trivial method for Set.

davedelong · March 20, 2018, 3:16pm

This is an obvious, no-brainer huge improvement.

A big +1 on both the functionality and the name.

Karl · March 20, 2018, 3:23pm

This is why we have lazy operations.

[1, 2, 3, -1, -2].lazy.filter { $0 > 0 }.count

Jens · March 20, 2018, 3:39pm

soroush:

While this behavior can currently be approximated with a filter and a count, that approach creates an intermediate array which it immediately discards. This is a bit wasteful, especially for long sequences:
// not as good
[1, 2, 3, -1, -2].filter({ $0 > 0 }).count // => 3

I think it would be more natural to use reduce rather than filter for this:

[1, 2, 3, -1, -2].reduce(0) { $1 > 0 ? $0 + 1 : $0 }

soroush · March 20, 2018, 3:41pm

Yes, this is true, but I want to note a few things here:

From a "reading code" perspective, this looks way worse — is the lazy extraneous noise or is it important? It's hard to tell as a passing reader.
From a "writing code" perspective, this isn't how someone would write this code by default. I think one of the values of Swift is making the easy thing the correct thing as well, and getting performance gains without having to think about it fits into that nicely imo.
You can make this claim about other standard lib functions as well:

Oh contains? that's just seq.lazy.filter(pred).count == 1. Why would you need contains in the standard library?

We add this stuff because it makes things more composable, reduces cognitive strain while reading, and generally makes our lives easier.

Same deal here, but even worse from both a reading and writing code perspective.

stephencelis · March 20, 2018, 3:42pm

I can see count(where:) being a nice alias to this if the performance is the same. It's easy to overlook lazy (it can also affect type-checking performance).

Jens · March 20, 2018, 3:46pm

I agree that it could probably be a good addition to the stdlib, but the proposal could mention

[1, 2, 3, -1, -2].lazy.filter { $0 > 0 }.count
[1, 2, 3, -1, -2].reduce(0) { $1 > 0 ? $0 + 1 : $0 }

In addition to

[1, 2, 3, -1, -2].filter { $0 > 0 }.count

and motivate the need for count(where: ) just like you did above.

soroush · March 20, 2018, 3:48pm

Great call! I will definitely put these in there and the rationale for why they're subpar.

mpangburn · March 20, 2018, 4:40pm

Hah, I was planning on writing up this exact pitch this week!
Agreed that count(where:) is the best name.

A tiny detail: your sample implementation doesn't actually include the where argument label.

soroush · March 20, 2018, 4:55pm

Fixed!

Nevin · March 20, 2018, 4:58pm

In spreadsheets this is spelled countIf(), though I agree that in Swift it should be count(where:).

Here is an implementation using reduce():

extension Sequence {
  func count(where f: (Element) throws -> Bool) rethrows -> Int {
    return try reduce(0) { (n, x) in try f(x) ? n+1 : n }
  }
}

Paul_Cantrell · March 20, 2018, 9:16pm

+1 to the proposed functionality.

+1 to the name. It’s the name I would look for if I were searching for this behavior, and would make sense reading it in context if I'd never seen it before.

Yes, it's an easy behavior to implement this ad hoc, but the simplicity and readability benefits justify the minor additional API surface.

Karl · March 20, 2018, 10:07pm

I disagree.

I don't get this point. You can make this claim about literally anything in the universe which you don't understand.
It's how I would write the code, first time. Because I understand what .lazy does, and how incredibly useful it is in a variety of situations. For example, I use lazy-flatMaps all the time when I need to downcast-and-filter all elements of a collection: self.subviews.lazy.flatMap { $0 as? UILabel }.forEach { /* update every label with the new font, etc */ }.
You could certainly make those claims. But I'm going to go back to your original motivation:

"While this behavior can currently be approximated with a filter and a count, that approach creates an intermediate array which it immediately discards. This is a bit wasteful, especially for long sequences"

Your motivation was to avoid temporaries - in that case, the general solution is to use lazy. That's what I meant (above), not that I would do it like this because I'm some amazing programmer, but that the .lazy subsystem is an essential part of Swift that all programmers should learn. That's why it's in the standard library.

If you're experienced enough to be concerned with temporaries, it's time to introduce yourself to this part of the language. Maybe we could use some better intermediate-level documentation?

soroush · March 21, 2018, 4:44am

I'm not making a claim about understanding it or not, I'm making a claim about noise. More noise is harder to read for everyone.

As a big part of my contract work, I do lots of code review. In my experience, people write the code that comes to them the easiest. To solve this problem, people always do filter + count, without a lazy in sight. My goal with this pitch is to make the easy thing also the correct thing.

I’m not really sure how to respond to this. If temporaries are something for “intermediate-level” programmers, does that mean only “intermediate” and expert programmers who know about lazy should get to access optimizations like this?

Karl · March 21, 2018, 5:02am

It's only noise because the person reading it doesn't understand why it's there. You or I would know, of course, and perhaps wouldn't regard it as noise.

In general, for ad-hoc one-liners like that, the compiler should eliminate the temporary for you, even without lazy (even if it doesn't - I didn't check - it should).

.lazy is more useful when you need to chain operations or box them in an AnyCollection (or... to save lines... when you're lazy...). But yeah, the lazy system is really great, has lots of features and detailed optimisations from the Swift standard library maintainers (see the current topic about lazy compactMap), composes incredibly well, etc. It should be one of the first things any Swift programmer learns once they progress past the fundamentals. I'm not sure it's worth replicating its functionality at the top-level.

Paul_Cantrell · March 21, 2018, 6:25am

.lazy is indeed a wonderful and powerful tool! I'd like to see it more widely known and used, and also supported in more situations in the stdlib. No argument there!

I'd respectfully disagree with this. The point of convenience methods is not merely to hide complexity from beginners, but also:

to make code more readable and information-bearing by reducing the verbose repetition of common patterns, and
to reduce the surface area for bugs inherent in such repetitions.

Understanding the more verbose form does not mitigate these problems; on the contrary, it exacerbates them. A conditioned “oh, I know that pattern” response leads to skimming, and thus to code blindness. Ergonomics doesn’t just save keystrokes; it prevents bugs.

There are in fact very few methods that we strictly need on Sequence and Collection; most of them are just conveniences (semantically, at least). Why do we need .count when you can just do .reduce(0) { c, _ in c + 1 }? Even if compiler magic made it so there were no performance difference, .count would still pull its weight. This:

let hasDuplicates = (items.count != Set(items).count)

…is simply more readable than this:

let hasDuplicates = (items.reduce(0) { c, _ in c + 1 } != Set(items).reduce(0) { c, _ in c + 1 })

…and this version has a subtle bug that is very hard to spot in all the visual noise:

let hasDuplicates = (items.reduce(0) { c, _ in c + 1 } != Set(items).reduce(0) { _, c in c + 1 })

None of that is to say a word against .lazy, long may it live! It is to say rather that the fact that a new method is purely a convenience does not make it undesirable; rather, we should think about the tradeoff between having crisper, more readable code versus having a smaller API footprint. In this case, that tradeoff squarely favors count(where:).

soroush · March 21, 2018, 3:06pm

I'm not sure I follow here — which protocol should it go on if not Sequence? Set's conformance would always return 1 or 0, but I think that's actually okay?

It also raises another interesting question: what should the NSCountedSet (or any multiset) implementation return? With a naive implementation, NSCountedSet currently returns 1 (the cast to Int is because NSCountedSet isn't generic (yet?)):

let cs = NSCountedSet(array: [1, 1, 2, 3, 3, 3])

cs.count // => 3

Array(cs) // => [3, 1, 2]

let m = cs.count({ $0 as? Int == 1}) // => 1

I think the correct answer here is to return that there are 2 elements that pass the test?

anthonylatsis · March 21, 2018, 3:15pm

Well, formally it's okay, but in practice it's kind of redundant. It becomes a second contains(Element).

A set should return 1 or 0, by definition, could you clarify why 2 elements?

soroush · March 21, 2018, 3:20pm

Because NSCountedSet isn't generic, it's hard to show. But imagine NSCountedSet were generic, and we constructed one that was generic over an Equatable type, like Int. If that were the case, we'd get count(of:), right?

 let cs = NSCountedSet(array: [1, 1, 2, 3, 3, 3]) // => imagine this is an NSCountedSet<Int>

// inherited from this pitch
cs.count(of: 1) // => 1 

// part of the NSCountedSet API
cs.count(for: 1) // => 2

I think these two methods should return the same thing, otherwise it would be very confusing for end users.