Counting in sequences, both conditional and unconditional

CTMacUser · November 8, 2019, 3:37am

Continuing the discussion from [Accepted] SE-0220: count(where:):

I've been thinking about counting lately. As I was thinking, I remembered that we added a Sequence.count(where:) to implement that, but the compiler keeps choking on it due to name conflicts. Maybe we should just punt and rename it:

extension Sequence {

    /// Determines how many elements qualify for the given predicate.
    ///
    /// - Parameter isIncluded: A predicate that takes an `Element` value that
    ///   shall return `true` only when its given element should be counted.
    /// - Returns: The number of elements that returned `true` from the calls to
    ///   `isIncluded`.
    ///
    /// - Complexity: O(*n*), where *n* is the length of the sequence.
    public func countThoseMatching(by isIncluded: (Element) throws -> Bool) rethrows -> Int {
        var count = 0
        for e in self where try isIncluded(e) {
            count += 1
        }
        return count
    }

}

extension Sequence where Element: Equatable {

    /// Determines how many elements equal the given value.
    ///
    /// - Parameter value: A value to compare all the elements against.
    /// - Returns: The number of elements that equal `value`.
    ///
    /// - Complexity: O(*n*), where *n* is the length of the sequence.
    public func countMatches(to value: Element) -> Int {
        return countThoseMatching(by: { $0 == value })
    }

}

But right now I need an unconditional count. I realized that count(where:) is the same as filter(_:) followed by an unconditional counting method. I was about to post that here, but I counter-realized the manual method doesn't cover predicates that could throw. (Plus the user experience issues mentioned in the original thread.)

An unconditional count could be done with count(where:) with { _ in true }, but that doesn't sound efficient. Unless it can be shown that using the always-true predicate always elides the loop's test during the SIL or LLVM phase, then we need something like:

extension Sequence {

    /// Measures the length of the sequence by burning through its elements.
    ///
    /// - Returns: The number of elements in the sequence
    ///
    /// - Complexity: O(*n*), where *n* is the length of the sequence.
    public func overworkedCount() -> Int {
        var count = 0
        for _ in self {
            count += 1
        }
        return count
    }

}

unconditionalCount() could be a less-cute name. My playground accepted count(), but I don't want to risk causing another compiler name-overload conflict.

xwu · November 8, 2019, 3:57am

In the rare case that you need to consume a sequence that isn’t a collection for the count: reduce(0) { x, _ in x + 1 }. This should be rare. Otherwise count would be a property on Sequence.

CTMacUser · November 8, 2019, 4:19am

That's an even worse user experience, and possibly less obvious, than count(where: {_ in true}). Most non-mutating methods of Sequence/Collection can be massaged into forms of reduce, but we don't do that.

count shouldn't be a property on Sequence because it can't be non-destructive in general, and couldn't work for infinite sequences anyway.

wadetregaskis · November 8, 2019, 5:42am

Indeed - so how would overworkedCount work on an infinite sequence, to not just loop forever, and what should it return as an Int?

I’m not sure of the merits of this for the standard library - it feels unusual and by that standard maybe not worth inclusion in the standard library, as opposed to a 3rd party library or just ad-hoc if & when it’s needed - but I certainly don’t mind the intellectual exercise of thinking about how one can do this…

My first thought was that this shouldn’t prioritise counting in the name, but rather consumption of the sequence. It should be more that the number of items enumerated is returned as a side-effect. That way the consequences of exhausting a potentially one-time-enumerable sequence are foremost in the user’s mind. So something more like:

extension Sequence {
  public func truncate() -> Int? {
    guard if some_magic_to_detect_infinite_sequences? else {
      return nil
    }

    var count = 0

    for _ in self {
      count += 1
    }

    return count
  }
}

CTMacUser · November 8, 2019, 8:05pm

overworkedCount goes through every element, so it must soft(?)-lock its program when used on an infinite sequence. It's no different than the theoretical Sequence.count property, but I feel that a different between representing a status as a property vs. a method is that we expect a property to never fail and work in reasonable time. Using a method as I did is a hint that determining the desired status may take unreasonable time (in some circumstances); that the user should read the documentation at least once to check if there's any downsides.

Do you feel that

mySequence.count(where: { _ in true })

is a better (i.e. more Swift-y) way to determine the entire count without inflicting any storage penalty? I wouldn't mind this, modulo we add it to the documentation of count(where:), if it can be shown that it will elide out any conditional checks in the SIL (or IR-Gen) and said elision semantics would be preserved in any updates to SIL (and/or IR-Gen).

So, we could rename the method to "consumeAll()"? It would still have to return a non-optional Int. Should we make the return value discardable? But the name is non-obvious for users that want an unconditional count, especially if they know the sequence is multi-pass. The documentation would have to warn on soft-locking when the sequence is infinite.

There is no way to detect infinite sequences in advance without adding a new customization point for that purpose. It would probably have to return true by default. (But instances of Collection would override it to return false.)