Find joint maxima of sequence

Saklad5 · January 21, 2022, 8:54pm

There are currently several algorithms for finding the greatest element(s) of a sequence, such as:

Sequence.max(by:)
- O(n)
- Returns a single greatest element
Collection.max(count:sortedBy:)
- O(k log k + nk)
- Returns specified quantity of greatest elements, stably sorted

I propose adding an additional group:

Sequence.max(by:)
- O(n)
- Returns all greatest elements, preserving original order

This method takes a closure that accepts an Element and returns something conforming to Comparable. It returns the greatest result from across all elements in the sequence, along with the elements that produced it.

As an example of utility, consider an expensive operation used to determine the value of an element. If the outcome is tied between multiple elements, all such elements are desired. The number of elements with that highest value are not known in advance.

To accomplish this, a point of comparison (hereafter PoC) is produced from the first element in the sequence and established as a baseline. An array containing that element is also initialized. A PoC is then produced from each successive element: if equal to the baseline, the element is appended to the array. If greater than the baseline, the array is replaced by one containing the element, and the baseline is supplanted by the new PoC. When iteration is complete, the final array and PoC are returned.

An implementation is below. Note that there are many ways to generalize and specialize this:

A Collection implementation may use SE-0270’s RangeSet and accompanying operations to maintain indices during iteration rather than allocating new arrays of elements.
This algorithm could be generalized to accept a closure for equality and a closure for increasing order, at the cost of not preserving the point of comparison.
A sequence of Comparable elements could skip the closure entirely and make a number of simplifications relative to simply passing the identity function.^[1]
Counterparts for minima could be added.
A closure could be added for transforming elements during iteration, which is particularly helpful if Element is actually a tuple containing both the desired element and a point of comparison. Without this tweak, the point of comparison would be redundantly stored many times over.
An asynchronous version of this algorithm can be trivially implemented, which is actually what I originally designed. This is helpful for evaluating a TaskGroup when combined with the above modification, where the ChildTaskResult is a tuple consisting of a child task’s input and output.

extension Sequence {
  /// Returns the maximum elements in the sequence, using the given predicate to produce points of
  /// comparison.
  ///
  /// - Parameter pointOfComparison: A closure that accepts an element of this sequence as its
  /// argument and returns a point of comparison that can be used to determine ordering.
  /// - Returns: A tuple consisting of the sequence's maximum elements in their original order and
  /// the result of calling `pointOfComparison` with the first element thereof. If the sequence has
  /// no elements, returns `nil`.
  ///
  /// - Complexity: O(*n*), where *n* is the length of the sequence.
  /// - Postcondition: The maximum elements, if returned, will not be empty.
  func max<C: Comparable>(
    by pointOfComparison: (Element) throws -> C
  ) rethrows -> (maxima: [Element], pointOfComparison: C)? {
    var iterator = makeIterator()

    guard let first = iterator.next() else { return nil }

    return try IteratorSequence(iterator)
      .reduce(
        into: (maxima: [first], pointOfComparison: pointOfComparison(first))
      ) { partialResult, nextElement in
      switch try pointOfComparison(nextElement) {
      case ..<partialResult.pointOfComparison:
        return
      case partialResult.pointOfComparison:
        partialResult.maxima.append(nextElement)
      case let newBaseline:
        assert(newBaseline > partialResult.pointOfComparison)
        partialResult = ([nextElement], newBaseline)
      }
    }
  }
}

Note, however, that equality implies substitutability: this would be considerably less useful than it sounds, as there’s little reason to preserve multiple interchangeable values. ↩︎

Saklad5 · March 11, 2022, 7:48pm

Does any interest in this exist, or should I just make it a separate package?