[Benchmarks] Selection: Linear-Time Order-Statistics

Here are benchmarks to investigate the impact of a pitch to declare a selection algorithm in Swift-Algorithms:

We begin with a “simple” implementation of QuickSelect:

func Pivot(left: Int, right: Int) -> Int {
  (left + right) / 2
}

func Partition<Element: Comparable>(
  _ a: inout Array<Element>,
  left: Int,
  right: Int,
  pivot: inout Int
) {
  let temp = a[pivot]
  a.swapAt(pivot, right)
  pivot = left
  for i in (left ..< right) {
    if a[i] < temp {
      a.swapAt(pivot, i)
      pivot += 1
    }
  }
  a.swapAt(pivot, right)
}

func QuickSelect<Element: Comparable>(
  _ a: inout Array<Element>,
  k: Int
) {
  var left = 0
  var right = (a.count - 1)
  while left < right {
    var pivot = Pivot(
      left: left,
      right: right
    )
    Partition(
      &a,
      left: left,
      right: right,
      pivot: &pivot
    )
    if k < pivot {
      // go left!
      right = (pivot - 1)
      continue
    }
    if pivot < k {
      // go right!
      left = (pivot + 1)
      continue
    }
    // found it!
    break
  }
}

For simplicity let’s go ahead and start with this version. A “production” scale version of QuickSelect could implement some important optimizations… but this is good enough for now to achieve an expected linear-time performance.

Let’s also implement a MinSelect that uses the min method from Swift-Algorithms:

import Algorithms

func MinSelect<Element: Comparable>(
  _ a: Array<Element>,
  k: Int
) -> Element {
  precondition(k + 1 < (a.count / 10))
  return a.min(count: k + 1)[k]
}

Our current implementation of min runs on sorted from Standard Library when the count is greater than ten percent of the Array.[1] Let’s precondition here and check the performance of a full sort in another place.

Here is a HeapSelect that uses the Heap data structure from Swift-Collections:

import Collections

func HeapSelect<Element: Comparable>(
  _ a: Array<Element>,
  k: Int
) -> Element {
  var heap = Heap<Element>(a.prefix(k + 1))
  for i in a.suffix(from: k + 1) {
    if i < heap.max! {
      heap.replaceMax(with: i)
    }
  }
  return heap.max!
}

Here’s some simple infra to define a benchmark run:

import Foundation

func Measure<T>(
  _ label: String,
  _ cycles: Int,
  setUp: () -> () = { },
  body: () -> (T),
  condition: (T) -> Bool = { _ in true },
  tearDown: () -> () = { },
) {
  var duration = Duration.nanoseconds(0)
  
  for _ in (1 ... cycles) {
    setUp()
    let clock = ContinuousClock()
    let instant = clock.now
    let result = body()
    duration += instant.duration(to: clock.now)
    precondition(condition(result))
    tearDown()
  }
  
  duration /= cycles
  
  let microseconds = duration.formatted(
    .units(
      allowed: [.microseconds],
      fractionalPart: .show(length: 3)
    )
  )
  print("\(label): \(microseconds)")
}

Let’s define some benchmark runs on an Array of n elements:

func Benchmark(
  n: Int,
  cycles: Int
) {
  let a = Array(1 ... n)
  let k = (a.count / 10) - 2
  do {
    var temp = a
    Measure("SortSelect", cycles) {
      temp.shuffle()
    } body: {
      temp.sort()
      return temp[k]
    } condition: { result in
      result == a[k]
    }
  }
  do {
    var temp = a
    Measure("MinSelect", cycles) {
      temp.shuffle()
    } body: {
      MinSelect(temp, k: k)
    } condition: { result in
      result == a[k]
    }
  }
  do {
    var temp = a
    Measure("HeapSelect", cycles) {
      temp.shuffle()
    } body: {
      HeapSelect(temp, k: k)
    } condition: { result in
      result == a[k]
    }
  }
  do {
    var temp = a
    Measure("QuickSelect", cycles) {
      temp.shuffle()
    } body: {
      QuickSelect(&temp, k: k)
      return temp[k]
    } condition: { result in
      result == a[k]
    }
  }
}

Each measurement attempts to select the element at index k using a different approach:

  • We use the sort algorithm from Standard Library to modify an Array in-place.
  • We use the min algorithm from Swift-Algorithms to compute the k + 1 smallest elements.
  • We use the Heap data structure from Swift-Collections to compute the k + 1 smallest elements.
  • We use our QuickSelect algorithm to modify an Array in-place.

After each measurement we run a simple soundness check to confirm the algorithm selected the correct element.

We can then run these benchmarks:

func main() {
  let sizes = [100, 1_000, 10_000, 100_000]
  for size in sizes {
    print("--- Size: \(size) ---")
    Benchmark(
      n: size,
      cycles: 100_000
    )
    print()
  }
}

main()

Results

Here are some results from a MacBook Pro M2 Max building for release:

--- Size: 100 ---
SortSelect: 1.689 μs
MinSelect: 0.682 μs
HeapSelect: 0.548 μs
QuickSelect: 0.553 μs

--- Size: 1000 ---
SortSelect: 25.108 μs
MinSelect: 11.397 μs
HeapSelect: 6.229 μs
QuickSelect: 5.431 μs

--- Size: 10000 ---
SortSelect: 352.044 μs
MinSelect: 240.842 μs
HeapSelect: 67.784 μs
QuickSelect: 53.734 μs

--- Size: 100000 ---
SortSelect: 4,758.252 μs
MinSelect: 13,157.398 μs
HeapSelect: 957.129 μs
QuickSelect: 536.502 μs

SortSelect uses sorting from Standard Library to sort all n elements. Our expected runtime is O(n log n): slower than O(n).

MinSelect returns faster than SortSelect for small collections, but there seems to be an “accidental quadratic” that makes the current implementation much slower for large collections.[2] Our expected runtime is O(k log k + nk): slower than O(n).

The runtime complexity of HeapSelect is O(n log k): this is faster than sorting all the elements, but still slower than O(n).

QuickSelect returns fastest in expected linear time. One important benefit of QuickSelect is this operation is in-place: we only need to consume constant space if we do not need to preserve the order of our original Array.

An important observation here is that SortSelect is guaranteed to be stable. Our QuickSelect currently makes no guarantees about stability. If we had multiple elements that compared as equal by-value but were not identical we would no longer preserve the order they originally appeared.

Conclusions

An efficient selection algorithm looks like an impactful addition to Swift-Algorithms. This would be a good choice to have available for engineers that need to select an order statistic without performing the work to sort additional elements.

Next Steps

An important area of research would be to benchmark and compare different selection algorithms against each other. The version of QuickSelect presented here is not meant to be the final version that ships to production. QuickSelect is not the only selection algorithm available. Solutions like IntroSelect are used in C++ and Rust for improved worst-case performance.

Another area of research would be to investigate how a selection algorithm could also introduce stability. This does not need to block our “unstable” selection algorithm from shipping.


  1. https://github.com/apple/swift-algorithms/blob/1.2.1/Sources/Algorithms/MinMax.swift#L267-L271 ↩︎

  2. https://github.com/apple/swift-algorithms/blob/1.2.1/Sources/Algorithms/MinMax.swift#L34 ↩︎

1 Like