Sorted(by:) for lazy collections does not use call-by-need

Hello.
In my application I have the following code snippet:

Sorted lazy collection
let array = [1, 10, 5, 6, 20, 45, 3, 12, 14, 21, 6, 7]
var i = 0
let result = array.lazy.sorted { a, b -> Bool in
    print("predicate " + String(i))
    i += 1
    return a < b
}.first
print(result ?? 0)

Predicate that I pass to sorted(by:) is calling 35 times. However I expect that it will be called less times, because I use a lazy collection and I need only the first item of sorted result. As I know sorted(by:) uses the Introsort algorithm, that's why I guess it is possible to use call-by-need for it.

I suppose the compiler might be able to optimize it with special knowledge of the fact that a sort followed by first is a min operation. But this isn’t really an optimization specific to being lazy though as it can be applied to the eager version as well.

1 Like

Hello @Letan
Thank you for your answer. Now I try this code:

Optimisation with min(by:)
let array = [1, 10, 5, 6, 20, 45, 3, 12, 14, 21, 6, 7]
var i = 0
let result = array.min { a, b -> Bool in
    print("predicate " + String(i))
    i += 1
    return a < b
}
print(result ?? 0)

As I can see predicate was called 11 times only. However I cannot understand why predicate is calling 35 times with lazy.sort(by:).first.

The sorting algorithm might not necessarily place items in their correct position after each comparison. For example, your array is less than 20 elements, so the stdlib will use an insertion sort algorithm to sort your array. Using a modified version of the stdlib code to print out the state of the array will produce the following results:

Array states after swaps

(note I used the same array as you except that I moved 1 to a the 5th position to show how long it takes to get into it's correct position)

[10, 10, 6, 20, 1, 45, 3, 12, 14, 21, 6, 7]
[5, 10, 10, 20, 1, 45, 3, 12, 14, 21, 6, 7]
[5, 6, 10, 20, 20, 45, 3, 12, 14, 21, 6, 7]
[5, 6, 10, 10, 20, 45, 3, 12, 14, 21, 6, 7]
[5, 6, 6, 10, 20, 45, 3, 12, 14, 21, 6, 7]
[5, 5, 6, 10, 20, 45, 3, 12, 14, 21, 6, 7]
[1, 5, 6, 10, 20, 45, 45, 12, 14, 21, 6, 7]
[1, 5, 6, 10, 20, 20, 45, 12, 14, 21, 6, 7]
[1, 5, 6, 10, 10, 20, 45, 12, 14, 21, 6, 7]
[1, 5, 6, 6, 10, 20, 45, 12, 14, 21, 6, 7]
[1, 5, 5, 6, 10, 20, 45, 12, 14, 21, 6, 7]
[1, 3, 5, 6, 10, 20, 45, 45, 14, 21, 6, 7]
[1, 3, 5, 6, 10, 20, 20, 45, 14, 21, 6, 7]
[1, 3, 5, 6, 10, 12, 20, 45, 45, 21, 6, 7]
[1, 3, 5, 6, 10, 12, 20, 20, 45, 21, 6, 7]
[1, 3, 5, 6, 10, 12, 14, 20, 45, 45, 6, 7]
[1, 3, 5, 6, 10, 12, 14, 20, 21, 45, 45, 7]
[1, 3, 5, 6, 10, 12, 14, 20, 21, 21, 45, 7]
[1, 3, 5, 6, 10, 12, 14, 20, 20, 21, 45, 7]
[1, 3, 5, 6, 10, 12, 14, 14, 20, 21, 45, 7]
[1, 3, 5, 6, 10, 12, 12, 14, 20, 21, 45, 7]
[1, 3, 5, 6, 10, 10, 12, 14, 20, 21, 45, 7]
[1, 3, 5, 6, 6, 10, 12, 14, 20, 21, 45, 45]
[1, 3, 5, 6, 6, 10, 12, 14, 20, 21, 21, 45]
[1, 3, 5, 6, 6, 10, 12, 14, 20, 20, 21, 45]
[1, 3, 5, 6, 6, 10, 12, 14, 14, 20, 21, 45]
[1, 3, 5, 6, 6, 10, 12, 12, 14, 20, 21, 45]
[1, 3, 5, 6, 6, 10, 10, 12, 14, 20, 21, 45]
[1, 3, 5, 6, 6, 7, 10, 12, 14, 20, 21, 45]
The modified code I used to check

Just added some print's.

extension MutableCollection where Self: BidirectionalCollection {
  @inlinable
  internal mutating func insertionSort(
    within range: Range<Index>, 
    by areInIncreasingOrder: (Element, Element) throws -> Bool
  ) rethrows {
    guard !range.isEmpty else { return }

    let start = range.lowerBound

    // Keep track of the end of the initial sequence of sorted elements. One
    // element is trivially already-sorted, thus pre-increment Continue until
    // the sorted elements cover the whole sequence
    var sortedEnd = index(after: start)

    while sortedEnd != range.upperBound {
      // get the first unsorted element
      // FIXME: by stashing the element, instead of using indexing and swapAt,
      // this method won't work for collections of move-only types.
      let x = self[sortedEnd]

      // Look backwards for x's position in the sorted sequence,
      // moving elements forward to make room.
      var i = sortedEnd
      repeat {
        let j = index(before: i)
        let predecessor = self[j]

        // If closure throws, put the element at right place and rethrow.
        do {
          // if x doesn't belong before y, we've found its position
          if try !areInIncreasingOrder(x, predecessor) {
            break
          }
        } catch {
          self[i] = x
          throw error
        }

        // Move y forward
        self[i] = predecessor
        print(self)
        i = j
      } while i != start

      if i != sortedEnd {
        // Plop x into position
        self[i] = x
      }
      formIndex(after: &sortedEnd)
    }
    print(self)
  }
}

var array = [10, 5, 6, 20, 1, 45, 3, 12, 14, 21, 6, 7]
var i = 0
array.insertionSort(within: array.startIndex..<array.endIndex) { a, b -> Bool in
//  print("predicate " + String(i))
  i += 1
  return a < b
}

Basically, you can't really know whether the item in the first position (any position) is correct until you completely sort the array.

2 Likes

When you write let sorted = array.lazy.sorted() nothing is sorted yet. When you then call sorted.first, the array is actually sorted, and the first element returned. Note, however, that even though sorting is delayed, the sorting algorithm is run to its completion before returning the first element. This is because .first cannot short circuit anything. It needs to know that the collection/sequence is indeed sorted.

Writing is a one-liner doesn't change that.

However, min(by:) doesn't have to complete the sort, if it knows that is has found the smallest element.

Thank you for a very detailed explanation. But what about the situation when array's size is greater than 20? I guess in such case the Introsort algorithm begins as the Quicksort algorithm and at the first iteration I will know that about a half of items will be not in the begin of a sorted collection. However when I tried sorted(by:) with a huge array of integers I've seen that it works the same way both for lazy and non-lazy collections. Could you please explain me why?

Thank you for your answer. But as I know the Introsort algorithm works as the Quicksort algorithm on first iterations and in such case I know that I can reduce the amount of comparations if I need only the first item of a sorted collection.

This isn't true. Sorting is always eager. lazy isn't a universal "deferring" mechanism.

You can see this in action if you write a comparable wrapper that prints:

struct NoisyComparable<Base: Comparable>: Comparable {
  let _base: Base
  static func < (lhs: NoisyComparable, rhs: NoisyComparable) -> Bool {
    print("comparing \(lhs._base) with \(rhs._base)")
    return lhs._base < rhs._base
  }
}

let array = [5,2,1,4].map { NoisyComparable(_base: $0) }

let sorted = array.lazy.sorted() // will print out on comparison here
2 Likes

Because they are the same :) LazyCollection uses the sorted method found on Sequence because it doesn't conform to MutableCollection or provide its own sort method. Looking at the Sequence definition we can see that a new ContigousArray must eagerly be created then sorted.

sorted definition for Sequence
  @inlinable
  public func sorted(
    by areInIncreasingOrder:
      (Element, Element) throws -> Bool
  ) rethrows -> [Element] {
    var result = ContiguousArray(self)
    try result.sort(by: areInIncreasingOrder)
    return Array(result)
  }
2 Likes

It is certainly true that there are partial sorting algorithms that are more efficient when you know in advance you only want the top n entries. It would be great to add these to the standard library. But I think it's unrealistic to expect partial sorting to kick in automatically via the lazy property when you then call a method on a subset of a lazy collection (e.g. via first or prefix or slicing in general). The mechanisms for this to work just aren't there.

1 Like

Yeah. I was apparently confused. I guess my point was still kinda the same you're making: That .lazy doesn't make everything deferred, and at some point the sequence have to be walked. I was only for some reason confused as to which step this happened at. I guess it doesn't make much sense to return a LazySortedSequence.

Thanks for the clarification :-)