startOfSuffix() can be much simplified by using `.lastIndex`?

young · May 11, 2021, 12:25am

I got the answer on how to implement dropLast(while:). I was using suffix(while:) from Swift Algorithms. So I was looking at the source of .suffix(while:) and saw this:

extension BidirectionalCollection {
  /// Returns the inclusive lower bound of the suffix of elements that satisfy
  /// the predicate.
  ///
  /// - Parameter predicate: A closure that takes an element of the collection
  ///   as its argument and returns `true` if the element is part of the suffix
  ///   or `false` if it is not. Once the predicate returns `false` it will not
  ///   be called again.
  ///
  /// - Complexity: O(*n*), where *n* is the length of the collection.
  @inlinable
  internal func startOfSuffix(
    while predicate: (Element) throws -> Bool
  ) rethrows -> Index {
    var index = endIndex
    while index != startIndex {
      let after = index
      formIndex(before: &index)
      if try !predicate(self[index]) {
        return after
      }
    }
    return index
  }
}

seems this is kind of re-inventing the wheel and can be much simplified by using BidirectionalCollection.lastIndex(where:).

xwu · May 11, 2021, 2:12am

This issue of implementation has been discussed:

Add `SortedArray`, `SortedSet` and `SortedDictionary` with binary search

Add `SortedArray`, `SortedSet` and `SortedDictionary` with binary search

It seems rather antithetical to the goal of “use the standard algorithms that are already written” when both the standard library and the Algorithms package choose to spend a dozen lines reimplementing something twice (with complicated nested loops whose exit conditions are hidden in the middle, no less!), that would be easy one-liners using the standard index-finding methods.

Programming isn't turtles all the way down -- the cost of providing really nice abstractions is that somebody somewhere needs to implement them in terms of more primitive tools. The engineers working on the stdlib and these packages sweat the details of implementing these so that others don't have to.

young · May 12, 2021, 1:19am

I don't understand why someone would rather write some duplicate code to do exactly what BidirectionalCollection.lastIndex does. This startOfSuffix() can be writen in one short line using .lastIndex(where:) without any loop and early exit.

nnnnnnnn · May 12, 2021, 3:17pm

It will probably help to see the difference if we look at what an implementation of startOfSuffix(while:) on top of lastIndex(where:) looks like:

  func startOfSuffix(
    while predicate: (Element) throws -> Bool
  ) rethrows -> Index {
    guard let lastFailingIndex = lastIndex(where: { !predicate($0) }) else {
      return startIndex
    }
    return index(after: lastFailingIndex)
  }

Compared to the implementation in the Algorithms package, this approach requires two (or three) extra operations:

The result of lastIndex(where:) needs to be boxed into an Optional and unwrapped
There's an extra call to index(after:)

Those differences are why it's implemented at that level — startOfSuffix(while:) and its counterpart, endOfPrefix(while:), essentially end up as building blocks used in a few different algorithms. (All that said, I don't believe we've benchmarked the difference in performance, so it may end up being moot.)

Nevin · May 12, 2021, 3:38pm

And it’s worth noting this call may be arbitrarily expensive. For example in String it could involve traversing a Character with any number of code-points.

young · May 12, 2021, 3:58pm

I prefer this way:

try lastIndex(where: { try !predicate($0) }).map { index(after: $0) } ?? startIndex

less to read and still quite clear.

is index(after:) generally problematic, on a BidirectionalCollection?

kylemacomber · May 12, 2021, 4:43pm

It's not generally problematic, but it's unnecessary work. For the Algorithms package and the Standard Library, there's rarely a good excuse for doing unnecessary work. As Karoy said:

It would be really unfortunate if someone had to re-implement one of our algorithms to be faster because we had opted for a slightly more concise but less efficient solution.

young · May 12, 2021, 5:41pm

Maybe add a comment stating this code is written for maximum performance is the reason why .lastIndex is not used. Or stating this "re-write for max performance" choice in the doc.

And it seems this makes the case that .lastIndex is insufficient for this use case, as such, the startOfSuffix() is generally useful, maybe it should be public?

And should the stdlib address this apparent shortcoming of .lastIndex? Perhaps provide a variant that does what the startOfSuffix() does? Or one the returns both (lastIndex, after)

xwu · May 12, 2021, 6:59pm

I think these questions are addressed adequately by @lorentey's discussion on the thread linked above. I quote some salient parts below:

It is the goal of the standard library (and algorithms library, which is a proving ground for the standard library) to provide a rich array of useful algorithms for general use. It's a non-goal to remove the need for raw loops to be used in its own implementations, and by the same token, the need for raw loops in the standard library or algorithms library doesn't demonstrate that there's a missing building block that's generally useful. Nor is it even really remarkable enough to warrant a comment in the code, since doing this legwork in the standard library and algorithms library so that users don't have to is the entire point of these libraries.