Avoiding COW in MutableCollection's slice setter

I have an algorithm to collapse duplicate elements in a collection.

I couldn't find anything in the standard library that would help with this, so the solution I came up with is to iterate the collection, finding ranges of duplicates, and copying the regions between strings of duplicates to an earlier position which overwrites all but the first previously-discovered duplicate.

I'd rather not use RangeReplaceableCollection/replaceSubrange because it has to account for arbitrarily growing/shrinking the collection, when really I'm just shifting some content down a bit. Ideally, the shifting operations would optimize down to a memmove.

So it seems that MutableCollection is probably a more suitable generic constraint; with the caveat that since it cannot change the overall size of the collection, there will be some junk remaining at the tail-end, which the caller will have to remove/ignore (perhaps by calling removeSubrange, or by using a slice for any further processing). This isn't a public API, so I'm okay with some amount of awkwardness if it means I can definitely avoid copies and reduce code-size.

extension Collection {

  /// Returns the first slice of 2 or more consecutive elements which match the given predicate,
  /// or `nil` if the collection does not contain a run of 2 or more consecutive elements which match the predicate.
  ///
  /// ```swift
  /// let testOne = [1, 2, 3, 4]
  /// testOne.firstConsecutiveElements { $0 == 2 } // nil
  ///
  /// let testTwo = [1, 2, 2, 3, 4]
  /// testTwo.firstConsecutiveElements { $0 == 2 } // [2, 2], at indices 1..<3
  /// ```
  ///
  internal func firstConsecutiveElements(matching predicate: (Element) -> Bool) -> SubSequence? {

    var i = startIndex
    while let firstMatch = self[i...].firstIndex(where: predicate) {
      let run = self[index(after: firstMatch)...].prefix(while: predicate)
      if !run.isEmpty {
        return self[firstMatch..<run.endIndex]
      }
      i = run.endIndex
    }
    return nil
  }
}

extension MutableCollection where Self: RandomAccessCollection {

  /// Collapses runs of 2 or more consecutive elements which match the given predicate, keeping only the first element from each run.
  ///
  /// This function overwrites elements of the collection without changing its overall length (as that is not supported by `MutableCollection`).
  /// That means the collection will contain a tail region of old elements which need to be discarded (either removed/deinitialized, or simply ignored by subsequent
  /// processing). This function returns the index from which that tail region starts, and it is the caller's responsibility to ignore/discard elements from that position.
  ///
  /// The following example demonstrates removing consecutive `2`s from an `Array`.
  /// Note the call to `removeSubrange` to discard elements in the tail region:
  ///
  /// ```swift
  /// var elements = [1, 2, 2, 3, 2, 2, 2, 4, 5, 2, 2, 2, 2, 6, 7, 8, 2]
  /// let end = elements.collapseConsecutiveElements(from: 3, matching: { $0 == 2 })
  /// elements.removeSubrange(end...)
  /// print(elements) // [1, 2, 2, 3, 2, 4, 5, 2, 6, 7, 8, 2]
  /// ```
  ///
  /// - parameters:
  ///   - beginning: The index to begin searching from.
  ///   - predicate: The predicate by which elements are grouped. Consecutive elements that match this predicate are collapsed; only the first
  ///                matching element of each run is kept.
  /// - returns: The index after the last meaningful element in the collection.
  ///            Elements in positions `returnValue..<endIndex` will have already been collapsed or copied to an earlier position in the collection,
  ///            and should be discarded or ignored.
  ///
  internal mutating func collapseConsecutiveElements(
    from beginning: Index, matching predicate: (Element) -> Bool
  ) -> Index {

    var readHead = beginning
    guard let firstRun = self[readHead...].firstConsecutiveElements(matching: predicate) else {
      return endIndex // Nothing to collapse.
    }
    var writeHead = index(after: firstRun.startIndex) // Keep one of the duplicates.
    readHead = firstRun.endIndex

    while let dupes = self[readHead...].firstConsecutiveElements(matching: predicate) {
      let elementsToKeep = self[readHead..<index(after: dupes.startIndex)] // Keep one of the duplicates.
      let nextWriteHead = index(writeHead, offsetBy: elementsToKeep.count)
      self[writeHead..<nextWriteHead] = elementsToKeep
      writeHead = nextWriteHead
      readHead = dupes.endIndex
    }

    let tailLength = distance(from: readHead, to: endIndex)
    let newEnd = index(writeHead, offsetBy: tailLength)
    self[writeHead..<newEnd] = self[readHead..<endIndex]
    return newEnd
  }
}

I decided to do some basic scaling tests, in a worst-case situation where every 2 consecutive elements are considered duplicates and the 3rd element breaks the chain. Calling this on an UnsafeMutableBufferPointer<Int>, the function scales linearly up to 10 million elements:

Doing the same on an Array<Int>, I still get correct results, but the performance looks somewhere between quadratic (for n up to around 200,000) to ~cubic (up to 1 million elements. Collapsing duplicates from 1M elements took nearly 20 minutes, so I just gave up after that and considered the point proven). Debug vs. release mode doesn't appear to have any impact on the trend.

image

I assume this is due to COW - that the slice on the RHS of the subscript setter makes the array non-uniquely-referenced. I notice that MutableCollection doesn't actually provide any performance guarantees for its slicing setter (only a O(1) guarantee for constructing a slice from a Range<Index>), so technically this isn't out of spec, but it is unfortunate.

I would really rather not drop down to unsafe constructs just to avoid COW; even though I'm okay with having a slightly awkward API, I'm not really okay with eliminating bounds-checking and possibly violating memory safety if there's a bug in the code.

Is there any way to avoid this? It seems to me like MutableCollection is missing a requirement to copy content from some range of indices to a destination index. It's possible to do it with swap, but that would perform many more writes than is necessary, and would seem to preclude optimizing the copy to a memmove (which does appear to happen for UMBP<Int>).

1 Like

By holding onto a slice while writing the shifted elements you're forcing a copy.

      let elementsToKeep = self[readHead..<index(after: dupes.startIndex)] // Keep one of the duplicates.
      let nextWriteHead = index(writeHead, offsetBy: elementsToKeep.count)
      self[writeHead..<nextWriteHead] = elementsToKeep

But it seems that this range-replacement assignment can easily be written as a loop assigning single elements.

1 Like

Hmm, so I rewrote it to set each index individually rather than using bulk assignment:

internal mutating func collapseConsecutiveElements(
    from beginning: Index, matching predicate: (Element) -> Bool
  ) -> Index {

    var readHead = beginning
    guard let firstRun = self[readHead...].firstConsecutiveElements(matching: predicate) else {
      return endIndex // Nothing to collapse.
    }
    var writeHead = index(after: firstRun.startIndex) // Keep one of the duplicates.
    readHead = firstRun.endIndex

    while let dupes = self[readHead...].firstConsecutiveElements(matching: predicate) {
      let keepEnd = index(after: dupes.startIndex) // Keep one of the duplicates.
      let numToWrite = distance(from: readHead, to: keepEnd)
      let nextReadHead = dupes.endIndex // So we don't hold the slice 'dupes'.
      for _ in 0..<numToWrite {
        self[writeHead] = self[readHead]
        formIndex(after: &writeHead)
        formIndex(after: &readHead)
      }
      readHead = nextReadHead
    }

    let tailLength = distance(from: readHead, to: endIndex)
    for _ in 0..<tailLength {
      self[writeHead] = self[readHead]
      formIndex(after: &writeHead)
      formIndex(after: &readHead)
    }
    return writeHead
  }

And indeed, the performance seems much better when using Array! :tada: uhm... in release mode (linear plot, overall time):

chart_release

Unfortunately, in debug builds, it's still quadratic (again, linear, overall time):

I'm not sure how I feel about this workaround. Debug performance isn't super-critical, but it's not unimportant, either. Algorithmically-worse performance is very harsh: the overall execution time for 512k elements is only ~8ms in a release build, but almost 2 minutes in a debug build :dizzy_face:. I don't expect to be processing half a million elements very often in debug builds, but still... it's just really harsh.

It's probably still worth having bulk-assignment built-in to MutableCollection. This way is more awkward to write, and the penalty for not knowing to do it is very severe. It also puts operations like bounds-checking and indexing operations in a loop, and the compiler seems unable/unwilling to optimise those in many situations.

What is quite interesting is that UMBP no longer does bulk assignment via memmove. However, if I remove traps on index overflow (which I am convinced is entirely safe for all Collections), it will instead use wide vector registers to bulk copy, which is probably even better (that's why the compiler chose to do it, I suppose).