Add `extract(where:)` method to `RangeReplaceableCollection`

I've updated the proposal with suggested changes, a better implementation, and a link to the implementation PR. I've also expanded it to include extractFirst(_:), extractLast(_:), and extractSubrange(_:).

I think rather than cut straight to what you're proposing, it is worth thinking about some more fundamental capabilities to add the the standard library, from which these operations can be composed. Once these are added, building the capabilities you lay out in the proposal becomes a lot easier, and may no longer be necessary, though perhaps still worthwhile.

Those building blocks include stable partition, and the ability to "move" the tail of a range-replaceable collection into an array. The stable partition would be the starting point for how you would build extract(where:) efficiently, by partitioning into two sections, kept and extracted elements. Then, you would use the ability to move the tail of extracted elements into a new array. It ought to be possible to make this second part very efficient for types backed by contiguous storage, like array, because if the element type is bitwise-takeable it can just be a memcpy + resize. This approach also has the benefit of working with move-only types in the future (I think, anyway...).

Breaking these into two operations also has the benefit that you need not do any allocation when what you want is to delete certain elements, but before you delete them, do something extra with them. If partition gives you back the pivot point, you can use range subscripting to give you a slice of the to-be-removed elements, which you can use, then delete using removeSubrange.

Once you have the move-tail feature, extractLast(count:) becomes trivial, because it's just a question of finding the index n from the end and then moving that out into an array.

extractFirst(n) and extracting an interior subrange is trickier, but could be done by using a rotate operation instead of partitioning.

9 Likes

Ah, in fact I now see some of these points were also made on the original thread. I'd strongly recommend considering those points further. Breaking the problem down into smaller composable features will allow for a wider variety of use cases.

Thanks for looking through the proposal! I agree that bringing both of these features would make the implementation a lot more straightforward. My understanding on the stable partition front was that an implementation which uses a stable partition would only be appropriate for RangeReplaceableCollection where Self: MutableCollection, just how removeAll(where:) is currently implemented. This algorithmic improvement would definitely be welcome, but something compatible with a pure RangeReplaceableCollection would also be necessary, no?

Regarding the tail-move functionality, I think I'm probably just a bit out of my depth on what exactly would go into that. How would it differ from, e.g. Array(self[index..<end])?

Thinking about this a bit more, RangeReplaceableCollection.extractSubrange(_:) -> Self is probably what's needed. This would be more efficient when done from the end than the start/middle for most collections, but it doesn't need to be limited to that. It could be a customization point that allowed types like Array to do memory-unsafe implementations, with a default implementation in terms of the existing capabilities of RRC.

1 Like

Non-Mutable RangeReplaceableCollections are very rare. Mainly just String (because its elements are variable width). Maybe some fancy heterogenous-but-contiguous collection could be too.

The pattern with removeAll(where:) is that there's a default implementation for pure RRCs that just creates a new version of the collection and self-assigns, but then there's an optimized version that does a half-stable partition when the collection is mutable.

So just to clarify, you're suggesting adding extractSubrange(_:) as a protocol member rather than just an extension method? And would there be a more clever default implementation than the one currently in the proposal?

public mutating func extractSubrange(_ bounds: Range<Index>) -> Self {
    let extracted = Self (self[bounds])
    self.removeSubrange(bounds)
    return extracted
}

Also, as far as moving stablePartition(isSuffixElement:) into the standard library, do you think that would deserve its own proposal, or is it appropriate to do it here? I pictured the two issues a separate: an improvement of the RangeReplaceableCollection API vs. a possible future improvement of the implementation.

As a further thought, how do you view this interacting with the _customRemove(First/Last)(_:) system? IMO any extract(First/Last)(_:) should respect those as well, which forces the extract operation to copy the to-be-removed elements before we call out to the customization points. Is it worth adding new _customExtract(First/Last)(_:) methods as customization points as well?

I was just talking with a co-worker about this. I've written some utility functions that do something similar:

extension Collection {
    
    /// Removes the first `k` elements from the collection and returns them as
    /// a subsequence.
    ///
    /// - Parameter k: The number of elements to remove. Must not be larger than
    ///                the collectionā€™s `count`.
    /// - Returns: The elements that have been removed from the collection.
    mutating public func removeAndReturnFirst(_ k: Int) -> Self.SubSequence {
        precondition(
            k >= 0, "Number of elements to remove should be non-negative")
        
        precondition(
            count >= k,
            "Can't remove more items from a collection than it contains")
        
        let offset = Swift.min(count, k)
        
        let sequence = self[startIndex ..< index(startIndex, offsetBy: offset)]
        _ = self.dropFirst(k)
        
        return sequence
    }
    
    /// Removes all of the elements from the collection and returns them as a
    /// subsequence.
    ///
    /// - Note: The collection must not be empty.
    ///
    /// - Returns: The elements that have been removed from the collection.
    mutating public func removingAll() -> Self.SubSequence {
        precondition(!isEmpty,
                     "Can't remove all items from an empty collection")
        
        return removeAndReturnFirst(count)
    }
    
}

The main use case I have is when I want to remove the elements of a collection and do some operation on them. I want it to be as close to atomic as possible. In my case, hereā€™s where I use it:

private func unregisterKeyboardObservers() {
    keyboardObservers.removingAll()
        .forEach(NotificationCenter.default.removeObserver)
}

I really like the simplicity of passing removeObserver directly here. Iā€™m in favor of this proposal, although itā€™s a bit different than what I came up with above.

1 Like

As per @jawbroken, I agree that extract feels like the correct name for this. However, is it not better to use extractAll to keep it similar to removeAll?

2 Likes

@Ben_Cohen What do you see as the most reasonable path forward on this? Should the proposal be postponed following a rotate/stablePartition proposal, or is it appropriate to fold those in to this proposal? And what do you think the best way to address the _customRemove... customization points is?