Add `extract(where:)` method to `RangeReplaceableCollection`

A related method that I've seen a lot of requests for is the non-mutating version of this, which returns two arrays filled with the elements that pass and fail a predicate. I've seen it sometimes written as dividing(where:):

// Sequence method:
func dividing(where predicate: (Element) -> Bool)
    -> (matching: [Element], nonmatching: [Element])

let numbers = 1...10
let divided = numbers.dividing(where: { $0.isMultiple(of: 2) })
// divided.matching == [2, 4, 6, 8, 10]

Is there interest in this as well? Is there a name or pair of names that could indicate the relationship between these? (I'm not sure if extracting(where:) makes sense.)

The partition(by:) method is more closely the non-mutating version of what you want, though perhaps there would be utility for a partitioning(by:) method. Could return either (secondPartitionIndex: Index, result: Self) or (firstPartition: Self, secondPartition: Self).

1 Like

It seems like a proposal to cover this feature may as well propose extractSubrange(_:) and extract(First/Last)(_:) as well. Is there any reason to not include these functions?

Here's the beginnings of a proposal, will continue to flesh it out over the coming days.

1 Like

Oh, good point. The nonmutating version would be better linked up with that method.

  • Typo in the implementation snippet. You forgot All after extract.

  • And I suggested discardAll not just discard. ;)

  • The usage example requires () after execute.

  • You should also mention that modifying removeAll to return something would be a potentially source breaking change regardless the discarding attribute. predicates.forEach(collection.removeAll(where:)) for example will stop compiling.

1 Like

Thanks for taking the time to look it over! Edits incoming (EDIT: And pushed!).

Is there a way to get this to compile today? As written I thought the partial application of removeAll(where:) would be disallowed since it's mutating.

1 Like

Ah sorry you're totally right, I forgot about that. That does not compile today, excuse me for confusion.

1 Like

No worries. If you do think of any other ways in which the @discardableResult alternative would be source-breaking let me know—that should definitely be called out in the proposal.

I've updated the proposal with suggested changes, a better implementation, and a link to the implementation PR. I've also expanded it to include extractFirst(_:), extractLast(_:), and extractSubrange(_:).

I think rather than cut straight to what you're proposing, it is worth thinking about some more fundamental capabilities to add the the standard library, from which these operations can be composed. Once these are added, building the capabilities you lay out in the proposal becomes a lot easier, and may no longer be necessary, though perhaps still worthwhile.

Those building blocks include stable partition, and the ability to "move" the tail of a range-replaceable collection into an array. The stable partition would be the starting point for how you would build extract(where:) efficiently, by partitioning into two sections, kept and extracted elements. Then, you would use the ability to move the tail of extracted elements into a new array. It ought to be possible to make this second part very efficient for types backed by contiguous storage, like array, because if the element type is bitwise-takeable it can just be a memcpy + resize. This approach also has the benefit of working with move-only types in the future (I think, anyway...).

Breaking these into two operations also has the benefit that you need not do any allocation when what you want is to delete certain elements, but before you delete them, do something extra with them. If partition gives you back the pivot point, you can use range subscripting to give you a slice of the to-be-removed elements, which you can use, then delete using removeSubrange.

Once you have the move-tail feature, extractLast(count:) becomes trivial, because it's just a question of finding the index n from the end and then moving that out into an array.

extractFirst(n) and extracting an interior subrange is trickier, but could be done by using a rotate operation instead of partitioning.

9 Likes

Ah, in fact I now see some of these points were also made on the original thread. I'd strongly recommend considering those points further. Breaking the problem down into smaller composable features will allow for a wider variety of use cases.

Thanks for looking through the proposal! I agree that bringing both of these features would make the implementation a lot more straightforward. My understanding on the stable partition front was that an implementation which uses a stable partition would only be appropriate for RangeReplaceableCollection where Self: MutableCollection, just how removeAll(where:) is currently implemented. This algorithmic improvement would definitely be welcome, but something compatible with a pure RangeReplaceableCollection would also be necessary, no?

Regarding the tail-move functionality, I think I'm probably just a bit out of my depth on what exactly would go into that. How would it differ from, e.g. Array(self[index..<end])?

Thinking about this a bit more, RangeReplaceableCollection.extractSubrange(_:) -> Self is probably what's needed. This would be more efficient when done from the end than the start/middle for most collections, but it doesn't need to be limited to that. It could be a customization point that allowed types like Array to do memory-unsafe implementations, with a default implementation in terms of the existing capabilities of RRC.

1 Like

Non-Mutable RangeReplaceableCollections are very rare. Mainly just String (because its elements are variable width). Maybe some fancy heterogenous-but-contiguous collection could be too.

The pattern with removeAll(where:) is that there's a default implementation for pure RRCs that just creates a new version of the collection and self-assigns, but then there's an optimized version that does a half-stable partition when the collection is mutable.

So just to clarify, you're suggesting adding extractSubrange(_:) as a protocol member rather than just an extension method? And would there be a more clever default implementation than the one currently in the proposal?

public mutating func extractSubrange(_ bounds: Range<Index>) -> Self {
    let extracted = Self (self[bounds])
    self.removeSubrange(bounds)
    return extracted
}

Also, as far as moving stablePartition(isSuffixElement:) into the standard library, do you think that would deserve its own proposal, or is it appropriate to do it here? I pictured the two issues a separate: an improvement of the RangeReplaceableCollection API vs. a possible future improvement of the implementation.

As a further thought, how do you view this interacting with the _customRemove(First/Last)(_:) system? IMO any extract(First/Last)(_:) should respect those as well, which forces the extract operation to copy the to-be-removed elements before we call out to the customization points. Is it worth adding new _customExtract(First/Last)(_:) methods as customization points as well?

I was just talking with a co-worker about this. I've written some utility functions that do something similar:

extension Collection {
    
    /// Removes the first `k` elements from the collection and returns them as
    /// a subsequence.
    ///
    /// - Parameter k: The number of elements to remove. Must not be larger than
    ///                the collection’s `count`.
    /// - Returns: The elements that have been removed from the collection.
    mutating public func removeAndReturnFirst(_ k: Int) -> Self.SubSequence {
        precondition(
            k >= 0, "Number of elements to remove should be non-negative")
        
        precondition(
            count >= k,
            "Can't remove more items from a collection than it contains")
        
        let offset = Swift.min(count, k)
        
        let sequence = self[startIndex ..< index(startIndex, offsetBy: offset)]
        _ = self.dropFirst(k)
        
        return sequence
    }
    
    /// Removes all of the elements from the collection and returns them as a
    /// subsequence.
    ///
    /// - Note: The collection must not be empty.
    ///
    /// - Returns: The elements that have been removed from the collection.
    mutating public func removingAll() -> Self.SubSequence {
        precondition(!isEmpty,
                     "Can't remove all items from an empty collection")
        
        return removeAndReturnFirst(count)
    }
    
}

The main use case I have is when I want to remove the elements of a collection and do some operation on them. I want it to be as close to atomic as possible. In my case, here’s where I use it:

private func unregisterKeyboardObservers() {
    keyboardObservers.removingAll()
        .forEach(NotificationCenter.default.removeObserver)
}

I really like the simplicity of passing removeObserver directly here. I’m in favor of this proposal, although it’s a bit different than what I came up with above.

1 Like

As per @jawbroken, I agree that extract feels like the correct name for this. However, is it not better to use extractAll to keep it similar to removeAll?

2 Likes

@Ben_Cohen What do you see as the most reasonable path forward on this? Should the proposal be postponed following a rotate/stablePartition proposal, or is it appropriate to fold those in to this proposal? And what do you think the best way to address the _customRemove... customization points is?