Add `extract(where:)` method to `RangeReplaceableCollection`

I recently started a thread over in Evolution about this, but figured I would turn it into an actual pitch to solicit further feedback about including this method in the Standard Library.

The motivation is that frequently, when removing elements from some Collection, we want to retrieve the elements that were removed. Currently, this is not possible to do in bulk with methods in the standard library, since removeAll(where:) does not return the removed items. This idiom is already valuable enough to be included in the Standard Library in several places: RanceReplaceableCollection returns Self.Element from remove(at:), Set returns Element? from remove(_:) and update(with:). Per @jrose on my Evolution thread, removeAll(where:) neglects to return the removed items in the interest of efficiency, not for any philosophical reason (if removeAll(where:) were modified to return these elements, it would always be performing an extra allocation even in situations where the result was not desired.

This idea has been brought up before with a positive, but limited response. Based on this thread (as well as the suggestion from @xwu, I propose the addition of a method with the following signature to RangeReplaceableCollection:

@discardableResult mutating func extract(where shouldBeExtracted: (Self.Element) throws -> Bool) -> [Self.Element] rethrows

The extract(where:) method would iterate through the collection, removing elements for which shouldBeExtracted returns true, collecting them in the array, and returning the array upon completion for further processing.

For a use case, consider the small example of a list of pending tasks which is periodically searched for tasks which are ready to be executed. Using extract(where:), this could be implemented as follows:

let readyTasks = taskList.extract(where: { $0.isReady })
readyTasks.forEach { $0.execute() }

Proposed alternatives in the Evolution thread using removeAll(where:) are less readable, less efficient, or perform consuming work in the shouldBeRemoved predicate:

// Poor readability
var x = (0...10).shuffled() 
let n = x.partition{$0 < 5}
print(x[0..<n]) // Everything 5 and above 
print(x[n...]) // Everything less than 5

// Most correct, but highly verbose compared to extract(where:)
var readyTasks: [Task] = []
taskList.removeAll(where: {
    let isReady = $0.isReady
    if isReady { readyTasks.append($0) }
    return isReady
})
readyTasks.forEach { $0.execute() }

// Invokes execute() as part of an inclusion predicate--surprising!
// Also, not feasible if processing of the removed elements array as a whole
// is desired.
taskList.removeAll(where: {
    let isReady = $0.isReady
    if isReady { $0.execute() }
    return isReady
})

The implementation of extract(where:) can be very similar to that of removeAll(where:) except that the removed suffix would be copied into an array. This addition would also open up the door to extractSubrange(_:) and possibly extractLast(_:) and extractFirst(_:).

I've run into the need for this method several times, so I would love to hear if it is a common enough use case to consider inclusion in the Standard Library. If it seems like there is a decent response, I can start of a formal proposal soon.

EDIT: As suggested by @Avi, a better signature removes the @discardableResult attribute, so that the signature is just:

mutating func extract(where shouldBeExtracted: (Self.Element) throws -> Bool) -> [Self.Element] rethrows
6 Likes

I don't think @discardableResult is appropriate. If the result is being discarded, it's semantically- and behaviorally identical to removeAll(where:). I think a caller should be explicit if they are calling extract(where:) and discarding the result.

23 Likes

To be honest the word extract does not imply to me that the elements in the collection are being removed instead of just copied. I'd like to take the chance and pitch an alternative name. What would the community think about discardAll(where:), discardSubrange(_:), discardLast(_:) and discardFirst(_:)? This would align well with what @Avi said, because discarding the result from a discarding method would not make much sense for the API.

1 Like

I believe the name extract(where:) accurately describes this operation.

  • Clearly implies side effects. (Those with side-effects should read as imperative verb phrases.)
  • Has literal meaning:

    extract verb 1. to remove or take out, especially by effort or force.

The word "discard", however, implies that things discarded are no longer useful, which is not the intended semantics of this API. Its meaning is closer to remove(where:).

discard verb 1. to get rid of especially as useless or unwanted

16 Likes

To me extract(where) (while I think it should be called extractAll(where:)) feels like filter(where:) to be honest, but again that might be only me. ;) I see the word extract analog to the unzip which clearly does not imply any removal of the original values.

3 Likes

I think filter(where:) would also be a great name for this API if we didn't allow those naming exceptions (filter, dropFirst, etc, imperative verbs for non-mutating operations) in the standard library in the first place.

Another suggestion I had for the name was filterOut(where:). I agree that the discard... variants imply ignoring the removed values (in fact, removeAll(where:) and discardAll(where:) where removeAll is the returning version seems the most clear so far, if we ignore the fact that it would modify the existing behavior of removeAll)

Agreed. Had imagined it being @discardableResult when it was a modification of removeAll(where:) but as a separate API it should be incorrect to not use the result.

Extract is very clear to me, and I wouldn't confuse it with filter.

3 Likes

Lately, for any function that traditionally returns an Array of some element, I make a master version that takes the return type as a function parameter, then make the regular version call the master with Array.self as the extra argument. I was going to suggest that, then I remembered that some existing algorithms have a RangeReplaceableCollection overload that returns Self. Maybe you should do that here, instead of a generic Array. (The user could always explicitly copy to a new Array if Self isn't random-access and/or mutable but s/he needs that.)

1 Like

I think you're right that returning Self makes sense here rather than (possibly) jumping across type boundaries. Will include that in the proposal writeup.

A related method that I've seen a lot of requests for is the non-mutating version of this, which returns two arrays filled with the elements that pass and fail a predicate. I've seen it sometimes written as dividing(where:):

// Sequence method:
func dividing(where predicate: (Element) -> Bool)
    -> (matching: [Element], nonmatching: [Element])

let numbers = 1...10
let divided = numbers.dividing(where: { $0.isMultiple(of: 2) })
// divided.matching == [2, 4, 6, 8, 10]

Is there interest in this as well? Is there a name or pair of names that could indicate the relationship between these? (I'm not sure if extracting(where:) makes sense.)

The partition(by:) method is more closely the non-mutating version of what you want, though perhaps there would be utility for a partitioning(by:) method. Could return either (secondPartitionIndex: Index, result: Self) or (firstPartition: Self, secondPartition: Self).

1 Like

It seems like a proposal to cover this feature may as well propose extractSubrange(_:) and extract(First/Last)(_:) as well. Is there any reason to not include these functions?

Here's the beginnings of a proposal, will continue to flesh it out over the coming days.

1 Like

Oh, good point. The nonmutating version would be better linked up with that method.

  • Typo in the implementation snippet. You forgot All after extract.

  • And I suggested discardAll not just discard. ;)

  • The usage example requires () after execute.

  • You should also mention that modifying removeAll to return something would be a potentially source breaking change regardless the discarding attribute. predicates.forEach(collection.removeAll(where:)) for example will stop compiling.

1 Like

Thanks for taking the time to look it over! Edits incoming (EDIT: And pushed!).

Is there a way to get this to compile today? As written I thought the partial application of removeAll(where:) would be disallowed since it's mutating.

1 Like

Ah sorry you're totally right, I forgot about that. That does not compile today, excuse me for confusion.

1 Like

No worries. If you do think of any other ways in which the @discardableResult alternative would be source-breaking let me know—that should definitely be called out in the proposal.