Updating removeAll(where:) to return removed elements

Jumhyn · February 20, 2019, 10:58pm

Often, I find myself wanting to remove items satisfying some predicate and pull them into an array for some sort of processing. Thus, I wish the signature of removeAll(where:) was:

@discardableResult mutating func removeAll(where: (Iterator.Element) throws -> Bool) -> [Self.Element] rethrows

This would enable code such as

let readyTasks = taskList.removeAll(where: { $0.isReady })
readyTasks.forEach { $0.execute() }

Rather than taking the extra step:

let readyTasks = taskList.filter { $0.isReady }
taskList.removeAll(where: { $0.isReady })
readyTasks.forEach { $0.execute() }

In fact, the above might be infeasible if the closure supplied is a particularly expensive operation. Is there currently another way to replicate this behavior that I'm not thinking of?

jrose · February 20, 2019, 11:03pm

The general principle of return values (that I learned from @dabrahams) is that you should prefer returning information you already have to returning Void, but that you shouldn't do extra work to compute information that might go unused. Obviously that latter bit needs to be taken loosely, but in the case of producing an Array of removed elements, you'd end up with a new memory allocation—possibly multiple allocations, since you don't know how many elements are going to be in the result up front. That's probably a bit much for something as low-level as removeAll(where:).

That said, it's not unreasonable to define a second method on RangeReplaceableCollection that does do this. Whether or not such a method would belong in the standard library is, well, something that'd need to be discussed.

Nevin · February 20, 2019, 11:32pm

The standard library already has a partition method on MutableCollection, however it is not “stable”, meaning it can rearrange the order of elements. Usage is:

var x = (0...10).shuffled()
let n = x.partition{$0 < 5}
print(x[0..<n])  // Everything 5 and above
print(x[n...])   // Everything less than 5

If you need to preserve the ordering, you could copy and paste the implementation of stablePartition from the standard library’s non-public algorithm prototypes file.

Jumhyn · February 20, 2019, 11:35pm

That makes sense. The pattern of “remove/replace and return the old element” is present throughout the Standard Library already, as in RangeReplacableCollection’s remove(at:) and Set’s update(_:) and insert(_:), so I feel like there is an acknowledgement that knowing the elements you’ve removed is a useful feature.

As a side thought, I wonder if it would be feasible to have a feature where an @discardableResult function could specify work to be performed only when the result is actually requested by the caller... would require compiling two different versions of the function at the very least.

Jumhyn · February 20, 2019, 11:37pm

Nice. That’s a good solution, but I still feel like having a version that does it in a single line fits with other idioms in the language (noted in my previous comment). Creating a partition isn’t necessarily very clear at the call site, especially if you just want to put the items in the latter potion in their own array anyway.

jrose · February 20, 2019, 11:49pm

Yeah, it's not that returning the elements you've removed isn't useful; it's that it's basically free when it's a single element. (It's not 100% free, but it's still a lot cheaper than a memory allocation.)

Jumhyn · February 20, 2019, 11:59pm

Given that RangeReplacableCollection uses a partition to perform the removal all at once, would it be possible to take advantage of that to return some sort of slice without allocating all the space for the removed elements?

michelf · February 21, 2019, 12:30am

Jumhyn:

let readyTasks = taskList.filter { $0.isReady }
taskList.removeAll(where: { $0.isReady })
readyTasks.forEach { $0.execute() }
In fact, the above might be infeasible if the closure supplied is a particularly expensive operation. Is there currently another way to replicate this behavior that I'm not thinking of?

I think this should be quite efficient:

var readyTasks: [Task] = []
taskList.removeAll(where: {
    let isReady = $0.isReady
    if isReady { readyTasks.append($0) }
    return isReady
})
readyTasks.forEach { $0.execute() }

And if $0.execute is guarantied to never access taskList, you can even move it inside the closure to avoid allocating a temporary array.

taskList.removeAll(where: {
    let isReady = $0.isReady
    if isReady { $0.execute() }
    return isReady
})

And if you move the closure's logic to a separate function, you can write it like this:

taskList.removeAll(where: { $0.executeIfReady() })

bzamayo · February 21, 2019, 12:30am

I'll just weigh in here that getting the elements back is something I've needed a fair few times, and although I now have an extension for it, an efficient implementation of the actual removeAll(where:) algorithm is not trivial so it is something that seems worthy of standard library inclusion. This feels like the (rare) case where a standard library overload, that returns the removed elements, is a suitable solution.

jrose · February 21, 2019, 12:36am

Unfortunately yes, since nothing would be keeping the contents of the slice alive.

I didn't want to give the impression that it definitively wasn't worth putting in the standard library. It's that

it shouldn't replace the existing one, because it does have a significant performance cost for a low-level operation
it probably shouldn't have a base name of removeAll, because overloading on return type is really subtle and we generally try to avoid it

bzamayo · February 21, 2019, 12:52am

In this case, as the only overloads are void or array returns, isn’t the the chance of screwing up pretty low? I mean we are essentially only recreating the flexibility you’d get from @discardableResult.

jrose · February 21, 2019, 1:15am

:-/ Still seems very subtle to me, especially if they have different availability.

Jumhyn · February 21, 2019, 2:14am

michelf:

var readyTasks: [Task] = []
taskList.removeAll(where: { 
    let isReady = $0.isReady 
    if isReady { readyTasks.append($0) }
    return isReady 
}) 
readyTasks.forEach { $0.execute() }

This is the most direct solution that preserves the behavior of my hypothetical version of removeAll(where:). The version that calls execute() inline is good too, but fails if you are doing anything that requires you to use the array of removed objects as a whole. Things like:

let readyTasks = taskList.removeAll(where: { $0.isReady })
let alphabeticalTasks = readyTasks.sorted(by: { $0.name < $1.name })
alphabeticalTasks.forEach { $0.execute() }

In any case, the proposed alternatives aren't nearly as readable—doing consumptive work in the predicate supplied to removeAll violates the principle of least astonishment, even in the case where the work is offloaded to a separate method/function/closure.

Jumhyn · February 21, 2019, 2:20am

These are good points, though I agree with @bzamayo that the overload is distinctly less problematic in this case than in non-Void overloads. I would be fine with having this under a different name since I think the functionality is definitely worth introduction to the standard library. I think a name like filterOut(where:) is pretty clear but am open to other ideas as well.

xwu · February 21, 2019, 2:22am

extract(where:)

Maksim_Odnoletkov · February 21, 2019, 12:05pm

Always creating and returning an array seems to be very wasteful and unexpected.

From my experience you rarely need a materialized collection of removed elements – an array – but actually just want to enumerate them. So the partition(where:) approach is great for that if you don't mind the order. When I do I use:

let predicate: (Task) -> Bool = { $0.isReady }
taskList.lazy.filter(predicate).forEach { $0.execute() }
taskList.removeAll(where: not • predicate)

Jumhyn · February 21, 2019, 12:16pm

I’ve started a pitch thread so anyone with thoughts on this matter should chime in over there as well.

This code is not correct, which I think illustrates the utility of a function like this. At the end, taskList will contain all the tasks for which isReady == true, the exact opposite of the desired behavior. The dual iteration is also inefficient if taskList is large, or if the predicate was checking $0.expensiveOperationReturningBool() rather than $0.isReady.

Maksim_Odnoletkov · February 21, 2019, 12:26pm

Right, sorry I misunderstood intention of the original sample.

That's my point on efficiency – it depends on the situation. If taskList is large and predicate matches often, creating and returning a large array may be much more inefficient than double-iteration.

Jumhyn · February 21, 2019, 12:34pm

Definitely agreed that returning the removed should not always be the behavior—overloading or creating a new method is the way to go. The dual iteration approach also doesn’t cover the cases where you may not want to consume the removed items right away. I think there are enough use cases that this would be a valuable addition.