gather
Should Be Independent of RangeSet
[I have a few issues with the proposal that I'll be raising in separate posts.]
One of the issues I had with the previous proposal was that it failed to separate independent concerns. The most glaring examples—the methods that mutate a RangeSet
based on a Collection
—were removed from in this proposal, but one still remains: there's an overload of gather
that takes a RangeSet
argument.
To review, the proposed gather
methods are as follows:
extension MutableCollection {
/// Collects the elements at the given indices just before the element at
/// the specified index.
@discardableResult
public mutating func gather(
_ indices: RangeSet<Index>, at insertionPoint: Index
) -> Range<Index>
/// Collects the elements that satisfy the given predicate just before
/// the element at the specified index.
@discardableResult
public mutating func gather(
at insertionPoint: Index, where predicate: (Element) -> Bool
) -> Range<Index>
}
Why is the first method's first parameter a RangeSet
? Its algorithm will work with any representation of a set of a collection's indices. In fact, all it needs is a predicate that tells us whether the element at a given index should be gathered. If someone is already working with a Set<Index>
, or already has a predicate that answers the question, we should not force them to reconstitute it as a RangeSet<Index>
. That code looks like this:
let s = someSetOfIndices()
// O(a.count), plus an allocation
let rs = RangeSet(
a.indices.lazy
.filter { i in s.contains(i) }
.map { i in i..<a.index(after: i))
a.gather(rs, at: destination)
Actually, that's pretty fancy code. It's more likely they'd end up with this, which performs worse:
let s = someSetOfIndices()
// Same big-O, but probably many allocations
var rs = RangeSet<A.Index>()
for i in a.indices where s.contains(i) {
rs.insert(contentsOf: i..<a.index(after: i))
}
a.gather(rs, at: destination)
and I doubt most people would be aware of the difference.
The second gather
overload is almost what we'd need to solve the problem, but its predicate works on elements, rather than indices. With a gather
method that takes a predicate over indices, the user could just write:
let s = someSetOfIndices()
a.gather(at: destination) { i in s.contains(i) }
That version of gather
could be declared as follows (but see below for a problem):
/// Collects the elements at positions satisfying the given predicate
/// just before the element at the specified index, preserving their
/// relative order.
@discardableResult
public mutating func gather(
at insertionPoint: Index, where predicate: (Index) -> Bool
) -> Range<Index>
The One True gather
It's tempting to think we could use this new gather method to do the work of the second overload:
a.gather(at: destination) { i in somePredicateOverElements(a[i]) }
The problem is that the closure captures a
, which will force the first mutation made by gather
to copy a
's storage, even if it was otherwise uniquely-referenced.
It's also problematic to add this new gather method as an overload, because there are common cases where the Index
type is the same as that of the Element
(e.g. Array<Int>
), which would make invoking the method ambiguous.
So, if we are going to solve this problem with one method taking a predicate argument, it should be a binary predicate over elements and indices:
/// Collects the elements satisfying the given predicate over their
/// values and positions just before the element at the specified index,
/// preserving their relative order.
@discardableResult
public mutating func gather(
at gatheringPlace: Index,
where predicate: (Element, Index) -> Bool
) -> Range<Index>
Using it with a RangeSet
or a Set<Index>
would be equally simple:
a.gather(at: gatheringPlace) { _, i in someSetOfIndices.contains(i) }
Once you have this method, it seems to me that the second overload buys very little; you get to drop “, _
” from
a.gather(at: gatheringPlace) { e, _ in somePredicate(e) }
The Generic Approach
Another way of making gather
independent of RangeSet
would be to
- introduce a primitive set protocol that only supports containment testing
- make the first
gather
overload generic over the set type
- preserve the second
gather
overload
But this seems to add lots of complexity with little benefit. You still need two overloads. That primitive set protocol is, semantically, nothing more than a predicate function that answers the containment question. Also, if what you're starting with is a predicate over indices, you'd have to build a conforming wrapper type just to use it, which would interrupt the flow of code.
Conclusion
gather
should be a single method taking a binary predicate as suggested above. The simplicity of the result, both for the library and for its users, is telling us that's the right design. I hope we'll listen.
-Dave