SE-0270 (Review #2): Add Collection Operations on Noncontiguous Elements

dabrahams · December 18, 2019, 3:20am

`gather` Should Be Independent of `RangeSet`

[I have a few issues with the proposal that I'll be raising in separate posts.]

One of the issues I had with the previous proposal was that it failed to separate independent concerns. The most glaring examples—the methods that mutate a RangeSet based on a Collection—were removed from in this proposal, but one still remains: there's an overload of gather that takes a RangeSet argument.

To review, the proposed gather methods are as follows:

extension MutableCollection {
    /// Collects the elements at the given indices just before the element at 
    /// the specified index.
    @discardableResult
    public mutating func gather(
        _ indices: RangeSet<Index>, at insertionPoint: Index
    ) -> Range<Index>

    /// Collects the elements that satisfy the given predicate just before
    /// the element at the specified index.
    @discardableResult
    public mutating func gather(
        at insertionPoint: Index, where predicate: (Element) -> Bool
    ) -> Range<Index>
}

Why is the first method's first parameter a RangeSet? Its algorithm will work with any representation of a set of a collection's indices. In fact, all it needs is a predicate that tells us whether the element at a given index should be gathered. If someone is already working with a Set<Index>, or already has a predicate that answers the question, we should not force them to reconstitute it as a RangeSet<Index>. That code looks like this:

let s = someSetOfIndices()

// O(a.count), plus an allocation
let rs = RangeSet(
        a.indices.lazy
            .filter { i in s.contains(i) }
            .map { i in i..<a.index(after: i))

a.gather(rs, at: destination)

Actually, that's pretty fancy code. It's more likely they'd end up with this, which performs worse:

let s = someSetOfIndices()

// Same big-O, but probably many allocations
var rs = RangeSet<A.Index>()
for i in a.indices where s.contains(i) {
   rs.insert(contentsOf: i..<a.index(after: i))
}

a.gather(rs, at: destination)

and I doubt most people would be aware of the difference.

The second gather overload is almost what we'd need to solve the problem, but its predicate works on elements, rather than indices. With a gather method that takes a predicate over indices, the user could just write:

let s = someSetOfIndices()
a.gather(at: destination) { i in s.contains(i) }

That version of gather could be declared as follows (but see below for a problem):

    /// Collects the elements at positions satisfying the given predicate
    /// just before the element at the specified index, preserving their
    /// relative order.
    @discardableResult
    public mutating func gather(
        at insertionPoint: Index, where predicate: (Index) -> Bool
    ) -> Range<Index>

The One True `gather`

It's tempting to think we could use this new gather method to do the work of the second overload:

a.gather(at: destination) { i in somePredicateOverElements(a[i]) }

The problem is that the closure captures a, which will force the first mutation made by gather to copy a's storage, even if it was otherwise uniquely-referenced.

It's also problematic to add this new gather method as an overload, because there are common cases where the Index type is the same as that of the Element (e.g. Array<Int>), which would make invoking the method ambiguous.

So, if we are going to solve this problem with one method taking a predicate argument, it should be a binary predicate over elements and indices:

    /// Collects the elements satisfying the given predicate over their
    /// values and positions just before the element at the specified index,
    /// preserving their relative order.
    @discardableResult
    public mutating func gather(
        at gatheringPlace: Index, 
        where predicate: (Element, Index) -> Bool
    ) -> Range<Index>

Using it with a RangeSet or a Set<Index> would be equally simple:

a.gather(at: gatheringPlace) { _, i in someSetOfIndices.contains(i) }

Once you have this method, it seems to me that the second overload buys very little; you get to drop “, _” from

a.gather(at: gatheringPlace) { e, _ in somePredicate(e) }

The Generic Approach

Another way of making gather independent of RangeSet would be to

introduce a primitive set protocol that only supports containment testing
make the first gather overload generic over the set type
preserve the second gather overload

But this seems to add lots of complexity with little benefit. You still need two overloads. That primitive set protocol is, semantically, nothing more than a predicate function that answers the containment question. Also, if what you're starting with is a predicate over indices, you'd have to build a conforming wrapper type just to use it, which would interrupt the flow of code.

Conclusion

gather should be a single method taking a binary predicate as suggested above. The simplicity of the result, both for the library and for its users, is telling us that's the right design. I hope we'll listen.

-Dave

SE-0270 (Review #2): Add Collection Operations on Noncontiguous Elements

gather Should Be Independent of RangeSet

The One True gather

The Generic Approach

Conclusion

`gather` Should Be Independent of `RangeSet`

The One True `gather`