Add IndexSet based subscript to Collection where its index is Int, subscript setter to MutableCollection

First time trying to contribute here so be kind :grinning:. Also let me know if this has already been proposed and/or is in the process of being adopted. I'm kind of surprised that it's not yet in the library nor have I been able to find an equivalent proposal.

IndexSet is a basic type in the Swift standard library but it barely has any default support in the library collections (i.e. Array). This makes bridging Foundation logic returning or using IndexSet with Swift logic awkward.

Even beyond Foundation interaction, it's still a useful type to use when dealing with collections of items that are expensive to copy or parallel arrays of related items.

Adding IndexSet based subscripting to collections indexed with an Int (most famously Array, but it can generically be applied to any others) should make usage of IndexSet with collections much more smooth.

The API would look as follows:

public extension Collection where Self.Index == Int {

/** Returns an array with the elements at the given indexes.
- Parameter indexes: The indexes of the elements to retrieve.
 */
public subscript(indexes: IndexSet) -> [Element] { get }

}

which would be equivalent to NSArray's objects(at indexes: IndexSet) -> [Any]

public extension MutableCollection where Self.Index == Int {

/** Returns an array with the elements at the given indexes.
- Parameter indexes: The indexes of the elements to retrieve.
 */
public subscript(indexes: IndexSet) -> [Element] { get set }

}

equivalent to NSMutableArray's func insert(_ objects: [Any], at indexes: IndexSet)

The one issue where I'm not sure how this would be treated is documentation.

1 Like

If we do something in this space, we’d probably want a generic subscript taking a Sequence of Self.Index:

extension Collection {
  subscript<T: Sequence>(idx: T) -> [Element]
  where T.Element == Index {
    get { return idx.map{ self[$0] } }
  }
}

extension MutableCollection {
  subscript<T: Sequence>(idx: T) -> [Element]
  where T.Element == Index {
    get { return idx.map{ self[$0] } }
    set { zip(idx, newValue).map{ (i, x) in self[i] = x } }
  }
}

We probably want also that, but a specific IndexSet one is going to be more efficient as you can copy index ranges directly instead of going element by element.

This would be immediately useful and we can either add a generic Sequence one at the same time or leave it for later.

1 Like

This would have to be an addition to Foundation, not the stdlib as IndexSet is a Foundation type.

I believe you can still get this optimization with any input sequence of indices, by starting with the assumption that the input is sorted (as IndexSet is?) and coalescing adjacent indices and only copying when you hit a discontiguous one. When indices appear out of order, this doesn't matter, you just lose out on the ability to coalesce.

An IndexSet might still have a potential edge because its held as ranges so you can skip the coalescing logic entirely. But whether that yields a meaningful performance win (comparing integers to determine the discontiguous points could be pretty quick) versus the other costs (of allocating the result array and copying the various elements, potentially including retaining non-trivial types) would need to be seen before going down the path of optimizing for it.

All good points. But there's no good library or language based way to enforce that a given generic sequence of indices is going to be sorted. Which doesn't mean it's not a good idea to add but it ought to be as a distinct API from the clean subscript operator (maybe a subscript with a named parameter like myArray[sortedIndices: mySequence]).

I'll see if that's something I can look at while I'm putting a patch in place on the Foundation overlay (I think that's where the logic should go per Lance's comment above). And if it looks simple enough to add I'll include it in the official proposal.

But since I don't believe adding the IndexSet subscript would stop us from adding the generic Sequence version later in any way without any backwards compatibility issues I'll focus more on the specific one I want for my work ;o)

As I said, I don't think the implementation requires the input indices to be sorted in order to benefit from coalescing copies when they are sorted.

Given this is a possibilty, I don't believe it is a good idea to patch this concrete implementation into Foundation. Since importing of Foundation is very common, we need to avoid these kind of extensions on standard library types specific to Foundation when they can be easily generalized to apply to a standard library protocol instead.

I think if the Sequence of indexes subscript got added later the IndexSet implementation would still remain and be used as a further specialization of the generic one.

The implementation gets more efficient as you go from Sequence to IndexSet. For example for a Collection of indexes you can count on the size of the collection to allocate all the space needed once, something that cannot be done with Sequence. The IndexSet subscript, as we've discussed above, both guarantees that the indexes are sorted and that they are readily grouped in contiguous ranges.

If there's any obscure circumstance where not adding all the increasingly specialized subscript implementations to the library at once would cause backwards compatibility or unexpected changes in behavior that would be a more concerning issue.

Not so. Check out Sequence.underestimatedCount for details.

Can we count with Collection's implementation of underestimatedCount to just return count? (pun intended)

Also above you mention coalescing copies, for a generic sequence of indexes subscript I don't think there would be neither the need nor the desire to coalesce neither indexes nor results. If the caller wants the same index three times let the caller have the same element three times.

That's the default implementation. Collections can override it if an underestimate would be better (e.g. a lazy filtered collection, which wants to avoid performing the possibly-expensive filtering operation just to generate a count).

That is fine. My suggestion is that if a sequence returns 1, 2, 3, 4, then the algorithm could coalesce this incrementally-increasing part of the sequence into 1..<5, deferring the copy of the elements into the result array until it detects either a repetition (e.g. 1,1) or a discontinuity (e.g. 1,3), thus realizing the benefit of batched copies.

Not a bad idea for the generic Sequence implementation. I'll explore that as I mess with the code.