Why the constraint Collection.Indices.Index == Collection.Index?

I recently noticed that Collection.Indices.Index is constrained to be the same type as Collection.Index (link), and it is not obvious to me why. Are there algorithms which require this?

To me it seems like it should be enough that Collection.Indices.Element == Collection.Index, which is indeed already the case as well. After all, if you are working with an index into indices, say, “var i: Indices.Index”, then you can easily write “indices[i]” to get the corresponding index into the Collection.

I don’t have a use-case right now where I need the two types to be different, but it seems plausible that somebody might want to create a collection with a non-trivial index type, whose indices are nonetheless themselves indexed by, say, an integer. (Perhaps the currently-valid indices are stored in an array.)

What is the rationale behind the constraint that Indices.Index == Index?

Here is an example of how that constraint can be useful:

extension RangeReplaceableCollection where Self: MutableCollection {
  mutating func remove(where match: (Element) -> Bool) {
    guard var i = index(where: match) else { return }

    for j in indices[index(after: i)...] {
      if !match(self[j]) {
        self[i] = self[j]
        i = index(after: i)  
      }
    }

    removeSubrange(i...)
  }
}

Note that on the line with the guard, we get an index from the collection. Then, it’s used to slice the indices from that point. This is possible because the collection and indices index are one and the same.

Okay, though if we didn’t have that constraint one could still write:

guard var i = indices.index{ match(self[indices[$0]]) }

Sure, the code is less elegant this way and it requires an extra subscript call, but still.

• • •

Also, what’s with the constraint “Indices.SubSequence == Indices”? That outright prohibits the use of Array for Indices.

The key thing here is that the only purpose of indices is to improve code like this. indices isnt a part of the implementation mechanics of a collection. It’s a convenience for iterating over the indices of a collection, which can result in nicer clearer code than creating and advancing an index manually. Anything that detracts from that purpose is going against that goal.

There would never be a good reason to use an Array as a collection’s Indices type, since indices is always a computed “view”. There are really only two practical ways to implement indices:

  • the way DefaultIndices does it, which is to store the original collection and a start/end, then use the collection to advance the index; and
  • for simpler collections that have a collection that can be advanced independently, like Array, they can optimize away the storing of a copy of the collection (a big deal because it avoids ARC) and just return a Range or a Stride or some similar trivial type.

The reason Indices is an associated type, and the indices var a protocol requirement rather than just an extension, is to facilitate that second optimization.

5 Likes

Oh! Good to know. Is there a place where this sort of detailed information can be found?

Also, the documentation for Collection.indices reads in part,

  /// A collection's `indices` property can hold a strong reference to the
  /// collection itself, causing the collection to be nonuniquely referenced.
  /// If you mutate the collection while iterating over its indices, a strong
  /// reference can result in an unexpected copy of the collection. To avoid
  /// the unexpected copy, use the `index(after:)` method starting with
  /// `startIndex` to produce indices instead.

The remove example you showed does exactly that, mutating inside a for loop over indices. Are there sub-protocols which require indices not to hold such a reference? I might suspect RandomAccessCollection could ensure such a thing, but does it? And what about RangeReplaceableCollection?

Should we consider making indices hold at most a weak reference to self, or some other scheme to prevent unwanted copies in this situation?

1 Like

Is there a place where this sort of detailed information can be found?

Sadly not. We don’t have an official in-depth guide to the standard library that goes into this kind of detail. The documentation comments are good, but not in-depth enough to get into this kind of detail (and probably shouldn’t be, unless we “tiered” them somehow, because there’d be information overload for most users).

The remove example you showed does exactly that, mutating inside a for loop over indices.

Haha, it’s a fair cop! 🙂

In practice all collections in the std lib that conform to MutableCollection ought to have a trivial Indices type that doesn’t need to take a copy. This is probably generally the case most collections that conform to MutableCollection, but certainly I can think of reasons why it wouldn’t always be.

Should we consider making indices hold at most a weak reference to self, or some other scheme to prevent unwanted copies in this situation?

The trouble is, indices really needs that strong reference in cases where it needs to use the collection to advance the index. If it didn’t force a copy when the array changed, then it’d no longer have value semantics because the changed array would change the indices value.

If indices changes during the middle of iterating over it, can you even safely proceed? You’d be iterating over something which is no longer the indices into your collection, and using those values to index into your collection.

That is morally equivalent to:

var x = [10, 11, 12]
for i in x.indices {
    x.remove(at: i)
}

…and liable to make Bad Things™ happen.

The important situation is where the Collection is mutated within the loop, but indices remains unchanged. Since MutableCollection requires that indices are stable when assigning via subscript, it would be nice if we could guarantee the backing storage will not be duplicated in code like this:

extension MutableCollection where Element: Numeric {
    mutating func clear() {
        for i in indices {
            self[i] = 0
        }
    }
}

If anything, in the case where indices needs to refer to its parent collection to advance, we *don’t* want value semantics: if the parent collection changes, then either any existing use of indices should immediately and automatically refer to the newly-updated collection, or should somehow be invalidated. When someone wants a separate copy of the collection, they can create it themself and then ask for *its* indices.

Perhaps the simplest approach is to add a semantic requirement to MutableCollection whereby indices must not keep a reference to self. Are there important models of MutableCollection where that would be problematic?

While Ben’s code was an excellent example of how indices can be useful, that particular algorithm maybe ought to be done with stablePartition + remove(rest...) to avoid the ARC costs of copying elements that contain references (and then be specialized for specific collections like Array using move operations). Note: indices also works for non-mutating algorithms where this isn’t an issue.

Yes, many. Set and Dictionary are prominent examples.

Set and Dictionary don’t conform to MutableCollection.

Okay then, A dictionary’s .values is an example

Is there anything we can do to prevent unwanted copies when mutating-while-iterating-over-indices?

I don’t think so.

That’s unfortunate.

In my experience, the only time I want to iterate over indices is when mutating a collection’s elements, because if I weren’t mutating them I could simply iterate over the collection, and if I were adding or removing elements then indices would not suffice.

So the one thing indices would be great for…is a potential pitfall.

I am probably going to start operating under the assumption that any Collection which is both Mutable and RandomAccess should have standalone indices that won’t capture self, even though there’s no documented guarantee of this.