[Pitch] Add `indexed()` and `Collection` conformances for `enumerated()` and `zip(_:_:)`

Implementation: [stdlib] Add `indexed()` and `Collection` conformances for `enumerated()` and `zip(_:_:)` by timvermeulen · Pull Request #36851 · apple/swift · GitHub

Introduction

This proposal aims to fix the lack of Collection conformance of the sequences returned by zip(_:_:) and enumerated(), preventing them from being used in a context that requires a Collection. Also included is the addition of the indexed() method on Collection as a more ergonomic, efficient, and correct alternative to c.enumerated() and zip(c.indices, c).

Motivation

Currently, the Zip2Sequence and EnumeratedSequence types conform to Sequence, but not to any of the collection protocols. Adding these conformances was impossible before SE-0234 Remove Sequence.SubSequence, and would have been an ABI breaking change before the language allowed @available annotations on protocol conformances (Availability checking for protocol conformances). Now we can add them!

Conformance to the collection protocols can be beneficial in a variety of ways, for example:

  • (1000..<2000).enumerated().dropFirst(500) becomes a constant time operation.
  • zip("abc", [1, 2, 3]).reversed() will return a ReversedCollection rather than allocating a new array.
  • SwiftUI’s List and ForEach views will be able to directly take an enumerated or zipped collection as their data.

This proposal also includes the addition of the indexed() method (which can already be found in the Swift Algorithms package) as an alternative for many use cases of zip(_:_:) and enumerated(). When the goal is to iterate over a collection’s elements and indices at the same time, enumerated() is often inadequate because it provides an offset, not a true index. For many collections this integer offset is different from the Index type, and in the case of ArraySlice in particular this offset is a common source of bugs when the slice’s startIndex isn’t 0. zip(c.indices, c) solves these problems, but it is less ergonomic than indexed() and potentially less performant when traversing the indices of a collection is computationally expensive.

Detailed design

Conditionally conform Zip2Sequence to Collection, BidirectionalCollection, and RandomAccessCollection.

Note: OS version 9999 is a placeholder and will be replaced with whatever actual OS versions this functionality will be introduced in.

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension Zip2Sequence: Collection
  where Sequence1: Collection, Sequence2: Collection
{
  // ...
}

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension Zip2Sequence: BidirectionalCollection
  where Sequence1: BidirectionalCollection, Sequence2: BidirectionalCollection
{
  // ...
}

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension Zip2Sequence: RandomAccessCollection
  where Sequence1: RandomAccessCollection, Sequence2: RandomAccessCollection {}

Conditionally conform EnumeratedSequence to Collection, BidirectionalCollection, RandomAccesCollection, and LazyCollectionProtocol.

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension EnumeratedSequence: Collection where Base: Collection {
  // ...
}

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension EnumeratedSequence: BidirectionalCollection
  where Base: BidirectionalCollection
{
  // ...
}

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension EnumeratedSequence: RandomAccessCollection
  where Base: RandomAccessCollection {}

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension EnumeratedSequence: LazySequenceProtocol
  where Base: LazySequenceProtocol {}

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension EnumeratedSequence: LazyCollectionProtocol
  where Base: LazyCollectionProtocol {}

Add an indexed() method to Collection that returns a collection over (index, element) pairs of the original collection.

extension Collection {
  @available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
  public func indexed() -> Indexed<Self> {
    Indexed(_base: self)
  }
}

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
public struct Indexed<Base: Collection> {
  // ...
}

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension Indexed: Collection {
  // ...
}

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension Indexed: BidirectionalCollection where Base: BidirectionalCollection {
  // ...
}

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension Indexed: RandomAccessCollection where Base: RandomAccessCollection {}

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension Indexed: LazySequenceProtocol where Base: LazySequenceProtocol {}

@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
extension Indexed: LazyCollectionProtocol where Base: LazyCollectionProtocol {}

Source compatibility

Adding LazySequenceProtocol conformance for EnumeratedSequence is a breaking change for code that relies on the enumerated() method currently not propagating LazySequenceProtocol conformance in a lazy chain:

extension Sequence {
  func everyOther_v1() -> [Element] {
    let x = self.lazy
      .enumerated()
      .filter { $0.offset.isMultiple(of: 2) }
      .map(\.element)
    
    // error: Cannot convert return expression of type 'LazyMapSequence<...>' to return type '[Self.Element]'
    return x
  }
  
  func everyOther_v2() -> [Element] {
    // will keep working, the eager overload of `map` is picked
    return self.lazy
      .enumerated()
      .filter { $0.offset.isMultiple(of: 2) }
      .map(\.element)
  }
}

All protocol conformances of an existing type to an existing protocol are potentially source breaking because users could have added the exact same conformances themselves. However, given that Zip2Sequence and EnumeratedSequence do not expose their underlying sequences, there is no reasonable way anyone could have conformed either type to Collection themselves. The only sensible conformance that could conflict with one of the conformances added in this proposal is the conformance of EnumeratedSequence to LazySequenceProtocol.

Effect on ABI stability

This proposal does not affect ABI stability.

Alternatives considered

Don’t add LazyCollectionProtocol conformance for EnumeratedSequence for the sake of source compatibility.

We consider it a bug that enumerated() currently does not propagate laziness in a lazy chain.

Only conform Zip2Sequence and EnumeratedSequence to BidirectionalCollection when the base collections conform to RandomAccessCollection.

Traversing an EnumeratedSequence backwards requires computing the count of the collection upfront in order to determine the correct offsets, which is an O(count) operation when the base collection does not conform to RandomAccessCollection. This is a one-time cost incurred when c.index(before: c.endIndex) is called, and does not affect the overall time complexity of an entire backwards traversal. Besides, index(before:) does not have any performance requirements that need to be adhered to.

Similarly, Zip2Sequence requires finding the index of the longer of the two collections that corresponds to the end index of the shorter collection, when doing a backwards traversal. As with EnumeratedSequence, this adds a one-time O(n) cost that does not violate any performance requirements.

Keep EnumeratedSequence the way it is and add an enumerated() overload to Collection that returns a Zip2Sequence<Range<Int>, Self>.

This is tempting because enumerated() is little more than zip(0..., self), but this would cause an unacceptable amount of source breakage due to the lack of offset and element tuple labels that EnumeratedSequence provides.

27 Likes

When did this become possible?

Last I heard there was no way to introduce versioned conformances.

If we’re able to do this now, then I’d like to add StrideTo and StrideThrough to the list of things that want Collection conformances.

2 Likes

This is thanks to @Slava_Pestov :slightly_smiling_face:

7 Likes

There are implementation-specific complications to this, in that floating-point strides cannot share an implementation in common with other strides and a type can only conform to a protocol in one way. But that is neither here nor there for this discussion.

(An implementation of conditional conformance to Collection for stride types has been in the code base the whole time, just disabled with an #if false.)

1 Like

I think the proposed additions are overall a no-brainer, and the proposed trade-offs discussed under the alternatives are very reasonable, but this issue regarding source compatibility due to propagation of laziness looms large--distinct from the issue arising from user conformance of types they don't own to protocols they don't own, for which we have never and should not now offer source stability guarantees. I would be concerned whether the advantages of LazySequenceProtocol conformance meet the bar for a source-breaking change.

4 Likes

Can you comment further on what problem it is you see here? The only potential source breakage that I can see is the possibility of a user conforming a type they don't own (EnumeratedSequence) to a protocol they don't own (LazySequenceProtocol), which you explicitly say in the next sentence is not something for which source stability should be a worry -- and I agree.

The only other change this makes is to behavior of chained calls that include enumerated(), which doesn't break source but only changes runtime efficiency / memory-use. This sort of thing would happen with any underlying algorithmic or data structure change to any of the stdlib types, so I'm unclear about what you see as the issue.

1 Like

Yes please! I have this in every project.

2 Likes

This one: