Hi all,
This is a second pitch of the indexed()
proposal (initial pitch thread, review thread). The following changes have been made since the original proposal:
- The return type of the
indexed()
method has been renamed fromIndexed
toIndexedCollection
. - The conditional
RandomAccessCollection
conformance ofZip2Sequence
has been removed in favor of a newzip(_:_:)
overload for random-access collections that returns a newZip2RandomAccessCollection
type, due to implementation difficulties. This has been elaborated on in the Alternatives considered.
I want to highlight a trade-off made regarding the BidirectionalCollection
conformance of the Zip2Sequence
and EnumeratedSequence
types. EnumeratedSequence
is simpler, so I'll explain the trade-off in terms of that type, but all of the below applies to both types equally.
Here’s what the Collection
conformance could look like:
extension EnumeratedSequence: Collection where Base: Collection {
struct Index {
let base: Base.Index
let offset: Int
}
var startIndex: Index {
Index(base: _base.startIndex, offset: 0)
}
var endIndex: Index {
Index(base: _base.endIndex, offset: 0)
}
func index(after index: Index) -> Index {
Index(base: _base.index(after: index.base), offset: index.offset + 1)
}
subscript(index: Index) -> (offset: Int, element: Base.Element) {
(index.offset, _base[index.base])
}
}
extension EnumeratedSequence.Index: Comparable {
static func == (lhs: Self, rhs: Self) -> Bool {
return lhs.base == rhs.base
}
static func < (lhs: Self, rhs: Self) -> Bool {
return lhs.base < rhs.base
}
}
Here’s what the Bidirectional
conformance could look like. The question is: should Base
be required to conform to BidirectionalCollection
or RandomAccessCollection
?
extension EnumeratedSequence: BidirectionalCollection where Base: ??? {
func index(before index: Index) -> Index {
let currentOffset = index.base == _base.endIndex ? _base.count : index.offset
return Index(base: _base.index(before: index.base), offset: currentOffset - 1)
}
}
Notice that calling index(before:)
with the end index requires computing the count
of the base collection. This is an O(1) operation if the base collection is RandomAccessCollection
, but O(n) if it's BidirectionalCollection
.
Option 1: where Base: BidirectionalCollection
A direct consequence of index(before:)
being O(n) when passed the end index is that some operations like last
are also O(n):
extension BidirectionalCollection {
var last: Element? {
isEmpty ? nil : self[index(before: endIndex)]
}
}
// A bidirectional collection that is not random-access.
let evenNumbers = (0 ... 1_000_000).lazy.filter { $0.isMultiple(of: 2) }
let enumerated = evenNumbers.enumerated()
// This is still O(1), ...
let endIndex = enumerated.endIndex
// ...but this is O(n).
let lastElement = enumerated.last!
print(lastElement) // (offset: 500000, element: 1000000)
However, since this performance pitfall only applies to the end index, iterating over a reversed enumerated collection stays O(n):
// A bidirectional collection that is not random-access.
let evenNumbers = (0 ... 1_000_000).lazy.filter { $0.isMultiple(of: 2) }
// Reaching the last element is O(n), and reaching every other element is another combined O(n).
for (offset, element) in evenNumbers.enumerated().reversed() {
// ...
}
In other words, this could make some operations unexpectedly O(n), but it’s not likely to make operations unexpectedly O(n²).
Option 2: where Base: RandomAccessCollection
If EnumeratedSequence
’s conditional conformance to BidirectionalCollection
is restricted to when Base: RandomAccessCollection
, then operations like last
and last(where:)
will only be available when they’re guaranteed to be O(1):
// A bidirectional collection that is not random-access.
let str = "Hello"
let lastElement = str.enumerated().last! // error: value of type 'EnumeratedSequence<String>' has no member 'last'
That said, some algorithms that can benefit from bidirectionality such as reversed()
and suffix(_:)
are also available on regular collections, but with a less efficient implementation. That means that the code would still compile if the enumerated sequence is not bidirectional, it would just perform worse — the most general version of reversed()
on Sequence
allocates an array and adds every element to that array before reversing it:
// A bidirectional collection that is not random-access.
let str = "Hello"
// This no longer conforms to `BidirectionalCollection`.
let enumerated = str.enumerated()
// As a result, this now returns a `[(offset: Int, element: Character)]` instead
// of a more efficient `ReversedCollection<EnumeratedSequence<String>>`.
let reversedElements = enumerated.reversed()
The base collection needs to be traversed twice either way, but the defensive approach of giving the BidirectionalCollection
conformance a stricter bound ultimately results in an extra allocation.
Taking all of this into account, I’ve gone with option 1 for the sake of giving collections access to more algorithms and more efficient overloads of some algorithms. Conforming these collections to BidirectionalCollection
when the base collection conforms to the same protocol is less surprising. I don’t think the possible performance pitfalls pose a large enough risk in practice to negate these benefits.
Comments are welcome! If this direction has downsides that weren't considered yet, then I’d love to hear about it.