Introduction
Collection.startIndex
is expected to be an O(1) operation, but some collections in the standard library deviate from this, which can lead to unpredictable performance. We should change these collections to properly meet this O(1) performance expectation, by doing any expensive work needed to reach the first element ahead of time, in the collection’s initializer.
It’s always been the expectation that lazy collections will defer computations such as closure calls until they’re absolutely necessary, in order to avoid unused computations when iteration is cut short. However, due to deferring the calculation of the start index to the moment the first element is accessed, the behavior of some lazy collections has been inconsistent and inefficient. And with little gain, since essentially every collection operation requires accessing the start index anyway. With this change, all lazy collection wrappers satisfy the O(1) startIndex
requirement, even if that means executing a closure before any elements are accessed.
Motivation
Some algorithms, such as Collection.reversed()
, require repeated access to a collection’s start index. Using this in combination with a collection for which startIndex
is not an O(1) operation can easily lead to poor performance:
var counter = 0
// `LazyFilterCollection` finds the first element that matches the predicate
// each time `startIndex` is called.
let atLeast500 = (1...1000).lazy.filter {
counter += 1
return $0 >= 500
}
for _ in atLeast500.reversed() {}
print(counter) // 251501
Collection.lastIndex(where:)
is a good example of an algorithm where storing the collection’s start index in a local variable can make the difference between good performance and horrible performance:
extension BidirectionalCollection {
public func lastIndex(
where predicate: (Element) throws -> Bool
) rethrows -> Index? {
// Uncommenting the following line turns the below `lastIndex(where:)`
// call from quadratic into linear.
// let startIndex = self.startIndex
var i = endIndex
while i != startIndex {
formIndex(before: &i)
if try predicate(self[i]) {
return i
}
}
return nil
}
}
var counter = 0
let atLeast500 = (1...1000).lazy.filter {
counter += 1
return $0 >= 500
}
_ = atLeast500.lastIndex(where: { _ in false })
print(counter) // 1001 with the local variable, 251501 without
lastIndex(where:)
should arguably store the start index in a variable regardless of its time complexity, but it still illustrates just how easy it is to accidentally get quadratic performance when using certain collections.
Proposed solution
Collections that cannot access their first element in constant time should do this work ahead of time, at initialization, to avoid having to do this work each time startIndex
is called. LazyFilterCollection
from the examples above could achieve an O(1) startIndex
by finding the index of the first element that satisfies the predicate up front, and storing it in an instance variable.
Precomputing the start index does have the downside of sometimes doing premature computations, even if you never end up iterating the collection, as chunks(ofCount:)
from the Swift Algorithms package demonstrates:
var counter = 0
let atMost500 = (1...1000).lazy.filter {
counter += 1
return $0 <= 500
}
_ = atMost500.chunks(ofCount: 100)
print(counter) // 101
This behavior, however, is already present (and possibly unavoidable) in the standard library today:
var counter = 0
let atMost500 = (1...1000).lazy.filter {
counter += 1
return $0 <= 500
}
_ = atMost500.prefix(100)
print(counter) // 102
Source and ABI compatibility
Most offending types in the standard library are @frozen
, in which case a new type must be introduced in order to cache the start index. The old type will be kept around as deprecated in order to maintain ABI stability and a combination of @available
and @_disfavoredOverload
annotations will be used to optimize source compatibility and provide diagnostic assistance.
In most circumstances existing code should keep working as intended:
// keeps working
for x in (0..<10).lazy.filter({ $0.isMultiple(of: 3) }) {
// ...
}
However, these changes are potentially source breaking due to the difference in return type. In these rare cases, the deprecation warning on the old type should provide a helpful hint for resolving the type mismatch error:
// breaks due to a type mismatch as `multiplesOfThree` will be inferred to be a `FilterCollection`
func f() -> LazyFilterSequence<Range<Int>> { // warning: 'LazyFilterSequence' was deprecated in macOS 9999
let multiplesOfThree = (0..<10).lazy.filter { $0.isMultiple(of: 3) }
return multiplesOfThree // error: cannot convert return expression of type 'FilterCollection<Range<Int>>' to return type 'LazyFilterSequence<Range<Int>>'
}
Note: When back-deploying, the availability annotations will instruct the compiler to choose the old types and methods, so no compiler errors or warnings will be emitted.
Detailed Design
The existing LazyFilterSequence
type will be deprecated and replaced by FilterSequence
and FilterCollection
. FilterSequence
will be like LazyFilterSequence
but without the conditional conformance to the collection protocols, which is split off into FilterCollection
. FilterCollection
will also have an extra stored property that stores the first index of the base collection that satisfies the predicate.
Note: OS version 9999 is a placeholder and will be replaced with whatever actual OS versions this functionality will be introduced in.
@available(macOS, deprecated: 9999)
@available(iOS, deprecated: 9999)
@available(watchOS, deprecated: 9999)
@available(tvOS, deprecated: 9999)
public struct LazyFilterSequence<Base: Sequence> {
// ...
}
@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
public struct FilterSequence<Base: Sequence> {
@usableFromInline
internal var _base: Base
@usableFromInline
internal let _predicate: (Base.Element) -> Bool
@inlinable
public init(_base base: Base, _ isIncluded: @escaping (Base.Element) -> Bool) {
self._base = base
self._predicate = isIncluded
}
}
extension FilterSequence: Sequence {
// ...
}
extension FilterSequence: LazySequenceProtocol {}
@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
public struct FilterCollection<Base: Collection> {
@usableFromInline
internal var _base: Base
@usableFromInline
internal let _predicate: (Base.Element) -> Bool
@usableFromInline
internal let _startIndex: Base.Index
@inlinable
public init(_base base: Base, _ isIncluded: @escaping (Base.Element) -> Bool) {
self._base = base
self._predicate = isIncluded
self._startIndex = base.firstIndex(where: isIncluded) ?? base.endIndex
}
}
extension FilterCollection: Collection {
@inlinable
public var startIndex: Index {
return _startIndex
}
// ...
}
extension FilterCollection: LazyCollectionProtocol {}
extension LazySequenceProtocol {
@available(macOS, deprecated: 9999)
@available(iOS, deprecated: 9999)
@available(watchOS, deprecated: 9999)
@available(tvOS, deprecated: 9999)
@_disfavoredOverload
@inlinable
public func filter(
_ isIncluded: @escaping (Elements.Element) -> Bool
) -> LazyFilterSequence<Self.Elements> {
return LazyFilterSequence(_base: self.elements, isIncluded)
}
@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
@inlinable
public func filter(
_ isIncluded: @escaping (Elements.Element) -> Bool
) -> FilterSequence<Self.Elements> {
return FilterSequence(_base: self.elements, isIncluded)
}
}
extension LazyCollectionProtocol {
@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
@inlinable
public func filter(
_ isIncluded: @escaping (Elements.Element) -> Bool
) -> FilterCollection<Self.Elements> {
return FilterCollection(_base: self.elements, isIncluded)
}
}
Sequence.joined
, LazySequenceProtocol.joined
, LazySequenceProtocol.flatMap
, and LazySequenceProtocol.compactMap
will be deprecated and replaced by new overloads in a similar fashion.
Source compatibility
These changes are potentially source breaking due to the differences in return type, though in most circumstances existing code should keep working as intended:
// keeps working
for x in (0..<10).lazy.filter({ $0.isMultiple(of: 3) }) {
// ...
}
// keeps working with deprecation warnings
func f() -> LazyFilterSequence<Range<Int> {
return (0..<10).lazy.filter { $0.isMultiple(of: 3) }
}
// keeps working with depracation warnings
func g() -> LazyFilterSequence<Range<Int>> {
let multiplesOfThree: LazyFilterSequence<Range<Int>> = (0..<10).lazy.filter { $0.isMultiple(of: 3) }
return multiplesOfThree
}
// breaks due to a type mismatch as `multiplesOfThree` will be inferred to be a `FilterCollection`
func h() -> LazyFilterSequence<Range<Int>> {
let multiplesOfThree = (0..<10).lazy.filter { $0.isMultiple(of: 3) }
return multiplesOfThree
}
Effect on ABI stability
Old versions of types and methods are kept around (deprecated and disfavored) in order to maintain ABI stability.
Alternatives considered
Lift the O(1) startIndex
requirement and place the burden on collection algorithms to not compute the collection’s start index more than once.
A lot of existing code was written under the assumption that accessing startIndex
is an O(1) operation, both in the standard library and elsewhere. Meanwhile, collections that cannot trivially access their first element are few and far between. Fixing these collections themselves has a smaller impact on existing code than fixing all code that uses them.
Make startIndex
O(1) and let the subscript do the work necessary to reach the first element.
While this makes the startIndex
operation O(1) without having to precompute anything, this approach is problematic for multiple reasons:
- it violates the O(1) performance requirement of
subscript(position:)
, index(after:)
will likely need to redo this work to get to the second index, and- for collections such as
LazyFilterCollection
this makes it impossible to determine in constant time whether the start index and end index are equal (that is,isEmpty
would still be O(n)).