Pitch: Efficient `Collection.startIndex` throughout the standard library

Introduction

Collection.startIndex is expected to be an O(1) operation, but some collections in the standard library deviate from this, which can lead to unpredictable performance. We should change these collections to properly meet this O(1) performance expectation, by doing any expensive work needed to reach the first element ahead of time, in the collection’s initializer.

It’s always been the expectation that lazy collections will defer computations such as closure calls until they’re absolutely necessary, in order to avoid unused computations when iteration is cut short. However, due to deferring the calculation of the start index to the moment the first element is accessed, the behavior of some lazy collections has been inconsistent and inefficient. And with little gain, since essentially every collection operation requires accessing the start index anyway. With this change, all lazy collection wrappers satisfy the O(1) startIndex requirement, even if that means executing a closure before any elements are accessed.

Motivation

Some algorithms, such as Collection.reversed() , require repeated access to a collection’s start index. Using this in combination with a collection for which startIndex is not an O(1) operation can easily lead to poor performance:

var counter = 0
 
// `LazyFilterCollection` finds the first element that matches the predicate
// each time `startIndex` is called.
let atLeast500 = (1...1000).lazy.filter {
  counter += 1
  return $0 >= 500
}
 
for _ in atLeast500.reversed() {}
print(counter)  // 251501

Collection.lastIndex(where:) is a good example of an algorithm where storing the collection’s start index in a local variable can make the difference between good performance and horrible performance:

extension BidirectionalCollection {
  public func lastIndex(
    where predicate: (Element) throws -> Bool
  ) rethrows -> Index? {
    // Uncommenting the following line turns the below `lastIndex(where:)`
    // call from quadratic into linear.
    // let startIndex = self.startIndex
 
    var i = endIndex
    while i != startIndex {
      formIndex(before: &i)
      if try predicate(self[i]) {
        return i
      }
    }
    return nil
  }
}
 
var counter = 0
 
let atLeast500 = (1...1000).lazy.filter {
  counter += 1
  return $0 >= 500
}
 
_ = atLeast500.lastIndex(where: { _ in false })
print(counter)  // 1001 with the local variable, 251501 without

lastIndex(where:) should arguably store the start index in a variable regardless of its time complexity, but it still illustrates just how easy it is to accidentally get quadratic performance when using certain collections.

Proposed solution

Collections that cannot access their first element in constant time should do this work ahead of time, at initialization, to avoid having to do this work each time startIndex is called. LazyFilterCollection from the examples above could achieve an O(1) startIndex by finding the index of the first element that satisfies the predicate up front, and storing it in an instance variable.

Precomputing the start index does have the downside of sometimes doing premature computations, even if you never end up iterating the collection, as chunks(ofCount:) from the Swift Algorithms package demonstrates:

var counter = 0
 
let atMost500 = (1...1000).lazy.filter {
  counter += 1
  return $0 <= 500
}
 
_ = atMost500.chunks(ofCount: 100)
print(counter)  // 101

This behavior, however, is already present (and possibly unavoidable) in the standard library today:

var counter = 0
 
let atMost500 = (1...1000).lazy.filter {
  counter += 1
  return $0 <= 500
}
 
_ = atMost500.prefix(100)
print(counter)  // 102

Source and ABI compatibility

Most offending types in the standard library are @frozen, in which case a new type must be introduced in order to cache the start index. The old type will be kept around as deprecated in order to maintain ABI stability and a combination of @available and @_disfavoredOverload annotations will be used to optimize source compatibility and provide diagnostic assistance.

In most circumstances existing code should keep working as intended:

// keeps working
for x in (0..<10).lazy.filter({ $0.isMultiple(of: 3) }) {
  // ...
}

However, these changes are potentially source breaking due to the difference in return type. In these rare cases, the deprecation warning on the old type should provide a helpful hint for resolving the type mismatch error:

// breaks due to a type mismatch as `multiplesOfThree` will be inferred to be a `FilterCollection`
func f() -> LazyFilterSequence<Range<Int>> { // warning: 'LazyFilterSequence' was deprecated in macOS 9999
  let multiplesOfThree = (0..<10).lazy.filter { $0.isMultiple(of: 3) }
  return multiplesOfThree // error: cannot convert return expression of type 'FilterCollection<Range<Int>>' to return type 'LazyFilterSequence<Range<Int>>'
}

Note: When back-deploying, the availability annotations will instruct the compiler to choose the old types and methods, so no compiler errors or warnings will be emitted.

Detailed Design

The existing LazyFilterSequence type will be deprecated and replaced by FilterSequence and FilterCollection. FilterSequence will be like LazyFilterSequence but without the conditional conformance to the collection protocols, which is split off into FilterCollection. FilterCollection will also have an extra stored property that stores the first index of the base collection that satisfies the predicate.

Note: OS version 9999 is a placeholder and will be replaced with whatever actual OS versions this functionality will be introduced in.

@available(macOS, deprecated: 9999)
@available(iOS, deprecated: 9999)
@available(watchOS, deprecated: 9999)
@available(tvOS, deprecated: 9999)
public struct LazyFilterSequence<Base: Sequence> {
  // ...
}
 
@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
public struct FilterSequence<Base: Sequence> {
  @usableFromInline
  internal var _base: Base
 
  @usableFromInline
  internal let _predicate: (Base.Element) -> Bool
 
  @inlinable
  public init(_base base: Base, _ isIncluded: @escaping (Base.Element) -> Bool) {
    self._base = base
    self._predicate = isIncluded
  }
}
 
extension FilterSequence: Sequence {
  // ...
}
 
extension FilterSequence: LazySequenceProtocol {}
 
@available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
public struct FilterCollection<Base: Collection> {
  @usableFromInline
  internal var _base: Base
 
  @usableFromInline
  internal let _predicate: (Base.Element) -> Bool
  
  @usableFromInline
  internal let _startIndex: Base.Index
  
  @inlinable
  public init(_base base: Base, _ isIncluded: @escaping (Base.Element) -> Bool) {
    self._base = base
    self._predicate = isIncluded
    self._startIndex = base.firstIndex(where: isIncluded) ?? base.endIndex
  }
}
 
extension FilterCollection: Collection {
  @inlinable
  public var startIndex: Index {
    return _startIndex
  }
 
  // ...
}
 
extension FilterCollection: LazyCollectionProtocol {}
 
extension LazySequenceProtocol {
  @available(macOS, deprecated: 9999)
  @available(iOS, deprecated: 9999)
  @available(watchOS, deprecated: 9999)
  @available(tvOS, deprecated: 9999)
  @_disfavoredOverload
  @inlinable
  public func filter(
    _ isIncluded: @escaping (Elements.Element) -> Bool
  ) -> LazyFilterSequence<Self.Elements> {
    return LazyFilterSequence(_base: self.elements, isIncluded)
  }
  
  @available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
  @inlinable
  public func filter(
    _ isIncluded: @escaping (Elements.Element) -> Bool
  ) -> FilterSequence<Self.Elements> {
    return FilterSequence(_base: self.elements, isIncluded)
  }
}
 
extension LazyCollectionProtocol {
  @available(macOS 9999, iOS 9999, watchOS 9999, tvOS 9999, *)
  @inlinable
  public func filter(
    _ isIncluded: @escaping (Elements.Element) -> Bool
  ) -> FilterCollection<Self.Elements> {
    return FilterCollection(_base: self.elements, isIncluded)
  }
}

Sequence.joined, LazySequenceProtocol.joined, LazySequenceProtocol.flatMap, and LazySequenceProtocol.compactMap will be deprecated and replaced by new overloads in a similar fashion.

Source compatibility

These changes are potentially source breaking due to the differences in return type, though in most circumstances existing code should keep working as intended:

// keeps working
for x in (0..<10).lazy.filter({ $0.isMultiple(of: 3) }) {
  // ...
}
 
// keeps working with deprecation warnings
func f() -> LazyFilterSequence<Range<Int> {
  return (0..<10).lazy.filter { $0.isMultiple(of: 3) }
}
 
// keeps working with depracation warnings
func g() -> LazyFilterSequence<Range<Int>> {
  let multiplesOfThree: LazyFilterSequence<Range<Int>> = (0..<10).lazy.filter { $0.isMultiple(of: 3) }
  return multiplesOfThree
}
 
// breaks due to a type mismatch as `multiplesOfThree` will be inferred to be a `FilterCollection`
func h() -> LazyFilterSequence<Range<Int>> {
  let multiplesOfThree = (0..<10).lazy.filter { $0.isMultiple(of: 3) }
  return multiplesOfThree
}

Effect on ABI stability

Old versions of types and methods are kept around (deprecated and disfavored) in order to maintain ABI stability.

Alternatives considered

Lift the O(1) startIndex requirement and place the burden on collection algorithms to not compute the collection’s start index more than once.

A lot of existing code was written under the assumption that accessing startIndex is an O(1) operation, both in the standard library and elsewhere. Meanwhile, collections that cannot trivially access their first element are few and far between. Fixing these collections themselves has a smaller impact on existing code than fixing all code that uses them.

Make startIndex O(1) and let the subscript do the work necessary to reach the first element.

While this makes the startIndex operation O(1) without having to precompute anything, this approach is problematic for multiple reasons:

  • it violates the O(1) performance requirement of subscript(position:),
  • index(after:) will likely need to redo this work to get to the second index, and
  • for collections such as LazyFilterCollection this makes it impossible to determine in constant time whether the start index and end index are equal (that is, isEmpty would still be O(n)).
12 Likes

I understand the motivation, however I think that lazy types should remain lazy. That does not, however, intrinsically preclude an improvement here.

For example, a lazy collection could cache its startIndex, meaning the first access may still be O(n), but subsequent accesses would be O(1).

This avoids pushing nontrivial work up to initialization time, while still achieving fast repeated access to startIndex. In other words, it makes the type more truly and fully lazy:

struct LazyFooCollection {
  var _startIndex: Index? = nil
  
  var startIndex: Index {
    if let i = _startIndex { return i }
    let i = _findStartIndex()
    _startIndex = i
    return i
  }
  
  ...
}

Edit:

Mutatingness makes this a bit weird, so the cache would have to be in a reference-type box. Something like this should work:

class CollectionCache<Index> {
  var startIndex: Index?
  var endIndex: Index?
}

Could also include other things like count as well.

5 Likes

While cacheing the start index on access instead of on initialization would make the type "more truly and fully lazy," it would also incur a heap allocation. What's the benefit? As @timv said:

I think laziness is a means to an end, not an end in itself.

4 Likes

+1. This is has bugged me for a long time, and I've noticed considerable performance benefits by writing my own lazy types which precalculate startIndex/endIndex. As you say, generic code is written with those performance guarantees in mind.

Their elements are still calculated lazily - a LazyFilterCollection still filters its elements on-demand; it's just that instantiation is not O(1). That's fine IMO. It never made that guarantee. It did, however, claim to be able to give its startIndex in constant time (by virtue of its Collection conformance).


The only thing I hesitate about is adding new types and going through a source break for this. IMO, the lazy collection types suffer from 2 issues and this is only one of them; the other is that the closure comes with retain/release overhead, and is not actually part of the thing that gets specialised. To the compiler, all LazyFilterCollection<Array<Int>>s are the same.

I hope that one day we'll have generic value types, so that we could parameterise and specialise lazy wrappers by their individual predicates. If we go through this now, I fear there might be less appetite to introduce something like this in future. Then again, I don't know of any plan to add generic value types.

6 Likes

Thanks for raising this concern, this is definitely good to keep in mind when adding new types and deprecating the old ones. When doing this, it makes a lot of sense to carefully consider which other changes we might want to make at the same time that we couldn't make otherwise. For LazyFilterCollection in particular this could be a good opportunity to stop using Base.Index as the Index type, and introduce a wrapper type as the index instead.

One of the main goals of this pitch is to collect feedback on this approach in general — we're using this technique in several collections in Swift Algorithms and it would be great to settle on a preferred direction when this trade-off presents itself, which it does surprisingly often.

3 Likes

I think generic value parameters have kind of gone a bit under-the-radar next to variadic generics, which are much sexier, but they would give us tremendous capabilities for any kind of wrapper which includes a closure. As things stand, you can get significantly better, more efficient wrappers than any general-purpose type by copy/pasting the implementation and swapping the closure for a function. In a project I'm working on, I did this with LazyFilterCollection and saw enormous performance improvements (I assume because the compiler didn't need to retain a closure, and generic specialisation told it exactly what the filter predicate was, so it could inline it and optimise around it).

Just re-ran my benchmarks
benchmark                                              column      baseline    latest       %
---------------------------------------------------------------------------------------------
Constructor.SpecialNonFile.AverageURLs filtered        std            31.94     22.33   30.09
Constructor.SpecialNonFile.AverageURLs filtered        iterations  20704.00   5217.00   74.80
Constructor.SpecialNonFile.AverageURLs filtered        time        59565.00 250315.00 -320.24
...
Constructor.SpecialNonFile.IPv4 host filtered          std            33.79     40.77  -20.64
Constructor.SpecialNonFile.IPv4 host filtered          iterations  26756.00   8477.00   68.32
Constructor.SpecialNonFile.IPv4 host filtered          time        46838.00 154797.00 -230.49
...
Constructor.SpecialNonFile.IPv6 host filtered          std            29.79     28.72    3.62
Constructor.SpecialNonFile.IPv6 host filtered          iterations  22074.00   7611.00   65.52
Constructor.SpecialNonFile.IPv6 host filtered          time        57169.00 168267.00 -194.33

"baseline" is using my copy/pasted lazy-filter, "latest" is using the standard library's one. It's anywhere from ~200% to 300% slower to use the standard library.

I'm not sure how widely-known this issue is, or how the core team rates its severity. Personally I think it is more pressing than variadic generics, such that it's probably worth holding off on touching the lazy collections until they can be fixed.

As a bonus, generic value parameters would mean we can finally make @lazy a property wrapper, without having to actually store a closure inside of it.

6 Likes

Generic value parameters would certainly be useful, but I do think it's worth pointing out that they can't fully replace the existing dynamic lazy implementations, because types are allocated permanently, not reference counted, and thus using run-time values (e.g. closures with captures) as generic parameters would mean an unbounded memory leak.

Additionally, supporting capture-less closures as generic value parameters would be possible, but raises questions about the equivalence of types both statically and dynamically. Are Lazy<{ 0 }> and Lazy<{ 0 }> equivalent if they're written in two different places? Does the run-time equivalence of Lazy<{ 0 }> and Lazy<{ 1 - 1 }> depend on optimization? (This is more a design question than an implementation question.)

2 Likes

The way I picture it, every function and closure literal you write would be it’s own “value” for the purpose of GVP. So you wouldn’t have Lazy<{ 0 }> but Lazy<(closure in file:myfile.swift, line:200)>. If you want to give those closures a specific identity (e.g. for ABI-stable modules), you’d use a function and get Lazy<MyModule.myFilterFunc>. That way, we avoid any questions about optimisation changing the type.

That would work for collection and other wrappers, where nobody should really care what type they are. Every time you need to write a stdlib lazy type out manually, I think it points to some other gap in the generics system (e.g. constraints on opaque types or PAT existentials). The point of the lazy wrappers, as I understand it, is to be used like SwiftUI’s views - at the “some Collection” level (as SwiftUI uses “some View”).

Anyway, I think the idea of using “closure literals” means this should be able to support lazy in the same way we do now, without leaking memory. It may even be able to support captures — perhaps even more efficiently since it is specialised for each utterance.

1 Like

In this particular code snippet the real culprit is the Swift sequence and collection protocol hierarchy rather than an issue with startIndex not being 0(1) for lazy filter collections. This bottleneck is solvable without introducing some new lazy structure with memoized behavior, the thing that is missing is something like a BidirectionalSequence protocol, i.e. an iterator that can be reversed without being a necessarily Collection.

for instance, the exact same code snippet in Rust (where sequence operations are lazy by default and a DoubleEndedIterator exists) only calls the predicate 1000 times:

let mut counter = 0;

let at_least_500 = (0..1000).filter(|x| { 
    counter += 1;
    x >= &500 
});
    
for _ in at_least_500.rev() { }

println!("{:?}", counter);  // 1_000

This lack of BidirectionalSequence can cause many overhead, another example:

for (index, element) in (0...1000).enumerated().reversed() {
    print(index, element)
}

This loop in Swift causes the creation of an unneeded intermediate array because enumerated() return type does not conditionally conforms to Collection / BidirectionalCollection, hence, .reversed() is the Sequence fallback which creates an array.

The same applies for your lastIndex() example, the existence of a BidirectionalSequence would solves the issue gracefully.

IMO the underlying issue is more a lack of granularity in Swift Sequence-related protocols.

3 Likes

I had recently run across that issue with enumerated, and opened what I thought was going to be a pretty trivial PR: [stdlib] Add conditional RandomAccessCollection conformance to EnumeratedSequence by cltnschlosser · Pull Request #36340 · apple/swift · GitHub, but @Ben_Cohen brought up some pretty valid concerns. Most notably that the conditional conformance to Collection isn’t performant, if you don’t have a RandomAccessCollection.

1 Like

Sure, this PR slightly deviates from the original subject of this thread but is interesting (it is more related to a SwiftUI issue with ForEach views IMO). The enumerated behavior you are requesting (which can be very useful) requires RandomAccessCollection conformance to be efficient because you need to compute the distance between two indices in O(1) to be able to use the subscript in O(1).

What I said here is hence not correct:

As you said, it should also conform to RandomAccessCollection to be efficient. But anyway, as I said the real issue here is the lack of a BidirectionalSequence protocol because the operation only involves iterators, not collections.

Another Collection complexity requirement I find myself having to break from time to time is the O(1) subscripting requirement. It doesn't really work if you need to transform/copy the underlying data in order to produce the result.

For instance, LazyMapCollection obviously cannot meet these requirements unless its transform closure is O(1).

I wonder if we could loosen the requirement, and say that the complexity of accessing a Collection's element via a subscript should not depend on the length of the collection (i.e. O(1) WRT length), but that it can depend on other things such as the size of the element being accessed.

FWIW, that's actually how I've always understood the O(1) complexity guarantees, and I believe that's the intended interpretation. Guarantees of O(n) are given only in terms of the length of the collection, which (if interpreted in the strictest sense) is trivially violated by e.g., firstIndex(of:) since it must also depend on the complexity of the given Element's Equatable conformance.

ETA: Of course, this is not to say that explicit clarification on this point isn’t a good idea! I agree that the current wording, particularly for O(1) guarantees, is ambiguous.

8 Likes