(Continuing a discussion from
https://twitter.com/cocoaphony/status/729712476451971073\.\)
The current recommendation for converting from lazy to strict is with an
Array constructor:
Array(xs.lazy.<chain>)
Because the Array constructor accesses count and then iterates over the
sequence, this generally means that the entire chain is evaluated twice.
This is a surprising result. See
Why `double` is called 5 times and `filter` is called 4 times? Shouldn't they be called same amount of times? Shouldn't they both be called only twice as I have 2-element array? · GitHub for an
example.
While it is reasonable to argue that a map/filter chain should be pure, it
is much more surprising to require that it be cheap (a common use for lazy
is to avoid calling expensive operations that might not be needed). This
can be a particular issue if network fetches (even idempotent ones) are
included in the chain.
As Joe Groff notes, this is fixable with Array(AnySequence(xs.lazy....)),
but that is very undiscoverable and dependent on knowing the implementation
details of Array.init.
I believe the root cause of this is that LazyFilterCollection is not a
proper Collection. It violates the performance requirements.
CollectionType.count requires O(1) access if Index is a
RandomAccessIndexType. LazyFilterCollection.count is always at least O(N).
LFC also breaks several other O(1) performance requirements (endIndex for
instance). This breaks the performance promises of anything that operates
on a generic collection and receives an LFC. The problem here isn't
Array.init. It's doing everything right. The problem is that LFC doesn't
uphold its side of the bargain.
There are many ways and reasons that lazy filter chains are constructed. I
don't think there's any way for the stdlib to guess the best performance
trade-off for all of them, and optimizing for one breaks others. Given
that, I think it should stick to the documented performance promises. Since
LazyFilterCollection cannot meet the requirements of a Collection (even in
simple cases), a lazy filter must return a LazySequence. This breaks
pre-allocation optimizations, but allows reasoning about the performance in
ways that is not currently possible. We should keep our promises.
While not beautiful, it is possible to recover pre-allocation optimization
in a generic way by adding an initialCapacity option to
Array.init<SequenceType>. This would allow the caller to provide the best
initial allocation in cases where that is externally known, or worth an
explicit counting.
-Rob