I wholeheartedly welcome this proposal, which finally addresses issues in a critically important area of the Swift Standard Library: the Collections framework. I fully support removing Sequence.SubSequence. But given that this problem was actually the raison d'être for my work on the Swift Benchmarking Suite, this review will be a bit more detail-oriented with regard to the proposed solution. I have written about this topic extensively before in Troubled Sequence. I'll focus only on the basic sequence slicing operations; I have done no work with split.
The sequence slicing benchmarks that feature different underlying types, and AnySequence in particular, were added to highlight how utterly destructive to performance this type-erasing abstraction actually is. But the reasons for some of the dramatic regressions highlighted in the text of this proposal have almost nothing to do with the loss of the customization point. They are caused by unspecialized generic code, dynamic dispatch, value type boxing and reference counting. See the section Cost of Abstractions, Revisited in the Troubling Sequence for more details. In the end, these regressions are unimportant to the evaluation of this proposal to remove Sequence.SubSequence, because they only highlight the extent of utter optimizer failure in the combination of .lazy and AnySequence, which this proposal completely eliminates: as long as no stdlib API returns AnySequence, we are golden!
Regular eager and lazy API
Since we are breaking source and ABI compatibility for 5.0 with the proposed change, we should do this as well as we can. In my opinion, we can do better than simply exposing the existing implementations of DropFirstSequence, DropWhileSequence and PrefixSequence.
The irregularity of the current sequence slicing implementations, from the perspective of the eager/lazy dichotomy, is not only bothersome on esthetic grounds; it also has a serious impact on the performance of code written in a functional style. In my experience, chaining sequence slicing operations can achieve the performance of imperative code only when there is a truly lazy implementation. With the current architecture, this means that the relevant sequence slicing methods should also have proper LazySequenceProtocol implementations, as is already the case with drop(while:) and prefix(while:) from SE-0045, implemented in PR #3600. Such code allows the optimizer to boil the functional method chain down to a single loop with performance equivalent to a hand-tuned imperative loop.
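To make the comparison concrete, here is a minimal sketch of the two styles side by side. The input range and the predicates are made up for illustration; the only stdlib pieces used are .lazy, drop(while:), prefix(while:), map and reduce. It shows that both forms compute the same value, not how the optimizer treats them.

```swift
// Hypothetical input; any Sequence of Int would do.
let numbers = 1...1_000

// Functional style: every step after .lazy is lazy, so no intermediate
// array is built before reduce consumes the elements.
let lazySum = numbers.lazy
    .drop(while: { $0 < 10 })
    .prefix(while: { $0 < 100 })
    .map { $0 * 2 }
    .reduce(0, +)

// Hand-written imperative equivalent of the chain above.
var loopSum = 0
var dropping = true
for n in numbers {
    if dropping && n < 10 { continue }  // drop(while:) skips the leading run only
    dropping = false
    if n >= 100 { break }               // prefix(while:) stops at the first failure
    loopSum += n * 2                    // map + reduce
}

print(lazySum == loopSum)
```

Whether the two compile down to the same loop depends on the optimizer and on specialization, which is exactly what the benchmarks are meant to measure.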
Even though the internal sequences were recently cleaned up a bit, I think they still exhibit a bit of technical debt from the piecemeal evolution:
- DropWhileSequence eagerly drops the elements in its init method and then lazily iterates over the remainder.
- DropFirstSequence eagerly drops elements in its makeIterator method and then lazily iterates over the remainder.
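For contrast with the hybrid behavior described in the list above, this is roughly what a fully deferred drop-while could look like. DeferredDropWhile is a hypothetical type written for this illustration; it is not the stdlib's DropWhileSequence and makes no claim about how the stdlib should implement it. The counter only demonstrates that no predicate call happens at construction time.

```swift
// Hypothetical wrapper, NOT the stdlib implementation.
struct DeferredDropWhile<Base: Sequence>: Sequence {
    let base: Base
    let predicate: (Base.Element) -> Bool

    struct Iterator: IteratorProtocol {
        var base: Base.Iterator
        let predicate: (Base.Element) -> Bool
        var droppingDone = false

        mutating func next() -> Base.Element? {
            if droppingDone { return base.next() }
            droppingDone = true
            // Skip the leading run of matching elements on the first call only.
            while let element = base.next() {
                if !predicate(element) { return element }
            }
            return nil
        }
    }

    // Neither init nor makeIterator touches the elements.
    func makeIterator() -> Iterator {
        Iterator(base: base.makeIterator(), predicate: predicate)
    }
}

var predicateCalls = 0
let dropped = DeferredDropWhile(base: [1, 2, 3, 10, 1]) { value in
    predicateCalls += 1
    return value < 3
}
let callsAfterInit = predicateCalls  // still 0: nothing was dropped yet
let result = Array(dropped)          // the dropping happens here, during iteration
print(callsAfterInit, result)
```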
The PrefixSequence looks to be always lazy, so after removing the AnySequence overhead, the main remaining issue is that it breaks the .lazy chains, as brought up by @dennisvennink in SR-5754. But I would like to re-measure my MandelbrotSwiftly benchmarks with this change, just to make sure the laziness does its magic… Can I please get a toolchain from PR #20221?
Since we are breaking source and API compatibility here, I think it would be a shame not to use this opportunity to also polish this area and complete the LazySequenceProtocol implementations for the relevant slicing methods before 5.0. At a minimum, we should change the default implementations for dropFirst, drop(while:) and prefix to always eagerly return [Element]. I think this can be done trivially by using the internal sequences to fully materialize the [Element] in the default implementation (I have pitched this before). Surfacing the hybrid eager/lazy implementations helps nobody. Lazy implementations can be added later… right?
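As a rough sketch of what "always eagerly return [Element]" could mean in practice, the extension below wraps the existing slicing methods and materializes their result into an array. The names eagerDropFirst, eagerDrop(while:) and eagerPrefix are hypothetical, chosen only to avoid clashing with the real stdlib methods; an actual change would alter the default implementations themselves.

```swift
extension Sequence {
    // Hypothetical names; each method materializes the sliced
    // sequence into an array before returning.
    func eagerDropFirst(_ k: Int = 1) -> [Element] {
        Array(dropFirst(k))
    }

    func eagerDrop(while predicate: (Element) throws -> Bool) rethrows -> [Element] {
        try Array(drop(while: predicate))
    }

    func eagerPrefix(_ maxLength: Int) -> [Element] {
        Array(prefix(maxLength))
    }
}

// A plain Sequence (not a Collection) as input.
let source = stride(from: 1, through: 6, by: 1)

let afterDropFirst = source.eagerDropFirst(2)
let afterDropWhile = source.eagerDrop(while: { $0 < 4 })
let firstThree = source.eagerPrefix(3)
print(afterDropFirst, afterDropWhile, firstThree)
```

With this shape the eager API has one predictable result type, and the lazy variants can live entirely behind .lazy.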