SE-0234: Removing Sequence.SubSequence

+1, it's good

Yes. SubSequence (and honestly Sequence broadly) is too rough of an edge for how important it is, and anything we can do to simplify it is good.

I don't have a strong feeling here.

N/A

Quick reading, and have spent a lot of time with sequences.

The proposal doesn't mention them, because the opaque type feature doesn't exist yet so it isn't an actual alternative to consider (and waiting is not an option for ABI reasons). But I want to push back a bit on the assumption, both here and in the pitch, that opaque types would be better, even if we had them. The meme that opaque types are always preferable is a false one.

Keeping an air of mystery about what you are returning is not an end in itself. It is a means to two ends — of hiding complexity, and of preserving source compatibility under change.

We hide complexity to help the user understand what is important. If the exact implementation details of what type is returned are likely to distract and confuse the user, and that adding an extra type into the library would be noise, then an opaque type is better. But this comes at a cost. Opaque types are also less clear about exactly what is going on. The question is, is that lack of clarity worth it to reduce the noise.

In the recent example of LazyCompactMap, it was worth it. The lazy sequences are a bit strange, and the laziness would still be spelled out explicitly in the opaque type (because it would be an opaque implementation of LazySequenceProtocol). Lazy types are often wrapped around other lazy types in a chain and those types get unwieldy, so the opaqueness helps there too.

In this case, it's different, because this method is shadowed later in the hierarchy. That shadowing isn't great — shadowing also causes confusion and mishaps — but there's no good alternative. When shadowing like this, it's important to be as clear as possible at every point. You ought to know, when you call suffix, that you have called a version that returned an array not a slice. You ought to know, when you call DropFirstSequence, that you've got a lazily evaluated sequence wrapper. You shouldn't have to wonder what the mystery thing you got back actually is. That would be unhelpful.

It would also be unhelpful, when the contents of that mystery package is an array, to not give the user an actual array. It's really nice to be given an array! Arrays have really important properties you can usefully take advantage of. Arrays are currency types and you should prefer to return them when they're the best option. Even if you think there might be a better implementation in the future, you should think really hard about whether that future potential outweighs the big user benefit of returning an array.

Opaque types also help preserve source stability when you want to change things in later releases. They will let you avoid brittle interfaces when you want to refactor your code. This will also be great for non-ABI-stable libraries; I wish we'd had this feature during the early development of the standard library. But once a library declares ABI stability, it's not so useful, because symbols are forever. So even if we want to change these methods to return something different in the future, we'd have to keep the old methods around until the next ABI break. Yes, recompiling means you'll pick up the new versions, but only when combined with minimum deployment targets. And there's also a real risk of confusion: if the behavior of the new version changes significantly (and presumably it does, to justify changing it), then you're in this odd situation where behavior changes in an opaque way, determined at a distance by things like which compiler you used and what availability settings kick in. That's a pretty bad unintended consequence.

tl;dr: opaque types are great for lots of use cases, but they're not a panacea.

3 Likes

If sequences are potentially single-pass, you really can't get the suffix lazily. To iterate is to consume, so you need to buffer what you consumed if you ever want to see it again.

No methods like this are escaping currently. Escaping closures can be very surprising to users if they ever capture state. They also can't be throwing (when would they throw). This is one reason why map and filter aren't lazy by default. Unlike suffix though, there is a reasonable lazy equivalent for dropWhile which you currently get on LazySequence.

What this currently does is create an iterator, advance it until the predicate returns false, and stash that element in case of that. Then when you iterate over it, it serves that element back up and then continues with the rest of the iterator.

This is again based on the assumption that sequences can be single pass. As it happens, if the iterator is a copyable value type (like almost all are for multi-pass sequences) it will all work out OK because the stored iterator for each iteration is a copy that restores the state each time.

The good thing about making these just extensions instead of customization points is we have a lot more flexibility in the future to deprecate or replace them with new methods (gated on swift version) that do different things, depending on how move-only types work out or if we declare sequence to be multi-pass, or if we deprecate sequence altogether.

2 Likes

I’m not sure the tail-operations (suffix and dropLast) really belong on Sequence at all.

Also, does split have to be eager, or could we make a lazy splitting wrapper sequence?

1 Like

There's a good case to be made that they don't, though probably one that would be deemed not sufficiently justified given the source breakage. That should be a separate subsequent proposal, though. This proposal is a necessary precursor to it.

Given it takes a closure, the same reasoning as applied above means it cannot be lazy by default. A proposal to add a lazy version to lazy might be welcomed, though.

2 Likes
  • What is your evaluation of the proposal?

+1 (I would also support removing the rest of Sequence in lieu of Collection and Iterator being root concepts)

  • Is the problem being addressed significant enough to warrant a change to Swift?

Yes, forcing all return values to the same type makes it impossible to provide an efficient implementation for anything that is naturally a Sequence but not counted and bidirectional.

@Ben_Cohen

This is a nice performance tweak, but it doesn’t justify the significant downsides listed above. It is essentially improving generic performance at the expense of concrete performance. Normally, these kind of generic improvements come at just a tiny cost (e.g. slight increase in compile time or binary size) rather than a significant runtime performance penalty.

Hi Ben, could you clarify the following for me? I'm a bit confused trying to figure out how the highlighted sentence logically follows the rest. You demonstrate the performance regression that happens when the methods aren't dynamically dispatched. You then end the section saying that using generic algorithms that benefit from dynamic dispatch normally comes at a tiny cost rather than a significant runtime performance penalty (probably referring to the preceding benchmarks). Is that right? It seems I didn't quite get what you were trying to convey with that last sentence. Is it simply a side-note? Just in case, sorry for the language gap!

  • What is your evaluation of the proposal?

I once tried to write a single type that could do most of what Sequence.SubSequence would need to do. It didn't seem completely impossible, but it was a nightmare.

Nobody's ever gonna really use Sequence.SubSequence. Dump it.

  • Is the problem being addressed significant enough to warrant a change to Swift?

Yes. We shouldn't be erasing these types.

  • Does this proposal fit well with the feel and direction of Swift?

Yes. This is a sensible clean-up.

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

N/A.

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Just a glance.

2 Likes

I'm absolutely +1, my only concern is that I'm not sure whether to write code for Sequence or Collection anymore after this change. I'd like to make use of the more efficient suffix/etc methods that use SubSequence and I'd have to write my code in terms of Collection for that, but not all sequences in the standard library implement Collection. The proposal states that several types such as Zip2Sequence could be conditionally conformed to Collection after this change, but what's the timeline for this? Will this need to be proposed separately?

My intention was that this proposal includes making them collections (similar to how the conditional conformance proposal made Array and Optional Equatable, codable things Codable etc). The work is pretty trivial once we boot out the and stuff in Sequence.

2 Likes

What is your evaluation of the proposal?
+1

Is the problem being addressed significant enough to warrant a change to Swift?
Yes

Does this proposal fit well with the feel and direction of Swift?
Yes

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?
Read the proposal

What is your evaluation of the proposal?

:+1:

The ability to conform a Sequence to Collections higher up the hierarchy chain in a single type is long overdue.

I tried to conform Zip2Sequence to Collection a few months ago. And while I did finally manage to make it work in the end after hours of frustration, I ended up rewriting almost every single instance method that touched SubSequence.

Fast forward to last week. I git cloned @Ben_Cohen's repo, checked out the PR, created a local development toolchain, loaded it up into Xcode and conformed Zip2Sequence to Collection in literally 5 minutes. (Although it also helped that I already had a full implementation.)

It really is a breeze to work with.

Is the problem being addressed significant enough to warrant a change to Swift?

Yes.

Does this proposal fit well with the feel and direction of Swift?

Yes.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Multiple thorough readings. Already wrote code with it.

:ship: it.

5 Likes

What is your evaluation of the proposal?

+1.

Is the problem being addressed significant enough to warrant a change to Swift?

Yes.

Does this proposal fit well with the feel and direction of Swift?

Yes, it eliminates an unnecessary complexity.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Read the proposal.

I wholeheartedly welcome this proposal, which is finally addressing issues in the critically important area of Swift Standard Library — the Collections framework. I fully support removing the Sequence.SubSequence. But given this problem was actually raison d'être for my work on Swift Benchmarking Suite, this review will be a bit more detail oriented with regards to the proposed solution. I have written about this topic extensively before in Troubled Sequence. I’ll focus only on the basic sequence slicing operations — I have done no work with split.

The sequence slicing benchmarks that feature different underlying types and AnySequence in particular, were added to highlight how utterly destructive to performance this type-erasing abstraction actually is. But the reasons for some dramatic regressions highlighted in the text of this proposal have almost nothing to do with the loss of customization point. They are caused by unspecialized generic code, dynamic dispatch, value type boxing and reference counting. See section Cost of Abstractions, Revisited in the Troubling Sequence for more details. In the end, these regressions are unimportant to the evaluation of this proposal to remove Sequence.SubSequence, because they only highlight the extent of utter optimizer failure in the combination of .lazy and AnySequence, which this proposal completely eliminates: As long as no stdlib API returns AnySequence, we are golden!

Regular eager and lazy API

Since we are breaking source and ABI compatibility for 5.0 with the proposed change, we should do this one as good as we can. In my opinion, we can do better than simply expose the existing implementations of DropFirstSequence, DropWhileSequence and PrefixSequence.

The irregularity of the current sequence slicing implementations, from the perspective of eager/lazy dichotomy is not only bothersome on esthetic grounds, but it has serious impact on the performance of code written in functional style. In my experience, chaining sequence slicing operations can achieve the performance of imperative code only when there is a truly lazy implementation. With the current architecture, this means that relevant sequence slicing methods should also have proper LazySequenceProtocol implementations, as is already the case with drop(while:) and prefix(while:) from SE-0045 implemented in PR #3600. Such code allows the optimizer to boil the functional method chain down to single loop with performance equivalent to hand tuned imperative loop.

Even though the internal sequences were recently cleaned up a bit, I think they still exhibit a bit of technical debt form the piecemeal evolution:

  • DropWhileSequence eagerly drops the elements in its init method and than lazily iterates over the remainder.
  • DropFirstSequence eagerly drops elements in its makeIterator method and than lazily iterates over the remainder.

The PrefixSequence looks to be always lazy, so after removing the AnySequence overhead, the main remaining issue is that it breaks the .lazy chains, as brought up by @dennisvennink in SR-5754. But I would like to re-measure my MandelbrotSwiftly benchmarks with this change, just to make sure the laziness does its magic… Can I please get a toolchain from PR #20221?

Since we are breaking source and API compatibility here, I think it would be shame to not use this opportunity to also polish this area and complete the LazySequenceProtocol implementations for the relevant slicing methods before 5.0. At a minimum, we should change the default implementations for dropFirst, drop(while:) and prefix to always eagerly return [Element]. I think this can be done trivially by using the internal sequences to fully materialize the [Element] in the default implementation (I have pitched this before). Surfacing the the hybrid eager/lazy implementations helps nobody. Lazy implementations can be added later… right?

8 Likes

This recommendation really needs a justification. It's not clear to me why it would be better to return arrays instead of wrapper types. I was pretty intentional when proposing this change in leaving them in-place, so I don't think it's correct to describe their continued presence as debt through evolution. It was a conscious choice, though certainly open to debate.

The current situation in the standard library is things are as lazy as they can be without incurring a downside (other than that of relying on the optimizer to strip off the cost of modest abstractions).

So for example, BidirectionalCollection.reversed, or zip, can return lazy wrappers, because there is no risk from that laziness – it's all win.

Whereas Sequence.filter and Sequence.map are not lazy by default because there are two big downsides to making them lazy: a performance one, because the mapping/filtering will happen on every iteration, which is bad if that operation is costly; and a correctness one, because if the closure has side effects or is nondeterminstic, the user may not realize that they could get different results every time. It also means the closure can throw, unlike in the lazy version.

So you can basically ask two questions whenever you add a method to Sequence: could it being lazy bring benefits, and are there any risks from making it lazy? If the answer is yes/no, it should be lazy by default. If the answer is yes/yes, it should be eager, and there should be a lazy version put onto LazySequenceProtocol.

dropFirst is clearly in the yes/no bucket. Returning [Element] instead of a lazy wrapper with the elements already dropped could be disastrous for performance in many cases, where only the next few elements are consumed. Why allocate and copy potentially huge numbers of elements for no reason? The case needs to be stronger than that. Bear in mind the most common use is dropFirst(1) – so common that we default the argument to 1.

prefix is the same clearly beneficial with no risks bucket.

dropWhile on the other hand falls into the yes beneficial but with risks bucket, for the same reasons as map and filter: it takes a closure. That's why it eagerly drops the elements when called. But there is still a useful middle-ground where the iterator can be consumed eagerly, but then the rest of the sequence produced lazily.

It helps avoid allocating memory (potentially a lot of memory) gratuitously. It also avoids dropFirst() from being O(n) on every call. The same argument could be applied to zip – should we deprecate the current version and instead return [(T,U)]? Should repeatElement also return an array? What is the justification for this performance pessimization? Consistency? The justification needs to be stronger than that.

6 Likes

I understand now the convenience factor of the hybrid default implementation, that doesn't force people to explicitly choose between eager and lazy. Thank you for the explanation, that justification makes perfect sense.

I was coming from the perspective of .lazy sequences, where the eager part messes up the method chaining. That's probably a stronger case to also provide LazySequeceProtocol version, that delays the dropping until the first call to Iterator.next, right?

1 Like

I'm not sure it's all that useful to defer dropping until the first call to next. The times when someone calls makeIterator, but then doesn't consume from it at all are probably pretty tiny (even in the case of zip, you'd only realize a win if the left side of the zip was empty). The trade-off is that by dropping all the elements at the start, you avoid having to check the count is zero every time you call next. It's probably worth a quick benchmarking to see if that makes any difference.

I think someone mentioned upstream that DropFirstSequence should conditionally conform to LazySequenceProtocol where Base: LazySequenceProtocol, which would avoid the current un-lazying.

It would be a good starter bug/proposal for someone to go through and find other eager things that should be added to lazy. split is definitely a case where a lazy equivalent would be good to add. Probably also, any future proposals for eagerly-evaluated additions should usually be paired with a lazy version if appropriate.

I'm specifically thinking about lazy method chaining, that allows the compiler to fuse them all into single loop. I don't have examples that include dropFirst, but to illustrate what I have in mind:

Or a little bit of manual fusion of closures:

When I replaced prefix with my implementation:

I'd love to verify that #PR 20221 matches that performance. Can you please provide us with downloadable toolchain?

SE-0234 has been accepted; thank you all for your participation.

I'll be closing this thread, but please feel free to take the discussion about lazy sequence implementations to the evolution discussion forum.