SE-0222: Lazy compactMap Sequence

The review of SE-0222: Lazy compactMap Sequence begins now and runs through Sunday, August 5th, 2018.

Reviews are an important part of the Swift evolution process. All review feedback should be either on this forum thread or, if you would like to keep your feedback private, directly to me as the review manager via email or direct message on the forums. If you send me email, please put "SE-0222" somewhere in the subject line.

What goes into a review of a proposal?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the direction of Swift.

When reviewing a proposal, here are some questions to consider:

  • What is your evaluation of the proposal?

  • Is the problem being addressed significant enough to warrant a change to Swift?

  • Does this proposal fit well with the feel and direction of Swift?

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

As always, thank you for contributing to Swift.

John McCall
Review Manager

5 Likes
  • What is your evaluation of the proposal?

Worthwhile optimisation.

  • Is the problem being addressed significant enough to warrant a change to Swift?

Yes, it is little extra code.

  • Does this proposal fit well with the feel and direction of Swift?

Yes, a lot of effort goes into optimizing the collections library.

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

The Java streams library does similar optimizations.

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Played with the code for a bit.

It seems like the lazy system keeps getting more complicated, not less. Are we likely to need more overloads and types like this in the future? Have we evaluated an approach where we try to reduce the number of types and overloads by, for instance, treating filter as sugar for compactMap?

If these sorts of options have been evaluated and found wanting, I think this proposal is a good one; I just want to make sure we’re not needlessly missing out on a cleaner design.

5 Likes

This was my main thought when reading the proposal. It's somewhat unsatisfying to have LazyMapCollection, LazyFilterCollection and LazyCompactMapCollection, when they could all theoretically be implemented using LazyCompactMapCollection and a suitable _transform. Or perhaps they could all be folded into LazyCollection, which isn't very useful on its own, by giving it an appropriate closure instance variable.

These options might be unacceptable for performance reasons (or source compatibility reasons?) but it would be nice if this was mentioned in the proposal, or clarified in this review thread. I suppose this could all become more of an implementation detail if Swift had generalised existentials, but currently all of these types are exposed to users in ways that can be verbose or confusing.

Edit: Looking into it, the discussion thread has some more detail on this that unfortunately didn't make it into the proposal. e.g.

This is a good point that you could only work around by e.g. storing a Boolean instance variable that keeps track of whether filter or similar had ever been called on the Lazy*Collection. So perhaps the answer is performance reasons. There is some benchmarking of various options in that thread, but it is quite hard for me to understand and summarise. The part about performance testing at the end of the proposal is a link to a fairly significant amount of code with no summary of results.

Hmm I thought I had mentioned that in the proposal but I guess I didn't. I should probably add it in now

Looks like I forgot that, too. I guess it's a good thing, because the results have changed since I made the proposal. The benchmarking code is in main.swift, btw, the rest of the files are implementing the change so it can be run without the updated stdlib (and can compare the new and old). Current Lazy and New Lazy are testing the stacking of lazy maps and filters, Current CompactMap and New CompactMap are comparing against a manually made single compact map, and Manual Loop is a manual loop (and my original performance comparison target, due to it originally being consistently the fastest)

Results under the Swift 4.1 compiler (the one originally used):

      Current Lazy: 0.197009034s
          New Lazy: 0.022848093s
Current CompactMap: 2.60150722s
    New CompactMap: 0.022660378s
       Manual Loop: 0.019604643s

Under the latest development compiler, all tests run at the same speed as the manual loop, meaning that the optimizer is capable of optimizing the old lazy system (which I previously didn't believe would be possible), however using -swift-version 5 on the latest development compiler (which enables the map-map and filter-filter flattening change mentioned in this proposal) breaks it again:

      Current Lazy: 0.119408128s
          New Lazy: 0.019238704s
Current CompactMap: 0.019551983s
    New CompactMap: 0.019371183s
       Manual Loop: 0.020673078s

Given this, it looks like the main benefits of this proposal would be the fact that Lazy*Collections no longer stack all over each other, and the lower reliance on finicky compiler optimizations (as shown by the -swift-version 5 code breaking the optimization), however the original benefit seems to be doable by the optimizer now even if it might need better tuning to work more of the time.

2 Likes

Great, thanks for looking into this and the update. It's not clear to me if these new results have any impact on the idea of collapsing some of these very similar types, do you have any opinions about that?

My thought on collapsing is that the count and underestimatedCount optimization with LazyMapCollection is something that I would consider important. On the other hand, the optimization on LazyFilterCollection seems much less important to me, so I would be fine with collapsing that with LazyCompactMapCollection.

You could theoretically retain that information with a flag on the type (e.g. private var preservesUnderlyingCollectionCount = true that you set to false when you use an operation that might change the count, like a filter) and therefore possibly collapse them all into a single LazyCollection type. I don't really understand the tradeoffs (performance, standard library size, type checking performance, source compatibility, simplicity, etc) here.

That would work for Filter, but not for Map as it also has a bunch of conditional conformances to things like RandomAccessCollection which you can't just throw behind if statements

There has been exactly one structured review of this proposal. I'm going to hold this thread open for another couple of days; if you have any feedback on this proposal, please make it now.

Does the phrasing here imply that only posts that go through the review questions one-by-one are really considered to be reviews? The standard review template says “When reviewing a proposal, here are some questions to consider”, which I always interpreted as a loose guideline, not a rule.

It's not a voting system; we do read and consider all the feedback we receive. But your posts are a great example of why I like to get structured reviews, because after reading your posts, my summary is currently "concerned about slow expansion in the number of lazy collection types; ideas about how to avoid that; no clear +/- position on proposal as it actually stands".

Do you approve of the idea of making compactMap a lazy operation on lazy collections?

Makes sense, I can see that might not be clear because I only posted about implementation details rather than the overall goal. I think the goal of the proposal is good, and the proposed implementation is also fine, I was just trying to understand the tradeoffs because some of the information from the discussion thread was missing.

I'm not sure this is the idea in the proposal. As I understand it, it is already a lazy operation but it results in a lot of nested types and some suboptimality in performance. e.g.

let a = [1, nil, 2, 3, 4]
let b = a.lazy.compactMap { $0 } // LazyMapCollection<LazyFilterCollection<LazyMapCollection<[Int?], Int?>>, Int>

Removing this fairly unhelpful chain of nested types seems like a mostly straightforward win, except for the small source compatibility concern, which is probably why there wasn't much discussion that wasn't about the implementation.

I think this proposal is plugging and important performance sensitive hole in the lazy system, fully in line with its current implementation.

To the concern about raising complexity of the lazy system has been raised above, I can say that after having played quite extensively with tangentially related lazy sequences that are currently unsupported in standard library, due to historically ad-hoc approach to its implementation, the lazy subsystem is the only way to get a well performing code. Its current implementation is leaking a ton of implementation details (there is sprawling plethora of concrete return types, that aren't useful outside of their super-narrow role of opaque container conforming to the Sequence/Collection protocols), but until we have much more powerful type system, we have to live with that. This proposal even helps to curb this a little bit by collapsing a chain of lazy calls into single return type that doesn't propagate the nested generics too much.

Therefore, in my opinion, this proposal is an important stopgap measure to get more performant sequences and collections in the standard library. :+1:

6 Likes

A lot of people are on vacation, so the Core Team's going to talk about this next week.

Sorry for the long delay.

Discussion on this proposal was generally in favor of it, but several people expressed reservations about increasing the apparent complexity of the library's lazy subsystem. The Core Team agrees that it's unfortunate that decisions like this are exposed to clients of the library; this objection might be addressed by a feature like opaque result types. The Core Team also observes that the performance benefits of this change currently appear to be negligible because of improvements to the optimizer. Accordingly, this proposal has been rejected. If future developments make it profitable, we can reconsider it then.