Introduce lazy version of compactMap

It seems small and uncontroversial from my viewpoint, so it makes sense that there wasn’t much activity. The only questions in my mind are ones about compatibility and implementation that someone from the standard library team will probably weigh in on.

I performed some measurements (see the code at the bottom). The idea was to a) understand whether chaining lazy.filter.map.filter.map is as bad as the proposal suggests (no it’s not that bad), and b) find out whether the implementation of filter and map in terms of the new LazyCompactMapSequence/LazyCompactMapCollection regresses performance or not, because, obviously, there should be a price to be paid for the convenience of a single type, as opposed to the long chain of LazyMapCollection<LazyFilterCollection<LazyMapCollection<...>>>.

Results

What happened next will surprise you!
Seriously, I was quite surprised myself. Here are the results (yet again, the code is a bit long, so it’s presented below).

> $ time ../nestedfiltermap bespoke
true
../nestedfiltermap bespoke  0.07s user 0.01s system 94% cpu 0.080 total
                                                                                                                        
> $ time ../nestedfiltermap lazyFilterMap
true
../nestedfiltermap lazyFilterMap  0.26s user 0.01s system 98% cpu 0.273 total
                                                                                                                        
> $ time ../nestedfiltermap lazyFilterMap2
true
../nestedfiltermap lazyFilterMap2  0.25s user 0.01s system 98% cpu 0.262 total
                                                                                                                        
> $ time ../nestedfiltermap lazyCompactMap
true
../nestedfiltermap lazyCompactMap  7.61s user 0.01s system 99% cpu 7.630 total
                                                                                                                        
> $ time ../nestedfiltermap lazyCompactMap2
true
../nestedfiltermap lazyCompactMap2  0.18s user 0.01s system 97% cpu 0.196 total

Tests with 2 suffix use the proposed types.

Outcomes

  • Surprisingly the lazyCompactMap (that I implemented with an intention to establish a reasonable baseline) turned out to be the slowest contestant, by far.
  • Simple chaining (i.e. lazy.filter {}.map{}.filter{}) is not slow. Perhaps due to artificially simple closures, but I would argue that if you chain these calls in this manner, then the closures would be quite simple.
  • lazyFilterMap2 is actually faster than lazyFilterMap, which means that even if there is some extra overhead added by transforming boolean values into Optionals in order to be compatible with compactMap, this overhead is being effectively eliminated by the optimizer.

Some more results

For comparison, here are the results of using AnySequence instead of a simple array of Ints.

> $ time ../nestedfiltermap bespoke
true
../nestedfiltermap bespoke  6.01s user 0.02s system 99% cpu 6.070 total
                                                                                                                        
> $ time ../nestedfiltermap lazyFilterMap
true
../nestedfiltermap lazyFilterMap  6.05s user 0.01s system 99% cpu 6.069 total
                                                                                                                        
> $ time ../nestedfiltermap lazyFilterMap2
true
../nestedfiltermap lazyFilterMap2  6.13s user 0.01s system 99% cpu 6.147 total
                                                                                                                        
> $ time ../nestedfiltermap lazyCompactMap
true
../nestedfiltermap lazyCompactMap  7.42s user 0.01s system 99% cpu 7.439 total
                                                                                                                        
> $ time ../nestedfiltermap lazyCompactMap2
true
../nestedfiltermap lazyCompactMap2  5.96s user 0.01s system 99% cpu 5.971 total

The distribution of the results looks quite similar, just a larger overhead of using AnySequence that’s applied to all the cases.

Code

#if false

public let first10k = AnySequence(Array(0...100_000-1))
typealias MainIterator = AnySequence<Int>.Iterator

#else

public let first10k = Array(0...100_000-1)
typealias MainIterator = Array<Int>.Iterator

#endif

@inline(never)
public func lazyFilterMap(_ N: Int) {
  var result = 0
  for _ in 1...10*N {
    let numbers = first10k.lazy
      .filter { $0 % 3 == 0 }
      .map { $0 * 2 }
      .filter { $0 % 8 == 0 }
      .map { $0 / 2 + 1}
    // print(type(of: numbers))
    result = numbers.reduce(into: 0) { $0 += $1 }
  }
  print(result == 416691666)
}

@inline(never)
public func lazyCompactMap(_ N: Int) {
  var result = 0
  for _ in 1...10*N {
    let numbers = first10k.lazy
      .compactMap { (x: Int) -> Int? in
        if x % 3 != 0 { return nil }
        let y = x * 2
        if y % 8 != 0 { return nil }
        let z = y / 2 + 1
        return z
      }
    // print(type(of: numbers))
    result = numbers.reduce(into: 0) { $0 += $1 }
  }
  print(result == 416691666)
}

struct Numbers: Sequence {
  func makeIterator() -> Iterator {
    return Iterator()
  }

  struct Iterator: IteratorProtocol {
    var it: MainIterator

    init() {
      self.it = first10k.makeIterator()
    }

    mutating func next() -> Int? {
      while true {
        if let x = it.next() {
          if x % 3 != 0 { continue }
          let y = x * 2
          if y % 8 != 0 { continue }
          let z = y / 2 + 1
          return z
        } else {
          return nil
        }
      }
    }
  }
}

@inline(never)
public func bespoke(_ N: Int) {
  var result = 0
  for _ in 1...10*N {
    let numbers = Numbers()
    // print(type(of: numbers))
    result = numbers.reduce(into: 0) { $0 += $1 }
  }
  print(result == 416691666)
}

@inline(never)
public func lazyFilterMap2(_ N: Int) {
  var result = 0
  for _ in 1...10*N {
    let numbers = first10k.lazy
      .filter2 { $0 % 3 == 0 }
      .map2 { $0 * 2 }
      .filter2 { $0 % 8 == 0 }
      .map2 { $0 / 2 + 1}
    // print(type(of: numbers))
    result = numbers.reduce(into: 0) { $0 += $1 }
  }
  print(result == 416691666)
}

@inline(never)
public func lazyCompactMap2(_ N: Int) {
  var result = 0
  for _ in 1...10*N {
    let numbers = first10k.lazy
      .compactMap2 { (x: Int) -> Int? in
        if x % 3 != 0 { return nil }
        let y = x * 2
        if y % 8 != 0 { return nil }
        let z = y / 2 + 1
        return z
      }
    // print(type(of: numbers))
    result = numbers.reduce(into: 0) { $0 += $1 }
  }
  print(result == 416691666)
}

if let which = CommandLine.arguments.dropFirst().first {
  switch which {
  case "lazyFilterMap": lazyFilterMap(1)
  case "lazyFilterMap2": lazyFilterMap2(1)
  case "lazyCompactMap": lazyCompactMap(1)
  case "lazyCompactMap2": lazyCompactMap2(1)
  case "bespoke": bespoke(1)
  default: print("No can do")
  }
}
2 Likes

In fact! What’s even more important, because I did not use the proposed implementation and instead just copied various pieces from there, I ended up adding filter and map methods returning LazyConcatMapSequence directly to LazySequenceProtocol, which, given the performance numbers that I got, means that we can try and get rid of standalone LazyMapSequence and LazyFilterSequence, if that’s possible without breaking source.

2 Likes

Very interesting findings. Thank you for this.

@TellowKrinkle would you be willing to try removing LazyMap and LazyFilter types in favor of the new type and update the proposal accordingly?

Dropping LazyMap would lose its O(1) count (and its nonzero underestimatedCount) which might not be preferable, but dropping LazyFilter would probably make sense. The only thing that LazyFilter does that LazyCompactMap doesn’t is the check in _customContainsEquatableElement that checks whether or not the element matches the predicate before actually searching the collection. I don’t know much that helps things but as long as you think losing that is fine that I’m happy to try the removal.

Also did you remember to compile with whole module optimization? All your tests run in 0.00 seconds (except lazyCompactMap due to some weird optimizer bug that’s new in Swift 4.1 and I should report) when I test with wmo on.
Edit: In the test I used (the one I linked in my original post) I got a pretty big improvement

Good catch!

FWIW, I only added this optimization fairly recently for no other reason than “because I can” (https://github.com/apple/swift/commit/dd5ba51a196d4f2a8a90433524982dcb8e46ad01).

Let me try that.

@TellowKrinkle WMO did not change anything. :man_shrugging: Please try the AnySequence version of the code, maybe? It should be significantly slower.

Wait a second… Can’t we just call the compactMap’s transform on the element in question and if it returns nil then it essentially would mean that it’s not found?

The element passed to contains(_:) would be the transformed element, not the original, so this wouldn’t be what we need.

1 Like

:man_facepalming:

1 Like

I was going to try that, but then it didn’t compile and I realized I’d need the inverse of the function, so that won’t actually work.

I turned the code snippets above into benchmarks, and ran them with and without the proposed changes in the stdlib. (Please note that the “after” keeps the LazyMap... and LazyFilter... types and corresponding functions, i.e. it is pretty much apple/swift#14841).

Here are the results:

Before
> bin/Benchmark_O ChainedFilterMap FatCompactMap --num-samples=5
#,TEST,SAMPLES,MIN(us),MAX(us),MEAN(us),SD(us),MEDIAN(us)
1,ChainedFilterMap,5,1293,1361,1317,29,1307
2,FatCompactMap,5,192573,201983,196937,4094,195579

Totals,2,193866,203344,198254,0,0
After
> bin/Benchmark_O ChainedFilterMap FatCompactMap --num-samples=5
#,TEST,SAMPLES,MIN(us),MAX(us),MEAN(us),SD(us),MEDIAN(us)
1,ChainedFilterMap,5,1348,1357,1351,3,1350
2,FatCompactMap,5,1295,1308,1298,5,1296

Totals,2,2643,2665,2649,0,0

My interpretation of the results is that we don’t lose much in the chaining case, but gain a lot in the compactMap case, and also gain a simpler type for complex lazy chains.

With this, I would like to withdraw my comment about dropping LazyMap and LazyFilter types, to limit the scope of the proposal. Sorry for this not very well thought through remark.

Here’s what I get with AnySequence:

> swiftc -O -wmo *.swift
> time ./main bespoke
true
0.76user 0.00system 0:00.76elapsed 100%CPU (0avgtext+0avgdata 7436maxresident)k
0inputs+0outputs (0major+267minor)pagefaults 0swaps
> time ./main lazyFilterMap
true
0.73user 0.00system 0:00.73elapsed 99%CPU (0avgtext+0avgdata 6904maxresident)k
0inputs+0outputs (0major+261minor)pagefaults 0swaps
> time ./main lazyFilterMap2
true
0.72user 0.00system 0:00.72elapsed 99%CPU (0avgtext+0avgdata 7324maxresident)k
0inputs+0outputs (0major+265minor)pagefaults 0swaps
> time ./main lazyCompactMap
true
0.81user 0.00system 0:00.82elapsed 99%CPU (0avgtext+0avgdata 7268maxresident)k
0inputs+0outputs (0major+269minor)pagefaults 0swaps
> time ./main lazyCompactMap2
true
0.70user 0.00system 0:00.71elapsed 99%CPU (0avgtext+0avgdata 7048maxresident)k
0inputs+0outputs (0major+264minor)pagefaults 0swaps

When using your code with the array, I get similar results for everything (except lazyCompactMap), however if I change the array to a CountableClosedRange<Int>, you can see an improvement (N is 100 so the times are visible)

> swiftc -O -wmo *.swift
> time ./main bespoke
true
0.27user 0.00system 0:00.28elapsed 99%CPU (0avgtext+0avgdata 6576maxresident)k
0inputs+0outputs (0major+254minor)pagefaults 0swaps
> time ./main lazyFilterMap
true
0.35user 0.00system 0:00.35elapsed 100%CPU (0avgtext+0avgdata 6704maxresident)k
0inputs+0outputs (0major+253minor)pagefaults 0swaps
> time ./main lazyFilterMap2
true
0.27user 0.00system 0:00.27elapsed 100%CPU (0avgtext+0avgdata 6728maxresident)k
0inputs+0outputs (0major+257minor)pagefaults 0swaps
> time ./main lazyCompactMap
true
89.45user 0.18system 1:29.68elapsed 99%CPU (0avgtext+0avgdata 7372maxresident)k
0inputs+0outputs (0major+270minor)pagefaults 0swaps
> time ./main lazyCompactMap2
true
0.27user 0.00system 0:00.27elapsed 99%CPU (0avgtext+0avgdata 7080maxresident)k
0inputs+0outputs (0major+259minor)pagefaults 0swaps

If you invert the filter checks, the difference becomes visible even with the array (the more things that pass the first filter, the more the performance of the multiple filter is affected, since something that fails the first filter is effectively the same in a chained filter as it is in a compactmap.)
Note that I also added a manual loop since ideally I would want the performance of a map filter combination to be the same as if I had manually made a for loop.

> swiftc -O -wmo *.swift
> time ./main bespoke
true
0.48user 0.00system 0:00.48elapsed 99%CPU (0avgtext+0avgdata 7964maxresident)k
0inputs+0outputs (0major+455minor)pagefaults 0swaps
> time ./main lazyFilterMap
true
0.62user 0.00system 0:00.62elapsed 100%CPU (0avgtext+0avgdata 7776maxresident)k
0inputs+0outputs (0major+453minor)pagefaults 0swaps
> time ./main lazyFilterMap2
true
0.48user 0.00system 0:00.49elapsed 99%CPU (0avgtext+0avgdata 7676maxresident)k
0inputs+0outputs (0major+453minor)pagefaults 0swaps
> time ./main lazyCompactMap
true
34.51user 0.06system 0:34.59elapsed 99%CPU (0avgtext+0avgdata 7900maxresident)k
0inputs+0outputs (0major+461minor)pagefaults 0swaps
> time ./main lazyCompactMap2
true
0.48user 0.00system 0:00.48elapsed 100%CPU (0avgtext+0avgdata 7848maxresident)k
0inputs+0outputs (0major+454minor)pagefaults 0swaps
> time ./main loop
true
0.43user 0.00system 0:00.44elapsed 99%CPU (0avgtext+0avgdata 7916maxresident)k
0inputs+0outputs (0major+455minor)pagefaults 0swaps

Updated Code (works if you use it as the main.swift along with the files I linked earlier)

#if false

public let first10k = AnySequence(0...100_000-1)
public let targetResult = 2500049999
typealias MainIterator = AnySequence<Int>.Iterator

#elseif true

public let first10k = Array(0...100_000-1)
public let targetResult = 2500049999
typealias MainIterator = Array<Int>.Iterator

#else 

public let first10k = 0...100_000-1
public let targetResult = 2500049999
typealias MainIterator = CountableClosedRange<Int>.Iterator

#endif

@inline(never)
public func lazyFilterMap(_ N: Int) {
  var result = 0
  for _ in 1...10*N {
    let numbers = first10k.lazy
      .filter { $0 % 3 != 0 }
      .map { $0 * 2 }
      .filter { $0 % 8 != 0 }
      .map { $0 / 2 + 1 }
      // .filter { $0 % 9 != 0 }
    // print(type(of: numbers))
    result = numbers.reduce(into: 0) { $0 += $1 }
  }
  print(result == targetResult)
}

@inline(never)
public func lazyFilterMap2(_ N: Int) {
  var result = 0
  for _ in 1...10*N {
    let numbers = first10k.lazy2
      .filter { $0 % 3 != 0 }
      .map { $0 * 2 }
      .filter { $0 % 8 != 0 }
      .map { $0 / 2 + 1 }
      // .filter { $0 % 9 != 0 }
    // print(type(of: numbers))
    result = numbers.reduce(into: 0) { $0 += $1 }
  }
  print(result == targetResult)
}

@inline(never)
public func lazyCompactMap(_ N: Int) {
  var result = 0
  for _ in 1...10*N {
    let numbers = first10k.lazy
      .compactMap { (x: Int) -> Int? in
        if x % 3 == 0 { return nil }
        let y = x * 2
        if y % 8 == 0 { return nil }
        let z = y / 2 + 1
        // if z % 9 == 0 { return nil }
        return z
      }
    // print(type(of: numbers))
    result = numbers.reduce(into: 0) { $0 += $1 }
  }
  print(result == targetResult)
}

@inline(never)
public func lazyCompactMap2(_ N: Int) {
  var result = 0
  for _ in 1...10*N {
    let numbers = first10k.lazy2
      .compactMap { (x: Int) -> Int? in
        if x % 3 == 0 { return nil }
        let y = x * 2
        if y % 8 == 0 { return nil }
        let z = y / 2 + 1
        // if z % 9 == 0 { return nil }
        return z
      }
    // print(type(of: numbers))
    result = numbers.reduce(into: 0) { $0 += $1 }
  }
  print(result == targetResult)
}

struct Numbers: Sequence {
  func makeIterator() -> Iterator {
    return Iterator()
  }

  struct Iterator: IteratorProtocol {
    var it: MainIterator

    init() {
      self.it = first10k.makeIterator()
    }

    mutating func next() -> Int? {
      while true {
        if let x = it.next() {
          if x % 3 == 0 { continue }
          let y = x * 2
          if y % 8 == 0 { continue }
          let z = y / 2 + 1
          // if z % 9 == 0 { continue }
          return z
        } else {
          return nil
        }
      }
    }
  }
}

@inline(never)
public func bespoke(_ N: Int) {
  var result = 0
  for _ in 1...10*N {
    let numbers = Numbers()
    // print(type(of: numbers))
    result = numbers.reduce(into: 0) { $0 += $1 }
  }
  print(result == targetResult)
}

@inline(never)
public func loop(_ N: Int) {
  var result = 0
  for _ in 1...10 * N {
    var tmpResult = 0
    for x in first10k {
      if x % 3 == 0 { continue }
      let y = x * 2
      if y % 8 == 0 { continue }
      let z = y / 2 + 1
      // if z % 9 == 0 { continue }
      tmpResult += z
    }
    result = tmpResult
  }
  // print(result)
  print(result == targetResult)
}

if let which = CommandLine.arguments.dropFirst().first {
  let n = 100
  switch which {
  case "lazyFilterMap": lazyFilterMap(n)
  case "lazyFilterMap2": lazyFilterMap2(n)
  case "lazyCompactMap": lazyCompactMap(n)
  case "lazyCompactMap2": lazyCompactMap2(n)
  case "bespoke": bespoke(n)
  case "loop": loop(n)
  default: print("No can do")
  }
}
1 Like

I see what you meant by using WMO. It’s a bit of a cheating to use it here as the results will not be representative of what it would look like when the change lands in the standard library. When the LazyCompactMapCollection is in the Swift module, WMO won’t be able to help, IIUC.

One other important metric for the record: the binary size of the resulting dylib. (master as of 12c67608 vs same revision with the proposed changes applied on top).

Title                              Section             Old             New  Percent
libswiftCore.dylib                  __text:        5829618         5875826     0.8%

I just tested and when compiling with my branch of the compiler (with just -O and -swift-version 5) it runs about as fast as the version compiled with -wmo rather than as fast as the version compiled without -wmo.

Terms of Service

Privacy Policy

Cookie Policy