Introduce lazy version of compactMap

moiseev · March 20, 2018, 9:50pm

I turned the code snippets above into benchmarks, and ran them with and without the proposed changes in the stdlib. (Please note that the "after" keeps the LazyMap... and LazyFilter... types and corresponding functions, i.e. it is pretty much apple/swift#14841).

Here are the results:

Before

> bin/Benchmark_O ChainedFilterMap FatCompactMap --num-samples=5
#,TEST,SAMPLES,MIN(us),MAX(us),MEAN(us),SD(us),MEDIAN(us)
1,ChainedFilterMap,5,1293,1361,1317,29,1307
2,FatCompactMap,5,192573,201983,196937,4094,195579

Totals,2,193866,203344,198254,0,0

After

> bin/Benchmark_O ChainedFilterMap FatCompactMap --num-samples=5
#,TEST,SAMPLES,MIN(us),MAX(us),MEAN(us),SD(us),MEDIAN(us)
1,ChainedFilterMap,5,1348,1357,1351,3,1350
2,FatCompactMap,5,1295,1308,1298,5,1296

Totals,2,2643,2665,2649,0,0

My interpretation of the results is that we don't lose much in the chaining case, but gain a lot in the compactMap case, and also gain a simpler type for complex lazy chains.

moiseev · March 20, 2018, 9:53pm

With this, I would like to withdraw my comment about dropping LazyMap and LazyFilter types, to limit the scope of the proposal. Sorry for this not very well thought through remark.

TellowKrinkle · March 20, 2018, 10:46pm

Here's what I get with AnySequence:

> swiftc -O -wmo *.swift
> time ./main bespoke
true
0.76user 0.00system 0:00.76elapsed 100%CPU (0avgtext+0avgdata 7436maxresident)k
0inputs+0outputs (0major+267minor)pagefaults 0swaps
> time ./main lazyFilterMap
true
0.73user 0.00system 0:00.73elapsed 99%CPU (0avgtext+0avgdata 6904maxresident)k
0inputs+0outputs (0major+261minor)pagefaults 0swaps
> time ./main lazyFilterMap2
true
0.72user 0.00system 0:00.72elapsed 99%CPU (0avgtext+0avgdata 7324maxresident)k
0inputs+0outputs (0major+265minor)pagefaults 0swaps
> time ./main lazyCompactMap
true
0.81user 0.00system 0:00.82elapsed 99%CPU (0avgtext+0avgdata 7268maxresident)k
0inputs+0outputs (0major+269minor)pagefaults 0swaps
> time ./main lazyCompactMap2
true
0.70user 0.00system 0:00.71elapsed 99%CPU (0avgtext+0avgdata 7048maxresident)k
0inputs+0outputs (0major+264minor)pagefaults 0swaps

When using your code with the array, I get similar results for everything (except lazyCompactMap), however if I change the array to a CountableClosedRange<Int>, you can see an improvement (N is 100 so the times are visible)

> swiftc -O -wmo *.swift
> time ./main bespoke
true
0.27user 0.00system 0:00.28elapsed 99%CPU (0avgtext+0avgdata 6576maxresident)k
0inputs+0outputs (0major+254minor)pagefaults 0swaps
> time ./main lazyFilterMap
true
0.35user 0.00system 0:00.35elapsed 100%CPU (0avgtext+0avgdata 6704maxresident)k
0inputs+0outputs (0major+253minor)pagefaults 0swaps
> time ./main lazyFilterMap2
true
0.27user 0.00system 0:00.27elapsed 100%CPU (0avgtext+0avgdata 6728maxresident)k
0inputs+0outputs (0major+257minor)pagefaults 0swaps
> time ./main lazyCompactMap
true
89.45user 0.18system 1:29.68elapsed 99%CPU (0avgtext+0avgdata 7372maxresident)k
0inputs+0outputs (0major+270minor)pagefaults 0swaps
> time ./main lazyCompactMap2
true
0.27user 0.00system 0:00.27elapsed 99%CPU (0avgtext+0avgdata 7080maxresident)k
0inputs+0outputs (0major+259minor)pagefaults 0swaps

If you invert the filter checks, the difference becomes visible even with the array (the more things that pass the first filter, the more the performance of the multiple filter is affected, since something that fails the first filter is effectively the same in a chained filter as it is in a compactmap.)
Note that I also added a manual loop since ideally I would want the performance of a map filter combination to be the same as if I had manually made a for loop.

> swiftc -O -wmo *.swift
> time ./main bespoke
true
0.48user 0.00system 0:00.48elapsed 99%CPU (0avgtext+0avgdata 7964maxresident)k
0inputs+0outputs (0major+455minor)pagefaults 0swaps
> time ./main lazyFilterMap
true
0.62user 0.00system 0:00.62elapsed 100%CPU (0avgtext+0avgdata 7776maxresident)k
0inputs+0outputs (0major+453minor)pagefaults 0swaps
> time ./main lazyFilterMap2
true
0.48user 0.00system 0:00.49elapsed 99%CPU (0avgtext+0avgdata 7676maxresident)k
0inputs+0outputs (0major+453minor)pagefaults 0swaps
> time ./main lazyCompactMap
true
34.51user 0.06system 0:34.59elapsed 99%CPU (0avgtext+0avgdata 7900maxresident)k
0inputs+0outputs (0major+461minor)pagefaults 0swaps
> time ./main lazyCompactMap2
true
0.48user 0.00system 0:00.48elapsed 100%CPU (0avgtext+0avgdata 7848maxresident)k
0inputs+0outputs (0major+454minor)pagefaults 0swaps
> time ./main loop
true
0.43user 0.00system 0:00.44elapsed 99%CPU (0avgtext+0avgdata 7916maxresident)k
0inputs+0outputs (0major+455minor)pagefaults 0swaps

Updated Code (works if you use it as the main.swift along with the files I linked earlier)

#if false

public let first10k = AnySequence(0...100_000-1)
public let targetResult = 2500049999
typealias MainIterator = AnySequence<Int>.Iterator

#elseif true

public let first10k = Array(0...100_000-1)
public let targetResult = 2500049999
typealias MainIterator = Array<Int>.Iterator

#else 

public let first10k = 0...100_000-1
public let targetResult = 2500049999
typealias MainIterator = CountableClosedRange<Int>.Iterator

#endif

@inline(never)
public func lazyFilterMap(_ N: Int) {
  var result = 0
  for _ in 1...10*N {
    let numbers = first10k.lazy
      .filter { $0 % 3 != 0 }
      .map { $0 * 2 }
      .filter { $0 % 8 != 0 }
      .map { $0 / 2 + 1 }
      // .filter { $0 % 9 != 0 }
    // print(type(of: numbers))
    result = numbers.reduce(into: 0) { $0 += $1 }
  }
  print(result == targetResult)
}

@inline(never)
public func lazyFilterMap2(_ N: Int) {
  var result = 0
  for _ in 1...10*N {
    let numbers = first10k.lazy2
      .filter { $0 % 3 != 0 }
      .map { $0 * 2 }
      .filter { $0 % 8 != 0 }
      .map { $0 / 2 + 1 }
      // .filter { $0 % 9 != 0 }
    // print(type(of: numbers))
    result = numbers.reduce(into: 0) { $0 += $1 }
  }
  print(result == targetResult)
}

@inline(never)
public func lazyCompactMap(_ N: Int) {
  var result = 0
  for _ in 1...10*N {
    let numbers = first10k.lazy
      .compactMap { (x: Int) -> Int? in
        if x % 3 == 0 { return nil }
        let y = x * 2
        if y % 8 == 0 { return nil }
        let z = y / 2 + 1
        // if z % 9 == 0 { return nil }
        return z
      }
    // print(type(of: numbers))
    result = numbers.reduce(into: 0) { $0 += $1 }
  }
  print(result == targetResult)
}

@inline(never)
public func lazyCompactMap2(_ N: Int) {
  var result = 0
  for _ in 1...10*N {
    let numbers = first10k.lazy2
      .compactMap { (x: Int) -> Int? in
        if x % 3 == 0 { return nil }
        let y = x * 2
        if y % 8 == 0 { return nil }
        let z = y / 2 + 1
        // if z % 9 == 0 { return nil }
        return z
      }
    // print(type(of: numbers))
    result = numbers.reduce(into: 0) { $0 += $1 }
  }
  print(result == targetResult)
}

struct Numbers: Sequence {
  func makeIterator() -> Iterator {
    return Iterator()
  }

  struct Iterator: IteratorProtocol {
    var it: MainIterator

    init() {
      self.it = first10k.makeIterator()
    }

    mutating func next() -> Int? {
      while true {
        if let x = it.next() {
          if x % 3 == 0 { continue }
          let y = x * 2
          if y % 8 == 0 { continue }
          let z = y / 2 + 1
          // if z % 9 == 0 { continue }
          return z
        } else {
          return nil
        }
      }
    }
  }
}

@inline(never)
public func bespoke(_ N: Int) {
  var result = 0
  for _ in 1...10*N {
    let numbers = Numbers()
    // print(type(of: numbers))
    result = numbers.reduce(into: 0) { $0 += $1 }
  }
  print(result == targetResult)
}

@inline(never)
public func loop(_ N: Int) {
  var result = 0
  for _ in 1...10 * N {
    var tmpResult = 0
    for x in first10k {
      if x % 3 == 0 { continue }
      let y = x * 2
      if y % 8 == 0 { continue }
      let z = y / 2 + 1
      // if z % 9 == 0 { continue }
      tmpResult += z
    }
    result = tmpResult
  }
  // print(result)
  print(result == targetResult)
}

if let which = CommandLine.arguments.dropFirst().first {
  let n = 100
  switch which {
  case "lazyFilterMap": lazyFilterMap(n)
  case "lazyFilterMap2": lazyFilterMap2(n)
  case "lazyCompactMap": lazyCompactMap(n)
  case "lazyCompactMap2": lazyCompactMap2(n)
  case "bespoke": bespoke(n)
  case "loop": loop(n)
  default: print("No can do")
  }
}

moiseev · March 20, 2018, 11:22pm

I see what you meant by using WMO. It's a bit of a cheating to use it here as the results will not be representative of what it would look like when the change lands in the standard library. When the LazyCompactMapCollection is in the Swift module, WMO won't be able to help, IIUC.

moiseev · March 20, 2018, 11:25pm

One other important metric for the record: the binary size of the resulting dylib. (master as of 12c67608 vs same revision with the proposed changes applied on top).

Title                              Section             Old             New  Percent
libswiftCore.dylib                  __text:        5829618         5875826     0.8%

TellowKrinkle · March 21, 2018, 12:49am

I just tested and when compiling with my branch of the compiler (with just -O and -swift-version 5) it runs about as fast as the version compiled with -wmo rather than as fast as the version compiled without -wmo.