Execution time ContiguousArray vs Array

I took @Jens's suggestions on board, running each test multiple times (101) and adding a while-loop variant, and modified the code thus:

func whileSieveOfEratosthenes<Storage: Possibles>(makeStorage: () -> Storage) -> [Int] {
    var possibles = makeStorage()
    let limit = possibles.count - 1
    var result = [Int]()
    for i in 2 ... limit where possibles[i] { // Lazy, so that `possibles` is updated before it is used.
        var j = i * i
        while j <= limit {  // <-- While loop instead of stride.
            possibles[j] = false
            j += i
        }
        result.append(i)
    }
    return result
}

func timeOneTest(_ name: String, _ storageType: String, _ function: (Int) -> [Int]) -> (Double, Int)  {
    let start = Date()
    let last = function(1_000_000).last ?? 0
    let end = Date()
    let time = end.timeIntervalSince(start)
    //print("\(name) \(storageType) biggest: \(last) time: \(time)")
    return (time, last)
}
func test(_ name: String, _ storageType: String, _ function: @escaping (Int) -> [Int]) {
    let n = 101
    let total = (0 ..< n).lazy.map { _ in
        timeOneTest(name, storageType, function)
    }.reduce(0) { (r, t) in
        r + t.0
    }
    print("\(name) \(storageType) average time: \(total / Double(n))")
}

On Xcode 10.0 beta (10L176w) with Swift 4.2 and -Ofast I now get:

for    Array           average time: 0.007411170713972337
for    ContiguousArray average time: 0.007583871926411544
lazy   Array           average time: 0.006907522088230246
lazy   ContiguousArray average time: 0.0076326733768576445
while  Array           average time: 0.003555823080610521
while  ContiguousArray average time: 0.003516881772787264

This shows that ContiguousArray is slightly slower than Array for both the for-loop and lazy variants, but the same for the while-loop variant, and the while-loop variant is much quicker! Note that the while loop doesn't use stride, which suggests the slowdown may lie in stride.

Note what happens if you take the minimum instead of the mean, and run the program multiple times to see how the minimum time changes. I'd say the result is the same for ContiguousArray and Array in this particular test.

And, as previously mentioned, the reason there is no dramatic difference between Array and ContiguousArray in this particular test of yours is probably that the tested functions (for…, lazy… and while…) are passed as closure arguments to the test func, rather than being called directly as in the demo program of SR-7637. That is why the demo program of SR-7637 still demonstrates the same significant slowdown when using ContiguousArray. Please try that program out to see what I mean (unless you already did, and I'm repeating it for no real reason here : ) ).

EDIT: Correction: the reason your code shows essentially no performance difference between Array and ContiguousArray (using Array has become as slow as using ContiguousArray) also has to do with how the storage is declared through a generic type parameter (a type conforming to Possibles) and passed into each test through a closure, rather than being declared directly as Array or ContiguousArray within the tested func, as in the demo of SR-7637. There, using a directly declared Array with for… is as fast as while… (which is fast no matter which type is used for possibles).

Phew ... : ) This is what it's like when trying to achieve and analyze efficient code. Every little detail matters: the exact formulation and context of the code under test, and the exact formulation and context of the time-measuring code. The performance inconsistencies we are trying to identify by measuring execution time are, of course, also affecting the code we use to measure that execution time ...

Taking on board @Jens's suggestion of using the minimum time (rather than the average) and hard-coding the array type (rather than going through a protocol), the code becomes:

import Foundation

func forArraySieveOfEratosthenes(_ limit: Int) -> [Int] {
    var possibles = Array(repeating: true, count: limit + 1)
    var result = [Int]()
    for i in 2 ... limit where possibles[i] { // Lazy, so that `possibles` is updated before it is used.
        for j in stride(from: i * i, through: limit, by: i) {
            possibles[j] = false
        }
        result.append(i)
    }
    return result
}

func forContiguousArraySieveOfEratosthenes(_ limit: Int) -> [Int] {
    var possibles = ContiguousArray(repeating: true, count: limit + 1)
    var result = [Int]()
    for i in 2 ... limit where possibles[i] { // Lazy, so that `possibles` is updated before it is used.
        for j in stride(from: i * i, through: limit, by: i) {
            possibles[j] = false
        }
        result.append(i)
    }
    return result
}

func lazyArraySieveOfEratosthenes(_ limit: Int) -> [Int] {
    var possibles = Array(repeating: true, count: limit + 1)
    return (2 ... limit).lazy.filter { i in // Lazy, so that `possibles` is updated before it is used.
        possibles[i]
        }.map { i in
            stride(from: i * i, through: limit, by: i).lazy.forEach { j in
                possibles[j] = false
            }
            return i
        }.reduce(into: [Int]()) { (result, next) in
            result.append(next)
    }
}

func lazyContiguousArraySieveOfEratosthenes(_ limit: Int) -> [Int] {
    var possibles = ContiguousArray(repeating: true, count: limit + 1)
    return (2 ... limit).lazy.filter { i in // Lazy, so that `possibles` is updated before it is used.
        possibles[i]
        }.map { i in
            stride(from: i * i, through: limit, by: i).lazy.forEach { j in
                possibles[j] = false
            }
            return i
        }.reduce(into: [Int]()) { (result, next) in
            result.append(next)
    }
}

func whileArraySieveOfEratosthenes(_ limit: Int) -> [Int] {
    var possibles = Array(repeating: true, count: limit + 1)
    var result = [Int]()
    for i in 2 ... limit where possibles[i] { // Lazy, so that `possibles` is updated before it is used.
        var j = i * i
        while j <= limit {  // <-- While loop instead of stride.
            possibles[j] = false
            j += i
        }
        result.append(i)
    }
    return result
}

func whileContiguousArraySieveOfEratosthenes(_ limit: Int) -> [Int] {
    var possibles = ContiguousArray(repeating: true, count: limit + 1)
    var result = [Int]()
    for i in 2 ... limit where possibles[i] { // Lazy, so that `possibles` is updated before it is used.
        var j = i * i
        while j <= limit {  // <-- While loop instead of stride.
            possibles[j] = false
            j += i
        }
        result.append(i)
    }
    return result
}

func timeOneTest(_ name: String, _ storageType: String, _ function: (Int) -> [Int]) -> (Double, Int)  {
    let start = Date()
    let last = function(4_000_037).last ?? 0
    let end = Date()
    let time = end.timeIntervalSince(start)
    //print("\(name) \(storageType) biggest: \(last) time: \(time)")
    return (time, last)
}
func test(_ name: String, _ storageType: String, _ function: @escaping (Int) -> [Int]) {
    let n = 47
    let minimum = (0 ..< n).lazy.map { _ in
        timeOneTest(name, storageType, function)
    }.reduce(Double.greatestFiniteMagnitude) { (r, t) in
        r > t.0 ? t.0 : r
    }
    print("\(name) \(storageType) min time: \(minimum * 1000.0) ms")
}
test("for   ", "Array          ", forArraySieveOfEratosthenes(_:))
test("for   ", "ContiguousArray", forContiguousArraySieveOfEratosthenes(_:))
test("lazy  ", "Array          ", lazyArraySieveOfEratosthenes(_:))
test("lazy  ", "ContiguousArray", lazyContiguousArraySieveOfEratosthenes(_:))
test("while ", "Array          ", whileArraySieveOfEratosthenes(_:))
test("while ", "ContiguousArray", whileContiguousArraySieveOfEratosthenes(_:))

And I get (Xcode 10 beta 1, Swift 4.2, -Ofast):

for    Array           min time: 15.726923942565918 ms
for    ContiguousArray min time: 29.75606918334961 ms
lazy   Array           min time: 29.79898452758789 ms
lazy   ContiguousArray min time: 29.718995094299316 ms
while  Array           min time: 14.216065406799316 ms
while  ContiguousArray min time: 14.311909675598145 ms

Which shows that for a:

  1. for loop: ContiguousArray is a factor of 2 slower than Array.
  2. lazy loop: Both arrays are about the same.
  3. while loop: ContiguousArray is slightly slower than Array.

What is surprising is that:

  1. For any test ContiguousArray is slower than Array.
  2. stride seems to be slow.

I'd say you are seeing only two substantially different time measurements there. One is about 15 ms, the other is about 30 ms. The for test with Array might be a bit slower than the while tests, but we'd need to see more runs and look at the disassembly to draw that conclusion.

(For anyone following this thread, note that the times have changed since the limit for the highest prime is now 4_000_037 instead of 1_000_000 as it was in previous examples.)

Here are a couple of runs of the same program on my machine (which seems to be a bit slower than yours):

for    Array           min time: 19.320964813232422 ms
for    ContiguousArray min time: 46.321988105773926 ms
lazy   Array           min time: 46.383023262023926 ms
lazy   ContiguousArray min time: 46.33796215057373 ms
while  Array           min time: 18.932104110717773 ms
while  ContiguousArray min time: 18.826007843017578 ms

for    Array           min time: 19.291043281555176 ms
for    ContiguousArray min time: 46.21601104736328 ms
lazy   Array           min time: 46.48399353027344 ms
lazy   ContiguousArray min time: 46.28896713256836 ms
while  Array           min time: 18.828988075256348 ms
while  ContiguousArray min time: 18.849968910217285 ms

for    Array           min time: 20.66493034362793 ms
for    ContiguousArray min time: 50.183892250061035 ms
lazy   Array           min time: 50.8190393447876 ms
lazy   ContiguousArray min time: 50.486087799072266 ms
while  Array           min time: 20.4010009765625 ms
while  ContiguousArray min time: 19.833087921142578 ms

I am also using Xcode 10 beta 1, default toolchain, release build with -O (there is no -Ofast; the available modes are -Onone, -O, -Osize and -Ounchecked, and -O is the one that optimizes for speed).

Guess why the measurements of the third run are slower than the first two?

It's because I was moving the mouse cursor like crazy while the test was running. :)

Note that ContiguousArray is not always slower than Array.

Also, look what happens if I run the same program with disabled safety checks (-Ounchecked):

for    Array           min time: 20.627975463867188 ms
for    ContiguousArray min time: 20.71702480316162 ms
lazy   Array           min time: 56.8850040435791 ms
lazy   ContiguousArray min time: 46.7149019241333 ms
while  Array           min time: 17.348051071166992 ms
while  ContiguousArray min time: 17.39799976348877 ms

for    Array           min time: 20.56705951690674 ms
for    ContiguousArray min time: 20.666956901550293 ms
lazy   Array           min time: 57.30998516082764 ms
lazy   ContiguousArray min time: 46.66697978973389 ms
while  Array           min time: 17.3569917678833 ms
while  ContiguousArray min time: 17.35401153564453 ms

for    Array           min time: 20.643949508666992 ms
for    ContiguousArray min time: 20.751953125 ms
lazy   Array           min time: 57.5789213180542 ms
lazy   ContiguousArray min time: 46.80800437927246 ms
while  Array           min time: 17.465949058532715 ms
while  ContiguousArray min time: 17.420053482055664 ms

Now, both for cases are equally fast, which suggests that there might have been a bounds check or something similar that wasn't optimized away for the for loop with ContiguousArray. (I haven't looked at the disassembly.)
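One way to probe the bounds-check hypothesis without switching the whole build to -Ounchecked would be to rewrite only the inner loop against the array's unsafe buffer, whose subscript skips bounds checks in release builds. This is just a sketch of that idea, not code from the thread (the function name is made up here):

```swift
// Sketch: the same sieve, but the inner loop writes through the unsafe
// buffer, so each `possibles[j] = false` avoids a bounds check in -O.
func forUncheckedSieveOfEratosthenes(_ limit: Int) -> [Int] {
    var possibles = ContiguousArray(repeating: true, count: limit + 1)
    var result = [Int]()
    for i in 2 ... limit where possibles[i] {
        possibles.withUnsafeMutableBufferPointer { buffer in
            var j = i * i
            while j <= limit {
                buffer[j] = false
                j += i
            }
        }
        result.append(i)
    }
    return result
}
```

If the for/ContiguousArray gap disappears with this version under plain -O, that would support the theory that it's a missed bounds-check elimination rather than anything inherent to ContiguousArray.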

It might seem surprising that one test case (lazy with Array) has become significantly slower after disabling safety checks. Also, the for tests seem to have become slightly slower, and the while tests slightly faster. My expectation would be that tests could become faster, but never slower, with -Ounchecked. I have seen (entirely different) code become slower with -Ounchecked before, and I'd like to understand why that is.

Anyway, my point is that it's far from straightforward to draw conclusions like:

  1. For any test ContiguousArray is slower than Array.
  2. stride seems to be slow.

without trying out various contexts, rerunning the program (even though it is performing hundreds of trials and reports min, max, mean, median and whatever else), seeing significant and consistent time differences, and analyzing the disassembly.

As shown in the first test runs of this post, using ContiguousArray is not always slower than using Array for lazy and while (at least not on my machine, even though all other things are the same). The time differences are too small and fluctuate too much.

And the reason why lazy is slow need not be stride alone. It might just as well be the extensive passing of code through closures, together with various shortcomings of the optimizer in this specific context for e.g. stride. But don't get me wrong: it is possible that the current implementation of stride makes it unnecessarily inefficient.
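One way to separate stride's own cost from the closure-heavy lazy pipeline would be to time just the two inner-loop styles in otherwise identical, directly called functions. A sketch under those assumptions (function names are made up; timing via Date, as elsewhere in the thread):

```swift
import Foundation

// Two sieves that differ only in the inner loop: stride vs. manual while.
// Each returns the number of surviving flags (primes plus indices 0 and 1)
// so the result is used and cannot be eliminated as dead code.
func strideInnerSieve(_ limit: Int) -> Int {
    var possibles = [Bool](repeating: true, count: limit + 1)
    for i in 2 ... limit where possibles[i] {
        for j in stride(from: i * i, through: limit, by: i) {
            possibles[j] = false
        }
    }
    return possibles.filter { $0 }.count
}

func whileInnerSieve(_ limit: Int) -> Int {
    var possibles = [Bool](repeating: true, count: limit + 1)
    for i in 2 ... limit where possibles[i] {
        var j = i * i
        while j <= limit {
            possibles[j] = false
            j += i
        }
    }
    return possibles.filter { $0 }.count
}

// Call directly, with no closure indirection, and keep both results live.
let start1 = Date()
let survivors1 = strideInnerSieve(4_000_037)
print("stride: \(Date().timeIntervalSince(start1) * 1000) ms, survivors: \(survivors1)")

let start2 = Date()
let survivors2 = whileInnerSieve(4_000_037)
print("while:  \(Date().timeIntervalSince(start2) * 1000) ms, survivors: \(survivors2)")
```

If stride itself were the bottleneck, the first function should remain measurably slower even in this closure-free context; if the two are equally fast here, the lazy variant's slowdown lies elsewhere.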


By the way, the tiny bit of intuition I've been able to build for the performance of Swift code would not let me trust the harness used for measuring the time. Since it passes the tested function around as a closure, it might itself introduce some of the performance inconsistencies we are trying to pinpoint; that is, it might not be the least performance-degrading context in which to call the test functions:

func timeOneTest(_ name: String, _ storageType: String, _ function: (Int) -> [Int]) -> (Double, Int)  {
    let start = Date()
    let last = function(4_000_037).last ?? 0
    let end = Date()
    let time = end.timeIntervalSince(start)
    //print("\(name) \(storageType) biggest: \(last) time: \(time)")
    return (time, last)
}
func test(_ name: String, _ storageType: String, _ function: @escaping (Int) -> [Int]) {
    let n = 47
    let minimum = (0 ..< n).lazy.map { _ in
        timeOneTest(name, storageType, function)
    }.reduce(Double.greatestFiniteMagnitude) { (r, t) in
        r > t.0 ? t.0 : r
    }
    print("\(name) \(storageType) min time: \(minimum * 1000.0) ms")
}

(Also, if I'm not mistaken, the code is written in such a way that the resulting primes (or the last resulting prime) are never used, and if the optimizer were more aggressive, it could probably remove all of the prime-finding code by dead-code elimination.)
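A cheap safeguard, in case a future compiler gets that aggressive, would be a harness that folds every run's result into a checksum that is ultimately printed, so the computation stays observable. This is only a sketch (the function name and the runs/limit parameters are mine, not from the thread):

```swift
import Foundation

// Sketch: a timing harness that consumes every result, so dead-code
// elimination cannot legally discard the prime-finding work.
func testChecked(_ name: String, runs: Int, limit: Int,
                 _ function: (Int) -> [Int]) -> Int {
    var minimum = Double.greatestFiniteMagnitude
    var checksum = 0
    for _ in 0 ..< runs {
        let start = Date()
        let last = function(limit).last ?? 0
        minimum = min(minimum, Date().timeIntervalSince(start))
        checksum &+= last  // Fold each run's result into a live value.
    }
    // Printing (and returning) the checksum keeps every run observable.
    print("\(name) min time: \(minimum * 1000.0) ms, checksum: \(checksum)")
    return checksum
}
```

The overflow-ignoring `&+=` keeps the checksum cheap; the only requirement is that the optimizer can't prove the per-run results are unused.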

But I did a quick check (changing 4_000_037 back to 1_000_000) and saw that the results for for and while are the same as in the demo program of SR-7637, so, at least for these particular test functions, the harness is probably not incurring any loss in performance.

But please have a look at e.g. SR-7023, SR-7094 and SR-6983 to get a feel for how unpredictable and non-intuitive these things are.

As for the lazy variant, I haven't really looked closely at the implementation, and I can't say that I understand exactly how and why it does what it does, or whether it could be reformulated to be more efficient (while still deserving the name lazy, no pun intended).
