Execution time ContiguousArray vs Array

Thanks for trying.

May I ask which arrays you are converting to contiguous arrays in your code? It's not completely obvious (to me). Are the return types also contiguous, and which ones in the body are converted?

@Letan: Hey, it's just 30 lines of code and the second line is:

var possibles = Array(repeating: true, count: limit + 1) // or **ContiguousArray**

Note the comment : )


@hlovatt:
I've copy pasted the code into the main file of a command line project and ran it.

A small side note is that the program is missing an import Foundation (otherwise Date won't work).

The program performs very little work just once, so in order to get a micro benchmark from which we can draw any conclusions whatsoever, it'll have to be modified to do more work, on random data, repeatedly, and in a way such that the important stuff won't get removed by dead code elimination etc.
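To make the point concrete, here is a minimal sketch of such a harness (the names and the workload are illustrative, not from the program under discussion): repeat the work several times and fold its results into a value that is actually consumed, so dead code elimination can't remove the work.

```swift
import Foundation

// Minimal micro-benchmark sketch (illustrative names, not the thread's code):
// run the workload repeatedly and consume its result in a "sink" so the
// optimizer can't eliminate it as dead code.
func measure(runs: Int, _ work: () -> Int) -> Double {
    var sink = 0
    var best = Double.greatestFiniteMagnitude
    for _ in 0 ..< runs {
        let start = Date()
        sink &+= work()                              // consuming the result defeats DCE
        best = min(best, Date().timeIntervalSince(start))
    }
    precondition(sink != Int.min)                    // use the sink so it isn't removed
    return best                                      // report the best (least noisy) run
}

print(measure(runs: 5) { (0 ..< 1_000).reduce(0, +) })
```

Reporting the minimum of several runs (rather than a single measurement) also gives a feel for how much the timings fluctuate, which matters later in this thread.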

Anyway, I wouldn't be surprised if ContiguousArray was slower, I think I've been in a situation where I noticed that too.

Ah sorry, I completely missed that. I was on a mobile screen :)


Doing a quick run on my system, it looks like Array outperforms ContiguousArray only when the build is optimised; otherwise ContiguousArray performs better.

Un-optimized:
Array: 0.577607989311218
ContiguousArray: 0.292797088623047

Optimized:
Array: 0.00630402565002441
ContiguousArray: 0.0145009756088257

So probably some missed optimisation opportunities.

Note: I've just run the program as provided without any of @Jens's suggestions.

1 Like

Oops - corrected - thanks

I am using the program to compare different looping constructs, so that is intentional.

Will try with a larger limit on the primes to see if that affects things.

Yes similar to what I get.

I am wondering if it is memory related.

If ContiguousArray is ever slower than Array, it's a bug. Could be the optimizer treating them differently for some reason, or could be that Array at some point got a customization that was forgotten from ContiguousArray. But definitely a bug.

ContiguousArray is supposed to just be an implementation of Array but without the parts of Array that handle lazily bridging NSArray. These aspects of Array can result in unnecessary branches the optimizer can't eliminate, in which case a ContiguousArray can speed things up. But it should never go the other way. Can you file a jira?
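For illustration, here is a small sketch of the drop-in relationship described above (the bridging behaviour itself is platform-dependent and not demonstrated here; this just shows that the two types share an API and that converting between them is at most a copy):

```swift
// ContiguousArray offers the same API as Array, but its storage is always
// native and contiguous, so the lazily-bridged-NSArray case (and the branch
// that checks for it) can't occur. Converting between the two is at most an
// O(n) copy.
var possibles = ContiguousArray(repeating: true, count: 10)
possibles[4] = false
let asArray = Array(possibles)      // copies into a native Array
assert(asArray.count == 10 && asArray[4] == false)
```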

EDIT: Ah, I missed the observation earlier this was only for unoptimized code. Not worth filing a jira for that.

EDIT2: No, I misread that and it's slower when optimized! That is bad... definitely could do with a Jira.

(cc @Erik_Eckstein, @lancep)

4 Likes

You misread. ContiguousArray is only slower in this code when optimized.

2 Likes

Just a couple more questions about the code example : )

(Note that I don't doubt that there might be a bug which makes ContiguousArray slower than Array, and as I said, I even think this is the case. But if the code example is to be used as a demonstration of this bug then it's good if it's minimal and clear.)

  1. Why are there throws (and try and catch) when nothing in the code can actually throw?

  2. I guess the stride will be empty for all i * i > limit, i.e. for all i > sqrt(limit), which is most i. Is this intentional? EDIT: I see now what the algorithm is doing, and that this is intentional, i.e. the inner j-loop should do nothing for i > sqrt(limit).

  3. Why not make a couple (say 10) test runs instead of just a single one, to get an overview of several measurements (fastest, slowest, average)? The timings will probably differ a bit, and it's good to show how much/little that is.
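As an aside, point 2 can be checked directly with a tiny snippet (limit = 30 is just an illustrative value):

```swift
// For i > sqrt(limit), the stride starting at i * i already exceeds limit,
// so the inner j-loop body never runs.
let limit = 30                                         // sqrt(30) ≈ 5.48
let empty = Array(stride(from: 7 * 7, through: limit, by: 7))
let nonEmpty = Array(stride(from: 5 * 5, through: limit, by: 5))
assert(empty.isEmpty)          // i = 7 > sqrt(limit): nothing to mark
assert(nonEmpty == [25, 30])   // i = 5 ≤ sqrt(limit): marks 25 and 30
```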

Here's a modified version that addresses point 1 and 3:

import Foundation

typealias A = Array
//typealias A = ContiguousArray

func forSieveOfEratosthenes(limit: Int) -> [Int] {
    var possibles = A(repeating: true, count: limit + 1)
    var result = [Int]()
    for i in 2 ... limit {
        guard possibles[i] else {
            continue
        }
        for j in stride(from: i * i, through: limit, by: i) {
            possibles[j] = false
        }
        result.append(i)
    }
    return result
}

func test() {
    let start = Date()
    let result = forSieveOfEratosthenes(limit: 1_000_000)
    let end = Date()
    print("biggest: \(result.last ?? 0): time: \(end.timeIntervalSince(start))")
}
for _ in 0 ..< 10 { test() }

It still demonstrates that A = ContiguousArray is slower than A = Array (in optimized builds).


EDIT 2:
Here's an interesting thing! I noticed that with the following reformulation (I'm pretty sure it is functionally equivalent), A = ContiguousArray is as fast as A = Array:

import Foundation

//typealias A = Array
typealias A = ContiguousArray

func forSieveOfEratosthenes(limit: Int) -> [Int] {
    var possibles = A(repeating: true, count: limit + 1)
    var result = [Int]()
    for i in 2 ... limit {
        guard possibles[i] else { continue }
        var j = i * i
        while j <= limit {
            possibles[j] = false
            j += i
        }
        result.append(i)
    }
    return result
}

func test() {
    let start = Date()
    let result = forSieveOfEratosthenes(limit: 1_000_000)
    let end = Date()
    print("biggest: \(result.last ?? 0): time: \(end.timeIntervalSince(start))")
}
for _ in 0 ..< 10 { test() }

Filed SR-7637.

1 Like

I know it's been pointed out this comment doesn't apply to this particular case, but I also don't think this applies to Swift in general: an Array is contiguous unless it comes from Objective-C. In code terms, this can be seen by noticing that, with Objective-C support on, Array stores an ArrayBuffer which stores a _BridgeStorage<_ContiguousArrayStorageBase, _NSArrayCore>. Without Objective-C support, it just stores a contiguous array.

My experience is that the extra complexity of implementing non-contiguous data structures, plus the poorer cache behaviour (both in terms of requiring two or more pointer dereferences and of the data itself being more spread out), often means seemingly-better data structures lose out to the "dumb" one. There's an example of linked lists vs. vectors that Stroustrup (of C++ fame) uses, where the contiguous storage wins out even for workloads that a list would seemingly prefer (removing elements from somewhere other than the end).

I think some of the confusion here might be that you're talking about the result array, not the possibilities array? I think the latter is the one being changed here.

Also, you might be interested to know that x/(log(x) - 1) is, for x ≥ 5393, a lower bound on the number of primes below x (meaning there may be one resize at the very end, if the true count is above the lower bound, as it is likely to be), and it's also not quite the best one. For instance, page 14 of http://www.unilim.fr/laco/theses/1998/T1998_01.pdf (the equations can still be understood by those like me who can't read French, if you keep in mind that , is the decimal point) summarises various bounds for the prime counting function π, with more detail in Theorem 1.10 (page 36). The π(x) ≥ x/(log(x)-1) approximation does appear there, but it seems to only be the best lower bound (of the ones the author lists) for 5393 ≤ x < 32299.
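For example, that bound could be used to pre-size the sieve's result array (a sketch; the sieve itself is elided, and only the sizing is shown):

```swift
import Foundation

// Sketch: reserve capacity using x/(log(x) - 1), a lower bound on the
// prime count for x >= 5393, so at most one resize happens near the end.
// (pi(1_000_000) = 78498, slightly above this bound.)
let limit = 1_000_000
let lowerBound = Int(Double(limit) / (log(Double(limit)) - 1))
var result = [Int]()
result.reserveCapacity(lowerBound)
assert(lowerBound >= 5393 && result.capacity >= lowerBound)
```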

1 Like

@Jens thanks for filing. Your version of the code is better than mine!

The reason for the odd code in my case, including the spurious throws, is that it is an extract from a larger piece of code; I cut down the larger code to some extent to make the post shorter but didn't really go far enough - thanks for fixing up the code.

I have also noted that if you rewrite the code using lazy iteration then the problem goes away, like you noted with a while loop. This would indicate it is some strange interaction between ContiguousArray, Stride's iterator, and the optimizer.

1 Like

With 4.2 the original problem goes away, but comes back for lazy! E.g.:

import Foundation

protocol Possibles {
    init(repeating: Bool, count: Int)
    subscript(index: Int) -> Bool { get set }
    var count: Int { get }
}
extension Array: Possibles where Element == Bool {}
extension ContiguousArray: Possibles where Element == Bool {}

func lazySieveOfEratosthenes<Storage: Possibles>(makeStorage: () -> Storage) -> [Int] {
    var possibles = makeStorage()
    let limit = possibles.count - 1
    return (2 ... limit).lazy.filter { i in // Lazy, so that `possibles` is updated before it is used.
        possibles[i]
    }.map { i in
        stride(from: i * i, through: limit, by: i).lazy.forEach { j in
            possibles[j] = false
        }
        return i
    }.reduce(into: [Int]()) { (result, next) in
        result.append(next)
    }
}

func forSieveOfEratosthenes<Storage: Possibles>(makeStorage: () -> Storage) -> [Int] {
    var possibles = makeStorage()
    let limit = possibles.count - 1
    var result = [Int]()
    for i in 2 ... limit where possibles[i] {
        for j in stride(from: i * i, through: limit, by: i) {
            possibles[j] = false
        }
        result.append(i)
    }
    return result
}

func test(_ name: String, _ storageType: String, _ function: (Int) -> [Int]) {
    let start = Date()
    let result = function(1_000_000)
    let end = Date()
    print("\(name) \(storageType) biggest: \(result.last ?? 0) time: \(end.timeIntervalSince(start))")
}
test("for   ", "Array          ") { limit in
    forSieveOfEratosthenes {
        Array(repeating: true, count: limit + 1)
    }
}
test("for   ", "ContiguousArray") { limit in
    forSieveOfEratosthenes {
        ContiguousArray(repeating: true, count: limit + 1)
    }
}
test("lazy  ", "Array          ") { limit in
    lazySieveOfEratosthenes {
        Array(repeating: true, count: limit + 1)
    }
}
test("lazy  ", "ContiguousArray") { limit in
    lazySieveOfEratosthenes {
        ContiguousArray(repeating: true, count: limit + 1)
    }
}

Prints:

for    Array           biggest: 999983 time: 0.009091973304748535
for    ContiguousArray biggest: 999983 time: 0.007838010787963867
lazy   Array           biggest: 999983 time: 0.006950974464416504
lazy   ContiguousArray biggest: 999983 time: 0.007106900215148926

Argh!!!

Did you try Xcode 10 beta, both with its default toolchain and with the most recent development snapshot?

Yes, the above results are with Xcode 10 beta, Swift set to 4.2, and optimization on.

No, it doesn't, at least not if the original problem is the same as the one demonstrated by the program in the description of SR-7637. I've tested and verified that this issue remains (in default toolchain of Xcode 10 beta 1, and in development snapshot 2018-06-06).

I guess that you thought the original issue was gone because your new test program is not affected by the original problem, and this is probably because the func is now being passed as a closure (Ah, the predictability of Swift performance!).


Looks to me like they are all about the same (since it's just a single run of each, and the times would probably fluctuate a bit between repeated runs), what do you mean by the problem coming back for lazy?


When I run your program, with a slight modification of wrapping the content of your test function in a for-loop, so that each test is performed more than once (so we can see the fluctuations in time of repeated trials), then there is not much difference between the four cases.

I'm using Xcode 10 beta 1 (and -O / release build, -swift-version 4.2), and I've tried both the default toolchain and the most recent dev snapshot. See the exact results in Details.

Note that I included the results of a debug build at the end, as that's the only way I can see a significant slow down for the lazy ones compared to the others. (But all four are between 30 and 90 times slower than in a release build.)

Details

The modification I did to your code:

func test(_ name: String, _ storageType: String, _ function: (Int) -> [Int]) {
    for _ in 0 ..< 5 { // <-- I added this
        let start = Date()
        let result = function(1_000_000)
        let end = Date()
        print("\(name) \(storageType) biggest: \(result.last ?? 0) time: \(end.timeIntervalSince(start))")
    }
}

With default toolchain of Xcode 10 beta 1, release build:

for    Array           biggest: 999983 time: 0.013331055641174316
for    Array           biggest: 999983 time: 0.012892007827758789
for    Array           biggest: 999983 time: 0.011870980262756348
for    Array           biggest: 999983 time: 0.011000990867614746
for    Array           biggest: 999983 time: 0.010987043380737305
for    ContiguousArray biggest: 999983 time: 0.012047052383422852
for    ContiguousArray biggest: 999983 time: 0.011559009552001953
for    ContiguousArray biggest: 999983 time: 0.011247992515563965
for    ContiguousArray biggest: 999983 time: 0.010803937911987305
for    ContiguousArray biggest: 999983 time: 0.010781049728393555
lazy   Array           biggest: 999983 time: 0.011129975318908691
lazy   Array           biggest: 999983 time: 0.011131048202514648
lazy   Array           biggest: 999983 time: 0.010547995567321777
lazy   Array           biggest: 999983 time: 0.010286927223205566
lazy   Array           biggest: 999983 time: 0.010206937789916992
lazy   ContiguousArray biggest: 999983 time: 0.010246038436889648
lazy   ContiguousArray biggest: 999983 time: 0.010326981544494629
lazy   ContiguousArray biggest: 999983 time: 0.010244965553283691
lazy   ContiguousArray biggest: 999983 time: 0.011294007301330566
lazy   ContiguousArray biggest: 999983 time: 0.010947942733764648

With Development Snapshot 2018-06-06, release build:

for    Array           biggest: 999983 time: 0.014073014259338379
for    Array           biggest: 999983 time: 0.01243603229522705
for    Array           biggest: 999983 time: 0.012018918991088867
for    Array           biggest: 999983 time: 0.011536002159118652
for    Array           biggest: 999983 time: 0.011247038841247559
for    ContiguousArray biggest: 999983 time: 0.011636972427368164
for    ContiguousArray biggest: 999983 time: 0.011552095413208008
for    ContiguousArray biggest: 999983 time: 0.011477947235107422
for    ContiguousArray biggest: 999983 time: 0.011501073837280273
for    ContiguousArray biggest: 999983 time: 0.012168049812316895
lazy   Array           biggest: 999983 time: 0.01273202896118164
lazy   Array           biggest: 999983 time: 0.011597990989685059
lazy   Array           biggest: 999983 time: 0.011325955390930176
lazy   Array           biggest: 999983 time: 0.011317968368530273
lazy   Array           biggest: 999983 time: 0.011630058288574219
lazy   ContiguousArray biggest: 999983 time: 0.011358976364135742
lazy   ContiguousArray biggest: 999983 time: 0.011302947998046875
lazy   ContiguousArray biggest: 999983 time: 0.0121079683303833
lazy   ContiguousArray biggest: 999983 time: 0.011568069458007812
lazy   ContiguousArray biggest: 999983 time: 0.012079000473022461

With default toolchain of Xcode 10 beta 1, DEBUG build:

for    Array           biggest: 999983 time: 0.6173779964447021
for    Array           biggest: 999983 time: 0.6165099143981934
for    Array           biggest: 999983 time: 0.6127970218658447
for    Array           biggest: 999983 time: 0.6117799282073975
for    Array           biggest: 999983 time: 0.6133040189743042
for    ContiguousArray biggest: 999983 time: 0.32653796672821045
for    ContiguousArray biggest: 999983 time: 0.3264739513397217
for    ContiguousArray biggest: 999983 time: 0.32669806480407715
for    ContiguousArray biggest: 999983 time: 0.32600200176239014
for    ContiguousArray biggest: 999983 time: 0.32672107219696045
lazy   Array           biggest: 999983 time: 1.0019129514694214
lazy   Array           biggest: 999983 time: 0.994049072265625
lazy   Array           biggest: 999983 time: 1.0054988861083984
lazy   Array           biggest: 999983 time: 1.0101089477539062
lazy   Array           biggest: 999983 time: 1.012789011001587
lazy   ContiguousArray biggest: 999983 time: 0.7068800926208496
lazy   ContiguousArray biggest: 999983 time: 0.7070260047912598
lazy   ContiguousArray biggest: 999983 time: 0.7070150375366211
lazy   ContiguousArray biggest: 999983 time: 0.7148840427398682
lazy   ContiguousArray biggest: 999983 time: 0.7186750173568726

Also, don't forget the fastest and most stable variant!

The variant of the SieveOfEratosthenes func with an "alternative inner loop" in SR-7637 is still more than 2 times faster than the for… and lazy… variants, and it will be fast no matter if Array or ContiguousArray is used, and no matter if it is passed as a closure or called directly.

We could probably demonstrate this by writing a new version of the demo program of SR-7637, and include your for… and lazy… variants, as well as this while… variant:

func whileSieveOfEratosthenes<Storage: Possibles>(makeStorage: () -> Storage) -> [Int] {
    var possibles = makeStorage()
    let limit = possibles.count - 1
    var result = [Int]()
    for i in 2 ... limit where possibles[i] {
        var j = i * i
        while j <= limit {  // <-- Using a while loop like this here.
            possibles[j] = false
            j += i
        }
        result.append(i)
    }
    return result
}

And compare how the three variants perform when used with Array or ContiguousArray, and when they are passed to a test function as a closure or called directly. I think this would be an interesting test that might help the devs fix the issue(s) behind these inconsistencies.

I will write and add this test here when I get the time.

1 Like

I ran with @Jens's suggestions, running each test multiple times (101) and adding the while variant, and modified the code thus:

func whileSieveOfEratosthenes<Storage: Possibles>(makeStorage: () -> Storage) -> [Int] {
    var possibles = makeStorage()
    let limit = possibles.count - 1
    var result = [Int]()
    for i in 2 ... limit where possibles[i] {
        var j = i * i
        while j <= limit {  // <-- While loop instead of stride.
            possibles[j] = false
            j += i
        }
        result.append(i)
    }
    return result
}

func timeOneTest(_ name: String, _ storageType: String, _ function: (Int) -> [Int]) -> (Double, Int)  {
    let start = Date()
    let last = function(1_000_000).last ?? 0
    let end = Date()
    let time = end.timeIntervalSince(start)
    //print("\(name) \(storageType) biggest: \(last) time: \(time)")
    return (time, last)
}
func test(_ name: String, _ storageType: String, _ function: @escaping (Int) -> [Int]) {
    let n = 101
    let total = (0 ..< n).lazy.map { _ in
        timeOneTest(name, storageType, function)
    }.reduce(0) { (r, t) in
        r + t.0
    }
    print("\(name) \(storageType) average time: \(total / Double(n))")
}

On Version 10.0 beta (10L176w) with Swift 4.2 and -Ofast I now get:

for    Array           average time: 0.007411170713972337
for    ContiguousArray average time: 0.007583871926411544
lazy   Array           average time: 0.006907522088230246
lazy   ContiguousArray average time: 0.0076326733768576445
while  Array           average time: 0.003555823080610521
while  ContiguousArray average time: 0.003516881772787264

Which shows that ContiguousArray is slightly slower than Array for both the for loop and lazy loop variants, but the same for the while loop variant, and the while loop variant is much quicker! Note that the while loop doesn't use stride, so the problem likely lies with stride.

Note what happens if you take the minimum instead of mean. And run the program multiple times to see how the min time changes. I'd say the result is the same for ContiguousArray and Array here in this particular test.

And, as previously mentioned, the reason why there is no dramatic difference between using Array and ContiguousArray in this particular test of yours, is probably because the tested functions (for…, lazy… and while…) are being passed as closure arguments to the test func, rather than being called directly as is the case in the demo program of SR-7637. This is why the demo program of SR-7637 still demonstrates the same significant slow down when using ContiguousArray. Please try that program out to see what I mean (unless you already did see that, and I'm repeating it for no real reason here : ) ).

EDIT: Correction: the reason your code shows Array and ContiguousArray making essentially no difference for performance (using Array has become as slow as using ContiguousArray) also has to do with how the storage is declared, through a generic type parameter (a type conforming to Possibles) and passed into each test through a closure, rather than being declared as either Array or ContiguousArray directly within the tested func, as in the demo of SR-7637, in which using a directly declared Array with for… is as fast as while… (which is fast no matter the type used for possibles).

Phew ... : ) This is what it's like when trying to achieve and analyze the efficiency of some code. Every little detail matters, the exact formulation and context of the code for the test case, and the exact formulation and context for the time-measuring code. The performance inconsistencies we are trying to identify, by measuring execution time, are of course also affecting the code we use to measure their execution time ...

Taking on board @Jens 's suggestion of using minimum time (rather than average) and hard coding the array type (rather than using a protocol as a type) the code becomes:

import Foundation


func forArraySieveOfEratosthenes(_ limit: Int) -> [Int] {
    var possibles = Array(repeating: true, count: limit + 1)
    var result = [Int]()
    for i in 2 ... limit where possibles[i] {
        for j in stride(from: i * i, through: limit, by: i) {
            possibles[j] = false
        }
        result.append(i)
    }
    return result
}

func forContiguousArraySieveOfEratosthenes(_ limit: Int) -> [Int] {
    var possibles = ContiguousArray(repeating: true, count: limit + 1)
    var result = [Int]()
    for i in 2 ... limit where possibles[i] {
        for j in stride(from: i * i, through: limit, by: i) {
            possibles[j] = false
        }
        result.append(i)
    }
    return result
}

func lazyArraySieveOfEratosthenes(_ limit: Int) -> [Int] {
    var possibles = Array(repeating: true, count: limit + 1)
    return (2 ... limit).lazy.filter { i in // Lazy, so that `possibles` is updated before it is used.
        possibles[i]
        }.map { i in
            stride(from: i * i, through: limit, by: i).lazy.forEach { j in
                possibles[j] = false
            }
            return i
        }.reduce(into: [Int]()) { (result, next) in
            result.append(next)
    }
}

func lazyContiguousArraySieveOfEratosthenes(_ limit: Int) -> [Int] {
    var possibles = ContiguousArray(repeating: true, count: limit + 1)
    return (2 ... limit).lazy.filter { i in // Lazy, so that `possibles` is updated before it is used.
        possibles[i]
        }.map { i in
            stride(from: i * i, through: limit, by: i).lazy.forEach { j in
                possibles[j] = false
            }
            return i
        }.reduce(into: [Int]()) { (result, next) in
            result.append(next)
    }
}

func whileArraySieveOfEratosthenes(_ limit: Int) -> [Int] {
    var possibles = Array(repeating: true, count: limit + 1)
    var result = [Int]()
    for i in 2 ... limit where possibles[i] {
        var j = i * i
        while j <= limit {  // <-- While loop instead of stride.
            possibles[j] = false
            j += i
        }
        result.append(i)
    }
    return result
}

func whileContiguousArraySieveOfEratosthenes(_ limit: Int) -> [Int] {
    var possibles = ContiguousArray(repeating: true, count: limit + 1)
    var result = [Int]()
    for i in 2 ... limit where possibles[i] {
        var j = i * i
        while j <= limit {  // <-- While loop instead of stride.
            possibles[j] = false
            j += i
        }
        result.append(i)
    }
    return result
}

func timeOneTest(_ name: String, _ storageType: String, _ function: (Int) -> [Int]) -> (Double, Int)  {
    let start = Date()
    let last = function(4_000_037).last ?? 0
    let end = Date()
    let time = end.timeIntervalSince(start)
    //print("\(name) \(storageType) biggest: \(last) time: \(time)")
    return (time, last)
}
func test(_ name: String, _ storageType: String, _ function: @escaping (Int) -> [Int]) {
    let n = 47
    let minimum = (0 ..< n).lazy.map { _ in
        timeOneTest(name, storageType, function)
    }.reduce(Double.greatestFiniteMagnitude) { (r, t) in
        r > t.0 ? t.0 : r
    }
    print("\(name) \(storageType) min time: \(minimum * 1000.0) ms")
}
test("for   ", "Array          ", forArraySieveOfEratosthenes(_:))
test("for   ", "ContiguousArray", forContiguousArraySieveOfEratosthenes(_:))
test("lazy  ", "Array          ", lazyArraySieveOfEratosthenes(_:))
test("lazy  ", "ContiguousArray", lazyContiguousArraySieveOfEratosthenes(_:))
test("while ", "Array          ", whileArraySieveOfEratosthenes(_:))
test("while ", "ContiguousArray", whileContiguousArraySieveOfEratosthenes(_:))

And I get (Xcode beta 1, Swift 4.2, -Ofast):

for    Array           min time: 15.726923942565918 ms
for    ContiguousArray min time: 29.75606918334961 ms
lazy   Array           min time: 29.79898452758789 ms
lazy   ContiguousArray min time: 29.718995094299316 ms
while  Array           min time: 14.216065406799316 ms
while  ContiguousArray min time: 14.311909675598145 ms

Which shows that for a:

  1. for loop: ContiguousArray is a factor of 2 slower than Array.
  2. lazy loop: Both arrays are about the same.
  3. while loop: ContiguousArray is slightly slower than Array.

What is surprising is that:

  1. For any test ContiguousArray is slower than Array.
  2. stride seems to be slow.
2 Likes

I'd say you are seeing only two substantially different time measurements there. One is about 15 ms, the other is about 30 ms. The for test with Array might be a bit slower than the while tests, but we'd need to see more runs and look at the disassembly to draw that conclusion.

(For anyone following this thread, note that the times have changed since the limit for the highest prime is now 4_000_037 instead of 1_000_000 as it was in previous examples.)

Here are a couple of runs of the same program on my machine (which seems to be a bit slower than yours):

for    Array           min time: 19.320964813232422 ms
for    ContiguousArray min time: 46.321988105773926 ms
lazy   Array           min time: 46.383023262023926 ms
lazy   ContiguousArray min time: 46.33796215057373 ms
while  Array           min time: 18.932104110717773 ms
while  ContiguousArray min time: 18.826007843017578 ms

for    Array           min time: 19.291043281555176 ms
for    ContiguousArray min time: 46.21601104736328 ms
lazy   Array           min time: 46.48399353027344 ms
lazy   ContiguousArray min time: 46.28896713256836 ms
while  Array           min time: 18.828988075256348 ms
while  ContiguousArray min time: 18.849968910217285 ms

for    Array           min time: 20.66493034362793 ms
for    ContiguousArray min time: 50.183892250061035 ms
lazy   Array           min time: 50.8190393447876 ms
lazy   ContiguousArray min time: 50.486087799072266 ms
while  Array           min time: 20.4010009765625 ms
while  ContiguousArray min time: 19.833087921142578 ms

I am also using Xcode 10 beta 1, default toolchain, release build (-O, there's no -Ofast, only -Ounchecked, -O, -Osize, -Onone, and -O is the one optimizing for speed).

Guess why the measurements of the third run is slower than the first two?

It's because I was moving the mouse cursor like crazy while the test was running. :)

Note that ContiguousArray is not always slower than Array.

Also, look what happens if I run the same program with disabled safety checks (-Ounchecked):

for    Array           min time: 20.627975463867188 ms
for    ContiguousArray min time: 20.71702480316162 ms
lazy   Array           min time: 56.8850040435791 ms
lazy   ContiguousArray min time: 46.7149019241333 ms
while  Array           min time: 17.348051071166992 ms
while  ContiguousArray min time: 17.39799976348877 ms

for    Array           min time: 20.56705951690674 ms
for    ContiguousArray min time: 20.666956901550293 ms
lazy   Array           min time: 57.30998516082764 ms
lazy   ContiguousArray min time: 46.66697978973389 ms
while  Array           min time: 17.3569917678833 ms
while  ContiguousArray min time: 17.35401153564453 ms

for    Array           min time: 20.643949508666992 ms
for    ContiguousArray min time: 20.751953125 ms
lazy   Array           min time: 57.5789213180542 ms
lazy   ContiguousArray min time: 46.80800437927246 ms
while  Array           min time: 17.465949058532715 ms
while  ContiguousArray min time: 17.420053482055664 ms

Now, both for-cases are equally fast. Which suggests that there might have been a bounds check or something that wasn't optimized away for the for case with ContiguousArray. (I haven't looked at the disassembly.)
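One way to probe that hypothesis without resorting to -Ounchecked (a sketch, not code from the thread) would be to run the inner loops over an unsafe buffer, where no per-subscript bounds check remains in release builds; if for with ContiguousArray then matches the while variant, an uneliminated check is the likely culprit:

```swift
// Sketch: same sieve as the for variant, but the loops run over an unsafe
// buffer, so element access carries no bounds check in release builds.
func unsafeSieveOfEratosthenes(limit: Int) -> [Int] {
    var possibles = ContiguousArray(repeating: true, count: limit + 1)
    var result = [Int]()
    possibles.withUnsafeMutableBufferPointer { buf in
        for i in 2 ... limit where buf[i] {
            for j in stride(from: i * i, through: limit, by: i) {
                buf[j] = false               // unchecked subscript
            }
            result.append(i)
        }
    }
    return result
}
```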

It might seem surprising that one test case (lazy with Array) has become significantly slower when disabling safety checks. Also, it seems like the for tests have become slightly slower, and the while tests have become slightly faster. My expectation would be that the tests could become faster, but never slower, with -Ounchecked. I have seen (entirely different) code become slower with -Ounchecked before, and I'd like to understand why that is.

Anyway, my point is that it's far from straightforward to draw conclusions like:

  1. For any test ContiguousArray is slower than Array.
  2. stride seems to be slow.

without trying out various contexts, rerunning the program (even though it is performing hundreds of trials and reports min, max, mean, median and whatever), seeing significant consistent time differences and analyzing the disassembly.

As shown in the first test runs of this post, using ContiguousArray is not always slower than using Array for lazy and while (at least not on my machine, even though all other things are the same). The time differences are too small and too fluctuating.

And the reason why lazy is slow need not be only because of stride. It might just as well be because it is passing code through closures extensively, together with various shortcomings of the optimizer in this specific context for e.g. stride, etc. But don't get me wrong, it is possible that the current implementation of stride makes it unnecessarily inefficient.


By the way, the tiny bit of intuition I've been able to build for the performance of Swift code would not let me trust the harness used for measuring the time. Since it's passing the tested function around as a closure, it might itself introduce some of the performance inconsistencies we are trying to pinpoint, i.e. it might not be the least performance-degrading context in which to call the test functions:

func timeOneTest(_ name: String, _ storageType: String, _ function: (Int) -> [Int]) -> (Double, Int)  {
    let start = Date()
    let last = function(4_000_037).last ?? 0
    let end = Date()
    let time = end.timeIntervalSince(start)
    //print("\(name) \(storageType) biggest: \(last) time: \(time)")
    return (time, last)
}
func test(_ name: String, _ storageType: String, _ function: @escaping (Int) -> [Int]) {
    let n = 47
    let minimum = (0 ..< n).lazy.map { _ in
        timeOneTest(name, storageType, function)
    }.reduce(Double.greatestFiniteMagnitude) { (r, t) in
        r > t.0 ? t.0 : r
    }
    print("\(name) \(storageType) min time: \(minimum * 1000.0) ms")
}

(Also, if I'm not mistaken, the code is written in a way such that the resulting primes (or the last resulting prime) are never used, and if the optimizer was more aggressive, it could probably remove all of the prime finding code by dead code elimination.)

But I did a quick check (changing 4_000_037 to 1_000_000) and saw that the results for for and while are the same as in the demo program of SR-7637 so, at least for these particular test functions, the harness is probably not incurring any loss in performance.

But please have a look at e.g. SR-7023, SR-7094 and SR-6983 to get a feel for how unpredictable and non-intuitive these things are.

As for the lazy variant, I haven't really looked closely at the implementation, and I can't say that I understand exactly how and why it does what it does, or whether it could be reformulated to be more efficient (while still deserving the name lazy, no pun intended).

1 Like