Execution time ContiguousArray vs Array

In the following benchmark I get a factor of two difference in execution time if a ContiguousArray as opposed to an Array is used:

import Foundation

func forSieveOfEratosthenes(limit: Int = 11) throws -> [Int] {
    var possibles = Array(repeating: true, count: limit + 1) // or **ContiguousArray**
    var result = [Int]()
    for i in 2 ... limit {
        guard possibles[i] else {
            continue
        }
        for j in stride(from: i * i, through: limit, by: i) {
            possibles[j] = false
        }
        result.append(i)
    }
    return result
}

func test(_ name: String, _ function: (Int) throws -> [Int]) {
    var result = [Int]()
    let start = Date()
    do {
        result = try function(1_000_000)
    } catch {
        print(error)
    }
    let end = Date()
    print("\(name): biggest: \(result.last ?? 0): time: \(end.timeIntervalSince(start))")
}

test("for   ", forSieveOfEratosthenes) // for   : biggest: 999983: time: 0.0109549760818481 - Contiguous, 0.00551199913024902 - Array

The surprising part is that it is Array that is twice as quick.

Any idea why?

1 Like

With the caveat that I haven't looked at the code at all: sure, I have a guess. The problem is that when you ask for ContiguousArray you ask the standard library to enforce that all the array elements are next to each other in memory. That means that if ContiguousArray has to be resized to become larger, the stdlib will need to realloc that memory.

In some cases realloc can be the exact same cost as malloc, but in the worst-case realloc is O(n) where n is the number of elements in the array.

In general, dynamically resizing arrays resize on powers of 2. That means you get the following pattern for ContiguousArray:

  • 0 elements, 0 allocations
  • Add first element, calls malloc
  • Add second element, calls realloc, copies 1 entry.
  • Add third element, calls realloc, copies 2 entries.
  • Add fifth element, calls realloc, copies 4 entries.
  • Add ninth element, calls realloc, copies 8 entries.
  • Add 17th element, calls realloc, copies 16 entries.
  • Add 33rd element, calls realloc, copies 32 entries.

And so on. Clearly for large arrays this can be quite costly! However, if Array is free to be non-contiguous it can allocate memory in blocks without copying memory already stored. That means it never has to copy an entry on writing memory: it can just allocate a new block somewhere else. That completely avoids the overhead of copying the memory around.

If my theory is right you can avoid the overhead by asking the ContiguousArray to reserveCapacity before you use it. The best approximation function is x/(log(x) - 1), so for your case a capacity of around 1<<17 should completely omit all allocations for both Array and ContiguousArray, and their performance should be approximately equal.

As a general rule, it's a mistake to assume that ContiguousArray is cheaper than Array. Any data type that enforces additional constraints on data layout is likely to be more expensive in at least some use cases than one that is free to arrange data however it sees fit. If you know you do not need the contiguous nature of the memory (e.g. you don't need an UnsafeBufferPointer to be cheap, or you don't care if the data is not necessarily cleanly cache coherent) then you should just use Array and be happy. In your case, you just need efficient access to .last, and Array is no more expensive than ContiguousArray for that.

2 Likes

It doesn't resize.

Ah, apologies, you're quite right, I misread that code. In that case, I'm out of ideas. :wink:

1 Like

Thanks for trying.

May I ask which arrays are you converting to contiguous arrays in your code? It’s not completely obvious (to me). Are the return types also contiguous, which ones in the body are converted?

@Letan: Hey, it's just 30 lines of code and the second line is:

var possibles = Array(repeating: true, count: limit + 1) // or **ContiguousArray**

Note the comment : )


@hlovatt:
I've copy pasted the code into the main file of a command line project and ran it.

A small side note is that the program is missing an import Foundation(otherwise Date won't work).

The program performs very little work just once, so in order to get a micro benchmark from which we can draw any conclusions whatsoever, it'll have to be modified to do more work, on random data, repeatedly, and in a way such that the important stuff won't get removed by dead code elimination etc.

Anyway, I wouldn't be surprised if ContiguousArray was slower, I think I've been in a situation where I noticed that too.

Ah sorry, I completely missed that. I was on a mobile screen :)


Doing a quick run on my system it looks like Array outperforms ContiguosArray only when the build is optimised, otherwise ContiguosArray performs better.

Un-Optimized:
Array: 0.577607989311218
Contiguos Array: 0.292797088623047

Optimized:
Array: 0.00630402565002441
Contiguos Array: 0.0145009756088257

So probably some missed optimisation opportunities.

Note: I've just run the program as provided without any of @Jens's suggestions.

1 Like

Oops - corrected - thanks

I am using the program to compare different looping constructs, so that is by intention.

Will try with larger limit on the primes to see if that effects things.

Yes similar to what I get.

I am wondering if it is memory related.

If ContiguousArray is ever slower than Array, it's a bug. Could be the optimizer treating them differently for some reason, or could be that Array at some point got a customization that was forgotten from ContiguousArray. But definitely a bug.

ContiguousArray is supposed to just be an implementation of Array but without the parts of Array that handle lazily bridging NSArray. These aspects of Array can result in unnecessary branches the optimizer can't eliminate, in which case a ContiguousArray can speed things up. But it should never go the other way. Can you file a jira?

EDIT: Ah, I missed the observation earlier this was only for unoptimized code. Not worth filing a jira for that.

EDIT2: No, I misread that and it's slower when optimized! That is bad... definitely could do with a Jira.

(cc @Erik_Eckstein, @lancep)

4 Likes

You misread. ContiguousArray is only slower in this code when optimized.

2 Likes

Just a couple of more questions about the code example : )

(Note that I don't doubt that there might be a bug which makes ContiguousArray slower than Array, and as I said, I even think this is the case. But if the code example is to be used as a demonstration of this bug then it's good if it's minimal and clear.)

  1. Why are there throws (and try and catch) when nothing in the code can actually throw?

  2. I guess the stride will be empty for all i * i > limit which is for all i > sqrt(limit), which is for most i, is this intentional? EDIT: I see now what the algorithm is doing, and that this is intentional, ie the inner j-loop should do nothing for i > sqrt(limit).

  3. Why not make a couple (say 10) test runs instead of just a single one, to get an overview of several measurements (fastest, slowest, average)? The timings will probably differ a bit, and it's good to show how much/little that is.

Here's a modified version that addresses point 1 and 3:

import Foundation

typealias A = Array
//typealias A = ContiguousArray

func forSieveOfEratosthenes(limit: Int) -> [Int] {
    var possibles = A(repeating: true, count: limit + 1)
    var result = [Int]()
    for i in 2 ... limit {
        guard possibles[i] else {
            continue
        }
        for j in stride(from: i * i, through: limit, by: i) {
            possibles[j] = false
        }
        result.append(i)
    }
    return result
}

func test() {
    let start = Date()
    let result = forSieveOfEratosthenes(limit: 1_000_000)
    let end = Date()
    print("biggest: \(result.last ?? 0): time: \(end.timeIntervalSince(start))")
}
for _ in 0 ..< 10 { test() }

It still demonstrates that A = ContiguousArray is slower than A = Array (in optimized builds).


EDIT 2:
Here's an interesting thing! I noticed that with the following reformulation (I'm pretty sure it is functionally equivalent), A = ContiguousArray is as fast as A = Array:

import Foundation

//typealias A = Array
typealias A = ContiguousArray

func forSieveOfEratosthenes(limit: Int) -> [Int] {
    var possibles = A(repeating: true, count: limit + 1)
    var result = [Int]()
    for i in 2 ... limit {
        guard possibles[i] else { continue }
        var j = i * i
        while j <= limit {
            possibles[j] = false
            j += i
        }
        result.append(i)
    }
    return result
}

func test() {
    let start = Date()
    let result = forSieveOfEratosthenes(limit: 1_000_000)
    let end = Date()
    print("biggest: \(result.last ?? 0): time: \(end.timeIntervalSince(start))")
}
for _ in 0 ..< 10 { test() }

Filed SR-7637.

1 Like

I know it's been pointed out this comment doesn't apply to this particular case, but I also don't think this applies to Swift in general: an Array is contiguous unless it comes from Objective-C. In code terms, this can be seen by noticing that, with Objective-C support on, Array stores an ArrayBuffer which stores a _BridgeStorage<_ContiguousArrayStorageBase, _NSArrayCore>. Without Objective-C support, it just stores a contiguous array.

My experience is, often, the extra complexity of implementing non-contiguous data structures, plus the poorer cache behaviour (both in terms of just requiring two+ pointer dereferences and also the data itself being more spread out) means seemingly-better data structures often lose out to the "dumb" one. There's an example of linked lists vs. vectors that Stroustrup (of C++ fame) uses, where the contiguous storage wins out even for workloads that a list would seemingly prefer (removing elements from somewhere other than the end).

I think some of the confusion here might be that you're talking about the result array, not the possibilities array? I think the latter is the one being changed here.

Also, you might be interested to know that the x/(log(x) - 1) is, for x ≥ 5393, a lower bound on the number of primes below x (meaning there may be one resize at the very end, if the true count is above the lower bound, as it is likely to be), and it's also not quite the best one. For instance, page 14 of http://www.unilim.fr/laco/theses/1998/T1998_01.pdf (the equations can still be understood by those like me who can't read French, if you keep in mind that , is the decimal point) summarises various bounds for the prime counting function π with more detail in Theorem 1.10 (page 36). The π(x) ≥ x/(log(x)-1) approximation does appear there, but it seems to only be the best lower bound (of the ones the author lists) for 5393 ≤ x < 32299.

1 Like

@Jens thanks for filing. Your version the code is better than mine!

The reason for the odd code in my case, including the spurious throws, is that it is an extract from a larger piece of code; I cut down the larger code to some extent to make the post shorter but didn't really go far enough - thanks for fixing up the code.

I have also noted that if you rewrite the code using lazy iteration then the problem goes away, like you noted with a while loop. This would indicate it is some strange interaction with ContiguousArray, Stride's iterator, and the optimizer.

1 Like

With 4.2 original problem goes away, but comms back for lazy! EG:

import Foundation

protocol Possibles {
    init(repeating: Bool, count: Int)
    subscript(index: Int) -> Bool { get set }
    var count: Int { get }
}
extension Array: Possibles where Element == Bool {}
extension ContiguousArray: Possibles where Element == Bool {}

func lazySieveOfEratosthenes<Storage: Possibles>(makeStorage: () -> Storage) -> [Int] {
    var possibles = makeStorage()
    let limit = possibles.count - 1
    return (2 ... limit).lazy.filter { i in // Lazy, so that `possibles` is updated before it is used.
        possibles[i]
    }.map { i in
        stride(from: i * i, through: limit, by: i).lazy.forEach { j in
            possibles[j] = false
        }
        return i
    }.reduce(into: [Int]()) { (result, next) in
        result.append(next)
    }
}

func forSieveOfEratosthenes<Storage: Possibles>(makeStorage: () -> Storage) -> [Int] {
    var possibles = makeStorage()
    let limit = possibles.count - 1
    var result = [Int]()
    for i in 2 ... limit where possibles[i] { // Lazy, so that `possibles` is updated before it is used.
        for j in stride(from: i * i, through: limit, by: i) {
            possibles[j] = false
        }
        result.append(i)
    }
    return result
}

func test(_ name: String, _ storageType: String, _ function: (Int) -> [Int]) {
    let start = Date()
    let result = function(1_000_000)
    let end = Date()
    print("\(name) \(storageType) biggest: \(result.last ?? 0) time: \(end.timeIntervalSince(start))")
}
test("for   ", "Array          ") { limit in
    forSieveOfEratosthenes {
        Array(repeating: true, count: limit + 1)
    }
}
test("for   ", "ContiguousArray") { limit in
    forSieveOfEratosthenes {
        ContiguousArray(repeating: true, count: limit + 1)
    }
}
test("lazy  ", "Array          ") { limit in
    lazySieveOfEratosthenes {
        Array(repeating: true, count: limit + 1)
    }
}
test("lazy  ", "ContiguousArray") { limit in
    lazySieveOfEratosthenes {
        ContiguousArray(repeating: true, count: limit + 1)
    }
}

Prints:

for    Array           biggest: 999983 time: 0.009091973304748535
for    ContiguousArray biggest: 999983 time: 0.007838010787963867
lazy   Array           biggest: 999983 time: 0.006950974464416504
lazy   ContiguousArray biggest: 999983 time: 0.007106900215148926

Argh!!!

Did you try Xcode 10 beta, both with its default toolchain and with the most recent development snapshot?

Yes above results with Xcode 10 bite and Swift set to 4.2 and optimization on.

No, it doesn't, at least not if the original problem is the same as the one demonstrated by the program in the description of SR-7637. I've tested and verified that this issue remains (in default toolchain of Xcode 10 beta 1, and in development snapshot 2018-06-06).

I guess that you thought the original issue was gone because your new test program is not affected by the original problem, and this is probably because the func is now being passed as a closure (Ah, the predictability of Swift performance!).


Looks to me like they are all about the same (since it's just a single run of each, and the times would probably fluctuate a bit between repeated runs), what do you mean by the problem coming back for lazy?


When I run your program, with a slight modification of wrapping the content of your test function in a for-loop, so that each test is performed more than once (so we can see the fluctuations in time of repeated trials), then there is not much difference between the four cases.

I'm using Xcode 10 beta 1 (and -O / release build, -swift-version 4.2), and I've tried both the default toolchain and the most recent dev snapshot. See the exact results in Details.

Note that I included the results of a debug build at the end, as that's the only way I can see a significant slow down for the lazy ones compared to the others. (But all four are between 30 and 90 times slower than in a release build.)

Details

The modification I did to your code:

func test(_ name: String, _ storageType: String, _ function: (Int) -> [Int]) {
    for _ in 0 ..< 5 { // <-- I added this
        let start = Date()
        let result = function(1_000_000)
        let end = Date()
        print("\(name) \(storageType) biggest: \(result.last ?? 0) time: \(end.timeIntervalSince(start))")
    }
}

With default toolchain of Xcode 10 beta 1, release build:

for    Array           biggest: 999983 time: 0.013331055641174316
for    Array           biggest: 999983 time: 0.012892007827758789
for    Array           biggest: 999983 time: 0.011870980262756348
for    Array           biggest: 999983 time: 0.011000990867614746
for    Array           biggest: 999983 time: 0.010987043380737305
for    ContiguousArray biggest: 999983 time: 0.012047052383422852
for    ContiguousArray biggest: 999983 time: 0.011559009552001953
for    ContiguousArray biggest: 999983 time: 0.011247992515563965
for    ContiguousArray biggest: 999983 time: 0.010803937911987305
for    ContiguousArray biggest: 999983 time: 0.010781049728393555
lazy   Array           biggest: 999983 time: 0.011129975318908691
lazy   Array           biggest: 999983 time: 0.011131048202514648
lazy   Array           biggest: 999983 time: 0.010547995567321777
lazy   Array           biggest: 999983 time: 0.010286927223205566
lazy   Array           biggest: 999983 time: 0.010206937789916992
lazy   ContiguousArray biggest: 999983 time: 0.010246038436889648
lazy   ContiguousArray biggest: 999983 time: 0.010326981544494629
lazy   ContiguousArray biggest: 999983 time: 0.010244965553283691
lazy   ContiguousArray biggest: 999983 time: 0.011294007301330566
lazy   ContiguousArray biggest: 999983 time: 0.010947942733764648

With Development Snapshot 2018-06-06, release build:

for    Array           biggest: 999983 time: 0.014073014259338379
for    Array           biggest: 999983 time: 0.01243603229522705
for    Array           biggest: 999983 time: 0.012018918991088867
for    Array           biggest: 999983 time: 0.011536002159118652
for    Array           biggest: 999983 time: 0.011247038841247559
for    ContiguousArray biggest: 999983 time: 0.011636972427368164
for    ContiguousArray biggest: 999983 time: 0.011552095413208008
for    ContiguousArray biggest: 999983 time: 0.011477947235107422
for    ContiguousArray biggest: 999983 time: 0.011501073837280273
for    ContiguousArray biggest: 999983 time: 0.012168049812316895
lazy   Array           biggest: 999983 time: 0.01273202896118164
lazy   Array           biggest: 999983 time: 0.011597990989685059
lazy   Array           biggest: 999983 time: 0.011325955390930176
lazy   Array           biggest: 999983 time: 0.011317968368530273
lazy   Array           biggest: 999983 time: 0.011630058288574219
lazy   ContiguousArray biggest: 999983 time: 0.011358976364135742
lazy   ContiguousArray biggest: 999983 time: 0.011302947998046875
lazy   ContiguousArray biggest: 999983 time: 0.0121079683303833
lazy   ContiguousArray biggest: 999983 time: 0.011568069458007812
lazy   ContiguousArray biggest: 999983 time: 0.012079000473022461

With default toolchain of Xcode 10 beta 1, DEBUG build:

for    Array           biggest: 999983 time: 0.6173779964447021
for    Array           biggest: 999983 time: 0.6165099143981934
for    Array           biggest: 999983 time: 0.6127970218658447
for    Array           biggest: 999983 time: 0.6117799282073975
for    Array           biggest: 999983 time: 0.6133040189743042
for    ContiguousArray biggest: 999983 time: 0.32653796672821045
for    ContiguousArray biggest: 999983 time: 0.3264739513397217
for    ContiguousArray biggest: 999983 time: 0.32669806480407715
for    ContiguousArray biggest: 999983 time: 0.32600200176239014
for    ContiguousArray biggest: 999983 time: 0.32672107219696045
lazy   Array           biggest: 999983 time: 1.0019129514694214
lazy   Array           biggest: 999983 time: 0.994049072265625
lazy   Array           biggest: 999983 time: 1.0054988861083984
lazy   Array           biggest: 999983 time: 1.0101089477539062
lazy   Array           biggest: 999983 time: 1.012789011001587
lazy   ContiguousArray biggest: 999983 time: 0.7068800926208496
lazy   ContiguousArray biggest: 999983 time: 0.7070260047912598
lazy   ContiguousArray biggest: 999983 time: 0.7070150375366211
lazy   ContiguousArray biggest: 999983 time: 0.7148840427398682
lazy   ContiguousArray biggest: 999983 time: 0.7186750173568726

Also, don't forget the fastest and most stable variant!

The variant of the SieveOfEratosthenes func with an "alternative inner loop" in SR-7637, is still more than 2 times faster than the for… and lazy… variants, and it will be fast no matter if Array or ContiguousArray is used, and no matter if it is passed as a closure or called directly.

We could probably demonstrate this by writing a new version of the demo program of SR-7637, and include your for… and lazy… variants, as well as this while… variant:

func whileSieveOfEratosthenes<Storage: Possibles>(makeStorage: () -> Storage) -> [Int] {
    var possibles = makeStorage()
    let limit = possibles.count - 1
    var result = [Int]()
    for i in 2 ... limit where possibles[i] {
        var j = i * i
        while j <= limit {  // <-- Using a while loop like this here.
            possibles[j] = false
            j += i
        }
        result.append(i)
    }
    return result
}

And compare how the three variants perform when used with Array or ContiguousArray and when they are passed to a test function as a closure or called directly. I think this would be an interesting test that might perhaps help the devs fixing the issue(s) behind these inconsistencies.

I will write and add this test here when I get the time.

1 Like