One Terabyte Array Idea

This deserves being in a separate thread.

Continuing the discussion from Should we rename Deque?:

Rather that create a pile of work for John, and to make it easier for everyone new to this, it's probably best to just pick out or summarize the one or two salient details from each of those fairly short comments on your original post.

John, thanks for moving those posts.
@scanon I believe your above post is no longer applicable and could be deleted?

Edit: Thanks to @ibex10 for alerting me about bugs and suggestions how to improve the implementation. Now the results are much more sane and not off the charts. For debug mode I'm reducing the element count 100x fold to not wait too long. If you try the app better use release mode.


I gave this idea a test:

sample app
import Foundation

let kb = 1024
let mb = 1024*kb
let gb = 1024*mb
let tb = 1024*gb

// TODO: COW, etc
struct QuickArray<Element> {
    var items: UnsafeMutablePointer<Element>
    var startOffset: Int
    
    init() {
        let count = tb / MemoryLayout<Element>.stride
        items = UnsafeMutableRawPointer
            .allocate(byteCount: tb, alignment: MemoryLayout<Element>.alignment)
            .bindMemory(to: Element.self, capacity: count)
        startOffset = count / 2
    }
    
    func deinitialize() {
        items.deallocate()
    }
    
    subscript(_ index: Int) -> Element {
        get {
            precondition(index >= 0 && index < count)
            return items[startOffset + index]
        }
        set {
            precondition(index >= 0 && index < count)
            items[startOffset + index] = newValue
        }
    }
    
    private (set) var count: Int = 0
    
    mutating func append(_ element: Element) {
        count += 1
        self[count - 1] = element
    }
    
    mutating func prepend(_ element: Element) {
        count += 1
        startOffset -= 1
        self[0] = element
    }
}

extension Array {
    mutating func prepend(_ element: Element) {
        insert(element, at: 0)
    }
}

@inline (never)
func test_arrayOperation(size: Int, execute: (Int) -> Void) {
    for i in 0 ..< size {
        execute(i)
    }
}

@inline (never)
func test_appendQuickArray(size: Int) {
    var items = QuickArray<UInt8>()
    test_arrayOperation(size: size) { i in
        items.append(UInt8(i & 0xFF))
    }
    items.deinitialize()
}

@inline (never)
func test_prependQuickArray(size: Int) {
    var items = QuickArray<UInt8>()
    test_arrayOperation(size: size) { i in
        items.prepend(UInt8(i & 0xFF))
    }
    items.deinitialize()
}

@inline (never)
func test_appendNormalArray(size: Int) {
    var items: [UInt8] = []
    items.reserveCapacity(size)
    test_arrayOperation(size: size) { i in
        items.append(UInt8(i & 0xFF))
    }
    items.removeAll()
    items.reserveCapacity(0)
}

@inline (never)
func test_prependNormalArray(size: Int) {
    var items: [UInt8] = []
    items.reserveCapacity(size)
    test_arrayOperation(size: size) { i in
        items.prepend(UInt8(i & 0xFF))
    }
    items.removeAll()
    items.reserveCapacity(0)
}

func runTests() {
    #if DEBUG
    print("tests started DEBUG mode")
    let bigArraySize = 10*mb
    let smallArraySize = 1*mb
    #else
    print("tests started RELEASE mode")
    let bigArraySize = 1000*mb
    let smallArraySize = 1*mb
    #endif

    print("append quick array test")
    var start = Date()
    test_appendQuickArray(size: bigArraySize)
    print("append quick array \(bigArraySize/mb) million elements: \(Date().timeIntervalSince(start)) sec")
    sleep(5)

    print("prepend quick array test")
    start = Date()
    test_prependQuickArray(size: bigArraySize)
    print("prepend quick array \(bigArraySize/mb) million elements: \(Date().timeIntervalSince(start)) sec")
    sleep(5)

    print("append normal array test")
    start = Date()
    test_appendNormalArray(size: bigArraySize)
    print("append normal array \(bigArraySize/mb) million elements: \(Date().timeIntervalSince(start)) sec")
    sleep(5)

    print("prepend normal array test")
    // too slow, using smaller size
    start = Date()
    test_prependNormalArray(size: smallArraySize)
    print("prepend normal array \(smallArraySize/mb) million elements!!!: \(Date().timeIntervalSince(start)) sec")
    sleep(5)

    print("tests done")
}

runTests()

Got these results:

RELEASE mode!
append quick array 1000 million elements: 0.95 sec
prepend quick array 1000 million elements: 0.77 sec
append normal array 1000 million elements: 2.35 sec
prepend normal array 1 million elements: 10.53 sec // 1000x fewer elements

All ops above but prepend on normal array are O(1), and as prepend on normal array is O(N) I can't run it with the same large number of elements to directly compare the speed difference. Note that this simple QuickArray implementation supports only POD types at the moment.

This was on 64 bits CPU that has an enormous address space. How to do a similar thing on 32 bits that doesn't have huge address space? We could do something similar with smaller upfront allocation: e.g. for an array that grows to 100K elements we allocate 200K elements with the start offset positioned to 50K from the arena start so the array elements are "centred". Once array is outgrown the allocated arena - double arena size and place array elements "centred" again in the arena area. Similarly for shrinking.


This might seem irrelevant at the first glance but poses a question: if it is possible to do a quick array implementation that allows quick appending / removing from both ends, why bother having a special "dequeue" type to begin with (let alone discussing its proper name :crazy_face:)

Does that program pre-allocate a terabyte for one array? How exactly do you have that much RAM lying around?

Also, in order for this to be useful, it needs to be reasonable for an app to have more than one of them at a time. I’ve run Swift in places where I’m limited to 32 megabytes of memory, so even one of these would be impractical, let alone multiple.

Virtual memory. You could allocate much more memory than you have RAM, e.g. allocating 1000 of those one terabyte arrays should not be an issue on 64 bit CPU and will fit into much less than 32 megabytes of available memory. It's only when you actually use that memory when it is mapped and could overflow RAM. Note that appending a single byte to an empty array would use page size of memory (I don't remember how much exactly, something like 4K or 16K).

Memory profile of running debug version of the app (the one that uses 1 terabyte (VM) arrays and fills arrays up to 10MB):

Noteworthy: that normal array can't shrink space has probably to do with reserveCapacity. Tried this sequence but it didn't help:
items.removeAll()
items.reserveCapacity(0)

Nope, even without a single reserveCapacity call it's the same behaviour, memory doesn't shrink back when removing from normal array in this test. Update - this is in debug mode only. It's ok in release mode


Release mode memory profile, now filling those 1TB arrays up to 10GB, prepending of normal array is not shown. Also increased delay between tests to 10 seconds for better appearance on the plot. The timing on the plot is approximate as I've drawn it manually.

The spiky growth of normal array is quite interesting and it shrinks to zero in release mode, so this is good. Interesting consequence of normal array growing - it temporarily needs twice as much wired space when it grows past its current limit and need to increase its size (by doubling the reserved amount I guess). Not showing prepending of normal array as it would be too slow growing at this scale of sizes.

Note: If you run these tests don't fill arrays to more than about half of your RAM, so if you have 16GB of RAM don't fill them to more than 8GB - in this case when the spike happens during normal array growing you won't hit your RAM limit and VM won't have to swap to disk, which would otherwise make results significantly (orders of magnitude) worse.

The cost of the indexing operation between Deque and Array is quite different. Both are O(1), but the constant factors matter substantially here. And neither of them over-allocates by a page, which makes them vastly kinder on memory.

Right. In the common case (index statically knowable to be in-bounds by construction, especially when iterating an array), it's vital to program performance that the compiler be able to reduce an array access to just a single load. Making that work for Deque is ... possible in most cases, but significantly harder to achieve.

3 Likes

I wonder if this could still work with less than a terabyte of virtual RAM

Here is another idea, which doesn't require any preallocation. It uses two regular arrays to speed up prepend.

class TwoEndedArray <Element> {
    //
    // self [0]       == last element of head_store if not empty, or first element of tail_store
    // self [count-1] == last element of tail_store if not empty, or first element of head_store
    
    private var head_store = Array <Element> ()
    private var tail_store = Array <Element> ()
    
    var count: Int {
        head_store.count + tail_store.count
    }
}

Insertions at the head and tail are both O(1), but removals can be O(n).

Details
// ---------------------------------------------------------------
//
class TwoEndedArray <Element> {
    //
    // self [0]       == last element of head_store if not empty, or first element of tail_store
    // self [count-1] == last element element of tail_store if not empty, or first element of head_store
    
    private var head_store = Array <Element> ()
    private var tail_store = Array <Element> ()
    
    var count: Int {
        head_store.count + tail_store.count
    }
}

// -----------------------------------------------------------------
//
extension TwoEndedArray {
    var first: Element? {
        guard count > 0 else {
            return nil
        }
        return self [0]
    }
    
    var last: Element? {
        guard count > 0 else {
            return nil
        }
        return self [count - 1]
    }
}

// -----------------------------------------------------------------
//
extension TwoEndedArray {
    func remove_first () -> Element? {
        if head_store.count > 0 {
            return head_store.removeLast()
        }
        
        if tail_store.count > 0 {
            return tail_store.removeFirst()
        }
        
        return nil
    }
    
    func remove_last () -> Element? {
        if tail_store.count > 0 {
            return tail_store.removeLast()
        }
        
        if head_store.count > 0 {
            return head_store.removeFirst()
        }
        
        return nil
    }
}


// -----------------------------------------------------------------
//
extension TwoEndedArray {
    func append(_ u: Element) {
        tail_store.append (u)
    }
    
    func prepend(_ u: Element) {
        head_store.append (u)
    }
}

// -----------------------------------------------------------------
//
extension TwoEndedArray {
    subscript(_ index: Int) -> Element {
        get {
            precondition (index >= 0 && index < count)
            let M = head_store.count
            if M > 0 {
                if index < M {
                    return head_store [M - 1 - index]
                }
                else {
                    let index = index - M
                    precondition (index >= 0 && index < tail_store.count)
                    return tail_store [index]
                }
            }
            else {
                precondition (index >= 0 && index < tail_store.count)
                return tail_store [index]
            }
        }
        set {
            let M = head_store.count
            if M > 0 {
                if index < M {
                    head_store [M - 1 - index] = newValue
                }
                else {
                    let index = index - M
                    precondition (index >= 0 && index < tail_store.count)
                    tail_store [index] = newValue
                }
            }
            else {
                precondition (index >= 0 && index < tail_store.count)
                tail_store [index] = newValue
            }
        }
    }
}

// -----------------------------------------------------------------
//
extension TwoEndedArray: CustomStringConvertible {
    var description: String {
        "[\(head_store) \(tail_store)]"
    }
}

Without commenting on anything else: there was a lot of effort expended to make Array<T>'s representation take as few bytes as possible, namely the size of a pointer. This is because we often store arrays of arrays, and it was deemed worth the effort to make that as economical as possible. Doubling the size of Array's primary representation, and then doubling the amount of pointer-chasing required seems like a high cost.

6 Likes

That's basically just a Deque isn't it?

Wait, so array stores both it's size and capacity inside of allocated memory? TIL I thought it was like string and only stored the capacity inside the allocation.