Memory-optimal collections of Optionals

wadetregaskis · February 23, 2024, 5:16pm

Continuing the discussion from [Pitch] 128 bit Integer Types:

I'm interested in thinking this through more.

Benefits

Separate storage for the optional bit vs the value is potentially a pretty significant win. For a start it can greatly compact the value storage (e.g. eight Int64s per 64 byte cache line instead of only four), which saves memory and makes accessing the array faster (by virtue of less memory traffic, better cache utilisation, lower average latency, etc).

It can be an even bigger win for many operations, like 'compactMapping' nil values out - scanning the optionality bitmap is much faster than the values bitmap.

Similarly many existing collection methods can benefit from this, although it might require new variants of them in many cases, since e.g. the existing filter(by:) expects to visit every element, nil or not. I suspect many uses would only care about the non-nil values so could benefit from new methods like (mutating) nilling(if:) and a nonNil "modifier" like lazy which vends a view with the Optional wrapper removed.

Spin-off collection types

There's also the possibility of further compacting the storage to not have empty spots for nil values. This does make look-ups more expensive since you can't just do a naive base + index * size, so it would only make sense for certain types of use / collections (i.e. not RandomAccessCollection).

Challenges & limitations

I don't know whether it's possible modify the existing collections in this manner, but even if not it might make sense to have new variants, e.g. CompactOptionalsArray in swift-collections or whatever.

To elaborate on that, one of the concerns raised is that Array provides direct access to its storage, which is then ostensibly requiring cohabitated layout of the Optional with its wrapped type. Hypothetically an intermediary view could be introduced which provides the illusion of the existing "inline" layout, but it seems impossible (or at least impractical inefficient) to do this in a way that provides truly direct memory access (UnsafeBufferPointer et al).

Even if so, precluding the existing Array from adopting this, that seems like functionality that's not often used (?) and so wouldn't be a significant dampener against new collection types.

Existing implementations?

I haven't encountered any implementations of collections like this, although I haven't specifically & thoroughly looked for them. Anyone happen to know of any?

Joe_Groff · February 23, 2024, 6:23pm

Not to derail this ideal with an even more ambitious one, but more generally, it would be cool to have a type that was aware of Swift type layouts and provide a struct-of-arrays data structure for it as well. Optional could be seen as a special case of this where the "struct" consists of the value (if present) and the boolean "value is present" flag.

John_McCall · February 23, 2024, 6:29pm

It'd be a pretty straightforward proposal to add extra-inhabitants APIs to MemoryLayout and the unsafe pointers.

al45tair · February 25, 2024, 11:09am

Jonathan Blow’s “jai” language has that feature, I believe. Not sure I’ve seen any other implementation of it anywhere, but it has numerous useful applications, particularly in graphics, audio and game development.

George · February 25, 2024, 3:47pm

Seems like it would be possible to achieve something like this with a macro today, though it would probably need to annotate the element type.

Joe_Groff · February 25, 2024, 7:01pm

I think so too, though another weakness of the macro approach would be that (I think) you'd have to instantiate the macro individually for every type you want to SoA, and the resulting collections types would be value-specific. It would be cool if we had a reflection API that could be constant-folded, giving a result similar to what D, Zig, and those other static-reflection-centric languages can do, while still executing dynamically across ABI boundaries, allowing for a generic SoArray<T> to be instantiated on any type while maintaining Swift's support for separate compilation.

tera · February 25, 2024, 9:24pm

Yes!
(and indeed Optional is just a special case).

Sketch implementation:

protocol PackableType { associatedtype Packed: PackedType }

protocol PackedType: Hashable {
    associatedtype Unpacked: Hashable, PackableType
    init(unpacked: Unpacked)
    var unpacked: Unpacked { get }
}

extension Array where Element: PackedType {
    subscript(unpack index: Index) -> Element.Unpacked {
        get { self[index].unpacked }
        set { self[index] = Element(unpacked: newValue) }
    }
    mutating func append(unpacked element: Element.Unpacked) {
        append(Element(unpacked: element))
    }
}
extension Dictionary where Value: PackedType {
    subscript(unpack key: Key) -> Value.Unpacked? {
        get { self[key]?.unpacked }
        set {
            guard let newValue else { self[key] = nil; return }
            self[key] = Value(unpacked: newValue)
        }
    }
}

func writeBytes(_ source: UnsafeRawPointer, _ destination: UnsafeMutableRawPointer, _ bitOffset: inout Int, _ count: Int) {
    precondition(bitOffset % 8 == 0, "TODO this case later")
    let byteOffset = bitOffset / 8
    // TODO: do the following better
    let dst = destination.assumingMemoryBound(to: UInt8.self) + byteOffset
    let src = source.assumingMemoryBound(to: UInt8.self)
    for i in 0 ..< count { dst[i] = src[i] }
    bitOffset += count * 8
}
func writeBits(_ source: UnsafeRawPointer, _ destination: UnsafeMutableRawPointer, _ bitOffset: inout Int, _ count: Int) {
    preconditionFailure("TODO later")
    bitOffset += count
}
func readBytes(_ destination: UnsafeMutableRawPointer, _ source: UnsafeRawPointer, _ bitOffset: inout Int, _ count: Int) {
    precondition(bitOffset % 8 == 0, "TODO this case later")
    let byteOffset = bitOffset / 8
    // TODO: do the following better
    let dst = destination.assumingMemoryBound(to: UInt8.self)
    let src = source.assumingMemoryBound(to: UInt8.self) + byteOffset
    for i in 0 ..< count { dst[i] = src[i] }
    bitOffset += count * 8
}
func readBits(_ destination: UnsafeMutableRawPointer, _ source: UnsafeRawPointer, _ bitOffset: inout Int, _ count: Int) {
    preconditionFailure("TODO later")
    bitOffset += count
}

struct ExampleType: Hashable {
    var a: Int8 = 0
    var b: Int32 = 0
}

extension ExampleType: PackableType { // autogenerated somehow
    struct Packed: PackedType {
        typealias Unpacked = ExampleType // will be different for differnt types
        
        typealias Bytes = (UInt8, UInt8, UInt8, UInt8, UInt8) // will be different for differnt types
        var bytes: Bytes
        
        static func == (lhs: Self, rhs: Self) -> Bool {
            lhs.unpacked == rhs.unpacked
        }
        func hash(into hasher: inout Hasher) {
            hasher.combine(unpacked)
        }
        init(unpacked: ExampleType) {
            var bitOffset = 0
            var unpacked = unpacked // 😢 have to make a writeable copy for no good reason
            // the following will be different for differnt types
            bytes = (0, 0, 0, 0, 0) // 😢 have to initialize it first
            writeBytes(&unpacked.a, &bytes, &bitOffset, 1)
            writeBytes(&unpacked.b, &bytes, &bitOffset, 4)
        }
        var unpacked: ExampleType {
            var bitOffset = 0
            var bytes = bytes // 😢 have to make a writeable copy for no good reason
            // the following will be different for differnt types
            var unpacked = ExampleType() // 😢 have to initialize it first
            readBytes(&unpacked.a, &bytes, &bitOffset, 1)
            readBytes(&unpacked.b, &bytes, &bitOffset, 4)
            return unpacked
        }
    }
}

var dict: [String: ExampleType.Packed] = [:]
var array: [ExampleType.Packed] = []
var a = ExampleType(a: 0x11, b: 0x22334455)
var b = ExampleType(a: 0x66, b: 0x778899AA)
array.append(unpacked: a)
array.append(unpacked: b)
// array at this point is: 11 55 44 33 22 66 aa 99 88 77
precondition(array[unpack: 1] == b)
dict[unpack: "hello"] = a
precondition(dict[unpack: "hello"] == a)

This sketch actually works! The use site required changes though...

var dict: [String: ExampleType.Packed] = [:]
var array: [ExampleType.Packed] = []
var a = ExampleType(a: 0x11, b: 0x22334455)
var b = ExampleType(a: 0x66, b: 0x778899AA)
array.append(unpacked: a)
array.append(unpacked: b)
// array memory at this point is:
// 11 55 44 33 22 66 aa 99 88 77
precondition(array[unpack: 1] == b)
dict[unpack: "hello"] = a
precondition(dict[unpack: "hello"] == a)

as instead of original ExampleType I have to use ExampleType.Packed to store in a collection and then instead of normal subscript operator, etc use the "unpack/unpacked" counterparts. which raises a question or three:

perhaps I could do just array.append(b.packed) or array[index].unpacked, etc
can the use site be made looking normal, without those packs/unpacks?
or could we embed it into collections implementation, which would check if element type adheres to some protocol similar to "Packable" above and do the pack/unpack business transparently?

tera · February 25, 2024, 9:46pm

A cleaner sketch that uses Array & Dictionary wrappers

protocol PackedType: Hashable {
    associatedtype Unpacked: Hashable, PackableType where Unpacked.Packed == Self
    init(unpacked: Unpacked)
    var unpacked: Unpacked { get }
}
protocol PackableType: Hashable {
    associatedtype Packed: PackedType where Packed.Unpacked == Self
    var packed: Packed { get }
}
extension PackableType {
    var packed: Packed { Packed(unpacked: self) }
}

struct PackedArray<Element: PackableType> where Element.Packed.Unpacked == Element {
    private var internalArray: [Element.Packed] = []
    var count: Int { internalArray.count }
    mutating func append(_ element: Element) {
        internalArray.append(element.packed)
    }
    subscript(_ index: Int) -> Element {
        get { internalArray[index].unpacked }
        set { internalArray[index] = newValue.packed }
    }
}

extension PackedArray: ExpressibleByArrayLiteral where Element: PackableType {
    init(arrayLiteral elements: Element...) {
        internalArray = elements.map { $0.packed }
    }
}
extension PackedDictionary: ExpressibleByDictionaryLiteral where Value: PackableType {
    init(dictionaryLiteral elements: (Key, Value)...) {
        let elements = elements.map { ($0, $1.packed) }
        internalDictionary = Dictionary(uniqueKeysWithValues: elements)
    }
}

struct PackedDictionary<Key: Hashable, Value: PackableType> where Value.Packed.Unpacked == Value {
    private var internalDictionary: [Key: Value.Packed] = [:]
    var count: Int { internalDictionary.count }
    subscript(_ key: Key) -> Value? {
        get { internalDictionary[key]?.unpacked }
        set {
            guard let newValue else {
                internalDictionary[key] = nil
                return
            }
            internalDictionary[key] = newValue.packed
        }
    }
}

func writeBytes(_ source: UnsafeRawPointer, _ destination: UnsafeMutableRawPointer, _ bitOffset: inout Int, _ count: Int) {
    precondition(bitOffset % 8 == 0, "TODO this case later")
    let byteOffset = bitOffset / 8
    // TODO: do the following better
    let dst = destination.assumingMemoryBound(to: UInt8.self) + byteOffset
    let src = source.assumingMemoryBound(to: UInt8.self)
    for i in 0 ..< count { dst[i] = src[i] }
    bitOffset += count * 8
}
func writeBits(_ source: UnsafeRawPointer, _ destination: UnsafeMutableRawPointer, _ bitOffset: inout Int, _ count: Int) {
    preconditionFailure("TODO later")
    bitOffset += count
}
func readBytes(_ destination: UnsafeMutableRawPointer, _ source: UnsafeRawPointer, _ bitOffset: inout Int, _ count: Int) {
    precondition(bitOffset % 8 == 0, "TODO this case later")
    let byteOffset = bitOffset / 8
    // TODO: do the following better
    let dst = destination.assumingMemoryBound(to: UInt8.self)
    let src = source.assumingMemoryBound(to: UInt8.self) + byteOffset
    for i in 0 ..< count { dst[i] = src[i] }
    bitOffset += count * 8
}
func readBits(_ destination: UnsafeMutableRawPointer, _ source: UnsafeRawPointer, _ bitOffset: inout Int, _ count: Int) {
    preconditionFailure("TODO later")
    bitOffset += count
}

struct ExampleType: Hashable {
    var a: Int8 = 0
    var b: Int32 = 0
}

extension ExampleType: PackableType {
    // autogenerated somehow
    struct Packed: PackedType {
        typealias Unpacked = ExampleType // will be different for differnt types
        
        typealias Bytes = (UInt8, UInt8, UInt8, UInt8, UInt8) // will be different for differnt types
        var bytes: Bytes
        
        static func == (lhs: Self, rhs: Self) -> Bool {
            lhs.unpacked == rhs.unpacked
        }
        func hash(into hasher: inout Hasher) {
            hasher.combine(unpacked)
        }
        init(unpacked: ExampleType) {
            var bitOffset = 0
            var unpacked = unpacked // 😢 have to make a writeable copy for no good reason
            // the following will be different for differnt types
            bytes = (0, 0, 0, 0, 0) // 😢 have to initialize it first
            writeBytes(&unpacked.a, &bytes, &bitOffset, 1)
            writeBytes(&unpacked.b, &bytes, &bitOffset, 4)
        }
        var unpacked: ExampleType {
            var bitOffset = 0
            var bytes = bytes // 😢 have to make a writeable copy for no good reason
            // the following will be different for differnt types
            var unpacked = ExampleType() // 😢 have to initialize it first
            readBytes(&unpacked.a, &bytes, &bitOffset, 1)
            readBytes(&unpacked.b, &bytes, &bitOffset, 4)
            return unpacked
        }
    }
}

The use site is much cleaner:

var dict: PackedDictionary<String, ExampleType> = [:]
var array: PackedArray<ExampleType> = []
var a = ExampleType(a: 0x11, b: 0x22334455)
var b = ExampleType(a: 0x66, b: 0x778899AA)
array.append(a)
array.append(b)
// array at this point is: 11 55 44 33 22 66 aa 99 88 77
// normally it would be:   11 00 00 00 55 44 33 22 66 00 00 00 aa 99 88 77
precondition(array[1] == b)
dict["hello"] = a
precondition(dict["hello"] == a)

although there's still an explicit opt-in of using PackedDictionary / PackedArray.

What are the precedents from other languages? Do they do this invisibly under the hood or not?

dnadoba · February 26, 2024, 12:18am

packed structs is a different concept to array of structs. Your example type as an array of structs would be [(Int8, Int32)] and struct of arrays would be ([Int8], [UInt32]).

Coming back to the original question, this is indeed something of interest and has come up in the past in a different context already:

Ticket map?

By replacing the buffer of optionals with a (partially initialized) unsafe buffer, and a separate bitmap indicating which slots are initialized, we can considerably reduce allocation sizes, often without significantly impacting performance. The bitmap is very compact, so it doesn't take up much space in the CPU cache. (Of course, cold lookups suffer two memory accesses rather than just one.)

For example, a 1024-item buffer of Optional<Int>s needs 16,384 bytes, while an unsafe buffer of 1024 Ints plus a same-sized bitmap only takes 8,192 + 128 = 8,320 bytes -- a mere 50.78% of the space. (Of course, we may need a constant amount of additional metadata to keep track of the bitmap's position etc.)

An example where this is currently used is in the Set implementation -- it uses this exact mechanism to represent its hash table, with both the bitmap and the unsafe elements buffer located in tail-allocated storage of an internal class instance.

We already have BitArray (referred to as bitmap in the linked post) in swift-collections and would now only need to implement the (partially initialized) unsafe buffer.

tera · February 26, 2024, 12:35am

With a few fields I guess that would have a horrible cache performance when items are accessed naturally (say a view representing this item accessing all or most of the fields).

Taking the example you quoted the 1024 packed array of Optional<Int> would take 9216 bytes which is just ~10% more than the bitmap version. OTOH performance would be not so great due to packing / unpacking overhead. It would be more cache friendly though.

Optional while common is just one use case. I'd rather see the optimisation working in general case, e.g.:

enum E { case x, y, z, t(Int) }

dnadoba · February 26, 2024, 1:45am

Struct packing and struct of arrays have different trade offs, it’s not only about size. For example a struct of arrays can usually make use of SIMD instructions as values are contiguous in memory and have the required alignment whereas packed structs need unpacking first. It can also reduce the required memory bandwidth if only a subset of the fields are accessed. However you are also right that with an increased number of fields cache trashing can become a problem if all fields are accessed together. I think both are interesting tools that should be available in swift.

wadetregaskis · February 27, 2024, 3:45am

I'm curious about the current behaviour of Optional, as it seems it's not always enlarging what it wraps:

Code

func printLayout<T>(of type: T.Type) {
    print("\(type):", MemoryLayout<T>.size, MemoryLayout<T>.stride, MemoryLayout<T>.alignment, separator: "\t")
}

print("Type\tSize\tStride\tAlignment")

printLayout(of: Bool.self)
printLayout(of: Optional<Bool>.self)

printLayout(of: Int8.self)
printLayout(of: Optional<Int8>.self)

printLayout(of: Int16.self)
printLayout(of: Optional<Int16>.self)

printLayout(of: Int32.self)
printLayout(of: Optional<Int32>.self)

printLayout(of: Int64.self)
printLayout(of: Optional<Int64>.self)

printLayout(of: Character.self)
printLayout(of: Optional<Character>.self)

printLayout(of: String.self)
printLayout(of: Optional<String>.self)

enum SimpleEnum {
    case a, b, c
}

printLayout(of: SimpleEnum.self)
printLayout(of: Optional<SimpleEnum>.self)

enum NotSoSimpleEnum {
    case a, b, c(Int)
}

printLayout(of: NotSoSimpleEnum.self)
printLayout(of: Optional<NotSoSimpleEnum>.self)

enum ComplicatedEnum {
    case a(Int), b(Bool), c(String)
}

printLayout(of: ComplicatedEnum.self)
printLayout(of: Optional<ComplicatedEnum>.self)

enum EvenMoreComplicatedEnum {
    case a(Int), b(Bool), c(String, Int, Bool)
}

printLayout(of: EvenMoreComplicatedEnum.self)
printLayout(of: Optional<EvenMoreComplicatedEnum>.self)

struct FullStruct {
    let a: Int
    let b: Int
    let c: Int
    let d: Int
}

printLayout(of: FullStruct.self)
printLayout(of: Optional<FullStruct>.self)

struct StructWithSpaceAvailable {
    let a: Int
    let b: Bool
}

printLayout(of: StructWithSpaceAvailable.self)
printLayout(of: Optional<StructWithSpaceAvailable>.self)

Type	Size	Stride	Alignment
Bool	1	1	1
Optional<Bool>	1	1	1

Int8	1	1	1
Optional<Int8>	2	2	1

Int16	2	2	2
Optional<Int16>	3	4	2

Int32	4	4	4
Optional<Int32>	5	8	4

Int64	8	8	8
Optional<Int64>	9	16	8

Character	16	16	8
Optional<Character>	16	16	8

String	16	16	8
Optional<String>	16	16	8

SimpleEnum	1	1	1
Optional<SimpleEnum>	1	1	1

NotSoSimpleEnum	9	16	8
Optional<NotSoSimpleEnum>	10	16	8

ComplicatedEnum	17	24	8
Optional<ComplicatedEnum>	17	24	8

EvenMoreComplicatedEnum	25	32	8
Optional<EvenMoreComplicatedEnum>	25	32	8

FullStruct	32	32	8
Optional<FullStruct>	33	40	8

StructWithSpaceAvailable	9	16	8
Optional<StructWithSpaceAvailable>	9	16	8

In most of these test cases it actually doesn't add any bytes to the overall size. In some it does add a byte but it's irrelevant to the stride. It's mainly just the simple scalars (or multiples thereof) that suffer, with their stride as much as doubling.

Is MemoryLayout accurate?

I'm surprised that even in the struct cases it seems to find a spare bit available in one of the Bool members.

Is this all expected?

It appears it never changes the alignment (which I expected).

Alejandro · February 27, 2024, 4:14am

Yep

wadetregaskis · February 27, 2024, 4:42am

Why does Optional sometimes steal a bit / value from an enum, but not other times? I get that there might not always be anything to steal, but in the case below there sure seems to be no matter how you slice it (spare bits and spare enum constant values) and yet Optional adds an extra byte for itself:

enum NotSoSimpleEnum {
    case a, b, c(Int)
}

Joe_Groff · February 27, 2024, 4:43am

Reusing unused tag representations from a tagged single-payload enum is something that should be possible, but I couldn't get it implemented in time before ABI stability.

tera · February 27, 2024, 12:04pm

This one is ok:

class C {}
enum E { case a, b, c(C) }

Optional's size/stride and alignment is 8

An extension to the packed array idea above: pack to bits instead of bytes. the bit count of each element would be constants (as with bytes) so array indexing would be normal, just in general the underlying array will have to reference between one and three packed elements when accessing a single unpacked element and in there could be some unused bits at the end of the array. 1024 array of optional Ints will take just 8320 bytes.

An explicit PackedDictionary / PackedArray is easier to implement but it feels it would be much less useful compared to the case when Array / Dictionary itself do the necessary logic when their Element conforms to some Packed protocol. The specific "classic" use case would be Array<Bool>.

wadetregaskis · February 28, 2024, 3:20am

Does that mean it'll now never happen?

Even for package-internal types, that [I assume] aren't limited by the ABI?

Joe_Groff · February 28, 2024, 3:47am

We might still be able to improve the layout algorithm for newly-introduced enums that never have to be handled by old compilers (or potentially old runtimes also, if runtime metadata instantiation needs to change). Non-ABI-stable platforms could also more freely take advantage of new ABI optimizations.