Fixed size array hacks

Swift doesn't have fixed size arrays, and I guess many of us have experimented with various workarounds.

I thought it would be fun to share and discuss our ways around Swift's lack of fixed size arrays here.

From the very simple ones like:

var myFixedSizeArrayOfEightInts = (0, 0, 0, 0, 0, 0, 0, 0)

and

var myFourFloats = SIMD4<Float>

to more involved "solutions" like eg:

protocol FixedSizeArray : ExpressibleByArrayLiteral, RandomAccessCollection where Index == Int {
    associatedtype Element
    associatedtype Storage
    var storage: Storage { get set }
    init(elementForIndex: (Int) -> Element)
}
extension FixedSizeArray {
    static var count: Int {
        let storageSize = MemoryLayout<Storage>.size
        let elementStride = MemoryLayout<Element>.stride
        precondition(storageSize.isMultiple(of: elementStride))
        return storageSize / elementStride
    }
    var count: Int { return Self.count }
    var startIndex: Int { return 0 }
    var endIndex: Int { return Self.count }
    subscript(position: Int) -> Element {
        get {
            precondition(0 <= position && position < Self.count, "\(position)")
            return self[unchecked: position]
        }
        set {
            precondition(0 <= position && position < Self.count)
            self[unchecked: position] = newValue
        }
    }
    subscript(unchecked position: Int) -> Element {
        get {
            return withUnsafeBytes(of: storage) { (rawPtr) -> Element in
                let offset = MemoryLayout<Element>.stride &* position
                return rawPtr.load(fromByteOffset: offset, as: Element.self)
            }
        }
        set {
            let offset = MemoryLayout<Element>.stride &* position
            withUnsafeMutableBytes(of: &storage) { (rawMutPtr) -> Void in
                rawMutPtr.storeBytes(of: newValue,
                                     toByteOffset: offset,
                                     as: Element.self)
            }
        }
    }
    init(repeating value: Element) {
        self.init() { _ in value }
    }
    init(arrayLiteral elements: Element...) {
        precondition(elements.count == Self.count, "Illegal element count")
        self.init() { elements[$0] }
    }
}


struct FSA2<Element> : FixedSizeArray {
    var storage: (Element, Element)
    init(elementForIndex efi: (Int) -> Element) {
        storage = (efi(0), efi(1))
    }
}
// ... Add eg FSA1, FSA3, FSA4, ... as needed ...
struct FSA16<Element> : FixedSizeArray {
    var storage: (Element, Element, Element, Element,
                  Element, Element, Element, Element,
                  Element, Element, Element, Element,
                  Element, Element, Element, Element)
    init(elementForIndex efi: (Int) -> Element) {
        storage = (
            efi( 0), efi( 1), efi( 2), efi( 3),
            efi( 4), efi( 5), efi( 6), efi( 7),
            efi( 8), efi( 9), efi(10), efi(11),
            efi(12), efi(13), efi(14), efi(15)
        )
    }
}

3 Likes

You could copy the internal _FixedArray16 from the standard library, or use the older GYB version (see apple/swift#25131) to generate your own sizes.

1 Like

I don't understand why it has that var _count: Int 8 property stored as an extra byte after the storage though:

internal struct _FixedArray16<T> {
  // ABI TODO: This makes assumptions about tuple layout in the ABI, namely that
  // they are laid out contiguously and individually addressable (i.e. strided).
  //
  internal var storage: (
    // A 16-wide tuple of type T
    T, T, T, T, T, T, T, T,
    T, T, T, T, T, T, T, T
  )

  var _count: Int8
}

Is it there to make the fixed size array have a dynamic size? :P

It also makes _FixedArray16<Element> have a stride that is not simply 16 times the element stride.

I'm sure there are good reasons for this in the stdlib but it's not the kind of fixed size array that I have needed.

I have a somewhat-improved fixed-size buffer pattern that I've been using in some projects, which I can make available on a Swift Numerics branch for people to experiment with. I'll clean it up and document it a bit first, then post back here once it's up.

14 Likes

Here's my prototype (which I have used for a bunch of experiments, but should still be considered quite experimental); feedback appreciated. It definitely makes some design decisions that are possibly a bit more general than needed for most people, but are things that I found myself wanting to have:

Jordan posted a lovely alternative take on Twitter the other day, which makes a different set of design decisions wrapping essentially the same mechanism: Compiler Explorer

9 Likes

It wasn't there when I wrote _FixedArray originally, and the reasons for adding it weren't great ("it's called Array not Buffer!"). It isn't actually used by anything currently. It can be removed with no change to existing behavior: [Draft] Demonstrate dropping count field from FixedArray

A different minimal approach is in _SmallBuffer from e.g. here. That holds 64 bytes (a common cache line size) of trivial data (not statically enforced, see _invariantCheck()). It is not written as a (Mutable)Collection, though it could be (with count = capacity). If you go this route you'd want to add the relevant withUnsafe[Mutable]Pointer/Bytes as well.

_SmallBuffer was meant to provide exactly what String normalization buffers needed (which are the only users of _FixedArray), but they never switched over. Those buffers are only needed to facilitate efficient interaction with UTF-16-based ICU for normalization, which we plan on replacing at some point anyways.

Don't ask about _UIntBuffer or _ValidUTF8Buffer, which unlike these others are burned into Swift's ABI forever. They were sorted at the bottom of the ABI stability priority list.

3 Likes

When I was working with this kind of approach back in pre-Swift 4, type checking and optimization time was prohibitively slow. Are you finding any issues so far? How is the quality of codegen?

Generally pretty good in release! Note that it's important that everything except the inits goes through bare pointer arithmetic (and the inits only don't because they can't--though even there it might be better to zero-init first and then write through the pointer for pod types). The nested types are really only used to get an object with the right size and layout.

Still some optimization issues if you try to use it for larger arrays, it tries writing the whole tuple to the stack in the hot loop for some reason once the size gets above 10... Compiler Explorer

That seems like a fairly basic missed compiler optimization that should be "relatively easy" to fix (since it happens with both the tuple and my weird type one, it's mostly likely just an LLVM threshold that's tuned wrong). The main thing is getting the semantics right in the source language; we can fix these codegen bugs "anytime". Note that for lengths less than 11, you do get nice vectorized codegen (Compiler Explorer), even with the gross types:

output.testOptimization(output.Adjoin<output.Adjoin<output.Adjoin<output.Array1<Swift.Int>, output.Array1<Swift.Int>>, output.Adjoin<output.Array1<Swift.Int>, output.Array1<Swift.Int>>>, output.Adjoin<output.Adjoin<output.Array1<Swift.Int>, output.Array1<Swift.Int>>, output.Adjoin<output.Array1<Swift.Int>, output.Array1<Swift.Int>>>>) -> Swift.Int:
        push    rbp
        mov     rbp, rsp
        movdqu  xmm0, xmmword ptr [rdi]      // load elements 0 and 1
        movdqu  xmm1, xmmword ptr [rdi + 16] // load elements 2 and 3
        movdqu  xmm2, xmmword ptr [rdi + 32] // load elements 4 and 5
        paddq   xmm2, xmm0                   // [4+0, 5+1]
        movdqu  xmm0, xmmword ptr [rdi + 48] // load elements 6 and 7
        paddq   xmm0, xmm1                   // [6+2, 7+3]
        paddq   xmm0, xmm2                   // [6+2+4+0, 7+3+5+1]
        pshufd  xmm1, xmm0, 78               // [7+3+5+1, don't care]
        paddq   xmm1, xmm0                   // [7+3+5+1+6+2+4+0, don't care]
        movq    rax, xmm1                    // move to result register
        pop     rbp
        ret

Note also that mine has the escape hatch of withUnsafe[Mutable]BufferPointer, which lets you avoid the issue you're seeing above size 10 in the meantime, while we wait for the optimizer to get smarter.

1 Like

I'm curious of what you think about the need for and/or possibility of adding some small language feature to make working with homogeneous tuples less cumbersome.

  1. There is no nice / general way of writing them:

    • We can use textual repetition, manually or via eg gyb:

      typealias ArrayOf12Floats = (Float, Float, Float, Float, Float, Float, Float, Float, Float, Float, Float, Float)
      
    • or we can use bridged C types like:

      typedef float ArrayOf12Floats[12];
      
    • or we can use various recursive workarounds like the Adjoin struct in your prototype (which leads to nested tuple types (with the same memory layout as the simplest unnested form)).

  2. And to initialize them we have to jump through similar hoops.


I'm probably wrong, but it feels like some small natural addition to the language could remove the above two pain points, and thereby make something like your prototype and the one by @jrose work as a way to officially support static arrays in Swift.

1 Like

That's great stuff,@scanon! I had yet another set of criteria to achieve recently, explained at the top of this gist, which also has my code.

I can imagine several advantages you might be deriving from the binary tree representation you're using, but I'd really like to hear your rationale. I suspect, unfortunately, that it's incompatible with my goal of being able to do insert and remove with a stable set of types.

Looking forward to hearing more,
Dave

2 Likes

Yeah, insert/remove is an anti-goal for most of my purposes; I think you're right that it's incompatible with what you're doing, but at a quick glance your gist looks pretty reasonable to me.

Thanks, Steve. But can you say more about the tree representation? I'm particularly interested to know whether you feel there's something important (e.g. for optimization) about the exact tree structures you chose, or whether you're just trying to keep type proliferation down and help the inliner.

It was purely a consequence of needing only power-of-two sizes at first, then going back to fill in some others, then deciding to throw the rest in. Since everything except inits happens (under the covers) through unsafe pointers, the actual type-system representation of the storage doesn't matter much--you can conform anything made up of the correct type with the right layout to the protocol and it'll work.

I I've been waiting fixed size array for 2 years now and as I can see, it's not for tomorrow.
I'm working on a DVD decoder. Let me show you a little part of the IFO file header struct.

UInt32		sectorMenuCellAddressTable;
	UInt32		sectorMenuVOBUAddressMap;
	UInt32		sectorTitleCellAddressTable;		// VTS olny
	UInt32		sectorTitleVOBUAddressMap;			// VTS only
	UInt8		unused5[24];
	DataVideoAttributes	menuVideoAttributes;
	UInt16		nbMenuAudioStreams;
	DataAudioAttributes	menuAudioAttributes[MAX_AUDIO_STREAMS];
	UInt8		reserved1[16];
	UInt16		nbMenuSubpictureStreams;			// 0 ou 1
	DataSubpictureAttributes	menuSubpictureAttributes;
	UInt8		reserved2[164];
	DataVideoAttributes	titleVideoAttributes;						// VTS only

For now, the only solution I have is a Obj-C wrapper to access the struct. If Swift wants replace C a day, fixed size array is a must have.

3 Likes

I have a use case for a fixed size array and found this. But there seems to be a codegen regression on Swift 5.3 and later. In Swift 5.2, I see the same codegen as in your post, but with Swift 5.3 (Compiler Explorer), we get this:

output.testOptimization(output.Adjoin<output.Adjoin<output.Adjoin<output.Array1<Swift.Int>, output.Array1<Swift.Int>>, output.Adjoin<output.Array1<Swift.Int>, output.Array1<Swift.Int>>>, output.Adjoin<output.Adjoin<output.Array1<Swift.Int>, output.Array1<Swift.Int>>, output.Adjoin<output.Array1<Swift.Int>, output.Array1<Swift.Int>>>>) -> Swift.Int:
    push    rbp
    mov     rbp, rsp
    movups  xmm0, xmmword ptr [rdi]
    movaps  xmmword ptr [rbp - 64], xmm0
    mov     rax, qword ptr [rbp - 64]
    movups  xmm0, xmmword ptr [rdi]
    movaps  xmmword ptr [rbp - 64], xmm0
    add     rax, qword ptr [rbp - 56]
    movups  xmm0, xmmword ptr [rdi + 16]
    movaps  xmmword ptr [rbp - 48], xmm0
    add     rax, qword ptr [rbp - 48]
    movups  xmm0, xmmword ptr [rdi + 16]
    movaps  xmmword ptr [rbp - 48], xmm0
    add     rax, qword ptr [rbp - 40]
    movups  xmm0, xmmword ptr [rdi + 32]
    movaps  xmmword ptr [rbp - 32], xmm0
    add     rax, qword ptr [rbp - 32]
    movups  xmm0, xmmword ptr [rdi + 32]
    movaps  xmmword ptr [rbp - 32], xmm0
    add     rax, qword ptr [rbp - 24]
    movups  xmm0, xmmword ptr [rdi + 48]
    movaps  xmmword ptr [rbp - 16], xmm0
    add     rax, qword ptr [rbp - 16]
    movups  xmm0, xmmword ptr [rdi + 48]
    movaps  xmmword ptr [rbp - 16], xmm0
    add     rax, qword ptr [rbp - 8]
    pop     rbp
    ret
1 Like

Thanks! Please file bug reports for any perf regressions like this in the future. In this case, I went ahead and did it for you. FWIW, this is almost surely a regression in LLVM's vectorization cost model, rather than "Swift" per se, but it obviously has bad effects here.

4 Likes