Subscript for 4D and higher dimensions in a ShapedArray

wigging · August 6, 2024, 2:10pm

Below is a ShapedArray structure for working with 1D, 2D, and 3D arrays. The DataBuffer class holds a reference to the underlying mutable buffer for data storage. A flatIndex function determines the one-dimensional index for the subscript getter and setter properties.

struct ShapedArray<T> {
    
    let shape: [Int]
    private let data: DataBuffer<T>

    var buffer: UnsafeMutableBufferPointer<T> {
        get { self.data.buffer }
        set { self.data.buffer = newValue }
    }

    init(_ array: [T]) {
        self.shape = [array.count]
        self.data = DataBuffer(array: array)
    }

    init(_ array: [T], shape: Int...) {
        self.shape = shape
        self.data = DataBuffer(array: array)
    }

    init(_ array2D: [[T]]) {
        let rows = array2D.count
        let columns = array2D[0].count
        self.shape = [rows, columns]
        self.data = DataBuffer(array: array2D.flatMap { $0 })
    }

    subscript(index: Int...) -> T {
        get {
            let idx = flatIndex(index: index, shape: self.shape)
            return self.buffer[idx]
        }
        set {
            let idx = flatIndex(index: index, shape: self.shape)
            self.buffer[idx] = newValue
        }
    }
}

func flatIndex(index: [Int], shape: [Int]) -> Int {
    var idx = 0
    if shape.count == 1 {
        idx = index[0]    // index in 1D array
    } else if shape.count == 2 {
        let i = index[0]  // row index in 2D array
        let j = index[1]  // column column 2D array
        let ncols = shape[1]
        idx = (i * ncols) + j
    } else {
        let k = index[0]  // depth index in 3D array
        let i = index[1]  // row index in 3D array
        let j = index[2]  // column index in 3D array
        let nrows = shape[1]
        let ncols = shape[2]
        idx = (i * ncols) + j + (k * nrows * ncols)
    }
    return idx
}

class DataBuffer<T> {
    
    var buffer: UnsafeMutableBufferPointer<T>

    init(array: [T]) {
        self.buffer = UnsafeMutableBufferPointer<T>.allocate(capacity: array.count)
        _ = self.buffer.initialize(fromContentsOf: array)
    }

    deinit {
        self.buffer.deinitialize()
        self.buffer.deallocate()
    }
}

Here is a 3D array example:

let a = [1, 2, 3, 4, 5,
         6, 7, 8, 9, 10,
         
         11, 12, 13, 14 ,15,
         16, 17, 18, 19, 20]

let b = ShapedArray(a, shape: 2, 2, 5)

print("shape is \(b.shape)")
print(b[0, 1, 4])

And here is a 4D array example where the subscript does not work:

let a = [1, 2, 3, 4,
         5, 6, 7, 8,

         9, 10, 11, 12,
         13, 14, 15, 16,

         17, 18, 19, 20,
         21, 22, 23, 24,

         25, 26, 27, 28,
         29, 30, 31, 32,

         33, 34, 35, 36,
         37, 38, 39, 40,

         41, 42, 43, 44,
         45, 46, 47, 48]

let b = ShapedArray(a, shape: 2, 3, 2, 4)

The flatIndex function works fine for 1D, 2D, 3D arrays but how do I adapt it to handle higher dimensions like the 4D example shown above?

jrose · August 6, 2024, 4:20pm

Here is a rearrangement that might help:

(i * ncols) + j + (k * nrows * ncols)
(k * nrows * ncols) + (i * ncols) + j
((k) * nrows + i) * ncols + j

((index[0]) * shape[1] + index[1]) * shape[2] + index[2]

wigging · August 6, 2024, 5:03pm

In the example, i is the row index, j is the column index, and k is the depth index for a 3D array. The array is indexed as row-major order. So basically, the flat index formula is:

flat index = j + (i * ncols) + (k * ncols * nrows)

Are you suggesting that a 4D flat index would be something like what is shown below? The s would be the index for the fourth dimension.

flat index = j + (i * ncols) + (k * ncols * nrows) + (s * ncols * nrows)

Slava_Pestov · August 6, 2024, 5:04pm

If you call this function with an invalid shape.count, it behaves as if the count was 3. This might cause confusion for users later. Good coding style says that it is better to assert and crash deterministically, instead of silently propagating invalid values downstream.

However, if you look at your implementation of the 1, 2 and 3 cases, you will see that it generalizes to any dimension if you introduce a loop with an accumulator value.

Also, try to figure out the behavior when shape.count == 0. It turns out that this is a totally reasonable case. How many elements does such an "array" hold?

wigging · August 6, 2024, 6:23pm

This accumulates the k term and works for 2D and 3D. Not sure how to handle the other multiplication factors for 4D and higher. Need to replace (k * nrows * ncols) with some kind of loop too.

func flatIndex(indices: [Int], shape: [Int]) -> Int {
    let n = indices.count
    let i = indices[n - 2]    // row index in 2D array
    let j = indices[n - 1]    // column index in 2D array
    let nrows = shape[n - 2]  // total number of rows in 2D array
    let ncols = shape[n - 1]  // total number of columns in 2D array

    var k = 0

    for i in 0 ..< n - 2 {
        k += indices[i]
    }

    let idx = j + (i * ncols) + (k * nrows * ncols)
    return idx
}

Slava_Pestov · August 6, 2024, 6:32pm

You can start by eliminating j/j/nrows/ncols and replacing their usages with array subscripting, and then try to generalize the array subscripting into a loop.

wigging · August 6, 2024, 8:59pm

How about this? It works on 2D, 3D, and 4D so I assume it works for higher dimensions too. I haven't checked it with a 1D array yet.

func flatIndex(indices: [Int], shape: [Int]) -> Int {
    var index = 0
    var stride = 1
    for i in (0..<shape.count).reversed() {
        index += indices[i] * stride
        stride *= shape[i]
    }
    return index
}

Slava_Pestov · August 6, 2024, 9:19pm

Looks good to me. Now, what happens if shape and indices are both empty? :-)

wigging · August 6, 2024, 11:25pm

The flatIndex function is used within the subscript method of the ShapedArray struct shown in the original post. I don't think you can provide an empty subscript so I'm not worried about an empty shape or indices reaching the function.

Slava_Pestov · August 6, 2024, 11:29pm

You can declare a subscript with no parameters, or call a variadic subscript with no arguments:

struct S1 {
  subscript() -> Int { return 0 }
}

let s1 = S1()
print(s1[])

struct S2 {
  subscript(_: Int...) -> Int { return 0 }
}

let s2 = S2()
print(s2[])

Quedlinbug · August 6, 2024, 11:32pm

It would really help if Shape could be a parameter pack.

You can force that into the type …

typealias Each<Value, each Placeholder> = Value

struct ShapedArray<T, each Index> {
  typealias Repeat<Value> = (repeat Each<Value, each Index>)
  let shape: Repeat<Int>
  init(shape: Repeat<Int>) { self.shape = shape }
}

let shape = (2, 2, 5)
let shapedArray = ShapedArray<_, Never, Never, Never>(shape: shape)
#expect(shapedArray.shape == shape) ✅

…but there's not much you can do with it that won't crash the compiler.

wigging · August 7, 2024, 12:06am

For something like:

struct S1 {
    subscript() -> Int { return 0 }
}

Why would you define a subscript with no parameters?

For the variadic subscript:

struct S2 {
    subscript(_: Int...) -> Int { return 0 }
}

Can you just put a precondition statement that ensures one or more subscript values are provided (otherwise an error is thrown)?

Quedlinbug · August 7, 2024, 1:16am

It is no more valid to use a nonzero dimension of indices that doesn't match the shape than it is to use no index at all.

Slava_Pestov · August 7, 2024, 1:31am

It’s simpler to just avoid the special case.

A zero-dimensional array has shape [], and the number of elements in a multi-dimensional array is the product of the dimensions.

The product of an empty array is 1, so a zero-dimensional array has exactly one element; It’s at index [] and flat index 0.