I'm using Metal Performance Shaders to perform matrix multiplication, as shown below. I know that matrices this small are not well suited to Metal, but I'm using them so I can print the result and check the calculation.
// main.swift
import MetalPerformanceShaders
// Arrays and their rows and columns
let a: [Float] = [1, 2, 3,
                  4, 5, 6,
                  7, 8, 9]
let b: [Float] = [1, 2, 3,
                  4, 5, 6,
                  7, 8, 9]
let rowsA = 3
let columnsA = 3
let rowsB = columnsA
let columnsB = 3
let rowsC = rowsA
let columnsC = columnsB
// Setup the Metal matrices
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("Failed to get GPU (Metal device)")
}
let bufferA = device.makeBuffer(bytes: a, length: rowsA * columnsA * MemoryLayout<Float>.stride, options: [])!
let bufferB = device.makeBuffer(bytes: b, length: rowsB * columnsB * MemoryLayout<Float>.stride, options: [])!
let bufferC = device.makeBuffer(length: rowsC * columnsC * MemoryLayout<Float>.stride, options: [])!
// Use the rows:columns:rowBytes:dataType: initializer; the dimensions: variant is deprecated
let descA = MPSMatrixDescriptor(rows: rowsA, columns: columnsA, rowBytes: columnsA * MemoryLayout<Float>.stride, dataType: .float32)
let descB = MPSMatrixDescriptor(rows: rowsB, columns: columnsB, rowBytes: columnsB * MemoryLayout<Float>.stride, dataType: .float32)
let descC = MPSMatrixDescriptor(rows: rowsC, columns: columnsC, rowBytes: columnsC * MemoryLayout<Float>.stride, dataType: .float32)
let matrixA = MPSMatrix(buffer: bufferA, descriptor: descA)
let matrixB = MPSMatrix(buffer: bufferB, descriptor: descB)
let matrixC = MPSMatrix(buffer: bufferC, descriptor: descC)
// Perform matrix multiplication using Metal
let commandBuffer = device.makeCommandQueue()!.makeCommandBuffer()!
let mul = MPSMatrixMultiplication(device: device, resultRows: rowsC, resultColumns: columnsC, interiorColumns: columnsA)
mul.encode(commandBuffer: commandBuffer, leftMatrix: matrixA, rightMatrix: matrixB, resultMatrix: matrixC)
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
// Print result
let rawPointer = matrixC.data.contents()
let floatPointer = rawPointer.bindMemory(to: Float.self, capacity: rowsC * columnsC)
let bufferPointer = UnsafeBufferPointer(start: floatPointer, count: rowsC * columnsC)
let arrayC = Array(bufferPointer)
for i in 0..<matrixC.rows {
    for j in 0..<matrixC.columns {
        print(arrayC[i * matrixC.columns + j], terminator: " ")
    }
    print("")
}
This prints the following:
30.0 36.0 42.0
66.0 81.0 96.0
102.0 126.0 150.0
The Metal docs suggest using the rowBytes(forColumns:dataType:) method to determine the recommended matrix row stride, in bytes, for a given number of columns. So I changed the buffer and descriptor setup as follows:
let rowBytesA = MPSMatrixDescriptor.rowBytes(forColumns: columnsA, dataType: .float32)
let rowBytesB = MPSMatrixDescriptor.rowBytes(forColumns: columnsB, dataType: .float32)
let rowBytesC = MPSMatrixDescriptor.rowBytes(forColumns: columnsC, dataType: .float32)
let bufferA = device.makeBuffer(bytes: a, length: rowsA * rowBytesA)!
let bufferB = device.makeBuffer(bytes: b, length: rowsB * rowBytesB)!
let bufferC = device.makeBuffer(length: rowsC * rowBytesC)!
let descA = MPSMatrixDescriptor(rows: rowsA, columns: columnsA, rowBytes: rowBytesA, dataType: .float32)
let descB = MPSMatrixDescriptor(rows: rowsB, columns: columnsB, rowBytes: rowBytesB, dataType: .float32)
let descC = MPSMatrixDescriptor(rows: rowsC, columns: columnsC, rowBytes: rowBytesC, dataType: .float32)
But this gives me incorrect results, such as:
38.0 14.0 4.441763e+08
0.0 98.0 46.0
1.0364113e+09 0.0 -nan(0x1fffff)
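I suspect that the stride returned by rowBytes(forColumns:dataType:) differs from the tightly packed row size of my original arrays, so the rows in the buffers no longer line up with what the descriptors expect, and that this is what produces the wrong values. But I'm not sure about this. Does anyone have suggestions on how to use rowBytes(forColumns:dataType:) properly?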
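To make my suspicion concrete, here is a minimal sketch in plain Swift (no Metal involved) of what I imagine a padded row layout would require. The helpers padRows and unpadRows are hypothetical names of my own, not MPS API, and the 16-byte stride is only an assumed value for illustration; the real value would be whatever rowBytes(forColumns:dataType:) returns.

```swift
// Hypothetical helpers (not part of MPS): convert between a tightly packed
// row-major array and a layout whose rows are padded out to `rowBytes` bytes.

/// Copies a tightly packed row-major matrix into a zero-padded row layout.
func padRows(_ values: [Float], rows: Int, columns: Int, rowBytes: Int) -> [Float] {
    precondition(rowBytes % MemoryLayout<Float>.stride == 0, "rowBytes must be a multiple of the element size")
    let floatsPerRow = rowBytes / MemoryLayout<Float>.stride
    precondition(values.count == rows * columns, "element count does not match dimensions")
    precondition(floatsPerRow >= columns, "rowBytes is too small for the column count")

    var padded = [Float](repeating: 0, count: rows * floatsPerRow)
    for r in 0..<rows {
        for c in 0..<columns {
            // Each row starts at a multiple of the padded stride, not of `columns`.
            padded[r * floatsPerRow + c] = values[r * columns + c]
        }
    }
    return padded
}

/// Reads a padded row layout back into a tightly packed row-major array.
func unpadRows(_ padded: [Float], rows: Int, columns: Int, rowBytes: Int) -> [Float] {
    precondition(rowBytes % MemoryLayout<Float>.stride == 0, "rowBytes must be a multiple of the element size")
    let floatsPerRow = rowBytes / MemoryLayout<Float>.stride
    precondition(floatsPerRow >= columns, "rowBytes is too small for the column count")
    precondition(padded.count >= rows * floatsPerRow, "padded storage is too short")

    var values = [Float]()
    values.reserveCapacity(rows * columns)
    for r in 0..<rows {
        let start = r * floatsPerRow
        // Keep only the first `columns` entries of each padded row.
        values.append(contentsOf: padded[start..<(start + columns)])
    }
    return values
}

// Assumed stride for illustration only: 16 bytes, i.e. 4 floats per 3-column row.
let assumedRowBytes = 16
let source: [Float] = [1, 2, 3,
                       4, 5, 6,
                       7, 8, 9]
let padded = padRows(source, rows: 3, columns: 3, rowBytes: assumedRowBytes)
let restored = unpadRows(padded, rows: 3, columns: 3, rowBytes: assumedRowBytes)
print(padded)
print(restored)
```

If this is the right idea, I would presumably create each buffer from the padded array with a length of rows * rowBytes, and index the result with i * (rowBytes / MemoryLayout<Float>.stride) + j when printing, but I have not confirmed that this is the intended usage.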