New BLAS and LAPACK interfaces do not work well with Accelerate functions

wigging · May 8, 2024, 12:58pm

If you ignore the deprecation warnings about using the new BLAS and LAPACK interface for Accelerate, then you can use functions like cblas_cgemv as shown in the following example:

import Accelerate

let a = [DSPComplex(real: 1.0, imag: 2.0), DSPComplex(real: 3.0, imag: 4.0),
         DSPComplex(real: 5.0, imag: 6.0), DSPComplex(real: 7.0, imag: 8.0)]

let x = [DSPComplex(real: 1.0, imag: 2.0), DSPComplex(real: 3.0, imag: 4.0)]

let m: Int32 = 2
let n: Int32 = 2
let alpha = [DSPComplex(real: 1.0, imag: 0.0)]
let beta = [DSPComplex(real: 1.0, imag: 0.0)]

var y = [DSPComplex](repeating: DSPComplex(), count: Int(m))

cblas_cgemv(CblasRowMajor, CblasNoTrans, m, n, alpha, a, m, x, 1, beta, &y, 1)

To get rid of the warnings and use the new interface, you must use ACCELERATE_NEW_LAPACK=1 and ACCELERATE_LAPACK_ILP64=1 as preprocessor macros in Xcode projects or compile with the appropriate flags for swiftc. Using this new interface requires the above example to be written as shown below. You have to write several nested closures to get unsafe pointers to the variables. This feels like a downgrade compared to the previous example. Is there a cleaner way to work with the new BLAS and LAPACK interface or do we have to go back to a "pyramid of doom" approach?

import Accelerate

let a = [DSPComplex(real: 1.0, imag: 2.0), DSPComplex(real: 3.0, imag: 4.0),
         DSPComplex(real: 5.0, imag: 6.0), DSPComplex(real: 7.0, imag: 8.0)]

let x = [DSPComplex(real: 1.0, imag: 2.0), DSPComplex(real: 3.0, imag: 4.0)]

let m = 2                                       // rows in matrix A
let n = 2                                       // columns in matrix A
let alpha = [DSPComplex(real: 1.0, imag: 0.0)]  // scale factor for αAX
let beta = [DSPComplex(real: 1.0, imag: 0.0)]   // scale factor for βY

var y = [DSPComplex](repeating: DSPComplex(), count: m)

alpha.withUnsafeBufferPointer { alphaPtr in
    a.withUnsafeBufferPointer { aPtr in
        x.withUnsafeBufferPointer { xPtr in
            beta.withUnsafeBufferPointer { betaPtr in
                y.withUnsafeMutableBufferPointer { yPtr in
                    cblas_cgemv(
                        CblasRowMajor,
                        CblasNoTrans,
                        m,
                        n,
                        .init(alphaPtr.baseAddress!),
                        .init(aPtr.baseAddress!),
                        m,
                        .init(xPtr.baseAddress!),
                        1,
                        .init(betaPtr.baseAddress!),
                        .init(yPtr.baseAddress!),
                        1
                    )
                }
            }
        }
    }
}

gchilds · May 9, 2024, 9:33am

A lot of the pain in your sample comes directly from swift's Array class, so just don't use it!

Array manages a memory buffer for you, but the price is very high: you can only get access to the memory inside an awkward, poorly composable callback. Do your own memory management to get a more streamlined result:

// don't forget to .deallocate() us at some point!
let a_ = UnsafeMutablePointer<DSPComplex>.allocate(capacity: 4)
let x_ = UnsafeMutablePointer<DSPComplex>.allocate(capacity: 2)
let alpha_ = UnsafeMutablePointer<DSPComplex>.allocate(capacity: 1)
let beta_ = UnsafeMutablePointer<DSPComplex>.allocate(capacity: 1)
let y_ = UnsafeMutablePointer<DSPComplex>.allocate(capacity: Int(m))

// initialize manually or from an Array or some other way
a_[0] = DSPComplex(real: 1.0, imag: 2.0) // ...
// or
a.withUnsafeBufferPointer { aBuf in
   a_.update(from: aBuf.baseAddress!, count: aBuf.count)
}
// ...

// enjoy a [slightly ugly] one liner. I guess those `.init()`s are preferable to an `OpaquePointer()`. 
cblas_cgemv(CblasRowMajor, CblasNoTrans, m, n, .init(alpha_), .init(a_), m, .init(x_), 1, .init(beta_), .init(y_), 1)

If you're working with vectors and matrices, make a class for them and deallocate() the memory in deinit. Another advantage of abandoning Array is you will no longer need redundant initializations like this for data you know you are going to overwrite:

var y = [DSPComplex](repeating: DSPComplex(), count: Int(m))

p.s. I see the scalars alpha_ and beta_ - obviously it would be preferable to have them as plain old DSPComplexes instead of UnsafeMutablePointer<DSDPComplex> or [DSPComplex] but I can't figure out how to make it compile without a [smaller] pyramid of doom.

tera · May 9, 2024, 10:38am

In the previous thread I suggested using a wrapper cblas_cgemv_wrapper that effectively "reverts" the API to what it was prior to ACCELERATE_NEW_LAPACK=1 and ACCELERATE_LAPACK_ILP64=1. It's a bit pain to introduce such a wrapper (or a few if you need it for more than one function), but overall this seems the least painful solution. Note that the wrapper implementation doesn't need to use a pyramid of doom as it taking advantage of automatic "array to unsafe pointer function parameter" conversion done by Swift.

wigging · May 9, 2024, 3:43pm

I think the suggestion by @tera to use a wrapper function may be the cleanest approach to handle this while still using Array to define the matrix and vector. For context, you could do something like this to avoid the pyramid of doom:

import Accelerate

func cblas_cgemv_wrapper(
    _ order: CBLAS_ORDER,
    _ transpose: CBLAS_TRANSPOSE,
    _ m: Int,
    _ n: Int,
    _ alpha: UnsafeRawPointer,
    _ a: UnsafeRawPointer,
    _ lda: Int,
    _ x: UnsafeRawPointer,
    _ incX: Int,
    _ beta: UnsafeRawPointer,
    _ y: UnsafeRawPointer,
    _ incY: Int
) {
    cblas_cgemv(order, transpose, m, n, .init(alpha), .init(a), lda, .init(x), incX, .init(beta), .init(y), incY)
}

let a = [DSPComplex(real: -1.0, imag: 2.0), DSPComplex(real: 3.0, imag: 4.0),
         DSPComplex(real: -5.0, imag: 6.0), DSPComplex(real: 7.0, imag: -8.0)]

let x = [DSPComplex(real: -1.0, imag: 2.0), DSPComplex(real: 3.0, imag: 4.0)]

let m = 2
let n = 2
let alpha = [DSPComplex(real: 1.0, imag: 0.0)]
let beta = [DSPComplex(real: 1.0, imag: 0.0)]

let y = [DSPComplex](repeating: DSPComplex(real: 0.0, imag: 0.0), count: m)

cblas_cgemv_wrapper(CblasRowMajor, CblasNoTrans, m, n, alpha, a, m, x, 1, beta, y, 1)

jrose · May 9, 2024, 6:14pm

I’m confused by this, because none of Apple’s platforms are ILP64 (describing an architecture where CInt, CLong, and pointers all have the same bit width of 64). Is this just something I don’t know about Accelerate/LAPACK?

EDIT: presumably, but I’m still confused: BLAS | Apple Developer Documentation

EDIT2: the practical effect of this flag is to switch from using int as the integer type to long. So it means “pretend we’re on a platform where CInt is as big as CLong”, not “assume we’re on a platform where CInt is as big as CLong”.

wigging · May 9, 2024, 6:19pm

I was thinking about your comment regarding the use of Array. If I define a matrix type to perform matrix multiplication, then I would implement it as shown below. Notice the underlying values are stored as a generic array such as let values: [T].

import Accelerate

struct Matrix<T> {
    let rows: Int
    let columns: Int
    let values: [T]

    init(_ content: [[T]]) {
        self.rows = content.count
        self.columns = content[0].count
        self.values = content.flatMap { $0 }
    }

    init(rows: Int, columns: Int, values: [T]) {
        self.rows = rows
        self.columns = columns
        self.values = values
    }
}

func cblas_zgemm_wrapper(
    _ m: Int,
    _ n: Int,
    _ k: Int,
    _ alpha: UnsafeRawPointer,
    _ a: UnsafeRawPointer,
    _ lda: Int,
    _ b: UnsafeRawPointer,
    _ ldb: Int,
    _ beta: UnsafeRawPointer,
    _ c: UnsafeRawPointer,
    _ ldc: Int
) {
    cblas_zgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, m, k, .init(alpha), .init(a), lda, .init(b), ldb, .init(beta), .init(c), ldc)
}

extension Matrix {

    static func * (lhs: Matrix, rhs: Matrix) -> Matrix where T == Double {
        precondition(lhs.columns == rhs.rows, "Number of columns in left matrix must equal number of rows in right matrix")
        var a = lhs.values
        var b = rhs.values
        var c = [Double](repeating: 0.0, count: lhs.rows * rhs.columns)

        let m = lhs.rows     // rows in matrices A and C
        let n = rhs.columns  // columns in matrices B and C
        let k = lhs.columns  // columns in matrix A and rows in matrix B
        let alpha = 1.0
        let beta = 0.0

        // matrix multiplication where C ← αAB + βC
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, &a, k, &b, n, beta, &c, n)

        let mat = Matrix(rows: lhs.rows, columns: rhs.columns, values: c)
        return mat
    }

    static func * (lhs: Matrix, rhs: Matrix) -> Matrix where T == DSPDoubleComplex {
        precondition(lhs.columns == rhs.rows, "Number of columns in left matrix must equal number of rows in right matrix")
        let a = lhs.values
        let b = rhs.values
        let c = [DSPDoubleComplex](repeating: DSPDoubleComplex(), count: lhs.rows * rhs.columns)

        let m = lhs.rows     // rows in matrices A and C
        let n = rhs.columns  // columns in matrices B and C
        let k = lhs.columns  // columns in matrix A and rows in matrix B

        let alpha = [DSPDoubleComplex(real: 1.0, imag: 0.0)]
        let beta = [DSPDoubleComplex(real: 1.0, imag: 0.0)]

        cblas_zgemm_wrapper(m, n, k, alpha, a, k, b, n, beta, c, n)

        let mat = Matrix(rows: lhs.rows, columns: rhs.columns, values: c)
        return mat
    }
}

I would use the matrix type as shown here:

import Accelerate

let a = Matrix([[DSPDoubleComplex(real: 1.0, imag: 2.0), DSPDoubleComplex(real: 2.0, imag: 3.0)],
                [DSPDoubleComplex(real: 5.0, imag: 6.0), DSPDoubleComplex(real: 7.0, imag: 8.0)]])

let b = Matrix([[DSPDoubleComplex(real: 1.0, imag: 2.0), DSPDoubleComplex(real: 2.0, imag: 3.0)],
                [DSPDoubleComplex(real: 5.0, imag: 6.0), DSPDoubleComplex(real: 7.0, imag: 8.0)]])

let c = a * b

for i in 0..<c.rows {
    for j in 0..<c.columns {
        print(c.values[i * c.columns + j], terminator: " ")
    }
    print()
}

// This prints
// DSPDoubleComplex(real: -11.0, imag: 31.0) DSPDoubleComplex(real: -14.0, imag: 44.0)
// DSPDoubleComplex(real: -20.0, imag: 98.0) DSPDoubleComplex(real: -23.0, imag: 139.0)

So @gchilds, are you recommending that I don't use an array like let values: [T] for the underlying data storage? If you use an unsafe buffer pointer to store the values then how would you initialize the actual values of matrix A and B?

scanon · May 10, 2024, 1:34am

This has been the accepted nomenclature for LAPACK bindings with 64b indices for at least fifteen years or so. It has nothing to do with CInt/CLong, but rather whether the LAPACK integer type (which is usually a typedef in LAPACK C headers) is a 32 or 64b integer. It is not a very good name, but here we are. It's the name that developers expect.

See, e.g. the definition of lapack_int in Intel's LAPACKE interfaces: LAPACK: LAPACKE/include/lapacke_config.h Source File

ekrich · July 17, 2024, 11:22pm

I am working from an alternate world, bindings from Scala Native to C. What would be ideal, it seems even for Swift would be the addition of the LAPACKE interface to the new underlying LAPACK interface in Accelerate. This way macOS and Linux could use the same interface which is much more friendly than LAPACK with pointers to integers etc.