Use a protocol to reduce code and documentation for numeric functions

wigging · November 27, 2024, 4:55pm

I'm working on a linear algebra package that relies on Accelerate, BLAS, and LAPACK for performing vector and matrix operations. Consequently, most of the operations use functions that are only applicable to a certain value type such as Float or Double. This makes for a lot of redundant code and documentation because I need to provide documentation for each function for each specific value type. The example below demonstrates this.

Example without a protocol

Here is a generic vector structure that uses a flat array as the underlying value storage.

// Vector.swift

struct Vector<T> {

    let size: Int
    var values: [T]

    init(_ values: [T]) {
        self.size = values.count
        self.values = values
    }

    init(like vector: Self) {
        self.size = vector.size
        self.values = vector.values
    }
}

The code below uses the vector struct to perform element-wise vector addition for Int, Float, and Double values. The Accelerate vDSP.add function only supports Float and Double values so a loop is used for vectors that contain Int values. Basically, the same function has been defined for each value type that I want to support. The documentation comments for each function are also similar. This is a lot of duplicate code which generates a lot of redundant documentation. I would like to have just one function that handles the different value types and therefore one function with documentation comments.

// Addition.swift

import Accelerate

/// Add two vectors using integer values.
/// - Parameters:
///   - a: The first vector.
///   - b: The second vector.
/// - Returns: The result vector.
func add(_ a: Vector<Int>, _ b: Vector<Int>) -> Vector<Int> {
    var result = [Int](repeating: 0, count: a.size)
    for i in 0..<a.size {
        result[i] = a.values[i] + b.values[i]
    }
    return Vector(result)
}

/// Add two vectors using single-precision values.
/// - Parameters:
///   - a: The first vector.
///   - b: The second vector.
/// - Returns: The result vector.
func add(_ a: Vector<Float>, _ b: Vector<Float>) -> Vector<Float> {
    var vec = Vector(like: a)
    vDSP.add(a.values, b.values, result: &vec.values)
    return vec
}

/// Add two vectors using double-precision values.
/// - Parameters:
///   - a: The first vector.
///   - b: The second vector.
/// - Returns: The result vector.
func add(_ a: Vector<Double>, _ b: Vector<Double>) -> Vector<Double> {
    var vec = Vector(like: a)
    vDSP.add(a.values, b.values, result: &vec.values)
    return vec
}

Usage of this code is shown below along with the print output.

let a = Vector([1, 2, 3])  // integer
let b = Vector([9, 3, 4])
let c = add(a, b)
print(c)

let a1 = Vector<Float>([1, 2, 3])  // float
let b1 = Vector<Float>([9, 3, 4])
let c1 = add(a1, b1)
print(c1)

let a2 = Vector([1.9, 2, 3])  // double
let b2 = Vector([9, 3.8, 4])
let c2 = add(a2, b2)
print(c2)

Vector<Int>(size: 3, values: [10, 5, 7])
Vector<Float>(size: 3, values: [10.0, 5.0, 7.0])
Vector<Double>(size: 3, values: [10.9, 5.8, 7.0])

Example with protocol

The only solution (that I know of) to reduce redundant code and documentation is to use a protocol along with generic values. Using the same vector struct defined above, I can define a scalar protocol as follows:

// Scalar.swift

protocol Scalar {
    static func add(_ a: Vector<Self>, _ b: Vector<Self>) -> Vector<Self>
}

Using this protocol I can extend the Int, Float, and Double types with the add function for that particular type. See below for the Float and Double extensions.

// Float+Scalar.swift

import Accelerate

extension Float: Scalar {

    static func add(_ a: Vector<Self>, _ b: Vector<Self>) -> Vector<Self> {
        var vec = Vector(like: a)
        vDSP.add(a.values, b.values, result: &vec.values)
        return vec
    }
}

// Double+Scalar.swift

import Accelerate

extension Double: Scalar {

    static func add(_ a: Vector<Self>, _ b: Vector<Self>) -> Vector<Self> {
        var vec = Vector(like: a)
        vDSP.add(a.values, b.values, result: &vec.values)
        return vec
    }
}
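
The Int extension is not shown above. A minimal sketch of what it might look like, assuming it reuses the loop from the first example in place of vDSP. The file name and the size precondition are assumptions, and Vector and Scalar are repeated only so the snippet compiles on its own.

```swift
// Int+Scalar.swift (hypothetical file name)

// Repeated from above so this snippet compiles on its own.
struct Vector<T> {
    let size: Int
    var values: [T]

    init(_ values: [T]) {
        self.size = values.count
        self.values = values
    }
}

protocol Scalar {
    static func add(_ a: Vector<Self>, _ b: Vector<Self>) -> Vector<Self>
}

extension Int: Scalar {

    // As noted earlier, vDSP.add only supports Float and Double,
    // so the Int version falls back to a plain loop.
    static func add(_ a: Vector<Self>, _ b: Vector<Self>) -> Vector<Self> {
        // Assumption: mismatched sizes are treated as a programmer error.
        precondition(a.size == b.size, "Vectors must have the same size")
        var result = [Int](repeating: 0, count: a.size)
        for i in 0..<a.size {
            result[i] = a.values[i] + b.values[i]
        }
        return Vector(result)
    }
}
```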

This allows me to have one public add function and therefore one set of documentation comments which applies to all the supported types. See the add function below.

// Addition.swift

/// Element-wise addition of two vectors.
/// - Parameters:
///   - a: The first vector.
///   - b: The second vector.
/// - Returns: The result of adding two vectors.
func add<T: Scalar>(_ a: Vector<T>, _ b: Vector<T>) -> Vector<T> {
    T.add(a, b)
}

Usage of the above code and the print output is shown below.

let a = Vector<Float>([1, 2, 3, 4])
let b = Vector<Float>([4, 8, 5, 10])
let c = add(a, b)
print(c)

let a1 = Vector([1, 2, 3, 4.0])
let b1 = Vector([4, 8, 5, 10.0])
let c1 = add(a1, b1)
print(c1)

Vector<Float>(size: 4, values: [5.0, 10.0, 8.0, 14.0])
Vector<Double>(size: 4, values: [5.0, 10.0, 8.0, 14.0])

This approach requires more internal code but the amount of public functions and documentation is greatly reduced.
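
One possible way to trim some of that internal code is a default implementation in a constrained protocol extension. The sketch below is an assumption rather than part of the package: it uses the standard library's AdditiveArithmetic protocol for a generic fallback, so types without an Accelerate routine can conform with an empty extension, while Float and Double could still supply their own vDSP-based add.

```swift
// Repeated from above so this snippet compiles on its own.
struct Vector<T> {
    let size: Int
    var values: [T]

    init(_ values: [T]) {
        self.size = values.count
        self.values = values
    }
}

protocol Scalar {
    static func add(_ a: Vector<Self>, _ b: Vector<Self>) -> Vector<Self>
}

// Hypothetical fallback: any conforming type that also supports `+`
// gets an element-wise implementation for free.
extension Scalar where Self: AdditiveArithmetic {
    static func add(_ a: Vector<Self>, _ b: Vector<Self>) -> Vector<Self> {
        precondition(a.size == b.size, "Vectors must have the same size")
        return Vector(zip(a.values, b.values).map { $0 + $1 })
    }
}

// These conformances need no body because the default applies.
extension Int: Scalar {}
extension Int32: Scalar {}

/// Element-wise addition of two vectors.
func add<T: Scalar>(_ a: Vector<T>, _ b: Vector<T>) -> Vector<T> {
    T.add(a, b)
}
```

A type that declares its own static add, as the Float and Double extensions above do, takes precedence over the default.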

Questions

I have several questions related to all of this:

Is using a protocol a good approach to reduce duplicate public code and documentation for this situation?
If I don't use the protocol approach, does DocC provide any features to produce one set of documentation for functions that handle different value types?
Are there any performance issues that I should be aware of when using the protocol approach compared to the non-protocol approach?
Is there a better name other than Scalar that I can use for the protocol implemented above? I thought about calling it Numeric but Swift already has a Numeric protocol.
Does the generic element in the vector struct need to conform to the protocol? For example, should I use struct Vector<T: Scalar> { ... } instead of struct Vector<T> { ... }?
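
To make the last question concrete, here is a sketch of one alternative to constraining the struct itself: leave Vector<T> unconstrained and put the numeric operations in a conditional extension. The + operator is a hypothetical addition, and the Int conformance is simplified to keep the snippet short.

```swift
// Repeated from above so this snippet compiles on its own.
struct Vector<T> {
    let size: Int
    var values: [T]

    init(_ values: [T]) {
        self.size = values.count
        self.values = values
    }
}

protocol Scalar {
    static func add(_ a: Vector<Self>, _ b: Vector<Self>) -> Vector<Self>
}

// Simplified Int conformance, for illustration only.
extension Int: Scalar {
    static func add(_ a: Vector<Int>, _ b: Vector<Int>) -> Vector<Int> {
        precondition(a.size == b.size, "Vectors must have the same size")
        return Vector(zip(a.values, b.values).map { $0 + $1 })
    }
}

// Numeric operations exist only when the element type is a Scalar.
extension Vector where T: Scalar {
    static func + (lhs: Vector, rhs: Vector) -> Vector {
        T.add(lhs, rhs)
    }
}

let sum = Vector([1, 2, 3]) + Vector([9, 3, 4])
print(sum.values)  // prints "[10, 5, 7]"

// With an unconstrained struct, non-numeric vectors remain possible;
// they just don't get the + operator.
let labels = Vector(["x", "y", "z"])
print(labels.size)  // prints "3"
```

With struct Vector<T: Scalar> instead, the last two lines would not compile, which may or may not be desirable for the package.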

Sajjon · November 27, 2024, 8:35pm

I think generics with protocol requirements are the way to go. But just to expand on the list of alternatives:

I guess there are two more approaches, both falling under the category of “metaprogramming”:

Macros
GYB

The two solutions are quite different. Macros make the package that uses them build much more slowly, because Swift Syntax is a heavy dependency, but they are the neater and more Swifty solution.

GYB is more ancient tech and not at all Swifty (it is Python based), but it has some nice advantages. It will not really increase build time at all: you run GYB once, and it generates the Swift files, which you then commit to Git.

wigging · November 27, 2024, 9:01pm

I have never heard of GYB so I'll read up on it using the link you shared. As for macros, I haven't learned much about them since they are very new in Swift. Do you know of any examples that might guide me to applying them to this type of problem?

Sajjon · November 27, 2024, 9:35pm

GYB, which is used by Apple, is basically a simpler version of Sourcery.

I’m bad at Swift macros (but decent at Rust macros), and maybe they fit badly here. I think you might be able to do the code gen you want with Swift macros, but I’m unsure. In Rust you have two different kinds of macros: macro_rules! and proc macros. Swift macros are more like Rust’s proc macros, which are more powerful, but also more “end user facing” if you will. macro_rules! is more of a metaprogramming tool, which I imagine is what you want. You want to author and vendor the repetitive code, with max performance and with the possibility of generating code docs. That is typically what I, in Rust, would use macro_rules! for. But I can’t imagine how to do it using proc macros… i.e. I can’t imagine how to do it using Swift macros. So I think I ought to amend my two suggestions:

GYB
Sourcery

EDIT:
I still think protocols and generics are the way to go. A minor advantage of metaprogramming is that one can write slightly better documentation, IMO. And the function signatures use concrete types only, which might make them slightly simpler to grasp. But users of linear algebra packages are typically quite bright and ought to have no issues with generics…

(Apologies for the Rust talk, I’m mostly writing Rust nowadays. Let’s see if anyone great at both Swift macros and Rust macros can give some pointers on how one would solve this using Swift macros.)

Sajjon · November 28, 2024, 8:00am

Btw, I think it is worth pointing out that generics get specialized at compile time for each concrete type they are used with, if I’m not terribly mistaken, meaning that generics should not incur any runtime performance penalty.

CrystDragon · November 28, 2024, 10:46am

That's the compiler's choice, I believe.

I think specialization should generally happen when the caller and callee are in the same module. But when the invocation is cross-module, you may need attributes such as @inlinable or @alwaysEmitIntoClient. The options passed to the Swift compiler matter too.

wigging · December 1, 2024, 8:56pm

Can you elaborate on this? What is considered a module in Swift? Where in my example code that I posted above should I apply the @inlinable and @alwaysEmitIntoClient attributes?

eskimo · December 2, 2024, 11:53am

What is considered a module in Swift?

See The Swift Programming Language > Access Control > Modules, Source Files, and Packages. The key thing is that each bit of code within a module can ‘see’ any other bit of the code within the module, at least conceptually. In contrast, code in another module can only ‘see’ the interface to your module.

Where in my example code that I posted above should I apply the @inlinable …?

That’s a hard question to answer. Generally you won’t need @alwaysEmitIntoClient because that’s useful when dealing with backward compatibility stuff. Rather, you’ll likely need @inlinable and @usableFromInline. There’s a bunch of general info about those in The Swift Programming Language > Attributes.

To make sense of that you have to think like the compiler. Specifically, if the compiler is building a client module that’s separate from your module, and thus can’t see all the code in your module, what does it need to see in order to inline, and so effectively optimise, the client code?
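
As a rough illustration only, here is a sketch of how those attributes might be applied to the example from the first post, assuming the library is built as its own module. The elementwise helper is hypothetical, the Int conformance stands in for the vDSP-based ones, and whether any of this helps in practice depends on the compiler and the build settings.

```swift
// Hypothetical library module; type names follow the first post.
public struct Vector<T> {
    public let size: Int
    public var values: [T]

    public init(_ values: [T]) {
        self.size = values.count
        self.values = values
    }
}

public protocol Scalar {
    static func add(_ a: Vector<Self>, _ b: Vector<Self>) -> Vector<Self>
}

// @usableFromInline keeps the helper internal in source, but lets
// @inlinable code call it. Its body is still not exposed to clients.
@usableFromInline
internal func elementwise<T>(_ a: [T], _ b: [T], _ op: (T, T) -> T) -> [T] {
    precondition(a.count == b.count, "Vectors must have the same size")
    return zip(a, b).map(op)
}

extension Int: Scalar {
    // @inlinable exposes this body to client modules, so it may only
    // refer to public or @usableFromInline declarations.
    @inlinable
    public static func add(_ a: Vector<Int>, _ b: Vector<Int>) -> Vector<Int> {
        Vector(elementwise(a.values, b.values, +))
    }
}

/// Element-wise addition of two vectors.
// Exposing the generic body gives the compiler the option to
// specialize it for the concrete T at a client's call site.
@inlinable
public func add<T: Scalar>(_ a: Vector<T>, _ b: Vector<T>) -> Vector<T> {
    T.add(a, b)
}
```

Anything marked @inlinable becomes part of what clients compile against, so changing such a body later may not take effect until the clients are rebuilt.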

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

wigging · December 3, 2024, 1:59pm

Thank you for sharing the links. The documentation for inlinable and usableFromInline is lacking code examples of how to actually use the attributes. I'm very much a learn-by-example kind of person. Are there any code examples of these attributes that you recommend I look at to get a better understanding of how to apply them?

eskimo · December 4, 2024, 9:19am

IMO it’s always worth starting with the Swift Evolution proposal, which in this case is SE-0193 Cross-module inlining and specialization.

Beyond that, I’ve used these things myself but in a space that’s so different from yours that it’s probably not worth sharing. Regarding your space:

There are examples of this in Swift Numerics.
It wouldn’t surprise me to find examples in other popular packages that are in or adjacent to this space (big ints, crypto, and so on).

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

wigging · December 6, 2024, 5:15pm

@scanon In swift-numerics there is heavy use of protocols for defining operations on single value types, whereas here I'm discussing the use of protocols for containers like vectors, matrices, and shaped arrays of numerical values. Could you comment on using protocols for vector and matrix structs where the elements may be Float, Double, or some Complex number type?