Simd vector types

scanon · September 18, 2018, 2:11am

If you program for macOS or iOS, you may already be familiar with the <simd/simd.h> module, which provides a set of architecture-agnostic "simd vector" and matrix types for C, Objective-C, and C++. A limited subset of that functionality is currently exposed to Swift via the simd overlay.

[A note on the terminology "simd vector": what I mean by this is a small (<= 64B) fixed-size "vector" that provides elementwise arithmetic and comparisons as well as member access and a small set of additional operations. These are not general fixed-size arrays (though they are fixed-size), nor are they precisely elements of a mathematical vector space (though they provide all of those operations). It is best to think of them as just another fundamental data type supported by modern CPU architectures. This family of types and functions makes a uniform semantics available for all target architectures.]
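To make the elementwise semantics concrete, here is a minimal sketch. It assumes the SIMD4<Float> spelling that the standard library provides in Swift 5 and later, not the Float.Vector4 naming pitched here, so treat the type name as a stand-in:

```swift
// Assumption: SIMD4<Float> (Swift 5+) stands in for the pitched Float.Vector4.
let base = SIMD4<Float>(1, 2, 3, 4)
let step = SIMD4<Float>(10, 20, 30, 40)

// Arithmetic applies lane by lane; there is no explicit loop in the source.
let total = base + step      // lanes: 11, 22, 33, 44
let scaled = base * step     // lanes: 10, 40, 90, 160

// Member access by component name or by integer index.
let third = total.z          // 33
let second = total[1]        // 22
```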

I plan to expose the full set of functionality (plus some additional goodies that are hard to do well in C) as a part of the Swift standard library. This pitch is the first step of that process.

There's a work-in-progress branch here that you're welcome to play with: https://github.com/stephentyrone/swift/tree/simd

For 5.0, I would like to implement the following two pieces:
(a) getting the full set of basic vector types into the stdlib.
(b) teaching the importer how to map between clang ext_vectors in C / C++ and Swift.

This pitch is mainly about part (a). The set of types defined is as follows:

Signed and unsigned integer vector types. These have names of the form [U]IntN.VectorM. E.g. an Int8.Vector16 is a vector of 16 Int8s. A UInt32.Vector8 is a vector of 8 UInt32s.
Floating-point vector types. These are Float.VectorM and Double.VectorM.
"Predicate" vector types. These represent masks that result from elementwise comparison operations. These have the form SIMDPredicateNxM (this is a placeholder name, but the specific name is not too consequential because you rarely use these types explicitly).

Arbitrary vector sizes are not supported with any of these types; there is support for vectors of size 2, 3, 4, 8, 16, 32, or 64, with an upper bound of 64B on the total size of the vector. This covers 99% of use cases, but sorry, no support for Vector7.
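The size rule is simple arithmetic: a lane count is available only if the lane count times the element size fits in 64 bytes. A small sketch, where supportedLaneCounts is a hypothetical helper for illustration and not part of the pitch:

```swift
// Hypothetical helper: which of the pitched lane counts fit in 64 bytes
// for a given element type? This ignores storage padding (a 3-element
// vector may occupy the space of 4), which does not change the result here.
func supportedLaneCounts<Element>(for _: Element.Type) -> [Int] {
    let pitchedCounts = [2, 3, 4, 8, 16, 32, 64]
    let maxBytes = 64
    return pitchedCounts.filter { $0 * MemoryLayout<Element>.size <= maxBytes }
}

let doubleCounts = supportedLaneCounts(for: Double.self) // [2, 3, 4, 8]
let floatCounts = supportedLaneCounts(for: Float.self)   // [2, 3, 4, 8, 16]
let byteCounts = supportedLaneCounts(for: Int8.self)     // all seven counts
```

So Double tops out at 8 lanes and Float at 16, while only 1-byte elements reach 64 lanes.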

There is also an associated protocol hierarchy; the base is SIMDVector, which conforms to Collection, Hashable, etc., and also requires some useful inits and a Predicate associatedtype. The bulk of these conformances can be defaulted. This is refined by the following three protocols:

SIMDIntegerVector: adds comparisons, leading/trailing zeros, popcount, bitwise operations, smart and masking shifts, masking arithmetic operations, ability to create random vectors.
SIMDFloatingPointVector: adds comparisons, arithmetic, rounding, ability to create random vectors.
SIMDPredicate: adds boolean operations, ability to create random vectors.
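As a rough illustration of the comparison and boolean operations listed above, the sketch below uses the spellings that shipped in the Swift 5 standard library (SIMD4, the pointwise .< operator, SIMDMask). The names pitched here, such as SIMDPredicate, differ, so read it as an assumption about the shape of the API rather than the pitched design itself:

```swift
// Assumption: Swift 5+ SIMD4 and SIMDMask stand in for the pitched types.
let samples = SIMD4<Float>(1, 5, 3, 8)
let limit = SIMD4<Float>(repeating: 4)

// Elementwise comparison yields a mask with one Boolean per lane.
let belowLimit = samples .< limit                      // true, false, true, false
let aboveTwo = samples .> SIMD4<Float>(repeating: 2)   // false, true, true, true

// Masks combine with pointwise boolean operators.
let both = belowLimit .& aboveTwo                      // false, false, true, false

// A mask selects lanes: take `samples` where it is below the limit,
// and keep `limit` everywhere else.
let clamped = limit.replacing(with: samples, where: belowLimit) // 1, 4, 3, 4
```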

There is also a set of protocols of the form SIMDVectorN, which are used to enforce constraints that some heterogeneous operations on vectors require vectors of the same size, as well as provide accessors for halves of vectors and the x, y, z, w components of 2, 3 and 4 element vectors.
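A short sketch of what the half and component accessors give you, again assuming the SIMD4 type and accessor names from the Swift 5 standard library rather than the pitched SIMDVectorN protocols:

```swift
// Assumption: SIMD4<Int32> (Swift 5+) stands in for the pitched Int32.Vector4.
let lanes = SIMD4<Int32>(10, 20, 30, 40)

// Each half of a 4-element vector is a 2-element vector.
let low = lanes.lowHalf     // 10, 20
let high = lanes.highHalf   // 30, 40
let even = lanes.evenHalf   // 10, 30 (lanes 0 and 2)
let odd = lanes.oddHalf     // 20, 40 (lanes 1 and 3)

// Named components on a 4-element vector.
let w = lanes.w             // 40
```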

There's quite a bit of other functionality implemented on the types themselves, and I'll be filling in missing details over the next week or two, but I would like to go ahead and get the pitch process started now.

This first post is pretty high-level because I don't know which specific details will interest people. Please let me know what you would like to know more about and I'll write more low-level explanations.

rxwei · September 18, 2018, 2:54am

Why are these called "predicate vectors"? "Predicate" sounds like a function, but the proposed "predicate vector" isn't a function.

ben-cohen · September 18, 2018, 3:26am

Yeah, something like "mask vector" is probably less prone to confusion.

xwu · September 18, 2018, 9:30am

Painting the shed here: I think names like Int32x4 are sufficiently well established that we can use those in Swift to good effect.

scanon · September 18, 2018, 11:24am

The only real downside to that pattern is ambiguity with longer vectors; is an Int16x8 a vector of 16 Int8s or a vector of 8 Int16s?

scanon · September 18, 2018, 11:26am

Sure, “mask” would also work.

"Predicate" is commonly used as a term of art for these (and is accurate in the sense that they represent a function from vector index to true/false), but "mask" is also used, and is maybe less confusing to people who don't come from a vector programming background.

nuclearace · September 18, 2018, 11:39am

I see Int16 as being tightly bound together, since IntX is a natural thing to a Swift programmer. So I would think it should fall out that Int16x8 is Int16, 8 times. If that isn't true for a type named like that, then yes, I would say it's misnamed.

scanon · September 18, 2018, 12:34pm

That's "obviously" the correct interpretation, but it's much, much less obvious than Int16.Vector8 is.

Torust · September 18, 2018, 12:34pm

Thanks for pushing this forward – it looks great!

One note about the naming – I personally like the longer names (the Float.VectorM etc. variants), with the reason being that any code that frequently uses these types will usually typealias their most used types to something shorter. I don't want to be writing Float.Vector4 all over my code; I'll probably use Vec4f or something similar instead, and I'm fine to define that name within my own modules.

Do you have any thoughts on adding swizzles for the small vectors? I can understand the reasons why you wouldn't – code size impact and compilation time – but they are very useful to have.

scanon · September 18, 2018, 12:46pm

Well, we're not going to provide all 336 2-4 element swizzles as vars =)

The prototype already has the four most heavily used cases in .[low,high,even,odd]Half. For more general permutes/shuffles, I'm planning to expose something like:

static func gather<Dictionary, Indices>(from dictionary: Dictionary, at indices: Indices) -> Self
where Dictionary : SIMDVector,
      Dictionary.Element == Self.Element,
      Indices : SIMDIntegerVector & SIMDVectorN

With the expectation that people who frequently need specific swizzles would then define them in their code in terms of this operation. I'm still trying out a few different versions of how this operation should be structured, so it's not available on the branch just yet (I'll update when it is).
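As a sketch of how a specific swizzle might be defined in terms of such an operation: the free function gather below is a hypothetical, fixed-size stand-in for the generic signature above, written against the SIMD4 type from the Swift 5 standard library. Wrapping out-of-range indices instead of trapping is a choice made for this example only.

```swift
// Hypothetical 4-lane gather: result[i] = table[indices[i]].
func gather<Scalar: SIMDScalar>(
    from table: SIMD4<Scalar>,
    at indices: SIMD4<Int>
) -> SIMD4<Scalar> {
    var result = table
    for lane in 0..<4 {
        // `& 3` wraps each index into 0...3 (an assumption of this sketch).
        result[lane] = table[indices[lane] & 3]
    }
    return result
}

// User code can then name the swizzles it actually needs.
extension SIMD4 {
    var wzyx: SIMD4<Scalar> { gather(from: self, at: SIMD4<Int>(3, 2, 1, 0)) }
    var xxzz: SIMD4<Scalar> { gather(from: self, at: SIMD4<Int>(0, 0, 2, 2)) }
}

let reversed = SIMD4<Float>(1, 2, 3, 4).wzyx   // 4, 3, 2, 1
```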

taylorswift · September 18, 2018, 4:00pm

will integer subscripting like vector[i] be allowed? Then swizzles like vector.zw could be written like vector[2, 3] and only up to M subscripts would have to be defined. Also I’m not sure how much benefit there is to spelling out “Vector” in the type name; could Int64.V4 or Int64.x4 be used instead?

scanon · September 18, 2018, 4:07pm

Integer subscripting is already supported (they conform to RandomAccessCollection with Index = Int). Multiple subscripts get pretty unwieldy with vectors of more than 4 elements, but it's not impossible as a solution, either.
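For reference, the multiple-subscript idea could look roughly like this as a user-side extension. This is a hypothetical sketch on the SIMD2, SIMD3, and SIMD4 types from the Swift 5 standard library, not something in the prototype:

```swift
// Hypothetical multi-index subscripts: vector[2, 3] plays the role of vector.zw.
extension SIMD4 {
    subscript(i0: Int, i1: Int) -> SIMD2<Scalar> {
        SIMD2<Scalar>(self[i0], self[i1])
    }

    subscript(i0: Int, i1: Int, i2: Int) -> SIMD3<Scalar> {
        SIMD3<Scalar>(self[i0], self[i1], self[i2])
    }
}

let color = SIMD4<Float>(0.1, 0.2, 0.3, 1.0)
let zw = color[2, 3]       // lanes 2 and 3
let zyx = color[2, 1, 0]   // lanes 2, 1, 0
```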

What does abbreviating Vector actually buy you? Swift generally favors spelling out type names explicitly.

taylorswift · September 18, 2018, 4:10pm

yes, but at least the number of subscripts grows with M instead of M^M

scanon · September 18, 2018, 4:11pm

Well, it grows like log(M)^2, since you want to be able to construct an N-element vector from an M-element vector, but yes, it's much better than M^M =)

taylorswift · September 18, 2018, 4:14pm

very little, and i wouldn’t bring this up if it was already in Swift but I don’t see what information the word Vector in the type name would bring that isn’t already obvious so I see no reason to choose a longer name over a shorter one here

taylorswift · September 18, 2018, 4:16pm

wait why is it log(M)^2 i thought you only need [i0, i1], [i0, i1, i2], ..., [i0, i1, i2, ..., i(M-1)] which has M - 1 subscripts

scanon · September 18, 2018, 4:17pm

The main reason to be somewhat explicit with Vector is because I expect we will add other such types down the road, like Float.Matrix4x4 or Float.Quaternion.

We could definitely shorten it to something like .Vec4, though, while still retaining that flexibility.

taylorswift · September 18, 2018, 4:18pm

I thought the idea was Float.VecM would be the base type and stuff like Float.Quaternion would just be a wrapper around a Float.Vec4

scanon · September 18, 2018, 4:19pm

I don't see how those implementation details affect the naming.

taylorswift · September 18, 2018, 4:41pm

i guess what i noticed in the standard library is names get shorter the lower-level you go? like it’s UInt32 not UnsignedInteger32

also, idk if this is helpful but I went through the vector.swift files in my projects to find my most used vector functionality and this is what I came up with:

enum Math<N>
{
    typealias V2 = (x:N, y:N)
    typealias V3 = (x:N, y:N, z:N)
    typealias V4 = (x:N, y:N, z:N, w:N)
    
    typealias Mat3 = (V3, V3, V3)
    typealias Mat4 = (V4, V4, V4, V4)

    static
    func copy(_ v:V2, to pointer:UnsafeMutablePointer<N>)

    static
    func copy(_ v:V3, to pointer:UnsafeMutablePointer<N>)

    static
    func copy(_ v:V4, to pointer:UnsafeMutablePointer<N>)

    static
    func copy(_ v:Mat4, to pointer:UnsafeMutablePointer<N>)

    static
    func load(from pointer:UnsafeMutablePointer<N>) -> V2

    static
    func load(from pointer:UnsafeMutablePointer<N>) -> V3

    static
    func load(from pointer:UnsafeMutablePointer<N>) -> V4
}

extension Math where N:Numeric
{ 
    static
    func sum(_ v:V2) -> N

    static
    func sum(_ v:V3) -> N

    static
    func add(_ v1:V2, _ v2:V2) -> V2

    static
    func add(_ v1:V3, _ v2:V3) -> V3

    static 
    func sub(_ v1:V2, _ v2:V2) -> V2

    static
    func sub(_ v1:V3, _ v2:V3) -> V3

    static
    func vol(_ v:V2) -> N // { return v.x * v.y }

    static
    func vol(_ v:V3) -> N // { return v.x * v.y * v.z }

    static
    func mult(_ v1:V2, _ v2:V2) -> V2

    static
    func mult(_ v1:V3, _ v2:V3) -> V3

    static
    func scale(_ v:V2, by c:N) -> V2

    static
    func scale(_ v:V3, by c:N) -> V3

    static
    func dot(_ v1:V2, _ v2:V2) -> N

    static
    func dot(_ v1:V3, _ v2:V3) -> N

    static
    func dot(_ v1:V4, _ v2:V4) -> N

    static
    func eusq(_ v:V2) -> N // { return v.x * v.x + v.y * v.y }

    static
    func eusq(_ v:V3) -> N // { return v.x * v.x + v.y * v.y + v.z * v.z }

    static
    func cross(_ v1:V2, _ v2:V2) -> N

    static
    func cross(_ v1:V3, _ v2:V3) -> V3


    static
    func mat3(from M:Mat4) -> Mat3
    
    static
    func transpose(_ M:Mat3) -> Mat3

    static
    func transpose(_ M:Mat4) -> Mat4

    static
    func mult(_ A:Mat3, _ v:V3) -> V3

    static
    func mult(_ A:Mat3, _ B:Mat3) -> Mat3

    static
    func mult(_ A:Mat4, _ v:V4) -> V4

    static
    func mult(_ A:Mat4, _ B:Mat4) -> Mat4

    static
    func homogenize(_ v:V2) -> V3 // { return (v.x, v.y, 1) }

    static
    func homogenize(_ v:V3) -> V4
}

extension Math where N:Numeric, N:Comparable 
{
    // compares magnitudes without doing .squareRoot()
    static
    func test(_ v:V2, lessThan r:N) -> Bool

    static
    func test(_ v:V3, lessThan r:N) -> Bool

    static
    func test(_ v:V2, lessEqual r:N) -> Bool

    static
    func test(_ v:V3, lessEqual r:N) -> Bool
}

extension Math where N:SignedNumeric
{
    static
    func neg(_ v:V2) -> V2 // { return (-v.x, -v.y) }

    static
    func neg(_ v:V3) -> V3
}
extension Math where N:FloatingPoint
{
    static
    func abs(_ v:V2) -> V2

    static
    func abs(_ v:V3) -> V3
   
   // another day: we should add this to scalar types too...
    static
    func clamp(_ v:N, to range:ClosedRange<N> = 0 ... 1) -> N

    static
    func clamp(_ v:V2) -> V2

    static
    func clamp(_ v:V3) -> V3
}
extension Math where N:SignedNumeric, N.Magnitude == N
{
    static
    func abs(_ v:V2) -> V2

    static
    func abs(_ v:V3) -> V3
}
extension Math where N:Comparable, N:SignedNumeric
{
    static
    func abs(_ v:V2) -> V2

    static
    func abs(_ v:V3) -> V3
}

extension Math where N:BinaryFloatingPoint
{
    static
    func cast<I>(_ v:V2, as _:I.Type) -> Math<I>.V2 where I:BinaryInteger

    static
    func cast<I>(_ v:V3, as _:I.Type) -> Math<I>.V3 where I:BinaryInteger
}
extension Math where N:BinaryInteger
{
    static
    func cast<I>(_ v:V2, as _:I.Type) -> Math<I>.V2 where I:BinaryInteger

    static
    func cast<I>(_ v:V3, as _:I.Type) -> Math<I>.V3 where I:BinaryInteger

    static
    func cast<F>(_ v:V2, as _:F.Type) -> Math<F>.V2 where F:FloatingPoint

    static
    func cast<F>(_ v:V3, as _:F.Type) -> Math<F>.V3 where F:FloatingPoint

    static
    func idiv(_ dividend:V2, by divisor:V2) -> Math<(N, N)>.V2

    static
    func idiv(_ dividend:V3, by divisor:V3) -> Math<(N, N)>.V3
}

extension Math where N:FloatingPoint
{
    static
    func reciprocal(_ v:V2) -> V2

    static
    func reciprocal(_ v:V3) -> V3

    static
    func div(_ v1:V2, _ v2:V2) -> V2

    static
    func div(_ v1:V3, _ v2:V3) -> V3

    static
    func madd(_ v1:V2, _ v2:V2, _ v3:V2) -> V2

    static
    func madd(_ v1:V3, _ v2:V3, _ v3:V3) -> V3

    static
    func scadd(_ v1:V2, _ v2:V2, _ c:N) -> V2

    static
    func scadd(_ v1:V3, _ v2:V3, _ c:N) -> V3

    // another thing that would be useful on scalar types too
    static
    func lerp(_ v1:N, _ v2:N, _ t:N) -> N

    static
    func lerp(_ v1:V2, _ v2:V2, _ t:N) -> V2

    static
    func lerp(_ v1:V3, _ v2:V3, _ t:N) -> V3

    static
    func length(_ v:V2) -> N

    static
    func length(_ v:V3) -> N

    static
    func normalize(_ v:V2) -> V2

    static
    func normalize(_ v:V3) -> V3
}
extension Math where N:BinaryFloatingPoint
{
    static
    func cast<F>(_ v:V2, as _:F.Type) -> Math<F>.V2 where F:BinaryFloatingPoint

    static
    func cast<F>(_ v:V3, as _:F.Type) -> Math<F>.V3 where F:BinaryFloatingPoint
}

extension Array
{
    mutating
    func append(vector:Math<Element>.V2)

    mutating
    func append(vector:Math<Element>.V3)
}

obviously the spelling is gonna be different depending on what people like (how do we feel about static methods?) but these are the operations i’ve found very useful when using vectors up to M = 4
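For comparison, a few of the operations listed above (dot, length, lerp) can be sketched generically over a vector type instead of once per tuple arity. This assumes the SIMD protocol and the sum() reduction from the Swift 5.1 standard library; the free-function spellings are placeholders, not a proposal:

```swift
// Dot product: elementwise multiply, then reduce.
func dot<V: SIMD>(_ a: V, _ b: V) -> V.Scalar where V.Scalar: FloatingPoint {
    (a * b).sum()
}

// Euclidean length, built on dot.
func length<V: SIMD>(_ v: V) -> V.Scalar where V.Scalar: FloatingPoint {
    dot(v, v).squareRoot()
}

// Linear interpolation between two vectors with a scalar parameter t.
func lerp<V: SIMD>(_ a: V, _ b: V, _ t: V.Scalar) -> V where V.Scalar: FloatingPoint {
    a + (b - a) * t
}

let d = dot(SIMD3<Float>(1, 2, 3), SIMD3<Float>(4, 5, 6))       // 32
let len = length(SIMD2<Double>(3, 4))                           // 5
let mid = lerp(SIMD2<Float>(0, 0), SIMD2<Float>(2, 4), 0.5)     // 1, 2
```

One definition covers every lane count, which is the main difference from the tuple-based version.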