SIMD move-mask operation?

Maybe I just haven’t found it yet, but there doesn’t seem to be a move-mask operation in the Swift SIMD API. Is there a way to do this in Swift? I know the operation exists because Folly’s F14 dictionary uses it.

movemask can't be defined efficiently for every architecture (of particular interest, arm64 has no analogous instruction), so if you want to use it, you have to use the intrinsic. But, good news: one of the main motivations for importing the SIMD types is that it makes vector intrinsics available in Swift!

The bad news is that Intel's intrinsic types are kinda crazy for integer vectors (there's one integer vector type of each size, rather than differentiating on element size), so a little bit of conversion noise is unavoidable (but, if you're going to do a lot of this you can define a few helper extensions to make it nicer). E.g.:

#if arch(x86_64)

import _Builtin_intrinsics.intel

func movemask(_ vec: SIMD16<Int8>) -> UInt16 {
    UInt16(truncatingIfNeeded:
      _mm_movemask_epi8(unsafeBitCast(vec, to: SIMD2<Int64>.self))
    )
}

#endif
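
For anyone unfamiliar with the operation: the byte movemask gathers the most significant bit of each of the 16 lanes into a 16-bit integer, with lane 0 landing in bit 0. A minimal portable sketch of that behavior in plain Swift follows. The name referenceMoveMask is made up for illustration, and the loop is meant as a description of the result, not as a fast replacement for the intrinsic.

```swift
/// Reference (scalar) description of a 16-lane byte movemask:
/// bit i of the result is the sign bit of lane i.
/// `referenceMoveMask` is a hypothetical name, not part of any library.
func referenceMoveMask(_ vec: SIMD16<Int8>) -> UInt16 {
    var mask: UInt16 = 0
    // A negative Int8 has its most significant bit set.
    for lane in 0..<16 where vec[lane] < 0 {
        mask |= 1 << lane
    }
    return mask
}
```

This can serve as a fallback on architectures without the instruction, or as a check against the intrinsic-based version on x86_64.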

I think there's some room for a small library that wraps the intrinsics like this and makes them a bit nicer to use for people who need finer-grained control than the more generic SIMD interface.
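
As a rough idea of what such helper extensions might look like, here is a sketch. The names moveMask and matchMask(of:) are hypothetical, and the code assumes the _Builtin_intrinsics.intel import from the example above is available in the toolchain; on other architectures it falls back to a scalar loop so that it still compiles.

```swift
#if arch(x86_64)
import _Builtin_intrinsics.intel
#endif

extension SIMD16 where Scalar == Int8 {
    /// Hypothetical helper: bit i of the result is the sign bit of lane i.
    var moveMask: UInt16 {
        #if arch(x86_64)
        // __m128i is imported as SIMD2<Int64>, hence the bit cast.
        return UInt16(truncatingIfNeeded:
            _mm_movemask_epi8(unsafeBitCast(self, to: SIMD2<Int64>.self)))
        #else
        // Scalar fallback for architectures without the instruction.
        var mask: UInt16 = 0
        for lane in 0..<16 where self[lane] < 0 { mask |= 1 << lane }
        return mask
        #endif
    }
}

extension SIMD16 where Scalar == UInt8 {
    /// Hypothetical helper: bit i is set where lane i equals `key`.
    func matchMask(of key: UInt8) -> UInt16 {
        // Turn the comparison result into lanes of 0 or -1, so the sign
        // bit of each lane records whether that lane matched.
        let lanes = SIMD16<Int8>(repeating: 0)
            .replacing(with: -1, where: self .== key)
        return lanes.moveMask
    }
}
```

With these in place, a call site reads vector.matchMask(of: key) and the bit cast stays hidden in one spot.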

Hmm, how bad would it be if I just wrote something equivalent in plain Swift, like this:

func find(_ key:UInt8, in vector:SIMD16<UInt8>, _ body:(Int) -> ())
{
    // (key: 5, vector: (1, 5, 1, 1, 5, 5, 1, 1, 1, 1, 1, 1, 5, 1, 1, 5))
    let places:SIMD16<UInt8>    = 
        .init(128, 64, 32, 16, 8, 4, 2, 1, 128, 64, 32, 16, 8, 4, 2, 1),
        match:SIMD16<UInt8>     = places.replacing(with: 0, where: vector .!= key)
    // match: ( 0, 64,  0,  0,  8,  4,  0,  0,  0,  0,  0,  0,  8,  0,  0,  1)
    let r8:SIMD8<UInt8> =    match.evenHalf |    match.oddHalf, 
        r4:SIMD4<UInt8> =       r8.evenHalf |       r8.oddHalf,
        r2:SIMD2<UInt8> =       r4.evenHalf |       r4.oddHalf
    let r:UInt16        = .init(r2.x) << 8  | .init(r2.y)
    // r: 0b0100_1100_0000_1001
    var i:Int = r.leadingZeroBitCount
    while i < 16
    {
        body(i)
        i += 1 + ((r << i) << 1).leadingZeroBitCount
    }
}

The generated assembly doesn’t look very concise, but I wonder if you have any ideas for how to improve it?
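
One variation that might be worth trying, sketched here without any measurements: order the weights least-significant-bit first, so that bit i of the result corresponds to lane i (the same bit order _mm_movemask_epi8 produces), and walk the set bits with trailingZeroBitCount while clearing the lowest set bit each time. The name findLSBFirst is made up for this sketch, and whether it produces better assembly would need to be checked on the target.

```swift
/// Calls `body` with the index of every lane of `vector` equal to `key`,
/// in ascending order. Hypothetical variant of `find` above.
func findLSBFirst(_ key: UInt8, in vector: SIMD16<UInt8>, _ body: (Int) -> ()) {
    // Lane i of each 8-lane half carries the weight 1 << i.
    let places = SIMD16<UInt8>(1, 2, 4, 8, 16, 32, 64, 128,
                               1, 2, 4, 8, 16, 32, 64, 128)
    // Keep the weight where the lane matches, zero it elsewhere.
    let match = places.replacing(with: 0, where: vector .!= key)
    // Pairwise OR reduction: 16 lanes -> 8 -> 4 -> 2.
    let r8 = match.evenHalf | match.oddHalf
    let r4 = r8.evenHalf | r8.oddHalf
    let r2 = r4.evenHalf | r4.oddHalf
    // r2.x covers lanes 0...7, r2.y covers lanes 8...15.
    var r = UInt16(r2.x) | UInt16(r2.y) << 8
    while r != 0 {
        body(r.trailingZeroBitCount)
        r &= r - 1  // clear the lowest set bit
    }
}
```

The loop body no longer needs the shift-and-recount step, since clearing the lowest set bit leaves the remaining matches in place.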