I have code that clears a byte in a `SIMD16<UInt8>` value, given an element index in `0 ..< 14`. However, no matter how I spell the assignment, the generated assembly does something silly. I have tried two spellings: doing the assignment on a local `var` and then writing the whole `SIMD16<UInt8>` value back into the array, and doing the assignment through a pointer lvalue. Either way, the compiler copies the entire vector onto the stack, clears the byte in memory, reloads the entire vector into a register, and then writes the entire vector back into memory again. For example, this pointer-lvalue-based spelling generates the following:
```swift
let destination: UnsafeMutablePointer<SIMD16<UInt8>> =
    (buffer + base).assumingMemoryBound(to: SIMD16<UInt8>.self)
destination.pointee[i - 1] = 0
```
```
// index calculation
addq    $-1, %rbx
cmpq    $15, %rbx
ja      .LBB4_23
// write vector to stack (vector was previously loaded into xmm1)
movdqa  %xmm1, -32(%rbp)
// completely redundant index clamp
andl    $15, %ebx
// clear the relevant byte of the vector on the stack
movb    $0, -32(%rbp,%rbx)
// load the vector back into a register
movdqa  -32(%rbp), %xmm0
// write the vector to its proper memory location
movdqa  %xmm0, (%rdx,%r14)
```
The version written with read-mutate-writeback semantics generates exactly the same assembly:
```swift
var tags: SIMD16<UInt8> = buffer.load(fromByteOffset: base, as: SIMD16<UInt8>.self)
tags[i - 1] = 0
buffer.storeBytes(of: tags, toByteOffset: base, as: SIMD16<UInt8>.self)
```
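If you want to reproduce this, the following is a minimal self-contained sketch that wraps both spellings in standalone functions. You can compile it with `swiftc -O -emit-assembly` and compare the output. The function names, the `(buffer, base, i)` signature, the assumption that `buffer` is an `UnsafeMutableRawPointer`, and the array-based driver are all hypothetical. Only the function bodies come from the snippets above.

```swift
// Hypothetical harness: names and signatures are assumptions; only the
// bodies mirror the two spellings from the post.

// Spelling 1: assign through a pointer lvalue.
@inline(never)
func clearTagViaPointer(_ buffer: UnsafeMutableRawPointer, base: Int, i: Int) {
    // Assumes `buffer + base` is 16-byte aligned and bound to SIMD16<UInt8>.
    let destination: UnsafeMutablePointer<SIMD16<UInt8>> =
        (buffer + base).assumingMemoryBound(to: SIMD16<UInt8>.self)
    destination.pointee[i - 1] = 0
}

// Spelling 2: load the vector, mutate a local copy, store it back.
@inline(never)
func clearTagViaCopy(_ buffer: UnsafeMutableRawPointer, base: Int, i: Int) {
    var tags: SIMD16<UInt8> = buffer.load(fromByteOffset: base, as: SIMD16<UInt8>.self)
    tags[i - 1] = 0
    buffer.storeBytes(of: tags, toByteOffset: base, as: SIMD16<UInt8>.self)
}

// Driver: two vectors filled with 0xFF, so the second one starts at byte offset 16.
var storage = [SIMD16<UInt8>](repeating: SIMD16<UInt8>(repeating: 0xFF), count: 2)
storage.withUnsafeMutableBytes { bytes in
    let raw = bytes.baseAddress!
    clearTagViaPointer(raw, base: 16, i: 3)  // clears lane 2 of storage[1]
    clearTagViaCopy(raw, base: 0, i: 14)     // clears lane 13 of storage[0]
}
```

The `@inline(never)` attributes keep each spelling in its own symbol, which makes the two versions easy to find in the emitted assembly.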
How can I get Swift to do the simple thing and just assign the relevant byte in memory, where it already lives?
Note that this is not about assigning an element of a SIMD value held in a register; I already know that's not efficient.
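At the source level, "just assign the byte" would look something like the sketch below. It bypasses the vector type entirely and stores a single `UInt8` at the lane's address with `UnsafeMutableRawPointer.storeBytes(of:toByteOffset:as:)`. The helper name is hypothetical, and the sketch assumes `buffer` is an `UnsafeMutableRawPointer`. I am also assuming that a raw one-byte store into memory bound to `SIMD16<UInt8>` is acceptable, since both types are trivial and each lane is a `UInt8`. I haven't verified that this lowers to a single `movb`, so check the assembly.

```swift
// Hypothetical helper (name and signature are assumptions): clears lane `i - 1`
// of the 16-byte vector at `buffer + base` by writing one byte, without
// loading the SIMD16<UInt8> value at all.
@inline(never)
func clearTagByte(_ buffer: UnsafeMutableRawPointer, base: Int, i: Int) {
    // Roughly mirrors the bounds check the SIMD subscript performs for lanes 0..<16.
    precondition((0 ..< 16).contains(i - 1), "lane index out of range")
    // A plain one-byte store at the lane's address.
    buffer.storeBytes(of: 0, toByteOffset: base + (i - 1), as: UInt8.self)
}

// Driver: two vectors filled with 0xFF; clear lane 13 of the second one.
var storage = [SIMD16<UInt8>](repeating: SIMD16<UInt8>(repeating: 0xFF), count: 2)
storage.withUnsafeMutableBytes { bytes in
    clearTagByte(bytes.baseAddress!, base: 16, i: 14)
}
```

One caveat: any copy of the vector that is already loaded into a register (like the `xmm1` value in the listing above) is an independent value and will not see this write.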