I have code that clears a byte in a SIMD16<UInt8> value given an element index in 0 ..< 14. However, no matter how I spell the assignment (doing it in a local var and then writing the whole SIMD16<UInt8> value back into the array, or doing it through a pointer lvalue), the generated assembly seems to do silly things: it copies the entire vector onto the stack, clears the byte in memory, reloads the entire vector into a register, then writes the entire vector back to memory. For example, the pointer lvalue-based spelling below generates the assembly that follows it:
let destination:UnsafeMutablePointer<SIMD16<UInt8>> =
(buffer + base).assumingMemoryBound(to: SIMD16<UInt8>.self)
destination.pointee[i - 1] = 0
// index calculation
addq $-1, %rbx
cmpq $15, %rbx
ja .LBB4_23
// write vector to stack (vector was previously loaded into xmm1)
movdqa %xmm1, -32(%rbp)
// completely redundant index clamp
andl $15, %ebx
// clear the relevant byte in the vector in the stack
movb $0, -32(%rbp,%rbx)
// load the vector back into a register
movdqa -32(%rbp), %xmm0
// write the vector to its proper memory location
movdqa %xmm0, (%rdx,%r14)
The version written with read-mutate-writeback semantics generates exactly the same assembly:
var tags:SIMD16<UInt8> =
buffer.load(fromByteOffset: base, as: SIMD16<UInt8>.self)
tags[i - 1] = 0
buffer.storeBytes(of: tags, toByteOffset: base, as: SIMD16<UInt8>.self)
How can I get Swift to do the simple thing and just assign the relevant byte in memory, where it already lives?
Note that this is not about assigning an element of a SIMD value held in a register; I already know that's not efficient.
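For comparison, here is a minimal sketch of a spelling that bypasses the SIMD subscript and stores a single UInt8 at the lane's byte offset. The names buffer and base are taken from the snippets above; clearLane is a hypothetical helper, not an existing API. It assumes that lane n of a SIMD16<UInt8> lives at byte offset n from the start of the vector's storage, and it has not been checked against the compiler's output, so whether it actually compiles down to a single movb is something to confirm in the generated assembly.

```swift
// Hypothetical helper: clears one lane of a SIMD16<UInt8> stored in raw
// memory by writing a single zero byte, without loading the vector.
// Assumption: lane `lane` sits at byte offset `lane` within the vector.
func clearLane(_ lane: Int, ofVectorAt base: Int, in buffer: UnsafeMutableRawPointer) {
    precondition((0 ..< 16).contains(lane), "lane out of range")
    buffer.storeBytes(of: 0 as UInt8, toByteOffset: base + lane, as: UInt8.self)
}

// Usage: a 16-byte-aligned scratch buffer holding two vectors of 0xFF.
let buffer = UnsafeMutableRawPointer.allocate(byteCount: 32, alignment: 16)
buffer.storeBytes(of: SIMD16<UInt8>(repeating: 0xFF), toByteOffset: 0, as: SIMD16<UInt8>.self)
buffer.storeBytes(of: SIMD16<UInt8>(repeating: 0xFF), toByteOffset: 16, as: SIMD16<UInt8>.self)

// Clear lane 3 of the second vector, then read both vectors back to inspect them.
clearLane(3, ofVectorAt: 16, in: buffer)
let first = buffer.load(fromByteOffset: 0, as: SIMD16<UInt8>.self)
let tags = buffer.load(fromByteOffset: 16, as: SIMD16<UInt8>.self)
buffer.deallocate()
```

With the original indexing this would be called as clearLane(i - 1, ofVectorAt: base, in: buffer). The explicit precondition stands in for the bounds check that the SIMD subscript would otherwise perform.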