Because the following initializer, defined in the SceneKit overlay that you are using, is not marked @inlinable:
extension SIMD4 where Scalar == Float {
    public init(_ v: SCNVector4) {
        self.init(Float(v.x), Float(v.y), Float(v.z), Float(v.w))
    }
}
That's not a Swift performance bug; it's a library performance bug. Because the initializer's body is not visible outside the overlay, the compiler is required to generate a call to it rather than inlining the conversion.
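As a rough illustration of the difference, the sketch below shows the same kind of initializer marked @inlinable, which exposes its body to client modules so the optimizer can inline it. Vec4 is a hypothetical stand-in for SCNVector4 (so the snippet builds without SceneKit); this is not the overlay's actual source.

```swift
// Hypothetical stand-in for SCNVector4; on macOS the real type stores CGFloat fields.
public struct Vec4 {
    public var x, y, z, w: Double
    public init(x: Double, y: Double, z: Double, w: Double) {
        self.x = x; self.y = y; self.z = z; self.w = w
    }
}

extension SIMD4 where Scalar == Float {
    // @inlinable makes the body part of the module's public interface,
    // so callers in other modules can inline the conversion instead of calling it.
    @inlinable
    public init(_ v: Vec4) {
        self.init(Float(v.x), Float(v.y), Float(v.z), Float(v.w))
    }
}

// Usage: convert a Vec4 into a four-lane Float vector.
let converted = SIMD4<Float>(Vec4(x: 1, y: 2, z: 3, w: 4))
print(converted)
```

Without the attribute, a client module only sees the declaration and has to emit a function call for every conversion.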
Note that you're also not doing an apples-to-apples comparison (though this is a relatively minor detail); on the platform you're targeting, SCNVector4 is a vector of four doubles, so converting to simd_float4 and back requires an actual conversion operation, while your scalar code stays in double the whole time.
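To make the conversion point concrete, here is a small sketch that uses only the standard library's SIMD4 types. SIMD4<Double> stands in for the four doubles of SCNVector4, which is an assumption made so the snippet runs without SceneKit. It shows that narrowing to Float and widening back is a real operation that even changes the result:

```swift
// SIMD4<Double> stands in for SCNVector4 (four Doubles on this platform).
let a = SIMD4<Double>(0.1, 0.2, 0.3, 0.4)
let b = SIMD4<Double>(0.5, 0.6, 0.7, 0.8)

// Stays in Double the whole time, like the scalar code.
let direct = a + b

// Each operand is narrowed to Float, the sum is computed in Float,
// and the result is widened back to Double.
let viaFloat = SIMD4<Double>(SIMD4<Float>(a) + SIMD4<Float>(b))

print(direct)
print(viaFloat)
```

The two results differ because Float cannot represent these values to Double precision, so the SIMD path is doing extra work that the scalar path never has to do.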
Note also that, because of ABI considerations, vectorization isn't actually profitable for a stand-alone add function on SceneKit quaternions: they're defined as a struct of four CGFloat, so each component is passed in its own register (xmm0, xmm1, xmm2, xmm3, and so on). Assembling them into contiguous vector registers in order to do SIMD arithmetic is less efficient than just adding them as scalars. Nonetheless, simply making the definitions visible to the compiler produces essentially optimal code with this limitation in mind:
import SceneKit

extension simd_double4 {
    init(_ other: SCNVector4) {
        self = simd_double4(Double(other.x), Double(other.y), Double(other.z), Double(other.w))
    }

    var scnv4: SCNVector4 {
        SCNVector4(x: CGFloat(self.x), y: CGFloat(self.y), z: CGFloat(self.z), w: CGFloat(self.w))
    }
}

func add(a: SCNVector4, b: SCNVector4) -> SCNVector4 {
    (simd_double4(a) + simd_double4(b)).scnv4
}

_$s3addAA1a1bSo10SCNVector4VAE_AEtF: // add(a: SCNVector4, b: SCNVector4) -> SCNVector4
0000000100003f90 pushq %rbp
0000000100003f91 movq %rsp, %rbp
0000000100003f94 addsd %xmm4, %xmm0
0000000100003f98 addsd %xmm5, %xmm1
0000000100003f9c addsd %xmm6, %xmm2
0000000100003fa0 addsd %xmm7, %xmm3
0000000100003fa4 popq %rbp
0000000100003fa5 retq
If we directly use simd_quatf or simd_quatd instead, which are passed contiguously in SIMD registers, we get something nicer:
_$s3addAA1a1bSo10simd_quatdaAE_AEtF: // add(a: simd_quatd, b: simd_quatd) -> simd_quatd
0000000100003f90 pushq %rbp
0000000100003f91 movq %rsp, %rbp
0000000100003f94 addpd %xmm2, %xmm0
0000000100003f98 addpd %xmm3, %xmm1
0000000100003f9c popq %rbp
0000000100003f9d retq
_$s3addAA1a1bSo10simd_quatfaAE_AEtF: // add(a: simd_quatf, b: simd_quatf) -> simd_quatf
0000000100003fa0 pushq %rbp
0000000100003fa1 movq %rsp, %rbp
0000000100003fa4 addps %xmm1, %xmm0
0000000100003fa7 popq %rbp
0000000100003fa8 retq
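
For reference, the source behind these two listings would be along the following lines. The original source is not shown here, so treat this as an assumed reconstruction from the signatures in the disassembly comments; it requires an Apple platform for the simd module.

```swift
import simd

// Assumed reconstruction: a simd_quatd holds four Doubles, passed as a vector,
// so the addition can operate on packed lanes.
func add(a: simd_quatd, b: simd_quatd) -> simd_quatd {
    a + b
}

// A simd_quatf holds four Floats, which fit in a single 128-bit register.
func add(a: simd_quatf, b: simd_quatf) -> simd_quatf {
    a + b
}

// Usage: component-wise sum of two quaternions.
let qd = add(a: simd_quatd(ix: 1, iy: 2, iz: 3, r: 4),
             b: simd_quatd(ix: 4, iy: 3, iz: 2, r: 1))
let qf = add(a: simd_quatf(ix: 1, iy: 2, iz: 3, r: 4),
             b: simd_quatf(ix: 4, iy: 3, iz: 2, r: 1))
print(qd.vector, qf.vector)
```

No conversion helpers are needed in this version, since the quaternion types already store their components as a SIMD vector.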
Basically, everything you are seeing is a necessary result of how the types and operations are defined and exposed in SceneKit, rather than Swift compiler limitations. There are some very real compiler limitations around SIMD performance still, but these are not they.