The vDSP library is about 30 years old (and its direct predecessors were a decade or two older than that); optimizing compilers were not always as good as they are today. If someone were writing a vDSP-like library from scratch today, they might not include some of these operations (or they might; a function call is still a little tidier than a for loop, depending on your taste).
There's also an aspect of future-proofing. If Apple launches the M3000 processor tomorrow, with a special "add 1000 doubles at once" instruction, Accelerate will be updated to take advantage of it and your app will benefit automatically.¹ The compiler's autovectorizer sometimes lags behind what hand-tuned libraries can provide, and even once it catches up you would have to recompile your app to benefit, plus write code to detect the new instruction and fall back to a different implementation on older hardware.
¹ This example is contrived, but only a little. If, when Apple shipped the first 64-bit Intel MacBook Pro in 2007, you had compiled an app that used vDSP_vaddD (the C equivalent of your vDSP.add() call) and then never updated it, it would have run twice as fast (cycle for cycle) when Haswell shipped in 2013, and twice as fast again on an AVX-512 machine, without you ever recompiling the binary, without you selecting between implementations for different microarchitectures (or even being aware that was possible), and without users needing to do anything. That has real value.