I finally upgraded from Intel to Apple Silicon with an M4 MacBook Air. I was curious about the performance of Accelerate on the new Mac, so I ran some benchmarks on large vector addition.
Swift example
Below is a basic Swift example that adds two arrays of doubles. The size of the arrays is defined by `n`, which is 800,000,000. The arrays `a` and `b` are initialized with repeated doubles. The for-loop adds the corresponding elements of the two arrays and stores each result in the `c` array. The first and last items in the `c` array are printed to check the result.
```swift
// Swift example
// swiftc basic.swift -Ounchecked -o build/basic && ./build/basic
func main() {
    let n = 800_000_000
    let a = Array(repeating: 2.5, count: n)
    let b = Array(repeating: 1.88, count: n)
    var c = Array(repeating: 0.0, count: n)
    for i in 0..<n {
        c[i] = a[i] + b[i]
    }
    print("first", c[0], "last", c[n-1])
}
main()
```
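For scale, each `Double` occupies 8 bytes, so with n = 800,000,000 the three arrays in these examples take up roughly 19.2 GB together (ignoring the small per-array header):

$$
n \times 8\ \text{bytes} = 800{,}000{,}000 \times 8\ \text{bytes} = 6.4\ \text{GB per array}, \qquad 3 \times 6.4\ \text{GB} = 19.2\ \text{GB} \approx 17.9\ \text{GiB}
$$

This fits within the 32 GiB of memory on the test machine. A single pass of the addition loop reads all of `a` and `b` and writes all of `c`, so it streams through that entire 19.2 GB once.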
Accelerate examples
Here is the same example using `vDSP` from the Accelerate framework.
```swift
// Accelerate vDSP example
// swiftc accel.swift -Ounchecked -o build/accel && ./build/accel
import Accelerate

func main() {
    let n = 800_000_000
    let a = Array(repeating: 2.5, count: n)
    let b = Array(repeating: 1.88, count: n)
    var c = Array(repeating: 0.0, count: n)
    vDSP.add(a, b, result: &c)
    print("first", c[0], "last", c[n-1])
}
main()
```
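To make the comparison more concrete, here is a hand-vectorized version of the same addition written with the standard library's `SIMD2<Double>` type, which holds two doubles, the same width as one 128-bit NEON register on Apple Silicon. It is only a sketch of the kind of vectorized loop that an optimizing compiler or a library such as vDSP may produce; it is not Accelerate's actual implementation, and the function name `simdAdd` is hypothetical.

```swift
// Hypothetical helper: element-wise a + b processed two lanes at a time.
// Illustrative only; this is not how vDSP is implemented internally.
func simdAdd(_ a: [Double], _ b: [Double]) -> [Double] {
    precondition(a.count == b.count, "arrays must have the same length")
    let n = a.count
    var c = [Double](repeating: 0.0, count: n)
    var i = 0
    // Main loop: add two doubles per iteration using one SIMD2 addition.
    while i + 2 <= n {
        let va = SIMD2<Double>(a[i], a[i + 1])
        let vb = SIMD2<Double>(b[i], b[i + 1])
        let vc = va + vb
        c[i] = vc[0]
        c[i + 1] = vc[1]
        i += 2
    }
    // Tail: handle the leftover element when n is odd.
    while i < n {
        c[i] = a[i] + b[i]
        i += 1
    }
    return c
}

let a = Array(repeating: 2.5, count: 9)
let b = Array(repeating: 1.88, count: 9)
let c = simdAdd(a, b)
print("first", c[0], "last", c[c.count - 1])
```

The plain `for` loop in the basic example is simple enough that the compiler may already vectorize it in a similar way under `-Ounchecked`; one way to check is to inspect the generated assembly, for example with `swiftc -Ounchecked -S basic.swift`.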
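Here is the same example using BLAS `daxpy` from the Accelerate framework. The `cblas_daxpy` function computes `y = alpha*x + y` and overwrites its `y` input with the result, so `c` starts out holding the values of `b` (1.88) and ends up holding the sum. I still create three arrays to keep the number of arrays consistent with the previous examples.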
```swift
// Accelerate BLAS example
// swiftc blas.swift -Xcc -DACCELERATE_NEW_LAPACK -Ounchecked -o build/blas && ./build/blas
import Accelerate

func main() {
    let n = 800_000_000
    let a = Array(repeating: 2.5, count: n)
    let _ = Array(repeating: 1.88, count: n)  // unused; mirrors array b in the other examples
    var c = Array(repeating: 1.88, count: n)  // daxpy's y input, overwritten with the result
    cblas_daxpy(Int32(n), 1.0, a, 1, &c, 1)   // c = 1.0 * a + c
    print("first", c[0], "last", c[n-1])
}
main()
```
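For reference, the BLAS `daxpy` operation computes y ← αx + y element by element. The plain-Swift function below, `daxpyReference`, is a hypothetical helper that sketches those semantics for the contiguous case (stride 1, matching the `1` increments passed to `cblas_daxpy` above); it does not call Accelerate. With α = 1 it produces the same values as the element-wise addition in the basic example.

```swift
// Hypothetical reference implementation of daxpy semantics for stride 1:
// y[i] = alpha * x[i] + y[i], updating y in place (as cblas_daxpy does).
func daxpyReference(alpha: Double, x: [Double], y: inout [Double]) {
    precondition(x.count == y.count, "x and y must have the same length")
    for i in 0..<x.count {
        y[i] = alpha * x[i] + y[i]
    }
}

let a = Array(repeating: 2.5, count: 5)
var c = Array(repeating: 1.88, count: 5)  // plays the role of y, as in the BLAS example
daxpyReference(alpha: 1.0, x: a, y: &c)
print("first", c[0], "last", c[c.count - 1])
```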
Benchmarks
I ran some benchmarks of the above examples using the hyperfine tool. The `Makefile` shown below was used to run the benchmarks. The benchmarks were run on the following machine:
- MacBook Air 15-inch, M4, 2025
- macOS Sequoia 15.5 arm64
- CPU: Apple M4 (10) @ 4.46 GHz
- GPU: Apple M4 (10) @ 1.47 GHz [Integrated]
- Memory: 32.00 GiB
- swift-driver version: 1.120.5 Apple Swift version 6.1.2, swiftlang-6.1.2.1.2, clang-1700.0.13.5
- Target: arm64-apple-macosx15.0
```makefile
# Makefile
benchmark:
	mkdir -p build
	swiftc basic.swift -Ounchecked -o build/basic
	swiftc accel.swift -Ounchecked -o build/accel
	swiftc blas.swift -Xcc -DACCELERATE_NEW_LAPACK -Ounchecked -o build/blas
	hyperfine --warmup 3 'build/basic' 'build/accel' 'build/blas'

clean:
	rm -rf build
```
Below are the results from the benchmarks. And yes, the benchmarks include the time to create the arrays.
```text
mkdir -p build
swiftc basic.swift -Ounchecked -o build/basic
swiftc accel.swift -Ounchecked -o build/accel
swiftc blas.swift -Xcc -DACCELERATE_NEW_LAPACK -Ounchecked -o build/blas
hyperfine --warmup 3 'build/basic' 'build/accel' 'build/blas'
Benchmark 1: build/basic
  Time (mean ± σ):      1.659 s ±  0.004 s    [User: 0.643 s, System: 1.016 s]
  Range (min … max):    1.651 s …  1.667 s    10 runs

Benchmark 2: build/accel
  Time (mean ± σ):      1.645 s ±  0.003 s    [User: 0.650 s, System: 0.994 s]
  Range (min … max):    1.640 s …  1.649 s    10 runs

Benchmark 3: build/blas
  Time (mean ± σ):      1.594 s ±  0.004 s    [User: 0.671 s, System: 0.923 s]
  Range (min … max):    1.585 s …  1.599 s    10 runs

Summary
  build/blas ran
    1.03 ± 0.00 times faster than build/accel
    1.04 ± 0.00 times faster than build/basic
```
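The ratios in the hyperfine summary follow directly from the mean times above:

$$
\frac{1.659\ \text{s}}{1.594\ \text{s}} \approx 1.04 \quad (\text{basic vs. blas}), \qquad \frac{1.645\ \text{s}}{1.594\ \text{s}} \approx 1.03 \quad (\text{accel vs. blas})
$$

So the three versions are within about 4% of each other. In all three runs, most of the wall time is reported as System time (0.923 to 1.016 s, i.e. time spent in the kernel, which likely includes mapping in the memory for the large arrays), while User time stays between 0.643 and 0.671 s.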
I also benchmarked versions of the examples that perform the addition 100 times, to make sure more time is spent on the addition than on creating the arrays (see below). Even with these changes, the benchmark timings were similar.
```swift
// Swift addition
for _ in 0..<100 {
    for i in 0..<n {
        c[i] = a[i] + b[i]
    }
}

// Accelerate addition
for _ in 0..<100 {
    vDSP.add(a, b, result: &c)
}
```
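Another way to keep array creation out of the measurement is to time only the addition inside the program. The sketch below does this for the plain Swift loop using the standard library's `ContinuousClock` and its `measure` method (available in Swift 5.7 and later; on Apple platforms it requires macOS 13 or newer). The problem size is deliberately small so the sketch runs quickly, and `timedAdd` is a hypothetical helper name; the same pattern could wrap the `vDSP.add` or `cblas_daxpy` call instead of the loop.

```swift
// Hypothetical helper: time only the element-wise addition, not the array setup.
func timedAdd(n: Int, repetitions: Int) -> (result: [Double], elapsed: Duration) {
    // Array creation happens outside the measured region.
    let a = Array(repeating: 2.5, count: n)
    let b = Array(repeating: 1.88, count: n)
    var c = Array(repeating: 0.0, count: n)

    let clock = ContinuousClock()
    let elapsed = clock.measure {
        for _ in 0..<repetitions {
            for i in 0..<n {
                c[i] = a[i] + b[i]
            }
        }
    }
    return (c, elapsed)
}

let (c, elapsed) = timedAdd(n: 1_000_000, repetitions: 10)
print("first", c[0], "last", c[c.count - 1], "addition time:", elapsed)
```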
Questions
Based on these examples, I don't see any major performance gains from Accelerate compared to plain Swift code compiled with `-Ounchecked`. I tried other large vector arithmetic operations and saw similar benchmark results. When I had my old Intel MacBook Pro, there were noticeable performance benefits with Accelerate, but I don't see these benefits on the Apple Silicon MacBook Air. So what is going on here? Is Swift code more efficient on the new M-series Macs, making the Accelerate framework unnecessary for certain operations? Are there compiler options for Accelerate that I need to use to take advantage of the Apple Silicon architecture?