Calculating the sum of the elements of a numeric sequence is a common task. This post is motivated by this article and a recent answer to this SO question.

A vanilla way of summing is:

```
let a = Array(1...1_000_000)
var sum = 0
var i = 0
while i < a.count {
    sum += a[i]
    i += 1
}
```

I would expect the optimizer to unroll such a basic loop on its own. But in these benchmarks, manual unrolling is quicker:

```
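// Reset the state left over from the previous loop.
i = 0
sum = 0
while i < a.count - 4 {
    // Four elements per iteration.
    sum += a[i] + a[i+1] + a[i+2] + a[i+3]
    i += 4
}
// Add the elements the unrolled loop did not reach.
for j in i..<a.count {
    sum += a[j]
}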
```

I've experimented with different unrolling factors; `4` to `6` elements seems like a sweet spot.

It clocks the same or slightly better than `reduce`:

```
sum = a.reduce(0, +)
```
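
Before comparing timings, it is worth confirming that the variants compute the same value. A minimal sketch, assuming the loops above are wrapped in functions (the names `plainSum` and `unrolledSum` are mine, purely for illustration) and checked against the closed form n(n+1)/2:

```swift
// Hypothetical helper names; the loop bodies mirror the snippets above.
func plainSum(_ a: [Int]) -> Int {
    var sum = 0
    var i = 0
    while i < a.count {
        sum += a[i]
        i += 1
    }
    return sum
}

func unrolledSum(_ a: [Int]) -> Int {
    var sum = 0
    var i = 0
    // Four elements per iteration.
    while i < a.count - 4 {
        sum += a[i] + a[i+1] + a[i+2] + a[i+3]
        i += 4
    }
    // Leftover elements that the unrolled loop did not reach.
    for j in i..<a.count {
        sum += a[j]
    }
    return sum
}

let n = 1_000_000
let values = Array(1...n)
let expected = n * (n + 1) / 2   // closed form for 1 + 2 + ... + n

print(plainSum(values) == expected)
print(unrolledSum(values) == expected)
print(values.reduce(0, +) == expected)
```

If any of the three lines printed `false`, the timing comparison would be meaningless, since the variants would not be doing the same work.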

I was wondering: why is `reduce` optimized but a normal loop isn't? Are my benchmarks valid at all? And is this related?

**Unrelated side note:**

Vectorization with `cblas_dasum` is twice as fast as the other approaches but doesn't work with integers:

```
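import Accelerate
// cblas_dasum adds absolute values, which equals the plain sum here
// because every element is positive.
var x = a.map { Double($0) }
let doubleSum = cblas_dasum(Int32(x.count), &x, 1)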
```

(Benchmarks were run on my local machine in the terminal with optimizations enabled.)
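
For reference, a minimal sketch of the kind of timing harness I have in mind, not the exact code I ran. The helper name `bestTime` and the choice of 10 runs are assumptions for illustration. It takes the best of several runs and uses the result afterwards, so the summation cannot be discarded as unused work:

```swift
import Dispatch

// Hypothetical helper: runs `body` `runs` times and returns its last result
// together with the best (smallest) wall-clock time in milliseconds.
func bestTime(runs: Int = 10, _ body: () -> Int) -> (result: Int, milliseconds: Double) {
    var best = Double.infinity
    var result = 0
    for _ in 0..<runs {
        let start = DispatchTime.now().uptimeNanoseconds
        result = body()
        let end = DispatchTime.now().uptimeNanoseconds
        best = min(best, Double(end - start) / 1_000_000)
    }
    return (result, best)
}

let data = Array(1...1_000_000)
let (total, ms) = bestTime { data.reduce(0, +) }
// Printing the result means the computation has an observable effect.
print("reduce: \(total) in \(ms) ms")
```

The same closure slot can hold the plain loop or the unrolled loop, so all variants are measured the same way. The numbers are only meaningful in an optimized build (for example `swiftc -O`).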