Implementing generic mathematical function on Collection Types

Nevin · November 4, 2021, 3:14pm

Well, you can write your sum function against AdditiveArithmetic, but you’ll still need two separate versions of average because Double does not have an initializer that takes an AdditiveArithmetic instance.

Conversely, you could make a new protocol to encapsulate the idea of “convertible to Double”.

• • •

However, the reality of calculating an average (or even a sum) is quite a bit more intricate than you may realize.

When you sum a collection of signed integers, the partial sums could spuriously overflow even though the final result does not. For example, [100 as Int8, 100, -100, -100] clearly “should” sum to zero, but the first two elements add to 200, which overflows the Int8 type.

And when you sum a collection of floating-point numbers, significant bits could be lost due to order-of-magnitude differences. For example, [16_777_216 as Float, 1, -16_777_216] clearly “should” sum to one, but a naive calculation results in zero.

I have written more about these issues in this thread, including an implementation for integers that solves the first problem, and an implementation for floats that mitigates the second.

• • •

Furthermore, on the subject of average specifically, there exist so-called “online” algorithms for floating-point types, which don’t require calculating the full sum. A similar algorithm exists for the variance / standard deviation, and both can be done with compensated (Kahan) summation to reduce the accumulation of errors.

A while back I wrote a very simple implementation to calculate basic statistics. This version does not perform error correction, but such a change is not too difficult.