# Add average to FloatingPoint arrays

Swift currently does not provide a built-in way to get the average of an array.
I propose adding an `average` property on `Array` where `Element` is `FloatingPoint`.

The reasoning behind only adding this to `FloatingPoint` arrays is that the type of the result can then be the same as the `Element` of the array. Adding an `average` property to an `Int` `Array` would make the type of `average` ambiguous (and returning the average as an `Int` seems kinda weird to me since that might lose precision).
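To make the precision concern concrete, here is a small sketch of what an `Int`-typed average would do. The `values` array is made up for illustration.

```swift
// Hypothetical example values, not taken from the pitch.
let values = [1, 2]

// Integer division truncates, so the fractional part is lost.
let intAverage = values.reduce(0, +) / values.count

// Converting to Double before dividing keeps the fraction.
let doubleAverage = Double(values.reduce(0, +)) / Double(values.count)

print(intAverage)    // prints 1
print(doubleAverage) // prints 1.5
```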

Since the implementation of average naturally uses the sum of the array, it might make sense to also add a `sum` property, although that might be out of scope for this pitch.

## Usage

```swift
let numbers = [1.0, 2.0, 3.0] // inferred to be of type [Double]
let average = numbers.average // average is also Double
print(average) // prints 2.0
```

## Implementation

```swift
extension Array where Element: FloatingPoint {

    var sum: Element {
        return reduce(0, +)
    }

    var average: Element {
        guard !isEmpty else {
            return 0
        }
        return sum / Element(count)
    }

}
```

If this pitch is well received, I’d be willing to write up an official proposal.

The average of an empty array isn’t 0. It’s undefined. I think a function like this should either give an error for an empty array or return an optional.

Good point! I think returning an optional makes the most sense.
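A minimal sketch of what the optional-returning variant might look like, assuming the same `Array` constraint as the pitch. This is only one possible shape, not an agreed design.

```swift
extension Array where Element: FloatingPoint {
    /// Sketch only: returns nil for an empty array instead of 0.
    var average: Element? {
        guard !isEmpty else { return nil }
        return reduce(0, +) / Element(count)
    }
}

let empty: [Double] = []
let numbers = [1.0, 2.0, 3.0]

if let value = numbers.average {
    print(value) // prints 2.0
}
print(empty.average == nil) // prints true
```

The caller is forced to handle the empty case explicitly, at the cost of unwrapping on every use.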

Some thoughts:

• Adding the numbers and dividing by the count is notoriously inaccurate for lengthy arrays, because once the sum gets large then the small bits of the individual numbers are lost.

• There is a method known as compensated summation that avoids these errors.

• There is also an algorithm for calculating both the average and the variance in a single pass, which can work for any `Sequence`, not just `Array`.

• Frameworks like `Accelerate` have fine-tuned functions for things like this.

• Personally, in several of my projects I have a `Statistic` struct, which takes in a sequence of numbers and stores the min, max, mean, and variance.

• For small arrays and everyday usage, it is easy to write, e.g., `x.reduce(0, +) / Double(x.count)`. Unless you are doing serious numerical work, that is good enough for most purposes.

It would certainly be convenient to write `x.average()`, but I am not convinced it belongs in the standard library.
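For reference, compensated summation is commonly implemented as Kahan summation. The following is a rough sketch of that technique. The `compensatedSum()` name is made up here, and this is not a tuned implementation.

```swift
extension Sequence where Element: FloatingPoint {
    /// Kahan (compensated) summation, as a sketch.
    func compensatedSum() -> Element {
        var sum: Element = 0
        var compensation: Element = 0 // running estimate of the rounding error

        for value in self {
            let corrected = value - compensation
            let newSum = sum + corrected
            // (newSum - sum) is what was actually added in this step;
            // subtracting `corrected` recovers the rounding error.
            compensation = (newSum - sum) - corrected
            sum = newSum
        }
        return sum
    }
}

// One large value followed by tiny ones that are each below the rounding
// threshold of the running sum (hypothetical data).
let inputs = [1.0] + [Double](repeating: 1e-16, count: 10)

let naive = inputs.reduce(0, +)
let compensated = inputs.compensatedSum()

print(naive)             // prints 1.0, every tiny addend was rounded away
print(compensated > 1.0) // prints true, the tiny addends were accumulated
```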

That’s what `nan` is for.
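As a quick illustration of that behavior: for floating-point types, dividing a zero sum by a zero count already produces `nan` without any special casing. The variable names below are for illustration only.

```swift
let emptyValues: [Double] = []

// 0.0 / 0.0 is nan under IEEE 754 rules; no trap occurs.
let emptyAverage = emptyValues.reduce(0, +) / Double(emptyValues.count)

print(emptyAverage.isNaN)       // prints true
print((emptyAverage + 1).isNaN) // prints true, nan propagates through later arithmetic
```

The trade-off compared with an optional is that nothing forces the caller to check for `nan`.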

Why limit it to `Array`? It could be useful as a `Sequence` extension.

Here’s what I use in my own projects:

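```swift
extension Sequence where Element: BinaryFloatingPoint {
    func average() -> Element {
        var i: Element = 0     // number of elements seen so far
        var total: Element = 0 // running sum

        for value in self {
            total = total + value
            i += 1
        }

        // An empty sequence gives 0 / 0, which is nan.
        return total / i
    }
}
```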

-1 to the pitch and all code examples above that compute `sum` and then divide by `n`.

There are various statistics you might want to gather from a sequence of numbers: mean, variance, standard deviation, skewness, and kurtosis.
The proposed addition of `average` does not meet the high bar of

Have a look at the above quoted post to see what it takes to extend standard library.

The `reduce` method from the `Sequence` protocol is fully sufficient for implementing these properly, using the non-naive method, in a single pass.
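As a sketch of what that could look like, the following uses `reduce(into:)` with a tuple accumulator and the usual running update for the mean and the sum of squared deviations (often called Welford's method). The `meanAndVariance` helper and its tuple labels are hypothetical, and returning `nil` for fewer than two values is an assumption.

```swift
/// Hypothetical helper: mean and unbiased sample variance in one pass.
func meanAndVariance<S: Sequence>(_ values: S) -> (mean: Double, variance: Double)?
    where S.Element == Double
{
    let result = values.reduce(into: (count: 0.0, mean: 0.0, m2: 0.0)) { acc, x in
        acc.count += 1
        let delta = x - acc.mean
        acc.mean += delta / acc.count
        // Uses the deviation from the old mean times the deviation from the new mean.
        acc.m2 += delta * (x - acc.mean)
    }
    // The sample variance is undefined for fewer than two values.
    guard result.count > 1 else { return nil }
    return (mean: result.mean, variance: result.m2 / (result.count - 1))
}

if let stats = meanAndVariance([1.0, 2.0, 3.0, 4.0]) {
    print(stats.mean)     // approximately 2.5
    print(stats.variance) // approximately 1.667
}
```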

@Nevin, would you mind sharing your `Statistic` struct with the community?

Sure, no problem. It’s pretty bare-bones:

```swift
struct Statistics<T: FloatingPoint> {
    private var mean  : T = 0 // running mean
    private var ssqDev: T = 0 // running sum of squared deviations from the mean

    private(set) var count: T = 0
    private(set) var min  : T = +.infinity
    private(set) var max  : T = -.infinity

    var average : T { return (count > 0) ? mean : .nan }
    var variance: T { return (count > 1) ? ssqDev / (count - 1) : .nan }
    var standardDeviation: T { return variance.squareRoot() }

    init() {}
    init<S: Sequence>(_ values: S) where S.Element == T {
        addValues(values)
    }

    mutating func addValues<S: Sequence>(_ values: S) where S.Element == T {
        for x in values { addValue(x) }
    }

    mutating func addValue(_ value: T) {
        count += 1
        min = Swift.min(min, value)
        max = Swift.max(max, value)

        // Single-pass update of the mean and the squared deviations.
        let diff = value - mean
        let frac = diff / count
        mean    += frac
        ssqDev  += diff * (diff - frac)
    }
}
```

For my use-cases, I needed unbiased sample variance, hence Bessel’s correction. Also, there exists an error-compensating version of the “`addValue`” algorithm, but I didn’t need it so I went with the simple approach.
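To show what Bessel's correction changes, here is a small worked example with made-up data. It uses a plain two-pass computation for readability rather than the single-pass update above.

```swift
// Hypothetical sample data.
let samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
let n = Double(samples.count)

let sampleMean = samples.reduce(0.0, +) / n // 5.0
// Sum of squared deviations from the mean: 32.0
let squaredDeviations = samples.reduce(0.0) { $0 + ($1 - sampleMean) * ($1 - sampleMean) }

let populationVariance = squaredDeviations / n   // divides by n
let sampleVariance = squaredDeviations / (n - 1) // divides by n - 1 (Bessel's correction)

print(populationVariance) // prints 4.0
print(sampleVariance)     // approximately 4.571
```

The corrected estimate is always slightly larger, and the gap shrinks as the number of samples grows.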

Tangentially, notice that I had to write “`Swift.min()`”, because of SR-2450.

Just to illustrate: if it didn’t provide the convenience method `addValues`, using this with `reduce` would be:

```swift
let stats = seq.reduce(into: Statistics()) { $0.addValue($1) }
print(stats.average)
```

Right?

I have written something very similar to Pavol for my own use. It would be a great addition to the standard library, along with other common reduce structs.

Java has a load of pre-provided reduce classes (actually collect which is reduce on steroids): https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html. It would be nice to have a selection pre-written to save the trouble of everyone writing their own (and to get high quality implementations).

Add this to the list of threads wishing we had a good maths/stats library. It seems to be something important to many people in the community, and there are several abandoned attempts floating around GitHub (sorry), but none of them seemed to get any broader community involvement beyond the author.

Perhaps that’s the problem - maybe whoever wants to lead this effort should start with a call for participants and get the community invested from the start. I guess the more people who know about it and work on it, the less likely it is to become abandoned.

From skimming, I think most of these already exist in Swift in some form except the statistical ones that are the subject of this thread. Is there something else missing?

For my own code I didn’t make it open source because that would be extra work and, as things currently stand with Swift, it would be unlikely to get much traction. I think the unlikeliness of traction for an individual’s GitHub project is multifaceted:

1. SPM is not yet mature enough for people to commit to using, but sufficiently mature to put people off using Carthage, CocoaPods, etc.

2. SPM isn’t integrated into Xcode.

3. There is no discovery/advertising mechanism in SPM and GitHub searching isn’t great.

4. Swift isn’t ABI stable.

5. There is no versioning system in Swift.

6. There is no method/process of establishing a third-party project as useful and then transferring it into the standard library or yet to be started extended library (or whatever it will be called if it ever exists).
