Add average to FloatingPoint arrays


(Louis D'hauwe) #1

Swift currently does not provide a built-in way to get the average of an array.
I propose to add an average property on Array where Element is FloatingPoint.

The reasoning behind only adding this to FloatingPoint arrays is that the type of the result can then be the same as the Element of the array. Adding an average property to an Int Array would make the type of average ambiguous (and returning the average as an Int seems kinda weird to me since that might lose precision).

Since the implementation of average naturally uses the sum of the array, it might make sense to also add a sum property, although that might be out of scope for this pitch.

Usage:

let numbers = [1.0, 2.0, 3.0] // inferred to be of type [Double]
let average = numbers.average // average is also Double
print(average) // prints 2.0

Implementation:

extension Array where Element: FloatingPoint {
	
	var sum: Element {
		return reduce(0, +)
	}

	var average: Element {
		guard !isEmpty else {
			return 0
		}
		return sum / Element(count)
	}

}

If this pitch is well received, I’d be willing to write up an official proposal.


(Adam Kemp) #2

The average of an empty array isn’t 0. It’s undefined. I think a function like this should either give an error for an empty array or return an optional.


(Louis D'hauwe) #3

Good point! I think returning an optional makes the most sense.
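A minimal sketch of the optional-returning version, assuming the pitch's Array extension stays otherwise unchanged. average becomes Element?, so callers have to handle the empty case explicitly:

```swift
extension Array where Element: FloatingPoint {
    // Sum of all elements; 0 for an empty array.
    var sum: Element {
        return reduce(0, +)
    }

    // nil for an empty array, because the mean of no values is undefined.
    var average: Element? {
        guard !isEmpty else { return nil }
        return sum / Element(count)
    }
}

let numbers = [1.0, 2.0, 3.0]
if let mean = numbers.average {
    print(mean) // prints 2.0
}
print([Double]().average == nil) // prints true
```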


#4

Some thoughts:

• Adding the numbers and dividing by the count is notoriously inaccurate for long arrays: once the running sum gets large, the low-order bits of each newly added number are rounded away.

• There is a method known as compensated summation that avoids these errors.

• There is also an algorithm for calculating both the average and the variance in a single pass, which can work for any Sequence, not just Array.

• Frameworks like Accelerate have fine-tuned functions for things like this.

• Personally, in several of my projects I have a Statistic struct, which takes in a sequence of numbers and stores the min, max, mean, and variance.

• For small arrays and everyday usage, it is easy to write, e.g., x.reduce(0, +) / Double(x.count). Unless you are doing serious numerical work, that is good enough for most purposes.

It would certainly be convenient to write x.average(); however, I am not convinced it belongs in the standard library.
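To make the compensated-summation point concrete, here is a sketch of Kahan summation applied to the mean. The method name kahanAverage() is hypothetical and chosen only for this example. It returns nil for an empty sequence, following Adam's suggestion, and it works on any Sequence rather than just Array:

```swift
extension Sequence where Element: FloatingPoint {
    // Hypothetical helper: mean computed with Kahan (compensated) summation.
    // Returns nil for an empty sequence.
    func kahanAverage() -> Element? {
        var sum: Element = 0
        var compensation: Element = 0   // negated low-order bits lost so far
        var count = 0

        for value in self {
            let y = value - compensation    // re-inject previously lost bits
            let t = sum + y                 // low-order bits of y may be lost here
            compensation = (t - sum) - y    // recover what was just lost
            sum = t
            count += 1
        }
        return count == 0 ? nil : sum / Element(count)
    }
}

// Compare against the naive reduce-then-divide approach.
let values = [Float](repeating: 0.1, count: 1_000_000)
let naive = values.reduce(0, +) / Float(values.count)
let compensated = values.kahanAverage()!
print(naive, compensated)
```

With a million copies of Float 0.1, the naive mean drifts noticeably while the compensated one stays close to 0.1. The compensation costs a few extra floating-point operations per element, which is usually negligible next to the accuracy gain.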


(Anthony Latsis) #5

That’s what nan is for.
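Returning NaN keeps the result non-optional. A minimal sketch, assuming the pitch's Array extension with only the empty-case value changed. For IEEE floating-point types 0 / 0 is already NaN, so the guard mainly documents intent:

```swift
extension Array where Element: FloatingPoint {
    // Returns NaN for an empty array instead of a misleading 0.
    var average: Element {
        guard !isEmpty else { return .nan }
        return reduce(0, +) / Element(count)
    }
}

let empty: [Double] = []
print(empty.average.isNaN) // prints true
```

One caveat of this design: NaN compares unequal to everything, including itself, so callers must test isNaN rather than compare against a sentinel value.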


(Sindre Sorhus) #6

Why limit it to Array? It could be useful as a Sequence extension.

Here’s what I use in my own projects:

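extension Sequence where Element: BinaryFloatingPoint {
	// Returns NaN for an empty sequence, since 0 / 0 is NaN.
	func average() -> Element {
		var count = 0              // counted as Int: a Float counter stops increasing at 2^24
		var total: Element = 0

		for value in self {
			total += value
			count += 1
		}

		return total / Element(count)
	}
}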

(Pavol Vaskovic) #7

-1 to the pitch and all code examples above that compute sum and then divide by n.

There are various statistics you might want to gather from a sequence of numbers: mean, variance, standard deviation, skewness, kurtosis.
The proposed addition of average does not meet the high bar of

Have a look at the post quoted above to see what it takes to extend the standard library.

The reduce method from the Sequence protocol is fully sufficient for implementing these properly, using the non-naive method in a single pass.
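As an illustration of doing this with plain reduce, here is a sketch using Welford's online algorithm (a standard single-pass technique; the name is not from this thread) with a labeled tuple as the accumulator. The names n, mean and m2 are illustrative:

```swift
let values: [Double] = [2, 4, 4, 4, 5, 5, 7, 9]

// Accumulator: count n, running mean, and m2 = sum of squared deviations.
let state = values.reduce(into: (n: 0.0, mean: 0.0, m2: 0.0)) { s, x in
    s.n += 1
    let delta = x - s.mean          // deviation from the old mean
    s.mean += delta / s.n           // update the mean
    s.m2 += delta * (x - s.mean)    // uses both the old and the new mean
}

let mean = state.mean
let sampleVariance = state.m2 / (state.n - 1)   // Bessel's correction
print(mean, sampleVariance)
```

The mean and variance come out of one traversal, so the same closure works for any Sequence, including ones that can only be iterated once.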

@Nevin, would you mind sharing your Statistic struct with the community?


#8

Sure, no problem. It’s pretty bare-bones:

struct Statistics<T: FloatingPoint> {
    private var mean  : T = 0
    private var ssqDev: T = 0
    
    private(set) var count: T = 0
    private(set) var min  : T = +.infinity
    private(set) var max  : T = -.infinity
    
    var average : T { return (count > 0) ? mean : .nan }
    var variance: T { return (count > 1) ? ssqDev / (count - 1) : .nan }
    var standardDeviation: T { return variance.squareRoot() }
    
    init() {}
    init<S: Sequence>(_ values: S) where S.Element == T {
        addValues(values)
    }
    
    mutating func addValues<S: Sequence>(_ values: S) where S.Element == T {
        for x in values { addValue(x) }
    }
    
    mutating func addValue(_ value: T) {
        count += 1
        min = Swift.min(min, value)
        max = Swift.max(max, value)
        
        let diff = value - mean
        let frac = diff / count
        mean    += frac
        ssqDev  += diff * (diff - frac)
    }
}

For my use cases, I needed unbiased sample variance, hence Bessel’s correction. Also, there exists an error-compensating version of the “addValue” algorithm, but I didn’t need it, so I went with the simple approach.
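The update in addValue is Welford's recurrence. Writing n for count, \bar{x} for mean, \delta for diff and S for ssqDev, each new value x_n updates the state as below; dividing by n - 1 instead of n is the Bessel's correction mentioned above:

```latex
\delta_n = x_n - \bar{x}_{n-1}, \qquad
\bar{x}_n = \bar{x}_{n-1} + \frac{\delta_n}{n}, \qquad
S_n = S_{n-1} + \delta_n \left( \delta_n - \frac{\delta_n}{n} \right) = S_{n-1} + \delta_n \, (x_n - \bar{x}_n)

s^2 = \frac{S_n}{n - 1} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2
```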

Tangentially, notice that I had to write “Swift.min()” because of SR-2450.


(Pavol Vaskovic) #9

Just to illustrate how to use this with reduce: if it didn’t provide the convenience method addValues, it would be:

let stats = seq.reduce(into: Statistics()) { $0.addValue($1) }
print(stats.average) // average, since mean is private

Right?


(Howard Lovatt) #10

I have written something very similar to what Pavol posted for my own use. It would be a great addition to the standard library, along with other common reduce structs.

Java has a load of pre-provided reduce classes (actually collectors, since collect is reduce on steroids): https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html. It would be nice to have a selection pre-written, to save everyone the trouble of writing their own (and to get high-quality implementations).


(Karl) #11

Add this to the list of threads wishing we had a good maths/stats library. It seems to be something important to many people in the community, and there are several abandoned attempts floating around GitHub (sorry), but none of them seemed to get any broader community involvement beyond the author.

Perhaps that’s the problem: maybe whoever wants to lead this effort should start with a call for participants and get the community invested from the start. I guess the more people who know about it and work on it, the less likely it is to become abandoned.


#12

From skimming, I think most of these already exist in Swift in some form except the statistical ones that are the subject of this thread. Is there something else missing?
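For reference, a rough mapping from a few java.util.stream.Collectors methods to existing Swift standard library APIs. This is a sketch using only the standard library; the word list is made-up sample data:

```swift
let words = ["apple", "avocado", "banana", "blueberry", "cherry"]

// Collectors.toSet()
let unique = Set(words)

// Collectors.joining(", ")
let joined = words.joined(separator: ", ")

// Collectors.groupingBy(w -> w.charAt(0))
let byInitial = Dictionary(grouping: words, by: { $0.first! })

// Collectors.partitioningBy(w -> w.length() > 6)
let partitioned = Dictionary(grouping: words, by: { $0.count > 6 })

// Collectors.summingInt(String::length)
let totalLength = words.reduce(0) { $0 + $1.count }

// Collectors.counting()
let count = words.count

// Collectors.averagingInt(String::length): no built-in, hence this thread.
let averageLength = Double(totalLength) / Double(count)
print(averageLength)
```

One difference: Java's partitioningBy always yields both true and false keys, while Dictionary(grouping:by:) only contains keys that actually occur.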


(Howard Lovatt) #13

I didn’t make my own code open source because that would be extra work and, as things currently stand with Swift, it would be unlikely to get much traction. I think the unlikeliness of traction for an individual’s GitHub project is multifaceted:

  1. SPM is not yet mature enough for people to commit to using, but is mature enough to put people off using Carthage, CocoaPods, etc.

  2. SPM isn’t integrated into Xcode.

  3. There is no discovery/advertising mechanism in SPM and GitHub searching isn’t great.

  4. Swift isn’t ABI stable.

  5. There is no versioning system in Swift.

  6. There is no method/process of establishing a third-party project as useful and then transferring it into the standard library or yet to be started extended library (or whatever it will be called if it ever exists).