Implementing a few statistics functions in Swift


(Georgios Moschovitis) #1

Hey everyone,

I would like to implement a few statistics functions in Swift (e.g. variance, standardDeviation, etc.) that are computed over a collection.

I am aware of this library:

https://github.com/evgenyneu/SigmaSwiftStatistics

My problem is that it only supports Doubles and Arrays. Also, the API doesn’t look very ‘swifty’ to me.

I am wondering how someone would implement such functionality in a more generic way: to allow usage of multiple collections (even custom, e.g. a RingBuffer) and multiple value types (e.g. Decimal, Double). Extra points for being 'swifty'.

Thanks in advance for any ideas.

-g.


(Michael Ilseman) #2

You might find this library to be more Swifty: https://github.com/harlanhaskins/Probably

It’s not as generic as possible nor has all the features you might need, but the author is very responsive to feedback.

···

On Oct 8, 2016, at 11:29 AM, Georgios Moschovitis via swift-users <swift-users@swift.org> wrote:

I am wondering how would someone implement such functionality in a more generic way: to allow usage of multiple collections (even custom, e.g. a RingBuffer) and multiple value types (e.g. Decimal, Double). Extra points for being 'swifty'.

Thanks in advance for any ideas.

-g.
_______________________________________________
swift-users mailing list
swift-users@swift.org
https://lists.swift.org/mailman/listinfo/swift-users


(Harlan Haskins) #3

Oh yeah, I'd love contributions and feedback! I'm essentially implementing this as I learn things in stats 101 so it's probably woefully inadequate. :sweat_smile:

-- Harlan

···

On Oct 10, 2016, at 1:04 PM, Michael Ilseman <milseman@apple.com> wrote:



(Nevin Brackett-Rozinsky) #4

I rolled my own (rather simple) statistics struct. It had been using Double
and Array, but I just went back and made it work generically with
FloatingPoint and Sequence. Here’s what it looks like:

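struct Statistic<Number: FloatingPoint> {
    // Running sum of squared deviations from the current mean.
    private var ssqDev: Number = 0
    // Number of values seen so far, kept as Number so it can be used in division.
    private(set) var count: Number = 0
    private(set) var average: Number = 0
    // Caution: the signs of these two initial values are swapped as posted.
    // minimum should start at +infinity and maximum at -infinity.
    private(set) var maximum: Number = Number.infinity
    private(set) var minimum: Number = -Number.infinity

    // Sample variance (divides by count - 1).
    var variance: Number { return ssqDev / (count - 1) }
    // squareRoot() is part of FloatingPoint, so no import is needed.
    var standardDeviation: Number { return variance.squareRoot() }

    init() {}

    init<T: Sequence>(values: T) where T.Iterator.Element == Number {
        addValues(values)
    }

    mutating func addValues<T: Sequence>(_ vals: T) where T.Iterator.Element == Number {
        for val in vals { addValue(val) }
    }

    // One-pass update of the count, mean, sum of squared deviations, and extremes.
    mutating func addValue(_ value: Number) {
        count += 1 as Number
        let diff = value - average   // deviation from the previous mean
        let frac = diff / count      // amount by which the mean moves
        average += frac
        ssqDev += diff * (diff - frac)
        minimum = min(minimum, value)
        maximum = max(maximum, value)
    }
}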

(Sorry for the lack of syntax highlighting—Gmail strips the formatting when
I paste it.)
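
The update in “addValue” can be checked against the usual one-pass (Welford-style) recurrences. The symbols below are introduced here only for illustration and do not appear in the original post: x_n is the n-th value, \bar{x}_n the mean of the first n values, and S_n the sum of squared deviations from that mean. The labels on the right map each line to the code.

```latex
\begin{align*}
\delta    &= x_n - \bar{x}_{n-1}
          && \text{(diff)} \\
\bar{x}_n &= \bar{x}_{n-1} + \frac{\delta}{n}
          && \text{(average += frac)} \\
S_n       &= S_{n-1} + (x_n - \bar{x}_{n-1})(x_n - \bar{x}_n)
           = S_{n-1} + \delta\left(\delta - \frac{\delta}{n}\right)
          && \text{(ssqDev += diff * (diff - frac))} \\
s^2       &= \frac{S_n}{n - 1}
          && \text{(variance)}
\end{align*}
```

The third line uses x_n - \bar{x}_n = \delta - \delta/n, which follows directly from the second line.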

Some notes:

• The approach is to look at each data point once and keep the statistics
correct for the numbers seen so far. This saves memory if the values are
being computed or fetched, since you don’t need to store them. However it
also means that the median cannot be found.

• The calculation to update “average” and “ssqDev” is simplified from the
online algorithm
<https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm>
found on Wikipedia. (“ssqDev” stores the sum of squared deviations from the
mean, which is just the sample variance times “count - 1”.)

• If you want to ignore NaNs, just add “if value.isNaN { return }” at the
top of “addValue”.

• The “as Number” coercion shouldn’t be necessary, but I was getting an
“ambiguous use of +=” error without it.

• All occurrences of “count” were originally “n”, which was private, and I
had a computed “count” that just returned Int(n). But when I switched from
“Double” to “Number: FloatingPoint” I lost the ability to write “Int(n)”.
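
The last few notes could be combined along the following lines. This is only a sketch under stated assumptions, not the author's code: “RunningStatistic” is a hypothetical name, the count is stored as an Int and converted with “Number(count)” (relying on the “init(_: Int)” requirement of FloatingPoint), NaNs are skipped as suggested above, and returning nil for the variance of fewer than two values is a design choice of this sketch.

```swift
// Hypothetical variant of the struct above: Int count, NaN values ignored.
struct RunningStatistic<Number: FloatingPoint> {
    private var ssqDev: Number = 0                 // sum of squared deviations
    private(set) var count: Int = 0                // values accepted so far
    private(set) var average: Number = 0
    private(set) var minimum: Number = Number.infinity
    private(set) var maximum: Number = -Number.infinity

    // Sample variance; nil until at least two values have been accepted.
    var variance: Number? {
        return count > 1 ? ssqDev / Number(count - 1) : nil
    }

    mutating func addValue(_ value: Number) {
        if value.isNaN { return }                  // skip NaN entirely
        count += 1
        let diff = value - average                 // deviation from previous mean
        let frac = diff / Number(count)            // FloatingPoint.init(_: Int)
        average += frac
        ssqDev += diff * (diff - frac)
        minimum = min(minimum, value)
        maximum = max(maximum, value)
    }
}

// Made-up sample data; the NaN is not counted.
var stats = RunningStatistic<Double>()
for x in [2.0, Double.nan, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0] {
    stats.addValue(x)
}
print(stats.count, stats.minimum, stats.maximum)
print(stats.average, stats.variance ?? Double.nan)
```

For this data the mean is 5 and the sum of squared deviations is 32, so the sample variance should come out close to 32/7, up to floating-point rounding.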

Nevin

···

On Mon, Oct 10, 2016 at 1:13 PM, Harlan Haskins via swift-users <swift-users@swift.org> wrote:



(Nevin Brackett-Rozinsky) #5

…and of course I typoed the minus sign when switching things over. That
should be:

    private(set) var minimum: Number = Number.infinity
    private(set) var maximum: Number = -Number.infinity
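
The corrected starting points work because +infinity is the identity for “min” and -infinity is the identity for “max”, so the first value always replaces them. A small check, using made-up sample values and hypothetical variable names:

```swift
let samples: [Double] = [3.5, -1.0, 8.25]   // made-up sample data

// Corrected starting points: the first value always replaces them.
var minimum = Double.infinity
var maximum = -Double.infinity

// Swapped starting points, as in the earlier post: they can never change.
var stuckMinimum = -Double.infinity
var stuckMaximum = Double.infinity

for value in samples {
    minimum = min(minimum, value)
    maximum = max(maximum, value)
    stuckMinimum = min(stuckMinimum, value)
    stuckMaximum = max(stuckMaximum, value)
}

print(minimum, maximum)            // -1.0 8.25
print(stuckMinimum, stuckMaximum)  // still -infinity and +infinity
```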

Nevin

···

On Mon, Oct 10, 2016 at 4:02 PM, Nevin Brackett-Rozinsky <nevin.brackettrozinsky@gmail.com> wrote:



(Georgios Moschovitis) #6

OK, there seem to be some useful ideas in these two examples. Let me study them :slight_smile:

Thank you.