Pitch: Method to sum numeric arrays

igorkulman · May 5, 2019, 10:39am

Currently working on a on app with a lot of math to compute different statistics I find myself summing different arrays all the time.

Coming from C# I expected some kind of .sum() method like LINQ has, but there is nothing like that in Swift.

Current state

I currently sum the arrays using .reduce(0,+) which I do not mind by the code is not that readable

let total = expenses.map({ $0.amount }).reduce(0,+)

Bad imperative code

Another bigger problem than readability is that I encounter code like this from other developers all the time:

var sum = 0
for expense in expenses {
    sum = sum + expense.amount
}

This code might come from imperative thinking but it might also come from not knowing how to use .reduce() which can look like an intimidating method. A .sum() would be friendlier and much easier to use even to inexperienced developers.

Solution

My proposed solution is to introduce a .sum() method on collections, so

let total = expenses.map({ $0.amount }).reduce(0,+)

can be written as

let total = expenses.map({ $0.amount }).sum()

or even better

let total = expenses.sum({ $0.amount })

Implementation

The actual implementation should be easy, just two extension method on Collection in the standard library

extension Collection where Element: Numeric {
    func sum() -> Element {
        return reduce(0, +)
    }
}

extension Collection {
    func sum<T: Numeric>(_ transform: (Element) throws -> T) rethrows -> T {
        return try map(transform).sum()
    }
}

Lantua · May 5, 2019, 3:53pm

I'm fine with either adding this, or leave it as is.

Some note:

It should be on AdditiveArithmatic rather than Numeric.

Also, I think the first function should be more inline with max/min signature.

So

extension Sequence where Element: AdditiveArithmetic {
    @inlinable
    @warn_unqualified_access
    public func sum() -> Element? {
        var it = makeIterator()
        guard var result = it.next() else { return nil }
        while let e = it.next() {
            result += e
        }
        return result
    }
}

ebg · May 5, 2019, 3:53pm

Though this is just a specialized version of reduce, I think for code readability it's a worthwhile addition to the standard library. Particularly, reduce is not well understood or often leveraged the same way map and filter are, likely in part due to its non-obvious name.

There is also precedence from adding allSatisfy, which is a simple derivative of filter.

There was a proposal as well for adding a fold method or reduce overload which was well-received but stalled before getting to a pitch phase:

Nevin · May 5, 2019, 4:37pm

I agree with this.

If you’re talking about making the result Optional, I strongly disagree. The empty sum is always equal to zero.

Yes, this is definitely related to @Erica_Sadun’s previous pitch.

Here is how the different options compare at the use-site:

nums.reduce(0, +)      // Current Swift
nums.reduce(+)         // Erica’s pitch
nums.sum()             // This pitch

It think it is abundantly self-evident that “sum” is much clearer at the point of use. The code is calculating a sum.

• • •

igorkulman:

Bad imperative code

Another bigger problem than readability is that I encounter code like this from other developers all the time:
var sum = 0
for expense in expenses {
    sum = sum + expense.amount
}

That code looks perfectly fine to me, if slightly more verbose than absolutely necessary. I would not call it “bad”.

Also, the examples in the original post do not actually show the proper way to sum this today:

expenses.reduce(0){ $0 + $1.amount }

The map example shown in the original post allocates an extraneous Array, which could be avoided by first calling lazy. For that reason, I would say the loop you referred to as “bad code” is actually significantly better code than the non-lazy map version.

• • •

I am unsure how frequently a mapping sum method would be used, but if it is a commonly employed operation then I agree it would be nice to write concisely:

expenses.sum{ $0.amount }    // closure
expenses.sum(\.amount)       // key-path

Jens · May 5, 2019, 6:04pm

Perhaps the expectations on a sum() method provided by the Standard Library would be higher than just doing the obvious/naive thing?

Should it do something to reduce the numerical error when Element is floating point for example?

Seeing this code
let v = myLargeFloatArray.reduce(0, +)
it is at least clear what it does,
but seeing this:
let v = myLargeFloatArray.sum()
might suggest that some more precise summing algorithm is being used.

Should the following trap (because the particular order causes an overflow) or do the best that it can and print 7:

let a: [Int8] = [127, 8, -128]
print(a.sum())

?

scanon · May 5, 2019, 7:28pm

To expand on this a bit, there are multiple related questions lurking here, which effect both floating-point and integer:

Is the accumulation order specified? Is vectorization (which implicitly re-associates) allowed? Is the re-association allowed to be different depending on platform or runtime?
Is the accumulator width required to be the same as the element type? Can the compiler or runtime use a wider accumulator to prevent overflow? Is the compiler or runtime required to use a wider accumulator to prevent overflow?
Is the runtime required to produce a result as though computed in a single operation without intermediate overflow or rounding?

davedelong · May 5, 2019, 8:37pm

I am in favor of both sum() and sum(_:) methods. I have both in my personal frameworks and use them frequently for simple value aggregation. They are more expressive than reduce and therefore end up making my code much more readable.

For example, in a simple "sectioned datasource" model (like what might back a UITableView), I have an Array<Section>, and each section has a numberOfObjects: Int property. To get the total number of items in the tableView is this nicely expressive call:

let totalNumberOfObjects = sections.sum { $0.numberOfObjects }

I also have a dictionary of "inventory" data, where the key is a store, and the value is the number of items in-stock. Getting the total number of items in-stock at all stores is now:

let inventory: Dictionary<StoreIdentifier, Int> = ...
let totalAmount = inventory.values.sum()

These are immensely valuable methods to me and I'd love to see them get lowered into the standard library.

These are my respective implementations:

public extension Collection {
    func sum<T: Numeric>(_ value: (Element) -> T) -> T {
        var s = T.init(exactly: 0) !! "Unable to create zero of type \(T.self)"
        for item in self {
            s += value(item)
        }
        return s
    }
}

public extension Collection where Element: Numeric {
    func sum() -> Element {
        var s = Element.init(exactly: 0) !! "Unable to create zero of type \(Element.self)"
        for item in self {
            s += item
        }
        return s
    }
}

Nevin · May 5, 2019, 9:01pm

Numeric (which is your constraint) refines ExpressibleByIntegerLiteral, so you can just write 0 as T.

AdditiveArithmetic (which *should* be the constraint) provides a .zero static property, so you can write T.zero.

• • •

The point @Jens and @scanon discussed about algorithmic accuracy is interesting. For integer types, it is probably fine to initially implement in terms of reduce, then later if desired change it to avoid overflow.

For floating-point, we probably ought to use an error-correcting algorithm from the outset, perhaps compensated summation. Ideally we would guarantee that the result is correctly rounded as if it were calculated in arbitrary precision, but I don’t know offhand whether that is realistically feasible.

CTMacUser · May 5, 2019, 11:14pm

I don't think we should have the last one. We want to compose orthogonal things together, not make variants that hide orthogonal steps. Is there a case for the mapping-sum to be an improvement over a conventional calling map and sum in series? This includes using a lazy-map then sum.

Hmm, maybe we could just have the mapping-sum variant that @Nevin implied that takes only a key path.

AlexanderM · May 5, 2019, 11:21pm

If this is adopted, how do we feel about product, average? How far do we take it?

What if we namespace is through a computed property, for a syntax like array.stats.average?

scanon · May 5, 2019, 11:38pm

Why? (To be clear, I don't disagree, but I also think it's important to back up assertions like this.) There are possibly better middle ground algorithms ("superblock accumulation", "tree sum", etc), which are both more accurate than naive reduce and faster (because they allow vectorization), but are less accurate than compensated summation.

It's absolutely possible; there are a few options for how to actually do it. It requires either multiple passes or sorting or accumulator memory proportional to the exponent range of the type (hundreds of bytes for double). Any of these are significantly slower than other accumulation options, but some of them only by a constant factor. "Kulish Accumulator" is a pretty good search term if you want to learn more about the subject.

Ben_Cohen · May 5, 2019, 11:40pm

I would go further than Nevin. Not only is the loop version not bad – it is actively harmful to describe such imperative code as "bad code".

That's not to say the reduce version isn't better IMO. I would encourage all Swift users to explore using reduce to solve these problems, as once you get used to it, I think it's a much clearer idiom. But the open-coded loop version is perfectly fine, not some kind of moral failing. Swift is, unrepentantly, an imperative language.

What's more, as Nevin points out, it is the map+reduce version that is bad, because it has a hidden performance issue. Describing for loops as if they're somehow a always a mistake on the part of the programmer leads to exactly the kind of problems where developers tie themselves in knots trying to solve a problem using reduce and end up writing things that are less performant (sometimes accidentally quadratic e.g. array.reduce([],+)) and often far less readable than their loop equivalent. This is the trade-off with all these higher-level algorithms.

There is a case to be made for adding sum to the standard library but that case isn't helping save developers from writing for loops. Because summing of a particular field is so common an operation, you could perhaps add it on the grounds that it will be enough of a readability win over reduce(0,+) or reduce(0) { $0 + $1.field }. Or possibly an optimized version of sum for floating point can bring accuracy or performance benefits. But it's a fairly slim case still and in general I continue to feel that the evolution community would be much better served discussing additions to the library that solve problems that are common and hard to solve (such as, say, partial sorting) or that avoid significant correctness or performance traps rather than this kind of trivially-composable sugar.

Ben_Cohen · May 5, 2019, 11:47pm

allSatisfy was introduced in part because it is not a simple derivative of filter. Using filter + count is a common performance blunder because it both allocates a new array and fails to bail early when the first failure is hit.

Nevin · May 5, 2019, 11:56pm

I’m not sure if you’re trying to expand the pitch, or use a slippery-slope argument in opposition to it, or what.

Regardless, average doesn’t work for integer types (at least, the return type would have to be different from the element type), and I’m not sure product is used anywhere near as often as sum.

I might say that a hypothetical Math library which sits alongside the standard library but is only available if you write import Math, could accommodate product, average, variance, and the like.

But sum is probably the only one that occurs frequently enough in general code to be worth considering for the standard library.

Ben_Cohen · May 5, 2019, 11:57pm

There would need to be a stronger motivation than this. Unlike min, there is a natural answer to summing an empty sequence. If nearly all uses of sum would be written sum() ?? 0, this undermines the slim readability benefits. For those (I suspect extremely rare) cases where you want to handle the empty case differently from the non-empty but sum to zero case, there are other alternatives that don't force the optionality on all users of the method.

AlexanderM · May 6, 2019, 1:42am

Nope, just genuinely inquiring about any future directions we might have in mind, and how we can preempt them in our design.

Nevin · May 6, 2019, 1:50am

My concern is that once we bake *some* behavior into sum over FloatingPoint, it would probably be unacceptable to change the result later even to make it more accurate.

So, we need to decide up-front if we want accurate rounding, or compensated summation, or block accumulation, or naively calling reduce, or what.

igorkulman · May 6, 2019, 7:10am

The .NET implementation I use in C# that inspired me for this is really simple (take a look at referencesource/Enumerable.cs at master · microsoft/referencesource · GitHub), just a foreach

public static float Sum(this IEnumerable<float> source) {
    if (source == null) throw Error.ArgumentNull("source");
    double sum = 0;
    foreach (float v in source) sum += v;
    return (float)sum;
}

Jens · May 6, 2019, 7:53am

Note that it uses double precision for the sum variable when summing single precision floats though.

scanon · May 6, 2019, 11:20am

The question is what does it do with doubles?