# Pitch: Method to sum numeric arrays

Currently working on a on app with a lot of math to compute different statistics I find myself summing different arrays all the time.

Coming from C# I expected some kind of `.sum()` method like LINQ has, but there is nothing like that in Swift.

## Current state

I currently sum the arrays using `.reduce(0,+)` which I do not mind by the code is not that readable

``````let total = expenses.map({ \$0.amount }).reduce(0,+)
``````

Another bigger problem than readability is that I encounter code like this from other developers all the time:

``````var sum = 0
for expense in expenses {
sum = sum + expense.amount
}
``````

This code might come from imperative thinking but it might also come from not knowing how to use `.reduce()` which can look like an intimidating method. A `.sum()` would be friendlier and much easier to use even to inexperienced developers.

## Solution

My proposed solution is to introduce a `.sum()` method on collections, so

``````let total = expenses.map({ \$0.amount }).reduce(0,+)
``````

can be written as

``````let total = expenses.map({ \$0.amount }).sum()
``````

or even better

``````let total = expenses.sum({ \$0.amount })
``````

## Implementation

The actual implementation should be easy, just two extension method on `Collection` in the standard library

``````extension Collection where Element: Numeric {
func sum() -> Element {
return reduce(0, +)
}
}

extension Collection {
func sum<T: Numeric>(_ transform: (Element) throws -> T) rethrows -> T {
return try map(transform).sum()
}
}
``````
4 Likes

I'm fine with either adding this, or leave it as is.

Some note:

It should be on `AdditiveArithmatic` rather than `Numeric`.

Also, I think the first function should be more inline with max/min signature.

So

``````extension Sequence where Element: AdditiveArithmetic {
@inlinable
@warn_unqualified_access
public func sum() -> Element? {
var it = makeIterator()
guard var result = it.next() else { return nil }
while let e = it.next() {
result += e
}
return result
}
}
``````

Though this is just a specialized version of reduce, I think for code readability it's a worthwhile addition to the standard library. Particularly, `reduce` is not well understood or often leveraged the same way `map` and `filter` are, likely in part due to its non-obvious name.

There is also precedence from adding `allSatisfy`, which is a simple derivative of `filter`.

There was a proposal as well for adding a `fold` method or `reduce` overload which was well-received but stalled before getting to a pitch phase:

1 Like

I agree with this.

I agree with this.

If youâ€™re talking about making the result `Optional`, I strongly disagree. The empty sum is always equal to zero.

Yes, this is definitely related to @Erica_Sadunâ€™s previous pitch.

Here is how the different options compare at the use-site:

``````nums.reduce(0, +)      // Current Swift
nums.reduce(+)         // Ericaâ€™s pitch
nums.sum()             // This pitch
``````

It think it is abundantly self-evident that â€ś`sum`â€ť is much clearer at the point of use. The code is calculating a sum.

â€˘ â€˘ â€˘

That code looks perfectly fine to me, if slightly more verbose than absolutely necessary. I would not call it â€śbadâ€ť.

Also, the examples in the original post do not actually show the proper way to sum this today:

``````expenses.reduce(0){ \$0 + \$1.amount }
``````

The `map` example shown in the original post allocates an extraneous `Array`, which could be avoided by first calling `lazy`. For that reason, I would say the loop you referred to as â€śbad codeâ€ť is actually significantly better code than the non-lazy `map` version.

â€˘ â€˘ â€˘

I am unsure how frequently a mapping sum method would be used, but if it is a commonly employed operation then I agree it would be nice to write concisely:

``````expenses.sum{ \$0.amount }    // closure
expenses.sum(\.amount)       // key-path
``````
6 Likes

Perhaps the expectations on a `sum()` method provided by the Standard Library would be higher than just doing the obvious/naive thing?

Should it do something to reduce the numerical error when `Element` is floating point for example?

Seeing this code
`let v = myLargeFloatArray.reduce(0, +)`
it is at least clear what it does,
but seeing this:
`let v = myLargeFloatArray.sum()`
might suggest that some more precise summing algorithm is being used.

Should the following trap (because the particular order causes an overflow) or do the best that it can and print 7:

``````let a: [Int8] = [127, 8, -128]
print(a.sum())
``````

?

13 Likes

To expand on this a bit, there are multiple related questions lurking here, which effect both floating-point and integer:

• Is the accumulation order specified? Is vectorization (which implicitly re-associates) allowed? Is the re-association allowed to be different depending on platform or runtime?
• Is the accumulator width required to be the same as the element type? Can the compiler or runtime use a wider accumulator to prevent overflow? Is the compiler or runtime required to use a wider accumulator to prevent overflow?
• Is the runtime required to produce a result as though computed in a single operation without intermediate overflow or rounding?
20 Likes

I am in favor of both `sum()` and `sum(_:)` methods. I have both in my personal frameworks and use them frequently for simple value aggregation. They are more expressive than `reduce` and therefore end up making my code much more readable.

For example, in a simple "sectioned datasource" model (like what might back a `UITableView`), I have an `Array<Section>`, and each section has a `numberOfObjects: Int` property. To get the total number of items in the tableView is this nicely expressive call:

``````let totalNumberOfObjects = sections.sum { \$0.numberOfObjects }
``````

I also have a dictionary of "inventory" data, where the key is a store, and the value is the number of items in-stock. Getting the total number of items in-stock at all stores is now:

``````let inventory: Dictionary<StoreIdentifier, Int> = ...
let totalAmount = inventory.values.sum()
``````

These are immensely valuable methods to me and I'd love to see them get lowered into the standard library.

These are my respective implementations:

``````public extension Collection {
func sum<T: Numeric>(_ value: (Element) -> T) -> T {
var s = T.init(exactly: 0) !! "Unable to create zero of type \(T.self)"
for item in self {
s += value(item)
}
return s
}
}

public extension Collection where Element: Numeric {
func sum() -> Element {
var s = Element.init(exactly: 0) !! "Unable to create zero of type \(Element.self)"
for item in self {
s += item
}
return s
}
}
``````
2 Likes

`Numeric` (which is your constraint) refines `ExpressibleByIntegerLiteral`, so you can just write `0 as T`.

`AdditiveArithmetic` (which *should* be the constraint) provides a `.zero` static property, so you can write `T.zero`.

â€˘ â€˘ â€˘

The point @Jens and @scanon discussed about algorithmic accuracy is interesting. For integer types, it is probably fine to initially implement in terms of `reduce`, then later if desired change it to avoid overflow.

For floating-point, we probably ought to use an error-correcting algorithm from the outset, perhaps compensated summation. Ideally we would guarantee that the result is correctly rounded as if it were calculated in arbitrary precision, but I donâ€™t know offhand whether that is realistically feasible.

I don't think we should have the last one. We want to compose orthogonal things together, not make variants that hide orthogonal steps. Is there a case for the mapping-sum to be an improvement over a conventional calling `map` and `sum` in series? This includes using a lazy-map then `sum`.

Hmm, maybe we could just have the mapping-sum variant that @Nevin implied that takes only a key path.

If this is adopted, how do we feel about `product`, `average`? How far do we take it?

What if we namespace is through a computed property, for a syntax like `array.stats.average`?

2 Likes

Why? (To be clear, I don't disagree, but I also think it's important to back up assertions like this.) There are possibly better middle ground algorithms ("superblock accumulation", "tree sum", etc), which are both more accurate than naive reduce and faster (because they allow vectorization), but are less accurate than compensated summation.

It's absolutely possible; there are a few options for how to actually do it. It requires either multiple passes or sorting or accumulator memory proportional to the exponent range of the type (hundreds of bytes for double). Any of these are significantly slower than other accumulation options, but some of them only by a constant factor. "Kulish Accumulator" is a pretty good search term if you want to learn more about the subject.

3 Likes

I would go further than Nevin. Not only is the loop version not bad â€“ it is actively harmful to describe such imperative code as "bad code".

That's not to say the `reduce` version isn't better IMO. I would encourage all Swift users to explore using reduce to solve these problems, as once you get used to it, I think it's a much clearer idiom. But the open-coded loop version is perfectly fine, not some kind of moral failing. Swift is, unrepentantly, an imperative language.

What's more, as Nevin points out, it is the `map`+`reduce` version that is bad, because it has a hidden performance issue. Describing `for` loops as if they're somehow a always a mistake on the part of the programmer leads to exactly the kind of problems where developers tie themselves in knots trying to solve a problem using `reduce` and end up writing things that are less performant (sometimes accidentally quadratic e.g. `array.reduce([],+)`) and often far less readable than their loop equivalent. This is the trade-off with all these higher-level algorithms.

There is a case to be made for adding `sum` to the standard library but that case isn't helping save developers from writing `for` loops. Because summing of a particular field is so common an operation, you could perhaps add it on the grounds that it will be enough of a readability win over `reduce(0,+)` or `reduce(0) { \$0 + \$1.field }`. Or possibly an optimized version of `sum` for floating point can bring accuracy or performance benefits. But it's a fairly slim case still and in general I continue to feel that the evolution community would be much better served discussing additions to the library that solve problems that are common and hard to solve (such as, say, partial sorting) or that avoid significant correctness or performance traps rather than this kind of trivially-composable sugar.

24 Likes

`allSatisfy` was introduced in part because it is not a simple derivative of `filter`. Using `filter` + `count` is a common performance blunder because it both allocates a new array and fails to bail early when the first failure is hit.

6 Likes

Iâ€™m not sure if youâ€™re trying to expand the pitch, or use a slippery-slope argument in opposition to it, or what.

Regardless, `average` doesnâ€™t work for integer types (at least, the return type would have to be different from the element type), and Iâ€™m not sure `product` is used anywhere near as often as `sum`.

I might say that a hypothetical `Math` library which sits alongside the standard library but is only available if you write `import Math`, could accommodate `product`, `average`, `variance`, and the like.

But `sum` is probably the only one that occurs frequently enough in general code to be worth considering for the standard library.

There would need to be a stronger motivation than this. Unlike `min`, there is a natural answer to summing an empty sequence. If nearly all uses of `sum` would be written `sum() ?? 0`, this undermines the slim readability benefits. For those (I suspect extremely rare) cases where you want to handle the empty case differently from the non-empty but sum to zero case, there are other alternatives that don't force the optionality on all users of the method.

6 Likes

Nope, just genuinely inquiring about any future directions we might have in mind, and how we can preempt them in our design.

1 Like

My concern is that once we bake *some* behavior into `sum` over `FloatingPoint`, it would probably be unacceptable to change the result later even to make it more accurate.

So, we need to decide up-front if we want accurate rounding, or compensated summation, or block accumulation, or naively calling `reduce`, or what.

The .NET implementation I use in C# that inspired me for this is really simple (take a look at https://github.com/Microsoft/referencesource/blob/master/System.Core/System/Linq/Enumerable.cs), just a foreach

``````public static float Sum(this IEnumerable<float> source) {
if (source == null) throw Error.ArgumentNull("source");
double sum = 0;
foreach (float v in source) sum += v;
return (float)sum;
}
``````
1 Like

Note that it uses double precision for the `sum` variable when summing single precision floats though.

1 Like

The question is what does it do with doubles?