 # Convenience functions for sum and product across a Collection type

I regularly find myself wanting to sum all the elements in a `Set`, `Array`, or other `Collection`. This made me wonder whether it would be worth adding a simple wrapper to the standard library for the sum and product operations, something along the lines of:

`collection.reduce(0, +)`, `collection.reduce(1, *)`

Can somebody suggest whether this would be a good idea, and if so, any advice for creating such an issue? I have assumed here that ordering doesn't matter, which is why I thought of `Collection`. For `Collection where Element: Numeric` I was also thinking of creating a function `pairwiseDifferences() -> [Element]`. This would iterate through the collection and calculate the difference for every pair of elements that can be selected, making use of the existing `combinations(ofCount:)` with an argument of 2.
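For illustration, the kind of wrapper I have in mind (the names `sum()` and `product()` are just my suggestion) would be a thin alias over `reduce`:

```swift
extension Sequence where Element: Numeric {
    /// Hypothetical convenience wrappers; exactly equivalent to the reduce calls above.
    func sum() -> Element { reduce(0, +) }
    func product() -> Element { reduce(1, *) }
}

let total = [1, 2, 3, 4].sum()        // 10
let product = [1, 2, 3, 4].product()  // 24
```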

Do I need to do some more research? If so, what would be good to consider?

2 Likes

The behavior of `reduce(0, +)` is not necessarily what we want for `sum()`.

Some of the issues are discussed in these previous threads:

Pitch: Method to sum numeric arrays (see in particular these two comments by Jens and Steve Canon)
Sum with Block

Notably, if the partial sums would overflow but the full sum would not, should the calculation trap like `reduce` does, or should `sum` make an effort to avoid that, perhaps by reordering or using a wider type?

Example: `[100 as Int8, 100, -100, -100].sum()` mathematically should equal zero, but the first two terms overflow the `Int8` type.
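A sketch of why the naive fold traps here, using `addingReportingOverflow(_:)` to observe the wrap instead of crashing, and a wider accumulator to recover the true result:

```swift
let values: [Int8] = [100, 100, -100, -100]

// reduce(0, +) would trap on the second element: 100 + 100 overflows Int8.
let (partial, overflow) = Int8(100).addingReportingOverflow(100)
// overflow == true; partial is the wrapped value -56, not 200.

// Summing in a wider type avoids the trap and yields the mathematical result:
let wide = values.reduce(0) { $0 + Int($1) }  // 0
```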

Also, in floating-point calculations, adding a large number of small terms can cause the low-order bits to be lost, thus producing an inaccurate sum. This is exacerbated if earlier terms are substantially larger than later ones.

Example: `[16_777_216 as Float, 1, 1, 1, 1].sum()` mathematically should equal `16_777_220`, which is exactly representable as `Float`, but simply adding `1` to the first value returns the first value unchanged so the result of `reduce(0, +)` is `16_777_216`.
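A sketch demonstrating the lost low-order bits, and one way to recover them (summing in `Double`, which has enough precision to hold any sum of a few `Float` terms exactly):

```swift
let values: [Float] = [16_777_216, 1, 1, 1, 1]

let naive = values.reduce(0, +)
// 16_777_216: each `+ 1` rounds back to 2^24, so the ones are lost.

let widened = Float(values.reduce(0 as Double) { $0 + Double($1) })
// 16_777_220: the sum is exact in Double and representable as Float.
```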

9 Likes

I've spent some time thinking about this since my thread (the sum with block one you linked), and while I think it's an important optimization to think about, the longer we dally on adding the function, the more usages of `.reduce(0, +)` go into Swift codebases around the world. If we can agree on the interface, we can add the function to the standard library; then any optimizations or correctness fixes we add in the future (a) wouldn't necessarily need to go through evolution, and (b) everyone would get for free as soon as they built their program with a newer toolchain.

Put another way, adding this as a simple alias for `.reduce(0, +)` is better in a few ways than the current state of the world, and no worse, and it opens us up to more interesting algorithms in the future that have more safeguards.

4 Likes

I would say that this concern is not merely an implementation detail or optimization opportunity, it leaks quite a bit of the function's internals into the world. You should not expect `[100 as Int8, 100, -100, -100].sum()` to ever crash, but it will for no reason other than improper implementation. You could say that, alright, let's do the proper one, but suddenly the algorithm is not O(n) anymore (and computing a sum in non-linear time would be extremely surprising) — in other words, you will inevitably be lying to users about what `.sum()` actually does. `.reduce(0, +)` doesn't introduce such inconsistencies — its straightforward looping behaviour is part of its semantics.

You also make it sound that having `.reduce(0, +)` in codebases is somehow a bad thing, but is it really? Will it be in any way hard to replace it later? I see no problem with it other than it being less convenient to write, and I would argue that aliasing `.sum()` with `.reduce(0, +)` actually does produce more harm than it solves.

I believe that this would be the best solution:

3 Likes

Is it more shocking that something called `.sum()` crashes on integer overflow than that an operator called `+` crashes on integer overflow?

Is it more shocking that `[16_777_216 as Float, 1, 1, 1, 1].sum()` results in an incorrect sum than `(16_777_216 as Float) + 1` resulting in an incorrect sum?

Part of the problem is that it's never just `.reduce(0, +)`. Usually we keep more interesting objects in arrays than bare numerics. What I see people do is map to the property they're interested in, which is a little performance snag they wouldn't have to think about if this were properly in the standard library. There's "harm" on the `reduce` side of the question, too.

Will it be harder to replace some code than to not have to replace some code? Yeah!

2 Likes

Yes, because the former crash is much less reproducible: it depends on the order of the elements, which can be random (e.g. in a `Set`), so the same program may or may not crash on different invocations. Moreover, crashing on overflow is part of the actual semantics of `+`; it might not be immediately evident, but I would expect a reasonable Swift programmer to be aware of it and to choose between `+`, `&+`, or `addingReportingOverflow(_:)` as needed. You won't be able to do that with `sum()` or a "sum with block"; otherwise you just go back to `reduce`.
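For reference, the three addition flavors that choice is between (values shown assume `Int8` operands):

```swift
let a: Int8 = 100
let b: Int8 = 100

// a + b               // traps at runtime: 200 overflows Int8
let wrapped = a &+ b   // -56: wrapping (two's-complement) addition
let (partial, didOverflow) = a.addingReportingOverflow(b)
// partial == -56, didOverflow == true: the caller decides how to handle it
```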

I think this is way beyond the actual scope of the problem because if you're telling me that people don't know how to access their properties correctly, then it will persist for everything past summation too.

The point really is that if your `reduce(0, +)` already works, then you don't even need to replace it; you'd only do so when you need better numerical guarantees. The reverse is worse: if `sum()` is just a wrapper for `reduce(0, +)` at some point, but we later rewrite it to use an O(n log n) algorithm, which won't cut it for many applications, then you'd need to go and replace it back.

I agree with @soroush. A quick search of GitHub shows thousands of implementations of `sum` that use `reduce(0, +)` (or an equivalent variant). I myself have implemented it that way several times with no idea that it might be wrong.

If there's an opportunity here for the standard library to provide the better-and-less-obvious implementation that makes everyone's code better, why wouldn't we do that?

1 Like

Here’s an idea for integers.

We can count the number of overflows (with sign); if the count is zero, then the sum computed with wrapping arithmetic is correct:

```swift
extension Sequence where Element: FixedWidthInteger & SignedInteger {
    func sum() -> Element {
        var result: Element = 0
        var totalOverflow = 0

        for x in self {
            let (partial, overflow) = result.addingReportingOverflow(x)
            result = partial
            if overflow {
                totalOverflow += (x < 0) ? -1 : 1
            }
        }

        precondition(totalOverflow == 0, "Sum overflows the element type")
        return result
    }
}
```

Of course, integers that are unsigned or variable-width can use the simple version:

```swift
extension Sequence where Element: BinaryInteger {
    func sum() -> Element {
        return self.reduce(0, +)
    }
}
```
4 Likes

And here’s a floating-point implementation with compensated summation:

```swift
extension Sequence where Element: FloatingPoint {
    func sum() -> Element {
        var result: Element = 0
        var excess: Element = 0

        for x in self {
            let large = Element.maximumMagnitude(result, x)
            let small = Element.minimumMagnitude(result, x)
            result += x
            excess += (result - large) - small
        }

        return result - excess
    }
}
```

Although for `Float` (and `Float16`) it’s probably better to just sum as `Double`:

```swift
extension Sequence where Element == Float {
    func sum() -> Float {
        return Float(self.reduce(0 as Double) { $0 + Double($1) })
    }
}
```

• • •

I have more thoughts on this, and I’ll work on writing them up, but the top-line is that I think `sum()` belongs in either the standard library or perhaps Numerics, rather than Algorithms.

There’s also some consideration for whether we ought to make `Sequence.sum()` trampoline off a customization point in `AdditiveArithmetic` to enable dynamic dispatch. That has both benefits and drawbacks, which I’ll try to cover when I get the chance.

1 Like

Just a quick note: all of the algorithms discussed here to produce higher-quality results remain O(n). For floating-point specifically, people sometimes sort by magnitude before summing (resulting in O(n log n)), but that's not necessary; an extra-wide accumulator can produce the same result without sorting.

3 Likes

I would like to add the full set of IEEE 754 reduction operations in Numerics: sum, dot, sumSquare, sumAbs, scaledProd, scaledProdSum, scaledProdDiff (IEEE names used here; we would be somewhat more verbose, I expect). `sum` is as good a place to start as any.

2 Likes

One issue I foresee is that we’ll want dynamic dispatch. If someone writes a custom generic algorithm that involves, say, summing a numeric collection:

```swift
extension Collection where Element: Numeric {
    func myFancyAlgorithm() -> Element {
        let sum = self.sum()
        // do something with sum
        return sum
    }
}
```

Then that can be called with elements that are floating-point, or signed integers, or unsigned integers, and so forth.

If we just implement the various `Sequence.sum()` versions as constrained extension methods, then this will call whichever `sum()` applies to `Numeric` elements.

It will not look into the dynamic type of `Element` to see if it “wants” the overflow-counting behavior of signed fixed-width integers, or the compensated-sum of floating-point, or anything else.

• • •

The natural way to fix this is for the standard library to introduce a customization point in, say `AdditiveArithmetic`, which might be spelled:

```swift
static func _sum<S: Sequence>(_ s: S) -> Self where S.Element == Self
```

That would let us put custom implementations in places like `extension FixedWidthInteger where Self: SignedInteger`, and conforming types would pick them up naturally.

Then `Sequence.sum()` could have a one-line implementation that just calls `Element._sum(self)`, and the algorithm would be dynamically dispatched through the element type.
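A self-contained sketch of that trampoline (here `Summable` stands in for `AdditiveArithmetic`, since `_sum` is a hypothetical requirement that doesn't exist yet):

```swift
// Sketch: a protocol mirroring the proposed customization point.
protocol Summable: AdditiveArithmetic {
    static func _sum<S: Sequence>(_ s: S) -> Self where S.Element == Self
}

extension Summable {
    // Default implementation: the plain fold.
    static func _sum<S: Sequence>(_ s: S) -> Self where S.Element == Self {
        return s.reduce(.zero, +)
    }
}

extension Sequence where Element: Summable {
    // The one-line entry point; dispatches through the element type.
    func sum() -> Element { Element._sum(self) }
}

extension Int: Summable {}  // picks up the default; could supply its own _sum

let total = [1, 2, 3].sum()  // 6
```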

• • •

That’s all well and good, and it would work great for the standard integer and floating-point types.

Of course, for the standard library to add a customization point to a protocol, it must also provide a default implementation along with it. The only reasonable choices for that are to trap, which is not desirable, or to use the general-purpose implementation `reduce(.zero, +)`.

The `reduce` implementation is a perfectly fine default, but it makes things difficult for generic types that prefer a better algorithm.

This isn’t too bad for something like `Complex` whose generic parameter is always floating-point, but even there the Numerics library would want to provide a custom implementation of `Complex._sum()` that essentially duplicates the standard version for floating-point, perhaps with a special-case branch for `Float` and `Float16`.

• • •

However, one can imagine a `Matrix<Element>` type, whose `Element` has no intrinsic constraints. `Matrix` would conditionally conform to `AdditiveArithmetic` when its `Element` does, so it would have to provide a single implementation of `_sum()` that works for integers, floats, complex values, and other matrices, as well as 3rd-party types.

It is perfectly sensible to sum a sequence of matrices (provided their dimensions match), but there’s no good way for the implementation of `Matrix._sum()` to make use of `Element._sum()` when doing so.

If we constrained to `Collection` instead of `Sequence` then `Matrix._sum()` could do a multi-pass traversal, summing each position separately with `Element._sum()` via a lazy map of the collection.

That would make it easier for `Matrix` to provide a correct conformance, but it would still leave the pitfall that the author of `Matrix` might not notice they need to implement `_sum()`, since the default implementation would be present and would even work correctly for some `Element` types.

It also would seem like a rather arbitrary restriction, since summing makes perfectly good sense for single-pass sequences.

If we did change the constraint from `Sequence` to `Collection`, then we could provide a single default implementation for composite types like `Complex` and `Matrix` by introducing a new `Vector` protocol along these lines:

```swift
protocol Vector: AdditiveArithmetic {
    associatedtype Element: AdditiveArithmetic
    var componentCount: Int { get }
    subscript(component n: Int) -> Element { get set }
}
```

Then this implementation would allow authors of composite types to simply conform them to `Vector` rather than writing a custom `_sum()` method:

```swift
extension Vector {
    static func _sum<C: Collection>(_ c: C) -> Self where C.Element == Self {
        var result: Self = .zero
        guard let n = c.first?.componentCount else { return result }

        for i in 0..<n {
            result[component: i] = c.lazy.map { $0[component: i] }.sum()
        }

        return result
    }
}
```

I’m not sure it’s worth going that route just for this, however.

I think the main argument for at least having this shortcut is that it's what you'd expect to exist as a user, and that `reduce` is quite an obscure way for new Swift users to get the `sum` (or `product`).

Yes, it might crash, but honestly I don't think the `sum` call is any riskier than the `reduce` call. Right now people stumble into the `reduce` solution and are none the wiser if it fails.

Also, note that it's easy to get wrong, as evidenced by `collection.reduce(0, *)` always returning `0` :)
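That is, with the identity element mixed up, the fold silently degenerates:

```swift
let xs = [2, 3, 4]
let wrong = xs.reduce(0, *)  // 0: the zero annihilates every product
let right = xs.reduce(1, *)  // 24: 1 is the correct identity for multiplication
```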

1 Like

If I understand correctly, all 3 of `dot`, `sumSquares`, and `sumAbs` can be written in terms of `sum`:

```swift
extension Sequence where Element: Numeric {
    func dot<S: Sequence>(_ s: S) -> Element where S.Element == Element {
        return zip(self, s).lazy.map { $0 * $1 }.sum()
    }

    func sumSquares() -> Element {
        return self.lazy.map { $0 * $0 }.sum()
    }
}

extension Sequence where Element: SignedNumeric & Comparable {
    func sumAbs() -> Element {
        return self.lazy.map { abs($0) }.sum()
    }
}
```

So yes, `sum` would appear to be the right place to start.

Although…that implementation of `dot` isn’t really right for `Complex` (nor is `sumSquares` particularly useful there). Should we make a protocol for inner products?

And also, perhaps the standard library should get a `strictZip` function that traps if the sequences have unequal length.
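A minimal sketch of such a `strictZip` (hypothetical name and shape; constrained to `Collection` so the lengths can be checked up front):

```swift
func strictZip<A: Collection, B: Collection>(
    _ a: A, _ b: B
) -> [(A.Element, B.Element)] {
    precondition(a.count == b.count, "strictZip: collections have unequal lengths")
    return Array(zip(a, b))
}

let pairs = strictZip([1, 2, 3], ["a", "b", "c"])
// [(1, "a"), (2, "b"), (3, "c")]
```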

In the meantime, this conversation should probably move to either the Numerics category or Evolution-Discussion.

I think that something will have to go in the standard library to make `sum` work properly in generic contexts, at minimum a trampolinable member on `AdditiveArithmetic`.

Well, the issue is that `dot` and `sumSquares` as written are not correct with respect to handling spurious overflow/underflow and cancellation (this doesn't matter as much for `sumSquares`, but one would prefer to return a value and a scale so you can recover from overflow if you're going to end up taking a square root). So it's rather more subtle than that (IEEE 754 wouldn't bother recommending them if they were that simple; 754 doesn't actually specify the precise behavior of these, but IMO if they're worth doing at all, they're worth doing well).

Ignore Complex for now, that's at least partially a red herring (outside the scope of IEEE 754 bindings, for one thing). We should take all this to a dedicated thread, though, rather than forking this one.

3 Likes