Sum with Block

Hi everyone!

@timv and I have been working on a small proposal for a Nice Thing that you all deserve. It's called sum with block, and I've been finding it super useful in my own codebases.

Motivation

Summing a sequence of numbers is useful on its own, but Swift users don’t frequently keep lists of numbers devoid of any context. The numbers that need to be summed are frequently a property of some richer sequence element.

As a more concrete example, one might want to add up all the followers that some collection of users might have. This can currently be done with map and reduce:

users.map(\.followerCount).reduce(0, +) // => sum of all users' followers

This approach is suboptimal because it does two passes over the data and map needlessly creates an intermediate array, which is discarded right away.

To avoid this, you can use a lazy sequence:

users.lazy.map(\.followerCount).reduce(0, +) // => sum of all users' followers

However, the closure parameter of the lazy map is marked @escaping and cannot throw, which means this is not always an option. Worse, if the closure you pass does throw or is non-escaping, Swift will silently fall back to the eager version of map, nullifying any performance benefit you think you’re getting.
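To see the silent fallback concretely, here is a minimal, self-contained sketch (the `User` and `CountError` names are hypothetical, invented for illustration). Because the closure throws, the lazy overload of `map` is unavailable, and overload resolution quietly picks the eager `Sequence.map`, allocating an intermediate array:

```swift
struct User {
    let followerCount: Int
}

enum CountError: Error {
    case invalid
}

let users = [User(followerCount: 3), User(followerCount: 4)]

// This closure throws, so `LazySequenceProtocol.map` cannot be used;
// Swift silently chooses the eager `Sequence.map` instead.
let total = try! users.lazy
    .map { user -> Int in
        guard user.followerCount >= 0 else { throw CountError.invalid }
        return user.followerCount
    }
    .reduce(0, +)

print(total) // 7
```

Nothing in the call site hints that laziness was lost, which is what makes this a performance trap.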

A third option without these downsides is to stuff the extraction logic inside a reduce closure, making it less readable:

users.reduce(0, { $0 + $1.followerCount })  // => sum of all users' followers

These three solutions all lie on a spectrum between readability and performance.

Proposed solution

The proposed solution would avoid a performance trap and provide a simple interface for users to both read and write. Autocomplete should present it to them handily as well.

users.sum(\.followerCount) // => sum of all users' followers

In addition, if the sequence is already made up of numbers, a version without parameters (sum()) is provided to make the code even clearer.

Detailed design

A reference implementation for the function is included here:

extension Sequence {
    func sum<T>(_ transform: (Element) throws -> T) rethrows -> T where T: AdditiveArithmetic {
        var sum = T.zero
        for element in self {
            sum += try transform(element)
        }
        return sum
    }
}

extension Sequence where Element: AdditiveArithmetic {
    func sum() -> Element {
        return sum { $0 }
    }
}

You can read the full proposal here! Please let us know what you think. Tim and I are happy to provide a reference implementation and tests if the community likes the pitch!

31 Likes

shut-up-and-take-my-upvote


OK, more helpful info:

• I love this, because I use this frequently and it's one of those things that I think is common enough to warrant addition (haha) to the frameworks
• In fact, this seems so obvious that even the documentation for AdditiveArithmetic has "look, you can make sum()!" as its very first example: Apple Developer Documentation
• I was initially thinking that product() would be nice too (which you mention in your "alternatives" section), but 1) there is no MultiplicativeArithmetic protocol, so it'd be way less obvious when you'd get it, and 2) multiplication is not always commutative, which means there'd be situations where you use it and sometimes get unexpected results.

SUMMARY: omg yes please

7 Likes

I'm not necessarily opposed to this, but I'd rather improve the ergonomics around users.reduce(0, { $0 + $1.followerCount }).

Ideally (well, IMHO, anyway), the compiler would automatically collapse sequential calls to map / flatMap / filter / reduce and such into a single closure and only iterate over the sequence once. Maybe something could be done with Function Builders? I don't know that part of the language well enough to say. Also, since the compiler would be rearranging the code into a single iteration with multiple "transforms" vs. multiple iterations with a single "transform", I wouldn't be surprised if we'd need some notion of "purity" to ensure we keep the same semantics.

Since we don't have that now, though, what about something like this?

extension Sequence {
    func mapAndReduce<T, Result>(
        _ transform: (Element) throws -> T,
        // The compiler complains about rethrowing if the initial result and the next partial result closure are in a tuple, otherwise the next two parameters would just be `reduce: (T, (T, T) throws -> T)` to more closely match `reduce`'s syntax.
        _ initialResult: Result,
        _ nextPartialResult: (Result, T) throws -> Result
    ) rethrows -> Result {
        var currentResult = initialResult
        for el in self {
            currentResult = try nextPartialResult(currentResult, try transform(el)) // This also compiles without the second `try`. Not sure if it matters.
        }
        return currentResult
    }
}

The inefficient but easy to read users.map(\.followerCount).reduce(0, +) example would become users.mapAndReduce(\.followerCount, 0, +). Granted, that's not as readable as the proposed users.sum(\.followerCount), but neither is it limited to only calculating sums.

2 Likes

My only feedback would be to consider adding an external label to the transform closure with some preposition. Possible options:

  • users.sum(by: \.followerCount) ← Personal preference
  • users.sum(on: \.followerCount)
  • users.sum(over: \.followerCount)

(This is similar to assign(to:on:) found in Combine.)

Other than that, yes, this Nice Thing is indeed Nice. More than that, it brings performance improvements that most developers wouldn't ever consider, and that most shouldn't ever have to.

4 Likes

On a side note, something I still don't get is the discrepancy in some Swift APIs regarding methods and computed properties. Is there a particular reason why Collection.first is a computed property instead of being a function overload of Sequence.first(where:)?
Most of the time you don't have computed properties, but method overloads, such as for min() / min(by:) and max() / max(by:).

I missed this somehow! Tim and I will take a look and incorporate any valuable feedback from this thread.

This is generally true, but there's a big counter-example in the stdlib, which is count. Until you get to the RandomAccessCollection refinement, count is not guaranteed to be O(1).

1 Like

I'll repeat my concerns from the previous thread: A sum method provided by the Standard Library might be expected to be correct for floating-point types, vectorized, to not trap on e.g. [127, 8, -128] as [Int8], etc.
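To make the trapping concern concrete, here is a small sketch. The mathematical sum of these Int8 values (7) fits comfortably in Int8, but a left-to-right reduce(0, +) traps on the intermediate overflow; widening the accumulator is one way a library-quality sum() could avoid that:

```swift
let values: [Int8] = [127, 8, -128]

// values.reduce(0 as Int8, +) // traps: the intermediate 127 + 8
// overflows Int8, even though the final sum (7) fits in Int8.

// Accumulating in a wider type sidesteps the intermediate overflow:
let total = values.reduce(0 as Int) { $0 + Int($1) }

print(total) // 7
```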

4 Likes

The idea behind this is a great addition, but I don't think it's the right specific solution. The zero representing emptiness is the most troublesome part. And it's not general-purpose enough. Here's what I do instead:

public extension Sequence {
  /// Accumulates transformed elements.
  /// - Returns: `nil` if the sequence has no elements.
  func reduce<Result>(
    _ transform: (Element) throws -> Result,
    _ getNextPartialResult: (Result, Result) throws -> Result
  ) rethrows -> Result? {
    try lazy.map(transform).reduce(getNextPartialResult)
  }

  /// - Returns: `nil` if the sequence has no elements, instead of an "initial result".
  func reduce(
    _ getNextPartialResult: (Element, Element) throws -> Element
  ) rethrows -> Element? {
    try first.map {
      try dropFirst().reduce($0, getNextPartialResult)
    }
  }

  /// The first element of the sequence.
  /// - Note: `nil` if the sequence is empty.
  var first: Element? {
    var iterator = makeIterator()
    return iterator.next()
  }
}

e.g.

users.reduce(\.followerCount, +)

Whether the lazy.map is good enough, :man_shrugging:. As long as the functionality/overload is there, I'm happy.

I do think a sum property is great for these though:

public extension Sequence where Element: AdditiveArithmetic {
  var sum: Element? { reduce(+) }
}

Also, "block" is an Objective-C-specific term.

1 Like

Seems like average() and standardDeviation() should also be included.

This was also mentioned in the previous thread, and I'm of the same mind as @Nevin.

5 Likes

You can have whatever minds you like, but there's no reason for us to deal with any strong opinion from esoteric domains when we have ??. Put a 0 after it and make your own extension with that if you want.

But again, this isn't really about any "accumulate" function, e.g. +. It's about missing reduce overloads.

[].reduce(0, +) returns 0.

2 Likes

I think @anon9791410 is proposing that we add a new version of reduce that doesn't have an "initial value" parameter and returns an optional if the sequence is empty. However, that's quite different from what Tim and I are proposing here, and I think it would be best suited to another thread. Even with this new version of reduce, the summing code is still not as clean as using .sum(_:).
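To spell out the difference in semantics, here is a self-contained sketch of the hypothetical initial-value-free reduce (this overload is not in the standard library; the implementation below just restates the idea from upthread) next to the existing one:

```swift
extension Sequence {
    /// Hypothetical overload: no initial value; returns nil for an
    /// empty sequence instead of a caller-supplied default.
    func reduce(
        _ nextPartialResult: (Element, Element) throws -> Element
    ) rethrows -> Element? {
        var iterator = makeIterator()
        guard var result = iterator.next() else { return nil }
        while let next = iterator.next() {
            result = try nextPartialResult(result, next)
        }
        return result
    }
}

let empty: [Int] = []
print(empty.reduce(0, +))     // 0 — standard library, empty maps to the initial value
print(empty.reduce(+) as Any) // nil — hypothetical overload, emptiness is observable
print([1, 2, 3].reduce(+) as Any) // Optional(6)
```

The optional return is the whole point of the variant: an empty sequence is distinguishable from one that genuinely sums to zero.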

Those are more statistically oriented, whereas summing has a wider variety of uses, for example summing up the heights of a bunch of stacked views. These other functions also have issues with integers, as mentioned in the previous thread.

1 Like

I would prefer if the solution kept the reduce(0, +) spelling, which makes the semantics clear. sum() makes you wonder what numerical summation algorithm is being used (just like average() and standardDeviation() do).

1 Like

The same could be said in the opposite direction for using the ternary operator and .isEmpty, so I’m still firmly on the side of representing the sum operation in a mathematically faithful way.

2 Likes

and allSatisfy. :wink:

Edit:

Wait, I thought you meant that those two are examples of mathematically faithful functions :grimacing:

1 Like

This makes no sense at all to me; aren't we talking about AdditiveArithmetic here? That's precisely what it defines: what zero means and what addition means.

I'm aware that the documentation for AdditiveArithmetic even has the following example:

extension Sequence where Element: AdditiveArithmetic {
  func sum() -> Element {
    return reduce(.zero, +)
  }
}

But again, I think a sum() method, if provided by the Standard Library, should do better than that.

For example, using the above implementation:

let array1: [Float] = [ 3_000_000, -3_000_000,      0.125 ]
let array2: [Float] = [ 3_000_000,      0.125, -3_000_000 ]
let set = Set(array1)
print(array1.sum()) // 0.125
print(array2.sum()) // 0.0
print(set.sum()) // Either 0.0 or 0.125, it's different each time the program runs.

Why shouldn't sum() return 0.125 for all three of those cases?
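For what it's worth, "doing better" is achievable: a library-quality sum() for floating-point types could use compensated summation. Here is a sketch of the Kahan–Babuška (Neumaier) variant; the compensatedSum name is made up for this post. By tracking the rounding error of each addition in a separate term, it recovers 0.125 for both orderings above:

```swift
extension Sequence where Element: BinaryFloatingPoint {
    /// Compensated (Kahan–Babuška/Neumaier) summation: accumulates the
    /// rounding error of each addition in a separate compensation term
    /// and folds it back in at the end.
    func compensatedSum() -> Element {
        var sum = Element.zero
        var compensation = Element.zero
        for value in self {
            let t = sum + value
            if sum.magnitude >= value.magnitude {
                // Low-order bits of `value` were lost in `t`.
                compensation += (sum - t) + value
            } else {
                // Low-order bits of `sum` were lost in `t`.
                compensation += (value - t) + sum
            }
            sum = t
        }
        return sum + compensation
    }
}

let array1: [Float] = [3_000_000, -3_000_000, 0.125]
let array2: [Float] = [3_000_000, 0.125, -3_000_000]
print(array1.compensatedSum()) // 0.125
print(array2.compensatedSum()) // 0.125
```

Whether the stdlib should pay that cost by default is exactly the open question, but it shows the naive reduce(0, +) is not the only candidate semantics for sum().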


It's easy to envision many more similar examples where a reduce(0, +) hiding behind a sum() may produce surprising results, like trapping (or not) unpredictably between runs, etc.

reduce(0, +) says on the tin what it does, and unless you are unfamiliar with the specifics of the numeric type or operator, you will not be surprised.

sum() on the other hand, is IMO something that probably belongs in swift numerics or some other module.


5 Likes

Part of my argument (basically echoed by others, in the other thread) is against going the way of C# and Kotlin and having specialized functions for too many things. This sum overload is too specific, but the less specific version of it is not, and we don't have it: a combination of map and reduce.

The alternative is too messy, and might not be optimally performant. You've argued very well for the right level of what we need, but you've taken it further than it needs to go. I think now that you know this, you'll go play around with the idea and see the other cases where sum is two important abstractions away from what we have, not just one.

That may be true elsewhere, but it's a bogus argument in Swift's domain. A plain number is a loss of information compared to an optional number. Go render a SwiftUI view conditionally for a sum, if there are numbers to sum, and tell someone to their face that it's the right choice for the job.

A sane person will then bite your nose.