Pitch: Reducing a sequence onto its own elements to simplify code and reduce errors

Ha! Those are all great points. :slightly_smiling_face:

I didn’t mean to imply that my counterexamples were airtight, only that they had a funny feel to me. But I also find myself mellowing on my objections as this thread goes on. :sweat_smile:

Understood. Sorry, I was sort of implicitly addressing a more “common” (is it though?) formulation of monoid where it’d be a protocol and wrappers would conform. I personally don’t mind wrappers, but I know that others do.

I think if you abandon the idea of monoid as a protocol, there are a lot of interesting ways to go! For example, Java has Collectors, which are pretty interesting!

As I see the discussion right now:

  1. fold should return Monoid.identity on an empty sequence. This matches the current reduce behaviour.
  2. fold should return nil on an empty sequence (see the sketch below).
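
For concreteness, here is a minimal sketch of what option 2 could look like (the name fold and the exact signature are assumptions, not a settled design):

extension Sequence {
    /// Combines the sequence's own elements, returning nil for an empty sequence.
    func fold(_ combine: (Element, Element) throws -> Element) rethrows -> Element? {
        var iterator = makeIterator()
        guard var result = iterator.next() else { return nil }
        while let next = iterator.next() {
            result = try combine(result, next)
        }
        return result
    }
}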

There are indeed cases where returning the empty value allows for simpler code. The problem arises when the standard library provides one behaviour, but the user wants the other:

A. fold provides 1, but we want 2.

let reducedValue: Value
if sequence.isEmpty {
    // do something for the empty case
} else {
    reducedValue = sequence.fold(+)
}

Or

guard !sequence.isEmpty else {
    // handle empty case (guard must exit, e.g. return or throw)
    return
}
let reducedValue = sequence.fold(+)

B. fold provides 2, but we want 1.

let reducedValue = sequence.fold(+) ?? 0

IMO, scenario B plays out much more nicely than A: you can still fall back to the empty value if you wish, with little to no mental cost.

I have already mentioned it, but I want the ability to create functions that are in line with the current Sequence.min() and Sequence.max().

Sure, min and max are monoidal for most arithmetic types, but they work much better when they return nil instead of Value.max and Value.min respectively.
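
A small illustration of that point, using existing standard library APIs (hedged example):

[Int]().reduce(Int.max, min)  // Int.max, a surprising "minimum" for empty data
[Int]().min()                 // nil: the empty case is explicit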

4 Likes

Let me mess about a bit when I have a moment with an alternate take. It's going to be completely distinct from pitch take 1. And I'm veering off from Matthew's design.

2 Likes

Narrator: but they did not move on...

My general feedback about this thread is that it has veered off course from a simple, useful (if fairly low-priority) addition to the std lib, one that follows existing idioms already present and could be quickly reviewed and added (especially now that we're past 5.0). Instead it's now going in a direction that is much more complex and will produce a lot of pushback. I for one don't believe adding Monoid to the standard library is the right thing to do right now. These higher-level concepts are all great, and I encourage people to learn about them and try them out, but I think they belong in 3rd-party libraries, not the core language.

15 Likes

I'm throwing this into the mix now: monoid.swift · GitHub

This does two things:

  • It adds Monoid. (I went there, I did that, sorry narrator voice)
  • It removes optionals by trapping on reduce(_:) for empty sequences

These changes (I won't call them enhancements) can be considered together or separately.

By trapping, it follows array indexing in providing the most ergonomic approach. With monoids, it adds a reasonably useful feature.

2 Likes

Can you explain why you think that? These are universal abstractions for which it would be very useful to have a shared language in Swift. We're not talking about "difficult" FP abstractions that require higher kinds and other advanced type system features Swift doesn't have here. We're talking about a relatively straightforward abstraction which underlies algorithms that already exist in the standard library.

Formally supporting simple abstractions like Monoid will make code more expressive (i.e. fold(.sum) is more clear at the usage site than reduce(0, +)) and eliminate the potential for error (i.e. passing the wrong element as the identity / initial result). Monoid is also pretty fundamental in writing data parallel algorithms. It seems like a very strong candidate for inclusion in the standard library.
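
For readers following along, one possible shape of a value-level Monoid and a fold built on it (all names here are assumptions for illustration, not the actual pitched API):

struct Monoid<Value> {
    let identity: Value
    let combine: (Value, Value) -> Value
}

extension Monoid where Value == Int {
    static var sum: Monoid { Monoid(identity: 0, combine: +) }
}

extension Sequence {
    func fold(_ monoid: Monoid<Element>) -> Element {
        return reduce(monoid.identity, monoid.combine)
    }
}

[1, 2, 3].fold(.sum)  // 6; the identity comes from the monoid, not the call site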

3 Likes

This design for Monoid removes the ability to pass monoids at the type level, which can be very useful. It also does not allow your Monoid struct to conform to Equatable, Hashable, etc. (because a function is an essential part of its value), which would be extremely useful. I know @stephencelis and @mbrandonw have been advocating this kind of approach to abstraction lately, but I don't think it is a good fit for the standard library.

However, using the Monoid protocol I posted above you can build this:

/// A type-erased wrapper around a Monoid conformance, identified by the
/// conforming type so it can also be Equatable and Hashable.
struct AnyMonoid<Value> {
    let identity: Value
    let combine: (Value, Value) -> Value
    private let type: Any.Type

    init<M: Monoid>(_ m: M.Type) where M.Value == Value {
        identity = m.identity
        combine = m.combine
        type = m
    }
}

extension AnyMonoid: Hashable {
    static func == (lhs: AnyMonoid<Value>, rhs: AnyMonoid<Value>) -> Bool {
        return lhs.type == rhs.type
    }
    func hash(into hasher: inout Hasher) {
        hasher.combine(ObjectIdentifier(type))
    }
}

This struct is functionally very similar to the struct in your design with the advantage that it is Hashable. The primary downside is that it isn't quite as easy to create ad-hoc instances (something that should be rare if the standard library and Foundation includes a commonly useful set of monoids). This is a downside that could be mitigated in the future with various kinds of syntactic sugar. I don't want to sidetrack this thread further so I won't discuss that further here other than to say that I don't think current syntax required by ad-hoc instances should be a significant factor in choosing a design for Monoid.
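
For illustration, here is the rough shape of the Monoid protocol this assumes, together with a hypothetical conformance showing AnyMonoid in use (the protocol actually posted earlier in the thread may differ):

protocol Monoid {
    associatedtype Value
    static var identity: Value { get }
    static func combine(_ lhs: Value, _ rhs: Value) -> Value
}

// Hypothetical conformance, purely for illustration.
enum IntSum: Monoid {
    static let identity = 0
    static func combine(_ lhs: Int, _ rhs: Int) -> Int { return lhs + rhs }
}

let sum = AnyMonoid(IntSum.self)                         // type-erased, yet Hashable
let total = [1, 2, 3].reduce(sum.identity, sum.combine)  // 6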

I agree with @Ben_Cohen about it being more suitable for 3rd party.

We don’t even have a clear picture of how to go and implement Monoid. Even if we decided on that, we would need to implement it for:

  1. Arithmetic functions (+, *)
  2. Set operations (union, intersect)
  3. Sequence concatenation (appending)
  4. Range’s operations (clamp, union?)

This is a large extension, and omitting any of these will make the suggestion seem broken/incomplete. All that to achieve something a single function can do.

If you want to push Monoid further, I suggest pitching it in a parallel thread with a more fleshed-out idea (you don’t even agree on which implementation of Monoid to use).

Steering this thread back (yay!), this is something fold can do!
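
For instance, with the Optional-returning fold sketched earlier (a hedged illustration; the names are assumptions):

[1, 2, 3].fold(+)                                 // Optional(6): arithmetic
[Set([1, 2]), Set([2, 3])].fold { $0.union($1) }  // Optional(Set([1, 2, 3])): set union
[[1], [2, 3]].fold(+)                             // Optional([1, 2, 3]): concatenation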

3 Likes

Indeed.

So any thoughts about my suggestion where it returns Element (not Element?) and traps on empty sequences?

This would be inconsistent with min/max, which are basically the same question and already return an optional.

In fact I would say the main use case for adding this is to handle the empty possibility cleanly, without having to do a little emptiness check dance. It also allows you to use optional sugar to substitute a value easily.

5 Likes

You just read my mind :scream:

I don't think it's a good idea to take array indexing as a precedent.
Using the same logic, you could deduce that Optionals themselves are superfluous:
When you assume that fold is only called on non-empty collections, you could also assume an Optional isn't nil when you try to access it.
Unless you are dealing with a special type of Sequence, emptiness should be treated as just as ordinary as an Optional being nil.

Good thing I didn't change the original pitch proposal for that then.

And because, well, I can't shake this idea, how about:

extension Sequence {

  /// A reduce variant that extracts a value via key path before combining it
  /// into the partial result.
  public func reduce<Result>(
    _ initialPartialResult: Result,
    _ keyPath: KeyPath<Element, Result>,
    _ nextPartialResult: (_ partialResult: Result, Result) throws -> Result
  ) rethrows -> Result {
    var accumulator = initialPartialResult
    for element in self {
      accumulator = try nextPartialResult(accumulator, element[keyPath: keyPath])
    }
    return accumulator
  }
}

let ageSum1 = people.reduce(0, \Person.vitae.age, +)
print("Average age is \(Double(ageSum1) / Double(people.count))")

// vs

let ageSum2 = people.reduce(0, { $0 + $1.vitae.age })
print("Average age is \(Double(ageSum2) / Double(people.count))")

Right, but that ignores the fact that I brought up Monoid because it is actually a sub-optimal solution in many cases, including the motivating examples.

One of the unfortunate aspects of the SE community is the desire to always push off related topics to independent threads. This prevents us from stepping back and looking at the motivating problem more holistically. IMO, this proposal cannot be evaluated well without considering alternative ways to solve the motivating problems, including Monoid.

I agree with others that if we include a fold that is not based on Monoid it should not trap and should return optional. However, I don't think it makes sense to include such a method unless there are sufficiently many use cases that cannot be addressed well with Monoid. If you want to continue proposing an Optional-returning fold I recommend looking for use cases other than monoid-style value combining (it might help to look for semigroups that are not also monoids).
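
As one example of such a semigroup (a hedged illustration, reusing the fold sketch from earlier in the thread): min over String has no identity, because there is no largest possible String, so an Optional-returning fold is the natural fit.

["pear", "apple", "plum"].fold { min($0, $1) }  // Optional("apple")
[String]().fold { min($0, $1) }                 // nil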

3 Likes

Touché, but Monoid costs too much for what it’s really worth.

Its structure is rigid; adding another monoid incurs a non-trivial amount of boilerplate.
You argued that it shouldn’t need many more custom ones, but that means we’re reducing boilerplate by putting boilerplate in the standard library. That’s something that doesn’t sit very well with me.

We’ve debated Monoid alone for more than a third of this proposal’s lifetime without getting anywhere. I’m inclined to say that, from this thread’s perspective, we’ve already considered Monoid at this point, though it’s hard to say when only a few people are in the discussion.

1 Like

Abstractions are defined once and used many times. Given that combining values is one of the most pervasive operations in programming, I think explicitly modeling combining operations is a good thing to do. Fortunately for you, I don't get to decide what makes it into the standard library and what doesn't! :slight_smile:

This thread has mostly consisted of opinions without strong arguments that clearly evaluate the tradeoffs. I wish we would see more of that kind of objective analysis on SE. People can reasonably have different opinions about what tradeoffs we should make, but it should be possible to at least try to agree on what the tradeoffs are before a decision is made.

2 Likes

This does look too much like something you can do with

let ageSum1 = people.map { $0.vitae.age }.reduce(0, +)

which to me seems like a more versatile option with about the same mental model.
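
As an aside (not part of the pitch): inserting .lazy avoids materializing the intermediate array, and the same shape accepts transforms that aren't plain key paths (assuming Person.name is a String):

let ageSum = people.lazy.map { $0.vitae.age }.reduce(0, +)
let totalNameLength = people.lazy.map { $0.name.count }.reduce(0, +)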

2 Likes

A question I wanted to ask before things moved too far downthread…

I’m curious to know more about what this means, or if you could provide some examples! :slight_smile:

You can write some fun functions around this and reuse existing machinery.

func get<Root, Value>(_ kp: KeyPath<Root, Value>) -> (Root) -> Value {
  return { root in
    root[keyPath: kp]
  }
}

func combining<Root, Value>(
  _ f: @escaping (Root) -> Value,
  by g: @escaping (Value, Value) -> Value
  )
  -> (Value, Root)
  -> Value {

    return { value, root in
      g(value, f(root))
    }
}

And with those defined, your example becomes:

people.reduce(0, combining(get(\.vitae.age), by: +))

As a sidebar, the (A, A) -> Bool shape also shows up in a bunch of generic algorithms (sorted, min, max), so the their function can be pretty handy:

func their<Root, Value>(
  _ f: @escaping (Root) -> Value,
  _ g: @escaping (Value, Value) -> Bool
  )
  -> (Root, Root) -> Bool {
    return { g(f($0), f($1)) }
}

people.sorted(by: their(get(\.name), <))

And with the obvious Comparable overload:

func their<Root, Value: Comparable>(
  _ f: @escaping (Root) -> Value
  )
  -> (Root, Root) -> Bool {
    return their(f, <)
}

people.sorted(by: their(get(\.name)))

I'm also a fan of defining the prefix ^ operator to simplify the noise of get while we wait for more seamless key-path integration :slight_smile:

prefix operator ^
prefix func ^ <Root, Value>(_ kp: KeyPath<Root, Value>) -> (Root) -> Value {
  return get(kp)
}

Which makes the earlier examples read quite lovely!

people.reduce(0, combining(^\.vitae.age, by: +))
people.sorted(by: their(^\.name))

5 Likes

I’m referring to this:

extension Sequence {
    @warn_unqualified_access
    public func min(
        by areInIncreasingOrder: (Element, Element) throws -> Bool
    ) rethrows -> Element?

    @warn_unqualified_access
    public func max(
        by areInIncreasingOrder: (Element, Element) throws -> Bool
    ) rethrows -> Element?
}

extension Sequence where Element: Comparable {
    @warn_unqualified_access
    public func min() -> Element?

    @warn_unqualified_access
    public func max() -> Element?
}

They’re defined here.

You can use it like this:

[1, 2, 3, 4].min(by: <) // Optional(1)
[Int]().min(by: <) // nil
[1, 2, 3, 4].min() // Optional(1); a separate implementation when Element is Comparable

This saves me a lot of time since I can skip a lot of logic when the data is empty, and the optional return reminds me to check for that case.