Pitch: remove some customization points from the std lib collection protocols

Hi Swift Evolution.

A short pitch that is part of preparing the std lib for ABI stability, as well as readying for move-only types in the future. GitHub version here.

Remove Some Customization Points from the Standard Library's Collection Hierarchy

Introduction

This proposal removes four customization points from protocols in the standard library:

  • map and forEach from Sequence
  • first from Collection
  • last from BidirectionalCollection

Motivation

Customization points are methods that are declared on the protocol itself and also given default implementations in an extension. This way, when a type provides its own non-default implementation, that implementation will be dispatched to in a generic context (e.g. when another method defined in an extension on the protocol calls the customized method). Without a customization point, the default implementation is always called in a generic context.
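
The dispatch difference can be seen in a minimal sketch (not actual stdlib code; the `Shape` protocol and `Circle` type are invented for illustration):

```swift
protocol Shape {
    // Customization point: declared in the protocol body *and* given a
    // default implementation below, so a conforming type's own version is
    // dynamically dispatched to, even from generic code.
    func describe() -> String
}

extension Shape {
    func describe() -> String { return "some shape" }

    // Not a customization point: defined only in the extension, so generic
    // code always calls this version, even if a conforming type shadows it.
    func label() -> String { return "shape: \(describe())" }
}

struct Circle: Shape {
    func describe() -> String { return "circle" }  // reached via generic dispatch
    func label() -> String { return "CIRCLE" }     // only visible on the concrete type
}

func genericLabel<S: Shape>(_ shape: S) -> String {
    return shape.label()
}

print(genericLabel(Circle()))  // "shape: circle"; describe() was customized
print(Circle().label())        // "CIRCLE"; the shadow wins only statically
```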

This serves broadly two purposes:

  1. Allowing for differences in behavior. For example, an "add element" method on
    a set type might exclude duplicates while on a bag it might allow them.

  2. Allowing for more efficient implementations. For example, count on
    forward-only collections takes O(n) (because without random access, the
    implementation must iterate the collection to count its elements). But some
    collection types might know their count even if they aren't random-access.
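
As a sketch of reason 2, here is a hypothetical forward-only collection that knows its size up front (the `RepeatedValue` type is invented for illustration; the stdlib has its own `Repeated` type built on the same idea):

```swift
struct RepeatedValue<Element>: Collection {
    let element: Element
    let times: Int

    var startIndex: Int { return 0 }
    var endIndex: Int { return times }
    func index(after i: Int) -> Int { return i + 1 }
    subscript(position: Int) -> Element { return element }

    // The count customization point in action: O(1) here, instead of the
    // O(n) index-walking default a forward-only collection would get.
    var count: Int { return times }
}

let fives = RepeatedValue(element: 5, times: 4)
print(fives.count)   // 4, without iterating
print(Array(fives))  // [5, 5, 5, 5]
```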

Once ABI stability has been declared for a framework, customization points can never be removed, though they can be added.

Customization points aren't free – they add a small cost at both compile time and run time. So they should only be added if there is a realistic possibility that one of the two reasons above applies.

In the case of the four customization points in this proposal, reason 1 does not apply. In fact, it could be considered a serious bug if any type implemented these four features with anything other than the default observable behavior.

It is also hard to find a good use case for reason 2 – whereas slight slowdowns from the presence of the customization points have been observed. While it is possible that a resilient type's forEach implementation might be able to eke out a small performance benefit (for example, by avoiding the reference-count bump of putting self into an iterator), it is generally harmful to encourage this kind of "maybe forEach could be faster" micro-optimization. For example, see here, where error control flow was used to break out of forEach early, causing unpleasant interference for debugging workflows that detect when errors are thrown.
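
The anti-pattern described above looks roughly like this (a reconstruction, not the actual code in question; the `_Stop` error and `firstMatch` helper are invented):

```swift
struct _Stop: Error {}

func firstMatch<S: Sequence>(
    in sequence: S,
    where predicate: (S.Element) -> Bool
) -> S.Element? {
    var result: S.Element?
    do {
        try sequence.forEach { element in
            if predicate(element) {
                result = element
                throw _Stop()  // error control flow, purely to exit early
            }
        }
    } catch {}  // swallow the "error"; this is what trips up debugging workflows
    return result
}

print(firstMatch(in: [1, 2, 3, 4], where: { $0 > 2 }) as Any)  // Optional(3)
```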

Future move-only type considerations

In the case of first and last there is an additional consideration: in the future, collections of move-only types (including Array) will not be able to reasonably fulfill these requirements.

A collection that contains move-only types will only allow elements to be either removed and returned (e.g. with popLast()), or borrowed (e.g. via subscript).

Returning an optional to represent the first element fits into neither of these buckets. You cannot write a generic implementation of first that fetches the first move-only element of a collection using a subscript, moves it into an optional, and then returns that optional.

This means first and last need to be removed as requirements on the collection protocols in order to make it possible for collections of move-only types to conform to them.

They would remain on Collection via extensions. When move-only types are introduced, those extensions will be constrained to the collection element being copyable.
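
A sketch of how first can live in a plain extension rather than as a protocol requirement (the real default implementation is equivalent; the property is named firstElement here only to avoid clashing with the stdlib's first):

```swift
extension Collection {
    var firstElement: Element? {
        // The default behavior: borrow the element at startIndex, if any.
        return isEmpty ? nil : self[startIndex]
    }
}

print([10, 20, 30].firstElement as Any)   // Optional(10)
print(([] as [Int]).firstElement as Any)  // nil
```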

Once the final functionality for move-only types is designed, it may be that language features will be added that allow for borrowing into an optional, allowing even collections of move-only types to implement a first property.
But it's better to err on the side of caution for now and remove them from the protocol.

Proposed solution

Remove these four customization points from the collection protocols. The default implementations will remain.

Source compatibility

These are customization points with an existing default implementation, so there is no effect on source compatibility.

It is theoretically possible that removing these customization points could result in a behavior change on types that rely on the dynamic dispatch to add additional logic. However, this would be an extremely dubious practice; for example, MyCollection.first should really never do anything more than return the first element.

Effect on ABI stability

Removing customization points is not an ABI-stable operation. The driver for this proposal is to do this before declaring ABI stability.

Effect on API resilience

N/A

Alternatives considered

Leave them in. Live with the slight code/performance impact in the case of map and forEach, and work around the issue when designing move-only types.


I have a collection wrapper which has concurrent implementations of map, filter, etc. If this is accepted such things won't be possible.

But that's basically the only reasonable alternative implementation of map I can think of. forEach can't be made concurrent because the documentation states that the elements are presented in a particular order.


This generally sounds important enough that it should be done regardless of compatibility impact. Would it be "actively harmful" for a feature to prevent the evolution of the language in the direction it needs to go?

Based on past experience, I'd have expected forEach to be an important customization point for tree-based collections, and I was fully prepared to skewer the performance argument against it. Surprisingly, my benchmarking indicates this is no longer the case at all!

With that argument gone, I can think of no reason to keep Sequence.forEach, Sequence.map, Collection.first, or BidirectionalCollection.last as customization points.

Boring benchmarking details follow; feel free to stop reading now. :relaxed:

The slowest tree-based collection I can think of is an enum-based binary tree:

enum BinaryTree<Element> {
  case empty
  indirect case node(value: Element, left: BinaryTree, right: BinaryTree)
}

This would not be very practical as a general-purpose data structure, but I think it's quite important in educational contexts. (I.e., people learning about algorithms need to be able to learn about trees in an approachable way, without having to deal with ManagedBuffer or isKnownUniquelyReferenced.)

We can add initializers to make it easier to create such trees from a list of values. We can also conform BinaryTree to Sequence, Collection etc. Implementation details are hidden behind the details toggles below.

Initializer for building a balanced search tree from a sorted sequence
extension BinaryTree {
  init<S: Sequence>(_ source: S) where S.Element == Element {
    let array = Array(source)
    self.init(array[array.indices])
  }

  init(_ source: [Element]) {
    self.init(source[source.indices])
  }

  init(_ source: ArraySlice<Element>) {
    if source.isEmpty {
      self = .empty
    } else {
      let middle = source.startIndex + source.count / 2
      self = .node(
        value: source[middle],
        left: BinaryTree(source.prefix(upTo: middle)),
        right: BinaryTree(source.suffix(from: middle + 1)))
    }
  }
}
Sequence conformance (inorder iterator)

This is the simplest definition I can think of for an iterator that does an inorder walk. It's not very elegant.

extension BinaryTree: Sequence {
  struct Iterator: IteratorProtocol {
    var stack: [BinaryTree]

    init(_ tree: BinaryTree) {
      stack = []
      push(tree)
    }

    mutating func push(_ tree: BinaryTree) {
      var tree = tree
      while case let .node(value: _, left: left, right: _) = tree {
        stack.append(tree)
        tree = left
      }
    }

    mutating func next() -> Element? {
      while true {
        guard let tree = stack.popLast() else { return nil }
        switch tree {
        case .empty:
          continue
        case let .node(value: value, left: _, right: right):
          push(right)
          return value
        }
      }
    }
  }

  func makeIterator() -> Iterator {
    return Iterator(self)
  }
}
Collection conformance

Sorry, this is left as an exercise for the reader. The trick is that indices have to contain a representation for a path inside the tree. Keep in mind that subscripting needs to have O(1) performance.

Specialized forEach implementation (inorder walk)
extension BinaryTree {
  func forEach(_ body: (Element) throws -> Void) rethrows {
    switch self {
    case .empty:
      return
    case let .node(value: value, left: left, right: right):
      try left.forEach(body)
      try body(value)
      try right.forEach(body)
    }
  }
}

Let's compare the performance of a for-in loop to a forEach call:

let tree = BinaryTree(0 ..< 10_000_000)

benchmark {
  for _ in 0 ..< 10 {
    var sum = 0
    tree.forEach { value in
      sum &+= value
    }
    precondition(sum == 49999995000000)
  }
}
// ⟹ elapsed time: 4.96s

benchmark {
  for _ in 0 ..< 10 {
    var sum = 0
    for value in tree {
      sum &+= value
    }
    precondition(sum == 49999995000000)
  }
}
// ⟹ elapsed time: 5.12s

The iterator is less than 4% slower. With unsafe trickery, I can speed up forEach by about 14%, widening the gap to 20% -- but even that is too small to argue for the customization point.


This is good to see. Thanks for benchmarking.

Even if it means leaving modest perf gains on the table, I would really rather avoid forEach being a backdoor performance tweak, for lots of reasons: it spreads an unhealthy "use forEach, it's faster" meme that is rarely true and will sometimes be a net loss, and, as we saw with its use in first(where:), trying hard to use it encourages really obfuscated code with unexpected downsides.

My hope is we get some kind of coroutine generator-based iteration some time in the future which could bring the same benefits you saw, but without having to use forEach to get them.


I have been thinking of a collection that "compresses" repeated occurrences of equal elements ("145838485 times foo, 3244 times bar...") and could provide an optimized map. But without pure functions, this won't even work, and it's probably better to do such caching in specialized methods.

I don't think map can be made concurrent either, tbh.

There is no rule that the closure to map cannot have side effects. In fact, the possibility of side effects is part of the reason it is eager by default; to get lazy behavior you have to ask for it explicitly, in which case the closure becomes @escaping.

While out-of-order calls to the closure are probably fair game, simultaneously calling the closure from multiple threads is probably not something you should be doing in a generic context that might not be expecting it. If something is going to execute in parallel, it's probably better to be explicit about it.
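
The eager/lazy distinction in concrete terms (a small illustration, not stdlib internals):

```swift
// Eager map runs its closure immediately, once per element, in order,
// so side effects are well-defined:
var log: [Int] = []
let doubled = [1, 2, 3].map { (x: Int) -> Int in
    log.append(x)  // side effect happens right away
    return x * 2
}
// log == [1, 2, 3], doubled == [2, 4, 6]

// Laziness must be requested explicitly; the closure then escapes and may
// run later (or repeatedly), which is why side effects become hazardous:
let lazyDoubled = [1, 2, 3].lazy.map { $0 * 2 }  // nothing computed yet
let firstResult = lazyDoubled.first              // computes only what's needed
```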


Hmmm, that's interesting, because the design I use is based on something @dabrahams suggested a few years back:

Though I support eliminating these customization points, I feel obliged to point out:

  • I think you want to use an explicit stack (i.e. an Array) to implement forEach, rather than recursion, if you want to observe the potential advantages of the customization point.
  • I am not sure that testing on the slowest tree data structure you can think of is in any way conclusive. Normally we decide on customization points based on whether they provide a significant advantage for the fastest representative data structures. Many of these, such as reserveCapacity, are motivated by Array.

Oh, and I consider this a cruel trick to play on the eager reader:

I, at least, was looking forward to seeing what you had written there ;-)


Handling parallel algorithms the same way as "lazy" actually requires that these methods aren't customization points. It's an important feature of that design that the lazy versions of those methods only "shadow" the non-lazy versions, rather than being dispatched to in a generic context where you have no idea that laziness is going into effect. That would be even more important with parallel versions.
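
The shadowing behavior can be seen directly (the generic helper here is invented for illustration):

```swift
// lazy's map only wins when the concrete lazy type is visible; generic
// code constrained to Sequence always resolves to the eager Sequence.map,
// which returns an Array.
func mappedGenerically<S: Sequence>(_ s: S) -> [S.Element] {
    return s.map { $0 }  // statically resolves to the eager map
}

let eager = [1, 2, 3].map { $0 * 2 }          // [Int]: work done immediately
let shadowed = [1, 2, 3].lazy.map { $0 * 2 }  // a lazy wrapper, no work yet
let forced = mappedGenerically(shadowed)      // eager again: [2, 4, 6]
```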


Ah right, that makes sense. Although I'm generally against that kind of "shadowing"...

Curious what impact this may have on Realm’s Collections if any

I don't think I would want to comment on specific frameworks, but as discussed in the review, the main reason these shouldn't be customization points is that there are no reasonable alternative implementations that are appropriate to expose in a generic context. I expect that reasoning would hold with any framework.


Can I ask what a "customization point" even refers to? Is this something internal to the compiler that the average end user developer wouldn't know or care about?
Thanks.

“Customization point” means one of two things:

Sometimes it refers to a protocol requirement in general, because conforming types get to customize the implementation.

More often though, it refers to a protocol requirement which has a default implementation, because conforming types can customize the implementation if they want to (but they don’t have to).

Both of the above are dynamically dispatched (because the method is defined within the body of the protocol). This contrasts with protocol extension methods (that are not part of the protocol body itself) which use static dispatch and so their behavior cannot be customized by conforming types.


That is understandable, and given this comment
// Sequence conformance for RLMArray and RLMResults is provided by RLMCollection's
// makeIterator() implementation.
I'm assuming that holds true for this framework as well.

Thank you. This thread now makes a lot more sense to me! :grinning:

FWIW, my usage is: things in the protocol are requirements, and requirements that also have a default implementation are customization points. This isn't official, it's just how I use them.
