Idea: mutatingForEach for Collections


(Cory Benfield) #1

With the introduction of the _modify accessor and its use in Swift 5, it has become possible to perform manipulations directly on the backing storage of a number of Collection data types.

This is extremely useful when a Collection is the primary storage location for mutable state, especially when the elements stored in the Collection are CoW-able value types, or when the act of obtaining the object from the backing storage may be costly.

As an example of this use-case, SwiftNIO is working on a pure-Swift HTTP/2 implementation that stores state for each HTTP/2 stream in a Dictionary. These state objects are mutated on the hot path, and as a result it would be very useful to avoid the overhead both of hashing the key twice, instead of once, as well as avoiding CoW operations should we add any CoW-able data types to the stream state.

While operating on a single element of a Collection in this way is straightforward, there is not an obvious spelling for doing this for all elements in a dictionary. for...in does not allow mutating the value of the iterator, forEach does not pass the elements inout, and all other operations produce a new Collection.

It seems to me that it might be nice to have something like a mutatingForEach, which is identical to forEach but passes the elements inout. This will allow using the _modify accessors on the subscript, where they are available, providing a potentially optimised path for bulk operations on Collections.

@lorentey has been a useful sounding board here, and pointed out that this is a sufficiently simple function that it does not need to be a customisation point. The implementation could simply be:

mutating func mutatingForEach(_ body: (inout Element) throws -> Void) rethrows {
    var index = self.startIndex
    while index != self.endIndex {
        try body(&self[index])
        self.formIndex(after: &index)
    }
}

The only thing I don't like about it is the name, so I propose that the name of this function is a great bike shed to paint.

Do people think this is worth a proposal? I'm happy to put one together if we think it's worthwhile.


(Karoy Lorentey) #2

+1, and I think transform would be a good name for this function.

dictionary.values.transform { v in v += 1 }

(Steve Canon) #3

Yeah, either that or something like modifyEach.


(Guillaume Lessard) #4

I like modifyEach. I don't like suggestion of transform much because it evokes map closures (e.g. map<T>(_ transform: (Element) -> T).


(Nate Cook) #5

This sounds like a useful addition! Note that Dictionary is troublesome as always — this method would need to be added in an extension to MutableCollection, but dictionaries themselves aren't mutable collections, only their Values sub-collections are. If you're okay with modifying the values independent of the keys, that's great, but if not, to solve your use case we might additionally need a modifyEachValueWithKey (I can come up with terrible names all day). That would look something like this:

extension Dictionary {
    mutating func modifyEachValueWithKey(
        _ body: (Key, inout Value) throws -> Void
    ) rethrows
}

(Johannes Weiss) #6

+1, that would be a great addition. I don't mind the name :slight_smile:


(Michael Pangburn) #7

This discussion has come up before. See In-place map for MutableCollection.
I'm in favor of the idea and don't have a strong opinion on its name.

Supporting this functionality at the language level has also come up:

for inout element in sequence {
    // ...
}

Both can exist (as for-in and forEach do), and I don't mean to imply that for-inout's implementation should preclude the inclusion of the proposed method—but it's relevant to discussion here.


(Cory Benfield) #8

Is this link right? You have just linked back to a post in this thread. :smile:


(Michael Pangburn) #9

Not sure how I managed that one—fixed, thanks!


(Cory Benfield) #10

Thanks @mpangburn

So I skimmed through the previous discussion and didn't find any particular reason for it to stall out, it just kinda...did. I can see two possible reasons for its demise:

  1. Insufficient time to pitch and implement. Only @Chris_Eidhof can speak to whether that's the case.
  2. Some uncertainty around @Joe_Groff's comment about the possibility of map being able to evolve this behaviour.

In either case I think the proposal as written has merit, and we should tackle it again.

As for for inout element in sequence, I like this substantially less because it's a much larger change to the language. The way Sequence is implemented does not seem to make it straightforward to hand the underlying storage to the loop body. In particular, there is no concept of MutableSequence. This means that for inout loops have different type requirements than for loops, either requiring a MutableCollection as the sequence (and then not involving the makeIterator path) or creating a new MutableSequence protocol that uses accessors in weird ways.

In either case this feels like a lot of effort for adding syntax to what is not a particularly common operation. So I think I'd want to constrain this to the much simpler model proposed here and in the previous thread.


(Joe Groff) #11

An eventual extension to the modify idea we have planned, which @John_McCall laid out in his ownership manifesto, is to move to using Python-style generators as the iteration primitive. Modeling iteration as a coroutine has the same benefits for multiple elements as read/modify coroutine give individual elements, since it naturally provides a context from which values can be borrowed in-place. If for loops became the interface for pumping generators then it would make sense to support for inout as well.

In the meantime, I think it makes sense to have an operation like this. As you noted, it doesn't really make sense for Sequence because there's no way to modify the underlying storage, but with a MutableCollection, this should be equivalent to iterating through the indices and passing each element inout in turn. That seems like the right place to put this as an extension method.


(Cory Benfield) #12

:+1: Strongly agreed: this seems like a really effective extension to the language.

Agreed that we shouldn't defer this enhancement to wait for that language extension though, so I'll go ahead and put together a proposal: it seems generally useful. I'm still happy to take naming suggestions from anyone with opinions (do we know anyone like that in the Swift community?), though I like modifyEach so far.


(Ben Cohen) #13

I'd like to push back on this a little bit. If we have a for &x in a kind of feature we're pretty confident is coming, we shouldn't pre-empt it in an ABI-stable library temporarily until we have it. The helper is easy for people to add themselves if they really feel it would simplify their code in the meantime.

I admit a bias here: I think Sequence.forEach is horrendous and if I had my way, would expunge it from the language (luckily for its fans, I won't have my way for source compat reasons if no other, and it's here to stay). So modeling a mutating version on it would be something I'd really rather not do except as the only and ultimate solution, rather than a stop-gap.


(Cory Benfield) #14

I think the core question is not “are we confident it’s coming”, but “when are we confident it will ship”?

If this is an idea we have conceptualised and know could be built, but do not plan to build in the relatively near future, I don’t think it hurts to provide a stopgap. After all, the odds of anything shipping early are always low!

“I love deadlines. I like the whooshing sound they make as they fly by.”


(Joe Groff) #15

IMO the ABI liability of one extension method (I don't see the pressing need for it to be a protocol requirement) seems minor, and unlike forEach, which I agree is pretty superfluous, this operation does in fact encapsulate a not-entirely-trivial code pattern you have to write otherwise, looping over collection.indices and modifying each &collection[i] in turn.


(Frank Swarbrick) #16

Even if we eventually have "for inout ... in ..." (which we should have), I would say as long as we have forEach we should have a mutable version. Mine is named updateEach and is similar to the originally suggested mutatingForEach, but uses a for/in over indices. Shorter. Don't know if it would perform better or worse than his version.

(Based on the next comment in this thread by @Nevin it sounds like this would perform worse. Please ignore! Except for the name, which I still like. The performance issue not being obvious does seem to justify its presence in stdlib, in any case!)

extension MutableCollection {
    mutating func updateEach(_ body: (inout Element) throws -> Void) rethrows {
        for index in indices {
            try body(&self[index])
        }
    }
}

I would imagine once for inout ... in ... is implemented it could look very similar to the forEach implementation:

extension MutableCollection {
    mutating func updateEach(_ body: (inout Element) throws -> Void) rethrows {
        for inout element in self {
            try body(&element)
        }
    }
}

#17

As the unofficial torch-bearer of the “Collection.indices is a potential performance trap” caravan, I would be remiss not to point out that the proper way to do this (as I described here in the previous thread) is:

var i = startIndex
while i != endIndex {
  try f(&self[i])        OR        self[i] = try f(self[i])
  formIndex(after: &i)
}

The “OR” there is because it is not obvious whether f should take its own argument inout. Taking it inout makes life easier when you have a collection-of-collections that you want to mutate in place (as I described here in the previous thread), but taking it non-inout lets you pass any (Element)->Element function as the transform.


(Cory Benfield) #18

That looks a lot like the code I originally proposed. :wink:

As to the inout, I think it is a substantial improvement to take the argument inout. This is specifically because it unlocks the use of the modify accessor, which provides a meaningful performance advantage.


(Cory Benfield) #20

I've put together a draft proposal, shown below. Please let me know what you think.

In-place modification of MutableCollection elements

Introduction

This proposal would add a modifyEach method to MutableCollection.

This feature would enable modifying the elements contained within a single MutableCollection, without creating a new MutableCollection to contain them the way map does. In many cases this can provide performance improvements by taking advantage of the _modify accessor available on many MutableCollections.

Motivation

Many Swift programs use MutableCollections to store substantial amounts of program state. In many cases Swift programs will endeavour to store their state as objects with value semantics, as this provides substantial rewards in terms of ensuring program correctness and performance.

However, if the primary copy of the program state is stored in the MutableCollection, updating that state can be expensive. If the user writes code that brings the value into temporary storage, that value will need to be copied out of the storage (invoking the getaccessor), mutated, and them copied back in to the storage (invoking the set accessor).

As an example, consider a simple task tracking application. We may model its main program state something like this:

struct TaskTracker {
    var tasks: [Task]
}

Imagine we wanted to implement a function that would mark all of the tasks as completed. How might we write such a function?

One natural approach would be to use map:

mutating func markAllComplete() {
    self.tasks = self.tasks.map {
        var newElement = $0
        newElement.complete = true
        return newElement
    }
}

This approach creates a whole new array, heap-allocates it, copies all the data from the old array to the new one (modifying it along the way), and then frees the old one. That's not ideal.

We could instead attempt something more performant: modify the array in place. A naive approach might be:

mutating func markAllComplete() {
    for index in self.tasks.indices {
        self.tasks[index].complete = true
    }
} 

While ths implementation is safe for Array, it has performance problems when written as a generic operation on MutableCollection, as indices may hold a reference to self.tasks, which will cause this operation to trigger a CoW operation unnecessarily.

The highest-performance implementation is this one:

mutating func markAllComplete() {
    var index = self.tasks.startIndex
    while index != self.tasks.endIndex {
        self.tasks[index].complete = true
        index = self.tasks.formIndex(after: &index)
    }
} 

This code is good in Swift 4, but in Swift 5 this final form is particularly powerful due to the increasingly widespread use of the _modify accessor. In this case, this code will be able to compile down to something very close to the equivalent code in C, manipulating the values directly in the underlying Array storage.

Unfortunately, this pattern is not a natural pattern for most Swift programmers to write. It is unlikely that the average Swift programmer will naturally solve their way to this implementation without having a relatively good grasp of not only how accessors work in Swift in general, but also the use of newer accessors in Swift 5.

Proposed solution

This proposal seeks to add a new method, modifyEach, to MutableCollection. This method would provide a generic implementation of the above pattern that seeks to push users towards a performant pattern of modifying elements in mutable collections.

This change would modify the above function to:

mutating func markAllComplete() {
    self.tasks.modifyEach { $0.complete = true }
} 

In addition to being a natural spelling of this kind of operation, it provides users with a common pattern for performing this kind of state mutating operation on all kinds of collections, including those that potentially have substantially more expensive access operations.

While the example above using Array is simple, in real-world usage the programs and state modification logic can be substantially more complex. Generally these more complex operations are described as mutating functions that need to be invoked on the objects stored in a MutableCollection. modifyEach provides easy hooks for performing arbitrary mutation of stored objects, while pushing users towards invoking functions either on inout references to state or as mutating member functions on the stored objects. In either case, this plays much more nicely with the _modify accessor than more naive mutation approaches.

This proposal does not propose to make modifyEach a customisation point. In practice there is only one reasonable implementation that will work across all MutableCollection objects with maximum performance, snd there is no reason to allow MutableCollections to override that behaviour.

Detailed design

The new function is short and clear:

extension MutableCollection {
    /// Calls the given closure on each element in the collection in the same order as a `for-in` loop.
    ///
    /// The `modifyEach` method provides a mechanism for modifying all of the contained elements in a `MutableCollection`. It differs
    /// from `forEach` or `for-in` by providing the contained elements as `inout` parameters to the closure `body`. In some cases this
    /// will allow the parameters to be modified in-place in the collection, without needing to copy them or allocate a new collection.
    ///
    /// - parameters:
    ///    - body: A closure that takes each element of the sequence as an `inout` parameter
    @inlinable
    mutating func modifyEach(_ body: (inout Element) throws -> Void) rethrows {
        var index = self.startIndex
        while index != self.endIndex {
            try body(&self[index])
            self.formIndex(after: &index)
        }
    }
}

Source compatibility

This change provides no source compatibility impact. It requires no new syntax from the language and could easily have been implemented in Swift 4, though it provides better performance in Swift 5.

Effect on ABI stability

This change does not affect the stable ABI.

Effect on API resilience

This change adds to the declared API of MutableCollection with minimal ABI impact. In particular, it does not break the ABI of MutableCollection.

As the body of the method relies entirely on declared parts of the MutableCollection protocol, it is safe to make this method non-resilient, as it is valid in all current versions of Swift. Programs that do inline this implementation will continue to be correct as long as the MutableCollection protocol does not fundamentally change.

Alternatives considered

This change was originally proposed in March of 2018 by Chris Eidhof. This proposal is spiritually identical to Chris' identical proposal: as such, he has been identified as a co-author of this proposal.

Indices

Chris' original proposal, as well as several suggestions in the new thread, proposed using indices instead of formIndex. As discussed above, indices can cause an unnecessary CoW operation in some cases, as it may hold a reference to the original MutableCollection that cannot be elided.

For this reason, it is preferable to use formIndex instead.

Inout for loops

As part of the same ownership manifesto that led to the addition of the _modify accessor, John McCall discussed the possibility of using Python-style generators for iteration in a future version of Swift.

The sample code from the manifesto bears a striking similarity to the code proposed in this document:

  mutating generator iterateMutable() -> inout Element {
    var i = startIndex, e = endIndex
    while i != e {
      yield &self[i]
      self.formIndex(after: &i)
    }
  }

As this generator construct could be used to provide iteration à la Python, a logical extension to the for-in syntax would be for inout x in y, allowing the mutation of the elements produced by the generator.

This language extension would be substantially nicer than the modifyEach function provided here. In particular, it integrates much better with features elsewhere in the langage, avoids the potential performance pitfalls of the widespread use of closures, and looks altogether more "Swifty".

However, at this stage such a feature is unlikely to land in Swift 5, meaning that its public release will be at least a year away, even if it is implemented on the most aggressive of schedules. For this reason, and considering the low ABI, API, and source compatibility impact of adding modifyEach to MutableCollection, the authors consider it worthwhile to add modifyEach to provide a worse version of for inout until that language feature arrives.

At the time that this language feature arrives, the Swift community should consider whether modifyEach (and its non-modifying cousin, Sequence.forEach) should be formally deprecated in favour of the loop constructs. However, this should be part of a wider discussion about the use of generators for iteration, and is outside the scope of this proposal.


(Paul Cantrell) #21

I’d have found this useful on several occasions — far more useful than forEach. I believe I’ve called it mapInPlace, as the prior discussion proposed. But modifyEach also strikes me just fine.

I’d also very much like to have a filterInPlace to go with it. The implementation of that method allows nice optimizations for specific collection types that are difficult to express generally, and even the simple array-only efficient implementation is easy to mess up.