Idea: mutatingForEach for Collections

Chris_Eidhof · December 6, 2018, 6:19am

I also still find it useful. I like the argument (I think it was by Stephen?) for not calling it map, as this method doesn't change the type (only the values). Thank you for including me as an author.

To improve the proposal further, I'd suggest having more/different examples. In my experience, people find it hard to imagine how they could use something new (especially when they've never used it before). Some good examples could help there. WDYT?

lukasa · December 6, 2018, 7:50am

I am not convinced that filterInPlace provides as substantial a benefit. filterInPlace requires a deeper understanding of the semantics of the specific Collection, as the _modify accessor can no longer be intelligently used: we will likely need to re-arrange the underlying storage in the Collection. This would require it ultimately to be a customisation point, and I think @ben-cohen is hard enough to convince as it is: trying to make the case that we should add a customisation point to implement this is a wildly harder sell that I think belongs in a different proposal.

I think that's an excellent idea. I'll try to come up with a few more that aren't too specific to my use-case.

Chris_Eidhof · December 6, 2018, 1:06pm

Let me know when you think it's ready, I'm happy to go over it once as well and give some feedback before it's a real proposal.

Joe_Groff · December 6, 2018, 3:01pm

filterInPlace is removeAll(where:). It still has the benefit of not allocating a new collection, and RangeReplaceableCollection provides the underlying customization points necessary.

lukasa · December 6, 2018, 3:02pm

Perfect, let's still not build filterInPlace.

Joe_Groff · December 6, 2018, 3:05pm

Too late!

lukasa · December 6, 2018, 3:06pm

Dang! I guess I should propose to rename removeAll(where:) to filterInPlace then.

Paul_Cantrell · December 6, 2018, 3:14pm

Huh, I guess I should pay more attention to this “documentation” people keep talking about!

lukasa · December 6, 2018, 4:14pm

Here is a new draft proposal, containing the two most broadly-relevant examples I could think of.

In-place modification of MutableCollection elements

Proposal: SE-NNNN
Authors: Cory Benfield, Chris Eidhof
Review Manager: TBD
Status: Awaiting implementation

Introduction

This proposal would add a modifyEach method to MutableCollection.

This feature would enable modifying the elements contained within a single MutableCollection, without creating a new MutableCollection to contain them the way map does. In many cases this can provide performance improvements by taking advantage of the _modify accessor available on many MutableCollections.

Motivation

Many Swift programs use MutableCollections to store substantial amounts of program state. In many cases Swift programs will endeavour to store their state as objects with value semantics, as this provides substantial rewards in terms of ensuring program correctness and performance.

However, if the primary copy of the program state is stored in the MutableCollection, updating that state can be expensive. If the user writes code that brings the value into temporary storage, that value will need to be copied out of the storage (invoking the get accessor), mutated, and them copied back in to the storage (invoking the set accessor).

As an example, consider a simple task tracking application. We may model its main program state something like this:

struct TaskTracker {
    var tasks: [Task]
}

Imagine we wanted to implement a function that would mark all of the tasks as completed. How might we write such a function?

One natural approach would be to use map:

mutating func markAllComplete() {
    self.tasks = self.tasks.map {
        var newElement = $0
        newElement.complete = true
        return newElement
    }
}

This approach creates a whole new array, heap-allocates it, copies all the data from the old array to the new one (modifying it along the way), and then frees the old one. That's not ideal.

We could instead attempt something more performant: modify the array in place. A naive approach might be:

mutating func markAllComplete() {
    for index in self.tasks.indices {
        self.tasks[index].complete = true
    }
}

While ths implementation is safe for Array, it has performance problems when written as a generic operation on MutableCollection, as indices may hold a reference to self.tasks, which will cause this operation to trigger a CoW operation unnecessarily.

The highest-performance implementation is this one:

mutating func markAllComplete() {
    var index = self.tasks.startIndex
    while index != self.tasks.endIndex {
        self.tasks[index].complete = true
        index = self.tasks.formIndex(after: &index)
    }
}

This code is good in Swift 4, but in Swift 5 this final form is particularly powerful due to the increasingly widespread use of the _modify accessor. In this case, this code will be able to compile down to something very close to the equivalent code in C, manipulating the values directly in the underlying Array storage.

Unfortunately, this pattern is not a natural pattern for most Swift programmers to write. It is unlikely that the average Swift programmer will naturally solve their way to this implementation without having a relatively good grasp of not only how accessors work in Swift in general, but also the use of newer accessors in Swift 5.

Proposed solution

This proposal seeks to add a new method, modifyEach, to MutableCollection. This method would provide a generic implementation of the above pattern that seeks to push users towards a performant pattern of modifying elements in mutable collections.

This change would modify the above function to:

mutating func markAllComplete() {
    self.tasks.modifyEach { $0.complete = true }
}

In addition to being a natural spelling of this kind of operation, it provides users with a common pattern for performing this kind of state mutating operation on all kinds of collections, including those that potentially have substantially more expensive access operations.

While the example above using Array is simple, in real-world usage the programs and state modification logic can be substantially more complex. Generally these more complex operations are described as mutating functions that need to be invoked on the objects stored in a MutableCollection. modifyEach provides easy hooks for performing arbitrary mutation of stored objects, while pushing users towards invoking functions either on inout references to state or as mutating member functions on the stored objects. In either case, this plays much more nicely with the _modify accessor than more naive mutation approaches.

This proposal does not propose to make modifyEach a customisation point. In practice there is only one reasonable implementation that will work across all MutableCollection objects with maximum performance, snd there is no reason to allow MutableCollections to override that behaviour.

Detailed design

The new function is short and clear:

extension MutableCollection {
    /// Calls the given closure on each element in the collection in the same order as a `for-in` loop.
    ///
    /// The `modifyEach` method provides a mechanism for modifying all of the contained elements in a `MutableCollection`. It differs
    /// from `forEach` or `for-in` by providing the contained elements as `inout` parameters to the closure `body`. In some cases this
    /// will allow the parameters to be modified in-place in the collection, without needing to copy them or allocate a new collection.
    ///
    /// - parameters:
    ///    - body: A closure that takes each element of the sequence as an `inout` parameter
    @inlinable
    mutating func modifyEach(_ body: (inout Element) throws -> Void) rethrows {
        var index = self.startIndex
        while index != self.endIndex {
            try body(&self[index])
            self.formIndex(after: &index)
        }
    }
}

Examples

Composite Data Structures

Dictionary is commonly used as a form of "lookup table", where the nature of the keying is known ahead of time but the number of values that need to be keyed is not. It is useful to be able to store value types in these dictionaries in order to gain all of the correctness and performance properties of value types.

As an example, consider a hypothetical multiplayer video game that stores events received from the network for all of the other players. This may be stored in a dictionary whose values are arrays:

var events: [PlayerID: [GameEvent]]

In some cases the game state may change in a way that invalidates a whole swathe of events. For example, these events may concern a game object that no longer exists. In Swift today, the natural way to write this is:

events = events.mapValues { $0.filter { !$0.target == droppedTarget } }

The performance of this is fairly admirable in Swift, as it avoids rehashing all of the keys in the dictionary, but it ends up consuming a substantial amount of memory. We need to allocate an intermediate dictionary, and then each filter call will not only allocate a new Array but will potentially reallocate that array many times as it discovers it needs to store new elements.

With modifyEach, we can achieve an almost equally nice representation:

events.modifyEach { $0.removeAll { $0.target == droppedTarget } }

However, this provides substantial performance improvements. Not only does modifyEach modify the values in-place, avoiding the need for an extra dictionary allocation, but removeAll(where:) also modifies the underlying arrays in place, avoiding their extra allocations. This reduces the memory traffic of this operation substantially, as well as being just as clear as the original proposal.

Efficient Buffer Transformations

It is occasionally necessary to transform large buffers of binary data in some way. For example, the WebSockets network protocol contains a provision for "masking" data by means of XORing the frame contents with a 4-byte field.

Implementing this with Data or Array today, a number of patterns are possible. A combination of enumerate and map can be used:

let maskingKey: [UInt8] = [1, 2, 3, 4]
let data = Array("Hello, world!".utf8)

let maskedData = data.enumerated().map { $0.1 ^ maskingKey[$0.0 % 4] }

Alternatively, a for loop can be used:

let maskingKey: [UInt8] = [1, 2, 3, 4]
let data = Array("Hello, world!".utf8)

var keyIndex = maskingKey.startIndex
var newData = Array<UInt8>()
newData.reserveCapacity(data.count)

for byte in data {
    newData.append(byte ^ maskingKey[keyIndex])
    maskingKey.formIndex(after: &keyIndex)
    if keyIndex == maskingKey.endIndex {
        keyIndex = maskingKey.startIndex
    }
}

Both of these implementations bear a cost, however, which is that they are memory-hungry, requiring twice the storage of data to perform their operation. This is an unnecessary cost.

To avoid this cost, we can use modifyEach to modify the elements in place.

let maskingKey: [UInt8] = [1, 2, 3, 4]
var data = Array("Hello, world!".utf8)

var keyIndex = maskingKey.startIndex
data.modifyEach {
    $0 ^= maskingKey[keyIndex]
    maskingKey.formIndex(after: &keyIndex)
    if keyIndex == maskingKey.endIndex {
        keyIndex = maskingKey.startIndex
    }
}

The advantage of this is particularly noticeable for large buffers, where the efficiency gains of mutating the buffer in place are highly valuable.

Source compatibility

This change provides no source compatibility impact. It requires no new syntax from the language and could easily have been implemented in Swift 4, though it provides better performance in Swift 5.

Effect on ABI stability

This change does not affect the stable ABI.

Effect on API resilience

This change adds to the declared API of MutableCollection with minimal ABI impact. In particular, it does not break the ABI of MutableCollection.

As the body of the method relies entirely on declared parts of the MutableCollection protocol, it is safe to make this method non-resilient, as it is valid in all current versions of Swift. Programs that do inline this implementation will continue to be correct as long as the MutableCollection protocol does not fundamentally change.

Alternatives considered

This change was originally proposed in March of 2018 by Chris Eidhof. This proposal is spiritually identical to Chris' identical proposal: as such, he has been identified as a co-author of this proposal.

Indices

Chris' original proposal, as well as several suggestions in the new thread, proposed using indices instead of formIndex. As discussed above, indices can cause an unnecessary CoW operation in some cases, as it may hold a reference to the original MutableCollection that cannot be elided.

For this reason, it is preferable to use formIndex instead.

Inout for loops

As part of the same ownership manifesto that led to the addition of the _modify accessor, John McCall discussed the possibility of using Python-style generators for iteration in a future version of Swift.

The sample code from the manifesto bears a striking similarity to the code proposed in this document:

  mutating generator iterateMutable() -> inout Element {
    var i = startIndex, e = endIndex
    while i != e {
      yield &self[i]
      self.formIndex(after: &i)
    }
  }

As this generator construct could be used to provide iteration à la Python, a logical extension to the for-in syntax would be for inout x in y, allowing the mutation of the elements produced by the generator.

This language extension would be substantially nicer than the modifyEach function provided here. In particular, it integrates much better with features elsewhere in the langage, avoids the potential performance pitfalls of the widespread use of closures, and looks altogether more "Swifty".

However, at this stage such a feature is unlikely to land in Swift 5, meaning that its public release will be at least a year away, even if it is implemented on the most aggressive of schedules. For this reason, and considering the low ABI, API, and source compatibility impact of adding modifyEach to MutableCollection, the authors consider it worthwhile to add modifyEach to provide a worse version of for inout until that language feature arrives.

At the time that this language feature arrives, the Swift community should consider whether modifyEach (and its non-modifying cousin, Sequence.forEach) should be formally deprecated in favour of the loop constructs. However, this should be part of a wider discussion about the use of generators for iteration, and is outside the scope of this proposal.

jrose · December 6, 2018, 4:26pm

Arguably mapInPlace is the one that takes an (Element) -> Element closure, while modifyEach takes (inout Element) -> Void.

lukasa · December 6, 2018, 4:34pm

This proposal covers only modifyEach and does not propose to add a mapInPlace.

jarod · December 6, 2018, 7:45pm

I have an extension for this in my own code -- I think it's useful. I've called it mutateEach, which is nice from a consistency perspective, but I like the proposed modifyEach as well.

Nevin · December 6, 2018, 8:29pm

This looks fantastic!

I especially appreciate the in-depth discussion of optimization opportunities that are enabled by this proposal, but easy to miss without it.

Perhaps we ought to consider a diagnostic that recognizes the inefficient pattern “x = x.map” and suggests “x.mutateEach” instead.

IanPartridge · December 6, 2018, 8:46pm

I think this is a useful addition. To help paint the bikeshed I would point out that the signatures of forEach() and modifyEach() are of course very similar:

func forEach(_ body: (Element) throws -> Void) rethrows

mutating func modifyEach(_ body: (inout Element) throws -> Void) rethrows

As forEach() is the incumbent the new method should work alongside it, including in code completion. As a programmer I would like code completion to offer me the two choices side by side so I can think about which is appropriate for what I'm trying to achieve.

Hence, I think better names might be forEachInPlace(), forEachMutating() or forEachInout().

jawbroken · December 7, 2018, 12:46am

I'm not sure that “methods with similar signatures should have a common prefix for code completion reasons” is a good general naming guideline, and modifyEach seems like a clearer name than any of those alternatives.

xwu · December 7, 2018, 4:39am

We actually have an official API naming guideline for these pairs. People like to disparage it, but nonetheless it exists--and that is to use the word "form," as in union and formUnion, or index and formIndex. Here, the guideline would make the mutating version of forEach either formForEach or formEach.

Nevin · December 7, 2018, 5:53am

That is incorrect.

Here’s what the API naming guidelines actually say about the form prefix:

The existing “forEach” is not a noun, so the “form” rule does not apply.

• • •

The guidelines also say:

It is entirely reasonable to say the proposed “modifyEach” is consistent with “forEach”.

Moreover, forEach is clearly an exception to this entire section of the guidelines, because it is neither a verb phrase nor a noun phrase:

Therefore the relevant naming rules are in fact the highest ones:

jawbroken · December 7, 2018, 10:07am

To add to what @Nevin wrote, I want to push back on this idea that this is a mutating version of forEach. This is a really more a mutating version of map (it's not a perfect match because the return type of map is Array for various practical reasons, the transform is inout instead of Element -> T), and is only really related to forEach in the same loose fashion that any method that needs to loop over all the elements of a Sequence/Collection is related to forEach.

orobio · December 7, 2018, 12:48pm

I think this is a great addition. Only thing that bothers me about modifyEach is that it implies that each element will be modified, which of course doesn’t have to be the case. Therefore I’d suggest inoutForEach.

mmkider · December 7, 2018, 1:58pm

I’d suggest forEachReference.