Trick to modify several properties at a time without copying

zienag · February 19, 2020, 12:23am

Hi, Swift users!
I want to share a trick that I came up with recently and that I find interesting and useful in some cases.

Setting

Let's say we have some state and two functions that each operate on and modify a subset of it:

struct State {
  var foo: [Foo]
  var bar: [Bar]
  var baz: [Baz]
}
func foobazer(data: inout ([Foo], [Baz])) { ... }
func barbazer(data: inout ([Bar], [Baz])) { ... }

The most straightforward way to call those functions (from now on, assume that we have var state: State declared somewhere visible to the current scope):

var foobazData = (state.foo, state.baz)
foobazer(data: &foobazData)
state.foo = foobazData.0
state.baz = foobazData.1

But that looks a little messy, so we can extract this into a computed property (as far as I know, this is called a lens in functional programming):

extension State {
  var foobaz: ([Foo], [Baz]) {
    get { (foo, baz) }
    set { foo = newValue.0 ; baz = newValue.1 }
  }
  var barbaz: ([Bar], [Baz]) { ... }
}

And call foobazer and barbazer like this:

foobazer(data: &state.foobaz)
barbazer(data: &state.barbaz)

Good!
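
For anyone who wants to run the setup so far, here is a minimal self-contained sketch. The typealiases for Foo, Bar and Baz, the bodies of foobazer and barbazer, and the body of barbaz (written to mirror foobaz) are assumptions made up for illustration; the post itself leaves them unspecified.

```swift
// Placeholder element types (assumption: the post does not define Foo, Bar, Baz).
typealias Foo = Int
typealias Bar = String
typealias Baz = Double

struct State {
  var foo: [Foo]
  var bar: [Bar]
  var baz: [Baz]
}

extension State {
  var foobaz: ([Foo], [Baz]) {
    get { (foo, baz) }
    set { foo = newValue.0; baz = newValue.1 }
  }
  // Assumed to mirror foobaz; the original elides this body.
  var barbaz: ([Bar], [Baz]) {
    get { (bar, baz) }
    set { bar = newValue.0; baz = newValue.1 }
  }
}

// Hypothetical bodies, only here so there is something to observe.
func foobazer(data: inout ([Foo], [Baz])) {
  data.0.append(1)
  data.1.append(1.5)
}
func barbazer(data: inout ([Bar], [Baz])) {
  data.0.append("x")
  data.1.append(2.5)
}

var state = State(foo: [], bar: [], baz: [])
foobazer(data: &state.foobaz)
barbazer(data: &state.barbaz)
print(state.foo, state.bar, state.baz) // prints: [1] ["x"] [1.5, 2.5]
```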

The problem

Now, the interesting part. If foobazer or barbazer modifies the array of Foo, Bar or Baz, copy-on-write kicks in and the array is copied, because the outer state still holds a reference to it. That's wasteful: the old copy, owned by the outer state, is discarded right after the function returns. For exactly this purpose Swift has the _modify accessor and the yield keyword, which are currently going through the pitch phase (Modify Accessors) to become an official feature and lose the leading underscore. The problem is that yield accepts only one value, and you cannot put inout values into a tuple:

  var foobaz: ([Foo], [Baz]) {
    get { (foo, baz) }
    _modify {
      yield &(foo, baz) // will not work
      yield (&foo, &baz) // nope
      yield &(&foo, &baz) // still not, no matter how many & you put
      var copy = (foo, baz)
      yield &copy // will work
      foo = copy.0; baz = copy.1 // but, again, will trigger copying
    }
  }
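
One way to observe the copy described in this section is to compare the address of the array's element storage before and after the call. This is only a sketch: storageAddress is a hypothetical helper, the element types are plain Int as a placeholder, and foobazer has an invented body. The array is built at run time rather than from a literal because, as far as I understand, an optimized build may place literal arrays in static storage, which would be copied on mutation regardless.

```swift
struct State {
  var foo: [Int]
  var baz: [Int]
  // Plain get/set version: the getter hands out a second reference to each buffer.
  var foobaz: ([Int], [Int]) {
    get { (foo, baz) }
    set { foo = newValue.0; baz = newValue.1 }
  }
}

// Hypothetical helper: address of the array's element storage as an integer.
func storageAddress(_ array: [Int]) -> Int {
  array.withUnsafeBufferPointer { Int(bitPattern: $0.baseAddress) }
}

// Hypothetical body: an in-place write, which needs a uniquely
// referenced buffer to avoid triggering copy-on-write.
func foobazer(data: inout ([Int], [Int])) {
  data.0[0] = 42
}

var state = State(foo: Array(1...3), baz: Array(4...6))
let before = storageAddress(state.foo)
foobazer(data: &state.foobaz)
let after = storageAddress(state.foo)

// If the buffer was copied during the call, the two addresses differ.
print("same storage:", before == after)
```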

The solution

So we somehow need to take ownership away from the outer State. How can we do that? What if we write a temporary value into State and then, when the modification is done, replace it with the new, shiny value? Let's try:

  var foobaz: ([Foo], [Baz]) {
    get { (foo, baz) }
    _modify {
      // the internal array buffers are not copied here, thanks to copy-on-write
      var copy = (foo, baz)
      // temporarily set dummy values to take ownership away from State
      foo = [] ; baz = []
      yield &copy  // modification happens
      foo = copy.0; baz = copy.1 // restoring values
    }
  }
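
Here is a runnable sketch of the dummy-value approach, using Int elements as a placeholder and an invented body for foobazer. The storageAddress helper is hypothetical; it is only there to compare the element storage address before and after the call. It relies on the underscored _modify accessor, so it is not guaranteed to keep compiling in future toolchains.

```swift
struct State {
  var foo: [Int]
  var baz: [Int]
  var foobaz: ([Int], [Int]) {
    get { (foo, baz) }
    _modify {
      // Take a second reference to each buffer (no element copy yet).
      var copy = (foo, baz)
      // Drop State's references so `copy` is the only owner.
      foo = []; baz = []
      yield &copy
      // Hand the (possibly modified) arrays back to State.
      foo = copy.0; baz = copy.1
    }
  }
}

// Hypothetical helper: address of the array's element storage as an integer.
func storageAddress(_ array: [Int]) -> Int {
  array.withUnsafeBufferPointer { Int(bitPattern: $0.baseAddress) }
}

// Hypothetical body: an in-place write into the first array.
func foobazer(data: inout ([Int], [Int])) {
  data.0[0] = 42
}

// Built at run time on purpose; literal arrays may end up in static storage.
var state = State(foo: Array(1...3), baz: Array(4...6))
let before = storageAddress(state.foo)
foobazer(data: &state.foobaz)
let after = storageAddress(state.foo)

// If no copy was made during the call, the two addresses are equal.
print("same storage:", before == after)
```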

And that works! We took ownership away from State, so when foobazer modifies the array it is uniquely referenced and not copied. One remaining issue: the dummy value is kind of strange, and for some types no such dummy value exists. We can move to the unsafe world and temporarily deinitialize the variables:

  var foobaz: ([Foo], [Baz]) {
    get { (foo, baz) }
    _modify {
      var copy = (foo, baz)
      withUnsafeMutablePointer(to: &foo) {
        _ = $0.deinitialize(count: 1)
      }
      withUnsafeMutablePointer(to: &baz) {
        _ = $0.deinitialize(count: 1)
      }
      yield &copy
      withUnsafeMutablePointer(to: &foo) {
        $0.initialize(to: copy.0)
      }
      withUnsafeMutablePointer(to: &baz) {
        $0.initialize(to: copy.1)
      }
    }
  }

We can define some helper functions to make things cleaner:

// just like in c++ 🙈
func unsafeMove<T>(_ val: inout T) -> T {
  withUnsafeMutablePointer(to: &val) { $0.move() }
}
func unsafeInitialize<T>(_ val: inout T, with source: T) {
  withUnsafeMutablePointer(to: &val) { $0.initialize(to: source) }
}

And the final solution:

  var foobaz: ([Foo], [Baz]) {
    get { (foo, baz) }
    _modify {
      var copy = (unsafeMove(&foo), unsafeMove(&baz))
      yield &copy
      unsafeInitialize(&foo, with: copy.0)
      unsafeInitialize(&baz, with: copy.1)
    }
  }

Afterword

What still bothers me is that we could, theoretically, move the initialized memory of copy into the uninitialized foo and baz, which would reduce retain/release calls and overall copying for gigantic structures. But I couldn't find a way to do it yet; maybe someone could help me? And what do you think about all this?
Thanks for reading!

zienag · February 21, 2020, 1:25pm

Small update: it looks like if the modifying function throws, the statements after yield are not executed, and the object ends up in an invalid state. We need to put the initialization in a defer block:

  var foobaz: ([Foo], [Baz]) {
    get { (foo, baz) }
    _modify {
      var copy = (unsafeMove(&foo), unsafeMove(&baz))
      defer {
        unsafeInitialize(&foo, with: copy.0)
        unsafeInitialize(&baz, with: copy.1)
      }
      yield &copy
    }
  }
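
To check the throwing case, here is a self-contained sketch that combines the helper functions from the first post with the defer version. ExampleError and throwingFoobazer are hypothetical names made up for this test, and Int stands in for the element types. Note that the helpers leave foo and baz uninitialized for the duration of the access, so this relies on nothing else touching them in the meantime.

```swift
func unsafeMove<T>(_ val: inout T) -> T {
  withUnsafeMutablePointer(to: &val) { $0.move() }
}
func unsafeInitialize<T>(_ val: inout T, with source: T) {
  withUnsafeMutablePointer(to: &val) { $0.initialize(to: source) }
}

struct State {
  var foo: [Int]
  var baz: [Int]
  var foobaz: ([Int], [Int]) {
    get { (foo, baz) }
    _modify {
      var copy = (unsafeMove(&foo), unsafeMove(&baz))
      // Runs on both the normal and the throwing path.
      defer {
        unsafeInitialize(&foo, with: copy.0)
        unsafeInitialize(&baz, with: copy.1)
      }
      yield &copy
    }
  }
}

// Hypothetical error type and modifying function for the test.
struct ExampleError: Error {}

func throwingFoobazer(data: inout ([Int], [Int])) throws {
  data.0.append(1)      // a partial modification...
  throw ExampleError()  // ...followed by a failure
}

var state = State(foo: [0], baz: [10])
do {
  try throwingFoobazer(data: &state.foobaz)
} catch {
  print("caught:", error)
}

// The state is valid again and keeps the partial modification.
print(state.foo, state.baz) // prints: [0, 1] [10]
```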

zienag · February 21, 2020, 1:51pm

@mbrandonw you might be interested in this :) In your series about reducers, there is an argument that inout parameters are efficient because no copy occurs. That's not the case if some "substate" is constructed from other values, and the above approach can help with that.