In-place mutation of an enum associated value

DevAndArtist · April 7, 2018, 12:38pm

This is a very interesting scenario @Helge_Hess1 brought up on slack recently. Structs allow in-place mutation of their stored properties, while an enum which doesn't have any stored properties may have associated values which may require in-place mutation. The example was fairly simple. If an associated enum value is an array and that array must be modified by adding new values quite frequently, then we will end up in copying every array for each mutation which can potentially result in overall bad performance. To workaround the issue he added a case without any associated values which will be used to release the reference to the array buffer from self of an enum after another reference was created to keep the buffer alive, then modify the array in-place and re-assign the old case with the mutated array back to self again. However this isn't ideal at all because it requires an extra case to be added to the enum, which from the API design perspective make no sense at all.

Here is an example:

struct MyArray<Element> : CustomStringConvertible {
  
  class Storage<T> {
    var values = ContiguousArray<T>()
  }
  
  var storage = Storage<Element>()
  
  init<T: Sequence>(contentsOf s: T) where T.Element == Element {
    for item in s { storage.values.append(item) }
  }
  
  mutating func append(_ value: Element) {
    if isKnownUniquelyReferenced(&storage) {
      print("KEEP STORAGE")
      storage.values.append(value)
    }
    else {
      print("COPY STORAGE")
      let old = storage.values
      storage = Storage<Element>()
      storage.values.append(contentsOf: old)
    }
  }
  
  var description : String { return storage.values.description }
}

enum Values {
  case release
  case string(String)
  case array ([String])
  case myarray (MyArray<String>)
}

extension Values {
  mutating func add(_ item: String) -> Bool {
    guard case .array(var array) = self else { return false }
    self = .release
    array.append(item)
    self = .array(array)
    return true
  }
  mutating func madd(_ item: String) -> Bool {
    guard case .myarray(var array) = self else { return false }
    self = .release // make sure the RC of `array` drops to 1
    array.append(item)
    self = .myarray(array)
    return true
  }
}

var values = [ Values ]()

values.append(.array(["hallo"]))
_ = values[0].add("World")
print(values[0])

values.append(.myarray(MyArray(contentsOf: ["hallo"])))
_ = values[1].madd("World")
print(values[1])

What do you think? Should there be a different better solution for that particular problem?

zoul · April 7, 2018, 1:33pm

I welcome every improvement that brings enums closer to feature parity with structures, if reasonably possible. It’s a shame that such a useful tool has so many downsides in Swift compared to structs.

hamishknight · April 16, 2018, 9:52pm

There are a couple of alternatives to doing that, the simplest of which would be overwriting self with the associated value filled in with some default value (though granted this isn't always a practical).

For example, with arrays, you could temporarily overwrite with an empty array:

  @discardableResult
  mutating func add(_ item: String) -> Bool {
    guard case .array(var array) = self else { return false }
    self = .array([])
    array.append(item)
    self = .array(array)
    return true
  }

Another option would be to deinitialise then reinitialise the memory manually, which will work with arbitrary associated values:

  @discardableResult
  mutating func madd(_ item: String) -> Bool {
    guard case .myarray(var array) = self else { return false }
    withUnsafeMutablePointer(to: &self) { ptr in
      ptr.deinitialize(count: 1)
      array.append(item)
      ptr.initialize(to: .myarray(array))
    }
    return true
  }

(granted, we are relying on the fact that &self is addressable in order to ensure the associated value is uniquely referenced)

But neither of these are satisfactory – I definitely think that the language should provide some way to conditionally mutate an associated value in-place.

Slava_Pestov · April 17, 2018, 4:29am

It would be possible to design a solution where you can pattern match on an enum and capture the associated value as an inout value. This would also require rethinking how indirect enum cases work. Right now the indirect box is essentially passed as a reference type, and the fact that it is immutable preserves value semantics. If the associated value of an enum could be mutated, we would need to perform copy-on-write when projecting the contents of the box instead.

technogen · March 7, 2020, 7:04am

Has there been any developments in this area? I’ve been running into this limitation very often, so I’d be happy to help make this solution a reality.

MutatingFunk · March 7, 2020, 10:32am

There are frameworks available now which are managing to fill this gap via reflection, one prominent example being EnumKit: Access an associated value · gringoireDM/EnumKit Wiki · GitHub

ZPedro · November 5, 2020, 1:39pm

Definitely throw my vote in with finding a way to mutate associated values in-place, especially one that integrates with indirect: my current Swift project has an iterator state that is best expressed as a tree, and while the code currently works using reference type for nodes, my performance is getting killed by swift_retain/release overhead, e.g. when I get a reference to a node solely to read some info and maybe place it elsewhere, and be done with that reference.

I could consider unmanaged references, but that is not very idiomatic; instead, inspired by Julia, I thought about using value types instead, so the compiler would know there can never be more than one reference. But the compiler copying the value instead is too great, if not unavoidable, if I don't have a way to express in-place modification.

Another advantage of switching to value types would be making copying the iterator state trivial, for my future multi-threaded/gcd improvement through work-stealing.

Finally, I encountered a bug in my latest refactor that was due to me forgetting to write back the enumeration in some cases, which I worked around with a defer block that does it no matter what, but this is a bit of a silly workaround for a problem that shouldn't exist in the first place.

Dev1an · June 15, 2021, 8:58am

I ~~stumbled~~ on ~~the~~ ~~same~~ ~~problem~~ ~~and~~ I am ~~still~~ ~~wondering~~:
If ~~Swift~~'s Optional is an enum ~~with~~ ~~associated~~ ~~value~~, ~~then~~ ~~how~~ ~~does~~ ~~something~~ ~~like~~ ~~the~~ ~~following~~ ~~code~~ ~~snippet~~ ~~work~~?

var someArray = Optional.some([[100], [0,1,2,4,5]])
someArray?[1].insert(3, at: 3)
print(someArray) //→ Optional([[100], [0, 1, 2, 3, 4, 5]])

DevAndArtist · June 15, 2021, 9:06am

Potentially desugaring:

if case .some(var array) = someArray {
  someArray = array[1].insert(3, at: 3)
}

I don't think that this particular example has any special treatment to allow in-place mutation of the associated value, however I could be wrong here.

Dev1an · June 15, 2021, 10:53am

Well I performed some tests in Xcode and they clearly indicate that mutating an optional chain works differently than your "if case var, mutate, copy/move" approach.

Testing

This is my test setup:

let a: [Int] = bigArray() // 1GB array
let b: [Int] = bigArray() // See `Supplemental test code`

var someArray = Optional.some([a, b])

Then I measure 2 different blocks using XCTest:

measure {
	someArray?[0][3] -= 1
}   //  This takes 0.056s and 2.8GB memory

vs

measure {
	if case .some(var array) = someArray {
		array[0][3] -= 1
		someArray = array
	}
}   //  This takes 0.708s 3.6GB

Results

It looks like the optional chaining is 12x faster and it uses less memory than the other approach for some unknown reason.

Syntactic sugar

If the optional chaining is just syntactic sugar then we should be able to find the implementation just in swift code, right? But when I look at the swift repo, I only find Optional.swift. And it doesn't look like it contains the operator ?.

So I'm still interested in finding out more on how the optional chaining works behind the scenes...

Supplemental test code

let oneGigaByte = 1_000_000_000
let count64Bit = oneGigaByte/8
func bigArray() -> [Int] {
	.init(unsafeUninitializedCapacity: count64Bit) { buffer, counter in
		var pointer = buffer.baseAddress!
		let bound = pointer.advanced(by: count64Bit)
		while pointer < bound {
			pointer.pointee = .random(in: 0 ..< .max)
			pointer += 1
		}
		counter = count64Bit
	}
}

Karl · June 15, 2021, 11:19am

Did you measure using an optimized/release-mode build?

Dev1an · June 15, 2021, 11:20am

Yes, the test results in my earlier post are from a release build.
Debug builds with smaller (3MB) arrays give similar results.

Karl · June 15, 2021, 11:34am

Oh :(

Well I'll certainly add my +1 to fix this. I have the same issue in my WebURL library, and I've found a little dance which is really horrible and requires a _tempStorage global to swap with. The good news is that it seems to work even in debug builds (so you can actually test in-place mutation).

github.com

karwa/swift-url/blob/a402e9aec436e879ea08a6afa694ba802fc78621/Sources/WebURL/URLStorage.swift#L825


      
            // It's like if you have something stuck to one hand and try to remove it with the other hand.
            //
            // Basically:
            // 1. Swap our storage to temporarily point to some read-only global, so our only storage reference is
            //    via a local variable.
            // 2. Extract the URLStorage (which is a COW value type) from local variable's enum payload, and set
            //    the local to also point that read-only global.
            // 3. Hand that extracted storage off to closure `inout`, which does what it wants and
            //    returns a storage object back (possibly the same storage object).
            // 4. We round it all off by assigning that value as our new storage. Phew.
            var localRef = self
            self = _tempStorage
            switch localRef {
            case .large(var extracted_storage):
              localRef = _tempStorage
              self = large(&extracted_storage)
            case .small(var extracted_storage):
              localRef = _tempStorage
              self = small(&extracted_storage)
            }
          }

Nobody1707 · June 15, 2021, 8:46pm

I think the optional chaining operator is hard coded into the compiler, and may have some hand coded optimizations in it that aren't in the code path for pattern matching.