In-place mutation of an enum associated value

This is a very interesting scenario @Helge_Hess1 brought up on slack recently. Structs allow in-place mutation of their stored properties, while an enum which doesn't have any stored properties may have associated values which may require in-place mutation. The example was fairly simple. If an associated enum value is an array and that array must be modified by adding new values quite frequently, then we will end up in copying every array for each mutation which can potentially result in overall bad performance. To workaround the issue he added a case without any associated values which will be used to release the reference to the array buffer from self of an enum after another reference was created to keep the buffer alive, then modify the array in-place and re-assign the old case with the mutated array back to self again. However this isn't ideal at all because it requires an extra case to be added to the enum, which from the API design perspective make no sense at all.

Here is an example:

struct MyArray<Element> : CustomStringConvertible {
  
  class Storage<T> {
    var values = ContiguousArray<T>()
  }
  
  var storage = Storage<Element>()
  
  init<T: Sequence>(contentsOf s: T) where T.Element == Element {
    for item in s { storage.values.append(item) }
  }
  
  mutating func append(_ value: Element) {
    if isKnownUniquelyReferenced(&storage) {
      print("KEEP STORAGE")
      storage.values.append(value)
    }
    else {
      print("COPY STORAGE")
      let old = storage.values
      storage = Storage<Element>()
      storage.values.append(contentsOf: old)
    }
  }
  
  var description : String { return storage.values.description }
}

enum Values {
  case release
  case string(String)
  case array ([String])
  case myarray (MyArray<String>)
}

extension Values {
  mutating func add(_ item: String) -> Bool {
    guard case .array(var array) = self else { return false }
    self = .release
    array.append(item)
    self = .array(array)
    return true
  }
  mutating func madd(_ item: String) -> Bool {
    guard case .myarray(var array) = self else { return false }
    self = .release // make sure the RC of `array` drops to 1
    array.append(item)
    self = .myarray(array)
    return true
  }
}

var values = [ Values ]()

values.append(.array(["hallo"]))
_ = values[0].add("World")
print(values[0])

values.append(.myarray(MyArray(contentsOf: ["hallo"])))
_ = values[1].madd("World")
print(values[1])

What do you think? Should there be a different better solution for that particular problem?

11 Likes

I welcome every improvement that brings enums closer to feature parity with structures, if reasonably possible. It’s a shame that such a useful tool has so many downsides in Swift compared to structs.

7 Likes

There are a couple of alternatives to doing that, the simplest of which would be overwriting self with the associated value filled in with some default value (though granted this isn't always a practical).

For example, with arrays, you could temporarily overwrite with an empty array:

  @discardableResult
  mutating func add(_ item: String) -> Bool {
    guard case .array(var array) = self else { return false }
    self = .array([])
    array.append(item)
    self = .array(array)
    return true
  }

Another option would be to deinitialise then reinitialise the memory manually, which will work with arbitrary associated values:

  @discardableResult
  mutating func madd(_ item: String) -> Bool {
    guard case .myarray(var array) = self else { return false }
    withUnsafeMutablePointer(to: &self) { ptr in
      ptr.deinitialize(count: 1)
      array.append(item)
      ptr.initialize(to: .myarray(array))
    }
    return true
  }

(granted, we are relying on the fact that &self is addressable in order to ensure the associated value is uniquely referenced)

But neither of these are satisfactory – I definitely think that the language should provide some way to conditionally mutate an associated value in-place.

4 Likes

It would be possible to design a solution where you can pattern match on an enum and capture the associated value as an inout value. This would also require rethinking how indirect enum cases work. Right now the indirect box is essentially passed as a reference type, and the fact that it is immutable preserves value semantics. If the associated value of an enum could be mutated, we would need to perform copy-on-write when projecting the contents of the box instead.

14 Likes

Has there been any developments in this area? I’ve been running into this limitation very often, so I’d be happy to help make this solution a reality.

There are frameworks available now which are managing to fill this gap via reflection, one prominent example being EnumKit: Access an associated value · gringoireDM/EnumKit Wiki · GitHub

Definitely throw my vote in with finding a way to mutate associated values in-place, especially one that integrates with indirect: my current Swift project has an iterator state that is best expressed as a tree, and while the code currently works using reference type for nodes, my performance is getting killed by swift_retain/release overhead, e.g. when I get a reference to a node solely to read some info and maybe place it elsewhere, and be done with that reference.

I could consider unmanaged references, but that is not very idiomatic; instead, inspired by Julia, I thought about using value types instead, so the compiler would know there can never be more than one reference. But the compiler copying the value instead is too great, if not unavoidable, if I don't have a way to express in-place modification.

Another advantage of switching to value types would be making copying the iterator state trivial, for my future multi-threaded/gcd improvement through work-stealing.

Finally, I encountered a bug in my latest refactor that was due to me forgetting to write back the enumeration in some cases, which I worked around with a defer block that does it no matter what, but this is a bit of a silly workaround for a problem that shouldn't exist in the first place.

I stumbled on the same problem and I am still wondering:
If Swift's Optional is an enum with associated value, then how does something like the following code snippet work?

var someArray = Optional.some([[100], [0,1,2,4,5]])
someArray?[1].insert(3, at: 3)
print(someArray) //→ Optional([[100], [0, 1, 2, 3, 4, 5]])

Potentially desugaring:

if case .some(var array) = someArray {
  someArray = array[1].insert(3, at: 3)
}

I don't think that this particular example has any special treatment to allow in-place mutation of the associated value, however I could be wrong here.

Well I performed some tests in Xcode and they clearly indicate that mutating an optional chain works differently than your "if case var, mutate, copy/move" approach.

Testing

This is my test setup:

let a: [Int] = bigArray() // 1GB array
let b: [Int] = bigArray() // See `Supplemental test code`

var someArray = Optional.some([a, b])

Then I measure 2 different blocks using XCTest:

measure {
	someArray?[0][3] -= 1
}   //  This takes 0.056s and 2.8GB memory

vs

measure {
	if case .some(var array) = someArray {
		array[0][3] -= 1
		someArray = array
	}
}   //  This takes 0.708s 3.6GB

Results

It looks like the optional chaining is 12x faster and it uses less memory than the other approach :thinking: for some unknown reason.

Syntactic sugar

If the optional chaining is just syntactic sugar then we should be able to find the implementation just in swift code, right? But when I look at the swift repo, I only find Optional.swift. And it doesn't look like it contains the operator ?.

So I'm still interested in finding out more on how the optional chaining works behind the scenes...

Supplemental test code

let oneGigaByte = 1_000_000_000
let count64Bit = oneGigaByte/8
func bigArray() -> [Int] {
	.init(unsafeUninitializedCapacity: count64Bit) { buffer, counter in
		var pointer = buffer.baseAddress!
		let bound = pointer.advanced(by: count64Bit)
		while pointer < bound {
			pointer.pointee = .random(in: 0 ..< .max)
			pointer += 1
		}
		counter = count64Bit
	}
}
1 Like

Did you measure using an optimized/release-mode build?

Yes, the test results in my earlier post are from a release build.
Debug builds with smaller (3MB) arrays give similar results.

1 Like

Oh :(

Well I'll certainly add my +1 to fix this. I have the same issue in my WebURL library, and I've found a little dance which is really horrible and requires a _tempStorage global to swap with. The good news is that it seems to work even in debug builds (so you can actually test in-place mutation).

1 Like

I think the optional chaining operator is hard coded into the compiler, and may have some hand coded optimizations in it that aren't in the code path for pattern matching.

2 Likes