Swifty Memoization

dannflor · October 5, 2024, 7:55pm

I write a lot of React in my day job (not by choice). Before I made my way to this job I was working on iOS and macOS apps in SwiftUI. Ever the good architects, we tried to minimize state that we needed to keep synchronized by locating it in a central store. We ended up with a bunch of arrays of structs (sorry data-driven programmers), and injected this into our view models as a dependency.

class DataStore {
  var redData: [RedData]
  var blueData: [BlueData]
  var greenData: [GreenData]
  
  func syncWithServer() async throws {}
}

One thing we discovered was that our views would need to consume the arrays of data in a slightly modified format, e.g. we would need redData sorted on a different member than it currently was, or a redBlueData that combined the elements of both arrays in a novel way and then sorted the result. The naive way to solve this was to use a computed var:

@Observable
class ViewModel {
  let dataStore: DataStore

  init(dataStore: DataStore) {
    self.dataStore = dataStore
  }

  var processedData: [RedBlueData] {
    processData(dataStore.redData, dataStore.blueData)
  }
}

These arrays were quite enormous, so we soon discovered that this approach was wildly unperformant. Every time the view touched some other variable or function that referenced processedData, it would map, filter, and sort over thousands or tens of thousands of elements again. We did not control the server, so we could not push the work back there and page the results.

So the answer we arrived at was to maintain memoized versions of the processed data in the data store:

class DataStore {
  var redData: [RedData] = [] {
    didSet {
      redBlueData = processData(redData, blueData)
    }
  }
  var blueData: [BlueData] = [] {
    didSet {
      redBlueData = processData(redData, blueData)
    }
  }
  var greenData: [GreenData] = []

  private(set) var redBlueData: [RedBlueData] = []

  init() {}
  
  func syncWithServer() async throws {}
}

Enormous performance gains were achieved. We all slapped each other on the back and called it a day.

But something always rubbed me the wrong way. redBlueData is not mutable from outside the DataStore, true. But it shouldn't be mutable from anywhere except the didSets of those two variables. DataStore was in reality a very complex object with dozens of functions and mutable data members. Any time an invariant is enforced by social compact ("DON'T MUTATE THIS EXCEPT FROM A DEPENDENCY'S DIDSET") I feel anxious. It's so easy for those earlier understandings to be lost in time. I like enforcing invariants with language features, and there just wasn't one that could express this.
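A minimal sketch of the concern, assuming stand-in `Int` arrays for the data types, a stand-in `processData` that merges and sorts, and a hypothetical unrelated method `clearScratchState` (none of these are from the real codebase). It shows that `private(set)` only guards against writes from outside the type:

```swift
// Stand-in types: the real RedData/BlueData/RedBlueData are not shown in the post.
typealias RedData = Int
typealias BlueData = Int
typealias RedBlueData = Int

// Hypothetical stand-in for the real processData: merge and sort.
func processData(_ red: [RedData], _ blue: [BlueData]) -> [RedBlueData] {
    (red + blue).sorted()
}

final class DataStore {
    var redData: [RedData] = [] {
        didSet { redBlueData = processData(redData, blueData) }
    }
    var blueData: [BlueData] = [] {
        didSet { redBlueData = processData(redData, blueData) }
    }
    // Read-only from outside, but writable from anywhere inside DataStore.
    private(set) var redBlueData: [RedBlueData] = []

    // Hypothetical unrelated method: nothing stops it from writing the cache.
    func clearScratchState() {
        redBlueData = []  // compiles, and silently breaks the invariant
    }
}

let store = DataStore()
store.redData = [3, 1]
store.blueData = [2]
// redBlueData is [1, 2, 3] at this point.
store.clearScratchState()
// redBlueData is now [] even though redData and blueData are unchanged.
```

The compiler accepts every line here, which is exactly the problem: the rule lives in the team's heads rather than in the type.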

Fast forward a year and I'm working in React and I start writing this all over our codebase:

const redBlueData: RedBlueData = useMemo(() => processData(redData, blueData), [redData, blueData])

For those of you fortunate enough to not be able to read this, useMemo takes a callback and a dependency array. The dependency array lists the values that, when any of them changes between renders, trigger the callback to run again. redBlueData is now derived only from its dependencies and is not mutable outside of that! (Well, except in the way every JavaScript object is mutable even when it's const... but imagine this with value semantics!)

So now I'm back in a Swift codebase in my free time and I'm wondering what our equivalent to useMemo is. I understand that React makes this possible because of the peculiarities of its runtime and state tracking, but I still feel like there ought to be some way to accomplish the same thing, even if it has a different signature.

I first thought we could hide the mutable nature of our memoized output inside the guts of a macro, so that even though it is still technically mutable, the barrier to messing with it is much higher. But there's no getting around the fact that you have to declare the variable as a var.

// redData and blueData must be tracked with @Observable
@Memoized(derivedFrom: { processData(redData, blueData) })
// This always has to be a var for @Memoized to be able to change it
private(set) var redBlueData: [RedBlueData]

It's still better than forcing someone to cram the logic into two didSets, but it doesn't solve my real concern, which is that anybody working on DataStore can still violate the invariant that redBlueData should only be a processed view of its two underlying dependencies.

Has there ever been any talk of introducing further access control annotations that would allow locking this down? I'm curious to know what the community thinks.

One last example to show my ideal semantics: you are able to declare the lexical scopes where the variable is accessible. Obviously this is not a literal spelling I'm advocating for:

@Memoized(derivedFrom: { processData(redData, blueData) })
private(onlySettableFrom: Memoized.self) var redBlueData: [RedBlueData]

// or

private(onlySettableFrom: [redData.didSet, blueData.didSet]) var redBlueData: [RedBlueData]

EDIT: ok one last example

A super cool way to do it would be an annotation for a computed property that acquires storage for it in the object's memory layout, stashes the result of the computation, and only reruns when an @Observable value read within its body updates:

memoized var redBlueData: [RedBlueData] {
  // only updates when redData or blueData updates
  processData(redData, blueData, untrackedVariable)
}

bbrk24 · October 5, 2024, 8:09pm

It doesn’t sound difficult to write a propertyWrapper for this, though I’m not at my computer now, and I’m curious whether it already exists.
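A minimal sketch of one shape such a helper could take, assuming a hypothetical `Memo` type (not an existing library type) that, like useMemo, recomputes only when an explicit dependency value changes. It is written as a plain struct rather than a property wrapper, since the dependencies are passed in at the call site. `Inputs` and `computeCount` are illustrative names; the counter exists only to make cache hits visible.

```swift
// Hypothetical helper: caches the result for the most recent dependencies.
struct Memo<Dependencies: Equatable, Value> {
    private var lastDependencies: Dependencies?
    private var cached: Value?
    private let compute: (Dependencies) -> Value
    // Illustration only: counts how often `compute` actually ran.
    private(set) var computeCount = 0

    init(_ compute: @escaping (Dependencies) -> Value) {
        self.compute = compute
    }

    // Returns the cached value, recomputing only when the dependencies changed.
    mutating func value(for dependencies: Dependencies) -> Value {
        if let cached, lastDependencies == dependencies {
            return cached
        }
        let fresh = compute(dependencies)
        computeCount += 1
        lastDependencies = dependencies
        cached = fresh
        return fresh
    }
}

// Stand-in dependency bundle; the real arrays would hold RedData and BlueData.
struct Inputs: Equatable {
    var red: [Int]
    var blue: [Int]
}

var redBlueMemo = Memo<Inputs, [Int]> { inputs in
    (inputs.red + inputs.blue).sorted()
}

let first = redBlueMemo.value(for: Inputs(red: [3, 1], blue: [2]))   // computed
let second = redBlueMemo.value(for: Inputs(red: [3, 1], blue: [2]))  // cache hit
let third = redBlueMemo.value(for: Inputs(red: [3, 1], blue: [0]))   // recomputed
```

The cached value is private to `Memo`, so nothing else in the owning type can overwrite it. The cost is an equality check over the dependencies on every read, which only pays off when that check is cheaper than the computation it guards.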

nkbelov · October 6, 2024, 12:46pm

I personally would just make a dedicated data structure for this specific task. This is a bit of work in the beginning, as you'd have to account for the things you'd typically do to your arrays, but as a bonus you'll be able to define even faster variants of processData. It also makes it trivial to unit test your invariant that redBlueData is a function of the other underlying arrays.

struct MyMemoStruct {
    private(set) var redBlueData: [RedBlueData] = []
    private(set) var redData: [RedData] = []
    private(set) var blueData: [BlueData] = []

    // Methods that modify a struct's stored properties must be `mutating`.
    mutating func setRedData(redData: [RedData]) {
        self.redData = redData
        redBlueData = processData(redData, blueData)
    }

    mutating func insertRedData(_ item: RedData, at index: Int) {
        redData.insert(item, at: index)
        // here, you can e.g. leverage the fact that
        // `blueData` hasn't been mutated
        redBlueData = processDataFasterForSingleInsertedItem(redData, blueData)
    }
}

You can also sprinkle in some asserts that test this invariant in-flight:

private var _redBlueData: [RedBlueData] = []
var redBlueData: [RedBlueData] {
    // assert is only evaluated in debug builds, so it will not harm performance in release;
    // the comparison requires RedBlueData to be Equatable
    assert(processData(redData, blueData) == _redBlueData)
    return _redBlueData
}
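Putting the two snippets together, here is a self-contained sketch, assuming stand-in `Int` element types and a stand-in `processData` that merges and sorts (the real ones are not shown in this thread). `setBlueData` and the sorted-insert fast path are illustrative additions that only make sense for this particular stand-in.

```swift
// Stand-in types and processing function, for illustration only.
typealias RedData = Int
typealias BlueData = Int
typealias RedBlueData = Int

func processData(_ red: [RedData], _ blue: [BlueData]) -> [RedBlueData] {
    (red + blue).sorted()
}

struct MyMemoStruct {
    private(set) var redData: [RedData] = []
    private(set) var blueData: [BlueData] = []
    private var _redBlueData: [RedBlueData] = []

    var redBlueData: [RedBlueData] {
        // Debug-only check that the cache still matches a full recomputation.
        assert(processData(redData, blueData) == _redBlueData)
        return _redBlueData
    }

    mutating func setRedData(_ newValue: [RedData]) {
        redData = newValue
        _redBlueData = processData(redData, blueData)
    }

    mutating func setBlueData(_ newValue: [BlueData]) {
        blueData = newValue
        _redBlueData = processData(redData, blueData)
    }

    mutating func insertRedData(_ item: RedData, at index: Int) {
        redData.insert(item, at: index)
        // blueData is untouched and the cache is already sorted, so a single
        // sorted insertion replaces the full merge-and-sort.
        let position = _redBlueData.firstIndex(where: { $0 > item }) ?? _redBlueData.endIndex
        _redBlueData.insert(item, at: position)
    }
}

var memo = MyMemoStruct()
memo.setRedData([5, 1])
memo.setBlueData([3])
memo.insertRedData(2, at: 0)

// The same comparison a unit test for the invariant would make.
let invariantHolds = memo.redBlueData == processData(memo.redData, memo.blueData)
```

Because every mutation goes through a method of the struct, the fast path in `insertRedData` can be checked against the slow path in one place, both by the assert and by a test.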