Async deinit: Optimization Goals

I was thinking about the implementation of async deinit, and realised that I don't fully understand the problem.

In this case, a naïve implementation would create a fresh task for each async deinit. What exactly is the concern here? Is it the performance cost of creating the new task? Is it about memory usage with a large number of tasks executing concurrently? Is it about the cost of switching between tasks?

Is task creation/destruction expensive? Does pooling tasks have any benefits in the general case? Does pooling have benefits if tasks are known to have identical priority and initial task-locals?

I don't have that much experience with deinit, so I have a few questions:

  • What happens when the compiler calls deinit on an object? Are the stored properties of that object (even weak ones) still all intact?
  • What happens after the call? Will the compiler proceed recursively?
  • Is it possible to delay or even stop this process, by extending the object's lifetime or forcefully assigning a reference to self somewhere else?

I haven't read the other discussion, but this partly feels like another pseudo-cooperative "cancellation". An unstructured task's cancel method is nonisolated and synchronous, but it's just a signal.

Could a deinit not remain nonisolated and synchronous, assuming what I asked in the last question above is actually possible?

An object which gets its deinit called could then extend the lifetime of self to cooperatively clean up in an asynchronous way and then release the extended reference again to potentially re-trigger deinit.

Again, that's just a gut feeling; I'm not really speaking from experience. However, a cooperative deinit sounds like an interesting concept.

Yes. deinit is called immediately before the component-wise destruction of the stored properties. In principle, there doesn't have to be a phase distinction between these as long as we can make sure that nothing happens with a partially-destructed object; for example, we should be able to consume values out of stored properties as long as we don't then try to use the full self value. In classes with inheritance, however, that would get tricky because calling super.deinit is one of the things that cannot happen with a partially-destructed object; that is, soundness under inheritance has a tendency to force a phase distinction.

There's nothing different about weak properties except that weak references to objects that have entered deallocation will read as nil. But that's about the target of the reference, not the source; the weak properties of self in a deinit are still normal.
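
To make that concrete, here is a minimal sketch (Logger and Widget are hypothetical types): inside a deinit, the object's own weak stored property is an ordinary weak read, and it is non-nil as long as its target is still alive.

final class Logger {
  func log(_ message: String) { print(message) }
}

final class Widget {
  weak var logger: Logger?
  deinit {
    // `logger` is a normal weak read here; the Widget is the object being
    // deallocated, not the Logger, so the reference still reads as non-nil.
    logger?.log("Widget deinit, logger still reachable")
  }
}

let logger = Logger()
var widget: Widget? = Widget()
widget?.logger = logger
widget = nil   // prints "Widget deinit, logger still reachable"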

I'm not entirely sure, but I think this is answered above.

This is called "resurrection", and no, we intentionally do not allow it. There is a (language) bug that makes this have unsafe behavior today, but the solution for that will be to statically prevent it; we will never have a mechanism in deinit to stop object destruction. (People ask for this sort of thing now and again for all sorts of clever reasons that are ultimately really, really bad ideas.)
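
For illustration, this is roughly what the disallowed "resurrection" pattern looks like (Zombie and the global are hypothetical); it is exactly the kind of lifetime extension being ruled out, and per the reply below it is now caught at runtime.

var resurrected: Zombie?   // some longer-lived storage

final class Zombie {
  deinit {
    // Illegal: trying to keep the object alive after its reference count
    // has already reached zero.
    resurrected = self
  }
}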

My general feeling is that async deinit is an attempt to do things via object lifetimes that shouldn't be done that way.

All of the stored properties of the object are destroyed. Code to destroy the properties is automatically injected into all deinits (implicit or not). For a stored property of reference type, release is called on it, which can recursively trigger that object's deinit if it's the last reference. While copyable value types do not have a user-defined deinit, each member of such instances is still recursively destroyed. Destroying simple types like an Int is a no-op.
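
A minimal sketch of that ordering (types are hypothetical): the deinit body runs first, and the compiler-injected release of a stored property then triggers that property's own deinit if it was the last reference.

final class Resource {
  deinit { print("Resource deinit") }
}

final class Owner {
  let resource = Resource()   // released by compiler-injected code after Owner's deinit body
  deinit { print("Owner deinit") }
}

var owner: Owner? = Owner()
owner = nil
// prints:
// Owner deinit
// Resource deinit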

It is not valid to do that in Swift, as John said. While classes haven't had these illegal lifetime extensions prevented statically, we now raise an error at runtime when one happens.


Now, for async deinits...

I was just pondering worst-case scenarios, such as a tree like this:

class Node {
  var children: [Node]

  init(width: Int, depth: Int) {
    children = []
    guard depth > 1 else { return }
    for _ in 0..<width {
      children.append(Node(width: width, depth: depth-1))
    }
  }

  // this acts like a nonisolated async deinit, 
  // where it spawns a destruction task and doesn't await.
  func destroy(asyncDeinit: Bool) {
    let destroyIt: @Sendable () -> () = { [self] in
      // warning: capture of 'self' with non-sendable type 'Node' in a `@Sendable` closure
      while (!children.isEmpty) {
        let c = children.removeLast()
        c.destroy(asyncDeinit: asyncDeinit)
      }
    }

    if asyncDeinit {
      Task.detached(priority: .background, operation: destroyIt) 
    } else {
      destroyIt()
    }
  }
}

I wasn't sure what the problem would be, so I threw together a little test harness to play with it:

Test Harness
// creates `batchSize` trees of specified size
func cycle(batchSize: Int, treeWidth: Int, 
           treeDepth: Int, useAsyncDeinit: Bool) {
  var batch: [Node] = []
  
  // create a batch of trees
  for _ in 0..<batchSize {
    batch.append(Node(width: treeWidth, depth: treeDepth))
  }

  // throw away the batch
  while (!batch.isEmpty) {
    let root = batch.removeLast()
    root.destroy(asyncDeinit: useAsyncDeinit)
  }
}

#if ASYNC_DEINIT
  let asyncDeinit = true
#else
  let asyncDeinit = false
#endif

while true {
  cycle(batchSize: 8, treeWidth: 2, treeDepth: 8, useAsyncDeinit: asyncDeinit)
}

And it seems the biggest problem is the possibility of unbounded memory growth, as the async deinit version doesn't plateau and grows rapidly:

  • memory use for asyncDeinit = false: 2.3MB steady
  • memory use for asyncDeinit = true: 81GB and still increasing before I terminated it

I peeked at Instruments and it's definitely an increasing number of active tasks with no end in sight. That's because nobody's awaiting the completion of these destruction tasks, and we're creating them faster than they're getting completed.

This isn't too surprising: of course deferring work requires memory allocation. So this is not necessarily a showstopper for the feature; it's still a question of risk vs reward. Is this only ever limited to toy examples like mine? How much worse is it if the async deinit is also isolated to a serial executor, like @MainActor?
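
As a rough way to experiment with that last question, the hand-rolled destroy() stand-in could funnel every destruction job through the main actor; this is only an approximation of a @MainActor-isolated async deinit, not real syntax.

extension Node {
  // Hypothetical variant of destroy(): all destruction work is serialized
  // onto the main actor instead of running on detached background tasks.
  func destroyOnMainActor() {
    Task { @MainActor [self] in
      // (same non-Sendable capture caveat as the original destroy())
      while !children.isEmpty {
        children.removeLast().destroyOnMainActor()
      }
    }
  }
}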

I've played with your example a bit, and it looks like the problem here is the priority. Removing priority: .background fixes the unbounded memory growth, which is surprising to me, because there are no other tasks with higher priority, so priority should not matter. But even changing it to .utility prevents unbounded growth. Even without awaiting in the producer, with only one producer there are enough consuming threads to keep the queue drained.
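
Concretely, the variant described above is just a change to the task-spawning line in destroy():

// .utility (or no explicit priority at all) instead of .background
Task.detached(priority: .utility, operation: destroyIt)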

I've modified your example to see if awaiting in producers is sufficient to create back-pressure, and it looks like it is:

import Foundation

class Node {
    var children: [Node]
    
    init(width: Int, depth: Int) {
        children = []
        guard depth > 1 else { return }
        for _ in 0..<width {
            children.append(Node(width: width, depth: depth-1))
        }
    }
    
    // this acts like an async deinit,
    // where it spawns a destruction task and doesn't await.
    func destroy() {
        Task.detached { [self] in
            try? await Task.sleep(nanoseconds: 500_000_000)
            // warning: capture of 'self' with non-sendable type 'Node' in a `@Sendable` closure
            while (!children.isEmpty) {
                let c = children.removeLast()
                c.destroy()
            }
        }
    }
}

// creates `batchSize` trees of specified size
func cycle(batchSize: Int, treeWidth: Int, treeDepth: Int) {
    var batch: [Node] = []
    
    // create a batch of trees
    for _ in 0..<batchSize {
        batch.append(Node(width: treeWidth, depth: treeDepth))
    }
    
    // throw away the batch
    while !batch.isEmpty {
        let root = batch.removeLast()
        root.destroy()
    }
}

@main struct Main {
  static func main() async {
      let pi = ProcessInfo.processInfo
      var t: Task<Void, Never>?
      for _ in 0..<pi.processorCount {
          t = Task {
              while true {
                  cycle(batchSize: 8, treeWidth: 2, treeDepth: 8)
                  await Task.yield()
              }
          }
      }
      // Dummy await to keep the main thread alive
      await t?.value
  }
}

Note that because of the delay in destroy() it takes some time to reach saturation.

I remember reading that you should only use .background for truly optional work, as it may never be run. (If I'm not mistaken, one case where this may happen is on iOS when Low Power Mode is enabled, but I'm unsure on that part.)

Using default priority certainly helps get the jobs serviced faster, but it doesn't completely eliminate unbounded memory growth. For me, Activity Monitor's Memory (and Real Mem) columns still show the process consistently demanding more memory, to the point of significant swapping, and the Swift Concurrency instrument shows the number of active tasks growing without leveling off.

But the specifics of my machine and configuration are not important. System load, whether the program is compiled with -O, whether I'm on battery, other scheduler decisions, and all sorts of other random factors will influence whether one person's system can keep up with servicing the constant flow of async destruction jobs in my example. So the rate-of-growth comparisons aren't terribly useful.

Asynchronous deinitialization lazily destroys values, so there is an inherent memory footprint cost associated with deferring deallocations. One tricky aspect of async deinits in Swift is that we actually need to allocate memory in order to remember how to deallocate other memory. We should focus on ways to control that footprint.

Adding a half-second Task.sleep at the start of each detached task created by destroy is not a realistic form of backpressure. It just artificially slows down the rate at which the data structure is recursively destroyed. EDIT: So, while I can imagine a sort of await Task.deinitTaskYield() injected at the start of these tasks to help the runtime control whether the task should run, is that much different from having the task carry some sort of special, low "deinit" priority? It also doesn't apply backpressure on the actual task creation for non-recursive cases like this:

while true {
  Node(width: 0, depth: 0).destroy()
}
About the Task.yield()s

Also, I think the await Task.yield() between each call to cycle() is probably not doing much beyond slowing down / pausing the cycle loop. Since multiple tasks are being spawned in main with Task.init, they should all be trying to run on the main thread. The yields just try to allow the main thread to run some other task, but I think yield on a high priority thread would just switch to another task of similar priority, not the deinit ones.

But you have the right idea: backpressure is definitely the missing piece of what's needed to control the growth of deferred deallocations (aka garbage). Standard garbage collectors may rely on some specific heap filling up as a form of backpressure that forces a stop-the-world collection. Swift's existing form of backpressure against garbage is to eagerly destroy everything, giving its programs a low memory footprint.

I don't know how we can ensure backpressure for a real async deinit, primarily because objects with an async deinit may be destroyed from non-async contexts.

For example, suppose the synchronous deinit thunk wrapping the async deinit could recognize when its immediate caller is an async function. The thunk maybe could do a little dance to pop its own frame and push an async frame on top of the task's stack to execute the body of the async deinit, so we can avoid creating a detached task by effectively awaiting the deinit. But if the caller is synchronous, we cannot await, so we're forced to create a detached task and we're back at the same problem.

It's await Task.yield() which creates back-pressure, not Task.sleep(). The sleep is there to make deallocation even slower; it models some async shutdown activity in the deinit body.

Back-pressure is applied to the allocation of new objects. Bounding memory usage is achieved by pausing new allocations until already-scheduled deallocations are processed. If used correctly, executors already provide this functionality. And "used correctly" means two things:

  • Producing code eventually yields execution
  • Deallocating tasks have equal or greater priority than allocating tasks.

The current implementation of isolated deinit copies the priority of the task that triggered the deinit. If objects are allocated in a high-priority task, but the last release happens in a low-priority task, this might be a problem. Maybe a solution for both isolated and async deinit would be to remember, inside the object, the priority of the task that allocated it. WDYT?
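
A rough sketch of that idea, using a hand-rolled destroy() like the earlier examples rather than real async-deinit syntax: record Task.currentPriority at allocation time and reuse it when spawning the destruction task.

final class PriorityAwareNode {
  var children: [PriorityAwareNode] = []
  let allocationPriority: TaskPriority

  init() {
    // Remember the priority of the task that allocated the object.
    allocationPriority = Task.currentPriority
  }

  func destroy() {
    // Reuse the allocation-time priority for the destruction work, so the
    // deallocation jobs don't run at a lower priority than the code that
    // allocated the object.
    Task.detached(priority: allocationPriority) { [children] in
      for child in children {
        child.destroy()
      }
    }
  }
}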

I was considering this in detail in a separate post. Basically, this makes code very hard to reason about. There are lots of sync frames inside an async function, often generated by the compiler, which may or may not be inlined.

Indeed, a property release may cause object deallocation which may cause a property release which may cause deallocation, and so on. Currently this is done recursively (which is known to be problematic for "deep" object graphs).
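
That existing problem is easy to reproduce even without any concurrency; here is a minimal sketch (the depth needed to actually overflow varies by platform and build settings):

final class ListNode {
  var next: ListNode?
  init(next: ListNode?) { self.next = next }
}

var head: ListNode? = nil
for _ in 0..<1_000_000 {
  head = ListNode(next: head)
}
// Releasing the head releases `next`, whose release releases its `next`,
// and so on: one nested release per node, which can overflow the stack
// for a deep enough list.
head = nil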

At least neither isolated deinit nor async deinit change the status quo. If objects are allocated in a high-priority task in a loop, but the last release happens from a lower-priority task, then there is already a danger of unbounded memory growth even with a sync nonisolated deinit, simply because the low-priority jobs may never get a chance to execute and even reach the release.

Swift concurrency doesn't support preemptive scheduling, which is the underlying principle behind your use of Task.yield. In addition, the code destroying the objects with async deinits may be fully synchronous, like in my original example, so it can't yield. Even if the programmer tries, Task.yield does not switch tasks if it's already on the highest-priority one.

(will respond to other points in a bit)

My original example just has one task allocating and then deallocating objects. Playing with the priorities of the tasks spawned in the deinit doesn't completely solve the problem, as there is no real backstop for the growth.

I think the two-task status quo scenario you mention is different in a few ways:

First, there would need to be an await (or a lock / atomic) somewhere to synchronize the transfer of objects between tasks (say adding/removing from some concurrent queue). That can serve as a place to add backpressure for the allocating task.

Second, the low-priority task destroying the objects has to be grabbing some number of objects out of the queue and actually doing something with them first. In that case, the accumulation of data in that queue is not garbage. If the task removing the items can't keep up, maybe it has the wrong priority, or you need to add backpressure for the allocating task... but that data is not garbage.

Otherwise, if this low-priority task really is the only one taking items, and it's doing nothing but calling the queue's remove() method and ignoring the return value, then that queue is definitely an accumulation of garbage. The elements should never have been added to the data structure in the first place; then they would have been destroyed right after their last use in the allocating task. This queue-of-garbage scenario more closely matches the async deinit / isolated deinit issue.


Speaking of the status quo: To end up with similar unbounded memory growth today, you'd have to end up doing something asynchronous in some deinit executed by the task:

  • Kick off a new task with Task.init or Task.detached
  • Call something that is asynchronous, like DispatchQueue.async.

That in itself could maybe become a problem, even if these async jobs don't recursively trigger more async work (corresponding to the case where the Node's children.count < 2). I haven't experimented with this.

But one way to really compound the issue would be if this one task spawns more than one other task. That could easily happen in the recursive destruction scenario. For that to happen, you would need to explicitly capture more than one unique reference to some class (like the single children field of the Node) into these asynchronously invoked closures. You cannot refer to it as self.children, since that captures self, and doing that in the deinit is illegal:

class BinaryTree {
  var left: Node?
  var right: Node?
  // ...
  deinit {
    Task.detached { [left,right] in 
      // do something
    }
  }
}

Is the risk of making all of this happen implicitly with just an async or isolated deinit acceptable? Of course, that depends on what we gain from these deinits and what the alternative solutions are like. But this thread isn't a proposal review.

In terms of optimization goals, the impact of that problem is really the question. We know there is a risk of uncontrolled accumulation of memory and tasks. How bad can it get and in what ways can the implementation mitigate those issues?

How exactly do you define “garbage”? And is there a difference if unbounded memory growth is caused by garbage or non-garbage data?

Note that it is not the deallocation itself which is enqueued by isolated/async deinit, but execution of the deinit body. Destruction of stored properties and freeing of the object's memory happen synchronously, immediately after the deinit body completes. And if the deinit body is doing something, then the data that schedules its execution is not garbage, right?

A value is garbage once it is no longer accessible to the user program (i.e., the mutator: the part of the running program doing the "useful work"). In Swift this is when the reference count reaches zero. This is a pretty typical definition.

Both are bugs. The difference is who's responsible: the language implementation or the programmer.

This is not quite accurate and I see no reason to make this distinction. Destruction of stored properties happens during the deinit. A reference count reaching zero means it's dead. The deinit is then invoked, passing a self having a reference count of zero.

No. If a garbage collector discovers an object with a deinit has become unreachable, it's still garbage, but just a little bit special. Here's a nice paper that has more discussion about this, using "destructor" and "finalizer" to distinguish synchronous and asynchronous kinds of deinits, plus the sorts of problems that can come up.

I want to refocus the point of my bringing up the garbage collector analogy: There needs to be a backstop to prevent unbounded growth of memory.

The reason for the distinction, as I see it, is that code in the deinit body is controlled by the programmer, while destruction of stored properties is done by the compiler/runtime.

So I consider unbounded memory growth caused by a deinit body and unbounded memory growth caused by any other form of explicit code to be equivalent, with the programmer being responsible for identifying and fixing the issues, and a solution for one also being a solution for the other.

How explicit/automatic do you expect this backstop to be?

The design of Swift concurrency is that unstructured tasks are not created implicitly, in part so that people's programs don't end up silently leaking tasks, which is the crux of the async and isolated deinit issue. Structured concurrency in Swift is specifically designed to help programmers avoid such leaks.

As automatic as Swift's automatic memory management, whose purpose is specifically to alleviate programmers from worrying about leaking memory. I'm not OK with pushing any more responsibility for dealing with subtle memory leaks onto Swift programmers.

Even with ARC, developers still need to be aware of retain cycles and decide where to use weak references to break cycles.
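
For comparison, the kind of awareness ARC already demands looks like this (Parent and Child are hypothetical types): without the weak back-reference, the two objects would keep each other alive and neither deinit would ever run.

final class Parent {
  var child: Child?
  deinit { print("Parent deinit") }
}

final class Child {
  weak var parent: Parent?   // weak breaks the reference cycle
  deinit { print("Child deinit") }
}

var parent: Parent? = Parent()
parent?.child = Child()
parent?.child?.parent = parent
parent = nil   // both objects are destroyed because the back-reference is weak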

IMO, unbounded memory growth in the case of async deinit requires a similar level of awareness. Developers would need to be aware of the issue, and decide where to insert an await in the allocation loop.

A fully automatic solution would require pausing allocation in some way. Blocking the thread is definitely a no-go for async code, but even for sync code it may cause unexpected deadlocks. A better alternative would be to force classes with async deinit to also have an async init(), so that object construction is always an explicit suspension point.
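
A rough sketch of what that could look like, using an ordinary async init plus the hand-rolled destroy() stand-in (none of this is real async-deinit syntax): every allocation becomes a suspension point where the executor can run pending deallocation jobs.

final class AsyncNode {
  init() async {
    // The suspension point per allocation is what gives already-scheduled
    // deallocation work a chance to run before more garbage is produced.
    await Task.yield()
  }

  func destroy() {
    Task.detached {
      // asynchronous cleanup would go here
    }
  }
}

func allocationLoop() async {
  while true {
    let node = await AsyncNode()   // explicit suspension point per allocation
    node.destroy()
  }
}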

I'm concerned that this can be too restrictive. Developers may want to maintain a critical section of non-suspending code, and forcing async initializers might interfere with that.

And even if we go this way, what would be the policy for pausing/resuming the allocation? What is the definition of "allocating slowly enough to allow deallocation to catch up"?

This sounds like a matter of scheduling, which ultimately should be handled by the executor. Inserting an implicit await Task.yield() before an async init looks like a way to get the desired policy out of the executor. WDYT?