[Sub-Pitch] Task-local values in isolated synchronous deinit and async deinit

ktoso · May 9, 2024, 1:16pm

I was asked by the review manager of the first review round to make some time and check in what we can do to get this proposal unblocked. I'm sorry we left your thread hanging here for quite a while @Nickolas_Pohilets! I would like to thank you for your great analysis, and especially the performance analysis for this specific topic of the task locals, this was very useful.

I got together with @Nickolas_Pohilets and some of the core team collecting feedback as to what we should be doing here. Here's a mini summary on the specific topic of task locals in deinits, and some related notes.

Here's the writeup / notes from my sync-up with core Nickolas and core team, summarizing what I think is the direction we need to take here:

First review feedback

Quoting the core team review summary from last round:

So the core team believe that "clearing" is the "safe default" and we should adopt this as the default.

Myself and @Nickolas_Pohilets have discussed alternatives though, and I'd like to explain them here as well:

Clearing in isolated deinit means different isolated/nonisolated deinit behaviors

It is true that the behavior between deinit and isolated deinit with regards to task locals is going to be different if we apply clearing. This note aims to bring up this topic to the attention of the core team during the second review.

Consider the following code:

@TaskLocal
var value: String = "Hello"

@MainActor 
final class SomeClass { 
  var store: SomeClass? = nil
  
  isolated deinit { 
    print("context: \(value)")
  }

  func store(other: SomeClass) { 
    self.store = other // retain the class...
  }

  func release() { self.store = nil }
}

and now let's imagine two "situations", as in, places where code may cause a deinit...
The snippets below simulate the calling side and one or the other may happen first:

// these snippets simulate more complex call trees;

  func situationA_causesRelease() async {
    await $value.withValue("Some Value") { 
      await someClazz.release()
    }
  }
// ~~~ these two calls are "racing" ~~~
  func situationD_causesRelease() async {
    Task.detached { await someClazz.release() } // oh, no task locals
  }

It should be noted that given the clearing proposal:

deinit would SOMETIMES see the task local; it's inherently racy
isolated deinit we're arguing should be clearing the tasklocals lookup; and should NEVER see the task local.

If one wants to use task locals in cleanups, an explicit cleanup method should be used. One can also use withSomething { ... } style APIs to get reliable behavior.

We acknowlage this difference though and will include this in alternatives considerer, for the core team's consideration.

It should be noted that an isolated deinit already may be a bit more costly -- it MAY have to allocate a Job in order to run the deinit, so the isolated deinit by definition being a bit more costly already is part of the performance model.

Why not clear "always" even in normal deinits?
- This may be breaking behavior and it is unlikely much code is relying on this today.
What about "I need FAST isolated deinit and those 20ns" will hurt my performance?
- We could consider adding an @_isolatedDeinit(taskLocals: .noClear) (bad names, ignore them) attribute to turn off this clearing behavior; it would remove this clearing behavior and have no cost at runtime then -- the same as a normal deinit.

We should add such section to the alternatives considered and propose the "safe" default.

Skipping async deinits

I think this thread has agreed already that we're ok to skip async deinits for the time being until there'd be a strong reason for them.

Let's remove the ability to do async deinits and the Task allocation from the implementation.

Future direction if we needed async deinits

In discussing with @Nickolas_Pohilets we came to an interesting idea IF we ever needed to do async deinitializers.

The big problem with them is the large up-front Task stack allocation, so if we had to deinitialize e.g. 100 objects, we'd suddenly allocate 100 tasks and 100 pre-allocated stacks for them as well. They'd sit around waiting to be run eventually and consume all that memory...

We would be able to avoid the initial slab allocation and make those tasks much lighter, with the assumption that there won't be much happening in deinitializers... although if suspensions were to happen those would allocate then during deinit running...

It is unclear if this is worth it though; I wanted to jot down the idea though for future reference; It's not necerssary to reflect this in the proposal I think, as this is a performance optimization of the implementation, and either way we've decided to not do async deinit for now.

Summing up the thread and feedback I think the way forward for this proposal would be to:

clear task locals with the "barrier" record when deinitializing with an isolated deinit
include the difference between an isolated and not with regards to the task local semantics -- we can copy the writeup above and add into alternatives considered
- also for the core team for another round of consideration really
remove the async deinit; this was already agreed on earlier it seems

I think that's about it... I hope I managed to convey all side's points and suggest a productive way forward? Do you want to adjust the proposal and implementation @Nickolas_Pohilets and shall we request another review when these adjustments are done?

FYI @hborla @John_McCall