[Sub-Pitch] Task-local values in isolated synchronous deinit and async deinit

Nickolas_Pohilets · February 15, 2024, 1:37pm

You can find the updated proposal text here - Isolated synchronous deinit and asynchronous deinit.

Updates since the reviewed version:

Scope of the proposal was extended to include async deinit.
Synchronous deinit of GAIT's is non-isolated by default and requires explicit opt-in, as requested by the review feedback. But async deinit inherits isolation of the class. As described in the proposal, this helps to warn users if they are using more expensive async deinit, where isolated synchronous one would be sufficient.
Removed assertion for escaping self from future directions, because it was implemented in [Runtime] Fatal error if self escapes from deinit. · apple/swift@108f780 · GitHub.
I've implemented resetting task-local values, but I still think that task-local values should be copied by default and reset should be an opt-in.

And it is the last part, that I would like this thread to focus on.

There are 3 alternatives regarding task-local values:

Do nothing - task-locals will be available on the fast path of the isolated deinit, but not on slow path of the isolated deinit, and not in async deinit.
Make task-locals consistently available - keep them on the fast path, copy on the slow path and for async deinit.
Make task-locals consistenty unavailable - add stop-node on the fast path, do noting on the slow path and for async deinit.

Current implementation in the branch supports all 3 in the runtime, and options 2 and 3 in the frontend. It is easy to support all 3, the main question is which one should be the default one.

Behavior

Second option offers the least surprising developer experience:

no unexpected changes when adopting the feature
no unexpected changes between different code paths
no unexpected changes between debug and release builds
no two cases to test: hopping vs non-hopping deinit
it follows the precedent of Task.init vs Task.detached

It can be hard to predict where the point of the last release will be, but as long it can be constrained to some broad scope, code in the deinit can assume that all task-locals set for the entire scope are available.

In some cases it might be a better practice to capture task-local values in the init() , store them as object properties and re-inject them into deinit . But it is not always possible to know which task-local values are used inside the deinit . For example, if it calls functions from closed-source frameworks or hidden behind type-erasure. In some cases it can be even not possible at all, if task-locals are tunneled through call stack as an implementation detail with actual task-local declarations being unaccessible.

There are no compile-time checks for availability of the non-default task-local values. Checking correctness of the code that uses task-local values relies on testing. Having different behavior depending on the point of the last release can lead to bugs, which are easy to miss during the testing and hard to reproduce once reported.

Task-local values are a tempting instrument to inject loggers or similar infrastructure. If task-local values are accessed only in exceptional situations - e.g. for logging assertion failures or runtime errors, chances of missing task-locals-related problem increase even further. But those are exactly the use cases where it is critically important for logging to work.

And finally, choosing the default behavior does not exclude other options. Advanced users can use attributes to customise behaviour of the deinit in performance critical code. Current proposal text describes new attribute @resetTaskLocals which selects option #3, because it was needed to run benchmarks. But it is easy to add another one for option #1. This part of the proposal likely to be updated based on the feedback.

Performance

To have a well-informed discussion about this topic, I've made a benchmark of performance of isolated and async deinit.

Here is a short summary:

Case	Cost
Fast path allowing task-local values	15-20ns per object
Resetting task-local values on fast path	20ns per object
Slow path not copying task-local values	Enqueueing: 115ns per object Dequeueing: 140ns per object
Copying task-local values on slow path	Enqueueing: 30-35ns per value per object Dequeueing: 20ns per value per object
Async deinit not copying task-local values	Enqueueing: 270ns per object Dequeueing: 315ns per object
Copying task-local values in async deinit	Enqueueing: 25-27ns per value per object Dequeueing: 20ns per value per object

Note that for cases involving queueing there are two numbers, which cannot be simply added up together.

If jobs are enqueued faster then dequeued, then overall performance is fully determined by the dequeueing and extra costs during enqueueing do not matter. And vice versa.

Numbers for fast path and dequeueing are incremental costs, compared to executing synchronous non-isolated deinit. To understand how long job takes to be dequeued and executed for specific use case, these costs need to be added to the duration of the deinit itself. E.g. if slow path job is scheduled copying 3 task-local values, and dequeued job runs deinit costing 100ns, then in total it will take 115ns + 3 * 35ns = 220ns to schedule and 240ns + 3 * 20ns + 100ns = 400ns to dequeue and execute.

When destroying a tree of objects with synchronous deinit isolated to the same actor, slow path is taken only once - for the root object. For the rest of the objects last release happens on the expected actor and fast path is taken.

Copying task-local values adds extra cost on the slow path, but is free on the fast path. Resetting task-locals (adding a stop-node, or even just checking if stop-node needs to be added), is free on the slow path, but comes with a cost on the fast path. If we assume that fast path is taken significantly more often than the slow one, then consistently copying task-local values is faster than consistently resetting task-local values.

If we consider a common case of deleting a tree of objects with synchronous deinit isolated to the same actor, than copying task-locals happens only for the root node (and only if root is released from the wrong actor), while cost of resetting is paid for each of the for N or N-1 objects in the tree regardless from where last release have happened. If we decide to consistently copy, extra cost is proportional to the number of task-locals (V), but does not depend on number of objects. If we choose to consistently reset, then extra cost is proportional to number of objects (O) and does not depend on number of task-local values. We are comparing Sometimes(V * 35ns) vs O * 20ns. Assuming that chance of taking slow path is low and number of objects with isolated deinit is higher than number of task-local values, consistently copying is faster than consistently resetting.

For async deinit, there is no fast path, so the logic above does not apply. But based on discussions so far, there seems to be far more demand for isolated synchronous deinit, than for async deinit. So it makes sense to make decisions prioritising isolated synchronous deinit.

Also async deinit has more than double costs of isolated synchronous deinit, so relative cost of copying task-local values is lower.

One of the concerns raised in the review, was that copying task-local values might prevent future optimizations for reusing current task for async deinit of the child objects. Release of the child objects happens from complier-generated code which does interfere with task-local values. And any task-local values added inside the body of the deinit, would be removed inside the body as well. So if last release of the child object starts a new async deinit, it would need to be executed with the same set of task-local values as parent deinit, and thus can be executed in the same task.

If async deinit is initiated from the body of the deinit, then potentially it could happen with added task-local values. Technically it is possible to store only incrementally added task-local values and still reuse current task. But current task can be reused only after entire deinit completes. If parent async deinit suspends after initiating another async deinit, then reusing task would cause second deinit to wait unnecessary, underutilising hardware. So probably, tasks should not be reused in this case.

Links

ktoso · February 19, 2024, 10:55am

To give a small update about concerns we're trying to figure out still here...

We're thinking about the impact of creating unstructured tasks from deinits to begin with to be honest.

Thinking out loud: The biggest problem here is how implicit and invisible this delayed deallocation effect is... And it is more problematic in real systems where the concurrency queues are under load already, so the delays, might be actually noticably worse in real systems, and hard if not impossible to fix...

I am wondering also if dedicating workers (queues, threads) for such async deallocation might be necessary. I don't think starving the concurrent thread-width limited pools with other work may ever cause starvation of tasks which MUST run in order to deallocate resources... So it also feels like those tasks must not only be high priority, but perhaps even guaranteed to run even if the width-limited default pool is starved...?

As for the question about copying the locals or not: I still think that anything other than "no accessible" is going to be a problem. Relying on where exactly a deinit happens and then sneak a local there to perform cleanups seems even riskier than the usual sneaky task local tricks, I'm not very confident this will yield in good APIs. If explicit control like this is needed, strict lifetime like moveonly+deinits or perhaps explicit APIs "terminate/close/end()" (or scopes "with...") sound like the right solution to me.

PS: For benchmarks I'd really recommend using the GitHub - ordo-one/package-benchmark: Swift benchmark runner with many performance metrics and great CI support since we'll get other metrics from it as well easily like allocations, context switching etc

This isn't a complete thought through answer, but I figured we might as well discuss some more rather than leave you handing with no reply, hope that makes sense.

FranzBusch · February 19, 2024, 12:33pm

I agree with @ktoso here. Having the compiler spawn unstructured task to support async deinits seems like a big foot gun. In my opinion, any usage of unstructured tasks must be explicit and written by the user to make it possible to audit the code and have linting rules that forbid the usage of unstructured tasks.

Generally, I would love to see async deinits on ~Copyable types to model resources such as file descriptors correctly.

ktoso · February 19, 2024, 1:12pm

I would want to add that I'd love for us to find a way to solve these concerns and the overall async deinits goal still seems valuable.

It's a rough topic and I didn't want to glance over it -- the overall goal of allowing async in deinits, and being able to still reason about when resources get cleaned up is important and I do hope we can find a way to get these done. Perhaps it'll be more work than it seemed at first though.

And I really wanted to thank you for the effort, and also the benchmarking done this time

Nickolas_Pohilets · February 19, 2024, 1:48pm

I do agree that linear types are the best tool to make explicit async cleanup API. I've mentioned this in the "Future directions". Though probably it won't be an async deinit, but rather a named consuming method in combination with deleted deinit. I think these are two different features, which can coexist in the language.

But I'm not sure if we need async deinit at all. So far, all the use cases, that I have personally and what I've seen on the forums, required isolated deinit, not async deinit. If you have any use cases for async deinit, please share, that would be a valuable input. Until there is a compelling evidence that there is a demand for async deinit, I don't mind completely removing async deinit from the proposal.

And for isolated deinit, it is very likely that deinit will take the fast path. Slow path is really just a safety net. Target audience for this feature are people who already have cleanup actions in their deinit, and they work okish on practise, but isolation checking produces warnings/errors. So for isolated deinit, I want to introduce as little changes as possible compared to regular deinit. Do you agree with this, as a design goal?

Keep in mind that isolated synchronous and async deinit are not needed to deallocate memory. They are needed to perform cleanup actions. Memory is kept to support these actions. Deallocation happens immediately as soon as cleanup work is done. It is not obvious for me why priority needs to be increased for these actions. As long as producing task occasionally yields, total number of objects pending for deallocation will be bound by the executor.

Stan_Smida · March 8, 2024, 6:07pm

It would be great if deinitializers in actors could be figured out for Swift 6.

ktoso · May 9, 2024, 1:16pm

I was asked by the review manager of the first review round to make some time and check in what we can do to get this proposal unblocked. I'm sorry we left your thread hanging here for quite a while @Nickolas_Pohilets! I would like to thank you for your great analysis, and especially the performance analysis for this specific topic of the task locals, this was very useful.

I got together with @Nickolas_Pohilets and some of the core team collecting feedback as to what we should be doing here. Here's a mini summary on the specific topic of task locals in deinits, and some related notes.

Here's the writeup / notes from my sync-up with core Nickolas and core team, summarizing what I think is the direction we need to take here:

First review feedback

Quoting the core team review summary from last round:

So the core team believe that "clearing" is the "safe default" and we should adopt this as the default.

Myself and @Nickolas_Pohilets have discussed alternatives though, and I'd like to explain them here as well:

Clearing in isolated deinit means different isolated/nonisolated deinit behaviors

It is true that the behavior between deinit and isolated deinit with regards to task locals is going to be different if we apply clearing. This note aims to bring up this topic to the attention of the core team during the second review.

Consider the following code:

@TaskLocal
var value: String = "Hello"

@MainActor 
final class SomeClass { 
  var store: SomeClass? = nil
  
  isolated deinit { 
    print("context: \(value)")
  }

  func store(other: SomeClass) { 
    self.store = other // retain the class...
  }

  func release() { self.store = nil }
}

and now let's imagine two "situations", as in, places where code may cause a deinit...
The snippets below simulate the calling side and one or the other may happen first:

// these snippets simulate more complex call trees;

  func situationA_causesRelease() async {
    await $value.withValue("Some Value") { 
      await someClazz.release()
    }
  }
// ~~~ these two calls are "racing" ~~~
  func situationD_causesRelease() async {
    Task.detached { await someClazz.release() } // oh, no task locals
  }

It should be noted that given the clearing proposal:

deinit would SOMETIMES see the task local; it's inherently racy
isolated deinit we're arguing should be clearing the tasklocals lookup; and should NEVER see the task local.

If one wants to use task locals in cleanups, an explicit cleanup method should be used. One can also use withSomething { ... } style APIs to get reliable behavior.

We acknowlage this difference though and will include this in alternatives considerer, for the core team's consideration.

It should be noted that an isolated deinit already may be a bit more costly -- it MAY have to allocate a Job in order to run the deinit, so the isolated deinit by definition being a bit more costly already is part of the performance model.

Why not clear "always" even in normal deinits?
- This may be breaking behavior and it is unlikely much code is relying on this today.
What about "I need FAST isolated deinit and those 20ns" will hurt my performance?
- We could consider adding an @_isolatedDeinit(taskLocals: .noClear) (bad names, ignore them) attribute to turn off this clearing behavior; it would remove this clearing behavior and have no cost at runtime then -- the same as a normal deinit.

We should add such section to the alternatives considered and propose the "safe" default.

Skipping async deinits

I think this thread has agreed already that we're ok to skip async deinits for the time being until there'd be a strong reason for them.

Let's remove the ability to do async deinits and the Task allocation from the implementation.

Future direction if we needed async deinits

In discussing with @Nickolas_Pohilets we came to an interesting idea IF we ever needed to do async deinitializers.

The big problem with them is the large up-front Task stack allocation, so if we had to deinitialize e.g. 100 objects, we'd suddenly allocate 100 tasks and 100 pre-allocated stacks for them as well. They'd sit around waiting to be run eventually and consume all that memory...

We would be able to avoid the initial slab allocation and make those tasks much lighter, with the assumption that there won't be much happening in deinitializers... although if suspensions were to happen those would allocate then during deinit running...

It is unclear if this is worth it though; I wanted to jot down the idea though for future reference; It's not necerssary to reflect this in the proposal I think, as this is a performance optimization of the implementation, and either way we've decided to not do async deinit for now.

Summing up the thread and feedback I think the way forward for this proposal would be to:

clear task locals with the "barrier" record when deinitializing with an isolated deinit
include the difference between an isolated and not with regards to the task local semantics -- we can copy the writeup above and add into alternatives considered
- also for the core team for another round of consideration really
remove the async deinit; this was already agreed on earlier it seems

I think that's about it... I hope I managed to convey all side's points and suggest a productive way forward? Do you want to adjust the proposal and implementation @Nickolas_Pohilets and shall we request another review when these adjustments are done?

FYI @hborla @John_McCall

wadetregaskis · May 9, 2024, 2:39pm

ktoso:

Consider the following code:

@TaskLocal
var value: String = "Hello"

@MainActor 
final class SomeClass { 
  var store: SomeClass? = nil
  
  isolated deinit { 
    print("context: \(local)")
  }

  func store(other: SomeClass) { 
    self.store = other // retain the class...
  }

  func release() { self.store = nil }
}

What is local?

FranzBusch · May 9, 2024, 3:33pm

Thanks for the great writeup @ktoso.

I am really glad to hear that we are removing async deinits from this proposal since they come with a lot of implications and in my opinion new potentially unpredictability when used for resource management.

I am really curious about this since it sounds similar to what async deinits needed. Is there really a difference between spawning a job and a task? Spawning a job means we have to enqueue the job running the deinit on the executor of the actor. Which means this job can run at arbitrary times and in fact might never run if its priority is too low and the system is busy doing other work. This plays into this statement from the motivation section of the proposal:

The combination of automatic reference counting and deterministic deinitialization makes deinit in Swift a powerful tool for resource management.

I strongly disagree with this and IMO deinit is not a great tool for resource management. From experience in the server ecosystem and developing iOS applications before that relying on deinit based clean-up results in all kinds of unpredictability in a program. It is hard to tell which thread is going to win the race to release the last reference and run the deinit. This makes it impossible to tell at what point in time an instance has been deinited.

I am afraid that when we introduce the implicit spawning of a job to run a deinit we are making this even worse by adding additional queueing to clean up the resource making it even more unpredictable when it is cleaned up.

Since the introducing of structured concurrency, with methods have been my go-to pattern to solve this problem of resource management. Before structured concurrency this wasn't always possible when the resource needed to be used asynchronously but now I didn't come across a resource management problem that couldn't be solved with a with method.

wadetregaskis · May 9, 2024, 7:25pm

But for many people / many cases this is a feature, not a bug. Arguably the whole point of reference counting is to free you from single-ownership problems, like you see folks struggling with in Rust or [older] C++. The intrinsic trade-off is that it's sometimes harder to tell when something will be deallocated. But most of the time, nobody cares.

I haven't played much with it yet, but it seems like non-copyable types give you an alternative here, where you can enforce a single-ownership sort of model and thus clearer deallocation timing (including requiring explicit deinit through a consuming self call, if you want). If you want. Having the flexibility to do it many ways is great.

Jumhyn · May 9, 2024, 8:07pm

I'm happy to see this move forward. The lack of an isolated deinit is something we've started hitting a fair amount as we have addressed concurrency warnings in more and more of our codebase.

In some situations there is simply no other option. Our current workaround is to capture the stored properties and send them off to a new unstructured in order to perform the cleanup anyway.

I suppose I should add that in most cases, we don't expect references to the types at issue to ever escape (usually) the main actor in practice, and so it would also often be viable for us to wrap the deinit body in an assumeIsolated block. But without compiler guarantees that the references never escape the main actor, anywhere up the object graph, that's a hard sell: as you note it's difficult to 'know' where we'll be deinited, so much 'safer' from a software reliability perspective to just punt the cleanup work off to a new task with the proper isolation than to risk crashing. Especially since we generally don't care that much when this cleanup work happens, just that it happens 'eventually' and ordered after the owning instance is deallocated. (If we needed better guarantees about cleanup timing, it probably wouldn't be in the deinit to begin with!)

ktoso · May 9, 2024, 9:32pm

It's supposed to be value; a typo.

ktoso · May 9, 2024, 9:39pm

I just had another round of discussion with @John_McCall et al about the task-locals clearing or not.

We went back and forth on the clearing and to be honest we arrived at walking back on the necessity of "clearing". Either way strongly relying on precise task local values existing in deinits strongly in "don't do this" territory, and perhaps we don't need to be guarding so hard against it -- especially since normal deinit already has this issue.

I think this is in line with the default behavior you were preferring as well @Nickolas_Pohilets.

I think at this point we can loop around and update the proposal text to reflect these things and request another review round @Jumhyn.

Nickolas_Pohilets · May 11, 2024, 12:23pm

Optimising async task

I think you misunderstood me. There is no such assumption. I was talking about tasks being enqueued, but not running yet. Currently such tasks still have first slab of the stack allocated. This allocation is triggered by using task allocator to allocate internal data structures for priority escalation. If we can avoid this somehow, we could make memory footprint of unstated tasks much more lightweight, comparable to a single job.

Note that the current proposal draft contains diagnostics guiding users away from unnecessary usage of async deinit. So quite the contrary, we can assume that async deinit suspends at least once.

Priorities

Job for isolated deinit runs at the same priority as the code that performed last release. And if system exists in condition where deinit job may never run, this means that last release also may never run. Such system would be accumulating unfreed resources even with regular deinit.

Task locals

Default behavior

My preference was copying TL by default, plus an option for advanced users to do nothing.

“Copying” actually means to do nothing on the fast path. It has the best possible performance and behavior compatible with regular deinit. And actual copying on the slow path is intended to address different problem.

In the code snippet that @ktoso provided, there are two different tasks racing for the last release. In this case, copying cannot help, because each of the tasks has its own set of task-local values. In this example, deinit is not special from any other function. Any other function, if called from different tasks, will observe different task-local values. I don't think this is a problem at all. That's just how task-locals work, that's in their name.

@TaskLocal
static var value: String = "Hello"

actor MyActor {
    var isClosed: Bool = false
    func close() {
        if !isClosed {
            isClosed = true
            print(Self.value)
        }
    }
}

func situationA_causesClose(actor: MyActor) async {
    await $value.withValue("Some Value") { 
        await actor.close()
    }
}
// ~~~ these two calls are "racing" ~~~
func situationD_causesRelease(actor: MyActor) async {
    Task.detached {
        await actor.close()  // oh, no task locals
    }
}

Copying on the slow path is intended to solve a different issue:

@TaskLocal
var value: String = "Hello"

@MainActor 
final class SomeClass { 
  var store: SomeClass? = nil
  
  isolated deinit { 
    print("context: \(value)")
  }

  func store(other: SomeClass) { 
    self.store = other // retain the class...
  }

  func release() { self.store = nil }
}

func example() async {
    await $value.withValue("Some Value") {
        let root = SomeClass()
        root.store(other: SomeClass())
        // Some logic
        if Bool.random() {
            await root.release() // child deinit takes fast path
        }
        // otherwise child deinit takes slow path
    }
}

In this example, there are regions of code with different isolations, but all being part of the same task and all within the scope of the withValue(). Currently in Swift there are no lifetimes, so compiler cannot check that root or child objects do not escape the scope of withValue(). But developers still can do it manually. And if developers are confident enough that reference does not escape the scope, then they can reasonably assume that object will be deinitialised inside the scope, and expect that task locals are available inside deinitializer the way they are inside any other function.

Availability of task-locals in deinit is as reliable as escape analysis is. Manual escape analysis is not perfect, but can be good enough for some cases. While I would not recommend relying on task-locals in the deinit, I don't think it is bad enough to punish people for that, and pay performance costs for this punishment.

For regular deinit, in the example above task-locals would be visible on both fast path and slow path. If task-locals are not copied on the slow path, then adoption of the isolated deinit may cause breaking behavioural changes. Most users probably won't care about this, but for some this will come as a surprise. In this case, I think it is better to be consistent and also clear task locals on the fast path, to help users to discover this quirk sooner, and make it more likely to be caught in unit tests.

So for a default option, my preference is - "no-op(keep)/copy" > "reset/no-op(none)" > "no-op(keep)/no-op(none)".

Alternative behaviors

In addition to that, we can use attributes to provide alternative behaviours.

If default behaviour is "no-op(keep)/copy", then I see value in "no-op(keep)/no-op(none)" being available as a performance optimization for advanced users. I see it being useful in cases when code inside deinit does not use any task-locals, so paying runtime cost for copying them does not make sense. But if they are not used at all, then keeping them on the fast path should be an issue either. So I don't see a need to support "reset/no-op(none)".

Note that instead of optimising away copying of the task-local values, it can be more productive to minimise number of slow paths. E.g. destroying an array of objects with isolated deinit on a wrong actor will create a job for each object. But if array is wrapped into another object isolated to the same actor, then only one job will be created:

@MainActor
class Leaf {
    isolated deinit {}
}

@AnotherActor
class Bad {
    var leaves: [Leaf]

   isolated deinit {
       // Will create leaves.count jobs
   }
}

@MainActor
class Container {
    var leaves: [Leaf]

    isolated deinit {
       // Destroys leaves using fast path
   }
}

@AnotherActor
class Good {
    let container: Container

   isolated deinit {
       // Will create one job to destroy Container
   }
}

If default behaviour is "reset/no-op(none)", then I see value in "no-op(keep)/no-op(none)" for performance reasons as well. But also I would provide "no-op(keep)/copy" as a way to quickly mitigate behavioural changes in the short term and be able to adopt best practices later, after successful adoption of the concurrency checking.

Opaque copy

And finally, if the best practice that we recommend is to copy TLs in the initializer and keep them as instance properties, then I think we may need to provide users with one more tool.

Task-locals can be used as an implementation detail, and not necessarily exposed publicly. In this case it is not possible to copy them in initializer. So, I think we need an API which allows opaque copying of all task-local values. Something like this:

// Library, closed source, cannot be changed
private class Context {
    var foo: String
    var bar: Bool

    @TaskLocal
    static var currentContext: Context?
}

public func withContext(foo: String, perform action: () async -> Void) async {
    await Context.$currentContext.withValue(Context(foo: foo, bar: false)) {
        await action()
    }
}

public func useContext() {
    if let context = Context.currentContext {
        print(context.foo)
        context.bar.toggle()
    }
}

// App, uses library
class Processor {
    var values: TaskLocalValues

    init() {
        // Copies everything including Context.currentContext
        self.values = TaskLocalValues.current()
    }

    deinit {
        self.values.perform {
            useContext()
        }
    }
}

func example() async {
    await withContext(foo: "Something") {
        let processor = Processor()
        ...
    }
}

wadetregaskis · May 11, 2024, 3:42pm

I suppose that can form a kind of priority inversion, where higher-priority code is waiting for the resources held by an enqueued deinit (e.g. semaphores, file descriptors, or even simply the memory).

Nickolas_Pohilets · May 11, 2024, 6:34pm

Yes, absolutely. My point is that if there is a priority inversion involving deinit job, there must also be priority inversion with code performing the release. So, slow path of the isolated deinit does not introduce priority inversion on its own. But it can worsen symptoms of an existing priority inversion issue. With regular deinit, once releasing job gets a chance to run, deinit body and memory deallocation will be executed as well. With slow path of isolated deinit, two low-priority jobs need to get their chance to run.

ktoso · May 12, 2024, 2:03am

We're talking about the same thing then, though I un-necessarily mentioned that assumption and had missed that there's a warning for an async deinit which does not suspend being proposed, that's all good.

Yeah I meant the same thing here, not delaying the allocation until it is necessary vs up-front while-not-running-yet having the stack ready in the task.

Either way, for now it seems we'll be skipping async deinits huh.

On this topic though my example was indeed quite bad! Thanks for correcting that.

Looking at your revised example again I'm again reminded why the copying isn't good imho. I view the "penalizing" the other way than you do I think?

The punishment IMHO is the copying, because it is an invisible performance penalty. Anything that we do do make it "actually work" (i.e. the copying on the slow path, as you explain here), is IMHO helping people "rely on the wrong thing" and while paying performance costs when doing so.

Because the availability is not going to be really reliable in any case, we should not be trying to make it slightly more reliable sometimes and adding a performance toll while doing so. That's my take on it at least -- back to the "reset/no-op(none)" handling IMHO.

Nickolas_Pohilets · May 13, 2024, 2:29pm

The nature of performance toll is quite different between "no-op(keep)/copy" and "reset/no-op(none)".

Assuming that typical use case looks something like this:

Objects live in small clusters, isolated to the same actor. Let's take size of the cluster to be 12.
Root of the cluster has 10% chance to have its last release happen from the wrong actor.
There are 3 TL values to copy.
Cost of copying 1 TL - 35ns, cost of resetting TLs - 20ns.

Worst case scenario for "no-op(keep)/copy" - 3 * 35ns = 105ns.
Amortised - 10% * 3 * 35ns = 10.5ns.

Worst case scenario for "reset/no-op(none)" - 12 * 20ns = 240ns.
Amortised - 10% * 11 * 20ns + 90% * 12 * 20ns = 238ns.

I expect that on practise fast path also will be the hot path. So its performance will have a deciding role in determining overall impact of the feature on app performance. So I prefer the option that gives consistency while optimising performance of the fast/hot path.

Also costs of "reset/no-op(none)" need to be paid, even if no TLs are used at all.

Because the availability is not going to be really reliable in any case, we should not be trying to make it slightly more reliable sometimes and adding a performance toll while doing so.

"no-op(keep)/copy" makes it reliable to use TLs within the scope of one task. TLs will be unreliable only if object is shared across several tasks, but this is not a problem unique for deinit's. Deinit's are challenging to work with because retains and releases are not visible and can be moved around by compiler optimisations. But these challenges apply to any actions in deinit body, and are not specific to usage of TLs.

I do understand that we don't want to promote writing unreliable code in deinit bodies, but if someone is already doing it, with "no-op(keep)/copy" it will continue working. With "reset/no-op(none)" existing code will break. Do we really want to force developers to re-write this code when they decide to adopt concurrency checking?

John_McCall · May 13, 2024, 5:15pm

In general, I have seen a lot of features that ought to be high-performance become low-performance because they’re burdened with too many interactions with other features. Decisions like this need to balanced by considering the value of the interaction and not just pile up overheads because the interaction seems nice to have. We have not seen any examples that would rely on seeing a task-local from a deinit that actually seem like good code.

John_McCall · May 13, 2024, 5:32pm

This 25-35ns figure for copying task locals also seems somewhat unrealistic since it requires allocation and deallocation. I suspect you're getting that by assuming that enqueuing an isolated deinit always requires creating a new task, so we can just allocate the task locals on the new task the same way we do for Task {}, but that's a significant amount of baseline overhead that I think we don't want in the long term. Ideally, enqueuing an isolated deinit should just involve adding the object pointer to some global queue without needing a separate allocation at all — certainly that's how we'd want to optimize it for deinits isolated to the main actor, which is likely to be the dominant use case for this feature.