You can find the updated proposal text here: Isolated synchronous deinit and asynchronous deinit.
Updates since the reviewed version:
- Scope of the proposal was extended to include async deinit.
- Synchronous deinit of GAITs is non-isolated by default and requires explicit opt-in, as requested by the review feedback. But `async deinit` inherits the isolation of the class. As described in the proposal, this helps to warn users if they are using the more expensive async deinit where an isolated synchronous one would be sufficient.
- Removed the assertion for escaping `self` from future directions, because it was implemented in [Runtime] Fatal error if self escapes from deinit · apple/swift@108f780 · GitHub.
- I've implemented resetting task-local values, but I still think that task-local values should be copied by default and that reset should be an opt-in.
That last point is what I would like this thread to focus on.
There are 3 alternatives regarding task-local values:

- Do nothing - task-locals will be available on the fast path of the isolated deinit, but not on the slow path of the isolated deinit, and not in async deinit.
- Make task-locals consistently available - keep them on the fast path, copy them on the slow path and for async deinit.
- Make task-locals consistently unavailable - add a stop-node on the fast path, do nothing on the slow path and for async deinit.
The current implementation in the branch supports all 3 in the runtime, and options 2 and 3 in the frontend. It is easy to support all 3; the main question is which one should be the default.
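To make the three alternatives concrete, here is a sketch. It uses the `isolated deinit` spelling and a hypothetical task-local `Log.requestID`; the exact spelling of the opt-in is subject to the proposal text, so treat this as pseudocode:

```swift
enum Log {
    @TaskLocal static var requestID: String = "none"
}

@MainActor
final class Session {
    // Hypothetical opt-in spelling for a deinit isolated to MainActor.
    isolated deinit {
        // Last release on MainActor → fast path.
        // Last release elsewhere   → slow path (a job is enqueued).
        //
        // Option 1 (do nothing):              "req-1" on the fast path,
        //                                     "none" on the slow path.
        // Option 2 (consistently available):  "req-1" on both paths.
        // Option 3 (consistently unavailable): "none" on both paths.
        print(Log.requestID)
    }
}
```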
Behavior
The second option offers the least surprising developer experience:

- no unexpected changes when adopting the feature
- no unexpected changes between different code paths
- no unexpected changes between debug and release builds
- no two cases to test: hopping vs non-hopping deinit
- it follows the precedent of `Task.init` vs `Task.detached`
It can be hard to predict where the point of the last release will be, but as long as it can be constrained to some broad scope, code in the `deinit` can assume that all task-locals set for the entire scope are available.
In some cases it might be a better practice to capture task-local values in `init()`, store them as object properties and re-inject them in `deinit`. But it is not always possible to know which task-local values are used inside the `deinit`. For example, it may call functions from closed-source frameworks, or functions hidden behind type erasure. In some cases it may not be possible at all, if task-locals are tunneled through the call stack as an implementation detail, with the actual task-local declarations being inaccessible.
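The capture-and-re-inject pattern mentioned above can be sketched like this, using today's `TaskLocal` API (the `Log.requestID` task-local and the `Resource` class are illustrative names, not part of the proposal):

```swift
enum Log {
    @TaskLocal static var requestID: String = "none"
}

final class Resource {
    private let capturedRequestID: String

    init() {
        // Capture the task-local value visible at init time.
        capturedRequestID = Log.requestID
    }

    deinit {
        // Re-inject the captured value, so cleanup code sees it
        // regardless of where the last release happens.
        Log.$requestID.withValue(capturedRequestID) {
            print("cleaning up request \(Log.requestID)")
        }
    }
}

Log.$requestID.withValue("req-42") {
    _ = Resource()  // deinit runs here, prints "cleaning up request req-42"
}
```

As the text notes, this only works when the set of task-locals used inside the deinit is known to the author of the class.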
There are no compile-time checks for the availability of non-default task-local values. Checking the correctness of code that uses task-local values relies on testing. Having different behavior depending on the point of the last release can lead to bugs which are easy to miss during testing and hard to reproduce once reported.
Task-local values are a tempting instrument for injecting loggers or similar infrastructure. If task-local values are accessed only in exceptional situations - e.g. for logging assertion failures or runtime errors - the chances of missing a task-locals-related problem increase even further. But those are exactly the use cases where it is critically important for logging to work.
And finally, choosing the default behavior does not exclude the other options. Advanced users can use attributes to customise behaviour of the deinit in performance-critical code. The current proposal text describes a new attribute `@resetTaskLocals` which selects option #3, because it was needed to run benchmarks. But it is easy to add another one for option #1. This part of the proposal is likely to be updated based on the feedback.
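As a sketch of how the opt-in might look (the placement of the attribute on the deinit, and the `isolated deinit` spelling, are my assumptions based on the description above; treat as pseudocode):

```swift
@MainActor
final class HotPathObject {
    // Option #3: insert a stop-node so the deinit body never observes
    // task-local values from the releasing context.
    @resetTaskLocals
    isolated deinit {
        // Task-locals read here return their default values.
    }
}
```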
Performance
To have a well-informed discussion about this topic, I've made a benchmark of performance of isolated and async deinit.
Here is a short summary:
| Case | Cost |
|---|---|
| Fast path allowing task-local values | 15-20ns per object |
| Resetting task-local values on fast path | 20ns per object |
| Slow path not copying task-local values | Enqueueing: 115ns per object<br>Dequeueing: 140ns per object |
| Copying task-local values on slow path | Enqueueing: 30-35ns per value per object<br>Dequeueing: 20ns per value per object |
| Async deinit not copying task-local values | Enqueueing: 270ns per object<br>Dequeueing: 315ns per object |
| Copying task-local values in async deinit | Enqueueing: 25-27ns per value per object<br>Dequeueing: 20ns per value per object |
Note that for cases involving queueing there are two numbers, which cannot simply be added up together. If jobs are enqueued faster than they are dequeued, then overall performance is fully determined by the dequeueing, and extra costs during enqueueing do not matter. And vice versa.
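In other words, with a producer and a consumer running concurrently, sustained throughput is bounded by the slower of the two stages. A toy model, using the slow-path numbers from the table above:

```swift
// Per-object costs in nanoseconds for the slow path without copying.
let enqueueNs = 115.0
let dequeueNs = 140.0

// In steady state the cost per object is the maximum of the two stages,
// not their sum: the faster stage just waits on the slower one.
let steadyStatePerObject = max(enqueueNs, dequeueNs)  // 140.0
```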
Numbers for the fast path and for dequeueing are incremental costs, compared to executing a synchronous non-isolated deinit. To understand how long a job takes to be dequeued and executed for a specific use case, these costs need to be added to the duration of the deinit itself. E.g. if a slow path job is scheduled copying 3 task-local values, and the dequeued job runs a deinit costing 100ns, then in total it will take `115ns + 3 * 35ns = 220ns` to schedule and `240ns + 3 * 20ns + 100ns = 400ns` to dequeue and execute.
When destroying a tree of objects whose synchronous deinit is isolated to the same actor, the slow path is taken only once - for the root object. For the rest of the objects the last release happens on the expected actor and the fast path is taken.
Copying task-local values adds extra cost on the slow path, but is free on the fast path. Resetting task-locals (adding a stop-node, or even just checking whether a stop-node needs to be added) is free on the slow path, but comes with a cost on the fast path. If we assume that the fast path is taken significantly more often than the slow one, then consistently copying task-local values is faster than consistently resetting them.
If we consider the common case of deleting a tree of objects with synchronous deinit isolated to the same actor, then copying task-locals happens only for the root node (and only if the root is released from the wrong actor), while the cost of resetting is paid for each of the N (or N-1) objects in the tree, regardless of where the last release happened. If we decide to consistently copy, the extra cost is proportional to the number of task-locals (`V`), but does not depend on the number of objects. If we choose to consistently reset, then the extra cost is proportional to the number of objects (`O`) and does not depend on the number of task-local values. We are comparing `Sometimes(V * 35ns)` vs `O * 20ns`. Assuming that the chance of taking the slow path is low and the number of objects with isolated deinit is higher than the number of task-local values, consistently copying is faster than consistently resetting.
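To put illustrative numbers on that comparison (the 5% slow-path probability, V = 3 task-locals and O = 100 objects are assumptions, not measurements):

```swift
let slowPathProbability = 0.05
let taskLocals = 3.0   // V
let objects = 100.0    // O

// Consistently copy: paid once per tree, and only when the root
// happens to take the slow path.
let copyCostNs = slowPathProbability * taskLocals * 35.0  // ≈ 5.25ns per tree

// Consistently reset: paid for every object on every destruction.
let resetCostNs = objects * 20.0                          // 2000ns per tree
```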
For async deinit there is no fast path, so the logic above does not apply. But based on the discussions so far, there seems to be far more demand for isolated synchronous deinit than for async deinit. So it makes sense to make decisions prioritising isolated synchronous deinit.
Also, async deinit has more than double the costs of isolated synchronous deinit, so the relative cost of copying task-local values is lower.
One of the concerns raised in the review was that copying task-local values might prevent future optimizations for reusing the current task for async deinit of child objects. Release of the child objects happens from compiler-generated code which does not interfere with task-local values. And any task-local values added inside the body of the deinit would be removed inside the body as well. So if the last release of a child object starts a new async deinit, it would need to be executed with the same set of task-local values as the parent deinit, and thus can be executed in the same task.
If async deinit is initiated from the body of the deinit, then potentially it could happen with added task-local values. Technically it is possible to store only the incrementally added task-local values and still reuse the current task. But the current task can be reused only after the entire deinit completes. If a parent async deinit suspends after initiating another async deinit, then reusing the task would cause the second deinit to wait unnecessarily, underutilising the hardware. So probably tasks should not be reused in this case.