[Sub-Pitch] Task-local values in isolated synchronous deinit and async deinit

FranzBusch · May 14, 2024, 8:47am

I am not sure I agree with that statement anymore after having learned the pain of deinit based cleanups. The problem is you start by doing it and it mostly works but then it suddenly fails and it becomes incredibly hard to track down. A common example is doing resource cleanups for file descriptors in deinit. Most systems impose an upper limit of open FDs for a process. If you start closing the FDs in deinits you are bound to run into an FD exhaustion error in busy processes. I agree that this doesn't happen often in client code but it in server code this is a common thing.

You are right that ~Copyable types bring us closer to a solution. The "only" remaining problem that we found is that deinits on ~Copyable types can't be async either so it we either need to have explicit consuming async methods to clean up or we stick with the with style methods.

I don't think this is true. While the priority might be the same, the job still get's enqueued on the executor. The executor might have a long queue with higher priority jobs which might spawn an indefinite amount of higher priority jobs delaying the deinit indefinitely.
In regular systems you are "just" at the mercy of the scheduler. You are right that the scheduler might suspend your thread right at the time where it dropped the last reference and was about to run the deinit. However, I think that scheduler suspension and executor priorities are completely different beasts here.

Nickolas_Pohilets · May 14, 2024, 8:56am

Decision about copying or not copying applies only to the slow path. These numbers describe the difference in performance between the slow path which copies values vs slow path which does not copy. Slow path creates an ad-hoc job, but does not create a full-featured task. Latest version which was measured indeed allocates TL nodes using TaskAllocator backed by a buffer of precisely calculated size. Job and buffer for TaskAllocator are allocated together, as a single allocation. But IIRC, this optimization was not making a significant impact. Two important optimizations that did make a difference were Fix quadratic performance of the `ListMerger` in specific usage pattern by nickolas-pohilets · Pull Request #70910 · apple/swift · GitHub and using llvm::DenseSet<> instead of std::set to copy values only once. If you are curious, I can re-run benchmarks using default allocator.

Current implementation requires that queue item stores at least two words - one is the object pointer, and another one is the pointer to isolated deinit function. In the current implementation VWT contains pointer to the isolating deinit, but isolated one is not reachable from VWT. To use only the object as a queue entry, we would have to add new entry to VWT. Or call isolating deinit, redundantly repeating actor check.

I was thinking of using object pointer as a Job, but that would mean that object layout is extended by bytes which are used only when it is deinitialised.

Nickolas_Pohilets · May 14, 2024, 9:46am

In the "Future Directions" there is a section about linear types. I think named consuming async methods are the answer, if you want to await for explicit deinitialization. But compiler should be checking that they cannot be bypassed. For that we need something like "deleted" deinit. I think "deleted" deinit and isolated deinit are two different features, which can co-exist and each have their own uses.

FranzBusch:

Nickolas_Pohilets:

Job for isolated deinit runs at the same priority as the code that performed last release. And if system exists in condition where deinit job may never run, this means that last release also may never run. Such system would be accumulating unfreed resources even with regular deinit.

I don't think this is true. While the priority might be the same, the job still get's enqueued on the executor. The executor might have a long queue with higher priority jobs which might spawn an indefinite amount of higher priority jobs delaying the deinit indefinitely.
In regular systems you are "just" at the mercy of the scheduler. You are right that the scheduler might suspend your thread right at the time where it dropped the last reference and was about to run the deinit. However, I think that scheduler suspension and executor priorities are completely different beasts here.

I was not talking about OS scheduler suspending threads, but about job priorities in the cooperative executor.

Code performing last release is also part of some job, which also get's enqueued on the executor, and with the same priority. So if executor has a long queue with higher priority jobs, then release can also be delayed indefinitely. Though, there will be a difference if high-priority jobs are created from the low-priority job that performs release.

Jumhyn · May 14, 2024, 10:01pm

The LSG discussed this proposal today in order to determine what we would consider bringing to review, mostly focused on what seems to be the primary remaining point of consternation: the default behavior of task locals. The LSG decided that in order for this proposal to come back for a second round of review, the proposal should remove any guarantees around the availability of task locals within an isolated deinit, regardless of how the call is scheduled. This is the option that 'does the least' both from a performance standpoint as well as from the standpoint of what promises the language makes.

The LSG acknowledged that it is of course possible to construct setups which would potentially exhibit meaningful behavior changes upon adopting isolated deinit, but believes that actual real-world use cases for reliable task local values in isolated deinit would be exceedingly rare and felt that the potential performance impact of messing with the task locals was the weightier concern. And moreover, because isolated deinit requires explicit adoption (as requested after the previous review), users who are concerned about having reliable access to task local values within their deinit will not have behavior change on them unexpectedly.

The LSG also noted that beginning with the behavior that makes fewer guarantees is the more conservative option: if the proposal is accepted, and if usage isolated deinit bears out that there is a strong demand for reliable task local behavior, it would potentially be an option to apply different behavior by default since that would be a 'strengthening' of the contract.

John_McCall · May 14, 2024, 11:23pm

Let me just add that, if there are compelling examples showing the importance of using task locals from deinits, the LSG is certainly open to hearing about that as feedback during the review.

Nickolas_Pohilets · May 15, 2024, 10:30am

Thanks for the input, I’ll update the proposal.

I can think of one more reason in support of keeping TLs undefined in deinit: proposal allows isolation to be implemented in Objective-C superclass by overriding release. Such implementation is likely to be unaware of TLs and would neither copy nor reset them.

@John_McCall, could you share a bit more details about this idea? I suspect it can affect the ABI of the runtime functions used by the feature.

Karl · May 15, 2024, 3:59pm

There's a C++ paper, Adding async RAII support to coroutines which summarises why not spawning a job isn't a great solution:

Can we just allow destructors to be coroutines/asynchronous?

No. Doing so would effectively bifurcate the C++ type-system and break a lot of existing code.
Let's imagine that we allowed some destructors to be made asynchronous. eg. the destructor returned an awaitable type that was implicitly awaited when the object was destructed.

These types would be viral in nature. Classes that contained non-static data-members of types that have asynchronous destructors would themselves need to have asynchronous destructors.

[...]

Such types would not be allowed to be placed as local variables in a normal function since the destructor must be awaited which can only be done from a coroutine.

These types would not be able to be placed in standard containers like std::vector since they need to be able to call the destructors from non-async methods like .resize() and .pop_back().

There is too much code that relies on the fact that destroying an object is a synchronous operation and making destructors potentially asynchronous would make a lot of existing generic code unable to be used with these types.

If we brought deinitialisers in to structured concurrency, it would bifurcate the Swift type system as well. They need to be unstructured so they can continue to appear synchronous.

I think it also allow actors to evolve more easily, doesn't it? A nonisolated deinit could become isolated later (or vice versa) since they both just look like a synchronous call to the outside world.

johannesweiss · May 16, 2024, 2:19pm

I have one main concern with isolated deinits: It introduces GC-style 'do it later' memory reclamation (*) without backpressure into the language.

This system works okay, if the garbage producer (i.e. the piece of code that lets the reference go to ref count 0) is slower as the garbage consumer (i.e. the actor's executor that'll run the isolated deinit). There is no backpressure which means that even if the garbage consumer is overwhelmed, the garbage producer will not be slowed down. That's common in GC'd systems but is usually not the case with ARC. In ARC, the deinit usually run synchronously and in line with the place that decrements the reference to 0. So assuming the backpressure works otherwise, ARC won't break it.
After introducing isolated deinit we could however get to a place where the garbage consumer cannot catch up with the garbage producer and we will start to uncontrollably accumulate memory, possibly until an OOM kill (**).

And of course, with ARC it's absolutely possible to either DispatchQueue.async inside of a deinit or a user could attempt to run the deinit in a DispatchQueue.async by carefully arranging the code in a way that makes it likely for the deinit to be triggered in a DispatchQueue. But that is done by explicitly and by calling library functions (as opposed to using a language feature). So this isn't a new problem, but it would be a new problem if we consider only the language itself.

The combination of automatic reference counting and deterministic deinitialization makes deinit in Swift a powerful tool for resource management

Yes, ARC and is deterministic, although this is very rarely a property that can usefully be exploited by a programmer. Once references are shared across threads, it's usually entirely unpredictable when and where deinit is run. Yes, the places where swift_retain and swift_release are called are absolutely knowable but in all but the most trivial cases it's unknowable which swift_release will actually deinit and what thread that is on.

What plain ARC gives you however is that once the last swift_release is called, deinit will run immediately, inline and guaranteed. No implicit and hidden buffers that hold onto resources until some unknowable later point in time.

Regarding "powerful tool for resource management": This applies to the narrowest set of resources: The ones that can synchronously be released, on any thread and are abundant enough that no program would need to know the amount currently allocated. This includes memory and really not much else. (Yes, for ~Escapable & ~Copyable resources that can be synchronously destrutured this is different but I consider this a very narrow field of resources).

If we do go ahead and introduce isolated deinits into the language, then the system becomes non-deterministic even in trivial-looking cases like the following:

@MainActor class Foo { var number: Int; isolated deinit { print(self.number) } }

func makeTwoFoos() async {
    // Trivial example that can shows ARC's determinism
    // On global default executor, not main actor
    precondition(!Thread.isMainThread)
    let two = Foo(number: 2)
    let one = Foo(number: 1)
    _ = consume one
    _ = consume two
    print("3")
}

This without isolated deinit is guaranteed to print (1 then 2 then 3). But with isolated deinit it might print the numbers in any order I believe. That's because (as I understand it) executors do not guarantee order. Not with regards to task enqueuing order or of course with regards to the caller's progress. So 1 then 2 then 3 is possible, as is 3, 2, 1 or any other permutation. This is not true today.

(*) I do not consider re-claiming most non-memory resources in deinit sensible or workable at a reasonable scale. I have seen many systems that attempt to do so and they usually regret this choice, eventually. In many, many cases this also violates structured concurrency because network connections/threads/... etc cannot typically be reclaimed synchronously without blocking a kernel thread hence scheduling something like a network connection to be closed at a later point is a strucutured concurrency violation (the structure (the scope) has ended, yet we're still doing work in the background).

(**) Extremely crude program that I don't think adds very much to this discussion but just in case

It's probably pretty clear that uncontrollable memory grow can appear if the actor that something's isolated to can't catch up running the enqueued deinits but just in case it's not. I (well mostly claude.ai -- my SwiftUI is ~non-existent) wrote an extremely crude, unrealistic, totally constructed program to demonstrate this. In the absence of isolated deinit I'm using DispatchQueue.main.async { ... } inside of deinit as a crude way to emulate it.

If compiled & run with swiftc -o /tmp/test test-sync-enqueue.swift && echo GO && /tmp/test then we can see how it first runs smoothly with stable memory, same after one click in "Throw Confetti" but on the second click it starts ballooning memory. When it starts ballooning memory (because it keeps enqueuing more deinit work faster than it runs) will depend a lot on your machine as well as if it's run from the command-line or from a UI app and what not. I think it depends on the priority of the main thread vs. the garbage producing thread.

John_McCall · May 16, 2024, 6:31pm

I agree that systems that do more than reclaim memory (or similar implementation-level resources) in deinit typically end up regretting the design. I'm okay with isolated deinit despite that concern as a sort of bridge; people who've put themselves into this situation do need some ability to solve it in the short-to-medium term, and it serves as a sort of lightning rod to which we can attach our recommendations to not rely on it in new designs.

johannesweiss · May 16, 2024, 7:25pm

Works for me, let's make sure we document it appropriately as a stop-gap solution to a particular problem. We've too often lead people astray by not actually documenting what they should/should not use some facility (like (isolated) deinit) for. Introducing a new thing can easily come across as "You should use this, it's great"

FranzBusch · May 16, 2024, 7:26pm

I agree with this statement in positioning isolated deinit as a stop gap solution for people that are already using deinits. Could we update the proposal to reflect this in the motivation section instead of positioning it as a great solution for resource management. This will allow us to refer people to the proposal in the future that clearly states this as a bridge.

ktoso · May 16, 2024, 9:41pm

I think it's absolutely fine to have the feature, but strongly document (including in TSPL, which should cover this feature) that it's not intended for resource management and we need to explain the tradeoffs there. Some folks or APIs got themselfes into a corner with deinits so this helps them, but it's not a general "use this, it's great!" as you said Thanks for the good writeup @johannesweiss, can leverage some of that as we do the docs for it

Jon889 · June 13, 2024, 1:38pm

I'm not sure I understand much here but as a practical example would removing an observer in deinit be considered a valid non stop gap use of deinit? (eg for UIControl target action) If not what is the actual best practice to do remove an observer when it not longer exists?

John_McCall · June 13, 2024, 5:21pm

I wouldn't advise only removing an observer in a deinit. The most important reason is that you probably want to stop observing immediately when the observer loses its reason to exist (e.g. when its associated UI is closed), rather than having to wait for the last reference to the object to disappear. Having an explicit teardown that removes the observer will also break any reference cycles which might have been introduced, which tend to be common with observer patterns; that lets you write code that's much simpler and easier to reason about than the alternative of using weak references in core places.

Having a fallback that removes the observer during deinit in case the teardown doesn't happen isn't a bad idea, although of course it may be pointless if the observed entity holds a strong reference to its observers.