[Pitch] ~SyncConsumable

In a similar vein to Copyable and Escapable, I suggest the addition of SyncConsumable, an implicit protocol indicating that a value of a type may be consumed in a context that is not async.

"Consumed" refers to any operation that may dereference a class or call deinit on a non-copyable struct. This may include:

  • Defining a local variable and letting it go out of scope without capturing it or passing it to a consuming function
  • Receiving a value as a non-borrowing parameter, and letting it go out of scope without capturing it or passing it to a consuming function
  • Using the consume operator directly
  • Assigning a value to a class or struct member, or to a global variable of the type, overwriting the previous value
  • Having a class or struct member of the type. The containing class or struct is considered a "sync context" if it implements SyncConsumable.

When a type does not implement SyncConsumable, no operation similar to the above is allowed in a context that is not async.

In exchange, such a type may use await in its deinit, if it is allowed to have deinit, i.e. if it is a class or a ~Copyable struct.
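A minimal sketch of how this could read, assuming a hypothetical `deinit async` spelling alongside the pitched `~SyncConsumable` (none of this compiles today; `gracefulShutdown` is invented for illustration):

```swift
// Hypothetical feature sketch.
final class Connection: ~SyncConsumable {
    func gracefulShutdown() async { /* invented cleanup work */ }

    deinit async {
        await gracefulShutdown() // suspension is now allowed in deinit
    }
}

func asyncWork() async {
    let conn = Connection()
    // ...
}   // ok: conn is consumed in an async context; its deinit is awaited here

func syncWork() {
    let conn = Connection()
}   // error: a ~SyncConsumable value cannot be consumed in a non-async context
```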

Use cases

There are many cases where being able to await inside a deinit is useful. The following are just a select few.

Certain stream formats, like gzip or AES-XTS, need to be able to perform a "final write" after the last data has been received. In a sync context, it's possible to create a "gzipped file" class that receives regular writes and automatically adds the final write when it is closed. It can also guarantee being closed when going out of scope, removing the chance of leaking the resource or leaving a corrupt file. Without being able to await inside deinit, however, it's not possible to translate this pattern to async.
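The gzip case might sketch out like this (the `AsyncFileHandle` and `GzipStream` types, their methods, and the `deinit async` spelling are all invented for illustration):

```swift
// Hypothetical sketch of the "gzipped file" pattern translated to async.
struct GzipFile: ~Copyable, ~SyncConsumable {
    private let handle: AsyncFileHandle   // invented async file API
    private var compressor = GzipStream() // invented incremental compressor

    mutating func write(_ data: [UInt8]) async throws {
        try await handle.write(compressor.compress(data))
    }

    deinit async {
        // The "final write" that terminates the gzip stream can await,
        // so going out of scope never leaves a truncated or corrupt file.
        await handle.write(compressor.finish())
        await handle.close()
    }
}
```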

AsyncIteratorProtocol does not have an explicit cancel, break, or return function, to indicate no more elements are desired. If it's drawing elements from some resource, that resource needs to be cleaned up in the deinit function. However, this is currently only possible if the clean up does not, itself, require using await.

A class may want to keep a background task running during its lifetime. On deinit, it should cancel the task, but it should also await it to ensure cancellation was processed correctly and no resources were leaked. Currently, the closest option is to expose the background task (or have the class inherit from Task), and async let it from the context where the class is created. This creates room for error if one forgets the async let, as not leaking the task would not be guaranteed at compile time.
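The background-task case could sketch as follows (again with the hypothetical `deinit async`; the periodic work is a placeholder):

```swift
// Hypothetical sketch: owning a background task for the class's lifetime.
final class Monitor: ~SyncConsumable {
    private let task: Task<Void, Never>

    init() {
        task = Task {
            while !Task.isCancelled {
                // placeholder for real periodic work
                try? await Task.sleep(for: .seconds(1))
            }
        }
    }

    deinit async {
        task.cancel()
        await task.value // the task can never be leaked: deinit awaits it
    }
}
```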

Adding to both of the above, this can be used to create generators. The generator class would simply create a background task that yields elements, and then awaits until the next element is requested. On deinit, a signal needs to be sent to the background task to cleanly close, and then it must be awaited. A simple cancel may prevent cleanly releasing any held resources.

It could also be used for wrapping withTaskGroup in a struct that's ~Copyable and ~SyncConsumable, thereby removing the need for a closure. This could allow, for example, using a task group from inside a loop while still being able to break out of said loop, which would result in deinit being called on the value containing the task group. This would, in turn, cause all tasks in the group to be awaited.
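A sketch of that wrapper (the type, its methods, and `process`/`shouldStop` are invented for illustration):

```swift
// Hypothetical: a task group owned as a value rather than via a closure.
struct OwnedTaskGroup: ~Copyable, ~SyncConsumable {
    mutating func addTask(_ body: @escaping @Sendable () async -> Void) { /* ... */ }

    deinit async {
        // await all remaining child tasks, mirroring what withTaskGroup
        // does implicitly at the end of its closure
    }
}

func scan(_ urls: [URL]) async {
    var group = OwnedTaskGroup()
    for url in urls {
        group.addTask { await process(url) } // process(_:) invented
        if await shouldStop() { break }      // shouldStop() invented
    }
}   // group's deinit runs here, awaiting every task that was added,
    // whether the loop completed or was broken out of
```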

4 Likes

What’s wrong with the current way to do this, which is to create a new Task in the deinit?

My understanding is that an issue arises with a non-sendable ‘self’ being captured by that task. I would imagine some form of a ‘sending’ capture for the closure would solve that, since that ‘self’ is known to be the only unique instance of it within that deinit.

The issue is that you don't know when the task would actually complete. You're basically extending the lifetime of everything held by the matching class/struct, and introducing potential race conditions, as the deinit now runs concurrently with the outer scope. You're losing every bit of scope guarantee deinit usually provides.

Put more simply, it's no longer an equivalent to defer.

This is especially the case if dealing with ~Copyable struct, rather than a class. You have to give up the guarantee of no memory leftovers at the end of scope.

This might be interesting, but I also get the impression that, for many use cases involving async resource cleanup, any sort of implicit destruction isn't expressive enough for the full range of needs; sometimes cleanup can't happen at end of scope, but has to be scheduled in a particular way, or it can raise errors that need to be handled and retried. For those sorts of cases, a ~Deinitable property, which would mean that the type is not allowed to be implicitly destroyed at all and must be explicitly given to a consuming operation, might be the ultimate need. It'd be interesting to hear how much middle ground an implicitly-invoked async deinit could cover. In either case, you end up imposing many of the same restrictions on values of the type: the value can't be owned by anything that isn't either async-deinitable itself or able to give the inner value to a synchronously consuming operation in the owner's own deinit.
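A quick sketch of what such a ~Deinitable type might demand (the `~Deinitable` spelling and all names are hypothetical):

```swift
// Hypothetical: a value that may never be implicitly destroyed.
struct Transaction: ~Copyable, ~Deinitable {
    consuming func commit() async throws { /* ... */ }
    consuming func rollback() async { /* ... */ }
}

func transfer(_ t: consuming Transaction) async throws {
    guard balanceSufficient() else {  // balanceSufficient() invented
        await t.rollback()            // explicit consume on this path
        return
    }
    try await t.commit()              // explicit consume on the happy path
    // falling out of scope without consuming `t` would be a compile error
}
```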

8 Likes

Capturing self in the unstructured task is a soundness problem, but beyond that, spawning any unstructured work in deinits can open up remote denial-of-service attacks in applications.

During the second review of SE-0371: Isolated synchronous deinit, @johannesweiss and I brought up our concerns with how implicit spawning of such unstructured work in deinits is dangerous. The same arguments apply to explicit spawning.

In general, I would love to see us explore ~SyncConsumable as a language feature. ~Copyable already brings us pretty close except that we cannot model asynchronous clean up. Types such as file descriptors, sockets or other resources that need async cleanup need to resort to using with-style methods instead.

One area to explore more is whether a hypothetical deinit() async should really be implicitly awaited, or whether it needs to be explicitly awaited. In a similar vein, what about throwing deinits? My experience is that most async methods also want to become throws at some point.

The Rust project has an async drop initiative where they went through a similar exploration.

3 Likes

I'm having issues mentally visualizing this part. If a cleanup "has to be scheduled", then whatever scheduler is used can be used from a deinit, async or otherwise. I don't see how that interacts with anything related to scope.

By contrast, needing to handle errors and retry a deinit is something that doesn't seem doable. Even if you went the ~Deinitable route, you'd still need to somehow signal that a consuming method might "fail to consume". Although, from the get-go, I have trouble picturing what "fail to consume" would even mean, what use cases might trigger it, and what recourse the calling code might possibly have.

I feel like if it's possible to "retry" a consumption, then it should simply be a loop inside the deinit, not something leaked to the containing scope. Even if we consider a throwing deinit (which is a completely separate topic), it would normally be expected to interpret a throw the same way it would interpret a return, i.e. the deinit should be considered done.

The one situation in which I can see this becoming an issue is the handling of cancelled tasks. However, since deinit is most likely to be called after a cancel, it should probably do its best to ignore the cancel, given that it is needed for performing the post-cancel cleanup. Perhaps it could even run the main logic by awaiting a detached task, so any leftover IO is not interrupted by cancellation checks.

A consuming method can throw and give the value back as part of the thrown error if consumption failed, or produce a Result<(), FailedToConsume<Self>> or something like that.
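In Swift terms, that might sketch as follows (the error type, the method, and their conformances are invented, and some of this isn't expressible in today's language):

```swift
// Hypothetical: a consuming operation that hands the value back on failure.
struct FailedToConsume<Wrapped: ~Copyable>: Error {
    var value: Wrapped          // the unconsumed value, returned for retry
    var underlying: any Error   // why consumption failed
}

extension File {
    consuming func tryClose() -> Result<Void, FailedToConsume<File>> {
        // on success the value is consumed; on failure it rides
        // back to the caller inside the error payload
        /* ... */
    }
}
```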

1 Like

I feel like this is leaking into throwing deinit. Also, the more I try to visualize what code that retries consumption looks like, the more it feels like an anti-pattern:

var needsConsume = NeedsConsume()
theLoop: while true {
  switch tryToConsume(needsConsume) {
  case .success:
    break theLoop
  case .failure(let mustReconsume):
    // The compiler needs to know this part is mandatory. In particular, using try? would be a compiler error, because it discards mustReconsume
    needsConsume = mustReconsume // The compiler needs to understand that this is okay because needsConsume is already consumed, and that this "consumes" mustReconsume
  }
  // The compiler needs to know that needsConsume is necessarily unconsumed here, and it's okay to loop
}
// The compiler needs to understand that if we got here, then needsConsume is consumed.

This feels like a heck of a lot of stress on the compiler, for something that doesn't even seem like remotely readable code.

In contrast to a throwing-and-retriable deinit, an async deinit is something that is a lot more natural. It's just an equivalent to the way things would be done in sync, e.g. if you were using sync IO.

Well, a deinit in the sense of an implicitly-invoked cleanup can't ever really fail or produce errors, at least not without creating ugly language design challenges for dealing with things like double faults, where a value gets implicitly destroyed because of an error that already occurred and a second error piles on top of it. And implicitly-invoked cleanups that occur in the absence of programmer intervention have a chance of happening in the "wrong" place; this is already true to some degree of synchronous deinits, but there are more variables at play when talking about asynchronous cleanups. My question was: since ~SyncConsumable already brings with it many of the same restrictions on values as fully manual cleanup would (you can no longer do simple synchronous reassignment, because that involves implicitly destroying the old value; values cannot be members of synchronously-deinited types unless they're manually synchronously consumed; etc.), how useful would it be, as opposed to going all the way to that fully manual cleanup model?

2 Likes

It would be useful because if you're inside an async function, all async consumptions happen transparently, and exactly in the same fashion as they would happen in a sync function. Basically, it allows equivalent async and sync code to be nearly identical except for the addition of a few awaits. I'm, of course, talking about the situation where you have a choice of sync vs async operations at the lowest level, e.g. sync file IO vs async file IO.

Yes, if you try to use it inside a sync function, things become overly complicated, if not outright impossible. That's kind of the point: to discourage use inside a sync function.

By contrast, making the async consumption explicit makes the code very different from its sync equivalent. Consider this thought experiment: what if deinit never existed at all, and instead you could declare certain structs/classes as requiring consumption through an explicit consuming method? Imagine how much more convoluted that would make basically all code. You'd have to make sure you pre-consume a variable before assigning to it, as well as manually consume any variable or parameter before it goes out of scope.

Now consider that currently, all of this difference applies to any class or struct that fundamentally operates asynchronously, minus the ability to force the consumption. Forcing the consumption merely brings you to what sync would look like without deinit, and the same forced consumption paradigm. It would just make code needlessly verbose and convoluted.

P.S. On sync consumption happening in the "wrong" place: I've been following the discussion there, and I feel like the described issue partially stems from (mis)use of weak references, is cleanly solved by withExtendedLifetime, and might be even better solved by a ~EarlyConsumable or @no_early_consume class/struct attribute, which would allow reimplementing withExtendedLifetime as a ~Copyable struct instead.

I don't have much to back this up, but I've long thought that types which must be explicitly consumed (i.e. ~Deinitable), especially with good error handling affordances, might feel nicer than deinit.

Something like this:

var f = open("hmm.txt")
defer { f.close() }

where the compiler verifies that f.close() or another consuming method is called on every branch. It's more verbose, but I like how it reads. It's immediately clear what's going on.

It would be harder to do this with reference counted types – there's no lexical scope where an instance is guaranteed to be dropped. But for non-copyable types, this is (arguably) nicer than deinit, and to @Joe_Groff's point, definitely more general.

2 Likes

Reference counted types aren't your only problem. Nor is the fact that you lose early consumption: the ability for the compiler to run the deinit immediately after the last use of a variable, instead of at the end of the scope.

No, your real issue is that there are two additional common forms of consumption:

class A {
  deinit {
    print("A deinit")
  }
}
class B {
  let a = A()
  deinit {
    print("B deinit")
    // Implicit consume of `a` here
  }
}
var b = B()
let c = B()
print("before")
b = c // Implicit consume of `b` here
print("after")
// Consumption of `c` is deferred to here, although early-consumption could optimize it to happen before "after" is printed

The above would print:

before
B deinit
A deinit
after
B deinit
A deinit

Or, if early consumption applies:

before
B deinit
A deinit
B deinit
A deinit
after

Now imagine that each of the above has to correspond with an explicit function call, instead of an implicit deinit call, if you try to do deinit-less code. It's a mess.

I don't believe this is correct. You don't need to use defer. If you want an early consume, you can call the consuming method or function as early as you want in whichever codepaths make sense.

Looking at your examples: if B required explicit consumption, but A didn't, a could still be implicitly consumed after b is explicitly consumed. If they both required explicit consumption, then you're right that the method that consumes B would have to explicitly consume a, but you'd only have to explicitly consume it once:

extension B {
    consuming func cleanup() {
        a.cleanup() // or some such
    }
}

But even though I like explicit consumption and might use it a lot, there's no reason any type would have to adopt it unless they needed it, and my guess is most types wouldn't, and would continue to use deinit.

For the second example, you're right that you'd have to explicitly consume b before doing b = c. Here's the code with more concrete types and names:

var foo = File(reading: "foo.txt")
let bar = File(reading: "bar.txt")
foo.close() // required if File is ~Deinitable
foo = bar

The question is how often are you going to be doing something like this with types that must be explicitly consumed? Obviously I picked an example where doing foo = bar is kind of silly. But closing file handles is an actual case where I could imagine this being used, and it's harder (though not impossible) to imagine a situation where you'd want to do the foo = bar assignment with a type that owns a unique resource like a file descriptor – it usually makes sense for each value with a unique resource to have a unique name that doesn't change. My guess is you'd end up with something more like this:

var foo = File(reading: "foo.txt")
defer { foo.close() }

let bar = File(reading: "bar.txt")
defer { bar.close() }

It's definitely more code, and that's a cost. But there's a benefit too: Swift is a big, complicated language, and deinitialization is a particularly complicated part (at least for me). The ship has sailed on keeping Swift simple, but finding ways to keep it from getting more complicated is valuable.

It seems like ~Deinitable would handle both the async deinit and throwing deinit use cases (and maybe others? it's pretty powerful), all while being simpler to understand because you end up with straight-line cleanup code and don't add new deinit rules. For me, the added lines of code are worth it for the simplicity and generality.

One other potential benefit of explicit consumption: you can provide multiple different consuming methods: e.g. in the gzip example, you could provide both close() and asyncClose() so that it's still possible to use GzipFile in a non-async context.
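For instance (the method names are invented, and this assumes the hypothetical ~Deinitable model discussed above):

```swift
// Hypothetical ~Deinitable gzip wrapper with two explicit consumers.
extension GzipFile {
    // async cleanup for async callers
    consuming func close() async throws {
        try await writeFinalBlockAsync()
    }

    // blocking variant, so the same type remains usable from sync code
    consuming func closeBlocking() throws {
        try writeFinalBlockSync()
    }
}
```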

One note from earlier:

It's possible I'm missing some subtlety here, but in a type that requires explicit consumption, if you could have done your retries in a loop inside deinit, you should be able to do the same thing in a consuming method.

Which you have to do manually, i.e. you have to take on the role of a compiler and an optimizer. I mean, it's also possible to do loop unrolling manually, but it's usually a bad idea, as it makes the code less maintainable.

Generally, you want the compiler to find optimization opportunities for you, because if the code changes, it can easily remove and reapply an optimization automatically. However, while a compile error will force you to convert an early consumption to a defer if a later use is added, the same does not apply if the later use is removed. You have to constantly keep it in the back of your mind, "This consumption can be moved to earlier".

Yes, you'd also have to consume a, once. And if B also has members c, d, and e, then you have to consume each of those, and so on all the way down the tree. Basically, an extra line for every member definition.

This conversation started from deinit not being available for async versions of classes that use resources that are blocking if used sync, e.g. IO.

~Deinitable was suggested as an alternative, not in addition, so deinit would basically not be available. While backwards compatibility demands that deinit remain available to sync code, you have to imagine deinit being completely removed in order to visualize what coding without it feels like to someone coding async.

Yes, you can say "but async is only going to be for things like File where you don't usually do assignments":

And that would be wrong. async is contagious. If you have an async File, then it can leak down through multiple classes, down to a business logic class that simply holds, as a member, some other class that indirectly uses a file for logging, or something along those lines. If every async resource is also a "must be explicitly consumed" resource, you're going to have a lot more of them than you're expecting.
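Sketching the contagion (all types invented): a File that needs async cleanup forces every transitive owner into the same restriction:

```swift
// Hypothetical: ~SyncConsumable propagates through ownership.
final class Logger: ~SyncConsumable {         // stores a File
    let file: File
    init(file: File) { self.file = file }
}

final class OrderProcessor: ~SyncConsumable { // stores a Logger
    let logger: Logger
    init(logger: Logger) { self.logger = logger }
}
// Pure business logic becomes ~SyncConsumable (or must be explicitly
// consumed) solely because a logger several levels down does async IO.
```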

Yes, but that would mean you didn't need to throw out of the consuming method, which also means you didn't need it to not be a regular deinit in the first place.