Deinit for non-Sendable types

We recently received the ability to isolate deinit with SE-0371. It's a really useful feature.

But, this has got me thinking about deinit and non-Sendable types. It seems to me like there could be an inherent problem here, but maybe I'm not thinking about it right.

class NonSendableResource {
    func cleanup() {}
}

class NonSendableContainer {
    let resource = NonSendableResource()

    deinit {
        // is this actually a safe thing to do?
        resource.cleanup()
    }
}

What I'm getting at here is, today, this code compiles. But, is it actually a data race hole?

4 Likes

I'd assume since NonSendableContainer can never be shared (legally), the class will only be referenced by one isolation domain at any given time. Wherever the final reference of this class goes out of scope is therefore the only isolation that has/had references to it. The same goes for the inner class by extension.

As long as the resources does not have some obscure "must run on same thread" thing going on, everything should be fine, right?

2 Likes

It could very well be that the only reason isolated deinit is even required is because GAITs are Sendable. That definitely makes sense to me.

So, what you say here also seems very logical and was my assumption, but I thought it would be worth the question. Deinit can be pretty tricky.

In this post I advocated for being able to move non-Sendable references across isolation boundaries. I wasn’t thinking about deinits though. I was reasoning as if a reference doesn’t “do” anything by itself, and therefore it is safe to share. But deinits effectively turn all references into Russian-roulette closures that may or may not execute at the end of the reference’s scope

2 Likes

Your instinct was right, there is a soundness hole here, but exploiting the problem in safe Swift is a little convoluted.

open class NonSendable {

    var s = ""
    var i = 0

    func cleanup() {
        i += 1
        // appending to a string will crash pretty quickly under concurrent access
        s += "\(i)"
    }
}

// we need to indirect through a non-Sendable class to "trick" the compiler
// into allowing us to call a method on a non-Sendable type from a
// non-isolated destructor.
open class NonSendableOwner {

    let ns: NonSendable

    init(_ ns: NonSendable) {
        self.ns = ns
    }

    deinit {
        // We *are* allowed to call a method on a non-Sendable property here.
        // maybe we shouldn't be? But if this was an error, a *lot* of existing
        // code would break!
        ns.cleanup()
    }

}

@MainActor
class ImplicitlySendable {

    let o: NonSendableOwner

    init(_ ns: NonSendable) {
        self.o = NonSendableOwner(ns)
    }

    deinit {
        // we couldn't call a method on a non-Sendable property here; the compiler would yell:
        // error: cannot access property with a non-Sendable type from nonisolated deinit
        // however, it will happily let us deinit `NonSendableOwner`, even though that
        // is allowed to indirectly call methods on non-Sendable types.
    }

}

// one single instance of the mutable data
let ns = NonSendable()
await withDiscardingTaskGroup { group in
    for _ in 0..<1_000_000 {
        // shared between a million instances of an `@MainActor` type
        let ma = ImplicitlySendable(ns)
        group.addTask {
            // each deinited in a different isolation, and thereby (indirectly)
            // modifying the mutable data concurrently.
            _ = ma
        }
    }
}

then:

swiftc -swift-version 6 NonisolatedDeinitButSent.swift && ./NonisolatedDeinitButSent
zsh: segmentation fault  ./NonisolatedDeinitButSent

Filed as #85667

13 Likes

Wow @OneSadCookie, incredible find!

I believe this problem is specific to global actors. A global actor isolated type is the only kind of type whose deinit can run in an arbitrary isolation domain while other code is running on the global actor (or another arbitrary isolation domain in your example of a race between deinits). For non-global actor instances, we have a guarantee that if the deinit is running, nothing in the actor's region can be referenced by anything else because nothing can be running on the actor (except for potentially the deinit itself), and no other actor deinit can have access to the same state because different instances have different isolation domains. If we attempt to replicate your example with ImplicitlySendable being a regular actor, we'd get a regain isolation error when constructing ImplicitlySendable because the code attempts to merge ns into a million different actor regions.

The only reasonable way I can think of to fix this is to require a synchronous isolated deinit for global actor isolated types that conform to Sendable, including using it for the synthesized deinit. Even that feels like a fairly big change and it would certainly need to be staged in under an upcoming feature because it can change the behavior of existing code. I'm going to continue thinking about whether there are other options. I think that totally banning calling methods on non-Sendable types in deinits of non-Sendable types is a nonstarter. We also can't identify whether there's a problematic custom deinit on a non-Sendable type that's stored in a given global actor isolated type because it could not even be accessible from the current module.

Any other thoughts/brainstorming is welcome!!

6 Likes

I'm concerned that it might be worse than that. Swift is generally lax about allowing isolated values to be referenced — and therefore potentially destroyed — outside of their isolation. This comes up in two main places:

  • the fact that global-actor-isolated values can be Sendable (and generally are), and
  • the fact that isolated functions can be Sendable.

Because function isolation can be erased in several ways, the latter in particular is a very broad exception:

class Counter {
  var count = 0
}

class HasCounter {
  var counter: Counter
  init(counter: Counter) {
    counter.count += 1
    self.counter = counter
  }
  func operation() { /* ... */ }
  func deinit() {
    counter.count -= 1
  }
}

func foo(@_inheritsActorContext fn: @Sendable () async -> ()) {
  // fn is sendable here and can therefore be referenced (and released)
  // from any context.
}

actor A {
  let counter: Counter
  func bar() {
    let ns = HasCounter(counter: self.counter)
    foo {
      // This closure captures a HasCounter, which has a reference to a
      // Counter object. Both are part of the actor's region. The closure
      // is isolated to the actor, but that isolation is statically erased,
      // which is allowed because the destination type is async.
      // Releasing the closure may call the deinit of HasCounter. The
      // HasCounter object is uniquely referenced by that point, so
      // accessing its stored properties cannot race, but the Counter
      // object referenced by those properties may have outstanding
      // references from the actor. Therefore, when we run the deinit
      // of the HasCounter, we may introduce a data race on the
      // stored properties of the Counter object.
      ns.operation()
    }
  }
}
10 Likes

When Mattie & I were searching for the example, ObjC seemed like a fruitful avenue for not-strictly-correct-in-swift refcounting traffic. For example, a Foundation API like NotificationCenter could easily retain an AnyObject parameter on the thread it was provided, and later release it on some other thread, without any “sense” that that is in any way wrong. So requiring isolated deinit on global actor isolated types seems like the right fix to be defensive about that.

1 Like

One solution is to prevent all types (sendable or non-sendable) from using other non-sendable values in their deinits unless those deinits are isolated. That's safe, but it's almost certainly not acceptable, because it'd be really, really restrictive.

An alternative is to try to clamp down retroactively on the ways that we allow non-sendable values to be destroyed outside of their isolation. So a sendable value has to tolerate being destroyed anywhere, but a non-sendable value can assume that it'll be destroyed with reasonable isolation.

We can think about this by cases. Really we're interested in all the references into the connected graph of non-sendable values that contains the value, which is a subset of the values in some region.

  • If the references are directly from code, we must be running with the right isolation, and we should be able to just make sure we destroy the value while we still have that isolation.
  • If the references are from other non-sendable types or closures, they currently can't be references into the region, because those types/closures have to be part of the region.
  • If the references are from sendable types or closures, those types/closures currently must be isolated:
    • If it's an actor, then the value is part of the actor's region. The actor deinit must therefore be the only code that has a reference to values in that region, so there can't be data races.
    • Otherwise, it must be a global-actor-isolated type or an isolated closure. We would need to shift the destruction to happen with the right isolation, but we should be able to do that.
  • In the future, sendable closures might be able to have captures in sendable (disconnected) regions. Such a function might transfer its captures into an arbitrary region; if we then destroyed the captures normally, their use in that region could theoretically race with the destruction. Furthermore, the function could have an arbitrary and unrelated isolation to wherever the disconnected region ends up, and so we would not be able to destroy the captures on that. Fortunately, I think this is relatively easy to deal with, because such a function would also have to be called at most once. As a result, we can reasonably expect that calling it will simply consume all of its captured values, so no separate destruction is needed. Destroying the function before it's called would be fine because the captured values would still be disconnected. The same idea applies to disconnected regions stored anywhere else: if they're transferred away, that just needs to actually remove them from wherever they came from.

The upshot is that I think it might actually be viable to fix this by ensuring that value operations on non-sendable values only happen within their isolation. We can't fully guarantee that (not enough to start using non-atomic refcounting or anything, at least), but we could close the hole for all practical purposes.

4 Likes

One hard limit here is that, while sending parameters are normally implicitly consuming, we do have a concept of sending parameters that aren't consuming because of backwards compatibility. The subsequent destruction of such a value in the caller necessarily potentially races with code in whatever region the callee ultimately sent the value to, and I don't think the approach I laid out above can be extended to cover this. Maybe this comes up sufficiently narrowly that it won't be a practical problem, or maybe we can find a way to be stricter about the values that are sent like this.

2 Likes

One solution is to prevent all types (sendable or non-sendable) from using other non-sendable values in their deinits unless those deinits are isolated. That's safe, but it's almost certainly not acceptable, because it'd be really, really restrictive.

Personally, that would be my personal preference for now, and should have been the starting point with Swift 5.5. Then improve on this with subsequent releases, as we did with Sendable checking in general.

1 Like

is this safe? I think "using in the deinit" has to be extended to "containing any (nontrivial) type at all", because (as my PoC shows) we can't do transitive analysis. And these types that we're deiniting might transitively do bad things (including via ObjC deinits and C++ destructors that are not subject to these rules)

1 Like

I think you’re over-thinking your PoC. Non-sendability is a transitive analysis that still generally works: you can’t access non-sendable values via sendable ones without synchronizing (or finding some other hole). The problem, as I see it, is specifically that deinit is assuming that it can access non-sendable state like an ordinary method could. That is generally safe because RBI forces all values in a region to only be used from a single concurrent context, but it doesn’t work for deinit if we can drop the last reference to an object in the region from outside that context. Preventing the deinit from accessing non-sendable values from the object would certainly be sufficient.

1 Like

I'm reading "using" in your suggestion as "explicitly, in the program's text, calling a method". But in general, if your type has a non-Sendable member, that member will be deinited as part of your deinit, and code that runs as part of that deinit (including transitively through other deinits) may misbehave (including in ObjC or C++ code we can't see into at all, or Swift code compiled with an older language standard).

So whilst I can certainly see that it's safe in pure Swift (after this requirement is enforced), and maybe that's all we care about, in practice that still seems like a footgun...

Of course, as you said, it doesn't seem very practical from a compatibility point either.

If we generally forbade non-isolated deinits from accessing non-sendable data in nonisolated deinits, that would apply recursively, so it would be safe to release other non-sendable values in those deinits. It's totally fair to point out that we couldn't universally enforce that, though, which means it would be risking data races when interacting with other code. So yeah, it's problematic in practice even though it's theoretically workable.

In case it's helpful, I'd like to point out that there's some prior art on thread-safe deinitialization in languages that combine deterministic memory management and garbage collection: namely, some extensions of C++ and Rust.

  • The paper "Destructors, Finalizers, and Synchronization" by Hans-J. Boehm describes a distinction between destructors, which are invoked synchronously and don't need to be thread safe, and finalizers, which are invoked asynchronously and do need to be thread safe.
  • The paper "Garbage collection in the next C++ standard" by Hans-J. Boehm and Mike Spertus proposes how this distinction could be made in a garbage-collected extension of C++ (on page 32).
  • C++/CLI, a garbage-collected extension of C++ that runs on the Common Language Runtime, spells destructors as ~Class() and finalizers as !Class().
  • The paper "Garbage Collection for Rust: The Finalizer Frontier" by Jacob Hughes and Laurence Tratt describes how its garbage-collected extension of Rust ensures thread safety for finalizers (in section 7). Section 10 discusses prior work in Rust.
  • The Rust library pyo3, which provides bindings from Rust to CPython, has the Py type, which is a reference to a Python object that can be transferred outside of CPython's global interpreter lock (GIL). That makes it analogous to a global actor-isolated type in Swift, if the GIL is represented as a global actor. Because CPython reference counting is not thread safe, the Py type needs to synchronize its destruction. When a Py value is destroyed while the GIL is not held, the Py value enqueues itself on a global reference pool that's cleared the next time the GIL is held.
6 Likes

I was thinking about this, and I keep coming back to closures. Is it possible they are a root problem here? Am I right? Is there a way to reproduce this problem without one?

I assume that, somewhere in the runtime, closures have some kind of deallocation function, to clean up their captures and whatnot. Would it be feasible to run such a function on the same actor the closure is formed?

Yep, closure contexts have a destructor which runs when the last reference is released, just like a class instance. They also have an “isa pointer” which points at a bit of metadata to identify the layout of the context. The destructor and metadata is generated in IRGen/GenHeap.cpp.

1 Like