Let's debug: over-release of actors involving unowned self captures

there's this strange bug (and related thread) that was reported some time ago in which a closure that is actor-isolated and involves an unowned capture of the actor instance can seemingly lead to an over-release of the actor itself. here's a reproduction distilled from the original report:

actor A {
    func bug() {
        let _: @isolated(any) () -> Void = {
            [unowned self] in
            _ = self
        }
    }

    deinit {
        print("deinit")
    }
}

@main
enum App {
    static func main() async {
        let a = A()
        await a.bug()
        // "deinit" prints twice, but only one instance created
    }
}

if i add a print() statement into the end of the bug() method, then deinit prints before it, so it seems something with the way the closure capture is handled is likely responsible. i've stared at the SIL for a long time and i'm still not entirely sure if there's something wrong at that level, or if it's further down the pipeline. the SIL for the bug() function in the above example looks like:

// A.bug(), loc "/app/example.swift":4:10, scope 6
// Isolation: actor_instance. name: 'self'
sil hidden @$s6output1AC3bugyyF : $@convention(method) (@sil_isolated @guaranteed A) -> () {
// %0 "self"                                      // users: %7, %4, %3, %1
bb0(%0 : $A):
  debug_value %0, let, name "self", argno 1, loc "/app/example.swift":4:10, scope 6 // id: %1
  %2 = alloc_stack [lexical] [var_decl] $@sil_unowned A, loc "/app/example.swift":6:26, scope 8 // users: %8, %6, %19, %18
  strong_retain %0, loc "/app/example.swift":6:26, scope 7 // id: %3
  %4 = ref_to_unowned %0 to $@sil_unowned A, loc "/app/example.swift":6:26, scope 7 // users: %6, %5
  unowned_retain %4, loc "/app/example.swift":6:26, scope 7 // id: %5
  store %4 to %2, loc "/app/example.swift":6:26, scope 7 // id: %6
  strong_release %0, loc "/app/example.swift":6:26, scope 7 // id: %7
  %8 = load %2, loc * "/app/example.swift":5:44, scope 7 // user: %9
  %9 = strong_copy_unowned_value %8, loc * "/app/example.swift":5:44, scope 7 // users: %12, %10
  %10 = ref_to_unowned %9 to $@sil_unowned A, loc * "/app/example.swift":5:44, scope 7 // users: %17, %14, %13, %11
  unowned_retain %10, loc * "/app/example.swift":5:44, scope 7 // id: %11
  strong_release %9, loc * "/app/example.swift":5:44, scope 7 // id: %12
  unowned_retain %10, loc "/app/example.swift":5:44, scope 7 // id: %13
  %14 = init_existential_ref %10 : $@sil_unowned A : $@sil_unowned A, $any Actor, loc "/app/example.swift":5:44, scope 7 // user: %15
  %15 = enum $Optional<any Actor>, #Optional.some!enumelt, %14, loc "/app/example.swift":5:44, scope 7 // user: %16
  release_value %15, loc "/app/example.swift":6:26, scope 7 // id: %16
  unowned_release %10, loc "/app/example.swift":6:26, scope 7 // id: %17
  destroy_addr %2, loc "/app/example.swift":6:26, scope 7 // id: %18
  dealloc_stack %2, loc "/app/example.swift":6:26, scope 7 // id: %19
  %20 = tuple (), loc "/app/example.swift":9:5, scope 7 // user: %21
  return %20, loc "/app/example.swift":9:5, scope 7 // id: %21
} // end sil function '$s6output1AC3bugyyF'

if we reduce it just to the instructions that i think affect strong reference counts (per the docs), we get:

strong_retain %0                    // +1
strong_release %0                   // -1

%9 = strong_copy_unowned_value %8   // +1
strong_release %9                   // -1

release_value %15                   // -1
// net change: -1?

looking at the lowered IR maybe makes it more clear something is off, since the strong retain/release counts don't seems to be balanced:

define hidden swiftcc void @"output.A.bug() -> ()"(ptr swiftself %0) #0 !dbg !59 {
entry:
  %self.debug = alloca ptr, align 8
    #dbg_declare(ptr %self.debug, !64, !DIExpression(), !65)
  call void @llvm.memset.p0.i64(ptr align 8 %self.debug, i8 0, i64 8, i1 false)
  %1 = alloca %swift.unowned, align 8
  store ptr %0, ptr %self.debug, align 8, !dbg !66
  call void @llvm.lifetime.start.p0(i64 8, ptr %1), !dbg !67
  %2 = call ptr @swift_retain(ptr returned %0) #3, !dbg !71                      ; strong +1
  %3 = call ptr @swift_unownedRetain(ptr returned %0) #2, !dbg !71
  store ptr %0, ptr %1, align 8, !dbg !71
  call void @swift_release(ptr %0) #2, !dbg !71                                  ; strong -1
  %4 = load ptr, ptr %1, align 8, !dbg !72
  %5 = call ptr @swift_unownedRetainStrong(ptr returned %4) #2, !dbg !72         ; strong +1
  %6 = call ptr @swift_unownedRetain(ptr returned %4) #2, !dbg !72
  call void @swift_release(ptr %4) #2, !dbg !72                                  ; strong -1
  %7 = call ptr @swift_unownedRetain(ptr returned %4) #2, !dbg !73
  %8 = call ptr @"lazy protocol witness table accessor for type output.A and conformance output.A : Swift.Actor in output"() #10, !dbg !73
  %9 = ptrtoint ptr %4 to i64, !dbg !73
  %10 = ptrtoint ptr %8 to i64, !dbg !73
  %11 = inttoptr i64 %9 to ptr, !dbg !74
  %12 = inttoptr i64 %10 to ptr, !dbg !74
  call void @swift_release(ptr %11) #2, !dbg !74                                 ; strong -1
  call void @swift_unownedRelease(ptr %4) #2, !dbg !74
  %toDestroy = load ptr, ptr %1, align 8, !dbg !76
  call void @swift_unownedRelease(ptr %toDestroy) #2, !dbg !76
  call void @llvm.lifetime.end.p0(i64 8, ptr %1), !dbg !76
  ret void, !dbg !76
}

so, what's going wrong here exactly? my best guess so far is that somehow the combination of an unowned, optional, class existential capture is producing the wrong cleanups. the instruction that seems particularly unusual to me is this:

%14 = init_existential_ref %10 : $@sil_unowned A : $@sil_unowned A, $any Actor

i've yet to come up with another formulation that will produce an init_existential_ref that has @sil_unowned on the AST types. is such an instruction well-formed? can you create a class existential container that houses an unowned reference?

at any rate, would be interested if anyone has ideas about what could actually be going wrong here.


here's a godbolt sample i was working with that contains code that differs just in the ownership of the closure capture. you can add a 'Diff View' from the 'Add' dropdown menu to directly compare the intermediate representation outputs. note that it is sort of difficult to reproduce this with development compilers or those with assertions enabled due to the fact that the involved code trips various SIL verifier issues and assertions, so the frontend flag -sil-verify-none is used. however, the code does compile successfully (or, perhaps, unsuccessfully?) with at least some of the release compilers that exist (e.g. the 6.2.0 Xcode toolchain).

4 Likes

I’m too lazy to check right now, but what do unowned class existentials look like? ie, unowned var x: AnyObject. If it’s different, it might be worth sticking an assert in the right place to guard against such init_existential_refs from being constructed, and fix the root cause.

with input like this:

class C {
    func g() {
        unowned let _: AnyObject = self
    }
}

actor A {
    func g() {
        unowned let _: AnyObject = self
        unowned let _: any Actor = self
    }
}

then the class example looks like:

%3 = init_existential_ref %0 : $C : $C, $AnyObject

and the actor cases come out as:

%3 = init_existential_ref %0 : $A : $A, $AnyObject
%6 = init_existential_ref %0 : $A : $A, $any Actor

not sure how the @sil_unowned is getting in there in the isolated capture case... none of the other closure capture permutations i've tried appear to end up with it, so maybe it's something unique to @isolated(any) closures. will try figuring out where to add an assertion to see what may be responsible.

I meant just asserting against a ReferenceStorageType appearing in SILBuilder::createInitExistentialRef() so you can catch the erroneous instruction being created.

1 Like

with that assertion, the stack trace looks like (simplified):

📚 stack trace
#7  swift::Lowering::SILGenFunction::emitExistentialErasure(...)
#8  swift::Lowering::SILGenFunction::emitExistentialErasure(...)
#9  swift::Lowering::SILGenFunction::emitTransformExistential(...)
#10 emitNonOptionalActorInstanceIsolation(...)
#11 swift::Lowering::SILGenFunction::emitActorInstanceIsolation(...)
#12 swift::Lowering::SILGenFunction::emitClosureIsolation(...)
#13 swift::Lowering::SILGenFunction::emitClosureValue(...)
#14 (anonymous namespace)::RValueEmitter::emitClosureReference(...)
#15 (anonymous namespace)::RValueEmitter::visitAbstractClosureExpr(...)
#16 (anonymous namespace)::RValueEmitter::visitCaptureListExpr(...)
#17 swift::Lowering::SILGenFunction::emitRValueAsSingleValue(...)
#18 swift::Lowering::SILGenFunction::emitConvertedRValue(...)
#19 swift::Lowering::SILGenFunction::emitConvertedRValue(...)
#20 (anonymous namespace)::RValueEmitter::visitFunctionConversionExpr(...)
#21 swift::Lowering::SILGenFunction::emitExprInto(...)
#22 swift::Lowering::SILGenFunction::emitPatternBinding(...)
#23 swift::ASTVisitor<swift::Lowering::SILGenFunction, ...>::visit(swift::Decl*)
#24 swift::ASTVisitor<(anonymous namespace)::StmtEmitter, ...>::visit(swift::Stmt*)
#25 swift::Lowering::SILGenFunction::emitStmt(...)
#26 swift::Lowering::SILGenFunction::emitFunction(...)

i found this in the header of SILGenFunction (emphasis mine):

  /// Transform an actor reference into an opaque isolation value.
  /// This supports optional actor references.
  /// The actor reference must be +1. ‼️
  ManagedValue emitActorInstanceIsolation(SILLocation loc,
                                          ManagedValue actor,
                                          CanType actorType);

although i'm not sure if +1 specifically means +1 strong reference count in this context.

if i take this portion of the the existing implementation of emitLoadOfCaptureIsolation:

    auto value = captureArgs[i].copy(SGF, loc);
    return SGF.emitActorInstanceIsolation(loc, value, isolatedVarType);

and force a a strong copy when it's a reference storage type:

    ManagedValue value;
    if (isa<ReferenceStorageType>(isolatedVarType)) {
      auto arg = captureArgs[i];
      value = SGF.getBuilder()
            .createStrongCopyUnownedValue(loc, arg);
    } else {
      value = captureArgs[i].copy(SGF, loc);
    }

then it seemingly resolves the over-release. still unsure if that's an appropriate thing to be doing though, let alone if it's really getting at the root of the issue.


it's somewhat unclear to me what the expected semantics are here. is an @isolated(any) closure supposed to keep a strong reference to the actor providing its isolation? if so that would sort of undermine expectations when using unowned.

well, turns out i still had the SIL verifier off. doing it this way results in a different issue:

SIL verification failed: init_existential_ref operand must be lowered to the right abstraction level for the existential
  $A
  $@sil_unowned A
Verifying instruction:
     %17 = strong_copy_unowned_value %15 : $@sil_unowned A // user: %18
->   %18 = init_existential_ref %17 : $A : $@sil_unowned A, $any Actor // user: %19
     %19 = enum $Optional<any Actor>, #Optional.some!enumelt, %18 : $any Actor // user: %20

but maybe getting slightly closer...

1 Like

well, i feel like this issue has helped me learn a lot about SILGen, but it still remains unclear to me what an appropriate solution to the problem may be. i've found a few approaches that seem to make the problem 'go away', which mostly rely on ensuring the existential erasure does not occur on an unowned reference storage type, but rather on the referent type (WIP PR). but it's still not 100% clear to me exactly why this mitigates the issue, which is a bit irksome.

from what i can tell, erasing a (lowered to SIL) UnownedStorageType to an existential via init_existential_ref is not a currently-supported thing to do – the SILVerifier fails on it here because ReferenceStorageType is not a 'bridgeable object type'. i'm not totally sure if that code is correct though... my naive assumption would be that the reference storage wrappers could forward to their underlying type to produce the object & retain/release implementations.

anyway, all existing SILGen patterns i've seen involving unowned existentials seem to do an unowned_to_ref conversion before initing the existential class reference. so, i've tried to copy that pattern, and it seems to help, at least superficially.

a simpler (but less satisfying) approach could be to just ban this construct, at least until it's implemented more robustly (ASSERT'ing would be easy enough i should think, but also it could presumably be diagnosed somewhere in Sema), but i'm not really sure if that's reasonable.

i'm still unclear on the intended semantics... are you supposed to be able to construct @isolated(any) closures where the erased isolated capture is unowned? is the erased isolation supposed to keep the actor alive, regardless of the capture's ownership?

If the same pattern of code works fine for a non-actor class, it's always possible there's a runtime bug specifically with unowned actors. Default actors play some funny tricks with deallocation that could well need special handling.

I think this is the real answer right here. @isolated(any) function types have special logic to capture the current isolation, but if self is unowned or weak, it doesn't really make sense to do that. It sounds like the right thing to do is to diagnose this as an error in Sema. You could also capture self as a strong reference regardless in this case, but I think that's strictly worse because it's unlikely to be what the user intended.

1 Like

It works without the actor, but then I think it reduces to a no-op. We partially-apply a nil value of type Optional<any Actor> to the @isolated(any) function, which does nothing:

class A {
  func bug() {
    let _: @isolated(any) () -> Void = {
      [unowned self] in
      _ = self
    }
  }
}

This doesn't demonstrate the problem, because closure body isn't actually isolated to an unowned reference to an actor in this case.

We should add an ASSERT to SILBuilder to guard against forming an existential with an illegal payload type, and avoid the assert by diagnosing this invalid code pattern in Sema.

2 Likes

i'm not sure how to construct an analog of this case using a class, or if that's possible. it seems like the issue is with how the special @isolated(any) parameter is handled in the case in which the closure capture that provides the 'source' isolation is unowned, and classes cannot be used as isolated parameters.

in terms of the ownership semantics though... if the closure does capture the isolation as unowned, and that gets stored in the special context slot for @isolated(any) closures, what exactly is supposed to happen? my assumption is that the closure 'owns' (in the SIL OwnershipKind sense) that value, and if the 'source' capture was unowned then the closure should not create a new strong catpure. if, when invoked, the actor reference is no longer valid, we'd just crash as we would when trying to use any unowned value that's deinitialized.

for weak it doesn't make sense because (AFAIK) there isn't really a model for how weak captures of isolated values should be handled. i could imaging supporting them where there's some sort of runtime trap if they're nil, but that would require an evolution proposal or something. unowned though seems like it basically has the properties necessary for this purpose... it's non-optional, and already has the semantics that if you try and use it after deinit it traps. so in theory it seems like it should 'just work' as long as the actor is still alive when calling the closure.

1 Like

I suspect there is an ABI reason why the @isolated(any) captures the isolation as an erased Optional<any Actor> instead of as the concrete type of self. We set a special [isolated_any] flag on the partial_apply, and we don't actually use the captured Optional <any Actor> inside the function body directly. Perhaps the runtime reflectively loads the Optional<any Actor> from the @isolated(any) closure's context, or something? In this case it has to remain a strong reference.

would you mind elaborating on why this follows?

Unowned references and strong references are not interchangeable; you have to convert an unowned reference to a strong reference with a runtime call, so that this runtime call can trap if the unowned reference points at a dead object. Take a look at this program:

class C {
  let x = 123

  func f() -> () -> () {
    return { [unowned self] in
      g(self)
    }
  }
}

@_optimize(none) func g(_: C) {}

If you compile it with -emit-ir, you'll see some runtime calls appearing the closure that would not be there if you remove the [unowned self] capture list:

define internal swiftcc void @"$s7unowned1CC1fyycyFyycfU_"(ptr %0) #0 {
entry:
  %self.debug = alloca ptr, align 8
  call void @llvm.memset.p0.i64(ptr align 8 %self.debug, i8 0, i64 8, i1 false), !Swift.isSwiftLLDBpreinit !18
  store ptr %0, ptr %self.debug, align 8
  %1 = call ptr @swift_unownedRetainStrong(ptr returned %0) #5
  call swiftcc void @"$s7unowned1gyyAA1CCF"(ptr %0)
  call void @swift_release(ptr %0) #5
  ret void
}

thank you. in general this makes sense, and i think it also helps resolves some confusion i developed when looking into this issue.


to digress slightly, if we look at how unowned variables are treated when wrapped within existential & optional wrappers, it seems that at each stage, the actual SIL type placed within the wrapper is never itself @sil_unowned, but there are transformations that take place when moving things in and out of the 'boxes'. e.g. if we consider code like this:

class C {
    init() { print("born") }
    deinit { print("died") }
}

func ownership() {
    let c: C = C()
    let fn = {
        [unowned c] in
        unowned let uo = c
        unowned let uoexopt: AnyObject? = uo
        _ = uoexopt
    }
    fn()
    _ = consume c
}

then the closure body's (raw) SILGen output looks something like:

// closure #1 in ownership(), loc "/app/example.swift":75:14, scope 46
// Isolation: nonisolated
sil private [ossa] @$s6output9ownershipyyFyycfU_ : $@convention(thin) (@guaranteed @sil_unowned C) -> () {
// %0 "c"                                         // users: %5, %1
bb0(%0 : @closureCapture @guaranteed $@sil_unowned C):
  debug_value %0, let, name "c", argno 1, loc "/app/example.swift":76:18, scope 47 // id: %1
  %2 = alloc_box ${ var @sil_unowned Optional<AnyObject> }, let, name "uoexopt", loc "/app/example.swift":77:21, scope 48 // users: %18, %3
  %3 = begin_borrow [lexical] [var_decl] %2, loc "/app/example.swift":77:21, scope 48 // users: %17, %4
  %4 = project_box %3, 0, loc "/app/example.swift":77:21, scope 48 // users: %12, %10
  %5 = strong_copy_unowned_value %0, loc "/app/example.swift":77:43, scope 49 // user: %6
  %6 = init_existential_ref %5 : $C : $C, $AnyObject, loc "/app/example.swift":77:43, scope 49 // user: %7
  %7 = enum $Optional<AnyObject>, #Optional.some!enumelt, %6, loc "/app/example.swift":77:43, scope 49 // users: %11, %8
  %8 = ref_to_unowned %7 to $@sil_unowned Optional<AnyObject>, loc "/app/example.swift":77:43, scope 49 // user: %9
  %9 = copy_value %8, loc "/app/example.swift":77:43, scope 49 // user: %10
  store %9 to [init] %4, loc "/app/example.swift":77:43, scope 49 // id: %10
  destroy_value %7, loc "/app/example.swift":77:43, scope 49 // id: %11
  %12 = load_borrow %4, loc "/app/example.swift":78:13, scope 48 // users: %14, %13
  %13 = strong_copy_unowned_value %12, loc "/app/example.swift":78:13, scope 48 // users: %16, %15
  end_borrow %12, loc "/app/example.swift":78:13, scope 48 // id: %14
  ignored_use %13, loc "/app/example.swift":78:13, scope 48 // id: %15
  destroy_value %13, loc "/app/example.swift":78:13, scope 48 // id: %16
  end_borrow %3, loc "/app/example.swift":79:5, scope 48 // id: %17
  destroy_value %2, loc "/app/example.swift":79:5, scope 48 // id: %18
  %19 = tuple (), loc "/app/example.swift":79:5, scope 48 // user: %20
  return %19, loc "/app/example.swift":79:5, scope 48 // id: %20
} // end sil function '$s6output9ownershipyyFyycfU_'

and we can see that, despite unowned appearing in basically every possible location, there is still a strong_copy_unowned_value before the init_existential_ref and enum transformation, and each use has its own unowned-to-strong conversion before actually doing anything with the underlying value.


returning to the specific case of @isolated(any) closures... as it seems there's already sort of a special way to access the function type's isolation value – the function_extract_isolation instruction – could the appropriate runtime handling be performed by that (if it isn't already) to deal with this issue? although... i suppose even if that were supported, perhaps the existing SIL function signatures still wouldn't be correct... today they print as, e.g.

bb0(%0 : @guaranteed $Optional<any Actor>, %1 : @closureCapture @guaranteed $@sil_unowned A):

where the isolation argument %0 never seems to be @sil_unowned even if the closure capture from which it was derived is (the docs for partial_apply do have some callouts about how the ownership requirements don't exactly match the type of the callee, though i'm not sure if that's directly relevant).

anyway, i mostly ask this out of curiosity and to try and better understand the possibilities. banning this seems reasonable to me, and IMO has the benefit of simplifying things slightly.

Yeah, the basic philosophy here is that UnownedStorageType only appears out the outermost level of the type of a SIL value, and never inside an existential or as the generic argument of an optional, etc. in order to do anything with an unowned reference, including operations like wrapping it in an optional or erasing it in an existential, you must upgrade the unowned reference to strong first (and possibly trap at this point), form your new value, and then form an unowned reference to the new value.

1 Like

here is a proposed change to strengthen the assertion to prevent the miscompilation: [SILGen]: strengthen assertions when creating InitExistentialRef by jamieQ · Pull Request #86413 · swiftlang/swift · GitHub. i assume there is some potential for a source break since there is code that a 6.2 no asserts compiler will currently build that will no longer build after this change. not sure what the policy is on that, but personally i'd prefer my code to stop building in a case like this.

i've not yet determined how the issue can or should be handled in Sema though, unfortunately.