There are a couple reported issues that look to me like miscompiles due to the "optimize hop to executor" pass, specifically involving nonisolated(nonsending) functions & closures. I reported this one a few months ago, but failed to identify an adequate solution at the time and unfortunately it looks like it shipped in 6.3. Yesterday another similar-looking issue was reported that appears to have a related root cause. I'll go over them in reverse order.
Issue 1: Task cancellation leads to thread switch and crash · Issue #88259 · swiftlang/swift · GitHub
The more recently-reported issue appears to be that interposing an nonisolated(nonsending), async, throwing call between an isolated caller and a nonisolated callee can result in returning to the isolated caller on an arbitrary executor, specifically in the case that the nonisolated callee throws. Concretely, this sort of thing behaves incorrectly:
@inline(never)
nonisolated func asyncThrowsFn() async throws {
throw CancellationError()
}
nonisolated(nonsending)
func ninsFwdingBug() async throws {
try await asyncThrowsFn() // error branch here does not hop back to the caller's executor
}
@MainActor
func bug() async {
do {
try await ninsFwdingBug()
} catch {
MainActor.assertIsolated() // 💥
}
}
await bug()
The SILGen for the error branch in the nonsending function looks something like:
bb2(%9 : @owned $any Error):
hop_to_executor %0
throw %9
But after the optimize-hop-to-executor pass, the hop is removed and we're left with a bare throw in that branch:
bb2(%8 : $any Error):
throw %8
Debugging the pass suggests it gets removed as a "dead executor" – i.e. that nothing following the hop_to_executor actually requires being on the executor. But that conclusion seems incorrect in this case, since (as documented in the code) callers are not required to "hop back" after calling a nonisolated(nonsending) function. There is logic that special-cases returns from nonisolated(nonsending) functions to avoid this problem, but it seems that other means of exiting a function aren't currently handled.
This PR contains a proposed fix: to treat function-exiting instructions uniformly, which appears to solve the problem.
Issue 2: [concurrency]: nonisolated(nonsending) can unexpectedly lose actor isolation · Issue #86083 · swiftlang/swift · GitHub
I'm less confident in the diagnosis here, but I think this bug may also be an issue with an executor hop that is incorrectly deemed "dead".
Schematically, the problem arises in situations like this, where an isolated function calls into a nonisolated(nonsending) closure:
actor A {
func callNinsClosure(
doit: nonisolated(nonsending) () async -> Void
) async {
await doit()
}
}
nonisolated func bug() async {
let a = A()
await a.callNinsClosure {
a.assertIsolated() // 💥
}
}
await bug()
The SILGen for the actor-isolated call that forwards to the closure looks like this (simplified a bit):
// A.callNinsClosure(doit:), loc "/app/example.swift":23:10, scope 8
// Isolation: actor_instance. name: 'self'
// ...
bb0(%0 : @guaranteed $@noescape @async @callee_guaranteed (@sil_isolated @sil_implicit_leading_param @guaranteed Builtin.ImplicitActor) -> (), %1 : @guaranteed $A):
debug_value %0, let, name "doit", argno 1 // id: %2
debug_value %1, let, name "self", argno 2 // id: %3
hop_to_executor %1
%5 = copy_value %0
%6 = copy_value %1
%7 = init_existential_ref %6 : $A : $A, $any Actor
%8 = enum $Optional<any Actor>, #Optional.some!enumelt, %7
%9 = begin_borrow %8
%10 = unchecked_value_cast %9 to $Builtin.ImplicitActor
%11 = begin_borrow %5
%12 = apply %11(%10) : $@noescape @async @callee_guaranteed (@sil_isolated @sil_implicit_leading_param @guaranteed Builtin.ImplicitActor) -> ()
end_borrow %11
end_borrow %9
destroy_value %8
hop_to_executor %1
destroy_value %5
%18 = tuple ()
return %18
Dumping the function after the optimize executor hop pass we get:
bb0(%0 : @guaranteed $@noescape @async @callee_guaranteed (@sil_isolated @sil_implicit_leading_param @guaranteed Builtin.ImplicitActor) -> (), %1 : @guaranteed $A):
debug_value %0, let, name "doit", argno 1 // id: %2
debug_value %1, let, name "self", argno 2 // id: %3
// ⁉️ entry hop no longer present
%4 = copy_value %0 // users: %10, %14
%5 = copy_value %1 // user: %6
%6 = init_existential_ref %5 : $A : $A, $any Actor // user: %7
%7 = enum $Optional<any Actor>, #Optional.some!enumelt, %6 // users: %12, %8
%8 = begin_borrow %7 // users: %11, %9
%9 = unchecked_value_cast %8 to $Builtin.ImplicitActor // user: %10
%10 = apply %4(%9) : $@noescape @async @callee_guaranteed (@sil_isolated @sil_implicit_leading_param @guaranteed Builtin.ImplicitActor) -> ()
end_borrow %8 // id: %11
destroy_value %7 // id: %12
hop_to_executor %1 // id: %13
destroy_value %4 // id: %14
%15 = tuple () // user: %16
return %15 // id: %16
And we can see the initial hop to enter the actor has been removed.
I think the problem in this case is that the apply is not being recognized as a call to an isolation-inheriting closure, so the pass assumes it will perform whatever executor hop it needs internally. The pass then reasons that none of the instructions preceding the apply require the current actor's executor, so the entry hop is removed. There is logic that attempts to prevent this, which does work for calls to nonisolated(nonsending) functions, but it doesn't seem to work for closures.
A draft of a potential fix for this can be found here, which changes the existing logic to try and infer caller-isolation-inheriting cases even when explicit actor isolation info is missing. I'm not entirely sure this approach is appropriate, or if the lack of isolation information is a sign of a deeper problem, so feedback would be appreciated.
If interested, here's the compiler explorer setup used to help investigate: Compiler Explorer