Possible miscompiles in OptimizeHopToExecutor

There are a couple reported issues that look to me like miscompiles due to the "optimize hop to executor" pass, specifically involving nonisolated(nonsending) functions & closures. I reported this one a few months ago, but failed to identify an adequate solution at the time and unfortunately it looks like it shipped in 6.3. Yesterday another similar-looking issue was reported that appears to have a related root cause. I'll go over them in reverse order.


Issue 1: Task cancellation leads to thread switch and crash · Issue #88259 · swiftlang/swift · GitHub

The more recently-reported issue appears to be that interposing an nonisolated(nonsending), async, throwing call between an isolated caller and a nonisolated callee can result in returning to the isolated caller on an arbitrary executor, specifically in the case that the nonisolated callee throws. Concretely, this sort of thing behaves incorrectly:

@inline(never)
nonisolated func asyncThrowsFn() async throws {
    throw CancellationError()
}

nonisolated(nonsending)
func ninsFwdingBug() async throws {
    try await asyncThrowsFn() // error branch here does not hop back to the caller's executor
}

@MainActor
func bug() async {
    do {
        try await ninsFwdingBug()
    } catch {
        MainActor.assertIsolated() // 💥
    }
}

await bug()

The SILGen for the error branch in the nonsending function looks something like:

bb2(%9 : @owned $any Error):
  hop_to_executor %0
  throw %9

But after the optimize-hop-to-executor pass, the hop is removed and we're left with a bare throw in that branch:

bb2(%8 : $any Error):
  throw %8

Debugging the pass suggests it gets removed as a "dead executor" – i.e. that nothing following the hop_to_executor actually requires being on the executor. But that conclusion seems incorrect in this case, since (as documented in the code) callers are not required to "hop back" after calling a nonisolated(nonsending) function. There is logic that special-cases returns from nonisolated(nonsending) functions to avoid this problem, but it seems that other means of exiting a function aren't currently handled.

This PR contains a proposed fix: to treat function-exiting instructions uniformly, which appears to solve the problem.


Issue 2: [concurrency]: nonisolated(nonsending) can unexpectedly lose actor isolation · Issue #86083 · swiftlang/swift · GitHub

I'm less confident in the diagnosis here, but I think this bug may also be an issue with an executor hop that is incorrectly deemed "dead".

Schematically, the problem arises in situations like this, where an isolated function calls into a nonisolated(nonsending) closure:

actor A {
    func callNinsClosure(
        doit: nonisolated(nonsending) () async -> Void
    ) async {
        await doit()
    }
}

nonisolated func bug() async {
    let a = A()
    await a.callNinsClosure {
        a.assertIsolated() // 💥
    }
}

await bug()

The SILGen for the actor-isolated call that forwards to the closure looks like this (simplified a bit):

// A.callNinsClosure(doit:), loc "/app/example.swift":23:10, scope 8
// Isolation: actor_instance. name: 'self'
// ...
bb0(%0 : @guaranteed $@noescape @async @callee_guaranteed (@sil_isolated @sil_implicit_leading_param @guaranteed Builtin.ImplicitActor) -> (), %1 : @guaranteed $A):
  debug_value %0, let, name "doit", argno 1 // id: %2
  debug_value %1, let, name "self", argno 2 // id: %3
  hop_to_executor %1
  %5 = copy_value %0
  %6 = copy_value %1
  %7 = init_existential_ref %6 : $A : $A, $any Actor
  %8 = enum $Optional<any Actor>, #Optional.some!enumelt, %7
  %9 = begin_borrow %8
  %10 = unchecked_value_cast %9 to $Builtin.ImplicitActor
  %11 = begin_borrow %5
  %12 = apply %11(%10) : $@noescape @async @callee_guaranteed (@sil_isolated @sil_implicit_leading_param @guaranteed Builtin.ImplicitActor) -> ()
  end_borrow %11
  end_borrow %9
  destroy_value %8
  hop_to_executor %1
  destroy_value %5
  %18 = tuple ()
  return %18

Dumping the function after the optimize executor hop pass we get:

bb0(%0 : @guaranteed $@noescape @async @callee_guaranteed (@sil_isolated @sil_implicit_leading_param @guaranteed Builtin.ImplicitActor) -> (), %1 : @guaranteed $A):
  debug_value %0, let, name "doit", argno 1       // id: %2
  debug_value %1, let, name "self", argno 2       // id: %3
  // ⁉️ entry hop no longer present
  %4 = copy_value %0                              // users: %10, %14
  %5 = copy_value %1                              // user: %6
  %6 = init_existential_ref %5 : $A : $A, $any Actor // user: %7
  %7 = enum $Optional<any Actor>, #Optional.some!enumelt, %6 // users: %12, %8
  %8 = begin_borrow %7                            // users: %11, %9
  %9 = unchecked_value_cast %8 to $Builtin.ImplicitActor // user: %10
  %10 = apply %4(%9) : $@noescape @async @callee_guaranteed (@sil_isolated @sil_implicit_leading_param @guaranteed Builtin.ImplicitActor) -> ()
  end_borrow %8                                   // id: %11
  destroy_value %7                                // id: %12
  hop_to_executor %1                              // id: %13
  destroy_value %4                                // id: %14
  %15 = tuple ()                                  // user: %16
  return %15                                      // id: %16

And we can see the initial hop to enter the actor has been removed.

I think the problem in this case is that the apply is not being recognized as a call to an isolation-inheriting closure, so the pass assumes it will perform whatever executor hop it needs internally. The pass then reasons that none of the instructions preceding the apply require the current actor's executor, so the entry hop is removed. There is logic that attempts to prevent this, which does work for calls to nonisolated(nonsending) functions, but it doesn't seem to work for closures.

A draft of a potential fix for this can be found here, which changes the existing logic to try and infer caller-isolation-inheriting cases even when explicit actor isolation info is missing. I'm not entirely sure this approach is appropriate, or if the lack of isolation information is a sign of a deeper problem, so feedback would be appreciated.


If interested, here's the compiler explorer setup used to help investigate: Compiler Explorer

5 Likes

Thanks for calling this out. I just spent this morning debugging crashes that are likely from the same source. We have an actor that takes a nonisolated(nonsending) closure, and calling that closure is not preserving the actor’s isolation when using Swift 6.3 (Xcode 26.4).

The same code works correctly in Swift 6.2.4 (Xcode 26.3).

5 Likes

@jamieQ I found another example which is also about nonisolated(nonsending) function loses its isolation. The symptom is that putting a nonisolated(nonsending) function in a closure causes it to run on global executor. Specifying the closure's isolation explicitly doesn't help.

See: https://godbolt.org/z/PnMjExjx1

actor A {
    func test() async {
        await foo()
    }
}

nonisolated(nonsending) 
func foo() async {
    // Specifying closure's isolation explicitly doesn't work either
    // let fn: nonisolated(nonsending) () async -> Void = { await bar() }
    let fn = { await bar() }
    await fn()
}

nonisolated(nonsending) 
func bar() async {
    print("OK")
}

The call path in SIL:

test()
  -> hop to actor executor
  -> foo()
     -> closure()  // $s6output3fooyyYaFyyYacfU_
        -> hop to global executor?!
        -> bar()
     -> hop back to implicit actor

Do you think if this is the same as #86083?

Sorry, my mistake. Specifying the closure's isolation explicitly does work. So the default behavior is perhaps by design.

1 Like

I would say this is related to this: Inconsistent nonisolated(nonsending) Inference for closures with NonisolatedNonsendingByDefault · Issue #87842 · swiftlang/swift · GitHub

1 Like

Yes, they are similar. I didn't enable NonisolatedNonsendingByDefault setting in my experiment, so the behavior I observed was by design.

It looks like a fix has been merged for this issue. (Thanks, @xedin!) Is there any chance this will make it into a Swift 6.3 patch? Or are we stuck with this miscompile until Swift 6.4 is released? Historically, it looks like it has taken a few months between the time the release branch is cut and when the final release happens.

Version Release branch created Public Release Delay
Swift 6.1 November 13, 2024 March 31, 2025 ~4.5 months
Swift 6.2 April 2, 2025 September 15, 2025 ~5.5 months
Swift 6.3 November 12, 2025 March 24, 2026 ~4 months
Swift 6.4 May 4, 2026 September 2026? ???

The Swift 6.4 release branch hasn't been created yet, but the release process post says this:

:warning: Until the new branch is created on May 4, 2026, the main branch should be treated as a release branch. Major changes should be avoided during this time, and the focus should be on converging the release.

This suggests to me that the default assumption is that changes going into main at this point would be released in Swift 6.4. If the usual schedule holds, that means that we'd be living with this miscompile in Swift 6.3 for about 5 more months. That's unfortunate, because it makes nonisolated(nonsending) dangerous, even at a time when it's otherwise being actively promoted as a better default (via inclusion in "Approachable Concurrency").

Anyway, super happy that a fix has been found. I'm hoping the fix can make it into a 6.3.x patch — right now, our team has to stay on Swift 6.2 (via Xcode 26.3), and that means we're missing out on other new Xcode features. And that's sad.

2 Likes

We are trying to figure out what we can do about this which would carry as little risk as possible for 6.3.

6 Likes

6.3 PR for reference - [6.3] [SIL/SILOptimizer] Fix a few issues where OptimizeHopToExecutor removed valid hops by xedin · Pull Request #88687 · swiftlang/swift · GitHub

4 Likes

In my testing, this does appear to be resolved in the version of Swift 6.3 included in the Xcode 26.5 RC build. (Initially, I was concerned, because the version number reported by the compiler didn't change in this release relative to Xcode 26.5 beta 3, but it does indeed appear to have different behavior. And the PR above wasn't merged until after beta 3 was released.)