An actor with non-async implementation of an async protocol method crashes periodically

Hi, I've a very strange bug, that seems to be fixed by removing "async" everywhere. Question is, does anyone know what is happening there? Here are the details:

  • I have a protocol P that has one method that is specified as "async".
  • protocol itself is ": Actor"
  • I have an implementation Impl: P, that is an actor and has an implementation of this method, but without "async".
  • everything builds and works.

But sometimes I get crashes when accessing Impl.someDictionary. It is usually something like unrecognized selector sent to instance 0x8000000000000000, mentioning some [__NSTaggedDate objectForKey:] or another tagged internal class.

I removed "async" from the protocol and the implementation (it was left there from the time when I did not use actors there), and everything seems to work now.

Question: what is the difference? What happens, when I write "async" in the protocol (that should be implemented as an actor), but do not write "async" in the actor implementation?

Thanks!

There should be no difference, because:

  • actor methods will be async at call outside of actor context
  • non-async implementation is valid implementation of method marked as async in protocol

Given eventual nature of crash, that seems to be a concurrency issue (yep, strange to get it for an actor, but who knows) of dictionary access. But too many things to assume here, if you’ll provide minimal code sample that reproduces this crash, it can simplify understanding what’s happening.

Given eventual nature of crash, that seems to be a concurrency issue (yep, strange to get it for an actor, but who knows) of dictionary access. But too many things to assume here, if you’ll provide minimal code sample that reproduces this crash, it can simplify understanding what’s happening.

I thought exactly that. Dictionary is being accessed concurrently indeed, I can see that. How was it fixed then when async disappeared?

I will try to make a small example of the code, but it may be quite difficult to do. Will try anyway.

Hard to guess why removing async fixes that (and does it? or just makes it much rare to occur, so you haven’t bumped into it) without full picture. It can be unsafe access somewhere in call chain, as you noted it was not an actor previously, and async triggers executor switch leading to crash as one of possibilities. Do you have strict concurrency checks enabled?

How is the client of this type calling this method?

var protoOpaque: some P
var protoImpl: Impl
var protoBoxed: any P
1 Like

How is the client of this type calling this method?

Last option, var protoBoxed: any P

Do you have strict concurrency checks enabled?

Yes, I did that. Did not help. Swift 5.10

Hmm… do you have the option to try returning a concrete or opaque type in this place? Does that have any effect on the crash?

Whoa, it was kinda easy. Here is the code (I've made a macOS command line utility) that works/does not work, depending on the async in line 4.

Reading/writing in the testFunc implementation is vital, without those code runs OK.

import Foundation

protocol Foo: Actor {
    func testFunc(key: String) async -> String?
}

actor Bar: Foo {
    private var dictionary: [String: String] = [:]

    func testFunc(key: String) -> String? {
        if let result = dictionary[key] {
            return result
        } else {
            dictionary[key] = "value for \(key)"
            return "value for \(key)"
        }
    }
}

let keys: [String] = [
    "test1",
    "test2",
    "test3",
    "test4",
    "test5",
    "test6",
    "test7",
]

let bar: any Foo = Bar()

await withTaskGroup(of: Void.self) { group in
    for _ in 0 ... 10000 {
        group.addTask {
            guard let key = keys.randomElement() else { return }

            let value: String? = await bar.testFunc(key: key)
            print("got “\(value ?? "nil")”")
        }
    }
}
1 Like

Seems to fix the crash in the test code above. O_o One step closer to the solution, still do not get it though.

1 Like

You think this can be due to the existential adding executor hop when protocol has async method, that somehow causes unsafe access?

I have had a theory that async in protocol might cause unwanted hops since compiler might actually doesn’t know underlying executor when you call a method on an existential (any P) and @vanvoorden just went straight to the case of existentials (even if not with the same backing idea). At the moment it feels like a bug to be honest…

1 Like
actor AnyFoo<F: Foo>: Foo {
  let f: F
  init(f: F) { self.f = f }
  func testFunc(key: String) async -> String? {
    await self.f.testFunc(key: key)
  }
}

What if you tried a small type erased version of the Foo protocol (instead of any Foo)? Would that help unblock for your situation while we wait for (what could be a bug) to be investigated?

Yeah… I see some runtime crash bugs with opaque types show up now and then. I don't normally use boxed any types very often but this looks like it might be a legit bug. @bealex could you also look through the Swift repo on GitHub to see if there are any open issues (and file a new issue if this looks like it has not been reported)?

Sure. Thanks for the help!

1 Like

Removal of async solves the problem, so I'm good for now. Thanks!

1 Like

Here is the bug: Crash in an actor with non-async implementation of an async protocol method · Issue #74126 · apple/swift · GitHub. I hope I did it correctly.

1 Like

I took a look at SIL that example generates, and seems like it misses to add hop to the actor executor in case of async.

This is a version of testFunc that's NOT marked as async, here are 2 hops: first hop_to_executor %23... – to the actor. Second is hop_to_executor %3... 2 lines below – to the generic executor.

  %21 = load %1 : $*any Foo                       // users: %23, %22
  strong_retain %21 : $any Foo                    // id: %22
  %23 = open_existential_ref %21 : $any Foo to $@opened("61B49E5A-2305-11EF-8A2F-82C93438CB79", any Foo) Self // users: %29, %25, %26, %26, %24
  %24 = witness_method $@opened("61B49E5A-2305-11EF-8A2F-82C93438CB79", any Foo) Self, #Foo.testFunc : <Self where Self : Foo> (isolated Self) -> (String) -> String?, %23 : $@opened("61B49E5A-2305-11EF-8A2F-82C93438CB79", any Foo) Self : $@convention(witness_method: Foo) <τ_0_0 where τ_0_0 : Foo> (@guaranteed String, @guaranteed τ_0_0) -> @owned Optional<String> // type-defs: %23; user: %26
  hop_to_executor %23 : $@opened("61B49E5A-2305-11EF-8A2F-82C93438CB79", any Foo) Self // id: %25
  %26 = apply %24<@opened("61B49E5A-2305-11EF-8A2F-82C93438CB79", any Foo) Self>(%18, %23) : $@convention(witness_method: Foo) <τ_0_0 where τ_0_0 : Foo> (@guaranteed String, @guaranteed τ_0_0) -> @owned Optional<String> // type-defs: %23; users: %118, %58, %28
  hop_to_executor %3 : $Optional<Builtin.Executor> // id: %27
  debug_value %26 : $Optional<String>, let, name "value" // id: %28
  strong_release %23 : $@opened("61B49E5A-2305-11EF-8A2F-82C93438CB79", any Foo) Self // id: %29

Now, when testFunc IS marked as async we see only 1 hop_to_executor %3... after calling it, but no hops to the actor's executor at all, so it is executed at whatever executor it has inherited from task group call of addTask:

  %21 = load %1 : $*any Foo                       // users: %23, %22
  strong_retain %21 : $any Foo                    // id: %22
  %23 = open_existential_ref %21 : $any Foo to $@opened("9DFF729A-2305-11EF-97BF-82C93438CB79", any Foo) Self // users: %28, %25, %25, %24
  %24 = witness_method $@opened("9DFF729A-2305-11EF-97BF-82C93438CB79", any Foo) Self, #Foo.testFunc : <Self where Self : Foo> (isolated Self) -> (String) async -> String?, %23 : $@opened("9DFF729A-2305-11EF-97BF-82C93438CB79", any Foo) Self : $@convention(witness_method: Foo) @async <τ_0_0 where τ_0_0 : Foo> (@guaranteed String, @guaranteed τ_0_0) -> @owned Optional<String> // type-defs: %23; user: %25
  %25 = apply %24<@opened("9DFF729A-2305-11EF-97BF-82C93438CB79", any Foo) Self>(%18, %23) : $@convention(witness_method: Foo) @async <τ_0_0 where τ_0_0 : Foo> (@guaranteed String, @guaranteed τ_0_0) -> @owned Optional<String> // type-defs: %23; users: %117, %57, %27
  hop_to_executor %3 : $Optional<Builtin.Executor> // id: %26
  debug_value %25 : $Optional<String>, let, name "value" // id: %27
  strong_release %23 : $@opened("9DFF729A-2305-11EF-97BF-82C93438CB79", any Foo) Self // id: %28

UPD. Checked on latest Swift toolchain from main branch – still reproduces as well.

5 Likes

Oh, so it does somehow matter. Thanks for that!

Should I copy this to the github bug? Will this be helpful?

1 Like

Yes, as Swift 6 aims to bring complete concurrency model, a lot of things are being addressed, so it can be the case newer versions has it fixed.

Yes, this can be a valuable addition.

2 Likes

It looks similar to this bug that @Douglas_Gregor fixed a couple of years ago:

2 Likes