Actor isolation is broken by protocol conformance

alexd6631 · April 28, 2022, 2:54pm

In the example code below, an actor implements two variant of a protocol, one async and one inheriting Actor protocol.

When testing a race between two tasks calling an actor method (simulating an expensive CPU work without suspension point), we can see that the Actor based will honor the isolation of the actor, whereas the async one will allow tasks to interleave.

Personally I think it is a poor, or at least misleading, design choice, because the point of working with actor is to have strong guarantee on isolation (except on explicitly non isolated methods, and at suspension points), and those guarantee can be broken by the caller just by making the actor conform a bad protocol.

Forcing inheriting the Actor protocol works, but it is a leaking abstraction, and it is not possible to abstract over two implementation using or not an actor as an implementation detail.

I would expect the protocol witness of the async protocol to preserve the underlying actor isolation without additional information.

Is there some documentation where this specific point was discussed ?
Otherwise should I submit an issue to the compiler team.

protocol TestableActor: Actor {
    func test()
}

protocol TestableAsync {
    func test() async
}

actor TestActor : TestableActor, TestableAsync {
    func test() {
        print("Before \(Thread.current)")
        Thread.sleep(forTimeInterval: 1)
        print("After \(Thread.current)")
    }
}


func test(actor: TestableActor) {
    Task {
        print("TestableActor 1 \(Thread.current)")
        await actor.test()
    }

    Task {
        print("TestableActor 2 \(Thread.current)")
        await actor.test()
    }
}

func testAsync(actor: TestableAsync) {
    Task{
        print("TestableAsync 1 \(Thread.current)")
        await actor.test()
    }

    Task {
        print("TestableAsync 2 \(Thread.current)")
        await actor.test()
    }
}

let actor = TestActor()

test(actor: actor)
Thread.sleep(forTimeInterval: 3)

testAsync(actor: actor)
Thread.sleep(forTimeInterval: 3)

valentinradu · April 28, 2022, 3:43pm

This looks like a bug. There are a few of these edge cases with the concurrency model, not sure if this specific one is documented (a quick look on github issues didn't return anything)

It's debatable if your TestActor implements TestableAsync. I'd say it doesn't. Just because the underlying implementation detail of an actor uses async, it doesn't make test() the same as test() async. So that would be the first problem IMO.
If you replace func test() with func test() async everything will work as expected in your TestableAsync test.
Another problem that I see is that if you provide both func test() async and func test() conformances the compiler will always pick func test() regardless of the protocol (even for TestableAsync).

I'm in the process of learning how the witness table works and I can't comment more on that, but as a user it certainly feels like a bug and a confusing abstraction.

hooman · April 28, 2022, 4:31pm

Task and Thread APIs don't mix. An actor is logically running on a specific Task, not on a specific thread[*]. Tasks are cooperative (that is why Task.yield() is needed) and a small thread pool is used to service all Tasks. Never sleep or suspend a thread that is servicing a Task. Use Task.sleep() instead. Also, upon each suspension (await) there is no guarantee that the Task will resume on the same thread.

[*] With the exception of some global actors, such as main.

valentinradu · April 28, 2022, 4:39pm

@hooman Thread.sleep can be used to simulate an intensive, long, non-async task. But this problem persists even without the sleep and has little to do with the thread the prints execute on, but with the order they do (which brakes the actor abstraction)

alexd6631 · April 28, 2022, 4:41pm

Yes, as mentioned in the message, Thread.sleep is here to simulate a CPU intensive operation not a task suspension, and to make the problem easy to reproduce.

I am familiar with actor and async/coroutines on other languages (and the concept of multiplexing N tasks over M threads) and I know it is not a good idea to block a Thread in a couroutine in general. Especially mixed with async I/O there is a huge scalability gain to not block threads.

But the point here is to illustrate the lack of isolation (an actor handling only a "message" at a time is a foundation of the actor principle) if you are abstracting the actor over an async protocol.

alexd6631 · April 28, 2022, 5:12pm

Interesting to see, that making the test method async will somehow make it work (even though the test body does not have any await).

Whether or not TestActor implements TestableAsync is debatable indeed.
It would be acceptable to not accept this implementation (and forcing to put async on test() implementation) with a clear error.
We could also argue that communicating with an actor is intrinsically async due to "message passing" so async is already implicit on test(), and make the conformance works without breaking the actor invariant.

In general I would like to keep the ability to abstract over data sources :

protocol SomeDataSource {
  func getSomeResource() async -> SomeResource
}

An implementation that will use network or its own mechanism for concurrency
An implementation that will be in memory but protected from races by an actor

hooman · April 28, 2022, 5:38pm

Sorry I didn't read the whole code carefully before posting. I still don't understand what is your expectation that is not met? I assume you know that Swift actors serialize access but they are reentrant and do not guarantee the order of operations.

alexd6631 · April 28, 2022, 6:13pm

In this example there is no reentrancy involved (since there is no await inside test() and the code is synchronous).
So my expectation as a user is that the actor will guarantee that all test() invocations will be serialized.
This expectation is indeed fulfilled by driving the actor directly or using it through the TestableActor protocol.

However when driving it through the TestableAsync, this guarantee is broken as you can see from the output the calls are interleaved.

(You can reproduce this issue without using Thread.sleep and replacing it by a non trivial computation which is a legitimate use of the actor)

I am pasting the output (in case you were not able to run the example) :

TestableActor 1 <NSThread: 0x1011cbba0>{number = 2, name = (null)}
Before <NSThread: 0x1011cbba0>{number = 2, name = (null)}
TestableActor 2 <NSThread: 0x1010044a0>{number = 3, name = (null)}
After <NSThread: 0x1011cbba0>{number = 2, name = (null)}
Before <NSThread: 0x1011cbba0>{number = 2, name = (null)}
After <NSThread: 0x1011cbba0>{number = 2, name = (null)}
TestableAsync 2 <NSThread: 0x1010044a0>{number = 3, name = (null)}
Before <NSThread: 0x1010044a0>{number = 3, name = (null)}
TestableAsync 1 <NSThread: 0x1011cbba0>{number = 2, name = (null)}
Before <NSThread: 0x1011cbba0>{number = 2, name = (null)}
After <NSThread: 0x1011cbba0>{number = 2, name = (null)}
After <NSThread: 0x1010044a0>{number = 3, name = (null)}

hooman · April 28, 2022, 6:21pm

The problem is you are creating two separate tasks. There is no guarantee which one will actually start running first. It is not about the actor. If you sequentially create a number of tasks, there is no guarantee that they will run in the same order they were created.

alexd6631 · April 28, 2022, 6:26pm

The problem is not the order of tasks but interleaving of the actor protected test() method.

Actor should guarantee that there will be Before followed by After, (regardless of the underlying Thread or Task) and it is indeed the case using the actor directly.

You can see interleaving happening here with TestableAsync :
Before <NSThread: 0x1010044a0>{number = 3, name = (null)}
TestableAsync 1 <NSThread: 0x1011cbba0>{number = 2, name = (null)}
Before <NSThread: 0x1011cbba0>{number = 2, name = (null)} <-- Interleave here
After <NSThread: 0x1011cbba0>{number = 2, name = (null)}

hooman · April 28, 2022, 6:55pm

Ah, now I see.

So it is not running it on the serial executor in the case of the non-actor existential.

This is clearly a bug and you better report it. It should be easy to fix.

alexd6631 · April 29, 2022, 6:47am

Thank you for your feedback @valentinradu and @hooman

I just created the issue here on github

https://github.com/apple/swift/issues/58517

hooman · April 29, 2022, 3:25pm

Thank you.

Since I had a hard time following what you mean, I added a test file to the bug report which also confirms that the bug still exists in the main branch.

Here is the gist of it:

// Dummy computation:
func fib(_ x: UInt) -> UInt { x == 0 ? 0 : x == 1 ? 1 : fib(x-1) + fib(x-2) }

protocol TestableAsync: Sendable {
    func test() async -> Int
}

actor TestActor : TestableAsync {
    var state: Int = 0
    func test() -> Int {
        state += 1
        let savedSate = state
        _ = fib(25) // May need tweaking
        precondition(savedSate == state, "FATAL: Actor isolation violated!")
        return state
    }
}

let boxedActor: TestableAsync = TestActor()
async let a = boxedActor.test()
async let b = boxedActor.test()
await print("(\(a), \(b)")

This will crash with precondition failure (you may need to tweak the input of fib() function based on your CPU speed to reliably get the crash).

Also Note that you need a recent snapshot of Swift to run this as is.