I was playing with measuring Actor performance

isaac-weisberg · September 28, 2024, 7:27pm

I have written this tiny test to compare actor-based synchronization with raw lock-based synchronization.

I have written these 2 similar objects:


final class StateHolderLock {
    var lock = os_unfair_lock_s()

    init() {}

    var sum = 0

    var onNewValueReceived: ((Int) -> Void)!

    func handleValueRecieved(_ val: Int) {
        os_unfair_lock_lock(&lock)
        sum += val
        os_unfair_lock_unlock(&lock)
        onNewValueReceived(val)
    }
}

final actor StateHolderActor {
    init() {}

    var sum = 0

    nonisolated(unsafe) var onNewValueReceived: ((Int) -> Void)!

    func handleValueRecieved(_ val: Int) {
        sum += val
        onNewValueReceived(val)
    }
}

And I have written a use case like this, which I run in a macOS CLI app target in Release mode:

let iterations = 1000000
func testStateHolderActor() async {
    let e = measureSH()
    let actor = StateHolderActor()

    var sum = 0
    actor.onNewValueReceived = { val in
        sum += val
    }

    for i in 0 ..< iterations {
        await actor.handleValueRecieved(1)
    }

    e()
}

func testStateHolderLocked() {
    let e = measureSH()
    let actor = StateHolderLock()

    var sum = 0
    actor.onNewValueReceived = { val in
        sum += val
    }

    for i in 0 ..< iterations {
        actor.handleValueRecieved(1)
    }

    e()
}

Unfortunately, I am seeing that the lock version takes 2 ms to complete, while the actor-based version takes 58 ms to complete.

And I see in the trace of Time Profiler that testStateHolderActor() has a weight of 8.9%, while testStateHolderLocked() has a weight of 1.3%...

The code does a lot of stuff that's not directly related to what I have written.

Can somebody help me? Maybe I need to somehow set up my project so that the performance is comparable?

~~I am attaching a trace file too.~~ ohh well... I can email it to you, I can't just upload a zip here...

nkbelov · September 28, 2024, 8:00pm

This is expected to an extent: actors are much higher-level than simple locks and rely on the Swift Concurrency runtime to work, which includes some scheduling logic on top of OS threads. In that respect, they are more comparable to GCD queues.

You should not really expect actors to demonstrate comparable performance in tasks such as simply incrementing an integer from a different isolation domain, as the runtime will have to constantly switch contexts, which is way more expensive than the operation itself.

Part of the problem is that testStateHolderActor() runs on the so-called generic executor, while the actor has its own executor, and the runtime has to switch between them in each iteration. You can modify your logic to hop onto the actor only once:

func testStateHolderActor() async {
    let e = measureSH()
    let actor = StateHolderActor()

    var sum = 0
    actor.onNewValueReceived = { val in
        sum += val
    }

    func run(actor: isolated StateHolderActor) async {
        for _ in 0 ..< iterations {
            actor.handleValueRecieved(1)
        }
    }

    await run(actor: actor)

    e()
}

You will find much more success with actors for tasks where you have to isolate larger stateful systems, where it becomes increasingly more cumbersome to set up locking properly and there's a need to support async operations by design, such as network I/O.

isaac-weisberg · September 28, 2024, 9:06pm

wow, thank you!

It helped, now the actor version takes 6 ms versus 2 ms for the locked.

jamieQ · September 28, 2024, 9:43pm

isaac-weisberg:

final class StateHolderLock {
    var lock = os_unfair_lock_s()
    // <snip>
    func handleValueRecieved(_ val: Int) {
        os_unfair_lock_lock(&lock)
        sum += val
        os_unfair_lock_unlock(&lock)
        onNewValueReceived(val)
    }
}

forgive me for the tangent, but i wanted to highlight that the pattern used here is not safe, since an os_unfair_lock_s used in this manner is not guaranteed to have a stable address. this pitfall is highlighted in the documentation for OSAllocatedUnfairLock, which should be used instead (if available for your platform). it states:

However, it’s unsafe to use os_unfair_lock from Swift because it’s a value type and, therefore, doesn’t have a stable memory address. That means when you call os_unfair_lock_lock or os_unfair_lock_unlock and pass a lock object using the & operator, the system may lock or unlock the wrong object.
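A minimal sketch of what that replacement might look like for the class above. The class name StateHolderAllocatedLock and the currentSum accessor are made up for illustration, and the code assumes a deployment target of macOS 13 / iOS 16 or later, where OSAllocatedUnfairLock is available:

```swift
import os

// Hypothetical rewrite of StateHolderLock (names are illustrative, not from the thread).
final class StateHolderAllocatedLock {
    // The lock owns the protected Int and keeps it at a stable heap address.
    private let sum = OSAllocatedUnfairLock(initialState: 0)

    var onNewValueReceived: ((Int) -> Void)?

    func handleValueRecieved(_ val: Int) {
        // The closure runs with the lock held; `$0` is the protected state.
        sum.withLock { $0 += val }
        // As in the original, the callback runs after the lock is released.
        onNewValueReceived?(val)
    }

    // Reads the running total under the lock.
    var currentSum: Int { sum.withLock { $0 } }
}
```

Because the state lives inside the lock, there is no `&lock` expression left for the compiler to materialize a temporary for.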

Alejandro · September 28, 2024, 9:45pm

Use Mutex instead of OSAllocatedUnfairLock when possible
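For reference, a sketch of the same holder using Mutex from the Synchronization module. This assumes a Swift 6 toolchain and a deployment target of macOS 15 / iOS 18 or later; the class name StateHolderMutex and the currentSum accessor are hypothetical:

```swift
import Synchronization

// Hypothetical rewrite of StateHolderLock (names are illustrative, not from the thread).
final class StateHolderMutex {
    // Mutex is noncopyable, so it is stored as a `let` property and used in place.
    private let sum = Mutex(0)

    var onNewValueReceived: ((Int) -> Void)?

    func handleValueRecieved(_ val: Int) {
        // `$0` is the protected Int, accessible only while the lock is held.
        sum.withLock { $0 += val }
        onNewValueReceived?(val)
    }

    // Reads the running total under the lock.
    var currentSum: Int { sum.withLock { $0 } }
}
```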

nkbelov · September 28, 2024, 9:47pm

I'll just add that this is, of course, not an equivalent transformation: what happens in the new version is as if you'd called os_unfair_lock_lock and os_unfair_lock_unlock outside the loop, so the two cases are not strictly comparable anymore.
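To make that concrete, here is a rough lock-based counterpart of the isolated `run(actor:)` version. It is only a sketch: CoarseLockedHolder and handleValuesReceived are hypothetical names, and NSLock stands in for os_unfair_lock:

```swift
import Foundation

// Hypothetical helper, not from the thread.
final class CoarseLockedHolder {
    private let lock = NSLock()
    private(set) var sum = 0

    // Takes the lock once and performs every increment inside that single
    // critical section, mirroring the single hop onto the actor.
    func handleValuesReceived(count: Int, onNewValueReceived: (Int) -> Void) {
        lock.lock()
        defer { lock.unlock() }
        for _ in 0 ..< count {
            sum += 1
            onNewValueReceived(1)
        }
    }
}
```

The original lock version instead pays for one lock/unlock pair per iteration, which is why the two timings no longer measure the same thing.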

isaac-weisberg · September 28, 2024, 9:47pm

Doesn't it have a stable memory address because it's allocated as a stored property of a reference-type object?

isaac-weisberg · September 28, 2024, 9:48pm

ohhh thanks for bringing my attention to this <3

Alejandro · September 28, 2024, 9:51pm

Yes, but taking the address of a class property 1. does not guarantee that the pointer you get is the actual pointer to the property (we can make a temporary one if we want!) and 2. introduces runtime exclusivity checks, which synchronization primitives generally want to avoid because they implement their own synchronization.

isaac-weisberg · September 28, 2024, 9:53pm

It introduces runtime exclusivity checks? And OSAllocatedUnfairLock doesn't?

Alejandro · September 28, 2024, 9:54pm

Mutex and OSAllocatedUnfairLock do not introduce exclusivity checks.

isaac-weisberg · September 28, 2024, 9:54pm

ASDF measure SH testStateHolderActor() 7866 microseconds
ASDF measure SH testStateHolderLocked() 318 microseconds

oh no...

tclementdev · September 28, 2024, 9:56pm

An actor is not a lock, you shouldn't try to use it as such, see this post from ktoso: Why do not actor-isolated properties support 'await' setter? - #33 by ktoso

jamieQ · September 28, 2024, 10:08pm

in addition to what @Alejandro pointed out, this past thread has more discussion on the matter.

ibex10 · September 29, 2024, 12:38am

I have tried the above code, measuring the time with this:


func measure (_ prefix: String, _ f: () -> Void) {
    let d = ContinuousClock ().measure {
        f ()
    }
    print (prefix, d)
}

func measure (_ prefix: String, _ f: () async -> Void) async {
    let d = await ContinuousClock ().measure {
        await f ()
    }
    print (prefix, d)
}

Details

import Foundation

@main
enum Test {
    static func main () async  {
        testStateHolderLocked()
        await testStateHolderActor()
        await testStateHolderActor2()
    }
}

// [https://forums.swift.org/t/i-was-playing-with-measuring-actor-performance/75005]

final class StateHolderLock {
    var lock = os_unfair_lock_s()

    init() {}

    var sum = 0

    var onNewValueReceived: ((Int) -> Void)!

    func handleValueRecieved(_ val: Int) {
        os_unfair_lock_lock(&lock)
        sum += val
        os_unfair_lock_unlock(&lock)
        onNewValueReceived(val)
    }
}

final actor StateHolderActor {
    init() {}

    var sum = 0

    nonisolated(unsafe) var onNewValueReceived: ((Int) -> Void)!

    func handleValueRecieved(_ val: Int) {
        sum += val
        onNewValueReceived(val)
    }
}

let iterations = 1000000
func testStateHolderActor() async {
    await measure ("Actor:") {
        let actor = StateHolderActor()
        
        var sum = 0
        actor.onNewValueReceived = { val in
            sum += val
        }
        
        for _ in 0 ..< iterations {
            await actor.handleValueRecieved(1)
        }
        
    }
}

func testStateHolderLocked() {
    measure ("Locked:") {
        let actor = StateHolderLock()
        
        var sum = 0
        actor.onNewValueReceived = { val in
            sum += val
        }
        
        for _ in 0 ..< iterations {
            actor.handleValueRecieved(1)
        }
        
    }
}

// [https://forums.swift.org/t/i-was-playing-with-measuring-actor-performance/75005/2]

func testStateHolderActor2 () async {
    await measure ("Actor2 :") {
        let actor = StateHolderActor()
        
        var sum = 0
        actor.onNewValueReceived = { val in
            sum += val
        }
        
        func run (actor: isolated StateHolderActor) async {
            for _ in 0 ..< iterations {
                actor.handleValueRecieved(1)
            }
        }
        
        await run (actor: actor)
    }
}

func measure (_ prefix: String, _ f: () -> Void) {
    let d = ContinuousClock ().measure {
        f ()
    }
    print (prefix, d)
}

func measure (_ prefix: String, _ f: () async -> Void) async {
    let d = await ContinuousClock ().measure {
        await f ()
    }
    print (prefix, d)
}

And this is what I got (on macOS 14.5, 3.2 GHz 6-Core Intel Core i7, Xcode Version 15.4 (15F31d)):

Locked: 0.536446558 seconds
Actor: 0.65408858 seconds
Actor2 : 0.525278467 seconds

David_Smith · September 29, 2024, 6:51am

Another interestingly subtle thing to note here is that if the main actor is involved in your test at all (whether going to it or from it), that can change the performance characteristics significantly.

The reason for this is an optimization called "executor stealing". When switching between actors that use the cooperative thread pool [1], Swift can completely avoid the cost of actually switching threads by reusing the current thread for the actor it's switching to.

For the main actor, an actual thread switch has to happen since the main actor is required to run on the main thread and everything else is required not to.

[1] Which is all non-main actors unless overridden by a custom executor.
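One way to observe the difference is to time the same loop from a main-actor caller and from a nonisolated caller. This is a sketch with hypothetical names (Counter, incrementFromMainActor, incrementFromPool, compareCallers), assuming the default behaviour where a nonisolated async function runs on the cooperative pool; the actual numbers will depend on the machine and build settings:

```swift
// Hypothetical names throughout; this is not code from the thread.
actor Counter {
    private(set) var sum = 0
    func add(_ val: Int) { sum += val }
}

// Each call has to leave the main thread and later come back to it.
@MainActor
func incrementFromMainActor(_ counter: Counter, times: Int) async {
    for _ in 0 ..< times { await counter.add(1) }
}

// Nonisolated caller: the hop to `counter` may reuse the current pool thread.
func incrementFromPool(_ counter: Counter, times: Int) async {
    for _ in 0 ..< times { await counter.add(1) }
}

// Times both callers against the same actor and returns the two durations.
func compareCallers(times: Int) async -> (main: Duration, pool: Duration) {
    let clock = ContinuousClock()
    let counter = Counter()
    let main = await clock.measure { await incrementFromMainActor(counter, times: times) }
    let pool = await clock.measure { await incrementFromPool(counter, times: times) }
    return (main, pool)
}
```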

vns · September 29, 2024, 9:09am

On the topic of actor performance, there was a thread replicating Go’s channels, and out of curiosity I made an actor-based implementation to compare performance: Async Channels for Swift concurrency - #44 by vns

In general, actors are quite good from a performance perspective. There are cases where strategically placed locks might perform better, especially if the locks significantly reduce the number of hops between executors (you can see in the thread that the syncRw version is the slowest in either implementation exactly because there are a lot of hops back and forth), but for a majority of use cases actors are pretty performant on their own.

So if you have small chunks of work between which you expect to switch extensively, I’d prefer a lock over an actor: in that case you have zero hops between executors.

Also, I want to highlight the fact that your code with locks here is completely synchronous, while the actor version by nature introduces some asynchronous work. I think that to be completely fair in the comparison, you would need to offload the lock version to a separate queue and probably introduce some level of parallelism, so that your state is actually being mutated from different threads (for both the lock and actor versions).
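A rough sketch of what such a comparison could look like, with both versions mutated from several workers at once. All names here (LockedCounter, ActorCounter, runLocked, runActor, workers, perWorker) are hypothetical, NSLock is used as a stand-in lock, and timing code is left out:

```swift
import Foundation

// Hypothetical lock-protected counter; @unchecked because the lock guards `sum`.
final class LockedCounter: @unchecked Sendable {
    private let lock = NSLock()
    private var sum = 0

    func add(_ val: Int) {
        lock.lock()
        sum += val
        lock.unlock()
    }

    var total: Int {
        lock.lock()
        defer { lock.unlock() }
        return sum
    }
}

// Hypothetical actor-protected counter.
actor ActorCounter {
    private(set) var sum = 0
    func add(_ val: Int) { sum += val }
}

let workers = 8
let perWorker = 100_000

// Lock version: GCD runs the closure on several threads in parallel.
func runLocked() -> Int {
    let counter = LockedCounter()
    DispatchQueue.concurrentPerform(iterations: workers) { _ in
        for _ in 0 ..< perWorker { counter.add(1) }
    }
    return counter.total
}

// Actor version: one child task per worker, all calling into the same actor.
func runActor() async -> Int {
    let counter = ActorCounter()
    await withTaskGroup(of: Void.self) { group in
        for _ in 0 ..< workers {
            group.addTask {
                for _ in 0 ..< perWorker { await counter.add(1) }
            }
        }
    }
    return await counter.sum
}
```

Wrapping each function in the measure helpers shown earlier in the thread would give timings under contention rather than for a single uncontended caller.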