SE-0433 Mutex vs actor - General advice?

With SE-0433 adding Mutex to the standard library, I'm wondering what the general advice is regarding using Mutex vs actor.

Here is a very simple example to illustrate: a FIFO using Array as storage.
The FIFO has two methods, enqueue and dequeue. In the first case, the FIFO is implemented as an actor with an Array as storage; in the second case, it's a class whose Array storage is protected via the new Mutex.

// Tested with swift-PR-71383-1223.xctoolchain

import Foundation
import Synchronization

actor FIFOWithActor {
    private var _storage = Array<Int>([])
    
    func enqueue(item: Int) {
        _storage.append(item)
    }

    func dequeue() -> Int? {
        guard !_storage.isEmpty else { return nil }
        return _storage.removeFirst()
    }
}

final class FIFOWithMutex: Sendable {
    private let _storage = Mutex<Array<Int>>([])
    
    init() { }
    
    func enqueue(item: Int) {
        _storage.withLock { storage in
            storage.append(item)
        }
    }

    func dequeue() -> Int? {
        _storage.withLock { storage in
            guard !storage.isEmpty else { return nil }
            return storage.removeFirst()
        }
    }
}

let fifoA = FIFOWithActor()
let fifoM = FIFOWithMutex()

let sender = Task.detached {
    for i in 0..<100 {
        print(">FIFOWithActor enqueue \(i)")
        await fifoA.enqueue(item: i)
        
        print("  >FIFOWithMutex enqueue \(i)")
        fifoM.enqueue(item: i)
                
        try await Task.sleep(for: .milliseconds(10))
    }
    
    return "finish sending"
}

let receiver = Task.detached {
    while true {
        if let item = await fifoA.dequeue() {
            print("<FIFOWithActor dequeue \(item)")
        }
        
        if let item = fifoM.dequeue() {
            print("  <FIFOWithMutex dequeue \(item)")
        }
        
        try await Task.sleep(for: .milliseconds(10))
    }
    
}

let result = await sender.result
print(result)
receiver.cancel()

Beyond the fact that the implementation with Mutex is not restricted to use in an async/await concurrency context, what is the general advice regarding the use of Mutex vs actor?

In this specific case, because using Array as FIFO storage is very inefficient (removeFirst is O(n)), the actor version might be superior: it's less likely to cause noticeable delays on latency-sensitive threads, like the main thread.

And I think in general that's a good aspect to focus on when making this decision. If what you're doing in the critical section is quite fast, a mutex is a good option and more efficient than using an actor.

Of course, most uses probably don't really care about the difference in efficiency, so I'd be more likely to just fit the design to whatever's using it. e.g. do you need or want to use it from non-async contexts? Do you have ancillary, periodic functions going on that could neatly run as actor-isolated tasks? Etc.

1 Like

There was some discussion in the review thread about having async versions of locks such as mutexes. An actor basically is an async mutex. That's the context in which I would make this decision.

Swift's Tasks and Task suspension system can have some theoretical advantages over native OS threads, such as reduced context switching. Actors, because they suspend, can take advantage of that, while (synchronous) mutexes don't know anything about tasks; they work at the OS thread level. Potentially, an actor could be a better choice if you expect very high contention or you want to take advantage of features such as Task priorities.
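The difference shows up directly at the call site: calling into an actor requires `await` and may suspend the current Task, while a Mutex call is synchronous and may briefly block the calling thread. A minimal sketch (the counter types here are illustrative, not from the thread):

```swift
import Synchronization

// Actor: callers must await; if the actor is busy, the Task suspends,
// freeing the underlying thread to run other work.
actor AsyncCounter {
    private var value = 0
    func increment() -> Int {
        value += 1
        return value
    }
}

// Mutex: callers stay synchronous; if the lock is held, the calling
// OS thread blocks until it becomes available.
final class SyncCounter: Sendable {
    private let value = Mutex<Int>(0)
    func increment() -> Int {
        value.withLock { (v: inout Int) -> Int in
            v += 1
            return v
        }
    }
}

let asyncCounter = AsyncCounter()
let syncCounter = SyncCounter()

let a = await asyncCounter.increment()  // potential suspension point
let s = syncCounter.increment()         // no suspension; may block briefly
```

This is also why only the Mutex version is callable from non-async contexts: `SyncCounter.increment()` has no `await` in its signature.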

In other situations, the overhead of suspending the current Task and waiting for the actor to become available may be burdensome, so in those cases a mutex might be a better choice.

(I say "theoretically" and "potentially" because I believe the focus so far has been on getting the overall semantics of language-integrated concurrency nailed down, so I'm not sure that the current actor/Task scheduling implementation necessarily reflects the best possible performance)

6 Likes

So, to summarize:

Benchmark and measure your specific workload, and let that inform your decision if performance is a priority.

I’ve found over time that it’s usually simpler to break out the measurement tools than to speculate - the full stack is too complicated these days…

If performance isn’t important for you, just pick the tool that will be easier to write robust solutions with (YMMV depending on your case).

More specifically, for a FIFO you might consider Deque from the swift-collections package as an alternative.
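For the thread's FIFO example, that suggestion could look like the sketch below (assuming the swift-collections package is available; the type name is mine): Deque's popFirst() is O(1), unlike Array.removeFirst(), which shifts every remaining element.

```swift
import Collections       // swift-collections package
import Synchronization

// FIFO backed by Deque instead of Array: dequeuing from the front
// is O(1), so the critical section stays short.
final class FIFOWithDeque: Sendable {
    private let storage = Mutex<Deque<Int>>([])

    func enqueue(_ item: Int) {
        storage.withLock { $0.append(item) }
    }

    func dequeue() -> Int? {
        storage.withLock { $0.popFirst() }
    }
}
```

A short critical section like this is exactly the case where a lock tends to beat an actor hop.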

4 Likes

This is how I think of it as well. Being asynchronous has some level of inherent overhead in both performance and cognitive load, and things like the executor stealing optimization only mitigate that, not eliminate it.

Whittling a piece of wood with a chainsaw will generally be less practical than with a carving knife, even if it's a really nice chainsaw.

3 Likes

Note that the point of my example was not to find the best choice of FIFO storage. There are definitely better choices, and Deque is one of them.

Whatever the storage, the idea was to show a simple example of a resource shared by different tasks, where accesses must be synchronized either with an actor or with the new Mutex.

Now that we have this very simple to use Mutex<your_type_to_protect> in the standard library, available on all platforms, I was wondering what the general actor vs Mutex advice is.

There's a few nuances to consider whenever deciding how to synchronize your concurrent code, be it using an actor, lock, or other patterns.

In your specific question, "make a FIFO queue", this really falls into the "concurrent data-structures" (well, at least "concurrency-safe data-structures") bucket, and those usually prefer locks or lock-free algorithms to provide the synchronization. Why? Because a data-structure is a low-level concept that should provide the highest performance and be reusable from most contexts. You would not want to thread-hop just to enqueue an element into a queue; that's too expensive.

So yes, for low-level primitives, data-structures, and the like, locking (or lock-free algorithms) is usually the way to go.

Where does that leave actors? Well, for "normal day-to-day code" actors are a better choice. You don't really use an actor as "just a queue" -- sure, it "is" a queue, but the value of an actor is in coupling state and logic: this is my "Thingy Processing Actor", and all the thingy processing is inside it; if someone wants to process a thingy, they ask the actor to do so. If the caller and target are both default actors and the target wasn't running, we'll do efficient tricks to reuse the existing thread and avoid a thread hop -- so there are a lot of optimizations built into the model, but the core idea is isolation by hopping.

The best thing about actors (and distributed actors) is that they're a great default for most day-to-day code, and the compiler will prevent you from writing all kinds of concurrency bugs when you rely on them. If you use locks manually, you have to be careful™ and think about the state much more. I also find that locking usually means the state and logic are spread out across the codebase: "random places" just grab the lock and modify some state. With actors you would not do that -- by design, the "mutate the things" code lives inside the actor -- which helps you organize your code into clean isolation domains, rather than having various pieces of code "randomly" reach for the lock, do some stuff, and write back the locked state…

You could abuse an actor "just as a lock" (actor State<T> { var s: T; func get() -> T; func set(_: T) }), but that's somewhat missing the point -- you've not leveraged the actor semantically to have it be the "logical place where all my business logic sits".
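Spelled out completely, that "actor used purely as an async lock" pattern looks something like this (a compilable sketch of the one-liner above; the init is my addition):

```swift
// An actor abused as an async lock around a single value -- safe and
// compiler-checked, but carrying no business logic of its own.
actor State<T: Sendable> {
    private var s: T
    init(_ initial: T) { s = initial }
    func get() -> T { s }
    func set(_ newValue: T) { s = newValue }
}

let state = State(0)
await state.set(42)           // every access pays a potential actor hop
let value = await state.get()
```

Compared to Mutex<T>, every get/set here is an async call, which is exactly the overhead the "concurrent data-structures" advice warns about.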

So as much as it is a performance question:

  • is the critical section very short, and we want to avoid thread hops, so taking a lock will be cheaper? (use lock)
  • is it preferable to suspend when the resource is busy, because the work may take a while to complete, and we'd rather wait without blocking the calling thread? (use actor)

(And as always with performance questions: Trust no-one, benchmark everything.)

It very much is a design question as well:

  • is this just a single state thing I'm swapping sometimes? (lock is fine)
  • is this some complex business logic where I'd like to keep the logic and state encapsulated in one place, and get the compiler to help me by encouraging me to put logic inside that place, rather than spreading it all over the codebase? (use actor)

Those questions can help answer "what should I use?" I think. And an even shorter answer would be: default to actors, unless you have reasons not to. And one such reason could be that you're building a concurrent data-structure.

27 Likes

I really can’t stress enough how true this is. If you’re regularly performance profiling your code at all, you’re ahead of the majority of programmers from what I’ve seen.

It’s fun and sometimes helpful to discuss the details like we’re doing here, but 90% of the time it’ll either be clear from the profiling tools or it won’t matter that much.

7 Likes

I measured a significantly stripped and simplified version:

  • enqueue only.
  • no array is used for storage; all "enqueue" does is increment an integer variable.
  • running 10 tasks in parallel each "enqueuing" 10M items, so it's 100M "enqueues" in total.

Got these results:

1 Like

Executor hops look quite expensive on such tests, recalling AsyncChannel sync case. And hard (if even possible?) to mitigate.

Interesting.
Here are the benchmarks from my stripped-down code version, with 10 tasks and 1_000_000 items:

// Tested with swift-PR-71383-1223.xctoolchain
// MacBook Air M1 (8-cores)
// Xcode 15.2, macOS 13.6.6, build debug

import Foundation
import Synchronization

let numberOfTasks = 10
let numberOfItems = 1_000_000
let taskSleepTimeMilliseconds = 10
let benchmarks: [FifoType] = [.NSlock, .SE0433Mutex, .actor]

extension Date {
    static func - (lhs: Date, rhs: Date) -> TimeInterval {
        return lhs.timeIntervalSinceReferenceDate - rhs.timeIntervalSinceReferenceDate
    }
}

enum FifoType: String {
    case actor = "actor"
    case SE0433Mutex = "SE-0433 Mutex"
    case NSlock = "NSLock"
}

final class FIFOWithNSLock: @unchecked Sendable {
    private var storage = 0
    let lock = NSLock()
    
    init() { }
    
    func enqueue(item: Int) {
        lock.withLock {
            storage += 1
        }
    }
}

final class FIFOWithMutex: Sendable {
    private let _storage = Mutex<Int>(0)
    
    init() { }
    
    func enqueue(item: Int) {
        _storage.withLock { storage in
            storage += 1
        }
    }
}

actor FIFOWithActor {
    private var _storage = 0
    
    func enqueue(item: Int) {
        _storage += 1
    }
}

let fifoL = FIFOWithNSLock()
let fifoM = FIFOWithMutex()
let fifoA = FIFOWithActor()

print("elapsed time in seconds")
print("========================================")

for benchmark in benchmarks {
    let timeBegin = Date()

    await withTaskGroup(of: Int.self) { group in
        for _ in 0..<numberOfTasks {
            group.addTask {
                for i in 0..<numberOfItems {
                    switch benchmark {
                    case .NSlock:
                        fifoL.enqueue(item: i)
                    case .SE0433Mutex:
                        fifoM.enqueue(item: i)
                    case .actor:
                        await fifoA.enqueue(item: i)
                    }
                }
                return 0
            }
        }
    }
    
    let elapsed = Date() - timeBegin
    print("\(elapsed) (\(benchmark.rawValue))")
}

Output:

elapsed time in seconds

========================================
1.55503511428833 (NSLock)
0.9709900617599487 (SE-0433 Mutex)
7.098009943962097 (actor)

Edit: Added the missing group.addTask in my code

2 Likes

Thanks so much Konrad, I read your text many times and I think I'm beginning to better understand the role of actor vs Mutex, in particular when you said:

So as much as it is a performance question:

  • is the critical section very short, and we want to avoid thread hops, so taking a lock will be cheaper? (use lock)
  • is it preferable to suspend when the resource is busy, because the work may take a while to complete, and we'd rather wait without blocking the calling thread? (use actor)

In the case of my FIFO example, it's clearly the "concurrent data-structures" use case, as you said, and preliminary benchmarks confirm this. Btw the SE-0433 Mutex sounds very interesting to me: it's in the standard library, optimized for each platform, and Swift 6 friendly.

Our domain is Test & Measurement data acquisition and simulation in laboratories, and we are using Swift more and more on macOS/iOS as well as Windows and Linux (desktop, not server). For years our parallelism and concurrency work has been done using Dispatch (GCD) in C and Swift, and we are now looking at Swift 6 to adopt async/await and actors in places where they make sense. Your help is invaluable.

Thanks again !

1 Like

Note also that GCD offers an intermediary - executing tasks via a serial queue is substantially more efficient than through an actor, but provides the same benefit of letting you execute tasks asynchronously so as to not block a latency-sensitive thread.

The main downsides are that you won't get full concurrency safety checking, as you will (in Swift 6) with actors, and that the code can be less readable (due to the more complicated control flow, versus the relatively linear nature of async code).
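That GCD intermediary could look like the sketch below (the type name is mine, not from the thread). Note the @unchecked Sendable: this is precisely the "no full concurrency safety checking" downside -- we assert safety manually because the serial queue guards the storage.

```swift
import Dispatch

// Serializing access through a private serial DispatchQueue:
// async submissions don't block the caller, sync reads return a value.
final class FIFOWithQueue: @unchecked Sendable {
    private var storage: [Int] = []
    private let queue = DispatchQueue(label: "fifo.serial")

    func enqueue(_ item: Int) {
        // The caller does not block; the append runs later on the queue.
        queue.async { self.storage.append(item) }
    }

    func dequeue() -> Int? {
        // The caller blocks until all previously submitted work finishes.
        queue.sync {
            storage.isEmpty ? nil : storage.removeFirst()
        }
    }
}
```

Because the queue is serial, a `sync` dequeue submitted after an `async` enqueue is guaranteed to observe the enqueued element.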

1 Like

I have not found this to be true, fwiw. In my tests last year, ping-ponging between two queues using dispatch_async was about half as fast as using actors.

Curious. I was speaking only from casual experience, not any in-depth examination, but my results in real-world programs seemed very clear. They're what's driven me to avoid actors somewhat, preferring simple mutexes or GCD instead.

I don't have a suitable example I can share, at hand, but I'll have to keep an eye out for such in future. Perhaps I'm hitting something unexpected.

One thing you could be running into is that the main actor doesn't support executor stealing (because it always has to use the main thread), so hops to and from it will be more expensive.

Yeah, that could well be at least part of it. A lot of [my] code involves the main thread, as a controller. I believe there's improvements planned in the near future, circa Swift 6, which is in part why I haven't bothered to really dig into this area. Easier to wait for the concurrency stuff to finalise.

1 Like