Using semaphores in actor code

Smarter lock implementations use priority inheritance to combat the inversion issue. When the system detects a high-priority thread blocked by a low-priority one, it temporarily elevates the priority of the low-priority thread so that the lock gets released sooner.

The Linux documentation for its priority-inheritance-aware futex primitive states:

Priority inheritance is a mechanism for dealing with the
priority-inversion problem. With this mechanism, when a high-
priority task becomes blocked by a lock held by a low-priority
task, the priority of the low-priority task is temporarily raised
to that of the high-priority task, so that it is not preempted by
any intermediate level tasks, and can thus make progress toward
releasing the lock. To be effective, priority inheritance must
be transitive, meaning that if a high-priority task blocks on a
lock held by a lower-priority task that is itself blocked by a
lock held by another intermediate-priority task (and so on, for
chains of arbitrary length), then both of those tasks (or more
generally, all of the tasks in a lock chain) have their
priorities raised to be the same as the high-priority task.

Source: futex(2) - Linux manual page

1 Like

Binary vs. counting is not the problem. The problem is that there's no such thing as acquiring a semaphore; the semaphore gets signaled by an arbitrary, unpredictable thread. This is not a problem with locks because the thread that acquired a lock is required to be the thread that releases the lock.

An API like this could be implemented in a way that avoids priority inversion:

struct AsyncLimiter: ~Copyable {
  init(limit: Int, inOrder: Bool = false)

  /// At most `limit` tasks will be performing the operation concurrently
  func withControl<R, E: Error>(operation: () async throws(E) -> R) async throws(E) -> R
}

Note that this is just a sort of async mutex when limit == 1.
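For concreteness, here is a minimal, semantics-only sketch of such a limiter built on an actor and checked continuations. It enforces the limit (and FIFO order), but deliberately provides none of the priority-inheritance behavior a real implementation would need, which is exactly the part requiring runtime support; the type and member names are invented for this post.

```swift
// Illustrative sketch only: enforces the concurrency limit and FIFO
// order, but does NOT address the priority-inversion issues discussed
// in this thread (that is the part needing runtime support).
actor NaiveLimiter {
    private let limit: Int
    private var running = 0
    private var waiters: [CheckedContinuation<Void, Never>] = []

    init(limit: Int) {
        precondition(limit > 0)
        self.limit = limit
    }

    func withControl<R: Sendable>(_ operation: @Sendable () async throws -> R) async rethrows -> R {
        await acquire()
        defer { release() }
        return try await operation()
    }

    private func acquire() async {
        if running < limit {
            running += 1
        } else {
            // Suspend until a finishing task hands us its slot;
            // `running` is not decremented in that case.
            await withCheckedContinuation { waiters.append($0) }
        }
    }

    private func release() {
        if waiters.isEmpty {
            running -= 1
        } else {
            waiters.removeFirst().resume() // hand the slot to the next waiter
        }
    }
}
```

With limit: 1 this behaves like the async mutex mentioned above: await limiter.withControl { await doWork() }.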

It would need to be implemented in the Swift concurrency library because of the transitive inheritance issues that @Alejandro points out (which is something we already handle with e.g. Task.get()), but it would be implementable. How many of your uses of semaphores would that eliminate?

9 Likes

I think you can probably do the unordered perform today with the APIs available on TaskGroups.

func unorderedPerform<T>(maxConcurrentTasks: Int, count: Int, action: @Sendable () async throws -> T) async rethrows -> [T] {
  // action does not really escape here since we await the entire task group
  try await withoutActuallyEscaping(action) { escapingClosure in
    try await withThrowingTaskGroup(of: T.self) { group in
      var spawned = 0
      var running = 0
      var collected = [T]()
      collected.reserveCapacity(count)
      while collected.count < count {
        // top up the window, but never spawn more than `count` tasks total
        while running < maxConcurrentTasks && spawned < count {
          group.addTask(operation: escapingClosure)
          spawned += 1
          running += 1
        }
        guard let element = try await group.next() else {
          break
        }
        collected.append(element)
        running -= 1
      }
      return collected
    }
  }
}

That should still handle the inversion case since it is structured concurrency (and no continuations). And I believe that with some massaging it could even be refactored to an ordered form too.
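A sketch of that ordered form under the same structured-concurrency constraints (the name and the index-passing signature are my own invention, not an existing API): each child task returns its submission index along with its result, so results can be written into a pre-sized buffer and come out in submission order.

```swift
// Ordered variant: same concurrency window as above, but each child
// task carries its index so the output order matches submission order.
func orderedPerform<T: Sendable>(
    maxConcurrentTasks: Int,
    count: Int,
    action: @escaping @Sendable (Int) async throws -> T
) async throws -> [T] {
    try await withThrowingTaskGroup(of: (Int, T).self) { group in
        var results = [T?](repeating: nil, count: count)
        var nextIndex = 0
        var running = 0
        while nextIndex < count || running > 0 {
            // top up the window without exceeding `count` total tasks
            while running < maxConcurrentTasks && nextIndex < count {
                let i = nextIndex
                group.addTask { (i, try await action(i)) }
                nextIndex += 1
                running += 1
            }
            guard let (i, value) = try await group.next() else { break }
            results[i] = value
            running -= 1
        }
        return results.map { $0! } // every slot was filled above
    }
}
```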

4 Likes

Most of those I know. Let's say all of them.

With the traditional semaphore api, the targeted use case is await sem.wait(); defer { sem.signal() }; await doWork(). If it had to be written await limiter.withControl { await doWork() } instead, I see no problem at all :star_struck: I'd be happy to help sweat some details if I can.

Thank you, and @tgoyne as well, for explaining the problem with signal() that can be called from anywhere. The limiter api you suggest avoids this problem.

1 Like

So, use-case wise, this aims to handle the same thing as the semaphore pattern above (while, yes, also accounting for priority escalation etc.).

While an async API like this sounds useful in general, I notice that the use cases mostly seem to revolve around avoiding reentrancy issues in actors...

How many of those use cases are just trying to get a non-reentrant async actor method, rather than forcing people into yet another nesting layer and explicit locking?

2 Likes

Is this actually implemented?

In libpthread up to version 301 I can see this comment:

/*
 * Unlock a mutex.
 * TODO: Priority inheritance stuff
 */

In later versions the comment is gone, yet there's no obvious priority-inheritance code either.


A quick test suggests priority is not inherited, although I am not sure I'm checking it properly: both pthread_getschedparam and thread_policy_get (with thread_precedence_policy) show that the priority of the LPT does not change while the HPT is waiting for the lock that the LPT is holding.

pthread mutexes on Darwin do priority inheritance. pthread rwlocks do not.

1 Like

Could you point me to the relevant piece of code? And how could I observe (test) it?

I don't have the code handy, but you can use spindump to observe the effects

Scheduler APIs generally don't return the dynamically-correct priority because (1) it can change asynchronously, so it doesn't really have a well-defined value, and (2) it generally shouldn't propagate to other things.

As an example of the latter, if you're trying to spawn a new thread at the current thread's priority, you usually don't want the new thread to get a higher priority just because the current thread happens to be temporarily blocking a high-priority thread. Now, you would want that if you intend to wait on your new "subthread" before you unblock the high-priority thread, but simply propagating your instantaneously-correct priority wouldn't be good enough to handle that; you need to be dynamically responsive, so that priority propagates to the subthread if the high-priority thread starts waiting on you after the subthread's creation. Traditional thread libraries don't deal with this sort of thing very well because they don't have the right information, but structured concurrency actually does, which is one of the reasons we like it.

Possibly many. Be mindful that some are on the main thread, though; e.g. one of my recent semaphore uses was to protect NSBitmapImageRep, which has to be used on the main thread for some things and doesn't [seem to] mind for others (and ideally would do nothing important on the main thread, because the things it does do, like drawing, can make the main thread hang for tens of seconds with even just moderate-resolution files… but I digress).

I'm sure some of them are, yeah. An API like AsyncLimiter would be prone to all the problems that we've tried to avoid by allowing actors to schedule other tasks while isolated functions call out to non-isolated functions: deadlock, poor utilization / excessive overhang, questions about recursive locking, and so on. It would also force us to deal with temporary boosts to task priority, which is a pretty non-trivial source of complexity that we've currently designed away.[1] But it's also pretty valuable semantically.

There are definitely significant uses of it that could never fit in with actors, though; our actor design is centered around allowing a single task because that's what allows meaningful data isolation.


  1. Waiting on a Task value causes a permanent boost to that task and its subtasks. Adding a higher-priority task to an actor that's currently running causes a temporary boost to the thread priority for that running thread. But we don't have anything that can create a temporary boost to task priority, and it's not clear what it would mean for subtasks. (It probably ought to only affect subtasks created during the controlled function?) ↩︎

3 Likes

Oh yeah I’m not against introducing fine grained control APIs. It definitely is tricky though with those temporary boosts… just observing that this looked like the good old reentrancy discussion looking at the use cases people brought up so far.

I managed to observe the priority inheritance effect with this snippet:

let lock = NSLock()

@inline(never) func inc(_ x: Int) -> Int { x + 1 }

let highPriorityThread = Thread {
    Thread.current.threadPriority = 1
    usleep(1000_000)
    lock.lock()
}

let lowPriorityThread = Thread {
    Thread.current.threadPriority = 0.1
    lock.lock()
    var count = 0
    let start = Date()
    print("LPT start loop")
    while count < 1000_000_000 {
        count = inc(count)
    }
    print("LPT end loop after \(Date().timeIntervalSince(start))")
}

// fill all cores, check how many have you got
for _ in 0 ..< 10 {
    let thread = Thread {
        Thread.current.threadPriority = 0.5
        while true {}
    }
    thread.start()
}

// highPriorityThread.start() // uncomment to speed low priority thread
lowPriorityThread.start()
RunLoop.main.run(until: .distantFuture)

Without highPriorityThread running, the loop in the low-priority thread completes in some 15 seconds; with highPriorityThread running, it completes in about 2 seconds. :+1:

2 Likes

Let's imagine a scenario.


[Tim]: Hi folks, do you remember this nasty bug that we've been chasing for two months? We eventually figured out that some server endpoints are not reentrant. In particular:

  • All calls to POST /foo must be serialized, for a given authorization token.
  • All calls to POST /bar and PUT /bar are mutually exclusive and must be serialized, for a given authorization token.

The server team is now fully aware, but the fix won't ship to production for a long time. So we decided that the client app should temporarily work around the problem. This won't fix the problem for users who use two different devices with the same token, but it will drastically reduce our error rate nevertheless.

What are our options?

[Fanissa]: Easy! We can fix it with semaphores:

 final class APIClient {
+  private let fooSemaphore = AsyncSemaphore(value: 1)
+  private let barSemaphore = AsyncSemaphore(value: 1)
 
   func createFoo() async throws {
+    await fooSemaphore.wait()
+    defer { fooSemaphore.signal() }
     await post("/foo", ...)
   }
 
   func createBar() async throws {
+    await barSemaphore.wait()
+    defer { barSemaphore.signal() }
     await post("/bar", ...)
   }
 
   func updateBar() async throws {
+    await barSemaphore.wait()
+    defer { barSemaphore.signal() }
     await put("/bar", ...)
   }

   // unaffected
   func otherMethods() async throws { ... }
 }

[Sam]: I read on the Swift forums that async semaphores are a bad pattern.

[Fanissa]: Yes, they can create issues in the general case. In our particular case, though:

  • All those api calls are performed from user-initiated tasks, so they can't hinder higher priority tasks.
  • They have no mutual dependency, so we don't risk any deadlock.
  • It's a quick and reliable fix, until the server fixes its own issues.
  • I also read on the Swift forums that some prominent members of the Swift team were not against a standard semaphore-like api that we'll be able to use instead of AsyncSemaphore in the future.

[Tim]: All right, let's do it and ship. And let's not forget to remove this workaround when it is no longer needed.

[Sam]: Oh, Fanissa, I think we have to properly deal with cancellation.

[Fanissa]: Sure, AsyncSemaphore has a waitUnlessCancelled method. But we can't really avoid server reentrancy if the URLSession request itself is cancelled - maybe the server will catch and process it. This is a case where we should avoid Task cancellation, even if the parent Task itself is cancelled. Some refactoring is probably needed here.
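One way to realize that refactoring, sketched under the assumption that post is the app's own request helper (a hypothetical name): run the request in an unstructured Task, which does not inherit the parent's cancellation, so an in-flight request always runs to completion even if the caller is cancelled. The semaphore acquisition around it can still use waitUnlessCancelled; only the request itself is shielded.

```swift
// Test scaffolding: `post` stands in for the app's real request helper.
var postedPaths: [String] = []

func post(_ path: String) async throws {
    // stand-in for the actual URLSession call
    postedPaths.append(path)
}

func createFooShielded() async throws {
    // An unstructured Task does not inherit the parent's cancellation,
    // so once started the request always runs to completion.
    let request = Task {
        try await post("/foo")
    }
    // Re-join structured code; cancelling the caller does not cancel
    // the request task, it only stops us from observing its result.
    try await request.value
}
```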

[Sam]: Well spotted! Let's discuss pros and cons, and decide a clear strategy about cancellation. I'm glad AsyncSemaphore will support our final decision.


That's my answer, @ktoso:

  • Actors are not the only types that need exclusivity (APIClient above is a class; with OpenAPI Generator, for example, you get generated structs).
  • The need for exclusivity can span multiple methods (here, createBar and updateBar).
2 Likes

Looking at the various code bases on my machine, I spot a use case for semaphores that would not be covered by your limiter.

To me, this does not remove the value of the limiter, because exclusivity and non-reentrancy are the most needed use cases.

OK, so: semaphores are useful in tests of concurrent systems. Depending on the location of wait() and signal(), we can test multiple orderings of events in parallel scenarios:

  • What if the system scheduler starts A before B?
  • What if it is the opposite?
  • etc.

This does not always make it possible to test all orderings. But testing some of them is better than nothing, and I have spotted several concurrency bugs this way. Semaphores :heart:
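A minimal sketch of the idea using a plain DispatchSemaphore in a synchronous test (an async semaphore plays the same role in async code; the event names are invented for the example): B is forced to record its event only after A has recorded its own, so the test deterministically exercises the "A before B" ordering.

```swift
import Foundation

// Force the "A before B" ordering: B blocks until A signals.
let aDone = DispatchSemaphore(value: 0)
let bDone = DispatchSemaphore(value: 0)
let eventsLock = NSLock()
var events: [String] = []

let threadB = Thread {
    aDone.wait() // B cannot proceed before A has finished
    eventsLock.lock(); events.append("B"); eventsLock.unlock()
    bDone.signal()
}
threadB.start()

eventsLock.lock(); events.append("A"); eventsLock.unlock()
aDone.signal()

bDone.wait() // join: wait for B before asserting
assert(events == ["A", "B"])
```

Swapping which side waits tests the opposite ordering.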

2 Likes

This really should be two non-reentrant actors, one protecting foo and one protecting bar. I really do not think that "giving up and just giving everyone semaphores", rather than explicitly modeling these things with actors, is a good route: it hinders proper adoption of the actor model and sticks to the "old ways" of manually locking around state, which is error-prone and verbose. Both expose the same issues, but one is manual and a bigger foot-gun than modeling it in the language and using actors to protect from races (high- and low-level), as they're supposed to.

Your use case is exactly what non-reentrant actors/methods are for, and I'm concerned that if this is the true use case we're trying to solve here, we're killing a fly with an orbital laser cannon: bringing back manual locking patterns rather than improving our implementation of the actor model.

At the same time, I do think adding "fine-grained" locking control like these types (like John's scoped version, for example) is okay, although they're a last resort for when actors are not good enough (for some reason). Imho, the default programming model and go-to solution in your discussion between Tim/Fanissa should be "ah cool, actors solve that."

2 Likes

It's hard to tell if you're talking from the point of view of an actor evangelist, or from the point of view of language users who ponder their various options and make good, acceptable, or bad decisions, depending on their expertise level and the available documentation. We have to work against bad decisions, but we should not reject acceptable decisions on a whim.

Putting this question aside, I don't know if splitting exclusivity domains across multiple actors is obviously a good idea:

If a given program has one actor (that isolates its inner state), and if a later version of the program needs to split this actor into multiple ones in order to deal with newly introduced needs for exclusivity, then the inner state of the original actor now has to be split across several ones. How to restore a unique isolation domain for accessing this state? I suppose that's the purpose of custom executors (a language feature that has to be praised). Will restoring a unique isolation domain work against the exclusivity requirements that introduced several actors to begin with? That's an api design question that has to be asked.

That's something I'd like to see, so that I can decide which solution I prefer by comparing their ease of use, the time to be spent, their eventual impact on the rest of the program, their fit with the team's expertise level and the possible need for general team training, the foreseeable maintenance cost, etc.

I can look like someone who resists change, but this couldn't be more wrong. There are practical challenges that have to be considered, because we do not code in a vacuum. Swift concurrency needs to convince and show, not only reject. We're back at my point above about documentation.

2 Likes

Love these discussions, and am looking forward to the sequel:

[Tim]: Hi folks, do you remember this nasty bug that we've been chasing for two months? We eventually figured out that some server endpoints are not reentrant. In particular:
...
What are our options?

[Fanissa]: Easy! We can fix it with actors:

1 Like

I'm looking at this from a holistic language-design angle: one of our go-to solutions for concurrency problems is the actor model, but we're lacking expressive power for some situations, like reentrancy, which is one of the issues people frequently hit today. So I was asking whether your examples aren't actually all about that specific problem, for which I believe we should figure out a solution without having people drop down to a manual lock/semaphore management style.

Another potential area to explore I can think of is shared isolation domains between some actors (achieved dynamically by having a few actors share a custom serial executor).
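For the curious, something in this direction is already expressible on recent Swift. A rough sketch (the type names are invented, and it uses the UnownedJob-based SerialExecutor entry point for brevity, which newer toolchains deprecate in favor of the ExecutorJob overload): two actors that return the same serial executor effectively form one isolation domain, because at most one of them runs at any moment.

```swift
import Dispatch

// A serial executor backed by one dispatch queue; every actor that
// returns it as its unownedExecutor is serialized on that queue.
final class SharedExecutor: SerialExecutor {
    private let queue = DispatchQueue(label: "shared.isolation.domain")

    func enqueue(_ job: UnownedJob) {
        queue.async {
            job.runSynchronously(on: self.asUnownedSerialExecutor())
        }
    }

    func asUnownedSerialExecutor() -> UnownedSerialExecutor {
        UnownedSerialExecutor(ordinary: self)
    }
}

let sharedExecutor = SharedExecutor()

actor Foo {
    nonisolated var unownedExecutor: UnownedSerialExecutor {
        sharedExecutor.asUnownedSerialExecutor()
    }
    func ping() -> String { "foo" }
}

actor Bar {
    nonisolated var unownedExecutor: UnownedSerialExecutor {
        sharedExecutor.asUnownedSerialExecutor()
    }
    func ping() -> String { "bar" }
}
```

This is a sketch of the shape, not production code; it does not by itself answer the reentrancy question, it only merges isolation domains.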

Rather than give up the entire actor model and go back to locks and semaphores, and especially litter actor code with locks and semaphores (even if async ones), I am interested in exploring ideas for how to make it possible, and simple to understand, for people to use actors for these use cases. The more ad-hoc lock/semaphore solutions we allow, the harder that becomes.

I'm not against introducing more low-level concurrency control; in fact I'd welcome it, like the synchronous locks being pitched now, which I'm quite excited about. But I do think that as we expand synchronization capabilities of the "low level" kind, we should not miss the chance to keep improving the "high level" kind (actors, tasks, groups). In this thread we jumped to the former very quickly, which had me concerned.

I think you did confirm, though, that the issues you are trying to address are mostly reentrancy issues, right? What if we allowed reentrancy control on actor methods instead?

6 Likes