[Pitch] Synchronous Mutual Exclusion Lock

Right, this is the guard API that I think would go a long way toward resolving the issues lock() and unlock() have. Joe already mentioned how that API fixes the ownership problem, but it is still susceptible to suspension-point failures.

func something() async {
  let locked = mutex.lock()
  locked.value += 1
  await downloadImage() // suspension point while the lock is held
  // locked implicitly unlocked here
}

There's still nothing preventing the user from writing legitimate suspension points between the call to lock() and the explicit unlock() (or the implicit deinitialization, which calls it). I'm tempted to say we should mark that API noasync to fix this potential safety hole, but as has already been stated, that wouldn't fix the with*Continuation style of lock usage. We could allow this API in asynchronous contexts with the big caveat that you must explicitly call consume locked or locked.unlock() before an actual suspension point; otherwise you're subject to whatever the platform's behavior is when you unlock on a different thread (in some cases you may get back to the original thread, and in some cases you won't!).
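To make the guard semantics concrete, here is a hedged sketch of such a type. This is not the proposed API: GuardedMutex, lockGuard(), and the nested Guard class are all illustrative names, and a real implementation would use move-only/nonescaping types rather than a class whose deinit releases the lock.

```swift
import Foundation

// Illustrative sketch only, not the proposed API: a class-based guard
// that releases an NSLock when it is deinitialized, approximating the
// guard semantics discussed above.
final class GuardedMutex<Value> {
    private let lock = NSLock()
    private var storage: Value

    init(_ initial: Value) { storage = initial }

    func lockGuard() -> Guard {
        lock.lock()
        return Guard(mutex: self)
    }

    final class Guard {
        private var mutex: GuardedMutex?

        fileprivate init(mutex: GuardedMutex) { self.mutex = mutex }

        // Accessing `value` after unlock() traps; a real nonescaping
        // guard would make that statically impossible.
        var value: Value {
            get { mutex!.storage }
            set { mutex!.storage = newValue }
        }

        // Explicit unlock; call this before any suspension point.
        func unlock() {
            mutex?.lock.unlock()
            mutex = nil
        }

        deinit {
            // Implicit unlock if the guard goes out of scope still locked.
            mutex?.lock.unlock()
        }
    }
}

let mutex = GuardedMutex(0)
let locked = mutex.lockGuard()
locked.value += 1
locked.unlock() // explicitly release before any await
```

Because unlock() clears the reference, the deinit does not double-unlock when the guard is released explicitly.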

Another issue that the guard API resolves is the example that @tera posted:

In the guard API, mutex.lock() would return a value that lets you mutate the protected state and on which you explicitly call unlock() when you no longer need to be in the critical section. What this resolves in tera's example is that the guard is now implicitly deinitialized in the if someCondition branch, releasing the lock.

I think that there are enough use cases presented here that non-closure-based lock and unlock methods are probably required for this to be a general-purpose lock at this point in Swift’s lifetime/ownership journey. Could we treat this as another opt-in safety escape hatch and just add unsafeLock() and unsafeUnlock() to enable all use cases for those who accept the risks of not managing the lock correctly?

8 Likes

The core functionality of lifetime-dependent types is under active design and development, and should be enough to implement a guard type in the nearish future. It would be nice to avoid having the completely unsafe independent lock and unlock methods if we can hold out until nonescaping types are a reality. While the preconditions for their use are describable, I fear that it'll be difficult for developers to intuit when it's safe to assume that a lock and unlock always happen under the same enclosing borrow, since that isn't a constraint that they're likely to be used to thinking about coming from these sorts of APIs in other languages, and the behavior when you get it wrong is incomprehensible.

8 Likes

I like that. Let me clarify something: will "safe" locks still be able to deadlock?

let mutex1 = Mutex()
let mutex2 = Mutex()
mutex1.withLock {
    mutex2.withLock {
        mutex1.withLock {}
    }
}

I think it's unlikely that the semantics of non-escaping types would ever naturally restrict suspension, though. This just isn't a good match for value-propagation-based restrictions.

Maybe we can attack it from the continuation side and find a way to conditionally delay the overhead of setting up a continuation? Although frankly that overhead is not very high already, at least with UnsafeContinuation.

2 Likes

Yeah, although the current implementation always puts the buffer to receive the resumption value at the top of the task context, that doesn't seem like a hard requirement. It seems implementable to separate the allocation of the "continuation", which would reserve some space to receive the resumption value and provide the resume token, from the actual suspension, even if the task has to suspend for some other reason in between.

2 Likes

Necessarily yes:

let mutex1 = Mutex()
let mutex2 = Mutex()
let thread1 = DispatchQueue(label: "thread1", attributes: .concurrent)
let thread2 = DispatchQueue(label: "thread2", attributes: .concurrent)

thread1.async {
  mutex1.withLock {
    Thread.sleep(forTimeInterval: 1)
    mutex2.withLock {
      Thread.sleep(forTimeInterval: 1)
    }
  }
}
thread2.async {
  mutex2.withLock {
    Thread.sleep(forTimeInterval: 1)
    mutex1.withLock {
      Thread.sleep(forTimeInterval: 1)
    }
  }
}

The above creates a race condition that could lead to a deadlock. The only way to prevent it is either a complex global structure that tracks lock orders (which itself needs to be protected by a lock) or solving the halting problem.

This is probably why mutexes should be treated as low-level, and why we should probably take the approach of, to paraphrase van Rossum, treating programmers who use mutexes as "consenting adults".

Those who want to stay truly safe should, instead, use higher level constructs that use mutexes under the hood.

1 Like

We could also put some work into statically optimizing patterns where the continuation gets synchronously resumed and just sink the creation of the continuation so it never happens on that path.

2 Likes

Almost. However, we also need some state derived from the protected state (yieldID in NIO's AsyncSequence) in the second closure.
If we don't create the continuation, we need to return another value derived from the protected state. Therefore we would need to return some kind of Either type instead of just a Bool.

In addition, we also need to call out to a delegate after we have unlocked the protected state; otherwise we risk running into recursion while holding the lock, since the delegate calls into user-defined code. This makes the proposed function unusable for our use case. A third closure parameter, executed after the unlock, could work, but that seems like a rather complicated API, and quite tailored to NIO's use case.
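The Either-shaped return described above could look something like this. All names here are stand-ins (LockAction, the Int yield ID, nextAction); none of this is real NIO API, just a sketch of the shape the closure result would need.

```swift
// Hedged sketch: rather than a Bool ("suspend or not"), the closure
// result distinguishes "set up a continuation" from "resume
// immediately", each carrying state derived while the lock was held.
enum LockAction<YieldID, Output> {
    case suspend(YieldID)   // caller should create a continuation
    case resume(Output)     // no suspension needed; value is ready
}

// Hypothetical decision made inside the critical section.
func nextAction(buffered: Int?) -> LockAction<Int, Int> {
    if let element = buffered {
        return .resume(element)   // a value was already buffered
    } else {
        return .suspend(1)        // illustrative yield ID
    }
}
```

The closure-based API would then switch over this result after unlocking, creating a continuation only in the .suspend case.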

1 Like

There's a cheap cheat version: provide an API that locks/unlocks several locks at once. Once you've locked a set, to lock anything more you'd first have to unlock the current set of locked locks and then lock the new list.

lock(a, b)    // ✅
lock(c)       // ❌ runtime error
unlockAll()   // ✅
lock(a, b, c) // ✅

This would never deadlock (provided the "lock" implementation is locking mutexes following some ordering rule e.g. from low to high).
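The ordering rule can be sketched in a few lines. This is a minimal illustration, not a proposed API: lockAll/unlockAll are hypothetical names, the runtime error for locking while already holding a set would need extra bookkeeping (omitted here), and NSLock stands in for whatever lock type would actually be used.

```swift
import Foundation

// Hedged sketch of the ordering rule mentioned above: acquiring a
// batch of locks in one globally consistent order (here, by
// ObjectIdentifier) means two threads locking overlapping sets can
// never deadlock, regardless of the order the caller lists them in.
func lockAll(_ locks: [NSLock]) {
    let ordered = locks.sorted { ObjectIdentifier($0) < ObjectIdentifier($1) }
    for lock in ordered { lock.lock() }
}

func unlockAll(_ locks: [NSLock]) {
    for lock in locks { lock.unlock() }
}
```

Sorting by ObjectIdentifier is one arbitrary but consistent total order; any stable global ordering of the lock instances works equally well.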

One slightly clunky option: make unlock unavailable from async contexts but keep lock available. That would allow for continuation use and provide an escape hatch while still catching the most obvious usage issues.

This doesn't fix the case where you only want to do with*Continuation in one branch but not the other:

func something() async {
  mutex.lock() // ok in async
  
  if someCondition {
    await withCheckedContinuation {
      mutex.unlock() // ok because the closure is synchronous
    }
  } else {
    mutex.unlock() // error: can't use unlock in async
  }
}

You've now acquired the lock with no way to unlock it.

3 Likes

For what it's worth, they already seem to be separated; the continuation object and result buffer are both just arbitrary memory which may or may not be on the task stack, and the compiler just initializes the continuation with a pointer to the buffer.

The more I look into this, the more I think just unconditionally creating the continuation is the right solution to this problem. NIO folks, have you benchmarked to see if creating the continuation is actually even a significant overhead? There's certainly optimization work we can do to eliminate that overhead, but I'm curious to know if you're just assuming that it's more expensive than it is.

3 Likes

IIRC it made a measurable difference with checked continuations. We eventually switched to unchecked continuation and I don’t think we have measured always creating a continuation after the switch. @FranzBusch should know more.

With that being said, I think there will be a measurable difference. AFAIK resuming a continuation will just enqueue a job on the executor. Is there, or could there be, a special code path that resumes the continuation synchronously without going through the executor?

1 Like

No, if you resume a continuation synchronously, it will continue without suspending.
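A small demonstration of that behavior, assuming only the standard withCheckedContinuation API (immediateValue is an illustrative name): resuming the continuation inside the closure means the awaiting task continues without ever suspending.

```swift
// A continuation resumed synchronously, before the with*Continuation
// closure returns; the task continues without suspending.
func immediateValue() async -> Int {
    await withCheckedContinuation { continuation in
        continuation.resume(returning: 42) // resumed synchronously
    }
}
```

Callers simply write `let v = await immediateValue()`; the await completes on the spot when the resume happens synchronously.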

3 Likes

I've gone ahead and updated the proposal to change the name from tryWithLock to withLockIfAvailable as I think this name reads better with the closure based API. When we get the guard APIs, I think tryLock would be the better name there, but that's a discussion for another day.

5 Likes