`TaskGroup` and `sending` closures

Hello,

recently, I've been experimenting with some uses cases with TaskGroups. One problem that bites me quite often, is the following:

func send(
  operation: sending @escaping () async -> Void
) async {
  await withTaskGroup(of: Void.self) { taskGroup in
    taskGroup.addTask { // error: passing closure as a 'sending' parameter risks causing data races between code in the current task and concurrent execution of the closure
      await operation()
    }
  }
}

I've mentioned this is in past threads and also reported this as bug in Github already. However, this has been in several Swift versions (6.0 -> today's nightly), so I am not so sure anymore if this is an actual bug or something to be expected/not working (yet).

From my (maybe too limited) understanding of this topic, is, that operation is being sent once to the isolation domain of the withTaskGroups body and another time into the isolation domain of addTask. In reality, this should actually be fine, no? I am not using operation in any other context other than in addTask, so it should not be required to be @Sendable.

The error message changed comparing recent releases of Swift and nightly builds, however I am still not sure, if I fully understand it and can relate it to my code. Would someone mind help me understand the problem here?

Thank you.

1 Like

speculating a bit, but i think this may be 'working as expected'. in the original RBI evolution doc i found this callout, which perhaps explains this behavior:

Within the body of a non-Sendable closure, the closure and its non-Sendable captures are treated as being Task isolated since just like a parameter, both the closure and the captures may have uses in their caller:

so i think the implication in this circumstance is that when operation is captured by the withTaskGroup closure, it is, from the vantage of RBI, 'task-isolated' and not in a disconnected region. this means when attempting to further pass it through to addTask, it will be unable to be sent (hence the compiler error presumably).

you can seemingly work around this with a bit of indirection, e.g. the following appears to compile:

func send(
  operation: sending @escaping () async -> Void
) async {
  let box = { operation }
  await withTaskGroup(of: Void.self) { taskGroup in
    let operation = box()
    taskGroup.addTask {
      await operation()
    }
  }
}

i'm not entirely sure what the RBI rules that get applied in this case are, but my guess is that box would be treated as disconnected per this comment:

A non-Sendable closure's region is the merge of its non-Sendable captured parameters. As such a nonisolated non-Sendable closure that only captures values that are in disconnected regions must itself be in a disconnected region and can be transferred

so it can be sent into the withTaskGroup body. then when operation is pulled back out within the inner closure, it's treated as still being in a disconnected region maybe? that step is where i'm least certain i understand which rules are being applied (or if the behavior is correct...).

anyway, hopefully that helps illuminate things somewhat. would be great to hear from the experts though cc @hborla @Michael_Gottesman.


edit: as @rayx points out below, the boxing/unboxing 'trick' may be a bug, as its use can also lead to subverting data race detection in some cases, so 'caveat programmer'.

7 Likes

That's too bad :confused:.

Interesting workaround, thank you. I have tried to find something myself, but failed doing so.

Wondering if it'd be helpful if we'd had something like -emit-regions similar to -emit-sil but emits the regions in a similar textual fashion as described in the proposal. I think that could make RBI more transparent for more advanced users.

2 Likes

this kind of exists already. in compilers built with assertions enabled, there is a flag that will emit data about RBI analysis[1]. you can enable it by passing -Xllvm -sil-regionbasedisolation-log=<on|verbose> to the compiler invocation.

in your example, right around the error emission, the logs look like (godbolt):

===> PROCESSING: $s6output4send9operationyyyYacn_tYaFyScGyytGzYaXEfU_
Emitting diagnostics for function $s6output4send9operationyyyYacn_tYaFyScGyytGzYaXEfU_
Walking blocks for diagnostics.
|--> Block bb0
Entry Partition: [(1 2)]
Applying: require %%2:   %13 = partial_apply [callee_guaranteed] [isolated_any] %10(%12, %11) : $@convention(thin) @async @substituted <τ_0_0> (@guaranteed Optional<any Actor>, @guaranteed { var @async @callee_guaranteed () -> () }) -> @out τ_0_0 for <()> // user: %16
    Before: [(1 2)]
    After:  [(1 2)]
Applying: assign %%7 = %%2:   %13 = partial_apply [callee_guaranteed] [isolated_any] %10(%12, %11) : $@convention(thin) @async @substituted <τ_0_0> (@guaranteed Optional<any Actor>, @guaranteed { var @async @callee_guaranteed () -> () }) -> @out τ_0_0 for <()> // user: %16
    Before: [(1 2)]
    After:  [(1 2 7)]
Applying: require %%7:   %16 = apply %15<()>(%8, %13, %14) : $@convention(method) <τ_0_0 where τ_0_0 : Sendable> (@in_guaranteed Optional<TaskPriority>, @sil_sending @owned @isolated(any) @async @callee_guaranteed @substituted <τ_0_0> () -> @out τ_0_0 for <τ_0_0>, @inout TaskGroup<τ_0_0>) -> ()
    Before: [(1 2 7)]
    After:  [(1 2 7)]
Applying: send %%7:   %16 = apply %15<()>(%8, %13, %14) : $@convention(method) <τ_0_0 where τ_0_0 : Sendable> (@in_guaranteed Optional<TaskPriority>, @sil_sending @owned @isolated(any) @async @callee_guaranteed @substituted <τ_0_0> () -> @out τ_0_0 for <τ_0_0>, @inout TaskGroup<τ_0_0>) -> ()
    Before: [(1 2 7)]
    Emitting Error. Kind: Sent Non Sendable
        ID:  %%7
        Rep:   %13 = partial_apply [callee_guaranteed] [isolated_any] %10(%12, %11) : $@convention(thin) @async @substituted <τ_0_0> (@guaranteed Optional<any Actor>, @guaranteed { var @async @callee_guaranteed () -> () }) -> @out τ_0_0 for <()> // user: %16
        Dynamic Isolation Region: task-isolated
        Isolated Value: %2 = argument of bb0 : ${ var @async @callee_guaranteed () -> () } // users: %11, %4
        Isolated Value Name: operation
    After:  [(1 2 7)]
Applying: require %%1:   %16 = apply %15<()>(%8, %13, %14) : $@convention(method) <τ_0_0 where τ_0_0 : Sendable> (@in_guaranteed Optional<TaskPriority>, @sil_sending @owned @isolated(any) @async @callee_guaranteed @substituted <τ_0_0> () -> @out τ_0_0 for <τ_0_0>, @inout TaskGroup<τ_0_0>) -> ()
    Before: [(1 2 7)]
    After:  [(1 2 7)]
Exit Partition: [(1 2 7)]

as you can see, it logs the region partition info using similar notation to that used in the RBI evolution doc. since this pass operates on SIL, there's a bit of work to map it back to the actual source code involved, but it's certainly helpful for getting a better sense of what's going on.


  1. as covered here ↩︎

7 Likes

No way, I‘ve never seen this before. Thanks for the pointer.

1 Like

@jamieQ's workaround looks like a bug (or limitation?) in compiler, because I can use this approach to successfully compile code having data race.

Example 1: the clsoure passed to send() runs concurrently in taskGroup.addTask() and withTaskGroup().

func send(
  operation: sending @escaping () async -> Void
) async {
  let box = { operation }
  await withTaskGroup(of: Void.self) { taskGroup in
    let operation = box()
    taskGroup.addTask {
      await operation()
    }
    await box()() // closure runs concurrently but it compiles
  }
}

Example 2: This is based on an example in "Task Isolated Regions" section in SE-0414. The transferToMainActor() call should fail, but it compiles in the code below.

class NonSendable {}

@MainActor func transferToMainActor(_ x: NonSendable) async { }

func nonIsolatedCallee(_ x: NonSendable) async { }

func nonIsolatedCaller(_ x: NonSendable) async {
  await nonIsolatedCallee(x) // this is ok (as expected)

  let box = { return x } 
  await transferToMainActor(box()) // this should fail but it compiles

  await nonIsolatedCallee(x) // x is accessed concurrently in both MainActor and global executor.
}

EDIT: I filed #79262.

2 Likes

I just re-read the sending proposal and in the Future directions a Disconnected type is mentioned. From what I can understand, if it would support closures, it should keep the sending closure in disconnected region instead of task isolated region until I pass it down to addTask.

However, the last sentence:

and a value of type Disconnected can never be merged into another isolation region.

... makes me think that it might actually not work as I expect it to :man_shrugging:.

In the end it just makes me sad to see, that with the current TaskGroup-design it is just impossible to not use @Sendable.

@rayx Thanks for the heads up and the report.

This is one of the unfortunate downsides of changing Sendable into sending in APIs. Unlike Sendable, which is a pretty strong trait that could be easily composed, sending makes an object prone to be merged into existing regions (thus prohibiting it from being sent further).

The way I often use when I want to pass a sending closure across multiple layers into a final call is something like this:

func send(
  operation: sending @escaping () async -> Void
) async {
  // hack to view it as a Sendable closure
  nonisolated(unsafe) let operation = operation
  let op = { @Sendable in operation() } 

  // from now on, we use `op`, we know it cannot actually introduce races here
  await withTaskGroup(of: Void.self) { taskGroup in
    taskGroup.addTask {
      await op()
    }
  }
}

Of course, such code shifts the responsibility of guaranteeing data-race safety from the compiler to the developers. However, in this case I think it is acceptable, because we're explicitly using nonisolated(unsafe), which is eye-catching. When Swift someday provides a more intuitive approach, we can then perform a batched migration.

It could be generalized further, if this is what you want:

func withTemporarylSendability<T>(
  _ operation: sending @escaping () async -> Void, 
  block: (@Sendable @escaping () async -> Void) async -> T
) async -> T{
  nonisolated(unsafe) let operation = operation
  let op = { @Sendable in     
    await operation()
  }
  return await block(op)
}

func send2(
  operation: sending @escaping () async -> Void
) async {
   await withTemporarylSendability(operation) { op in
    await withTaskGroup(of: Void.self) { taskGroup in  
      taskGroup.addTask {
        await op()
      }
    }
  }
}

I personally refrain myself from using this though.

3 Likes

You are correct that this is working as expected.

The issue is that from the perspective of rbi, we do not know how many times the parent closure is actually invoked. If we had the ability to say that the closure was a used once closure and if we could have a way to say in a capture list that we are sending the value in, then I think we could get away with this. I agree that being able to escape that parameter in that manner is a bug. I am preparing a fix for that.

3 Likes

This is actually the logging I created to enable quick diagnosis/triaging of rbi bugs. In the past bit on main it was exposed also in non-asserts builds to make it easier to diagnose bugs even without an asserts compiler to speed up triaging in a situation where I am presenting just with a no-asserts compiler.

well, thanks for adding it! it has been tremendously helpful as an 'intuition building' tool with RBI. also, i personally greatly enjoy the text-based graphical rendering format that it uses.

also, as a slight tangent – just wanted to say that RBI is a really neat part of the compiler and concurrency model. you probably see lots of feedback about bugs/limitations/etc, but it's a really cool piece of technology, and when it works it's great and makes the concurrency model a lot nicer to use.

2 Likes

I wonder if we could have an escape hatch in the meantime by leaving the guarantee up to the developer and performing a check in runtime, similarly to how Optional.take() is used to capture a non-copyable value in a closure?

Is it because RBI analysis is only performed locally (meaning: within a single function at a time)? I don't find this was mentioned in SE-0414, but I think it can explain many behaviors/limitations of RBI.

An example:

// This doesn't compile
func send() async {
  let c: () async -> Void = { }
  await withTaskGroup(of: Void.self) { taskGroup in
    taskGroup.addTask {
      await c()
    }
  }
}

// But this compiles
func send() async {
  await withTaskGroup(of: Void.self) { taskGroup in
    let c: () async -> Void = { }
    taskGroup.addTask {
      await c()
    }
  }
}

IIUC it's this implementation limitation that makes it impossible to handle non-sendable value in Task isolation region in a more user-friendly way.