RBI: missing data race diagnostics in actor-context-inheriting closure

there is another outstanding issue with RBI diagnostics that has been lingering in the back of my mind for a while. a prior thread on it can be found here, and the corresponding bug report here. the reproduction takes the following form:

class NS {
  var data: Int = 0
}

nonisolated func mutate(_ ns: NS) async {
  ns.data = 1
}

@MainActor
func isolation_crossing_race() async {
  let ns = NS()

  Task { @MainActor in
    // `ns` merged into MainActor's region here but isolation-
    // crossing call to `mutate()` is not diagnosed
    await mutate(ns)
  }

  // suspend the MainActor to allow the Task to run
  await Task.yield()

  // data race risk here since we read from the MainActor
  // but write from a nonisolated context
  _ = ns.data
}

this construction allows the non-Sendable value ns to be used in both an actor-isolated and non-isolated context simultaneously, which allows for a data race.

i initially thought the issue could be due to the changes to support GlobalActorIsolatedTypesUsability in the case that a non-Sendable type was sent into and then only used from a global actor's region[1]. upon further reflection however, i think the issue has to do with how actor-isolated closures deal with their parameters in the RBI analysis.

the issue in our example is that the closure passed to the Task is isolated to the main actor, and has a non-Sendable capture. per my current understanding, RBI operates on SIL functions and does not perform interprocedural analysis, so presumably the analysis must work only with information available in the closure's SILFunction representation.

if we enable the verbose RBI logging output[2] and find the SIL representation of the closure in this case, it looks something like this:

Dump:
// closure #1 in isolation_crossing_race()
// Isolation: global_actor. type: MainActor
sil private [ossa] @$s6output23isolation_crossing_raceyyYaFyyYacfU_ : $@convention(thin) @Sendable @async @substituted <΄_0_0> (@guaranteed Optional<any Actor>, @guaranteed NS) -> @out ΄_0_0 for <()> {
// %0 "$return_value"
// %2 "ns"                                        // users: %9, %3
bb0(%0 : $*(), %1 : @guaranteed $Optional<any Actor>, %2 : @closureCapture @guaranteed $NS):
  debug_value %2, let, name "ns", argno 1         // id: %3
  %4 = metatype $@thick MainActor.Type            // user: %6
  // function_ref static MainActor.shared.getter
  %5 = function_ref @$sScM6sharedScMvgZ : $@convention(method) (@thick MainActor.Type) -> @owned MainActor // user: %6
  %6 = apply %5(%4) : $@convention(method) (@thick MainActor.Type) -> @owned MainActor // users: %7, %10, %11
  hop_to_executor %6                              // id: %7
  // function_ref mutate(_:)
  %8 = function_ref @$s6output6mutateyyAA2NSCYaF : $@convention(thin) @async (@guaranteed NS) -> () // user: %9
  %9 = apply %8(%2) : $@convention(thin) @async (@guaranteed NS) -> ()
  hop_to_executor %6                              // id: %10
  destroy_value %6                                // id: %11
  %12 = tuple ()                                  // user: %13
  return %12                                      // id: %13
} // end sil function '$s6output23isolation_crossing_raceyyYaFyyYacfU_'

the logging output for the closure also includes the following:

╾──────────────────────────────â•ŧ
Performing pre-dataflow scan to gather flow insensitive information $s6output23isolation_crossing_raceyyYaFyyYacfU_:
╾──────────────────────────────â•ŧ
Initializing Function Args:
    Sendable: %0 = argument of bb0 : $*()
    Sendable: %1 = argument of bb0 : $Optional<any Actor>
    %%2 (sending): %2 = argument of bb0 : $NS                        // users: %9, %3

we can see that the closure's 'Initial Function Args' are determined to be the return value (Void in this case), an optional actor, and the closure-captured NS value.

what seems surprising to me here is that in the earlier log output (right after it says 'Dump:'), it appeared that the closure was known to be main-actor-isolated, but in the function argument initialization, it records the NS capture parameter as '(sending)'.

additionally, if we then consider the logging produced before performing the RBI dataflow, we can see that the TrackableValueState for the NS parameter has been inferred to be in a 'disconnected' region ([region_value_kind: disconnected]):

╾──────────────────────────────â•ŧ
Performing Dataflow!
╾──────────────────────────────â•ŧ
Values!
%%0: TrackableValue. State: TrackableValueState[id: 0][is_no_alias: no][is_sendable: yes][region_value_kind: disconnected].
    Rep Value: %0 = argument of bb0 : $*()
%%1: TrackableValue. State: TrackableValueState[id: 1][is_no_alias: no][is_sendable: yes][region_value_kind: disconnected].
    Rep Value: %1 = argument of bb0 : $Optional<any Actor>
%%2: TrackableValue. State: TrackableValueState[id: 2][is_no_alias: no][is_sendable: no][region_value_kind: disconnected].
    Rep Value: %2 = argument of bb0 : $NS                        // users: %9, %3

the fact that the analysis considers the NS parameter to be initially in a disconnected region is a problem as that allows it to be sent to the nonisolated async function without producing a diagnostic.

my interpretation of the RBI rules suggest that in this case, the non-Sendable closure parameter should initially be considered to be part of the MainActor's region, and not in a disconnected region.

from debugging the current behavior, it looks like there is this logic that gets hit which causes the capture to return true from the canFunctionArgumentBeSent() function. however, simply forcing that function to return false when the function is actor-isolated still doesn't seem to be sufficient to produce the expected diagnostic. doing so means that the parameter is no longer considered to be 'sending', but it is still not merged into the MainActor's region initially. it ends up being added to nonSendableJoinedIndices in this logic, but it is the only parameter added to that list, so the initial partition is still considered to be 'disconnected'.

i've also tested replacing the Task initializer with a function that looks like this:

func run(
  @_inheritActorContext
  _ operation: sending @escaping @isolated(any) () async -> Void
) {}

and it exhibits the same (compile-time) behavior. interestingly, removing the @_inheritActorContext parameter attribute does result in an error being produced.

i'll also note that i did test the (compile-time) behavior with the upcoming NonisolatedNonsendingByDefault feature enabled and the nonisolated function replaced with a @concurrent one, and the behavior appeared to be the same.

@Michael_Gottesman or @hborla do you have any ideas what an appropriate solution to this might look like? is there a reason we can't 'just' merge non-Sendable parameters into an actor's region in the case that a function/closure like this is actor-isolated?

if interested, the example & debug logging output can be seen here[3].


  1. relevant change mostly implemented here i think â†Šī¸Ž

  2. as described here â†Šī¸Ž

  3. at time of writing this was using a nightly build of Swift 6.2-dev â†Šī¸Ž

1 Like