the following is a compilation of notes and thoughts from researching code that should produce RBI diagnostics, but is not currently doing so. it is somewhat long and meandering, but can hopefully serve as documentation for any future efforts to investigate similar problems. i don't claim to have a great mental model of everything going on here, so do feel free to correct any mistakes i've made!
Context
i've been looking into a class of issues with region based isolation (RBI) diagnostics that seem to admit code with data races. here's an example that is fairly representative of the problem:
@MainActor // *
func mutableLocalCaptureDataRace() {
var x = 0
Task.detached { x = 1 }
x = 2
}
// * note: the @MainActor annotation may not actually be necessary to
// reproduce this problem, but it was present during my investigation
this currently compiles without a concurrency error or warning in the 6.0 (and 6.1) compilers, under both 5 & 6 language modes.
using the RBI logging[1], we can inspect the output of the analysis to get a better sense of what's going on. here are the 'interesting' bits of the output for the mutableLocalCaptureDataRace
function (note: this excludes the Task closure as it is analyzed independently).
per my current understanding of the relevant code, the initial step of the analysis processes the SILGen instructions to determine the 'trackable' values required for the dataflow phase. additionally, each instruction is translated from SIL into zero or more 'partition operations', depending on the specific instruction and translation semantics. the 'trackable state' collected during this phase also records various pieces of information about elements, like whether it represents a non-sendable value, the isolation region it's in, etc.
here's a portion of the pre-dataflow phase log output where the partial_apply
instruction for the Task closure is handled:
// SIL:
// function_ref closure #1 in mutableLocalCaptureDataRace()
%11 = function_ref @$s6output27mutableLocalCaptureDataRaceyyFyyYacfU_ : $@convention(thin) @async @substituted <ฯ_0_0> (@guaranteed Optional<any Actor>, @guaranteed { var Int }) -> @out ฯ_0_0 for <()> // user: %14
%12 = copy_value %1 : ${ var Int } // user: %14
%13 = enum $Optional<any Actor>, #Optional.none!enumelt // user: %14
%14 = partial_apply [callee_guaranteed] [isolated_any] %11(%13, %12) : $@convention(thin) @async @substituted <ฯ_0_0> (@guaranteed Optional<any Actor>, @guaranteed { var Int }) -> @out ฯ_0_0 for <()> // user: %16
// RBI:
Visiting: // function_ref closure #1 in mutableLocalCaptureDataRace()
%11 = function_ref @$s6output27mutableLocalCaptureDataRaceyyFyyYacfU_ : $@convention(thin) @async @substituted <ฯ_0_0> (@guaranteed Optional<any Actor>, @guaranteed { var Int }) -> @out ฯ_0_0 for <()> // user: %14
Semantics: assign_fresh
Visiting: %12 = copy_value %1 : ${ var Int } // user: %14
Semantics: look_through
Visiting: %13 = enum $Optional<any Actor>, #Optional.none!enumelt // user: %14
Semantics: assign_fresh
Visiting: %14 = partial_apply [callee_guaranteed] [isolated_any] %11(%13, %12) : $@convention(thin) @async @substituted <ฯ_0_0> (@guaranteed Optional<any Actor>, @guaranteed { var Int }) -> @out ฯ_0_0 for <()> // user: %16
Semantics: special
โโโฌโโผ %14 = partial_apply [callee_guaranteed] [isolated_any] %11(%13, %12) : $@convention(thin) @async @substituted <ฯ_0_0> (@guaranteed Optional<any Actor>, @guaranteed { var Int }) -> @out ฯ_0_0 for <()> // user: %16
โ โโโผ line:5:17
โโโโโโโผ require %%1: %14 = partial_apply [callee_guaranteed] [isolated_any] %11(%13, %12) : $@convention(thin) @async @substituted <ฯ_0_0> (@guaranteed Optional<any Actor>, @guaranteed { var Int }) -> @out ฯ_0_0 for <()> // user: %16
โ โโผ assign %%10 = %%1: %14 = partial_apply [callee_guaranteed] [isolated_any] %11(%13, %12) : $@convention(thin) @async @substituted <ฯ_0_0> (@guaranteed Optional<any Actor>, @guaranteed { var Int }) -> @out ฯ_0_0 for <()> // user: %16
โโโโโโโผ Used Values
โโผ State: %%1. TrackableValueState[id: 1][is_no_alias: no][is_sendable: no][region_value_kind: disconnected].
Rep Value: %1 = begin_borrow [var_decl] %0 : ${ var Int } // users: %27, %12, %2
Type: ${ var Int }
โโผ State: %%10. TrackableValueState[id: 10][is_no_alias: no][is_sendable: no][region_value_kind: disconnected].
Rep Value: %14 = partial_apply [callee_guaranteed] [isolated_any] %11(%13, %12) : $@convention(thin) @async @substituted <ฯ_0_0> (@guaranteed Optional<any Actor>, @guaranteed { var Int }) -> @out ฯ_0_0 for <()> // user: %16
Type: $@isolated(any) @async @callee_guaranteed @substituted <ฯ_0_0> () -> @out ฯ_0_0 for <()>
the logs indicate that handling the isolated partial_apply
has 'special' translation semantics, and in this case the instruction is translated into both the 'require' and 'assign' partition operations. the 'require' is for the tracked value with id %%1
, which in this instance is the mutable local variable x
. IIUC, a 'require' here means that during the dataflow, the tracked value's region will be required to not have been marked as 'sent' when this partition operation is encountered. if that's not the case, a diagnostic should (generally) be produced.
then we have the output from the call to Task.detached
which looks like:
//SIL:
// function_ref static Task<>.detached(priority:operation:)
%15 = function_ref @$sScTss5NeverORs_rlE8detached8priority9operationScTyxABGScPSg_xyYaYAcntFZ : $@convention(method) <ฯ_0_0, ฯ_0_1 where ฯ_0_0 : Sendable, ฯ_0_1 == Never> (@in_guaranteed Optional<TaskPriority>, @sil_sending @owned @isolated(any) @async @callee_guaranteed @substituted <ฯ_0_0> () -> @out ฯ_0_0 for <ฯ_0_0>, @thin Task<ฯ_0_0, Never>.Type) -> @owned Task<ฯ_0_0, Never> // user: %16
%16 = apply %15<(), Never>(%9, %14, %8) : $@convention(method) <ฯ_0_0, ฯ_0_1 where ฯ_0_0 : Sendable, ฯ_0_1 == Never> (@in_guaranteed Optional<TaskPriority>, @sil_sending @owned @isolated(any) @async @callee_guaranteed @substituted <ฯ_0_0> () -> @out ฯ_0_0 for <ฯ_0_0>, @thin Task<ฯ_0_0, Never>.Type) -> @owned Task<ฯ_0_0, Never> // user: %19
// RBI:
Visiting: // function_ref static Task<>.detached(priority:operation:)
%15 = function_ref @$sScTss5NeverORs_rlE8detached8priority9operationScTyxABGScPSg_xyYaYAcntFZ : $@convention(method) <ฯ_0_0, ฯ_0_1 where ฯ_0_0 : Sendable, ฯ_0_1 == Never> (@in_guaranteed Optional<TaskPriority>, @sil_sending @owned @isolated(any) @async @callee_guaranteed @substituted <ฯ_0_0> () -> @out ฯ_0_0 for <ฯ_0_0>, @thin Task<ฯ_0_0, Never>.Type) -> @owned Task<ฯ_0_0, Never> // user: %16
Semantics: assign_fresh
Visiting: %16 = apply %15<(), Never>(%9, %14, %8) : $@convention(method) <ฯ_0_0, ฯ_0_1 where ฯ_0_0 : Sendable, ฯ_0_1 == Never> (@in_guaranteed Optional<TaskPriority>, @sil_sending @owned @isolated(any) @async @callee_guaranteed @substituted <ฯ_0_0> () -> @out ฯ_0_0 for <ฯ_0_0>, @thin Task<ฯ_0_0, Never>.Type) -> @owned Task<ฯ_0_0, Never> // user: %19
Semantics: apply
โโโฌโโผ %16 = apply %15<(), Never>(%9, %14, %8) : $@convention(method) <ฯ_0_0, ฯ_0_1 where ฯ_0_0 : Sendable, ฯ_0_1 == Never> (@in_guaranteed Optional<TaskPriority>, @sil_sending @owned @isolated(any) @async @callee_guaranteed @substituted <ฯ_0_0> () -> @out ฯ_0_0 for <ฯ_0_0>, @thin Task<ฯ_0_0, Never>.Type) -> @owned Task<ฯ_0_0, Never> // user: %19
โ โโโผ line:5:8
โโโโโโโผ require %%10: %16 = apply %15<(), Never>(%9, %14, %8) : $@convention(method) <ฯ_0_0, ฯ_0_1 where ฯ_0_0 : Sendable, ฯ_0_1 == Never> (@in_guaranteed Optional<TaskPriority>, @sil_sending @owned @isolated(any) @async @callee_guaranteed @substituted <ฯ_0_0> () -> @out ฯ_0_0 for <ฯ_0_0>, @thin Task<ฯ_0_0, Never>.Type) -> @owned Task<ฯ_0_0, Never> // user: %19
โ โโผ send %%10: %16 = apply %15<(), Never>(%9, %14, %8) : $@convention(method) <ฯ_0_0, ฯ_0_1 where ฯ_0_0 : Sendable, ฯ_0_1 == Never> (@in_guaranteed Optional<TaskPriority>, @sil_sending @owned @isolated(any) @async @callee_guaranteed @substituted <ฯ_0_0> () -> @out ฯ_0_0 for <ฯ_0_0>, @thin Task<ฯ_0_0, Never>.Type) -> @owned Task<ฯ_0_0, Never> // user: %19
โโโโโโโผ Used Values
โโผ State: %%10. TrackableValueState[id: 10][is_no_alias: no][is_sendable: no][region_value_kind: disconnected].
Rep Value: %14 = partial_apply [callee_guaranteed] [isolated_any] %11(%13, %12) : $@convention(thin) @async @substituted <ฯ_0_0> (@guaranteed Optional<any Actor>, @guaranteed { var Int }) -> @out ฯ_0_0 for <()> // user: %16
Type: $@isolated(any) @async @callee_guaranteed @substituted <ฯ_0_0> () -> @out ฯ_0_0 for <()>
we can see that in the translation of the apply
instruction, both a 'require' and 'send' partition operation are created. the 'send' will mark all trackable state values in the same region as %%10
to become marked as 'sent', and so subsequent 'require' operations on any of them should typically produce diagnostics.
now if we focus on the logging around the subsequent assignment x = 2
, we have from the pre-dataflow logs:
// SIL:
%24 = begin_access [modify] [dynamic] %2 : $*Int // users: %25, %26
store %23 to [trivial] %24 : $*Int // id: %25
end_access %24 : $*Int // id: %26
// RBI:
Visiting: %24 = begin_access [modify] [dynamic] %2 : $*Int // users: %25, %26
Semantics: look_through
Visiting: store %23 to [trivial] %24 : $*Int // id: %25
Semantics: store
Visiting: end_access %24 : $*Int // id: %26
Semantics: ignored
the logging indicates that the SIL store
instruction has 'store' semantics, and if we look at the logic for this translation, it appears that a store to a non-sendable destination should be translated into either an assign or merge partition operation, both of which should eventually add some additional 'require' partition operations if trackable state is involved. i believe this means that in the dataflow portion of the analysis, we would expect to see a 'require' operation processed after the earlier 'send' produced for the Task.detached
instructions.
however, if we inspect the output of the dataflow logs, they appear to end with the 'send' operation associated with the Task.detached
call:
// Relevant trackable state:
%%0: TrackableValue. State: TrackableValueState[id: 0][is_no_alias: no][is_sendable: no][region_value_kind: disconnected].
Rep Value: %0 = alloc_box ${ var Int }, var, name "x" // users: %28, %1
%%1: TrackableValue. State: TrackableValueState[id: 1][is_no_alias: no][is_sendable: no][region_value_kind: disconnected].
Rep Value: %1 = begin_borrow [var_decl] %0 : ${ var Int } // users: %27, %12, %2
%%10: TrackableValue. State: TrackableValueState[id: 10][is_no_alias: no][is_sendable: no][region_value_kind: disconnected].
Rep Value: %14 = partial_apply [callee_guaranteed] [isolated_any] %11(%13, %12) : $@convention(thin) @async @substituted <ฯ_0_0> (@guaranteed Optional<any Actor>, @guaranteed { var Int }) -> @out ฯ_0_0 for <()> // user: %16
// Dataflow:
Block: bb0
Visiting Preds!
Applying: assign_fresh %%0: %0 = alloc_box ${ var Int }, var, name "x" // users: %28, %1
Before: []
After: [(0)]
Applying: require %%0: %1 = begin_borrow [var_decl] %0 : ${ var Int } // users: %27, %12, %2
Before: [(0)]
After: [(0)]
Applying: assign %%1 = %%0: %1 = begin_borrow [var_decl] %0 : ${ var Int } // users: %27, %12, %2
Before: [(0)]
After: [(0 1)]
Applying: require %%1: %14 = partial_apply [callee_guaranteed] [isolated_any] %11(%13, %12) : $@convention(thin) @async @substituted <ฯ_0_0> (@guaranteed Optional<any Actor>, @guaranteed { var Int }) -> @out ฯ_0_0 for <()> // user: %16
Before: [(0 1)]
After: [(0 1)]
Applying: assign %%10 = %%1: %14 = partial_apply [callee_guaranteed] [isolated_any] %11(%13, %12) : $@convention(thin) @async @substituted <ฯ_0_0> (@guaranteed Optional<any Actor>, @guaranteed { var Int }) -> @out ฯ_0_0 for <()> // user: %16
Before: [(0 1)]
After: [(0 1 10)]
Applying: require %%10: %16 = apply %15<(), Never>(%9, %14, %8) : $@convention(method) <ฯ_0_0, ฯ_0_1 where ฯ_0_0 : Sendable, ฯ_0_1 == Never> (@in_guaranteed Optional<TaskPriority>, @sil_sending @owned @isolated(any) @async @callee_guaranteed @substituted <ฯ_0_0> () -> @out ฯ_0_0 for <ฯ_0_0>, @thin Task<ฯ_0_0, Never>.Type) -> @owned Task<ฯ_0_0, Never> // user: %19
Before: [(0 1 10)]
After: [(0 1 10)]
Applying: send %%10: %16 = apply %15<(), Never>(%9, %14, %8) : $@convention(method) <ฯ_0_0, ฯ_0_1 where ฯ_0_0 : Sendable, ฯ_0_1 == Never> (@in_guaranteed Optional<TaskPriority>, @sil_sending @owned @isolated(any) @async @callee_guaranteed @substituted <ฯ_0_0> () -> @out ฯ_0_0 for <ฯ_0_0>, @thin Task<ฯ_0_0, Never>.Type) -> @owned Task<ฯ_0_0, Never> // user: %19
Before: [(0 1 10)]
After: [{0 1 10}]
Working Partition: [{0 1 10}]
Exit Partition: [{0 1 10}]
Updated Partition: yes
// โ๏ธ where's the `require` for the write to `x`?
Potential explanation
after digging into the implementation[2], this is what i think is going on:
- during the initial pass through the SIL instructions to collect 'trackable state', when we reach the
begin_access
instruction for the second write tox
:
we enter the method for translating instructions with 'lookthrough' semantics.%24 = begin_access [modify] [dynamic] %2 : $*Int // users: %25, %26
- the 'destination' value, which in this case is the
begin_access
instruction, is attempted to be tracked. - after a few method hops, this logic ends up in getUnderlyingTrackedValueHelper
- it fails the test to check for non-sendability here (first goes here, and then here). this seems to be because the type of the destination value resolves to
Int
, which is sendable. - since the value appears to be sendable, a new 'underlying tracked value' is created and returned. this eventually shows up in the log output as:
which indicates that the%%18: TrackableValue. State: TrackableValueState[id: 18][is_no_alias: no][is_sendable: yes][region_value_kind: disconnected]. Rep Value: %24 = begin_access [modify] [dynamic] %2 : $*Int // users: %25, %26
is_sendable
flag is true. - now, when we get to the translation logic for the
store
instruction, we check if the store's destination is trackable state (destination in this case is thebegin_access
inst), and we fail because the state is marked asis_sendable
. - the
store
is thus ignored, and produces no partition operations that would validatex
isn't used after being sent
Possible remediation
i think it's odd that the result of translating an instruction with 'look through' semantics would produce a new underlying trackable value. it would seem to me that a begin_access
should generally track whatever backs the accessed value. there is logic to this effect for handling ProjectBoxInst
, for example.
adding an additional condition to the underlying value helper early exit check that requires the instruction not be a look through instruction, still does not produce a diagnostic. however, this change does enable the translateSILStore()
logic to end up in translateSILMerge, which seems like an improvement.
it seems that in the merge translation logic, we currently only add a partition 'require' for the destination value if both it and at least one of its operands are 'trackable'. if we change the code to always add a 'require' for the destination value as long as it's trackable, then we get a diagnostic to the effect of:
18 | var x = 0
19 |
20 | Task.detached { x = 1 }
| |- error: sending value of non-Sendable type '() async -> ()' risks causing data races
| `- note: Passing value of non-Sendable type '() async -> ()' as a 'sending' argument to static method 'detached(priority:operation:)' risks causing races in between local and caller code
21 |
22 | x = 2
| `- note: access can happen concurrently
23 | }
24 |
i have not yet attempted drafting a PR or looked too deeply into what might break with such a change, but i'm curious if a fix along the lines proposed seems reasonable to those more familiar with the code, or any interested parties (cc @Michael_Gottesman, @hborla). i will also note that @Nickolas_Pohilets recently put together a draft PR to address a similar issue, which takes a slightly different approach to addressing the problem (changing the sendability check logic vs altering 'look through' tracking semantics).
References
- example code in godbolt. note that at the time of writing, it is using a compiler from the current 6.1 dev branch so that the RBI logs will show up.