[RBI]: analysis and a proposed fix for a `class_method` data race safety hole

i was recently investigating this issue, which demonstrates a data race safety hole in RBI under certain conditions. a concise reproduction is:

nonisolated func send<V>(_: sending V) {}

open class NS {}

@MainActor
class MyKlazz {
    var state: [NS] = []

    func doit() {
        let alias = state[0]
        send(alias) // no error
    }
}

as noted in the issue, if the class is explicitly marked final the invalid send is identified. a comparison of the SIL and analysis pass' debug logs can be seen in this compiler-explorer example.

the primary difference seems to be that in the non-final case, the state access occurs through a class_method instruction (presumably to support potential subclassing), whereas in the final case, the access is performed via a ref_element_addr. currently it seems that in the former case the non-sendable value is considered to be in a disconnected region, whereas in the latter, it is appropriately identified as being in a main actor region. here's a snippet of the debug logs for those cases:

// non-final class example
%%5: TrackableValue. State: TrackableValueState[id: 5][is_no_alias: no][is_sendable: yes][region_value_kind: disconnected].
    Rep Value:   %6 = class_method %0 : $MyKlazz, #MyKlazz.state!getter : (MyKlazz) -> () -> [NS], $@convention(method) (@guaranteed MyKlazz) -> @owned Array<NS> // user: %7

// final class example
%%5: TrackableValue. State: TrackableValueState[id: 5][is_no_alias: no][is_sendable: no][region_value_kind: main actor-isolated].
    Rep Value:   %6 = ref_element_addr %0 : $MyKlazz, #MyKlazz.state // user: %7

this difference seems to result in the non-sendable state never becoming correctly 'entangled' with a main actor-isolated region, and so it is incorrectly allowed to be sent.
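to make the consequence of that region assignment concrete, here is a deliberately simplified toy model. every name in it (`RegionKind`, `TrackedValue`, `assign`, `canSend`) is hypothetical and is not the compiler's data structure or API; it also glosses over details such as the sendability of the function value shown in the log. it only sketches the rule that a non-sendable value inherits the region of what it was derived from, and may be sent only while that region is disconnected:

```swift
// Toy model only: these types are hypothetical, not the compiler's implementation.
enum RegionKind: Equatable {
    case disconnected
    case actorIsolated(String)
}

struct TrackedValue {
    let name: String
    let isSendable: Bool
    var region: RegionKind
}

// Deriving a value from a source places it in the source's region.
func assign(_ name: String, from source: TrackedValue) -> TrackedValue {
    TrackedValue(name: name, isSendable: source.isSendable, region: source.region)
}

// A non-Sendable value may be passed as `sending` only if its region is disconnected.
func canSend(_ value: TrackedValue) -> Bool {
    value.isSendable || value.region == .disconnected
}

// Non-final case (simplified): the access is modeled as landing in a disconnected region.
let viaClassMethod = TrackedValue(name: "state", isSendable: false, region: .disconnected)
// Final case: the access is modeled as main actor-isolated.
let viaRefElementAddr = TrackedValue(name: "state", isSendable: false,
                                     region: .actorIsolated("MainActor"))

let aliasNonFinal = assign("alias", from: viaClassMethod)
let aliasFinal = assign("alias", from: viaRefElementAddr)

print(canSend(aliasNonFinal)) // the send is (incorrectly) allowed in this model
print(canSend(aliasFinal))    // the send is rejected in this model
```

in this model the only thing separating the two cases is the region recorded for the initial access, which mirrors the `region_value_kind` difference in the logs above.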

looking into how ClassMethodInst is treated by the analysis, it seems both it and RefElementAddrInst have some special handling. RefElementAddrInst has a custom implementation to determine partition operation semantics[1], and ClassMethodInst has this logic that checks for isolation information when computing the 'trackable' values for the analysis.

a likely explanation for the current behavior is that when performing value tracking, an attempt is made to determine the appropriate isolation of the class_method instruction by means of an AST walker (here). currently this particular pattern of subscript access isn't handled during the AST traversal, so that analysis fails to see that there is a self reference in play. as a result, none of the existing logic to try and derive global actor isolation is performed, so we end up thinking the instruction has no relevant isolation, and therefore it produces no 'trackable value'. this ultimately results in the incorrect observed behavior.

this PR contains an exploratory change that attempts to address this issue by widening the 'look through' operations of the AST walker so that it finds the self reference in this case, correctly tracks the class_method instruction, and assigns it main actor isolation. the difference in debug logs when handling the class_method instruction before and after this change looks like:

// before:
Visiting:   %6 = class_method %0 : $MyKlazz, #MyKlazz.state!getter : (MyKlazz) -> () -> [NS], $@convention(method) (@guaranteed MyKlazz) -> @owned Array<NS> // user: %7
    Semantics: assign

// after:
Visiting:   %6 = class_method %0 : $MyKlazz, #MyKlazz.state!getter : (MyKlazz) -> () -> [NS], $@convention(method) (@guaranteed MyKlazz) -> @owned Array<NS> // user: %7
    Semantics: assign
 ┌─┬─╼  %6 = class_method %0 : $MyKlazz, #MyKlazz.state!getter : (MyKlazz) -> () -> [NS], $@convention(method) (@guaranteed MyKlazz) -> @owned Array<NS> // user: %7
 │ └─╼  line:10:26
 ├─────╼ assign_fresh %%5:   %6 = class_method %0 : $MyKlazz, #MyKlazz.state!getter : (MyKlazz) -> () -> [NS], $@convention(method) (@guaranteed MyKlazz) -> @owned Array<NS> // user: %7
 └─────╼ Used Values
          └╼ State: %%5. TrackableValueState[id: 5][is_no_alias: no][is_sendable: no][region_value_kind: main actor-isolated].
             Rep Value:   %6 = class_method %0 : $MyKlazz, #MyKlazz.state!getter : (MyKlazz) -> () -> [NS], $@convention(method) (@guaranteed MyKlazz) -> @owned Array<NS> // user: %7
             Type: $@convention(method) (@guaranteed MyKlazz) -> @owned Array<NS>

and the expected diagnostic is produced:

error: sending 'alias' risks causing data races [#SendingRisksDataRace]
 9 |     func doit() {
10 |         let alias = state[0]
11 |         send(alias)
   |         |- error: sending 'alias' risks causing data races [#SendingRisksDataRace]
   |         `- note: main actor-isolated 'alias' is passed as a 'sending' parameter; Uses in callee may race with later main actor-isolated uses
12 |     }
13 | }

some further thoughts/questions regarding this issue & potential solution:

first, does the proposed change seem like an appropriate fix? it's not totally clear to me that just adding more 'look through' instructions is quite right, though it does seem to do the trick in this particular instance. should we instead be pattern matching on the AST more explicitly before walking a subexpression?

next, in a case like this, is an AST traversal necessary? could we instead just take the operand from the class_method instruction and check its isolation? that's perhaps insufficient in more complex cases, and for detecting nonisolated(unsafe) in chained calls (which i think the DeclRefExprAnalysis attempts to do).

finally, i am a bit confused by the fact that when the analysis is initially set up for the function, the self argument is determined to be Sendable and therefore is not tracked (see logic here), despite the function's being main actor-isolated. this means the initial isolation of the region partition is unspecified, rather than main actor-isolated. i originally pursued a fix that would place a 'fake' tracked value to stand in for the sendable self (which appears to be done in some places), but ran into a number of problems with that approach and gave up. still, my intuition is that the first partition should somehow carry the isolation of the function itself (even if initially empty)... is that a sensible thing to think?

cc @Michael_Gottesman, @hborla – i would be interested in any feedback you may have on this if/when you get a chance to take a look.


  1. tangent: the logic & comment here appear to be mismatched... which one should be fixed?

Non-final properties of a class are accessed by calling their accessor methods, because the property can be overridden in a subclass. class_method is used to perform a dynamically dispatched call to an overridable class method.

For example:

class Base {
  var x: Int = 0
}

class Derived: Base {
  override var x: Int {
    get { return 0 }
    set {}
  }
}

You might also want to check if your bug occurs with non-accessor method calls, or subscripts, or a computed but final property.
