Pitch #3: ConcurrentValue and @concurrent closures

jayton · February 1, 2021, 7:44pm

I don’t think I was clear enough about the intended benefit, which is that the compiler could tell you if you broke verifiable conformance without reading an explicit conformance as “nah, it’s fine”. If it is fine, you can then add the explicit conformance.

michelf · February 2, 2021, 12:18am

If the goal is to provide compiler-verified safety, then the boundaries of what should be a ConcurrentValue depends on the boundaries of what the compiler can ensure is safe. If the compiler cannot verify you won't use the array's subscript to get an object and use that object, then it can't (safely) allow you to pass an array with those objects.

If some parts of the public API of a value aren't safe to use concurrently but others are, the best way to ensure it is safe to pass around is to use a wrapper only exposing methods that are concurrency-safe. For instance:

struct MainThreadObjectWrapper<T: AnyObject>: ConcurrentValue {
    private var _object: T

    public init(_ object: T) {
        precondition(Thread.current == .main) // ideally check actor instead
        _object = object
    }

    // concurrency-safe: no race
    public var identifier: ObjectIdentifier {
        ObjectIdentifier(_object)
    }

    // concurrency-safe: traps when called from wrong thread
    public func unwrap() -> T {
        precondition(Thread.current == .main) // ideally check actor instead
        return _object
    }
}

This wrapper is unsafe code (concurrency safety not validated by compiler within the wrapper), but restrictions imposed by its public API makes it safe to use.

The alternative would be…

... to have a function attribute for concurrency-safe APIs and some type attribute to tell us whether a variable actually belongs to another concurrency context. In such a world, dictionary lookups are only concurrency-safe if its key's hash(to:) and == are itself concurrency-safe. And the dictionary's declaration modifier telling us it's holding concurrently-accessible data needs to propagate to the return value of this lookup to limit its uses with non-@concurrent functions. That's a lot of thing to pack in a declaration:

@concurrent(where Key.hash(to:) is @concurrent)
@concurrent(where Key.==() is @concurrent)
subscript (key: Key) -> @concurrent(where Self is @concurrent) Value
{ ... }

Except for the most simple cases, this is very impractical. It's also less flexible than manually written wrappers where you can use other strategies to provide safe APIs, such as unwrap() above.

gwendal.roue · February 2, 2021, 10:43am

Sounds legit at first sight. But I wonder if the designers of this pitch did foresee this side effect, and the difficulties attached to it. Hence my question to @Chris_Lattner3.

Douglas_Gregor · February 3, 2021, 1:42am

One thing that bothered me about introducing by-value captures here is that we would end up with different default capture rules for @concurrent and non-@concurrent functions. And given that @concurrent is going to be inferred most of the time, the actual capture semantics for a given closure won't be obvious from the source code.

We already have a syntax for introducing a by-value capture, using capture lists:

var x = 17
_ = { ... x ... }.         // by-reference capture
_ = { [x] in ... x ... } // by-value capture of x

@Joe_Groff suggested something that could help here: instead of having @concurrent closures capture by value, we could perform a flow-sensitive check that ensure that a value captured by a @concurrent closure is not modified after the point where it is captured. The difference can be observed in code like this:

var x = 17
Task.runDetached {
  /* sleep for a while */
  print(x)
}
x = 25

If @concurrent closures capture by value, this will print 17. Joe's suggestion is to make this code ill-formed because the modification of x is after its capture. This makes by-value vs. by-reference irrelevant for @concurrent closures.

The above statement is pedantically true, but pragmatically speaking, uses of withoutActuallyEscaping or fork-join parallelism implemented in C are very rare. On the other hand, @escaping closures are very common---in completion handlers (for asynchronous code not yet ported to async) and callback interfaces to work with libraries.

If every existing @escaping that comes in from some library that hasn't been ported to concurrency is a potential data race with the actors you write, we don't have a model people can feel confident working with.

The discussion here is focusing on C APIs, but the same issues persist for existing Swift libraries that escape closures to other threads.

Chris_Lattner3:

Douglas_Gregor:

Even though it might eventually need to be a separate proposal, I'd like the notion of a "marker protocol" to be sketched out a little bit in this proposal, so folks can understand the intended limitations. I think it's mostly a matter of not permitting is ConcurrentValue or as? ConcurrentValue or, really, any value of type ConcurrentValue ... a bit like the current prohibition on creating values of protocol type where the protocol contains Self requirements. That's the semantics we need to make marker protocols have zero runtime footprint.

I'm super happy to adopt that, could you propose wording, or directly write it into the proposal? You should have direct edit access, let me know if not. Alternatively, I can also inject edits if you'd prefer to communicate them another way.

I added some wording to the document.

Chris_Lattner3:

Douglas_Gregor:

I feel like we're going to need a stronger opt out for "I know this declaration uses non- ConcurrentValue types but I really need to use it from another actor". The property-wrapper formulation handles some cases (parameters, properties), but not other important ones (like the return value of a function or subscript). @actorIndependent(unsafe) has effectively the right semantics here, so we could use that, but it feels like the wrong spelling when you're doing the opt-out for (say) a captured local variable.

Are you suggesting an additional affordance, or arguing against something in the base proposal? If the former, then I think it should be part of an @actorIndependent proposal. If the later, I'd love to know more, e.g. with an example of what you're thinking about here.

I'm suggesting an additional affordance. Let's use NSString as an admittedly-silly example:

actor A {
  func f() -> NSString {
    return NSString(copyingBytes: ...) // I promise to return a unique instance!
  }
}

func test(a: A) async {
  print(await a.f()) // currently an error
}

The affordance I'm looking for here is an annotation on A.f that says "I know I have non-ConcurrentValue types, but I promise to be careful if you let me call this from outside the actor". We will have these cases, where we need an opt out, and I fear that if the only opt out is the dreaded extension making NSString conform to ConcurrentValue, we'll have made an incorrect global assertion to solve a very local (and likely transitional) problem.

The document states that @concurrent is inferred for a closure that contains an await. await is already used to infer that a closure is async, and I don't think we want to conflate asynchrony with concurrency. An asynchronous closure in an actor-isolated method should be able to actor-isolated state, so long as it won't be run concurrently.

The document has this:

Global functions are @concurrent compatible if their argument and result types all conform to ConcurrentValue.**

Does "@concurrent compatible" mean that @concurrent is inferred on the type of the global function, or does it mean that the user is allowed to write @concurrent on the global function? At a minimum, I expect:

@concurrent func global2(x: Int) -> Int { x } // well-formed, type is @concurrent (Int) -> Int
@concurrent func global3(x: Int) -> NSString { NSString() } // error!

However, what about these?

func global1(x: Int) -> Int { x }  // all are ConcurrentValue types, of course

let g1 = global1(x:) // is the type (Int) -> Int or @concurrent (Int) -> Int?
let g2: @concurrent (Int) -> Int = global1(x:) // is this permitted?

Doug

Chris_Lattner3 · February 4, 2021, 5:30pm

I just posted draft #4 in a new pitch thread incorporating much of the feedback received in the last few days. I think it is best to discuss this draft on that thread, but I wanted to acknowledge some points here.

Yes, I agree with you. The new draft splits the protocol into a ConcurrentValue and UnsafeConcurrentValue protocol, and argues that class conformance should always use the Unsafe version of this. I'd appreciate your thoughts on pitch 4 thread.

Agreed, same comment. Flipping the parity of the proposal to require explicit conformance and John McCall's suggestion of adding an UnsafeConcurrentValue has provided a really nice model for this that seems clean.

The problem here is that you don't get transitive memory safety. Consider Array<NSMutableString> for example. While you can pass the array itself around, dereferencing the pointers to the strings inside of it would cause race conditions given the possibility for mutation. One major purpose of this proposal is to define away safety problems like that.

They aren't out of the game. The proposal provides a number of ways to address this with library-based solutions, see for example this section. Those library solutions are not part of this proposal though, they should be subsequent proposals.

Douglas_Gregor:

@Joe_Groff suggested something that could help here: instead of having @concurrent closures capture by value, we could perform a flow-sensitive check that ensure that a value captured by a @concurrent closure is not modified after the point where it is captured. The difference can be observed in code like this:
var x = 17
Task.runDetached {
  /* sleep for a while */
  print(x)
}
...
x = 25
If @concurrent closures capture by value, this will print 17. Joe's suggestion is to make this code ill-formed because the modification of x is after its capture. This makes by-value vs. by-reference irrelevant for @concurrent closures.

I'm not exactly sure how all the rules would stack out here, but I'm concerned about the UX that this would provide. With the "concurrent closures capture by value" rule, you get simple and predictable errors when you try to mutate values inside of the closure, and an easy way to explain the issue to the user.

With a flow sensitive rule, you get errors at the runDetatched call site because you have an assignment to x (potentially much later in the function). x may also be captured by other sync closures in the function, so you get transitive lifetime checking issues, which can be done with dataflow but make it even more difficult to explain to users.

I agree that this is an important concern. I think it makes sense to add a clang attributes to the key APIs (eg. dispatch) to control whether blocks get imported as concurrent closures in Swift. Perhaps the default should be that they should come in as concurrent closures if escaping, but there is an attribute to turn that off? I'm not sure the best sense of this. I don't think that thin function pointers need to be concurrent because they can't capture anything.

Yes, I think there is another missing link in the actor type system that is worth exploring. I'm not sure if the exploration goes anywhere, though. The observation is that in:

actor SomeActor {
  func foo() {}
  func test(other: SomeActor) {
     other.foo()  // ill formed because of cross actor reference
     self.foo()    // ok
  }
}

Here we have two values of SomeActor type: "other" and "self". However, other is actually has something more like a async SomeActor type (and we allow omitting the async) and self is more like sync SomeActor (which we currently cannot spell).

If we had the ability to spell this, it would allow a lot of interesting things, including proper exposure of unsafe APIs that poke at the internals of an actor across actor boundaries, etc.

Anyway, I have a picture in my brain of what this could look like, but it isn't fully built out, I'll try to throw some e-ink to e-paper about this this weekend if I have time to bake them out a bit better.

Douglas_Gregor:

The document has this:

Global functions are @concurrent compatible if their argument and result types all conform to ConcurrentValue.**

Does " @concurrent compatible" mean that @concurrent is inferred on the type of the global function, or does it mean that the user is allowed to write @concurrent on the global function? At a minimum, I expect:
@concurrent func global2(x: Int) -> Int { x } // well-formed, type is @concurrent (Int) -> Int
@concurrent func global3(x: Int) -> NSString { NSString() } // error!
However, what about these?
func global1(x: Int) -> Int { x }  // all are ConcurrentValue types, of course

let g1 = global1(x:) // is the type (Int) -> Int or @concurrent (Int) -> Int?
let g2: @concurrent (Int) -> Int = global1(x:) // is this permitted?

Good questions. I changed the wording to be a bit more specific (with your help :), thank you!). The intent is that global functions automatically get @concurrent function type when their signatures permit it because they cannot capture any values. I think it is reasonable to allow an explicit @concurrent as well, but that would be redundant from the general inference. If one specified it, but a non-ConcurrentValue argument/result type is specified, we should diagnose the error.

The reason for this is a desire to provide a consistent model that scales from global functions to nested functions to closures.

In any case, the proposal has been updated quite a bit with some pretty big design changes, I think it would be best to continue the discussion in the pitch #4 thread. Thank you everyone for the input!

-Chris