[Pitch] Inherit isolation by default for async functions

gwendal.roue · September 25, 2024, 2:14pm

Exactly like JSONDecoder.decode. Attractive, and yet it requires a deliberate decision regarding scheduling.

Since no one is going to ask that JSONDecoder.decode is marked as @concurrent, or even regret that it is not, I'm thinking that the feature request for @concurrent sync functions is pretty weak.

hborla · September 25, 2024, 2:19pm

Just to be clear, synchronous @concurrent functions are not included in the proposal exactly because I don't have strong use cases for them. However, I still think it's beneficial to not conflate the ability to suspend with the execution semantics of a function. I'm totally open to alternative syntax ideas for @concurrent, but I'm not convinced that a variation of async is the right syntactic model.

Jumhyn · September 25, 2024, 2:20pm

I'd actually perhaps put JSONDecoder.decode in a third bucket which captures functions where the size of work being done is unbounded but under the control of the client. I don't think it would make sense to make this interface async either, because it might sometimes be entirely reasonable to decode sufficiently small data types, even on the main thread. That strikes me as sufficiently different-in-kind from APIs which knowingly do substantial amounts of work (without specific direction from the client).

In any case, I agree that the decision is more difficult for true library code with third-party clients. I am mostly talking in my cases about modularized code which is used exclusively in an application context by first-party clients, so the entire universe of use-cases can be known decisively.

juanarzola · September 25, 2024, 2:49pm

For this case of a library author exposing a @concurrent function as nonisolated synchronous, could you not write its implementation in a nonisolated sync function of the same name, expose that one, call it within the @concurrent version, and and leave the @concurrent version internal?

You won't have to refactor the whole library and the client gets the function with flexible isolation.

// keep concurrent function internal
@concurrent func decode() async {
    decode()
}

// actual API of my library
public func decode() {
    // expensive sync work
}

(you could even imagine sugar like @concurrent? to do this - generating both versions of the function)

jaredsinclair · September 25, 2024, 3:09pm

@hborla This statement you made (which is superb, I’m bookmarking it) is, I fear, the most succinct summary of the difficulties that the community has faced as Swift concurrency spreads and deepens its adoption:

However, I still think it's beneficial to not conflate the ability to suspend with the execution semantics of a function.

I think that conflation is exactly what has happened, for many, certainly not all, but many developers—enough of them that changes like the ones proposed in this thread will break their mental model.

Workaday developers focused on delivering code are pressed for time. If there’s a convenient mental model mapping from past to future, they will reach for it.

I think that happened with the move from GCD to Swift Concurrency.

GCD, as represented by ubiquitous asynchronous dispatching to a global queue, was exactly the conflation you describe for many daily tasks: Do this work on that queue at this priority and tell me later when it’s done.

Swift Concurrency obviously is quite different, but at least in one corner of its feature set, the nonisolated async function/method, it was so close to the familiar GCD global queue pattern that for many developers that’s the extent of their mental model for Swift Concurrency.

In other words: the challenge here is not only that this proposal changes behaviors, it also invalidates what I suspect is the most common mental model for how this stuff works for the middle-of-the-bell-curve developer.

This is not an argument against the proposal, but rather an argument for a strong consideration of how this change gets announced, rolled out, documented, diagnosed, fixitized, etc.

kean · September 25, 2024, 3:47pm

For what it's worth, I've been using GCD since its introduction on iOS and haven't made the connection between global queues and the async keyword until learning it from a blog. I initially assumed it was an implementation detail I shouldn't rely on and still rarely do. I don't think I wrote more than ten async functions that explicitly rely on SE-0338 over multiple apps and frameworks, and I use actors everywhere else.

If there was a high-level API to explicitly state "run this on a default executor" similar to DispatchQueue.global, I would feel more comfortable using it. (Edit: specifying this on the callee side would be great).

a strong consideration of how this change gets announced, rolled out

Ideally, in a major concurrency-focused release with source-breaking changes...

David_Smith · September 25, 2024, 6:06pm

FWIW the advantage of the "callee decides where to run" model that async functions use is that you specify it once, rather than specifying it at each call site. This completely eliminates the "oops I forgot to call that in the right context" class of errors (which for serial queues being used to protect state is equivalent to "oops I forgot to take the lock before accessing that", i.e. a data race).

What I've seen in the past is that rather than reaching for async functions (which do callee-decides), people reach for Task and then run synchronous functions inside it, which to my eye clearly looks like they're searching for a way to keep their old dispatch-based mental model and haven't considered the advantages of the other.

[edit] Just to be clear, I don't think Holly's proposal invalidates the advantage I'm describing. Isolated async functions are the ones critical to protecting state, and those still use callee-decides.

mattie · September 25, 2024, 7:41pm

I cannot shake this and what @gwendal.roue have pointed out. This is feeling to me almost like a new class of effect - expectations about runtime. Lifting stuff that's purely the domain of documentation into the type system, like "runtime is proportional to input size", or "does synchronous IO" seems incredibly powerful.

I'm not advocating for something like this. I just cannot stop thinking about it now.

juanarzola · September 25, 2024, 10:34pm

I think that you still make a good point - it's convenient to mark functions as concurrent to solve a problem but then creating another one: these functions are not as flexible, they always need to suspend, so your call site must be async, and now everything is async etc etc. The pattern I mentioned can be done, but probably won't be done if the library author isn't thoughtful about this and just follows the path of least resistance of exposing the @concurrent function as public.

It could perhaps be enforced that a @concurrent async function must actually suspend. If it doesn't, then it must not be marked async but it can be called either synchronously or awaited. That way, if a library author exposes a @concurrent function, the client of the library can still choose how to call it:

// swift: ERROR this async function doesn't suspend. FIXIT: remove async.
@concurrent func doThing() async -> SendableThing { 
	let result = doSomeSynchronousThings()
	return result
}

// Now it's synchronous. So it can be called from any actor synchronously.
// If you want to hop off the actor, 
// then fine, await it, it's @concurrent.
@concurrent func doThing() -> SendableThing {
	let result = doSomeSynchronousCPUThings()
	return result	
}

// normal concurrent async func (it's an API and that's ok, this function inherently has to be async)
@concurrent func doAsyncThings() async -> SendableThing { 
	let result = await fetchFromNetworkSavingToDatabaseLotsOfThingsNotInvolvingCPUs()
	return result
}

That way, if a library author was just making a synchronous function @concurrent because of some time complexity they are trying to "save the client from", the client still has a choice of what actor gets hit with said complexity.

If the API has to be async due to its non-forward progress nature (waiting for i/o for example), then it's ok expose a public @concurrent async function.

malhal · September 26, 2024, 12:08am

I would argue your example code is incorrect because the continuation APIs are designed to retain a new legacy object, e.g.

struct Controller {
     func send(data: Data) async {
        let object = LegacySender()
	    await withCheckedContinuation { continuation in
	    	object.legacySendDataWithCompletion(data: data) {
		    	continuation.resume()
		    }
	    }
    }
}

Now your ordering issue is prevented. It also fixes your retain cycle self. legacySendDataWithCompletion.

Andrew_Hoos · September 26, 2024, 5:11am

I love the idea of making actor isolation more understandable. I am struggling to find a way to make that possible while reducing the migration challenges outlined. As best as I understand with respect to isolation there are three kinds of functions

Isolated : Functions explicitly isolated to a specific actor (e.g. global or methods on actor instances)
nonisolated : Functions that are explicitly non-isolated (e.g. async methods on structs and classes as they behave today)
Inherited : Function that inherit their isolation from the calling context (e.g. async methods with isolation: isolated (any Actor)?=#isolation)

There seems to be three competing difficulties that this proposal is trying to rebalance

It's weird that sync method inherit isolation and async methods don't (at least without a bunch annotations and work)
It is beneficial for performance reasons to eagerly leave isolation when calling async methods
It is not currently possible to perform some calls from actor isolated functions to nonisolated async methods that involve non-sendable isolated types because you can't opt-into inheriting actor isolation unless the function author added the isolation boilerplate

This proposal makes the following changes

Change the default behavior for async methods from nonisolated to inherited and eliminate the isolation: isolated (any Actor)?=#isolation boilerplate
Change the default behavior when calling methods async methods from disinherit isolation to inherit isolation

That second bullet point breaks existing behavior and makes migration tricky. What if instead we added new syntax to opt-into inheriting isolation (See fake code below)

class NonSendableType {
   func doNonSendableThing() async  {}
}

actor A {
   var nonSendable: NonSendableType 

   func actorIsolatedMethod() async {
       await <syntax for inheriting isolation>  self.nonSendable.doNonSendableThing()
   }
}

Adding some form of caller syntax to opt-into preserving the isolation when transitioning from an isolated function to an isolation inherited function would resolve the issues highlight in the motivation of the proposal, but without breaking the current behavior which people depend on in existing code. To further ease the migration I would recommend deprecating the nonisolated keyword and replacing it wth a new syntax (isolated(Never) perhaps?) if the author explicitly wants to force the method off of any actor. There would also need to be syntax for opting actor methods into the new default behavior of inheriting ( isolated(any) or isolated(inherit) perhaps?) would be good. I acknowledge that this does not fully resolve the difference between sync and async methods called from actor isolated functions, but it makes the choice more explicit.

On reading this I am wondering if there is even a need for explicitly isolated(Never) function or if users just need better controls for preserving or disinheriting isolation. The best idea I have for why it might be needed is methods that are known to be computationally expensive or something to do with NIO and I/O EventLoops? But I can't concretely come up with one either

sliemeobn · September 26, 2024, 9:34am

Quick idea that I think might fit inside this proposal (or maybe a future direction):

As posted above (repeatedly ; ) I am convinced that the new default proposed here is the correct one, and in "general purpose" code it is almost always what you want.

I am, however, sympathetic to the fine folks who spend most of their time in UI application code, where quickly hopping off the main actor is a thing that needs to be very easy and comfortable.

So, long story short, could we allow @concurrent (or whatever it will end up being) on struct and class declarations? This would be the counterpart to a @SomeGlobalActor annotation, and would mean "all my stuff runs on the global executor".

@concurrent // <- still not sure about the name
struct AllMyUIBackingCode {
   func doAThing() async { }
   func doAnotherThing() async { }
   func doSyncWork() // <- not sure about this one
   var something: Something // <- not sure about this one
}

I haven't thought about the details (like, does this only work on @Sendable types, what about synchronous functions, what about properties/fields?)

But it feels to me there is some low-hanging fruit here to annotate your API wrappers or background-calculation types with @concurrent and have a nice life off the main thread.

nkbelov · September 26, 2024, 10:23am

I've also expressed earlier that type-level annotations like this could be a good fit, because it feels like a very rare occasion that one would need such fine-grained control over method isolation on a type that they would want to annotate each function individually.

However, I'll counter that the use case of wanting to switch to the global executor stops at being a convenience to hop off the main actor. It is about hopping off any actor, and not only just for performance / contention reasons.

Semantically, when a function stays running on an actor, it IMO should have something to do with that actor's state. Imagine having a parallel universe JSONDecoder that provides a decode() async method; it's also still non-Sendable. If kept as an actor's property, like so:

actor MyActor {
    let decoder = JSONDecoder()
}

it indeed is a part of the actor's state and needs be protected.

However, when instantiated in a function (which is arguably more common)

actor MyActor {
    func foo() async {
        let decoder = JSONDecoder()
        await decoder.decode(something)
    }
}

it resides in a disconnected region, and I'd argue that it should not inherit the actor — or there should be an easy-to-apply mechanism to switch this execution off. Generally, the day-to-day task of decoding a bunch of network responses is, as they say, embarrassingly parallel, but if the decoder inherits the actor, all invocations of foo will be made serial for no technical reason.

CFUsername · September 26, 2024, 10:55am

I am bit confused about the proposed change. Does it affect the actor reentrancy ?

For example, with the code from the proposal, what will be the output?

class NotSendable {
  var counter = 1
  func performSync() { ... }
  func performAsync() async {
    print("Started")
    // Some heavy operation
    print("Ended")
  }
}

actor MyActor {
  let x: NotSendable

  func call() async {
    x.performSync() // okay

    await x.performAsync() // okay
  }
}

let actor = MyActor()
await actor.call()
await actor.call()

Output A:

"Started"
"Started"
"Ended"
"Ended"

or

Output B:

"Started"
"Started"
"Ended"
"Ended"

mattie · September 26, 2024, 12:20pm

The pitch does not affect reentrancy at all. However, it has the potential to reduce the number of places where code would leave the current actor. And, as @saagarjha points out, this ability can be absolutely essential for solving certain kinds of problems.

mattie · September 26, 2024, 12:28pm

I think this is actually a fantastic illustration of how difficult these things are to reason about.

This translation you have done, which many people (myself included) do, is subtly incorrect. This code has now shifted from start-now-finish-later to start-later-finish-later, and that's exactly what this proposal is attempting to address.

To make this send async without modifying the semantics, you must use an isolated parameter.

malhal · September 26, 2024, 12:40pm

Hi mattie, could you explain why you think it is now start-later because I don't believe it is.

mattie · September 26, 2024, 1:20pm

The explanation is up a bit, in this thread:

The key is this send function is now a non-isolated async function, which means it will execute on the global executor. Within the same task, you are right, ordering is preserved. But across tasks, which is an extremely common scenario particularly with UI applications, it can be lost.

mattie · September 26, 2024, 2:22pm

I'm genuinely surpised to hear these suggestions, because my experiences have been exactly the opposite. I haven't yet encountered a situation where I've wanted some kind of stateless background type. But I have regularly found places where I have this one tiny bit of synchronous work that can be expensive and can and should be done off-actor. JSONDecoder is a fantastic, real-world example.

This is the critical concept.

I was just imagining what would happen if we made this change without the ability to annotate functions as @concurrent. I think it would introduce two problems (but please tell me if I'm missing something).

The least important is reducing reentrancy. I don't think this is something to panic about at all, but I do think it is a legimate compatibility concern.

But by far, the most important one is synchronous work now must block an actor. And, there's no good way to deal with this, because as far as I know, there's no other way to put work on the global executor. I agree with @sliemeobn that @noactor doesn't look great, but it does fit the concept.

But more and more, I'm not sure async functions have a problem at all. I think the problem is exclusively how to deal with synchronous work?

I think @Andrew_Hoos (and others!) are on to something about this being primarily a caller concern. I have aboslutely hit situations where waiting for some synchronous work was important. Sometimes you need to block the actor. Can I riff on his idea of modifying await, but make it look a little like defer?

let result = try await {
  // this runs on the global executor
  try nonIsolatedExpensiveWork()
}

Edit: I forgot about async let! Ok so there's an existing mechanism for handling this, maybe that's sufficient?

Jumhyn · September 26, 2024, 2:24pm

I think async let would continue function to offload work to the global executor?