[Pitch] Inherit isolation by default for async functions

I just have anecdotal evidence that nonisolated has been particularly unintuitive for devs in our codebase. It's not quite an English word or phrase (not isolated? unisolated?), and it really doesn't convey its meaning on its own.

I don't think the word isolated should even be a part of Swift concurrency; instead just base the keywords around actor, e.g. anyactor or the like.

2 Likes

Yes, 100%. This seems like an unpopular opinion these days, but personally I strongly believe that the default should be single-threaded and multi-threaded should be opt-in and explicit. Which, as you're saying, seems to fit nicely with the idea of progressive disclosure of complexity.

Ah, this is too bad. I really find this confusing, and it goes in the opposite direction of what I said just above. I just wish Task.init would always inherit isolation regardless of captured parameters, with explicit syntax allowing developers to change that (e.g. Task.detached, closure isolation control, etc.).

3 Likes

Thank you all for the mindful conversation. Although I still think I prefer the current state of affairs, I admit that after reading @hborla I can see the appeal of fixing some aspects of it. I'm just not yet convinced that changing the default and adding other syntax is the way to go.

I'm glad that the difference between parallelism and concurrency has been brought up, because it is something I think is worth raising after reading a few things like:

  • async(global)
  • @concurrent
  • offload work to the cooperative thread pool

Is this really what we are talking about? There is a nuanced difference between the non-sticky behaviour of calling an async function and all of the above wording. Is the global cooperative thread pool part of the language semantics? In my mental model that's just an implementation detail of Apple platforms and not part of the language. So IMO what we are discussing here is just the sticky or non-sticky behaviour, aka staying or not on the isolation of the current actor.

I've myself raised the conversation about avoiding work on the main actor and instead switching it to the cooperative thread pool by default, but that's just a specific example where the current behaviour shines, and I don't think that specific actor should be the focus of the conversation.

I'm not sure there is much of an imposition. In my mind, Swift Concurrency is about concurrent behaviour; it doesn't say anything about parallelism, which is a property of the runtime. So I don't think the current behaviour, or the pitched one for that matter, affects that in any way.

I find this a bit confusing given the context above. If we want to fix the issues raised in this pitch while keeping this convenience, would it be an option to provide an actor that represents the cooperative pool, in the same way we have the main actor? At least that would keep the semantics of the language in terms of isolation, instead of adding new annotations for explicitly specifying where an async function should run.

These are great improvements that I'm glad are being brought up <3

Thanks for the effort put going through the feedback. It's a privilege just to have the opportunity to be part of such an insightful conversation. Hugely appreciated no matter where this ends. Journey before destination <3

3 Likes

If you don't stick on the current actor then you need to somehow define where you'll end up running. The cooperative thread pool might be considered an implementation detail, but I believe the terminology that SE-0338 used to define where nonisolated async functions run is 'the generic executor not associated with any actor' and I think it makes sense to consider that part of the language semantics.

5 Likes

This is totally true. But there is something I do not understand: why would it be important to be able to synchronously call a function which is explicitly marked as executing concurrently?

The problem is that newcomers do have to think about which code is running in the background because parallelism admits the risk of data races, and when compiling in the Swift 6 language mode, the result is compiler errors telling you that your code is incorrect. What has changed between SE-0338 going through the Swift evolution process originally and now is that the data-race safety model at the time was not complete; it had many holes where the compiler did not diagnose violations, so I don't think the usability impact of always switching to the generic executor was well understood.

For example:

Most of the classes I've seen in app code that wrap calls to async APIs have mutable state, or are inheriting from a superclass that has mutable state. If this class is not Sendable and it's stored in your SwiftUI view, it's invalid to call the async method from within the .task {} modifier in the Swift 6 language mode. var body is isolated to @MainActor, and .task {} inherits that isolation, so when you call a nonisolated async function on a class with mutable state, you now have the potential for a data race, because other tasks on the main actor can access that class while the async function is running in the background. Compiling in Swift 6 mode will fail, and the solution I've seen people reach for most frequently is inheriting the isolation of the caller via isolated parameters in the async methods of the class. In my experience, it's also not typically the case that these methods are doing anything super expensive.
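To make the scenario above concrete, here is a minimal sketch (all names hypothetical) of the pattern mentioned at the end of the paragraph, where app programmers isolate the whole class to the main actor so the async method becomes callable from @MainActor contexts such as SwiftUI's .task {}:

```swift
// A class with mutable state whose async work stays on the main actor.
@MainActor
final class Loader {
    var requestCount = 0  // mutable state, protected by the main actor here

    func load() async -> Int {
        requestCount += 1
        // Had this been a nonisolated async method on a non-Sendable class,
        // the body would hop to the generic executor under SE-0338, and
        // calling it from a @MainActor context would be a data-race error
        // in the Swift 6 language mode.
        return requestCount
    }
}
```

The other fix the paragraph describes, inheriting the caller's isolation via isolated parameters, avoids forcing the whole class onto the main actor but is more verbose to write.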

I don't agree that there is a technical argument that it is more correct for more functions to run on the generic executor by default. There are certainly a lot of cases where it's best to run code on the generic executor, and I agree that actor-instance-isolated methods are relatively rare today, but it is extremely common to isolate code to the main actor, especially in application code. When app programmers write async methods on mutable classes, many of them either reach for isolation inheritance, or they isolate the entire class to the main actor, because the async method is impossible to call from the main actor unless the class happens to only be stored in local variables in disconnected regions.

Nope, the @concurrent attribute will always switch off an actor to the generic executor. I will clarify this in the proposal.

I want to underscore this, and add: I think it is better to have more code run on the main actor and have a performance problem than it is to have more code running off the main actor and have a data race. Neither are good problems to have, of course, but reproducing and fixing a performance problem is easier than reproducing and fixing a data race.

I see this a little differently. I think that writing this explicitly in an API is not pretty and it's tedious to write, but I believe SE-0420 (which introduced isolation inheritance via optional isolated parameters) was a necessary step before making the change proposed here; this proposal builds on SE-0420 by passing down the optional isolated argument implicitly. The ability to express this at all had to come first.
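For reference, a sketch of the SE-0420 pattern mentioned above (the function name is hypothetical): the optional isolated parameter defaults to the caller's isolation, so the body runs wherever the caller was already running, without an executor hop.

```swift
// SE-0420-style isolation inheritance via an optional isolated parameter.
func doubled(_ value: Int,
             isolation: isolated (any Actor)? = #isolation) async -> Int {
    // Executes on the caller's actor, or on the generic executor when the
    // caller is nonisolated.
    value * 2
}
```

The proposal under discussion would, in effect, make this inheritance the default so the parameter no longer has to be spelled out at every declaration.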

The performance impact of executor switches vs. long-running code on an actor depends on a variety of factors, such as how many executor switches are happening in your code compared to how much work the function does before the next switch, whether you're doing a real context switch between the main thread and another thread, whether the code running on an actor suspends to give other tasks the opportunity to run, etc. This is why I believe that neither default is undeniably better for performance for all code, and you need performance analysis tools to understand where the inefficiencies are.

8 Likes

Ah, let me clarify what I meant: you're right that the language has no notion of the threads or cores in the underlying runtime system. At runtime, there is a thread system that tasks run on, but you don't need to know the specifics of the platform when you're writing code. Instead, the language assumes that any code that can execute concurrently between different isolation domains may execute in parallel at runtime, and it only diagnoses data-race safety errors when parallelism is possible. What I really mean is that the current default imposes data-race safety errors onto programmers who are not trying to share mutable state across isolation domains. I consider this to be imposing parallelism because on many platforms, that is the runtime effect.

5 Likes

One follow up question on Task not inheriting the isolation. If we do want that, what will be the best way to do it? Will this work:

func foo() async {
    // Capture the caller's isolation as a value...
    let isolation = #isolation
    Task {
        // ...and capture it in the closure, with the intent that the task
        // runs with that isolation.
        _ = isolation
        // more code here
    }
}
1 Like

I wonder if, assuming the current proposal is implemented, there will be any scenarios left where you would want to express it as a user?

re: @concurrent. Wouldn't you now be able to use Task.detached to implement it? This is how I naively used to do it until I discovered that async functions already do roughly that by default. When I started using Swift Concurrency, I was under the strong assumption that async methods without a specified actor run on the current one. Or are there any issues with it?

P.S. I feel that the terminology is getting out of hand, especially with "(non)isolated".

Task.detached has a number of other effects that you don't usually want, such as disabling priority and voucher propagation. It's really a very niche tool that probably shouldn't be getting as heavy use as it is (and indeed, the libdispatch equivalent of it, DISPATCH_BLOCK_DETACHED, is so little-used that nobody I've shown it to knew it existed until I did). I strongly suspect most people using it are using it to disable actor inheritance and are entirely unaware of its other effects.
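A small sketch of the propagation difference described above: an unstructured Task {} inherits the creating task's priority, while Task.detached starts fresh (it also drops task-local values and, on Apple platforms, voucher propagation).

```swift
// The child task created with Task {} picks up .high from `outer`.
let outer = Task(priority: .high) { () -> TaskPriority in
    let inherited = Task { Task.currentPriority }
    return await inherited.value
}
```

A Task.detached created at the same point would not see .high unless a priority were passed explicitly.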

4 Likes

I think that'll work. It's weird though. That's a lot of "isolation", and it's really non-obvious.

This area has some overlap with closure isolation control that may have to be resolved. But I'm confident we can figure that all out.

Given that @concurrent is part of a function's declaration, what if we're in a situation in which we're calling a function from a third-party library?

For instance, assuming this proposal is accepted, let's say we're writing some actor logic and we are calling an async function from a library we don't own. What if we, the callers, don't want this function to block up our actor?

// ThirdPartyLibrary
func thirdPartyDoSomething() async { 
  // could take a while... 
}

// MyApp
import ThirdPartyLibrary

actor MyActor {
  func callSomething() async {
    await thirdPartyDoSomething() // Don't want to block up our actor, context-switch is worth it
  }
}

I suppose we could wrap this function in our own that is @concurrent.

// MyApp
@concurrent func concurrentDoSomething() async {
  await thirdPartyDoSomething()
}

I'm not sure how often this would come up in practice but just wanted to throw it out there!

(I guess we could write a macro to do this wrapping for us if the need arises enough?)

7 Likes

It's really a very niche tool that probably shouldn't be getting as heavy use as it is

Maybe a regular Task then?

I know everyone has been having a great time offloading the work to the background with just a single async keyword, but maybe there is no need for syntax sugar like @concurrent for the off-chance you are doing it without using actors?

func doStuff() async {
    // This could use some other higher-level API.
    await Concurrency.globalConcurrentExecutor.perform {
        // run long-running synchronous work
    }
}

not sure how often this would come up in practice

There has been an established practice of not running any long-running code synchronously in completion-based APIs, so perhaps it could be reestablished for async.

I think such a policy would be lacking context. It's not a good idea to have long-running completion handlers in UI applications when those completion handlers are invoked on the main thread. But a program written in continuation-passing style is composed of nothing but long-running completion handlers or async functions. That program architecture just doesn't mesh well with how most UI frameworks (including Apple's) work.

2 Likes

(This is a fast-moving discussion, so apologies for butting in to reply to this) This is actually one of the main reasons why I am +1 on this proposal. Being stuck on the calling isolation "for a bit longer" is actually very useful; for example, it allows for putting queueing logic at the top of your function, with the knowledge that within an isolation domain all calls are sequenced. The lack of this guarantee has been the cause of real-world bugs I have seen; here is a toy example:

func send(data: Data) async {
	await withCheckedContinuation { continuation in
		legacySendDataWithCompletion(data: data) {
			continuation.resume()
		}
	}
}

The bug here is that calling send(data:) in program order (let's say the two calls come from the same isolation domain) does not necessarily mean legacySendDataWithCompletion(data:completion:) is called in the same order. What might happen is that the first call to send(data:) is scheduled but not yet executed, and the second one goes through and hits legacySendDataWithCompletion(data:completion:) first. This might have observable side effects if this is doing something like sending data over a socket, where writes would now appear out of order. In this case the critical section before control is lost is (depending on how you look at it) one or even zero lines of code. But the invariant of "I called this and it hasn't done anything async yet, so surely no switching has happened yet" is pervasive, and breaking it creates concurrency issues. Of course, inheriting isolation explicitly "solves" this problem, but my point is that I believe this is how people already feel the code should behave. Especially since we bridge callbacks in as async functions which do behave this way, and from the call site it is impossible to tell them apart even though they have very different semantics.

8 Likes

I'm starting to think that I've just designed my codebase in a way that has let me avoid these issues so far. In the light of that I guess I will just accept there is something I'm missing and trust the judgement of the rest of the community :smiley:

That makes sense, I will have to ponder a bit more about it to fully grasp the consequences.


Thanks for the insight and clarifications, I hugely appreciate it.

3 Likes

That's one of my fears: that this would lead to implicit and hard-to-spot cases like this ordering, which relies on the assumed behavior of the function and is really easy to break. Not that this is a reason not to make the change, but I think this is the wrong motivation, and implying ordering guarantees in that way (as they get scattered across several functions) has never been a good idea (even in the synchronous world).

2 Likes

You would only be able to call it synchronously from within another @concurrent function. Calling it from anywhere else would have to be done asynchronously, because the caller will first switch to the generic executor before running the synchronous function. I don't think this is a huge expressivity win, but a synchronous @concurrent function is a more natural way to model the case that @Jumhyn described:

2 Likes

I read and avoided talking because I haven't carefully read your pitch, but here I want to voice a concern.

I agree that SE-0338 has provided a convenient way to address a need, which is to "write a function that does not block the main thread no matter what". It is natural to look for new conveniences.

But to me, @concurrent synchronous functions is a bad idea that looks like a good one.

When writing a CPU-intensive function, developers will always balance between marking it @concurrent or not.

Library developers will not declare them as @concurrent. For example: JSONDecoder.decode. Even though most GUI applications avoid decoding server payload on the main thread, the actual scheduling of this method should remain the client's concern. Nobody here would start questioning the declaration of JSONDecoder.decode.

Application developers will declare CPU-intensive methods as @concurrent, because it is so handy. It solves the problem at hand.

But now it becomes impossible to call this function synchronously, even outside of the main thread. For no good reason.

Tests that call this function must be made async, for no good reason.

When the developer decides to wrap the function into a library, there are only two bad options:

  1. Keeping @concurrent. Now this function is an outsider in the landscape of libraries (which do not use @concurrent even for CPU-intensive functions, because the actual scheduling of these methods should remain the client's concern). Being an outsider is not good.
  2. Removing @concurrent. Now all call sites have to be refactored. This is not good.

I do not see any good outcomes from @concurrent synchronous functions.

12 Likes

This is a good perspective. OTOH, when using this pattern internally I specifically don't want to expose an attractive entry point for the 'completely sync' version of the work, because it should roughly never be called without deliberate consideration of the scheduling behavior (in addition to the main thread, I wouldn't want it to block an actor's work).

You could accomplish this via a naming convention, e.g., doWorkSync or doWorkBlockingSync, but at the application level I've generally found it much better for internal clients if the sync version is internal and wrapped by a module which exposes just the async interface.
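A minimal sketch of the module-boundary pattern described above (all names hypothetical): the blocking implementation stays internal to the module, and only an async entry point is exposed, so clients can't casually block their actor with the synchronous version.

```swift
// internal: the blocking implementation, hidden inside the module.
/* internal */ func doWorkSync(_ input: [Int]) -> Int {
    // stand-in for long-running synchronous work
    input.reduce(0, +)
}

// public: the only entry point clients see.
/* public */ func doWork(_ input: [Int]) async -> Int {
    // Under the current (SE-0338) rules, a nonisolated async function
    // already runs on the generic executor, so this thin wrapper keeps
    // the heavy work off the caller's actor.
    doWorkSync(input)
}
```

The access-control comments mark where internal/public would go in a real module; in a single file they are written as comments so the sketch stays self-contained.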

2 Likes