Doug's concurrent merge sort + actor counters examples

Over in the async/await proposal review, Doug posted a toolchain and some examples using async/await. I have some comments and observations, but they have nothing to do with that proposal review, so I broke them out here.


In the Actor Counters example, several questions:

  1. It looks like Counter.scratchBuffer is both leaked and used uninitialized, and isn't actually useful. It isn't clear what it is for; should it just be dropped?

  2. Similarly, the worker part of the code has its own scratchBuffer. It is deallocated but never used, so why does it exist? Should it be dropped from the example?


On the concurrent merge sort, I find the async let syntax to spawn a subtask to be too subtle:

func mergeSort<T: Comparable>(_ array: [T]) async -> [T] {
  guard array.count > 1 else { return array }
  let middleIndex = array.count / 2

  async let leftArray = await mergeSort(Array(array[0..<middleIndex]))
  async let rightArray = await mergeSort(Array(array[middleIndex..<array.count]))

  return merge(await leftArray, await rightArray)
}
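
The `merge` helper isn't shown in the excerpt. For readers following along, a conventional two-pointer implementation (my sketch, not necessarily what the gist contains) would look like:

```swift
// Hypothetical implementation of the `merge` helper used above:
// a standard two-pointer merge of two already-sorted arrays.
func merge<T: Comparable>(_ left: [T], _ right: [T]) -> [T] {
  var result: [T] = []
  result.reserveCapacity(left.count + right.count)
  var i = 0, j = 0
  while i < left.count && j < right.count {
    if left[i] <= right[j] {
      result.append(left[i]); i += 1
    } else {
      result.append(right[j]); j += 1
    }
  }
  // At most one of these appends anything.
  result.append(contentsOf: left[i...])
  result.append(contentsOf: right[j...])
  return result
}
```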

This is a powerful example, but there is a lot going on in the leftArray/rightArray lines, and it isn't clear to me how this will compose with other things. In the case of the actor proposal, actor message sends are always cleanly scoped and explainable as part of the message to the actor: in someActor.foo(x), the transfer of x to the actor is part of the message invocation and is immediately obvious.

In the example above, tasks are created, values are passed, a future is formed, and it is all implicit. Let me rewrite a bit of the example above; it is equivalent to:

  let tmp = Array(array[middleIndex..<array.count])
  async let rightArray = await mergeSort(tmp)
  return merge(await leftArray, await rightArray)

From what I can tell, the bottom two lines are doing something like this (where spawnTask is probably Task.runDetached?):

  let rightArrayFuture = spawnTask { await mergeSort(tmp) }
  return merge(await leftArrayFuture.get(), await rightArrayFuture.get())
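
In terms of the API that eventually shipped, the same shape can be written with `Task.detached` (which replaced the prototype's `Task.runDetached`). Note this is only an approximation of what `async let` does: detached tasks are not child tasks, so cancellation and priority do not propagate automatically. A sketch:

```swift
// Rough library-level equivalent of the desugaring above, written
// against the shipping `Task.detached` API. This is an approximation:
// unlike `async let`, these are unstructured tasks, not child tasks.
func mergeSortDetached<T: Comparable & Sendable>(_ array: [T]) async -> [T] {
  guard array.count > 1 else { return array }
  let middleIndex = array.count / 2
  let left = Array(array[0..<middleIndex])
  let right = Array(array[middleIndex...])

  // Each "future" is a Task handle; awaiting `.value` joins it.
  let leftFuture = Task.detached { await mergeSortDetached(left) }
  let rightFuture = Task.detached { await mergeSortDetached(right) }
  return merge(await leftFuture.value, await rightFuture.value)
}
```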

Why is this sugar necessary?

I'm asking for two reasons:

  1. Reducing magic to make the model more understandable is important.
  2. The magic makes it more difficult to diagnose errors.

Given that this is forming a subtask, transfers between it and the current task will need to be checked for compatibility with ActorSendable to ensure memory safety. How are these errors going to be explained to the user?

Furthermore, I thought that one of the design points of the concurrency story was that there would be multiple underlying implementation models for the actor runtime that are possible. Without making the spawning of a task an explicit library call, how is this replaced? What happens when we want new forms of spawning?


That example is also a bit suboptimal because it is doing tremendous amounts of copying of the array data. I understand that the original example was doing that too, but I think another good example would be to look at the equivalent code for an in-place version of this using an UnsafeBufferPointer into an Array. Where do things like @UnsafeTransfer go in this task-spawning design?

-Chris


In-place merge sort is possible but really, really complicated; you're basically asking Doug to either spend hours on this not-real example or to rewrite the code to use a different sort.

EDIT: Also, given that this isn't meant to be library code, I'm not sure what the point would be.

I think the point is this. Most innocent developers, seduced by keyword compiler magic, will probably ship code like this and think that the solution is fine when it actually has performance problems that are difficult to understand and diagnose (especially in the absence of first class algorithms implemented by specialists in the library).


That has always been a problem, even more so for callback-based and lock-based designs; async/await wouldn't solve it. It only aims for safety (so that you always remember to protect state) and more explicit control flow.


I thought the subtask would just run its partial tasks serially in the same context as the current task. No need for ActorSendable for this.

Although if that's the case it makes the whole example rather pointless: spawning subtasks for a purely compute operation that only awaits on its own computation results all running in the same thread. Perhaps the example is just misleading.

Correction: not entirely pointless since if you were running it on the main thread (for instance) it'd be able to interleave event handling with your sorting operation.

async let exists specifically to introduce concurrency. (Clarified downthread.)

Is there a difference between these two examples?

// func f()

// 1
await Task.runDetached {
    f()
}

// 2
async let task = f()
await task

Likewise:

// func g() async

// 3
await g()

// 4
async let task = g()
await task

If examples 3 and 4 behave differently then that concerns me. You should be able to split off the continuation without changing the context in which the function is called.

I would rather forbid example 2 than make 3 and 4 behave differently. I don’t personally see much value in supporting case 2 when case 1 is clearer and seems to meet all the same use cases.

Scheduling partial tasks on a run loop is a form of concurrency. It's not preemptive, nor parallelism, but it is concurrency.

I see this has been clarified elsewhere: On the proliferation of try (and, soon, await) - #85 by Douglas_Gregor

So the answer is that it can either run in parallel, or it can be serialized, and you can't really distinguish this from the code. I feel we're missing this distinction in the language that'd allow us to request one or the other.

I guess we could have:

async(serial) let leftArray = await mergeSort(Array(array[0..<middleIndex]))

or:

async(concurrent) let leftArray = await mergeSort(Array(array[0..<middleIndex]))

and the latter would require ActorSendable arguments and perhaps a mergeSort that is somehow isolated from global state.


How is it difficult? It is exactly the same algorithm, just passing UnsafeBufferPointers down to the child tasks. The merge is serial for each level of recursion just like it is in the implementation he posted.

-Chris

It has nothing to do with concurrency. If I remember correctly, it is not possible to merge two adjacent sorted arrays in-place in the general case without either linear scratch space, a quadratic number of moves, or re-sorting a portion of the array. There is an attractive idea of swapping "unwanted" elements from the first half into the start of the second half, but it falls apart if you try to work it out. Consider what happens if you take an element from the second half twice, then one from the first half (i.e. one of the two elements you've moved into the second half) — you start to see how the bookkeeping for tracking the "sorted" elements that have been moved out of the first half gets arbitrarily complex (unless you keep the moved elements in order, which requires O(N) insertions at each step).

There are non-merge sorts that you can do in-place, of course.
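
To make the linear-scratch-space point concrete, here is the standard workaround sketched in Swift (my illustration, not code from the thread): merging two adjacent sorted runs of an array through an O(n) auxiliary buffer.

```swift
// Sketch: merge the two adjacent sorted runs a[lo..<mid] and a[mid..<hi]
// "in place" by going through O(hi - lo) auxiliary scratch space --
// the linear scratch space referred to above.
func mergeAdjacentRuns<T: Comparable>(_ a: inout [T], lo: Int, mid: Int, hi: Int) {
  var scratch: [T] = []
  scratch.reserveCapacity(hi - lo)
  var i = lo, j = mid
  while i < mid && j < hi {
    if a[i] <= a[j] { scratch.append(a[i]); i += 1 }
    else            { scratch.append(a[j]); j += 1 }
  }
  scratch.append(contentsOf: a[i..<mid])
  scratch.append(contentsOf: a[j..<hi])
  // Copy the merged run back over the original range.
  a.replaceSubrange(lo..<hi, with: scratch)
}
```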


Assuming for a moment this algorithm can work in place, I think you also need to add logic to align the split to word boundaries.

I'm confused. Are we talking about preallocated linear-space merge sort or sub-linear space merge sort?

That's mostly detritus from me messing around with verification. I've cleaned up the gist a bit.

The sugar is doing a couple of things that are hard to guarantee with a library solution:

  1. The async let is executed in a child task whose lifetime is tied to the lexical scope of the async let. When we exit the scope, the child task is implicitly canceled. This maintains the task-tree structure implied by structured concurrency.
  2. The type of the variables in the async let is the type of the future's result. An async let is a let, but with the initializer's computation performed concurrently.
  3. There are a couple of syntactic micro-optimizations here (to drop the need for await/try on the right-hand side) that we can only really do in the language.
  4. The compiler and optimizer can reason about async let because it provides a static description of the child tasks within a given scope, allowing us to (e.g.) allocate all of the various async call frames together rather than separately.

Taking these backwards, (4) could perhaps be done with compiler heroics and/or specialized annotations (like we do with Array). (3) we could live without; at worst, it's just a little noise. (2) seems like the kind of thing one can do with a child-task-spawning property wrapper. I've tried that, because I'd really like it to work out, but really this brings us to (1): we do not have a way in Swift to create an instance that's effectively pinned to its stack frame: one that can't be captured in an escaping closure, can't be copied out of its location, etc. If we had one of those, then something like what you suggest could spawn a child task:

let leftArrayFuture = spawnChildTask { await mergeSort(tmp) }

but limit it to maintain the properties of structured concurrency:

print(leftArrayFuture) // error, you can't copy it anywhere
let closure = { ... leftArrayFuture } // error, you can't capture it in an escaping closure
return leftArrayFuture // error, you can't return it

If we had such a thing, async let would be sugar for it. But async let is both syntactic sugar and semantic restriction tied together, providing a lightweight way to introduce concurrency via child tasks while keeping you solidly within the confines of structured concurrency.

Task groups try to do the same, because they keep track of the child tasks you create within the task group's scope (its closure) and make sure they finish before the closure exits. However, they're syntactically heavyweight (you need to introduce a new closure to act as the "scope" for the task group) and require users to promise not to break the structured-concurrency model (e.g., by copying the task group out).
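
For comparison, the "syntactically heavyweight" task-group formulation of the merge sort upthread might look roughly like this, sketched against the `withTaskGroup` API that eventually shipped (details differed in the prototype):

```swift
// Hypothetical task-group version of the merge sort from upthread.
// The `withTaskGroup` closure acts as the scope: every child task added
// to the group is awaited (or cancelled) before the closure returns.
func mergeSortGrouped<T: Comparable & Sendable>(_ array: [T]) async -> [T] {
  guard array.count > 1 else { return array }
  let middleIndex = array.count / 2
  return await withTaskGroup(of: (Int, [T]).self) { group in
    // Tag each child result so we can reassemble in order.
    group.addTask { (0, await mergeSortGrouped(Array(array[0..<middleIndex]))) }
    group.addTask { (1, await mergeSortGrouped(Array(array[middleIndex...]))) }
    var halves = [[T]](repeating: [], count: 2)
    for await (index, sorted) in group {
      halves[index] = sorted
    }
    return merge(halves[0], halves[1])
  }
}
```

Note how much ceremony the explicit scope costs relative to two `async let` lines, which is exactly the trade-off being discussed.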

Sure, you mean something like this?

  func test() async {
    var array = [1, 2, 3]

    async let lastElement = array.removeLast()  // DATA RACE!
  }

The prototype implementation currently produces the following:

race.swift:12:29: warning: local var 'array' is unsafe to reference in code that may execute concurrently
    async let lastElement = array.removeLast()
                            ^
race.swift:10:9: note: var declared here
    var array = [1, 2, 3]
        ^

This is the same checking described in the closures and local functions part of the actors pitch. Essentially, we treat the initializer of an async let as running concurrently with any other code, so references to local vars (or, with ActorSendable in the mix, local let's that are not ActorSendable) are flagged as problematic.

Note that the checking works just as well if you use some library solution to launch a task, whether it's a child task or a detached one, because it's based on establishing whether the closure "may execute concurrently with" the enclosing context. It helps protect actor state as well as local variables.

Don't read too much into the fact that it's a warning; that's just staging in the prototype. It can be a hard error.

I don't think task creation is meant to be replaced. Most of the customization we're looking at involves enqueueing tasks on actors or replacing executors.

We can tweak my example a bit to make @UnsafeTransfer relevant:

  func test() async {
    let object = MyClass()
    @UnsafeTransfer let object2 = MyClass()

    async let x = object.doSomething()    // error, object used from concurrent code has non-ActorSendable type MyClass
    async let y = object2.doSomething()   // okay, the compiler assumes no data race because object2 is marked @UnsafeTransfer; otherwise it would complain
  }

Doug

[EDIT: I realized you had more questions, so tacking on to this]


Thx!

Right but this is sugar for one part of the more general task group APIs. This desire for lexical scope checking isn't new or specific to this proposal: we ideally want the same thing in the withUnsafePointer APIs, withoutActuallyEscaping, etc. In practice, those APIs aren't graced with syntactic sugar and don't provide that static guarantee of correctness, but they are highly precedented and safe enough in practice.

My understanding is that the failure can still be handled "safely" with a dynamic check. I bet that future work on ownership types would even allow directly expressing the nice thing with static checks.

I agree that this sort of syntax optimization requires a language feature, but I personally don't think it justifies one. The type obfuscation and magic behavior may look great in small examples, but they add surprising behavior.

I don't see why this should be required: unless I'm missing something, a SIL pass should be able to do the same thing with a simple escape analysis (similar to stack allocation of classes). This would be a generally useful thing, instead of being tied to one syntax.

But the base proposal already has such a thing in its task group API.

Right, this is a sugar only feature :slight_smile:

Awesome, that's great, I'm glad that checking is in place. Is there any reason you prefer this to be a special parallel reimplementation of the same logic, instead of having it just fall out of the existing closure logic?

Ok, but shouldn't it be possible to have multiple executors and for them to interoperate?

This is one of the big ideological arrows through the set of proposals that I still don't understand. The proposals seem to imply that actors and structured concurrency will be "the" way to do concurrency in Swift. This is exciting, but of course the reality is that we'll have code using them with a mix of pthreads, GCD, and lots of other ad-hoc stuff over time. I don't see an obvious reason why the notion of "structured concurrency" needs to be tied to "the global executor" as proposed - it seems simpler and more flexible to keep these things orthogonal.

Such an approach seems simpler and easier to explain, allows partial adoption of new features (e.g. when you have an existing executor-like-thing you have to use), and forces more orthogonality into the design. For example, less special cases for the "spawn one thing" case means that the general task group API would benefit as well.

Soapbox: "Simple things that compose" is an important design principle that helps shape "today" problems well while allowing for unplanned "future" problems to work out well. Monolithic designs can be finely honed for the today case, but are less adaptable as the language and world around it evolves.

Ok, I'm glad that there is some answer here, but I don't see why this is a great design: the transfer doesn't happen at the declaration of the variable, it happens when the async let is executed, and the two points can be very distant in the code. With this approach, the attribute would be better spelled UnsafeTransferrable, since it is a property of the declaration and not a property of a use. Furthermore, this approach really doesn't work well with rvalues in general: things like values returned from functions, arguments to the current function that are being passed into the async let, etc.

In contrast, if you go with a closure based design (of any sort), you'd have this behavior:

  func test(object3: MyClass) async {  // Arguments are fine!
    let object = MyClass()
    let object2 = MyClass()  // Ok, these are just declarations.
    ...
    let x = spawn { object.doSomething() }    // error, object used from concurrent code has non-ActorSendable type MyClass
    let y = spawn { [@UnsafeTransfer object2] in object2.doSomething() }

    // Works with arbitrary rvalues
    let z = spawn { [@UnsafeTransfer object3] in object3.doSomething() }
    let z2 = spawn { [@UnsafeTransfer object4 = foo()] in object4.doSomething() }
  }
I think it is very nice that this all dovetails with existing syntax and programming language concepts we already have, rather than introducing a basket of new special cases and concepts. It also clearly delineates where the concurrency boundaries / captures are and allows library-based extensibility.

In any case, I can see that there are tradeoffs here. Given that this is sugar, I would recommend splitting this out of the base "structured concurrency" feature and tackle this in a follow up once we get some experience with the base proposal.

-Chris


To be clear, it's not a "special parallel reimplementation"; it's syntactic sugar over a concurrent closure.

The whole problem here is "transfer". This is part of the reason I've been moving toward "shareable" or just "concurrent value" terminology.

I see the appeal of putting the attribute on the closure capture, because it sorta mirrors putting it on a parameter to say "yes, I know this specific instance is concurrently unsafe". OTOH, this isn't a place where we have attributes today, capture lists aren't that common, and this attribute---whatever it is called---is useful for more than captured local variables.

Doug

Ok, I don't think that really affects the point though. SharableValue makes sense as a value that gets captured, SharableDeclaration would make sense for an attribute on a let declaration.

I agree that capture lists aren't that common, but neither should the need for unsafe transfer attribute (whatever it is called), so I don't see how that is a problem.

I also agree with you that we don't currently have attributes here, but we clearly can. It composes properly, and is a natural extension of the model.

Further, you didn't really address any of the points I made.

-Chris

Perhaps I wasn't direct enough: putting it on the capture list only works for local captures. This same notion is also important on properties, functions, subscripts, and anything else that's potentially used from outside the actor's isolation domain.

Most of the rest of your note is essentially reasons why you want async let to be split out from Structured Concurrency, with many claims of it having special cases (most of which are wrong). The second pitch ensures that async let is presented as syntactic sugar over task groups so we can reason about just the one thing, understand what the syntactic sugar buys, and remove the syntactic sugar if that's the right thing to do. I don't consider going point-by-point to be useful to move the needle forward here.

Doug

Setting async let aside:

I really don't understand what you mean here. Can you please provide some examples?

What is your recommendation?

-Chris