[Pitch] Promotion of `Task.select` from swift-async-algorithms to Swift Concurrency

Are you referring to QoS overhang in dispatch? That is something we actively work on reducing as much as possible - over-proliferation of high priority is not great for the system either. I don't think we should be using that as a precedent to bring more of such behaviour to the new technologies we build.

In the case of structured concurrency, we do actually permanently escalate all child tasks, but the scoping provided by structured concurrency gives us confidence that we don't have over-proliferation, since the parent will eventually await the child.

In this case though, the tasks in the Task.select group are unstructured tasks, and once the calling task returns from Task.select(), it may never await them again. It sounds like we are hypothesizing that it will, but that is not a guarantee, and so I'm hesitant to say that we should have priority escalation support here at all.

3 Likes

Yes, I have used that function for a long time, but very rarely. Probably that's why I was surprised to see it as the next API added to the very minimal API surface of Task.

I like adding the API to Collection<Task>, but just to play devil's advocate: why not add it to Sequence or AsyncSequence? The list of tasks doesn't need to be finite in order to find the first one to complete, does it? (In case we're still talking about "finding" the first one to complete vs. removing it from a given collection.)

1 Like

Sequence is fine conceptually (granted, it may have some sort of performance hit for doing just that rather than something with an index, as John indicated). Putting it on AsyncSequence does not really make sense, since building AsyncSequences is the primary use case I am trying to support.

One other concept that I thought of last night: this could be homed on TaskGroup/ThrowingTaskGroup as an initializer, in which case deinitialization would provide a point at which we could feasibly end the priority elevation. That would lean into the concept that this API and TaskGroup are related. However, something would need to be done about the addTask APIs when the group is initialized that way.

As a complete outsider, naming wise, I think the operation is fine as a static method on Task with a slightly clearer name, something like Task.selectFirst(...) perhaps. I think I'd be more likely to look for it on Task rather than as an extension method on a collection of tasks.

6 Likes

First off, I definitely agree with the general need to move this into the concurrency library :+1:

I'll go with some quick points to keep it short :slight_smile:


Where to put it and names:

  • It'd be nice to be on Task IMHO, since Task is about unstructured tasks, so it fits right in.

  • It would not fit on TaskGroup since it is about structured tasks, which this operation isn't about.

  • I personally like Task.select as-is to be honest, because of lots of prior art like that, but we can explore longer names.

  • The equivalent feature exists in Scala futures and is called Future.firstCompletedOf[T](futures:) -> Future[T] (scaladoc). This naming is useful since it makes clear that it returns the first completion, regardless of whether the first one to complete was a failure (!)

    • so in Swift that'd be Task.firstCompleted(of: [])
    • this means there can also be firstSuccessful(of: [...]) (or successful, whichever spelling you prefer). This version ignores failures unless they all fail, in which case the last failure is returned. Kind of like "keep trying until one of them succeeds" (see the sketch after this list).

Well said on the table comparing withTaskGroup and Task.select; that table should make it into the API docs and reference guide (the Swift book).

  • Some version on some collections might be nice... no super strong feelings there, other than that the discoverability of those will suffer. I'd perhaps consider them additions to make the use of Task.select nicer, but not the core API?
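To make the Future.firstCompletedOf analogy concrete, here's a rough sketch of how those two spellings could be layered on top of the swift-async-algorithms Task.select(_:) that takes a sequence of tasks. The names firstCompleted(of:)/firstSuccessful(of:), the exact signatures, and the empty-input behavior are all hypothetical, not proposed API:

```swift
import AsyncAlgorithms // provides the Task.select(_:) being pitched here

extension Task {
    /// Returns the first task to finish, whether it finished with a value or an error,
    /// mirroring Scala's Future.firstCompletedOf.
    static func firstCompleted(
        of tasks: some Sequence<Task<Success, Failure>> & Sendable
    ) async -> Task<Success, Failure> {
        await Task.select(tasks)
    }
}

extension Task where Failure == Error {
    /// "Keep trying until one of them succeeds": ignores failures unless every task
    /// fails, in which case the last observed failure is rethrown.
    static func firstSuccessful(
        of tasks: some Sequence<Task<Success, Failure>> & Sendable
    ) async throws -> Success {
        var remaining = Array(tasks)
        var lastError: Error?
        while !remaining.isEmpty {
            let finished = await Task.select(remaining)
            remaining.removeAll { $0 == finished } // Task is Hashable/Equatable
            switch await finished.result {
            case .success(let value): return value
            case .failure(let error): lastError = error
            }
        }
        // Placeholder behavior for empty input; a real API would have to pick something.
        throw lastError ?? CancellationError()
    }
}
```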

Priority:

  • Based on how we used this operation with Scala futures, I don't think the use case for this operation is to also await the other tasks.
  • The usual use case here, I believe, is to get the "first" one and "I don't care about the rest".
    • as such, I don't think we necessarily need to boost priorities here.
    • rather, we could consider an overload that cancels all tasks as it returns... but either way, just doing this with defer { tasks.forEach { $0.cancel() } } works fine (sketched below), so perhaps no overload is needed...
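A minimal sketch of that cancel-on-return pattern, assuming the swift-async-algorithms Task.select(_:); the firstValue name is made up for illustration:

```swift
import AsyncAlgorithms // provides Task.select(_:)

/// Returns the value of whichever task finishes first and cancels the rest on the way out.
/// Cancelling the winner too is harmless, since it has already completed.
func firstValue<Success: Sendable>(_ tasks: [Task<Success, Never>]) async -> Success {
    defer { tasks.forEach { $0.cancel() } }
    let winner = await Task.select(tasks)
    return await winner.value
}
```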

I'm not sure what you mean by a more efficient implementation inside the stdlib - do you just mean inlining behavior or something similar or actually implementing this on a lower level in C deeper in the Concurrency runtime?

5 Likes

Actually, I think that will be uncommon. The use cases will eventually get all the tasks; consider merge or debounce as good references for where that occurs. Granted, some cases may cancel things, like what you described.

The primary cost is that, to accomplish this now, I have to create N+1 additional tasks to select on N tasks. With access to the C++ backing, I think it can be done with at most 1 additional task. My guess is that, performance-wise, that is the most costly part of it in real-world usage (and not in contrived microbenchmarks).
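For illustration, a deliberately naive sketch (not the actual swift-async-algorithms implementation) of why a userland select over N tasks needs roughly one extra task per input today:

```swift
/// Serializes the "who finished first" decision so only one watcher resumes the continuation.
actor FirstClaim {
    private var claimed = false
    func claim() -> Bool {
        if claimed { return false }
        claimed = true
        return true
    }
}

/// Naive userland select: one watcher task per input plus the suspended caller.
/// Cancellation handling and the empty-input case are omitted for brevity.
func naiveSelect<Success: Sendable, Failure: Error>(
    _ tasks: [Task<Success, Failure>]
) async -> Task<Success, Failure> {
    let gate = FirstClaim()
    return await withCheckedContinuation { (continuation: CheckedContinuation<Task<Success, Failure>, Never>) in
        for task in tasks {
            // Each of these watchers is one of the N additional tasks; a runtime-level
            // implementation could hook task completion directly instead.
            Task {
                _ = await task.result
                if await gate.claim() {
                    continuation.resume(returning: task)
                }
            }
        }
    }
}
```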

4 Likes

Hm, I see where you're coming from -- implementations of stream operators. While that's true there, a top-level operation of "fire a bunch of requests, just wait for one" is more how one would use Task.select "stand alone", and how I thought of it.

I guess it comes down to: both cases are common. Not canceling is fine then, since whoever cares can do it manually; boosting I'm not so sure about, to be honest, but it seems @rokhinip also wasn't so keen on boosting, so perhaps let's just not :+1:

Overall sounds good :+1:

If we think that’s common, should we have some way to do it explicitly, like with an AsyncSequence or something similar? Or maybe a way to wait on all of them?

I interpreted the response a bit differently; the objection was more that doing the permanent boost was distasteful. If a daemon or other QoS-sensitive scenario can't use things like debounce or merge because of mysterious quality-of-service reasons, I think that would be less than ideal. I am quite certain we can do better than that.

I don't understand how it could be an AsyncSequence when what we are trying to build is the tool to make AsyncSequences that use this behavior.

Eventually, after every pass of the iteration is done, they all should be awaited. But a single call, by its very nature, cannot await all of them; otherwise we would have exactly the behavior that makes TaskGroup unsuitable for these types of tasks.

I guess I'm wondering what the right high-level API is, given that we're talking about a low-level facility for waiting for one of a set of tasks to complete. Because if the expected high-level usage is that we're eventually going to wait for all of the tasks to complete (maybe there's some way to back out of that, but the initial expectation is that we're waiting for all of them), then it seems like there are two interesting points:

  • Escalating all of the tasks to the priority of the waiting task is much more obviously the right thing to do.
  • It would be really good to design this API to avoid needing to repeatedly set up the wait with iteratively smaller sets, because that's quadratic work; you really want to set up the wait once and then terminate it on request.
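For concreteness, the repeated-setup pattern being cautioned against looks roughly like this hypothetical drain loop over the swift-async-algorithms Task.select(_:); each round re-registers a wait on every remaining task, so n tasks cost on the order of n + (n-1) + ... + 1 setups:

```swift
import AsyncAlgorithms // provides Task.select(_:)

/// Drains tasks in completion order by repeatedly selecting over the shrinking remainder.
/// Every iteration sets up the wait from scratch on all remaining tasks: quadratic setup overall.
func drainInCompletionOrder<Success: Sendable>(
    _ tasks: [Task<Success, Never>]
) async -> [Success] {
    var remaining = tasks
    var results: [Success] = []
    while !remaining.isEmpty {
        let winner = await Task.select(remaining)
        remaining.removeAll { $0 == winner } // Task is Hashable/Equatable
        results.append(await winner.value)
    }
    return results
}
```

An API that registers the wait once and then hands back completions as they occur would avoid that repeated setup.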

5 Likes