TaskGroup vs an array of tasks

What’s the difference between using a TaskGroup vs an array (or dictionary, etc.) of tasks?

I understand that a TaskGroup lets me retrieve task results “continuously” (I get the result of whichever task finishes first, then the next, and so on), as opposed to an array of tasks, where I cannot know which task will finish first and have to decide the order in which I await them.

But if I need the results of all the tasks anyway, are there any other hidden benefits of a task group that should make me prefer it over simply storing the tasks in an array and awaiting them in order?

Correct, the defining feature of a group is collecting results in completion order. You cannot write this as efficiently using other techniques (plain Task{} plus a stream, etc., will be heavier in both memory and scheduling than what a group does).

You'd have to do one of two things if you did not want to use a group:

  1. run the work one by one, no parallelism:
for work in works { 
  let t = await work.work()
  things.append(t)
}

That is meh since it is not parallel at all. So you might write this instead:

  2. some (unbounded - meh) parallelism:
for work in works { 
  Task {
    let t = await work.work()
    await self.append(t)
  }
}

func append(_ t: T) async { // unlabeled parameter, to match the append(t) call above
  self.things.append(t)
}

which is meh for a number of reasons:

  • unbounded parallelism is meh in general, this just throws all the tasks at the scheduler without much control over how many are in flight at any point in time (whereas implementing such limiting is simple in a group)
  • you had to use unstructured tasks (Task{}) which are heavier than task group created tasks (group.addTask{} - this is a child task and is very efficient)
    • you're missing out on structured concurrency entirely: propagation of task-locals is more expensive, and there is no guarantee whatsoever that all tasks complete before you "proceed", whereas tasks in a group keep the group waiting until they all complete
  • you cannot collect the results on the task kicking off the work easily... so you'll either pay additional hops like shown above, or you'll have to invent your own way to message into an async stream from there...

So... use a group instead for those patterns, it handles them very well.

await withTaskGroup(of: T.self) { group in
  for work in works {
    group.addTask { await work.work() } // efficient child task
  }
  for await t in group { // efficiently gathering in completion order
    // back in calling task
    things.append(t)
  }
} // always guaranteed to have drained all the tasks
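
The point about limiting how many tasks are in flight could look roughly like the sketch below. It is only an illustration under assumptions: `boundedMap`, `limit`, and `work` are hypothetical names, not part of the standard library, and the results still come back in completion order.

```swift
// Hypothetical helper: applies `work` to every input with at most
// `limit` child tasks running at any point in time.
func boundedMap<Input: Sendable, Output: Sendable>(
  _ inputs: [Input],
  limit: Int,
  _ work: @escaping @Sendable (Input) async -> Output
) async -> [Output] {
  precondition(limit > 0, "limit must be positive")
  return await withTaskGroup(of: Output.self) { group in
    var results: [Output] = []
    results.reserveCapacity(inputs.count)
    var iterator = inputs.makeIterator()

    // Start the first `limit` child tasks (fewer if there are not enough inputs).
    for _ in 0..<limit {
      guard let input = iterator.next() else { break }
      group.addTask { await work(input) }
    }

    // Each time a child finishes, collect its result and start the next one,
    // so the number of running children never exceeds `limit`.
    while let output = await group.next() {
      results.append(output)
      if let input = iterator.next() {
        group.addTask { await work(input) }
      }
    }
    return results // completion order, not input order
  }
}
```

Called as `await boundedMap(urls, limit: 4) { await fetch($0) }` (with `urls` and `fetch` standing in for your own data and function), this never has more than four child tasks running at once.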

Thanks for this.

I had a third possibility in mind, which does not retrieve the results in completion order:

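let tasks = works.map { work in Task { await work.work() } } // name the parameter: $0 would bind to the inner Task closure
var res = [ResType]()
for task in tasks {
   res.append(await task.value) // work() does not throw here, so no try is needed
}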

This has the cool property of keeping the results in the same order as the works, which makes it easy to map each work to its result, for instance.

But like you said it throws all the tasks at the scheduler…

So what should I do if I need to map the task to the value?

So in the end I did use a TaskGroup, and did this:

  • Each task in the group will return a tuple of the expected offset in the final array + the computed value;
  • Once all the tasks have returned their result and I have them stored in an array, I sort that array by offset, map it to strip the offsets, and return the result.

Example here: CollectionConcurrencyKit/CollectionConcurrencyKit.swift at 1105df41b66823c09a1f3bfcf754ae04287e70ea · happn-app/CollectionConcurrencyKit · GitHub (hopefully merged by John Sundell in the original repo).
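
For readers who do not follow the link, here is a minimal sketch of that offset-then-sort idea. It is not the code from the linked repository: `orderedConcurrentMap` is a hypothetical name, and the sketch assumes a non-throwing transform.

```swift
extension Sequence where Element: Sendable {
  // Hypothetical helper: runs `transform` concurrently in a task group and
  // returns the results in the original element order.
  func orderedConcurrentMap<T: Sendable>(
    _ transform: @escaping @Sendable (Element) async -> T
  ) async -> [T] {
    await withTaskGroup(of: (Int, T).self) { group in
      // Each child task returns its offset together with the computed value.
      for (offset, element) in self.enumerated() {
        group.addTask { (offset, await transform(element)) }
      }

      // Results arrive in completion order.
      var tagged: [(Int, T)] = []
      for await pair in group {
        tagged.append(pair)
      }

      // Sort by offset, then drop the offsets.
      return tagged.sorted { $0.0 < $1.0 }.map { $0.1 }
    }
  }
}
```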

If you're going to iterate at the end using map anyway, then you'd be better off just doing it upfront and using an array with the correct count, instead of the equivalent of a dictionary, which then has to be sorted.

return try await group.reduce(
  into: map { _ in nil } as [T?]
) {
  $0[$1.offset] = $1.element
} as! [T]
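
The same idea, written out as a self-contained sketch with a plain loop instead of `reduce`. The name `slottedConcurrentMap` is hypothetical, and the sketch assumes a non-throwing transform. Each result is written straight into its slot, so no sort is needed at the end.

```swift
extension Sequence where Element: Sendable {
  // Hypothetical helper: like a concurrent map, but fills a pre-sized array
  // by offset instead of sorting tagged results afterwards.
  func slottedConcurrentMap<T: Sendable>(
    _ transform: @escaping @Sendable (Element) async -> T
  ) async -> [T] {
    let elements = Array(self) // materialize once so the count is known
    return await withTaskGroup(of: (Int, T).self) { group in
      for (offset, element) in elements.enumerated() {
        group.addTask { (offset, await transform(element)) }
      }

      // One empty slot per element; each child result fills exactly one slot.
      var slots = [T?](repeating: nil, count: elements.count)
      for await (offset, value) in group {
        slots[offset] = value
      }

      // The group has been fully drained, so every slot is filled.
      return slots.map { $0! }
    }
  }
}
```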