The distinction goes back to the definition of parallelism vs concurrency.
When you run a parallel computation, you are saying: "I want all of the work items that make up this parallel workload to finish as quickly as possible." In Dispatch terms, that means the closure passed to concurrentPerform and the n iterations it executes are all part of one big parallel workload. We therefore have optimizations to make sure that work is spread across all cores, with intelligent work-stealing across the parallel worker threads so the workload completes quickly and efficiently. We can apply these optimizations to concurrentPerform because of the semantics of the API: it is work that needs to happen as quickly as possible. For more details, check out the "Manage Parallel workloads" section of this article.
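To make the shape of that API concrete, here is a minimal sketch; the squaring workload and the array size are just illustrative, not from anything above:

```swift
import Dispatch

// One parallel workload: square 1_000 numbers as quickly as possible.
// concurrentPerform blocks the caller until every iteration has run,
// and Dispatch spreads the iterations across the available cores.
let input = Array(0..<1_000)
var results = [Int](repeating: 0, count: input.count)

// Each iteration writes only to its own index, so there is no data race.
results.withUnsafeMutableBufferPointer { buffer in
    DispatchQueue.concurrentPerform(iterations: input.count) { i in
        buffer[i] = input[i] * input[i]
    }
}
```

Note that concurrentPerform is synchronous: when the call returns, the whole workload is done, which is exactly the "finish this as fast as possible" semantic described above.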
By contrast, concurrency simply says: "These are independent work items which can run independently." The work items can run in any order, on one or more threads, but there is no semantic notion that the workload as a whole needs to finish as quickly as possible.
A task group simply provides scoping: it defines the point by which the independent child tasks must be done so that the parent task can continue. It does not provide any additional guarantee that the workload will not be interleaved with other, unrelated tasks running on the system. A thread may execute a child task from a task group and then pick up a completely unrelated work item, even while there are still pending child tasks in the group.
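As a sketch of those scoping semantics (the function and the byte-counting "work" here are hypothetical stand-ins, not a real API):

```swift
// Child tasks may run in any order, on one thread or several, and the
// runtime is free to interleave unrelated work between them. The group
// only scopes their lifetime: the `for await` loop does not finish
// until every child task has produced its result.
func sizes(of names: [String]) async -> [Int] {
    await withTaskGroup(of: Int.self) { group in
        for name in names {
            // Stand-in for real, independent per-item work.
            group.addTask { name.utf8.count }
        }
        var collected: [Int] = []
        for await size in group {
            collected.append(size)  // arrival order is not guaranteed
        }
        return collected
    }
}
```

The only guarantee the group gives you is that all children have finished by the time the closure returns; it says nothing about which threads ran them or how quickly.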
In a parallel workload, by contrast, you would not want any unrelated work interleaved with the parallel work that you are trying to complete as soon as possible.
To go back to your question: can you create a task group and sometimes get multiple threads running that work, and therefore some speedup? Absolutely. But can you also end up with a single thread doing all of that work? Yes. I recommend not using task groups if what you are really looking for is parallelism. The best primitive available today for parallel workloads is concurrentPerform.
For more information, I highly recommend watching this segment of our previous WWDC talk which explains parallelism vs concurrency in detail: Modernizing Grand Central Dispatch Usage - WWDC17 - Videos - Apple Developer