Fix quadratic performance of the `ListMerger` in specific usage pattern by nickolas-pohilets · Pull Request #70910 · apple/swift
In this example, the number of tasks is determined by the tree structure. Tasks are queued on the actor or the global executor. Since tasks are executed in FIFO order, the tree is visited in breadth-first order, so the size of the queue is determined by the dynamics of the breadth-first traversal.
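A minimal sketch of that pattern (hypothetical `Node` type; the actual async-deinit machinery is not shown): each visit enqueues one child task per subtree on the same actor, and because the actor drains its queue in FIFO order, the traversal is breadth-first and the pending-task count peaks at the widest tree level.

```swift
final class Node {
    let children: [Node]
    init(children: [Node] = []) { self.children = children }
}

// Visiting a node enqueues a separate task per child on the main actor.
// FIFO execution means all nodes at depth d run before any at depth d+1,
// so the queue's high-water mark is the width of the widest level.
@MainActor
func visit(_ node: Node) {
    for child in node.children {
        Task { @MainActor in   // enqueued behind all already-pending jobs
            visit(child)
        }
    }
}
```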
In some cases, a task can be a suboptimal queue entry. E.g. I see that `flagAsAndEnqueueOnExecutor` calls `_swift_task_alloc_specific`, triggering allocation of the first slab before the task starts executing. This can probably be optimised.
But once a task starts executing, it may be suspended, and I expect this to be pretty common for async deinit. Suspended tasks cannot be reused and must be kept in memory. If we artificially limit the number of concurrent tasks available for async deinit, then we would be underutilising hardware resources because of the suspended tasks, as the sketch below illustrates.
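A hedged illustration of that concern (hypothetical `AsyncSemaphore`, not an existing API): cap async-deinit work with a fixed pool of permits, and once the in-flight tasks all hit a suspension point, the permits are held by tasks doing no work, so no further destruction can proceed even on idle cores.

```swift
actor AsyncSemaphore {
    private var permits: Int
    private var waiters: [CheckedContinuation<Void, Never>] = []
    init(permits: Int) { self.permits = permits }

    func acquire() async {
        if permits > 0 { permits -= 1; return }
        await withCheckedContinuation { waiters.append($0) }
    }

    func release() {
        if waiters.isEmpty { permits += 1 }
        else { waiters.removeFirst().resume() }
    }
}

let pool = AsyncSemaphore(permits: 4)
for _ in 0..<1_000 {
    Task {
        await pool.acquire()
        // A suspended task keeps its permit *and* its memory: with all
        // four permits held at suspension points, the CPU sits idle even
        // though hundreds of deinits are still waiting for a slot.
        try? await Task.sleep(nanoseconds: 100_000_000)
        await pool.release()
    }
}
```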
And what should happen when execution reaches a point where an object with async deinit needs to be destructed, but a new task cannot be started? Should we create and enqueue some kind of task-launcher job? Assuming that we can avoid creating the task allocator until the task actually starts executing, I think such a job would not be much different from the task itself. I think it is better to invest in making a non-started task as cheap as possible.