Async variant of `withUnsafeTemporaryAllocation`?

A few times I've found myself wanting to concurrently map over the elements of a sequence and insert the results into an `Array` or `IdentifiedArray`. The code ends up looking something like this.

// (Defined in an extension on Sequence where Element: Sendable.)
func concurrentMap1<NewElement: Sendable>(
    _ transform: @escaping @Sendable (Element) async -> NewElement
) async -> [NewElement] {
    typealias IndexedValue = (index: Int, element: NewElement)

    return await withTaskGroup(of: IndexedValue.self) { taskGroup in
        for (index, value) in self.enumerated() {
            taskGroup.addTask {
                (index, await transform(value))
            }
        }
        var indexedValues = [IndexedValue]()
        for await indexedValue in taskGroup {
            indexedValues.append(indexedValue)
        }
        indexedValues.sort(by: { $0.index < $1.index })
        return indexedValues.map(\.element)
    }
}

Allocating a whole new array just to sort it isn't very efficient, though. You can go faster by using unstructured tasks rather than a task group, but in my experience the fastest approach was to use an unsafe buffer.

// (Defined in an extension on Sequence where Element: Sendable.)
func concurrentMap2<NewElement: Sendable>(
    _ transform: @escaping @Sendable (Element) async -> NewElement
) async -> [NewElement] {
    typealias IndexedValue = (index: Int, element: NewElement)

    return await withTaskGroup(of: IndexedValue.self) { taskGroup in
        var count = 0
        for (index, value) in self.enumerated() {
            taskGroup.addTask {
                (index, await transform(value))
            }
            count += 1
        }
        let buffer = UnsafeMutableBufferPointer<NewElement>.allocate(capacity: count)
        defer { buffer.deallocate() }
        for await (index, value) in taskGroup {
            buffer.initializeElement(at: index, to: value)
        }
        return Array(buffer)
    }
}
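For comparison, here is roughly what the unstructured-tasks variant mentioned above could look like. This is a sketch only: the name `concurrentMapTasks` and the extension context are my own, and it hasn't been benchmarked here.

```swift
extension Sequence where Element: Sendable {
    // Spawn one unstructured Task per element up front, then await the
    // results in order. Order is preserved for free, so no sorting or
    // index bookkeeping is needed.
    func concurrentMapTasks<NewElement: Sendable>(
        _ transform: @escaping @Sendable (Element) async -> NewElement
    ) async -> [NewElement] {
        let tasks = self.map { element in
            Task { await transform(element) }
        }
        var results = [NewElement]()
        results.reserveCapacity(tasks.count)
        for task in tasks {
            results.append(await task.value)
        }
        return results
    }
}
```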

Having to manually allocate and deallocate the buffer always makes me nervous, though. Is there any reason there couldn't be an async overload of `withUnsafeTemporaryAllocation`? This seems like it would provide a slightly safer way to eke out a bit more performance from this kind of thing.

Alternatively, if the standard library (or AsyncAlgorithms) provided a `concurrentMap`, that would also suffice.


Anything that gets stored across an `await` has to be on the heap anyway (because the task has to hang on to its state while it's sitting in the executor's queue), so you lose most of the benefit of `withUnsafeTemporaryAllocation`. You could make your own version that just does the `UnsafeMutableBufferPointer` thing you've done here, so you can never forget the `defer`, but I'm not sure that rises to the level of "belongs in the stdlib". And on the flip side, since it's a heap allocation either way, it's possible even `Array<NewElement?>` will be fine for most use cases, even if it makes my micro-optimization brain itch too.
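That "make your own version" idea might be sketched as follows. The name is hypothetical, and as with the real `withUnsafeTemporaryAllocation`, the body remains responsible for deinitializing whatever elements it initialized before returning.

```swift
// Hypothetical userland async analogue of withUnsafeTemporaryAllocation(of:capacity:_:),
// built on UnsafeMutableBufferPointer. Unlike the stdlib function it always
// heap-allocates, but the defer means you can never forget to deallocate.
func withUnsafeTemporaryAsyncAllocation<T, R>(
    of type: T.Type,
    capacity: Int,
    _ body: (UnsafeMutableBufferPointer<T>) async throws -> R
) async rethrows -> R {
    let buffer = UnsafeMutableBufferPointer<T>.allocate(capacity: capacity)
    defer { buffer.deallocate() }
    return try await body(buffer)
}
```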


My understanding is that each task has a region of memory which is used to store its local variables and is destroyed when the task completes. Is that not the case?

It is not the OS thread's stack, but I'm not sure I would describe variables in that space as being "on the heap".

If that space exists and is being used to store local variables, in theory you could request some of it using `withUnsafeTemporaryAllocation`.

By the way, in case anybody is using this exact code, or copies it: you also need to deinitialise the contents of the buffer before deallocating it.

Alternatively, you could use `Array(unsafeUninitializedCapacity:)` and move the elements from the buffer into the array's storage. That would be more efficient.
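Concretely, the tail of concurrentMap2 might be rewritten along these lines. This is a sketch: it assumes count > 0, and it replaces both the `defer` and the `Array(buffer)` copy from the original.

```swift
// After the `for await` loop has initialized all `count` slots of `buffer`:
let result = [NewElement](unsafeUninitializedCapacity: count) { target, initializedCount in
    // moveInitialize both moves the elements into the array's storage and
    // deinitializes them in `buffer`, so a plain deallocate() is then correct.
    target.baseAddress!.moveInitialize(from: buffer.baseAddress!, count: count)
    initializedCount = count
}
buffer.deallocate()
return result
```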


One more thing - a natural follow-up question might be: why can't we have an async version of `Array(unsafeUninitializedCapacity:)`?

It always allocates on the heap anyway, and you wouldn't even need to bother writing things to a temporary buffer and copying/moving them into `Array`'s storage if you could just write directly to the array's storage. We solved this problem already.

The answer is that the standard library is unable to use async functions or closures.

The standard library has been split into multiple modules: there's core (where `Array` lives), `_Concurrency`, and recently we've added even more modules such as `Synchronization`. Since core is at the lowest level, it can't use anything from the other libraries, since that would create a dependency cycle. That includes async functions, because they make use of task-related symbols.

It's possible that some compiler hackery might alleviate this one day, at least for async functions and closures. Until then, the module split makes adding async APIs to existing standard library types extremely awkward.


in C++ it would be awkward, but this is swift! can’t _Concurrency just add an extension to Array?


Is this related to the lack of an async version of `withoutActuallyEscaping`?

It would need access to Array internals that aren't typically exposed. For this, it would need to:

  • Create an empty Array of the given capacity
  • Get stable access to its storage pointer (not inside of a with... closure)
  • Initialise the elements itself in async-land
  • Tell Array "hey, I just initialised some elements in you! Update your count!"
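Put together, those steps would roughly add up to a hypothetical async initializer like the following. To be clear, this does not exist anywhere; the signature is purely an illustration of what the hooks would have to support.

```swift
// Hypothetical only: an async twin of Array(unsafeUninitializedCapacity:initializingWith:).
// It cannot live in the core module today, because core can't take async closures.
extension Array {
    public init(
        asyncUnsafeUninitializedCapacity capacity: Int,
        initializingWith initializer: (
            _ buffer: inout UnsafeMutableBufferPointer<Element>,
            _ initializedCount: inout Int
        ) async throws -> Void
    ) async rethrows {
        fatalError("illustrative signature only")
    }
}
```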

In general, the pattern is: you do some pre-async stuff, then hand an unfinished object out to the caller in the _Concurrency module where it can await, then possibly it will call you back later to prepare the object for general consumption.

For `Array` specifically, some of those hooks already exist for C bridging, but the last one certainly doesn't. For `withUnsafeTemporaryAllocation`, none of that machinery is exposed.

So you end up needing to structure things in a really awkward way and expose a lot of SPIs between core and _Concurrency.

There are so many places where standard library APIs accept a `Sequence` but can't reasonably gain suspending `AsyncSequence` variants, because the number of SPIs needed to bridge the module gap would be immense. For instance:

  • `Dictionary.init(uniqueKeysWithValues: some Sequence)`
  • `Set.init(some Sequence)`
  • `Array.append(contentsOf: some Sequence)`
  • `String.replaceSubrange(Range<Index>, with: some Sequence)`
  • ...etc.

There ought to be async versions of each of these that accept async sequences, but it's just not practical to implement them.
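Outside the standard library, where async is freely available, a userland version of the first of these is straightforward, which underlines that the blocker is the module split rather than the API design. A sketch (my own extension, not a real API; note it must be `async throws` rather than `rethrows`, since iterating any `AsyncSequence` requires `try await`):

```swift
// Userland async analogue of Dictionary.init(uniqueKeysWithValues:).
extension Dictionary {
    init<S: AsyncSequence>(
        uniqueKeysWithValues keysAndValues: S
    ) async throws where S.Element == (Key, Value) {
        self.init()
        for try await (key, value) in keysAndValues {
            // Match the sync initializer's contract: keys must be unique.
            precondition(
                updateValue(value, forKey: key) == nil,
                "Duplicate key"
            )
        }
    }
}
```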

I don't know about this one specifically.


to be honest, this strikes me as exactly the type of situation that @_spi was designed for.

The problem isn't that @_spi can't do it; it's that splitting a function into a bunch of SPIs with delicate invariants is not a practical, scalable solution.

I've done it. You write code to be used in the standard library, and it's relatively straightforward: easy to read, understand, audit, etc. Then you want to add an async version, so you need to split things up and add SPIs exposing various intermediate stages of the operation, massively increasing the implementation complexity.

If the standard library's core module were able to use async functions and closures, this wouldn't be an issue. As long as it can't, we will likely continue to lack async versions of many standard library functions. Sometimes that comes with a performance cost, but it always comes with an ergonomics cost for users of the language (who rightly expect familiar operations and APIs to support async code).


Apologies if you already explained this, but why can't the core library use async? You mentioned "because they make use of task-related symbols", but can you elaborate? Is that a technical limitation (some intrinsic reason they have to live in a separate `_Concurrency` module), or a logistical one, or…?

It'd be a shame to do so much hard work on Swift's concurrency support in other regards, such as safety, only to not actually have great concurrency support in the currency data types.
