Swift Concurrency and Decoding

Let's say I have multiple data blocks that need to be processed when decoding an object in init(from: Decoder). I would like to do it in parallel to speed up decoding. How would I do this with Swift Concurrency?

I could wrap the processing in a Task {} block, but then I would have an improperly initialised object for an undetermined amount of time.

you probably want to use TaskGroup.

But I cannot wait for the TaskGroup to be completed in init(from: Decoder).

generally, you would want to decode entire "chunks" in each child task, and then assemble the decoded items outside of an init(from:) context.

is there any way you can parse raw chunks from the data without decoding it?

I can process the raw data asynchronously outside of init(from: Decoder), but then I have an improperly initialised object that must not be used until this is done. And there is no way to enforce this.

Can you elaborate a bit more about why the performance of decoding is a concern for your use case?

The Decodable API generally assumes the data is readily available in memory (or at least a memory-mapped file). It's not designed to have any real kind of blocking (e.g. streaming input data over a network).

Can you elaborate a bit more? It's hard to theorise without a specific example.

What stops you from processing the raw data asynchronously outside of init(from: Decoder) and then passing it to a different init? :thinking:
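Roughly this shape, as a minimal sketch - all the type names and the decompressAll function are just illustrative:

import Foundation

// init(from:) only pulls the raw compressed blocks out of the decoder; the real
// object is assembled later, in an async context, from those blocks.
struct CompressedArchive: Decodable {
    var blocks: [Data]
}

struct Archive {
    var payloads: [Data]
}

// Somewhere outside of init(from:):
// let compressed = try JSONDecoder().decode(CompressedArchive.self, from: fileData)
// let archive = Archive(payloads: try await decompressAll(compressed.blocks))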

Or as already suggested you can use TaskGroup, like

let decodedObject = try await withThrowingTaskGroup(of: DecodedThing.self) { group in
   // DecodedThing stands in for whatever customDecode(of:) returns.
   for thing in something {
      group.addTask { try await customDecode(of: thing) }
   }
   return try await group.reduce(into: Object()) { $0.add($1) }
}

I need to basically uncompress/unzip multiple data blocks, and I would like to do it in parallel.

That just shifts the problem one class up.

Codable is about schema, it’s not about data compression. is there a different reason why you’re routing this stuff through Codable?

There is no way to block synchronous work while waiting on async work in Swift concurrency. I suggest you write your own AsyncCodable equivalent that gives you an async interface in which you can decode properties in parallel. You'd still be able to deal in Codable types but your top level interface would allow you to decode concurrently at the granularity you want.
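As a rough sketch of that idea - the AsyncDecodable protocol and every helper here are made up, not an existing API:

import Foundation

// The stored representation stays an ordinary Decodable type; the expensive
// per-block work happens in an async initialiser that can fan out over a task group.
protocol AsyncDecodable {
    associatedtype Encoded: Decodable
    init(decoding encoded: Encoded) async throws
}

struct Archive: AsyncDecodable {
    struct Encoded: Decodable { var blocks: [Data] }
    var payloads: [Data]

    init(decoding encoded: Encoded) async throws {
        payloads = try await withThrowingTaskGroup(of: (Int, Data).self) { group in
            for (index, block) in encoded.blocks.enumerated() {
                // decompress(_:) is a hypothetical synchronous helper.
                group.addTask { (index, try decompress(block)) }
            }
            var results = [Data?](repeating: nil, count: encoded.blocks.count)
            for try await (index, data) in group { results[index] = data }
            return results.compactMap { $0 }
        }
    }
}

// Usage: decode the Codable payload as usual, then build the real type asynchronously.
// let archive = try await Archive(decoding: JSONDecoder().decode(Archive.Encoded.self, from: fileData))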

But as others have asked, you probably want to explain your problem in more detail so that more solutions can be properly considered.

Is it unusual to compress data before it's encoded? Is it unusual to have to decode compressed data? Doesn't the JSON decoder have to do it when it deals with Base64 data?

When should this be handled if not during encoding/decoding?

yes, compression is best done at the database driver level; there are few benefits to storing compressed blobs within a database schema. (some exceptions: images, video, etc.)

base64 isn’t a compressed data format, it’s meant for compatibility between text and binary formats. how much base64 data are you decoding?

Perhaps your best approach is to use GCD, specifically concurrentPerform(iterations:execute:). That's simplest as it already blocks for you, so you don't need any boilerplate. It's probably safer as well, as your decompression tasks will effectively be blocking temporarily which is in a legal grey area for Structured Concurrency, but is generally fine in GCD (GCD will over-subscribe the CPU cores and utilise pre-emptive multi-tasking to allow other work to make progress simultaneously).
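Roughly this shape, inside init(from:) - container, the .blocks key, decompress(_:) and self.payloads are all placeholders for whatever your format actually looks like:

import Foundation

// Pull the compressed blocks out of the decoder first (serially), then fan the
// decompression out across cores. concurrentPerform blocks until every iteration is done.
let blocks = try container.decode([Data].self, forKey: .blocks)
var decompressed = [Data?](repeating: nil, count: blocks.count)
let lock = NSLock()

DispatchQueue.concurrentPerform(iterations: blocks.count) { index in
    let output = try? decompress(blocks[index])
    // Each iteration writes a distinct slot, but mutating the shared array is still
    // a shared access, so serialise it with a lock.
    lock.lock()
    decompressed[index] = output
    lock.unlock()
}

self.payloads = decompressed.compactMap { $0 }   // or throw if any block failed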

If you really wish to use Structured Concurrency, you can, but it'll be more work. You can use either async let x = …, if you have a small number of compile-time-deterministic elements to decode, or TaskGroup otherwise. You'll need to wrap those in a Task, of course, in order to get an async context. You can then use a semaphore to block in the synchronous parent function until all the async work is done.
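A sketch of that shape, with the same placeholders (container, decompress(_:)); LockedBox is the kind of concurrency-safe wrapper sketched after the next paragraph:

import Foundation

// Kick the parallel work off in a detached task and block the synchronous
// initialiser on a semaphore until that task signals completion.
let blocks = try container.decode([Data].self, forKey: .blocks)
let semaphore = DispatchSemaphore(value: 0)
let box = LockedBox([Data?](repeating: nil, count: blocks.count))

Task.detached {
    await withTaskGroup(of: Void.self) { group in
        for (index, block) in blocks.enumerated() {
            group.addTask {
                let output = try? decompress(block)
                box.withLock { $0[index] = output }
            }
        }
        // withTaskGroup waits for all child tasks before returning.
    }
    semaphore.signal()
}
semaphore.wait()   // blocking here is only tolerable if init(from:) is never called from async code

self.payloads = box.withLock { $0 }.compactMap { $0 }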

For the output of the decompression tasks, I suspect you won't be able to just assign to member variables directly (doing so probably captures self and the compiler won't like the mix of shared references and mutation) so you might need to use a concurrency-safe collection or wrapper (e.g. an atomic), as a local variable in your init(from:) function.
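For example, a tiny lock-protected box would do (on a new enough toolchain, Mutex from the Synchronization module plays the same role):

import Foundation

// A minimal concurrency-safe wrapper so child tasks can hand results back to the
// synchronous initialiser. @unchecked Sendable because the lock, not the compiler,
// guarantees exclusive access to the wrapped value.
final class LockedBox<Value>: @unchecked Sendable {
    private let lock = NSLock()
    private var value: Value

    init(_ value: Value) { self.value = value }

    func withLock<Output>(_ body: (inout Value) -> Output) -> Output {
        lock.lock()
        defer { lock.unlock() }
        return body(&value)
    }
}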

Be aware that if your init(from:) method is invoked in an asynchronous context you're technically not permitted to block - it can lead to deadlock. You might be able to have the compiler help you prevent that by annotating it with @available(*, noasync) (I'm not sure off-hand how that will interact with Decodable conformance).
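i.e. something along these lines:

@available(*, noasync, message: "blocks on decompression; call it only from synchronous code")
init(from decoder: Decoder) throws {
    // ...
}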

Keep in mind that Decoder is not concurrency-safe, so you'll need to extract the compressed data from the serialised form first, serially, then pass those compressed blocks out to your decompression tasks.


I didn't mention databases. I'm reading from and writing to files.

I know Base64 is not compression. I was just giving an example of data that needs to be processed after decoding - and is, by the JSON decoder - because you said processing doesn't belong in the decoding path.

can you describe the file format?

These are some valuable suggestions. Thanks!