Swift Concurrency and Decoding

Let's say I have multiple data blocks that need to be processed when decoding an object in init(from: Decoder). I would like to do it in parallel to speed up decoding. How would I do this with Swift Concurrency?

I could wrap the processing in a Task {} block, but then I would have an improperly initialised object for an undetermined amount of time.

you probably want to use TaskGroup.

But I cannot wait for the TaskGroup to be completed in init(from: Decoder).

generally, you would want to decode entire "chunks" in each child task, and then assemble the decoded items outside of an init(from:) context.

is there any way you can parse raw chunks from the data without decoding it?

I can process the raw data asynchronously outside of init(from: Decoder), but then I have an improperly initialised object that must not be used until this is done. And there is no way to enforce this.

Can you elaborate a bit more about why the performance of decoding is a concern for your use case?

The Decodable API generally assumes the data is readily available in memory (or at least a memory-mapped file). It's not designed to have any real kind of blocking (e.g. streaming input data over a network).

Can you elaborate a bit more? It's hard to theorise without a specific example.

What stops you from processing the raw data asynchronously outside of init(from: Decoder) and then passing it to a different init? :thinking:
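Roughly this shape, as a minimal sketch - all the type names and the decompressAll function are just illustrative:

import Foundation

// init(from:) only pulls the raw compressed blocks out of the decoder; the real
// object is assembled later, in an async context, from those blocks.
struct CompressedArchive: Decodable {
    var blocks: [Data]
}

struct Archive {
    var payloads: [Data]
}

// Somewhere outside of init(from:):
// let compressed = try JSONDecoder().decode(CompressedArchive.self, from: fileData)
// let archive = Archive(payloads: try await decompressAll(compressed.blocks))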

Or as already suggested you can use TaskGroup, like

let decodedObject = try await withThrowingTaskGroup(of: DecodedThing.self) { group in
   // DecodedThing stands in for whatever customDecode(of:) returns.
   for thing in something {
      group.addTask { try await customDecode(of: thing) }
   }
   return try await group.reduce(into: Object()) { $0.add($1) }
}

I need to basically uncompress/unzip multiple data blocks, and I would like to do it in parallel.

That just shifts the problem one class up.

Codable is about schema, it’s not about data compression. is there a different reason why you’re routing this stuff through Codable?

There is no way to block synchronous work while waiting on async work in Swift concurrency. I suggest you write your own AsyncCodable equivalent that gives you an async interface in which you can decode properties in parallel. You'd still be able to deal in Codable types but your top level interface would allow you to decode concurrently at the granularity you want.
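As a rough sketch of that idea - the AsyncDecodable protocol and every helper here are made up, not an existing API:

import Foundation

// The stored representation stays an ordinary Decodable type; the expensive
// per-block work happens in an async initialiser that can fan out over a task group.
protocol AsyncDecodable {
    associatedtype Encoded: Decodable
    init(decoding encoded: Encoded) async throws
}

struct Archive: AsyncDecodable {
    struct Encoded: Decodable { var blocks: [Data] }
    var payloads: [Data]

    init(decoding encoded: Encoded) async throws {
        payloads = try await withThrowingTaskGroup(of: (Int, Data).self) { group in
            for (index, block) in encoded.blocks.enumerated() {
                // decompress(_:) is a hypothetical synchronous helper.
                group.addTask { (index, try decompress(block)) }
            }
            var results = [Data?](repeating: nil, count: encoded.blocks.count)
            for try await (index, data) in group { results[index] = data }
            return results.compactMap { $0 }
        }
    }
}

// Usage: decode the Codable payload as usual, then build the real type asynchronously.
// let archive = try await Archive(decoding: JSONDecoder().decode(Archive.Encoded.self, from: fileData))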

But as others have asked, you probably want to explain your problem in more detail so that more solutions can be properly considered.

Is it unusual to compress data before it's encoded? Is it unusual to have to decode compressed data? Doesn't the JSON decoder have to do it when it deals with Base64 data?

When should this be handled if not during encoding/decoding?

yes, compression is best done at the database driver level; there are few benefits to storing compressed blobs within a database schema. (some exceptions: images, video, etc.)

base64 isn’t a compressed data format, it’s meant for compatibility between text and binary formats. how much base64 data are you decoding?

Perhaps your best approach is to use GCD, specifically concurrentPerform(iterations:execute:). That's simplest as it already blocks for you, so you don't need any boilerplate. It's probably safer as well, as your decompression tasks will effectively be blocking temporarily which is in a legal grey area for Structured Concurrency, but is generally fine in GCD (GCD will over-subscribe the CPU cores and utilise pre-emptive multi-tasking to allow other work to make progress simultaneously).
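Roughly this shape, inside init(from:) - container, the .blocks key, decompress(_:) and self.payloads are all placeholders for whatever your format actually looks like:

import Foundation

// Pull the compressed blocks out of the decoder first (serially), then fan the
// decompression out across cores. concurrentPerform blocks until every iteration is done.
let blocks = try container.decode([Data].self, forKey: .blocks)
var decompressed = [Data?](repeating: nil, count: blocks.count)
let lock = NSLock()

DispatchQueue.concurrentPerform(iterations: blocks.count) { index in
    let output = try? decompress(blocks[index])
    // Each iteration writes a distinct slot, but mutating the shared array is still
    // a shared access, so serialise it with a lock.
    lock.lock()
    decompressed[index] = output
    lock.unlock()
}

self.payloads = decompressed.compactMap { $0 }   // or throw if any block failed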

If you really wish to use Structured Concurrency, you can, but it'll be more work. You can use either async let x = …, if you have a small number of compile-time-deterministic elements to decode, or TaskGroup otherwise. You'll need to wrap those in a Task, of course, in order to get an async context. You can then use a semaphore to block in the synchronous parent function until all the async work is done.
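A sketch of that shape, with the same placeholders (container, decompress(_:)); LockedBox is the kind of concurrency-safe wrapper sketched after the next paragraph:

import Foundation

// Kick the parallel work off in a detached task and block the synchronous
// initialiser on a semaphore until that task signals completion.
let blocks = try container.decode([Data].self, forKey: .blocks)
let semaphore = DispatchSemaphore(value: 0)
let box = LockedBox([Data?](repeating: nil, count: blocks.count))

Task.detached {
    await withTaskGroup(of: Void.self) { group in
        for (index, block) in blocks.enumerated() {
            group.addTask {
                let output = try? decompress(block)
                box.withLock { $0[index] = output }
            }
        }
        // withTaskGroup waits for all child tasks before returning.
    }
    semaphore.signal()
}
semaphore.wait()   // blocking here is only tolerable if init(from:) is never called from async code

self.payloads = box.withLock { $0 }.compactMap { $0 }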

For the output of the decompression tasks, I suspect you won't be able to just assign to member variables directly (doing so probably captures self and the compiler won't like the mix of shared references and mutation) so you might need to use a concurrency-safe collection or wrapper (e.g. an atomic), as a local variable in your init(from:) function.
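For example, a tiny lock-protected box would do (on a new enough toolchain, Mutex from the Synchronization module plays the same role):

import Foundation

// A minimal concurrency-safe wrapper so child tasks can hand results back to the
// synchronous initialiser. @unchecked Sendable because the lock, not the compiler,
// guarantees exclusive access to the wrapped value.
final class LockedBox<Value>: @unchecked Sendable {
    private let lock = NSLock()
    private var value: Value

    init(_ value: Value) { self.value = value }

    func withLock<Output>(_ body: (inout Value) -> Output) -> Output {
        lock.lock()
        defer { lock.unlock() }
        return body(&value)
    }
}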

Be aware that if your init(from:) method is invoked in an asynchronous context you're technically not permitted to block - it can lead to deadlock. You might be able to have the compiler help you prevent that by annotating it with @available(*, noasync) (I'm not sure off-hand how that will interact with Decodable conformance).
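i.e. something along these lines:

@available(*, noasync, message: "blocks on decompression; call it only from synchronous code")
init(from decoder: Decoder) throws {
    // ...
}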

Keep in mind that Decoder is not concurrency-safe, so you'll need to extract the compressed data from the serialised form first, serially, then pass those compressed blocks out to your decompression tasks.


I didn't mention databases. I'm reading from and writing to files.

I know Base64 is not compression. I was just giving an example of data that needs to be processed after decoding - and is, by the JSON decoder - because you said processing doesn't belong in the decoding path.

can you describe the file format?

These are some valuable suggestions. Thanks!