Swift Concurrency: Keeping Completed Tasks as Cache Objects?

bdkjones · March 20, 2024, 12:59am

Setup:

Suppose I have a bunch of unstructured Task objects that each take a while to generate a CGImage from a remote file. I add these Task objects to an ivar on an Actor so that any new requests for the same image that come in can simply await the Task that's already in-flight.

We obviously want to cache the generated images so that we only do the work once. Someone on StackOverflow suggested that it's fine to just keep the Task objects around as the cache, like this:

actor ImageLoader
{
    private var activeTasks: [URL: Task<CGImage, Never>] = [:]

    // Construct a single image
    private func perform(with url: URL) async -> CGImage
    {
        // Fetch URL data, generate CGImage, etc. Ignore errors for brevity.
    }


    // Retrieve a single image from the cache, if possible, or create a new Task if needed.
    func image(for url: URL) async -> CGImage
    {
        if let task = activeTasks[url] 
        {
            // If the task has already finished, this returns immediately.
            return await task.value
        }
        else
        {
            let task = Task<CGImage, Never> {
                return await perform(with: url)
            }
            activeTasks[url] = task
            // Await the new task so this path also returns an image.
            return await task.value
        }
    }
}

In short, we don't await the result of each Task and then "extract" the image to store in a [URL: CGImage] collection; we just keep the Tasks themselves.

Is this good advice? Does Task release all the execution context overhead it captured at creation once it has finished its work and value is available? If not, it seems like keeping a lot of completed Task objects around isn't very lightweight.

Note:

I understand that structured concurrency is preferred. In my case, it's not an option. (The URL example is a simplified analogy for what I'm doing that illustrates the general concept.)

wadetregaskis · March 20, 2024, 1:27am

I believe it's fine (and I do this sort of thing myself, without any apparent issues).

It's not a complete affirmation, but from the Task docs:

Retaining a task object doesn't indefinitely retain the closure, because any references that a task holds are released after the task completes.

That doesn't rule out the Task retaining some other references, e.g. relating to the execution context or somesuch, but at least those are likely to be cheap and fixed size, so probably not of concern even if they are kept around.

ktoso · March 20, 2024, 1:34am

We recently added some wording on this to the docs:

Task | Apple Developer Documentation
After this code has run to completion, the task has completed, resulting in either a failure or result value, this closure is eagerly released.

Retaining a task object doesn’t indefinitely retain the closure, because any references that a task holds are released after the task completes. Consequently, tasks rarely need to capture weak references to values.

Yeah, a task destroys the closure when it completes.
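
One way to observe the behavior described in the docs is a small experiment like the one below. It is only a sketch: `Resource` and `captureIsReleasedAfterCompletion` are hypothetical names made up for illustration, and the exact moment the closure is destroyed is a runtime detail. The idea is to let the task's closure hold the only strong reference to an object, keep the finished Task handle alive, and check a weak reference to that object.

```swift
// Hypothetical stand-in for whatever the task's closure captures.
final class Resource: Sendable {}

// Returns the task's value and whether the captured object had already been
// deallocated while the finished Task handle was still being retained.
func captureIsReleasedAfterCompletion() async -> (value: Int, released: Bool) {
    weak var weakResource: Resource?
    let task: Task<Int, Never>

    do {
        let resource = Resource()
        weakResource = resource
        // Once this scope ends, the closure holds the only strong reference.
        task = Task {
            _ = resource
            return 42
        }
    }

    let value = await task.value
    // `task` is still alive here, much like an entry kept in a cache.
    let released = (weakResource == nil)
    withExtendedLifetime(task) {}
    return (value, released)
}
```

If the closure is released eagerly as the documentation says, `released` should come back `true` even though the Task itself is still retained. This says nothing about the Task's own allocation, only about what the closure captured.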

Yeah that's right... you should try whenever possible to use structured concurrency, perhaps it's possible here in some way.

John_McCall · March 20, 2024, 1:57am

To optimize task startup, the concurrency runtime makes a single allocation per task, and that includes some space for resources that we need during task execution. While we can release most of a task's overhead after it completes, holding a Task reference indefinitely does prevent us from freeing that initial allocation. It's not a huge allocation — it's somewhere around 600 bytes, varying by OS and result type — but it isn't totally negligible.

bdkjones · March 20, 2024, 3:18am

Thanks. I do dump the cache periodically and I expect no more than about 100 items in it at any one time, so at roughly 600 bytes each that's only about 60 KB of retained task allocations. I’m also on macOS, so memory pressure isn’t as large a concern as it is on iOS.

wadetregaskis · March 20, 2024, 3:23am

Is there an alternative approach, for cases where those ~600 bytes are a problem? e.g. some elegant way of wrapping / coercing the Task into a more traditional future or somesuch?

bdkjones · March 20, 2024, 3:31am

The problem with structured concurrency (specifically TaskGroup) is that I can’t obtain a reference to the child tasks. So when a new call comes in, I don’t have a Task I can await.

I had a separate thread on that where someone suggested using continuations as a fallback. I played with that approach, but it seemed like duct taping the problem. Continuations are really a stopgap feature designed to bridge concurrency back to old completion handlers. Designing a new module with them seemed…backwards.

wadetregaskis · March 20, 2024, 3:47am

I dunno if continuations are considered temporary, but (IMO) more to the point they're a lot more work than using Tasks, even with cancellation manually supported. And Tasks are way less work than the olympic-gold-medal-winning-gymnast levels of contortions required to use TaskGroup, in many situations.

jrose · March 20, 2024, 4:51am

enum Future<Success, Failure> where Success: Sendable, Failure: Error {
  case pending(Task<Success, Failure>)
  case completed(Result<Success, Failure>)

  var result: Result<Success, Failure> {
    mutating get async {
      switch self {
      case .pending(let task):
        let result = await task.result
        self = .completed(result)
        return result
      case .completed(let result):
        return result
      }
    }
  }
}

John_McCall · March 20, 2024, 6:26am

Assuming this goes into a data structure, you wouldn't actually want to be mutating the structure across the entire fetch operation. (Swift specifically does not let you pass actor storage to a mutating async function.) But yeah, assuming the cache is something like a dictionary managed by an actor, putting an enum in the value seems like the right approach to eliminate that trailing overhead (if you decide it actually matters).

bdkjones · March 20, 2024, 7:01am

Attempting to use this wrapper, I get a warning:

Non-sendable type '@lvalue Future' exiting actor-isolated context in call to non-isolated property 'result' cannot cross actor boundary

Usage:


private var cache: [URL: Future] = [:]


func image(for url: URL) async -> CGImage
{
    if var cacheEntry: Future = cache[url]    // var required or 'cannot use mutating getter' error
    {
        let result = await cacheEntry.result    // build warning.
        [...]
    }
}

Workaround:

It's nice to have a single result property neatly contained within the Future, but since the mutating getter is the source of the warning, it suffices to move the switch (and mutation) to the Actor:

enum SnapFuture: Sendable
{
    case pending(Task<CGImage, Never>)
    case completed(CGImage)
}


actor ImageSnapper
{
    private var cache: [URL: SnapFuture] = [:]

    func image(for url: URL) async -> CGImage
    {
        if let cacheEntry: SnapFuture = cache[url]
        {
            switch cacheEntry
            {
            case .pending(let task):
                let image: CGImage = await task.value
                cache[url] = SnapFuture.completed(image)
                return image

            case .completed(let image):
                return image
            }
        }
        else
        {
            // Spin up a new task for this URL.
        }
    }
}
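
For anyone wanting the elided "spin up a new task" branch filled in, here is a minimal sketch of the same pattern in generic form so that it compiles without CoreGraphics. The names `CoalescingCache`, `Entry`, `load`, and `value(for:)` are hypothetical, and it assumes (as in the thread) that loading never throws. It is one possible shape, not the only way to do it.

```swift
// Hypothetical generic version of the actor-plus-enum cache discussed above.
actor CoalescingCache<Key: Hashable & Sendable, Value: Sendable> {
    private enum Entry {
        case pending(Task<Value, Never>)   // work still in flight
        case completed(Value)              // plain value, no Task retained
    }

    private var entries: [Key: Entry] = [:]
    private let load: @Sendable (Key) async -> Value

    init(load: @escaping @Sendable (Key) async -> Value) {
        self.load = load
    }

    func value(for key: Key) async -> Value {
        if let entry = entries[key] {
            switch entry {
            case .completed(let value):
                return value
            case .pending(let task):
                // Join the in-flight task, then drop the Task reference.
                let value = await task.value
                entries[key] = .completed(value)
                return value
            }
        }

        // No entry yet. The task is stored before the first suspension point,
        // so a concurrent caller for the same key sees `.pending`.
        let load = self.load
        let task = Task { await load(key) }
        entries[key] = .pending(task)

        let value = await task.value
        entries[key] = .completed(value)
        return value
    }
}
```

The dictionary is only touched before and after each `await`, never across it, which lines up with John's point about not treating the whole cache as in use while one element resolves. For the image case, the `load` closure would wrap something like the `perform(with:)` function from the first post.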

jrose · March 20, 2024, 3:45pm

I got too clever, I guess. Your workaround matches John’s advice anyway (to not treat the whole cache as in-use while resolving one element).