Best approaches for data polling using Swift concurrency

I have an application which needs to make recurring API calls to fetch a relatively large dataset. These updates happen at regular intervals, but may also be requested on an ad hoc basis elsewhere in the codebase. To avoid duplicating requests to the backend server, I've been looking at ways to leverage Swift concurrency to make things cleaner.

My initial stab at the problem looks a little like this, whereby I set up a repeating task which sleeps for a minute between data poll operations:

updateTask = Task(priority: .background) {
    defer { updateTask = nil }
    repeat {
        await performDataPoll()
        try? await Task.sleep(for: .seconds(60))
    } while !Task.isCancelled
}

To facilitate the ad hoc requirement, I've created the following method (based on an approach seen elsewhere), which either initiates a poll or, if it detects one is already in flight, awaits the result of the existing task. One benefit of this approach is that if the poll is initiated from multiple call sites, they will all wait for the initial request to complete.

func performDataPollOrWaitForInFlightRequest() async {
    let task: Task<Void, Never>

    if let dataPollFetchTask {
        task = dataPollFetchTask
    } else {
        task = Task {
            await performDataPoll()
        }
        dataPollFetchTask = task
    }

    return await withTaskCancellationHandler {
        defer { dataPollFetchTask = nil }
        return await task.value
    } onCancel: {
        task.cancel()
    }
}

The example here is obviously simplified and I'm still fairly new to Swift's unstructured concurrency, but I'm interested to hear whether this is a sound approach (and about any potential pitfalls).


I think AsyncStream might be a more convenient and expressive tool here. I have only implemented AsyncSequence myself, but since a stream is a sequence, implementing it this way shouldn't cause any complications.


AsyncStream was actually my first port of call, but didn't seem viable because, unless I'm mistaken, it doesn't support multiple consumers? There's a possibility that multiple call sites could request the data concurrently.


I'm not sure if I understand everything correctly, but how about an actor + state machine? This is structured, but it solves the state problem.

import Foundation

actor PollThingie {

  enum State {
    case initial
    // Things needed to join the pending request, e.g. the in-flight task.
    case requestPending(Task<Void, Never>)
    // Date of the last request, to avoid polling too often
    // (a non-Void result would be stored here as well).
    case waiting(lastPollDate: Date)
  }

  private var state = State.initial

  func poll() async {
    // Check self.state and maybe join the pending task.
  }
}

Then, in some outside loop, call poll every X minutes (in a separate task). You can also call it manually (when the user presses a button).
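To make the idea a bit more concrete, here is a minimal sketch of such an actor. Everything in it is an assumption for illustration: the DataPoller name, the fetch closure standing in for performDataPoll(), and the minimumInterval parameter are not from the posts above. Callers that arrive while a request is pending simply await the same task.

```swift
import Foundation

// Hypothetical deduplicating poller; `fetch` stands in for performDataPoll().
actor DataPoller {
    private enum State {
        case idle(lastFinished: Date?)      // nothing in flight
        case pending(Task<Void, Never>)     // a request other callers can join
    }

    private var state = State.idle(lastFinished: nil)
    private let minimumInterval: TimeInterval
    private let fetch: @Sendable () async -> Void

    init(minimumInterval: TimeInterval = 0,
         fetch: @escaping @Sendable () async -> Void) {
        self.minimumInterval = minimumInterval
        self.fetch = fetch
    }

    func poll() async {
        switch state {
        case .pending(let task):
            // Join the request that is already in flight.
            await task.value
        case .idle(let lastFinished):
            // Skip the poll if the previous one finished too recently.
            if let lastFinished,
               Date().timeIntervalSince(lastFinished) < minimumInterval {
                return
            }
            // There is no suspension point between reading and writing
            // `state`, so reentrant callers always observe `.pending`.
            let task = Task { [fetch] in await fetch() }
            state = .pending(task)
            await task.value
            state = .idle(lastFinished: Date())
        }
    }
}
```

Note that in this sketch, cancelling one waiting caller does not cancel the shared request, which differs from the withTaskCancellationHandler version in the first post. Whether that is desirable depends on the app.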


So from my perspective there are two components to what needs to be done:

  1. Polling using concurrency
  2. Deduplication

These two are independent of each other. I think you should be fine using AsyncStream for the first point. For the second, I don't remember what drawbacks (if any) your second code snippet has, but in general I think wrapping work in a Task may introduce more complications, since unstructured tasks don't benefit from structured concurrency. The actor approach suggested above is the one I would prefer, but keep the logic as simple as possible because of actor reentrancy.

This article contains the "Actor that stores Tasks" example: Swift actors: How do they work, and what kinds of problems do they solve? | Swift by Sundell (you can just scroll to the last code snippet if you don't want to read the whole thing).

Then you just need:

  • a Task that calls poll in a loop - similar to your 1st post
  • some other method that calls poll when user presses the button (or whatever other trigger you have, can be another Task).
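As a rough sketch of the first bullet, the loop could look something like this. The startPeriodicPolling name and the poll closure are made up for the example; in practice the closure would call the actor's poll method, and the button handler from the second bullet would call that same method directly.

```swift
// Hypothetical helper: calls `poll` repeatedly, sleeping `interval`
// between calls, until the returned task is cancelled.
func startPeriodicPolling(
    every interval: Duration,
    poll: @escaping @Sendable () async -> Void
) -> Task<Void, Never> {
    Task(priority: .background) {
        while !Task.isCancelled {
            await poll()
            do {
                try await Task.sleep(for: interval)
            } catch {
                // Task.sleep throws when the task is cancelled: stop looping.
                return
            }
        }
    }
}
```

Keep the returned task around and call cancel() on it when polling should stop. Because deduplication lives in whatever poll calls, the loop itself does not need to know about ad hoc requests.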

If you are using Apple platforms, there is also Timer.TimerPublisher, which can be converted into an AsyncSequence:

let sequence = Timer.TimerPublisher(...)
   .autoconnect()
   .values

this is a common problem and i find that the best way to ensure at most 1 thing is happening at a time is to do the things sequentially in a for/while loop and await on some signal such as an AsyncStream with a buffer size of 1.

in this pattern, you would consume the AsyncStream serially from one concurrency domain, and signal to it from others. there would be one concurrent task that yields the heartbeats every fixed interval, and you would also be able to resume the polling loop by yielding to the continuation from elsewhere.
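a minimal sketch of that pattern, assuming Swift 5.9 for AsyncStream.makeStream. the startPolling name and the poll closure are placeholders invented for the example, not an existing API:

```swift
// Hypothetical helper: runs `poll` once per signal, strictly one at a time.
// The returned continuation lets other code request an ad hoc poll.
func startPolling(
    every interval: Duration,
    poll: @escaping @Sendable () async -> Void
) -> (trigger: AsyncStream<Void>.Continuation, loop: Task<Void, Never>) {
    // Keep at most one pending signal: requests made while a poll is
    // running collapse into a single follow-up poll.
    let (signals, continuation) = AsyncStream<Void>.makeStream(
        bufferingPolicy: .bufferingNewest(1)
    )

    // Heartbeat task: yields a signal at a fixed interval until cancelled.
    let heartbeat = Task {
        while !Task.isCancelled {
            continuation.yield()
            do {
                try await Task.sleep(for: interval)
            } catch {
                return
            }
        }
    }

    // Single consumer: polls run sequentially inside this one task.
    let loop = Task {
        for await _ in signals {
            await poll()
        }
        // The stream finished or this task was cancelled.
        heartbeat.cancel()
    }

    return (continuation, loop)
}
```

calling trigger.yield() from anywhere requests an extra poll, and trigger.finish() ends the loop. note that callers only signal here and do not wait for the poll to complete, so awaiting the result would need something extra on top.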
