Running an async task with a timeout

ole · June 18, 2021, 11:58am

I wrote a function async(timeoutAfter:do:). Its goal is to run an async task with a timeout. If the timeout expires and the work hasn't completed, it should cancel the task and throw a TimedOutError.

Here’s my code. It seems to work, but I’m not sure if it’s correct. Am I on the right track here or is there a better way to achieve this?

import Foundation.NSDate // for TimeInterval

struct TimedOutError: Error, Equatable {}

/// Runs an async task with a timeout.
///
/// - Parameters:
///   - maxDuration: The duration in seconds `work` is allowed to run before timing out.
///   - work: The async operation to perform.
/// - Returns: Returns the result of `work` if it completed in time.
/// - Throws: Throws ``TimedOutError`` if the timeout expires before `work` completes.
///   If `work` throws an error before the timeout expires, that error is propagated to the caller.
func `async`<R>(
  timeoutAfter maxDuration: TimeInterval,
  do work: @escaping () async throws -> R
) async throws -> R {
  return try await withThrowingTaskGroup(of: R.self) { group in
    // Start actual work.
    group.async {
      return try await work()
    }
    // Start timeout child task.
    group.async {
      await Task.sleep(UInt64(maxDuration * 1_000_000_000))
      try Task.checkCancellation()
      // We’ve reached the timeout.
      throw TimedOutError()
    }
    // First finished child task wins, cancel the other task.
    let result = try await group.next()!
    group.cancelAll()
    return result
  }
}

Here’s a usage example. You can increase the sleep amount in await Task.sleep(100_000_000) (0.1 seconds) to a value above the 0.25-second timeout, e.g. 300_000_000 (0.3 seconds), to see it fail.

(If you want to try this out with Xcode 13.0 beta 1, use this workaround to make Task.sleep work.)

detach {
  do {
    let favoriteNumber: Int = try await async(timeoutAfter: 0.25) {
      await Task.sleep(100_000_000)
      return 42
    }
    print("Favorite number: \(favoriteNumber)")
  } catch {
    print("Error: \(error)")
  }
}

(Edit: Modified the code slightly to accommodate tasks that can throw.)

mickeyl · June 18, 2021, 2:13pm

I wonder about the cost of all those coroutines being suspended in the sleep call, especially given a low timeout probability…

jjoelson · June 18, 2021, 3:30pm

Is that timeout task guaranteed to start executing immediately? All this stuff is new to me obviously, but I don’t really see how that could be guaranteed.

tclementdev · June 18, 2021, 7:54pm

Right, I don't think it can. Tasks are scheduled on a thread pool, which could be busy with other things.

Alejandro_Martinez · June 18, 2021, 9:37pm

Very similar to what I did and seems to be alright. The problem is that right now sleep doesn’t stop early when cancelled, which makes this a bit worthless.

I mentioned that in the structured concurrency review. SE-0304 (3rd review): Structured Concurrency - #36 by Alejandro_Martinez

toph42 · June 18, 2021, 9:57pm

Couldn’t you just put the timer in a while loop that checks every second if it’s cancelled or time’s up and break accordingly?

ole · June 21, 2021, 11:43am

Thank you everyone for your input. I also received good feedback on Twitter, which I wanted to link here.

Importantly, @John_McCall pointed out that a timeout API should be expressed as a deadline (a specific point in time at which the timeout occurs) rather than a duration because (1) it’s not guaranteed when the timeout task will start (as @jjoelson also pointed out), and (2) deadlines compose better when propagated to child tasks.
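A deadline-based version of the timeout function could look roughly like this. It is only a sketch: `withDeadline` is a hypothetical name, and it relies on `Task.sleep(until:tolerance:clock:)` and `ContinuousClock`, which arrived in Swift 5.7, well after this thread started. Because the timeout child sleeps until an absolute instant rather than for a duration, a late start does not push the timeout back.

```swift
import Foundation

struct TimedOutError: Error, Equatable {}

/// Hypothetical deadline-based timeout helper (needs the Swift 5.7 Clock APIs).
@available(macOS 13.0, iOS 16.0, watchOS 9.0, tvOS 16.0, *)
func withDeadline<R: Sendable>(
    _ deadline: ContinuousClock.Instant,
    operation: @escaping @Sendable () async throws -> R
) async throws -> R {
    try await withThrowingTaskGroup(of: R.self) { group in
        // Start the actual work.
        group.addTask { try await operation() }
        // Timeout child: sleep until an absolute instant. If this task starts
        // late, or the deadline has already passed, it wakes up right away
        // instead of waiting a full duration measured from its late start.
        group.addTask {
            try await Task.sleep(until: deadline, clock: ContinuousClock())
            throw TimedOutError()
        }
        // First finished child wins; cancel the other one.
        defer { group.cancelAll() }
        return try await group.next()!
    }
}
```

Since the deadline is an absolute instant, a caller can hand the same value to nested `withDeadline` calls, and no inner call can outlive the outer deadline.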

@jrose suggested getting rid of the explicit TimedOutError and modeling a timeout as a plain cancellation. I think I agree with this.
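Modeling the timeout as a plain cancellation might look like the following minimal sketch. The name `withTimeoutAsCancellation` is hypothetical; otherwise it mirrors the task-group approach from the first post, written with the Swift 5.5 release APIs `addTask` and `Task.sleep(nanoseconds:)`.

```swift
import Foundation

/// Hypothetical variant of the timeout helper in which a timeout is reported
/// as a plain CancellationError rather than a dedicated TimedOutError type.
func withTimeoutAsCancellation<R: Sendable>(
    seconds: Double,
    operation: @escaping @Sendable () async throws -> R
) async throws -> R {
    try await withThrowingTaskGroup(of: R.self) { group in
        group.addTask { try await operation() }
        group.addTask {
            try await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            // The timer fired: report it exactly like any other cancellation.
            throw CancellationError()
        }
        // First child to finish wins; cancel the one still running.
        defer { group.cancelAll() }
        return try await group.next()!
    }
}
```

The trade-off is that callers can no longer tell a timeout apart from a cancellation coming from further up the task tree; both end up in the same catch clause.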

tclementdev · June 21, 2021, 11:54am

Though using a deadline does not really solve the problem by itself: if you're unlucky, the timeout task might be severely delayed and you might set up the deadline too late. It seems like the timeout would need to be set up in the parent task of the task you want to time out (and perhaps fire outside of the thread pool?).

ole · June 21, 2021, 12:15pm

I’m not sure I understand what you mean. Something like this

True, but I don’t know if we should be concerned about that. It’s the nature of a cooperative system that tasks may be delayed. It’s no different in that regard than Timer in Foundation, which is also not guaranteed to fire on time.

With a deadline check, even if the timeout task gets scheduled for the first time after the deadline has passed, it would immediately cancel and thus trigger the cancellation of the "work task" at the earliest time the system was able to accommodate. I think this is as good an outcome as we can expect.

ole · June 21, 2021, 12:19pm

I’m not sure if I understand what you mean, but you gave me the idea to write my own sleep function with cancellation support. Thanks! This implementation sleeps for short intervals and performs manual cancellation checks in between:

import Foundation

extension Task {
  /// Like `Task.sleep` but with cancellation support.
  ///
  /// - Parameter deadline: Sleep at least until this time. The actual time the sleep ends can be later.
  /// - Parameter cancellationCheckInterval: The interval in nanoseconds between cancellation checks.
  static func sleepCancellable(
    until deadline: Date,
    cancellationCheckInterval: UInt64 = 100_000
  ) async {
    while Date.now < deadline {
      guard !Task.isCancelled else {
        break
      }
      // Sleep for a while between cancellation checks.
      await Task.sleep(cancellationCheckInterval)
    }
  }
}

Using this instead of Task.sleep in my timeout implementation makes it behave correctly.

toph42 · June 21, 2021, 1:40pm

This is what I meant, and I’m glad it works the way you wanted. I think I’d have generalized it to take a cancelIf: @autoclosure () -> Bool parameter, so that cancellation could be triggered by other conditions, and also added an init that takes a cancelingAfter: Date parameter and sets cancelIf to a closure that returns true once the specified time has passed.
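A sketch of the generalization described above, written as free functions rather than an extension on Task. The names `pollingSleep(cancelIf:checkingEvery:)` and `pollingSleep(until:checkingEvery:)` and the 10 ms default polling interval are assumptions for illustration, and the deadline convenience is an overload rather than an init. As suggested, the predicate is an `@autoclosure`, so a call site can pass a plain Boolean expression such as `Date() >= deadline`.

```swift
import Foundation

/// Hypothetical polling sleep: returns once `shouldStop` is true or the
/// current task is cancelled, checking every `interval` nanoseconds.
func pollingSleep(
    cancelIf shouldStop: @autoclosure () -> Bool,
    checkingEvery interval: UInt64 = 10_000_000 // 10 ms
) async {
    while !shouldStop() && !Task.isCancelled {
        // Task.sleep throws CancellationError when the task is cancelled;
        // `try?` ignores it because the loop condition re-checks isCancelled.
        try? await Task.sleep(nanoseconds: interval)
    }
}

/// Hypothetical convenience overload for the deadline case.
func pollingSleep(until deadline: Date, checkingEvery interval: UInt64 = 10_000_000) async {
    await pollingSleep(cancelIf: Date() >= deadline, checkingEvery: interval)
}
```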

Doug_Stein · January 27, 2022, 6:25pm

I took a shot at adding the correct "deadline" behavior to Ole Begemann's excellent code sample, and also updated it for Swift 5.5. I also noticed that Task.sleep now supports cancellation:

import Foundation.NSDate // for TimeInterval

struct TimedOutError: Error, Equatable {}

///
/// Execute an operation in a child task, subject to a timeout.
///
/// - Parameters:
///   - seconds: The duration in seconds `operation` is allowed to run before timing out.
///   - operation: The async operation to perform.
/// - Returns: Returns the result of `operation` if it completed in time.
/// - Throws: Throws ``TimedOutError`` if the timeout expires before `operation` completes.
///   If `operation` throws an error before the timeout expires, that error is propagated to the caller.
public func withTimeout<R: Sendable>(
    seconds: TimeInterval,
    operation: @escaping @Sendable () async throws -> R
) async throws -> R {
    return try await withThrowingTaskGroup(of: R.self) { group in
        let deadline = Date(timeIntervalSinceNow: seconds)

        // Start actual work.
        group.addTask {
            return try await operation()
        }
        // Start timeout child task.
        group.addTask {
            let interval = deadline.timeIntervalSinceNow
            if interval > 0 {
                try await Task.sleep(nanoseconds: UInt64(interval * 1_000_000_000))
            }
            try Task.checkCancellation()
            // We’ve reached the timeout.
            throw TimedOutError()
        }
        // First finished child task wins, cancel the other task.
        let result = try await group.next()!
        group.cancelAll()
        return result
    }
}

bkbeachlabs · February 25, 2022, 4:40pm

Thanks everyone for providing a clear example on how to do this!

I've tried implementing this myself though and I'm seeing some behaviour I don't understand. I'm hoping someone here can sort it out better than I can, or can confirm if the same thing is happening for you (I'm using Xcode 13.2.1 running on an iOS 15.2.1 iPhone).

If the timeout child task finishes first, the TimedOutError() is thrown by that task. It is correctly propagated out of the TaskGroup, BUT then it just stops - it doesn't get rethrown by the withTimeout function until the work task completes. If the work task eventually completes, then the exception from withTimeout does at that point get thrown back up the call stack.

I'm not sure what's going on, but it feels like the throw is waiting for the task group to unlock or something and that doesn't happen until the work task returns.

I have a feeling that the problem may actually be in my work task, but I don't see it. Am I somehow blocking a thread or something? It is something like this:

	func startup(timeoutAt deadline: Date = Date().advanced(by: 5)) async throws {
		do {
			try await withTimeLimit(deadline: deadline, do: {
				try self.checkForForceUpgrade()
				let tokens = try await self.services.tokenRefresher.requestValidTokens()
				let channel = try await self.services.serviceC.doStuff()
				try await self.services.serviceA.connect()
				try await self.services.serviceB.start()
				let (_, _) = try await self.services.serviceB.refresh()
			})
		} catch {
			// If one of the async calls above never returns
			// (such as when a network request never receives a response),
			// `withTimeLimit()` throws, but this catch block isn't triggered.
			// 
			// If the work task above takes longer than the timeout, the timeout
			// error IS caught here when the work task finishes. So I guess it was
			// waiting to be thrown that whole time?
			print("STARTUP FAILED: \(error)")
			throw error
		}
	}

For completeness, here is my withTimeLimit function, but it is basically exactly as has been described above:

public extension Task where Success == Never, Failure == Never {
	static func sleepRespectingCancellation(until deadline: Date, cancelCheckIntervalNs: UInt64 = 100_000) async throws {
		while Date() < deadline {
			guard !isCancelled else { break }
			try await sleep(nanoseconds: cancelCheckIntervalNs)
		}
	}
}

public func withTimeLimit<ResultType>(deadline: Date, do work: @escaping () async throws -> ResultType) async throws -> ResultType {
	
	print("TIME LIMIT - starting")
	
	return try await withThrowingTaskGroup(of: ResultType.self) { group in

		group.addTask {
			// If we throw after the sleep, the catch handler of the user of this method never seems to be triggered.
			// If we throw before, it is.
			// Using the normal sleep(nanoseconds:) here doesn't improve the behaviour
			try await Task.sleepRespectingCancellation(until: deadline)
			print("TIME LIMIT - sleep finished")
			try Task.checkCancellation()
			throw TJError(code: CommonError.timeout, description: "async task timed out")
		}
		
		group.addTask {
			let result = try await work()
			print("TIME LIMIT - work finished")
			return result
		}
		
		print("TIME LIMIT - waiting")

		do {
			let result = try await group.next()!
			print("TIME LIMIT - done")
			group.cancelAll()
			return result
			
		} catch {
			print("TIME LIMIT - throw \(error)")
			group.cancelAll()
			// In the timeout case, we hit this catch statement.
			// But THIS throw doesn't trigger the catch at the next level
			throw TJError(code: CommonError.timeout,
						  description: "async 2",
						  underlyingError: error)
		}
	}
}

Thank you in advance for any insights or guidance you can provide!

ole · February 25, 2022, 5:38pm

This is expected behavior if your work task doesn't handle cancellation.

The task group will always wait for all child tasks to finish before it ends. This is a result of the fundamental principle of structured concurrency that child tasks must not outlive the scope in which they were started.

Cancellation is cooperative. Canceling a task just sets a flag; it's up to each async function to check this flag periodically and return early if the task has been canceled. You can do that with Task.isCancelled or try Task.checkCancellation() in the function(s) that make up your work task.
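To illustrate, here is a minimal sketch of a work function that cooperates with cancellation. `crunchNumbers()` is a hypothetical stand-in for real CPU-bound work: it periodically yields and calls `try Task.checkCancellation()`, so once the timeout child wins and the group cancels the remaining child, the work stops within a few thousand iterations and the group can finish. The timeout helper is repeated in compact form, assuming a `Sendable` result type, so the snippet compiles on its own.

```swift
import Foundation

struct TimedOutError: Error {}

/// Hypothetical CPU-bound work that cooperates with cancellation.
func crunchNumbers() async throws -> Int {
    var total = 0
    for i in 0..<Int.max {
        total &+= i
        if i % 10_000 == 0 {
            // Let other tasks (such as the timeout child) run, then bail out
            // with CancellationError if this task has been cancelled.
            await Task.yield()
            try Task.checkCancellation()
        }
    }
    return total
}

/// Compact task-group timeout helper, as discussed in this thread.
func withTimeout<R: Sendable>(
    seconds: Double,
    operation: @escaping @Sendable () async throws -> R
) async throws -> R {
    try await withThrowingTaskGroup(of: R.self) { group in
        group.addTask { try await operation() }
        group.addTask {
            try await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            throw TimedOutError()
        }
        // The group still waits for the cancelled child before returning,
        // which is why crunchNumbers() has to check for cancellation.
        defer { group.cancelAll() }
        return try await group.next()!
    }
}
```

Without the `checkCancellation()` call, the same code would never return, because the group would wait for the loop to run to completion.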

bkbeachlabs · February 25, 2022, 5:42pm

Thank you @ole! This makes sense. I appreciate your help in my learning.

cristik · March 4, 2022, 6:30pm

A little bit late to the party, but can we use GCD to implement the timeout mechanism? Something like

func computeAllDecimalsOfPI(withTimeout timeout: TimeInterval) async throws -> String {
    let didTimeout = UnsafeMutablePointer<Bool>.allocate(capacity: 1)
    return try await withCheckedThrowingContinuation { continuation in
        DispatchQueue.global().asyncAfter(deadline: .now() + timeout) {
            didTimeout.pointee = true
        }
        extremelyLongExternalComputationForAllDecimalsOfPI(continuation: continuation, didTimeout: didTimeout)
    }
}

extremelyLongExternalComputationForAllDecimalsOfPI would of course periodically check the value of didTimeout, and free the allocated pointer when it finishes (one way or another).

Or is the usage of GCD along with the structured concurrency something that we should avoid?

ole · March 4, 2022, 9:30pm

I don't see why you couldn't mix in GCD, but I don't think this is a good pattern. It looks like you're trying to reimplement something (a cancellation flag) that already exists for tasks.

Moreover, your code is unsafe because it accesses didTimeout.pointee from multiple threads without synchronization, i.e. you have a data race.
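If you do keep a GCD-driven flag, one way to remove the data race is to guard every read and write with the same lock. This is a sketch under assumptions: `TimeoutFlag` and `makeTimeoutFlag(after:)` are hypothetical names, and `NSLock` is just one possible synchronization primitive. As a side benefit, the flag is reference counted, so there is no pointer to free manually.

```swift
import Foundation
import Dispatch

/// Hypothetical thread-safe replacement for UnsafeMutablePointer<Bool>:
/// all access goes through one lock, so concurrent reads and writes are safe.
final class TimeoutFlag: @unchecked Sendable {
    private let lock = NSLock()
    private var value = false

    var isSet: Bool {
        lock.lock()
        defer { lock.unlock() }
        return value
    }

    func set() {
        lock.lock()
        defer { lock.unlock() }
        value = true
    }
}

/// Flip the flag after `timeout` seconds, as in the GCD example above.
func makeTimeoutFlag(after timeout: TimeInterval) -> TimeoutFlag {
    let flag = TimeoutFlag()
    DispatchQueue.global().asyncAfter(deadline: .now() + timeout) {
        flag.set()
    }
    return flag
}
```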

cristik · March 5, 2022, 12:41pm

It looks like you're trying to reimplement something (a cancellation flag) that already exists for tasks.

Agreed, this doesn't sound very idiomatic. However, I couldn't wrap my head around converting a long-running synchronous operation into an async one. I looked through the documentation of Task but couldn't find the equivalent of a continuation: all of Task's async operations require a function that is already async, which I don't have.

your code is unsafe because it accesses didTimeout.pointee from multiple threads without synchronization, i.e. you have a data race.

I'm not too worried about that; it's a simple flag set/reset, and a data race can at most cause the called function to execute one extra loop iteration. Indeed, Task.isCancelled sounds much better, but I'm not sure I can use it in a context like this.
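On whether `Task.isCancelled` is usable here: both `Task.isCancelled` and `Task.checkCancellation()` are synchronous, so plain (non-async) code can poll them as long as it is running inside a task; outside any task, `Task.isCancelled` is simply `false`. A minimal sketch, where `sumOfDigits(upTo:)` is a hypothetical stand-in for the long synchronous computation:

```swift
import Foundation

/// Hypothetical long-running synchronous computation. There is no `async`
/// here; the cancellation check is an ordinary synchronous call.
func sumOfDigits(upTo limit: Int) throws -> Int {
    var total = 0
    for i in 0..<limit {
        if i % 1_000 == 0 {
            // Throws CancellationError if the task running this code was
            // cancelled. Outside of any task this never throws.
            try Task.checkCancellation()
        }
        total &+= i % 10
    }
    return total
}
```

Calling such a function from an async context (for example inside a timeout helper's child task) makes it respond to that task's cancellation. Keep in mind that it occupies a thread of the cooperative pool for as long as it runs.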

I'll look into this more. So far my main curiosity was whether a GCD queue can be used to implement a timeout for an async call.

realityworks · September 21, 2022, 1:00am

Am I missing something here? Can't we just cancel the Task, which throws a CancellationError? I feel like there's so much complexity in some of these solutions when the following would do:

let timedTask = Task {
    do {
        try await doAsyncWork()
    } catch is CancellationError {
        // Handle cancellation
    } catch {
        // Handle other errors
    }
}

// A Timer only fires when scheduled on a running run loop; scheduledTimer does that.
Timer.scheduledTimer(withTimeInterval: 5, repeats: false) { _ in
    timedTask.cancel()
}

Am I missing the benefit of the other solutions?
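The unstructured approach can also be written without `Timer`. A sketch with the hypothetical name `valueWithTimeout(seconds:operation:)`: a second "watchdog" task sleeps and then cancels the work task, and the caller awaits the work task's `value`. `withTaskCancellationHandler` forwards cancellation of the caller to the work task, which an unstructured `Task` would not otherwise receive.

```swift
import Foundation

/// Hypothetical helper: run `operation` in an unstructured task and let a
/// watchdog task cancel it once `seconds` have elapsed.
func valueWithTimeout<R: Sendable>(
    seconds: Double,
    operation: @escaping @Sendable () async throws -> R
) async throws -> R {
    let work = Task { try await operation() }
    let watchdog = Task {
        try await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
        work.cancel() // Timeout reached: request cancellation of the work.
    }
    // The watchdog is no longer needed once the work has finished, either way.
    defer { watchdog.cancel() }
    // Forward cancellation of the caller to the unstructured work task.
    return try await withTaskCancellationHandler {
        try await work.value
    } onCancel: {
        work.cancel()
    }
}
```

As with the task-group versions, the timeout only takes effect if the operation itself checks for cancellation. The main difference is that the caller gets the value or the error back, instead of having to handle it inside the `Task` closure.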

soumyamahunt · September 21, 2022, 11:38am

I came up with a solution that needs only one modification: to make sure the submitted async action starts before the timeout sleep, I used a continuation:

public func waitForTaskCompletion<R>(
    withTimeoutInNanoseconds timeout: UInt64,
    _ task: @escaping () async throws -> R
) async throws -> R {
    return try await withThrowingTaskGroup(of: R.self) { group in
        await withUnsafeContinuation { continuation in
            group.addTask {
                continuation.resume()
                return try await task()
            }
        }
        group.addTask {
            await Task.yield()
            try await Task.sleep(nanoseconds: timeout)
            throw TimedOutError()
        }
        defer { group.cancelAll() }
        return try await group.next()!
    }
}