[Pitch] Retry & Backoff

Hello everyone!

I've been working on a proposal to add retry functionality with backoff strategies to Swift Async Algorithms, and I'd like to pitch it.

Retry logic with backoff is a common requirement in asynchronous programming, especially for operations subject to transient failures such as network requests. Today, developers must reimplement retry loops manually, leading to fragmented and error-prone solutions across the ecosystem.

This proposal introduces a standardized retry function that handles these scenarios cleanly.

nonisolated(nonsending) func retry<Result, ErrorType, ClockType>(
  maxAttempts: Int,
  tolerance: ClockType.Instant.Duration? = nil,
  clock: ClockType = ContinuousClock(),
  operation: () async throws(ErrorType) -> Result,
  strategy: (ErrorType) -> RetryAction<ClockType.Instant.Duration> = { _ in .backoff(.zero) }
) async throws -> Result where ClockType: Clock, ErrorType: Error

Here are the two main use cases:

When you control the retry timing:

let rng = SystemRandomNumberGenerator() // or a seeded RNG for unit tests
var backoff = Backoff
  .exponential(factor: 2, initial: .milliseconds(100))
  .maximum(.seconds(10))
  .fullJitter(using: rng)

let response = try await retry(maxAttempts: 5) {
  try await URLSession.shared.data(from: url)
} strategy: { error in
  return .backoff(backoff.nextDuration())
}

When a remote system controls the retry timing:

let response = try await retry(maxAttempts: 5) {
  let (data, response) = try await URLSession.shared.data(from: url)
  if
    let response = response as? HTTPURLResponse,
    response.statusCode == 429,
    let retryAfter = response.value(forHTTPHeaderField: "Retry-After"),
    let seconds = Double(retryAfter)
  {
    throw TooManyRequestsError(retryAfter: seconds)
  }
  return (data, response)
} strategy: { error in
  if let error = error as? TooManyRequestsError {
    return .backoff(.seconds(error.retryAfter))
  } else {
    return .stop
  }
}

The design provides error-driven retry decisions and composable backoff strategies (constant, linear, exponential, decorrelated jitter). It also includes jitter support to prevent thundering herd problems when multiple clients retry simultaneously.
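To make the jitter idea concrete, here is a minimal sketch of capped exponential backoff with full jitter, where each delay is drawn uniformly between zero and the current capped value. This is only an illustration and not the proposal's implementation: `SplitMix64` and `fullJitterDelays` are hypothetical names, and delays are plain `Int64` milliseconds to keep the example self-contained.

```swift
// Hypothetical seeded generator (SplitMix64) so the output is reproducible in tests.
struct SplitMix64: RandomNumberGenerator {
  var state: UInt64

  mutating func next() -> UInt64 {
    state &+= 0x9E3779B97F4A7C15
    var z = state
    z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
    z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
    return z ^ (z >> 31)
  }
}

// Hypothetical helper: returns one jittered delay (in milliseconds) per attempt.
func fullJitterDelays<G: RandomNumberGenerator>(
  attempts: Int,
  initialMs: Int64,
  factor: Int64,
  capMs: Int64,
  using rng: inout G
) -> [Int64] {
  var base = initialMs
  var delays: [Int64] = []
  for _ in 0..<attempts {
    // Cap the exponential value, then pick a uniform delay in 0...capped.
    let capped = min(base, capMs)
    delays.append(Int64.random(in: 0...capped, using: &rng))
    // Grow only while below the cap; fall back to the cap if the product overflows.
    if base < capMs {
      let (product, overflow) = base.multipliedReportingOverflow(by: factor)
      base = overflow ? capMs : product
    }
  }
  return delays
}

var rng = SplitMix64(state: 42)
let delays = fullJitterDelays(
  attempts: 10, initialMs: 100, factor: 2, capMs: 10_000, using: &rng
)
print(delays)
```

Because every client draws its own random delay, clients that failed at the same moment no longer retry in lockstep.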

Please read the full proposal here.

Thank you.

I've updated this proposal since I initially pitched it. Here is the change log:

1. Dropping support of DurationProtocol in linear and exponential backoff

Due to a flaw found by GitHub user bobergj, I decided to drop support for DurationProtocol in the linear and exponential backoff algorithms and use Duration instead. As a side effect, I had to bump the overall availability of the algorithms in this proposal to AsyncAlgorithms 1.1.

Why?

If you consider this backoff:

 Backoff.exponential(factor: 2, initial: .seconds(5)).maximum(.seconds(120))

... it might seem like the calculation cannot overflow because it is capped at 120 seconds. However, the exponential function silently keeps multiplying on each retry, which will eventually overflow Duration (in the given example, after 64 retries).

One solution to this problem would be to stop multiplying once the maximum has been reached. However, this would only work if we baked the concept of "maximum" into the exponential backoff strategy. The consequence is that every "top level" backoff strategy would have to implement its own version of maximum.

The other solution would be to keep multiplying in exponential backoffs, but detect when the multiplication has overflowed (by using multipliedReportingOverflow) and then stop multiplying. The consequence, as stated previously, is that DurationProtocol is not supported, because only Duration exposes such functionality.
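To illustrate the second option, here is a minimal sketch of overflow detection with `multipliedReportingOverflow(by:)`. It is not the proposal's implementation: `SaturatingExponential` and its members are hypothetical, and it operates on `Int64` nanoseconds rather than on Duration to stay self-contained.

```swift
// Hypothetical sketch: an exponential sequence that stops growing
// once the next multiplication would overflow, instead of trapping.
struct SaturatingExponential {
  private var current: Int64   // current delay in nanoseconds
  private let factor: Int64
  private var saturated = false

  init(initialNanoseconds: Int64, factor: Int64) {
    self.current = initialNanoseconds
    self.factor = factor
  }

  mutating func next() -> Int64 {
    let value = current
    if !saturated {
      // FixedWidthInteger reports overflow instead of crashing.
      let (product, overflow) = current.multipliedReportingOverflow(by: factor)
      if overflow {
        saturated = true       // keep the last representable value from now on
      } else {
        current = product
      }
    }
    return value
  }
}

// The cap stays a separate, composable step, as with `.maximum(...)` in the pitch.
let capNanoseconds: Int64 = 120_000_000_000  // 120 seconds
var exponential = SaturatingExponential(initialNanoseconds: 5_000_000_000, factor: 2)
var lastCapped: Int64 = 0
for _ in 0..<100 {
  lastCapped = min(exponential.next(), capNanoseconds)
}
print(lastCapped)
```

The point of the sketch is that the exponential step never needs to know about the maximum: it only has to stay well-defined for an arbitrary number of retries, and the cap can be layered on top.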

2. Dropping decorrelatedJitter

I might've been a bit overambitious in adding backoff strategies. I initially used this resource: Exponential Backoff And Jitter | AWS Architecture Blog as a reference for jitter variants. However, many retry frameworks in other programming languages do not support such a wide variety of backoff algorithms. Most support constant, exponential and also some form of randomization.

This is why I am dropping support for decorrelatedJitter and also considering dropping equalJitter (but keeping fullJitter) and linear (since exponential backoff seems to be the industry standard for retry scenarios). I am curious, though, what people think about this. Please let me know which timing functions or backoff variants you'd like to use or are currently using.

Since there is no precedent for standalone retry-with-backoff in either Apple frameworks or Swift-related frameworks, I feel like this is quite a balancing act between not overdoing it and confusing adopters with too many options, and providing enough that most use cases can be solved.

Just want to mention that, although I haven't had the time to go through the proposal, I think the issue you're trying to solve is very valid.

I had it on my long TODO list to create a package that handles different kinds of backoffs, as I had to manually implement backoff strategies in DiscordBM and I'll very soon need such backoff strategies in swift-dns's DNS resolver.

As long as new strategies can be introduced in the future, I’m fine with going with a smaller set of strategies that are more common.

I haven’t seen too many packages that contain proper backoff implementations in my day-to-day server-side use. My own DiscordBM is one of them. Another one is @adam-fowler’s AWSClient in Soto.

I know he's on vacation, but I thought I'd mention him here just in case he has any opinions on this, whenever he can catch up.

Soto is a very high quality package, and has proven itself in practice over the years, so I think Adam's opinion would be of value.

Yes, this is definitely possible. I believe it’d probably require some sort of formal amendment or proposal, though.

This is great. I’ve written so many versions of this over the years.

As far as I can tell, given that BackoffStrategy is stateful, I would need to create a new instance of it each time I want to run a process using backoff. As a library author, if I wanted to provide a way for users to define the backoff strategy parameters, I would have to create separate types to define these and then create a BackoffStrategy from those types. It might be useful for this proposal to include these defining types.

Interesting, which parameters would you like to see included? Only the timing function (i.e. just BackoffStrategy), or also the number of attempts?

Possibly everything. I hadn’t thought about max attempts, though, as this isn’t defined in the BackoffStrategy. Out of interest, is there any reason this is separate from the BackoffStrategy?

Perhaps you could have something like this:

let rng = SystemRandomNumberGenerator() // or a seeded RNG for unit tests
let backoffType = Backoff
  .exponential(factor: 2, initial: .milliseconds(100))
  .maximum(.seconds(10))
  .fullJitter(using: rng)
var backoff = backoffType.instantiate()

let response = try await retry(maxAttempts: 5) {
  try await URLSession.shared.data(from: url)
} strategy: { error in
  return .backoff(backoff.nextDuration())
}

Where backoffType is a Sendable type that conforms to BackoffStrategyType, which is defined as:

protocol BackoffStrategyType {
    associatedtype Strategy: BackoffStrategy
    func instantiate() -> Strategy
}

Mainly because maxAttempts does not fit into BackoffStrategy. I thought a backoff should not be concerned with how often it computes a timing; that is more related to retrying than to backoff.

If this were added to the configuration type, it should probably be called RetryStrategy rather than BackoffStrategy.

This reminds me of IteratorProtocol. I wonder if it'd make sense to imitate Sequence and IteratorProtocol like this:

public protocol BackoffStrategy<Duration> {
  associatedtype Iterator: BackoffIterator
  associatedtype Duration: DurationProtocol where Self.Duration == Self.Iterator.Duration
  func makeIterator() -> Iterator
}

public protocol BackoffIterator {
  associatedtype Duration: DurationProtocol
  mutating func nextDuration() -> Duration
}

Library authors could make their API accept BackoffStrategy where concrete strategies can easily be made Sendable. The iterators will be stateful, while the backoff strategy itself will be stateless. Strategies could still be composed like this:

let backoffType = Backoff
  .exponential(factor: 2, initial: .milliseconds(100))
  .maximum(.seconds(10))
  .fullJitter(using: rng)

... and it even shares similarities with already existing concepts within Swift, like (lazy) sequences or asynchronous sequences.
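As a rough illustration of how a concrete strategy could fit this split, here is a minimal sketch. The two protocols are repeated from above so the snippet is self-contained; `ExponentialBackoff` and `plannedDelays` are hypothetical names that are not part of the proposal, and the sketch deliberately ignores the overflow handling discussed earlier.

```swift
public protocol BackoffIterator {
  associatedtype Duration: DurationProtocol
  mutating func nextDuration() -> Duration
}

public protocol BackoffStrategy<Duration> {
  associatedtype Iterator: BackoffIterator
  associatedtype Duration: DurationProtocol where Self.Duration == Self.Iterator.Duration
  func makeIterator() -> Iterator
}

// Hypothetical stateless, Sendable description of an exponential backoff.
struct ExponentialBackoff: BackoffStrategy, Sendable {
  typealias Duration = Swift.Duration

  var initial: Swift.Duration
  var factor: Int

  // All mutable state lives in the iterator, as with Sequence and IteratorProtocol.
  struct Iterator: BackoffIterator {
    typealias Duration = Swift.Duration

    var current: Swift.Duration
    let factor: Int

    mutating func nextDuration() -> Swift.Duration {
      defer { current = current * factor }  // no overflow handling in this sketch
      return current
    }
  }

  func makeIterator() -> Iterator {
    Iterator(current: initial, factor: factor)
  }
}

// Hypothetical library API: accepts any strategy and creates fresh state per run.
func plannedDelays(
  for strategy: some BackoffStrategy<Swift.Duration>,
  attempts: Int
) -> [Swift.Duration] {
  var iterator = strategy.makeIterator()
  var delays: [Swift.Duration] = []
  for _ in 0..<attempts {
    delays.append(iterator.nextDuration())
  }
  return delays
}

let strategy = ExponentialBackoff(initial: .milliseconds(100), factor: 2)
print(plannedDelays(for: strategy, attempts: 4))
```

Since the strategy value itself never mutates, it can be stored in a configuration and shared across tasks, while each retry run gets its own iterator.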

Very valuable input. Thank you, @adam-fowler.
