Swift Async Algorithms Proposal: Interspersed

FranzBusch · May 10, 2023, 12:21pm

Proposal: SAA-0011
Authors: Philippe Hausler
Review Manager: Franz Busch
Status: Implemented
Implementation: Source | Tests

Motivation

A common transformation that is applied to async sequences is to intersperse the elements with
a separator element.

Proposed solution

We propose to add a new method on AsyncSequence that allows to intersperse
a separator between each emitted element. This proposed API looks like this

extension AsyncSequence {
  /// Returns a new asynchronous sequence containing the elements of this asynchronous sequence, inserting
  /// the given separator between each element.
  ///
  /// Any value of this asynchronous sequence's element type can be used as the separator.
  ///
  /// The following example shows how an async sequences of `String`s can be interspersed using `-` as the separator:
  ///
  /// ```
  /// let input = ["A", "B", "C"].async
  /// let interspersed = input.interspersed(with: "-")
  /// for await element in interspersed {
  ///   print(element)
  /// }
  /// // Prints "A" "-" "B" "-" "C"
  /// ```
  ///
  /// - Parameter separator: The value to insert in between each of this async
  ///   sequence’s elements.
  /// - Returns: The interspersed asynchronous sequence of elements.
  @inlinable
  public func interspersed(with separator: Element) -> AsyncInterspersedSequence<Self> {
    AsyncInterspersedSequence(self, separator: separator)
  }
}

Detailed design

The bulk of the implementation of the new interspersed method is inside the new
AsyncInterspersedSequence struct. It constructs an iterator to the base async sequence
inside its own iterator. The AsyncInterspersedSequence.Iterator.next() is forwarding the demand
to the base iterator.
There is one special case that we have to call out. When the base async sequence throws
then AsyncInterspersedSequence.Iterator.next() will return the separator first and then rethrow the error.

Below is the implementation of the AsyncInterspersedSequence.

/// An asynchronous sequence that presents the elements of a base asynchronous sequence of
/// elements with a separator between each of those elements.
public struct AsyncInterspersedSequence<Base: AsyncSequence> {
  @usableFromInline
  internal let base: Base

  @usableFromInline
  internal let separator: Base.Element

  @usableFromInline
  internal init(_ base: Base, separator: Base.Element) {
    self.base = base
    self.separator = separator
  }
}

extension AsyncInterspersedSequence: AsyncSequence {
  public typealias Element = Base.Element

  /// The iterator for an `AsyncInterspersedSequence` asynchronous sequence.
  public struct Iterator: AsyncIteratorProtocol {
    @usableFromInline
    internal enum State {
      case start
      case element(Result<Base.Element, Error>)
      case separator
    }

    @usableFromInline
    internal var iterator: Base.AsyncIterator

    @usableFromInline
    internal let separator: Base.Element

    @usableFromInline
    internal var state = State.start

    @usableFromInline
    internal init(_ iterator: Base.AsyncIterator, separator: Base.Element) {
      self.iterator = iterator
      self.separator = separator
    }

    public mutating func next() async rethrows -> Base.Element? {
      // After the start, the state flips between element and separator. Before
      // returning a separator, a check is made for the next element as a
      // separator is only returned between two elements. The next element is
      // stored to allow it to be returned in the next iteration. However, if
      // the checking the next element throws, the separator is emitted before
      // rethrowing that error.
      switch state {
        case .start:
          state = .separator
          return try await iterator.next()
        case .separator:
          do {
            guard let next = try await iterator.next() else { return nil }
            state = .element(.success(next))
          } catch {
            state = .element(.failure(error))
          }
          return separator
        case .element(let result):
          state = .separator
          return try result._rethrowGet()
      }
    }
  }

  @inlinable
  public func makeAsyncIterator() -> AsyncInterspersedSequence<Base>.Iterator {
        Iterator(base.makeAsyncIterator(), separator: separator)
  }
}

JuneBash · May 10, 2023, 2:18pm

FYI I'm getting a 404 for the source link, and "Tests" links to TestLazy.swift.

As for the proposal, I thought this already existed, so I'm all for it!

FranzBusch · May 10, 2023, 2:19pm

Thanks for catching this. It should now work!

The implementation already existed since the early days of the project; however, we never got around to formally pitch it.

Karl · May 10, 2023, 2:26pm

Can you explain why this is useful/desirable?

FranzBusch · May 10, 2023, 2:40pm

This is a good point. I was wondering the same when drafting up the proposal. @Philippe_Hausler can you chime in here and explain the reasoning for why we return the separator first and the re throw in this particular case.

Philippe_Hausler · May 10, 2023, 2:56pm

Re-looking at the code; It could throw immediately - however the output may look delayed in that case since it has to prime the next iteration.

I could imagine an alteration that makes it throw at the same throwing point, but that might take some fixing.

davedelong · May 10, 2023, 3:19pm

Interspersing is important, but this single addition is insufficient:

public func interspersed(with separator: Element) → AsyncInterspersedSequence<Self>

This signature assumes that Element is a value type and is clone-able via copy-on-write. If, on the other hand, you're dealing with reference types, this will end up interspersing the same allocated value into the final sequence.

For example: If you're dealing with things like UIViews and wanting to intersperse separators between sections, then you've now interspersed the same UIView between your sections, which means it will only end up once in the final view hierarchy.

So, there needs to also be a version of this that accepts a () → Element closure so it can generate the interspersed elements.

(FWIW, Swift-Algorithms also has this flaw)

stevapple · May 10, 2023, 3:36pm

I wish that be () async -> Element. Not sure if throwing makes sense here. The proposed API could be a thin wrapper over it.

pyrtsa · May 10, 2023, 3:59pm

One signature would be enough if it was with an @autoclosure () -> Element argument.

davedelong · May 10, 2023, 4:02pm

When I've implemented this in the past in my own extensions, I start with the @autoclosure version and invariably end up splitting it out into two functions. @autoclosure is great if I have a single-line constructor that I want to use (such as Divider()), but:

it cannot be used for more complex constructors that rely on constructing intermediate state/initializer parameters
it cannot accept an actual closure as a parameter

In general I've found @autoclosure to be a nice thing for simplifying the odd API here and there, but it tends to not hold up very well when building a comprehensive API surface.

Philippe_Hausler · May 10, 2023, 8:18pm

The one issue with the closure there is it might need to be Sendable.

mtsrodrigues · May 11, 2023, 12:26pm

I know it's not that simple but I wish T and () -> T where just interchangeable.

scanon · May 11, 2023, 1:19pm

I think I want the API to be able to intersperse a separator every n elements (defaulting to 1), rather than only every other element. I expect "every other element" to account for like 98% of use, so I wouldn't do this if it added any additional complexity, but it doesn't, so my gut feeling is that we should include it.

FranzBusch · June 1, 2023, 1:19pm

Thanks for all the great feedback here. I opened a PR that integrates all of the feedback. Here is a quick summary:

Change the trailing separator behaviour. We are no longer returning a separator before we are forwarding the error
Add a synchronous and asynchronous closure based interspersed method.
Support interspersing every n elements

I would love to get some feedback on the updated proposal. Especially on:

The naming of the every parameter
The closures are not throwing because we can't easily make them rethrows and this would make this sequence always throwing. To support re throws we probably have to introduce another generic parameter to this type.

Philippe_Hausler · June 1, 2023, 3:33pm

every is a decent name - it does what it says on the tin and provides natural language of the method; but swift does have a similar nomenclature for things like that - stride.

In this case the API ends up bifurcating to two types - one for throwing closures and one for non-throwing closures. From a usage standpoint it should be isomorphic to users but from the type system it needs to have concrete awareness that the closure participates in the error-ness of the iteration.

FranzBusch · June 1, 2023, 4:20pm

Right, we can totally do this by having a AsyncThrowingInterspersedSequence. Should we include it in this proposal?

Philippe_Hausler · June 1, 2023, 4:38pm

Yea I think it is a direct consequence of the closure - that should support failure modes and therefore we need throwing variant of the sequence.

FranzBusch · June 1, 2023, 5:19pm

Updated the PR to also include the throwing closure variants + the backing AsyncThrowingInterspersedSequence. I will keep the PR open for a bit so that we can get more feedback in here.

davedelong · June 2, 2023, 3:44pm

Thanks for the update! I think it looks really good.

I noticed that the three methods have the same example snippet in their doc comments, even though they're different. IE, they all show:

let input = ["A", "B", "C"].async
let interspersed = input.interspersed(with: "-")

... even for the variations that take a closure. If that's "the way" docs are done then so be it, but I think it'd be better if each method had a different example, so it's more obvious from the docs why you'd want to choose one variant over another.

FranzBusch · June 2, 2023, 3:47pm

Ah nope good catch. I will update those. It is a bit hard to come up with sensible examples for the async and throws variants. Might just stick to something very basic there or re-use the regular closure based example.