Swift Async Algorithms Proposal: Zip

Philippe_Hausler · April 18, 2022, 11:03pm

To kick off the reviews set forth by the roadmap that @Tony_Parker posted; this is perhaps one of the fundamental algorithms provided so far. Just because this has an implementation already, please treat it as if it was a full review of a feature. The implementation of course can still change since we have left ourselves with the leeway of having a resilient boundary (plus before 1.0 we can still change things if we see fit). As we discuss these proposals; some process changes may occur to streamline things. I hope that this will serve as a template for us to make some great forward progress together!

Zip

Proposal: SAA-NNNN
Authors: Philippe Hausler
Status: Implemented
Implementation: [Source, Source | Tests]
Decision Notes:
Bugs:

Introduction

The swift standard library has a function that allows for the combining of two sequences into one sequence of tuples of the elements of the base sequences. This concept can be achieved for AsyncSequence with the iteration being asynchronous but also each side being concurrently iterated while still rethrowing potential failures. This proposal covers that parity between AsyncSequence and Sequence. It is often times useful to describe asynchronous sequences of events as paired occurrences. The fundamental algorithm for this is zip.

Detailed Design

Zip combines the latest values produced from two or more asynchronous sequences into an asynchronous sequence of tuples.

let appleFeed = URL(string: "http://www.example.com/ticker?symbol=AAPL")!.lines
let nasdaqFeed = URL(string: "http://www.example.com/ticker?symbol=^IXIC")!.lines

for try await (apple, nasdaq) in zip(appleFeed, nasdaqFeed) {
  print("APPL: \(apple) NASDAQ: \(nasdaq)")
}

Given some sample inputs the following zipped events can be expected.

Timestamp	appleFeed	nasdaqFeed	combined output
11:40 AM	173.91
12:25 AM		14236.78	AAPL: 173.91 NASDAQ: 14236.78
12:40 AM		14218.34
1:15 PM	173.00		AAPL: 173.00 NASDAQ: 14218.34

This function family and the associated family of return types are prime candidates for variadic generics. Until that proposal is accepted, these will be implemented in terms of two- and three-base sequence cases.

public func zip<Base1: AsyncSequence, Base2: AsyncSequence>(_ base1: Base1, _ base2: Base2) -> AsyncZip2Sequence<Base1, Base2>

public func zip<Base1: AsyncSequence, Base2: AsyncSequence, Base3: AsyncSequence>(_ base1: Base1, _ base2: Base2, _ base3: Base3) -> AsyncZip3Sequence<Base1, Base2, Base3>

public struct AsyncZip2Sequence<Base1: AsyncSequence, Base2: AsyncSequence>: Sendable
  where
    Base1: Sendable, Base2: Sendable,
    Base1.Element: Sendable, Base2.Element: Sendable,
    Base1.AsyncIterator: Sendable, Base2.AsyncIterator: Sendable {
  public typealias Element = (Base1.Element, Base2.Element)

  public struct Iterator: AsyncIteratorProtocol {
    public mutating func next() async rethrows -> Element?
  }

  public func makeAsyncIterator() -> Iterator
}

public struct AsyncZip3Sequence<Base1: AsyncSequence, Base2: AsyncSequence, Base3: AsyncSequence>: Sendable
  where
    Base1: Sendable, Base2: Sendable, Base3: Sendable
    Base1.Element: Sendable, Base2.Element: Sendable, Base3.Element: Sendable
    Base1.AsyncIterator: Sendable, Base2.AsyncIterator: Sendable, Base3.AsyncIterator: Sendable {
  public typealias Element = (Base1.Element, Base2.Element, Base3.Element)

  public struct Iterator: AsyncIteratorProtocol {
    public mutating func next() async rethrows -> Element?
  }

  public func makeAsyncIterator() -> Iterator
}

The zip(_:...) function takes two or more asynchronous sequences as arguments with the resulting AsyncZipSequence which is an asynchronous sequence.

Each iteration of an AsyncZipSequence will await for all base iterators to produce a value. This iteration will be done concurrently to produce a singular tuple result. If any of the base iterations terminates by returning nil from its iteration, the AsyncZipSequence iteration is immediately considered unsatisfiable and returns nil and all iterations of other bases will be cancelled. If any iteration of the bases throws an error, then the other iterations concurrently running are cancelled and the produced error is rethrown, terminating the iteration.

AsyncZipSequence requires that the iterations are done concurrently. This means that the base sequences, their elements, and iterators must all be Sendable. That makes AsyncZipSequence inherently Sendable.

The source of throwing of AsyncZipSequence is determined by its bases. That means that if any base can throw an error then the iteration of the AsyncZipSequence can throw. If no bases can throw, then the AsyncZipSequence does not throw.

Naming

The zip(_:...) function takes its name from the Swift standard library function of the same name. The AsyncZipSequence family of types take their name from the same family from the standard library for the type returned by zip(_:_:). The one difference is that this asynchronous version allows for the affordance of recognizing the eventual variadic generic need of expanding a zip of more than just two sources.

It is common in some libraries to have a ZipMap or some other combination of zip and map. This is a common usage pattern, but leaving a singular type for composition feels considerably more approachable.

Comparison with other libraries

Swift The swift standard library has an API definition of zip as a top level function for combining two sequences.

ReactiveX ReactiveX has an API definition of Zip as a top level function for combining Observables.

Combine Combine has an API definition of zip as an operator style method for combining Publishers.

Effect on API resilience

`@frozen` and `@inlinable`

These types utilize rethrowing mechanisms that are awaiting an implementation in the compiler for supporting implementation based rethrows. So none of them are marked as frozen or marked as inlinable. This feature (discussed as rethrows(unsafe) or rethrows(SourceOfRethrowyness) has not yet been reviewed or implemented. The current implementation takes liberties with an internal protocol to accomplish this task. Future revisions will remove that protocol trick to replace it with proper rethrows semantics at the actual call site. The types are expected to be stable boundaries to prevent that workaround for the compilers yet to be supported rethrowing (or TaskGroup rethrowing) mechanisms. As soon as that feature is resolved; a more detailed investigation on performance impact of inlining and frozen should be done before 1.0.

Alternatives considered

It was considered to have zip be shaped as an extension method on AsyncSequence however that infers a "primary-ness" of one AsyncSequence over another. Since the standard library spells this as a global function (which infers no preference to one side or another) it was decided that having symmetry between the asynchronous version and the synchronous version inferred the right connotations.

There are other methods with similar behavior that could be controlled by options passed in. This concept has merit but was initially disregarded since that would complicate the interface. Design-wise this is still an open question if having a "zip-behavior-options" parameter to encompass combining the latest values or zipping based upon a preference to a "primary" side or not is meaningful.

It is common to have a zip+map to create structures instead of tuples, however that was disregarded since that concept could easily be expressed by composing zip and map.

twittemb · April 19, 2022, 11:25am

Hi.

Thanks @Philippe_Hausler for the proposal.

I’d like to bring 2 topics regarding unit tests:

could we adopt a naming convention for unit test functions that clearly mentions the system under test, the expected result and the conditions of the test, like: ´test_zip_produces_bounded_sequence_when_first_sequence_is_longer()´?
As discussed in this PR, could we enforce having a timeline diagram (in the form of comments) in the tests involving combining sequences or chronological operations?

I think enforcing those requirements will bring consistency across the project, will ease the understanding for new comers that would like to make PRs and will help in detecting the lacks in the code coverage.

What do you think ?

Thanks again.

Philippe_Hausler · April 19, 2022, 4:36pm

We do need to balance the requirements for contributions with the quality to the tests - so having a full rule for that might inhibit folks who might otherwise contribute. I personally don't mind having some high standards for testing - in this case we have a touch over 98% coverage on zip's machinery.

I think it is a fair change and definitely an approachable contribution to open a bug to rename the methods for the tests. Likewise I think it is reasonable to have diagrams for the methods (and actually I would rather explore not making them comments but instead making them actually supported diagrams). Since the string based validation DSL does not currently support tuples it might be worthwhile to come up with some standard combination methods to take input characters and emit something recognizable. e.g. 🤍+🟩 = 💚.

I think overall the bar for any proposal is that it is tested relatively well for coverage of its behavior.

The key areas that we gain benefit from is making sure the API surface area is the most appropriate, answering questions about usability of the APIs, ensuring the behaviors are what we want, and that things feel at home in the language from a consistency standpoint.

To that last point, the question I have is: is zip really something different from zipLatest? or perhaps are they actually the same thing but with options to the behavior? Granted the implementation went the route of making them distinct, but I think that concept has merit to merge them into one thing.

twittemb · April 19, 2022, 6:02pm

About the last point, food for thoughts:

I think zip makes perfect sense and implies a « de facto » symmetrical consumption of every sequences (the analogy with a true zipper is to be kept)
as zip implies symmetry, zipLatest does not seem well suited for combineLatest where sequences can be consumed at different paces.
naming both functions zip with an arg (like an enum) for the strategy to apply would be nice for discoverability but: as previously said zip does not fit with combineLatest, and what about the return type?

As zipping is a subset operation in the realm of combining operations, an option could be to have combine(.zip, a, b) and combine(.latest, a, b), but the return type question remains. And I guess we want to keep ˋzip` as is in regard to its sync counterpart anyway.

About the diagrams with the DSL we might hit a wall. I guess using emojis for tuples with more than 2 parts will be difficult + + = ?

About variadic parameters, perhaps it is worth mentioning in the proposal (even if not implemented yet) that there will be limitations like not being able to zip heterogeneous async sequences.

I’m also interested in making a PR for renaming unit tests functions. I’ll try to address that in the coming days.

Philippe_Hausler · April 19, 2022, 6:13pm

We can modify that DSL to make it more approachable. Right now we have tokens used for -^|;'[] . Ostensibly we could add () to construct tuples and overload the , operator to mean delimiting tuple members.

That way zip could be:

validate {
  "a-    -b-    -c-    -|"
  "-1    --2    --3    -|"
  zip($0.inputs[0], $0.inputs[1])
  "-(a,1)--(b,2)--(c,3)-|"
}

Currently it is possible to do heterogeneous async sequences, I would expect variadic generics would allow for that affordance too. @codafi might be able to chime in here on that. From my experience with working on Combine with folks zips of more than 2 inputs are rare. I also bet that once we get some throws AsyncSequence<T> (or however that is spelled) that will make it even more easily accomplished.

Philippe_Hausler · April 27, 2022, 11:04pm

This proposal has run its course of a decent amount of time visible; there have been some positive indications and only minor implementation/testing issues brought up. Thanks @twittemb for addressing the naming of the testing - I think that will aide in making it more approachable.

*rummages through my desk drawer for a big rubber stamp of [APPROVED]

I want to make sure to have good air time for each of the initial implementation proposals but perhaps for the next I will have it be a bit shorter since they are not really horribly controversial (like this one) and we have a litany of algorithms to cover.

Please chime in if we take it too fast and need more time to respond etc.