Swift Async Algorithms: Design Guidelines

tcldr · November 26, 2022, 12:06pm

Hi all, I started work documenting some of the guidelines for asynchronous sequence algorithms that seem to be emerging as the project moves forward.

What's I've found particularly interesting is not just the similarities to other reactive programming libraries, but particularly the differences. These differences largely emerge from asynchronous sequences being a 'pull' based data structure tightly integrated in the wider context of Swift Concurrency, rather than the 'push' based semantics that exist in traditional reactive programming libraries.

With that novelty, comes a bit of discovery of what a 'well-formed' asynchronous sequence looks like. Here, I've tried to document some of the guidelines gathered from around the community – as well as couple of my own – which I hope can be used as a starting point for discussion.

Eventually, if people find it useful, perhaps it can be moved to the repo to a) give clients an idea of what they can expect by composing these algorithms, and b) help to bring new contributors up to speed.

Algorithm Design Guidelines

1.0 Introduction

Asynchronous sequences are powerful and flexible data structures that process values over time. By assembling asynchronous sequences into pipelines it is possible to model complex asynchronous behaviors succinctly and declaratively.

These benefits are only made possible through the application of a specification that all asynchronous sequences and their iterators must adhere to. Some of these properties can be enforced by Swift’s type system through conformance to AsyncSequence and AsyncIteratorProtocol. However, some properties cannot be verified by the Swift compiler, and in these cases it is the responsibility of the asynchronous sequence designer to ensure that they are upheld.

2.0 Specification

2.1 Asynchronous Iterators:

MUST conform to AsyncIteratorProtocol
MUST produce elements in the following order:
- return Element (0-n),
- throw Error(0-1)
- return nil (1-n)
In other words, an asynchronous iterator returns a series of zero or more Elements, with completion of the series signalled by returning nil or throwing an Error. After completion is signalled, all subsequent calls to next() MUST return nil.
MUST eventually account for cancellation; either by returning nil or throwing from next. This means that iteration of an AsyncSequence must be responsive to cooperative task cancellation but the cost of checking for cancellation or setting up a handler for cancellation does not need to be paid for from each cycle of iteration. Instead, it can be paid for where it makes sense; particularly at suspension points.
SHOULD transmit back pressure from consumer/s to producer/s. However, an asynchronous sequence and its iterator MAY specifically be designed to absorb and/or release back pressure, but this behavior MUST be documented.
SHOULD NOT conform to Sendable in general. However, if this behavior enables some specific functionality, i.e. an iterator designed to be shared amongst Tasks, this behavior is permitted.

2.2 Asynchronous Sequences

MUST conform to AsyncSequence
MUST produce asynchronous iterators that adhere to 2.1.
MAY be Sendable. However, if an asynchronous sequence serves to transform a base asynchronous sequence that is Sendable itself, it SHOULD also be Sendable where possible.
If an asynchronous sequence is Sendable, it MUST be safe for its iterators to co-exist across Tasks. This is because it is possible for more than one Task to iterate a Sendable asynchronous sequence at the same time.*
MAY emit a different series of elements for each vended iterator. It is up to individual sequences whether they produce iterators that emit the same elements to all consumers, or whether a different consumer will see an entirely different sequence of elements, so long as each iterator independently adheres to the iterator specification.

EDIT: *Removed to reflect discussion summary notes

FranzBusch · November 26, 2022, 1:50pm

I have a similar start of a document floating around. Currently, on vacation but will post it once I am back. Would love to collaborate to come up with a ruleset doc to make development of future algos easier.

FranzBusch · November 26, 2022, 2:03pm

I think this is not true and a rather dangerous statement. One thing that I put into my doc is explaining the different kind of shapes of AsyncSequences (root vs algo, unicast vs multicast) which IMO is super valuable.

I consider this statement wrong because unicast AsyncSequences MUST not have multiple iterators. This can either be implemented by fatalErroring when a second iterator is created or by just creating an empty iterator that returns nil right away.

tcldr · November 26, 2022, 2:15pm

Great! I thought there was probably something similar floating around but wanted to surface this now to clarify for myself and others what a well-formed asynchronous sequence looks like.

Yes, I agree. That does sound valuable and definitely something that should be documented.

I think that's one of the reasons I thought I'd put this up for discussion.

I don't think a fatalError is an acceptable outcome for a lot of public facing API. I think that the far more acceptable outcome is what you outline as your second solution, 'creating an empty iterator that returns nil right away'. This would satisfy 2.2.4.

FranzBusch · November 26, 2022, 2:36pm

I am torn on this one. We (the NIO) team opted to using a fatalError since it is a user error to try to iterate the sequence multiple times and just returning nil is very hard to track down and doesn’t surface the user error well enough IMO.

In the future, it might be possible to solve this with language features such as taking, but until then I still think fatalErroring here is better instead of silent failures

tcldr · November 26, 2022, 5:23pm

It's interesting that there is a common class of asynchronous sequence, those which fit your classification as root unicast sequences, for which iterating twice is a programmer error. I think AsyncStream is another for which iterating twice can result in a fatalError. I don't think that's ideal. It would be really strange if there were common Sequence (non async) types which trapped on second iteration.

I'm not saying programmers/libraries can never break these rules, and for a library like NIO it might make perfect sense, but I think we can probably come up with some guidelines for SAA that make snapping together a pipeline of asynchronous sequences at least as intuitive as the Rx or Combine equivalent. Even if that means creating a multicast algorithm whose only purpose is to assert on multiple iteration. At least that makes the behavior explicit.

I think that 'take' would possibly be only a partial solution for these types of asynchronous sequences, as an iterator could be created in one asynchronous context, and then 'taken' only to be re-iterated in another asynchronous context.

More generally, I do think there's something curious about these 'root unicast' algorithms. Honestly, my hunch is that they're not asynchronous sequences at all, but actually some kind of asynchronous iterator.

Rx and Rx derivatives have neatly side-stepped this issue in their public APIs through Subjects – no issue with multiple iteration of a source if it's multicast. But perhaps here there's an opportunity to do something different if we look at these sources through a different lens.

FranzBusch · November 26, 2022, 6:03pm

The problem for unicast sequences is not the iterator being reiterated after throwing/returning nil. This is a programmer error where just continuing to return nil is expected and the same as with Sequence.
The problem is rather the creation of multiple iterators. It might be that we can prevent this by making the function taking func makeAsyncIterator(). (No idea if this will work at all, we have to see how the other proposal are shaping up)

Karl · November 26, 2022, 6:11pm

Why? From the documentation:

A conforming sequence that is not a collection is allowed to produce an arbitrary sequence of elements in the second for -in loop.

Iterating a Sequence<Int> a second time might produce an empty sequence -- or it might produce an extremely long chain of random numbers. That can propagate through your program, exhausting memory, filling up storage, draining your user's batteries as it gets streamed out to the network, etc.

At least for synchronous sequences, iterating them a second time is a programmer mistake. Unfortunately we can't diagnose it, so we say its behaviour is unspecified. I would suggest that terminating is closer to ideal than any of the potential, legal outcomes listed above.

So what is the purpose of synchronous Sequence? So that you can write generic algorithms which only require a single iteration (such as contains), and that single implementation will be available both on iterators and collections. However, you must still only iterate a single time.

Something which supports multiple iteration, such as a file handle or network resource, seems to me like more of an "async source". There are lots of open design questions around such types. Async data streams typically involve non-local or ephemeral data, so it is difficult to obtain useful guarantees for building generic algorithms or data structures.

tcldr · November 26, 2022, 6:14pm

Yes, that's what I was trying to say. If you call makeAsyncIterator() on an asynchronous sequence in one async context, then 'take' the asynchronous sequence into another asynchronous context (via a Task for example), and call makeAsyncIterator() again in the new context, you'll potentially have two concurrent iterators regardless of whether the sequence was taken.

tcldr · November 26, 2022, 6:18pm

Yes, I'm arguing exactly the same thing, see 2.2.5.

What I'm saying is that an asynchronous sequence that is iterated multiple times probably wouldn't fatalError() in an ideal world.

FranzBusch · November 26, 2022, 6:18pm

I am implying here that taking means that the sequence is not usable anymore, I.e. you can only call makeAsyncIterator once. The sequence would also need to be a move-only type

tcldr · November 26, 2022, 6:20pm

Got it, that would certainly be an interesting route if it can indeed be expressed like that.

Karl · November 26, 2022, 6:30pm

Yeah I disagree. You should hope that it traps. It is definitely the ideal outcome.

Take this example, from the Sequence documentation:

for element in sequence {
    if ... some condition { break }
}

for element in sequence {
    // No defined behavior
}

That should scare the living daylights out of anybody who writes generic algorithms using Sequence.

Too many developers have this idea that trapping is the worst thing that can happen. Unpredictable behaviour should never be the preferable outcome.

tcldr · November 26, 2022, 6:42pm

I think you're fighting a different argument. I don't have an issue with trapping, generally. And I'm not trying to come up with blanket rules for the Swift developers who are creating their own asynchronous sequences. I'm talking about the SAA package specifically. If developers or library designers wish to design their own trapping asynchronous algorithms – that's fine.

I'm talking here about the SAA library specifically and creating a set of tools that allow programmers to snap together components and sequences in a way that feels at least as intuitive as other reactive programming libraries.

Karl · November 26, 2022, 6:52pm

I don't have any issue with developers snapping together components.

I do have an issue with the idea that they won't be given prompt notice when they compose those components incorrectly, and the suggestion that unpredictable behaviour is a more ideal outcome in that situation, or that being lax about such misuse leads to a more intuitive library.

That applies to this package, as well as it applies to any other library.

FranzBusch · November 26, 2022, 7:08pm

I think in general trapping is only really needed in root AsyncSequences since they decide what kind of castiness they have. Most other algorithms are just inheriting the castiness of their upstreams (unless the algo is about changing that property e.g. multicast())

However, I agree with @Karl here. Trapping is way better than undefined behavior or silent failures. Furthermore, Swift is quite big on this with bound checking etc.

tcldr · November 26, 2022, 7:17pm

The aim here is to foster predictability, so if anything here were to run counter to that I'd consider it a failure. I don't think I'd be alone in hoping that a Sequence or AsyncSequence that trapped on multiple iteration at least documented that behavior. Do you think that would be a fair assumption for a programmer to make?

I'm certainly not arguing for undefined behaviour. I'm not sure where that was suggested. But I do think asynchronous algorithms are falling short of the Rx ergonomics in this regard. I also think relating it to a bounds check is clearly a false comparison. A sequence has isEmpty or count to assess whether it can be accessed safely at a particular index. Asynchronous sequences have no equivalent methods/properties to assess whether or not they are safe to iterate.

Karl · November 26, 2022, 7:54pm

Well, if the documentation didn't mention it, I wouldn't assume that it was safe to iterate multiple times. It might be worth reiterating that in the type's documentation, but everybody has their own idea of what makes "good" documentation - I tend to like specifying these kinds of things, but others find it verbose.

Actually, Sequence doesn't have either of those. All it has it makeIterator().

I think the problem you're having is that synchronous Sequence is just kind of broken and doesn't compose, and AsyncSequence inherited most of that brokenness. I did mention those shortcomings during the review:

I think we need a better model for async data streams which isn't based on Sequence. AsyncIterator is fine - it is single-pass, no question - but when you get in to questions like multiple iteration, or what data you will see from those iterations, things get a lot murkier and we don't have a good model for them. Sequence is designed to not answer those questions, which means you must be conservative in your assumptions.

I'd also like to point out another important consideration - leaving gaps where you fail to enforce correct usage can lead users to depend on accidental results stemming from implementation details, which makes maintenance of the library much more difficult.

Unless the library authors are willing to document and guarantee that a particular async sequence will always allow multiple iterators to be created, the safest thing to do is fatalError to prevent it.

tcldr · November 26, 2022, 8:52pm

You're absolutely right, but generally I don't think an API would vend a non-Collection level Sequence if its iteration count could be unknown and its behavior on multiple iterations was undefined. In the async world, we don't yet have that option, unfortunately.

Thanks for sending this through, an interesting read that articulates some of the difficulties I've been having trying to reconcile the behaviour of asynchronous sequences. I would agree that AsyncIteratorProtocol better encapsulates that one-pass behaviour. My own assessment was that a 'root unicast' asynchronous sequence isn't a sequence at all, but a form of asynchronous iterator – so this echoes that idea in some ways.

It would be great to see what this might look like. On the consumption side, I think the model works quite well with for await _ in x { ... }, the transformation algorithms work relatively well with some question marks on priority inversion with multicasting, but the production side of things definitely feels like it could benefit from a more capable model.

I think agreeing a rule here either way would go some ways to helping people get intuition here. So if the best way of doing that is saying whether a root sequence will trap or not in the docs – I think that's all we can do. Non-root sequences will inherit the iterability of their base, so no need for any special callout in the docs here. But for root sequences, it sounds like the behavior will need to be called out one way or the other – which is less than ideal, but sounds like the best route forward absent of a better way of modelling things.

Outside of these guidelines: it's definitely making me lean towards the creation of some Subject-like multicasting root sequences as a way of smoothing out the intuition of the production side of things.

David_Smith · November 27, 2022, 12:17am

Regarding push vs pull, it may be useful to know that one of the really foundational things in both Combine and AsyncSequence was the realization that you can use buffers to turn push into pull, and use coroutines to turn pull into push. Registering a callback block and blocking a coroutine in next() are fundamentally the same operation. So I kinda reject the idea that it’s actually a pull model.

We spent a hilarious amount of time from 2012-2014 debating this before we figured out the symmetry I described above.