Swift Async Algorithms: Design Guidelines

That caught my attention. Would you mind providing some very minimal examples for those cases? I find myself running into bugs when working with back pressure mechanics. For example, flatMap works fairly differently with AsyncSequence than it does in RxSwift.

I think I’d need to see specifics, but maybe it’s the difference between “like a one-shot callback block” and “like a block that’s registered once and called repeatedly”?

That's an interesting perspective; this is kind of fundamental to rule 2.1.4.
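
Here's a minimal sketch of the flatMap difference (the names are invented for the example). AsyncSequence's flatMap drains each inner sequence to completion before it even asks the outer sequence for its next element, which is closer to Rx's concatMap than to RxSwift's flatMap, which subscribes to inner observables concurrently and merges their events as they arrive:

```swift
func inner(_ n: Int) -> AsyncStream<String> {
    AsyncStream<String> { continuation in
        continuation.yield("\(n)-a")
        continuation.yield("\(n)-b")
        continuation.finish()
    }
}

func flatMapDemo() async {
    let outer = AsyncStream<Int> { continuation in
        continuation.yield(1)
        continuation.yield(2)
        continuation.finish()
    }

    let flattened = outer.flatMap { inner($0) }
    for await value in flattened {
        // Prints 1-a, 1-b, 2-a, 2-b, strictly in that order: the consumer's pull
        // is the only thing driving the pipeline, so nothing runs ahead of it.
        print(value)
    }
}
```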

My own opinion is that it's going to be very difficult to reason about a pipeline that has multiple buffers (of the unbounded/latest/earliest variety) placed throughout its length.

The idea behind 2.1.4 is that whilst a pipeline that maintains back pressure from the point of consumption to the point of production can easily be converted into one that does not – through the placement of a buffer – the reverse is not true. Therefore, the algorithms for SAA should support back pressure as a default, and programmers can then compose the behavior they want either way. In other words, algorithm designers need to be mindful of the effect of placing a buffer in their algorithm.
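 
To make 2.1.4's premise concrete, here's a toy sequence (not taken from the guidelines) in which back pressure flows from the point of consumption to the point of production: all of the work happens inside next(), so an element is only ever produced when the consumer awaits one, and a slow consumer automatically throttles the producer.

```swift
struct Countdown: AsyncSequence {
    typealias Element = Int
    let start: Int

    struct AsyncIterator: AsyncIteratorProtocol {
        var current: Int
        mutating func next() async -> Int? {
            guard current > 0 else { return nil }
            defer { current -= 1 }
            print("producing \(current)")   // runs only in response to a pull
            return current
        }
    }

    func makeAsyncIterator() -> AsyncIterator {
        AsyncIterator(current: start)
    }
}

func countdownDemo() async {
    for await n in Countdown(start: 3) {
        print("consumed \(n)")
        try? await Task.sleep(nanoseconds: 100_000_000)   // a slow consumer slows production
    }
}
```

Drop an unbounded buffer between the two and that link is severed, and no downstream operator can restore it. That's why the conversion only goes one way.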

Going back to the push/pull analogy (if it holds), I'm not sure that you're converting from pull to push by placing a buffer; it's more like adding an additional source of 'pull' locomotion further up the pipeline. It's not too dissimilar to plumbing in that regard: if you place a bunch of pumps at different levels, you have to be careful not to cause a flood.

I think the natural polarity of the overall system is important, it does inform the design of the algorithms.

While your insights are generally applicable here, it is important to grasp that "placing a buffer" doesn't imply "abandoning backpressure". You can continue to maintain backpressure while using a buffer to invert a push model to a pull model. SwiftNIO exposes just such a type: NIOAsyncSequenceProducer. We needed this to deal with exactly the situation that @David_Smith discussed: NIO's ChannelPipeline has a push model, AsyncSequence has a pull model, so we had to turn the backpressure model around.

The mere presence of a buffer does not inherently make a pipeline more difficult to reason about. The presence of a buffer that is unbounded and doesn't exert backpressure most certainly does. That's not a reason not to do this, but it is a reason to take particular care.
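
To sketch the shape of that idea (and this is emphatically not NIOAsyncSequenceProducer's actual API, just a toy illustration of a bounded buffer that talks back to a push-style producer; all names here are invented):

```swift
actor BackpressuredBuffer<Element: Sendable> {
    enum PushResult { case keepProducing, stopProducing }

    private var buffer: [Element] = []
    private var producerPaused = false
    private let highWaterMark: Int
    private let lowWaterMark: Int
    private let requestMore: @Sendable () -> Void

    init(lowWaterMark: Int, highWaterMark: Int, requestMore: @escaping @Sendable () -> Void) {
        self.lowWaterMark = lowWaterMark
        self.highWaterMark = highWaterMark
        self.requestMore = requestMore
    }

    /// Called by the push-style producer for every element it generates.
    func push(_ element: Element) -> PushResult {
        buffer.append(element)
        if buffer.count >= highWaterMark { producerPaused = true }
        return producerPaused ? .stopProducing : .keepProducing
    }

    /// Called by the pull-style consumer (e.g. from an AsyncIteratorProtocol.next()).
    /// Returns nil when nothing is buffered; a real implementation would suspend instead.
    func pop() -> Element? {
        guard !buffer.isEmpty else { return nil }
        let element = buffer.removeFirst()
        if producerPaused && buffer.count <= lowWaterMark {
            producerPaused = false
            requestMore()   // the buffer exerts backpressure in both directions
        }
        return element
    }
}
```

The producer stops pushing when it sees .stopProducing and resumes when requestMore fires, while the consumer's pull rate is what ultimately drains the buffer, so demand still flows end to end even though the upstream half is push-based. (The part where an empty buffer suspends the consumer is elided to keep the sketch short.)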

The core distinction between push and pull streaming semantics is that pull semantics are impossible to implement unless you have a source transformation step or something thread-like you can block. Pull is a vastly superior programming model: it's easier to understand, users tend to naturally implement backpressure, and it matches our synchronous intuitions. The only problem is that if you don't have a powerful macro system or pervasive green threading, you cannot build such a system outside of the language itself. This is (part of) why Netty and NIO are both push-based: it is (or, in NIO's case, was) simply not possible to implement them as pull-based in their respective languages.
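
As a minimal illustration of the difference in programming model (the types here are invented for the example):

```swift
// Push: you register a handler once and the source calls it whenever it likes.
// Control is inverted, and slowing the producer down needs a separate mechanism.
final class PushSource {
    private var handlers: [(Int) -> Void] = []
    func onValue(_ handler: @escaping (Int) -> Void) { handlers.append(handler) }
    func emit(_ value: Int) { handlers.forEach { $0(value) } }
}

func consumePush(_ source: PushSource) {
    source.onValue { value in
        print(value)   // called at the producer's pace, ready or not
    }
}

// Pull: ordinary control flow. Finishing one loop iteration *is* the request for
// the next element, so backpressure falls out of the structure of the code.
func consumePull<S: AsyncSequence>(_ values: S) async throws where S.Element == Int {
    for try await value in values {
        print(value)   // next() is only called again once this iteration completes
    }
}
```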

I will disagree with @David_Smith though. While I agree entirely that push and pull can be transformed into one another, the experience of working in them is fundamentally different. Inversion of control is hell for programmers, and returning to standard control flow makes understanding code far easier. It also requires pull-based streaming. While you can always transform one into the other, that transformation doesn't make them identical.

That's definitely a fair critique; I should try to be more precise about the distinctions I'm making.

What to do about multiple buffers in a stream is something I spent a while thinking about in the past and never reached any truly satisfying conclusions on. It does seem like you want them to be "aware" of each other in some sense, which makes piecewise composition of streams by unrelated bits of code tricky.

Figuring out the optimal place for the buffer can be nontrivial as well. For example, if you have a network -> compressed image data -> decoded image pipeline, with a fast image decoder and not much RAM you probably want to buffer the compressed image data and decode it on the way out; but with more RAM and a slower decoder you probably want to buffer decoded images, so the decoder can get ahead during downtime.
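
Roughly like this, using the buffer(policy:) operator being discussed in the buffer proposal (the exact spelling there may differ, and the decoder and image types are stand-ins):

```swift
import Foundation
import AsyncAlgorithms   // assumed: provides the proposal's `buffer(policy:)` operator

struct DecodedImage {}                                // stand-in for a real image type
func decode(_ data: Data) async -> DecodedImage {     // stand-in for a real decoder
    DecodedImage()
}

func pipelines(network: AsyncStream<Data>) {
    // Fast decoder, little RAM: buffer the small compressed payloads,
    // and decode each one on its way out to the consumer.
    let bufferCompressed = network
        .buffer(policy: .bounded(32))
        .map(decode)

    // Slow decoder, plenty of RAM: decode eagerly during quiet periods
    // and buffer the large decoded images instead.
    let bufferDecoded = network
        .map(decode)
        .buffer(policy: .bounded(8))

    // Consumption elided; the point is only where the buffer sits.
    _ = bufferCompressed
    _ = bufferDecoded
}
```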

This is probably getting a bit off topic for this thread though, sorry. Better suited to the Buffer proposal, I think 🙂

I think that's where the distinction between a 'back pressure aware' buffer and a 'non back pressure aware' buffer comes in. In the buffer thread, one variation of these 'back pressure aware' buffers has been referred to as 'throughput' buffers. As @lukasa says, I don't think adding this kind of buffer makes a pipeline any more difficult to reason about.

It's the unbounded/earliest/latest variety you see in AsyncStream. There's nothing wrong with this kind of buffer; it just needs to be considered within context. I've seen people using AsyncStream as a surrogate type eraser for AsyncSequence. That'll cause a surprise one day!
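
Here's that surprise in concrete form, as a hypothetical helper rather than any real API. The bridging Task pulls from the base sequence as fast as it can, and AsyncStream's default policy buffers without bound, so whatever backpressure the original pipeline had is silently discarded at this point:

```swift
func eraseToStream<Base: AsyncSequence & Sendable>(
    _ base: Base
) -> AsyncStream<Base.Element> where Base.Element: Sendable {
    AsyncStream(Base.Element.self) { continuation in
        let task = Task {
            do {
                for try await element in base {
                    continuation.yield(element)   // never waits for the downstream consumer
                }
            } catch {
                // Swallowed: AsyncStream cannot finish by throwing, so the erased stream just ends.
            }
            continuation.finish()
        }
        continuation.onTermination = { _ in task.cancel() }
    }
}
```

Everything downstream of the erased stream no longer slows the upstream at all; elements simply pile up in the stream's default unbounded buffer, and any upstream error quietly becomes an early finish.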

This is precisely why I raised the issue. My argument would be that, while it won't alleviate all difficulties, if we can at least consider 2.1.4 when adding an algorithm to SAA, the overall system will be much easier to reason about. Snapping together a bunch of 'back pressure ready' components will be easier than dealing with a two-tier system where some algorithms support back pressure and some don't. (Of course, there's no issue with creating components which explicitly release back pressure, like the buffer described above; I just think their behavior should be intentional.)

In the end, the ability to do this requires understanding the difference between the pull and push models.

I will again note that you really don't need the buffers to be aware of each other. Networking is the canonical example, where every node in the network has buffers, all of which are unaware of each other. The key is that the buffers need to be capable of exerting backpressure on one another, such that they can appropriately rate-limit.

Thank you so much to everyone that contributed their thoughts to this thread. It’s been a really useful exercise. It’s certainly opened my eyes to some things that I hadn’t considered before, and I’m grateful for that.

Gathering the feedback from the comments above, there appears to be a broad consensus, with one notable exception:

  • 2.2.4: If an asynchronous sequence is Sendable, it MUST be safe for its iterators to co-exist across Tasks. This is because it is possible for more than one Task to iterate a Sendable asynchronous sequence at the same time.

The community felt that this rule was at odds with the existing aims of consistency with synchronous Sequence types, for which no specific multi-pass behaviour is defined. Specifically, the documentation for Sequence states: “[the Sequence] protocol makes no requirement on conforming types regarding whether they will be destructively consumed by iteration.” Therefore, this guideline has been removed in its entirety.
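
For context, the situation 2.2.4 described, reduced to a sketch (the generic helper here is hypothetical): nothing prevents two Tasks from each making their own iterator over the same Sendable asynchronous sequence. As with the synchronous Sequence protocol, whether that is safe, and what each Task then observes, is now left for the conforming type to document.

```swift
func iterateFromTwoTasks<Source: AsyncSequence & Sendable>(
    _ source: Source
) async where Source.Element: Sendable {
    await withTaskGroup(of: Void.self) { group in
        for _ in 0..<2 {
            group.addTask {
                var iterator = source.makeAsyncIterator()
                while let element = try? await iterator.next() {
                    print(element)   // which elements land here depends entirely on the conformance
                }
            }
        }
    }
}
```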

Thanks again, all, for your valuable input.
