Follow-up on AsyncSequence and parity with Sequence based APIs

I agree: forEach, and parity with Sequence in general, is important for AsyncSequence's ergonomics and adoption.

IIRC, there was resistance to adding forEach on Sequence in the first place, due to purity concerns in the "functional" APIs. That is, people wanted such APIs to remain for transforming values, leaving the looping to for. I think history has shown this to be pretty wrong. APIs which force users to switch between paradigms or contexts, for lack of a better term, are always sources of friction. This friction either frustrates the users on a regular basis or, for the more motivated, pushes the user into creating equivalent APIs in their context. We saw this with forEach and Result. I think we still see it in the Optional vs. throwing context. And we would see it here if there weren't an equivalent looping API.
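For illustration, a looping API with Sequence-style ergonomics could be a thin wrapper over for-await. This is only a sketch: the name forEach and the throws-based signature are assumptions here, not a pitched API.

```swift
extension AsyncSequence {
    /// Hypothetical looping API mirroring Sequence.forEach; a sketch, not a pitch.
    /// Declared `throws` because iterating a generic AsyncSequence may throw,
    /// independent of whether `body` throws.
    func forEach(_ body: (Element) async throws -> Void) async throws {
        for try await element in self {
            try await body(element)
        }
    }
}
```

With this, a chained pipeline can end in a call rather than forcing a context switch back to an imperative loop.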

10 Likes

Completely agree. It’s the same reason people who use FRP libraries like Combine are more likely to stick to Result-based error propagation instead of throwing functions: it saves them from switching between Result-based Combine code and throwing code, which are two significantly different styles.

Switching between the two creates unnecessary friction. It’s the same when switching between imperative control flow, such as for-in, if, or while, and functional chained methods.

This is essentially what’s happening when using chained methods on an AsyncSequence combined with an imperative for-await loop.

1 Like

Something like collect looks useful, but I wonder if it is a special case of a more general operation that gathers or batches groups of items. map is great when you want to process each element, but a standardised way of grouping could often make sense.

That grouping might produce variable-length groups, e.g. converting a byte sequence into a line sequence, or fixed-size groups, e.g. converting a byte sequence into arrays of 8 UInt8s that can each be converted into a UInt64.

There is perhaps a second question of whether the batch operation should take a closure to apply to each batch, or whether a map operation should simply be applied afterwards instead.

There may also have to be an option for handling a partial final batch, such as whether to emit a smaller last batch or to throw an error.

Not sure if there is an existing name for this. Thinking from scratch I think batch or partialReduce could be suitable.
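As a sketch of the fixed-size case: the name batch, the AsyncThrowingStream-based shape, and the choice to emit a smaller final batch are all assumptions for illustration, not an existing or pitched API.

```swift
extension AsyncSequence {
    /// Hypothetical fixed-size batching; `batch` is an illustrative name only.
    /// A partial final batch is emitted smaller rather than throwing.
    func batch(size: Int) -> AsyncThrowingStream<[Element], Error> {
        AsyncThrowingStream { continuation in
            let task = Task {
                var current: [Element] = []
                do {
                    for try await element in self {
                        current.append(element)
                        if current.count == size {
                            continuation.yield(current)
                            current.removeAll(keepingCapacity: true)
                        }
                    }
                    // Emit the smaller partial batch, if any, at the end.
                    if !current.isEmpty { continuation.yield(current) }
                    continuation.finish()
                } catch {
                    continuation.finish(throwing: error)
                }
            }
            continuation.onTermination = { _ in task.cancel() }
        }
    }
}
```

The variable-length case (e.g. bytes into lines) would need a predicate or delimiter parameter instead of a count, which is one reason a single general operation is hard to pin down.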

There are variants of collect in other FRP systems that collect by count, by time, or by count and time. The pitch does not cover those; they would belong to a totally different set of pitched functionality. The pitched function belongs to the family of reducers, in that it reduces by appending into an empty array until completion and returns that reduced array when successfully done.
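In that reducer framing, collect could be sketched on top of the existing AsyncSequence reduce(into:_:); the name collect matches the pitch, but this particular implementation is an assumption.

```swift
extension AsyncSequence {
    /// Sketch of the pitched `collect` as a reducer: append into an empty
    /// array until the sequence completes, then return it.
    func collect() async throws -> [Element] {
        try await reduce(into: []) { $0.append($1) }
    }
}
```

Count- and time-based variants would need extra state (a counter, a clock) threaded through the reduction, which is why they are a separate, larger design.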

1 Like

Broadly, I agree; Swift is designed to support multiple programming styles.

But that goes both ways - even though many of the operations on AsyncSequence seem to assume or strongly encourage a functional style, that seems to be at odds with structured concurrency, which is all about writing concurrent code without forcing a functional style.

What that means is that if you want to write

let myData = someSource.map { ... }
                       .concat(anotherSource.filter { ... })
                       .collect()
                       .sorted()

like you might in RxSwift, Combine, or any similar framework, that's fine. But structured concurrency also lets you write:

var myData = await Array(someSource.map { ... })
await myData.append(contentsOf: anotherSource.filter { ... })
myData.sort()

And that is what I think is really missing right now. We should go through every standard library API which accepts a Sequence, and add AsyncSequence variants for as many of them as possible. Given the time constraints, many of them will probably have naive implementations for the time being, but at least you'll be able to write code like the above.
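One of the most obvious of those variants, an async counterpart to Array.init(_: some Sequence), could look something like the following. This is a sketch of a possible (possibly naive) implementation, not an existing standard library API.

```swift
extension Array {
    /// Hypothetical async counterpart to Array.init(_: some Sequence).
    /// Declared `throws` because iterating a generic AsyncSequence may throw.
    init<Source: AsyncSequence>(_ source: Source) async throws
        where Source.Element == Element
    {
        self.init()
        for try await element in source {
            append(element)
        }
    }
}
```

With this in place, the first line of the imperative example above becomes valid code; append(contentsOf:) and friends would each need a similar overload.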

Yeah, I think it makes sense to do both.

This is an interesting point; I had just assumed the _Concurrency library was a sort of staging area while the feature was being developed, and that it would all be merged into the standard library later. Is that not the case?

Because if the library barrier stops us adding first-class support for concurrency to standard library protocols, that's a big problem.

1 Like

@Douglas_Gregor do you know if we have ways to accomplish this? I don’t know the details of that barrier between the standard library and concurrency.

1 Like

We don't have good ways to accomplish this, beyond what can be added to existing types/protocols via extensions.

The _Concurrency module remains separate because that is one of several things we needed to do to make backward deployment technically possible. We'd like to collapse the _Concurrency module down into the standard library at some point, because it'll allow tighter integration.

Doug

4 Likes

Sorry to bring this thread back up; I didn't want to do this in the Request to amend AsyncSequence thread, since that's focused on the erasure of the error type by the AsyncIteratorProtocol. Is adding a forEach(_:) method to AsyncSequence still something we want, for parity with Sequence and for ergonomics? Rereading the discussion above, it seems like most people agree.

(Bumping this because IMO it's still an important missing feature)

So we can't use types like AsyncSequence, but can we use async closures in the standard library? IIUC that's a language feature.

Because if so, I think we should try to add versions of [Contiguous]Array.init(unsafeUninitializedCapacity) and String.init(unsafeUninitializedCapacity) where the initializing closure is allowed to suspend. This requires access to implementation details of these types, so it would either have to live in the standard library or use some kind of SPI to expose details to the _Concurrency module.

This would at least allow us to implement some basic features, such as a concurrent map, without temporary allocations. Even if they're added later, having these core functions in the standard library could help ensure they are back-deployable.
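For comparison, here is what a naive concurrent map looks like without those initializers: it has to allocate temporary optional storage to reassemble results in input order. The function name and shape are illustrative assumptions, not a proposed API.

```swift
/// Naive concurrent map over a Sequence, preserving input order.
/// Illustrative only - the [T?] buffer is exactly the temporary allocation
/// an async-capable unsafeUninitializedCapacity initializer would avoid.
func concurrentMap<S: Sequence, T: Sendable>(
    _ input: S,
    _ transform: @escaping @Sendable (S.Element) async throws -> T
) async throws -> [T] where S.Element: Sendable {
    try await withThrowingTaskGroup(of: (Int, T).self, returning: [T].self) { group in
        var count = 0
        for (index, element) in input.enumerated() {
            count += 1
            // Each child task tags its result with the input index.
            group.addTask { (index, try await transform(element)) }
        }
        // Temporary optional buffer; results arrive in completion order.
        var results = [T?](repeating: nil, count: count)
        for try await (index, value) in group {
            results[index] = value
        }
        return results.map { $0! }
    }
}
```

With a suspending initializing closure, the child-task results could instead be written directly into the array's uninitialized storage.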

It doesn't really matter that it's a language feature; it still depends on the concurrency runtime for, e.g., task allocation. There might be technical solutions here short of collapsing the concurrency runtime into the standard library (e.g., forcing weak-linking of the necessary symbols), but they'll require some fiddling with the compiler.

We'll need to sort all of this out to make good use of reasync, which I suspect should go onto just about everything in the standard library that currently has rethrows.

Doug

3 Likes