[Concurrency] AsyncSequence

Thanks for writing out this idea @CTMacUser.

It seems that the core design point is that the 'collective' type is not just a collection but also a kind of lazy asynchronous buffer. I believe that, if we decided such a type is useful, we could build it on top of AsyncSequence in a manner like the existing protocol hierarchy.

I'd like to reiterate that a core idea of the AsyncSequence protocol is a focus on ease of adoption. That does leave something on the table, perhaps, vs a complete set of protocols that can be arbitrarily composed, but I think that the tradeoff towards simplicity is worth it.

1 Like

Hi @jack,

You are correct that awareness of the cost of iterating the async collection has to be part of writing code that uses it.

I imagine the implementation of our hypothetical URL.lines() function above would iterate the contents of the URL each time you called for/in on it. This does not preclude the possibility of some lower level intelligent cache, nor does it preclude the possibility of introducing a buffering async collection type in the future.

I'm not sure if an API like lines would be written to return such a type, though, because it seems to require a buffer that is the size of the file whether you want it or not (and assumes a finite file size). It might be something we would optionally compose with a sequence-based lines API.

If I understand correctly (a big if), we can mark the cancel function as __consuming, which will allow us to more smoothly transition into move-only implementations of future AsyncIterators, once move-only types are finished. I imagine we will need to figure out a transition story for all iterators at that time, including the existing Sequence API, correct?

2 Likes

The main point of the design I gave is that AsyncCollective (or whatever the name will be) can't be built on top of AsyncSequence; the abstract sub-typing is actually the other way! (AS refines ACve by imposing requirements for a fixed publication order and said order being semantically significant.) ABI stability means that it can't be fixed later.

The thing I'm primarily concerned about is the language/compiler (in this case, the implementation of the for/in loop) knowing about this style of cancelation, not about the protocol requirement. This notion of cancelation doesn't have anything to do with async - the same issue arises today when breaking out of a for-in loop over a Sequence iterator that has internal state.

What I'm observing here is that we have the ability to model this today for Sequences (it is just inefficient due to the extra class indirection) and that ownership will allow this to be efficient as well, all without extending the language or compiler. We have exactly the same situation with the async version of this, so it would be nice if we didn't have to add this wart to the compiler and language as a short term workaround.

Of course, with the compiler/language support dropped, it follows that it wouldn't have to be part of the protocol either, further increasing consistency here.

-Chris

6 Likes

Related to cancel/deinit:

Looking at the .NET equivalent to this, they have both a cancellation token and DisposeAsync. How does this design compare to that?

  1. If the cancel here could potentially be modelled as a deinit, is that equivalent to .NET’s Dispose? Should cancel be cancel() async?

  2. Is there an equivalent to the cancellation token, where the source of the items going into the AsyncSequence can be stopped? E.g., cancelling the network request from a separate thread/actor.

I think this would be a great addition to Swift. I've previously used for-await-of in JavaScript/Node, with IxJS to add collection operations to them and found it a very clean way to work with long streams of data without exhausting memory.

One issue that has come up in Node with this approach is the overhead of awaiting each iteration of the loop (see here for an example). Would Swift have the same problem? My guess is that it probably doesn't, because await only suspends if the thing generating the responses has to suspend, but it would be good to have that confirmed. One concrete example of this would be consuming a paged web service API. You would only have to suspend every 50 (for example) items in the list to wait for the next page of results.
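To make the paged case concrete, here is a minimal sketch written against the AsyncSequence and AsyncIteratorProtocol shapes as they later shipped in the standard library. PagedNumbers, fetchPage and sumAllPages are hypothetical names used only for illustration, and fetchPage stands in for a real network request. It shows where the iterator can suspend, and makes no claim about the runtime cost of each await.

```swift
// Hypothetical paged source: elements are served from an in-memory page,
// and only refilling the page goes through a call that may suspend.
struct PagedNumbers: AsyncSequence {
    typealias Element = Int

    let pageSize: Int
    let pageCount: Int

    struct AsyncIterator: AsyncIteratorProtocol {
        let pageSize: Int
        let pageCount: Int
        var nextPage = 0
        var buffer: [Int] = []
        var index = 0

        // Stand-in for a real request (an assumption, not a real API).
        func fetchPage(_ page: Int) async -> [Int] {
            await Task.yield()
            let start = page * pageSize
            return Array(start..<(start + pageSize))
        }

        mutating func next() async -> Int? {
            // Refill only when the current page is used up.
            while index == buffer.count {
                guard nextPage < pageCount else { return nil }
                buffer = await fetchPage(nextPage)
                nextPage += 1
                index = 0
            }
            defer { index += 1 }
            return buffer[index]
        }
    }

    func makeAsyncIterator() -> AsyncIterator {
        AsyncIterator(pageSize: pageSize, pageCount: pageCount)
    }
}

// Consumes three pages of 50 items each and sums them.
func sumAllPages() async -> Int {
    var total = 0
    for await n in PagedNumbers(pageSize: 50, pageCount: 3) {
        total += n
    }
    return total
}
```

In this sketch the call to next() is still awaited for every element, but the only await inside it that reaches outside the iterator is the page refill.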

I think this is a great suggestion, and I hope it does not go overlooked. It seems like a classic example of Swift taking existing ideas and removing the unnecessary legacy cruft. The word await does not yet have any preconceived meaning in Swift, so await ... in still seems like a natural construct.

From my reading of this, it seems that the way this would work is to wait for the first item in the sequence, then the second, then the third, and so on. Suppose I have an array of URLs, and I want to fetch the contents of them and turn them into Strings, and then perform some task with the resulting array of strings. Is there a plan for a way to iterate over these where each item can be performed concurrently? In this case I wouldn't care the order in which requests start or finish (though I would want the final array to be in the same order as the original).

Something like this:

let array: [URL] = ...

let result = await array.asyncMap { await fetchString(forURL: $0) }

// do something with result
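For comparison, one way such a helper could be sketched with the task group API as it later shipped in Swift 5.5 (withTaskGroup). The name concurrentMap is hypothetical and not a standard library method, and the example maps Ints rather than URLs to stay self-contained. Each element is transformed in its own child task, and the index is carried along so the output keeps the input order regardless of completion order.

```swift
extension Array where Element: Sendable {
    /// Hypothetical helper, not part of the standard library.
    /// Runs `transform` concurrently for every element and returns the
    /// results in the same order as the original array.
    func concurrentMap<T: Sendable>(
        _ transform: @escaping @Sendable (Element) async -> T
    ) async -> [T] {
        await withTaskGroup(of: (Int, T).self) { group in
            for (index, element) in self.enumerated() {
                group.addTask { (index, await transform(element)) }
            }
            // Child tasks finish in any order; slot each result by index.
            var results = [T?](repeating: nil, count: count)
            for await (index, value) in group {
                results[index] = value
            }
            return results.compactMap { $0 }
        }
    }
}

// Usage with a stand-in for the fetch in the post above.
func describeAll(_ ids: [Int]) async -> [String] {
    await ids.concurrentMap { id in "item \(id)" }
}
```

This starts one child task per element with no limit on concurrency; a real version would probably want to cap the number of in-flight tasks.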

Late to the party here, but I just wanted to give a big +1 to the concerns @Chris_Lattner3 is raising; this seems to desperately call for move-only types, and compromising the language/library (especially if it must be binary stable indefinitely into the future) seems somewhat short-sighted.

Thank you everyone for your hard work!

Maybe it's worth mentioning these names in the Alternatives Considered section.

Overall I really like this proposal. I would just like some clarifications. Firstly, I noticed you included the methods append(_:) and prepend(_:), which, if I'm not mistaken, are not part of today's Sequence protocol; is there a particular reason for this choice? Also, a lazy sequence's compactMap(_:) method currently has a return type LazyMapSequence<..., ElementOfResult> *, whereas in AsyncSequence it returns AsyncCompactMapSequence. I wonder if there's a reason for this or if it was done so as to fit in the table.

* The full return type is: LazyMapSequence<LazyFilterSequence<LazyMapSequence<Elements, ElementOfResult?>>, ElementOfResult>

The generic signatures of the return types were elided for brevity. AsyncCompactMapSequence would have a generic signature of AsyncCompactMapSequence<Upstream: AsyncSequence, Transformed>

I am not sure compactMap in terms of async sequences can be expressed as a composition of map + filter + map. Lazy and Async have similarities but are not 1-to-1 mappings.

1 Like

Hello all,

Thank you for your thoughtful feedback. I will attempt to consolidate some answers/updates/comments into one reply. I will also update the proposal at the link above to include this content shortly.

Append/Prepend

This was intended to be more like Array's + function. For now, to simplify the proposal, I've removed them.

Async Cancel

We think cancel should be synchronous, especially since the compiler will call it for you (see below for more on this). This is discussed in Alternatives Considered as well.

We will also update the proposal to suggest that deinit should be equivalent to cancel if it exists on the iterator.

Awaiting Many Things

A future enhancement (either in the standard library or a higher level library) could be to introduce a kind of buffering AsyncSequence. As an initializer argument it could be told how many things it should attempt to eagerly fetch and act as a kind of signal smoother.

We've investigated the plans for the overhead of the await keyword itself (not including any user code) and we believe it is low enough to not be an issue on its own.

Naming

We considered AsyncGenerator but would prefer to leave the Generator name for future language enhancements. Stream is a type in Foundation, so we did not reuse it here to avoid confusion.

await in

We considered a shorter syntax of await...in. However, since the behavior here is fundamentally a loop, we feel it is important to use the existing for keyword as a strong signal of intent to readers of the code. Although there are a lot of keywords, each one has purpose and meaning to readers of the code.

Add APIs to iterator instead of sequence

We went off and explored this idea in depth. It is certainly a compelling argument. However, we ultimately came back around to the decision that consistency with the existing Sequence APIs is the tradeoff decision we would like to propose.

We discussed applying the fundamental API (map, reduce, etc.) to the AsyncIterator protocol instead of AsyncSequence. There has been a long-standing (albeit deliberate) ambiguity in the Sequence API -- is it supposed to be single-pass or multi-pass? This new kind of iterator & sequence could provide an opportunity to define this more concretely.

While it is tempting to use this new API to right past wrongs, we maintain that the high level goal of consistency with existing Swift concepts is more important.

For example, for...in cannot be used on an Iterator -- only a Sequence. If we chose to make AsyncIterator use for...in as described here, that leaves us with the choice of either introducing an inconsistency between AsyncIterator and Iterator or giving up on the familiar for...in syntax. Even if we decided to add for...in to Iterator, it would still be inconsistent because we would be required to leave for...in syntax on the existing Sequence.

Another point in favor of consistency is that implementing an AsyncSequence should feel familiar to anyone who knows how to implement a Sequence.
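As an illustration of that familiarity, here is a small side-by-side sketch using the protocol shapes as they later shipped in the standard library. Countdown, AsyncCountdown and collectAsync are hypothetical example names. The two types differ only in the protocol names, the makeAsyncIterator spelling, and the async on next().

```swift
// A synchronous countdown, written the usual way.
struct Countdown: Sequence {
    let start: Int

    struct Iterator: IteratorProtocol {
        var remaining: Int
        mutating func next() -> Int? {
            guard remaining > 0 else { return nil }
            defer { remaining -= 1 }
            return remaining
        }
    }

    func makeIterator() -> Iterator { Iterator(remaining: start) }
}

// The asynchronous counterpart follows the same outline.
struct AsyncCountdown: AsyncSequence {
    typealias Element = Int
    let start: Int

    struct AsyncIterator: AsyncIteratorProtocol {
        var remaining: Int
        mutating func next() async -> Int? {
            guard remaining > 0 else { return nil }
            defer { remaining -= 1 }
            return remaining
        }
    }

    func makeAsyncIterator() -> AsyncIterator { AsyncIterator(remaining: start) }
}

// Iteration mirrors for...in, with await added.
func collectAsync(from start: Int) async -> [Int] {
    var out: [Int] = []
    for await n in AsyncCountdown(start: start) {
        out.append(n)
    }
    return out
}
```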

We are hoping for widespread adoption of the protocol in API which would normally have instead used a Notification, informational delegate pattern, or multi-callback closure argument. In many of these cases we feel like the API should return the 'factory type' (an AsyncSequence) so that it can be iterated again. It will still be up to the caller to be aware of any underlying cost of performing that operation, as with iteration of any Sequence today.

Move-only iterator and removing Cancel

We discussed waiting to introduce this feature until move-only types are available in the future. This is a tradeoff in which we look to the Core Team for advice, but the authors believe the benefit of having this functionality now has the edge. It will likely be the case that move-only types will bring changes to other Sequence and Iterator types when they arrive in any case.

Prototyping of the patch does not seem to indicate undue complexity in the compiler implementation. In fact, it appears that the existing ideas around defer actually match this concept cleanly. I've updated the proposal to show how this could work.

We have included a __consuming attribute on the cancel function, which should allow move-only iterators to exist in the future.

10 Likes

The proposed design treats await as a pattern:

for await let element in myAsyncSequence {
  doSomething(element)
}

Transformed into:

var it = myAsyncSequence.makeAsyncIterator()
while let element = await it.next() {
  doSomething(element)
}

I wonder if async let could gain the same treatment. @ktoso, would this fit with the proposed structured concurrency design? We would get a basic non-customizable parallel for..in loop having the same behavior we would get by writing an async let for each element of the sequence:

for async let element in myAsyncSequence {
  doSomething(await element)
}

Transformed into:

var it = myAsyncSequence.makeAsyncIterator()
await Task.withGroup(resultType: type(of: it).Element.self) { group in
  await group.add { await it.next() }
  await group.add { await it.next() }
  ...

  while let element = await group.next() {
    doSomething(element)
  }
}

4 Likes

Hi Tony,

Can you elaborate more on why this "has the edge"? There doesn't appear to be anything specific about async-ness in the idea of "closing off the iterator": shouldn't we add the same thing to iterator types if this is important, for consistency and to solve equivalent use cases for normal iterators?

Relatedly, I don't think "cancel" is the obviously right term here. Cancelation is a different thing that applies to tasks. This concept is something more akin to a close() or finalize() operation in GC'd systems. It is a separate-from-deinit pattern for closing things off.

-Chris

2 Likes

Hi Chris,

We've done some surveys of the APIs available in Apple's SDKs and it seems to be the case that most synchronous APIs (e.g., returning an Array) do not require some kind of cancellation whereas most asynchronous APIs do have some mechanism for cancellation. Examples of cancellable APIs from Foundation include timers, notifications, and KVO. I think the idea of supporting cancellation is part of what makes an async API different from a synchronous one.

Your suggestion of the terminology of cancellation vs closing here is interesting. Perhaps there is some explanation missing from the document. It's intended that cancellation is only invoked if the for loop is exited early. The compiler would not add a call to cancel when the iterator returns nil. What that means is that cancellation is a signal from the iteration to the iterator instead of a signal from the iterator to the iteration. The reason I write that it 'has the edge' is that I am convinced it is valuable to have this be an explicit thing that either the compiler or the code author can write, whereas relying on memory management has the potential to be a lot harder to make fully deterministic.
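A rough sketch of the early-exit rule described above, written by hand rather than taken from the proposal or the compiler. PitchedAsyncIterator, Ticker and runLoop are hypothetical names, the protocol only approximates the pitched shape (the __consuming attribute is omitted), and an explicit flag is used instead of defer to keep the control flow visible. The function returns whether cancel() ended up being called.

```swift
// Hypothetical protocol approximating the shape pitched in this thread.
protocol PitchedAsyncIterator {
    associatedtype Element
    mutating func next() async -> Element?
    mutating func cancel()
}

// Produces 1...limit and records whether cancel() was called.
struct Ticker: PitchedAsyncIterator {
    var current = 0
    let limit: Int
    var wasCancelled = false

    mutating func next() async -> Int? {
        guard current < limit else { return nil }
        current += 1
        return current
    }

    mutating func cancel() { wasCancelled = true }
}

// Hand-written stand-in for a loop of the form
//   for await x in ticker { if x == stopAt { break } }
func runLoop(limit: Int, stopAt: Int?) async -> Bool {
    var it = Ticker(limit: limit)
    var exhausted = false
    while true {
        guard let element = await it.next() else {
            exhausted = true   // iterator returned nil: no cancel
            break
        }
        if element == stopAt { break }   // early exit
    }
    if !exhausted { it.cancel() }
    return it.wasCancelled
}
```

With limit 3 and stopAt 2 the loop breaks early and cancel() runs; with stopAt nil the iterator is drained and cancel() is never called.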

Thanks for your thoughts and involvement on this!

3 Likes

Hi all,

Heads-up that we've scheduled the full review of this pitch as SE-0298 January 12...26, 2021.

Doug

7 Likes

Great, thanks for the feedback Tony!

2 Likes

... and the review has started over at SE-0298: Async/Await: Sequences, thank you everyone.

Doug

1 Like