SE-0298: Async/Await: Sequences

Note that this doesn't hold if next() throws:

If next() throws an error, then iteration also ends and the compiler does not insert a call to cancel() .
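
Here is a rough, self-contained sketch (not the actual compiler transformation) of the loop semantics described above, using a made-up iterator type that follows the proposal's shape of next() plus cancel():

struct Lines {
    var remaining = ["a", "b", "c"]
    mutating func next() async throws -> String? {
        remaining.isEmpty ? nil : remaining.removeFirst()
    }
    mutating func cancel() { remaining = [] }
}

func consume(stopEarly: Bool) async throws {
    var iterator = Lines()
    while let line = try await iterator.next() {
        if stopEarly {
            iterator.cancel() // inserted for early exits such as `break`
            break
        }
        print(line)
    }
    // next() returning nil ends the loop normally: no cancel() is inserted.
    // If next() throws, the error propagates out of this function and,
    // per the text above, no cancel() call is inserted either.
}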


Are you convinced that this is the best we can do for a cancel() API?

It leaves us in a position where the core use case that justifies cancel() not being interchangeable with deinit requires users to read the documentation of an iterator type to find out about an exception to the semantics of AsyncIterator.

It would seem to me that it would be sensible instead to have AsyncIterator not support a cancel() API at all in that case, with the type in your use case vending an async-iterable view and, separately, its own custom cancellation API.

3 Likes

I'm not really convinced by the whole "async" story, but that would be tilting at windmills...
However, since everyone is talking about cancel now: I'm not sure this is actually the best name for the action. I'm no expert in natural languages, but something like "stop" would be a better fit for me.
"Stop" implies that something is already happening, whereas "cancel" is often used in a context where the action didn't even start yet ("stop the train" vs. "the flight was cancelled").

I think the best choice would be break(), which draws a connection to regular loops — but for the exact same reason, it is currently forbidden :joy: (unless you add backticks).
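
For reference, a tiny declaration sketch of the backtick escaping mentioned above (the protocol name here is made up):

// Keywords can serve as member names when escaped with backticks,
// so a requirement spelled `break`() is technically possible:
protocol AsyncStopping {
    mutating func `break`()
}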

You're right, I forgot that case. So I should've written:

  • iterator runs to completion (next() returns nil)
  • iterator throws an error (next() throws)
  • cancel() is called

The key is that the iterator itself knows when to initiate resource teardown whilst it is still a fully live object.


I don't know if cancel() is the very best we can possibly do. However, I'm absolutely convinced that relying on deinit will lead to terrible outcomes, and cancel() is definitely good enough.

I disagree with this. AsyncIterator ought to support iterating over things that require non-trivial kernel resources (network sockets will require eventing (such as DispatchSources) to work in an async world; these cannot be cancelled synchronously). cancel() is IMHO currently the best suggestion, and so much better than doing nothing and leaving cancel() out.
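
To illustrate the "cannot be cancelled synchronously" point, here is a minimal sketch using real Dispatch API, assuming `fd` is an already-open socket descriptor:

import Dispatch

// Cancelling a DispatchSource is asynchronous: the cancel handler runs
// later on the queue, once in-flight event handlers have finished, so a
// synchronous cancel() cannot observe that teardown has completed.
func makeSocketSource(fd: Int32, queue: DispatchQueue) -> DispatchSourceRead {
    let source = DispatchSource.makeReadSource(fileDescriptor: fd, queue: queue)
    source.setEventHandler {
        // read available bytes from the socket
    }
    source.setCancelHandler {
        // only here is it safe to close(fd); this runs asynchronously,
        // some time after cancel() returns.
    }
    source.resume()
    return source
}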

I understand the desire, but this cancel() API explicitly doesn’t support such asynchronous cancellation, so why should it be accepted as-is? Rejecting it doesn’t mean never having a cancel() API, just not this one.

3 Likes

As proposed, it is. But we have had language features, like static callable in the form of callAsFunction(), that look for matching declarations on the expression result without tying the sugar to a named protocol in the Standard Library.

A potential alternative design for for await ... in ... could work the same way, by looking up a usable func makeIterator<U: AsyncIteratorProtocol>() -> U declaration on the result of the given expression.
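
A hedged sketch of what such a structural design might accept (names are illustrative; this is not what the proposal specifies):

// A type that never conforms to a named AsyncSequence protocol, but vends
// a declaration of the shape the hypothetical lookup would search for:
struct Countdown {
    struct Iterator: AsyncIteratorProtocol {
        var remaining: Int
        mutating func next() async -> Int? {
            guard remaining > 0 else { return nil }
            remaining -= 1
            return remaining
        }
    }
    // `for await ... in Countdown()` would be sugar over this member,
    // found by name and shape rather than via a protocol conformance.
    func makeIterator() -> Iterator { Iterator(remaining: 3) }
}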

1 Like

As I mentioned up-thread, the rules that apply to Objective-C are not always applicable to Swift for a number of reasons, including absence of autorelease and the (soon!) absence of threading issues. Furthermore, much of the guidance in the Mac/Cocoa world was revised and designed in the era of Objective-C garbage collection (libauto), which had very different constraints that are definitely inapplicable to Swift.

All I'm trying to say is that I'd recommend first principles thinking here instead of citing ancient Objective-C cocoa docs as "proof".

-Chris

Honestly, I'm a little shocked how

In fact, some good old docs have some key information too (still in ObjC...): "Don't Use dealloc to Manage Scarce Resources" (like file descriptors).

this is now seen as me trying to sell this as a "proof"? I said some old documentation contains some key information and I stand by that. The three points made in the old docs are:

  1. Order dependencies on object graph tear-down.

    My verdict: Not or only partially true in today's Swift. The order of release calls is not fully clear/obvious so I reckon there are still cases that would cause major headaches today.

  2. Non-reclamation of scarce resources.
    Memory leaks are bugs that should be fixed, but they are generally not immediately fatal.

    Absolutely true in today's Swift.

  3. Cleanup logic being executed on the wrong thread.
    If an object is autoreleased at an unexpected time, it will be deallocated on whatever thread’s autorelease pool block it happens to be in. This can easily be fatal for resources that should only be touched from one thread.

    The autoreleasepool argument is only partially true in today's Swift but the "wrong thread" is still 100% true. This may be resolved with future proposals but as it stands, it is not.


My real-world Swift experience with codebases where deinit has been used to manage scarce resources led me to discover, independently, that deinit is not suitable for reclaiming scarce resources (or other resources that require locks or similar to tear down, or that for whatever other reason take long to tear down).

EDIT: I know that me "discovering" this doesn't mean it's true. I may be wrong, the codebases I've worked on may have been of poor quality, there are many reasons. I am stating my opinion here and I'm happy for others to disagree.

I found this old Cocoa documentation years after I made up my mind about this and thought it was a good summary of some of the issues (not all of them apply today). One thing it would add if it were written for Swift (which obviously has ARC) is that the exact point where deinit is called depends on the optimiser settings. That, in my mind, also causes real issues if you manage scarce resources.

As a plain data point, I just want to say that the SQLite wrapper GRDB relies on deinit in order to cleanup the resources associated with the iteration of database results. Users can write code like:

try dbQueue.read { db in
    // This code runs on a specific DispatchQueue.
    let cursor = /* some cursor built from the db closure argument */
    while let element = try cursor.next() {
        // Use element.
        // Break any time if needed.
        // No need to call close() or cancel() or whatever.
        // No risk of forgetting to call close() or cancel() or whatever.
    }
}

Database accesses must happen on a specific DispatchQueue, and this is enforced by preconditions that ship in release builds. The goal is to prevent the user from misusing the library and access the database outside of its dedicated dispatch queue.

Cursor's deinit is guarded by such a precondition. Program crashes if a cursor is deallocated on the wrong queue.

Practically speaking, those preconditions never trigger. Not a single issue about this was ever opened in five years. This means that cursors are properly deallocated when expected. Why? Because they are only used for iteration, and are never retained by anything else. Why? Because nobody needs to do it. ARC behavior, in such context, is deterministic enough.

I'm not expert enough to discuss the merits of cancel vs. deinit in an asynchronous context. But I know a working example where ARC is reliable, and I'm tired of reading that it should never be relied upon.

8 Likes

If we really want to include cancel and call it at the end of the loop, I think it's better to call it unconditionally. Things like map and filter already erase the classful-ness of the sequence. It's not a reliable discriminator.

1 Like

Your example is reliable by virtue of the underlying library supporting multiple active resources with overlapping accesses. So a subsequent resource can be opened regardless of whether the first resource has been closed or not. Now imagine, hypothetically, that only one SQLite statement can be active at a time, or otherwise the process crashes. Could we still say for certain that relying on ARC deinitialization is reliable?

When people say ARC "cannot be relied upon", IMO what they mean is that the deinitialization timing of scope-local constants and variables is not reliable, because inlining and optimisation passes can produce different orders than you expected, without you noticing at the language level. Say it could deinitialize at the end of the scope, or right after the last point of use, all depending on what compiler flags are used. This is also why primitives like withExtendedLifetime() exist, for when you need a deterministic deinitialization order.
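
A minimal, self-contained sketch of that withExtendedLifetime() point (the Resource type is made up):

final class Resource {
    func use() { /* touch the underlying resource */ }
    deinit { /* release the scarce resource */ }
}

func work() {
    let resource = Resource()
    resource.use()
    // Under optimisation, `resource` may be released immediately after its
    // last use above, not at the end of this scope. If later code depends
    // on the resource still being alive, pin it explicitly:
    withExtendedLifetime(resource) {
        // `resource` is guaranteed to stay alive until this closure returns.
    }
}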

If the use case can tolerate the indeterminism of deinitialization timing, ARC is perfectly fine. But we should care about use cases that cannot tolerate it, while designing a new construct in the language, shouldn't we?

I agree. But for the sake of the quality of the discussion, an example of such case should be exhibited, an example where the lack of an explicit cancel() creates an actual problem.

2 Likes

Agreed, both with “it works fine for GRDB” and with the request for an example.

I tried to create a real, not-too-long example in the "Details" bit at the very bottom of SE-0298: Async/Await: Sequences - #39 by johannesweiss. Do you agree that this would be hard or even impossible to achieve with just deinit?

Regarding “it works fine for GRDB”: I totally believe you that it works fine for your use case (and many out there), but I think we shouldn’t make the assumption that “it will work for all use cases”; therefore, I believe the AsyncIterator must have some sort of cancel that isn’t deinit.

1 Like

I missed your post. Sorry; because of what I wrote, some may have understood that such a detailed example had not been provided before. This really helps the review!

1 Like

@johannesweiss I'm not sure why you are shocked. You are extrapolating extremely general advice from a different language to a very specific situation (the design of a cancel() member) in Swift. Also, I didn't say you are completely wrong, only that the approach isn't super helpful -- that is why I argued for first principles thinking.

I will explain what I mean, but if you are more interested in making general claims about Swift than in talking about the design of the cancel() feature, then I think it would be best taken to a different thread.

ARC will release the iterator eagerly for any reasonable use case when you are done with it, and is 100% guaranteed to do so with the for-in syntax in particular. These are not the general object graphs of an overly heavy OO system we are talking about; this is an iterator object, which is a value.

You have correctly observed that iterators can be captured and escaped by closures, and doing so would make their lifetime less predictable. However, this isn't the basis of the guidance you are quoting, I don't think it is very common, and there are well-known ways to solve that problem when it occurs in practice. Even more concerning, "solving" this for AsyncIterator would imply that we should apply this pattern to effectively every other protocol (including e.g. Iterator), since there is nothing AsyncIterator-specific about this problem -- many protocol implementations can indirectly refer to "expensive resources".
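
For concreteness, a minimal, hypothetical illustration of the escape scenario being referred to:

func drain<S: AsyncSequence>(_ sequence: S) async throws -> () -> Void {
    var iterator = sequence.makeAsyncIterator()
    while let element = try await iterator.next() {
        _ = element
    }
    // Returning a closure that captures the iterator extends the iterator's
    // lifetime (and that of any resources it owns) to wherever this closure
    // ends up being stored, rather than ending with the loop above.
    return { _ = iterator }
}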

I can't see how "order of release" calls is relevant or has any problematic effects for the discussion at hand. You cannot destroy the sequence before the iterator, nor would the sequence have a pointer back to its iterator.

Are you making a claim about Swift in general, or are you trying to constructively contribute to the point of this discussion: iterators over async sequences? While the observation above is correct in the abstract, it is not particularly useful here unless you think that iterators need special defensive work to keep them from escaping, per the points I made above.

We are talking about AsyncIterator iterator instances, none of which have been written yet, not making generic claims about Swift types overall. There exist no autoreleased implementations of AsyncIterator; this is 100% true.

This is a key stated goal of the Swift Concurrency direction, and the core team has already said that they will revise older accepted proposals if necessary as the overall design evolves.


One of the things I observed in the pitch thread is that the cancel() design is not only unnecessary (as observed above), but also equivalent to the manual "invalidate" pattern used in GC systems to avoid finalizers (xref reference on finalize and an explanation). Not having finalizers (and the many problems they bring with them) in ARC and Swift is a huge progression over other GC'd languages. Bringing the manual reclamation patterns they require into Swift will cause exactly this sort of confusion (and has other issues, like making AsyncIterator inconsistent with Iterator, etc.). The only rationale to add it as part of this proposal is because we do not yet have linear types in Swift, which makes this a "point in time" problem, not something inherent to the design of async sequences.
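
A sketch of that manual "invalidate" pattern as it might look if carried into Swift (the type and members are made up):

final class SocketReader {
    private var invalidated = false

    func invalidate() {
        // Callers must remember to call this on every exit path;
        // forgetting it keeps the descriptor/event source alive.
        guard !invalidated else { return }
        invalidated = true
        // close the descriptor, unregister event sources, ...
    }

    deinit {
        // In finalizer-style designs, this is merely a safety net rather
        // than the primary reclamation path.
        invalidate()
    }
}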

Here is the thing: ARC is not unreliable like GC'd systems, and I don't think we want finalizer-like patterns in Swift API design. This is not a precedent that we want to set or encourage. Swift's gentle encouragement of people to use value types instead of reference types is one of the ways we directly improve this vs Objective-C, but there are many others as well. This significantly reduces the problem in practice, but does not completely invalidate @johannesweiss' concern. We should aspire to keep improving the language (as already planned) instead of just going back to old GC patterns.

This is one of the reasons that I'm not in favor of adding the cancel() member. It has demonstrably created confusion in this thread, and we don't want people to pull forward this design into other places.

-Chris

13 Likes

First of all, thank you very much for your thoughtful reply. I pretty much agree with everything, except this one point:

You say it's a value, and I agree that usually iterators are values. However, if we take the current design but remove cancel, then you write

which I understand as "if we accept this proposal without cancel, then we will push people to use classes instead of structs for iterators" (until the introduction of deinit for structs). Once they are classes (i.e. not values), I am worried that we will get designs where people are more or less forced into using deinit to discard their network connections/file handles/... . This then rang my alarm bells (maybe a little loudly). Example:

class NetworkConnection {
    /* need deinit */ class NetworkConnectionIterator {
        typealias Element = [UInt8]

        // [...]
        deinit {
           // NetworkConnection tear down
        }
    }

    func makeIterator() -> NetworkConnectionIterator

    func close() async throws
}

I think your point is that NetworkConnectionIterator is less likely to be part of an object graph, and maybe it's unlikely that users accidentally create a reference cycle that contains the iterator, etc.

I can see the point that maybe the iterator is just so unlikely to be implicated in the problems that I describe that "just deinit" may be good enough. I definitely take the point that I should have thought this through more carefully in the domain of just AsyncIterator and not in general. In general, I absolutely stand by my points, but maybe they are not as relevant as I thought to the very tightly scoped domain of AsyncIterator.
I guess what I have seen so far is that once one piece starts to manage non-memory resources with deinit, it becomes pervasive, which I've seen be a huge issue. I guess if a low-level (my usual domain) piece manages its resources using deinit, it indeed becomes pervasive. But iterators are not low-level objects; we don't typically build many composed things with iterators. That usually makes the lifetime of iterators much shorter than the many examples of issues that I have in my mind.

Still, I believe having cancel is preferable:

  • We have the option to use or not use deinit to implement resource cancellation. That makes it less important if we miss an important case where deinit just doesn't work (because we have cancel).
  • We can save allocations for the iterator (because we can actually make it a struct/value)
  • Implementing a correct multi-step asynchronous tear down in deinit is really hard (all the reasons I described, must not publish a self reference, ...)

Now, if we did indeed get linear types, move-only types, and other type-level features, then everything around deinit would change. But I am always reviewing a proposal against what is in the language today plus the previously accepted proposals that I am aware of. I find it hard to take, for example, linear types and safety from threading issues into account before it is decided whether and exactly how we're getting them.

// EDITs: grammar/typos/clarity

2 Likes

Yes, that's right. I think that would be the natural consequence of removing the cancel requirement, until Swift supports structs with deinits.

Right exactly. The fact that NetworkConnectionIterator happens to be a class is not part of the generic AsyncSequence protocol (since the member is not AnyObject constrained), so generic algorithm clients wouldn't even be exposed to this implementation detail.

Right, or at least trading one problem for another. The presence of the cancel member itself leads to multiple possible problems and bugs with manually written code, since it has a complex set of conditions under which it is supposed to be called (or not). I think that humans will likely get that wrong even if the compiler always does the right thing for for/in.

Fair enough, I respect your concern and opinion, but I personally don't think that this is a bad thing. I'm explicitly encouraging use of deinit for resource management, because even though it isn't perfect, it actually does work pretty well in Swift in the majority of cases. The exceptional cases (e.g. object graph cycles) have well-known solutions, and ARC being deterministic (unlike GC collections) makes it relatively easy to track down the "unpredictable" cases because they can be reliably reproduced.

I agree on your second point, particularly in the immediate term. Linear types aren't part of the concurrency push that AsyncSequence is part of, and therefore aren't something we can count on in any specific timeframe.

On the third point, I don't understand which cases would cause the iterator to publish its self reference. Typically I'd expect the iterator's deinit to just call a sync method on the iteratee to say "hey, stop what you're doing if convenient", as Tony described above. I don't expect the iterator's lifetime itself to contain useful state. If the iterator itself is heavy and complicated, then it doesn't make sense as an implicitly copyable struct anyway.

On the first point, I don't understand who "we" is. Are you referring to the implementor of an AsyncSequence, or to a client that is iterating the AsyncSequence? From the client perspective, I see the removal of cancel as eliminating an entire class of concerns and potential for misuse. It is likely that I'm misunderstanding what you're saying here, though!

-Chris

4 Likes

Likewise, I very much respect your opinion. I guess we have had very different experiences and that's okay.

Needless to say, I agree with the actual facts that ARC is both 100% reliable and 100% deterministic; it's a compile-time feature after all. Regarding ARC and determinism of whole programs, I'd say "ARC doesn't make an otherwise deterministic program non-deterministic." That's fantastic [and an actual difference from a GC]. However, most programs aren't deterministic, which means the context deinit is called from is also not deterministic in general. But as you said before, the question in this proposal review is whether relying on deinit is sufficient in the more limited domain of async iterators.

Even for just async iterators, I am still not convinced, because there are very few use cases where I'd want to use an async iterator if no network is involved. For example, wouldn't an API like this be reasonable?

class MyDBConnection {
    class DBCursor: AsyncIteratorProtocol {
        typealias Element = DBRow
        ...
    }
    func makeRequest(_ request: Request) async throws -> DBCursor
}

Unless we iterated the DBCursor until completion, I would expect that once we're done with the DBCursor, we need to tell the DB (via the network) that we're done with the cursor (as it occupies resources in the DB). After telling the DB, we'll need to (a)wait the result of the cursor cancellation from the DB. So there'll be a lot of asynchronous code at play here, and the cancellation of the cursor will take an arbitrary amount of time.
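
A small sketch of that tension (the members mentioned in the comments are hypothetical): deinit is synchronous, so it cannot await the DB's acknowledgement, whereas an explicit asynchronous teardown can.

final class DBCursorSketch {
    // An explicit asynchronous teardown can wait for the server:
    func cancel() async throws {
        // try await connection.send(.closeCursor(id))
        // try await connection.waitForAcknowledgement()
    }

    deinit {
        // No `await` is possible here; at best we can fire-and-forget a
        // cancel message and hope the server-side resources get freed.
    }
}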

Regarding "just telling the iteratee" and "useful state", the best I can come up with right now is the DBCursor as async iterator example above.

Regarding "publishing its self reference", on the spot, I can only come up with made up code like:

func stupidLog(_ message: @escaping @autoclosure () -> String)

class DBCursor: AsyncIteratorProtocol {
    // [...]

    func handDBConnectionBackIntoConnectionPool() {
        guard self.stillOwningConnection else { return }
        self.stillOwningConnection = false
        self.connectionPool.syncQueue.async { // <- self publish 1 (through escaping capture)
            self.connectionPool.append(self.connection)
        }
        stupidLog("\(self): handed conn back") // <- self publish 2 (through @escaping @autoclosure)
    }

    deinit {
        // looks innocuous here.
        self.handDBConnectionBackIntoConnectionPool()
    }
}

Yes, my example is made up, could be rewritten, and would probably look nicer after Swift's concurrency efforts have landed. And yes, with ActorSendable, it would probably fail to compile because DBCursor and DBConnection may not be ActorSendable.

Note: This was specifically about self publishing so in the example above I left out the hard part which is communicating to the DB that we no longer require the cursor.

I'm merely pointing out that there is code that may pass a code review but turns out not to be suitable if called from deinit. I think that would be a problem if deinit is the only way that cancellation is signalled. Sadly, we also don't fully know how likely self publishing in deinits is today, because today's Swift turns them straight into a memory-unsafe program (use-after-free), and maybe we're mostly getting away with it? Yeah, unlikely; still needs fixing IMHO.

Sorry, that was unclear. "We" was meant to be the implementor of an async iterator. But you are of course right: The users of async iterators will have a harder life if cancel is required because they'll need to remember to call it. In most cases I'd think that a simple defer { it.cancel() } (or ofc using the for loop) will do the job, and maybe we could help with static analysis.

The automatic cancel() could be useful, if makeAsyncIterator() returns:

  • self (e.g. default implementation), when the sequence is its own iterator;

  • or a cached iterator, which should be reset and put back into the cache later.

ETA: Alternatively, wrapper classes with deinit could be used for the above.


If a type can conform to both of the Sequence and AsyncSequence protocols, what happens to an API such as contains(_:), which differs only by its async rethrows?


Should AsyncSequence use __consuming on its makeAsyncIterator() requirement?

  • Sequence uses __consuming on its makeIterator() requirement.

  • The proposal uses __consuming only on the cancel() method.

  • The current implementation doesn't use __consuming anywhere.

  • (I don't understand what __consuming actually means.)

ETA: Swift 5.3.2:

error: method must not be declared both '__consuming' and 'mutating'
  __consuming mutating func cancel()
  ^~~~~~~~~~~~

You can't publish self in deinit. Once you get to deinit, there's no way back.

import Dispatch

class DBCursor {
    deinit {
        let q = DispatchQueue.main
        q.async {
            print(self) // Error: EXC_BAD_ACCESS (code=1, address=0x1c)
        } // we probably need a runtime check to catch this (retain a dying object).
        q.async { [weak self] in
            print(self) // nil
        }
        q.async { [unowned self] in
            print(self)
        } // Fatal error: Attempted to read an unowned reference but object 0x10111f0a0 was already deallocated
    }
}

do { _ = DBCursor() }
dispatchMain()

This is really a better question for the authors of the proposal than for me. I'm just advocating for the removal of the already-sync cancel() member in favor of using the also-sync deinit member. Your point is an argument for making cancel() be async - if that is a requirement, then I agree that we can't eliminate it.

If I had to guess, your DBConnection would be modeled as an actor, and the iterator would send it a "hey, wrap things up" message on a new task. This eliminates the need for cancel (or deinit) to be marked async. The other option is to make DBConnection an internally synchronized class (e.g. with a mutex manually or whatever you prefer) and mark it as a ConcurrentValue. Then you could definitely have a "wrap things up" hook that isn't async.
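
A sketch of the actor-based shape being described (the types and members are made up): the iterator's synchronous deinit launches a task that asks an actor to wrap things up asynchronously.

actor DBConnectionActor {
    func wrapUpCursor(id: Int) async {
        // release server-side cursor resources, return the connection
        // to the pool, etc.
    }
}

final class DBCursorIterator {
    private let connection: DBConnectionActor
    private let cursorID: Int

    init(connection: DBConnectionActor, cursorID: Int) {
        self.connection = connection
        self.cursorID = cursorID
    }

    deinit {
        // deinit itself stays synchronous; the asynchronous teardown is
        // handed off to a new task that does not capture `self`.
        let connection = self.connection
        let id = self.cursorID
        Task { await connection.wrapUpCursor(id: id) }
    }
}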

As others pointed out, this isn't a memory safety issue or other issue specific to this proposal; this is a general issue with Swift deinits and is already caught dynamically. I am not aware of this being a big stumbling block for Swift programmers in practice, even those who do non-trivial reclamation of stuff in deinit. For example, S4TF deallocates tensors (which can be gigabytes on GPUs, etc.) using deinit, and I have never seen a problem that I'm aware of.

-Chris