SE-0298: Async/Await: Sequences

If we really want to include cancel and call it at the end of the loop, I think it's better to call it unconditionally. Things like map and filter already erase the classful-ness of the sequence. It's not a reliable discriminator.
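To illustrate the point about map erasing the underlying type, here is a minimal sketch (the MappedIterator type is hypothetical, not from the proposal): once a wrapper like map is involved, the caller only ever sees a struct, even if a class-based iterator with a resource-freeing deinit sits underneath.

struct MappedIterator<Base: AsyncIteratorProtocol, Element>: AsyncIteratorProtocol {
    var base: Base                                  // might be a class, might be a struct
    let transform: (Base.Element) async -> Element

    mutating func next() async throws -> Element? {
        guard let element = try await base.next() else { return nil }
        return await transform(element)
    }
}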

1 Like

Your example is reliable by virtue of the underlying library supporting multiple active resources with overlapping accesses. So a subsequent resource can be opened regardless of whether the first resource has been closed or not. Now imagine, hypothetically, that only one SQLite statement can be active at a time, or otherwise the process crashes. Could we still say for certain that relying on ARC deinitialization is reliable?

When people say ARC "cannot be relied upon", IMO what they mean is that the deinitialization timing of scope-local constants and variables is not reliable, because inlining and optimisation passes can produce different orders than you expected, without you noticing anything at the language level. A value could be deinitialized at the end of the scope, or right after its last point of use, all depending on what compiler flags are used. This is also why primitives like withExtendedLifetime() exist, for when you need a deterministic deinitialization order.
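A minimal sketch of that timing question and the withExtendedLifetime() escape hatch (Resource and doWork() are hypothetical; the early release is something the optimizer is allowed to do, not guaranteed to do):

final class Resource {
    deinit { print("Resource released") }
}

func doWork() { /* ... */ }

func example() {
    let resource = Resource()
    doWork()                      // the optimizer may release `resource` as early as here
    withExtendedLifetime(resource) {
        doWork()                  // ...but it is guaranteed to stay alive until this closure returns
    }
}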

If the use case can tolerate the indeterminism of deinitialization timing, ARC is perfectly fine. But while designing a new construct in the language, we should care about the use cases that cannot tolerate it, shouldn't we?

I agree. But for the sake of the quality of the discussion, an example of such a case should be exhibited: one where the lack of an explicit cancel() creates an actual problem.

2 Likes

Agreed, both with “it works fine for GRDB” and with the request for an example.

I tried to create a real, not-too-long example in the "Details" bit at the very bottom of SE-0298: Async/Await: Sequences - #39 by johannesweiss. Do you agree that this would be hard, if not impossible, to achieve with just deinit?

Regarding “it works fine for GRDB”: I totally believe you that it works fine for your use case (and many out there), but I think we shouldn’t assume "it will work for all use cases"; therefore I believe the AsyncIterator must have some sort of cancel that isn’t deinit.

1 Like

I missed your post. Sorry, some may have understood, because of what I wrote, that such a detailed example had not been provided before. This really helps the review!

1 Like

@johannesweiss I'm not sure why you are shocked. You are extrapolating extremely general advice from a different language to a very specific situation (the design of a cancel() member) in Swift. Also, I didn't say you are completely wrong, only that the approach isn't super helpful -- that is why I argued for first principles thinking.

I will explain what I mean, but if you are specifically interested in making general claims about Swift rather than talking about the design of the cancel() feature, then I think it would be best taken to a different thread.

ARC will release the iterator eagerly for any reasonable use case when you are done with it, and is 100% guaranteed to do so with the for-in syntax in particular. These are not the general object graphs of an overly heavy OO system we are talking about; this is an iterator object, which is a value.

You have correctly observed that iterators can be captured and escaped by closures, and doing so would make their lifetime less predictable. However, this isn't the basis of the guidance you are quoting, I don't think it is very common, and there are well known ways to solve that problem when it occurs in practice. Even more concerning, "solving" this for AsyncIterator would imply that we should apply this pattern to effectively every other protocol (including e.g. Iterator), since there is nothing AsyncIterator-specific about this problem -- many protocol implementations can indirectly refer to "expensive resources".

I can't see how the order of "release" calls is relevant or has any problematic effects for the discussion at hand. You cannot destroy the sequence before the iterator, nor would the sequence have a pointer back to its iterator.

Are you making a claim about Swift in general, or are you trying to constructively contribute to the point of this discussion, iterators over async sequences? While the observation above is correct in the abstract, it is not particularly useful here unless you think that iterators need special defensive work to keep them from escaping, per the points I made above.

We are talking about AsyncIterator iterator instances, none of which have been written yet, not making generic claims about Swift types overall. There exist no autoreleased implementations of AsyncIterator; this is 100% true.

This is a key stated goal of the Swift Concurrency direction, and the core team has already said that they will revisit older accepted proposals if necessary as the overall design evolves.


One of the things I observed in the pitch thread is that the cancel() design is not only unnecessary (as observed above) but also equivalent to the manual "invalidate" pattern used in GC systems to avoid finalizers (xref reference on finalize and an explanation). Not having finalizers (and the many problems they bring with them) in ARC and Swift is a huge improvement over other GC'd languages. Bringing the manual reclamation patterns they require into Swift will cause exactly this sort of confusion (and has other issues, like making AsyncIterator inconsistent with Iterator, etc). The only rationale to add it as part of this proposal is that we do not yet have linear types in Swift, which makes this a "point in time" problem, not something inherent to the design of async sequences.
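For reference, a minimal sketch (hypothetical ResourceHandle type) of the manual "invalidate" pattern referred to above, which shifts the burden of releasing a non-memory resource onto every caller:

final class ResourceHandle {
    private var isValid = true

    // The caller must call this exactly once when done: forgetting it leaks
    // the underlying resource, and every other method has to guard against
    // use-after-invalidate by hand.
    func invalidate() {
        guard isValid else { return }
        isValid = false
        // release the underlying non-memory resource here
    }

    func use() {
        precondition(isValid, "use after invalidate()")
        // ...
    }
}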

Here is the thing: ARC is not unreliable like GC'd systems, and I don't think we want finalizer-like patterns in Swift API design. This is not a precedent that we want to set or encourage. Swift's gentle encouragement of people to use value types instead of reference types is one of the ways we directly improve this vs Objective-C, but there are many others as well. This significantly reduces the problem in practice, but does not completely invalidate @johannesweiss's concern. We should aspire to keep improving the language (as already planned) instead of just going back to old GC patterns.

This is one of the reasons that I'm not in favor of adding the cancel() member. It has demonstrably created confusion in this thread, and we don't want people to pull forward this design into other places.

-Chris

13 Likes

First of all, thank you very much for your thoughtful reply. I pretty much agree with everything, except this one point:

You say it's a value and I agree that usually iterators are values. However, if we take the current design but remove cancel, then you write

which I understand as "if we accept this proposal without cancel, then we will push people to use classes instead of structs for iterators" (until the introduction of deinit for structs). Once they are classes (i.e. not values), I am worried that we will get designs where people are more or less forced into using deinit to discard their network connections/file handles/... . This then rang my alarm bells (maybe a little loudly). Example:

class NetworkConnection {
    /* need deinit */ class NetworkConnectionIterator {
        typealias Element = [UInt8]

        // [...]
        deinit {
           // NetworkConnection tear down
        }
    }

    func makeAsyncIterator() -> NetworkConnectionIterator

    func close() async throws
}

I think your point is that NetworkConnectionIterator is less likely to be part of an object graph, and maybe it's unlikely that users accidentally create a reference cycle that contain the iterator etc.

I can see the point that maybe the iterator is just so unlikely to be implicated in the problems that I describe that "just deinit" may be good enough. I definitely take the point that I should have thought this through more carefully in the domain of just AsyncIterator and not in general. In general, I absolutely stand by my points, but maybe they are not as relevant as I thought to the very tightly scoped domain of AsyncIterator.
I guess what I have seen so far is that once one piece starts to manage non-memory resources with deinit, it becomes pervasive, which I've seen become a huge issue. I guess if a low-level (my usual domain) piece manages its resources using deinit, it indeed becomes pervasive. But iterators are not low-level objects; we don't typically build many composed things with iterators. That usually makes the lifetime of iterators much shorter than in the many problematic examples I have in mind.

Still, I believe having cancel is preferable:

  • We have the option to use or not to use deinit to implement resource cancellation. That makes it less important if we miss an important case where deinit just doesn't work (because we have cancel).
  • We can save allocations for the iterator (because we can actually make it a struct/value).
  • Implementing a correct multi-step asynchronous tear-down in deinit is really hard (for all the reasons I described: must not publish a self reference, ...).

Now, if we did indeed get linear types, move-only types, and other type-level features, then everything around deinit would change. But I am always reviewing a proposal with what is in the language today plus the previously accepted proposals that I am aware of. I find it hard to take, for example, linear types and "threading issues safety" into account before it is decided whether and exactly how we're getting them.

// EDITs: grammar/typos/clarity

2 Likes

Yes, that's right. I think that would be the natural consequence of removing the cancel requirement, until Swift supports structs with deinits.

Right, exactly. The fact that NetworkConnectionIterator happens to be a class is not part of the generic AsyncSequence protocol (since the AsyncIterator associated type is not AnyObject-constrained), so generic algorithm clients wouldn't even be exposed to this implementation detail.

Right, or at least trading one problem for another. The presence of the cancel member itself leads to multiple possible problems and bugs with manually written code, since it has a complex set of conditions under which it is supposed to be called (or not). I think that humans will likely get that wrong even if the compiler always does the right thing for for/in.

Fair enough, I respect your concern and opinion, but I personally don't think that this is a bad thing. I'm explicitly encouraging use of deinit for resource management, because even though it isn't perfect, it actually does work pretty well in Swift in the majority case. The exceptional cases (e.g. object graph cycles) have well known solutions, and ARC being deterministic (unlike GC collections) makes it relatively easy to track down the "unpredictable" cases because they can be reliably reproduced.

I agree with your second point, particularly in the immediate term. Linear types aren't part of the concurrency push that AsyncSequence is part of, and therefore aren't something we can count on in any specific timeframe.

On the third point, I don't understand which cases would cause the iterator to publish its self reference. Typically I'd expect the iterator's deinit to just call a sync method on the iteratee to say "hey, stop what you're doing if convenient", as Tony described above. I don't expect the iterator's lifetime itself to contain useful state. If the iterator itself is heavy and complicated, then it doesn't make sense as an implicitly copyable struct anyway.

On the first point, I don't understand who "we" is. Are you referring to the implementor of an AsyncSequence, or to a client that is iterating the AsyncSequence? From the client perspective, I see the removal of cancel as eliminating an entire class of concerns and potential for misuse. It is likely that I'm misunderstanding what you're saying here though!

-Chris

4 Likes

Likewise, I very much respect your opinion. I guess we have had very different experiences and that's okay.

Needless to say, I agree with the actual facts: ARC is both 100% reliable and 100% deterministic; it's a compile-time feature after all. Regarding ARC and the determinism of whole programs, I'd say "ARC doesn't make an otherwise deterministic program non-deterministic." That's fantastic [and an actual difference from a GC]. However, most programs aren't deterministic, which means the context deinit is called from is also not deterministic in general. But as you said before, the question in this proposal review is whether relying on deinit is sufficient in the more limited domain of async iterators.

Even for just async iterators, I am still not convinced, because there are very few use cases where I'd want to use an async iterator if there is no network (or similar) involved. For example, wouldn't an API like this be reasonable?

class MyDBConnection {
    class DBCursor: AsyncIteratorProtocol {
        typealias Element = DBRow
        ...
    }
    func makeRequest(_ request: Request) async throws -> DBCursor
}

Unless we iterated the DBCursor until completion, I would expect that once we're done with the DBCursor we need to tell the DB (via the network) that we're done with the cursor (as it occupies resources in the DB). After telling the DB, we'll need to (a)wait the result of the cursor cancellation from the DB. So there'll be a lot of asynchronous code at play here, and the cancellation of the cursor will take an arbitrary amount of time.

Regarding "just telling the iteratee" and "useful state", the best I can come up with right now is the DBCursor as async iterator example above.

Regarding "publishing its self reference", on the spot, I can only come up with made up code like:

func stupidLog(_ message: @escaping @autoclosure () -> String)

class DBCursor: AsyncIteratorProtocol {
    // [...]

    func handDBConnectionBackIntoConnectionPool() {
        guard self.stillOwningConnection else { return }
        self.stillOwningConnection = false
        self.connectionPool.syncQueue.async { // <- self publish 1 (through escaping capture)
            self.connectionPool.append(self.connection)
        }
        stupidLog("\(self): handed conn back") // <- self publish 2 (through @escaping @autoclosure)
    }

    deinit {
        // looks innocuous here.
        self.handDBConnectionBackIntoConnectionPool()
    }
}

Yes, my example is made up, could be rewritten, and would probably look nicer after Swift's concurrency efforts have landed. And yes, with ActorSendable, it would probably fail to compile because DBCursor and DBConnection may not be ActorSendable.

Note: This was specifically about self publishing, so in the example above I left out the hard part, which is communicating to the DB that we no longer require the cursor.

I'm merely pointing out that there is code that may pass a code review but turns out not to be suitable if called from deinit. I think that would be a problem if deinit is the only way that cancellation is signalled. Sadly, we also don't fully know how likely self publishing in deinits is today, because today's Swift turns it straight into a memory-unsafe program (a use-after-free), and maybe we're mostly getting away with it? Yeah, unlikely; it still needs fixing IMHO.

Sorry, that was unclear. "We" was meant to be the implementor of an async iterator. But you are of course right: the users of async iterators will have a harder life if cancel is required, because they'll need to remember to call it. In most cases I'd think that a simple defer { it.cancel() } (or of course using the for loop) will do the job, and maybe we could help with static analysis.
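For illustration, a minimal sketch of that client-side burden under the proposal's synchronous cancel() (makeNumbers() is a hypothetical factory returning some async sequence of Int):

func firstNumber() async throws -> Int? {
    var iterator = makeNumbers().makeAsyncIterator()
    // We stop after one element, so the iterator is never run to completion;
    // outside of for-in, remembering to cancel is on us.
    defer { iterator.cancel() }
    return try await iterator.next()
}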

The automatic cancel() could be useful if makeAsyncIterator() returns (see the sketch below):

  • self (e.g. default implementation), when the sequence is its own iterator;

  • or a cached iterator, which should be reset and put back into the cache later.

ETA: Alternatively, wrapper classes with deinit could be used for the above.
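A minimal sketch of the first case (hypothetical LineStream type), under the original design in which for-in would call cancel() when a loop ends early:

final class LineStream: AsyncSequence, AsyncIteratorProtocol {
    typealias Element = String

    // The sequence is its own iterator, so its deinit only runs once the
    // sequence itself is released, possibly long after a particular loop
    // stopped iterating. An automatic cancel() would fire at loop exit.
    func makeAsyncIterator() -> LineStream { self }

    func next() async throws -> String? {
        nil // read and return the next line here; nil at end of stream
    }

    func cancel() {
        // release the underlying file handle / connection here
    }
}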


If a type can conform to both of the Sequence and AsyncSequence protocols, what happens to an API such as contains(_:), which differs only by its async rethrows?


Should AsyncSequence use __consuming on its makeAsyncIterator() requirement?

  • Sequence uses __consuming on its makeIterator() requirement.

  • The proposal uses __consuming only on the cancel() method.

  • The current implementation doesn't use __consuming anywhere.

  • (I don't understand what __consuming actually means.)

ETA: Swift 5.3.2:

error: method must not be declared both '__consuming' and 'mutating'
  __consuming mutating func cancel()
  ^~~~~~~~~~~~

You can't publish self in deinit. Once you get to deinit, there's no way back.

class DBCursor {
    deinit {
        let q = DispatchQueue.main
        q.async {
            print(self) // Error: EXC_BAD_ACCESS (code=1, address=0x1c)
        } // we probably need a runtime check to catch this (retain a dying object).
        q.async { [weak self] in
            print(self) // nil
        }
        q.async { [unowned self] in
            print(self)
        } // Fatal error: Attempted to read an unowned reference but object 0x10111f0a0 was already deallocated
    }
}

do { _ = DBCursor() }
dispatchMain()

This is really a better question for the authors of the proposal than for me. I'm just advocating for the removal of the already-sync cancel() member in favor of using the also-sync deinit member. Your point is an argument for making cancel() async; if that is a requirement, then I agree that we can't eliminate it.

If I had to guess, your DBConnection would be modeled as an actor, and the iterator would send it a "hey, wrap things up" message on a new task. This eliminates the problem of cancel (or deinit) being sync-only. The other option is to make DBConnection an internally synchronized class (e.g. with a mutex, manually or however you prefer) and mark it as a ConcurrentValue. Then you could definitely have a "wrap things up" hook that isn't async.

As others pointed out, this isn't a memory safety issue or another issue specific to this proposal; this is a general issue with Swift deinits and is already caught dynamically. I am not aware of this being a big stumbling block for Swift programmers in practice, even those who do non-trivial reclamation of stuff in deinit. For example, the S4TF object deallocates tensors (which can be gigabytes on GPUs etc.) using deinit, and I have never seen a problem that I'm aware of.

-Chris

Arguably, in your code (and mine before) you have published a self reference from deinit. When trying to print it (after it escaped deinit), you then hit a bad pointer (EXC_BAD_ACCESS on 0x1c) because it's a use after free. If you re-run your program in a loop, I wouldn't be surprised if the exact output changes.

As you said, we could (should!) catch this. The condition to catch isn't "retain a dying object" (that's legal) but that the reference count at the end of deinit is > 0, which clearly indicates that we've published self.

That said, as Chris says ...

... this isn't specific to this proposal. The reason I bring it up is that I think relying on deinit may cause self publishing issues more frequently.

Example: [It hurts to type this]

class DBCursor: AsyncIteratorProtocol /* assuming no `cancel` at all */ {
    func tearDownCursor() async throws {
        // ...
    }

    deinit { // oops, this causes a use-after-free or a crash (once the Swift bug is fixed)
        _ = Task.runDetached {
            try? await self.tearDownCursor() // <- attempted self publish in `deinit`
        }
    }
}

Sure, the code is awful but could we really blame somebody for writing it if they're just given func tearDownCursor() async throws and deinit?


Yes, you are right. I would indeed be happiest with func cancel() async throws. If we can't get this for efficiency/complexity reasons, I am okay with having the synchronous func cancel(). At least I still know how to do my job if I'm given a func cancel. Without any explicit cancel, I really don't know if we can get everything to work.

Regarding the synchronous cancel: I have shown an example of async teardown with just a sync cancel at the end of this post ("Details" at the very end).

Also, my bad example from above would totally work with a synchronous cancel. Yes, this is not how structured concurrency is meant to be used (and it's likely inefficient). But hey, contrary to putting the code in deinit, it would actually work:

class DBCursor: AsyncIteratorProtocol /* with sync cancel, as proposed */ {
    func tearDownCursor() async throws {
        // ...
    }

    func cancel() { // works, but not ideal (and allocates for the closure)
        _ = Task.runDetached {
            try? await self.tearDownCursor()
        }
    }
}

Obviously as you note, an async cancel would make everything ...

class DBCursor: AsyncIteratorProtocol /* with async cancel */ {
    func tearDownCursor() async throws {
        // ...
    }
   
    func cancel() async throws {
        try await self.tearDownCursor()
    }
}

... so much nicer indeed (but yes, does need somebody/something to call the cancel).


Yes, having the DBConnection own all the cursors' resources does work; we're essentially pushing the cleanup responsibility for the cursor up one layer (from cursor to connection).

There is one important ownership question though: DBCursor has to hold a strong reference to DBConnection, because otherwise for row in makeDBConnection().request("SELECT foo FROM bar").makeCursor() wouldn't work: the DBConnection would no longer be directly referenced.

Essentially, the cursor has to retain the connection. But what do we do when the last cursor deinits? If we don't take any precautions, DBConnection.deinit would then be invoked and would also need to finish synchronously (while still needing to communicate with the DB to tear down the cursor, which is async).

To work around this, we could have the connection create a (temporary) retain cycle with itself once it starts tearing down a cursor. And when it's done tearing down the cursor, it'd break that cycle [alternative: keep self alive in an async func]. That however leads to the next problem: if we're also using ARC to manage the connection, the connection now needs to (asynchronously) tear itself down in its synchronous deinit. Yes, we can play the whole game again and say that the connection teardown is owned by a NetworkingSystem, and DBConnection.deinit would just send a message to NetworkingSystem to tear the connection down. And then NetworkingSystem needs a teardown at some point... This feels like we're always pushing the responsibility for the cleanup up by one layer until we have no more layers.
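A minimal sketch of that "temporary retain cycle" workaround (the DBConnection shape and method names are hypothetical, Task.runDetached is the detached-task API from the draft structured concurrency proposal, and synchronization is ignored for brevity):

final class DBConnection {
    private var teardownRetains: [DBConnection] = []

    func beginCursorTeardown(cursorID: Int) {
        teardownRetains.append(self)            // keep the connection alive...
        _ = Task.runDetached {
            // (the closure capturing `self` strongly is the "keep self alive
            // in an async func" alternative mentioned above)
            try? await self.finishCursorOnServer(cursorID)
            self.teardownRetains.removeLast()   // ...until the async teardown is done
        }
    }

    private func finishCursorOnServer(_ id: Int) async throws {
        // tell the database server that the cursor is no longer needed
    }
}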

Maybe we can make this "push the cleanup responsibility up" design work somehow, but it seems very complex and error-prone. It also feels a little backwards to me given that DBCursor will need to be a class anyway (for the deinit, if we don't have cancel).


Just in case I misunderstood your "DBCursor sends a message to / synchronously calls DBConnection to tear down": in the "Details" below I have a code sketch of how I understood both options:

1. `DBConnection` as actor
class DBCursor {
    // assuming no `cancel`

    let connectionActor: DBConnectionActor
    let myCursorID = ... // need a unique ID here, can't use `self` or
                         // ObjectIdentifier(self) because needs to stay valid after
                         // `self` is deallocated. Likely provided in `init`
                         // from the `DBConnection`.

    deinit {
        // pull two variables out of `self` so we won't need `self` in
        // detached task.
        let connectionActor = self.connectionActor
        let myCursorID = self.myCursorID

        _ = Task.runDetached {
            try await connectionActor.finishCursor(myCursorID)
        }
    }
}
  1. as "internally synchronised" class
class DBCursor {
    // assuming no `cancel`

    let connection: DBConnection
    let myCursorID = ... // need a unique ID here, can't use `self` or
                         // ObjectIdentifier(self) because needs to stay valid after
                         // `self` is deallocated. Likely provided in `init`
                         // from the `DBConnection`.

    deinit {
        // problem here is: `DBConnection` needs to make sure it doesn't
        // get `deinit`ed straight after this call.
        self.connection.finishCursor(self.myCursorID)
    }
}

I've been following the conversation here (and elsewhere) about cancel and feel like a lot of good points have been raised. @Philippe_Hausler and I are going to discuss this and get back to this thread with a follow-up.

13 Likes

I've read the proposal and the discussion so far.

It essentially looks reasonable, and whichever way it goes on cancel I can live with it.

The question I would have regarding the cancel debate is whether it needs to be in the protocol, or whether those sequences which need a specific cancellation call when they are being interrupted can add it themselves (deciding whether or not it should be async, or whether there should be any options for how urgently the cancel should happen).

It may be that the requirement within certain projects is that every AsyncSequence should be written to conform to an additional protocol that includes a cancel of a certain form, but there may be many projects and cases where it is unnecessary.
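A minimal sketch of that opt-in approach (CancellableAsyncIteratorProtocol is hypothetical, not part of the proposal):

// Cancellation becomes a refinement that only resource-holding sequences
// adopt, rather than a requirement on every AsyncIteratorProtocol.
protocol CancellableAsyncIteratorProtocol: AsyncIteratorProtocol {
    mutating func cancel() async throws
}

// Project-specific generic code can then require the refinement:
func drainAndCancel<I: CancellableAsyncIteratorProtocol>(_ iterator: inout I) async throws {
    while let _ = try await iterator.next() {}
    try await iterator.cancel()
}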

It seems like the alternatives are an async cancel(), or leveraging deinit (and possibly accelerating deinit for structs).

However, it does appear that asynchronous tear-down may be needed. If we use deinit, would this require dispatching cleanup work to another actor (such as the one the iterator was created from)? Or would there be a way to do this with a detached Task?

Could you please rebuild the async-experimentation toolchain now that [SR-14046] Assertion failure: Forming a CanType out of a non-canonical type in createTuple · Issue #56437 · apple/swift · GitHub has been fixed?

Thank you so much everyone for your feedback on our proposal. We are grateful that you are all taking time to share your experiences and humbled that so many of you believe this is a great addition to the Swift language.

Based on the feedback here and elsewhere, we've updated the proposal to remove the explicit cancel call. There are two main reasons:

  1. Reducing the requirements of implementing AsyncIteratorProtocol makes it simpler to use and easier to understand. The rules about when cancel would be called, while straightforward, would nevertheless be one additional thing for Swift developers to learn.
  2. The structured concurrency proposal already includes a definition of cancellation that works well for AsyncSequence. We should consider the overall behavior of cancellation for asynchronous code as one concept.

We also took the opportunity to fix a few small naming issues in the proposal.

An additional thought, not included in the 'alternatives considered' because it is more speculative: the existence of Task.checkCancellation should also give those that need async cleanup on cancellation or completion a path to do so (in the async next()). Implementations that do not need async cleanup can still do their cleanup in next() or in deinit.
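A minimal sketch of that path (RemoteRowsIterator, DBRow, and the connection methods are hypothetical; Task.checkCancellation() is the check from the structured concurrency proposal):

struct RemoteRowsIterator: AsyncIteratorProtocol {
    let connection: DBConnection

    mutating func next() async throws -> DBRow? {
        do {
            try Task.checkCancellation()
        } catch {
            // The surrounding task was cancelled: the async cleanup can happen
            // right here, because next() is already async.
            try await connection.releaseCursor()
            throw error
        }
        return try await connection.fetchNextRow()
    }
}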

12 Likes

With the removal of the cancel requirement, there are a few interesting opportunities that we should consider:

extension Task.Group: AsyncSequence, AsyncIteratorProtocol {
    public typealias Element = TaskResult
    public func makeAsyncIterator() -> Task.Group<TaskResult> {
        return self
    }
}

To reference Doug's structured concurrency document.

func chopVegetables() async throws -> [Vegetable] {
  // Create a task group where each task produces (Int, Vegetable).
  try await Task.withGroup(resultType: (Int, Vegetable).self) { group in 
    var veggies: [Vegetable] = gatherRawVeggies()
    // Create a new child task for each vegetable that needs to be 
    // chopped.
    for i in veggies.indices {
      await group.add { 
        (i, veggies[i].chopped())
      }
    }
    // Wait for all of the chopping to complete, slotting each result
    // into its place in the array as it becomes available.
    for try await (index, choppedVeggie) in group {
      veggies[index] = choppedVeggie
    }
    return veggies
  }
}

func cookedVegetables() async throws -> [CookedVegetable] {
  // Create a task group where each task produces (CookedVegetable).
  try await Task.withGroup(resultType: (CookedVegetable).self) { group in 
    // Create a new child task for each vegetable that needs to be 
    // cooked.
    for veggie in gatherRawVeggies() {
      await group.add { 
        veggie.cooked()
      }
    }
    // Wait for all of the cooking to complete by using the extensions to AsyncSequence
    return try await group.collect()
  }
}

5 Likes

Hi all!

This proposal has been accepted with modifications. Please see the decision notes in the acceptance post for details on the modifications. Thank you!

Doug

4 Likes