On the proliferation of try (and, soon, await)

Back when we were introducing error handling for Swift, I raised the concern that the rule requiring a try label on every potentially-throwing statement was too indiscriminate. It's not that requiring a try label is always a mistake, I argued—in fact, in some cases being forced to acknowledge a possible throw can be extremely helpful in reasoning about code—it's just that there are so many cases where it isn't helpful that:

  • It would make code needlessly complex and hard to read.
  • try would have little value in the places where it really mattered, because people would get used to silencing the compiler by thoughtlessly adding it in all the cases where it didn't matter.
  • That would cause people to make mistakes that, because error-handling paths are seldom tested, lie undiscovered for years and fail in the field.
  • Perhaps worst, it would teach users to think about the correctness of error handling code by focusing primarily on control flow rather than invariants (I'll explain more below), which makes reasoning about it much harder, and leads easily to an overabundance of caution that acts to prop up the supposed need for boilerplate. The whole reason I got into language development was to empower programmers, and this feature, I said, would do the opposite.

IMO many of my concerns have been borne out, particularly the last one. There may be no better illustration than this pitch where it was proposed to omit try because it “is likely to just cause noise, and annoy users in the majority of use cases where this does not matter.” The proposers recognized the problem of pervasive try, but in fact the cases described were among those where it is most likely to matter. Failure to recognize that is a symptom of the disempowerment I described above. It would be almost impossible for anyone to make this mistake if they were focused on invariants rather than on control flow when thinking about error handling correctness.

Because it seems we're about to introduce another keyword with similar broad application requirements, IMO it's important to discuss the problem now, before we end up with code full of try awaits that don't actually serve the programmer.

When does requiring try help?

It's worth asking why we ever want error propagation sources to be marked. The original rationale document is thin on details. Obviously, the rest of the function will be skipped in the case of a throw, but why should we care? I claim that there is only one reason to care: the rest of the function may be responsible for restoring a temporarily-broken invariant.

This may seem like an overly broad claim, but in my experience it is true, and every time I think I've found an exception to this rule, it has turned out that there was an invariant I'd failed to recognize. While I welcome counterexamples, they probably don't affect my argument: even if a few elusive corner cases exist, what matters here is not whether eliminating try marks can be proven to be harmless in all cases that don't break invariants, but how mandatory try marking plays out in the great majority of code.
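
To make this concrete, here's a minimal sketch (Journal and its isDirty flag are invented for illustration). The try marks the one point where a throw can skip the code that restores the invariant:

struct Journal {
    private var entries: [String] = []
    private var isDirty = false  // invariant: false whenever no save is in progress

    mutating func save(_ write: (String) throws -> Void) rethrows {
        isDirty = true               // invariant temporarily broken
        for entry in entries {
            try write(entry)         // a throw here skips the restoring line below
        }
        isDirty = false              // the rest of the function restores the invariant
    }
}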

Once we recognize that broken invariants are the real issue, it becomes immediately obvious that large categories of functions and methods do not benefit from try:

  • Functions and methods that don't mutate anything (i.e. pure functions). That includes nearly all non-mutating methods, an even greater share of inits, and non-mutating getters.
  • Functions that only mutate instances of well-encapsulated types through their public APIs. That includes nearly all functions that are not methods.
  • Methods of well-encapsulated types that only mutate non-member instances of other well-encapsulated types through their public APIs.
  • Methods whose only throwing operations occur before invariants are broken or after invariants have been restored.

What does this leave, to benefit from try marking? In typical programming, it's actually a very limited set of cases: mutating methods where an error may be thrown while an invariant is temporarily broken. In fact, if you look around at uses of “try” in real code, you'll find very few of these. The keyword mostly appears where it can't make any real difference (hello, code that [de-]serializes and code that uses lots of closures, I'm looking at you)! Ask yourself how many times being alerted to an error propagation point has saved you from making a programming mistake. For me the answer—and I'm not exaggerating—is zero.
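
For contrast, here's a typical serializer (a hypothetical Point type using the standard Codable machinery). No invariant of Point is ever at risk, so by the argument above every try here is pure ceremony:

struct Point: Encodable {
    var x = 0.0, y = 0.0

    func encode(to encoder: Encoder) throws {
        var container = encoder.unkeyedContainer()
        try container.encode(x)  // nothing is mid-mutation here;
        try container.encode(y)  // the marks can't prevent any mistake
    }
}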

What can we do about it?

Given that invariant-breaking is already a danger zone in which programmers need to exercise vigilance, it's tempting to suggest that on balance, we'd be better off without any try marking at all. It certainly would have been worth considering whether the requirement for marking might have been better designed as an opt-in feature, enabled by something like the use of “breaks_invariants_and_throws” in lieu of “throws”. However, that ship has sailed. I actually would have liked to explore using higher-level language information to reason about where requiring try was likely to be useful (e.g. leave it off non-mutating methods on value types, and all inits), but I fear that ship too has sailed.

Instead, I suggest using a keyword in lieu of throws to simultaneously acknowledge that a function throws, and declare that we don't care exactly where it throws from:

func encode(to encoder: Encoder) throws_anywhere {
  var output = encoder.unkeyedContainer()
  output.encode(self.a) // no try needed!
  output.encode(self.b)
  output.encode(self.c)
}

Of course, “throws_anywhere” is not beautiful, and frankly I'd like a much lighter-weight name. Back in the days when error handling was still under design, I proposed rethrows for this purpose. Unfortunately, nobody liked the purpose, but they loved the name and used it for something else, so rethrows would also need an _anywhere variant. I'm totally open to name suggestions if we buy the general premise here.

This proposal generalizes the principle that allows try to be placed anywhere in a throwing expression: sometimes, the exact source of error propagation doesn't matter, and if you had to write a try for each precise source, f(try a, try b, try c) would just be annoying boilerplate. If we can move the marker before f, for many functions, we can equally well move it to the top level of the function.
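
A quick illustration of that existing rule (a, b, c, and f are stand-in functions); the two statements below are equivalent:

func a() throws -> Int { 1 }
func b() throws -> Int { 2 }
func c() throws -> Int { 3 }
func f(_ x: Int, _ y: Int, _ z: Int) -> Int { x + y + z }

func demo() throws {
    _ = f(try a(), try b(), try c())  // one try per propagation source
    _ = try f(a(), b(), c())          // one leading try covers all three
}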

What about async?

We can apply the same line of inquiry to async. First, why should we care that a call is async? The motivation can't be about warning the programmer that the call will take a long time to complete or that the call would otherwise block in synchronous code, because we'll write await even on calls from an actor into itself. No, AFAICT the fundamental correctness issue that argues for marking is almost the same as it is for error-handling: an async call can allow shared data to be observed partway through a mutation, while invariants are temporarily broken.

So where is that actually an issue? Notice that it only applies to shared data: except for globals (which everybody knows are bad for concurrency, right?), types with value semantics are immune. Furthermore, the whole purpose of actors appears to be to regulate sharing of reference types. Especially as long as actors are re-entrant by default, I can see an argument for awaiting calls from within actors, but otherwise, I wonder what await is actually buying the programmer. It might be the case that something like async_anywhere is called for, but being much less experienced with async/await than with error handling I'd like to hear what others have to say about the benefits of await.

One last thing about async: experience shows that pervasive concurrency leads to pervasive cancellation. In the case of our async/await design, we handle cancellation by throwing, so that means pervasive error handling. That is sure to make the problem of needless try proliferation much worse if we don't do something to address it.
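
A small illustration of that coupling (workIsFinished is a hypothetical stand-in): because cancellation surfaces as a thrown error (Task.sleep throws CancellationError when its task is cancelled), even code that merely waits picks up a try on every await.

func workIsFinished() async -> Bool { true }  // stand-in for real work

func pollUntilDone() async throws {
    while true {
        try await Task.sleep(nanoseconds: 1_000_000_000)  // 1 second
        if await workIsFinished() { return }
    }
}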

Thanks for reading,
Dave

22 Likes

From a certain perspective, await as a keyword is entirely redundant.

Every call to a method or function makes execution of the current scope wait for the call to complete, before proceeding to the next line. That’s just how things work.

The implementation detail of whether the current execution context gives up its thread during the call is irrelevant to the programmer.

The fact that subsequent lines of code might execute in a different context is somewhat more important, but that has nothing to do with waiting for a call to complete. We have to wait for every call to complete before we continue, regardless.

Furthermore, the idea of a function call possibly mutating state has nothing to do with synchronicity. A regular old bog-standard synchronous function could easily modify the state of the class instance from which it is called.

For a programmer, the interesting thing is, “Does this call initiate some concurrent task, which will execute in parallel with the current codepath, and if so how can I interact with it (observe progress, cancel, get notified when it completes, etc.)?”

That’s the actual new thing, the thing which is truly asynchronous.

All this other talk of suspension points is important to the implementation, but doesn’t directly affect the surface-level of the language. You always have to wait for a call to complete before the next line is executed, so why should some calls require an await keyword?

The new async let, however, is different. It actually introduces asynchronous execution. It brings concurrency into the language. And when you have an async let, then it makes sense to await it when you need the value.

But putting await directly on function and method calls does not seem beneficial to me at all.

11 Likes

After working in C# for a couple of years, I have found the required await keyword to be definitely helpful. I also haven't found Swift's required try to be very bothersome; in fact, I agree with the existing rationale for requiring it, and I think it applies equally well to await. Sure, it's not strictly necessary, but as a reminder to the programmer, or the reader, I think it's worthwhile.

35 Likes

It's extremely relevant, because of thread safety considerations.

await is not like try in one important sense. If — stylistically — everyone were happy with omitting await everywhere, but a programmer happened to overlook the possibility of a particular function being a suspension point, then the consequences could be catastrophic: mangled memory, race conditions, etc.

That seems to be the sort of thing Swift is designed to avoid, at the expense of repetition of a keyword.

I'm not sure that argument applies to try, though.

2 Likes

I do not understand.

Could you provide an example where the outcome would (or could) be different by calling and awaiting an async function, versus calling a synchronous function that does the same thing?

Since async/await tends to get used a lot for UI code, a common mistake people make is not understanding when control returns to the UI run loop, which may have the effect of making intermediate UI changes visible to the user. Knowing that every "await" in a function called from the UI thread is a temporary return to the run loop is helpful so that you know to carefully place code that changes the UI before or after that await depending on when you want those UI changes to be visible to the user.

If the call wasn't marked with "await" then you wouldn't even know to think about that, and you may have to just guess or keep checking the functions you're calling to remember whether they're async or not.
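
Here's a sketch of that reasoning (ListScreen, spinnerVisible, and loadItems are invented names). Everything set before the await is visible to the user during the suspension; everything after it runs only once the result is back:

@MainActor
final class ListScreen {
    var spinnerVisible = false
    var items: [String] = []

    func reload() async {
        spinnerVisible = true            // the user can see this state...
        let fetched = await loadItems()  // ...because the run loop turns here
        items = fetched                  // runs only after the result arrives
        spinnerVisible = false
    }

    func loadItems() async -> [String] { [] }  // stand-in for real work
}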

9 Likes

That's true, but to be totally fair, there's a difference with async. Assuming you are able to reason about which code has access to a given class instance, it's possible to reason that a synchronous call into code where the partially-modified instance is inaccessible will not cause the instance to be re-entered and thus observed in the partially-modified state. When the call is async, that instance is open to access, not just by the callee, but by any currently-suspended code that may resume at the point of the call.

That said, you could argue that being able to reason about which code has access to a given class instance is a fantasy that's rarely fulfilled in reality… another good reason to eschew classes :wink:

For a programmer, the interesting thing is, “Does this call initiate some concurrent task, which will execute in parallel with the current codepath, and if so how can I interact with it (observe progress, cancel, get notified when it completes, etc.)?”

Just trying to get a grip on what you're saying here… IIUC, every async call (except those into the same actor?) is a suspension point. Doesn't that mean that each one can effectively initiate a concurrent task, since some waiting task might start when this one suspends? Also, I'm not sure about “in parallel,” if we're distinguishing parallelism and concurrency. If some other thread spawns a new thread at the moment I make a call, has that call effectively initiated some task that will execute in parallel with the current codepath?

It sounds like you're talking about initiating a task that persists past the call, to which the caller can get access… which sounds like a future to me.

All this other talk of suspension points is important to the implementation, but doesn’t directly affect the surface-level of the language. You always have to wait for a call to complete before the next line is executed, so why should some calls require an await keyword?

Well, yes, that is the question I'm putting on the table.

The new async let, however, is different. It actually introduces asynchronous execution. It brings concurrency into the language. And when you have an async let, then it makes sense to await it when you need the value.

OK, but the fact that the word await “makes sense” at that point in the code is not a good enough reason for the compiler to mandate it. It has to serve some purpose, alerting the reader of the code to… something. For example, & is mandated on (nearly all) inout arguments in order to alert the reader to mutation, which materially affects the reader's ability to understand the meaning of the code.
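
For instance, the marker does real work in a call like this:

func increment(_ value: inout Int) { value += 1 }

var total = 0
increment(&total)  // the & tells the reader that total may be mutated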

I argue that there's nothing about evaluating an async let that makes it more worthy of a mandated keyword than any other async call. Both have the same potential to allow re-entrant access to shared state that wouldn't be allowed if the call were not async. But since that issue seems increasingly marginal as actor isolation becomes stronger, it's not clear to me that it's worth the cost.

Unless you're talking about the typical problems of reentrancy that occur even without threads, I don't think so. Giving up a thread means the same thread may start running some other task. It doesn't introduce a new opportunity for race conditions or deadlocks (or mangled memory).

No, I'm not talking about concurrency problems, but consider code running on the main thread in isolation.

When it's synchronous (i.e. sequential and not giving up the thread), then the code is thread safe because nothing can be interleaved. The code has complete control of its data structures.

When it's asynchronous (i.e. sequential but gives up the thread at a suspension point), other code may be interleaved into the sequence. That code is not prevented from modifying the data structures that the suspended code was also modifying, potentially corrupting them.

OK, now we're getting somewhere. That is indeed a typical problem for class-based UI programs. That leads to some obvious questions:

  • Should we be designing the language for this paradigm when better alternatives (e.g. SwiftUI) exist?
  • Can we limit the need for await marking to those cases where it typically matters (e.g. to instance methods of classes)?
  • How is realizing the full concurrency vision going to affect this, if at all? Is the evolution of OOP UI frameworks once we have full actor isolation likely to make it a non-issue?

I didn't say “concurrency problems;” I said the “typical problems of reentrancy,” which are exactly what you described (and which in fact are also concurrency problems when data structures are shared). These problems are distinct from thread safety.

But they're also strongly related to thread safety, so it's an understandable mixup. In fact, I view thread safety as a matter of preventing the observation of broken invariants, where even a non-atomic Int is considered to have its invariants temporarily broken while a thread is modifying it. So at its core, it ends up being the same issue.

SwiftUI would, at best, only be shifting the problem, though. It's no longer about the main "UI" run loop, but you still need to entertain the main "State" run loop. With so many independent pieces (users, networks, etc.), we eventually end up with synchronization somewhere. It'd be nice if we could push it all the way into the OS so that it's not our problem, but SwiftUI hasn't gone that far yet.

That said, SwiftUI requires a lot less synchronization, so I'm not sure whether await would be on the noise or the signal side.

1 Like

I’m not an expert in the terminology here. I was attempting to distinguish between code that runs like this (with time progressing downward):

// do something
//      ↓
// call foo()
//      ↓
// do something
//      ↓
// use result of foo

And code that runs like this:

// do something
//      ↓
// call foo()  – – – – → foo does something
//      ↓                      ↓
// do something          foo continues doing things
//      ↓                      ↓
// use result of foo  ←  foo returns

The first example I would call “linear” or “single-path” or “synchronous”. If foo is a regular function, that’s the behavior you get. Also, if foo is asynchronous but called with await (as proposed), that’s also what you get. Things happen one after another.

The second example I would call “parallel” or “concurrent” or “asynchronous”. Things are happening at the same time in different functions.

The same issue: it matters where the suspension points are. However you slice it, a small error in invariant preservation can be catastrophic.

So, I repeat, there's more risk in omitting await than (likely) in omitting try.

1 Like

I repeat my request.

Could you please provide an example where the outcome would or could be different by calling and awaiting an async function, versus calling a synchronous function that does the same thing without suspending?

It's kinda hard to make a very short, plausible example, but here's some synchronous code:

import Dispatch

func computeAdjustment(count: Int) -> Int {
    return 1
}

var myCount = 0

func testIntegrity() {
    // The closure below cannot run until this synchronous function
    // returns control to the main queue.
    DispatchQueue.main.asyncAfter(deadline: .now() + 2) {
        myCount += 1
    }
    print(myCount + computeAdjustment(count: myCount))
}

This prints "1". Now make it asynchronous:

import Dispatch

func computeAdjustment(count: Int) async -> Int {
    // assume there is something asynchronous here
    return 1
}

var myCount = 0

func testIntegrity() {
    DispatchQueue.main.asyncAfter(deadline: .now() + 2) {
        myCount += 1
    }
    // The suspension at `await` may give the closure above a chance to run first.
    print(myCount + await computeAdjustment(count: myCount))
}

Now it may print "2" or "1", depending on whether the dispatched closure runs during the suspension, or after the function exits.

2 Likes

Hi Dave,

I think that you have a couple of good points, but are missing the bigger picture on this.

Try and await marking are key to helping programmers understand control flow in their applications, and they cover for serious deficiencies in the C++ and Java exception handling model (where many programmers ignore exceptional control flow). This failure of the C++/Java model is one of the reasons the C++ exception-safe programming model is such a binary (all-or-nothing) thing, and one of the reasons Java doesn't interop with C very well. We should do better here. Furthermore, the back-pressure on "exception handling" logic is intentional, and is one of the things that is intended to help reduce the number of pervasively throwing methods in APIs.

Your characterization of marking as a historical artifact (whose "ship has sailed") isn't really fair IMO: many of us are very happy with it for the majority case, and believe that the original Swift 2.0 design decisions have worked out well in practice. I also personally believe that async marking is a promising (but unproven) direction to eliminate a wide range of deadlock conditions in concurrent programs when applied to the actors model. I'm not aware of any other model that achieves the same thing.

I personally think that your proposal is an overreach, an "over-solution" to the problem. In addition to applying to the whole function, this approach makes it appear as though it is part of the API of the function, when it is really an artifact of the implementation details / body of the function.

That said, I agree that you're on to something here, and I agree with you that async will exacerbate an existing issue. There are a couple of ways to address this. One is to reduce the keyword soup by introducing a keyword that combines try and await into one (and similarly throws and async), but I am convinced that we should land the base async proposal and gain usage experience with it before layering on syntactic sugar.

The other side of this is the point you're observing: we have existing use cases with try that are so unnecessarily verbose that they obfuscate the logic they contain. To me, the answer is a solution that locally scopes the "suppress try" behavior you're seeking: while you pick some big cases where it appears to align with whole functions, often it aligns with regions of functions that are implemented in terms of throwing (and, in the future, async) logic. The whole scope of the declaration isn't necessarily implicated in this.

The solution to this seems pretty clear. We already have a scoped control flow statement that allows modifiers: the do statement. I think that we should extend it with try and await modifiers to provide the behavior you want.

Instead of your example:

func encode(to encoder: Encoder) throws_anywhere {
  var output = encoder.unkeyedContainer()
  output.encode(self.a) // no try needed!
  output.encode(self.b)
  output.encode(self.c)
}

We would instead provide:

func encode(to encoder: Encoder) throws {
  var output = encoder.unkeyedContainer()
  try do {
    output.encode(self.a) // no try needed!
    output.encode(self.b)
    output.encode(self.c)
  }
}

While this is slightly more verbose, it moves the weight into the implementation details where local scopes can be marked as implicitly trying, rather than it being at the granularity of entire decls. This seems to provide the same capability that you're looking for, but with a more composable approach that puts the burden on the implementation instead of the interface.

-Chris

53 Likes

Thanks.

In that example, the issue seems to arise from the async call breaking the order of operations within a single statement.

Instead of the left side of “+” being evaluated (myCount) and then the right side (calling computeAdjustment), the inclusion of await causes the entire print statement to be delayed until computeAdjustment completes.

This is essentially equivalent to moving the call up a line:

func testIntegrity() {
    DispatchQueue.main.asyncAfter(deadline: .now() + 2) {
        myCount += 1
    }
    let x = await computeAdjustment(count: myCount)
    print(myCount + x)
}

In that reformulation, the sync and async versions will behave identically, printing 1 or 2 based on how long computeAdjustment takes to complete.

So the complexity arises not from anything inherent to suspension, but rather because a statement containing an async call gets transformed “as if” the async call happened on the previous line, thus upending the expected order of operations.

That seems like a significant “gotcha” that people will not expect.

1 Like

Well, my example wasn't very good, because the outcome may depend on when the compiler loads the value of myCount for the expression, too. That part wasn't the point of the example.

However, your re-written version still always prints "1" when synchronous, never "2". It doesn't matter when the deadline expires (relative to the access to myCount), because the fact that the code is synchronous ensures that the closure won't execute until after the function has returned — and indeed after the thread has returned to the main event loop.

In the synchronous version, the closure has no effect on the rest of the function. In the asynchronous version, it might.

Well now I’m even more confused.

I thought await meant “await”.

Are you saying that the main event loop would continue to execute during the await?

(I’m assuming testIntegrity was called on the main thread. Was that your intention? I also notice it is not marked as async, but I assume it should be, right?)