Await/Async, part deux

gscarr · June 14, 2020, 5:36am

@Nevin (I'm confused by the "not trying to achieve concurrency" part of this. I assume you mean that concurrency is only to be achieved in very specific places.)

@Quincy I have never used an async/await model, but this whole discussion about the apparent "surprises" in the order of execution of statements makes me wonder why the actor model isn't receiving more discussion. Chris Lattner's manifesto describes it as analogous to the Go/CSP model. In that model, all of your code is sequential except where you spin off a goroutine, and it seems much simpler to reason about. You just pull in data from channels at points that clearly define the synchronization between the background coroutines. You don't have to worry about which thread any of them are dispatched on. You just read the data on your main thread as you update your UI or whatever.

I don't know if any of that impacts the monad discussion but, at this point, I can't see the advantage of the async/await approach. (BTW, is awaiting an asynchronous call equivalent to making a synchronous one?) I think I see some benefit to "parallelizing" the syntax between throwing and async, but is async/await the best model for rationalization? I think the complexity of this whole discussion calls that into question. (Off topic: I wonder if the changes brought about by using the Combine framework lend themselves to a better fit with one or the other concurrency model.)

ddddxxx · June 14, 2020, 6:06am

Are you proposing a stackful coroutine? If so, the async/await keywords are not needed, and:

QuinceyMorris · June 14, 2020, 6:24am

Concurrency is creating multiple paths of execution. "Spinning off" things also sounds like creating multiple paths of execution.

In many cases you only want a single path of execution. Currently, asynchronous functions (in the sense of functions that return immediately and invoke a completion handler later) fork one path into two paths of execution.

The goal here is to actually provide a single path of execution when that's all you need. That's even simpler to reason about.
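To make the fork concrete, here is a minimal sketch. It assumes a made-up `ToyLoop` class standing in for a real run loop or dispatch queue; `fetchValue` and `caller` are also invented for illustration and do not come from the proposal.

```swift
// Toy stand-in for an event loop (hypothetical, single-threaded):
// queued closures only run when drain() is called.
final class ToyLoop {
    private var pending: [() -> Void] = []
    private(set) var log: [String] = []

    func later(_ work: @escaping () -> Void) { pending.append(work) }
    func record(_ event: String) { log.append(event) }
    func drain() {
        while !pending.isEmpty {
            let work = pending.removeFirst()
            work()
        }
    }
}

let loop = ToyLoop()

// Completion-handler style: returns immediately, calls the handler later.
func fetchValue(completion: @escaping (Int) -> Void) {
    loop.later { completion(42) }
}

func caller() {
    loop.record("caller: before fetch")
    fetchValue { value in
        // Second path: runs only after caller() has already returned.
        loop.record("handler: got \(value)")
    }
    // First path: carries on immediately, without the value.
    loop.record("caller: after fetch")
}

caller()
loop.drain()
print(loop.log)
// ["caller: before fetch", "caller: after fetch", "handler: got 42"]
```

With a single path of execution, the line after the fetch would not run until the value was available.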

QuinceyMorris · June 14, 2020, 6:30am

Well, I'm not proposing how the implementation is done. The proposal we've got is non-specific about implementation details.

You don't have to use 'async' on the function signature for this. As I mentioned earlier, it's enough for the compiler to discover an 'await' operator in the body of the function.

ddddxxx · June 14, 2020, 6:44am

The async model is not just "implementation details". Your A1() example doesn't exist in the stackless coroutine model, which is the only choice for the Swift language.

Discovering await is not enough. In the stackless coroutine model, the compiler really can't do anything but block the current thread in a normal function, which defeats the whole purpose of coroutines. Otherwise the call stack would be destroyed as await returns.

ddddxxx · June 14, 2020, 7:44am

Calling await in a non-async context is the source of this chaos. It should not (and can't) be allowed, and await itself does not create paths of execution.

I'd expect Swift to provide an async primitive that allows other event loop libraries to implement an async runtime:

extension MyScheduler /* like DispatchQueue and RunLoop */ {
    
    // Runs the async function concurrently and returns immediately.
    func async(_ body: @escaping () async -> Void) {
        ...
    }
    
    // Does other work in the event loop until the async function returns.
    func sync(_ body: () async -> Void) {
        ...
    }
}

Or you need to deal with the result yourself, as in Rust:

let asyncExecution: Future<Int> = async {
   await getIntValue()
}

ExFalsoQuodlibet · June 14, 2020, 7:56am

The point is what you make it to be: the discussion up to this point seems to show that there is no single, universally correct interpretation of the keywords async and await.

I don't see why one should be forced to call an async function with await if the context is already asynchronous (that is, the function within which it's called is async): there are reasons to do so, and I might want to do it strategically (see later in this post for further explanation).

For example, code like:

let x = asyncFunction()
let y = await anotherAsyncFunction(x)
let z = stillAnotherAsyncFunction(y)

can only be called from an async function because two of the async functions are not called with await. But the path of execution within this particular function is clear (to get y I need x, to get z I need y).

I wrote that there are various reasons why the parallelism between throws/try and async/await is only partial. A relevant reason, to me, is related to typing. (A) -> B and (A, (B) -> Void) -> Void look like different function types, but they are the same at the type level, insofar as they're isomorphic and follow the same typing rules (relationships, inheritance). I can even write functions to convert from one to the other and vice versa:

import Dispatch // needed for DispatchSemaphore below

func fromSyncToAsync<A, B>(_ f: @escaping (A) -> B) -> (A, (B) -> Void) -> Void {
  return { a, onB in
    onB(f(a))
  }
}

func fromAsyncToSync<A, B>(_ f: @escaping (A, (B) -> Void) -> Void) -> (A) -> B {
  return { a in
    var asyncB: B?
    let s = DispatchSemaphore(value: 0)
    f(a) { b in
      asyncB = b
      s.signal()
    }
    s.wait()
    return asyncB!
  }
}

let f: (String) -> Int = { $0.count }
let howdy = "howdy"

print("sync", f(howdy))

let fAsync = fromSyncToAsync(f)

fAsync(howdy) {
  print("async", $0)
}

let fSyncAgain = fromAsyncToSync(fAsync)

print("sync again", fSyncAgain(howdy))

/*
 will print:
 
 sync 5
 async 5
 sync again 5
 */

Other than being interesting by itself, it shows a fundamental difference with throws/try: in the latter, try is really needed because I'm transforming a type that's essentially (A) -> Result<B, Error> into (A) -> B, which is not the same type (but related), thus try is a special form of as. await, on the other hand, changes the shape of a type for reasons unrelated to the type itself: the reasons are related to turning an asynchronous operation into a non-asynchronous one, which is not related to types, but to execution flow.

Some things are true of both, but not everything is: in particular, the meanings of try and await are not in the same ballpark.

Modeling an operation as an arrow in a category based on a monad can be useful, but it's just a model, and in this case it would be more appropriate if we were discussing the parallelism between functions of type (A) -> Result<B, Error> and (A) -> Future<B>, which is not the scope of this discussion.

True, like many other statements and features of the language. This doesn't mean that try and await express the same concept, though.

Other than signaling to the user (which is always useful), in my model, not using await on any of those calls would make the function itself async (with a compilation error if not marked as such).

It's actually returning A, what changes is the execution flow. If you check the way the function is transformed you can see what I mean.

It would probably be clarifying if we considered 2 possible implementations of an async function:

func asynchronous_v1(_ a: A) async -> C {
  let b = await asynchronous1(a)
  let c = asynchronous2(b)
  return c
}

func asynchronous_v2(_ a: A) async -> C {
  let b = asynchronous1(a)
  let c = await asynchronous2(b)
  return c
}

What's the difference? It becomes clear if we express them with callbacks and DispatchSemaphore, as I did before:

func asynchronous_V1(_ a: A, _ onC: @escaping (C) -> Void) {
  var asyncB: B?
  let sb = DispatchSemaphore(value: 0)
  asynchronous1(a) {
    asyncB = $0
    sb.signal()
  }
  sb.wait()
  let b = asyncB!

  /// when called, this function will return after the next line, and will complete the execution asynchronously
  asynchronous2(b) { c in
    onC(c)
  }
}

func asynchronous_V2(_ a: A, _ onC: @escaping (C) -> Void) {
  /// when called, this function will return after the next line, and will complete the execution asynchronously
  asynchronous1(a) { b in
    var asyncC: C?
    let sc = DispatchSemaphore(value: 0)
    asynchronous2(b) {
      asyncC = $0
      sc.signal()
    }
    sc.wait()
    let c = asyncC!

    onC(c)
  }
}

When someone calls asynchronous_V1, the caller will see any side effect that asynchronous1 produces before asynchronous_V1 returns, plus any (potential) side effect that asynchronous2 produces before returning. Calling asynchronous_V2, instead, only immediately yields any (potential) side effect that asynchronous1 produces before returning.

You can imagine cases where, for example, a function is asynchronous because it returns before having completed its execution, but we still want to set up something synchronously before it returns.

The fact that a function is async doesn't necessarily mean that it returns immediately.

ddddxxx · June 14, 2020, 8:18am

I'm curious why you think calling an async function synchronously, and blocking the current thread while doing so, is useful, when we could instead do other things in the event loop with an async runtime. I'm assuming we are talking about stackless coroutines.

GreatApe · June 14, 2020, 9:22am

I think the confusion is that what hacksaw describes actually seems to be what it does in A1, although that is of course not the main use case.

GreatApe · June 14, 2020, 9:24am

I also eagerly await actors, but won't they be a bit clunky for cases like these, where you have a sequence of calls and need to take the output of each and pipe it into the next?

alaborie · June 14, 2020, 12:34pm

If I'm not wrong, this invariant is the key difference with the current model. I mean, the original A1 example could be written with Dispatch as follows:

import Dispatch

func A() {}
func B(completion: () -> Void) { completion() }
func C(completion: () -> Void) { completion() }
func D(completion: () -> Void) { completion() }


func A1() {
    A()

    let group = DispatchGroup()
    group.enter()
    B() { group.leave() }
    group.wait()

    group.enter()
    C() { group.leave() }
    group.wait()

    group.enter()
    D() { group.leave() }
    group.wait()
    print("but what about me??")
    print("got here")
}

This implementation preserves a single path of execution, but it breaks the invariant due to the blocking call to wait(). Am I right in my understanding?

What would be nice is to have a model that requires less boilerplate code than the existing Dispatch group (i.e., group creation, enter and leave) as well as being less error-prone (i.e., unbalanced enter and leave). Maybe something like:

await (B, C, D)

I like the approach of seeing async as something that adds a completion handler to a function.
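As a rough illustration of how much of that boilerplate can already be folded away today, here is a sketch built on a hypothetical helper, `blockingWait`, which pairs every enter() with exactly one leave(). The `step` function and `Recorder` class are also made up for the example. Note that this still blocks the calling thread, so it only addresses the boilerplate and the balancing problem, not the invariant.

```swift
import Dispatch

// Collects events so the order of execution can be inspected afterwards.
final class Recorder { var events: [String] = [] }
let recorder = Recorder()

// Hypothetical completion-handler operation that finishes on a background queue.
func step(_ name: String, completion: @escaping () -> Void) {
    DispatchQueue.global().async {
        recorder.events.append(name)
        completion()
    }
}

// Hypothetical helper: runs one completion-handler operation and blocks the
// calling thread until its handler fires. enter() and leave() are always paired.
func blockingWait(_ operation: (@escaping () -> Void) -> Void) {
    let group = DispatchGroup()
    group.enter()
    operation { group.leave() }
    group.wait()
}

func A1() {
    blockingWait { done in step("B", completion: done) }
    blockingWait { done in step("C", completion: done) }
    blockingWait { done in step("D", completion: done) }
    recorder.events.append("got here")
}

A1()
print(recorder.events)
// ["B", "C", "D", "got here"]
```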

Karl · June 14, 2020, 12:54pm

If I understand correctly, the thing people are unhappy with is the parasitic/“async-everywhere” nature of async/await - basically that any function that awaits an async function needs to itself be async, and so on, all the way up.

That’s a well-known issue in the model for all languages AFAIK (Rust, C#, all have this issue), and the only way to get out of it is to have Futures/Promises land alongside async/await, allowing non-async functions to break the chain by returning a Future instead of await-ing.

Personally I’m a lot more interested in the Go model than async/await, but I haven’t read the entire previous thread about limitations of the implementation.

anandabits · June 14, 2020, 1:08pm

Have you read Understanding Real-World Concurrency Bugs in Go?

Karl · June 14, 2020, 1:11pm

No, but I’ll check it out, thanks.

Nevin · June 14, 2020, 3:21pm

I think I’m starting to wrap my head around the main ideas here. It would be nice if there were a detailed introduction that doesn’t assume the reader already knows the meaning or purpose of async and await, but I haven’t seen one posted yet.

In any case, let me lay out an example.

Suppose I have a function fetchContentsOfURL which retrieves the contents of a URL in the background, and uses a completion handler to process the resulting text.

And I have another function downloadImage which does something similar, but the completion handler works with image data.

And a third function processImage which runs some image manipulation code in the background, and uses a completion handler to do something with the transformed image.

Now in my app, I have an array of URLs, and for each one I want to fetch their contents, parse them for image URLs, then download and process each image, adding the results to a library the user can browse.

With the existing way of doing things, I might write code like this:

func getImages(urlList: [URL]) {
  for url in urlList {
    fetchContentsOfURL(url) {
      for imageURL in extractImageURLsFromHTML($0) {
        downloadImage(imageURL) {
          processImage($0) {
            imageLibrary.addImage($0)
          }
        }
      }
    }
  }
}

Importantly, this function does not block. All it does is start fetching the contents of all the URLs, and let those fetches and subsequent callbacks happen in the background. The getImages function will return almost immediately, having started this background work, and the program will continue without delay.

Using async / await I should be able to achieve the same thing with code like this:

func getImages(urlList: [URL]) {
  for url in urlList {
    let html = await fetchContentsOfURL(url)
    for imageURL in extractImageURLsFromHTML(html) {
      let image = await downloadImage(imageURL)
      let result = await processImage(image)
      imageLibrary.addImage(result)
    }
  }
}

Those functions should be equivalent, with all the URLs in the list being handled in parallel (or however the background task scheduler dispatches them).

In other words, each iteration of the outer for loop simply kicks off a background task, and continues onto the next iteration to kick off another background task until they have all been started. Then getImages returns, and the background tasks do their thing in the background while the rest of the program continues.

Similarly, all the image URLs extracted from a given html response are also handled in parallel. The inner for loop works much the same as the outer one, kicking off each task in turn.

Or at least, this is the type of code I would want to write, and this is how I would hope and expect it to behave. I want to do things like kick off many asynchronous actions in parallel, and have each of those actions subsequently perform more actions when it has completed.

This is the sort of use-case I envision: multiple parallel task pipelines consisting of sequential steps.

It’s worth noting that getImages is not itself async. It does not take a completion handler. It does not do anything when the calls it kicks off are complete. If we wanted to have something happen when all images have been fully processed we could make it take a callback, but in this example we don’t so it doesn’t.

Karl · June 14, 2020, 3:33pm

That’s not how I understand it. Await-ing means your function gets suspended and resumed again later (meaning it also has to be async). At each point you await, the function will pause. What you’re talking about is more like the Go model.

The way I see it is that an async function captures the bits of state that it needs for further processing, packages them as an instantiable object, and executes it as its own stream of dependent operations.
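A hand-written sketch of that idea, under stated assumptions: `TotalCoroutine` is a made-up class playing the role of what a compiler might generate for `func total() async -> Int { let a = await first(); let b = await second(a); return a + b }`. The `first`, `second`, and `drain` helpers and the global `queue` are also invented, and nothing here reflects an actual Swift implementation.

```swift
// Toy scheduler (assumed helper): queued closures run when drain() is called.
var queue: [() -> Void] = []
func drain() {
    while !queue.isEmpty {
        let job = queue.removeFirst()
        job()
    }
}

// Two callback-style operations that finish "later".
func first(completion: @escaping (Int) -> Void) { queue.append { completion(10) } }
func second(_ a: Int, completion: @escaping (Int) -> Void) { queue.append { completion(a * 2) } }

final class TotalCoroutine {
    private enum State {
        case notStarted
        case awaitingFirst
        case awaitingSecond(a: Int) // `a` is the local that must survive the suspension
        case finished
    }
    private var state = State.notStarted
    private let onFinish: (Int) -> Void

    init(onFinish: @escaping (Int) -> Void) { self.onFinish = onFinish }

    // Each call runs the body up to the next suspension point, then returns,
    // so the calling thread is never blocked.
    func resume(with value: Int = 0) {
        switch state {
        case .notStarted:
            state = .awaitingFirst
            first { self.resume(with: $0) }
        case .awaitingFirst:
            state = .awaitingSecond(a: value)
            second(value) { self.resume(with: $0) }
        case .awaitingSecond(let a):
            state = .finished
            onFinish(a + value)
        case .finished:
            break
        }
    }
}

var result: Int? = nil
let coroutine = TotalCoroutine { result = $0 }
coroutine.resume() // returns at the first suspension point; result is still nil
drain()            // the scheduler resumes the coroutine twice more
print(result ?? -1)
// 30
```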

This is a useful (brief) overview: Does Go need async/await? | yizhang82’s blog

Nevin · June 14, 2020, 3:36pm

Then how, under your model of async / await, can I achieve the behavior I want?

Karl · June 14, 2020, 3:40pm

You would call them all without awaiting (returning Futures), and use some kind of combinator to create one future that was fulfilled when all the tasks completed (like a DispatchGroup). Then you’d await that.

For example, Rust has the JoinAll combinator:

A future which takes a list of futures and resolves with a vector of the completed values.

This future is created with the join_all function.
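A minimal sketch of such a combinator, assuming a made-up, single-threaded `ToyFuture` type. It is not Rust's join_all, Combine, or any real library API, and it is not thread-safe; it only illustrates collecting results in list order regardless of completion order.

```swift
// Hypothetical minimal future: holds a value once fulfilled and notifies observers.
final class ToyFuture<Value> {
    private var value: Value?
    private var observers: [(Value) -> Void] = []

    // Registers a callback; it runs immediately if the value is already there.
    func onComplete(_ observer: @escaping (Value) -> Void) {
        if let value = value {
            observer(value)
        } else {
            observers.append(observer)
        }
    }

    // Fulfills the future once and notifies everyone who was waiting.
    func fulfill(_ newValue: Value) {
        precondition(value == nil, "a future can only be fulfilled once")
        value = newValue
        let waiting = observers
        observers.removeAll()
        waiting.forEach { $0(newValue) }
    }
}

// Resolves with all values, in the order of `futures`, once every one has completed.
func joinAll<Value>(_ futures: [ToyFuture<Value>]) -> ToyFuture<[Value]> {
    let joined = ToyFuture<[Value]>()
    if futures.isEmpty {
        joined.fulfill([])
        return joined
    }
    var results = [Value?](repeating: nil, count: futures.count)
    var remaining = futures.count
    for (index, future) in futures.enumerated() {
        future.onComplete { value in
            results[index] = value
            remaining -= 1
            if remaining == 0 {
                joined.fulfill(results.map { $0! })
            }
        }
    }
    return joined
}

let a = ToyFuture<Int>()
let b = ToyFuture<Int>()
let c = ToyFuture<Int>()
var joinedResult: [Int]? = nil
joinAll([a, b, c]).onComplete { joinedResult = $0 }

// Complete out of order; the joined future fires only after the last one.
c.fulfill(3)
a.fulfill(1)
b.fulfill(2)
print(joinedResult ?? [])
// [1, 2, 3]
```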

Nevin · June 14, 2020, 3:42pm

Could you show me an example?

It sounds complex and, at least as I picture it, rather ugly.

I’d like to see how the exact workflow I described would look.

Karl · June 14, 2020, 4:27pm

It would look quite a lot like Combine, most likely. Lots of functional combinators. That's why this kind of programming is so popular at the moment - it lets you express a complex chain of data dependencies in a fairly elegant-ish and distinctive way.

Simple chains can happen with async/await, but when you need finer-grained control, you need to drop to future/task objects.

That said, you've explicitly opted to do everything you can concurrently (fetching the content of the URLs, then the content of the URLs extracted from them, then the processing again parallelised), which is probably not the best design in practice. Still, for what you've asked for, the code isn't tooooo bad (and I'm not really an expert at reactive programming - there could be simpler ways).

func fetchContentsOfURL(_: URL) async -> HTML { ... }
func downloadImage(_: URL) async -> Image { ... }
func processImage(_: Image) async -> Result { ... }
// not async.
func extractImageURLsFromHTML(_: HTML) -> [URL] { ... }

func getImages(urlList: [URL]) {
    urlList.map { (url: URL) -> Future<[URL]> in
        // Calling an async function without awaiting returns a Future.
        return fetchContentsOfURL(url).map { (html: HTML) -> [URL] in
            return extractImageURLsFromHTML(html)
        }
    }
    .flattened() // [Future<[URL]>] -> [Future<URL>]
    .map { (fImageURL: Future<URL>) -> Future<Void> in
        return fImageURL.flatMap { (imageURL: URL) -> Future<Void> in
            // Let's imagine you can flatMap with an async closure.
            let image  = await downloadImage(imageURL)
            let result = await processImage(image)
            imageLibrary.addImage(result)
        }
    }.run()
}