Reliably testing code that adopts Swift Concurrency?

stephencelis · May 13, 2022, 5:14pm

I’m having a hard time figuring out how to test application code as it adopts async-await and begins to be scheduled concurrently. I’ll try to show a simple example.

Here’s a basic SwiftUI application that tracks how many times a user has taken a screen shot. It has a view model with an endpoint that spins off some async work, and then the view calls this endpoint when it first appears:


class ViewModel: ObservableObject {
  @Published var count = 0

  @MainActor
  func task() async {
    let screenshots = NotificationCenter.default.notifications(
      named: UIApplication.userDidTakeScreenshotNotification
    )
    for await _ in screenshots {
      self.count += 1
    }
  }
}

struct ContentView: View {
  @ObservedObject var viewModel = ViewModel()

  var body: some View {
    Text("\(viewModel.count) screenshots have been taken")
      .task { await viewModel.task() }
  }
}

I’d like to write a test that simulates loading this screen, taking a screen shot, and asserting that the view model’s data has incremented. I’ve come up with the following:

@MainActor
func testBasics() async throws {
  let vm = ViewModel()
  let task = Task { await vm.task() }

  XCTAssertEqual(vm.count, 0)

  // Give the task an opportunity to start executing its work.
  await Task.yield()

  // Simulate a screen shot being taken.
  NotificationCenter.default.post(
    name: UIApplication.userDidTakeScreenshotNotification, object: nil
  )

  // Give the task an opportunity to update the view model.
  await Task.yield()

  XCTAssertEqual(vm.count, 1)

  task.cancel()
  await task.value
}

The calls to Task.yield feel wrong to me, but I don’t know of an alternative.

The real problem with this code, though, is that the test occasionally fails! You can use Xcode’s “run repeatedly” feature 1,000 times and will almost always get a failure or two. From what I can gather, this is because there’s no guarantee that Task.yield will suspend long enough for the task to do its work.

I can sprinkle in more Task.yields and the test fails less often:

// REALLY give the task an opportunity to start executing its work.
await Task.yield()
await Task.yield()
await Task.yield()
await Task.yield()
await Task.yield()

Or we could try adding explicit Task.sleeps, instead.

But these obviously can’t be the recommended ways of handling the problem. Both solutions are flakey and could fail at any time, and sleeping slows tests down unnecessarily.

So how are we supposed to test async code like this?

I’ve pushed the project if anyone wants to mess around with it: GitHub - stephencelis/ScreenshotWatcher: A demonstration of testing uncertainty in Swift Concurrency

tourultimate · May 13, 2022, 6:42pm

We have experienced the same issue. And we, too, have used the await Task.yield(). What we've noticed is that there are certain machines where running the test repeatedly will fail much more often than other machines. For example, I'm using an M1 mac and my colleague is using a late model intel based mac and these types of test fail much more often for him. Sometimes even running the test once will fail for him. And sometimes tests with these yield statements will never fail for me even if I run the test hundreds of times repeatedly.

I would love to hear the preferred solution. Thank you for asking this question.

stephencelis · May 13, 2022, 7:43pm

To be clear I think Task.yield() shouldn't be necessary. Ideally Swift would provide tools to run async code in a serial, predictable manner in tests, perhaps using executors. Maybe a compiler engineer can weigh in on if this is or will be possible.

Otherwise it seems like code cannot adopt async-await and hope to be testable, which is a big bummer because equivalent Combine code is quite easy to test

Philippe_Hausler · May 13, 2022, 7:49pm

One concept that I have toyed for doing async things in synchronous contexts is to send a closure to an AsyncStream that is being iterated in a singular task. That way you avoid potential costly task creation. However it comes at the caveat that priority is a bit of a mess.

stephencelis · May 13, 2022, 8:07pm

@Philippe_Hausler Could you sketch out how that could address the issue in the example above? I'm not sure I understand it.

The problem seems simple enough. I want to test that a view model's async endpoint works:

I tell the view model to start doing some async work
And then I communicate to a synchronous interface that feeds data to that async work

But because 1 has the async hop, 2 can happen before 1, causing the test to fail.

Can we maybe agree that there should be a simple solution to this problem?

Cyberbeni · May 13, 2022, 9:53pm

If you just use a regular block based observation instead of AsyncSequence then count should increment on the same thread as the notification is posted and since you want to receive the notification on the main thread and it gets posted on the main thread, you don't need any extra dispatching.

You could write a method that does await Task.yield() until the condition you pass into it becomes true or a certain amount of time passes.

Either way, I don't think this test doesn't really test anything meaningful about your V/VM.

mlarandeau · May 13, 2022, 11:04pm

I have very little experience with async/await, but this seemed like an interesting issue to tackle. I came up with something that is perhaps a bit messy but seems reliable. I ran this ~30,000 times without failure. It avoids the need for yield() or sleep().

class ScreenshotWatcherTests: XCTestCase {
    
    func testBasics() {
        let vm = ViewModel()
        
        let startExpectation = XCTestExpectation(description: "Waiting for task to start")
        let task = Task {
            DispatchQueue.main.async { startExpectation.fulfill() }
            await vm.task()
        }
        
        wait(for: [startExpectation], timeout: 1)
        
        let expectedCount = 1
        let waitForCountExpectation = XCTestExpectation(description: "Waiting for count")
        Task {
            for await value in vm.$count.values where value >= expectedCount {
                break
            }
            waitForCountExpectation.fulfill()
        }
        
        Task {
            for _ in 0..<expectedCount {
                await NotificationCenter.default.post(name: UIApplication.userDidTakeScreenshotNotification, object: nil)
            }
        }
        
        wait(for: [waitForCountExpectation], timeout: 5)
        
        XCTAssertEqual(vm.count, expectedCount)
        
        task.cancel()
    }
}

The main points:

The test is NOT marked with @MainActor or as async. For some reason, it work fine with one or the other, but once you have both, waiting for the expectations doesn't work. Neither seemed necessary for the test though, so I removed them.
An expectation is used to wait for the initial task to run. The fulfillment of the expectation is dispatched to the main queue just to allow the view model to actual start its task first.
The underlying publisher, vm.$counts, is used to observe when the publisher actually updates its values in response to the notifications. Another expectation is used to wait for the proper number of notifications to be received.

kiel · May 13, 2022, 11:52pm

I know this is not your fault and you’re doing your best with limited APIs.

This test is the stuff of nightmares and the async testing experience should be an top priority problem to solve for Swift 6. It’s hard to understand what’s going on in the test, it took a handful of experienced people to write and as such hard to maintain. Getting people into a habit of writing tests at all is hard enough, let alone writing good tests. This kind of friction will result in less tests (it’s too hard and time consuming, gotta ship it etc) and less tests writ large impacts quality in the industry.

WWDC will indubitably promote using sequences to work with values over time. After the swoons have passed, we would do well to immediately ask ourselves how we can test such code before rushing off and implementing them in our apps.

Maybe a working group for testing is needed.

stephencelis · May 14, 2022, 2:59pm

The point of my post is to show that the mere introduction of async-await to otherwise simple, easy-to-test code turns it into something that seems to be impossible to test. I could also rewrite the equivalent code in Combine using NotificationCenter's publisher API and everything becomes easy to test.

I would love to use async-await code and have a path forward to test it. Rewriting all async-await code when you want to test it doesn't seem like a practical way forward for users of the language, especially when Swift Concurrency has been promoted so much.

This provides a workaround for the second Task.yield() but not the first. There's no material condition that changes between calling await vm.task() and the view model's for await _ in, but the latter loop must start before notificationCenter.post is called in the test.

This example is purposefully simple to demonstrate the root problem, which only gets worse for complex, async logic that is more valuable to test.

stephencelis · May 14, 2022, 3:09pm

Thanks for sharing a workaround! I hope this isn't the way we're expected to test async-await code, though

I think the combination of 1.) expectation and 2.) DispatchQueue.main.async is probably similar to calling Task.yield() twice, which also gets the test to pass pretty reliably for me. It's still not guaranteed, and will be a flakey test, especially as application logic changes over time.

stephencelis · May 14, 2022, 3:40pm

Good point! I guess this will become an even bigger problem as APIs are introduced that have no synchronous alternative. These APIs will just not be reasonably testable.

hassila · May 14, 2022, 5:12pm

Without too deep thinking: Did you try using LIBDISPATCH_COOPERATIVE_POOL_STRICT when running the test? Perhaps that’ll alleviate some of the problem.

caiozullo · May 14, 2022, 7:57pm

Hi! For the specific example you provided, I believe an inner task as a workaround can ensure reliably that the notification is posted after the subscription. This should solve the first Task.yield(). But the downside is that the code won't read linearly anymore:

@MainActor
func testBasics() async {
  let vm = ViewModel()

  // this task will execute some time in the future
  Task {

    // [inner task] to post notification *after* the subscription.
    // This task will also execute some time in the future.
    Task {
      NotificationCenter.default.post(
        name: UIApplication.userDidTakeScreenshotNotification, object: nil
      )
    }
    
    // this will run *before* the inner task
    await vm.task()
  }

  ...

For the second Task.yield(), you can wait until the value eventually becomes what you expect as described in other posts on this thread.

Proof of concept (ran 100,000 times without failing once):

class ViewModel: ObservableObject {
  @Published var count = 0
  
  @MainActor
  func task() async {
    let screenshots = NotificationCenter.default.notifications(
      named: UIApplication.userDidTakeScreenshotNotification
    )
    for await _ in screenshots {
      self.count += 1
    }
  }
}

class Test: XCTestCase {
  
  @MainActor
  func testBasics() async throws {
    let vm = ViewModel()
    let task = Task {
      Task {
        NotificationCenter.default.post(
          name: UIApplication.userDidTakeScreenshotNotification, object: nil
        )
      }
      
      await vm.task()
    }
    
    XCTAssertEqual(vm.count, 0)
    
    await assertEventuallyEqual({ vm.count }, 1)
    
    task.cancel()
    await task.value
  }
  
  @MainActor
  func testWithMultipleNotifications() async {
    let vm = ViewModel()
    
    Task {
      Task {
        NotificationCenter.default.post(
          name: UIApplication.userDidTakeScreenshotNotification, object: nil
        )
      }
      
      await vm.task()
    }
    
    XCTAssertEqual(vm.count, 0)
    
    await assertEventuallyEqual({ vm.count }, 1)
    
    NotificationCenter.default.post(
      name: UIApplication.userDidTakeScreenshotNotification, object: nil
    )
    
    await assertEventuallyEqual({ vm.count }, 2)
    
    NotificationCenter.default.post(
      name: UIApplication.userDidTakeScreenshotNotification, object: nil
    )
    
    await assertEventuallyEqual({ vm.count }, 3)
  }
  
}

func assertEventuallyEqual<T>(
  _ expression1: @MainActor @escaping () -> T,
  _ value2: T,
  timeout: TimeInterval = 0.1,
  file: StaticString = #filePath,
  line: UInt = #line
) async where T : Equatable {
  let maxDate = Date() + timeout
  var value1: T?
  
  while (value1 != value2) && maxDate > Date() {
    value1 = await Task { await expression1() }.value
  }
  
  XCTAssertEqual(
    value1,
    value2,
    "Didn't match after \(timeout) seconds",
    file: file,
    line: line
  )
}

Definitely not how I'd want/expect to write async-await code/tests - but I believe people will be doing it as a workaround at the moment. It's slower than it should be and probably unreliable in more complex cases. Also, harder to read as it's not fully linear and requires thinking about task scheduling. Would love to see some guidance at this WWDC.

jaredgrubb · May 15, 2022, 4:17pm

I've hit a similar issue. With the "main" context, I used a CFRunLoop to observe when the runloop stops executing events, but I don't know how I would do it for an arbitrary executor (empirically this seems to have been enough).

But, I do think there should be a way to "flush" an executor (eg, maybe "yield" could return if it doesn't actually yield to anything; this would let the caller decide if/how-long to re-yield).

mlarandeau · May 16, 2022, 3:52am

I think the point about whether or not the test is meaningful is a good point to bring up. I understand that this particular example is purposefully designed to be simple to demonstrate the larger point which is the complexity of testing async code. But it is still important to ask "What are we trying to test?" and "What is the test actually testing?". I would argue that as consumers of Combine and NotificationCenter, we don't need to test whether or not the notification is properly sent to our view model or that updating the published value notifies its subscribers. What we really want to test is what our view model does with the notification once it is received. If this logic were separated out into its own method (or some other object for a more complex system), then you could write a test for this piece without the need for handling async in the test. This doesn't capture the whole system as the original post is trying to do, but it gets you most of the way there with far less code that is less complex and more maintainable.

lukeredpath · May 16, 2022, 9:45am

I think you're over-thinking what is supposed to be a trivial example to demonstrate the fundamental problem: how do you easily test async/await code.

That said, whilst what you suggest would allow you to unit test your domain logic, it's equally valid to want to write some form of integration test that that you are observing the right thing. This kind of test would catch bugs like subscribing to the wrong notification, for example.

Whether or not you find this valuable feels besides the point of this thread. As has been pointed out, writing a test like this for something that used Combine would be fairly trivial as publishers can be fully controlled using schedulers...no such think exists for async/await right now. Its not like you can easily abstract away the async nature of such APIs...choosing to adopt async/await in your code is a very strong architectural decision as it becomes baked into everything you do. The testing story needs to be greatly improved.

mlarandeau · May 16, 2022, 3:53pm

I definitely agree that there is a time and place for integration tests and for tests that ensure everything is wired up correctly. And as more of an app's architecture relies on async/await, the more critical it will be to have better tools for testing. However, to my knowledge, we don't have those tools yet. So if you are writing tests for async/await code right now, you have to be able to work with what we have right now; which is what I was trying to provide in the workaround I posted earlier in the thread. Based on others reactions, the workaround was unsatisfactory (which I completely understand). The point of my comment is that if testing the whole system with the tools currently available is unreliable or hard to maintain, it is still possible to get most of the way there in a way that is reliable and maintainable. In a large code base with many people working on it every day, a test that fails every couple hundred attempts could have serious impacts on team output. My suggestion provides an alternative to not testing that system at all.

I do disagree though that the original post was designed to be a "trivial example" and that handling the timing of publishers with schedulers is "fairly trivial". Instead, I think the original post did an excellent job of highlighting the complexity of what seems like a simple system. For instance:

The call to vm.task() doesn't return. So even if you had a way to explicitly wait for a task to complete, that wouldn't help here because the task never completes. This by itself creates a lot of the complexity and difficulty in this particular example.
The system under test actually uses two different asynchronous APIs; async/await and Combine. Yes, they work hand in hand, but they are still two different systems whose concepts must be understood to properly use and write meaningful tests for.

stephencelis · May 16, 2022, 4:53pm

@caiozullo Thanks for sharing another workaround! It's interesting that your version passes with 100,000 iterations, but I think your code ultimately has the same flaw as the code I originally posted.

You start a new task to send the notification like this:

Task {
  NotificationCenter.default.post(
    name: UIApplication.userDidTakeScreenshotNotification, object: nil
  )
}

And in doing so you are giving other parts of the code just a little bit of time to execute their work, in particular it seems enough time has passed for the for await in the view model to start. So while it seems to work in this case, there are no guarantees that doing work in a new task will give some other unrelated async code enough time to do its work.

In the end I don't really think firing up this new task is much different from the Task.yield() from my original code sample.

Spinning up a new task seems to have a higher success rate, but so does changing my code to call Task.yield() twice. When two Task.yield()s are added to my original example then it also passes with 100,000 iterations:

await Task.yield()
await Task.yield()
NotificationCenter.default.post(
  name: UIApplication.userDidTakeScreenshotNotification, object: nil
)

Test Suite 'ScreenshotWatcherTests.xctest' passed at 2022-05-16 08:37:05.310.
Executed 100000 tests, with 0 failures (0 unexpected) in 61.353 (97.453) seconds

So this all takes us back to the original reason I posted. There just doesn't seem to be a way to write a test for async code like this and guarantee it will pass. We are either getting very lucky, or if the system was under enough stress I think the view model task could fail to progress to the for await before the notification is posted, and thus fail.

All of this is to say that it seems the only way for us to guarantee the order of execution of these things is if we can somehow serialize the execution of tasks. At @Philippe_Hausler's request I am going to start a new forum post on just that topic if anyone is interested in following along. I'll post a link here when I do.

stephencelis · May 16, 2022, 4:55pm

That's all the more reason to highlight the problem. I'm not aware of much if any acknowledgement of the problem from the core team, or any timeline/plan/pitch for addressing the problem.

I appreciate your suggestion and the other workarounds, but in the end I think all they do is reinforce that a problem exists.

Philippe_Hausler · May 16, 2022, 4:58pm

Btw part of the success there is because the NSNotification async sequence does have a slight buffer inside of it to prevent this type of failure. The possibility of missed notifications in this is virtually impossible. It is not an unlimited buffer so given totally unreasonable load to the system (e.g. locking up everything because of thread exhaustion etc, it should still produce values in this particular fashion).

I think there is a broader discussion to be had here with regards to testing.