Reliably testing code that adopts Swift Concurrency?

stephencelis · March 27, 2023, 11:06pm

Just wanted to share an update, as we've continued to write more and more tests for code that uses Swift concurrency. Introducing Swift concurrency to a feature continues to be the biggest cause of test flakiness in our experience, but we have managed to reduce flakiness dramatically by cribbing some helpers from swift-async-algorithms, in particular, a "C async support" module, which exposes a global hook override for task enqueuing, which we use to redirect everything to the main, serial executor:

github.com

pointfreeco/swift-composable-architecture/blob/main/Tests/ComposableArchitectureTests/SerialExecutor.swift#L1-L12


      
          import _CAsyncSupport
          
          
@_spi(Internals) public func _withMainSerialExecutor<T>(
            @_implicitSelfCapture operation: () async throws -> T
          ) async rethrows -> T {
            let hook = swift_task_enqueueGlobal_hook
            defer { swift_task_enqueueGlobal_hook = hook }
            swift_task_enqueueGlobal_hook = { job, original in
              MainActor.shared.enqueue(unsafeBitCast(job, to: UnownedJob.self))
            }
            return try await operation()
          }

Whenever we have a flakey continuous integration failure, we wrap the test with this helper and don't typically have a concurrency-based problem with that test again. As a bonus, the test runs more quickly.

The solution is far from perfect, but has saved us from a ton of pain, and we think it basically makes async code behave more like Combine code (i.e. well-defined "subscription" order), and hence becomes a lot more reliable to test.

I believe it will probably only work for code that does not use custom executors, but that should be the case for most folks right now. We also haven't tried to include this code in library/application code yet, but if Swift doesn't provide a public solution to the problem, we'll likely look into extracting this helper into its own package, which should make it easier to drop into a project.

gwendal.roue · March 28, 2023, 11:10am

That's very generous of Point-Free. Many developers appreciate the time you spend building great libraries.

Now, not all developers are happy to depend on third-party libraries for such a fundamental need. And not all developers have the time, skills, and dedication to refine and extend the built-in fundamental apis.

@tkremenek, this sounds like an alarm bell to my ears. It looks like the language and its libraries don't meet some basic developer needs. We're all happy that academic-quality work is delivered, but once correctness has been achieved, isn't it the time to focus of adequacy? It looks to me that the Swift language would profit from a little more focus on practical use cases, for the benefit of all, and if only to validate the theoretical designs.

stephencelis · April 13, 2023, 8:34pm

For those interested, we've exported the withMainSerialExecutor helper in this release of our Dependencies library. There are a number of Swift concurrency helpers in the library that aid with testing concurrency, so this is one more to add to the group!

stephencelis · April 14, 2023, 9:25pm

We spoke too soon, sorry! Introducing this C module to the library exposed many of our library users to an Xcode bug that prevents an app's unit test target from linking to a package. The only workaround is for the user to manually link their test target to the static library, which is not always possible. The revert is here:

github.com/pointfreeco/swift-dependencies

Revert "Add SPI-based `withMainSerialExecutor` (#81)"

pointfreeco:main ← pointfreeco:revert-81

opened 09:18PM - 14 Apr 23 UTC

stephencelis

+0 -295

Unfortunately an Xcode bug prevents us from shipping a library with a C module w…ithout breaking common downstream scenarios, like an app having a unit test target. Trying to build such a target results in a compile error: > 🛑 Missing required module '_CAsyncSupport' While it is possible to work around the bug by publicly exposing the `_CAsyncSupport` module as a product and instructing users to manually link to this module in their test targets when they encounter this error, this is a bit of a support nightmare to manage for a helper that's hidden by SPI. For more information, see: - [This `swift-atomics` issue](https://github.com/apple/swift-atomics/issues/54#issuecomment-1474692794), which mentions that this is a tooling issue that cannot be solved by the package alone. Here's hoping Xcode 15 has a fix soon! - [This `GRDB.swift` issue](https://github.com/groue/GRDB.swift/issues/642#issuecomment-575994093), which had the same problem and provides documentation to the end user to manually solve the problem. This reverts commit 615833420b3a01cec57530cae1287ee70dfdb77c.

Linked in the PR is a swift-atomics issue where @lorentey mentions that this is a tooling issue that we can't solve in the package alone, and a GRDB.swift issue where @gwendal.roue had to expose the C module as a product and maintain documentation for users that wish to use GRDB.swift and write tests for their app target.

Till this Xcode bug is fixed I think such a library can't really be shipped without caveats.

Alejandro_Martinez · May 2, 2023, 2:11pm

Hello folks, is there any update on any plans for Swift to solve this issue? It seems that with Xcode 14.3 the runtime has changed some internals and it's causing plenty of fluke tests.

The withMainSerialExecutor mentioned above helps in some scenarios where you want to ensure the order of concurrent tasks, but that's not the most common issue.

The main problem is when we have objects that start an unstructured task to listen for NotificationCenter notifications using the async sequence API in a for loop. I'm the first one that dislikes unstructured tasks but for production code this is the desired API in some cases (a background process that consumers shouldn't be responsable of dealing with), but it makes testing it very tricky. The scheduling of the unstructured tasks is too invariable (which is fair cause is an implementation detail!) making plenty of tests fail "randomly", lately way more than before. We are abusing PointFree's megaYield but even in some specific cases it still doesn't solve the issue. (typical works 1k times locally but fails on the CI from time to time). It also provokes developers that are not experts in Swift concurrency to just drop the megaYield on every test for good mesure potentially hiding other problems with better solutions. In other code we've been forced to change production code to include hooks so tests have something to await on.

Is there a way to have a helper that tells the runtime to immediately and synchronously schedule tasks? Or at least some test function that awaits the runtime scheduling?

On Kotlin land they are in a more reliable situation because they have some, albeit experimental, tools to deal with this. Granted, is another system that is similar but very different, so Swift may not have to offer all their options, but I think we need some.

Although I appreciate PointFree <3 helpers that get us out of this issue I do wish that we got some better tooling from Swift itself soon. Scheduling is an implementation detail and we shouldn't have to add artificial awaits just to make test reliable.

ktoso · May 2, 2023, 2:27pm

What tools specifically are referring to here, to simplify my search for it? I have my own ideas here but would like to see what you’re referring to without naming it.

Alejandro_Martinez · May 2, 2023, 2:41pm

Apologies for not being clear, forgot to add a few links ^^'

So I've referred to:

_withMainSerialExecutor to help test concurrent tasks in order.
megaYield to give time to the runtime scheduling to happen
Kotlin's test dispatchers (and docs) that give control over scheduling to the test. (this is very close to what we are used to do with Rx and Combine schedulers). UnconfinedTestDispatcher is interesting because it simplified a lot test code (UnconfinedTestDispatcher starts new coroutines eagerly, but this doesn’t mean that it’ll run them to completion eagerly as well. If the new coroutine suspends, other coroutines will resume executing.), and funny enough, is how Swift test code used to work for a few Xcode releases (Tasks seemed to be scheduled way faster before)

I think that's all I referenced ^^ cheers

FranzBusch · May 3, 2023, 7:08am

From my experience working on async-algorithms one of the features that is missing for testing is being able to supply a custom global executor and labeling of “jobs”. We have multiple places where we have potential races on the concurrent executor but we cannot unit test them without adding specific hook points where we can suspend the code. In the end, it boils down to a simple example like this

actor Foo {
  Var counter = 0

  func increment() { counter += 1 }
  func multiply(by: Int) { counter = counter * by }
}
func foo(foo: Foo) async {
  await withTaskGroup(of: Void.self) {
    group.addTask {
      await foo.increment()
    }

    group.addTask {
      await foo.multiply(by: 2)
    }
  }
}

Being able to write a unit test for this method that tests both cases of ordering without adding extra in the method itself would be great.

ktoso · May 3, 2023, 7:15am

Wild crazy ideas mode on, none of this is a thorough design;

That's a good example, if we start small with a thing like that we can start considering bigger examples.

Fundamentally I think there's two approaches here:

specific order where the test programmer is able to instruct execution to "I want the A task and then the B task"
- I wonder if this could be done with a custom executor that would just inspect either a task-local within those jobs for an identifier, or using some other method to name tasks and "hold off scheduling one before the other"
fuzzing / tracing execution, we'd again need a way to identify some tasks and record the order in which we've executed them. (the task on ...:12 was first, next the one created on ...:24 etc), we could have a tracing executor which
- I wonder if a simple form of that we could write as a specialized executor if we had the power to identify the job -> source location would be at least a brute "run many times" but once we hit a failure we would be able to dump the execution order of all tasks this executor has executed... This would then feed into the idea 1) where we force that execution order in order to debug issues
- the ultimate form of that is what e.g. foundationdb is able to pull off with their "system execution is 100% reproducible" but that would mean the entire system has to run on this "seeded, order fuzzing executor" and side effects cause problems to such system... I wonder what we could learn from such approaches though.

I think it'd definitely be worth exploring how far we could get with a custom executor if we'd be able to identify jobs to where they've been started -- I wonder if that would be useful already...

End of wild crazy ideas mode

FranzBusch · May 3, 2023, 8:27am

Yep, agree with those crazy ideas. They are worth exploring!

I think the granularity is also important here. We have been taking about tasks in the above example because both tasks only do a single thing and then return. For the full range of testing we might want to work on the “job” granularity I.e. until the next suspension point. Executor hopping is also another interesting edge to consider and we would need to be able control all executors in a test even those of actors

ktoso · May 3, 2023, 8:31am

// ongoing reminder that this is crazy ideas brainstorming mode.

Yeah such approaches only work with control of involved parties. But one step at a time I guess, today we can’t yet give executors to tasks or child tasks for that matter so we would not be in good position to do this with custom actor executors I just realized. There’s hooking the global defiant executor but that may be a big heavy handed for such testing that we’re taking about here…

so, yeah plenty of design problems to get to this — but I like your problem statement with a simple group of 2 child tasks, if we could hook those schedules that’d be an interesting first step hmm.

jacobwallstrom · May 31, 2023, 10:36am

I agree about the need for better control of execution in unit tests, but I have successfully tested Swift async-await code using the following approach:

class AsyncTests: XCTestCase {
     func testBasics() async throws {
         let vm = ViewModel()
         await vm.task()
         XCTAssertEqual(vm.count, 0)
         await NotificationCenter.default.post(
             name: UIApplication.userDidTakeScreenshotNotification, object: nil
         )

         // Give the task an opportunity to update the view model.
         while vm.count != 1 { // needs timeout 
             await Task.yield()
         }
         XCTAssertEqual(vm.count, 1)
     }
 }

 class ViewModel: ObservableObject {
     @Published var count = 0

     @MainActor
     func task() async {
         let screenshots = NotificationCenter.default.notifications(
             named: UIApplication.userDidTakeScreenshotNotification
         )
         Task { 
             try await Task.sleep(for: .seconds(1)) // pretend this takes time
             for await _ in screenshots {
                 count += 1
             }
         }
     }
 }

 struct ContentView: View {
     @ObservedObject var viewModel = ViewModel()

     var body: some View {
         Text("\(viewModel.count) screenshots have been taken")
             .task { await viewModel.task() }
     }
 }

Compared to your original version:

this test does not run on the main actor and thus allows the main actor to run while the test is awaiting.
the async sequence is created before the async task function completes guaranteeing that no events are missed, but the for await loop is created in a task as not to block the function from returning.
the test is awaiting the task function before proceeding
the test is doing Task.yield until the given condition happens

For this approach to work all synchronisation points for the test needs either to be possible to await or otherwise observable as a testable condition. I have also found that the test need a "Host application" for the main actor to run.

mbrandonw · May 31, 2023, 2:03pm

Hi @jacobwallstrom, while that test does appear to pass 100% of the time (and this technique was discussed above), you have unfortunately subtly changed the code that is being tested.

The original code performs the async work inside the task method:

func task() async {
  let screenshots = NotificationCenter.default.notifications(
    named: UIApplication.userDidTakeScreenshotNotification
  )
  for await _ in screenshots {
    self.count += 1
  }
}

Whereas your code spins up an unstructured task (and in fact isn't async at all and so doesn't need the async keyword):

func task() async {
  let screenshots = NotificationCenter.default.notifications(
    named: UIApplication.userDidTakeScreenshotNotification
  )
  Task { 
    for await _ in screenshots {
      count += 1
    }
  }
}

That difference is very important because it means when invoked from the view like this:

.task { await viewModel.task() }

…the original code will cancel the async stream when the view goes away, whereas yours will not. You can test this by creating an app that allows you to navigate to the ContentView and seeing that if you drill down and then pop back, taking screenshots still causes the async sequence to emit even though the screen isn't visible. This is not true of the original code.

So, in order to get it back to the original behavior you really do need to perform the async work directly in the task method, rather than creating an unstructured task:

 class AsyncTests: XCTestCase {
   func testBasics() async throws {
     let vm = ViewModel()
-    await vm.task()
+    Task { await vm.task() }
     XCTAssertEqual(vm.count, 0)
     await NotificationCenter.default.post(
       name: UIApplication.userDidTakeScreenshotNotification, object: nil
     )
 
     // Give the task an opportunity to update the view model.
     while vm.count != 1 { // needs timeout
       await Task.yield()
     }
     XCTAssertEqual(vm.count, 1)
   }
 }
 
 class ViewModel: ObservableObject {
   @Published var count = 0
 
   @MainActor
   func task() async {
     let screenshots = NotificationCenter.default.notifications(
       named: UIApplication.userDidTakeScreenshotNotification
     )
-    Task {
     try await Task.sleep(for: .seconds(1)) // pretend this takes time
     for await _ in screenshots {
       count += 1
     }
-    }
   }
 }

And once you do that then your technique unfortunately does not work. The test hangs forever because the notification was posted before the async sequence was subscribed to, and so the model's count never goes to 1.

As far as I can tell, this code is simply not testable in a 100% deterministic manner, but you can be 99% confident in the test if you sprinkle in some yields or sleeps.

jacobwallstrom · May 31, 2023, 6:37pm

Agreed, the await loop is not properly cancelled in my example. But I think this solves the issue without introducing flakiness:

final class AsyncTests: XCTestCase {
     func testBasics() async throws {
         let vm = ViewModel()
         let task = Task { await vm.task() }
         XCTAssertEqual(vm.count, 0)
         await NotificationCenter.default.post(
             name: UIApplication.userDidTakeScreenshotNotification, object: nil
         )

         // Give the task an opportunity to update the view model.
         while vm.count != 1 {
             await Task.yield()
         }
         XCTAssertEqual(vm.count, 1)
         task.cancel()
         await task.value
     }
 }

class ViewModel: ObservableObject {
    @Published var count = 0
    private var screenshots = NotificationCenter.default.notifications(
        named: UIApplication.userDidTakeScreenshotNotification
    )

    @MainActor
    func task() async {
        for await _ in screenshots {
            count += 1
        }
        screenshots = NotificationCenter.default.notifications(
           named: UIApplication.userDidTakeScreenshotNotification
       )
    }
}

But again, this is more of a workaround, and requires the model to be a bit more complicated. And without knowing the issue it can be hard to understand why the model is written this way.

mbrandonw · June 1, 2023, 2:23pm

Yeah that also works, but as you mention, it is weird to contort ourselves in weird ways just to make it testable. I think 6 months from now you would be quite confused as to why you pulled out the notifications async sequence into a property rather than just use it inline in the task method.

We are also heavily depending on the behavior that the notifications async sequence buffers its emissions so that if a notification is posted after the sequence is created but before the sequence is subscribed to, we still get those emissions. It's worth noting that that wasn't always the case (as is evident by some of the approaches earlier in the thread that used to not work but now do). It appears to have been fixed in Xcode 14.3 or so.

There are also other kinds of async sequences out there for which this buffering behavior may not really make sense, such as a web socket client. Then we will be back to square one.

geoffmac_dbx · June 7, 2023, 6:00pm

Putting my thoughts here (which have already been expressed)... I'm trying to build a feature that uses swift async heavily and testing the feature in (in my case with Quick/Nimble but could easily be XCTest) in a maintainable/non-flakey way is proving to be impossible. None of the above solutions are both elegant and guaranteed, and this seems to be a major showstopper for feature development unless Apple/swift team provides tool to do so. Randomly flaking tests is a nightmare for large teams building apps.

stephencelis · July 19, 2023, 6:56pm

We just released a package dedicated to making testing Swift concurrency more reliable:

This includes the withMainSerialExecutor override mentioned earlier in this thread, which redirects all async work to the main actor and works around many of the issues discussed above. Hopefully this will provide an accessible stopgap while we wait for more official solutions.

ktoso · July 19, 2023, 10:00pm

I would like to point out that the proposed hack "withMainSerialExecutor" isn't scoped at all. It affects the entire process by setting the swift_task_enqueueGlobal_hook and I'd be worried about it causing all kinds of threading issues with it flip flopping between enqueueing on main actor and the actual target actor. I'm not even sure concurrency (just plain threading) wise this will be correct tbh.

Could the package name or method at least highlight that this is intended for testing only as well?
It is very concerning that this worryingly unsafe API with global effects is pitched as a general purpose tool (judging by API name and package name).

stephencelis · July 19, 2023, 10:24pm

We can for sure emphasize this behavior in the documentation. We intended to include a warning there, but looks like we forgot about it for the release.

For what it's worth we 100% agree that this is a hack, but it is also a 100% necessary hack to exploit to write reliable tests against Swift concurrency right now. We will happily retire the hack when Swift addresses the shortcomings of this thread.

We also tried our best to introduce better scoping guarantees with a task local, but it didn't propagate to the job enqueuer in certain contexts, e.g. task groups and unstructured tasks, so it seems like Swift just doesn't provide tools for that, either.

Edit: PR adding language: Document `withMainSerialExecutor` caveats by stephencelis · Pull Request #3 · pointfreeco/swift-concurrency-extras · GitHub

Swift_Coder · August 13, 2023, 10:31am

So basically currenlty for those that have transitioned from completion handlers to async await we need to use Tasks in order to test indipendently part of code of a function that uses async await? Because before with completion handler/closures we could save that closure externally from the caller of a func in the Spy Object and decide to execute it at a leter time when we need it int he unit test. is this the only approach that we have if we have decided to use async await instead of closures/Completion handlers?

I mean we can still combine the use of closures and async await to achieve the same thing described before but in that case we would probably need to change the current production code where we are using async await with some closures mechanism but that violates the principle of not changing the production code to be able to write your unit tests...

I am facing currently this problem and i am trying to figuring out how to handle it. I will open shortly a post about it and will link it here if you have the time to look at it.