Understanding why an async function that returns almost immediately puts the system in a bad state in Swift

cantor · September 6, 2022, 5:18pm

I do state management locally in my class using an enum. If a particular piece of async work has not been done yet, I kick it off in a Task and then await that task's result. Once the async function completes, it caches the values and updates the state.

When the async work has already been completed, we use the cached value and return early after updating the state. It is this path that sometimes throws the system off. The task completes, but the next piece of code never picks up the state update, so all subsequent calls give me an error because we are not in the expected state. (This does NOT happen every time.)

So we go from disabled -> starting -> starting -> starting forever, instead of going from disabled -> starting -> started.

The following is skeleton code I wrote in a playground to explain the code structure. I was NOT ABLE to recreate the prod error.

import Foundation

struct AsyncResult {
    let resultKey: String
}

enum State {
    case disabled
    case starting(Task<(), Error>)
    case started(AsyncResult)
}

enum FinalResult {
    case result1
    case result2
}

enum ClientErrors: Error {
    case invalidState(State)
}

class Client {
    
    var state: State = .disabled
    var defaults = UserDefaults.standard
    var key = "resultKey"
    
    private func doSetup() async throws {
        if let result = defaults.string(forKey: key){
            // Cached value is read, state is updated and we return immediately instead of doing async work.
            print("Reusing saved value")
            state = .started(AsyncResult(resultKey: result))
            return
        }
        
        print("Doing setup")
        
        // Simulate time taken by async work needed to set the key
        try await Task.sleep(nanoseconds: 2_000_000_000)


        try await Task.sleep(nanoseconds: 1_000_000_000)

        
        try await Task.sleep(nanoseconds: 3_000_000_000)

        defaults.set("bar", forKey: key)
        state = .started(AsyncResult(resultKey: "bar"))

    }
    
    func doWork() async throws -> FinalResult {
       
        // Start the async task if not already started
        if case .disabled = state {
            state = .starting(Task { try await doSetup() })
        }
        
        switch state {
        case .disabled:
            throw ClientErrors.invalidState(.disabled)
        case .started:
            print("Already setup")
        case .starting(let task):
            print("Waiting for task to complete")
            // Wait on async task
            try await task.value
        }
        
        // Async task should be completed else throw an error
        guard case .started(let result) = state else {
            // This code block runs and throws an error even though a cached value is read and state is updated above
            // From logs I see it reusing saved value but this call and all following calls will fail with the same error
            // after they enter here
            throw ClientErrors.invalidState(state)
        }
        
        print("Will use result: \(result)")
        // Prod code has more logic for what to return
        return FinalResult.result1
    }
}

UserDefaults.standard.removeObject(forKey: "resultKey")

var client = Client()

// This is meant to simulate what happens in prod code
// I am unable to get it to throw an error and crash in this playground example
for i in 1...100 {
    print("Iteration \(i)")
    try! await client.doWork()
    if Bool.random() {
        print("Resetting client")
        client = Client()
    }
}

John_McCall · September 6, 2022, 5:32pm

Do you maybe have a data race in your production code? Nothing about this code is actually concurrent.

cantor · September 6, 2022, 5:38pm

The only places I modify the state are doWork and doSetup. If there were a race, it would have to stem from calls to doWork, since doSetup is private. But from my logs, I see this error for calls that happen minutes apart.

Starting Task
Waiting for task to complete
Reusing saved value
throws ClientErrors.invalidState(state)

// Few mins later
Waiting for task to complete
throws ClientErrors.invalidState(state)

So the second time it did not read the previously set value.

jonathanpenn · September 6, 2022, 8:06pm

Why isn't Client an actor?

cantor · September 6, 2022, 8:07pm

I am still new to async await. I have briefly read about actors but do not know how to use them.

Jon_Shier · September 6, 2022, 8:20pm

First, you'll want to do your testing in a macOS command line tool, not a playground, as otherwise it's not in a realistic async environment.

Second, you probably want to print some sort of unique identifier for Client as part of your debug logging, otherwise you can't be sure which instance is printing when you reset the Client instance.

Third, Tasks are concurrent, meaning they can execute in arbitrary order, so simply starting them in a particular order in time doesn't guarantee they'll execute in that order. Every await you have allows other work to proceed, so you'll need to do some work to guarantee order of execution. Unfortunately, there's nothing built in to let you do that easily without resorting to serial DispatchQueues, but there are solutions out there for using continuations to wait for bits of work to finish asynchronously. I suggest you investigate.
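For the second point, a minimal sketch of instance-tagged logging. The `id` property and the `logLine`/`log` helpers are names made up for illustration; they are not part of the original code:

```swift
import Foundation

final class Client {
    // Hypothetical per-instance tag, generated once when the instance is created.
    let id = UUID()

    // Hypothetical helper: builds a log line prefixed with a short form of the tag.
    func logLine(_ message: String) -> String {
        "[Client \(id.uuidString.prefix(8))] \(message)"
    }

    // Prints the tagged line, so output from different instances can be told apart.
    func log(_ message: String) {
        print(logLine(message))
    }
}

let first = Client()
let second = Client()
first.log("Waiting for task to complete")
second.log("Reusing saved value")
```

With a tag like this in every message, a log that shows "Reusing saved value" followed by an invalid-state error can be checked to see whether both lines came from the same instance.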

jonathanpenn · September 6, 2022, 8:21pm

You have a class here that has two async methods that modify a local state property. Your async methods are not marked with any global actor (like @MainActor) so they could execute on any thread at any time.

This is exactly what actors are meant to help with: a reference type that needs its internal state protected from data races. Otherwise, you'll need some other way to protect access to state, such as a critical section.
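A rough sketch of what that could look like, assuming the setup work can be reduced to a single async function. The short sleep is a stand-in for the real work, and `ClientError.notStarted` is a made-up name:

```swift
import Foundation

struct AsyncResult: Sendable {
    let resultKey: String
}

// Hypothetical error type for this sketch.
enum ClientError: Error {
    case notStarted
}

// Same shape as the original Client, but declared as an actor so that every
// access to `state` is serialized by the actor instead of racing across threads.
actor Client {
    enum State {
        case disabled
        case starting(Task<Void, Error>)
        case started(AsyncResult)
    }

    private var state: State = .disabled

    private func doSetup() async throws {
        // Stand-in for the real async setup work (assumption for illustration).
        try await Task.sleep(nanoseconds: 10_000_000)
        state = .started(AsyncResult(resultKey: "bar"))
    }

    func doWork() async throws -> String {
        if case .disabled = state {
            // The Task is created on the actor, so doSetup() also runs on the actor.
            state = .starting(Task { try await self.doSetup() })
        }
        if case .starting(let task) = state {
            // Actors are reentrant: other calls may run while this one is suspended.
            try await task.value
        }
        guard case .started(let result) = state else {
            throw ClientError.notStarted
        }
        return result.resultKey
    }
}
```

Callers then write `try await client.doWork()` as before. The actor removes data races on `state`, but because of reentrancy it does not by itself fix any ordering problem.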

Jon_Shier · September 6, 2022, 8:23pm

That will help with thread safety but not with order of operations, unfortunately. @cantor If you can, run the Xcode 14 beta and enable the thread sanitizer, which was finally updated to work with Swift's native concurrency in that version. That should also help you find any safety issues the compiler isn't yet complaining about.

cantor · September 6, 2022, 8:27pm

@Jon_Shier I have Xcode 14. But how do I use tsan from the command line? Also, how would I go about testing my demo code from the command line?

And @jonathanpenn, running this code on the main thread sounds risky. Can I run it on a dedicated Client-specific actor that uses a serial queue created specifically for the client? I know how to do this using callbacks but do not know the async/await equivalent.

Jon_Shier · September 6, 2022, 8:31pm

By command line tool I meant that project type in Xcode. With that project you can still enable the thread sanitizer like normal, in the scheme's diagnostic options.

As for the tool itself, I recommend creating a new tool project, renaming main.swift to <ProjectName>.swift, and using the @main construct as the root of the executable.

@main
enum <ProjectName> {
    static func main() async {
        // Do async testing.
    }
}

Jon_Shier · September 6, 2022, 8:38pm

You're already part of the way there with the ordering issue, since you capture the Task that's performing the work. You just need to ensure that subsequent invocations of doWork wait for that Task to complete (and for it to have properly transitioned your other state) before starting another sequence of work.
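One possible way to do that, sketched under assumptions rather than taken from the production code: have the Task return the result, so every waiter reads it from the Task itself instead of relying on `state` having been updated by the time it resumes. The `setupCount` property exists only to show that setup runs once, and the sleep is a stand-in for the real work:

```swift
import Foundation

struct AsyncResult: Sendable {
    let resultKey: String
}

actor Client {
    private enum State {
        case disabled
        case starting(Task<AsyncResult, Error>)
        case started(AsyncResult)
    }

    private var state: State = .disabled
    // Hypothetical counter, only here to show that setup is not repeated.
    private(set) var setupCount = 0

    private func doSetup() async throws -> AsyncResult {
        setupCount += 1
        // Stand-in for the real async setup work (assumption for illustration).
        try await Task.sleep(nanoseconds: 10_000_000)
        return AsyncResult(resultKey: "bar")
    }

    func doWork() async throws -> String {
        switch state {
        case .started(let result):
            // Setup already finished: use the stored result.
            return result.resultKey
        case .starting(let task):
            // Setup is in flight: wait on the same Task and use its value directly.
            return try await task.value.resultKey
        case .disabled:
            let task = Task { try await self.doSetup() }
            state = .starting(task)
            do {
                let result = try await task.value
                state = .started(result)
                return result.resultKey
            } catch {
                // Go back to .disabled so a later call can retry the setup.
                state = .disabled
                throw error
            }
        }
    }
}
```

In this shape the only code that moves the state to `.started` is the call that created the Task, and it does so right after the Task finishes, so there is no path that awaits a completed Task and then finds the state still `.starting`.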