TicTacToe random error : Combine bug?

Hello TCA fans,

I've witnessed a bug in the TicTacToe sample application and it looks like a Combine bug :
Sometimes when pressing the login button with 2fa@gmail.com / password as credentials, nothing happens. It's completely random and I can try 20 times without a problem and then it occurs for some reason. I've investigated the issue and realised the action loginResponse never arrived and seemed to be lost in the 1 second delaying. I've managed to reduce the problem to these few lines running in a playground :

import Combine
import Foundation

struct C {
    static var cancellables = [Cancellable]()
}

func performLoop(_ loopNb: Int) {
    print("loop \(loopNb)")
    
    var receivedValue = false
    
    C.cancellables.append(
        Just(1)
        .delay(for: .milliseconds(500), scheduler: DispatchQueue.global())
        .receive(on: DispatchQueue.main)
        .print("afterDelay")
        .sink(receiveCompletion: { _ in
            if receivedValue {
                performLoop(loopNb + 1)
            } else {
                print("Here is the bug !!!!")
            }
        }, receiveValue: { value in
            print("my value is here !")
            receivedValue = true
        })
    )
}

performLoop(0)

After a few loops (totally random can be 1 can be 300), the completion is fired and the value never comes ("Here is the bug" gets printed)
Note that if we comment .receive(on: DispatchQueue.main), the value always arrives but the completion sometimes arrives before the value ...

As it happens on XCode 11.5 and XCode 12 beta, I'm going to file a report to Apple for this.
Have you already seen this kind of behavior from delay operator and perhaps find a way to circumvent it for now ? :sweat_smile:

Thanks for reading this and for your great work ! :slight_smile:

1 Like

Wow thanks for tracking this down @bioche! It's something that @mbrandonw and I have noticed but we haven't spent enough time to figure out what was going on. If it's a Combine bug hopefully it will get fixed! But maybe this is enough to figure out a workaround for the demos. Maybe we can avoid the delay operator (if it is the culprit) and do an asyncAfter instead.

1 Like

@stephencelis @mbrandonw I actually have got a response from Apple on this issue !

Basically the reason behind this weird behavior is that DispatchQueue.global() gives a concurrent queue which puts the events in random order. If instead we use a serial queue we have control over like DispatchQueue(label: "delay-queue") it solves the issue :slight_smile:

I guess it's good to know to avoid any future head scratcher :joy:

4 Likes

@bioche Good find, thanks! We'll get a fix in soon: Use serial queue for Tic Tac Toe authorization by stephencelis · Pull Request #320 · pointfreeco/swift-composable-architecture · GitHub

2 Likes

I think I'm struggling with the same bug. I'm also using Just and the delay operator with DispatchQueue.global() as the scheduler. Why is this issue caused by a concurrent queue? Could you elaborate on the cause of this bug?

I've only seen it mentioned passingly and it has now become folklore in the community, but apparently you should not use concurrent queues for Combine schedulers. Here's one reference to that:

It would be nice to have this specifically called out in the documentation, along with a description of why.

This later post in the same thread implies that you can use a concurrent queue as a Combine scheduler:

So to follow up here, there are some changes incoming in the regards to the way downstream events are propagated. We are now able to satisfy the constraint of 1.03 even if the DispatchQueue is concurrent or the OperationQueue is not a restriction of maxConcurrentOperations of 1, or for that matter any valid scheduler being concurrent; we will always send serialized events on that requested scheduler for .receive(on:) . The one remaining caveat that we deviate from the specification slightly is that upstream events such as cancel() and request(_:) in our world can happen concurrently. That being said we do handle them in a thread safe manner.

(This is not to imply I think doing so is a good idea.)