An unexpected deadlock with Combine only on Release build

Hello folks.

I ran into an unexpected deadlock with Combine that happens only in a Release build. It doesn't happen in a Debug build or with Thread Sanitizer on. I tried to find a bug in my code but failed, so I'd like to ask others what I'm missing.

Here's the full source code to reproduce the issue. It happens only in a Release build. I tested this with Xcode 11.5 (11E608c) on macOS 10.15.4 (19E287).

// main.swift
import Foundation
import Combine

/// Run this code with
/// - **Release build.**
/// - Thread Sanitizer **off**.
/// Then this won't print `OK`.
/// This works if Thread Sanitizer is **on**.

DispatchQueue.main.asyncAfter(deadline: .now() + .seconds(1)) {
    let gcdq = DispatchQueue(label: "Issue2")
    let sema = DispatchSemaphore(value: 0)
    var pipes = [AnyCancellable]()
    let report = PassthroughSubject<Int,Never>()
    Thread.detachNewThread {
        Thread.sleep(forTimeInterval: 1)
        print("will send")
        report.send(1111)
        print("did send")
    }
    report.receive(on: gcdq).sink(receiveValue: { m in
        print("recv")
        sema.signal()
    }).store(in: &pipes)
    sema.wait()
    print("OK")
}
dispatchMain()
// Doesn't print `OK`. The attached Combine sink closure never gets called.

Interestingly, this works if I do any one of these:

  • Turn debug mode on.
  • Turn Thread Sanitizer on.
  • Run the whole block without the dispatchMain() call.

Here's the code for the third option.

// main.swift
import Foundation
import Combine
let gcdq = DispatchQueue(label: "Issue2")
let sema = DispatchSemaphore(value: 0)
var pipes = [AnyCancellable]()
let report = PassthroughSubject<Int,Never>()
Thread.detachNewThread {
    Thread.sleep(forTimeInterval: 1)
    print("will send")
    report.send(1111)
    print("did send")
}
report.receive(on: gcdq).sink(receiveValue: { m in
    print("recv")
    sema.signal()
}).store(in: &pipes)
sema.wait()
print("OK")
// Prints `OK`.

What is the correct behavior, and what should I do to make this work?


So here’s my guess: you’re being bitten by Swift’s lifetime analysis in release mode, and the subscription is being cancelled.

Swift programmers are accustomed to the idea that variables stop being usable outside the scope in which they were defined. In this case, pipes is such a variable: it'll stop being usable once the async block exits scope. This may lead you to assume that pipes will be deallocated when the scope exits, as in C++ (this is part of RAII). That assumption is wrong.

In Swift, a variable may be deallocated immediately after its last usage site. In this case, pipes is last used at the point of store(in: &pipes). The moment that line executes, Swift is free to deallocate the array pipes, calling deinit for everything inside it. This will drop the AnyCancellables, which will cancel their associated subscription. This can happen before the sema.wait, which will now never complete because the subscriber will never receive a value.

In debug mode this optimisation is not likely to be performed, but in release mode it can be. This would explain the behaviour you’re seeing.

To resolve this, you can use withExtendedLifetime to make pipes live until the end of whatever block you’re using. If you put sema.wait inside that block, that should do it.
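Here is a minimal sketch of that fix, adapted from the snippet above (only the tail of the async block changes):

report.receive(on: gcdq).sink(receiveValue: { m in
    print("recv")
    sema.signal()
}).store(in: &pipes)
withExtendedLifetime(pipes) {
    // `pipes` is guaranteed to stay alive until this closure returns,
    // so the subscription cannot be cancelled before the value arrives.
    sema.wait()
}
print("OK")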


Thanks. This explains everything very well.

Though it explains everything, I'm still in doubt that this is correct behavior. I just have more questions...

Debug and Release builds with different behaviors? If a variable is intended to die at its last use, before the end of its lexical scope, shouldn't it work exactly like that in a Debug build too?

If optimizers are allowed to change the length of a lifetime, that means optimizers can change program behavior and lifetimes become totally unpredictable. I've been taught that lifetime is significant and must be visible and predictable to programmers in Swift. Isn't this what we agreed to with the RC rules from the Objective-C era? If optimizers are allowed to break this rule, then how should I release resources at the proper time? Does the core dev team believe that lifetime and timing are insignificant in Swift? Do they really want unpredictable lifetimes?

If I pass a value to a function that is a no-op in release builds, optimizers can eliminate the call to the function, and therefore also the reference to the value. Then what happens to the lifetime?

Are my questions valid? Is no one else in doubt about this like me?

There is an older article on CocoaWithLove that discusses this very issue:

I had not run into this issue before, but now that I'm working a lot more with Combine, I'm finding myself needing to pay a lot more attention to the lifetime of whatever variable is holding an AnyCancellable. If you're more accustomed to C++'s behavior, or even assumed (like I did) that Objective-C followed C++'s behaviour, then Swift's behavior can easily catch you off-guard.

You have a series of great questions here that deserve addressing. I don't know exactly where your knowledge starts and ends, so I'm going to try to cover a wide range of topics; I apologise if some of this is stuff you already know. However, I think it's useful to clarify a bunch of things about the way optimisers work in modern programming languages.

I should stress that I'm not the real expert here: we have actual compiler authors in this community who absolutely know more than I do. Please, compiler authors, weigh in if I'm off-base on anything here. However, this is my rough understanding of why things work the way they do.

Let's start here. You say two things, but the one I want to start with is "optimizers can change program behaviour". Yes, they absolutely can. Indeed, this is the entire point of an optimizer: to change the behaviour of the program to make it cheaper.

Consider this simple C function. (I'm using C because it's a simpler language and because the rules of the optimizer are much more clearly defined than in Swift.)

void clear(char *buf, int len) {
    for (int i = 0; i < len; i++) {
        buf[i] = 0;
    }
}

What this function does is fairly clear: it sets len bytes of buf to zero. Great.

In debug builds, we get:

clear:                                  # @clear
        push    rbp
        mov     rbp, rsp
        mov     qword ptr [rbp - 8], rdi
        mov     dword ptr [rbp - 12], esi
        mov     dword ptr [rbp - 16], 0
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        mov     eax, dword ptr [rbp - 16]
        cmp     eax, dword ptr [rbp - 12]
        jge     .LBB0_4
        mov     rax, qword ptr [rbp - 8]
        movsxd  rcx, dword ptr [rbp - 16]
        mov     byte ptr [rax + rcx], 0
        mov     eax, dword ptr [rbp - 16]
        add     eax, 1
        mov     dword ptr [rbp - 16], eax
        jmp     .LBB0_1
.LBB0_4:
        pop     rbp
        ret

This is a fairly direct translation of the program as written. The inner loop loads i, compares it to len, if it's greater than or equal (jge) jumps out of the loop. Otherwise, it loads buf and i, stores 0 into that address, and then adds 1 to i and jumps to the start of the loop again. Exactly the code we wrote.

What happens if we turn on the optimizer to -O2? We get this:

clear:                                  # @clear
        test    esi, esi
        jle     .LBB0_2
        push    rax
        mov     edx, esi
        xor     esi, esi
        call    memset
        add     rsp, 8
.LBB0_2:
        ret

This code is drastically different from the above: in particular, the optimizer has inserted a function call we never wrote!

This leads to a really important thing to understand about optimizing compilers: the "as-if" rule, and "observable behaviour". In languages like C (and Swift), the optimizer is allowed to make any change to program behaviour that does not change the "observable behaviour" of the program. That is, the program must behave "as-if" the code you wrote was executed the way you wrote it, but it is not required to actually execute that code.

So why did I put scare quotes around "observable behaviour"? Because that phrase turns out to be a term of art: it has a very specific meaning, separate from its normal English meaning. The reason for this is important: from the perspective of the CPU, all behaviour is observable behaviour. Any change, even reordering two independent instructions, is observable. It's observable to you too: you can look at the binary and observe that it's different!

So language authors have to define what kinds of effects count as "observable behaviour". Some of these seem very natural to us as programmers because we're used to them: things like loop unrolling, constant-propagation, and other straightforward optimisations seem reasonable enough. Yes, technically if we tried to break on the specific instruction we were looking for, in debug mode that instruction would be there and in release mode it would not, but we generally forgive the compiler for this.

But there are other things that might feel more observable to us, but the compiler does not consider observable. One of these is some kinds of function calls. What function calls a given program makes, in what order, is not in and of itself considered observable behaviour. This is why the compiler is allowed to insert a call to memset in my code, even though I didn't call it: not calling memset is not an observable behaviour of my program.

Similarly, the compiler is allowed to not call functions it doesn't need to. For example, consider this code:

int clear_value(void) {
    return 0;
}

void clear(char *buf, int len) {
    int value = clear_value();
    for (int i = 0; i < len; i++) {
        buf[i] = value;
    }
}

In optimised builds, the call to clear_value simply never happens: if I added a breakpoint on all calls to clear_value, I'd get different results in debug and release builds.

For completeness, what counts as "observable behaviour"? It varies by language, but it generally includes I/O: the same data must be written into files, and the same I/O ordering must occur (that is, it's not acceptable to swap the order of a read and a write). Similarly, anything that does (or may do) any of those things also exhibits observable behaviour. In general, if a compiler can't see into the body of a function (e.g. because it's defined in a separate library), it must be assumed to do one of those things and so the call must remain.

I go through all of this lengthy material to establish that optimizers are explicitly allowed to change the behaviour of the program within certain rules. Let's now tackle your other questions:

We do agree that lifetime is significant and must be visible. However, that agreement doesn't necessarily mean we all agree on when an object's lifetime ends.

You are applying C++'s lifetime management, which says that objects are destroyed at the end of the expression that lexically contains the point where they were created. That is, they are destroyed when their enclosing scope is destroyed (approximately). However, nothing in Swift commits to this being the rule of lifetime management in Swift. Indeed, the mere existence of withExtendedLifetime makes it clear that Swift's lifetime is not defined this way.

Consider the Transitioning to ARC document originally written for Objective-C programmers moving to ARC. This is contained in its FAQ:

How do I think about ARC? Where does it put the retains/releases?

Try to stop thinking about where the retain/release calls are put and think about your application algorithms instead. Think about “strong and weak” pointers in your objects, about object ownership, and about possible retain cycles.

The Swift book also doesn't say anything about scope. In the section on ARC it says:

Additionally, when an instance is no longer needed, ARC frees up the memory used by that instance so that the memory can be used for other purposes instead. This ensures that class instances do not take up space in memory when they are no longer needed.

and

To make sure that instances don’t disappear while they are still needed, ARC tracks how many properties, constants, and variables are currently referring to each class instance. ARC will not deallocate an instance as long as at least one active reference to that instance still exists.

Notice that this does not say "when a reference exits scope". Instead it uses the somewhat vague term "exists".

This vagueness is the source of the difference in behaviour. How long does a reference exist? Well, it certainly ceases to exist when the reference exits the scope that defined it (as in C++). This is easy for the compiler to evaluate quickly, so it's a natural fallback: certainly, no later than that point. However, in principle the reference may stop existing sooner than that! As nothing has committed to the idea that the reference lasts to the end of the function, the optimizer is free to assume that the reference stops existing at the moment of its last use. Hence, the call to swift_release may happen anytime after the last time you use a reference. This is the only promise the compiler makes.

I will repeat that as a call-out, because it's important: the lifetime of a Swift reference ends at the moment it is last accessible from any part of the program through valid Swift. However, just because the lifetime ends there does not mean that the compiler promises to call swift_release at that point. Instead, the compiler may call swift_release at any time from that point onward.
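To make that concrete, here is a small illustration; whether the early release is actually observable depends on the optimiser and build settings, so you may or may not see it in any given build:

final class Token {
    deinit { print("Token deinit") }
}

func demo() {
    let token = Token()
    print("last use: \(token)")   // last usage site of `token`
    // From here on, the optimizer is free to release `token`,
    // so in a release build "Token deinit" may print before "post".
    print("post")
}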

So, now to the most important question you asked:

Ideally, yes! It would be great if debug and release builds worked the same way. However, in practice, there are a number of problems with that idea.

The first is that the optimisations required to shorten the lifetime of a reference in release mode are expensive: they take time. This makes compiles slow. An important part of debug builds is that they need to be as fast as possible, to allow rapid development cycles. Making debug builds do this lifetime analysis will slow them down.

More importantly, this interacts with other optimisations. Inlining, constant-propagation, and other optimisations can all affect the lifetime of a reference. This means that in practice to achieve the same behaviour requires running all the same optimisations. This affects not only compile time, but also the debuggability of your program.

No, the core team would love predictability, but it comes with tradeoffs. As noted above, to make debug match release, we'd slow down our builds and make them less debuggable: it's a non-starter.

You might well ask: well then, why not make release match debug? Why not make the lifetime of all references explicitly end when their enclosing scope ends?

The easiest answer to this is that it makes Swift programs slower. Consider this code:

func doubleFirstAndPrint(_ array: [UInt8]) {
    var array2 = array  // Take local copy, we're going to mutate
    array2[0] *= 2
    print(array2)
}

func test() {
    let myArray = [1, 2, 3]
    doubleFirstAndPrint(myArray)
}

If the lifetime of a reference extends until the end of its lexical scope, then when we call test the myArray reference is alive until after doubleFirstAndPrint completes. That means that when we do array2[0] *= 2 inside doubleFirstAndPrint, there are three references to that array in scope: one in test (called myArray), one now-unreachable in doubleFirstAndPrint (the argument array), and another in doubleFirstAndPrint (the local var array2). This means that the assignment must trigger a copy-on-write, costing performance.

With Swift's model, we can flip that. We can say: well, myArray is not reachable after we pass it in to doubleFirstAndPrint. We created myArray at +1, so we can pass it at +1 to doubleFirstAndPrint, handing ownership to that function. Then, that function performs a copy to var array2. At this point, the argument array is unreachable, and so we can do a release for that name, and a retain for var array2. These two cancel out, so we can optimise them both away.

Now, when we come to our assignment, the refcount for var array2 is still only 1! That means we don't need to copy-on-write: Swift has correctly observed that the array is only reachable from one place in the code, the place we're modifying it from, and so there is no need to perform the CoW. Our program is also faster, as we've eliminated some retains/releases we didn't need in the code.
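To see why the reference count matters for that last step, here is a rough sketch of how a copy-on-write type checks uniqueness before mutating. This is illustrative only; Array's real implementation is more involved:

struct CoWBox {
    private final class Storage {
        var value = 0
    }
    private var storage = Storage()

    var value: Int {
        get { storage.value }
        set {
            // If another reference to the storage exists (refcount > 1),
            // copy it before mutating; otherwise mutate in place.
            if !isKnownUniquelyReferenced(&storage) {
                let copy = Storage()
                copy.value = storage.value
                storage = copy
            }
            storage.value = newValue
        }
    }
}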


I hope this essay has helped make it a bit clearer why things are this way: why release and debug behave differently, and why it's hard and/or undesirable to make them behave the same in Swift.

This behavioural flexibility is part of why, in Swift, it can be unwise to do resource management in deinit: as you've noted, it isn't necessarily called deterministically at the same point in the code. In general, managing the lifetime of things that need to be cleaned up at a specific point is best done with with-style scoped functions (like withExtendedLifetime).
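As an illustration, a hypothetical with-style helper could look like this (the name withTemporaryFile and the file-based resource are made up for the example); the cleanup point is fixed by the defer, independent of when ARC releases any references:

import Foundation

// Hypothetical helper: opens a temporary file, hands it to `body`,
// and closes/removes it at a known point regardless of ARC timing.
func withTemporaryFile<Result>(_ body: (FileHandle) throws -> Result) throws -> Result {
    let url = URL(fileURLWithPath: NSTemporaryDirectory())
        .appendingPathComponent(UUID().uuidString)
    _ = FileManager.default.createFile(atPath: url.path, contents: nil)
    let handle = try FileHandle(forWritingTo: url)
    defer {
        handle.closeFile()                           // deterministic cleanup
        try? FileManager.default.removeItem(at: url)
    }
    return try body(handle)
}

// Usage:
// try withTemporaryFile { handle in
//     handle.write("scratch data".data(using: .utf8)!)
// }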


It could be argued though that Combine's reliance on AnyCancellable's deinit sort of makes memory management more observable than it should be. Memory management techniques aren't necessarily appropriate for managing subscription lifetime.

If AnyCancellable.deinit printed a message complaining when cancel() had not already been called, it'd encourage people to be explicit about when they cancel, and thus when the deinit gets called would matter less. But that means more boilerplate code for the common situation where you store the AnyCancellable within the object that is interested in maintaining the subscription.
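As a sketch of that idea (purely hypothetical, not Combine API):

import Combine

// Hypothetical wrapper that complains if the subscription was never
// cancelled explicitly before deinit ran.
final class NoisyCancellable: Cancellable {
    private let wrapped: AnyCancellable
    private var didCancel = false

    init(_ cancellable: AnyCancellable) {
        self.wrapped = cancellable
    }

    func cancel() {
        didCancel = true
        wrapped.cancel()
    }

    deinit {
        if !didCancel {
            print("warning: subscription cancelled implicitly by deinit")
        }
        // `wrapped`'s own deinit cancels the underlying subscription.
    }
}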


I too encountered this issue when writing tests with Combine. My solution was not to use withExtendedLifetime but to call sub.cancel() to explicitly cancel the subscription at the right moment.
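A rough sketch of that approach in an XCTest-style test (the names here are illustrative):

import Combine
import XCTest

final class SubscriptionTests: XCTestCase {
    func testReceivesValue() {
        let subject = PassthroughSubject<Int, Never>()
        let received = expectation(description: "received value")

        // Keep an explicit handle to the subscription.
        let sub = subject.sink { _ in received.fulfill() }

        subject.send(1111)
        wait(for: [received], timeout: 1)

        // Cancel explicitly at the point we choose, so the test does not
        // depend on when the AnyCancellable's deinit happens to run.
        sub.cancel()
    }
}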


Thanks for the long answer. But after reconsidering all the information, I'm still not convinced. Here's why.

  • ARC is applied only to ref-types, not value-types.

    Reference counting applies only to instances of classes. Structures and enumerations are value types, not reference types, and are not stored and passed by reference.

    And var pipes = [AnyCancellable]() is a value-type variable. It's not a reference. It should follow the value-type lifetime rule, and shouldn't be affected by ref-type rules regardless of whatever is contained in it.

  • What I meant by program behavior (or observable behavior) is simple. The shape of the program doesn't matter. Only the execution result of the computation (including all the I/O, and its order) matters. Optimizers are allowed to make only transformations that do not change the execution result. If they didn't follow that rule, optimized programs could produce any random result, and no one could write a correct program. An optimization that changes the result of program execution? That's nonsense and simply a wrong optimization. In my example, the execution result changed, which is unacceptable if this is an optimization. Though I'm hoping that my case is not this one...

  • Why do lifetime and timing of death matter? Because different deinit timing can change the result. In Swift, deinit timing is intentionally synchronous to allow programmers to predict and depend on it. Deinit timing must be preserved.

  • A compiler producing incorrect code for better performance is nonsense. Correctness can't be traded off.

  • Swift value types follow C/C++ value lifetime rules: owned by the lexical block (or call-stack frame). That's the natural and obvious choice for a language meant to replace Objective-C and built on the C/C++ memory model.

  • What is a reference?

    To make sure that instances don’t disappear while they are still needed, ARC tracks how many properties, constants, and variables are currently referring to each class instance. ARC will not deallocate an instance as long as at least one active reference to that instance still exists.

    A strong reference means a variable in a lexical block which stores a (shared) ownership-carrying pointer to a ref-type object on the heap. It's not vague. As variables are owned by a lexical block, a variable lives exactly to the end of the scope, so the strong reference should also live exactly to the end of the scope. Therefore, the ref-type object will live exactly to the end of the scope if that reference was the last one. "Last usage" is a vague term. A reference variable never dies before the end of its lexical scope unless you call some special function to decrease the ref-count manually.


Oh boy. I wonder what would happen when you find out about languages with garbage collectors.

Compilers can and do change the behavior with optimizations, if language specs allow them to. Even in C++.
Example 1) any program with undefined behavior
Example 2) Compiler Explorer

You say this, or derive a conclusion from it, multiple times in your post. However, it’s simply not true, on multiple levels.

Firstly, C and C++ have distinctly different rules around lifetime and scoping. In particular, C has no automatic resource management of any kind: it does not do RAII in the sense that C++ does. There is no way to observe when a value exits scope in C.

Secondly, and more importantly, Swift's value types verifiably do not follow the rule of dying at the end of the lexical scope in which the value is defined. You can observe this by creating a struct that holds a class reference and watching when the class is deinited: the class can only be deinited once the struct itself has stopped existing, so the class's deinit reveals when the struct's lifetime ended. If you do this, you'll find the class-held-by-a-struct is deinited at the same point as a class held directly.

The valid lifetime of a Swift value type is exactly the same as that of a Swift class reference. The only difference is that there is no way for you to write code that will execute when a value leaves scope (there is no deinit). So you can only observe this lifetime management indirectly. Nonetheless it is very real.

If you can accept this, the following comments stop being statements of fact too:

Value vs reference type is not relevant: when the value is cleaned up, which may happen anytime after the last usage site, all references it holds are released.

This is a strong statement, but it’s not true. As noted above, optimizers are allowed to make any transformation the language model allows them to make. In this case, Swift allows the optimizer to release a reference at any point after its last usage site. If the deinit has side-effects, this can change the timing of those side effects. It’s explicitly allowed by Swift.

This is another strong statement that is, again, not true. Compilers compile code according to the rules of the language. Within those rules, as long as an optimisation preserves them, the optimisation is sound. We can argue that the rules are bad, but the compiler is absolutely allowed to do this.

The ur-example of compilers being allowed to do this is C/C++’s undefined behaviour rules. In C/C++, a program that invokes undefined behaviour explicitly has no defined semantics. None. The compiler is entitled to do whatever it pleases to that program: it can even have that program do nothing at all.

In this case, the compiler is absolutely preserving the correctness of Swift’s programming model. The problem you have encountered is that your understanding of what is allowed by Swift’s rules is not the same as the compiler’s understanding. Sometimes this is a result of a compiler bug but in this case, your understanding is wrong. The compiler is absolutely entitled to make this optimisation.

This is a perfectly valid statement of values. You could absolutely construct a Swift-like language that made this guarantee. Swift does not.


However, it’s simply not true, on multiple levels.

Swift’s value types verifiably do not follow the behaviour of exiting at the end of the lexical scope in which the value is defined.

when the value is cleaned up, which may happen anytime after the last usage site, all references it holds are released.

Swift allows the optimizer to release a reference at any point after its last usage site.

It’s explicitly allowed by Swift.

I'm sorry, but can you point me to any official document or statement for this? Early death of value types is too surprising a behavior, and I really cannot accept it without an official statement, and I couldn't find any. Same for early death of ref types, as the references are supposed to be pointer values on the call stack.

Even with an official statement, I still cannot accept this because it breaks the promise of "predictable deinit timing".

The promise established by the ARC rules is that "a ref-type object can die only at the moment the last reference gets removed". Not earlier or later. And the references mean any stored pointer variables, so we can track their lifetimes.

That's why we can say Objective-C/Swift RC is predictable. Unpredictable finalization timing and its effects are the biggest downsides of typical tracing-GC languages, so this predictability has been advertised as a benefit of Objective-C/Swift.

Am I wrong on this?


Anyway, I tried to verify your claim of "last usage site", but failed.

final class AAA {
    deinit {
        print("AAA deinit")
    }
}
struct BBB {
    let aaa = AAA()
}
func ccc() {
    print("pre")
    let bbb = BBB()
    ddd(bbb)
    print("post")
}
func ddd(_ b: BBB) {
    print("DDD b: \(b)")
}
ccc()

Prints

pre
DDD b: BBB(aaa: main.AAA)
post
AAA deinit

Same for both swiftc main.swift and swiftc -O main.swift commands. And here's my toolchain version.

Apple Swift version 5.2.4 (swiftlang-1103.0.32.9 clang-1103.0.32.53)
Target: x86_64-apple-darwin19.4.0

The optimizer didn't kill the value early this time on my machine for this simple case. Please let me know how to verify that.


"Transitioning to ARC" is written for Objective-C, which implies C variables and call stack. C local variable live to the end of scope. Strong reference pointer variables are same. Therefore lifetime of the strong reference is to the end of the lexical scope. If Swift inherited this behavior, it should show same lifetime by default.

You can verify lifetime of C variables by using Objective-C with ARC strong references.

@import Foundation;
@interface AAA : NSObject
@end
@implementation AAA
- (void)dealloc {
    NSLog(@"AAA dealloc");
}
@end
void test1(AAA* a) {
    NSLog(@"test1 %@", a);
}
int main() {
    NSLog(@"pre");
    AAA* a = [[AAA alloc] init];
    test1(a);
    NSLog(@"post");
}

Prints this.

2020-06-01 09:24:24.134788+0900 X4[61954:3007976] pre
2020-06-01 09:24:24.135581+0900 X4[61954:3007976] test1 <AAA: 0x100704470>
2020-06-01 09:24:24.135657+0900 X4[61954:3007976] post
2020-06-01 09:24:24.135706+0900 X4[61954:3007976] AAA dealloc

... We can argue that the rules are bad, but the compiler is absolutely allowed to do this.

Yes, I'm trying to say that this randomness in lifetime is unacceptable. I'm not really talking about the terms. I apologize if my terms were vague.

I agree that compilers are allowed to do whatever they want. But I don't want such a random compiler. I want my compiler only to make programs faster while keeping the result the same for all configurations. And I believe this is common sense. Do we agree on this?


Without a spec, this seems to be becoming a fight of belief vs. belief. I hope to see a clear spec on this.

At the very least I think this thread deserves a link to Clang's documentation for ARC that says Objective-C does not offer "predictable object semantics" for "local variables of automatic storage duration".


Thanks for the spec!

So under ARC by default, Swift and Objective-C deinit/dealloc timing has never been predictable from the start. The designers chose performance over predictability.

  • Okay this is my fault.
  • This unpredictability is really unacceptable. Really painful.
  • How about the value-types?
  • It'd be nice if I had a compiler flag to make all local variables have precise lifetimes.

Value types do not have a deinit in Swift, so they don't really have any lifetime semantics to offer. I think it's most likely that references stored within value types on the stack are treated the same as references on the stack that are not within value types.

They chose a perfectly good behavior for memory management. Using this memory management mechanism for other things (like Combine's AnyCancellable) is what is making things painful.


Well, maybe this is good from a memory-management-optimization perspective, but IMO it's really bad from a whole-program-correctness perspective. How can I write correct programs for a compiler that works differently across versions and configurations? There can't be a static check for this, as the optimization is unpredictable. With this behavior, writing a correct program is far more difficult and needs more manual tracking effort. If I forget to mark a reference with withExtendedLifetime and the optimizer happens not to kick in for it, the bug will remain in the code, and fixing it later is going to be extremely difficult.

I've always thought Swift was designed for safety (which also implies correctness) first and performance second. I'm not sure now. This is not a design that helps in writing correct programs.

This issue can arise with any kind of local-only reference. For example, a detached thread that runs long-running jobs while keeping everything on the stack.

I really hope to see a compiler flag to enforce "precise lifetimes" on everything.

@lukasa @michelf Thanks for the help. Today is a big day for me.