Will updating a value-type variable concurrently from multiple threads cause a crash?

duc · January 10, 2020, 5:40am

As far as I know, setting a variable concurrently from multiple threads will cause a race condition and sometimes a crash. For example:

import Dispatch

class Data {}

class Node {
    var data = Data()
}

var node = Node()
let concurrentQueue = DispatchQueue(label: "queue", attributes: .concurrent)

for _ in 0...1000 {
    concurrentQueue.async {
        node.data = Data()    // EXC_BAD_ACCESS KERN_INVALID_ADDRESS
    }
}

But when I change the property's type to Int, it no longer crashes. At least, I can't make it crash.

class Node {
    var data = 0
}

var node = Node()
let concurrentQueue = DispatchQueue(label: "queue", attributes: .concurrent)

for i in 0...1000 {
    concurrentQueue.async {
        node.data = i
    }
}

Could anyone help explain this behavior? Is there any difference between updating a value-type and a reference-type variable from multiple threads?
Thanks!

lukasa · January 10, 2020, 7:46am

This is still a bug, and you will get unexpected outcomes. There is no difference between updating value and reference types from multiple threads: both are bugs. You can see this most clearly from the fact that Data, like Int, is a value type.

You must never do this. Crashes are only one manifestation of the problems you can get when you write data races like this one; you can also get any number of other issues. Please always use synchronisation primitives.
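As one illustration of that advice, here is a minimal sketch (an assumption, not code from this thread) that routes every access to the property through a private serial DispatchQueue, so no two reads or writes overlap. `SafeNode`, `isolation`, `writeCount` and `writeConcurrently` are hypothetical names; `Data` is the empty class from the question, not Foundation's Data.

```swift
import Dispatch

// The empty class from the original question (not Foundation's Data).
class Data {}

// Hypothetical wrapper: every read and write of `_data` runs on one private
// serial queue, so only one thread at a time touches the stored property.
final class SafeNode: @unchecked Sendable {
    private let isolation = DispatchQueue(label: "SafeNode.isolation")
    private var _data = Data()
    private var _writeCount = 0

    var data: Data {
        get { isolation.sync { _data } }
        set {
            isolation.sync {
                _data = newValue      // the old value is released here, on one thread at a time
                _writeCount += 1
            }
        }
    }

    var writeCount: Int { isolation.sync { _writeCount } }
}

func writeConcurrently(count: Int) -> SafeNode {
    let node = SafeNode()
    let concurrentQueue = DispatchQueue(label: "queue", attributes: .concurrent)
    let group = DispatchGroup()
    for _ in 0..<count {
        concurrentQueue.async(group: group) {
            node.data = Data()        // the setter serializes the store
        }
    }
    group.wait()                      // block until every write has finished
    return node
}

let node = writeConcurrently(count: 1000)
print(node.writeCount)  // 1000
```

A serial queue is used here rather than a lock; either approach works, as long as every access goes through the same mechanism.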

duc · January 10, 2020, 7:53am

Thanks for your response. As I said in my question, I know it will lead to a race condition, but I'm confused about why I can't get a crash with a value type.

By the way, I think updating value types and reference types from multiple threads is different. At the very least, reference types are managed by ARC, so releasing a value-type variable must be different from releasing a reference-type variable.

eskimo · January 10, 2020, 8:43am

I'm confused why I can't get crash with value type.

Threading bugs like this are undefined behaviour, so it’s hard to predict exactly how things will fail. The best way to investigate such issues is with the Thread Sanitizer. If you enable it and run your second example, it reports the data race immediately.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

lukasa · January 10, 2020, 9:32am

There is no "can't", only "don't". I would liken this to committing a crime: just because the police don't show up to arrest you the moment you commit a crime doesn't mean you got away with it. Your code is wrong, and one day you'll discover it. There is no rule of programming that says "race conditions must cause crashes".

This is not a very good way to think about the distinction between value and reference types. The difference between value and reference types is in their semantics, not in the way they allocate memory, and it's entirely possible for a value type to include ARC'd data. Indeed, Data does just that (as do String, Array, and many other value types in Swift): if you copy a Data around you will see swift_retain and swift_release calls, even though Data is unambiguously a value type in Swift.
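To make that concrete, here is a simplified model (an assumption for illustration, not how Foundation actually implements Data) of a struct with value semantics whose storage is a class instance. `Storage` and `Buffer` are hypothetical names. Copying a `Buffer` copies only a reference, which ARC retains, and the standard library's `isKnownUniquelyReferenced(_:)` is used to clone the storage before a mutation when it is shared.

```swift
// Hypothetical heap storage, managed by ARC like any class instance.
final class Storage {
    var bytes: [UInt8]
    init(bytes: [UInt8]) { self.bytes = bytes }
}

// A value type whose only stored property is a class reference.
struct Buffer {
    private var storage: Storage

    init(_ bytes: [UInt8]) { storage = Storage(bytes: bytes) }

    var bytes: [UInt8] { storage.bytes }

    // Copy-on-write: clone the storage only if another Buffer also holds it.
    mutating func append(_ byte: UInt8) {
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(bytes: storage.bytes)
        }
        storage.bytes.append(byte)
    }
}

let a = Buffer([1, 2, 3])
var b = a                  // copies the struct; both share one Storage (ARC retain)
b.append(4)                // storage is shared, so b clones it before appending
print(a.bytes, b.bytes)    // [1, 2, 3] [1, 2, 3, 4]
```

Assigning a new value to a `Buffer` variable therefore has to release the old `Storage`, which is exactly the kind of work that can go wrong when two threads race.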

duc · January 10, 2020, 9:40am

@eskimo Thanks for your support.

Yes, it’s hard to predict exactly how things will fail. If I use Thread Sanitizer, it reports a data race, but it doesn't say what exactly happened.

duc · January 10, 2020, 9:46am

@lukasa I'm so sorry, but maybe you misunderstood me.

I know that code is wrong and that race conditions may not cause crashes. I'm just curious about what happens under the hood, and I hope someone can explain it.

lukasa · January 10, 2020, 10:06am

The way to understand what happens under the hood is to understand the difference between node.data = Data() and node.data = i.

In the first case, the program has to do the following things:

1. Allocate a new Data.
2. Reduce the reference count of the old Data stored in node.data, potentially freeing it if it is no longer referenced.
3. Move the new Data into the node.data field.

In the second case, the program has to do the following:

Move the integer i into node.data.

The important difference here is step (2) in the class-based model. Here we have to modify the reference count of node.data and potentially free that object. To do so, we have to dereference the pointer in node.data.

The problem is that we are racing: we are running this algorithm on multiple threads at once. That means we may encounter a situation where the two threads interleave operations like this:

THREAD 1          |          THREAD 2
-------------------------------------
Allocate new Data |
                  |
                  | Allocate new Data
                  |
Load pointer to   |
old data          |
                  |
Reduce reference  |
count of old Data |
                  |
                  | Load pointer to
                  | old data
                  |
Free old Data     |
                  |
                  | Reduce reference
                  | count of old data
                  | (!)
                  |
                  | Free old Data (!)
                  |
Store new Data    |
                  |
                  | Store new Data (!)

There are a number of issues with that set of operations. In particular, thread 2 is holding a dangling pointer: a pointer to memory that thread 1 has already released and freed. Any number of problems may happen here, but the most common one is a segmentation fault (reported as EXC_BAD_ACCESS on Apple platforms), because thread 2 dereferences that pointer after thread 1 has already freed it. Notice also that we are at risk of leaking one of the new Data objects, because thread 2 may overwrite the Data stored by thread 1 without ever reducing its reference count.

Compare this to the operations with an integer:

THREAD 1          |          THREAD 2
-------------------------------------
Store integer     |
                  |
                  | Store integer

Lots of problems can happen here: you can get tearing (a value made up of pieces of two different writes), or you can end up with an unexpected final value. However, on an Intel CPU this interleaving of operations will never cause a crash: you will just end up with unexpected (and potentially invalid) data.

Note that I said "on an Intel CPU" because many CPUs make no promise that doing this won't cause a crash. More generally, you cannot assume that writing a trivial type from multiple threads can never crash just because the type is trivial. To be clear: you should not assume that the code is OK, that it could never crash, or anything else. By sheer good luck the code you wrote does not crash today, but it might crash tomorrow, or next week, or on a different machine, or in bad weather. The compiler is even allowed to assume that what you wrote cannot happen, and so rewrite your code entirely to avoid the operation.

duc · January 10, 2020, 10:15am

PERFECT ANSWER!!!!

That's exactly what I expected. Your answer totally convinced me. That code will never appear in my app.

@lukasa Thank you very much for your patience and support!!!

viktorcode · January 10, 2020, 11:10am

So basically, on Intel, storing an integer is an atomic operation, but that atomicity is CPU-dependent rather than guaranteed by Swift?

lukasa · January 10, 2020, 11:12am

The atomicity of the operation is not relevant: if the integer was made up of 8 separate bytes, each of which had a separate instruction to be stored, you would still not see a crash. What matters is that you don't trigger either a precondition or a non-recoverable memory fault when storing integers, because you don't have to follow a potentially invalidated pointer in order to copy that memory.
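To illustrate that hypothetical, the sketch below copies an Int into its destination one byte at a time; `storeBytewise` is a hypothetical helper, and the optimizer may well merge these stores in practice. Even though the copy is split into several separate stores, it only writes the destination's own bytes: it never reads the old value and never follows a pointer stored in it, so there is nothing for a concurrent writer to invalidate. Racing writers could still leave a mix of bytes from two values (tearing), which is wrong data rather than a memory fault.

```swift
// Hypothetical helper: copy `value` into `destination` one byte at a time.
// Each iteration writes exactly one byte of the destination; the old
// contents of `destination` are never read or dereferenced.
func storeBytewise(_ value: Int, into destination: inout Int) {
    withUnsafeBytes(of: value) { source in
        withUnsafeMutableBytes(of: &destination) { target in
            for index in 0..<MemoryLayout<Int>.size {
                target[index] = source[index]
            }
        }
    }
}

var stored = 0
storeBytewise(123_456_789, into: &stored)
print(stored)  // 123456789
```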

duc · January 10, 2020, 3:22pm

Sorry to bother you again, but it seems like you are only talking about integers. I know Int is a struct in Swift, so will it cause a crash with other struct types?

lukasa · January 10, 2020, 5:51pm

It may. In particular, structs with stored properties that hold class instances are very likely to crash. Other structs probably won't crash immediately in your specific example. However, as @eskimo correctly points out, it is not worth getting too deep into working out exactly which bad thing will happen to you: something bad will happen.

Karl · January 10, 2020, 6:37pm

The important difference between Int and Data in your example is that, even though both of them are structs and have value semantics, Data can have dynamically-allocated memory, which is managed by reference counting. As @lukasa described earlier, when you assign to a reference-counted type, the old value is important because it must be released and possibly freed. So during the assignment, it matters that the old value is not corrupted, because it will be accessed and stuff depends on its value. That doesn't happen for fixed-size types like Int; they have no dynamically-allocated storage, so there is no reason to access the old value and nothing depends on it.

I bet that if you try it with a user-defined struct composed of fixed-size primitives like Int and Float, you also wouldn't see a crash. If you add a class property, or something with dynamic storage like an Array or Data, then you may see crashes again (if you're lucky). I don't know if corrupted protocol existentials will or won't be detected.
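One way to explore which structs fall into the "fixed-size primitives" category is the standard library's `_isPOD(_:)` function, which returns true for "plain old data" types whose copies and assignments involve no reference counting. It is underscored, so treat it as an unofficial, unstable API suitable only for experiments like this one. The type names below are hypothetical.

```swift
// Hypothetical example types.
struct Point {            // only fixed-size primitives
    var x: Int
    var y: Float
}

final class Box {}

struct WithClass {        // stores a class reference, which ARC must retain and release
    var point: Point
    var box: Box
}

struct WithArray {        // Array has dynamically allocated, reference-counted storage
    var values: [Int]
}

// `_isPOD` is an underscored standard library function, not a stable API.
print(_isPOD(Int.self))        // true
print(_isPOD(Point.self))      // true
print(_isPOD(WithClass.self))  // false
print(_isPOD(WithArray.self))  // false
```

Even a `true` result only means no pointer is followed during assignment; as noted above, racing writes to such types are still bugs.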

duc · January 11, 2020, 5:30am

@lukasa @Karl Thank you guys! I got the point.

jonathanpenn · January 11, 2020, 6:21pm

Digging in to find what conditions can cause crashes in these concurrent cases is laudable. It’s certainly good to understand one’s tools!

But I thought it would be good to point out that Swift currently makes no guarantees about the atomicity of its assignment operations. Your questions about the differences between assigning a mere primitive Int and a complex class make me think you might be hunting for a deeper answer.

It seems to me that questions like this often reduce to: “So, what is the smallest operation that Swift can guarantee no other thread can interrupt?” And the answer is that there is none at the moment. Right now you need to bring in your own atomic primitives (such as locks, mutexes, or C-based wrappers around things like C++ atomics so that Swift can see them) and use them to protect the things you care about.
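As one example of bringing your own primitive, here is a minimal sketch that uses Foundation's NSLock to make a read-modify-write (`value += 1`) behave as a single uninterruptible step from the point of view of other threads. `LockedCounter` is a hypothetical name; a serial DispatchQueue or a C-based atomic wrapper, as mentioned above, would be alternatives.

```swift
import Dispatch
import Foundation

// Hypothetical counter: Swift gives `_value += 1` no atomicity on its own,
// so every access is a critical section guarded by one NSLock.
final class LockedCounter: @unchecked Sendable {
    private let lock = NSLock()
    private var _value = 0

    var value: Int {
        lock.lock()
        defer { lock.unlock() }
        return _value
    }

    func increment() {
        lock.lock()
        defer { lock.unlock() }
        _value += 1   // load, add and store all happen while holding the lock
    }
}

let counter = LockedCounter()
// Run 1,000 increments spread across threads; returns once all have finished.
DispatchQueue.concurrentPerform(iterations: 1000) { _ in
    counter.increment()
}
print(counter.value)  // 1000
```

Without the lock, the same loop could lose increments, because two threads may read the same old value before either stores its result.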

That’s why everyone here keeps saying “Don’t worry about why one crashes or the other doesn’t...neither is correct.” Swift does not automatically wrap these assignments in atomic primitives. If it happens to look like it works, you got lucky.