Data races with copy-on-write

Hi,

I'm trying to better understand how copy-on-write works, especially in a multithreaded environment, but there are a few things that confuse me.

From the documentation, it is said that:
"If the instance passed as object is being accessed by multiple threads simultaneously, isKnownUniquelyReferenced(_:) may still return true. Therefore, you must only call this function from mutating methods with appropriate thread synchronization. That will ensure that isKnownUniquelyReferenced(_:) only returns true when there is really one accessor, or when there is a race condition, which is already undefined behavior."

Let's consider this sample code:

func mutateArray(_ array: [Int]) {
  var elements = array
  elements.append(1)
}

let q1 = DispatchQueue(label: "testQ1")
let q2 = DispatchQueue(label: "testQ2")
let q3 = DispatchQueue(label: "testQ3")

let iterations = 1000

var array: [Int] = [1, 2, 3]

q1.async {
  for _ in 0..<iterations {
    mutateArray(array)
  }
}

q2.async {
  for _ in 0..<iterations {
    mutateArray(array)
  }
}

q3.async {
  for _ in 0..<iterations {
    mutateArray(array)
  }
}

// ...

From what I understand, since Array<T> implements copy-on-write, the array should be copied only when the mutating append(_:) method is called in mutateArray(_:). Therefore, isKnownUniquelyReferenced(_:) should be called to determine whether a copy is required or not.

However, it is being accessed by multiple threads simultaneously, which may cause a race condition, right? So why does the thread sanitizer never detect a race condition here? Is there some compiler optimization going on here?

On the other hand, the thread sanitizer always detects a race condition, for instance, if I add the following code to mutate directly the array:

for _ in 0..<iterations {
  array.append(1)
}

In this case, is it because I mutate the array buffer that is being copied from other threads?

Even strangier, let's consider the following sample code:

class SynchronizedArray<Element> {

  // [...]

  private var lock = NSLock()
  private var _elements: Array<Element>

  var elements: Array<Element> {
    lock.lock()
    defer { lock.unlock() }
    return _elements
  }
  
  @discardableResult
  public final func access<R>(_ closure: (inout T) throws -> R) rethrows -> R {
    lock.lock()
    defer { lock.unlock() }
    return try closure(&_value)
  }
}

let syncArray = SynchronizedArray<Int>()

func mutateArray() {
  syncArray.access { array in
    array.append(1)
  }

  var elements = syncArray.elements
  var copy = elements // no race condition detected by TSan when I add this line
  elements.append(1) // race condition detected by TSan (if previous line is missing)
}

// Call mutateArray() from multiple threads like in the first sample code.

The line marked with does nothing useful, yet adding this line prevents the race condition at the next line to be detected by the thread sanitizer. Is this again because of some compiler optimization?

However, when the array buffer is being copied, we can mutate the same buffer with the append(_:) method, right? So, shouldn't the thread sanitizer detect a race condition here?

Please let me know if I ever misunderstood how copy-on-write works in Swift.

Also, I'd like to know:
- besides capture lists, what are the correct ways to pass a copy-on-write value between threads?
- for thread-safe classes that expose an array as a property, should I always copy the private array variable before returning it from the public getter? If so, is there any recommended way to force-copy a value type in Swift ?

Any help would be greatly appreciated.
Thanks.

Note: I'm using Swift 4 with the latest Xcode version (9.2 (9C40b)) and the thread sanitizer enabled.

1 Like

The array *storage* is copy on write. The array variable (which essentially contains a pointer to the storage) is not copy on write. If you refer to the same array variable from multiple threads, you have a race. Rather, use a different copy of the variable to each thread. Copied variables will share the same storage but will make a copy of the storage when writing to it.

I'm not sure what is the problem with your SynchronizedArray example. But I would try reimplementing `var element` this way:

  var elements: Array<Element> {
    return access { $0 }
  }

If this change fixes the race it means the compiler is making the copy after the `unlock()` with your previous code, which could explain the detected race. I suppose the additional copy afterwards fixes the race because of an internal implementation detail (like changing the reference count for the storage). I'd be wary of the optimizer breaking this trick though.

···

Le 5 déc. 2017 à 5:20, Romain Jacquinot via swift-users <swift-users@swift.org> a écrit :

Hi,

I'm trying to better understand how copy-on-write works, especially in a multithreaded environment, but there are a few things that confuse me.

From the documentation, it is said that:
"If the instance passed as object is being accessed by multiple threads simultaneously, isKnownUniquelyReferenced(_:) may still return true. Therefore, you must only call this function from mutating methods with appropriate thread synchronization. That will ensure that isKnownUniquelyReferenced(_:) only returns true when there is really one accessor, or when there is a race condition, which is already undefined behavior."

Let's consider this sample code:

func mutateArray(_ array: [Int]) {
  var elements = array
  elements.append(1)
}

let q1 = DispatchQueue(label: "testQ1")
let q2 = DispatchQueue(label: "testQ2")
let q3 = DispatchQueue(label: "testQ3")

let iterations = 1000

var array: [Int] = [1, 2, 3]

q1.async {
  for _ in 0..<iterations {
    mutateArray(array)
  }
}

q2.async {
  for _ in 0..<iterations {
    mutateArray(array)
  }
}

q3.async {
  for _ in 0..<iterations {
    mutateArray(array)
  }
}

// ...

From what I understand, since Array<T> implements copy-on-write, the array should be copied only when the mutating append(_:) method is called in mutateArray(_:). Therefore, isKnownUniquelyReferenced(_:) should be called to determine whether a copy is required or not.

However, it is being accessed by multiple threads simultaneously, which may cause a race condition, right? So why does the thread sanitizer never detect a race condition here? Is there some compiler optimization going on here?

On the other hand, the thread sanitizer always detects a race condition, for instance, if I add the following code to mutate directly the array:

for _ in 0..<iterations {
  array.append(1)
}

In this case, is it because I mutate the array buffer that is being copied from other threads?

Even strangier, let's consider the following sample code:

class SynchronizedArray<Element> {

  // [...]

  private var lock = NSLock()
  private var _elements: Array<Element>

  var elements: Array<Element> {
    lock.lock()
    defer { lock.unlock() }
    return _elements
  }
  
  @discardableResult
  public final func access<R>(_ closure: (inout T) throws -> R) rethrows -> R {
    lock.lock()
    defer { lock.unlock() }
    return try closure(&_value)
  }
}

let syncArray = SynchronizedArray<Int>()

func mutateArray() {
  syncArray.access { array in
    array.append(1)
  }

  var elements = syncArray.elements
  var copy = elements // no race condition detected by TSan when I add this line
  elements.append(1) // race condition detected by TSan (if previous line is missing)
}

// Call mutateArray() from multiple threads like in the first sample code.

The line marked with does nothing useful, yet adding this line prevents the race condition at the next line to be detected by the thread sanitizer. Is this again because of some compiler optimization?

However, when the array buffer is being copied, we can mutate the same buffer with the append(_:) method, right? So, shouldn't the thread sanitizer detect a race condition here?

Please let me know if I ever misunderstood how copy-on-write works in Swift.

Also, I'd like to know:
- besides capture lists, what are the correct ways to pass a copy-on-write value between threads?
- for thread-safe classes that expose an array as a property, should I always copy the private array variable before returning it from the public getter? If so, is there any recommended way to force-copy a value type in Swift ?

Any help would be greatly appreciated.
Thanks.

Note: I'm using Swift 4 with the latest Xcode version (9.2 (9C40b)) and the thread sanitizer enabled.

_______________________________________________
swift-users mailing list
swift-users@swift.org
https://lists.swift.org/mailman/listinfo/swift-users

--
Michel Fortin
https://michelf.ca

I think you’ve misunderstood. The race condition Romain is referring to is when the two threads both access the shared storage, through separate array variables.

Romain:
From the thread sanitizer warnings, it sounds like the implementation of Array is not using synchronization around its call(s) to isKnownUniquelyReferenced … which would mean the class is not thread-safe.

It’s pretty common for the regular (mutable) collection classes supplied by a framework to be non-thread-safe; for example consider Foundation and Java. The reason is that the overhead of taking a lock every time you access an array element is pretty high. Generally it’s preferable to use larger-granularity locks, i.e. grab an external lock before performing a number of array operations.

—Jens

···

On Dec 5, 2017, at 6:24 AM, Michel Fortin via swift-users <swift-users@swift.org> wrote:

The array *storage* is copy on write. The array variable (which essentially contains a pointer to the storage) is not copy on write. If you refer to the same array variable from multiple threads, you have a race. Rather, use a different copy of the variable to each thread. Copied variables will share the same storage but will make a copy of the storage when writing to it.

Small typo in the second sample code. Should read:

return try closure(&_elements)

···

Begin forwarded message:

From: Romain Jacquinot via swift-users <swift-users@swift.org>
Subject: [swift-users] Data races with copy-on-write
Date: December 5, 2017 at 11:20:46 AM GMT+1
To: swift-users@swift.org
Reply-To: Romain Jacquinot <rjacquinot@me.com>

Hi,

I'm trying to better understand how copy-on-write works, especially in a multithreaded environment, but there are a few things that confuse me.

From the documentation, it is said that:
"If the instance passed as object is being accessed by multiple threads simultaneously, isKnownUniquelyReferenced(_:) may still return true. Therefore, you must only call this function from mutating methods with appropriate thread synchronization. That will ensure that isKnownUniquelyReferenced(_:) only returns true when there is really one accessor, or when there is a race condition, which is already undefined behavior."

Let's consider this sample code:

func mutateArray(_ array: [Int]) {
  var elements = array
  elements.append(1)
}

let q1 = DispatchQueue(label: "testQ1")
let q2 = DispatchQueue(label: "testQ2")
let q3 = DispatchQueue(label: "testQ3")

let iterations = 1000

var array: [Int] = [1, 2, 3]

q1.async {
  for _ in 0..<iterations {
    mutateArray(array)
  }
}

q2.async {
  for _ in 0..<iterations {
    mutateArray(array)
  }
}

q3.async {
  for _ in 0..<iterations {
    mutateArray(array)
  }
}

// ...

From what I understand, since Array<T> implements copy-on-write, the array should be copied only when the mutating append(_:) method is called in mutateArray(_:). Therefore, isKnownUniquelyReferenced(_:) should be called to determine whether a copy is required or not.

However, it is being accessed by multiple threads simultaneously, which may cause a race condition, right? So why does the thread sanitizer never detect a race condition here? Is there some compiler optimization going on here?

On the other hand, the thread sanitizer always detects a race condition, for instance, if I add the following code to mutate directly the array:

for _ in 0..<iterations {
  array.append(1)
}

In this case, is it because I mutate the array buffer that is being copied from other threads?

Even strangier, let's consider the following sample code:

class SynchronizedArray<Element> {

  // [...]

  private var lock = NSLock()
  private var _elements: Array<Element>

  var elements: Array<Element> {
    lock.lock()
    defer { lock.unlock() }
    return _elements
  }
  
  @discardableResult
  public final func access<R>(_ closure: (inout T) throws -> R) rethrows -> R {
    lock.lock()
    defer { lock.unlock() }
    return try closure(&_value)
  }
}

let syncArray = SynchronizedArray<Int>()

func mutateArray() {
  syncArray.access { array in
    array.append(1)
  }

  var elements = syncArray.elements
  var copy = elements // no race condition detected by TSan when I add this line
  elements.append(1) // race condition detected by TSan (if previous line is missing)
}

// Call mutateArray() from multiple threads like in the first sample code.

The line marked with does nothing useful, yet adding this line prevents the race condition at the next line to be detected by the thread sanitizer. Is this again because of some compiler optimization?

However, when the array buffer is being copied, we can mutate the same buffer with the append(_:) method, right? So, shouldn't the thread sanitizer detect a race condition here?

Please let me know if I ever misunderstood how copy-on-write works in Swift.

Also, I'd like to know:
- besides capture lists, what are the correct ways to pass a copy-on-write value between threads?
- for thread-safe classes that expose an array as a property, should I always copy the private array variable before returning it from the public getter? If so, is there any recommended way to force-copy a value type in Swift ?

Any help would be greatly appreciated.
Thanks.

Note: I'm using Swift 4 with the latest Xcode version (9.2 (9C40b)) and the thread sanitizer enabled.

_______________________________________________
swift-users mailing list
swift-users@swift.org
https://lists.swift.org/mailman/listinfo/swift-users

Thank you Michel.

Rather, use a different copy of the variable to each thread.

How should I copy a variable to each thread?

I’m not sure to understand what you mean by “copy” in this case. If I have this:
var a = Array<Int>()
var b = a
do you consider “b” as being a copy of the variable “a”?

Copied variables will share the same storage but will make a copy of the storage when writing to it.

But by sharing the same storage and making a copy of the storage when writing to it, there would be a race condition if the storage is being written by another thread while being copied, right?

I'm not sure what is the problem with your SynchronizedArray example. But I would try reimplementing `var element` this way:

Reimplementing the elements property doesn’t fix the race. Of course, for my specific example, I can still do:

var elements: Array<Element> {
  return access { array in
    array.map { $0 }
  }
}

but I’d like to know if there is a more generic way to copy a value type variable in Swift.

···

On Dec 5, 2017, at 3:23 PM, Michel Fortin <michel.fortin@michelf.ca> wrote:

The array *storage* is copy on write. The array variable (which essentially contains a pointer to the storage) is not copy on write. If you refer to the same array variable from multiple threads, you have a race. Rather, use a different copy of the variable to each thread. Copied variables will share the same storage but will make a copy of the storage when writing to it.

I'm not sure what is the problem with your SynchronizedArray example. But I would try reimplementing `var element` this way:

  var elements: Array<Element> {
    return access { $0 }
  }

If this change fixes the race it means the compiler is making the copy after the `unlock()` with your previous code, which could explain the detected race. I suppose the additional copy afterwards fixes the race because of an internal implementation detail (like changing the reference count for the storage). I'd be wary of the optimizer breaking this trick though.

Le 5 déc. 2017 à 5:20, Romain Jacquinot via swift-users <swift-users@swift.org <mailto:swift-users@swift.org>> a écrit :

Hi,

I'm trying to better understand how copy-on-write works, especially in a multithreaded environment, but there are a few things that confuse me.

From the documentation, it is said that:
"If the instance passed as object is being accessed by multiple threads simultaneously, isKnownUniquelyReferenced(_:) may still return true. Therefore, you must only call this function from mutating methods with appropriate thread synchronization. That will ensure that isKnownUniquelyReferenced(_:) only returns true when there is really one accessor, or when there is a race condition, which is already undefined behavior."

Let's consider this sample code:

func mutateArray(_ array: [Int]) {
  var elements = array
  elements.append(1)
}

let q1 = DispatchQueue(label: "testQ1")
let q2 = DispatchQueue(label: "testQ2")
let q3 = DispatchQueue(label: "testQ3")

let iterations = 1000

var array: [Int] = [1, 2, 3]

q1.async {
  for _ in 0..<iterations {
    mutateArray(array)
  }
}

q2.async {
  for _ in 0..<iterations {
    mutateArray(array)
  }
}

q3.async {
  for _ in 0..<iterations {
    mutateArray(array)
  }
}

// ...

From what I understand, since Array<T> implements copy-on-write, the array should be copied only when the mutating append(_:) method is called in mutateArray(_:). Therefore, isKnownUniquelyReferenced(_:) should be called to determine whether a copy is required or not.

However, it is being accessed by multiple threads simultaneously, which may cause a race condition, right? So why does the thread sanitizer never detect a race condition here? Is there some compiler optimization going on here?

On the other hand, the thread sanitizer always detects a race condition, for instance, if I add the following code to mutate directly the array:

for _ in 0..<iterations {
  array.append(1)
}

In this case, is it because I mutate the array buffer that is being copied from other threads?

Even strangier, let's consider the following sample code:

class SynchronizedArray<Element> {

  // [...]

  private var lock = NSLock()
  private var _elements: Array<Element>

  var elements: Array<Element> {
    lock.lock()
    defer { lock.unlock() }
    return _elements
  }
  
  @discardableResult
  public final func access<R>(_ closure: (inout T) throws -> R) rethrows -> R {
    lock.lock()
    defer { lock.unlock() }
    return try closure(&_value)
  }
}

let syncArray = SynchronizedArray<Int>()

func mutateArray() {
  syncArray.access { array in
    array.append(1)
  }

  var elements = syncArray.elements
  var copy = elements // no race condition detected by TSan when I add this line
  elements.append(1) // race condition detected by TSan (if previous line is missing)
}

// Call mutateArray() from multiple threads like in the first sample code.

The line marked with does nothing useful, yet adding this line prevents the race condition at the next line to be detected by the thread sanitizer. Is this again because of some compiler optimization?

However, when the array buffer is being copied, we can mutate the same buffer with the append(_:) method, right? So, shouldn't the thread sanitizer detect a race condition here?

Please let me know if I ever misunderstood how copy-on-write works in Swift.

Also, I'd like to know:
- besides capture lists, what are the correct ways to pass a copy-on-write value between threads?
- for thread-safe classes that expose an array as a property, should I always copy the private array variable before returning it from the public getter? If so, is there any recommended way to force-copy a value type in Swift ?

Any help would be greatly appreciated.
Thanks.

Note: I'm using Swift 4 with the latest Xcode version (9.2 (9C40b)) and the thread sanitizer enabled.

_______________________________________________
swift-users mailing list
swift-users@swift.org <mailto:swift-users@swift.org>
https://lists.swift.org/mailman/listinfo/swift-users

--
Michel Fortin
https://michelf.ca <https://michelf.ca/&gt;

Hi Jens,

In the SynchronizedArray class, I use a lock to perform mutating operations on the array. However, my questions concern the case where an array is exposed as a public property of a thread-safe class, like this:

public class MyClass {

  private var _myArray: Array<Int> =
  private var _lock = NSLock()

  public var myArray: Array<Int> {
    _lock.lock()
    defer { _lock.unlock() }
    return _myArray
  }

  public func doSomethingThatMutatesArray() {
    _lock.lock()
    _myArray.append(1)
    _lock.unlock()
  }
}

Arrays are implemented as structs in Swift. This is a value type, but because Array<T> implements copy-on-write, there is an issue if you do something like this from multiple threads:

let myClass = MyClass()

func callFromMultipleThreads() {
  let arrayCopy = myClass.myArray
  arrayCopy.append(2) // race condition here
}

This is quite problematic, because the user of MyClass shouldn’t have to worry about side-effects when using a copy of the value from myArray.

···

On Dec 5, 2017, at 8:22 PM, Jens Alfke <jens@mooseyard.com> wrote:

On Dec 5, 2017, at 6:24 AM, Michel Fortin via swift-users <swift-users@swift.org <mailto:swift-users@swift.org>> wrote:

The array *storage* is copy on write. The array variable (which essentially contains a pointer to the storage) is not copy on write. If you refer to the same array variable from multiple threads, you have a race. Rather, use a different copy of the variable to each thread. Copied variables will share the same storage but will make a copy of the storage when writing to it.

I think you’ve misunderstood. The race condition Romain is referring to is when the two threads both access the shared storage, through separate array variables.

Romain:
From the thread sanitizer warnings, it sounds like the implementation of Array is not using synchronization around its call(s) to isKnownUniquelyReferenced … which would mean the class is not thread-safe.

It’s pretty common for the regular (mutable) collection classes supplied by a framework to be non-thread-safe; for example consider Foundation and Java. The reason is that the overhead of taking a lock every time you access an array element is pretty high. Generally it’s preferable to use larger-granularity locks, i.e. grab an external lock before performing a number of array operations.

—Jens

Also, on the official Swift blog (Value and Reference Types - Swift Blog - Apple Developer <Value and Reference Types - Swift Blog - Apple Developer), it is stated that:

"One of the primary reasons to choose value types over reference types is the ability to more easily reason about your code. If you always get a unique, copied instance, you can trust that no other part of your app is changing the data under the covers. This is especially helpful in multi-threaded environments where a different thread could alter your data out from under you.
[…]
In Swift, Array, String, and Dictionary are all value types. They behave much like a simple int value in C, acting as a unique instance of that data. You don’t need to do anything special — such as making an explicit copy — to prevent other code from modifying that data behind your back. Importantly, you can safely pass copies of values across threads without synchronization. In the spirit of improving safety, this model will help you write more predictable code in Swift.”

Yet, data race can occur here:

public class MyClass {

    private var _myArray: Array<Int> =
    private var _lock = NSLock()

    public var myArray: Array<Int> {
        lock.lock()
        defer {
            lock.unlock()
        }
        let copy = _myArray
        return copy
    }

    public func doSomethingThatMutatesArray() {
        _lock.lock()
        _myArray.append(1) // data race here: write of size 8 by thread 1
        _lock.unlock()
    }
}

let myClass = MyClass()

func mutateArray() {
    myClass.doSomethingThatMutatesArray()
    var arrayCopy = myClass.myArray
    arrayCopy.append(2) // data race here: read of size 8 by thread 10
}

let q1 = DispatchQueue(label: "testQ1")
let q2 = DispatchQueue(label: "testQ2")
let q3 = DispatchQueue(label: "testQ3")

let iterations = 100_000

q1.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

q2.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

q3.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

for _ in 0..<iterations {
     mutateArray()
}

q1.sync {}
q2.sync {}
q3.sync {}
NSLog("done")

It also can occur for instance if I replace the mutateArray() function with the following method:

func enumerateArray() {
    myClass.doSomethingThatMutatesArray()
    for element in myClass.myArray { // data race here: read of size 8 by thread 5
        let _ = element
    }
}

How can I return a “predictable” copy from the MyClass.myArray getter, so that I can later mutate the copy without synchronization like in the mutateArray() function?

···

On Dec 5, 2017, at 9:23 PM, Romain Jacquinot via swift-users <swift-users@swift.org> wrote:

Hi Jens,

In the SynchronizedArray class, I use a lock to perform mutating operations on the array. However, my questions concern the case where an array is exposed as a public property of a thread-safe class, like this:

public class MyClass {

  private var _myArray: Array<Int> =
  private var _lock = NSLock()

  public var myArray: Array<Int> {
    _lock.lock()
    defer { _lock.unlock() }
    return _myArray
  }

  public func doSomethingThatMutatesArray() {
    _lock.lock()
    _myArray.append(1)
    _lock.unlock()
  }
}

Arrays are implemented as structs in Swift. This is a value type, but because Array<T> implements copy-on-write, there is an issue if you do something like this from multiple threads:

let myClass = MyClass()

func callFromMultipleThreads() {
  let arrayCopy = myClass.myArray
  arrayCopy.append(2) // race condition here
}

This is quite problematic, because the user of MyClass shouldn’t have to worry about side-effects when using a copy of the value from myArray.

On Dec 5, 2017, at 8:22 PM, Jens Alfke <jens@mooseyard.com <mailto:jens@mooseyard.com>> wrote:

On Dec 5, 2017, at 6:24 AM, Michel Fortin via swift-users <swift-users@swift.org <mailto:swift-users@swift.org>> wrote:

The array *storage* is copy on write. The array variable (which essentially contains a pointer to the storage) is not copy on write. If you refer to the same array variable from multiple threads, you have a race. Rather, use a different copy of the variable to each thread. Copied variables will share the same storage but will make a copy of the storage when writing to it.

I think you’ve misunderstood. The race condition Romain is referring to is when the two threads both access the shared storage, through separate array variables.

Romain:
From the thread sanitizer warnings, it sounds like the implementation of Array is not using synchronization around its call(s) to isKnownUniquelyReferenced … which would mean the class is not thread-safe.

It’s pretty common for the regular (mutable) collection classes supplied by a framework to be non-thread-safe; for example consider Foundation and Java. The reason is that the overhead of taking a lock every time you access an array element is pretty high. Generally it’s preferable to use larger-granularity locks, i.e. grab an external lock before performing a number of array operations.

—Jens

_______________________________________________
swift-users mailing list
swift-users@swift.org
https://lists.swift.org/mailman/listinfo/swift-users

I'm seeing the race too when compiling with -O (and TSan enabled). I'm 95% sure this should be valid Swift code without any further synchronization, so please file a bug at https://bugs.swift.org. (But I'm only 95% sure because concurrency is hard.)

Looking at the backtraces, it looks like one thread thinks it has exclusive ownership of the buffer while the other thread is still copying things out of it. This is a bug on Swift's side; this code should work. I'm pretty sure this is actually a situation I was just talking about with Michael I from the stdlib team a few days ago, so it's good to have a test case for it.

Jordan

···

On Dec 5, 2017, at 14:23, Romain Jacquinot via swift-users <swift-users@swift.org> wrote:

Also, on the official Swift blog (Value and Reference Types - Swift Blog - Apple Developer <Value and Reference Types - Swift Blog - Apple Developer), it is stated that:

"One of the primary reasons to choose value types over reference types is the ability to more easily reason about your code. If you always get a unique, copied instance, you can trust that no other part of your app is changing the data under the covers. This is especially helpful in multi-threaded environments where a different thread could alter your data out from under you.
[…]
In Swift, Array, String, and Dictionary are all value types. They behave much like a simple int value in C, acting as a unique instance of that data. You don’t need to do anything special — such as making an explicit copy — to prevent other code from modifying that data behind your back. Importantly, you can safely pass copies of values across threads without synchronization. In the spirit of improving safety, this model will help you write more predictable code in Swift.”

Yet, data race can occur here:

public class MyClass {

    private var _myArray: Array<Int> =
    private var _lock = NSLock()

    public var myArray: Array<Int> {
        lock.lock()
        defer {
            lock.unlock()
        }
        let copy = _myArray
        return copy
    }

    public func doSomethingThatMutatesArray() {
        _lock.lock()
        _myArray.append(1) // data race here: write of size 8 by thread 1
        _lock.unlock()
    }
}

let myClass = MyClass()

func mutateArray() {
    myClass.doSomethingThatMutatesArray()
    var arrayCopy = myClass.myArray
    arrayCopy.append(2) // data race here: read of size 8 by thread 10
}

let q1 = DispatchQueue(label: "testQ1")
let q2 = DispatchQueue(label: "testQ2")
let q3 = DispatchQueue(label: "testQ3")

let iterations = 100_000

q1.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

q2.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

q3.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

for _ in 0..<iterations {
     mutateArray()
}

q1.sync {}
q2.sync {}
q3.sync {}
NSLog("done")

It also can occur for instance if I replace the mutateArray() function with the following method:

func enumerateArray() {
    myClass.doSomethingThatMutatesArray()
    for element in myClass.myArray { // data race here: read of size 8 by thread 5
        let _ = element
    }
}

How can I return a “predictable” copy from the MyClass.myArray getter, so that I can later mutate the copy without synchronization like in the mutateArray() function?

On Dec 5, 2017, at 9:23 PM, Romain Jacquinot via swift-users <swift-users@swift.org <mailto:swift-users@swift.org>> wrote:

Hi Jens,

In the SynchronizedArray class, I use a lock to perform mutating operations on the array. However, my questions concern the case where an array is exposed as a public property of a thread-safe class, like this:

public class MyClass {

  private var _myArray: Array<Int> =
  private var _lock = NSLock()

  public var myArray: Array<Int> {
    _lock.lock()
    defer { _lock.unlock() }
    return _myArray
  }

  public func doSomethingThatMutatesArray() {
    _lock.lock()
    _myArray.append(1)
    _lock.unlock()
  }
}

Arrays are implemented as structs in Swift. This is a value type, but because Array<T> implements copy-on-write, there is an issue if you do something like this from multiple threads:

let myClass = MyClass()

func callFromMultipleThreads() {
  let arrayCopy = myClass.myArray
  arrayCopy.append(2) // race condition here
}

This is quite problematic, because the user of MyClass shouldn’t have to worry about side-effects when using a copy of the value from myArray.

On Dec 5, 2017, at 8:22 PM, Jens Alfke <jens@mooseyard.com <mailto:jens@mooseyard.com>> wrote:

On Dec 5, 2017, at 6:24 AM, Michel Fortin via swift-users <swift-users@swift.org <mailto:swift-users@swift.org>> wrote:

The array *storage* is copy on write. The array variable (which essentially contains a pointer to the storage) is not copy on write. If you refer to the same array variable from multiple threads, you have a race. Rather, use a different copy of the variable to each thread. Copied variables will share the same storage but will make a copy of the storage when writing to it.

I think you’ve misunderstood. The race condition Romain is referring to is when the two threads both access the shared storage, through separate array variables.

Romain:
From the thread sanitizer warnings, it sounds like the implementation of Array is not using synchronization around its call(s) to isKnownUniquelyReferenced … which would mean the class is not thread-safe.

It’s pretty common for the regular (mutable) collection classes supplied by a framework to be non-thread-safe; for example consider Foundation and Java. The reason is that the overhead of taking a lock every time you access an array element is pretty high. Generally it’s preferable to use larger-granularity locks, i.e. grab an external lock before performing a number of array operations.

—Jens

_______________________________________________
swift-users mailing list
swift-users@swift.org <mailto:swift-users@swift.org>
https://lists.swift.org/mailman/listinfo/swift-users

_______________________________________________
swift-users mailing list
swift-users@swift.org
https://lists.swift.org/mailman/listinfo/swift-users

Hi Jordan,

For which specific code sample(s) do you think it’s a bug? In the previous code samples, are there cases where you think a data race is to be expected?

Thanks.

···

On Dec 6, 2017, at 12:05 AM, Jordan Rose <jordan_rose@apple.com> wrote:

I'm seeing the race too when compiling with -O (and TSan enabled). I'm 95% sure this should be valid Swift code without any further synchronization, so please file a bug at https://bugs.swift.org <Issues · apple/swift · GitHub. (But I'm only 95% sure because concurrency is hard.)

Looking at the backtraces, it looks like one thread thinks it has exclusive ownership of the buffer while the other thread is still copying things out of it. This is a bug on Swift's side; this code should work. I'm pretty sure this is actually a situation I was just talking about with Michael I from the stdlib team a few days ago, so it's good to have a test case for it.

Jordan

On Dec 5, 2017, at 14:23, Romain Jacquinot via swift-users <swift-users@swift.org <mailto:swift-users@swift.org>> wrote:

Also, on the official Swift blog (Value and Reference Types - Swift Blog - Apple Developer <Value and Reference Types - Swift Blog - Apple Developer), it is stated that:

"One of the primary reasons to choose value types over reference types is the ability to more easily reason about your code. If you always get a unique, copied instance, you can trust that no other part of your app is changing the data under the covers. This is especially helpful in multi-threaded environments where a different thread could alter your data out from under you.
[…]
In Swift, Array, String, and Dictionary are all value types. They behave much like a simple int value in C, acting as a unique instance of that data. You don’t need to do anything special — such as making an explicit copy — to prevent other code from modifying that data behind your back. Importantly, you can safely pass copies of values across threads without synchronization. In the spirit of improving safety, this model will help you write more predictable code in Swift.”

Yet, data race can occur here:

public class MyClass {

    private var _myArray: Array<Int> =
    private var _lock = NSLock()

    public var myArray: Array<Int> {
        lock.lock()
        defer {
            lock.unlock()
        }
        let copy = _myArray
        return copy
    }

    public func doSomethingThatMutatesArray() {
        _lock.lock()
        _myArray.append(1) // data race here: write of size 8 by thread 1
        _lock.unlock()
    }
}

let myClass = MyClass()

func mutateArray() {
    myClass.doSomethingThatMutatesArray()
    var arrayCopy = myClass.myArray
    arrayCopy.append(2) // data race here: read of size 8 by thread 10
}

let q1 = DispatchQueue(label: "testQ1")
let q2 = DispatchQueue(label: "testQ2")
let q3 = DispatchQueue(label: "testQ3")

let iterations = 100_000

q1.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

q2.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

q3.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

for _ in 0..<iterations {
     mutateArray()
}

q1.sync {}
q2.sync {}
q3.sync {}
NSLog("done")

It also can occur for instance if I replace the mutateArray() function with the following method:

func enumerateArray() {
    myClass.doSomethingThatMutatesArray()
    for element in myClass.myArray { // data race here: read of size 8 by thread 5
        let _ = element
    }
}

How can I return a “predictable” copy from the MyClass.myArray getter, so that I can later mutate the copy without synchronization like in the mutateArray() function?

On Dec 5, 2017, at 9:23 PM, Romain Jacquinot via swift-users <swift-users@swift.org <mailto:swift-users@swift.org>> wrote:

Hi Jens,

In the SynchronizedArray class, I use a lock to perform mutating operations on the array. However, my questions concern the case where an array is exposed as a public property of a thread-safe class, like this:

public class MyClass {

  private var _myArray: Array<Int> =
  private var _lock = NSLock()

  public var myArray: Array<Int> {
    _lock.lock()
    defer { _lock.unlock() }
    return _myArray
  }

  public func doSomethingThatMutatesArray() {
    _lock.lock()
    _myArray.append(1)
    _lock.unlock()
  }
}

Arrays are implemented as structs in Swift. This is a value type, but because Array<T> implements copy-on-write, there is an issue if you do something like this from multiple threads:

let myClass = MyClass()

func callFromMultipleThreads() {
  let arrayCopy = myClass.myArray
  arrayCopy.append(2) // race condition here
}

This is quite problematic, because the user of MyClass shouldn’t have to worry about side-effects when using a copy of the value from myArray.

On Dec 5, 2017, at 8:22 PM, Jens Alfke <jens@mooseyard.com <mailto:jens@mooseyard.com>> wrote:

On Dec 5, 2017, at 6:24 AM, Michel Fortin via swift-users <swift-users@swift.org <mailto:swift-users@swift.org>> wrote:

The array *storage* is copy on write. The array variable (which essentially contains a pointer to the storage) is not copy on write. If you refer to the same array variable from multiple threads, you have a race. Rather, use a different copy of the variable to each thread. Copied variables will share the same storage but will make a copy of the storage when writing to it.

I think you’ve misunderstood. The race condition Romain is referring to is when the two threads both access the shared storage, through separate array variables.

Romain:
From the thread sanitizer warnings, it sounds like the implementation of Array is not using synchronization around its call(s) to isKnownUniquelyReferenced … which would mean the class is not thread-safe.

It’s pretty common for the regular (mutable) collection classes supplied by a framework to be non-thread-safe; for example consider Foundation and Java. The reason is that the overhead of taking a lock every time you access an array element is pretty high. Generally it’s preferable to use larger-granularity locks, i.e. grab an external lock before performing a number of array operations.

—Jens

_______________________________________________
swift-users mailing list
swift-users@swift.org <mailto:swift-users@swift.org>
https://lists.swift.org/mailman/listinfo/swift-users

_______________________________________________
swift-users mailing list
swift-users@swift.org <mailto:swift-users@swift.org>
https://lists.swift.org/mailman/listinfo/swift-users

Ah, it's a Swift bug in the SynchronizedArray / MyClass cases, and your bug in the very first example with the global (since access to the global itself is not synchronized). The case I actually tested myself was with your most recent example ("Yet, data race can occur here").

Jordan

···

On Dec 5, 2017, at 15:53, Romain Jacquinot <rjacquinot@me.com> wrote:

Hi Jordan,

For which specific code sample(s) do you think it’s a bug? In the previous code samples, are there cases where you think a data race is to be expected?

Thanks.

On Dec 6, 2017, at 12:05 AM, Jordan Rose <jordan_rose@apple.com <mailto:jordan_rose@apple.com>> wrote:

I'm seeing the race too when compiling with -O (and TSan enabled). I'm 95% sure this should be valid Swift code without any further synchronization, so please file a bug at https://bugs.swift.org <Issues · apple/swift · GitHub. (But I'm only 95% sure because concurrency is hard.)

Looking at the backtraces, it looks like one thread thinks it has exclusive ownership of the buffer while the other thread is still copying things out of it. This is a bug on Swift's side; this code should work. I'm pretty sure this is actually a situation I was just talking about with Michael I from the stdlib team a few days ago, so it's good to have a test case for it.

Jordan

On Dec 5, 2017, at 14:23, Romain Jacquinot via swift-users <swift-users@swift.org <mailto:swift-users@swift.org>> wrote:

Also, on the official Swift blog (Value and Reference Types - Swift Blog - Apple Developer <Value and Reference Types - Swift Blog - Apple Developer), it is stated that:

"One of the primary reasons to choose value types over reference types is the ability to more easily reason about your code. If you always get a unique, copied instance, you can trust that no other part of your app is changing the data under the covers. This is especially helpful in multi-threaded environments where a different thread could alter your data out from under you.
[…]
In Swift, Array, String, and Dictionary are all value types. They behave much like a simple int value in C, acting as a unique instance of that data. You don’t need to do anything special — such as making an explicit copy — to prevent other code from modifying that data behind your back. Importantly, you can safely pass copies of values across threads without synchronization. In the spirit of improving safety, this model will help you write more predictable code in Swift.”

Yet, data race can occur here:

public class MyClass {

    private var _myArray: Array<Int> =
    private var _lock = NSLock()

    public var myArray: Array<Int> {
        lock.lock()
        defer {
            lock.unlock()
        }
        let copy = _myArray
        return copy
    }

    public func doSomethingThatMutatesArray() {
        _lock.lock()
        _myArray.append(1) // data race here: write of size 8 by thread 1
        _lock.unlock()
    }
}

let myClass = MyClass()

func mutateArray() {
    myClass.doSomethingThatMutatesArray()
    var arrayCopy = myClass.myArray
    arrayCopy.append(2) // data race here: read of size 8 by thread 10
}

let q1 = DispatchQueue(label: "testQ1")
let q2 = DispatchQueue(label: "testQ2")
let q3 = DispatchQueue(label: "testQ3")

let iterations = 100_000

q1.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

q2.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

q3.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

for _ in 0..<iterations {
     mutateArray()
}

q1.sync {}
q2.sync {}
q3.sync {}
NSLog("done")

It also can occur for instance if I replace the mutateArray() function with the following method:

func enumerateArray() {
    myClass.doSomethingThatMutatesArray()
    for element in myClass.myArray { // data race here: read of size 8 by thread 5
        let _ = element
    }
}

How can I return a “predictable” copy from the MyClass.myArray getter, so that I can later mutate the copy without synchronization like in the mutateArray() function?

On Dec 5, 2017, at 9:23 PM, Romain Jacquinot via swift-users <swift-users@swift.org <mailto:swift-users@swift.org>> wrote:

Hi Jens,

In the SynchronizedArray class, I use a lock to perform mutating operations on the array. However, my questions concern the case where an array is exposed as a public property of a thread-safe class, like this:

public class MyClass {

  private var _myArray: Array<Int> =
  private var _lock = NSLock()

  public var myArray: Array<Int> {
    _lock.lock()
    defer { _lock.unlock() }
    return _myArray
  }

  public func doSomethingThatMutatesArray() {
    _lock.lock()
    _myArray.append(1)
    _lock.unlock()
  }
}

Arrays are implemented as structs in Swift. This is a value type, but because Array<T> implements copy-on-write, there is an issue if you do something like this from multiple threads:

let myClass = MyClass()

func callFromMultipleThreads() {
  let arrayCopy = myClass.myArray
  arrayCopy.append(2) // race condition here
}

This is quite problematic, because the user of MyClass shouldn’t have to worry about side-effects when using a copy of the value from myArray.

On Dec 5, 2017, at 8:22 PM, Jens Alfke <jens@mooseyard.com <mailto:jens@mooseyard.com>> wrote:

On Dec 5, 2017, at 6:24 AM, Michel Fortin via swift-users <swift-users@swift.org <mailto:swift-users@swift.org>> wrote:

The array *storage* is copy on write. The array variable (which essentially contains a pointer to the storage) is not copy on write. If you refer to the same array variable from multiple threads, you have a race. Rather, use a different copy of the variable to each thread. Copied variables will share the same storage but will make a copy of the storage when writing to it.

I think you’ve misunderstood. The race condition Romain is referring to is when the two threads both access the shared storage, through separate array variables.

Romain:
From the thread sanitizer warnings, it sounds like the implementation of Array is not using synchronization around its call(s) to isKnownUniquelyReferenced … which would mean the class is not thread-safe.

It’s pretty common for the regular (mutable) collection classes supplied by a framework to be non-thread-safe; for example consider Foundation and Java. The reason is that the overhead of taking a lock every time you access an array element is pretty high. Generally it’s preferable to use larger-granularity locks, i.e. grab an external lock before performing a number of array operations.

—Jens

_______________________________________________
swift-users mailing list
swift-users@swift.org <mailto:swift-users@swift.org>
https://lists.swift.org/mailman/listinfo/swift-users

_______________________________________________
swift-users mailing list
swift-users@swift.org <mailto:swift-users@swift.org>
https://lists.swift.org/mailman/listinfo/swift-users

Ah, it's a Swift bug in the SynchronizedArray / MyClass cases

If I remove the intermediate copy in the myArray getter, should it — without Swift bug — also return a unique copy?

public var myArray: Array<Int> {
    lock.lock()
    defer { lock.unlock() }
    _myArray
}

By the way, here is a simpler sample code where TSan also detects a race condition:

// Code sample A

var array: [Int] = [1, 2, 3]

let iterations = 1_000_000

var copy1 = array
q1.async {
    for i in 0..<iterations {
        copy1.append(i)
    }
}

var copy2 = array
q2.async {
    for i in 0..<iterations {
        copy2.append(i)
    }
}

var copy3 = array
q3.async {
    for i in 0..<iterations {
        copy3.append(i)
    }
}

for i in 0..<iterations {
    array.append(i)
}

q1.sync {}
q2.sync {}
q3.sync {}
NSLog("done")

I can avoid the race condition to occur by using a capture list:

// Code sample B

var array: [Int] = [1, 2, 3]

let iterations = 1_000_000

q1.async { [array] in
    for i in 0..<iterations {
        var copy = array
        copy.append(i)
    }
}

q2.async { [array] in
    for i in 0..<iterations {
        var copy = array
        copy.append(i)
    }
}

q3.async { [array] in
    for i in 0..<iterations {
        var copy = array
        copy.append(i)
    }
}

for i in 0..<iterations {
    array.append(i)
}

q1.sync {}
q2.sync {}
q3.sync {}
NSLog("done")

or by using a constant copy, which, if I’m not wrong, should behave the same way than the capture list:

// Code sample C

var array: [Int] = [1, 2, 3]

let iterations = 1_000_000

let copy1 = array
q1.async {
    var copy1 = copy1
    for i in 0..<iterations {
        copy1.append(i)
    }
}

let copy2 = array
q2.async {
    var copy2 = copy2
    for i in 0..<iterations {
        copy2.append(i)
    }
}

let copy3 = array
q3.async {
    var copy3 = copy3
    for i in 0..<iterations {
        copy3.append(i)
    }
}

for _ in 0..<iterations {
    array.append(array.count)
}

q1.sync {}
q2.sync {}
q3.sync {}
NSLog("done")

Do you also think the data race in "sample code A" above is a Swift bug?

···

On Dec 6, 2017, at 1:04 AM, Jordan Rose <jordan_rose@apple.com> wrote:

Ah, it's a Swift bug in the SynchronizedArray / MyClass cases, and your bug in the very first example with the global (since access to the global itself is not synchronized). The case I actually tested myself was with your most recent example ("Yet, data race can occur here").

Jordan

On Dec 5, 2017, at 15:53, Romain Jacquinot <rjacquinot@me.com <mailto:rjacquinot@me.com>> wrote:

Hi Jordan,

For which specific code sample(s) do you think it’s a bug? In the previous code samples, are there cases where you think a data race is to be expected?

Thanks.

On Dec 6, 2017, at 12:05 AM, Jordan Rose <jordan_rose@apple.com <mailto:jordan_rose@apple.com>> wrote:

I'm seeing the race too when compiling with -O (and TSan enabled). I'm 95% sure this should be valid Swift code without any further synchronization, so please file a bug at https://bugs.swift.org <Issues · apple/swift · GitHub. (But I'm only 95% sure because concurrency is hard.)

Looking at the backtraces, it looks like one thread thinks it has exclusive ownership of the buffer while the other thread is still copying things out of it. This is a bug on Swift's side; this code should work. I'm pretty sure this is actually a situation I was just talking about with Michael I from the stdlib team a few days ago, so it's good to have a test case for it.

Jordan

On Dec 5, 2017, at 14:23, Romain Jacquinot via swift-users <swift-users@swift.org <mailto:swift-users@swift.org>> wrote:

Also, on the official Swift blog (Value and Reference Types - Swift Blog - Apple Developer <Value and Reference Types - Swift Blog - Apple Developer), it is stated that:

"One of the primary reasons to choose value types over reference types is the ability to more easily reason about your code. If you always get a unique, copied instance, you can trust that no other part of your app is changing the data under the covers. This is especially helpful in multi-threaded environments where a different thread could alter your data out from under you.
[…]
In Swift, Array, String, and Dictionary are all value types. They behave much like a simple int value in C, acting as a unique instance of that data. You don’t need to do anything special — such as making an explicit copy — to prevent other code from modifying that data behind your back. Importantly, you can safely pass copies of values across threads without synchronization. In the spirit of improving safety, this model will help you write more predictable code in Swift.”

Yet, data race can occur here:

public class MyClass {

    private var _myArray: Array<Int> =
    private var _lock = NSLock()

    public var myArray: Array<Int> {
        lock.lock()
        defer {
            lock.unlock()
        }
        let copy = _myArray
        return copy
    }

    public func doSomethingThatMutatesArray() {
        _lock.lock()
        _myArray.append(1) // data race here: write of size 8 by thread 1
        _lock.unlock()
    }
}

let myClass = MyClass()

func mutateArray() {
    myClass.doSomethingThatMutatesArray()
    var arrayCopy = myClass.myArray
    arrayCopy.append(2) // data race here: read of size 8 by thread 10
}

let q1 = DispatchQueue(label: "testQ1")
let q2 = DispatchQueue(label: "testQ2")
let q3 = DispatchQueue(label: "testQ3")

let iterations = 100_000

q1.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

q2.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

q3.async {
    for _ in 0..<iterations {
        mutateArray()
    }
}

for _ in 0..<iterations {
     mutateArray()
}

q1.sync {}
q2.sync {}
q3.sync {}
NSLog("done")

It also can occur for instance if I replace the mutateArray() function with the following method:

func enumerateArray() {
    myClass.doSomethingThatMutatesArray()
    for element in myClass.myArray { // data race here: read of size 8 by thread 5
        let _ = element
    }
}

How can I return a “predictable” copy from the MyClass.myArray getter, so that I can later mutate the copy without synchronization like in the mutateArray() function?

On Dec 5, 2017, at 9:23 PM, Romain Jacquinot via swift-users <swift-users@swift.org <mailto:swift-users@swift.org>> wrote:

Hi Jens,

In the SynchronizedArray class, I use a lock to perform mutating operations on the array. However, my questions concern the case where an array is exposed as a public property of a thread-safe class, like this:

public class MyClass {

  private var _myArray: Array<Int> =
  private var _lock = NSLock()

  public var myArray: Array<Int> {
    _lock.lock()
    defer { _lock.unlock() }
    return _myArray
  }

  public func doSomethingThatMutatesArray() {
    _lock.lock()
    _myArray.append(1)
    _lock.unlock()
  }
}

Arrays are implemented as structs in Swift. This is a value type, but because Array<T> implements copy-on-write, there is an issue if you do something like this from multiple threads:

let myClass = MyClass()

func callFromMultipleThreads() {
  let arrayCopy = myClass.myArray
  arrayCopy.append(2) // race condition here
}

This is quite problematic, because the user of MyClass shouldn’t have to worry about side-effects when using a copy of the value from myArray.

On Dec 5, 2017, at 8:22 PM, Jens Alfke <jens@mooseyard.com <mailto:jens@mooseyard.com>> wrote:

On Dec 5, 2017, at 6:24 AM, Michel Fortin via swift-users <swift-users@swift.org <mailto:swift-users@swift.org>> wrote:

The array *storage* is copy on write. The array variable (which essentially contains a pointer to the storage) is not copy on write. If you refer to the same array variable from multiple threads, you have a race. Rather, use a different copy of the variable to each thread. Copied variables will share the same storage but will make a copy of the storage when writing to it.

I think you’ve misunderstood. The race condition Romain is referring to is when the two threads both access the shared storage, through separate array variables.

Romain:
From the thread sanitizer warnings, it sounds like the implementation of Array is not using synchronization around its call(s) to isKnownUniquelyReferenced … which would mean the class is not thread-safe.

It’s pretty common for the regular (mutable) collection classes supplied by a framework to be non-thread-safe; for example consider Foundation and Java. The reason is that the overhead of taking a lock every time you access an array element is pretty high. Generally it’s preferable to use larger-granularity locks, i.e. grab an external lock before performing a number of array operations.

—Jens

_______________________________________________
swift-users mailing list
swift-users@swift.org <mailto:swift-users@swift.org>
https://lists.swift.org/mailman/listinfo/swift-users

_______________________________________________
swift-users mailing list
swift-users@swift.org <mailto:swift-users@swift.org>
https://lists.swift.org/mailman/listinfo/swift-users

Ah, it's a Swift bug in the SynchronizedArray / MyClass cases

If I remove the intermediate copy in the myArray getter, should it — without Swift bug — also return a unique copy?

public var myArray: Array<Int> {
    lock.lock()
    defer { lock.unlock() }
    _myArray
}

Yes. As soon as you treat it as a value (including returning it), it should be independent. The assignment to a local variable doesn't really add anything interesting, and the optimizer will throw it out.

By the way, here is a simpler sample code where TSan also detects a race condition:

// Code sample A

var array: [Int] = [1, 2, 3]

let iterations = 1_000_000

var copy1 = array
q1.async {
    for i in 0..<iterations {
        copy1.append(i)
    }
}

var copy2 = array
q2.async {
    for i in 0..<iterations {
        copy2.append(i)
    }
}

var copy3 = array
q3.async {
    for i in 0..<iterations {
        copy3.append(i)
    }
}

for i in 0..<iterations {
    array.append(i)
}

q1.sync {}
q2.sync {}
q3.sync {}
NSLog("done")

Yes, this would be failing for the same reason (didn't test it). The mutations are operating on different Array variables, but they start off sharing the same storage, and that means we can get into the same situation where one queue thinks it can modify the storage in place while another queue is still copying the initial contents.

I can avoid the race condition to occur by using a capture list:

// Code sample B

var array: [Int] = [1, 2, 3]

let iterations = 1_000_000

q1.async { [array] in
    for i in 0..<iterations {
        var copy = array
        copy.append(i)
    }
}

q2.async { [array] in
    for i in 0..<iterations {
        var copy = array
        copy.append(i)
    }
}

q3.async { [array] in
    for i in 0..<iterations {
        var copy = array
        copy.append(i)
    }
}

for i in 0..<iterations {
    array.append(i)
}

q1.sync {}
q2.sync {}
q3.sync {}
NSLog("done")

or by using a constant copy, which, if I’m not wrong, should behave the same way than the capture list:

// Code sample C

var array: [Int] = [1, 2, 3]

let iterations = 1_000_000

let copy1 = array
q1.async {
    var copy1 = copy1
    for i in 0..<iterations {
        copy1.append(i)
    }
}

let copy2 = array
q2.async {
    var copy2 = copy2
    for i in 0..<iterations {
        copy2.append(i)
    }
}

let copy3 = array
q3.async {
    var copy3 = copy3
    for i in 0..<iterations {
        copy3.append(i)
    }
}

for _ in 0..<iterations {
    array.append(array.count)
}

q1.sync {}
q2.sync {}
q3.sync {}
NSLog("done")

Yep, these two are equivalent, at least as far as this issue goes. This dodges the issue because the capture or the 'let' keeps an extra reference to the original array at all times, meaning that none of the mutations are ever done in-place.

Thank you again for catching this! (It's also possible we've already fixed it on master, but it's still worth having a bug report so we can add a regression test.)

Jordan

···

On Dec 5, 2017, at 16:24, Romain Jacquinot <rjacquinot@me.com> wrote:

Sorry, forgot the “return" keyword in my example:

public var myArray: Array<Int> {
    lock.lock()
    defer { lock.unlock() }
    return _myArray
}

···

On Dec 6, 2017, at 1:24 AM, Romain Jacquinot via swift-users <swift-users@swift.org> wrote:

public var myArray: Array<Int> {
    lock.lock()
    defer { lock.unlock() }
    _myArray
}

Thank you Jordan.

I’ve filed a bug at bugs.swift.org <Issues · apple/swift · GitHub https://bugs.swift.org/browse/SR-6543\.

···

On Dec 6, 2017, at 2:47 AM, Jordan Rose <jordan_rose@apple.com> wrote:

On Dec 5, 2017, at 16:24, Romain Jacquinot <rjacquinot@me.com <mailto:rjacquinot@me.com>> wrote:

Ah, it's a Swift bug in the SynchronizedArray / MyClass cases

If I remove the intermediate copy in the myArray getter, should it — without Swift bug — also return a unique copy?

public var myArray: Array<Int> {
    lock.lock()
    defer { lock.unlock() }
    _myArray
}

Yes. As soon as you treat it as a value (including returning it), it should be independent. The assignment to a local variable doesn't really add anything interesting, and the optimizer will throw it out.

By the way, here is a simpler sample code where TSan also detects a race condition:

// Code sample A

var array: [Int] = [1, 2, 3]

let iterations = 1_000_000

var copy1 = array
q1.async {
    for i in 0..<iterations {
        copy1.append(i)
    }
}

var copy2 = array
q2.async {
    for i in 0..<iterations {
        copy2.append(i)
    }
}

var copy3 = array
q3.async {
    for i in 0..<iterations {
        copy3.append(i)
    }
}

for i in 0..<iterations {
    array.append(i)
}

q1.sync {}
q2.sync {}
q3.sync {}
NSLog("done")

Yes, this would be failing for the same reason (didn't test it). The mutations are operating on different Array variables, but they start off sharing the same storage, and that means we can get into the same situation where one queue thinks it can modify the storage in place while another queue is still copying the initial contents.

I can avoid the race condition to occur by using a capture list:

// Code sample B

var array: [Int] = [1, 2, 3]

let iterations = 1_000_000

q1.async { [array] in
    for i in 0..<iterations {
        var copy = array
        copy.append(i)
    }
}

q2.async { [array] in
    for i in 0..<iterations {
        var copy = array
        copy.append(i)
    }
}

q3.async { [array] in
    for i in 0..<iterations {
        var copy = array
        copy.append(i)
    }
}

for i in 0..<iterations {
    array.append(i)
}

q1.sync {}
q2.sync {}
q3.sync {}
NSLog("done")

or by using a constant copy, which, if I’m not wrong, should behave the same way than the capture list:

// Code sample C

var array: [Int] = [1, 2, 3]

let iterations = 1_000_000

let copy1 = array
q1.async {
    var copy1 = copy1
    for i in 0..<iterations {
        copy1.append(i)
    }
}

let copy2 = array
q2.async {
    var copy2 = copy2
    for i in 0..<iterations {
        copy2.append(i)
    }
}

let copy3 = array
q3.async {
    var copy3 = copy3
    for i in 0..<iterations {
        copy3.append(i)
    }
}

for _ in 0..<iterations {
    array.append(array.count)
}

q1.sync {}
q2.sync {}
q3.sync {}
NSLog("done")

Yep, these two are equivalent, at least as far as this issue goes. This dodges the issue because the capture or the 'let' keeps an extra reference to the original array at all times, meaning that none of the mutations are ever done in-place.

Thank you again for catching this! (It's also possible we've already fixed it on master, but it's still worth having a bug report so we can add a regression test.)

Jordan