[Proposal] Threadsafe lazy vars


(Michael Peternell) #1

Hi all,

lazy vars are not threadsafe in Swift 2. I have seen code that uses lazy initialization in an unsafe way, introducing race conditions, so often that I would be rich by now if I got a penny each time. Many people use patterns like `{ if(_var == nil) { _var = [self _calculateVar]; } return _var; }`, or they just use dispatch_once but forget that they are in an instance method, and that it will all break as soon as there is more than one instance of that class.
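The dispatch_once mistake has a direct Swift analogue. In the hypothetical `Thumbnailer` class below, the "already initialized" flag is a `static` property, so it is shared by every instance: the first instance initializes its storage, and every later instance silently keeps `nil`. This is only an illustration of the bug pattern, assuming Foundation's `NSLock`; the fix is to keep the once-state per instance.

```swift
import Foundation

// Hypothetical class illustrating the bug: the "once" state is static, so it is
// shared by every instance, just like a `static dispatch_once_t` token that is
// used inside an instance method.
final class Thumbnailer {
    private static let onceLock = NSLock()
    private static var didInitialize = false   // BUG: one flag for the whole class

    let name: String
    private var _thumbnail: String?            // per-instance storage

    init(name: String) { self.name = name }

    var thumbnail: String? {
        Thumbnailer.onceLock.lock()
        defer { Thumbnailer.onceLock.unlock() }
        if !Thumbnailer.didInitialize {
            Thumbnailer.didInitialize = true
            _thumbnail = "thumb(\(name))"      // only the first instance ever gets here
        }
        return _thumbnail
    }
}

let first = Thumbnailer(name: "a")
let second = Thumbnailer(name: "b")
print(first.thumbnail ?? "nil")   // prints thumb(a)
print(second.thumbnail ?? "nil")  // prints nil: its storage is never initialized
```

The code is thread-safe in the narrow sense (the flag is protected by a lock), yet still wrong, because the unit of "once" is the class rather than the object.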

I propose to make lazy vars atomic. Optionally, the old lazy var behavior could be renamed to something like lazy_nonatomic.

I want to list some pros and cons for making lazy vars threadsafe:

Pros:
- This proposal will not change the behavior of programs which are free from data races. I could argue that the change is therefore backwards-compatible.
- I would say that programs which require lazy vars to be nonatomic in order to function correctly are really bad style; threadsafe lazy vars behave much more deterministically. Many programs which use lazy vars incorrectly could suddenly become safe if this proposal is implemented.
- The overhead would be minimal. For example, suppose we have a lazy var of type `NSImage`. We could represent that variable as a simple pointer that is initialized to NULL. The access could look something like this (this is just an example; there may be even more efficient solutions):

    {
        // we need to make sure that reads on _var are not cached:
        memory_read_barrier(&_var);
        // ^^and I'm not 100% sure that we really need that memory barrier.
        // (at least it's not needed for static vars, as proven by the implementation of dispatch_once())

        if (_var == nil) {
            @synchronized(&_var) {
                // ^^we synchronize on &_var, and not on _var.
                // This is semantically invalid in Objective-C, but the Objective-C runtime supports it.
                // The point I want to make is that in many cases we don't need extra storage
                // for the synchronization.
                if (_var != nil) {
                    return _var;
                }
                // ... some code that initializes _var
            }
            // @synchronized() already employs memory barriers, so no additional barriers are needed.
            // Maybe we should use a non-recursive lock, though.
        }
        return _var;
    }
- Currently, if you need thread safety, you cannot use lazy. You can of course wrap a lock around a nonatomic lazy var, but that would be much less efficient than a native implementation.
- I guess no one will really complain if lazy vars suddenly become threadsafe. I also cannot see how it would break any code (except for contrived examples).
- In some cases, the nonatomic behavior can be used as an optimization, if it is semantically equivalent. For example, a lazy var that lives in automatic storage (i.e. not an ivar or static var, but just a local var) and that is *not* captured in a closure expression can be safely initialized in a non-threadsafe way, because the variable cannot be accessed from more than one thread concurrently anyway.

Cons:
- This would be the first concurrency primitive built into the language (at least as far as I know).
- It may suggest to users of the language that other primitives (like vars) are threadsafe too, which is obviously not the case.
- There is at least *some* runtime overhead involved; it's not zero-cost. On the other hand, lazy initialization should only be used when the cost of initialization is much higher than the cost of creating and maintaining a thunk, and in that case I think the performance characteristics are pretty good.
- It may be out of scope for Swift 3 :frowning:

Proposed solution:

    public lazy var foo: Type = fn()

is semantically equivalent to

    private var _lazy_storage_foo: Type?
    private var _lazy_lock_foo = Lock()
    public var foo: Type {
        get {
            // every access takes the lock; fn() runs at most once
            return _lazy_lock_foo.withLock {
                if _lazy_storage_foo == nil {
                    _lazy_storage_foo = fn()
                }
                return _lazy_storage_foo!
            }
        }
    }

except that the built-in solution would be much more efficient, and that the two extra private vars are not exposed when you use the lazy keyword.
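To make the desugaring concrete, here is a minimal runnable sketch of it for a single property, under stated assumptions: `Image` and `makeImage()` are hypothetical stand-ins for `Type` and `fn()`, Foundation's `NSLock` stands in for the abstract `Lock`, and `initializerCalls` exists only to show that the initializer runs exactly once even when many threads read `foo` concurrently.

```swift
import Foundation
import Dispatch

// Hypothetical stand-ins: `Image` plays the role of `Type`, `makeImage()` the role of `fn()`.
struct Image {
    let pixels: [UInt8]
}

final class Widget {
    // Counts how often the lazy initializer actually runs.
    private(set) var initializerCalls = 0

    // The two hidden stored properties from the desugaring; NSLock stands in for `Lock`.
    private var _lazy_storage_foo: Image?
    private let _lazy_lock_foo = NSLock()

    private func makeImage() -> Image {
        initializerCalls += 1   // only ever called while _lazy_lock_foo is held
        return Image(pixels: [UInt8](repeating: 0, count: 16))
    }

    var foo: Image {
        _lazy_lock_foo.lock()
        defer { _lazy_lock_foo.unlock() }
        if _lazy_storage_foo == nil {
            _lazy_storage_foo = makeImage()
        }
        return _lazy_storage_foo!
    }
}

// Read the property from many threads at once.
let widget = Widget()
DispatchQueue.concurrentPerform(iterations: 100) { _ in
    _ = widget.foo
}
print(widget.initializerCalls)  // prints 1
```

Like the desugaring, this takes the lock on every access, even after initialization; a built-in implementation would presumably use a cheaper fast path for the already-initialized case.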

All in all, I think that threadsafe lazy vars would be a nice feature for the language. I welcome feedback and am interested in a discussion.

Regards,
Michael


(Andrew Trick) #2

These are good points. I think we need both nonatomic and atomic lazy variables. The syntax and scaffolding will likely fall out of Property Behaviors:
https://github.com/apple/swift-evolution/blob/master/proposals/0030-property-behavior-decls.md

All that’s left would be optimizing the implementation, which would be premature to discuss.
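Property behaviors as proposed in SE-0030 did not ship in that form; Swift 5.1 later gained property wrappers (SE-0258), which can express much of this scaffolding in library code. The following is a minimal sketch of an atomic lazy property built that way, assuming Foundation's `NSLock`. `AtomicLazy` and `Gallery` are hypothetical names, this is not how the compiler implements `lazy`, and the `@autoclosure` initializer follows the pattern of the `Lazy` example in SE-0258.

```swift
import Foundation
import Dispatch

// Hypothetical wrapper name; not a standard library type.
@propertyWrapper
final class AtomicLazy<Value> {
    private let lock = NSLock()
    private var storage: Value?
    private var initializer: (() -> Value)?

    // `@autoclosure` defers evaluating the declared initial value until first access.
    init(wrappedValue: @autoclosure @escaping () -> Value) {
        initializer = wrappedValue
    }

    var wrappedValue: Value {
        lock.lock()
        defer { lock.unlock() }
        if let value = storage {
            return value
        }
        // Runs at most once per wrapper instance. NSLock is not recursive, so an
        // initializer that reads this same property would deadlock.
        let value = initializer!()
        storage = value
        initializer = nil   // drop captured state once it is no longer needed
        return value
    }
}

// Hypothetical usage: count how often the expensive initializer actually runs.
final class Gallery {
    static var builds = 0   // demo counter; not itself safe across concurrently used Gallery instances

    static func buildThumbnails() -> [String] {
        builds += 1
        return ["a.png", "b.png"]
    }

    @AtomicLazy var thumbnails: [String] = Gallery.buildThumbnails()
}

let gallery = Gallery()
DispatchQueue.concurrentPerform(iterations: 50) { _ in
    _ = gallery.thumbnails
}
print(Gallery.builds)  // prints 1
```

Because the wrapper is a class stored per property, each `Gallery` instance gets its own lock and its own once-state, which avoids the shared-token problem of calling dispatch_once from an instance method.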

-Andy


On Apr 6, 2016, at 2:07 PM, Michael Peternell via swift-evolution <swift-evolution@swift.org> wrote:
