Advanced type system for preventing data races.


(Nickolas Pohilets) #1

Hello, swift-evolution

This is a big feature, and I don't expect it to get into Swift 3.0, even
though it may require ABI changes. But still, I find it important enough to
put it on the table for discussion.

Code examples below are quite sketchy - just some Swift-like pseudocode to
express the idea, rather then an actual proposal of changes to the language
syntax.

Basic idea is to associate each variable with a lock that guards it though
a mechanism of lock names. Variable can be read only if associated lock is
held for reading or writing. Variable can be written only if associated
lock is held for writing. Locks are mutexes, read-write locks and also
special locks that cover cases of thread-contained variables and immutable
data.

class Foo<Lk : LockName> {
    var bar under Lk : Int = 0
}

func add<DstLk : LockName, SrcLk : LockName>(dst : Foo<DstLk>, src :
Foo<SrcLk>, srcMutex : ReadWriteLock<SrcLk>) write DstLk {
    // Held locks: [(write DstLk)]
    dst.bar += src.bar // Error: cannot read src.bar because SrcLk is not
held
    sync read srcMutex {
        // Held locks: [(write DstLk), (read SrcLk)]
        dst.bar += src.bar // Ok
    }
    // Held locks: [(write DstLk)]
}

Lock names are generic parameters of classes and functions and kinds are
used to distinguish between type parameters and lock name parameters.
Ideally lock names should not require witness tables or any other runtime
otherhead, but there are few open questions related to dynamic downcasting.

Function signature includes lock requirements - in the example above,
function add() can be called only if lock DstLk is known to be held for
writing at the call site.
For every statement inside the function, compiler tracks a set of locks
currently being held. Initially this set is initialized with lock
requirements, and sync statement adds one more lock to this set for the
scope of it's body.

Constructing new locks returns a dependent tuple:

func newMutex() -> (L : LockName, Mutex<L>)

Unpacking that tuple allows to create objects protected by newly created
mutex:

let (Lk, m) = newMutex()
let f = Foo<Lk>()

Lock names can be assigned only to constants. Each call of the newMutex()
returns distinct lock name. If two generic types contain lock names and
compiler cannot prove that these lock names are equal, then such types are
considered to be distinct:

let (Lk1, m1) = newMutex()
let f1 = Foo<Lk1>()
let (Lk2, m2) = newMutex()
let f2 = Foo<Lk2>()
let Lk3 : LockName = Lk1
var f3 : Foo<Lk3> = f1 // Ok, Lk3 == Lk1 => Foo<Lk3> == Foo<Lk1>
f3 = f2 // Error, Lk3 != Lk2 => Foo<Lk3> != Foo<Lk2>

Fields of structs and enums are always protected by the same lock that
protects the entire struct/enum and that lock is not a part of a variable
type, but rather an attribute of a variable itself. Classes may use
different locks to protect their fields.

There is a special kind of lock names - ThreadName, a subkind of LockName.
Each function has at least one lock name of kind ThreadName for which it
requires write access. This is a special lock name used for
thread-contained objects. There is no mutex and any other real lock
associated with this name, but write access is granted at thread entry
point. Function main() has signature:

func main<ThreadLk : LockName>() write ThreadLk

And API for creating a new thread takes a first-class generic function of
type:

typealias ThreadFunc = <ThreadLk : LockName>() write ThreadLk -> Void

If there is no explicit lock name of this kind for which write access is
requested, that implicit generic argument called ThisThread is added to the
function:

func main() ==> func main<ThisThread : ThreadName>() write ThreadName

If there is one, then ThisThread is defined as an alias to that argument.

func main<ThreadLk : LockName>() write ThreadLk { ... }
==>
func main<ThreadLk : LockName>() write ThreadLk {
let ThisThread = ThreadLk
    ...
}

If there are multiple such locks, then all them are considered equal.

Inside the closure, there is also a ThisThread lock name, which shadows the
definition of the ThisThread from the parent scope. These ThisThread's are
not equal.

A small demo of the described above:

class RunLoop<Lk : ThreadLock> {
    typealias BlockType = () write Lk -> Void
    let MutexLock : LockName
    let mutex : Mutex<MutexLock>
    let cond : WaitCondition<MutexLock>
    var queue under MutexLock : [BlockType]
    var stopped under MutexLock : Bool = false

    init() {
        // Lock is not required to be held to initialize a variable
        (MutexLock, mutex) = newMutex()
        cond = WaitCondition<MutexLock>()
    }

    func run() write Lk {
        while let block = takeBlock() {
            block();
        }
    }

    func stop() {
        sync write mutex {
            stopped = true
        }
    }

    func enqueue(block : BlockType) { // No lock requirements - can be
called from any thread
        sync write mutex {
            queue += [block]
            cond.signal()
        }
    }

    func takeBlock() -> BlockType? {
        sync write mutex {
            while (queue.isEmpty) {
                if stopped {
                    return nil
                }
                cond.wait()
            }
            let retVal = queue[0]
            queue.removeAtIndex(0)
            return retVal
        }
    }
}

/// Thread-local variable
thread var _currentRunLoop : RunLoop<ThisThread>? = nil

func currentRunLoop() -> RunLoop<ThisThread> {
    if let loop = _currentRunLoop {
        return loop
    } else {
        let loop = RunLoop<ThisThread>()
        _currentRunLoop = loop
        return loop
    }
}

struct Rect {
    var x : Int
    var y : Int
    var width : Int
    var height : Int
}

class View<Lk : LockName> {
    var frame under Lk : Rect
    var subviews under Lk : [View<Lk>]
}

func foo(subviews : [View<ThisThread>]) {}

func main() -> Void {
    let loop = currentRunLoop()
    let view = View<ThisThread>()
    let Lk : ThreadName = ThisThread
    spawn({ // Starts new thread
        var sum = 0
        for x in 1..<1000 {
            sum += x
        }
        loop.enqueue({
            () write Lk in
            view.frame.x = sum
            foo(view.subviews)
        })
    })
    loop.run()
}

Immutable variables can be protected by the special lock name constant -
Immutable. Read lock for this lock name is always implicitly assumed to be
held. But write lock cannot be acquired, because it is not possible to
create a mutex or read-write lock associated with this name.

Atomic types are classes whose methods don't have any lock requirements,
and thus can be called from any thread. They are reference types, because
identity is important for atomic variables.

Described type system is based on this paper -
https://homes.cs.washington.edu/~djg/papers/cycthreads.pdf, with some
modifications:
* Lock name for the current thread is not a constant, but an argument to
every function. So, sharabilities are not needed. Also sharabilities
prevent passing references to thread-local objects to other threads without
dereferencing them - this is a very common thing for callbacks accessing UI
objects.
* Read and Write locks are distinguished, and Immutable lock is added

There are some known limitations:
1. Lock-less migration between threads is not supported

class Foo {
    var x : Int = 0
    fun inc() { x += 1 }
}

func main() {
    let f = Foo()
    f.inc()
    let t = spawn({
       f.inc()
    })
    t.join()
    f.inc()
}

Either lock needs to be added, or deep copy should be made.

2. Mutation after initialization is not supported:

class Foo<L : LockName> {
    var bar under L : Int

    init(x : Int) {
        self.bar = x
        self.bar += 1 <— ERROR
    }
}

Either calculations need to be done in local variables before
initialization, or lock requirement needs to be added.

3. And the most disturbing one - is subclass has lock names not mentioned
into base class or protocol, then dynamic downcast is a problem:

class Foo<A : LockName> {
}

class Bar<A : LockName, B : LockName> : Foo<A> {
    func update() write B { … }
}

func process<Lk : LockName>(foo : Foo<Lk>) {
    if let bar = foo as? Bar<Lk, ThisThread> {
        bar.update()
    }
}

This can be implemented by storing lock names in runtime and actually
checking them during the cast, but this seems to me to be too expensive.

Alternatives are:
- Forbidding such downcasts
- using existential subcasting:

class Foo<A : LockName> {
}

class Bar<A : LockName, B : LockName> : Foo<A> {
    func update() write A { … }
}

func process<Lk : LockName>(foo : Foo<Lk>) {
    if let (bar, B_Lk) = foo as? Bar<Lk, _> {
        bar.update() <— ERROR
    }
}

But usefulness of obtained lock name is doubtful - it can be used only if
associated lock is stored inside the object itself.