Actor structs?

I think I see what you're going for, Dave, and it's cute. I definitely see why you think of it as a sort of actor with value semantics. That's leading to some avoidable confusion, though, so it might be better to adopt a different name, at least in the short term.

So, each value-actor stores an underlying value and a queue of value-specific jobs that manipulate the underlying value. When you copy the value, you don't copy the queue or the underlying value; instead, the new value gets a "promise", basically a job added to the old value's queue to copy its underlying value into the new value. Operations enqueued on the new value will be blocked until that promise is fulfilled, essentially as if the promise were a running item in the new value's queue. (Obviously some of this work can happen synchronously in the common case where there's nothing running or pending in the old value's queue.)

I think there are fundamentally two problems with this.

The first is that it's unclear how you would ever get multiple operations enqueued on the actor. If this is a normal value with normal exclusivity, how do we manage to make two asynchronous mutating calls on it at once? An asynchronous call has to happen from an async function, which is going to wait for that call to complete before it can make another call. You can of course have multiple functions making mutating calls on that value concurrently, using some sort of shared storage, but that's an exclusivity violation. And if you don't have multiple enqueued operations, this isn't doing much.

What you're describing is something much more like a "fire and forget" API, where all the operations on the value-actor are externally synchronous; basically it's a value type with some major underlying trickery about where the computation happens. I can see why that would be an interesting feature, but it's very different from what we call actors in Swift, and I'm not sure how generally useful it is.

The second is that I think most uses will need to be able to get a synchronous snapshot. For example, you say:

Making a copy of Document is as simple as initializing a variable or appending it to an Array.

But this sort of copy is an asynchronous snapshot; we can't actually read from it synchronously if there's a pending/running operation on the value. The UI presumably wants to be able to synchronously walk the document to decide what to render. So I actually need a DocumentSnapshot type that reflects the underlying value of my Document value-semantics actor; and once I have that, it's unclear why I can't just keep Document as a value type, with an ordinary reference-type actor that manages the canonical copy, applying edits and so forth, and which periodically updates the UI with the latest snapshot.

I have the following understanding: The law of exclusivity should apply to actor values just as it does to regular values. So, the only difference between a regular value and an actor value is that all operations on the actor value are asynchronous and will be executed on a different actor. There will not be multiple operations running concurrently using the same value.

Similar to:

@BackgroundActor
func asyncWith<T>(_ value: inout T, f: (inout T) async -> Void) async {
  await f(&value)
}
}

var values = [1, 2, 3]
await asyncWith(&values[1]) { $0 += 40 }

// Not possible
await asyncWith(&values[1]) { $0 += values[2] } // error: overlapping accesses to 'values', but modification requires exclusive access

Exclusivity is still guaranteed, for the Int value as well as the complete Array. The only difference is that the operation is executed asynchronously on a different actor.

With an actor struct, an IntActor and thus Array<IntActor> could be created, which makes it cleaner:

await values[1] += 40

But the same exclusivity rules still apply. There is only one owner of this actor value and no other code can use it.
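For what it's worth, the rejected line can be made legal by finishing the read before the exclusive access begins. This is only a sketch: it drops the `@BackgroundActor` annotation (so nothing here actually hops to another actor), and `demo` is a made-up wrapper so that `values` is a local variable.

```swift
// Same shape as asyncWith above, minus the hypothetical @BackgroundActor.
func asyncWith<T>(_ value: inout T, f: (inout T) async -> Void) async {
    await f(&value)
}

// Hypothetical wrapper so that `values` is a local variable.
func demo() async -> [Int] {
    var values = [1, 2, 3]
    await asyncWith(&values[1]) { $0 += 40 }

    // Read values[2] first; the read is over before the
    // exclusive (inout) access to `values` starts.
    let addend = values[2]
    await asyncWith(&values[1]) { $0 += addend }
    return values
}

let result = await demo()
print(result) // [1, 45, 3]
```

The closure now captures only the copied `addend`, so there is no second access to `values` while the `inout` access is in progress.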

I see @John_McCall beat me to it and is basically saying the same thing (and more), but hopefully this still provides some value.

I'm not at all interested in arguing about whether this thing can legitimately be called “an actor,” FWIW. What to call this thing is at the bottom of my list of priorities.

I don't know where that idea comes from, but it's just not so. A value type whose access is moderated by a serial queue can easily change state and alter its behavior. A value type instance that is never copied is indistinguishable from a reference type instance that is never multiply-referenced.

No one is suggesting copying a value-typed actor “from the outside.” All of the data access is performed “on the inside,” just as with a manually written clone() method on a reference-typed actor… except that the compiler can synthesize a clone() for a value type.

Value types in Swift are an excellent way to represent mutable state. Are you sure you're not confusing value semantics with immutability?

Open to suggestions, seriously. “Serialized async value” doesn't seem great to me.

So, each value-actor stores an underlying value and a queue of value-specific jobs that manipulate the underlying value. When you copy the value, you don't copy the queue or the underlying value; instead, the new value gets a "promise", basically a job added to the old value's queue to copy its underlying value into the new value. Operations enqueued on the new value will be blocked until that promise is fulfilled, essentially as if the promise were a running item in the new value's queue. (Obviously some of this work can happen synchronously in the common case where there's nothing running or pending in the old value's queue.)

If when you say “blocked” you don't mean blocking a thread but just suspending the client task, this sounds about right. I have also mentioned a cruder but equally valid approach, “blocking” (i.e. suspending) the client at the point where the actual copy is made, only allowing it to resume after all the previously-queued items are finished and the copy has been made. But I think that probably leaves some potential parallelism on the floor, so I prefer the approach you described.

I think there are fundamentally two problems with this.

The first is that it's unclear how you would ever get multiple operations enqueued on the actor. If this is a normal value with normal exclusivity, how do we manage to make two asynchronous mutating calls on it at once?

I'm confused; of course you don't ever do two things at once to a value. However, it's easy to get multiple things onto the queue.

doc.gaussianBlurSelection() // gaussian blur is queued, may start
doc.brighten(1.2)           // brighten is queued
//...
let tinyBlurredAndBrightened = await /*if you must*/ doc.thumbnail

In this example, at least two mutating operations are notionally queued. In the last line, if they are not complete, the caller is suspended until they have been completed.

An asynchronous call has to happen from an async function, which is going to wait for that call to complete before it can make another call.

Well, maybe we have to change the definition of “complete.” An async call that returns nothing and operates only on a thing with value semantics can be seen by its clients to “complete” immediately.

You can of course have multiple functions making mutating calls on that value concurrently, using some sort of shared storage, but that's an exclusivity violation.

Of course! I wouldn't dream of making multiple mutating calls concurrently on a value. Another way of saying “that's an exclusivity violation” is “that would undermine the whole point of value semantics!”

What you're describing is something much more like a "fire and forget" API, where all the operations on the value-actor are externally synchronous; basically it's a value type with some major underlying trickery about where the computation happens.

I don't know what trickery you're talking about, but it's not “fire and forget.” You can still read from the thing, and that read is serialized after all the already-queued operations.

I can see why that would be an interesting feature, but it's very different from what we call actors in Swift, and I'm not sure how generally useful it is.

Obviously that has yet to be demonstrated, but I think we've seen that the vast majority of mutable state does not need to be shared, and it is a key strength of Swift that programmers can choose to make state non-shared (as opposed to “everything is a reference type”).

We can easily imagine moving (or passing copies of) values to some reference-typed actor to do some fancy computation, but if that actor has no interesting state of its own (other than its queue) we're really just needlessly serializing computations on non-overlapping data. No, in that scenario, the values should each have a distinct queue.

The second is that I think most uses will need to be able to get a synchronous snapshot. For example, you say:

Making a copy of Document is as simple as initializing a variable or appending it to an Array.
But this sort of copy is an asynchronous snapshot; we can't actually read from it synchronously if there's a pending/running operation on the value.

I take “can't read from it synchronously” to mean “can't read without potentially suspending.” I think we can read the whole document value without suspending, as noted above, but reading its parts would indeed potentially suspend. But it's an actor-or-whatever, so that's fine.

The UI presumably wants to be able to synchronously walk the document to decide what to render.

No, I don't expect to be able to observe the interior structure of an actor-or-whatever without potentially suspending.

So I actually need a DocumentSnapshot type that reflects the underlying value of my Document value-semantics actor; and once I have that,

You don't need DocumentSnapshot exactly, but you might think this was the moral equivalent: you probably do want some large-scale read operations on Document that can extract a substantial amount of data so the UI isn't negotiating with the actor-or-whatever over every byte. An application like Photoshop would design the structure of such a document very carefully, probably breaking large images down into sub-actors that can be manipulated in parallel.

it's unclear why I can't just keep Document as a value type, with an ordinary reference-type actor that manages the canonical copy, applying edits and so forth, and which periodically updates the UI with the latest snapshot.

The same reason we don't want to make everyone do array mutation through an enclosing ArrayBox class. Of course you can do that, but now the actor can (and surely will) be shared, which is just as bad as having a reference-typed Document in the first place. Worse, in fact, because now even TSan won't tell us when two threads are operating under the illusion that each has complete ownership of an independent DocumentActor.

That's correct. I mean, that's the whole point of values really.

IIUC the await on your first mutation means that your code is suspended until that mutation completes, so the second access is not overlapping with the first.

You're describing an optimization to reduce the number of times the UI would go back-and-forth with the document. That doesn't change the fact that the UI would not be able to show anything from the document if a mutation was running on it. To not just go blank, the UI would have to remember the last document value that didn't have a mutation active, which seems like it's reinventing snapshots.

Why would the actor be shared? You wouldn't write code that worked with a DocumentActor; all the code for manipulating documents would be written as ordinary value-type manipulations on a Document, and the actor which maintained the program's notion of the current document would be a private detail of the UI.
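A rough sketch of the arrangement described here, to make it concrete. The names `DocumentStore`, `apply`, and `snapshot` are invented for illustration, and the `Document` stand-in is deliberately tiny; this is one possible shape, not a claim about how a real application would do it.

```swift
// Document stays an ordinary value type.
struct Document: Sendable {
    var value = 0.0
    mutating func gaussianBlurSelection() { value += 10 }
    mutating func brighten(_ factor: Double) { value *= factor }
}

// Hypothetical actor holding the canonical copy; meant to be a
// private detail of the UI layer, not something passed around.
actor DocumentStore {
    private var current = Document()

    // Applies an ordinary value-type edit and returns the new snapshot.
    func apply(_ edit: @Sendable (inout Document) -> Void) -> Document {
        edit(&current)
        return current
    }

    // A snapshot is just a copy of the current value.
    var snapshot: Document { current }
}

func demo() async -> (Document, Document) {
    let store = DocumentStore()
    let before = await store.snapshot
    _ = await store.apply { $0.gaussianBlurSelection() }
    let after = await store.apply { $0.brighten(1.2) }
    return (before, after)
}

let (before, after) = await demo()
print(before.value, after.value)
```

The `before` snapshot is unaffected by the later edits, so the UI can keep walking it synchronously while the actor moves on.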

I think you misread the second line to be the same as the first. The second line is using values within the closure, which triggers the exclusivity error. I was trying to demonstrate that while it's possible to put the operation on a different actor, the exclusivity rules prevent using the same value in another place and I think a value actor should behave the same. As I understand it we agree on that though.

This helps; I think I'm getting a better understanding now. I played around with it a bit and created something that kind of simulates the behavior you're after IIUC. The use site looks as follows:

var doc = ValueActor(Document()) // ValueActor, for lack of a better name
await doc.schedule { $0.gaussianBlurSelection() }
await doc.schedule { $0.brighten(1.2) }
let tinyBlurredAndBrightened = await doc.get { $0.thumbnail }

For anyone interested, the full implementation:

Note: This works on swift-5.5-DEVELOPMENT-SNAPSHOT-2021-07-23-a-ubuntu20.04. Since many things are constantly changing at the moment, it might not work on your toolchain.

class ValueWithOperationQueue<T> {
    private var continuation: AsyncStream<(inout T) -> Void>.Continuation!
    
    init(_ value: T) {
        let stream = AsyncStream<(inout T)->Void> { self.continuation = $0 }
        Task {
            var value = value
            for await operation in stream {
                operation(&value)
            }
        }
    }
    
    func withValue(_ f: @escaping (inout T) -> Void) {
        self.continuation.yield(f)
    }

    deinit {
        continuation.finish()
    }
}

struct ValueActor<T> {
    private var state: ValueWithOperationQueue<T>

    init(_ value: T) {
        self.state = ValueWithOperationQueue(value)
    }

    var value: T {
        get async {
            return await withCheckedContinuation { continuation in
                state.withValue { value in
                    continuation.resume(returning: value)
                }
            }
        }
    }

    private mutating func ensureUniqueState() async {
        if !isKnownUniquelyReferenced(&state) {
            self.state = ValueWithOperationQueue(await self.value)
        }
    }

    mutating func schedule(_ f: @escaping (inout T) -> Void) async {
        await ensureUniqueState()
        state.withValue(f)
    }

    mutating func get<U>(_ f: @escaping (inout T) -> U) async -> U {
        await ensureUniqueState()
        return await withCheckedContinuation { continuation in
            state.withValue { value in
                continuation.resume(returning: f(&value))
            }
        }
    }
}

struct Document {
    var value = 0.0
    mutating func gaussianBlurSelection()    { value += 10 }
    mutating func brighten(_ factor: Double) { value *= factor }
    var thumbnail: Double                    { -value }
}

Task {
    var doc1 = ValueActor(Document())
    await doc1.schedule { $0.gaussianBlurSelection() }
    var doc2 = doc1
    await doc1.schedule { $0.brighten(1.2) }
    await doc2.schedule { $0.gaussianBlurSelection() }

    let tinyBlurredAndBrightened1 = await doc1.get { $0.thumbnail }
    let tinyBlurredAndBrightened2 = await doc2.get { $0.thumbnail }

    print(tinyBlurredAndBrightened1) // -12.0
    print(tinyBlurredAndBrightened2) // -20.0
    print(await doc1.value) // Document(value: 12.0)
    print(await doc2.value) // Document(value: 20.0)
}

You say that as though it's not exactly my intent. :wink: We don't want to show users a partly-mutated document state; Photoshop won't do that today. Put another way, the observability of partly-complete mutations is one of the many problems with reference semantics; we'd like to avoid doing more of that.

To not just go blank, the UI would have to remember the last document value that didn't have a mutation active, which seems like it's reinventing snapshots.

I don't think I understand what you mean by “reinventing snapshots.” Values are copyable; snapshots are just copies. Note that in applications operating on large-scale data, independent “copies” of parts of the data simply need to exist in order to keep UIs responsive. For example, Photoshop is able to work on an image of the surface of Mars at 1m resolution. It can also show you the whole document at once on your iPhone and scroll around smoothly. This couldn't possibly work without having a screen-resolution rendering of the document in memory somewhere.

Why would the actor be shared?

Because “that's what reference types do,” implicitly.

I'm getting the sense that maybe I've failed to understand the scenario you're describing, and might appreciate some more detail if you don't mind. Here's what I had imagined you meant: there would be a synchronous Document API, and then an async copy of the API on the actor. That would lead people to understand that the actor was the thing to use because it could be accessed safely from multiple threads, and then we would be back in reference-semantics land. Reading @orobio's post makes me think maybe you just mean to have one actor API that accesses and manipulates a Document value through closures; that would solve the API duplication problem at least. Having the document be owned by the UI does seem to me like a pretty concerning layering violation, but maybe I'm not understanding what you mean there, either.

Aside: I have had the intuition that a possible programming guideline for actors as they exist today is that they should contain a single stored property having value semantics, unless you're prepared to suspend the usual assumptions about the relationships between properties and the ability to maintain an arbitrary invariant. Heh, maybe that's true of classes, too :wink:

Yeah, thanks, I see it now! I agree that would be an exclusivity error, just as though the call wasn't async. Thanks for building that prototype; I'll take a look!

Update: I'm unable to test it at the moment since I don't seem to be able to put my hands on a Monterey beta, but from what I can tell, it looks like it has exactly the semantics I was suggesting! Thanks!
Update 2: Hmm, on second thought, I'm not sure why you'd want to ensureUniqueState on get. Isn't that just a read operation?

I started out with two overloads of schedule. One that returns Void, which is still there, and one with a return value. So it was possible to schedule any closure that mutates the state. If it had a non-Void return value, it would wait for the result of the operation and return it; if it returned Void, it would not wait.

It seemed a bit unclear though that schedule would wait / not wait depending on the return value, so I renamed the overload with return value to get. I considered changing get to non-mutating, or even have get and mutget, but in the end I just left it as is. A non-mutating get would probably be a good idea though.

Edit:
Instead of having a non-mutating get, you might as well just use the value property:

let tinyBlurredAndBrightened = await doc.value.thumbnail

@Chris_Lattner3 briefly weighed in on this idea last year here, but at the time the idea that an actor-or-whatever could be genuinely useful was not floated.

Just a brief response, I haven't processed this entire thread:

  1. In terms of terminology, I think that Dave agrees that this isn't an actor struct. I'd recommend not coming at it from that direction because that is likely to cause more heat and noise than common understanding. Actors are classically defined as having reference semantics, having a mailbox, and containing mutable state. They aren't values in the value semantic sense. I'd recommend picking a more correct term even if it is a placeholder while the ideas develop.

  2. That's ok though! The Swift concurrency model doesn't force everything to be an actor! Lots of things can interoperate and play with each other seamlessly, e.g. a concurrent hash table with bespoke locking. That's a great thing.

  3. Dave's idea seems really interesting and can probably be prototyped as a library. I'd encourage more exploration with this, even if the syntax isn't ideal. If this captures important design patterns that we want to model, and if libraries aren't "good enough" then we can definitely consider what the right direction for language support would look like.

  4. I'm a bit concerned that something along this direction would eventually want implicit copies to be async. If so, I don't think that is ever going to fit with the Swift compilation model, so it would be good to steer away from such directions.

-Chris

it was quite common, just nobody made much fuss about it and didn't call it "value-type semantics" back then.

php example
<?php
$a = array("hello", "world");
$b = $a;
$b[0] = "goodbye";
var_dump($a);
var_dump($b);

results:

array(2) {
  [0]=> string(5) "hello"
  [1]=> string(5) "world"
}
array(2) {
  [0]=> string(7) "goodbye"
  [1]=> string(5) "world"
}
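
For comparison, the same experiment in Swift gives the same result, since `Array` has value semantics. A minimal sketch:

```swift
let a = ["hello", "world"]
var b = a            // b is logically an independent copy of a
b[0] = "goodbye"     // mutating b does not affect a

print(a) // ["hello", "world"]
print(b) // ["goodbye", "world"]
```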

It's interesting that these examples have largely been forgotten. No, I don't think it's just me; I've met many people who struggle to even conceive of assignment and function call semantics that don't implicitly create shared mutable state. Also, the academic literature has settled on a convention that assumes reference semantics for arrays and other aggregate types, which I believe strongly shapes the way people think about what's possible in a language.

After sleeping on this for a while, I think this resembles promises a lot:

let a: async Document = …

var b = a // Assignment is synchronous

b.modify() // Synchronously schedule an async operation that will start at some point in the future

let c: Document = await b

vs

let a: Promise<Document> = …

var b = a // Assignment is synchronous

// Synchronously schedule an async operation that will start at some point in the future
b = b.then { old in
    var d = old
    d.modify()
    return d
}

let c = await b.get()
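
Neither `async Document` nor `Promise` exists in Swift today, but the promise-style version can be approximated with `Task`, whose `value` property plays the role of `get()`. A sketch only; the `Document` type and the `demo` wrapper are stand-ins made up for illustration.

```swift
// Stand-in document type, assumed for illustration.
struct Document: Sendable {
    var edits = 0
    mutating func modify() { edits += 1 }
}

func demo() async -> (Int, Int) {
    let a = Task { Document() }

    var b = a // Assignment is synchronous

    // Synchronously schedule an async operation that runs later,
    // chained on the previous task's result.
    b = Task { [previous = b] in
        var d = await previous.value
        d.modify()
        return d
    }

    let original = await a.value
    let modified = await b.value
    return (original.edits, modified.edits)
}

let (original, modified) = await demo()
print(original, modified) // 0 1
```

Reassigning `b` leaves `a` untouched, which is the value-like behavior the comparison is getting at.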

indeed. swift value semantic for arrays/dictionaries is a frequent cause of eyebrow raising from my rust / c# colleagues (notwithstanding that the C structs are true value types and so are the C arrays embedded in structs).

the dark side of value semantics
a["b"].c.d[123].e.f = g
a["b"].c.d[123].e.h = i
a["b"].c.d[123].e.j = k

is this code efficient, let alone nice to write? all those dictionary/array/member lookups are not free, right?

var e = a["b"].c.d[123].e
e.f = g
e.h = i
e.j = k
a["b"].c.d[123].e = e

is this "equivalent form" efficient either? there's still dictionary/array/member lookup repetition here, albeit to a lesser extent. besides "c" and "e" can be large and not cheap to copy.

I'm surprised no one mentioned the C++ standard library where all the containers are value types. In C++ that means you have to explicitly pass them by reference anytime you don't want to create a copy (so most of the time) which makes it easy to be inefficient if you aren't paying attention. But it's harder to have unnoticed aliasing of those mutable containers, which is good for concurrency.


The most efficient way is to use inout and mutate everything in one operation:

func mutate(_ e: inout E) {
   e.f = g
   e.h = i
   e.j = k
}
// then:
mutate(&a["b"].c.d[123].e)

... a pattern some people generalize with a helper function taking a closure:

with(&a["b"].c.d[123].e) { e in
   e.f = g
   e.h = i
   e.j = k
}
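
The `with` helper is not part of the standard library. One way it might be written is shown below; the `E` and `D` types and the `table` variable are simplified stand-ins invented for this sketch.

```swift
// Hypothetical helper: runs `body` with in-place access to `value`.
func with<T>(_ value: inout T, _ body: (inout T) throws -> Void) rethrows {
    try body(&value)
}

// Simplified stand-in types.
struct E { var f = 0; var h = 0; var j = 0 }
struct D { var e = E() }

var table = ["b": [D(), D()]]

// The whole access path is evaluated once for all three assignments.
with(&table["b"]![1].e) { e in
    e.f = 1
    e.h = 2
    e.j = 3
}

print(table["b"]![1].e) // E(f: 1, h: 2, j: 3)
```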

yep, good point. so pascal, php, C(partly), C++ std library -- quite a few examples of languages with value semantic aggregate types.

thanks, this is an interesting technique indeed. the whole E's setter is called anyway but at least you don't have to call E's getter first to know the values of all other fields that need to stay intact.

No, that’ll still call the getter. The only way to avoid that would be to do a simple assignment to e. Generally I find it’s better practice to set things up so you can do that kind of whole-value assignment instead of trying to separately assign all the properties.

that's the thing, i'm not assigning all properties of E... maybe 10% of all properties, or 1%. besides the values of g/i/k might depend on any property of E, so the value of E must be known upfront and it is not possible to do just this:

a["b"].c.d[123].e = E(...)

with all value type variants (including "mutate") the whole E is first read through its getter and then written back through its setter. however the overall number of getter / setter calls is lower with the "mutate" variant.

code
import Foundation

var getterCount = 0
var setterCount = 0

struct A {
    subscript(key: String) -> B {
        get {
            getterCount += 1
            print("a[\"\(key)\"] getter")
            return B()
        }
        set {
            setterCount += 1
            print("a[\"\(key)\"] setter")
        }
    }
}

struct B {
    var c: C {
        get {
            getterCount += 1
            print("B.c getter")
            return C()
        }
        set {
            setterCount += 1
            print("B.c setter")
        }
    }
}

struct C {
    var d: Ds {
        get {
            getterCount += 1
            print("C.d getter")
            return Ds()
        }
        set {
            setterCount += 1
            print("C.d setter")
        }
    }
}

struct Ds {
    subscript(index: Int) -> D {
        get {
            getterCount += 1
            print("d[\"\(index)\"] getter")
            return D()
        }
        set {
            setterCount += 1
            print("d[\"\(index)\"] setter")
        }
    }
}

struct D {
    var e: E {
        get {
            getterCount += 1
            print("D.e getter")
            return E()
        }
        set {
            setterCount += 1
            print("D.e setter")
        }
    }
}

struct E {
    var f: Int {
        get {
            getterCount += 1
            print("E.f getter")
            return 0
        }
        set {
            setterCount += 1
            print("E.f setter")
        }
    }
    var h: Int {
        get {
            getterCount += 1
            print("E.h getter")
            return 0
        }
        set {
            setterCount += 1
            print("E.h setter")
        }
    }
    var j: Int {
        get {
            getterCount += 1
            print("E.j getter")
            return 0
        }
        set {
            setterCount += 1
            print("E.j setter")
        }
    }
}

func test1() {
    print("........................................\nTest1:")
    var a = A()
    
    getterCount = 0
    setterCount = 0
    a["b"].c.d[123].e.f = 1
    a["b"].c.d[123].e.h = 2
    a["b"].c.d[123].e.j = 3
    print("getters: \(getterCount), setters: \(setterCount), total: \(getterCount + setterCount)")
}

func test2() {
    print("........................................\nTest2:")
    var a = A()
    
    getterCount = 0
    setterCount = 0
    var e = a["b"].c.d[123].e
    e.f = 1
    e.h = 2
    e.j = 3
    a["b"].c.d[123].e = e
    print("getters: \(getterCount), setters: \(setterCount), total: \(getterCount + setterCount)")
}

func test3() {
    
    func mutate(_ e: inout E) {
       e.f = 1
       e.h = 2
       e.j = 3
    }

    print("........................................\nTest3:")
    var a = A()
    getterCount = 0
    setterCount = 0
    mutate(&a["b"].c.d[123].e)
    print("getters: \(getterCount), setters: \(setterCount), total: \(getterCount + setterCount)")
}

test1()
test2()
test3()
results

........................................
Test1:
a["b"] getter
B.c getter
C.d getter
d["123"] getter
D.e getter
E.f setter
D.e setter
d["123"] setter
C.d setter
B.c setter
a["b"] setter
a["b"] getter
B.c getter
C.d getter
d["123"] getter
D.e getter
E.h setter
D.e setter
d["123"] setter
C.d setter
B.c setter
a["b"] setter
a["b"] getter
B.c getter
C.d getter
d["123"] getter
D.e getter
E.j setter
D.e setter
d["123"] setter
C.d setter
B.c setter
a["b"] setter
getters: 15, setters: 18, total: 33
........................................
Test2:
a["b"] getter
B.c getter
C.d getter
d["123"] getter
D.e getter
E.f setter
E.h setter
E.j setter
a["b"] getter
B.c getter
C.d getter
d["123"] getter
D.e setter
d["123"] setter
C.d setter
B.c setter
a["b"] setter
getters: 9, setters: 8, total: 17
........................................
Test3:
a["b"] getter
B.c getter
C.d getter
d["123"] getter
D.e getter
E.f setter
E.h setter
E.j setter
D.e setter
d["123"] setter
C.d setter
B.c setter
a["b"] setter
getters: 5, setters: 8, total: 13

result with reference type

........................................
TestClass:
a["b"] getter
B.c getter
C.d getter
d["123"] getter
D.e getter
E.f setter
E.h setter
E.j setter
getters: 5, setters: 3, total: 8

Sounds like you have a good reason to use the not-officially-in-the-language _modify:

var storage: D
var d: D {
  get { storage }
  _modify { yield &storage }
}

Hopefully that feature will land officially some day.
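
To see the difference, here is a small sketch that counts accessor calls for a get/set property versus a get/`_modify` property. `AccessLog`, `Point`, and `Box` are names invented for the example, and `_modify` is an underscored, unofficial feature, so its behavior could change between toolchains.

```swift
// Hypothetical counter, held by reference so getters can update it.
final class AccessLog {
    var gets = 0
    var sets = 0
    var modifies = 0
}

struct Point { var x = 0; var y = 0 }

struct Box {
    let log = AccessLog()
    private var storage = Point()

    // Classic accessors: an in-place mutation does get, then set.
    var viaGetSet: Point {
        get { log.gets += 1; return storage }
        set { log.sets += 1; storage = newValue }
    }

    // With _modify, the caller mutates storage in place.
    var viaModify: Point {
        get { log.gets += 1; return storage }
        _modify { log.modifies += 1; yield &storage }
    }
}

var box = Box()

box.viaGetSet.x = 1
print(box.log.gets, box.log.sets, box.log.modifies) // 1 1 0

box.viaModify.x = 2
print(box.log.gets, box.log.sets, box.log.modifies) // 1 1 1
```

The second mutation goes through `_modify` alone, with no extra getter or setter call.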
