[Proposal] Distributed Actor Isolation

ktoso · November 13, 2021, 2:14am

Hello everyone,
We would like to share with you the first (of many) proposals relating to distributed actors: Distributed Actor Isolation.

This proposal is focusing only on the isolation rules necessary to support distributed actors, and is split out from the large overall Distributed Actors pitch. Our intent is to propose the various pieces of that very large pitch, as individual yet interconnected proposals, similar to how Swift Concurrency was introduced last year. This way we hope to keep the amount of content reviewable, and also discussions focused on the specific topics at hand.

This isn't the official review thread yet, but we wanted to share the proposal early so there is more time to iron out any potential feedback before the official Swift Evolution review.

Soon we will also share other related proposals regarding the distributed actor runtime details, as well as our new approach to serialization, which offers good support for Codable while also supporting any alternative serialization approaches you might be interested in, without the need for any source generation.

Please look forward to those upcoming proposals, and let's try to focus on only the isolation aspects in this proposal thread - thanks in advance!

Read the full proposal text here.

Distributed Actor Isolation

If you see any typos or similar issues, please comment or fix on the Swift Evolution PR over here: https://github.com/ktoso/swift-evolution/blob/distributed-isolation/proposals/mmmm-distributed-actor-isolation.md rather than in this thread, so we can keep the thread focused on semantics discussions.

Please feel free to also use the new Distributed Actors forums category to ask any questions which may be related to this work, but don't quite fit the specific topic under review.

Thanks in advance for your time and feedback!

Douglas_Gregor · November 18, 2021, 1:47am

Hello,

Splitting out the semantics of distributed actor definition and isolation makes a lot of sense to me. I have a bunch of little comments, but all of this is looking very good to me.

In the early discussion about remote and distributed actors, shouldn't the read and write operations be distributed but neither async nor throws?

distributed actor TokenRange {
  let range: (Token, Token)
  var storage: [Token: Data]
  
  init(...) { ... }
  
  distributed func read(at loc: Token) -> Data? {
    return storage[loc]
  }

  distributed func write(to loc: Token, data: Data) -> Data? {
    let prev = storage[loc]
    storage[loc] = data
    return prev
  }
}

It's probably worth a quick sentence in this section to say that one must use try when calling these functions from outside the actor, because a communication failure is reported by throwing, and that this will be discussed further later. It's an important part of the model to keep in one's mind when reading.
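As a rough illustration of that call-site rule, here is a minimal sketch. It assumes the Distributed module and the LocalTestingDistributedActorSystem that later shipped with Swift 5.7, neither of which existed in this form when the thread was written; the Greeter actor and its greet method are hypothetical names.

```swift
import Distributed

// Hypothetical actor; the system type is the local testing system from the
// shipped Distributed module (an assumption, not part of this pitch).
distributed actor Greeter {
    typealias ActorSystem = LocalTestingDistributedActorSystem

    // Declared as neither `async` nor `throws`.
    distributed func greet(name: String) -> String {
        "Hello, \(name)!"
    }
}

let system = LocalTestingDistributedActorSystem()
let greeter = Greeter(actorSystem: system)

// From outside the actor the call is implicitly asynchronous and throwing,
// so both `try` and `await` are required at the call site.
let reply = try await greeter.greet(name: "Doug")
print(reply)
```

Inside the actor, the same method can be called synchronously and without try, which is the distinction the isolation rules are meant to capture.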

The "transport" terminology seems to have been partially replaced with "distributed actor system." This is a good change, but there are a few places where the term "transport" is still used. It's worth a quick search-and-replace.

Thus, access to a distributed actor's stored properties from outside of the actor's isolation are forbidden. In addition, computed properties cannot be nonisolated or participate in a key-path. We will discuss computed properties later on.

I don't think that latter part is true: computed properties can be nonisolated. I think what you want to say there is that stored properties cannot be accessed from outside the actor, even if they are immutable, and can not be nonisolated.
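A small sketch of that distinction, again assuming the Distributed module and LocalTestingDistributedActorSystem as later shipped in Swift 5.7; Person, label, and currentName are hypothetical names used only for illustration.

```swift
import Distributed

distributed actor Person {
    typealias ActorSystem = LocalTestingDistributedActorSystem

    // Stored property: reachable only from inside the actor's isolation,
    // even though it is immutable.
    let name: String

    init(name: String, actorSystem: ActorSystem) {
        self.actorSystem = actorSystem
        self.name = name
    }

    // A computed property may be nonisolated as long as it only touches
    // nonisolated state such as `id`.
    nonisolated var label: String { "Person(\(id))" }

    // Isolated state is exposed to the outside through a distributed method.
    distributed func currentName() -> String { name }
}

let system = LocalTestingDistributedActorSystem()
let person = Person(name: "Caplin", actorSystem: system)

let label = person.label                     // no `try`, no `await`
let fetched = try await person.currentName() // implicitly async and throwing
// person.name                               // rejected: stored property of a potentially remote actor
print(label, fetched)
```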

Distributed functions may be subject to additional type-checking. For example, in a future proposal we will discuss the serialization aspects of distributed method calls, where we will discuss how to statically check and enforce parameters and return values of distributed methods are either Codable, or conforming to some other marker protocol that may be used by the distributed actor runtime to serialize the messages.

I think this proposal would be more self-contained if we could bring a description of that type checking into this proposal, so we know the full set of requirements on a distributed method.

Remote actor references are not obtained via initializers, but rather through a special resolve(_:using:) function that is available on any distributed actor or DistributedActor constrained protocol. The specifics of resolving, and remote actor runtime details will be discussed in a follow up proposal shortly.

There are a number of places where the text refers to "any distributed actor or DistributedActor constrained protocol", and I think the text would benefit from defining a new term, distributed actor type, that describes any type that is known to conform to the DistributedActor protocol. Then, we can refer to distributed actor types throughout rather than this awkward phrase.

As a tiny, tiny nit, I'd drop the "shortly". Time is irrelevant on Swift Evolution :D.

Regarding the DistributedActor protocol, it is mentioned but never defined. I think it's important for this proposal to define this protocol, even if the follow-on proposal then extends it (e.g., with a distributed actor system member). I'll come back to the actual definition.

A distributed actor type, extensions of such a type, and DistributedActor inheriting protocols are the only places where distributed method declarations are allowed. This is because, in order to implement a distributed method, a transport and identity must be associated with the values carrying the method. Distributed methods can synchronously refer to any of the state isolated to the distributed actor instance.

Here's a good place to use the "distributed actor type" definition. A distributed method must be a member of a distributed actor type and its self must be isolated. There's no need to call out the transport or identity here.

There are two special properties that we'll discuss in the future that are accessible this way: the actor's identity, and the distributed actor system it belongs to. Those properties are synthesized by the compiler, and we'll soon explain them in greater depth in the runtime focused proposals detailing the distributed actor runtime design.

I don't think we need to call these out as special. First of all, the distributed actor system isn't really part of this proposal. Second, the identity is going to be part of the DistributedActor protocol, and can be a nonisolated computed property, so there's nothing special about it except that the compiler synthesizes the conformance of distributed actor to the DistributedActor protocol for you.

Distributed functions must be able to be invoked from another process, by code from either the same, or a different module. As such distributed functions must be either public, internal or fileprivate. Declaring a private distributed func is not allowed, as it defeats the purpose of distributed method, it would not be possible to invoke such function using legal Swift.

I disagree with this restriction on private. You could have a nonisolated async throws method on the distributed actor that can call a private distributed method. distributed is orthogonal to access control.
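The pattern described here could look roughly like the sketch below. It assumes the Distributed module and LocalTestingDistributedActorSystem as later shipped in Swift 5.7, and that private distributed methods are accepted as the follow-up in this thread indicates; Worker, step, and run are hypothetical names.

```swift
import Distributed

distributed actor Worker {
    typealias ActorSystem = LocalTestingDistributedActorSystem

    // Private, yet still distributed: access control and `distributed`
    // are independent of each other.
    private distributed func step() -> Int { 42 }

    // A nonisolated method does not know whether `self` is local, so it
    // calls the distributed method with the implicit effects applied.
    nonisolated func run() async throws -> Int {
        try await step()
    }
}

let system = LocalTestingDistributedActorSystem()
let worker = Worker(actorSystem: system)
let result = try await worker.run()
print(result)
```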

It is not allowed to use rethrows with distributed functions, because it is not possible to serialize a closure and send it over the network to obtain the "usual" re-throwing behavior one would have expected.

As you note, the closure is the problem here, not the rethrows. I think we should remove this rethrows restriction and let the fact that a closure will not conform to Codable (or whatever) make this distributed method ill-formed.

Similarly, it is not allowed to declare distributed function parameters as inout.

I don't really see this as falling out from the closure/rethrows rule above. There are reasonable inout semantics we could implement in the semantic model, but the problem is really that we'll have trouble with the distributed actor system implementation. I think it's better to say that we ban inout because inout is not a first-class type in the type system, so we can't implement remote calls when there are inout parameters.

It is also worth calling out the interactions with Task and async let. Their context may be the same asynchronous context as the actor, in which case we also do not need to cause the implicit asynchronous effect. When it is known the invocation is performed on an isolated distributed actor reference, we infer the fact that it indeed is "known to be local", and do not need to apply the implicit throwing effect either:

I think this behavior isn't really about Task or async let. Rather, it falls out of the implied restriction that a local function or closure cannot be distributed. Therefore, if you are in a function that has an isolated parameter of distributed actor type, you know that the actor referenced by that parameter is local, and that functions/closures inside that function cannot ever end up being called remotely. I think this should come earlier in the section, to call out the "isolated", "local", "potentially remote" three states and how they are determined, so this "sometimes you don't need try" bit doesn't come as a surprise.

In the section on protocol conformances, the conformance of ExampleActor to Example requires that asyncThrows be distributed. I think that needs to be called out and explained more, because it's really important (and not as obvious as the other cases that are called out).

This proposal mentioned the DistributedActor protocol a few times, however without going into much more depth about its design.

Let's define it! I think it's something like this:

protocol DistributedActor: Sendable, Identifiable where Self.ID: Sendable {
  associatedtype SerializationRequirement = Codable
  nonisolated var id: ID { get }
}

Now, if we pull in SerializationRequirement, we have to explain it. I think that's a good thing, because it nicely ties up all of the semantics of distributed so the follow-on proposal on distributed actor systems can focus on that layer.

Doug

ktoso · November 18, 2021, 3:19am

Thanks a lot for the thorough review!

I agree on all points basically, thanks for the catches and suggestions.

I'll polish up the wording and restrictions as you suggested. I agree that pulling in the SerializationRequirement and all the types we need to show a comprehensive stand-alone proposal here makes sense.

So this proposal would then have all semantic / typechecking rules, and the followup would be the details about how we deal with serialization, invocations and all those runtime details about actor lifecycle etc. That seems like a great clean split

You're right that's a wrong sentence there, thanks!

While we're here I might as well bring up a consideration I had here but am trying to not scope creep too much in here... So technically we could allow stored nonisolated properties BUT it would mean the following:

distributed actor DA { 
  nonisolated let names: ConcurrentHashMap<...>
}

DA().names.insert(...) // still bad, only distributed/nonisolated funcs/computed-vars are ok
DA().whenLocal { (da: local DA) in 
  da.names.insert(...)
}

This is the "other" whenLocal implementation... not the one we have right now, and it'd need local marking in the type system. This whenLocal does not need to hop to the target actor, unlike the current approach where it just hops to the local one (if local) and offers an isolated DA. So we'd still be able to access, without hops, nonisolated state on a distributed (but known to be local) actor.

I feel we'd reap a lot of benefits from the local marking, also for storing a known local one... but so far I've left it out of the proposals, mostly to avoid scope creep and implementation complexity. I was thinking that if we need this, it would be possible to relax the restrictions in the future -- does that sound right? (I.e. not proposing this right now, but if we hit enough use cases to warrant this, we could add it).

That's great, thanks -- I was looking for a good phrase for these, "distributed actor type" works well

Right, they're not really special in terms of isolation checking.

Ah that's a great catch, we overthought this it seems. I'm very happy that distributed is, after all, once again completely orthogonal to access control; it makes a lot of sense.

Fixed: [Distributed] After all, a private func may be distributed by ktoso · Pull Request #40237 · apple/swift · GitHub

kirilltitov · November 18, 2021, 10:59pm

Great proposal. However, I couldn't really get from the proposal text what exactly a distributed method is. I somewhat carefully read the whole text, but it didn't help. It could be explained by my general stupidity or late night reading after work day. However, the original proposal text turned out to be quite clear and unambiguous.

Maybe you should clarify it in the proposal text, I know it for myself that after numerous rewrites and edits such things become somewhat obvious — in author's head, but not for people who read it without prior context.

ktoso · November 18, 2021, 11:11pm

Thanks for the feedback, I'll see how to improve wording here

ktoso · November 24, 2021, 1:59pm

Thank you everyone for the feedback!

I addressed the feedback received on and off-thread and updated the proposal to 1.3. It is mostly additions of things we discussed before, but now in a formalized manner.

You can review just the changes made by checking out this diff: https://github.com/ktoso/swift-evolution/compare/aeb9b14291d2db27c2b454316c47d8c6aab91ded...distributed-isolation

We decided to pull into this proposal all of the static checks related to isolation. This includes the SerializationRequirement (i.e. how we can enforce Codable parameters, but keep it open for extension for other serialization formats), as well as more discussion about the implicit effects that distributed functions have etc. We also discuss some of the future direction regarding versioning of distributed actors which I'm sure plenty of people are interested in.

Quoth the changelog:

- 1.3 More about serialization typechecking and introducing mentioned protocols explicitly
  - Introduce `DistributedActor` and `DistributedActorSystem` protocols properly
  - Discuss future directions for versioning and evolving APIs
  - Introduce conditional Codable conformance of distributed actors, based on Identity
  - Discuss `SerializationRequirement` driven typechecking of distributed methods
  - Discuss `DistributedActorSystem` parameter requirement in required initializers
  - Discuss isolation states in depth "isolated", "known to be local", "potentially remote" and their effect on implicit effects on call-sites

As usual, thanks in advance for any questions and ideas you might have -- this work has matured significantly and I'm very happy with where it's heading.

We're going to post a "runtime" focused proposal next, but this proposal as it stands is very much a standalone piece of value that is useful by itself as it explains a complete distributed-isolation checking model. We would like to proceed to reviewing the proposal soon, so if there's anything else we should hash out or people feel is missing here, please let us know

Douglas_Gregor · November 25, 2021, 6:23am

Yes, that sounds right. This feels a lot like SE-0313, where we took a lexical property (self in a method is isolated) and generalized it to be part of the type system with isolated parameters.

Just a few comments on revision 1.3 while I'm here:

Stored properties cannot ever be accessed from outside the distributed actor. They cannot be declared distributed nor nonisolated.

Per the discussion above, we don't have to be this restrictive. We cannot access stored properties from another node, but we could access them from outside the actor if we know we're still on the same node. Here's a silly example:

distributed actor Counter {
  var count = 0
  func publishNextValue() {
    count += 1
    Task.detached { @MainActor in
      ui.countLabel.text = "Count is now \(await self.count)"
    }
  }
}

Because we're in a closure defined in a lexical context where self was isolated, we know that self is on the local node, so it's okay to read one of its stored properties (asynchronously).

I think the restriction you want is that stored properties cannot be accessed if the actor isn't known to be local. Or, you can write this in terms of the more general rule governing the addition of effects: if we would need to add the throws effect to read the stored property, then the access is ill-formed.

Thanks for adding the DistributedActor protocol! A couple of minor things:

- The associated type name DistributedActorSystem is really long. How about just ActorSystem here (since we're already in the DistributedActor protocol), and then DistributedActorSystem can remain the name of the protocol?
- The type alias Identity plays exactly the same role as the type ID from the Identifiable protocol. Should we use ID here everywhere rather than having two names for the same thing?
- The property id should have type Identity (or ID if you take my suggestion immediately prior to this), not ActorIdentity.

For the DistributedActorSystemProtocol, a few comments:

- As noted above, I think this should be named DistributedActorSystem.
- You can default SerializationRequirement to Codable in the definition, e.g., associatedtype SerializationRequirement = Codable

The standard library provides a type-eraser called AnyDistributedActorSystem for this purpose, and distributed actors default to this type of actor system.

While I can see why AnyDistributedActorSystem can be useful (e.g., to decide between different actor system implementations dynamically), I don't think we should use it as a default, because it pushes folks toward a less-efficient, type-erased implementation. Rather, I'd expect each library to provide a DefaultDistributedActorSystem so there is a good default tailored to the libraries you use. One can then opt in to type erasure.

As an aside, AnyDistributedActorSystem either needs to be given a complete API or its mention should be removed.

This...

protocol ClusterActor: DistributedActor {
  typealias DistributedActorSystem = ClusterSystem
}

would be better expressed as

protocol ClusterActor: DistributedActor 
  where Self.DistributedActorSystem == ClusterSystem {
}

A distributed actor's designated initializer must always contain exactly one DistributedActorSystem parameter. This is because the lifecycle and messaging of a distributed actor is managed by the system.

Hmm. Should we recognize this parameter because it has the same type as DistributedActorSystem, or should we also require the argument name to match (e.g., it must be called actorSystem)? I have a vague preference for the latter, because it's easier to syntactically match, and type identity is not always an obvious thing... one could turn an Int into a distributed actor system and make it very, very annoying to write initializers that work.

I don't think I understand why we're making the designated vs. convenience initializer distinction here, especially when part of the discussion of SE-0327 is about eliminating this distinction.

If a distributed actor's Identity conforms to Codable, the distributed actor automatically gains a Codable conformance as well.

I'm struggling a bit to understand how this specific use of Codable relates to SerializationRequirement. Should it be the case that a distributed actor should implicitly conform to the protocols in SerializationRequirement whenever its Identity type conforms to the protocols in SerializationRequirement? The actual implementation of the requirements in those protocols might end up being synthesized (they will for Codable) or could come from some implementations in a protocol extension. Is that the general rule here, or is Codable special for some reason?

Doug

ktoso · November 25, 2021, 7:47am

Right, I feel we'll want to revisit this but leaving it out of the proposals for now.

It actually plays fantastically with:

I didn't lift that restriction yet because I felt we needed local to make use of it. Specifically we'd be able to implement the "better" when local, and allow for this:

distributed actor Counter {
  // ok, allow declaring nonisolated properties:
  nonisolated let counter: AtomicInt // silly but valid nonisolated use
}

func test(c: Counter) { 
  c.counter // error: distributed actor-isolated property, even though noniso
  c.whenLocal { (c: local Counter) in // no async hop here (!)
    c.counter.increment()
  }
}

// but also... I wonder if we could even allow the following:
func test(system: DistributedActorSystem) {
  let c: local Counter = Counter(system: system)
  c.counter // OK, we know it's local
  test(c: c)
}

func test(c: Counter) {
 // lost the `local Counter` so usual isolation applies.
}

I can see this be very useful in testing.

I agree with your statement here:

Yeah that's right, I think I can reformulate this like that and we'll reimplement how nonisolated is interpreted to match this, I think that's a much better model than just banning nonisolated on stored properties

While we're mentioning that local keyword, I think it would also play well with protocol conformances actually... We can conform to such a thing on our "local" side, but we cannot actually call it unless the base of the call is known local:

protocol PA {
  func hello() async 
}

distributed actor DA: PA {
  // it's not throwing in the protocol, so we cannot conform to it using
  // distributed func, but it could totally be conformed to by the "local" side...
  local func hello() { ... } 
}

I wonder if this could be legal. It seemed to me like it could, but I have not dived very deep into the local conformance question.

I'll reword the nonisolated restriction into the rule stated above for now, thanks!

Hah, I struggled with naming here to be honest, so thanks for another pair of eyes on it! I was a bit worried about "just" ActorSystem for two reasons 1) it sounded like the normal actors don't have an "actor system" but semantically all actors really form systems, just that here we need the explicit type; 2) I guess I'm still attached to the ActorSystem meaning "the cluster system" but that's my historical attachment and a mistake to stick to it

Re-reading all this again: I think you're right, associatedtype ActorSystem should work well. Thanks for chiming in.

Yes, that's right, it's fulfilling the exact role as Identifiable's ID really.

And I took a stab at hiding the ActorIdentity protocol entirely in this proposal already, we only have associatedtype Identity: Hashable & Sendable the previously known protocol ActorIdentity is gone.

I'll admit I was reluctant about this because of how (subjectively) ugly it made the actor system APIs look (system.assignID, system.resignID, system.decodeID), but "consistency is king", so honestly there's only one right thing to do here: make it ID everywhere.

Whoops thanks, that's a leftover. I thought I got rid of all of ActorIdentity; I'll grep through again as I change everything to ID.

Sounds good on both

Good point, and very true -- people will usually use exactly one type of system. Maybe two if they know exactly what they're doing.

I have one use case in mind where there will be two actor systems, but really they're handling the same transport, just that one of the systems assigns a Codable ID and the other one not. This will allow us to opt-in certain actors for "sending them around" which is a thing we want to have tight control over in XPC for example.

Given the above, I think we'd better remove it completely for now. It could make a comeback, we'll see.

Ah, interesting one... will do

This is done by type indeed, and I had hoped to keep that, it is nice to be able to say:

Worker(id: 123, on: nodeA)
// init(id: Int, on system: ClusterSystem) { ... }

True about making random types conform to DistributedActorSystem which then makes it messy, but in reality is this something to really worry about? I think not, but perhaps you've seen some cases indicating otherwise in the past?

Ah yes, the reason is deeply intertwined with actor initializers, and also Kavon's recent work. I didn't dive into the depths of that in this proposal as it can get very long... but I'll try to summarize and put a short version into the proposal as well.

This is because the point at which an actor is "fully initialized" must immediately invoke system.actorReady(self), and the point where "fully initialized" happens, is only in designated initializers. So what we mean by "convenience" here is really only "an init that does delegate, to an init that does not delegate", since the non-delegating initializer is at the root of the initialization and is the one which will cause both the actor to become fully initialized (with all the isolation implications Kavon is working on), as well as ready-ed (meaning it may begin receiving incoming messages).

In that sense we really only care about the "non delegating initializer", which is the one where the id = system.assignID(...) and system.actorReady(self) calls must be made. Delegating initializers are in practice spelled as convenience initializers. We'd love to not have to say convenience because it doesn't matter for actors at all (because no inheritance), but whether they'll continue to exist depends on SE-0327.

The plan I had in mind was indeed specialized for Codable only.

Codable is special in one way: the decoding has to be implemented via an initializer (Decodable.init(from:) -> Self) and that is not possible to implement using our "resolve" capability in present day Swift. Specifically this piece:

// distributed actor Player: Codable, ... {
  nonisolated public init(from decoder: Decoder) throws {
    // ... 
    let id: Identity = try system.decodeIdentity(from: decoder)
    self = try Self.resolve(id, using: system) // !!!

Because actors are reference types, like classes, today they hit the following restriction here: error: cannot assign to value: 'self' is immutable. So what we do for distributed actors is to just lift this restriction for this initializer because this actually works, but is just blocked at the typesystem level.

@Joe_Groff actually worked on this a while ago (Allow `self = x` in class convenience initializers), and as I discussed this with him a while ago it basically works, but we'd need a separate SE proposal to unlock this for all types. I was hoping to do this separately, and then every class/actor could implement such things.

Other serialization mechanisms can use something like static func deserialize(as:) -> Self, which would work fine.

Should it be the case that a distributed actor should implicitly conform to the protocols in SerializationRequirement whenever its Identity type conforms to the protocols in SerializationRequirement?

It does not really have any relation to SerializationRequirement, it just so happens that an actor system transport that makes use of SerializationRequirement = Codable and makes the ID: Codable, effectively is saying "actors which I manage, and assign my ID to, are Codable, and therefore can be passed to distributed methods".

Though you're right now that I re-read this that other serialization mechanisms would be forced into a more annoying adoption route, they would have to do:

protocol CoolActor: CoolMessage, DistributedActor {
  func toCoolMessage() throws -> CoolMessage
  static func fromCoolMessage(message: CoolMessage) throws -> Self
}

as we can't do the extension DistributedActor: CoolMessage where ID: CoolMessage {} trick...

I'll be giving this piece more thought now as I'm working through the runtime with the new serialization calls that @xedin has been working on.

I don't think we can say "all protocols that the ID conforms to the actor does as well" that seems too random, the ID could be conforming to all kinds of things after all... If we needed to open this up I wonder if that'd be another typealias...? That would be unfortunate additional complexity, but I also think this can remain out of scope until proven necessary perhaps?

ktoso · November 25, 2021, 8:20am

I think re ID we might do:

protocol DistributedActorSystem: Sendable {
  associatedtype ActorID: Hashable & Sendable
}

protocol DistributedActor: Identifiable { 
  associatedtype ActorSystem: DistributedActorSystem
  typealias ID = ActorSystem.ActorID
}

Felt a bit weird saying raw ID on the actor system, as it is the IDs it assigns to the actors, not to itself.
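To illustrate how an actor's ID would follow from its system in this shape, here is a small sketch in plain Swift. It deliberately avoids the real protocols: SketchActorSystem, SketchDistributedActor, InMemorySystem, and Worker are hypothetical stand-ins, and the same-type requirement is written as a where clause rather than the typealias shown above.

```swift
// Hypothetical stand-in for the actor system protocol: it owns the ID type.
protocol SketchActorSystem: Sendable {
    associatedtype ActorID: Hashable & Sendable
}

// Hypothetical stand-in for the actor protocol: Identifiable's ID is tied
// to whatever ID type the chosen system assigns.
protocol SketchDistributedActor: Identifiable where ID == ActorSystem.ActorID {
    associatedtype ActorSystem: SketchActorSystem
}

// A toy system that hands out Int identifiers.
struct InMemorySystem: SketchActorSystem {
    typealias ActorID = Int
}

// A plain class standing in for an actor managed by that system.
final class Worker: SketchDistributedActor {
    typealias ActorSystem = InMemorySystem
    let id: Int // must match InMemorySystem.ActorID

    init(id: Int) { self.id = id }
}

let worker = Worker(id: 7)
print(type(of: worker.id), worker.id)
```

Changing InMemorySystem.ActorID to another type makes Worker stop compiling until its id is changed to match, which is the coupling the ActorID naming is meant to express.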

ktoso · December 3, 2021, 12:26pm

Thanks again for the comments -- I have updated the minor pieces we discussed here.
Summing up the primary ones:

- private distributed is allowed; distributed is completely orthogonal to access control <3
- Codable synthesis is explained a bit, and also the reason with the init; hopefully we can do the mutable self just as a follow up proposal as well
- Allowing distributed computed properties
- typealiases and protocols changed names -- this reads very well now (!)
- strictly using system and ID wording
- Any... wrappers are gone, we don't need them -- everything is concrete types thanks to typealiases and associated types
- serialization requirement defaults to Codable, but can be changed
- property access got more examples, we talk about "known to be local"
- we discuss the throwing implicit effect earlier in the document
- slight refactors to be more in-line with upcoming proposal #2 that will discuss the runtime; but not requiring any of it; the proposal is very much stand-alone

I called this just 1.3.1 really, since it's just small cleanups.

Hoping to kick off a review very soon, and share the runtime writeup as pitch as we do so