Mangled names in SE-0344: Distributed Actor Runtime

bobergj · February 23, 2022, 12:33pm

@ktoso I am starting this separate discussion thread to avoid derailing the proposal review thread.

In the proposal under review there's a RemoteCallTarget struct: swift-evolution/0344-distributed-actor-runtime.md at main · apple/swift-evolution · GitHub

public struct RemoteCallTarget: Hashable {
  /// The mangled name of the invoked distributed method.
  /// 
  /// It contains all information necessary to lookup the method using `executeDistributedActorMethod(...)`
  var mangledName: String { ... }
  
  /// The human-readable "full name" of the invoked method, e.g. 'Greeter.hello(name:)'.
  var fullName: String { ... }
}

Looking at mangledName and swift_func_getParameterTypeInfo in the draft implementation here:

apple/swift/blob/e94aff3ba719e5deffd82f8d71ffe68d971db475/stdlib/public/Distributed/DistributedMetadata.swift#L59-L67


      
          @available(SwiftStdlib 5.7, *)
          @_silgen_name("swift_func_getParameterTypeInfo")
          public // SPI _Distributed
          func __getParameterTypeInfo(
              _ typeNameStart: UnsafePointer<UInt8>, _ typeNameLength: UInt,
              _ genericEnv: UnsafeRawPointer?, // GenericEnvironmentDescriptor *
              _ genericArguments: UnsafeRawPointer?,
              _ types: Builtin.RawPointer, _ typesLength: Int
          ) -> Int32

and here:

apple/swift/blob/e6a1e23a9f29b07b0f395ca3e0e137dc67f458e5/stdlib/public/runtime/MetadataLookup.cpp#L2040-L2047


      
          SWIFT_CC(swift) SWIFT_RUNTIME_STDLIB_SPI
          unsigned
          swift_func_getParameterTypeInfo(
              const char *typeNameStart, size_t typeNameLength,
              GenericEnvironmentDescriptor *genericEnv,
              const void * const *genericArguments,
              Metadata const **types, unsigned typesLength) {
            if (typesLength < 0) return -1;

I interpret this to mean that mangledName will always be a mangled name per the Swift ABI mangling, as documented here: swift/Mangling.rst at main · apple/swift · GitHub. Is that right?
Then, does it follow that the message contained in the envelope sent to the remote actor node will always contain this mangled name string*?

*although in a form chosen by the specific actor system serialization

ktoso · February 23, 2022, 12:52pm

That’s actually very on topic for the review.

So the RemoteCallTarget type is only given to the “local” side (the one making the remoteCall). It is indeed the mangled name of the function (or computed property).

You don’t really have to transfer the actual string over the network though. You could rely on shared knowledge (be it exchanged lookup tables, or just “I know the other process is exactly the same” or other techniques like “only send the string the first time, then use an ID for it”) to avoid sending the string around all the time.
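The "only send the string the first time, then use an ID for it" idea could look roughly like the sketch below. Everything here (WireTarget, SenderTargetTable, ReceiverTargetTable) is a made-up name for illustration and is not part of the proposal; it also assumes an ordered, reliable connection per peer so that a definition always arrives before any reference to it.

```swift
// Hypothetical wire representation of a call target (not part of SE-0344).
enum WireTarget: Equatable {
    case define(id: UInt32, mangledName: String) // first use: name plus a newly assigned ID
    case reference(id: UInt32)                   // later uses: the ID only
}

// Sender side: remembers which names it has already announced to this peer.
struct SenderTargetTable {
    private var ids: [String: UInt32] = [:]

    mutating func encode(_ mangledName: String) -> WireTarget {
        if let id = ids[mangledName] {
            return .reference(id: id)
        }
        let id = UInt32(ids.count)
        ids[mangledName] = id
        return .define(id: id, mangledName: mangledName)
    }
}

// Receiver side: learns the mapping from definitions and resolves references.
struct ReceiverTargetTable {
    private var names: [UInt32: String] = [:]

    mutating func decode(_ target: WireTarget) -> String? {
        switch target {
        case .define(let id, let mangledName):
            names[id] = mangledName
            return mangledName
        case .reference(let id):
            return names[id] // nil if the definition was never seen
        }
    }
}
```

Encoding the same name twice yields a define followed by a reference, and the receiver resolves both back to the original string.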

It is true and annoying that this name changes eg if you changed the type of a parameter from a class to a struct, even though for all of the serialization infra it all would still work and be the same, eg if you used Codable on that type. But the target would have changed, so it’s not great for protocol evolution.

That’s the reason why we pass this struct to remoteCall rather than a raw mangled name. We could, and I hope we will, offer a better identification scheme in the future.

Some implementations may send the “hello(name:)” string (that is what the fullName is) as the target identifier! This makes it impossible to support function overloads, but an actor system could say “we don’t support distributed function overloads because we care about wire compatibility more” etc., so it is pretty flexible in what implementations may choose to support.
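To make the overload limitation concrete, here is a minimal sketch of a receiver-side registry keyed by a name string. HandlerRegistry is a hypothetical type invented for this example, and handlers are represented as plain description strings to keep it short.

```swift
// Hypothetical receiver-side registry keyed by a name string (not part of SE-0344).
struct HandlerRegistry {
    private(set) var handlers: [String: String] = [:]

    /// Registers a handler description under `key`.
    /// Returns false if the key is already taken, in which case nothing is stored.
    mutating func register(key: String, handler: String) -> Bool {
        if handlers[key] != nil { return false }
        handlers[key] = handler
        return true
    }
}
```

If the key is the fullName, then hello(name: String) and hello(name: Int) both map to "Greeter.hello(name:)", so the second registration is rejected. A key that also carries the parameter types keeps the two apart, at the cost of changing whenever a signature changes.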

bobergj · February 23, 2022, 1:52pm

You don’t really have to transfer the actual string over the network though. You could rely on shared knowledge (be it exchanged lookup tables, or just “I know the other process is exactly the same” or other techniques like “only send the string the first time, then use an ID for it”) to avoid sending the string around all the time.

What's going to be the best practice for a DistributedActorSystem though? I can imagine there's gonna be one or more default actor system implementations from Apple or the Swift project; how are they recommended to implement this part of the protocol?
If best practice is going to be relying on shared knowledge rather than sending the mangled name around, I don't see why the mangledName and swift_func_getParameterTypeInfo have to be part of the design at all. What do I mean by that? Well imagine an alternative design where the distributed functions must always be annotated:

distributed actor MyDistributedActor {
  @methodIdentifier("do-cool-stuff") // a unique identifier on this actor
  distributed func doCoolStuff() { ... }
}

then the compiler could synthesize that shared lookup table, and the design wouldn't be tied to the Swift ABI at all. I can imagine you've considered this alternative, but I am mentioning it since I don't see anything like that under Alternatives Considered in the proposal.

eg if you used Codable on that type

Would you ever want to do that though, that is, just encode what's in RemoteCallTarget with Codable?
The ABI mangling isn't a terribly efficient representation for either inter-process or network transport.

Unrelated to mangling, but related to the serialization:
I find DistributedTargetInvocationEncoder and DistributedTargetInvocationDecoder are very much tied to the Swift function calling convention. It's difficult (impossible?) to implement an invocation encoding scheme where the distributed actor is implemented in a language other than Swift. Then, why would it actually be useful for developers to customize the encoding to use, e.g., protocol buffers?
The proposal touches on this in alternatives considered:

Hardcoding the distributed runtime to make use of Codable
Codable is a great, useful, and relatively flexible protocol allowing for serialization of Swift native types, however it may not always be the best serialization system available. For example, we currently do not have a great binary serialization format that works with Codable
...

But what about actually "fixing" Codable, and then have a single built-in distributed actor serialization that is actually efficient?

ktoso · February 24, 2022, 9:20am

Hm, it's not really about a "best practice", as different system implementations have vastly different design goals. Some may assume "same binary on both ends", some may not. Some may want to spare every possible bit on the wire, and some don't care since payloads will considerably outweigh the identifiers every time. Every approach has tradeoffs, and the goal of the language is to allow those system implementations to make those tradeoffs.

I'm aware of a few very different implementations, all of which would make quite different tradeoffs here. Either way, it is the goal of the proposal under review to allow for this flexibility because the implementations will learn in practice what tradeoffs they must take here. This may sound a bit hand-wavy, but I'm actively engaged with a number of use-cases we have in mind here...

Having said that, a "good default, that works in simple things" is the mangled names, as they allow the most user friendly "I can call anything I can expose" semantics.

Currently we have open sourced and are focused on the peer to peer, server-side focused cluster implementation: Swift.org - Introducing Swift Distributed Actors. It'll use mangled names for now*

* continue reading for why "for now"

bobergj:

If best practice is going to be relying on shared knowledge rather than sending the mangled name around, I don't see why the mangledName and swift_func_getParameterTypeInfo have to be part of the design at all. What do I mean by that? Well imagine an alternative design where the distributed functions must always be annotated:
distributed actor MyDistributedActor {
  @methodIdentifier("do-cool-stuff") // a unique identifier on this actor
  distributed func doCoolStuff() { ... }
}

No, not all systems can rely on shared knowledge.

Yes, such a "user provided stable name" for methods is exactly what I'd like to get to in future work on this feature. We called this exact annotation "stable name" in some discussions.

It is an additional improvement and a logical next step to improve the versioning story of the model, allowing people to use compressed names etc. There's a lot to polish about versioning, but such "stable names" and other techniques are exactly what we're hinting at in the Future Directions of the previous (SE-0336) proposal here: https://github.com/apple/swift-evolution/blob/main/proposals/0336-distributed-actor-isolation.md#future-directions

I don't think that a model that "always must give stable names to things" can be the default. A great getting started experience is important for this feature; and same as with Akka and my previous work there, it is important to get people to wrap their heads around "aha, this is how it can work", and only once they need to go stable, bother them with the many "before you go to prod" additional settings, tweaks etc. (one such thing could be "do stable names").

To address this explicitly: it is not an "alternative considered" per se, because I still think such an optional attribute is something we'll want for our improved versioning story -- which is a future direction and mentioned a little bit in the previous proposal.

We could add to alternatives considered "always force setting stable names" I guess; but not sure process wise, since that's the previous proposal.

The "that type" in my sentence may have been ambiguous and misled you there. I was calling out why the ABI function name sucks for wire formats, and that we indeed will want to offer something better (again, see versioning "future directions").

By that type I meant that:

caller has class X,
caller is system "v1", knows of distributed method hello(x: X)
recipient has actually changed X to a struct, struct X
caller calls hello(x:)
- target identification is naive, and uses the mangled name
recipient cannot locate the target handle because it is registered under a mangled function name that includes struct X, and not class X...

Because all of the interaction was made over the network... and e.g. X was put through Codable on both sides... we literally don't care if it's a struct or not, we just want to know "yeah, it's that X", but the Swift ABI is too strict for that.

So that's the situation I was pointing at: the ABI names are too strict.
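The scenario above can be modelled without real mangled names. In this sketch StrictKey is a hypothetical stand-in for what a mangled name encodes about one parameter (including whether the type is a class or a struct), and RelaxedKey is a hypothetical looser identifier that ignores the kind. None of these types exist in the proposal, and the keys are not actual mangler output.

```swift
// Whether a nominal type is declared as a class or a struct.
enum TypeKind: Hashable { case classKind, structKind }

// Hypothetical stand-in for the information a mangled name carries about one parameter.
struct StrictKey: Hashable {
    let method: String
    let parameterType: String
    let parameterKind: TypeKind
}

// Hypothetical looser key: same method and type name, kind ignored.
struct RelaxedKey: Hashable {
    let method: String
    let parameterType: String

    init(_ strict: StrictKey) {
        method = strict.method
        parameterType = strict.parameterType
    }
}

/// Looks up the caller's request against the recipient's registration,
/// once with the strict key and once with the relaxed key.
func lookUp(_ requested: StrictKey, registered: StrictKey) -> (strict: String?, relaxed: String?) {
    let strictTable = [registered: "hello handler"]
    let relaxedTable = [RelaxedKey(registered): "hello handler"]
    return (strictTable[requested], relaxedTable[RelaxedKey(requested)])
}
```

With the recipient registered under structKind and the caller asking with classKind, the strict lookup misses while the relaxed lookup finds the handler, which is the "yeah, it's that X" behaviour described above.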

We are very aware of this and will be working on improvements in future versions to add other encodings to the RemoteCallTarget, so that systems can use them in the future when able to, and without us breaking API/ABI of the distributed actors feature.

It absolutely is possible to implement calls to non-Swift, although it is not a primary goal of this design.

The only piece which is, as you say, "tied to swift calling convention" is the generic substitutions. And you would not really be able to express the exact method overload semantics of Swift in another language, so... I don't think that matters -- you would not support complex generics when calling into a different language and that's it I guess.

As for the rest, the encoder is rather boring:

for every argument, recordArgument
record return type
record error type

There's nothing "super tied to Swift" here -- that's just what function invocations look like
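As a rough illustration of those three steps, here is a toy encoder. It is a standalone sketch and does not conform to the proposal's DistributedTargetInvocationEncoder protocol; ToyInvocationEncoder, GreetingError and encodeHelloCall are made-up names, and JSON via Codable is just one arbitrary choice of serialization.

```swift
import Foundation

// Toy stand-in for an invocation encoder; not the proposal's protocol.
struct ToyInvocationEncoder {
    private(set) var arguments: [Data] = []
    private(set) var returnTypeName: String?
    private(set) var errorTypeName: String?

    /// Called once per argument, in declaration order.
    mutating func recordArgument<Argument: Codable>(_ argument: Argument) throws {
        // Wrap in an array so the JSON payload is never a bare top-level fragment.
        arguments.append(try JSONEncoder().encode([argument]))
    }

    /// Records the fully qualified name of the return type.
    mutating func recordReturnType<R>(_ type: R.Type) {
        returnTypeName = String(reflecting: type)
    }

    /// Records the fully qualified name of the thrown error type.
    mutating func recordErrorType<E: Error>(_ type: E.Type) {
        errorTypeName = String(reflecting: type)
    }
}

struct GreetingError: Error {}

// Records a call shaped like `hello(name: String, times: Int) throws -> String`.
func encodeHelloCall() throws -> ToyInvocationEncoder {
    var encoder = ToyInvocationEncoder()
    try encoder.recordArgument("Caplin")
    try encoder.recordArgument(3)
    encoder.recordReturnType(String.self)
    encoder.recordErrorType(GreetingError.self)
    return encoder
}
```

Nothing in the recorded data depends on how Swift passes arguments at the machine level; a receiver in any language only needs to agree on the argument order and the serialization.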

On that note though, and on improving the simplicity to call other languages:

I am pondering the addition of the name parameter to:

func recordArgument<Argument: SerializationRequirement>(
  name: StaticString, // <<< we could add that...
  argument: Argument
) throws

because it would make it simpler to stuff arguments into a hash with the parameters named... Not sure if it's necessary, but the implementation and performance costs of that are very low so I want to float the idea. It is something @Slava_Pestov suggested while we were reviewing the API and I quite liked the idea.
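For example, with a name available, an encoder could collect the arguments into a dictionary keyed by parameter name. This is only a sketch of the idea: NamedArgumentRecorder is a hypothetical type, and it stores each argument's description string instead of doing real serialization.

```swift
// Toy recorder that keeps arguments by parameter name; hypothetical, not the proposal's API.
struct NamedArgumentRecorder {
    private(set) var arguments: [String: String] = [:]

    mutating func recordArgument<Argument: CustomStringConvertible>(
        name: StaticString,
        argument: Argument
    ) {
        // StaticString is CustomStringConvertible, so `description` yields a String key.
        arguments[name.description] = argument.description
    }
}
```

Recording name: "name" with "Caplin" and name: "times" with 3 leaves the recorder holding ["name": "Caplin", "times": "3"], which maps naturally onto keyed formats such as JSON objects.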

We are interested in "fixing" Codable, more than you probably think we are even :-)

But it is not something we can block the distributed actor feature on. And neither would Codable, even a "fixed" one, be acceptable for some use-cases we are interested in -- which may even avoid traditional serialization per se, and use shared memory and other techniques...

So all in all, I see your points, but I think we're on track on a road towards all the perfect things; it'll just take time to get there. Stable names and things like those arrived in competing implementations many years after the initial versions, so we're not alone in this game of catch up hah.

ktoso · February 24, 2022, 9:47am

Ok I realized the "stable name" is not called out explicitly in the linked future work section, but it's definitely one of the pieces... That said, it's not fully designed yet, but yeah -- definitely some form of such a mechanism would be good for versioning.