Exceptional work so far! I cannot wait to have distributed actors in the language. But I really want to dive in here on the ABI front.
Runtime ABI Requirement (Mangled Names)
TL;DR - A synthesized, CodingKey-like protocol for explicit method lookup on distributed actor types, regardless of version.
Problem Space
It seems we're looking at a few particular issues here:
1.) Usage of API that may bind a distributed actor implementation to an explicit ABI, and;
2.) Future desire for Versioning across Nodes/Clusters/DistributedActorSystem
Let's address these in reverse order. It is a lot, but I promise we'll get somewhere.
2.)
As someone with an Xserve cluster still in active use, running OS X 10.11 El Capitan, I can say beyond a shadow of a doubt that Versioning is a key part of making Server/Client functionality future-proof. Thinking about it now serves the valuable purpose of planning ahead, but it is beyond the scope of this proposal, so I suggest we not make an ABI decision based upon it.
With explicit ABI-specific Mangled Name usage (anywhere in distributed actors) we actually hurt a potential future where we want true Versioning. (@ktoso, I've actually been developing a Versioning & Version Migration system to share with you privately, for a potential future proposal after the distributed actor reviews are completed. My approach addresses Versioning for both distributed actors and Codable: two Swift birds, one protocol-based stone.)
1.)
Locking implementation to an explicit ABI by Mangled Name is antithetical to the core tenets of Swift. These may be prudent or even necessary measures, but ABI-specific Mangled Names and their Stringly-Typed nature are to be avoided (where possible) in Swift. [Side Note: (Probably best for @xedin) - Should we consider more elaborate diagnostics that warn on usage of ABI-specific API in future?]
Avoiding them is really about Safety, which is at the core of the language. It gives the compiler insight into developer intent, removes Stringly-Typed Dynamic Dispatch at Runtime, and moves compatibility issues out of the language and into the developer's source code. Will we be able to remove them all? Probably not. (Generics, I'm looking at you.) Can we remove them for distributed func lookup? Entirely.
Potential Solution
Codable uses a protocol, CodingKey, to essentially represent KeyPaths in Decoding. While it is not a one-to-one mapping, the reason for their existence is clear: how can you decode a blob of data if you don't have somewhere to start? A CodingKey is necessary for Type-Safe Decoding. Likewise, distributed actors need a key of some kind to look up the proper distributed func, and since (for a myriad of reasons, including the above) we should avoid use of Mangled Names, we'll need a new type of key.
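For readers who have not written one by hand, here is a minimal sketch of the CodingKey pattern being used as the model. The `Greeting` type and its JSON payload are made up for illustration; `Codable`, `CodingKey`, and `JSONDecoder` are the standard APIs. The enum is what the compiler would otherwise synthesize, and it is the only place where the wire names live.

```swift
import Foundation

// Hypothetical type, used only to illustrate the pattern.
struct Greeting: Codable {
    var text: String
    var repeatCount: Int

    // Written out by hand here; the compiler synthesizes an equivalent
    // enum (with matching names) when this is omitted.
    enum CodingKeys: String, CodingKey {
        case text
        case repeatCount = "repeat_count"
    }
}

let json = Data(#"{"text": "hello", "repeat_count": 2}"#.utf8)
let greeting = try JSONDecoder().decode(Greeting.self, from: json)
print(greeting.text, greeting.repeatCount)

// A key that is not part of the enum simply fails to construct.
let missing = Greeting.CodingKeys(stringValue: "not_a_field")
print(missing == nil)
```

The failable initializer is the interesting part for what follows: an unknown key is a `nil`, not a crash and not a string comparison at the call site.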
Let's call the new key DistributedMethodLookupKey. We'll need to make it at least Sendable, and distributed actor will need to implement its synthesis for each distributed func:
public protocol DistributedMethodLookupKey /* Sendable */ {
    /// UInt8 gets us up to 256 methods we can look up,
    /// plenty for our example
    var uintValue: UInt8 { get }
    init?(uintValue: UInt8)
}
Yielding:
distributed actor SimpleExample {
    // ~~~ Compiler Synthesized ~~~
    enum DistributedMethods: UInt8, DistributedMethodLookupKey {
        case accessible
    }

    // Internal Function Call Machinery
    func receivedDistributedCall(_ lookup: DistributedMethods) {
        switch lookup {
        case .accessible:
            self.accessible()
        }
    }
    // ~~~ Compiler Synthesized ~~~

    func notDistributed() {
        // ...
    }

    distributed func accessible() {
        // ...
    }
}
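One detail the example glosses over: a `UInt8`-backed enum has `rawValue` and `init?(rawValue:)`, not the `uintValue` requirements the protocol asks for. A minimal sketch of how that gap could be closed follows, assuming a protocol extension constrained to `RawRepresentable`. The extension is my assumption and not part of the pitch; the protocol is repeated so the snippet stands alone, and the `distributed actor` wrapper is left out so it compiles without an actor system.

```swift
// Repeated from the post so the sketch is self-contained.
public protocol DistributedMethodLookupKey /* Sendable */ {
    var uintValue: UInt8 { get }
    init?(uintValue: UInt8)
}

// Assumption: default implementations for any UInt8-backed enum,
// forwarding to the RawRepresentable members the compiler already provides.
extension DistributedMethodLookupKey where Self: RawRepresentable, Self.RawValue == UInt8 {
    public var uintValue: UInt8 { rawValue }

    public init?(uintValue: UInt8) {
        self.init(rawValue: uintValue)
    }
}

// What the synthesized enum could look like; no extra members needed.
enum DistributedMethods: UInt8, DistributedMethodLookupKey {
    case accessible
}

let known = DistributedMethods(uintValue: 0)    // matches .accessible
let unknown = DistributedMethods(uintValue: 7)  // no such case, so nil
print(known == .accessible, unknown == nil)
```

With something like this in place, the synthesized enum stays a one-liner per distributed func and the lookup never touches a string.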
We can continue down this rabbit hole, dealing with computed properties, known arguments, and even known return types:
distributed actor ContrivedExample {
    // ~~~ START Compiler Synthesized ~~~
    enum DistributedMethods: UInt8, DistributedMethodLookupKey {
        case accessible = 0
        case accessibleWithKnown_argumentType // 1
        case accessibleWithMultipleKnown_argumentType1_argumentType2 // 2
        case accessibleWithMultipleKnownReturnTypes // 3
        case computed // 4
    }

    // Internal Function Call Machinery
    // `unpack` is a placeholder for whatever Deserialization/Decode step is available.
    // Marked `throws` because `unpack` is called with `try`.
    func receivedDistributedCall(_ lookup: DistributedMethods, arguments data: UnsafeRawPointer? = nil) throws -> AggregateOutputTypes? {
        switch lookup {
        case .accessible:
            /// No `arguments`, so never use them
            self.accessible()
            return nil
        case .accessibleWithKnown_argumentType:
            // Call Deserialization/Decode on `arguments data` to vend back instance of type
            guard let sentBool = try unpack(data, toType: Bool.self) else { fatalError("FATAL or THROWS!") }
            self.accessibleWithKnown(argumentType: sentBool)
            return nil
        case .accessibleWithMultipleKnown_argumentType1_argumentType2:
            // Call Deserialization/Decode on `arguments data` to vend back instance of type
            guard let (sentString, sentDouble) = try unpack(data, toTypes: [String.self, Double.self]) else {
                fatalError("FATAL or THROWS!")
            }
            self.accessibleWithMultipleKnown(argumentType1: sentString, argumentType2: sentDouble)
            return nil
        case .accessibleWithMultipleKnownReturnTypes:
            let functionCallResults = self.accessibleWithMultipleKnownReturnTypes()
            let output = AggregateOutputTypes(functionCallResults)
            return output
        case .computed:
            return AggregateOutputTypes(self.computed)
        }
    }

    struct AggregateOutputTypes /* Codable, Sendable */ {
        // Include the called method so the `Recipient` knows what this `returns` from
        let methodCalled: DistributedMethods

        // Accumulate all known output types
        // Could be any number of instances of these types (or we could explicitly count them)
        // `var` rather than `let`, so the initializers below can overwrite the defaults
        var knownType1: [String] = []
        var knownType2: [Int16] = []
        var knownType3: [UInt32] = []
        var knownType4: [Int64] = []

        // Synthesized Initializers
        // .accessibleWithMultipleKnownReturnTypes
        init(_ r0: (Int16, UInt32, Int64)) {
            self.methodCalled = .accessibleWithMultipleKnownReturnTypes
            self.knownType2 = [r0.0]
            self.knownType3 = [r0.1]
            self.knownType4 = [r0.2]
        }

        // .computed
        init(_ r0: String) {
            self.methodCalled = .computed
            self.knownType1 = [r0]
        }
    }
    // ~~~ END Compiler Synthesized ~~~

    distributed var computed: String { "" }
    distributed func accessible() { ... }
    distributed func accessibleWithKnown(argumentType: Bool) { ... }
    distributed func accessibleWithMultipleKnown(argumentType1: String, argumentType2: Double) { ... }
    distributed func accessibleWithMultipleKnownReturnTypes() -> (Int16, UInt32, Int64) { ... }
    func notDistributed() { ... }
}
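Since the example above cannot compile today (the synthesis and `unpack` do not exist), here is a rough, runnable approximation of the same dispatch shape in current Swift. Everything in it is a stand-in I made up for illustration: a plain class replaces the distributed actor, `MethodKey` replaces the synthesized enum, and arguments and results travel as small Codable structs encoded with JSON rather than through an `UnsafeRawPointer`.

```swift
import Foundation

// Hypothetical method keys with explicit raw values.
enum MethodKey: UInt8 {
    case ping = 0
    case setFlag = 1
    case repeatText = 2
}

enum DispatchError: Error {
    case unknownMethod(UInt8)
}

// A plain class stands in for the distributed actor.
final class Receiver {
    // One Codable wrapper per call shape, playing the role of "known types".
    struct SetFlagArguments: Codable { var value: Bool }
    struct RepeatArguments: Codable { var text: String; var times: Int }
    struct RepeatResult: Codable { var text: String }

    private(set) var flag = false

    func ping() {}
    func setFlag(_ value: Bool) { flag = value }
    func repeatText(_ text: String, times: Int) -> String {
        String(repeating: text, count: times)
    }

    // The "received call" machinery: numeric key in, typed call, optional reply out.
    func receive(rawKey: UInt8, payload: Data) throws -> Data? {
        guard let key = MethodKey(rawValue: rawKey) else {
            throw DispatchError.unknownMethod(rawKey)
        }
        let decoder = JSONDecoder()
        switch key {
        case .ping:
            ping()
            return nil
        case .setFlag:
            let arguments = try decoder.decode(SetFlagArguments.self, from: payload)
            setFlag(arguments.value)
            return nil
        case .repeatText:
            let arguments = try decoder.decode(RepeatArguments.self, from: payload)
            let result = repeatText(arguments.text, times: arguments.times)
            return try JSONEncoder().encode(RepeatResult(text: result))
        }
    }
}

let receiver = Receiver()
let payload = try JSONEncoder().encode(Receiver.RepeatArguments(text: "ab", times: 2))
if let reply = try receiver.receive(rawKey: MethodKey.repeatText.rawValue, payload: payload) {
    let decoded = try JSONDecoder().decode(Receiver.RepeatResult.self, from: reply)
    print(decoded.text)
}
```

The `switch` is exhaustive over the key enum, so adding a method without handling it becomes a compile-time error instead of a failed string lookup at runtime.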
Through this we've had the compiler unwind all of the possible known types of both our input and output arguments, eliminated all Stringly-Typed ABI-specific Mangled Names, and put distributed actors in line to exploit various improvements coming to the very similar Codable APIs, including Versioning and Delta-Updates/Diffing, which would be a boon to distributed actors. This synthesis even removes the recording of a number of Existentials and Generics, as it allows for the synthesized code to hand back those same Existential and Generic types.
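To make the Versioning angle a little more concrete, here is a small sketch of why numeric keys with a failable initializer help. This is not the Versioning & Version Migration system mentioned earlier, just an illustration with two hypothetical key sets, and it assumes raw values are pinned explicitly so that adding or reordering cases never renumbers existing methods.

```swift
// Hypothetical key sets for two releases of the same actor.
enum MethodsV1: UInt8 {
    case accessible = 0
    case computed = 1
}

enum MethodsV2: UInt8 {
    case accessible = 0
    case computed = 1
    case accessibleWithKnownArgument = 2 // new in the second release
}

// Keys as they would travel between nodes.
let sharedKey = MethodsV2.computed.rawValue
let newKey = MethodsV2.accessibleWithKnownArgument.rawValue

// An older node still understands keys that both releases share...
let understood = MethodsV1(rawValue: sharedKey)
// ...and gets nil, which it can report back, for a key it has never seen.
let unsupported = MethodsV1(rawValue: newKey)
print(understood == .computed, unsupported == nil)
```

Implicit numbering (as in the ContrivedExample enum, where only the first case is numbered) would work against this, since inserting a case in the middle shifts every raw value after it.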
If you've made it this far, congratulations and my condolences.
This has been in my head for weeks, so I hope everyone gets some value out of this. It isn't my intention to slow up work on distributed actors (to the contrary, I've been dying to really dive in since "Scale By The Bay"). We really, REALLY, shouldn't let ABI-specific Mangled Names in anywhere we can absolutely avoid them.
I didn't even touch on APIs that break multiple years later, or the potential Security concerns about slinging around what Objective-C would call Selectors and the 37 CVEs of XPC's dynamic decoding. I'm happy to discuss those, if the forum thinks it's worth wading into, but for now I'll stop here.
It is my contention that we need to break away from any ABI usage, as well as make distributed actors more like Codable: not in requiring it for Serialization, but rather in more synthesis and staying at as high an abstraction as possible.
If we go the more Codable route, we can even allow Swift's concept of progressive disclosure for more future customization points in distributed actors, such as Codable's synthesized-by-default approach, while still allowing power users to conform the tooling to their needs.
Certainly, there is one issue with Codable that this approach would replicate. For lack of a better explanation, I've dubbed it The Needlepoint Problem: a piece of code has function calls across a trace-only boundary. In the case of Codable, it calls encode(_:) and moves back and forth between compiled language functions, synthesized data types and developer source code. Synthesizing DistributedMethods would inherit that minor issue.