Hi all, I'm excited to share a language evolution pitch we've been working on at Google as part of Swift for TensorFlow team!
Automatic Requirement Satisfaction in plain Swift
Proposal: SE-NNNN
Authors: Denys Shabalin, Dan Zheng, Dave Abrahams, Eugene Burmako, Brennan Saeta
Review Manager: TBD
Status: TBD
Motivation
Swift can automatically generate code to satisfy the requirements of special protocols such as Equatable
, Hashable
and Codable
. For example, any struct
whose stored properties conform to Equatable
can itself conform simply by declaring it so, without explicitly implementing Equatable
’s requirement for a static func ==(Self, Self) -> Bool
:
struct Point: Equatable {
var x: Float
var y: Float
}
The implementation of ==
for Point
is synthesized by the compiler behind the scenes.
This functionality is extremely convenient, but it is currently limited to the few protocols for which the compiler defines automatic requirement satisfaction. We would like to lift this restriction and allow for arbitrary libraries to vend protocols and automatic conformances. Examples of applications for this capability include:
-
Comparable
(from the Swift standard library), as well as related potential refinements, such asPartialEquatable
andPartialComparable
. -
DefaultInitializable
, to allow construction of new instances of a generic type (e.g.:let tmp = T()
). - Alternative serialization formats with type-checked requirements.
- Machine learning and deep learning focused operations, such as efficient movement of Tensors between devices, and efficient distributed communication such as all-reduce.
Precedent
Other languages support user-defined requirement satisfaction, so the pattern can continue in libraries without requiring compiler changes. For example:
-
Derive Macros in Rust and Implicit Macros in Scala.
Languages with rich macro systems such as Rust and Scala let users provide custom code generators.
-
Datatype Generic Programming in Haskell, Shapeless in Scala and Derivation in Dotty.
Approaches inspired by Haskell's
GHC.Generics
offer a way to satisfy requirements based on the structure of stored data. We propose to do the same for Swift.
Benefits
The proposed features have the following benefits:
- The barrier for entry to automatic requirement satisfaction for standard library protocols is lowered.
It becomes possible to automatically satisfy requirements of protocols outside the standard library. - The implementation of existing automatic requirement satisfaction could be drastically simplified, and done entirely in Swift itself. The need for built-in requirement satisfaction is limited to one protocol:
Structural
. - Libraries can retroactively add automatic requirement satisfaction to the protocols which were originally not designed with it in mind.
As we will show, our solution requires minimal language changes, and allows existing automatic requirement satisfaction to be reimplemented in pure Swift.
Proposed solution
A new protocol is added to the standard library:
/// A type that can be converted to and from its structural representation.
public protocol Structural {
/// A structural representation for `Self`.
associatedtype StructuralRepresentation
/// Creates an instance from the given structural representation.
init(structuralRepresentation: StructuralRepresentation)
/// A structural representation of `self`.
var structuralRepresentation: StructuralRepresentation { get set } // Note: _modify is preferred.
}
Structural
provides a bidirectional mapping between Self
type and its StructuralRepresentation
, which encodes the structure of the type. For value types (such as user-defined structs and enums), the requirements of Structural
will be automatically satisfied by the compiler given just a declaration of conformance.
For example, given the following type and conformance to Structural
:
struct Point: Structural {
var x: Float
var y: Float
}
The compiler would synthesize the following StructuralRepresentation
implementation:
extension Point: Structural {
public typealias StructuralRepresentation =
StructuralStruct<
StructuralCons<StructuralProperty<Float>,
StructuralCons<StructuralProperty<Float>,
StructuralEmpty>>>
public var structuralRepresentation: StructuralRepresentation {
get {
return StructuralStruct(Point.self,
StructuralCons(StructuralProperty("x", x, isMutable: true),
StructuralCons(StructuralProperty("y", y, isMutable: true),
StructuralEmpty())))
}
set {
self = .init(structuralRepresentation: newValue)
}
}
public init(structuralRepresentation: StructuralRepresentation) {
self.x = structuralRepresentation.properties.value.value
self.y = structuralRepresentation.properties.next.value.value
}
}
The structural representations are composed using the following auxiliary types:
-
StructuralStruct<Properties>
. A representation for a struct value that is not sensitive to the reference identity. It contains the type of the corresponding struct and aStructuralCons
-encoded list of properties. -
StructuralProperty<Value>
. A representation of a struct property or an enum associated value. Includes property name and the corresponding value. -
StructuralEnum<Cases>
. A representation of a value for the enum, that includes enum type, and a list of cases encoded usingEither
-encoded union ofCase
-s. -
StructuralCase<RawValue, AssociatedValues>
. A representation of an enum case, that preserves case name, raw value and a list of associated values. -
StructuralCons<Value, Next>
andStructuralEmpty
. A type-preserving encoding of lists, used to construct a sequence of structure properties and a sequence of associated values. -
StructuralEither<Left, Right>
. A type-preserving alternative of either left or right value. It used to form a type-safe union of all cases within an enum.
This encoding naturally composes with Property Wrappers. A property @MyWrapper var x: Int is encoded as a
Property<MyWrapper>`. This allows us to take advantage of property wrappers to customize the protocol behavior for each field individually, based on the custom derived conformance defined for the wrapper type.
Implementing protocols with Structural
To implement a custom protocol MyProtocol
using Structural
, one has to:
-
Provide an implementation of the given protocol for the structural representations listed above. For example, to support any struct, one needs to provide instances for
StructuralStruct
,StructuralProperty
,StructuralCons
andStructuralEmpty
. Here the library may also refine the type parameters of the corresponding constructors to express domain-specific requirements. -
Define an extension for the protocol that bridges
StructuralRepresentation
-based implementation with any type that conforms toStructural
:
extension MyProtocol where Self: Structural, Self.StructuralRepresentation: MyProtocol {
public func myFunction() -> Bool {
return self.structuralRepresentation.myFunction()
}
}
Afterwards, any type that conforms to Structural
will be able to take advantage of that implementation, as long as all of its leaf values transitively conform to the given protocol. For example the original definition for Point
can now implement MyProtocol
in the following way:
struct Point: Structural, MyProtocol {
var x: Float
var y: Float
}
Here, the compiler will generate an instance for Structural
, and in turn MyProtocol
will be able to rely on an implementation based on the underlying StructuralRepresentation
.
Case study implementations
To evaluate the proposed design and illustrate how it is intended to be used in practice, we implemented a number of protocols based on the Structural API. The code can be found via Swift Structural repo on github.
First, we started with protocols inspired by the Swift standard library. We've given them slightly modified names to make it easy to compare and contrast their performance relative to the existing implementations:
-
CustomEquatable. Compares each property element-wise with the same semantics as
Equatable
. -
CustomHashable. Hashes all properties element-wise with the same semantics as
Hashable
. - CustomComparable. Compares struct fields recursively in lexicographic order.
-
CustomDebugString. Provides debug string implementation, similar to
String(reflecting:)
Additionally, we've also experimented with implementing serialization-style protocols directly on top of Structural:
- DecodeJSON. Encodes structs into a JSON string.
- EncodeJSON. Decodes structs out of a JSON string into the corresponding struct.
Lastly, we implemented protocols inspired by the the ones we use in Swift for TensorFlow:
- Additive. Adds structs elementwise and produces the result as sum of the two.
- Zero. Creates a zero value for a struct, based on zero values of all its leaf values.
- InplaceAdd. In-place modifies struct by a given value, modifying leafs elementwise.
- DefaultInitializable. A protocol with a zero-argument constructor. Derived implementation default-initializes all of its leaves transitively.
Performance evaluation
We benchmarked the implementation of the Structural-based protocols and compared them against the reference and/or hand-specialized implementations:
Benchmark | Baseline (1x) | Performance (more is better) |
---|---|---|
CustomEquatable | Equatable | 0.356 |
CustomHashable | Hashable | 0.641 |
CustomComparable | Hand-specialized | 0.343 |
CustomDebugString | String(reflecting:) | 1.907 |
DecodeJSON | Codable-based JSON decode | 2.809 |
EncodeJSON | Codable-based JSON encode | 36.13 |
Additive | Hand-specialized | 0.368 |
InplaceAdd | Hand-specialized | 0.281 |
Exact measurements are available on the Raw Performance Numbers page. Numbers for the relative time are obtained by normalizing each benchmark running time by the corresponding baseline. Moreover, we perform measurements across a number of struct sizes (1 to 16 stored properties) and report the geometrical mean across all of them as the final relative time.
Benchmarks are run in the release mode, with cross-module optimizations, on a recent checkout of the swift toolchain from tensorflow branch.
Benchmark numbers suggest that the current approach has overhead compared to the hand-specialized implementation (Additive, InplaceAdd) or compiler-generated derived conformance (CustomEquatable, CustomHashable).
On the other hand, protocol implementations that today rely on various forms of reflection (CustomDebugString, DecodeJSON, EncodeJSON) offer better performance based on top of the Structural protocol, rather than the reference implementation. This suggests that Structural can be used as a lightweight substrate for performance-sensitive workloads such as serialization.
We believe that the performance gap between the hand-specialized implementations and Structural can also be greatly diminished by improving the SIL optimizer to elide copying. The performance overhead comes from the shallow copying between the concrete value and its structural representation.
Impact on the existing code
The proposal adds new library functionality and a compiler-generated derived conformance for the Structural protocol. These changes have no direct impact on the existing APIs and their implementation. We demonstrate how existing protocols such as Equatable can be implemented on top of it.
The Structural protocol—like Swift’s Mirrors—exposes a type’s internal implementation. In order to enable the use of Structural generic programming within libraries that provide a stable API & ABI (e.g. are compiled for library evolution), please see the related pitch: Scoped Conformances.
Future Directions
Avoiding Copies: While the implementation proposed here is sufficiently expressive to automatically implement a variety of protocols, coroutines (i.e. standardization of modify
) would enable efficient expression of copyless modifications with generic implementations. This capability is necessary to avoid quadratic runtime behavior with CoW-based value types, and (in the future) move-only types. (See ScaleBy in the examples repository for a demonstration of the performance improvements that are possible.)
Compiler Optimizations: As mentioned previously, we believe that the performance gap between the hand-specialized implementations and Structural
can be greatly diminished by improving the SIL optimizer.
Improved Diagnostics: When a type does not conform to a library-defined protocol, it can be difficult to understand which field is causing problems. Compiler diagnostics can be augmented to guide the user to the field or fields that prevent the structural conformance from applying.
Automatically conform to Structural. If a protocol has a known Structural-based implementation, one has to explicitly state conformance to Structural
before using that implementation.
To make Structural be a complete 1-to-1 replacement for compiler code generation, one could also include an extra annotation @structuralProtocol
:
@structuralProtocol
protocol MyProtocol { … }
struct Point: MyProtocol { … }
In this case Point
is going to get additional protocol conformance to : Structural
, generated by the compiler. With this change, existing compiler-supported protocols such as Equatable
could be implemented in the standard library without breaking the user code due to a migration to Structural
.
Preserving type invariants. Mutating methods implemented via either setting to structuralRepresentation
or by calling self = .init(structuralRepresentation: …)
can mutate properties of the underlying struct. Unlike the standard constructor, the generated constructors lack any domain-specific validation that might be necessary to preserve the domain-specific type invariants. One potential direction could be to provide some form of retroactive validation of instances created or mutated through structural representations.
Structural representation for classes. Current proposal targets structs and enums. Currently, we do not provide support for classes. The proposal can be extended with additional representation types that encode the class instances. It’s important to distinguish them from the Struct representation, due to the fact that classes are sensitive to their reference identity, and to support derived classes’ additional fields.
Alternatives considered
-
Syntactic metaprogramming via macros.
Derive Macros in Rust and Implicit Macros in Scala have shown how a macro system can be used to automatically generate derived instances. As a downside, it requires a design and implementation of a complete macro system integrated into the language which has far greater scope than the minimal approach we are proposing here. Additionally, empirically macro-based systems might ossify compiler internals and/or impede further language development.
Eugene Burmako has previously experimented with adding support for syntactic reflection via Swift Quasiquotes. Compared to the syntactic reflection, this proposal does not aim to expose a model of the Swift language syntax. Instead, we rely on a bidirectional encoding of struct and enum values to the corresponding structural representation.
Conceptually, structural representation reflects on the structure of the runtime values in a type safe manner, rather than their surface syntax in the language.
-
Visitor-based API.
As an alternative to structural representation, we've considered exposing a purely visitor-style API. Visitor-based approach could allow us to avoid performing copies to and from the corresponding structural representation, and operate on the underlying concrete value instead. As a downside, this significantly affects the ease of use of the API.
Structural representation encoding is analogous to DOM, while visitors would be equivalent to SAX. In practice, DOM is usually far easier to reason about and provides a more intuitive end-user experience.
-
Naming alternatives.
Prior work in this space calls the
Structural
protocol equivalent:Generic
. We decided to avoid that name because it’s ambiguous with respect to existing generic types and generic programming in Swift.The associated type within the
Structural
protocol (in this proposal calledStructuralRepresentation
) has taken various alternative names during development, including:Representation
,AbstractValue
,Structure
,ReflectedStructure
, andStructureStorage
. We’ve settled on a more explicit version of the name to avoid any potential name clashes, and because it accurately reflects the type’s use. (Abstract (in AbstractValue) implies a subtraction of information that is being abstracted over; and we believe future compiler optimizations can optimize away all storage in common cases.)Representation types for
StructuralCons
,StructuralEither
, andStructuralEmpty
could be useful outside of this proposal as they encode an equivalent of strongly typed products and sums. An alternative naming forCons
isTuple
. Additionally,Empty
could be a typealias forVoid
once tuples (the structural types) are allowed to conform to protocols.