Automatic Requirement Satisfaction in plain Swift

shabalin · June 2, 2020, 7:06pm

Hi all, I'm excited to share a language evolution pitch we've been working on at Google as part of Swift for TensorFlow team!

Automatic Requirement Satisfaction in plain Swift

Proposal: SE-NNNN
Authors: Denys Shabalin, Dan Zheng, Dave Abrahams, Eugene Burmako, Brennan Saeta
Review Manager: TBD
Status: TBD

Motivation

Swift can automatically generate code to satisfy the requirements of special protocols such as Equatable, Hashable and Codable. For example, any struct whose stored properties conform to Equatable can itself conform simply by declaring it so, without explicitly implementing Equatable’s requirement for a static func ==(Self, Self) -> Bool:

struct Point: Equatable { 
    var x: Float
    var y: Float
}

The implementation of == for Point is synthesized by the compiler behind the scenes.

This functionality is extremely convenient, but it is currently limited to the few protocols for which the compiler defines automatic requirement satisfaction. We would like to lift this restriction and allow for arbitrary libraries to vend protocols and automatic conformances. Examples of applications for this capability include:

Comparable (from the Swift standard library), as well as related potential refinements, such as PartialEquatable and PartialComparable.
DefaultInitializable, to allow construction of new instances of a generic type (e.g.: let tmp = T()).
Alternative serialization formats with type-checked requirements.
Machine learning and deep learning focused operations, such as efficient movement of Tensors between devices, and efficient distributed communication such as all-reduce.

Precedent

Other languages support user-defined requirement satisfaction, so the pattern can continue in libraries without requiring compiler changes. For example:

Derive Macros in Rust and Implicit Macros in Scala.

Languages with rich macro systems such as Rust and Scala let users provide custom code generators.
Datatype Generic Programming in Haskell, Shapeless in Scala and Derivation in Dotty.

Approaches inspired by Haskell's GHC.Generics offer a way to satisfy requirements based on the structure of stored data. We propose to do the same for Swift.

Benefits

The proposed features have the following benefits:

The barrier for entry to automatic requirement satisfaction for standard library protocols is lowered.
It becomes possible to automatically satisfy requirements of protocols outside the standard library.
The implementation of existing automatic requirement satisfaction could be drastically simplified, and done entirely in Swift itself. The need for built-in requirement satisfaction is limited to one protocol: Structural.
Libraries can retroactively add automatic requirement satisfaction to the protocols which were originally not designed with it in mind.

As we will show, our solution requires minimal language changes, and allows existing automatic requirement satisfaction to be reimplemented in pure Swift.

Proposed solution

A new protocol is added to the standard library:

/// A type that can be converted to and from its structural representation.
public protocol Structural {
    /// A structural representation for `Self`.
    associatedtype StructuralRepresentation

    /// Creates an instance from the given structural representation.
    init(structuralRepresentation: StructuralRepresentation)

    /// A structural representation of `self`.
    var structuralRepresentation: StructuralRepresentation { get set }  // Note: _modify is preferred.
}

Structural provides a bidirectional mapping between Self type and its StructuralRepresentation, which encodes the structure of the type. For value types (such as user-defined structs and enums), the requirements of Structural will be automatically satisfied by the compiler given just a declaration of conformance.

For example, given the following type and conformance to Structural:

struct Point: Structural { 
    var x: Float
    var y: Float
}

The compiler would synthesize the following StructuralRepresentation implementation:

extension Point: Structural {
    public typealias StructuralRepresentation =
        StructuralStruct<
            StructuralCons<StructuralProperty<Float>,
                StructuralCons<StructuralProperty<Float>,
                    StructuralEmpty>>>

    public var structuralRepresentation: StructuralRepresentation {
        get {
            return StructuralStruct(Point.self,
                StructuralCons(StructuralProperty("x", x, isMutable: true),
                    StructuralCons(StructuralProperty("y", y, isMutable: true),
                        StructuralEmpty())))
        }
        set {
            self = .init(structuralRepresentation: newValue)
        }
    }
    
    public init(structuralRepresentation: StructuralRepresentation) {
        self.x = structuralRepresentation.properties.value.value
        self.y = structuralRepresentation.properties.next.value.value
    }
}

The structural representations are composed using the following auxiliary types:

StructuralStruct<Properties> . A representation for a struct value that is not sensitive to the reference identity. It contains the type of the corresponding struct and a StructuralCons-encoded list of properties.
StructuralProperty<Value>. A representation of a struct property or an enum associated value. Includes property name and the corresponding value.
StructuralEnum<Cases>. A representation of a value for the enum, that includes enum type, and a list of cases encoded using Either-encoded union of Case-s.
StructuralCase<RawValue, AssociatedValues>. A representation of an enum case, that preserves case name, raw value and a list of associated values.
StructuralCons<Value, Next> and StructuralEmpty. A type-preserving encoding of lists, used to construct a sequence of structure properties and a sequence of associated values.
StructuralEither<Left, Right>. A type-preserving alternative of either left or right value. It used to form a type-safe union of all cases within an enum.

This encoding naturally composes with Property Wrappers. A property @MyWrapper var x: Int is encoded as a Property<MyWrapper>`. This allows us to take advantage of property wrappers to customize the protocol behavior for each field individually, based on the custom derived conformance defined for the wrapper type.

Implementing protocols with Structural

To implement a custom protocol MyProtocol using Structural, one has to:

Provide an implementation of the given protocol for the structural representations listed above. For example, to support any struct, one needs to provide instances for StructuralStruct, StructuralProperty, StructuralCons and StructuralEmpty. Here the library may also refine the type parameters of the corresponding constructors to express domain-specific requirements.
Define an extension for the protocol that bridges StructuralRepresentation-based implementation with any type that conforms to Structural:

extension MyProtocol where Self: Structural, Self.StructuralRepresentation: MyProtocol {
    public func myFunction() -> Bool {
        return self.structuralRepresentation.myFunction()
    }
}

Afterwards, any type that conforms to Structural will be able to take advantage of that implementation, as long as all of its leaf values transitively conform to the given protocol. For example the original definition for Point can now implement MyProtocol in the following way:

struct Point: Structural, MyProtocol {
    var x: Float
    var y: Float
}

Here, the compiler will generate an instance for Structural, and in turn MyProtocol will be able to rely on an implementation based on the underlying StructuralRepresentation.

Case study implementations

To evaluate the proposed design and illustrate how it is intended to be used in practice, we implemented a number of protocols based on the Structural API. The code can be found via Swift Structural repo on github.

First, we started with protocols inspired by the Swift standard library. We've given them slightly modified names to make it easy to compare and contrast their performance relative to the existing implementations:

CustomEquatable. Compares each property element-wise with the same semantics as Equatable.
CustomHashable. Hashes all properties element-wise with the same semantics as Hashable.
CustomComparable. Compares struct fields recursively in lexicographic order.
CustomDebugString. Provides debug string implementation, similar to String(reflecting:)

Additionally, we've also experimented with implementing serialization-style protocols directly on top of Structural:

DecodeJSON. Encodes structs into a JSON string.
EncodeJSON. Decodes structs out of a JSON string into the corresponding struct.

Lastly, we implemented protocols inspired by the the ones we use in Swift for TensorFlow:

Additive. Adds structs elementwise and produces the result as sum of the two.
Zero. Creates a zero value for a struct, based on zero values of all its leaf values.
InplaceAdd. In-place modifies struct by a given value, modifying leafs elementwise.
DefaultInitializable. A protocol with a zero-argument constructor. Derived implementation default-initializes all of its leaves transitively.

Performance evaluation

We benchmarked the implementation of the Structural-based protocols and compared them against the reference and/or hand-specialized implementations:

Benchmark	Baseline (1x)	Performance (more is better)
CustomEquatable	Equatable	0.356
CustomHashable	Hashable	0.641
CustomComparable	Hand-specialized	0.343
CustomDebugString	String(reflecting:)	1.907
DecodeJSON	Codable-based JSON decode	2.809
EncodeJSON	Codable-based JSON encode	36.13
Additive	Hand-specialized	0.368
InplaceAdd	Hand-specialized	0.281

Exact measurements are available on the Raw Performance Numbers page. Numbers for the relative time are obtained by normalizing each benchmark running time by the corresponding baseline. Moreover, we perform measurements across a number of struct sizes (1 to 16 stored properties) and report the geometrical mean across all of them as the final relative time.

Benchmarks are run in the release mode, with cross-module optimizations, on a recent checkout of the swift toolchain from tensorflow branch.

Benchmark numbers suggest that the current approach has overhead compared to the hand-specialized implementation (Additive, InplaceAdd) or compiler-generated derived conformance (CustomEquatable, CustomHashable).

On the other hand, protocol implementations that today rely on various forms of reflection (CustomDebugString, DecodeJSON, EncodeJSON) offer better performance based on top of the Structural protocol, rather than the reference implementation. This suggests that Structural can be used as a lightweight substrate for performance-sensitive workloads such as serialization.

We believe that the performance gap between the hand-specialized implementations and Structural can also be greatly diminished by improving the SIL optimizer to elide copying. The performance overhead comes from the shallow copying between the concrete value and its structural representation.

Impact on the existing code

The proposal adds new library functionality and a compiler-generated derived conformance for the Structural protocol. These changes have no direct impact on the existing APIs and their implementation. We demonstrate how existing protocols such as Equatable can be implemented on top of it.

The Structural protocol—like Swift’s Mirrors—exposes a type’s internal implementation. In order to enable the use of Structural generic programming within libraries that provide a stable API & ABI (e.g. are compiled for library evolution), please see the related pitch: Scoped Conformances.

Future Directions

Avoiding Copies: While the implementation proposed here is sufficiently expressive to automatically implement a variety of protocols, coroutines (i.e. standardization of modify) would enable efficient expression of copyless modifications with generic implementations. This capability is necessary to avoid quadratic runtime behavior with CoW-based value types, and (in the future) move-only types. (See ScaleBy in the examples repository for a demonstration of the performance improvements that are possible.)

Compiler Optimizations: As mentioned previously, we believe that the performance gap between the hand-specialized implementations and Structural can be greatly diminished by improving the SIL optimizer.

Improved Diagnostics: When a type does not conform to a library-defined protocol, it can be difficult to understand which field is causing problems. Compiler diagnostics can be augmented to guide the user to the field or fields that prevent the structural conformance from applying.

Automatically conform to Structural. If a protocol has a known Structural-based implementation, one has to explicitly state conformance to Structural before using that implementation.

To make Structural be a complete 1-to-1 replacement for compiler code generation, one could also include an extra annotation @structuralProtocol:

@structuralProtocol
protocol MyProtocol { … }

struct Point: MyProtocol { … }

In this case Point is going to get additional protocol conformance to : Structural, generated by the compiler. With this change, existing compiler-supported protocols such as Equatable could be implemented in the standard library without breaking the user code due to a migration to Structural.

Preserving type invariants. Mutating methods implemented via either setting to structuralRepresentation or by calling self = .init(structuralRepresentation: …) can mutate properties of the underlying struct. Unlike the standard constructor, the generated constructors lack any domain-specific validation that might be necessary to preserve the domain-specific type invariants. One potential direction could be to provide some form of retroactive validation of instances created or mutated through structural representations.

Structural representation for classes. Current proposal targets structs and enums. Currently, we do not provide support for classes. The proposal can be extended with additional representation types that encode the class instances. It’s important to distinguish them from the Struct representation, due to the fact that classes are sensitive to their reference identity, and to support derived classes’ additional fields.

Alternatives considered

Syntactic metaprogramming via macros.

Derive Macros in Rust and Implicit Macros in Scala have shown how a macro system can be used to automatically generate derived instances. As a downside, it requires a design and implementation of a complete macro system integrated into the language which has far greater scope than the minimal approach we are proposing here. Additionally, empirically macro-based systems might ossify compiler internals and/or impede further language development.

Eugene Burmako has previously experimented with adding support for syntactic reflection via Swift Quasiquotes. Compared to the syntactic reflection, this proposal does not aim to expose a model of the Swift language syntax. Instead, we rely on a bidirectional encoding of struct and enum values to the corresponding structural representation.

Conceptually, structural representation reflects on the structure of the runtime values in a type safe manner, rather than their surface syntax in the language.
Visitor-based API.

As an alternative to structural representation, we've considered exposing a purely visitor-style API. Visitor-based approach could allow us to avoid performing copies to and from the corresponding structural representation, and operate on the underlying concrete value instead. As a downside, this significantly affects the ease of use of the API.

Structural representation encoding is analogous to DOM, while visitors would be equivalent to SAX. In practice, DOM is usually far easier to reason about and provides a more intuitive end-user experience.
Naming alternatives.

Prior work in this space calls the Structural protocol equivalent: Generic. We decided to avoid that name because it’s ambiguous with respect to existing generic types and generic programming in Swift.

The associated type within the Structural protocol (in this proposal called StructuralRepresentation) has taken various alternative names during development, including: Representation, AbstractValue, Structure, ReflectedStructure, and StructureStorage. We’ve settled on a more explicit version of the name to avoid any potential name clashes, and because it accurately reflects the type’s use. (Abstract (in AbstractValue) implies a subtraction of information that is being abstracted over; and we believe future compiler optimizations can optimize away all storage in common cases.)

Representation types for StructuralCons, StructuralEither, and StructuralEmpty could be useful outside of this proposal as they encode an equivalent of strongly typed products and sums. An alternative naming for Cons is Tuple. Additionally, Empty could be a typealias for Void once tuples (the structural types) are allowed to conform to protocols.

References

saeta · June 2, 2020, 7:17pm

The related pitch has now been posted: Scoped Conformances

anandabits · June 2, 2020, 7:56pm

In addition to preserving invariants, exposing a structural representation reduces encapsulation of implementation details. Exposing the guts of a type is not necessary for the current compiler generated synthesis. I think this is a problem worth thinking more about.

mayoff · June 2, 2020, 8:06pm

Can we manually provide an implementation of Structural for a type that supports a synthetic implementation? I assume so (since the existing synthesizable protocols work that way), but I didn't see it explicitly stated.

typesanitizer · June 2, 2020, 8:30pm

You mention these under Future Directions but IMHO the success of proposal is kinda' contingent on being able to implement said optimizations. Otherwise, we run into the risk of adding a language feature that makes things nice to use and also makes your code for Equatable/Comparable/Hashable etc noticeably slower than what we have today, which would be bad.

Going the other way, if there is difficulty in implementing the optimizations, perhaps that should make us revise the API so that the generated code is easier to optimize.

Aside: I don't have better suggestions for terminology but I think there is a risk of confusion with structural/non-nominal types, which themselves cannot conform to protocols in Swift.

Sajjon · June 2, 2020, 9:54pm

Great pitch! Super excited about this! Since this would allow for automatic conformance to protocols (given some constraints), could this possibly also help with implementation of automatic protocol forwarding, in the use case of Haskells newtype something that has been discussed for Swift here and here?

saeta · June 2, 2020, 11:03pm

Thanks for the response @anandabits! Could you describe your concerns a bit further? I'd like to understand a little bit better. Thanks!

Note: I'm guessing a little bit, but if I'm understanding your concerns correctly, I think we attempted to allude to this in the following section:

The related pitch is here: Scoped Conformances. That said, it's quite possible we have forgotten about something, so please do let us know what we've missed!

saeta · June 2, 2020, 11:07pm

This is a great question! As it turns out, you absolutely can. (And that's exactly how we've been prototyping this feature internally ourselves.) You can check out our examples at: GitHub - google/swift-structural which is a repository that compiles with stable compilers (unmodified). In particular, you can check out StudentGrade, of which lines 27-70 would eventually be automatically generated by the Swift compiler (but are hand-written out today).

Thanks for the feedback! A few thoughts:

Agreed that we shouldn't make Equatable / Hashable slower! (Nit: IIUC, Comparable is not automatically synthesized.) We'd like to intentionally decouple changes to the existing Equatable / Hashable derived conformances from this proposal itself, as this might involve considerations of backwards compatibility & beyond. (Haven't thought enough, and we'd like to hear from some other experts.)
We're cautiously optimistic that we can get the performance optimizations to where we want to go. (Swift as a language is fairly well designed to support the kinds of optimizations @shabalin (and others) have in mind.) The reason we haven't implemented them yet is because we want to iterate with the community, and so we've open sourced our implementation (and put together this pitch) at an (uncomfortably) early stage of development. (They're under future development because we believe the proposal should accurately reflect the state of the world today, instead of a hypothetical future.)
Putting aside Hashable and Equatable, performance is either as-good-as (and often significantly better with) with structural generic programming, compared to what we do in Swift today. Concretely, all hand-specialized implementations can continue to be hand-implemented, even with structural generic programming. In all other cases, implementations based on structural generic programming are between 1.9x and 36x faster with today's stable compilers.

If folks would like to help out with improving the Swift optimizer, we'd love collaborators!

anandabits · June 2, 2020, 11:11pm

I think I follow now. Is the idea is to use a fileprivate Structural conformance so that the structure of the type is not visible outside the file, but the “synthesized” conformance is visible outside the file? I think that would address the concern I have.

Would the compiler synthesize a second Structural conformance in a different file (possibly in a different module) if conformances to protocols not visible to the original type declaration are desired? Or would conformances relying on Structural all have to be in the same file as the type declaration if the Structural conformance is not exposed outside the file (or module) of that declaration?

saeta · June 2, 2020, 11:47pm

Great question! (@shabalin and/or the other authors: feel free to jump in and disagree, but...) I suspect we'd want Structural to follow the same rules the the existing derived conformances use today. (i.e. you can only get an automatic conformance if you request it when defining the type.) Does that make sense?

In short, yes.

Said another way, if you would like to get a structural-based protocol implementation, you would have to declare that conformance in some place where the Structural conformance is visible. For example: if the Structural conformance is fileprivate, then all of those conformances would have to be declared within the same file. Alternatively, if the conformance is public (as all conformances to public protocols on public types are today), then those conformances could be declared in any dependent module (and you'd probably want to compile with -cross-module-optimization).

anandabits · June 3, 2020, 12:14am

I suppose it does. Other files may not even have visibility to the necessary details to synthesize the conformance.

In general I think this proposal looks really nice. I’ll try to look at the repo and offer more feedback afterwards, but time is pretty tight for me right now.

saeta · June 3, 2020, 1:51am

No worries; and thank you very much for the feedback so far!

saeta · June 3, 2020, 4:59am

Whoops, sorry for missing seeing your comment. I think the answer is that it probably could! That's a great addition to the rationale for the pitch.

DeFrenZ · June 3, 2020, 9:49am

I think that the pitch gives a valid approach to this kind of problems, but for very performance-sensible protocols like Equatable and Comparable I don't think that it will ever be viable to basically do a runtime introspection (equivalent to Mirror but better defined) as opposed to directly doing the work.

Imho automatic requirements have to be defined statically.

filip-sakel · June 3, 2020, 10:14am

Would this negate the need for SwiftUI to use metadata for managing a View’s state?

Karl · June 3, 2020, 11:27am

I'm not sure this is the best design for static reflection in Swift (which is what this is - using compiler code synthesis to build up some traversable metadata describing the value's members).

Just concentrating on the non-public conformance case for now, I think it should be possible to get equivalent performance to a hand-written implementation. I'm concerned that the long chains of generic types could lead to longer compile times (and worse -Onone/debug performance), but I guess function builders rely on the same thing.

That said, I'm not really sure I love the chain idea (with the StructuralEmpty terminator). It's painfully clear that we need variadic generics soon. It would be much nicer to express something like "a conformance for all Structurals when all their members are Equatable" like this:

extension Structural: Equatable where TypeInfo<Self>.Members<(T: Equatable)...> {
  // ...
}

Where TypeInfo<Root>.Members<T...> could be some magic static reflection type (like MemoryLayout<T>) with guaranteed optimisations.

shabalin · June 3, 2020, 1:45pm

Just to clarify the point of performance: it's not a matter of implementing completely new optimizations, but rather a matter of polishing what's already there.

As an example, a simpler structural encoding that only use Cons and Empty is easier to optimize away and offers a much closer performance to custom hand-specialized implementation (raw numbers here):

Benchmark	Baseline (1x)	Performance (more is better)
CustomEquatable	Equatable	0.8268825083
CustomHashable	Hashable	0.8791851976
CustomComparable	Hand-specialized	0.7609794796
Additive	Hand-specialized	0.8708537815
InplaceAdd	Hand-specialized	0.5392857712

We designed Structural in such as way that it's possible to statically remove the conversion to and from the encoded version. Current optimiser is already doing an extremely reasonable job here without any contributions on our side.

shabalin · June 3, 2020, 1:58pm

The proposal outlined above doesn't introduce runtime reflection, but rather aims to be an equivalent of static or compile-time reflection. While we don't include any direct static evaluation guarantees in the text, we believe this can be added in the future either through improved optimizations at SIL level and/or more explicit addition of guaranteed compiled-time evaluation to Swift language.

As outlined in my previous comment, Swift is already doing a great job at optimizing a simpler version of the encoding, so there is a some evidence that suggests that more work on optimiziations at SIL level could be sufficient to minimize (or even completely eliminate) the current performance gap.

Joe_Groff · June 3, 2020, 2:23pm

Generalizing protocol conformance synthesis is an interesting problem, but I don't think type-level metaprogramming is the way to do it. It's a path of sadness that'll lead only to bad compiler errors, slow compile times, heavy memory usage from deeply nested type metadata, and programmer frustration because the type system is not, and really should not, be expressive enough for the sorts of things people will want to do in their protocol implementations. I also fear that we're working on too many different ways of expressing the same thing—there's already a well-developed proposal for iterating through the structure of a type with key paths, for instance. Type specialization is really just a special case of compile-time evaluation, and Swift is well set up to be able to lift values between type and value level, and also constant-fold and evaluate well-designed reflection APIs at compile time so that default implementations can be written in terms of reflection and still generate optimized specialized implementation. I think that's a far more promising direction.

Karl · June 3, 2020, 2:59pm

Not for the implementations, but you at least need something at the type-system level to constrain conformances (e.g. for Equatable). By far the most common constraint is "all types conform to X", and I think any reasonable implementation of variadic generics ought to also support that. So it seems reasonable to model a type's properties as a variadic generic.

C++'s proposed static reflection doesn't make use of template metaprogramming, either (because of all the things you mentioned, particularly compile time). They have a magic type meta::info, which you can query or iterate using built-in expressions like get_data_members(info), and it's guaranteed to never escape to runtime (which we can be looser about).