Differentiating wrapped properties

dan-zheng · April 23, 2020, 4:40am

Hi folks,

I'd like to share some ideas about differentiating stored properties attributed with property wrappers.

Ideas by: @rxwei, @bartchr808, @dan-zheng, @marcrasi
Implementation: [AutoDiff] Support differentiation of wrapped properties. by dan-zheng · Pull Request #31173 · apple/swift · GitHub

Background

Differentiable programming in Swift uses the Differentiable protocol, which has compiler support for derived conformances.

The compiler synthesizes TangentVector member structs for Differentiable-conforming types based on which stored properties conform to Differentiable. Details here.

Let "wrapped (stored) properties" refer to "stored properties attributed with property wrappers".

Let "tangent (stored) properties" refer to "the stored properties synthesized in TangentVector member structs, corresponding to original stored properties that conform to Differentiable".

struct Pair<T: Differentiable, U: Differentiable>: Differentiable {
  var first: T
  var second: U

  // Compiler synthesizes:
  // struct TangentVector: Differentiable & AdditiveArithmetic {
  //   var first: T.TangentVector      // tangent property
  //   var second: U.TangentVector     // tangent property
  // }
}
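To make the synthesized pieces concrete, here is a hand-written sketch in plain Swift of roughly what the derivation produces for Pair. MiniDifferentiable is a hypothetical stand-in protocol invented for this illustration so that the snippet runs without the _Differentiation module; it is not the real Differentiable protocol, which has further requirements.

```swift
// Hypothetical stand-in for `Differentiable` (an assumption, not the real protocol).
protocol MiniDifferentiable {
  associatedtype TangentVector: AdditiveArithmetic
  mutating func move(along direction: TangentVector)
}

extension Float: MiniDifferentiable {
  typealias TangentVector = Float
  mutating func move(along direction: Float) { self += direction }
}

struct Pair<T: MiniDifferentiable, U: MiniDifferentiable>: MiniDifferentiable {
  var first: T
  var second: U

  // Written by hand here; for the real protocol the compiler synthesizes
  // one tangent property per differentiable stored property.
  struct TangentVector: AdditiveArithmetic {
    var first: T.TangentVector
    var second: U.TangentVector

    static var zero: Self { Self(first: .zero, second: .zero) }
    static func + (lhs: Self, rhs: Self) -> Self {
      Self(first: lhs.first + rhs.first, second: lhs.second + rhs.second)
    }
    static func - (lhs: Self, rhs: Self) -> Self {
      Self(first: lhs.first - rhs.first, second: lhs.second - rhs.second)
    }
  }

  // Moves each stored property along its matching tangent property.
  mutating func move(along direction: TangentVector) {
    first.move(along: direction.first)
    second.move(along: direction.second)
  }
}

var pair = Pair<Float, Float>(first: 1, second: 2)
pair.move(along: Pair<Float, Float>.TangentVector(first: 0.5, second: -1))
print(pair.first, pair.second)
```

The point of the sketch is the one-to-one correspondence: every stored property that takes part in differentiation gets a tangent property with the same name.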

Current behavior

Differentiable conformance derivation currently computes tangent properties from wrapper backing stored properties instead of wrapped stored properties. This leads to some unexpected behavior.

Let's look at an example:

import _Differentiation

// Naive property wrapper.
@propertyWrapper
struct Wrapper<Value> {
  var wrappedValue: Value
}

struct Struct {
  @Wrapper var x: Float
  // Compiler generates:
  // var _x: Wrapper<Float>
  // var x: Float {
  //   get { _x.wrappedValue }
  //   set { _x.wrappedValue = newValue }
  // }

  @Wrapper @Wrapper var y: Float
}
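The generated backing storage can be inspected in plain Swift, without any differentiation support. The sketch below initializes _x and _y directly and reports their types; the explicit initializer and the backingTypeOfX / backingTypeOfY helpers are hypothetical additions for this illustration.

```swift
@propertyWrapper
struct Wrapper<Value> {
  var wrappedValue: Value
}

struct Struct {
  @Wrapper var x: Float
  @Wrapper @Wrapper var y: Float

  // Initialize the compiler-generated backing storage directly.
  init(x: Float, y: Float) {
    _x = Wrapper(wrappedValue: x)
    _y = Wrapper(wrappedValue: Wrapper(wrappedValue: y))
  }

  // `_x` and `_y` are private, so their types are reported from inside the type.
  var backingTypeOfX: Any.Type { type(of: _x) }
  var backingTypeOfY: Any.Type { type(of: _y) }
}

let s = Struct(x: 3, y: 4)
print(s.backingTypeOfX, s.backingTypeOfY)
print(s.x, s.y)
```

The wrapped properties x and y both have type Float, while the stored properties behind them have the wrapper types. That mismatch is what the rest of this post is about.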

Currently, tangent properties are computed from wrapper backing stored properties, requiring wrapper types to conform to Differentiable:

// Wrappers must conform to `Differentiable`.
extension Wrapper: Differentiable where Value: Differentiable {}

struct Struct: Differentiable {
  @Wrapper var x: Float
  @Wrapper @Wrapper var y: Float

  // Compiler currently synthesizes:
  // struct TangentVector: Differentiable & AdditiveArithmetic {
  //   var x: Wrapper<Float>.TangentVector
  //   var y: Wrapper<Wrapper<Float>>.TangentVector
  //   ...
  // }
}

It seems weird that Wrapper<...>.TangentVector appears in the synthesized TangentVector struct, and that Wrapper must conform to Differentiable. Many property wrappers (e.g. @Lazy) are unrelated to differentiation, and it may not make sense to conform them to Differentiable.

Since the wrapped property Struct.x has type Float, one would expect the corresponding tangent property to have type Float, not Wrapper<Float>.TangentVector.

Idea

Instead, we can make Differentiable conformance derivation treat wrapped stored properties like normal stored properties, using them to compute tangent properties in TangentVector. This makes behavior consistent for normal stored properties and wrapped stored properties: one might say this is a fix rather than a new feature.

struct Struct: Differentiable {
  @Wrapper var x: Float
  @Wrapper @Wrapper var y: Float

  // New behavior:
  // struct TangentVector: Differentiable & AdditiveArithmetic {
  //   var x: Float
  //   var y: Float
  //   ...
  // }
}

This behavior seems desirable for all wrapped stored properties and property wrapper types. Whether wrapper types conform to Differentiable is now irrelevant - what matters is that wrapped properties conform to Differentiable, just like normal stored properties.

Wrapper types are required to provide a setter for var wrappedValue, which is needed to synthesize mutating func move(along:). This is consistent with existing Differentiable conformance derivation requirements.
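As a rough illustration of why the setter matters, here is a hand-written approximation of the new behavior in plain Swift. The nested TangentVector and move(along:) are written by hand rather than synthesized, and multiplyGradient is a hypothetical hand-derived gradient, so this shows only the shape of the result, not what the compiler emits.

```swift
@propertyWrapper
struct Wrapper<Value> {
  var wrappedValue: Value
}

struct Struct {
  @Wrapper var x: Float
  @Wrapper var y: Float

  // Hand-written stand-in for the synthesized tangent struct. The real one
  // also conforms to `Differentiable` and `AdditiveArithmetic`.
  struct TangentVector {
    var x: Float
    var y: Float
  }

  // Each `+=` reads and then writes `wrappedValue`,
  // so the wrapper must provide a setter.
  mutating func move(along direction: TangentVector) {
    x += direction.x
    y += direction.y
  }
}

// Hand-derived gradient of `s.x * s.y`: d/dx = y and d/dy = x.
func multiplyGradient(_ s: Struct) -> Struct.TangentVector {
  Struct.TangentVector(x: s.y, y: s.x)
}

var s = Struct(x: 3, y: 4)
let g = multiplyGradient(s)
s.move(along: g)
print(g.x, g.y, s.x, s.y)
```

If wrappedValue were get-only, the two assignments in move(along:) would not compile, which is why the derivation has the same requirement.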

Accesses to wrapped stored properties can be differentiated, as expected:

@differentiable
func multiply(_ s: Struct) -> Float {
  s.x * s.y
}
print(gradient(at: Struct(x: 3, y: 4), in: multiply))
// Struct.TangentVector(x: 4.0, y: 3.0)

Use cases

The new behavior makes differentiation work naturally for wrapped stored properties. Here's an example using non-trivial property wrappers from SE-0258:

// `@Lazy` and `@Clamping` from:
// https://github.com/apple/swift-evolution/blob/master/proposals/0258-property-wrappers.md

struct Struct: Differentiable {
  @Lazy var x: Float = 10

  @Clamping(min: -10, max: 10)
  var y: Float = 5
}

@differentiable
func multiply(_ s: Struct) -> Float {
  return s.x * s.y
}
print(gradient(at: Struct(x: 3, y: 4), in: multiply))
// Struct.TangentVector(x: 4.0, y: 3.0)

Any comments are welcome!

DevAndArtist · April 23, 2020, 5:12am

I have not read the post, sorry about that, but the title contradicts itself. Stored properties cannot be attributed with property wrappers: by the current design, the stored properties are the property wrappers themselves. Only computed properties are attributed / wrapped, and they are forbidden from having explicit get / set accessors.

@Wrapper
var property: Value // <~ technically not a stored property

var _property: Wrapper // <~ this is the stored property
var property: Value {
  get { ... }
  set { ... }
}

dan-zheng · April 23, 2020, 5:24am

I understand that is how property wrappers are implemented.

I actually intentionally wrote using user-facing language (like the Language Guide on property wrappers), focusing on the fact that @Wrapper var x: Float is syntactically a stored property attributed with a property wrapper.

Perhaps "wrapped property" is more accurate than "wrapped stored property", in both user-facing and implementation language.

DevAndArtist · April 23, 2020, 5:27am

I'd prefer that terminology because it's extensible as it also may allow us to refer to wrapped properties from extensions in the future. Such wrapped properties would not create a new stored property for the wrapper type.

But again, that's just a small nit-pick, as I still understand the discussion from the given context.

jrose · April 23, 2020, 5:31am

Hm. I'm not convinced this is necessarily a good idea. Consider a (weird) property wrapper @Squared:

struct Foo: Differentiable {
  @Squared var x: Float = 100 // stored as 10
}

Is it correct to report the gradient of operations on Foo in terms of the underlying value here?

dan-zheng · April 23, 2020, 6:02am

Thanks for the question!

I think this proposal fixes exactly the semantics in your example. For wrapped properties, differentiation should be with respect to the wrapped value, not the underlying storage.

Consider this:

@differentiable
func bar(_ foo: Foo) -> Float {
  return foo.x // implementation detail: foo._x.wrappedValue
}
print(gradient(at: Foo(x: 100), in: bar))

Full code

import Darwin
import _Differentiation

@propertyWrapper
struct Squared<Value: FloatingPoint> {
  var value: Value

  var wrappedValue: Value {
    get { value * value }
    set { value = sqrt(newValue) }
  }

  init(wrappedValue: Value) {
    self.value = sqrt(wrappedValue) // store the square root, so 100 is stored as 10
  }
}

struct Foo: Differentiable {
  @Squared var x: Float = 100 // stored as 10
}

@differentiable
func bar(_ foo: Foo) -> Float {
  return foo.x // implementation detail: foo._x.wrappedValue
}
print(gradient(at: Foo(x: 100), in: bar))

Before this proposal, the underlying storage foo._x was differentiated, which isn't desirable.

Since Squared<Float>, the type of the backing property _x, doesn't conform to Differentiable, Differentiable conformance derivation for Foo emits a warning and skips _x during TangentVector synthesis, so Foo.TangentVector is empty.

$ swift squared.swift
squared.swift:17:3: warning: stored property '_x' has no derivative because 'Squared<Float>' does not conform to 'Differentiable'; add an explicit '@noDerivative' attribute
  @Squared var x: Float = 100 // stored as 10
  ^
  @noDerivative
TangentVector()

With this proposal, the wrapped property foo.x is differentiated. gradient(at:in:) returns Foo.TangentVector(x: 1), as expected.
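The difference between the two choices can be checked numerically in plain Swift. The sketch below uses a non-generic Double version of Squared and central finite differences instead of automatic differentiation; centralDifference and storedValue are hypothetical helpers introduced only for this check.

```swift
@propertyWrapper
struct Squared {
  var value: Double  // holds the square root of the wrapped value

  var wrappedValue: Double {
    get { value * value }
    set { value = newValue.squareRoot() }
  }

  init(wrappedValue: Double) {
    self.value = wrappedValue.squareRoot()
  }
}

struct Foo {
  @Squared var x: Double = 100 // stored as 10

  // Hypothetical accessor exposing the private backing storage for this experiment.
  var storedValue: Double {
    get { _x.value }
    set { _x.value = newValue }
  }
}

func bar(_ foo: Foo) -> Double {
  foo.x
}

// Central finite-difference approximation of f'(p).
func centralDifference(_ f: (Double) -> Double, at p: Double, h: Double = 1e-3) -> Double {
  (f(p + h) - f(p - h)) / (2 * h)
}

// Derivative of `bar` with respect to the wrapped value (x = 100).
let wrtWrapped = centralDifference({ w in
  var foo = Foo()
  foo.x = w
  return bar(foo)
}, at: 100)

// Derivative of `bar` with respect to the underlying storage (value = 10).
let wrtStorage = centralDifference({ v in
  var foo = Foo()
  foo.storedValue = v
  return bar(foo)
}, at: 10)

print(wrtWrapped, wrtStorage)
```

With respect to the wrapped value the result is approximately 1, matching Foo.TangentVector(x: 1). With respect to the storage it is approximately 20, since the getter squares the stored value.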

cukr · April 23, 2020, 6:30am

Something to point out is that it would make Differentiable conformance derivation behave differently from Codable and Equatable conformance derivation.

dan-zheng · April 23, 2020, 6:32am

That's true, I noticed as well. I think it's fine for conformance derivation to behave differently for the different protocols in this case, since they have different semantics.
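For readers who want to see the Equatable side of this comparison, here is a small plain-Swift sketch. Absolute and Point are hypothetical names chosen for the illustration; the synthesized == on Point compares the backing storage _x, not the wrapped value.

```swift
@propertyWrapper
struct Absolute: Equatable {
  var raw: Int

  // The wrapped value hides the sign of the stored value.
  var wrappedValue: Int {
    get { abs(raw) }
    set { raw = newValue }
  }

  init(wrappedValue: Int) {
    raw = wrappedValue
  }
}

struct Point: Equatable {
  @Absolute var x: Int
}

let a = Point(x: -3)
let b = Point(x: 3)

let wrappedValuesEqual = a.x == b.x  // compares wrapped values
let pointsEqual = a == b             // synthesized `==` compares `_x`
print(wrappedValuesEqual, pointsEqual)
```

The wrapped values compare equal while the two Point values do not, because Equatable derivation looks at the wrapper storage. The proposal makes Differentiable derivation look at the wrapped value instead.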

Dante-Broggi · April 23, 2020, 11:02am

One thing I think should be considered is that, when I have read previous updates on Differentiation, I have always thought that @noDerivative would eventually be implemented as a property wrapper.

This update would seem to prevent that.

dan-zheng · April 23, 2020, 11:10am

For readers unfamiliar: @noDerivative can be declared on stored properties to opt them out of TangentVector synthesis.

From the manifesto:

By default, the compiler synthesizes a nested TangentVector structure type that contains the TangentVectors of all stored properties that are not marked with @noDerivative. In other words, @noDerivative makes a stored property not be included in a type's tangent vectors.

@NoDerivative could actually totally be implemented as a "pass-through" property wrapper! The compiler just needs to have knowledge of the attribute, it doesn't care whether the attribute is baked-in or custom (a property wrapper).

If made into a property wrapper, @NoDerivative would be a special case unaffected by this proposal. It would continue to have the same behavior, opting stored properties out of TangentVector synthesis.

dan-zheng · April 23, 2020, 11:30am

Here's a short summary of the OP:

If we derive a tangent property for a "syntactic stored property" (var x: T), we always want the tangent property to have type T.TangentVector.
This is true both for normal stored properties (var x: T) and wrapped properties (@Wrapper var x: T).

saeta · April 23, 2020, 6:07pm

Hmm, I could be misunderstanding here, but if we keep the current behavior, then @NoDerivative could be implemented without any compiler support or any special compiler knowledge.

Concretely, say we defined Empty as an empty struct and conformed it to the Differentiable protocol, then we could write:

@propertyWrapper
struct NoDerivative<Value>: Differentiable {
  var wrappedValue: Value
  typealias TangentVector = Empty
  mutating func move(along value: Empty) {}
}

and not require any special compiler support. Or am I missing something?

Note: we must use a special Empty struct instead of Void (which would make more sense) because Void is defined as the empty tuple, and tuples can't (yet) conform to protocols. So to work around this (temporary) language limitation, we have to define Empty which is effectively just a different spelling for Void. (Fortunately, the Swift compiler does a good job optimizing away zero-sized types like Empty and Void, so this is a zero-cost abstraction.) Sample Empty implementation:

struct Empty: Differentiable, AdditiveArithmetic {
  init() {}
  static var zero: Empty { Empty() }
  static func +(lhs: Self, rhs: Self) -> Self { Empty() }
  // AdditiveArithmetic also requires `-`.
  static func -(lhs: Self, rhs: Self) -> Self { Empty() }
}
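The zero-size claim can be checked with a plain-Swift sketch. To keep it runnable without the _Differentiation module, Empty and NoDerivative below drop their Differentiable conformances, so this is an approximation of the idea rather than the wrapper as proposed.

```swift
// Empty tangent type; conforms only to `AdditiveArithmetic` in this sketch.
struct Empty: AdditiveArithmetic {
  static var zero: Empty { Empty() }
  static func + (lhs: Empty, rhs: Empty) -> Empty { Empty() }
  static func - (lhs: Empty, rhs: Empty) -> Empty { Empty() }
}

// Pass-through wrapper whose tangent carries no information.
@propertyWrapper
struct NoDerivative<Value> {
  var wrappedValue: Value
  typealias TangentVector = Empty
  mutating func move(along direction: Empty) {}
}

let emptySize = MemoryLayout<Empty>.size
let wrapperSize = MemoryLayout<NoDerivative<Float>>.size
let floatSize = MemoryLayout<Float>.size
print(emptySize, wrapperSize, floatSize)
```

Empty occupies no storage, and wrapping a Float in NoDerivative does not change its size.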

Further, because we have retroactive differentiability implemented (this is awesome), it's totally fine that the Lazy property wrapper doesn't itself conform to Differentiable, as you can always add the conformance yourself. (For example, writing typealias TangentVector = Value seems pretty straightforward.)

The cost of this is: property wrappers that want to be used in AD contexts need to be explicitly conformed. This actually seems like a benefit, instead of a drawback. I'd also expect a (probably negligible) slightly slower compile pass, as the compiler has to compute the tangent vector type based on the property wrapper instead of special casing the tangent vector type lookup.

Am I missing something else?

dan-zheng · April 23, 2020, 6:34pm

Thanks for sharing your idea!

I'm not sure I agree that "implementing @NoDerivative without compiler support for wrapped properties" is a goal, if it hurts use cases.

The current semantics are clearer and more efficient:

- @noDerivative stored properties do not have tangent properties.
  - This is clearer than generating NoDerivative<T>.TangentVector (Empty) tangent properties.
- The compiler never does work to differentiate accesses to @noDerivative stored properties, because the accesses are never marked active by activity analysis.
  - This is more efficient than doing any work, e.g. initializing and adding Empty tangent vectors.

dan-zheng · April 23, 2020, 6:41pm

I still feel that the proposed semantic model for wrapped properties makes sense: differentiation should be with respect to wrapped values, not underlying storages.

When we differentiate accesses to syntactic stored properties s.x, we want to differentiate with respect to "x as a member of s" - how s.x is implemented (e.g. s._x.wrappedValue) is irrelevant.

In this model, it doesn't matter whether wrapper types (e.g. @Lazy, @Squared, or @NoDerivative) conform to Differentiable.

Edit: access level is another reason not to use wrapper underlying storages (s._x) for TangentVector synthesis and s.x differentiation. s._x is currently always private, so using it in user-exposed ways (in TangentVector synthesis) is an access level violation.

This supports the perspective that this proposal is really a semantic fix, not a new feature.

dan-zheng · April 23, 2020, 7:01pm

Brennan: I get the impression that you're concerned this proposal is somehow a "loss in functionality", which I don't believe is true. Details below.

I think you may be interested in Differentiable wrapper types (and thus Differentiable underlying storages like _x) as hypothetical use cases.

Question: if there are useful Differentiable wrapper types, can we represent differentiation with respect to Differentiable underlying storages in the proposed semantic model?

Self-answer: yes, we can by leveraging projected values. It's actually an access level violation to talk about underlying storages (_x), but we can talk about projected values ($x) which are intentionally provided by property wrappers.

Some property wrappers may provide Differentiable projected values - potentially even self, the underlying wrapper value - and hypothetically we may want to differentiate with respect to projected values.

Here's an example Differentiable property wrapper that projects self:

import _Differentiation

@propertyWrapper
struct Wrapper<Value: Differentiable> {
  var wrappedValue: Value

  // Wrappers may provide some differentiable projected value.
  // Let's use `self` as an example. There are use cases for projecting
  // `self`, like `@Freezable` (no real usages, unpolished):
  // https://github.com/tensorflow/swift-apis/blob/master/Sources/TensorFlow/Freezable.swift
  var projectedValue: Self { self }
}
extension Wrapper: Differentiable where Value: Differentiable {}

struct Struct: Differentiable {
  @Wrapper var x: Float
}

@differentiable
func projectedValue(_ s: Struct) -> Wrapper<Float> {
  // We should be able to differentiate projected value accesses.
  return s.$x // s._x.projectedValue
}

To enable differentiation with respect to projected values (Struct.$x), we can simply register derivatives for projected values:

extension Struct {
  @derivative(of: $x)
  func derivativeOfProjectedX(...) { ... }
}

I hope "supporting differentiation with respect to projected values" shows that this proposal isn't a loss in functionality!

The proposal is really a semantic fix. Wrapped values and projected values (the only two entry points provided by property wrappers) can both be differentiated in ways that make sense.
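Setting differentiation aside, the two entry points can be seen in a plain-Swift sketch. The wrapper below mirrors the example above but drops the Differentiable constraints so that it runs without the _Differentiation module.

```swift
@propertyWrapper
struct Wrapper<Value> {
  var wrappedValue: Value

  // Projects the wrapper itself, as in the example above.
  var projectedValue: Wrapper<Value> { self }
}

struct Struct {
  @Wrapper var x: Float
}

var s = Struct(x: 3)

// Entry point 1: the wrapped value.
let viaWrapped: Float = s.x

// Entry point 2: the projected value, available without naming `_x`.
let viaProjected: Wrapper<Float> = s.$x

s.x = 5
print(viaWrapped, viaProjected.wrappedValue, s.$x.wrappedValue)
```

Code outside Struct never refers to _x: everything goes through s.x or s.$x, which is consistent with the access level argument made earlier.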

saeta · April 25, 2020, 8:38pm

Could the compiler (optimizer) figure this out in a more general way (assuming Empty is visible to the optimizer, or rather in particular that the move(along:) method on Empty is inlinable and can thus be determined to be a no-op)?

dan-zheng · April 25, 2020, 8:45pm

Rather than relying on optimizations for efficiency, I think the current model is more robust: enforcing efficiency via user-controlled @noDerivative annotations that affect activity analysis. @noDerivative has simple guaranteed semantics that help users understand automatic differentiation behavior and performance.

@noDerivative is actually pretty general - it can be declared on declarations other than stored properties too. It lowers to a SIL [_semantics "autodiff.nonvarying"] attribute, which activity analysis recognizes.

dan-zheng · May 11, 2020, 1:37am

I chatted with @saeta offline about his concerns and I think we're on the same page now.

We're moving forward with wrapped property differentiation as proposed now: [AutoDiff] Support differentiation of wrapped properties. by dan-zheng · Pull Request #31173 · apple/swift · GitHub. Thanks all for the discussion!