StoredPropertyIterable

Joe_Groff · January 15, 2020, 7:08pm

There are probably enough breadcrumbs that you could reconstruct a name retroactively from a KeyPath. Standard library code can use the internal object layout to iterate through the components. For stored properties, it can match the stored offset back to the metadata to recover the name; for computed properties, it can make a best effort to dladdr the accessor symbols and demangle the names out of them. That could be useful for debugging and logging, but wouldn't be robust enough for general API (and in time, we want to support people intentionally jettisoning reflection and symbol info for privacy purposes).

anandabits · January 15, 2020, 10:39pm

I agree that MemoryLayout serves a different purpose. Reflection sounds like a good idea to me.

Karl · January 16, 2020, 12:39am

I like that this is using reflection rather than conformance synthesis - the fewer magic protocols we have, the better!

That said, I'm still not convinced on KeyPathIterable or KeyPathSchema<T> for your use-cases. I feel that perhaps it's assuming that all Swift code is designed for ML. In general, the particular schema an object wishes to expose depends very much on who is asking and what they're using that information for. Even differentiable types like Int and Float might not be appropriate for an ML optimiser to alter (e.g. the Int may be a file-descriptor or other kind of resource-handle).

Sure, I could use the Custom... protocol, but that's only if I can successfully debug the issue and discover that file-descriptor nested deeply in a bunch of sub-objects (some of which may be from external libraries without source available, but whose fields are still visible to the runtime).

I think you would want a domain-specific KeyPathSchema type and CustomKeyPathSchema protocol, although it would probably build on some of this functionality (e.g. you could introduce an @Optimisable property wrapper which conforms to some marker protocol, then iterate all properties which conform to that protocol).

It's nice to save the user writing boilerplate, and we should definitely look at ways to do that, but we should also be careful not to try and be too clever.

EDIT: Also, do stored properties with willSet/didSet handlers appear as "stored properties" via reflection?

dan-zheng · January 16, 2020, 1:16am

Thanks for your feedback!

I actually view "key path schemas" as a generally useful technique for property/element iteration, similar to Mirror. Some dynamic languages expose this as a top-level construct (e.g. JavaScript's Object.keys and Object.values, Python's dir builtin).

A single allKeyPaths: [PartialKeyPath<Root>] API can be used to define default implementations of other related type-safe APIs, like:

func allKeyPaths<T>(to _: T.Type) -> [KeyPath<Root, T>]: all key paths to properties/elements of a particular type.
var recursivelyAllKeyPaths: [PartialKeyPath<Root>]: recursively all key paths to nested properties/elements.
func recursivelyAllWritableKeyPaths<T>(to _: T.Type) -> [WritableKeyPath<Root, T>]: recursively all key paths to writable nested properties/elements of a particular type.
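
As a rough illustration of how a single allKeyPaths requirement can drive the typed variants, here is a minimal sketch. The protocol name KeyPathListable and the Point type are hypothetical, the schema is written by hand rather than derived by reflection, and the recursive variants are left out:

```swift
// Hypothetical protocol name; the schema below is hand-written, not reflected.
protocol KeyPathListable {
    var allKeyPaths: [PartialKeyPath<Self>] { get }
}

extension KeyPathListable {
    /// All key paths whose value type is `T`, recovered by a dynamic cast.
    func allKeyPaths<T>(to _: T.Type) -> [KeyPath<Self, T>] {
        allKeyPaths.compactMap { $0 as? KeyPath<Self, T> }
    }

    /// All writable key paths whose value type is `T`.
    func allWritableKeyPaths<T>(to _: T.Type) -> [WritableKeyPath<Self, T>] {
        allKeyPaths.compactMap { $0 as? WritableKeyPath<Self, T> }
    }
}

// Hypothetical example type.
struct Point: KeyPathListable {
    var x: Float
    var y: Float
    var label: String

    var allKeyPaths: [PartialKeyPath<Point>] {
        [\Point.x, \Point.y, \Point.label]
    }
}

var point = Point(x: 1, y: 2, label: "p")
let floatPaths = point.allKeyPaths(to: Float.self)

// Double every Float property through its writable key path.
for keyPath in point.allWritableKeyPaths(to: Float.self) {
    point[keyPath: keyPath] *= 2
}
```

The typed variants are only filters over the untyped list, so a type that customizes allKeyPaths would get consistent behavior from all of them.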

Machine learning optimization is hardly the only use case for iterating over properties of nested structures. The original tweeted use case is defining default implementations to protocols like Hashable.

I also feel most types want to provide just one single key path schema. It makes sense for Array's schema to return key paths to elements - it's hard to think of a use case for [\Array._buffer] as the schema. Using Mirror as a parallel, types can conform to CustomReflectable in only one way.
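
For reference, the one-conformance-per-type shape of CustomReflectable looks roughly like this. This is a minimal sketch; ResourceHandle and its fields are hypothetical names used only for illustration:

```swift
// Hypothetical type, used only to illustrate CustomReflectable.
struct ResourceHandle: CustomReflectable {
    var descriptor: Int32
    var path: String

    // A type has exactly one customMirror, so it must pick a single
    // presentation that every client of Mirror will see.
    var customMirror: Mirror {
        Mirror(self, children: ["path": path], displayStyle: .struct)
    }
}

let handle = ResourceHandle(descriptor: 3, path: "/tmp/example")

// Mirror(reflecting:) picks up the custom mirror, so `descriptor` is hidden.
let labels = Mirror(reflecting: handle).children.compactMap { $0.label }
print(labels)
```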

Karl · January 16, 2020, 1:51am

That's kind of my point - an object may not want to expose a file-descriptor to an ML optimiser, but it likely would want to expose that field as part of a derived Hashable or Equatable conformance. Two schemas for the same object, and which one is appropriate depends on who is asking and what they're doing with it.

Property iteration is certainly useful, but it is a big hammer. An enormous hammer, actually. But in my experience, there are very few situations where you want to iterate an arbitrary object's fields with essentially no context about their semantics. Even the type system is often not expressive enough to capture those details. Typically when you iterate properties, you just assume they must have some particular meaning because of the objects you're used to seeing in your test cases.

When this gets implemented in S4TF, I'd recommend the property wrapper + marker protocol I mentioned earlier, as protocols represent semantic information and property wrappers allow you to set that on a per-property basis (rather than a per-type basis, as a typical protocol conformance would). It's exactly the right thing to add that contextual information IMO.

dan-zheng · January 16, 2020, 9:28pm

I wonder if you feel that a property/element reflection API is significantly different from Mirror? It seems to me that your same argument could be made for Mirror:

I feel that a property/element reflection API is similar to Mirror. Types can conform to a CustomKeyPathSchema protocol, just like CustomReflectable. Domain-specific property/element iteration can be implemented on top of key path schemas (maybe via property wrappers, as you mentioned) using filtering and querying techniques.

PassiveLogic presented their KeyPathIterable extensions doing just this (querying, etc) at a Swift for TensorFlow open design meeting. Though I feel their APIs are in need of polish.

dan-zheng · January 16, 2020, 9:52pm

Karl:

Property iteration is certainly useful, but it is a big hammer. An enormous hammer, actually. But in my experience, there are very few situations where you want to iterate an arbitrary object's fields with essentially no context about their semantics. Even the type system is often not expressive enough to capture those details. Typically when you iterate properties, you just assume they must have some particular meaning because of the objects you're used to seeing in your test cases.

When this gets implemented in S4TF, I'd recommend the property wrapper + marker protocol I mentioned earlier, as protocols represent semantic information and property wrappers allow you to set that on a per-property basis (rather than a per-type basis, as a typical protocol conformance would). It's exactly the right thing to add that contextual information IMO.

These are fair points. A single custom schema may not fit all use cases, counter to my earlier point:

I think this criticism then also applies to Mirror, since types can only provide one var customMirror: Mirror via CustomReflectable.

Let me try to think of examples where a single custom schema is insufficient:

1. Stored property iteration. Maybe I want to only iterate over stored properties, and thus I want Array's schema to be [\Array._buffer] instead of indices.map { \Array[$0] }.
2. Iterating over properties/elements with certain semantics in addition to a particular type. @Karl made this point.

A concrete example of (2) is machine learning optimizers, which want to iterate over "trainable parameters of a particular type".

Consider a deep learning layer which defines "parameter" stored properties (to be trained) and auxiliary stored properties (not to be trained) of the same type:

// Adapted from: https://github.com/tensorflow/swift-apis/blob/b89263188fae67ac87c84b85d69e1b805ad2d612/Sources/TensorFlow/Layers/Normalization.swift#L24

/// Normalizes inputs to have a mean close to `0` and a standard deviation
/// close to `1`.
class BatchNorm: Layer {
    /// The offset value, also known as beta. Parameter, to be trained!
    var offset: Tensor<Float>
    /// The scale value, also known as gamma. Parameter, to be trained!
    var scale: Tensor<Float>
    ...

    /// The running mean. Auxiliary property, not to be trained!
    var runningMean: Tensor<Float>
    /// The running variance. Auxiliary property, not to be trained!
    var runningVariance: Tensor<Float>

    /// Returns the output obtained from applying the layer to the given input.
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> { ... }
}

A machine learning optimizer might try to update all properties via batchNorm.recursivelyAllWritableKeyPaths(to: Tensor<Float>.self). However, this includes the auxiliary properties runningMean and runningVariance, which is not desirable.

Solution ideas:

1. BatchNorm could provide a custom key path schema excluding runningMean/runningVariance.
   - Caveat: but other key path reflection users might actually want access to runningMean/runningVariance key paths.
2. Another nasty workaround is to change the type of runningMean/runningVariance to something else (e.g. Wrapper<Tensor<Float>>).
   - Caveat: this smells. Reflection APIs shouldn't require users to change their code to work nicely.
3. We could make the reflection API use a stored-property-based key schema for every type. Domain-specific schemas can simply filter stored properties.
   - Caveat: this breaks down for Array and Dictionary. Their schemas should (arguably) provide key paths to elements and values.
4. We could design a reflection API that supports multiple key schemas per type. This is what @Karl suggests.
   - This is a new design question!
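
To make the last idea a bit more concrete, here is one possible shape for "multiple schemas per type". It is only a sketch under assumptions: SchemaPurpose, MultiSchemaKeyPathIterable, and ToyBatchNorm are hypothetical names, plain Float stands in for Tensor<Float>, and a closed enum of purposes is just one of several ways to key the schemas:

```swift
// Hypothetical: the "who is asking" part of the query.
enum SchemaPurpose {
    case hashing
    case training
}

// Hypothetical protocol: one schema per purpose instead of one per type.
protocol MultiSchemaKeyPathIterable {
    func keyPaths(for purpose: SchemaPurpose) -> [PartialKeyPath<Self>]
}

// Toy stand-in for BatchNorm, with Float instead of Tensor<Float>.
struct ToyBatchNorm: MultiSchemaKeyPathIterable {
    var offset: Float = 0
    var scale: Float = 1
    var runningMean: Float = 0
    var runningVariance: Float = 1

    func keyPaths(for purpose: SchemaPurpose) -> [PartialKeyPath<ToyBatchNorm>] {
        switch purpose {
        case .hashing:
            // Everything participates in equality and hashing.
            return [\ToyBatchNorm.offset, \ToyBatchNorm.scale,
                    \ToyBatchNorm.runningMean, \ToyBatchNorm.runningVariance]
        case .training:
            // Only the trainable parameters are exposed to an optimizer.
            return [\ToyBatchNorm.offset, \ToyBatchNorm.scale]
        }
    }
}

var layer = ToyBatchNorm()

// A pretend optimizer step: touch only what the training schema exposes.
for case let keyPath as WritableKeyPath<ToyBatchNorm, Float> in layer.keyPaths(for: .training) {
    layer[keyPath: keyPath] += 1
}
```

A closed enum does not let third parties add purposes of their own; keying schemas by a marker type or protocol would be more open, at the cost of more machinery.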

Karl · January 16, 2020, 11:28pm

My interpretation of Mirror is that it is also domain-specific; specifically, it is designed for debugging and visualisation rather than as a general-purpose property inspection tool. The documentation makes many, many references to the fact that Mirror is for display purposes:

(I have to use emojis to add emphasis to the doc comments; bold doesn't work...)

/// A representation of the substructure ➡️ and display style ⬅️ of an instance of
/// any type.
///
/// A mirror describes the parts that make up a particular instance, such as
/// the instance's stored properties, collection or tuple elements, or its
/// active enumeration case. ➡️ Mirrors also provide a "display style" property
/// that suggests how this mirror might be rendered. ⬅️
///
/// ➡️ Playgrounds and the debugger use the `Mirror` type to display
/// representations of values of any type. For example, when you pass an
/// instance to the `dump(_:_:_:_:)` function, a mirror is used to render that
/// instance's runtime contents. ⬅️

It may also be used for other things if it incidentally happens to give you the same data elements, but I think it's clear that anybody writing a CustomReflectable conformance is looking to customise their debug presentation, and is not really considering what they expose to ML optimisers or implementations of Hashable. So it is really domain-specific IMO.

The example you showed does a great job of illustrating what I was talking about. My suggested solution though would be to do something like this:

protocol TrainableProperty {} // marker protocol.

@propertyWrapper
struct Trainable<T>: TrainableProperty {
  // ...
}

struct BatchNorm: Layer {
    @Trainable public var offset: Tensor<Float>
    @Trainable public var scale: Tensor<Float>

    public var runningMean: Tensor<Float>
    public var runningVariance: Tensor<Float>
}

And then something like:

batchNorm.recursivelyAllWritableKeyPaths(to: TrainableProperty.self)

Should give us all the trainable properties, as erased instances of Trainable<T>. Basically, we would have just invented custom attributes for properties.
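
The key-path query above does not exist yet, but the marker-protocol filtering can be approximated today with Mirror, which exposes a wrapped property's backing storage as a child labelled with a leading underscore. A minimal sketch, assuming a toy layer with [Float] in place of Tensor<Float>; ToyBatchNorm and trainablePropertyNames(of:) are hypothetical names:

```swift
protocol TrainableProperty {} // marker protocol.

@propertyWrapper
struct Trainable<T>: TrainableProperty {
    var wrappedValue: T
}

// Toy stand-in for BatchNorm, with [Float] instead of Tensor<Float>.
struct ToyBatchNorm {
    @Trainable var offset: [Float] = [0, 0]
    @Trainable var scale: [Float] = [1, 1]

    var runningMean: [Float] = [0, 0]
    var runningVariance: [Float] = [1, 1]
}

/// Names of the properties whose storage conforms to the marker protocol.
func trainablePropertyNames(of value: Any) -> [String] {
    Mirror(reflecting: value).children.compactMap { child in
        guard child.value is TrainableProperty, let label = child.label else {
            return nil
        }
        // Wrapper storage is reflected as "_offset", "_scale", and so on.
        return label.hasPrefix("_") ? String(label.dropFirst()) : label
    }
}

let names = trainablePropertyNames(of: ToyBatchNorm())
print(names)
```

This only reads the properties; updating them in place would still need writable key paths, which is what the proposed API would add.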

You might say that these attributes are burdensome, but then, SwiftUI does it for @State and people seem to be happy with it.

Troy_Harvey · February 7, 2020, 3:11pm

Indeed. There is a difference between internal use, and external needs. We've been reshaping the APIs for more general use before open sourcing the library for discussion... hopefully in the next couple weeks. There is talk of a followup S4TF discussion where we can get more feedback.

For example, we had to change the APIs of iterables. Currently the Swift API is a combinatorial set of method names, like allKeyPaths(), recursivelyAllKeyPaths(), allWritableKeyPaths(), etc. This obviously doesn't scale to more complex usage. We changed it to a single query with optional properties. This is so fundamental that we started with .filter(), but decided in the interim to use .broadcastKeyPath(), which is more in the shape of the existing Swift allKeyPaths() but more verbose.

Another point is that there may also be a need to expand on types. We included property caching right in our broadcastKeypath type, since having more types wasn't viewed as valuable internally and this was our fundamental need. Arguably, for a general library you may want these separated into two types.

There is certainly a lot of opportunity to poke at these APIs.

avanishr · December 31, 2021, 9:14am

Interesting thought. I was curious whether an inverse operation of KeyPath.append could have been used to link back to the originating KeyPath. It would be great if KeyPath could provide a removeLastPath method that returns an optional KeyPath with the same Root.

A simpler alternative could be to define recursivelyAllStoredProperties as a two-dimensional array, which shows the relationships between the various PartialKeyPaths better:

Bar.recursivelyAllStoredProperties
// => [[\Bar.foo, \Bar.foo.x], [\Bar.foo, \Bar.foo.y], [\Bar.z], [\Bar.w]]
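
The two-dimensional shape can be tried out by hand with the existing appending(path:) API. This is only a sketch: Foo and Bar are given assumed definitions, the chains are written manually rather than produced by a recursivelyAllStoredProperties implementation, and looking up the parent is done on the array because KeyPath itself has no inverse of appending:

```swift
// Assumed definitions for the Foo and Bar used above.
struct Foo { var x = 0; var y = 0 }
struct Bar { var foo = Foo(); var z = 0; var w = 0 }

let fooPath = \Bar.foo
// appending(path:) composes key paths; the standard library has no inverse.
let fooX = fooPath.appending(path: \Foo.x)
let fooY = fooPath.appending(path: \Foo.y)

// Each inner array lists the path to a leaf, from the outermost parent down.
let chains: [[PartialKeyPath<Bar>]] = [
    [fooPath, fooX],
    [fooPath, fooY],
    [\Bar.z],
    [\Bar.w],
]

// The "remove last component" lookup becomes plain array indexing.
let parents = chains.map { $0.dropLast().last }

// Reading through a composed key path works like any other key path.
let bar = Bar(foo: Foo(x: 7, y: 8), z: 1, w: 2)
let xValue = bar[keyPath: fooX]
```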