SE-0379: Opt-In Reflection Metadata

ahti · December 7, 2022, 7:40am

I agree there is room for improvement over the all-or-nothing approach to including metadata we have now, but I have a number of problems with this approach that make it a -1 for me.

(whoops, bit of a wall of text, sorry )

Motivation

Let's start with the motivation section, it mentions the two underlying problems that the proposal ultimately seeks to address:

First: Metadata "may simplify reverse-engineering". In the (admittedly limited) amount of reverse-engineering Darwin apps I've done, I mostly oriented myself by symbols, so by mangled function names. I guess looking at metadata might help a little but I don't think the bit of security-by-obscurity provided by omitting it is a deal breaker or will deter many reverse-engineering attempts.

Second: Metadata "unnecessarily increases the binary size". Based on anecdotal evidence by @benpious and me back in the pitch thread the size increase seems falls somewhere between 5-10% of the binary size. The 20-40mb for Instagram (245mb download size) mentioned in this thread are a bit bigger than that, but still in the same ballpark. I think it would be good if the proposal gave a more informed picture of potential space savings in the motivation section.

So, I don't really agree with the reverse-engineering part of the motivation. Binary size on the other hand is a bit of a problem area for Swift.

Logging / String Interpolation

First of all, the proposal mentions print, debugPrint, and dump as having altered behaviour for types that are not marked Reflectable, but I do wanna point out that string interpolation is affected as well (String(reflecting:) too, but that is obvious).

Now, I log a lot of stuff in my apps/server code, both in development and production (with dynamically configured log levels). With the changes in this proposal, every time I want to include some value in a log message, I would need to remember to check whether that type is already marked Reflectable, has a custom string representation, or add the conformance if required.

If the type in question is not from my code, but from some dependency that neither marked the type Reflectable nor provided a custom string representation, I'm out of luck completely, and need to add some custom description computed variable that I use when I log that type.

So far, I'm only complaining about annoyances for me. Maybe most developers don't work that way. But there is one group, like @glessard alluded to, that in my experience uses logs a lot: learners.

When I see developers new to a language or framework struggle with something, they almost always start logging a bunch of stuff. How you log things is often among the first things you learn about a new language. I distinctly remember that when I started learning and adopting Swift, the ability to get useful descriptions of values without any extra effort was one of my favourite features. Maybe number one. Seeing how much I'm writing here it might still be :D

Requiring developers to know about reflection metadata, what it is used for, and to mark their types Reflectable for something as simple as logging a value is imo a violation of progressive disclosure for learners, and quite the annoying bit of ceremony for everyone else.

Source Compatibility

The proposal states multiple times that in opt-in-mode, in Swift 6, use of reflection API would cause compile-time errors:

For modules that have the metadata disabled, but are consumers of reflectable API, the compiler will emit the error enforcing the guarantee.

[...] code with types that has not been audited to conform to the Reflectable protocol will fail to compile if used with APIs that consume the reflection metadata.

These sentences fail to take into account this earlier part of the proposal: "We intentionally do not propose adding a Reflectable constraint on Mirror type", as well as the implication that print etc. won't require Reflectable either.

This means that, when moving an existing Swift 5 codebase to Swift 6, where that codebase uses Mirror, some abstraction over it, print and friends, string interpolation or some other package reading the metadata directly, nothing will fail to compile (unless you use them through dependencies that you updated and that were already changed to require Reflectable), but things will misbehave at runtime.

You might not even notice right away. Maybe this only materializes a month later while debugging some problem that only occurs in production. Sure would be nice to know what exactly is happening, but all you see in your logs is "SomeErrorWithVeryInformativeProperties()" and now you need to go add a comma, a "Reflectable", push a new build of the app and wait for the error to occur again before you can continue debugging. This could reasonably be called data loss.

I don't think such a silent change in behaviour is acceptable, even for a major language version. If Mirror required Reflectable and print and string interpolation required either Reflectable or Custom(Debug)?StringConvertible, the only silent changes left would be custom code reading the metadata and types that don't get reflected directly but as a child of another type. That still doesn't sound ideal but much better than the proposed version.

Conclusion

I think the drawbacks of this proposal in its current form a too big to accept. I think this is amplified by the fact that they have more impact on learners, while the proposals wins mostly (only? I don't really care about the 1-2mb it might save for our apps) benefit large code bases with large development teams, which should be well equipped with experienced developers and person-hours to figure out what tradeoff they want wrt. including metadata.

The proposed escape hatch of passing -enable-full-reflection-metadata doesn't solve the progressive disclosure problem (and to my taste skirts the line to introducing a dialect anyway).

Imo, this proposal would be much improved by keeping the default as implicitly conforming all types to Reflectable, by requiring proper parameter conformances for the types and functions mentioned above, and by making opt-in mode either a flag, or maybe possibly by using a @noReflection attribute on types (and/or individual fields?) instead.

bbrk24 · December 7, 2022, 1:44pm

This is a minor nitpick and I agree with your overall point, but don't forget TextOutputStreamable!

omax · December 7, 2022, 4:27pm

Thank you all for the feedback this is very useful!

It seems the main concern is related to enabling Opt-In mode in Swift 6 by default.
Would it address the concern of the community, if (1) the Opt-In mode would be hidden behind a flag as @ahti suggested?
(2) Conformance to Reflectable is synthesized by default for all types in a module.
(3) All reflection-consuming APIs have a requirement to conform to Reflectable

@Adrian_Prantl this is a valid concern, thanks!

As far as I understood, the best solution, in your opinion, would be to introduce a new section in a TEXT segment which would contain reflection emitted specifically for debugging purposes. dsymutil would later move it to dSYM, while the original binary would preserve the original section with reflection metadata used at runtime.

What do you think about a slightly different option that might help to untie emitting reflection from debugging options? What if the compile always emits reflection symbols, but references them in NTD only if that type is Reflectable? dsymutil would always be able to copy them to a dSYM while the linker would strip unused?

I’m a little weirded out that our second ever marker protocol is introducing dynamic casts (which are supposed to not be supported on marker protocols).

@beccadax As far as I understood casting to a marker protocol generally doesn't make sense, because it doesn't exist at runtime and is represented as Any type. But I don't see major issues allowing casts to a marker protocol if it has special handling. Could you give more details related to your concerns?

I guess looking at metadata might help a little but I don't think the bit of security-by-obscurity provided by omitting it is a deal breaker or will deter many reverse-engineering attempts.

@ahti Agree that reflection isn't a too huge threat, but IDA for instance has the functionality to read reflection metadata for Go programs to recover types' layout.

@ahti Currently Instagram doesn't have much Swift, and according to our estimations, Swift takes ~1.6-1.8x more binary size than ObjC, so I used 400Mb number in my calculations to get 40Mb of reflection metadata.

I also would like to emphasize that binary size isn't the main goal of this proposal while removing compiler options to emit invalid code is.

beccadax · December 7, 2022, 9:00pm

I suppose I’m worried about how this dilutes the concept of a marker protocol. Today, a “marker protocol” is a protocol that’s used purely for compile-time checking, with no ABI presence, no dynamic casting, and no impact on behavior at runtime. I’m a little concerned that we have only just introduced this concept and we are already starting to create exceptions.

Having said that, there is no practical obstacle here. I’m uncomfortable with the design and I’d like to express that in case this discomfort is widespread, leads to the recognition of a more serious problem, or can be addressed in a way that improves the design as a whole. But this observation is basically just a “design smell”, and I don’t see it as justifying a rejection by itself.

technogen · December 11, 2022, 3:26pm

I've been using the term "specifier protocol" to describe any protocol whose conforming types are never supposed to be instantiated and whose type metadata object itself is the interesting thing. With the features brought by Swift 5.7, working with such protocols have become incomparably easier. The most notable example would be an idea borrowed from SwiftUI:

public protocol AttributeSpecifier {

    // MARK: AttributeSpecifier - Attribute 

    associatedtype Attribute

    static var defaultAttribute: Attribute { get }
}

internal struct AttributeKey {

    // MARK: AttributeKey - SpecifierType

    internal typealias SpecifierType = any AttributeSpecifier.Type

    internal init(specifierType: SpecifierType) {
        self.specifierType = specifierType
    }
}

extension AttributeKey: Equatable {

    // MARK: Equatable

    internal static func == (_ preceding: Self, _ following: Self) -> {
        typealias ID = ObjectIdentifier
        return ID(preceding.specifierType) == ID(following.specifierType)
    }
}

extension AttributeKey: Hashable {

    // MARK: Hashable

    internal func hash(into hasher: inout Hasher) {
        typealias ID = ObjectIdentifier
        hasher.combine(ID(specifierType))
    }
}

A marker protocol would be special case of a specifier protocol.
Maybe the term "specifier" can be useful here.

dreampiggy · January 10, 2023, 3:45am

This is good for code size, but any suggestion to make this change useful for debugger with "prebuilt swift binary" ?

Current LLDB's swift typeref typesystem rely on Relefection, and it supports non-source-code-availble case, like third party binary

Encode the Reflection Medata into .swiftmodule or new files ?