SE-0379: Opt-In Reflection Metadata

This is a valid framing, but it’s not just GUI apps. Anybody distributing their program in any capacity has a reason to care about code size, including Docker containers on servers, closed-source libraries on iOS (and eventually Windows?), anyone using a build farm (which, admittedly, is not well-supported right now), and experiments with using Swift in resource-constrained environments, such as embedded or wasm. On the flip side, you have command-line tools using print on non-CustomStringConvertible types to get something human-readable, not even reliably parseable. My impression of the relative size of these aggregate groups is the opposite of what your framing implies with “enormous” and “everyday”.

I personally think tying this to a language version is the right compromise. Swift 5 mode is unlikely to go away for a long, long time.

A thought: should the runtime warn, once per type, if someone attempts to print a type without the appropriate metadata? It’s not much of a signal but it might help, similar to how NSKeyedArchiver warns if you archive a class without a stable runtime name.
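For illustration, the once-per-type bookkeeping could be as small as the sketch below. `OncePerTypeWarner` and `shouldWarn(about:)` are hypothetical names made up for this example; nothing like them exists in the runtime or in the proposal, and a real implementation would also need to be thread-safe.

```swift
// Hypothetical sketch: remembers which types have already triggered a warning.
struct OncePerTypeWarner {
    // ObjectIdentifier gives each metatype a stable, hashable identity.
    private var warned = Set<ObjectIdentifier>()

    /// Returns true only the first time it is called for a given type.
    mutating func shouldWarn(about type: Any.Type) -> Bool {
        warned.insert(ObjectIdentifier(type)).inserted
    }
}

struct Dog {}

var warner = OncePerTypeWarner()
for _ in 0..<3 {
    // Only the first iteration prints the warning.
    if warner.shouldWarn(about: Dog.self) {
        print("warning: printing \(Dog.self) without reflection metadata")
    }
}
```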

I think this proposal solves a legitimate issue for apps that want to strip reflection metadata: it introduces the Reflectable protocol, which library authors can use to ensure that reflection metadata is available at runtime for their APIs. What confuses me is that the proposal solves this issue, but then wants to completely flip our defaults for Swift 6. I agree with Guillaume here that this is a very large behavior difference and might actually be a large source break for some programs.

Just to display the behavioral difference of print that will come with this proposal in Swift 6:

struct Dog {
    let name: String
    let age: Int
}

let sparky = Dog(name: "Sparky", age: 2)

// Debug: "Dog(name: "Sparky", age: 2)"
// Release: "Dog()"
print(sparky)

But print doesn't have to write to stdout. You can write applications that look like this:

var dogString = ""
print(sparky, to: &dogString)

// "Dog(name: "Sparky", age: 2)"
print(dogString)

Granted, this use case may be less common than the former, but the documentation for this feature doesn't say whether this format is stable or unstable. Given that this output hasn't changed for a long time (unless you explicitly change it yourself by adding a CustomStringConvertible conformance), I think this would break a lot of code.
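As a minimal sketch of the CustomStringConvertible route mentioned above: with a hand-written description, the text comes from the conformance rather than from reflection metadata, so it should be the same under either default. The format string here simply mimics today's default output and is my own choice for the example.

```swift
struct Dog: CustomStringConvertible {
    let name: String
    let age: Int

    // Hand-written description, so the output does not depend on reflection metadata.
    var description: String {
        "Dog(name: \"\(name)\", age: \(age))"
    }
}

let sparky = Dog(name: "Sparky", age: 2)

var dogString = ""
// terminator: "" keeps the trailing newline out of the captured string.
print(sparky, terminator: "", to: &dogString)
// dogString is now: Dog(name: "Sparky", age: 2)
```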

I agree, but they already have control over their code size by stripping this info themselves. That is what prompted this proposal: there is currently no way for APIs to declare that they require this information at runtime (which this proposal solves beautifully).

I’ve been thinking why this is sticking with me, and ultimately I think it comes down to wanting good defaults. If we think 80% of projects benefit from this, it would stink if 80% of projects had to add an extra config line to their build commands, package manifest, or Xcode project. On the other hand, if we think only 20% of projects benefit from this, that’s much less motivating, and the benefit to being “batteries-included” for unconfigured projects is more important.

I’m really excited for this! Are there any estimates of the expected binary-size reductions?

What will the output of String(describing:), print(_:), debugPrint(_:) and dump(_:) look like for structs, enums and classes that have no reflection metadata and no CustomStringConvertible/CustomDebugStringConvertible conformance?

e.g.

struct Foo: Reflectable {
    enum Bar {
        case a
        case b(Int)
        case c(Baz)
    }
    struct Baz {
        var d: String
    }
    var bar: Bar
}
print(Foo(bar: .a))
print(Foo(bar: .b(1)))
print(Foo(bar: .c(Foo.Baz(d: "Hello"))))

Note that only Foo conforms to Reflectable.


I'm a bit worried about custom Error types. Have you considered letting Error inherit from Reflectable?
But even then I'm a bit concerned that one might forget to mark all types stored in the custom errors with Reflectable as well (e.g. the Foo type above). The new defaults in Swift 6 reduce the available information drastically if we are not careful and mark all types used as stored properties. Only at runtime if you print/log such an error you will notice that some information is not available to you.

Maybe we could have something that requires all stored property types to conform to Reflectable as well (similar to what Sendable does), or let nested types inherit the conformance automatically.

In SwiftNIO we often wrap our enum based errors in a struct to make it possible to add more cases without breaking API. e.g.

Error inherits from Sendable, and all non-public types are automatically Sendable if the conformance can be synthesized. Therefore no change was required and NIOHTTPObjectAggregatorError automatically conforms to Sendable. If any stored property did not conform to Sendable, we would get a compile-time warning, or an error in Swift 6.
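To make the pattern concrete, here is a hypothetical sketch of an enum-backed error wrapped in a struct. `AggregatorError` and its cases are invented for this example and are not the actual SwiftNIO source. Because Error inherits from Sendable, the compiler checks the stored property at compile time, which is the kind of checking being asked for with Reflectable.

```swift
// Hypothetical error type illustrating the struct-wrapping-an-enum pattern.
public struct AggregatorError: Error, Equatable {
    // The enum stays private, so cases can be added later without breaking API.
    private enum Base {
        case frameTooLong
        case connectionClosed
    }

    // Checked for Sendable by the compiler because Error inherits from Sendable.
    private var base: Base

    private init(_ base: Base) {
        self.base = base
    }

    // Public "cases" are exposed as static members instead of enum cases.
    public static let frameTooLong = AggregatorError(.frameTooLong)
    public static let connectionClosed = AggregatorError(.connectionClosed)
}

let error: Error = AggregatorError.frameTooLong
// Under the proposed Swift 6 default, printing this error would only show
// the stored `base` if the types involved were marked Reflectable.
print(error)
```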

One example of such a custom error type, which sadly can't conform to Sendable (the full explanation is in the source code at the very bottom), is our generic VerificationError.

The compiler made us aware of this issue through a warning at compile time but also gave us a way to silence the warning through @unchecked Sendable.

I would think that a lot of the projects that would be marred by this change in the reflection metadata defaults would be ones that have little to no visibility. Personal projects and learner's projects among others. Projects that need the code size reduction will have the tools to achieve it, regardless of whether the default is flipped. However, others who don't need that would then have to change their code (and add extra ceremony) to retain the convenience they currently have.

I haven't tested using the actual implementation, but I imagine it'll print something like the following:

Foo(bar: output.Foo.Bar)
Foo(bar: output.Foo.Bar)
Foo(bar: output.Foo.Bar)

Just to display the behavioral difference of print that will come with this proposal in Swift 6:

struct Dog {
    let name: String
    let age: Int
}

let sparky = Dog(name: "Sparky", age: 2)

// Debug: "Dog(name: "Sparky", age: 2)"
// Release: "Dog()"
print(sparky)

This example isn't quite right, because in Swift 6, if a type doesn't conform to Reflectable, reflection metadata emitted for debugging wouldn't be accessible through the nominal type descriptor. So it would look like this, even if the debugger has access to reflection metadata:

// Debug: "Dog()"
// Release: "Dog()"

But if you wanted to emphasize the difference between language versions, then yes, it would look like this:

// Swift 5: "Dog(name: "Sparky", age: 2)"
// Swift 6: "Dog()"

I agree with Jordan that the current default behavior may not benefit most developers who just want their apps to be safe and small.

I can't come up with many cases when code like this would be useful for real-life applications.

var dogString = ""
print(sparky, to: &dogString)

In my opinion, the internal state of variables should be used only for debugging, and no logic should depend on it. (I may be wrong, so any examples are welcome)

Type names are a slightly different case, but they are available without reflection. For instance, in the iOS world, an API like String(describing:) might be used to register a UICollectionViewCell class with the runtime, but I don't think the absence of reflection would break this code.
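A small sketch of that registration idiom, using a plain class as a stand-in so it runs without UIKit (`ProfileCell` is a made-up name). The string comes from describing the metatype, not an instance, which per the point above is expected to keep working without reflection metadata.

```swift
// Stand-in for a UICollectionViewCell subclass; hypothetical name.
final class ProfileCell {}

// Describing the metatype yields the unqualified type name.
let reuseIdentifier = String(describing: ProfileCell.self)
// reuseIdentifier is "ProfileCell"

// In a UIKit app this string would typically be passed to something like
// collectionView.register(ProfileCell.self, forCellWithReuseIdentifier: reuseIdentifier)
print(reuseIdentifier)
```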

Those reflection-consuming APIs do not provide any guarantees about their output and make the best effort to do their job. If a developer chooses to depend on their output, they implicitly take risks that the output may change. (More information might be printed, the output may change, etc)

I think the main point here is that some percentage of code might be broken in the short term, but in the long term, developers will gain an opportunity to make sure that this code will never break because of the absence of reflection. And Swift 6 might be the best time to achieve this, because developers will need to migrate their codebases to this major version of the language anyway.

This is quite difficult to say, but according to our rough estimates, if most of Instagram were written in Swift, safe stripping of reflection metadata would reduce binary size by ~20-40 MB.

Projects that need the code size reduction will have the tools to achieve it, regardless of whether the default is flipped.

I see defaulting to opt-in mode as a trade-off between short-term convenience and long-term efficiency of the language.
Yes, if full reflection is enabled by default in Swift 6, it won't break code that depends on it for some reason, but the majority of apps that don't want this won't get any benefit from this feature.
And we will miss the opportunity to do this at little cost (with a major language version migration and runtime warnings).

So the idea of enabling Opt-In mode by default comes from the understanding that Swift should be safe and efficient by default if it is possible to achieve.

I would think that a lot of the projects that would be marred by this change in the reflection metadata defaults would be ones that have little to no visibility. Personal projects and learner's projects among others.

They won't have to change the source code in any way; this is just a matter of adding the -enable-full-reflection-metadata flag.

Flipping sides for a moment here, “debugging” includes debugging issues in production reported via logs from users (hence why Error was highlighted above, but not the only such case). Existing code might be dumping a struct into a log via default stringification.

(But IMO that code will continue using Swift 5 mode.)

  • What is your evaluation of the proposal?

I find some of the wording in this proposal to be misleading and have strong concerns about this proposal.

Changes for debugging

Since Reflection metadata might be used by the debugger, we propose to always keep that metadata if full emission of debugging information is enabled (with -gdwarf-types or -g flags). However, such Reflection metadata won't be accessible through the nominal type descriptor which will avoid inconsistencies in API behavior between Release and Debug modes.

The wording "might" suggests that reflection metadata is optional for the debugger, but that's not the case. The debugger depends on reflection metadata for any operation that involves displaying any variables or function arguments.

On a more technical note, I see two problems with the proposed mechanism of tying the emission of reflection metadata to the presence of the -g flag:

The presence of debugging options is not supposed to affect code generation (i.e., the contents of the __TEXT segment). This is a requirement the Swift compiler has inherited from the LLVM project.

More importantly though, this suggestion conflates the distinction between unoptimized development builds, builds with assertions, and release builds with the presence of debug info. Many software developers build their releases with full debug info, but just don't ship the debug info together with the application. For example, on Darwin the debug info is linked separately from the binary into a .dSYM bundle, which makes this process very natural. Because debug info isn't allowed to affect code generation, this can be done without affecting the performance of the binary.

A proper design for this feature would partition the reflection metadata into metadata that goes into the binary and metadata that goes into the debug info. The dsymutil utility currently implements an all-or-nothing variant of this: it allows all reflection metadata to be copied into the .dSYM bundle together with the debug info, so the binary can be linked without the reflection metadata while still preserving 100% debuggability.

To summarize: I believe this proposal should be reworked to allow for all of the reflection metadata to be salvaged for debugging purposes. I am happy to help come up with a proper design for this.

At first glance, I generally support this feature, but I’m a little weirded out that our second ever marker protocol is introducing dynamic casts (which are supposed to not be supported on marker protocols). On the other hand, if we want that kind of runtime check for reflectability, the alternative is to provide it with functions or static methods, and I’m not really convinced that’s any better.

I agree there is room for improvement over the all-or-nothing approach to including metadata we have now, but I have a number of problems with this approach that make it a -1 for me.

(Whoops, bit of a wall of text, sorry.)

Motivation

Let's start with the motivation section, it mentions the two underlying problems that the proposal ultimately seeks to address:

First: Metadata "may simplify reverse-engineering". In the (admittedly limited) amount of reverse-engineering of Darwin apps I've done, I mostly oriented myself by symbols, that is, by mangled function names. I guess looking at metadata might help a little, but I don't think the bit of security-by-obscurity provided by omitting it is a deal breaker or will deter many reverse-engineering attempts.

Second: Metadata "unnecessarily increases the binary size". Based on anecdotal evidence from @benpious and me back in the pitch thread, the size increase seems to fall somewhere between 5-10% of the binary size. The 20-40 MB for Instagram (245 MB download size) mentioned in this thread is a bit more than that, but still in the same ballpark. I think it would be good if the proposal gave a more informed picture of potential space savings in the motivation section.

So, I don't really agree with the reverse-engineering part of the motivation. Binary size on the other hand is a bit of a problem area for Swift.

Logging / String Interpolation

First of all, the proposal mentions print, debugPrint, and dump as having altered behaviour for types that are not marked Reflectable, but I do wanna point out that string interpolation is affected as well (String(reflecting:) too, but that is obvious).
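To illustrate that point, here is a minimal sketch showing that interpolation, String(describing:) and print(_:to:) all produce the same text for a type without a custom description, so all of them would change together. `Point` is a made-up type, and the output noted in the comment is what today's default (full reflection metadata) produces.

```swift
// A plain struct with no CustomStringConvertible conformance.
struct Point {
    var x: Int
    var y: Int
}

let point = Point(x: 1, y: 2)

let interpolated = "\(point)"              // string interpolation
let described = String(describing: point)  // explicit conversion

var printed = ""
print(point, terminator: "", to: &printed) // print into a string

// With full reflection metadata, all three read: Point(x: 1, y: 2)
print(interpolated, described, printed, separator: "\n")
```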

Now, I log a lot of stuff in my apps/server code, both in development and production (with dynamically configured log levels). With the changes in this proposal, every time I want to include some value in a log message, I would need to remember to check whether that type is already marked Reflectable, has a custom string representation, or add the conformance if required.

If the type in question is not from my code, but from some dependency that neither marked the type Reflectable nor provided a custom string representation, I'm out of luck completely, and need to add some custom description computed variable that I use when I log that type.

So far, I'm only complaining about annoyances for me. Maybe most developers don't work that way. But there is one group, like @glessard alluded to, that in my experience uses logs a lot: learners.

When I see developers new to a language or framework struggle with something, they almost always start logging a bunch of stuff. How you log things is often among the first things you learn about a new language. I distinctly remember that when I started learning and adopting Swift, the ability to get useful descriptions of values without any extra effort was one of my favourite features. Maybe number one. Seeing how much I'm writing here it might still be :D

Requiring developers to know about reflection metadata, what it is used for, and to mark their types Reflectable for something as simple as logging a value is imo a violation of progressive disclosure for learners, and quite the annoying bit of ceremony for everyone else.

Source Compatibility

The proposal states multiple times that in opt-in-mode, in Swift 6, use of reflection API would cause compile-time errors:

For modules that have the metadata disabled, but are consumers of reflectable API, the compiler will emit the error enforcing the guarantee.

[...] code with types that has not been audited to conform to the Reflectable protocol will fail to compile if used with APIs that consume the reflection metadata.

These sentences fail to take into account this earlier part of the proposal: "We intentionally do not propose adding a Reflectable constraint on Mirror type", as well as the implication that print etc. won't require Reflectable either.

This means that, when moving an existing Swift 5 codebase to Swift 6, where that codebase uses Mirror, some abstraction over it, print and friends, string interpolation or some other package reading the metadata directly, nothing will fail to compile (unless you use them through dependencies that you updated and that were already changed to require Reflectable), but things will misbehave at runtime.
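As a small sketch of the kind of code this affects: the snippet below uses Mirror with no Reflectable constraint anywhere, so it would keep compiling either way. The result noted in the comment is today's behaviour with full reflection metadata; under the proposed opt-in default, my understanding from the proposal is that a type not marked Reflectable would expose no children here.

```swift
struct Dog {
    let name: String
    let age: Int
}

let sparky = Dog(name: "Sparky", age: 2)

// Mirror accepts any value; nothing here requires a protocol conformance.
let mirror = Mirror(reflecting: sparky)

// Collect the stored property labels that reflection exposes.
let labels = mirror.children.compactMap { $0.label }
// Today, with full reflection metadata: ["name", "age"]
print(labels)
```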

You might not even notice right away. Maybe this only materializes a month later while debugging some problem that only occurs in production. Sure would be nice to know what exactly is happening, but all you see in your logs is "SomeErrorWithVeryInformativeProperties()" and now you need to go add a comma, a "Reflectable", push a new build of the app and wait for the error to occur again before you can continue debugging. This could reasonably be called data loss.

I don't think such a silent change in behaviour is acceptable, even for a major language version. If Mirror required Reflectable and print and string interpolation required either Reflectable or Custom(Debug)?StringConvertible, the only silent changes left would be custom code reading the metadata and types that don't get reflected directly but as a child of another type. That still doesn't sound ideal but much better than the proposed version.
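One way such an either-or requirement could be approximated is with a pair of overloads, sketched below. `HypotheticalReflectable` and `describe(_:)` are placeholders invented for this example, not the proposal's protocol or any standard library API. A type that conforms to both protocols would make the call ambiguous, so a real design would need a tie-breaker.

```swift
// Placeholder for the proposed marker protocol; hypothetical.
protocol HypotheticalReflectable {}

// Overload for types that supply their own description.
func describe<T: CustomStringConvertible>(_ value: T) -> String {
    value.description
}

// Overload for types that opted in to reflection.
func describe<T: HypotheticalReflectable>(_ value: T) -> String {
    String(describing: value)
}

struct Temperature: CustomStringConvertible {
    var celsius: Double
    var description: String { "\(celsius) degrees C" }
}

struct Dog: HypotheticalReflectable {
    let name: String
}

struct Opaque {}

let fromDescription = describe(Temperature(celsius: 21.5)) // uses the hand-written description
let fromReflection = describe(Dog(name: "Sparky"))         // relies on reflection metadata
// describe(Opaque())  // would not compile: Opaque satisfies neither overload
print(fromDescription, fromReflection, separator: "\n")
```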

Conclusion

I think the drawbacks of this proposal in its current form are too big to accept. I think this is amplified by the fact that they have more impact on learners, while the proposal's wins mostly (only? I don't really care about the 1-2 MB it might save for our apps) benefit large code bases with large development teams, which should be well equipped with experienced developers and person-hours to figure out what tradeoff they want with respect to including metadata.

The proposed escape hatch of passing -enable-full-reflection-metadata doesn't solve the progressive disclosure problem (and to my taste skirts the line to introducing a dialect anyway).

Imo, this proposal would be much improved by keeping the default as implicitly conforming all types to Reflectable, by requiring proper parameter conformances for the types and functions mentioned above, and by making opt-in mode either a flag, or maybe possibly by using a @noReflection attribute on types (and/or individual fields?) instead.

This is a minor nitpick and I agree with your overall point, but don't forget TextOutputStreamable!

Thank you all for the feedback, this is very useful!

It seems the main concern is about enabling Opt-In mode by default in Swift 6.
Would it address the community's concerns if:
(1) Opt-In mode were hidden behind a flag, as @ahti suggested;
(2) conformance to Reflectable were synthesized by default for all types in a module; and
(3) all reflection-consuming APIs required conformance to Reflectable?

@Adrian_Prantl this is a valid concern, thanks!

As far as I understand, the best solution, in your opinion, would be to introduce a new section in the __TEXT segment containing reflection metadata emitted specifically for debugging purposes. dsymutil would later move it to the dSYM, while the original binary would keep the original section with the reflection metadata used at runtime.

What do you think about a slightly different option that might help to untie emitting reflection metadata from debugging options? What if the compiler always emits reflection symbols, but references them in the nominal type descriptor (NTD) only if the type is Reflectable? dsymutil would always be able to copy them to a dSYM, while the linker would strip the unused ones.

I’m a little weirded out that our second ever marker protocol is introducing dynamic casts (which are supposed to not be supported on marker protocols).

@beccadax As far as I understood casting to a marker protocol generally doesn't make sense, because it doesn't exist at runtime and is represented as Any type. But I don't see major issues allowing casts to a marker protocol if it has special handling. Could you give more details related to your concerns?

I guess looking at metadata might help a little but I don't think the bit of security-by-obscurity provided by omitting it is a deal breaker or will deter many reverse-engineering attempts.

@ahti Agreed that reflection isn't too big a threat, but IDA, for instance, can read reflection metadata in Go programs to recover type layouts.

@ahti Currently Instagram doesn't have much Swift, and according to our estimates, Swift takes ~1.6-1.8x more binary size than ObjC, so I used a figure of 400 MB in my calculations to get 40 MB of reflection metadata.

I would also like to emphasize that binary size isn't the main goal of this proposal; removing compiler options that can produce invalid code is.

I suppose I’m worried about how this dilutes the concept of a marker protocol. Today, a “marker protocol” is a protocol that’s used purely for compile-time checking, with no ABI presence, no dynamic casting, and no impact on behavior at runtime. I’m a little concerned that we have only just introduced this concept and we are already starting to create exceptions.

Having said that, there is no practical obstacle here. I’m uncomfortable with the design and I’d like to express that in case this discomfort is widespread, leads to the recognition of a more serious problem, or can be addressed in a way that improves the design as a whole. But this observation is basically just a “design smell”, and I don’t see it as justifying a rejection by itself.

I've been using the term "specifier protocol" to describe any protocol whose conforming types are never supposed to be instantiated and whose type metadata object itself is the interesting thing. With the features brought by Swift 5.7, working with such protocols has become incomparably easier. The most notable example would be an idea borrowed from SwiftUI:

public protocol AttributeSpecifier {

    // MARK: AttributeSpecifier - Attribute 

    associatedtype Attribute

    static var defaultAttribute: Attribute { get }
}

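internal struct AttributeKey {

    // MARK: AttributeKey - SpecifierType

    internal typealias SpecifierType = any AttributeSpecifier.Type

    // The metatype this key wraps; its identity drives equality and hashing.
    internal let specifierType: SpecifierType

    internal init(specifierType: SpecifierType) {
        self.specifierType = specifierType
    }
}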

extension AttributeKey: Equatable {

    // MARK: Equatable

    internal static func == (_ preceding: Self, _ following: Self) -> Bool {
        typealias ID = ObjectIdentifier
        return ID(preceding.specifierType) == ID(following.specifierType)
    }
}

extension AttributeKey: Hashable {

    // MARK: Hashable

    internal func hash(into hasher: inout Hasher) {
        typealias ID = ObjectIdentifier
        hasher.combine(ID(specifierType))
    }
}

A marker protocol would be a special case of a specifier protocol.
Maybe the term "specifier" can be useful here.

This is good for code size, but is there any suggestion for keeping this change workable for the debugger when dealing with a prebuilt Swift binary?

LLDB's current Swift TypeRef type system relies on reflection metadata, and it supports the case where source code is not available, such as a third-party binary.

Could the reflection metadata be encoded into the .swiftmodule, or into new files?