PSA: Compiler optimisation remarks

Karl · December 19, 2021, 8:08pm

There is a super-hidden feature in the compiler which gives you insight into all kinds of optimisation decisions, such as the cost-benefit calculations for inlining, where exclusivity is enforced, where ARC operations happen, which generic specialisations get generated, and which calls get devirtualised.

To use it, add the @_assemblyVision attribute to a function and build in release mode. On a pre-5.6 compiler, use @_semantics("optremark") instead; I'm using 5.5.2, so that's what I'm showing. Be careful not to mark too many functions: on just one function, I received 805 remarks.

Make sure you edit your scheme settings to build in release mode.

I haven't seen it mentioned at all, and couldn't find any documentation about it, but it's really helpful. So I figured it was worth letting people know about it.
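For anyone who wants to try this, here is a minimal sketch of the usage described above. The `Node` class and `sum(_:)` function are hypothetical, made up purely for illustration; only the attribute itself comes from this thread. It assumes a toolchain that accepts @_assemblyVision and an optimised build (for example `swiftc -O` or `swiftc -Osize`). The remarks you actually get will depend on the compiler version.

```swift
// Hypothetical example type: a singly linked list node.
final class Node {
    let value: Int
    let next: Node?

    init(value: Int, next: Node? = nil) {
        self.value = value
        self.next = next
    }
}

// Ask the compiler to emit remarks for this one function only.
// Remarks are only produced in optimised builds.
@_assemblyVision
func sum(_ head: Node?) -> Int {
    var total = 0
    var current = head
    while let node = current {
        total += node.value
        current = node.next
    }
    return total
}

let list = Node(value: 1, next: Node(value: 2, next: Node(value: 3)))
print(sum(list)) // prints 6
```

Walking a chain of class references like this is the sort of code where retain/release remarks might plausibly show up, which is why it was picked for the sketch.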

Saklad5 · December 20, 2021, 9:32pm

Add this to the list of things that are undocumented for no clear reason.

ahti · December 20, 2021, 9:49pm

I'm gonna take a stab and guess that this was probably developed as a tool for folks working on the optimizer, not for developers to use for guidance on rearranging their code to be more optimizer friendly.

As such, I would expect the decisions and factors that are laid bare by this attribute to be able to change with any dot release, and I certainly would not base any critical decisions on the info gleaned from this.

E.g. some code is too slow to ship without optimization, but if we just move this line here a little, the cost and benefit shift slightly and it gets optimized, then some dot release comes along at a random point in the future, and suddenly the critical section is slow again without any code changes.

Saklad5 · December 20, 2021, 9:50pm

On the one hand, you’re almost certainly correct. On the other hand, this is really cool.

Michael_Gottesman · December 20, 2021, 11:04pm

You are actually looking at two different features.

The first is called opt remarks. "Opt remarks" show the decisions the optimizer is making (e.g. did I specialize this, did I inline this, etc.).

The second is something that I invented called "Assembly Vision". This is the thing that is telling you where the ARC/exclusivity checks/runtime casts are. Assembly Vision also enables the normal opt remarks, since that information can be useful when determining where/why ARC is there. The idea is that instead of having to read the assembly yourself, you can have vision on approximately where these calls are.

The proper way to invoke this is to put the @_assemblyVision attribute on a nominal type or a function. I believe that it is on trunk/5.6 (it might be in 5.5, I don't remember).

The reason why I haven't made a bigger deal about it is I want to extend it further and make it more powerful before I really shouted about it.

I hope it is useful for you! The concept came from me trying to automate how I optimize Swift code as a compiler engineer so that other engineers who aren't compiler people can be just as effective. The best way to see it in practice is to look at the test cases that I have committed in tree (I posted some links to them below). Another thing to keep in mind is that -O gives worse remarks due to function signature optimization messing with some stuff. -Osize though works really well.

github.com/apple/swift/blob/main/test/SILOptimizer/assemblyvision_remark/nominal_type_attributes.swift

// RUN: %target-swiftc_driver -Osize -emit-sil %s -o /dev/null -Xfrontend -verify
// REQUIRES: optimized_stdlib,swift_stdlib_no_asserts

// Make sure we emit remarks on nominal types

@inline(never)
func callPrint(_ s: String) { print(s) }

var global: String = "123"

@_assemblyVision
struct Struct {
    func printMe() {
        callPrint(global) // expected-remark {{begin exclusive access to value of type '}}
                          // expected-note @-6 {{of 'global'}}
                          // expected-remark @-2 {{end exclusive access to value of type '}}
                          // expected-note @-8 {{of 'global'}}
                          // expected-remark @-4 {{retain of type '}}
                          // expected-note @-10 {{of 'global'}}
                          // expected-remark @-6 {{release of type '}}

This file has been truncated.

github.com/apple/swift/blob/main/test/SILOptimizer/assemblyvision_remark/basic.swift

// RUN: %target-swiftc_driver -O -Rpass-missed=sil-assembly-vision-remark-gen -Xllvm -sil-disable-pass=FunctionSignatureOpts -Xfrontend -enable-copy-propagation -emit-sil %s -o /dev/null -Xfrontend -verify -Xfrontend -enable-lexical-borrow-scopes=false
// REQUIRES: optimized_stdlib,swift_stdlib_no_asserts

public class Klass {
    var next: Klass? = nil
}

// TODO: Change global related code to be implicit/autogenerated (as
// appropriate) so we don't emit this remark.
public var global = Klass() // expected-remark {{heap allocated ref of type 'Klass'}}

@inline(never)
public func getGlobal() -> Klass {
    return global // expected-remark @:5 {{retain of type 'Klass'}}
                  // expected-note @-5:12 {{of 'global'}}
                  // expected-remark @-2:12 {{begin exclusive access to value of type 'Klass'}}
                  // expected-note @-7:12 {{of 'global'}}
                  // expected-remark @-4:12 {{end exclusive access to value of type 'Klass'}}
                  // expected-note @-9:12 {{of 'global'}}
}

This file has been truncated.

github.com/apple/swift/blob/main/test/SILOptimizer/assemblyvision_remark/attributes.swift

// RUN: %target-swift-frontend -enable-copy-propagation=requested-passes-only -enable-lexical-borrow-scopes=false -emit-sil %s -verify -Osize -o /dev/null -module-name main
//
// NOTE: We only emit opt-remarks with -Osize,-O today! -O does drop way more
// stuff though, so we test with -Osize.

public class Klass {}

public var mySingleton = Klass()

@inline(never)
func getGlobal() -> Klass {
    return mySingleton
}

@inline(never)
func useKlass(_ k: Klass) {}

@_semantics("optremark")
@inline(never)
public func forceOptRemark() {

This file has been truncated.

Michael_Gottesman · December 20, 2021, 11:04pm

ahti:

Saklad5:

Add this to the list of things that are undocumented for no clear reason.

I'm gonna take a stab and guess that this was probably developed as a tool for folks working on the optimizer, not for developers to use for guidance on rearranging their code to be more optimizer friendly.

As such, I would expect the decisions and factors that are laid bare by this attribute to be able to change with any dot release, and I certainly would not base any critical decisions on the info gleaned from this.

E.g. some code is too slow to ship without optimization, but if we just move this line here a little, the cost and benefit shift slightly and it gets optimized, then some dot release comes along at a random point in the future, and suddenly the critical section is slow again without any code changes.

This is not true. I just wanted to improve it further before I really shouted about it. If it is useful to you... use it! That being said, you are correct that it doesn't provide guarantees per se, but it /can/ help you to understand your code (which is the point).

Michael_Gottesman · December 20, 2021, 11:15pm

@Karl I would appreciate it if you could fix your example to use @_assemblyVision so that people do not use @_semantics("optremark").

Jon_Shier · December 20, 2021, 11:24pm

FYI to anyone interested in these features: the new Swift build system integration launched in Xcode 13.2 seems to hide remarks, so if you aren't seeing the expected output, turn it off.

Saklad5 · December 21, 2021, 12:53am

That’s an incredibly impressive feature, thank you! I think I saw it in the underscored attribute list before.

Though it’s a little weird that you’d put it into production unfinished, without even a compiler flag to toggle it. Is that common?

David_Smith · December 21, 2021, 12:56am

"usable but not fully ready" is not uncommon, yeah. For example _cdecl is in a similar state. Just need to make sure that we don't accidentally commit to a future plan prematurely, and that the unpolished bits don't impact production-ready stuff (e.g. _assemblyVision shouldn't have any impact on code that doesn't use it).
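For anyone who hasn't met it, `@_cdecl` exposes a Swift function under a C symbol name. A minimal sketch, assuming a made-up function `add` and a made-up symbol name `example_add` (neither comes from this thread); like any underscored attribute, its behaviour may change between releases:

```swift
// Hypothetical example: export a Swift function under the C symbol
// name "example_add" so that C code linked into the same binary can call it.
@_cdecl("example_add")
public func add(_ a: Int32, _ b: Int32) -> Int32 {
    return a + b
}

// The function is still callable from Swift under its Swift name.
print(add(2, 3)) // prints 5
```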

Michael_Gottesman · December 21, 2021, 1:09am

The key thing here is the underscore. The underscore signals that it is unfinished and not intended to be treated as a final finished thing. It is sort of like saying this is experimental or unstable.

Saklad5 · December 21, 2021, 4:15am

Sure, but wouldn’t it make more sense to lock it behind a feature flag too?

Michael_Gottesman · December 21, 2021, 4:58am

There isn't any real advantage to doing that and I wanted to be able to get feedback.

Karl · December 21, 2021, 5:50am

Oh! I'm sorry! I assumed it was something being used internally or for optimiser development. But it is super-helpful, so thank you very much for creating it!

Yeah in one day it has already helped me discover that:

withContiguousStorageIfAvailable fallbacks for String.UTF8View were resulting in more specializations than I expected. It's really subtle and easy to miss, but I managed to reduce my binary size by 20% (!) by avoiding that.

Some of my algorithms should be split into small functions for better inlining.

Some theoretical fast paths in my algorithms were incurring ARC. I've seen big benefits by slimming them down.

wCSIA is not eliminated until late in the optimiser pipeline, so the compiler specialises my program 3 or 4 times more than it needs to, then discards most of what it did. I filed SR-15624 about it. No impact on size or performance, but there are potentially compile-time savings there.
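To make the withContiguousStorageIfAvailable points concrete, here is a rough sketch of the kind of fast-path/fallback pattern being described. It is a guess at the general shape, not the code from the post; `checksum` and `checksumCore` are hypothetical names. The idea is that one generic helper can end up specialised both for `UnsafeBufferPointer<UInt8>` (the fast path) and for the original collection type (the fallback), even when the fallback is never taken at run time.

```swift
// Hypothetical worker: a simple rolling hash over any collection of bytes.
func checksumCore<C: Collection>(_ bytes: C) -> UInt32 where C.Element == UInt8 {
    var result: UInt32 = 0
    for byte in bytes {
        result = result &* 31 &+ UInt32(byte)
    }
    return result
}

// Hypothetical entry point with a contiguous-storage fast path.
// Marked so that optimised builds report what happens inside it.
@_assemblyVision
func checksum<C: Collection>(_ bytes: C) -> UInt32 where C.Element == UInt8 {
    // Fast path: the worker runs over an UnsafeBufferPointer<UInt8>.
    if let fast = bytes.withContiguousStorageIfAvailable({ checksumCore($0) }) {
        return fast
    }
    // Fallback: the worker runs over C itself, which may require a
    // second specialisation of checksumCore for the same call site.
    return checksumCore(bytes)
}

print(checksum("abc".utf8) == checksum([97, 98, 99] as [UInt8])) // prints true
```

Whether the fallback specialisation survives into the final binary depends on the optimiser, which is the uncertainty the remarks help expose.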

The @_assemblyVision spelling doesn't seem to work on 5.5.2/Xcode 13.2, but I added it to the post as the preferred spelling.

The feedback I can give is that it's great, but if you have a function which is specialised multiple times, the results can look a bit cluttered and it's hard to tell them apart. For example, this function is inlined in some specializations but not in others, and it's not clear which.

Also, it's not clear why some specializations are generated. I was looking everywhere for what was causing the String.UTF8View specializations to be generated (bear in mind that it may not even be reachable code, as in SR-15624).

David_Smith · December 21, 2021, 6:14am

Hm, phase ordering issues like this are super subtle; I wouldn't be surprised if there actually are lurking performance issues hidden behind that. Neat find.

Michael_Gottesman · December 21, 2021, 6:58am

No worries!

Karl:

Michael_Gottesman:

I hope it is useful for you! The concept came from me trying to automate how I optimize Swift code as a compiler engineer so that other engineers who aren't compiler people can be just as effective. The best way to see it in practice is look at the test cases that I have committed into tree

Yeah in one day it has already helped me discover that:

withContiguousStorageIfAvailable fallbacks for String.UTF8View were resulting in more specializations than I expected. It's really subtle and easy to miss, but I managed to reduce my binary size by 20% (!) by avoiding that.

Some of my algorithms should be split into small functions for better inlining.

Some theoretical fast paths in my algorithms were incurring ARC. I've seen big benefits by slimming them down.

wCSIA is not eliminated until late in the optimiser pipeline, so the compiler specialises my program 3 or 4 times more than it needs to, then discards most of what it did. I filed SR-15624 about it. No impact on size or performance, but there are potentially compile-time savings there.

Great! That makes me really happy!

Ok.

I am not sure what the specialization issue is about, but what I can say is that:

I want to add to @_assemblyVision the ability to specify which of the perf remarks you are getting, with the idea that you could select individual ones or we could make specific views (like ARC or Exclusivity), and maybe you could add the marker multiple times with different categories that we OR together. I think this would help cut down on the verbosity of the output.

I think it may be clearer if you look at it on the command line or use something like emacs, where you can jump to definition from the compile output. I find that makes it easier to read in Xcode.

Jon_Shier · December 21, 2021, 7:20am

Xcode integration with the remarks system in general would be great. Not just for this and other custom remarks, but the educational output and other systems as well.

David_Smith · December 21, 2021, 8:55am

I tossed a PR up to see what happens if we make those always-inline.

David_Smith · December 21, 2021, 10:24am

Looks inconclusive perf-wise, and a notable code size regression, so I think we can conclude that the optimizer is doing its job and cleaning up after itself despite the phase ordering issue as @Karl noted. Alas, no trivially easy wins to be had here.

Karl · December 21, 2021, 10:27am

Oh well, thanks for trying.