Ideas for code size optimizations

Joe_Groff · November 30, 2020, 9:38pm

In addition to the LLVM improvements Erik mentioned, a lot of the changes we've been making to Swift fall into these categories:

Replacing code-driven runtime mechanisms with compact data-driven ones. One example of this is using mangled names to encode types we want to access metadata for, and using a runtime interpreter to perform the access, instead of open-coding sequences of accessor calls.
Systematic improvements to SIL, such as OSSA and opaque values, that allow for more pervasive optimization. Unnecessary ARC traffic and extra copies in generic code are significant code size sinks.
Looking for opportunities where on-demand-generated definitions, such as generic specializations and thunks, can be migrated to centralized locations such as libswiftCore.dylib, so that all Swift binaries can share one copy.
Improving lazy code generation in Swift's IRGen and SILGen stages, so that fewer unused private/internal/on-demand-generated definitions get emitted in the first place.
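
As a rough illustration of the first category, the sketch below contrasts open-coded accessors with a table of mangled-name strings resolved by the runtime. It relies on `_typeByName`, an underscored standard library function that is not a stable public API. Its use here, along with the helper names `accessIntArray`, `accessOptionalString`, and `mangledReferences`, is an assumption made for illustration, not what the compiler actually emits.

```swift
// Code-driven: every type reference gets its own accessor function,
// and each one costs instructions in the binary.
func accessIntArray() -> Any.Type { [Int].self }
func accessOptionalString() -> Any.Type { String?.self }

// Data-driven: each reference is a short mangled-name string, and one
// shared runtime routine interprets it. `_typeByName` is an underscored,
// unofficial stdlib function, used here only for illustration.
let mangledReferences = [
    "SaySiG", // Array<Int>
    "SSSg",   // Optional<String>
]

let resolved: [Any.Type?] = mangledReferences.map { _typeByName($0) }

for (name, metatype) in zip(mangledReferences, resolved) {
    let description = metatype.map { String(describing: $0) } ?? "unresolved"
    print(name, "->", description)
}
```

The strings cost a few bytes each, while the interpreter that reads them is shared by every reference in every binary.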

Work is ongoing in all of those areas. More recently, we've also been investigating whether we can be less conservative about marking definitions used, so that linker-level dead stripping can be more effective, which you can see in this PR:

github.com/apple/swift

Add a -emit-dead-strippable-symbols flag that emits functions/variable/metadata in a dead_strip-friendly way.

apple:main ← kubamracek:dead_strip1 · opened by kubamracek, 12 Oct 2020, 08:22PM UTC · +344 -56

This enables static linking to remove unused functions and classes even across modules. For now, I'm adding the flag as an opt-in experimental mode, but the goal is to eventually be able to emit symbols this way for everything. This way we could see nice code size savings when e.g. including a lot of package dependencies where not everything is actually used.

One hazard in doing this is breaking apps that rely on type reflection, so we'd need some way to more explicitly track types whose metadata is needed at runtime. @omax's proposal here is one possibility:

Here is a somewhat random list of code size improvement ideas we're still working on, or planning to start work on. (This should not be taken as us "claiming" this work for ourselves exclusively, if any other motivated people or teams want to investigate any of these projects!)

We now use mangled names for accessing fully specialized type references, and we want to extend the mangling to be able to do so for protocol conformances and witness tables as well. This will also address one of the launch time regression issues from using mangled names, which is the need to re-look-up conformances while resolving mangled generic arguments.
- We also want to improve the performance and code size of the runtime demangler. As a bootstrapping step, it's based on the demangler implementation from the swift-demangle tool, but that implementation has a lot of functionality the runtime doesn't need, and its design is inefficient for runtime use: it first builds a parse tree, which the runtime then has to walk. A demangler pared down to only what is necessary for resolving types and conformances, and callback-based so that the mangled name can be interpreted on the fly without an intermediate tree representation, ought to reduce the interpreter overhead to the point where we could use mangled names for all metadata accesses, not only fully specialized ones.
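
To make the callback-based idea concrete, here is a minimal sketch under heavy assumptions. It uses a made-up prefix grammar (`i` for Int, `s` for String, `a` for Array of the following type, `o` for Optional of the following type), which is not Swift's real mangling, and the names `TypeBuilder`, `ToyDemangler`, and `NameBuilder` are hypothetical. The parser never allocates a tree: it hands each recognized piece straight to a builder, which in a real runtime would produce metadata rather than strings.

```swift
// The parser reports what it recognizes through these callbacks.
protocol TypeBuilder {
    associatedtype Result
    func int() -> Result
    func string() -> Result
    func array(of element: Result) -> Result
    func optional(of wrapped: Result) -> Result
}

// Toy recursive-descent demangler with no intermediate parse tree.
struct ToyDemangler<B: TypeBuilder> {
    let builder: B

    // Returns nil for malformed names or trailing garbage.
    func demangle(_ name: String) -> B.Result? {
        var rest = Substring(name)
        guard let result = parse(&rest), rest.isEmpty else { return nil }
        return result
    }

    private func parse(_ rest: inout Substring) -> B.Result? {
        guard let code = rest.popFirst() else { return nil }
        switch code {
        case "i": return builder.int()
        case "s": return builder.string()
        case "a": return parse(&rest).map { builder.array(of: $0) }
        case "o": return parse(&rest).map { builder.optional(of: $0) }
        default: return nil
        }
    }
}

// One possible builder: renders a readable type name on the fly.
struct NameBuilder: TypeBuilder {
    func int() -> String { "Int" }
    func string() -> String { "String" }
    func array(of element: String) -> String { "Array<\(element)>" }
    func optional(of wrapped: String) -> String { "Optional<\(wrapped)>" }
}

let demangler = ToyDemangler(builder: NameBuilder())
print(demangler.demangle("aoi") ?? "invalid") // prints "Array<Optional<Int>>"
```

Because the result type is chosen by the builder, the same parser could drive a metadata resolver, a pretty-printer, or a validator without carrying code for the ones a given client doesn't use.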
Value witness functions can also be a significant source of code size, and there are a number of approaches we can investigate to reduce their size:
- Types with the same layout (meaning the same size, alignment, stride, extra inhabitants for enum cases, and retained/released fields in the same places) can share value witnesses. We only take advantage of this in a handful of special cases, but IRGen could have a more systematic notion of "type layout", which we can use as the key for emitting and sharing value witnesses, instead of individual types.
- We need to do type layout optimization at some point (such as reordering struct fields to minimize padding), and there is also an opportunity to optimize type layout with an eye toward creating shared layouts more often. Layout rules such as favoring putting refcounted fields at the beginning of a type's layout could cause value witnesses to be shared more often.
- Along the same lines of using mangled names to represent type accesses, we can come up with a runtime encoding for type layouts, and use that instead of open-coded value witnesses for the majority of value witness implementations.
- We could add language functionality that makes it easier to make the inline size of value types smaller, such as indirect fields in structs. Smaller types generally need less code to copy and destroy.
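
The layout-keyed sharing described in the first bullet above can be sketched as a cache keyed by a layout description instead of by type. Everything here is hypothetical (`TypeLayout`, `WitnessCache`, the example structs), the refcounted offsets are supplied by hand since the standard library does not expose them, and extra inhabitants are left out for brevity. It is only meant to show why two unrelated types could end up with a single set of witnesses.

```swift
// A simplified layout key. A real one would also include extra inhabitants.
struct TypeLayout: Hashable {
    var size: Int
    var alignment: Int
    var stride: Int
    var refcountedOffsets: [Int]
}

// Stands in for "emit value witnesses once per distinct layout".
final class WitnessCache {
    private var tables: [TypeLayout: Int] = [:]

    // Returns the ID of the witness set for this layout, creating it if needed.
    func witnessID(for layout: TypeLayout) -> Int {
        if let existing = tables[layout] { return existing }
        let id = tables.count
        tables[layout] = id
        return id
    }

    var emittedCount: Int { tables.count }
}

// Size, alignment, and stride come from MemoryLayout; the offsets of
// refcounted fields are passed in manually for this sketch.
func layout<T>(of _: T.Type, refcountedOffsets: [Int]) -> TypeLayout {
    TypeLayout(size: MemoryLayout<T>.size,
               alignment: MemoryLayout<T>.alignment,
               stride: MemoryLayout<T>.stride,
               refcountedOffsets: refcountedOffsets)
}

final class Node {}
struct UserRecord { var owner: Node; var id: Int64 }
struct CacheEntry { var payload: Node; var timestamp: Int64 }
struct Point { var x: Double; var y: Double }

let cache = WitnessCache()
let userID = cache.witnessID(for: layout(of: UserRecord.self, refcountedOffsets: [0]))
let entryID = cache.witnessID(for: layout(of: CacheEntry.self, refcountedOffsets: [0]))
let pointID = cache.witnessID(for: layout(of: Point.self, refcountedOffsets: []))

print(userID == entryID)   // prints "true": same layout, shared witnesses
print(cache.emittedCount)  // prints "2": three types, two witness sets
```

`UserRecord` and `CacheEntry` differ only in names, so copying or destroying either one performs the same operations at the same offsets, which is what makes sharing possible.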
The Clang importer can generate a lot of code on behalf of imported C types, including type metadata, synthesized conformances, thunks for ObjC methods, and so on. A lot of these are not terribly interesting and don't really vary their CPU-level implementation across types (for instance, every imported C enum's RawRepresentable conversion is just a bitcast), so providing shared implementations of common witness tables and thunks in the runtime could save a lot of code size in mixed Swift-ObjC projects.
Generic type metadata instantiation has to compute the offsets of fields in a generic type, and it currently does so with open-coded instantiation code. Some or all of this could instead be driven by the field reflection metadata that already exists in the runtime.
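
As a sketch of what data-driven field offset computation might look like, the function below walks a list of field descriptions and assigns offsets with the usual align-up rule. `FieldLayout` and `computeOffsets` are hypothetical names standing in for the field reflection metadata and the runtime routine; the real metadata format and layout algorithm are not shown here, and alignments are assumed to be powers of two.

```swift
// Hypothetical stand-in for one field's entry in reflection metadata.
struct FieldLayout {
    var size: Int
    var alignment: Int // assumed to be a power of two
}

// Rounds `offset` up to the next multiple of `alignment`.
func alignUp(_ offset: Int, to alignment: Int) -> Int {
    (offset + alignment - 1) & ~(alignment - 1)
}

// One shared routine replaces per-type, open-coded offset computation.
func computeOffsets(of fields: [FieldLayout]) -> (offsets: [Int], size: Int, alignment: Int) {
    var offsets: [Int] = []
    var end = 0
    var maxAlignment = 1
    for field in fields {
        let offset = alignUp(end, to: field.alignment)
        offsets.append(offset)
        end = offset + field.size
        maxAlignment = max(maxAlignment, field.alignment)
    }
    return (offsets, end, maxAlignment)
}

// A generic `struct Pair<T> { var flag: Bool; var value: T }` needs a
// different offset for `value` depending on T. Sizes and alignments
// below are written out by hand for the two instantiations.
let flag = FieldLayout(size: 1, alignment: 1)
let pairOfInt64 = computeOffsets(of: [flag, FieldLayout(size: 8, alignment: 8)])
let pairOfUInt8 = computeOffsets(of: [flag, FieldLayout(size: 1, alignment: 1)])

print(pairOfInt64.offsets, pairOfInt64.size) // prints "[0, 8] 16"
print(pairOfUInt8.offsets, pairOfUInt8.size) // prints "[0, 1] 2"
```

With this shape, each generic type only contributes a small table of field entries, and the loop itself exists once in the runtime.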

I'll add more to this list as I have time. Hopefully that at least gives you an idea of some of what we've done, and what we still plan to do.