Reducing value witness size in struct composition

noahsmartin · September 17, 2021, 6:21pm

Hi,

I've seen struct composition used frequently for json decoding and representing SwiftUI state, for example:

struct TitleSubtitle: Decodable {
  let title: Text
  let subtitle: Text
}
struct Text: Decodable {
  let value: String
  let style: TextStyle
}
struct TextStyle: Decodable {
  let color: String
  let size: Int
}
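Part of why this compounds is that Swift stores nested struct properties inline. A TitleSubtitle value physically contains both Text values, and each of those contains a TextStyle, so copying a TitleSubtitle touches four String fields, each of which may need a retain. A minimal sketch that prints the layouts of the types above (the sizes in the comments assume a 64-bit platform, where String is 16 bytes and Int is 8):

```swift
// Same types as above; Decodable conformance is synthesized by the compiler.
struct TitleSubtitle: Decodable {
  let title: Text
  let subtitle: Text
}
struct Text: Decodable {
  let value: String
  let style: TextStyle
}
struct TextStyle: Decodable {
  let color: String
  let size: Int
}

// Nested structs are laid out inline, so sizes add up instead of
// each nested value being a pointer to separate storage.
print(MemoryLayout<TextStyle>.size)      // 24 on 64-bit: String (16) + Int (8)
print(MemoryLayout<Text>.size)           // 40 on 64-bit: String (16) + TextStyle (24)
print(MemoryLayout<TitleSubtitle>.size)  // 80 on 64-bit: 2 x Text (40)
```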

While working on a binary size profiling tool, Emerge, I noticed that whenever I add properties to a struct, that struct's value witness gets larger, and so does the value witness of every type that directly or indirectly references it. So a small addition to a struct like TextStyle can cause a big size increase because of its use in Text and the multiple uses of Text in TitleSubtitle.

As a concrete example, if you start with two structs:

struct Foo {
  let prop1: String
}

struct Bar {
  let prop1: String
  let prop2: String
  ...
  let prop20: String
}

and use nm to view symbol sizes in the binary:

0000000100004e4c t initializeWithCopy value witness for StructSizeTests.Bar
0000000100004ff8 t assignWithCopy value witness for StructSizeTests.Bar
...
0000000100005530 t initializeWithCopy value witness for StructSizeTests.Foo
000000010000555c t assignWithCopy value witness for StructSizeTests.Foo

Bar's initializeWithCopy is 0x1AC bytes and Foo's is 0x2C bytes (measured as the distance to the next symbol). Now, if you change Foo to:

struct Foo {
  let prop1: String
  let prop2: Bar
}

Bar stays the same size, but Foo's initializeWithCopy grows to 0x1C0 bytes. Other value witness functions, such as assignWithCopy, grow in the same way. It's as if Bar's implementation is being copied into Foo. All of this was tested with Xcode 13 RC.
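The same inline layout is consistent with these numbers: after the change, a Foo holds all twenty of Bar's Strings directly, so Foo's copy witness has to copy (and potentially retain) 21 Strings instead of one. A sketch with all twenty properties spelled out, using the shorthand where one type annotation covers several names (sizes again assume a 64-bit platform):

```swift
struct Bar {
  // Twenty String properties; the trailing type annotation applies to every name.
  let prop1, prop2, prop3, prop4, prop5: String
  let prop6, prop7, prop8, prop9, prop10: String
  let prop11, prop12, prop13, prop14, prop15: String
  let prop16, prop17, prop18, prop19, prop20: String
}

struct Foo {
  let prop1: String
  let prop2: Bar  // stored inline, not behind a pointer
}

print(MemoryLayout<Bar>.size)  // 320 on 64-bit: 20 x 16-byte String
print(MemoryLayout<Foo>.size)  // 336 on 64-bit: 16 + 320
```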

I expected outlining to take care of the duplicated instructions, but even though I'm compiling with -Osize and can see _OUTLINED_FUNCTION_* symbols in the binary, the overall size of the __text section still grows by roughly the size of Bar's value witness in this example.

I also tried reordering the properties, in case a different field offset was making the new instructions differ, but got the same result with this struct:

struct Foo {
  let prop2: Bar
  let prop1: String
}

A quick look in Hopper also indicates very similar instructions:

sub        sp, sp, #0xb0
stp        x28, x27, [sp, #0x50]
stp        x26, x25, [sp, #0x60]
stp        x24, x23, [sp, #0x70]
stp        x22, x21, [sp, #0x80]
stp        x20, x19, [sp, #0x90]
stp        x29, x30, [sp, #0xa0]
add        x29, sp, #0xa0
mov        x19, x0
ldp        x8, x0, [x1]
stp        x8, x0, [x19]
ldp        x8, x20, [x1, #0x10]
stp        x8, x20, [x19, #0x10]
ldp        x8, x21, [x1, #0x20]
stp        x8, x21, [x19, #0x20]

vs.

sub        sp, sp, #0xc0
stp        x28, x27, [sp, #0x60]
stp        x26, x25, [sp, #0x70]
stp        x24, x23, [sp, #0x80]
stp        x22, x21, [sp, #0x90]
stp        x20, x19, [sp, #0xa0]
stp        x29, x30, [sp, #0xb0]
add        x29, sp, #0xb0
mov        x19, x0
ldp        x8, x0, [x1]
stp        x8, x0, [x19]
ldp        x8, x20, [x1, #0x10]
stp        x8, x20, [x19, #0x10]
ldp        x8, x21, [x1, #0x20]
stp        x8, x21, [x19, #0x20]

It doesn't seem like this size increase should be necessary. Are there any tricks for reducing the size impact in cases like this, or compiler flags that would reuse these instructions instead of generating them twice? Could anyone help me understand why outlining doesn't handle this?

Thanks!
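One workaround sometimes used for this kind of growth (not suggested elsewhere in this thread, and not measured here) is to store the large nested struct behind a single reference, so the outer type's copy witness should only need to retain one pointer instead of expanding every field of Bar. A minimal sketch, where Box and BoxedFoo are hypothetical names introduced for illustration and Bar is shortened to three properties:

```swift
// Hypothetical wrapper: an immutable final class holding the value.
// Because `value` is a `let`, sharing one Box between copies is
// observably the same as copying the struct itself.
final class Box<T> {
  let value: T
  init(_ value: T) { self.value = value }
}

struct Bar {
  let prop1, prop2, prop3: String  // shortened from twenty properties
}

struct BoxedFoo {
  let prop1: String
  private let storage: Box<Bar>  // one reference instead of Bar's inline fields

  init(prop1: String, prop2: Bar) {
    self.prop1 = prop1
    self.storage = Box(prop2)
  }

  var prop2: Bar { storage.value }
}

let foo = BoxedFoo(prop1: "a", prop2: Bar(prop1: "x", prop2: "y", prop3: "z"))
print(MemoryLayout<BoxedFoo>.size)  // 24 on 64-bit: String (16) + one pointer (8)
print(foo.prop2.prop3)              // z
```

The trade-off is a heap allocation each time a BoxedFoo is constructed and an extra indirection on every read of prop2, so this only makes sense where binary size matters more than those costs.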

George · September 17, 2021, 6:32pm

I don't know the answer to this, but you might be interested in SR-14273: Byte Code Based Value Witnesses

gmittertreiner · September 17, 2021, 6:51pm

You're right that it looks like Bar's implementation is being copied into Foo, because in many cases that's exactly what happens right now (see swift/GenRecord.h in apple/swift on GitHub). IRGen essentially does a depth-first traversal of the type's tree of fields and generates the code it needs inline. This causes other issues too: the copies end up interspersed with retains and releases, so LLVM is never able to group them into a memcpy, and it ends up generating a bunch of loads and movs instead.
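A toy model of the depth-first expansion described above may help. This is not IRGen code: FieldType and emitInlineCopy are hypothetical names, and each emitted "operation" stands in for the loads, stores, and retain generated for one leaf field. Because a nested record is expanded in place, Foo's operation list contains all of Bar's:

```swift
// Hypothetical description of a type's stored fields.
enum FieldType {
  case string               // leaf needing a payload copy plus a retain
  case int                  // trivial leaf, plain copy only
  case record([FieldType])  // nested struct
}

// Depth-first traversal: every nested record is expanded inline, so the
// operations emitted for an outer type include all of the inner type's.
func emitInlineCopy(_ type: FieldType, path: String = "self") -> [String] {
  switch type {
  case .string:
    return ["copy+retain \(path)"]
  case .int:
    return ["copy \(path)"]
  case .record(let fields):
    var ops: [String] = []
    for (index, field) in fields.enumerated() {
      ops += emitInlineCopy(field, path: "\(path).\(index)")
    }
    return ops
  }
}

let bar = FieldType.record(Array(repeating: .string, count: 20))
let foo = FieldType.record([.string, bar])

print(emitInlineCopy(bar).count)  // 20
print(emitInlineCopy(foo).count)  // 21: Bar's 20 operations duplicated into Foo
```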

Swift's IRGen lowering for value types is pretty naive in a lot of places. I haven't personally looked very deeply into why outlining doesn't help as much as one might expect. My initial theory was that the GEPs it produces for subfields differ enough that the outliner can't handle them, but when I experimented with some GEP-lifting work @plotfi was doing, I still didn't see much of a difference, so there may be more going on than just the GEPs.

The typeinfo lowering does do some manual IR-level outlining: it creates "outlined assignWithCopy" functions, which you'll see as 'WO' mangled names (https://github.com/apple/swift/blame/main/docs/ABI/Mangling.rst#L299). These seem to help, but could probably be more aggressive.
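Extending the same kind of toy model, per-type outlining means a nested record becomes a single call to a shared helper, and each helper body is emitted once. This is a simplification under stated assumptions: FieldType and OutliningEmitter are hypothetical names, and the real outlined functions only apply to some types (see the restrictions mentioned later in this thread):

```swift
// Hypothetical type description; records carry a name so one shared
// helper can be emitted per type.
enum FieldType {
  case string
  case record(name: String, fields: [FieldType])
}

// Emits each record's copy helper exactly once; a nested record becomes a
// single call to its helper instead of an inline copy of its body.
struct OutliningEmitter {
  private(set) var helpers: [String: [String]] = [:]

  mutating func emitHelper(for type: FieldType) -> String {
    guard case .record(let name, let fields) = type else {
      preconditionFailure("only records get helpers")
    }
    if helpers[name] != nil { return name }  // already emitted, reuse it
    var body: [String] = []
    for field in fields {
      switch field {
      case .string:
        body.append("copy+retain")
      case .record:
        body.append("call \(emitHelper(for: field))")
      }
    }
    helpers[name] = body
    return name
  }

  // Total operations across all emitted helpers.
  var totalOperations: Int { helpers.values.reduce(0) { $0 + $1.count } }
}

let bar = FieldType.record(name: "Bar", fields: Array(repeating: .string, count: 20))
let foo = FieldType.record(name: "Foo", fields: [.string, bar])

var emitter = OutliningEmitter()
_ = emitter.emitHelper(for: foo)
print(emitter.helpers["Foo"]!)  // ["copy+retain", "call Bar"]
print(emitter.totalOperations)  // 22: Bar's 20 + Foo's 2, with Bar emitted once
```

In the inline model, the two witnesses together would need 20 + 21 = 41 operations; sharing Bar's helper brings that down to 22.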

Beyond that, I've been working on a way to have value witness functions call into a runtime interpreter, which should completely eliminate the size of this generated code. I'm still a few months away from something upstreamable, though. Currently I have a working prototype, but I need to do a lot of testing to see how much it helps and how much runtime impact it has.
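To illustrate why an interpreter removes the per-type code, here is a toy sketch of the general idea. It is not the design being prototyped: LayoutOp, Word, Storage, and interpretedCopy are hypothetical names, and modeling a String as one trivial word plus one reference word is a simplification. Each type is described by data, and a single shared function copies any type by walking that description, so adding fields grows the data rather than the machine code:

```swift
final class Storage {}  // stands in for a heap object such as a String buffer

// Hypothetical per-type layout description: one op per machine word.
enum LayoutOp {
  case trivial    // word copied as-is
  case reference  // word holding a reference that must be retained on copy
}

enum Word {
  case bits(UInt64)
  case ref(Storage)
}

// The single shared copy routine: it interprets any layout, so no
// type-specific copy code is generated.
func interpretedCopy(_ source: [Word], layout: [LayoutOp], retainCount: inout Int) -> [Word] {
  precondition(source.count == layout.count)
  var dest: [Word] = []
  for (op, word) in zip(layout, source) {
    switch op {
    case .trivial:
      dest.append(word)
    case .reference:
      retainCount += 1  // where compiled code would retain the object
      dest.append(word)
    }
  }
  return dest
}

// Layouts are built by concatenating data, mirroring Foo containing Bar.
let stringLayout: [LayoutOp] = [.trivial, .reference]  // simplified String
let barLayout = Array(Array(repeating: stringLayout, count: 20).joined())
let fooLayout = stringLayout + barLayout

let shared = Storage()
let foo = fooLayout.map { op -> Word in op == .trivial ? .bits(0) : .ref(shared) }

var retains = 0
let copy = interpretedCopy(foo, layout: fooLayout, retainCount: &retains)
print(fooLayout.count)  // 42
print(retains)          // 21
print(copy.count)       // 42
```

The cost moves to runtime: every copy now loops over the layout instead of running straight-line code, which is why measuring the runtime impact matters.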

noahsmartin · September 17, 2021, 7:34pm

Thanks for the extra context! It looks like the manual outlining was implemented in apple/swift PR #6451 ("[IRGen] Code Size Reduction: Outline Copy/Consume (Loadable) Enum", by shajrawi) and PR #12687 ("Code Size: Outline copy addr instruction", also by shajrawi). There are some restrictions on which types get these outlined functions, and I'm not seeing them in my example, so they must not apply to these types.

The runtime interpreter to move some of this out of app binaries sounds great, thanks for working on that!