Hi Everyone,
I've been working on SR-14273 for the last little bit. I've gotten to the point of having a working proof of concept (See this github branch: GitHub - gmittert/swift at upstream_runtimevw) and I'm hoping to get some feedback from the community so I can begin stabilizing and hopefully upstreaming the work. Many thanks to @Arnold who has been helping me along with this.
SR-14273 "Working toward a byte code representation for value witnesses" is an approach to reducing the code size of Swift binaries. When the Swift compiler is forced to create a value witness function for a type, these functions can become very large, especially for large nested value types. These functions tend to have very similar set of operations that they do. Some examples being: "compute the address and alignment of a field", "free a reference at a given address", and "check if an enum has a payload".
Rather than generating code for the types, we propose instead to compute a compile time type layout, then create a runtime function that interprets the type layout, computing alignments and addresses at runtime to copy or free them.
For a simple example, given a struct like
struct S {
let a: SomeClass
let b: UInt32
let c: SomeOtherClass
}
enum E {
case A(c: SomeClass)
case B(s: S)
case C(i: UInt32)
case D
}
Normally, we would see a value witness function to destroy it look something like:
define $s1E1AVwxx(addr, metadata): // value witness destroy for E
gep to field containing extra inhabitants
compute enum case based on extra inhabitants
if A:
release A
if B:
destroy B
With Byte Code Value Witnesses, we would at compile time, compute a string representative of the type, then pass that to a runtime function to interpret. To make up a format for clarify:
// Sample format for clarity, see below for actual byte based format
string = enum {
emptyCases: 1,
cases: [
Reference,
Struct: [ Reference, POD32, Reference ],
POD32,
]
}
define $s1E1AVwxx(addr, metadata): // value witness destroy for E
call runtime_destroy(addr, string, len(string), metadata)
Our code that had to do all the work is now a call to a runtime function with a precompiled string. This allows us to make a code size/runtime cost trade off. A complete implementation should allow the replacement of at the minimum, destroying, as well as the assign/initialize with copy/take value witness functions functions. It likely also extends to the buffer and enum operations, though my focus has mainly been on copying/destroying so far.
Metrics
I don't currently have good metrics on size improvements in a large app, but it in reduced test case of defining some nested structs/enums, I was seeing about a 6% text section decrease by replacing the destroy and copy functions. I don't at all have numbers on how big of a performance decrease to expect from interpreting the strings. As is the proof of concept is anything but optimized, so unfortunately I don't think getting performance numbers now would be all that useful.
Some Logistics
As for enabling and disabling this, the proof of concept is behind a global flag, but I think this would be more useful with something that is a little more fine grain. Being able to annotate a specific problematic large type would be more useful than having to globally enable and disable the behaviour. My understanding is that an underscored attribute doesn't require too much process to add.
The Byte Code
The rest of this post is about the byte code format, which is the main thing I'd like feedback on. I'd expect it to be changeable, in the future, but somewhat cumbersome to do so, so I'd like to make sure it's ship shape before committing to something. Both Archetypes and Multi Enums have a lot of complex cases, and I while I did my best to understand the current implementation, there are not a lot of documentation on them, so it's not unlikely my understanding is incorrect or incomplete in places.
Overview
The byte code leverages and adds onto the current Type Layout infrastructure (see lib/irgen/TypeLayout.cpp). The type layout infrastructure has several types of types that we need to handle:
- Scalars (Non composite types, ints and References mostly)
- Single Enums (enums with only one payload case)
- Multi Enums (enums with two or more payload cases)
- AlignedTypes (Structs, also enum cases that have multiple fields)
- Archetypes (the T in
struct Foo<T> {}
where we don't know what T is until runtime) - ResilientTypes, which we need enough information to call their value witness functions
VALUE := SCALAR | ALIGNED_GROUP | SINGLE_ENUM | MULTI_ENUM | ARCHETYPE
Most of the below mnemonics aren't there for any particular reason other than it was what was easy for me to remember and debug, so I'm not particularly attached to them if anyone has alternate suggestions.
General Design
As far as the over design goes, types are encoded about in the order they've been defined in the source file. I've leaned towards length prefixing the byte code of composite types, which allows the implementation to jump to which chunk of the string it needs (e.g. once it figures out which enum case it needs to free) as well as recursively call itself by passing the recursive call a pointer to somewhere in the byte string, as well as a string length without too much iteration over the string scanning for a closing brace. This also avoids having to copy the passed in byte string, we can just work with substrings of it by passing around a start pointer and a length.
For said length encodings, I've typically used 32bits in big endian (no qualms against using little endian, this was just easier to read when dumping memory). IRGen uses uint32_ts/unsigneds for most things (enum cases, number of struct fields), but I think there's reason to be had for just using 8 or 16 bits for lengths. The vast majority of the time, the top 3 bytes will be empty, and we could fall back to the existing non runtime implementation if the representation isn't able to handle it (65535 enum cases should be enough for anyone, right?). One alternate approach would be to default to only a single byte, then reserve 0xFF
to indicate a larger amount:
LENGTH := UInt32
// or
LENGTH := UInt8 | '0xFF' UInt32
I haven't really attempted to do any sort of extensive bit hacking to really trim down the size, so suggestions there are well appreciated.
Scalars
SCALAR := 'c'|'s'|'l'|'L'|'C'|'r'|'N'|'n'|'W'|'u'|'w'|'b'|'B'|'o'|'f'|'x'
For our purposes, scalars are types that are not composed of other types. In swift, this is mostly POD types such as ints, and reference types. For a scalar, we need to know its size, and how it should be copied and released.
We define POD types for types that just represent data. These do need to be copied, but do not need to be released or retained. We also need to track them to figure out address and alignment in our parent types.
I8 = 'c', // char
I16 = 's', // short
I32 = 'l', // long
I64 = 'L', // long long
(perhaps b[yte],w[ord],d[ouble],q[uad word] would also be appropriate?)
We also have reference types. While they are all 64bit sized, we need to differentiate between them because they have different ways of being released/retained.
ErrorReference = 'r'
NativeStrongReference = 'N'
NativeUnownedReference = 'n'
NativeWeakReference = 'W'
UnknownUnownedReference = 'u'
UnknownWeakReference = 'w'
BlockReference = 'b'
BridgeReference = 'B'
ObjCReference = 'o'
ExistentialReference = 'x',
Closures, aka ThickFunctions are 128 bits. The first 64 bits is a function pointer, and the last 64 bits is an optional reference counted pointer if the closure is storing data. I tried to lower these instead to an aligned group, but I wasn't able to figure out how to do that without causing a bunch of crashing, so they remain here for now. These could likely be removed in a complete implementation.
ThickFunc = 'f',
Aligned Groups
Structs (and Enum cases with multiple fields) are expressed as a group of values that have required alignments.
For structs, we're mainly interested in having enough information to determine the offset of each field, which we can derive from the alignment, and using the nested byte string to determine the field size.
SIZE := uint32
// a numFields (alignment,fieldLength,field)+
ALIGNED_GROUP:= 'a' SIZE (ALIGNMENT SIZE VALUE)+
ALIGNMENT := '1'|'2'|'4'|'8'
The Alignment attached to the structs indicates the number of bytes the struct should be aligned on.
Note that because the alignment of an aligned group is the largest alignment of its subfields, aligned groups need to nest and sadly cannot be flattened.
For example:
struct {
i: UInt8
struct {
j: UInt8
n: SomeClass
}
}
Will align 64-bit align j, while
struct {
i: UInt8
j: UInt8
n: SomeClass
}
Will 8-bit align j.
For an example of the layout, given the struct
struct {
i: UInt8
struct {
j: UInt8
n: SomeClass
}
}
We produce the type layout:
a<0x00000002>1<0x00000001>b8<0x00000011>a<0x00000002>1<0x00000001>b8<0x00000001>N
// Which is
a // Aligned group
<0x00000002> // 2 fields in the group
1 // Field is byte aligned
<0x00000001> // Byte string length is 1
b // Byte string for UInt8
8 // Field is 8 Byte Aligned
<0x00000011> // Byte string is 17 bytes long
a // We have a nested aligned group
<0x00000002>
1<0x00000001>b // byte aligned byte
8<0x00000001>N // 64bit aligned native reference
Single Payload Enums
Single Enums are encoded as a
SIZE := uint32
// e numEmptyPayloads lengthOfPayload payload
SINGLEENUM := 'e' SIZE SIZE VALUE
For single payload enums we need enough information to determine the overall size of the enum, where its extra inhabitants are if any, and how to release/retain it. Even though we don't need to release or retain the no payload cases, we still need to know the number of no payload cases, because we may have to add extra tag bytes to the enum if we don't have enough extra inhabitants to represent them.
For example, the following enum would produce:
// A single enum with 3 no payload cases, a payload length of 1, and a payload
// of a single Native Pointer
enum E {
case a(c: SomeClasee)
case b
case c
case d
}
e<0x00000003><0x00000001>N
e // enum idicator
<0x00000003> // 3 No payload cases
<0x00000001> // Payload bytestring length of 1
N // Payload is a single native reference
Multi Payload Enums
// E numEmptyPayloads numPayloads lengthOfEachPayload payloads
MULTIENUM := 'E' SIZE SIZE SIZE+ VALUE+
For multi payload enums we need enough information to determine the overall size of the enum from each payload and how to release/retain each payload. We can use the the nested payload byte strings to determine the largest payload for the overall size of the enum, as well as the location of the spare bits of each payload so we can determine if we need to add extra tag bytes.
For example with the following enum:
struct MyStruct {
let a: SomeClass
let b: SomeClass
}
enum E {
case a(c: SomeClass)
case b(c: MyStruct)
case c
case d
case e
}
// A Multi enum with 3 no payload cases, two payloads, one of a struct, the
// other of just a Native pointer
'E<0x00000003><0x00000002><0x00000001><0x00000011>Na<0x00000002>8<0x00000001>N8<0x00000001>N'
E // Multi enum indicator
<0x00000003> // 3 no payload cases
<0x00000002> // 2 Payload Cases
<0x00000001> // Payload 1 is len 1
<0x00000011> // Payload 2 has len 17
N // SomeClass is a Native Reference
a<0x00000002>8<0x00000001>N8<0x00000001>N // Struct becomes an aligned group
Archetypes
Archetypes are the type variables passed in through types. For example, in the struct
struct Example<T> { a: T}
T is the archetype. We don't know T's size statically unless it gets specialized, but we can find out through the passed in type metadata pointer.
INDEX := UINT32
ARCHETYPE := 'A' INDEX
The associated index represents the index into the generic argument vector for the passed in type. From there, we can get the type metadata for the type which gives us the size, alignment, and extra inhabitants which we need for copying and destroying.
Archetypes in particular I'm less certain about. GenericTypeParamDecl also contains a depth as well as an index, and I'm not sure if there's a case in which I need that in addition to the index. How Archetypes get lowered in the current IRGen is somewhat opaque (pun somewhat intended).
Resilient Types
For Resilient Types, we need a way to know how to call their value witness.
Given their mangled name, we can figure that out using the runtime function
swift_getTypeByMangledName
.
// 'R' mangledNameLength mangledName
RESILIENT := 'R' UINT32 mangledName