I've been trying to improve my understanding Swift's performance characteristics. John McCall's WWDC talk, Exploring Swift Performance, from this year's WWDC is an excellent resource, and I highly recommend it for anyone who is interested in optimizing their Swift code.
One of the main topics is memory allocation, and one of the main points is that heap allocation is significantly more expensive than stack allocation, and also incurs the cost of reference counting. McCall points out that heap allocation is always used for reference types like classes, but it is also used for copy-on-write value types like Array, which hides a reference type property under the hood.
This is something that probably should have been obvious to me after using Swift for almost nine years, but I only understood the implications while watching the video. My mental model for thinking about memory allocation and reference counting used to be very simplistic: value type = stack w/o ref counting, reference type = heap w/ ref counting. Now I know that I have to consider the allocation and ARC cost for any COW type and any struct that has properties that are reference types or COW value types.
But I am having trouble doing that: Swift provides no way for us to know if a value type is hiding private reference types. I know that the Standard Library's collection types are COW, so I can assume they contain a single reference and memory allocation for the buffer. But what about other types? For example, how many of Foundation's structs are actually wrappers around some NS class?
And when dealing with larger structs with many properties, how can one figure out how much heap allocation and ref counting will occur? One can determine how many allocations will occur for the struct's public properties by counting how many COW types there are, but what about private properties?
I am starting to think that it would be nice if there were some kind of convention in Swift's documentation that provided a type's ARC cost and heap allocation cost. Until then, when trying to optimize my code, how can I figure out how much reference counting and heap allocation a struct will incur?
There's no great way to know this. It's maybe more tractable to see if a type recursively has a reference counted field somewhere, but it's pretty hard to determine if a type does a manual heap allocation on its own besides reading the source or it being documented somewhere.
I'm surprised there's no way to get this information from the compiler - e.g. some dump of the type hierarchy showing relevant metadata like whether each type is ref-counted or has transitive children that are? The compiler must have that information internally.
The question you're asking for is if a type is trivially copyable and destroyable, which in practice is the case iff the type does not contain any reference types or existentials. There's an _isPOD() (Warning: underscored API!) entry point in the stdlib for this purpose:
It's also worth noting that resilient types which do not currently involve heap allocations or atomic reference counting operations may later evolve to do so. So in order to reason about these factors at all for an SDK/standard library type, it first needs to be @frozen.
Well resilient types themselves may introduce heap allocations in order to instantiate themselves because their size isn't known at compile time (of course we have optimizations to promote these to stack allocs iirc).
Unless something has changed, I don't believe they do necessarily involve heap allocations. See the final paragraph:
Structs and enums
If a struct or enum is not declared @frozen, its in-memory layout is opaque across a resilience boundary. This includes the size and alignment of the value, as well as whether additional work must be performed when moving, copying and destroying values of this type (for example, updating reference counts).
When generating code that interfaces with a resilient struct or enum across a resilience boundary, the compiler will always manipulate the value indirectly, passing type metadata to describe the in-memory layout of the value. This is analogous to how unspecialized generic functions manipulate values of generic parameter type, which is a topic discussed in detail in the 2017 LLVM Developer’s Meeting talk titled Implementing Swift Generics.
An important property of the implementation is that a resilient struct or enum has the same in-memory layout as a non-resilient struct or enum; there is no boxing or indirection at the level of values. Instead, code that manipulates those values must take additional steps to calculate field offsets or pass values as parameters between functions. This ensures that while library evolution support can increase code size, it does not impact the cache locality of data.
The last paragraph states that once initialized, resilient types have the same in-memory representation as a non-resilient one. Meaning that none of the fields are individually boxed, or any of the fields have a changed representation. It does not state that the type itself is not boxed (in fact the 2nd paragraph even states that the compiler has to work with these values indirectly and in order to work with something indirectly it has to be in memory either on the stack or in the heap).
Of course it does need to be in memory, but I don't believe the values are forced to live on the heap just because their size is not known at compile-time. My understanding is that the compiler basically performs a size lookup and alloca (as opposed to a malloc + free).
I do discuss that in the talk. Different containers handle dynamically-sized types differently. Local contexts handle them by allocating space separately from the frame, but that allocation is still on the stack (C or async, depending); there was a point where this was a heap allocation, but that hasn't been true since IIRC either Swift 3 or 4. Dynamically-sized global variables are allocated with a permanent heap allocation, but of course that's one-off. Most other contexts just force the container to be dynamically laid out — for example, if you have a class instance property that's dynamically-sized, the compiler has to determine the class's layout at runtime.
I also wondered about that. As @Alejandro mentioned, it would be very difficult to determine if a type does any heap allocation at all. But it seems like the compiler has enough information to determine how many allocations occur as a result of reference-type descendent properties.
It would be cool if that information could be used during the generation of documentation so that data types could have some kind of documented reference counting cost. For example, a struct with two string properties might include in its documentation:
ARC cost: 2 (When a type is passed around how many increments/decrements occur)
Allocated Ref Types: 2
This info would give developers a pretty good idea of the performance impact of the type, kind of like how big-O gives a good idea of the performance of a function, though it might not tell the whole story.
Would this prevent the info I mentioned above from being documented for the current version of the Standard Library? I suppose if it's documented it's part of the API, so it would have to remain stable at least between major versions. Or would it be wrong to document at all?
Is there any value to this idea? It seems like something I would really like, and it would help people make informed decisions about a major aspect of performance. Would there be any point in raising this idea in a discussion on the Evolution forum?
This is encoded in the reflection metadata that is consumed by the reflection library. You can look at a binary, and starting from the mangled name of a type, compute its layout, which recursively breaks down into POD or reference counted values.
Another approach is to hook into the front end to get this information from IRGen. Here is some code I added a few years ago (edit: it was 6 years ago ) to dump the size and alignment of a given list of types in YAML format: swift/lib/IRGen/TypeLayoutDumper.cpp at main · swiftlang/swift · GitHub. This can be expanded upon to get more information out of the TypeInfo object.
The difference in the two approaches is that the front end won’t see inside a resilient type’s layout, because you’re seeing the compile-time view of the world. The reflection library on the other hand always computes the exact runtime layout, using full knowledge of the reflection metadata in all libraries.
For generic types, this depends on the instantiation though, so it’s more difficult to boil down the “cost” of passing the abstract type, without knowing the generic arguments. However, you can similarly determine if the size of the generic type depends on the size of the arguments or not, which also changes the runtime costs involved.
Good point. This was discussed in the WWDC talk. Could a minimum cost, rather than a definite cost, be calculated based on the non-generic properties of the generic type? That would still be useful because the developer could look up the cost of the concrete types for their use case of the generic, giving them some baseline for reasoning about the type.
I think there are still quite a few open questions around it, but it's important for some kinds of programs to commit to particular performance characteristics. I believe it's still something we all want to happen in some form.
I don't know if we could quantify it - to give you something in the middle between "any allocations" (the default) and @_noAllocations, but it's an interesting thing to consider.
I wonder if this could be done by looking inside an object file or executable file for symbols that might indicate heap allocation.
// 5.swift
let u: Array <Int> = [1, 2, 3, 4]
var v = u
v.append (contentsOf: Array <Int> (repeating: 17, count: 1024))
swifts 5.swift nm ./5
0000000100002010 s _$s4main1uSaySiGvp
0000000100002018 s _$s4main1vSaySiGvp
0000000100000f40 t _$sSa12_endMutationyyF
U _$sSa6append10contentsOfyqd__n_t7ElementQyd__RszSTRd__lF
U _$sSa9repeating5countSayxGx_SitcfC
U _$sSaMa
0000000100002000 d _$sSaySiGMD
0000000100002008 d _$sSaySiGSayxGSTsWL
0000000100000e80 t _$sSaySiGSayxGSTsWl
U _$sSayxGSTsMc
U _$sSiN
U _$ss27_allocateUninitializedArrayySayxG_BptBwlF
0000000100000dd0 t _$ss27_finalizeUninitializedArrayySayxGABnlF
0000000100000e10 t ___swift_instantiateConcreteTypeFromMangledName
0000000100000ed0 t ___swift_instantiateConcreteTypeFromMangledNameAbstract
0000000100000f94 s ___swift_reflection_version
0000000100000000 T __mh_execute_header
0000000100000ce0 T _main
U _swift_beginAccess
U _swift_bridgeObjectRetain
U _swift_endAccess
U _swift_getTypeByMangledNameInContext2
U _swift_getTypeByMangledNameInContextInMetadataState2
U _swift_getWitnessTable
0000000100000f82 s _symbolic SaySiG