Global heaps incur atomics and locks, which is why retain and release are always at the top when you profile Swift. What's worse, atomics aren't just "slow functions": they also disrupt things like the L2/L3 caches (think of atomics as particularly bad cache misses), so they make subsequent code slower too. And ARC liberally sprinkles atomics all over your Swift code, so your code is always missing the cache.
What we propose is a way for the community to start experimenting with faster, nonatomic ARC-based class instances, so that we can start modeling how future Swift will look with isolates, find performance wins, and help light the path to performance parity with languages like C, C++, Rust, and Java. Those languages all have borrowing, infant (nursery) generations, or some other way around the high-frequency atomic or lock-based interactions with global heaps that Swift incurs.
Nonatomic ARC is already in the codebase; we just need a way to use it deliberately.
We propose a simple "marker protocol", NonAtomic, that you just mix in to a class declaration, which then causes the Swift runtime to use the nonatomic version of ARC for that class.
Second, we propose that if a nonatomic class has static func alloc() / free() with suitable params, HeapObject calls those static functions for allocation and deallocation instead of malloc and free.
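To make the shape of this concrete, here is a minimal sketch of what opting in could look like at the source level. NonAtomic is the hypothetical protocol from this pitch; it does not exist in Swift today, so it is declared locally here and has no effect on reference counting. ScratchNode and sumOfTemporaryList are illustrative names only.

```swift
// Hypothetical marker protocol from this pitch. It is declared here only so the
// sketch compiles; today it changes nothing about how ARC behaves.
protocol NonAtomic: AnyObject {}

// A scratch class that is only ever touched from one thread while a computation runs.
final class ScratchNode: NonAtomic {
    var value: Int
    var next: ScratchNode?

    init(value: Int, next: ScratchNode? = nil) {
        self.value = value
        self.next = next
    }
}

// Builds and walks a temporary linked list. The reference traffic on ScratchNode
// in this function is the kind of work the pitch would make nonatomic.
func sumOfTemporaryList(count: Int) -> Int {
    var head: ScratchNode? = nil
    for i in 0..<count {
        head = ScratchNode(value: i + 1, next: head)
    }

    var total = 0
    var cursor = head
    while let node = cursor {
        total += node.value
        cursor = node.next
    }
    return total
}
```

The only change at the use site is the conformance on the class declaration; the rest of the code stays ordinary Swift.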
This entire initiative is low cost, and very high gain: the ARC code is already in there, & adding a static func per class is linear work, no magic there.
How would you use it? When you have a system that does heavyweight compute, you can now use normal classes to do the work. You can start by making the classes that generate temporary work nonatomic, and use them only where it's safe to do so (temporary work tends to be in a single-threaded context, so that's not hard).
Second, if you find allocation showing up on profiles, you override the alloc/free functions for your class and add a memory pool, or use one of the myriad high-performance C++ subpool libraries. Separating pools out by alloc type, frequency, and locality of use means subpools can fill and empty many times without getting hole creep, and in many cases, when the operation is complete, the temporary pool can be zeroed out with no heap cleanup.
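As a rough illustration of the per-class pool idea, here is a sketch of a single-threaded fixed-block pool with alloc/free hooks forwarding to it. Everything here is an assumption made for illustration: FixedBlockPool, Particle, the hook signatures, and the 64-byte/16-byte limits are made-up names and numbers, and the Swift runtime does not call these hooks today.

```swift
/// A single-threaded pool of fixed-size blocks carved from one slab.
final class FixedBlockPool {
    private let slab: UnsafeMutableRawPointer
    private let blockSize: Int
    private let capacity: Int
    private var freeList: [UnsafeMutableRawPointer] = []
    private var nextUnused = 0

    init(blockSize: Int, alignment: Int, capacity: Int) {
        // Round the block size up so every block keeps the requested alignment.
        let rounded = (blockSize + alignment - 1) / alignment * alignment
        self.blockSize = rounded
        self.capacity = capacity
        self.slab = UnsafeMutableRawPointer.allocate(byteCount: rounded * capacity,
                                                     alignment: alignment)
    }

    deinit { slab.deallocate() }

    /// Returns a recycled block if one exists, else the next untouched block,
    /// or nil when the slab is exhausted.
    func alloc() -> UnsafeMutableRawPointer? {
        if let recycled = freeList.popLast() { return recycled }
        guard nextUnused < capacity else { return nil }
        defer { nextUnused += 1 }
        return slab + nextUnused * blockSize
    }

    func free(_ block: UnsafeMutableRawPointer) {
        freeList.append(block)
    }

    /// Drops every block at once; only valid when no block is still in use.
    func reset() {
        freeList.removeAll(keepingCapacity: true)
        nextUnused = 0
    }
}

final class Particle {
    var x = 0.0
    var y = 0.0

    // Not thread-safe by design: this sketch assumes single-threaded use.
    // 64 bytes is an assumed upper bound on the instance size.
    static let pool = FixedBlockPool(blockSize: 64, alignment: 16, capacity: 1024)

    // Hypothetical hooks in the spirit of the pitch; nothing calls them automatically.
    static func alloc(size: Int, alignment: Int) -> UnsafeMutableRawPointer? {
        precondition(size <= 64 && alignment <= 16, "request does not fit a pool block")
        return pool.alloc()
    }

    static func free(_ block: UnsafeMutableRawPointer) {
        pool.free(block)
    }
}
```

The reset() call is the "zero out the temporary pool" case: when an operation finishes and nothing from the pool is still live, the whole pool is reclaimed in constant time without walking the heap.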
Third, there are tons of cases where class instances are heavily mutated during production and are semantically immutable afterwards. As a future direction, we propose adding a facility to bake that into the language. That would give us programmers the ability to express "production complete" instances that could then be safely sent many places. Call it "immutabilize", with all credit given to whoever invented "immortalize".
Those are the ideas. The first two are easy to do and, with Apple's blessing, would allow the community to move towards a much more multicore-savvy world of much higher performance.
Today, we use these techniques to give us higher performance. But without being able to turn off ARC, we are forced to immortalize all of our temp instances, replace init with a static creation function, and recycle our instances in and out of our pools without ARC. That wastes a great feature of the language, and means our pools are bigger than they need to be. We also have to use our own collection types, because things like Array merrily engage with atomic ARC on a whim. And without the custom malloc, we still incur high malloc overhead on startup and suffer from decreased locality of reference. With custom malloc, our type-specific pools can have nice bounds checking and performance.
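For readers who haven't seen this kind of workaround, here is a simplified sketch of the pattern, not our actual code. Scratch and ScratchPool are illustrative names. The creation function is modeled as a method on a pool object rather than a static function, to keep the sketch free of global mutable state, and it uses Array for the idle list even though a real version would use a custom collection.

```swift
final class Scratch {
    var x = 0.0
    var y = 0.0

    // Creation goes through ScratchPool.make instead of a public init.
    fileprivate init() {}
}

final class ScratchPool {
    // Each handle carries an unbalanced +1 retain from passRetained, so the
    // instance behind it is never freed ("immortalized" for this sketch).
    private var idle: [Unmanaged<Scratch>] = []
    private(set) var created = 0

    /// Stands in for `init`: reuses a parked instance when one is available.
    func make(x: Double, y: Double) -> Unmanaged<Scratch> {
        let handle: Unmanaged<Scratch>
        if let reused = idle.popLast() {
            handle = reused
        } else {
            handle = Unmanaged.passRetained(Scratch())
            created += 1
        }
        // Reset the recycled state, since no initializer runs on reuse.
        let scratch = handle.takeUnretainedValue()
        scratch.x = x
        scratch.y = y
        return handle
    }

    /// Parks the instance for reuse; the caller must not touch the handle afterwards.
    func recycle(_ handle: Unmanaged<Scratch>) {
        idle.append(handle)
    }
}
```

Note what is lost: lifetime is managed by hand, reuse must reset every field, and the pool only ever grows, which is the "bigger than they need to be" problem described above.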
Sounds pretty exciting to me.