I've found that in many systems there is object chaff created during the production of results. That chaff can be stored in mini collections which are also chaff, and sometimes even in whole trees, but then all of it ends up being tossed away when the Task generating the results is finished. Sometimes a single result is picked from many. Sometimes the results are used for rendering and then discarded, but they are ephemeral and only seen by one thread. An entire "Task" can itself be ephemeral: its inputs and outputs may be longer-lasting, but all of the temp chaff inside it can be large.
In graphics (I worked on Quartz at Apple in the late 90's and early 2000's) it's things like masks of shapes to be rendered (bezier outlines converted to bitmaps). Back in the day we would generate several megabytes of these things for complicated visual frames, but we didn't need them once the frame was over, so they were all ephemeral, temp. And we literally couldn't afford to use malloc etc., because back then the PowerPC was at least 2.5x slower than Intel for what we were doing, so we needed a win, and this was one. To win, our software needed to be better, faster, less compute per unit of work.
So for us it was those temp generations of bitmaps and such. In NIO it might be those ChannelHandlers. In AI it's things like statistical probabilities. In speech recognition (we have a Swift-based voice AI) it's things like phonetic probabilities from something akin to n-grams.
These are all temp/chaff things that get generated, and you want to generate them in a high-level language (other ASRs use C++). They have limited lifetimes that are either obvious already or can be made obvious with API; something between the API we have in Swift for extending lifetimes and stack scoping would work. For temporary objects there are no heap fragments (I have patents inside Apple for temp allocation in graphical 2D heaps that avoid heap fragments; I'm aware of the issue), because you sweep the temp heap at the end of the temporary allocation's lifetime and all objects are gone. You can do this non-atomically because it's local to a Task. And in debug modes you can mark the crap out of the objects so they don't get reused, and assert, etc.
With this we get almost-free allocation for ephemeral objects. Free because you alloc with a pointer move, and you sweep the heap with one pointer move at the end. That's of course MUCH cheaper than calling malloc, which is famously slow at Apple and used to be owned by Bertrand, with a smile and a wink. The argument made then, and that I make now, is that you shouldn't use malloc in perf-important code; use the zone alloc stuff in the OS, stack alloc, etc. In Quartz we have a frame allocator that works like what I'm talking about here.
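To make that concrete, here's a minimal sketch of the idea in Swift. This is my own toy, not the Quartz frame allocator; the `FrameArena` name and all the details are illustrative:

```swift
/// A bump-pointer arena: alloc is one pointer move, and sweep() frees
/// everything at once with one pointer move. No locks or atomics, because
/// the arena is only ever touched from one Task.
final class FrameArena {
    private let base: UnsafeMutableRawPointer
    private let capacity: Int
    private var offset = 0

    init(capacity: Int) {
        self.capacity = capacity
        self.base = .allocate(byteCount: capacity, alignment: 16)
    }

    deinit { base.deallocate() }

    /// Bump-pointer allocation: round up, check space, move the pointer.
    func alloc(byteCount: Int, alignment: Int = 16) -> UnsafeMutableRawPointer? {
        let aligned = (offset + alignment - 1) & ~(alignment - 1)
        guard aligned + byteCount <= capacity else { return nil } // arena full
        offset = aligned + byteCount
        return base + aligned
    }

    /// The sweep: one pointer move and every temp allocation is gone.
    func sweep() {
        #if DEBUG
        // Mark the freed region so stale pointers fail loudly instead of
        // silently reading reused memory.
        _ = base.initializeMemory(as: UInt8.self, repeating: 0xDE, count: offset)
        #endif
        offset = 0
    }
}

// Usage: per-frame temp bitmaps live only until the frame is swept.
let arena = FrameArena(capacity: 4 << 20)           // 4 MB of frame-local scratch
if let mask = arena.alloc(byteCount: 256 * 256) {   // e.g. a rasterized shape mask
    _ = mask.initializeMemory(as: UInt8.self, repeating: 0, count: 256 * 256)
}
arena.sweep()                                        // all the frame's chaff gone at once
```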
If the borrow stuff doesn't work well enough, and you can see the local objects on your side of the wall, you can use a non-atomic retain/release, since you don't need atomics for non-global objects only seen on one thread. If you have API to designate these local regions, you can annotate the code like with the other cases and just switch the implementation to emit nonAtomicArc calls instead of ARC calls.
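As a toy illustration of why that's a win (my sketch, not a proposed Swift feature; it assumes the Swift 6 Synchronization module for the atomic side):

```swift
import Synchronization

/// Roughly what an atomic refcount costs today: the update is an atomic
/// read-modify-write that has to synchronize with every core in the machine.
final class AtomicCounted {
    let refCount = Atomic<Int>(1)
    func retain() { _ = refCount.wrappingAdd(1, ordering: .relaxed) }
}

/// What a non-atomic retain could be for objects provably seen by only one
/// thread: a plain increment, with no cross-core synchronization at all.
final class LocallyCounted {
    var refCount = 1
    func retain() { refCount += 1 }
}
```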
You could imagine then a "local only array" that also doesn't do atomic retain/release, and isn't Sendable. The main reason I was talking to Chris L about Sendable early on was this. Not being Sendable means you can do all of this "don't use the heavyweight OS primitives" stuff.
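A sketch of the shape (the `LocalArray` name and wrapper are mine, just to show that today's Swift can already express the "never crosses a thread" guarantee):

```swift
/// An array wrapper that is explicitly NOT Sendable, so the compiler rejects
/// any attempt to send it across a concurrency boundary. Its storage could
/// then, in principle, use non-atomic refcounting and a task-local arena.
struct LocalArray<Element> {
    private var storage: [Element] = []
    mutating func append(_ element: Element) { storage.append(element) }
    subscript(index: Int) -> Element { storage[index] }
    var count: Int { storage.count }
}

// Opt out of Sendable explicitly, even when Element is Sendable.
@available(*, unavailable)
extension LocalArray: Sendable {}
```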
The data supporting this is there if you look. It is the future. We are going to more and more cores, and atomics and malloc will just keep getting more and more expensive, and the people I know on the hardware side say the unified memory thing (which makes atomics briefly cheaper) is a temporary fix for M1/M2, not the future of the industry.
Love you all, and love Swift.