Swift Performance

It's worth remembering Swift's implementation of ARC is still far from optimal. Swift 5.3's optimizer significantly reduces the number of ARC calls in optimized code; we've seen up to 2x improvements in hot parts of SwiftUI and Combine without code changes. There will continue to be ARC optimizer improvements as we propagate OSSA SIL through the optimizer pipeline as well. (However, the main benefit of ARC over other forms of GC will always be lower-overhead interop with non-GC-managed resources, such as the large volumes of C, C++, and growing amount of value-oriented Swift code that implements the lower level parts of the OS, rather than the performance of the heap management itself.)


Joe, happy to look at Instruments profiles with you sometime.

Think of the SwiftUI/Combine case you mentioned as an important but also lock-bound, high-message-count overhead case, with not much actual compute. I was the chief architect of JavaFX, SwiftUI's great-granddad, so I'm aware of the kind of case you are dealing with.

Think of Brighten AI, our system, as a wide-open concurrent compute system: all cores engaged for 200 ms. Unlike SwiftUI/Combine, it doesn't have a bunch of threading constraints (SwiftUI and the main thread), so compute can go wide open. What we found building it is that ARC tends to get in the way insidiously, whether because of variable-sized structs, collection use, ownership through stack pops, etc. To be clear, we also built new storage subsystems to manage our system's streaming inputs. SQL/CoreData/Realm/etc. all had the same stomp-on-the-allocator-and-locks performance issues that would have killed us.

We use actors for coarse-grained concurrency, and the ultra-high-performance stuff for our streaming recognition lives inside one of the actors.


I'm late to the party, I know…

But all of this makes me think what a missed opportunity it was for Swift
not to have used its own generics subsystem to support tagging the objects,
instead of baking it directly into the compiler as a keyword.

If we had something like:
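For illustration, a hypothetical sketch of what such library-level ownership wrappers might look like (`Strong` and `Weak` are invented names, not real or proposed Swift API, and `Weak` here still leans on the built-in `weak` underneath):

```swift
// Hypothetical sketch only: ownership policies as ordinary generic library
// types rather than compiler keywords.
struct Strong<T: AnyObject> {
    let object: T                  // an ordinary strong reference
    init(_ object: T) { self.object = object }
}

struct Weak<T: AnyObject> {
    weak var object: T?            // observes the object but does not keep it alive
    init(_ object: T) { self.object = object }
}
```

In this imagined world, new ownership policies could be shipped as library types instead of waiting on new keywords.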


etc., it would be much better for the community to come up with its own solutions to different scenarios and grow in an organic fashion.

Now we need to wait until things become keywords, and there's no way to fight against a compiler that will hardcode release/retain as a compiler pipeline step.

Imagine if, instead of having to wait for a 'strong' keyword, we could have designed a strong<> holder that would use simple alloc and free directives.

Swift is one of the most beautiful and enjoyable languages out there to develop in; it's a pity that most of its design flaws are there because it had to be compatible with Objective-C.


I'm not an expert on this, but I found that the Nim language recently started to use ARC, and they mention that this change has improved Nim's performance. The difference between Swift's and Nim's ARC implementations is that Nim's doesn't use atomic reference counting. They explain all of this here. Maybe Swift, with a concurrency model, could make optimizations under the hood, perform ARC non-atomically like Nim does, and improve language performance.

The last two posts are along the lines we are advocating: one way to do it is to replace ARC's global memory ideas and let engineering teams use API constructs to tell ARC how to act. We get a lot of bang out of generational allocation: an infant generation that looks a lot like stack allocation but is actually a mini heap. Importantly, because there is no lock/atomic, you get ultra-fast allocation, and you just graduate infants to another heap as you need to. And as Fabio says, let's just make it possible to get ARC out of the way, or change its characteristics. Using the Unsafe stuff works, but there are more sugary ways to do it.
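As a rough illustration of the infant-generation idea (our own minimal sketch, not Brighten AI's actual allocator; `Arena` and its API are invented): one allocation up front, then pointer-bump allocation with no locks or atomics, on the assumption that a single thread owns the arena.

```swift
// Minimal bump-allocator sketch: one raw allocation up front, then
// lock-free, atomic-free "infant generation" allocation inside it.
final class Arena {
    private let base: UnsafeMutableRawPointer
    private let capacity: Int
    private var offset = 0

    init(capacity: Int) {
        self.capacity = capacity
        base = UnsafeMutableRawPointer.allocate(byteCount: capacity, alignment: 16)
    }

    // Bump-allocate space for one T. Single-owner by assumption, so no
    // synchronization is needed: just an aligned pointer bump.
    func allocate<T>(_ type: T.Type) -> UnsafeMutablePointer<T> {
        let align = MemoryLayout<T>.alignment
        let start = (offset + align - 1) & ~(align - 1)
        precondition(start + MemoryLayout<T>.stride <= capacity, "arena full")
        offset = start + MemoryLayout<T>.stride
        return (base + start).bindMemory(to: T.self, capacity: 1)
    }

    // Survivors get copied out ("graduated"); everything else evaporates
    // in one shot when the arena is reset or freed.
    func reset() { offset = 0 }
    deinit { base.deallocate() }
}

let arena = Arena(capacity: 1 << 16)
let counter = arena.allocate(Int.self)
counter.pointee = 42
```

The point of the sketch is the cost model: each `allocate` is a couple of integer operations, and teardown of all the temporaries is a single free.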

There are some limited optimizations to turn refcounts into nonatomic operations when we know the object being referenced doesn't escape to other threads, but it's difficult to do much of that today because Swift as a whole does not have a strong notion of threads. However, recent Apple Silicon hardware, like the A14 CPU in the latest iPhones and iPads, now makes uncontended atomic operations almost as cheap as nonatomic accesses, making that kind of compile-time optimization less necessary.


? Your link takes me to a page with no code on it AFAICT.

Dave, the code on the left is an actor DSL, and the code on the right is a Swift knowledge-graph DSL.
At Swift | Brighten AI.

Joe, yeah, we have to add API to let engineering teams talk about threads. The actor model is a way to do that. Once in the actor, you have to be able to say "actor-local allocation." Then you can use an actor-private heap, with no lock.

What we do for fine-grained concurrency is make a set of "operations" that each have their own infant-generation heaps inside them, and when we "go wide" to many cores, we hand one to each core. Each core's operation parties in its own container, free of locks, and graduates results from that operation back onto the main ARC heap. This is a massive win, because there's tons of compute in each core, and for 1,000,000 inputs, and 10x that in temporaries, there are just a few outputs. All of that infant-generation garbage is high-perf: no atomics, no locks on allocs.
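The go-wide shape can be sketched like this (illustrative only; the scoring function and sizes are made up, and each worker's private locals stand in for its per-operation infant heap):

```swift
import Dispatch

let inputs = Array(0..<100_000)
let workers = 4
let chunk = (inputs.count + workers - 1) / workers
var winners = [Int](repeating: .min, count: workers)

winners.withUnsafeMutableBufferPointer { out in
    DispatchQueue.concurrentPerform(iterations: workers) { w in
        // Worker-private temporaries: no locks or atomics during the wide phase.
        var best = Int.min
        let lo = w * chunk
        let hi = min(lo + chunk, inputs.count)
        for x in inputs[lo..<hi] {
            let score = (x * 31) % 7919      // stand-in for real per-input scoring
            if score > best { best = score }
        }
        out[w] = best                        // each worker owns exactly one slot
    }
}
// Only the few winners ever touch shared, ARC-managed memory.
let overallBest = winners.max()!
```

Each iteration keeps its temporaries entirely to itself; the only cross-thread traffic is one write per worker into a pre-sized slot, so there is no contention to pay for.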

Just to be clear, I'm not advocating against refcounting per se.

It's a very reasonable GC when you need deterministic guarantees. Thanks for making us aware of how the compiler can make some optimizations by avoiding atomics in certain circumstances.

But let's imagine that Swift had started with two small changes: struct destructors, and something like

struct shared<T> {              // hypothetical: struct deinit doesn't exist in Swift today
    var t: T                    // placeholder for refcounted shared storage
    init(_ t: T) { self.t = t   /* plus: allocate shared storage, refcount = 1 */ }
    deinit { /* decrement refcount; free the storage when it reaches zero */ }
}
To do the exact same thing the compiler is doing now, it would be a matter of baking in recognition of '_refcounted_alloc' as a special function and making the compiler turn it into a no-op in the cases where it cleverly diagnosed that a refcount is actually not needed.

I'm not a compiler expert, but I wonder what just launching the language with struct destructors, the capacity to move value types, and some sort of macro keyword for retain/release would have done for Swift.

What's in the 'ownership manifesto' could probably have been achieved by now, without taking away the training wheels for the majority of people who don't need to customize the language that much.

We all want more performance using constructs generally built into the language, and Fabio, I see your point. I would add that we still need to not touch the main heap, with its need for atomics or locks, either.

There's just no reason to destroy single threaded compute perf that way.

And as a person who has studied performance a lot (I helped design Shark, back in the day), I can tell you, Swift is insidious in the way it inserts ARC cost into call chains, even when dodging class use, etc. Part of why we are talking about it here is that I don't think most compiler people have time to study ultra-high-performance compute projects, and so don't see the effects of ARC the way we do.

As I noted, when I visit that page (in Chrome) I see no code whatsoever. OK, I tried a different browser. Safari shows something.

Please do not presume; I can vouch for the HPC bona-fides of at least one key member of the Swift team.

Ah, I see. We use a PDF viewer plugin. Good to know, apologies!

We use a really simple actor DSL; the idea is to generate a static Actor so it's safe to call with async messaging, with a queue associated with it. We get a lot of simplicity out of that. Brighten AI is tens of megabytes of compiled code, so we need submodules that merrily do their work and talk to each other without having to worry about threading. This is the large-scale coarse-grained concurrency stuff. We would add that we also have a pipelining version of that (for things like decoding, with many steps and a producer/consumer motif), and we also advocate that actors sooner or later need to be able to say "I need these actors started before me." And I would advocate only allowing Codables to be pumped for V1.
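A minimal sketch of the shape such a generated actor might take (our guess at the pattern, not the actual DSL output; `CounterActor` and its messages are invented): state confined to a private serial queue, with fire-and-forget message sends.

```swift
import Dispatch

// Sketch of a hand-rolled "actor": all state is touched only on the
// actor's own serial queue, and callers interact purely via async messages.
final class CounterActor {
    private let queue = DispatchQueue(label: "CounterActor")  // the actor's mailbox
    private var count = 0                                     // queue-confined state

    // Messages are async; callers never block on the actor's internals.
    func send(increment by: Int) {
        queue.async { self.count += by }
    }

    // Reads are also messages; the reply comes back via a completion handler.
    func read(_ reply: @escaping (Int) -> Void) {
        queue.async { reply(self.count) }
    }
}
```

The serial queue is what makes the type safe to call from anywhere: every touch of `count` is funneled through one lane, so the submodule needs no locks of its own.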

I hope it's okay to just post the link to the PDF here? https://www.brighten.ai/wp-content/uploads/2020/10/actor3.pdf

FYI, swiftai is already an open-source project: GitHub - fastai/swiftai: Swift for TensorFlow's high-level API, modeled after fastai

I hope Brighten AI will consider open-sourcing some of this work.

Sure glad you posted the link.

So about the "high allocation frequency": you want to think of Brighten AI as something akin to a decoding pipeline, taking raw audio --> speech/NLU --> action. We handle language models with well over 20 million entities (because navigation and TV program search are large), and 20 million times anything is a lot of temporary allocs. Our time budget is 50 ms to compare all 20 million. (The rest of the pipeline has to finish within 200 ms too.)

If you have looked into the insides of a speech engine or an NLU system, you will find lots of temporary structures and things describing models and theories, which are gathered en masse during an early part of a pipeline stage; then best-fit pathways are found via least-cost Viterbi searches through cost-connected node forests. The rest are pruned away and evaporate.
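A toy version of that least-cost search, just to make the shape concrete (trellis sizes, states, and costs are invented for illustration):

```swift
// Viterbi-style least-cost path through a tiny trellis: per-step candidate
// costs plus transition costs; at each step we keep only the best running
// cost per state and implicitly prune every other path into it.
let stepCosts: [[Double]] = [      // stepCosts[t][s]: cost of state s at time t
    [1.0, 4.0],
    [2.0, 0.5],
    [3.0, 1.0],
]
let transition: [[Double]] = [     // transition[from][to]
    [0.0, 2.0],
    [2.0, 0.0],
]

var best = stepCosts[0]            // best cumulative cost per state so far
for t in 1..<stepCosts.count {
    var next = [Double](repeating: .infinity, count: best.count)
    for s in 0..<best.count {
        for p in 0..<best.count {
            let c = best[p] + transition[p][s] + stepCosts[t][s]
            if c < next[s] { next[s] = c }   // prune: keep only the cheapest way in
        }
    }
    best = next
}
let cheapest = best.min()!
```

The relevant property for this thread is the allocation pattern: the search spawns a wall of candidate costs at each step, and all but one path per state is discarded immediately, which is exactly the temporary-heavy workload described above.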

Swift is getting pretty good at the immutable-struct thing, but what you find is that if you use collections or dynamic-length objects (e.g. a string; that's not what we are talking about here, but a string is dynamic-length as well), you run into ARC and heap work, and performance gets hit pretty hard.

Our high-frequency allocations are temporary, so we make a temporary heap and allocate from there, and manually promote winners out of the infant heap (when they win). We didn't do anything weird to the language: Swift's "unsafe" stuff gives us the ability to work around ARC and get high performance with API. We believe that when we learn idioms that generate high performance, our job is to talk about them here so the core team can adopt them into language land and make them pretty.
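The temporaries-then-promote pattern in miniature (illustrative; `Candidate` and the scoring are made up): candidates live in one scratch buffer outside ARC's view, the lone winner is copied out as an ordinary value, and the whole scratch region dies in a single deallocate.

```swift
// Candidates are a trivial value type, so plain stores into raw scratch
// memory are fine for a sketch: no per-object ARC traffic at all.
struct Candidate { var id: Int; var cost: Double }

let n = 1_000
let scratch = UnsafeMutableBufferPointer<Candidate>.allocate(capacity: n)
for i in 0..<n {                        // fill temporaries: no retain/release
    scratch[i] = Candidate(id: i, cost: Double((i * 37) % 101))
}
let winner = scratch.min { $0.cost < $1.cost }!  // the only value we keep
scratch.deallocate()                    // everything else evaporates at once
// `winner` is a copy, so it remains valid after the scratch buffer is gone.
```

One bulk allocation in, one bulk free out, and only the promoted winner ever becomes a normal Swift value on the managed heap.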


Here, we've discussed marking functions for asynchronous processing with keywords akin to try and throws. We've talked about generalizing this to an entire effects system. Are you thinking about making weak/strong/singleton an effect too?

In particular, I'm against turning some things into keywords and hiding the implementation from developers, who then have no recourse once the path forced onto everyone doesn't work quite right for them.
I think Swift has the potential to be more explicit and less implicit about some things, and I think that, having decided on a deterministic GC to start with, it would have been feasible to change the underlying implementation once the language matured.

So I'm mostly against doing things the way Go does, where you get a keyword. I'd rather this be a library where people can opt in and customize the ways to do it in a competitive fashion.

You know, even the actor model is not the solution to every problem in concurrency. Sometimes you need thread pools and message loops, and for certain problems they are the right way to do multithreading, despite the actor model being great for a lot of scenarios, as Go has proven, especially in server scenarios.

I was advocating that even the memory-control aspect be a library, explicit and therefore customizable for different scenarios, instead of a language construct with a hidden implementation.

And as it turns out, right now the fact that the refcount is baked into the language without any recourse is making Swift unable to become competitive in a couple of important scenarios.

So instead of an 'async' keyword, how about a library with concurrent collections, a concurrent queue, and a thread pool that uses native pthreads, and leave the rest to the devs?

Concurrent data structures and thread pools with scheduling are already very hard, so making those available and letting users compose their concurrency according to their needs would be much better than forcing them to use things that are hidden.

So I'm against things like 'async' becoming a keyword with the implementation hidden from the user,
but I understand why that would be popular with the majority of programmers out there.

But as jonhburkey has shown in this thread, the fact that ARC is unavoidable is a big impediment to Swift becoming more performant in a couple of scenarios.

No problem with keywords or an effects system as long as it's programmable and not hardcoded. But maybe Swift wants to be more like Java, Go, and JavaScript and less like Rust and C++ (and in that case, I think it's lost potential for what this language could achieve, which is a lot).

Edit: In case it's not clear, this is an answer to @CTMacUser. BTW, is there any thread or document about the design you are talking about?
If it's something like:
async func x()
--- where ---
func async() {
    // explicit implementation for the async 'pseudo-keyword' goes here
}

I'm all for it. If it manages to be explicit and customizable while also being more elegant, even better.