Swift Performance

I'm late to the party, I know...

But all of this makes me think what a missed opportunity it was for Swift
not to have used its own generics subsystem to support tagging objects,
instead of baking it directly into the compiler as keywords.

If we had something like:

weak<X>
shared<X>
singleton<X>

etc., it would be much better for the community to come up with its own solutions to different scenarios and grow in an organic fashion.

Now we need to wait until things become keywords, and there's no way to fight against a compiler that hardcodes release/retain as a compiler pipeline step.

Imagine if, instead of having to wait for a 'strong' keyword, we could have designed a strong<> holder that used simple alloc and free directives.
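For what it's worth, the `weak` case can already be approximated as a plain library type today, which hints at what the rest of the family could have looked like. A minimal sketch, assuming a hypothetical `Weak` wrapper and `Box` class (neither is standard-library API):

```swift
// A weak reference holder built from existing language features,
// with no dedicated keyword at the use site.
final class Box {
    var value: Int
    init(_ value: Int) { self.value = value }
}

struct Weak<T: AnyObject> {
    weak var object: T?          // the one place `weak` still appears
    init(_ object: T) { self.object = object }
}

var strong: Box? = Box(42)
let holder = Weak(strong!)
print(holder.object?.value ?? -1)  // 42 while a strong ref exists
strong = nil                       // last strong reference released
print(holder.object == nil)        // true: the wrapper observed the dealloc
```

A `shared<X>` or `singleton<X>` in the same style would need the runtime hooks discussed below, which is exactly the customization point the language doesn't expose.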

Swift is one of the most beautiful and enjoyable languages to develop in out there; it's a pity that most of its design flaws are there because it had to be compatible with Objective-C.


I'm not an expert on this, but I found that the Nim language recently started to use ARC, and they mention that this change has improved Nim's performance. The difference between Swift's and Nim's ARC implementations is that Nim's doesn't use atomic reference counting. They explain all of this here. Maybe Swift, with a concurrency model, could make optimizations under the hood, perform nonatomic ARC like Nim, and improve language performance.

The last two posts are along the lines we are advocating: one way to do it is to replace ARC's global memory ideas and let engineering teams use API constructs to tell ARC how to act. We get a lot of bang out of generational allocation: an infant generation that looks a lot like stack allocation, but is actually a mini heap. Importantly, because there is no lock/atomic, you get ultra-fast allocation, and you just graduate infants to another heap as you need to. And as Fabio says, let's just make it possible to get ARC out of the way, or change its characteristics. Using Unsafe stuff works, but there are more sugary ways to do it.

There are some limited optimizations to turn refcounts into nonatomic operations when we know the object being referenced doesn't escape to other threads, but it's difficult to do much of that today because Swift as a whole does not have a strong notion of threads. However, recent Apple Silicon hardware, like the A14 CPU in the latest iPhones and iPads, now makes uncontended atomic operations almost as cheap as nonatomic accesses, making that kind of compile-time optimization less necessary.


Your link takes me to a page with no code on it, AFAICT.

Dave, the code on the left is an actor DSL, and the code on the right is a Swift knowledge-graph DSL.
At Swift | Brighten AI.

Joe, yeah, we have to add API to let engineering teams talk about threads. The actor model is a way to do that. Once in the actor, you have to be able to say "actor-local allocation". Then you can use an actor-private heap, with no lock.
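As a rough illustration of "actor-local allocation" using the actor feature Swift later shipped (Swift 5.5+): actor isolation already serializes access to actor-private state, so a reused scratch buffer inside an actor needs no lock of its own. The `Scratch` actor and its reuse policy here are made up for the sketch:

```swift
// Actor isolation serializes calls, so `buffer` is effectively a tiny
// actor-private heap: reused across calls, touched with no lock or atomic.
actor Scratch {
    private var buffer: [UInt8] = []

    func wordCount(of text: String) -> Int {
        buffer.removeAll(keepingCapacity: true)   // reuse storage, no fresh alloc
        buffer.append(contentsOf: text.utf8)
        return buffer.split(separator: UInt8(ascii: " ")).count
    }
}

// Usage, from an async context:
// let scratch = Scratch()
// let n = await scratch.wordCount(of: "go wide to many cores")  // 5
```

What this sketch cannot express is the next step being asked for: making the allocations *themselves* come from an actor-private, lock-free heap rather than the shared ARC heap.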

What we do for fine-grained concurrency is make a set of "operations" that each have their own infant-generation heaps inside them, and when we "go wide" to many cores, we hand one to each core. Each core operation parties in its container, free of locks, and graduates results from that operation back onto the main ARC heap. This is a massive win, because there's tons of compute in each core, and for 1,000,000 inputs, and 10x that in temporaries, there are just a few outputs. All of that infant-generation garbage is high-perf: no atomics, no locks on allocs.

Just to be clear, I'm not advocating against ref-counting per se.

It's a very reasonable GC when you need deterministic guarantees. Thanks for making us aware of how the compiler can make some optimizations by avoiding atomics in certain circumstances.

But let's imagine that Swift had started with two small changes: struct destructors, and something like

struct shared<T> {
    var t: T
    init(_ t: T) {
        self.t = t
        _refcounted_alloc(t)    // hypothetical runtime hook
    }
    deinit {                    // struct destructors don't exist today
        _refcounted_dealloc(t)  // hypothetical runtime hook
    }
}

To do the exact same thing the compiler is doing now, it would be a matter of baking in recognition of '_refcounted_alloc' as a special function and making the compiler turn it into a no-op in the cases where it cleverly diagnosed that a ref-count is actually not needed.

I'm not a compiler expert, but I wonder what it would have done for Swift if the language had launched with struct destructors, the capacity to move value types, and some sort of macro keyword for retain/release.

What's in the 'ownership manifesto' could probably have been achieved by now, without taking the training wheels off for the majority of people who don't need to customize the language that much.

We all want more performance using constructs generally built into the language, and Fabio, I see your point. I would add that we still need to not touch the main heap, with its need for atomics or locks, either.

There's just no reason to destroy single-threaded compute perf that way.

And as a person who has studied performance a lot (I helped design Shark, back in the day), I can tell you, Swift is insidious in the way it inserts ARC cost into call chains even when you dodge class use, etc. Part of why we are talking about it here is that I don't think most compiler people have time to study ultra-high-performance compute projects, and so they don't see the effects of ARC the way we do.

As I noted, when I visit that page (in Chrome) I see no code whatsoever. OK, I tried a different browser; Safari shows something.

Please do not presume; I can vouch for the HPC bona fides of at least one key member of the Swift team.

Ah, I see. We use a PDF viewer plugin; good to know. Apologies!

We use a really simple actor DSL. The idea is to generate a static actor, so it's safe to call with async messaging, with a queue associated with it. We get a lot of simplicity out of that. Brighten AI is tens of megabytes of compiled code, so we need submodules to merrily do their work and talk to each other without having to worry about threading. This is the large-scale, coarse-grained concurrency stuff, and we would add that we also have a pipelining version of it (for things like decoding, with many steps and a producer/consumer motif). We also advocate that actors need, sooner or later, to be able to say "I need these actors started before me". And I would advocate only allowing Codables to be pumped for V1.

I hope it's okay to just post the link to the PDF here? https://www.brighten.ai/wp-content/uploads/2020/10/actor3.pdf

FYI, swiftai is already an open-source project: GitHub - fastai/swiftai: Swift for TensorFlow's high-level API, modeled after fastai

I hope Brighten AI will consider open-sourcing some of this work.

Sure, glad you posted the link.

So, about the "high allocation frequency": you want to think of Brighten AI as something akin to a decoding pipeline, taking raw audio --> speech/NLU --> action. We handle language models with well over 20 million entities (because navigation and TV program search are large), and 20 million x anything is a lot of temporary allocs. Our time budget is 50 ms to compare all 20 million. (The rest of the pipeline has to finish within 200 ms, too.)

If you have looked into the insides of a speech engine or an NLU system, you will find lots of temporary structures and things describing models and theories, which are gathered en masse during an early part of a pipeline stage; then best-fit pathways are found through least-cost Viterbi searches through cost-connected node forests. The rest are pruned away and evaporate.

Swift is getting pretty good at the immutable-struct thing, but what you find is that if you use collections or dynamic-length objects (i.e. a string; that's not what we are talking about here, but a string is dynamic-length as well), you run into ARC and heap work, and performance gets hit pretty hard.

Our high-frequency allocations are temporary, so we make a temporary heap and allocate from there, and manually promote winners out of the infant heap (when they win). We didn't do anything weird to the language; Swift's "unsafe" stuff gives us the ability to work around ARC and get high performance with API. We believe that when we learn idioms that generate high performance, our job is to talk about them here so the core team can adopt them into language land and make them pretty.
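A stripped-down version of that infant-heap trick, using only standard Unsafe APIs. The `InfantHeap` name and its single-threaded, fixed-capacity policy are assumptions for the sketch; real code would also handle per-type alignment rather than relying on a single base alignment:

```swift
// A lock-free (because single-threaded) bump allocator: allocation is a
// pointer bump, and the whole generation is discarded in one reset().
struct InfantHeap {
    private let base: UnsafeMutableRawPointer
    private let capacity: Int
    private var offset = 0

    init(capacity: Int) {
        self.capacity = capacity
        self.base = UnsafeMutableRawPointer.allocate(byteCount: capacity,
                                                     alignment: 16)
    }

    mutating func allocate<T>(_ type: T.Type) -> UnsafeMutablePointer<T> {
        let size = MemoryLayout<T>.stride
        precondition(offset + size <= capacity, "infant heap exhausted")
        let p = base.advanced(by: offset).bindMemory(to: T.self, capacity: 1)
        offset += size
        return p
    }

    mutating func reset() { offset = 0 }   // "evaporate" all temporaries at once
    func destroy() { base.deallocate() }
}

var heap = InfantHeap(capacity: 1 << 16)
let score = heap.allocate(Double.self)
score.pointee = 0.5        // no retain/release, no atomics
heap.reset()               // everything gone, no per-object free
heap.destroy()
```

The "promotion" step would copy any winners into ordinary ARC-managed objects before `reset()`; everything else simply evaporates.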


Here, we've discussed marking functions for asynchronous processing with keywords akin to try and throws. We've talked about generalizing this to an entire effects system. Are you thinking about making weak/strong/singleton an effect too?

In particular, I'm against turning some things into keywords and hiding the implementation from developers, leaving them no recourse once the path forced onto everyone doesn't work quite right for them.
I think Swift has the potential to be more explicit and less implicit about some things, and I think that, having decided on a deterministic GC at the start, it would still be feasible to change the underlying implementation once the language matured.

So I'm mostly against doing things the way Go does, where you get a keyword. I'd rather this were a library that people can opt into and customize, with different approaches competing.

You know, even the actor model isn't the solution to every concurrency problem. Sometimes you need thread pools and message loops, and for certain problems they are the right way to do multi-threading, despite the actor model being great for a lot of scenarios, as Go has proven, especially in server scenarios.

I was advocating against making even the memory-control aspect a language construct with a hidden implementation, instead of a library that is explicit and therefore customizable for different scenarios.

And as it turns out, right now the fact that ref-counting is baked into the language without any recourse is making Swift unable to become competitive in a couple of important scenarios.

So instead of an 'async' keyword, how about a library with concurrent collections, a concurrent queue, and a thread pool that uses native pthreads, and leave the rest to the devs?
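In the spirit of that library-first approach, GCD already gives an explicit "fan out over a thread pool" primitive with no keyword involved. A small sketch (the chunked-sum task is just a stand-in workload):

```swift
import Dispatch

// Parallel chunked sum over a plain thread pool: no `async` keyword,
// just an explicit fan-out where each worker owns one output slot.
let input = Array(1...1_000_000)
let workers = 8
let chunk = input.count / workers
var partials = [Int](repeating: 0, count: workers)

partials.withUnsafeMutableBufferPointer { out in
    DispatchQueue.concurrentPerform(iterations: workers) { i in
        let range = (i * chunk)..<((i + 1) * chunk)
        out[i] = input[range].reduce(0, +)   // disjoint slots: no lock needed
    }
}
let total = partials.reduce(0, +)
print(total)  // 500000500000
```

Everything here is visible and replaceable: the pool, the partitioning, the reduction; nothing is compiled into the language.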

Concurrent data structures and thread pools with scheduling are already very hard, so making those available and letting users compose their concurrency according to their needs would be much better than forcing them to use things that are hidden.

So I'm against things like 'async' becoming a keyword with the implementation hidden from the user.
But I understand why that would be popular with the majority of programmers out there.

But as jonhburkey has shown in this thread, the fact that ARC is unavoidable is a big impediment to Swift becoming more performant in a couple of scenarios.

I have no problem with keywords or an effects system as long as it's programmable and not hardcoded. But maybe Swift wants to be more like Java, Go, and JavaScript, and less like Rust and C++ (and in that case, I think it's a loss of potential for what this language can achieve, which is a lot).

Edit: In case it's not clear, this is an answer to @CTMacUser. BTW, is there any thread or document about the design you are talking about?
If it's something like:
async func x()
--- where ---
@this_is_magical_unicorn_that_turn_this_into_a_side_effect_keyword
func async() {
(explicit implementation for the async 'pseudo-keyword' goes here)
}

I'm all for it. If it manages to be explicit and customizable while also being more elegant, even better.

Well said, and let's be clear: Java destroys Swift in a lot of these scenarios on performance, because modern GC (like the last 20 years, since HotSpot; remember, I was Java's chief architect) outperforms "deterministic GC" ARC, because it doesn't force high-frequency function calling with locking (the retains/releases) in many concurrent compute cases the way ARC does. The advantage of runtime compiling is that you can compile away dynamic method invocations and GC work when, at runtime, the compiler can see that those can go poof.

Swift has so many other cool things about it, including simple native compiling and targeting binaries to tons of platforms without a big platform port, and we are here because, as Fabio is also advocating, with a bit more innovation toward letting us get in there and get ARC out of the way, we can have even better stuff that's competitive under real load, without the kind of surgery we did to get Swift to be performant.

Agree, yes, actors are not a panacea, but they are another good tool. As we've said, we love their simple coarse-grained asynchrony, not for fine-grained work; there we use concurrent GCD queues (which are atomic queues over thread pools), with an added heap and context per concurrent queue, for our max-performance "farm the same compute out to multiple cores, please" areas.

We do believe freeing us from ARC actually is a performance panacea ;-). (Given that you let us annotate an "operation" and make it look like ARC, but this mini-heap-annotated ARC would allocate locally from a "queue-local heap" without locking, etc., and we promise to behave. Label it "unsafe" if you like; it's certainly safer to do that than to actually use the Unsafe APIs all over our code! Which is what we are doing, and maybe we should open-source some of our tricks, but then performance-minded Swift people would drift further away from ARC, and that seems not to scale...)

As an aside, one of the ways you will know Swift is actually fast is when they compare its performance to tuned Java and C++, and not to JavaScript, Python, or Obj-C, all languages with hideous performance characteristics.

Is the last part a question for me or @CTMacUser? :slight_smile:


If Swift gets move-only value types (move-only structs) and borrowing rules like Rust's (on top of which RAII can be implemented), then there is a real possibility of higher Swift performance without ARC or GC.
This is about implementing ownership and borrowing in Swift, similar to Rust.
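For the record, Swift later grew exactly this direction: noncopyable structs with deinit (Swift 5.9's `~Copyable`, SE-0390) plus `consuming`/`borrowing` parameters give RAII-style deterministic cleanup with no refcount traffic. A minimal sketch (the `Buffer` type and `fill`/`finish` functions are invented for illustration):

```swift
// A move-only resource handle: ownership transfers instead of retain/release,
// and `deinit` runs deterministically when the last owner's scope ends.
struct Buffer: ~Copyable {
    let pointer: UnsafeMutableRawPointer

    init(byteCount: Int) {
        pointer = UnsafeMutableRawPointer.allocate(byteCount: byteCount,
                                                   alignment: 16)
    }
    deinit {
        pointer.deallocate()   // RAII: freed exactly once, no refcount needed
    }
}

func fill(_ buffer: borrowing Buffer) {
    buffer.pointer.storeBytes(of: 0, as: UInt8.self)  // borrowed: ownership unchanged
}

func finish(_ buffer: consuming Buffer) {
    // ownership moves in; deinit runs at the end of this scope
}

let b = Buffer(byteCount: 64)
fill(b)      // borrow
finish(b)    // move: the compiler rejects any use of `b` after this line
```

The compiler enforces the ownership rules statically, which is the Rust-like alternative to ARC that this post anticipates.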
