Swift Performance

Oh okay, didn't know. I figured he was responding to me, but I didn't get the usual "reply" notification so wasn't sure.

Would it be possible to implement the opposite of Rust's Send marker trait in Swift? I.e. a 'NoSend' marker protocol which means that the type cannot be sent to another thread? Then reference counting would not have to use the expensive atomic operations.
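Roughly what I have in mind (all names here are hypothetical; nothing like this exists in Swift today):

```swift
// Hypothetical marker protocol: the compiler would reject any attempt
// to move a conforming value across threads.
protocol NoSend {}

final class RequestCache: NoSend {
    // Because instances provably never leave their creating thread, the
    // runtime could use plain (nonatomic) increments for retain/release.
    var entries: [String: String] = [:]
}
```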

1 Like

Ah yeah I was replying to you.

1 Like

We have entry points in the runtime for nonatomic reference counting, but making use of them is still TBD, most likely until Swift has a concurrency model (it doesn't really make sense to try to promise things about threads when you don't formally know what a thread is). There are some interesting things that could potentially be done with it around escape analysis and proving unique references; the copy-on-write system already requires uniqueness checks, so it might be possible to teach the compiler what they mean.
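For reference, the uniqueness check in question is the one in the standard copy-on-write pattern (illustrative types here, not actual standard library code):

```swift
final class Buffer {
    var storage: [Int] = []
}

struct CoWArray {
    private var buffer = Buffer()

    mutating func append(_ value: Int) {
        // `isKnownUniquelyReferenced(_:)` is the uniqueness check: if the
        // buffer is shared with another value, copy it before mutating.
        if !isKnownUniquelyReferenced(&buffer) {
            let copy = Buffer()
            copy.storage = buffer.storage
            buffer = copy
        }
        buffer.storage.append(value)
    }
}
```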

6 Likes

On what I think is a related note: I've seen some work on "semantic ARC" in 5.1 and master. Is there any documentation about that effort, or even just about what "semantic ARC" means? I can't find anything that explains the term.

1 Like

"Semantic ARC" is ARC/copy optimization based on ownership SSA.

4 Likes

For non-UI apps that don't really need multiple threads, having a strictly single-threaded runtime mode might be interesting too, since we could do without a lot of runtime synchronization in a single-threaded process.

5 Likes

Intriguing, though I worry that it would encourage Python-style "thread safety", i.e. none, which might leak into libraries and hurt people who aren't using such a mode.

2 Likes

Not Johannes, but I'll leap in anyway and share my views.

Firstly, our experience with SwiftNIO backs up the assessment made by @David_Smith upthread: the vast majority of naive Swift benchmarks will spend most of their time in retain/release calls. As @Joe_Groff has pointed out, this is likely in part because classes are the easiest and most natural thing to reach for in Swift code, especially as the costs of reference counting your objects are not immediately apparent.

However, our experience with SwiftNIO also reveals that substantial performance improvements can be had by pushing your reference-type boundaries as far towards the edges of your program as possible. Taking swift-nio-http2 as an example, the core protocol implementation is built entirely of enums and structs. A couple of these are variable-width, but the ARC traffic around them is limited and has been improving as further ARC optimisations land.

The result of this is that in a SwiftNIO HTTP/2 benchmark, the core protocol handling code represents a tiny fraction of our runtime. This is great! The takeaway here is that operating directly on values is a huge win in many cases, and that in those cases Swift can definitely challenge Rust and C on performance.
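To make the shape of that concrete, here is a heavily simplified sketch of a value-typed protocol core (not the actual swift-nio-http2 code):

```swift
// The protocol core is plain values: a struct wrapping an enum state
// machine, so there is no retain/release traffic on the hot path.
struct StreamStateMachine {
    enum State {
        case idle
        case open(windowSize: Int)
        case closed
    }

    private var state: State = .idle

    mutating func receiveHeaders() {
        // Pure value manipulation: mutating an enum stored in a struct.
        state = .open(windowSize: 65_535)
    }

    mutating func receiveData(length: Int) -> Bool {
        guard case .open(let window) = state, window >= length else { return false }
        state = .open(windowSize: window - length)
        return true
    }
}
```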

The downside is that, if you aren't careful, the language does punish you when you escape into class land. For example, we pass our parsed HTTP/2 frames from our HTTP2Handler to our HTTP2StreamMultiplexer (both classes), and the ARC traffic on both the messages and the two handlers is a noticeable performance cost. This can definitely continue to improve, and submitting benchmarks that are dominated by elidable ARC traffic is the best possible way to get those costs addressed.

As @Joe_Groff also mentioned upthread, the story here is improving. Moving from Swift 5.0 to Swift 5.1 gave a bunch of SwiftNIO benchmarks performance improvements of 5% to 20% with zero code change, almost all of which likely derive from improved ARC optimisations. We saw similar wins moving from Swift 4.2 to Swift 5.

We should also be heartened by the fact that these improvements keep coming. Their appearance release after release suggests there is still plenty of (relatively) low-hanging fruit here. There's more good news in future language work, too: if we can formalise a model of move-only types, they should unlock additional ARC optimisations (I am particularly excited about using move-only types to wrap pointers into C code, as the extra heap allocation is currently an annoyance), and formalising a concurrency model also opens up new opportunities for reducing costs.
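To sketch the pointer-wrapping idea (the `~Copyable` syntax here is speculative, and the C names are hypothetical):

```swift
// Stand-in for an imported C API; in real code this would come from a
// C module. Both names are hypothetical.
func c_connection_close(_ handle: OpaquePointer) { /* ... */ }

// A move-only wrapper: ownership travels with the value, so there is no
// heap-allocated box and no reference counting.
struct CConnection: ~Copyable {
    private let handle: OpaquePointer
    init(handle: OpaquePointer) { self.handle = handle }

    deinit {
        // Deterministic cleanup, run exactly once when the value dies.
        c_connection_close(handle)
    }
}
```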

It remains the case, however, that the best solution to Swift performance problems is benchmarking, refactoring, and bug reporting. If you never benchmark your code, you won't know whether it's slow. This applies to any language, C included. The best recommendation I have is to write benchmarks, profile the program to find hotspots (using either perf or Instruments), and then examine what is causing those hotspots. This is exactly what we did in SwiftNIO HTTP/2, which led to a lengthy series of nice performance-improving patches that in aggregate gave an improvement of around 25% in real-world benchmarks. That work also produced one or two bugs.swift.org reports about possible performance improvements, as well as some benchmarks.
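To make that concrete, here is a minimal harness of the kind I mean (our own names, not a SwiftNIO utility); run the release build under perf or Instruments to see where the time inside `body` actually goes:

```swift
import Dispatch

// Times a closure over many iterations and reports the per-iteration cost.
func measure(_ label: String, iterations: Int = 10_000, _ body: () -> Void) {
    let start = DispatchTime.now().uptimeNanoseconds
    for _ in 0..<iterations { body() }
    let end = DispatchTime.now().uptimeNanoseconds
    let perIteration = Double(end - start) / Double(iterations)
    print("\(label): \(perIteration) ns/iteration")
}

measure("string interpolation") {
    _ = "value: \(Int.random(in: 0..<100))"
}
```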

My TL;DR here is that Swift is not inherently slow. In fact, lots of Swift code is really fast! But unlike in C, it is not always possible to glance at a line of code and see that it generates lots of ARC traffic or an unnecessary copy-on-write. The fastest way to make your applications faster is to profile them, but whenever you encounter something that seems excessive, contributing test cases to the Swift project gives the Swift team something to aim at for performance improvements. This is particularly true for test cases that generate substantial ARC traffic, as those are a commonly identified pain point.

The final takeaway here is that the Swift team, particularly the performance folks, have done great work already on optimising ARC traffic, and there are more wins to come.

40 Likes

On that note, since I'm similarly looking forward to move-only types, how much extra implementation effort are they expected to require on the SILGen/IRGen/optimiser side of things? Do they fall naturally out of the ARC optimisation work, or is there significant extra effort involved beyond language design concerns?

It is always about the implementation details.

Here is a slightly modified version of the naive example from your first benchmark that is approximately 10x faster. (Edit: never mind the example; it seems I broke something just before publishing it.)

I am not saying that Swift does not have some performance issues, but in day-to-day use my general impression is that it is comparable to most other ahead-of-time compiled languages.

1 Like

I built a toy ECS implementation to poke at the performance ceiling for Swift, and I had a lot of the same takeaways you list here. Yes, it's possible to write very performant Swift code, but really only through profiling. I hoped to find a rules-based approach for writing high-performance Swift code, but I would often be surprised when something innocuous, like capturing a particular variable within a closure, incurred a serious ARC penalty.
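A simplified version of the sort of thing that surprised me:

```swift
final class Entity {
    var position: Double = 0
}

func makeUpdater(for entity: Entity) -> (Double) -> Void {
    // Capturing a class reference in an escaping closure means the closure
    // context retains `entity`; calls through the closure can also incur
    // retain/release traffic that is invisible at the source level.
    return { delta in
        entity.position += delta
    }
}
```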

I am hopeful that many of those "gotchas" are the low-hanging fruit a smarter compiler can optimise away in future iterations, as you suggest, but it also made me wish for more explicit tools for telling the compiler when ARC is not necessary at all. For instance, if some object will always exist for the lifetime of the program, it would be nice to be able to skip ARC on it completely.

4 Likes

Yeah, the closure representation in SIL is also not ideal, and improvements here could lead to much better optimization of escaping closure code. On top-of-tree, we recently committed an improvement to the ARC optimizer that eliminates unnecessary reference counting around calls to the closure itself in many cases, which might have been the issue you're running into here.

We should also be able to dynamically optimize objects that live the lifetime of the program, by setting a flag in their object header that marks them as "immortal" so that retain and release calls early-exit without attempting to modify refcounts, avoiding the most expensive part of the atomic RMW. The compiler could conceivably set this bit automatically when it sees that a value is assigned to a global or static let variable, and maybe we could also provide a runtime function to allow code to manually tag objects as immortal.
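In rough pseudocode (a conceptual model only; the real runtime is C++ and the bit position is made up), the fast path would look like:

```swift
struct HeapObjectHeader {
    static let immortalBit: UInt64 = 1 << 63  // hypothetical flag position
    var refCountBits: UInt64
}

func retain(_ header: inout HeapObjectHeader) {
    // A plain load suffices to test the flag; immortal objects skip the
    // expensive atomic read-modify-write entirely.
    if header.refCountBits & HeapObjectHeader.immortalBit != 0 { return }
    header.refCountBits &+= 1  // stand-in for the real atomic increment
}
```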

15 Likes

What are the criteria for an immortal object? It does sound close to some kind of unsafe marking.

Immortal wouldn't be unsafe in the sense of breaking memory safety. What it would mean is that the object is never released, so the memory it uses may be leaked if it could have been deallocated normally.

2 Likes

So it couldn't become a zombie, but could leak instead (if the leak detector isn't careful). What kinds of objects would benefit here? It doesn't seem to fit many of the classes I've seen. Or do we wrap them in something akin to an autoreleasepool?

1 Like

It would be useful for singletons and other long-lived objects that are known to be used for the lifetime of the program.

6 Likes

Any chance the compiler would be able to detect an entire sub-graph of strong references that are all rooted in a global constant and therefore candidates for immortality?

5 Likes

We probably could, using runtime metadata to traverse the graph, though that might cause performance issues of its own if it ends up walking a very large graph immortalizing objects.

3 Likes

Currently the immortality flag is used for the empty collection singletons, which was a nice perf win. Definitely worth investigating where else it can be used; we’ve gotten a decent amount of mileage out of a similar concept in CoreFoundation.

9 Likes