What are you using for performance profiling?

taylorswift · January 29, 2022, 9:26pm

What is everyone using for performance profiling on linux? i used to just open up my mac, build the macOS target of my application, and use Instruments.

pros: detailed results and visualizations; also, (for libraries) it forces me to target and test on macOS, where the majority of Swift users live

cons: not very accurate for server applications. it’s also not a very efficient workflow…

what are some good ways to profile performance on linux?

Karl · January 29, 2022, 9:50pm

On Linux, perf is good, and you can use Flamegraphs to visualise the data. There's an extremely well-hidden swift-server guide with some examples:

github.com/swift-server/guides/blob/main/docs/performance.md

# Debugging Performance Issues

First of all, it's very important to make sure that you compiled your Swift code in _release mode_. The performance difference between debug and release builds is huge in Swift. You can compile your Swift code in release mode using

    swift build -c release
    
## Instruments

If you can reproduce your performance issue on macOS, you probably want to check out Instruments' [Time Profiler](https://developer.apple.com/videos/play/wwdc2016/418/).
    
## Flamegraphs

[Flamegraphs](http://www.brendangregg.com/flamegraphs.html) are a nice way to visualise what stack frames were running for what percentage of the time. That often helps pinpoint the areas of your program that need improvement. Flamegraphs can be created on most platforms; in this document we will focus on Linux.

### Flamegraphs on Linux

To have something to discuss, let's use a program that has a pretty big performance problem:

```swift
/* a terrible data structure which has a subset of the operations that Swift's
```

This file has been truncated.
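As a rough illustration of what the guide means by "what stack frames were running for what percentage of the time", here is a minimal Swift sketch that collapses sampled call stacks into the folded "frame;frame;frame count" text that Brendan Gregg's flamegraph.pl reads. The sample stacks and function names are hypothetical; in practice the folded lines come from piping perf script through stackcollapse-perf.pl, not from code like this.

```swift
// Hypothetical input: each sample is one captured call stack, root first.
typealias Stack = [String]

/// Collapses samples into folded lines: frames joined by ";", then a space
/// and the number of samples that had exactly that stack.
func foldedStacks(_ samples: [Stack]) -> [String] {
    var counts: [String: Int] = [:]
    for stack in samples {
        counts[stack.joined(separator: ";"), default: 0] += 1
    }
    // Sorted only so the output is deterministic.
    return counts.sorted { $0.key < $1.key }.map { "\($0.key) \($0.value)" }
}

/// Share of samples in which `frame` appears anywhere on the stack,
/// roughly the function's inclusive time, i.e. how wide it draws.
func inclusivePercentage(of frame: String, in samples: [Stack]) -> Double {
    guard !samples.isEmpty else { return 0 }
    let hits = samples.filter { $0.contains(frame) }.count
    return 100 * Double(hits) / Double(samples.count)
}

let samples: [Stack] = [
    ["main", "run", "checksum"],
    ["main", "run", "checksum"],
    ["main", "run"],
    ["main", "swift_retain"],
]
for line in foldedStacks(samples) { print(line) }
// main;run 1
// main;run;checksum 2
// main;swift_retain 1
print(inclusivePercentage(of: "checksum", in: samples)) // 50.0
```

A frame that shows up in half of the samples ends up half as wide as the root, which is why a flat, wide slab usually means the interesting callees are missing from the captured stacks.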

taylorswift · January 29, 2022, 9:53pm

yep! in the 25 minutes since i made this thread, i found perf by rummaging through the code in this slideshow (about 75% through). those SSWG guides just don’t show up in search…

Karl · January 29, 2022, 9:58pm

There's also this guide (also from the SSWG) explaining what perf can do and how to set it up.

And yeah, it's incredibly well hidden. You go to swift.org, scroll the sidebar all the way to the bottom to get "Swift on Server", then scroll that page all the way to the bottom to find links to the guides (like "Setup and code editing", "Building", "Testing", "Deployment", etc - the kind of thing you'd expect to be a little easier to find).

The website badly needs an overhaul.

taylorswift · January 29, 2022, 10:02pm

i copied and pasted the first 32 words of that perf guide into google and it does not appear…

hassila · January 29, 2022, 10:47pm

You might want to check out lttng.org as an option. I've had good experience with it for low-overhead profiling, but I haven't yet looked at integrating it with Swift.

taylorswift · January 29, 2022, 11:08pm

okay, so i got perf and the flame graphs working, but, as it turns out, the swift compiler (with -c release) inlines so aggressively that the call stack is almost never more than 2 or 3 stack frames deep. so i just get one slab named after the module, with a bunch of general symbols like swift_retain and some opaque standard library APIs layered on top of it. that's not very useful for determining exactly where the time is being spent. any tips on getting a more useful breakdown?
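One workaround that nobody in the thread suggested, and which we haven't benchmarked here, is to mark a suspected hot helper @inline(never) so an optimized build keeps it as a real call and it can get its own frame in the profile. The functions below are a hypothetical workload made up for illustration:

```swift
// Hypothetical workload for illustration only.

// In an optimized (-c release) build, a small helper like this is usually
// inlined into its caller, so perf attributes its samples to `run`.
// `@inline(never)` tells the optimizer not to inline it, so it stays a
// separate call and can show up as its own stack frame.
@inline(never)
func sumOfSquares(_ values: [Int]) -> Int {
    var total = 0
    for value in values {
        total &+= value &* value  // wrapping arithmetic: no overflow traps
    }
    return total
}

func run(iterations: Int) -> Int {
    let values = Array(0..<1_000)
    var total = 0
    for _ in 0..<iterations {
        total &+= sumOfSquares(values)
    }
    return total
}

print(run(iterations: 10_000))
```

Since the attribute changes what the optimizer is allowed to do, the timings you measure with it may differ somewhat from the shipping build, so it is best treated as a temporary diagnostic aid.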

gnuoyd · January 30, 2022, 6:25pm

In my experience, perf record does not seem to use all available debug
information to generate a call stack. You may have better luck using
--call-graph dwarf than using the defaults.

I don't know whether or not the Swift compiler defaults to emitting DWARF
debug information in the most current and comprehensive form. If it
does not, then you may get better call stacks by forcing later/greater
DWARF info. Maybe swiftc has an equivalent to gcc's -gdwarf-4 option;
I have not been able to find it.

Dave

johannesweiss · January 31, 2022, 8:17pm

Yup, that's also covered in the perf guides linked above.

Additionally, you may want to use:

    perf record \
        -m 50000 \
        --call-graph dwarf,16384 \
        <your regular options>

The -m 50000 sets perf's buffer size to 50k pages, and the ,16384 sets the maximum size of each stack sample to 16 KiB. Increasing perf's buffer size makes sense because, with up to 16 KiB per stack sample, the default buffer size is easily exhausted...
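To get a feel for those numbers, here is a back-of-the-envelope Swift sketch of how big that buffer is and roughly how many 16 KiB stack copies fit in it. It assumes 4 KiB pages (typical on x86_64 Linux, not universal) and that perf rounds the -m page count up to the next power of two, which is how we read the perf-record documentation. It also treats each sample as just its stack copy, ignoring the registers and header stored alongside it, so the real capacity is somewhat lower.

```swift
// Back-of-the-envelope sizing for `perf record -m PAGES --call-graph dwarf,BYTES`.
// Assumptions (ours, not measured): 4096-byte pages, perf rounds PAGES up to a
// power of two, and each sample costs about BYTES (its copied stack). Real
// samples also carry registers and a header, so fewer of them actually fit.

/// Smallest power of two that is >= n.
func roundUpToPowerOfTwo(_ n: Int) -> Int {
    precondition(n > 0, "page count must be positive")
    var power = 1
    while power < n { power <<= 1 }
    return power
}

struct PerfBufferEstimate {
    let pages: Int          // page count after rounding up
    let bytes: Int          // size of one ring buffer in bytes
    let approxSamples: Int  // full-size stack copies that fit before it must be drained
}

func estimateBuffer(mmapPages: Int, stackDumpBytes: Int, pageSize: Int = 4096) -> PerfBufferEstimate {
    let pages = roundUpToPowerOfTwo(mmapPages)
    let bytes = pages * pageSize
    return PerfBufferEstimate(pages: pages, bytes: bytes, approxSamples: bytes / stackDumpBytes)
}

let suggested = estimateBuffer(mmapPages: 50_000, stackDumpBytes: 16_384)
print("pages: \(suggested.pages), MiB: \(suggested.bytes / (1 << 20)), samples: ~\(suggested.approxSamples)")
// pages: 65536, MiB: 256, samples: ~16384
```

Under these assumptions each ring buffer holds only a few thousand to a few tens of thousands of full-size samples, which is why a larger dwarf dump size tends to go hand in hand with a larger -m.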

If you ask yourself how to find "the best" values of -m and --call-graph dwarf,X:

- If you see [[unknown]] frames, increase the number X in --call-graph dwarf,X
- If you see "Warning: Processed ... events and lost ... chunks!", increase the number after -m (or reduce the sampling frequency with -F)

I'd generally recommend starting low-ish, maybe --call-graph dwarf,2048 and -m 10000, and only raising them if you need the information. I added a few more tricks like that to the allocation flame graphs doc.
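The tuning loop described above (start low, raise the stack dump size when [[unknown]] frames appear, raise -m when chunks are lost) can be sketched as a small Swift helper. The types and the doubling strategy are hypothetical choices for illustration, and the 65528-byte cap is our understanding of perf's maximum dwarf dump size, so check the error message your perf version prints if it rejects a value.

```swift
/// What the last `perf record` run showed (hypothetical type for this sketch).
struct PerfRunSymptoms {
    var sawUnknownFrames: Bool  // "[[unknown]]" frames in the report / flame graph
    var sawLostChunks: Bool     // "Warning: Processed ... events and lost ... chunks!"
}

/// Settings for the next run: `-m mmapPages --call-graph dwarf,stackDumpBytes`.
struct PerfSettings: Equatable {
    var mmapPages: Int
    var stackDumpBytes: Int

    /// The low-ish starting point suggested in the thread.
    static let start = PerfSettings(mmapPages: 10_000, stackDumpBytes: 2_048)

    /// Assumed upper bound on the dwarf stack dump size, in bytes.
    static let maxStackDumpBytes = 65_528

    var commandLineFlags: String {
        "-m \(mmapPages) --call-graph dwarf,\(stackDumpBytes)"
    }

    /// Raises only what the symptoms call for; doubling is an arbitrary choice.
    func next(after symptoms: PerfRunSymptoms) -> PerfSettings {
        var result = self
        if symptoms.sawUnknownFrames {
            result.stackDumpBytes = min(stackDumpBytes * 2, PerfSettings.maxStackDumpBytes)
        }
        if symptoms.sawLostChunks {
            result.mmapPages = mmapPages * 2
        }
        return result
    }
}

var settings = PerfSettings.start
settings = settings.next(after: PerfRunSymptoms(sawUnknownFrames: true, sawLostChunks: false))
print(settings.commandLineFlags) // -m 10000 --call-graph dwarf,4096
```

Lost chunks can be addressed either by a bigger buffer or by a lower -F; this sketch only models the buffer. Raising the dump size also makes every sample bigger, so a run that fixes [[unknown]] frames may start losing chunks, which the next iteration then handles.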

taylorswift · January 31, 2022, 8:25pm

the dwarf-based graphs are much more useful, thanks!