What are you using for performance profiling?

What is everyone using for performance profiling on Linux? I used to just open up my Mac, build the macOS target of my application, and use Instruments.

Pros: detailed results and visualizations; also (for libraries) it forces me to target and test on macOS, where the majority of Swift users live.

Cons: not very accurate for server applications. It’s also not a very efficient workflow…

What are some good ways to profile performance on Linux?

On Linux, perf is good, and you can use Flamegraphs to visualise the data. There's an extremely well-hidden swift-server guide with some examples:
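
Not taken from that guide, just a rough illustration: the usual perf-to-flame-graph pipeline wrapped in a small shell function. This sketch assumes perf is installed, that stackcollapse-perf.pl and flamegraph.pl (from Brendan Gregg's FlameGraph repository) are on your PATH, and that the Swift toolchain's swift demangle is available to turn mangled Swift symbols into readable names. profile_to_flamegraph is a hypothetical helper name.

```shell
# Sketch: record a profile with perf and render it as a flame graph SVG.
# Assumes perf, swift (for `swift demangle`), stackcollapse-perf.pl and
# flamegraph.pl are all on PATH. profile_to_flamegraph is a hypothetical name.
profile_to_flamegraph() {
    local out="$1"; shift   # first argument: output SVG; the rest: command to profile
    # -g records call graphs (frame-pointer unwinding by default)
    perf record -g -o perf.data -- "$@" || return
    # Dump the samples as text, demangle Swift symbols, fold identical
    # stacks into one line each, then draw the SVG.
    perf script -i perf.data \
        | swift demangle --simplified \
        | stackcollapse-perf.pl \
        | flamegraph.pl > "$out"
}

# Usage (hypothetical binary path):
#   profile_to_flamegraph flame.svg ./.build/release/MyServer
```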

Yep! In the 25 minutes since I made this thread, I found perf by rummaging through the code in this slideshow (about 75% through). Those SSWG guides just don’t show up in search…

There's also this guide (also from the SSWG) explaining what perf can do and how to set it up.

And yeah, it's incredibly well hidden. You go to swift.org, scroll the sidebar all the way to the bottom to get "Swift on Server", then scroll that page all the way to the bottom to find links to the guides (like "Setup and code editing", "Building", "Testing", "Deployment", etc.; the kind of thing you'd expect to be a little easier to find).

The website badly needs an overhaul.

I copied and pasted the first 32 words of that perf guide into Google, and it does not appear…

You might want to check out lttng.org as an option. I’ve had good experience with it for low-overhead profiling, but haven’t yet looked at integration with Swift.

Okay, so I got perf and the flame graphs working, but, as it turns out, the Swift compiler (with -c release) inlines so aggressively that the call stack is almost never more than 2 or 3 frames deep. So I just get one slab named after the module, with a bunch of general symbols like swift_retain and some opaque standard library APIs layered on top of it. That isn’t very useful for determining exactly where the time is being spent. Any tips on getting a more useful breakdown?

In my experience, perf record does not seem to use all available debug
information to generate a call stack. You may have better luck using
--call-graph dwarf than using the defaults.
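
A minimal sketch of that suggestion, assuming a SwiftPM package built with swift build and an executable passed as arguments. Adding -Xswiftc -g is an extra precaution (an assumption, not something stated above) to make sure DWARF debug info is present; depending on your toolchain, release builds may already include it. record_with_dwarf is a hypothetical helper name.

```shell
# Sketch only: make sure the release build carries debug info, then ask perf
# to unwind stacks from DWARF data instead of relying on frame pointers.
# record_with_dwarf is a hypothetical helper name.
record_with_dwarf() {
    # -Xswiftc -g forwards -g to swiftc so DWARF debug info is emitted
    swift build -c release -Xswiftc -g || return
    # --call-graph dwarf copies a chunk of the user stack with each sample
    # so perf can unwind it afterwards using the debug info
    perf record --call-graph dwarf -o perf.data -- "$@"
}

# Usage (hypothetical product path):
#   record_with_dwarf ./.build/release/MyServer
```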

I don't know whether or not the Swift compiler defaults to emitting DWARF
debug information in the most current and comprehensive form. If it
does not, then you may get better call stacks by forcing later/greater
DWARF info. Maybe swiftc has an equivalent to gcc's -gdwarf-4 option;
I have not been able to find it.
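
Whatever the answer about compiler flags, one way to check which DWARF version actually ended up in a binary is to read its compilation-unit headers. A sketch, assuming GNU binutils' readelf is installed; dwarf_versions is a hypothetical helper name and the binary path is a placeholder. A binary without debug info prints nothing.

```shell
# Sketch: list the DWARF version(s) recorded in a binary's compilation-unit
# headers. Assumes GNU binutils' readelf; dwarf_versions is a hypothetical name.
dwarf_versions() {
    # Each compilation-unit header in the .debug_info dump has a
    # "Version:" line; print the number and de-duplicate.
    readelf --debug-dump=info "$1" 2>/dev/null \
        | awk '/^ *Version:/ { print $2 }' \
        | sort -u
}

# Usage (hypothetical path):
#   dwarf_versions ./.build/release/MyServer
```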

Dave

Yup, that's also covered in the perf guides linked above.

Additionally, you may want to use:

perf record \
    -m 50000 \
    --call-graph dwarf,16384 \
    <your regular options>

The -m 50000 sets perf's buffer size to 50k pages, and the ,16384 sets the maximum size of each stack shot to 16k. Increasing perf's buffer size makes sense because, with up to 16k per stack shot, the default buffer is easily exhausted...
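
A back-of-the-envelope way to see why: the buffer holds pages × page size bytes, and each DWARF sample can carry up to the requested stack size. This sketch assumes a 4 KiB page size in the example and ignores sample headers and register data, so real capacity is lower; perf may also round -m up to a power of two. perf_buffer_estimate is a hypothetical helper name.

```shell
# Sketch: estimate a perf buffer's size and how many worst-case DWARF samples
# (each carrying up to stack_bytes of copied user stack) fit in it.
# Ignores per-sample headers/registers, so treat the count as an upper bound.
# perf_buffer_estimate is a hypothetical helper name.
perf_buffer_estimate() {
    local pages="$1" stack_bytes="$2"
    local page_size="${3:-$(getconf PAGESIZE)}"   # default: this machine's page size
    local buffer_bytes=$(( pages * page_size ))
    echo "buffer: $(( buffer_bytes / 1024 / 1024 )) MiB," \
         "~$(( buffer_bytes / stack_bytes )) samples of $stack_bytes bytes"
}
```

With the numbers from the post, perf_buffer_estimate 50000 16384 4096 prints "buffer: 195 MiB, ~12500 samples of 16384 bytes".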

If you're wondering how to find "the best" values of -m and --call-graph dwarf,X:

  • if you see [unknown] frames, increase the number X in --call-graph dwarf,X
  • if you see Warning: Processed ... events and lost ... chunks!, increase the number after -m (or reduce the sampling frequency with -F)
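
The two checks in that list can be scripted. A sketch, assuming you have already saved perf's text output and its warnings to files (for example, perf script > script.txt 2> script.log); count_unknown_frames, lost_chunks and suggest_perf_tuning are hypothetical helper names, and the exact warning wording is taken from the message quoted above.

```shell
# Sketch of the two checks above. Assumes perf's text output and warnings were
# saved to files beforehand. All helper names are hypothetical.

# Count lines of `perf script` output that contain an unresolved [unknown] frame.
count_unknown_frames() {
    awk '/\[unknown\]/ { n++ } END { print n + 0 }' "$1"
}

# Sum N over all "Processed ... events and lost N chunks!" warnings (0 if none).
lost_chunks() {
    awk 'match($0, /lost [0-9]+ chunks!/) {
             split(substr($0, RSTART, RLENGTH), w, " ")
             n += w[2]
         }
         END { print n + 0 }' "$1"
}

# Turn both counts into the tuning hints from the list above.
suggest_perf_tuning() {
    local unknown lost
    unknown=$(count_unknown_frames "$1")
    lost=$(lost_chunks "$2")
    if [ "$unknown" -gt 0 ]; then
        echo "$unknown [unknown] lines: try a larger X in --call-graph dwarf,X"
    fi
    if [ "$lost" -gt 0 ]; then
        echo "$lost lost chunks: try a larger -m (or a lower -F)"
    fi
}
```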

I'd generally recommend starting low-ish, maybe --call-graph dwarf,2048 and -m 10000, and only raising them if you need the information. I added a few more tricks like that to the allocation flame graphs doc.
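
The start-low advice can be baked into a small wrapper so that raising the values is a one-word change. A sketch: perf_record_dwarf, STACK_BYTES and MMAP_PAGES are hypothetical names, and the defaults are the 2048 and 10000 suggested above.

```shell
# Sketch: perf record with DWARF unwinding, starting from low-ish defaults
# (2048-byte stack shots, 10000 mmap pages) that can be overridden per run.
# perf_record_dwarf, STACK_BYTES and MMAP_PAGES are hypothetical names.
perf_record_dwarf() {
    local stack="${STACK_BYTES:-2048}" pages="${MMAP_PAGES:-10000}"
    perf record -m "$pages" --call-graph "dwarf,$stack" "$@"
}

# Usage (hypothetical binary path):
#   perf_record_dwarf -F 99 -- ./.build/release/MyServer
#   STACK_BYTES=16384 MMAP_PAGES=50000 perf_record_dwarf -- ./.build/release/MyServer
```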

The DWARF-based graphs are much more useful, thanks!