Apparent contention in futex related to libswiftcore (5.3 arm linux)

William_Dillon2 · April 22, 2021, 11:48pm

I've been looking at a performance problem we've been seeing on Arm32/Linux since switching to Swift 5.3 (from 5.1). This is in an environment where Swift is cross-compiled via Yocto/bitbake, and we're also working on upstreaming the meta-swift layer we created. We cross-compiled the same way with 5.1.

I've been struggling a little with nailing it down completely, but it looks like libswiftcore is holding the bag at the moment. When we run our executables via strace we see an astounding number of system calls of the form:

futex(0xb6f608a0, FUTEX_WAKE_PRIVATE, 2147483647) = 0

Some of these system calls take many seconds to complete.
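For anyone wanting to quantify this, here is a minimal sketch that tallies futex calls from an strace log. It assumes the trace was captured with per-call timing, e.g. `strace -f -T -o trace.log ./CLIAgent` (the file name `trace.log` and the helper `summarize_futex` are just placeholders for illustration), so that each completed call ends with a `<seconds>` suffix.

```python
import re
from collections import defaultdict

# Matches completed futex lines carrying a -T timing suffix, such as:
#   futex(0xb6f608a0, FUTEX_WAKE_PRIVATE, 2147483647) = 0 <0.000012>
# An optional leading PID column (from `strace -f`) is tolerated.
FUTEX_LINE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"
    r"futex\((?P<addr>0x[0-9a-fA-F]+),\s*(?P<op>[A-Z_|]+)"
    r".*?\)\s*=\s*(?P<ret>-?\d+).*?<(?P<secs>\d+\.\d+)>\s*$"
)


def summarize_futex(lines):
    """Return {(addr, op): (call_count, total_seconds, max_seconds)}."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for line in lines:
        m = FUTEX_LINE.match(line.strip())
        if m is None:
            continue  # not a completed, timed futex call
        entry = stats[(m["addr"], m["op"])]
        secs = float(m["secs"])
        entry[0] += 1                    # number of calls
        entry[1] += secs                 # cumulative time in the call
        entry[2] = max(entry[2], secs)   # slowest single call
    return {key: tuple(val) for key, val in stats.items()}
```

Calling `summarize_futex(open("trace.log"))` and sorting the result by total seconds should show whether one futex address dominates, and whether the time comes from a few multi-second calls or from sheer call volume.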

We have been able to collect time profiles of the process, and the hottest function of the entire program is swift::RefCounts<swift::SideTableRefCountBits>::incrementUnownedNonAtomic(unsigned int) (at about 6% of all samples). I suspect my sample set doesn't include the futex system call at all, though it is present in strace output.

I'm hoping that someone has an idea about this section of the code, what might have changed, or something I could use for a breadcrumb to start working on.

Thanks

David_Smith · April 23, 2021, 9:19am

Heavy use of weak references would cause side table refcounting. Inline refcount overflow would also do it.

tclementdev · April 23, 2021, 4:46pm

Hey David, just curious here, what do you mean by "heavy use"? Don't weaks always use side tables?

William_Dillon2 · April 23, 2021, 5:28pm

Also, I don't think I emphasized the scale of the performance issue enough in my last message. One process can easily load up an entire CPU with a relatively small amount of actual "work":

top - 17:23:25 up 19:46,  1 user,  load average: 0.92, 0.50, 0.35
Tasks: 105 total,   2 running, 103 sleeping,   0 stopped,   0 zombie
%Cpu(s): 89.9 us,  5.9 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  4.2 si,  0.0 st
MiB Mem :    493.4 total,    119.5 free,     79.0 used,    294.9 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.    377.1 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 9547 root      20   0  123168  29868  27920 R  94.3   5.9   1:03.48 CLIAgent

An application that does relatively little takes 0.7 seconds on Swift 5.1 and 7 seconds on Swift 5.3, with the same source code.

And, in the case of the above application, I don't think there are any weak references at all. It's single-threaded with no use of closures.

David_Smith · April 23, 2021, 8:20pm

Please file a bug about that. We shouldn't be regressing perf!

David_Smith · April 23, 2021, 8:21pm

Yeah, weak refs always use side tables, but usually it doesn't matter like this.

William_Dillon2 · April 23, 2021, 8:24pm

Thanks David!

I'm very happy to dig in and assign some folks on my team to this issue to help get it resolved, if you think that's tractable. If it's not, I think we'll focus on working through our pain points with 5.1 (mostly relating to changes in the metadata format of SPM between the compiler versions).

David_Smith · April 23, 2021, 8:34pm

One possible approach if you have some time is to try to boil your test case down to something that would be usable in the Swift benchmark suite, see if it reproduces on Darwin systems, and put up a PR with it. That way any time in the future this regresses, CI will yell at us.

William_Dillon2 · April 23, 2021, 8:36pm

I think the biggest struggle with that will be that I've only seen it on 32-bit Arm so far. But that is a good plan.

David_Smith · April 23, 2021, 8:37pm

Oh! 32-bit! That might be highly related. The refcount representation is different on 32-bit and overflows more easily.
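To illustrate why a narrower inline count could matter, here is a toy model, not the runtime's actual code: a small inline counter that falls back to a side table once it overflows. The class `ToyRefCount`, the helper `slow_ops_after`, and the bit widths used below are all assumptions chosen for the sketch, not the real field layout.

```python
class ToyRefCount:
    """Toy model: a small inline counter that spills to a side table on overflow."""

    def __init__(self, inline_bits):
        self.inline_max = (1 << inline_bits) - 1  # largest value the inline field holds
        self.inline = 0
        self.side_table = None   # created lazily, on the first overflow
        self.slow_path_ops = 0   # operations that had to touch the side table

    def retain(self):
        if self.side_table is not None:
            # Once spilled, every later retain goes through the side table.
            self.side_table += 1
            self.slow_path_ops += 1
        elif self.inline < self.inline_max:
            self.inline += 1     # fast path: the count still fits inline
        else:
            # Overflow: migrate the whole count to the side table.
            self.side_table = self.inline + 1
            self.inline = 0
            self.slow_path_ops += 1


def slow_ops_after(retains, inline_bits):
    """Count slow-path operations after `retains` retains on one object."""
    rc = ToyRefCount(inline_bits)
    for _ in range(retains):
        rc.retain()
    return rc.slow_path_ops
```

With these illustrative widths, `slow_ops_after(1000, 7)` sends most retains through the slow path, while `slow_ops_after(1000, 31)` never leaves the fast path. If something similar happens in the real runtime, an object that is referenced a few hundred times could behave very differently on 32-bit than on 64-bit.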