I need to optimize my application. While profiling it with Instruments, and after inverting the call tree, I discovered that the most heavily used symbol is ___chkstk_darwin. Searching didn't turn up anything that explained what it is. Can anyone explain what ___chkstk_darwin is?
Stack checking, to catch stack overflows. You can disable it by passing -fno-stack-check if necessary.
Further reading: Security Technologies: Stack Smashing Protection (StackGuard)
If this is an iOS application compiled by Xcode, where would I place the -fno-stack-check flag?
Build Settings > Other C Flags.
After setting the flag, ___chkstk_darwin still shows up in the call stack in Time Profiler. Does the flag also need to be set in the Build Settings of each module used, and somehow for each SPM package as well?
"Other C Flags" will only affect C/C++/ObjC compilation. There isn't any equivalent flag (that I know of) for the Swift compiler. -Xclang -fno-stack-check might do the trick in "Other Swift Flags", but no guarantees. Stack checks in Swift often arise from the dynamic allocations that we emit for unspecialized generic code, so if you're seeing them creep into your hot paths, something might be inadvertently blocking specialization.
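To make the specialization point concrete, here is a minimal sketch contrasting a generic function with a concrete one. Both function names are made up for illustration. When the optimizer can see the generic body and knows the concrete type, it can emit a specialized copy that should behave like the concrete version; when it cannot (optimization off, or the body hidden behind a module boundary), the call goes to the unspecialized implementation, which is where the dynamic allocations mentioned above may come from.

```swift
// Hypothetical example: names are illustrative, not from any library.

// Generic version. If the optimizer cannot specialize it for a concrete T,
// callers go through the unspecialized generic implementation.
func isBitSetGeneric<T: FixedWidthInteger>(_ value: T, at index: Int) -> Bool {
    return value & ((1 as T) << index) != 0
}

// Concrete version. All types are known statically, so no specialization
// is required to get straight-line integer code.
func isBitSetConcrete(_ value: UInt, at index: Int) -> Bool {
    return value & (1 << index) != 0
}

print(isBitSetGeneric(UInt8(0b100), at: 2))  // true
print(isBitSetConcrete(0b100, at: 1))        // false
```

Both functions compute the same result; the difference only shows up in the generated code and therefore in the profile.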
This function is using ___chkstk_darwin the most:
extension Array where Element == UInt {
    /// Get bit i.
    func getBit(at i: Int) -> Bool {
        let limbIndex = Int(UInt(i) >> 6)
        if limbIndex >= self.count { return false }
        let bitIndex = UInt(i) & 0b111_111
        return (self[limbIndex] & (1 << bitIndex)) != 0
    }
}
Does anyone know what about it is causing it to be used?
Some of those operations are generic, so if you're building with optimization disabled, we may be calling out to their unspecialized generic implementations. If I build with optimization, I don't see any __chkstk calls in the optimized assembly on macOS x86_64:
_$sSa3fooSuRszlE6getBit2atSbSi_tF:
    pushq %rbp
    movq %rsp, %rbp
    testq %rdi, %rdi
    js LBB1_5
    movq %rdi, %rax
    shrq $6, %rax
    cmpq 16(%rsi), %rax
    jae LBB1_3
    movq 32(%rsi,%rax,8), %rax
    btq %rdi, %rax
    setb %al
    popq %rbp
    retq
LBB1_3:
    xorl %eax, %eax
    popq %rbp
    retq
LBB1_5:
    ## InlineAsm Start
    ## InlineAsm End
    ud2
The iOS app has -Os (Fastest, Smallest) set for the Apple Clang - Code Generation Optimization Level setting, and -O (Optimize for Speed) set for the Swift Compiler - Code Generation Optimization Level setting, and it's still showing the __chkstk calls.
The library that's making the calls is actually an SPM dependency, though. Do optimization settings need to be set for it specifically in order for optimizations to apply?
That could be it, since we still don't do cross-module optimization by default. You could try making the function in the SPM dependency @inlinable so that it can be inlined and specialized into the client project.
Does this mean only that the optimization settings in the iOS app's build settings won't apply to the dependency, or also that any optimization settings set in the SPM dependency itself won't be applied either?
Your optimization settings should apply to all packages independently. Without cross-module optimization or @inlinable, though, we won't do optimizations across package boundaries, such as inlining a call to a small helper function in a package, or specializing a generic call in another package.
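As a sketch of the @inlinable suggestion applied to the function from this thread: the attribute requires the declaration to be public (or internal with @usableFromInline), so the version below adds public. It assumes a 64-bit UInt, as the original does with its shift by 6, and assumes the extension lives in the package module rather than in the app.

```swift
// Assumed to live in the SPM dependency's module.
extension Array where Element == UInt {
    /// Get bit i.
    /// `@inlinable` exposes the body to client modules, so the optimizer
    /// may inline and specialize it at the call site.
    @inlinable
    public func getBit(at i: Int) -> Bool {
        let limbIndex = Int(UInt(i) >> 6)           // which 64-bit limb
        if limbIndex >= self.count { return false } // past the end: bit is 0
        let bitIndex = UInt(i) & 0b111_111          // position within the limb
        return (self[limbIndex] & (1 << bitIndex)) != 0
    }
}

let limbs: [UInt] = [0b101, 1 << 6]
print(limbs.getBit(at: 0))   // true
print(limbs.getBit(at: 70))  // true: limb 1, bit 6
```

Note that @inlinable makes the function body part of the module's public interface, so later changes to the body may not reach clients until they are rebuilt.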
Are you referring to self.count and self[limbIndex]?
For reference:
extension Array where Element == UInt {
    /// Get bit i.
    func getBit(at i: Int) -> Bool {
        let limbIndex = Int(UInt(i) >> 6)
        if limbIndex >= self.count { return false }
        let bitIndex = UInt(i) & 0b111_111
        return (self[limbIndex] & (1 << bitIndex)) != 0
    }
}
Hm. Those both come from the standard library, so they should be inlinable and specializable when used from any package. It's possible that the package containing the getBit definition is in fact not being built with optimization, though I'm not sure why that would be the case.
I found some evidence that compiling in debug mode can leave optimizations off for SPM dependencies. After changing the iOS app's Run scheme to use the release configuration, all the performance issues disappeared.
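One way to check which mode a given module was actually compiled in is to rely on the documented behavior of assert: its condition is evaluated in -Onone builds and skipped in -O builds. A minimal sketch, assuming you can temporarily add a helper to the package in question (the name wasBuiltWithOptimization is made up for illustration):

```swift
/// Hypothetical helper: returns true when the module containing this
/// function was compiled with optimization, false for -Onone.
func wasBuiltWithOptimization() -> Bool {
    var assertionRan = false
    // The condition is only evaluated when assertions are enabled,
    // which is the case in -Onone (debug) builds.
    assert({ assertionRan = true; return true }())
    return !assertionRan
}

print("optimized build:", wasBuiltWithOptimization())
```

The result reflects the module where the helper is compiled, so it needs to be placed in the dependency itself to say anything about how the dependency was built.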
If Google pointed you here (as it did me) and you have a similar issue with ___chkstk_darwin in release mode, try the -no-stack-check Swift compiler flag. Note that there is no f in the flag name, and no -Xclang prefix is needed.
The flag can be set in Xcode's "Other Swift Flags" (OTHER_SWIFT_FLAGS) or in SwiftSetting.unsafeFlags for an SPM target, for example:
.executableTarget(name: "MyTarget", swiftSettings: [.unsafeFlags(["-no-stack-check"])])
I've confirmed that the flag helps with Xcode 14.3 (14E222b) and Swift compiler v5.8 (swiftlang-5.8.0.124.2 clang-1403.0.22.11.100).
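For context, this is roughly where that target declaration would sit in a complete manifest. It is only a sketch: the package name "MyPackage" and the tools version are assumptions, and only the target line comes from the example above.

```swift
// swift-tools-version:5.8
import PackageDescription

let package = Package(
    name: "MyPackage",  // hypothetical package name
    targets: [
        .executableTarget(
            name: "MyTarget",
            swiftSettings: [
                // Passed straight through to the Swift compiler.
                .unsafeFlags(["-no-stack-check"])
            ]
        )
    ]
)
```

One caveat to keep in mind: as far as I know, SwiftPM refuses to build targets that use unsafeFlags when the package is consumed as a versioned dependency, so this approach fits root packages and local dependencies best.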
BTW, @Joe_Groff, in my case ___chkstk_darwin appears in a hot path under the UnsafeRawBufferPointer.loadUnaligned<T>(fromByteOffset:as:) call. Please let me know if I should report that issue separately.
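For readers unfamiliar with that API, the call in question has the following shape. This is only an illustration of the call, not a confirmed reproduction of the stack-check issue; the byte values are arbitrary.

```swift
// Read a UInt32 starting at byte offset 1, which is not 4-byte aligned.
let bytes: [UInt8] = [0x00, 0x01, 0x02, 0x03, 0x04]
let value = bytes.withUnsafeBytes { buffer in
    buffer.loadUnaligned(fromByteOffset: 1, as: UInt32.self)
}

// The loaded bytes are 01 02 03 04; interpreting them as little-endian
// gives 0x04030201 regardless of the host's byte order.
print(String(UInt32(littleEndian: value), radix: 16))
```

A reproduction for a bug report would presumably wrap a call like this in the hot loop where the profile shows ___chkstk_darwin, built in release mode.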
In debug or in release?
In release. I tried both -O and -Ounchecked, and ___chkstk_darwin is still there. -no-stack-check blows it away.
Definitely worth a bug report if you can attach a full example to reproduce it.