I have a Vapor application that is experiencing random segmentation faults, but only when built with
-c release. It’s really weird for several reasons:
The segfault always occurs when accessing a particular data model protected by an
actor, nowhere else. The data was originally written by my application, so I do not think the segfault is Vapor’s fault.
I am not using any Unsafe APIs anywhere near where the segfault is happening. The only place I use an unsafe pointer is when doing datetime parsing (using
timegm), and there are multiple layers of validation between that and where the segfault is happening.
all of the data looks normal immediately after it is written, there are only problems when it is read back later. when i dump the data being read back, i see all kinds of data corruption, e.g. Decimals where the exponent field contains integer values that look like stack addresses (12 nonzero digits, starts with
this same data model gets accessed all the time by methods on its enclosing
actor, but the segfault occurs in a site where i call a method directly on one of the
actor’s members. the call site is, of course, gated by
async let, as the compiler demands. i have not observed a segfault when i wrap the call site in an
actor method, and call that method from the other task instead. i’ve run the release binary several times with this modification, and it can get through several hundred work units before dying a “natural” death (see next section), just as the debug binaries do. for what it’s worth, the data model is a
struct which contains some normal Swift arrays, which makes me suspect this issue has something to do with copy-on-write in a concurrency context.
the segfault never occurs on debug builds. on release builds, it often dies after about 20–70 units of work, but the debug binaries are consistently able to run for over 500 work units (before the OS kills it for excessive memory use.) I have never seen data corruption in a debug build. this, of course, makes it very difficult to identify the cause of this issue.
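to make the two call patterns from the actor point above concrete, here is a minimal sketch. it is not the real code: Model, Store, readDirectly and readViaActorMethod are hypothetical names made up for illustration, the struct is reduced to a single array, and the call site uses a plain await for simplicity.

```swift
// Hypothetical stand-in for the data model: a struct holding a normal Swift array.
struct Model: Sendable {
    var values: [Int] = []

    func total() -> Int {
        values.reduce(0, +)
    }
}

// Hypothetical actor that protects the model.
actor Store {
    var model = Model()

    func append(_ value: Int) {
        model.values.append(value)
    }

    // Pattern B: the read is wrapped in an actor method,
    // so the member is only touched while running on the actor.
    func total() -> Int {
        model.total()
    }
}

// Pattern A: another task reaches into the actor for the member
// and calls a method directly on it.
func readDirectly(_ store: Store) async -> Int {
    await store.model.total()
}

// Pattern B call site: the other task calls the wrapping actor method instead.
func readViaActorMethod(_ store: Store) async -> Int {
    await store.total()
}
```

in this sketch both functions should return the same value, so it does not reproduce the crash. it only shows the shape of the call site that segfaults (pattern A) next to the one that has not segfaulted so far (pattern B).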
if anyone has any insights into what’s happening, that would be really helpful. especially tips on how to debug something that only occurs in release builds…