Need help to investigate a possible stack overflow issue

rayx · February 17, 2023, 2:41pm

Hi @hassila, I tried the address and thread sanitizers. Neither reported any error. An interesting detail: when I enabled both the address sanitizer and the malloc scribble option, the crash went away (otherwise it persisted). So adding debug code might change the binary's behavior.

Below are my other findings and how I worked around the issue.

TLDR: vmmap output suggested there was no stack overflow in the thread that crashed. So I worked around the issue by modifying my code.

I found this thread. There is a lot of useful information in it, though none of it is particularly helpful in this case, because no one mentioned how they knew for sure that it really was a stack size issue. I ended up using vmmap to check the stack sizes of the xctest process after it crashed:

$ vmmap $(pgrep xctest) | grep -i stack
...
Stack                    70000f47f000-70000f501000 [  520K    40K    16K     0K] rw-/rwx SM=COW          thread 1
Stack                    70000f502000-70000f584000 [  520K    88K    80K     0K] rw-/rwx SM=COW          thread 3
Stack                    70000f585000-70000f607000 [  520K   324K   324K     0K] rw-/rwx SM=COW  
Stack                    70000f608000-70000f68a000 [  520K    16K    16K     0K] rw-/rwx SM=COW          thread 4
Stack (reserved)         70000f68b000-70000f70d000 [  520K     0K     0K     0K] rw-/rwx SM=NUL          reserved VM address space (unallocated)
Stack                    7ff7be516000-7ff7bed16000 [ 8192K    96K    96K     0K] rw-/rwx SM=COW          thread 0

Notes:

thread 4 is the thread that crashed.
thread 0 is the main thread, with an 8M stack.
there is one line without a thread name. I suppose it's thread 2. Not sure why it used so much stack space (the thread has a different name than 'cooperative thread', but I forgot the details).
the output varied from run to run, but the crashed thread's stack usage never exceeded 16K.

Memory metrics are known to be hard to interpret. But if my understanding is correct, the above output showed that there was no stack size issue when the crash occurred.
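For anyone who wants to double-check the numbers in the vmmap output above, here is a minimal sketch that recomputes a region's size from its address range. `StackRegion` is a hypothetical helper written for this post, not part of vmmap or any Apple API, and it assumes the range is two hex addresses separated by a dash, with the end address exclusive.

```swift
/// Hypothetical helper: one "start-end" address range copied from vmmap output.
struct StackRegion {
    let start: UInt64
    let end: UInt64  // assumed to be exclusive

    /// Parses a range such as "70000f608000-70000f68a000" (hex, no 0x prefix).
    init?(_ range: String) {
        let parts = range.split(separator: "-")
        guard parts.count == 2,
              let start = UInt64(parts[0], radix: 16),
              let end = UInt64(parts[1], radix: 16),
              start < end
        else { return nil }
        self.start = start
        self.end = end
    }

    /// Size of the region in KiB, the unit vmmap prints as "K".
    var sizeInKiB: UInt64 { (end - start) / 1024 }
}

// Ranges copied from the output above.
let crashed = StackRegion("70000f608000-70000f68a000")!  // thread 4
let mainStack = StackRegion("7ff7be516000-7ff7bed16000")!  // thread 0
print(crashed.sizeInKiB)    // 520
print(mainStack.sizeInKiB)  // 8192
```

The computed sizes match the first column vmmap prints (520K and 8192K), so that column appears to be the size of the whole region rather than the amount in use.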

Then I started to think about a workaround. The crash always occurred in one specific part of the code. I didn't pay much attention to that at first because I thought a memory error could cause a crash anywhere. The code is a custom iterator which iterates over multiple arrays at the same time and chooses one at a time based on rules. I have looked at the code many times over the past two days but really can't find any issue in it (it worked fine on my phone anyway). I also found the following in the official documentation, and I don't think I did anything wrong in my code:

Using Multiple Iterators

Obtain each separate iterator from separate calls to the sequence's makeIterator() method rather than by copying. Copying an iterator is safe, but advancing one copy of an iterator by calling its next() method may invalidate other copies of that iterator. for-in loops are safe in this regard.
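To make the advice above concrete, here is a minimal sketch of an iterator in this style. It is a hypothetical example, not the code that crashed: `MergingIterator` and its selection rule (pick the smaller head of two sorted `Int` arrays) are assumptions made for illustration. Each underlying iterator comes from its own `makeIterator()` call rather than from a copy.

```swift
/// Hypothetical example type: walks two sorted arrays at once and
/// yields the smaller head element each time.
struct MergingIterator: IteratorProtocol {
    private var left: IndexingIterator<[Int]>
    private var right: IndexingIterator<[Int]>
    // One-element lookahead buffers, since IteratorProtocol has no peek().
    private var leftHead: Int?
    private var rightHead: Int?

    init(_ a: [Int], _ b: [Int]) {
        // Separate makeIterator() calls, as the documentation recommends.
        var l = a.makeIterator()
        var r = b.makeIterator()
        leftHead = l.next()
        rightHead = r.next()
        left = l
        right = r
    }

    mutating func next() -> Int? {
        switch (leftHead, rightHead) {
        case let (l?, r?) where l <= r:
            leftHead = left.next()
            return l
        case let (_, r?):
            // Right head is smaller, or the left side is exhausted.
            rightHead = right.next()
            return r
        case let (l?, nil):
            leftHead = left.next()
            return l
        case (nil, nil):
            return nil
        }
    }
}

var iterator = MergingIterator([1, 4, 9], [2, 3, 10])
var merged: [Int] = []
while let value = iterator.next() {
    merged.append(value)
}
print(merged)  // [1, 2, 3, 4, 9, 10]
```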

I ended up removing the custom iterator and the crash was gone (no idea why).

FWIW, these are the details of the crash.

The crash occurs at this instruction in most cases:

-> 0x10013abeb299 <+41>: callq 0x10013a930da0 ; type metadata accessor for acdbCN.SomeDailyCEW at

The 0x10013a930da0 is a valid address (I can disassemble it).

The corresponding line in the source code is just an assignment (both sides are local variables of value type) that should be impossible to fail.

The topmost frame of the stack trace is something like:

outlined assign with take of SomeDailyCEW?

SomeDailyCEW is a type in my code. I think it corresponds to the assembly code above.

The error message is something like:

Thread 7: EXC_BAD_ACCESS (code=2, address=0x70000be7cfb0)

The 0x70000be7cfb0 is a mystery. It has nothing to do with the above addresses (neither the current address nor the address to jump to).

Without knowledge of compiler internals and LLVM, this is the best I could gather.

"We don't solve problems. We survive them." :)