@tera, @stackotter, thanks for the suggestion. creduce sounds like an interesting tool. I'm not sure if it's helpful in this case, but I hope I'll find a use case in future and figure out how it works.
Just FYI, I made some progress in investigating the crash. below are the details.
First I want to explain, while I said the issue can be reproduced consistently, I didn't mean the behavior is fixed, it's actually dynamic. The async func in the test calls an API of my library. The API takes raw data in and returns processed data. The raw data are processed in multiple rounds. Each round contains many steps and in one step I use for
loop. That's where the custom iterator comes in. While the crash always occurred in the custom iterator initialization code, it's random in which round the crash occurred. That is, the iterator worked fine in some rounds and then it crashed. This is what I mean by 'dynamic'.
The issue isn't in the logic of the custom iterator code (BTW, I didn't show the code because it isn't implemented as a general utility. It depends on other code in my library. I could write a simplified standalone version. I didn't do it because I doubted it was that part of the code that caused the issue). The issue is in an enum variable assignment code in the custom iterator code. Please read on.
It's a stack overflow
See log below. The address that caused the crash is 0x700008cbe058.
(lldb) thread backtrace
* thread #4, queue = 'com.apple.root.default-qos.cooperative', stop reason = EXC_BAD_ACCESS (code=1, address=0x700008cbe058)
* frame #0: sp=0x0000700008cbe060 fp=0x0000700008d40b20 pc=0x0000000139b853e9 acdbCNTests`outlined assign with take of SomeDailyCEW? + 41 at <compiler-generated>:0
frame #1: sp=0x0000700008d40b30 fp=0x0000700008d422a0 pc=0x000000013a92b052 acdbCNTests`DDCEWGroupViewIterator.consumeNext(self=acdbCN.DDCEWGroupViewIterator @ 0x00007f84d3022810) + 642 at cewViewMapT+iterator.swift:156
frame #2: sp=0x0000700008d422b0 fp=0x0000700008d422b0 pc=0x000000013aacc6a9 acdbCNTests`protocol witness for CEWViewIteratorP.consumeNext() in conformance DDCEWGroupViewIterator + 9 at <compiler-generated>:0
...
From vmmap output, the valid address of that thread's stack is in 700008cc5000-700008d47000. The above address is outside the range.
Stack 700008bbf000-700008c41000 [ 520K 40K 8K 0K] rw-/rwx SM=COW thread 1
Stack 700008cc5000-700008d47000 [ 520K 184K 184K 0K] rw-/rwx SM=COW
Stack 700008d48000-700008dca000 [ 520K 16K 16K 0K] rw-/rwx SM=COW thread 3
Stack 700008dcb000-700008e4d000 [ 520K 16K 16K 0K] rw-/rwx SM=COW thread 4
Stack 7ff7ba447000-7ff7bac47000 [ 8192K 96K 96K 0K] rw-/rwx SM=COW thread 0
This diagram summarized the the relation of those addresses. Note that frame #0 has a very large size.

The assembly code of frame #0
Out of curiousity, I looked at the assembly code and managed to add some annotations (those lines starting with #
).
acdbCNTests`outlined assign with take of SomeDailyCEW?:
# Save fp
0x139b853c0 <+0>: pushq %rbp
# Copy sp to fp
0x139b853c1 <+1>: movq %rsp, %rbp
0x139b853c4 <+4>: pushq %r14
0x139b853c6 <+6>: pushq %rbx
# Increase stack size by subtracting a constant from sp:
# sp's origal value can be read from fp. It's 0x0000700008d40b20
# so sp's new value is: 0x0000700008d40b20 - 0x82ab0 = 0x700008cbe070
# the calculated value has a small difference with the actual value
# (0x0000700008cbe060). Not sure why (maybe alignment?).
0x139b853c7 <+7>: subq $0x82ab0, %rsp ; imm = 0x82AB0
0x139b853ce <+14>: movq %rdi, -0x50(%rbp)
0x139b853d2 <+18>: movq %rsi, -0x48(%rbp)
0x139b853d6 <+22>: movq %rsi, %rax
0x139b853d9 <+25>: movq %rax, -0x40(%rbp)
0x139b853dd <+29>: movq %rdi, -0x38(%rbp)
0x139b853e1 <+33>: movq %rsi, -0x30(%rbp)
0x139b853e5 <+37>: xorl %eax, %eax
0x139b853e7 <+39>: movl %eax, %edi
# This caused crash, because the sp adress is invalid.
-> 0x139b853e9 <+41>: callq 0x1398caef0 ; type metadata accessor for acdbCN.SomeDailyCEW at <compiler-generated>
The mysterious piece of code:
0x139b853c7 <+7>: subq $0x82ab0, %rsp ; imm = 0x82AB0
$0x82ab0 is larger than 512k, so no doubt the code crashed.
As I explained earlier, my code worked in multiple rounds and the crash occurred after some rounds. So it's likely that those earlier rounds called a (or multiple?) different version of the above code, which used a smaller constant. How this exactly works under the hood is a mystery to me.
My code uses value of large size (though I wrapped them in arrays and dictionaries). The above code is called when I set an enum variable, which has associated value of large size. I suspect this may cause the crash.
That is all. There are a few things I'm thinking to do:
-
Remove the enum in my iterator code and see it makes a difference. I haven't thought out a simple way to do it though.
-
Test my workaroud code in which I remove all the iterator code. And use vmmap to monitor its stack usage. If it's always below, say, 20K, then it's not a workaround, it's a fix :)
-
Write a test to assign an enum variable with a large array or dictinary and see if I can reproduce the crash.