My unit test keeps crashing because a struct that conforms to Equatable has an outlined retain and it was released

sunlexy · June 20, 2023, 10:37am

Hey Folks

All I did was add an optional Int to EventModel.AppliedFilters, and it causes a chain of crashes on my CI in different tests across different files.

Xcode 14.3, Swift 5.8

I keep getting crashes on my CI. Sometimes it crashes locally too, but the crash always disappears once I turn on Address Sanitizer.

Sometimes I manage to mitigate the local crash by reducing the number of properties in this struct, or by grouping some of them into a nested struct. But after I push to CI, the crash continues.

I used Godbolt to check the assembly, and there are clear differences between Swift 5.8 and today's nightly.
Compare Swift 5.8 with the latest nightly Swift: in the new assembly, the outlined retain is gone...
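One way to look at the code the compiler generates for a struct like this is to paste a reduced version into Godbolt, build with `-O`, and compare compiler versions. The sketch below is hypothetical: the real `EventModel.AppliedFilters` is only in the gist, so every field name here is invented. It just mixes reference-counted `String` fields with an optional `Int` and compares two values through the synthesized `==`; it is not a reproducer of the crash.

```swift
// Hypothetical stand-in for EventModel.AppliedFilters; every field name is invented.
enum EventModel {
    struct AppliedFilters: Equatable {
        var category: String
        var location: String
        var sortOrder: String
        var searchText: String
        var pageSize: Int?  // stands in for the newly added optional Int
    }
}

// Build with -O and inspect the assembly of this function to see how the
// synthesized == copies and compares the fields (including any outlined helpers).
func filtersChanged(_ old: EventModel.AppliedFilters,
                    _ new: EventModel.AppliedFilters) -> Bool {
    old != new
}

let a = EventModel.AppliedFilters(category: "music", location: "London",
                                  sortOrder: "date", searchText: "",
                                  pageSize: nil)
var b = a
b.pageSize = 20
print(filtersChanged(a, b))  // true
```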

Code gist.
In the crash logs (log1, log2), some module names are obfuscated. The crashes happen fairly randomly in different structs.

I have checked many links, but none of them point me to a solution. Help, please.

https://developer.apple.com/forums/thread/115715
https://developer.apple.com/forums/thread/698391

github.com/apple/swift

[SR-15215] Excessive outlined init with takes when using optionals

Opened 10:01AM, 19 Sep 21 UTC by karwa · Labels: bug, compiler

Previous ID: SR-15215 · Radar: rdar://problem/83297558 · Original Reporter: @karwa · Type: Bug

Environment: Swift version 5.6-dev (LLVM 70b82c8b83f6294, Swift a66435c61f9a895), Target: x86_64-unknown-linux-gnu

Additional Detail from JIRA: Votes: 1 · Component/s: Compiler · Labels: Bug · Assignee: None · Priority: Medium

**Issue Description:**

Consider the following code:

```swift
func test(_ input: HasAnEnum) -> Int {
    input.readValue()
}

enum MyEnum {
    case one
    case four
}

struct HasAnEnum {
    var s1: String
    var s2: String
    var s3: String
    var s4: String
    var s5: String
    var s6: String
    var s7: String
    var s8: String
    var s9: String
    var s10: String
    var value: MyEnum

    func readValue() -> Int {
        let x = value
        if case .four = x { return 4 }
        return -1
    }
}
```

In an optimised build, this generates the following assembly ([Godbolt](https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:1,lang:swift,selection:(endColumn:54,endLineNumber:21,positionColumn:54,positionLineNumber:21,selectionStartColumn:54,selectionStartLineNumber:21,startColumn:54,startLineNumber:21),source:'func+test(_+input:+HasAnEnum)+-%3E+Int+%7B%0A++++input.readValue()%0A%7D%0A%0Aenum+MyEnum+%7B%0A++++case+one%0A++++case+four%0A%7D%0A%0Astruct+HasAnEnum+%7B%0A++++var+s1:+String%0A++++var+s2:+String%0A++++var+s3:+String%0A++++var+s4:+String%0A++++var+s5:+String%0A++++var+s6:+String%0A++++var+s7:+String%0A++++var+s8:+String%0A++++var+s9:+String%0A++++var+s10:+String%0A++++var+value:+MyEnum+//+Try+making+this+an+optional.%0A%0A++++func+readValue()+-%3E+Int+%7B%0A++++++++let+x+%3D+value%0A++++++++if+case+.four+%3D+x+%7B+return+4+%7D%0A++++++++return+-1%0A++++%7D%0A%7D'),l:'5',n:'0',o:'Swift+source+%231',t:'0')),k:48.13751087902524,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:swiftnightly,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:swift,libs:!(),options:'-O',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1,tree:'1'),l:'5',n:'0',o:'x86-64+swiftc+nightly+(Swift,+Editor+%231,+Compiler+%231)',t:'0')),k:51.862489120974764,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4)):

```
output.test(output.HasAnEnum) -> Swift.Int:
    movzx eax, byte ptr [rdi + 160]
    lea rax, [rax + 4*rax]
    add rax, -1
    ret
```

Short, sweet. Just what you'd expect, right? Now, let's add one character and watch the world go up in flames:

```swift
struct HasAnEnum {
    // ...
    var value: MyEnum? // <- Make this an optional
}
```

```
output.test(output.HasAnEnum) -> Swift.Int:
    push rbx
    sub rsp, 16
    add rdi, 160
    lea rbx, [rsp + 8]
    mov rsi, rbx
    call (outlined init with take of output.MyEnum?)
    mov rsi, rsp
    mov rdi, rbx
    call (outlined init with take of output.MyEnum?)
    movzx eax, byte ptr [rsp]
    mov ecx, eax
    and ecx, 1
    cmp rax, 2
    lea rcx, [rcx + 4*rcx - 1]
    mov rax, -1
    cmovne rax, rcx
    add rsp, 16
    pop rbx
    ret
```

Boom. Now we're calling a bunch of runtime functions, and this code suddenly performs much worse while also being much larger. In my code, this is a major performance issue. It dominates almost everything else we do... just to load a stored optional enum.

[Screenshot: profiler trace]

(This trace is from a function which writes a normalized URL string using scanned ranges from an input string. As you can see, the only thing more expensive than this is parsing/writing the path string, which has to simplify arbitrarily long and complex paths and percent-encode them. That's how expensive it is). In fact, >8% of the time spent writing the URL string is the load of this one optional. Wild.

https://developer.apple.com/forums/thread/711393

sunlexy · June 20, 2023, 2:14pm

Here are all the threads at the time of the crash.

I rarely see anyone online talking about CG Timer. Is it related to garbage collection?

tera · June 20, 2023, 4:50pm

The gist has some data structures but no actual code (other than the memberwise inits). If you have a small app that reproduces the crash, it's worth showing; otherwise it's hard to decipher what you are talking about.

sunlexy · June 20, 2023, 6:15pm

I'm struggling to find something small that is representative of the problem, and I'm fully aware of the limitations of this snippet. I'll post one once I know how to reproduce it. For now, I'm still trying to find out why and how it happens. But the assembly I posted seems to show why I'm getting the crash: in Swift 5.8 there are pairs of retains and releases when the filter is set, while in the new nightly build they are no longer there. I'm trying to understand this so that I can maybe find a small example that reproduces it.

Joe_Groff · June 20, 2023, 6:22pm

A crash in retain/release related functions is often indicative of memory corruption occurring, which could be because of bugs in unsafe code, data races, or compiler/runtime bugs. If you haven't yet, building with Address Sanitizer and Thread Sanitizer enabled may catch the source of the corruption closer to where it originally occurred.
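Data races are one of the sources listed above. As a minimal sketch (the `FilterStore` type is hypothetical, not from the thread), this is the usual shape of guarding shared mutable state with a lock; removing the `lock`/`unlock` calls turns the concurrent appends into exactly the kind of race Thread Sanitizer is designed to report.

```swift
import Dispatch
import Foundation

// Hypothetical shared store mutated from many threads. Every access goes
// through `lock`; without it, the concurrent appends below would be a data
// race on the array, one way memory can end up corrupted.
final class FilterStore: @unchecked Sendable {
    private let lock = NSLock()
    private var filters: [String] = []

    func append(_ filter: String) {
        lock.lock()
        defer { lock.unlock() }
        filters.append(filter)
    }

    var count: Int {
        lock.lock()
        defer { lock.unlock() }
        return filters.count
    }
}

let store = FilterStore()
DispatchQueue.concurrentPerform(iterations: 1_000) { i in
    store.append("filter-\(i)")
}
print(store.count)  // 1000
```

`@unchecked Sendable` tells the compiler that the lock, not the type system, is what makes sharing the instance safe.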

sunlexy · June 20, 2023, 6:37pm

Thanks, Joe. I'm going to keep testing. So far, every time I turn on ASan the crash disappears on my local machine, while our CI always catches it. (Does the MacStadium virtual machine have a much more limited environment?) I haven't tried TSan yet; I thought it might be irrelevant because this is very likely memory corruption. But I'll check now.
I thought structs were simply copied when passed around. My case here contradicts this assumption.
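A struct is copied by value, but only its inline bits are copied: a field that holds a class reference (and, similarly, the heap buffers behind non-small `String` and `Array` values) is shared between the copies, and each copy retains it. That is why copying or comparing a struct can involve retain/release calls such as the outlined retain in the assembly. A minimal sketch with invented type names:

```swift
// Hypothetical class: instances live on the heap and are reference-counted.
final class Storage {
    var hits = 0
}

struct Filters {
    var limit: Int?        // plain value: copied bit for bit
    var storage: Storage   // reference: copies of the struct share one object
}

let original = Filters(limit: 10, storage: Storage())
var copy = original        // copies `limit`, retains (does not clone) `storage`
copy.limit = 20            // only the copy changes
copy.storage.hits += 1     // visible through both values

print(original.limit ?? -1, copy.limit ?? -1)  // 10 20
print(original.storage === copy.storage)       // true
print(original.storage.hits)                   // 1
```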

sunlexy · June 22, 2023, 9:51am

No luck with TSan after all: it behaved the same as with ASan, and the crash disappeared.

sunlexy · June 23, 2023, 8:57am

So, I fixed it by renaming my struct from AppliedFilters to FilterAplied... I do not know why.

sunlexy · July 24, 2023, 3:44pm

New crash log:

```
libswiftCore.dylib`swift::RefCounts<swift::RefCountBitsT<(swift::RefCountInlinedness)1> >::incrementSlow:
0x18bfbb494 <+0>: stp x20, x19, [sp, #-0x20]!
0x18bfbb498 <+4>: stp x29, x30, [sp, #0x10]
0x18bfbb49c <+8>: add x29, sp, #0x10
0x18bfbb4a0 <+12>: mov x19, x0
0x18bfbb4a4 <+16>: cmn w1, #0x1
0x18bfbb4a8 <+20>: b.eq 0x18bfbb4d4 ; <+64>
0x18bfbb4ac <+24>: tbz x1, #0x3f, 0x18bfbb558 ; <+196>
0x18bfbb4b0 <+28>: mov x3, x2
0x18bfbb4b4 <+32>: lsl x8, x1, #3
-> 0x18bfbb4b8 <+36>: ldp x6, x5, [x8, #0x10]
0x18bfbb4bc <+40>: cmp w2, #0x1
0x18bfbb4c0 <+44>: b.eq 0x18bfbb4e4 ; <+80>
0x18bfbb4c4 <+48>: and x9, x6, #0x80000000ffffffff
0x18bfbb4c8 <+52>: mov x10, #-0x7fffffff00000001
0x18bfbb4cc <+56>: cmp x9, x10
0x18bfbb4d0 <+60>: b.ne 0x18bfbb4e4 ; <+80>
0x18bfbb4d4 <+64>: sub x0, x19, #0x8
0x18bfbb4d8 <+68>: ldp x29, x30, [sp, #0x10]
0x18bfbb4dc <+72>: ldp x20, x19, [sp], #0x20
0x18bfbb4e0 <+76>: ret
0x18bfbb4e4 <+80>: add x0, x8, #0x10
0x18bfbb4e8 <+84>: lsr x9, x5, #32
0x18bfbb4ec <+88>: lsl x8, x3, #33
0x18bfbb4f0 <+92>: adds x10, x8, x6
0x18bfbb4f4 <+96>: b.mi 0x18bfbb53c ; <+168>
0x18bfbb4f8 <+100>: mov w11, w5
0x18bfbb4fc <+104>: mov x7, x11
0x18bfbb500 <+108>: bfi x7, x9, #32, #32
0x18bfbb504 <+112>: mov x4, x6
0x18bfbb508 <+116>: mov x5, x7
0x18bfbb50c <+120>: casp x4, x5, x10, x11, [x0]
0x18bfbb510 <+124>: eor x9, x5, x7
0x18bfbb514 <+128>: eor x10, x4, x6
0x18bfbb518 <+132>: orr x9, x10, x9
0x18bfbb51c <+136>: cbz x9, 0x18bfbb4d4 ; <+64>
0x18bfbb520 <+140>: lsr x12, x5, #32
0x18bfbb524 <+144>: mov x6, x4
0x18bfbb528 <+148>: mov x9, x12
0x18bfbb52c <+152>: adds x10, x8, x4
0x18bfbb530 <+156>: b.pl 0x18bfbb4f8 ; <+100>
0x18bfbb534 <+160>: mov x9, x12
0x18bfbb538 <+164>: mov x6, x4
0x18bfbb53c <+168>: cmn w6, #0x1
0x18bfbb540 <+172>: b.eq 0x18bfbb4d4 ; <+64>
0x18bfbb544 <+176>: mov w2, w5
0x18bfbb548 <+180>: bfi x2, x9, #32, #32
0x18bfbb54c <+184>: mov x1, x6
0x18bfbb550 <+188>: bl 0x18bfbb55c ; swift::RefCounts<swift::SideTableRefCountBits>::incrementSlow(swift::SideTableRefCountBits, unsigned int)
0x18bfbb554 <+192>: b 0x18bfbb4d4 ; <+64>
0x18bfbb558 <+196>: bl 0x18bf825d0 ; swift::swift_abortRetainOverflow()
```

Joe_Groff · July 24, 2023, 3:49pm

I wonder if that indicates a bug in the compiler with incremental recompilation. Does the crash also go away if you do a clean rebuild?

sunlexy · July 24, 2023, 4:12pm

So the crash is gone.

I had assumed it might be because CI has very limited resources, but the crash was reproducible locally too, and I can now reproduce it on my machine. So it does seem that something is wrong with the incremental build.

CI is supposed to do a clean build, but maybe it is not so clean after all. In past runs, I was only pressing Cmd+Shift+K to clean. Could it be that the module cache is not actually cleaned? The structs involved in the crashes come from another pod. So could it be that, after I updated the pod, the rebuild did not invalidate the module cache when running the unit test target?

sunlexy · July 26, 2023, 10:31am

Another update: on our CI, we get a fresh agent every time a job is triggered, so the incremental-build theory does not solve the problem there. Even so, removing the project's folder inside DerivedData did get rid of my local crash.
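Removing a project's DerivedData folder by hand can also be scripted. This is a minimal sketch, not a tool from the thread: it assumes Xcode's default DerivedData location and its usual `<ProjectName>-<hash>` folder naming, and `MyApp-` is a hypothetical project prefix. The shared module cache typically sits next to those folders (as `ModuleCache.noindex`), so a prefix of `ModuleCache` would target that instead.

```swift
import Foundation

// Deletes every entry directly under `root` whose name starts with `prefix`
// and returns the names it removed, in sorted order.
func removeDerivedData(matching prefix: String, in root: URL) throws -> [String] {
    let fileManager = FileManager.default
    let names = try fileManager.contentsOfDirectory(atPath: root.path).sorted()
    var removed: [String] = []
    for name in names where name.hasPrefix(prefix) {
        try fileManager.removeItem(at: root.appendingPathComponent(name))
        removed.append(name)
    }
    return removed
}

// Xcode's default DerivedData location; a custom location set in Xcode's
// settings or passed via `xcodebuild -derivedDataPath` would live elsewhere.
let derivedData = URL(fileURLWithPath: NSHomeDirectory())
    .appendingPathComponent("Library/Developer/Xcode/DerivedData")

// Example (not run here, because it deletes files):
// let removedNames = try removeDerivedData(matching: "MyApp-", in: derivedData)
```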