PR: Fix async let teardown ordering crash (#81771)
Hi all,
I've been working on a fix for the async let teardown ordering crash reported in swiftlang/swift#81771, which has been affecting production apps since Swift 6.1. The crash manifests as "freed pointer was not the last allocation" in swift_task_dealloc during asyncLet_finish_after_task_completion.
Thanks to the excellent investigation by @Mike_Ash (who identified the LIFO violation at the assembly level), @jamieQ (who produced a minimal reproducer and confirmed the issue extends to large return types on macOS 13+), and @John_McCall (whose PR #84528 modeled async let operations as SIL builtins), I was able to identify the remaining pieces needed to complete the fix.
The root cause spans two levels:
- SIL level: The StackNesting pass did not recognize StartAsyncLetWithLocalBuffer/FinishAsyncLet as stack allocation/deallocation operations, so it could not preserve their ordering.
- LLVM IR level: The swift_task_alloc/swift_task_dealloc runtime functions were annotated with ArgMemOnly, which is semantically incorrect: they also modify the task allocator's internal slab state. This allowed LLVM's SimplifyCFG (inside the CoroCleanup pass) to move swift_task_dealloc calls across suspension boundaries.
Diagnosis
Building on @Mike_Ash's assembly-level analysis showing the incorrect finish/finish/dealloc/dealloc ordering, I traced the problem through each compilation stage:
SIL (-emit-sil): Correct. The canonical SIL output already has proper LIFO ordering:
finishAsyncLet(alet2) → dealloc_stack(closure2) → finishAsyncLet(alet1) → dealloc_stack(closure1)
Pre-CoroSplit LLVM IR (-emit-irgen): Correct. IRGen emits the swift_task_dealloc calls in the right positions relative to @llvm.coro.suspend.async intrinsics. This ruled out an IRGen emission bug.
Post-CoroSplit LLVM IR (-emit-ir): Incorrect. After LLVM's CoroSplit pass splits the coroutine at each suspension point into separate continuation functions (named with …TY<n>_ suffixes in the mangled symbol), both swift_task_dealloc calls end up batched together in the final continuation:
Continuation …TY3_ : swift_asyncLet_finish(alet2) → suspends
Continuation …TY5_ : swift_asyncLet_finish(alet1) → suspends
    ↑ MISSING: swift_task_dealloc(closure2) should be HERE
Continuation …TY7_ : swift_task_dealloc(closure2)
                     swift_task_dealloc(closure1) → return
    ↑ WRONG: closure2's dealloc was moved past a suspension boundary
The culprit is SimplifyCFG running inside LLVM's CoroCleanup pass (which runs even at -Onone). Because swift_task_dealloc was declared with ArgMemOnly (telling LLVM it only touches memory reachable from its arguments), the optimizer was free to merge basic blocks and move the dealloc across the suspension boundary into the final continuation.
The Fix
The fix has two parts, addressing the SIL and LLVM IR levels respectively.
Part 1: Teach StackNesting about async let stack operations (SILInstruction.cpp)
This completes the work started in @John_McCall's PR #84528, which modeled finishAsyncLet and startAsyncLetWithLocalBuffer as SIL builtins but explicitly deferred the StackNesting integration:
"Modeling finishAsyncLet as a builtin is necessary because we need to model its stack-allocation behavior, although I'm not yet doing that in this patch because StackNesting first needs to be taught to not try to move the deallocation."
The changes:
- isAllocatingStack(): Register StartAsyncLetWithLocalBuffer as a stack allocation (resolving a pre-existing FIXME).
- isDeallocatingStack(): Register FinishAsyncLet as a stack deallocation (resolving another FIXME).
- isStackAllocationNested(): Return StackAllocationIsNotNested for StartAsyncLetWithLocalBuffer. This is the critical piece: because FinishAsyncLet is an async operation that suspends, its alloc/dealloc pair cannot follow the synchronous LIFO nesting that StackNesting normally enforces. Marking it as non-nested tells the pass to leave it alone.
Part 2: Fix LLVM memory effect annotations (RuntimeFunctions.def + IRGenModule.cpp)
Changed swift_task_alloc, swift_task_dealloc, and swift_task_dealloc_through from MEMEFFECTS(ArgMemOnly) to MEMEFFECTS(InaccessibleOrArgMemOnly).
ArgMemOnly was semantically incorrect: these functions modify the task allocator's internal slab state (bump pointer, free list), which is memory not reachable through any pointer argument. With ArgMemOnly, LLVM believed these calls had no side effects beyond their arguments, which gave SimplifyCFG (inside CoroCleanup) license to merge basic blocks and move deallocs past suspension boundaries.
InaccessibleOrArgMemOnly tells LLVM: "this function accesses argument memory AND may also access opaque internal state." This prevents the optimizer from reordering these calls across other memory operations or suspension boundaries.
Why both parts are needed
Part 1 alone is insufficient. Even with correct SIL ordering, LLVM's SimplifyCFG (inside CoroCleanup) can still move swift_task_dealloc across suspension boundaries if the ArgMemOnly annotation remains, because the annotation tells LLVM these calls are independent when their argument pointers differ.
Part 2 alone is insufficient. Without Part 1, the StackNesting pass does not recognize StartAsyncLetWithLocalBuffer / FinishAsyncLet as an alloc/dealloc pair. The pass could attempt to "fix" what it perceives as a nesting violation by inserting or moving deallocation instructions, potentially breaking the async let teardown sequence.
Design Rationale
"Does this fix both pre-macOS 13 and macOS 13+ paths?"
@Mike_Ash explained that the old-style AsyncLet (pre-macOS 13) always allocates on the async stack, while the new-style AsyncLetWithBuffer (macOS 13+) only does so when the preallocated buffer is insufficient. @jamieQ confirmed the crash occurs on macOS 13+ too, when the return type is large enough to overflow the buffer.
The fix operates at the compiler level: it corrects the ordering of emitted deallocation calls, regardless of whether the runtime ultimately allocates on the async stack. The LIFO ordering is now guaranteed in the generated code for both paths.
"Is StackAllocationIsNotNested the right approach?"
StackAllocationIsNotNested means the StackNesting pass will not attempt to move or insert deallocation code for this allocation. This is the correct semantics because FinishAsyncLet is an async operation: it suspends, and its deallocation occurs within the runtime, across a suspension point. The StackNesting pass, which operates on synchronous LIFO nesting, cannot and should not manage this.
All other stack-allocating instructions (AllocPackInst, AllocPackMetadataInst, on-stack AllocRefInstBase, on-stack PartialApplyInst, callee-allocated BeginApplyInst, StackAlloc/UnprotectedStackAlloc builtins) continue to return StackAllocationIsNested because they follow synchronous LIFO discipline and their deallocations do not cross suspension points.
"Why InaccessibleOrArgMemOnly instead of UNKNOWN_MEMEFFECTS?"
In commit 01c2e0fde95, @Michael_Gottesman changed AsyncLetBegin and AsyncLetFinish from ArgMemOnly to UNKNOWN_MEMEFFECTS, noting:
"In a discussion with @John_McCall, we agreed that these should really be UNKNOWN_MEMEFFECTS so we are conservative."
For swift_task_alloc/swift_task_dealloc/swift_task_dealloc_through, I chose InaccessibleOrArgMemOnly instead of UNKNOWN_MEMEFFECTS because:
- It's semantically precise. These functions access argument memory (the allocated/freed block) and inaccessible allocator state (the slab linked list). They do not access globals, escaped heap, or other module-visible memory. UNKNOWN_MEMEFFECTS would overstate their effects.
- It follows LLVM's canonical convention for allocators. LLVM itself annotates malloc as InaccessibleMemOnly and free as InaccessibleOrArgMemOnly (BuildLibCalls.cpp). swift_task_alloc/swift_task_dealloc have identical semantics.
- It preserves optimization opportunities. InaccessibleOrArgMemOnly still allows LLVM to optimize around these calls with respect to globals and other accessible memory, something UNKNOWN_MEMEFFECTS would prohibit entirely.
That said, AsyncLetBegin/AsyncLetFinish are more complex operations (spawning tasks, invoking closures) that genuinely may touch arbitrary memory. UNKNOWN_MEMEFFECTS is appropriate for them. The task alloc/dealloc functions are simpler bump-pointer operations with a well-defined memory footprint, so the more precise annotation is warranted.
"Is InaccessibleOrArgMemOnly too conservative? Could it regress performance?"
The change from ArgMemOnly to InaccessibleOrArgMemOnly has minimal optimization impact:
- Alias query precision is unchanged. LLVM's BasicAliasAnalysis strips InaccessibleMem when answering pointer-based queries, so all pointer alias results remain the same.
- The practical effect is that two swift_task_dealloc calls to different pointers now have a memory dependency through InaccessibleMem, preventing reordering. This is the correctness constraint that was missing.
- No benchmarks exercise this path. The Swift benchmark suite has zero tests that directly measure swift_task_alloc/swift_task_dealloc performance.
"Was ArgMemOnly a deliberate design choice?"
No. ArgMemOnly was introduced by @nate_chandler in commit 607772aaa21 (Sep 2020) as part of a stub commit ("Stubbed entry points for task de/alloc."); no reasoning was provided. It was then mechanically preserved through every subsequent refactoring:
- Jul 2023: Evan Wilde migrated it from ATTRS(ArgMemOnly) to MEMEFFECTS(ArgMemOnly) during the memory effects macro introduction.
- Feb 2025: Nate Chandler added swift_task_dealloc_through with the same ArgMemOnly (as part of [CoroutineAccessors] Use retcon.once variant.).
The broader pattern of ArgMemOnly being too optimistic for concurrency functions has already been corrected in two prior instances:
| Date | Author | Function | Fix |
|---|---|---|---|
| Jun 2025 | @ktoso | swift_task_create | ArgMemOnly → ReadOnly: "llvm would sometimes wrongly assume there's no indirect accesses and the optimizations can lead to a runtime crash" |
| Nov 2025 | @Michael_Gottesman + @John_McCall | AsyncLetBegin, AsyncLetFinish | ArgMemOnly → UNKNOWN_MEMEFFECTS |
The task alloc/dealloc functions simply fell through the cracks during those corrections.
Known Limitations
Other runtime functions may need the same audit
This fix only corrects swift_task_alloc, swift_task_dealloc, and swift_task_dealloc_through. Several other runtime functions appear to have the same under-specification:
| Function | Issue |
|---|---|
| swift_task_cancel | Modifies task internal state beyond argument memory |
| swift_task_getCurrentExecutor | Takes zero arguments; ArgMemOnly with no args equals ReadNone |
| swift_task_getMainExecutor | Same as above; reads global state but annotation says no memory access |
| swift_autoDiffCreateLinearMapContextWithType | Calls malloc() internally |
| swift_autoDiffAllocateSubcontextWithType | Calls allocator.Allocate() internally |
These are beyond the scope of this fix but may warrant a follow-up audit.
swift_task_alloc annotation is conservative
LLVM annotates malloc with InaccessibleMemOnly (not InaccessibleOrArgMemOnly) since the returned pointer is brand-new and not yet accessible to the module. swift_task_alloc has the same semantics and could arguably use the stricter InaccessibleMemOnly. The current InaccessibleOrArgMemOnly is a correct superset.
StackNesting does not verify LIFO ordering for async let allocations (pre-existing)
This is a pre-existing limitation, not one introduced by this fix. Before this fix (and before PR #84528), isAllocatingStack() did not include StartAsyncLetWithLocalBuffer at all, so StackNesting was already completely blind to async let allocations by omission.
This fix makes that exclusion explicit by returning StackAllocationIsNotNested, which is actually strictly better than the prior state: now the allocation IS recognized by isAllocatingStack() and isDeallocatingStack(), enabling the FlowSensitiveVerifier to check joint post-dominance (that a deallocation exists on every path). However, the verifier does not check LIFO ordering for non-nested allocations.
LIFO correctness currently relies on SILGen's cleanup stack emitting FinishAsyncLet in reverse declaration order. This invariant holds today, but there is no compiler-level safety net if a future SILGen change breaks it.
Related Issues
This fix likely addresses the same root cause behind:
- #75501: "Runtime crash: freed pointer was not the last allocation"
- #87481: "freed pointer was not the last allocation crash in asyncLet_finish_after_task_completion" (Stripe iOS SDK, filed Feb 25, 2026)
Feedback and review welcome. Happy to provide additional IR dumps or test results.