Fix for async let teardown ordering crash

PR: Fix async let teardown ordering crash (#81771)

Hi all,

I've been working on a fix for the async let teardown ordering crash reported in swiftlang/swift#81771, which has been affecting production apps since Swift 6.1. The crash manifests as "freed pointer was not the last allocation" in swift_task_dealloc during asyncLet_finish_after_task_completion.

Thanks to the excellent investigation by @Mike_Ash (who identified the LIFO violation at the assembly level), @jamieQ (who produced a minimal reproducer and confirmed the issue extends to large return types on macOS 13+), and @John_McCall (whose PR #84528 modeled async let operations as SIL builtins), I was able to identify the remaining pieces needed to complete the fix.

The root cause spans two levels:

  1. SIL level: The StackNesting pass did not recognize StartAsyncLetWithLocalBuffer / FinishAsyncLet as stack allocation/deallocation operations, so it could not preserve their ordering.
  2. LLVM IR level: The swift_task_alloc / swift_task_dealloc runtime functions were annotated with ArgMemOnly, which is semantically incorrect: they also modify the task allocator's internal slab state. This allowed LLVM's SimplifyCFG (inside the CoroCleanup pass) to move swift_task_dealloc calls across suspension boundaries.

Diagnosis

Building on @Mike_Ash's assembly-level analysis showing the incorrect finish/finish/dealloc/dealloc ordering, I traced the problem through each compilation stage:

SIL (-emit-sil): Correct. The canonical SIL output already has proper LIFO ordering:

finishAsyncLet(alet2) → dealloc_stack(closure2) → finishAsyncLet(alet1) → dealloc_stack(closure1)

Pre-CoroSplit LLVM IR (-emit-irgen): Correct. IRGen emits the swift_task_dealloc calls in the right positions relative to @llvm.coro.suspend.async intrinsics. This ruled out an IRGen emission bug.

Post-CoroSplit LLVM IR (-emit-ir): Incorrect. After LLVM's CoroSplit pass splits the coroutine at each suspension point into separate continuation functions (named with …TY<n>_ suffixes in the mangled symbol), both swift_task_dealloc calls end up batched together in the final continuation:

Continuation …TY3_ : swift_asyncLet_finish(alet2)          → suspends
Continuation …TY5_ : swift_asyncLet_finish(alet1)          → suspends
                      ↑ MISSING: swift_task_dealloc(closure2) should be HERE
Continuation …TY7_ : swift_task_dealloc(closure2)
                      swift_task_dealloc(closure1)          → return
                      ↑ WRONG: closure2's dealloc was moved past a suspension boundary

The culprit is SimplifyCFG running inside LLVM's CoroCleanup pass (which runs even at -Onone). Because swift_task_dealloc was declared with ArgMemOnly, which tells LLVM it only touches memory reachable from its arguments, the optimizer was free to merge basic blocks and move the dealloc across the suspension boundary into the final continuation.
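To see why the post-CoroSplit ordering trips the runtime check, here is a minimal C++ sketch of a strictly LIFO allocator. It is a toy model, not the runtime's implementation: ToyTaskAllocator, firstBadDealloc, and the labels closure1/alet1/closure2/alet2 are hypothetical names, and it assumes each async let finish releases exactly one task-stack allocation made after its closure context.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for the per-task stack allocator (not the real runtime code):
// a deallocation is only valid for the most recent live allocation.
class ToyTaskAllocator {
public:
  void alloc(const std::string &name) { live_.push_back(name); }

  // Returns false where the real runtime would abort with
  // "freed pointer was not the last allocation".
  bool dealloc(const std::string &name) {
    if (live_.empty() || live_.back() != name)
      return false;
    live_.pop_back();
    return true;
  }

private:
  std::vector<std::string> live_;
};

// Replays a teardown order against two async lets and their closure contexts.
// Returns the index of the first invalid deallocation, or -1 if all succeed.
int firstBadDealloc(const std::vector<std::string> &teardown) {
  ToyTaskAllocator allocator;
  // Assumed allocation order, the reverse of the canonical SIL teardown.
  const std::vector<std::string> setup = {"closure1", "alet1", "closure2", "alet2"};
  for (const std::string &name : setup)
    allocator.alloc(name);
  for (std::size_t i = 0; i < teardown.size(); ++i)
    if (!allocator.dealloc(teardown[i]))
      return static_cast<int>(i);
  return -1;
}

void demoTeardownOrders() {
  // Ordering from the canonical SIL: strictly LIFO, so every step succeeds.
  assert(firstBadDealloc({"alet2", "closure2", "alet1", "closure1"}) == -1);
  // Ordering after CoroSplit: finishing alet1 (index 1) finds closure2 on top.
  assert(firstBadDealloc({"alet2", "alet1", "closure2", "closure1"}) == 1);
}
```

In this model the failure surfaces while finishing the first async let rather than at the displaced dealloc itself, which is consistent with the crash being reported inside asyncLet_finish_after_task_completion.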


The Fix

The fix has two parts, addressing the SIL and LLVM IR levels respectively.

Part 1: Teach StackNesting about async let stack operations (SILInstruction.cpp)

This completes the work started in @John_McCall's PR #84528, which modeled finishAsyncLet and startAsyncLetWithLocalBuffer as SIL builtins but explicitly deferred the StackNesting integration:

"Modeling finishAsyncLet as a builtin is necessary because we need to model its stack-allocation behavior, although I'm not yet doing that in this patch because StackNesting first needs to be taught to not try to move the deallocation."

The changes:

  1. isAllocatingStack(): Register StartAsyncLetWithLocalBuffer as a stack allocation (resolving a pre-existing FIXME).
  2. isDeallocatingStack(): Register FinishAsyncLet as a stack deallocation (resolving another FIXME).
  3. isStackAllocationNested(): Return StackAllocationIsNotNested for StartAsyncLetWithLocalBuffer. This is the critical piece: because FinishAsyncLet is an async operation that suspends, its alloc/dealloc pair cannot follow the synchronous LIFO nesting that StackNesting normally enforces. Marking it as non-nested tells the pass to leave it alone.

Part 2: Fix LLVM memory effect annotations (RuntimeFunctions.def + IRGenModule.cpp)

Changed swift_task_alloc, swift_task_dealloc, and swift_task_dealloc_through from MEMEFFECTS(ArgMemOnly) to MEMEFFECTS(InaccessibleOrArgMemOnly).

ArgMemOnly was semantically incorrect: these functions modify the task allocator's internal slab state (bump pointer, free list), which is memory not reachable through any pointer argument. With ArgMemOnly, LLVM believed these calls had no side effects beyond their arguments, which gave SimplifyCFG (inside CoroCleanup) license to merge basic blocks and move deallocs past suspension boundaries.

InaccessibleOrArgMemOnly tells LLVM: "this function accesses argument memory AND may also access opaque internal state." This prevents the optimizer from reordering these calls across other memory operations or suspension boundaries.
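The difference between the two annotations can be illustrated with a small C++ sketch. This is a deliberately simplified model of how an optimizer might decide that two calls are independent; it is not LLVM's alias analysis. ToyCall, mayReorder, and the k-prefixed constants are hypothetical names, and every effect is treated as a read and a write.

```cpp
#include <cassert>

// Toy location classes; the names mirror the annotations discussed above but
// are not LLVM's or Swift's actual data structures.
constexpr unsigned kArgMem = 1u << 0;          // memory reachable via pointer arguments
constexpr unsigned kInaccessibleMem = 1u << 1; // state the module cannot name (allocator internals)

constexpr unsigned kArgMemOnly = kArgMem;
constexpr unsigned kInaccessibleOrArgMemOnly = kArgMem | kInaccessibleMem;

struct ToyCall {
  unsigned effects; // bitmask of the location classes above
  int argObject;    // hypothetical identity of the block the pointer argument names
};

// Returns true if the toy optimizer may treat the two calls as independent.
bool mayReorder(const ToyCall &a, const ToyCall &b) {
  const unsigned shared = a.effects & b.effects;
  if (shared & kInaccessibleMem)
    return false; // both may write the same opaque allocator state
  if ((shared & kArgMem) && a.argObject == b.argObject)
    return false; // both touch the same block
  return true;
}

void demoMemoryEffects() {
  // Old annotation: deallocs of different blocks look independent.
  assert(mayReorder(ToyCall{kArgMemOnly, 1}, ToyCall{kArgMemOnly, 2}));
  // New annotation: the shared inaccessible state orders them.
  assert(!mayReorder(ToyCall{kInaccessibleOrArgMemOnly, 1},
                     ToyCall{kInaccessibleOrArgMemOnly, 2}));
}
```

Under this model, two deallocations of different blocks are only ordered with respect to each other once the annotation admits that both touch shared allocator state.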

Why both parts are needed

Part 1 alone is insufficient. Even with correct SIL ordering, LLVM's SimplifyCFG (inside CoroCleanup) can still move swift_task_dealloc across suspension boundaries if the ArgMemOnly annotation remains, because the annotation tells LLVM these calls are independent when their argument pointers differ.

Part 2 alone is insufficient. Without Part 1, the StackNesting pass does not recognize StartAsyncLetWithLocalBuffer / FinishAsyncLet as an alloc/dealloc pair. The pass could attempt to "fix" what it perceives as a nesting violation by inserting or moving deallocation instructions, potentially breaking the async let teardown sequence.


Design Rationale

"Does this fix both pre-macOS 13 and macOS 13+ paths?"

@Mike_Ash explained that the old-style AsyncLet (pre-macOS 13) always allocates on the async stack, while the new-style AsyncLetWithBuffer (macOS 13+) only does so when the preallocated buffer is insufficient. @jamieQ confirmed the crash occurs on macOS 13+ too, when the return type is large enough to overflow the buffer.

The fix operates at the compiler level: it corrects the ordering of emitted deallocation calls, regardless of whether the runtime ultimately allocates on the async stack. The LIFO ordering is now guaranteed in the generated code for both paths.

"Is StackAllocationIsNotNested the right approach?"

StackAllocationIsNotNested means the StackNesting pass will not attempt to move or insert deallocation code for this allocation. This is the correct semantics because FinishAsyncLet is an async operation: it suspends, and its deallocation occurs within the runtime, across a suspension point. The StackNesting pass, which operates on synchronous LIFO nesting, cannot and should not manage this.

All other stack-allocating instructions (AllocPackInst, AllocPackMetadataInst, on-stack AllocRefInstBase, on-stack PartialApplyInst, callee-allocated BeginApplyInst, StackAlloc/UnprotectedStackAlloc builtins) continue to return StackAllocationIsNested because they follow synchronous LIFO discipline and their deallocations do not cross suspension points.

"Why InaccessibleOrArgMemOnly instead of UNKNOWN_MEMEFFECTS?"

In commit 01c2e0fde95, @Michael_Gottesman changed AsyncLetBegin and AsyncLetFinish from ArgMemOnly to UNKNOWN_MEMEFFECTS, noting:

"In a discussion with @John_McCall, we agreed that these should really be UNKNOWN_MEMEFFECTS so we are conservative."

For swift_task_alloc/swift_task_dealloc/swift_task_dealloc_through, I chose InaccessibleOrArgMemOnly instead of UNKNOWN_MEMEFFECTS because:

  1. It's semantically precise. These functions access argument memory (the allocated/freed block) and inaccessible allocator state (the slab linked list). They do not access globals, escaped heap, or other module-visible memory. UNKNOWN_MEMEFFECTS would overstate their effects.
  2. It follows LLVM's canonical convention for allocators. LLVM itself annotates malloc as InaccessibleMemOnly and free as InaccessibleOrArgMemOnly (BuildLibCalls.cpp). swift_task_alloc/swift_task_dealloc have identical semantics.
  3. It preserves optimization opportunities. InaccessibleOrArgMemOnly still allows LLVM to optimize around these calls with respect to globals and other accessible memory, something UNKNOWN_MEMEFFECTS would prohibit entirely.

That said, AsyncLetBegin/AsyncLetFinish are more complex operations (spawning tasks, invoking closures) that genuinely may touch arbitrary memory. UNKNOWN_MEMEFFECTS is appropriate for them. The task alloc/dealloc functions are simpler bump-pointer operations with a well-defined memory footprint, so the more precise annotation is warranted.

"Is InaccessibleOrArgMemOnly too conservative? Could it regress performance?"

The change from ArgMemOnly to InaccessibleOrArgMemOnly has minimal optimization impact:

  • Alias query precision is unchanged. LLVM's BasicAliasAnalysis strips InaccessibleMem when answering pointer-based queries, so all pointer alias results remain the same.
  • The practical effect is that two swift_task_dealloc calls to different pointers now have a memory dependency through InaccessibleMem, preventing reordering. This is the correctness constraint that was missing.
  • No benchmarks exercise this path. The Swift benchmark suite has zero tests that directly measure swift_task_alloc/swift_task_dealloc performance.

"Was ArgMemOnly a deliberate design choice?"

No. ArgMemOnly was introduced by @nate_chandler in commit 607772aaa21 (Sep 2020) as part of a stub commit ("Stubbed entry points for task de/alloc."); no reasoning was provided. It was then mechanically preserved through every subsequent refactoring:

  • Jul 2023: Evan Wilde migrated it from ATTRS(ArgMemOnly) to MEMEFFECTS(ArgMemOnly) during the memory effects macro introduction.
  • Feb 2025: Nate Chandler added swift_task_dealloc_through with the same ArgMemOnly (as part of [CoroutineAccessors] Use retcon.once variant.).

The broader pattern of ArgMemOnly being too optimistic for concurrency functions has already been corrected in two prior instances:

| Date | Author | Function | Fix |
|---|---|---|---|
| Jun 2025 | @ktoso | swift_task_create | ArgMemOnly → ReadOnly: "llvm would sometimes wrongly assume there's no indirect accesses and the optimizations can lead to a runtime crash" |
| Nov 2025 | @Michael_Gottesman + @John_McCall | AsyncLetBegin, AsyncLetFinish | ArgMemOnly → UNKNOWN_MEMEFFECTS |

The task alloc/dealloc functions simply fell through the cracks during those corrections.


Known Limitations

Other runtime functions may need the same audit

This fix only corrects swift_task_alloc, swift_task_dealloc, and swift_task_dealloc_through. Several other runtime functions appear to have the same under-specification:

| Function | Issue |
|---|---|
| swift_task_cancel | Modifies task internal state beyond argument memory |
| swift_task_getCurrentExecutor | Takes zero arguments; ArgMemOnly with no arguments is equivalent to ReadNone |
| swift_task_getMainExecutor | Same as above: reads global state, but the annotation says no memory access |
| swift_autoDiffCreateLinearMapContextWithType | Calls malloc() internally |
| swift_autoDiffAllocateSubcontextWithType | Calls allocator.Allocate() internally |

These are beyond the scope of this fix but may warrant a follow-up audit.

swift_task_alloc annotation is conservative

LLVM annotates malloc with InaccessibleMemOnly (not InaccessibleOrArgMemOnly) since the returned pointer is brand-new and not yet accessible to the module. swift_task_alloc has the same semantics and could arguably use the stricter InaccessibleMemOnly. The current InaccessibleOrArgMemOnly is a correct superset.

StackNesting does not verify LIFO ordering for async let allocations (pre-existing)

This is a pre-existing limitation, not one introduced by this fix. Before this fix (and before PR #84528), isAllocatingStack() did not include StartAsyncLetWithLocalBuffer at all, so StackNesting was already completely blind to async let allocations by omission.

This fix makes that exclusion explicit by returning StackAllocationIsNotNested, which is actually strictly better than the prior state: now the allocation IS recognized by isAllocatingStack() and isDeallocatingStack(), enabling the FlowSensitiveVerifier to check joint post-dominance (that a deallocation exists on every path). However, the verifier does not check LIFO ordering for non-nested allocations.
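As a rough illustration of the "a deallocation exists on every path" property, the C++ sketch below checks it on a toy control-flow graph. It is not the FlowSensitiveVerifier's algorithm; deallocOnEveryPath and its parameters are hypothetical, blocks are plain indices, and the allocating block is assumed not to deallocate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if every path from `allocBlock` to an exit (a block with no
// successors) passes through at least one block marked in `hasDealloc`.
bool deallocOnEveryPath(const std::vector<std::vector<std::size_t>> &succs,
                        std::size_t allocBlock,
                        const std::vector<bool> &hasDealloc) {
  assert(allocBlock < succs.size() && hasDealloc.size() == succs.size());
  std::vector<bool> seen(succs.size(), false);
  std::vector<std::size_t> worklist{allocBlock};
  seen[allocBlock] = true;
  while (!worklist.empty()) {
    const std::size_t block = worklist.back();
    worklist.pop_back();
    if (block != allocBlock && hasDealloc[block])
      continue; // paths through this block are covered
    if (succs[block].empty())
      return false; // reached an exit with the allocation still live
    for (std::size_t next : succs[block]) {
      if (!seen[next]) {
        seen[next] = true;
        worklist.push_back(next);
      }
    }
  }
  return true;
}

void demoDeallocCoverage() {
  // Diamond CFG: 0 -> {1, 2}, 1 -> 3, 2 -> 3, 3 is the exit.
  const std::vector<std::vector<std::size_t>> cfg{{1, 2}, {3}, {3}, {}};
  assert(!deallocOnEveryPath(cfg, 0, {false, true, false, false})); // path via 2 leaks
  assert(deallocOnEveryPath(cfg, 0, {false, true, true, false}));   // both arms dealloc
  assert(deallocOnEveryPath(cfg, 0, {false, false, false, true}));  // dealloc at the join
}
```

A check of this shape only establishes that each allocation is eventually released; it says nothing about the order in which two live allocations are released.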

LIFO correctness currently relies on SILGen's cleanup stack emitting FinishAsyncLet in reverse declaration order. This invariant holds today, but there is no compiler-level safety net if a future SILGen change breaks it.


Related Issues

This fix likely addresses the same root cause behind:

  • #75501: "Runtime crash: freed pointer was not the last allocation"
  • #87481: "freed pointer was not the last allocation crash in asyncLet_finish_after_task_completion" (Stripe iOS SDK, filed Feb 25, 2026)


Feedback and review welcome. Happy to provide additional IR dumps or test results.


Hi folks, quick bump + request for review on the PR #87571:

What this addresses: the async let teardown ordering crash (“freed pointer was not the last allocation”) reported in swiftlang/swift#75501, #87481, and #81771.
What’s included: a targeted change plus regression coverage (Concurrency runtime + IRGen tests).

I want to be transparent about my process: I used AI agents to help draft parts of the analysis, shape the patch, and review the code, but I’m treating this as my fix. I’ve reviewed the code carefully myself (without AI tools), validated it against the reproducer(s), and I’m happy to iterate quickly based on feedback.

If someone familiar with the concurrency runtime / async-let lowering could review, I’d really appreciate it. Also, if someone with CI privileges can kick off Swift CI on the PR (@swift-ci Please test), that would unblock progress.

Specific questions I’d love reviewer eyes on:

  • Does the teardown ordering model match the intended semantics across platforms/ABIs?
  • Are the regression tests scoped appropriately (and named/placed where you’d expect)?
  • Any edge cases I missed?

Thanks. Any guidance on the right reviewers or next steps is welcome.

FWIW i tried running the example from the most recent report of this against a compiler built with the changeset in your PR locally (off commit 2292702797c) and it still appeared to crash in the same way for me. haven't totally ruled out if that might be related to my env at all, and haven't closely inspected the IR yet though.


I've triggered the CI for you. Did you add this example in your test case suite?