What's stored at [AsyncContext + 16]?

ole · March 12, 2024, 8:08pm

@nsc and I have been looking at the assembly the compiler generates for an async function call and trying to understand it. Take this example (Godbolt link):

func caller() async {
    await asyncFunc()
}

@inline(never) func asyncFunc() async {
}

The assembly for the first partial function of caller (i.e. everything up to the await call) looks like this (the comments are mine):

output.caller() async -> ():
        push    rax
        ; Load size of AsyncContext required for calling asyncFunc()
        mov     edi, dword ptr [rip + (async function pointer to output.asyncFunc() async -> ())+4]
        ; Allocate AsyncContext for async function call.
        call    swift_task_alloc@PLT

        ; Store new AsyncContext at [current AsyncContext + 16]
        ; ??? Why???
        mov     qword ptr [r14 + 16], rax

        ; Set current AsyncContext as parent of new AsyncContext (parent field is at offset 0)
        mov     qword ptr [rax], r14
        ; Set continuation in new AsyncContext (field at offset 8)
        lea     rcx, [rip + ((1) await resume partial function for output.caller() async -> ())]
        mov     qword ptr [rax + 8], rcx
        ; Set new AsyncContext as the current AsyncContext (passed in r14)
        mov     r14, rax
        pop     rax
        ; Perform async function call (tail call)
        jmp     (output.asyncFunc() async -> ())

I think I understand every line of this except this one:

mov     qword ptr [r14 + 16], rax

What's the purpose of this? This seems to be storing the newly allocated AsyncContext (in rax) in the third field of the current AsyncContext (in r14). Looking at the class definition for AsyncContext, I don't see a third field, and I couldn't find an obvious match in any of its subclasses, either.
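
To make the question concrete, the call sequence in the commented assembly can be modelled in plain Swift. This is only a sketch: `Context`, `spillSlot`, `callerModel`, `asyncFuncModel` and `callerResume` are hypothetical names, class references stand in for raw frame pointers, and ordinary calls stand in for the tail calls. It assumes, as the resume partial function in the listing suggests, that the callee's context is what the continuation receives.

```swift
// Hypothetical model of the async call sequence; not the runtime's real types.
final class Context {
    var parent: Context?                  // field at offset 0 in the real frame
    var resume: ((Context) -> Void)?      // field at offset 8 in the real frame
    var spillSlot: Context?               // stands in for [r14 + 16]
    var spillMatchedArgument = false      // bookkeeping for this sketch only
}

// Stands in for asyncFunc(): does nothing, then "returns" by invoking the
// continuation stored in its own context, passing that context along.
func asyncFuncModel(_ current: Context) {
    current.resume?(current)
}

// Stands in for the await resume partial function of caller().
func callerResume(_ child: Context) {
    guard let callerContext = child.parent else { return }
    // The child context arrives as the argument, so the copy in the
    // spill slot carries no extra information.
    callerContext.spillMatchedArgument = (callerContext.spillSlot === child)
    callerContext.spillSlot = nil         // loosely models swift_task_dealloc
}

// Stands in for the first partial function of caller().
func callerModel(_ current: Context) {
    let child = Context()                 // call swift_task_alloc
    current.spillSlot = child             // mov [r14 + 16], rax
    child.parent = current                // mov [rax], r14
    child.resume = callerResume           // mov [rax + 8], rcx
    asyncFuncModel(child)                 // mov r14, rax; jmp asyncFunc
}

let root = Context()
callerModel(root)
print(root.spillMatchedArgument)
```

In this model the value written to the spill slot is always the same object the continuation receives, which is what makes the store look redundant.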

jrose · March 12, 2024, 8:39pm

The first two fields are the type and retain count, so this is the first declared field, the parent context.

ole · March 12, 2024, 8:48pm

Really? AsyncContext is a C++ class, so it doesn't have a retain count, does it?

John_McCall · March 12, 2024, 10:15pm

AsyncContexts are not reference-counted, no.

The AsyncContext structure is just the header on an async stack frame. Just like a C stack frame, the contents of the rest of the frame are opaque and function-specific; the function allocates whatever storage it needs there, including spilling values that it has to be able to use across the suspension. It shouldn't need to spill the child context pointer across the call — it gets passed into the continuation — but I assume the frame lowering code has somehow forgotten about that.
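
A rough way to picture the "header plus function-specific storage" layout, assuming a 64-bit target, is the sketch below. The struct and field names (`AsyncContextHeader`, `CallerFrame`, `spilledChildContext`) are hypothetical and only mirror the offsets annotated in the assembly; the real frame layout is chosen by the compiler per function.

```swift
// Hypothetical model of an async frame; assumes 8-byte pointers.
struct AsyncContextHeader {
    var parent: UnsafeMutableRawPointer?        // caller's context
    var resumeParent: UnsafeMutableRawPointer?  // continuation to resume in the caller
}

// Everything after the header is private to the function that owns the frame.
// For caller(), the first such slot happens to hold the spilled child context.
struct CallerFrame {
    var header: AsyncContextHeader
    var spilledChildContext: UnsafeMutableRawPointer?
}

let parentOffset = MemoryLayout<AsyncContextHeader>.offset(of: \AsyncContextHeader.parent)!
let resumeOffset = MemoryLayout<AsyncContextHeader>.offset(of: \AsyncContextHeader.resumeParent)!
let spillOffset = MemoryLayout<CallerFrame>.offset(of: \CallerFrame.spilledChildContext)!

print(parentOffset, resumeOffset, spillOffset)
```

Under that assumption the three offsets line up with `[rax]`, `[rax + 8]` and `[r14 + 16]` in the listing: the first two belong to the header, and the third is the first byte of the function's own storage.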

jrose · March 12, 2024, 10:41pm

Aargh, sorry! Not sure how I convinced myself otherwise—did I think it was a Swift class, or did I think it was a C++ class that had the HeapObject layout as a base class? Thanks for correcting me.

ole · March 12, 2024, 10:44pm

Thanks @John_McCall!

John_McCall · March 12, 2024, 10:47pm

It's worth a bug that we're spilling that — it's probably costing us a lot of code size.

ole · March 13, 2024, 12:05pm

I filed a bug:

github.com/apple/swift

Codegen for async function call spills AsyncContext unnecessarily

opened 12:04PM - 13 Mar 24 UTC

ole

bug triage needed

### Description

It looks like the codegen for an async function call unnecessarily spills the newly allocated async stack frame across the suspension point. Forum discussion here: [What’s stored at \[AsyncContext + 16\]? (2024-03-12)](https://forums.swift.org/t/whats-stored-at-asynccontext-16/70621)

### Reproduction

This Swift code:

```swift
func caller() async {
    await asyncFunc()
}

@inline(never) func asyncFunc() async {
}
```

produces the following assembly for the async function call ([Compiler Explorer link](https://godbolt.org/z/xqfnex794)):

```x86asm
output.caller() async -> ():
        push    rax
        mov     edi, dword ptr [rip + (async function pointer to output.asyncFunc() async -> ())+4]
        call    swift_task_alloc@PLT
        mov     qword ptr [r14 + 16], rax
        mov     qword ptr [rax], r14
        lea     rcx, [rip + ((1) await resume partial function for output.caller() async -> ())]
        mov     qword ptr [rax + 8], rcx
        mov     r14, rax
        pop     rax
        jmp     (output.asyncFunc() async -> ())
```

This line seems unnecessary and can possibly be eliminated:

```x86asm
mov     qword ptr [r14 + 16], rax
```

This stores the newly allocated async stack frame in the current AsyncContext, spilling it across the suspension point. According to @rjmccall (https://forums.swift.org/t/whats-stored-at-asynccontext-16/70621/4), this is unnecessary:

> It shouldn't need to spill the child context pointer across the call — it gets passed into the continuation — but I assume the frame lowering code has somehow forgotten about that.

Similarly, in the next partial function (after returning from the async call), the assembly seems to load the stored async stack frame with `mov rdi, qword ptr [rcx + 16]` when it could also access it via `r14` (I'm not 100 % certain about this).

```x86asm
(1) await resume partial function for output.caller() async -> ():
        push    rbp
        push    r14
        lea     rbp, [rsp + 8]
        sub     rsp, 24
        mov     rax, rbp
        mov     rcx, qword ptr [r14]
        lea     rdx, [rbp - 8]
        mov     qword ptr [rdx], rcx
        mov     rdi, qword ptr [rcx + 16]
        mov     r14, qword ptr [r14]
        sub     rax, 8
        mov     qword ptr [rax], r14
        call    swift_task_dealloc@PLT
        mov     rax, r14
        add     rsp, 16
        add     rsp, 16
        pop     rbp
        btr     rbp, 60
        jmp     qword ptr [rax + 8]
```

### Expected behavior

More efficient code generation that eliminates the unnecessary spill.

### Environment

Tested on Compiler Explorer with optimizations (`-O`) on swift-5.10-nightly on Linux, x86-64:

```
Swift version 5.10-dev (LLVM 5dc9d563e5a6cd2, Swift 9bfe759d7048cb6)
Target: x86_64-unknown-linux-gnu
```

The same issue can be observed with swift-nightly on Compiler Explorer:

```
Swift version 6.0-dev (LLVM ce41a43bba95b2b, Swift 1a840948a0905df)
Target: x86_64-unknown-linux-gnu
```

I haven't tested this on ARM/Apple.

### Additional information

_No response_