Atomic with @_noLocks can cause metadata allocation or locks

jcavar · February 10, 2024, 1:45pm

I have the following code snippet:

import Synchronization

struct Counter: ~Copyable {
    let changeCounter = Atomic<Int>(0)

    @_noAllocation
    @_noLocks
    mutating func add() {
        changeCounter.add(1, ordering: .sequentiallyConsistent)
    }
}

This code reports the violation:

"Using type 'Counter' can cause metadata allocation or locks"

Since Atomic is quite new, I guess performance annotation need to take them into account somehow?

jcavar · February 13, 2024, 9:14am

This seems something specific to Atomic. I am not able to reproduce this with my ~Copyable types (even if they are generic).

Also, you don't need to interact with atomic variable at all. E.g. this raises the same error:

import Synchronization

struct Counter: ~Copyable {
    let changeCounter = Atomic<Int>(0)
    var x = 1

    @_noAllocation
    @_noLocks
    mutating func add() {
        x = 2
    }
}

@Erik_Eckstein do you maybe know what is happening here?

Alejandro · February 13, 2024, 5:28pm

I imagine the issue is just that Atomic is generic so theoretically it may cause a metadata allocation if the generic argument is not statically known or if the compiler doesn't specialize the type. In reality, for Atomic it's almost always statically known and every operation on it is fully transparent and emitted into clients for the compiler to eliminate any generic shenanigans and lower these operations down to single atomic ops. Perhaps we should teach the compiler about this type for @_noLocks and @_noAllocation?

scanon · February 13, 2024, 6:29pm

Yeah, this is a common problem with generic-but-always-specialized-in-practice stuff.

Joe_Groff · February 13, 2024, 6:35pm

I would expect the current implementation to be able to handle a concrete use of a generic type like this, as long as it's actually getting fully specialized. We should check whether some generic part of the implementation isn't @inlinable, or the compiler is deciding it's unable to specialize and leaving behind metadata accesses. cc @Erik_Eckstein

jcavar · February 15, 2024, 1:01pm

Yeah, using Compiler Explorer reveals calls to:

swift_getTypeByMangledNameInContext2

Example:

dnadoba · February 15, 2024, 2:24pm

The linked godbolt is without optimisations turned on so no surprise but it also doesn't change if compiled with -O.

It looks like the initialiser is causing problems. Operations like wrappingAdd seems fine.

The Atomic initialiser is not @inlinable but has the @_alwaysEmitIntoClient and @_transparent attributes which sounds like it should be enough but this is outside of my expertise.

jcavar · February 15, 2024, 2:43pm

Shouldn't Atomic guarantee no metadata access without relying on optimiser?

dnadoba · February 15, 2024, 2:49pm

Maybe? I don't think it has been spelled out explicitly and it would be something new. @_transparent usually does a good job at enabling optimisation in debug builds but I'm not aware what it actually guarantees.

Joe_Groff · February 15, 2024, 3:16pm

It's my understanding that in a @_noLocks/@_noAllocations function that we're supposed to try to optimize out calls that require metadata instantiations even if the compiler wouldn't normally decide to (assuming that's fundamentally possible, with the functions in question being @inlinable or in the same module, of course).

tera · February 15, 2024, 4:44pm

Is it considered ok if the @_noLocks code compiles fine in -O but fails to compile in -Onone ?

I guess we could do the following if this is expected behaviour:

#if !DEBUG
@_noLocks
#endif
func add() { ... }

dnadoba · February 15, 2024, 4:50pm

@_noLock wasn't used in the linked code sample.

Re -Onone: I personally haven't used @_noLocks but maybe someone else can comment on this.

Joe_Groff · February 15, 2024, 5:14pm

No. @_noLocks should apply the compiler optimizations necessary to achieve the "no locks" constraint regardless of optimization mode.

tera · February 15, 2024, 5:18pm

Great news.
Please clarify what would happen to the code shipped (to the AppStore, etc) with -Onone (strange, but it happens). Will the individual function marked with @_noLocks be effectively compiled with -O while the rest of the code compiled with -Onone ? Or will it be compiled with "-Onone" (potentially taking locks)?

Joe_Groff · February 15, 2024, 5:20pm

I believe that's what happens now (Erik might be able to confirm), but it wouldn't necessarily always be the case. We would probably ultimately want to run only lighter guaranteed transformations that eliminate potential sources of runtime locks without sacrificing debuggability of the code.

Alejandro · February 15, 2024, 5:47pm

The issue is that the deinit devirtualizer is not yet turned on for move only types, so it must allocate metadata right now to call the destroy value witness function. In this example:

struct Hello: ~Copyable {
    let counter = Atomic(0)
}

output.Hello.init() -> output.Hello:
        mov     qword ptr [rax], 0
        ret

destroy value witness for output.Hello:
        push    rbx
        mov     rbx, rdi
        lea     rdi, [rip + (demangling cache variable for type metadata for Synchronization.Atomic<Swift.Int>)]
        call    __swift_instantiateConcreteTypeFromMangledName
        mov     rcx, qword ptr [rax - 8]
        mov     rcx, qword ptr [rcx + 8]
        mov     rdi, rbx
        mov     rsi, rax
        pop     rbx
        jmp     rcx

You'll see that Hello's initialization doesn't allocate metadata for Atomic, but it's only in Hello's destroy that it allocates it right now (this is resolved, but the optimizer pass is turned off like I mentioned).

Joe_Groff · February 15, 2024, 6:11pm

Regardless of whether the devirtualizer is enabled, I don't think we ever serialize the SIL record for the deinit yet to make it available across modules. We would need to do that for @frozen public types as well for the devirtualizer to have the information it needs.

Alejandro · February 15, 2024, 6:12pm

I think we record this information as part of the MoveOnlyDeinitList.

Alejandro · February 15, 2024, 6:19pm

And if you explicitly turn on the optimizer pass, then you get some really nice codegen:


output.foo() -> Swift.Int:
        mov     qword ptr [rsp - 8], 1
        lock            inc     qword ptr [rsp - 8]
        mov     rax, qword ptr [rsp - 8]
        ret

output.increment(Synchronization.Atomic<Swift.Int>) -> ():
        lock            inc     qword ptr [rdi]
        ret

jcavar · February 15, 2024, 9:13pm

Interesting, I thought these are only diagnostics and they don't change generated code?