ARC overhead when wrapping a class in a struct which is only borrowed

I have a reference-counted type which I would like to wrap in a struct and pass to a closure. The underlying type is a storage object, and the wrapper exposes some convenient APIs for it.

func forward(_ storage: Storage, _ callback: (Wrapper) -> Void) {
    callback(Wrapper(storage: storage))
}

class Storage { 
    var x: Int = 42
}

struct Wrapper {
    var storage: Storage
}

I'm finding that this involves a retain/release pair, and I'd like to understand why. Godbolt.

output.forward(output.Storage, (output.Wrapper) -> ()) -> ():
        push    r14
        push    r13
        push    rbx
        mov     r13, rdx
        mov     r14, rsi
        mov     rbx, rdi
        call    swift_retain@PLT
        mov     rdi, rbx
        call    r14
        mov     rdi, rbx
        pop     rbx
        pop     r13
        pop     r14
        jmp     swift_release@PLT

My understanding is that in the function forward, the argument storage is passed with borrow ownership. Similarly, when forward invokes callback, the wrapper passed as an argument is also borrowed for the duration of the call.

Any attempt by the callback closure to extend the wrapper's lifetime will need to perform a copy (which will retain the Storage). But if no lifetime extension occurs, the wrapper does not need to be copied, and the Storage does not need to be retained. That's something for the callback to do if it wants to. Is that correct?

Constructing the Wrapper value will perform a retain (as it "escapes" into the stored property), but since the wrapper itself is only borrowed by the closure call and then destroyed, I'd expect this retain/release pair to be eliminated. In other words, I think this function shouldn't include any retain/release calls at all.

Are my expectations here correct, and is this really a missed optimisation? Or am I missing something?

5 Likes

First of all, just to reproduce what you're seeing (sorry, I'm on arm64). Let's check out the assembly (-emit-assembly):

$ swiftc  -O -emit-assembly -module-name output test.swift | swift demangle | grep -A23 ^output.forward
output.forward(output.Storage, (output.Wrapper) -> ()) -> ():
	.cfi_startproc
	stp	x22, x21, [sp, #-48]!
	stp	x20, x19, [sp, #16]
	stp	x29, x30, [sp, #32]
	add	x29, sp, #32
	.cfi_def_cfa w29, 16
	.cfi_offset w30, -8
	.cfi_offset w29, -16
	.cfi_offset w19, -24
	.cfi_offset w20, -32
	.cfi_offset w21, -40
	.cfi_offset w22, -48
	mov	x20, x2
	mov	x19, x1
	mov	x21, x0
	bl	_swift_retain
	blr	x19
	mov	x0, x21
	ldp	x29, x30, [sp, #32]
	ldp	x20, x19, [sp, #16]
	ldp	x22, x21, [sp], #48
	b	_swift_release
	.cfi_endproc

Indeed, a retain/release pair around the call to the closure (blr x19).

For ownership information, I think looking into the SIL (-emit-sil) is the way to go:

$ swiftc  -O -emit-sil -module-name output test.swift | swift demangle | grep -A13 '^sil.*forward'
sil hidden @output.forward(output.Storage, (output.Wrapper) -> ()) -> () : $@convention(thin) (@guaranteed Storage, @noescape @callee_guaranteed (@guaranteed Wrapper) -> ()) -> () {
// %0 "storage"                                   // users: %7, %5, %4, %2
// %1 "callback"                                  // users: %6, %3
bb0(%0 : $Storage, %1 : $@noescape @callee_guaranteed (@guaranteed Wrapper) -> ()):
  debug_value %0 : $Storage, let, name "storage", argno 1 // id: %2
  debug_value %1 : $@noescape @callee_guaranteed (@guaranteed Wrapper) -> (), let, name "callback", argno 2 // id: %3
  %4 = struct $Wrapper (%0 : $Storage)            // user: %6
  strong_retain %0 : $Storage                     // id: %5
  %6 = apply %1(%4) : $@noescape @callee_guaranteed (@guaranteed Wrapper) -> ()
  strong_release %0 : $Storage                    // id: %7
  %8 = tuple ()                                   // user: %9
  return %8 : $()                                 // id: %9
} // end sil function 'output.forward(output.Storage, (output.Wrapper) -> ()) -> ()'

We can learn a few things here:

  1. The forward function gets the Storage as @guaranteed: @guaranteed Storage. @guaranteed means that the caller guarantees that the reference is alive but it doesn't "donate" a +1 ref count to us.
  2. The closure takes the Wrapper (and with it the Storage) as @guaranteed too: (@guaranteed Wrapper) -> ()

But that still leaves the question of why we can't just "forward" the @guaranteed reference that forward receives through the Wrapper and into the callback. This looks like it could work because we know that the Wrapper dies at the end of the function.

But let's take a step back and get the SIL without optimisations (remove the -O)

$ swiftc   -emit-sil -module-name output test.swift | swift demangle | grep -A13 '^sil.*forward'
sil hidden @output.forward(output.Storage, (output.Wrapper) -> ()) -> () : $@convention(thin) (@guaranteed Storage, @noescape @callee_guaranteed (@guaranteed Wrapper) -> ()) -> () {
// %0 "storage"                                   // users: %7, %5, %2
// %1 "callback"                                  // users: %8, %3
bb0(%0 : $Storage, %1 : $@noescape @callee_guaranteed (@guaranteed Wrapper) -> ()):
  debug_value %0 : $Storage, let, name "storage", argno 1 // id: %2
  debug_value %1 : $@noescape @callee_guaranteed (@guaranteed Wrapper) -> (), let, name "callback", argno 2 // id: %3
  %4 = metatype $@thin Wrapper.Type               // user: %7
  strong_retain %0 : $Storage                     // id: %5
  // function_ref Wrapper.init(storage:)
  %6 = function_ref @output.Wrapper.init(storage: output.Storage) -> output.Wrapper : $@convention(method) (@owned Storage, @thin Wrapper.Type) -> @owned Wrapper // user: %7
  %7 = apply %6(%0, %4) : $@convention(method) (@owned Storage, @thin Wrapper.Type) -> @owned Wrapper // users: %9, %8
  %8 = apply %1(%7) : $@noescape @callee_guaranteed (@guaranteed Wrapper) -> ()
  release_value %7 : $Wrapper                     // id: %9
  %10 = tuple ()                                  // user: %11

Here we see something interesting: Wrapper.init needs the Storage as @owned: %6 = function_ref @output.Wrapper.init(storage: output.Storage) -> output.Wrapper : $@convention(method) (@owned Storage, [...].

But we received this Storage as @guaranteed, i.e. effectively at +0. Can we donate/forward a ref count into Wrapper.init? I think the answer is no, because we don't own a ref count at all. So to satisfy the @owned (+1) convention we need to retain it first (which is exactly what the compiler does).

Now you could argue that with enough analysis, the compiler can figure out that we don't actually need to do that because Wrapper dies again within our function. But this is only correct if we can guarantee that the closure cannot tell that we performed this optimisation.

But I think it can tell and therefore removing this retain/release pair would break the semantics of Swift.

If we were to run isKnownUniquelyReferenced on the Storage inside the closure, we could indeed tell the difference, but only if forward gets called with a Storage whose ref count is 1. And it can only have ref count 1 here if nothing along the way retained it.
With the existing code, isKnownUniquelyReferenced will always return false: the caller guarantees a reference to us (so ref count >= 1) and we add another one (so ref count >= 2). But if we performed the (IMHO incorrect) "optimisation" of removing that seemingly unnecessary retain/release pair, it might incorrectly return true (because the ref count could then be exactly 1). And that would be a huge issue, because isKnownUniquelyReferenced behaving according to its semantics is what makes things like CoW types work.
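
To make that concrete, here is a minimal copy-on-write sketch (mine, not from the thread) of the kind of type that depends on isKnownUniquelyReferenced never reporting true for a reference that another context can still see:

struct CoWBox {
    // A minimal CoW box, for illustration only. If isKnownUniquelyReferenced
    // could return true while another copy of this struct still holds
    // `buffer`, the in-place mutation below would be visible through that
    // other copy.
    private final class Buffer {
        var value: Int
        init(value: Int) { self.value = value }
    }

    private var buffer = Buffer(value: 0)

    var value: Int {
        get { buffer.value }
        set {
            if !isKnownUniquelyReferenced(&buffer) {
                // The buffer is (or might be) shared: clone before mutating.
                buffer = Buffer(value: buffer.value)
            }
            buffer.value = newValue
        }
    }
}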

So I'd say the Swift compiler is correct, and there is no semantics-preserving way to just delete the retain/release pair in this function.
Now, all that said, what the Swift compiler can do is inline all of the code (forward, the call to the closure, and the closure itself) into the main function (or whatever the actual caller is). If it does that, it gains much more knowledge.

But that's beside the point: the function forward we looked at is the actual function, i.e. the one that gets invoked from call sites that didn't inline it. And naturally we also cannot inline the callback into the concrete forward function, because it's a parameter that we cannot know at compile time.

I'd love if @Michael_Gottesman could confirm (or deny) this :).

1 Like

I don't think that analysis holds (although I don't know whether there's another reason the retain/release pair couldn't be elided). The only way that isKnownUniquelyReferenced could be called on Storage is if we have a mutable Wrapper (since isKnownUniquelyReferenced takes an inout parameter), and the only way to obtain that is by a copy: i.e. var wrapper = wrapper; isKnownUniquelyReferenced(&wrapper.storage). That guarantees that isKnownUniquelyReferenced will continue to return false.
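
Spelled out with the types from the original example (a quick sketch, assuming the same Storage and Wrapper definitions):

class Storage { var x: Int = 42 }
struct Wrapper { var storage: Storage }

func forward(_ storage: Storage, _ callback: (Wrapper) -> Void) {
    callback(Wrapper(storage: storage))
}

forward(Storage()) { wrapper in
    // The only way to ask the question is to make a mutable copy first...
    var copy = wrapper
    // ...and that copy retains the storage, so (per the argument above)
    // this can only ever print false here.
    print(isKnownUniquelyReferenced(&copy.storage))
}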

You are right that in today's Swift source language, isKnownUniquelyReferenced takes an inout. That causes a bunch of trouble in places, and I don't think it is actually necessary anymore; I believe it was a cheap hack in the early Swift days to make sure that we don't accidentally +1 the ref count before passing a reference into isKnownUniquelyReferenced (something we shouldn't need anymore today).

But anyway, I don't think your argument holds even today (although there might be code where it would currently be very hard to observe the wrong semantics):

$ cat test2.swift 
func f(_ x: C) -> Bool {
    var x = x
    return isKnownUniquelyReferenced(&x)
}

class C{}
let c = C()
print(f(c))
print(c)
$ swiftc -O test2.swift  && ./test2
true
test2.C

which works because f takes @guaranteed C.

My argument really is: the semantics of the optimisation suggested by the OP would IMHO be incorrect, because the ref count might be too low. How we can actually observe this in today's Swift isn't the core of the argument.

I'm happy to be wrong here but this is my current understanding, @Michael_Gottesman to confirm/deny.

I think @guaranteed is another (internal) name for borrowed, isn't it? And @owned would be take/consuming, so that matches what I expected.

Yes, that's exactly what I want.

I'm not sure that it would be semantics-breaking, though - I think the optimiser is free to eliminate copies wherever it can. I believe it does so already, and that there are situations where you can observe superfluous retain/release pairs being eliminated as the optimiser has improved across releases.

Otherwise, wouldn't that mean the optimiser would never be able to remove any retain/release pairs?

I guess another way of phrasing my argument is: if you put a reference in a struct, then you need to hold a +1 on that reference to legally do so (expressed as the struct's initialiser taking it @owned). You cannot just piggyback on a +0 you received, even if you know the struct will die within your function. Again, I might be wrong about this; I think this is the key question.

It can definitely remove retain/release pairs. For example if it receives something @guaranteed and only passes it somewhere that requests @guaranteed (and of course doesn't store it etc).
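
A minimal sketch of that case (names are mine, not from the thread): a function that only re-borrows a class reference should need no ARC traffic, because both its own parameter and the closure's parameter are @guaranteed.

class Storage { var x: Int = 42 }

// No Wrapper in between: the +0 borrow this function receives can be passed
// straight through to the closure, so no retain/release should be required.
func forwardDirectly(_ storage: Storage, _ callback: (Storage) -> Void) {
    callback(storage)
}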

I think the crux in your example is that it needs @owned Storage to construct a Wrapper but the Storage is only passed into forward as @guaranteed.

The way I understand it, if you build compounds (structs/tuples) you actually have to hold a +1 ref for each individual piece and hand those to the initialiser. This feels like something that's important for the semantics, so I don't think we can just optimise it out... But I don't have anything documented to back this up; this is just the model in my head. Hopefully @Michael_Gottesman can help us here :slight_smile:.

EDIT: Btw @Karl, you may be able to make this work by annotating with enough __owned: forward would take func forward(_ storage: __owned Storage, ...). That way the call site could (if it doesn't need the Storage anymore) "donate" its Storage into forward, which could then donate it into the Wrapper, etc. Unfortunately, the last time I tried to use __owned for this, it turned out not to work anymore.
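
For concreteness, the suggestion would look roughly like this (a sketch; __owned is an underscored, unofficial spelling, and as noted it may not actually produce the hoped-for codegen today):

class Storage { var x: Int = 42 }
struct Wrapper { var storage: Storage }

// Hypothetical __owned variant of the original forward(_:_:): the call site
// hands over its +1 on Storage, which could then be forwarded into the
// Wrapper without an extra retain.
func forward(_ storage: __owned Storage, _ callback: (Wrapper) -> Void) {
    callback(Wrapper(storage: storage))
}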

2 Likes

The high-level property you care about with uniqueness checking is not “is the reference count of this object exactly 1”, it’s “can any other context observe this object”. As a direct result, it is only meaningful to ask it of an owned reference that you have exclusive access to, because if the other contexts can access the reference simultaneously, they can of course observe the object no matter what its reference count is. The way we express exclusive access in Swift is with inout.

The standard rule in Swift is that new values you construct are owned values which own their component values. This is a rule that allows values to present a uniform ABI where there is a single representation for values of the type, whether borrowed or owned, regardless of the nature of its component types.

If you want to allow borrowed-by-construction values which can be constructed from borrows of the components, that is sensible, but such values cannot have the same representation as an owned value for all possible component types. This is because not all types in Swift have the property that a borrowed value can be relocated in memory. (In general, this is not satisfied by types whose representation is address-sensitive, e.g. because values are registered with a runtime by address, or because values can point into themselves. Among other things, we must assume all non-trivially-copyable C++ types do not have this property.) As a result, a borrowed-by-construction value must either store addresses of borrowed values (giving them a non-uniform representation) or disallow component types whose borrows cannot be relocated.

An alternative that might work in some cases would be to change the type to always store a borrow of its component. This is not currently expressible in Swift without falling back on unsafe pointers. Such a type would be inherently lifetime-restricted because a value would always be dependent on another value. But that is something that can be expressed in e.g. Rust.
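
As a rough illustration of that unsafe fallback (my sketch, not from the thread): a wrapper that stores its component as an Unmanaged reference performs no ARC when constructed, but shifts the entire lifetime obligation onto the caller.

class Storage { var x: Int = 42 }

// Stores a borrow as an unmanaged reference, so constructing the wrapper
// performs no retain. This is only safe while someone else keeps the
// underlying Storage alive for the whole lifetime of the wrapper.
struct BorrowedWrapper {
    private let storage: Unmanaged<Storage>

    init(_ storage: Storage) {
        self.storage = .passUnretained(storage)
    }

    var x: Int { storage.takeUnretainedValue().x }
}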

Anyway, the optimization limitation you’re probably running into is that I believe SIL doesn’t know how to put together these borrowed-by-construction values even in the cases where it can support them, so if the optimizer can’t completely eliminate the need for the aggregate, it will need to produce an owned value.

6 Likes

The reason that we are not removing this is due to phase-ordering issues with respect to inlining. In non-OSSA SIL, we are not allowed to remove this retain/release pair, but in OSSA SIL, we are. Sadly, today OSSA is lowered before inlining occurs (lifting this limitation is currently being worked on, and the new behaviour can be enabled by passing the flag -enable-ossa-modules). If one enables OSSA modules, or marks the initializer of Storage as transparent (ensuring that it is inlined early enough, before Ownership SSA is lowered), then the Ownership SSA optimizer properly eliminates this ARC traffic.
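
For reference, the flag is spelled via -Xfrontend when going through the swiftc driver (as used later in this thread); adapting the earlier compile command, something like:

$ swiftc -O -Xfrontend -enable-ossa-modules -emit-assembly -module-name output test.swift | swift demangle | grep -A23 ^output.forward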

6 Likes

i do this a lot and i have always wondered what happens when i vend an API that does this sort of thing, so i am very interested in this discussion and kind of frustrated with myself for being unable to understand the SIL knowledge being shared in this thread.

in a real library, i would probably mark this thing @inlinable like:

@inlinable public
func forward(_ storage:Storage, _ callback:(Wrapper) throws -> ()) rethrows
{
    try callback(.init(storage: storage))
}

i agree that the godbolt for the non-inlined version of this function does not look good, but i have also learned that godbolt tends to show us the worst-case code gen that happens when something does not get inlined by the client module.

would the scary codegen above still happen if forward(_:_:) were @inlinable?

Yes, it was inlinable. The thing I was doing involved making slices, which of course are wrapper structs over reference-counted storage:

extension MutableCollection {

  @inlinable
  internal mutating func removeSegments(
    separatedBy isSeparator: (Element) -> Bool,
    where shouldRemove: (SubSequence) -> Bool
  ) -> Index
}

And then I was calling this in a function, where each slice was wrapped in another struct and forwarded.

public struct MyCollection {

  @usableFromInline
  var storage: MyStorage

  public struct Element {
    @usableFromInline
    var storage: MyStorage.SubSequence
  }

  @inlinable
  internal mutating func removeAll(
    where predicate: (Element) -> Bool
  ) {

    let newEnd = storage.removeSegments(
      separatedBy: { $0 == 42 }
    ) { storageSlice in
      // ... some logic ...
      guard !storageSlice.isEmpty else { return true }

      // Wrap the slice as an Element (our API type)
      // and ask the user if we should remove it.
      return predicate(Element(storage: storageSlice))
    }

    // ...etc
  }
}

All of these levels of wrapping add retain/release calls. Also, the implementation sometimes makes use of slices, such as this part in removeSegments:

while let separatorIdx = self[readHead..<end].firstIndex(where: isSeparator) {
  ...
}

When I benchmarked it, retain/release calls in removeSegments amounted to ~20% of the execution time.

What I can do to remove these calls is stop using slices: create my own version of firstIndex(where:) which takes a Range<Index> to operate on, and provide a (Self, Range<Index>) pair to the closure instead of a SubSequence (in fact, I seem to need to make it an inout Self to avoid the RR pair). But that's not a satisfactory solution, of course.
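
Roughly, the slice-avoiding helper looks like this (a sketch; the name firstIndex(in:where:) and the exact shape are mine, not the actual library code):

extension MutableCollection {

  // Range-based replacement for `self[range].firstIndex(where:)` that avoids
  // creating a SubSequence value (and therefore avoids retaining the storage).
  // Marked mutating so that `self` is passed inout, as described above.
  @inlinable
  internal mutating func firstIndex(
    in range: Range<Index>,
    where predicate: (Element) -> Bool
  ) -> Index? {
    var position = range.lowerBound
    while position < range.upperBound {
      if predicate(self[position]) { return position }
      formIndex(after: &position)
    }
    return nil
  }
}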

As far as inlining goes: when I used slices, the removeSegments call didn't get inlined, but once I added the above hacks to avoid slices and the associated retain/releases, they apparently became simple enough that they did then get inlined :neutral_face:

@Michael_Gottesman's suggestion of -enable-ossa-modules really helps with the minimal example. I'm still trying to test it with the full library, but there are some crashes in other areas, so I need to go around disabling stuff to get an idea of how the compiler might be able to optimise this code in future, when this feature is complete.

Godbolt of minimal example with -enable-ossa-modules

output.forward(output.Storage, (output.Wrapper) -> ()) -> ():
        push    r13
        mov     r13, rdx
        call    rsi
        pop     r13
        ret
3 Likes

I have been experimenting with -Xfrontend -enable-ossa-modules and am impressed with the results. It reduces ARC traffic dramatically.

Is it safe to enable this option in Swift 5.7.2?

this is slightly off topic but can we add an explanation of the (O)SSA acronym to the SIL doc? i couldn’t find any explanation of this acronym from googling “swift ssa” or skimming the SIL doc, and wasn’t able to figure out what this term meant until i got lucky and stumbled upon

3 Likes

inout is still the only way to pass a value without semantically copying it. We'll need formal support for borrowed arguments.

With the current state of the parameter modifier proposals, isKnownUniquelyReferenced could be changed from inout to borrowing, but the user of the API would always need to remember to write borrow on the caller side as well, which is not great.

We've discussed this before, and I also covered it above. The semantics of borrowed values in Swift do not support isKnownUniquelyReferenced in any meaningful way.

Now, I understand what people mean when they say this should work with borrows and not require inout. isKnownUniquelyReferenced is not naturally a mutating operation, and it is reasonable to say that you should not have to "mutate" something in order to test whether it holds a unique reference. But borrows in Swift are not just immutable values, they are non-exclusive immutable values, and that non-exclusiveness makes it meaningless to ask whether they hold a unique reference.

Swift could choose to support exclusive immutable borrows, and then the operation would be meaningful. However, it is not at all clear that that would be useful enough to consider supporting. My instinct is that, because exclusive immutable borrows must be rooted in local ownership of a value, any operations which would benefit from this can always be described as either mutation or consumption and thus present a mutable value. Essentially the only thing you can't write is something that tests for a unique reference and then doesn't even try to use that uniqueness, e.g. assert(isKnownUniquelyReferenced(myLet)) and not a whole lot else.

Can’t you mutate class instances via immutable reference? Why is it “not useful” to know whether mutations through your immutable reference are visible via other immutable references?

2 Likes

i have a use case where i need to prohibit users from using scoped APIs within another scoped API:

session.withTransaction
{
    session.withTransaction
    {
        fatalError("can’t start a transaction inside another transaction!")
    }
}

i can slow them down by making session inout and the withTransaction(_:) method mutating:

session.withTransaction
{
    // can’t call `withTransaction(_:)` again with `session`
}

but session has reference semantics and they can still do

var other:Session = session
session.withTransaction
{
    other.withTransaction
    {
        fatalError("can’t start a transaction inside another transaction!")
    }
}

I think John’s point is that the only answer you can ever soundly give is false. If you just used today’s implementation of checking the refcount, you could get this…

// Pseudo-syntax
let x = MyObject()
borrow x1 = borrow x
borrow x2 = borrow x
x2.value = 0
if isKnownUniquelyReferenced(borrow x1) {
  x1.value = 5
}
print(x2.value) // 5, oops!
1 Like

Ah, ok, so it’s a meaningful question but the only way to answer it is to be overly conservative.

I think the trick here is to make withTransaction consume the session, and add a private generation counter to the transaction type to detect attempts to escape the transaction out of the closure.

the session objects already track their transaction stacks so misusing one (e.g., nesting transactions) always results in a precondition failure. but i would like for this kind of thing to not compile in the first place :slight_smile:

the sessions have reference semantics, so even if it consumed the reference that the outer withTransaction(_:) was called on, it won’t stop that method from being called from any of the other references.