Hey there!
I have some performance-critical code in my project, and I noticed that 20-30% of the time on the hot path is spent in retain/release calls. I have reduced the problem to this example:
protocol P {
    func take(a: A)
}

struct V: P {
    func take(a: A) {
        a.stuff()
    }
}

class A {
    func stuff() {
        print(0)
    }
}

struct B {
    let a = A()
    let b: [P]

    init(b: [P]) {
        self.b = b
    }

    func d() {
        for t in self.b {
            t.take(a: self.a)
        }
    }
}

let b = B(b: [V()])
b.d()
The hot function in question is B.d. It passes the owned A instance to each value in the owned array of existentials, [P].
Godbolt (swiftc nightly -O) shows the following disassembly of d:
output.B.d() -> ():
        ...
        mov     rbx, qword ptr [rsi + 16]
        test    rbx, rbx
        je      .LBB3_4
        mov     r14, rdi
        lea     r15, [rsi + 32]
        mov     qword ptr [rsp], rsi
        mov     rdi, rsi
        call    swift_retain@PLT
.LBB3_2:
        mov     r12, qword ptr [r15 + 24]
        mov     rbp, qword ptr [r15 + 32]
        mov     rdi, r15
        mov     rsi, r12
        call    __swift_project_boxed_opaque_existential_1
        mov     rdi, r14
        mov     r13, rax
        mov     rsi, r12
        mov     rdx, rbp
        call    qword ptr [rbp + 8]
        add     r15, 40
        dec     rbx
        jne     .LBB3_2
        mov     rdi, qword ptr [rsp]
        add     rsp, 8
        pop     rbx
        pop     r12
        pop     r13
        pop     r14
        pop     r15
        pop     rbp
        jmp     swift_release@PLT
.LBB3_4:
        ...
As I understand ARC optimizations, the lifetime of B.a is guaranteed for the whole duration of the call to B.d, so the retain/release pair should be removed by the optimizer.
Is there any way to remove this overhead? In my code I am using Unmanaged to successfully remove the retain/release calls in this example, but I am curious whether this is expected behavior and whether there are any "safe" ways to remove this overhead.
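For readers unfamiliar with the Unmanaged approach mentioned above, here is a minimal sketch of one way it might be applied to the reduced example. This is an assumption about the shape of such a workaround, not necessarily the code used in the real project: the protocol requirement is changed to take Unmanaged<A>, and the `calls` counter on A is a hypothetical addition that replaces print(0) so the effect can be checked. Whether it actually removes the retain/release pair depends on the compiler version and should be confirmed in the disassembly.

```swift
// Hypothetical variant of A: `calls` replaces print(0) so the result can be checked.
final class A {
    private(set) var calls = 0
    func stuff() {
        calls += 1
    }
}

// Assumption: the requirement takes an unmanaged reference instead of a strong one.
protocol P {
    func take(a: Unmanaged<A>)
}

struct V: P {
    func take(a: Unmanaged<A>) {
        // Safe only while the owner keeps the instance alive;
        // here B holds `a` for the whole duration of d().
        a.takeUnretainedValue().stuff()
    }
}

struct B {
    let a = A()
    let b: [P]

    init(b: [P]) {
        self.b = b
    }

    func d() {
        // Wrap the reference once, without changing its retain count.
        let unmanaged = Unmanaged.passUnretained(a)
        for t in b {
            t.take(a: unmanaged)
        }
    }
}

let b = B(b: [V()])
b.d()
```

The trade-off is that the compiler no longer guarantees the object's lifetime, so every caller has to make sure the instance outlives each use of the unmanaged reference.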