Question about the performance of dispatch on existentials

Hi swift-dev,

If I have essentially this program (the full program is at the end of this mail):

public class A { func bar() { ... }}
public protocol B {
    func foo(_ a: A)
}
extension B {
    func foo(_ a: A) { a.bar() }
}
public class ActualB: B {
}
public class OtherB: B {
}
func abc() {
    let b: B = makeB()
    b.foo(a)
}

I get the following call frames when running it (compiled with `swiftc -O -g -o test test.swift`):

    frame #1: 0x0000000100001dbf test`specialized A.bar() at test.swift:6 [opt]
    frame #2: 0x0000000100001e6f test`specialized B.foo(_:) [inlined] test.SubA.bar() -> () at test.swift:0 [opt]
    frame #3: 0x0000000100001e6a test`specialized B.foo(a=<unavailable>) at test.swift:23 [opt]
    frame #4: 0x0000000100001a6e test`B.foo(_:) at test.swift:0 [opt]
    frame #5: 0x0000000100001b3e test`protocol witness for B.foo(_:) in conformance OtherB at test.swift:0 [opt]
    frame #6: 0x0000000100001ccd test`abc() at test.swift:45 [opt]
    frame #7: 0x0000000100001969 test`main at test.swift:48 [opt]

Frames 1, 6, and 7 are obviously fine and expected.

In 6 we also build and destroy an existential box; that is understandable and fine too.

But there are two things I don't quite understand:

I) Why is the existential container retained and released in 5?

--- SNIP ---
                     __T04test6OtherBCAA1BA2aDP3fooyAA1ACFTW: // protocol witness for test.B.foo(test.A) -> () in conformance test.OtherB : test.B in test
0000000100001b20 push rbp ; CODE XREF=__T04test7ActualBCAA1BA2aDP3fooyAA1ACFTW+4
0000000100001b21 mov rbp, rsp
0000000100001b24 push r14
0000000100001b26 push rbx
0000000100001b27 mov r14, rdi
0000000100001b2a mov rbx, qword [r13]
0000000100001b2e mov rdi, rbx
0000000100001b31 call _swift_rt_swift_retain
0000000100001b36 mov rdi, r14 ; argument #1 for method __T04test1BPAAE3fooyAA1ACF
0000000100001b39 call __T04test1BPAAE3fooyAA1ACF ; (extension in test):test.B.foo(test.A) -> ()
0000000100001b3e mov rdi, rbx
0000000100001b41 pop rbx
0000000100001b42 pop r14
0000000100001b44 pop rbp
0000000100001b45 jmp _swift_rt_swift_release
                        ; endp
--- SNAP ---
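A small check of the existential built in frame 6 may help when reading the thunk above. It assumes the existential layout used by current Swift compilers: an opaque existential is a three-word inline value buffer, a type-metadata pointer, and one witness-table pointer per protocol. An OtherB reference is one word, so it fits in the inline buffer without a heap box. That is consistent with the thunk fetching the object with a single `mov rbx, qword [r13]`, if r13 holds the buffer's address as the disassembly suggests. BClassBound is a hypothetical class-bound variant added only for comparison, and the `any` spelling needs Swift 5.6 or later.

```swift
// Assumes the existential layout used by current Swift compilers; A, B and
// OtherB mirror the program in this mail, BClassBound is hypothetical.
public class A {}
public protocol B { func foo(_ a: A) }
public extension B { func foo(_ a: A) {} }
public class OtherB: B {}

// Hypothetical class-bound version of B, for comparison only.
public protocol BClassBound: AnyObject { func foo(_ a: A) }

let word = MemoryLayout<Int>.size

// A class reference is a single pointer, so it fits in the inline buffer.
print(MemoryLayout<OtherB>.size / word)          // 1
// Opaque existential: 3-word inline buffer + type metadata + 1 witness table.
print(MemoryLayout<any B>.size / word)           // 5
// Class-bound existential: object reference + 1 witness table.
print(MemoryLayout<any BClassBound>.size / word) // 2
```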

II) Why aren't 2, 3, 4, and 5 a single stack frame? It seems like we could just JMP from one to the next. Sure, in 5 the call is surrounded by a retain/release, but in the others we could just JMP.
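To make the layering in question II concrete, here is a hand-written model of the dispatch path in the backtrace: the call site in abc() (frame 6) goes through a witness-table entry (the role of frame 5) into the generic default implementation (roughly frames 4 and 3) and on to A.bar (frame 1). BWitnessTable, BExistential, defaultFoo and otherBWitnessTable are hypothetical names. This sketches the shape of the calls under those assumptions; it is not the code swiftc emits, and it does not explain why the frames were not merged.

```swift
// Hand-written, hypothetical model of the dispatch path; not what swiftc emits.
class A {
    func bar() -> [String] { ["A.bar"] }
}

class OtherB {}

// Plays the role of the generic default implementation `extension B { func foo(_:) }`.
func defaultFoo<T>(_ value: T, _ a: A) -> [String] {
    ["B.foo<\(T.self)>"] + a.bar()
}

// Hypothetical witness table for B: one function entry per requirement.
struct BWitnessTable {
    let foo: (AnyObject, A) -> [String]
}

// Plays the role of "protocol witness for B.foo(_:) in conformance OtherB":
// it recovers the concrete type and forwards to the generic implementation.
func otherBWitnessTable() -> BWitnessTable {
    BWitnessTable(foo: { value, a in
        ["witness for OtherB"] + defaultFoo(value as! OtherB, a)
    })
}

// Plays the role of an existential of type B: the value plus its witness table.
struct BExistential {
    let value: AnyObject
    let witnessTable: BWitnessTable
    func foo(_ a: A) -> [String] { witnessTable.foo(value, a) }
}

// Each element of the returned trace corresponds to one layer of the call chain.
func abc() -> [String] {
    let b = BExistential(value: OtherB(), witnessTable: otherBWitnessTable())
    return ["abc"] + b.foo(A())
}

print(abc()) // ["abc", "witness for OtherB", "B.foo<OtherB>", "A.bar"]
```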

We see a quite measurable performance issue in a project we're working on (email me directly for details/code), so I thought I'd ask: I'd like to understand why all of this is needed (if it is).

Many thanks,
  Johannes

--- SNIP ---
import Darwin

public class A {
    @inline(never)
    public func bar() {
        print("bar")
    }
}
public class SubA: A {
    @inline(never)
    public override func bar() {
        print("bar")
    }
}

public protocol B {
    func foo(_ a: A)
}

public extension B {
    @inline(never)
    func foo(_ a: A) {
        a.bar()
    }
}

public class ActualB: B {
}

public class OtherB: B {
}

public func makeB() -> B {
    if arc4random() == 1231231 {
        return ActualB()
    } else {
        return OtherB()
    }
}

@inline(never)
func abc() {
    let a = SubA()
    let b: B = makeB()
    b.foo(a)
}

abc()
--- SNAP ---

This is a failure of the optimizer to identify that two loads return the same value, so it can't remove a retain/release pair.

// protocol witness for B.foo(_:) in conformance OtherB
sil shared [transparent] [serialized] [thunk] @_T04test6OtherBCAA1BA2aDP3fooyAA1ACFTW : $@convention(witness_method) (@owned A, @in_guaranteed OtherB) -> () {
// %0 // user: %7
// %1 // user: %3
bb0(%0 : $A, %1 : $*OtherB):
  %2 = alloc_stack $OtherB // users: %9, %4, %11, %7
  %3 = load %1 : $*OtherB // users: %6, %4
  store %3 to %2 : $*OtherB // id: %4
  // function_ref B.foo(_:)
  %5 = function_ref @_T04test1BPAAE3fooyAA1ACF : $@convention(method) <τ_0_0 where τ_0_0 : B> (@owned A, @in_guaranteed τ_0_0) -> () // user: %7
  strong_retain %3 : $OtherB // id: %6
  %7 = apply %5<OtherB>(%0, %2) : $@convention(method) <τ_0_0 where τ_0_0 : B> (@owned A, @in_guaranteed τ_0_0) -> ()
  %8 = tuple () // user: %12
  %9 = load %2 : $*OtherB // user: %10
  strong_release %9 : $OtherB // id: %10
  dealloc_stack %2 : $*OtherB // id: %11
  return %8 : $() // id: %12
} // end sil function '_T04test6OtherBCAA1BA2aDP3fooyAA1ACFTW'

If load/store forwarding could tell that the apply does not write to the alloc_stack (it could, because @in_guaranteed guarantees no write), I would expect it to promote this memory, so that the load in %9 is known to produce %3. ARC could then remove the retain/release pair (AFAICT).
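A toy version of the two steps described above, run on a reduced copy of the witness thunk's SIL: first forward stored values to later loads, trusting that an @in_guaranteed argument is not written by the callee, then drop a retain/release pair on the same value. Inst, forwardLoads and removeRetainReleasePairs are hypothetical names, and the pairing rule is far cruder than the real ARC optimizer; the sketch only illustrates why proving that %9 equals %3 is what unlocks the removal.

```swift
// Toy instructions, loosely modelled on the SIL of the witness thunk above.
enum Inst: Equatable {
    case load(dest: String, addr: String)   // %dest = load addr
    case store(value: String, addr: String) // store value to addr
    case retain(String)                     // strong_retain value
    case release(String)                    // strong_release value
    case apply(inGuaranteed: [String])      // call receiving these addresses @in_guaranteed
}

/// Replaces loads whose value is already known with that value.
/// With `trustInGuaranteed == false`, an apply is assumed to clobber every
/// address passed to it; with `true`, @in_guaranteed addresses stay intact.
func forwardLoads(_ body: [Inst], trustInGuaranteed: Bool) -> [Inst] {
    var known: [String: String] = [:]   // address -> value it currently holds
    var renamed: [String: String] = [:] // removed load result -> forwarded value
    func resolve(_ v: String) -> String { renamed[v] ?? v }

    var out: [Inst] = []
    for inst in body {
        switch inst {
        case let .load(dest, addr):
            if let v = known[addr] {
                renamed[dest] = v  // drop the load and reuse the known value
            } else {
                known[addr] = dest
                out.append(inst)
            }
        case let .store(value, addr):
            known[addr] = resolve(value)
            out.append(.store(value: resolve(value), addr: addr))
        case let .retain(v):
            out.append(.retain(resolve(v)))
        case let .release(v):
            out.append(.release(resolve(v)))
        case let .apply(addrs):
            if !trustInGuaranteed {
                for a in addrs { known[a] = nil }
            }
            out.append(inst)
        }
    }
    return out
}

/// Crude ARC step: deletes `retain v` when the next retain/release of `v`
/// is a `release v`. Toy assumption: nothing in between releases `v`.
func removeRetainReleasePairs(_ body: [Inst]) -> [Inst] {
    var out = body
    var i = 0
    while i < out.count {
        if case let .retain(v) = out[i],
           let j = out[(i + 1)...].firstIndex(where: { $0 == .retain(v) || $0 == .release(v) }),
           out[j] == .release(v) {
            out.remove(at: j)  // j > i, so remove it first
            out.remove(at: i)
            continue
        }
        i += 1
    }
    return out
}

// The witness thunk's SIL, reduced to the instructions that matter here.
let thunk: [Inst] = [
    .load(dest: "%3", addr: "%1"),
    .store(value: "%3", addr: "%2"),
    .retain("%3"),
    .apply(inGuaranteed: ["%2"]),
    .load(dest: "%9", addr: "%2"),
    .release("%9"),
]

let conservative = removeRetainReleasePairs(forwardLoads(thunk, trustInGuaranteed: false))
let trusting = removeRetainReleasePairs(forwardLoads(thunk, trustInGuaranteed: true))
print(conservative.count) // 6: %9 is not known to be %3, so the pair stays
print(trusting.count)     // 3: only the load, the store and the apply remain
```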

On Jul 7, 2017, at 11:27 AM, Johannes Weiß via swift-dev <swift-dev@swift.org> wrote:
[...]

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

Thanks very much Arnold, also for filing the bug!

On 7 Jul 2017, at 8:07 pm, Arnold Schwaighofer <aschwaighofer@apple.com> wrote:
[...]
[SR-5403] Memory Optimization Opportunity (Load/Store forwarding), apple/swift issue #47977 on GitHub
