"class" and "final class" between swift 4.2 and swift 5 (performance)

woodcrust · March 20, 2019, 3:32am

Hello, it's me with puffin bench again.
Sorry, my English is very bad...

Please tell me why this code (without "final class") runs with such a big difference, about 4 vs 25 seconds, between versions 4.2 and 5?

Compiled with the command:
swiftc -Ounchecked -whole-module-optimization -Xcc -O2
Code with final class:
puffin_bench/swift2_final_class.swift at master · nerzh/puffin_bench · GitHub
Code without the final keyword:
puffin_bench/swift2_class.swift at master · nerzh/puffin_bench · GitHub

I saw the article about Increasing Performance by Reducing Dynamic Dispatch.

But 4 vs 25 seconds on different versions ... looks like a performance regression.

4.2.2

5-development

nuclearace · March 20, 2019, 3:34pm

I've moved this to the compiler development section, since this is unrelated to server stuff. But I would say that if you're getting vastly different ~~compile times~~ results, you should file a bug on bugs.swift.org. If you can reference your implementation that causes the issue, or make a sample project that reproduces it, that'd be extremely helpful.

Jon_Shier · March 20, 2019, 3:38pm

Those are run times, not build times.

nuclearace · March 20, 2019, 3:39pm

Oh oops. Regardless though, I think the compiler category is the appropriate one for this, and filing a bug report still stands.

nuclearace · March 20, 2019, 3:43pm

Although, one thing that it could be is the change in exclusivity checking between 4 and 5. I haven't dug into his code, but iirc, having final can aid in eliding some of the more conservative checking.

Joe_Groff · March 20, 2019, 3:44pm

When the class isn't final, the property access to arr has to be dynamically dispatched and treated like a potentially-computed property, in case a subclass overrides it. In -O -wmo builds, though, we ought to notice later that there are no subclasses of the class in the program and inline the access. It's possible that that isn't happening for some reason, or that if so, the exclusive access marker calls are not optimized after the inlining happens. @Andrew_Trick, do you know if this is a known issue?
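As a rough illustration of the difference described above, here is a minimal sketch. It is a hypothetical reduction, not the actual puffin_bench code: the names `OpenArr`, `SealedArr`, and `fill()` are made up for this example, and the array size is arbitrary.

```swift
// Hypothetical reduction of the two variants; not the benchmark itself.

// Non-final: a subclass could override `arr`, so the mutation in the loop
// goes through the property's accessor, which is dynamically dispatched
// unless the optimizer manages to devirtualize it.
class OpenArr {
    var arr = [Int](repeating: 0, count: 8)

    func fill() {
        for i in 0..<arr.count {
            arr[i] = i * 2
        }
    }
}

// Final: no subclass can exist, so the compiler may access the stored
// property directly.
final class SealedArr {
    var arr = [Int](repeating: 0, count: 8)

    func fill() {
        for i in 0..<arr.count {
            arr[i] = i * 2
        }
    }
}

let open = OpenArr()
open.fill()
let sealed = SealedArr()
sealed.fill()

// Both variants compute the same result; only the generated code differs.
print(open.arr == sealed.arr)
```

The two classes behave identically at the source level, so any timing difference between them comes from how the property access is compiled and optimized.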

Jon_Shier · March 20, 2019, 3:44pm

I did some brief testing and turning off exclusivity checking didn’t help anything.

Joe_Groff · March 20, 2019, 3:55pm

Thanks. The other major change in Swift 5 that comes to my mind is the switch to the coroutine model for the property ABI; maybe there's a limitation in the devirtualizer and/or inliner's handling of the coroutine here.
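For readers unfamiliar with the coroutine-style accessors mentioned here, a rough source-level analogue is an explicit `_modify` accessor. This is only a sketch under assumptions: `_modify` is an underscored spelling that is not an officially supported language feature, and `Box`, `storage`, and the values below are hypothetical names chosen for illustration.

```swift
// Hypothetical sketch of a coroutine-style accessor; not code from the thread.
final class Box {
    private var storage: [Int] = [1, 2, 3]

    var arr: [Int] {
        get { return storage }
        set { storage = newValue }
        _modify {
            // The accessor suspends at `yield`, lending `storage` to the
            // caller for in-place mutation, and resumes once the caller's
            // mutation is finished.
            yield &storage
        }
    }
}

let box = Box()
box.arr[1] = 42   // in-place mutation through the accessor
print(box.arr)
```

With stored properties in non-final classes the compiler synthesizes an accessor of this kind on its own; writing one by hand just makes the yield-and-resume shape visible.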

woodcrust · March 20, 2019, 4:57pm

Should I file this issue on bugs.swift.org now, or wait?

nuclearace · March 20, 2019, 5:01pm

It's almost always a good idea to file a bug report. You (or someone else) can always close it if it turns out not to be an issue.

johannesweiss · March 20, 2019, 5:07pm

CC @Erik_Eckstein

woodcrust · March 20, 2019, 5:13pm

OK, I will try to do it now.

Andrew_Trick · March 20, 2019, 5:30pm

I don't see the bug yet so I'll just comment here:

In the non-final code, SILGen uses a generalized accessor to modify the array:

  %148 = class_method %0 : $TestGlobalArr, #TestGlobalArr.arr!modify.1 : (TestGlobalArr) -> () -> (), $@yield_once @convention(method) (@guaranteed TestGlobalArr) -> @yields @inout Array<Int> // user: %149
  (%149, %150) = begin_apply %148(%0) : $@yield_once @convention(method) (@guaranteed TestGlobalArr) -> @yields @inout Array<Int> // users: %152, %156
  // function_ref Array.subscript.modify
  %151 = function_ref @$sSayxSiciM : $@yield_once @convention(method) <τ_0_0> (Int, @inout Array<τ_0_0>) -> @yields @inout τ_0_0 // user: %152
  (%152, %153) = begin_apply %151<Int>(%143, %149) : $@yield_once @convention(method) <τ_0_0> (Int, @inout Array<τ_0_0>) -> @yields @inout τ_0_0 // users: %154, %155
  store %146 to %152 : $*Int                      // id: %154
  end_apply %153                                  // id: %155
  end_apply %150                                  // id: %156

When that generalized accessor is inlined, we end up with an extra retain_value:

bb8:                                              // Preds: bb7
  %48 = load %21 : $*Int                          // user: %54
  %49 = ref_element_addr %0 : $TestGlobalArr, #TestGlobalArr.arr // user: %50
  %50 = load %49 : $*Array<Int>                   // users: %54, %58, %51
  retain_value %50 : $Array<Int>                  // id: %51

That retain_value prevents the COWArray pass from hoisting the array's
uniqueness check. That's expensive and also probably blocks any other
downstream loop optimizations.
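To make the idea of a uniqueness check concrete, here is a small sketch using the standard library's `isKnownUniquelyReferenced`. It is an analogy, not what the COWArray pass does internally: `Buffer` is a hypothetical stand-in for an array's storage object, and the extra reference plays the role of the extra retain.

```swift
// Hypothetical stand-in for copy-on-write storage; not Array's real buffer.
final class Buffer {
    var values: [Int]
    init(_ values: [Int]) { self.values = values }
}

var buffer = Buffer([1, 2, 3])

// Only one reference exists, so in-place mutation would be safe.
let uniqueBefore = isKnownUniquelyReferenced(&buffer)

// A second strong reference, analogous to an extra retain.
let extra = buffer

// While `extra` is alive the object is no longer uniquely referenced,
// so a copy-on-write type would have to copy before mutating.
let uniqueDuring = withExtendedLifetime(extra) {
    isKnownUniquelyReferenced(&buffer)
}

print(uniqueBefore, uniqueDuring)
```

An array mutation performs a check of this kind before writing, which is why it matters whether the check can be hoisted out of the loop.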

It seems there are multiple points at which the optimizations fall
apart, but I think the starting point should be:

Why does SILGen require the generalized accessor in WMO mode?
Why does using the generalized accessor (TestGlobalArr.arr!modify) cause an extra retain?

woodcrust · March 20, 2019, 5:35pm

I've created an issue at [SR-10137] “class” and “final class” between swift 4.2 and swift 5 (performance) · Issue #52539 · apple/swift · GitHub