Why is self a strong reference?

Hello. I’ve just made one benchmark several times faster simply by turning a class into a struct (swift sudoku: turn Sudoku class into a struct by yakubin · Pull Request #36 · attractivechaos/plb2 · GitHub). It worked because, before the change, the profiler (Instruments.app) showed lots of retain&release calls. Now it doesn’t show any. And indeed, looking at the output of swiftc -emit-sil -module-name output sudoku.swift | swift demangle, I can see that before the change there were a couple of strong_retain and strong_release instructions in the IL. Now there are none. The biggest offenders were the instructions in the method Sudoku.solve(), which were inserted due to the calls to Sudoku.update().

My question is: why are they needed? Doesn’t the caller of a method already guarantee that self is valid throughout the execution of the method? In particular, in this example doesn’t the implicit main() guarantee that for Sudoku.solve() and Sudoku.solve() for Sudoku.update()? I’d expect that such instructions are necessary when a reference may escape the call stack (when it’s saved in a data structure or pushed to a task queue etc.), but no such thing happens to self during normal method calls.

What am I misunderstanding?

Since you're mentioning implicit main, it's probably a good thing to mention that variables in implicit main are also globals (since they are not enclosed in any function). Because of this they can be assigned to another value by any other function, and so an extra retain is needed for the duration of any call using the value to protect against this.

As an alternative to turning your class into a struct, you could try changing the var holding it to a let. Or turn main into a proper function so there is a true local scope. The latter is probably the best.


But this does not explain why it's happening inside of solve. I took your code and made some changes so that solve takes an array of UnicodeScalar instead of a String. That seems to remove the problematic retain and release calls. See: Compiler Explorer

String is a pretty heavy type in Swift. It is modeled as a collection of grapheme clusters (human-perceived characters), each of which can be arbitrarily long (due to combining code points). So when you access or write a character in a String (or even in a Character variable), it needs to consider code paths where the character can't be stored inline, and those may include allocations, retain and release calls. You shouldn't hit those code paths with ASCII characters, but if you don't want to see them appear in the compiled code it's best to avoid String and Character.
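
To illustrate the point about keeping String and Character out of a hot loop, here is a minimal sketch that assumes the puzzle text is plain ASCII. The helper name countGivenDigits and the puzzle fragment are made up for illustration and are not taken from the benchmark.

```swift
// Hypothetical helper: counts the pre-filled cells of an ASCII puzzle.
// Working on [UInt8] keeps String/Character machinery out of the loop.
func countGivenDigits(_ puzzle: String) -> Int {
    let bytes = Array(puzzle.utf8)      // one-time conversion to [UInt8]
    let one = UInt8(ascii: "1")
    let nine = UInt8(ascii: "9")
    var count = 0
    for b in bytes where b >= one && b <= nine {
        count += 1
    }
    return count
}

print(countGivenDigits("4.....8.5.3"))
```

The conversion happens once, outside the loop, so the loop body only compares integers.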

I also commented out the print to avoid having any string code in solve when looking for ARC traffic. Maybe you'll want to re-enable it.


Note that I did not try to run or profile the code I changed. I just looked at the disassembly output.

Isn't this forbidden by the law of exclusivity?

I tried your Compiler Explorer example. It still calls retain in Sudoku.solve(). You can see it in two ways:

  1. Inspect the IL of Sudoku.solve():
awk "/end sil function 'output.Sudoku.solve/{print;exit} /\/\/ Sudoku.solve\(_:\)/ {print; p=1;next} p==1{print}" sudoku-unicode.sil | grep retain
  retain_value %0 : $Array<Unicode.Scalar>        // id: %53
  strong_retain %1 : $Sudoku                      // id: %1172

Here we can see that it calls strong_retain for an object of type Sudoku, i.e. for the self argument.

(Note that only strong_retain is problematic. retain_value is cheap.)

  2. Run the program under Instruments.app. There are still many calls to retain&release visible.

  3. Time the benchmark. It still runs in 9-10 seconds, while the struct version runs in 1.8s.

As an alternative to turning your class into a struct, you could try changing the var holding it to a let.

That has no effect on reference-typed variables other than you not being able to assign a different reference to the variable. And sure enough after making the change there is no change in the IL code, nor in benchmark timing, nor in the profile.
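
A minimal sketch of what let does and does not restrict for a reference type. Box and demo are hypothetical names used only for this illustration.

```swift
final class Box {
    var value = 0
}

func demo() -> Int {
    let box = Box()      // `let` fixes which object `box` refers to...
    box.value += 1       // ...but the object itself is still mutable
    // box = Box()       // would not compile: cannot assign to a `let` constant
    return box.value
}

print(demo())
```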

Good point about String and UnicodeScalar though.

Because of this they can be assigned to another value by any other function, and so an extra retain is needed for the duration of any call using the value to protect against this.

To protect against what though? Not incrementing the reference count for the duration of the method call is not going to lead to a use-after-free or anything. It’s guaranteed to be greater than or equal to 1, because the caller keeps a reference to the self object throughout the whole time.

I moved the "class" version of your code into an explicit main() and got rid of all extra retain/release calls that way. It didn't affect the timing though, as 4000 retain/release calls would not take much anyway (maybe 1ms if that).

func main() {
    var n = 200;
    var a = Sudoku();
    ...
}
main()

This is generally true for a local variable. But since implicit main variables are global, other functions can assign to the variable during the call, causing a release. So the caller has to defensively retain before making the call. Simple example:

func test(_ local: MyObject) {
   global = MyObject() // <-- releasing value in global here
   print(local) // <-- object still needs to be retained by someone
}

class MyObject {}

// implicit main
var global = MyObject()
test(global)

Of course, inlining could get rid of the extra retain/release here.
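
A runnable variant of the same idea, as a sketch: it only checks the language-level guarantee that the argument stays usable after the global is reassigned, and says nothing about where the compiler actually places the retain. Token, current, and replaceAndRead are hypothetical names.

```swift
final class Token {
    let id: Int
    init(id: Int) { self.id = id }
}

var current = Token(id: 1)   // stands in for a variable in implicit main

func replaceAndRead(_ local: Token) -> Int {
    current = Token(id: 2)   // the variable drops its reference to the first Token
    return local.id          // the first Token must still be alive here
}

let seen = replaceAndRead(current)
print(seen, current.id)
```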


Edit: I don't think we're contradicting each other here though, since you are concerned about an extra retain inside of solve, not around the call by the caller.

I don't see a strong_retain in the Compiler Explorer output though. Are you inspecting the IL for the optimized version?

I'm not really used to looking at the IL though, so I can't really comment much on that.

Echoing @michelf's comment above: changing "var a = Sudoku();" to "let a = Sudoku();" also gets rid of all those extra retains/releases.

The speedup of the "struct" version in this case is due to something else.


Interesting anomaly: switching exclusiveAccessToMemory build setting from "run-time + compile-time" to "compile-time only" slows down (!) the app (by 15 - 20 %). Very surprising result.

It doesn’t get rid of the calls to retain/release inside Sudoku.solve() when it calls Sudoku.update(). You can see this with the awk command I gave above.

And it clearly takes a lot, as in this version the calls inside Sudoku.update() for some reason take 24% + 13.3% + 5.4% + 2.7% + 9.9% + 3.4% + 2.1% + 9.5% + 4.8% + 0.3% + 0.1% + 0.1% = 75.6% of Sudoku.update() or 64.7% of total runtime (Sudoku.update() takes 85.7%).
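
The arithmetic behind those figures can be reproduced with a short sketch. The numbers are the ones quoted above; the variable names are made up. The product comes out at roughly 64.8% of total runtime, which agrees with the quoted 64.7% up to rounding.

```swift
// Shares of Sudoku.update() attributed to retain/release, as quoted above (percent).
let shares: [Double] = [24, 13.3, 5.4, 2.7, 9.9, 3.4, 2.1, 9.5, 4.8, 0.3, 0.1, 0.1]

let withinUpdate = shares.reduce(0, +)             // percent of Sudoku.update()
let updateOfTotal = 85.7                           // percent of total runtime
let ofTotal = withinUpdate * updateOfTotal / 100   // percent of total runtime

// Round to one decimal place for display.
print((withinUpdate * 10).rounded() / 10, (ofTotal * 10).rounded() / 10)
```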

Are you sure you are using the "release" configuration? In my tests I see no retain/release calls (other than 2 or so).

I was inspecting an unoptimised version, which printed the Sudoku annotation. After inspecting the optimised version I still see calls to strong_retain, but this time for Builtin.BridgeObject instead:

swiftc -emit-sil -module-name output -Ounchecked sudoku.swift | swift demangle > sudoku.sil
awk "/end sil function 'output.Sudoku.solve/{print;exit} /\/\/ Sudoku.solve\(_:\)/ {print; p=1;next} p==1{print}" sudoku.sil | grep retain
  strong_retain %113 : $Builtin.BridgeObject      // id: %114
  strong_retain %153 : $Builtin.BridgeObject      // id: %401
  strong_retain %153 : $Builtin.BridgeObject      // id: %402
  strong_retain %153 : $Builtin.BridgeObject      // id: %413
  strong_retain %842 : $Builtin.BridgeObject      // id: %859

In Compiler Explorer (and Instruments.app) those show up as calls to the dynamically linked functions swift_bridgeObjectRetain and swift_bridgeObjectRelease.

I’m guessing LLVM optimisation passes inlined some of the retain&release code from Sudoku.solve() into Sudoku.update(), because in the Swift IL for Sudoku.update() there are no such calls.

Those are related to array accesses.
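
A sketch of why an array read can involve a retain at all: Array is a value type backed by reference-counted storage, so a copy taken from a property has to keep that storage alive while the property is mutated. Whether this is exactly what happens in Sudoku.solve() is an assumption; Board and its members are hypothetical.

```swift
final class Board {
    var cells = [Int](repeating: 0, count: 4)

    func update(_ i: Int, _ v: Int) {
        cells[i] = v               // mutates the stored property
    }

    func snapshotThenUpdate() -> [Int] {
        let before = cells         // copy shares the buffer, so the buffer is retained
        update(0, 9)               // the property changes; `before` must not
        return before
    }
}

let board = Board()
let before = board.snapshotThenUpdate()
print(before, board.cells)
```

The copy keeps its old contents because the write in update triggers copy-on-write, which is only possible if the original buffer was kept alive for the copy.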

Why do those disappear though after changing the class into a struct?

I still see those in the "struct" version:

They’re not significant in the profile though.

I do not see a difference between "struct" and "class" with regard to those array accesses in Instruments either. Could it be that you were using the Debug configuration when profiling?

No. I compiled the binary on the CLI using the command swiftc -Ounchecked refcount.swift and ran Instruments.app on this binary with the CPU Profiler.

Version with class takes 10.5s and its profile looks like this:

(wait a moment, Discourse forbids me from putting two screenshots in one comment)

Version with struct takes 2.2s and its profile looks like this:

Those differences pretty much match the differences between the unoptimised SILs for the two versions when it comes to strong_retain.

Another possible avenue of investigation would be to make the class final and see if there's a change.
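
A minimal sketch of that suggestion, assuming a stand-in class rather than the real benchmark code (Grid, update, and fill are hypothetical names). Marking the class final means its methods cannot be overridden, so calls can be dispatched statically and are easier to inline. Whether that also removes retain/release pairs depends on the optimizer.

```swift
// Hypothetical stand-in for the benchmark class; only `final` matters here.
final class Grid {
    private var cells = [UInt8](repeating: 0, count: 81)

    func update(_ index: Int, _ value: UInt8) {
        cells[index] = value
    }

    func fill(with value: UInt8) -> Int {
        for i in cells.indices {
            update(i, value)   // with `final`, this call can be dispatched statically
        }
        return cells.reduce(0) { $0 + Int($1) }
    }
}

print(Grid().fill(with: 1))
```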

There is! It's only half a second slower than the struct version. And the retain&release calls are gone from the profile in similar fashion.

I'm getting these results:

-Ounchecked (isn't that evil?)

  • struct: 1.8 sec
  • class: 1.9 sec

-O

  • struct: 2.5 sec
  • class: 2.8 sec

This is what the profile gives me (class + -Ounchecked):


FWIW I didn't mark the class final. Just tried that - no difference at all, same timing (both -O and -Ounchecked)