Why is self a strong reference?

yakubin · January 5, 2024, 11:01pm

I’ve tried doing that, but Xcode uses swift-frontend instead of swiftc, and many of the flags there are invalid for swiftc. The ones I tried that swiftc did accept didn’t make a difference.

I’ve opened an issue on GitHub about it. Maybe people more knowledgeable about the compiler will know something: Program is several times slower when compiled with swiftc than when compiled with XCode due to retain&release calls · Issue #70745 · apple/swift · GitHub

I’ve also discovered that in the Xcode build the compiler must be treating the class as final, because the number of calls to swift_retain is exactly the same as in the binary compiled with swiftc when the class is marked as final. I just don’t know why the compiler notices that when invoked from Xcode, but not when invoked from swiftc.
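For readers following along, here is a minimal sketch of what "marked as final" means in source. The class name Sudoku comes from this thread, but the stored property and the method are hypothetical stand-ins, not the actual benchmark code.

```swift
// Hypothetical stand-in for the Sudoku type discussed in this thread;
// the stored property and method are illustrative only.
final class Sudoku {
    var cells: [UInt8]

    init(cells: [UInt8]) {
        self.cells = cells
    }

    // Because the class is final, this method cannot be overridden,
    // so the compiler does not need dynamic dispatch to call it.
    func filledCount() -> Int {
        var count = 0
        for cell in cells where cell != 0 {
            count += 1
        }
        return count
    }
}

let board = Sudoku(cells: [5, 3, 0, 0, 7, 0, 0, 0, 0])
print(board.filledCount())
```

Removing the final keyword gives the plain "class" variant that the retain counts in this thread are compared against.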

tera · January 6, 2024, 12:07am

I made a subclass of "Sudoku" class (and used it so it's not stripped out as unused) to force Sudoku class to be "actually" non-final – but that didn't change the timing of the binary made by Xcode. Likewise the timing of using a derived class instead of Sudoku was not affected either.

Do you reckon it could affect the timing that much? In the above standalone test the extra time of doing 10M extra retain/release pairs is just 0.04 seconds, but maybe we are not accounting for some other difference.

yakubin · January 6, 2024, 12:10am

We don’t actually know the number of calls to retain that the class version makes. Instrumentation slows it down too much. I suspect if I let this program run under DTrace without the timeout, it would run for about 2 days, which wouldn’t be fun. Also, the profile in Instruments.app is pretty clear about those calls being the bottleneck. (Counting it with breakpoints in lldb is even slower, so forget about it.)

tera · January 6, 2024, 12:15am

Reduce the "n" so it runs in a reasonable time?

yakubin · January 6, 2024, 12:26am

For n=1 it made 6918135 calls. (Removed incorrect calculations. It’s too late for me today. Going to sleep.)

tera · January 6, 2024, 12:32am

Is that for "final class" or "class", and what is the number for the other?

yakubin · January 6, 2024, 12:34am

That’s for class. For struct it’s 1463. For final class it’s 1443.

Anyway, as I said in my comment higher up, based on the profile, those calls amount to about 65% of the runtime of the class version, while they're insignificant for the other versions.

tera · January 6, 2024, 12:40am

Well, at least that checks out: 6918135 * 200 would be 1383627000, and at 230M retain/release pairs per second that would add about 6 seconds, which is indeed the difference you observe.
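The estimate above can be reproduced as a short calculation. This is only a sketch of the arithmetic: the three inputs are the figures quoted in this thread, and the variable names are made up for illustration.

```swift
// Figures quoted in this thread; the names are illustrative.
let retainCallsForN1 = 6_918_135      // swift_retain calls counted for n=1 with a non-final class
let scale = 200                       // multiplier used in the estimate above
let pairsPerSecond = 230_000_000.0    // approximate retain/release throughput quoted above

// Total number of extra calls, then the time they would take at that throughput.
let totalCalls = retainCallsForN1 * scale
let extraSeconds = Double(totalCalls) / pairsPerSecond

print(totalCalls)      // 1383627000
print(extraSeconds)    // a little over 6 seconds
```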

tera · January 6, 2024, 12:19pm

I found it: the crucial difference is "-whole-module-optimization". If you pass it to "swiftc", the binary will be as quick as the one built with "swift" or Xcode.

Ditto for godbolt! (both when analysing and executing code).
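A minimal sketch of how one might compare the two builds locally. The file name, the Counter class and the output names are hypothetical, the use of -O is an assumption about how the binaries in this thread were built, and a workload this small may not show the same gap as the Sudoku program.

```swift
// main.swift (hypothetical file name; this workload is a stand-in, not the thread's benchmark).
//
// Two builds to compare, assuming optimized builds:
//   swiftc -O main.swift -o without-wmo
//   swiftc -O -whole-module-optimization main.swift -o with-wmo

class Counter {                 // deliberately not marked final
    var value = 0
    func increment() { value += 1 }
}

// Calls a method on a class instance in a loop and returns the final count.
func run(iterations: Int) -> Int {
    let counter = Counter()
    for _ in 0..<iterations {
        counter.increment()
    }
    return counter.value
}

print(run(iterations: 1_000_000))
```

Timing each binary (for example with the shell's time command) and counting swift_retain calls would show whether the flag makes a difference for a given program.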

yakubin · January 6, 2024, 1:39pm

Cool. I’ve also found a minimal pair of commands to produce a faster binary using swift-frontend: Program is several times slower when compiled with swiftc than when compiled with XCode due to retain&release calls · Issue #70745 · apple/swift · GitHub

It doesn’t include the -whole-module-optimization flag. So I guess in swift-frontend it’s enabled by default.