Why is self a strong reference?

yakubin · January 4, 2024, 8:48pm

That’s odd. Maybe we’re actually running different code? Could you try the following?

class: Compiler Explorer
struct: Compiler Explorer

FWIW Swift compiler version:

$ swiftc --version
swift-driver version: 1.87.3 Apple Swift version 5.9.2 (swiftlang-5.9.2.2.56 clang-1500.1.0.2.5)
Target: arm64-apple-macosx13.0

tera · January 4, 2024, 8:50pm

Of course I got rid of all prints (there were two of those)....

The other potential difference: I am on ARM. I believe godbolt only runs on Intel.

yakubin · January 4, 2024, 8:53pm

I run and profile everything locally on ARM.

tera · January 4, 2024, 8:57pm

I do see an issue when running this code on godbolt itself... the "non-final class" version times out, while the "final class" and "struct" versions work OK.

Are you testing this on Linux? (I'm on macOS and on a slightly older compiler than the one on godbolt.) To reiterate, there is no timing difference between class and final class on ARM macOS in this test.

yakubin · January 4, 2024, 8:59pm

I’m testing on macOS. The issue was also reproduced by the original author of the benchmarks, who also ran it on a MacBook and saw the same numbers for the class and struct versions that I’m seeing.

You can see how the green bar for the Swift benchmark shrank here: updated the figure · attractivechaos/plb2@918f8d4 · GitHub
The duration difference: added fortran timing and udpated sudoku in swift · attractivechaos/plb2@e21506c · GitHub
Which was all thanks to this single-line commit: swift sudoku: turn Sudoku class into a struct (#36) · attractivechaos/plb2@a2713d0 · GitHub

tera · January 4, 2024, 9:04pm

Hard to explain. Maybe compiler difference?

This is mine:

swiftc --version
swift-driver version: 1.82.2 Apple Swift version 5.9 (swiftlang-5.9.0.114.6 clang-1500.0.27.1)
Target: arm64-apple-macosx13.0

yakubin · January 4, 2024, 9:06pm

Are you saying that you still get different numbers than I do, even after putting the prints back in?

I’m guessing the prints are useful for preventing the compiler from optimising away the benchmarked code.

tera · January 4, 2024, 9:07pm

I am NOT testing prints (I commented them out).... What's the point in timing those?

yakubin · January 4, 2024, 9:08pm

The point is that they are I/O operations that depend on the result of the computations. If you eliminate them, the compiler is free to just delete code which would otherwise be left in the binary in order to compute the printed values.

The point is not timing the prints. The point is seeing the difference between the struct and class versions (which both contain the prints, so there should be no issue with that).
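A minimal sketch of that idea, assuming a made-up `work` function standing in for the benchmarked code (it is not the sudoku solver from this thread): each result is folded into a value that is printed once at the end, so the computation stays observable without doing I/O inside the loop.

```swift
// Hypothetical stand-in for the benchmarked work; not the solver from the thread.
func work(_ n: Int) -> Int {
    var acc = 0
    for i in 0..<n {
        // Wrapping arithmetic so the sketch cannot trap on overflow.
        acc = acc &+ (i &* i) % 7
    }
    return acc
}

var checksum = 0
for round in 0..<100 {
    // The result feeds a value that is printed later, which gives the
    // optimizer a reason to keep the computation instead of deleting it.
    checksum = checksum &+ work(1_000 + round)
}

// One print at the end; the timing of the loop no longer includes per-iteration I/O.
print("checksum: \(checksum)")
```

The compiler may still simplify such a toy loop aggressively, so this only illustrates the dependency between the output and the computation.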

tera · January 4, 2024, 9:11pm

Yeah, but print?! Its timing could vary for a zillion reasons.
Try this:

var results: [String] = []
...
            //print(String(out))
            results.append(String(out))
...
        a.solve(hard20[j]);
//        print();
...
at the end:
print("done")
print(results.count)

However, even with those prints removed, there's still a difference between "final class" and "class" that I can see on godbolt... but not on my machine. Go figure.

yakubin · January 4, 2024, 9:16pm

Same result:

class: Compiler Explorer

$ time ./sudoku >/dev/null
./sudoku > /dev/null  9.72s user 0.03s system 99% cpu 9.771 total

struct: Compiler Explorer

$ time ./sudoku >/dev/null
./sudoku > /dev/null  1.81s user 0.00s system 99% cpu 1.815 total

Printing is not a bottleneck here.

tera · January 4, 2024, 9:17pm

What do you reckon? Compiler version difference then?

yakubin · January 4, 2024, 9:19pm

I don’t know. Seems unlikely. The versions are almost the same, while the performance difference is huge.

tera · January 4, 2024, 9:23pm

Downloading the latest (non beta) Xcode, let's see if it makes any difference for me...

Same result!

macOS 13.6.3, Xcode 15.1, M1 Pro (2021)

% swift --version
swift-driver version: 1.87.3 Apple Swift version 5.9.2 (swiftlang-5.9.2.2.56 clang-1500.1.0.2.5)
Target: arm64-apple-macosx13.0

The app compiled with -Ounchecked

struct version:
% time ./Sud > /dev/null
./Sud > /dev/null 1.81s user 0.01s system 99% cpu 1.825 total

final class version:
% time ./Sud > /dev/null
./Sud > /dev/null 1.91s user 0.01s system 87% cpu 2.188 total

class version:
% time ./Sud > /dev/null
./Sud > /dev/null 1.93s user 0.01s system 91% cpu 2.107 total

Similar results with -O.

Can anyone else check if "final class" vs "class" makes any difference in this example on your machine?
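For anyone joining the thread, here is a minimal sketch of the three kinds of declaration being compared. The `Counter…` types are hypothetical and are not the Sudoku type from the benchmark; the sketch only shows what differs in the source and makes no claim about how many retains each variant produces.

```swift
// Hypothetical counter types; the names are illustrative only.
struct CounterStruct {
    var total = 0
    mutating func add(_ x: Int) { total += x }
}

class CounterClass {
    var total = 0
    func add(_ x: Int) { total += x }   // may be overridden in a subclass
}

final class CounterFinalClass {
    var total = 0
    func add(_ x: Int) { total += x }   // cannot be overridden
}

var s = CounterStruct()
let c = CounterClass()
let f = CounterFinalClass()
for i in 1...10 {
    s.add(i)
    c.add(i)
    f.add(i)
}
print(s.total, c.total, f.total)
```

In the benchmark the change between versions is just as small: only the keyword in front of the type declaration differs.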

yakubin · January 5, 2024, 7:04pm

Maybe let’s check if the binaries produced by our compilers result in the same number of calls to swift_retain. Here is a way to do that with DTrace:

Save the following 3 files under the listed names in one directory:

sudoku-struct.swift: Compiler Explorer
sudoku-class.swift: Compiler Explorer
sudoku-final-class.swift: Compiler Explorer

Open this directory in a terminal and first run the following to refresh the sudo password:

sudo echo

Then run this loop to count how many times swift_retain is called:

for ver in sudoku-struct sudoku-class sudoku-final-class; do
    echo -e "\n=========\nVERSION: $ver\n=========\n"
    swiftc -Ounchecked "$ver.swift"
    time sudo dtrace -c "./$ver" -n 'pid$target:libswiftCore.dylib:swift_retain:entry { @[probefunc] = count(); } profile:::tick-30s { printf("\nTIMEOUT\n"); exit(0); }'
done

I’ve added a 30s timeout, because the class version, when instrumented like that, seems happy to just go on forever. So each test finishes either when the program successfully exits or when the 30s timeout is reached, whichever comes first.

My results:


=========
VERSION: sudoku-struct
=========

dtrace: system integrity protection is on, some features will not be available

dtrace: description 'pid$target:libswiftCore.dylib:swift_retain:entry ' matched 2 probes
done
4000
dtrace: pid 58283 has exited

  swift_retain                                                 292003
sudo dtrace -c "./$ver" -n   1.99s user 0.79s system 42% cpu 6.577 total

=========
VERSION: sudoku-class
=========

dtrace: system integrity protection is on, some features will not be available

dtrace: description 'pid$target:libswiftCore.dylib:swift_retain:entry ' matched 2 probes
CPU     ID                    FUNCTION:NAME
  4    140                        :tick-30s 
TIMEOUT


  swift_retain                                               10632060
sudo dtrace -c "./$ver" -n   2.79s user 27.38s system 99% cpu 30.366 total

=========
VERSION: sudoku-final-class
=========

dtrace: system integrity protection is on, some features will not be available

dtrace: description 'pid$target:libswiftCore.dylib:swift_retain:entry ' matched 2 probes
done
4000
dtrace: pid 58403 has exited

  swift_retain                                                 288003
sudo dtrace -c "./$ver" -n   2.31s user 0.78s system 93% cpu 3.317 total

The struct version makes 292003 calls to swift_retain in 6.577s and exits. The final class version makes 288003 calls to swift_retain in 3.317s and exits. The class version makes 10632060 calls to swift_retain in 30s and hits timeout (without the timeout it was happy to go on for even 20 minutes with no end in sight).

swiftc version:

$ swiftc --version
swift-driver version: 1.87.3 Apple Swift version 5.9.2 (swiftlang-5.9.2.2.56 clang-1500.1.0.2.5)
Target: arm64-apple-macosx13.0

I ran that on a Mac mini (M1, 2020).

tera · January 5, 2024, 9:01pm

Since yesterday I have exactly the same swift version (installed by Xcode 15.1 (15C65)). Could the speed difference be just because retains/releases are much slower on M1 2020 compared to M1 Pro 2021? However... this won't explain why "final" makes a difference for you but not for me.

A quick test for those who want to try retain/release overhead:

// main.swift
import Foundation

class C {
    @discardableResult init() {
        let o = unsafeBitCast(self, to: Int.self)
        let start = Date()
        for _ in 0 ..< 10_000_000 {
            retain(o)
            release(o)
        }
        let elapsed = Date().timeIntervalSince(start)
        print("elapsed: \(elapsed)")
    }
}
C()

// mem.m
#import <Foundation/Foundation.h>

void swift_retain(long);
void swift_release(long);

void retain(long v) {
    swift_retain(v);
}
void release(long v) {
    swift_release(v);
}

// mem.h
void retain(long);
void release(long);

// bridgingHeader.h
#import "mem.h"

On MacBook Pro 2021 M1 Pro with -O (or -Ounchecked, doesn't matter) this takes 0.044 sec or about 230 million retain+release pairs per second.
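A back-of-the-envelope check of that figure, as a small sketch using only the loop count from the test above and the measured 0.044 s:

```swift
// Numbers taken from the test above: 10 million retain+release pairs in 0.044 s.
let pairs = 10_000_000.0
let elapsed = 0.044   // seconds

let pairsPerSecond = pairs / elapsed        // roughly 227 million per second
let nanosPerPair = elapsed / pairs * 1e9    // roughly 4.4 ns per pair

print("pairs per second: \(pairsPerSecond)")
print("ns per pair: \(nanosPerPair)")
```

So a retain+release pair costs on the order of a few nanoseconds in this uncontended, single-threaded loop.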

yakubin · January 5, 2024, 9:09pm

On mine it takes 0.045s.

Could you run the test from my previous comment? Maybe somehow you’re running a binary that makes fewer calls? The DTrace log would show that.

tera · January 5, 2024, 9:29pm

Hmm. I did it and got the same timing as yours.

Then I tried "swift -O sudoku-class.swift" instead of "swiftc -O sudoku-class.swift" - the result was significantly faster.

Previously I used Xcode to build the binary; then, regardless of whether I ran the built app from Xcode or from the terminal, it was fast.

At least this explains the timing difference we observe. But the open question is why building with "swiftc" produces a slower executable than the ones produced by "swift" and by Xcode.

Both "swiftc --version" and "swift --version" give this result:

swift-driver version: 1.87.3 Apple Swift version 5.9.2 (swiftlang-5.9.2.2.56 clang-1500.1.0.2.5)
Target: arm64-apple-macosx13.0

BTW, after I installed the new (to me) Xcode 15.1 I did not install the corresponding command line tools afterwards... Should I?

yakubin · January 5, 2024, 10:19pm

I don’t know. I guess if the version reported by the swiftc compiler is alright, then everything is fine.

I’ve checked, and indeed the binary compiled by the Xcode GUI is faster in the case of the class version. I don’t know what’s going on.

ksluder · January 5, 2024, 10:51pm

Take a look at the build log to see what flags Xcode is passing to the Swift compiler.