Swift's unpredictable efficiency

I decided to start this topic after having tried out a series of slight modifications to the code example of SR-7952, a bug that was reported in a separate thread.

The demonstration program of that bug turned out to make for a good example of how seemingly insignificant modifications to the code can significantly impact the efficiency of the resulting executable, making it very hard to detect, isolate and work around performance issues.

The following is a step by step demonstration of this.

I will use the most recent development snapshot as of today (although similar, but not identical, behavior can also be observed using e.g. the default toolchains of Xcode 9.4 and the Xcode 10 beta).


Program 1

Let's start with the following (a variation of the demonstration program of SR-7952):

import Dispatch

func benchmark<T>(_ function: () -> T) -> (T, Double) {
    let start = DispatchTime.now()
    let res = function()
    let end = DispatchTime.now()
    return (res, Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1e9)
}

func runBenchmarks() {
    let base = 0 ..< 1_000_000

    let (cmR, cmT) = benchmark { base.lazy.compactMap({ $0 % 4 == 0 ? nil : $0 }).reduce(0, +) }
    let (fmR, fmT) = benchmark { base.lazy.filter({ $0 % 4 != 0 }).map({ $0 }).reduce(0, +) }

    print("compactMap: \(cmR), took \(cmT) s")
    print(" filterMap: \(fmR), took \(fmT) s")
}
runBenchmarks()

Compiling and running (a couple of times):

$ /Library/Developer/Toolchains/swift-DEVELOPMENT-SNAPSHOT-2018-06-08-a.xctoolchain/usr/bin/swiftc -O -whole-module-optimization -swift-version 4.2 -static-stdlib -sdk /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk -target x86_64-apple-macosx10.13 test.swift
$ ./test
compactMap: 375000000000, took 0.318972944 s
 filterMap: 375000000000, took 0.000626542 s
$ ./test
compactMap: 375000000000, took 0.317654136 s
 filterMap: 375000000000, took 0.000569041 s
$ ./test
compactMap: 375000000000, took 0.317911993 s
 filterMap: 375000000000, took 0.000626542 s

OK, so compactMap is much slower than filterMap.


Program 2

But how much slower is it? Let's add the following two lines to find out:

import Dispatch

func benchmark<T>(_ function: () -> T) -> (T, Double) {
    let start = DispatchTime.now()
    let res = function()
    let end = DispatchTime.now()
    return (res, Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1e9)
}

func runBenchmarks() {
    let base = 0 ..< 1_000_000

    let (cmR, cmT) = benchmark { base.lazy.compactMap({ $0 % 4 == 0 ? nil : $0 }).reduce(0, +) }
    let (fmR, fmT) = benchmark { base.lazy.filter({ $0 % 4 != 0 }).map({ $0 }).reduce(0, +) }
    let ratio = Int((cmT/fmT).rounded()) // <-- Added this!

    print("compactMap: \(cmR), took \(cmT) s")
    print(" filterMap: \(fmR), took \(fmT) s")
    print("compactMap is \(ratio) times slower than filterMap") // <-- And this!
}
runBenchmarks()

Compiling and running:

$ /Library/Developer/Toolchains/swift-DEVELOPMENT-SNAPSHOT-2018-06-08-a.xctoolchain/usr/bin/swiftc -O -whole-module-optimization -swift-version 4.2 -static-stdlib -sdk /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk -target x86_64-apple-macosx10.13 test.swift
$ ./test
compactMap: 375000000000, took 0.324593056 s
 filterMap: 375000000000, took 0.00109638 s
compactMap is 296 times slower than filterMap
$ ./test
compactMap: 375000000000, took 0.332787167 s
 filterMap: 375000000000, took 0.001176485 s
compactMap is 283 times slower than filterMap
$ ./test
compactMap: 375000000000, took 0.323394991 s
 filterMap: 375000000000, took 0.001097987 s
compactMap is 295 times slower than filterMap

But wait a second! Why has filterMap suddenly become almost 2 times slower than it was before?

In Program 1 it took 0.0006 seconds but now it takes 0.0011 seconds. What's going on?

Inspecting the disassembly of Programs 1 and 2 to explain the details behind this is left as an exercise for the reader.


Program 3

As strange as the above issue might seem, there's an equally strange workaround, which is to just replace the string interpolation with a separate argument for ratio:

import Dispatch

func benchmark<T>(_ function: () -> T) -> (T, Double) {
    let start = DispatchTime.now()
    let res = function()
    let end = DispatchTime.now()
    return (res, Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1e9)
}

func runBenchmarks() {
    let base = 0 ..< 1_000_000

    let (cmR, cmT) = benchmark { base.lazy.compactMap({ $0 % 4 == 0 ? nil : $0 }).reduce(0, +) }
    let (fmR, fmT) = benchmark { base.lazy.filter({ $0 % 4 != 0 }).map({ $0 }).reduce(0, +) }
    let ratio = Int((cmT/fmT).rounded())

    print("compactMap: \(cmR), took \(cmT) s")
    print(" filterMap: \(fmR), took \(fmT) s")
    print("compactMap is", ratio, "times slower than filterMap") // <-- Replaced string interpolated ratio with this!
}
runBenchmarks()

Compile and run:

$ /Library/Developer/Toolchains/swift-DEVELOPMENT-SNAPSHOT-2018-06-08-a.xctoolchain/usr/bin/swiftc -O -whole-module-optimization -swift-version 4.2 -static-stdlib -sdk /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk -target x86_64-apple-macosx10.13 test.swift
$ ./test
compactMap: 375000000000, took 0.322355613 s
 filterMap: 375000000000, took 0.000548265 s
compactMap is 588 times slower than filterMap
$ ./test
compactMap: 375000000000, took 0.324320271 s
 filterMap: 375000000000, took 0.000548251 s
compactMap is 592 times slower than filterMap
$ ./test
compactMap: 375000000000, took 0.323314934 s
 filterMap: 375000000000, took 0.00062654 s
compactMap is 516 times slower than filterMap
$ 

Ah! Now filterMap takes 0.0006 seconds again, so compactMap is about 550 times slower than filterMap, and not just 300 times slower as in Program 2.


Program 4

Let's add a small improvement: Instead of running the program several times ourselves, we'll make the program do it for us:

import Dispatch

func benchmark<T>(_ function: () -> T) -> (T, Double) {
    let start = DispatchTime.now()
    let res = function()
    let end = DispatchTime.now()
    return (res, Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1e9)
}

func runBenchmarks() {
    let base = 0 ..< 1_000_000

    for _ in 0 ..< 5 { // <-- Added this for-in loop!
        let (cmR, cmT) = benchmark { base.lazy.compactMap({ $0 % 4 == 0 ? nil : $0 }).reduce(0, +) }
        let (fmR, fmT) = benchmark { base.lazy.filter({ $0 % 4 != 0 }).map({ $0 }).reduce(0, +) }
        let ratio = Int((cmT/fmT).rounded())

        print("compactMap: \(cmR), took \(cmT) s")
        print(" filterMap: \(fmR), took \(fmT) s")
        print("compactMap is", ratio, "times slower than filterMap")
    }
}
runBenchmarks()

Compile and run (once):

$ /Library/Developer/Toolchains/swift-DEVELOPMENT-SNAPSHOT-2018-06-08-a.xctoolchain/usr/bin/swiftc -O -whole-module-optimization -swift-version 4.2 -static-stdlib -sdk /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk -target x86_64-apple-macosx10.13 test.swift
$ ./test
compactMap: 375000000000, took 0.32222596 s
 filterMap: 375000000000, took 3.8e-08 s
compactMap is 8479631 times slower than filterMap
compactMap: 375000000000, took 0.32233228 s
 filterMap: 375000000000, took 3.1e-08 s
compactMap is 10397815 times slower than filterMap
compactMap: 375000000000, took 0.320527889 s
 filterMap: 375000000000, took 3.1e-08 s
compactMap is 10339609 times slower than filterMap
compactMap: 375000000000, took 0.320475311 s
 filterMap: 375000000000, took 3.1e-08 s
compactMap is 10337913 times slower than filterMap
compactMap: 375000000000, took 0.329549344 s
 filterMap: 375000000000, took 3.2e-08 s
compactMap is 10298417 times slower than filterMap

Wow! filterMap now takes essentially no time at all (I guess it got optimized away somehow), which makes compactMap 10 million times slower than filterMap!


Program 5

As it turns out, this optimization will not happen if base is moved to top level:

import Dispatch

func benchmark<T>(_ function: () -> T) -> (T, Double) {
    let start = DispatchTime.now()
    let res = function()
    let end = DispatchTime.now()
    return (res, Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1e9)
}

let base = 0 ..< 1_000_000 // <-- Moved this out of runBenchmarks().

func runBenchmarks() {
    for _ in 0 ..< 5 {
        let (cmR, cmT) = benchmark { base.lazy.compactMap({ $0 % 4 == 0 ? nil : $0 }).reduce(0, +) }
        let (fmR, fmT) = benchmark { base.lazy.filter({ $0 % 4 != 0 }).map({ $0 }).reduce(0, +) }
        let ratio = Int((cmT/fmT).rounded())

        print("compactMap: \(cmR), took \(cmT) s")
        print(" filterMap: \(fmR), took \(fmT) s")
        print("compactMap is", ratio, "times slower than filterMap")
    }
}
runBenchmarks()

Compile and run:

$ /Library/Developer/Toolchains/swift-DEVELOPMENT-SNAPSHOT-2018-06-08-a.xctoolchain/usr/bin/swiftc -O -whole-module-optimization -swift-version 4.2 -static-stdlib -sdk /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk -target x86_64-apple-macosx10.13 test.swift
$ ./test
compactMap: 375000000000, took 0.319917178 s
 filterMap: 375000000000, took 0.001174697 s
compactMap is 272 times slower than filterMap
compactMap: 375000000000, took 0.324600748 s
 filterMap: 375000000000, took 0.001174689 s
compactMap is 276 times slower than filterMap
compactMap: 375000000000, took 0.318163121 s
 filterMap: 375000000000, took 0.001174688 s
compactMap is 271 times slower than filterMap
compactMap: 375000000000, took 0.3232251 s
 filterMap: 375000000000, took 0.001174688 s
compactMap is 275 times slower than filterMap
compactMap: 375000000000, took 0.318713242 s
 filterMap: 375000000000, took 0.001174699 s
compactMap is 271 times slower than filterMap

OK. But look! filterMap is back at 0.0011 s (or rather 0.0012 s) and not the 0.0006 s we know it can reach.


Program 6

The string-interpolation trick comes to the rescue, but now in reverse (by changing ratio back into a string interpolation this time):

import Dispatch

func benchmark<T>(_ function: () -> T) -> (T, Double) {
    let start = DispatchTime.now()
    let res = function()
    let end = DispatchTime.now()
    return (res, Double(end.uptimeNanoseconds - start.uptimeNanoseconds) / 1e9)
}

let base = 0 ..< 1_000_000

func runBenchmarks() {
    for _ in 0 ..< 5 {
        let (cmR, cmT) = benchmark { base.lazy.compactMap({ $0 % 4 == 0 ? nil : $0 }).reduce(0, +) }
        let (fmR, fmT) = benchmark { base.lazy.filter({ $0 % 4 != 0 }).map({ $0 }).reduce(0, +) }
        let ratio = Int((cmT/fmT).rounded())

        print("compactMap: \(cmR), took \(cmT) s")
        print(" filterMap: \(fmR), took \(fmT) s")
        print("compactMap is \(ratio) times slower than filterMap") // <-- Undid the change we made in Program 3 here.
    }
}
runBenchmarks()

Compile and run:

$ /Library/Developer/Toolchains/swift-DEVELOPMENT-SNAPSHOT-2018-06-08-a.xctoolchain/usr/bin/swiftc -O -whole-module-optimization -swift-version 4.2 -static-stdlib -sdk /Applications/Xcode-beta.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk -target x86_64-apple-macosx10.13 test.swift
$ ./test
compactMap: 375000000000, took 0.322214887 s
 filterMap: 375000000000, took 0.000548238 s
compactMap is 588 times slower than filterMap
compactMap: 375000000000, took 0.320392592 s
 filterMap: 375000000000, took 0.000548245 s
compactMap is 584 times slower than filterMap
compactMap: 375000000000, took 0.319095109 s
 filterMap: 375000000000, took 0.000548243 s
compactMap is 582 times slower than filterMap
compactMap: 375000000000, took 0.325831477 s
 filterMap: 375000000000, took 0.000548246 s
compactMap is 594 times slower than filterMap
compactMap: 375000000000, took 0.320049229 s
 filterMap: 375000000000, took 0.000572327 s
compactMap is 559 times slower than filterMap

OK.


We'll stop here.

But note that we didn't touch the actual code being benchmarked, nor the benchmark function; the changes were only in the surrounding code.


The reason I went to the trouble of writing this post is to show what, in my experience, is the rule rather than the exception, although you might not see it unless you're trying hard to optimize some performance-critical part of your code.

So this was just a very small taste. In any given situation there are all sorts of seemingly irrelevant but performance-impacting little changes that can be made: e.g. adding or removing a number of spaces at the start of a string used only for reporting the result of a test, switching between a for-in loop and the equivalent while loop, wrapping some piece of code in an immediately invoked closure (which can make that section of code faster in some specific contexts), &+ sometimes being slower than +, etc.
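For reference, the immediately invoked closure pattern mentioned above looks like this (a minimal sketch of the pattern only; as argued throughout this post, whether it helps, hurts, or does nothing is entirely context-dependent):

```swift
// The work is wrapped in a closure that is called on the spot, giving
// the optimizer a separate body to analyze. The computation here mirrors
// the filter-based benchmark above: sum of 0 ..< 1_000_000 excluding
// multiples of 4.
let sum: Int = {
    var total = 0
    for i in 0 ..< 1_000_000 where i % 4 != 0 {
        total += i
    }
    return total
}()
print(sum)  // prints 375000000000, matching the benchmark results above
```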


Note also that these issues are impossible to represent/test/benchmark in e.g. XCTest and the Swift benchmark suite. Why? Because the issues are all about context, and they interact in various funny ways, as shown by e.g. the string-interpolation trick working or working in reverse depending on whether base was declared at top level. XCTest and the Swift benchmark suite force some layer(s) of context on the tested code, and thus their harness becomes part of the tested code.

18 Likes

I ran these tests using

Apple Swift version 4.1.2 (swiftlang-902.0.54 clang-902.0.39.2)
Target: x86_64-apple-darwin17.6.0

building with just swiftc -O. Here are my results (numbered the same way yours are):

$ ./main1
compactMap: 375000000000, took 0.289310122 s
 filterMap: 375000000000, took 0.000821735 s

$ ./main2
compactMap: 375000000000, took 0.316243825 s
 filterMap: 375000000000, took 0.000814479 s
compactMap is 388 times slower than filterMap

$ ./main3
compactMap: 375000000000, took 0.288270726 s
 filterMap: 375000000000, took 0.000808948 s
compactMap is 356 times slower than filterMap

$ ./main4
compactMap: 375000000000, took 0.288978948 s
 filterMap: 375000000000, took 0.001179241 s
compactMap is 245 times slower than filterMap
compactMap: 375000000000, took 0.295771218 s
 filterMap: 375000000000, took 0.00117926 s
compactMap is 251 times slower than filterMap
compactMap: 375000000000, took 0.298000228 s
 filterMap: 375000000000, took 0.001179243 s
compactMap is 253 times slower than filterMap
compactMap: 375000000000, took 0.287790936 s
 filterMap: 375000000000, took 0.00118008 s
compactMap is 244 times slower than filterMap
compactMap: 375000000000, took 0.290857785 s
 filterMap: 375000000000, took 0.001179366 s
compactMap is 247 times slower than filterMap

$ ./main5
compactMap: 375000000000, took 0.291611994 s
 filterMap: 375000000000, took 0.001179274 s
compactMap is 247 times slower than filterMap
compactMap: 375000000000, took 0.299940731 s
 filterMap: 375000000000, took 0.001179274 s
compactMap is 254 times slower than filterMap
compactMap: 375000000000, took 0.2925131 s
 filterMap: 375000000000, took 0.001179412 s
compactMap is 248 times slower than filterMap
compactMap: 375000000000, took 0.29145278 s
 filterMap: 375000000000, took 0.001179271 s
compactMap is 247 times slower than filterMap
compactMap: 375000000000, took 0.297499215 s
 filterMap: 375000000000, took 0.001179322 s
compactMap is 252 times slower than filterMap

$ ./main6
compactMap: 375000000000, took 0.309454594 s
 filterMap: 375000000000, took 0.000833356 s
compactMap is 371 times slower than filterMap
compactMap: 375000000000, took 0.319849605 s
 filterMap: 375000000000, took 0.000808931 s
compactMap is 395 times slower than filterMap
compactMap: 375000000000, took 0.31500959 s
 filterMap: 375000000000, took 0.000808922 s
compactMap is 389 times slower than filterMap
compactMap: 375000000000, took 0.298115479 s
 filterMap: 375000000000, took 0.000808922 s
compactMap is 369 times slower than filterMap
compactMap: 375000000000, took 0.300710155 s
 filterMap: 375000000000, took 0.000808922 s
compactMap is 372 times slower than filterMap

so it appears that I can't reproduce the slowdown between 1, 2, and 3, but the performance regression comes in for 4 and 5 and is again resolved in 6.

More interesting results (just for test 6):

$ swiftc -O -whole-module-optimization  -static-stdlib  -target x86_64-apple-macosx10.13 -o main6 main.swift
$ ./main6
compactMap: 375000000000, took 0.295983463 s
 filterMap: 375000000000, took 0.001179248 s
compactMap is 251 times slower than filterMap
compactMap: 375000000000, took 0.292354945 s
 filterMap: 375000000000, took 0.001179273 s
compactMap is 248 times slower than filterMap
compactMap: 375000000000, took 0.294831198 s
 filterMap: 375000000000, took 0.001179264 s
compactMap is 250 times slower than filterMap
compactMap: 375000000000, took 0.300107382 s
 filterMap: 375000000000, took 0.001179257 s
compactMap is 254 times slower than filterMap
compactMap: 375000000000, took 0.292348202 s
 filterMap: 375000000000, took 0.001179251 s
compactMap is 248 times slower than filterMap

$ swiftc -O -o main6 main.swift
$ ./main6
compactMap: 375000000000, took 0.304550801 s
 filterMap: 375000000000, took 0.000808928 s
compactMap is 376 times slower than filterMap
compactMap: 375000000000, took 0.307866093 s
 filterMap: 375000000000, took 0.000808922 s
compactMap is 381 times slower than filterMap
compactMap: 375000000000, took 0.299180831 s
 filterMap: 375000000000, took 0.000808924 s
compactMap is 370 times slower than filterMap
compactMap: 375000000000, took 0.30244097 s
 filterMap: 375000000000, took 0.000810153 s
compactMap is 373 times slower than filterMap
compactMap: 375000000000, took 0.302916451 s
 filterMap: 375000000000, took 0.000808916 s
compactMap is 374 times slower than filterMap

Further testing shows it's -static-stdlib that's causing it.

Yes : ) As I mentioned in the original post, the specific effects of these slight code changes are highly dependent on all sorts of context, including compiler version, etc. And they interact, so the same change might have the opposite effect, or none, in a different context (see below, and e.g. the above-mentioned "string-interpolation trick" having the opposite effect in Program 3 and Program 6, given that specific compiler version and flags, of course).


Regarding -static-stdlib, I used that flag (among others) in my examples because Xcode does that by default, and I wanted the results of my examples to match what you'd get if using Xcode instead of the swiftc command. (Using Xcode in this case means doing a Release build within Xcode 10 beta, with development snapshot 2018-06-08, using a command-line app project with default build settings.)


As you demonstrated above:

When using the default toolchain of Xcode 9.4, i.e.:

$ swiftc --version
Apple Swift version 4.1.2 (swiftlang-902.0.54 clang-902.0.39.2)
Target: x86_64-apple-darwin17.5.0

Then, for Program 6, including -static-stdlib will make filterMap slower.

But! Note that for Program 1, -static-stdlib will have the opposite effect, i.e. not including it will make filterMap slower.

Detailed demonstration
$ swiftc --version
Apple Swift version 4.1.2 (swiftlang-902.0.54 clang-902.0.39.2)
Target: x86_64-apple-darwin17.5.0
$
$ swiftc -O Program6.swift
$ ./Program6
compactMap: 375000000000, took 0.304990868 s
 filterMap: 375000000000, took 0.001176455 s
compactMap is 259 times slower than filterMap
compactMap: 375000000000, took 0.302824142 s
 filterMap: 375000000000, took 0.000861471 s
compactMap is 352 times slower than filterMap
compactMap: 375000000000, took 0.303165661 s
 filterMap: 375000000000, took 0.000861489 s
compactMap is 352 times slower than filterMap
compactMap: 375000000000, took 0.302871331 s
 filterMap: 375000000000, took 0.00086148 s
compactMap is 352 times slower than filterMap
compactMap: 375000000000, took 0.302811901 s
 filterMap: 375000000000, took 0.000861474 s
compactMap is 352 times slower than filterMap
$
$ swiftc -O -static-stdlib Program6.swift
$ ./Program6
compactMap: 375000000000, took 0.311002296 s
 filterMap: 375000000000, took 0.001297683 s
compactMap is 240 times slower than filterMap
compactMap: 375000000000, took 0.304119449 s
 filterMap: 375000000000, took 0.001255856 s
compactMap is 242 times slower than filterMap
compactMap: 375000000000, took 0.304179837 s
 filterMap: 375000000000, took 0.001253053 s
compactMap is 243 times slower than filterMap
compactMap: 375000000000, took 0.308980026 s
 filterMap: 375000000000, took 0.001255858 s
compactMap is 246 times slower than filterMap
compactMap: 375000000000, took 0.304566472 s
 filterMap: 375000000000, took 0.001255855 s
compactMap is 243 times slower than filterMap
$ 
$ swiftc -O Program1.swift
$ ./Program1
compactMap: 375000000000, took 0.308087019 s
 filterMap: 375000000000, took 0.00086148 s
$ swiftc -O -static-stdlib Program1.swift
$ ./Program1
compactMap: 375000000000, took 0.307667413 s
 filterMap: 375000000000, took 0.001255829 s

So it's not as simple as "X will make Y slower"; it's more like "X will make Y slower or faster depending on A, B, C, D, ...". I guess this means there might be situations where some specific performance issue seems to have been "fixed" in a new version of the compiler, when really it's just some change in the interference between these things; the same issue is still there, only in a different shape, manifesting itself in slightly different contexts.

As someone who's looking to use Swift for some high-performance computations, this scares the hell out of me.

4 Likes

Nonetheless, we can still write regression tests for these cases in the optimizer's own test suite, to ensure that certain abstractions get broken down by the optimizer.

3 Likes

Ah, OK. Can the optimizer's test suite be found anywhere under the apple/swift repository on GitHub? (I tried but couldn't find it.)


Is there some strategy for dealing with cases like these, other than trusting that enough people will find the urge to perform enough experiments like this and file enough bugs?


Assuming that there are more people like me out there, i.e. non-compiler hackers yearning for increased efficiency and predictability of the optimizer, willing to spend some but not too much of their time helping to improve the situation in some way, what are some recommended ways to contribute?

(I wish there were some other way than filing individual bugs for each specific case we can find (for the most recent snapshot, I guess), which would for example mean at least 6 bugs for the little demonstration in the OP.)

2 Likes

cc @Erik_Eckstein. Filing bugs for the individual issues you encounter is a good idea regardless. Ideally, we'd be able to use your examples as-is as regression tests to validate that compiler transforms continue to work on these abstractions without breakage due to standard library changes or changes elsewhere in the optimizer. Erik, would it be possible for the benchmark suite to work with ad-hoc tests like Jens' here without boilerplate, since at least some of these problems are context-dependent?

3 Likes

My suggestion would be to commit specific A/B benchmarks and then file an SR stating that the performance should be the same. Then we can validate when it is fixed and maintain that performance over time.

2 Likes

Let me try to tackle this as a higher-level point. There are optimizations that are inherently sensitive to the exact code sequence in the program, potentially including contextual information. These will always come across as "unpredictable" in the sense that most users won't intuitively understand why their program got slower; I don't think that's a good reason to not pursue them. Instead, I think we should empower users to take more control over the performance of their program, in the following ways:

  • We should try to ensure that the optimizer isn't more powerful than the user: it should generally be possible to do important high-level optimizations (like minimizing ARC operations or specializing a generic algorithm) in source code rather than relying on the optimizer to independently discover the opportunity.

  • We should publish "optimization remarks" so that users can see what optimizations have been applied to their code, assuming this is possible to do without overwhelming them; they can then use this in conjunction with the first point to reclaim performance that seems to be lost "unpredictably".

Part of the goal of the ownership features is to provide better tools for addressing the first point.
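The first point can be illustrated with a small sketch (my own example, not from the thread): performing a "high-level optimization" in source rather than trusting the optimizer to discover it.

```swift
// Generic version: works for any sequence of Ints, but its performance
// depends on the optimizer's generic specialization pass.
func total<S: Sequence>(_ xs: S) -> Int where S.Element == Int {
    var acc = 0
    for x in xs { acc += x }
    return acc
}

// Manual specialization: a concrete overload that overload resolution
// prefers for [Int], so the fast path exists regardless of what the
// optimizer decides to do with the generic version.
func total(_ xs: [Int]) -> Int {
    var acc = 0
    for x in xs { acc &+= x }  // wrapping add: overflow checks elided by hand
    return acc
}

print(total([1, 2, 3]))  // picks the [Int] overload
print(total(0 ..< 4))    // picks the generic version
```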

20 Likes

I was just gonna say the same when I read your first point xD
I agree, most of what's related to memory overhead can be optimized with ownership by the user :slight_smile:

In general, however, I would try to stay away from telling users to "optimize their code" if that would imply less readable or less maintainable code. I'm not talking about ownership here, but rather about "hey, this struct layout hits an optimizer bug, so try this layout instead". When you have to change the meaning of your code to fit your algorithm less precisely, that's when things tend to go wrong, from what I've seen.

3 Likes

Right, I absolutely agree that we should continue to aim for the optimizer to optimize more cases. Test cases for that are always welcome.

The sorts of annotations we're talking about for ownership will hopefully not be too onerous to adopt in natural code, though.

4 Likes

I think that would be most useful as a delta between revisions: "[piece of code] can no longer be optimized because [reason]".

1 Like

All of the commentary here is great, and my subsequent comments are not intended to detract from it.

However, I do want to make a couple of points.

In my opinion, the compiler should be better than the developer at optimising. The set of assembly language routines I can write that run faster than compiled code has shrunk to almost zero over the years, to the point where the only ones left are small and dependent on contextual knowledge not captured in code. Long may that trend continue. Without me having to decorate code with annotations.

Finally, whilst these results should be addressed... my experience in writing OysterKit, starting in Swift 1 through to now, is that release after release performance has improved. Sometimes dramatically. These results look worrisome, but with such a small footprint any fluctuations are exaggerated. Across a complete application or framework, my experience (and measurements) tell me things just get better release after release.

1 Like

I am also worried about the number of annotations being added; it is a complete anathema for an easy-to-learn language to require annotations. I don't buy that annotations are part of progressive disclosure when there are so many annotations.

I am also concerned that annotations are becoming baked into the ABI. For example, @inlinable should not be part of the ABI; ideally it would not exist, as inlining is the compiler's job.

3 Likes

Which attributes are required? I guess there's @escaping, then a few Objective-C and other interop ones (@objc, @IB*). What would the number of attributes have to do with whether they are progressively disclosed? Which algorithm should the compiler use to predict the future so it can work out which parts of one module are safe to inline into a different module?

Strictly speaking, I think these are not attributes of the Swift language itself, but of an Apple-specific extension thereof, right?

Inlining is the compiler's job, but it's normally impossible to inline across ABI boundaries, and modules with long-term ABI compatibility concerns may not want inlining of old implementations to happen, so it's necessary for binary frameworks to opt in to allowing inlining. As Swift's build system improves to allow for cross-module optimization as part of its normal build process, @inlinable will become irrelevant for most Swift users.
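As a minimal sketch of the opt-in just described (my own illustration; timesTwo is a made-up name):

```swift
// In a library with ABI stability concerns, @inlinable publishes the
// function body as part of the module interface, so clients may inline
// it across the ABI boundary. Without the attribute, callers in other
// modules must go through the exported entry point.
@inlinable
public func timesTwo(_ x: Int) -> Int {
    return x &* 2
}

print(timesTwo(21))  // prints 42
```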

I agree with your general sentiment that relying on annotations is unreasonable. I think there's a bare minimum expected level of optimization that we still aren't reliably reaching yet, and we should work on filling those gaps before dreaming up new language features.

8 Likes

Two points re inlining:

  1. You link your apps against known versions of the library (except for Apple's system libraries, which I will come back to), and therefore old-code inlining is not a problem for non-system third-party libraries. The compiler can decide that a function is potentially inlinable and automatically add @inlinable (and supply the source/AST/SIL/LLVM to inline) when compiling the library.
  2. Apple supplied system libraries get changed from under you, i.e. a system software update happens. In this case I think a better solution is an on device relinking of the code including inlining. This works very well in the Java world including Android. An advantage of having control over the whole of the stack is that this is possible for Apple to do also.

That's what I meant, sorry I wasn't clear. For most user modules, @inlinable should be irrelevant in the fullness of time.

That would be an interesting thing for Apple to explore, though there are many tradeoffs. "Works well" is subjective given Android's poor memory and energy efficiency compared to iOS. Apple's platform is fairly aggressively optimized for AOT compilation, and it's easier to optimize memory usage by sharing pages from dynamic libraries and pre-linking the system libraries into the shared cache. Some of that benefit could still be preserved with on-device recompilation, but the more you inline and specialize library code per application, the more per-application code size you pay for and less systemic savings you get.

2 Likes