Swift's unpredictable efficiency

I ran these tests using

Apple Swift version 4.1.2 (swiftlang-902.0.54 clang-902.0.39.2)
Target: x86_64-apple-darwin17.6.0

building with just swiftc -O. Here are my results (numbered the same way yours are):

$ ./main1
compactMap: 375000000000, took 0.289310122 s
 filterMap: 375000000000, took 0.000821735 s

$ ./main2
compactMap: 375000000000, took 0.316243825 s
 filterMap: 375000000000, took 0.000814479 s
compactMap is 388 times slower than filterMap

$ ./main3
compactMap: 375000000000, took 0.288270726 s
 filterMap: 375000000000, took 0.000808948 s
compactMap is 356 times slower than filterMap

$ ./main4
compactMap: 375000000000, took 0.288978948 s
 filterMap: 375000000000, took 0.001179241 s
compactMap is 245 times slower than filterMap
compactMap: 375000000000, took 0.295771218 s
 filterMap: 375000000000, took 0.00117926 s
compactMap is 251 times slower than filterMap
compactMap: 375000000000, took 0.298000228 s
 filterMap: 375000000000, took 0.001179243 s
compactMap is 253 times slower than filterMap
compactMap: 375000000000, took 0.287790936 s
 filterMap: 375000000000, took 0.00118008 s
compactMap is 244 times slower than filterMap
compactMap: 375000000000, took 0.290857785 s
 filterMap: 375000000000, took 0.001179366 s
compactMap is 247 times slower than filterMap

$ ./main5
compactMap: 375000000000, took 0.291611994 s
 filterMap: 375000000000, took 0.001179274 s
compactMap is 247 times slower than filterMap
compactMap: 375000000000, took 0.299940731 s
 filterMap: 375000000000, took 0.001179274 s
compactMap is 254 times slower than filterMap
compactMap: 375000000000, took 0.2925131 s
 filterMap: 375000000000, took 0.001179412 s
compactMap is 248 times slower than filterMap
compactMap: 375000000000, took 0.29145278 s
 filterMap: 375000000000, took 0.001179271 s
compactMap is 247 times slower than filterMap
compactMap: 375000000000, took 0.297499215 s
 filterMap: 375000000000, took 0.001179322 s
compactMap is 252 times slower than filterMap

$ ./main6
compactMap: 375000000000, took 0.309454594 s
 filterMap: 375000000000, took 0.000833356 s
compactMap is 371 times slower than filterMap
compactMap: 375000000000, took 0.319849605 s
 filterMap: 375000000000, took 0.000808931 s
compactMap is 395 times slower than filterMap
compactMap: 375000000000, took 0.31500959 s
 filterMap: 375000000000, took 0.000808922 s
compactMap is 389 times slower than filterMap
compactMap: 375000000000, took 0.298115479 s
 filterMap: 375000000000, took 0.000808922 s
compactMap is 369 times slower than filterMap
compactMap: 375000000000, took 0.300710155 s
 filterMap: 375000000000, took 0.000808922 s
compactMap is 372 times slower than filterMap

So it appears I can't reproduce the slowdown between tests 1, 2, and 3, but the performance regression shows up in 4 and 5 and is resolved again in 6.
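For readers without the original post at hand, the programs being timed all share roughly this shape: sum the results of a keep-some, transform-some pipeline, once via compactMap and once via a filter-then-map chain. This is a hypothetical sketch; the actual Program 1–6 sources, iteration counts, and the exact filterMap implementation differ:

```swift
import Foundation

// Hypothetical reconstruction of the benchmark's general shape (the actual
// Program 1-6 sources aren't shown in this thread). Both pipelines keep the
// even numbers, double them, and sum the result; only the call chain differs.
let input = Array(0 ..< 1_000_000)

func viaCompactMap(_ xs: [Int]) -> Int {
    return xs.compactMap { $0 % 2 == 0 ? $0 * 2 : nil }.reduce(0, +)
}

func viaFilterMap(_ xs: [Int]) -> Int {
    return xs.filter { $0 % 2 == 0 }.map { $0 * 2 }.reduce(0, +)
}

// Time one run of a pipeline and print it in the same style as the results above.
func time(_ label: String, _ body: () -> Int) {
    let start = Date()
    let result = body()
    print("\(label): \(result), took \(Date().timeIntervalSince(start)) s")
}

time("compactMap") { viaCompactMap(input) }
time(" filterMap") { viaFilterMap(input) }
```

The point of the comparison is only that two semantically equivalent pipelines can optimize very differently depending on compiler version and flags.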

More interesting results (just for test 6):

$ swiftc -O -whole-module-optimization  -static-stdlib  -target x86_64-apple-macosx10.13 -o main6 main.swift
$ ./main6
compactMap: 375000000000, took 0.295983463 s
 filterMap: 375000000000, took 0.001179248 s
compactMap is 251 times slower than filterMap
compactMap: 375000000000, took 0.292354945 s
 filterMap: 375000000000, took 0.001179273 s
compactMap is 248 times slower than filterMap
compactMap: 375000000000, took 0.294831198 s
 filterMap: 375000000000, took 0.001179264 s
compactMap is 250 times slower than filterMap
compactMap: 375000000000, took 0.300107382 s
 filterMap: 375000000000, took 0.001179257 s
compactMap is 254 times slower than filterMap
compactMap: 375000000000, took 0.292348202 s
 filterMap: 375000000000, took 0.001179251 s
compactMap is 248 times slower than filterMap

$ swiftc -O -o main6 main.swift
$ ./main6
compactMap: 375000000000, took 0.304550801 s
 filterMap: 375000000000, took 0.000808928 s
compactMap is 376 times slower than filterMap
compactMap: 375000000000, took 0.307866093 s
 filterMap: 375000000000, took 0.000808922 s
compactMap is 381 times slower than filterMap
compactMap: 375000000000, took 0.299180831 s
 filterMap: 375000000000, took 0.000808924 s
compactMap is 370 times slower than filterMap
compactMap: 375000000000, took 0.30244097 s
 filterMap: 375000000000, took 0.000810153 s
compactMap is 373 times slower than filterMap
compactMap: 375000000000, took 0.302916451 s
 filterMap: 375000000000, took 0.000808916 s
compactMap is 374 times slower than filterMap

Further testing shows it's the -static-stdlib flag that's causing it.

Yes :) As I mentioned in the original post, the specific effects of these slight code changes are highly dependent on all sorts of context, including compiler version, flags, etc. And they interact, so the same change might have the opposite effect, or none, in a different context (see below, and eg the above-mentioned "string-interpolation trick" having the opposite effect in Program 3 and Program 6, given that specific compiler version and those flags, of course).


Regarding -static-stdlib, I used that flag (among others) in my examples because Xcode does so by default, and I wanted the results of my examples to match what you'd get when using Xcode instead of the swiftc command. (Using Xcode in this case means doing a Release build within Xcode 10 beta, with development snapshot 2018-06-08, using a command-line app project with default build settings.)


As you demonstrated above:

When using the default toolchain of Xcode 9.4, ie:

$ swiftc --version
Apple Swift version 4.1.2 (swiftlang-902.0.54 clang-902.0.39.2)
Target: x86_64-apple-darwin17.5.0

Then, for Program 6, including -static-stdlib will make filterMap slower.

But! Note that for Program 1, -static-stdlib will have the opposite effect, ie not including it will make filterMap slower.

Detailed demonstration
$ swiftc --version
Apple Swift version 4.1.2 (swiftlang-902.0.54 clang-902.0.39.2)
Target: x86_64-apple-darwin17.5.0
$
$ swiftc -O Program6.swift
$ ./Program6
compactMap: 375000000000, took 0.304990868 s
 filterMap: 375000000000, took 0.001176455 s
compactMap is 259 times slower than filterMap
compactMap: 375000000000, took 0.302824142 s
 filterMap: 375000000000, took 0.000861471 s
compactMap is 352 times slower than filterMap
compactMap: 375000000000, took 0.303165661 s
 filterMap: 375000000000, took 0.000861489 s
compactMap is 352 times slower than filterMap
compactMap: 375000000000, took 0.302871331 s
 filterMap: 375000000000, took 0.00086148 s
compactMap is 352 times slower than filterMap
compactMap: 375000000000, took 0.302811901 s
 filterMap: 375000000000, took 0.000861474 s
compactMap is 352 times slower than filterMap
$
$ swiftc -O -static-stdlib Program6.swift
$ ./Program6
compactMap: 375000000000, took 0.311002296 s
 filterMap: 375000000000, took 0.001297683 s
compactMap is 240 times slower than filterMap
compactMap: 375000000000, took 0.304119449 s
 filterMap: 375000000000, took 0.001255856 s
compactMap is 242 times slower than filterMap
compactMap: 375000000000, took 0.304179837 s
 filterMap: 375000000000, took 0.001253053 s
compactMap is 243 times slower than filterMap
compactMap: 375000000000, took 0.308980026 s
 filterMap: 375000000000, took 0.001255858 s
compactMap is 246 times slower than filterMap
compactMap: 375000000000, took 0.304566472 s
 filterMap: 375000000000, took 0.001255855 s
compactMap is 243 times slower than filterMap
$ 
$ swiftc -O Program1.swift
$ ./Program1
compactMap: 375000000000, took 0.308087019 s
 filterMap: 375000000000, took 0.00086148 s
$ swiftc -O -static-stdlib Program1.swift
$ ./Program1
compactMap: 375000000000, took 0.307667413 s
 filterMap: 375000000000, took 0.001255829 s

So it's not as simple as "X will make Y slower"; it's more like "X will make Y slower or faster depending on A, B, C, D, ...". I guess this means there might be situations where some specific performance issue seems to have been "fixed" in a new version of the compiler, when it's really just a change in how these things interact, and the same issue is still there, only in a different shape, manifesting itself in slightly different contexts.

As someone who's looking to use Swift for some high-performance computations, this scares the hell out of me.

4 Likes

Nonetheless, we can still write regression tests for these cases in the optimizer's own test suite, to ensure that certain abstractions get broken down by the optimizer.

3 Likes

Ah, OK. Can the optimizer's test suite be found anywhere in the apple/swift repository on GitHub? (I tried but couldn't find it.)


Is there some strategy for dealing with cases like these, except trusting that enough people will find the urge to perform enough experiments like this and file enough bugs?


Assuming that there are more people like me out there, ie non-compiler hackers yearning for increased efficiency and predictability of the optimizer, willing to spend some but not too much of their time helping to improve the situation in some way, what are some recommended ways to contribute?

(I wish there were some other way than filing individual bugs for each specific case we can find (against the most recent snapshot, I guess), which would mean at least 6 bugs just for the little demonstration in the OP.)

2 Likes

cc @Erik_Eckstein. Filing bugs for the individual issues you encounter is a good idea regardless. Ideally, we'd be able to use your examples as-is as regression tests to validate that compiler transforms continue to work on these abstractions without breakage due to standard library changes or changes elsewhere in the optimizer. Erik, would it be possible for the benchmark suite to work with ad-hoc tests like Jens' here without boilerplate, since at least some of these problems are context-dependent?

3 Likes

My suggestion would be to commit specific A/B benchmarks and then file an SR stating that the performance should be the same. Then we can validate when it is fixed, and maintain that performance over time.

2 Likes

Let me try to tackle this as a higher-level point. There are optimizations that are inherently sensitive to the exact code sequence in the program, potentially including contextual information. These will always come across as "unpredictable" in the sense that most users won't intuitively understand why their program got slower; I don't think that's a good reason to not pursue them. Instead, I think we should empower users to take more control over the performance of their program, in the following ways:

  • We should try to ensure that the optimizer isn't more powerful than the user: it should generally be possible to do important high-level optimizations (like minimizing ARC operations or specializing a generic algorithm) in source code rather than relying on the optimizer to independently discover the opportunity.

  • We should publish "optimization remarks" so that users can see which optimizations have been applied to their code, assuming this is possible to do without overwhelming them; they can then use this in conjunction with the first point to reclaim performance that seems to have been lost "unpredictably".

Part of the goal of the ownership features is to provide better tools for addressing the first point.
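As a toy illustration of the first point (my own sketch, not taken from this thread or the ownership document): when the optimizer fails to specialize a generic algorithm, the user can write the specialization in source:

```swift
// A generic sum that relies on the optimizer to specialize it per element type.
func genericSum<S: Sequence>(_ xs: S) -> S.Element where S.Element: Numeric {
    var acc: S.Element = 0
    for x in xs { acc += x }
    return acc
}

// Hand-specialized for [Int]: concrete types throughout, so there is no
// generic dispatch or witness-table traffic even if the optimizer does nothing.
func intSum(_ xs: [Int]) -> Int {
    var acc = 0
    for x in xs { acc += x }
    return acc
}
```

If the hand-written variant turns out measurably faster, the generic one wasn't being specialized; the point above is that this kind of source-level escape hatch should always be available, and the ownership features aim to extend it to ARC costs as well.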

20 Likes

I was just gonna say the same when I read your first point xD
I agree, most of what's related to memory overhead can be optimized by the user with ownership :)

In general, however, I would try to stay away from telling users to "optimize their code" if that implies less readable or less maintainable code. I'm not talking about ownership here, but rather about things like "hey, this struct layout hits an optimizer bug, so try this other layout instead". When you have to change the meaning of your code so that it fits your algorithm less precisely, that's when things tend to go wrong, from what I've seen.

3 Likes

Right, I absolutely agree that we should continue to aim for the optimizer to optimize more cases. Test cases for that are always welcome.

The sorts of annotations we're talking about for ownership will hopefully not be too onerous to adopt in natural code, though.

4 Likes

I think that would be most useful as a delta between revisions: "[piece of code] can no longer be optimized because [reason]".

1 Like

All of the commentary here is great, and my subsequent comments are not intended to detract from it.

However, I do want to make a couple of points.

In my opinion, the compiler should be better than the developer at optimising. The set of assembly-language routines I can write that run faster than compiled code has shrunk to almost zero over the years, to the point where the only ones left are small and dependent on contextual knowledge not captured in code. Long may that trend continue, and without me having to decorate code with annotations.

Finally, whilst these results should be addressed, my experience writing OysterKit from Swift 1 through to now is that performance has improved release after release, sometimes dramatically. These results look worrisome, but with such a small footprint any fluctuations are exaggerated. Across a complete application or framework, my experience (and my measurements) tells me things simply keep getting better with each release.

1 Like

I am also worried about the number of annotations being added; it is anathema to an easy-to-learn language to require annotations. I don't buy that annotations are part of progressive disclosure when there are so many of them.

I am also concerned that annotations are becoming baked into the ABI. For example, @inlinable should not be part of the ABI; ideally it would not exist at all, since inlining is the compiler's job.

3 Likes

Which attributes are required? I guess there's @escaping, then a few Objective-C and other interop ones (@objc, @IB*). What would the number of attributes have to do with whether they are progressively disclosed? Which algorithm should the compiler use to predict the future so it can work out which parts of one module are safe to inline into a different module?

Strictly speaking, I think these are not attributes of the Swift language itself, but of an Apple-specific extension thereof, right?

Inlining is the compiler's job, but it's normally impossible to inline across ABI boundaries, and modules with long-term ABI compatibility concerns may not want inlining of old implementations to happen, so it's necessary for binary frameworks to opt in to allowing inlining. As Swift's build system improves to allow for cross-module optimization as part of its normal build process, @inlinable will become irrelevant for most Swift users.
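For concreteness, here's a minimal sketch of what that opt-in looks like (my own example; squared is a hypothetical framework API, not from this thread):

```swift
// Hypothetical binary-framework API. Marking the body @inlinable serializes
// it alongside the module so clients may inline it across the module
// boundary, at the cost of freezing that implementation into the ABI.
@inlinable
public func squared(_ x: Int) -> Int {
    return x * x
}
```

Without the attribute, clients of a resilient binary framework can only call squared through its public entry point; with it, the client's optimizer may inline and further specialize the body, which is exactly why a library with long-term ABI concerns has to opt in explicitly.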

I agree with your general sentiment that relying on annotations is unreasonable. I think there's a bare minimum expected level of optimization that we still aren't reliably reaching yet, and we should work on filling those gaps before dreaming up new language features.

8 Likes

Two points re inlining:

  1. You link your apps against known versions of the library (except for Apple's system libraries, which I will come back to), and therefore old-code inlining is not a problem for non-system third-party libraries. The compiler can decide that a function is potentially inlinable and automatically add @inlinable (and supply the source/AST/SIL/LLVM IR to inline) when compiling the library.
  2. Apple-supplied system libraries change out from under you, i.e. when a system software update happens. In this case I think a better solution is on-device relinking of the code, including inlining. This works very well in the Java world, including on Android. An advantage of Apple controlling the whole stack is that this is possible for them to do as well.

That's what I meant, sorry I wasn't clear. For most user modules, @inlinable should be irrelevant in the fullness of time.

That would be an interesting thing for Apple to explore, though there are many tradeoffs. "Works well" is subjective given Android's poor memory and energy efficiency compared to iOS. Apple's platform is fairly aggressively optimized for AOT compilation, and it's easier to optimize memory usage by sharing pages from dynamic libraries and pre-linking the system libraries into the shared cache. Some of that benefit could still be preserved with on-device recompilation, but the more you inline and specialize library code per application, the more per-application code size you pay for and less systemic savings you get.

2 Likes

Even in the short term, @inlinable doesn't really help with this performance problem, since it can only be applied to public declarations. We either need a compilation mode that allows cross-module optimization, or a new attribute that isn't tied to compiling in resilient mode.

1 Like