Add a blackHole function


I've recently been using the excellent swift-benchmark package (google/swift-benchmark: a Swift library to benchmark code snippets), which I hope Apple will adopt one day - it needs to be fleshed out a bit, but it's really much better than XCTest, especially when it comes to Linux support.

One thing that I've found a bit awkward is that it doesn't come with a "black hole" function to prevent the compiler from optimising values away. This kind of function appears to be used extensively by the standard library's and compiler's own test suites, but implementing your own seems to be non-trivial and relies on insider knowledge about what the compiler can and can't do at this particular point in time.
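To illustrate the problem (a minimal sketch, with assumed names - this is not the standard library's implementation): without a black hole, the optimiser is free to delete work whose result is never used, turning the measured loop into a no-op.

```swift
// A naive black hole: an empty, never-inlined generic function.
// Whether @inline(never) alone is actually sufficient is exactly
// the open question in this thread.
@inline(never)
public func blackHole<T>(_ x: T) {}

func measureNaive(_ values: [Int]) {
    for _ in 0..<1_000 {
        let sum = values.reduce(0, +)   // result unused - may be optimised away
        _ = sum
    }
}

func measureWithBlackHole(_ values: [Int]) {
    for _ in 0..<1_000 {
        blackHole(values.reduce(0, +))  // result is "consumed", so the work survives
    }
}
```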

Is this something the standard library could/should provide?

GitHub issue, and CC @dabrahams who knows more about it than I do. I just need it, and I think that benchmarking should not be considered a niche use-case. Having developers try to trick the compiler doesn't feel like a stable solution.


Although there are a number of different definitions in the standard library, several are of the form:

func _blackHole(_ x: Int) {}

...which presumably suffices, given how extensively it's used. I do wonder why the other implementations are more elaborate than that.

Also interesting to me is the _opaqueIdentity function.
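As I understand it (a simplified sketch, not the actual stdlib code), _opaqueIdentity is an identity function whose result the optimiser must treat as unknown, so values passed through it can't be constant-folded:

```swift
// Hypothetical stand-in for _opaqueIdentity: returns its argument,
// but the never-inlined call hides the value from the optimiser.
@inline(never)
func opaqueIdentity<T>(_ x: T) -> T {
    return x
}

let n = opaqueIdentity(42)  // the compiler cannot assume n == 42 at the call site
```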

In general these feel appropriate for XCTest perhaps (or, as you say, swift-benchmark). Not sure how I feel about having these in the standard library itself given their specific relevance for testing when Swift ships with a core library for testing.


Agreed; there's nothing that requires these to be in the standard library, and they're not of general use, so I don't think that they belong there. Note also that these already exist in the benchmarks in the main Swift source tree, which can either be built with Swift or function as a standalone package.


I'm not sure how the existence of XCTest is relevant - surely we also support alternative benchmarking libraries? And those libraries should not be forced to depend on XCTest for fundamental functionality, IMO, nor should they have to second-guess the compiler. This feels like it should be some kind of compiler built-in; I think it should be exposed directly.

I wouldn't mind if this was part of some kind of supplementary library rather than the standard library itself, but it should have nothing to do with XCTest.

There's also the problem that, like black holes themselves, nobody knows what the evolution process for XCTest is.


XCTest is a core library, like Foundation; it’s supposed to provide fundamental functionality. Supporting alternative benchmarking libraries and depending on XCTest are not mutually exclusive.

Can you explain a bit more why the “obvious” implementation of _blackHole that relies on no compiler magic is insufficient or otherwise feels like it requires compiler built-ins?

What is the difference between “some kind of supplementary library” for testing and benchmarking utilities that ships on all Swift platforms (i.e., another core library) and XCTest, which is already a core library, other than that the latter is named XCTest?

Sure, but that’s not a problem best addressed by stuffing new code elsewhere. Conway’s law is a descriptive observation, not an aspirational goal.

Which implementation would that be?

The version in StdlibUnittest seems to do something funky which I can't quite make out:

  1. _blackHole calls _blackHolePtr
  2. _blackHolePtr calls _getPointer
  3. _getPointer is @_silgen_name-d to getPointer. Presumably that's the getPointer function at the end of the file.
  4. getPointer calls _opaqueIdentity
  5. _opaqueIdentity calls _getPointer... i.e. go back to step 3. How does this not infinitely recurse? :man_shrugging:

The only other implementation I can think of is the one which calls into an opaque C function... assuming that neither the Swift compiler nor LLVM can optimise across the language barrier. Will that always be the case? Who knows! Rust was motivated to work on cross-language LTO because of Firefox - and as Swift becomes a bigger part of Apple's OS, one would think that they have at least as strong a motivation to investigate it.

We cannot rely on the module boundary either, since early support for cross-module optimisation already exists (and it's something you would certainly want to build with, so that your benchmarks accurately reflect the performance observed by clients of a library).

Again, all of this second-guessing and attempting to trick the compiler feels like a waste of time and effort.

This supplementary library would only provide access to what are effectively compiler built-ins. XCTest does far, far more than that.

It's not just stuffing it somewhere else. AFAIK, XCTest is supposed to be its own unit-testing and benchmarking framework: it was not designed as a library to facilitate building other frameworks (which may want to use a radically different design). I am suggesting that, for this entirely different use-case, it makes sense to create a new library rather than stuffing it in XCTest.


The _getPointer function uses


This one:

...and this one:

Interesting, and interesting that the compiler/stdlib have so many varying implementations.

I’m not able to find any official documentation for @inline(never). I know it doesn’t affect generic specialisation, but I’m not sure if it follows that the compiler is not allowed to omit the call, or that it must be called with a valid argument.


Right, that's what I'm not sure about. Perhaps what's called for here is better information about that, and if it turns out to be inadequate, consideration whether some other annotation in the same family would be necessary.

I don't see why StdlibUnittest needs to have such a fancy _blackHole these days -- I suspect it may have been implemented before we had @inline(never). (In any case, it's not particularly important, and I wouldn't be surprised if over the years someone has figured out a way to write a test that relies on its current behavior. :see_no_evil: Still, feel free to submit a PR that fixes it!)

The definitions in the standard benchmark suite are far more sensible:

// Just consume the argument.
// It's important that this function is in another module than the tests
// which are using it.
@inline(never)
public func blackHole<T>(_ x: T) {}

// Return the passed argument without letting the optimizer know that.
@inline(never)
public func identity<T>(_ x: T) -> T {
  return x
}

Feel free to emulate these in your own benchmarking code. I agree with Steve and Xiaodi above -- I think these are both way too trivial and way too specialized for inclusion in the stdlib.
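Emulating them might look like this (a sketch: the helpers are redeclared inline so the example is self-contained, but as the comment above notes, in real use they must live in a separate module from the benchmarks; the benchmark itself is hypothetical):

```swift
@inline(never) public func blackHole<T>(_ x: T) {}
@inline(never) public func identity<T>(_ x: T) -> T { x }

func runBenchmark() -> Int {
    // identity() hides the input from constant propagation...
    let data = identity(Array(1...100))
    var total = 0
    for _ in 0..<10 {
        total &+= data.reduce(0, &+)
    }
    // ...and blackHole() keeps the result from being thrown away as dead code.
    blackHole(total)
    return total
}
```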


In the presence of cross-module optimization, @inline(never) would not be sufficient for these, but neither would anything else except for a dedicated compiler builtin or a hypothetical @optimize(never) attribute. Building benchmark drivers with CMO enabled is fundamentally weird, however, and not something I'm particularly seeing a need to support.


Why is it weird? I've been building my benchmarks with CMO ever since I learned it exists. I'm using google/swift-benchmark, so it's all statically linked AFAIK. There's no way to tell the compiler to enable cross-module optimisations for certain libraries but to exclude others (right now).

Benchmarking with CMO enabled allows me to focus on the performance of the underlying algorithms, rather than worrying about fragile inlining heuristics. I'll get to tuning those things eventually, but for now it would be a distraction.

Unless you write your benchmarking loops in assembly (which is what I have always done in the past if it really matters), enabling CMO will break your benchmarks repeatedly.

... in the absence of a true "black hole" function, or for other reasons?

(not questioning it, just curious to learn more)


I would expect fairly arbitrary breakage of benchmarks under CMO. Hoisting things that you expect to be in the measurement out of the measurement, throwing away computations that can be optimized out (this one is the "absent a true black hole operation" part), constant propagation and compile-time evaluation, etc. I think it's possible to get useful benchmark results under CMO, but it requires constant vigilance, and you cannot really expect to maintain any sort of long-term baselines.

When working with small operations that I expect to be inlined, I usually prefer to benchmark a larger computation that they can be inlined into, with some form of boundary that prevents optimization across the calls into that wrapping computation.
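A sketch of that pattern (the helper names are hypothetical, modelled on the benchmark suite's utilities): rather than timing a tiny operation like min directly, time a larger loop it inlines into, with identity() as the optimisation boundary on the way in and blackHole() on the way out.

```swift
@inline(never) func identity<T>(_ x: T) -> T { x }
@inline(never) func blackHole<T>(_ x: T) {}

func benchmarkMin(_ input: [Int]) -> Int {
    let data = identity(input)      // boundary: input looks unknown to the optimiser
    var smallest = Int.max
    for value in data {             // `min` is free to inline into this loop
        smallest = min(smallest, value)
    }
    return smallest                 // pass to blackHole() at the call site
}
```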
