I spent some time trying to understand why the results from CI and the build-script are so different from my usual local builds. I was shocked to realize that stdlib assertions are enabled in our "standard" builds:
This is the default behavior when running build-script, and it's the behavior for CI tests using preset "buildbot_all_platforms,tools=RA,stdlib=RA".
This is unsettling because enabling stdlib asserts will mislead anyone working on Swift performance, particularly anyone tuning stdlib performance, and will confuse anyone working at the SIL level looking at inlined stdlib code.
Why does the build-script default to stdlib assertions?
Stdlib assertions are only useful for verifying correctness, once, after changing stdlib algorithms. They're wrong for most purposes and potentially terrible because the setting goes undetected.
How are Swift compiler/stdlib developers supposed to build for benchmarking? And how are they supposed to know how to do that?
We're expecting developers to build benchmarks by default, run benchmarks against their patches, and make decisions based on the results, but we're telling them to build in a way that can't be used for benchmarking.
swift-ci benchmark turns off all assertions, but its output doesn't show a top-level build-script invocation.
Showing PR benchmark results without providing a way to reproduce them isn't very useful. I'm not sure how anyone would know that there is a "right" way to build for benchmarking that's different from default Release builds.
I think assertions are the right default—they’re better for feature work and bug fixing, and only worse for performance testing. I also think that if assertions were off by default, the assertions in the standard library would rapidly lose sync with the code that actually gets run. Folks would hit a bug, turn on assertions to try to debug the issue, and find assertion failures in totally unrelated parts of the standard library.
I don’t suppose we could build both asserts and no-asserts standard libraries simultaneously, so you can easily switch between them? We don’t have anything else we can build in parallel with the standard library anyway...
@beccadax What you say is generally true of asserts in software, where asserts enforce preconditions. The stdlib just muddles the terminology. Preconditions are always enabled in the stdlib. What the build script calls "stdlib asserts" are actually the internal consistency checks. When you're doing stdlib development for feature X, it makes sense to enable those checks within feature X on your branch. But it doesn't make sense to enable them for any of the other features, because by definition there's no way to break those checks from outside the feature (otherwise they would need to be preconditions). It's sufficient to have a separate verify-stdlib CI bot to keep those checks working.
[tangent] In response to your suggestion, I've always thought it would be great if the build system separated the compiler from the Swift components. I regularly switch between multiple compilers and have scripts that move the stdlib into place. This is why I can't use build-script. I also often want to rebuild the stdlib without forcing the compiler to rebuild, but haven't figured out how to do that.
If programmers were able to correctly anticipate all the downstream effects of their changes, we wouldn't need any of these internal consistency checks to begin with. Long experience shows that enabling checks only for the feature you're working on directly doesn't suffice. It's not hard for a Swift compiler change or LLVM backend change to cause a stdlib consistency check to fail (either through miscompilation or by surfacing a latent bug in the stdlib), for example, and we definitely want to flag those.
As you say, it's absolutely not the right choice for performance measurement, so I agree that making no-asserts builds more readily accessible (or adding a "so, you want to work on performance" guide) would be pretty helpful.
It sounds like the stdlib's internal checks might actually be useful for catching bugs in the compiler, which is news to me; I've never seen it happen. If that's the case, I wouldn't want to change the build-script behavior.
So for anyone running the benchmarks, measuring code size, or inspecting the SIL, what's the standard build configuration?
Where should we document that?
How can we test (in a benchmark script) if the stdlib has internal checks enabled?
It's definitely the exception rather than the rule, and I haven't personally seen it in the Swift stdlib tests, but I've seen equivalent checks in Apple's tests for system libraries flag plenty of Clang and LLVM regressions over the years.
FWIW, I do that sort of thing with build-script -R --no-assertions.
Perhaps it deserves a quick note in the Building Swift section of the README. A more detailed description could fit in the Standard Library Programmers Manual, or perhaps a new guide on how to do performance measurements?
The stdlib exposes _isStdlibInternalChecksEnabled() to query this; the benchmark runner should be updated to complain loudly if this returns true.
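For reference, here's a minimal sketch of the kind of guard the benchmark driver could grow. Only _isStdlibInternalChecksEnabled() is the real stdlib entry point; BenchmarkResult and reportScores are hypothetical names used for illustration.

```swift
// Hypothetical reporting step for the benchmark driver.
struct BenchmarkResult {
    let name: String
    let median: Double   // e.g. microseconds per iteration
}

func reportScores(_ results: [BenchmarkResult]) {
    guard !_isStdlibInternalChecksEnabled() else {
        // Still run the suite so we know the benchmarks build and execute,
        // but refuse to report numbers that would be misleading.
        print("warning: stdlib internal checks are enabled; not reporting scores")
        return
    }
    for result in results {
        print("\(result.name): \(result.median)")
    }
}
```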
I think we should have a shout-out in both the Stdlib and SIL Programmers guides to a centralized document on performance. This material belongs there.
We should have a preset for measuring performance. This will ensure that there is one thing to do in perpetuity, and that we can tweak things under the hood if needed without disrupting workflows.
Another thing that could go there would be various ways to quiet a machine down (imagine if we provided some sort of tooling or a guide for quieting the system).
Developers should not need to jump through hoops to quiet a system for routine performance measurement. Yes, it's easier to get reliable measurements that way, but it's also not representative of the way that the code ends up actually being used. Live systems are noisy.
For very fine-grained performance work it can be necessary, but I don't think we want to enshrine unnatural system conditions as "the standard process".
FWIW, I do that sort of thing with build-script -R --no-assertions.
I just want to add that there's no reason for developers to disable compiler assertions, except maybe to get slightly more representative compile-time numbers, whereas we waste a lot of time attempting to reproduce bugs that were reported without compiler assertions enabled.
It probably shouldn't error because we might still want to test that the benchmarks built correctly (by running the suite), but it could refuse to output any numbers.
I beg your pardon! In my experience (which was the basis for the design of these systems), assertions are sanity checks, which are not at all the same as precondition checks. Sanity checks can and should be used liberally but never shipped. Precondition checks must ship if you are guaranteeing safety and there would otherwise be undefined behavior; in other cases it's a tradeoff between performance impact and debuggability.
Preconditions are always enabled in the stdlib.
The standard library's precondition check, _precondition, was originally (I don't know if things have changed since then) disabled in -Ounsafe client builds, which is in accordance with what I wrote above. The standard library also has _debugPrecondition, which was disabled in -O client builds, for those cases where a precondition violation doesn't create a memory safety violation and the check is deemed too expensive for a release build.
The standard library also has _sanityCheck, which is the standard library's assertion. It was intended to never ship to customers but always be used for internal testing except where it interferes with the tests (e.g. in benchmarking), because these checks pick up bugs that creep into the standard library and into the compiler.
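To make the three levels concrete, here's a standalone sketch that uses the public precondition and assert functions as stand-ins; the real helpers (_precondition, _debugPrecondition, _sanityCheck) are internal to the standard library module, so treat this as an analogy for how each level maps onto client build configurations rather than as actual stdlib code.

```swift
// Analogy only: precondition(_:_:) stands in for _precondition (still
// checked in clients' -O builds, removed only under -Ounsafe), and
// assert(_:_:) stands in for _debugPrecondition and _sanityCheck
// (both compiled out of -O builds).
struct Ring {
    private var storage: [Int]
    private var head = 0

    init(capacity: Int) {
        // _precondition-style check: a violation of the documented API
        // contract that should be caught even in clients' release builds.
        precondition(capacity > 0, "capacity must be positive")
        storage = Array(repeating: 0, count: capacity)
    }

    mutating func rotate() {
        head = (head + 1) % storage.count
    }

    subscript(offset: Int) -> Int {
        // _debugPrecondition-style check: a client error, but memory safety
        // is still guaranteed below, so the check is traded away in -O.
        assert(offset >= 0 && offset < storage.count, "offset out of range")
        // _sanityCheck-style check: an internal invariant that only a bug in
        // Ring itself (or in the compiler) could break; never shipped.
        assert(head >= 0 && head < storage.count)
        return storage[(head + offset) % storage.count]
    }
}
```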
What the build script calls "stdlib asserts" are actually the internal consistency checks. When you're doing stdlib development for feature X, it makes sense to enable those checks within feature X on your branch. But it doesn't make sense to enable them for any of the other features, because by definition there's no way to break those checks from outside the feature (otherwise they would need to be preconditions). It's sufficient to have a separate verify-stdlib CI bot to keep those checks working.
I don't know what the dynamics are these days, but when I was working on the stdlib it was common for those checks to find problems that wouldn't be picked up otherwise, including problems in the core compiler. I don't know what you mean by “from outside the feature” and don't understand your assertion that they would need to be preconditions if it were possible to break them somehow.
All that said, I don't have a position about whether these things should be on by default or not in the build script, but I do know that when an external contributor tests their changes to the standard library, it's much better if they're on. If we turn them off by default, IMO most external contributors will never discover them.
Thanks for that clarity, and apologies. I should have just said the build-script muddies the critical distinction between compiler and stdlib asserts.
I perhaps wrongly assumed that stdlib preconditions are sufficient for general testing and that the internal checks are only useful within the feature currently being developed.
I do want to point out that enabling stdlib internal checks makes it impossible to test for expected compiler output when stdlib APIs are involved. For the most part, we end up not writing IR-level tests for stdlib features. In those cases where we really care what the compiler does with a stdlib API we use REQUIRES: swift_stdlib_no_asserts.
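As a rough illustration of that pattern, a test might look like the sketch below. The function and the CHECK line are placeholders; the parts reflecting the actual convention are the RUN/FileCheck structure and the REQUIRES: swift_stdlib_no_asserts gate.

```swift
// Hypothetical lit test: it inspects the SIL the optimizer produces for a
// stdlib API, so it only runs when the stdlib was built without internal
// checks; with checks enabled, the inlined stdlib code (and therefore the
// expected output) would look different.
//
// RUN: %target-swift-frontend -O -emit-sil %s | %FileCheck %s
// REQUIRES: swift_stdlib_no_asserts

// CHECK-LABEL: sil {{.*}}sumOfSquares
public func sumOfSquares(_ values: [Int]) -> Int {
  return values.reduce(0) { $0 + $1 * $1 }
}
```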
So, from my point of view, enabling stdlib assertions decreases the level of testing.
It's also been a very common mistake for people to enable both compiler and stdlib asserts when they only wanted compiler asserts. A lot of the code size and performance data that developers have been looking at over the past couple of years has been bogus... I'm not exaggerating to make a point; this has wasted an outrageous amount of time.
I also suspect that more stdlib development time goes into investigating inlining and specialization issues than into failures caught by internal checks.
I'm not advocating for changing the build-script default now (I only asked because I thought that may have been a mistake). The main point I want to make is this:
Compiler testing primarily needs to be done with -R --no-stdlib-assertions. Compiler assertions need to be enabled for useful bug reports, and stdlib assertions need to be disabled whenever we're looking at the compiler output, particularly for code size, inlining and specialization decisions, and general performance data.
I think there's a consensus that we need to communicate this better and that benchmarks should not report scores when stdlib asserts are on.
Another idea worth floating is that the build-script --assertions option should be deprecated. We should probably just have --compiler-assertions and --stdlib-assertions.