Swift.org blog: Thread Sanitizer for Swift on Linux

tkremenek · August 13, 2019, 10:10pm

There is a new post on the Swift.org blog titled Thread Sanitizer for Swift on Linux. It discusses how TSan can now be used on Linux to find data races!

The blog post is authored by @yln.

Please use this thread to ask questions related to the blog post!

Jon_Shier · August 14, 2019, 2:09am

Are there any recommendations for the scale of testing necessary to declare something thread-safe? That is, how many iterations of a test exposing potentially unsafe paths should be run with the sanitizer before you can say there are no threading issues? I'm guessing there's not really a good answer, which is why I think some built-in stress testing tools might be nice.

lukasa · August 14, 2019, 8:21am

There isn't really a good answer to this question: TSAN can only report a data race on a code path that actually executes, so the answer to "how many iterations do you need" depends on the program in question and its data flows. Additionally, you need to exercise literally every potentially unsafe branch: thread safety issues may only be exposed with particular input data.

I would consider declaring something "thread-safe" a non-goal, as it's a proof by exclusion. Instead, in NIO we shoot for feeling confident that our core constructs and logic are probably thread safe, at least when used as the tests use them. The main advantage here is dealing with bug reports: when users report threading issues, we can usually fairly quickly identify whether an unsafe access to a NIO structure is likely to be NIO's fault, or a manifestation of a thread safety issue elsewhere.

One of the best ways to improve your confidence in your thread safety, though, is to combine the following:

1. Turn on a TSAN build in CI. This allows you to catch regressions on tested code paths. Additionally, you can enhance your test suites with failures reported from third parties.
2. Enhance code coverage in testing, to improve (1).
3. Invest in fuzzing infrastructure and turn on TSAN there too.

Naturally, doing all of these is quite a lot of work, but in general TSAN does better by being run continuously. The NIO team has caught regressions by regularly running our own test suite (as well as those of several other projects) with TSAN turned on. More iterations is always better.
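
As a rough illustration of "more iterations", here is a minimal shell sketch that reruns a command a fixed number of times and stops at the first failure. The helper name `repeat_until_failure` is hypothetical, and the sketch assumes the command exits with a non-zero status when TSAN reports a race.

```shell
# Hypothetical helper (not part of any Swift tooling): run a command
# repeatedly and stop at the first failing iteration.
# Usage: repeat_until_failure COUNT command [args...]
repeat_until_failure() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        # Assumption: the command exits non-zero when TSAN reports an error.
        if ! "$@"; then
            echo "iteration $i of $count failed: $*" >&2
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count iterations passed"
}

# Example invocation (assumes a SwiftPM package in the current directory):
#   repeat_until_failure 20 swift test --sanitize=thread
```

A passing loop only raises confidence; it does not prove the absence of races, for the reasons given above.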

Incidentally, the same guidance applies to Address Sanitizer: more inputs is always better.

thomasvl · August 14, 2019, 1:54pm

Any updates on when all the sanitizers might be included in the Xcode toolchain? (i.e. - Revisiting Status Swift 5/Xcode Fuzzing support?)

MattSeaman · August 27, 2019, 4:26am

What’s the best approach for a CI to be able to notice these issues (not just on Linux, but macOS as well)? I already have TSan on for unit tests, but I also have a suite of integration tests that run by invoking my Swift executable directly. How can the test script notice when warnings are emitted and mark a failure?

yln · August 28, 2019, 12:18am

Thread Sanitizer runtime options can be configured via the TSAN_OPTIONS environment variable:

env TSAN_OPTIONS=option1=val1:option2=val2 /path/to/binary

There are many options available [1, 2]; the most relevant to your question are:

exitcode: Override exit status of the process in case TSan reported an error (default: 66).
abort_on_error: call abort() instead of exit(exitcode) when terminating the process (default: false on Linux, true on Darwin).
log_path: Path to which error reports are written (default: stderr).

So on Linux, your CI test script can invoke the binary, check whether the process exit code is 66, and capture stderr to obtain error reports. On Darwin, you should set abort_on_error=0.

For example, the following ensures exit() is used and terminates the process with a custom exit code in case TSan reports an error.

env TSAN_OPTIONS=abort_on_error=0:exitcode=77 /path/to/binary
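
A CI test script could wrap each invocation along these lines. This is only a sketch: the function name `run_with_tsan_check` is hypothetical, and 77 is an arbitrary value that is assumed not to collide with any exit code the binary uses for its own purposes.

```shell
# Arbitrary exit code for this sketch; pick one the binary never returns itself.
TSAN_EXIT=77

# Hypothetical helper: run a command with TSan configured to exit with
# TSAN_EXIT on error, and turn that status into a distinct failure.
# Usage: run_with_tsan_check /path/to/binary [args...]
run_with_tsan_check() {
    # abort_on_error=0 makes TSan call exit(exitcode) instead of abort(),
    # so the same check works on both Linux and Darwin.
    env TSAN_OPTIONS="abort_on_error=0:exitcode=${TSAN_EXIT}" "$@"
    rc=$?
    if [ "$rc" -eq "$TSAN_EXIT" ]; then
        echo "TSan reported an error while running: $*" >&2
        return 1
    fi
    # Otherwise pass the command's own exit status through unchanged.
    return "$rc"
}
```

Error reports still go to stderr by default, so the CI log captures them alongside the failure message.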

hassila · March 29, 2021, 9:41am

Just wanted to say thanks to @yln - just used this to find a missing lock on Linux, worked like a charm. Awesome. (I know I'm bumping an old post, but maybe someone else will be happy to stumble on this :-)