RFC: Optimizing test suite performance on high-end machines

Hello,

The ninja check-swift-validation target can complete about 30% faster on high-end workstations if one simply reorganizes the test suite to maximize the number of slow tests that run early.

I'd like to propose:

  1. Refactor the validation tests so that they are identified by a REQUIRES check rather than by living in a separate directory.
  2. Rename two subdirectories: stdlib to 0_stdlib and Unit to 1_Unit.

Given how sweeping this change is and how often the test suite is modified, I'd like to get buy-in before opening a pull request.

Thoughts? A 30% gain is rather nice, IMO. :-)
Dave

[EDIT]: It turns out that the unit tests are a little more complicated. Nevertheless, the 30% gain I observed came from just the flattening and the stdlib rename.

This is an interesting idea, and a 30% gain would certainly make it worthwhile. We could also consider changing lit so that it can be configured with which folders to prioritize, rather than relying on lexicographic sorting of the file system. That might be a bit more work, but it could be more generally useful in the future. Any thoughts on trying that?

I looked into this some more and lit is so close and yet so far.

Lit has a solution for entire test suites that are systemically slow (i.e. the unit tests), but no solution for individual slow tests scattered inside normal test suites.

I think the most pragmatic solution given how lit works would be to add a new test suite configuration variable to list the slowest tests. This should be a very small list. It'd need to be this way because lit sorts tests before parsing them (which it does after fork() for better and for worse).

What do you think?

This turned out to be way easier than I feared:

For LLVM:

diff --git i/llvm/utils/lit/lit/Test.py w/llvm/utils/lit/lit/Test.py
index 59fefbc7f08919ad4a509855b3890f065323ae08..228a387f580ccaf7adfdb66a37a1ad92f93a10aa 100644
--- i/llvm/utils/lit/lit/Test.py
+++ w/llvm/utils/lit/lit/Test.py
@@ -404,4 +404,6 @@ class Test:
         This can be used for test suites with long running tests to maximize
         parallelism or where it is desirable to surface their failures early.
         """
+        if '/'.join(self.path_in_suite) in self.suite.config.slowest_tests:
+            return True
         return self.suite.config.is_early
diff --git i/llvm/utils/lit/lit/TestingConfig.py w/llvm/utils/lit/lit/TestingConfig.py
index 38d05066a2b090cf9ed4e03b29ae15aae5e3bb82..592aaefc77692bd7f8d55aa75ad5a6da69aeb83b 100644
--- i/llvm/utils/lit/lit/TestingConfig.py
+++ w/llvm/utils/lit/lit/TestingConfig.py
@@ -124,6 +124,8 @@ class TestingConfig(object):
         self.limit_to_features = set(limit_to_features)
         # Whether the suite should be tested early in a given run.
         self.is_early = bool(is_early)
+        # Set of slowest tests (to run early)
+        self.slowest_tests = set()
         self.parallelism_group = parallelism_group
         self._recursiveExpansionLimit = None

And for Swift:

diff --git i/test/Unit/lit.cfg w/test/Unit/lit.cfg
index 99713d42a7e95ccc8070d24df18792628ac7acf6..efe3218aeaaf607a4fc5516bf808894e0a76f125 100644
--- i/test/Unit/lit.cfg
+++ w/test/Unit/lit.cfg
@@ -48,6 +48,9 @@ config.suffixes = []
 # See http://reviews.llvm.org/D18647 for details.
 config.excludes = ['DWARF']

+# Unit tests tend to be needlessly serial. Run them early.
+config.is_early = True
+
 # Exclude LongTests directories when not executing long tests.
 swift_test_subset = lit_config.params.get('swift_test_subset', 'validation')
 if swift_test_subset in ['primary', 'validation', 'only_validation']:
diff --git i/test/lit.cfg w/test/lit.cfg
index 7e4038e687c9892f976a3b43a2f4d80e5d771ae1..774f05114eaf9ede87509af131f986d78775f0f2 100644
--- i/test/lit.cfg
+++ w/test/lit.cfg
@@ -322,6 +322,30 @@ config.round_trip_syntax_test = make_path(config.swift_utils, 'round-trip-syntax
 config.link = lit.util.which('link', config.environment.get('PATH', '')) or      \
               lit.util.which('lld-link', config.environment.get('PATH', ''))

+config.slowest_tests = {
+    "Casting/BoxingCasts-4.test",
+    "Casting/BoxingCasts-5.test",
+    "Constraints/casts.swift",
+    "Driver/response-file.swift",
+    "Generics/validate_stdlib_generic_signatures.swift",
+    "IDE/complete_ambiguous.swift",
+    "IDE/complete_operators.swift",
+    "IDE/complete_unresolved_members.swift",
+    "IDE/complete_value_expr.swift",
+    "Interpreter/dynamic_replacement.swift",
+    "Interpreter/multi_payload_extra_inhabitant.swift",
+    "IRGen/pre_specialize.swift",
+    "Prototypes/DoubleWidth.swift.gyb",
+    "Python/build_swift.swift",
+    "Sema/type_checker_perf/slow/rdar19612086.swift",
+    "Sema/type_checker_perf/slow/rdar32998180.swift",
+    "stdlib/CharacterPropertiesLong.swift",
+    "stdlib/FixedPoint.swift.gyb",
+    "stdlib/NumericParsing.swift.gyb",
+    "stdlib/UnicodeTrieGenerator.gyb",
+    "Syntax/round_trip_stdlib.swift"
+}
+
 # Find the resource directory.  Assume it's near the swift compiler if not set.
 test_resource_dir = lit_config.params.get('test_resource_dir')
 if test_resource_dir:
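
To make the effect of the two patches easier to see, here is a minimal standalone sketch of the ordering they produce. It does not import lit: `SuiteConfig`, `FakeTest`, `runs_early` and `order_tests` are hypothetical stand-ins, and the sort key (early tests first, then path order) is an assumption about how the early flag is consumed, not a copy of lit's code.

```python
from dataclasses import dataclass, field


@dataclass
class SuiteConfig:
    # Hypothetical stand-in for a lit suite config with the new variable.
    is_early: bool = False
    slowest_tests: set = field(default_factory=set)


@dataclass
class FakeTest:
    # Hypothetical stand-in for a lit test; only the fields the patch touches.
    path_in_suite: tuple
    config: SuiteConfig

    def runs_early(self) -> bool:
        # Same check as the patch: a listed test is early even if its suite is not.
        if "/".join(self.path_in_suite) in self.config.slowest_tests:
            return True
        return self.config.is_early


def order_tests(tests):
    # Assumed ordering: early tests first, then lexicographic by path.
    return sorted(tests, key=lambda t: (not t.runs_early(), t.path_in_suite))


config = SuiteConfig(slowest_tests={"stdlib/FixedPoint.swift.gyb"})
tests = [
    FakeTest(("AST", "a.swift"), config),  # made-up test name
    FakeTest(("stdlib", "FixedPoint.swift.gyb"), config),
    FakeTest(("Sema", "b.swift"), config),  # made-up test name
]
ordered = ["/".join(t.path_in_suite) for t in order_tests(tests)]
print(ordered)
# ['stdlib/FixedPoint.swift.gyb', 'AST/a.swift', 'Sema/b.swift']
```

The decision needs only the test's path and the suite config, which is why it can be made before any test file is parsed.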

This is awesome! @DaveZ, would it be possible to create PRs against apple/llvm-project:swift/main and apple/swift:main so that we can run PR testing and compare test times?

Just to be clear, the llvm/utils/lit/lit/Test.py changes should be merged into the upstream llvm.org repository, not apple/llvm-project:swift/main.

Sure. I can do that tomorrow when I’m at my computer again.

Just to be clear though, the 30% performance gain is the extreme end of what’s possible. It requires having a ton of cores and an OS that has lightweight process creation/destruction (not something Apple optimizes for).

So unless CI has been upgraded recently, you will be lucky to notice any gains.

Thanks @DaveZ! I started testing on the PR

Do you have any numbers for the performance gains on specific configurations? That might be useful for making decisions about testing machine configs...

Is there some sort of feedback mechanism we can incorporate into the test suite to help keep the slowest_tests list up to date?

Lit has a flag for finding slow tests (--time-tests), so we might be able to use it to decide which tests should be moved into the slow test list. However, we will have to settle on the threshold for moving a test into that list (is it 200 seconds, 100 seconds, or the 10 slowest tests?).

I've actually used --time-tests and I've done some statistical analysis of the test suite.

I don't think an absolute time is what matters. If a test is slow, it tends to be slow in proportion to the hardware. But which tests are slowest is probably fairly consistent from machine to machine.

Personally, I think adding files to the slowest_tests config variable has rapidly diminishing returns (and will eventually backfire if the list grows too long). I'd wager that the right answer is five to ten tests (and maybe a few dozen at most).
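
A minimal sketch of that ranking-based selection, as opposed to an absolute time threshold. The `pick_slowest` helper, the timing values, and the two short test names are all made up for illustration; real numbers would have to come from a timed run such as one with --time-tests.

```python
def pick_slowest(timings, limit=10):
    """Return the `limit` slowest test paths, slowest first."""
    ranked = sorted(timings.items(), key=lambda item: item[1], reverse=True)
    return [path for path, _seconds in ranked[:limit]]


# Hypothetical timings in seconds; not measured data.
timings = {
    "stdlib/FixedPoint.swift.gyb": 180.0,
    "IDE/complete_value_expr.swift": 95.0,
    "Sema/quick.swift": 0.4,   # made-up test name
    "Parse/tiny.swift": 0.2,   # made-up test name
}

slowest = pick_slowest(timings, limit=2)
print(slowest)
# ['stdlib/FixedPoint.swift.gyb', 'IDE/complete_value_expr.swift']

# If a faster machine scales every test by the same factor, the ranking is unchanged,
# whereas a fixed cutoff such as 100 seconds would now select nothing.
faster_machine = {path: seconds / 4 for path, seconds in timings.items()}
assert pick_slowest(faster_machine, limit=2) == slowest
```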

Hi @Nicole_Jacque,

My 48-core (96-thread) Linux workstation is kind of the ideal scenario for this change. This machine is fast enough that the total test time and the slowest test time are about the same, so starting the slowest test earlier matters a lot.

Dave

I imagine it also has something to do with where in the timeline the tests "naturally" end up running. A long-running test that gets ordered near the end of the job is worse for overall time than one that was launched near the beginning of the test suite.
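
A toy simulation of that effect, assuming a simple model in which each free worker takes the next test in list order. The `makespan` helper, the worker count and the durations are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import heapq


def makespan(durations, workers):
    """Wall time when each free worker takes the next test in list order."""
    free_at = [0.0] * workers  # min-heap of times at which each worker is next free
    heapq.heapify(free_at)
    finish = 0.0
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


fast = [1.0] * 12  # twelve quick tests (made-up durations)
slow = 3.0         # one long-running test

late = makespan(fast + [slow], workers=4)   # slow test ordered last
early = makespan([slow] + fast, workers=4)  # slow test ordered first
print(late, early)
# 6.0 4.0
```

With the slow test last, three workers sit idle while it finishes; with it first, the quick tests fill in around it.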

Right. That’s why this thread originally proposed lexical hacks to get slower tests (often the stdlib and unit tests) to run earlier. This approach is much nicer.

And just to be clear, fixing tests is preferable to marking them slow/early. For example:

Hello,

So as it turns out, I had a very fruitful discussion with the LLVM community and now lit can automatically record and reorder tests from slowest to fastest when doing incremental development. This allows ninja check-llvm to run over 50% faster on my machine. The benefits to Swift aren't as dramatic but they're still worthwhile on high-end machines. I've opened up a cherry-pick request for Apple's LLVM branch:

Great work, Dave!

Thanks Chris!

The ninja check-llvm delta would be less dramatic if one pathologically long exegesis test were fixed. :-P

Dave
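
For readers who want a feel for the record-and-reorder idea mentioned above, here is a minimal sketch. It is not lit's implementation: the JSON file, its name, and the helpers `load_times`, `save_times` and `order_by_previous_time` are all assumptions made for illustration, as are the timings and the two short test names.

```python
import json
import tempfile
from pathlib import Path


def load_times(path):
    """Read {test path: seconds} from an earlier run; empty if nothing was recorded."""
    path = Path(path)
    return json.loads(path.read_text()) if path.exists() else {}


def save_times(path, times):
    """Persist the timings of the run that just finished."""
    Path(path).write_text(json.dumps(times, indent=2, sort_keys=True))


def order_by_previous_time(tests, previous):
    # Slowest known tests first. Tests with no record are treated as
    # infinitely slow, so new tests are started before everything else.
    return sorted(tests, key=lambda t: (-previous.get(t, float("inf")), t))


with tempfile.TemporaryDirectory() as tmp:
    record = Path(tmp) / "test_times.json"  # hypothetical file name
    first_run = load_times(record)          # nothing recorded yet
    save_times(record, {"IDE/complete_value_expr.swift": 95.0,
                        "Parse/tiny.swift": 0.2})  # made-up timings
    previous = load_times(record)

ordered = order_by_previous_time(
    ["Parse/tiny.swift", "IDE/complete_value_expr.swift", "Sema/new.swift"],
    previous,
)
print(ordered)
# ['Sema/new.swift', 'IDE/complete_value_expr.swift', 'Parse/tiny.swift']
```

Unlike a hand-maintained list, a record like this updates itself on every run, which matters most during incremental development.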