Ongoing work related to compilation performance

Hi,

This is an update on work that's been ongoing over the past several months on tracking and improving Swift compilation performance. I've been helping coordinate some of that work: documenting what's known about compilation performance patterns and diagnostic machinery, increasing visibility and process controls around compilation performance, and also helping make some changes directly to the compiler.

I wanted to post here to make sure everyone's aware of what's currently going on, as well as solicit feedback on priorities and ways to help others interested in the topic.

Here's a bit of an overview of recent activities:

Compilation-performance documentation

There's a somewhat lengthy document I wrote up during the summer that explains what's currently known about how compilation performance varies in the Swift compiler, what the causes of that variation (and existing cost centers) are, how things are known to sometimes go wrong, which compiler options exist to help understand the compiler's behaviour, and which auxiliary tools, scripts and processes can help diagnose and improve compilation performance.

It's stored in the repository (https://github.com/apple/swift/blob/master/docs/CompilerPerformance.md) and I'll be trying to keep it up to date as things change. It's a good initial orientation read when approaching the topic.

Pull request (PR) compilation-performance testing

More interactively: a new form of PR testing (documented in https://github.com/apple/swift/blob/master/docs/ContinuousIntegration.md#testing-compiler-performance) has been added in recent months that committers can trigger with:

  @swift-ci please smoke test compiler performance

or

  @swift-ci please test compiler performance

The latter takes quite a long time, and if you are just curious to see if a change helps or hurts compilation performance, the former is usually totally adequate.

Output from these commands looks like this:

  https://github.com/apple/swift/pull/12843#issuecomment-345042338

And it displays a summary of output binary-size and compile-time changes between your pull request and the branch you're committing to, as well as changes to a set of compiler statistics tracking interesting causes of work.

These CI-level reports are based on compiling and measuring the source compatibility testsuite (kept at https://github.com/apple/swift-source-compat-suite). The "smoke" CI test, which is usually sufficiently representative to catch regressions, measures counters and timers for 3 projects in the source compatibility suite (currently Alamofire, Kingfisher and ReactiveCocoa); the "full" test measures the whole suite.

Measurements in general

The measurements taken by the CI tests above are emitted by the compiler using a mode introduced earlier in the year: -stats-output-dir. Briefly: this mode emits a collection of .json files summarizing all available statistics and timers in a given compiler (the exact set changes depending on whether the compiler is built with asserts). One .json file is written per frontend or driver process in a compilation, which permits post-execution analysis by a variety of tools. One such tool (swift/utils/process-stats-dir.py, see https://github.com/apple/swift/blob/master/docs/CompilerPerformance.md#post-processing-tools-for-diagnostics) is usable both for batch analysis (the CI job uses it) and manually, for interpreting the performance impact of a change on a developer's workstation. It is also, of course, usable in regression tests; a handful of tests now directly measure performance counters using it.
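To make the "one .json file per process" idea concrete, here is a minimal sketch of the kind of post-processing that becomes possible. It is not process-stats-dir.py and does not reproduce its behaviour. It assumes each file is a flat JSON object mapping counter names to numbers, and the function names are made up for illustration.

```python
import json
from collections import Counter
from pathlib import Path


def load_stats_dir(stats_dir):
    """Sum every numeric counter across all .json files in stats_dir.

    Assumption: each file is a flat JSON object of {counter name: number}.
    """
    totals = Counter()
    for path in sorted(Path(stats_dir).glob("*.json")):
        with open(path) as f:
            data = json.load(f)
        for name, value in data.items():
            # Ignore anything that is not a plain number.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                totals[name] += value
    return totals


def compare(old_dir, new_dir):
    """Return {counter: (old, new, delta)} for counters whose totals differ."""
    old, new = load_stats_dir(old_dir), load_stats_dir(new_dir)
    changed = {}
    for name in sorted(set(old) | set(new)):
        delta = new[name] - old[name]  # a missing counter reads as 0
        if delta != 0:
            changed[name] = (old[name], new[name], delta)
    return changed
```

Pointing compare() at two directories, one written by a baseline compiler and one by a modified compiler, gives a crude per-counter delta for the whole build.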

Scale testing

In addition to testing the absolute values of counters in the compiler, there's a bit of test infrastructure that uses those counters in a more abstract way, called "scale tests" (driven by utils/scale-test.py). This approach measures the relationship between linear changes to the scale of a synthetic input (say: increases to the number of classes in a project) and changes to the amount of work different counters in the compiler do. This is explained in more detail in https://github.com/apple/swift/blob/master/docs/CompilerPerformance.md#scale-test, but briefly: this allows writing tests that are insensitive to the exact values of counters, but that check that counters have certain desirable relationships (eg. linear or sub-linear) to the scale of inputs, rather than undesirable relationships (quadratic or worse).
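One way to picture "relationship to the scale of inputs" is as an exponent: if a counter grows like c * n^k as the input size n grows, then k is the slope of log(counter) against log(n). The sketch below fits that slope by least squares. It is only an illustration of the idea and may differ from what utils/scale-test.py actually does; the function names and the 1.2 threshold are assumptions.

```python
import math


def fit_exponent(scales, counts):
    """Least-squares slope of log(count) against log(scale).

    If count ~ c * scale**k, the returned slope approximates k.
    """
    xs = [math.log(s) for s in scales]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def looks_superlinear(scales, counts, threshold=1.2):
    """Flag a counter whose fitted exponent exceeds a (hypothetical) threshold."""
    return fit_exponent(scales, counts) > threshold
```

A test written this way does not care whether a counter reads 3,000 or 30,000 at a given size, only whether doubling the input roughly doubles it or roughly quadruples it.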

Reducing quadratic costs

One area that the scale-tests have highlighted is the parts of early semantic-analysis that work on "all declarations in a module", and that thereby risk doing work that's quadratic in the number of files when running a debug build. The compiler was designed to avoid this sort of quadratic work by being lazy about its analysis, but measurements suggest it's not always working as well as intended. Based on the belief that these cost centers are responsible for cases where (non-WMO) debug builds run slower than the unsupported (but widely practiced) "WMO debug builds", several people are focusing attention on reducing these costs, in two ways:

1. Making each frontend do closer-to-constant work, by loading, importing and validating declaration members more lazily. For example, some work I've recently been doing on name lookup (eg. https://github.com/apple/swift/pull/12669), and some other work Slava Pestov's been doing on extension binding (https://github.com/apple/swift/pull/12855) and member validation (https://github.com/apple/swift/pull/12942). Not all of this work is landed or enabled yet, but it shows some promise, and each piece should magnify the effects of the others.

2. Reorganizing the strategy by which the driver runs frontend processes, to permit batching semantic work in non-WMO builds. This is work Jordan Rose outlined in September (https://lists.swift.org/pipermail/swift-dev/Week-of-Mon-20170925/005439.html) and David Ungar started on in October (https://github.com/apple/swift/pull/12373); it's still early on but the hope is that it may reduce the tradeoff between WMO and non-WMO debug builds, gaining some of the best of both models and ideally removing the need for users to fiddle with obscure build settings.
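The arithmetic behind the quadratic concern, and behind both approaches above, can be sketched with a toy cost model. It assumes the worst case, in which every frontend process touches every declaration in the module once; the real compiler is lazier than this, and the function names are invented for illustration.

```python
def module_wide_work(num_files, decls_per_file, num_frontends):
    """Total 'visit every declaration' work under a worst-case toy model.

    Each frontend process is assumed to touch every declaration in the
    module once, no matter how many files it is responsible for.
    """
    total_decls = num_files * decls_per_file
    return num_frontends * total_decls


def compare_strategies(num_files, decls_per_file, batches):
    """Compare module-wide work for three ways of splitting up a build."""
    return {
        "one frontend per file": module_wide_work(num_files, decls_per_file, num_files),
        "batched": module_wide_work(num_files, decls_per_file, batches),
        "whole-module": module_wide_work(num_files, decls_per_file, 1),
    }
```

With 100 files of 50 declarations each, the model gives 500,000 units of module-wide work for one frontend per file, 20,000 for 4 batches, and 5,000 for a single whole-module frontend. Approach 1 attacks the per-frontend factor (making each frontend visit fewer declarations); approach 2 attacks the number of frontends.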

Improving incremental mode

Another area that could use work is the incremental compilation logic in the driver -- that is, reducing the number of times a file is rebuilt when it "doesn't need to be" -- and some of the documentation and driver-level counters mentioned above provide insight into that (eg. https://github.com/apple/swift/blob/master/docs/CompilerPerformance.md#driver-diagnosis). Incremental compilation is based on approximating the "true" dependency graph, and sometimes this approximation is too coarse; but changing it will be a large amount of work, and the shape of any substantial change is still subject to a lot of analysis and design. In the meantime there may also be simpler bugs lurking in the dependency-analysis logic, within the current dependency approximation; I'd be happy to help anyone who wants to spend time bug-hunting in this area understand what they're looking at.
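To illustrate what "too coarse" means here, the following sketch compares two safe approximations of a dependency graph. It is purely hypothetical: the data layout, file names and function are invented for illustration and do not reflect the real .swiftdeps format or the driver's algorithm.

```python
def files_to_rebuild(changed_file, changed_names, provides, depends, fine_grained):
    """Return the set of other files that must be rebuilt after an edit.

    provides: {file: set of names it declares}
    depends:  {file: set of names it uses}
    """
    rebuild = set()
    for other, used in depends.items():
        if other == changed_file:
            continue
        if fine_grained:
            # Rebuild only if the file uses a name that actually changed.
            affected = used & changed_names
        else:
            # Coarse: rebuild if it uses anything the changed file declares.
            affected = used & provides[changed_file]
        if affected:
            rebuild.add(other)
    return rebuild


# Hypothetical three-file module.
provides = {"a.swift": {"foo", "bar"}, "b.swift": {"baz"}, "c.swift": set()}
depends = {"a.swift": set(), "b.swift": {"foo"}, "c.swift": {"bar"}}

coarse = files_to_rebuild("a.swift", {"bar"}, provides, depends, fine_grained=False)
fine = files_to_rebuild("a.swift", {"bar"}, provides, depends, fine_grained=True)
```

Both answers are safe, since each includes c.swift, which really does use the changed name. The coarse one also rebuilds b.swift, which is the "rebuilt when it doesn't need to be" case.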

That's all I've got for now, but if you have specific bugs, questions, suggestions or comments on the topic, please direct them my way.

-Graydon

Has any thought been put into taking advantage of llbuild for dependency graphs? Can some performance improvements come from that?

On 17 Nov 2017, at 01:27, Graydon Hoare via swift-dev <swift-dev@swift.org> wrote:

That's all I've got for now, but if you have specific bugs, questions, suggestions or comments on the topic, please direct them my way.

-Graydon

Long term I think there's general interest in leveraging it, in place of the miniature build system inside the swift driver; but the discovery/analysis of dependencies and the scheduling/executing of jobs are somewhat separate tasks, and llbuild only does the second. The mapping from a set of swift declarations-and-files into a dependency graph that is a safe (but tight) approximation of the "true" dependencies of the compiler is the hard part. That is, the material that goes into a .swiftdeps file is the hard part; and unfortunately that's a thing build systems delegate to compilers to figure out, since it involves tracing the compiler's internal activities as it runs each job.

-Graydon