Decoupling -verify tests from diagnostic wording

DiagnosticVerifier currently supports expecting diagnostics only by matching the message text. This means that when diagnostic wording is improved, every test that references the old wording must be updated, even if the test has nothing to do with how the diagnostic is worded. I recently opened a PR (#89194) to support expecting the diagnostic ID as well.

Mode Example
Message // expected-error {{expected initial value after '='}}
Identifier // expected-error [expected_init_value]

All directive features (count, line offsets, columns, markers, cross-file references, continuation, fix-its, {{none}}, {{documentation-file=...}}) work identically with [id]. The bracket syntax [id] matches the form already used by -debug-diagnostic-names to render diagnostic identifiers, so identifiers can be copied directly from compiler output into a test.

Motivation

The following commits illustrate the scale of this problem:

Commit Description Test Files Updated
a1716fe2a6d Use less jargon in diagnostics 178
2cd90bdd694 Quote attributes consistently 149
9ba481ad53e "Swift 5" → "Swift 5 language mode" 72
1957bd60658 "without more context" → "without a type annotation" 32
c8f3476f193 Replace backticks with single quotes 17

Each of these commits represents a straightforward improvement to user experience: clearer wording, more consistent formatting, less jargon. The .def file changes themselves are trivial, but the resulting test updates are disproportionately large, creating friction that discourages incremental refinement of diagnostics.

The vast majority of tests that use -verify mode exist to verify that a specific code path emits a specific diagnostic. These tests verify compiler behavior, not diagnostic wording. But because message-matching is the only available mechanism, these tests are inherently coupled to the wording: when the wording changes, the tests fail even though the underlying behavior being verified did not change.


If this lands, incremental refinement of diagnostics gets meaningfully cheaper. Sharing here so the syntax is on your radar.

Would be important, I'd think, for any identifier-based verification to support arguments just as the diagnostics themselves do. As it is, string-based matching encourages checking that the arguments interpolated into any diagnostic are as expected (since there's no more friction to match interpolated text than templated text).

I think it'd be acceptable if, in the identifier-based form, there's mildly more friction (e.g., having to proactively choose to add a comma or a pair of parens), but I worry that providing a lower-friction option that doesn't support testing arguments at all risks encouraging suboptimal patterns.

Certainly finding ways to improve the workflow are positive; but given that, on the one hand, updating lots of files (and checking that they've been appropriately updated) has never been easier, and on the other hand, doing things thoughtfully has never been more out of vogue, I think we are best served if features were built to encourage the latter as much as possible rather than optimizing for the former.

3 Likes

I think the most rigorous pattern is to also introduce a test suite that exhaustively verifies that every diagnostic renders correctly, including argument interpolation across a wide variety of inputs. Message-matching today is a leaky substitute: the existing tests incidentally cover argument interpolation, but only for the diagnostics that happen to be exercised by ad-hoc tests, and coverage is duplicated across thousands of files. ID-matching as a primary tool opens the door to the separation, with rendered-message coverage concentrated where it belongs, and serving as the actual safeguard against rendering bugs.

I plan to follow this PR up with that rendering suite. I kept the initial scope narrow for a better chance of acceptance, but the rendering suite was always an intended second step. The rendering suite plus [id] matching together decouple the test base from diagnostic wording, while ensuring high-quality rendering.

It's true that these days it's very easy to update files en masse, but the reviewer cost across 178 files is real and doesn't go down much with AI assistance. It may arguably even go up, since bulk updates are more opaque and may happen more often.

I agree with your combined-syntax direction. The answer to my earlier question of "do both syntaxes need to match?" is then straightforwardly "yes". An [id]{{substring}} syntax would require both ID and substring match against the rendered output, giving opt-in coverage at the cost of mildly more friction. I can fold that into this PR if you'd like.

Thanks for the thoughtful feedback.

I think the churn is good for reviewers, rather than bad: it lets them easily check what it looks like in various real contexts. If I feel convinced after reading 20 test file diffs I can still choose to skip reading the rest, but if they're not in the diff I don't even have the option. I left a comment on the PR explaining why I'm against going in this direction. I do appreciate the effort in improving our testing frameworks however! I'm quite convinced there are tons of places it can be improved.

3 Likes

I agree that having the diff in front of reviewers is a valuable feature which would be abstracted away by this implementation, and that utils/update-verify-tests.py does a lot of the heavy lifting in the common case. With that in mind, I'm going to close #89194. Appreciate both your and @xwu's time thinking this through! Testing infrastructure is something I care about a fair amount, so I'd be glad to pick up improvements in that area down the line.

update-verify-tests should be better about not touching unrelated whitespace and formatting now! I really like that test diffs are associated with code changes for git blame personally.