[Pitch] Test Issue Warnings

I propose introducing a new API to Swift Testing that allows developers to record issues with a specified severity level. By default, all issues will have the severity level “error”, and a new “warning” level will be added to represent less severe issues. A warning recorded on a test will not cause the test to fail, but it will be included in the test results for inspection after the run is complete.
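Here’s a minimal sketch of how this is intended to read (the severity: parameter and the .warning case are the new experimental API; the test and the numbers are made up). The warning shows up in the results, but the test still passes:

@_spi(Experimental) import Testing

@Test func cacheStaysSmall() {
  let cacheCount = 80  // illustrative value

  // Hard failure only above 100 entries.
  #expect(cacheCount <= 100)

  if cacheCount > 50 {
    // With the pitched severity parameter, this issue is recorded in the
    // results but does not fail the test.
    Issue.record("Cache is larger than expected (\(cacheCount) entries)", severity: .warning)
  }
}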

To see my draft proposal, click https://github.com/suzannaratcliff/swift-evolution/blob/suzannaratcliff/issue-severity-warnings/proposals/testing/XXXX-issue-severity-warning.md!

Trying it out

To try this feature out, add a dependency on the main branch of swift-testing to your project:

...
dependencies: [
   .package(url: "https://github.com/swiftlang/swift-testing.git", branch: "main"),
],
...

and to your test target:

.testTarget(...
    ...,
    dependencies: [
       ...,
       .product(name: "Testing", package: "swift-testing")
    ]
),
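For reference, a complete minimal Package.swift with both pieces filled in might look like this (the package and target names are placeholders for your own):

// swift-tools-version: 6.0
import PackageDescription

let package = Package(
  name: "MyPackage",
  dependencies: [
    .package(url: "https://github.com/swiftlang/swift-testing.git", branch: "main"),
  ],
  targets: [
    .target(name: "MyLibrary"),
    .testTarget(
      name: "MyLibraryTests",
      dependencies: [
        "MyLibrary",
        .product(name: "Testing", package: "swift-testing"),
      ]
    ),
  ]
)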

Next, import Swift Testing using @_spi(Experimental): instead of import Testing, write @_spi(Experimental) import Testing.

To record a warning issue:

Issue.record("My comment", severity: .warning)
7 Likes

So I'm a bit on the fence about this one.

Mainly because: what does a warning mean?

  • Is it meant to signify that one test is less important than another one?
  • Is it meant to signify that the test didn't fully pass as expected?
  • Is it meant to signify that a test could be flaky?
  • ...

My concern is that without guidance, this could very quickly lead to "hiding issues" behind warnings. (I like the binary pass/fail behaviour)

On the other hand, "hiding issues" is something that people already do and have multiple ways of doing. So a more explicit expression might be a net plus.

What kind of use cases are you looking at?

2 Likes

I like this idea. (I read the proposal but didn't try the implementation.)

I would focus on the value of logically selecting issues that the test developer thinks test executors should observe (i.e., seeing them in Xcode or flagging them from the command line). Executors can decide for themselves whether warnings are actionable. That is, I wouldn't go too far down the road of replacing logging levels or context capture (much broader APIs/concerns).

The proposal text presents only the string-to-Comment form (i.e., Issue.record("My comment", ...)), but the Issue.record(_ error: Error, ...) form would help with reusing code. Right now, for first-failure data capture, developers can populate their own error with helpful context. If those custom errors could be recorded as warnings, existing context-capture code could be reused. Developers could then also implement their own reporting policies to downgrade a failure to a warning, or vice versa, depending on the context.
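For example, something along these lines (a sketch only; it assumes the error-taking overload would also gain the pitched severity: parameter, and the error type here is made up):

@_spi(Experimental) import Testing

// A custom error that carries context captured at the failure site.
struct SnapshotMismatch: Error, CustomStringConvertible {
  var expectedPath: String
  var actualPath: String
  var description: String {
    "Snapshot mismatch: expected \(expectedPath), actual \(actualPath)"
  }
}

@Test func recordsMismatchAsWarning() {
  let mismatch = SnapshotMismatch(expectedPath: "golden.png", actualPath: "output.png")

  // Reporting policy decided by the test author: under some conditions,
  // downgrade what would normally be a failure to a non-failing warning.
  Issue.record(mismatch, severity: .warning)
}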

The duplication of Issue.isFailure and Issue.severity could be confusing. The only distinguishing severity really is .warning. Severity also creates a third category of status (alongside Issue.Kind and isFailure). (And enums can be hard to grow, as you mentioned in the alternatives.) Did you consider instead a new API Issue.warn(Comment | Error, ...) and then setting isFailure to false? Could you then do without severity?
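Concretely, the call site I'm imagining looks something like this (purely a sketch; I've layered it on the pitched severity parameter just to show the shape, even though the question is whether severity is needed at all, and a real version would forward a source location from the caller):

@_spi(Experimental) import Testing

extension Issue {
  // Hypothetical convenience, not part of the pitch: record a non-failing issue.
  @discardableResult
  static func warn(_ comment: Comment? = nil) -> Issue {
    record(comment, severity: .warning)
  }
}

@Test func example() {
  Issue.warn("Un-asserted state remained after this step")
}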

This has some potential for complication when warnings are issued alongside failures. Currently there's one signal from each test failure (and multiple failures per test). With this proposal, some number of warnings could also accompany each failure or test. It's not clear how the warnings would relate to each failure for a given test (especially given possibly async APIs). And it's possible the same warning would be issued in the context of multiple failures; would Xcode display multiple issues in that case? So perhaps consider how this would be used when issues are disordered, or how consumers of testing output might be confused by non-failing issues (e.g., does one have to enable their output on the command line, or their annotations in VS Code?).

The proposal didn't include a macro #warn(...), which would have the advantage of assembling the evaluation context, avoiding a lot of string munging. Is that a future direction?

1 Like

Hi @suzannaratcliff, I think fleshing out some concrete examples might help people get a better understanding of how this feature is intended to be used. And I would be curious to see a sketch of what some console output could theoretically look like when a “warning” issue is recorded.

If I am understanding this tool correctly, we would definitely make use of it, though I think we would prefer for there to be an “info” setting in addition to “warning” and “error”. @stephencelis and I work on a number of open source libraries that are used by thousands of developers, and we currently use withKnownIssue and XCTExpectFailure to approximate what I think this feature would bring us. Let me try to concretely describe a simplified version of what we do.

We have a few tools in our ecosystem that allow one to “exhaustively” assert on how large pieces of state change after some event is processed. It’s not important to know all of the details of how this code works, but suffice it to say that it allows you to write an assertion like the following:

process(.nextSpeakerButtonTapped) {
  $0.speakerIndex = 1
  $0.secondsElapsed = 30
}
process(.nextSpeakerButtonTapped) {
  $0.speakerIndex = 2
  $0.secondsElapsed = 60
}
process(.nextSpeakerButtonTapped) {
  $0.alert = "This is the last speaker. End meeting early?"
}

The $0 in these trailing closures represents a large piece of state with many fields, and some of those fields are themselves quite complex. Further, it represents the state before the .nextSpeakerButtonTapped event is processed, and it’s our job to mutate it into the state after the event is processed.

And this assertion forces you to exhaustively prove how the full piece of state changes. If you leave off a part:

process(.nextSpeakerButtonTapped) {
  $0.speakerIndex = 2
  // Forgot to assert this: $0.secondsElapsed = 60
}

…you get a nicely formatted test failure letting you know what did not match:

 ❌ Issue recorded: A state change does not match expectation: …

      RecordMeeting.State(
        alert: nil,
    −   secondsElapsed: 30,
    +   secondsElapsed: 60,
        speakerIndex: 2,
        syncUp: SyncUp(…),
        transcript: ""
      )

(Expected: −, Actual: +)

But, sometimes these kinds of exhaustive assertions can be a pain. When the state is very large, and when the event causes many things to change, we have to make a lot of mutations to $0 even though we may only be interested in a small part of it.

This is why we further allow these tools to be configured with a “non-exhaustive” option, which allows you to assert on only the pieces of state you care about. In this mode $0 represents the state after the event is processed, and so any mutations you make to $0 must not actually change the value, otherwise you get an error.

But, to make it clear to the user that there are technically changes left un-asserted, we emit an expected issue/failure letting them know which things were not asserted on:

 ☑️ Issue recorded: A state change does not match expectation: …

      RecordMeeting.State(
        alert: nil,
    −   secondsElapsed: 30,
    +   secondsElapsed: 60,
        speakerIndex: 2,
        syncUp: SyncUp(…),
        transcript: ""
      )

(Expected: −, Actual: +)

I’m using the ☑️ emoji to denote that this is an “issue” that did not actually cause a test failure, but in the Swift Testing logs it looks like a failure unless you really analyze the report:

◇ Test nextSpeaker() started.
✘ Test nextSpeaker() recorded a known issue at RecordMeetingTests.swift:216:19: Issue recorded
✘ Test nextSpeaker() passed after 0.016 seconds with 1 known issue.

Those “✘” icons are very misleading. The test did not actually fail.

And so, if we had this feature, we would convert our withKnownIssue usages to instead report a “warning” issue (though ideally an “info” issue). To us it represents a potential issue, in that there is some un-asserted state lurking in the shadows, but because the assertions run in non-exhaustive mode it is not an actual test failure.

So, overall I’m a big +1!

10 Likes

Hello everyone!

I appreciate all of the feedback so far!

I updated the draft proposal to include 4 use cases and a future direction section.


Two of the use cases that I think are particularly strong for warnings:

  • Warning about a Percentage Discrepancy in Image Comparison:
    • Scenario: When comparing two images to assess their similarity, a warning can be triggered when the pixel match falls below 95%, while an outright test failure is reserved for falling below a 90% similarity threshold.
    • Reason: In practices like snapshot testing, minor changes (such as a timestamp) might cause a discrepancy. Treating a 90% match as a pass keeps the test from failing on that kind of noise, while a warning in the 90–95% range alerts testers that, although the images aren't identical, the test has passed and may warrant further investigation (see the sketch after this list).

and

  • Warning for a retry during setup for a test:
    • Scenario: During test setup, part of your code may be configured to retry; it would be nice to note in the results that a retry happened.
    • Reason: This makes sense as a warning rather than a failure because, if the retry succeeds, the test can still verify the code correctly.
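Here is the sketch of the image comparison use case mentioned above; the pixelMatch helper and the exact thresholds are illustrative, and severity: is the new experimental parameter:

@_spi(Experimental) import Testing

// Illustrative stand-in for a real image-diffing routine.
func pixelMatch(expected: [UInt8], actual: [UInt8]) -> Double {
  guard !expected.isEmpty, expected.count == actual.count else { return 0 }
  let matching = zip(expected, actual).filter { $0.0 == $0.1 }.count
  return Double(matching) / Double(expected.count)
}

@Test func snapshotStaysCloseToReference() {
  let expected: [UInt8] = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
  let actual:   [UInt8] = [0, 10, 21, 30, 40, 50, 60, 70, 80, 90] // one pixel differs

  let similarity = pixelMatch(expected: expected, actual: actual)

  // Below 90%: an ordinary failing issue.
  #expect(similarity >= 0.90)

  // Between 90% and 95%: the test passes, but flag it for later review.
  if similarity >= 0.90 && similarity < 0.95 {
    Issue.record(
      "Images are only \(Int(similarity * 100))% similar",
      severity: .warning
    )
  }
}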

For the future direction section:

  • Yesterday @plemarquand brought up the idea of being able to run in a stricter mode where warnings could fail a test. I thought this was a good idea, so I added it to the future direction section.

  • I think in the future we would like to be able to add other levels of severity, such as an info severity. There is room for this enum to grow, but I am not sure that needs to happen in this initial proposal.

3 Likes