Swift Testing includes an interface for checking that some asynchronous event occurs a given number of times (typically exactly once or never at all). This proposal enhances that interface to allow arbitrary ranges of event counts, so that a test can be written against code that may not always fire said event the exact same number of times.
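For readers who have not seen the interface, here is a minimal sketch of what a range-based confirmation could look like. The `Uploader` type and its `onAttempt` callback are hypothetical stand-ins invented for illustration; only `confirmation(_:expectedCount:_:)` comes from Swift Testing, and the range argument assumes the shape described in the proposal.

```swift
import Testing

// Hypothetical system under test: retries an operation a random number of
// times (1 to 3) and reports each attempt through a callback.
struct Uploader {
    var onAttempt: @Sendable () -> Void = {}

    @discardableResult
    func upload() -> Int {
        let attempts = Int.random(in: 1...3)
        for _ in 0..<attempts {
            onAttempt()
        }
        return attempts
    }
}

@Test func uploadMakesOneToThreeAttempts() async {
    // Passes when the event is confirmed 1, 2, or 3 times; any other count
    // is recorded as an issue.
    await confirmation("upload attempted", expectedCount: 1...3) { attempted in
        var uploader = Uploader()
        uploader.onAttempt = { attempted() }
        uploader.upload()
    }
}
```

With the existing integer form, the same test would have to pin a single exact count, which this (deliberately random) uploader cannot guarantee.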
Hmm… I'm not completely sure how testing patterns from other ecosystems handle situations like this by convention (and I would like to hear if anyone has any ideas or context for that).
I'm thinking that there's a subtle shift in thinking implied here… where we are now giving a subset of tests (the confirmation) the opportunity to be "flaky" by design. It's not necessarily a "bad" thing… but we already formally define withKnownIssue to mark code that can fail without causing the test run to fail. One way to look at this is as multiple parallel "versions" of flakiness. Should a confirmation that might fire a varying number of times be wrapped in a withKnownIssue or wrapped with a new range-based confirmation? Would we make any kind of opinionated statement about how these two different approaches work together?
Was there discussion of a parallel API on confirmation that transitions away from an expectedCount integer and toward something like an arbitrary closure that could be defined ad hoc by the engineer building the tests? If an engineer truly wanted to build more flexible tests, could that be a legitimate alternative?
The way I interpret this API is as a solution to managing inconsistencies or complexity in a dependency you can't change. I think this is a very specific tool for a specific application of test code, and in general, strict values should be specified.
I would also go so far as to say that using this proposal to avoid specifying an exact expectation count, out of concern that the source code might change later, is an abuse of the API. In that case, if a system changes such that a confirmation is called more than once, then the test should be updated to strictly refer to that new count, rather than some range being specified so the source-under-test can be arbitrarily changed. That's a recipe for letting bugs go by unnoticed and I don't endorse it.
Flaky tests are a fact of life, but this API is neither intended nor designed to facilitate them. If you have a flaky test, we do already have withKnownIssue(isIntermittent: true) {} that will let you model it. We view these APIs as orthogonal to each other.
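To make the distinction concrete, here is a rough sketch that puts the two tools side by side. `warmCache()`, `CacheError`, and `poll(onTick:)` are hypothetical helpers made up for this example; the range-based `expectedCount:` argument assumes the interface as proposed.

```swift
import Testing

// Hypothetical: an operation with a known bug that only sometimes reproduces.
struct CacheError: Error {}
func warmCache() throws {
    if Bool.random() { throw CacheError() }
}

// Hypothetical: an operation whose event count legitimately varies (2 to 4).
func poll(onTick: () -> Void) {
    for _ in 0..<Int.random(in: 2...4) {
        onTick()
    }
}

@Test func cacheWarmsUp() {
    // A failure here is a bug we already know about. With
    // `isIntermittent: true`, the test does not fail when the error is thrown,
    // and it also does not fail when no issue occurs.
    withKnownIssue("warmCache() sporadically throws", isIntermittent: true) {
        try warmCache()
    }
}

@Test func pollerTicksTwoToFourTimes() async {
    // Nothing is broken here: any count from 2 through 4 is correct
    // behavior, so the expectation itself is expressed as a range.
    await confirmation("poller ticked", expectedCount: 2...4) { ticked in
        poll(onTick: { ticked() })
    }
}
```

The first test documents a defect; the second documents a specification that happens to allow several outcomes.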
I suspect the example I wrote up in the pitch regarding .mouseClicked didn't do a great job capturing the scenarios where this API is useful. Non-determinism (randomness) is a real aspect of real-world software development, and it often causes tests to be flaky because the tests are not written to be resilient against that non-determinism. @chefski touched on this in his reply (and, I'll note, sassed me while doing so!). He has a lot more experience than I do working with non-deterministic tests and may be able to give some real-world examples.
We toyed with other mechanisms for adding flexibility, but I don't think we specifically considered a closure in this case. I don't think we'd want to build out such a feature; I'm struggling to picture how it would differ from just writing a test function. The purpose of this proposed API is not to allow arbitrary pass/fail semantics for confirmation() but to allow it to be used in non-deterministic test scenarios (as discussed above).
Timeouts are not part of this proposal. We've discussed timeouts previously in several threads in this topic; there's probably no need to rehash those conversations here.
Thank you @grynspan! This is a useful enhancement to confirmation() and offers a conceptual analogue to some of XCTest's XCTestExpectation APIs.
Since it was originally posted, the proposal and its implementation PR have evolved slightly, which is worth calling out. One piece of feedback I gave was that passing a RangeExpression literal which lacks a lower bound, such as ...10, could lead to surprising behavior because such a range includes 0, meaning that a confirmation() would succeed even if never confirmed. Jonathan amended the feature to require explicit lower bounds for this reason; see "Ranges without lower bounds" in the proposal for more details.
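A short sketch of what that amendment means in practice, as I understand it: the caller has to spell out the lower bound, so accepting zero confirmations is a visible choice. `drain(_:onItem:)` is a hypothetical helper for illustration, and the commented-out call is shown only to mark the form that the amended feature is expected to reject.

```swift
import Testing

// Hypothetical helper: hands each queued item to a callback.
func drain(_ queue: [Int], onItem: (Int) -> Void) {
    queue.forEach(onItem)
}

@Test func drainsAtMostTenItems() async {
    // Random batch of 0 to 10 items (may be empty).
    let batch = Array(0..<Int.random(in: 0...10))

    // Expected to be rejected under the amended proposal, since `...10`
    // silently includes 0:
    // await confirmation(expectedCount: ...10) { _ in }

    // Writing `0...10` makes "never confirmed" an explicit, reviewable choice.
    await confirmation("item drained", expectedCount: 0...10) { drained in
        drain(batch, onItem: { _ in drained() })
    }
}
```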
One question from this pitch thread was whether this API might allow tests to become under-specified and lead to flaky behavior. While this is possible, we think there are valid reasons to need to specify a min and/or max number of expected confirmations, and in particular, the ability to specify "at least N" (without any upper bound) can be quite useful. Analogous functionality in XCTest (expectedFulfillmentCount) has existed for many years and we have not observed that to be problematic.
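As an illustration of the "at least N" case, here is a minimal sketch using a range with no upper bound. `download(onProgress:)` is a hypothetical function invented for this example, and the `2...` argument assumes the range-based interface as accepted.

```swift
import Testing

// Hypothetical: reports progress once per chunk; the chunk count varies
// between 2 and 8.
func download(onProgress: (Double) -> Void) {
    let chunks = Int.random(in: 2...8)
    for i in 1...chunks {
        onProgress(Double(i) / Double(chunks))
    }
}

@Test func reportsProgressAtLeastTwice() async {
    // `2...` has an explicit lower bound and no upper bound: two or more
    // confirmations pass, while zero or one is recorded as an issue.
    await confirmation("progress reported", expectedCount: 2...) { progressed in
        download(onProgress: { _ in progressed() })
    }
}
```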
This proposal is accepted. Thank you to everyone who gave feedback!