[Pitch] Polling Expectations

Wouldn't using an SPI make it very difficult to integrate a third-party library into other projects? Right now, trying to do an @_spi import gives me the following message:

'@_spi' import of 'Testing' will not include any SPI symbols; 'Testing' was built from the public interface at /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/Library/Frameworks/Testing.framework/Modules/Testing.swiftmodule/arm64-apple-macos.swiftinterface

This is after I've added an explicit dependency on swift-testing to my library project, which seems to make no difference to this warning message.

We always start with SPI (@_spi(Experimental)) until we've formalized and finalized the API. See swift-testing/Documentation/SPI.md at main · swiftlang/swift-testing · GitHub. You should not use experimental Swift Testing interfaces in production code.

In the future we might switch over to package traits instead, but this is how we do it for the moment.

I understand, but the previously suggested ForToolsIntegrationOnly SPI annotation does not have a defined removal date. It also does not seem to be intended to be used by third parties, but only other projects directly connected to Swift (like the Package Manager).

So using the Experimental SPI makes sense, but it doesn't seem like a good idea to permanently hide this function behind an SPI just because it allows developers to handle their issues incorrectly.

As a matter of policy, Swift Testing does not expose tools-oriented API (that is, functions/etc. intended for use when integrating with Xcode, VS Code, etc.) in its public interface. We use @_spi(ForToolsIntegrationOnly) to guard such things. I don't believe @younata intends to hide the proposed API behind this attribute as it's not a tools-specific interface.

If you're referring to this paragraph in the proposal:

> We will expose the polling mechanism under ForToolsIntegrationOnly spi so that tools may integrate with them.

I think she means (not to speak for you, Rachel!) specifically bits and pieces of the feature that Xcode/VS Code/etc. might need access to, but a test author wouldn't need. If you're referring to something else that uses @_spi in this thread, I must have missed it; a citation would help.

Edit: Oh, I assume you mean this reply? It's certainly appropriate to discuss whether such an interface would make sense as public API or not, although I would personally agree with Rachel's position that it should not be public. (When in doubt, don't make something API; we can always add new API, but removing API is difficult-to-impossible.)

All that said, Swift Testing's broader policy around the use of @_spi to guard tools-specific interfaces is beyond the scope of this pitch thread, so if you'd like to discuss it further, let's start a new thread dedicated to that topic. Thanks!

2 Likes

Glad to hear you're on-board with separate signatures for "passes once" and "passes always", I think that's helpful no matter how the macro and timing implementations shake out.

That's quite surprising to me. Many of my own use cases for this are more along the lines of "wait a small number of UI cycles for things to stabilise": either that will pass much earlier than 1 minute later in local-test-time, or it will never pass (because it stabilises and my expectation was wrong). I would not be particularly enthusiastic to have my failing test cases get an automatic minimum-1-minute time penalty!

Do I understand correctly that this is something like:

  1. we start measuring wall-clock time for the timeout, and do not yet poll the expectation even once
  2. by the time the first expectation-poll gets scheduled in, the timeout period on the wall clock has already passed
  3. then test failure or success depends on which of the two tasks (timeout or expectation-poll) executes first.

If that's right, neither option seems quite right: if the timeout task runs first and the test fails, the intuition objects because "it was always true, and nothing in the test made it take too long, this is pure flakiness". But if the expectation-poll task runs first and the test passes, the intuition objects because "if I put a timeout of 1 second, I don't want my test taking 2 seconds unless that test (or code it's testing) includes too much waiting".

If that's right, is there any way to measure something a little closer to "time spent in this test"? I understand that's not a concept to throw around lightly, but it seems a good match for the specific programmer intention for a polling timeout: "I'm instructing you how long you should keep trying to do this before giving up". (It would be amazing if this could limit time spent on the test and all its descendent tasks. That would come even closer to the programmer intuition of "limit the time you spend doing this".) There's still a surprise in store, if the time-in-test-tasks is limited to 1 second and the wall-clock time shows the test took 2 seconds, but perhaps that's addressable with careful naming and documentation.

1 Like

I need to get up to date on the Swift Testing metadata/annotation story, but if we could set a default timeout at the test/suite/suite-of-suites/etc level, I think that would give a nice level of configurability: if I'm in a huge suite and need to raise my default I have a mechanism, while if I don't need that, I can get the benefit of faster failures.

Something of possible relevance to the timeout discussion: Add an overload of `confirmation()` with a timeout by grynspan · Pull Request #789 · swiftlang/swift-testing · GitHub

You are correct. This problem is basically caused by 2 things:

The first is that Swift Testing's test runner is essentially this:

await withTaskGroup { taskGroup in
    for test in allTests {
        taskGroup.addTask { await test.run() }
    }
    await taskGroup.waitForAll()
}

This means that, if you have 300 tests, then 300 tasks will be submitted to the concurrency system. This was chosen both because it's the simplest implementation and because it optimizes total test runtime. It also has a number of downsides (for one, it pessimizes memory consumption once you start having async tests), including making timing ever more unreliable the larger your test suite is. That said, changing how the test runner works is outside the scope of this pitch.

The second is that my implementation scheduled 2 tasks simultaneously, which looked something like:

await withTaskGroup { taskGroup in
    taskGroup.addTask {
        try? await Task.sleep(for: timeout) // try? since a child task of a non-throwing group can't throw
        return PollingResult.timedOut
    }
    taskGroup.addTask {
        while !Task.isCancelled {
            if await closureToBePolled() {
                return PollingResult.success
            }
            await Task.yield() // originally, this wasn't here. Adding it later helped somewhat, but didn't solve the problem.
        }
        return PollingResult.timedOut
    }
    defer { taskGroup.cancelAll() }
    return await taskGroup.next()!
}

(Side note: I like this implementation a lot, specifically because it can allow us to handle the case when the closure being polled stalls/doesn't return even once.)

Combined with how the test runner works, this means once you start a polling expectation, you are now at the mercy of the task scheduler to decide which tasks should be run, and on what processor they should be run on. Meaning either subtask could finish first and there's very little we can do to influence that. On the M1 Max I code on, running the full test suite for Swift Testing results in polling expectations having multiple-second variations in runtimes.

We could change the polling implementation to something like:

let clock = ContinuousClock()
let endTime = clock.now.advanced(by: timeout)
while clock.now < endTime && Task.isCancelled == false {
    if await closureToBePolled() {
        return PollResult.success
    }
    await Task.yield()
}
return PollResult.failed

This would involve stalling the test runner if closureToBePolled stalls, which is inelegant but not that big of a problem for parallel test execution. The only thing we gain with this approach is a guarantee that closureToBePolled() will be called at least once before any timeout. However, in systems under heavy enough load, that just means you'll end up with test flakes, as the closure could run once, fail, and never run again. This is also possible even if we were to entirely disable parallel in-process test execution (like how XCTest works), just much less likely (but still possible, especially on CI/in VMs).

From my understanding of the concurrency system, this isn't something we can really do. To my knowledge, there isn't a way to get information on time spent waiting for async code to run vs. time spent running the async code. We'd have to restrict the polling to only poll synchronous closures and add up the time between the await Task.yield() calls manually, neither of which I think is worth doing.
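
For what it's worth, a minimal sketch of what that manual bookkeeping could look like for a synchronous closure, counting only the time spent actually evaluating it (everything here is illustrative, not proposed API):

func pollCountingOnlyEvaluationTime(
    budget: Duration,
    _ body: () -> Bool
) async -> Bool {
    let clock = ContinuousClock()
    var spent = Duration.zero
    while spent < budget && !Task.isCancelled {
        let start = clock.now
        let passed = body() // synchronous, so nothing can suspend in here
        spent += start.duration(to: clock.now) // count only evaluation time
        if passed { return true }
        await Task.yield() // time spent suspended here is deliberately not counted
    }
    return false
}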

1 Like

Yesterday, I updated both the proposal document & the pitch document based on this. The main changes are:

  • Removed the #expect/#require macros.
  • Added new confirmPassesEventually and confirmAlwaysPasses functions (plus variants that evaluate based on a closure returning an optional); a usage sketch follows this list.
  • Removed timeouts; instead, polling now executes the closure up to 1 million times before giving up (taking the place of a "timeout").
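
For concreteness, here's a rough sketch of how the two new functions might read at a call site. The Recorder actor is a stand-in for whatever is under test, and the signatures are still in flux, so treat the exact spellings as hypothetical:

import Testing

actor Recorder {
    private(set) var recordCount = 0
    func record() { recordCount += 1 }
}

@Test func recordsExactlyOnce() async {
    let recorder = Recorder()
    Task { await recorder.record() }

    // Succeeds the first time the closure returns true;
    // records an issue if it never does before polling stops.
    await confirmPassesEventually {
        await recorder.recordCount == 1
    }

    // Records an issue if the closure ever returns false while polling.
    await confirmAlwaysPasses {
        await recorder.recordCount <= 1
    }
}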

Some additional things I want to improve upon:

  • Naming of these functions. I think that what I chose is fine and reads nicely, but isn't the best.
  • I want to add an equivalent for require, but I don't know what to name it. My first thoughts were requirePassesEventually and requireAlwaysPasses, but those don't read nearly as nicely.
  • When to cut off polling: 1 million was chosen arbitrarily based on how long it takes fast-running closures to run, but it isn't appropriate for closures that take longer. I could add an argument for the number of iterations to stop at, which shifts the burden to the test author, but that doesn't really solve the overall problem of making it easy to figure out when to stop. My early thoughts are to add a type (enum? protocol?) to represent the cutoff strategy, with the default utilizing number of iterations, but these aren't fleshed out all that well (see the sketch after this list).
  • With the lack of timeouts, I wonder if adding a test-author-specifiable delay between polling attempts would be worth doing. This way, test authors could say "attempt polling 1000 times, but wait at least a millisecond between polling". Which approximates the "Poll this for 1 second" behavior that I & others want, while bypassing the timeout problem. This does feel like a bit of a hack.
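
To make the cutoff-strategy idea from the list above slightly more concrete, here is one possible shape, with the default expressed as an iteration count. All names here are hypothetical, not part of the proposal:

protocol PollingCutoffStrategy {
    /// Called after each polling iteration; return false to stop polling.
    func shouldContinue(afterIteration iteration: Int) -> Bool
}

/// A default strategy: stop after a fixed number of iterations.
struct MaximumIterations: PollingCutoffStrategy {
    var limit: Int
    func shouldContinue(afterIteration iteration: Int) -> Bool {
        iteration < limit
    }
}
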
1 Like

I could see tracking the time manually as worth it, but sync-only would be terrible. Oh well.

I think the combination of max-number-of-iterations and min-time-between-iterations would get me to the spot I'm looking for: I can set both values quite small and a legit failing test won't massively slow down my run. If it also gets you the robustness under heavy concurrency load (by raising either I think, or both), seems like a win!

1 Like

I'm not as convinced by confirmAlwaysPasses<R> as by the rest. It seems like it can easily cause confusion if the iterations are not literally returning the same object: the semantics is clear enough ("return the value from the last time body was invoked") but I can still see it being a source of confusion. It should be possible to get (very nearly) the same behaviour even if you didn't provide that one:

confirmAlwaysPasses {
    getTheThing() != nil
}
let theLatestThing = getTheThing()

1 Like

Naming is hard!

The more I think about this the more I like just eventually for the passes-once family of variants: confirmEventually and requireEventually read well for both the Bool-returning and value-returning cases (whereas confirmPassesEventually is odd when returning the value).

The tricky one remains the continues-to-pass family. Maybe it helps to brainstorm options for expressing those configuration parameters?

  • confirmAlways(times: 1000, waitingAtLeast: .milliseconds(1))
  • confirmAlwaysPasses(times: 1000, waiting: .milliseconds(1))
  • confirmAlways(polling: .times(1000, minTimeBetweenIterations: .milliseconds(1)))
  • confirmContinuously(times: 1000, minWaitTime: .milliseconds(1))
  • confirmContinuouslyPasses(times: 1000, minWaitTime: .milliseconds(1))

With the configuration visible I lean gently towards confirmContinuously I think: "confirm continuously 1000 times" rather than "confirm always 1000 times".

Do you want users forced to spell out the polling configuration? I feel like that's justifiable for the continues-to-pass family (as in "you're guaranteeing your tests take at minimum X extra time to run: let's make that visible"), not so much for the pass-eventually family. If they're not required here, being able to override the global default with a test scoping trait would be really nice.

Hi Rachel,

I tried out the new API. I believe not using timeouts, and using a maximum execution count as a guard is a good idea. I do miss the explicit #expect lines in the tests. They are the first thing I try and find in a test. Though I can get used to scanning for these function names as well.

Would it be a good idea to be able to provide a maximum execution count as an optional parameter? Because not all 1-million-executions are the same.

The confirmAlways/confirmContinuously functions I find a bit harder. A use case I can imagine is that you play an animation and you want to validate that during the time the animation plays a certain state is valid. For instance, that a button is disabled. Though I wonder if confirmAlways/confirmContinuously provides enough confidence. I could also test for the state changes I guess. First that eventually I go from enabled -> disabled and then from disabled -> enabled. Maybe you and @tikitu have other use cases in mind?

1 Like

Thanks! I do like the idea of keeping the naming separate from the #expect/#require macros, to specifically discourage overuse of polling. Not having to write and maintain more macros is also a benefit (especially for code using a non-toolchain version of Swift Testing, where macros significantly degrade build times).

My personal use case for confirmAlwaysPasses is much more to check that something doesn't happen. Frequently, when I write a test to verify that some API is called when a conditional evaluates to true, I will write a counterpart test verifying that the API is not called when the conditional is false.

Fair. I kept that because someone might have an actual use case, which isn't a strong reason to keep it.

Thank you for these. Assuming this goes to proposal review, I fully expect the names of these functions to be bikeshedded even more during that review. (For consistency with the rest of this post, I'm going to continue to use passesEventually and alwaysPasses. Please don't take that to be a dismissal of your suggestions!)

My current thoughts for a more fleshed-out signature are variations on

@discardableResult func confirmPassesEventually( // to be bikeshedded
    // comment
    maxPollingIterations: Int = 1000,
    minimumPollingInterval: Duration = .milliseconds(1),
    // sourceLocation, etc.
    expression: () async throws -> Bool
) async -> Bool

With the default values being suggestions at this point, and not something I'm particularly committed to. Which goes into...

I think it's important to provide decent defaults for all cases. I expect there to be a bunch of confusion from test authors around the lack of explicit timeouts, as one of the weak points of my proposal is explaining why those can't work. There's an argument to be made that requiring test authors to specify polling configuration will make it harder to use and thus push test authors to other, likely more appropriate forms of monitoring state (e.g. the existing confirmation APIs). However, I think that not providing defaults makes it significantly easier for test authors to misuse polling (e.g. setting the polling interval to 0 and using a small polling iteration count, which will likely result in a high rate of test flakes). Providing good defaults offers guidance by making it harder to misuse this API.

In addition to providing call-site configuration, I think that Swift Testing should allow test authors to configure polling defaults using test/suite traits. Furthermore, I think that, while they might default to the same value, test authors should be able to provide different polling configuration values for passesEventually and alwaysPasses behaviors. When I use Nimble, I use much shorter timeouts for toAlways/toNever than I do for toEventually/toEventuallyNot, and similarly, test authors might want to use lower polling iteration counts for alwaysPasses than they do for passesEventually.

1 Like

I really like where you've gotten to with this pitch! Couple of detailed responses below, but I wanted to say out loud that (a) I'm just as pleased as at pitch-start that you're tackling the problem and (b) I'm very satisfied by now with the detailed shape of the APIs you're proposing.

Yup, this is a subtle and complex thing to have to explain. I expect the current proposal text (under "alternatives considered") will invite a lot of disagreement in review, from people not seeing why but also not ready to accept "are inherently unreliable" without more detail. Maybe something like this could be useful:

Because the test runner schedules multiple tests concurrently, there's no way to choose a timeout value that is reasonable for a single test in isolation: the scheduler might offer any amount of wall-clock time to other tests before getting around to the work that this particular test was waiting for. And the details of this scheduling are highly situation-dependent and variable (think available CPU cores but also transitory factors like system load), making it impossible in practice to choose reliable timeout values.

I see the logic, I think. I want my always defaults as low as I can get away with (they buy me worst-case stability but always cost me test time) while I'm happy to raise my eventually defaults (which also buy me worst-case stability, but only cost worst-case test time which is exactly the case where I want the stability).

I had a look back at where I've tried to do things like this with Nimble (these were always the cases that made me think really hard): they're pretty much all similar to Rachel's example, something that I would want to phrase as "expect(...).toNever(...)". And, again like Rachel's, they often come as a pair with a positive test in one scenario ("given the user has granted permission, when the trigger occurs then we should download the thingy") and a negative test in another ("given the user has not granted permission, when the trigger appears then we should not download the thingy"). The naive version tests immediately and passes even on buggy code, if the thing that is supposed to not happen might come asynchronously "just a little later". In Nimble terms I tend to rephrase these into something like expect(theFlow).toEventually(complete()) ; expect(theThingThatShouldNotHappen).to(notHaveHappened()) where you search for an explicit end-state to wait for (with polling), and only then check that in the interim something did not happen. And that doesn't work out when there isn't a clear end-state to wait for, and you really want to say something timeout-ish like "keep trying for a few seconds, if it didn't happen by then we're probably good": I definitely like that this pitch offers a good way to say this, and thinking it through with y'all has also made me more conscious that I should probably use the alternatives whenever I can.
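
Translated into the functions pitched here, that rephrasing might look something like the sketch below. PermissionDeniedFlow is a made-up stand-in for the system under test; the point is just the shape of "poll for an end state, then check the negative once":

actor PermissionDeniedFlow { // hypothetical stand-in for the system under test
    private(set) var isComplete = false
    private(set) var didDownloadThingy = false
    func trigger() {
        // Permission was not granted, so nothing is downloaded;
        // imagine this finishing asynchronously in real code.
        isComplete = true
    }
}

@Test func doesNotDownloadWithoutPermission() async {
    let flow = PermissionDeniedFlow()
    await flow.trigger()

    // Poll until the flow reaches a definite end state...
    await confirmPassesEventually {
        await flow.isComplete
    }

    // ...and only then check, once, that the thing that should not happen didn't happen.
    #expect(await flow.didDownloadThingy == false)
}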

1 Like

So, I admit I haven't been following this thread as closely as I should, but have we discussed the pros/cons of timeouts vs. counts? Like, "try doing this 10 times" vs. "keep trying this for 10 seconds"? I realize that the repeat count may be hard for a test author to predict, but I would argue the same is true of a timeout (ignoring the issues around concurrently-running tests.)

1 Like

Test authors only need to predict a good-enough value, and timeouts are much better than counts for that.

Assuming a single-threaded test execution model a la XCTest:

Some timeouts are easy to predict (wait for something [an animation] with a fixed duration), some less so (wait for a very fast closure to complete, wait for some computationally-expensive closure to partially or fully complete). In both of those examples, you can fairly easily come up with a good-enough timeout (though waiting for a computationally-expensive task will likely require trial-and-error, or using some other mechanism for observing the state change).

With only counts (i.e. no polling interval component), the only example I can think of is waiting for a very small closure to run (e.g. Task { self.value = 1 }). Even then, I only have the confidence to guess that the count for polling this should be something on the order of 10. The other examples are extremely hard to figure out, or very convoluted. Waiting for some computationally-expensive task is difficult enough that I would retool the production code to allow me to use some other means to observe the desired state change. Waiting for a task with a fixed duration would be done by first sleeping the test until the duration has passed (plus a decent buffer), and then either polling or not bothering with polling because that sleep should have been enough anyway. In both of these cases, coming up with a good-enough count is extremely difficult.

Adding the polling interval components makes counts viable, by turning them into something like timeouts.

I updated my implementation with what I think might be the final bits of logic for this.

I added requirePassesEventually and requireAlwaysPasses, which behave like their confirm counterparts, only they throw instead of recording an issue if polling fails. Like their confirm counterparts, these still are silent if polling is cancelled.

I'm not too jazzed about their names. I'd like to have them mention confirm or confirmation somewhere. Additionally, test authors might confuse them with the #require macro. But I wanted to add the logic first and we can discuss the name.

This adds a delay period between polling attempts (pollingInterval), which defaults to 1 millisecond but is configurable by test authors. The number of times to poll is now also configurable by test authors and defaults to 1,000 instead of 1,000,000. Finally, I added 2 traits: ConfirmPassesEventuallyConfigurationTrait and ConfirmAlwaysPassesConfigurationTrait (with static methods on Trait doing the right thing: confirmPassesEventuallyDefaults and confirmAlwaysPassesDefaults respectively) for specifying pollingInterval and maxPollingIterations at the Suite and Test level.
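
To illustrate, applying those suite-level defaults might look something like the following. The parameter labels are my guess based on the names above, so treat the exact spelling as approximate:

@Suite(
    .confirmPassesEventuallyDefaults(
        maxPollingIterations: 10_000,
        pollingInterval: .milliseconds(5)
    ),
    .confirmAlwaysPassesDefaults(
        maxPollingIterations: 100,
        pollingInterval: .milliseconds(1)
    )
)
struct SlowSubsystemTests {
    @Test func eventuallyRecords() async {
        let recorder = Recorder() // the stand-in actor from the earlier sketch
        Task { await recorder.record() }
        // Uses the suite-level polling defaults unless overridden at the call site.
        await confirmPassesEventually { await recorder.recordCount > 0 }
    }
}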

The new traits added are not well named (they're descriptive, but could be shorter), and I would love help with naming them.

In addition to allowing test authors to configure those parts of polling, I added some guardrails - there's now a precondition for maxPollingIterations > 0 and pollingInterval > Duration.zero (pollingInterval should be > Duration.zero for the same reason it should exist in the first place - to make it easier for test authors to come up with a good-enough guess for when to stop polling).

I updated the proposal document with details behind what all I changed in the implementation.

One last thing: Because this feature is now an extension of the confirmation apis instead of #expect()/#require(), I renamed the feature to Polling Confirmations. I am inclined to leave this pitch thread named polling expectations for those who've already clicked on it. But if this pitch becomes a proposal, then it should be called Polling Confirmations.

1 Like

At this point, I feel really good about where both the implementation and proposal are.

A short summary of the changes from my last update:

  • Dropped most of the public API, down to 2 methods (both confirmation(_:until:within:pollingEvery:isolation:sourceLocation:_:)), an enum type with 2 cases (what was the PollingBehavior, now called PollingStopCondition), and the previous error case.
  • Renamed from confirmPassesEventually/confirmAlwaysPasses to the more API-consistent confirmation. The passes eventually/always passes behavior is now captured in PollingStopCondition, which currently has the values .firstPass and .stopsPassing (a call-site sketch follows this list).
    • This was driven by the desire to use the confirmation name. I need to somehow distinguish polling from other forms of confirmation outside of the body closure. The other arguments unique to polling are the duration & timeout, which I don't want to make required. Additionally, I needed some way to differentiate the (currently) two stop conditions. confirmation(until: .firstPass) and confirmation(until: .stopsPassing) fulfill both requirements. Plus they read really nicely (to me), satisfying that (un?)official guideline to have APIs read like they could be English sentences.
    • Additionally, putting this behavior into the PollingStopCondition enum allows for future expansion of the enum without exponentially increasing the number of confirmation functions. The proposal document already mentions a combination of firstPass and stopsPassing, as well as custom stop conditions as future directions to consider.
  • All polling confirmations will throw an error when they fail. This is a break from the behavior of the existing confirmation APIs, but I think this is a good decision because test authors will more often want to block further execution of the program if polling fails. I think this is something the other confirmation APIs should do as well.
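
As a call-site sketch of the above (the labels and behavior are my reading of the signature and stop conditions described in this list, so treat the details as approximate):

@Test func recorderBehavesAsExpected() async throws {
    let recorder = Recorder() // the stand-in actor from the earlier sketch
    Task { await recorder.record() }

    // Succeeds the first time the body returns true; throws if polling stops first.
    try await confirmation(until: .firstPass) {
        await recorder.recordCount > 0
    }

    // Confirms the body keeps returning true for the whole polling run;
    // throws if it ever returns false.
    try await confirmation(until: .stopsPassing) {
        await recorder.recordCount <= 1
    }
}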

Again, I think this is in a really good state right now, and I'm ready to move forward with this.

3 Likes

Hi Rachel,

This looks really good:

  • excellent explanation of workings in the proposal doc
  • clean, consistent API
  • reasonable defaults

Looking forward to discussing the next steps during our next workgroup meeting.

+1

1 Like