[Pitch] Polling Expectations

Something I've found to be of immense utility in tests is the ability to continuously re-evaluate an expression until it passes. Think of monitoring changes made by a background activity when it doesn't make sense to add a callback or similar async mechanism (for example, checking that a ViewModel's property is updated as a result of a network request or database query). This proposal adds a way to check that an expression passes at some point within a given timeout period, as well as a way to check that an expression always passes throughout that period.

Read the full proposal here.

And if you're feeling adventurous, there's a preliminary implementation at swiftlang/swift-testing#1115. To use it, add my fork of Swift Testing as a dependency, specifying the polling-expectations branch (add .package(url: "https://github.com/younata/swift-testing.git", branch: "polling-expectations") to the dependencies list in your Package.swift). Then, when importing Testing, use the Experimental SPI: @_spi(Experimental) import Testing.

6 Likes

First question: does this need to be a macro? Could it instead be something similar to withKnownIssue where it looks out for/intercepts issues? Something like:

await confirmationButNotNamingIsHard(.passes(10)) {
  let potato = try #require(await downloadPotato())
  #expect(potato.isTasty)
}

?

That's a very good point to bring up. This pitch proposes macros without much thought given to alternatives - or to how difficult it is to write & maintain macros :grin:. I was much more concerned with the expected behavior of this.

It really comes down to how often we expect or want people to use this. The test author experience is a bit more seamless if we use macros. One of my complaints about the existing confirmation api is that extra bit of setup required, which pushes me to use other mechanisms over callbacks. On the other hand having that little extra friction might be desirable to prevent overuse. Working with Nimble, I've definitely seen developers default to using polling expectations over standard expectations in order to avoid having to think about which is actually appropriate. Which isn't all that bad, but definitely is a smell. I don't know how to weigh that against the desire to make it easy to use.

It's… well, complicated to say the least. And it affects codegen and documentation in sometimes-unexpected ways (e.g. source location information is less reliable.) So if we can find a way to build API here without using them, that's preferable. That said, the closure-takin' #expect macro implementation is probably already sufficiently built out that adding something like this wouldn't involve much new macro code, if any.

Dunno if that scattered collection of thoughts helps… :melting_face:

1 Like

I like the proposal. :star_struck:

Although polling is not my preferred way of testing for something happening in the future, there are definitely cases where you don't have many alternatives. Yesterday, actually (Instrument Processor kata):

try instrumentProcessor.process()

let deadline = Date().addingTimeInterval(1)
while deadline > Date() {
    if console.output.contains("Error occurred") {
        return
    }

    usleep(1000)
}

Issue.record("Error event was not written to the console")

Note that the 'expectation' is hidden in this piece of code in the condition and return statement.

Having support for this in Swift Testing would make the expectation both more explicit as well as more recognizable. I.e. I could do a pass over all tests that have an #expect(until: ) and refactor when needed.

Is there already a preliminary implementation I can try out?

About PollingBehavior, isn't passesAlways more or less the inverse of passesOnce with an inverted boolean check?

I.e. are these equal?

#expect(until: .passesAlways) {
    f()    // func f() -> Bool
}
#expect(until: .passesOnce) {
    f() == false
}
1 Like

Yes please! The general pattern is extremely common, having a tool for it is extremely welcome. (I use .toEventually heavily in Nimble and miss it when I'm not Nimbling.)

A comprehension question: what does "continuously" mean in "these macros will continuously evaluate the closure"? As each closure execution finishes, is the next one scheduled immediately? Or with a fixed small delay? Or an adaptive delay? Or something else?

The case I'm currently running into very often is with SwiftUI snapshot tests: waiting to run the snapshot until some async processing inside the SwiftUI runtime has reached the state I want. In those cases I can usually find some Observable object whose state change is what I need to wait for. I wonder if an async-sequence-driven version of the pattern might be possible without polling, where the closure is only evaluated when a change to the observable object pushes an update (and also once at timeout) -- would you have any opinions about that variant?

I think this would be useful for UI tests, which I assume will support Swift Testing at some point.

Thanks for the support! I agree that polling often isn't the best, but it is frequently incredibly useful. If there are other avenues for being notified when a change happens, then test authors should use those instead of polling.

Not yet. I am actively working to implement this, and I hope to put up a branch soon.

Not quite, but you're on the right track. In the trivial example (func f() -> Bool { true }), your first example would pass, and your second example would fail. However, both would run until the timeout passes.

So the difference in behavior is:

  • if it returns true, passesOnce stops polling without reporting a failure, while passesAlways continues to poll.
  • if it returns false, passesOnce continues to poll, while passesAlways stops polling and reports a failure.
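To make the difference concrete, here's a minimal sketch of the two behaviors as plain functions. This is illustrative code only - the names mirror the pitch's PollingBehavior cases, but none of it is the proposed implementation:

```swift
import Foundation

// Illustrative sketch only - not the proposed implementation.
// Both loops run until the deadline; they differ in which result
// short-circuits the loop early.

func passesOnce(deadline: Date, _ body: () -> Bool) -> Bool {
    while Date() < deadline {
        if body() { return true } // first success ends polling: pass
        Thread.sleep(forTimeInterval: 0.001)
    }
    return false // timed out without ever passing: fail
}

func passesAlways(deadline: Date, _ body: () -> Bool) -> Bool {
    while Date() < deadline {
        if !body() { return false } // first failure ends polling: fail
        Thread.sleep(forTimeInterval: 0.001)
    }
    return true // never failed before the deadline: pass
}
```

In particular, passesAlways { f() } is not simply passesOnce { !f() }: inverting the condition also inverts which outcome ends the polling early and which outcome is reported.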

:heart: (I'm the person who maintains Nimble, thank you for using it)

My current working branch is re-evaluating the closure each time it finishes (and we're not at one of the stopping points). Nimble's implementation includes configurable logic for a delay between attempts to re-evaluate the closure. I chose to leave that out of this proposal because it's simpler and most people probably don't care. The few bits of code I've seen that did change the default changed it to have 0 delay between evaluation attempts. However, Nimble does not record or report any usage metrics, so I don't know if being able to specify a delay is actually useful for others. Additionally, because this is all handled using Swift Concurrency, we're kind of at the whim of whenever Swift Concurrency schedules these tasks, so it might introduce slight delays here and there - especially considering that Swift Testing tests are themselves tasks scheduled by Swift Concurrency. It's definitely not the case that polling will spin-lock the thread until a stopping point is reached.
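A zero-delay loop under Swift Concurrency might look roughly like this (hypothetical code, not the branch's actual implementation; pollUntilPasses is a made-up name):

```swift
// Hypothetical zero-delay polling loop (pollUntilPasses is a made-up name).
// Task.yield() hands control back to the scheduler between evaluations so
// polling doesn't monopolize a thread in the cooperative pool, but exact
// timing remains up to the Swift Concurrency scheduler.
func pollUntilPasses(timeout: Duration, _ body: () async -> Bool) async -> Bool {
    let clock = ContinuousClock()
    let deadline = clock.now.advanced(by: timeout)
    while clock.now < deadline {
        if await body() { return true }
        await Task.yield() // no fixed delay; just let other tasks run
    }
    return false
}
```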

I can see this as a potential future direction. Or at least something as a tools integration point. It'll be difficult to do this in Swift Testing itself: the Observation system in Swift requires using the Observation module, which is difficult to integrate into Swift Testing because Swift Testing is used to test Swift Package Manager (... am I getting this right, @grynspan?), and that can't have other dependencies.
There does exist the mechanism used for Swift Testing's Foundation overlay, which allows for Swift Testing to provide integrations with Foundation without having a hard dependency on Foundation. But my understanding is that the overlay mechanism is a one-off thing just for this exact case. There is desire for something like the overlay mechanism to go through the evolution process so that other people can use it.
I'd feel much better about waiting for a general overlay system in Swift before adding something to monitor Observables to Swift Testing.

2 Likes

This is also one of the reasons why timeouts, especially fine-grained ones, aren't present in Swift Testing's API surface today.

Our API design also needs to take single-threaded platforms like WASI[1] into account, don't forget.

Iiiit's complicated, but it mostly boils down to "anything that Swift Testing links to can't link to Swift Testing." There's some nuance here, as always, but that's the high-order bit.

Not strictly true, no, and we plan to introduce a few additional overlays for attaching CGImage, UIImage, and NSImage sooner or later. It would be reasonable to look at one for the Observation module if we think there's good API to be had there.

This bit is "inside baseball", as it were, and doesn't directly affect users of Swift Testing or Observation, just us implementer folks. If Swift Testing were normally used as a built-from-source package instead of as a toolchain or Xcode component, then we would just include the code directly in Swift Testing and guard it with a package trait (so that Foundation could disable bits that circularly reference it, for instance.) But because Swift Testing is in the toolchain, it can only really link to things that are also in the toolchain (or the OS).

Observation is in the toolchain, so we're fine there, but if we had libSwiftTesting link to libSwiftObservation, then if Observation's unit tests used Swift Testing, we'd run into the problem where the process has two copies of Observation loaded: the one from the toolchain/OS and the one that was just built for testing. This is bad™, doubly so where the Objective-C runtime is present.

Hence, to break that cycle, we would move code in Swift Testing that links Observation (today: Foundation) into a separate module and then tell the compiler some magic words that mean "when somebody imports the Testing and Observation modules, also import this third module."
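For the curious, those "magic words" are a cross-import overlay declaration. If I recall the mechanism correctly, it's a small YAML file named after the bystanding module, placed in a directory named after the declaring module - something like Testing.swiftcrossimport/Observation.swiftoverlay containing roughly:

```yaml
%YAML 1.2
---
# When both Testing and Observation are imported, also import the
# (hypothetical) _Testing_Observation overlay module.
version: 1
modules:
  - name: _Testing_Observation
```

(The module name _Testing_Observation is made up here by analogy with the existing Foundation overlay; don't take the exact spelling as gospel.)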


  1. Some conditions apply, YMMV, caveat emptor, All Dogs Go to Heaven, etc. ↩︎

3 Likes

I just put up a PR with this - Polling expectations (under Experimental spi) by younata · Pull Request #1115 · swiftlang/swift-testing · GitHub. The code itself could be cleaner, but the feature works.

It's under the Experimental SPI at the moment, so you'll need to do the @_spi(Experimental) import Testing dance.

2 Likes

Thanks both for the detailed explanations! (And of course thank you Rachel for all your years of work on Quick & Nimble.) I'm (still) enthusiastic about the feature now I understand it better, the rest of this is API bikeshedding.

I would say the hardest thing for users of an API like this is keeping track of the temporal semantics: making the correct choice between the four options "is eventually true once (with a timeout) / is always true (within a period) / is eventually false once (with a timeout) / is always false (within a period)". I have a degree in mathematical logic and I still occasionally get it flat-out wrong with Nimble's API; I've regularly resorted to making little ascii-art sketches of timelines to help figure out what we should be doing.

That's to say that your #expect(until: .passesOnce) and #expect(until: .passesAlways) are most definitely an improvement over .toEventually, but I'm still not sure they're the best they could be, and I think getting really good names here would pay off more than most naming bikeshed discussions. (See also above Maarten's question.)

In particular the timeout value is doing really different work in the two cases: for .passesOnce it's a maximum time to wait-and-hope (and it's great to have a default), while for .passesAlways it's a minimum period to verify over: when I use passesAlways I'm definitely slowing my test down by the timeout period, and my personal feeling is that this decision should be explicitly visible.

Much less importantly, but still trending in the same general direction: until: .passesOnce is lovely API, but until: .passesAlways doesn't quite read comfortably for me.

So I'm wondering if the decision to put both of these variants under the same signature might be gently pushing the API in the wrong direction? Using two distinct signatures gives options like

  • #expect(eventually: .passesOnce) which, like the Nimble API, reads beautifully but can't easily cover the "always" case, and
  • #expect(alwaysDuring: .seconds(1)) or #expect(continuouslyDuring: .seconds(1)) which both expresses the semantics more explicitly, and requires the test-writer to explicitly opt in to always slowing their test-run down.

If you're tempted by this direction I can brainstorm more naming suggestions, but is the option on the table at all? Or are you fairly committed to the signature decisions in the pitch?

(I did try to come up with alternatives to until. My best shot is #expect(continuously: .untilPassesOnce(timeout: .seconds(1))) / #expect(continuously: .alwaysPasses(during: .seconds(1))) but it's not any better. I'm not seeing any decent way to get the timeout treated differently in the two cases, being defaulted to once but required for always.)

6 Likes

Very nice!

From this:

let deadline = Date().addingTimeInterval(1)

while deadline > Date() {
    if taskDispatcher.finishedTaskCalled == "Important Task" {
        return
    }

    usleep(1000)
}

Issue.record("Finished task was not called with the correct task \(taskDispatcher.finishedTaskCalled ?? "nil")")

To this:

await #expect(until: .passesOnce) {
    taskDispatcher.finishedTaskCalled == "Important Task"
}
2 Likes

Everything about this pitch is up for debate (and will be again at the proposal stage). I absolutely want to bikeshed about naming because I want to make it as easy as possible for developers to get it right - that's something Nimble got wrong[1], and let's do better here.

I do want to balance names against other wants & needs:

  • I'd like to provide a default timeout period. This is a desire, not a need.
  • I'd like to minimize the number of macros I add to Swift Testing. This is why I combined the two behaviors under a single signature. Though, as you note, this desire might be pushing the API in the wrong direction.
  • To avoid user & compiler confusion, there must be at least one required argument specific to polling which isn't the expression being evaluated. This is a requirement, so long as we make polling an #expect/#require overload.

As an example of something that won't work: if the API is something like #expect(until: .seconds(1), expression: { ... }) and we use the default timeout, then this can shorten to #expect { ... }, which does not indicate that polling will happen at all.

I do agree that until is not the right label, at least not for the passesAlways behavior. until implies "this might fail at first, but later on it will pass", with the PollingBehavior controlling how long we continue to poll after the expression starts to pass. Which is not the case.


  1. Because Nimble has had these APIs for so long, I can't just rename these APIs in Nimble without having an extremely long deprecation period. ↩︎

Maybe having separate signatures for the passes-once and passes-always behaviors outweighs the extra cost of an additional macro? Especially because the passes-once one takes at most the deadline time to run, whereas the passes-always one takes at least the deadline time to run.

If we were to add the timeout as a required argument, then we could do something like:
#expect(passesWithin: .seconds(2)) and #expect(alwaysDuring: .seconds(1)). Providing an actual value for a deadline doesn't sound like a bad idea to me; it makes the user actively think about a reasonable deadline.

Yes, I now think that's the right choice.

This is starting to dovetail with other conversations I've been having about timeouts, and how inappropriate they are. Especially in the environment of Swift Testing's parallel test runner.

Essentially, Swift Testing's test runner aims to minimize total test runtime, but it doesn't so much care about how long each individual test takes. Moreover, you can't really guarantee "consistent" test execution time with parallel runtime - some tests will run on the E core, some will run on the P core. System load is a thing. So many other variables to take into account.

In practice, this means that using a timeout is not the right approach: they'll work if you have only a handful of tests (or you only run a handful at a time), but once you scale up, then your tests will start becoming flakier and flakier.

As an example, this pitch initially specified a default timeout of 1 second. However, when I implemented this and ran the entire Swift Testing test suite, it flaked because #expect(until: .passesOnce) { true } would consistently take >2 seconds to run. Sometimes the test was still reported as passing even though its runtime was >2 seconds - which depends entirely on whether the timeout task or the expression-running task ran first. I then changed the default timeout to a minute, and only earlier today did I realize that timeouts are fundamentally unworkable: there will be very large test suites where even a minute is too short for a trivial #expect(until: .passesOnce) { true }.

I chatted with @grynspan about this earlier, something he suggested is ignoring wall clock times and specifying the number of times to run the expression. Which would basically turn the implementation into a for loop. That's workable, reliable, and easy to implement. But I'm worried that it's going to be difficult for test authors to predict good "number of times to run" values - running a fast expression a million times can happen on the order of milliseconds. Additionally, this drops the ability to cancel the polling if the expression being polled doesn't return quickly enough - though, that is much less of a concern when we have parallel test execution.
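That count-based variant might be sketched like this (hypothetical; pollForIterations is a made-up name):

```swift
// Hypothetical count-based polling (pollForIterations is a made-up name):
// deterministic with respect to wall-clock load, but the "right" iteration
// count is hard for a test author to predict - a fast expression can run
// a million iterations in milliseconds.
func pollForIterations(_ maxIterations: Int, _ body: () async -> Bool) async -> Bool {
    for _ in 0..<maxIterations {
        if await body() { return true }
        await Task.yield()
    }
    return false
}
```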

I've only come to this conclusion today, I don't have further thoughts on how to decide when to stop polling.

That's where I'm at: timeouts are easy for humans to think & reason about, but are inappropriate for the parallel nature of Swift Testing's test runner. The other idea I've seen is very hard for a human to predict. At this point, I think figuring out the mechanism used to decide when to stop polling is more important than the default value for how "long" that mechanism should run for - or if there should even be a default.

2 Likes

Would a suppressAllIssues() function be enough to implement polling? It would mean that the exact polling behavior wouldn't need to be defined inside Swift Testing, since a utility function (or separate package) could be written.

e.g.

await poll(for: .seconds(2)) {
    #expect(object.status == .ready)
}

With poll being a custom function that calls suppressAllIssues in a loop with a Task.sleep().

Does this also apply to the time it takes to run the "critical" part of the test? I.e. the #expect closure? I thought that once you were there in execution of the test code, the time it would take to execute that part would be predictable (at least on a human level of perception of time).

Another solution would be to just forego timeouts/deadlines for .passesOnce tests. I mean, infinite loops/waits can happen in "regular tests" as well. (Wouldn't be the first time I had to cancel running tests because I forgot to update the guard in a while loop :sweat_smile:)

For the .passesAlways case I can't think of a solution. If runtime is so unpredictable, what would 'passes always during one second' mean? Maybe it ran once? Maybe it ran a million times? Maybe not at all? Maybe defining it in terms of execution count is better, but I agree that's very difficult to reason about (for a human).

Whether or not it's predictable depends on if it suspends. If it does not suspend, it runs like any other code. If it does suspend, the runtime and/or OS may schedule other work that preempts it.

I think a suppressAllIssues function should be part of a separate proposal. I additionally think it would be something available under the ForToolsIntegrationOnly SPI, and probably expanded to return the list of all issues captured. We absolutely do not want to give test authors an out for ignoring issues (which is why withKnownIssue behaves the way it does), but I agree that there are cases when tools authors will write code that generates issues which need to be ignored.

Polling makes sense to include in Swift Testing because it's the most basic part of the "wait for something to update" feature set, which the confirmation API started to introduce.

The Testing Workgroup discussed this today. There's still definitely work to be done on this pitch before it can move on, but there is interest in moving forward (from more than just me!)

Some highlights of this discussion:

  • If we do keep the concept of timeouts, then we should disallow timeouts below the level of a minute.
  • My preliminary implementation does not have the evaluation loop use Task.yield (oops), causing it to unnecessarily take up time in the cooperative thread pool. With that fixed, the issues with timeouts still occur in Swift Testing, but timeouts fire much closer to when they should. I think more discussion needs to happen on this, because timeouts are a really useful thing for humans to grasp, and I, at least, do not have better ideas for answering "when do we stop".
  • Without timeouts, figuring out when to stop is a really hard problem.
  • Several people mirrored this feedback, but with a different argument: polling is very brute-force, and should be "somewhat of a thing of last resort". Therefore, making it a macro is perhaps doing a bit much to encourage it. @smontgomery in particular mentioned that this should probably be a variation on/part of the confirmation APIs, which convinced me that this shouldn't be a macro.
  • XCTest kinda has polling expectations through the XCTestExpectation API, which runs one last evaluation after the timeout has occurred. This proposal should consider that.
  • There was a concern about issues with something that either briefly starts passing and then stops - so brief that none of the polling cycles would catch it - which could introduce flakes. This isn't a problem that can be solved by polling, but does indicate that there's more follow-up work on the overall feature set of "checking that a state eventually changes (or not)".
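The XCTest-style "one last evaluation" behavior from the list above can be sketched as a small variation on a timeout-based loop (illustrative only; pollWithFinalCheck is a made-up name):

```swift
// Illustrative variant of a timeout-based loop that, like XCTest, evaluates
// the expression one final time after the deadline passes
// (pollWithFinalCheck is a made-up name).
func pollWithFinalCheck(timeout: Duration, _ body: () async -> Bool) async -> Bool {
    let clock = ContinuousClock()
    let deadline = clock.now.advanced(by: timeout)
    while clock.now < deadline {
        if await body() { return true }
        await Task.yield()
    }
    // One last chance: a state change that lands "just too late" for the
    // polling loop can still count as a pass.
    return await body()
}
```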

From that, I'm going to revise this pitch & the PR I have up. I'll be getting rid of the added macros, and will rename the API to align with confirmation (or something? I'm not certain on that; we can bikeshed).

3 Likes