Brainstorming customizing matchers

allevato · September 22, 2023, 3:52pm

Spawning my comments here into a new topic.

My experience with matchers mostly comes from GoogleTest in C++ and Truth in Java. Right off the bat, swift-testing provides a much nicer syntax for predicate-like assertions than those frameworks due to the fact that the language and macros let you just write what you mean. #expect(x == y) is so much cleaner than EXPECT_THAT(x, Eq(y)).

That's also the appeal of Nimble's operator overloads, although their syntax is slightly different since swift-testing has the advantage of being able to parse the actual expression and transform it. And Nimble also falls back to traditional(matcher(syntax)) when you go outside those common operations.

So (as directed) I wanted to start this thread to brainstorm ideas for how swift-testing might tackle similar problems around customization while sticking as close as possible to the elegant #expect(whatIWantToTest) that you've created. There are two related problems to tackle:

How do I express a specific kind of test that isn't something innate like ==?
How do I provide meaningful feedback about the failure to the test runner?

Looking over some tests I've written recently, I've had to reach for the following things that aren't easily expressed as traditional binary operator expressions:

expect that some array contains a specific subsequence
expect that two collections have the same elements, disregarding order (the collections themselves may be ordered, like arrays)
expect that some collection is a superset of some other collection, disregarding order
expect that two protobuf messages are equal, ignoring some fields

Another complication with traditional matchers is that they can be composed. Most matcher frameworks let you write things like "expect that every element in this collection satisfies some matcher" or "expect that this collection is a superset of elements satisfying these matchers".

Many of the cases above could be written using various collection methods, but if all the testing framework ends up seeing is the Boolean result, all context is lost—"expected true, got false" is less helpful than, say, showing what the actual collection contained. The docs cite #expect(x == y) as a case where the macro can pick apart the values and provide that helpful context.

Figuring out how to generalize what you've done there for other arbitrary expressions would be really powerful. If I write #expect(x.hasPrefix(y)), I'd want the test output to tell me what x and y are and that it was a prefix test that failed. You have that done already, which is lovely! But I wonder if we'd ever want an expectation's failure to have more context about the specific operation that was performed. That's easy to do when matchers have to be their own unique functions/types; the SomeCondition in EXPECT_THAT(x, SomeCondition(y)) can do whatever it wants without the testing framework having to be specially aware of it, but I'm curious how we could extend swift-testing to have hooks for that kind of additional context without twisting the expectation's call site.

I'm really excited to hear the swift-testing team's thoughts on this!

Jon_Shier · September 22, 2023, 4:34pm

Personally, I care less about the matching syntax (still just using XCTest here) than about the failure messaging. If proper messaging can't be supported by the clever use of Swift syntax in the macro, then that syntax shouldn't be used. Mostly this should include full diff support (like we can get from the PointFree testing tools) so that #expect(x == y) can tell us exactly what didn't match between the two values.

Producing diffs may also be useful for other features, like your "equal except for a few properties" case, to create something like #expect(x == y).except(\.first, \.second). Of course, such usage would probably be much easier (unless the meaning of except was dynamic) if there were explicit matchers. #expectEqual(x, y).except(\.first) may work.

In general, though, I'd expect the testing framework to include not just the general syntax but additional tools that can be expressed in tests. For example, your superset case could be #expect(x.isSuperset(of: y)) or whatever the normal collection syntax is, even for collections that aren't Sets. And these sorts of utilities should be useful outside the matchers themselves, in case I want to dynamically produce a diff as part of my testing.

allevato · September 22, 2023, 4:46pm

This is a good example of what I'm driving at, because in the case of #expect(x.isSuperset(of: y)), I don't care about the exact values of x and y as much as I care about the difference between them. Right now, swift-testing gives the following output:

@Test func superset() {
  let x: Set = [1, 2, 3, 4, 5]
  let y: Set = [2, 3, 6]
  #expect(y.isSuperset(of: x))
    // Expectation failed: (y → [2, 6, 3]).isSuperset(of: x → [1, 2, 4, 5, 3])
}

This is already a great improvement over XCTest, but in this example, I'd love to see something like "x contains [1, 4, 5], which are not in y"—especially if one or both of the collections I'm testing are large.
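As a rough illustration of the output being asked for here, the missing elements can be computed with a plain set difference. The helper name supersetFailure is hypothetical and is not part of swift-testing; this is only a sketch of the kind of diagnostic a framework hook might produce.

```swift
// Hypothetical helper (not a swift-testing API): explains why
// `candidate.isSuperset(of: other)` would fail.
func supersetFailure<T: Hashable>(_ candidate: Set<T>, of other: Set<T>) -> String? {
  // Elements the candidate would need in order to be a superset of `other`.
  let missing = other.subtracting(candidate)
  guard !missing.isEmpty else { return nil }  // nil means the check would pass
  // Sets are unordered, so sort the descriptions to keep the message stable.
  let listed = missing.map { String(describing: $0) }.sorted().joined(separator: ", ")
  return "missing elements: [\(listed)]"
}

let x: Set = [1, 2, 3, 4, 5]
let y: Set = [2, 3, 6]
if let message = supersetFailure(y, of: x) {
  print(message)  // missing elements: [1, 4, 5]
}
```

For large collections this reports only the offending elements rather than both full sets.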

So what I'm mainly interested in is if we can thread the needle to generalize operations like this so that users can write straightforward code like x.isSuperset(of: y) without having to encode special knowledge of those collection operations into the macro itself, which wouldn't be as extensible for custom types/behaviors.

Of course, I have to acknowledge that that wouldn't by itself solve the composed matchers use cases. If the superset relationship I'm testing is a collection of other matchers instead of just elements, it's not clear how to achieve that (or if we'd want to achieve that) with the expected syntax.

grynspan · September 22, 2023, 4:54pm

We've built the #expect() and #require() macros in such a way that any binary operator or nonmutating member function call should already "just work", even operators that are not part of the standard library.

Regarding custom behaviour for types not visible to the Swift standard library: there's an interesting general problem here, and there are a few possible solutions that present themselves. We've been looking at these two approaches in particular:

Exposing some sort of #customExpect() macro that resolves to a call to the underlying matcher function, but which can be customized by individual packages for testing code that uses them.
Exposing a protocol such as CustomExpectable that provides appropriate hooks and which can be detected after macro resolution, during the second type-checking pass.

Does either of these approaches sound workable? Were you thinking of something different?

Jon_Shier · September 22, 2023, 4:55pm

It could be as simple(ish) as having specialized overloads of #expect like #expect(x, isSuperSetOf: y) but that could lead to other issues (like overload scaling). That's why I suggested the testing framework should include more general utility like the ability to compare arbitrary collections. This functionality likely isn't suitable for general use due to performance or other concerns, which means we can build versions that allow for diff computation to be used in failure messages.

allevato · September 22, 2023, 5:03pm

Unfortunately those overloads would have to be hardcoded as part of the macro declaration itself. An interesting approach in that vein would be something like what's possible for string interpolations, where you can put whatever labels you want in the interpolation as long as there's a matching appendInterpolation method on DefaultStringInterpolation that takes those labels. This would need enhancements in the macro system to allow a macro to take arbitrary labeled arguments not known at the time the macro is declared. The macro would then be able to see those labels, stitch together a reference to some API using the label as part of the name, and generate a call to that API.
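For readers unfamiliar with the string interpolation mechanism being used as an analogy, here is a small sketch. The paddedTo label is invented for illustration; the point is only that the compiler maps the labels written inside \( ) onto whichever appendInterpolation overload matches them.

```swift
extension DefaultStringInterpolation {
  // Illustrative overload: the `paddedTo` label is made up for this example.
  mutating func appendInterpolation(_ value: Int, paddedTo width: Int) {
    let text = String(value)
    // Left-pad with zeros up to the requested width.
    let padding = String(repeating: "0", count: max(0, width - text.count))
    appendLiteral(padding + text)
  }
}

// The labels at the use site select the overload above.
print("id-\(7, paddedTo: 3)")  // id-007
```

A macro-level equivalent would need the macro system enhancements described above, since today a macro's parameter labels are fixed by its declaration.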

This sounds promising and I'm definitely interested in more about what you have in mind here!

smontgomery · September 22, 2023, 5:15pm

One of the guiding principles we have for this project is to keep the APIs approachable and easy to learn, and we believe a big part of that is avoiding a large number of specialized functions or macros for each type of validation.

By comparison, XCTest has several dozen XCTAssert-family APIs, and something we have noticed is that both newcomers and experienced engineers often forget to use the most appropriate function, or aren't aware a more useful specialized API exists. They often end up using the simpler XCTAssert(x == y) pattern, but that results in failure messages that aren't very useful.

I'd greatly prefer if the number of #expect macro declarations is kept as small as possible, and the different usage patterns are expressed via the expressions you pass to #expect(...). For example, if you want to check that two values are equal except ignore certain properties in that comparison, you could define a function like func equals(_ other: Self, ignoring: ...) -> Bool on the relevant type, and then use it as #expect(x.equals(y, ignoring: \.first, \.second)). At least, that approach would be the first solution we would reach for. For scenarios that this approach doesn't adequately cover, I think we could explore some of the alternatives @grynspan mentioned above.
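One way such a function could be written is sketched below. The post leaves the type of the ignoring parameter elided, so the variadic PartialKeyPath used here is an assumption, and the User type and its properties are invented for the example.

```swift
// Hypothetical model type, for illustration only.
struct User {
  var id: Int
  var name: String
  var lastLogin: Int  // e.g. a timestamp that a test may want to ignore

  // Assumption: ignored properties are passed as key paths.
  func equals(_ other: User, ignoring ignored: PartialKeyPath<User>...) -> Bool {
    // A property counts as matching if it is ignored or its values are equal.
    func same<V: Equatable>(_ keyPath: KeyPath<User, V>) -> Bool {
      ignored.contains(keyPath) || self[keyPath: keyPath] == other[keyPath: keyPath]
    }
    return same(\.id) && same(\.name) && same(\.lastLogin)
  }
}

let a = User(id: 1, name: "Ada", lastLogin: 100)
let b = User(id: 1, name: "Ada", lastLogin: 250)
print(a.equals(b))                         // false
print(a.equals(b, ignoring: \.lastLogin))  // true
```

Because the function returns a plain Bool, a failure can still only report the operands, not which property differed.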

Jon_Shier · September 22, 2023, 5:20pm

Makes sense, and I appreciate the principle, but personally the syntax is second to producing useful and powerful failure output. After being spoiled by PointFree's inline diff output for failure messages I really can't go back to "assert failed". But your suggestion of functionality added to types (to add things like equals(_:ignoring:)) makes a lot of sense. Do you have an idea of how that would work?

grynspan · September 22, 2023, 5:21pm

#expect() expands to a call to one of various functions we call __check() (because their names start with that.) It would be feasible to include overloads of these functions that constrain to some CustomExpectable, and which evaluate the passed operation, then use members of that protocol to supply diagnostic information if the expectation fails. For example:

public protocol CustomExpectable {
  ...
}

extension MyType: CustomExpectable {
  ...
}

let x: MyType = ...
let y: MyType = ...
#expect(x == y)

In the scenario above, if x does not equal y according to its == operator, we could teach the runtime elements of swift-testing to emit some value derived from a protocol requirement of CustomExpectable in place of whatever we currently emit (which is based on String(describing:) in the general case.) You'll note I'm being vague here, because I don't think we've pinned down exactly what the appropriate protocol requirements are, but as a general-purpose solution, does this seem like a good direction?
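To make the idea more concrete, here is one possible shape for such a protocol. The post deliberately leaves the requirements unspecified, so the differences(from:) requirement, the Account type, and the equalityFailure function (a stand-in for the runtime side of swift-testing) are all assumptions made for this sketch.

```swift
// Hypothetical requirement: the thread does not define the protocol body.
protocol CustomExpectable {
  /// Human-readable descriptions of how `self` differs from `other`.
  func differences(from other: Self) -> [String]
}

// Invented example type.
struct Account: Equatable, CustomExpectable {
  var owner: String
  var balance: Int

  func differences(from other: Account) -> [String] {
    var result: [String] = []
    if owner != other.owner { result.append("owner: \(owner) != \(other.owner)") }
    if balance != other.balance { result.append("balance: \(balance) != \(other.balance)") }
    return result
  }
}

// Stand-in for what the runtime might emit when `x == y` is false.
func equalityFailure<T: Equatable & CustomExpectable>(_ lhs: T, _ rhs: T) -> String? {
  guard lhs != rhs else { return nil }  // nil means the expectation passed
  return "Expectation failed: " + lhs.differences(from: rhs).joined(separator: "; ")
}

let x = Account(owner: "Ada", balance: 10)
let y = Account(owner: "Ada", balance: 25)
print(equalityFailure(x, y) ?? "passed")  // Expectation failed: balance: 10 != 25
```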

allevato · September 22, 2023, 5:39pm

That does sound interesting. I'm assuming it would expand to other kinds of checks as well? Looking at the macro expansion for #expect(y.isSuperset(of: x)) we could have some kind of requirement on CustomExpectable that would be called if __checkFunctionCall fails, and it would receive as one of its arguments the name of the method, and it could look at that and compute the set difference? That could be feasible.

Thinking about the analogy with custom string interpolations some more, it would be nice if that was a bit more automatic, like if the macro could generate a call to some function that has a matching name and signature but returns diagnostic information. But it's not immediately clear how you'd make that work in a type-safe way, because the macro can't conditionally generate that diagnostic call based on whether that helper function exists somewhere or not.

grynspan · September 22, 2023, 5:55pm

The expression passed to #expect() needs to be valid before the macro is expanded. So if the expression is a member function call to a member function that doesn't exist (e.g. Array.isSuperset(of:)), compilation will fail. The specific case of isSuperset(of:) is therefore constrained to types conforming to SetAlgebra. Set in particular conforms to Collection, and collections already have special casing in swift-testing—we haven't extended that special-casing to member function calls, but we could do so easily. With that in mind, this sort of approach is likely to be more useful for things like protobuf, which the testing library doesn't know about.

Macros do not have type information during expansion, only syntax tree information, so they cannot tell if an arbitrary expression results in (or is composed of) a value of a particular type. Any overload of __check() that the macro expansion calls therefore needs to have a generic fallback (which is usually straightforward to provide, at least.) Does that clarify things at all?
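The fallback arrangement can be sketched with ordinary overloads. Since the macro cannot know the operand types, it would emit the same call either way and leave the choice to the type checker, which prefers the more constrained overload when it applies. The describeFailure name, the differences(from:) requirement, and the Version type are hypothetical stand-ins, not the real __check() functions.

```swift
// Hypothetical protocol requirement, assumed for this sketch.
protocol CustomExpectable {
  func differences(from other: Self) -> [String]
}

// Generic fallback: works for any type, using String(describing:).
func describeFailure<T>(_ lhs: T, _ rhs: T) -> String {
  "\(String(describing: lhs)) vs \(String(describing: rhs))"
}

// More constrained overload: chosen automatically for conforming types.
func describeFailure<T: CustomExpectable>(_ lhs: T, _ rhs: T) -> String {
  lhs.differences(from: rhs).joined(separator: "; ")
}

// Invented example type.
struct Version: CustomExpectable {
  var major: Int
  var minor: Int

  func differences(from other: Version) -> [String] {
    var result: [String] = []
    if major != other.major { result.append("major: \(major) != \(other.major)") }
    if minor != other.minor { result.append("minor: \(minor) != \(other.minor)") }
    return result
  }
}

print(describeFailure(1, 2))  // 1 vs 2
print(describeFailure(Version(major: 1, minor: 0), Version(major: 1, minor: 2)))  // minor: 0 != 2
```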

allevato · September 22, 2023, 6:09pm

Right, I think we're in agreement here. That's why I was saying it wasn't immediately clear how that would work, since the macro wouldn't have that information. (But I was leaving the door open for someone to have a brilliant idea I hadn't thought of.)

Perhaps there's another approach here—to take @smontgomery's example of func equals(_ other: Self, ignoring: ...) -> Bool, maybe the right thing for that function to do isn't to return a Bool, but to return an ExpectationResult (or whatever you'd want to call it) that could be either .success or .failure(reason: String). Then #expect could have an overload that takes an ExpectationResult, and all users would have to do to write custom behavior and provide custom failure diagnostics would be to implement a single function?
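A minimal sketch of this idea follows, applied to the earlier "same elements, disregarding order" case. ExpectationResult is the name floated in the post, but its exact shape, the hasSameElements(as:) matcher, and the expect function (a stand-in for a hypothetical #expect overload) are assumptions.

```swift
// Shape assumed from the post: success, or failure with a reason.
enum ExpectationResult: Equatable {
  case success
  case failure(reason: String)
}

extension Array where Element: Hashable {
  // Hypothetical matcher: compares contents while ignoring order.
  func hasSameElements(as other: [Element]) -> ExpectationResult {
    // Count occurrences in `self`, then subtract occurrences in `other`.
    var counts: [Element: Int] = [:]
    for element in self { counts[element, default: 0] += 1 }
    for element in other { counts[element, default: 0] -= 1 }
    let missing = counts.filter { $0.value < 0 }.map(\.key)     // only in `other`
    let unexpected = counts.filter { $0.value > 0 }.map(\.key)  // only in `self`
    if missing.isEmpty && unexpected.isEmpty { return .success }
    return .failure(reason: "missing \(missing), unexpected \(unexpected)")
  }
}

// Stand-in for an #expect overload that accepts an ExpectationResult.
func expect(_ result: ExpectationResult) -> Bool {
  switch result {
  case .success:
    return true
  case .failure(let reason):
    print("Expectation failed: \(reason)")
    return false
  }
}

_ = expect([1, 2, 3].hasSameElements(as: [3, 1, 4]))
// Expectation failed: missing [4], unexpected [2]
```

The matcher owns its failure message, so the framework would not need any special knowledge of the operation.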

taylorswift · September 22, 2023, 6:30pm

if you want some examples of convergent evolution, check out (operators, assertions).

i will add that i’m actually not too happy with some of the operators i’m currently using, such as !*?, they are cryptic and it takes a few open files for Copilot to figure out what they mean.

grynspan · September 22, 2023, 7:30pm

That's an interesting idea and worth exploring. Would you mind filing an issue about it on the swift-testing repo and we'll track it there? Thanks!

allevato · September 22, 2023, 10:32pm

You got it! Filed as "Provide an API for custom expectations with diagnostics" (Issue #11, apple/swift-testing).

anandabits · September 25, 2023, 9:30pm

ExpectationResult could also carry additional failure information and model behaviors other than success or failure. For example, in property-based testing, generated input values are sometimes discarded.
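As a sketch of that extension, the result type could gain a third case for discarded inputs. The discarded case, the divisionRoundTrips property, and the hand-written input list (standing in for a real generator) are all assumptions for illustration.

```swift
// Assumed extension of the result type with a third outcome.
enum ExpectationResult {
  case success
  case failure(reason: String)
  case discarded(reason: String)
}

// Property under test: (n / d) * d + (n % d) == n for any nonzero d.
func divisionRoundTrips(_ n: Int, _ d: Int) -> ExpectationResult {
  // Inputs outside the property's domain are discarded, not failed.
  guard d != 0 else { return .discarded(reason: "divisor is zero") }
  let rebuilt = (n / d) * d + (n % d)
  return rebuilt == n ? .success : .failure(reason: "\(n) rebuilt as \(rebuilt)")
}

var passed = 0, failed = 0, discarded = 0
// Fixed inputs standing in for generated values.
for (n, d) in [(7, 2), (5, 0), (-9, 4)] {
  switch divisionRoundTrips(n, d) {
  case .success: passed += 1
  case .failure: failed += 1
  case .discarded: discarded += 1
  }
}
print(passed, failed, discarded)  // 2 0 1
```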