Property-based testing of Swift syntax trees

dan-zheng · November 30, 2020, 11:41am

Context

Let's say I'm designing a language feature/transformation that involves some degree of full/partial language coverage.

Examples:

.swiftinterface module interface files need to work (round-trip printing and parsing) for all Swift modules. @harlanhaskins @jrose
A new SIL transform needs to correctly handle all potential SIL programs without crashing or miscompiling (or breaking OSSA). @Michael_Gottesman @Andrew_Trick
Differentiating function bodies (as part of differentiable programming in Swift) needs to work for all supported Swift function bodies (represented in syntax trees and the AST, and eventually SIL and LLVM IR). @rxwei
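To make the round-trip property from the first example concrete, here is a minimal sketch: a toy expression AST, a random generator, a printer, and a parser, checked with the property `parse(render(e)) == e`. Everything in it (`Expr`, `render`, `parse`, the SplitMix64 RNG) is hypothetical. It stands in for real Swift syntax trees and the real module interface printer and parser.

```swift
// Toy expression AST: a hypothetical stand-in for real Swift syntax trees.
indirect enum Expr: Equatable {
    case int(Int)
    case variable(String)
    case add(Expr, Expr)
    case mul(Expr, Expr)
}

// Seedable RNG (SplitMix64) so any counterexample can be replayed from its seed.
struct SplitMix64: RandomNumberGenerator {
    var state: UInt64
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Random tree generator; the depth bound guarantees termination.
func randomExpr(depth: Int, using rng: inout SplitMix64) -> Expr {
    if depth == 0 || Int.random(in: 0..<3, using: &rng) == 0 {
        if Bool.random(using: &rng) {
            return .int(Int.random(in: 0...99, using: &rng))
        }
        return .variable(["x", "y", "z"].randomElement(using: &rng)!)
    }
    let lhs = randomExpr(depth: depth - 1, using: &rng)
    let rhs = randomExpr(depth: depth - 1, using: &rng)
    return Bool.random(using: &rng) ? .add(lhs, rhs) : .mul(lhs, rhs)
}

// Printer: parenthesize every binary node so the output is unambiguous.
func render(_ e: Expr) -> String {
    switch e {
    case .int(let n): return String(n)
    case .variable(let v): return v
    case .add(let l, let r): return "(\(render(l)) + \(render(r)))"
    case .mul(let l, let r): return "(\(render(l)) * \(render(r)))"
    }
}

// Recursive-descent parser for exactly the printer's output format.
struct Parser {
    let chars: [Character]
    var pos = 0
    init(_ text: String) { chars = Array(text) }

    mutating func skipSpaces() { while pos < chars.count, chars[pos] == " " { pos += 1 } }

    // Returns the next non-space character and consumes it, or nil at end of input.
    mutating func next() -> Character? {
        skipSpaces()
        guard pos < chars.count else { return nil }
        defer { pos += 1 }
        return chars[pos]
    }

    mutating func parseExpr() -> Expr? {
        skipSpaces()
        guard pos < chars.count else { return nil }
        if chars[pos] == "(" {
            pos += 1
            guard let lhs = parseExpr(), let op = next(), let rhs = parseExpr(),
                  next() == ")" else { return nil }
            switch op {
            case "+": return .add(lhs, rhs)
            case "*": return .mul(lhs, rhs)
            default: return nil
            }
        }
        var token = ""
        while pos < chars.count, chars[pos].isLetter || chars[pos].isNumber {
            token.append(chars[pos])
            pos += 1
        }
        if let n = Int(token) { return .int(n) }
        guard !token.isEmpty else { return nil }
        return .variable(token)
    }
}

func parse(_ text: String) -> Expr? {
    var parser = Parser(text)
    guard let e = parser.parseExpr() else { return nil }
    parser.skipSpaces()
    return parser.pos == parser.chars.count ? e : nil   // reject trailing input
}

// The property: printing and then parsing any generated tree gives back the same tree.
func checkRoundTrip(trials: Int, seed: UInt64) -> Expr? {
    var rng = SplitMix64(state: seed)
    for _ in 0..<trials {
        let e = randomExpr(depth: 4, using: &rng)
        if parse(render(e)) != e { return e }   // first counterexample
    }
    return nil
}
```

Seeding the RNG makes any counterexample reproducible; a real harness would also report the seed alongside the failing tree.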

Question

How do I test language coverage?

I can do my best, find bugs, and check in unit tests to prevent regressions. Or I can have some foresight, preemptively think of untested edge cases that are likely to be buggy, and check in unit tests for them. Or I can add integration tests based on popular Swift community packages. Or I can...

There are many technology options to solve this testing use case. Which is best? Example criteria:

Flexibility and expressivity (what can I do with this tech)
Practicality (effort of implementing and maintaining it)
Maintainability (modularity, separation of concerns, is it written as a SwiftPM package)

Right now, we have the following technologies:

Swift Source Compatibility Suite: highly pragmatic and maintainable. Not expressive, limited to regression testing (only works with existing code, doesn't preemptively discover any bugs for new code).
Swift Evolve: pragmatic and maintainable. Very limited expressivity so far.
Naive fuzzing. See @practicalswift and their fuzzing tests: it mostly finds garbage Swift inputs that crash the parser or name lookup.

Solution

I like property-based testing, at least for the third use case above (differentiating function bodies). @marcrasi and @shabalin implemented variants of generators of random Swift code for different purposes: autodiff correctness (Marc) and autodiff performance across different language features (Denys).

Marc's work: Randomly-generated Tests for Swift AD
Denys' work: generates a simple Python AST and emits Swift code from it, with specific language features enabled (not yet open-sourced; ask us if you're interested).

Are there good property-based testing libraries suitable for generating simple ASTs that can be lowered to Swift? SwiftCheck is hefty; I think it could be distilled to a simpler core with less abstraction and less cute syntax. @codafi
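For a sense of how small a distilled core could be, here is a sketch: a generator is just a function from a seedable RNG to a value, with `map` and `flatMap` for composition and a `forAll` runner that returns the first counterexample. The names (`Gen`, `ints(in:)`, `arrays(of:maxCount:)`, `oneOf`, `forAll`) are hypothetical and are not SwiftCheck's API; shrinking is deliberately left out.

```swift
// Seedable RNG (SplitMix64) so a failing run can be replayed from its seed.
struct SplitMix64: RandomNumberGenerator {
    var state: UInt64
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// A generator is just a function from an RNG to a value: no protocols, no custom operators.
struct Gen<Value> {
    let run: (inout SplitMix64) -> Value

    func map<T>(_ transform: @escaping (Value) -> T) -> Gen<T> {
        Gen<T> { rng in transform(self.run(&rng)) }
    }

    func flatMap<T>(_ next: @escaping (Value) -> Gen<T>) -> Gen<T> {
        Gen<T> { rng in
            let value = self.run(&rng)
            return next(value).run(&rng)
        }
    }
}

func ints(in range: ClosedRange<Int>) -> Gen<Int> {
    Gen<Int> { rng in Int.random(in: range, using: &rng) }
}

// Picks a length first, then that many elements.
func arrays<T>(of element: Gen<T>, maxCount: Int) -> Gen<[T]> {
    ints(in: 0...maxCount).flatMap { count in
        Gen<[T]> { rng in
            var result: [T] = []
            for _ in 0..<count { result.append(element.run(&rng)) }
            return result
        }
    }
}

// Picks one of several generators uniformly (assumes the list is non-empty).
func oneOf<T>(_ choices: [Gen<T>]) -> Gen<T> {
    Gen<T> { rng in
        let chosen = choices.randomElement(using: &rng)!
        return chosen.run(&rng)
    }
}

// Runs `property` on `trials` generated values; returns the first counterexample, or nil.
func forAll<T>(_ gen: Gen<T>, trials: Int = 100, seed: UInt64 = 0,
               _ property: (T) -> Bool) -> T? {
    var rng = SplitMix64(state: seed)
    for _ in 0..<trials {
        let value = gen.run(&rng)
        if !property(value) { return value }
    }
    return nil
}
```

For example, `forAll(arrays(of: ints(in: -100...100), maxCount: 20)) { Array($0.reversed().reversed()) == $0 }` returns `nil`, since reversing twice is the identity. Generators for AST nodes would be built the same way, with `oneOf` choosing among node kinds and a depth parameter bounding recursion.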

Any other solution ideas that hit a criteria-satisfaction sweet spot?

shabalin · November 30, 2020, 12:09pm

From my early experiments in this area: it's easy to combinatorially generate random syntactic trees, but it's quite hard to generate non-trivial random trees that pass all static checks and correspond to valid programs that do something remotely resembling real workloads.

This has been explored in greater detail in projects such as Csmith.
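One common mitigation is to generate with the scope and types in hand, offering at each step only constructs whose preconditions hold. A minimal sketch, assuming a hypothetical straight-line language of `let` bindings over `Int` and `[Int]`: the generator's own types are richer than the emitted Swift types (arrays carry their length), so every emitted index expression is in bounds. A tiny reference interpreter (`run`) checks that generated programs never fail at runtime. `GenType`, `Expr`, `Stmt`, `randomProgram`, and `swiftSource` are all illustrative names.

```swift
// Generator-side types. Emitted Swift only ever sees Int and [Int]; the element
// count is extra static knowledge that lets the generator avoid out-of-bounds indexing.
enum GenType: Equatable {
    case int
    case array(count: Int)
}

// A tiny straight-line language of `let` bindings (hypothetical, not Swift's AST).
enum Expr {
    case literal(Int)
    case arrayLiteral([Int])
    case index(array: String, at: Int)
    case add(String, String)
}

struct Stmt {
    let name: String
    let value: Expr
}

// Each step offers only constructs whose preconditions hold in the current scope.
func randomProgram<R: RandomNumberGenerator>(length: Int, using rng: inout R) -> [Stmt] {
    var scope: [(name: String, type: GenType)] = []
    var program: [Stmt] = []
    for i in 0..<length {
        let ints = scope.filter { $0.type == .int }.map { $0.name }
        let arrays = scope.compactMap { b -> (String, Int)? in
            if case .array(let n) = b.type { return (b.name, n) }
            return nil
        }
        var kinds = ["literal", "arrayLiteral"]
        if !arrays.isEmpty { kinds.append("index") }
        if !ints.isEmpty { kinds.append("add") }

        let step: (value: Expr, type: GenType)
        switch kinds.randomElement(using: &rng)! {
        case "literal":
            step = (.literal(Int.random(in: 0...9, using: &rng)), .int)
        case "arrayLiteral":
            let n = Int.random(in: 1...4, using: &rng)
            step = (.arrayLiteral(Array(0..<n)), .array(count: n))
        case "index":
            let (array, n) = arrays.randomElement(using: &rng)!
            // In bounds by construction: the index is drawn from 0..<count.
            step = (.index(array: array, at: Int.random(in: 0..<n, using: &rng)), .int)
        default:
            let lhs = ints.randomElement(using: &rng)!
            let rhs = ints.randomElement(using: &rng)!
            step = (.add(lhs, rhs), .int)
        }
        scope.append((name: "v\(i)", type: step.type))
        program.append(Stmt(name: "v\(i)", value: step.value))
    }
    return program
}

// Emits one statement as Swift source, e.g. "let v2 = v1[0]".
func swiftSource(_ s: Stmt) -> String {
    switch s.value {
    case .literal(let n): return "let \(s.name) = \(n)"
    case .arrayLiteral(let xs): return "let \(s.name) = \(xs)"
    case .index(let a, let i): return "let \(s.name) = \(a)[\(i)]"
    case .add(let a, let b): return "let \(s.name) = \(a) + \(b)"
    }
}

enum Value { case int(Int), array([Int]) }

// Reference interpreter: nil means a runtime failure (unbound name, type error, bad index).
func run(_ program: [Stmt]) -> [String: Value]? {
    var env: [String: Value] = [:]
    for s in program {
        switch s.value {
        case .literal(let n):
            env[s.name] = .int(n)
        case .arrayLiteral(let xs):
            env[s.name] = .array(xs)
        case .index(let a, let i):
            guard case .array(let xs)? = env[a], xs.indices.contains(i) else { return nil }
            env[s.name] = .int(xs[i])
        case .add(let a, let b):
            guard case .int(let x)? = env[a], case .int(let y)? = env[b] else { return nil }
            env[s.name] = .int(x &+ y)
        }
    }
    return env
}
```

The `count` in `GenType.array` never appears in the emitted Swift; it exists only so the generator can avoid runtime traps. By the same logic, adding tensors would require tracking shapes and dtypes in `GenType`.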

dan-zheng · November 30, 2020, 12:28pm

@shabalin just provided supporting details via DMs:

> It's easy to generate random trees, it's hard to generate random programs.
> Even on the mini subset just for AD in autodiff-gen, you need to encode quite a few type system expectations into the code generator.
> Emitting programs without runtime errors is even harder. autodiff-gen tracks array sizes as part of the type to avoid emitting array loads/stores that are going to fail.
> This means that your code generator might need a more powerful type system than the source language (!!!)
> autodiff-gen doesn't yet have tensors, but if it did, it would have to have the full tensor shape and dtype as part of the code gen type system.
> Otherwise even picking operations on tensors combinatorially respecting Swift types is unlikely to work at runtime if they have incompatible shapes.
> Combinatorial generation as in QuickCheck is easy as long as your datatypes are as simple as collections. I haven't seen any success using those techniques for syntactic trees of non-trivial languages.
> Dropping elements from an array is way simpler than dropping statements from a program.
> One allows independent removal; the other can have non-trivial effects (e.g. if you remove a let that defines a local variable, the subsequent statements that use it become invalid).
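The shrinking problem can be made concrete with a sketch: model a straight-line program as `let` statements that each read earlier names, and make removal dependency-aware, so that dropping a `let` also drops every statement that transitively reads it. `LetStmt`, `isWellScoped`, `removing(_:from:)`, and `shrink(_:stillFails:)` are hypothetical names, and unique (non-shadowing) names are assumed.

```swift
// A straight-line program: statement `name` reads the names in `uses`.
// Names are assumed unique (no shadowing), which keeps the sketch simple.
struct LetStmt: Equatable {
    let name: String
    let uses: [String]
}

// Every use must refer to a name defined by an earlier statement.
func isWellScoped(_ program: [LetStmt]) -> Bool {
    var defined = Set<String>()
    for s in program {
        guard s.uses.allSatisfy({ defined.contains($0) }) else { return false }
        defined.insert(s.name)
    }
    return true
}

// Dependency-aware removal: dropping statement `index` also drops every statement
// that transitively reads it, so the candidate stays well-scoped.
func removing(_ index: Int, from program: [LetStmt]) -> [LetStmt] {
    var dead: Set<String> = [program[index].name]
    var kept: [LetStmt] = []
    for (i, s) in program.enumerated() {
        if i == index || s.uses.contains(where: { dead.contains($0) }) {
            dead.insert(s.name)
        } else {
            kept.append(s)
        }
    }
    return kept
}

// One greedy pass: accept any smaller candidate that still reproduces the failure.
func shrink(_ program: [LetStmt], stillFails: ([LetStmt]) -> Bool) -> [LetStmt] {
    var current = program
    var i = 0
    while i < current.count {
        let candidate = removing(i, from: current)
        if stillFails(candidate) {
            current = candidate   // keep the smaller program and retry position i
        } else {
            i += 1
        }
    }
    return current
}
```

For a program `a; b = f(a); c; d = g(b, c); e = h(c)`, naively dropping `a` leaves `b` reading an undefined name, whereas `removing(0, from:)` drops `b` and `d` along with it. The trade-off is coarser shrink steps: removing one `let` can take a whole dependency chain with it.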

All this (random generation, aka property-based testing) sounds highly expressive, practical, and maintainable. A nice implementation with minimal complexity would be ideal.

There's still research in this area. I enjoyed the Chalmers FP 2020 seminar "Backtracking Generators for Random Testing" on the topic (spectrum between brute enumeration and random exploration):

http://chalmersfp.org

dan-zheng · November 30, 2020, 12:31pm

I got C-Reduce to work with Swift programs and a Swift-compiler-invoking interestingness test (after meeting and bugging John Regehr at LLVM Dev Meeting 2019).

It works okay. It is very "non-intelligent": I found the reducer to be highly antagonistic (it tries as hard as possible to reduce the input to an empty file), which was fun to fight and outsmart. I would like to write a blog post about it sometime.
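The basic reduction loop, and why a weak interestingness test gets fought rather than helped, can be shown with a greedy line-removal reducer. This is a much-simplified sketch, not C-Reduce's actual algorithm (C-Reduce runs many transformation passes). The fake "compiler" checks (`braceBalance`, `hitsCrash`) are hypothetical and replace the script that would invoke the Swift compiler.

```swift
import Foundation

// Greedy line-removal reducer, a much-simplified relative of C-Reduce's line passes.
// `isInteresting` stands in for the interestingness test; a real one would be a script
// that invokes the compiler and inspects its exit status and output.
func reduce(_ lines: [String], isInteresting: ([String]) -> Bool) -> [String] {
    precondition(isInteresting(lines), "the original input must be interesting")
    var current = lines
    var changed = true
    while changed {                       // repeat passes until a fixed point
        changed = false
        var i = 0
        while i < current.count {
            var candidate = current
            candidate.remove(at: i)
            if isInteresting(candidate) {
                current = candidate       // keep the smaller input; retry index i
                changed = true
            } else {
                i += 1
            }
        }
    }
    return current
}

// Hypothetical stand-ins for compiler behavior: unbalanced braces model a parse
// error, and a call to `crashMe(` models the crash we actually care about.
func braceBalance(_ lines: [String]) -> Int {
    let text = lines.joined(separator: "\n")
    return text.filter { $0 == "{" }.count - text.filter { $0 == "}" }.count
}

func hitsCrash(_ lines: [String]) -> Bool {
    lines.contains { $0.contains("crashMe(") }
}

let program = [
    "func f(_ x: Double) -> Double {",
    "    let y = x * x",
    "    return crashMe(y)",
    "}",
    "print(f(3))",
]

// Weak test: "the compiler failed somehow". The reducer trades the crash for a parse error.
let weak = reduce(program) { braceBalance($0) != 0 || hitsCrash($0) }

// Stronger test: the input must still parse and must still hit the crash.
let strong = reduce(program) { braceBalance($0) == 0 && hitsCrash($0) }
```

With the weak test, the reducer collapses the input to a lone `}`, a parse error unrelated to the crash. With the stronger test it keeps `func f`, the `crashMe` line, and the closing brace, but it has still removed `let y`, so the result references an undefined name; a real interestingness test would usually also require the file to type-check up to the crash.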