SILGenPattern Doesn't Actually Implement Maranget's Algorithm

codafi · March 9, 2018, 6:57pm

SILGenPattern seems to claim that it implements Luc Maranget's algorithm for computing efficient decision trees from pattern matrices. To the contrary, every pattern matrix that gets created by SILGenPattern is, in fact, a pattern column vector because tuple patterns are never expanded as clause rows are added to the matrix. To test my hypothesis, I threw some llvm_unreachables all over the pattern specialization code and reran the tests and it passed clean.

Was the goal ever to implement more efficient tuple pattern dispatch, or should I just remove all of this and potentially speed up SILGenPattern a bit?

Joe_Groff · March 9, 2018, 7:22pm

Yeah, the original algorithm is pretty watered down at this point. The fact that most patterns besides enum and tuple patterns don't offer any specialization (since a ~= call is essentially just a guard) also limits its theoretical benefit. Anything to simplify that gnarly code is worth it, IMO. You might coordinate with @Michael_Gottesman who's also talked about cleaning this file up for better ownership management.

Michael_Gottesman · March 9, 2018, 8:45pm

@Joe_Groff @codafi I dont have a problem with us actually using that algorithm. Especially since as time goes on I believe more and more pattern matching will come into play.

The big thing that I would look to see is the splitting of SILGenPattern into two parts: a model that can be tested completely separately that is a straight forward implementation of the algorithm and then the view that is the actual use of the implementation in SILGenPattern. My hope is that this will make the underlying algorithm more robust by allowing us to do gtest based unit testing.

codafi · March 10, 2018, 3:15am

I don't think we actually need Maranget's algorithm. The existing tuple dispatch mechanism seems to work just fine without it, and the only thing we could take away from that paper would be that we could compute a "usefulness score" for each column and dispatch on the column that contains the most non-wildcard-non-var patterns. I'll look into removing it.

Thanks!

Slava_Pestov · March 10, 2018, 5:43am

I'm in favor of removing the unnecessary code and simplifying the logic in this file as much as possible.

Michael_Gottesman · March 10, 2018, 6:02am

Ok. If we don't actually need it no worries. I was imagining more complex pattern matching than that. That being said, if we find it to be useful in the future we can always reimplement it and do things correctly.