How robust is the implementation of recursive enumeration types?

CTMacUser · June 3, 2019, 7:09am

This may need to go under Development/Compiler instead.

All I want to do is parse CR, LF, and CR-LF (and possibly CR-CR-LF) line breaks.

I've attempted custom parsers before. But this seems generalizable.

Something like the Aho–Corasick algorithm should help. I started reading and saw something called a trie, and started working on that. Of course, when I read the whole thing, I realized a straight trie can't work, because to use it with ACA, the trie needs back-references and payload information. I decided to work on a straight trie anyway, maybe to propose it here, and go back to ACA later.

Other Swift tries I perused used a class for nodes with the element value and a dictionary to follow-up nodes. But that would be too easy for me; I'm going to punish myself with a ridiculously recursive enumeration type. I made a Swift package manager project on my macOS system, and am using a Xcode project file generated by SPM. (I haven't tried from the command line yet.)

I first made a linked-list type to represent the sequences to be stored in the trie. Then I made the trie type. They're both at a gist I made for this thread. The linked list and its tests went fine. I did the trie and its tests in piecemeal too. But when I got noticeably past the insertion test, I got devastating problems, all intermittent.

The compile cycle crashes with a Segmentation fault 11. I just repeatedly re-build, sometimes build-cleaning and/or restarting Xcode, and I bully through.
testInsertion() fails with the calculated tries with junk values. The junk values aren't from other parts of the program; they're generally new. I hope I just messed up my algorithm, because my best other guess is that indirect enum types have a recursion limit, and frying that limit causes memory corruption!

(Maybe the bullying through the segment-fault 11 is causing the corruption.)

I also have a wrapper trie type, but didn't included it in the gist. Again, I occasionally get corrupt values. My direct trie type used Int, while my wrapper trie type used Character. Sometimes the junk Character value was none at all! I was going to switch to UnicodeScalar (since it doesn't use remote memory), but decided to ask for help here first instead. In the wrapper type, only the methods that call NeverEmptyTrie.doInsert to make the trie bigger trigger the same corruption, the methods that call NeverEmptyTrie.init(unify: with:) don't have (obvious) corruption.

Joe_Groff · June 3, 2019, 4:26pm

There are no intentional recursion limits on indirect enums. If you're performing recursive algorithms on them, there's a chance you'll overflow the callstack if you recur deep enough, but that would always manifest as a runtime segfault and not as memory corruption, so this is most likely a compiler bug. If you haven't yet and you have a moment, we'd appreciated it if you filed a bug with your code examples to bugs.swift.org. Thanks!

Michael_Gottesman · June 3, 2019, 5:47pm

I took a quick look on master. We are hitting an assert when emitting doInsert.

CTMacUser · June 10, 2019, 3:42am

I filed SR-10905, "Memory Corruption while Building Recursive Enumeration Instance."

Joe_Groff · June 10, 2019, 2:26pm

Thanks!