Ah, right, it's coming back to me now (I tinkered with the implementation briefly a few months ago). The Identifier type has an isOperator() method (and some helpers) that is used throughout Sema and elsewhere to determine whether an identifier is an operator or not, and it only looks at the first code point because right now that's enough to distinguish it.
If you wanted to apply my suggested rule above that a backticked identifier is an operator if and only if all of its characters are operator characters, then you could modify that function accordingly, and that should fix the issue you're seeing (assuming there are no other required changes elsewhere). There's a comment in isOperator() about caching that calculation, and if you're checking every character instead of just the first one, it would probably be a good opportunity to resolve that. (There's also some duplication that would be nice to clean up, because the code point ranges for operators are listed both in Lexer.cpp and Identifier.h.)
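
To make the difference between the two rules concrete, here's a rough standalone sketch. It is not the actual compiler code: the function names are made up for illustration, identifiers are modeled as already-decoded code points in a std::u32string, and the character check only covers the ASCII operator characters rather than the full Unicode ranges listed in Lexer.cpp and Identifier.h.

```cpp
#include <algorithm>
#include <string>

// Hypothetical, simplified stand-in for the compiler's operator-character
// check. Assumption: only the ASCII operator characters are covered here;
// the real check also accepts a number of Unicode ranges.
bool isOperatorCodePoint(char32_t c) {
  static const std::u32string asciiOperatorChars = U"/=-+!*%<>&|^~?";
  return asciiOperatorChars.find(c) != std::u32string::npos;
}

// The behavior described above: only the first code point is examined.
bool isOperatorFirstOnly(const std::u32string &identifier) {
  return !identifier.empty() && isOperatorCodePoint(identifier.front());
}

// The suggested rule: an identifier is an operator if and only if every
// one of its code points is an operator character.
bool isOperatorAllChars(const std::u32string &identifier) {
  return !identifier.empty() &&
         std::all_of(identifier.begin(), identifier.end(),
                     isOperatorCodePoint);
}
```

With something like `+a` as the escaped identifier, the first-code-point check would call it an operator while the all-characters check would not, and that is the case the suggested rule is meant to handle. Since the all-characters version walks the whole string, this is also where caching the result would start to pay off.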
Another issue I remember running into: make sure to add some tests that run some escaped identifiers through -emit-silgen and then feed that SIL back into the compiler to test SILPrinter and SILParser. Currently, SILPrinter only escapes identifiers that match keywords, so you'll need to extend that logic to cover other identifiers that require escaping under the proposed new rules, and then also make sure that SILParser handles those correctly (which hopefully falls out naturally because I think the lexer is the same).