Recognise all Unicode line terminators when parsing?

wowbagger · October 2, 2020, 7:27pm

I was reading Lexer.cpp and saw that only \r and \n are treated as line terminators. Should it be expanded to recognise all Unicode line terminators, i.e. U+000A, U+000B, U+000C, U+000D, U+000D + U+000A, U+0085, U+2028, and U+2029?
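
To make the question concrete, here is a rough sketch of the kind of check I have in mind. It is not what Lexer.cpp does today: the helper name `lineTerminatorLength` is made up, and I'm assuming the source buffer is UTF-8, so U+0085 is the bytes C2 85 and U+2028/U+2029 are E2 80 A8 and E2 80 A9.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of the Swift compiler.
// Returns the length in bytes of the line terminator that starts at `cur`,
// or 0 if no line terminator starts there. Assumes [cur, end) is UTF-8.
std::size_t lineTerminatorLength(const unsigned char *cur,
                                 const unsigned char *end) {
  assert(cur <= end && "cursor must not be past the end of the buffer");
  if (cur == end)
    return 0;
  const std::size_t avail = static_cast<std::size_t>(end - cur);

  switch (cur[0]) {
  case 0x0A: // U+000A LINE FEED
  case 0x0B: // U+000B LINE TABULATION
  case 0x0C: // U+000C FORM FEED
    return 1;
  case 0x0D: // U+000D, optionally followed by U+000A (counted as one break)
    return (avail >= 2 && cur[1] == 0x0A) ? 2 : 1;
  case 0xC2: // U+0085 NEXT LINE is C2 85 in UTF-8
    return (avail >= 2 && cur[1] == 0x85) ? 2 : 0;
  case 0xE2: // U+2028 and U+2029 are E2 80 A8 and E2 80 A9 in UTF-8
    return (avail >= 3 && cur[1] == 0x80 &&
            (cur[2] == 0xA8 || cur[2] == 0xA9))
               ? 3
               : 0;
  default:
    return 0;
  }
}
```

Comparing against byte values rather than the `'\n'` and `'\r'` literals is deliberate here, since it sidesteps the question of what those escapes map to on a given platform.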

Since C and C++ don't guarantee that \n translates to U+000A and \r to U+000D, recognising all Unicode line terminators by code point could remove that ambiguity, which might(?) otherwise cause problems as more and more platforms become supported.

I'm not sure if it would be a source-breaking change, but it doesn't seem likely to me. Swift doesn't allow most non-printable characters in source files, so there probably isn't any multi-line string or single-line comment (or anything else sensitive to line termination) that already uses these characters.

I'm a noob with the Swift compiler, so it would be great if someone could help me see the pros and cons of the current design vs using the entire set of Unicode line terminators.

I also noticed that some places check only \n to detect the end of a line. Should those checks include \r too, if not all Unicode line terminators?
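
A toy illustration of why I'm asking (both function names are made up and none of this is taken from the compiler): counting line breaks with an \n-only check versus one that also accepts \r gives different answers as soon as a buffer contains a lone \r.

```cpp
#include <cstddef>
#include <string>

// Hypothetical: counts line breaks when only '\n' ends a line.
std::size_t countBreaksLFOnly(const std::string &text) {
  std::size_t n = 0;
  for (char c : text)
    if (c == '\n')
      ++n;
  return n;
}

// Hypothetical: counts line breaks when '\r', '\n', and "\r\n" each end a
// line, with "\r\n" counted as a single break.
std::size_t countBreaksCRAndLF(const std::string &text) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (text[i] == '\r') {
      if (i + 1 < text.size() && text[i + 1] == '\n')
        ++i; // skip the '\n' of a "\r\n" pair
      ++n;
    } else if (text[i] == '\n') {
      ++n;
    }
  }
  return n;
}
```

For a buffer like "a\rb\rc" the first function sees no line breaks at all and the second sees two, so anything derived from the \n-only check (line numbers in diagnostics, for example) would presumably be off for such input.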