I’ve raised a speculative PR against the Swift Lexer to support multi-line string literals
Wow, it's pretty cool that this change is so surgical.
I’m trying to avoid more advanced features such as the handling of indenting, which
for me complicates something that, if kept simple, can be documented very easily.
I don't think you can tackle multiline strings without worrying about indenting. Indentation may fundamentally change the approach you choose.
I continue to believe that we're actually looking at three orthogonal features here:
* Multiline string literals
* Alternative string literal delimiters
* Disabling escapes in string literals
The way I would prefer to tackle these is:
* Multiline literals: If the closing quote of a string is not present, look at the next line. If it consists of (optional) indentation followed by a matching opening quote, the string has a newline and then continues after the quote on the next line. (The handling of comments is an open question here.)
let xml: String = "<?xml version=\"1.0\"?>
                  "\t<book id=\"bk101\" empty=\"\">"
The cool things about this are that (a) the compiler can tell you really do mean this to be part of the literal and you haven't just forgotten to close the string, and (b) there's no guesswork about how indentation should be handled. The uncool thing is that you need to insert the quote at the beginning of each line, so you can't just blindly paste into a multiline literal. Editors can help make that easier, though—a "paste as string literal" feature would be a nice addition to Xcode, and not just for multiline strings or just for Swift.
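To make the continuation rule concrete, here is a rough sketch of how a reader for it might work. `readContinuedLiteral` is a made-up name, not anything in the Swift lexer; it assumes a plain `"` delimiter, leaves escapes undecoded, and ignores the open question about comments.

```swift
/// Hypothetical helper: reads one literal written with the continuation rule.
/// Returns the literal's contents, or nil if no closing quote is ever found.
/// Escapes are copied through undecoded; comments are not handled.
func readContinuedLiteral(_ source: String) -> String? {
    var result = ""
    let lines = source.split(separator: "\n", omittingEmptySubsequences: false)
    for (index, line) in lines.enumerated() {
        // Skip optional indentation, then require a quote at the start of the line.
        let body = line.drop(while: { $0 == " " || $0 == "\t" })
        guard body.first == "\"" else { return nil }
        // Every continuation line contributes one newline to the string.
        if index > 0 { result.append("\n") }
        var escaped = false
        for ch in body.dropFirst() {
            if escaped {
                result.append(ch)
                escaped = false
            } else if ch == "\\" {
                result.append(ch)
                escaped = true
            } else if ch == "\"" {
                return result          // unescaped closing quote: the literal ends here
            } else {
                result.append(ch)
            }
        }
        // No closing quote on this line, so look at the next one.
    }
    return nil                         // ran out of lines: a genuinely unterminated string
}
```

Because a line that does not start with a quote makes the function give up immediately, a forgotten closing quote is reported at the very next line rather than swallowing the rest of the file.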
* Alternative delimiters: If a string literal starts with three, five, seven, or any larger odd number of quotes, that run is the delimiter, and fewer quotes than that in a row are simply literal quote marks. An even run of four, six, etc. quotes is a quote mark abutting the end of the literal.
let xml: String = """<?xml version="1.0"?>
                  """\t<book id="bk101" empty="">"""
You can't use this syntax to express an empty string, or a string consisting entirely of quote marks, but `""` handles empty strings adequately, and escaping can help with quote marks. (An alternative would be to remove the abutting rule and permit `""""""` to mean "empty string", but abutting quotes seem more useful than long-delimiter empty strings.)
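One way to read the delimiter rule is as a classification of the run of quotes that opens the literal. The sketch below is my interpretation only: `OpeningRun` and `classifyOpeningRun` are hypothetical names, and treating an even run as "delimiter plus one abutting literal quote" is an assumption drawn from the description above.

```swift
/// How a run of quote marks at the start of a literal might be read.
enum OpeningRun: Equatable {
    case emptyString                                   // exactly `""`
    case delimiter(length: Int, abuttingQuotes: Int)   // delimiter, plus literal quotes touching it
}

/// Hypothetical classifier; returns nil if the text does not start with a quote.
func classifyOpeningRun(_ text: String) -> OpeningRun? {
    let count = text.prefix(while: { $0 == "\"" }).count
    switch count {
    case 0:
        return nil
    case 2:
        return .emptyString
    case let n where n % 2 == 1:
        // One, three, five, ... quotes: the whole run is the delimiter.
        return .delimiter(length: n, abuttingQuotes: 0)
    default:
        // Four, six, ... quotes: an odd delimiter with one literal quote abutting it.
        return .delimiter(length: count - 1, abuttingQuotes: 1)
    }
}
```

The same counting would apply in mirror image at the closing end of the literal.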
* Disabling escapes: If you use single quotes instead of double quotes, backslash escapes are disabled. (There is no escaping at all, not even \\ or \'. If you need to include the delimiter, use a delimiter with more quote marks. I'm not sure if this should disable interpolation; for now, I'm assuming it should. If it doesn't disable interpolation, the only way to get a \( into the string would be by interpolating it in, not by escaping it somehow.)
let xml: String = '''<?xml version="1.0"?>
                  ''' <book id="bk101" empty="">
                  ''' <author>''' + author + '''</author>'''
I'm not sure if single quotes should allow interpolation. Options are:
* No, just concatenate (as shown above).
* Yes, with the ordinary syntax: `''' <author>\(author)</author>`
* Yes, with a number of backslashes matching the number of quotes, which allows you to insert literal `\(` text: `''' <author>\\\(author)</author>`
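For the simplest of these options (no escapes and no interpolation in single-quoted literals), the difference between the two quote styles comes down to whether the body is decoded at all. A minimal sketch, assuming a hypothetical `decodeBody` function and only a handful of escapes; interpolation is deliberately left out:

```swift
/// Hypothetical decoder. `quote` is the quote character the literal was written with.
func decodeBody(_ body: String, quote: Character) -> String {
    // Single-quoted literals: no escaping at all, the text is taken verbatim.
    if quote == "'" { return body }

    var result = ""
    var iterator = body.makeIterator()
    while let ch = iterator.next() {
        // Anything that is not a backslash followed by another character is copied as-is.
        guard ch == "\\", let next = iterator.next() else {
            result.append(ch)
            continue
        }
        switch next {
        case "n": result.append("\n")
        case "t": result.append("\t")
        default:  result.append(next)   // \" and \\ (and anything else) become the bare character
        }
    }
    return result
}
```

With the same body text, `decodeBody(body, quote: "\"")` turns `\t` into a tab, while `decodeBody(body, quote: "'")` hands the backslash and the `t` back untouched.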
Note that you can use these features in any combination. I've shown a few combinations above, but here are some others.
A single-line literal with an alternate delimiter:
""" <book id="bk101" empty="">"""
The same thing, but no-escaping:
''' <book id='bk101' empty=''>'''
A no-escaping multiline literal with a normal delimiter:
* * *
Notes on alternatives:
1. If you wanted to not provide no-escaping strings, an alternative would be to say that *all* escapes require as many backslashes as there are quotes in the string delimiter. Thus, a newline escape in a `"""` string would be `\\\n`. This would in practice give you the same flexibility to write a literal without worrying (much) about escaping.
2. However, it's not entirely clear to me that we really need escapes other than interpolations at all. You could write "\(.newline)" or "\(.doubleQuote)" or "\(.backslash)" to get those characters. (These might be static members required by StringInterpolationConvertible.) Plain backslashes would have no special meaning at all; only "\(" would be special.
3. It might be useful to make multiline `"` strings trim trailing whitespace and comments like Perl's `/x` regex modifier does. That would allow you to document things in literals. Then you would want `'` again so that you could turn that smartness off. (Of course, the big problem here is that a naïve implementation would consider "http://" to have a comment at the end of it.)
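The "http://" problem in note 3 is easy to reproduce. Here is a deliberately naive sketch of comment-and-whitespace trimming (`naiveTrim` is a hypothetical name, and `//` is assumed to be the only comment marker) that shows where it goes wrong:

```swift
import Foundation

/// Naive trimming: cut at the first "//", then drop trailing spaces and tabs.
/// It knows nothing about context, which is exactly the flaw being illustrated.
func naiveTrim(_ line: String) -> String {
    var text = Substring(line)
    if let comment = line.range(of: "//") {
        text = line[..<comment.lowerBound]
    }
    while let last = text.last, last == " " || last == "\t" {
        text.removeLast()
    }
    return String(text)
}

let documented = naiveTrim("<book id=\"bk101\">   // the first book")
let mangled = naiveTrim("http://")
```

Here `documented` comes out as the bare `<book id="bk101">` tag, which is what you want, but `mangled` is just `http:`, because the slashes of the URL scheme look like the start of a comment.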
* * *
Finally, a brief aside:
For example, a regular expression that detects a line might be written "\N*\n". If escaping is enabled, then the compiler changes "\n" into line feed, which does not have the same meaning to the regular expression engine as "\n".
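In today's Swift, where escaping is always on in an ordinary `"` literal, the effect looks like this. The variable names are only for illustration:

```swift
// With escaping enabled, the compiler replaces \n with one line-feed character,
// so the regex engine never sees a backslash.
let escaped = "\n"

// Doubling the backslash is the only way to hand the engine a backslash and an "n".
let literal = "\\n"

// So the pattern from the example has to be written with every backslash doubled.
// (Writing "\N" directly is rejected by the compiler as an invalid escape sequence.)
let pattern = "\\N*\\n"
```

`escaped` holds a single character, `literal` holds two, and `pattern` holds the five characters the regex engine actually needs to see.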
There is a special place in Hell reserved for authors of languages which use `\` as an escape character, provide no regex literals or non-escaping string literals, and ship with regex libraries which use `\` as a metacharacter. It's in the outer circles—Satan has some sense of perspective—but it's definitely there.
Sorry if that's not very constructive, but *man*, that burns my biscuits.