multi-line string literals proposal


(Michael Peternell) #1

I've written up a proposal for multi-line string literals. Before proposing it officially, I would like to get some informal feedback.

How can I propose it officially? Do I have to convert it to Markdown? I have no idea how to create a Markdown version of this, with all the quotes and funny characters in it :wink:

-Michael

***

MULTI-LINE STRING LITERALS

- Proposal: SE-xxxx
- Author: Michael Peternell
- Status:
- Review manager:

INTRODUCTION

Multi-line string literals allow text that spans multiple lines to be included verbatim in a string literal. The string may even contain quote characters (" or ') without any special escaping.

MOTIVATION

Including many lines of text in a program often doesn't look good, e.g. a JSON string where every quote needs to be escaped: "{\"response\":{\"result\":\"OK\"}}". With multi-line string literals, we can write """{"response":{"result":"OK"}}""" instead. Note that any valid JSON can be pasted as-is into a """3-quote string literal""", because 3 consecutive quotes (""") cannot appear in valid JSON. (Why would you want to have a JSON string in a program? Maybe you are writing unit tests for a JSON parser.) Another usage example is below.

Some people had concerns that a string block may break the indentation of the code. E.g.

                // some deeply indented code
                doSomeStuff(2, 33.1)
                print("""Usage: \(program_name) <PARAM-X> <PARAM-Y> filename
Example: \(program_name) 3 1 countries.csv
This will print the 1st column of the 3rd non-empty non-header line from
countries.csv
""" )
                exit(2)

That's the reason why there is also a HEREDOC-syntax in the proposal that can solve this problem. The example can be rewritten as:

                // some deeply indented code
                doSomeStuff(2, 33.1)
                print(<<USAGE_END)
                    Usage: \(program_name) <PARAM-X> <PARAM-Y> filename
                    Example: \(program_name) 3 1 countries.csv
                    This will print the 1st column of the 3rd non-empty
                    non-header line from countries.csv

                    USAGE_END
                exit(2)

This works unambiguously, as long as you don't mix tabs and spaces in your source code file.

PROPOSED SOLUTION

This proposal introduces three new forms of a String literal:

let INTERPOLATION = "String interpolation"

1. The """Python-style string literal. 3 Quotes (") at the beginning, 3 Quotes at the end, and Swift \(INTERPOLATION) is possible."""

2. The <<HERE_DOC, the string literal starts on the next line:
    A hereDoc may contain multiple lines. Leading space on each
    line is automatically truncated if the HERE_DOC delimiter
    is also indented. \(INTERPOLATION) is possible.
    HERE_DOC

3. A <<'HERE_DOC' with single quotes around them.
This is almost the same as a heredoc without single quotes, but text is included as-is.
You may include \ or " or ' or whatever (\") is just a backslash followed by a double quote.
The leading space rule is the same as for the other HERE_DOC.
Swift String interpolation is not possible here.
HERE_DOC

DETAILED DESIGN

The first type of String (the """Python-style multiline string""") behaves exactly like the "ordinary string literal", except for a few differences:
- a line break doesn't result in an error, but is included in the string's value
- an included " doesn't end the string and does not need to be escaped
- if you want to include """ in the string, you have to write ""\". This is a rare use case, and if you really need to do that, you may as well use one of the HERE_DOC styles instead.

The second type of String (the <<HERE_DOC with string interpolation) includes all lines after the line where <<HERE_DOC appears, up to the HERE_DOC delimiter line. The last newline before the HERE_DOC delimiter line is automatically removed from the string; otherwise it would not be possible to create a HERE_DOC string literal that does not end with a newline character. If you want the string literal to end with a newline character, you need an empty line before the HERE_DOC delimiter line (as in the "usage" example above).

The HERE_DOC delimiter line contains optional whitespace at the beginning, followed by the HERE_DOC token. If the line contains leading whitespace, all lines within the literal have to start with exactly the same leading whitespace. E.g. if the HERE_DOC line contains 4 spaces, followed by "HERE_DOC", each line in the string literal has to start with 4 spaces as well (using one tab instead, or less whitespace, would be a parse error). Empty lines within the string literal are exempt from this requirement. They just translate to "\n".

(Fine print: if the HEREDOC delimiter line is "\t\tHEREDOC" and one of the lines in the string literal consists only of spaces, then it is not decidable whether that line should translate to "\n" (if the two tabs are at least as wide as those spaces) or to the remaining spaces followed by "\n" (if the tabs are narrower), so this would also result in a parse error. The whitespace before the HERE_DOC on the HERE_DOC delimiter line must contain only spaces or only tabs, but not a mixture of both. These rules are a bit complicated for the language implementor, but for the user of the feature, they have an important advantage: if the code compiles, the string literal will behave as expected. Just don't mix tabs and spaces and you'll be fine.)
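The leading-whitespace rules above can be modeled at runtime with a small sketch. This is not how a compiler would implement it, and the names hereDocValue and HereDocError are made up for illustration. It assumes the body has already been split into lines, and it treats a non-empty, whitespace-only line that does not start with the delimiter's indentation as an error, which is one reading of the fine print.

```swift
// Hypothetical helper, not part of the proposal: models how the body lines of
// a HERE_DOC could be turned into the resulting string value.
enum HereDocError: Error, Equatable {
    case mixedIndentation          // delimiter indentation mixes tabs and spaces
    case badIndentation(line: Int) // body line does not start with that indentation
}

func hereDocValue(bodyLines: [String], delimiterIndent: String) throws -> String {
    // The whitespace before the delimiter must be only spaces or only tabs.
    if delimiterIndent.contains(" ") && delimiterIndent.contains("\t") {
        throw HereDocError.mixedIndentation
    }
    var result: [String] = []
    for (index, line) in bodyLines.enumerated() {
        if line.isEmpty {
            // Empty lines are exempt and contribute just "\n".
            result.append("")
        } else if line.starts(with: delimiterIndent) {
            // Strip exactly the delimiter's indentation; extra indentation stays.
            result.append(String(line.dropFirst(delimiterIndent.count)))
        } else {
            // Assumption: anything else, including a line made of the "wrong"
            // kind of whitespace, is reported as an error.
            throw HereDocError.badIndentation(line: index + 1)
        }
    }
    // Joining with "\n" adds no trailing newline, which matches the rule that
    // the last newline before the delimiter line is dropped.
    return result.joined(separator: "\n")
}
```

With a four-space delimiterIndent, the body lines ["    a", ""] give "a\n": the trailing empty line is what makes the string end with a newline.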

The third type of string is exactly the same as the second type, with the only difference that the <<HERE_DOC syntax is changed to <<'HERE_DOC', and that all string interpolation and escape sequences are disabled within the literal. The end token is still HERE_DOC without single quotes, and not 'HERE_DOC'. The rules about leading whitespace on the HERE_DOC delimiter line are the same as for the second type.

For the HERE_DOC token, everything that is a valid variable name is allowed, so <<hello and <<END_OF_XML are both valid, but <<2442 is not. Furthermore, every token that matches /[a-zA-Z]+/ is also valid, so <<class should be okay as well. (The usual practice is to use SCREAMING_SNAKE_CASE tokens as delimiters.)
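A rough sketch of that token check, under a simplifying assumption: real Swift identifiers may contain many Unicode characters, while this version only accepts ASCII letters, digits and underscores. The function name isValidHereDocToken is hypothetical.

```swift
// Hypothetical check, simplified to ASCII: a token is accepted if it looks
// like an identifier (letters, digits, underscore, not starting with a digit).
// Keywords such as "class" consist only of letters, so they pass as well.
func isValidHereDocToken(_ token: String) -> Bool {
    guard let first = token.first else { return false } // empty token is invalid
    let isWordChar: (Character) -> Bool = { c in
        c.isASCII && (c.isLetter || c.isNumber || c == "_")
    }
    return !first.isNumber && token.allSatisfy(isWordChar)
}
```

Because the letters-only rule is a subset of this simplified identifier rule, a single check covers both cases mentioned above.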

IMPACT ON EXISTING CODE

This is an add-on feature. Code that uses these multi-line string literals does not even compile with previous versions of Swift, so no existing code can break because of this change.

ALTERNATIVES CONSIDERED

1. Just copy all String-handling rules from Perl :wink:

2. String literals of the form

    _"text text
    "text text"_

I don't like the continuation quote, so it doesn't solve the problem that I am trying to solve with this proposal. The same is true for a string literal where you would have to start each line with \\ .

3. eXML"a string literal that starts with e, followed by some token, and that ends with a quote (") followed by the same token"XML. This has the advantage that you can put anything between the start and the end, and that you can choose a delimiter. It's a flexible solution. I prefer HERE_DOCs though, because they are an already well-known programming language construct.

4. Do nothing, and just use string concatenation: "this string\n" +
"with newlines in it\n" works well. Maybe the optimizer can optimize this away anyway, so there wouldn't even be a performance cost.
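For comparison, the concatenation alternative in compilable form. The constant name text is only illustrative.

```swift
// Alternative 4: build a multi-line string from ordinary single-line literals.
// Every line needs its own quotes and an explicit "\n".
let text = "this string\n" +
    "with newlines in it\n"
print(text, terminator: "")
```

This works without any language change, but any quote inside the text still has to be escaped by hand.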