[Proposal] Multiline string literals


(Michael Peternell) #1

Hi,

I propose adding multiline string literals to Swift 3.

I have written up a proposal as a GitHub Gist, here:
https://gist.github.com/michaelpeternell/a4da4185de78808f4575a836c50debbd

Can someone with write-access push it to the swift-evolution repository, please?

Thanks..

Regards,
Michael

Multiline String literals

Proposal: SE-NNNN
Author: Michael Peternell <https://www.github.com/michaelpeternell>
Status: Awaiting review <https://github.com/apple/swift-evolution/blob/master/0000-template.md#rationale>
Review manager: TBD
Introduction

Multi-line string literals allow text that spans multiple lines to be included verbatim in a string literal. The string may even contain quote characters (" or ') without escaping them.

Motivation

Including many lines of text in a program often looks awkward, e.g. a JSON string where every quote needs to be escaped: "{\"response\":{\"result\":\"OK\"}}". With multi-line string literals, we can write """{"response":{"result":"OK"}}""" instead. Note that any valid JSON can be pasted as-is into a """3-quote string literal""", because 3 consecutive quotes (""") cannot appear in valid JSON. (Why would you want a JSON string in a program? Maybe you are writing unit tests for a JSON parser.) Another usage example is below.

Some people had concerns that a string block may break the indentation of the code. E.g.

            // some deeply indented code
            doSomeStuff(2, 33.1)
            print("""Usage: \(program_name) <PARAM-X> <PARAM-Y> filename
Example: \(program_name) 3 1 countries.csv
This will print the 1st column of the 3rd non-empty non-header line from
countries.csv
""")
            exit(2)
First, you don't have to use them; normal double-quoted strings remain available. But to fix the indentation problem with multiline strings, you may use a HEREDOC syntax. The example can be rewritten as:

            // some deeply indented code
            doSomeStuff(2, 33.1)
            print(<<USAGE_END)
                Usage: \(program_name) <PARAM-X> <PARAM-Y> filename
                Example: \(program_name) 3 1 countries.csv
                This will print the 1st column of the 3rd non-empty
                non-header line from countries.csv

                USAGE_END
            exit(2)
This works unambiguously, as long as you don't mix tabs and spaces in your source code file.

Proposed solution

This proposal introduces four new forms of a String literal:

The Python-like string literal

Everything between """ and """ belongs to the string. Escape sequences (\n, \t, \\, etc.) and string interpolation work as usual. The rules are the same as for a normal double-quoted (") string literal. However, a single " doesn't need to be escaped, and therefore a string literal like """<a href="#" onclick="openABCWindow(22);return false">details</a>""" would be valid Swift. Newlines, spaces, and tabs are not treated in any special way, so the following string

"""
 test
 test"""
could also be written as "\n test\n test".

The HEREDOC with string interpolation

The HEREDOC starts with a <<, followed by an identifier. The string literal starts on the next line and ends on the line that contains the HEREDOC identifier. Example:

print(<<USAGE)
    funnyProgram [-v] [-h]
    This program tells a joke. Possible options:
      -v | --version ... Shows version information
      -h | --help ...... Shows this list of options
USAGE
exit(2)
This string literal does not contain a trailing newline character; otherwise, it would not be possible to create a HEREDOC literal without a trailing newline. (The print() function will add a newline, though.) In the example above, each line is indented by 4 spaces. To strip the leading spaces from each line, indent the ending identifier by the same amount of whitespace. Thus, a better example would look like this:

print(<<USAGE)
    funnyProgram [-v] [-h]
    This program tells a joke. Possible options:
      -v | --version ... Shows version information
      -h | --help ...... Shows this list of options
    USAGE
exit(2)
The leading indentation has to be all-spaces or all-tabs, but never a mixture of them.

The HEREDOC without string interpolation

To have no string interpolation or escape sequences at all, you can add single quotes around the HEREDOC-identifier. Example:

print(<<'USAGE')
    funnyProgram [-v] [-h]
    This program tells a joke. Possible options:
      -v | --version ... Shows version information
      -h | --help ...... Shows this list of options
         \( ^^ don't worry, be happy 🙂
    USAGE
exit(2)
In all other regards, this string literal behaves the same as the HEREDOC with string interpolation.

«Guillemets» and English “typographical quotes”

Swift already allows emoji in names of all sorts. The following code is valid Swift:

for 🐟 in sea {
    🐟.makeSushi()
}
Swift is a playful language. Allowing «Guillemets» and “typographical quotes” is the next logical step. To cover both strings with interpolation and strings without, one form should support string interpolation and escape sequences while the other should not. I propose that «Guillemets» are used for strings without interpolation, so «\» is a valid string literal consisting of a single backslash character. “"\(localizedName)"” is a string containing a double quote character ("), followed by the contents of localizedName, followed by another double quote character ("). (Note that the reverse is already possible: "“\(localizedName)”".)

These literals behave the same as the Python-like string literal above.

Detailed design

Python-like strings, Guillemets and English typographical quotes are already described in detail above. Only the HEREDOCs may cause misunderstandings.

The following code should be invalid:

    print(<<EOT)
    hello world
        is this a proper string literal?
        EOT
because the ending EOT has more indentation than one of the lines in the string literal.

The following code is valid though:

    // I replaced spaces with _underscores_ below:
____print(<<EOT)
________hello world

···

____
________is this a valid string?
________EOT
Although the second line has less indentation than the other lines, this is not a problem because the line is empty.

The following string literal contains 3 spaces in the second line, and it ends with a single newline character:

    // I replaced spaces with _underscores_ below:
____print(<<EOT)
________hello world
___________
________is this a valid string?

________EOT
The following string literal is invalid:

    // I replaced spaces with _underscores_ below.
    // I replaced tab characters with TAB! below.
____print(<<EOT)
________hello world
____
TAB!____is this a valid string
________EOT
    // => no
With tabs configured to look exactly like 4 spaces, the code above looks valid, but it is not. There is no sane way to decide whether (TAB + 4 spaces) is less than, the same as, or more than 8 spaces. Such code should be rejected.

The author's opinion is that tabs and spaces should not be mixed, and that this will not be a problem in almost all use cases.
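
To illustrate why (TAB + 4 spaces) cannot be compared with 8 spaces, here is a small sketch in current Swift. It is not part of the proposal: the helper visualWidth(of:tabWidth:) is a hypothetical name, and it assumes the usual editor behaviour that a tab advances to the next multiple of the tab width.

```swift
// Hypothetical helper (not part of the proposal): computes the column at which
// text would start when the given leading whitespace is rendered with a
// particular tab width. Assumes a tab advances to the next tab stop.
func visualWidth(of indentation: String, tabWidth: Int) -> Int {
    var column = 0
    for character in indentation {
        if character == "\t" {
            // Jump to the next multiple of tabWidth.
            column += tabWidth - (column % tabWidth)
        } else {
            column += 1
        }
    }
    return column
}

let mixed = "\t    "      // TAB + 4 spaces
let spaces = "        "   // 8 spaces

for tabWidth in [2, 4, 8] {
    print(tabWidth,
          visualWidth(of: mixed, tabWidth: tabWidth),
          visualWidth(of: spaces, tabWidth: tabWidth))
}
// prints:
// 2 6 8
// 4 8 8
// 8 12 8
```

Depending on the editor setting, the mixed indentation comes out narrower than, equal to, or wider than 8 spaces, which is why a compiler cannot pick one answer.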

The following HEREDOC is also invalid, although the amount of whitespace is consistent:

    // I replaced spaces with _underscores_ below.
    // I replaced tab characters with TAB! below.
____print(<<EOT)
____TAB!hello world
____TAB!
____TAB!is this a valid string?
____TAB!__good question.
____TAB!EOT
Tabs and spaces just shouldn't be mixed. The following snippet is fine, though, although inconsistent in its tab/space use:

    // I replaced spaces with _underscores_ below:
    // I replaced tab characters with TAB! below.
____print(<<EOT)
________hello world
________
________is this a valid string?
________TAB!Yes, indeed, even though this line started with a tab
________EOT
In the example above, the leading whitespace on each line consists of 8 spaces. Everything after these 8 spaces becomes part of the string literal as-is.
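
The HEREDOC indentation rules above could be sketched roughly as follows, in current Swift. This is only an illustration under stated assumptions, not a proposed implementation: the names HeredocError and stripHeredocIndentation(bodyLines:closingIndent:) are hypothetical, the function takes the already-split body lines plus the whitespace found in front of the closing identifier, and it treats any whitespace-only line that is shorter than that indentation as an empty line.

```swift
// Hypothetical error type for this sketch (not part of the proposal).
enum HeredocError: Error, Equatable {
    case mixedIndentation               // closing indentation mixes tabs and spaces
    case indentationMismatch(line: Int) // a non-empty line lacks the exact prefix
}

// Strips the closing identifier's indentation from every body line.
// Assumption: closingIndent contains only spaces and/or tabs.
func stripHeredocIndentation(bodyLines: [String], closingIndent: String) throws -> String {
    // The stripped indentation must be all-spaces or all-tabs.
    if closingIndent.contains(" ") && closingIndent.contains("\t") {
        throw HeredocError.mixedIndentation
    }

    var result: [String] = []
    for (index, line) in bodyLines.enumerated() {
        if line.hasPrefix(closingIndent) {
            // Everything after the prefix is kept as-is, including tabs.
            result.append(String(line.dropFirst(closingIndent.count)))
        } else if line.allSatisfy({ $0 == " " || $0 == "\t" }) {
            // Assumption: short whitespace-only lines count as empty lines.
            result.append("")
        } else {
            throw HeredocError.indentationMismatch(line: index + 1)
        }
    }
    // Joining with "\n" means no trailing newline is added automatically.
    return result.joined(separator: "\n")
}

// The example with 3 spaces in the second line and one trailing newline:
let body = [
    "        hello world",
    "           ",
    "        is this a valid string?",
    "",
]
do {
    let text = try stripHeredocIndentation(bodyLines: body, closingIndent: "        ")
    print(text == "hello world\n   \nis this a valid string?\n")  // prints "true"
} catch {
    print("invalid HEREDOC: \(error)")
}
```

A line that starts with TAB + 4 spaces fails the exact-prefix check against 8 spaces and is rejected, while a tab that appears after the 8-space prefix simply stays in the string.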

Impact on existing code

This is an add-on feature. Code that uses these multi-line string literals didn't even compile with previous versions of Swift, so no existing code can break because of this change.

Alternatives considered

Introduce just the """Python-like string literal"""

Do nothing.

Rationale

...