Optional dictionary subscripts

cloutiertyler · April 30, 2016, 6:54pm

Awesome. Some specific suggestions below, but feel free to iterate in a pull request if you prefer that.

I've adopted these suggestions in some form, though I also ended up rewriting the explanation of why the feature was designed as it is and fusing it with material from "Alternatives considered".

(Still not sure who I should list as a co-author. I'm currently thinking John, Tyler, and maybe Chris? Who's supposed to go there?)

Multiline string literals

Proposal: SE-NNNN <https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md>
Author(s): Brent Royal-Gordon <https://github.com/brentdax>
Status: Second Draft
Review manager: TBD

Introduction <https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#introduction>

In Swift 2.2, the only means to insert a newline into a string literal is the \n escape. String literals specified in this way are generally ugly and unreadable. We propose a multiline string feature inspired by English punctuation which is a straightforward extension of our existing string literals.

This proposal is one step in a larger plan to improve how string literals address various challenging use cases. It is not meant to solve all problems with escaping, nor to serve all use cases involving very long string literals. See the "Future directions for string literals in general" section for a sketch of the problems we ultimately want to address and some ideas of how we might do so.

Swift-evolution threads: multi-line string literals (April) <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160418/015500.html>, multi-line string literals (December) <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/002349.html>

Draft Notes

Removes the comment feature, which was felt to be an unnecessary complication. This and the backslash feature have been listed as future directions.

Loosens the specification of diagnostics, suggesting instead of requiring fix-its.

Splits a "Rationale" section out of the "Proposed solution" section.

Adds extensive discussion of other features which would combine with this one.

I've listed only myself as an author because I don't want to put anyone else's name to a document they haven't seen, but there are others who deserve to be listed (John Holdsworth at least). Let me know if you think you should be included.

Motivation

As Swift begins to move into roles beyond app development, code which needs to generate text becomes a more important use case. Consider, for instance, generating even a small XML string:

let xml = "<?xml version=\"1.0\"?>\n<catalog>\n\t<book id=\"bk101\" empty=\"\">\n\t\t<author>\(author)</author>\n\t</book>\n</catalog>"

The string is practically unreadable, its structure drowned in escapes and run-together lines; it looks like little more than line noise. We can improve its readability somewhat by concatenating separate strings for each line and using real tabs instead of \t escapes:

let xml = "<?xml version=\"1.0\"?>\n" +
          "<catalog>\n" +
          " <book id=\"bk101\" empty=\"\">\n" +
          " <author>\(author)</author>\n" +
          " </book>\n" +
          "</catalog>"

However, this creates a more complex expression for the type checker, and there's still far more punctuation than ought to be necessary. If the most important goal of Swift is making code readable, this kind of code falls far short of that goal.

Proposed solution

We propose that, when Swift is parsing a string literal, if it reaches the end of the line without encountering an end quote, it should look at the next line. If it sees a quote at the beginning (a "continuation quote"), the string literal contains a newline and then continues on that line. Otherwise, the string literal is unterminated and syntactically invalid.

One other way to implement the feature would be to allow quotes to be terminated by either a close quote or an end-of-line character. Multiline literals would then be constructed by concatenating adjacent (i.e., separated only by comments or whitespace) string literals.

One issue with this is that

let foo = "bar

would be a valid string whose value would be "bar\n", even though that might not be the intended result. There is also the issue of things like

let foo = [
  "string1",
  "string2"
  "string3"
]

becoming ["string1", "string2string3"]. This is something that can happen in Python, for example.

However, this has a few benefits: it simplifies the model, and if I’m just pasting in a block of text I don’t have to add a trailing quote to the last line. If I were in Vim, for example, I could just visual-block add a column of quotes at the beginning. So it would be:

let xml = "<?xml version=\"1.0\"?>
          "<catalog>
          " <book id=\"bk101\" empty=\"\">
          " <author>\(author)</author>
          " </book>
          "</catalog>

Our sample above could thus be written as:

let xml = "<?xml version=\"1.0\"?>
          "<catalog>
          " <book id=\"bk101\" empty=\"\">
          " <author>\(author)</author>
          " </book>
          "</catalog>"

If the second or subsequent lines had not begun with a quotation mark, or the trailing quotation mark after the </catalog> tag had not been included, Swift would have emitted an error.

Rationale

This design is rather unusual, and it's worth pausing a moment to explain why it has been chosen.

The traditional design for this feature, seen in languages like Perl and Python, simply places one delimiter at the beginning of the literal and another at the end. Individual lines in the literal are not marked in any way.

We think continuation quotes offer several important advantages over the traditional design:

They help the compiler pinpoint errors in string literal delimiting. Traditional multiline strings have a serious weakness: if you forget the closing quote, the compiler has no idea where you wanted the literal to end. The literal simply continues until the compiler encounters another quote (or the end of the file). If you're lucky, the text after that quote is not valid code, and the resulting error will at least point you to the next string literal in the file. If you're unlucky, you'll get a seemingly unrelated error several literals later, an unbalanced brace error at the end of the file, or perhaps even code that compiles but does something totally wrong.

(This is not a minor concern. Many popular languages, including C and Swift 2, specifically reject newlines in string literals to prevent this from happening.)

Continuation quotes provide the compiler with redundant information about your intent. If you forget a closing quote, the continuation quotes give the compiler a very good idea of where you meant to put it. The compiler can point you to (or at least very near) the end of the literal, where you want to insert the quote, rather than showing you the beginning of the literal or even some unrelated error later in the file that was caused by the missing quote.

Temporarily unclosed literals don't make editors go haywire. The syntax highlighter has the same trouble parsing half-written, unclosed traditional quotes that the compiler does: It can't tell where the literal is supposed to end and the code should begin. It must either apply heuristics to try to guess where the literal ends, or incorrectly color everything between the opening quote and the next closing quote as a string literal. This can cause the file's coloring to alternate distractingly between "string literal" and "running code".

Continuation quotes give the syntax highlighter enough context to guess at the correct coloration, even when the string isn't complete yet. Lines with a continuation quote are literals; lines without are code. At worst, the syntax highlighter might incorrectly color a few characters at the end of a line, rather than the remainder of the file.

They separate indentation from the string's contents. Traditional multiline strings usually include all of the content between the start and end delimiters, including leading whitespace. This means that it's usually impossible to indent a multiline string, so including one breaks up the flow of the surrounding code, making it less readable. Some languages apply heuristics or mode switches to try to remove indentation, but like all heuristics, these are mistake-prone and murky.

Scala has an interesting solution to this problem which doesn’t involve a mode, but rather a function that strips out whitespace before the | character. In this case the | character serves a very similar purpose to the continuation quote. The particular character can be passed to the function as an argument.
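
For comparison, here is a rough sketch of what a Scala-style margin-stripping function might look like if written as an ordinary Swift function. The name stripMargin and the marginChar parameter are hypothetical (borrowed from Scala for illustration); nothing like this exists in the Swift standard library, and the sketch only treats spaces and tabs as indentation.

```swift
/// Hypothetical helper modelled on Scala's `stripMargin`; not an existing Swift API.
/// For each line, removes leading spaces/tabs up to and including `marginChar`.
/// Lines whose first non-blank character is not `marginChar` are left untouched.
func stripMargin(_ text: String, marginChar: Character = "|") -> String {
    // Keep empty lines so that blank lines in the input survive.
    let lines = text.split(separator: "\n", omittingEmptySubsequences: false)
    let stripped = lines.map { line -> Substring in
        let rest = line.drop(while: { $0 == " " || $0 == "\t" })
        // Only strip when the first non-blank character is the margin marker.
        return rest.first == marginChar ? rest.dropFirst() : line
    }
    return stripped.joined(separator: "\n")
}

let xml = stripMargin("<catalog>\n    |  <book/>\n    |</catalog>")
print(xml)
```

Note that this approach does its work at run time on an already-parsed string, whereas continuation quotes would be resolved by the lexer.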

Continuation quotes neatly avoid this problem. Whitespace before the continuation quote is indentation used to format the source code; whitespace after the continuation quote is part of the string literal. The interpretation of the code is perfectly clear to both compiler and programmer.

They improve the ability to quickly recognize the literal. Traditional multiline strings don't provide much visual help. To find the end, you must visually scan until you find the matching delimiter, which may be only one or a few characters long. When looking at a random line of source, it can be hard to tell at a glance whether it's code or literal. Syntax highlighting can help with these issues, but it's often unreliable, especially with advanced, idiosyncratic string literal features like multiline strings.

Continuation quotes solve these problems. To find the end of the literal, just scan down the column of continuation characters until they end. To figure out if a given line of source is part of a literal, just see if it starts with a quote mark. The meaning of the source becomes obvious at a glance.

Nevertheless, the traditional design does have a few advantages:

It is simpler. Although continuation quotes are more complex, we believe that the advantages listed above pay for that complexity.

There is no need to edit the intervening lines to add continuation quotes. While the additional effort required to insert continuation quotes is an important downside, we believe that tool support, including both compiler fix-its and perhaps editor support for commands like "Paste as String Literal", can address this issue. In some editors, new features aren't even necessary; TextMate, for instance, lets you insert a character on several lines simultaneously. And new tool features could also address other issues like escaping embedded quotes.

Although I was concerned about this, most editors do have some way of inserting a column of characters, which would reduce the burden of pasting in code. And although enabling/disabling escaping is an orthogonal feature, allowing the _" syntax to disable escaping would allow you to paste in code with no other modifications.

Naïve syntax highlighters may have trouble understanding this syntax. This is true, but naïve syntax highlighters generally have terrible trouble with advanced string literal constructs; some struggle with even basic ones. While there are some designs (like Python's """ strings) which trick some syntax highlighters into working some of the time with some contents, we don't think this occasional, accidental compatibility is a big enough gain to justify changing the design.

It looks funny: quotes should always be in matched pairs. We aren't aware of another programming language which uses unbalanced quotes in string literals, but there is one very important precedent for this kind of formatting: natural languages. English, for instance, uses a very similar format for quoting multiple lines of dialog by the same speaker. As an answer on English Language & Usage Stack Exchange ("Why does the multi-paragraph quotation rule exist?") illustrates:

“That seems like an odd way to use punctuation,” Tom said. “What harm would there be in using quotation marks at the end of every paragraph?”

“Oh, that’s not all that complicated,” J.R. answered. “If you closed quotes at the end of every paragraph, then you would need to reidentify the speaker with every subsequent paragraph.

“Say a narrative was describing two or three people engaged in a lengthy conversation. If you closed the quotation marks in the previous paragraph, then a reader wouldn’t be able to easily tell if the previous speaker was extending his point, or if someone else in the room had picked up the conversation. By leaving the previous paragraph’s quote unclosed, the reader knows that the previous speaker is still the one talking.”

“Oh, that makes sense. Thanks!”

In English, omitting the ending quotation mark tells the text's reader that the quote continues on the next line, while including a quotation mark at the beginning of the next line reminds the reader that they're in the middle of a quote.

Similarly, in this proposal, omitting the ending quotation mark tells the code's reader (and compiler) that the string literal continues on the next line, while including a quotation mark at the beginning of the next line reminds the reader (and compiler) that they're in the middle of a string literal.

This is very interesting, I never knew!

On balance, we think continuation quotes are the best design for this problem.

Detailed design

When Swift is parsing a string literal and reaches the end of a line without finding a closing quote, it examines the next line, applying the following rules:

If the next line begins with whitespace followed by a continuation quote, then the string literal contains a newline followed by the contents of the string literal starting on that line. (This line may itself have no closing quote, in which case the same rules apply to the line which follows.)

If the next line contains anything else, Swift raises a syntax error for an unterminated string literal.
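
To make the two rules concrete, here is a minimal sketch of them as an ordinary Swift function working on an array of source lines. It is not the actual lexer change. The names scanMultilineLiteral and LiteralError are invented for illustration, the input is assumed to start at the line holding the opening quote, leading whitespace before a continuation quote is assumed to be optional, and escapes are only tracked far enough to skip \" (interpolations containing quotes are not handled).

```swift
// Hypothetical error type for this sketch; not part of the proposal.
enum LiteralError: Error {
    case unterminated(line: Int)   // 0-based index of the line left open
}

/// Scans `lines`, whose first element begins (after optional indentation)
/// with the literal's opening quote. Returns the raw contents of the literal;
/// escape sequences are left unprocessed.
func scanMultilineLiteral(_ lines: [String]) throws -> String {
    var pieces: [String] = []
    for (index, line) in lines.enumerated() {
        // Whitespace before the quote is source indentation, not content.
        let trimmed = line.drop(while: { $0 == " " || $0 == "\t" })
        guard trimmed.first == "\"" else {
            // Rule 2: anything other than a continuation quote is an error,
            // reported against the line that was left unterminated.
            throw LiteralError.unterminated(line: max(index - 1, 0))
        }
        var content = ""
        var escaped = false
        var closed = false
        for ch in trimmed.dropFirst() {
            if escaped {
                content.append(ch)
                escaped = false
            } else if ch == "\\" {
                content.append(ch)
                escaped = true
            } else if ch == "\"" {
                // Closing quote: the rest of the line is ordinary code.
                closed = true
                break
            } else {
                content.append(ch)
            }
        }
        pieces.append(content)
        // Rule 1: each continued line contributes a newline plus its contents.
        if closed { return pieces.joined(separator: "\n") }
    }
    // Ran out of lines without ever seeing a closing quote.
    throw LiteralError.unterminated(line: max(lines.count - 1, 0))
}

if let contents = try? scanMultilineLiteral(["\"<catalog>", "    \" <book/>", "    \"</catalog>\""]) {
    print(contents)
}
```

Because the error carries the index of the line that was left open, a diagnostic built on it could point near the end of the literal rather than at some later, unrelated quote.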

The exact error messages and diagnostics provided are left to the implementers to determine, but we believe it should be possible to provide two fix-its which will help users learn the syntax and correct string literal mistakes:

Insert " at the end of the current line to terminate the quote.

Insert " at the beginning of the next line (with some indentation heuristics) to continue the quote on the next line.

Impact on existing code

Failing to close a string literal before the end of the line is currently a syntax error, so no valid Swift code should be affected by this change.

Future directions for multiline string literals

We could permit comments before encountering a continuation quote to be counted as whitespace, and permit empty lines in the middle of string literals. This would allow you to comment out whole lines in the literal.

We could allow you to put a trailing backslash on a line to indicate that the newline isn't "real" and should be omitted from the literal's contents.

Future directions for string literals in general

There are other issues with Swift's string handling which this proposal intentionally does not address:

Reducing the amount of double-backslashing needed when working with regular expression libraries, Windows paths, source code generation, and other tasks where backslashes are part of the data.

Alternate delimiters or other strategies for writing strings with " characters in them.

Accommodating code formatting concerns like hard wrapping and commenting.

String literals consisting of very long pieces of text which are best represented completely verbatim, with minimal alteration.

This section briefly outlines some future proposals which might address these issues. Combined, we believe they would address most of the string literal use cases which Swift is currently not very good at.

Please note that these are simply sketches of hypothetical future designs; they may radically change before proposal, and some may never be proposed at all. Many, perhaps most, will not be proposed for Swift 3. We are sketching these designs not to propose and refine these features immediately, but merely to show how we think they might be solved in ways which complement this proposal.

String literal modifiers

A string literal modifier is a cluster of identifier characters which goes before a string literal and adjusts the way it is parsed. Modifiers only alter the interpretation of the text in the literal, not the type of data it produces; for instance, there will never be something like the UTF-8/UTF-16/UTF-32 literal modifiers in C++. Uppercase characters enable a feature; lowercase characters disable a feature.

Modifiers can be attached to both single-line and multiline literals, and could also be attached to other literal syntaxes which might be introduced in the future. When used with multiline strings, only the starting quote needs to carry the modifiers, not the continuation quotes.

Modifiers are an extremely flexible feature which can be used for many purposes. Of the ideas listed below, we believe the e modifier is an urgent addition which should be included in Swift 3 if at all possible; the others are less urgent and most of them could be deferred, or at least added later if time allows.

Escape disabling: e"\\\" (string with three backslash characters)

Fine-grained escape disabling: i"\(foo)\n" (the string \(foo) followed by a newline); eI"\(foo)\n" (the contents of foo followed by the string \n); b"\w+\n" (the string \w+ followed by a newline)

Alternate delimiters: _ has no lowercase form, so it could be used to allow strings with internal quotes: _"print("Hello, world!")"_, __"print("Hello, world!")"__, etc.

This is interesting and perhaps could be applied per line with the continuation quote syntax:

let xml = _"<?xml version="1.0"?>
          _"<catalog>
          _" <book id="bk101" empty="">
           " <author>\(author)</author>
          _" </book>
          _"</catalog>

This would allow individual lines to retain the ability to do escaping and interpolation without affecting the whole string, just like the author line in the example above. This is also very easy to insert into editors just like the standard continuation quote syntax. Or perhaps we could just “escape” each string:

let xml = \"<?xml version="1.0"?>
          \"<catalog>
          \" <book id="bk101" empty="">
           " <author>\(author)</author>
          \" </book>
          \"</catalog>

Whitespace normalization: changes all runs of whitespace in the literal to single space characters; this would allow you to use multiline strings purely to improve code formatting.

alert.informativeText =
    W"\(appName) could not typeset the element “\(title)” because
     "it includes a link to an element that has been removed from this
     "book."

Localization:

alert.informativeText =
    LW"\(appName) could not typeset the element “\(title)” because
      "it includes a link to an element that has been removed from this
      "book."

Comments: Embedding comments in string literals might be useful for literals containing regular expressions or other code.

Eventually, user-specified string modifiers could be added to Swift, perhaps as part of a hygienic macro system. It might also become possible to change the default modifiers applied to literals in a particular file or scope.
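
As a rough illustration of what the whitespace normalization idea above would compute, the same effect can be approximated today with a run-time helper. This is only a sketch: normalizingWhitespace is a made-up name, not an existing API, and unlike the modifier as described it also drops leading and trailing whitespace entirely instead of collapsing it to a single space.

```swift
/// Hypothetical helper: collapses every run of whitespace (including
/// newlines) into a single space. Leading and trailing runs are removed.
func normalizingWhitespace(_ text: String) -> String {
    // split(whereSeparator:) omits empty pieces, so consecutive
    // whitespace characters behave like one separator.
    return text.split(whereSeparator: { $0.isWhitespace }).joined(separator: " ")
}

let message = normalizingWhitespace("could not typeset the element because\n    it includes a link")
print(message)
```

A literal modifier would do this at compile time and leave interpolated values untouched; the helper above cannot tell literal text from interpolated text.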

Heredocs or other "verbatim string literal" features

Sometimes it really is best to just splat something else down in the middle of a file full of Swift source code. Maybe the file is essentially a template and the literals are a majority of the code's contents, or maybe you're writing a code generator and just want to get string data into it with minimal fuss, or maybe people unfamiliar with Swift need to be able to edit the literals. Whatever the reason, the normal string literal syntax is just too burdensome.

One approach to this problem is heredocs. A heredoc allows you to put a placeholder for a literal on one line; the contents of the literal begin on the next line, running up to some delimiter. It would be possible to put multiple placeholders in a single line, and to apply string modifiers to them.

In Swift, this might look like:

print(#to("---") + e#to("END"))
It was a dark and stormy \(timeOfDay) when
---
the Swift core team invented the \(interpolation) syntax.
END

Another possible approach would be to support traditional multiline string literals bounded by a different delimiter, like """. This might look like:

print("""
It was a dark and stormy \(timeOfDay) when
""" + e"""
the Swift core team invented the \(interpolation) syntax.
""")

Although heredocs could make a good addition to Swift eventually, there are good reasons to defer them for now. Please see the "Alternatives considered" section for details.

First-class regular expressions

Members of the core team are interested in regular expressions, but they don't want to just build a literal that wraps PCRE or libicu; rather, they aim to integrate regexes into the pattern matching system and give them a deep, Perl 6-style rethink. This would be a major effort, far beyond the scope of Swift 3.

In the meantime, the e modifier and perhaps other string literal modifiers will make it easier to specify regular expressions in string literals for use with NSRegularExpression and other libraries accessible from Swift.

Alternatives considered

Requiring no continuation character

The main alternative is to not require a continuation quote, and simply extend the string literal from the starting quote to the ending quote, including all newlines between them. For example:

let xml = "<?xml version=\"1.0\"?>
<catalog>
    <book id=\"bk101\" empty=\"\">
        <author>\(author)</author>
    </book>
</catalog>"

This alternative is extensively discussed in the "Rationale" section above.

Skip multiline strings and just support heredocs

There are definitely cases where a heredoc would be a better solution, such as generated code or code which is mostly literals with a little Swift sprinkled around. On the other hand, there are also cases where multiline strings are better: short strings in code which is meant to be read. If a single feature can't handle them both well, there's no shame in supporting the two features separately.

It makes sense to support multiline strings first because:

They extend existing syntax instead of introducing new syntax.

They are much easier to parse; heredocs require some kind of mode in the parser which kicks in at the start of the next line, whereas multiline string literals can be handled in the lexer.

As discussed in "Rationale", they offer better diagnostics, code formatting, and visual scannability.

Use a different delimiter for multiline strings

The initial suggestion was that multiline strings should use a different delimiter, """, at the beginning and end of the string, with no continuation characters between. Like heredocs, this might be a good alternative for certain use cases, but it has the same basic flaws as the "no continuation character" solution.

That might be a useful document to have, but I worry that we'll end up seeing the string feature proposals signed in triplicate, sent in, sent back, queried, lost, found, subjected to public inquiry, lost again, and finally buried in soft peat for three months and recycled as firelighters, all to end up with basically the same proposals but with slightly different keywords. Not every decision needs that level of explicit, deep documentation. Some things you can think about, experiment with, discuss, and do.

Yeah, I think you are probably right here. I actually think with the additions to your proposal it covers almost all of the other suggestions regarding string literals or at least mentions them as alternatives. Thanks so much for spending the time putting together the proposal! I have no idea how you find the time to follow and participate in what seems like every Swift evolution thread, but it’s awesome!

Tyler


On Apr 28, 2016, at 2:56 PM, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org> wrote:

--
Brent Royal-Gordon
Architechies


cloutiertyler · April 30, 2016, 6:57pm

Awesome. Some specific suggestions below, but feel free to iterate in a pull request if you prefer that.

I've adopted these suggestions in some form, though I also ended up rewriting the explanation of why the feature was designed as it is and fusing it with material from "Alternatives considered".

(Still not sure who I should list as a co-author. I'm currently thinking John, Tyler, and maybe Chris? Who's supposed to go there?)

I haven’t contributed much beyond the initial suggestions. That being said, I have never been an author on Swift evolution, and it would really make my day (if not my year, given that Swift is at least in my top 5 favorite things).

:)


Continuation quotes solve these problems. To find the end of the literal, just scan down the column of continuation characters until they end. To figure out if a given line of source is part of a literal, just see if it starts with a quote mark. The meaning of the source becomes obvious at a glance.

Nevertheless, the traditional design does have a few advantages:

It is simpler. Although continuation quotes are more complex, we believe that the advantages listed above pay for that complexity.

There is no need to edit the intervening lines to add continuation quotes. While the additional effort required to insert continuation quotes is an important downside, we believe that tool support, including both compiler fix-its and perhaps editor support for commands like "Paste as String Literal", can address this issue. In some editors, new features aren't even necessary; TextMate, for instance, lets you insert a character on several lines simultaneously. And new tool features could also address other issues like escaping embedded quotes.

Naïve syntax highlighters may have trouble understanding this syntax. This is true, but naïve syntax highlighters generally have terrible trouble with advanced string literal constructs; some struggle with even basic ones. While there are some designs (like Python's """ strings) which trick some syntax highlighters into working some of the time with some contents, we don't think this occasional, accidental compatibility is a big enough gain to justify changing the design.

It looks funny: quotes should always be in matched pairs. We aren't aware of another programming language which uses unbalanced quotes in string literals, but there is one very important precedent for this kind of formatting: natural languages. English, for instance, uses a very similar format for quoting multiple lines of dialog by the same speaker. As an English Stack Exchange answer illustrates (“Why does the multi-paragraph quotation rule exist?”, English Language & Usage Stack Exchange):

“That seems like an odd way to use punctuation,” Tom said. “What harm would there be in using quotation marks at the end of every paragraph?”

“Oh, that’s not all that complicated,” J.R. answered. “If you closed quotes at the end of every paragraph, then you would need to reidentify the speaker with every subsequent paragraph.

“Say a narrative was describing two or three people engaged in a lengthy conversation. If you closed the quotation marks in the previous paragraph, then a reader wouldn’t be able to easily tell if the previous speaker was extending his point, or if someone else in the room had picked up the conversation. By leaving the previous paragraph’s quote unclosed, the reader knows that the previous speaker is still the one talking.”

“Oh, that makes sense. Thanks!”
In English, omitting the ending quotation mark tells the text's reader that the quote continues on the next line, while including a quotation mark at the beginning of the next line reminds the reader that they're in the middle of a quote.

Similarly, in this proposal, omitting the ending quotation mark tells the code's reader (and compiler) that the string literal continues on the next line, while including a quotation mark at the beginning of the next line reminds the reader (and compiler) that they're in the middle of a string literal.

On balance, we think continuation quotes are the best design for this problem.

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#detailed-design>Detailed design

When Swift is parsing a string literal and reaches the end of a line without finding a closing quote, it examines the next line, applying the following rules:

If the next line begins with whitespace followed by a continuation quote, then the string literal contains a newline followed by the contents of the string literal starting on that line. (This line may itself have no closing quote, in which case the same rules apply to the line which follows.)

If the next line contains anything else, Swift raises a syntax error for an unterminated string literal.

The exact error messages and diagnostics provided are left to the implementers to determine, but we believe it should be possible to provide two fix-its which will help users learn the syntax and correct string literal mistakes:

Insert " at the end of the current line to terminate the quote.

Insert " at the beginning of the next line (with some indentation heuristics) to continue the quote on the next line.
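The two rules above can be modelled in a few lines of ordinary Swift. This is only a rough sketch under stated assumptions, not the prototype's lexer: `joinContinuedLiteral` and `ContinuationError` are hypothetical names, the input is the list of physical source lines starting just after the opening quote, and escapes and interpolation are ignored entirely.

```swift
/// Hypothetical error type: the literal was still open when `line` (1-based)
/// turned out not to begin with a continuation quote, or the input ran out.
enum ContinuationError: Error, Equatable {
    case unterminated(line: Int)
}

/// Hypothetical helper. `lines[0]` starts right after the opening quote;
/// later entries are the physical source lines that follow it.
func joinContinuedLiteral(_ lines: [String]) -> Result<String, ContinuationError> {
    var pieces: [String] = []
    for (index, line) in lines.enumerated() {
        var body = Substring(line)
        if index > 0 {
            // Whitespace before the continuation quote is indentation, not content.
            body = body.drop(while: { $0 == " " || $0 == "\t" })
            // Anything other than a continuation quote is a syntax error.
            guard body.first == "\"" else {
                return .failure(.unterminated(line: index + 1))
            }
            body = body.dropFirst()
        }
        if let close = body.firstIndex(of: "\"") {
            // A closing quote ends the literal on this line.
            pieces.append(String(body[..<close]))
            return .success(pieces.joined(separator: "\n"))
        }
        // No closing quote: the literal continues on the next line.
        pieces.append(String(body))
    }
    // Ran out of lines without ever seeing a closing quote.
    return .failure(.unterminated(line: lines.count + 1))
}

let joined = joinContinuedLiteral([
    "<catalog>",
    "    \" <book/>",
    "    \"</catalog>\"",
])
```

In this model the failure carries the line where the continuation quote was expected, which is roughly the position at which the two fix-its would be offered.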

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#impact-on-existing-code>Impact on existing code

Failing to close a string literal before the end of the line is currently a syntax error, so no valid Swift code should be affected by this change.

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#future-directions-for-multiline-string-literals>Future directions for multiline string literals

We could permit comments before encountering a continuation quote to be counted as whitespace, and permit empty lines in the middle of string literals. This would allow you to comment out whole lines in the literal.

We could allow you to put a trailing backslash on a line to indicate that the newline isn't "real" and should be omitted from the literal's contents.
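As a hedged illustration of the trailing-backslash idea, the sketch below joins the per-line contents of a literal and drops the newline after any line that ends in an unpaired backslash. The name `joinSegments` is hypothetical, and the sketch assumes indentation and continuation quotes have already been stripped; a doubled backslash is treated as an escaped backslash and so does not suppress the newline.

```swift
/// Hypothetical helper: `segments` are the contents of each line of a literal,
/// already stripped of indentation and continuation quotes.
func joinSegments(_ segments: [String]) -> String {
    var result = ""
    for (index, segment) in segments.enumerated() {
        let isLast = index == segments.count - 1
        // Count the backslashes at the end of the line. An odd count means the
        // final backslash is unpaired and therefore applies to the newline.
        let trailing = segment.reversed().prefix(while: { $0 == "\\" }).count
        if trailing % 2 == 1 && !isLast {
            // Drop the backslash and do not emit a newline.
            result.append(contentsOf: segment.dropLast())
        } else {
            result.append(segment)
            if !isLast { result.append("\n") }
        }
    }
    return result
}

let wrapped = joinSegments(["one \\", "two", "three"])
```

The last segment is copied as-is here; in a real lexer a backslash in that position would interact with the closing quote, which this sketch does not model.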

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#future-directions-for-string-literals-in-general>Future directions for string literals in general

There are other issues with Swift's string handling which this proposal intentionally does not address:

Reducing the amount of double-backslashing needed when working with regular expression libraries, Windows paths, source code generation, and other tasks where backslashes are part of the data.

Alternate delimiters or other strategies for writing strings with " characters in them.

Accommodating code formatting concerns like hard wrapping and commenting.

String literals consisting of very long pieces of text which are best represented completely verbatim, with minimal alteration.

This section briefly outlines some future proposals which might address these issues. Combined, we believe they would address most of the string literal use cases which Swift is currently not very good at.

Please note that these are simply sketches of hypothetical future designs; they may radically change before proposal, and some may never be proposed at all. Many, perhaps most, will not be proposed for Swift 3. We are sketching these designs not to propose and refine these features immediately, but merely to show how we think they might be solved in ways which complement this proposal.

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#string-literal-modifiers>String literal modifiers

A string literal modifier is a cluster of identifier characters which goes before a string literal and adjusts the way it is parsed. Modifiers only alter the interpretation of the text in the literal, not the type of data it produces; for instance, there will never be something like the UTF-8/UTF-16/UTF-32 literal modifiers in C++. Uppercase characters enable a feature; lowercase characters disable a feature.

Modifiers can be attached to both single-line and multiline literals, and could also be attached to other literal syntaxes which might be introduced in the future. When used with multiline strings, only the starting quote needs to carry the modifiers, not the continuation quotes.

Modifiers are an extremely flexible feature which can be used for many purposes. Of the ideas listed below, we believe the e modifier is an urgent addition which should be included in Swift 3 if at all possible; the others are less urgent and most of them could be deferred, or at least added later if time allows.

Escape disabling: e"\\\" (string with three backslash characters)

Fine-grained escape disabling: i"\(foo)\n" (the string \(foo) followed by a newline); eI"\(foo)\n" (the contents of foo followed by the string \n); b"\w+\n" (the string \w+ followed by a newline)

Alternate delimiters: _ has no lowercase form, so it could be used to allow strings with internal quotes: _"print("Hello, world!")"_, __"print("Hello, world!")"__, etc.

Whitespace normalization: changes all runs of whitespace in the literal to single space characters; this would allow you to use multiline strings purely to improve code formatting.

alert.informativeText =
    W"\(appName) could not typeset the element “\(title)” because
     "it includes a link to an element that has been removed from this
     "book."
Localization:

alert.informativeText =
    LW"\(appName) could not typeset the element “\(title)” because
      "it includes a link to an element that has been removed from this
      "book."
Comments: Embedding comments in string literals might be useful for literals containing regular expressions or other code.
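To make the whitespace normalization item concrete, here is a minimal sketch of the transformation a W modifier might apply to a literal's contents. The function name `normalizeWhitespace` is hypothetical, and the sketch assumes that "whitespace" means whatever `Character.isWhitespace` reports (which includes newlines); the proposal does not pin this down.

```swift
/// Hypothetical stand-in for the sketched `W` modifier: every run of
/// whitespace (spaces, tabs, newlines) becomes a single space character.
func normalizeWhitespace(_ text: String) -> String {
    var result = ""
    var inRun = false
    for character in text {
        if character.isWhitespace {
            // Emit one space for the first character of a run, nothing after.
            if !inRun { result.append(" ") }
            inRun = true
        } else {
            result.append(character)
            inRun = false
        }
    }
    return result
}

let message = normalizeWhitespace("because\nit includes  a link")
```

With this behavior, the newline introduced by each continuation line collapses into an ordinary space, so the literal can be wrapped purely for source formatting.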

Eventually, user-specified string modifiers could be added to Swift, perhaps as part of a hygienic macro system. It might also become possible to change the default modifiers applied to literals in a particular file or scope.

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#heredocs-or-other-verbatim-string-literal-features>Heredocs or other "verbatim string literal" features

Sometimes it really is best to just splat something else down in the middle of a file full of Swift source code. Maybe the file is essentially a template and the literals are a majority of the code's contents, or maybe you're writing a code generator and just want to get string data into it with minimal fuss, or maybe people unfamiliar with Swift need to be able to edit the literals. Whatever the reason, the normal string literal syntax is just too burdensome.

One approach to this problem is heredocs. A heredoc allows you to put a placeholder for a literal on one line; the contents of the literal begin on the next line, running up to some delimiter. It would be possible to put multiple placeholders in a single line, and to apply string modifiers to them.

In Swift, this might look like:

print(#to("---") + e#to("END"))
It was a dark and stormy \(timeOfDay) when
---
the Swift core team invented the \(interpolation) syntax.
END
Another possible approach would be to support traditional multiline string literals bounded by a different delimiter, like """. This might look like:

print("""
It was a dark and stormy \(timeOfDay) when
""" + e"""
the Swift core team invented the \(interpolation) syntax.
""")
Although heredocs could make a good addition to Swift eventually, there are good reasons to defer them for now. Please see the "Alternatives considered" section for details.

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#first-class-regular-expressions>First-class regular expressions

Members of the core team are interested in regular expressions, but they don't want to just build a literal that wraps PCRE or libicu; rather, they aim to integrate regexes into the pattern matching system and give them a deep, Perl 6-style rethink. This would be a major effort, far beyond the scope of Swift 3.

In the meantime, the e modifier and perhaps other string literal modifiers will make it easier to specify regular expressions in string literals for use with NSRegularExpression and other libraries accessible from Swift.

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#alternatives-considered>Alternatives considered

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#requiring-no-continuation-character>Requiring no continuation character

The main alternative is to not require a continuation quote, and simply extend the string literal from the starting quote to the ending quote, including all newlines between them. For example:

let xml = "<?xml version=\"1.0\"?>
<catalog>
    <book id=\"bk101\" empty=\"\">
        <author>\(author)</author>
    </book>
</catalog>"
This alternative is extensively discussed in the "Rationale" section above.

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#skip-multiline-strings-and-just-support-heredocs>Skip multiline strings and just support heredocs

There are definitely cases where a heredoc would be a better solution, such as generated code or code which is mostly literals with a little Swift sprinkled around. On the other hand, there are also cases where multiline strings are better: short strings in code which is meant to be read. If a single feature can't handle them both well, there's no shame in supporting the two features separately.

It makes sense to support multiline strings first because:

They extend existing syntax instead of introducing new syntax.

They are much easier to parse; heredocs require some kind of mode in the parser which kicks in at the start of the next line, whereas multiline string literals can be handled in the lexer.

As discussed in "Rationale", they offer better diagnostics, code formatting, and visual scannability.

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#use-a-different-delimiter-for-multiline-strings>Use a different delimiter for multiline strings

The initial suggestion was that multiline strings should use a different delimiter, """, at the beginning and end of the string, with no continuation characters between. Like heredocs, this might be a good alternative for certain use cases, but it has the same basic flaws as the "no continuation character" solution.

--
Brent Royal-Gordon
Architechies

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

dabrahams · May 1, 2016, 2:01am

Awesome. Some specific suggestions below, but feel free to iterate in a pull
request if you prefer that.

I've adopted these suggestions in some form, though I also ended up rewriting
the explanation of why the feature was designed as it is and fusing it with
material from "Alternatives considered".

(Still not sure who I should list as a co-author. I'm currently thinking John,
Tyler, and maybe Chris? Who's supposed to go there?)

A couple of remarks:

First, this passage is wrong about the reliability of the triple-quote syntax:

  (like Python's """ strings) which trick some syntax
  highlighters into working some of the time with some contents, we don't think
  this occasional, accidental compatibility is a big enough gain to justify
  changing the design.

I've never seen a syntax highlighter have problems with it, I don't see
how it *could* ever cause a problem, and lastly I think it's both naïve
and presumptuous to call these effects accidental.

Second, this proposal should explain why it's reinventing the wheel
instead of standardizing existing, very successful, prior art. Answer
the question: “what compelling advantages does this syntax have over
Python's?”

···

on Thu Apr 28 2016, Brent Royal-Gordon <swift-evolution@swift.org> wrote:

--
Dave

Brent_Royal-Gordon · May 1, 2016, 11:49pm

• using the default string literal recognition mechanism binds us to have to ‘wait’ a bit in order to figure which kind of string literal we are dealing with (single/multi line)

Part of the idea of the continuation quotes proposal is that there *is* no distinction between a single-line literal and a multi-line literal. There are just string literals, which may include one line or many.

And in general, part of the advantage of this proposal is precisely that it has such low impact on the parser and lexer, and the grammar more generally. It is currently illegal to open a string and fail to close it on the same line, so we are just giving that obviously illegal construct an interpretation. By extending the existing feature, we get a whole lot of stuff for free:

* We don't have to worry about whether `|>` or `@|` or whatever construct you end up using might, in some place where you could otherwise use a string literal, be confusable with some other construct.
* We don't have to worry about whether obscure corners of the language, like the string parameters to `@available`, support multiline string literals (they do, automatically).
* We don't have to do a huge amount of redesigning in the lexer, and I doubt we'll have to touch the parser at all (the prototype doesn't).

It seems a bit backwards to propose we introduce a separate path for lexing multiline strings, and then complain that one of the proposals on the table makes it difficult to tell which path you should take, when the proposal in question is specifically designed to make having a separate path unnecessary.

···

--
Brent Royal-Gordon
Architechies

L_Mihalkovic · May 2, 2016, 1:57am

• using the default string literal recognition mechanism binds us to have to ‘wait’ a bit in order to figure which kind of string literal we are dealing with (single/multi line)

Part of the idea of the continuation quotes proposal is that there *is* no distinction between a single-line literal and a multi-line literal. There are just string literals, which may include one line or many.

And in general, part of the advantage of this proposal is precisely that it has such low impact on the parser and lexer, and the grammar more generally. It is currently illegal to open a string and fail to close it on the same line, so we are just giving that obviously illegal construct an interpretation. By extending the existing feature, we get a whole lot of stuff for free:

I appreciate how easy it is to retrofit some multiline literal behavior without touching much as demonstrated by John's code (I also implemented a different patch that isolates all code changes inside a couple new methods and generates the existing string_literal token).

The problem I am facing is that I also want to support "ZERO massaging" schemes (direct paste without editing the lines), and so far I have not seen how to do it without opening a wider hole through the parser/lexer. I chose to make a parallel route simply to avoid risking making my code a merge nightmare as soon as the core team touches anything in the vicinity.

* We don't have to worry about whether `|>` or `@|` or whatever construct you end up using might, in some place where you could otherwise use a string literal, be confusable with some other construct.
* We don't have to worry about whether obscure corners of the language, like the string parameters to `@available`, support multiline string literals (they do, automatically).
* We don't have to do a huge amount of redesigning in the lexer, and I doubt we'll have to touch the parser at all (the prototype doesn't).

It seems a bit backwards to propose we introduce a separate path for lexing multiline strings, and then complain that one of the proposals on the table makes it difficult to tell which path you should take, when the proposal in question is specifically designed to make having a separate path unnecessary.

Just to be sure, I am not complaining about anything, but merely referencing an argument I read in one of John Holdsworth's recent contributions. Regarding the "unnecessary separate path", IMHO it depends on what the end game is supposed to look like (as previously stated, I want to try tagging the contents of multiline literals, which makes them different from single-line ones), and at what horizon.

No matter what, you are the experts... I just appreciate how easy the quality of the codebase makes prototyping some of these ideas.

Kind regards

···

On May 2, 2016, at 1:49 AM, Brent Royal-Gordon <brent@architechies.com> wrote:

johnno1962 · May 2, 2016, 12:23pm

I'm having trouble getting the `e` modifier to work as advertised, at least for the sequence `\\`. For example, `print(e"\\\\")` prints two backslashes, and `print(e"\\\")` seems to try to escape the string literal. I'm currently envisioning `e` as disabling *all* backslash escapes, so these behaviors wouldn't be appropriate. It also looks like interpolation is still enabled in `e` strings.

Since other things like `print(e"\w+")` work just fine, I'm guessing this is a bug in the proposal's sketches (not being clear enough about the expected behavior), not your code.

I've written a gist with some tests to show how I expect things to work:

https://gist.github.com/brentdax/be3c032bc7e0c101d7ba8b72cd1a692e

The problem here is that I’ve not implemented unescaped literals fully as it would require changes outside the lexer.
This is because the string is first lexed and tokenised by one piece of code Lexer::lexStringLiteral but later
on in the code generation phase it generates the actual literal in a function Lexer::getEncodedStringSegment.
This is passed the same string from the source file but does not know what modifiers should be applied. As a result
normal escapes are still processed. All the “e” flag does is silence the error for invalid escapes during tokenising.

assert( e"\w\d+\(author)\n" == "\\w\\d+\(author)\n" );
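A rough model of the behaviour described above, written as ordinary Swift rather than compiler code: recognised escapes are still decoded, while unrecognised ones either raise an error (normal literals) or are passed through with their backslash (what the prototype's “e” flag does). The function `decodeEscapes` and its `lenient` parameter are hypothetical, only a handful of escapes are modelled, and interpolation is left out.

```swift
/// Hypothetical model of escape processing. `raw` is the literal's text as it
/// appears in the source file. Returns nil where the lexer would report an error.
func decodeEscapes(_ raw: String, lenient: Bool) -> String? {
    // A small subset of the escapes Swift recognises; enough for the sketch.
    let known: [Character: Character] = ["n": "\n", "t": "\t", "\\": "\\", "\"": "\""]
    var result = ""
    var iterator = raw.makeIterator()
    while let character = iterator.next() {
        guard character == "\\" else {
            result.append(character)
            continue
        }
        // A backslash with nothing after it is not modelled here.
        guard let next = iterator.next() else { return nil }
        if let decoded = known[next] {
            // Known escapes are decoded in both modes.
            result.append(decoded)
        } else if lenient {
            // Lenient mode: keep the backslash instead of reporting an error.
            result.append("\\")
            result.append(next)
        } else {
            return nil
        }
    }
    return result
}

// Source text \w\d+\n : the regex escapes survive, the \n is still decoded.
let pattern = decodeEscapes("\\w\\d+\\n", lenient: true)
```

Under this model four backslashes in the source still decode to two, which matches the observation that `print(e"\\\\")` prints two backslashes.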

Having encountered this limitation I managed to persuade myself this is what you want anyway, but perhaps few would agree.
What has been implemented is more of an r”” than an e””, one that solves the “picket fence” problem where you can also interpolate
into convenient regex literals. This is all beyond the scope of this proposal anyway so I’ll leave that battle for another day.
The changes to the compiler for anything else would be a step up in terms of disruption.

and one new feature that \ before a newline ignores the newline.

This is in the "Future directions for multiline strings" section of the proposal. Having implemented this, how do you feel about it? Does it seem like such a no-brainer that we should just incorporate it into the proposal?

I agree, let's move it into scope.

Latest toolchain with the ability to have more than one modifier as you suggest is now:
http://johnholdsworth.com/swift-LOCAL-2016-05-02-a-osx.tar.gz

John

Michael_Peternell · April 27, 2016, 7:40pm

It really amazes me what some people think multiline strings are.

For me, the *definition* of a multiline string is this: """A multiline string allows you to copy&paste most text, without having to use any special string quoting rules: that's the primary use case. For example, you can embed something like <span class="stronger" id='highlighted_bit'>HTML Tags with different quotes in it</span>, without needing any special care. Some not-so-common things may still need quotation, like 3 Quotes in a row, but most XML-snippets, JSON-text or Email-Headers can be pasted as-is."""

If you guys have another definition, then please share with us and enlighten us: What is the purpose (use case) of having multiline string literals then? And what is the definition of a multiline string literal? It seems (to me, at least) like some people just introduce a new string literal syntax and call it "multiline".

I'm really not very demanding on this issue. I would be happy with """3 quotes""", '''3 single quotes''', <<HERE_DOCS, <<'HERE_DOCS', q{perl style {balanced quotes} that also allow {{arbitrary} nesting}}; «Guillemets would also be nice», maybe combined with “English typographical quotes”, you usually don't use both in a string. My personal opinion would be to use <<HERE_DOCS for multiline literals with string interpolation (but without any escape sequences except \\ for \), and <<'HERE_DOCS' for multiline literals without any string interpolation or escape sequences at all (like 'bash string literals'). But the users who prefer a quoting style that requires each line to start with a specific token (whether it's " or \\) don't like any of the """quotation examples""" that I presented, right?

Regards,
Michael

VladimirS · April 28, 2016, 11:32am

Probably the subject of this proposal (thread) should be changed to something like "multi-line string literals with escaping of special chars and interpolation support", as I believe many of us think about multi-line strings in source as a feature that allows us to have text *as-is*, without escaped characters, without interpolation, in situations when the text contains a lot of \ \\ \" " \( \t etc.

Probably someone can start new thread "multi-line strings with text as-is, without escaping" - so it will be discussed separately from current suggestion.

···

On 28.04.2016 12:30, Brent Royal-Gordon via swift-evolution wrote:

Should we not have a master document that considers the pros and cons
of many different solutions? I started writing one such treatise a few
weeks ago, but I haven’t been able to finish it yet. Just being able
to see all of the potential implementations compared in one place would
provide some insight. I imagine that this is a feature that won’t change
once it’s implemented, so it’s important to get it right.

That might be a useful document to have, but I worry that we'll end up
seeing the string feature proposals signed in triplicate, sent in, sent
back, queried, lost, found, subjected to public inquiry, lost again, and
finally buried in soft peat for three months and recycled as
firelighters, all to end up with basically the same proposals but
with slightly different keywords. Not every decision needs that level of
explicit, deep documentation. Some things you can think about,
experiment with, discuss, and do.

Brent_Royal-Gordon · April 28, 2016, 10:52pm

Did you ever really use multiline string literals before?

Yes. I used Perl in the CGI script era. Believe me, I have used every quoting syntax it supports extensively, including `'` strings, `"` strings, `q` strings, `qq` strings, and heredocs. This proposal is educated by knowledge of their foibles.

As outlined in the "Future directions for string literals in general" section, I believe alternate delimiters (so you can embed quotes) are a separate feature and should be handled in a separate proposal. Once both features are available, they can be combined. For instance, using the `_"foo"_` syntax I sketch there for alternate delimiters, you could say:

  let xml = _"<?xml version="1.0"?>
              "<catalog>
              " <book id="bk101" empty="">
              " <author>\(author)</author>
              " </book>
              "</catalog>"_

Basically, I am trying very, *very* hard not to let this proposal turn into "here's a huge pile of random string literal features which will become a giant catfight if we debate them all at once". Clearly this message is not getting through, but I'm not sure how I should edit the proposal to make it clear enough.

···

--
Brent Royal-Gordon
Architechies

Cole_Campbell · April 29, 2016, 6:02am

I think the proposal is very clear on its intended scope.

I really like this idea. It's readable, and it's as simple and uncluttered as I think it could be while still allowing the tabbing of new lines. The one thing I dislike about multi line strings in Ruby is how you have to left align each new line, which can really disrupt the visual flow of your code, especially if you are several tabs of indentation deep.

I would love to see the addition (at some point) of the ability to escape all whitespaces preceding the new line, if that line does not begin with a quote and thus is meant to be a direct continuation of the preceding line. That way you could maintain your indentation without adding whitespaces between what are meant to be subsequent characters in the string literal.


Erica_Sadun · April 29, 2016, 2:20pm

Other than the underscores (I'm not sold on them but I could live with them), this is my favorite approach:

* It supports indented left-hand alignment, which is important to me for readability
* It avoids painful `\n"+` RHS constructions
* It's easy to scan and understand
* It's simple and harmonious

-- E


VladimirS · April 29, 2016, 5:48am

is it just me who would prefer a multiline string literal to not require
a \backslash before each "double quote"?

You are not alone ;-)
But, as I understand it, the proposal does not even try to solve the problem of *as-is* text in sources; it only fights the \n"+ at the end of each line. That is what is proposed. I don't feel it is a valuable improvement, but it is OK with me to have such a feature in the language.

IMO we need just 2 variants: the current method, where we can use all the escaped chars, interpolation, \n and closing quotes, and additionally a feature to paste text *as-is*, without escapes and interpolation. For example:

let xml = "\

<?xml version="1.0"?>
<catalog>
   <book id="myid" empty="">
       <author>myAuthor</author>
       <title>myTitle \tutorial 1\(edition 2)</title>
   </book>
</catalog>

"

or

let xml = _"
"<?xml version="1.0"?>
"<catalog>
" <book id="myid" empty="">
" <author>myAuthor</author>
" <title>myTitle \tutorial 1\(edition 2)</title>
" </book>
"</catalog>

On 29.04.2016 1:31, Michael Peternell via swift-evolution wrote:

Did you ever really use multiline string literals before? I did, and
it's mostly for quick hacks where I wrote a script or tried something
out quickly. And maybe I needed to put an XML snippet into a unit test
case to see if my parser correctly parses or correctly rejects the
snippet. The current proposal doesn't help this use case in any way. I
cannot see which use case inspires multiline string literals which
require double quotes to be escaped... I wouldn't use them if they were
available. I'd become an Android developer instead ;)

-Michael

On 28.04.2016 at 23:56, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org> wrote:

Awesome. Some specific suggestions below, but feel free to iterate
in a pull request if you prefer that.

I've adopted these suggestions in some form, though I also ended up
rewriting the explanation of why the feature was designed as it is and
fusing it with material from "Alternatives considered".

(Still not sure who I should list as a co-author. I'm currently
thinking John, Tyler, and maybe Chris? Who's supposed to go there?)

Multiline string literals

• Proposal: SE-NNNN
• Author(s): Brent Royal-Gordon
• Status: Second Draft
• Review manager: TBD

Introduction

In Swift 2.2, the only means to insert a newline into a string literal
is the \n escape. String literals specified in this way are generally
ugly and unreadable. We propose a multiline string feature inspired by
English punctuation which is a straightforward extension of our
existing string literals.

This proposal is one step in a larger plan to improve how string
literals address various challenging use cases. It is not meant to
solve all problems with escaping, nor to serve all use cases involving
very long string literals. See the "Future directions for string
literals in general" section for a sketch of the problems we
ultimately want to address and some ideas of how we might do so.

Swift-evolution threads: multi-line string literals. (April),
multi-line string literals (December)

Draft Notes

• Removes the comment feature, which was felt to be an unnecessary
complication. This and the backslash feature have been listed as
future directions.

• Loosens the specification of diagnostics, suggesting instead of
requiring fix-its.

• Splits a "Rationale" section out of the "Proposed solution"
section.

• Adds extensive discussion of other features which would combine with
this one.

• I've listed only myself as an author because I don't want to put
anyone else's name to a document they haven't seen, but there are
others who deserve to be listed (John Holdsworth at least). Let me
know if you think you should be included.

Motivation

As Swift begins to move into roles beyond app development, code which
needs to generate text becomes a more important use case. Consider,
for instance, generating even a small XML string:

let xml = "<?xml version=\"1.0\"?>\n<catalog>\n\t<book id=\"bk101\" empty=\"\">\n\t\t<author>\(author)</author>\n\t</book>\n</catalog>"

The string is practically unreadable, its structure drowned in escapes
and run-together lines; it looks like little more than line noise. We
can improve its readability somewhat by concatenating separate strings
for each line and using real tabs instead of \t escapes:

let xml = "<?xml version=\"1.0\"?>\n" +
          "<catalog>\n" +
          " <book id=\"bk101\" empty=\"\">\n" +
          " <author>\(author)</author>\n" +
          " </book>\n" +
          "</catalog>"

However, this creates a more complex expression for the
type checker, and there's still far more punctuation than ought to be
necessary. If the most important goal of Swift is making code
readable, this kind of code falls far short of that goal.

Proposed solution

We propose that, when Swift is parsing a string literal, if it reaches
the end of the line without encountering an end quote, it should look
at the next line. If it sees a quote at the beginning (a "continuation
quote"), the string literal contains a newline and then continues on
that line. Otherwise, the string literal is unterminated and
syntactically invalid.

Our sample above could thus be written as:

let xml = "<?xml version=\"1.0\"?>
          "<catalog>
          " <book id=\"bk101\" empty=\"\">
          " <author>\(author)</author>
          " </book>
          "</catalog>"

If the second or subsequent lines had not begun with a quotation mark,
or the trailing quotation mark after the </catalog> tag had not been
included, Swift would have emitted an error.

Rationale

This design is rather unusual, and it's worth pausing a moment to
explain why it has been chosen.

The traditional design for this feature, seen in languages like Perl
and Python, simply places one delimiter at the beginning of the
literal and another at the end. Individual lines in the literal are
not marked in any way.

We think continuation quotes offer several important advantages over
the traditional design:

• They help the compiler pinpoint errors in string literal delimiting.
Traditional multiline strings have a serious weakness: if you forget
the closing quote, the compiler has no idea where you wanted the
literal to end. It simply continues on until the compiler encounters
another quote (or the end of the file). If you're lucky, the text
after that quote is not valid code, and the resulting error will at
least point you to the next string literal in the file. If you're
unlucky, you'll get a seemingly unrelated error several literals
later, an unbalanced brace error at the end of the file, or perhaps
even code that compiles but does something totally wrong.

(This is not a minor concern. Many popular languages, including C and
Swift 2, specifically reject newlines in string literals to prevent
this from happening.)

Continuation quotes provide the compiler with redundant information
about your intent. If you forget a closing quote, the continuation
quotes give the compiler a very good idea of where you meant to put
it. The compiler can point you to (or at least very near) the end of
the literal, where you want to insert the quote, rather than showing
you the beginning of the literal or even some unrelated error later in
the file that was caused by the missing quote.

• Temporarily unclosed literals don't make editors go haywire. The
syntax highlighter has the same trouble parsing half-written, unclosed
traditional quotes that the compiler does: It can't tell where the
literal is supposed to end and the code should begin. It must either
apply heuristics to try to guess where the literal ends, or
incorrectly color everything between the opening quote and the next
closing quote as a string literal. This can cause the file's coloring
to alternate distractingly between "string literal" and "running
code".

Continuation quotes give the syntax highlighter enough context to
guess at the correct coloration, even when the string isn't complete
yet. Lines with a continuation quote are literals; lines without are
code. At worst, the syntax highlighter might incorrectly color a few
characters at the end of a line, rather than the remainder of the
file.

• They separate indentation from the string's contents. Traditional
multiline strings usually include all of the content between the start
and end delimiters, including leading whitespace. This means that it's
usually impossible to indent a multiline string, so including one
breaks up the flow of the surrounding code, making it less readable.
Some languages apply heuristics or mode switches to try to remove
indentation, but like all heuristics, these are mistake-prone and
murky.

Continuation quotes neatly avoid this problem. Whitespace before the
continuation quote is indentation used to format the source code;
whitespace after the continuation quote is part of the string literal.
The interpretation of the code is perfectly clear to both compiler and
programmer.

• They improve the ability to quickly recognize the literal.
Traditional multiline strings don't provide much visual help. To find
the end, you must visually scan until you find the matching delimiter,
which may be only one or a few characters long. When looking at a
random line of source, it can be hard to tell at a glance whether it's
code or literal. Syntax highlighting can help with these issues, but
it's often unreliable, especially with advanced, idiosyncratic string
literal features like multiline strings.

Continuation quotes solve these problems. To find the end of the
literal, just scan down the column of continuation characters until
they end. To figure out if a given line of source is part of a
literal, just see if it starts with a quote mark. The meaning of the
source becomes obvious at a glance.

Nevertheless, the traditional design does have a few advantages:

• It is simpler. Although continuation quotes are more complex, we
believe that the advantages listed above pay for that complexity.

• There is no need to edit the intervening lines to add continuation
quotes. While the additional effort required to insert continuation
quotes is an important downside, we believe that tool support,
including both compiler fix-its and perhaps editor support for
commands like "Paste as String Literal", can address this issue. In
some editors, new features aren't even necessary; TextMate, for
instance, lets you insert a character on several lines simultaneously.
And new tool features could also address other issues like escaping
embedded quotes.

• Naïve syntax highlighters may have trouble understanding this
syntax. This is true, but naïve syntax highlighters generally have
terrible trouble with advanced string literal constructs; some
struggle with even basic ones. While there are some designs (like
Python's """ strings) which trick some syntax highlighters into
working some of the time with some contents, we don't think this
occasional, accidental compatibility is a big enough gain to justify
changing the design.

• It looks funny—quotes should always be in matched pairs. We aren't
aware of another programming language which uses unbalanced quotes in
string literals, but there is one very important precedent for this
kind of formatting: natural languages. English, for instance, uses a
very similar format for quoting multiple lines of dialog by the same
speaker. As an English Stack Exchange answer illustrates:

“That seems like an odd way to use punctuation,” Tom said. “What harm
would there be in using quotation marks at the end of every
paragraph?”

“Oh, that’s not all that complicated,” J.R. answered. “If you closed
quotes at the end of every paragraph, then you would need to
reidentify the speaker with every subsequent paragraph.

“Say a narrative was describing two or three people engaged in a
lengthy conversation. If you closed the quotation marks in the
previous paragraph, then a reader wouldn’t be able to easily tell if
the previous speaker was extending his point, or if someone else in
the room had picked up the conversation. By leaving the previous
paragraph’s quote unclosed, the reader knows that the previous speaker
is still the one talking.”

“Oh, that makes sense. Thanks!”

In English, omitting the ending quotation mark tells the text's reader
that the quote continues on the next line, while including a quotation
mark at the beginning of the next line reminds the reader that they're
in the middle of a quote.

Similarly, in this proposal, omitting the ending quotation mark tells
the code's reader (and compiler) that the string literal continues on
the next line, while including a quotation mark at the beginning of
the next line reminds the reader (and compiler) that they're in the
middle of a string literal.

On balance, we think continuation quotes are the best design for this
problem.

Detailed design

When Swift is parsing a string literal and reaches the end of a line
without finding a closing quote, it examines the next line, applying
the following rules:

• If the next line begins with whitespace followed by a continuation
quote, then the string literal contains a newline followed by the
contents of the string literal starting on that line. (This line may
itself have no closing quote, in which case the same rules apply to
the line which follows.)

• If the next line contains anything else, Swift raises a syntax error
for an unterminated string literal.

The exact error messages and diagnostics provided are left to the
implementers to determine, but we believe it should be possible to
provide two fix-its which will help users learn the syntax and correct
string literal mistakes:

• Insert " at the end of the current line to terminate the quote.

• Insert " at the beginning of the next line (with some indentation
heuristics) to continue the quote on the next line.
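
The rules above can be sketched as a small function. This is only an illustration in present-day Swift, not the proposal's implementation or the real Swift lexer: the names `parseContinuationLiteral` and `LiteralError` are hypothetical, the input is assumed to be pre-split into lines with the opening quote at the start of the first line, and escapes and interpolation are deliberately ignored.

```swift
// Hypothetical sketch of the continuation-quote rules; not the real Swift lexer.
// Assumption: escapes such as \" and \(...) interpolation are ignored here.
enum LiteralError: Error, Equatable {
    case noOpeningQuote
    case unterminated(line: Int)   // 1-based number of the line missing its quote
}

/// `lines[0]` must start with the opening quote; every later line must start,
/// after optional indentation, with a continuation quote.
func parseContinuationLiteral(_ lines: [String]) throws -> String {
    var pieces: [String] = []
    for (index, line) in lines.enumerated() {
        // Whitespace before the quote is source indentation, not content.
        let trimmed = line.drop(while: { $0 == " " || $0 == "\t" })
        guard trimmed.first == "\"" else {
            if index == 0 { throw LiteralError.noOpeningQuote }
            // The previous line ended without a closing quote.
            throw LiteralError.unterminated(line: index)
        }
        let body = trimmed.dropFirst()
        if let close = body.firstIndex(of: "\"") {
            // Closing quote found: the literal ends here (the rest is ignored).
            pieces.append(String(body[..<close]))
            return pieces.joined(separator: "\n")
        }
        // No closing quote: the literal contains a newline and continues.
        pieces.append(String(body))
    }
    // Ran out of lines (or got none) before seeing a closing quote.
    if lines.isEmpty { throw LiteralError.noOpeningQuote }
    throw LiteralError.unterminated(line: lines.count)
}

let xml = try? parseContinuationLiteral([
    "\"<catalog>",
    "    \" <book/>",
    "    \"</catalog>\"",
])
// xml == "<catalog>\n <book/>\n</catalog>"
```

Because the error carries the number of the line that lacks its closing quote, a diagnostic built on it could plausibly offer either fix-it described above: close the quote on that line, or add a continuation quote to the next one.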

Impact on existing code

Failing to close a string literal before the end of the line is
currently a syntax error, so no valid Swift code should be affected by
this change.

Future directions for multiline string literals

• We could permit comments before encountering a continuation quote to
be counted as whitespace, and permit empty lines in the middle of
string literals. This would allow you to comment out whole lines in
the literal.

• We could allow you to put a trailing backslash on a line to indicate
that the newline isn't "real" and should be omitted from the literal's
contents.

Future directions for string literals in general

There are other issues with Swift's string handling which this
proposal intentionally does not address:

• Reducing the amount of double-backslashing needed when working with
regular expression libraries, Windows paths, source code generation,
and other tasks where backslashes are part of the data.

• Alternate delimiters or other strategies for writing strings with "
characters in them.

• Accommodating code formatting concerns like hard wrapping and
commenting.

• String literals consisting of very long pieces of text which are
best represented completely verbatim, with minimal alteration.

This section briefly outlines some future proposals which might
address these issues. Combined, we believe they would address most of
the string literal use cases which Swift is currently not very good
at.

Please note that these are simply sketches of hypothetical future
designs; they may radically change before proposal, and some may never
be proposed at all. Many, perhaps most, will not be proposed for Swift
3. We are sketching these designs not to propose and refine these
features immediately, but merely to show how we think they might be
solved in ways which complement this proposal.

String literal modifiers

A string literal modifier is a cluster of identifier characters which
goes before a string literal and adjusts the way it is parsed.
Modifiers only alter the interpretation of the text in the literal, not
the type of data it produces; for instance, there will never be
something like the UTF-8/UTF-16/UTF-32 literal modifiers in C++.
Uppercase characters enable a feature; lowercase characters disable a
feature.

Modifiers can be attached to both single-line and multiline literals,
and could also be attached to other literal syntaxes which might be
introduced in the future. When used with multiline strings, only the
starting quote needs to carry the modifiers, not the continuation
quotes.

Modifiers are an extremely flexible feature which can be used for many
purposes. Of the ideas listed below, we believe the e modifier is an
urgent addition which should be included in Swift 3 if at all
possible; the others are less urgent and most of them could be
deferred, or at least added later if time allows.

• Escape disabling: e"\\\" (string with three backslash characters)

• Fine-grained escape disabling: i"\(foo)\n" (the string \(foo)
followed by a newline); eI"\(foo)\n" (the contents of foo followed by
the string \n), b"\w+\n" (the string \w+ followed by a newline)

• Alternate delimiters: _ has no lowercase form, so it could be used
to allow strings with internal quotes: _"print("Hello, world!")"_,
__"print("Hello, world!")"__, etc.

• Whitespace normalization: changes all runs of whitespace in the
literal to single space characters; this would allow you to use
multiline strings purely to improve code formatting.

alert.informativeText = W"\(appName) could not typeset the element “\(title)” because
                         "it includes a link to an element that has been removed from this
                         "book."

• Localization:

alert.informativeText = LW"\(appName) could not typeset the element “\(title)” because
                          "it includes a link to an element that has been removed from this
                          "book."

• Comments: Embedding comments in string literals might be useful for
literals containing regular expressions or other code.
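
For a rough feel of what the whitespace normalization modifier in the list above would produce, the same transformation can be approximated at run time. This is a sketch under assumptions: `collapsingWhitespace` is a hypothetical helper, not part of the proposal, and the sketched W modifier would presumably do this work when the literal is parsed rather than when the program runs.

```swift
// Hypothetical run-time approximation of the sketched W modifier.
// Every run of whitespace (spaces, tabs, newlines) becomes one space.
func collapsingWhitespace(_ text: String) -> String {
    var result = ""
    var inRun = false
    for character in text {
        if character.isWhitespace {
            // Emit a single space for the first character of a run only.
            if !inRun { result.append(" ") }
            inRun = true
        } else {
            result.append(character)
            inRun = false
        }
    }
    return result
}

let message = collapsingWhitespace("could not typeset the element\n    because it includes a link")
// message == "could not typeset the element because it includes a link"
```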

Eventually, user-specified string modifiers could be added to Swift,
perhaps as part of a hygienic macro system. It might also become
possible to change the default modifiers applied to literals in a
particular file or scope.

Heredocs or other "verbatim string literal" features

Sometimes it really is best to just splat something else down in the
middle of a file full of Swift source code. Maybe the file is
essentially a template and the literals are a majority of the code's
contents, or maybe you're writing a code generator and just want to
get string data into it with minimal fuss, or maybe people unfamiliar
with Swift need to be able to edit the literals. Whatever the reason,
the normal string literal syntax is just too burdensome.

One approach to this problem is heredocs. A heredoc allows you to put
a placeholder for a literal on one line; the contents of the literal
begin on the next line, running up to some delimiter. It would be
possible to put multiple placeholders in a single line, and to apply
string modifiers to them.

In Swift, this might look like:

print(#to("---") + e#to("END"))
It was a dark and stormy \(timeOfDay) when
---
the Swift core team invented the \(interpolation) syntax.
END

Another possible approach would be to support traditional multiline
string literals bounded by a different delimiter, like """. This might
look like:

print("""
It was a dark and stormy \(timeOfDay) when
""" + e"""
the Swift core team invented the \(interpolation) syntax.
""")

Although heredocs could make a good addition to Swift eventually, there are
good reasons to defer them for now. Please see the "Alternatives
considered" section for details.

First-class regular expressions

Members of the core team are interested in regular expressions, but
they don't want to just build a literal that wraps PCRE or libicu;
rather, they aim to integrate regexes into the pattern matching system
and give them a deep, Perl 6-style rethink. This would be a major
effort, far beyond the scope of Swift 3.

In the meantime, the e modifier and perhaps other string literal
modifiers will make it easier to specify regular expressions in string
literals for use with NSRegularExpression and other libraries
accessible from Swift.

Alternatives considered

Requiring no continuation character

The main alternative is to not require a continuation quote, and
simply extend the string literal from the starting quote to the ending
quote, including all newlines between them. For example:

let xml = "<?xml version=\"1.0\"?>
<catalog>
 <book id=\"bk101\" empty=\"\">
 <author>\(author)</author>
 </book>
</catalog>"

This alternative is extensively discussed in the "Rationale" section
above.

Skip multiline strings and just support heredocs

There are definitely cases where a heredoc would be a better solution,
such as generated code or code which is mostly literals with a little
Swift sprinkled around. On the other hand, there are also cases where
multiline strings are better: short strings in code which is meant to
be read. If a single feature can't handle them both well, there's no
shame in supporting the two features separately.

It makes sense to support multiline strings first because:

• They extend existing syntax instead of introducing new syntax.

• They are much easier to parse; heredocs require some kind of mode in
the parser which kicks in at the start of the next line, whereas
multiline string literals can be handled in the lexer.

• As discussed in "Rationale", they offer better diagnostics, code
formatting, and visual scannability.

Use a different delimiter for multiline strings

The initial suggestion was that multiline strings should use a
different delimiter, """, at the beginning and end of the string, with
no continuation characters between. Like heredocs, this might be a
good alternative for certain use cases, but it has the same basic
flaws as the "no continuation character" solution.

--
Brent Royal-Gordon
Architechies

cloutiertyler · April 30, 2016, 7:11pm

Awesome. Some specific suggestions below, but feel free to iterate in a pull request if you prefer that.

I've adopted these suggestions in some form, though I also ended up rewriting the explanation of why the feature was designed as it is and fusing it with material from "Alternatives considered".

(Still not sure who I should list as a co-author. I'm currently thinking John, Tyler, and maybe Chris? Who's supposed to go there?)

Multiline string literals

Proposal: SE-NNNN <https://github.com/apple/swift-evolution/blob/master/proposals/NNNN-name.md>
Author(s): Brent Royal-Gordon <https://github.com/brentdax>
Status: Second Draft
Review manager: TBD
Introduction

In Swift 2.2, the only means to insert a newline into a string literal is the \n escape. String literals specified in this way are generally ugly and unreadable. We propose a multiline string feature inspired by English punctuation which is a straightforward extension of our existing string literals.

This proposal is one step in a larger plan to improve how string literals address various challenging use cases. It is not meant to solve all problems with escaping, nor to serve all use cases involving very long string literals. See the "Future directions for string literals in general" section for a sketch of the problems we ultimately want to address and some ideas of how we might do so.

Swift-evolution threads: multi-line string literals. (April) <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160418/015500.html>, multi-line string literals (December) <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/002349.html>

Draft Notes

Removes the comment feature, which was felt to be an unnecessary complication. This and the backslash feature have been listed as future directions.

Loosens the specification of diagnostics, suggesting instead of requiring fix-its.

Splits a "Rationale" section out of the "Proposed solution" section.

Adds extensive discussion of other features which would combine with this one.

I've listed only myself as an author because I don't want to put anyone else's name to a document they haven't seen, but there are others who deserve to be listed (John Holdsworth at least). Let me know if you think you should be included.

Motivation

As Swift begins to move into roles beyond app development, code which needs to generate text becomes a more important use case. Consider, for instance, generating even a small XML string:

let xml = "<?xml version=\"1.0\"?>\n<catalog>\n\t<book id=\"bk101\" empty=\"\">\n\t\t<author>\(author)</author>\n\t</book>\n</catalog>"
The string is practically unreadable, its structure drowned in escapes and run-together lines; it looks like little more than line noise. We can improve its readability somewhat by concatenating separate strings for each line and using real tabs instead of \t escapes:

let xml = "<?xml version=\"1.0\"?>\n" +
          "<catalog>\n" +
          " <book id=\"bk101\" empty=\"\">\n" +
          " <author>\(author)</author>\n" +
          " </book>\n" +
          "</catalog>"
However, this creates a more complex expression for the type checker, and there's still far more punctuation than ought to be necessary. If the most important goal of Swift is making code readable, this kind of code falls far short of that goal.

Proposed solution

We propose that, when Swift is parsing a string literal, if it reaches the end of the line without encountering an end quote, it should look at the next line. If it sees a quote at the beginning (a "continuation quote"), the string literal contains a newline and then continues on that line. Otherwise, the string literal is unterminated and syntactically invalid.

One other way to implement the feature would be to allow quotes to be terminated by either a close quote or an end-of-line character. Multiline literals would then be constructed by concatenating adjacent (i.e., separated by only comments or whitespace) string literals.

One issue with this is that

let foo = "bar

would be a valid string whose value would be "bar\n", even though that might not be the intended result. There is also the issue of things like

let foo = [
  "string1",
  "string2"
  "string3"
]

becoming ["string1", "string2string3"]. This is something that can happen in Python, for example.

However, this has a few benefits. Namely, that it simplifies the model and that if I’m just pasting in a block of text I don’t have to add a trailing quote to the last line. If I was in Vim for example, I could just visual-block add a column of quotes at the beginning. So it would be:

let xml = "<?xml version=\"1.0\"?>
          "<catalog>
          " <book id=\"bk101\" empty=\"\">
          " <author>\(author)</author>
          " </book>
          "</catalog>

Just one amendment, in order to have the same string you would still have to write:

let xml = "<?xml version=\"1.0\"?>
          "<catalog>
          " <book id=\"bk101\" empty=\"\">
          " <author>\(author)</author>
          " </book>
          "</catalog>"
Or else an unwanted newline would be appended to the end. Additionally Brent made an excellent point about having ending delimiters for the following case:

let xml = "<?xml version=\"1.0\"?>
          "<catalog>
          " <book id=\"bk101\" empty=\"\">
          " <author>\(author)</author>
          " </book>
          "</catalog>".encoded(as: .UTF8)

On Apr 30, 2016, at 11:54 AM, Tyler Fleming Cloutier via swift-evolution <swift-evolution@swift.org> wrote:

On Apr 28, 2016, at 2:56 PM, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

Our sample above could thus be written as:

let xml = "<?xml version=\"1.0\"?>
          "<catalog>
          " <book id=\"bk101\" empty=\"\">
          " <author>\(author)</author>
          " </book>
          "</catalog>"
If the second or subsequent lines had not begun with a quotation mark, or the trailing quotation mark after the </catalog> tag had not been included, Swift would have emitted an error.

Rationale

This design is rather unusual, and it's worth pausing a moment to explain why it has been chosen.

The traditional design for this feature, seen in languages like Perl and Python, simply places one delimiter at the beginning of the literal and another at the end. Individual lines in the literal are not marked in any way.

We think continuation quotes offer several important advantages over the traditional design:

They help the compiler pinpoint errors in string literal delimiting. Traditional multiline strings have a serious weakness: if you forget the closing quote, the compiler has no idea where you wanted the literal to end. It simply continues on until the compiler encounters another quote (or the end of the file). If you're lucky, the text after that quote is not valid code, and the resulting error will at least point you to the next string literal in the file. If you're unlucky, you'll get a seemingly unrelated error several literals later, an unbalanced brace error at the end of the file, or perhaps even code that compiles but does something totally wrong.

(This is not a minor concern. Many popular languages, including C and Swift 2, specifically reject newlines in string literals to prevent this from happening.)

Continuation quotes provide the compiler with redundant information about your intent. If you forget a closing quote, the continuation quotes give the compiler a very good idea of where you meant to put it. The compiler can point you to (or at least very near) the end of the literal, where you want to insert the quote, rather than showing you the beginning of the literal or even some unrelated error later in the file that was caused by the missing quote.

Temporarily unclosed literals don't make editors go haywire. The syntax highlighter has the same trouble parsing half-written, unclosed traditional quotes that the compiler does: It can't tell where the literal is supposed to end and the code should begin. It must either apply heuristics to try to guess where the literal ends, or incorrectly color everything between the opening quote and the next closing quote as a string literal. This can cause the file's coloring to alternate distractingly between "string literal" and "running code".

Continuation quotes give the syntax highlighter enough context to guess at the correct coloration, even when the string isn't complete yet. Lines with a continuation quote are literals; lines without are code. At worst, the syntax highlighter might incorrectly color a few characters at the end of a line, rather than the remainder of the file.

They separate indentation from the string's contents. Traditional multiline strings usually include all of the content between the start and end delimiters, including leading whitespace. This means that it's usually impossible to indent a multiline string, so including one breaks up the flow of the surrounding code, making it less readable. Some languages apply heuristics or mode switches to try to remove indentation, but like all heuristics, these are mistake-prone and murky.

Scala has an interesting solution to this problem which doesn’t involve a mode, but rather a function that strips out whitespace before the | character. In this case the | character serves a very similar purpose to the continuation quote. The particular character can be passed to the function as an argument.

1.2. Creating Multiline Strings - Scala Cookbook [Book]
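
For comparison, here is a rough Swift sketch of what a Scala-style margin stripper does. The name `stripMargin` and the `margin` parameter are borrowed from Scala purely for illustration; no such function is part of this proposal or of the Swift standard library.

```swift
/// Hypothetical Swift analogue of Scala's `stripMargin`: on each line, drop
/// leading spaces and tabs up to and including the first `margin` character.
/// Lines that do not start with the margin character are kept unchanged.
func stripMargin(_ text: String, margin: Character = "|") -> String {
    let lines = text.split(separator: "\n", omittingEmptySubsequences: false)
    let stripped = lines.map { line -> String in
        let rest = line.drop(while: { $0 == " " || $0 == "\t" })
        return rest.first == margin ? String(rest.dropFirst()) : String(line)
    }
    return stripped.joined(separator: "\n")
}

let sql = stripMargin("SELECT *\n    |FROM users\n    |WHERE id = 1")
print(sql)
```

The difference from continuation quotes is that this runs as an ordinary function call on the finished string, so the compiler and syntax highlighter learn nothing from the margin character.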

Continuation quotes neatly avoid this problem. Whitespace before the continuation quote is indentation used to format the source code; whitespace after the continuation quote is part of the string literal. The interpretation of the code is perfectly clear to both compiler and programmer.

They improve the ability to quickly recognize the literal. Traditional multiline strings don't provide much visual help. To find the end, you must visually scan until you find the matching delimiter, which may be only one or a few characters long. When looking at a random line of source, it can be hard to tell at a glance whether it's code or literal. Syntax highlighting can help with these issues, but it's often unreliable, especially with advanced, idiosyncratic string literal features like multiline strings.

Continuation quotes solve these problems. To find the end of the literal, just scan down the column of continuation characters until they end. To figure out if a given line of source is part of a literal, just see if it starts with a quote mark. The meaning of the source becomes obvious at a glance.

Nevertheless, the traditional design does have a few advantages:

It is simpler. Although continuation quotes are more complex, we believe that the advantages listed above pay for that complexity.

There is no need to edit the intervening lines to add continuation quotes. While the additional effort required to insert continuation quotes is an important downside, we believe that tool support, including both compiler fix-its and perhaps editor support for commands like "Paste as String Literal", can address this issue. In some editors, new features aren't even necessary; TextMate, for instance, lets you insert a character on several lines simultaneously. And new tool features could also address other issues like escaping embedded quotes.

Although I was concerned about this, most editors do have some way of inserting a column of characters which would reduce the burden of pasting in code. And although enabling/disabling escaping is an orthogonal feature, allowing the _" syntax to disable escaping would allow you to paste in code with no other modifications.

Naïve syntax highlighters may have trouble understanding this syntax. This is true, but naïve syntax highlighters generally have terrible trouble with advanced string literal constructs; some struggle with even basic ones. While there are some designs (like Python's """ strings) which trick some syntax highlighters into working some of the time with some contents, we don't think this occasional, accidental compatibility is a big enough gain to justify changing the design.

It looks funny; quotes should always be in matched pairs. We aren't aware of another programming language which uses unbalanced quotes in string literals, but there is one very important precedent for this kind of formatting: natural languages. English, for instance, uses a very similar format for quoting multiple lines of dialog by the same speaker. As an English Stack Exchange answer illustrates ("Why does the multi-paragraph quotation rule exist?", English Language & Usage Stack Exchange):

“That seems like an odd way to use punctuation,” Tom said. “What harm would there be in using quotation marks at the end of every paragraph?”

“Oh, that’s not all that complicated,” J.R. answered. “If you closed quotes at the end of every paragraph, then you would need to reidentify the speaker with every subsequent paragraph.

“Say a narrative was describing two or three people engaged in a lengthy conversation. If you closed the quotation marks in the previous paragraph, then a reader wouldn’t be able to easily tell if the previous speaker was extending his point, or if someone else in the room had picked up the conversation. By leaving the previous paragraph’s quote unclosed, the reader knows that the previous speaker is still the one talking.”

“Oh, that makes sense. Thanks!”
In English, omitting the ending quotation mark tells the text's reader that the quote continues on the next line, while including a quotation mark at the beginning of the next line reminds the reader that they're in the middle of a quote.

Similarly, in this proposal, omitting the ending quotation mark tells the code's reader (and compiler) that the string literal continues on the next line, while including a quotation mark at the beginning of the next line reminds the reader (and compiler) that they're in the middle of a string literal.

  This is very interesting, I never knew!

On balance, we think continuation quotes are the best design for this problem.

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#detailed-design>Detailed design

When Swift is parsing a string literal and reaches the end of a line without finding a closing quote, it examines the next line, applying the following rules:

If the next line begins with whitespace followed by a continuation quote, then the string literal contains a newline followed by the contents of the string literal starting on that line. (This line may itself have no closing quote, in which case the same rules apply to the line which follows.)

If the next line contains anything else, Swift raises a syntax error for an unterminated string literal.
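
The two rules above can be sketched as a small line-joining function. This is only an illustration of the rules, not the compiler's lexer: the names `joinContinuationLiteral` and `LiteralError` are made up, the first line is assumed to start at the opening quote, and escapes and interpolation are ignored.

```swift
enum LiteralError: Error, Equatable {
    case unterminated(line: Int)
}

/// Joins the lines of a continuation-quoted literal into its contents.
/// `lines[0]` must begin at the opening quote.
func joinContinuationLiteral(_ lines: [String]) throws -> String {
    var pieces: [String] = []
    for (index, line) in lines.enumerated() {
        // Whitespace before the quote is source indentation, not content.
        let trimmed = line.drop(while: { $0 == " " || $0 == "\t" })
        guard trimmed.first == "\"" else {
            // Second rule: anything but a continuation quote is an error.
            throw LiteralError.unterminated(line: index)
        }
        let body = trimmed.dropFirst()
        if let close = body.firstIndex(of: "\"") {
            // A closing quote ends the literal on this line.
            pieces.append(String(body[..<close]))
            return pieces.joined(separator: "\n")
        }
        // First rule: no closing quote, so the literal continues.
        pieces.append(String(body))
    }
    throw LiteralError.unterminated(line: lines.count)
}

let xml = try! joinContinuationLiteral([
    "\"<catalog>",
    "    \"  <book/>",
    "    \"</catalog>\"",
])
print(xml)
```

Because the error carries the index of the first line that lacks a continuation quote, a diagnostic can point near the end of the literal rather than at its beginning.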

The exact error messages and diagnostics provided are left to the implementers to determine, but we believe it should be possible to provide two fix-its which will help users learn the syntax and correct string literal mistakes:

Insert " at the end of the current line to terminate the quote.

Insert " at the beginning of the next line (with some indentation heuristics) to continue the quote on the next line.

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#impact-on-existing-code>Impact on existing code

Failing to close a string literal before the end of the line is currently a syntax error, so no valid Swift code should be affected by this change.

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#future-directions-for-multiline-string-literals>Future directions for multiline string literals

We could permit comments before encountering a continuation quote to be counted as whitespace, and permit empty lines in the middle of string literals. This would allow you to comment out whole lines in the literal.

We could allow you to put a trailing backslash on a line to indicate that the newline isn't "real" and should be omitted from the literal's contents.
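
As a rough sketch of the trailing-backslash idea, the function below joins the already-extracted lines of a literal and drops the newline after any line that ends in a backslash. The name `joinLines` is hypothetical, and the sketch does not distinguish an escaped backslash (`\\`) at the end of a line, which a real design would have to handle.

```swift
/// Joins literal lines with newlines, except that a line ending in a
/// backslash is glued directly onto the line that follows it.
func joinLines(_ lines: [String]) -> String {
    var result = ""
    for (index, line) in lines.enumerated() {
        let isLast = index == lines.count - 1
        if line.hasSuffix("\\") {
            // The backslash and the newline after it are both omitted.
            result += String(line.dropLast())
        } else {
            result += line
            if !isLast { result += "\n" }
        }
    }
    return result
}

print(joinLines(["Hello, \\", "world", "!"]))
```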

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#future-directions-for-string-literals-in-general>Future directions for string literals in general

There are other issues with Swift's string handling which this proposal intentionally does not address:

Reducing the amount of double-backslashing needed when working with regular expression libraries, Windows paths, source code generation, and other tasks where backslashes are part of the data.

Alternate delimiters or other strategies for writing strings with " characters in them.

Accommodating code formatting concerns like hard wrapping and commenting.

String literals consisting of very long pieces of text which are best represented completely verbatim, with minimal alteration.

This section briefly outlines some future proposals which might address these issues. Combined, we believe they would address most of the string literal use cases which Swift is currently not very good at.

Please note that these are simply sketches of hypothetical future designs; they may radically change before proposal, and some may never be proposed at all. Many, perhaps most, will not be proposed for Swift 3. We are sketching these designs not to propose and refine these features immediately, but merely to show how we think they might be solved in ways which complement this proposal.

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#string-literal-modifiers>String literal modifiers

A string literal modifier is a cluster of identifier characters which goes before a string literal and adjusts the way it is parsed. Modifiers only alter the interpretation of the text in the literal, not the type of data it produces; for instance, there will never be something like the UTF-8/UTF-16/UTF-32 literal modifiers in C++. Uppercase characters enable a feature; lowercase characters disable a feature.

Modifiers can be attached to both single-line and multiline literals, and could also be attached to other literal syntaxes which might be introduced in the future. When used with multiline strings, only the starting quote needs to carry the modifiers, not the continuation quotes.

Modifiers are an extremely flexible feature which can be used for many purposes. Of the ideas listed below, we believe the e modifier is an urgent addition which should be included in Swift 3 if at all possible; the others are less urgent and most of them could be deferred, or at least added later if time allows.

Escape disabling: e"\\\" (string with three backslash characters)

Fine-grained escape disabling: i"\(foo)\n" (the string \(foo) followed by a newline); eI"\(foo)\n" (the contents of foo followed by the string \n), b"\w+\n" (the string \w+ followed by a newline)

Alternate delimiters: _ has no lowercase form, so it could be used to allow strings with internal quotes: _"print("Hello, world!")"_, __"print("Hello, world!")"__, etc.

This is interesting and perhaps could be applied per line with the continuation quote syntax:

let xml = _"<?xml version="1.0"?>
          _"<catalog>
          _" <book id="bk101" empty="">
           " <author>\(author)</author>
          _" </book>
          _"</catalog>
This would allow individual lines to retain the ability to do escaping and interpolation without affecting the whole string, just like the author line in the example above. This is also very easy to insert into editors just like the standard continuation quote syntax. Or perhaps we could just “escape” each string:

let xml = \"<?xml version="1.0"?>
          \"<catalog>
          \" <book id="bk101" empty="">
           " <author>\(author)</author>
          \" </book>
          \"</catalog>

Whitespace normalization: changes all runs of whitespace in the literal to single space characters; this would allow you to use multiline strings purely to improve code formatting.

alert.informativeText =
    W"\(appName) could not typeset the element “\(title)” because
     "it includes a link to an element that has been removed from this
     "book."
Localization:

alert.informativeText =
    LW"\(appName) could not typeset the element “\(title)” because
      "it includes a link to an element that has been removed from this
      "book."
Comments: Embedding comments in string literals might be useful for literals containing regular expressions or other code.
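
To illustrate the whitespace normalization idea from the list above, here is a minimal sketch of the transformation a `W` literal would presumably apply to its contents. The function name `normalizeWhitespace` is hypothetical, and treating newlines as whitespace is an assumption based on the example.

```swift
/// Replaces every run of whitespace (spaces, tabs, newlines) with a
/// single space character.
func normalizeWhitespace(_ text: String) -> String {
    var result = ""
    var inRun = false
    for ch in text {
        if ch.isWhitespace {
            // Emit one space for the whole run.
            if !inRun { result.append(" ") }
            inRun = true
        } else {
            result.append(ch)
            inRun = false
        }
    }
    return result
}

print(normalizeWhitespace("could not typeset the element because\nit includes   a link"))
```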

Eventually, user-specified string modifiers could be added to Swift, perhaps as part of a hygienic macro system. It might also become possible to change the default modifiers applied to literals in a particular file or scope.

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#heredocs-or-other-verbatim-string-literal-features>Heredocs or other "verbatim string literal" features

Sometimes it really is best to just splat something else down in the middle of a file full of Swift source code. Maybe the file is essentially a template and the literals are a majority of the code's contents, or maybe you're writing a code generator and just want to get string data into it with minimal fuss, or maybe people unfamiliar with Swift need to be able to edit the literals. Whatever the reason, the normal string literal syntax is just too burdensome.

One approach to this problem is heredocs. A heredoc allows you to put a placeholder for a literal on one line; the contents of the literal begin on the next line, running up to some delimiter. It would be possible to put multiple placeholders in a single line, and to apply string modifiers to them.

In Swift, this might look like:

print(#to("---") + e#to("END"))
It was a dark and stormy \(timeOfDay) when
---
the Swift core team invented the \(interpolation) syntax.
END
Another possible approach would be to support traditional multiline string literals bounded by a different delimiter, like """. This might look like:

print("""
It was a dark and stormy \(timeOfDay) when
""" + e"""
the Swift core team invented the \(interpolation) syntax.
""")
Although heredocs could make a good addition to Swift eventually, there are good reasons to defer them for now. Please see the "Alternatives considered" section for details.

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#first-class-regular-expressions>First-class regular expressions

Members of the core team are interested in regular expressions, but they don't want to just build a literal that wraps PCRE or libicu; rather, they aim to integrate regexes into the pattern matching system and give them a deep, Perl 6-style rethink. This would be a major effort, far beyond the scope of Swift 3.

In the meantime, the e modifier and perhaps other string literal modifiers will make it easier to specify regular expressions in string literals for use with NSRegularExpression and other libraries accessible from Swift.
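
For context, this is what the double-backslashing looks like with ordinary string literals and NSRegularExpression. The pattern and sample text are invented for illustration; the point is only that every backslash in the regex has to be written twice.

```swift
import Foundation

// The regex \w+@\w+\.com needs every backslash doubled in the literal.
let pattern = "\\w+@\\w+\\.com"
let regex = try! NSRegularExpression(pattern: pattern)

let text = "mail someone@example.com today"
let whole = NSRange(text.startIndex..., in: text)
let match = regex.firstMatch(in: text, range: whole)

// Convert the NSRange of the match back into a Swift substring.
let found = match
    .flatMap { Range($0.range, in: text) }
    .map { String(text[$0]) }
print(found ?? "no match")
```

With an escape-disabling modifier as sketched above, the pattern could presumably be written with single backslashes, exactly as it would appear in regex documentation.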

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#alternatives-considered>Alternatives considered

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#requiring-no-continuation-character>Requiring no continuation character

The main alternative is to not require a continuation quote, and simply extend the string literal from the starting quote to the ending quote, including all newlines between them. For example:

let xml = "<?xml version=\"1.0\"?>
<catalog>
    <book id=\"bk101\" empty=\"\">
        <author>\(author)</author>
    </book>
</catalog>"
This alternative is extensively discussed in the "Rationale" section above.

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#skip-multiline-strings-and-just-support-heredocs>Skip multiline strings and just support heredocs

There are definitely cases where a heredoc would be a better solution, such as generated code or code which is mostly literals with a little Swift sprinkled around. On the other hand, there are also cases where multiline strings are better: short strings in code which is meant to be read. If a single feature can't handle them both well, there's no shame in supporting the two features separately.

It makes sense to support multiline strings first because:

They extend existing syntax instead of introducing new syntax.

They are much easier to parse; heredocs require some kind of mode in the parser which kicks in at the start of the next line, whereas multiline string literals can be handled in the lexer.

As discussed in "Rationale", they offer better diagnostics, code formatting, and visual scannability.

<https://gist.github.com/brentdax/c580bae68990b160645c030b2d0d1a8f#use-a-different-delimiter-for-multiline-strings>Use a different delimiter for multiline strings

The initial suggestion was that multiline strings should use a different delimiter, """, at the beginning and end of the string, with no continuation characters between. Like heredocs, this might be a good alternative for certain use cases, but it has the same basic flaws as the "no continuation character" solution.

That might be a useful document to have, but I worry that we'll end up seeing the string feature proposals signed in triplicate, sent in, sent back, queried, lost, found, subjected to public inquiry, lost again, and finally buried in soft peat for three months and recycled as firelighters, all to end up with basically the same proposals but with slightly different keywords. Not every decision needs that level of explicit, deep documentation. Some things you can think about, experiment with, discuss, and do.

Yeah, I think you are probably right here. I actually think with the additions to your proposal it covers almost all of the other suggestions regarding string literals or at least mentions them as alternatives. Thanks so much for spending the time putting together the proposal! I have no idea how you find the time to follow and participate in what seems like every Swift evolution thread, but it’s awesome!

Tyler

--
Brent Royal-Gordon
Architechies

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution


Brent_Royal-Gordon · May 1, 2016, 3:11am

Second, this proposal should explain why it's reinventing the wheel
instead of standardizing existing, very successful, prior art. Answer
the question: “what compelling advantages does this syntax have over
Python's?”

Sure.

First of all, I will admit up front that I have not written much Python (a couple weeks ago, "much" would have been "any") and I may not fully understand their string literals. So I'll start by describing my understanding of the design in question; then I'll critique the design as I understand it. So if something in this section is wrong, please forgive any related mistakes in the critique.

Python offers a `"""` string which is almost the same as the `"` string:

  * Every character between the first `"""` and the second `"""` is part of its contents.
  * Escapes are processed normally.
  * There is no special behavior with regards to whitespace.

The only difference is that a `"""` string allows real, unescaped newlines in it, while a `"` string forbids them. (And, of course, since the delimiter is `"""`, the strings `"` and `""` are interpreted literally.)

This approach is really simple, which is a plus, but it has a number of issues.

CONTENT FORMATTING

A number of aspects of the design combine to make `"""` strings harder to read than they should be:

  * You can't indent the contents of a `"""` string to match the code it's in. This is actually pretty shocking considering how sensitive Python is to indentation, and it necessitates a number of strange hacks (for instance, Python's `help()` function unindents all but the first line of doc strings).
  * You can't put all of the contents against the left margin, either, because a newline right after the `"""` is counted as part of the string's contents. (You can use a backslash to work around this.)
  * The last line of the string also has to have the delimiter in it, because again, a newline right before the `"""` is counted as part of the string's contents. (You can use a backslash to work around this, but the backslash is *not* in the mirror position of the start of the string, so good luck remembering it.)

In other words, the first and last lines have to be adulterated by adding a `"""`, and the middle lines can't be indented to line up with either the surrounding code or the beginning of the first line. If one of the selling points of this feature is that you just stick your contents in verbatim without alteration, that isn't great.

This is such a problem that, in researching `"""` to be sure I understood how it works, I came across a Stack Overflow question whose answers are full of people recommending a different, more highly punctuated, feature instead: "How does Python's triple-quote string work?" (Stack Overflow).

(There is an alternate design which would fix the beginning and end problems: make a newline after the opening delimiter and before the closing delimiter mandatory and part of the delimiter. You might then choose to fix the indentation problem by taking the whitespace between the closing delimiter and the newline before it as the amount of indentation for the entire string, and removing that much indentation from each line. But that's not what Python does, and it's not what you seem to be proposing.)
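
A minimal sketch of that alternate design, assuming the text between the delimiters has already been extracted and that its final line holds only the whitespace in front of the closing delimiter. The function name `stripIndentation` and the choice to return nil for under-indented lines are assumptions made for illustration.

```swift
/// `body` is everything after the newline that follows the opening
/// delimiter, up to (not including) the closing delimiter. Its last line
/// is therefore the indentation of the closing delimiter.
func stripIndentation(_ body: String) -> String? {
    var lines = body.split(separator: "\n", omittingEmptySubsequences: false)
    guard let indent = lines.popLast(),
          indent.allSatisfy({ $0 == " " || $0 == "\t" }) else { return nil }
    var result: [Substring] = []
    for line in lines {
        // Completely empty lines are allowed to carry no indentation.
        if line.isEmpty { result.append(line); continue }
        // Any other line must be indented at least as far as the delimiter.
        guard line.starts(with: indent) else { return nil }
        result.append(line.dropFirst(indent.count))
    }
    return result.joined(separator: "\n")
}

let body = "    <a>\n      <b/>\n    </a>\n    "
print(stripIndentation(body) ?? "under-indented line")
```

Note that neither the newline after the opening delimiter nor the one before the closing delimiter ends up in the result, which is what would fix the first-line and last-line problems described above.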

BREAKING UP EXPRESSIONS

String literals are expressions, and in fact, they are expressions with no side effects. To do anything useful, they *must* be put into a larger expression. Often this expression is an assignment, but it could be anything—concatenation, method call, function parameter, you name it.

This creates a challenge for multiline strings, because they can become very large and effectively break up the expression they're in. The continuation-quote-based multiline strings I'm proposing are aimed primarily at relatively short strings*, where this is less of a concern. But `"""` aims to be used not only for short strings, but for ones which may be many dozens or even hundreds of lines long. You're going to end up with code like:

  print("""<?xml version="1.0"?>
  <catalog>
    <book id="bk101" empty="">
      ...
      ...
      ...a hundred more lines of XML with interpolations in it...
      ...
      ...
    </book>
  </catalog>""")

What does that `)` mean? Who knows? We saw the beginning of the expression an hour and a half ago. (It's common to avoid this issue by assigning the string to a constant even if it's only going to be used once, but that just changes the problem a little—now you're trying to remember the name of a local variable declared a hundred lines ago.)

Heredocs cleverly avoid this issue by not trying to put the literal's contents in the middle of the expression. Instead, they put a short placeholder in the expression, then start the contents on the next line. The expression is readable as an expression, while the contents of the literal are adjacent but separate. That's why I think they're a better solution than `"""` for truly massive string literals.

* This is something I am not saying in the proposal, but I really should.

NESTING

Another problem is that you don't get another choice besides `"""`. That's not so bad, though, right? It's such an uncommon sequence of characters, surely you'll never encounter it?

Well, sure...until you try to generate code.

For instance, suppose you're writing a web app using a barebones Swift framework and you have a lot of code like this:

  response.send("""<tr>
    <td>\(name)</td>
    <td>\(value)</td>
  </tr>
  """)

Every 90s Perl hacker knows what a pain this is, and every 90s Perl hacker knows the solution: a template language. Hack together some kind of simple syntax for embedding commands in a file of content, and then convert it into runnable code with a tool that does things like:

  print("""
  response.send("""\(escapedContent)""")
  """)

...oh. Wait a minute there.

To get around this, you really need to support, not two delimiters, but *n* delimiters. Heredocs let you choose an arbitrary delimiter. C++ lets you augment the delimiter with arbitrary characters. Perl's `qq` construct lets you choose a single character, but it can be almost anything you want (and some of them nest). I'm thinking about letting you extend the delimiter with an arbitrary number of underscores. All of these solutions have in common that they don't just have "primary" and "alternate" delimiters, but an effectively endless number of them.
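
As a rough sketch of the underscore idea, a code generator could pick the delimiter automatically: find the smallest number of underscores such that a quote followed by that many underscores never occurs in the content. The function names are hypothetical, and the sketch assumes the closing delimiter is a quote followed by the underscores, as in the `_"foo"_` form.

```swift
/// Smallest underscore count `n` such that the closing delimiter (a quote
/// followed by `n` underscores) does not appear anywhere in `content`.
func underscoresNeeded(for content: String) -> Int {
    var needed = 0
    var run: Int? = nil   // underscores seen directly after the last quote
    for ch in content {
        if ch == "\"" {
            run = 0
        } else if ch == "_", let current = run {
            run = current + 1
        } else {
            run = nil
        }
        // A quote followed by `current` underscores rules out every
        // delimiter with `current` or fewer underscores.
        if let current = run { needed = max(needed, current + 1) }
    }
    return needed
}

/// Wraps `content` in the shortest delimiter that cannot end early.
func quote(_ content: String) -> String {
    let bars = String(repeating: "_", count: underscoresNeeded(for: content))
    return bars + "\"" + content + "\"" + bars
}

print(quote("print(\"Hello, world!\")"))
```

Because the count can always be raised by one, generated code can itself be quoted again by the next layer of tooling, which is the property a fixed pair of delimiters lacks.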

`"""` does not have this feature—you just have the primary delimiter and the alternate delimiter, and if neither of them works for you, you have to escape. That isn't ideal.

RUNAWAY LITERALS

`"""` does not offer much help with preventing or diagnosing runaway literals or highlighting code with half-written literals. Heredocs don't either, but I envision heredocs being used less often than `"""` strings would be, since continuation quotes would handle shorter strings.

SYNTAX HIGHLIGHTING

So, let's talk about this:

(like Python's """ strings) which trick some syntax
highlighters into working some of the time with some contents, we don't think
this occasional, accidental compatibility is a big enough gain to justify
changing the design.

I've never seen a syntax highlighter have problems with it, I don't see
how it *could* ever cause a problem, and lastly I think it's both naïve
and presumptuous to call these effects accidental.

I call these effects "accidental" because the syntax highlighter was not designed to handle the `"""`; it just happens to handle it correctly because it misinterprets a `"""` string as an empty `"` string, followed by a non-empty `"` string, followed by another empty `"` string. It's "accidental" from the perspective of the syntax highlighter designer, not the language designer, who probably intended that to happen.

And it only works in a specific subset of cases. It breaks if:

* The syntax highlighter tries to apply smarter per-language rules.
* The syntax highlighter assumes that strings are not allowed to be multi-line. (This is true of many languages, including C derivatives and Swift 2.)
* The string literal contains any `"` characters, which `"""` is often used in order to permit.
* The string literal contains any escapes or special features that the syntax highlighter misinterprets, like an interpolation which itself contains a string literal.

Yes, it will often work, or at least sort-of work. But I just don't see that as very valuable.
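
To make the misinterpretation concrete, here is a deliberately naive model of such a highlighter: every quote mark toggles "string mode", and nothing else is understood. The function name `naiveStringSpans` is made up, and real highlighters differ in many ways; this only models the behavior described above.

```swift
/// Returns the contents of each span a naive highlighter would color as a
/// string, treating every `"` as a toggle and knowing nothing about
/// escapes, interpolation, or multi-character delimiters.
func naiveStringSpans(_ source: String) -> [String] {
    var spans: [String] = []
    var current: String? = nil   // nil while outside a string
    for ch in source {
        if ch == "\"" {
            if let open = current {
                spans.append(open)
                current = nil
            } else {
                current = ""
            }
        } else {
            // Only accumulates while inside a string.
            current?.append(ch)
        }
    }
    return spans
}

// Empty string, "abc", empty string: the coloring happens to look right.
print(naiveStringSpans("\"\"\"abc\"\"\""))

// With quotes in the contents, `hi` is treated as code.
print(naiveStringSpans("\"\"\"say \"hi\" now\"\"\""))
```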

WHAT'S GOOD ABOUT `"""`?

In my opinion, the best thing about `"""` (the language feature) is `"""` (the token).

A sequence of three quote marks is a fantastic token for a feature meant to create long string literals. It clearly has something to do with string literals, but it cannot be an empty string, because there are too many quote marks—that is, it's too long. It's a really clever mnemonic which also parses unambiguously.

I've spoken before in this thread and others about potentially using `"""` as an alternate delimiter (which could be extended to `"""""` and beyond). I'm also considering the idea that it might be a good token for a Perl-style heredoc syntax:

  print(""" + e""")
  It was a dark and stormy \(timeOfDay) when
  """
  the Swift core team invented the \(interpolation) syntax.
  """

Nesting could be achieved with a version of whatever alternate delimiter syntax we use for `"` strings. For instance, if we adopted the `_"foo"_` syntax I sketched:

  print(_"""_)
  response.send(""")
  \(escapedContent)
  """
  _"""_

(P.S. If this post seems way too long to have been written in a couple hours, that's because I've been drafting a version of it on and off for a day or two; it just so happened that Dave directly asked me to confront `"""` today.)

--
Brent Royal-Gordon
Architechies

Brent_Royal-Gordon · May 2, 2016, 2:16am

The problem I am facing is that I also want to support "ZERO massaging" schemes (direct paste without editing the lines), and so far I have not seen how to do it without opening a wider hole through the parser/lexer. I chose to make a parallel route simply to avoid risking making my code a merge nightmare as soon as the core team touches anything in the vicinity.

I totally understand that; since I do think we should have a no-massaging feature eventually, you're right that such a path should probably exist sooner or later.

Sorry if I jumped down your throat—I think I just misinterpreted what you were asking for.

--
Brent Royal-Gordon
Architechies

Haravikk3 · May 2, 2016, 10:31am

So I’m pretty late to the discussion here, but my question is: do we really want to encourage large blocks of text in code? They tend to bloat things, especially if they’re not used very often, and even if they are, any large chunk of text can be loaded from file and cached if (and when) necessary. If you need to process it, then some kind of templating system would be better.

I dunno, I just feel like if you’re storing enough text that the current syntax becomes burdensome, then perhaps the text should be stored elsewhere? I’d rather discourage big blocks of text in code personally.

L_Mihalkovic · May 2, 2016, 2:00pm

Inline

Regards
LM
(From mobile)

I'm having trouble getting the `e` modifier to work as advertised, at least for the sequence `\\`. For example, `print(e"\\\\")` prints two backslashes, and `print(e"\\\")` seems to try to escape the string literal. I'm currently envisioning `e` as disabling *all* backslash escapes, so these behaviors wouldn't be appropriate. It also looks like interpolation is still enabled in `e` strings.

Since other things like `print(e"\w+")` work just fine, I'm guessing this is a bug in the proposal's sketches (not being clear enough about the expected behavior), not your code.

I've written a gist with some tests to show how I expect things to work:

https://gist.github.com/brentdax/be3c032bc7e0c101d7ba8b72cd1a692e

The problem here is that I’ve not implemented unescaped literals fully as it would require changes outside the lexer.

I think you are correct, I am looking at Parse.cpp and further down the pipeline.

Here is a 1¢ thought born out of this exploration:

%1 = _"[aapl_sil]"\
// inline sil code (macro-like)
...
// more code
...
"_

Just a different kind of string literal contents... understood by the ide (completion/syntax chk) and processed by the compiler at the right stage. Who knows where really adventurous devs might take something like this?

On May 2, 2016, at 2:23 PM, John Holdsworth <mac@johnholdsworth.com> wrote:

This is because the string is first lexed and tokenised by one piece of code Lexer::lexStringLiteral but later
on in the code generation phase it generates the actual literal in a function Lexer::getEncodedStringSegment.
This is passed the same string from the source file but does not know what modifiers should be applied. As a result
normal escapes are still processed. All the “e” flag does is silence the error for invalid escapes during tokenising.

assert( e"\w\d+\(author)\n" == "\\w\\d+\(author)\n" );

Having encountered this limitation I managed to persuade myself this is what you want anyway, but perhaps few would agree.
What has been implemented is more of an r”” than a e”” that solves the “picket fence” problem where you can also interpolate
into convenient regex literals. This is all beyond the scope of this proposal anyway so I’ll leave that battle for another day.
The changes to the compiler for anything else would be a step up in terms of disruption.
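
The behavior described here can be modeled with a small sketch: known escapes are still decoded, and the flag only decides whether an unknown escape is an error or is passed through untouched. The function `decodeEscapes`, its `lenient` parameter, and the short table of known escapes are hypothetical; this is not the code in Lexer::lexStringLiteral or Lexer::getEncodedStringSegment, and interpolation is left out.

```swift
/// Decodes backslash escapes in `raw`. With `lenient` set, an unknown
/// escape such as \w is kept as a backslash plus the character instead of
/// being rejected; known escapes are decoded either way.
func decodeEscapes(_ raw: String, lenient: Bool) -> String? {
    let known: [Character: Character] = [
        "n": "\n", "t": "\t", "\\": "\\", "\"": "\"",
    ]
    var result = ""
    var iterator = raw.makeIterator()
    while let ch = iterator.next() {
        guard ch == "\\" else { result.append(ch); continue }
        // A backslash at the very end has nothing to escape.
        guard let next = iterator.next() else { return nil }
        if let decoded = known[next] {
            result.append(decoded)
        } else if lenient {
            result.append("\\")
            result.append(next)
        } else {
            return nil
        }
    }
    return result
}

// The source text \w\d+\n keeps \w and \d but still decodes \n.
print(decodeEscapes("\\w\\d+\\n", lenient: true) ?? "invalid escape")
```

Under this model, four backslashes in the source still come out as two, which matches the `print(e"\\\\")` observation earlier in the thread.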

and one new feature that \ before a newline ignores the newline.

This is in the "Future directions for multiline strings" section of the proposal. Having implemented this, how do you feel about it? Does it seem like such a no-brainer that we should just incorporate it into the proposal?

I agree, let's move it into scope.

Latest toolchain with the ability to have more than one modifier as you suggest is now:
http://johnholdsworth.com/swift-LOCAL-2016-05-02-a-osx.tar.gz

John

L_Mihalkovic · May 3, 2016, 10:23pm

inline

Regards
(From mobile)

I'm having trouble getting the `e` modifier to work as advertised, at least for the sequence `\\`. For example, `print(e"\\\\")` prints two backslashes, and `print(e"\\\")` seems to try to escape the string literal. I'm currently envisioning `e` as disabling *all* backslash escapes, so these behaviors wouldn't be appropriate. It also looks like interpolation is still enabled in `e` strings.

Since other things like `print(e"\w+")` work just fine, I'm guessing this is a bug in the proposal's sketches (not being clear enough about the expected behavior), not your code.

I've written a gist with some tests to show how I expect things to work:

https://gist.github.com/brentdax/be3c032bc7e0c101d7ba8b72cd1a692e

The problem here is that I’ve not implemented unescaped literals fully, as that would require changes outside the lexer.
This is because the string is first lexed and tokenised by one piece of code, Lexer::lexStringLiteral, but later on,
in the code generation phase, the actual literal is generated by a separate function, Lexer::getEncodedStringSegment.
That function is passed the same string from the source file but does not know which modifiers should be applied. As a result,
normal escapes are still processed. All the “e” flag does is silence the error for invalid escapes during tokenising.
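The two-step flow described above can be modelled with a small, self-contained C++ sketch. It is a toy, not the Swift lexer: `tokeniseLiteral` and `encodeSegment` are hypothetical stand-ins for the two phases, the set of recognised escapes is cut down to four, and the choice to pass unknown escapes through verbatim is an assumption made to match the behaviour reported in this thread.

```cpp
#include <cstddef>
#include <string>

// Phase 1 (tokenising): checks the escapes in a literal body.
// `ignoreInvalidEscapes` stands in for the "e" modifier.
bool tokeniseLiteral(const std::string &body, bool ignoreInvalidEscapes) {
  for (std::size_t i = 0; i + 1 < body.size(); ++i) {
    if (body[i] != '\\') continue;
    char next = body[++i];
    bool known = next == 'n' || next == 't' || next == '\\' || next == '"';
    if (!known && !ignoreInvalidEscapes) return false; // would be diagnosed
  }
  return true;
}

// Phase 2 (code generation): re-reads the same source text but receives no
// modifier information, so it always processes the escapes it recognises.
std::string encodeSegment(const std::string &body) {
  std::string out;
  for (std::size_t i = 0; i < body.size(); ++i) {
    if (body[i] == '\\' && i + 1 < body.size()) {
      char next = body[++i];
      switch (next) {
      case 'n': out += '\n'; break;
      case 't': out += '\t'; break;
      case '\\': out += '\\'; break;
      case '"': out += '"'; break;
      default: out += '\\'; out += next; break; // assumption: keep verbatim
      }
      continue;
    }
    out += body[i];
  }
  return out;
}
```

In this model the flag only changes whether `\w+` is accepted in phase 1. Phase 2 still collapses four backslashes into two, which is the mismatch between the reported and the expected `e` behaviour.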

The lexer just lays ropes around certain areas to tell what's where. Sometimes this is not enough for extra semantics. This is the reason why I went down the path of a custom string_multiline_literal token. It looks like you might want to consider that path too. If you do, you might consider the merits of suggesting that half the work be put in place now, allowing both our experiments (and other, more sophisticated ones) to lean on it, as an alternative to just directly adding extra conditional code to the default lexer code.

Having encountered this limitation, I managed to persuade myself this is what you want anyway, but perhaps few would agree.
What has been implemented is more of an `r""` than an `e""`: it solves the “picket fence” problem while still letting you interpolate
into convenient regex literals. This is all beyond the scope of this proposal anyway, so I’ll leave that battle for another day.
The changes to the compiler for anything else would be a step up in terms of disruption.

I found that by separating new from existing in the Lexer using a new token, you can go further along without really disrupting the original flow. Having a custom token would give you a differentiation point to know how to treat the contents differently. As a concrete example, this is my way to deal with the two-character prefix/postfix around multiline literals while keeping the existing interpolation logic in place:

void Lexer::getStringLiteralSegments(/* signature unchanged */) {
  // normal initialization

  // drop double character marker of multiline literals
  if (Str.is(tok::string_multiline_literal)) {
    Bytes = Bytes.drop_front().drop_back();
  }

  // normal segmenter below
}
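A standalone sketch of the same idea, for anyone who wants to try it outside the compiler tree. Everything here is hypothetical scaffolding: the `Token` struct, the `TokKind` enum and `literalBody` are invented for the illustration, and only the name `string_multiline_literal` comes from the snippet above. It assumes ordinary literals carry a one-character delimiter on each side and multiline literals a two-character one.

```cpp
#include <cstddef>
#include <string>

// Hypothetical token kinds; only string_multiline_literal is named in the thread.
enum class TokKind { string_literal, string_multiline_literal };

struct Token {
  TokKind kind;
  std::string text; // source text, delimiters included
  bool is(TokKind k) const { return kind == k; }
};

// Returns the literal body with its delimiters removed. The token kind is
// the differentiation point: later stages need no extra parameter.
std::string literalBody(const Token &tok) {
  std::size_t marker = 1; // ordinary literal: one quote on each side
  if (tok.is(TokKind::string_multiline_literal)) {
    marker = 2; // multiline literal: two-character prefix and postfix
  }
  if (tok.text.size() < 2 * marker) return "";
  return tok.text.substr(marker, tok.text.size() - 2 * marker);
}
```

Because the decision hangs off the token kind, the same approach could in principle carry other per-literal behaviour, such as a modifier that disables escapes, without widening any function signatures.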

Just thinking… another way to differentiate could be to seed the second lexer with a specific initial token, giving it a different context in which to interpret incoming chars. That would probably give you the extra context you seem to be looking for (without widening the signature of the existing parser/lexer communication channel).

@dabrahams / @clattner
Might I ask if it would be possible to have even a very high-level yup/nope answer regarding the feasibility of using the temporary lexer swapping facility to inline SIL contents as the body of a multiline string literal expression?

···

On May 2, 2016, at 2:23 PM, John Holdsworth <mac@johnholdsworth.com> wrote:

Cole_Campbell · April 29, 2016, 4:08pm

This would ultimately be my favorite approach. I do like the underscores, because they're unobtrusive and don't distract the eye, but I'm interested to see alternative suggestions. However, I understand this is not considered in scope for the current proposal. Is the intention to propose alternate delimiters for Swift 3 now or wait?

···

On Apr 29, 2016, at 9:20 AM, Erica Sadun via swift-evolution <swift-evolution@swift.org> wrote:

On Apr 28, 2016, at 4:52 PM, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org> wrote:

Did you ever really use multiline string literals before?

Yes. I used Perl in the CGI script era. Believe me, I have used every quoting syntax it supports extensively, including `'` strings, `"` strings, `q` strings, `qq` strings, and heredocs. This proposal is educated by knowledge of their foibles.

As outlined in the "Future directions for string literals in general" section, I believe alternate delimiters (so you can embed quotes) are a separate feature and should be handled in a separate proposal. Once both features are available, they can be combined. For instance, using the `_"foo"_` syntax I sketch there for alternate delimiters, you could say:

   let xml = _"<?xml version="1.0"?>
               "<catalog>
               " <book id="bk101" empty="">
               " <author>\(author)</author>
               " </book>
               "</catalog>"_

Other than the underscores (I'm not sold on them but I could live with them), this is my favorite approach:

* It supports indented left-hand alignment, which is important to me for readability
* It avoids painful `\n"+` RHS constructions
* It's easy to scan and understand
* It's simple and harmonious
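To illustrate how the indented layout in the example above could map to string contents, here is a rough C++ sketch. It reflects one reading of the continuation-quote idea (on every line after the first, leading indentation and a single `"` are dropped, and the newline is kept) and is not taken from any implementation; `stripContinuationQuotes` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Takes the text between the opening and closing delimiters and, on every
// line after the first, removes the leading indentation and the single
// continuation quote. Lines without a continuation quote are left as-is.
std::string stripContinuationQuotes(const std::string &raw) {
  std::string out;
  std::size_t pos = 0;
  bool firstLine = true;
  while (pos <= raw.size()) {
    std::size_t eol = raw.find('\n', pos);
    if (eol == std::string::npos) eol = raw.size();
    std::string line = raw.substr(pos, eol - pos);
    if (!firstLine) {
      std::size_t q = line.find_first_not_of(" \t");
      if (q != std::string::npos && line[q] == '"') {
        line.erase(0, q + 1); // drop indentation plus the quote
      }
      out += '\n'; // the line break itself stays in the string
    }
    out += line;
    firstLine = false;
    pos = eol + 1;
  }
  return out;
}
```

With this reading, the indentation in front of each continuation quote never reaches the string, which is what makes the left-hand alignment purely a source-formatting choice.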

L_Mihalkovic · April 30, 2016, 6:30am

let whyOwhy = _"\
    !! Can't understand what improvements it truly delivers
    !! It basically removes a handful of characters
    !! It works today
    !! But I don't see it as a likable foundation for adding in future enhancements
    !!\
    !! I don't envy the people who will have to support it outside of xcode
    !! Or even in xcode (considering how it currently struggles with indents/formatting)
    !! As for elegance, beauty is in the eye of the beholder, they say.
    "_
var json = _"[json]\
!!{
!! "file" : "\(wishIhadPlaceholders)_000.md"
!! "desc" : "and why are all examples in xml, i thought it died a while ago ;-)"
!! "rational" : [
!! "Here we go again"
!! "How will xcode help make these workable"
!! ]
!!}
"_

[_"] --> start string
[_"\] --> start line + ignore spaces until eol
[!!\] --> ignore everything until eol... basically the gap does not exist
["_] --> terminate string
[_"[TYPEID]\] --> start string, knowing that a verifier or a formatter (or a chain of them) understanding TYPEID can syntax-check or format it, and so on

Regards
(From mobile)

