multi-line string literals


(Drew Crawford) #1

The reason I raise the question is that some languages have multiple quote styles (Perl 5 has something like 3 or 4 different string literal styles IIRC?) with different policies. One reason for this is to disable processing of escapes: if you’re using string literals to enter something that uses \ or " frequently, it can be irritating and ugly to have a lot of \\'s. In some dialects of inline assembly in C, for example, this can lead to very ugly code.

When introducing a feature like this, I think it would be useful to survey a range of popular languages (and yes, even Perl :wink:) to understand what facilities they provide and why (i.e. what problems they are solving), and to synthesize a good Swift design that can solve the same problems with a hopefully simple approach.

While I think surveying the literature for inspiration is good, it is even better to look at motivating use cases. “Multi-line string literals” is a feature everybody can get behind, but I wonder if there is agreement on the motivation.

For me, the motivation is that I am pasting the string from some external source. Suppose I’m writing JSON parsing code, and I paste `let json = "<JSON>"` at the top of the playground to test with.

I would want multi-line string literals to behave as much like normal
string literals as possible for consistency. To optimize for developer
happiness, to steal the Ruby saying, unexpected behavior should be kept to
a minimum. If \n works in a normal string literal, it should work in a
multi-line string literal... even if you could just hit enter instead.

Well… the principle of least surprise depends entirely on what surprises you. If you start with a one-line string and then say “never mind, two lines”, then having the escape behavior change out from under you is surprising. On the other hand, if you are pasting long text from an external source, opting *into* escaping behavior is surprising.

The triple-quote proposal at least avoids the issue of accidentally prematurely terminating a pasted string in a lot of common cases (JSON strings, for example). However, that is almost worse in some sense, as the other escape sequences are much more subtle than a compile error, and likely to slip past the developer's notice.

tl;dr: I have two questions.

1. Are multi-line strings more commonly used as “long single-line strings” (e.g. wants to be escaped) or “pasted data” (e.g. wants to be non-escaped)? How can we find out?
2. For what answers of 1 is it sensible to make the user ALWAYS be explicit about what behavior they want?


(Travis Tilley) #2

tl;dr: I have two questions.

1. Are multi-line strings more commonly used as “long single-line
strings” (e.g. wants to be escaped) or “pasted data” (e.g. wants to be
non-escaped)? How can we find out?
2. For what answers of 1 is it sensible to make the user ALWAYS be
explicit about what behavior they want?

I think there are many more use cases than you suggest in #1. In my
primary use case of Docopt, which creates a command-line interface based on
provided help text, an example multi-line string would be this:

let helptext: String = """

Naval Fate.

Usage:
  naval_fate ship new <name>...
  naval_fate ship <name> move <x> <y> [--speed=<kn>]
  naval_fate ship shoot <x> <y>
  naval_fate mine (set|remove) <x> <y> [--moored|--drifting]
  naval_fate -h | --help
  naval_fate --version

Options:
  -h --help Show this screen.
  --version Show version.
  --speed=<kn> Speed in knots [default: 10].
  --moored Moored (anchored) mine.
  --drifting Drifting mine.
"""

From that string, docopt will generate an appropriate CLI... properly
handling arguments, options, and defaults. Also, if you call the binary
with -h or --help, this text will be output as-is. No escape sequences or
interpolation are used here, but that doesn't mean that they shouldn't be
(replacing a few names with the actual binary name might be handy, for
example).
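
As a minimal sketch of that last point: ordinary single-line literals in Swift already combine escape processing with interpolation, which is the behavior a help text like this could reuse. The `tool` constant is a hypothetical stand-in for the binary name and is not part of docopt.

```swift
// Hypothetical binary name; a real tool might derive it from CommandLine.arguments.
let tool = "naval_fate"

// Both \n (escape) and \(tool) (interpolation) are processed in an ordinary literal.
let usage = "Usage:\n  \(tool) ship new <name>...\n  \(tool) -h | --help"

print(usage)
```

With escape processing and interpolation switched off, the same literal would keep the backslash sequences as plain text, which is the trade-off being discussed in this thread.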

As for your question #2... I believe that the sane option should always be
the easiest, and the less sane option should be possible, but with
"syntactic vinegar" to make it less pleasant to use. In this specific
instance, however, I'm not sure what that would entail. ^_^;

Please give me more info on what would make your specific use case less
unpleasant. JSON/JavaScript escape sequences tend to correspond pretty well
with swift (with notable exceptions).

- Travis Tilley

···

On Fri, Dec 11, 2015 at 2:06 AM, Drew Crawford via swift-evolution < swift-evolution@swift.org> wrote:


(Drew Crawford) #3

I think there are many more use cases than you suggest in #1.

I didn’t intend this to be read narrowly, sorry! I simply mean that I don’t have a good handle on why people want this feature. I think it would be good to investigate that question to help inform the design, rather than assume everybody else has in mind the same use case that each of us individually does. Your Docopt example was very helpful in this regard; I feel like I better understand your problems. I agree that escaped strings are the ‘sane’ default to serve a use case like Docopt.

As for your question #2... I believe that the sane option should always be the easiest, and the less sane option should be possible, but with "syntactic vinegar" to make it less pleasant to use.

I don’t believe this is true in full generality. An illustration from the type system:

let a: Int16 = 0
let b: Int32 = 0
if a & b { }

There are two ways to resolve this: the “sane” option is to promote `a` to Int32—the insane option is to demote `b` to Int16. Swift rejects both, gives you a compile error, and tells you to sort it out yourself.
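
To make the two resolutions concrete, here is a sketch of what the programmer ends up writing once the mixed-width expression is rejected. The values are made up for illustration, and the initializer spellings are those of current Swift rather than anything stated in this thread.

```swift
let a: Int16 = 0b0110
let b: Int32 = 0b0011

// "Sane" resolution: widen `a` explicitly; no bits can be lost.
let widened: Int32 = Int32(a) & b

// "Insane" resolution: narrow `b` explicitly; only its low 16 bits are kept.
let narrowed: Int16 = a & Int16(truncatingIfNeeded: b)

print(widened, narrowed)  // prints "2 2"
```

Either way the conversion is spelled out at the call site, so the choice is visible to whoever reads the code later.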

Having “default” and “opt-out” behavior is correct sometimes, right? As is sending it back to the programmer for clarification. What I’m asking is how we distinguish between those two approaches, which are both correct for some problems. Do you know any design principle that distinguishes these examples? There must be some reason we force a programmer to rewrite their implicit upcast but not their implicit escaped string.

Please give me more info on what would make your specific use case less unpleasant. JSON/JavaScript escape sequences tend to correspond pretty well with swift (with notable exceptions).

So for example, consider an object.json:

{
  "foo": "bar",
  "baz": "bap\nbap"
}

Now I paste that into my Swift source code:

let sampleJSON = """
{
  "foo": "bar",
  "baz": "bap\nbap"
}
"""

Now suppose Swift processes the escapes in this, and thinks I wanted:

{
  "foo": "bar",
  "baz": "bap
bap"
}

This is no longer valid JSON, and I get a mystery parse error.
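
The difference can be reproduced with ordinary single-line literals, which already process escapes. This is only a sketch: `hasRawLineBreak` is a hypothetical helper, and the fragment is the `"baz"` member from the example rather than a whole document. The JSON grammar requires control characters inside a string to be escaped, so a raw line break there is what a strict parser rejects.

```swift
// What object.json contains: a backslash followed by "n".
// In an escape-processing literal this needs a doubled backslash.
let intended = "\"baz\": \"bap\\nbap\""

// What a naive paste yields: \n is processed into a real line break.
let pasted = "\"baz\": \"bap\nbap\""

// Hypothetical helper: true if the text contains a raw line break.
func hasRawLineBreak(_ text: String) -> Bool {
    return text.contains(where: { $0.isNewline })
}

print(hasRawLineBreak(intended))      // prints "false"
print(hasRawLineBreak(pasted))        // prints "true"
print(intended.count - pasted.count)  // prints "1"
```

The two literals differ by a single character in the source, which is why this kind of mistake is easy to miss in review.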

In the spirit of clattner’s “what do other languages do?”, I think it is instructive to look at what Pythonistas do with their Python multi-line syntax. Almost universally, they use (the default) escaped strings <http://stackoverflow.com/a/1872081> to store JSON. Even people who should know better, because they are writing a JSON parser <https://github.com/simplejson/simplejson/blob/v2.3.3/simplejson/tests/test_pass3.py#L8>, reach for escaped strings first. Escaped strings work until they don’t (e.g. in the example), and then they fail *badly*.

Think about the plight of a noob Swift 3 programmer who has Python-style syntax available. StackOverflow questions will teach them to use escaped strings in this case. Then they paste their JSON into a linter and it comes back valid. Even if they go read a JSON parser they will learn to use escaped strings! I wonder if an entire class of bugs can be avoided by being more careful with the syntax in some way.

While I am motivating the example in terms of JSON specifically, the problem is much broader than one format: XML, YAML, Xcode build logs, and CSV are all cases where escape processing might be a surprise, particularly in playgrounds.

···

On Dec 11, 2015, at 1:27 AM, Travis Tilley <ttilley@gmail.com> wrote:


(Drew Crawford) #4

Sorry, that last link should have pointed to https://github.com/simplejson/simplejson/blob/v2.3.3/simplejson/tests/test_separators.py#L13

···

On Dec 11, 2015, at 5:01 AM, Drew Crawford <drew@sealedabstract.com> wrote:



(Travis Tilley) #5

OK, I understand your problem better with that example. Perhaps a solution
would be to have """ process escapes while ''' starts a "raw" string. This
is very much in line with Ruby's style of strings, where processing isn't
done within single-quoted strings:

irb(main):001:0> foo = 2
=> 2
irb(main):002:0> "#{foo}"
=> "2"
irb(main):003:0> '#{foo}'
=> "\#{foo}"

(Ignore that \# at the end there; that's only because the interactive
console returns a double quoted string. I assure you, the contents are
unescaped.)

If we go with this syntax for escaped and unescaped strings, then for
consistency's sake it would make sense to have single-line single-quoted
strings that don't escape and are "raw" strings like in Ruby.

CCing Chris Lattner here specifically because, in Swift today, using single
quotes gives you a warning to just use double quotes, so he might have a
strong opinion on this one. The latest commit to Lexer.cpp was actually to
add better handling of single-quoted strings (and I think more warnings if
I remember?).

- Travis Tilley

···

On Fri, Dec 11, 2015 at 6:01 AM, Drew Crawford <drew@sealedabstract.com> wrote:



(Drew Crawford) #6

I actually like this solution, for what it’s worth. +1 for """ vs ''' having escaped/unescaped semantics.

I considered suggesting that, but like you I suspect that clattner has a strong view.

Would be interested in hearing what he and others think.

···

On Dec 11, 2015, at 10:01 AM, Travis Tilley <ttilley@gmail.com> wrote:



(Chris Lattner) #7

I’m ok with repurposing single quoted strings for something else. Making them be the canonical “do not process escapes” string would make sense to me. They should be usable in both single line and multi-line forms.

-Chris

···

On Dec 11, 2015, at 8:01 AM, Travis Tilley <ttilley@gmail.com> wrote:
