An Alternate String Literal Design
As John already mentioned, Brent and I have come up with the following alternate design based on feedback from this thread and motivated by my production code. I know this is a reach. Any hostility should be directed to me and me alone.
This design moves in a slightly different direction but it takes inspiration from the same place as our most recent proposal draft: Adopt Rust-style delimiters and use them to enable a single mode of raw, cooked, and conventional string literals all using the same grammar.
String Literals
First, a review of what we have been discussing:
- A conventional string literal is exactly what you use in Swift today. It allows you to use escape sequences like \\ and \" and \u{n} to express backslashes, quotes, and Unicode scalars, among other special character sequences.
- A raw string literal ignores escape sequences. It allows you to paste raw code, meaning the sequence \\\n represents three backslashes followed by the letter n, not a backslash followed by a line feed.
- A "cooked" string literal (I believe we take the term from C++) allows you to adapt the leading and trailing delimiters so you can include quote marks within the string but retain interpolated sequences. This allows a string to have content like She said "\(phrase)" to him, where the quotes do not need escaping and phrase is expanded to its evaluated content.
Our Design
Our design powers up a conventional String literal and, in doing so, allows you to access features normally associated with raw and cooked literals.
In this design, there is only one variety of string literal, with no special "raw" syntax. A string literal is either
- a sequence of characters surrounded by double quotation marks ("), or
- a string that spans several lines surrounded by three double quotation marks.
These are examples of Swift string literals:
"This is a single line Swift string literal"
"""
This is a multi line
Swift string literal
"""
In this form, the revised string design acts exactly like any other string. You use escape sequences, including string interpolation, exactly as you would today. A backslash escape tells the compiler that a sequence should be interpolated, interpreted as an escaped character, or treated as a Unicode scalar. Escape sequences include:
- The special characters \0 (null character), \\ (backslash), \t (horizontal tab), \n (line feed), \r (carriage return), \" (double quotation mark) and \' (single quotation mark)
- Arbitrary Unicode scalars, written as \u{n}, where n is a 1–8 digit hexadecimal number with a value equal to a valid Unicode code point
- Interpolated expressions, introduced by \( and terminated by )
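As a quick illustration of the conventional escapes listed above, here is a minimal sketch. The constant names and the value of phrase are made up for this example and are not part of the proposal.

```swift
// Conventional literals: every backslash begins an escape sequence.
let phrase = "hello"                 // hypothetical value, for illustration only

let tabbed = "a\tb"                  // a, horizontal tab, b
let backslash = "\\"                 // a single backslash character
let quoted = "\"\(phrase)\""         // escaped quotes around an interpolated value
let scalar = "\u{48}\u{69}"          // the scalars U+0048 and U+0069

print(tabbed.count)                  // 3
print(backslash.count)               // 1
print(quoted)                        // "hello"
print(scalar)                        // Hi
```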
Expanding Delimiters
Our design includes custom string delimiters. You may pad a string literal with one or more # (pound, U+0023) characters:
"This is a Swift string literal"
#"This is also a Swift string literal"#
####"So is this"####
The number of pound signs at the start of the string (in these examples, zero, one, and four) must match the number of pound signs at the end of the string. "This", #"This"#, and ##"This"## represent identical string values.
static-string-literal -> " quoted-text " |
""" multiline-quoted-text """ |
# static-string-literal #
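The equivalence is easy to check. This sketch assumes a compiler that implements the pound-delimited syntax described here (Swift 5 or later does). The constant names are invented for the example.

```swift
// The same text written with zero, one, and two pound signs.
let plain = "This"
let onePound = #"This"#
let twoPounds = ##"This"##

print(plain == onePound && onePound == twoPounds)   // true

// Per the grammar, the multi-line form accepts the same padding.
let multiline = #"""
This is a multi line
Swift string literal
"""#

print(multiline == "This is a multi line\nSwift string literal")   // true
```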
Adding pound signs changes the string delimiter, allowing you to "cook" a string and include unescaped double quotes:
#"She said, "This is dialog!""#
// The quoted text is `She said, "This is dialog!"`
If you do add a backslash, it is interpreted as an extra, literal character. This string literal includes the backslash and both double quote marks inside the string delimiters (#" and "#):
#"A \"quote"."#
If for some reason you need to include #" or "# in your quoted text, adjust the number of delimiter pound signs. This need should be rare.
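To make the delimiter rules concrete, the sketch below compares each padded literal with a conventional literal that spells out the same characters using escapes. The constant names are hypothetical, and it assumes a toolchain that implements this design.

```swift
// One pound sign: double quotes inside the literal need no escaping.
let dialog = #"She said, "This is dialog!""#
print(dialog == "She said, \"This is dialog!\"")      // true

// A backslash with no pound sign after it is kept as an ordinary character.
let withBackslash = #"A \"quote"."#
print(withBackslash == "A \\\"quote\".")              // true
print(withBackslash.count)                            // 11

// Two pound signs: the sequence "# can now appear inside the text.
let containsDelimiter = ##"The text "# is not a terminator here"##
print(containsDelimiter == "The text \"# is not a terminator here")   // true
```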
Escaping
The second, and more impactful, change in this design is that any escape sequence in a string literal must match the number of pound signs used to delimit either end of the string.
Here is the degenerate case. It is a normal string with no pound signs.
"This string has an \(escaped) interpolated item"
Strings using customized delimiters add pound sign(s) after the leading backslash, as in these examples which produce identical results:
#"This string has an \#(escaped) interpolated item"#
####"This string has an \####(escaped) interpolated item"####
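A minimal sketch of the matching rule follows. It assumes a compiler that implements this design, and the value bound to escaped is invented for the example. The last two lines assume the same rule applies to the other escape sequences, not only to interpolation.

```swift
let escaped = 42   // hypothetical value, for illustration only

// The escape delimiter carries the same number of pound signs as the string delimiter.
let zero = "This string has an \(escaped) interpolated item"
let one = #"This string has an \#(escaped) interpolated item"#
let four = ####"This string has an \####(escaped) interpolated item"####

print(one)                            // This string has an 42 interpolated item
print(zero == one && one == four)     // true

// Character escapes follow the same pattern.
let tabbed = #"a\#tb"#                // a, horizontal tab, b
print(tabbed == "a\tb")               // true
```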
The escape sequence delimiter matches the extra delimiters given to the string. Any backslash that is not followed by the correct number of pound signs is treated as raw text. Each of these examples produces the exact characters of the quoted text between the quote marks:
#"This is not \(interpolated)"#
"This is not \#(interpolated)"
#"This is not \##(interpolated)"#
This escaping rule reproduces the raw string behavior from our original proposal but adds string interpolation on demand. We feel this is a huge feature for code generation applications.
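Here is a rough sketch of what that could look like for code generation. The declaration(name:value:) helper is hypothetical and exists only for this example. It exercises just the first raw form above, a plain backslash inside a one-pound literal, and assumes a toolchain that implements the pound-delimited syntax.

```swift
// A plain backslash inside a one-pound literal is raw text.
let raw = #"This is not \(interpolated)"#
print(raw)                                   // This is not \(interpolated)

// Hypothetical helper: emits one line of Swift source.
// The quotes and the trailing \n are copied through as raw text;
// only the \#( ... ) sequences are interpolated.
func declaration(name: String, value: String) -> String {
    return #"let \#(name) = "\#(value)\n""#
}

let line = declaration(name: "greeting", value: "hi")
print(line)                                  // let greeting = "hi\n"
```

The generated line keeps its own backslash escape intact, so no double escaping is needed in the template.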
Summary
We feel this is a conceptual leap of elegance that simplifies all our workarounds and collapses them into one general solution. It retains Rust-inspired custom delimiters, offers all the features of both "cooked" and "raw" strings, introduces raw string interpolation, and does this all without adding a new special-purpose string type to Swift.
Yes, this approach requires slightly more work than our original design:
- You must use pound signs for any raw string.
- You must use a more cumbersome interpolation sequence for raw and cooked strings.
Hopefully the tradeoffs are worth it in terms of added expressibility, and the resulting design is sufficiently elegant.