Pure Bikeshedding: Raw Strings (why yes, again!)

(Erica Sadun) #1

The Need for Raw

I write a lot of code generation apps. Because of this, I could really use some raw strings in my life. Raw strings aren't uncommon in other programming languages. Swift needs to implement them in a way that feels cromulently Swifty.

If you don't remember, the SE-0200 proposal was returned for revision with two conditions: fixing the syntax and coming up with better use cases. I think the use cases have been addressed. That leaves the syntax, which has to work for both single-line and multiline strings. @johnno1962 already implemented raw strings on a branch, so this effort just needs a revised proposal to make it happen.

At this point, I figure either I push for this to happen or I have to live with it never happening. I decided to make one last push.

Motivation

The core team wrote:

The proposal itself leans heavily on regular expressions as a use case for raw string literals. Several reviewers remarked that the motivation wasn’t strong enough to justify the introduction of new syntax in the language, so a revised proposal will need additional motivating examples in other domains.

I think the use cases part is easy. Here's my list.

  • Metaprogramming: This covers use cases including code-producing-code for utility programming, targeted grammars, and building test cases without escaping. This is my use case. I have apps that generate color schemes (in Swift, ObjC, for SpriteKit/SceneKit, literals, etc), that generate appropriate date formatters, that perform language-specific escaping (for example, for Objective-C), that generate markup, and more. Other interested parties would probably include the people who build Kite Composer and PaintCode. Any utility app that outputs code would benefit in some form.
  • Pedagogy: Code snippets play a major role in projects like "Learn to Code" and other teaching applications. Removing snippets to external files makes code review harder. Escaping snippets is a tedious process, which is hard to inspect. Escaping also complicates copying and pasting from working code into your source and back. When you're talking about code, and using code, having that code be formatted as an easily updated raw string is especially valuable.
  • Dialogue: Although many issues are handled with triple-quoted multiline strings, dialogue is peppered with quote marks and other material that requires string escaping. Sources range from scripts to literature, to other textual sources. For relatively short snippets, in the ones or tens of lines, it may be impractical to use external files for each inclusion, especially during code review.
  • Windows paths: Windows uses backslashes to delineate descent through a directory tree: e.g., C:\Windows\All Users\Application Data. I don't do windows but other people do. We do not judge.
  • Data Formats and Domain Specific Languages: It's useful to incorporate short sections of unescaped JSON and XML. Raw strings allow cutting and pasting these in raw form without escaping quotes. Like dialogue, it may be impractical to use external files and databases for each inclusion. Doing so reduces the ease of inspection, maintenance, and updating with new material.
  • Regular expressions: I saved this one for last because Raw Strings aren't just about regex. While regex in general is a much larger problem than raw strings, it is a primary (if not the primary) use case for many Swift developers. Doing raw strings right now helps regex down the line.
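
To make a couple of these concrete, here is a minimal sketch of what the Windows path and a short JSON fragment look like as string literals in Swift today. The constant names and the JSON values are made up for illustration; the point is only how many escapes the literals need.

```swift
// Every backslash and every interior quote has to be escaped
// in a regular (non-raw) string literal.
let windowsPath = "C:\\Windows\\All Users\\Application Data"

// Hypothetical JSON fragment, invented for this example.
let jsonSnippet = "{\"name\": \"Alice\", \"active\": true}"

print(windowsPath)
print(jsonSnippet)
```

Neither literal can be pasted in from, or copied back out to, its original source without adding or removing escapes by hand.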

Approaches (aka Get Your Paint Swatches Out!)

The core team wrote that the proposed r"..." syntax didn’t fit well with the rest of the language. The most-often-discussed replacement was #raw("..."), but the Core Team felt more discussion (as a pitch) was necessary. In follow-up, the most popular approaches were #raw and #rawString.

Neither of these seems to satisfy multi-line raw strings (my primary use case) very well. Whatever solution is picked needs to handle both quick regexes and substantial multi-line text.

It is highly unlikely that the raw string will offer custom delimiters, so the solution should be as simple and as elegant as possible and the solution for one-line and multi-line strings should be obvious and related.

Right now, multi-line strings work well with dialog and book excerpts, so long as they don't involve backslashes. This is a win for classic text and a slight problem for pedagogical text that discusses technological issues:

let aliceStrings = """
    "Come, there's no use in crying like that!" said Alice to herself,
    rather sharply; "I advise you to leave off this minute!" She generally
    gave herself very good advice, though she very seldom followed it.
    """

This works terribly with code:

let codeString = """
    let (coreName, coreValue) = model.refresh()
    let value = "\(coreName): \(coreValue)"
    """

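For comparison, a sketch of what current Swift requires if the interpolation markers in that snippet are meant to stay as literal text rather than be evaluated. This assumes nothing beyond the standard multiline literal rules: interior quotes are fine, but each backslash has to be doubled.

```swift
// The generated code should contain a literal \(coreName), so the
// backslashes are escaped to stop Swift from interpolating here.
let codeString = """
    let (coreName, coreValue) = model.refresh()
    let value = "\\(coreName): \\(coreValue)"
    """

print(codeString)
```

The escaped form no longer matches the code it is meant to produce, which is the copy-and-paste friction described above.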
The marker for raw text processing should be close to the string it annotates. Swift provides a couple of role models for this.

The first is declaration attributes. Adding a @raw (or rawValue) attribute to a String declaration could establish that a following string literal be treated as raw.

let myString: @raw String = "let value = "\(coreName): \(coreValue)""

let myString: @raw String = """
   let value = "\(coreName): \(coreValue)"
"""

The drawbacks are:

  • You need a declaration
  • You cannot use an expression with other elements.
  • The annotation is fairly far away from the quote marks.

Another approach is to use a #-delimited keyword, such as #rawString. The problem lies in where and if you add parentheses.

Single line raw strings are easy:

let codeValue = #rawString("let value = "\(coreName): \(coreValue)"")

Multilines get more complicated, especially with the dangling parenthesis at the end.

let codeValue = #rawString("""
    let (coreName, coreValue) = model.refresh()
    let value = "\(coreName): \(coreValue)"
    """)

Eliminating the parenthesis means the modifier must precede the literal, and r"..." has already been ruled out. Plus it's ugly and unswifty. So how do you annotate the raw string in a Swiftilious fashion?

let codeValue = #raw "let value = "\(coreName): \(coreValue)""
let codeValue = #raw """
    let (coreName, coreValue) = model.refresh()
    let value = "\(coreName): \(coreValue)"
    """

These aren't horrible but they aren't wonderful either.

You could follow the example of #if/#endif:

let codeValue = #raw "let value = "\(coreName): \(coreValue)"" #endRaw
let codeValue = #raw """
    let (coreName, coreValue) = model.refresh()
    let value = "\(coreName): \(coreValue)"
    """ #endRaw
// or
let codeValue = #raw
    """
    let (coreName, coreValue) = model.refresh()
    let value = "\(coreName): \(coreValue)"
    """ 
    #endRaw

Again, pretty ugly.

Or you can try using some kind of operator on the quote marks:

let codeValue = #raw(")let value = "\(coreName): \(coreValue)"#raw(")
let codeValue = #raw(""")
    let (coreName, coreValue) = model.refresh()
    let value = "\(coreName): \(coreValue)"
    #raw(""")

If you're willing to walk away from keywords, then underscores could create a new "kind" of quote marks:

let codeValue = _"let value = "\(coreName): \(coreValue)""_
let codeValue =  _"""_
    let (coreName, coreValue) = model.refresh()
    let value = "\(coreName): \(coreValue)"
    _"""_

I actually don't hate this one as much as I might. I thought the double underscores around the two triple quotes looked better than unbalanced ones. I dislike the leading underscore preventing the close quotes from lining up with the text. You could change the triples to single underscore markers:

let codeValue =  
    """_
    let (coreName, coreValue) = model.refresh()
    let value = "\(coreName): \(coreValue)"
    """_

Right-sided underscores look best but they kind of break the idea of the modifications being on the "outside" of the raw quoted text.

So what do you have to offer? Bonus points for clean and elegant.

3 Likes
(Huon Wilson) #2

I think the point in this comment is an important one for considering syntax; quoted here:

I'll also add to those examples that Rust uses r#"..."#, where there can be any^ (including 0) number of #s. The typical case is r"...", but if the string includes a " then one can use r#"..."..."#, and if it includes "#, then r##"..."#..."##, etc. This strikes a middle ground of avoiding the unescapable-delimiter problem while still having a moderate amount of structure. The "RFC" proposing it for Rust has an even more comprehensive overview of other languages.

^ (I think there's actually a limit, but I believe it's large enough that it's unlikely anyone would encounter it in code where using a raw string is a good idea.)
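
As a rough illustration of the rule described above, here is a sketch in Swift that computes the smallest number of #s a Rust-style delimiter would need for a given piece of text. The function name minimumPoundCount is hypothetical, and the logic models only the rule as stated in this post (the closing delimiter is a quote followed by N #s, so the content must not contain that sequence); it is not taken from any compiler.

```swift
/// Returns the smallest N such that `"` followed by N `#` characters
/// never appears inside `content`. Hypothetical helper for illustration.
func minimumPoundCount(for content: String) -> Int {
    var needed = 0
    var run: Int? = nil  // #s seen directly after the most recent quote

    for scalar in content.unicodeScalars {
        if scalar == "\"" {
            // A bare quote would end a zero-# raw string, so at least one # is needed.
            run = 0
            needed = max(needed, 1)
        } else if scalar == "#", let current = run {
            // A quote followed by (current + 1) #s forces one more # than that.
            run = current + 1
            needed = max(needed, current + 2)
        } else {
            run = nil
        }
    }
    return needed
}

print(minimumPoundCount(for: "plain text"))
print(minimumPoundCount(for: "She said \"hi\""))
print(minimumPoundCount(for: "contains \"# and \"## inside"))
```

Under this rule, text with no quotes needs no #s, text with plain quotes needs one, and each extra # directly after a quote raises the requirement by one.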

12 Likes
#3

I would very much like to agree on not needing custom delimiters, but what would be the solution for handling the “Swift code in a String literal” case that is important for your metaprogramming and pedagogy use cases? People are going to want to put raw strings into raw strings, so I think an optional parameter to set a custom delimiter might still be required. Or perhaps something similar to the Rust solution that @huon mentions where you can do some simple prefixing/suffixing of the opening/closing quotation marks. This will unfortunately reopen the question about using the same techniques to avoid quoting in regular Strings though.

The @raw attribute on the type doesn't make a lot of sense to me, unless you're really proposing that I could, for example, write a function that takes a @raw String parameter and would automatically make any string literal used in that argument position raw. It also raises the question of whether the second string literal would be considered raw in:

var rawString: @raw String = "…"
…
rawString = "…"

To me, the version in your post that makes most sense is the simple #raw(…) or #rawString(…) version, especially if there's going to be a parameter for a custom delimiter. I'm not sure I understand your objection to this form, particularly why they don't “satisfy multi-line raw strings very well”. If it's just the matter of the dangling parenthesis at the end, then this ship has already sailed for anyone who indents a function call or method declaration over multiple lines, and indeed for using a multi-line String as the argument to a method:

print("""
      bon voyage
      """)

The plain #raw in front of a string might work as well, depending on the solution for delimiters.

1 Like
#4

As was discussed extensively in the previous thread, custom delimiters are necessary for a raw string that can include Swift code as text, because it must be able to contain other raw strings. In particular it must be able to contain the ending delimiter of other raw strings.

Also, in my opinion a raw string should not support any escape sequences—the characters within it should be represented verbatim.

Regular expressions are not a motivating use-case because there are plans (or at least intentions?) to add first-class regex support to Swift with its own syntax and compile-time checking.

To the topic at hand of bikeshedding raw strings, I think we do not need to support single-line strings, it is sufficient to cover multiline raw strings. Leading spaces can be stripped just like they are for the existing triple-quote multiline strings, whereby the horizontal placement of the closing delimiter determines the indentation.
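
For reference, a small sketch of how the existing triple-quote stripping behaves in Swift today, which is the behavior a multiline raw string would presumably inherit. The constant name is arbitrary.

```swift
// The closing delimiter is indented 8 spaces, so 8 spaces are removed
// from the start of every line; any extra indentation is kept.
let indented = """
        let value = 1
            nested
        """

print(indented)
```

The resulting string starts at "let value = 1" with no leading spaces, and the second line keeps only its extra four spaces.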

In order to look and feel Swifty, I think the delimiters for raw strings should primarily comprise quotation marks. One option is “any number (greater than 3) of consecutive double-quotes” (or any such odd number), where the opening and closing delimiters must have the same number of double-quotes.

For example:

let codeValue = """""
        let value = "\(coreName): \(coreValue)"
        """""
3 Likes
(Brent Royal-Gordon) #5

I agree with some of the other posters that we need something customizable, not just alternative. On the other hand, I don't think the Perl/Ruby solution of "just use your favorite delimiter" is a good fit for Swift; both languages invest a surprising amount of complexity into this feature, and the result usually doesn't look as clean as we usually prefer Swift to look.

I'm going to make two suggestions:

  1. An equal number of underscores before the first delimiter and after the last delimiter.
print(_""Step into my parlor," said the spider to the fly."_)

print(___"""
     print("""
           Hi!
           Hi!
           Hi!
           """)
     """___)

This syntax is very lightweight while still making the delimiter very visible in the source code. It's just visible because it's whitespace, instead of being visible because it's very dense.

  2. #raw("literal"), with a matching number of parentheses.
print(#raw(""Step into my parlor," said the spider to the fly."))

print(#raw((((("""
     print("""
           Hi!
           Hi!
           Hi!
           """)
     """))))))

This one incorporates the word "raw" to make sure the user understands the difference in escaping semantics. The very dense parens are very visible, too.

1 Like
(Dante Broggi) #6

I am currently wondering if the following may be acceptable syntax,
where <*> stands for 'any combination of valid operator characters'
(the same combination in both places):

#rawString(<*>)
 This string starts at the previous capital T.
 print(#raw((((("""
     print("""
           Hi!
           Hi!
           """)
     """))))))
 #endRaw(<*>)
2 Likes
#7

It doesn't seem great for three versus four quotation marks to change behaviour so wildly.

This is possible but perhaps doesn't align with the use of _ elsewhere in the language and doesn't scream “raw string”. It does feel kind of nicely lightweight, though. Did you consider the same idea, but with # (which I guess is the Rust solution but without the prefixed r)? I haven't thought about the parsing side, though, or whether it fits well with the other uses of #.

print(#""Step into my parlor," said the spider to the fly."#)

print(###"""
     print("""
           Hi!
           Hi!
           Hi!
           """)
     """###)

This is initially appealing, but when I think about the use case of “Swift code in a raw string” I think that multiple closing parentheses in a row are fairly common in source code. I think you're likely to see #raw(((((((((((( a lot just so people don't have to scan through the string to find the actual minimum number required.

This gets back to my question in the previous thread about how common raw strings will be (and also if we expect any other special string types). This idea is fairly verbose, which might be okay if they are rare but not if they're common. It will look truly ridiculous in single-line use, though. #raw and #endraw/#endRaw might be more palatable if the right delimiter solution can be found. As an aside, why did you restrict it to operator characters?

(Dante Broggi) #8

I suppose with #rawString(<*>) there is no reason to restrict it to operator characters. The only necessary restriction would be to exclude (), [], the comma, and IIRC {}, because those are notated specifically in the parse rules (or at least their description in the Swift book) for # keywords.

(Lance Parker) #9

I would be thrilled if we ended up with something that looks like Rust's raw strings. I'm not a fan of any of the #raw() variants; they seem overly verbose for a string literal.

4 Likes
(Erica Sadun) #10

For reference: Rust Raw Strings

(Rob Mayoff) #11

Suggestion:

\"This is a raw string"
\("This is a raw string with a one-paren terminator")
\(("This is a raw string with a two-paren terminator"))
\((("This is a raw string with a three-paren terminator")))
1 Like
(Jeremy Pereira) #12

Why operator characters? Wouldn't it be better to allow alphanumerics so you can document the purpose of the string?

#rawString(EXAMPLE_RAW)
 This string starts at the previous capital T.
 print(#raw((((("""
     print("""
           Hi!
           Hi!
           """)
     """))))))
 #endRaw(EXAMPLE_RAW)
5 Likes
(Gwendal Roué) #13

Oh yes please!

#rawString(bash)...
#rawString(SQL)...
#rawString(Swift)...
#rawString(HTML)...
#rawString(java)...
#rawString(markdown)...
#rawString(CSS)...
#rawString(javascript)...
(Gwendal Roué) #14

BTW, did we formally ditch the heredoc syntax, already?

// "foo\\(bar)\nbaz"
let x = <<END
    foo\(bar)
    baz
    END

It would kill the << prefix operator, according to the whitespace operator rules described in the doc. But can't it be saved without too much #rawCeremony?

(Dave DeLong) #15

I've been doing some stuff in Go recently, and I find myself very much liking their use of backticks to indicate raw strings:

var cleanString = regexp.MustCompile(`^\s*'?(.*?)'?,?\s*$`)

The use of `` to delimit the string means that you don't have to escape backslashes. (You would have to escape a backtick though)
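
For contrast, a sketch of the same pattern written as a regular Swift string literal today, where each backslash has to be doubled. The helper name firstCapture and the sample input are made up for illustration; NSRegularExpression is Foundation's existing regex class.

```swift
import Foundation

// Same pattern as the Go example, with every backslash escaped.
let cleanPattern = "^\\s*'?(.*?)'?,?\\s*$"

/// Hypothetical helper: returns the first capture group, if the input matches.
func firstCapture(in input: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: cleanPattern) else {
        return nil
    }
    let fullRange = NSRange(input.startIndex..., in: input)
    guard let match = regex.firstMatch(in: input, range: fullRange),
          let captured = Range(match.range(at: 1), in: input) else {
        return nil
    }
    return String(input[captured])
}

if let cleaned = firstCapture(in: "  'hello',  ") {
    print(cleaned)
}
```

The pattern itself is unchanged; only the literal spelling differs, which is the overhead a raw string form would remove.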

I like this because it kind of already looks like a string without needing funky syntax, and it echoes how formatting works in Markdown: stuff inside the backticks is "literal"/code.

5 Likes
(Matt Diephouse) #16

I like that a lot.

It might make sense to follow Markdown's lead here too. Markdown will use multiple leading/trailing `s if you want to use a ` inside. (I'm able to write that here as `` ` ``.) With Markdown, you can put spaces directly inside the delimiters so that you can escape a lone `. But that shouldn't be necessary here.

2 Likes
(Erica Sadun) #17

Sadly, backticks interfere with code voice in swiftdoc header material in raw text.

The Rust approach has emerged on this thread as a design driver. Rust's raw strings were well researched. They are simple, and the language establishes precedent. However, the approach is quite close to the rejected spelling:

The proposed r"..." syntax didn’t fit well with the rest of the language. The most-often-discussed replacement was #raw("..."), but the Core Team felt more discussion (as a pitch) is necessary.

The Rust approach uses balanced pound signs on either side of the opening and closing quote to ensure there are no overlap issues with the raw text. For example:

fn main() {
    let raw_str = r"Escapes don't work here: \x3F \u{211D}";
    println!("{}", raw_str);

    // If you need quotes in a raw string, add a pair of #s
    let quotes = r#"And then I said: "There is no escape!""#;
    println!("{}", quotes);

    // If you need "# in your string, just use more #s in the delimiter.
    // There is no limit for the number of #s you can use.
    let longer_delimiter = r###"A string with "# in it. And even "##!"###;
    println!("{}", longer_delimiter);
}

So how do we make this more Swifty? I think moving from r to a more substantial start tag would help. raw balances concision with expression and recognition.

let rawString = raw"Escapes don't work here: \x3F \u{211D}"
print("\(rawString)")

// If you need quotes in a raw string, add a pair of #s
let quote = raw#"And then I said: "There is no escape!""#
print("\(quote)")

// If you need "# in your string, just use more #s in the delimiter.
// There is no limit for the number of #s you can use.
let longerDelimiter = raw###"A string with "# in it. And even "##!"###
print("\(longerDelimiter)")

/// If you need to use multiline raw strings, prefix `"""` with raw
let multiLineText = raw####"""
    /// ### Usage
    /// For example, lazily call `foo`
    /// ```
    /// mySequence.lazy.foo({ "$\($0).00" })
    /// ```
    """####

I mildly dislike the unbalanced open and closing quotes. Still, the spelling is short, both for single and multi-line raw text. The context is clearer with "raw" versus "r" (clarity at the point of use). Using pound-delimiters offers flexible accommodation for special-cases. Pound signs are not uncommon in documentation markup and the spec allows expanding the delimiter as needed to counteract their presence.

@Lily_Ballard do you have any regrets about how the Rust design ended up?

3 Likes
#18

Two more datapoints, from Julia:

r"..." denotes a regexp, and raw"...." is a raw string (https://docs.julialang.org/en/stable/manual/strings/#man-raw-string-literals-1).

(Dante Broggi) #19

As an open opinion:
I dislike the syntax attempts that put alphanumerics adjacent to quote characters, and # looks too heavy to be repeated before and after a string, although I suppose repetition would only be necessary if one was using # in the string.

(Erica Sadun) #20

The pound sign isn't needed at all if # isn't part of the string. The initial delimiter can be spelled raw. I don't know if it's possible to add something that puts a little visual space between raw and either the open quote or the first pound sign. Maybe an underscore or a colon (although the latter may interfere with label parsing):

let rawString = raw_"Escapes don't work here: \x3F \u{211D}"
print("\(rawString)")

// If you need quotes in a raw string, add a pair of #s
let quote = raw_#"And then I said: "There is no escape!""#
print("\(quote)")

let rawString = raw:"Escapes don't work here: \x3F \u{211D}"
print("\(rawString)")

// If you need quotes in a raw string, add a pair of #s
let quote = raw:#"And then I said: "There is no escape!""#
print("\(quote)")