Pure Bikeshedding: Raw Strings (why yes, again!)

Just wanted to highlight that, although Erica's examples all show interpolations, this rule applies to other escape sequences as well. In a #"..."# string, \#n produces a newline character, while \n and \##n do not. You probably wouldn't see this too often, though—after all, one of the major reasons to use #"..."# is to avoid having to escape things!
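For readers who want to try the rule from the post above, here is a minimal sketch. It assumes a Swift 5 or later compiler, where this `#`-delimited design shipped. It shows that inside `#"..."#` a plain `\n` stays as two characters, while `\#n` and `\#(...)` behave as escapes.

```swift
// Assumes Swift 5 or later. Inside #"..."#, only a backslash followed by
// exactly one # starts an escape sequence.
let keptAsIs = #"line1\nline2"#       // backslash and "n" stay as two characters
let newline  = #"line1\#nline2"#      // \#n is the newline escape
let sum      = #"2 + 3 = \#(2 + 3)"#  // \#( ) interpolates

print(keptAsIs.count)  // 12
print(newline.count)   // 11
print(sum)             // 2 + 3 = 5
```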


I think I like this latest proposal, but this part gives me pause:

For the simple case of a "conventional" string literal, this feels ... wrong?

Compare:

let x = "You're \#1"
// OK; `x` has the value: You're #1

to:

let x = "You're \#1"
//               ^
//               error: invalid escape sequence in literal

(The latter is the behavior in current versions of the language.)


Yeah, we might want to treat too many pounds as an illegal escape sequence (probably with a custom diagnostic if we do). This design is still just on paper—we’ll see how the details shake out.


Warning on a misused \ is challenging, so let's look at a few cases.

What you want: \#1
What you type today: "\\#1"
What you type upon acceptance: "\\#1" (The same)
What you type if the string uses a single pound delimiter: #"\#\#1"# (Brent, check me on that), or, if you adjust the delimiter instead: ##"\#1"## (ditto)
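The spellings listed above can be checked directly. This sketch assumes a Swift 5 or later compiler; all three literals should produce the same three-character text `\#1`.

```swift
// Three spellings of the three-character text \#1 (Swift 5 or later).
let conventional = "\\#1"     // escape the backslash the usual way
let onePound     = #"\#\#1"#  // \#\ is an escaped backslash in a #-string
let twoPounds    = ##"\#1"##  // \# is not an escape when the delimiter is ##

print(conventional == onePound && onePound == twoPounds) // true
print(conventional.count)                                // 3
```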

Yes, this design can (and probably should) warn on an unrecognized escape sequence such as #"\#1"# or "\v" exactly as it does today. Delimiter + unrecognized = error/warning.

Going beyond that check is problematic, especially looking for under-delimited escape sequences. For example, say the content is:

#"printf("%s\n", value_string)"#

This is not a misplaced or misused escape delimiter. I think it should not trigger a warning.
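As a quick illustration of why that case is legitimate, here is a sketch assuming Swift 5 or later: the embedded C format string keeps its `\n` as a literal backslash and `n`, which is exactly what the author of such a literal wants. The C snippet is just the example from the post above.

```swift
// The C snippet is kept verbatim: its \n stays a backslash followed by n,
// and its quotes do not end the string because no # follows them.
let cCode = #"printf("%s\n", value_string)"#

// The same text written as a conventional literal needs three escapes.
let cCodeEscaped = "printf(\"%s\\n\", value_string)"

print(cCode == cCodeEscaped) // true
```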

I also agree that "This is not \#(interpolated)" probably should error because \# is an unrecognized escape sequence.

Right. It doesn't make sense to warn about too few pounds because one of the purposes of the feature is to let you write backslashes that aren't escape sequences. But too many makes sense to diagnose as an error because #"...\##..."# is equivalent to "...\#...", which is not a valid escape sequence.


Indeed, this is excellent. I'm very glad that you've gone beyond raw strings to look at other ways to address the stated use cases.


If you want to follow along, here's the gist with the write-up, including several corrections (thanks, @nonsensery).

I think unifying all strings into a single scheme is really elegant, and letting the normal syntax fall out of the zero-#s case is so nice. (I think that Rust should adopt it too, to complete the circle!)
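To see how the normal syntax falls out as the zero-`#` case, here is a sketch assuming Swift 5 or later: the same interpolation and newline escape written with zero, one, and two pound signs, where the escape introducer always carries as many `#` as the delimiter.

```swift
let name = "Swift"

// The escape introducer is a backslash plus as many # as the delimiter has.
let zero = "Hello, \(name)!\n"
let one  = #"Hello, \#(name)!\#n"#
let two  = ##"Hello, \##(name)!\##n"##

print(zero == one && one == two) // true
```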

One minor thing: calling the \ in "\\" and the \# in #"\#\"# a "literal"/"literal signifier" was quite confusing to me, as it's the opposite of literal for \n, \() etc. Maybe "escape"?

Really interesting design, thanks again to everyone for working so hard on this. The biggest source of backlash is probably going to be for \#n and similar, but I think multi-line strings make this less of an issue, and most people will just continue using “normal” strings anyway, which are unchanged. Are there other escape sequences that are commonly used in strings besides \n and \(…)?

I looked up the list:

  • The escaped special characters \0 (null character), \\ (backslash), \t (horizontal tab), \n (line feed), \r (carriage return), \" (double quotation mark) and \' (single quotation mark)
  • An arbitrary Unicode scalar, written as \u{n}, where n is a 1–8 digit hexadecimal number with a value equal to a valid Unicode code point

None of these seem likely to be common in strings with custom delimiters apart from \t and \n, since the custom delimiters already handle the \\ and \" cases, and you can just write Unicode directly in source code. The one oddity on this list is \', which in my testing does nothing different from ' today: both just produce an apostrophe/single quote. I'm not sure what the history is there, but it doesn't seem like a problem for this proposal.
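The observations above can be sketched in code, assuming Swift 5 or later: in a `#`-delimited string, quotes and backslashes need no escaping, `\t`, `\n` and `\u{...}` each take one `#`, and `\'` in a conventional string simply yields an apostrophe.

```swift
// Quotes and backslashes are literal inside #"..."#.
let quoted = #"She said "hi" \ waved"#
// \#t, \#n and \#u{...} are the escapes when the delimiter is one #.
let tabbed = #"col1\#tcol2\#n"#
let scalar = #"\#u{41}"#          // U+0041 LATIN CAPITAL LETTER A
// In a conventional string, \' is accepted and means the same as '.
let apostrophe = "it\'s"

print(quoted) // She said "hi" \ waved
```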

The parser parses single-quoted strings as though they are double-quoted strings, but immediately diagnoses an error with a fix-it to turn them into double-quotes; supporting \' is part of that feature. Looking at the revision history, I think this is just to help people who've been writing a lot of Python lately, but obviously it'll come in handy if we ever decide to use single quotes for something.

I've done some massive rewrites.

Please cast critical eyes on this: raw#.md · GitHub

Thank you!


A toolchain supporting the new unified delimited strings is available for testing.
It shows up as a dev snapshot for the 27th, and you'll need to restart Xcode.


Wait, is that supposed to be right or wrong?

(I'm thinking wrong. The "#" in the middle enables interpolation, so the prior "\" should have gone away with the pound and parentheses marks during replacement.)

Looks like a bug in the toolchain. Fixed it, and I will upload a new version in a couple of hours.

Edit: There were a couple of bugs in the toolchain in the end.
They are fixed now if you download again from the same link.

I am not a fan of mixing interpolation in raw (or cooked?) strings.

I'd much prefer using something like the following, which I would expect to work:

import Foundation // replacingOccurrences(of:with:) comes from Foundation

let myString = #"I am a [<template>] string"#.replacingOccurrences(of: "[<template>]", with: "raw")
// or
let otherString = #"I am a ##[<template>]## string.  "#
              .replacingOccurrences(of: "[<template>]", with: "raw")
              .replacingOccurrences(of: ".", with: "!")

I think it would help if you would explain your reasoning instead of saying “I am not a fan of mixing interpolation in raw strings” or “custom delimiters should be limited to raw strings”. It's hard to understand why you would prefer to use an ad hoc templating system instead of interpolation, for example, and people will probably not be convinced by statements of opinion without any analysis.
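To make the comparison concrete, here is a sketch assuming Swift 5 or later with Foundation available. Both approaches produce the same text; one practical difference is that the template is resolved at run time, so a misspelled placeholder is silently left in place, while a misspelled name inside `\#( )` is a compile-time error.

```swift
import Foundation

let word = "raw"

// Ad hoc template: a placeholder that is replaced at run time.
let viaTemplate = #"I am a [<template>] string"#
    .replacingOccurrences(of: "[<template>]", with: word)

// Interpolation inside the same #-delimited literal.
let viaInterpolation = #"I am a \#(word) string"#

print(viaTemplate == viaInterpolation) // true
```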


I’m glad you asked.

I care about this proposal and I want to see it pass the review phase. I worry that jamming more features into raw strings just makes it less focused and thus less likely to be accepted.

I love the idea of using a different number of #s as a delimiter for raw strings, but calling that feature "custom delimiter" is probably not the right choice.

A raw string uses #’s as a delimiter.

  1. Why do we have to extend that to regular strings? That is, why try to support escaping and interpolation?

  2. Why can’t we just have #’s depict pure raw strings?

I don’t like the idea because:

  1. I think it needlessly complicates a very simple feature (raw strings) by forcing interpolation to work.

  2. The moment you add interpolation, you don't have a raw string. From a purist standpoint, I just feel like it completely misses the point.

Otherwise I love the proposal.

I argue that, if you only have 1-3 'levels' of raw string, ## is not much more visually distinct than #3. The moment you have four or more levels, though, you run into the problem I dislike so much: "is that five or six characters?" ##### vs ###### vs ##### vs #### vs ####### is just annoying.

Four #s seem to be the limit of easy readability, in my opinion.
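To put the "levels" question in concrete terms: a single-line delimited string needs one more `#` than the longest run of `#` that follows any quote or backslash in its content, since otherwise that run would close the string or start an escape. The helper below, `minimumPoundCount(for:)`, is hypothetical and only a sketch; it assumes Swift 5 or later, single-line content without newlines, and ignores the corner case of content starting with two quotes (which would make the opener read as a multi-line `"""`).

```swift
// Hypothetical helper: smallest number of # delimiters that lets `content`
// be written verbatim in a single-line string literal.
func minimumPoundCount(for content: String) -> Int {
    let chars = Array(content)
    var needed = 0
    for (index, char) in chars.enumerated() where char == "\"" || char == "\\" {
        // Count the run of # immediately after the quote or backslash.
        var run = 0
        var next = index + 1
        while next < chars.count && chars[next] == "#" {
            run += 1
            next += 1
        }
        // The delimiter must be longer than that run, or the run would close
        // the string (after a quote) or start an escape (after a backslash).
        needed = max(needed, run + 1)
    }
    return needed
}

print(minimumPoundCount(for: "plain text"))              // 0
print(minimumPoundCount(for: #"a \n b"#))                // 1
print(minimumPoundCount(for: ##"ranked "#1" again"##))   // 2
```

Most real content lands at zero or one, which is consistent with the point above that only a few levels are ever comfortable to read.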


Basically, this would put us back to the "allow arbitrary delimiter strings" situation:
When #3 is valid, #3141592 should be valid as well... and so should #?sharp or (at least most) other combinations.

Actually, I don't think the restriction to allow only # is needed, and a delimiter that contains only # would still be allowed.
You could definitely run into trouble if you choose a delimiter that is an existing directive, but if someone desperately wants to shoot themselves in the foot, we can't stop them anyway.