Yeah, we might want to treat too many pounds as an illegal escape sequence (probably with a custom diagnostic if we do). This design is still just on paper—we’ll see how the details shake out.
Warning on a misused \ is challenging, so let's look at a few cases.
What you want: \#1
What you type today: "\\#1"
What you type upon acceptance: "\\#1" (the same)
What you type if the string uses a single pound delimiter: #"\#\#1"# (Brent, check me on that), or you adjust the delimiter: ##"\#1"## (ditto)
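The three spellings above can be compared directly. This is only a sketch: it assumes the delimiter rules behave the way they later shipped in Swift 5, whereas the design in this thread was still on paper, so a given snapshot toolchain may differ.

```swift
// Goal: a string whose content is the three characters \ # 1.
let today = "\\#1"          // ordinary literal: escape the backslash
let onePound = #"\#\#1"#    // with one pound, \#\ is the escaped backslash
let twoPounds = ##"\#1"##   // with two pounds, \# is not an escape at all

print(today)                                  // \#1
print(today == onePound, today == twoPounds)  // true true
```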
Yes, this design can (and probably should) warn on an unrecognized escape sequence such as #"\#1"# or "\v", exactly as it does today. Delimiter + unrecognized = error/warning.
Going beyond that check is problematic, especially looking for under-delimited escape sequences. For example, say the content is
#"printf("%s\n", value_string)"#
This is not a misplaced or misused escape delimiter. I don't think it should produce a warning.
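To make that case concrete, here is a small sketch (assuming the one-pound rules as they ended up in Swift 5) showing that the \n in that content stays as two literal characters and the inner quotes need no escaping:

```swift
// One pound: quotes and backslashes in the content are taken literally.
let snippet = #"printf("%s\n", value_string)"#

// The same content written as an ordinary literal, with everything escaped.
let escaped = "printf(\"%s\\n\", value_string)"

print(snippet == escaped)  // true
```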
I also agree that "This is not \#(interpolated)" probably should error because \# is an unrecognized escape sequence.
Right. It doesn't make sense to warn about too few pounds because one of the purposes of the feature is to let you write backslashes that aren't escape sequences. But too many makes sense to diagnose as an error because #"...\##..."# is equivalent to "...\#...", which is not a valid escape sequence.
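A rough illustration of the too-few versus too-many distinction, assuming Swift 5 behavior. The too-many line is left commented out because, as far as I know, the compiler rejects it rather than treating it as text:

```swift
// Zero pounds: \n is the newline escape.
let plain = "a\nb"

// One pound, escape written with too few pounds: \n is just two characters.
let underDelimited = #"a\nb"#

// One pound, escape written with exactly one pound: \#n is the newline escape.
let exact = #"a\#nb"#

// Too many pounds for a one-pound literal; expected to be a compile-time
// error, so it stays commented out:
// let tooMany = #"a\##nb"#

print(plain.count, underDelimited.count, exact.count)  // 3 4 3
```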
Indeed, this is excellent. I'm very glad that you've gone beyond raw strings to look at other ways to address the stated use cases.
If you want to follow along, here's the gist with the writeup including several corrections (Thanks @nonsensery)
I think unifying all strings into a single scheme is really elegant, and letting the normal syntax fall out of the zero-#s case is so nice. (I think that Rust should adopt it too, to complete the circle!)
One minor thing: calling the \ in "\\" and the \# in #"\#\"# a "literal"/"literal signifier" was quite confusing to me, as it's the opposite of literal for \n and \() etc. Maybe "escape"?
Really interesting design, thanks again to everyone for working so hard on this. The biggest source of backlash is probably going to be for \#n and similar, but I think multi-line strings make this less of an issue, and most people will just continue using “normal” strings anyway, which are unchanged. Are there other escape sequences that are commonly used in strings besides \n and \(…)?
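On the multi-line point, a sketch assuming multi-line literals accept the same pound delimiters (they do in Swift 5): real line breaks stand in for \#n, so the delimited newline escape rarely needs to be typed.

```swift
// A one-pound multi-line literal: line breaks are real newlines,
// and \d is kept as a literal backslash followed by d.
let pattern = #"""
    \d+ items
    total: \d+
    """#

// The same content as an ordinary single-line literal.
let sameThing = "\\d+ items\ntotal: \\d+"

print(pattern == sameThing)  // true
```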
I looked up the list:
- The escaped special characters \0 (null character), \\ (backslash), \t (horizontal tab), \n (line feed), \r (carriage return), \" (double quotation mark) and \' (single quotation mark)
- An arbitrary Unicode scalar, written as \u{n}, where n is a 1–8 digit hexadecimal number with a value equal to a valid Unicode code point (Unicode is discussed in Unicode below)
None of which seem like they would be common in strings with custom delimiters besides \t and \n, since the custom delimiters already handle the \\ and \" cases and you can just write Unicode in source code directly. The one oddity on this list is \', which on testing seems to do nothing different from ' currently, i.e. they both just make an apostrophe/single quote. I'm not sure what the history is there, but it doesn't seem like a problem for this proposal.
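A quick check of those observations, written as a sketch against Swift 5 behavior (the sample strings are made up for illustration):

```swift
// In an ordinary literal, \' and ' produce the same character.
print("\'" == "'")             // true

// \u{n} names a scalar; typing the character directly is equivalent.
print("\u{41}" == "A")         // true

// With one pound, the tab escape is spelled \#t ...
let tabbed = #"a\#tb"#
print(tabbed == "a\tb")        // true

// ... while quotes and backslashes need no escaping at all.
let quoted = #"say "hi" \ bye"#
print(quoted == "say \"hi\" \\ bye")  // true
```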
The parser parses single-quoted strings as though they are double-quoted strings, but immediately diagnoses an error with a fix-it to turn them into double-quotes; supporting \' is part of that feature. Looking at the revision history, I think this is just to help people who've been writing a lot of Python lately, but obviously it'll come in handy if we ever decide to use single quotes for something.
Toolchain supporting the new unified delimited strings available for testing.
Shows up as a dev snapshot for the 27th and you’ll need to restart Xcode.
Wait, is that supposed to be right or wrong?
(I'm thinking wrong. The "#" in the middle enables interpolation, so the prior "\" should have gone away with the pound and parentheses marks during replacement.)
Looks like a bug in the toolchain. Fixed it and will upload a new version in a couple of hours.
Edit: There were a couple of bugs in the toolchain in the end.
They are fixed now if you download again from the same link.
I am not a fan of mixing interpolation in raw (or cooked?) strings.
I'd much prefer using something like the below, which I'd expect to work.
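import Foundation  // replacingOccurrences(of:with:) comes from Foundation

let myString = #"I am a [<template>] string"#.replacingOccurrences(of: "[<template>]", with: "raw")
// or (renamed so both examples can sit in the same scope)
let myOtherString = #"I am a ##[<template>]## string. "#
    .replacingOccurrences(of: "[<template>]", with: "raw")
    .replacingOccurrences(of: ".", with: "!")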
I think it would help if you would explain your reasoning instead of saying “I am not a fan of mixing interpolation in raw strings” or “custom delimiters should be limited to raw strings”. It's hard to understand why you would prefer to use an ad hoc templating system instead of interpolation, for example, and people will probably not be convinced by statements of opinion without any analysis.
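For comparison, here are the two approaches side by side. This is a sketch that assumes the \#(...) interpolation form from the proposal; the `kind` constant is made up for illustration.

```swift
import Foundation  // for replacingOccurrences(of:with:)

let kind = "raw"

// Ad hoc templating: pick a placeholder and substitute it afterwards.
let templated = #"I am a [<template>] string"#
    .replacingOccurrences(of: "[<template>]", with: kind)

// Delimited interpolation: the escape carries the same pound as the literal.
let interpolated = #"I am a \#(kind) string"#

print(interpolated)               // I am a raw string
print(templated == interpolated)  // true
```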
I’m glad you asked.
I care about this proposal and I want to see it pass the review phase. I worry that jamming more features into raw strings just makes it less focused and thus less likely to be accepted.
I love the idea of using a different number of #’s as a delimiter for raw strings, but calling that feature “custom delimiter” is probably not the right choice.
A raw string uses #’s as a delimiter.
- Why do we have to extend that to regular strings? As in, trying to support escaping and interpolation.
- Why can’t we just have #’s depict pure raw strings?
I don’t like the idea because:
- I think it needlessly complicates a very simple feature (raw strings) by forcing interpolation to work.
- The moment you add interpolation, you don’t have a raw string. From a purist standpoint, I just feel like it completely misses the point.
Otherwise I love the proposal.
I argue that, if you only have 1-3 'levels' of raw string, ## is not much more visually distinct than #3. The moment that you have four or more levels… you have the problem that I dislike so much of "is that five or six characters?" ##### vs ###### vs ##### vs #### vs ####### is just annoying.
Four # seems to be the limit of easy readability, in my opinion.
Basically, this would put us back to the "allow arbitrary delimiter strings" situation:
When #3 is valid, #3141592 should be valid as well... and so should #?sharp or any (at least most) other combinations.
Actually, I don't think the restriction to only allow # is needed, and it would still be allowed to use a delimiter that only contains #.
You could definitely run into trouble when you choose a delimiter that is an existing directive, but if someone desperately wants to shoot themselves in the foot, we can't stop them anyway.
I was actually arguing for integers of arbitrary length. If you get up to 3141592 levels, your readability has still improved by some amount…
This revision feels like fewer changes, fewer features, and gives me both the expressive power I was looking for plus a huge bonus feature that saves me having to do "replace occurrences". It also fits pretty naturally into the language if you look at the diffs. I spent a lot of yesterday coding with it and it feels natural and gets the job done beautifully.
One pro tip: Paste code into Xcode first so it indents right and then add quotes around it.
Unless you're doing something really weird I don't think people ever need more than one level.
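As a sketch of what "something really weird" might look like (assuming Swift 5 rules, with made-up sample content): one pound covers ordinary quotes and backslashes, and a second is only needed once the content itself contains a quote followed by a pound.

```swift
// One level is enough for quotes and backslashes.
let oneLevel = #"C:\path\to "quoted" file"#

// Content that contains "# would end a one-pound literal early,
// so it needs a second pound.
let twoLevels = ##"let s = #"raw"# // nested"##

print(oneLevel == "C:\\path\\to \"quoted\" file")   // true
print(twoLevels == "let s = #\"raw\"# // nested")   // true
```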