SE-0200: Enhancing String Literals Delimiters to Support Raw Text

  • What is your evaluation of the proposal?
    +1
    The only thing I'm not 100% for is the particular shade of "#" on this shed, but that's a minor thing that doesn't interfere with usage.

  • Is the problem being addressed significant enough to warrant a change to Swift?
    Yes, more than once I've tried to paste weird stuff into a string while testing my code and it broke. Using raw strings would've helped, and there is no alternative to them right now.

  • Does this proposal fit well with the feel and direction of Swift?
    Yes

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?
    I've followed threads from the start

1 Like

I'm in favor of this proposal; I've written enough code-gen in Swift recently that not having a raw string format is a frustrating experience of chasing the right number of backslashes. I'm also glad that raw strings support interpolation, since otherwise the feature would be a lot less useful.
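As a rough illustration of the backslash-chasing described above, here is a small code-gen sketch that emits a line of Swift source containing a regex pattern. The `name` value and the pattern are made up for the example; only the `#"..."#` delimiters and the `\#(...)` interpolation form come from the proposal.

```swift
// Hypothetical code-gen snippet: emit a Swift line that declares a regex pattern.
let name = "digits"

// Regular literal: every backslash and quote destined for the output is escaped.
let escaped = "let \(name) = \"\\\\d+\\\\.\\\\d+\""

// Proposed delimiters: backslashes and quotes are literal, and \#(...) still interpolates.
let raw = #"let \#(name) = "\\d+\\.\\d+""#

// Both literals describe the same generated line.
assert(raw == escaped)
print(raw)
```

The generated text is identical either way; the difference is only in how much of the literal has to be escaped by hand.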

Syntax-wise, the # sigil is much nicer than some of the heavier options that were discussed like @raw. However, looking back through one of the previous discussion threads, I even more strongly prefer @beccadax's suggestion of `"..."` (backtick-wrapped quotes) as a delimiter because it's even less visually noisy:

let foo = `"raw string"`
let bar = ``"raw string with `" inside it"``
let baz = `"here's an interpolation \`(foo)"`

Brent, since you're a coäuthor on the proposal, is there a reason that the # was chosen over this one? I don't see it discussed in the proposal text beyond the general statement that backticks would clash with escaped identifiers, but as you point out in the previous thread, there shouldn't be a collision if they are immediately followed/preceded by double-quotes or other backticks.

If the intention is that you want raw strings to stand out more for easy identifiability than backtick-quote might perhaps allow, then that's a reasonable stance, but it would still be good to know if it was considered and have that reasoning noted somewhere.

That one preference aside, I'd also be happy with # as a delimiter if that's what we end up going with in the final implementation.

3 Likes

Ah, yes, there’s the crux of it: existing heredoc syntaxes in other languages offer to disable escaping/interpolation entirely, but don’t offer to change the escape sequence. This proposal hits two birds with one stone: arbitrary string terminator and arbitrary escaping. And given the latter, disabling escaping altogether really isn't necessary. Thanks for walking me through this.

I don't think I'll have time for a longer writeup, so I'll do a quick one: +1. Important problem, good solution.

And my compliments to the authors on a particularly well-composed writeup!

1 Like

+0.9

I share the sentiment that calling this a "raw string" is confusing and misleading, though I totally agree that the solution itself is elegant.

Still, it feels to me like this solution favors elegance over practicality. I'd imagine that in practice, most of the time, people want a "raw string" feature for the simplest case, avoiding \\ or \", and for that I think single quotes are the best solution.

let path = 'c:\nova\register\table'
let regex = '\.\d+'
let info = 'Escape sequences: \n, \t, \r.'

This is far better than what this solution is proposing, isn't it?
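For comparison, here is a sketch of the same three literals written with the delimiters the proposal describes, checked against their conventionally escaped spellings. The variable names simply mirror the single-quote examples above.

```swift
// The three single-quote examples, rewritten with the proposed #"..."# delimiters.
let path = #"c:\nova\register\table"#
let regex = #"\.\d+"#
let info = #"Escape sequences: \n, \t, \r."#

// The same values spelled with ordinary escaping.
assert(path == "c:\\nova\\register\\table")
assert(regex == "\\.\\d+")
assert(info == "Escape sequences: \\n, \\t, \\r.")
print(path)
```

Each literal costs three extra characters compared with the single-quote form.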

I see that "Alternatives considered" gives the reason for excluding single quotes:

If we use single quotes for raw strings, we cannot use them for character literals or any other future proposal. We see no need to burn single quotes on this feature.

but as ExpressibleByExtendedGraphemeClusterLiteral already covers character literals nicely, I see no strong motive for adding single quotes just for character literals in the future. I wouldn't be surprised if single quotes end up being left untouched entirely, and we end up writing a great many #"simple\ncase"# literals that could have been 'simple\ncase'.

I agree that this solution is a total win and extremely useful for some complicated cases, and I support it 100%. But I'd like to remove the "raw string" part and leave us the chance of using single quotes for that.

2 Likes

Again speaking only for myself:

IIRC, when we wrote out a bunch of examples of both syntaxes, we found the # versions much easier to visually parse than the ` ones. I think that was before we came up with the escaping stuff, but I also find \`(...) much more awkward-looking than \#(...).

But I do think it's technically workable; if the core team objects to using # for some reason, they could consider accepting the proposal but changing # to `.

Single-quote literals would do something subtly different from the character literal syntax. The latter is a literal which creates an instance representing a character; single-quote literals would create an instance representing the integer value of a character. That is, single-quotes would be a form of integer literal and would go through ExpressibleByIntegerLiteral. You'd use this for things like byte-by-byte parsing.

To be honest, I don't personally find this use case extremely compelling, but a lot of people disagree with me on that, and using #" instead of ' gives us so much flexibility and power that I'm happy to leave that syntax for them.

I think that #"..."#'s semantics are so close to those of a "truly raw string" that we don't need such a feature—it will cover those use cases adequately. But this proposal doesn't actually foreclose that possibility in the future.

6 Likes

I was put off by the #"..."# syntax at first, but I was completely turned around by the idea that this allows us to not add raw strings to the language as a different kind of string literal. We already have the difference between regular and multiline string literals, and I was prepared to have the full matrix of "single-line raw" and "multiline raw" literals. But this lets us avoid that. From the proposal:

String Start Delimiter    Escape Delimiter    String End Delimiter
"                         \                   "
#"                        \#                  "#
##"                       \##                 "##
######"                   \######             "######

I found this table extremely compelling. Swift has one kind of single-line string. You start with 0 or more pound signs to describe how escaping works within the string, then you write the string literal, then you close with the same number of pound signs to allow embedded quotes. There's no such thing as a "raw string", and that keeps the language a little simpler than if "string literals with one or more pound signs" were different from "string literals with zero pound signs".
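A minimal sketch of that reading of the table: the same value written with zero, one, and two pound signs, where only the escape and terminator spellings change. The variable names are illustrative only.

```swift
let name = "Swift"

// Zero pounds: \ escapes, and a bare " terminates (so an embedded quote needs \").
let zero = "tab:\t quote:\" value:\(name)"

// One pound: escapes are spelled \#..., and only "# terminates.
let one = #"tab:\#t quote:" value:\#(name)"#

// Two pounds: escapes are spelled \##..., and only "## terminates.
let two = ##"tab:\##t quote:" value:\##(name)"##

assert(zero == one && one == two)

// With two pounds, \n, \# and "# are all plain text inside the literal.
let embedded = ##"a "# b \n \# c"##
assert(embedded == "a \"# b \\n \\# c")
```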

Complexity aside, I also think it's good in general to allow escapes even in "raw"-ish contexts. With shell-style '…' or C++-style r"…" you always end up with some scenario where you need to break out after all, and then you either need to use multiple literals or go through and carefully escape everything. This wouldn't even have to be about interpolation or quote characters; it could just be a Unicode escape in a file that's supposed to stay ASCII-compatible for whatever reason.
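To make the last point concrete, here is a hedged sketch of a mostly-literal pattern that still needs one Unicode escape so the source file can stay ASCII. The pattern itself is invented for the example.

```swift
// Backslashes are literal inside #"..."#, but \#u{E9} still produces "é",
// so there is no need to split the literal or escape everything else.
let pattern = #"caf\#u{E9}\s+\d+"#

// The same value with conventional escaping.
assert(pattern == "caf\u{E9}\\s+\\d+")
print(pattern)
```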

I think # is as good a sigil as we're going to get, and it matches #selector, so I'm fine with that too. EDIT: I expect that the common case will be #"…"# and that occasionally we'll dip to ##"…"## or even ###"…"### when talking about C preprocessor hacks, but that those last two will be so rare that it's not worth adding custom delimiter functionality.

(I also think whatever syntax we pick, we'll all get used to it in six months like we did with """.)

16 Likes

Unless I skimmed too many lines, the proposal seems a bit unclear on nesting string delimiters.

Main information:

And

Are the following acceptable ways to generate "######" and ###""###?

#" "######" "#
#" ###""### "#

(none of the text inside single-pound-quote quote-single-pound has the appropriate number of pounds)

The proposal also doesn't mention using quoted strings inside interpolation. Will the following be acceptable?

#" "\#("######")" #
#" \#(##"###""###"##) "#

(That's a lot of pounds, not sure if there's real need for it, but with proper nesting handling it should be doable)

The text you quoted answers your question, albeit tersely. As I read the proposal, the strings in your two examples terminate at the red stop signs:

#" "#🛑#####" "#
#" ###""#🛑## "#

Yes, this is legal. The text inside the \#(…) interpolation is a separate Swift expression, parsed the same as if it were not inside an interpolated string.

No, this is not legal, because the inner string terminates here:

#" \#(##"###""##🛑#"##) "#

Proposal authors, did I get it right?

2 Likes

That looks right to me.

Why isn't this

#" \#(##"###""#🛑##"##) "#

?

As for the other interpolation example, the parser seems to be able to extract the \#(##"###""###"##) for nested parsing; inside this nested parse, the raw quotes are ##" and "##. Note that the following currently works, with no misquoting, and generates a single pound:

"\("#")"
1 Like

The lexer knows that when it's inside an interpolation, it needs to find the end of the interpolation before it can find the terminator. So inside the interpolation, it's not even looking for the "# that would terminate the outer string; it's looking for a "## to terminate the inner string.

(At least it should; I haven't personally tried this with the implementation.)
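A small sketch of the behavior described above, assuming the lexer works as stated: a two-pound literal nested inside the interpolation of a one-pound literal, where the inner text contains a "# that would otherwise end the outer string. The names `inner`, `outer`, and `pound` are illustrative.

```swift
// The inner literal uses two pounds, so the "# in the middle is plain text.
let inner = ##"x"#y"##

// Inside \#(...), the lexer looks for the inner "## before it looks for the outer "#.
let outer = #"[\#(##"x"#y"##)]"#
assert(outer == "[" + inner + "]")

// The regular-literal case mentioned earlier in the thread: a "#" string inside \( ).
let pound = "\("#")"
assert(pound == "#")
print(outer)
```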

1 Like

Here's what my current build (a little out of date) does. The second " on line 6 wasn't needed, or maybe I should have slapped another " on the end.

[Screenshot of the build's output; image not preserved]

1 Like

The proposal text was imprecise enough to allow one to dream.

Too bad, the red stop sign comes too early for my taste. Is there any particular reason why the parsing would stop on the first match instead of continuing until an exact match (or an end-of-line/file for single/triple quote) is found? Does it make parsing simpler?

I didn't notice the interpolation indicator.

I'm also not super thrilled with the use of #, but that's only because I'd like to reserve it for macro'y things. I agree with you though that ` doesn't feel right here because it means something else already (emotionally to humans, this is not an implementation constraint).

That said, there is lots of other punctuation available, and I don't see anything that immediately looks obvious for this use-case, but if someone else does and feels passionately about it, it would be great for them to make the case for a different sigil.

-Chris

5 Likes

I would also like to save # for macro stuff. I can only think of two reasonable alternatives: ~ and @.

1 Like

I had the same concern re #, but on the other hand, isn't the interpolated string literal something like a macro, some kind of compile-time transform? E.g., "name: \(name)" -> "name: " + name
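As an illustration of the equivalence being suggested, the two spellings below produce the same value. This only compares results; it is not a claim about how the compiler actually lowers interpolation.

```swift
let name = "Ada"

// Interpolated form and the hand-written concatenation it is being compared to.
let interpolated = "name: \(name)"
let concatenated = "name: " + name

assert(interpolated == concatenated)
print(interpolated)
```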

1 Like

To be fair, the current behaviour of backticks is also to escape an otherwise-reserved identifier. So it wouldn’t be entirely out of place.

3 Likes

I’d be disappointed to see a switch to ` as the delimiter at the last minute. In practice, using a toolchain:

You very quickly get used to it, and I don’t believe using # would preclude its use for macros, in the same way that it lives alongside existing constructs such as #selector. Using a quote character will also confuse external editors. There may be other alternative characters to consider, but few have the desirable heft of #.

2 Likes