Pure Bikeshedding: Raw Strings (why yes, again!)

You can make a case that raw strings have a greater claim on ' than char literals, and I would disagree, but that’s not the main problem. Given that custom delimiters are a requirement, changing the quote character does not solve the problem. What is your response to this:

In the end you need an introducer inside or outside the string, and if you’re going to do that, why not just multi-purpose ""?

What about using Markdown-style code block syntax (```):

let string = ```
Example of a Raw String: "C:\DEMO"
```

let string = ```C:\DEMO```

They are clean and easy to understand.

Well, we really want two different things: an alternate delimiter and a raw literal. '' can sometimes act as an alternate delimiter when a string contains " but not ' and it doesn't matter whether the contents are raw or "cooked", but it's not primarily an alternate delimiter feature, any more than multiline string literals are.

My current thought on alternate delimiters is that we should put ` before the opening quote and after the closing one:

print(`""No Raw Loops" --Dave Abrahams's coworker"`)

If you have a backtick in your string, you can use two backticks on each end, etc. You can also use `'...'` for a raw string with alternate delimiters, `"""..."""` for a multiline string, `'''...'''` for a raw multiline string, etc.

This would not conflict with any currently valid syntax—neither ", ', nor ` is valid immediately following a backtick, and there isn't even much weird "let's diagnose some common invalid code" logic in the backtick handling, so it's a truly unused corner of the language. I don't think there's any direct precedent for this syntax in other languages, but it's somewhat similar to our use of backticks to escape identifiers which match keywords.

Edit to show all permutations of these features (well, I can't show every alternate delimiter, they're infinite):

// "Cooked" literal
print("C:\\AUTOEXEC.BAT")
// Raw literal
print('C:\AUTOEXEC.BAT')

// Alternate "cooked" literal
print(`"print("C:\\\\AUTOEXEC.BAT")"`)
// Alternate raw literal
print(`'print('C:\AUTOEXEC.BAT')'`)

// Alternate (2) "cooked" literal
print(``"print(`"print("C:\\\\\\\\AUTOEXEC.BAT")"`)"``)
// Alternate (2) raw literal
print(``'print(`'print('C:\AUTOEXEC.BAT')'`)'``)

// Alternate 3+ omitted to prevent running out of memory and/or author lifespan

// "Cooked" multiline literal
print("""
        print("C:\\\\AUTOEXEC.BAT")
        print('C:\\AUTOEXEC.BAT')
        print(`"print("C:\\\\\\\\AUTOEXEC.BAT")"`)
        print(`'print('C:\\AUTOEXEC.BAT')'`)
        """)
// Raw multiline literal:
print('''
        print("C:\\AUTOEXEC.BAT")
        print('C:\AUTOEXEC.BAT')
        print(`"print("C:\\\\AUTOEXEC.BAT")"`)
        print(`'print('C:\AUTOEXEC.BAT')'`)
        ''')

// Alternate "cooked" multiline literal
print(`"""
         print("""
                print("C:\\\\\\\\AUTOEXEC.BAT")
                print('C:\\\\AUTOEXEC.BAT')
                print(`"print("C:\\\\\\\\\\\\\\\\AUTOEXEC.BAT")"`)
                print(`'print('C:\\\\AUTOEXEC.BAT')'`)
                """)
        """`)
// Alternate raw multiline string literal:
print(`'''
         print('''
                print("C:\\AUTOEXEC.BAT")
                print('C:\AUTOEXEC.BAT')
                print(`"print("C:\\\\AUTOEXEC.BAT")"`)
                print(`'print('C:\AUTOEXEC.BAT')'`)
                ''')
        '''`)

// Alternate (2) "cooked" multiline literal
print(``"""
          print(`"""
                  print("""
                         print("C:\\\\\\\\\\\\\\\\AUTOEXEC.BAT")
                         print('C:\\\\\\\\AUTOEXEC.BAT')
                         print(`"print("C:\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\AUTOEXEC.BAT")"`)
                         print(`'print('C:\\\\\\\\AUTOEXEC.BAT')'`)
                         """)
                 """`)
         """``)
// Alternate (2) raw multiline string literal:
print(``'''
          print(`'''
                  print('''
                         print("C:\\AUTOEXEC.BAT")
                         print('C:\AUTOEXEC.BAT')
                         print(`"print("C:\\\\AUTOEXEC.BAT")"`)
                         print(`'print('C:\AUTOEXEC.BAT')'`)
                         ''')
                 '''`)
         '''``)

// Alternate 3+ omitted to prevent running out of memory and/or author lifespan, again
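
The growth of backslashes in the "cooked" examples above follows a simple pattern: each extra level of nesting doubles them, while the raw versions stay unchanged. A minimal sketch of that arithmetic in today's Swift, where `escapedForCookedLiteral` and `backslashCount` are hypothetical helpers invented for illustration and quotes are assumed to be handled by the alternate delimiters:

```swift
// Hypothetical helper: re-escapes text so it can sit inside a "cooked" literal.
// Only backslashes are doubled; embedded quotes are assumed to be covered by
// alternate delimiters, as in the examples above.
func escapedForCookedLiteral(_ text: String) -> String {
    var result = ""
    for character in text {
        if character == "\\" {
            result += "\\\\"
        } else {
            result.append(character)
        }
    }
    return result
}

// Hypothetical helper: counts the backslash characters in a string.
func backslashCount(_ text: String) -> Int {
    return text.filter { $0 == "\\" }.count
}

// The actual text contains a single backslash (written escaped here because
// this is an ordinary "cooked" literal).
let path = "C:\\AUTOEXEC.BAT"

var cooked = path
var counts: [Int] = []
for _ in 0..<4 {
    counts.append(backslashCount(cooked))
    cooked = escapedForCookedLiteral(cooked)
}
print(counts)
```

The first count is the text itself, and each later count is what a cooked literal needs at one more level of nesting.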

You want to use ' and `?!? I don’t understand your example.

Is that a raw string, or a cooked one where you can use escaping so that special delimiters are not required? Alternate delimiters as a requirement are unique to raw strings, where normal escaping is not possible. As such, a couple of the solutions put forward have used this to say that if there is a custom delimiter then it is a raw string, for example ###"a string"###, which is fine.

Just too complicated. The elegance of the Rust approach, a particular character repeated outside the string with a prefix of “r” or “#raw” or none at all (see above), is that it is simple and easy to grok and document. Otherwise you can’t be certain you have a solution for all conceivable cases.

Yes, I want to use both. I think we should have three orthogonal features:

  1. Raw literals: If you delimit your string literal with ' instead of ", it treats backslashes within the literal as backslash characters, not as escape sequences or interpolations.

  2. Multiline literals: If you triple the delimiter, you must place a newline before and after the literal's content; newlines within the content will be preserved, and leading whitespace will be stripped to match the whitespace before the closing delimiter.

  3. Alternate delimiters: If you put one or more backticks before the opening delimiter, you must put the same number of backticks after the closing delimiter; instances of the delimiter with fewer than the required number of backticks will be treated as part of the literal's contents.
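
To make the third rule concrete, here is a rough sketch in today's Swift of how a scanner might find the end of such a literal. The function name `contentsOfAlternateLiteral` is hypothetical, the sketch handles only single-line literals, and it ignores backslash escapes entirely, which a real lexer for "cooked" literals could not do:

```swift
/// Hypothetical scanner for the alternate-delimiter rule described above.
/// Returns the literal's contents, or nil if no matching closer is found.
/// Assumptions: single-line literals only, and no escape processing.
func contentsOfAlternateLiteral(_ source: String) -> String? {
    let chars = Array(source)

    // Count the backticks in front of the opening quote.
    var index = 0
    while index < chars.count && chars[index] == "`" {
        index += 1
    }
    let backtickCount = index

    // The opening quote must be " or '.
    guard index < chars.count, chars[index] == "\"" || chars[index] == "'" else {
        return nil
    }
    let quote = chars[index]
    let contentStart = index + 1

    // The closer is the same quote followed by the same number of backticks.
    let closer = [quote] + Array(repeating: Character("`"), count: backtickCount)

    // Take the first position where the full closer appears; a quote followed
    // by fewer backticks is skipped and stays part of the contents.
    var position = contentStart
    while position + closer.count <= chars.count {
        if Array(chars[position..<(position + closer.count)]) == closer {
            return String(chars[contentStart..<position])
        }
        position += 1
    }
    return nil
}

// One backtick on each end: plain quotes inside are just content.
let oneLevel = contentsOfAlternateLiteral("`\"a \"quoted\" word\"`")
// Two backticks on each end: a one-backtick literal can be nested inside.
let twoLevels = contentsOfAlternateLiteral("``'print(`'C:\\DEMO'`)'``")
print(oneLevel ?? "nil", twoLevels ?? "nil")
```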

You can use these three features in any combination, so they provide a ton of flexibility. Usually you won't need all of them, but when you do, they'll be there. And when you want to write code that contains code, you always have good options which won't require you to mangle your string content more than necessary.

I don't think this is hard to explain to users; honestly, I think it'd be harder to explain why, for instance, alternate delimiters are only available on raw strings. Python has an alternate delimiter, single-line and multi-line strings, and option characters for things like raw, byte, and even interpolation, and users don't seem to find it too complicated to handle.

(Random thought: Maybe the ` should affect the escape character too? That is, in a `"..."` string, you have to write `\n. That would help when you're generating code with backslashes in it, but you want to interpolate content or escape things. It'd definitely be more complicated, though—might be a bridge too far.)

Alternate delimiters are only a requirement on raw strings, but they're nice to have on regular strings too. There's a reason every major scripting language has at least one alternate string literal delimiter with no semantic differences, and the one with the best reputation for string handling has the most sophisticated string literal delimiting features. If you have these features, people will occasionally be very grateful they're there.


Wait, since when?

I don’t agree with it personally, since multiline raw syntax is so distinctive, but it's generally considered to be something we have to cater for. What did I say about a “distracting dimension of discourse” the first time around?

Generally considered by whom? I recall posting a survey of other languages which demonstrated that raw strings and custom delimiters were not conjoined features; some (many?) languages support one but not the other. At what point did it become a given in this conversation that we are designing two features at the same time? To me it seems that at least the proponents of single quotes and backticks do not agree that it's a given.

If we can't agree on what shed we're painting, then I'd suggest that it's time to take a step back before trying to pick out a paint color.

I think this speaks to the core team's exhortation to flesh out motivating use cases. Overall, the process has felt like it's proceeded backwards: it was identified that "raw strings" were something to be added, and now we're trying to figure out what problems we want to solve with raw strings while also figuring out how we want to design them. I'd imagine we'd arrive at a design more organically if we started the other way: here are some problems x, y, and z; why are raw strings the best solution, and how then to design raw strings to best address them?

It’s in trying to implement custom delimiters that the wheels fall off the alternate quote implementations. I’m just trying to wrangle the constraints thrown up along the way; you can see where if you look back through the thread(s). If you ask me, we’re close to having an array of possible syntaxes that satisfy the constraints I listed above and just need to decide between them, but I don’t view the ' or ` implementations as being among them.

I would disagree, to a limited extent, that you need to know what you're going to use it for before you can design the feature. @Erica_Sadun has listed the motivating cases anyway. For me it is a purely technical problem: can you come up with a syntax that is not “ugly” and that can represent any possible string in the manner desired without being able to escape?

Just throwing this out there: German-style quotation marks:

≪"What is this 'thing'?", he asked.≫

(Have we discounted unicode?)

Right, and I'm seriously asking by whom this particular constraint was thrown up, and wondering whether the constraint has been sufficiently motivated by those throwing it up such that we should satisfy it.

To show my cards, I'm in agreement with Brent that custom delimiters are an orthogonal feature, and on that basis I consider it an anti-goal to roll it into this proposal.


One of the major motivations for adding raw strings is code generation. But raw strings aren't very useful for that if you can't embed single quotes inside them somehow, and even multiline raw strings aren't enough if you want to generate Swift code containing multiline raw string literals.

I agree with you up to here completely.

and even multiline raw strings aren't enough if you want to generate Swift code containing multiline raw string literals.

...and here I think we need to take stock. Should several designs for raw strings be discarded because they fail to account for the specific use case of (a) code generation, (b) of Swift code, (c) containing multiline raw string literals? What are the use cases for these multiline raw string literals in Swift embedded inside multiline raw string literals in Swift?

Using indentation might help out there, even without custom delimiters.

Consider the following:

print(#raw"""
____print(#raw"""
________a string
________""")
____""")

Something could probably be implemented that inhibited the fourth line from closing the first line prematurely.

This trick works for that case, although it would likely be confusing, and I don't think it allows more than one ' on either end. This point above is relevant:

Then an indentation error would cause a runaway multiline string literal. Imagine if you added one extra column of indentation to the closing delimiter in your example; the compiler would reject that as a valid delimiter and keep looking. Three hundred lines later, it might finally find a delimiter that was as unindented as all of the lines above it and happily assume that the code was valid.
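
The failure mode can be sketched in today's Swift. This is only one possible reading of the indentation idea, not anything actually proposed in detail: a line starting with """ closes the literal only if no content line above it is less indented. `indentation(of:)` and `closingLineIndex(in:openedAt:)` are hypothetical names, and the #raw""" text is just sample input taken from the example above:

```swift
// Number of leading spaces on a line.
func indentation(of line: String) -> Int {
    return line.prefix(while: { $0 == " " }).count
}

// Hypothetical rule: starting after the opening line, a line whose first
// non-space characters are """ closes the literal only if it is indented no
// further than every content line seen so far. Otherwise it is treated as
// content and the search continues.
func closingLineIndex(in lines: [String], openedAt openIndex: Int) -> Int? {
    var smallestIndent = Int.max
    var index = openIndex + 1
    while index < lines.count {
        let line = lines[index]
        // Blank lines do not take part in the indentation check.
        if line.allSatisfy({ $0 == " " }) {
            index += 1
            continue
        }
        let indent = indentation(of: line)
        let isDelimiterLine = String(line.dropFirst(indent)).hasPrefix("\"\"\"")
        if isDelimiterLine && indent <= smallestIndent {
            return index
        }
        smallestIndent = min(smallestIndent, indent)
        index += 1
    }
    return nil
}

let wellFormed = [
    "print(#raw\"\"\"",
    "    print(#raw\"\"\"",
    "        a string",
    "        \"\"\")",
    "    \"\"\")",
    "let next = 1",
]

// Same code, but the outer closing delimiter has one extra column of indentation.
let misindented = [
    "print(#raw\"\"\"",
    "    print(#raw\"\"\"",
    "        a string",
    "        \"\"\")",
    "     \"\"\")",
    "let next = 1",
    "let other = \"\"\"",
    "text",
    "\"\"\"",
]

let goodClose = closingLineIndex(in: wellFormed, openedAt: 0)
let runawayClose = closingLineIndex(in: misindented, openedAt: 0)
print(goodClose ?? -1, runawayClose ?? -1)
```

In the mis-indented input the intended closer is rejected, and the literal silently swallows the following code until an unrelated delimiter at column zero happens to satisfy the rule.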

@taylorswift and I have a somewhat improved version of this proposal.

I would really like to see this happen, I just haven't had time to push it forward lately. I think that this is a much much better use for single quotes than raw string literals.

IMO, if we ever support them, raw string literals in Swift should be verbose/explicit because they are infrequently used and are merely redundant syntax for already existing ways of doing things (multiline strings with escapes). Mixing them into single quotes seems like the wrong encoding.

-Chris


i think i’ve used characters as integer scalars hundreds of times while i have been in a situation where a raw string would be useful maybe two or three times. also, typing a backslash is way easier than typing Int8(bitPattern: UInt8(ascii: "e"))
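
For reference, that spelling written out as a small runnable snippet. Both initializers are existing standard library APIs; the constant names are only for illustration:

```swift
// UInt8(ascii:) takes a Unicode scalar and traps if it is not ASCII.
let byte = UInt8(ascii: "e")

// Int8(bitPattern:) reinterprets the same bits as a signed value.
let signed = Int8(bitPattern: byte)

print(byte, signed)
```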


i don’t have a german keyboard


You shouldn't actually need one, but << and >> would be alternatives if you want to stick with ASCII.