Pure Bikeshedding: Raw Strings (why yes, again!)

(Tino) #75

Actually, I can't remember any real argument against it. Yes, they look similar to double quotes, but hey, they are similar - and I don't think you'll need more than three in a row often, so it's easy to see the difference unless somebody actively wants to cause confusion.
Single quotes look fine, serve a similar purpose in shell scripts, and, last but not least

We'd rather save single quoted literals for a greater purpose (e.g. non-escaped string literals).

let simple = 'This doesn't crash: \(1 / 0 as Int)'
let confusion = '''Two quotes: '''''
let workaround = "'''" + 'This doesn't crash: \(1 / 0 as Int)' + " and contains single quotes as prefix"
let multiline = 'Maybe newlines
will be allowed as well in raw strings?'
let broken = '''''
'''With the established concept for multiline strings, we could include single quotes at the beginning as well'''

So, what's wrong with single quotes?

(John Holdsworth) #76

I see your link and raise you Prepitch: Character integer literals. Regardless of which route you go down, the point is the same: you can only claim a character once, and I’d be disappointed to see a whole character, ‘, squandered on something as niche as raw strings.

There are many other problems with using ' and ` that have just not been thought through. Using repetitions of the quote character will confuse external editors, mean you can no longer detect multiline raw strings, and create problems when the intended string starts with the quote character. Those problems are not solved by adding more rules such as requiring an odd number: what if you want two quote characters at the start of the string? The more general statement is that the extra delimiter really needs to be a different character from your quote character, at the least, such as ##”a string”## or \\”a string”\\, which isn’t all bad. I'd keep “” for strings, with some other indicator or prefix to engage special treatment of its contents, rather than add something so syntactically distinct to the language as ‘.
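To make the ambiguity concrete, here is a minimal sketch that models a "repeat the quote character" scheme. The function name `possibleSplits(openingRun:)` and the model itself are assumptions made for illustration, not part of any proposal in this thread. Given the run of quote characters that opens a literal, it lists every way to read that run as delimiter plus payload.

```swift
// Hypothetical model: `openingRun` is the number of consecutive quote
// characters found at the start of a literal. Each result says how many of
// them are read as the delimiter and how many are left over as payload.
func possibleSplits(openingRun: Int) -> [(delimiter: Int, payload: Int)] {
    precondition(openingRun >= 1, "a literal needs at least one quote")
    return (1...openingRun).map { (delimiter: $0, payload: openingRun - $0) }
}

// Five quotes in a row, as in the `broken` example earlier in the thread.
let all = possibleSplits(openingRun: 5)

// Even with an "odd number of quotes only" rule, several readings remain.
let oddOnly = all.filter { $0.delimiter % 2 == 1 }
for split in oddOnly {
    print("delimiter: \(split.delimiter) quotes, payload starts with \(split.payload) quotes")
}
```

With a distinct extra delimiter character such as #, the run of # ends at the quote, so under this model there is only one reading.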


Here’s a thought for consideration:

Those of us who remember the old days when files had resource forks (and especially those of us who played and modded Escape Velocity :-) are familiar with the idea of storing many strings in a single file. The younger crowd may have done the same thing using plists, json, or xml.

In any event, the principle I’m getting at is, rather than having each raw string in its own file, which could indeed get very messy to deal with, an alternative is to introduce the ability to store many strings in a single file. A simple and well-known format could be used, which allows each string to be given a unique name.

Then that file full of named strings could be included in a Swift project, and its contents accessed by name *at compile time*.

(Erica Sadun) #78

Can we do that? Just add pounds without any starter?

(John Holdsworth) #79

Sure, it’s unambiguous by the time you reach the “ so it can be lexed. Perhaps \”a string”\ speaks more rawstring ahead. The menagerie of possible syntaxes is now:

r”a string”
r##”a string”##
##”a string”##
\”a string”\
\\”a string”\\
#raw”a string”
#raw##”a string”##
#raw(“a string”)
#raw(DELIM”a string”DELIM)
“\qa string”
“DELIM\qa stringDELIM”
“\^a string”
“DELIM\^a stringDELIM”

(Tino) #80

... and the first answer in the thread about single quotes for character literals is

In the end, we might be saving that ' forever ;-)

Agreed. Without a delimiter between "real" delimiter and payload, you can't tell where the payload starts... at the same time, I think the use of different characters is a major reason for ugliness.
So, can we find a separator with minimal ugliness?

let string = ' space isn't ugly, but might be confusing'
let string = '''・This looks humble, but I don't know how to type it'''
let string = '.easy to type, not really beautiful'
let string = ''':How is that?"'
let string = '''/How is that?"'
let string = '''\How is that?"'
let string = '''#How is that?"'
let string = \'We could also use another character that is still available\
let string = \\\'Would be a good fit if we choose slashed for /regex/\\\

Having arbitrary delimiter strings imho adds much ugliness, and in real applications, I don't think the decrease in flexibility is a real issue.

So, are there any dealbreakers for the "repeated marker with separator" family of options?

(Brent Royal-Gordon) #81

I don't think anyone's made a comprehensive case for single-quote yet, at least not in this thread. I'm going to try to make that case now.

1. It's highly precedented.

I looked around for a good list of string literal syntaxes, and Wikipedia's was the best I found. (Granted, that was after I cleaned it up and added a few things.) Here's what they list for "quoted raw" strings, with some niche examples (APL, Pascal, Cobra) removed:

| Syntax | Language(s) | Possible in Swift? |
| --- | --- | --- |
| 'Hello, world!' | Bourne shell, Perl, PHP, Ruby, Windows PowerShell | Yes |
| q(Hello, world!) | Perl (alternate) | Maybe (depends on delimiter) |
| %q(Hello, world!) | Ruby (alternate) | No (% is a valid prefix operator) |
| R"(Hello, world!)" | C++11 | Yes |
| @"Hello, world!" | C#, F# | Yes (but would be awful for Obj-C switchers) |
| r"Hello, world!" | D, Python | Yes |
| `Hello, world!` | D, Go | No (conflicts with escaped identifiers) |
| raw"Hello, world!" | Scala | Yes |

There are two potentially parseable choices which have strong precedents in many languages: single quotes and r/R. We've already rejected the latter, so if we want a syntax that will be familiar to programmers coming from other languages, single quotes are the only option.

Swift is not afraid to blaze new trails, but we don't do it for the hell of it. There's a great option sitting right in front of us. We should take it.

2. A large enough difference in degree can make a difference in kind.

When you're in the shell, what does it take for you to use ' instead of "? For me, it's not much; I'll use ' even if I just can't remember what I'm about to paste in.

And that's great! Even when it only keeps me from writing one backslash, not needing that backslash makes what I'm writing more clear.

The more cumbersome the raw string syntax is, the less often people will use it for small wins. When the syntax is '\n', of course you're going to use raw literals. When it's raw"\n", you're going to think about it. When it's @rawStringLiteral "\n", no way—and that's a small gain in readability lost.

I think it's notable that, of the above languages, only one of them—Scala—requires even three extra characters to get a raw literal, and Scala only does it because it has extensible string literals. The more ceremony we require, the less value users will get out of the feature.
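For a sense of the "small wins" at stake, here is a short sketch of what regular ("cooked") literals require today when the payload contains backslashes. The path and pattern are made-up examples, and `backslashes(in:)` is a hypothetical helper that exists only for this illustration.

```swift
// In a regular ("cooked") literal, every backslash in the payload is doubled.
let path = "C:\\DEMO\\notes.txt"   // payload: C:\DEMO\notes.txt
let pattern = "\\d+\\.\\d+"        // payload: \d+\.\d+

// Hypothetical helper: counts the backslashes that end up in the value.
func backslashes(in text: String) -> Int {
    text.filter { $0 == "\\" }.count
}

print(path)     // the value holds single backslashes
print(pattern)
print(backslashes(in: path), backslashes(in: pattern))
```

The source text carries twice as many backslashes as the values do, which is the noise a lightweight raw syntax would remove.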

3. Character literals don't need a unique syntax.

The strongest reason not to use single quotes is that C and many C-like languages (from Java to Go) use them to get an integer literal equal to the value of a Unicode scalar (or something like that, depending on the age of the language). Swift is only broadly C-like so we don't have to mimic that behavior, but that ability might be desirable.

Nevertheless, I think raw strings need the ' character much more than character literals do, because:

  1. Integer literals based on Unicode scalars are useful in fairly low-level code which bypasses most of our string-handling machinery. This is a pretty specialized use case. Raw literals have much broader applications.

  2. Humans reading code usually don't care. When you see case "$":, you don't care whether that's a string literal or character literal; the switch statement will provide the appropriate context for you.

  3. The lexer and parser have no need to recognize a character literal and treat it differently from a string literal. We can—and already do—recognize when we need a character literal in Sema and transform string literals into them; we can leverage that logic from user space. Four lines of code in the standard library would handle that thread's use case with only a small code penalty:

let hexcodes: [UInt8] =
    ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "a", "b", "c", "d", "e", "f"]

Raw strings do not have the same property. They need a distinctive syntax because everything from the lexer on up needs to treat them differently.

(Note: there may be better designs than ord(_:). For example, a lightweight ASCII type which kept the fact that these are characters in the type system might be better. Hell, we could add an ExpressibleByUnicodeScalarLiteral conformance to UInt8! I'm just saying, solutions for these use cases can come from ordinary Swift code; solutions for raw string literals have to come from the parser.)
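As a rough illustration of the "ordinary Swift code" point, here is one way such a helper could look. Only the name `ord(_:)` comes from the post. The signature, the ASCII-only restriction, and the `digits` array are assumptions made for this sketch.

```swift
// Hypothetical helper: maps an ASCII Unicode scalar to its byte value.
// The signature is an assumption; only the name ord(_:) appears in the post.
func ord(_ scalar: Unicode.Scalar) -> UInt8 {
    precondition(scalar.isASCII, "this sketch only handles ASCII scalars")
    return UInt8(scalar.value)
}

// The same double-quoted literal is typed from context, so no separate
// character-literal syntax is needed at the lexer level.
let dollarString: String = "$"
let dollarCharacter: Character = "$"
let dollarScalar: Unicode.Scalar = "$"

// Hex digit codes built from ordinary double-quoted literals.
let digits: [Unicode.Scalar] =
    ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "a", "b", "c", "d", "e", "f"]
let hexcodes: [UInt8] = digits.map(ord)
print(hexcodes)
```

Everything here is library-level code, with no change to how literals are lexed.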

In short, I don't think the case for character literals is very strong. They should not be an obstacle to using '' for raw strings.

We've spent something like two years debating one token for a raw literal feature. If we had to design an entire separate syntax and semantics for a side file with tables of raw strings, we'd probably all retire before we finished.

(John Holdsworth) #82

You can make a case that raw strings have a greater claim on ‘ than char literals, and I would disagree, but that’s not the main problem. Given that custom delimiters are a requirement, changing the quote character does not solve the problem. What is your response to this:

In the end you need an introducer inside or outside the string, and if you’re going to do that, why not just multi-purpose “”?

(Pedro José Pereira Vieito) #83

What about using Markdown style code block syntax (```):

let string = ```
Example of a Raw String: "C:\DEMO"
```

let string = ```C:\DEMO```

They are clean and easy to understand.

(Brent Royal-Gordon) #84

Well, we really want two different things: an alternate delimiter and a raw literal. '' can sometimes act as an alternate delimiter when a string contains " but not ' and it doesn't matter whether the contents are raw or "cooked", but it's not primarily an alternate delimiter feature, any more than multiline string literals are.

My current thought on alternate delimiters is that we should put ` before the opening quote and after the closing one:

print(`""No Raw Loops" --Dave Abrahams's coworker"`)

If you have a backtick in your string, you can use two backticks on each end, etc. You can also use `'...'` for a raw string with alternate delimiters, `"""..."""` for a multiline string, `'''...'''` for a raw multiline string, etc.

This would not conflict with any currently valid syntax—neither ", ', nor ` is valid immediately following a backtick, and there isn't even much weird "let's diagnose some common invalid code" logic in the backtick handling, so it's a truly unused corner of the language. I don't think there's any direct precedent for this syntax in other languages, but it's somewhat similar to our use of backticks to escape identifiers which match keywords.
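A minimal sketch of how a scanner could find the end of such a literal, under the rule described above: count the opening backticks, then look for the quote followed by the same number of backticks. The function name `literalContents(of:)` is hypothetical, and the sketch ignores escapes, multiline forms, and error reporting.

```swift
// Hypothetical sketch of the backtick-delimiter rule. Returns the text between
// the delimiters, or nil if the input is not a complete literal.
func literalContents(of source: String) -> String? {
    let chars = Array(source)

    // Count the backticks that precede the opening quote.
    var index = 0
    while index < chars.count, chars[index] == "`" { index += 1 }
    let ticks = index

    // The next character must be the quote that opens the literal.
    guard index < chars.count, chars[index] == "\"" || chars[index] == "'" else {
        return nil
    }
    let quote = chars[index]

    // The closing delimiter is the same quote followed by `ticks` backticks.
    let closing = [quote] + Array(repeating: Character("`"), count: ticks)
    let start = index + 1
    var end = start
    while end + closing.count <= chars.count {
        if Array(chars[end..<end + closing.count]) == closing {
            return String(chars[start..<end])
        }
        end += 1
    }
    return nil
}

// The example from the post, written here as an ordinary escaped literal.
let example = "`\"\"No Raw Loops\" --Dave Abrahams's coworker\"`"
print(literalContents(of: example) ?? "unterminated")
```

A quote followed by fewer backticks than the literal opened with is skipped over and stays part of the contents, which is what lets the payload contain unescaped quotes.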

Edit to show all permutations of these features (well, I can't show every alternate delimiter, they're infinite):

// "Cooked" literal
// Raw literal

// Alternate "cooked" literal
// Alternate raw literal

// Alternate (2) "cooked" literal
// Alternate (2) raw literal

// Alternate 3+ omitted to prevent running out of memory and/or author lifespan

// "Cooked" multiline literal
// Raw multiline literal:

// Alternate "cooked" multiline literal
// Alternate raw multiline string literal:

// Alternate (2) "cooked" multiline literal
// Alternate (2) raw multiline string literal:

// Alternate 3+ omitted to prevent running out of memory and/or author lifespan, again

(John Holdsworth) #85

You want to use ‘ and `?!? I don’t understand your example

Is that a raw string, or a cooked one where you can use escaping so special delimiters are not required? Alternate delimiters as a requirement are unique to raw strings, where normal escaping is not possible, and as such a couple of the solutions put forward have used this to say that if there is a custom delimiter then it is a raw string, for example ###”a string”###, which is fine.

Just too complicated. The elegance of the Rust approach of repeating a particular, different character outside a string, with a prefix of “r” or “#raw” or none at all (see above), is that it is simple and easy to grok and document. Otherwise you can’t be certain you have a solution for all conceivable cases.

(Brent Royal-Gordon) #86

Yes, I want to use both. I think we should have three orthogonal features:

  1. Raw literals: If you delimit your string literal with ' instead of ", it treats backslashes within the literal as backslash characters, not as escape sequences or interpolations.

  2. Multiline literals: If you triple the delimiter, you must place a newline before and after the literal's content; newlines within the content will be preserved, and leading whitespace will be stripped to match the whitespace before the closing delimiter.

  3. Alternate delimiters: If you put one or more backticks before the opening delimiter, you must put the same number of backticks after the closing delimiter; instances of the delimiter with fewer than the required number of backticks will be treated as part of the literal's contents.

You can use these three features in any combination, so they provide a ton of flexibility. Usually you won't need all of them, but when you do, they'll be there. And when you want to write code that contains code, you always have good options which won't require you to mangle your string content more than necessary.
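The multiline rule in item 2 matches how Swift's existing triple-double-quote literals already behave, so it can be tried out today. A small sketch, with the generated function as a made-up payload:

```swift
// The closing delimiter is indented by four spaces, so four spaces are
// stripped from the start of every content line. Deeper indentation stays.
let generated = """
    func greet() {
        print("hi")
    }
    """

// The newlines right after the opening delimiter and right before the closing
// one are not part of the value.
print(generated)
```

Double quotes inside the payload need no escaping here, which already covers part of the alternate-delimiter use case for cooked strings.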

I don't think this is hard to explain to users; honestly, I think it'd be harder to explain why, for instance, alternate delimiters are only available on raw strings. Python has an alternate delimiter, single-line and multi-line strings, and option characters for things like raw, byte, and even interpolation, and users don't seem to find it too complicated to handle.

(Random thought: Maybe the ` should affect the escape character too? That is, in a `"..."` string, you have to write `\n. That would help when you're generating code with backslashes in it, but you want to interpolate content or escape things. It'd definitely be more complicated, though—might be a bridge too far.)

Alternate delimiters are only a requirement on raw strings, but they're nice to have on regular strings too. There's a reason every major scripting language has at least one alternate string literal delimiter with no semantic differences, and the one with the best reputation for string handling has the most sophisticated string literal delimiting features. If you have these features, people will occasionally be very grateful they're there.

(Xiaodi Wu) #87

Wait, since when?

(John Holdsworth) #88

I don’t agree with it personally, since multiline raw syntax is so distinctive, but it's generally considered to be something we have to cater for. What did I say about a “distracting dimension of discourse” the first time around?

(Xiaodi Wu) #89

Generally considered by whom? I recall posting a survey of other languages which demonstrated that raw strings and custom delimiters were not conjoined features; some (many?) languages support one but not the other. At what point did it become a given in this conversation that we are designing two features at the same time? To me it seems that at least the proponents of single quotes and backticks do not agree that it's a given.

If we can't agree on what shed we're painting, then I'd suggest that it's time to take a step back before trying to pick out a paint color.

I think this speaks to the core team's exhortation to flesh out motivating use cases. Overall, the process has felt like it's proceeded backwards: it was identified that "raw strings" were something to be added, and now we're trying to figure out what problems we want to solve with raw strings while also figuring out how we want to design them. I'd imagine we'd arrive at a design more organically if we started the other way: here are some problems x, y, and z; why are raw strings the best solution, and how then to design raw strings to best address them?

(John Holdsworth) #90

It’s in trying to implement custom delimiters that the wheels fall off the alternate quote implementations. I’m just trying to wrangle the constraints thrown up along the way. You can see where if you look back through the thread(s). If you ask me, we’re close to having an array of possible syntaxes that satisfy these constraints, which I listed above, and we just need to decide between them, but I don’t view the ‘ or ` implementations as being among them.

I would disagree, to a limited extent, that you need to know what you're going to use it for before you can design the feature. @Erica_Sadun has listed the motivating cases anyway. For me it is a purely technical problem: can you come up with a syntax that is not “ugly” and can represent any possible string in the manner desired without being able to escape?


Just throwing this out there: German-style quotation marks:

≪"What is this 'thing'?", he asked.≫

(Have we discounted unicode?)

(Xiaodi Wu) #92

Right, and I'm seriously asking by whom this particular constraint was thrown up, and wondering whether the constraint has been sufficiently motivated by those throwing it up such that we should satisfy it.

To show my cards, I'm in agreement with Brent that custom delimiters are an orthogonal feature, and on that basis I consider it an anti-goal to roll it into this proposal.

(Brent Royal-Gordon) #93

One of the major motivations for adding raw strings is code generation. But raw strings aren't very useful for that if you can't embed single quotes inside them somehow, and even multiline raw strings aren't enough if you want to generate Swift code containing multiline raw string literals.

(Xiaodi Wu) #94

I agree with you up to here completely.

and even multiline raw strings aren't enough if you want to generate Swift code containing multiline raw string literals.

...and here I think we need to take stock. Should several designs for raw strings be discarded because they fail to account for the specific use case of (a) code generation, (b) of Swift code, (c) containing multiline raw string literals? What are the use cases for these multiline raw string literals in Swift embedded inside multiline raw string literals in Swift?