Pure Bikeshedding: Raw Strings (why yes, again!)

Aesthetics are purely subjective though. The person that prompted me to write what I wrote claimed that the Rust syntax for raw strings is ugly and I simply do not agree. Who's right?

Sorry, but you really need something more substantial than "I think it's ugly" to stop a proposal.

In the last discussion, @Chris_Lattner3 posted a very interesting link:
http://mail.openjdk.java.net/pipermail/amber-spec-experts/2018-March/000446.html
It appeared in a rather late stage, so maybe some people already left and didn't see it yet.
Imho it is a very good analysis of the whole topic, and I think the Java guys are still happy with the result.


On the contrary, dismissing an important aspect of language design as "purely subjective" doesn't carry any weight for me. I'm pretty confident that we'd find substantial consensus that certain proposals for syntax are uglier than others.

In this case, I'd wager that a significant proportion of the community would agree that the Rust syntax for raw strings is lacking in aesthetic appeal; that's certainly a good reason to consider alternatives, especially in a thread about pure bikeshedding.

Aesthetics are subjective, so "I think it's ugly" should only carry weight to the extent that it is a common point of view. My sense is that many people agree that ###"foo"### is ugly; indeed, Swift's syntax seems to consciously use # as a tiny dose of ugliness to call attention to unusual semantics. Several people in this thread seem to be of the opinion that multiple #es is too much ugliness.

I like the way this looks, but ^ is an operator character, so repurposing it would be source-breaking.

Ah, I see. I didn't think of that. I started with just single quotes around the double quotes, but I thought that might be visually confusing depending on the font. That's why I came up with the ^. Here's an example with single quotes:

let swift = '''"Here's a ''"raw string"'' inside a raw string."'''

Perhaps backticks are distinctive enough in most fonts?:

let swift = ```"Here's a ``"raw string"`` inside a raw string."```

It's best to look at the inner raw string, where all quotes are the same color. I've added an extra single quote/backtick in the examples, so it shows two single quotes/backticks next to the double quote in the inner raw string.

If ^ is considered to be a good choice:

let str = "\^path\to\some\file\^"

is still available.

Perhaps it is a good way to call attention to unusual string parsing rules, then.

Backticks are already used for something quite different, why should they also be used for raw strings? There is also still the question of whether other types of string might be required, because this syntax isn't extensible.

I'm not clear what the proposal is here. Is "\^ the opening delimiter and \^" the closing one? Why is it asymmetric? How do I modify the delimiter so I can include a raw string inside a raw string?


If only it would be that easy to write a proposal... ;-)
It's just that it seems we once again steer towards a solution that has already been returned, despite other options which haven't been discussed as much.

The snippet was not a specific suggestion, but rather a way of pointing out other options.
With an escape sequence that is currently not valid, there are endless variations (besides custom delimiters, we could also define another combination that takes the role of \()).

All of the following examples could be chosen to represent raw string without breaking old code:

print("'\^this is \(self)'")
// > this is \(self)
print("#\^this is '\(self)'#")
// > this is '\(self)'

print("'\_this is \(self)'")
print("^\#this is \(self)^")
// > this is \(self)

print("escape:$,delim:^\#this is \(self): $(self)^")
// > this is \(self): <myModule:MyObject: 0x7fd8d04131f0>
print(":$:^\#this is \(self): $(self)^")
// > this is \(self): <myModule:MyObject: 0x7fd8d04131f0>

I'm not sure we're headed for something that was rejected; it was just returned for clarification, though the core team did note r"" seemed to be unpopular.

@Tino's idea from the review thread was always interesting. How about just:

print("CUSTOM_DELIMETER\qmy raw stringCUSTOM_DELIMETER")

I really like the [#]"content"[#] for its scalability and simplicity. If the only thing blocking consensus for it is the # character, may I then suggest using \ instead?

let s = \"here is a "string" with no \(escaping)"\
let swiftsource = \\"let s = \"here is a "string" with no \(escaping)"\"\\

let moresource = \"""
  let s = \"here is a "string" with no \(escaping)"\
  let swiftsource = \\"let s = \"here is a "string" with no \(escaping)"\"\\
  """\

I'm not sure about other people's motivations, but a thing that bugs me with many ideas is that they sprinkle code with special characters like #, ^ or .
I care much less about clutter that is contained inside strings (after all, those are just strings - and if the compiler can ignore their content, I can too ;-), so the "invalid escape sequence concept" puts the mess inside the string.

But others might prefer the exact opposite, so the content shouldn't be damaged more than needed, and letters might be more likely to cause confusion:
I don't know a single example for a word that starts with a "q" and that leaves you with another valid word when you strip that prefix - but I don't know that many languages, so that problem might be real.
Imagine the feature would use "\l", and you encounter something like

let str = "raw\linkColor\blueraw"

Now compare with

let str = "^\^inkColor\blue^"

I think my favorite proposed syntax so far is the multiple single quote wrapped strings i.e:

'''he can't do that'''

This form cannot handle a ' at the beginning or end of a string, but I think this can be solved by requiring an odd number of 's.

I don't think it would be a problem to reuse backticks for this. And the concepts are not that different. In both cases the backticks affect how the enclosed content is parsed.

That's correct and I'm not sure where we stand on this. We could always still use the 'r' or 'raw' prefix, but I kind of like the cleanliness without a prefix.

Perhaps it's something to get used to, but my first thought was that this would be confusing.

Actually, I can't remember any real argument against it. Yes, they look similar to double quotes, but hey, they are similar - and I don't think you'll need more than three in a row often, so it's easy to see the difference unless somebody actively wants to cause confusion.
Single quotes look fine, serve a similar purpose in shell scripts, and, last but not least

We'd rather save single quoted literals for a greater purpose (e.g. non-escaped string literals).
(https://github.com/apple/swift-evolution/blob/master/commonly_proposed.md)

let simple = 'This doesn't crash: \(1 / 0 as Int)'
let confusion = '''Two quotes: '''''
let workaround = "'''" + 'This doesn't crash: \(1 / 0 as Int)' + " and contains single quotes as prefix"
let multiline = 'Maybe newlines
will be allowed as well in raw strings?'
let broken = '''''
'''With the established concept for multiline strings, we could include single quotes at the beginning as well'''
'''''

So, what's wrong with single quotes?


I see your link and raise you "Prepitch: Character integer literals - #25 by allevato". Regardless of which route you go down, the point is the same: you can only claim a character once, and I'd be disappointed to see a whole character ' squandered on something as niche as raw strings.

There are many other problems with using ' and ` that have just not been thought through. Using repetitions of the quote character will confuse external editors, mean you can no longer detect multiline raw strings, and create problems using the quote at the start of the intended string that are not solved by adding more rules such as requiring an odd number: what if you want two quote characters at the start of the string? The more general statement is that you really need a different character for the extra delimiter than your quote character, at the least, such as ##"a string"## or \\"a string"\\, which isn't all bad. I'd keep "" for strings, with some other indicator or prefix to engage special treatment of its contents, rather than add something so syntactically distinct to the language as '.
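To make the ambiguity concrete, here is a minimal sketch (not part of any proposal) that enumerates the ways a literal delimited only by repeated single quotes could be split into delimiter and payload. The function name `candidatePayloads` and its `oddOnly` parameter are hypothetical, introduced purely for illustration.

```swift
/// Hypothetical helper: lists each way to split `literal` into n opening
/// single quotes, a payload, and n closing single quotes (no separator).
func candidatePayloads(of literal: String, oddOnly: Bool = false) -> [String] {
    let chars = Array(literal)
    // Count the runs of single quotes at both ends of the literal.
    let leading = chars.prefix(while: { $0 == "'" }).count
    let trailing = chars.reversed().prefix(while: { $0 == "'" }).count
    // Literals made only of quotes (or without quotes) are out of scope here.
    guard leading > 0, trailing > 0, leading < chars.count else { return [] }

    var payloads: [String] = []
    for n in 1...min(leading, trailing) {
        // Optionally apply the "odd number of quotes" rule from the thread.
        if oddOnly && n % 2 == 0 { continue }
        payloads.append(String(chars[n..<(chars.count - n)]))
    }
    return payloads
}

// Five quotes on each side can be read in five ways, and the
// odd-only rule still leaves three of them.
print(candidatePayloads(of: "'''''x'''''"))
print(candidatePayloads(of: "'''''x'''''", oddOnly: true))
```

With a distinct marker character, as in ##"x"##, the quote itself ends the opening delimiter, so only one split is possible.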


Here's a thought for consideration:

Those of us who remember the old days when files had resource forks (and especially those of us who played and modded Escape Velocity :-) are familiar with the idea of storing many strings in a single file. The younger crowd may have done the same thing using plists, json, or xml.

In any event, the principle I'm getting at is, rather than having each raw string in its own file, which could indeed get very messy to deal with, an alternative is to introduce the ability to store many strings in a single file. A simple and well-known format could be used, which allows each string to be given a unique name.

Then that file full of named strings could be included in a Swift project, and its contents accessed by name *at compile time*.

Can we do that? Just add pounds without any starter?

Sure, it's unambiguous by the time you reach the " so it can be lexed. Perhaps \"a string"\ says "raw string ahead" more clearly. The menagerie of possible syntaxes is now:

rā€a string"
r##ā€a stringā€##
##ā€s stringā€##
\ā€a stringā€\
\\ā€a stringā€\\
#rawā€a stringā€
#raw##ā€a stringā€##
#raw(ā€œa stringā€)
#raw(DELIMā€a stringā€DELIM)
ā€œ\qa stringā€
ā€œDELIM\qa stringDELIMā€
ā€œ\^a stringā€
ā€œDELIM\^a stringDELIMā€
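
As a rough illustration of why the pound-only form can be lexed, here is a minimal sketch of a scanner for literals shaped like ##"a string"##. It assumes the closing delimiter is a double quote followed by the same number of pounds as the opening one, and it ignores escapes and multiline strings entirely. The function name `scanPoundDelimited` is hypothetical and not taken from any compiler.

```swift
/// Hypothetical helper: scans a literal made of zero or more pounds, a double
/// quote, a payload, a double quote, and the same number of pounds.
/// Returns the payload unchanged, or nil if no matching close is found.
func scanPoundDelimited(_ source: String) -> String? {
    let chars = Array(source)

    // Count the leading pounds; the first double quote ends the opening delimiter.
    var index = 0
    while index < chars.count, chars[index] == "#" { index += 1 }
    let pounds = index
    guard index < chars.count, chars[index] == "\"" else { return nil }
    let payloadStart = index + 1

    // The literal closes at the first quote that is followed by `pounds` pounds.
    var i = payloadStart
    while i < chars.count {
        if chars[i] == "\"" {
            let end = i + 1 + pounds
            if end <= chars.count, chars[(i + 1)..<end].allSatisfy({ $0 == "#" }) {
                return String(chars[payloadStart..<i])
            }
        }
        i += 1
    }
    return nil
}

// A quote followed by a single pound does not close a two-pound literal.
print(scanPoundDelimited("##\"a \"# string\"##") ?? "no match")
```

Because the pound count is fixed before the payload begins, the scanner never has to guess where the delimiter stops and the content starts.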

... and the first answer in the thread about single quotes for character literals is

In the end, we might be saving that ' forever ;-)

Agreed. Without a delimiter between "real" delimiter and payload, you can't tell where the payload starts... at the same time, I think the use of different characters is a major reason for ugliness.
So, can we find a separator with minimal ugliness?

let string = ' space isn't ugly, but might be confusing'
let string = '''・This looks humble, but I don't know how to type it'''
let string = '.easy to type, not really beautiful'
let string = ''':How is that?'''
let string = '''/How is that?'''
let string = '''\How is that?'''
let string = '''#How is that?'''
let string = \'We could also use another character that is still available\
let string = \\\'Would be a good fit if we choose slashes for /regex/\\\

Having arbitrary delimiter strings imho adds much ugliness, and in real applications, I don't think the decrease in flexibility is a real issue.

So, are there any dealbreakers for the "repeated marker with separator" family of options?