Pure Bikeshedding: Raw Strings (why yes, again!)

I honestly like the backtick delimiter approach suggested by @davedelong and @mdiep (via Go).

Is this doable in Swift? If I'm understanding the idea correctly it's 3 or more ticks as the delimiter for multiline and 1 or more ticks as the delimiter for single line, right?

…is that one horse or two? You said the word “two” but then by my parsing you only described one option that looks like r###""###.

Regardless, *I* feel that both Python-like r"" and Rust-like ### are quite ugly, and I strongly oppose adding them to Swift. I would rather require people to use separate files for raw strings than introduce a syntax that appears so out-of-place and stilted.

Of course, it would be preferable to find a nice-looking solution if we can.

Single backticks are already used to escape identifier names, usually so that a keyword can be used as an identifier:

enum Foo { case `case` }

How about using single quotes as delimiters? They are natural for string literals, and as far as I can tell they aren't used anywhere in the language.

let rawString = 'foo'
let multilineRawString = '''
this is
a multiline string
'''
let withSingleQuotes = ''''
if you want to include '''
you have to use more quotes outside
''''

That's a bit unacceptable, IMO. There are use cases for raw strings, and requiring users to store them in a file is a pretty big burden. Just because the syntax for specifying them might look a bit "ugly" doesn't mean you need to exclude it from the language, especially if the same syntax works in other languages with minimal fuss.

I don't think you're going to find any elegant syntax for raw strings, since given the nature of them, any delimiter could also appear in the string, which requires specifying some other delimiter.

Oh of course, so it would need to be 3 or more regardless I suppose.

AFAIR, there have been plans to keep those for other purposes, and IMHO that illustrates the biggest problem of this thread:
We aren't any wiser now than in the first discussion.
It's clear neither what a potential regex syntax will look like nor whether there will actually be a compelling use case for single quotes.

I think we should have answers to those questions before a final decision happens — and I'd put the syntax for multiline strings up to debate again if a change is needed for an elegant solution for the whole topic.

It is not out of the question, especially given this explicit statement in the Commonly Proposed file: "We'd rather save single quoted literals for a greater purpose (e.g. non-escaped string literals)".

  • Rust (via Python) gives us raw#"...."#. It is road-tested, strong, and ugly. @beccadax and @johnno1962 have both made a strong point against #raw and @raw over raw.
  • Go-style backticking preserves the notion of fenced information, via @davedelong and @mdiep
  • Single quotes are specifically called out for a potential greater purpose, namely exactly this problem, courtesy of @cukr and mentioned in the Commonly Proposed file for "non-escaped string literals".

Both of the latter two designs can be Rustified to allow additional members of the opening and closing elements to disambiguate their use within the raw string, e.g.

''he can't do that'' // the single quote in can't
``use `italics` for that`` // the code voice around italics
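
To make the "add more quotes" rule concrete, here is a minimal sketch in present-day Swift. The helper names `fenceLength(for:quote:)` and `fenced(_:quote:)` are hypothetical and not part of any proposal; they only work out how long the fence has to be for a given payload. The sketch assumes the fence is one character longer than the longest run of the quote character inside the payload, and it does not deal with a payload that itself begins or ends with the quote character.

```swift
// Hypothetical helper, for illustration only: returns how many copies of
// `quote` the opening and closing fence need so that no run of `quote`
// inside `payload` can be mistaken for the closing fence.
func fenceLength(for payload: String, quote: Character) -> Int {
    var longestRun = 0
    var currentRun = 0
    for character in payload {
        if character == quote {
            currentRun += 1
            longestRun = max(longestRun, currentRun)
        } else {
            currentRun = 0
        }
    }
    // Assumption: the fence is one longer than the longest interior run.
    return longestRun + 1
}

// Wraps `payload` in a fence of the computed length.
func fenced(_ payload: String, quote: Character) -> String {
    let fence = String(repeating: quote, count: fenceLength(for: payload, quote: quote))
    return fence + payload + fence
}

print(fenced("he can't do that", quote: "'"))
print(fenced("use `italics` for that", quote: "`"))
```

Both calls reproduce the two fenced examples above: one interior quote character means a fence of two.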

I really want to settle on a single design soon.

In your OP you list a lot of different use cases that could be added to the proposal. But from this thread, I'm not seeing much about the use cases, just more discussion on the syntax.

I'm getting the feeling from this and the other discussions that the design of raw strings is very opinionated, and I don't think we're going to see a syntax that everyone can settle on. So I think it's probably best to just pick one of them and stick to it. I could foresee the final syntax being dictated by the core team anyway. The reason the original proposal was rejected wasn't the syntax; it was that it needed more concrete use cases.

I have seen some discussion about whether this feature should just be put on hold (possibly forever) until the regex issue is resolved. So I think it's probably best not to use regex literals as a use case for raw strings, since mentioning them in this context will just bring out that argument.

In many ways we’re not much further along, though this pitch has reinforced the following conclusions:

  • Custom delimiters are viewed as a requirement.
  • A single character “r” as the signifier is so strongly unacceptable to a sufficiently large group in the community that it rules out Python/Rust-style raw strings.

I’d also caution against reaching for “a different quote” as a solution, as raw strings are likely to be so rare that using up an entire character seems like a waste to me. Also, ` already has a role in Swift, and I’m holding out hope that ' will someday be used for single-character integer literals. And how would custom delimiters be handled? Arbitrary numbers of quotes as a way around this breaks the single/multiline distinction and, unless the number of quotes is always odd, won’t play well with external editors. How would you put that quote character at the beginning of a string?

Without wanting to railroad the thread, this seems to funnel the options towards a #raw modifier/introducer/signifier along the lines of #selector, as discussed. “\” as a signifier is tempting but a bit too much in the direction of Perl for me. I’m one who thinks key paths should have been #keyPath(a.b), since they are so rare.

Perhaps the decision should be postponed until we have a better idea where regex literals are headed, but I’m not particularly optimistic that such a thing will prove to be viable, so we could be waiting a long time, and I don’t think we can revisit multiline syntax at this stage. I’m sure it is possible to take a small incremental step forward without prejudicing regex literals or putting everything about strings up for discussion again.

How about vv("this is a raw string")? I am curious what raw strings would look like in plain form before being parsed by the compiler, because whatever raw string identifier is used, each raw string, each raw string's sub raw strings, and each raw string's recursive strings should be indexable as well. I would expect vv("this is a raw string") to be read as "vv: this is a raw string" with "vv:" as the delimiter. Anything within the delimiter can give context to the type of raw string it is (bash, sql, swift, css ...).

You don't want to just have repetitions of a single character, because then you can't have that character at the beginning or end of a raw string. (For example, with these syntaxes, how would you write a raw string with a single quote or backtick at the beginning?) You need several repeating characters and then one final, distinct character to mark the transition between delimiter and data.
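
A rough sketch of that rule, assuming a Rust-like shape: N # characters, then a distinct ", then the body, then " followed by N # characters again. The function name `rawBody(of:)` is made up for illustration, and this is not how the Swift lexer works; it only shows that once a distinct character ends the delimiter, the body can begin or end with any character, including " itself.

```swift
// Hypothetical scanner, for illustration only. Returns the body of a
// literal shaped like  ##"body"##  or nil if the text is not well formed.
func rawBody(of literal: String) -> String? {
    let characters = Array(literal)

    // Count the repeated '#' characters that open the delimiter.
    var hashCount = 0
    while hashCount < characters.count, characters[hashCount] == "#" {
        hashCount += 1
    }

    // The distinct '"' marks the transition from delimiter to data.
    guard hashCount < characters.count, characters[hashCount] == "\"" else {
        return nil
    }
    let bodyStart = hashCount + 1

    // The closing delimiter mirrors the opening one: '"' then N '#'.
    let closing = [Character("\"")] + [Character](repeating: "#", count: hashCount)

    var index = bodyStart
    while index < characters.count {
        if characters[index...].starts(with: closing) {
            // In this sketch the closing delimiter must end the literal.
            guard index + closing.count == characters.count else { return nil }
            return String(characters[bodyStart..<index])
        }
        index += 1
    }
    return nil  // No closing delimiter found.
}

// The body starts and ends with a double quote, yet the scan is unambiguous.
let body = rawBody(of: "#\"\"quoted\" at both ends\"\"#")
print(body ?? "not a raw literal")
```

A fence made only of repeated quotes has no such marker, so a leading quote in the data is indistinguishable from one more quote in the fence.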

You can if you make the string multiline.

While it is uglier than single line, I think it's still prettier than most of the proposed syntaxes and is already a familiar syntax.

let foo = '''
' one at the start and one at the end '
'''

If the main objection against the Rust-style repeated #s is that they are ugly, then they seem like a great solution to me. Everything else seems to have some medium-to-serious issues around clashing with the syntax of other features, breaking symmetry by only allowing multi-line strings, reserving a whole new quote character for a somewhat niche feature (especially when ' was recently strongly earmarked for Character-like uses), etc. So my suggestion would be to hold your nose if you think they're particularly ugly, or even lean into the heavyweight appearance as a feature that calls out the different escaping behaviour.

The open questions for me around the # solution are still the ones I posted previously, particularly:

This discussion has been useful to me, and I don't think it has put us back in the same place as the previous threads. I would support a proposal based around repeated #s, once some of the answers to the above questions are nailed down.

I feel that use of "it's ugly" should be banned as an argument in Swift Evolution. It's purely subjective and boils down to "we shouldn't do it because I don't like it". Also, sometimes ugly is good. If you have a feature that people should only use sparingly but might get abused, ugly is a good cue.

I think that raw strings will be a useful but rarely used feature, so the aesthetics are not that important. I'd be OK with almost any of the proposals here because I don't anticipate using it very often.

The easiest way to solve this would be to let the user provide the delimiting character(s). (#raw(“, ”, “"quote"”), #raw(‘, ’, ‘“fancy "quote"”’))

This will result in abuse but hopefully will be manageable.

As an alternative, just using “” should work. I've never ever seen a regex with that character, so I would think that this is a relatively good compromise. In the 0.1% case where you need to have a regex with that character in there, escaping it seems OK to me.

I'm sorry, but I just can't agree with this. Aesthetics matter. Even if you think they shouldn't, in practice they're a factor when programmers choose a language. They matter to Miguel de Icaza, for instance.

And why shouldn't they? Companies spend a ton of money on the aesthetics of workplaces because it has an impact on worker productivity, client experiences, and so on. Why should code be any different?

A professional programmer might spend 50,000 hours of their career looking at code. We should make sure that if they're looking at our language, we don't stab their eyeballs out with sharp syntax.

Not sure if everyone here is making a general point (in which case could you please take it to a separate thread) or a specific point about raw strings and this proposal. On the specific case, it would be great if every part of the language had widely-agreed-to-be-beautiful syntax, but in practice this has to be balanced against practical issues (e.g. the requirement to embed code in strings in code places a lot of restrictions on how unique the delimiters have to be, making them almost required to stand out from “regular code” in a way that might be jarring), frequency of use (e.g. attribute-heavy code can be ugly but is only required in niche uses), etc. I think Swift has a very nice syntax for regular strings, both single line and multi-line, and I have no objection to raw strings, a much more niche feature, having a more heavyweight syntax that some might view as ugly. And again, perhaps standing out is a “feature not a bug” for strings that have very different parsing from normal strings.

As a variant on the Rust style, perhaps instead of # we could use ^, which seems less ugly to me. And I guess we don't need a leading 'raw' or 'r' (or do we need to differentiate between multiple types of raw strings?):

let raw = ^"My raw string."^
let swift = ^^"Here's a ^"raw string"^ inside a raw string."^^
let multiline = ^"""
    My multiline raw string.
    """^

You answer your own question. I side with both counterarguments you make. Sure, this would be possible to lex, but we can’t keep ascribing quite niche meanings to individual characters that are nearly impossible to google. There are only so many keys on the keyboard, and down this road is a syntax-laden language like Perl, impenetrable to the uninitiated.

I’d be happy enough with #raw[#]*"a string"[#]* but would prefer the more descriptive #raw(SQL"some sql"SQL), though, it has to be said, there doesn’t seem to be any take-up of the latter.