Pure Bikeshedding: Raw Strings (why yes, again!)

Hi all,
I got a bit confused by all the entries on this topic, so here are my two cents:
I prefer the simple solution that is available in C# to obtain a verbatim (raw) string from a literal.
(I wonder why this wasn't implemented in Swift from the start.)

In C# one would use:

 str = @"\nfoo\nfoo\tab"    // C# code

This would assign the literal string to str as WYSIWYG.
For the time being, for those who cannot wait for a built-in Swift implementation,
I took the liberty of implementing this as a nifty Swift String extension, like so:

//  Created by Ted van Gaalen on 27.06.18.
import Foundation

// Just a vertical bar, as @ is not available for a custom operator.
prefix operator |

// Use this "|" operator in front of a string literal to get it verbatim (raw),
// like so:
let str = |"\nfoo\nfoo\rfoo\tfoo" // which evaluates wysiwyg.

extension String
{
    static prefix func | (s: String) -> String
    {
        // Escape character dictionary: customize/correct if necessary.
        let escap: [Character: String] =
            ["\0" : "\\0",
             "\\" : "\\\\",
             "\t" : "\\t",
             "\n" : "\\n",
             "\r" : "\\r"]

        var result = ""

        // Replace each known control character by its escape sequence.
        s.forEach { result += escap[$0] ?? String($0) }
        return result
    }
}

Please note that this raw string operator function has only been marginally tested,
because the playground in my Xcode Version 9.4.1 (9F2000) doesn't work,
just as it didn't in previous Xcode versions. It tells me that it is running,
and it would probably run forever. Maybe it is looking for 42.
Someday they might get it working again, I hope.

The above extension might not be very efficient, but because its deployment
only makes sense with string literals, this is probably of less importance.
Oh, and please don't tell me that I am deploying FP here :o)
Note that a lot of "missing" language features and other bells and whistles
can be implemented quite easily as Swift extensions. Now, ain't that cool?
Kind regards
TedvG
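
For anyone who wants to try the operator outside a playground, here is a minimal sketch of how it could be exercised. It restates the same escape table in compact form (the `table`, `escaped` and `interpolated` names are mine, not from the post). The second check shows one caveat: interpolation has already been evaluated by the time the operator receives its argument.

```swift
prefix operator |

extension String {
    /// Re-escapes a handful of control characters (same table as the post above).
    static prefix func | (s: String) -> String {
        let table: [Character: String] = [
            "\0": "\\0", "\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r",
        ]
        var result = ""
        for ch in s {
            result += table[ch] ?? String(ch)
        }
        return result
    }
}

// The literal is processed by the compiler first; the operator then
// turns the resulting control characters back into escape sequences.
let escaped = |"\nfoo\tbar"
print(escaped)        // prints \nfoo\tbar

// Interpolation is resolved before the operator ever sees the value.
let interpolated = |"\(1 + 1)"
print(interpolated)   // prints 2
```

So the operator can round-trip the escapes it knows about, but it cannot recover what the literal looked like before the compiler processed it.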

It isn't that easy, as no string transformation can deal with invalid escape sequences like a raw literal can (nor can it disable interpolation).
But I agree that we could do more with functions than we do now:
Given the discussion about Compile-Time Constant Expressions for Swift - #105 by marcrasi, I don't think it makes much sense to have two kinds of raw string literals.
The multiline variant wouldn't be really raw anyways, and the processing can happen explicitly:

let rawString = '
This is a newline,
this \n is not.
   Look, two spaces!
'.whitespaceStrippedAsYouLike
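
The "explicit processing" step could be an ordinary computed property. The following is only a sketch under my own assumptions: the name `commonIndentationStripped` is hypothetical, it only considers spaces as indentation, and it is applied to a regular string because the single-quote literal above does not exist.

```swift
extension String {
    /// Hypothetical helper: drops an empty first and last line, then removes
    /// the smallest leading-space indentation shared by the non-blank lines.
    var commonIndentationStripped: String {
        var lines = self.split(separator: "\n", omittingEmptySubsequences: false)
            .map { String($0) }
        if let first = lines.first, first.isEmpty { lines.removeFirst() }
        if let last = lines.last, last.isEmpty { lines.removeLast() }

        // Indentation width of every line that contains something other than spaces.
        let indents = lines
            .filter { line in !line.allSatisfy { ch in ch == " " } }
            .map { line in line.prefix(while: { ch in ch == " " }).count }
        let indent = indents.min() ?? 0

        return lines
            .map { line in String(line.dropFirst(min(indent, line.count))) }
            .joined(separator: "\n")
    }
}

let text = "\n    first\n      second\n"
print(text.commonIndentationStripped)
// prints:
// first
//   second
```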

I think this should only be an extension on multiline strings. Making it work for single-line strings makes this proposal really hard for little gain. It is really easy to escape 100 characters versus a whole paragraph.

In C# a raw string is also a multiline string; maybe introducing multiline first is hindering its evolution.

Feels to me like raw strings have evolved, so I’ve prepared an implementation and made a toolchain available (it appears as a dev snapshot for the 27th and you’ll need to restart Xcode). The toolchain supports both # and ` (backtick) as a custom delimiter and a “differentiator” prefix of #raw or raw if you want to help test the implementation.

I found the visual weight of # to be a virtue in a delimiter when compared to ` but #raw as a prefix to be asymmetric and one # too many when compared to just raw.

I suggest we should make a start on writing a proposal for review with the syntax:

raw"a raw string"
raw#"a raw string"#
raw"""
	a raw string
	"""
raw#"""
	a raw string
	"""#
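
To make the custom-delimiter idea concrete, here is a rough sketch of how a scanner could find the end of such a literal: count the leading `#` characters, then look for a closing quote followed by the same number of `#`. This is my own illustration, not the toolchain's lexer; `scanDelimitedRawString` is a hypothetical name, it ignores the `raw` prefix, and it only handles the single-line form.

```swift
/// Sketch: returns the contents of a literal such as "text" or #"text"#,
/// where the closing quote must be followed by as many `#` as the opener had.
func scanDelimitedRawString(_ source: String) -> String? {
    let chars = Array(source)
    var i = 0

    // Count the leading delimiter characters.
    while i < chars.count, chars[i] == "#" { i += 1 }
    let hashes = i

    guard i < chars.count, chars[i] == "\"" else { return nil }
    i += 1
    let start = i

    while i < chars.count {
        if chars[i] == "\"" {
            let end = i + 1 + hashes
            // A quote only closes the literal when enough `#` follow it.
            if end <= chars.count, chars[(i + 1)..<end].allSatisfy({ $0 == "#" }) {
                return String(chars[start..<i])
            }
        }
        i += 1
    }
    return nil  // unterminated literal
}

let contents = scanDelimitedRawString("#\"say \"hi\" \\n\"#")
print(contents ?? "unterminated")   // prints say "hi" \n
```

Embedded quotes do not end the literal because they are not followed by the delimiter, which is the whole point of making it customizable.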

I can't help myself, but all of the variations look rather unpleasing to me...

Couldn't we follow the precedent of other literals (colors, images...)?
The syntax of those is hidden by Xcode, so it would be ok if it's a little bit heavyweight - and imho it wouldn't be that bad without special editor support either:

#stringLiteral(delimiter: "$", "Just " isn't enough to close this string$")

It would also be possible to remove the special meaning of "\", but keep string interpolation:

#stringLiteral(delimiter: "#", interpolation: "$", "This \ has no effect, but $(2 * 2) will print 4#")

Sure, that is firmly on the “thinks raw strings will be rare, so are happy with a very verbose syntax” end of my scale from the original thread.

Wild idea: what if we build on top of the existing multi-line string literal?

Something like this:

let rawString = """4
                This is the first line
                Can contain """
                """
                And this is the last line
                """

Add a trailing number as the number of lines in the string literal to avoid ambiguity. It basically means: treat the next n lines as raw string.

For single line, we can use """0 (No trailing newline) and """1 (Has trailing newline).
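
A rough sketch of how a reader for the counted form might behave, covering only the multi-line case. All names here (`readCountedRawString`, `openerIndex`) are hypothetical and the input is modelled as an array of source lines; it is meant to show the rule "take the next n lines verbatim, then expect a closing delimiter", not an actual lexer change.

```swift
/// Hypothetical reader for the proposed `"""N` form. `lines` are source lines,
/// `openerIndex` points at the line that ends in `"""` followed by a line count.
/// Returns the raw body, or nil when the count and the closing line disagree.
func readCountedRawString(lines: [String], openerIndex: Int) -> String? {
    let opener = lines[openerIndex]

    // Collect the trailing ASCII digits of the opener line.
    var digits = ""
    for ch in opener.reversed() {
        guard ch.isASCII, ch.isNumber else { break }
        digits.insert(ch, at: digits.startIndex)
    }
    guard let count = Int(digits),
          opener.dropLast(digits.count).hasSuffix("\"\"\"") else { return nil }

    // The line right after the counted body must be a bare closing delimiter.
    let closerIndex = openerIndex + count + 1
    guard closerIndex < lines.count,
          lines[closerIndex].drop(while: { $0 == " " }) == "\"\"\"" else { return nil }

    return lines[(openerIndex + 1)..<closerIndex].joined(separator: "\n")
}

let sourceLines = [
    "let rawString = \"\"\"4",
    "This is the first line",
    "Can contain \"\"\"",
    "\"\"\"",
    "And this is the last line",
    "\"\"\"",
]
let body = readCountedRawString(lines: sourceLines, openerIndex: 0)
print(body ?? "count mismatch")
```

In this sketch a stale count is detected because the line where the closing delimiter is expected no longer matches.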

We think that requiring the multi-line string format would be burdensome for short but “toothpick”-dense strings, and people might not use raw strings when they would aid comprehension. The number-of-lines thing also seems kind of error-prone: what happens if you delete a line in the middle of the string but forget to update the count?

Not really a strong precedent at all; those literals are really a proprietary extension on Swift. I did not know they existed at all until very recently.

I do not share this concern. Using multi-line syntax means that the entire string of interest—and nothing else except leading whitespace—exists on its own line. In my opinion, this is a major benefit. When the code is being read, the reader immediately sees what the contents of the string are, and does not have to mentally wrestle with verbose delimiters on the same line.

Again, I think that guaranteeing raw strings will always use multi-line syntax is a significant readability win, and it should be viewed as a desirable goal in its own right.

Just signed up, "long time listener, first time caller". I only considered my own post after seeing the one from a few days ago suggesting raw"string". I thought the Rust syntax like r"string" had been decidedly turned down, so going back in that direction with only slightly more verbosity looked like a dead end. But that post's observation seems totally correct to me: using a poundsign # in the keyword and also immediately after as a disambiguation character is ugly and confusing. Using other characters as the common disambiguator has been an avenue of discussion, but parsing issues have been mentioned with some of those choices. The idea that only multiline strings can be raw also seems like a dead end. Both of these seem like symptoms of this thread running out of ideas.

My thought was to go back to the basics and try to find a different way forward. One of the acknowledged options we've been discussing is to use a poundsign keyword, with #raw being the popular choice. A typical form for other poundsign keywords like this uses parentheses and it seems most here think it to be too verbose, but I don't think anyone has suggested using this canonical form for the edge cases and then allowing terse alternates when there are no ambiguities.

It seems that parentheses around the quote can themselves be a form of disambiguation.

#raw("some\raw\string")
#raw("raw string with a "quoted" part")

Balanced disambiguation characters could be used as needed, like earlier suggestions. Using a poundsign for this leads to the same problem mentioned by the earlier post, but perhaps it's somewhat mitigated by being inside parentheses. I think a set of other characters could be selected, maybe ones that would be unavailable except in the recognizable special case within a #raw(..), possibly matching left/right pairs or even repeated parentheses.

#raw(#"raw string with a "quoted" part"#)
#raw(|"raw string with a "quoted" part"|)
#raw(<"raw string with a "quoted" part">)
#raw(("raw string with a "quoted" part"))

I wouldn't personally like to use backtick as such a character, it's usually used as its own quote character and having it used in conjunction with a double quote looks weird to me. I'd instead favor using backticks fully instead of double quotes.

#raw(`raw string with a "quoted" part`)

Ok, nothing too new yet, here's my slightly new idea: allow a special form of #raw(..) that leaves out the parentheses and allows #raw to apply to the string that follows, possibly with whitespace in between. There's precedent for a poundsign keyword that looks like this: #if.

Without whitespace it looks just like what's been suggested previously; however, we wouldn't have to shoehorn the disambiguators into this simplified syntax and could instead rely on the parenthesized form for the edge cases.

#raw "some\raw\string"
#raw"some\raw\string"

It seems that most have regarded this to be nice for multiline strings, I don't think allowing whitespace between the keyword and doublequotes would change that.

#raw """
Banana
"""

I'm not sure if parsers would be happy if disambiguation characters or alternate quotes were permitted in this form too, but with some separation between the keyword and the string I think it looks slightly nicer than earlier suggestions.

#raw #"raw string with a "quoted" part"#
#raw |"raw string with a "quoted" part"|
#raw `raw string with a "quoted" part`

That's all I got. First post complete, do I get a badge? (oh, there are badges?)

This sounds feasible to me; it would just require a different parsing approach from what we've done so far. (#raw would be its own token, so it would require more interaction with the parser, which is totally doable but slightly more complicated than solutions which are handled entirely in the lexer.)

I'm not sure it's a good solution, though. If we think the #raw token is unreadable and requires additional whitespace, why are we using it for a code readability feature? If the whitespace is necessary for readability, shouldn't we count the whitespace as part of its syntactic weight?

The compiler should be able to tell and emit an error (and fixit for the correct count) for almost all situations. The only reason the count exists other than telling the compiler this is a raw string is to distinguish """ inside the string. It’s very unlikely """ is at the wrong line and the code can still compile. The only case I can think of is:

let x = """1
        abc
        """
        let y = """
        """

Using a line count is definitely much better than using the character count (which is the "Pascal"-alternative to denote where a string ends, without using a delimiter like C does).

But what about when the literal has 136 lines?
You'll have the same problem of having to count something; it just happens later than with characters.
So, you'll probably use tools like wc to help you deal with raw strings. But when you use tools, you may as well just use a translator that adds escapes where they are needed.

I appreciate the idea because it's thinking out of the box and brings something new to the discussion, but I don't think it's practical.

I would still very much like to consider an explicit number as the 'does this match' strategy

'this is 1'ANOTHER RAW STRING'1 a rawString'

The numbers are basically strings so maybe restricting them to digits is hard to explain (maybe unnecessary?) but there isn't any counting of characters.

Why doesn't this apply to regular strings? Should all strings be multi-line? If not, why should raw strings in particular be penalised?

Because the delimiter for a regular string is extremely simple (a single character) and it takes essentially zero mental bandwidth to recognize where the beginning and end of the contents of the string are.

In order for raw strings to be capable of holding arbitrary contents they must use customizable delimiters, therefore it necessarily requires additional effort for readers to identify the boundary between the delimiters and the contents of the string.

If we stipulate that the boundary between delimiter and raw string will always be a newline, then the burden is greatly reduced. In particular, the beginning of the raw string is trivial to locate—it is the very next line—and the ending delimiter can be scanned for vertically.

A lot of single-line uses for raw strings will not require custom delimiters, being something simple like raw"C:..\File.txt" or whatever the prefix is expected to be. Also, as discussed previously in this thread, custom delimiters may also be useful for non-raw strings, so the distinction is not clear to me. I'm not very concerned about the difficulty of scanning for the ending delimiter, because it will almost always be obvious from context, and also usually syntax highlighted.

There is one other possibility: using \ as the raw string differentiator, which, while a bit “magical” and not extensible, does at least have some semantic sense to it. A sort of “escape the string as a whole”.

\"a raw string with \ in it"
\##"""
    a raw string with """ in it
    """##

Any support for \ over raw as a prefix? It has a certain appeal - would this be “swiftier”?

This discussion has stretched on too long and is holding up Swift Evolution. Use #raw and call it a day. The hash (#) can then become a string operator for the type of string we are dealing with:
#objc("Hello World")
#num("123566")
#ascii("Hello World")
#raw("\n Hello \n World")
...
...

And let's try not to Rust Swift up.

Onwards ladies and lads!