SE-0200: "Raw" mode string literals

I find it a given that this would be useful to me in a variety of contexts. And providing a feature does not mandate that the user will use it well.

So yes. Like that one. Or like X, _, , or 💩.

Thank you for providing justification. I am undecided and have a few clarifying questions regarding these reasons.

Could you elaborate a little more on this point? What about custom delimiters makes Swift tooling especially difficult? Is this an argument regarding inherent complexity of lexing (e.g. more lexer state) or incidental complexity of Swift's current lexer (e.g. implementation limitations)?

Do you mean literally in source code or in the presentation of source code (e.g. by an IDE)? Very long multi-line string literals can also have this same effect when viewing a portion of literal source code (e.g. a diff).

Why do custom delimiters seem especially egregious to you? Why does this reasoning apply to them but not multi-line literals?

The ability to nest a literal inside another literal, e.g. as others upthread pointed out, a literal containing Swift source code. However, control over the delimiter would allow for careful nesting. (Not arguing pro/con, just a use case to help flesh out explicit rationale).

Note that the Swift compiler is fundamentally incapable of determining what is or is not a single grapheme at compile time, as that designation depends on the version of ICU present at run time. This is how Swift 4 apps get support for new emoji in new OSes. The Swift compiler attempts an approximation, often overly accepting of potential new emoji sequences.
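
A minimal sketch of that distinction, assuming an emoji sequence chosen purely for illustration: the scalar count is fixed by the literal itself, while the grapheme count is whatever the Unicode support present at run time reports, so the sketch does not assert it.

```swift
// Three people joined by zero-width joiners (U+200D); the sequence is an
// illustrative assumption, not one taken from the thread.
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}"

// The scalar count is determined by the literal alone.
let scalarCount = family.unicodeScalars.count   // 5 scalars

// The Character (grapheme cluster) count is computed at run time by the
// Unicode segmentation support on the running system, so no specific
// value is claimed here.
let graphemeCount = family.count

print("scalars: \(scalarCount), graphemes: \(graphemeCount)")
```

A delimiter rule phrased as "exactly one grapheme" would therefore need a check that the compiler can only approximate.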

While I enjoy the fact that emoji open up a whole new design space for custom delimiters (with snarky undertones), there's a minor catch regarding grapheme-length restrictions.

Fair enough. I still think that one can find a single-character ASCII delimiter that will not be used in a string in nearly every circumstance. I also welcome other solutions and remain positive about the utility of this proposal regardless of its final form.

Hi Michael,

I don’t find custom delimiters egregious, but they seem to me a feature on top of a feature at a time when the proposal is already struggling to wriggle under the complexity-versus-benefit bar.

It’s appropriate that we discuss them now and come up with a couple of designs, to make sure we don’t close off future directions the language could take, but I don’t think they need to be considered an essential part of this proposal at this time.

There are two designs in play at the moment, both of which I’m sympathetic to:

#raw(Xa stringX)
where X is an ASCII character, and an opening delimiter of “(” would map to “)” at the end, etc. The ASCII restriction is largely technical: as you mention, the compiler does not have the same support for segmenting graphemes as Swift itself. This would not be too difficult to implement, but what would the syntax be for multi-line raw strings?

and
#raw(delimiter: "##", ##some code.print("hello")##)
where the delimiter string defaults to "\"". This involves parsing a string in order to parse a string and, as you will know as well as anyone, having worked on the existing lexer code with me, it would be more of a departure from that code. I am not at all keen on the idea of strings not being delimited by ", however. This gives users too much license and would be a step in the direction of Perl.
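
The delimiter pairing in the first design can be sketched at run time as follows. This is only an illustration under stated assumptions: `closingDelimiter(for:)` and `rawBody(of:)` are hypothetical names, the bracket pairs other than “(” and “)” are guesses at what “etc.” covers, and a real implementation would live in the lexer rather than operate on a String.

```swift
// Hypothetical helper: maps an opening delimiter to its closing partner.
// Only "(" -> ")" comes from the discussion; the other pairs are assumptions.
func closingDelimiter(for open: Character) -> Character {
    switch open {
    case "(": return ")"
    case "[": return "]"
    case "{": return "}"
    case "<": return ">"
    default:  return open   // any other character closes itself
    }
}

// Hypothetical scanner: `text` starts at the opening delimiter. Returns the
// characters up to the first closing delimiter, with no escape processing,
// or nil if the delimiter is not ASCII or the literal is unterminated.
func rawBody(of text: String) -> String? {
    guard let open = text.first, open.isASCII else { return nil }
    let close = closingDelimiter(for: open)
    let rest = text.dropFirst()
    guard let end = rest.firstIndex(of: close) else { return nil }
    return String(rest[..<end])
}

print(rawBody(of: "Xa stringX") ?? "unterminated")   // prints "a string"
```

Note that the sketch stops at the first closing delimiter, which is exactly why the choice of X matters: the body cannot contain it.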

Raw multi-line custom-delimited strings would suffer from the same problem as multi-line strings already do, in terms of loss of context outside the IDE. Nothing specific there.

So in summary, in my own mind I built the case for not including custom delimiters as an essential part of the proposal on the following:

  • There is no doubt they would be more complex, by anything from slightly to significantly. I’m just applying Occam’s Razor.

  • There is no real need for them, as we already have a syntax that accepts any character other than the sequence “) and newline.

  • We already have a couple of designs in hand that can still be introduced at a later date if this turns out to be a pressing requirement.

  • Custom delimiters are actually a bad thing in and of themselves in terms of code legibility, though that’s just my personal opinion.

I find arguments about not being able to paste Swift code that uses raw strings into Swift code that uses raw strings a little contrived myself. It is like saying “give me a vaccine for disease A, provided the disease isn’t A”. This is all academic anyway: if this really is a requirement, you should be using a resource file and loading it from disk, given the data is completely static.
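
For what it’s worth, the resource-file route is short. A minimal sketch, assuming the static text lives in a UTF-8 file whose path the caller supplies (`loadFixture(atPath:)` is a hypothetical helper name):

```swift
import Foundation

// Hypothetical helper: reads a UTF-8 text file from disk at run time, so
// its contents never pass through the string-literal lexer and need no
// escaping at all.
func loadFixture(atPath path: String) throws -> String {
    return try String(contentsOf: URL(fileURLWithPath: path), encoding: .utf8)
}
```

The trade-off is that the text is no longer visible at the point of use and the load can fail at run time.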

We’re just trying to maximise what is possible without having to stand on our heads in terms of implementation or bringing unwarranted complexity into the language. Each new increment of complexity has to be justified in terms of its actual utility.

There is a third design which several people have discussed in this thread, with no octothorpe involved.

It does not reflect well on a proposal author when they ignore the existence of alternatives that have been put forth.

hello i was summoned

Care to share?

Is this third approach the triple-double-quote-custom one hidden under the triangle here? I don't recall seeing further discussion of this approach, but it is a long thread.

Or do you mean this one? It appears to be at odds with your prior suggestion. There was discussion of this fourth option, however, and it seems the discussion there died down (perhaps because it was centered around the old-style r"" syntax).

Or do you mean the one using single quotes with an intermixed delimiter like this one? Would this be a distinct fifth option, or the same as the third?

Or do you mean the approach using single quotes without delimiters here? These fifth and sixth approaches do rely on claiming single quote syntax, which is ballooning the scope a bit, and are mentioned in the alternatives considered section.

Or do you mean the >4 double quote suggestion here? This sixth approach looks promising, but it can't be the third design you're talking about because I didn't see several people discussing it.

Fortunately, ad-hominem attacks against the author are invalid criticisms of a proposal. Similarly, ad-hominem attacks against a reviewer are invalid criticisms of a review.

@johnno1962, could you update the alternatives considered section? What do you think about some of these alternatives?

I think I’ll let this thread run its course now and try to encapsulate it in the un-merged pseudo-proposal for the core team to evaluate at the end. There isn’t much more I can add unless anybody has a specific question.

@johnno1962, how about a way to choose the escaping ASCII character (e.g. $ instead of \)?

This might be in the form of some extra syntax immediately after the opening """ delimiter (which is currently disallowed by the SE-0168 rationale).

It could remove the need for custom delimiters, because a literal """ could still be escaped (e.g. $""" instead of \""").

Interesting… but doesn’t it just shift the problem from \ to $? If you’re having trouble pasting in something that might contain """ or """), you have to look again at why you’re doing it.

The entire class of possibilities involving in-line custom delimiters, with some rules about what constitutes a delimiter. You listed and linked to several of them.

@johnno1962, this would be an alternative mode (rather than a raw mode) for string literals.

The idea is that you'd be able to use:

  • a literal \ without the need to escape it as \\, and
  • special characters (e.g. "Tab$tSeparated$tValues" ), and
  • Unicode scalars (e.g. "Cafe$u{0301} au lait" ), and
  • interpolated expressions (e.g. "Life of $(Double.pi)" ).

For each string literal, you'd need to choose an escaping ASCII character which rarely (or never) appears.

For regular expressions, this might be a tilde ~; for Windows file paths, this might be a forward slash /.
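
The idea can be emulated at run time to see how it would read. This is only a sketch under stated assumptions: `unescape(_:escape:)` is a hypothetical function, it handles just `t`, `n`, `u{...}` and the doubled escape character, and interpolation is left out because it cannot be emulated after the literal has been compiled. A real feature would be handled by the lexer.

```swift
// Hypothetical run-time emulation of a literal whose escape character is
// chosen per string. Backslashes pass through untouched.
func unescape(_ text: String, escape: Character) -> String {
    var result = ""
    var iterator = text.makeIterator()
    while let ch = iterator.next() {
        guard ch == escape else { result.append(ch); continue }
        // A trailing escape character with nothing after it is kept as-is.
        guard let next = iterator.next() else { result.append(escape); break }
        switch next {
        case "t": result.append("\t")
        case "n": result.append("\n")
        case "u":
            // Expect "{hex digits}" after the "u".
            var hex = ""
            if iterator.next() == "{" {
                while let digit = iterator.next(), digit != "}" { hex.append(digit) }
            }
            if let value = UInt32(hex, radix: 16), let scalar = Unicode.Scalar(value) {
                result.unicodeScalars.append(scalar)
            }
        default:
            // Covers the doubled escape character; any other character
            // after the escape is emitted unchanged.
            result.append(next)
        }
    }
    return result
}

// A regex-flavoured string using ~ as the escape: the backslashes are literal.
let pattern = unescape("\\d+~t\\w+", escape: "~")
```

Here `pattern` contains the characters `\d+`, a tab, then `\w+`; the doubled backslashes are only needed because the sketch itself is written in ordinary Swift literals.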

I’ve lost track of what problem we’re trying to solve.

If the problem is that it’s difficult to have to remember to escape characters when we’re typing or pasting a string, isn’t that something that can be dealt with by the IDE or text editor? E.g. we could type or paste a string and ask the IDE or text editor to escape it.

If the problem is that strings with escape characters are harder to read - that's not something that bothers me particularly, but other people might feel differently.

Jeremy

@benrimmington, as you say, this would be an alternative mode rather than a raw mode, which has no escaping at all. This is a slightly different thing from what this proposal is trying to address.

@jeremy, you may be right that this proposal has rather lost its way.

There are two use cases. The first is pasting in text such as JSON test data, where you don’t want to have to worry about escapes or interpolation. That could be dealt with by the IDE, as you say, but it would seem an even more complex solution, and one that doesn’t exist. Why not put it into the language?

The second, and the original use case, is strings with escape characters in them (which, let’s face it, are likely to be regular expressions). Here the utility of the proposal has been rather eroded since we moved to #raw("\w+") or #rawLiteralString("\w+"), to the point where I would be unlikely to use it: it no longer saves any keystrokes, even if the literal is clearer to read.

Looking back, if it were just me I would use the Python r"\w+" syntax and keep things simple and lightweight, as originally proposed, even if this isn’t “Swifty”.
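
For reference, this is what the regular-expression case looks like with ordinary literals today. A minimal sketch using Foundation’s NSRegularExpression; the sample text is an assumption chosen for illustration.

```swift
import Foundation

// In an ordinary literal every regex backslash has to be doubled:
// the three-character pattern \w+ is spelled with four characters.
let pattern = "\\w+"
let regex = try! NSRegularExpression(pattern: pattern)

let text = "raw string literals"
let matches = regex.matches(in: text, range: NSRange(text.startIndex..., in: text))

// Convert each NSRange back into a Swift substring.
let words = matches.map { String(text[Range($0.range, in: text)!]) }
print(words)   // prints ["raw", "string", "literals"]
```

With one backslash the doubling is tolerable; it is patterns with many escapes, or escaped backslashes, where the literal becomes hard to read.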

  • What is your evaluation of the proposal?

-1 in its current form.
But +1 for fixing the Regex issue.

  • Is the problem being addressed significant enough to warrant a change to Swift?

First class Regex is a big enough problem.
But simple raw mode doesn't address the issue completely.

  • Does this proposal fit well with the feel and direction of Swift?

No, at least not in the proposed syntax.
A prefixed string is an extra concept for beginners.
And it doesn't match the current style of special literals like #fileLiteral.

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

Python's r"" raw string modifier can be combined with Python 3.6's f"{var}" interpolation modifier, so that fr"\r\n{1}" == "\\r\\n1". Since Swift Strings are interpolated by default, raw Strings should also be interpolated to minimize user surprise.

IMHO, Python's raw String is a workaround for the lack of first-class regex. It is rarely seen outside of regex (and Windows paths); some even think the r modifier stands for Regex instead of Raw.

On the other hand, Ruby has both first-class regex and strings with custom delimiters.
In a first-class regex, the \ in /\d/ doesn't need to be escaped.
For custom-delimited strings, %("quote") == "\"quote\"" == %<"quote">.
Both of them can be interpolated, just like a plain String.

This proposal leans toward Python's side, which is not as good as first-class regex.
Also the lack of default interpolation is a concern.

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Read the proposal. Rough read of the review thread.

-1 for the proposal, but generally positive for the concept

I can see the value in it

No. r"foo" does not fit into Swift at all, and I'm strongly opposed to this spelling.

Moreover, I feel that this proposal/review process has been highly irregular and have concerns about the apparent sense of urgency of addressing this proposal. We should take our time and get this right, especially within the context of general String interfaces & features.

Read through the various versions of the proposal and this thread.

I still think that there has to be a bigger picture before introducing new types of string literals, but there's an alternative to the odd r"" spelling that afaics hasn't been mentioned yet:
Instead of putting stuff in front of a "regular" string, you could put something right behind the opening quote.
"\%\(self)", for example, isn't a valid string in Swift, and there are many other special and regular characters which form invalid escape sequences.
I doubt there is any intuitive choice to denote a raw string, but it's not worse than "r" and less heavy than #rawstring().
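
To make the free space concrete, here is a small sketch listing the escapes an ordinary Swift literal already defines; any other backslash-plus-character pair is currently rejected by the compiler, so it cannot clash with existing code.

```swift
// Each of these literals uses one escape that Swift already defines and
// produces exactly one Unicode scalar.
let definedEscapes: [String] = ["\0", "\\", "\t", "\n", "\r", "\"", "\'", "\u{41}"]
let scalarCounts = definedEscapes.map { $0.unicodeScalars.count }

// "\(...)" is the remaining backslash form: interpolation.
let interpolated = "\(1 + 1)"

// A literal beginning with an undefined escape, such as \% or \q, does not
// compile (invalid escape sequence), so it is omitted here. That is what
// leaves those spellings available as a marker after the opening quote.
print(scalarCounts, interpolated)
```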

This is an interesting idea helping solve the aesthetics and it feels more like something belonging to the string. Extensible and unobtrusive. What about using a letter like:

 "\qa string with \\'s"

 """\q
    a string with \\'s
    """

At least the syntax looks like a regular string (unless you know the valid escape sequences really well ;-)
If only there would be an obvious choice for a sigil...
Imho a letter has the downside that it mixes with the regular characters in the payload.
Alternatively, a sequence of characters ("\q:Actual string here") could be used instead of a single char; it would even be possible to define a delimiter string, like it was suggested with other spellings.
