SE-0200: "Raw" mode string literals

Not r"...", no.

Well, OK, the pseudo-pitch/review phase we just had where @johnno1962 revised the proposal to use #raw, possibly with a custom delimiter.

Interesting argument against raw string literals.

If the idea is to have the IDE show you the raw string in some kind of bubble and hide the true syntax, then we don't need a syntax for raw string literals. The IDE is perfectly capable of inserting the \ escapes for you and hiding them when editing. It could show nested bubbles of some kind when you want interpolation or special characters, making this applicable to all strings.
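
To make the escaping half of that concrete, here is a minimal sketch of what an editor would have to do when storing raw text as an ordinary literal. The name swiftLiteralSource(for:) is made up for illustration and is not part of any proposal or tool, and it only covers the common escapes:

```swift
/// Hypothetical helper: produces the source text of a normal Swift
/// string literal whose value is exactly `raw`.
func swiftLiteralSource(for raw: String) -> String {
    var body = ""
    // Walk Unicode scalars rather than Characters so that "\r\n",
    // which is a single Character, is still escaped piece by piece.
    for scalar in raw.unicodeScalars {
        switch scalar {
        case "\\": body += "\\\\"   // backslash
        case "\"": body += "\\\""   // double quote
        case "\n": body += "\\n"
        case "\r": body += "\\r"
        case "\t": body += "\\t"
        case "\0": body += "\\0"
        default: body.unicodeScalars.append(scalar)
        }
    }
    return "\"" + body + "\""
}
```

Going the other way (hiding the escapes while editing) is the inverse mapping, which is where the UI questions start.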

If the UI for that is done well, you'll want to use it for all strings. And if it's done badly you'll never want to use it, including for raw string literals.

Requiring an IDE to hide how ugly and hard to use something actually is sounds like an indication that the underlying thing is inherently flawed.

No, this is emphatically not the idea. Raw string literals are plenty capable of working on their own, as is, without an IDE. I'm just saying that Xcode can choose to show it differently as it does currently with other literals.

No. We need this feature to work with people not using IDEs, and it will have to have a standard representation that can be easily read.

Like I've said, IDEs are here to improve the experience. This doesn't mean that the current proposal is "ugly" or "hard". I don't understand where the hesitation around this is coming from, given we already have a syntax for file, color, and image literals that work very similarly and have visual "sugar" in Xcode.

@saagarjha makes a good point that #raw isn't the most descriptive name. While renaming to rawLiteral might add clarity, it just emphasises the point that we should have real literal syntax for this.

Raw strings don't fit the current #___Literal model. Those spellings are currently used to represent literals with relatively magic-less #function() markup, denoting where the compiler needs to add custom UI. A custom UI would not be helpful for strings.

I like @hooman's suggestion of a more compact syntax, using ( and " to wrap the delimiter, although it's not the most intuitive design, and can't really be discovered through autocomplete.

If we're looking for a more lightweight syntax, we should just introduce a new literal, along the lines of:

'delimiter'some.code("hello")'delimiter'

Likewise, this can't really be discovered through autocomplete, so it's not very discoverable, but advanced literals generally aren't for newcomers.
However, this is concise and fairly clear (highlighting should help), and as a literal, the magic is precedented.

I think it's possible to provide UI for this: maybe a text field that asks for a delimiter and a text area that lets you paste your raw string in. It might also be able to warn you if your string contains the delimiter you've chosen.

We could, but I'm not sure we should.

We could have a text field for each parameter of a function call. Or a dropdown for #os(_) checks. (Actually, that last one sounds quite handy)

But the only real benefit from custom UI, in the case of raw strings, is checking whether the delimiter is present, as you describe. It seems more of a convenience than something we should base the syntax around.
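
For what it's worth, that check is small enough to live in a linter or the compiler's diagnostics rather than in custom UI. A rough sketch, where delimiterIsSafe(_:for:) is a hypothetical name and not anything from the proposal:

```swift
import Foundation

/// Hypothetical check: a delimiter is usable only if it is non-empty
/// and does not occur anywhere inside the raw text it would wrap.
func delimiterIsSafe(_ delimiter: String, for rawText: String) -> Bool {
    // An empty delimiter can never mark the end of the string.
    guard !delimiter.isEmpty else { return false }
    return !rawText.contains(delimiter)
}
```

So delimiterIsSafe("###", for: "some.code(\"hello\")") passes, while any text that itself contains ### would be flagged.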

Honestly, I think it was a mistake that I even brought up IDE support, since it's just caused this discussion to fall off track from its original goal: discussing the syntax of raw string literals. If it's possible, I'd like to take back everything I've said about IDE assistance and make the case that the string literal syntax is good even without IDE support, and follows the syntax of the rest of the literals we have currently. As a refresher, here's what they look like today:

#fileLiteral(resourceName: "/dev/null")
#colorLiteral(red: 0, green: 0, blue: 0, alpha: 0)
#imageLiteral(resourceName: NSImageNameMobileMe)

Based on this, I think #rawStringLiteral(delimiter:_:) is the way to go. It would be used like so:

#rawStringLiteral(delimiter: "DELIMITER", DELIMITERthis is a raw stringDELIMITER)

This provides the benefits of matching the syntax of the other literals and enhancing clarity due to its relatively descriptive name. Also, it allows for essentially any delimiter to be used, and passes the test I mentioned earlier.

Eagle-eyed observers may have noticed that I've changed my suggestion somewhat from the original #rawLiteral(delimiter:value:) syntax that did not have the delimiter be a string. After consideration, I changed it because:

  • Someone mentioned #raw (and presumably #rawLiteral) aren't very explicit on what they construct. #rawStringLiteral clearly produces a string.
  • delimiter is now a full-fledged string because otherwise including , in the delimiter would have been very hard for the parser to support. This string can be a normal string, in which case normal escaping rules apply, can be a multi-line string, or can even be a #rawStringLiteral itself.
  • I dropped the value: label because I didn't feel it was providing enough value (irony not entirely unintended).

As I said earlier:

In this case, #rawStringLiteral doesn't represent the literal, it is the literal. Having Literal in the name seems redundant.

I don't think I'm getting you. What makes raw string literals different from file or color or image literals? As far as I'm concerned, they're all pseudo-functions that convert a string of characters in the source file into a literal type.

#rawString(delimiter: "###", ###some.code("hello")###)
and
'delimiter'some.code("hello")'delimiter'
are fundamentally just different spellings of the same thing. They're literals, they have magic rules which allow a string to use a custom delimiter.

Currently, #___Literal is used when true literals cannot be represented in text. The actual appearance of the literal is left up to the IDE. The textual representation can be verbose, since most users won't see it.

Here, #rawString is the literal: it is what users will be expected to write out manually and later read. This should be the main consideration in finding a spelling.

This isn't just a function with custom UI, this is a fundamental syntax which needs language support.

For this reason, I think we shouldn't be looking to #___Literal spellings at all, but rather existing, first-class literals.

I'd actually say that 'delimiter'some.code("hello")'delimiter' needs #rawString around it as well. What it's representing is not something that can be represented in normal text. The Swift compiler is still performing a transformation from 'delimiter'some.code("hello")'delimiter' to some.code("hello"), just as #colorLiteral(red: 0, green: 0, blue: 0, alpha: 0) ends up becoming NSColor(srgbRed: 0.0, green: 0.0, blue: 0.0, alpha: 0.0). "Text" and "raw strings" look very similar, but they're not the same thing because raw strings are meaningless without a delimiter.

I'm afraid I can't follow your reasoning.

The difference here is that #colorLiteral(red: 0, green: 0, blue: 0, alpha: 0) could very well just be a function, implemented differently based on platform. The IDE doesn't need the # there to apply custom UI, although it helps resolve ambiguity. In other words, despite the name, these aren't actually literals.

A raw string cannot be represented as a function, otherwise we wouldn't need this proposal.

One very simple solution is to extend the multiline string syntax to allow any number (at least 3) of double-quote characters as the delimiter:

let myVeryExcellentString = """"""""""""
    let address = """
        John Jacob Jingleheimer Schmidt
        555 Main Street
        Lake Wobegon, MN
        """
    print("\(address)")
    """"""""""""

If at least 4 double-quotes are used, then it is a raw string where nothing is escaped.
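
Under that rule, picking a delimiter is mechanical: use one more quote than the longest run of quotes inside the text, with a floor of 4 so the literal is raw. A small sketch of that calculation, assuming the rule works as described here (quoteCountNeeded(toWrap:) is a made-up helper name):

```swift
/// Hypothetical helper: the smallest number of double quotes that can
/// delimit `text` as a raw string under the "4 or more quotes" rule.
func quoteCountNeeded(toWrap text: String) -> Int {
    var longestRun = 0
    var currentRun = 0
    for scalar in text.unicodeScalars {
        if scalar == "\"" {
            currentRun += 1
            longestRun = max(longestRun, currentRun)
        } else {
            currentRun = 0
        }
    }
    // The delimiter must be longer than any run of quotes in the text,
    // and at least 4 quotes long to count as raw.
    return max(4, longestRun + 1)
}
```

For the example above, the longest run inside the text is 3 quotes, so 4 would already be enough; the 12 used there just leaves plenty of headroom.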

Trying to summarise this discussion, I would say there is an underlying tension between people who:

  • Think these strings will be rare, so are happy with some semi-verbose to verbose marker (#rawString, #rawStringLiteral, #stringLiteralWithoutEscaping(customDelimiter:, …), several dozen quotation marks in a row plus the string has to be multi-line, etc).
  • Think that these strings will be common, so want a concise marker (r"…", \"…", '…', etc).

#raw is probably somewhere in the middle there, but it's unclear if that puts it in the Goldilocks zone or no man's land.

As noted by several people, including the author, the original proposal didn't really do itself any favours here because it lacked good examples, especially since it's generally agreed that regular expressions deserve special attention of another form. The updated proposal has some more examples, which are somewhat helpful, but aren't really definitive for me.

A related issue is the question of how many special string forms Swift should have. If you want several more, then a syntax that generalises would be preferred (e.g. verbose: #specialString(arguments:…), concise: r"…", s"…", t"…", …). If you think raw strings are about the last form needed then something simpler will suffice (e.g. verbose: #rawString(…), concise: \"…", '…' if it's not reserved for character-like things).

Thanks for this. It is a very concise summary of exactly the status of this review.

I’ve not been able to get the new version of the proposal merged, so we’ll have to proceed as is. The bulk of the multi-line string review was discussed with reference to @beccadax’s excellent rewrite, which was never merged, which was a shame.

The current version of the proposal I have PR’d is here now

As before, if you can think of any worthwhile updates, please file a PR.

There are very few changes with respect to the last revision, except that I have added that the proposal does not put forward custom delimiters due to their complexity.

I am a strong -1 because of this.

Custom delimiters are a minimum requirement for in-source raw strings. The only other viable option is to externalize raw strings into their own files, and introduce syntax for assigning the text content of a file to a variable at compile-time.

The best possible outcome for this proposal now is that it gets returned for revision. Second-best would be “Rejected on the specifics, but the idea itself is not rejected.”

If @johnno1962 were to change that to allow a single-character custom delimiter as suggested upthread, namely:

#rawStringLiteral(X"unescaped raw string"X)

where X could be any Character chosen by the end-coder and known a priori to not be included in the literal string (with a similar version X""" and """X I suppose), would that flip your negative response?

Any Character? You mean like this one? C̷̙̲̝͖ͭ̏ͥͮ͟

I suppose one Character is technically sufficient, but it would take a lot to convince me that it is the best option.
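
For anyone wondering why "any Character" is a wider net than it sounds: Swift's Character is an extended grapheme cluster, so a letter plus a pile of combining marks still counts as one. A small sketch, using a shortened version of the character above (the two combining scalars are just picked for illustration):

```swift
// "C" followed by two combining marks (U+0337 and U+0319).
let delimiter = "C\u{0337}\u{0319}"

// Counted as Characters (grapheme clusters) this is a single element...
let characterCount = delimiter.count
// ...but it is made of three Unicode scalars.
let scalarCount = delimiter.unicodeScalars.count
// And it does not compare equal to a plain "C".
let firstIsPlainC = delimiter.first == "C"

print(characterCount, scalarCount, firstIsPlainC)
```

So a single-Character delimiter is not necessarily something you can type, or even tell apart from its neighbours at a glance.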