SE-0200: "Raw" mode string literals

This restriction would be annoying if I were, say, making a string that contained the set of printable ASCII characters, because I wouldn't be able to use anything from the Basic Latin Unicode block as a delimiter. Plus, at a glance it's hard to make out what comprises the raw string in this example, IMO:

  • It's not clear to me what is "raw" about this string representation. I get that it's raw in the sense of being uninterpreted, or unescapable, or whatever word describes it, but I think "raw" tends to suggest some numeric sequence, such as code units.

  • All the syntax seems kind of arbitrary and desperate. It's another of those scenarios where using words to spell out the meaning is clear but too much, and using special symbols is perpetually obscure.

We already have some arbitrary and desperate string syntax: multi-line literals. Is there some reason we couldn't recycle that approach? For example:

    print ("""a\bc""") // prints: a\bc

The compiler can currently recognize both these triple-" delimiters (it gives me explicit errors about why this is a malformed multi-line literal). Now that those are part of the language, single-line "multi-line" literals might not be too scary.

That syntax would largely eliminate the problem of not being able to have literal quotes:

    print ("""a\"b"c""") // prints: a\"b"c

The only exclusion from string contents would be triple-" substrings and line endings, for which a true multi-line multi-line literal would be the solution.
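A minimal Swift sketch of the content rule just described. It assumes, as with today's multi-line literals, that the literal closes at the first `"""` after the opener. `fitsSingleLineTripleQuoted(_:)` is a hypothetical name, not an existing API. Under that assumption, content that ends in `"` also has to be ruled out, because it would run into the closing delimiter.

```swift
import Foundation

/// Hypothetical check: can `content` be written as a single-line
/// triple-quoted literal such as """a\"b"c"""?
/// Assumes the literal ends at the first """ after the opening one.
func fitsSingleLineTripleQuoted(_ content: String) -> Bool {
    // Line endings call for a true multi-line literal instead.
    if content.contains(where: { $0.isNewline }) { return false }
    // An embedded """ would end the literal early.
    if content.contains("\"\"\"") { return false }
    // A trailing " would merge with the closing delimiter into """",
    // so the first """ would begin one character too early.
    if content.hasSuffix("\"") { return false }
    return true
}

print(fitsSingleLineTripleQuoted("a\\\"b\"c"))   // the text a\"b"c: prints true
print(fitsSingleLineTripleQuoted("x \"\"\" y"))  // contains """: prints false
```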


I think a good test of many of these designs is: can I put the source code of the average Swift program into these? It's surprising how many of these examples fail this test. Unfortunately, multi-line literals are among the ones that don't pass.


I'm sorry, I don't understand what you just said. What fails?

It's implicit in what I suggested that the contents of the "raw" string have no newlines. If they do, you may as well use a true multi-line string anyway.

Hmm, why not? I don't see why we can't have newlines in this string.

This is the part that fails. I can't paste the contents of a Swift program containing its own multi-line strings into a multi-line string and have it work.

Oh, I see what you're saying. However, none of the syntaxes suggested in this discussion, including both the proposal document and your earlier suggestion, automatically meets this test. They require a choosable delimiter, and a careful choice of outer delimiter to avoid conflicts with inner ones, which is not an entirely trivial task when enclosing a body of code.

(IIUC, true multi-line literals do meet the test, kinda, provided they get extra indentation when pasted. I think.)

So, yes, if nestability is an additional requirement, the syntax must provide for custom delimiters.

But you'd first have to convince everyone of the need for such a requirement, and I don't think we're there yet.
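The "careful choice of outer delimiter" mentioned above can at least be mechanized. Here is a rough Swift sketch, assuming the literal ends at the first occurrence of the delimiter after the opening one. `chooseDelimiter(for:base:)` and the `DELIMITER`, `DELIMITER1`, ... candidate scheme are illustrative assumptions, not part of any proposal.

```swift
import Foundation

/// Hypothetical helper: choose an outer delimiter for enclosing `body`.
/// Assumes the literal ends at the first occurrence of the delimiter after
/// the opening one. Candidates are "DELIMITER", "DELIMITER1", "DELIMITER2", ...
func chooseDelimiter(for body: String, base: String = "DELIMITER") -> String {
    var counter = 0
    while true {
        let candidate = counter == 0 ? base : base + String(counter)
        // Reject a candidate that occurs inside the body, or that would be
        // found early by straddling the end of the body and the closing copy.
        if !(body + String(candidate.dropLast())).contains(candidate) {
            return candidate
        }
        // Terminates for the default base: its first letter never recurs,
        // so a long enough counter yields a candidate absent from the body.
        counter += 1
    }
}

let snippet = "let s = \"DELIMITER\" // already uses the obvious choice"
print(chooseDelimiter(for: snippet))  // prints DELIMITER1
```

The straddling check matters for bases whose first character recurs later in the base; with a base like `DELIMITER` it reduces to "not contained in the body".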

I agree with you that this is probably a requirement. Multi-line string literals, as they are currently designed, are basically literals with a delimiter of """, which unfortunately makes them not suitable for the task.

Sure, but they at least make it possible to do so.

That's what the entire pitch phase was about, no?

Not r"...", no.

Well, OK, the pseudo-pitch/review phase we just had, where @johnno1962 revised the proposal to use #raw, possibly with a custom delimiter.

Interesting argument against raw string literals.

If the idea is to have the IDE show you the raw string in some kind of bubble and hide the true syntax, then we don't need a syntax for raw string literals. The IDE is perfectly capable of inserting the \ escapes for you and hiding them when editing. It could show nested bubbles of some kind when you want interpolation or special characters, making this applicable to all strings.

If the UI for that is done well, you'll want to use it for all strings. And if it's done badly you'll never want to use it, including for raw string literals.
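For a sense of what "inserting the \ escapes for you" involves, here is a rough Swift sketch of the escaping an editor would do behind the scenes. `swiftStringLiteral(escaping:)` is a hypothetical helper, and it only covers the escapes shown (`\\`, `\"`, `\n`, `\r`, `\t`, `\0`, and `\u{...}` for other control characters).

```swift
/// Hypothetical helper: turn arbitrary text into an ordinary Swift string
/// literal that evaluates back to the same text.
func swiftStringLiteral(escaping text: String) -> String {
    var result = "\""
    for scalar in text.unicodeScalars {
        switch scalar {
        case "\\": result += "\\\\"
        case "\"": result += "\\\""
        case "\n": result += "\\n"
        case "\r": result += "\\r"
        case "\t": result += "\\t"
        case "\0": result += "\\0"
        default:
            if scalar.properties.generalCategory == .control {
                // Remaining control characters use the \u{...} form.
                result += "\\u{" + String(scalar.value, radix: 16) + "}"
            } else {
                result.unicodeScalars.append(scalar)
            }
        }
    }
    result += "\""
    return result
}

let pasted = "print(\"a\\bc\")"
print(swiftStringLiteral(escaping: pasted))  // prints "print(\"a\\bc\")"
```

Because a backslash is always doubled, any `\(` in the pasted text comes out as `\\(`, so no accidental interpolation is introduced.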

Requiring an IDE to hide how ugly and hard to use something actually is suggests that there is something inherently wrong with the underlying thing.


No, this is emphatically not the idea. Raw string literals are plenty capable of working on their own, as is, without an IDE. I'm just saying that Xcode can choose to show it differently as it does currently with other literals.

No. We need this feature to work with people not using IDEs, and it will have to have a standard representation that can be easily read.

Like I've said, IDEs are here to improve the experience. This doesn't mean that the current proposal is "ugly" or "hard". I don't understand where the hesitation around this is coming from, given we already have a syntax for file, color, and image literals that work very similarly and have visual "sugar" in Xcode.

@saagarjha makes a good point that #raw isn't the most descriptive name. While renaming to rawLiteral might add clarity, it just emphasises the point that we should have real literal syntax for this.

Raw strings don't fit the current #___Literal model, in which those literals are written as relatively magic-less #function() markup that marks where the IDE needs to add custom UI. A custom UI would not be helpful for strings.

I like @hooman's suggestion of a more compact syntax, using ( and " to wrap the delimiter, although it's not the most intuitive design, and can't really be discovered through autocomplete.

If we're looking for a more lightweight syntax, we should just introduce a new literal, along the lines of:

    'delimiter'some.code("hello")'delimiter'

Likewise, this can't really be discovered through autocomplete, though advanced literals generally aren't aimed at newcomers.
On the other hand, it is concise, fairly clear (highlighting should help), and, as a literal, its magic has precedent.

I think it's possible to provide UI for this: maybe a text field that asks for a delimiter and a text area that lets you paste your raw string in. It might also be able to warn you if your string contains the delimiter you've chosen.
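A small Swift sketch of that warning: given the delimiter typed into the text field and the pasted text, find every place the delimiter already occurs so an editor could highlight them. `delimiterConflicts(_:in:)` is a made-up name; it relies on Foundation's `range(of:options:range:locale:)` and reports non-overlapping matches.

```swift
import Foundation

/// Hypothetical editor check: every range where `delimiter` already occurs
/// in the pasted `body`, so the UI could highlight the conflicts.
func delimiterConflicts(_ delimiter: String, in body: String) -> [Range<String.Index>] {
    precondition(!delimiter.isEmpty, "an empty delimiter would match everywhere")
    var conflicts: [Range<String.Index>] = []
    var searchStart = body.startIndex
    // Search again after each match, so matches never overlap.
    while let found = body.range(of: delimiter, range: searchStart..<body.endIndex) {
        conflicts.append(found)
        searchStart = found.upperBound
    }
    return conflicts
}

let body = "a ### b ### c"
print(delimiterConflicts("###", in: body).count)  // prints 2
```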

We could, but I'm not sure we should.

We could have a text field for each parameter of a function call. Or a dropdown for #os(_) checks. (Actually, that last one sounds quite handy.)

But the only real benefit from custom UI, in the case of raw strings, is checking whether the delimiter is present, as you describe. It seems more of a convenience than something we should base the syntax around.

Honestly, I think it was a mistake that I even brought up IDE support, since it's just caused this discussion to fall off track from its original goal: discussing the syntax of raw string literals. If it's possible, I'd like to take back everything I've said about IDE assistance and make the case that the string literal syntax is good even without IDE support, and follows the syntax of the rest of the literals we have currently. As a refresher, here's what they look like today:

    #fileLiteral(resourceName: "/dev/null")
    #colorLiteral(red: 0, green: 0, blue: 0, alpha: 0)
    #imageLiteral(resourceName: NSImageNameMobileMe)

Based on this, I think #rawStringLiteral(delimiter:_:) is the way to go. It would be used like so:

    #rawStringLiteral(delimiter: "DELIMITER", DELIMITERthis is a raw stringDELIMITER)

This provides the benefits of matching the syntax of the other literals and enhancing clarity due to its relatively descriptive name. Also, it allows for essentially any delimiter to be used, and passes the test I mentioned earlier.

Eagle-eyed observers may have noticed that I've changed my suggestion somewhat from the original #rawLiteral(delimiter:value:) syntax that did not have the delimiter be a string. After consideration, I changed it because:

  • Someone mentioned that #raw (and presumably #rawLiteral) isn't very explicit about what it constructs. #rawStringLiteral clearly produces a string.
  • delimiter is now a full-fledged string because otherwise supporting a , in the delimiter would have been very hard on the parser. This string can be a normal string (in which case the normal escaping rules apply), a multi-line string, or even a #rawStringLiteral itself.
  • I dropped the value: label because I didn't feel it was providing enough value (irony not entirely unintended).
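To make the parsing argument concrete, here is a simplified Swift sketch that extracts the body from the proposed `#rawStringLiteral(delimiter:_:)` spelling. It is an illustration under stated assumptions, not how the compiler would do it: `parseRawStringLiteral(_:)` is hypothetical, the delimiter argument is treated as a plain string literal with no escapes (multi-line or nested raw delimiters are not handled), and spacing is fixed. Because the delimiter is quoted, it can contain `,` or `)` without confusing the argument list.

```swift
import Foundation

/// Hypothetical parser for the proposed spelling
///     #rawStringLiteral(delimiter: "D", DbodyD)
/// Simplifications: the delimiter is a plain string literal with no escapes,
/// and the spacing is exactly as shown.
func parseRawStringLiteral(_ source: String) -> String? {
    let prefix = "#rawStringLiteral(delimiter: \""
    guard source.hasPrefix(prefix) else { return nil }
    let rest = source.dropFirst(prefix.count)
    // The delimiter ends at the next quote, so it may contain "," or ")".
    guard let quote = rest.firstIndex(of: "\"") else { return nil }
    let delimiter = String(rest[..<quote])
    guard !delimiter.isEmpty else { return nil }
    // Expect ", " followed by the opening copy of the delimiter.
    let afterArgument = rest[rest.index(after: quote)...]
    let opener = ", " + delimiter
    guard afterArgument.hasPrefix(opener) else { return nil }
    let body = afterArgument.dropFirst(opener.count)
    // The body ends at the first closing copy, which must be followed by ")".
    guard let close = body.range(of: delimiter),
          body[close.upperBound...].hasPrefix(")") else { return nil }
    return String(body[..<close.lowerBound])
}

let example = "#rawStringLiteral(delimiter: \"a,b\", a,bprint(\"x\\y\")a,b)"
print(parseRawStringLiteral(example) ?? "not a valid literal")  // prints print("x\y")
```

A mistyped closing delimiter simply leaves the literal unterminated, so the sketch returns `nil` for it.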

As I said earlier:

In this case, #rawStringLiteral doesn't represent the literal, it is the literal. Having Literal in the name seems redundant.

I don't think I'm following you. What makes raw string literals different from file, color, or image literals? As far as I'm concerned, they're all pseudo-functions that convert a sequence of characters in the source file into a value of a literal type.

    #rawString(delimiter: "###", ###some.code("hello")###)

and

    'delimiter'some.code("hello")'delimiter'

are fundamentally just different spellings of the same thing. They're both literals, with magic rules that allow a string to use a custom delimiter.

Currently, #___Literal is used when true literals cannot be represented in text. The actual appearance of the literal is left up to the IDE. The textual representation can be verbose, since most users won't see it.

Here, #rawString is the literal, what users will be expected to manually write out, and later read. This should be the main consideration in finding a spelling.

This isn't just a function with custom UI, this is a fundamental syntax which needs language support.

For this reason, I think we shouldn't be looking to #___Literal spellings at all, but rather existing, first-class literals.


I'd actually say that 'delimiter'some.code("hello")'delimiter' needs #rawString around it as well. What it represents is not something that can be written as normal text. The Swift compiler still performs a transformation from 'delimiter'some.code("hello")'delimiter' to some.code("hello"), just as #colorLiteral(red: 0, green: 0, blue: 0, alpha: 0) ends up becoming NSColor(srgbRed: 0.0, green: 0.0, blue: 0.0, alpha: 0.0). "Text" and "raw strings" look very similar, but they're not the same thing, because raw strings are meaningless without a delimiter.
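The transformation described above can be sketched as a lexer step over source text. The `'delimiter'...'delimiter'` spelling is only a suggestion from this thread and does not compile in Swift, so this Swift sketch works on the raw source instead. `lexCustomDelimitedLiteral(_:)` is a hypothetical name, and the sketch assumes the literal ends at the first repeat of the opening fence.

```swift
import Foundation

/// Hypothetical lexer step for the spelling 'delimiter'...'delimiter'.
/// Given source text that starts at the literal, returns the raw contents
/// (nothing is unescaped) and the source text that follows the literal.
func lexCustomDelimitedLiteral(_ source: Substring) -> (contents: Substring, rest: Substring)? {
    // Opening fence: a quote, a non-empty delimiter name, another quote.
    guard source.first == "'" else { return nil }
    let afterQuote = source.dropFirst()
    guard let nameEnd = afterQuote.firstIndex(of: "'") else { return nil }
    let name = afterQuote[..<nameEnd]
    guard !name.isEmpty else { return nil }
    let fence = "'\(name)'"
    let bodyStart = afterQuote.index(after: nameEnd)
    // The literal ends at the first repeat of the fence.
    guard let close = source[bodyStart...].range(of: fence) else { return nil }
    return (source[bodyStart..<close.lowerBound], source[close.upperBound...])
}

let input: Substring = "'delimiter'some.code(\"hello\")'delimiter' + more"
if let lexed = lexCustomDelimitedLiteral(input) {
    print(lexed.contents)  // prints some.code("hello")
}
```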