SE-0200: "Raw" mode string literals

  • What is your evaluation of the proposal?

-1

  • Is the problem being addressed significant enough to warrant a change to Swift?

It would be nice to not have to escape regexes, but I think a more holistic proposal regarding string literals in Swift would be better: one that addresses the shortcomings of the current multiline literals, introduces raw literals, and makes them work well together, as well as with the current simple string design. Until then, all we're doing is adding more and more complexity to string literals in Swift.
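
To make the escaping cost concrete, here is a minimal sketch using today's ordinary string literals with Foundation's NSRegularExpression. The pattern and the sample text are made up for illustration and do not come from the proposal.

```swift
import Foundation

// Hypothetical pattern: a version number such as "4.1".
// As a regex this is \d+\.\d+, but an ordinary Swift literal needs every
// backslash doubled.
let pattern = "\\d+\\.\\d+"

let text = "Swift 4.1 shipped before this review"
let regex = try! NSRegularExpression(pattern: pattern)
let searchRange = NSRange(text.startIndex..., in: text)

// Convert the first match back into a Swift String, if there is one.
let found = regex.firstMatch(in: text, range: searchRange)
    .flatMap { Range($0.range, in: text) }
    .map { String(text[$0]) }

print(found ?? "no match")
```

The regex has three backslashes; the literal needs six. That doubling is what a raw literal of any spelling would remove.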

  • Does this proposal fit well with the feel and direction of Swift?

Although the current direction seems to be making string literals more complex as well as more powerful, I'd like to see something that unifies all of the representations.

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

I'm not sure, since I've never needed it in the other languages I've used. Perhaps because they have distinct raw characters?

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Read the proposals, read this thread.

What is your evaluation of the proposal?

-1. I agree with previous comments that we need to have first-class (typed) support for regular expressions, that // would be the right direction and that r"" will be a very confusing and incoherent syntax for "raw strings".

Is the problem being addressed significant enough to warrant a change to Swift?

For strings which are not regular expression patterns – no, in my opinion, it's not. For regular expressions – yes, but again, it should be considered as a separate problem.

Does this proposal fit well with the feel and direction of Swift?

No. The r"" syntax is very inconsistent with the feel and direction of Swift.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

Yes, I did. And I think it's not a good syntax.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Read the proposal and some comments in this thread.

OK.. so nobody likes r"a string" as a syntax judging by the number of -1's coming through.

I'm surprised there wasn't more take-up on @Tino's excellent suggestion yesterday:

 "\qa string with \\'s"

 """\q
    a string with \\'s
    """

There is an Xcode toolchain implementing it available here. I've taken it a step further and there are two modes: \Q, which processes absolutely no escapes, and \q, which processes no escapes but still allows interpolation. That might sound complex, but it is probably what you really want.

By way of a summary of the thread so far, the toolchain supports:

print(r"SE-200" )
print(r"""
      SE-200
      """ )

print(#raw("SE-200"))
print(#raw(X"SE-200"X))
print(#raw(<"SE-200">))

print(#rawStringLiteral("""
    SE-200
    """))

print("\qSE-\(200)")
print("\QSE-200")

print("""\q
    SE-\(200)
    """)
print("""\Q
    SE-200
    """)

I really don't like having the modifiers be inside the quotation marks. it looks like the \q is part of the string contents instead of an attribute of the literal. i don't get the hate towards r" i used it in python and never had a problem with it…

I do think it's a little amusing that the Commonly Rejected Changes page actually references this use case as a preferred fit for single quoted strings...

I'd rather see this succeed as a use for '': Prepitch: Character integer literals

Clearly I need to pay closer attention to evolution if I missed this coming!

I definitely like the feature. I think the r"contents" syntax is fine; the \"contents" syntax is an intriguing alternative (I read it as "backslash everything in this string"), and the #raw("contents") syntax is probably fine too. I don't like the longer alternatives to #raw; we must remember that identifiers like #imageLiteral aren't actually intended to ever be seen by a user, while this syntax would be written by hand.

A few people have complained that they don't think #raw totally explains the construct, but I don't really buy that argument. Literal syntax is part of the language grammar; it's something you should learn and eventually remember. #raw is searchable without being obstructively verbose. The backslash and leading-identifier syntaxes are harder to search, but they're at least visually distinctive enough to know that you're seeing something a little unusual.

I think the alternate delimiter question is severable and should be considered separately, but for my part, I would support single quotes as an alternative delimiter. Swift inherits many things from C, but its string handling more closely resembles scripting languages (especially Perl 6), which usually treat ' as an alternate delimiter. Besides, if there's one thing about C I wouldn't emulate, it's C's treatment of strings.

We could also consider additional options for string delimiters (I think that at one point, I suggested that any odd number of " or ' characters should be treated as a valid delimiter, and the single-line/multi-line behavior of a string should depend on whether the opening delimiter is followed by a newline or not), but I don't think we need to decide that now.

If it's not too late I want to pitch a different approach. Everyone here is talking about another verbose styled #rawString or a way to reuse the good old " and ' characters as delimiters for the raw string. A lot of you guys already write a different type of string (visually speaking) in this forum by using the markdown way with back-ticks. The only issue with my pitch is that the single back-ticks are already taken in Swift for something like

case `default`

so we cannot create a perfect symmetrical raw string literal to the existing " and """ versions, or can we? (Someone please clarify if this still could be made unambiguous.)

At least the multi-line raw string literal would be possible, and most of us are already used to this:

let rawString = ```
write whatever you want in here
``` 

Ideally the other literal would look like this (but I'm not sure if it's possible due to the fact from before):

let rawString = `write whatever you want in here`

An intriguing suggestion. How would this handle wanting to put backticks inside the raw string?

Well, that's the thing: a purely raw string isn't possible with this approach, and I would advocate against completely swallowing the backslash in the multi-line string literal, because some people, like myself, have linter rules where exceeding the 80-character line width is an error. Therefore I need the trailing backslash to wrap a string literal. Even in the raw string literal it would be useful.
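
For reference, this is the existing behaviour being defended here: in today's multi-line literals, a backslash at the end of a line suppresses the line break. A minimal sketch, with made-up text:

```swift
// A trailing backslash in a """ literal joins the next source line onto
// this one, so long text can stay within an 80-column limit.
let wrapped = """
    This sentence is split across two source lines \
    but the resulting string contains no newline.
    """

// Prints the sentence on a single line.
print(wrapped)
```

A fully raw multi-line literal would have to treat that trailing backslash as ordinary text, which is the trade-off raised in this post.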

That said I'm against this aggressive change:

The \ character would lose its role as an escaping introducer altogether.

Instead I would prefer to weaken it and call the whole thing a semi-raw string.


@davedelong: To answer your question, I'd go with the same rule we currently have with the multi-line """ string literal if that's possible.


At least this is how I would imagine it. If there is a better solution using back-ticks then go for it. ;)

The same way you add a double quote in an r"" string. You can't.

In my opinion, it's a worthwhile goal to make sure that any raw string approach allows for a custom delimiter.

Maybe it would be something like:

let raw = ```q
this is my raw string with ``` in it
q```

Putting a letter after the opening triple-backticks could be a way to specify the delimiter. (it doesn't have to be q; that was just an example)
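
One way to read this suggestion, as a rough sketch only. The helper name `rawBody(in:tag:)` is hypothetical, and so is the rule that the opener ends its line and the closer starts its own line; neither is specified anywhere in the thread.

```swift
import Foundation

// Built with String(repeating:) so this listing contains no literal fence.
let fence = String(repeating: "`", count: 3)

/// Hypothetical helper: returns the text between an opening fence followed
/// by `tag` and the first later line that starts with `tag` plus a fence.
func rawBody(in source: String, tag: Character) -> String? {
    let opener = fence + String(tag) + "\n"
    let closer = "\n" + String(tag) + fence
    guard let open = source.range(of: opener),
          let close = source.range(of: closer,
                                   range: open.upperBound..<source.endIndex)
    else { return nil }
    return String(source[open.upperBound..<close.lowerBound])
}

// The example from the post, with a bare fence inside the body.
let source = "let raw = \(fence)q\nthis is my raw string with \(fence) in it\nq\(fence)"
print(rawBody(in: source, tag: "q") ?? "no literal found")
```

Because the closer must include the tag, the bare triple backticks in the body do not end the literal. A body that itself contains the tag followed by the fence at the start of a line would still need a different tag.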

This is interesting. In that case why not include full support for code highlighting. Creating and highlighting code blocks - GitHub Docs

I haven't checked the latest version of Swift, but I remember that Swift doc comments accept code blocks. Swift Documentation - NSHipster

I am +1, as long as we can still support inline tests using the markdown block comments like Rust does. Documentation - The Rust Programming Language

#raw(`this should still allow ` inside here `)

It seems like, ultimately, any kind of delimiter character chosen would not be completely failproof. There is one solution that could allow any character and be fine with it... if the actual string was in a different file that is referenced altogether. This would be more or less #fileLiteral though... What are the use cases that make us want a diverse-enough range of strings that no character can be considered safe, but still not large or separated enough that it warrants a separate file?

Well currently there is an issue that would speak against it.

Other than that, I don't see any other reasons against that approach and I think it's worth thinking about it.


Edit: What if you wanted the string to be file private to not expose it to the whole module?

Thread wrap up

Tomorrow is the end of the review period so I'd like to try to summarise what I have taken from the thread.

I've published an updated version of the proposal for discussion here.

Summarising the story so far, this proposal puts forward a syntax for raw string literals, which are strings where \ is passed through to the literal and has no special escaping powers. The suggested syntax is now #raw("a string with a \") rather than r"a string with a \", as in the original proposal merged into swift-evolution master at the beginning of this process.

This version of the proposal includes "custom delimiters lite" as originally put forward by @hooman at the beginning of the thread and later echoed by @Erica_Sadun.

With the "closing delimiter would be reversed" rule removed so that your favourite emoji works, this would be a valid raw string:

print(#raw(🤔"SE-200"🤔))

In addition, the new version of the proposal includes what you could call "interpolating raw strings", which ignore escapes except for the sequence \(…), which can be used to interpolate. I propose the following syntax with double bracketing of the literal to turn this variant on:

print(#raw(("SE-\("200")")))

Revisiting the code which prompted me to start this process, I realised it doesn't take long before you find yourself needing this feature, and the sequence \( is sufficiently rare outside Swift that this shouldn't be a problem.

You can combine raw literals, custom delimiter syntax and interpolation with multi-line string literals

print(#raw((🤔"""
    SE-\(200)
    """🤔)))

Multiline raw literals would follow the same indentation removal rules as before.

That about covers it unless there is something I've missed. There is an Xcode toolchain with an implementation available here

I'm now firmly of the opinion that #raw("a literal") is about as good as we're going to get. With any raw literal there is absolutely no way to escape the delimiter, so all single character delimiters (whichever quote you use) are simply not useful in the general case. This excludes r"a string", \"a string", "\qa string", 'a string' from contention. You get many more options if you move to #raw("a string") which has, at a minimum, a closing delimiter of ") and you can now extend that using custom delimiter sequences.

To give the core team something to go on, I'd like to call a straw poll on this new version of the proposal. Please respond with what you would like the outcome to be:

+1: Accepted with revisions - New version of the proposal looks good.
-1: Reject - I don't see the need for the feature

Apologies for the slightly chaotic progression of the review but I really feel we have explored the space quite thoroughly now and been able to put in front of the Core Team enough information for them to make a decision.

It's not the most lightweight choice, but all compact solutions (like single quotes) would fail for strings that contain that specific character, so I think some verbosity can't be avoided. Additionally, it would be straightforward to introduce variations with different rules.
In isolation, it seems like the best solution, and I guess we'll have to accept the uncertainty of how a yet to be defined regex-literal will interfere with this.

Could you update the example so that it illustrates the explanation?

#raw({[?</"how will this be closed?

I think that would be #raw({[?</"a string"}]?>/) in the current implementation, which isn't ideal.
Difficult to see an alternative if you want good emoji support.
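
As a sketch of the rule this answer implies: the mapping below is an assumption inferred from the single example above (each bracket is swapped for its partner and the character order is left unchanged). It is not taken from the toolchain source, and `closingDelimiter(for:)` is a hypothetical name.

```swift
// Assumed partner table; only the bracket kinds seen in the example.
let partners: [Character: Character] = ["{": "}", "[": "]", "<": ">"]

/// Hypothetical helper: derives the closing delimiter by swapping each
/// bracket for its partner, keeping every other character and the order.
func closingDelimiter(for opening: String) -> String {
    return String(opening.map { partners[$0] ?? $0 })
}

print(closingDelimiter(for: "{[?</"))
```

Under this assumed rule an emoji delimiter maps to itself, which is why it works without any reversal.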

Well, I don't think emojis are the most important use case for string delimiters. But even when someone wants to use these, we just have to teach the compiler to treat the emoji as a single character and not reverse the Unicode scalars.
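
The distinction drawn here can be shown in ordinary Swift. This is only an illustration of Character versus scalar reversal with an arbitrarily chosen emoji, not compiler code:

```swift
// Thumbs-up with a skin-tone modifier: one Character built from two scalars.
let emoji = "\u{1F44D}\u{1F3FD}"

// Reversing by Character (grapheme cluster) keeps the emoji intact.
let byCharacter = String(emoji.reversed())

// Reversing the underlying Unicode scalars puts the modifier first,
// which is no longer the same string.
var byScalar = ""
byScalar.unicodeScalars.append(contentsOf: emoji.unicodeScalars.reversed())

print(emoji.count, emoji.unicodeScalars.count)
print(byCharacter == emoji, byScalar == emoji)
```

A delimiter rule that reverses Characters would leave such an emoji usable, while one that reverses scalars would not.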

I'm saying the characters of the delimiter should not be reversed. I'd rather see:

#raw(DELIM"a string"DELIM)

than

#raw(DELIM"a string"MILED)

This also allows emojis to work, which are notoriously difficult to segment.