SE-0200: Enhancing String Literals Delimiters to Support Raw Text

Douglas_Gregor · August 22, 2018, 3:41pm

The second review of SE-0200: Enhancing String Literals Delimiters to Support Raw Text begins now and runs through Wednesday, August 29th, 2018. The proposal has been significantly revised since the initial review in March, 2018.

Reviews are an important part of the Swift evolution process. All review feedback should be either on this forum thread or, if you would like to keep your feedback private, directly to me as the review manager via email or direct message on the forums. If you send me email, please put "SE-0200" somewhere in the subject line.

What goes into a review of a proposal?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the direction of Swift.

When reviewing a proposal, here are some questions to consider:

What is your evaluation of the proposal?
Is the problem being addressed significant enough to warrant a change to Swift?
Does this proposal fit well with the feel and direction of Swift?
If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Thank you for contributing to Swift!

Doug Gregor
Review Manager

anandabits · August 22, 2018, 4:39pm

What is your evaluation of the proposal?

+1. This proposal will greatly enhance the ergonomics of writing string-based code generators in Swift.

Is the problem being addressed significant enough to warrant a change to Swift?

Yes, definitely. The escape sequences necessary today can be confusing and are an unnecessary source of friction in the language.

Does this proposal fit well with the feel and direction of Swift?

Very much so. It is an extremely elegant solution that provides the benefits of raw strings without losing the benefits of interpolation. Without interpolation, raw strings used in code generation would need to be post-processed to replace variables.
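
As a minimal sketch of what this looks like in practice (assuming the syntax as proposed; `typeName` and `fieldName` are made-up names for illustration), a generator can leave backslashes and quotes untouched while still interpolating values:

```swift
// Hypothetical generator inputs; the names are invented for this sketch.
let typeName = "User"
let fieldName = "name"

// With one # on each side, a plain backslash and a plain double quote are
// ordinary characters. Only the extended escape \#( ... ) interpolates.
let generated = #"""
    struct \#(typeName) {
        let \#(fieldName): String
        let pattern = "\d+\.\d+"
    }
    """#

print(generated)
```

Without interpolation, `typeName` and `fieldName` would have to be substituted into the text in a separate pass.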

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

I have used languages with raw strings and really enjoyed using them. I have not used a language that supports them while also supporting interpolation. This design is best in class IMO.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

I participated in the draft thread and pitched the idea of using custom delimiters in escape sequences that led up to the current design. I also gave the final proposal a quick read.

ebg · August 22, 2018, 5:41pm

What is your evaluation of the proposal?
Adds a needed feature to Swift but IMO there might be better solutions out there. It could be difficult to discover and becomes confusing to understand with multiple #. Despite what the Alternatives Considered section says on user-specified delimiters, I think that direction would be preferable as it's easily discoverable, and useful beyond raw strings. There are obviously a number of edge cases to work out with custom delimiters but I think it could be a Swiftier approach, where String's initializer has a delimiter parameter with a default of \
Is the problem being addressed significant enough to warrant a change to Swift?
Yes
Does this proposal fit well with the feel and direction of Swift?
Maybe
If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
n/a
How much effort did you put into your review? A glance, a quick reading, or an in-depth study?
Followed discussion threads

gwendal.roue · August 22, 2018, 6:40pm

What is your evaluation of the proposal?

My brain has already switched!

-let json = """
-    { "name": "foo" }
-    """
-let json = "{ \"name\": \"foo\" }"
+let json = #"{ "name": "foo" }"#

Ruby scripts that use DATA and __END__ are also a target.

Is the problem being addressed significant enough to warrant a change to Swift?

The proposal says it better than I could.

Does this proposal fit well with the feel and direction of Swift?

Yes, thanks for the much welcomed support for interpolation. The pound sign clashes a little bit with the domain of compiler directives, but if eminent grammarians are ok, I'm ok, too.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

Very good

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

I followed the very long process that ended with this proposal.

Chris_Lattner3 · August 22, 2018, 6:53pm

I am enthusiastically +1 on this design. This is simple, elegant, and a natural and orthogonal extension of the existing string syntaxes we have. I love that it incorporates escaping into a logical design.

The proposal itself is also really well written and well considered - it is a massive improvement vs the last draft. Kudos and thanks to the authors!

I'm positive on this, but weakly. I don't think it will be extensively used, but the people who do use it will benefit a lot from it. I don't think it causes harm to add, so I'm positive.

Yes.

This is so much better than anything else I've seen. The proposal includes a nice little survey.

I participated in the prior review but haven't followed any of the discussions since it was returned. I gave this proposal an in-depth study.

-Chris

pvieito · August 22, 2018, 8:07pm

What is your evaluation of the proposal?

Strong +1. This seems to be the best syntax, now that using ``` as the delimiter has been discarded.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

Yes, r"" strings in Python.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Reading this proposal, the original one and the pitches.

masters3d · August 22, 2018, 9:44pm

What is your evaluation of the proposal?

+1 for the raw delimiters. -1 for adding interpolation to raw strings.

Is the problem being addressed significant enough to warrant a change to Swift?

Adding raw strings is absolutely needed.
I think adding interpolation to raw strings is misguided as it completely undermines the idea of a "RAW" string in a code base. "Hey, I thought this was a raw string, why are these pound signs not rendering?" One of the only use cases I can think of in which raw string interpolation would be useful for me is perhaps creating a SQL string. In those cases people use a token like @MYTOKEN, which is then replaced using a library that makes sure no SQL is injected. SQL interpolation is not a good use case because of SQL injection.

There is also the issue of string interpolation having some performance problems; see the thread "String interpolation revamp: design decisions".

If we ever wanted to add some form of string highlighting to raw strings à la markdown code blocks, we would quickly find that any magical interpolation syntax will get in the way of a clean color highlighting experience.

I think we should wait on string interpolation until we have a better idea how regex will fit in the string story. Interpolation should be its own proposal once the interpolation issues linked above have been addressed and regex syntax has been resolved.

Does this proposal fit well with the feel and direction of Swift?

Yes, the raw string part only.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

Maybe Java's ```, but they chose not to implement raw interpolation.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Participated in the thread before and read the new proposal.

beccadax · August 22, 2018, 10:29pm

This is tangential to the proposal under review, but:

A couple of years ago, I wrote a toy project in Vapor, which led to me writing a SQL abstraction library, which led to me writing a SQLStatement against the current ExpressibleByStringInterpolation design, which led to me proposing a better one. So “creating a SQL query with parameters passed through placeholders” is pretty much the new use case the string interpolation rework is designed to enable. Not the only one, but it’s square in the middle of the design.

Point is, I think new string interpolation will work very well for SQL statements, so a raw syntax which allows string interpolation will be useful there.
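
A rough sketch of the idea, assuming the `ExpressibleByStringInterpolation` / `StringInterpolationProtocol` shape from the interpolation rework (as available in Swift 5). `SQLStatement` here is a hypothetical type written for illustration, not the API of any real library:

```swift
// Hypothetical type for illustration only.
struct SQLStatement: ExpressibleByStringInterpolation {
    var text = ""
    var parameters: [String] = []

    struct StringInterpolation: StringInterpolationProtocol {
        var text = ""
        var parameters: [String] = []

        init(literalCapacity: Int, interpolationCount: Int) {
            text.reserveCapacity(literalCapacity + interpolationCount)
        }

        // Literal segments are copied into the SQL text verbatim.
        mutating func appendLiteral(_ literal: String) {
            text += literal
        }

        // Interpolated values become "?" placeholders; the value is kept
        // separately so it never becomes part of the SQL text.
        mutating func appendInterpolation(_ value: String) {
            text += "?"
            parameters.append(value)
        }
    }

    init(stringLiteral value: String) {
        text = value
    }

    init(stringInterpolation: StringInterpolation) {
        text = stringInterpolation.text
        parameters = stringInterpolation.parameters
    }
}

let userInput = "Robert'); DROP TABLE students;--"

// The # delimiters keep the regex backslashes literal, while \#( ... )
// still routes the value through appendInterpolation.
let statement: SQLStatement =
    #"SELECT * FROM students WHERE name = \#(userInput) AND email REGEXP '^\w+@\w+\.com$'"#

print(statement.text)
print(statement.parameters)
```

The interpolated value never reaches the SQL text, so the raw syntax and placeholder-based interpolation do not conflict.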

jawbroken · August 23, 2018, 2:01am

Thanks to everyone for working on this, both in the discussion threads and on the proposal itself.

What is your evaluation of the proposal?

In favour. Despite what some say about the Swift evolution process, I think this is a good example of a case where the discussions produced both a great solution and a very strong proposal document.

Is the problem being addressed significant enough to warrant a change to Swift?

Yes. Extensive escaping in a string is hard to get right and harms readability.

Does this proposal fit well with the feel and direction of Swift?

Yes.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

Favourably, because it allows delimiter (and escape) customisation without allowing arbitrary symbol or text delimiters which can be ugly or hard to recognise as delimiters.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Read all the previous discussion threads and contributed to them.

taylorswift · August 23, 2018, 3:08am

support

yes

i think it’s an appropriate use of the hashtag, consistent with its use in the rest of the language if you squint hard enough. honestly it’s gonna take some getting used to but i don’t think we’re gonna find a better syntax and it’s not worth sending this proposal back a third time

basic r" strings in python, but i think the ability to have string interpolations is a big win since i foresee these things mostly being used for code generation and the like

been following this proposal for way longer than i wanted to

major props to the authors that slogged their way through 200+ posts of forum bikeshedding to ram this through. they deserve a medal of honor.

Scott · August 23, 2018, 3:46am

Strong +1

Yes, and I appreciate the authors and those who pushed this topic and took part in the long discussions that resulted in this design.

Yes

Yes, python, ruby, java, etc.

I think this design is nicer than all of those because it supports interpolation and delimiter customization with a light syntax.

Read the proposal(s) and pitches.

dwaite · August 23, 2018, 4:07am

I haven't settled yet on # being the appropriate character, because it feels like it might reference a string with a particular property in another language, like let s = #"foo" would have s be a StaticString type.

But in general I really like the proposal. It has flexibility and simplicity.

I was a little surprised to see here docs not in the list of (discounted) alternatives. I generally equate the two myself (raw strings and here docs - because both tend to be primarily for representing text/data embedded in code)

In the sense that it aids in embedding data, sure. Other methods of referencing and embedding data (such as compiling a text/data file into the binary such that it can be referenced as a String/Data constant) would be nice as well, as they provide a way to reference text/data without embedding it into the code.

Yep!

C# (although I've always felt their purpose was less raw strings and more providing a way to type backslash-delimited paths without insanity).

Ruby, which has a ton of ways to do this including choose-your-own-string-literal-delimiter. I think this will be simpler to teach than that.

Reading the proposal and a little bit of reflection.

Paul_Cantrell · August 23, 2018, 4:14am

I’m favorably inclined to the proposal. Before thinking through a full review, a question for the proposal authors:

The proposal doesn’t explicitly mention “heredoc”-style strings, which e.g. in Ruby look a bit like a generalization of Swift’s """, e.g.:

str = <<~ARBITRARY_USER_DEFINED_DELIMITER
    foo
    bar
    baz
    ARBITRARY_USER_DEFINED_DELIMITER

→ "foo\nbar\nbaz\n"

This sort of approach is hardly new, and I'm sure you considered and rejected it after some thought. What was the rationale? AFAICT the proposal alludes to this alternative only briefly and obliquely in its last paragraphs.

kiel · August 23, 2018, 4:29am

My gratitude to the patient authors for their thoroughgoing and intelligent proposal. Whether it's accepted or rejected, Swift is better off from the discussion.

griotspeak · August 23, 2018, 4:33am

What is your evaluation of the proposal?
-1
Is the problem being addressed significant enough to warrant a change to Swift?
Yes
Does this proposal fit well with the feel and direction of Swift?
Sorta? I am still disappointed that we are going with counting characters instead of digits after a sentinel. It seems like a decision based entirely on precedent.
If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
see above. Counting characters is typical but not great.
How much effort did you put into your review? A glance, a quick reading, or an in-depth study?
I followed and joined in the discussion and read the proposal.

QuinceyMorris · August 23, 2018, 5:01am

This is a terribly flawed proposal.

The main problem is that its "raw string" is a lie. It's not a proposal for raw strings, nor a proposal to enhance string literal syntax to "support" raw text. Because it allows internal string escaping, there is no rawness at all. The very definition of "raw" is the absence of characters with special meaning inside the string.

You cannot have it both ways. Raw is raw, and if it's not raw it's not raw. That has been the problem with the entire history of this proposal and its many past discussions.

I simply don't understand why "rawness" is being used as a reason in favor of a syntax that's really about custom delimiters and escape characters.

If you look at the "Motivation" section of this proposal, all of the motivations are for raw strings. The "Candidates" section discusses "good candidate[s] for using raw strings". The "Utility" section asserts that "Raw strings are most valuable for …". But the "Design" section describes non-raw strings. The motivation simply does not support the design.

Paradoxically, I think Swift doesn't need truly raw ("raw is raw") strings. I think Swift does need customizable delimiters and escape characters in an enhanced string literal syntax. So, yes, I think this proposal is actually addressing the right question, but it's using the wrong argument — and I care because I think it's proposing a moderately lousy solution. (Enough with symbol soup, already!)

Could we please have a rewritten proposal (v3!) that doesn't mention raw strings as a motivation — ideally, that doesn't mention raw strings at all? Then, can we have a review period that reviews what it promises, not what it doesn't promise?

Please, please, let's not risk accepting an underwhelming solution because we're all so tired or discouraged by an apparently endless discussion.

Chris_Lattner3 · August 23, 2018, 5:04am

I think it is possible that you're missing something important about the proposal. This really does provide raw strings and escaping in a simple and consistent design. If you happen to have a sequence like \#( that you want to pass through unmolested (not turning it into an escape), then all you have to do is use ## as the delimiters. The same is true for any other thing you could be intentionally trying to pass through.

It really is a raw string proposal that supports escaping. It's a beautiful thing, not oxymoronic.
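
A small sketch of that behaviour, assuming the syntax as proposed (the variable names are just for illustration):

```swift
let name = "world"

// One #: the escape delimiter is \#, so \#( starts an interpolation.
let interpolated = #"Hello, \#(name)!"#

// Two ##: \#( no longer matches the escape delimiter \## and passes
// through as literal text, while \##( still interpolates.
let passedThrough = ##"Literal \#(name), interpolated \##(name)"##

print(interpolated)    // Hello, world!
print(passedThrough)   // Literal \#(name), interpolated world
```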

-Chris

QuinceyMorris · August 23, 2018, 6:07am

Yes, I understand what you're saying, but you're stating the virtue of custom delimiters, and they remain just as virtuous if you leave out the phrase "raw string".

To my mind, "raw string" connotes a much stronger promise: that within the string you don't have to be concerned about inadvertently placing characters that have a structural ("non-raw") significance.

This isn't true of the current proposal. You can do this inadvertently, and you must check your string for dangerous character sequences. I think it's a terrible mistake to call that rawness.

beccadax · August 23, 2018, 6:28am

Speaking only for myself here:

Heredocs are basically an alternate multiline syntax. There are a few of these we could use—for instance, we could allow you to put more than three " characters at the beginning and then match that number at the end—and they all share similar flaws.

Let's think about the problems in a string that might make you want a raw syntax, and see how well a second multiline syntax would help with those:

Your string contains backslashes: An alternate multiline syntax wouldn't handle backslashes any differently, so it wouldn't help with these.
Your single-line string contains " characters: Transforming it into a multiline string is kind of disruptive to your code, especially if the string is short—you might turn a one-line expression into a three-line expression for a ten-character literal. But even if you were willing to do that, well, you could just use the existing multiline string literal syntax for that!
Your multiline string contains """ sequences: Sure, it would be useful here. But that basically means the second multiline syntax is only necessary when you're generating Swift or Python code which itself contains multiline string literals. That's a bit niche.
Your single-line string contains """ sequences: That's really niche, and it falls into the same "three lines of code for a ten-character literal" problem mentioned earlier.

So a second multiline syntax doesn't cover many of the use cases we care about. At the same time, it also redundantly covers many use cases covered by our existing multiline syntax; that's not great because it causes confusion about which one a user should choose. The few use cases it really does help with—strings containing """—are probably not worth adding such a large feature if it doesn't help us with anything else.

Beyond the general problems with a second multiline syntax, heredocs tend to be more difficult for tools—particularly multi-language syntax highlighters which don't integrate with their compilers—to handle correctly than other multiline syntaxes. But that's not the biggest problem; the generic problems with second multiline syntaxes are.

Here are the reasons for that decision (from my perspective):

I don't think (e.g.) 3# will stand out as prominently in a noisy string as ###.
The leading delimiter would be #3" and the trailing delimiter would be "3#. Would a backslash escape be \#3 or \3#? Will people remember which one it is?

By that standard, every raw string feature in every language you've ever used is a lie, because you always need to make sure the delimiter is not present in the string. For example, if we had "truly raw" strings delimited by ', you would need to make sure your string didn't contain any single quotes. If they were delimited by an arbitrary user-specified string, you would have to generate delimiters and check them against the string until you found one that wasn't present. Short of __DATA__ or a length-prefixed literal format, all raw strings are a fiction.

The difference between this proposal and a "true raw string" proposal is that, in addition to checking for "# before using #", you also need to check for \#. If you're generating code and you don't care if it uses slightly more escaping than necessary, you can just check for #. If not, checking for two sequences is not much more burdensome than checking for one, and I think that the extra feature set it unlocks is worth that cost.
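
To make that check concrete, here is a minimal sketch of how a code generator might pick the delimiter. `minimumPoundCount(for:)` and `rawLiteral(for:)` are hypothetical helpers, not standard library API, and the sketch assumes single-line content, ignoring edge cases such as content that begins with quote characters or contains newlines:

```swift
/// Smallest number of # marks that lets `content` sit inside a single-line
/// literal with no escaping (hypothetical helper, not a standard API).
func minimumPoundCount(for content: String) -> Int {
    var required = 0
    var pendingRun: Int? = nil   // length of the # run right after a " or \

    for scalar in content.unicodeScalars {
        if scalar == "#", let run = pendingRun {
            // A " or \ followed by k pound signs needs at least k + 1.
            pendingRun = run + 1
            required = max(required, run + 2)
        } else if scalar == "\"" || scalar == "\\" {
            // A bare " or \ already rules out the plain "..." form.
            pendingRun = 0
            required = max(required, 1)
        } else {
            pendingRun = nil
        }
    }
    return required
}

/// Wraps `content` in the shortest delimiter that needs no escaping.
func rawLiteral(for content: String) -> String {
    let pounds = String(repeating: "#", count: minimumPoundCount(for: content))
    return pounds + "\"" + content + "\"" + pounds
}

print(rawLiteral(for: #"{ "name": "foo" }"#))   // #"{ "name": "foo" }"#
```

The cheaper variant mentioned above would skip the quote and backslash tracking and simply use one more # than the longest run of # anywhere in the content.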

QuinceyMorris · August 23, 2018, 6:59am

Yes, of course, but that does not make the two dangers equivalent.

If you inadvertently have the closing delimiter in the string, the likelihood that this will produce a compiler error at the point of the delimiter is extremely high (especially if the string content is NOT itself Swift source code).

If you inadvertently have an escape sequence in the string, it's easily possible that there's no syntax error, and that the consequent bug doesn't show up until much further down the road.
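
A contrived sketch of the kind of case I mean, assuming the syntax as proposed (the pasted text is invented for illustration):

```swift
// Text pasted into a #-delimited literal that happens to contain \#n.
// This compiles without any diagnostic, but \#n is the newline escape.
let pasted = #"tag\#name"#

// The same text with ## delimiters is passed through untouched.
let intended = ##"tag\#name"##

print(pasted.count)     // 7: "tag", a newline, "ame"
print(intended.count)   // 9: all nine characters as typed
```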

Anyway, that's all I have to say on the subject for now. Others have expressed their support, I'm expressing my opposition. If I'm wrong, I confidently expect to be ignored.