SE-0200: Enhancing String Literals Delimiters to Support Raw Text

What is your evaluation of the proposal?

+1

It is reassuring that # delimiters have already been tried and tested in Rust.

There aren't many alternatives available in ASCII:

  • dollar signs (e.g. $$" echo "$PATH" "$$ ) might be embedded too frequently;

  • underscores (e.g. _"{ "id": "\_(idNumber)" }"_ ) might be too lightweight.

Is the problem being addressed significant enough to warrant a change to Swift?

Yes.

Does this proposal fit well with the feel and direction of Swift?

Yes.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

A quick reading of the proposal.

This is a solid proposal — it works, it’s visually pleasing, and it’s easy to remember. I appreciate the research that went into it and the comprehensive writeup. In particular, I like that it still allows for string interpolation.

One aspect gives me concern: multiline strings. The #""" symbol doesn’t seem to fit in with the rest of the proposal. What about using the same delimiter even in the multiline case? E.g., if #" (or ##" or ###", etc.) is the last symbol on the line, then the literal would be assumed to be multiline and the compiler would look for the closing delimiter on a subsequent line. This would make the proposal easier to remember.
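For readers skimming the thread, here is a minimal sketch of the syntax under review as I understand it from the proposal: a single-line literal with one #, interpolation with the matching \#( escape, and the #""" multiline form questioned above. The names idNumber, pattern, json and script are made up for illustration.

```swift
let idNumber = 42

// One # on each side: backslashes and inner quotes are plain text.
let pattern = #"\d+ "quoted" \n"#

// Interpolation still works, but the escape carries the same number of #s.
let json = #"{ "id": "\#(idNumber)" }"#

// The multiline form as proposed. \( without a # is not an escape here.
let script = #"""
    echo "$PATH"
    printf "\(not interpolated)\n"
    """#

print(pattern)
print(json)
print(script)
```

The closing delimiter's indentation is stripped from each line, the same as for ordinary multiline literals.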

•	What is your evaluation of the proposal?

+1

•	Is the problem being addressed significant enough to warrant a change to Swift?

Yes

•	Does this proposal fit well with the feel and direction of Swift?

Yes

•	If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

Better

•	How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

I read the pitch discussion and the final proposal.

A small change could be made to the implementation to not terminate the string if the closing delimiter (in this case "#) is followed immediately by another hash. Would this make sense?

It could be done that way, but it would be more difficult to reason about. It would also introduce a security issue because a non-printing character could then invisibly change where a string ends.

That's an interesting remark about security. You know this is already a problem with today's strings, right? I can even fool the syntax highlighter here:

print("""
Validating password...
"​"")
guard user.validatePassword(password) else {
	fatalError("get out!")
}
print("​""
Password is valid!
""")

I'm not sure what the solution is.

I believe non-printing and zero-width characters are well-defined Unicode categories; perhaps we could disallow them from appearing immediately before or after string delimiters. While more complicated, I feel like the rule @johnno1962 proposed is more in line with what I'd expect the syntax to do.

I might expect it to be a warning, perhaps. I'm not convinced that the string terminator should ever not terminate the string.

(Is there any context in Swift in which it is legal for two strings to be immediately adjacent, e.g. "foo""bar", or immediately followed by a compiler directive, e.g. "foo"#bar()?)

A similar situation occurs in something like """foo "bar"""", where if you have more than 3 "s at the end of the string one might expect the string to be delimited by the final three quotes rather than the first.

That gives error: multi-line string literal closing delimiter must begin on a new line, so I don't think there’s a precedent there to follow one way or the other.

In my """ example above, the zero-width space I added is neither in the non-printable category nor is it outside the string. You can construct a similar example with:

print(##"Validating..."​##); try validate(password); print(#​#"Password is valid!"##)

As long as the delimiter is more than one character, you can split it with something invisible and the fake separator becomes part of the string, along with the code between the two strings.

This is more subtle with multiline strings though, because otherwise the code to disable must be on the same line to avoid a syntax error. Note how I had to put everything on one line in this last example.

It seems hard to reason about if the string delimiter doesn't necessarily close the string. This seems like a largely theoretical concern, since I don't know in what context you would be trying to write #" ###""### "#, so I don't think it is worth complicating the implementation and mental model.
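A small sketch of the rule as proposed (the closing delimiter always closes the string), assuming a toolchain that includes the feature. Text that contains "# or a run of #s after a quote is written by picking a longer delimiter. The constant names are hypothetical.

```swift
// With one # the literal would end at the first "# it meets,
// so text containing "# is wrapped in a two-# delimiter instead.
let embedded = ##"a "# b"##

// The awkward case from the discussion, written with four #s so that
// the inner "### cannot be mistaken for the terminator.
let hashes = ####" ###""### "####

print(embedded)
print(hashes)
```

The cost is that the writer has to count #s; the benefit is that a reader never has to look past the first matching delimiter to know where the string ends.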

I think it might also make mistakes and diagnostics more confusing to users, because accidentally closing your string with too many # characters will wrap the rest of the code in the file in a string, and probably give you non-local errors instead of an error pointing right to the stray #. The current multi-line string implementation, for example, gives you error: unterminated string literal (pointing at the start of the string) and error: expected '{' at end of brace statement (pointing at the end of the file) if you fail to terminate it. This might be possible to improve with some heuristics, but it's inherently difficult because someone might be using raw strings to hold Swift code, which makes it hard or impossible to know where the end should be.

While any potential security problem has to be taken seriously, the concern here seems a little overblown. If you’re editing in Xcode, the syntax highlighter isn’t fooled for a second, as it uses the same code as the compiler.
This problem isn’t really related to the topic at hand though and is a weakness of any multiple character delimiter.

After more thought I support the change to termination processing that was put forward, as #" "######" "# seems to be something people expect to work. As for confusing people, the error messages you get when you try to use this string without the change are more confusing than those generated by accidentally adding an extra # to a string that should terminate. What tips it for me is that, of the two interpretations, one is what people naively expect and the other could never be valid Swift, so why shouldn’t the compiler choose the interpretation that compiles?

I guess the Core Team can make the call on this one.

edit: I just realized I missed the review window, my apologies for the noise.

original belated review
  • What is your evaluation of the proposal?

Strong +1.

(I support the suggestion to refer to this less as "raw strings" and more as "customized delimiter/escape", to avoid confusion)

  • Is the problem being addressed significant enough to warrant a change to Swift?

Yes, delimiter/escape control is the next logical extension to string literals and important to the language.

  • Does this proposal fit well with the feel and direction of Swift?

This is the most Swifty solution to this problem I've seen. It generalizes delimiters/escapes into a simple and logical approach, while current syntax (common case) is just a zero-#.

This is an elegant extension, as @jrose mentioned. It was worth the wait and long threads to arrive at this solution.

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

This compares favorably to any other approach I've seen. Delimiters are symmetric and balanced, escapes are obvious and supported. All this without requiring a new kind of literal.

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Reading of earlier iterations and relevant threads over the years, investigation of other languages.

A revised rule that works out as you suggest might indeed be superior. However, as you point out, it strictly takes what would be invalid code today and makes it valid, so there is no reason it has to be considered now.

Moreover, a similar situation applies to multi-line literals, as Joe Groff points out. For that matter, since this proposal is a generalization of existing string literal syntax, the same rule ought to apply to them as well, and in that case it would require another rule as to the opening delimiter also:

" Hello, #" #### "# World! "

This is all to say that I believe such a change ought to be considered not as a last-minute change here but as a standalone proposal. It should apply consistently to all types of string literal, and the consequences of the change (such as the security issues we’ve just discussed) should be considered more thoroughly along with any necessary mitigations.

It is not enough that one IDE can highlight the code correctly. That is not in question. The issue is that readers have absolutely no hope of parsing this correctly, and tools that don’t rely on compiler facilities to highlight code may or may not be able to help the reader. (Nor, mind you, is syntax highlighting alone good enough as the sole defense against a security issue; color changes alone—even if consistently rendered in all contexts—are insufficient as indicators of critical information.)

This issue applies to all multi-character string delimiters, and so does the suggested change in parsing rules, as I discuss above. They are inextricably linked and ought to be considered together. Just because we already have a security issue with multiline string literals doesn’t mean that we ought to extend it further.

That said, I can see a straightforward mitigation: without resorting to errors or warnings, non-printing characters should be ignored in parsing string delimiters (or almost anything else in Swift, for that matter—Apple documentation inserts invisible optional line breaks between words in camel-case method names for better line breaking: it should be possible to copy and paste method names from the documentation into one’s code and have the optional line breaks ignored, although it’d be important then not to break the line there).

Again, I think these issues are well deserving of their own review, as it’s a sufficiently large topic and strictly an enhancement to this proposal.

Except that we have a review open (just?) for which this is relevant. There doesn’t seem to be general support for the idea anyway so we can quickly reach a decision point and move on. The security issues need to be addressed though, at length but separately as a bug in the current implementation. I don’t see the two questions being that tightly coupled.

Having looked at what would be involved in trying to accommodate zero-width characters inside a delimiter, I’d recommend not trying to ignore them but instead checking for any shenanigans and raising an error.

If this is true then surely it's just an artefact of the current implementation that should be improved. It seems inherently easier to diagnose and point directly at a stray/extra # outside of a string literal than it is to try to find where a user accidentally wrote an incorrect delimiter and didn't close a multiline string.

Fair enough, I’ve given up on the “no # after closing delimiter" idea and added a diagnostic. We’re beginning to look at the security problem in the PR to see if we can find a solution to put forward. The first step is to find a way to determine whether a given Unicode code point is zero-width/invisible, and the ICU library doesn’t seem to have an API for this. This list needs to be complete. One approach is to render attributed strings to determine the set ahead of time. Does anybody know a better way?
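One possible starting point, offered as a rough sketch and not a complete answer: the Unicode Default_Ignorable_Code_Point property together with general category Cf (format) covers the zero-width space used in the examples earlier in the thread. The version below assumes a Swift release where Unicode.Scalar.Properties is available in the standard library (later than the toolchain under review), and suspiciousScalars is a hypothetical helper, not anything from the PR. I would not assume these two properties give the complete list asked for.

```swift
/// Returns the scalars in `source` that Unicode marks as default-ignorable
/// or as format controls (general category Cf).
/// Assumption: these properties only approximate "invisible"; they are not
/// claimed to be a complete list of zero-width characters.
func suspiciousScalars(in source: String) -> [Unicode.Scalar] {
    return source.unicodeScalars.filter { scalar in
        scalar.properties.isDefaultIgnorableCodePoint
            || scalar.properties.generalCategory == .format
    }
}

// A line of source text with U+200B ZERO WIDTH SPACE hidden inside a delimiter.
let line = "print(\"\u{200B}\"\")"
let hits = suspiciousScalars(in: line)

for scalar in hits {
    print("found U+" + String(scalar.value, radix: 16, uppercase: true))
}
```

A lexer could run a check like this only on the characters adjacent to a string delimiter and raise an error there, which keeps the cost low.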

I think @xwu is right about this being a more general issue that should be tackled holistically. For example, as far as I know identifiers are still not even normalised yet, so there are much more fundamental issues here:

let café = 1
let café = 2
print(café) // 1
print(café) // 2

There would ideally be a consistent set of normalisation/parsing rules that deals with these kinds of issues (e.g. should zero-width/invisible characters be uniformly ignored?). There have already been several discussions about these issues, and @xwu mentions a draft proposal in the PR you linked.
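To make the identifier example concrete, here is a sketch that assumes the two spellings differ only in normalisation form (precomposed é versus e plus a combining accent). It uses string values, not identifiers, so it only illustrates the underlying difference; the constant names are made up.

```swift
let composed = "caf\u{E9}"      // é as a single scalar, U+00E9
let decomposed = "cafe\u{301}"  // e followed by U+0301 COMBINING ACUTE ACCENT

// String comparison uses canonical equivalence, so the values are equal...
let sameString = composed == decomposed

// ...but the underlying scalars differ, which is what a scalar-by-scalar
// comparison of unnormalised identifiers would see.
let sameScalars = Array(composed.unicodeScalars) == Array(decomposed.unicodeScalars)

print(sameString, sameScalars)
```

If identifiers were normalised before comparison, the two declarations would collide, in the same way the two strings here compare equal.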

Hello all,

This proposal has been accepted. Thank you, everyone, for participating! [Accepted] SE-0200: Enhancing String Literals Delimiters to Support Raw Text - #2

Doug

The implementation has been merged and is available in the swift.org nightlies if you want to help find some bugs.

I guess we can finally close this radar!
http://www.openradar.me/17970377

TTFN
