SE-0200: "Raw" mode string literals

Well, that’s the thing: a purely raw string isn’t possible with this approach, and I would advocate against completely swallowing the backslash in the multi-line string literal, because someone like myself has linter rules where exceeding the 80-character line width is an error. Therefore I need the trailing backslash to wrap a string literal. Even in a raw string literal it would be useful.
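
A minimal sketch of that wrapping behaviour, using the existing multi-line literal rather than any raw syntax from this thread (the constant name `wrapped` is only illustrative):

```swift
// Ordinary multi-line literal: a trailing backslash joins two source lines
// without inserting a line break into the resulting string.
let wrapped = """
    This sentence is split across two source lines \
    but contains no line break.
    """

print(wrapped)
```

Whether a raw literal should keep this one escape is exactly the open question here; the sketch only shows what would be lost if it did not.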

That said, I’m against this aggressive change:

The \ character would lose its role as an escaping introducer altogether.

Instead I would prefer to weaken it and call the whole thing a semi-raw string.


@davedelong: To answer your question, I’d go with the same rule we currently have with the multi-line """ string literal if that’s possible.


At least this is how I would imagine it. If there is a better solution using back-ticks then go for it. ;)

The same way you add a double quote in an r"" string. You can’t.

In my opinion, it’s a worthwhile goal to make sure that any raw string approach allows for a custom delimiter.

Maybe it would be something like:

let raw = ```q
this is my raw string with ``` in it
q```

Putting a letter after the opening triple-backticks could be a way to specify the delimiter. (it doesn’t have to be q; that was just an example)


This is interesting. In that case, why not include full support for code highlighting? https://help.github.com/articles/creating-and-highlighting-code-blocks/

I haven’t checked the latest version of Swift, but I remember that Swift doc comments accept code blocks. http://nshipster.com/swift-documentation/

I am +1, as long as we can still support inline tests using the markdown block comments like Rust does. https://doc.rust-lang.org/book/first-edition/documentation.html

#raw(`this should still allow ` inside here `)

It seems like, ultimately, any kind of delimiter character chosen would not be completely failproof. There is one solution that could allow any character and be fine with it… if the actual string was in a different file that is referenced altogether. This would be more or less #fileLiteral though… What are the use cases that make us want a diverse-enough range of strings that no character can be considered safe, but still not large or separated enough that it warrants a separate file?


Well currently there is an issue that would speak against it.

Other than that, I don’t see any other reasons against that approach and I think it’s worth thinking about it.


Edit: What if you wanted the string to be file private to not expose it to the whole module?

Thread wrap up

Tomorrow is the end of the review period so I’d like to try to summarise what I have taken from the thread.

I’ve published an updated version of the proposal for discussion here.

Summarising the story so far, this proposal puts forward a syntax for raw string literals, which are strings where \ is passed through to the literal and has no special escaping powers. The suggested syntax is now #raw("a string with a \") rather than r"a string with a \" as in the original proposal merged into swift-evolution master at the beginning of this process.

This version of the proposal includes “custom delimiters lite” as originally put forward by @hooman at the beginning of the thread and later echoed by @Erica_Sadun.

Removing the “closing would be reversed” rule so your favourite emoji works, this would be a valid raw string:

print(#raw(🤡"SE-200"🤡))

In addition, the new version of the proposal includes what you could call “interpolating raw strings” which ignore escapes except for the sequence \(…) which can be used to interpolate. I propose the following syntax with double bracketing of the literal to turn this variant on:

print(#raw(("SE-\("200")")))

Revisiting the code which prompted me to start this process, I realised it doesn’t take long before you find yourself needing this feature, and the sequence \( is sufficiently rare outside Swift that this shouldn’t be a problem.

You can combine raw literals, custom delimiter syntax and interpolation with multi-line string literals:

print(#raw((🤡"""
    SE-\(200)
    """🤡)))

Multiline raw literals would follow the same indentation removal rules as before.
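
As a reminder of what those rules do today, here is a small sketch with an ordinary (non-raw) multi-line literal. The name `number` is illustrative, and it is only an assumption that the proposed #raw forms would strip indentation in exactly the same way:

```swift
let number = 200

// The indentation of the closing delimiter (four spaces here) is removed
// from every line; any extra indentation is kept as part of the string.
let text = """
    SE-\(number)
      indented two spaces
    """

print(text)
```

The line break directly before the closing delimiter is not part of the value, so `text` ends with "spaces" rather than a newline.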

That about covers it unless there is something I’ve missed. There is an Xcode toolchain with an implementation available here.

I’m now firmly of the opinion that #raw("a literal") is about as good as we’re going to get. With any raw literal there is absolutely no way to escape the delimiter, so all single character delimiters (whichever quote you use) are simply not useful in the general case. This excludes r"a string", \"a string", "\qa string", 'a string' from contention. You get many more options if you move to #raw("a string") which has, at a minimum, a closing delimiter of ") and you can now extend that using custom delimiter sequences.

To give the core team something to go on, I’d like to call a straw poll on this new version of the proposal. Please respond what you would like the outcome to be:

+1: Accepted with revisions - New version of the proposal looks good.
-1: Reject - I don’t see the need for the feature

Apologies for the slightly chaotic progression of the review but I really feel we have explored the space quite thoroughly now and been able to put in front of the Core Team enough information for them to make a decision.


It’s not the most lightweight choice, but all compact solutions (like single quotes) would fail for strings that contain that specific character, so I think some verbosity can’t be avoided. Additionally, it would be straightforward to introduce variations with different rules.
In isolation, it seems like the best solution, and I guess we’ll have to accept the uncertainty of how a yet to be defined regex-literal will interfere with this.

Could you update the example so that it illustrates the explanation?

#raw({[?</"how will this be closed?

I think that would be #raw({[?</"a string"}]?>/) in the current implementation which isn’t ideal.
Difficult to see an alternative if you want good emoji support.

Well, I don’t think emojis are the most important use case for string delimiters. But even when someone wants to use these, we just have to teach the compiler to treat the emoji as a single character and not reverse the Unicode scalars.


I’m saying the characters of the delimiter should not be reversed. I’d rather see:

#raw(DELIM"a string"DELIM)

than

#raw(DELIM"a string"MILED)

This also allows emojis to work which are notoriously difficult to segment…
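
A rough sketch of how simple the non-reversed rule is to check. This is not the toolchain's implementation; `rawBody(of:)` is a hypothetical helper that looks at one isolated literal (the part between the parentheses) and assumes the delimiter is whatever precedes the first double quote:

```swift
/// Hypothetical helper: given `DELIM"body"DELIM`, returns `body`, or nil when
/// the closing delimiter does not repeat the opening delimiter unchanged.
func rawBody(of source: String) -> String? {
    // The custom delimiter is everything before the first double quote.
    guard let openQuote = source.firstIndex(of: "\"") else { return nil }
    let delimiter = String(source[..<openQuote])

    // The literal must end with a quote followed by the very same delimiter.
    let closing = "\"" + delimiter
    guard source.hasSuffix(closing) else { return nil }

    let bodyStart = source.index(after: openQuote)
    let bodyEnd = source.index(source.endIndex, offsetBy: -closing.count)
    // Reject input where the opening and closing quotes would overlap.
    guard bodyStart <= bodyEnd else { return nil }
    return String(source[bodyStart..<bodyEnd])
}

print(rawBody(of: "DELIM\"a string\"DELIM") ?? "no match")
print(rawBody(of: "DELIM\"a string\"MILED") ?? "no match")
```

No reversal or mirroring table is needed: the closing sequence is just a quote plus the same characters, so quotes inside the body are passed through untouched.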

Imho that would be the best choice because it’s simple.
Reversing might look nicer, but especially with braces and similar characters, it could become really complicated, especially when we want full mirroring for those.
Unicode is huge, and you shouldn’t have to know whether a character has a mirrored equivalent. I’m sure I don’t know every sign that is used as a quotation mark in some languages.
\ /
(DELIM""MI⅃Ǝᗡ)

Even if the “EOF” approach doesn’t look nice

#raw([<"TEXT"[<)

imho the syntax shouldn’t be optimized for different types of parenthesis as delimiters.

+1 for the updated version, thank you for all the hard work!
Although a first class Regex will solve the Regex problem better, this will solve the Copy and Paste problem. JSON and HTML are great examples.
The revised version tackles the same problem that Ruby’s heredoc solves.

There’s one tiny improvement we can make, a variant that can turn off interpolation.
But maybe we can save that for another review.

Does the new proposed version support multi-line literals as well?

Sure, but they wouldn’t quite be raw strings as indentation removal would apply.

The interpolation feature is only enabled if you opt in by double bracketing the literal:

#raw(("a string \(var)"))

Otherwise they are raw.

That wouldn’t make much sense to me, like the removal of the wrapping backslash, because in my code base the longest raw string I could possibly create would be around 70 characters.

Making an exception for \ at the end of the line has some merits but I think we need to keep processing of raw strings as simple as possible.

That’s what I’m thinking as well so the following is fine by me:

#raw(```
   multi-line version is in the lines between the delimiters
   keep indentation and keep wrapping \
   slash but remove other rules
   ```)

I figured I’d share here some thoughts on the design space which I’ve already shared with @johnno1962 off-list. I think it’s useful to have a common vocabulary, putting some names to the problems that we’re trying to solve with this proposal.

As it happens, Wikipedia has a wonderfully thorough series of articles on just this topic, so these are some condensed notes that I took, as well as some reflections on how those points apply to Swift in particular:

Link to Gist


Notes on string literals

Most programming languages use delimiters to surround a string literal. A
known issue that arises due to the use of delimiters is delimiter collision,
which arises when the delimiter(s) themselves need to be represented in the
literal.

Solutions to delimiter collision

  • Paired quotes
    Different opening and closing delimiters; solves a limited subset of delimiter
    collision problems as it can permit only balanced, nested strings.
    Supported in PostScript (parentheses), Visual Basic .NET (curly quotes).
  • Escape characters and sequences
    A very commonly used solution.
    Already supported in Swift.
  • "Doubling up" delimiters
    Similar in concept to escaping the delimiter, two consecutive delimiters are
    interpreted as a literal character.
    Supported in Basic, Fortran, Pascal, Smalltalk.
  • Dual delimiters
    For example, a literal may be delimited by either single quotes or double
    quotes.
    Supported in Fortran, JavaScript, PHP, Python.
    A form of dual delimiters is supported in Swift in that " can be used
    without escaping inside multiline string literals.
  • Configurable multiple delimiters
    Here document-style strings are one variant; the user must know that the
    chosen delimiter will not appear in the quoted string or predict which
    sequences of characters are unlikely to appear.
    Supported in Perl, Ruby, C++11, Lua.
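
As a small illustration of the Swift remark in the dual-delimiter entry above, the sketch below compares an escaped single-line literal with a multiline literal holding the same text. The constant names are illustrative only:

```swift
// Single-line literal: the inner double quotes collide with the delimiter
// and have to be escaped.
let escaped = "She said \"hi\" and left."

// Multiline literal: the delimiter is """, so a lone " needs no escaping.
let unescaped = """
    She said "hi" and left.
    """

print(escaped == unescaped)
```

The collision has only moved, however: a run of three double quotes inside the multiline form would still need escaping.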

The principal drawback to the use of escape characters is leaning toothpick
syndrome, a concept first widely introduced in Perl. The principal use cases
in which the issue arises are:

  • Regular expressions matching Unix-style paths
  • Windows paths–most pathologically, regular expressions matching Windows
    Uniform Naming Convention paths, which begin with the prefix \\ that
    requires double-escaping (\\\\\\\\)
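
To make the Windows case concrete, here is a sketch in current Swift, assuming Foundation is available for regular-expression matching. The names `uncPath`, `pattern` and `hasUNCPrefix` are illustrative:

```swift
import Foundation

// The path \\server\share: every backslash is escaped once in a normal literal.
let uncPath = "\\\\server\\share"

// The regex source is ^\\\\ (two escaped backslashes). Written as a Swift
// literal, each of those four backslashes is escaped again, giving eight.
let pattern = "^\\\\\\\\"

let hasUNCPrefix = uncPath.range(of: pattern, options: .regularExpression) != nil

print(pattern)       // the regex source the engine actually receives
print(hasUNCPrefix)
```

Two levels of escaping (string literal, then regex) are what turn a two-character prefix into eight backslashes; a raw literal would remove only the first level.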

Solutions to leaning toothpick syndrome

  • Custom delimiters
    In Perl, characters other than / can be used as delimiters for regular
    expressions.
  • Raw strings
    See table below for comparative syntax.
    Language   Syntax
    C#         @"string"
    C++11      R"xxx(string)xxx", where xxx is an optional custom delimiter
    Go         `string`
    Python     r"string"
    Scala      """string""" (no interpolation) or raw"string" (interpolation)

Some conclusions

Many more languages offer raw strings than custom delimiters. The former
is addressed specifically at mitigating the issue of leaning toothpick syndrome,
which arises when using escape sequences. The latter is an alternative to
the use of escape sequences.

In Swift, both escape sequences and string interpolation segments are prefixed
with \. This is a deliberate design choice; Swift differs from languages such
as Scala where the two have distinct spelling. Scala offers a raw interpolator
syntax (interpolation but no escaping) as well as other
variations. Swift’s deliberate design choice likely rules out such a design:
instead, string literals will support both interpolation and escaping or
neither.

Generally, languages support single-line raw strings. This trend likely
reflects the insight that leaning toothpick syndrome is most pathological in the
case of regular expressions. Although multiline raw strings would permit
unmodified embedding of source code, such a use case is not a primary motivation
because, in the absence of custom delimiters, raw strings actually disable a
solution to delimiter collision which may be necessary for the embedded code.

Support for custom delimiters for regular expressions obviates the need
for raw strings to overcome leaning toothpick syndrome involving forward
slashes but not backslashes.

Syntax

Swift eschews numeric literal suffixes such as f and l; users have largely
rejected r"string" syntax on that basis, and it is unlikely that @"string"
in the style of C# would find greater acceptance.

Single backticks already serve another role in Swift. Multiple backticks may
still be considered, but the use of multiple backticks for single-line raw
string literals may be considered inconsistent given current syntax for
single-line and multiline string literals.

Given such considerations, the remaining options include either more verbose
spellings such as raw"string" or the single quote option 'string'.
