SE-0200: "Raw" mode string literals

Imho that would be the best choice because it's simple.
Reversing might look nicer, but with braces and similar characters it could become really complicated, especially if we want full mirroring for those.
Unicode is huge, and you shouldn't have to know whether a character has a mirrored equivalent. I'm sure I don't know every character that is used as a quotation mark in some language.
\ /
(DELIM""MI⅃Ǝᗡ)

Even if the "EOF" approach doesn't look nice

#raw([<"TEXT"[<)

imho the syntax shouldn't be optimized for different types of parentheses as delimiters.
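The two closing-delimiter rules discussed above can be sketched in Swift. This is a minimal illustration only: both function names are hypothetical, and the mirror table assumes just four ASCII bracket pairs, which is exactly the limitation raised above, since full mirroring would need Unicode's complete list of mirrored characters.

```swift
// Hypothetical helpers illustrating two ways to derive a closing delimiter.

// "EOF" (here-document) style: the closing delimiter repeats the opening one verbatim.
func verbatimClosing(for opening: String) -> String {
    return opening
}

// "Reverse and mirror" style: reverse the characters and swap bracket pairs.
// Only four ASCII pairs are handled; any other character is kept as-is.
let mirrorTable: [Character: Character] = [
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "{": "}", "}": "{",
    "<": ">", ">": "<",
]

func mirroredClosing(for opening: String) -> String {
    return String(opening.reversed().map { mirrorTable[$0] ?? $0 })
}

print(verbatimClosing(for: "[<"))   // prints [<
print(mirroredClosing(for: "([<"))  // prints >])
```

Letters are only reversed, not glyph-mirrored, so "DELIM" becomes "MILED" under the second rule; producing something like "MI⅃Ǝᗡ" would require yet another mapping table.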

+1 for the updated version; thank you for all the hard work!
Although a first-class Regex will solve the Regex problem better, this will solve the copy-and-paste problem. JSON and HTML are great examples.
The revised version tackles the same problem that Ruby's heredoc solves.

There's one tiny improvement we can make, a variant that can turn off interpolation.
But maybe we can save that for another review.

Does the new proposed version support multiline literals as well?

Sure, but they wouldn’t quite be raw strings as indentation removal would apply.

The interpolation feature is only enabled if you opt in by double bracketing the literal:

#raw(("a string \(var)"))

Otherwise they are raw.

That wouldn't make much sense to me, just like removing the wrapping backslash, because in my code base the longest raw string I could possibly create would be around 70 characters.

Making an exception for \ at the end of the line has some merits but I think we need to keep processing of raw strings as simple as possible.
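For comparison, ordinary (non-raw) Swift multiline string literals already treat a backslash at the end of a line as a line continuation, and they strip indentation up to the column of the closing delimiter. A small example in current Swift, not part of the proposal:

```swift
// Ordinary Swift multiline literal.
// The closing """ is indented four spaces, so four spaces are stripped from each line.
// The trailing backslash joins the first line to the second without a newline.
let wrapped = """
    one \
    two
    """

print(wrapped)  // prints one two
```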

That's what I'm thinking as well so the following is fine by me:

#raw(```
   multi-line version is in the lines between the delimiters
   keep indentation and keep wrapping \
   slash but remove other rules
   ```)

I figured I'd share here some thoughts on the design space which I've already shared with @johnno1962 off-list. I think it's useful to have a common vocabulary, putting some names to the problems that we're trying to solve with this proposal.

As it happens, Wikipedia has a wonderfully thorough series of articles on just this topic, so these are some condensed notes that I took, as well as some reflections on how those points apply to Swift in particular:

Link to Gist


Notes on string literals

Most programming languages use delimiters to surround a string literal. A
known issue that arises due to the use of delimiters is delimiter collision,
which arises when the delimiter(s) themselves need to be represented in the
literal.

Solutions to delimiter collision

  • Paired quotes
    Different opening and closing delimiters; solves a limited subset of delimiter
    collision problems as it can permit only balanced, nested strings.
    Supported in PostScript (parentheses), Visual Basic .NET (curly quotes).
  • Escape characters and sequences
    A very commonly used solution.
    Already supported in Swift.
  • "Doubling up" delimiters
    Similar in concept to escaping the delimiter, two consecutive delimiters are
    interpreted as a literal character.
    Supported in Basic, Fortran, Pascal, Smalltalk.
  • Dual delimiters
    For example, a literal may be delimited by either single quotes or double
    quotes.
    Supported in Fortran, JavaScript, PHP, Python.
    A form of dual delimiters is supported in Swift in that " can be used
    without escaping inside multiline string literals.
  • Configurable multiple delimiters
    Here document-style strings are one variant; the user must know that the
    chosen delimiter will not appear in the quoted string or predict which
    sequences of characters are unlikely to appear.
    Supported in Perl, Ruby, C++11, Lua.
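The two solutions that Swift already supports can be compared directly; both literals below produce the same string. This is plain current Swift:

```swift
// Escape sequences: the delimiter " is written as \" inside a single-line literal.
let escaped = "She said \"hi\""

// Multiline literals are delimited by """, so a lone " needs no escaping there.
let multiline = """
    She said "hi"
    """

print(escaped == multiline)  // prints true
```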

The principal drawback to the use of escape characters is leaning toothpick
syndrome, a concept first widely introduced in Perl. The principal use cases
in which the issue arises are:

  • Regular expressions matching Unix-style paths
  • Windows paths--most pathologically, regular expressions matching Windows
    Uniform Naming Convention paths, which begin with the prefix \\ that
    requires double-escaping (\\\\\\\\)
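The double-escaping arithmetic for the UNC prefix can be checked in Swift. This sketch assumes Foundation's NSRegularExpression is available (it is on Apple platforms and in swift-corelibs-foundation on Linux); the path value is made up for illustration:

```swift
import Foundation

// Target text: a UNC path beginning with two backslashes, i.e. \\server\share
let path = "\\\\server\\share"

// Regex level: each literal backslash is written \\, so the pattern is ^\\\\ (4 backslashes).
// String-literal level: each of those 4 backslashes is escaped again, giving 8 in source.
let pattern = "^\\\\\\\\"
print(pattern.filter { $0 == "\\" }.count)  // prints 4

let regex = try! NSRegularExpression(pattern: pattern, options: [])
let fullRange = NSRange(location: 0, length: path.utf16.count)
print(regex.firstMatch(in: path, options: [], range: fullRange) != nil)  // prints true
```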

Solutions to leaning toothpick syndrome

  • Custom delimiters
    In Perl, characters other than / can be used as delimiters for regular
    expressions.
  • Raw strings
    See table below for comparative syntax.
Language  Syntax
C#        @"string"
C++11     R"xxx(string)xxx", where xxx is an optional custom delimiter
Go        `string`
Python    r"string"
Scala     """string""" (no interpolation) or raw"string" (interpolation)

Some conclusions

Many more languages offer raw strings than custom delimiters. The former
is aimed specifically at mitigating the issue of leaning toothpick syndrome,
which arises when using escape sequences. The latter is an alternative to
the use of escape sequences.

In Swift, both escape sequences and string interpolation segments are prefixed
with \. This is a deliberate design choice; Swift differs from languages such
as Scala, where the two have distinct spellings. Scala offers a raw
interpolator syntax (interpolation but no escaping) as well as other
variations. Swift's deliberate design choice likely rules out such a design:
instead, string literals will support both interpolation and escaping or
neither.
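The shared backslash prefix is visible in ordinary Swift code, and it is why a regex shorthand such as \d cannot be written directly in a literal:

```swift
let name = "world"

// One backslash prefix, two roles: \t is an escape sequence, \(name) is interpolation.
let greeting = "Hello,\t\(name)!\n"
print(greeting == "Hello,\tworld!\n")  // prints true

// A regex shorthand such as \d is not a valid Swift escape ("\d" fails to compile),
// so the backslash itself must be escaped.
let digits = "\\d+"
print(digits.count)  // prints 3
```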

Generally, languages support single-line raw strings. This trend likely
reflects the insight that leaning toothpick syndrome is most pathological in the
case of regular expressions. Although multiline raw strings would permit
unmodified embedding of source code, such a use case is not a primary motivation
because, in the absence of custom delimiters, raw strings actually disable a
solution to delimiter collision which may be necessary for the embedded code.

Support for custom delimiters for regular expressions obviates the need
for raw strings to overcome leaning toothpick syndrome involving forward
slashes but not backslashes.

Syntax

Swift eschews numeric literal suffixes such as f and l; users have largely
rejected r"string" syntax on that basis, and it is unlikely that @"string"
in the style of C# would find greater acceptance.

Single backticks already serve another role in Swift. Multiple backticks may
still be considered, but the use of multiple backticks for single-line raw
string literals may be considered inconsistent given current syntax for
single-line and multiline string literals.

Given such considerations, the remaining options include either more verbose
spellings such as raw"string" or the single quote option 'string'.


Thank you for the great summary @xwu :clap:. I really appreciate your thoughtful contributions to this forum, particularly your attempts to distill the essence out of the sometimes chaotic contributions to the forum.

Given that we have single and multi-line string literals, and that regex literal syntax using //'s seems likely, the remaining unserved use case is the swath of "other" that is not covered well by any of those.

I'm not sure how wide the audience of this "other" is, but it seems that (if we need to solve for it) we should go with the maximally powerful solution that can blast away any problematic cases, even if it is syntactically onerous. The idea is that few cases need this level of treatment, but (iff) they are common enough to require a solution, they can tolerate syntactic excess at the edges.

To me, that seems to imply that niceties like string interpolation syntax are not important. It also seems to say that the ability to have custom delimiters is important, because that is the most general solution to the individual problems (but it doesn't mean that we have to go whole hog and allow emoji as delimiters!). I agree that r"xx" syntax feels unnatural, but maybe something like:

#raw("delim", delim"..."delim) could work, for some limited idea of what the "delim" string can contain? If we require the quotes in the specified places but also require the delimiter to exist, then this allows unique-enough strings like:

#raw("x", x"crazy\(ain't it?!?@#?"x) 

which isn't too bad. Because these things are only rarely used, maybe something like this could work?
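A minimal sketch of how a lexer might find the end of the strawman #raw("delim", delim"..."delim) literal, assuming the delimiter has already been read from the first argument. The function name and its input convention are hypothetical, not part of any proposal text:

```swift
// Hypothetical scanner for the strawman #raw("delim", delim"..."delim) syntax.
// `body` is the second argument, starting at the delimiter, e.g. x"..."x).
// The literal must begin with delimiter + quote and ends at the first quote + delimiter.
func scanRawLiteral(_ body: String, delimiter: String) -> String? {
    let chars = Array(body)
    let opener = Array(delimiter + "\"")
    let closer = Array("\"" + delimiter)
    guard !delimiter.isEmpty, chars.starts(with: opener) else { return nil }

    var i = opener.count
    while i + closer.count <= chars.count {
        if chars[i..<(i + closer.count)].elementsEqual(closer) {
            // Everything between the opener and the closer is taken verbatim.
            return String(chars[opener.count..<i])
        }
        i += 1
    }
    return nil  // unterminated literal
}

let body = "x\"crazy\\(ain't it?!?@#?\"x)"
print(scanRawLiteral(body, delimiter: "x") ?? "nil")
// prints crazy\(ain't it?!?@#?
```

Because the scan stops at the first quote followed by the delimiter, a delimiter that itself contains a quote could make the end of the literal ambiguous.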

All that said, it really isn't clear to me that this is motivated enough to be worth language complexity to support. I'm glad there is extensive discussion though, so we can decide and legislate it once and for all.

-Chris


Also, I have to say that attempts to use '' to solve this aren't really motivating to me. I tend to believe in the use of '' delimiters for characters, but also don't think they are particularly useful for the purposes of raw strings: a raw string can very likely contain both a single and a double quote, so the Perl approach of forcing you to choose seems unappetizing.

The other benefit of the approach I'm pitching above is that it lends itself to a natural "multiline" raw string literal syntax of:

#raw("x", x"""
    crazy\(ain't it?!?@#?
    """x) 

Though we'd have to be careful not to allow " characters in the delimiter or something (to avoid ambiguity).

-Chris


What is the benefit of having the delimiter written out three times, instead of just using the opening delimiter to close the string?

Simpler, maybe, to have it as an explicit parameter than as an implicit one?

If we want to do this, I think we should also have a way to apply it to a non-raw string. Strawman example:

#cooked("x", x"""
    crazy\(verb) it?!?@#?
    """x)

That suggests we should consider this custom delimiter feature, whatever it ends up looking like, to be a separate feature from raw strings.

String handling in Swift emphatically does not resemble scripting languages, and intentionally so. Python, for example, has no concept of characters, only strings of length 1; it has no concept of grapheme clusters, allows O(1) integer subscripting, and has no concept of encoding views.
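The Swift side of that contrast can be seen in a few lines; nothing here is specific to the proposal:

```swift
// "é" written as the letter e followed by U+0301 COMBINING ACUTE ACCENT.
let accented = "e\u{301}"

print(accented.count)                 // prints 1 (one Character, i.e. one grapheme cluster)
print(accented.unicodeScalars.count)  // prints 2
print(accented.utf8.count)            // prints 3
print(accented == "\u{E9}")           // prints true (canonical equivalence)

// Integer subscripting such as accented[0] does not compile; indices are String.Index values.
print(accented[accented.startIndex])  // prints the single Character é
```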

Can I just say custom delimiters are going to be problematic for any text editor other than Xcode? Swift is already an absolute nightmare to syntax-highlight. I don't think any highlighter is better than about a 6 on a scale of 1 to 10 at the moment, and this is only going to make it worse.


Instead of custom delimiters, how about using extra parentheses?

#raw("This string contains " and stops here.")

#raw(("This string contains ") and stops here."))

#raw((("This string contains ")) and stops here.")))
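The extra-parentheses idea can likewise be sketched as a scanner: count the opening parentheses before the quote, then end the literal at the first quote followed by that many closing parentheses. The function name is hypothetical, and the input is assumed to start at the opening parenthesis after #raw:

```swift
// Hypothetical scanner for the "extra parentheses" strawman.
// `argument` starts at the "(" that follows #raw. If n "(" characters precede the
// opening quote, the literal ends at the first quote followed by n ")" characters.
func scanParenthesizedRaw(_ argument: String) -> String? {
    let chars = Array(argument)

    // Count the leading parentheses.
    var depth = 0
    while depth < chars.count && chars[depth] == "(" {
        depth += 1
    }
    guard depth > 0, depth < chars.count, chars[depth] == "\"" else {
        return nil  // not of the form (("...
    }

    // The terminator is a quote followed by `depth` closing parentheses.
    var closer: [Character] = ["\""]
    closer += Array(repeating: Character(")"), count: depth)

    let start = depth + 1
    var i = start
    while i + closer.count <= chars.count {
        if chars[i..<(i + closer.count)].elementsEqual(closer) {
            return String(chars[start..<i])
        }
        i += 1
    }
    return nil  // unterminated literal
}

print(scanParenthesizedRaw("((\"This string contains \") and stops here.\"))") ?? "nil")
// prints This string contains ") and stops here.
```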

I don't buy such arguments in general: If someone insists on suboptimal tools, he can either live with the problem or just refrain from using the problematic feature (and I consider Xcode suboptimal as well, so that's not just saying "I don't have that problem" ;-)

It's another story that Swift imho is too complicated, but I don't think this proposal is a game changer in this respect.

It wouldn't be as big a deal if Xcode were free and available for Linux, but it's not. I'm not really someone who argues over text editors, but when you have no other choice...


People type :'))))))) a lot; that's a lot of matching parentheses.