SE-0200: "Raw" mode string literals

I figured I'd share here some thoughts on the design space which I've already shared with @johnno1962 off-list. I think it's useful to have a common vocabulary, putting some names to the problems that we're trying to solve with this proposal.

As it happens, Wikipedia has a wonderfully thorough series of articles on just this topic, so these are some condensed notes that I took, as well as some reflections on how those points apply to Swift in particular:

Link to Gist


Notes on string literals

Most programming languages use delimiters to surround a string literal. A
known issue that arises due to the use of delimiters is delimiter collision,
which arises when the delimiter(s) themselves need to be represented in the
literal.

Solutions to delimiter collision

  • Paired quotes
    Different opening and closing delimiters; solves a limited subset of delimiter
    collision problems as it can permit only balanced, nested strings.
    Supported in PostScript (parentheses), Visual Basic .NET (curly quotes).
  • Escape characters and sequences
    A very commonly used solution.
    Already supported in Swift.
  • "Doubling up" delimiters
    Similar in concept to escaping the delimiter, two consecutive delimiters are
    interpreted as a literal character.
    Supported in Basic, Fortran, Pascal, Smalltalk.
  • Dual delimiters
    For example, a literal may be delimited by either single quotes or double
    quotes.
    Supported in Fortran, JavaScript, PHP, Python.
    A form of dual delimiters is supported in Swift in that " can be used
    without escaping inside multiline string literals.
  • Configurable multiple delimiters
    Here document-style strings are one variant; the user must know that the
    chosen delimiter will not appear in the quoted string or predict which
    sequences of characters are unlikely to appear.
    Supported in Perl, Ruby, C++11, Lua.
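
To make the two options Swift already has concrete, here is a minimal sketch: the same text written once with escape sequences and once as a multiline literal, where a lone " needs no escaping. The constant names are just for illustration.

```swift
// The same text written two ways; both forms are valid in Swift 4 and later.

// 1. Escape sequences: each embedded delimiter is written as \".
let escaped = "She said \"hello\" and left."

// 2. Multiline literal: """ delimits the string, so a lone " needs no escape.
let multiline = """
She said "hello" and left.
"""

// Both literals produce the same String value.
print(escaped == multiline)
```

Note that the multiline form only moves the collision: a run of three double quotes inside the text would still need escaping.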

The principal drawback to the use of escape characters is leaning toothpick
syndrome, a concept first widely introduced in Perl. The principal use cases
in which the issue arises are:

  • Regular expressions matching Unix-style paths
  • Windows paths--most pathologically, regular expressions matching Windows
    Uniform Naming Convention paths, which begin with the prefix \\ that
    requires double-escaping (\\\\\\\\)
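
As a rough illustration of the UNC case in today's Swift, the sketch below uses Foundation's NSRegularExpression. The `matches` helper, the constant names, and the server and share names are hypothetical and exist only for this example.

```swift
import Foundation

// Regular expression that should match a leading UNC prefix, i.e. two
// literal backslashes. The regex engine needs each backslash escaped once,
// and the Swift literal must then escape each of those again.
let uncPrefixPattern = "^\\\\\\\\"

// A sample UNC path (hypothetical names); its value is \\server\share.
let samplePath = "\\\\server\\share"

// Hypothetical helper: reports whether `pattern` matches anywhere in `text`.
func matches(_ pattern: String, in text: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return false
    }
    let range = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.firstMatch(in: text, options: [], range: range) != nil
}

// The pattern's value is ^ followed by four backslash characters.
print(uncPrefixPattern.count)
print(matches(uncPrefixPattern, in: samplePath))
```

Eight backslashes in source become four in the string value, which the regex engine reads as two literal backslashes.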

Solutions to leaning toothpick syndrome

  • Custom delimiters
    In Perl, characters other than / can be used as delimiters for regular
    expressions.
  • Raw strings
    See table below for comparative syntax.

Language  Syntax
C#        @"string"
C++11     R"xxx(string)xxx", where xxx is an optional custom delimiter
Go        `string`
Python    r"string"
Scala     """string""" (no interpolation) or raw"string" (interpolation)

Some conclusions

Many more languages offer raw strings than custom delimiters. Raw strings
are aimed specifically at mitigating leaning toothpick syndrome, which arises
when using escape sequences. Custom delimiters are an alternative to the use
of escape sequences for solving delimiter collision.

In Swift, both escape sequences and string interpolation segments are prefixed
with \. This is a deliberate design choice; Swift differs from languages such
as Scala where the two have distinct spelling. Scala offers a raw interpolator
syntax (interpolation but no escaping) as well as other variations. Swift's
deliberate design choice likely rules out such a design:
instead, string literals will support both interpolation and escaping or
neither.
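
A small sketch of the shared prefix in current Swift (the constant names are illustrative only): the same backslash introduces both an escape sequence and an interpolation segment, and escaping the backslash suppresses the interpolation.

```swift
let count = 3

// \( ... ) begins an interpolation segment.
let interpolated = "count = \(count)"

// \t is an ordinary escape sequence introduced by the same backslash.
let tabbed = "count\t\(count)"

// Escaping the backslash itself yields the literal characters \(count)
// rather than the interpolated value.
let literalText = "count = \\(count)"

print(interpolated)
print(tabbed)
print(literalText)
```

Because both features hang off one character, a mode that treats \ literally would turn off escaping and interpolation together.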

Generally, languages support single-line raw strings. This trend likely
reflects the insight that leaning toothpick syndrome is most pathological in the
case of regular expressions. Although multiline raw strings would permit
unmodified embedding of source code, such a use case is not a primary motivation
because, in the absence of custom delimiters, raw strings actually disable a
solution to delimiter collision which may be necessary for the embedded code.

Support for custom delimiters for regular expressions obviates the need
for raw strings to overcome leaning toothpick syndrome involving forward
slashes but not backslashes.

Syntax

Swift eschews numeric literal suffixes such as f and l; users have largely
rejected r"string" syntax on that basis, and it is unlikely that @"string"
in the style of C# would find greater acceptance.

Single backticks already serve another role in Swift. Multiple backticks may
still be considered, but the use of multiple backticks for single-line raw
string literals may be considered inconsistent given current syntax for
single-line and multiline string literals.

Given such considerations, the remaining options include either more verbose
spellings such as raw"string" or the single quote option 'string'.
