The Need for Raw
I write a lot of code generation apps. Because of this, I could really use some raw strings in my life. Raw strings aren't uncommon in other programming languages. Swift needs to implement them in a way that feels cromulently Swifty.
If you don't remember, the SE-0200 proposal was returned for revision with two conditions: fixing the syntax and coming up with better use cases. I think the use cases have been fixed. That leaves the syntax, which has to work for both single-line and multi-line strings. @johnno1962 already implemented raw strings as a branch, so this effort just needs a revised proposal to make it happen.
At this point, I figure either I push for this to happen or I have to live with it never happening. I decided to make one last push.
Motivation
The core team wrote:
The proposal itself leans heavily on regular expressions as a use case for raw string literals. Several reviewers remarked that the motivation wasn’t strong enough to justify the introduction of new syntax in the language, so a revised proposal will need additional motivating examples in other domains.
I think the use cases part is easy. Here's my list.
- Metaprogramming: This covers use cases including code-producing-code for utility programming, targeted grammars, and building test cases without escaping. This is my use case. I have apps that generate color schemes (in Swift, ObjC, for SpriteKit/SceneKit, literals, etc), that generate appropriate date formatters, that perform language-specific escaping (for example, for Objective-C), that generate markup, and more. Other interested parties would probably include the people who build Kite Composer and PaintCode. Any utility app that outputs code would benefit in some form.
- Pedagogy: Code snippets play a major role in projects like "Learn to Code" and other teaching applications. Removing snippets to external files makes code review harder. Escaping snippets is a tedious process, which is hard to inspect. Escaping also complicates copying and pasting from working code into your source and back. When you're talking about code, and using code, having that code be formatted as an easily updated raw string is especially valuable.
- Dialogue: Although many issues are handled with triple-quoted multiline strings, dialogue is peppered with quote marks and other material that requires string escaping. Sources range from scripts to literature, to other textual sources. For relatively short snippets, in the ones or tens of lines, it may be impractical to use external files for each inclusion, especially during code review.
- Windows paths: Windows uses backslashes to delineate descent through a directory tree, e.g., C:\Windows\All Users\Application Data. I don't do windows but other people do. We do not judge.
- Data Formats and Domain Specific Languages: It's useful to incorporate short sections of unescaped JSON and XML, which allows cut and paste of the raw form without escaping quotes. Like dialogue, it may be impractical to use external files and databases for each inclusion. Doing so reduces the ease of inspection, maintenance, and updating with new material.
- Regular expressions: I saved this one for last because Raw Strings aren't just about regex. While regex in general is a much larger problem than raw strings, it is a primary (if not the primary) use case for many Swift developers. Doing raw strings right now helps regex down the line.
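To make the cost concrete, here is a minimal sketch of what a few of these use cases look like with today's standard string literals. The constant names and sample values are mine, chosen only for illustration.

```swift
// Windows path: every backslash must be doubled in a standard literal.
let windowsPath = "C:\\Windows\\All Users\\Application Data"

// Regular expression for a date such as 2018-03-14:
// each \d in the pattern becomes \\d in source.
let datePattern = "\\d{4}-\\d{2}-\\d{2}"

// Short JSON snippet in a single-line literal: every quote mark is escaped.
let json = "{\"name\": \"Alice\", \"age\": 7}"

print(windowsPath)  // C:\Windows\All Users\Application Data
print(datePattern)  // \d{4}-\d{2}-\d{2}
print(json)         // {"name": "Alice", "age": 7}
```

In each case the source text and the resulting string differ, which is exactly what makes these literals hard to inspect and to copy and paste.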
Approaches (aka Get Your Paint Swatches Out!)
The core team wrote that the proposed r"..." syntax didn’t fit well with the rest of the language. The most-often-discussed replacement was #raw("..."), but the Core Team felt more discussion (as a pitch) was necessary. In follow-up, the most popular approaches were #raw and #rawString.
Neither of these seems to satisfy multi-line raw strings very well (which are my primary use-case). Whatever solution is picked needs to handle both quick regexes and substantial multi-line text.
It is highly unlikely that the raw string will offer custom delimiters, so the solution should be as simple and as elegant as possible, and the forms for one-line and multi-line strings should be obvious and related.
Right now, multi-line strings work well with dialog and book excerpts, so long as they don't involve backslashes. This is a win for classic text and a slight problem for pedagogical text that discusses technological issues:
let aliceStrings = """
"Come, there's no use in crying like that!" said Alice to herself,
rather sharply; "I advise you to leave off this minute!" She generally
gave herself very good advice, though she very seldom followed it.
"""
This works terribly with code, because Swift treats each \( as the start of an interpolation rather than as literal text:
let codeString = """
let (coreName, coreValue) = model.refresh()
let value = "\(coreName): \(coreValue)"
"""
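Here is a minimal sketch of the workaround that's available today, and why it hurts. The coreName and coreValue values are hypothetical stand-ins that exist only so the interpolating version compiles.

```swift
// Hypothetical values, present only so the unescaped version compiles.
let coreName = "width"
let coreValue = 42

// Unescaped: Swift interpolates, so the template text is lost.
let interpolated = """
    let value = "\(coreName): \(coreValue)"
    """

// Escaped: each backslash is doubled so the template survives verbatim.
let escaped = """
    let value = "\\(coreName): \\(coreValue)"
    """

print(interpolated)  // let value = "width: 42"
print(escaped)       // let value = "\(coreName): \(coreValue)"
```

The escaped version produces the right output, but the source no longer matches the code it's meant to generate, so you can't paste working code in or out without editing it.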
The marker for raw text processing should be close to the string it annotates. Swift provides a couple of role models for this.
The first is declaration attributes. Adding a @raw (or rawValue) attribute to a String declaration could establish that a following string literal be treated as raw.
let myString: @raw String = "let value = "\(coreName): \(coreValue)""
let myString: @raw String = """
let value = "\(coreName): \(coreValue)"
"""
The drawbacks are:
- You need a declaration.
- You cannot use an expression with other elements.
- The annotation is fairly far away from the quote marks.
Another approach is to use a #-delimited keyword, such as #rawString. The problem lies in where and if you add parentheses.
Single line raw strings are easy:
let codeValue = #rawString("let value = "\(coreName): \(coreValue)"")
Multilines get more complicated, especially with the dangling parenthesis at the end.
let codeValue = #rawString("""
let (coreName, coreValue) = model.refresh()
let value = "\(coreName): \(coreValue)"
""")
Eliminating the parentheses means the modifier must precede the literal, and r"..." has already been ruled out. Plus it's ugly and unswifty. So how do you annotate the raw string in a Swiftilious fashion?
let codeValue = #raw "let value = "\(coreName): \(coreValue)""
let codeValue = #raw """
let (coreName, coreValue) = model.refresh()
let value = "\(coreName): \(coreValue)"
"""
These aren't horrible but they aren't wonderful either.
You could follow the example of #if/#endif:
let codeValue = #raw "let value = "\(coreName): \(coreValue)"" #endRaw
let codeValue = #raw """
let (coreName, coreValue) = model.refresh()
let value = "\(coreName): \(coreValue)"
""" #endRaw
// or
let codeValue = #raw
"""
let (coreName, coreValue) = model.refresh()
let value = "\(coreName): \(coreValue)"
"""
#endRaw
Again, pretty ugly.
Or you can try using some kind of operator on the quote marks:
let codeValue = #raw(")let value = "\(coreName): \(coreValue)"#raw(")
let codeValue = #raw(""")
let (coreName, coreValue) = model.refresh()
let value = "\(coreName): \(coreValue)"
#raw(""")
If you're willing to walk away from keywords, then underscores could create a new "kind" of quote marks:
let codeValue = _"let value = "\(coreName): \(coreValue)""_
let codeValue = _"""_
let (coreName, coreValue) = model.refresh()
let value = "\(coreName): \(coreValue)"
_"""_
I actually don't hate this one as much as I might. I thought the double underscores for the two triple quotes looked better than unbalanced ones. I dislike the leading underscore preventing the close quotes from lining up with the text. You could change the triples to single underscore markers:
let codeValue =
"""_
let (coreName, coreValue) = model.refresh()
let value = "\(coreName): \(coreValue)"
"""_
Right-sided underscores look best but they kind of break the idea of the modifications being on the "outside" of the raw quoted text.
So what do you have to offer? Bonus points for clean and elegant.