Sub-syntax for string literals

Motivation

Multi-line string literals often get used to embed JSON, YAML, JavaScript, etc. into Swift code, especially in tests.

However, for any given string literal, Swift does not seem to provide any official way to designate the syntax its contents are supposed to conform to. Also, I was unable to find any prior suggestion about this. (Please correct me if I'm wrong, happy to withdraw this pitch if I missed something.)

Proposed Solution

To address this, please consider the following suggestion.

The idea is a new feature for multi-line string literals that allows the user to specify what format the nested content should conform to. Swift would provide a String.Syntax enumeration from which the programmer picks the format that the string content is meant to conform to, out of a wide range of existing formats.

For example:

let foo = """
{ "foo": 42 }
""".syntax(.json)

let demoScript = """
document.getElementById("demo").innerHTML = "Hello JavaScript!"
""".syntax(.ecma.version(6))

let webPage = """
<!DOCTYPE html>
<html><body>
<h2>What Can JavaScript Do?</h2>
<p id="demo">JavaScript can change HTML content.</p>
<button type="button" onclick='\(demoScript)'>Click Me!</button>
</body></html>
""".syntax(.html.version(3))

Alternatively we could go with:

let demoScript = """.ecma.version(6)
document.getElementById("demo").innerHTML = "Hello JavaScript!"
"""

... inspired by how you specify the format of a multiline string in Markdown.

To specify a syntax not provided in the built-in String.Syntax enum:

"""
/my/weird/&*format/
""".syntax(.custom("myWeirdFormat"))

Implementation Details

If you are warm to the basic idea, whether it's feasible likely hinges on how it's supposed to be implemented from the compiler's standpoint. So let me pose a few questions to the community to kind of brainstorm how we might approach this:

1. What effect should specifying the syntax of a multiline string literal have, from the standpoint of the compiler?
2. Should this feature be implemented purely as a hook for IDEs/lint-scripts to provide syntax highlighting? If so, how might that change how the syntax for it should look?
3. If the programmer wants the compiler to validate their inset strings according to the specified syntax, how could we alleviate the responsibility of providing grammar-checking implementations from Swift itself? (I.e. how could we add extensibility to Swift that would allow third parties to offer syntax validation plug-ins to the compiler?)

Of course if something like this was already proposed or already exists, please let me know. Thanks!
2 Likes

We have the ExpressibleByStringLiteral protocol, which means that string literals don't necessarily need to be used just to create a String. If you wish to support string interpolations, we also have ExpressibleByStringInterpolation.

So you could do something like:

let foo: HTML = """
<!DOCTYPE html>
<html><body>
...
</body></html>
"""

The HTML type would then need to implement the init(stringLiteral: String) initializer, where it would check the contents and presumably fail at runtime if the string was ill-formed.
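As a minimal sketch of that run-time approach, here is the same pattern for JSON rather than HTML, only because Foundation already ships a JSON parser. JSONText and init?(validating:) are hypothetical names invented for illustration, not part of any library.

```swift
import Foundation

// Hypothetical wrapper type (not from any library): a string that is
// checked to be well-formed JSON when it is created.
struct JSONText: ExpressibleByStringLiteral {
    let rawValue: String

    // Failable initializer for strings that are only known at run time.
    init?(validating text: String) {
        let data = Data(text.utf8)
        guard (try? JSONSerialization.jsonObject(with: data, options: [])) != nil else {
            return nil
        }
        rawValue = text
    }

    // Called for string literals, including multi-line ones.
    // An ill-formed literal traps at run time, not at compile time.
    init(stringLiteral value: String) {
        guard let checked = JSONText(validating: value) else {
            preconditionFailure("String literal is not valid JSON")
        }
        self = checked
    }
}

let foo: JSONText = """
{ "foo": 42 }
"""
```

The type annotation is what selects the validating initializer; without it the literal is just a String and nothing is checked.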

It would be nice to move this up to compile-time, but:

(a) Whilst there has been some work done on compiler-evaluable functions, it isn't ready yet and likely won't be for some time.

(b) Parsing strings at compile-time is a bit iffy. It would likely be quite expensive and lead to long compile-times (previous work on compiler-evaluable functions involves using some kind of threshold to limit what can be done), and anything involving Unicode would be tricky and depend on the version of the data tables used by the compiler.

So we can already do run-time validation, but compile-time validation would need a bunch of features that we're already tracking as more general features.

13 Likes

Agreed, and I think that if you want compile-time safety it would be a more swifty move to use a DSL instead of a string literal, something like @stephencelis and @mbrandonw's swift-html or John Sundell's Plot.

1 Like

Well the initial purpose of this language feature would just be to:

  • support syntax highlighting, indentation, & code folding in IDEs within a multiline string literal as if it was a separate file of the specified type
  • provide consistent hooks for lint scripts (or any build script) to validate the structure and formatting of code in multiline string literals

Once that's in place, then it frees people up to make additional contributions to the compiler over time to enhance the features that hinge upon this aspect of the language.

In that regard perhaps it makes more sense as a declaration modifier or attribute, like:

@syntax(".yaml")
let foo = """
foo:
    bar:
        - baz
        - qux
        - alice
        - bob
"""

Because right now, the problem is that we are discouraged from using multiline string literals for any substantial blocks of code or markup because we'll lose the normal syntax highlighting and code folding capabilities of whatever editor we're using.

Xcode has started to support inline markdown rendering for things like playgrounds, but this only works for doc comments.

1 Like

I agree but that's not really what's motivating my proposal. I'm just concerned about the fact that if I copy-paste some web response from Charles into my tests, currently the IDE has no way to know what format a given string is supposed to be, so I don't get syntax coloring or code folding etc. Therefore I'm discouraged from using multiline string literals as opposed to separate files—which I would prefer to avoid because using separate files is much more complicated and you lose the ability to "jump to declaration".

1 Like

I love this idea and I don't know of any other language/IDE that does this so I think it would be a super unique and ground-breaking feature and selling point of Swift!

Yes, it seems infeasible at first, and I agree if it means that the Swift compiler would need to know about each specific embeddable language. But is that really true? I'm not knowledgeable about this at all, but I think the compiler could still just treat it as a plain multi-line string, except while allowing a bit of extra syntax that specifies the language. Then it would just be the IDE's job to provide syntax highlighting, code folding, etc, using whatever language-specific parsers.

I can see this being very useful for scripting with Swift, being able to embed AppleScript, shell script, etc, and have the IDE basically treat it as a separate file as far as parsing goes. I don't think the DSL approach is comprehensive enough to be a complete substitute for this.

All it needs, in my mind, is a small change to the Swift compiler. Technically Xcode could already try to heuristically detect embedded code and provide these features, but adding a special syntax would make it unambiguous and nudge third-party IDEs toward adding this capability as well.

The only thing I'm not too fond of is the proposed syntax for specifying the language as it looks like a dynamic feature which implies library knowledge of specific languages... and I think that's better left to packages, using the ExpressibleByStringInterpolation protocol. The only downside I can see is that you might have to specify the language twice, like:

let script: JavaScript = """javascript
    document.getElementById("demo").innerHTML = "Hello JavaScript!"
    """

But the """javascript is even more Markdown-like, which would be my preferred syntax.

In that case, it wouldn't need much more than an optional string after """, which would be stripped out of the actual string value. Then the rest would be IDE-specific (though I'm not sure if that's a good thing).

2 Likes

Yes, exactly.

Actually, I forgot about interpolation and now that I think about it, it's a problem. How could the IDE validate the embedded code when it doesn't know the dynamic values?

Not being able to use interpolations with this feature would make it a lot less worthwhile in my opinion.
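One partial answer, as a sketch only: a type that adopts ExpressibleByStringInterpolation receives the literal segments and the interpolated values separately, so the static part of the text is known without the dynamic values. Template, Part, and skeleton below are made-up names, and whether a tool could usefully validate a skeleton with placeholders is an assumption on my part.

```swift
// Hypothetical type: records literal segments and interpolated values
// separately instead of flattening them into one String.
struct Template: ExpressibleByStringInterpolation {
    enum Part: Equatable {
        case literal(String)   // text written directly in the source
        case dynamic(String)   // value that is only known at run time
    }

    struct StringInterpolation: StringInterpolationProtocol {
        var parts: [Part] = []

        init(literalCapacity: Int, interpolationCount: Int) {
            parts.reserveCapacity(interpolationCount * 2 + 1)
        }

        // Receives each literal segment; empty segments are skipped.
        mutating func appendLiteral(_ literal: String) {
            if !literal.isEmpty { parts.append(.literal(literal)) }
        }

        // Receives each \(...) value.
        mutating func appendInterpolation<T>(_ value: T) {
            parts.append(.dynamic(String(describing: value)))
        }
    }

    let parts: [Part]

    init(stringLiteral value: String) {
        parts = [.literal(value)]
    }

    init(stringInterpolation: StringInterpolation) {
        parts = stringInterpolation.parts
    }

    // The static text with every interpolation replaced by "?".
    var skeleton: String {
        parts.map { part -> String in
            switch part {
            case .literal(let text): return text
            case .dynamic: return "?"
            }
        }.joined()
    }
}

let elementID = "demo"
let page: Template = "<p id=\"\(elementID)\">Hello</p>"
print(page.skeleton)
```

This only shows that the split is available to a library type at run time; it does not give the IDE that information at edit time.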

If it's just an IDE thing, you'd need some way to tell the IDE "hey, use a different language server to interpret this region of source code". You wouldn't want SourceKit-LSP trying to parse JavaScript, for instance.

TextMate grammars had such a thing, but I don't believe the Language Server Protocol has anything like it. I may be wrong. If the protocol itself supports it, you might be able to add some kind of comment which would inform SourceKit-LSP that it's supposed to flag the next string literal as being in language X.

EDIT: Apparently, LSP does support it. Cool. It’s tricky (see the bit at the end about the issues MS encountered), and I’m not sure how well it would scale to n embedded languages, but potentially doable if there are a handful of languages that are embedded very frequently.

2 Likes

IIRC PhpStorm does something like this with string literals that look like SQL queries (syntax highlighting, at least), and it’s pretty nice, when it works.

If the goal here is primarily to get tooling to handle a string literal as if it were one of many different file types, perhaps the missing feature here is really to make it easy to “import” a file’s contents as a string literal at compilation time? E.g. let json = #fileContentsString("my/json/file.json").
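For contrast, a rough sketch of the closest thing available today, which reads the file at run time rather than at compilation time. loadFixture is a hypothetical helper name, and the sketch assumes the source tree is still on disk when the code runs, which usually holds for tests but not for shipped binaries.

```swift
import Foundation

// Hypothetical helper: reads a file that sits next to the calling source
// file. #filePath is the caller's source path captured at compile time,
// but the file contents are only read at run time.
func loadFixture(_ name: String, relativeTo sourceFile: String = #filePath) throws -> String {
    let directory = URL(fileURLWithPath: sourceFile).deletingLastPathComponent()
    let fileURL = directory.appendingPathComponent(name)
    return try String(contentsOf: fileURL, encoding: .utf8)
}
```

A test would call something like try loadFixture("response.json"), where response.json is a placeholder file name.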

8 Likes