[Pitch] alternative multiline string literals


(L Mihalkovic) #1

Please accept my apologies for the repeat… I seem to have more trouble with my emails than with the brilliant codebase this team has produced.
Best regards
LM/

——————————

Wanting to test the validity of some of the arguments I read on the main proposal, I worked on my own prototype. I think there is more freedom than seems to have been identified so far.

The syntax I am exploring is visible here: https://gist.github.com/lmihalkovic/718d1b8f2ae6f7f6ba2ef8da07b64c1c

There are still a couple of things that do not work:
- serialization of the @string_literal attribute
- type checker code for the @string_literal attribute
- skipping leading spaces on each line, based on the indentation of the first line
- removing some of the extra EOLs (rule to be defined)
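
One possible reading of the leading-space item, as a minimal sketch in plain Swift: measure the indentation of the first non-blank line and drop at most that many spaces from every line. The helper name stripIndentation is hypothetical and is not part of the prototype; the actual rule is still open.

```swift
/// Hypothetical helper (not in the prototype): removes from every line as many
/// leading spaces as the first non-blank line has, and never more than that.
func stripIndentation(_ text: String) -> String {
    let lines = text.split(separator: "\n", omittingEmptySubsequences: false)
    // The first line containing something other than spaces sets the indentation.
    guard let first = lines.first(where: { line in line.contains(where: { $0 != " " }) }) else {
        return text
    }
    let indent = first.prefix(while: { $0 == " " }).count
    let stripped = lines.map { line -> Substring in
        // Only spaces are removed, so shallower lines keep all their content.
        let removable = line.prefix(indent).prefix(while: { $0 == " " }).count
        return line.dropFirst(removable)
    }
    return stripped.joined(separator: "\n")
}

// The contents of an indented literal such as s5, written as an ordinary string.
let indentedContents = "\n  {\n    \"key5\": \"stringValue\"\n  }\n  "
let unindentedContents = stripIndentation(indentedContents)
print(unindentedContents)
```

With this rule the indented form would produce the same layout as a literal written flush against the left margin, which seems to be the intent of the s4/s5 pair.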

The following works:
- a comment before the literal data
- @string_literal("xxxx"). At the moment the attribute value is a string_literal; maybe an identifier would be better, and maybe it should be @string_literal(type: "xxxx"), so that other properties can be added. I persist in thinking that a lot of good can come from being able to tag the contents of a string literal (e.g. XML schema validation, custom syntax coloring, …)
- the code is based on a string_multiline_literal tag to make this extension formally visible in the grammar
- no need to prefix each line (although it will be possible to use | as a margin)
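
The margin idea could be sketched as follows, assuming a line is only rewritten when its first non-space character is the margin character. The helper stripMargin and its exact rule are assumptions made for illustration; the margin character is a parameter because the text mentions | while the s6 sample uses >.

```swift
/// Hypothetical helper (not in the prototype): for each line whose first
/// non-space character is `margin`, drops the leading spaces and the margin
/// character itself. Lines without a margin are left unchanged.
func stripMargin(_ text: String, margin: Character) -> String {
    let lines = text.split(separator: "\n", omittingEmptySubsequences: false)
    let stripped = lines.map { line -> Substring in
        let rest = line.drop(while: { $0 == " " })
        return rest.first == margin ? rest.dropFirst() : line
    }
    return stripped.joined(separator: "\n")
}

// The contents of a literal shaped like s6, written as an ordinary string.
let marginContents = "\n  >{\n  > \"key6\": \"stringValue\"\n  >}\n  "
let withoutMargin = stripMargin(marginContents, margin: ">")
print(withoutMargin)
```

Note that under this rule the final line, which holds only the spaces before the closing delimiter, is kept as is; whether it should be dropped belongs to the EOL rule that is still to be defined.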

let s0 = "s0"

let s1 = "{\"key1\": \"stringValue\"}"

let s2 = _"{"v2"}"_

let s3 =
    /* this is a template */
    _"{"key3": "stringValue"}"_

let s4 =
/* this is (almost) the same template */
_"
{
  "key4": "stringValue"
  , "key2": "stringValue"
}
"_

@string_literal("json") let s5 =
  /* this is exactly the same template as s5 */
  _"
  {
    "key5": "stringValue"
  }
  "_

@string_literal("json") let s6 =
  /* this is exactly the same template as s5 */
  _"
  >{
  > "key6": "stringValue"
  > , "key2": "stringValue"
  >}
  "_

On May 7, 2016, at 1:53 AM, John Holdsworth via swift-evolution <swift-evolution@swift.org> wrote:

I’ve had a go at parsing HEREDOC (why does autocorrect always change this to HERETIC!).
It wasn’t as difficult as I’d expected once you comment out a few well-meaning asserts in the
compiler. To keep lexing happy there are two variants: <<"HEREDOC" and <<'HEREDOC'.

        assert( (<<"XML" + <<"XML") == (xml + xml) )
        <?xml version="1.0"?>
        <catalog>
           <book id="bk101" empty="">
               <author>\(author)</author>
               <title>XML Developer's Guide</title>
               <genre>Computer</genre>
               <price>44.95</price>
               <publish_date>2000-10-01</publish_date>
               <description>An in-depth look at creating applications with XML.</description>
           </book>
        </catalog>
        XML
        <?xml version="1.0"?>
        <catalog>
           <book id="bk101" empty="">
               <author>\(author)</author>
               <title>XML Developer's Guide</title>
               <genre>Computer</genre>
               <price>44.95</price>
               <publish_date>2000-10-01</publish_date>
               <description>An in-depth look at creating applications with XML.</description>
           </book>
        </catalog>
        XML

It's a credit to its authors that Xcode and the remainder of the toolchain cope with this remarkably well
now that tokens arrive out of order. The weird colouring is an artefact I’ve not been able to resolve.

The changes are here: https://github.com/apple/swift/pull/2275, and amount to an additional 60 lines of by no
means bulletproof code. The total changes for multi-line literals are now 10% of the size of Swift's lib/Parse/Lexer.cpp.
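
To make the "tokens arrive out of order" point concrete, here is a rough sketch in plain Swift of how the bodies could be matched to their markers: the tags named on the first line are collected left to right, and each one consumes the following lines up to its terminator. This is only an illustration of the idea, not the code in the pull request; the helper names are hypothetical and only the <<"TAG" form is handled.

```swift
/// Hypothetical helper: returns the tags of every <<"TAG" marker on a line.
func heredocTags(in line: String) -> [String] {
    let pieces = line.split(separator: "\"", omittingEmptySubsequences: false)
    var tags: [String] = []
    // Pieces at odd indices sit between a pair of double quotes.
    for i in stride(from: 1, to: pieces.count, by: 2) where pieces[i - 1].hasSuffix("<<") {
        tags.append(String(pieces[i]))
    }
    return tags
}

/// Hypothetical helper: lines[0] holds the markers; the bodies follow in
/// marker order, each ended by a line containing only its tag.
func heredocBodies(_ lines: [String]) -> [String] {
    guard let first = lines.first else { return [] }
    var bodies: [String] = []
    var next = 1
    for tag in heredocTags(in: first) {
        var body: [String] = []
        // Leading spaces are ignored when looking for the terminator line.
        while next < lines.count, lines[next].drop(while: { $0 == " " }) != tag {
            body.append(lines[next])
            next += 1
        }
        next += 1 // skip the terminator line
        bodies.append(body.joined(separator: "\n"))
    }
    return bodies
}

let heredocSource = [
    "assert( (<<\"XML\" + <<\"XML\") == (xml + xml) )",
    "<a/>",
    "XML",
    "  <b/>",
    "  XML",
]
let heredocResult = heredocBodies(heredocSource)
print(heredocResult)
```

The expression on the first line is only complete once both bodies have been read, which is why the rest of the toolchain sees the string contents after the tokens that follow the markers.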


(L Mihalkovic) #2

Added details about the prototype.

// This code sample and prototype implementation (implemented as a patch against the 3.0
// branch of the Swift compiler) explore a possible syntax based on ideas discussed in
// the swift-evolution mailing list at
// http://thread.gmane.org/gmane.comp.lang.swift.evolution/904/focus=15133
//
// The proposed syntax uses a combination of two characters to signal the start and end
// of a multiline string in the source code: _" contents "_
//
// Additionally, the syntax introduces a new @string_literal() attribute that can be used
// to semantically tag the contents of a multiline string literal. The hope is that this
// attribute would be used by IDEs for custom validation/formatting of the contents of
// long string literals, as well as possibly be accessible at runtime via an extended
// mirror/reflection mechanism.

// Tagging literal contents
@string_literal("json") let att1 = _"{"key1": "stringValue"}"_
@string_literal("text/xml") let att2 = _"<catalog><book id="bk101" empty=""/></catalog>"_
@string_literal("swift_sil") let att3 = _" embedded SIL contents?! "_

// The following alternatives for placement of the attribute have been looked into
// and so far rejected for seemingly not fitting as closely with other attribute usage
// patterns in the Swift grammar:
//
// let att2 : @string_literal("text/xml") String = _" ... "_ // Conveys the impression that the type is annotated rather than the variable
// let att2 = @string_literal("text/xml") _" ... "_ // Appealing, but without any precedent in the Swift grammar
//

// checking that nothing is broken
let s0 = "s0"

// The default swift syntax requires that quotes be escaped
let s1 = "{\"key1\": \"stringValue\"}"

// The proposed syntax for multiline strings works for single line strings as well (maybe it
// should not) and does not mandate that enclosed double quote characters be escaped
let s2 = _"{"v2"}"_

// When dealing with long blocks of embedded text, it seems natural to want to describe them
// as close as possible to the contents. The proposed syntax supports inserting a comment
// just before the data it documents. This allows the comment indentation to match exactly
// that of the string.
let s3 =
    /* this is a template */
    _"{"key3": "stringValue"}"_

// --------------------------------------------------------------------------------
// The following section explores different ways to deal with leading spaces

let s4 =
/* this is (almost) the same template */
_"
{
  "key4": "stringValue"
  , "key2": "stringValue"
}
"_

let equivS4 =
"\n" +
"{\n" +
" \"key4\": \"stringValue\"\n" +
" , \"key2\": \"stringValue\"\n" +
"}\n" +
"\n"

//TODO: fix the leading spaces
let s5 =
  /* this is exactly the same template as s5 */
  _"
  {
    "key5": "stringValue"
  }
  "_

//TODO: fix the leading spaces
let s6 =
  /* this is exactly the same template as s5 */
  _"
  >{
  > "key6": "stringValue"
  > , "key2": "stringValue"
  >}
  "_

I would appreciate any input on the realism and degree of difficulty of pursuing something like the following (swift_sil being a built-in reserved attribute value):

@string_literal(swift_sil) let att3 = _" embedded SIL contents?! "_

The train of thought is to try and identify a “simple” pathway for something akin to Rust macros or Jai’s “do this during compilation” (a possible long-term replacement for .gyb?!)

From a more technical standpoint, some of the choices were dictated by the desire to create a pathway inside the existing Lexer/Parser that would:
- accommodate today’s immediate need for simple multiline string literals
- lay some clean foundations for future extensions
- keep the risks to a minimum
- open doors to simplify the work involved in creating other prototypes

As such, the prototype relies on the following alterations to the core logic of the parser/lexer:
- the introduction of a new string_multiline_literal
- the following change to the core logic of lexImpl()

   case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u':
   case 'v': case 'w': case 'x': case 'y': case 'z':
   case '_':
+    if (CurPtr[-1] == '_' && CurPtr[0] == '"') {
+      return lexStringMultilineLiteral();
+    } else {
       return lexIdentifier();
+    }
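
The effect of that hook can be mimicked outside the compiler. The following is a rough Swift sketch of the scanning rule only, under the assumption that the literal ends at the first quote followed by an underscore; scanMultilineLiteral is a hypothetical stand-in and not the prototype's lexStringMultilineLiteral().

```swift
/// Hypothetical stand-in for the lexer hook: if `source` starts with `_"`,
/// returns the text up to the first closing `"_`. Returns nil otherwise, which
/// is where the real lexer would fall back to lexIdentifier().
func scanMultilineLiteral(_ source: String) -> String? {
    let chars = Array(source)
    guard chars.count >= 2, chars[0] == "_", chars[1] == "\"" else { return nil }
    var index = 2
    while index + 1 < chars.count {
        // A quote only closes the literal when an underscore follows it.
        if chars[index] == "\"" && chars[index + 1] == "_" {
            return String(chars[2..<index])
        }
        index += 1
    }
    return nil // unterminated literal
}

let scanned = scanMultilineLiteral("_\"{\"v2\"}\"_")
print(scanned ?? "not a multiline literal")   // {"v2"}
print(scanMultilineLiteral("_name") == nil)   // true
```

A naive rule like this one would end the literal early on contents such as {"_id": 1}, because "_ appears inside the data; how the prototype treats that case is not stated here.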

On May 7, 2016, at 8:20 PM, L Mihalkovic <laurent.mihalkovic@gmail.com> wrote: