Declarative String Processing Overview

I’m excited to see this get addressed! I love the flexibility of having both RegEx and Pattern.

I do have concerns about the RegEx literal syntax, though. / is already a valid operator character, so prefix and postfix / operators can be declared today, and existing code like /a/ could silently change meaning. I think we should consider reusing " instead, just as Swift already reuses " for Character literals (rather than ' as in other C-style languages). For example:

let regex: RegEx = "([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s(\w+).*"

or

let regex: RegEx = #"([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s(\w+).*"#

This wouldn’t be a plain String: the RegEx would still be evaluated at compile time, and ideally it would get better syntax highlighting. We may want to change the meaning of \ escapes inside such a literal (as the first example assumes), or we could keep \ as it is and recommend raw strings (as in the second example). Reusing " would also give us several valuable things from string literals:

  • Multi-line literals
  • Raw string syntax
  • String interpolation syntax (if we decide not to change the meaning of \) that could be used to combine RegExes.
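
To make the raw-string and interpolation points concrete, here's a rough sketch using Foundation's NSRegularExpression as a stand-in for the proposed RegEx type (which doesn't exist yet). The sample line, the `hex` fragment, and the `captures(of:in:)` helper are all made up for illustration; none of this is proposed API.

```swift
import Foundation

// Raw string: backslashes reach the regex engine untouched.
let rawPattern = #"([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s(\w+).*"#

// Ordinary string with today's escape rules: every regex backslash is doubled.
let escapedPattern = "([0-9A-F]+)(?:\\.\\.([0-9A-F]+))?\\s*;\\s(\\w+).*"

// Raw-string interpolation (\#(...)) combining a reusable fragment.
let hex = "[0-9A-F]+"  // hypothetical shared sub-pattern
let combinedPattern = #"(\#(hex))(?:\.\.(\#(hex)))?\s*;\s(\w+).*"#

// Hypothetical helper: returns the capture groups of the first match,
// using "" for groups that did not participate.
func captures(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return [] }
    let whole = NSRange(text.startIndex..., in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: whole) else { return [] }
    return (1..<match.numberOfRanges).map { index in
        Range(match.range(at: index), in: text).map { String(text[$0]) } ?? ""
    }
}

let line = "0041..005A ; Alphabetic # sample line"  // made-up input
print(rawPattern == escapedPattern)   // true
print(rawPattern == combinedPattern)  // true
print(captures(of: combinedPattern, in: line))  // ["0041", "005A", "Alphabetic"]
```

All three spellings produce the same pattern text; the difference is only how much escaping noise the reader has to look through, which is why I'd lean on raw strings if \ keeps its current meaning.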

Also, what does . mean? The proposal seems to use “scalar” and “character” interchangeably, but they mean different things in Swift: “character” usually refers to a grapheme cluster, not a single Unicode scalar. I think . should match a grapheme cluster, since that would encourage correct string processing, which is a goal of Swift. That said, no other regular expression implementation that I know of treats . as a grapheme cluster, not even the Unicode ICU implementation.
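
Here's a small sketch of why the distinction matters, again using NSRegularExpression (which is backed by ICU) as a stand-in, since the proposed RegEx type doesn't exist yet. The `matchesWhole` helper is hypothetical, and I'm relying on ICU's \X escape for "one grapheme cluster".

```swift
import Foundation

// "e" followed by U+0301 COMBINING ACUTE ACCENT: one Character, two scalars.
let accented = "e\u{301}"
let characterCount = accented.count               // counts grapheme clusters
let scalarCount = accented.unicodeScalars.count   // counts Unicode scalars

// Hypothetical helper: does the pattern match somewhere in the text?
func matchesWhole(_ pattern: String, _ text: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return false }
    let whole = NSRange(text.startIndex..., in: text)
    return regex.firstMatch(in: text, options: [], range: whole) != nil
}

// In ICU, "." consumes a single code point, so it cannot span the whole cluster.
let dotMatchesCluster = matchesWhole("^.$", accented)
// ICU's \X matches an entire grapheme cluster.
let xMatchesCluster = matchesWhole(#"^\X$"#, accented)

print(characterCount, scalarCount)          // 1 2
print(dotMatchesCluster, xMatchesCluster)   // false true
```

If . in the new RegEx matched a Character, the first pattern would succeed on this input, which is the behavior I'd expect from a Swift-native design.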
