[Pitch #2] Regex Literals

Ben_Cohen · April 17, 2022, 7:00pm

The #URL("") syntax is interesting to discuss but probably beyond the scope of this pitch (unless it’s part of an argument that shorthand literal syntax for regexes is unnecessary… but that discussion can probably be had without fleshing out a more generalized/verbose alternative).

(It leads to various questions that definitely merit their own separate thread, e.g. Is it a way of saying “an initializer, but all arguments must be literals”? Or is it shorthand for saying all arguments must be @const? Would it rely on some kind of generalized compile-time interpretation feature? Would it enforce some kind of “must be evaluable at compile time” rule? If so, would it be required for any compile-time-evaluable function or just optional, i.e. is it a proposed spelling for “this function must be evaluated at compile time”? How would the typing of the result be generalized to a language feature?)

This is all very well for the compiler, but not so much for editor syntax highlighting. Ideally the two wouldn’t need different rules (editors might perhaps choose to simplify Swift’s parsing rules for ease of implementation purposes, but it’s less OK to go the other way and require editors to do something the compiler doesn’t have to do).

ksluder · April 17, 2022, 7:23pm

Would it also require special casing the diagnostic for unused results?

Ben_Cohen · April 17, 2022, 7:52pm

Yes, good point.

xwu · April 18, 2022, 3:19am

Yes, indeed, which is precisely why I think the idea intriguing even as I’d otherwise prefer a less verbose regex syntax.

Much as the core team has adopted @Sendable closures without generalizing the feature to arbitrary protocol conformances, while still spelling the feature with an eye towards how it could be generalizable, my point here is that if we are to lean towards a verbose regex literal syntax, we ought to really lean into it with a spelling that can be later generalized with all the interesting possibilities and questions you enumerate above.

hamishknight · April 21, 2022, 3:14pm

nukka123:

I think /.../ syntax would also affect custom operators.

infix operator /¢*¢/

extension Int {
    static func /¢*¢/ (lhs: Int, rhs: Int) -> Int {
        return lhs + rhs
    }
}

func foo(op: (Int, Int) -> Int) {
    print(op(1,2))
}

foo(op: /¢*¢/)

Unfortunately this would indeed become a regex literal. This is a variant of the unapplied infix / operator case; I will update the pitch to cover it. However, unlike infix /, this cannot be disambiguated with parens. I think the best way of disambiguating would likely be to write it as a closure, e.g.:

foo(op: { $0 /¢*¢/ $1 })

The inability to disambiguate with parens also affects other infix operators that start with / and are followed by other operator characters, e.g. /^. It may be necessary to tweak the lexing rule to look through operator characters to reject a closing ).
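
To make the two workarounds concrete, here is a minimal sketch. It assumes a hypothetical helper applyOp (a variant of foo above that returns its result instead of printing) and uses /^/ as a stand-in custom operator. The plain / operator is passed wrapped in parens, while the custom operator is wrapped in a closure so that it only ever appears in infix position.

```swift
// A custom operator that starts with '/'; only used in infix position below.
infix operator /^/

func /^/ (lhs: Int, rhs: Int) -> Int {
    return lhs + rhs
}

// Hypothetical helper for illustration: like foo above, but returns the result.
func applyOp(_ op: (Int, Int) -> Int) -> Int {
    return op(1, 2)
}

// Unapplied reference to plain '/', wrapped in parens.
let quotient = [2, 4].reduce(64, (/))

// The same computation written as a closure.
let sameQuotient = [2, 4].reduce(64) { $0 / $1 }

// The custom operator is wrapped in a closure rather than passed unapplied.
let sum = applyOp({ $0 /^/ $1 })

print(quotient, sameQuotient, sum)
```

The closure form is more verbose, but it never puts the operator at the start of an expression, which is the position where the ambiguity described above arises.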

ensan-hcl:

hamishknight:

let pattern = "[abc]+"
let regex = #regex(pattern)

which would likely be unexpected.

Considering the behavior of StaticString, I don't think it's all that unnatural.

Note it wouldn't be entirely like StaticString (or literals in general), as you also wouldn't be able to intermix any expressions between #regex(...) and the "..." argument. For example you wouldn't be able to write:

#regex(b ? "[abc]" : "[def]")

Or, if you were, it would lose out on editor support.
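
For comparison, here is a small sketch of what StaticString already permits today. The describe function and the variable names are made up for illustration. A String variable cannot be passed where a StaticString is expected, but an expression such as a ternary can still sit between the literals and the use site, which is the part a #regex(...) form would not allow.

```swift
// Hypothetical function for illustration; it only accepts a StaticString.
func describe(_ pattern: StaticString) -> String {
    return "pattern: \(pattern)"
}

let useFirst = true

// Both branches are string literals, so each is typed as StaticString.
let chosen: StaticString = useFirst ? "[abc]" : "[def]"
let described = describe(chosen)

// By contrast, a String variable is rejected by the type checker:
// let pattern = "[abc]+"
// describe(pattern)   // does not compile: pattern is a String

print(described)
```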

Just to clarify, contextual information is required while parsing, specifically "are we parsing an expression?". This is necessary to avoid parsing a regex literal in the following cases:

infix operator /^/ // An operator, not a regex
func /^/ (lhs: Int, rhs: Int) -> Int { 0 } // Also an operator
let i = 0 /^/ 1 // A binary operator, not a regex

We originally tried to do this purely based on the previous token while lexing, but it was less robust. In any case, this is all strictly contained within the parser; no semantic analysis is required.

ksluder · April 21, 2022, 4:41pm

It’s perhaps important to note that if the operator were implemented as func /¢*¢/<T>(lhs: T, rhs: T) { }, and if foo were also generic over op, this reformulation would not compile because the T would not be deducible.

hamishknight · April 21, 2022, 4:56pm

Could you elaborate? As far as I'm aware you would face the same issue for an unapplied reference, e.g. both of these fail to compile:

infix operator /^/
func /^/ <T>(lhs: T, rhs: T) {}

func foo<T>(_ fn: (T, T) -> Void) {}
foo(/^/)
// error: Generic parameter 'T' could not be inferred

foo({ $0 /^/ $1 })
// error: Unable to infer type of a closure parameter '$0' in the current context
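
For what it's worth, the closure form does compile once something pins down T. A minimal sketch, assuming a variant of foo that returns the name of the inferred type (that return value is my addition, purely so the result is observable):

```swift
infix operator /^/
func /^/ <T>(lhs: T, rhs: T) {}

// Hypothetical variant of foo: returns the name of whatever T was inferred.
func foo<T>(_ fn: (T, T) -> Void) -> String {
    return String(describing: T.self)
}

// Explicit parameter types on the closure fix T as Int.
let viaClosure = foo({ (lhs: Int, rhs: Int) in lhs /^/ rhs })

// A typed function value fixes T as String.
let typedFn: (String, String) -> Void = { $0 /^/ $1 }
let viaTypedValue = foo(typedFn)

print(viaClosure, viaTypedValue)
```

So the closure workaround costs an explicit annotation in the generic case, but it is no worse off than the unapplied reference, which needs the same extra type information.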

ksluder · April 21, 2022, 5:09pm

Sorry, I’m still in the _openExistential headspace where the inability for a closure to carry generic parameters is an issue.

nicklockwood · April 27, 2022, 7:00pm

+1 from me. I like it a lot, and I do think the /…/ spelling is worth fighting for, despite the edge-cases.