Empower String type with regular expression


(Jerome Paschoud) #1

I would like to see the String type support regular expressions by default. I think a language that advertises itself as a good scripting language should provide an easy way to use regular expressions in its default implementation (=~ in Perl, for example). I know that one can use NSRegularExpression, but who really wants to first create an NSRegularExpression object (with all the nice escaping that comes with every \), then get an NSTextCheckingResult, then get a range (and I mean an NSRange, not a Range<String.Index>), and finally slice the original string?
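
For reference, here is a sketch of that multi-step NSRegularExpression workflow in current Swift with Foundation. The helper name `firstMatch(_:in:)` is made up for illustration; `NSRegularExpression`, `NSTextCheckingResult` and `Range(_:in:)` are the real Foundation APIs it walks through.

```swift
import Foundation

// Hypothetical helper: returns the first match of `pattern` in `text`
// plus capture group 1, following the steps listed in the post.
func firstMatch(_ pattern: String, in text: String) throws -> (whole: String, group1: String?)? {
    // 1. Build the regex object (backslashes are doubled in a normal string literal).
    let regex = try NSRegularExpression(pattern: pattern)

    // 2. Search; NSRange offsets are counted in UTF-16 code units, not Characters.
    let searchRange = NSRange(location: 0, length: text.utf16.count)
    guard let result = regex.firstMatch(in: text, options: [], range: searchRange) else {
        return nil
    }

    // 3. Convert the NSRanges in the NSTextCheckingResult back to Range<String.Index>.
    guard let wholeRange = Range(result.range, in: text) else { return nil }
    let groupRange: Range<String.Index>? =
        result.numberOfRanges > 1 ? Range(result.range(at: 1), in: text) : nil

    // 4. Finally, slice the original string.
    return (String(text[wholeRange]), groupRange.map { String(text[$0]) })
}

if let match = try! firstMatch("(\\d+)-(\\d+)", in: "call 555-1234 now") {
    print(match.whole)          // 555-1234
    print(match.group1 ?? "-")  // 555
}
```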

I realize that it could be considered as purely syntactic sugar, but what a nice one.

Jérôme


(David Waite) #2

It sounds like you are asking for two things:

=~ operator support for regular expressions
regular expression literals

I would love to see regular expressions be more usable in Swift, but in my opinion =~ is a bad idea, and regex literals have important tradeoffs to consider.

For =~:
1. In other languages, the semantics of the =~ operator map to a return value of Int?: the matching position, or nil if there is no match. This means you would likely need to write if str =~ foo != nil { … }
2. That is considerably longer and harder to read than (say) if foo.matches(str) { … }
3. The semantics of the =~ operator produce state around matched groups, which is usually exposed to the language as either thread-local or block-local data.
4. The =~ operator only makes sense when applied to a string (or some other random-access text source) and a regex.
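
A sketch of the two call styles being compared, assuming a hypothetical `=~` operator and a hypothetical `RegexPattern` type, both layered on Foundation's `range(of:options:)` with `.regularExpression`. Note that the operator needs its own precedence group above `ComparisonPrecedence` just so that `str =~ foo != nil` parses.

```swift
import Foundation

// Hypothetical Perl-style operator: returns the range of the first match, or nil.
precedencegroup RegexMatchPrecedence {
    higherThan: ComparisonPrecedence
}
infix operator =~ : RegexMatchPrecedence

func =~ (text: String, pattern: String) -> Range<String.Index>? {
    return text.range(of: pattern, options: .regularExpression)
}

// Hypothetical method-based alternative.
struct RegexPattern {
    let source: String
    init(_ source: String) { self.source = source }

    func matches(_ text: String) -> Bool {
        return text.range(of: source, options: .regularExpression) != nil
    }
}

let line = "error: disk full"
if line =~ "^error:" != nil { print("operator form matched") }
if RegexPattern("^error:").matches(line) { print("method form matched") }
```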

For regex literals, my only concern is that it makes regex a language feature rather than an extended or standard library feature. If there is ever a desire to have Swift usable on embedded systems, you would likely want to be able to drop regular expression support.

On the flip side, a regex may require code generation/compilation before it can be used. A statement like:

while Regex("foo").matches(currentLine) { ... }

takes a performance hit from regenerating the regex matcher on every iteration of the loop.

A literal syntax would not only let the compiler optimize this into generating a single matcher; the compiler could even generate IL to do the matching directly.
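
Without a literal, the usual manual workaround is to build the regex once, outside the loop. The sketch below uses Foundation's `NSRegularExpression` in place of the hypothetical `Regex` type and contrasts per-iteration construction with a hoisted instance; it illustrates the shape of the optimization, not measured performance.

```swift
import Foundation

let lines = ["foo bar", "foobar", "baz"]

// Builds the regex on every iteration, like `while Regex("foo").matches(...)`.
var perIterationCount = 0
for line in lines {
    let regex = try! NSRegularExpression(pattern: "foo")
    let range = NSRange(location: 0, length: line.utf16.count)
    if regex.firstMatch(in: line, options: [], range: range) != nil {
        perIterationCount += 1
    }
}

// Builds the regex once and reuses it: the hand-written equivalent of what
// a compiler could do automatically for a regex literal.
let fooRegex = try! NSRegularExpression(pattern: "foo")
var hoistedCount = 0
for line in lines {
    let range = NSRange(location: 0, length: line.utf16.count)
    if fooRegex.firstMatch(in: line, options: [], range: range) != nil {
        hoistedCount += 1
    }
}

print(perIterationCount, hoistedCount) // 2 2
```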

-DW

On Dec 8, 2015, at 1:14 PM, Jerome Paschoud via swift-evolution <swift-evolution@swift.org> wrote:

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Brent Royal-Gordon) #3

> 1. The semantics of the =~ operator in other languages maps to a return value of Int? - matching position or nil with no match. This means you would likely need to write if str =~ foo != nil { … }

Actually, I suspect the return value would be more like Range<String.Character.Index>?, or maybe even Regex.Match?.

> 2. The semantics of the =~ operator produce state around matched groups, which is usually exposed to the language as either thread-local or block-local data.

This is hugely dependent on the language, and I think it’s pretty obvious that Swift would take a return-value-based approach:

  if let match = string =~ /"([^\\"]*(\\[\\"][^\\"]*)*)"/ {
    let quotedString = match.substrings[1]
    …
  }
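
A rough approximation of this return-value-based approach in today's Swift, assuming hypothetical `RegexPattern` and `RegexMatch` types backed by `NSRegularExpression` and a user-defined `=~` operator. Since there is no regex literal, the same pattern is written as a Swift 5 raw string (`#"..."#`) instead of `/.../`.

```swift
import Foundation

// Hypothetical stand-in for the Regex type discussed in the thread.
struct RegexPattern {
    let compiled: NSRegularExpression
    init(_ pattern: String) throws {
        compiled = try NSRegularExpression(pattern: pattern)
    }
}

// Hypothetical match result: substrings[0] is the whole match and
// substrings[n] is capture group n (nil if that group did not participate).
struct RegexMatch {
    let substrings: [Substring?]
}

infix operator =~ : ComparisonPrecedence

// Returns the first match as a value instead of setting hidden global state.
func =~ (text: String, regex: RegexPattern) -> RegexMatch? {
    let whole = NSRange(location: 0, length: text.utf16.count)
    guard let result = regex.compiled.firstMatch(in: text, options: [], range: whole) else {
        return nil
    }
    let substrings: [Substring?] = (0..<result.numberOfRanges).map { i in
        Range(result.range(at: i), in: text).map { text[$0] }
    }
    return RegexMatch(substrings: substrings)
}

// The quoted-string pattern from the post, as a raw string.
let quotedString = try! RegexPattern(#""([^\\"]*(\\[\\"][^\\"]*)*)""#)
let input = #"he said "a \"nested\" quote" twice"#

if let match = input =~ quotedString {
    print(match.substrings[1] ?? "")  // a \"nested\" quote
}
```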

> For regex literals, my only concern is that it makes regex a language feature over an extended or standard library feature.

Not necessarily. The only thing that *has* to be in the language is support for parsing regex literals, and the only thing that has to be in the standard library is RegexLiteralConvertible. Even if there is a Regex type in the standard library, that type alone could be dropped without requiring huge compiler changes.
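
`StringLiteralConvertible` (renamed `ExpressibleByStringLiteral` in Swift 3) already works this way: the compiler parses the literal and calls an initializer, while the conforming type is ordinary library code. A minimal sketch, where `RegexPattern` is a hypothetical type standing in for a library-defined Regex; a `RegexLiteralConvertible` protocol would presumably have the same shape.

```swift
import Foundation

// Hypothetical library type. The compiler only turns the literal into a call to
// init(stringLiteral:); removing this type would need no compiler changes.
struct RegexPattern: ExpressibleByStringLiteral {
    let source: String

    init(stringLiteral value: String) {
        source = value
    }

    func matches(_ text: String) -> Bool {
        return text.range(of: source, options: .regularExpression) != nil
    }
}

let digits: RegexPattern = "^[0-9]+$"
print(digits.matches("2015"))   // true
print(digits.matches("Dec 8"))  // false
```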

--
Brent Royal-Gordon
Architechies


(Chris Lattner) #4

Just MHO, but I’d really really like to see proper regex support in Swift someday.

I think it could fit naturally into the pattern matching syntax we already have; the obvious syntax for a regex pattern would use /.../ delimiters.
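
Swift's existing hook for this is the `~=` operator, which `switch` uses for expression patterns. A minimal sketch with a hypothetical `RegexPattern` type; unlike the first-class support discussed in this post, it has no inline captures and no compile-time checking of the pattern.

```swift
import Foundation

// Hypothetical regex wrapper used only for this sketch.
struct RegexPattern {
    let source: String
    init(_ source: String) { self.source = source }
}

// Overloading ~= lets a RegexPattern appear as a `case` pattern in a switch over String.
func ~= (pattern: RegexPattern, value: String) -> Bool {
    return value.range(of: pattern.source, options: .regularExpression) != nil
}

func classify(_ token: String) -> String {
    switch token {
    case RegexPattern("^[0-9]+$"):
        return "number"
    case RegexPattern("^[A-Za-z_][A-Za-z0-9_]*$"):
        return "identifier"
    default:
        return "other"
    }
}

print(classify("42"))     // number
print(classify("count"))  // identifier
print(classify("+="))     // other
```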

It is also probably worth burning some language complexity on first-class support for regexes. This would allow specifying variable captures inline in the pattern, allow flexible syntax for defining regexes, support powerful extensions to the base regex model (e.g. Perl 6 style), and provide better compile-time checking and error recovery for mistakes.

-Chris

On Dec 8, 2015, at 12:14 PM, Jerome Paschoud via swift-evolution <swift-evolution@swift.org> wrote:


(Kametrixom Tikara) #5

I think we can extend this to something more general: VerbatimLiteralConvertible. A VerbatimLiteral might start with \" and end with "\ (these delimiters might not work). It would let you use special characters like " and \ without escaping them:

let verb = \"And he said: "\o/""\
let file = \"
First line
"Still in verbatim!
"\

We could then use this for the Regex type like so:

struct Regex : VerbatimLiteralConvertible {
    init(verbatimLiteral value: String) {
        ...
    }
}

extension String {
    func match(regex: Regex) -> Range<String.Index>? {
        ...
    }
}

let matches = string.match(\"[a-z]"test"\)

Such a VerbatimLiteral also has the advantage that you can copy-paste potentially large regular expressions into your code without manually adding all those "\" escapes. It could also be used to paste text directly, with its line breaks and indentation intact.

I'd really love to see this, I'm just unsure of how the parser could delimit such a VerbatimLiteral.
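
Swift later gained something close to this: multi-line string literals (`"""`, Swift 4) and raw strings (`#"..."#`, Swift 5), in which backslashes and quotes are taken literally. A sketch of the examples above rewritten with that syntax; the pattern is passed to Foundation's `range(of:options:)` rather than to the hypothetical `Regex` type.

```swift
import Foundation

// The first example: inside #"..."#, quotes and backslashes are literal.
let verb = #"And he said: "\o/""#

// A multi-line raw string. The indentation before the closing """# is stripped,
// and the newlines after the opening and before the closing delimiter are not included.
let file = #"""
    First line
    "Still in verbatim!
    """#

// The regex from the example, with no extra escaping needed.
let pattern = #"[a-z]"test""#
let sample = #"x"test""#
let found = sample.range(of: pattern, options: .regularExpression)

print(verb)          // And he said: "\o/"
print(found != nil)  // true
```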

Kame

On 08 Dec 2015, at 23:41, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org> wrote:


(Jacob Bandes-Storch) #6

It'd be trickier, but you could imagine allowing the user to declare a
custom delimiter for their "verbatim literals", so that a Regex type could
use "/".

Ruby, for example, has string literal constructs which allow a wide range
of delimiters: https://simpleror.wordpress.com/2009/03/15/q-q-w-w-x-r-s/
(and Perl as well, I believe)
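
Swift's raw strings offer a limited version of this: you cannot pick an arbitrary delimiter as with Ruby's %r{...}, but you can add more `#` signs so that the closing delimiter does not clash with the content. A small sketch:

```swift
// One #: the string ends at the first "# sequence.
let one = #"a "quoted" word"#

// Two #s: "# is now ordinary content; only "## ends the string.
let two = ##"contains "# without ending"##

print(one)  // a "quoted" word
print(two)  // contains "# without ending
```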

Jacob

On Tue, Dec 8, 2015 at 3:19 PM, Kametrixom Tikara via swift-evolution <swift-evolution@swift.org> wrote: