String update


(Michael Ilseman) #1

Hello, I just sent an email to swift-dev titled "State of String: ABI, Performance, Ergonomics, and You!" at https://lists.swift.org/pipermail/swift-dev/Week-of-Mon-20180108/006407.html, whose gist can be found at https://gist.github.com/milseman/bb39ef7f170641ae52c13600a512782f. I posted to swift-dev because much of the content is from an implementation perspective, but it also addresses many areas of potential evolution. Please refer to that email for details; here’s the recap from it:

### Recap: Potential Additions for Swift 5

* Some form of unmanaged or unsafe Strings, and corresponding APIs
* Exposing performance flags, and some way to request a scan to populate them
* API gaps
* Character and UnicodeScalar properties, such as isNewline
* Generalizing, and optimizing, String interpolation
* Regex literals, Regex type, and generalized pattern match destructuring
* Substitution APIs, in conjunction with Regexes.


(John Holdsworth) #2

Hi Michael,

Thanks for sending this through. It’s an interesting read. One section gave me pause, however. I feel Swift should resist the siren call of combining Swift syntax with regex syntax, as it falls on the wrong side of Occam's razor. ISO Regex syntax is plenty complex enough without trying to incorporate named capture, desirable as it may be. Also, if I were to go down that route, I’d move away from / as the delimiter, which is a carry-over from Perl, to something like e"I am a regex" to give the lexer more to go on. That could represent, say, a cached instance of NSRegularExpression.

And now for something completely different...

Common usage patterns for a regex fall into 4 categories: deconstruction, replacement, iteration and switch/case. Ideally the representation of a regex match would be the same for all four of these categories, and I’d like to argue that a set of expressive regex primitives can be created without building them into the language.

I’ve talked before about a regex match being coded as a string/regex subscripting into a string, and I’ve been able to move this forward since last year. While this seems like an arbitrary operator to use, it has some semantic sense in that you are addressing a sub-part of the string with a pattern, as you might use an index or a key. Subscripts also have some very interesting properties in Swift compared to other operators or functions: you don’t have to worry about precedence, they can be assigned to, they can be used as an iterator, and I've learned since my last email on this topic that the Swift type checker will disambiguate multiple subscript overloads on the basis of the type of the variable being assigned to.

An extension to String can now realise the common use cases by judicious use of types:

var input = "Now is the time for all good men to come to the aid of the party"

if input["\\w+"] {
    print("match")
}

// receiving type controls data you get
if let firstMatch: Substring = input["\\w+"] {
    print("match: \(firstMatch)")
}

if let groupsOfFirstMatch: [Substring?] = input["(all) (\\w+)"] {
    print("groups: \(groupsOfFirstMatch)")
}

// "splat" out up to N groups of first match
if let (group1, group2): (String, String) = input["(all) (\\w+)"] {
    print("group1: \(group1), group2: \(group2)")
}

if let allGroupsOfAllMatches: [[Substring?]] = input["(\\w)(\\w*)"] {
    print("allGroups: \(allGroupsOfAllMatches)")
}

// regex replace by assignment
input["men"] = "folk"
print(input)

// parsing a properties file using regex as iterator
let props = """
    name1 = value1
    name2 = value2
    """

var params = [String: String]()
for groups in props["(\\w+)\\s*=\\s*(.*)"] {
    params[String(groups[1]!)] = String(groups[2]!)
}
print(params)
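A minimal sketch of the return-type overloading these examples rely on, assuming `NSRegularExpression` from Foundation as the engine. This is not the SwiftRegex4 implementation; the helper name `firstResult` is hypothetical, and the pattern is recompiled on every access with no caching. It only shows that three subscripts with the same parameter type can coexist and be selected by the type the caller asks for:

```swift
import Foundation

extension String {
    /// Hypothetical helper: the first match of `pattern` in this string, or nil.
    private func firstResult(_ pattern: String) -> NSTextCheckingResult? {
        guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }
        let whole = NSRange(startIndex..<endIndex, in: self)
        return regex.firstMatch(in: self, options: [], range: whole)
    }

    /// Chosen when the context wants a Bool, e.g. `if input["\\w+"] { ... }`.
    subscript(pattern: String) -> Bool {
        return firstResult(pattern) != nil
    }

    /// Chosen when the context wants the text of the first match.
    subscript(pattern: String) -> Substring? {
        guard let result = firstResult(pattern),
              let range = Range(result.range, in: self) else { return nil }
        return self[range]
    }

    /// Chosen when the context wants the capture groups of the first match.
    /// Index 0 is the whole match; groups that did not participate are nil.
    subscript(pattern: String) -> [Substring?]? {
        guard let result = firstResult(pattern) else { return nil }
        return (0..<result.numberOfRanges).map { index -> Substring? in
            guard let range = Range(result.range(at: index), in: self) else { return nil }
            return self[range]
        }
    }
}
```

The type annotation at the use site does the overload selection, which is why the examples above spell out `Substring` or `[Substring?]` on the left-hand side.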

The case for switches is slightly more opaque, in order to avoid executing the match twice, but it is viable.

let match = RegexMatch()
switch input {
case RegexPattern("(\\w)(\\w*)", capture: match):
    let (first, rest) = input[match]
    print("\(first) \(rest)")
default:
    break
}
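One way such a `case` could be made to work is by overloading the pattern-match operator `~=`, which `switch` evaluates for expression patterns. The sketch below is a guess at the mechanism, not the playground's actual code: it reuses the names `RegexMatch` and `RegexPattern` from the example, assumes `NSRegularExpression` as the engine, and stores the groups in a reference type so the match runs only once.

```swift
import Foundation

/// Reference type, so the `case` can hand its result back to the body.
final class RegexMatch {
    var groups: [Substring?] = []
}

struct RegexPattern {
    let pattern: String
    let capture: RegexMatch

    init(_ pattern: String, capture: RegexMatch) {
        self.pattern = pattern
        self.capture = capture
    }
}

/// `switch value { case somePattern: ... }` calls `somePattern ~= value`.
func ~= (pattern: RegexPattern, value: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern.pattern) else { return false }
    let whole = NSRange(value.startIndex..<value.endIndex, in: value)
    guard let result = regex.firstMatch(in: value, options: [], range: whole) else { return false }
    // Record the groups as a side effect so the body does not match again.
    pattern.capture.groups = (0..<result.numberOfRanges).map { index -> Substring? in
        guard let range = Range(result.range(at: index), in: value) else { return nil }
        return value[range]
    }
    return true
}

/// Splits the first word into its first character and the remainder.
func firstWordParts(of input: String) -> String? {
    let match = RegexMatch()
    switch input {
    case RegexPattern("(\\w)(\\w*)", capture: match):
        guard let first = match.groups[1], let rest = match.groups[2] else { return nil }
        return "\(first) \(rest)"
    default:
        return nil
    }
}
```

The side effect inside `~=` is what makes this "slightly more opaque": the capture object is mutated during pattern evaluation rather than bound by the `case` itself.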

This is explored in the attached playground (repo: https://github.com/johnno1962/SwiftRegex4)

I’m not sure I really expect this to take off as an idea but I’d like to make sure it's out there as an option and it certainly qualifies as “out there”.

John

SwiftRegex4.playground.zip (14.7 KB)

···

On 10 Jan 2018, at 19:58, Michael Ilseman via swift-evolution <swift-evolution@swift.org> wrote:

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(George) #3

Thanks, Michael. This is very interesting!

I wonder if it is worth considering (for lack of a better word) *verbose* regular expression for Swift.

For instance, your example:

let usPhoneNumber = /
  (let area: Int? <- \d{3}?) -
  (let routing: Int <- \d{3}) -
  (let local: Int <- \d{4}) /

would become something like (strawman syntax):

let usPhoneNumber = /let area: Int? <- .numberFromDigits(.exactly(3))/ + "-" +
                    /let routing: Int <- .numberFromDigits(.exactly(3))/ + "-" +
                    /let local: Int <- .numberFromDigits(.exactly(4))/

With this format, I also noticed that your code wouldn't match "555-5555", only "-555-5555", so maybe it would end up being something like:

let usPhoneNumber = .optional(/let area: Int <- .numberFromDigits(.exactly(3))/ + "-") +
                    /let routing: Int <- .numberFromDigits(.exactly(3))/ + "-" +
                    /let local: Int <- .numberFromDigits(.exactly(4))/

Notice that `area` is initially a non-optional `Int`, but becomes optional when transformed by the `optional` directive.
Other directives may be:

let decimal = /let beforeDecimalPoint: Int <-- .numberFromDigits(.oneOrMore)/ +
              .optional("." + /let afterDecimalPoint: Int <-- .numberFromDigits(.oneOrMore)/)

In this world, the `/<--/` format will only be used for explicit binding, and the rest will be inferred from generic `+` operators.

I also think it would be helpful if `Regex` was generic over all sequence types.
Going back to the phone example, this would look something like:

let usPhoneNumber = .optional(/let area: Int <- .numberFromDigits(.exactly(3))/ + "-") +
                    /let routing: Int <- .numberFromDigits(.exactly(3))/ + "-" +
                    /let local: Int <- .numberFromDigits(.exactly(4))/
print(type(of: usPhoneNumber)) // => Regex<UnicodeScalar, (area: Int?, routing: Int, local: Int)>

Note the addition of `UnicodeScalar` to the signature of `Regex`. Other interesting signatures are `Regex<JSONToken, JSONEnumeration>` or `Regex<HTTPRequestHeaderToken, HTTPRequestHeader>`. Building parsers becomes fun!
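To make the "verbose" idea concrete, here is a rough sketch of pattern values composed with a generic `+`, written as a small backtracking matcher over `Substring`. Everything in it is hypothetical (`Matcher`, `digits(exactly:)`, `literal`, `optionally`, `followed(by:)`), it is not generic over arbitrary sequences, and it produces nested tuples rather than the labeled tuple shown above. Each matcher returns every way it can match, so an optional area code can be retried as absent when the rest of the pattern fails:

```swift
/// A pattern that yields an `Output` and the unconsumed remainder.
struct Matcher<Output> {
    /// Every way this pattern can match a prefix of the input, in preference order.
    let matches: (Substring) -> [(output: Output, rest: Substring)]

    /// The first match that consumes the whole input, if any.
    func wholeMatch(_ text: String) -> Output? {
        return matches(text[...]).first(where: { $0.rest.isEmpty })?.output
    }
}

/// Exactly `count` ASCII digits, converted to an Int.
func digits(exactly count: Int) -> Matcher<Int> {
    return Matcher { input in
        let head = input.prefix(count)
        guard head.count == count,
              head.allSatisfy({ ("0"..."9").contains($0) }),
              let value = Int(head) else { return [] }
        return [(output: value, rest: input.dropFirst(count))]
    }
}

/// A fixed piece of text with no interesting output.
func literal(_ text: String) -> Matcher<Void> {
    return Matcher { input in
        guard input.hasPrefix(text) else { return [] }
        return [(output: (), rest: input.dropFirst(text.count))]
    }
}

/// Sequencing: match `lhs`, then `rhs`, and pair up the outputs.
func + <A, B>(lhs: Matcher<A>, rhs: Matcher<B>) -> Matcher<(A, B)> {
    return Matcher { input in
        var results: [(output: (A, B), rest: Substring)] = []
        for first in lhs.matches(input) {
            for second in rhs.matches(first.rest) {
                results.append((output: (first.output, second.output), rest: second.rest))
            }
        }
        return results
    }
}

/// Try the pattern first; fall back to matching nothing with a nil output.
func optionally<A>(_ pattern: Matcher<A>) -> Matcher<A?> {
    return Matcher { input in
        var results: [(output: A?, rest: Substring)] = []
        for match in pattern.matches(input) {
            results.append((output: match.output, rest: match.rest))
        }
        results.append((output: nil, rest: input))
        return results
    }
}

extension Matcher {
    /// Match `self` then `other`, keeping only the output of `self`.
    func followed<B>(by other: Matcher<B>) -> Matcher<Output> {
        let both = self + other
        return Matcher { input in
            both.matches(input).map { (output: $0.output.0, rest: $0.rest) }
        }
    }
}

let area = optionally(digits(exactly: 3).followed(by: literal("-")))
let usPhoneNumber = area + digits(exactly: 3).followed(by: literal("-")) + digits(exactly: 4)
// usPhoneNumber is a Matcher<((Int?, Int), Int)>
```

As in the strawman, `area` starts out as a non-optional `Int` pattern and becomes `Int?` only once it is wrapped. With this shape both "555-555-5555" and "555-5555" match as a whole, while "-555-5555" does not.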

- George



(C. Keith Ray) #4

That looks great. One thing I would look for is iterating over multiple matches in a string. I'd want to see lazy and non-lazy sequences.

    let wordMatcher = Regex(":w*") // or whatever matches word-characters
    // separated by non-word character.

     for w in aString[allMatches: wordMatcher] { print(w) }

     for w in warAndPeaceNovel[allMatchesLazy: wordMatcher].prefix(50) { print(w) }

···

--
C. Keith Ray

* https://leanpub.com/wepntk <- buy my book?
* http://www.thirdfoundationsw.com/keith_ray_resume_2014_long.pdf
* http://agilesolutionspace.blogspot.com/

On Jan 11, 2018, at 9:50 AM, John Holdsworth via swift-evolution <swift-evolution@swift.org> wrote:



(Michael Ilseman) #5

Hi Michael,

Thanks for sending this through. It’s an interesting read. One section gave me pause, however. I feel Swift should resist the siren call of combining Swift syntax with regex syntax, as it falls on the wrong side of Occam's razor. ISO Regex syntax is plenty complex enough without trying to incorporate named capture, desirable as it may be. Also, if I were to go down that route, I’d move away from / as the delimiter, which is a carry-over from Perl, to something like e"I am a regex" to give the lexer more to go on. That could represent, say, a cached instance of NSRegularExpression.

Sorry for the confusion: in no way is the syntax of regex literals tied to any syntactic standard or historical baggage. It might happen to align with common practice where that is obvious or beneficial, e.g. by using the same basic metacharacters and built-in character classes. This is important, as the literals certainly wouldn’t honor any standard semantically, other than perhaps UTS-18 level 2 by coincidence (which doesn’t dictate a preference among ambiguous matches, AFAIK).

As a downside, this does open a huge domain of bike shedding :wink:

The approach mentioned would allow someone (e.g. SPM packages) to provide functionality such as the following (ignoring style), for execution on Swift’s regex engine:

func compilePOSIX(_: String) throws -> Regex<[Any]> // Or perhaps Regex<[Substring]>, or Regex<POSIXMatch>, details...
func compileRE2(_: String) -> Regex<[Any]> // ditto
… PCRE, ICU, JS, Perl 5, Perl 6, etc. ...

(Note that we can't use NSRegularExpression as an execution engine out-of-the-box, as it relies on ICU which doesn’t provide matching modulo canonical equivalence. Not to mention the performance issues….)

And now for something completely different...

Common usage patterns for a regex fall into 4 categories: deconstruction, replacement, iteration and switch/case.

Could you elaborate more on this breakdown? What are the differences between deconstruction, iteration, and switch/case?

Ideally the representation of a regex match would be the same for all four of these categories, and I’d like to argue that a set of expressive regex primitives can be created without building them into the language.

BTW, the “built into the language” would be confined to the regex literal syntax. The Regex<T> type wouldn’t necessarily need to be built-in, and could be constructed through other means.

I’ve talked before about a regex match being coded as a string/regex subscripting into a string, and I’ve been able to move this forward since last year. While this seems like an arbitrary operator to use, it has some semantic sense in that you are addressing a sub-part of the string with a pattern, as you might use an index or a key. Subscripts also have some very interesting properties in Swift compared to other operators or functions: you don’t have to worry about precedence, they can be assigned to, they can be used as an iterator, and I've learned since my last email on this topic that the Swift type checker will disambiguate multiple subscript overloads on the basis of the type of the variable being assigned to.

Why do you use String as a regex rather than a new type, which could be ExpressibleByStringLiteral? That might help with overloading or ambiguities, and a new type is something we can extend with regex-specific functionality.

String could have a generic subscript from Regex<T> to T. Perhaps it could also be done as a setter, assigning a value of T (which may have to be string convertible... details).
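As a rough illustration of that shape, here is a sketch with a dedicated pattern type that can be written as a string literal, plus a single generic subscript on `String` whose result type comes from the query rather than from the call site. The names `SketchRegex` and `MatchQuery` are made up for this example, the two type parameters of the proposed `Regex<T>` are collapsed into one, and `NSRegularExpression` stands in for a native engine:

```swift
import Foundation

/// Hypothetical: "what to extract", parameterized by the result type.
struct MatchQuery<Result> {
    let run: (String) -> Result?
}

/// Hypothetical pattern type that string literals can initialize.
struct SketchRegex: ExpressibleByStringLiteral {
    let pattern: String

    init(stringLiteral value: String) {
        pattern = value
    }

    /// Ranges of all matches in `text`, in order.
    private func ranges(in text: String) -> [Range<String.Index>] {
        guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
        let whole = NSRange(text.startIndex..<text.endIndex, in: text)
        return regex.matches(in: text, options: [], range: whole)
            .compactMap { Range($0.range, in: text) }
    }

    /// The result type is part of the property, not of the call site.
    var firstMatch: MatchQuery<Substring> {
        return MatchQuery { text in self.ranges(in: text).first.map { text[$0] } }
    }

    var allMatches: MatchQuery<[Substring]> {
        return MatchQuery { text in self.ranges(in: text).map { text[$0] } }
    }
}

extension String {
    /// One generic subscript instead of one overload per result type.
    subscript<T>(query: MatchQuery<T>) -> T? {
        return query.run(self)
    }
}
```

With this, `let word: SketchRegex = "\\w+"` followed by `input[word.firstMatch]` needs no type annotation at the use site, and new behaviors are added as properties on the pattern type instead of as further subscript overloads on `String`.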

An extension to String can now realise the common use cases by judicious use of types:

var input = "Now is the time for all good men to come to the aid of the party"

if input["\\w+"] {
    print("match")
}

// receiving type controls data you get
if let firstMatch: Substring = input["\\w+"] {
    print("match: \(firstMatch)")
}

if let groupsOfFirstMatch: [Substring?] = input["(all) (\\w+)"] {
    print("groups: \(groupsOfFirstMatch)")
}

// "splat" out up to N groups of first match
if let (group1, group2): (String, String) = input["(all) (\\w+)"] {
    print("group1: \(group1), group2: \(group2)")
}

if let allGroupsOfAllMatches: [[Substring?]] = input["(\\w)(\\w*)"] {
    print("allGroups: \(allGroupsOfAllMatches)")
}

I’m interested in how you view the tradeoffs of not introducing a new type. If there was a Regex<T>, it could have computed properties for, e.g. an eager allMatches, lazy allMatches, firstMatch (given some ordering semantics), ignoringCaptures, caseInsensitive, …. then you don’t need your “(all) ” directives.

// regex replace by assignment
input["men"] = "folk"
print(input)

Ok, you’re starting to sell me on the subscript-setter that takes a Regex ;-). The setter value wouldn’t be able to use information from the captures, so we would probably still want a substitute API that takes a closure receiving captures, but this looks nice for simple usage.

Transcribing into the presented approach (Using 「 and 」 as delimiters in a surely-futile effort to not focus on specific syntax):

input[「\d+」.firstMatch] = 123
input[「\d+」.allMatches] = sequence(first: 42) { return $0 + 1 }
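The closure-based substitution mentioned above might look roughly like the following. This is only a sketch under assumptions: the method name `substituting(_:with:)` is hypothetical, and `NSRegularExpression` is used as a stand-in engine. The closure receives the capture groups of each match (index 0 is the whole match) and returns the replacement text:

```swift
import Foundation

extension String {
    /// Hypothetical API: replace every match of `pattern` with the string
    /// the closure builds from that match's capture groups.
    func substituting(_ pattern: String,
                      with transform: ([Substring?]) -> String) -> String {
        guard let regex = try? NSRegularExpression(pattern: pattern) else { return self }
        let whole = NSRange(startIndex..<endIndex, in: self)
        var output = ""
        var cursor = startIndex
        for result in regex.matches(in: self, options: [], range: whole) {
            guard let range = Range(result.range, in: self) else { continue }
            let groups = (0..<result.numberOfRanges).map { index -> Substring? in
                guard let groupRange = Range(result.range(at: index), in: self) else { return nil }
                return self[groupRange]
            }
            // Copy the text between matches unchanged, then the replacement.
            output.append(contentsOf: self[cursor..<range.lowerBound])
            output.append(transform(groups))
            cursor = range.upperBound
        }
        output.append(contentsOf: self[cursor...])
        return output
    }
}
```

For example, `"a1 b22".substituting("(\\d+)") { groups in String(Int(groups[1]!)! * 2) }` doubles each number, which a plain setter value cannot express because it never sees the captures.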

// parsing a properties file using regex as iterator
let props = """
    name1 = value1
    name2 = value2
    """

var params = [String: String]()
for groups in props["(\\w+)\\s*=\\s*(.*)"] {
    params[String(groups[1]!)] = String(groups[2]!)
}
print(params)

Translating this over to the literal style:

for (name, value) in props[「(let _ = \w+) \s* = \s* (let _ = .*)」.lineByLine] {
  print(name, value)
}

Or even better, give it a name!

let propertyPattern = 「(let name = \w+) \s* = \s* (let value = .*)」 // Regex<(name: Substring, value: Substring)>
for (name, value) in props[propertyPattern.lineByLine] {
  print(name, value)
}

The case for switches is slightly more opaque in order to avoid executing the match twice but viable.

let match = RegexMatch()
switch input {
case RegexPattern("(\\w)(\\w*)", capture: match):
    let (first, rest) = input[match]
    print("\(first) \(rest)")
default:
    break
}

Using the literal approach:

let peelFirstWordChar = 「(let leading = \w)(let trailing = \w+)」 // Regex<(leading: Substring, trailing: Substring)>, or perhaps Regex<(leading: Character, trailing: Substring)>, details….
switch input {
case let (first, rest) <- peelFirstWordChar:
  print("\(first) \(rest)")
}

This is explored in the attached playground (repo: https://github.com/johnno1962/SwiftRegex4)
<SwiftRegex4.playground.zip>

I’m not sure I really expect this to take off as an idea but I’d like to make sure it's out there as an option and it certainly qualifies as “out there”.

I think it’s very interesting! Thanks for sharing. Do you have more usage examples?

···

On Jan 11, 2018, at 9:49 AM, John Holdsworth <mac@johnholdsworth.com> wrote:



(John Holdsworth) #6

Look no further, the iterator in a loop context is an actual iterator and lazy:

for groups in props["(\\w+)\\s*=\\s*(.*)"] {
    params[String(groups[1]!)] = String(groups[2]!)
}

If you break halfway through, the remaining matches will not have been performed.

This is as opposed to the following which would be exhaustive.

if let allGroupsOfAllMatches: [[Substring?]] = props["(\\w+)\\s*=\\s*(.*)"] {
    for groups in allGroupsOfAllMatches {
        params[String(groups[1]!)] = String(groups[2]!)
    }
}
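The lazy behavior can be sketched independently of the subscript syntax. The version below assumes `NSRegularExpression` and a hypothetical method name `lazyMatches(of:)`; it finds one match per step with `firstMatch`, so a caller that stops early never scans the rest of the string. One caveat of this particular approach: because each step searches a narrowed range, anchors such as `^` and lookbehind assertions may behave differently than in a single pass over the whole string.

```swift
import Foundation

extension String {
    /// Hypothetical: matches are found on demand, one per iteration step.
    func lazyMatches(of pattern: String) -> AnySequence<Substring> {
        guard let regex = try? NSRegularExpression(pattern: pattern) else {
            return AnySequence([Substring]())
        }
        let text = self
        let utf16Count = text.utf16.count
        // The state is the UTF-16 offset at which the next search starts.
        return AnySequence(sequence(state: 0) { (location: inout Int) -> Substring? in
            guard location <= utf16Count else { return nil }
            let remaining = NSRange(location: location, length: utf16Count - location)
            guard let result = regex.firstMatch(in: text, options: [], range: remaining),
                  let range = Range(result.range, in: text) else { return nil }
            // Step past empty matches so the scan always makes progress.
            location = result.range.location + max(result.range.length, 1)
            return text[range]
        })
    }
}
```

Wrapping the result in `Array(...)` gives the exhaustive form, while something like `.prefix(50)` on the lazy sequence performs only as many searches as are consumed.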

···

On 11 Jan 2018, at 18:15, C. Keith Ray <keithray@mac.com> wrote:



(Eneko Alonso) #7

Would it be possible to specify the regex type up front, to avoid having to specify the type of each captured group?

let usPhoneNumber: Regex<UnicodeScalar, (area: Int?, routing: Int, local: Int)> = /
  (\d{3}?) -
  (\d{3}) -
  (\d{4}) /

“Verbose” alternative:

let usPhoneNumber: Regex<UnicodeScalar, (area: Int?, routing: Int, local: Int)> = /
  .optional(.numberFromDigits(.exactly(3)) + "-") +
  .numberFromDigits(.exactly(3)) + "-" +
  .numberFromDigits(.exactly(4)) /
print(type(of: usPhoneNumber)) // => Regex<UnicodeScalar, (area: Int?, routing: Int, local: Int)>

Thanks,
Eneko

···

On Jan 16, 2018, at 8:52 AM, George Leontiev via swift-evolution <swift-evolution@swift.org> wrote:



(John Holdsworth) #8

Hi Michael,

Thanks for getting back.

Thanks for sending this through. It’s an interesting read. One section gave me pause however. I feel Swift should resist the siren call of combining Swift Syntax with Regex syntax as it falls on the wrong side of Occam's razor. ISO Regex syntax is plenty complex enough without trying to incorporate named capture, desirable as it may be. Also, if I was to go down that route, I’d move away from / as the delimiter which is a carry over from Perl to something like e”I am a regex” to give the lexer more to go on which could represent say, a cached instance of NSRegularExpression.

Sorry for the confusion, in no way is the syntax of regex literals tied to any syntactic standard or historical baggage. It might happen to align when obvious or beneficial to common practice, i.e. use same basic meta-characters and built in character classes. This is important as they certainly wouldn’t honor any standard semantically, other than perhaps UTS-18 level-2 by coincidence (which doesn’t dictate preference of ambiguous matches, AFAIK).

As a downside, this does open a huge domain of bike shedding :wink:

The approach mentioned would allow someone (e.g. SPM packages) to provide functionality such as (ignoring style) for execution on Swift’s regex engine:

func compilePOSIX(_: String) throws -> Regex<[Any]> // Or perhaps Regex<[Substring]>, or Regex<POSIXMatch>, details...
func compileRE2(_: String) -> Regex<[Any]> // ditto
… PCRE, ICU, JS, Perl 5, Perl 6, etc. ...

(Note that we can't use NSRegularExpression as an execution engine out-of-the-box, as it relies on ICU which doesn’t provide matching modulo canonical equivalence. Not to mention the performance issues….)

I see you’re prepared to really lift the lid on what a regex literal is!

I've read your comments and completely rewritten the regex playground using a generic regex object with generic subscripts, and the result was definitely a case of less code is more. It also neatly separates the problem into two parts. The first is conversion from the new compiler-supported literal to a typed generic; the second is what is possible using the subscript notation implemented today with string regex literals. What you lose by using strings for the pattern literal is the type inference and the ability to name captures in the literal. See the new playground for details.

And now for something completely different...

Common usage patterns for a regex fall into 4 categories: deconstruction, replacement, iteration and switch/case.

Could you elaborate more on this breakdown? What are the differences between deconstruction, iteration, and switch/case?

I’m thinking of the common regex operations:

* detect a match
* extract the first match
* extract the group captures of the first match
* extract all matches
* extract the group captures of all matches
* assign to all of the above
* iterate lazily over match/group captures
* pass a closure over all matches
& Is there something interesting to be done for case statements?

Ideally the representation of a regex match would be the same for all four of these categories and I’d like to argue a set of expressive regex primitives can be created without building them into the language.

BTW, the “built into the language” would be confined to the regex literal syntax. The Regex<T> type wouldn’t necessarily need to be built-in, and could be constructed through other means.

I’ve talked before about a regex match being coded as a string/regex subscripting into a string and I’ve been able to move this forward since last year. While this seems like an arbitrary operator to use, it has some semantic sense in that you are addressing a sub-part of the string with a pattern, as you might use an index or a key. Subscripts also have some very interesting properties in Swift compared to other operators or functions: you don’t have to worry about precedence, they can be assigned to and used as an iterator, and I've learned since my last email on this topic that the Swift type checker will disambiguate multiple subscript overloads on the basis of the type of the variable being assigned to.

Why do you use String as a regex rather than a new type, which could be ExpressibleByStringLiteral? That might help with overloading or ambiguities, and a new type is a something we can extend with regex-specific functionality.

String could have a generic subscript from Regex<T> to T. Perhaps it could also be done as a setter, assigning a value of T (which may have to be string convertible... details).
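
As a rough illustration of a dedicated pattern type combined with a `String` subscript, here is a minimal sketch. `SimpleRegex` is a hypothetical name (it is neither the proposed `Regex<T>` nor anything from the playground), it is not generic over a capture type, and it relies on Foundation's `range(of:options:)` with `.regularExpression` rather than a native engine:

```swift
import Foundation

// Hypothetical stand-in for a dedicated regex type; the name is an assumption.
struct SimpleRegex: ExpressibleByStringLiteral {
    let pattern: String

    init(stringLiteral value: String) {
        pattern = value
    }
}

extension String {
    // Returns the first match of the pattern, or nil when nothing matches.
    subscript(regex: SimpleRegex) -> Substring? {
        guard let matchRange = range(of: regex.pattern, options: .regularExpression) else {
            return nil
        }
        return self[matchRange]
    }
}

let word: SimpleRegex = "\\w+"
let sentence = "Now is the time"
print(sentence[word] ?? "no match")   // prints "Now"
```

Because the pattern has its own type, regex-specific properties could later be added to it without touching `String` itself.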

I’ve used a protocol for string regex literals so they can include regex options i.e. "(\\w)(\\w*)".caseInsensitive so the gap between string as a literal and compiler supported version is less.

public protocol RegexLiteral {
    var regexPattern: String { get }
    var regexOptions: NSRegularExpression.Options { get }
}
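
One way such a protocol might be adopted is sketched below. The `RegexWithOptions` wrapper, the `caseInsensitive` property body, and `makeRegex` are assumptions made for illustration and are not taken from the playground; the protocol is repeated (without `public`) only so that the sketch is self-contained:

```swift
import Foundation

// The protocol from the post, repeated so the sketch compiles on its own.
protocol RegexLiteral {
    var regexPattern: String { get }
    var regexOptions: NSRegularExpression.Options { get }
}

// Hypothetical wrapper pairing a pattern with its options.
struct RegexWithOptions: RegexLiteral {
    let regexPattern: String
    let regexOptions: NSRegularExpression.Options
}

extension String: RegexLiteral {
    // A bare string is a pattern with no options.
    var regexPattern: String { return self }
    var regexOptions: NSRegularExpression.Options { return [] }

    // Produces a literal that carries the case-insensitive option.
    var caseInsensitive: RegexLiteral {
        return RegexWithOptions(regexPattern: self, regexOptions: .caseInsensitive)
    }
}

// Builds the underlying NSRegularExpression from any conforming literal.
func makeRegex(_ literal: RegexLiteral) throws -> NSRegularExpression {
    return try NSRegularExpression(pattern: literal.regexPattern,
                                   options: literal.regexOptions)
}
```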

An extension to String can now realise the common use cases by judicious use of types:

var input = "Now is the time for all good men to come to the aid of the party"

if input["\\w+"] {
    print("match")
}

// receiving type controls data you get
if let firstMatch: Substring = input["\\w+"] {
    print("match: \(firstMatch)")
}

if let groupsOfFirstMatch: [Substring?] = input["(all) (\\w+)"] {
    print("groups: \(groupsOfFirstMatch)")
}

// "splat" out up to N groups of first match
if let (group1, group2): (String, String) = input["(all) (\\w+)"] {
    print("group1: \(group1), group2: \(group2)")
}

if let allGroupsOfAllMatches: [[Substring?]] = input["(\\w)(\\w*)"] {
    print("allGroups: \(allGroupsOfAllMatches)")
}
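
The "receiving type controls the data you get" behaviour rests on Swift allowing subscripts to be overloaded on return type alone. A minimal sketch of that mechanism, assuming an `NSRegularExpression`-backed helper named `matchRanges(of:)` (a hypothetical name, not the playground's implementation), might look like this:

```swift
import Foundation

extension String {
    // Ranges of every match of `pattern`; an invalid pattern yields no matches.
    func matchRanges(of pattern: String) -> [Range<String.Index>] {
        guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
        let whole = NSRange(startIndex..<endIndex, in: self)
        var ranges: [Range<String.Index>] = []
        for match in regex.matches(in: self, options: [], range: whole) {
            if let matchRange = Range(match.range, in: self) {
                ranges.append(matchRange)
            }
        }
        return ranges
    }

    // Three overloads that differ only in their return type.
    subscript(pattern: String) -> Bool {
        return !matchRanges(of: pattern).isEmpty
    }

    subscript(pattern: String) -> Substring? {
        return matchRanges(of: pattern).first.map { self[$0] }
    }

    subscript(pattern: String) -> [Substring] {
        return matchRanges(of: pattern).map { self[$0] }
    }
}

let sample = "all good men"
let hasWord: Bool = sample["\\w+"]            // picks the Bool overload
let firstWord: Substring? = sample["\\w+"]    // picks the Substring? overload
let allWords: [Substring] = sample["\\w+"]    // picks the [Substring] overload
```

Without a type annotation or other context the call is ambiguous, which is the price of letting the receiving type select the operation.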

I’m interested in how you view the tradeoffs of not introducing a new type. If there was a Regex<T>, it could have computed properties for, e.g. an eager allMatches, lazy allMatches, firstMatch (given some ordering semantics), ignoringCaptures, caseInsensitive, …. then you don’t need your “(all) ” directives.

The “all” directives in the new version come from the type context you assign the match to or replace from. If it is an array, the operation applies to all matches; otherwise it applies to the first match.

if let match: (String, String, String) = numbers["(\\d+) (\\d+)-(\\d+)"] {
    print(match)
}
numbers["(\\d+) (\\d+)-(\\d+)"] = ("555", "777", "1234")
XCTAssertEqual(numbers, "phone: 555 777-1234 fax: 555 666-4321")

// arrays of tuples operate on all matches

let matches2: [(String, String, String)] = numbers["(\\d+) (\\d+)-(\\d+)"]
print(matches2)
numbers["(\\d+) (\\d+)-(\\d+)"] = [("555", "888", "1234"), ("555", "999", "4321")]
XCTAssertEqual(numbers, "phone: 555 888-1234 fax: 555 999-4321")

// regex replace by assignment
input["men"] = "folk"
print(input)

Ok, you’re starting to sell me on the subscript-setter that takes a Regex ;-). The setter value wouldn’t be able to use information from the captures, so we would probably still want a substitute API that takes a closure receiving captures, but this looks nice for simple usage.
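
For simple usage, a subscript setter of this kind can be sketched in a few lines. The `firstMatchOf:` label is a hypothetical spelling chosen for the example, and the sketch uses Foundation's regular-expression search rather than a `Regex` type:

```swift
import Foundation

extension String {
    // Labelled subscript (a hypothetical spelling) that reads or replaces the first match.
    subscript(firstMatchOf pattern: String) -> Substring? {
        get {
            guard let matchRange = range(of: pattern, options: .regularExpression) else {
                return nil
            }
            return self[matchRange]
        }
        set {
            // Assigning nil, or assigning when nothing matches, leaves the string untouched.
            guard let replacement = newValue,
                  let matchRange = range(of: pattern, options: .regularExpression) else {
                return
            }
            replaceSubrange(matchRange, with: replacement)
        }
    }
}

var phrase = "all good men"
phrase[firstMatchOf: "men"] = "folk"
print(phrase)   // prints "all good folk"
```

As noted above, the assigned value is fixed before the match runs, so it cannot refer to the captures.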

There's a public API on the underlying regex object for all the operations that the subscripts use. A closure can also be assigned to a match instead of a template, though this borders on obfuscation:

str["(\\w)(\\w*)"] = {
    (groups: (first: String, rest: String), stop) -> String in
    return groups.first+groups.rest.uppercased()
}

which subscripts translate to:

str = Regex<(String, String)>(pattern: "(\\w)(\\w*)").replacing(target: str, exec: {
    (groups: (first: String, rest: String), stop) -> String in
    return groups.first+groups.rest.uppercased()
})
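
A closure-based substitution that does see the captures can be sketched independently of the playground. The method name `replacingMatches(of:with:)` and its array-of-groups closure signature are assumptions for illustration; the playground's own `replacing(target:exec:)` passes a typed tuple instead:

```swift
import Foundation

extension String {
    // Hypothetical helper: replaces every match with the closure's result.
    // The closure receives the capture groups; index 0 is the whole match.
    func replacingMatches(of pattern: String,
                          with transform: ([Substring]) -> String) -> String {
        guard let regex = try? NSRegularExpression(pattern: pattern) else { return self }
        let whole = NSRange(startIndex..<endIndex, in: self)
        var result = self
        // Replace from the last match backwards so earlier offsets stay valid.
        for match in regex.matches(in: self, options: [], range: whole).reversed() {
            var groups: [Substring] = []
            for index in 0..<match.numberOfRanges {
                if let groupRange = Range(match.range(at: index), in: self) {
                    groups.append(self[groupRange])
                } else {
                    groups.append("")   // group did not participate in the match
                }
            }
            if let target = Range(match.range, in: result) {
                result.replaceSubrange(target, with: transform(groups))
            }
        }
        return result
    }
}

let title = "state of string".replacingMatches(of: "(\\w)(\\w*)") { groups in
    groups[1].uppercased() + String(groups[2])
}
print(title)   // prints "State Of String"
```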

Transcribing into the presented approach (Using 「 and 」 as delimiters in a surely-futile effort to not focus on specific syntax):

input[「\d+」.firstMatch] = 123
input[「\d+」.allMatches] = sequence(first: 42) { return $0 + 1 }

This is how you’d have to do it if you relied on the literal to contain the type.

// parsing a properties file using regex as iterator
let props = """
    name1 = value1
    name2 = value2
    """

var params = [String: String]()
for groups in props["(\\w+)\\s*=\\s*(.*)"] {
    params[String(groups[1]!)] = String(groups[2]!)
}
print(params)
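
For readers without the playground, roughly the same loop can be written directly against `NSRegularExpression`. This is only a sketch; `parseProperties` is a hypothetical helper name:

```swift
import Foundation

// Hypothetical helper: collects `name = value` lines into a dictionary.
func parseProperties(_ text: String) -> [String: String] {
    guard let regex = try? NSRegularExpression(pattern: "(\\w+)\\s*=\\s*(.*)") else {
        return [:]
    }
    var params = [String: String]()
    let whole = NSRange(text.startIndex..<text.endIndex, in: text)
    for match in regex.matches(in: text, options: [], range: whole) {
        // Group 1 is the name and group 2 the value; `.` stops at the end of the line.
        if let nameRange = Range(match.range(at: 1), in: text),
           let valueRange = Range(match.range(at: 2), in: text) {
            params[String(text[nameRange])] = String(text[valueRange])
        }
    }
    return params
}
```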

Translating this over to the literal style:

for (name, value) in props[「(let _ = \w+) \s* = \s* (let _ = .*)」.lineByLine] {
  print(name, value)
}

.lineByLine is .regexLazy which changes the type of the literal to force an iterator.

Or even better, give it a name!

let propertyPattern = 「(let name = \w+) \s* = \s* (let value = .*)」 // Regex<(name: Substring, value: Substring)>
for (name, value) in props[propertyPattern.lineByLine] {
  print(name, value)
}

This is an interesting case where I wonder what the gain from naming the captures really is, considering the disruption to regex syntax this would involve. The names in the assignment override them in the end.

The case for switches is slightly more opaque in order to avoid executing the match twice but viable.

let match = RegexMatch()
switch input {
case RegexPattern("(\\w)(\\w*)", capture: match):
    let (first, rest) = input[match]
    print("\(first) \(rest)")
default:
    break
}

Using the literal approach:

let peelFirstWordChar = 「(let leading = \w)(let trailing = \w+)」 // Regex<(leading: Substring, trailing: Substring)>, or perhaps Regex<(leading: Character, trailing: Substring)>, details….
switch input {
case let (first, rest) <- peelFirstWordChar:
  print("\(first) \(rest)")
}

For switch, in the absence of compiler support for bindings I came up with:

let match = RegexMatch()
switch str {
case "(\\w)(\\w*)".regex(capture: match):
    let (first, rest): (String, String) = str[match]
    print("\(first)~\(rest)")
default:
    break
}
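
The library-only route for `switch` depends on Swift's `~=` operator, which any type can overload to take part in expression patterns. A minimal sketch follows, using a cut-down `RegexPattern` without the capture parameter (the names are assumptions, not the playground's definitions):

```swift
import Foundation

// Hypothetical pattern wrapper used only in `case` expressions.
struct RegexPattern {
    let pattern: String

    init(_ pattern: String) {
        self.pattern = pattern
    }
}

// `switch` evaluates `pattern ~= value` for each expression pattern.
func ~= (pattern: RegexPattern, value: String) -> Bool {
    return value.range(of: pattern.pattern, options: .regularExpression) != nil
}

func classify(_ input: String) -> String {
    switch input {
    case RegexPattern("^\\d+$"):
        return "number"
    case RegexPattern("^\\w+$"):
        return "word"
    default:
        return "other"
    }
}

print(classify("2018"))    // prints "number"
print(classify("swift"))   // prints "word"
```

`~=` only returns a `Bool`, so it cannot introduce bindings in the `case`; that is why the examples above smuggle the match out through a separate `RegexMatch` object.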

This is explored in the attached playground (repo: https://github.com/johnno1962/SwiftRegex4)
<SwiftRegex4.playground.zip>

I’m not sure I really expect this to take off as an idea but I’d like to make sure it's out there as an option and it certainly qualifies as “out there”.

I think it’s very interesting! Thanks for sharing. Do you have more usage examples?

I’ve expanded out the playground to cover most of the things you can do. The generic subscripts made the difference (Thanks Swift4!)

SwiftRegex5.playground.zip (29.5 KB)

John


(George) #9

@Eneko While it sure seems possible to specify the type, I think this would go against the salient point "If something’s worth capturing, it’s worth giving it a name.” Putting the name further away seems like a step backward.

I could imagine a slightly more succinct syntax where things like .numberFromDigits are replaced by protocol conformance of the bound type:

extension Int: Regexable {
    func baseRegex<T>() -> Regex<T, Int>
}
let usPhoneNumber = (/let area: Int/.exactDigits(3) + "-").oneOrZero +
                    /let routing: Int/.exactDigits(3) + "-" +
                    /let local: Int/.exactDigits(4)

In this model, the `//` syntax will only be used for initial binding and swifty transformations will build the final regex.
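
One possible reading of the conformance idea, reduced to something that runs today, is sketched below. The requirement names `basePattern` and `init?(capture:)` and the helper `firstValue(of:in:)` are hypothetical; the strawman above returns a `Regex<T, Int>` instead, which does not exist:

```swift
import Foundation

// Hypothetical protocol: a type that knows which text it can be parsed from.
protocol Regexable {
    static var basePattern: String { get }
    init?(capture: Substring)
}

extension Int: Regexable {
    static var basePattern: String { return "\\d+" }

    // Validates the capture and constructs the value in one step.
    init?(capture: Substring) {
        guard let value = Int(String(capture)) else { return nil }
        self = value
    }
}

// Finds the first piece of text matching the type's pattern and converts it.
func firstValue<T: Regexable>(of type: T.Type, in text: String) -> T? {
    guard let matchRange = text.range(of: T.basePattern, options: .regularExpression) else {
        return nil
    }
    return T(capture: text[matchRange])
}

print(firstValue(of: Int.self, in: "area 555 code") ?? -1)   // prints "555"
```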


(Eneko Alonso) #10

Thank you for the reply. The part I didn’t understand is whether giving names to the captured groups would be mandatory. Hopefully not.

Assuming the user does not need names, the groups could be captured in an unlabeled tuple.

Digits could always be inferred to be numeric (Int) and they should always be “exact” (to match "\d"):

let usPhoneNumber: Regex = (.digits(3) + "-").oneOrZero + .digits(3) + "-" + .digits(4)

Personally, I like the `.optional` better than `.oneOrZero`:

let usPhoneNumber = Regex.optional(.digits(3) + "-") + .digits(3) + "-" + .digits(4)

Would it be possible to support both condensed and extended syntax?

let usPhoneNumber = / (\d{3} + "-")? + (\d{3}) + "-" + (\d{4}) /

Maybe only extended (verbose) syntax would support named groups?
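
To make the extended spelling concrete, here is a small sketch of a builder that assembles an ordinary pattern string from `.digits`, `.optional`, and `+`. `VerbosePattern` and all of its members are hypothetical names; it produces untyped capture groups and hands matching to Foundation:

```swift
import Foundation

// Hypothetical builder that assembles an NSRegularExpression-compatible pattern string.
struct VerbosePattern: ExpressibleByStringLiteral {
    let source: String

    init(source: String) {
        self.source = source
    }

    // String literals are treated as literal text, so metacharacters are escaped.
    init(stringLiteral value: String) {
        source = NSRegularExpression.escapedPattern(for: value)
    }

    // Exactly `count` digits, as a capture group.
    static func digits(_ count: Int) -> VerbosePattern {
        return VerbosePattern(source: "(\\d{\(count)})")
    }

    // Zero or one occurrence of `inner`, wrapped in a non-capturing group.
    static func optional(_ inner: VerbosePattern) -> VerbosePattern {
        return VerbosePattern(source: "(?:\(inner.source))?")
    }

    // Concatenation of two patterns.
    static func + (lhs: VerbosePattern, rhs: VerbosePattern) -> VerbosePattern {
        return VerbosePattern(source: lhs.source + rhs.source)
    }

    // True when the whole of `text` matches.
    func matchesWhole(_ text: String) -> Bool {
        return text.range(of: "^(?:\(source))$", options: .regularExpression) != nil
    }
}

let areaCode = VerbosePattern.optional(VerbosePattern.digits(3) + "-")
let usPhoneNumber = areaCode + VerbosePattern.digits(3) + "-" + VerbosePattern.digits(4)
print(usPhoneNumber.matchesWhole("555-5555"))       // prints "true"
print(usPhoneNumber.matchesWhole("415-555-1234"))   // prints "true"
```

The type name is spelled out in each call here to keep type checking simple; the leading-dot shorthand in the examples above would need more contextual type information.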

Eneko


(C. Keith Ray) #11

People may want digits as characters in order to see zeros. Parsing phone numbers and social security numbers needs zeros.
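
A short sketch of the problem: converting a digit capture to `Int` and back drops the leading zeros. The helper name `roundTripThroughInt` and the sample value are made up for the example:

```swift
// Hypothetical helper: parse a digit capture as Int, then turn it back into text.
func roundTripThroughInt(_ digits: String) -> String? {
    guard let value = Int(digits) else { return nil }
    return String(value)
}

let zipCode = "02134"
print(roundTripThroughInt(zipCode) ?? "not a number")   // prints "2134"
print(zipCode)                                          // prints "02134"
```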

C. Keith Ray
https://leanpub.com/wepntk <- buy my book?
http://agilesolutionspace.blogspot.com/
twitter: @ckeithray
http://www.thirdfoundationsw.com/keith_ray_resume_2014_long.pdf

···

On Jan 16, 2018, at 11:24 AM, Eneko Alonso via swift-evolution <swift-evolution@swift.org> wrote:

Digits could always be inferred to be numeric (Int) and they should always be “exact” (to match "\d"):


(Michael Ilseman) #12

(Replying to both Eneko and George at once)

I wonder if it is worth considering (for lack of a better word) *verbose* regular expression for Swift.

It is certainly worth thought; even if we don’t go down that path there are lessons to pick up along the way. I believe “verbal expressions” is basically what you’re describing: https://github.com/VerbalExpressions/SwiftVerbalExpressions

Thank you for the reply. The part I didn’t understand is whether giving names to the captured groups would be mandatory. Hopefully not.

Assuming the user does not need names, the groups could be captured in an unlabeled tuple.

I mention this through use of ‘_’.

A construct like (let _ = \d+) could produce an unlabeled tuple element.

Thinking about explicit capture names, etc., is all subject to change based on more investigation and playing around with examples. See my email exchange with John Holdsworth, where most names end up being redundant with destructuring at their only use site. That may have just been overly simplistic examples, but maybe not.

Digits could always be inferred to be numeric (Int) and they should always be “exact” (to match "\d"):

let usPhoneNumber: Regex = (.digits(3) + "-").oneOrZero + .digits(3) + "-" + .digits(4)

What if you want to match a sequence of digits that are too large to fit in an Int? For example, the market cap of any stock in the S&P 500 would overflow Int on 32-bit platforms. Having the default represent a portion of the input (whether that be Substring or just a Range) is more faithful to the purposes of captures, which is matching parts of text. Explicitly specifying a type is syntax for passing the capture into an init that serves as both a capture-validator as well as a value constructor, which is really just yet another kind of Pattern. (This might be generalizable to use beyond regexes, but that’s a whole other digression.) This also aids discovery, as you know what type’s conformance to RegexSubmatchableiblewobble to check.
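
The overflow concern can be reproduced on any platform by asking for a fixed 32-bit type explicitly. In this sketch `parseCapture` is a hypothetical helper and the 12-digit value is purely illustrative:

```swift
// Hypothetical helper: run a captured run of digits through an integer initializer.
func parseCapture<T: FixedWidthInteger>(_ capture: Substring, as type: T.Type) -> T? {
    return T(capture, radix: 10)
}

let digits: Substring = "850000000000"   // an illustrative 12-digit value
let narrow = parseCapture(digits, as: Int32.self)   // nil: does not fit in 32 bits
let wide = parseCapture(digits, as: Int64.self)     // 850000000000
```

The capture itself matched fine in both cases; only the conversion failed, which is why keeping the matched text (or its range) as the default result loses nothing.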

(Note that some way to get slices or ranges will always be important for things like case-insensitive matching: changing case can change the number of graphemes in a string).
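
A one-line illustration of why case mapping can change the length, using the German sharp s (the sample word is just an example):

```swift
let street = "straße"
let shouted = street.uppercased()    // "STRASSE": "ß" uppercases to "SS"
print(street.count, shouted.count)   // prints "6 7"
```

Offsets computed on a case-folded copy therefore cannot be applied blindly to the original string.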

Personally, I like the `.optional` better than `.oneOrZero`:

let usPhoneNumber = Regex.optional(.digits(3) + "-") + .digits(3) + "-" + .digits(4)

Would it be possible to support both condensed and extended syntax?

let usPhoneNumber = / (\d{3} + "-")? + (\d{3}) + "-" + (\d{4}) /

Maybe only extended (verbose) syntax would support named groups?

“\d” is just syntax for a built-in character class named “digit”. There will be some way to use a character class, whether built-in or user-defined, in a regex.

For example, in Perl 6, you can say “\d” or “<digit>”, both of which are equivalent. Shortcuts for some built-in character classes are convenient and leverage the collective understanding of regexes amongst developers, and I don’t think they cause harm.

With this format, I also noticed that your code wouldn't match "555-5555", only "-555-5555", so maybe it would end up being something like:

let usPhoneNumber = .optional(/let area: Int <- .numberFromDigits(.exactly(3))/ + "-") +
                    /let routing: Int <- .numberFromDigits(.exactly(3))/ + "-" +
                    /let local: Int <- .numberFromDigits(.exactly(4))/

Notice that `area` is initially a non-optional `Int`, but becomes optional when transformed by the `optional` directive.

That is a good catch, and it illustrates some of the pitfalls of regexes and the need to pick the right syntax. BTW, when you say optional, does it mean the match didn’t happen or the capture-validation didn’t succeed? In this example, it seems like the inclusive-or of both.

Other directives may be:

let decimal = /let beforeDecimalPoint: Int <-- .numberFromDigits(.oneOrMore)/ +
              .optional("." + /let afterDecimalPoint: Int <-- .numberFromDigits(.oneOrMore)/)

In this world, the `/<--/` format will only be used for explicit binding, and the rest will be inferred from generic `+` operators.

I also think it would be helpful if `Regex` was generic over all sequence types.
Going back to the phone example, this would look something like:

let usPhoneNumber = .optional(/let area: Int <- .numberFromDigits(.exactly(3))/ + "-") +
                    /let routing: Int <- .numberFromDigits(.exactly(3))/ + "-" +
                    /let local: Int <- .numberFromDigits(.exactly(4))/
print(type(of: usPhoneNumber)) // => Regex<UnicodeScalar, (area: Int?, routing: Int, local: Int)>

Note the addition of `UnicodeScalar` to the signature of `Regex`. Other interesting signatures are `Regex<JSONToken, JSONEnumeration>` or `Regex<HTTPRequestHeaderToken, HTTPRequestHeader>`. Building parsers becomes fun!

I think I missed something. What does the `UnicodeScalar` type parameter do?

···

On Jan 16, 2018, at 11:24 AM, Eneko Alonso via swift-evolution <swift-evolution@swift.org> wrote:

On Jan 16, 2018, at 10:01 AM, George Leontiev <georgeleontiev@gmail.com <mailto:georgeleontiev@gmail.com>> wrote:

On Jan 16, 2018, at 9:20 AM, Eneko Alonso via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Jan 16, 2018, at 8:52 AM, George Leontiev via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

- George

On Jan 10, 2018, at 11:58 AM, Michael Ilseman via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:


_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution



(Michael Ilseman) #13

Significant leading zeros is a good point. Another would be non-default-radix.
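
To make both points concrete, here is a small sketch using only the standard library's `Int` initializers. The variable names are just for illustration; nothing here is part of the strawman regex syntax.

```swift
// Converting captured digits to Int discards leading zeros,
// so the original text cannot be recovered from the number.
let captured = "007"
let asNumber = Int(captured)                    // parses as 7
let roundTripped = asNumber.map { String($0) }  // "7", not "007"

// A non-default radix has to be requested explicitly;
// the default initializer only understands base 10.
let hexDigits = "ff"
let asDecimal = Int(hexDigits)                  // nil: not valid base-10 digits
let asHex = Int(hexDigits, radix: 16)           // 255

print(roundTripped ?? "nil", asDecimal as Any, asHex as Any)
```

A capture that defaulted to `Int` would need some way to express both of these cases, which is one argument for keeping the matched text around.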

···

On Jan 16, 2018, at 12:22 PM, C. Keith Ray via swift-evolution <swift-evolution@swift.org> wrote:

People may want digits as characters in order to see zeros. Parsing phone numbers and social security numbers needs zeros.

C. Keith Ray
https://leanpub.com/wepntk <- buy my book?
http://agilesolutionspace.blogspot.com/
twitter: @ckeithray
http://www.thirdfoundationsw.com/keith_ray_resume_2014_long.pdf

On Jan 16, 2018, at 11:24 AM, Eneko Alonso via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

Digits could always be inferred to be numeric (Int) and they should always be “exact” (to match "\d"):



(George) #14

(Replying to both Eneko and George at once)

I wonder if it is worth considering (for lack of a better word) *verbose* regular expression for Swift.

It is certainly worth thought; even if we don’t go down that path there are lessons to pick up along the way. I believe “verbal expressions” is basically what you’re describing: https://github.com/VerbalExpressions/SwiftVerbalExpressions

Thank you for the reply. The part I didn’t understand is whether giving names to the captured groups would be mandatory. Hopefully not.

Assuming the user does not need names, the groups could be captured in an unlabeled tuple.

I mention this through use of ‘_’.

A construct like (let _ = \d+) could produce an unlabeled tuple element.

Thinking about explicit capture names, etc., is all subject to change based on more investigation and playing around with examples. See my email exchange with John Holdsworth, where most names end up being redundant with destructuring at their only use site. That may have just been overly simplistic examples, but maybe not.

Digits could always be inferred to be numeric (Int) and they should always be “exact” (to match "\d"):

let usPhoneNumber: Regex = (.digits(3) + "-").oneOrZero + .digits(3) + "-" + .digits(4)

What if you want to match a sequence of digits that are too large to fit in an Int? For example, the market cap of any stock in the S&P 500 would overflow Int on 32-bit platforms. Having the default represent a portion of the input (whether that be Substring or just a Range) is more faithful to the purposes of captures, which is matching parts of text. Explicitly specifying a type is syntax for passing the capture into an init that serves as both a capture-validator as well as a value constructor, which is really just yet another kind of Pattern. (This might be generalizable to use beyond regexes, but that’s a whole other digression.) This also aids discovery, as you know what type’s conformance to RegexSubmatchableiblewobble to check.
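
As a rough illustration of that split between "the matched text" and "a typed value built from it", the sketch below uses plain standard-library calls rather than any regex syntax. `Int32` stands in for `Int` on a 32-bit platform, and the input string and names are made up for the example.

```swift
// The "capture" is just a slice of the input: it always succeeds if the text matched.
let text = "3000000000 USD"
let asciiDigits: ClosedRange<Character> = "0"..."9"
let digits: Substring = text.prefix(while: { asciiDigits.contains($0) })

// Asking for a typed value runs the slice through a failable init,
// which acts as both validator and constructor.
let narrow = Int32(digits)   // nil: the digits matched, but the value does not fit
let wide = Int64(digits)     // 3000000000

print(digits, narrow as Any, wide as Any)
```

The slice is still available when the conversion fails, which would not be the case if the capture were eagerly turned into a number.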

(Note that some way to get slices or ranges will always be important for things like case-insensitive matching: changing case can change the number of graphemes in a string).
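
A quick standard-library example of the case-change point; the word is arbitrary.

```swift
// Uppercasing "ß" produces "SS", so the uppercased string has one more
// Character than the original, and offsets into one do not line up with the other.
let original = "Straße"
let uppercased = original.uppercased()

print(original.count, uppercased, uppercased.count)
```

This is why a case-insensitive match needs to hand back ranges or slices of the original input rather than positions in a case-folded copy.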

Personally, I like the `.optional` better than `.oneOrZero`:

let usPhoneNumber = Regex.optional(.digits(3) + "-") + .digits(3) + "-" + .digits(4)

Would it be possible to support both condensed and extended syntax?

let usPhoneNumber = / (\d{3} + "-")? + (\d{3}) + "-" + (\d{4}) /

Maybe only extended (verbose) syntax would support named groups?

“\d” is just syntax for a built-in character class named “digit”. There will be some way to use a character class, whether built-in or user-defined, in a regex.

For example, in Perl 6, you can say “\d” or “<digit>”, both of which are equivalent. Shortcuts for some built-in character classes are convenient and leverage the collective understanding of regexes amongst developers, and I don’t think they cause harm.

Eneko

@Eneko While it sure seems possible to specify the type, I think this would go against the salient point “If something’s worth capturing, it’s worth giving it a name.” Putting the name further away seems like a step backward.

I could imagine a slightly more succinct syntax where things like .numberFromDigits are replaced by protocol conformance of the bound type:

extension Int: Regexable {
    func baseRegex<T>() -> Regex<T, Int>
}
let usPhoneNumber = (/let area: Int/.exactDigits(3) + "-").oneOrZero +
                    /let routing: Int/.exactDigits(3) + "-" +
                    /let local: Int/.exactDigits(4)

In this model, the `//` syntax will only be used for initial binding and swifty transformations will build the final regex.

Could it be possible to specify the regex type ahead avoiding having to specify the type of each captured group?

let usPhoneNumber: Regex<UnicodeScalar, (area: Int?, routing: Int, local: Int)> = /
  (\d{3}?) -
  (\d{3}) -
  (\d{4}) /

“Verbose” alternative:

let usPhoneNumber: Regex<UnicodeScalar, (area: Int?, routing: Int, local: Int)> = /
  .optional(.numberFromDigits(.exactly(3)) + "-") +
  .numberFromDigits(.exactly(3)) + "-" +
  .numberFromDigits(.exactly(4)) /
print(type(of: usPhoneNumber)) // => Regex<UnicodeScalar, (area: Int?, routing: Int, local: Int)>

Thanks,
Eneko

Thanks, Michael. This is very interesting!

I wonder if it is worth considering (for lack of a better word) *verbose* regular expression for Swift.

For instance, your example:

let usPhoneNumber = /
  (let area: Int? <- \d{3}?) -
  (let routing: Int <- \d{3}) -
  (let local: Int <- \d{4}) /

would become something like (strawman syntax):

let usPhoneNumber = /let area: Int? <- .numberFromDigits(.exactly(3))/ + "-" +
                    /let routing: Int <- .numberFromDigits(.exactly(3))/ + "-" +
                    /let local: Int <- .numberFromDigits(.exactly(4))/

With this format, I also noticed that your code wouldn't match "555-5555", only "-555-5555", so maybe it would end up being something like:

let usPhoneNumber = .optional(/let area: Int <- .numberFromDigits(.exactly(3))/ + "-") +
                    /let routing: Int <- .numberFromDigits(.exactly(3))/ + "-" +
                    /let local: Int <- .numberFromDigits(.exactly(4))/

Notice that `area` is initially a non-optional `Int`, but becomes optional when transformed by the `optional` directive.

That is a good catch and illustrates some of the trappings of regexes and the need to pick the right syntax. BTW, when you say optional, does it mean the match didn’t happen or the capture-validation didn’t succeed? In this example, it seems like the inclusive-or of both.

Yes, it would be inclusive-or. This is a good example of your point above about how capture-validation and matching can be conflated. I can’t immediately think of a good way to make this explicit, but being able to write /let area: Int/ to match “something that can decode to Int” feels very convenient.

Other directives may be:

let decimal = /let beforeDecimalPoint: Int <-- .numberFromDigits(.oneOrMore)/ +
              .optional("." + /let afterDecimalPoint: Int <-- .numberFromDigits(.oneOrMore)/)

In this world, the `/<--/` format will only be used for explicit binding, and the rest will be inferred from generic `+` operators.

I also think it would be helpful if `Regex` was generic over all sequence types.
Going back to the phone example, this would look something like:

let usPhoneNumber = .optional(/let area: Int <- .numberFromDigits(.exactly(3))/ + "-") +
                    /let routing: Int <- .numberFromDigits(.exactly(3))/ + "-" +
                    /let local: Int <- .numberFromDigits(.exactly(4))/
print(type(of: usPhoneNumber)) // => Regex<UnicodeScalar, (area: Int?, routing: Int, local: Int)>

Note the addition of `UnicodeScalar` to the signature of `Regex`. Other interesting signatures are `Regex<JSONToken, JSONEnumeration>` or `Regex<HTTPRequestHeaderToken, HTTPRequestHeader>`. Building parsers becomes fun!

I think I missed something. What does the `UnicodeScalar` type parameter do?

I was just commenting here that we may want to regex over non-strings. Regex<UnicodeScalar, T> would operate over strings (sequences of UnicodeScalar), but being able to create Regexes for arbitrary sequences (non-strings) may be useful as well.
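
To give a feel for what "generic over the element type" could mean, here is a very small combinator sketch. `Pattern`, `one(where:)`, and the `+` overload are hypothetical stand-ins invented for this example (they are not the strawman `Regex` type or any real API), and the matcher only handles a fixed sequence of single elements.

```swift
/// Hypothetical stand-in for `Regex<Element, Capture>`.
struct Pattern<Element, Capture> {
    /// Tries to match at the start of `input`; on success returns the
    /// capture together with the unconsumed remainder.
    let match: (ArraySlice<Element>) -> (Capture, ArraySlice<Element>)?
}

/// Matches a single element satisfying `predicate` and captures it.
func one<E>(where predicate: @escaping (E) -> Bool) -> Pattern<E, E> {
    return Pattern { input in
        guard let first = input.first, predicate(first) else { return nil }
        return (first, input.dropFirst())
    }
}

/// Sequencing: match `lhs`, then `rhs` on what is left, and pair the captures.
func + <E, A, B>(lhs: Pattern<E, A>, rhs: Pattern<E, B>) -> Pattern<E, (A, B)> {
    return Pattern { input in
        guard let (a, afterA) = lhs.match(input),
              let (b, afterB) = rhs.match(afterA) else { return nil }
        return ((a, b), afterB)
    }
}

// The elements here are tokens rather than Unicode scalars.
enum Token: Equatable {
    case number(Int)
    case plus
}

let isNumber: (Token) -> Bool = { token in
    if case .number = token { return true }
    return false
}
let isPlus: (Token) -> Bool = { $0 == .plus }

let number = one(where: isNumber)
let plus = one(where: isPlus)
let sum = number + plus + number   // capture type: ((Token, Token), Token)

let tokens: [Token] = [.number(1), .plus, .number(2)]
let result = sum.match(tokens[...])
```

The same `Pattern` type would work over `Array("…".unicodeScalars)`, which is roughly the role the first type parameter plays in `Regex<UnicodeScalar, T>`.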

···

On Jan 16, 2018, at 2:18 PM, Michael Ilseman via swift-evolution <swift-evolution@swift.org> wrote:



(David Hart) #15

While we’re on the topic of regular expressions, can someone confirm if the direction that the document is taking supports naming capture groups inside repeating patterns and automatically typing them to arrays?

let name = /
    (let firstName: String <- \w+) \s
    (let initials: [String] <- \w)* \s
    (let lastName: String <- \w+)
    /
print(type(of: name)) // => Regex<(firstName: String, initials: [Character], lastName: String)>



(Michael Ilseman) #16

That’s open for debate in the strawman. The parentheses are performing two tasks: delimiting a capture and grouping. If we can split these two concepts, e.g. if we have non-capturing grouping, we could require that quantifiers on a capture be “sunk” into the subpattern. This would eliminate the issue.
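
For comparison, this is how the two shapes behave today with Foundation's `NSRegularExpression`, used here only as a stand-in since the strawman syntax does not exist. The `captures(of:in:)` helper and the sample input are made up for the example.

```swift
import Foundation

/// Returns the text of each capture group (1 and up) of the first match,
/// or nil if the pattern is invalid or does not match.
func captures(of pattern: String, in input: String) -> [String?]? {
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: input,
                                       range: NSRange(input.startIndex..., in: input))
    else { return nil }
    return (1..<match.numberOfRanges).map { index in
        Range(match.range(at: index), in: input).map { String(input[$0]) }
    }
}

let input = "John A B Smith"

// Quantifier applied to a group that contains the capture:
// group 2 only remembers the last iteration.
let quantifiedCapture = captures(of: #"(\w+) (?:(\w) )*(\w+)"#, in: input)

// Quantifier "sunk" inside the capture, with a non-capturing group doing the
// grouping: group 2 covers the whole repeated span, which can then be split.
let sunkQuantifier = captures(of: #"(\w+) ((?:\w )*)(\w+)"#, in: input)
let initials = (sunkQuantifier?[1] ?? "").split(separator: " ").map { String($0) }

print(quantifiedCapture as Any, sunkQuantifier as Any, initials)
```

In the first pattern the initial "A" is lost; in the second, the capture is a single span and producing an array is a separate, explicit step.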

···

On Jan 17, 2018, at 2:08 PM, David Hart <david@hartbit.com> wrote:

While we’re on the topic of regular expressions, can someone confirm if the direction that the document is taking supports naming capture groups inside repeating patterns and automatically typing them to arrays?

let name = /
    (let firstName: String <- \w+) \s
    (let initials: [String] <- \w)* \s
    (let lastName: String <- \w+)
    /
print(type(of: name)) // => Regex<(firstName: String, initials: [Character], lastName: String)>
