Runtime crash: RegexBuilding with Regex Literals containing named capture groups

tl;dr: Composing a Regex with the DSL from literals that use named capture groups compiles, and matching works, but accessing the captures the usual ways borks: one crashes at runtime, the other doesn't compile. Using the same patterns without the names works fine.

Since I can name the tuple elements whatever I like when destructuring the Match output, there's an argument that named capture groups aren't strictly necessary. But since they're supported in regex literals, it seems reasonable to expect them to work with the DSL.

Background

I'm converting a regex-based LLDB backtrace parser from Python. In the original, I wrote a number of regex snippets for each frame component, then interpolated them into the various frame formats I was supporting.

Something like:

import re

# Raw strings keep \d, \w and \. as regex escapes instead of (invalid) string escapes
f_frame_address = r"(?P<frame_address>\dx\w{16})"
f_frame_module = r"(?P<module>\w+\.{0,1}\w+)"
f_raw_method = r"(?P<raw_method>.+)"

# TEST CASE  (extract frame address, module, method)
#     frame #10: 0x00007fff2011383a libdispatch.dylib`_dispatch_client_callout + 8
regex = re.compile(
    rf"(?:frame #\d{{1,2}}\: ){f_frame_address} {f_frame_module}`{f_raw_method} (?:at|\+)"
)

It works well, but I want to add features and I write Swift all day, so let's port it!

Enter Swift world and the RegexBuilder DSL:

import RegexBuilder

// analogs to the Python patterns above
enum FrameComponentRegex {
    static let number = /frame \#(?<frame_number>\d+):/
    static let address = /(?<frame_address>\dx\w{16})/
    static let module = /(?<frame_module>\w+\.{0,1}\w+)/
    static let function = /(?<frame_function>.+)/
}

let frame = Regex {
    FrameComponentRegex.number
    One(.whitespace)
    FrameComponentRegex.address
    One(.whitespace)
    FrameComponentRegex.module
    One("`")
    FrameComponentRegex.function
}

Compiler is happy. Interpolation would have been more readable, but fair enough.
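For comparison, interpolation is still possible in Swift if the snippets stay plain strings and are compiled at runtime with Regex's throwing string initializer. This is a sketch under that assumption, not a fix for the DSL issue: the result is typed Regex<AnyRegexOutput>, so captures are looked up by group name instead of destructured from a typed tuple, and pattern mistakes surface at runtime rather than compile time. The pattern constants and the parseFrame helper are hypothetical names.

```swift
// Hypothetical snippet names mirroring FrameComponentRegex; each is a plain
// String, so it can be interpolated the way the Python f-strings were.
let numberPattern = #"frame #(?<frame_number>\d+):"#
let addressPattern = #"(?<frame_address>\dx\w{16})"#
let modulePattern = #"(?<frame_module>\w+\.{0,1}\w+)"#
let functionPattern = #"(?<frame_function>.+)"#

// Raw-string interpolation uses \#(...); \s stands in for One(.whitespace).
let framePattern = #"\#(numberPattern)\s\#(addressPattern)\s\#(modulePattern)`\#(functionPattern)"#

/// Parses one backtrace line, returning captures keyed by group name,
/// or nil if the line does not match. Throws only if the pattern is invalid.
func parseFrame(_ line: String) throws -> [String: Substring]? {
    // Regex(_:) parses the pattern at runtime; its Output is AnyRegexOutput.
    let regex = try Regex(framePattern)
    guard let match = line.wholeMatch(of: regex) else { return nil }

    var captures: [String: Substring] = [:]
    for name in ["frame_number", "frame_address", "frame_module", "frame_function"] {
        // AnyRegexOutput can look up a capture by its group name.
        captures[name] = match.output[name]?.substring
    }
    return captures
}
```

The regex is rebuilt on every call only to keep the sketch short; a real parser would compile it once and reuse it.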

Observed Outcomes

Pattern matching works...

let frameString = """
frame #10: 0x00007fff2011383a libdispatch.dylib`_dispatch_client_callout + 8
"""
if let match = frameString.wholeMatch(of: frame) {
    print(match) /**

Match(anyRegexOutput: _StringProcessing.AnyRegexOutput(
  input: "frame #10: 0x00007fff2011383a libdispatch.dylib`_dispatch_client_callout + 8", 
  _elements: [
    _StringProcessing.AnyRegexOutput.ElementRepresentation(optionalDepth: 0, content: Optional((range: Range(Swift.String.Index(_rawBits: 15)..<Swift.String.Index(_rawBits: 4980743)), value: nil)), name: nil, referenceID: nil), 
    _StringProcessing.AnyRegexOutput.ElementRepresentation(optionalDepth: 0, content: Optional((range: Range(Swift.String.Index(_rawBits: 458757)..<Swift.String.Index(_rawBits: 590087)), value: nil)), name: Optional("frame_number"), referenceID: nil), 
    _StringProcessing.AnyRegexOutput.ElementRepresentation(optionalDepth: 0, content: Optional((range: Range(Swift.String.Index(_rawBits: 721159)..<Swift.String.Index(_rawBits: 1900807)), value: nil)), name: Optional("frame_address"), referenceID: nil), 
    _StringProcessing.AnyRegexOutput.ElementRepresentation(optionalDepth: 0, content: Optional((range: Range(Swift.String.Index(_rawBits: 1966343)..<Swift.String.Index(_rawBits: 3080455)), value: nil)), name: Optional("frame_module"), referenceID: nil), 
    _StringProcessing.AnyRegexOutput.ElementRepresentation(optionalDepth: 0, content: Optional((range: Range(Swift.String.Index(_rawBits: 3145733)..<Swift.String.Index(_rawBits: 4980743)), value: nil)), name: Optional("frame_function"), referenceID: nil)
  ]), 
  range: Range(Swift.String.Index(_rawBits: 15)..<Swift.String.Index(_rawBits: 4980743)))
**/
}

However, accessing the output of the match either crashes at runtime or fails to type-check:

let _ = match.output /** Runtime crash
Could not cast value of type 
'(Swift.Substring, Swift.Substring, Swift.Substring, Swift.Substring, Swift.Substring)' (0x7ff848f83d00) 
to 'Swift.Substring' (0x7ff848af00c8).**/

let (_, number, address, module, function) = match.output /** Compiler error
Type of expression is ambiguous without more context **/

Expected Outcome

Setting aside whether the named capture groups can be accessed via subscript, I'd naively expect to be able to access the captures through the tuple.

When removing the names from the capture groups, the tuple access behaves as expected:

enum FrameComponentRegex_Unnamed {
    static let number = /frame \#(\d+):/
    static let address = /(\dx\w{16})/
    static let module = /(\w+\.{0,1}\w+)/
    static let function = /(.+)/
}
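// Rebuild the composed regex from the unnamed components
let unnamedFrame = Regex {
    FrameComponentRegex_Unnamed.number
    One(.whitespace)
    FrameComponentRegex_Unnamed.address
    One(.whitespace)
    FrameComponentRegex_Unnamed.module
    One("`")
    FrameComponentRegex_Unnamed.function
}
if let match = frameString.wholeMatch(of: unnamedFrame) {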
    let (_, number, address, module, function) = match.output
    print("""
    number:     \(number)
    address:    \(address)
    module:     \(module)
    function:   \(function)
    """)
}

/**
number:     10
address:    0x00007fff2011383a
module:     libdispatch.dylib
function:   _dispatch_client_callout + 8
**/
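Since the unnamed version works, one way to get names back at the type level is to put them on a small model type and destructure the output straight into its initializer. A sketch, not a drop-in replacement: Frame is a hypothetical struct, the pattern is respelled with RegexBuilder components (instead of the literals above) so the block stands alone, and TryCapture converts the frame number to an Int along the way.

```swift
import RegexBuilder

// Hypothetical model type: the names live on the struct, not in the regex.
struct Frame {
    let number: Int
    let address: Substring
    let module: Substring
    let function: Substring

    // Unnamed captures only, so every element of the output tuple is unlabeled.
    fileprivate static let pattern: Regex<(Substring, Int, Substring, Substring, Substring)> = Regex {
        "frame #"
        TryCapture {
            OneOrMore(.digit)
        } transform: { digits in
            Int(digits)  // a nil result makes the match fail at this point
        }
        ":"
        One(.whitespace)
        Capture {
            One(.digit)
            "x"
            Repeat(.word, count: 16)
        }
        One(.whitespace)
        Capture {
            OneOrMore(.word)
            Optionally(".")
            OneOrMore(.word)
        }
        "`"
        Capture {
            OneOrMore(.any)
        }
    }
}

// Declared in an extension so the memberwise initializer is still synthesized.
extension Frame {
    /// Returns nil when `line` is not a complete backtrace frame line.
    init?(_ line: String) {
        guard let match = line.wholeMatch(of: Self.pattern) else { return nil }
        // Positional destructuring; the initializer re-attaches the names.
        let (_, number, address, module, function) = match.output
        self.init(number: number, address: address, module: module, function: function)
    }
}
```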

User Error?

Any chance I'm not accessing the output correctly?

There are two issues you're seeing here, one a known ergonomic issue and one a bug that we need to fix.

The ergonomic issue is that named capture groups don't play well with the RegexBuilder DSL, because labeled and unlabeled tuples aren't interchangeable in the type system: a literal such as /(?<frame_number>\d+)/ is typed Regex<(Substring, frame_number: Substring)>, not Regex<(Substring, Substring)>. Your workaround of removing the capture names and adding them back as local variable names is one good approach for handling this.
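Another option that stays inside the DSL is RegexBuilder's Reference type: pass a Reference to Capture(as:) and read the value back with the match's reference subscript, which gives each capture a named handle without relying on tuple labels. A minimal sketch, assuming the Swift 5.7 RegexBuilder module; the reference constants and the frameFields helper are hypothetical, and the components are spelled with builder types rather than literals so the block is self-contained.

```swift
import RegexBuilder

// Hypothetical reference names; each Reference is a typed handle to one capture.
let numberRef = Reference(Substring.self)
let addressRef = Reference(Substring.self)
let moduleRef = Reference(Substring.self)
let functionRef = Reference(Substring.self)

let referencedFrame = Regex {
    "frame #"
    Capture(as: numberRef) {
        OneOrMore(.digit)
    }
    ":"
    One(.whitespace)
    Capture(as: addressRef) {
        One(.digit)
        "x"
        Repeat(.word, count: 16)
    }
    One(.whitespace)
    Capture(as: moduleRef) {
        OneOrMore(.word)
        Optionally(".")
        OneOrMore(.word)
    }
    "`"
    Capture(as: functionRef) {
        OneOrMore(.any)
    }
}

/// Reads the four captures back through their references instead of by position.
func frameFields(_ line: String) -> (number: Substring, address: Substring, module: Substring, function: Substring)? {
    guard let match = line.wholeMatch(of: referencedFrame) else { return nil }
    return (match[numberRef], match[addressRef], match[moduleRef], match[functionRef])
}
```

The trade-off is that the components are no longer regex literals, so the compact literal syntax from the question is lost.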

That said, you shouldn't be seeing this runtime error — it looks like the compiled type and the Match internals aren't on the same page about the shape of the output tuple. Could you open an issue with this code?