tl;dr: Composing a Regex with the DSL from regex literals that use named capture groups compiles, and matching works, but accessing the match output the usual way crashes at runtime. Using the same patterns without the names works fine.
Since I can name the tuple elements whatever I like when accessing them from Match, there's an argument that named capture groups aren't strictly necessary. But since they're supported in regex literals, it seems reasonable to expect them to work with the DSL too.
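For comparison, here is a minimal sketch of the literal-only baseline: a single regex literal with a named group produces a labeled tuple element on the match output. The `frameNumber(in:)` helper and the `numberOnly` name are made up for illustration, and the `#/.../#` delimiters are used only so the snippet compiles without the bare-slash flag.

```swift
// Hypothetical helper: pulls the frame number out of a "frame #N:" prefix.
func frameNumber(in line: String) -> Substring? {
    // One regex literal with a named capture group, no DSL involved.
    // Its type is Regex<(Substring, frame_number: Substring)>.
    let numberOnly = #/frame \#(?<frame_number>\d+):/#
    // The named group surfaces as a labeled tuple element on the output.
    return line.wholeMatch(of: numberOnly)?.output.frame_number
}
```

Calling `frameNumber(in: "frame #10:")` goes through the same `wholeMatch(of:)` path as the DSL version, just with one literal instead of a composed regex.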
Background
I'm converting a regex-based lldb backtrace parser from Python. In the original, I wrote a regex snippet for each frame component, then interpolated them into the various frame formats I was supporting.
Something like:
import re

# raw strings so the regex escapes reach `re` untouched
f_frame_address = r"(?P<frame_address>\dx\w{16})"
f_frame_module = r"(?P<module>\w+\.{0,1}\w+)"
f_raw_method = r"(?P<raw_method>.+)"
# TEST CASE (extract frame address, module, method)
# frame #10: 0x00007fff2011383a libdispatch.dylib`_dispatch_client_callout + 8
regex = re.compile(
    rf"(?:frame #\d{{1,2}}\: ){f_frame_address} {f_frame_module}`{f_raw_method} (?:at|\+)"
)
It works well, but I want to add features and I write Swift all day, so let's port it!
Enter Swift world and the RegexBuilder DSL:
import RegexBuilder
// analogs to the Python patterns above
enum FrameComponentRegex {
static let number = /frame \#(?<frame_number>\d+):/
static let address = /(?<frame_address>\dx\w{16})/
static let module = /(?<frame_module>\w+\.{0,1}\w+)/
static let function = /(?<frame_function>.+)/
}
let frame = Regex {
FrameComponentRegex.number
One(.whitespace)
FrameComponentRegex.address
One(.whitespace)
FrameComponentRegex.module
One("`")
FrameComponentRegex.function
}
Compiler is happy. Interpolation would have been more readable, but fair enough.
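For reference, the DSL has its own stand-in for names: a typed `Reference` passed to `Capture(as:)`, which the match can then be subscripted with. The sketch below reuses the component patterns from above inside captures. The `parseFrame(_:)` helper and the local reference names are hypothetical, and this is an alternative way to label captures, not a fix for named groups in embedded literals.

```swift
import RegexBuilder

// Hypothetical helper: returns the four frame components, or nil if the line doesn't match.
func parseFrame(_ line: String) -> (number: Substring, address: Substring, module: Substring, function: Substring)? {
    // One typed Reference per capture; these stand in for the group names.
    let number = Reference(Substring.self)
    let address = Reference(Substring.self)
    let module = Reference(Substring.self)
    let function = Reference(Substring.self)

    let frame = Regex {
        "frame #"
        Capture(as: number) { #/\d+/# }
        ":"
        One(.whitespace)
        Capture(as: address) { #/\dx\w{16}/# }
        One(.whitespace)
        Capture(as: module) { #/\w+\.{0,1}\w+/# }
        "`"
        Capture(as: function) { #/.+/# }
    }

    guard let match = line.wholeMatch(of: frame) else { return nil }
    // Subscripting the match with a Reference returns that capture's value.
    return (match[number], match[address], match[module], match[function])
}
```

The references are created inside the helper so each call gets its own set; the trade-off versus named groups is that the labels live in Swift variables rather than in the pattern text.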
Observed Outcomes
Pattern matching works...
let frameString = """
frame #10: 0x00007fff2011383a libdispatch.dylib`_dispatch_client_callout + 8
"""
if let match = frameString.wholeMatch(of: frame) {
print(match) /**
Match(anyRegexOutput: _StringProcessing.AnyRegexOutput(
input: "frame #10: 0x00007fff2011383a libdispatch.dylib`_dispatch_client_callout + 8",
_elements: [
_StringProcessing.AnyRegexOutput.ElementRepresentation(optionalDepth: 0, content: Optional((range: Range(Swift.String.Index(_rawBits: 15)..<Swift.String.Index(_rawBits: 4980743)), value: nil)), name: nil, referenceID: nil),
_StringProcessing.AnyRegexOutput.ElementRepresentation(optionalDepth: 0, content: Optional((range: Range(Swift.String.Index(_rawBits: 458757)..<Swift.String.Index(_rawBits: 590087)), value: nil)), name: Optional("frame_number"), referenceID: nil),
_StringProcessing.AnyRegexOutput.ElementRepresentation(optionalDepth: 0, content: Optional((range: Range(Swift.String.Index(_rawBits: 721159)..<Swift.String.Index(_rawBits: 1900807)), value: nil)), name: Optional("frame_address"), referenceID: nil),
_StringProcessing.AnyRegexOutput.ElementRepresentation(optionalDepth: 0, content: Optional((range: Range(Swift.String.Index(_rawBits: 1966343)..<Swift.String.Index(_rawBits: 3080455)), value: nil)), name: Optional("frame_module"), referenceID: nil),
_StringProcessing.AnyRegexOutput.ElementRepresentation(optionalDepth: 0, content: Optional((range: Range(Swift.String.Index(_rawBits: 3145733)..<Swift.String.Index(_rawBits: 4980743)), value: nil)), name: Optional("frame_function"), referenceID: nil)
]),
range: Range(Swift.String.Index(_rawBits: 15)..<Swift.String.Index(_rawBits: 4980743)))
**/
However, accessing the output of the match crashes:
let _ = match.output /** Runtime crash
Could not cast value of type
'(Swift.Substring, Swift.Substring, Swift.Substring, Swift.Substring, Swift.Substring)' (0x7ff848f83d00)
to 'Swift.Substring' (0x7ff848af00c8).**/
let (_, number, address, module, function) = match.output /** Compiler error
Type of expression is ambiguous without more context **/
Expected Outcome
Setting aside whether the named capture groups can be accessed via subscript, I'd naively expect to be able to access the captures via the tuple.
With the names removed from the capture groups, tuple access behaves as expected:
enum FrameComponentRegex_Unnamed {
static let number = /frame \#(\d+):/
static let address = /(\dx\w{16})/
static let module = /(\w+\.{0,1}\w+)/
static let function = /(.+)/
}
// assumes `frame` has been rebuilt as above from the FrameComponentRegex_Unnamed components
if let match = frameString.wholeMatch(of: frame) {
let (_, number, address, module, function) = match.output
print("""
number: \(number)
address: \(address)
module: \(module)
function: \(function)
""")
}
/**
number: 10
address: 0x00007fff2011383a
module: libdispatch.dylib
function: _dispatch_client_callout + 8
**/
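On the subscript question set aside above: a regex can be converted to the type-erased `Regex<AnyRegexOutput>`, whose output supports lookup by capture name. Here is a minimal sketch using a single literal rather than the DSL-composed `frame` regex; whether this route sidesteps the crash for the composed regex is untested here. The `capture(named:in:)` helper is hypothetical.

```swift
// Hypothetical helper: looks up a named capture through the type-erased output.
func capture(named name: String, in line: String) -> Substring? {
    // A single literal with a named group (not the DSL-composed `frame` regex).
    let address = #/(?<frame_address>\dx\w{16})/#
    // Erasing the output type trades the tuple for lookup by index or name.
    let erased = Regex<AnyRegexOutput>(address)
    guard let match = line.firstMatch(of: erased) else { return nil }
    // Subscripting by name yields an optional element; `substring` is the matched text.
    return match.output[name]?.substring
}
```

Lookup by name is stringly typed, so a misspelled name just returns nil instead of failing to compile.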
User Error?
Any chance I'm not accessing the output correctly?