It was largely luck. Since I suspected memory management (you're working with C, of course... it's memory management...), I tried to put variables in the innermost scope possible. Swift deallocates data at the end of the scope or earlier, so there is a lot less live data to mess with your assumptions, and I usually prefer it that way anyway. And sure enough, it showed up when I was editing the onig_new call in scenario 1.
From this:
var regexPointer: OnigRegex? = nil
var error = OnigErrorInfo()
var encoding = OnigEncodingUTF8
patternChars.withUnsafeBufferPointer({ patternPointer in
    let result = onig_new(...)
    ...
})
To this:
let regexPointer: OnigRegex = patternChars.withUnsafeBufferPointer { patternPointer in
    var regex: OnigRegex?, error = OnigErrorInfo()
    let result = onig_new(...)
    ...
}
then scenario 1 crashes. After moving things in and out of that scope, it's easy to see that encoding has something to do with it.
Then I went to the docs and source code, and noticed that they work with OnigEncoding, not OnigEncodingTypeST* (the former is an alias for the latter, btw), so I speculated that OnigEncoding is an opaque reference (i.e. a class-like object). That'd be bad since we keep creating copies with let encoding = ... So I tried replacing every encoding with OnigEncodingUTF8, and sure enough, it works. The rest was digging through the source code to figure out an explanation.
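To illustrate why a copy can matter when a C library identifies something by its address, here is a minimal sketch. FakeEncoding is a hypothetical stand-in, not an Oniguruma type; the only point is that assigning a struct to a new variable gives it separate storage with its own address.

```swift
// Hypothetical stand-in for a C struct such as OnigEncodingTypeST.
struct FakeEncoding {
    var minLength: Int32
    var maxLength: Int32
}

func copyIsIndependent() -> (sameAddress: Bool, originalMax: Int32) {
    var original = FakeEncoding(minLength: 1, maxLength: 4)
    // Assigning a struct copies it; `copy` lives in its own storage.
    var copy = original
    let sameAddress = withUnsafeMutablePointer(to: &original) { originalPointer in
        withUnsafeMutablePointer(to: &copy) { copyPointer -> Bool in
            // Writing through the copy's pointer does not touch `original`.
            copyPointer.pointee.maxLength = 6
            return originalPointer == copyPointer
        }
    }
    return (sameAddress, original.maxLength)
}
```

Code that compares or stores the pointer would see the copy as a different object, and the copy disappears with the scope that declared it.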
PS
Most with functions return whatever you return from the closure, so you have a lot of freedom during initialization, but you may need to annotate the return type sometimes. For example, during init, you can just do this:
// Now we don't need optional
let regexPointer: OnigRegex = patternChars.withUnsafeBufferPointer { patternPointer in
    var regex: OnigRegex!, error = OnigErrorInfo()
    let result = onig_new(&regex,
                          patternPointer.baseAddress,
                          patternPointer.baseAddress?.advanced(by: patternPointer.count),
                          OnigOptionType(),
                          &OnigEncodingUTF8,
                          OnigDefaultSyntax,
                          &error)
    if result != ONIG_NORMAL {
        print("Initialization failed with error: \(result)")
    }
    return regex
}
Also, you don't throw an error when result != ONIG_NORMAL there, which I find weird.
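As a rough sketch of both points (returning a value from a with closure, and throwing on a failed status), the example below avoids Oniguruma entirely. fakeNew and InitError are hypothetical stand-ins that I made up for illustration; fakeNew only mimics the "write a handle through a pointer, return a status code" shape of a C initializer.

```swift
// Hypothetical error type, not part of Oniguruma.
struct InitError: Error {
    let code: Int32
}

// Hypothetical stand-in for a C initializer: writes a handle through `out`
// and returns 0 on success, -1 if the byte range is empty or missing.
func fakeNew(_ out: UnsafeMutablePointer<Int?>,
             _ start: UnsafePointer<UInt8>?,
             _ end: UnsafePointer<UInt8>?) -> Int32 {
    guard let start = start, let end = end, end > start else { return -1 }
    out.pointee = start.distance(to: end)
    return 0
}

func makeHandle(from pattern: [UInt8]) throws -> Int {
    // withUnsafeBufferPointer rethrows errors from the closure and
    // returns whatever the closure returns.
    return try pattern.withUnsafeBufferPointer { buffer -> Int in
        var handle: Int?
        let status = fakeNew(&handle,
                             buffer.baseAddress,
                             buffer.baseAddress?.advanced(by: buffer.count))
        guard status == 0, let result = handle else {
            throw InitError(code: status)
        }
        return result
    }
}
```

With this shape, the caller gets a non-optional value or an error, and never a half-initialized handle.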
Note that the & bridging is only "shorthand" for the appropriate with function:
// This
foo(&a)
// Is the same as
withUnsafeMutablePointer(to: &a) { x in
    foo(x)
}
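A small self-contained check of that equivalence, with a made-up increment function standing in for foo: both spellings hand the callee a pointer that is valid only for the duration of the call.

```swift
// Hypothetical stand-in for a C function that takes a pointer parameter.
func increment(_ value: UnsafeMutablePointer<Int32>) {
    value.pointee += 1
}

func incrementTwice() -> Int32 {
    var counter: Int32 = 0
    // Shorthand: the pointer exists only while increment runs.
    increment(&counter)
    // Explicit form: the pointer exists only inside the closure.
    withUnsafeMutablePointer(to: &counter) { pointer in
        increment(pointer)
    }
    return counter
}
```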
Now we do the same for encoding:
onig_new(&regex, ..., &OnigEncodingUTF8, ...)
turns into
withUnsafeMutablePointer(to: &OnigEncodingUTF8) { encoding in
    onig_new(&regex, ..., encoding, ...)
}
which is bad. Remember, regex stores the encoding pointer, so that pointer outlives the & shorthand that creates it. Honestly, I don't know if that's OK here, but it is for a similar case. You gotta be careful though.
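If a pointer really does have to outlive a single call, one general option (not something taken from the code discussed here) is to allocate the storage yourself so its lifetime is explicit. In this sketch FakeRegex is a hypothetical stand-in for a C object that keeps the pointer it is given.

```swift
// Hypothetical stand-in for a C object that stores a pointer for later use.
final class FakeRegex {
    let encoding: UnsafeMutablePointer<Int32>

    init(encoding: UnsafeMutablePointer<Int32>) {
        self.encoding = encoding
    }

    // Reads through the stored pointer, long after init has returned.
    var maxLength: Int32 { encoding.pointee }
}

func readThroughStoredPointer() -> Int32 {
    // Manually allocated storage stays valid until we deallocate it.
    let storage = UnsafeMutablePointer<Int32>.allocate(capacity: 1)
    storage.initialize(to: 4)
    defer {
        storage.deinitialize(count: 1)
        storage.deallocate()
    }
    let regex = FakeRegex(encoding: storage)
    return regex.maxLength
}
```

The trade-off is that you now own the cleanup: the storage must be released only after everything that holds the pointer is done with it.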