NSRegularExpression performance


(Francois Green) #1

I’m not sure this is the right forum, but I asked this question on the user list a few months back and no one responded. NSRegularExpression seems to perform poorly, and I’m wondering whether this is a performance bug or whether it is being used improperly. I’ve added links to two programs from the Benchmarks Game project that seem quite slow compared to other languages. While I understand that direct comparisons are not possible, this one benchmark really stands out.

http://benchmarksgame.alioth.debian.org/u64q/program.php?test=regexredux&lang=swift&id=2

http://benchmarksgame.alioth.debian.org/u64q/program.php?test=regexredux&lang=swift&id=1


(Philippe Hausler) #2

From a performance standpoint there are a few things going on here.

1) I would strongly suggest compiling each NSRegularExpression once per pattern and storing it. From what I can tell, the listed code already does this? In general, regexes are best not re-created repeatedly, since each creation has to build a “compiled” engine from ICU.

2) Last time I looked at this specific sample, the cost was in bridging strings back and forth between NSString and String. In Swift 4 we have made some improvements to bridging, but I am not certain whether any of them apply to this context (when run on Darwin). For Linux builds we are missing the referencing string variants, so this can cause some severe performance hits when copying large strings.

3) I would avoid utf8.count in this case when measuring performance (it is probably going to be slow for large files).

4) Regarding your commentary on the parallelized cases, I am not certain why that is slower. Presuming the source data is large (on the order of megabytes), it should not contend on access to the regular expression, so I find it odd that utilizing all the cores of your machine is not faster.
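
As a rough illustration of point 1, here is a minimal sketch of compiling each pattern once and reusing it. The names `RegexCache` and `countMatches` are hypothetical and do not come from the benchmark programs; the sketch also assumes single-threaded use, since the dictionary is not synchronized.

```swift
import Foundation

// Hypothetical cache: holds one compiled NSRegularExpression per pattern string.
// Not synchronized, so this sketch assumes it is used from a single thread.
final class RegexCache {
    private var compiled: [String: NSRegularExpression] = [:]

    // Returns the cached regex for `pattern`, compiling it only on first use.
    func regex(for pattern: String) throws -> NSRegularExpression {
        if let existing = compiled[pattern] {
            return existing
        }
        let fresh = try NSRegularExpression(pattern: pattern, options: [])
        compiled[pattern] = fresh
        return fresh
    }
}

// Hypothetical helper: counts matches of `pattern` in `text` via the cache.
func countMatches(of pattern: String, in text: String, cache: RegexCache) throws -> Int {
    let regex = try cache.regex(for: pattern)
    // NSRange is measured in UTF-16 code units, so the length comes from utf16.count.
    let range = NSRange(location: 0, length: text.utf16.count)
    return regex.numberOfMatches(in: text, options: [], range: range)
}

let cache = RegexCache()
let sample = "agggtaaa|tttaccct agggtaaa"
let hits = try countMatches(of: "agggtaaa", in: sample, cache: cache)
print(hits)
```

Calling `countMatches` again with the same pattern reuses the stored instance instead of asking ICU to build a new engine. This does not address the String/NSString bridging cost from point 2.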

Now, I think that with some tuning we could probably get swift-corelibs-foundation to have some faster paths here, as well as fix some missteps in the code listed for the two tests.

I have some branches that I have been working on for swift-corelibs-foundation that might reduce some allocation times and improve string conversions back and forth between reference types and structural types, but those are not fully baked yet. Keep in mind that swift-corelibs-foundation is still quite new in comparison to Foundation on Darwin, so we have been focusing more on bringing API coverage closer to parity than on performance work per se. That said, pull requests are welcome in both cases :wink:

(In reply to Francois Green's message to swift-corelibs-dev <swift-corelibs-dev@swift.org> of Jun 29, 2017, 10:15 AM.)