ShikiSuen
(ShikiSuen)
1
I recently rewrote the language model of my input method project, which handles 240k entries of data.
It has a query function that uses a key to look up values in an OrderedDictionary<Key, [Unigram]>.
A Unigram is [[Key, Value], Score], as defined in the Megrez IME data-processing module.
However, it seems to have a serious performance problem that keeps it from being fully functional. At the very least, every query fails.
This is the query block:
open func unigramsFor(key: String) -> [Megrez.Unigram] {
  var v: [Megrez.Unigram] = []
  for thing in keyValueRateMap {
    if let innerThings = keyValueRateMap[key], innerThings == thing.1 as [KeyValueRate] {
      for innerThing in innerThings {
        v.append(
          Megrez.Unigram(
            keyValue: Megrez.KeyValuePair(key: innerThing.key, value: innerThing.value),
            score: shouldForceDefaultRate ? defaultRate : innerThing.rate
          ))
      }
    }
  }
  return v
}
And the temporary struct "KeyValueRate" is:
public struct KeyValueRate: Equatable {
  var key: String
  var value: String
  var rate: Double
  public init(key: String = "", value: String = "", rate: Double = 0.0) {
    self.key = key
    self.value = value
    self.rate = rate
  }
  public init(keyValue: KeyValue = KeyValue(key: "", value: ""), rate: Double = 0.0) {
    key = keyValue.key
    value = keyValue.value
    self.rate = rate
  }
  public static func == (lhs: KeyValueRate, rhs: KeyValueRate) -> Bool {
    lhs.key == rhs.key && lhs.value == rhs.value && lhs.rate == rhs.rate
  }
}
It seems that Unigram is a class type, and that it isn't final?
To get the best possible performance out of Swift, the first step is what you seem to be doing already: measuring with tooling like Instruments. The next step is to migrate types to where Swift shines: use structures where possible and where it makes sense; if a type can't be a structure, make sure the class is final; and make sure types on hot paths are marked for inlining via @inlinable and @usableFromInline, or even @frozen. Perhaps the biggest impact comes from making sure you only measure in release mode with optimizations enabled. Dictionary and OrderedDictionary are quite fast, but anything that prevents them from being as fast as they could be will hinder your overall performance.
Hopefully those are helpful hints on where to start tackling the issue.
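As a rough illustration of the suggestions above, here is a minimal sketch of value types marked @frozen with @inlinable members, plus a final class that exposes internal storage via @usableFromInline. The type names and fields are simplified stand-ins chosen for this example (UnigramStore in particular is hypothetical), not the actual Megrez declarations.

```swift
// Simplified stand-ins for the Megrez types; names and fields are assumptions.
@frozen
public struct KeyValuePair: Equatable {
  public var key: String
  public var value: String

  @inlinable
  public init(key: String, value: String) {
    self.key = key
    self.value = value
  }
}

@frozen
public struct Unigram: Equatable {
  public var keyValue: KeyValuePair
  public var score: Double

  @inlinable
  public init(keyValue: KeyValuePair, score: Double) {
    self.keyValue = keyValue
    self.score = score
  }
}

// Hypothetical container. When a reference type is really needed,
// `final` rules out subclass overrides of its methods.
public final class UnigramStore {
  // Internal storage made visible to the @inlinable method below.
  @usableFromInline
  var storage: [String: [Unigram]] = [:]

  public init() {}

  // Appends a unigram under its own key.
  public func insert(_ unigram: Unigram) {
    storage[unigram.keyValue.key, default: []].append(unigram)
  }

  // Single dictionary lookup; returns an empty array for unknown keys.
  @inlinable
  public func unigrams(for key: String) -> [Unigram] {
    storage[key] ?? []
  }
}
```

Whether these attributes help in a given project still has to be confirmed by measuring a release build; @frozen and @inlinable mainly matter across module boundaries.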
The thing that pops out on that profile to my eye is that you're hitting slowCompare when comparing Strings. I don't off the top of my head remember what the fast and slow paths in that method are, but that would be the place I started looking.
Perhaps your C++ code is not doing Unicode-correct comparison, for example, in which case maybe you should store String.UTF8View values instead of Strings? Or perhaps your Strings are accidentally wrapped NSStrings rather than native ones?
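To make the difference concrete, here is a small sketch contrasting Swift's default String equality (Unicode canonical equivalence) with a bytewise comparison of the UTF-8 views. UTF8Key is a hypothetical wrapper invented for this example; it is only an assumption about how byte-level keys might be stored, not something from the project.

```swift
let precomposed = "\u{E9}"    // "é" as a single scalar
let decomposed = "e\u{301}"   // "e" followed by a combining acute accent

// String == uses Unicode canonical equivalence, so these compare equal.
let unicodeEqual = (precomposed == decomposed)

// Comparing the UTF-8 views element by element looks at raw bytes only.
let bytesEqual = precomposed.utf8.elementsEqual(decomposed.utf8)

// Hypothetical key type that hashes and compares raw UTF-8 bytes.
struct UTF8Key: Hashable {
  let bytes: [UInt8]
  init(_ string: String) { bytes = Array(string.utf8) }
}

var table: [UTF8Key: Int] = [:]
table[UTF8Key(precomposed)] = 1

// The decomposed spelling has different bytes, so this lookup misses.
let hit = table[UTF8Key(decomposed)]
```

Byte-level keys give up Unicode-correct matching, so they are only safe if every key is normalized the same way before it is stored or queried.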
Alejandro
(Alejandro Alonso)
4
I'd be curious to see further down the performance graph to see exactly what in NFC/NFD is taking so much time.
ShikiSuen
(ShikiSuen)
5
The performance graph (an Xcode Instruments recording) is here:
Record.trace.zip
Note: This issue has been solved by removing the "for" loop.
ShikiSuen
(ShikiSuen)
7
Thanks for your suggestion. Making Unigram, Bigram, and KeyValuePair (all built into Megrez) structs made the Megrez unit tests much faster and more responsive than before. I have marked these three structs (previously classes) as @frozen for now, since I can feel a further speed boost (not yet measured with Instruments) and I don't see any compilation problems with this attribute.
ShikiSuen
(ShikiSuen)
8
Case solved by using only Unigram as the format for storing data in the language model.
I don't know why Lukhnos uses a separate type to store the data in C++, but I don't need to do the same in Swift.
Type conversion and data mapping across types can take time, as can unnecessary for loops.
Thanks to Mr. Hausler for the idea of freezing structs. The new "unigramsFor" now responds blazingly fast (no more getting stuck):
open func unigramsFor(key: String) -> [Megrez.Unigram] {
  if let matched = keyValueRateMap[key] {
    return matched
  } else {
    return [Megrez.Unigram]()
  }
}
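For comparison, here is a minimal sketch of the two query shapes discussed in this thread, using a plain Dictionary and a simplified Unigram stand-in (both are assumptions made so the example is self-contained; the project itself uses OrderedDictionary and the Megrez types). The scanning version visits every entry and compares whole arrays on each pass, while the lookup version performs a single hashed access.

```swift
// Simplified stand-in for Megrez.Unigram; fields are assumptions.
struct Unigram: Equatable {
  var key: String
  var value: String
  var score: Double
}

let map: [String: [Unigram]] = [
  "ni3": [Unigram(key: "ni3", value: "你", score: -3.0)],
  "hao3": [Unigram(key: "hao3", value: "好", score: -2.5)],
]

// Loop-based shape: iterates over all entries and compares arrays each time,
// so the cost grows with the number of entries in the map.
func unigramsByScanning(_ key: String, in map: [String: [Unigram]]) -> [Unigram] {
  var result: [Unigram] = []
  for entry in map {
    if let matched = map[key], matched == entry.value {
      result.append(contentsOf: matched)
    }
  }
  return result
}

// Direct lookup: one subscript access, empty array when the key is absent.
func unigramsByLookup(_ key: String, in map: [String: [Unigram]]) -> [Unigram] {
  map[key] ?? []
}

let scanned = unigramsByScanning("ni3", in: map)
let lookedUp = unigramsByLookup("ni3", in: map)
```

With 240k entries, the scanning shape repeats the lookup and an array comparison once per entry for every query, which is consistent with the stall described earlier in the thread.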
Alejandro
(Alejandro Alonso)
9
Thank you! I'm going to take a look and see where we can potentially improve on the String side.
ShikiSuen
(ShikiSuen)
10
Update: I removed the for loop, and it no longer gets stuck.
I posted the new "unigramsFor(key: String)" block above for your reference.
I am considering this thread closed. Reason: I made a mistake in build() in 1_BlockReadingBuffer.swift.