VisionKit detection madness (and comparing strings)

MadeByDouglas · March 13, 2024, 1:18pm

So I am using the Vision framework to capture handwritten notes and convert them to text. Then I compare the text with other strings for various purposes.

I am encountering strange behavior where sometimes the captured string is not the same kind of character as one would expect.

For example, I capture full sentences fine, however when trying to capture just the letter "A" or just the letter "B" it exhibits strange behavior and fails comparison tests where one would expect the strings to be the same.

I compare the scanned "A" string against the "A" string written in my code, and the comparison fails no matter what I do. I have tried String.CompareOptions [.caseInsensitive, .diacriticInsensitive, .widthInsensitive], but it makes no difference.

When I look at the utf8CString as well as the _countAndFlagsBits in the debugger, they have different values; otherwise, as far as I can tell, the strings are the same. There are no whitespaces or new lines, visually they look the same, and getting the utf8 value or unicodeScalars value returns the same. I have tried normalizing with precomposedStringWithCanonicalMapping. I have also tried reinstantiating the "bad" string, but it keeps its differences. I tried folding and setting the locale to current as well.

This issue does not happen with every scan, but it happens with different letters, so far with A and B. If I delete the scanned character and type "A" or "B" it types the correct letter and comparison succeeds as expected.

Any help would be greatly appreciated.

MadeByDouglas · March 13, 2024, 1:33pm

Ok, when I copy the mystery character into this website, it shows up as the following character

I'm not sure what the mystery "A" is, but I imagine something similar.

So finally some answers: it seems Vision is detecting the character in the wrong locale? How do I fix this? Why would it behave like that with default settings in an English environment?

This seems like such an obvious thing, whether I am English or Russian, having it detect both characters and mix them together is awful. This seems like it would break so many things, why would it not have smarter default behavior to just focus on one language? I realize this is a bit off topic now, but seems like such an oversight.
There should be a way to force VNRecognizeTextRequest to look only for English characters. Can this be done?
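For reference, here is a minimal sketch of the language-related settings that VNRecognizeTextRequest exposes. The function name makeEnglishOnlyTextRequest is made up for illustration, and the choice of "en-US" is an assumption. Note that these settings are a request to the recognizer rather than a guarantee: later posts in this thread report that Cyrillic characters still appeared with the primary language set to English.

```swift
#if canImport(Vision)
import Vision

// Hypothetical helper (not from the thread): builds a text request that asks
// Vision to prioritize English only.
func makeEnglishOnlyTextRequest() -> VNRecognizeTextRequest {
    let request = VNRecognizeTextRequest { request, error in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        for observation in observations {
            // Take the single best candidate for each detected text region.
            if let best = observation.topCandidates(1).first {
                print(best.string)
            }
        }
    }
    request.recognitionLevel = .accurate
    // Priority-ordered list of languages the recognizer should consider.
    request.recognitionLanguages = ["en-US"]
    request.usesLanguageCorrection = true
    if #available(iOS 16.0, macOS 13.0, tvOS 16.0, *) {
        // Ask Vision not to detect the language on its own.
        request.automaticallyDetectsLanguage = false
    }
    return request
}
#endif
```

If the output still mixes scripts, the recognized strings would need to be checked or cleaned up after the fact.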

scanon · March 13, 2024, 1:59pm

As you note, this is really an Apple SDK question, rather than a Swift language question, so it's fairly unlikely that you'll get a satisfactory answer here. Filing a bug report or asking on the developer forums is probably your best bet.

wadetregaskis · March 13, 2024, 3:13pm

If Swift had a way to convert strings to NFKD / NFKC, or at least compare based on that, this would be less of an issue (not a perfect solution, but it'd help). This is a great example of wanting to just compare based on glyph appearance rather than scalar semantics.
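As a side note, Foundation (as opposed to the standard library) does expose compatibility normalization through precomposedStringWithCompatibilityMapping (NFKC) and decomposedStringWithCompatibilityMapping (NFKD). The sketch below, with constant names chosen only for illustration, suggests why it is "not a perfect solution" here: NFKC folds compatibility variants such as fullwidth letters, but Cyrillic Ve is a distinct letter with no compatibility mapping to Latin B.

```swift
import Foundation

let latinB = "B"            // U+0042 LATIN CAPITAL LETTER B
let fullwidthB = "\u{FF22}" // U+FF22 FULLWIDTH LATIN CAPITAL LETTER B
let cyrillicVe = "\u{0412}" // U+0412 CYRILLIC CAPITAL LETTER VE

// NFKC replaces compatibility variants with their plain equivalents.
let foldedFullwidth = fullwidthB.precomposedStringWithCompatibilityMapping
// Cyrillic Ve has no decomposition, so NFKC leaves it unchanged.
let foldedCyrillic = cyrillicVe.precomposedStringWithCompatibilityMapping

print(foldedFullwidth == latinB) // true
print(foldedCyrillic == latinB)  // false
```

Comparing "by appearance" would need a separate look-alike (confusables) mapping on top of normalization.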

MadeByDouglas · March 14, 2024, 6:05am

Yes, now that I understand the problem, that's true. I'll leave this here just in case someone else has the same problem and goes down the same mistaken path I did with string comparisons. As it turns out, I guess String is working just fine; it's just literally detecting a different character (the Russian Ve, aka "B").

As for VisionKit and Vision, yes, further testing reveals how disappointing the library is. Tbh I don't know how anyone uses this. Basic testing with the sample code from Locating and Displaying Recognized Text | Apple Developer Documentation shows it will find Russian characters even if the primary language is set to English, and English characters even if the primary language is set to Russian. Further, if you turn off extra language support it still does it, and this is all with revision 3. The older revision 2 model is better at keeping languages apart, but worse at detecting in general, and revision 1 is the worst (I suppose to be expected as the oldest): bad detection and still mismatching languages.
I will be filing a bug report with Apple.
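Until that is fixed, one possible workaround when the expected input is English only is to map Cyrillic look-alikes back to Latin after recognition. This is only a sketch: the table cyrillicToLatin and the function foldingLookAlikes are made-up names, and the table covers just a handful of uppercase letters picked by hand, not a complete confusables list.

```swift
// Hand-picked, incomplete table of uppercase Cyrillic letters that look like
// Latin letters. Assumption: the scanned text is expected to be English only.
let cyrillicToLatin: [Character: Character] = [
    "\u{0410}": "A", // CYRILLIC CAPITAL LETTER A
    "\u{0412}": "B", // CYRILLIC CAPITAL LETTER VE
    "\u{0415}": "E", // CYRILLIC CAPITAL LETTER IE
    "\u{041A}": "K", // CYRILLIC CAPITAL LETTER KA
    "\u{041C}": "M", // CYRILLIC CAPITAL LETTER EM
    "\u{041D}": "H", // CYRILLIC CAPITAL LETTER EN
    "\u{041E}": "O", // CYRILLIC CAPITAL LETTER O
    "\u{0420}": "P", // CYRILLIC CAPITAL LETTER ER
    "\u{0421}": "C", // CYRILLIC CAPITAL LETTER ES
    "\u{0422}": "T", // CYRILLIC CAPITAL LETTER TE
    "\u{0425}": "X", // CYRILLIC CAPITAL LETTER HA
]

// Replaces every character found in the table; all others pass through.
func foldingLookAlikes(_ text: String) -> String {
    String(text.map { cyrillicToLatin[$0] ?? $0 })
}

let scanned = "\u{0412}" // what the scan returned: Cyrillic Ve
print(scanned == "B")                    // false
print(foldingLookAlikes(scanned) == "B") // true
```

This would obviously corrupt genuine Russian text, so it only makes sense when Cyrillic output is known to be a misdetection.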

MadeByDouglas · March 14, 2024, 6:12am

yes 100%!!
This is why I'm leaving this post here.
While it ultimately was a Vision framework issue, if String was a little more human friendly it would have saved me hours of hunting and even potentially provided a workaround.
There was no way for me to show what the actual glyphs were with any of the debugging tools, and no easy way to say, look, these are two different things, instead of guessing if they were just encoded differently etc.
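For anyone debugging something similar, one way to make the difference visible is to print each Unicode scalar with its code point and official name. This is a small sketch using the standard library's Unicode.Scalar.Properties; describeScalars is a made-up helper name.

```swift
// Lists every scalar in the string as "U+XXXX NAME".
func describeScalars(_ text: String) -> [String] {
    text.unicodeScalars.map { scalar in
        let hex = String(scalar.value, radix: 16, uppercase: true)
        // Left-pad to at least four hex digits, as in the usual U+ notation.
        let padded = String(repeating: "0", count: max(0, 4 - hex.count)) + hex
        let name = scalar.properties.name ?? "<unnamed>"
        return "U+\(padded) \(name)"
    }
}

print(describeScalars("B"))        // ["U+0042 LATIN CAPITAL LETTER B"]
print(describeScalars("\u{0412}")) // ["U+0412 CYRILLIC CAPITAL LETTER VE"]
```

The names make it obvious at a glance that two identical-looking glyphs belong to different scripts.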

tera · March 14, 2024, 2:28pm

It's not just VisionKit, the issue is bigger.

How is this possible?

Solution:

var TEABOX: String { #function } // returns "TEABOX"
var ТЕАВОХ: String { #function } // returns "ТЕАВОХ"
let a = Set(TEABOX)
let b = Set(ТЕАВОХ)
print(Array(a).sorted()) // ["A", "B", "E", "O", "T", "X"]
print(Array(b).sorted()) // ["А", "В", "Е", "О", "Т", "Х"]
print(a.intersection(b)) // []

MadeByDouglas · March 22, 2024, 6:32am

Interesting! I did not realize the history or how widespread the issue is, but it makes sense. I'm guessing the vars in your code that look like duplicates are actually spelled with different characters, one in Latin and one in Cyrillic.

What would be a common sense solution for this? How have other languages tried to address this? (and of course security is another issue, I'm just thinking programmer experience here)

tera · March 24, 2024, 3:27pm

The issue comes "in a package" with the superpower of using 🍏 and 👪, etc. as variable names and ∉ and ∆, etc. as operators. Older ASCII based languages like C didn't have this issue / superpower.

Dmitriy_Ignatyev · March 25, 2024, 11:55am

What are the bytes or Unicode code points of the scanned string? I mean, what do we see if we look at Data(scannedString.utf8)?
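A quick way to inspect those bytes without Foundation is to hex-dump the string's UTF-8 view. The helper name utf8Hex is made up for this sketch, and the Cyrillic Ve literal stands in for the scanned string described earlier in the thread.

```swift
// Formats the UTF-8 bytes of a string as space-separated hex pairs.
func utf8Hex(_ text: String) -> String {
    text.utf8.map { byte in
        let hex = String(byte, radix: 16, uppercase: true)
        return hex.count == 1 ? "0" + hex : hex
    }.joined(separator: " ")
}

print(utf8Hex("B"))        // 42
print(utf8Hex("\u{0412}")) // D0 92
```

A Latin "B" is a single byte, while a Cyrillic Ve takes two, so the byte dump distinguishes them immediately.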