String character offset <-> byte offset

tera · March 12, 2022, 11:53pm

Hmm. These strings travel "as strings" to the server, get stored there, and are returned back. They can also be used as strings on the server side (e.g. for search APIs). Between client and server they travel as XML and/or JSON. The strings are ordinary user content (names, messages, links, etc.), as in a typical messaging app, so emojis and other funny Unicode characters will be there.
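For reference, here is a minimal sketch of the conversion in question, written in Python purely for illustration. It assumes "character" means a Unicode code point, which is what a Python `str` indexes by. User-perceived characters (grapheme clusters such as a flag or a skin-toned emoji) can span several code points and would need a segmentation library instead. The function names are hypothetical.

```python
def char_to_byte_offset(s: str, char_offset: int) -> int:
    """Convert a code-point offset in s into a UTF-8 byte offset."""
    if not 0 <= char_offset <= len(s):
        raise IndexError("character offset out of range")
    # Encode the prefix and count its bytes.
    return len(s[:char_offset].encode("utf-8"))


def byte_to_char_offset(s: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset in s into a code-point offset."""
    data = s.encode("utf-8")
    if not 0 <= byte_offset <= len(data):
        raise IndexError("byte offset out of range")
    # Strict decoding raises UnicodeDecodeError if byte_offset
    # falls in the middle of a multi-byte sequence.
    return len(data[:byte_offset].decode("utf-8"))


s = "hi \U0001F44B there"  # U+1F44B WAVING HAND SIGN is 4 bytes in UTF-8
print(char_to_byte_offset(s, 4))  # 7
print(byte_to_char_offset(s, 7))  # 4
```

Both directions are O(n) because UTF-8 is variable-width. If offsets are converted frequently on long strings, a precomputed index table would be the usual alternative.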

Do you mean that the server (or a library on another client) could change a string from, say:

"LATIN CAPITAL LETTER A WITH RING ABOVE"
U+00C5, UTF-8: C3 85

to

"LATIN CAPITAL LETTER A" + "COMBINING RING ABOVE"
U+0041 + U+030A, UTF-8: 41 CC 8A

Or change line endings from "0D0A" (CR LF) to "0A" (LF), or vice versa?
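Both kinds of rewrite would shift any stored offsets, since the code-point count and the byte count change. A small Python sketch (illustrative only; the `describe` helper is hypothetical) shows the effect for the two cases above:

```python
import unicodedata


def describe(label: str, s: str) -> None:
    """Print code-point count, byte count and UTF-8 bytes of s."""
    data = s.encode("utf-8")
    print(f"{label}: {len(s)} code point(s), {len(data)} byte(s), UTF-8 {data.hex().upper()}")


composed = "\u00C5"                                  # LATIN CAPITAL LETTER A WITH RING ABOVE
decomposed = unicodedata.normalize("NFD", composed)  # "A" + COMBINING RING ABOVE
describe("NFC", composed)     # NFC: 1 code point(s), 2 byte(s), UTF-8 C385
describe("NFD", decomposed)   # NFD: 2 code point(s), 3 byte(s), UTF-8 41CC8A

crlf = "line1\r\nline2"
lf = crlf.replace("\r\n", "\n")
# The offset of "line2" moves back by one code point (and one byte).
print(crlf.index("line2"), lf.index("line2"))  # 7 6
```

If such rewrites can happen anywhere along the path, one option (assuming you control both ends) is to normalize to a single form, e.g. NFC and one line-ending convention, before computing or storing offsets.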

I'll verify how those cases behave. Do you have other cases in mind?