I figured out how to do the previous tests properly in the languages I tried before.
Swift
Swift Code
let c = "A\u{0300}"
let d = "\u{00C0}"
let eq = c == d ? "equal" : "not equal"
print(c, eq, d)
let a = "a\u{0300}🏆💩🎬"
for c in a {
print(c, terminator: ".")
}
print()
let thirdChar = a[a.index(a.startIndex, offsetBy: 2)]
print("thirdChar: \(thirdChar)")
Kotlin
In Kotlin I had to import two classes from java.text (Normalizer for normalisation and BreakIterator for grapheme cluster breaking) and write a helper for the breaking. As in Swift, there is no integer subscript to get the third character (or I haven't figured it out yet), so I iterated over the clusters and remembered the third one.
Kotlin Code
import java.text.BreakIterator
import java.text.Normalizer
fun String.graphemeClusterSequence() = sequence {
val iterator = BreakIterator.getCharacterInstance()
iterator.setText(this@graphemeClusterSequence)
var start = iterator.first()
var end = iterator.next()
while (end != BreakIterator.DONE) {
yield(this@graphemeClusterSequence.substring(start, end))
start = end
end = iterator.next()
}
}
fun main() {
val c = Normalizer.normalize("A\u0300", Normalizer.Form.NFC)
val d = Normalizer.normalize("\u00C0", Normalizer.Form.NFC)
val eq = if (c == d) "equal" else "not equal"
println("$c $eq $d")
val a = Normalizer.normalize("a\u0300🏆💩🎬", Normalizer.Form.NFC)
var i = 0
var thirdChar = ""
for (ch in a.graphemeClusterSequence()) {
print(ch)
print(".")
if (i == 2) { thirdChar = ch }
i += 1
}
println()
// a.graphemeClusterSequence()[2] is not available (elementAt(2) on the sequence would work); see how I calculated thirdChar above
print("thirdChar: $thirdChar")
}
Python
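In Python I had to import an extra module (unicodedata); other than that it was similar to Swift and quite short. As in Kotlin, I had to normalise the strings explicitly. Unlike the other languages, Python lets us get the third character of a string with an integer subscript. The subscript counts code points, so it works here because NFC turns a + U+0300 into a single code point.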
Python Code
import unicodedata
def main():
    # Normalise both spellings of "À" to NFC so they compare equal
    c = unicodedata.normalize('NFC', "A\u0300")
    d = unicodedata.normalize('NFC', "\u00C0")
    eq = "equal" if c == d else "not equal"
    print(c, eq, d)
    # After NFC, "a" + U+0300 becomes the single code point "à"
    a = unicodedata.normalize('NFC', "a\u0300🏆💩🎬")
    for ch in a:
        print(ch, end=".")
    print()
    print("thirdChar:", a[2])

main()
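To see why the explicit normalisation matters for the integer subscript, here is a minimal sketch comparing the raw and the NFC string. The `clusters` helper is hypothetical (it is not part of `unicodedata`) and only a rough approximation: it attaches combining marks to the preceding base character and is not a full grapheme cluster segmenter, so ZWJ emoji sequences, flags and similar cases are not handled.

```python
import unicodedata

raw = "a\u0300🏆💩🎬"                     # 'a' + combining grave, then three emoji
nfc = unicodedata.normalize("NFC", raw)   # 'a' + U+0300 composes to U+00E0

def clusters(s):
    """Hypothetical helper: group each combining mark with its base character."""
    out = []
    for ch in s:
        # unicodedata.combining() returns a non-zero class for combining marks
        if out and unicodedata.combining(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out

print(len(raw), raw[2])      # subscript counts code points of the raw string
print(len(nfc), nfc[2])      # subscript after NFC composition
print(clusters(raw)[2])      # third cluster without normalising first
```

Without normalisation the combining accent occupies its own index, so the subscript is off by one. NFC fixes this only when a precomposed form exists; for a base letter whose accent has no precomposed code point, something like the grouping above (or a proper segmentation library) would still be needed.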
C#
Similar to Kotlin, I had to call Normalize explicitly, and there is no integer subscript to get to the third character.
C# Code
using System;
public class Program {
public static void Main() {
string c = "A\u0300".Normalize(System.Text.NormalizationForm.FormC);
string d = "\u00C0".Normalize(System.Text.NormalizationForm.FormC);
string eq = c == d ? "equal" : "not equal";
Console.WriteLine($"{c} {eq} {d}");
string a = "a\u0300🏆💩🎬".Normalize(System.Text.NormalizationForm.FormC);
var i = 0;
string thrd = "";
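// EnumerateRunes yields code points (Unicode scalar values); they match the visible characters here thanks to the NFC normalisation above
foreach (var ch in a.EnumerateRunes()) {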
Console.Write(ch);
Console.Write(".");
if (i == 2) {
thrd = $"{ch}";
}
i += 1;
}
Console.WriteLine();
// char thirdChar = a.EnumerateRunes()[2]; // can't do that, see how I calculated thrd above
Console.WriteLine($"thirdChar: {thrd}");
}
}
Now in all languages the output is as expected:
À equal À
à.🏆.💩.🎬.
thirdChar: 💩