Strings in Swift 4

silberz · March 19, 2018, 9:40am

Thanks Michael, itaiferber, and Jean-Daniel,

From your feedback, I understand that it would be possible to provide a "composition" operation that follows the NFKC standard and returns an array of Unicode scalars for a given string. This would allow one to build a robust syntactic parser that could parse texts in any natural language. Ligatures would not be a linguistic problem, because NFKC properly separates them into sequences of letters. As for emojis, from a linguistic point of view they should be processed as words rather than letters.
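To make the request concrete, here is a minimal sketch of the behavior I have in mind. It assumes Foundation is available and relies on its `precomposedStringWithCompatibilityMapping` property (which applies NFKC) rather than on anything in the standard library; the helper name `nfkcScalars` is hypothetical.

```swift
import Foundation

// Hypothetical helper: returns the Unicode scalars of `text` after
// NFKC normalization (compatibility decomposition, then canonical composition).
// Assumes Foundation's precomposedStringWithCompatibilityMapping.
func nfkcScalars(_ text: String) -> [Unicode.Scalar] {
    return Array(text.precomposedStringWithCompatibilityMapping.unicodeScalars)
}

// U+FB01 is the "fi" ligature; NFKC separates it into "f" and "i".
let ligatureScalars = nfkcScalars("\u{FB01}n")

// "e" followed by U+0301 (combining acute accent) is composed into U+00E9.
let composedScalars = nfkcScalars("e\u{0301}")
```

With this, a parser would see three scalars (f, i, n) for the ligature example and a single scalar for the accented letter.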

From a previous discussion with Michael, I understand that a standard "decomposition" operation such as NFKD could be made available as well. Such a feature would allow one to build a robust morphological parser that could add accents or stress marks to a letter, or remove them from it, in a linguistically natural way. That is crucial for all languages that have a heavy morphology.
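As an illustration of the decomposition side, the following sketch removes accents by decomposing with NFKD and dropping the combining marks. It assumes Foundation's `decomposedStringWithCompatibilityMapping` property and `CharacterSet.nonBaseCharacters`; the helper name `strippingAccents` is hypothetical, and dropping every mark is a simplification that a real morphological parser would refine per language.

```swift
import Foundation

// Hypothetical helper: decomposes `text` with NFKD, then drops every scalar
// that Foundation classifies as a non-base character (combining marks).
func strippingAccents(_ text: String) -> String {
    let decomposed = text.decomposedStringWithCompatibilityMapping
    let marks = CharacterSet.nonBaseCharacters
    var kept = String.UnicodeScalarView()
    for scalar in decomposed.unicodeScalars where !marks.contains(scalar) {
        kept.append(scalar)
    }
    return String(kept)
}

// U+00E9 and U+00E8 decompose into "e" plus a combining accent,
// and the accents are then filtered out.
let bare = strippingAccents("\u{E9}l\u{E8}ve")
```

Adding an accent would go the other way: append the combining mark to the base letter and recompose, for example with `precomposedStringWithCanonicalMapping`.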

Another missing feature that is desperately needed: a reversible mapping between data and string positions, so that a parser can tell its client at what exact position in the initial data (byte array) a match actually occurred. Without this basic feature, I don't see how to build serious NLP applications.
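For the simplest case, where the data is valid UTF-8 and the parser works on the unnormalized string, something like the sketch below seems to get close. It assumes `String(decoding:as:)`, the `utf8` view, and Foundation's `range(of:)`; the function name `utf8ByteRange` is hypothetical. It does not cover other encodings or matching on a normalized copy of the text, which is where the real difficulty lies.

```swift
import Foundation

// Hypothetical helper: finds `needle` in UTF-8 encoded `data` and returns
// the byte range of the first match in the original data, or nil.
func utf8ByteRange(of needle: String, in data: Data) -> Range<Int>? {
    let text = String(decoding: data, as: UTF8.self)
    // If decoding had to repair invalid bytes, the string's UTF-8 view no
    // longer mirrors the data, so byte offsets would not line up.
    guard text.utf8.elementsEqual(data) else { return nil }
    guard let match = text.range(of: needle) else { return nil }
    let utf8 = text.utf8
    let start = utf8.distance(from: utf8.startIndex, to: match.lowerBound)
    let length = utf8.distance(from: match.lowerBound, to: match.upperBound)
    return start..<(start + length)
}

// U+00EF and U+00E9 each take two bytes in UTF-8, so the match on the
// second word starts at byte 7 and is 5 bytes long.
let sample = Data("na\u{EF}ve caf\u{E9}".utf8)
let matchBytes = utf8ByteRange(of: "caf\u{E9}", in: sample)
```

Here the character offset of the match is 6 but the byte offset is 7, which is exactly the kind of difference a client of the parser needs to be told about.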

I don't know how to "start a pitch" to get these three features. Are "pitches" an official Apple procedure? I understand that the String team has other priorities and limited human resources (maybe I can help?). I do hope that Swift will get these features, ideally before June, when I will have to start working full time on the next version of the NooJ linguistic platform.