I'm currently working on a new diagnostics printing style that more closely associates errors with their notes (similar to what's described here). While I'm rewriting the printing code, I'd like to support printing highlights and fix-its for source lines that include non-ASCII characters, since the current implementation just drops them, which can be really confusing.
Unfortunately, to do that I think I need to do grapheme cluster breaking in the compiler to map a byte in the source line to a column number (I know grapheme clusters don't always map 1:1 to columns, but I think they're close enough). LLVM provides llvm::sys::locale::columnWidth, which Clang uses for its printed output, but it seems to just hardcode some known sequences and doesn't handle a lot of cases correctly.
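For illustration, here is a minimal sketch of the byte-to-column mapping, written in Swift because its String type already breaks text into extended grapheme clusters (each Character is one cluster). The function name graphemeColumn is hypothetical and not part of the compiler or LLVM. It assumes "column" means "number of grapheme clusters before this one", so it does not account for double-width (e.g. East Asian) characters.

```swift
/// Hypothetical helper: given a source line and a UTF-8 byte offset into it,
/// returns the zero-based index of the grapheme cluster containing that byte.
/// An offset equal to the line's byte count maps to one past the last cluster;
/// offsets outside the line return nil.
func graphemeColumn(in line: String, utf8Offset: Int) -> Int? {
    guard utf8Offset >= 0, utf8Offset <= line.utf8.count else { return nil }
    var column = 0
    var bytesSeen = 0
    // Iterating a String yields Characters, i.e. extended grapheme clusters.
    for character in line {
        bytesSeen += String(character).utf8.count
        // The byte falls inside this cluster if it's before the cluster's end.
        if utf8Offset < bytesSeen { return column }
        column += 1
    }
    return column
}

// "a", a flag made of two regional indicators (8 bytes, one cluster), "b".
let line = "a\u{1F1FA}\u{1F1F8}b"
print(graphemeColumn(in: line, utf8Offset: 5) as Any)
```

A renderer could then pad a caret or fix-it line with that many spaces. Wide characters would still misalign, and the result depends on which Unicode version the standard library implements.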
I guess my question is: does this seem worth attempting at all? I looked around the lexer and it doesn't seem to implement this functionality, and I know linking ICU into the compiler is a non-starter.
I suppose this might be a good motivator to start integrating Swift code into the compiler, but I don't think we're quite ready for that yet :)