Far more alarming here is that
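count on the entire string, twice, and it does so inside a loop that runs once for every non‐whitespace character in the string. That means two full passes over the file for every character, so the work grows quadratically with the length of the file.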
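To make the cost concrete, here is a minimal Swift sketch of the pattern described above. The function names are hypothetical and this is not swift-format's actual code; it only reproduces the shape of the problem (recounting the string inside a per-character loop) next to a single-pass alternative. Because `String.count` has to walk every `Character`, the first version is O(n²) while the second is O(n).

```swift
// Hypothetical illustration of the quadratic pattern; not swift-format's real code.
// Returns the Character offset of every non-whitespace character.
func nonWhitespaceOffsetsQuadratic(_ text: String) -> [Int] {
    var offsets: [Int] = []
    var index = text.startIndex
    while index < text.endIndex {
        if !text[index].isWhitespace {
            // Two O(n) counts per character: the whole string, then the remainder.
            let remaining = text[index...].count
            offsets.append(text.count - remaining)
        }
        index = text.index(after: index)
    }
    return offsets
}

// Same result in one pass: enumerated() carries the offset alongside each Character.
func nonWhitespaceOffsetsLinear(_ text: String) -> [Int] {
    var offsets: [Int] = []
    for (offset, character) in text.enumerated() where !character.isWhitespace {
        offsets.append(offset)
    }
    return offsets
}
```

Both functions agree on their output; only the amount of re-walking differs, which is why a long file with almost no Unicode can still take minutes.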
Unicode does make a noticeable difference, probably because it slows down each of those repeated passes over the string. But that effect is negligible compared to the effect of the file's length. I have several files I had to hide from swift-format because they ran longer than I had the patience to measure: I killed the process after twenty minutes. I never found the time to look into it until now, but the first one I went back to find is essentially a single 1.5 MB base‐64 encoded string literal, i.e. no Unicode, little whitespace, an extremely simple syntax tree, but very, very long.
A quick and dirty improvement that would work wonders would be to turn
`Array<Character>` once in the initializer and let everything after that use random access.
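A minimal sketch of the array approach, assuming a hypothetical wrapper type called `CharacterBuffer` (not an existing swift-format type). The string is decoded into `[Character]` exactly once; after that, `count` and integer subscripting are O(1), so offset arithmetic no longer needs to re-walk the file.

```swift
/// Hypothetical wrapper: decode the source into [Character] once, then index by Int.
struct CharacterBuffer {
    let characters: [Character]

    init(_ text: String) {
        // The only O(n) pass over the original String.
        characters = Array(text)
    }

    /// O(1), unlike String.count.
    var count: Int { characters.count }

    /// O(1) random access by Character offset.
    subscript(offset: Int) -> Character { characters[offset] }

    /// Offsets of all non-whitespace characters, found in a single pass.
    func nonWhitespaceOffsets() -> [Int] {
        characters.indices.filter { !characters[$0].isWhitespace }
    }
}
```

The trade-off is memory: every `Character` in the array occupies a fixed-size slot that is several times larger than the single UTF-8 byte an ASCII character needs, so a 1.5 MB file turns into a considerably larger buffer.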
A more thorough solution, and one easier on memory, would be to create a type that stores an index/offset pair and to pass that around instead of lone offsets. Then any string access can go through the index, while the arithmetic and reporting still use the offset, and neither needs to be recomputed each time. (By string access, I mean access to the main file strings, not to the trivial, transient fragments assembled during the computation.)
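The index/offset pair might look something like the following sketch. `SourcePosition`, `startPosition`, `position(after:)`, `character(at:)` and `nonWhitespacePositions(in:)` are hypothetical names. The offset here counts `Character`s; a real implementation might prefer UTF-8 offsets instead, to match whatever the rest of the tool reports.

```swift
/// Hypothetical position type: a String.Index paired with its Character offset,
/// so neither one has to be recomputed from the start of the string.
struct SourcePosition: Equatable {
    let index: String.Index
    let offset: Int
}

extension String {
    var startPosition: SourcePosition {
        SourcePosition(index: startIndex, offset: 0)
    }

    /// Advances by one Character; the offset is carried along instead of recounted.
    func position(after position: SourcePosition) -> SourcePosition {
        SourcePosition(index: index(after: position.index), offset: position.offset + 1)
    }

    /// String access goes through the stored index, never through the offset.
    func character(at position: SourcePosition) -> Character {
        self[position.index]
    }
}

/// Example scan: linear, and without copying the file into an array.
func nonWhitespacePositions(in text: String) -> [SourcePosition] {
    var result: [SourcePosition] = []
    var position = text.startPosition
    while position.index < text.endIndex {
        if !text.character(at: position).isWhitespace {
            result.append(position)
        }
        position = text.position(after: position)
    }
    return result
}
```

Advancing a `String.Index` by one character only has to look at that character, so the scan stays linear, and the offset is available for diagnostics whenever it is needed.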
While switching to scalars is probably a good idea for other reasons, I doubt it would change all that much, since they are no more random‐access than “characters”.