On the road to Swift 6

Michael_Ilseman · January 27, 2020, 7:00pm

Yes, I have a pretty good high-level view of how this will unfold, which I wrote about here and which is tracked in this top-level bug. I'm not currently working toward this, but I can provide guidance or mentorship for anyone who is interested.

There are details and engineering tradeoffs that will only be discovered by actually doing the work. We need to get Unicode data (specifically the UCD, not CLDR) into the standard library binary. If the size of that data is acceptable, this is pretty straightforward. Otherwise, more engineering is required to subset and compact the data, balancing load time, speed of access, and binary size.
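One common way to compact per-scalar property data is a two-stage table: the scalar range is split into fixed-size blocks, identical blocks are stored only once, and a small index maps each block number to its deduplicated block. The following is a minimal sketch, not the stdlib's design; the `TwoStageTable` type, the one-byte value per scalar, and the default block size of 128 are all assumptions. The block size is the knob that trades index size against how much deduplication you get, and each lookup costs two array loads.

```swift
/// Hypothetical two-stage lookup table for a one-byte-per-scalar property.
struct TwoStageTable {
    let blockShift: Int      // log2 of the block size
    let index: [UInt16]      // stage 1: block number -> deduplicated block id
    let blocks: [UInt8]      // stage 2: concatenated unique blocks

    /// Builds the table from a dense array holding one value per scalar
    /// (0 ..< 0x110000 in a real table; any whole number of blocks here).
    init(dense: [UInt8], blockShift: Int = 7) {
        let blockSize = 1 << blockShift
        precondition(dense.count % blockSize == 0, "dense table must be whole blocks")
        var index: [UInt16] = []
        var blocks: [UInt8] = []
        var seen: [[UInt8]: UInt16] = [:]   // unique block contents -> block id
        for start in stride(from: 0, to: dense.count, by: blockSize) {
            let block = Array(dense[start ..< start + blockSize])
            if let id = seen[block] {
                index.append(id)            // reuse an identical block
            } else {
                let id = UInt16(seen.count) // traps past 65,535 unique blocks
                seen[block] = id
                blocks.append(contentsOf: block)
                index.append(id)
            }
        }
        self.blockShift = blockShift
        self.index = index
        self.blocks = blocks
    }

    /// Two loads per lookup: the block id, then the value within that block.
    subscript(scalar: UInt32) -> UInt8 {
        let s = Int(scalar)
        let blockID = Int(index[s >> blockShift])
        let offset = s & ((1 << blockShift) - 1)
        return blocks[(blockID << blockShift) | offset]
    }
}
```

For example, a 1,024-entry table that is zero everywhere except at 0x41 and 0x300 stores only three distinct 128-byte blocks plus an eight-entry index.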

The grapheme-breaking and normalization algorithms should be straightforward to write, as they're mostly data-driven. Grapheme breaking will need some updating from release to release, but hopefully not much. The Swift optimizer is much better now (and still improving), so making these implementations very fast will hopefully not require undue hackery.
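To illustrate what "mostly data-driven" means for grapheme breaking, here is a minimal sketch of a pairwise boundary check using only a few UAX #29 rules (GB3, GB4, GB5, GB9, GB999). It is deliberately incomplete: a real implementation also needs Hangul syllables (GB6 to GB8), SpacingMark and Prepend (GB9a/GB9b), emoji sequences (GB11), regional-indicator pairs (GB12/GB13), and more. The `graphemeBreakProperty` function is a hypothetical stand-in for a generated UCD table and classifies only a handful of scalars.

```swift
/// A reduced set of Grapheme_Cluster_Break values (UAX #29).
enum GraphemeBreakProperty {
    case cr, lf, control, extend, zwj, other
}

/// Hypothetical stand-in for a generated table lookup; only a few
/// scalars are classified here, everything else is `.other`.
func graphemeBreakProperty(_ s: Unicode.Scalar) -> GraphemeBreakProperty {
    switch s.value {
    case 0x0D: return .cr
    case 0x0A: return .lf
    case 0x00...0x09, 0x0B, 0x0C, 0x0E...0x1F, 0x7F: return .control
    case 0x0300...0x036F: return .extend   // Combining Diacritical Marks
    case 0x200D: return .zwj
    default: return .other
    }
}

/// Returns true if a boundary is allowed between `a` and `b`.
/// Cases are checked in rule order, as UAX #29 requires.
func isBoundary(_ a: GraphemeBreakProperty, _ b: GraphemeBreakProperty) -> Bool {
    switch (a, b) {
    case (.cr, .lf): return false                        // GB3
    case (.cr, _), (.lf, _), (.control, _): return true  // GB4
    case (_, .cr), (_, .lf), (_, .control): return true  // GB5
    case (_, .extend), (_, .zwj): return false           // GB9
    default: return true                                 // GB999
    }
}

/// Counts grapheme clusters by checking each adjacent pair of scalars.
func graphemeCount(_ scalars: String.UnicodeScalarView) -> Int {
    var count = 0
    var previous: GraphemeBreakProperty? = nil
    for scalar in scalars {
        let current = graphemeBreakProperty(scalar)
        if let p = previous {
            if isBoundary(p, current) { count += 1 }
        } else {
            count = 1   // GB1: non-empty text starts a cluster
        }
        previous = current
    }
    return count
}
```

For the inputs these few rules cover, the result matches `String.count`; for example, "e\u{301}\r\na" has three clusters.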

Note that these data files will live in the stdlib binary (and accesses to ICU are not inlined), so outside of statically linked contexts, once we solve the data-size problem above, we should be fine as far as clients are concerned. For statically linked contexts, we'll want to explore how to divide up the data so that it can be stripped more easily.

For example, we could add a flag that triggers a warning or error on uses of certain Unicode.Scalar.Properties that may pull in a lot of the data.

Isolating ASCII-ness might not be necessary if we can make an important subset of the data small and compact. The semantics of String's Equatable and Collection conformances rely on that important subset, which is much smaller than the full UCD. Otherwise, we can have a mode that inserts a trap whenever that data would need to be consulted (e.g. for non-trivial graphemes or string contents that are not trivially NFC).
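A minimal sketch of the fast-path/trap split, assuming a hypothetical `unicodeDataAvailable` build setting and hypothetical `canonicallyEqual` and `asciiCharacterCount` helpers. ASCII text is already NFC, so canonical equivalence of two ASCII strings is plain byte equality. In ASCII, every byte is its own grapheme cluster except CR LF, which UAX #29 keeps together. Anything else would need the Unicode tables, so this sketch either traps or reports that it cannot answer.

```swift
/// Hypothetical build setting: in a "no Unicode data" mode, operations
/// that would need UCD tables trap instead of consulting them.
let unicodeDataAvailable = false

func isASCII(_ s: String) -> Bool {
    s.utf8.allSatisfy { $0 < 0x80 }
}

/// Equality that only needs Unicode data when some input is non-ASCII.
func canonicallyEqual(_ a: String, _ b: String) -> Bool {
    if isASCII(a) && isASCII(b) {
        return a.utf8.elementsEqual(b.utf8)   // ASCII is already NFC
    }
    guard unicodeDataAvailable else {
        fatalError("non-ASCII comparison requires Unicode normalization data")
    }
    return a == b   // stand-in for a table-driven NFC comparison
}

/// Character count without grapheme data: each ASCII byte is one cluster,
/// except LF directly after CR (rule GB3). Returns nil for non-ASCII input.
func asciiCharacterCount(_ s: String) -> Int? {
    guard isASCII(s) else { return nil }
    var count = 0
    var previous: UInt8 = 0
    for byte in s.utf8 {
        if !(previous == 0x0D && byte == 0x0A) { count += 1 }
        previous = byte
    }
    return count
}
```

Returning `nil` rather than trapping in `asciiCharacterCount` is just one option; a build mode like the one described above could equally call `fatalError` there.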