Pitch: String Gaps and Missing APIs

We could do something like this, but I don't think it's worth exposing at this point:

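```swift
extension UTF16 {
  /// What a single UTF-16 code unit means on its own.
  enum CodeUnitClassification {
    case scalar(Unicode.Scalar)              // a complete, non-surrogate scalar
    case leadingSurrogate(payload: UInt16)   // high surrogate; payload = its value bits
    case trailingSurrogate(payload: UInt16)  // low surrogate; payload = its value bits
  }
}
extension UTF8 {
  /// What a single UTF-8 code unit means on its own.
  enum CodeUnitClassification {
    case ascii(Unicode.Scalar)                    // 0x00-0x7F, a complete scalar
    case leadingByte(payload: UInt8, width: Int)  // starts a `width`-byte sequence
    case continuationByte(payload: UInt8)         // 0b10xxxxxx
    case invalid                                  // never appears in valid UTF-8
  }
}
```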
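As a sketch of how the UTF-8 classification might be computed from a single byte, here is a minimal version under a few assumptions: `classify(_:)` is a hypothetical name, not existing API; the enum is made `Equatable` only so it can be checked; and "payload" is taken to mean the value bits left after stripping the UTF-8 marker bits.

```swift
extension UTF8 {
  enum CodeUnitClassification: Equatable {
    case ascii(Unicode.Scalar)
    case leadingByte(payload: UInt8, width: Int)
    case continuationByte(payload: UInt8)
    case invalid
  }

  // Hypothetical helper: classify one byte by its high-order marker bits.
  static func classify(_ byte: UInt8) -> CodeUnitClassification {
    switch byte {
    case 0x00...0x7F:
      return .ascii(Unicode.Scalar(byte))
    case 0x80...0xBF:
      return .continuationByte(payload: byte & 0x3F)  // 0b10xxxxxx
    case 0xC2...0xDF:
      return .leadingByte(payload: byte & 0x1F, width: 2)  // 0b110xxxxx
    case 0xE0...0xEF:
      return .leadingByte(payload: byte & 0x0F, width: 3)  // 0b1110xxxx
    case 0xF0...0xF4:
      return .leadingByte(payload: byte & 0x07, width: 4)  // 0b11110xxx
    default:
      // 0xC0/0xC1 (always overlong) and 0xF5-0xFF never occur in valid UTF-8.
      return .invalid
    }
  }
}
```

A per-byte classification can't reject every ill-formed sequence on its own: 0xE0, 0xED, 0xF0 and 0xF4 are valid leads whose legal second bytes are restricted, so that check needs the following byte too.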

But this isn't really the shape one would want for decoding or analysis, though everything else could be built on top of it (assuming it all gets optimized to something reasonable).
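To make "built on top of it" concrete, here is a hypothetical UTF-16 decoder layered on the classification. It assumes the surrogate payload is the low 10 bits of the code unit and that unpaired surrogates are replaced with U+FFFD; `classify(_:)` and `decodeScalars(_:)` are illustrative names only. The decoder has to carry state (`pendingLead`) across classifications, which hints at why this isn't the form a decoding API would want to hand out.

```swift
extension UTF16 {
  enum CodeUnitClassification: Equatable {
    case scalar(Unicode.Scalar)
    case leadingSurrogate(payload: UInt16)
    case trailingSurrogate(payload: UInt16)
  }

  // Hypothetical helper: classify one UTF-16 code unit.
  static func classify(_ unit: UInt16) -> CodeUnitClassification {
    switch unit {
    case 0xD800...0xDBFF:
      return .leadingSurrogate(payload: unit & 0x3FF)
    case 0xDC00...0xDFFF:
      return .trailingSurrogate(payload: unit & 0x3FF)
    default:
      // Any non-surrogate code unit is a scalar by itself, so this can't fail.
      return .scalar(Unicode.Scalar(unit)!)
    }
  }
}

// Hypothetical decoder built only from the classification above.
func decodeScalars(_ units: [UInt16]) -> [Unicode.Scalar] {
  var result: [Unicode.Scalar] = []
  var pendingLead: UInt16? = nil  // payload of a leading surrogate awaiting its partner
  for unit in units {
    switch UTF16.classify(unit) {
    case .scalar(let s):
      if pendingLead != nil { result.append("\u{FFFD}") }  // unpaired lead
      pendingLead = nil
      result.append(s)
    case .leadingSurrogate(let payload):
      if pendingLead != nil { result.append("\u{FFFD}") }  // unpaired lead
      pendingLead = payload
    case .trailingSurrogate(let payload):
      if let lead = pendingLead {
        // Recombine the two 10-bit payloads into a supplementary-plane scalar.
        let value = 0x10000 + (UInt32(lead) << 10 | UInt32(payload))
        result.append(Unicode.Scalar(value)!)
      } else {
        result.append("\u{FFFD}")  // unpaired trail
      }
      pendingLead = nil
    }
  }
  if pendingLead != nil { result.append("\u{FFFD}") }  // input ended mid-pair
  return result
}
```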

Swift's SIMD support doesn't give us enough yet. For scalar width, we want to pack a lookup table indexed by a 4-bit nibble (16 one-byte entries) into a 16-byte register and index it with a byte shuffle, but IIUC we don't have access to that. We'd also want to figure out the aligned-load model and what the behavior is for dangling bytes (bytes a vector load touches outside the buffer but never uses). @scanon knows more details.
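Without a shuffle, the nibble lookup can at least be emulated lane by lane with `SIMD16<UInt8>`. In the sketch below, the table contents follow from the UTF-8 lead-byte bit patterns; `scalarWidths(_:)` and `loadChunk(_:at:)` are hypothetical names, and padding unused lanes with 0x80 is just one assumed answer to the dangling-bytes question (it sidesteps over-reading by copying the tail), not a settled design.

```swift
// Hypothetical: width of the UTF-8 sequence each of 16 bytes starts,
// found by looking up the byte's high nibble (0 means continuation byte).
func scalarWidths(_ bytes: SIMD16<UInt8>) -> SIMD16<UInt8> {
  // 16 one-byte entries, one per nibble value: the table we'd like to keep
  // in a register and index with a byte shuffle (e.g. x86 pshufb, ARM tbl).
  let widthByHighNibble: SIMD16<UInt8> = [
    1, 1, 1, 1, 1, 1, 1, 1,  // 0x0_-0x7_: ASCII
    0, 0, 0, 0,              // 0x8_-0xB_: continuation bytes
    2, 2,                    // 0xC_-0xD_: two-byte lead (0xC0/0xC1 invalid)
    3,                       // 0xE_: three-byte lead
    4,                       // 0xF_: four-byte lead (0xF5-0xFF invalid)
  ]
  // Portable stand-in for the shuffle: one table lookup per lane.
  var widths = SIMD16<UInt8>()
  for lane in 0..<bytes.scalarCount {
    widths[lane] = widthByHighNibble[Int(bytes[lane] >> 4)]
  }
  return widths
}

// Hypothetical tail handling: copy whatever bytes remain and fill unused
// lanes with 0x80 (a continuation byte, width 0) so padding never starts a scalar.
func loadChunk(_ buffer: [UInt8], at offset: Int) -> SIMD16<UInt8> {
  precondition(offset >= 0)
  var chunk = SIMD16<UInt8>(repeating: 0x80)
  let available = max(0, min(16, buffer.count - offset))
  for lane in 0..<available {
    chunk[lane] = buffer[offset + lane]
  }
  return chunk
}
```

For well-formed input, counting the nonzero lanes gives the number of scalars that start in the chunk. A real implementation would want the lookup loop to lower to a single shuffle instruction, which is exactly the access the paragraph above says we're missing.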