Any reason String.UTF8View doesn't conform to DataProtocol? I noticed it when dealing with CryptoKit, and I needed to cast it using Array or Data (which is better to use in this scenario is entirely unclear!).
let digest = Crypto.SHA256.hash(data: Array(digestValue.utf8))
If there is no good reason not to conform it, would we need a full pitch on Foundation to do so? Or would a PR on its own be welcome?
I suspect it's because on Apple platforms Strings are not always stored as contiguous UTF-8; they could be wrapping NSStrings that store UTF-16 instead. So there's no way to expose contiguous UTF-8 without (potentially) doing an expensive bridging operation. But a current Apple person could chime in if there's more to it.
(Agreed that better guidance on "if I have a String and I need contiguous UTF-8, what's the best way to convert it" would be useful!)
Yeah, you hit the nail on the head. Because String can have external representations which aren't necessarily UTF-8, UTF8View may be an ephemeral view which isn't free to construct or iterate over. Conforming it to DataProtocol unconditionally gives the impression that it could offer a contiguous array of bytes, when in reality, this could require an allocation and O(n) walk over the string; and since that work can't reasonably be cached or held on to, this has the potential to lead to accidentally-quadratic performance.
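For anyone wanting to see which case a given string falls into, here is a minimal sketch using only the standard library. `contiguousUTF8ByteCount(of:)` is a hypothetical helper name made up for illustration; `isContiguousUTF8` and `withContiguousStorageIfAvailable` are existing standard library APIs. The idea is that the closure only runs when the string can hand out a contiguous UTF-8 buffer directly, and the call returns nil otherwise.

```swift
// Hypothetical helper (not part of the standard library or Foundation):
// returns the UTF-8 byte count only if the bytes are available as one
// contiguous buffer, and nil if the string would need converting first.
func contiguousUTF8ByteCount(of string: String) -> Int? {
    string.utf8.withContiguousStorageIfAvailable { buffer in
        buffer.count
    }
}

// A native Swift string; "\u{E9}" (é) takes two bytes in UTF-8.
let native = "caf\u{E9}"

print(native.isContiguousUTF8)
print(contiguousUTF8ByteCount(of: native) as Any)
```

A string lazily bridged from an NSString on Apple platforms may report `isContiguousUTF8 == false` and return nil from the helper, which is exactly the case an unconditional DataProtocol conformance would have to paper over.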
String has since gotten makeContiguousUTF8(), though this doesn't help for things like protocol conformances. Instead, the right thing to do would be to go through withUTF8, which will make the string contiguous UTF-8 if it isn't yet, then provide a contiguous buffer of UTF-8 data:
// withUTF8 is a mutating method, so digestValue must be declared with `var`.
let digest = digestValue.withUTF8 { buffer in
    Crypto.SHA256.hash(data: buffer)
}
This is sort of best-of-all-worlds. If the String is already UTF-8 (the most common case), then offering the buffer is free and you avoid an O(n) copy of the bytes; and since withUTF8 vends an UnsafeBufferPointer<UInt8>, which does conform to DataProtocol, you get that for free too. If the String isn't already UTF-8, then this method mutates the string to make it so, making this free in the future and (theoretically) avoiding accidentally-quadratic performance.
(Downsides are discoverability, and not being able to avoid a copy in case you have an immutable string which would require copying to make contiguous.)
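To make the immutable-string case concrete, here is a rough sketch that depends only on Foundation. `byteSum(of:)` and `byteSumOfUTF8(in:)` are hypothetical names, and `byteSum` is just a stand-in for any API that accepts a DataProtocol value (a call like `SHA256.hash(data:)` would slot in the same way). Because withUTF8 is mutating, a `let` string or a function parameter has to be assigned to a local `var` first.

```swift
import Foundation

// Hypothetical stand-in for any API that takes a DataProtocol value.
func byteSum<D: DataProtocol>(of data: D) -> Int {
    data.reduce(0) { $0 + Int($1) }
}

// Hypothetical helper showing the withUTF8 pattern for an immutable input.
func byteSumOfUTF8(in string: String) -> Int {
    // Function parameters are immutable, and withUTF8 is mutating,
    // so work on a local variable instead.
    var local = string
    return local.withUTF8 { buffer in
        // buffer is an UnsafeBufferPointer<UInt8>, which conforms to
        // DataProtocol; it must not escape this closure.
        byteSum(of: buffer)
    }
}

let greeting = "abc"
print(byteSumOfUTF8(in: greeting))  // prints 294 (97 + 98 + 99)
```

Assigning to `local` does not by itself copy the bytes; the extra work only happens if the string has to be converted to contiguous UTF-8. In that case the conversion is done on `local` and is not written back to the caller's string, so it would be repeated on the next call.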
I was going to bring up that DataProtocol supports non-contiguous ranges, but I was also under the false impression that modern String forced UTF-8 under the hood, even when bridging from NSString. Glad to know .withUTF8() will do the right thing though, thanks!