String, Substring, and UnicodeScalarView conform to RangeReplaceableCollection, so is there a reason why their associated UTF8View and UTF16View don't as well? Couldn't that be helpful for working with a string's underlying representation in a performant way? Or is there a "better" way to modify a string's bytes?
If this conformance does seem like a good addition, what is the path forward? A Swift Evolution pitch?
Because they don't conform to MutableCollection. :-)
The views on a string are direct views on the string's storage, and in the case of UnicodeScalarView can be used to mutate the string:
```swift
var str = "🇮🇨"
str.unicodeScalars.replaceSubrange(...str.startIndex, with: [UnicodeScalar(0xff)])
str // ÿ🇨
```
This is safe because it maintains String's core invariant, that it's a sequence of Unicode scalars. If we did the same with UTF-8, it would be possible to break the string.
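To make the problem concrete, here is a sketch that performs the equivalent byte-level replacement on a plain [UInt8] copy of a string's bytes (the real UTF8View does not allow this). The isValidUTF8 helper is written just for this illustration; it uses the standard library's UTF8 decoder to detect invalid sequences.

```swift
// Illustration only: String.UTF8View is not mutable, so this works on a
// plain [UInt8] copy to show what a byte-level replaceSubrange could do.

/// Hypothetical helper: returns true if `bytes` decode as UTF-8 without error.
func isValidUTF8(_ bytes: [UInt8]) -> Bool {
    var iterator = bytes.makeIterator()
    var decoder = UTF8()
    while true {
        switch decoder.decode(&iterator) {
        case .scalarValue:
            continue            // a valid scalar; keep decoding
        case .emptyInput:
            return true         // reached the end without errors
        case .error:
            return false        // found an invalid sequence
        }
    }
}

var brokenBytes = Array("\u{E9}!".utf8)          // [0xC3, 0xA9, 0x21]
// Replace only the lead byte of the two-byte scalar "é" with "A" (0x41).
brokenBytes.replaceSubrange(0..<1, with: [0x41])
// brokenBytes is now [0x41, 0xA9, 0x21]: 0xA9 is an orphaned continuation byte.
print(isValidUTF8(Array("\u{E9}!".utf8)))        // true
print(isValidUTF8(brokenBytes))                  // false
```

The same replacement through unicodeScalars can only swap whole scalars, which is why that view can safely offer RangeReplaceableCollection.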
Ah, so this design decision is based on the fact that UTF8Views should be safe to traverse but are not safe to modify? Is there any reasoning behind the ability to create broken strings using String.init(decoding:as:)?
Regardless, is there a recommended way to "unsafely" modify a string's bytes in a performant way (avoiding copies and having the speed of random access)?
Note that not all strings have mutable UTF-8 backing storage, so you can't avoid the possibility of a copy in those cases if your intention is to mutate, but this API is guaranteed to avoid unnecessary copies.
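A minimal sketch, assuming the copy-avoiding API referred to here is String.withUTF8 (the mutating function discussed at the end of this thread). It reads the bytes through the read-only buffer and counts ASCII letters; the counting logic is only an example workload.

```swift
var text = "hello, w\u{F6}rld"

// withUTF8 is `mutating` because it may first copy a non-contiguous string
// into native UTF-8 storage. The buffer passed to the closure is read-only:
// never write through it.
let asciiLetterCount = text.withUTF8 { (buffer: UnsafeBufferPointer<UInt8>) -> Int in
    let upper = UInt8(ascii: "A")...UInt8(ascii: "Z")
    let lower = UInt8(ascii: "a")...UInt8(ascii: "z")
    var count = 0
    for byte in buffer where upper.contains(byte) || lower.contains(byte) {
        count += 1
    }
    return count
}
print(asciiLetterCount)  // 9 ("ö" is two non-ASCII bytes and is not counted)
```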
You can also use isContiguousUTF8 to check whether the string is already stored as contiguous UTF-8, and use makeContiguousUTF8() to force a conversion to contiguous UTF-8 ahead of later use.
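A short sketch of those two APIs together. The string value is arbitrary; strings bridged from Objective-C are the typical case where the check can come back false.

```swift
var s = "Caf\u{E9}"

// Native Swift strings are typically already contiguous UTF-8; bridged
// strings may not be.
if !s.isContiguousUTF8 {
    s.makeContiguousUTF8()   // may copy the contents into native UTF-8 storage
}
// From here on, the string is contiguous UTF-8, so a later withUTF8 call
// can hand out its bytes without copying.
precondition(s.isContiguousUTF8)
```

Calling makeContiguousUTF8() unconditionally would also work, since it does nothing expensive when the string is already contiguous; the explicit check just mirrors the description above.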
Also, note that it is not currently possible to conform an existing type to an existing protocol while maintaining ABI stability, so this isn't a change we could make with the existing language support for availability.
RangeReplaceableCollection unfortunately does not support situations where the collection can temporarily be in an invalid state before an eventual validation. RRC conformance would add append(_: UInt8) to the UTF8View, and since the string has to be valid after every call, appending a multi-byte scalar one byte at a time would turn every one of its bytes into U+FFFD. An alternative approach could be something like a closure (or coroutine) that lets you append bytes and performs error correction only at the very end.
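The difference can be sketched as follows. appendUTF8Bytes is a hypothetical helper, not a standard library API: it collects raw bytes in a scratch array and repairs them once with String(decoding:as:). A real design along these lines would presumably write into the string's own storage instead of a temporary array.

```swift
extension String {
    /// Hypothetical helper: lets `body` append raw bytes, then validates and
    /// repairs them once, after all bytes are in place.
    mutating func appendUTF8Bytes(_ body: (inout [UInt8]) throws -> Void) rethrows {
        var scratch: [UInt8] = []
        try body(&scratch)
        // Error correction happens once, over the complete byte sequence.
        self += String(decoding: scratch, as: UTF8.self)
    }
}

// Validating after every single byte: each half of "é" is invalid on its own.
var eager = ""
for byte in "\u{E9}".utf8 {                      // 0xC3, 0xA9
    eager += String(decoding: [byte], as: UTF8.self)
}
print(eager == "\u{FFFD}\u{FFFD}")               // true

// Deferring validation until the end keeps the scalar intact.
var deferred = ""
deferred.appendUTF8Bytes { bytes in
    bytes.append(0xC3)
    bytes.append(0xA9)
}
print(deferred == "\u{E9}")                      // true
```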
You shouldn't be able to create a "broken string". That particular initializer replaces invalid UTF-8 sequences with the Unicode replacement character (U+FFFD).
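That repair behavior also gives a simple, if copying, way to edit a string's bytes freely: mutate an [UInt8] copy, then decode it. A minimal sketch with an arbitrary example string, writing a byte that can never appear in UTF-8:

```swift
let original = "Hi!"
var mutated = Array(original.utf8)      // [0x48, 0x69, 0x21]
mutated[2] = 0xFF                       // 0xFF is never valid in UTF-8

// String(decoding:as:) repairs the invalid byte instead of failing.
let repaired = String(decoding: mutated, as: UTF8.self)
print(repaired == "Hi\u{FFFD}")         // true
print(Array(repaired.utf8))             // [72, 105, 239, 191, 189]
```

The last three bytes, 0xEF 0xBF 0xBD, are the UTF-8 encoding of U+FFFD, so the resulting string is always valid.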
If you are able to create a broken string through that initializer, I think that would be a bug.
That gives you an immutable buffer. However, the function itself is mutating because it may copy the String's contents to a new backing store. You should never mutate the String's contents through the provided pointer, just in case that wasn't clear to everyone.