String Mutations
Hi all, I thought I’d share some thoughts about the current state of mutating the content of a string, and where we can go in the future.
Views and RangeReplaceableCollection
String and its UnicodeScalarView are RangeReplaceableCollections (“RRC”), meaning that they have operations such as append, insert, remove, and replaceSubrange. However, the code unit views are not, and neither is Character.UnicodeScalarView.
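As a small illustration of the RRC operations mentioned above, here is a sketch using only standard library calls. The variable names are just for the example. Indices are recomputed after each mutation rather than reused.

```swift
// String is a RangeReplaceableCollection of Character.
var greeting = "Hello"
greeting.append(" world!")                                        // "Hello world!"
greeting.insert(",", at: greeting.index(greeting.startIndex, offsetBy: 5))
greeting.remove(at: greeting.index(before: greeting.endIndex))    // drops the trailing "!"
let hello = greeting.startIndex..<greeting.index(greeting.startIndex, offsetBy: 5)
greeting.replaceSubrange(hello, with: "Howdy")                    // "Howdy, world"

// String.UnicodeScalarView is a RangeReplaceableCollection of Unicode.Scalar.
var scalars = "cafe".unicodeScalars
let acute: Unicode.Scalar = "\u{301}"                             // combining acute accent
scalars.append(acute)
let combined = String(scalars)                                    // 5 scalars, 4 Characters

// By contrast, the code unit views (utf8, utf16) offer no append or replaceSubrange.
```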
Conforming to RRC is tricky for types with extra semantic invariants which could be invalidated through insertion or removal. If a conformer checks these invariants after every modification, then it loses the ability to have temporarily-invalid intermediate states (e.g. a series of appends that is valid only after all of them have happened). If a conformer never checks these invariants, it introduces a significant foot-gun to its users.
For this reason, many views do not conform to RRC. No code unit views conform to RRC, as they all present validly-encoded contents, which is a difficult invariant to maintain (and we’d see counter-intuitive behavior if we tried). Neither does Character.UnicodeScalarView, since a Character is a single grapheme (i.e. it is a string of length 1), which is a complicated invariant whose meaning varies from one version of Unicode to the next. String, however, has no such length restriction, so its UnicodeScalarView does conform to RRC.
We want to provide more convenient means to mutate these views, but that needs a different approach, one that lets us enforce invariants after all modifications are finished. It’s possible that with more generalized coroutines in the future, we’ll see some kind of failable-transactional mutation pattern where the modified contents are validated before being written back.
Batching Mutations
String and String.UnicodeScalarView provide replaceSubrange, but indices are invalidated upon mutation. If we want to first scan a string building up a list of ranges and replacements, there is no way to apply all of the replacements at once as an in-place mutation. The first time we call replaceSubrange, the indices inside all our other ranges are invalidated. The only alternative is to build up a new string through concatenation by alternating between slices from the original string and our replacements.
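The concatenation workaround might look roughly like the sketch below. The function name applyingReplacements and the tuple labels are made up for this example, and it assumes the edits arrive sorted by position and do not overlap.

```swift
// Hypothetical helper: rebuilds a string by alternating untouched slices
// of `source` with the replacement text for each edited range.
func applyingReplacements(
    _ edits: [(range: Range<String.Index>, replacement: String)],
    to source: String
) -> String {
    var result = ""
    var cursor = source.startIndex
    for edit in edits {
        // Assumption: edits are sorted and non-overlapping.
        precondition(cursor <= edit.range.lowerBound, "edits must be sorted and non-overlapping")
        result += source[cursor..<edit.range.lowerBound]   // slice of the original
        result += edit.replacement                          // replacement text
        cursor = edit.range.upperBound
    }
    result += source[cursor...]                             // trailing slice
    return result
}

// Scan first, collecting ranges; all indices refer to the unmodified `text`.
let text = "one two three"
var edits: [(range: Range<String.Index>, replacement: String)] = []
for i in text.indices where text[i] == " " {
    edits.append((range: i..<text.index(after: i), replacement: ", "))
}
let joined = applyingReplacements(edits, to: text)          // "one, two, three"
```

Since every index is only ever used against the original string, nothing is invalidated along the way, at the cost of allocating a second string.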
One approach we could implement today would be a replace-multiple-subranges extension on RangeReplaceableCollection, which would take an Array of ranges and replacements, performing the operation all in one go.
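To make the idea concrete, here is one possible shape for such an extension. The name replaceSubranges and its signature are hypothetical, not an existing standard library API. This sketch also takes the conservative route of rebuilding the collection and assigning it back, and requires sorted, non-overlapping ranges. A real implementation with access to the storage could presumably do better.

```swift
extension RangeReplaceableCollection {
    /// Hypothetical API: applies every (range, replacement) pair in one call.
    /// Assumes `edits` is sorted by position and the ranges do not overlap.
    mutating func replaceSubranges<C: Collection>(
        _ edits: [(range: Range<Index>, replacement: C)]
    ) where C.Element == Element {
        var result = Self()
        var cursor = startIndex
        for edit in edits {
            precondition(cursor <= edit.range.lowerBound, "edits must be sorted and non-overlapping")
            result.append(contentsOf: self[cursor..<edit.range.lowerBound])
            result.append(contentsOf: edit.replacement)
            cursor = edit.range.upperBound
        }
        result.append(contentsOf: self[cursor...])
        self = result   // all ranges were resolved against the old contents
    }
}

// Works for String...
var expression = "a-b-c"
let dashEdits: [(range: Range<String.Index>, replacement: String)] =
    expression.indices
        .filter { expression[$0] == "-" }
        .map { (range: $0..<expression.index(after: $0), replacement: " + ") }
expression.replaceSubranges(dashEdits)                      // "a + b + c"

// ...and for any other RRC, such as Array.
var numbers = [1, 2, 3, 4, 5, 6]
let numberEdits: [(range: Range<Int>, replacement: [Int])] = [
    (range: 0..<2, replacement: [10]),
    (range: 4..<5, replacement: [50, 51]),
]
numbers.replaceSubranges(numberEdits)                       // [10, 3, 4, 50, 51, 6]
```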
A more ergonomic solution would be to allow String to yield mutable slices through a borrow, where the caller can perform the mutation and String tracks the necessary book-keeping. But, that is pending more features from the ownership model.