SE‐0180 came up again during the review of SE‐0241, which seeks to do some damage control.
I was not around during the original review of SE‐0180. I wish I had been, because it has become the only accepted evolution proposal I have ever wished could be undone. I have just been biting my tongue in silence and trying to live with it. Since others are apparently noticing the problems it caused too, I thought I would start a discussion about the issues it has been causing me. A complete reversal of SE‐0180 is probably out of the question, but maybe we can find something that can still be done about some of the issues?
SE‐0180 erased the difference between the indices of a string’s various views:
> Today `String` shares an `Index` type with its `CharacterView` but not with its `UTF8View`, `UTF16View`, or `UnicodeScalarView`. This proposal redefines `String.UTF8View.Index`, `String.UTF16View.Index`, and `String.CharacterView.Index` as typealiases for `String.Index` [...]
The rationale given was this:
> The different index types are supported by a set of `Index` initializers, which are failable whenever the source index might not correspond to a position in the target view[.] [...] The result is a great deal of API surface area for apparently little gain in ordinary code, that normally only interchanges indices among views when the positions match up exactly (i.e. when the conversion is going to succeed). Also, the resulting code is needlessly awkward.
I have found just the opposite. The “ordinary code” which “only interchanges indices when the conversion is going to succeed” has taken a massive safety hit, and now requires extra boilerplate to deal with illogical failability.
(The following examples all assume a global constant `let café = "cafe\u{301}"`.)
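For reference, this particular string ends in a combining acute accent, so its views disagree about its length and not every UTF‐8 position falls on a scalar or character boundary. A small illustration (my own, not part of the original examples):

```swift
let café = "cafe\u{301}"            // the same constant as above: “café”
print(café.count)                   // 4 characters (“é” is a single grapheme cluster)
print(café.unicodeScalars.count)    // 5 scalars (“e” followed by U+0301)
print(café.utf16.count)             // 5 UTF-16 code units
print(café.utf8.count)              // 6 bytes (U+0301 occupies 2 bytes in UTF-8)
```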
Before SE‐0180 landed, I had lots of code like this:
```swift
func use(index: String.Index) {
    let scalarIndex = index.samePosition(in: café.unicodeScalars)
    // ...
}
use(index: café.startIndex)
```
It was all perfectly valid and logical. Since the parameter is a valid character index, it must have a corresponding scalar index. The standard library knew this, and the conversion method returned a non‐optional.
But then SE‐0180 landed, and it stopped compiling. There were two possible ways to deal with it:
- Obey the fix‐it:

  ```swift
  func use(index: String.Index) {
      let scalarIndex = index.samePosition(in: café.unicodeScalars)! // ← Added “!”.
      // ...
  }
  ```

  Great, just add one character and all is well—at first. To begin with, the method is only applied in the same places as before, so the indices it receives are always valid and everything continues to work. Unfortunately, over time code completion suggests using it in new places, and sooner or later something somewhere provides it an invalid character index, forgetting to convert it from `UTF8View.Index` and passing it directly through the typealias:

  ```swift
  use(index: café.utf8.index(before: café.utf8.endIndex))
  ```

  Even this bug isn’t caught at first, because many indices will still just happen to be valid. Eventually you get an inconsistent crash, which occurs far from the real cause. (A minimal repro follows below the second option.)
- Rethink according to the new model. `String.Index` isn’t really a character index; it’s really a `UTF8View.Index` that might be usable in some string‐based APIs, with undocumented behaviour occurring whenever it isn’t (more on that later). This rewrite is ready for anything:

  ```swift
  func use(index: String.Index) {
      guard let validated = index.samePosition(in: café) else { return }
      let scalarIndex = validated.samePosition(in: café.unicodeScalars)!
      // ...
  }
  ```
Now the method is bulletproof. But the tradeoffs aren’t negligible. There are extra run‐time unwrapping and validity checks. This `Void`‐returning method may silently do nothing. A version with a return value needs to become optional, forcing every call site to also handle an optionality which shouldn’t be there. If the return type is already optional, such as `func firstMatch(for: Pattern, in: Range<String.Index>) -> Range<String.Index>?`, then it conflates two different semantics into `nil`: “There is no match.” vs. “I cannot search because the index is invalid.”
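For concreteness, here is a minimal repro of the crash scenario from the first option (my own sketch, using the same `café` constant): the last UTF‐8 byte sits in the middle of the “é” scalar, so both conversions fail, and the forced unwrap inside `use(index:)` would trap far away from this line.

```swift
let utf8Index = café.utf8.index(before: café.utf8.endIndex)    // last byte of “é”
print(utf8Index.samePosition(in: café) as Any)                 // nil: not a character boundary
print(utf8Index.samePosition(in: café.unicodeScalars) as Any)  // nil: not even a scalar boundary
// use(index: utf8Index)                                       // would trap on the “!” inside use(index:)
```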
Now the “ordinary code” which “only interchanges indices when the conversion is going to succeed” has to choose between hard‐to‐track bugs and an avalanche of API changes to switch into the domain of unordinary code which interchanges indices with unknown validity.
To reuse SE‐0180’s own words, I feel that now “the resulting code [really] is needlessly awkward.”
There is actually a third option. It’s one I shudder at even more, but it is employed in several places in the standard library—the strategy of silently rounding to a valid index.
For example, `String`’s subscript for getting substrings apparently rounds the indices, though I can find no documentation anywhere that describes its intended behaviour for such cases.
This code is not logically valid, yet right now neither compiler nor documentation tell you that it isn’t.
```swift
let index = café.utf8.index(before: café.utf8.endIndex) // The last byte.
let before = String(café[..<index])
let after = String(café[index...])
print(before + after) // “café”
print(after.utf8.count) // 2 bytes?!?
```
I wish this didn’t compile. But even knowing that the current `String.Index` model requires it to compile, I would at least expect the second line to fail at runtime. It doesn’t; instead it enters bizarre, unexpected and undefined territory, injecting nebulous bugs into later logic. We might pat ourselves on the back that the two halves still make the same whole on the fourth line, but the developer asked for one byte in the top line and silently ended up with two by the fifth. Any byte‐offset logic that follows is headed for a train wreck. At least here the calls are confined to five short lines, but in practice they could be in far‐flung methods spread across a library. And again, the whole thing may appear to work just fine for a long time, merely because the developer neglected to test it with anything outside ASCII.
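One mitigation is to validate the index by hand before slicing, so that a misaligned index fails loudly next to its cause instead of being rounded silently. A sketch of that workaround (not a standard‐library facility):

```swift
let index = café.utf8.index(before: café.utf8.endIndex)
// Hand-rolled validation: trap here, next to the cause, rather than letting
// the subscript round the index and corrupt later byte-offset logic.
guard let characterIndex = index.samePosition(in: café) else {
    preconditionFailure("Index does not fall on a character boundary.")
}
let before = String(café[..<characterIndex])
let after = String(café[characterIndex...])
```

But that is exactly the boilerplate‐and‐runtime‐check tradeoff from the second option above, pushed onto every call site.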
So what I’m asking is this: Can we find a way to restore type safety to string indices, or at least a way to provide a type‐safe alternative alongside the status quo? Can there be an elegant way to declare method parameters that have a compile‐time guarantee that they are valid indices of a particular view? Is there some other solution to these issues?
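As one rough, purely hypothetical sketch of what such a compile‐time guarantee could look like (none of these names exist in the standard library, and a real design would also need to track the owning string and cover every view):

```swift
/// An index known to lie on a character boundary, because the only way to get
/// one is through the validating factory method below. (Hypothetical sketch;
/// it does not yet remember which string it belongs to.)
struct CharacterAlignedIndex {
    fileprivate let index: String.Index
}

extension String {
    /// Validates an arbitrary index, succeeding only on character boundaries.
    func characterAligned(_ index: String.Index) -> CharacterAlignedIndex? {
        guard let valid = index.samePosition(in: self) else { return nil }
        return CharacterAlignedIndex(index: valid)
    }
}

/// APIs can then demand proof of validity from the type system,
/// with no guards and no “!” in the body.
func use(index: CharacterAlignedIndex) {
    // ...
}

// use(index: café.utf8.index(before: café.utf8.endIndex)) // ← would no longer compile
if let aligned = café.characterAligned(café.startIndex) {
    use(index: aligned)
}
```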
(Possibly interested persons: @Michael_Ilseman, @lorentey)