Others have addressed various reasons why Swift doesn't do this, but I want to highlight why making this "just work" is not necessarily a practical solution, from a very realistic standpoint. Human text is incredibly complex, and the details are very intricate. @Michael_Ilseman can corroborate (or refute) what I have to say here, but let's explore what an approach that makes `String` indexing O(1) might entail.
To begin with, I'd like to point out that Unicode allows, right off the bat, for arbitrarily long grapheme clusters. This means I can construct a string containing only a single character that is too large to represent in the sum total of all computer memory ever constructed, or that ever will be constructed. I bring this up not to be facetious, but to show that we cannot make any assumptions about indexing given any string up-front: no fixed-size storage could represent a single character without chunking regularly. For example, we could try to split the character up into its constituent UTF-32 components, but at the end of the day, we cannot guess how to assign individual indices to the UTF-32 storage without actually traversing the whole thing. I can create a string containing various ASCII characters and insert a huge grapheme cluster right in the middle, and we would not be able to make any assumptions about indices without processing the whole thing.
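As a quick, runnable illustration (the count of 10,000 combining marks here is arbitrary; Unicode imposes no upper bound):

```swift
// One base scalar plus an arbitrary number of combining marks still
// forms a single grapheme cluster, i.e., a single Swift Character.
let marks = String(repeating: "\u{0301}", count: 10_000) // COMBINING ACUTE ACCENT
let monster = "e" + marks

print(monster.count)      // 1 (a single Character)
print(monster.utf8.count) // 20001 (1 byte for "e" plus 2 bytes per mark)
```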
Given, then, that calculating the mapping between character indices and byte offsets is an O(n) operation, we have two choices:
- We can perform this mapping up-front, on the creation of every string. This means we will pay the time and storage costs to calculate this on the creation of string literals, file names, user input, etc., every single time. This is almost certainly a non-starter: very few of those strings will ever be indexed via integer subscripts, so this would be very wasteful for little benefit. The alternative is:
- Lazily calculate indices when subscripting actually happens. This means that we won't pay the cost up-front, but will instead construct the indices as we access them. Note that we can start optimizing this (e.g., given `string[i]`, only calculate indices up to `i` so we don't pay an enormous cost on `entireContentsOfWarAndPeace[0]`; we can also improve storage by storing runs of like indices, e.g., if we know characters `i₁` through `i₂` are ASCII, we don't have to calculate intermediate indices; etc.), but the more optimizations we apply, the harder it will be to reason about performance. The first subscript access to a string may or may not end up being expensive, as would later accesses depending on the subscript, and it would be hard to optimize around the mysterious costs of doing so. A sketch of what such lazy indexing might look like follows this list.
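To make that concrete, here is a minimal sketch of the lazy approach, assuming a wrapper type; `LazilyIndexedString`, its members, and the `entireContentsOfWarAndPeace` stand-in are all hypothetical names invented for illustration, not stdlib API:

```swift
// Hypothetical sketch: memoize Character-boundary indices as integer
// subscripts are actually used. None of this is real stdlib API.
struct LazilyIndexedString {
    private let base: String
    // knownIndices[k] is the String.Index of character k, grown on demand.
    private var knownIndices: [String.Index]

    init(_ base: String) {
        self.base = base
        self.knownIndices = [base.startIndex]
    }

    // O(1) if we've already walked this far; otherwise O(i - cached).
    // Precondition: i < base.count.
    mutating func character(at i: Int) -> Character {
        while knownIndices.count <= i {
            knownIndices.append(base.index(after: knownIndices.last!))
        }
        return base[knownIndices[i]]
    }
}

let entireContentsOfWarAndPeace = "Well, Prince, so Genoa and Lucca..." // stand-in
var text = LazilyIndexedString(entireContentsOfWarAndPeace)
_ = text.character(at: 0)  // cheap: no extra boundaries computed
_ = text.character(at: 10) // walks boundaries 1 through 10, then caches them
```

Even this toy version hints at the trouble discussed next: the cache costs storage proportional to how far you've indexed, and any mutation of the underlying string can invalidate it.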
Let's take option 2 above and run with it. What do we do about:
- String mutation? Removing the first character of a string could easily invalidate all of the indexing work we've done already, considering that the character may be arbitrarily large. We could optimize this by modeling it as a decrement of all indices that come after, for instance, but for a very long string with many indices stored, this starts increasing the cost of string mutation. (The same goes for insertion, or replacement of a subrange: the more we have to store, the more bookkeeping we have to do.)
- String concatenation? Strings can technically begin with degenerate combining characters, such that `(s1 + s2).count < s1.count + s2.count` (see the demonstration after this list). How do we model this efficiently? (We could recalculate all of the indices provided by `s2`, or just throw that work away if it's too expensive.)
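To make the concatenation case concrete, here's a small demonstration (the strings are arbitrary examples):

```swift
// s2 begins with U+0301 COMBINING ACUTE ACCENT, which has no base
// character of its own; on concatenation it merges into s1's final "e".
let s1 = "cafe"
let s2 = "\u{0301}!"

print(s1.count)        // 4
print(s2.count)        // 2 (a lone combining mark forms its own cluster)
print((s1 + s2).count) // 5 ("café!"), not 6
```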
The further we follow this down, the more non-deterministic performance gets based on the specific operations you perform, and this can easily become mysterious. If you get two strings from an API that happened to subscript them with random-access indices, they might carry a lot of baggage that you, as an API client, are not aware of. Even if you never subscript them again, they come with potential storage and performance costs that you never cared about to begin with!
I'll echo what @jrose said above: very often, the use cases I've seen for random access into `String`s have been pretty much contrived. It's extremely rare to need to actually jump around inside of a `String`, and very often, by the time you're doing that, you're either not really benefitting from the semantic correctness that `String` offers, or you really need a different data structure.
Papering over the complexity here would negatively impact the many consumers of `String` who don't care about this and don't need it; the tradeoffs are not weighted in their favor.
As others above-thread have pointed out: you're always welcome to add the extension to `String` to allow this if you want, but a lot of hard work and effort has gone into not introducing exactly this type of pitfall into `String`.
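For completeness, such an extension can be as small as this; note that each access is an honest O(n) walk from `startIndex`, with no hidden state (a sketch, not an endorsement):

```swift
extension String {
    // Integer subscripting, at an explicit O(n) cost per access.
    subscript(offset: Int) -> Character {
        self[index(startIndex, offsetBy: offset)]
    }
}

let s = "Hello, 👨‍👩‍👧‍👦!"
print(s[7]) // 👨‍👩‍👧‍👦: the family emoji is a single Character
```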