I'm not convinced that your algorithms really need it. Most languages either represent strings as collections of code points, have buggy implementations, or have really problematic performance. What is your use case?
Emojis really aren't the crux here. Many natural language strings have the same issues.
Don’t joke. Swift implements dozens of more exotic use cases. Being able to easily tell what the character at position n is just feels natural, and it is very helpful even for things like debugging or parsing custom syntax. This String.Index type makes basic things difficult. Saying it’s niche is a little funny.
I don't think anyone is joking? You claim that non-trivial strings are niche and that random access to specific offsets is not, without justifying your position. I have yet to see a relevant use case that isn't more niche.
For the simple use case where you just want to find the character at a certain position it may be all right (the performance cost of scanning through the string cannot be avoided). But if you write an algorithm that iterates from 0 to theString.count and indexes into the String on each iteration, the performance penalty is far too high: O(n^2), compared to O(n) for scanning through the string once.
This is the kind of situation that the current api tries to prevent.
In other words: index lookups are expected to be constant in time, and looking up by an integer offset is not constant in time.
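To make the cost concrete, here is a minimal sketch of the two loops being compared. The function names and the sample string are made up for illustration; only the standard `String` API is used.

```swift
// Quadratic: every lookup walks from startIndex to the requested offset.
func countSpacesByOffset(_ text: String) -> Int {
    var spaces = 0
    for offset in 0..<text.count {                                 // text.count is itself O(n)
        let index = text.index(text.startIndex, offsetBy: offset)  // O(offset)
        if text[index] == " " { spaces += 1 }
    }
    return spaces
}

// Linear: a single pass over the characters.
func countSpacesByIteration(_ text: String) -> Int {
    var spaces = 0
    for character in text where character == " " {
        spaces += 1
    }
    return spaces
}

let sample = "the quick brown fox"
print(countSpacesByOffset(sample), countSpacesByIteration(sample))  // prints "3 3"
```

Both functions return the same answer; the difference only shows up as the string grows, because the first one restarts the walk from the beginning on every iteration.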
Any simple slicing, like skipping the first character and trimming the string to a certain position. You are trying too hard to justify your point. If you keep explaining everything like that you will reinvent C or assembly. Swift has a beautiful syntax that is clearly aimed at humans. But string operations are designed for machines at the moment; their API just sucks.
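For reference, the slicing described here can be written without touching String.Index at all. This is only a sketch with a made-up sample string, using the standard `dropFirst` and `prefix` methods:

```swift
let greeting = "Hello, world"

// Skip the first character.
let withoutFirst = greeting.dropFirst()       // Substring "ello, world"

// Keep at most the first five characters.
let firstFive = greeting.prefix(5)            // Substring "Hello"

// Both at once: the characters at offsets 1..<5.
let middle = greeting.dropFirst().prefix(4)   // Substring "ello"

// Convert back to String when a long-lived value is needed.
let middleString = String(middle)
print(withoutFirst, firstFive, middleString)
```

These calls still walk the string from the front, so they cost time proportional to the offset, but they do not trap when the string is shorter than expected.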
Strings ARE arrays of characters, in abstraction. A string is just an ordered collection of characters, where each character has its position.
Languages evolve and Swift makes it possible to work conveniently with many different structures. In the past, even objects were difficult to work with in most languages. The reason was always performance.
I can think of 3 possible explanations:
Hardware power is still not good enough to handle Unicode characters efficiently.
The Swift team hasn’t found a good algorithm yet.
The Swift team made a poor choice by making the entire String API annoying just because it would be slow for very long strings. If so, maybe a separate type would be a better choice?
String is NOT an array of Characters; Array<Character> is! You can convert a String to Array<Character> by doing
let arrayOfCharacter = Array(stringValue)
String is a BidirectionalCollection of Character, not a RandomAccessCollection, mostly for performance reasons.
Because Character itself doesn’t have a uniform size, an Array of Character will either waste a lot of space or be suboptimal for sequential traversal. That’s why it’s never what String is trying to be.
Note that Array(stringValue) needs to traverse the string once, which takes O(n) time and O(n) extra space for the new array.
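A short sketch of that conversion in use, assuming a made-up sample value for `stringValue`:

```swift
let stringValue = "naïve café"
let characters = Array(stringValue)    // [Character], built in one O(n) pass

// Integer subscripts are constant time from here on.
let third = characters[2]                      // "ï"
let last = characters[characters.count - 1]    // "é"

// Convert a slice back to a String when needed.
let word = String(characters[0..<5])           // "naïve"
print(third, last, word)
```

This pays the O(n) cost once up front, which is reasonable when the same text is indexed by position many times.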
You are correct that in abstraction they are. And in implementation both a String and an Array are ordered collections. But while Array conforms to RandomAccessCollection (the conformance that promises constant-time movement to any offset, which is what makes integer indexing cheap), String cannot, since the protocol is intended exactly for the types that can guarantee constant-time lookups.
Each of your points is valid, but perhaps there are more considerations involved:
1: Even if hardware were more powerful, you would probably still want to handle everything as efficiently as possible. So more power alone would perhaps not be adequate to solve the issue.
2: Perhaps it is less a matter of finding good algorithms and more a known and well-understood tradeoff: being correct (with respect to the Unicode standard) and efficient, at the 'cost' of having an API that may be slightly different from what you are used to (in languages that perhaps do not care about Unicode correctness or efficiency).
3: Perhaps it is, again, not so much a poor choice as a tradeoff. Every other way of modelling a String API would have different issues. For instance, in C you could have a byte buffer, and you would be responsible for knowing whether its contents are ASCII or UTF-8, or even sequences of multibyte UTF-8 code points that combine into single visible entities (the concept that is exactly modelled by the Character type, also referred to as an Extended Grapheme Cluster). Indexing here is fast, but handling the contents of the buffer is now entirely up to the user. There are many benefits to the way String is modelled in Swift, and the way I see it, the language is taking care of issues that are very, very hard to deal with.
But there is a downside, namely that you are forced to consider the implications of referencing a Character inside a String. Although grasping this is not trivial, I still think it is a good tradeoff, because day to day I don't have to deal with Unicode, character encodings or anything like that. Strings simply do the work for me!
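As a small illustration of what "one visible entity" means, here is a sketch that counts the same string at each level. The flag is just an example; it is written with Unicode escapes so the two scalars are visible in the source.

```swift
// The Danish flag: two regional indicator scalars, one visible character.
let flag = "\u{1F1E9}\u{1F1F0}"

print(flag.count)                  // 1 Character (extended grapheme cluster)
print(flag.unicodeScalars.count)   // 2 Unicode scalars
print(flag.utf16.count)            // 4 UTF-16 code units
print(flag.utf8.count)             // 8 UTF-8 bytes
```

With a raw byte buffer, "the first character" would have to be pieced together from those eight bytes by hand.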
It’s not about whether it is trying to be one or not. The good parts of programming languages make it convenient to express your thoughts in a natural way. The bad parts sacrifice syntax to make it easy for the machine to work efficiently.
Which one is most natural?
What’s the third character of my name?
What’s the third character after the starting index of my name?
What’s the next character after the next character after the starting index?
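For readers who haven't written these, a rough sketch of how the last two questions look in today's API, plus one spelling that gets closer to the first. The value of `name` is a made-up example.

```swift
let name = "Swift"

// "The third character after the starting index":
let thirdIndex = name.index(name.startIndex, offsetBy: 2)
let viaOffset = name[thirdIndex]                         // "i"

// "The next character after the next character after the starting index":
let viaSteps = name[name.index(after: name.index(after: name.startIndex))]

// Closer to "the third character of my name", with no index type involved:
let viaDrop = name.dropFirst(2).first                    // Optional("i")
print(viaOffset, viaSteps, viaDrop as Any)
```

The first two forms trap if the string is too short, while the `dropFirst(2).first` form returns nil instead.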
Programming languages aim for natural syntax, and Swift is one of the greatest I have seen so far. But I don’t like the excuses that strings are not trying to be like arrays (because that’s not true, strings are NATURALLY like arrays) or that grabbing the nth character of a text is a niche problem. I can understand that the hardware is still slow and that making strings indexed would be too heavy, but that is all it is. Call it by its name: it’s a flaw. Not a design choice, not conscious neglect of a niche problem, but a sacrifice. And in the future it should be possible to talk to strings by position, simply.
I think that a real jump in processing power would make this optimization marginal. Just like today you won’t name your files short to save memory and performance, because it’s marginal. It’s always relative. Optimizations become micro-optimizations when hardware becomes better. That’s why languages become more and more natural and just fun to write and read.
When it comes to Unicode, that statement is false. Even in UTF-32, a single visible character can span multiple code units. Arrays (in programming) contain fixed-size elements. Unicode characters are not fixed-length. It is exactly your misconception that Swift is trying to avoid.
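A minimal sketch of the UTF-32 point, using Swift's unicodeScalars view as a stand-in for UTF-32 code units. The accented letter is only an example.

```swift
let precomposed = "\u{E9}"      // é as a single scalar
let decomposed = "e\u{301}"     // e followed by COMBINING ACUTE ACCENT

print(decomposed.unicodeScalars.count)   // 2 scalars
print(decomposed.count)                  // 1 Character
print(precomposed == decomposed)         // true: String compares by canonical equivalence
```

An array of fixed-size scalars would treat these two strings as different lengths, even though they display identically.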
Edit: Strings are like arrays, but they aren't arrays. This difference is important enough that the Swift core devs feel it should not be papered over.
At this point I think it's best to drop the topic or revive (with good effort put forth in the revival) an old discussion around improving the ergonomics of the String API.
Swift's string API was intentionally designed around Unicode correctness and hiding potential pitfalls related to working with Unicode strings. To say it is flawed is flat-out wrong and hurts your argument. It might not be the API you're used to from other languages, but it is a good API for working with strings in a Unicode-safe manner. Part of that is giving up the notion that indexing into a string to get the n-th character is always going to be a constant-time operation. It is not.
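If offset-based access is still wanted for convenience, one possible sketch is a small extension that makes the linear cost explicit. Note that `character(atOffset:)` is a hypothetical helper invented for this post; it is not part of the standard library.

```swift
extension StringProtocol {
    /// Hypothetical helper, not a standard library API.
    /// Returns the character at `offset`, or nil when out of range.
    /// Cost is O(offset), because the string is walked from the start.
    func character(atOffset offset: Int) -> Character? {
        guard offset >= 0,
              let i = index(startIndex, offsetBy: offset, limitedBy: endIndex),
              i < endIndex
        else { return nil }
        return self[i]
    }
}

print("Swift".character(atOffset: 2) as Any)   // Optional("i")
print("Swift".character(atOffset: 9) as Any)   // nil
```

Using a named method rather than an integer subscript is a deliberate choice in this sketch: the call site still reads naturally, but it does not look like a constant-time array access.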