my guess is you're dealing with an API that takes a string and gives you an integer character offset back? that's the API's fault, it should be returning a String.Index
or if it's a C API, a byte offset.
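For what it's worth, bridging either kind of offset into a `String.Index` is only a couple of lines. A minimal sketch, with the offsets and the sample string invented purely for illustration:

```swift
let name = "café au lait"

// Hypothetical API result: a *character* offset.
let characterOffset = 3
if let idx = name.index(name.startIndex, offsetBy: characterOffset, limitedBy: name.endIndex),
   idx != name.endIndex {
    print(name[idx])  // "é"
}

// Hypothetical C API result: a UTF-8 *byte* offset.
let byteOffset = 6
let utf8Index = name.utf8.index(name.utf8.startIndex, offsetBy: byteOffset)
if let charIndex = utf8Index.samePosition(in: name) {
    print(name[charIndex])  // "a"
}
```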
I mean returning the character at position n, just like in any other programming language. For example, my cursor is 200px from the left, so I calculate that it's at my 27th character. Now I need to insert something at this position, or remove that character, or display it somewhere else, or anything else.
Well, @sveinhal said it's not useful and wanted me to give examples.
are you implementing an app console with a monospace font?
Nope, let's say I implement a game with a name-typing feature in a monospace font.
Does your font really support all 137,439 Unicode characters with uniform spacing? What happens when the user enters a name that runs right-to-left?
Maybe I will filter the input to a set of allowed characters. Since it's a game, I will probably just use clickable graphic letters and allow typing them with the keyboard as well. But it doesn't mean I will store the name in anything other than a String. I mean, we are talking about one of the most basic data types and it's like I asked about using it to fry an egg. If I need to avoid using String in cases like that, or to go out of my way to put a character at position n, this is an unfriendly API, no matter how hard you bend over backwards to defend it.
fonts have a flag that declares if all their glyphs are of equal advance width. also, you are presuming way too much about the character sets the game industry supports.
I think one of the reasons you're getting somewhat negative responses is because I don't think you've demonstrated your understanding of the problem Swift's String design is trying to solve.
What you're saying is akin to "Why can't I just index a linked list to easily get the nth element?" You have two options:
- If you're doing it once, you can deal with the syntactic salt of dealing with `String.Index`.
- If you're doing it more than once, `LinkedList` (and by analogy, `String`) is the wrong choice of data structure. Both of them optimize for particular use cases (`LinkedList` optimizes for fast prefix insert/delete and rearrange operations, and `String` optimizes for compact storage of performantly readable text) which are incompatible with your desire for frequent random access. Switching to a contiguous data structure like an `Array` is fast and easy in both cases (see the sketch below), and is better suited to your use case. And once you're done with your random accessing, you can easily hop back into the world of `String`/`LinkedList` to get back the benefits they provide.
- "Well, why isn't `String` just implemented as an `Array<Character>` then?" Because that makes trade-offs that are inappropriate for the most common use cases:
  - Contiguous data structures can only provide constant-time indexing because they make assumptions about element size (that all elements are the same size), so that the position of any given element is predictable from its index alone. To do this for Unicode strings would waste a lot of memory on padding; in effect, every character would need to be padded out to the size of the largest character. Alternatively (and even worse), each character's data could be referenced indirectly, which would incur absolutely crippling heap-allocation/ARC costs.
  - The current compact representation of Swift's String uses less RAM and, more importantly, less cache space, so String algorithms can work faster by virtue of causing less cache thrashing. This is much more important than providing constant-time indexing, because it's a much more common use case. (All strings are printed/saved/transmitted, but only a small number of them are ever indexed directly.)
I think you would incite better responses if you do a better job in understanding the problem, and importantly, demonstrating your understanding so that others will take you seriously.
What are you talking about? If I demonstrate understanding of the problem, I will be like some of you here, saying "it can't be done". I do understand Unicode and that this is a difficult problem to solve, but what does that change? My topic is about the language API, which is, in my opinion, too complex for what it is meant to do in most cases.
Memory management is a perfect example of the same thing, which I raised a few times before. If I had asked a question about the possibilities of automatic memory management a few years ago, I would have gotten the same responses, and would have been called out for "not demonstrating understanding of the problem". There are many possible solutions for this string case, including having a separate API for what it does now and keeping the simple API, slower, because it's what we usually do with strings. Or just being open to the idea that maybe one day a better algorithm will be born in someone's mind.
I will demonstrate my understanding for you to feel better: oh my gosh, this Unicode is so difficult to work with efficiently. Am I good enough to ask my question now?
In my experience, code that needs random-access to the characters of a string is very common in programming exercises and toy problems encountered by people learning, where a string is nearly always equivalent to an array of ASCII characters and the goal is to display simple manipulations of it. However, random access to characters is quite rare in real-world programs that instead need to handle blobs of text, sometimes quite large, and nearly always internationalized into many languages that use the full expressivity of Unicode.
Swift has made the design decision to optimize for correctness in the real-world case at the expense of ease-of-use for the learning case, which I fully support, but it's also clear why that's frustrating to beginners. When doing coding exercises in Swift, it's frequently the right thing to do to convert the string input to an array of characters for the manipulations and then back to a string at the end.
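For instance, a sketch of that round trip; the helper name and behavior are invented for illustration:

```swift
// Replace the character at a given offset, exercise-style.
func replacingCharacter(at offset: Int, with newChar: Character, in input: String) -> String {
    var chars = Array(input)                          // into Array<Character> for random access
    guard chars.indices.contains(offset) else { return input }
    chars[offset] = newChar
    return String(chars)                              // and back to String at the end
}

print(replacingCharacter(at: 4, with: "0", in: "swift"))  // "swif0"
```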
Maybe I don't feel like it's rare because I come from the web dev world, where you read and manipulate user input all the time and have many stringy identifiers like tokens, which sometimes carry pieces of information at certain positions. So it's not toy programming nor beginner's programming there.
That way you can call many of Swift's features toys. Inferring types is also kind of a toy, and that's actually great. The language feels fun to use because of that.
If someone finds a feature unintuitive to use, it doesn't necessarily mean he's a noob or doesn't understand the problem. He just wants the language to become as cool as possible.
Tbh, we suggested that you convert them to an array and work with that (twice by me, a few times by others), yet you never say what's wrong with it.
There are several things you said that indicated to me that you weren't sufficiently familiar with this problem to be able to understand the tradeoffs made in the current design. Such as:
The majority of text worldwide uses extended grapheme clusters to encode all non-Latin alphabets (and even some Latin alphabets, those with accented characters).
This is a runtime feature that the compiler can't particularly help with. Also, the `O(n)` cost of extended grapheme traversal is paid on all strings, even those without any "heavy Unicode characters".
It's no faster; it just happens to be shorter to spell.
Because arrays store elements of constant size. You'll never see an array with some rule like "if `2` appears after `1`, the two together are actually a single element". It's always a simple 1:1 mapping between integer indexes and consistently sized positions. Yet that's the essence of exactly what Unicode does.
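That "elements combine" behaviour is easy to observe on real strings; a small illustration, with arbitrary sample strings:

```swift
// One Character, two Unicode scalars: "e" followed by a combining acute accent.
let accented = "e\u{301}"
print(accented.count)                 // 1
print(accented.unicodeScalars.count)  // 2
print(accented.utf8.count)            // 3 bytes

// One Character, two regional-indicator scalars, eight UTF-8 bytes.
let flag = "🇨🇦"
print(flag.count)       // 1
print(flag.utf8.count)  // 8 bytes
```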
Because the cost would be absolutely unbearable. The vast majority of strings are never subscripted, so you're always paying a cost for which you only sometimes see benefit. Plus, every mutation of the string would invalidate the "index" cache and require an `O(n)` rebuilding process. Just think about what that would be like in a situation where someone builds up a string using a `for` loop.
No, they're not. An "array" is a term of art for a contiguous collection of consistently sized elements that can provide `O(1)` random access. String is not that, at all.
Ding ding ding, correct. But "ordered collection" ≠ "array", and the implications are very different.
Nor will it ever. No amount of hardware improvement turns an `O(n)` operation into `O(1)`. And many strings are still sufficiently large that the distinction is very pronounced.
Or the Unicode Consortium, for that matter. `O(1)` grapheme breaking simply isn't possible. It would be akin to figuring out `O(1)` random access into a `LinkedList`, so good luck with that.
It would be slow for even surprisingly short strings. A harmless `s[i]` operation within a `for i in 0..<s.count` loop turns `O(n^2)`, without you even noticing. Considering that Swift's most popular target platform is a low-energy, low-power CPU with heat constraints, making willy-nilly quadratic string algorithms has the potential for a large negative customer impact by causing hotter phones, faster battery drain, and lower responsiveness.
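Roughly what that trap looks like in practice, as a sketch (the string length is arbitrary):

```swift
let s = String(repeating: "a", count: 20_000)

// Looks innocent, but every iteration walks from startIndex again: O(n²) overall.
var quadratic = 0
for i in 0..<s.count {
    if s[s.index(s.startIndex, offsetBy: i)] == "a" {
        quadratic += 1
    }
}

// The linear alternative: just iterate the characters (or use s.indices).
var linear = 0
for c in s where c == "a" {
    linear += 1
}
```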
Syntax is not sacrificed to make the "machine work efficiently." You could very easily write an extension on `String` to add `Int`-based subscripts (sketched below). The syntax does absolutely nothing to help the machine. It's syntactic salt. It's made to work as an eye-catcher for developers to say: "something suspicious is happening here."
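Something along these lines compiles and works; it just hides an `O(n)` walk behind `O(1)`-looking syntax (a sketch, not a recommendation):

```swift
extension String {
    // Reads like constant-time array indexing, but each call is an O(n) traversal.
    subscript(offset: Int) -> Character {
        self[index(startIndex, offsetBy: offset)]
    }
}

let greeting = "héllo 👋"
print(greeting[1])  // "é"
print(greeting[6])  // "👋"
```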
Already addressed this, but again, it's simply not true.
Yes, it is marginal; that's only a constant-factor change in problem size. That's a completely different kind of optimization compared to one that changes `O(1)` to `O(n)`.
Try telling the traveling salesman that.
This quite clearly shows a lack of understanding of how `Array` works.
You don't see the difference between abstraction and implementation. You justify concepts by what they are internally. That's a common syndrome among programmers. It's the same issue as designing UI by what it does technically instead of how it interacts with a human being.
By array I mean an ordered collection. Fixed-size elements are just Swift's implementation; many languages have arrays of varying-size elements. Conceptually they are the same, but the implementation is different.
Alright, look, this conversation got pretty heated an hour ago and hasn't cooled off, and there's a lot of defensiveness and aggression flying in all directions. I do not want to be in a position where I feel like I'm cutting off someone's ability to make a point, but I also think this thread is starting to run on pure heat. I am locking this until when-I-feel-like-it tomorrow.
If an abstraction in a programming language papers over big performance differences, it's a bad abstraction. Performance matters; in a programming language, it's not an implementation detail.