There is a great new blog post titled "UTF-8 String", authored by @Michael_Ilseman. It covers String's switch in Swift 5 to UTF-8 as its preferred Unicode encoding.
Please use this forum thread to post questions related to that blog post.
The size comparisons are always using UTF-8 as the baseline. So here the ratio is 9 UTF-8 bytes / 6 UTF-16 bytes = 3/2 --> 50% more, and for ASCII the ratio is 2 UTF-8 bytes / 4 UTF-16 bytes = 1/2 --> 50% less.
The example shows UTF-16 using 6 bytes where UTF-8 uses 9. Thus UTF-16 uses 33⅓% less memory than UTF-8 in that example, and UTF-8 uses 50% more than UTF-16.
The current wording says "UTF-16 uses 50% less memory than UTF-8". But in the example, the amount of memory that UTF-8 uses is 9 bytes, and 50% of 9 is 4½. So unless the UTF-16 encoding is actually using 4½ bytes, the statement in the blog post is factually incorrect.
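The arithmetic in the posts above can be checked with a short sketch. The sample strings here are stand-ins chosen to reproduce the quoted byte counts (9 vs 6 and 2 vs 4); they are not taken from the blog post, and `byteCounts` and `percentChange` are hypothetical helper names used only for illustration.

```swift
// Hypothetical sample strings (assumptions, not the blog post's own examples).
let latterBMP = "日本語"  // three scalars from the later part of the BMP
let ascii = "ab"          // two ASCII scalars

// Storage in bytes: a UTF-8 code unit is 1 byte, a UTF-16 code unit is 2 bytes.
func byteCounts(_ s: String) -> (utf8: Int, utf16: Int) {
    return (utf8: s.utf8.count, utf16: s.utf16.count * 2)
}

// Percentage difference of `value` relative to `baseline`.
// Positive means "more than the baseline", negative means "less".
func percentChange(from baseline: Int, to value: Int) -> Double {
    return Double(value - baseline) / Double(baseline) * 100
}

let c = byteCounts(latterBMP)  // 9 UTF-8 bytes, 6 UTF-16 bytes
let a = byteCounts(ascii)      // 2 UTF-8 bytes, 4 UTF-16 bytes

// Latter-BMP content: the answer depends on which encoding is the baseline.
print("UTF-8 relative to UTF-16:", percentChange(from: c.utf16, to: c.utf8))
print("UTF-16 relative to UTF-8:", percentChange(from: c.utf8, to: c.utf16))

// ASCII content: UTF-8 relative to UTF-16.
print("ASCII, UTF-8 relative to UTF-16:", percentChange(from: a.utf16, to: a.utf8))
```

With UTF-16 as the baseline, UTF-8 comes out 50% larger for the latter-BMP string. With UTF-8 as the baseline, UTF-16 comes out roughly 33.3% smaller, not 50%, which is the distinction being argued.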
It would be great if the blog post could show the benchmark results mentioned. Only the NSString bridging results are linked and there were a lot of wins here.
Right, the Objective-C interop improvements mostly landed in a single PR, so it's easy to link to. UTF-8 string landed with significant wins, but also some regressions that were fixed in later PRs, with even more optimizations after that. Since it was replacing a significant portion of the code in the stdlib, it was important to land early. The best we can do is either pick from various PRs (apples-to-apples), or compare across releases. Comparing across releases is less accurate, as a release contains many changes that can perturb the results, for example exclusivity enforcement.
@Erik_Eckstein, do we typically publish benchmark comparisons across releases?
Would this be clearer:
For any ASCII portion of a string's content, UTF-8 uses 50% less memory than UTF-16. For any portion comprised of latter-BMP scalars, UTF-8 uses 50% more memory than UTF-16.
Less accurate for microbenchmarks for development purposes maybe, but they would give readers a much better idea of the performance to expect from one release to another. Swift should be doing such general benchmarks anyway to ensure there are no performance regressions between major releases.
If by "clearer" you mean "factually correct" then yes, yes it would. :-)
(Though I still hold that "comprised of" should be either "comprising" or "composed of" there. The difference is that "The whole comprises the parts" whereas "The parts compose the whole".)
I'm bumping this thread because the objectively false statement still appears in the blog post, and @Michael_Ilseman's correction has not been incorporated.
Is there a better process than posting here, for getting fixes pushed to swift.org?
verb [with object]
consist of; be made up of: the country comprises twenty states.
• make up or constitute (a whole): this single breed comprises 50 per cent of the Swiss cattle population | (be comprised of): documents are comprised of words.
USAGE Comprise primarily means "consist of", as in the country comprises twenty states. It can also mean "constitute or make up a whole", as in this single breed comprises 50 per cent of the Swiss cattle population. When this sense is used in the passive (as in the country is comprised of twenty states), it is more or less synonymous with the first sense (the country comprises twenty states). This usage is part of standard English, but the construction comprise of, as in the property comprises of bedroom, bathroom, and kitchen, is regarded as incorrect.