Why does String.CharacterView have reserveCapacity(_:)?

I’m wondering why the String.CharacterView structure has a
reserveCapacity(_:) member. And even more strangely, why does String itself
have the same method?

It’s even weirder that String.UnicodeScalarView has this method, but it
reserves `n` `UInt8`s of storage, instead of `n` `UInt32`s of storage. Also,
why do String.UTF8View and String.UTF16View not have this method, when it
would make more sense for them to have it than for String itself and
String.CharacterView?

I’m wondering why the String.CharacterView structure has a reserveCapacity(_:) member.

Because it conforms to the RangeReplaceableCollection protocol, which requires `reserveCapacity(_:)`.

More broadly, because you can append characters to the collection, so you might want to pre-size it to reduce the number of reallocations needed later.
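As a rough illustration of why the requirement sits on the protocol, here is a minimal sketch of a generic helper that pre-sizes whatever collection it builds. The name `repeated` is made up for this example, and the sketch assumes a current Swift toolchain, where `String` itself is the collection of characters rather than a separate `CharacterView`.

```swift
// Hypothetical helper: it works for any RangeReplaceableCollection, so the
// protocol has to offer reserveCapacity(_:) for the pre-sizing call to compile.
func repeated<C: RangeReplaceableCollection>(_ element: C.Element, count: Int) -> C {
    var result = C()
    // A hint only: the protocol's default implementation does nothing,
    // and conforming types are free to reserve more than requested.
    result.reserveCapacity(count)
    for _ in 0..<count {
        result.append(element)
    }
    return result
}

// The same code serves strings, scalar views, and arrays.
let dashes: String = repeated("-", count: 5)
let scalars: String.UnicodeScalarView = repeated("x", count: 3)
let numbers: [Int] = repeated(7, count: 2)
```

Because the call goes through the protocol, every conforming view gets the method whether or not pre-sizing is especially meaningful for it.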

And even more strangely, why does String itself have the same method?

Because `String` duplicates the `CharacterView` methods that don't address individual characters. (In Swift 4, it will be merged with CharacterView.)

It’s even weirder that String.UnicodeScalarView has this method, but it reserves `n` `UInt8`s of storage, instead of `n` `UInt32`s of storage.

Because the views are simply different wrappers around a single underlying buffer type, which stores the string as 8-bit code units (if every character is ASCII) or 16-bit code units (if any character is not). That means `UnicodeScalarView` isn't backed by a UTF-32 buffer; it's backed by an ASCII or UTF-16 buffer, but it only generates and accepts indices that fall on whole Unicode scalars, never on the second half of a surrogate pair.
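A small sketch may make the "different wrappers, one string" point concrete. It only relies on the public view APIs, so it does not depend on the storage layout described above (current Swift releases store native strings as UTF-8 rather than ASCII/UTF-16); the sample string is arbitrary.

```swift
// "a", "é" (U+00E9), and U+1F600, which lies outside the Basic Multilingual Plane.
let s = "a\u{E9}\u{1F600}"

// Four views of the same string, each counting a different unit.
let characterCount = s.count            // 3 characters
let scalarCount = s.unicodeScalars.count // 3 Unicode scalars
let utf16Count = s.utf16.count          // 4 code units: U+1F600 needs a surrogate pair
let utf8Count = s.utf8.count            // 7 code units: 1 + 2 + 4 bytes

// Offset 3 in the UTF-16 view is the trailing half of the surrogate pair.
let midPair = s.utf16.index(s.utf16.startIndex, offsetBy: 3)

// The scalar view has no index for that position, so the conversion yields nil.
let asScalarIndex = midPair.samePosition(in: s.unicodeScalars)
```

The scalar view counts three elements even though no view here hands out 32-bit storage, which is why a capacity expressed in `UInt32`s would not correspond to anything in the buffer.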

Why not allocate a larger buffer anyway? Most strings contain no characters outside the Basic Multilingual Plane, and many contain only ASCII characters. (Even when the user works in a non-ASCII language, strings representing code, file system paths, URLs, identifiers, localization keys, etc. are usually ASCII-only.) By reserving only `n` `UInt8`s, Swift avoids wasting memory, at the cost of sometimes having to reallocate and copy the buffer when a string contains relatively rare characters. I believe Swift doubles the buffer size on each reallocation, so we're talking no more than one reallocation for a non-ASCII string and two for a string with non-BMP characters. That's quite acceptable.
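The reallocation estimate can be checked with a toy model. This is only arithmetic under the doubling assumption stated above, not the standard library's actual allocator; `doublings` is a hypothetical name, and the byte counts assume 1, 2, and 4 bytes per scalar for ASCII, 16-bit, and surrogate-pair text respectively.

```swift
/// Counts how many times a buffer that starts at `reserved` bytes must double
/// before it can hold `needed` bytes. A toy model for illustration only.
func doublings(reserved: Int, needed: Int) -> Int {
    precondition(reserved > 0 && needed >= 0)
    var capacity = reserved
    var count = 0
    while capacity < needed {
        capacity *= 2
        count += 1
    }
    return count
}

// Reserve room for n scalars at one byte each, then fill with n scalars.
let n = 64
let asciiOnly = doublings(reserved: n, needed: n * 1) // fits as reserved
let bmpOnly = doublings(reserved: n, needed: n * 2)   // 16-bit code units
let nonBMP = doublings(reserved: n, needed: n * 4)    // every scalar a surrogate pair
```

Under these assumptions the worst case, a string made entirely of surrogate pairs, costs two doublings, which matches the estimate in the paragraph above.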

Also, why do String.UTF8View and String.UTF16View not have this method, when it would make more sense for them to have it than for String itself and String.CharacterView?

Because UTF8View and UTF16View are immutable. They don't conform to RangeReplaceableCollection and cannot be used to modify the string, since editing individual code units could produce an invalid string.
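To illustrate the validity concern, here is a sketch of what happens when raw code units do cross into a `String`. It uses the standard initializer `String(decoding:as:)`, which repairs ill-formed input instead of storing it; the byte values are arbitrary examples.

```swift
// 0x61 is "a"; 0xFF can never appear in well-formed UTF-8.
let bytes: [UInt8] = [0x61, 0xFF]

// Decoding replaces the ill-formed byte with U+FFFD (the replacement
// character), so the resulting String is always valid Unicode.
let repaired = String(decoding: bytes, as: UTF8.self)

// The code-unit view is still readable, just not writable in place.
let firstUnit = repaired.utf8.first
```

If the code-unit views allowed appends or replacements, the string would need this kind of repair or validation on every mutation, which is one plausible reading of why they are read-only.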


On May 2, 2017, at 12:35 PM, Kelvin Ma via swift-users <swift-users@swift.org> wrote:

--
Brent Royal-Gordon
Architechies

Okay I understand most of that, but I still feel it’s misleading to put
`reserveCapacity()` on `CharacterView` and `UnicodeScalarView`.
`reserveCapacity()` should live in a type where its meaning matches up with
the meaning of the `.count` property, ideally the `UTF8View`. Otherwise it
should at least be removed from `CharacterView` and `UnicodeScalarView` and
only live in the parent `String` type.


On Thu, May 4, 2017 at 6:33 AM, Brent Royal-Gordon <brent@architechies.com> wrote:
