Pitch: Add a String Initializer with Access to Uninitialized Storage

Karl · April 8, 2019, 12:02pm

My thoughts on this API have evolved in some ways; in others they have not budged at all, and have even been reinforced.

It all comes down to shared Strings. They are in the ABI right now, so all of the performance wins that we've got in Swift 5 with UTF8 Strings are happening despite branching to check for external backing storage. All of the changes needed to expose this in the language are just surface level. That's just awesome.

So why do we even need a way to manually initialise String's storage, when we can so easily provide our own? Are there actual benefits to using String's built-in storage type for these cases? (I can only think of one: COW checks/in-place mutations. Can shared Strings not be extended to incorporate this?)

And then this doubles back to consistency with Array: if String can accommodate external storage with such great performance, it seems strange that Array couldn't do that, too. (Of course, Array is in many ways a lot simpler and has always performed better than a Unicode-compliant String, so it has more to lose.)

I asked about it when the equivalent Array API was proposed. The answer I got about the 'count' header seems pretty easy to overcome in the face of everything String is able to pull off. I'm sure we could devise a way to communicate the element count.
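For reference, here is a minimal sketch of how the Array initialiser reports its element count today. It assumes the signature that shipped in Swift 5.1 as `Array(unsafeUninitializedCapacity:initializingWith:)`, and it still fills Array's own storage rather than external storage, so it only illustrates the count-passing part.

```swift
// Ask for room for 8 elements, but initialise only the first 5.
let squares = [Int](unsafeUninitializedCapacity: 8) { buffer, initializedCount in
    for i in 0..<5 {
        // Write into uninitialised memory through the base pointer.
        (buffer.baseAddress! + i).initialize(to: i * i)
    }
    // Tell Array how many elements are now valid.
    initializedCount = 5
}

print(squares)        // [0, 1, 4, 9, 16]
print(squares.count)  // 5
```

The count travels back through an `inout` parameter, so the closure rather than the storage header decides how many elements exist.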

Anyway, on the proposal as written:

I'm pretty sure I remember being told that functions in Swift either succeed, throw, or hit something catastrophic/unrecoverable, i.e. a fatalError (as opposed to the 'exception' model). So IMO throwing is the right way for initialisation closures to model failure. It's not clear that an empty String is necessarily always a failure condition.
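A rough sketch of what throwing from the closure could look like. It uses the initialiser that later shipped in the standard library (Swift 5.3, macOS 11 and aligned releases), where the closure returns the initialised count; the signature pitched in this thread may differ. `InvalidInput` and `uppercasedASCII` are hypothetical names made up for the example.

```swift
struct InvalidInput: Error {}   // hypothetical error type

// Hypothetical helper: writes upper-cased ASCII bytes straight into String's storage.
func uppercasedASCII(_ source: [UInt8]) throws -> String {
    try String(unsafeUninitializedCapacity: source.count) { buffer in
        for (i, byte) in source.enumerated() {
            // Failure is reported by throwing, not by returning a count of 0.
            guard byte < 0x80 else { throw InvalidInput() }
            let upper = (0x61...0x7A).contains(byte) ? byte - 0x20 : byte
            (buffer.baseAddress! + i).initialize(to: upper)
        }
        // An empty input legitimately yields 0 here and produces "".
        return source.count
    }
}
```

With this shape, `try uppercasedASCII([])` succeeds with an empty String, while a non-ASCII byte surfaces as a thrown error, so the two cases are not conflated.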

We should drop the documentation comments about setting initializedCount to 0 on failure (again, throwing is the way to indicate a failure). We should certainly drop the part about doing so if the capacity is not sufficient: it implies the API might somehow give you less capacity than you asked for (i.e. an allocation failure, which should be a fatalError), and @David_Smith literally just said it was a "key design constraint" that you can intentionally ask for less and presumably append the rest later.

The most contentious issue is this:

People who are using an 'unsafe' initialiser are probably doing so because performance is critical. Validating the data is O(N), so I think it would be useful to provide a non-validating version as well.
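To make the cost concrete, this is what validation looks like through an existing public initialiser. `String(decoding:as:)` has to inspect every byte, and it repairs invalid sequences by substituting U+FFFD. This is only an illustration of the validating behaviour, not of the pitched API.

```swift
// "a", then a byte that can never appear in valid UTF-8, then "b".
let bytes: [UInt8] = [0x61, 0xFF, 0x62]

// Decoding scans the whole input and replaces the bad byte with U+FFFD.
let repaired = String(decoding: bytes, as: UTF8.self)

print(repaired == "a\u{FFFD}b")  // true
print(repaired.utf8.count)       // 5 (U+FFFD takes three bytes in UTF-8)
```

A non-validating variant would skip that scan and trust the caller to have written well-formed UTF-8.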

Not important, but out of curiosity: do we know what happens if a String somehow manages to contain invalid UTF8? Is something likely to crash or fall into an infinite loop?