SE-0263: Add a String Initializer with Access to Uninitialized Storage

tkremenek · July 30, 2019, 6:14am

The review of SE-0263: Add a String Initializer with Access to Uninitialized Storage begins now and runs through August 9, 2019.

The proposal is written by @David_Smith.

Reviews are an important part of the Swift evolution process. All review feedback should be either on this forum thread or, if you would like to keep your feedback private, directly to me as the review manager via email or direct message on the forums. If you send me email, please put "SE-0263" somewhere in the subject line.

What goes into a review of a proposal?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the direction of Swift.

When reviewing a proposal, here are some questions to consider:

What is your evaluation of the proposal?
Is the problem being addressed significant enough to warrant a change to Swift?
Does this proposal fit well with the feel and direction of Swift?
If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

Thank you for contributing to Swift!

Ted Kremenek
Review Manager

griotspeak · July 30, 2019, 7:01am

What is your evaluation of the proposal?
+1
Is the problem being addressed significant enough to warrant a change to Swift?
Yes
Does this proposal fit well with the feel and direction of Swift?
Yes
How much effort did you put into your review? A glance, a quick reading, or an in-depth study?
I read through the proposal and have followed the Array discussion.

lukasa · July 30, 2019, 8:56am

+0.5.

I think the general desire to support this use-case is good, and SwiftNIO has a number of cases where we could benefit from it. However, I would like to ask the proposal to flesh out a number of questions, and also to propose a naming change.

In the case of the questions, the proposal doesn't state how this initializer interacts with the internal String representations. Specifically, if the capacity requested is smaller than the small-string size, will the vended pointer be an interior one? I don't think we need the proposal to affirmatively answer "yes" to that question, but if the answer is definitely "no" then I think it's important to call that out as it may affect some important use-cases (e.g. NIO would probably prefer to use a different initializer when we could hit the small-string case).

On the naming front, I think it's probably a good idea to include the word "unsafe" somewhere in this method name or in the labels. It's pretty easy to accidentally veer into memory unsafety here, and having the word "unsafe" makes it easier for reviewers and automated tools to spot that the code around this method needs to be carefully audited. I appreciate that the word "uninitialised" is present, but that is not quite as emphatic, and is also not associated with the Swift convention for unsafety strongly enough.

I think so.

If we can sort out the naming, yes, I believe it does.

Thorough read, plus at least one synthesis of this constructor in a NIO project in the hopes that it would land.

jawbroken · July 30, 2019, 9:41am

The similar function for Array uses “unsafeUninitializedCapacity” as the argument label, so this proposal should probably match that or explain why the label should be different.

Karl · July 30, 2019, 11:43am

+0.5

I think this is an important thing to have. Currently, simple tasks like reading a text file in to a String require intermediate allocations and copying because of the type system. This gives us a safe and elegant way to avoid those things.

However, I do have some issues:

Proposal is wrong

The new initializer takes a closure that operates on an UnsafeMutableBufferPointer and an inout count of initialized elements

No it doesn't. In fact, the inout count is specifically called-out as a design that was rejected.
Backwards deployment

Use of the new initializer is gated by @availability though, so there's no back-deployment concern.

availability-gated APIs in the standard library are really annoying; I would prefer that we tried to backwards-deploy this in a shim library, as no runtime changes are required. In the worst case, it could do the "dumb" thing and allocate an intermediate buffer and use the existing initialiser to copy that result to a String.
Empty Strings

The documentation for the method says that if the closure fails to initialise the String, it should return 0 and this will result in an empty String. I'm not sure I like that; it seems kind-of hacky, like we're assuming that the inputs are always validated to protect against being empty. I get the feeling we're only doing it this way because we're reluctant to define specific errors in the standard library.
Is consistency with Array a consideration or not?

The "alternatives considered" section seems to sometimes believe consistency with Array is not valuable, and at other times mentions that it would be better to be consistend with Array. It reads very weirdly.

Sure.

Yes. We already added a similar thing to Array and have more performant String initialisers planned (e.g. shared Strings).

I don't believe I have used another language with this feature.

Participated in the pitch thread and kinda-related "shared String" threads. I very much hope shared Strings also come soon, in a backwards-deployable way!

Karl · July 30, 2019, 11:49am

I think this is why:

This initialiser is not unsafe; I don't think it would be appropriate to name it as if it was.

jawbroken · July 30, 2019, 11:58am

It's not unsafe in that particular way but it gives you access to an UnsafeMutableBufferPointer which isn't bounds checked in release mode, etc. Unless I'm missing something, it seems relatively easy to use this API in a way that isn't memory safe.

Karl · July 30, 2019, 12:26pm

I'm not sure that's sufficient to stick the word "unsafe" in the API's name (or maybe it's just redundant). Otherwise, we would have done it for withContiguousStorageIfAvailable, too.

jawbroken · July 30, 2019, 12:41pm

Perhaps, but on the other hand there is withUnsafeBytes(_:) on Array and withUnsafeBytes(of:_:) and the aforementioned init(unsafeUninitializedCapacity:initializingWith:) on Array, so the precedent/logic isn't that clear here.

Edit: I'm not sure withContiguousStorageIfAvailable was that great a name, really. When looking into the history here, this API was proposed and reviewed with the name withUnsafeBufferPointerIfAvailable. The name change seems to have been made at the last minute after the end of the review for reasons I couldn't find and don't appear to have been mentioned in the acceptance rationale. Strange.

lukasa · July 30, 2019, 1:35pm

I'm in agreement. While I appreciate that Swift is not interested in following Rust down the rabbit hole of explicit unsafe annotations, it is nonetheless very helpful to have methods that call out to both users and subsequent code readers that there is danger in this section.

wadetregaskis · July 30, 2019, 3:52pm

This is asking a lot I suspect, but is someone willing to explain to the uninitiated such as myself why the compiler / optimiser can't (or at least doesn't, today) optimise out the intermediaries automatically?

(I have no practical objection to this proposal beyond the few small quibbles others have already raised, but I'm always curious why standard libraries have to handle these kinds of optimisation problems, instead of the compiler)

Jean-Daniel · July 30, 2019, 5:42pm

If I understand correctly, this is not really a compiler issue. The problem is that the code that create and populate the intermediary and the String initialiser can be in different modules. As they are not compiled at the same time, there is no way to optimise that.

One clean way to solve this issue would be to have move semantic (but we are not there yet). With move semantic, you would be able to create a String initialiser that take ownership of a UnsafeMutableBufferPointer parameter, avoiding the copy.

David_Smith · July 30, 2019, 5:44pm

Yup! I borrowed @Michael_Ilseman's clever "withUnsafeBytes(of: &aTupleOfUint64s)" trick for this.

I originally had unsafe here but was convinced to take it out. The fact that the argument says it's unsafe in its type name seems sufficient.

lukasa · July 30, 2019, 5:45pm

I'm a bit unsure about this. Type inference means that it's not uncommon that reviewers won't notice that this is an unsafe type here.

David_Smith · July 30, 2019, 5:49pm

Thanks for catching this. I changed the design from "consistency at the cost of ergonomics" to the current design for the reasons noted, but missed this mention of it.

I'll talk to the compiler folks and see what my options are here. One known limitation is that there's no way to mark something as inlinable in one version and not in older versions, but possibly I can work around that.

It's more that we can't currently get fast enough codegen for using errors to handle this, specifically in the fully inlined SmallString case. You could argue that we should wait for the compiler to catch up before we do this, but we need to define what happens if you return 0 anyway, so I don't think it's hurting anything.

David_Smith · July 30, 2019, 5:54pm

This is always an interesting thing to consider. Practically speaking, I'm interested in delivering a faster Swift now rather than in some nebulous future, but API is forever so some additional justification is reasonable here:

The tradeoff between explicit performance APIs like this and improving the optimizer is about convenience vs predictability. Ideally once the optimizer is smarter it will be able to do this in many cases (not being a compiler engineer, I can only guess at how likely that is to happen), but in specific known-to-be-hot code paths, it's often valuable to be able to guarantee that the optimization happens. I'm willing to go look at the generated assembly code to make sure the compiler did what I want, but I suspect I'm in the minority here ;)

Jean-Daniel · July 30, 2019, 5:59pm

And event with a very good optimiser, I guess it would be very hard to optimize the SmallString case anyway.

Michael_Ilseman · July 30, 2019, 6:09pm

What is your evaluation of the proposal?

+1, this aligns precisely with long-term String goals as mentioned in:

Is the problem being addressed significant enough to warrant a change to Swift?

Yes

Does this proposal fit well with the feel and direction of Swift?

Yes

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

N/A

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

In-depth study.

Michael_Ilseman · July 30, 2019, 6:27pm

Small string size and formation is not formally guaranteed in documentation yet, though we could add it. There are some implications on bridging small strings out to Objective-C that couldn't fit in a tagged pointer NSString representation, e.g. one might prefer the large form for better bridging even if it could be small. Refining this concept is future work and seems outside the scope of this proposal.

FWIW, the implementation will form small strings as aggressively as it can.

There's also a lot of nuance regarding what precisely the "unsafety" of Unsafe[Mutable]BufferPointer is. The lifetime and bounds of the UBMP is guaranteed by the API and programming against it generically as a MutableCollection interface is "safe", albeit without runtime bounds checks in release builds. Of course, the UBMP-specific unsafe APIs such as deallocate() or doing arbitrary operations with baseAddress would still be unsafe.

lukasa · July 30, 2019, 6:28pm

The absence of bounds checking is unsafe, sort of by definition. (Where “unsafe” is in the typical Swift meaning of “not memory safe”)