SE-0263: Add a String Initializer with Access to Uninitialized Storage

Michael_Ilseman · July 30, 2019, 6:51pm

One main motivation for unsafeUninitializedCapacity was to call out Arrays of non-trivial types which require explicit initialization. This is non-obvious, e.g. users shouldn't use the subscript setter to initialize the elements: it will try to de-initialize uninitialized memory and result in UB. As @Karl mentioned, UTF-8 code units are trivial, so this is not a concern.

It can't be implemented entirely on the Swift 5.0 ABI. If this is made @inlinable, it would further require the symbol for the cold path. In general, wouldn't the intermediate buffer approach defeat the point of this API in the non-small case?

If this is critical, we can consider doing something for the small case.

Alternative formulations could be a return Int? or throws on the closure. The latter has some significant downsides, IIRC. This decision should be justified in the "Alternatives Considered" section.

@David_Smith, can you add justifications for "uninitializedCapacity", availability / inlinability, and zero convention on the closure return value?

David_Smith · July 30, 2019, 8:30pm

Yup, I'll do that alongside fixing the mistake caught earlier in the thread

beccadax · July 30, 2019, 9:26pm

Two quick comments:

I think it’d be better to have “UTF8” in the capacity argument label, not the closure argument label. It’s critical that people know the capacity is UTF8. People will use this with trailing closures, and when they do, that won’t be written anywhere.
I think it’d be better to use an inout count parameter like the similar Array initializer. The proposal correctly notes that this isn’t necessary for correctness as it is for Array, but I didn’t see any specific reasons it would be good to use a return value.

Otherwise, this looks like a very nice addition to the language.

Karl · July 31, 2019, 12:37pm

Inlining is tricky, yes. Wouldn't something like this be possible?

// Regular, inlineable version for platforms which support it.
@available(iOS 13, ..., *) @inlineable
func foo(...)

// Non-inlineable fallback version.
@deprecated(iOS 13, ..., *)
func foo(...)

So if your deployment target is high enough or you are availability-gated to a Darwin platform with a new-enough Swift, you will use the inlineable version. If your deployment target or availability guards are lower, you will silently use the fallback.

And yes, this wouldn't give the same performance as the version which uses the String's backing storage directly, but it would mean that you don't need as much branching and fiddling when creating your strings. For example, reading a text file:

if @available(iOS 13, ..., *) {
  return String(uninitializedCapacity: 1024) { buf in
    let result = read(fd, buf, buf.count)
    guard result < 0 else { return result }
    // handle errors.
  }
} else {
  let buf = UnsafeMutableRawPointer.allocate(1024)
  defer { buf.deallocate() }
  let result = read(fd, buf, buf.count)
  guard result < 0 else { return String(decoding: buf, as: UTF8.self) }
  // handle errors.
}

It would be very nice for users of this initialiser if we could eliminate the need to write the fallback by shipping it in a shim library.

EDIT: The shim version could also be smarter than this. For example, if the requested capacity is large, we could use the buffer as shared-String storage (which is in the ABI).

Michael_Ilseman · August 7, 2019, 9:01pm

String(utf8Capacity: n, initializingWith: myFunc) reads better to me too, especially in the trailing closure formulation of String(utf8Capacity: n) { ... }. I don't think that "uninitialized" is important here for a trivial storage type.

Why? My understanding from the Array initializer thread is that a return value would be preferred, but proved infeasible due to non-trivial element types. A returned count is easier to reason about correctly in the presence of early returns, etc., than a stored count (which may require remembering to use defer).

Overall, I don't think rote mirroring of Array's initializer is a goal here as there are differences:

Storage representation (UInt8s in contiguous memory) is different than the element type (Character). Similarly, the resulting String's .count does not necessarily match the storage-count set/returned by the closure.
UTF-8 error correction is performed afterwards, meaning that even the storage-count afterwards does not necessarily match the storage-count set/returned by the closure. Similarly, in uncommon cases, the storage may need to grow to ensure encoding validity.
While not formally guaranteed in the documentation (though it could be modified to do so), capacities <= the max small-string representation capacity will not incur any allocation at all (assuming error-correcting does not exceed it either). Array has no general analogy to this, other than perhaps the singleton empty Array.

David_Smith · August 7, 2019, 9:59pm

I was about to object to this on the grounds that it makes the precondition for initialize(from:to:) unclear, but I see that specifically excludes trivial types. I'm still a little uncertain simply because if you're autocompleting this while previously unaware of it, it may not be clear from the name what need it's intended to fill. My rule of thumb for "niche" things like this is that they shouldn't accidentally look appealing if you aren't looking for that functionality.

I don't have a strong objection to renaming it like this though.

Michael_Ilseman · August 7, 2019, 11:08pm

Could you elaborate a bit more here? IIUC, one could say something like

var str = String(utf8Capacity: 100) { return 0 }

To get an empty string with capacity for 100 UTF-8 code units for subsequent appends (similarly to a call to reserveCapacity()). Although slightly strange, would this be harmful or undesirable?

Karl · August 14, 2019, 12:22pm

@tkremenek So what's the status of this review?

Also, I just want to point out that when ABI stability was announced, we were told that Apple would make a "best effort" to backwards deploy things which could be shimmed. This seems like something that could be (at least at the API level, even if performance won't be optimal). Is the infrastructure to do this just not available yet?

tkremenek · August 14, 2019, 2:23pm

The Core Team didn't meet last week but plans on discussing the outcome of this review shortly.

Karl · August 26, 2019, 2:42pm

@tkremenek Ping. I see that the demangling proposal reached a conclusion. Did the core team manage to discuss this one, too?

Thanks.

tkremenek · August 27, 2019, 5:54am

We did discuss this. I handed the review manager responsibility off to @Ben_Cohen as I found myself not in the position to properly handle the review, and he is going to do a write up. We were also going to respond with some perspective on discussion related to backward deployment, and what are the guidance we're going to provide regarding discussion/concerns about backward deployment in proposals going forward. Thanks for your patience!

tkremenek · September 6, 2019, 6:07am

Returned with the intent to accept, with request for minor revision

The proposal has been accepted, with the request that the proposal author revises the API name to include "unsafe". This was brought up during the review discussion, and the Core Team agreed this was an important aspect to include in the API's name.

The rationale is that we have a bit of a model where we use the term "unsafe" whenever an API is introducing an unsafe buffer or pointer into your code. This identifies a transition into a region of "unsafety". Once you have a value of an "unsafe" type, you don't need to keep calling the APIs that work on those values "unsafe". This is something worth formalizing more in the API guidelines.

The API author (who is currently on vacation) should propose a new name for the API, and we will extend the review for a few days to review it with the community.

Discussion on Backwards Deployment

The review discussion also brought up the topic of backward deployment, which spurned a broader meta discussion within the Core Team on how evolution proposals and reviews should approach the topic of backward deployment — a topic that has come up before.

Backward deployment is interesting because it is obviously an important concern for Swift developers on platforms where backward deployment is a meaningful concept (such as on macOS). However, even today not all platforms where Swift is supported share the same concerns about backward deployment of APIs in the Swift runtime or Standard Library. Further, in the fullness of time as Swift is supported on more platforms the exact policy of how additions to the Standard Library will be "backward deployed" may be particular to the platform vendor or in the way Swift is distributed on a given platform.

As a point of establishing precedent, the Core team feels that a reasonable policy that strikes a balance is that backward deployment decisions are not deciding factors in evolution discussions. However, the Core team feels it is fine — and valuable — for review discussions to talk about backward deployment and potentially factor those concerns into the proposal. Backward deployment alone, however, will not be a criteria for whether or not to accept a proposal. Ultimately, how much backward deployment of an API or runtime capability is supported is a combination of (a) what is technically possible as well as (b) up to whomever is distributing Swift on a particular platform is able to provide. Of course, some of the mechanisms that help facilitate backward deployment — such as inlining of an API — can also impact the invariants and semantics of a proposed API, and should be reviewed as part of an API proposal.

Remark on tardiness of review process for this review

I'd like to extend my appreciation to the community for the patience they've shown with the tardiness of myself as the review manager on the efficiency of how this review was run, discussed within the Core team, and circled back with the community. There are opportunities to improve the efficiency and experience of the review process, and the Core team is actively looking at ways to improve the review process in a variety of ways that improves turnaround time from the Core team and increases transparency in the review process.

tkremenek · October 9, 2019, 2:04pm

Phase #2 of the review has resumed on a new thread: