SE-0263: Add a String Initializer with Access to Uninitialized Storage

I'm a bit unsure about this. Type inference means that it's not uncommon that reviewers won't notice that this is an unsafe type here.

2 Likes

Thanks for catching this. I changed the design from "consistency at the cost of ergonomics" to the current design for the reasons noted, but missed this mention of it.

I'll talk to the compiler folks and see what my options are here. One known limitation is that there's no way to mark something as inlinable in one version and not in older versions, but possibly I can work around that.

It's more that we can't currently get fast enough codegen for using errors to handle this, specifically in the fully inlined SmallString case. You could argue that we should wait for the compiler to catch up before we do this, but we need to define what happens if you return 0 anyway, so I don't think it's hurting anything.

1 Like

This is always an interesting thing to consider. Practically speaking, I'm interested in delivering a faster Swift now rather than in some nebulous future, but API is forever so some additional justification is reasonable here:

The tradeoff between explicit performance APIs like this and improving the optimizer is about convenience vs predictability. Ideally once the optimizer is smarter it will be able to do this in many cases (not being a compiler engineer, I can only guess at how likely that is to happen), but in specific known-to-be-hot code paths, it's often valuable to be able to guarantee that the optimization happens. I'm willing to go look at the generated assembly code to make sure the compiler did what I want, but I suspect I'm in the minority here ;)

1 Like

And event with a very good optimiser, I guess it would be very hard to optimize the SmallString case anyway.

  • What is your evaluation of the proposal?

+1, this aligns precisely with long-term String goals as mentioned in:

  • Is the problem being addressed significant enough to warrant a change to Swift?

Yes

  • Does this proposal fit well with the feel and direction of Swift?

Yes

  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

N/A

  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

In-depth study.

1 Like

Small string size and formation is not formally guaranteed in documentation yet, though we could add it. There are some implications on bridging small strings out to Objective-C that couldn't fit in a tagged pointer NSString representation, e.g. one might prefer the large form for better bridging even if it could be small. Refining this concept is future work and seems outside the scope of this proposal.

FWIW, the implementation will form small strings as aggressively as it can.

There's also a lot of nuance regarding what precisely the "unsafety" of Unsafe[Mutable]BufferPointer is. The lifetime and bounds of the UBMP is guaranteed by the API and programming against it generically as a MutableCollection interface is "safe", albeit without runtime bounds checks in release builds. Of course, the UBMP-specific unsafe APIs such as deallocate() or doing arbitrary operations with baseAddress would still be unsafe.

1 Like

The absence of bounds checking is unsafe, sort of by definition. (Where “unsafe” is in the typical Swift meaning of “not memory safe”)

2 Likes

One main motivation for unsafeUninitializedCapacity was to call out Arrays of non-trivial types which require explicit initialization. This is non-obvious, e.g. users shouldn't use the subscript setter to initialize the elements: it will try to de-initialize uninitialized memory and result in UB. As @Karl mentioned, UTF-8 code units are trivial, so this is not a concern.

It can't be implemented entirely on the Swift 5.0 ABI. If this is made @inlinable, it would further require the symbol for the cold path. In general, wouldn't the intermediate buffer approach defeat the point of this API in the non-small case?

If this is critical, we can consider doing something for the small case.

Alternative formulations could be a return Int? or throws on the closure. The latter has some significant downsides, IIRC. This decision should be justified in the "Alternatives Considered" section.

@David_Smith, can you add justifications for "uninitializedCapacity", availability / inlinability, and zero convention on the closure return value?

3 Likes

Yup, I'll do that alongside fixing the mistake caught earlier in the thread :slight_smile:

Two quick comments:

  1. I think it’d be better to have “UTF8” in the capacity argument label, not the closure argument label. It’s critical that people know the capacity is UTF8. People will use this with trailing closures, and when they do, that won’t be written anywhere.

  2. I think it’d be better to use an inout count parameter like the similar Array initializer. The proposal correctly notes that this isn’t necessary for correctness as it is for Array, but I didn’t see any specific reasons it would be good to use a return value.

Otherwise, this looks like a very nice addition to the language.

5 Likes

Inlining is tricky, yes. Wouldn't something like this be possible?

// Regular, inlineable version for platforms which support it.
@available(iOS 13, ..., *) @inlineable
func foo(...)

// Non-inlineable fallback version.
@deprecated(iOS 13, ..., *)
func foo(...)

So if your deployment target is high enough or you are availability-gated to a Darwin platform with a new-enough Swift, you will use the inlineable version. If your deployment target or availability guards are lower, you will silently use the fallback.

And yes, this wouldn't give the same performance as the version which uses the String's backing storage directly, but it would mean that you don't need as much branching and fiddling when creating your strings. For example, reading a text file:

if @available(iOS 13, ..., *) {
  return String(uninitializedCapacity: 1024) { buf in
    let result = read(fd, buf, buf.count)
    guard result < 0 else { return result }
    // handle errors.
  }
} else {
  let buf = UnsafeMutableRawPointer.allocate(1024)
  defer { buf.deallocate() }
  let result = read(fd, buf, buf.count)
  guard result < 0 else { return String(decoding: buf, as: UTF8.self) }
  // handle errors.
}

It would be very nice for users of this initialiser if we could eliminate the need to write the fallback by shipping it in a shim library.

EDIT: The shim version could also be smarter than this. For example, if the requested capacity is large, we could use the buffer as shared-String storage (which is in the ABI).

String(utf8Capacity: n, initializingWith: myFunc) reads better to me too, especially in the trailing closure formulation of String(utf8Capacity: n) { ... }. I don't think that "uninitialized" is important here for a trivial storage type.

Why? My understanding from the Array initializer thread is that a return value would be preferred, but proved infeasible due to non-trivial element types. A returned count is easier to reason about correctly in the presence of early returns, etc., than a stored count (which may require remembering to use defer).

Overall, I don't think rote mirroring of Array's initializer is a goal here as there are differences:

  1. Storage representation (UInt8s in contiguous memory) is different than the element type (Character). Similarly, the resulting String's .count does not necessarily match the storage-count set/returned by the closure.
  2. UTF-8 error correction is performed afterwards, meaning that even the storage-count afterwards does not necessarily match the storage-count set/returned by the closure. Similarly, in uncommon cases, the storage may need to grow to ensure encoding validity.
  3. While not formally guaranteed in the documentation (though it could be modified to do so), capacities <= the max small-string representation capacity will not incur any allocation at all (assuming error-correcting does not exceed it either). Array has no general analogy to this, other than perhaps the singleton empty Array.
1 Like

I was about to object to this on the grounds that it makes the precondition for initialize(from:to:) unclear, but I see that specifically excludes trivial types. I'm still a little uncertain simply because if you're autocompleting this while previously unaware of it, it may not be clear from the name what need it's intended to fill. My rule of thumb for "niche" things like this is that they shouldn't accidentally look appealing if you aren't looking for that functionality.

I don't have a strong objection to renaming it like this though.

1 Like

Could you elaborate a bit more here? IIUC, one could say something like

var str = String(utf8Capacity: 100) { return 0 }

To get an empty string with capacity for 100 UTF-8 code units for subsequent appends (similarly to a call to reserveCapacity()). Although slightly strange, would this be harmful or undesirable?

1 Like

@tkremenek So what's the status of this review?

Also, I just want to point out that when ABI stability was announced, we were told that Apple would make a "best effort" to backwards deploy things which could be shimmed. This seems like something that could be (at least at the API level, even if performance won't be optimal). Is the infrastructure to do this just not available yet?

The Core Team didn't meet last week but plans on discussing the outcome of this review shortly.

@tkremenek Ping. I see that the demangling proposal reached a conclusion. Did the core team manage to discuss this one, too?

Thanks.

We did discuss this. I handed the review manager responsibility off to @Ben_Cohen as I found myself not in the position to properly handle the review, and he is going to do a write up. We were also going to respond with some perspective on discussion related to backward deployment, and what are the guidance we're going to provide regarding discussion/concerns about backward deployment in proposals going forward. Thanks for your patience!

5 Likes

Returned with the intent to accept, with request for minor revision

The proposal has been accepted, with the request that the proposal author revises the API name to include "unsafe". This was brought up during the review discussion, and the Core Team agreed this was an important aspect to include in the API's name.

The rationale is that we have a bit of a model where we use the term "unsafe" whenever an API is introducing an unsafe buffer or pointer into your code. This identifies a transition into a region of "unsafety". Once you have a value of an "unsafe" type, you don't need to keep calling the APIs that work on those values "unsafe". This is something worth formalizing more in the API guidelines.

The API author (who is currently on vacation) should propose a new name for the API, and we will extend the review for a few days to review it with the community.

Discussion on Backwards Deployment

The review discussion also brought up the topic of backward deployment, which spurned a broader meta discussion within the Core Team on how evolution proposals and reviews should approach the topic of backward deployment — a topic that has come up before.

Backward deployment is interesting because it is obviously an important concern for Swift developers on platforms where backward deployment is a meaningful concept (such as on macOS). However, even today not all platforms where Swift is supported share the same concerns about backward deployment of APIs in the Swift runtime or Standard Library. Further, in the fullness of time as Swift is supported on more platforms the exact policy of how additions to the Standard Library will be "backward deployed" may be particular to the platform vendor or in the way Swift is distributed on a given platform.

As a point of establishing precedent, the Core team feels that a reasonable policy that strikes a balance is that backward deployment decisions are not deciding factors in evolution discussions. However, the Core team feels it is fine — and valuable — for review discussions to talk about backward deployment and potentially factor those concerns into the proposal. Backward deployment alone, however, will not be a criteria for whether or not to accept a proposal. Ultimately, how much backward deployment of an API or runtime capability is supported is a combination of (a) what is technically possible as well as (b) up to whomever is distributing Swift on a particular platform is able to provide. Of course, some of the mechanisms that help facilitate backward deployment — such as inlining of an API — can also impact the invariants and semantics of a proposed API, and should be reviewed as part of an API proposal.

Remark on tardiness of review process for this review

I'd like to extend my appreciation to the community for the patience they've shown with the tardiness of myself as the review manager on the efficiency of how this review was run, discussed within the Core team, and circled back with the community. There are opportunities to improve the efficiency and experience of the review process, and the Core team is actively looking at ways to improve the review process in a variety of ways that improves turnaround time from the Core team and increases transparency in the review process.

16 Likes

Phase #2 of the review has resumed on a new thread:

Terms of Service

Privacy Policy

Cookie Policy