SE-0263: Add a String Initializer with Access to Uninitialized Storage

One main motivation for unsafeUninitializedCapacity was to call out Arrays of non-trivial types which require explicit initialization. This is non-obvious, e.g. users shouldn't use the subscript setter to initialize the elements: it will try to de-initialize uninitialized memory and result in UB. As @Karl mentioned, UTF-8 code units are trivial, so this is not a concern.
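To make the trivial vs. non-trivial distinction concrete, here is a minimal sketch using the existing Array initializer from SE-0245. The variable names are made up for illustration. For `String` elements each slot has to be initialized explicitly, while for `UInt8` a plain subscript assignment into uninitialized storage is harmless.

```swift
// Non-trivial element type: every slot must be initialized explicitly.
let labels = [String](unsafeUninitializedCapacity: 3) { buffer, initializedCount in
    for i in 0..<3 {
        // Wrong here: `buffer[i] = ...` would first destroy the garbage
        // "old value" sitting in the uninitialized slot.
        // Right: initialize(to:) writes without reading previous contents.
        (buffer.baseAddress! + i).initialize(to: "item \(i)")
        initializedCount = i + 1
    }
}

// Trivial element type: there is nothing to destroy, so assignment is fine.
let bytes = [UInt8](unsafeUninitializedCapacity: 3) { buffer, initializedCount in
    for i in 0..<3 { buffer[i] = UInt8(ascii: "a") + UInt8(i) }
    initializedCount = 3
}
```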

It can't be implemented entirely on the Swift 5.0 ABI. If this is made @inlinable, it would further require the symbol for the cold path. In general, wouldn't the intermediate buffer approach defeat the point of this API in the non-small case?

If this is critical, we can consider doing something for the small case.

Alternative formulations could have the closure return Int? or be marked throws. The latter has some significant downsides, IIRC. This decision should be justified in the "Alternatives Considered" section.

@David_Smith, can you add justifications for "uninitializedCapacity", availability / inlinability, and zero convention on the closure return value?


Yup, I'll do that alongside fixing the mistake caught earlier in the thread :)

Two quick comments:

  1. I think it’d be better to have “UTF8” in the capacity argument label, not the closure argument label. It’s critical that people know the capacity is UTF8. People will use this with trailing closures, and when they do, that won’t be written anywhere.

  2. I think it’d be better to use an inout count parameter like the similar Array initializer. The proposal correctly notes that this isn’t necessary for correctness as it is for Array, but I didn’t see any specific reasons it would be good to use a return value.

Otherwise, this looks like a very nice addition to the language.


Inlining is tricky, yes. Wouldn't something like this be possible?

// Regular, inlinable version for platforms which support it.
@available(iOS 13, ..., *) @inlinable
func foo(...)

// Non-inlinable fallback version.
@deprecated(iOS 13, ..., *)
func foo(...)

So if your deployment target is high enough or you are availability-gated to a Darwin platform with a new-enough Swift, you will use the inlinable version. If your deployment target or availability guards are lower, you will silently use the fallback.

And yes, this wouldn't give the same performance as the version which uses the String's backing storage directly, but it would mean that you don't need as much branching and fiddling when creating your strings. For example, reading a text file:

if #available(iOS 13, ..., *) {
  return String(uninitializedCapacity: 1024) { buf in
    // Read directly into the String's storage.
    let result = read(fd, buf.baseAddress, buf.count)
    guard result < 0 else { return result }
    // handle errors.
  }
} else {
  // Fallback: read into a temporary buffer, then copy into a String.
  let buf = UnsafeMutableBufferPointer<UInt8>.allocate(capacity: 1024)
  defer { buf.deallocate() }
  let result = read(fd, buf.baseAddress, buf.count)
  guard result < 0 else { return String(decoding: buf[..<result], as: UTF8.self) }
  // handle errors.
}

It would be very nice for users of this initialiser if we could eliminate the need to write the fallback by shipping it in a shim library.

EDIT: The shim version could also be smarter than this. For example, if the requested capacity is large, we could use the buffer as shared-String storage (which is in the ABI).
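For what it's worth, a minimal sketch of the simple intermediate-buffer shim could look like the following. The `shimUTF8Capacity` label is hypothetical and not part of the proposal. It assumes the closure returns the number of initialized code units, and it always pays for a temporary allocation plus a copy.

```swift
extension String {
    /// Hypothetical fallback, not part of the proposal or the standard library.
    /// Fills a temporary buffer, then copies (and repairs) it into a String.
    init(shimUTF8Capacity capacity: Int,
         initializingWith initializer: (UnsafeMutableBufferPointer<UInt8>) throws -> Int) rethrows {
        let buffer = UnsafeMutableBufferPointer<UInt8>.allocate(capacity: capacity)
        defer { buffer.deallocate() }
        let count = try initializer(buffer)
        precondition(count >= 0 && count <= capacity, "initialized count out of range")
        // Only the initialized prefix is decoded.
        self = String(decoding: buffer[..<count], as: UTF8.self)
    }
}

let greeting = String(shimUTF8Capacity: 8) { buf in
    // UInt8 is trivial, so subscript assignment into raw storage is fine.
    for (i, byte) in "hello".utf8.enumerated() { buf[i] = byte }
    return 5
}
```

The call site is the same shape as the proposed initializer, so user code would not need its own availability branch. Only the performance differs.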

String(utf8Capacity: n, initializingWith: myFunc) reads better to me too, especially in the trailing closure formulation of String(utf8Capacity: n) { ... }. I don't think that "uninitialized" is important here for a trivial storage type.

Why? My understanding from the Array initializer thread is that a return value would be preferred, but proved infeasible due to non-trivial element types. A returned count is easier to reason about correctly in the presence of early returns, etc., than a stored count (which may require remembering to use defer).
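A rough illustration of the early-return point, using two made-up helper functions (neither is proposed API). Both copy ASCII bytes into a buffer and stop at the first non-ASCII byte. The returning version states its count at each exit, while the inout version relies on `defer` to store the count.

```swift
// Return-value style: each exit path reports its own count.
func copyASCIIReturningCount(_ source: [UInt8],
                             into buffer: UnsafeMutableBufferPointer<UInt8>) -> Int {
    var written = 0
    for byte in source {
        // Early exit on a non-ASCII byte or a full buffer.
        if byte >= 0x80 || written == buffer.count { return written }
        buffer[written] = byte
        written += 1
    }
    return written
}

// inout style: the count must be stored before every exit.
// Forgetting the `defer` would silently report 0 on the early-return path.
func copyASCIIStoringCount(_ source: [UInt8],
                           into buffer: UnsafeMutableBufferPointer<UInt8>,
                           count: inout Int) {
    var written = 0
    defer { count = written }
    for byte in source {
        if byte >= 0x80 || written == buffer.count { return }
        buffer[written] = byte
        written += 1
    }
}

let input: [UInt8] = [0x68, 0x69, 0xFF, 0x21]  // "hi", an invalid byte, "!"
let scratch = UnsafeMutableBufferPointer<UInt8>.allocate(capacity: 8)

let returned = copyASCIIReturningCount(input, into: scratch)
var stored = 0
copyASCIIStoringCount(input, into: scratch, count: &stored)
let copiedPrefix = String(decoding: scratch[..<returned], as: UTF8.self)
scratch.deallocate()
```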

Overall, I don't think rote mirroring of Array's initializer is a goal here as there are differences:

  1. Storage representation (UInt8s in contiguous memory) is different than the element type (Character). Similarly, the resulting String's .count does not necessarily match the storage-count set/returned by the closure.
  2. UTF-8 error correction is performed afterwards, meaning that even the storage-count afterwards does not necessarily match the storage-count set/returned by the closure. Similarly, in uncommon cases, the storage may need to grow to ensure encoding validity.
  3. While not formally guaranteed in the documentation (though it could be modified to do so), capacities <= the max small-string representation capacity will not incur any allocation at all (assuming error-correcting does not exceed it either). Array has no general analogy to this, other than perhaps the singleton empty Array.
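The first two points can be seen today with `String(decoding:as:)`, which repairs invalid UTF-8 with U+FFFD. This sketch assumes the proposed initializer repairs input the same way, and the constant names are only for illustration.

```swift
// Five UTF-8 code units, four Characters: storage count != String.count.
let cafe = String(decoding: [0x63, 0x61, 0x66, 0xC3, 0xA9] as [UInt8], as: UTF8.self)
let cafeCodeUnits = cafe.utf8.count   // 5
let cafeCharacters = cafe.count       // 4

// Three input bytes, one of them invalid. Repair replaces 0xFF with
// U+FFFD, which takes three code units, so the storage grows to five.
let repaired = String(decoding: [0x61, 0xFF, 0x62] as [UInt8], as: UTF8.self)
let repairedCodeUnits = repaired.utf8.count   // 5
```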

I was about to object to this on the grounds that it makes the precondition for initialize(from:to:) unclear, but I see that specifically excludes trivial types. I'm still a little uncertain simply because if you're autocompleting this while previously unaware of it, it may not be clear from the name what need it's intended to fill. My rule of thumb for "niche" things like this is that they shouldn't accidentally look appealing if you aren't looking for that functionality.

I don't have a strong objection to renaming it like this though.

Could you elaborate a bit more here? IIUC, one could say something like

var str = String(utf8Capacity: 100) { _ in return 0 }

To get an empty string with capacity for 100 UTF-8 code units for subsequent appends (similarly to a call to reserveCapacity()). Although slightly strange, would this be harmful or undesirable?
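For comparison, the existing way to get the same effect is `reserveCapacity(_:)`. A small sketch, with an arbitrary variable name:

```swift
// Empty string with room reserved for at least 100 UTF-8 code units.
var reserved = String()
reserved.reserveCapacity(100)

// Later appends can reuse the reserved storage.
reserved.append("hello")
reserved.append(", world")
```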


@tkremenek So what's the status of this review?

Also, I just want to point out that when ABI stability was announced, we were told that Apple would make a "best effort" to backwards deploy things which could be shimmed. This seems like something that could be (at least at the API level, even if performance won't be optimal). Is the infrastructure to do this just not available yet?

The Core Team didn't meet last week but plans on discussing the outcome of this review shortly.
