Hi folks!
I've been working on improving the performance of NSString-to-String bridging, and @Michael_Ilseman pointed out that the more we can build it out of public API rather than adding a bunch of private stuff for Foundation's use, the better off everyone is.
This new initializer is designed to be consistent with the recently-accepted similar initializer on Array.
Add a String Initializer with Access to Uninitialized Storage
- Proposal: SE-NNNN
- Author: David Smith
- Review Manager: TBD
- Status: Awaiting review
- Implementation: apple/swift#23409
- Bug: SR-10288
Introduction
This proposal suggests a new initializer for String that provides access to a String's uninitialized storage buffer.
Motivation
Bridging NSString to String currently requires using standard library internals to get good performance, which suggests a class of problem that the standard library is currently not well-equipped to solve: efficiently creating a String when you don't already have a contiguous buffer of bytes to initialize it with. In the NSString bridging case, the bytes come from CFStringGetBytes, which copies and transcodes the contents of the NSString.
Proposed solution
Add a new String initializer that lets a program work with an uninitialized buffer.
The new initializer takes a closure that operates on an UnsafeMutableBufferPointer and an inout count of initialized elements. This closure has access to the uninitialized contents of the newly created String's storage, and must set the initialized count of the String before exiting.
let myCocoaString = NSString("The quick brown fox jumps over the lazy dog") as CFString
var myString = String(unsafeUninitializedCapacity: CFStringGetLength(myCocoaString)) { buffer, initializedCount in
    CFStringGetBytes(
        myCocoaString,
        buffer,
        …,
        &initializedCount
    )
}
// myString == "The quick brown fox jumps over the lazy dog"
Without this initializer we would have had to heap-allocate an UnsafeMutableBufferPointer, copy the NSString contents into it, and then copy the buffer again as we initialized the String.
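To make the extra copy concrete, here is a minimal sketch of that two-copy path using only API that exists today. fillUTF8(into:) is a hypothetical stand-in for a transcoding call such as CFStringGetBytes, and the sample text and capacity are made up for illustration.

```swift
// Hypothetical stand-in for a transcoding call such as CFStringGetBytes:
// writes UTF-8 bytes into `buffer` and returns how many were written.
func fillUTF8(into buffer: UnsafeMutableBufferPointer<UInt8>) -> Int {
    let source = Array("The quick brown fox".utf8)
    precondition(buffer.count >= source.count, "capacity too small")
    let (_, end) = buffer.initialize(from: source)
    return end
}

func makeStringWithTwoCopies(capacity: Int) -> String {
    // First copy: transcode into a temporary heap allocation.
    let scratch = UnsafeMutableBufferPointer<UInt8>.allocate(capacity: capacity)
    defer { scratch.deallocate() }
    let written = fillUTF8(into: scratch)

    // Second copy: String copies the initialized bytes into its own storage.
    return String(decoding: scratch.prefix(written), as: UTF8.self)
}

let twoCopies = makeStringWithTwoCopies(capacity: 64)
print(twoCopies)
```

The proposed initializer would let the transcoding step write straight into the String's storage, so the scratch allocation and the second copy would not be needed.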
Detailed design
/// Creates a new String with the specified capacity in UTF-8 code units then
/// calls the given closure with a buffer covering the String's uninitialized
/// memory.
///
/// The closure should set `initializedCount` to the number of
/// initialized code units, or 0 if it couldn't initialize the buffer
/// (for example if the requested capacity was too small).
///
/// This initializer replaces ill-formed UTF-8 sequences with the Unicode
/// replacement character (`"\u{FFFD}"`); this may require resizing
/// the buffer beyond its original capacity.
///
/// The following examples use this initializer with the contents of two
/// different `UInt8` arrays---the first with well-formed UTF-8 code unit
/// sequences and the second with an ill-formed sequence at the end.
///
///     let validUTF8: [UInt8] = [67, 97, 102, 195, 169]
///     let valid = String(unsafeUninitializedCapacity: validUTF8.count,
///                        initializingUTF8With: { (ptr, count) in
///         _ = ptr.initialize(from: validUTF8)
///         count = validUTF8.count
///     })
///     print(valid)
///     // Prints "Café"
///
///     let invalidUTF8: [UInt8] = [67, 97, 102, 195]
///     let repaired = String(unsafeUninitializedCapacity: invalidUTF8.count,
///                           initializingUTF8With: { (ptr, count) in
///         _ = ptr.initialize(from: invalidUTF8)
///         count = invalidUTF8.count
///     })
///     print(repaired)
///     // Prints "Caf�"
///
///     let empty = String(unsafeUninitializedCapacity: invalidUTF8.count,
///                        initializingUTF8With: { (ptr, count) in
///         _ = ptr.initialize(from: invalidUTF8)
///         count = 0
///     })
///     print(empty)
///     // Prints ""
///
/// - Parameters:
///   - capacity: The number of UTF-8 code units worth of memory to allocate
///     for the String.
///   - initializer: A closure that initializes elements and sets the count of
///     the new String.
///     - Parameters:
///       - buffer: A buffer covering uninitialized memory with room for the
///         specified number of UTF-8 code units.
///       - initializedCount: Set this to the number of elements in `buffer`
///         that were actually initialized by the `initializer`.
@inlinable @inline(__always)
public init(
    unsafeUninitializedCapacity capacity: Int,
    initializingUTF8With initializer: (
        _ buffer: UnsafeMutableBufferPointer<UInt8>,
        _ initializedCount: inout Int
    ) throws -> Void
) rethrows
Specifying a capacity
The initializer takes the specific capacity that a user wants to work with as a
parameter. The buffer passed to the closure has a count that is exactly the
same as the specified capacity, even if the ultimate size of the new String
is larger.
Guarantees after throwing
Unlike Array, there are no special considerations about the state of the buffer when an error is thrown: the elements are UInt8s, which are trivial, so nothing needs to be deinitialized.
Source compatibility
This is an additive change to the standard library,
so there is no effect on source compatibility.
Effect on ABI stability
The new initializer will be part of the ABI, and will result in calls to a new @usableFromInline symbol being inlined into client code. Use of the new initializer is gated by @available though, so there's no back-deployment concern.
Effect on API resilience
The additional APIs will be a permanent part of the standard library,
and will need to remain public API.
Alternatives considered
Returning the new count from the initializer closure
This is more plausible for String than it was for Array, since there's no need to deal with deinitialization of elements in partially initialized buffers (UInt8s are trivial). However, it was considered more valuable to match Array's behavior here.
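For reference, this is roughly what the Array initializer being mirrored looks like in use: the closure receives the uninitialized buffer plus an inout count, and reports how many elements it initialized by writing to that count rather than by returning a value. The element values are arbitrary, and this is a sketch of the Array API only, not of the proposed String one.

```swift
// Array's uninitialized-storage initializer: the closure fills the buffer
// and records the number of initialized elements in `initializedCount`.
let squares = [Int](unsafeUninitializedCapacity: 4) { buffer, initializedCount in
    for i in 0..<4 {
        (buffer.baseAddress! + i).initialize(to: i * i)
    }
    initializedCount = 4
}

print(squares)
```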
Returning a Bool to indicate success from the closure
Requiring people to either throw or check in the caller for an empty String return if the initializing closure fails is slightly awkward, but again, not sufficiently so to warrant being inconsistent with Array.
Validating UTF-8 instead of repairing invalid UTF-8
Matching the behavior of most other String initializers here also makes it more ergonomic to use, since it can be non-failable this way.
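The difference between the two policies can be seen with API that already exists. This sketch assumes Foundation is available: String(decoding:as:) repairs ill-formed input, while Foundation's String(bytes:encoding:) validates and returns nil. The input is a truncated UTF-8 encoding of "Café", chosen purely for illustration.

```swift
import Foundation

// "Caf" followed by 0xC3, the first byte of a two-byte sequence that never completes.
let truncated: [UInt8] = [67, 97, 102, 195]

// Repairing: the ill-formed tail becomes U+FFFD, so the result is non-optional.
let repaired = String(decoding: truncated, as: UTF8.self)

// Validating: the result is optional and the caller has to handle nil.
let validated = String(bytes: truncated, encoding: .utf8)

print(repaired == "Caf\u{FFFD}")  // true
print(validated == nil)           // true
```

With repair, every call site gets a String back; with validation, every call site needs an unwrap or a fallback.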