No—dictionaries and sets can't have arbitrarily sized storage buffers, so they choose the next size larger than what you pass. For Array, we would want to guarantee that the array has storage for exactly the specified number of elements.
/// For performance reasons, the size of the newly allocated storage might be
/// greater than the requested capacity. Use the array's `capacity` property
/// to determine the size of the new storage.
This is a feature I've wanted for a good while now and haven't had the time to put an initial pitch together. It would be very helpful when working with C APIs.
You're right about that — the parameter name is even minimumCapacity for that method. With the current APIs that's probably fine, but if we're exposing the full capacity of the array in this way, it might be confusing to not have as precise control over the size as we do when allocating a buffer directly. These can be quite different amounts:
var a: [UInt8] = []
a.reserveCapacity(5)
// a.capacity == 16
Yeah, that's exactly what I had in mind. It's only semi-related, in that it has similar performance benefits and reasons for existing, so a separate proposal would be fine.
This is a very important API, especially when interoperating with C libraries. I thought about this quite a bit in the past, but I'm glad to see a pitch.
My approach to this was a little less direct: Why do we even need these copies today? Why can't Array/String just wrap any old UnsafeMutableBufferPointer? I think the main reason is to provide value semantics and immutability, which requires the buffer itself to have value semantics (like Foundation.Data). Unfortunately we don't have something like that in the standard library, but if we did - theoretically Array and String could use it as backing storage, expose it directly to the user, and allow re-wrapping an existing buffer (e.g. you could create a String which shares the backing of an Array) while preserving value semantics and avoiding copying.
/// A COW memory buffer - like Foundation.Data, but guaranteed to be contiguous.
struct ContiguousData: RandomAccessCollection { /* ... */ }
struct Array<T> {
init(rawStorage: ContiguousData)
var rawStorage: ContiguousData
}
var string: String
do {
// Create an Array
let myArray = [1, 2, 3, 4]
// View its memory as a String (no copying)
string = String(rawStorage: myArray.rawStorage, encoding: .ascii)
}
// Array goes out of scope, string now holds unique reference to storage. No copy.
string.append("blah")
What do you think - would something like this work or not?
i don’t think this would work, Arrays have a header that’s allocated at the head of the buffer. so the elements actually come at an offset from the buffer start
This seems like a broader concern that may not be worth addressing in this specific case: things that need to be mutable for initialisation but are immutable after that.
Thanks for this Nate. It's been a long time coming. Here's Karl's old bug:
I agree that we should introduce a single unsafe API that handles initialization and reinitialization. The functionality needs to exist in a primitive form first. Mutability concerns and convenience can be added later.
I'm glad someone else proposed this name so I don't take any flak for it, but it works for me:
I'm looping back around to this idea after some discussion of the intention and behavior of array storage allocation in this thread: Array capacity optimization. The primary challenge that I've been looking at is that someone calling this method may easily expect that the size of the buffer in the closure matches either the capacity of the array that they observe through the capacity property or the capacity that they reserve immediately before, through a call to reserveCapacity(_:), but neither one of those is guaranteed.
I think I'm back to the original initializer that I had in mind, with a revised name (to add "unsafe"), since I'm not satisfied with the designs I can come up with for a mutating method. Here's the thought process that leads to my kind-of-paradox:
If someone calls a hypothetical withFullCapacityUnsafeMutableBufferPointer method, they may make mistaken assumptions about what size buffer they receive inside the closure. Mistakes of this kind are worse than usual, since they can lead to memory leaks or memory accesses past the buffer's boundaries. We don't have a great way to provide diagnostics when things go awry, and the buffer's size will likely be nondeterministic, so issues may go unnoticed in development only to appear in production.
We could add a capacity parameter to the method, but that complicates the method a bit, and means that the confusion is still possible, since the method doesn't really make sense if the user passes a capacity smaller than the array's count. Would we include a precondition that the specified capacity has to be at least the count? It feels like this version trades one problem for another.
Also, would we even call that method? It's no longer withFullCapacityUnsafeMutableBufferPointer, because the user is passing a specific capacity. Putting "uninitialized" in there doesn't really make sense either, since some portion of the buffer may be initialized already.
Agree? Disagree? Ideas for a mutating method that doesn't have these issues?
The initializer doesn't have these problems: the buffer can cover the exact number of elements that the user requests, even if the storage itself ends up being slightly larger, and since the whole buffer is uninitialized, we can use that word in the API, making its usage more clear.
I probably don't understand the full complexity here, but why can't or shouldn't the available capacity just be passed as an argument to the provided closure? That would be my expectation, that you would get the buffer pointer, the available capacity, and the inout count.
Right, thanks, I can't keep the unsafe APIs straight. I guess I'm still not sure what the problem is then. If someone makes an assumption about the capacity instead of checking the buffer's count, then won't any such reasonable assumption presume that there is less capacity than is actually available, rather than more, which isn't a big problem from a memory safety perspective?
I would have named it init(reserveCapacity:unsafeInitializingWith:). I don't feel strongly about the exact name, but I do think we should make a distinction between the reserved capacity vs. the actual capacity.
someone calling this method may easily expect that the size of the buffer in the closure matches either the capacity of the array that they observe through the capacity property
I didn't realize that. Why would the Array's capacity property differ from the UnsafeBufferPointer's capacity?
If the array's storage is shared, it has to allocate new, unique storage before calling the closure. The exact capacity can change during that reallocation.
Can you include some thoughts on why initializedCount is an out-parameter rather than just a return value? I get that naming it makes it slightly easier to talk about in the docs, but if it's a return value the client is required to deal with it. (Unlike the method version, you're not using the return value for anything.)
I'm still sad about not getting the mutating method version, since there's no recourse there other than to wrap your elements in Optional or drop down to UnsafeMutableBufferPointer allocation. I think my attempt to combat the capacity confusion would be to add the parameter, like you said:
mutating func withUnsafeMutableBufferPointer<Result>(
reservingCapacity minimumCapacity: Int,
do action: (
_ buffer: inout UnsafeMutableBufferPointer<Element>,
_ initializedCount: inout Int
) -> Result
) -> Result