Array Initializer with Access to Uninitialized Buffer

I probably don't understand the full complexity here, but why can't or shouldn't the available capacity just be passed as an argument to the provided closure? That would be my expectation, that you would get the buffer pointer, the available capacity, and the inout count.

A "buffer pointer" is a start/length pair, where the length is intended to be the capacity. Passing it separately would be redundant.

3 Likes

Is the inout count for the actual used count? If so could we instead return an Int over the inout?

Right, thanks, I can't keep the unsafe APIs straight. I guess I'm still not sure what the problem is then. If someone makes an assumption about the capacity instead of checking the buffer's count, then won't any such reasonable assumption presume that there is less capacity than is actually available, rather than more, which isn't a big problem from a memory safety perspective?

1 Like

I would have named it init(reserveCapacity:unsafeInitializingWith:). I don't feel strongly about the exact name, but I do think we should make a distinction between the reserved capacity vs. the actual capacity.

someone calling this method may easily expect that the size of the buffer in the closure matches either the capacity of the array that they observe through the capacity property

I didn't realize that. Why would the Array's capacity property differ from the UnsafeBufferPointer's capacity?

If the array's storage is shared, it has to allocate new, unique storage before calling the closure. The exact capacity can change during that reallocation.

Thanks for everyone's input! PR for the proposal is here: Add a proposal for `Array.init(unsafeUninitializedCapacity:initializingWith:)` by natecook1000 ¡ Pull Request #882 ¡ apple/swift-evolution ¡ GitHub

3 Likes

The proposal is excellent.

Quick question on the partition example:

var high = low + buffer.count

I'm curious. In practice, wouldn't you expect users to typically
capture and use the same count that they passed in instead?

var high = low + count

Can you include some thoughts on why initializedCount is an out-parameter rather than just a return value? I get that naming it makes it slightly easier to talk about in the docs, but if it's a return value the client is required to deal with it. (Unlike the method version, you're not using the return value for anything.)


I'm still sad about not getting the mutating method version, since there's no recourse there other than to wrap your elements in Optional or drop down to UnsafeMutableBufferPointer allocation. I think my attempt to combat the capacity confusion would be to add the parameter, like you said:

mutating func withUnsafeMutableBufferPointer<Result>(
  reservingCapacity minimumCapacity: Int,
  do action: (
    _ buffer: inout UnsafeMutableBufferPointer<Element>,
    _ initializedCount: inout Int
  ) -> Result
) -> Result
3 Likes

i agree, the count should be a return value. i feel like i’m gonna forget to set the inout parameter

Either way should work, since this initializer guarantees that the number they specify and the size of the buffer are the same. In the Array.withUnsafe____ methods we require users to use the buffer's count rather than the array's (for exclusivity reasons), so they may be accustomed to doing that already.


To be perfectly honest, it's to be parallel with the method that I removed from the proposal, in case we add it in the future. Now that I write that out, it doesn't seem very compelling, esp. given your point about requiring the client to give a value rather than forgetting (although that's a bug would be apparent pretty quickly). I think I'll revert the closure's signature.

Re: the name for the mutating method, withUnsafeMutableBufferPointer(reservingCapacity:do:) still looks too close to the existing withUnsafeMutableBufferPointer method to me — I think it would need to make clear that the buffer includes uninitialized memory.

i don't want to derail this with naming drama but i’m not really convinced by this paragraph

This proposal leaves out wording that would reference two other relevant concepts:

reserving capacity: Arrays currently have a reserveCapacity(_:) method, which is somewhat akin to the first step of the initializer. However, that method is used for the sake of optimizing performance when adding to an array, rather than providing direct access to the array's capacity. In fact, as part of the RangeReplaceableCollection protocol, that method doesn't even require any action to be taken by the targeted type. For those reasons, the idea of "reserving" capacity doesn't seem as appropriate as providing a specific capacity that will be used.

for me i conceptually think of this method as a variant of the reserve-and-push idiom where the pushes are replaced with random accesses and the count is returned at the end rather than incremented with each element. i understand it’s problematic that the reserveCapacity(_:) method on the protocol doesn’t actually guarantee the capacity is there but that sounds more like a problem with the protocol. either way unsafeUninitializedCapacity: is a horrible argument label.

maybe instead we can use allocatingCapacity: which would evoke UnsafeMutablePointer<T>.allocate(capacity:) which is really the pattern we’re trying to replace here. people who are going to be using this API are going to be people who have already been using the unsafe pointer APIs (since until now, that’s been the only way to do this), and i feel like we’ve already drilled into everyone’s head that the word allocate means unsafe and uninitialized.

The initializer can be used for this purpose, though I doubt there's much of an optimization win, since appending is already super fast if there aren't re-allocations. The goals here are more to support noncontiguous access (like the partitioning example) and mostly C interoperability cases, when you need to start with a buffer of uninitialized memory and have a function write into it. I should add an example of that second usage.

allocatingCapacity: is good, but I still think we need unsafe to be visible at the use site.

no i know what you mean i meant at the highest level the act of making an array of size n and populating it with stuff. with pushes it would only work if you populate the array from beginning to end. this API generalizes it so it can be populated in any order, but at the high level, it’s still doing the same thing. You know how big the array should be at the end, the method of populating it is just slightly different.

1 Like

what about a static method Array<T>.allocateUnsafe(capacity:_:), would line up nicely with UnsafeMutablePointer.allocate(capacity:)

I think the problem I have with including "allocate" in the name is that whenever you allocate in Swift you're also in charge of deallocating, which this avoids.

truee. so allocate and reserve probably aren’t good words then.

another idea: what if the closure didn’t take an UnsafeMutableBufferPointer, but some kind of ArrayBuffer type? (not to be confused with Array internals) in debug builds, this object would track the memorystate of each element in the buffer and trap on return if there are uninitialized elements before the returned count index. in release builds, this type would be completely transparent and write to the underlying buffer. we really don’t want to be exposing UnsafeMutableBufferPointer subscripts that directly anyway, since those do assignment which is incorrect usage.

How is using the UnsafeMutableBufferPointer for assignment incorrect? Isn't its purpose to manipulate the raw memory that backs whatever has vended it? Since the new constructor is already intended to be an unsafe API I don't see how this is bad. The UnsafeMutableBufferPointer is far more useful especially when working with (Obj-)C.

Also, I agree that avoiding allocate or reserve if possible is preferred from a naming perspective.

No! in general you cannot assign to uninitialized memory. that’s cause Swift will try to deinitialize whatever is in the destination location, and since nothing is there yet, it will (possibly) segfault. try it this crashes (as it should)

class C 
{
    let c:Int
    
    init(_ c:Int)
    {
        self.c = c
    }
}

let array:[C] = .init(_unsafeUninitializedCapacity: 32)
{
    for i:Int in (0 ..< 16).reversed() 
    {
        $0[i] = .init(i)
    }
    
    $1 = 16
}

the exception is pure value types like Int. but even ‘value-semantic’ types like [Int] cannot be used like that. it might be nonobvious at first that this is legal

let array1:[Int] = .init(_unsafeUninitializedCapacity: 32)
{
    for i:Int in (0 ..< 16).reversed() 
    {
        $0[i] = i
    }
    
    $1 = 16
}

but this is not

let array2:[[Int]] = .init(_unsafeUninitializedCapacity: 32)
{
    for i:Int in (0 ..< 16).reversed() 
    {
        $0[i] = []
    }
    
    $1 = 16
}

this can be problematic because you could have a struct as your element type and everything is working fine until you add an Array member to your struct and all of a sudden, all those [i] = s don’t work anymore

2 Likes

Ah, I didn't think about the reference type usage needing to initialize.

My main use cases are with structs and C primitives. So I am creating an array with an instance of the struct/primitive using the current init(repeating:count:) and then I use withUnsafeMutableBufferPointer() to pass in the buffer pointer to be properly initialized by the C code with the new values I actually want.

1 Like