Array Initializer with Access to Uninitialized Buffer

jawbroken · July 10, 2018, 12:37am

I probably don't understand the full complexity here, but why can't or shouldn't the available capacity just be passed as an argument to the provided closure? That would be my expectation, that you would get the buffer pointer, the available capacity, and the inout count.

jrose · July 10, 2018, 12:51am

A "buffer pointer" is a start/length pair, where the length is intended to be the capacity. Passing it separately would be redundant.

Braden_Scothern · July 10, 2018, 1:12am

Is the inout count for the actual used count? If so could we instead return an Int over the inout?

jawbroken · July 10, 2018, 1:19am

Right, thanks, I can't keep the unsafe APIs straight. I guess I'm still not sure what the problem is then. If someone makes an assumption about the capacity instead of checking the buffer's count, then won't any such reasonable assumption presume that there is less capacity than is actually available, rather than more, which isn't a big problem from a memory safety perspective?

Andrew_Trick · July 10, 2018, 6:24pm

I would have named it init(reserveCapacity:unsafeInitializingWith:). I don't feel strongly about the exact name, but I do think we should make a distinction between the reserved capacity vs. the actual capacity.

someone calling this method may easily expect that the size of the buffer in the closure matches either the capacity of the array that they observe through the capacity property

I didn't realize that. Why would the Array's capacity property differ from the UnsafeBufferPointer's capacity?

nnnnnnnn · July 10, 2018, 7:30pm

If the array's storage is shared, it has to allocate new, unique storage before calling the closure. The exact capacity can change during that reallocation.

nnnnnnnn · July 20, 2018, 8:54pm

Thanks for everyone's input! PR for the proposal is here: Add a proposal for `Array.init(unsafeUninitializedCapacity:initializingWith:)` by natecook1000 · Pull Request #882 · apple/swift-evolution · GitHub

Andrew_Trick · July 20, 2018, 11:21pm

The proposal is excellent.

Quick question on the partition example:

var high = low + buffer.count

I'm curious. In practice, wouldn't you expect users to typically
capture and use the same count that they passed in instead?

var high = low + count

jrose · July 20, 2018, 11:33pm

Can you include some thoughts on why initializedCount is an out-parameter rather than just a return value? I get that naming it makes it slightly easier to talk about in the docs, but if it's a return value the client is required to deal with it. (Unlike the method version, you're not using the return value for anything.)

I'm still sad about not getting the mutating method version, since there's no recourse there other than to wrap your elements in Optional or drop down to UnsafeMutableBufferPointer allocation. I think my attempt to combat the capacity confusion would be to add the parameter, like you said:

mutating func withUnsafeMutableBufferPointer<Result>(
  reservingCapacity minimumCapacity: Int,
  do action: (
    _ buffer: inout UnsafeMutableBufferPointer<Element>,
    _ initializedCount: inout Int
  ) -> Result
) -> Result

taylorswift · July 23, 2018, 7:33pm

i agree, the count should be a return value. i feel like i’m gonna forget to set the inout parameter

nnnnnnnn · July 23, 2018, 7:35pm

Either way should work, since this initializer guarantees that the number they specify and the size of the buffer are the same. In the Array.withUnsafe____ methods we require users to use the buffer's count rather than the array's (for exclusivity reasons), so they may be accustomed to doing that already.

To be perfectly honest, it's to be parallel with the method that I removed from the proposal, in case we add it in the future. Now that I write that out, it doesn't seem very compelling, esp. given your point about requiring the client to give a value rather than forgetting (although that's a bug would be apparent pretty quickly). I think I'll revert the closure's signature.

Re: the name for the mutating method, withUnsafeMutableBufferPointer(reservingCapacity:do:) still looks too close to the existing withUnsafeMutableBufferPointer method to me — I think it would need to make clear that the buffer includes uninitialized memory.

taylorswift · July 23, 2018, 7:43pm

i don't want to derail this with naming drama but i’m not really convinced by this paragraph

This proposal leaves out wording that would reference two other relevant concepts:

reserving capacity: Arrays currently have a reserveCapacity(_:) method, which is somewhat akin to the first step of the initializer. However, that method is used for the sake of optimizing performance when adding to an array, rather than providing direct access to the array's capacity. In fact, as part of the RangeReplaceableCollection protocol, that method doesn't even require any action to be taken by the targeted type. For those reasons, the idea of "reserving" capacity doesn't seem as appropriate as providing a specific capacity that will be used.

for me i conceptually think of this method as a variant of the reserve-and-push idiom where the pushes are replaced with random accesses and the count is returned at the end rather than incremented with each element. i understand it’s problematic that the reserveCapacity(_:) method on the protocol doesn’t actually guarantee the capacity is there but that sounds more like a problem with the protocol. either way unsafeUninitializedCapacity: is a horrible argument label.

maybe instead we can use allocatingCapacity: which would evoke UnsafeMutablePointer<T>.allocate(capacity:) which is really the pattern we’re trying to replace here. people who are going to be using this API are going to be people who have already been using the unsafe pointer APIs (since until now, that’s been the only way to do this), and i feel like we’ve already drilled into everyone’s head that the word allocate means unsafe and uninitialized.

nnnnnnnn · July 23, 2018, 8:15pm

The initializer can be used for this purpose, though I doubt there's much of an optimization win, since appending is already super fast if there aren't re-allocations. The goals here are more to support noncontiguous access (like the partitioning example) and mostly C interoperability cases, when you need to start with a buffer of uninitialized memory and have a function write into it. I should add an example of that second usage.

allocatingCapacity: is good, but I still think we need unsafe to be visible at the use site.

taylorswift · July 23, 2018, 8:27pm

no i know what you mean i meant at the highest level the act of making an array of size n and populating it with stuff. with pushes it would only work if you populate the array from beginning to end. this API generalizes it so it can be populated in any order, but at the high level, it’s still doing the same thing. You know how big the array should be at the end, the method of populating it is just slightly different.

taylorswift · July 23, 2018, 8:32pm

what about a static method Array<T>.allocateUnsafe(capacity:_:), would line up nicely with UnsafeMutablePointer.allocate(capacity:)

nnnnnnnn · July 23, 2018, 8:46pm

I think the problem I have with including "allocate" in the name is that whenever you allocate in Swift you're also in charge of deallocating, which this avoids.

taylorswift · July 23, 2018, 8:55pm

truee. so allocate and reserve probably aren’t good words then.

another idea: what if the closure didn’t take an UnsafeMutableBufferPointer, but some kind of ArrayBuffer type? (not to be confused with Array internals) in debug builds, this object would track the memorystate of each element in the buffer and trap on return if there are uninitialized elements before the returned count index. in release builds, this type would be completely transparent and write to the underlying buffer. we really don’t want to be exposing UnsafeMutableBufferPointer subscripts that directly anyway, since those do assignment which is incorrect usage.

Braden_Scothern · July 23, 2018, 11:36pm

How is using the UnsafeMutableBufferPointer for assignment incorrect? Isn't its purpose to manipulate the raw memory that backs whatever has vended it? Since the new constructor is already intended to be an unsafe API I don't see how this is bad. The UnsafeMutableBufferPointer is far more useful especially when working with (Obj-)C.

Also, I agree that avoiding allocate or reserve if possible is preferred from a naming perspective.

taylorswift · July 24, 2018, 12:23am

No! in general you cannot assign to uninitialized memory. that’s cause Swift will try to deinitialize whatever is in the destination location, and since nothing is there yet, it will (possibly) segfault. try it this crashes (as it should)

class C 
{
    let c:Int
    
    init(_ c:Int)
    {
        self.c = c
    }
}

let array:[C] = .init(_unsafeUninitializedCapacity: 32)
{
    for i:Int in (0 ..< 16).reversed() 
    {
        $0[i] = .init(i)
    }
    
    $1 = 16
}

the exception is pure value types like Int. but even ‘value-semantic’ types like [Int] cannot be used like that. it might be nonobvious at first that this is legal

let array1:[Int] = .init(_unsafeUninitializedCapacity: 32)
{
    for i:Int in (0 ..< 16).reversed() 
    {
        $0[i] = i
    }
    
    $1 = 16
}

but this is not

let array2:[[Int]] = .init(_unsafeUninitializedCapacity: 32)
{
    for i:Int in (0 ..< 16).reversed() 
    {
        $0[i] = []
    }
    
    $1 = 16
}

this can be problematic because you could have a struct as your element type and everything is working fine until you add an Array member to your struct and all of a sudden, all those [i] = s don’t work anymore

Braden_Scothern · July 24, 2018, 12:32am

Ah, I didn't think about the reference type usage needing to initialize.

My main use cases are with structs and C primitives. So I am creating an array with an instance of the struct/primitive using the current init(repeating:count:) and then I use withUnsafeMutableBufferPointer() to pass in the buffer pointer to be properly initialized by the C code with the new values I actually want.