Array Initializer with Access to Uninitialized Buffer

+1 for something like this, but please put the word "unsafe" in the keyword argument, e.g. "unsafeUninitialized..."

-Chris


Thanks for the feedback!

That's a solid idea (and as a bonus, it would decisively bump withUnsafeMutablePointerToElements as the function with the longest name). Our recommendation for arrays is to drop down to the buffer if performance is a concern, but that solution has been limited to situations where you didn't need to change the length. This addition would allow for that solution in more use cases.
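For context, here is a minimal sketch of the existing "drop down to the buffer" approach, using the standard `withUnsafeMutableBufferPointer` method. The `samples` array and the doubling operation are only illustrative.

```swift
var samples = [1, 2, 3, 4]

// The element count is fixed inside the closure: existing elements can be
// overwritten in place, but the array cannot grow or shrink here.
samples.withUnsafeMutableBufferPointer { buffer in
    for i in buffer.indices {
        buffer[i] *= 2
    }
}

print(samples) // [2, 4, 6, 8]
```

The buffer only covers the elements that are already initialized, which is the limitation this pitch is trying to lift.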

I'm conflicted about keeping the initializer — it's nice since you can end up with an immutable array, but it's probably better to have all the unsafe array APIs grouped together.

Agreed! Thanks, and congrats on your recent touring success.

Would that have a signature like this?

extension Array {
    init(count: Int, providingEachElement f: (Index) -> Element)
}

let myArray = Array<String>(count: 5) { String(repeating: "\($0)", count: $0) }
// ["", "1", "22", "333", "4444"]
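As a rough sketch only, the signature above could be given a body with `map`. `init(count:providingEachElement:)` is the hypothetical name from this post, not a standard library API, and the `map`-based body is an assumption made for illustration.

```swift
extension Array {
    // Hypothetical initializer from the post above; not part of the standard library.
    init(count: Int, providingEachElement f: (Index) -> Element) {
        // Array.Index is Int, so the range 0..<count can be mapped directly.
        self = (0..<count).map(f)
    }
}

let repeated = Array<String>(count: 5) { String(repeating: "\($0)", count: $0) }
print(repeated) // ["", "1", "22", "333", "4444"]
```

This version never exposes uninitialized memory, which is why it reads as a separate, safe proposal.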

I like that idea too, though it seems like a separate proposal. I think @Erica_Sadun has done some work in this area.


Revised version of APIs for this proposal:

extension Array {
    /// Creates an empty array with the specified capacity.
    init(capacity: Int)

    /// Calls the given closure with a pointer to the full capacity of the array's
    /// mutable contiguous storage.
    ///
    /// - Parameter body: A closure that can modify or deinitialize existing elements
    ///   or initialize new elements.
    ///   - Parameters:
    ///     - buffer: An unsafe mutable buffer of the array's full storage, including
    ///       any uninitialized capacity after the initialized elements. Only the 
    ///       elements in `buffer[0..<initializedCount]` are initialized.
    ///     - initializedCount: The count of the array's initialized elements. If you
    ///       initialize or deinitialize any elements inside `body`, update 
    ///       `initializedCount` with the new count for the array.
    /// - Returns: The return value, if any, of the `body` closure parameter.
    mutating func withFullCapacityUnsafeMutableBufferPointer<R>(
        _ body: (_ buffer: UnsafeMutableBufferPointer<Element>, _ initializedCount: inout Int) throws -> R
    ) rethrows -> R
}

Without commenting on anything else, I think this initializer should be on RangeReplaceableCollection. It could be a customization point if, e.g., Array is able to do it more efficiently, or an extension method if the generic implementation is sufficient:

extension RangeReplaceableCollection {
  init(capacity: Int) {
    self.init()
    self.reserveCapacity(capacity)
  }
}

Should this be init(minimumCapacity: Int) to match the existing Dictionary and Set APIs?

Good point!

No—dictionaries and sets can't have arbitrarily sized storage buffers, so they choose the next size larger than what you pass. For Array, we would want to guarantee that the array has storage for exactly the specified number of elements.
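A small sketch of the existing behavior being described, using the standard `init(minimumCapacity:)` initializers and `capacity` properties. The variable names are illustrative, and the exact capacities are deliberately not shown because they are an implementation detail.

```swift
let requested = 5

// Both initializers treat the argument as a lower bound, not an exact size.
let lookup = Dictionary<String, Int>(minimumCapacity: requested)
let seen = Set<Int>(minimumCapacity: requested)

print(lookup.capacity >= requested) // true
print(seen.capacity >= requested)   // true
```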

But the Array.reserveCapacity(_:) API doesn't guarantee an exact capacity either:

  /// For performance reasons, the size of the newly allocated storage might be
  /// greater than the requested capacity. Use the array's `capacity` property
  /// to determine the size of the new storage.

This is a feature I've wanted for a good while now and haven't had the time to put an initial pitch together. It would be very helpful when working with C APIs.


You're right about that — the parameter name is even minimumCapacity for that method. With the current APIs that's probably fine, but if we're exposing the full capacity of the array in this way, it might be confusing to not have as precise control over the size as we do when allocating a buffer directly. These can be quite different amounts:

var a: [UInt8] = []
a.reserveCapacity(5)
// a.capacity == 16

Yeah, that's exactly what I had in mind. It's only semi-related, in that it has similar performance benefits and reasons for existing, so a separate proposal would be fine.


This is a very important API, especially when interoperating with C libraries. I thought about this quite a bit in the past, but I'm glad to see a pitch.

My approach to this was a little less direct: Why do we even need these copies today? Why can't Array/String just wrap any old UnsafeMutableBufferPointer? I think the main reason is to provide value semantics and immutability, which requires the buffer itself to have value semantics (like Foundation.Data). Unfortunately we don't have something like that in the standard library, but if we did - theoretically Array and String could use it as backing storage, expose it directly to the user, and allow re-wrapping an existing buffer (e.g. you could create a String which shares the backing of an Array) while preserving value semantics and avoiding copying.

/// A COW memory buffer - like Foundation.Data, but guaranteed to be contiguous.
struct ContiguousData: RandomAccessCollection { /* ... */ }

struct Array<T> {
  init(rawStorage: ContiguousData)
  var rawStorage: ContiguousData
}

var string: String
do {
  // Create an Array
  let myArray = [1, 2, 3, 4]
  // View its memory as a String (no copying)
  string = String(rawStorage: myArray.rawStorage, encoding: .ascii)
}
// Array goes out of scope, string now holds unique reference to storage. No copy.
string.append("blah")

What do you think - would something like this work or not?

I don't think this would work. Arrays have a header that's allocated at the head of the buffer, so the elements actually start at an offset from the beginning of the buffer.


This seems like a broader concern that may not be worth addressing in this specific case: things that need to be mutable for initialisation but are immutable after that.

I like it!

Thanks for this Nate. It's been a long time coming. Here's Karl's old bug:
https://bugs.swift.org/browse/SR-3087

I agree that we should introduce a single unsafe API that handles initialization and reinitialization. The functionality needs to exist in a primitive form first. Mutability concerns and convenience can be added later.

I'm glad someone else proposed this name so I don't take any flak for it, but it works for me:


I'm looping back around to this idea after some discussion of the intention and behavior of array storage allocation in this thread: Array capacity optimization. The primary challenge that I've been looking at is that someone calling this method may easily expect that the size of the buffer in the closure matches either the capacity of the array that they observe through the capacity property or the capacity that they reserve immediately before, through a call to reserveCapacity(_:), but neither one of those is guaranteed.

I think I'm back to the original initializer that I had in mind, with a revised name (to add "unsafe"), since I'm not satisfied with the designs I can come up with for a mutating method. Here's the thought process that leads to my kind-of-paradox:

  • If someone calls a hypothetical withFullCapacityUnsafeMutableBufferPointer method, they may make mistaken assumptions about what size buffer they receive inside the closure. Mistakes of this kind are worse than usual, since they can lead to memory leaks or memory accesses past the buffer's boundaries. We don't have a great way to provide diagnostics when things go awry, and the buffer's size will likely be nondeterministic, so issues may go unnoticed in development only to appear in production.
  • We could add a capacity parameter to the method, but that complicates the method a bit, and means that the confusion is still possible, since the method doesn't really make sense if the user passes a capacity smaller than the array's count. Would we include a precondition that the specified capacity has to be at least the count? It feels like this version trades one problem for another.
  • Also, what would we even call that method? It's no longer withFullCapacityUnsafeMutableBufferPointer, because the user is passing a specific capacity. Putting "uninitialized" in there doesn't really make sense either, since some portion of the buffer may be initialized already.

Agree? Disagree? Ideas for a mutating method that doesn't have these issues?

The initializer doesn't have these problems: the buffer can cover the exact number of elements that the user requests, even if the storage itself ends up being slightly larger, and since the whole buffer is uninitialized, we can use that word in the API, making its usage more clear.

I have an underscored implementation of the initializer here, at Ben's suggestion: https://github.com/apple/swift/pull/17774
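For readers coming to this thread later: the standard library gained `Array(unsafeUninitializedCapacity:initializingWith:)` in Swift 5.1. The sketch below uses that shipped initializer. Whether it matches the underscored implementation linked above in every detail is an assumption, and the `squares` example is only illustrative.

```swift
let count = 5

let squares = [Int](unsafeUninitializedCapacity: count) { buffer, initializedCount in
    // Every slot starts out uninitialized, so use initialize(to:) instead of assignment.
    for i in 0..<count {
        (buffer.baseAddress! + i).initialize(to: i * i)
    }
    // Tell the array how many elements were actually initialized.
    initializedCount = count
}

print(squares) // [0, 1, 4, 9, 16]
```

Because the result is bound with `let`, this also gives the immutable array mentioned earlier in the thread.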


I probably don't understand the full complexity here, but why can't or shouldn't the available capacity just be passed as an argument to the provided closure? That would be my expectation, that you would get the buffer pointer, the available capacity, and the inout count.

A "buffer pointer" is a start/length pair, where the length is intended to be the capacity. Passing it separately would be redundant.
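A minimal illustration of that start/length pairing, using the standard `UnsafeMutableBufferPointer` API. The `describeBuffer` helper is a made-up name for this example.

```swift
// Hypothetical helper: allocates a buffer and reports its length and start address.
func describeBuffer(capacity: Int) -> (count: Int, hasStart: Bool) {
    let buffer = UnsafeMutableBufferPointer<Int>.allocate(capacity: capacity)
    defer { buffer.deallocate() }
    buffer.initialize(repeating: 0)
    // `count` is the length of the region, `baseAddress` is where it starts.
    return (buffer.count, buffer.baseAddress != nil)
}

let info = describeBuffer(capacity: 4)
print(info.count)    // 4
print(info.hasStart) // true
```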


Is the inout count for the actual used count? If so could we instead return an Int over the inout?

Right, thanks, I can't keep the unsafe APIs straight. I guess I'm still not sure what the problem is then. If someone makes an assumption about the capacity instead of checking the buffer's count, then won't any such reasonable assumption presume that there is less capacity than is actually available, rather than more, which isn't a big problem from a memory safety perspective?


I would have named it init(reserveCapacity:unsafeInitializingWith:). I don't feel strongly about the exact name, but I do think we should make a distinction between the reserved capacity vs. the actual capacity.

someone calling this method may easily expect that the size of the buffer in the closure matches either the capacity of the array that they observe through the capacity property

I didn't realize that. Why would the Array's capacity property differ from the UnsafeBufferPointer's capacity?

If the array's storage is shared, it has to allocate new, unique storage before calling the closure. The exact capacity can change during that reallocation.
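The copy-on-write step can be observed with the existing `withUnsafeMutableBufferPointer` method. This sketch only shows that the mutated array ends up with its own storage. It makes no claim about the resulting capacity, which is an implementation detail.

```swift
var original = [1, 2, 3]
let shared = original // both values share one storage buffer at this point

// Requesting mutable access forces `original` to get unique storage first,
// so the write below cannot be seen through `shared`.
original.withUnsafeMutableBufferPointer { buffer in
    buffer[0] = 99
}

print(original) // [99, 2, 3]
print(shared)   // [1, 2, 3]
```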
