[Pitch] Temporary uninitialized buffers

Hi folks! Long-time reader, first-time caller! This is my first pitch, so in addition to feedback about the concept, feel free to send me a direct message if you have any feedback on the structure/tone/etc. of this post. Thanks!

Introduction

I've been dealing with code that needs to work with temporary sequences of various types. The types I'm thinking of are usually value types and need to be allocated and dealt with en masse. In C or Objective-C, it's easy enough to stack-allocate a buffer in which to hold these values, and the logic to switch to the heap for a larger allocation is pretty simple too. Something like this:

size_t tacoCount = ...;

Taco *tacos = NULL;
Taco stackBuffer[SOME_LIMIT];
if (tacoCount < SOME_LIMIT) {
  tacos = stackBuffer;
} else {
  tacos = calloc(tacoCount, sizeof(Taco));
}

// do some work here
for (size_t i = 0; i < tacoCount; i++) {
  createTaco(tacos + i, ...);
}

eatTooMuch(tacos, tacoCount);

for (size_t i = 0; i < tacoCount; i++) {
  destroyTaco(tacos + i);
}

if (tacos != stackBuffer) {
  free(tacos);
}

In C++, we can make judicious use of std::array and std::vector to achieve the same purpose while generally preserving memory safety.

But there's not really any way to express this sort of transient buffer usage in Swift. You can allocate an UnsafeMutableBufferPointer<Taco> of course, but such a buffer pointer is allocated on the heap and the optimizer can only stack-promote when it can see tacoCount at compile-time—something that's not always feasible in production code.
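For comparison, the best you can do in Swift today looks roughly like this (using the same hypothetical Taco type as above):

// Today's equivalent: the buffer always lives on the heap unless the
// optimizer can see tacoCount at compile time and stack-promote it.
let tacos = UnsafeMutableBufferPointer<Taco>.allocate(capacity: tacoCount)
defer { tacos.deallocate() }

// initialize, use, and deinitialize the elements here, as in the C example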

The Pitch

I'd like to propose a new inlinable function in the standard library that allocates a buffer of a specified type and capacity, provides that buffer to a closure, and then deallocates the buffer. The buffer would be passed to the closure in an uninitialized state and treated as uninitialized on closure return—that is, the closure would be responsible for initializing and deinitializing the elements in the buffer. The function would look something like this:

/// Provides scoped access to a buffer pointer to memory of the specified type
/// and with the specified capacity.
///
/// - Parameters:
///   - type: The type of the buffer pointer to allocate.
///   - capacity: The capacity of the buffer pointer to allocate.
///   - body: A closure to invoke and to which the allocated buffer pointer
///     should be passed.
///
/// - Returns: Whatever is returned by `body`.
///
/// - Throws: Whatever is thrown by `body`.
///
/// This function is useful for cheaply allocating storage for a sequence of
/// values for a brief duration. Storage may be allocated on the heap or on the
/// stack, depending on the required size and alignment.
///
/// When `body` is called, the contents of the buffer pointer passed to it are
/// in an unspecified, uninitialized state. `body` is responsible for
/// initializing the buffer pointer before it is used _and_ for deinitializing
/// it before returning. `body` does not need to deallocate the buffer pointer.
///
/// The implementation may allocate a larger buffer pointer than is strictly
/// necessary to contain `capacity` values of type `type`. The behavior of a
/// program that attempts to access any such additional storage is undefined.
///
/// The buffer pointer passed to `body` (as well as any pointers to elements in
/// the buffer) must not escape—it will be deallocated when `body` returns and
/// cannot be used afterward.
@inlinable
public func withUnsafeUninitializedMutableBufferPointer<T, R>(
  of type: T.Type,
  capacity: Int,
  _ body: (UnsafeMutableBufferPointer<T>) throws -> R
) rethrows -> R

The implementation of withUnsafeUninitializedMutableBufferPointer(of:capacity:_:) would check the required size of the buffer and, if it is small enough, allocate it on the stack. If it is too large to safely fit on the stack, the function would heap-allocate, just as UnsafeMutableBufferPointer.allocate(capacity:) does today. The compiler can optimize such a function heavily in ways that it can't with a heap-allocated buffer.
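For example, the Taco code from the introduction would boil down to something like this (Taco, its initializer, and eatTooMuch are the same hypothetical names as before):

withUnsafeUninitializedMutableBufferPointer(of: Taco.self, capacity: tacoCount) { tacos in
  // The buffer arrives uninitialized; initialize each element in place.
  for i in 0 ..< tacos.count {
    (tacos.baseAddress! + i).initialize(to: Taco(...))
  }

  eatTooMuch(tacos)

  // Deinitialize before returning; deallocation is handled for us.
  tacos.baseAddress!.deinitialize(count: tacos.count)
}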

As a convenience, for when you just need transient storage for a single value, we can also provide a second function:

@inlinable
public func withUnsafeUninitializedMutablePointer<T, R>(
  to type: T.Type,
  _ body: (UnsafeMutablePointer<T>) throws -> R
) rethrows -> R {
  return try withUnsafeUninitializedMutableBufferPointer(of: type, capacity: 1) { buffer in
    return try body(buffer.baseAddress!)
  }
}
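For instance, the single-value variant pairs nicely with C APIs that initialize a value through an out parameter. A sketch, assuming Darwin (or Glibc on Linux) is imported and using gettimeofday() purely as an example:

import Darwin  // or Glibc on Linux

let now = withUnsafeUninitializedMutablePointer(to: timeval.self) { tv -> timeval in
  // gettimeofday() initializes the memory through its out parameter, so the
  // closure never reads uninitialized state; timeval is trivial, so no
  // deinitialization is needed before returning.
  _ = gettimeofday(tv, nil)
  return tv.pointee
}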

I have a proof-of-concept implementation of these functions that I will be sharing in a "work-in-progress" branch in the swift repo in the near future; I'll comment here when that's posted. In the meantime, the proof-of-concept branch (still untested!) is available on my fork of the Swift repo.

The Catch(es)

There are a few issues with this pitch:

  1. To implement withUnsafeUninitializedMutableBufferPointer(of:capacity:_:) optimally, a new Builtin function would be needed, equivalent to alloca() in C, to allocate uninitialized space on the stack and extend its lifetime to some future point automatically. Today, there is no mechanism to access such memory without dropping down to C/Objective-C/C++.

    My proof-of-concept implementation avoids the above constraint by using several internally-defined value types to allocate stack space. These value types exist in the Swift type system and the Swift memory model, so they need to be zero-initialized, which incurs a non-zero (heh) runtime cost.

    The compiler is still able to optimize away the zero-initialization when it can see that values in the buffer are untouched before the closure initializes them, but this optimization is not always applicable.

  2. There is no perfect heuristic for capping the size of the stack allocation. In my proof-of-concept work, I've settled on:

    IF size ≤ 1KB AND alignment ≤ natural THEN
      stackAllocate()
    ELSE
      heapAllocate()
    ENDIF
    

    This is an easy and fast heuristic to implement, but it probably needs to be refined, e.g. for embedded systems.

  3. The set of scenarios where these functions are useful in Swift is more constrained than the set of such scenarios in C/Objective-C/C++. In Swift, a better solution is (often? usually?) one that involves designing a really good API. The counterpoint is that there's a lot of API out there, especially at lower layers (POSIX, Win32, etc.), that doesn't have a great overlay and probably won't get one in the near future. We still need to use those APIs, so being able to do so in Swift as efficiently as we do in C is still desirable. And, if we do someday get Swift interfaces for those APIs, they'll presumably need to allocate stack buffers under the covers—and to do so, they'll likely want withUnsafeUninitializedMutableBufferPointer(of:capacity:_:)!

-JG

17 Likes

This functionality is long overdue. Thanks for starting the discussion! My own wishlist for functionality in this space would include:

  • The ability to specify a count, to let you allocate space for a temporary on-stack array of elements, not just one;
  • The ability to allocate a temporary UnsafeRaw[Buffer]Pointer for untyped memory, too; and
  • A variation of the signature that gives you the initialized value you put in the memory as the return value, makeValueAtUninitializedPointer<T>(_: (UnsafeMutablePointer<T>) -> Void) -> T, which could be implemented with return value optimization to do the initialization in-place.

As far as the prototype implementation goes, another approach could be to simply malloc the memory, call the body closure, and then free it. After inlining, there is at least some chance that LLVM might stack-promote the scoped malloc.
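Roughly like this, say (a quick sketch; it punts on alignment beyond what malloc already guarantees, and real code would handle capacity == 0 and allocation failure):

import Darwin  // or Glibc on Linux, for malloc/free

@inlinable
public func withUnsafeUninitializedMutableBufferPointer<T, R>(
  of type: T.Type,
  capacity: Int,
  _ body: (UnsafeMutableBufferPointer<T>) throws -> R
) rethrows -> R {
  let rawBuffer = malloc(capacity * MemoryLayout<T>.stride)!
  defer { free(rawBuffer) }
  let base = rawBuffer.bindMemory(to: T.self, capacity: capacity)
  return try body(UnsafeMutableBufferPointer(start: base, count: capacity))
}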

3 Likes
  • The ability to specify a count, to let you allocate space for a temporary on-stack array of elements, not just one;

My proposal does focus on a buffer rather than a single-element pointer. Can you clarify how this might be different (or are you just agreeing)? Thanks!

The ability to allocate a temporary UnsafeRaw[Buffer]Pointer for untyped memory, too,

I agree, this would be pretty easy to build given any reasonable implementation. I'll make sure to include it in the formal proposal.

  • A variation of the signature that gives you the initialized value you put in the memory as the return value, makeValueAtUninitializedPointer<T>(_: (UnsafeMutablePointer<T>) -> Void) -> T , which could be implemented with return value optimization to do the initialization in-place.

Can you give an example use case? I'm not sure I can think of a use case where this would be better than just T.init(...) but I'm sure I'm just missing something. :slight_smile:

Ah, I see that you did include a capacity. Discourse's presentation hid it below a scroll bar, sorry.

It could be useful for "placement new" sorts of situations, or where you have an underlying C API with an out parameter that initializes a value in place.

It could be useful for "placement new" sorts of situations, or where you have an underlying C API with an out parameter that initializes a value in place.

I think it would be sufficient to direct developers to use a pattern like this one:

let value = withUnsafeUninitializedMutablePointer(to: T.self) { ptr in
  ...
  return ptr.move()
}

That follows the maxim to "avoid one-liners." However, I don't feel strongly either way—if you feel this is important to add, it would be easy to do so (as you said earlier).

Really happy to see this pitch! I've wanted this several times.

Would it be worth making this parametrizable? If I'm in a leaf function, I might want to put several KB on the stack, but if I'm running on a background queue with a small stack I might want to make the stack threshold much smaller. If we don't want to let the user precisely control the stack size, we could still say that the implementation is allowed to use a smaller threshold than the user specified.

My experience with using the same trick is that the zero initialization is not optimized away in practice, and it is a significant cost. I would really like to see whatever implementation we end up with solve that problem.

My concern there is that it still puts you at the mercy of the optimizer to eliminate that move. As a primitive, it could be codegenned in a way that the pointer is guaranteed to be to the indirect return buffer for the call and avoid the possibility of an actual move.

My concern there is that it still puts you at the mercy of the optimizer to eliminate that move. As a primitive, it could be codegenned in a way that the pointer is guaranteed to be to the indirect return buffer for the call and avoid the possibility of an actual move.

Fair enough @Joe_Groff! You're thinking about how it works at a lower level than I am—I'm mostly in a "trust the compiler to do the right thing" mindset. :slight_smile:

I don't know how hard this would be to implement, and it's not necessarily part of the pitch, but it would be nice if the implementation worked with ASan, so that trying to access the uninitialized memory would lead to an easy-to-spot error.

Would it be worth making this parametrizable?

It certainly wouldn't be hard to add a defaulted stackLimit parameter; however, my concern there is that the stdlib would be pushing responsibility for the heuristic onto developers. I'd want the stdlib team to brainstorm a bit first, if that makes sense?
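If we did go that route, the shape would presumably be something like this (the 1 KB default here is just the placeholder limit from my proof of concept):

@inlinable
public func withUnsafeUninitializedMutableBufferPointer<T, R>(
  of type: T.Type,
  capacity: Int,
  stackLimit: Int = 1_024,  // in bytes; the implementation may use a smaller threshold
  _ body: (UnsafeMutableBufferPointer<T>) throws -> R
) rethrows -> R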

My experience with using the same trick is that the zero initialization is not optimized away in practice, and it is a significant cost. I would really like to see whatever implementation we end up with solve that problem.

I imagine that if this pitch is adopted, the stdlib changes would not land until after the requisite builtin is in place. That said, the perfect is the enemy of the good: if the only real option for this proposal is to ship it with stack-zeroing and to rely on the optimizer, the team might decide that it's still worth shipping, since the optimizer and the implementation can both be improved in future Swift releases.

To play devil's advocate a bit, what does this get you that a stack-promoted ManagedBuffer does not? Is it just stack-promotion in debug builds? (And a less complex API than ManagedBuffer.) I'd understand if it were guaranteed to be on the stack, always, but this API doesn't seem to be that.

Stack promotion is opaque to me as a user. In practice it has worked zero out of the many times I've needed something like this API, even when I go somewhat out of my way to make the code optimization-friendly. Whether my allocation is larger than a stack threshold is both understandable and robust against incidental changes in my code, unlike optimizations.

5 Likes

To play devil's advocate a bit, what does this get you that a stack-promoted ManagedBuffer does not? Is it just stack-promotion in debug builds? (And a less complex API than ManagedBuffer.)

That's a great question, Jordan! To my mind, ManagedBuffer and my proposed function express different-but-complementary goals.

ManagedBuffer says "I want to allocate a refcounted object with an arbitrarily long tail-allocated buffer" thus avoiding two heap allocations when one will do; the developer can then use that object as they would use any other object. If the compiler can prove a number of things, then the ManagedBuffer can be stack-promoted*:

  1. The total size of the allocation must be known at compile-time;
  2. We must be dealing with ManagedBuffer and not a subclass thereof nor a type containing a ManagedBuffer (although I suspect this is an optimizer bug and have filed a bug report about it); and
  3. The ManagedBuffer must (provably) not escape its calling context.

On the other hand, my proposed function says "I want to allocate some arbitrarily long temporary space in the most efficient location possible." The developer can then use that buffer in a scoped context and have it be automatically discarded at the end of said scope.

A developer today can certainly use ManagedBuffer to approximate my proposed function, and it might even be reasonable to implement the function as a wrapper over ManagedBuffer. But I don't think it would be clear to a reader what that developer was trying to do. For example:

let buffer = ManagedBuffer<Void, Taco>.create(minimumCapacity: tacoCount, makingHeaderWith: { _ in })
buffer.withUnsafeMutablePointerToElements { tacos in
  let tacoBuffer = UnsafeMutableBufferPointer(start: tacos, count: tacoCount)
  tacoBuffer.initialize(...)
  eatTooMuch(tacoBuffer)
  tacoBuffer.deinitialize(...)
}

The above code is functional but its purpose is obscured by the presence of an object in addition to the pointer we need. This spells out the intent more clearly:

withUnsafeUninitializedMutableBufferPointer(of: Taco.self, capacity: tacoCount) { tacoBuffer in
  tacoBuffer.initialize(...)
  eatTooMuch(tacoBuffer)
  tacoBuffer.deinitialize(...)
}

The ManagedBuffer version also relies on stack allocation as an optimization side effect, not as an implementation guarantee. Which brings me to…

I'd understand if it were guaranteed to be on the stack, always, but this API doesn't seem to be that.

My intent here in falling back to the heap is to avoid the same pitfall as C's variable-length arrays, which is that beyond a certain size you're pretty much begging to overflow the stack. So the function, as proposed, says "I'll put this on the stack if there's room**, I promise" rather than either "I'll put this on the stack if the stars align during optimization" or, worse, "if you're not very very careful, everything will go very very wrong."


* I'm sure there are additional restrictions on stack promotion that I don't recall at the moment. My point is that stack promotion is fragile and an optimization only—a developer cannot depend on it occurring and should expect any and all uses of ManagedBuffer to use the heap.

** Determining whether or not there's room on the stack is the job of the heuristic I mentioned earlier. In my proof-of-concept, I simply hard-coded a 1KB limit. The stdlib can probably be a bit smarter.

6 Likes

I also think this is a useful proposal, and I’d like to see it land.

Seems like a Swift temporary buffer facility does not necessarily need
to be unsafe? A couple of common cases are foreseeable:

  • the buffer is fully initialized: an item is stored at every index,
    0..<capacity, before any items are read or modified. The Tacos
    example fits the bill.

  • the head of the buffer is fully initialized: over the lifetime of
    the buffer, buf, it is always true that buf.count < capacity.

Roughly, the API may look like this:

struct FixedCapacityArray<T> : MutableCollection,
    RandomAccessCollection, RangeReplaceableCollection, ... {
	public init(capacity: Int)
	public init<S>(from: S, capacity: Int)
	    where S : Sequence, Self.Element == S.Element
}

public func withFixedCapacityArray<T, R>(of type: T.Type, capacity: Int,
    _ body: (FixedCapacityArray<T>) throws -> R) rethrows -> R


public func withFixedCapacityArray<S, T, R>(from seq: S, capacity: Int,
    _ body: (FixedCapacityArray<T>) throws -> R) rethrows -> R
    where S : Sequence, S.Element == T

No unsafe necessary, correct?

The Tacos example can be written in Swift like this,

withFixedCapacityArray(from: (0..<16).lazy.map { Taco(...) }, capacity: 16) {
	tacos in eatTooMuch(tacos)
}

or like this,

withFixedCapacityArray(of: Taco.self, capacity: 16) { tacos in
	for _ in 0..<16 {
		tacos.append(Taco(...))
	}
	eatTooMuch(tacos)
}

It seems like the compiler has a lot of scope to optimize the stack/heap
placement and bounds checking on that code.

Dave

2 Likes

Thanks for the feedback, David! A developer can build all that today with Array<T> and rely on stack promotion. This pitch is specifically about providing fast access to an uninitialized buffer by way of an UnsafeBufferPointer.

Consider the scenario where default-initializing a sequence of Taco is expensive, or where Taco is a type with no practical default value (e.g. Result). In those cases, it's not possible to use withFixedCapacityArray(from:capacity:) as you've proposed because there's no way to initialize the state of the array.

Even if you made the sequence's element type Optional<Taco> and initialized all elements to nil, you'd still be paying the cost of zero-initialization. As others in this thread have mentioned, that cost is a significant problem in real-world Swift codebases today. (Additionally, using optionals would make actual interactions with the sequence more awkward by forcing adopters to sprinkle if let and optional chains in otherwise non-optional code.)
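Concretely, that workaround would look something like this (same hypothetical Taco and eatTooMuch as before):

var maybeTacos = [Taco?](repeating: nil, count: tacoCount)  // still pays for zero-initialization
for i in 0 ..< tacoCount {
  maybeTacos[i] = Taco(...)
}
eatTooMuch(maybeTacos.compactMap { $0 })  // extra unwrapping, plus a second allocation for the copy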

The other initializer, withFixedCapacityArray(of:capacity:), silently grows the array, which is an interesting proposal, but one that would be prone to out-of-bounds bugs. A buffer pointer, which has a fixed size, would not be subject to such bugs because you can't append to it, let alone append to it 17 times. :)

For what it's worth, such a function could be implemented atop withUnsafeUninitializedMutableBufferPointer(of:capacity:_:), if you wanted to have such an interface in your project. It would not be possible to implement FixedCapacityArray<T> unless we added some mechanism for the compiler to prevent instances from escaping the current scope. If instances can escape, then stack promotion cannot be guaranteed and is subject to the same sorts of limitations as with ManagedBuffer.
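To illustrate, here's roughly what that layering could look like (a rough, untested sketch in which every name is hypothetical; note that nothing in it actually prevents the wrapper from escaping the closure):

struct FixedCapacityArray<T> {
  private let storage: UnsafeMutableBufferPointer<T>
  private(set) var count = 0

  init(storage: UnsafeMutableBufferPointer<T>) {
    self.storage = storage
  }

  mutating func append(_ element: T) {
    precondition(count < storage.count, "capacity exceeded")
    (storage.baseAddress! + count).initialize(to: element)
    count += 1
  }

  subscript(index: Int) -> T {
    precondition(index >= 0 && index < count, "index out of range")
    return storage[index]
  }

  mutating func removeAll() {
    if count > 0 {
      storage.baseAddress!.deinitialize(count: count)
    }
    count = 0
  }
}

func withFixedCapacityArray<T, R>(
  of type: T.Type,
  capacity: Int,
  _ body: (inout FixedCapacityArray<T>) throws -> R
) rethrows -> R {
  return try withUnsafeUninitializedMutableBufferPointer(of: type, capacity: capacity) { buffer in
    var array = FixedCapacityArray(storage: buffer)
    defer { array.removeAll() }  // deinitializes whatever `body` initialized
    return try body(&array)
  }
}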

Well, the proposed function also only stack-allocates as an optimisation. The only difference seems to be that it uses a simpler and more predictable heuristic.


I had a reply written, but then I got caught up with some other things; it was basically going to be what @gnuoyd suggested: a safe, fixed-capacity array.

The thing about this proposal which gives me pause is that it's yet another unsafe pointer. I feel that Swift is far too ready to expose unsafe pointers to you; it's like it almost encourages you not to be safe. It's easy to forget that when the project website describes Swift, safety is top of the list - above performance and expressiveness. When you consider how strict some other languages are about avoiding unsafe code, and then you look at Swift's relatively lax approach... it feels like we need more discipline.

The idea is that we'd be talking about a fixed capacity array, not a fixed count array. You wouldn't need to initialise the buffer to anything or use optionals or any of that, but it would employ bounds-checking to prevent you from reading uninitialised elements.

If the buffer doesn't escape, and the capacity is known, you'd think the compiler would do a reasonable job of eliminating any superfluous bounds checks.

1 Like

Thanks for the feedback, Karl.

We already have such a construct in the language in the form of Array<T> and reserveCapacity(), which the compiler can optimize into a simple stack allocation in all the scenarios where it would be safe to use the proposed FixedCapacityArray<T> type. Any additional compile-time optimizations that could be added for FixedCapacityArray<T> could also be added for Array<T>.
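That is, a developer can already write something like this today (hypothetical Taco and eatTooMuch again):

var tacos: [Taco] = []
tacos.reserveCapacity(16)
for _ in 0 ..< 16 {
  tacos.append(Taco(...))
}
eatTooMuch(tacos)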

You're right, of course. Swift aims to be a safe language by default. To quote that document:

The most obvious way to write code should also behave in a safe manner.

The most obvious way to make a buffer of type T is by using Array<T>. But that doesn't mean it's the only way and it doesn't mean that the most obvious way is the most optimal way. Sometimes, developers do need to write code with a different balance of safety and performance than the default would allow. This is why symbols like UnsafePointer include the word Unsafe in their names—it's a reminder to developers that by using them, they are leaving the well-trodden path.

I did discuss both of Dave's proposed functions. :slightly_smiling_face: Unfortunately, I accidentally posted before I finished writing my reply, so you might not have seen the full message.

1 Like

Regarding naming... it seems unnecessary to use all the words of our mouthful of an unsafe buffer type in the function name. withUnsafeUninitializedBuffer is probably fine.

In exchange, I'd suggest scrapping the "convenience" version for size of one. It's not exactly a big deal in these cases to just grab buffer.baseAddress!, and being explicit that you are asking for only a single element seems better anyways.

4 Likes

Naming is hard! :stuck_out_tongue: But yes, the more I type it the more my index fingers hate me, and I imagine if/when we open a formal proposal thread we'll come up with a more concise name.

To me, this speaks to some broader Pointer-vs.-BufferPointer design questions I have been known to mutter about in hallways… As long as we have both and they aren't equivalent, my inclination is to treat them as equals. (I agree this is something worth discussing, so when I write my formal proposal I'll give it some more thought.)