SE-0322: Temporary Uninitialized Buffers

Chris_Lattner3 · September 16, 2021, 4:26pm

Perhaps, on the other hand, some teams have a zero tolerance policy with ! invocations, so the boilerplate tax imposed here is much higher for the scalar case than you claim.

Ben_Cohen · September 16, 2021, 4:37pm

As I said, those teams are wrong. Force unwrapping is a legitimate part of the language (the core team reinforced this recently on a thread about deprecating it) and it would be wrong to design APIs to avoid its legitimate use because some teams have a practice of avoiding it – a bad practice that encourages writing untestable paths and obfuscating code that would be clearer with a force-unwrap.

The core team has frequently set out criteria for adding functionality to the standard library. Additions should not be trivially composable, unless they are strongly motivated by criteria such as being difficult to write correctly, avoiding common boilerplate, and enhancing readability. This is a relatively niche feature, and it's unclear if the scalar use case will be common. If it becomes so, I'd suggest a proposal in future that adds this sugar, since you can always add sugar later but never take it away.
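To make "trivially composable" concrete, here is a minimal sketch of how a scalar variant could be layered over the buffer API by hand. It assumes the name `withUnsafeTemporaryAllocation(of:capacity:_:)`, which is what the function shipped as in Swift 5.6 rather than the spelling under review in this thread. `withTemporaryScalar` is a hypothetical helper, not a standard library function.

```swift
// Hypothetical helper; not part of the standard library.
// Assumes Swift 5.6 or later for withUnsafeTemporaryAllocation.
func withTemporaryScalar<T, R>(
    of type: T.Type,
    _ body: (UnsafeMutablePointer<T>) throws -> R
) rethrows -> R {
    try withUnsafeTemporaryAllocation(of: type, capacity: 1) { buffer in
        // A buffer with capacity 1 has a base address, so the force-unwrap
        // states an invariant rather than hiding a reachable failure.
        try body(buffer.baseAddress!)
    }
}

let answer = withTemporaryScalar(of: Int.self) { pointer -> Int in
    pointer.initialize(to: 41)
    // Pair the initialization with a deinitialization before the memory goes away.
    defer { pointer.deinitialize(count: 1) }
    pointer.pointee += 1
    return pointer.pointee
}
print(answer)
```

Whether these few lines count as boilerplate worth absorbing into the standard library is exactly the judgment call being debated here.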

Karl · September 16, 2021, 4:37pm

The issue I have with this approach is that it is a long-standing deficiency of the optimiser. Here's a post I made back in 2017 about exactly this:

And it has come up many, many times since then. Not to be too harsh on the optimiser here, but looking over the last 4 years, it seems that promoting heap allocations onto the stack just has not been a priority, and even obvious candidates for promotion still allocate on the heap.

I'm worried that by introducing a new API, that optimisation potential is going to continue to be ignored, because it gives us an easy excuse - to just use this other API instead. IMO, in a high-level language, you write what you mean, and the compiler's job is to make it fast. Requiring specific APIs in order to opt-in to stack allocation feels clunky and outdated - it's an obvious win, as obvious as anything else the optimiser does - and you shouldn't need to opt-in to it with special APIs. That's not to mention all of the existing code which would automatically benefit from a smarter optimiser.

For the specific API pitched, my opinion is that we should first improve the optimiser, and then discuss whether there is still a need for additional APIs.

This is more interesting, particularly the part about wanting fully predictable behaviour. Obviously, letting developers control when stack allocation happens means there is a risk of stack overflow, which is generally undefined behaviour, and unlike buffer over-reads or use-after-frees, developers cannot easily predict when that will happen. Basically that would be VLAs.

If we did want to add APIs to support stack allocation beyond what an improved optimiser can provide, I think there might be room for an allocation function which attempts to allocate on the stack and returns an optional, making use of the fact that many platforms provide "checked" alternatives to alloca which notify the caller of the failure.

The actual functionality to "allocate on the stack if possible, otherwise go to the heap" should be an optimisation which all developers can rely on using the existing APIs.

xwu · September 16, 2021, 4:39pm

Personally, I’m skeptical that the language should create APIs to accommodate dialects we don’t recommend, particularly dialects premised on avoiding the usages we do recommend.

However, I do agree that it’s rather unwieldy—if we buy that the scalar use case exists—to be telling users to call an API to get a temporary buffer they won’t use. Character count aside, that’s a lot of conceptual noise that the user and any readers of the code would have to sift through for no benefit.

The counterpoint to that is, by not offering a scalar variant here, users are likewise prompted to ask the same question because—if I can use a buffer here, then why are there scalar variants of all these other unsafe pointer APIs? I don’t buy that drawing the line here when we do have all these micro-variants brings clarity rather than more confusion.

Chris_Lattner3 · September 16, 2021, 5:00pm

You are making a very strong statement here. Are you speaking for your personal opinion, a policy the core team has set down somewhere, or on behalf of someone else?

-Chris

Ben_Cohen · September 16, 2021, 5:11pm

There has been previous explicit core team guidance that ! is a legitimate part of the language, and not just a workaround from the days of unaudited C APIs that we should move towards deprecating.

It's my personal inference from this that if, despite this, some teams choose to ban !, we should not cater to that by adding trivially composable sugar we wouldn't otherwise add, just to spare them from writing if let x else { impossible }. This seems a fairly safe inference to make.
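For readers who have not seen the two spellings side by side, here is a small sketch of what is being compared. Both function names are hypothetical and exist only for illustration; the example uses an ordinary heap-allocated buffer so that it does not depend on the API under review.

```swift
// Spelling 1: force-unwrap. Traps if the stated invariant is ever violated.
func baseAddressWithBang(
    of buffer: UnsafeMutableBufferPointer<Int>
) -> UnsafeMutablePointer<Int> {
    buffer.baseAddress!
}

// Spelling 2: what a "no !" policy tends to produce. The else branch is
// unreachable for a non-empty buffer, so no test can exercise it.
func baseAddressWithoutBang(
    of buffer: UnsafeMutableBufferPointer<Int>
) -> UnsafeMutablePointer<Int> {
    guard let base = buffer.baseAddress else {
        fatalError("unreachable: buffer is non-empty")
    }
    return base
}

let buffer = UnsafeMutableBufferPointer<Int>.allocate(capacity: 1)
buffer.initialize(repeating: 7)

let withBang = baseAddressWithBang(of: buffer)
let withoutBang = baseAddressWithoutBang(of: buffer)
let sameAddress = (withBang == withoutBang)
let storedValue = withBang.pointee

// Int is a trivial type, so deallocating without deinitializing is fine here.
buffer.deallocate()
```

Both functions return the same pointer and trap in the same situation; the second only adds a branch that cannot be covered by a test.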

Ben_Cohen · September 16, 2021, 5:22pm

The criteria for additions to the standard library have been officially stated, yes. See John's acceptance post for SE-0255 for a recent example.

Now these things are subjective. One person's trivial sugar is another's major readability improvement. But I don't think it's credible to argue this is a key improvement for fluency, and it certainly doesn't match criteria such as "complex, challenging, or error-prone for users to implement themselves" or "have substantial performance advantages over a user implementation".

blangmuir · September 16, 2021, 5:23pm

In the current PR you have this as a frontend compiler option; if we wanted to say this customization is "supported" it would need to be a driver option as well, and if we want it to be usable by libraries it would also need to be added to SwiftPM as a target build setting, since versioned library packages cannot specify arbitrary compiler arguments.

It is unlikely that the caller will have sufficient additional information about the state of the program such that it can make better decisions about stack promotion than the compiler and/or Standard Library.

I think the opposite is true: if I'm writing code that makes many separate allocations I may need a smaller limit, and if I'm going to subdivide the allocation myself, or if I know my function will be called with a small stack below it, I may want a much larger limit than 1k. The compiler may be able to reverse-engineer that intent from my code, but avoiding unpredictability of compiler optimizations is why I would reach for this API in the first place.

I think it's probably good that the stdlib would provide a good default heuristic, but I still think we should provide caller control to override it, or at the very least document what the user can rely on from the builtin heuristic -- e.g. "the limit will be at least 1k subject to alignment requirements".

grynspan · September 16, 2021, 8:16pm

I talked to Ben briefly off-thread. An approach we can consider taking (but which is not formally part of this proposal) is to have three paths instead of two:

For small allocations (e.g. 1 KB or less), just allocate on the stack;
For medium-sized allocations (1 KB < n < 32 KB or something like that), consult the operating system* to determine if there is sufficient stack space available; and
For very large allocations, just go right to the heap because there's no way it's gonna work out for you.

Before folks start arguing about it: the sizes above are examples only and are not meant to be authoritative. Note also that the utility of the middle branch depends on how efficiently we can ask the operating system for the size of the stack. If it is an expensive operation or requires a heap allocation in and of itself, it's not worth doing. But if it's very cheap, we can effectively say "any allocation that fits on the stack will stack-allocate."
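The three paths can be sketched as a plain decision function. This is an illustration only, not the proposal's implementation: the thresholds reuse the example sizes from the post, the names `AllocationPath`, `choosePath` and `shouldUseStack` are hypothetical, and `remainingStack` is a stand-in closure for whatever operating system query would be used.

```swift
// Hypothetical names throughout; thresholds are the post's example values only.
enum AllocationPath: Equatable {
    case stack        // small: always stack-allocate
    case stackIfRoom  // medium: ask how much stack is left first
    case heap         // very large: go straight to the heap
}

let smallLimit = 1024        // 1 KB (example value)
let largeLimit = 32 * 1024   // 32 KB (example value)

func choosePath(byteCount: Int) -> AllocationPath {
    if byteCount <= smallLimit { return .stack }
    if byteCount < largeLimit { return .stackIfRoom }
    return .heap
}

// `remainingStack` is a closure so the possibly expensive query only runs
// on the middle path, which is the cost concern raised in the post.
func shouldUseStack(byteCount: Int, remainingStack: () -> Int) -> Bool {
    switch choosePath(byteCount: byteCount) {
    case .stack:
        return true
    case .stackIfRoom:
        return byteCount <= remainingStack()
    case .heap:
        return false
    }
}
```

For example, `shouldUseStack(byteCount: 4096, remainingStack: { 8192 })` takes the middle path and returns `true`, while the same request with only 1024 bytes of stack left returns `false`.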

Since the stack promotion heuristic is not formally part of this proposal, we can make these sorts of changes after landing a simpler implementation.

* Functions that inspect the current thread's stack are:

Apple: pthread_get_stacksize_np() and pthread_get_stackaddr_np()
Linux and BSD: pthread_getattr_np()
Windows: GetCurrentThreadStackLimits()

I have not attempted to profile these functions; I'm just noting they exist and could be used here.

David_Sweeris · September 16, 2021, 8:18pm

Is it, though? I mean, from your comments it's clearly just a hint in the current implementation, but should we consider strengthening that? Isn't one of Swift's goals to be able to be used for anything? If we consider this proposal a step towards helping bring Swift to "fully predictable" codebases (realtime, embedded, etc), would it be appropriate to throw an error if the compiler can prove the target arch doesn't support the required alignment? Would crashing be a reasonable course of action if it happens at runtime?

grynspan · September 16, 2021, 8:33pm

No existing allocating interfaces in Swift throw on allocation failure. Allocation is not considered a failable operation in Swift—if an allocation fails, the program halts and there is no opportunity to recover.

In fact, that's what will happen now: if an alignment is invalid (e.g. 999), the proposed functions will crash identically to how UnsafeMutableRawBufferPointer.allocate(byteCount:alignment:) crashes. But that could change if that function (or rather, swift_slowAlloc()) is modified to be more lenient.

David_Sweeris · September 16, 2021, 8:44pm

Sorry, my mistake, I meant "throw a compiler error"... Should it be a compile-time error to do this:

let align = 3
withUnsafeUninitializedMutableRawBufferPointer(byteCount: 42, alignment: align) { ... }

when the compiler can prove to itself that align isn't a valid alignment for the target arch?
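Nothing in the thread provides such a compile-time diagnostic. As a runtime stand-in, a caller could check the alignment before making the call. The sketch below assumes that "valid" means a positive power of two, which is the documented requirement of `UnsafeMutableRawBufferPointer.allocate(byteCount:alignment:)`. It also assumes the Swift 5.6 name `withUnsafeTemporaryAllocation(byteCount:alignment:_:)` rather than the spelling used above, and `isValidAlignment` is a hypothetical helper.

```swift
// Hypothetical helper: true when `alignment` is a positive power of two.
func isValidAlignment(_ alignment: Int) -> Bool {
    alignment > 0 && (alignment & (alignment - 1)) == 0
}

let align = 8
precondition(isValidAlignment(align), "alignment must be a power of two")

// Assumes Swift 5.6 or later. The closure receives an uninitialized raw
// buffer of the requested size that is only valid inside the closure.
let allocatedByteCount = withUnsafeTemporaryAllocation(
    byteCount: 42,
    alignment: align
) { buffer in
    buffer.count
}
```

With this check, the values mentioned in the thread, 3 and 999, are rejected before any allocation is attempted.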

Edit: forgot the 2nd part...

Then I've misunderstood... I thought the question was about guaranteeing stack allocation vs heap allocation. There are systems where the timing is tight enough that triggering an unexpected heap allocation could be considered a logical error even when it might not be a syntactical error (much like out-of-bounds indexing on an array).

grynspan · September 21, 2021, 9:19pm

I did a bit of digging and it turns out there is precedent for a simple less-than-or-equal heuristic. Windows' _malloca() function behaves in exactly this manner.

I'm working on updating my implementation branch to include a stdlib call (when supported, and for larger allocations only) that checks the state of the current thread's stack and only heap-allocates if there's genuinely not enough room for the allocation on the stack. However, the value of this function is ultimately going to depend on how expensive it is to check the stack's bounds: if it's more expensive than calling UnsafeMutablePointer.allocate(capacity:) then it's not worth doing.

Joe_Groff · October 1, 2021, 11:03pm

Thanks for all of your input, everyone! The Core Team has decided to accept this proposal with modifications to the proposed names.