SE-0322: Temporary Uninitialized Buffers

lukasa · September 8, 2021, 3:34pm

I'm also a +1 on the following counts:

yes, we should do this, it's important
yes, we should nest the names under the relevant types as static functions
yes, we should replace the to: label with boundTo: or as: to avoid the awkward analogy to withUnsafePointer(to:).

benrimmington · September 8, 2021, 4:17pm

Ephemeral pointers are produced by implicit conversions, so a different method name would be better.

Temporary Pointers educational note.
@_nonEphemeral attribute.

beccadax · September 8, 2021, 6:00pm

Excellent proposal in all functional respects.

If we're bikeshedding, I question whether it's really helpful to give the full name of the pointer type, either in the base name or as an enclosing type. And I think we could reduce the impact of having multiple free functions by making them three overloads of the same base name:

withUnsafeTemporaryAllocation(of:_:)             // UnsafeMutablePointer<T>
withUnsafeTemporaryAllocation(of:capacity:_:)    // UnsafeMutableBufferPointer<T>
withUnsafeTemporaryAllocation(ofByteCount:_:)    // UnsafeMutableRawBufferPointer

Yes, that means you can't allocate multiple elements of a type without getting a buffer pointer. The UnsafeMutablePointer is only a property access away.
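
To illustrate the "one property access away" point, here is a minimal sketch. The helper withHeapScratch is a hypothetical stand-in, not the proposed function: it always allocates on the heap, whereas the proposal's version may be promoted to the stack.

```swift
// Hypothetical heap-backed stand-in for the suggested (of:capacity:_:) overload.
// The proposal's version may use the stack; this sketch always uses the heap.
func withHeapScratch<T, R>(
  of type: T.Type,
  capacity: Int,
  _ body: (UnsafeMutableBufferPointer<T>) throws -> R
) rethrows -> R {
  let buffer = UnsafeMutableBufferPointer<T>.allocate(capacity: capacity)
  defer { buffer.deallocate() }
  return try body(buffer)
}

let scratchValue = withHeapScratch(of: Int.self, capacity: 1) { buffer -> Int in
  // baseAddress is non-nil here: the buffer came from allocate(capacity:)
  // with a nonzero capacity.
  let pointer: UnsafeMutablePointer<Int> = buffer.baseAddress!
  pointer.initialize(to: 42)
  defer { pointer.deinitialize(count: 1) }
  return pointer.pointee
}
```

The single-element case costs one extra baseAddress access compared with an overload that hands over an UnsafeMutablePointer directly.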

For what it's worth, DefaultStringInterpolation has a similar inlined heuristic (it assumes that each interpolation will add two characters on average—a number that's deliberately low to avoid unnecessarily opting out of the small string representation), and I've never regretted it. Having a heuristic that can be optimized out is extremely valuable.

grynspan · September 8, 2021, 6:34pm

Sounds like there's a strong desire here to make these functions members of the various pointer types. Consensus is good!

We are definitely bikeshedding.

beccadax:

withUnsafeTemporaryAllocation(of:_:)             // UnsafeMutablePointer<T>
withUnsafeTemporaryAllocation(of:capacity:_:)    // UnsafeMutableBufferPointer<T>
withUnsafeTemporaryAllocation(ofByteCount:_:)    // UnsafeMutableRawBufferPointer

What about these names instead?

UnsafeMutablePointer<T>.withTemporaryMemoryBound(to:_:)
UnsafeMutableBufferPointer<T>.withTemporaryMemoryBound(to:capacity:_:)
UnsafeMutableRawBufferPointer.withTemporaryMemory(byteCount:alignment:_:)

These names mimic both allocate(capacity:) and withMemoryBound(to:_:) (and their equivalents); they're members of their relevant pointer types; and they don't use stdlib jargon like "ephemeral."

The downside of using member functions is that they generally require that the pointer type be named, which makes them somewhat wiiiiiiiide. I don't think that's avoidable without making them free functions again. That's an argument in favour of some sort of pointer shorthand/syntactic sugar as much as anything else.

This constraint (i.e. to get more than one you go through a buffer pointer) is consistent with my initial proposal.

lorentey · September 8, 2021, 7:53pm

grynspan:

What about these names instead?

UnsafeMutablePointer<T>.withTemporaryMemoryBound(to:_:)
UnsafeMutableBufferPointer<T>.withTemporaryMemoryBound(to:capacity:_:)
UnsafeMutableRawBufferPointer.withTemporaryMemory(byteCount:alignment:_:)

These work for me!

I don't think we need to optimize for brevity -- these are niche functions, ideally only used in low-level adapter layers.

grynspan · September 8, 2021, 7:57pm

It's certainly not a showstopper, but Swift loves brevity. Oh well—the types themselves are at issue here, not the stuff I'm adding. I'm not making it worse.

scanon · September 8, 2021, 8:11pm

I'm a strong +1 on this. It's a useful feature that we're missing. I do not like the names at all, but I think the bikeshedding here is already moving in a positive direction. In particular, the type produced by these is already an Unsafe....Buffer, so I don't feel the need to replicate that in the API. I like @beccadax's suggestions quite a bit, and I would like the name to include "uninitialized", and "temporary" or "scratchpad" or similar.

I worry a little bit about ending up with multiple nested closures for these in some cases, but it's not a big concern. I would like to have the raw buffer version as well so I can allocate a block of scratch memory and then bind regions to different types as needed.
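
A rough sketch of that raw-buffer use case, assuming a heap-backed stand-in named withScratchBytes (a hypothetical name, not the proposed function): one block of scratch memory is allocated, and two regions of it are bound to different types.

```swift
// Hypothetical heap-backed stand-in for the raw-buffer variant.
func withScratchBytes<R>(
  byteCount: Int,
  alignment: Int,
  _ body: (UnsafeMutableRawBufferPointer) throws -> R
) rethrows -> R {
  let buffer = UnsafeMutableRawBufferPointer.allocate(
    byteCount: byteCount, alignment: alignment)
  defer { buffer.deallocate() }
  return try body(buffer)
}

// Layout: four Int32 values followed by two Double values.
// The Double region starts at byte 16, which satisfies Double's alignment.
let doublesOffset = 4 * MemoryLayout<Int32>.stride
let totalBytes = doublesOffset + 2 * MemoryLayout<Double>.stride

let regionSum = withScratchBytes(
  byteCount: totalBytes,
  alignment: MemoryLayout<Double>.alignment
) { raw -> Double in
  let base = raw.baseAddress!
  // Bind each region to its own element type.
  let ints = base.bindMemory(to: Int32.self, capacity: 4)
  let doubles = (base + doublesOffset).bindMemory(to: Double.self, capacity: 2)

  for i in 0..<4 { (ints + i).initialize(to: Int32(i + 1)) }  // 1, 2, 3, 4
  doubles.initialize(to: 0.5)
  (doubles + 1).initialize(to: 1.5)

  var total = 0.0
  for i in 0..<4 { total += Double(ints[i]) }
  return total + doubles[0] + doubles[1]
}
```

Both element types are trivial, so the sketch skips deinitialization; non-trivial types would need it before the closure returns.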

From a formal specification perspective, I think that the actual evolution proposal could simply not (and maybe should not) mention stack vs. heap at all, since there's no notion of "stack" in the abstract Swift model, IIRC. It's an important implementation detail, but not part of the semantic specification of the operation in the Swift model. That said, it is important, so I'm not particularly opposed to having that section in.

grynspan · September 8, 2021, 8:18pm

What do you think of my counter-proposal?

It's really hard to justify the proposed functions without first admitting that the stack exists and is faster to allocate to than the heap. So I don't think I can remove that part of the proposal. But if you have suggestions for improvements to those sections, please do let me know.

scanon · September 8, 2021, 8:27pm

An improvement on the starting point, but too verbose for my taste, because in practice I don't think one will be able to elide the type very often. Left to my own devices, I would probably use something like:

withUninitializedWorkspace(of:capacity:_:) // UMBP<T>
withUninitializedWorkspace(byteCount:alignment:_:) // UMRBP

but those are maybe too terse for the standard library. (I slightly dislike "temporary", because it isn't; it's scoped. I will not make a big deal out of this, however.)

I think the scoped lifetime alone justifies them (and is precisely what allows the promotion to stack memory). But I definitely see where you're coming from.

1-877-547-7272 · September 8, 2021, 8:28pm

If we're going to use type methods, then we don't need to use a metatype to specify the type to bind the memory to. I also prefer the term "temporary allocation" over "temporary memory". So the API would probably be better as

UnsafeMutablePointer<Pointee>.withTemporaryAllocation(_:)
UnsafeMutableBufferPointer<Pointee>.withTemporaryAllocation(capacity:_:)
UnsafeMutableRawBufferPointer.withTemporaryAllocation(byteCount:alignment:_:)

That being said, I prefer @beccadax's proposed interface as it's much more clear than the type method approach. I personally don't see the benefit of using type methods over global functions — as @beccadax pointed out, the pointer type's name isn't really necessary for these functions.

beccadax · September 8, 2021, 11:02pm

grynspan:

beccadax:

withUnsafeTemporaryAllocation(of:_:)             // UnsafeMutablePointer<T>
withUnsafeTemporaryAllocation(of:capacity:_:)    // UnsafeMutableBufferPointer<T>
withUnsafeTemporaryAllocation(ofByteCount:_:)    // UnsafeMutableRawBufferPointer

What about these names instead?

UnsafeMutablePointer<T>.withTemporaryMemoryBound(to:_:)
UnsafeMutableBufferPointer<T>.withTemporaryMemoryBound(to:capacity:_:)
UnsafeMutableRawBufferPointer.withTemporaryMemory(byteCount:alignment:_:)

These names mimic both allocate(capacity:) and withMemoryBound(to:_:) (and their equivalents); they're members of their relevant pointer types; and they don't use stdlib jargon like "ephemeral."

I think these are better than with<#PointerType#>, since they include words (TemporaryMemory) that describe the actual purpose of the call, rather than leaving that implied. I also agree that Bound(to: is a better way to explain the meaning of the type-pinning argument than of: (although I might consider using boundTo: instead). But I still don't like having to name the base type.

To me, the relevant difference between this method and allocate(capacity:) is that, because allocate(capacity:) returns Self, the base type can often be inferred:

takeBuffer(.allocate(capacity: n))

And even in the cases where it cannot, you would have had to write the type somewhere else in the expression anyway:

let myBuffer = UnsafeMutableBufferPointer<T>.allocate(capacity: n)
let myBuffer: UnsafeMutableBufferPointer<T> = .allocate(capacity: n)

But with the APIs you're proposing, the base type will not appear in a return value—it will appear in a function argument's parameter list. It's very uncommon for these to have explicit types, and even when they do, you can't infer backwards from them to the base type.
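
The contrast can be sketched as follows. Both takeBuffer and the static withScratch member are hypothetical, and withScratch is heap-backed purely for illustration.

```swift
// Hypothetical consumer that takes ownership of the buffer it is given.
func takeBuffer(_ buffer: UnsafeMutableBufferPointer<Int>) -> Int {
  defer { buffer.deallocate() }
  return buffer.count
}

// Return position: the base type is inferred from takeBuffer's parameter type.
let inferredCount = takeBuffer(.allocate(capacity: 4))

// Hypothetical member-style with-function, heap-backed for illustration.
extension UnsafeMutableBufferPointer {
  static func withScratch<R>(
    capacity: Int,
    _ body: (Self) throws -> R
  ) rethrows -> R {
    let buffer = Self.allocate(capacity: capacity)
    defer { buffer.deallocate() }
    return try body(buffer)
  }
}

// Closure-parameter position: nothing to infer from, so the base type
// (including its generic argument) has to be spelled out.
let spelledOutCount = UnsafeMutableBufferPointer<Int>.withScratch(capacity: 4) {
  $0.count
}
```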

This would not be an issue if naming the base type was helping to explain what the code does, but I would argue that it doesn't. Consider this call site:

UnsafeMutableBufferPointer.withTemporaryMemoryBound(to: Int.self, capacity: n) {
    ...
}

What does each part of the function call tell you?

with: This function matches the "with-function pattern", meaning it grants its body parameter access to some resource that is valid for the duration of the call to body. To an experienced Swift developer, this implies a ton of stuff about how to correctly use it. Verdict: Useful
Memory or Allocation: The resource being granted to the body is a memory allocation. (I like Allocation better than Memory because the latter is vague about what this memory is.) Verdict: Useful
Temporary: The memory allocation is different from a normal one in that it is meant for temporary use. This is a way of implying that it is a stack allocation without actually promising that it's a stack allocation. Verdict: Useful
Bound: The temporary memory will be bound to a type. Verdict: Useful
to: The next parameter is the type the memory will be bound to. Verdict: Useful
Int: The type it will be bound to is Int. Verdict: Useful
self: Syntactic duct tape used to avoid ambiguities between type syntax and literal syntax. Verdict: Not useful, but required by a language rule
capacity: The memory will have room for several elements, and the number of elements follows. Verdict: Useful
n: The number of elements to allocate memory for. Verdict: Useful

So all of those words either convey important information about the call, or are required by the language. But what about the base type?

Pointer: This call will involve a pointer. But isn't that obvious from the fact that it's allocating memory? Verdict: Not useful
Buffer: This call will be able to handle more than one element. But isn't that implied by the fact that we're passing a capacity? Verdict: Not useful
Mutable: The pointer will be able to mutate the pointee. But if the call is merely allocating memory without initializing it, wouldn't the API be useless if it didn't use a mutable pointer? Verdict: Not useful
Unsafe: Clearly states that misuse of this API can violate memory- or type-safety. Swift generally tries to make sure this is stated explicitly so that even developers who are not familiar with the API in question will recognize the potential for a safety violation. Verdict: Useful

Note as well that we would not want to eliminate the words that those three words are redundant with, because those other words are more specific.

So that's why I think these functions should not be nested. Keeping the global namespace clean is nice, but I don't think it's worth cluttering the call sites as much as it would. Sharing a base name keeps it at least conceptually about as clean as nesting them would, but avoids reducing clarity at use sites.

Incorporating the refinements I agree with, here's what I end up favoring:

withUnsafeTemporaryAllocation(boundTo:_:)
withUnsafeTemporaryAllocation(boundTo:capacity:_:)
withUnsafeTemporaryAllocation(byteCount:alignment:_:)

I also think withUnsafeTemporaryMemory(...) would be fine, although I don't like it quite as much.

grynspan · September 8, 2021, 11:16pm

Heh, that's absolutely true.

The more I think about it, the more I prefer exposing this functionality using free functions (with some variation of the names I suggested.) I hear folks' concerns about the number of withUnsafeXXX free functions in the standard library but I'm not convinced that that's a good enough reason to make these member functions (i.e. making them much longer at call sites.) So:

func withUnsafeTemporaryAllocationBound<T, R>(
  to type: T.Type,
  _ body: (UnsafeMutablePointer<T>) throws -> R) rethrows -> R

func withUnsafeTemporaryAllocationBound<T, R>(
  to type: T.Type,
  capacity: Int,
  _ body: (UnsafeMutableBufferPointer<T>) throws -> R) rethrows -> R

func withUnsafeTemporaryAllocation<R>(
  byteCount: Int,
  alignment: Int,
  _ body: (UnsafeMutableRawBufferPointer) throws -> R) rethrows -> R

(Note the parameter labels I'm using here match existing precedent in the standard library.)
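
As a sketch of how the throws/rethrows shape behaves at call sites, here is a heap-backed stand-in with the same shape as the buffer signature. The name heapBackedAllocationBound and the ScratchError type are assumptions for illustration only.

```swift
struct ScratchError: Error {}

// Hypothetical stand-in; always uses the heap.
func heapBackedAllocationBound<T, R>(
  to type: T.Type,
  capacity: Int,
  _ body: (UnsafeMutableBufferPointer<T>) throws -> R
) rethrows -> R {
  let buffer = UnsafeMutableBufferPointer<T>.allocate(capacity: capacity)
  defer { buffer.deallocate() }  // runs on both the return and the throw path
  return try body(buffer)
}

// A non-throwing body needs no `try` at the call site.
let filledCount = heapBackedAllocationBound(to: UInt8.self, capacity: 16) {
  buffer -> Int in
  buffer.initialize(repeating: 0)
  return buffer.count
}

// A throwing body propagates its error to the caller.
var caughtScratchError = false
do {
  try heapBackedAllocationBound(to: UInt8.self, capacity: 16) {
    (_: UnsafeMutableBufferPointer<UInt8>) -> Void in
    throw ScratchError()
  }
} catch {
  caughtScratchError = error is ScratchError
}
```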

There's value in a discussion about the proliferation of unsafe functions, but I think that discussion may come down to wanting to namespace "unsafe" mechanisms in some way (enum Unsafe, @unsafe, import Unsafe, something like that.) That's well beyond the scope of this proposal though.

And then, just as I was about to click the "Reply" button, I saw @beccadax's reply:

beccadax:

Incorporating the refinements I agree with, here's what I end up favoring:

withUnsafeTemporaryAllocation(boundTo:_:)
withUnsafeTemporaryAllocation(boundTo:capacity:_:)
withUnsafeTemporaryAllocation(byteCount:alignment:_:)

I also think withUnsafeTemporaryMemory(...) would be fine, although I don't like it quite as much.

I think us coming up with such similar proposals is a good sign. Note I said Bound(to: instead of (boundTo: because the former matches existing methods on the pointer types even though they're not quite right in re the Swift API guidelines. I'd prefer consistency with existing symbols here. withUnsafeTemporaryMemoryBound(...) would more closely match the existing withMemoryBound(...) method at the cost of less closely matching allocate(...).

1-877-547-7272 · September 9, 2021, 4:21am

I still prefer TemporaryAllocation over TemporaryMemory and I prefer (boundTo: over Bound(to:.

While I believe that consistency is important within libraries, I also believe that too much consistency between different constructs can cause confusion. I think the names withUnsafeTemporaryMemoryBound(to:_:) and withMemoryRebound(to:capacity:_:) are too similar with regard to their difference in functionality.

I also think withUnsafeTemporaryAllocationBound(to:_:) emphasizes the memory-binding part too much. withUnsafeTemporaryAllocation(boundTo:_:), on the other hand, makes it more clear that binding memory is just part of the process of creating the pointer/buffer. (It also avoids polluting the global namespace more since it uses the same symbol as the raw buffer version, but I don't think that's a very strong argument in favor of it — it's just a nice side effect.)

woolsweater · September 9, 2021, 4:30am

Excellent proposal; this will be a great addition to the library. I also vote for static methods rather than free functions, but have no strong opinion on the exact naming.

benlings · September 9, 2021, 9:46am

I think ‘scoped’ makes the lifetime of the temporary buffer clearer. The docs could mention that this scoping allows for stack allocation as an implementation detail.

adam-fowler · September 9, 2021, 10:09am

+1 from me.
This fills a hole in Swift and does it in a way consistent with other Swift APIs.
I prefer nesting these inside the relevant Unsafe...Pointer types.

xwu · September 9, 2021, 11:01am

I agree. This should definitely be boundTo.

The comparison to withMemoryRebound is spurious because that method could not be named withMemory: it doesn’t allocate any new memory and only rebinds, so that verb must be part of the base name there. Here, in contrast, the point of the method is that it’s working “with a temporary allocation” (or “with temporary memory”).

Moreover, “bound” can be misread as a noun when split from “to” (and indeed we use the term in the Swift standard library as a noun); “rebound” does not have that problem.

grynspan · September 9, 2021, 4:10pm

I see people feel very passionately about the naming of the proposed functions! We ultimately need to name them something though and I don't think there will be a set of names that pleases everyone. Sorry in advance to those of you who don't favour the names we eventually end up with.

scanon · September 9, 2021, 10:55pm

I know the bikeshedding of names drags on a bit, but what if we simply do something like:

withUnsafeBuffer<T>(capacity:_:)

I don't think it's necessary for this API to mention "binding" at all, and this more directly mirrors the rest of the Unsafe...BufferPointer API.

I could see adding "Temporary" or "Scoped", but I don't think that they're needed, since the actual buffer passed to the closure isn't accessible outside the closure anyway; the scoping is implicit in the API.

This closely mirrors withUnsafeUninitializedBuffer(...) from the pitch, which I think is my favorite other proposal by a pretty wide margin.

1-877-547-7272 · September 9, 2021, 11:26pm

The boundTo: parameter is helpful in resolving the generic T type. (You can’t directly specify generic type parameters in a function call — this functionality has been deemed confusing and unnecessary.) It means that instead of this code:

withUnsafeBuffer(capacity: c) { (p: UnsafeMutableBufferPointer<SomeType>) in
    // use the buffer (p)
}

we can write this code:

withUnsafeBuffer(boundTo: SomeType.self, capacity: c) {
    // use the buffer ($0)
}

which is more readable. This also has precedent in the withMemoryRebound function.
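
For anyone who wants to try both spellings, here is a self-contained sketch. The withDemoBuffer overloads are hypothetical heap-backed stand-ins, not the proposed functions.

```swift
// Hypothetical stand-in without a metatype parameter:
// T can only be inferred from the closure's parameter type.
func withDemoBuffer<T, R>(
  capacity: Int,
  _ body: (UnsafeMutableBufferPointer<T>) throws -> R
) rethrows -> R {
  let buffer = UnsafeMutableBufferPointer<T>.allocate(capacity: capacity)
  defer { buffer.deallocate() }
  return try body(buffer)
}

// Hypothetical stand-in with a metatype parameter:
// T is fixed by the first argument.
func withDemoBuffer<T, R>(
  boundTo type: T.Type,
  capacity: Int,
  _ body: (UnsafeMutableBufferPointer<T>) throws -> R
) rethrows -> R {
  return try withDemoBuffer(capacity: capacity, body)
}

// Without the metatype, the closure must annotate its parameter.
let annotatedCount = withDemoBuffer(capacity: 3) {
  (p: UnsafeMutableBufferPointer<UInt16>) in
  p.count
}

// With the metatype, shorthand arguments work.
let metatypeCount = withDemoBuffer(boundTo: UInt16.self, capacity: 3) { $0.count }
```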

I think withUnsafeTemporaryAllocation is still a better name for these functions than withUnsafeBuffer. The former is more specific about what the buffer is without sacrificing clarity, while the latter is unspecific.