[Pitch] Box

+1 to simply box.value. Simple, natural, and unsurprising.

27 Likes

I can imagine three kinds of reasons one would want to use a Box:

  1. To implement a recursive tree-like data structure.
  2. To prevent structs from taking too much space on the stack.
  3. To store pointers to a heap allocation while being able to move the Box itself, in order to implement a self-referential data structure.

The first case is (or should be) already served by indirect enum cases. Arguably, the second case is better served by indirect stored properties (and variables), which would allow for copy-on-write optimization (at least for copyable types), and would allow the compiler to merge multiple indirect stored properties into a single heap allocation.
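As a minimal sketch of the first case, here is a recursive tree built only from an indirect enum, with no Box type involved. The names Tree, inserting, and sorted are made up for illustration and do not come from the pitch.

```swift
// Every case of an indirect enum is stored behind a compiler-managed heap
// allocation, which is what lets the type refer to itself.
indirect enum Tree<Element: Comparable> {
    case leaf
    case node(left: Tree, value: Element, right: Tree)

    // Returns a new tree that also contains `newValue`.
    // Untouched subtrees are shared with the original tree, not rebuilt.
    func inserting(_ newValue: Element) -> Tree {
        switch self {
        case .leaf:
            return .node(left: .leaf, value: newValue, right: .leaf)
        case let .node(left, value, right):
            if newValue < value {
                return .node(left: left.inserting(newValue), value: value, right: right)
            } else {
                return .node(left: left, value: value, right: right.inserting(newValue))
            }
        }
    }

    // In-order traversal yields the elements in ascending order.
    var sorted: [Element] {
        switch self {
        case .leaf:
            return []
        case let .node(left, value, right):
            return left.sorted + [value] + right.sorted
        }
    }
}

let tree = [3, 1, 2].reduce(Tree<Int>.leaf) { $0.inserting($1) }
print(tree.sorted) // [1, 2, 3]
```

This works today only because Tree is copyable; extending the same mechanism to noncopyable payloads is the part that is still open.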

The third case is impossible to do in current Swift, because it's impossible to get a pointer without binding the lifetime of that pointer to some (borrowing or mutating) access of the value, unless the pointer was manually allocated.

More subtly, the third case requires weaker aliasing rules than the first two cases. With the first two cases, presumably, we'd like to guarantee that exclusive access to the Box (or noncopyable indirect property) implies exclusive access to the heap allocation. That allows for better optimizations, but it also breaks the "stable address" semantics required for the third case, because any pointer to the heap allocation will be invalidated by mutating or moving the Box.

The Box type in Rust has the stronger aliasing rules which accommodate the first and second cases, but preclude the third case by breaking "stable address" semantics. That's unfortunate, because people who want the third case have to manually use raw pointers instead. Or they could just ignore the issue and have their code exhibit subtle undefined behavior, which is easy because the aliasing rules aren't well-documented. Some parts of the ecosystem have ended up doing that in practice, such as the owning_ref library formerly used in the Rust compiler. Fixing the problem (without splitting Box into two different types) would likely require some kind of compromise.[1]

Thoughts on self-referential data structures

With Rust's ownership system, it's notoriously difficult to implement self-referential data structures without either failing the borrow checker or breaking Rust's aliasing rules.[2] Unsafely-implemented self-referential data structures are widely used in async Rust, and the end result is that many of those data structures, including the stackless coroutines generated by the Rust compiler, exhibit undefined behavior under the normal aliasing rules.

Currently, the Rust compiler (and Miri, a tool to catch undefined behavior at runtime) just "turns off" the normal aliasing rules for what it thinks are self-referential data structures.[3] There are proposals to introduce an official way to opt out of the normal aliasing rules, such as an UnsafePinned or UnsafeAliasedCell marker type.[4]

Given that unsafely-implemented self-referential data structures have caused so many problems for Rust, I think it's a bad idea to repeat their mistake by encouraging them in Swift. Currently, Swift does a good job at discouraging people from breaking the aliasing rules by only exposing pointers through closure-based APIs like withUnsafePointer. In the future, we could consider extending the ownership system to accommodate self-referential data structures in a safe way. I'm not sure exactly what that could look like, but we could consider using some of the ideas in region-based isolation, which already allows the compiler to reason about (a different kind of) aliasing.
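To illustrate the closure-based style mentioned above, here is a small sketch using the standard library's withUnsafeMutablePointer(to:_:). The function name incrementInPlace is made up for this example.

```swift
// The pointer is only valid inside the closure, for the duration of the
// mutating access to `value`; letting it escape is undefined behavior.
func incrementInPlace(_ value: inout Int) -> Int {
    return withUnsafeMutablePointer(to: &value) { pointer -> Int in
        pointer.pointee += 1
        return pointer.pointee
    }
}

var counter = 41
let result = incrementInPlace(&counter)
print(result, counter) // 42 42
```

Because the pointer's lifetime is tied to the closure, there is no supported way to stash it in a stored property, which is exactly what a self-referential structure would need.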


  1. Some relevant discussion of the issue: "What are the uniqueness guarantees of Box and Vec?" (Issue #326, rust-lang/unsafe-code-guidelines); https://github.com/Kimundi/owning-ref-rs/issues/49

  2. More specifically, the aliasing rules of Stacked Borrows and/or Tree Borrows, the most prominent proposed formal semantics for Rust.

  3. Specifically, it looks at which types opt out of the Unpin trait as a heuristic.

  4. Some relevant discussion: "3467-unsafe-pinned" (The Rust RFC Book); "Tracking Issue for RFC 3467: UnsafePinned" (Issue #125735, rust-lang/rust); intrusive.md (GitHub)

3 Likes

Assuming this is a temporary state of affairs, it seems unfortunate to introduce a new standard library type which will then be subsumed by a built-in language feature in the future.

14 Likes

There’s a lot to like about this, though as @Joe_Groff mentioned, there are a ton of existing class-based implementations called Box in the Swift ecosystem (I’ve written a few myself), which IMO is enough of a reason not to call it Box.

I was actually surprised while reading this proposal to discover that it was for a noncopyable type – I just assumed its semantics would match those of the pre-existing implementations. I’m tolerably familiar with Rust, so I know that its Box type is also noncopyable. But that’s the default in Rust, so I wouldn’t consider that to be evidence that we should use the same name in Swift (Rust’s String is also noncopyable, and we consider that difference to be unproblematic).

I would suggest UniqueBox, but Unique would also be fine. Even NoncopyableBox would be better than plain Box from my perspective.

4 Likes

Because the standard library does not ship a Heap data structure, I don't think naming this new type HeapBox would be a great idea: engineers new to Swift would come here looking for an efficient heap-based priority queue.

6 Likes

It should have conditional Copyable conformance and preserve value semantics using CoW. Unfortunately, unless things have changed, you can’t use indirect enums to do most of that work automatically, but I could be wrong.

It should also be called HeapAllocated<T>, as that is the stated purpose.

2 Likes

This requires reference counting, which would go against the point of the proposal. It may be an indication that the proposed type needs a more specific name, though.
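For contrast, here is a rough sketch of a uniquely owned box that needs no reference count. It is not the pitched implementation: the name UniqueBox and its API are assumptions, and the payload is kept Copyable so the example builds without noncopyable generics.

```swift
// Hypothetical stand-in for the pitched type. Because the struct is
// noncopyable, there is always exactly one owner, so a deinit can free
// the allocation without any reference counting.
struct UniqueBox<Value>: ~Copyable {
    private let pointer: UnsafeMutablePointer<Value>

    init(_ value: Value) {
        pointer = UnsafeMutablePointer<Value>.allocate(capacity: 1)
        pointer.initialize(to: value)
    }

    var value: Value {
        get { pointer.pointee }
        // The setter is mutating, so writing requires exclusive access to the box.
        set { pointer.pointee = newValue }
    }

    deinit {
        pointer.deinitialize(count: 1)
        pointer.deallocate()
    }
}

func uniqueBoxDemo() -> Int {
    var box = UniqueBox(41)
    box.value += 1
    // `let moved = box` would consume `box`; using `box` afterwards is a compile error.
    return box.value
}

print(uniqueBoxDemo()) // 42
```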

3 Likes

A conditionally Copyable struct cannot have a deinit, which would preclude it from being implemented as currently proposed. Box could instead be implemented as a struct wrapping a reference to a class, but this would incur the time and space overhead of reference counting even for a noncopyable payload.
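The class-backed alternative described here looks roughly like the following sketch, which is also the shape of many existing ecosystem Box types. The names CoWBox and Storage are illustrative only.

```swift
// Reference-counted heap storage for the payload.
final class Storage<Value> {
    var value: Value
    init(_ value: Value) { self.value = value }
}

// A copyable box with value semantics: copies share storage until one of
// them is written to (copy-on-write).
struct CoWBox<Value> {
    private var storage: Storage<Value>

    init(_ value: Value) {
        storage = Storage(value)
    }

    var value: Value {
        get { storage.value }
        set {
            if isKnownUniquelyReferenced(&storage) {
                // Sole owner: mutate in place.
                storage.value = newValue
            } else {
                // Shared: allocate fresh storage so other copies are unaffected.
                storage = Storage(newValue)
            }
        }
    }
}

let original = CoWBox([1, 2])
var copy = original
copy.value.append(3)
print(original.value, copy.value) // [1, 2] [1, 2, 3]
```

Every copy and destruction of a CoWBox touches the reference count, which is the overhead the post refers to.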

3 Likes

I don't think this is a problem, since this is a noncopyable type. Noncopyable types are COW types ;)

I was thinking in terms of allocation overhead—the same thing Dave was bringing up, from the other direction. indirect enum Foo { … } would behave differently from enum Bar { … } + Indirect<Bar>.

This fits an essential niche and a need, so the question is not whether but how.

As an outsider, I’m surprised at the unnecessary obfuscation of Box<T> and box[], given the narrow use case here and the high traffic of a stdlib type.

For the type, any name that specifically indicates possibly non-copyable unique heap storage would be better than using one of the many analogies for wrapping, which invites confusing comparison to all the other wrappers.

The confusion is for people, not compilers. The compiler might tell my Box from your Box (or mistakenly assume the stdlib Box for a missing import), but the reader has to distinguish this from other Boxes. And it’s not really just a type (like String or Array), but a new kind of storage (more like Atomic).

(A terse name of art like this might be warranted as the basis for a projected family of related types, more for composability than clarity).

For access to the value via box[], no one loves the name rawValue, but every Swift developer understands it to be the closely-wrapped element.

For the pitch proposal motivation, it would be helpful to have a table of Swift storage classes and their features, to show where a unique reference fits and anticipate where others might evolve. Similarly, it would be nice to know how this plays with other protocols and usages. E.g., could a Box for an AtomicRepresentable itself be AtomicRepresentable? Can we address the value with a KeyPath? Could (need?) ObjectIdentifier be extended to this? How does one get the runtime type or size of the Box vs. the runtime type of the rawValue? Can this type be distinguished from others syntactically for macro purposes? The box value can be another box or struct with transitive boxes. Is there anything developers need to understand about deinit in that case?

3 Likes

I would love for a symbolic shorthand like [] to become commonplace for de-referencing a pointer. .pointee just melts into the surrounding property accesses, but when I’m working with unsafe pointers I would really like to know when I’m crossing allocation boundaries. Since Box is effectively a SafePointer (ManagedPointer?), I think it makes sense to use the same spelling for both. The LSG might have opinions on whether this is the vehicle for that change, or if it should be proposed separately.
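As a sketch of what that spelling could look like on existing pointers, Swift already accepts a subscript with no parameters. The extension below is hypothetical and not part of the standard library or the pitch.

```swift
extension UnsafeMutablePointer {
    // Hypothetical shorthand: `p[]` reads or writes the pointee.
    subscript() -> Pointee {
        get { pointee }
        nonmutating set { pointee = newValue }
    }
}

func dereferenceDemo() -> Int {
    let p = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    p.initialize(to: 41)
    defer {
        p.deinitialize(count: 1)
        p.deallocate()
    }
    p[] += 1      // crosses the allocation boundary visibly
    return p[]
}

print(dereferenceDemo()) // 42
```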

8 Likes

I was confused by this for a while. I suppose it should have the semantics of { borrow, mutate }, right?

1 Like

An advantage of extending indirect enum cases to noncopyable types, I think, is that it would preserve the ability to pattern match behind the heap allocation. It's currently impossible to extract a struct property inside a pattern, and adding that ability for computed properties like Box.value/Box.pointee/Box.subscript() while still checking exhaustiveness would be a nontrivial change.
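A small sketch of what pattern matching behind the heap allocation means with today's (copyable) indirect enums. The Expr type and the simplify and evaluate functions are made up for illustration.

```swift
indirect enum Expr {
    case number(Int)
    case negate(Expr)
    case add(Expr, Expr)
}

// The first pattern looks through two levels of indirection at once, and
// the compiler still checks that the switch is exhaustive.
func simplify(_ expr: Expr) -> Expr {
    switch expr {
    case .negate(.negate(let inner)):
        return simplify(inner)
    case .negate(let inner):
        return .negate(simplify(inner))
    case .add(let lhs, let rhs):
        return .add(simplify(lhs), simplify(rhs))
    case .number:
        return expr
    }
}

func evaluate(_ expr: Expr) -> Int {
    switch expr {
    case .number(let n):
        return n
    case .negate(let inner):
        return -evaluate(inner)
    case .add(let lhs, let rhs):
        return evaluate(lhs) + evaluate(rhs)
    }
}

let simplified = simplify(.negate(.negate(.number(7))))
print(evaluate(simplified)) // 7
```

With a Box<Expr> payload instead, the nested pattern in the first case would have to go through a computed property, which patterns cannot do today.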

1 Like

I find the terms Box, Unique, UniqueBox, etc. quite vague. I would suggest something along the lines of UniqueAllocation or UniquePointer. Also, as others have pointed out, the empty square brackets are quite cryptic and non-Swifty. My current preference would be to declare the type as a property wrapper and reference its value through the projectedValue syntax.

6 Likes

Me, too.

The name, not the Box, is hiding in the description.

Pointer

7 Likes

Ingenious. Given this feature is meant to be a safe and error proof alternative to:

3 Likes

How are mutations handled? mutableSpan seems to use a mutating get.
Ideally this would only need exclusive access though.

Would it also make sense to have non-mutable version of Box similar to how we have Span and MutableSpan?

What about Sendability of this type? It seems to me that if Value is Sendable Box could be Sendable as well as mutations should require exclusive access.

2 Likes

Can you please describe your use case in more detail? I can think of a number of reasons why you might want to do this, but they all come with various quirks that I don't think are explored well in this proposal. I think your goal is to emulate std::unique_ptr from C++ or Box from Rust, but they operate in languages that are very different than Swift, and they have different guarantees to boot.

In both Rust and C++, you can use their "heap" types to hold what is effectively any SomeProtocol. Obviously, this sees little use in Swift, because you can use any SomeProtocol, and the language can figure things out. This is a special case of not knowing the size of the type. But again, in Swift this is generally not a problem: either the size of the type is known, or it is an indirected type that is heap allocated by the compiler behind the scenes. So the only other "standard" use I can imagine that matches up is that you have a type that is very large and of known size and you want to guarantee the compiler does not copy it or overflow the stack. I feel like this is the case you are talking about, because your example uses InlineArray, which is the type that this would obviously be a problem for. If that is the case, then my feeling on this is that you don't really want a generic wrapper type at all, but an InlineArray that goes on the heap. Which is not really an InlineArray, but just an Array with a compile time size, and without ARC traffic because you intend to make it noncopyable. I don't think this looks like a Box at all but rather a special type that you should probably pitch instead.

Why am I against making a more general solution? Because I think this misses a lot of things that people actually want. Is your Box guaranteed to have a fixed address? There are…not a lot, but definitely a few use cases that could use such a thing. But first you'd have to promise this, and you haven't done so here, so I'm not sure it's on the table. And even then, Swift does not have the tools to do interesting things with such a type. Like, let's say that I want to roll my own efficient lock:

struct Lock<T>: ~Copyable {
    // One heap allocation holds both the lock and the value it protects.
    let storage: Box<(os_unfair_lock, T)>
}

Note that this lets me merge the allocations for both T and the os_unfair_lock (which needs a stable address), which is nice. Of course, this type is not Copyable, nor should it be because it contains a lock and it would be bad™ to allow it to be copied. However, it is a very common operation to want to share this object! I put a mutex in there for a reason. In Rust or C++ you can make a shared pointer/Arc to this noncopyable type and then freely pass it around, and use the lock to protect its internals. In Swift there is no type that does this, and I don't think there exist facilities to even do this efficiently. So I would argue that if you are trying to make a general type, it is too early to do that because the languages we are trying to emulate here offer facilities that Swift does not provide to make the type generally useful.
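For comparison, the usual way to share a lock-protected value in Swift today is to make the whole thing a class, roughly as sketched below. This is not the merged-allocation design described above: it uses Foundation's NSLock rather than os_unfair_lock, it pays for reference counting, and the name SharedLocked is made up.

```swift
import Foundation

// A class gives the lock a stable address and lets many owners share it,
// at the cost of reference counting on every copy of the reference.
final class SharedLocked<Value>: @unchecked Sendable {
    private let lock = NSLock()
    private var value: Value

    init(_ value: Value) {
        self.value = value
    }

    // Runs `body` with exclusive access to the protected value.
    func withLock<Result>(_ body: (inout Value) throws -> Result) rethrows -> Result {
        lock.lock()
        defer { lock.unlock() }
        return try body(&value)
    }
}

let shared = SharedLocked(0)
let alias = shared                 // a second reference to the same lock and value
alias.withLock { $0 += 1 }
print(shared.withLock { $0 })      // 1
```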


Also, completely unrelated to my overall thoughts, I think that if we are going to bikeshed I am not a huge fan of prototyping new sugar to types that don't exist yet, are intentionally designed to see little use, and are highly specialized. The concept of Deref is pretty neat, but it definitely should not be part of a proposal for Box. Instead I would prefer a Box with a "normal" API (if I had to pick, I would go for HeapAllocated and .value), and then either people cry out for nicer syntax for this specific thing or we decide to unify the concept of "getting the value of this wrapper thing" and can have this type implement it–both at a future date.

11 Likes

The few times I made my own Box type similar to the proposed one would have been covered by indirect support for struct properties.

2 Likes