SE-0447: Span: Safe Access to Contiguous Storage

lorentey · October 1, 2024, 1:50am

I think that would be a mistake.

First of all, I don't think we should get to have second thoughts on the names of fundamental types of the stdlib, and we should especially not start messing with unsafe constructs. (The last time we did something like that was SE-0370. I deeply believe that the drawbacks/disruption of the assign → update API changes vastly outweighed their minor benefits.)

More interestingly though, my position is that naming these new types "spans" does not actually break with precedent -- the new constructs are different enough that they deserve a clean separation, even in terminology.

I don't think I've seen this spelt out anywhere, but the stdlib appears to consistently use the term "buffer" to mean a region of memory with unspecified (i.e., client-managed) initialization state. UnsafeBufferPointer and ManagedBuffer/ManagedBufferPointer do not (in general*) make type-level assumptions about which of their slots contain initialized values -- managing that is entirely outside their mandate, and they expect code that uses them to do custom bookkeeping to prevent misuse. This makes these constructs crucial for building efficient data structures (each of which comes with a well-defined, but incompatible, way of performing that bookkeeping), but it also covers them in a thick layer of ambiguity and danger that makes them wildly inappropriate as a way to exchange or transfer data. I look at the existing types as primarily useful for representing "storage", or data in stasis.

* UnsafeBufferPointer's Collection conformance is a bit of an outlier -- that conformance is only valid if the buffer is fully initialized. This unenforced requirement matches the similarly unenforced/unchecked initialization requirements on most mutable buffer pointer operations, though -- it's just unusual because it is on an entire conformance, not on individual operations.
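To make that bookkeeping concrete, here is a minimal sketch built around a hypothetical `PrefixInitializedStorage` type (an illustrative name, not a stdlib API). The `UnsafeMutableBufferPointer` itself knows nothing about which slots hold values; the client tracks an initialized prefix in a separate `count`, and only that prefix may safely be viewed as a `Collection`.

```swift
// Hypothetical type, for illustration only.
struct PrefixInitializedStorage {
    // Raw storage: slots [0, count) are initialized, [count, capacity) are not.
    let buffer: UnsafeMutableBufferPointer<String>
    // The bookkeeping lives here, outside the buffer, in client code.
    private(set) var count = 0

    init(capacity: Int) {
        buffer = .allocate(capacity: capacity)
    }

    mutating func append(_ value: String) {
        precondition(count < buffer.count, "storage is full")
        // The slot is uninitialized, so it must be initialized, not assigned.
        buffer.initializeElement(at: count, to: value)
        count += 1
    }

    // Only the initialized prefix may be used through the Collection conformance.
    var initializedPart: UnsafeBufferPointer<String> {
        UnsafeBufferPointer(rebasing: buffer[..<count])
    }

    // Must be called exactly once: deinitializes the prefix, then frees the memory.
    func destroy() {
        _ = buffer.baseAddress?.deinitialize(count: count)
        buffer.deallocate()
    }
}

var storage = PrefixInitializedStorage(capacity: 4)
storage.append("a")
storage.append("b")
// Iterating `storage.buffer` directly would read uninitialized slots 2 and 3.
let joined = storage.initializedPart.joined(separator: ",")
storage.destroy()
```

Nothing in the buffer's type prevents iterating the whole allocation; the invariant exists only in the wrapper's code.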

In contrast, the Span type we propose here -- as well as the MutableSpan and OutputSpan variants we've provisionally named -- all come with very clear expectations/guarantees on the initialization state of the region of memory they're representing. These requirements make these types much narrower than our standard buffer types; in exchange, our expectation is that spans will become the standard way to uniformly represent data in motion -- data that is being accessed, or copied/moved/consumed.

Spans tame the unstructured storage buffers by imposing order over (parts of) them. I'd like to continue referring to the "storage buffer" of a Deque to mean the raw slab of its allocated memory, containing both initialized and uninitialized parts. But I'd also like to start talking about exposing the "spans" of a Deque, as in providing direct access to its (potentially discontiguous) initialized parts.
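A hypothetical ring buffer illustrates the distinction. In this sketch (`RingStorage` is an illustrative name, not the swift-collections `Deque` implementation), `buffer` plays the role of the storage buffer, while `initializedRegions` exposes the at most two contiguous initialized parts, which are the pieces one would hand out as spans.

```swift
// Hypothetical Deque-like storage, for illustration only.
struct RingStorage<Element> {
    let buffer: UnsafeMutableBufferPointer<Element>  // the whole "storage buffer"
    var head = 0   // slot holding the first element
    var count = 0  // number of initialized slots, starting at `head` and wrapping

    init(capacity: Int) {
        precondition(capacity > 0)
        buffer = .allocate(capacity: capacity)
    }

    mutating func append(_ value: Element) {
        precondition(count < buffer.count, "storage is full")
        buffer.initializeElement(at: (head + count) % buffer.count, to: value)
        count += 1
    }

    @discardableResult
    mutating func removeFirst() -> Element {
        precondition(count > 0, "storage is empty")
        let value = buffer.moveElement(from: head)  // leaves the slot uninitialized
        head = (head + 1) % buffer.count
        count -= 1
        return value
    }

    // The initialized elements as (up to) two contiguous regions, in order.
    var initializedRegions: (UnsafeBufferPointer<Element>, UnsafeBufferPointer<Element>) {
        let firstEnd = min(head + count, buffer.count)
        let wrappedCount = count - (firstEnd - head)
        return (UnsafeBufferPointer(rebasing: buffer[head..<firstEnd]),
                UnsafeBufferPointer(rebasing: buffer[0..<wrappedCount]))
    }

    mutating func destroy() {
        while count > 0 { removeFirst() }
        buffer.deallocate()
    }
}

var ring = RingStorage<Int>(capacity: 4)
for i in 1...4 { ring.append(i) }
ring.removeFirst()
ring.removeFirst()
ring.append(5)  // wraps around to slot 0
ring.append(6)  // slot 1
let (front, back) = ring.initializedRegions
let frontItems = Array(front)
let backItems = Array(back)
ring.destroy()
```

Here the storage buffer is one slab of four slots, but its contents are only meaningful as the two ordered regions `[3, 4]` and `[5, 6]`.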

I'd expect a hypothetical UnsafeSpan and UnsafeMutableSpan to still come with the expectation that their memory must always be fully initialized. (And I'd imagine an UnsafeOutputSpan to have exactly as many items initialized at its start as its count.) In particular, I would not expect an UnsafeMutableSpan type to come with anything like UnsafeMutableBufferPointer's deinitializeElement(at:) operation, as it would violate the type's crucial invariants. The Unsafe prefix would imply the lack of bounds checking in production builds, the possibility of the span outliving its owner, or both.
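One way to picture the output-span invariant is a minimal sketch using a hypothetical `OutputSpanSketch` type and `collect` helper (illustrative names; this is not the provisional `OutputSpan` API). Slots `0..<count` are always initialized and the rest never are, because the only mutations are appending at the end and removing from the end. Since it wraps an unsafe buffer, nothing stops it from outliving its memory, which is the kind of hazard an Unsafe prefix would flag.

```swift
// Hypothetical type: invariant is "exactly slots [0, count) are initialized".
struct OutputSpanSketch<Element> {
    private let buffer: UnsafeMutableBufferPointer<Element>  // fully uninitialized on entry
    private(set) var count = 0

    init(uninitialized buffer: UnsafeMutableBufferPointer<Element>) {
        self.buffer = buffer
    }

    var freeCapacity: Int { buffer.count - count }

    mutating func append(_ value: Element) {
        precondition(count < buffer.count, "output span is full")
        buffer.initializeElement(at: count, to: value)
        count += 1
    }

    // Deinitializing is only possible at the end, which preserves the invariant.
    @discardableResult
    mutating func removeLast() -> Element {
        precondition(count > 0, "output span is empty")
        count -= 1
        return buffer.moveElement(from: count)
    }
}

// Hypothetical helper: lets `body` fill fresh storage, then moves the results out.
func collect<Element>(capacity: Int, _ body: (inout OutputSpanSketch<Element>) -> Void) -> [Element] {
    let storage = UnsafeMutableBufferPointer<Element>.allocate(capacity: capacity)
    defer { storage.deallocate() }
    var output = OutputSpanSketch(uninitialized: storage)
    body(&output)
    // Thanks to the invariant, exactly the first `output.count` slots hold values.
    let initialized = UnsafeMutableBufferPointer(rebasing: storage[..<output.count])
    return initialized.indices.map { initialized.moveElement(from: $0) }
}

let squares: [Int] = collect(capacity: 5) { (out: inout OutputSpanSketch<Int>) in
    var i = 1
    while out.freeCapacity > 0 {
        out.append(i * i)
        i += 1
    }
}
let trimmed: [String] = collect(capacity: 3) { (out: inout OutputSpanSketch<String>) in
    out.append("x")
    out.append("y")
    out.removeLast()
}
```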

On the other hand, I'd expect a (safe) Buffer or BufferPointer type to come with some sort of built-in way to keep track of the initialization state of its slots. (We did consider introducing such a type, using an out of line bitmap of initialized slots. I think such a type would be of marginal utility at best: the need to allocate/maintain the bitmap makes it wildly impractical outside the rare cases that already expect one, such as certain forms of hash tables.)
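For comparison, here is a minimal sketch of such a self-tracking buffer, assuming a hypothetical `TrackedBuffer` class with a plain `[Bool]` standing in for the bitmap. Every access consults a second, separate allocation, which hints at why the idea looks impractical outside the cases that already need one.

```swift
// Hypothetical safe buffer that tracks initialization per slot.
final class TrackedBuffer<Element> {
    private let slots: UnsafeMutableBufferPointer<Element>
    // Out-of-line record of which slots are initialized ([Bool] for simplicity).
    private var isInitialized: [Bool]

    init(capacity: Int) {
        slots = .allocate(capacity: capacity)
        isInitialized = Array(repeating: false, count: capacity)
    }

    deinit {
        for i in slots.indices where isInitialized[i] {
            slots.deinitializeElement(at: i)
        }
        slots.deallocate()
    }

    func initialize(at index: Int, to value: Element) {
        precondition(!isInitialized[index], "slot already initialized")
        slots.initializeElement(at: index, to: value)
        isInitialized[index] = true
    }

    func remove(at index: Int) -> Element {
        precondition(isInitialized[index], "slot not initialized")
        isInitialized[index] = false
        return slots.moveElement(from: index)
    }

    subscript(index: Int) -> Element {
        precondition(isInitialized[index], "slot not initialized")
        return slots[index]
    }
}

let table = TrackedBuffer<String>(capacity: 8)
table.initialize(at: 5, to: "five")
table.initialize(at: 2, to: "two")
let removed = table.remove(at: 5)
```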

I think the "span" vs "buffer" terminological distinction is legitimate -- I consider these words to mean different things, each of which is important enough to deserve a name.

I fully agree with this! Luckily, the bare minimum API that this proposal offers does not include "great new APIs for accessing the elements of a Span" that don't also exist on UnsafeBufferPointer. It merely provides a couple of subscripts (very similar to UBP), and it engages in the same sort of desperate handwaving about how they work as SE-0437 did about UnsafeBufferPointer's subscript.

Citations

(SE-0437:) Note that the generalized indexing subscript cannot provide a regular getter, as that would work by returning a copy of the item -- so the Standard Library currently has to resort to an unstable/unsafe language feature to provide direct borrowing access. (This isn't new, as we previously relied on this scheme to optimize performance; but its use now becomes unavoidable. Defining a stable language feature to implement such accessors is expected to be a topic of a future proposal.)

(SE-0447:) Note that we use a _read accessor for the subscript, a requirement in order to yield a borrowed non-copyable Element (see "Coroutines".) This will be updated to a final syntax at a later time, understanding that we intend the replacement to be source-compatible.

As I noted earlier, we are now certain that the coroutine based _read accessor does not in fact have the right semantics for these subscripts; stabilizing it may be useful elsewhere, but it will not help Span (nor UnsafeBufferPointer). The fact is that we do not currently have the means to properly implement these subscripts. When it becomes possible to do so, I do expect Unsafe*BufferPointer and Span to both immediately adopt whatever solution we come up with. (If the solution is stable enough to be proposed through Swift Evolution, I expect that proposal to formally require these subscripts to adopt it.)

Span's proposed unchecked subscript is new; it arises from the strong desire for the default span subscript to be fully checked, even in production builds. I think it would very much be desirable for UnsafeBufferPointer to follow the same model; unfortunately, changing how much validation its existing subscript performs would be a massively disruptive change, and we appear to have no appetite for it at this time. (The worry, a reasonable one, is that enabling bounds checking would lead to a sudden, significant performance drop throughout all Swift code that gets recompiled with the new stdlib.)
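The checked-by-default model can be sketched with a hypothetical `SpanSketch` wrapper (not the stdlib `Span`, and without its lifetime guarantees), assuming its memory is fully initialized. The unlabeled subscript validates the index with `precondition`, so it stays checked in both debug and `-O` builds, while an explicitly labeled `unchecked:` subscript forwards straight to `UnsafeBufferPointer`, whose own subscript is not bounds-checked in optimized builds.

```swift
// Hypothetical stand-in for Span, for illustration only.
struct SpanSketch<Element> {
    private let base: UnsafeBufferPointer<Element>  // assumed fully initialized

    init(_ base: UnsafeBufferPointer<Element>) { self.base = base }

    var count: Int { base.count }

    // Default: always bounds-checked.
    subscript(position: Int) -> Element {
        precondition(position >= 0 && position < base.count, "index out of bounds")
        return base[position]
    }

    // Opt-in: the caller promises that `position` is in bounds.
    subscript(unchecked position: Int) -> Element {
        base[position]
    }
}

let numbers = [10, 20, 30]
let (first, total) = numbers.withUnsafeBufferPointer { buffer -> (Int, Int) in
    let span = SpanSketch(buffer)
    var sum = 0
    // The loop range already guarantees valid indices, so skipping the check is justified.
    for i in 0..<span.count { sum += span[unchecked: i] }
    return (span[0], sum)
}
```

The label makes each unchecked access visible at the call site, whereas an unlabeled unchecked subscript (as on UnsafeBufferPointer) looks identical to a checked one.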

We can still consider adding an explicitly unchecked subscript to the Unsafe*BufferPointer types; however, given that the existing unlabeled subscript is already unchecked, this would be more misleading than useful.