SE-0447: Span: Safe Access to Contiguous Storage

Douglas_Gregor · September 26, 2024, 8:54pm

There is a vision document coming soon where I'd love to discuss these ideas further. However, let's try to keep the discussion here focused on Span.

Doug

jrose · September 27, 2024, 3:21am

Future Directions discussion of slice types

Span does not own its storage and there is no concern about leaking larger allocations. It would benefit from being its own slice type.

I don’t think this can work unless the Index type is not Int. There are three useful properties of slicing:

1. Indexes can be shared across slices and the base collection.
2. The slice type is the same type as the base type.
3. The index type is Int and starts at 0.

Unfortunately, you can only pick two of them. Collection requires (1), and I don’t think the hypothetical BorrowingCollection would change that. Array picked (3) as well, and Data picked (2), and the result is Data is harder to use correctly.

Span doesn’t even have the option to pick (1) and (2) with an Int Index, since it is just a pointer-length pair. So it can follow UnsafeBufferPointer and pick (1) and (3), or it can carve its own path by not making Int the Index type. @lorentey has actually worked out that the Index can be a wrapper around UnsafePointer to satisfy (1) and (2), and still provide integer subscripts as an overload. It’s just the Collection operations that wouldn’t produce integers.
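To make the trade-off concrete, here is a small sketch using only existing standard library types. The function name sliceStartIndices is made up for illustration; it reports where a slice begins under three strategies: ArraySlice, Slice<UnsafeBufferPointer>, and a rebased UnsafeBufferPointer.

```swift
// Illustrative helper (not from the proposal): returns the start index of
// a slice taken three different ways.
func sliceStartIndices() -> [Int] {
    let numbers = [10, 20, 30, 40, 50]

    // ArraySlice shares indices with its base Array, so it does not start at 0.
    let tail = numbers[2...]
    assert(tail[tail.startIndex] == numbers[2])

    return numbers.withUnsafeBufferPointer { buffer in
        // Slicing a buffer pointer yields Slice<UnsafeBufferPointer<Int>>:
        // a different type that keeps the base buffer's indices.
        let slice = buffer[2...]
        // Rebasing gives back the base type, but indices restart at 0,
        // so they are no longer shared with the original buffer.
        let rebased = UnsafeBufferPointer(rebasing: slice)
        assert(rebased[0] == slice[2])
        return [tail.startIndex, slice.startIndex, rebased.startIndex]
    }
}
```

The first two results keep property (1) at the cost of a separate slice type; the rebased buffer keeps (2) and (3) but gives up shared indices.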

lorentey · September 27, 2024, 10:01pm

Re: Indices, slice types

Span does not own its storage and there is no concern about leaking larger allocations. It would benefit from being its own slice type.

Note that this sentence is in the Appendix; it summarizes some of the (many, many) alternative designs @glessard implemented. This section serves as a rationale for the proposed design. It details options that we carefully evaluated and found unworkable; it does not present anything that we'd still want to explore in the future. (Certainly not in context of any span types -- including variants such as MutableSpan and OutputSpan.)

Indeed -- after exhaustively exploring the available options, we decided that there was no practical way to make self-slicing work.

The idea of Index wrapping an UnsafePointer was particularly attractive; unfortunately it generally allows indices to fall "in between" valid elements, and Span would need to perform expensive validation to rule those out, on every access. We went through several rounds of (increasingly desperate) refinements/compromises to make this idea work, but despite our efforts, to our surprise we ultimately had to conclude that the best option overall is to follow UnsafeBufferPointer's preexisting design.
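As a rough illustration of the validation cost mentioned above, the sketch below checks whether a raw address names the start of an element in a buffer. The helper isValidElementAddress and the probe function are hypothetical and are not part of the proposal or of any explored design; they only show the kind of range and alignment test a pointer-based index would need on each access.

```swift
// Hypothetical check for a pointer-based index (an assumption for
// illustration, not the proposed design).
func isValidElementAddress<T>(
    _ address: UnsafeRawPointer,
    in buffer: UnsafeBufferPointer<T>
) -> Bool {
    guard let base = buffer.baseAddress else { return false }
    // Byte distance from the start of the buffer to the candidate address.
    let byteOffset = UnsafeRawPointer(base).distance(to: address)
    let stride = MemoryLayout<T>.stride
    // Must be in range AND land exactly on an element boundary.
    return byteOffset >= 0
        && byteOffset < buffer.count * stride
        && byteOffset % stride == 0
}

func probe() -> [Bool] {
    let values: [Int64] = [1, 2, 3]
    return values.withUnsafeBufferPointer { buffer in
        let base = UnsafeRawPointer(buffer.baseAddress!)
        return [
            isValidElementAddress(base.advanced(by: 8), in: buffer),  // second element
            isValidElementAddress(base.advanced(by: 4), in: buffer),  // "in between"
            isValidElementAddress(base.advanced(by: 24), in: buffer), // one past the end
        ]
    }
}
```

An integer offset only needs the range comparison; the remainder operation is the extra work a pointer index would add.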

I did not expect that we'd end up adopting UnsafeBufferPointer's indexing model as is, but I cannot argue with these results -- we tried pretty much every other alternative, and nothing came close.

(I still think UnsafePointer would have been a better choice for UnsafeBufferPointer.Index, but for a 100% safe container like Span, offsets from the start are the way to go.

And I find that the extracting methods in SE-0437 provide a good enough alternative to self-slicing for these non-owning, quasi/fully referential container types.)

CharlesS · September 28, 2024, 6:40pm

I'll make one more plea that we include some kind of polyfill for this that can be deployed back to earlier macOS, iOS, etc. versions. If it was possible for Concurrency and its associated types, this seems like it should easily be less difficult than that was.

Otherwise, it's going to be years before a lot of us can use any of this...

Datagram · September 29, 2024, 3:26am

Will Span fill the need for FixedArray in Swift?

Will Span be used to represent imported fixed-size array fields from C instead of tuples?

glessard · September 29, 2024, 4:10am

No, because Span doesn’t own the storage it represents. There are other plans for an owning container to supersede homogeneous tuples (the integer generic parameters pitch is a stepping stone). It should be possible to use a Span to pass a Swift-based fixed array to C, though.

Jnosh · September 29, 2024, 2:01pm

I am broadly in favor of the direction outlined in this proposal.

I think Spans will be a useful tool that I intend to make use of.

I am in favor of Span as a name. I have some previous familiarity with it via C++, and while it may not be self-explanatory to someone unfamiliar with it, it is short and concise, and I would expect it to be easy to pick up after coming across it for the first time and reading related documentation or proposals.

I would mirror some of the concerns raised with parts of the proposed API that may not be entirely fleshed out yet, in particular @ktoso had a very good overview.
I don't have strong opinions as to the best approach but I wouldn't mind if easily severable parts were postponed to future proposals.
Postponing the addition of Span entirely until after lifetime dependencies have been proposed or accepted is also something I would be fine with; I am more immediately interested in ~Escapable and lifetime dependencies generally.

I have previously read some of the related pitches and read this and the SE-0446 proposal for this review.

Douglas_Gregor · September 30, 2024, 6:58pm

(Dons review manager hat)

They're proposed now to gather feedback on the shape of the API, most of which is independent of the deep details of lifetimes. That's happening here, which is good. When the API "ships" is a separate matter.

(Dons personal-opinions hat)

I don't agree with your assessment above. The span provided to the closure has a lifetime tied to the execution of that closure's body: this describes the semantics that the with-style functions have had since Swift 1.0, and persists into newer APIs like withTaskGroup, despite not having a way to express this in the language. It also describes how arguments of non-@escaping function type work in practice.
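For readers less familiar with the with-style pattern, here is a minimal sketch using the existing withUnsafeBufferPointer. The function name total(of:) is made up for illustration. The buffer is meant to be used only during the closure body, and only the computed value leaves the scope; today that rule is a documented convention rather than something the compiler enforces.

```swift
// Illustrative only: the buffer's validity is scoped to the closure body.
func total(of values: [Int]) -> Int {
    values.withUnsafeBufferPointer { buffer in
        // `buffer` must not be stored or returned; we return only the sum.
        buffer.reduce(0, +)
    }
}
```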

With an expressive system to describe lifetime dependencies, one can invent other lifetimes for higher-order functions. We should not do anything here that would invalidate the simple description above, because it would undermine progressive disclosure in the language and create an unnecessary rift between existing Swift APIs like withUnsafeBufferPointer and the newer APIs proposed as part of this work on lifetimes. If the obvious lifetime semantics of withSpan don't work with the more general lifetime dependencies, it's the lifetime dependencies proposal that should be adjusted. @Joe_Groff discusses lifetimes and the higher-order generalizations more in SE-0446: Nonescapable Types - #28 by Joe_Groff.

I'd actually take this one step further: I think we should consider adding withSpan to the standard library. It provides clear scoping for the lifetime of the span, fits in with existing APIs cleanly, and offers a direct safe replacement to a swath of heavily-used unsafe APIs. As much as I'd like to use a property solution (span or storage or whatever we call it), it is a very different model to what we've had in Swift for a decade and it pushes developers further down the path of having to understand the lifetime model than withSpan.

Doug

John_McCall · September 30, 2024, 7:42pm

I think any differences between the API of this type and that of UnsafeBufferPointer should be clearly justified.

If we think that Span is a great name for this concept and clearly worth breaking with precedent over, then we should be proposing renaming UnsafeBufferPointer to UnsafeSpan and so on. We can separately decide whether to deprecate the old name.

Similarly, if there are great new APIs for accessing the elements of a Span, those APIs should also generally be available on UnsafeBufferPointer; this proposal should not be a vehicle for offering a better API to only Span. Conversely, every API on UBP should also be available on Span unless it would clearly undermine the safety story of Span. I understand that we can't yet make Span conform to Collection, and that's a fair justification for some amount of missing API; that should clearly be a long-term goal of this effort, however.

lorentey · October 1, 2024, 1:50am

I think that would be a mistake.

First of all, I don't think we should get to have second thoughts on the names of fundamental types of the stdlib, and we should especially not start messing with unsafe constructs. (The last time we did something like that was SE-0370. I deeply believe that the drawbacks/disruption of the assign → update API changes vastly outweighed their minor benefits.)

More interestingly though, my position is that naming these new types "spans" does not actually break with precedent -- the new constructs are different enough that they deserve a clean separation, even in terminology.

I don't think I've seen this spelt out anywhere, but the stdlib appears to consistently use the term "buffer" to mean a region of memory with unspecified (i.e., client-managed) initialization state. UnsafeBufferPointer and ManagedBuffer/ManagedBufferPointer do not (in general*) make type-level assumptions on which of their slots contain initialized values -- managing that is entirely outside their mandate, and they expect code that uses them to do custom bookkeeping to prevent misuse. This makes these constructs crucial for building efficient data structures (each of which come with well defined, but incompatible ways of performing that bookkeeping), but it also covers them in a thick layer of ambiguity and danger that makes it wildly inappropriate to use them as a way to exchange or transfer data. I look at the existing types as primarily useful for representing "storage", or data in stasis.

* UnsafeBufferPointer's Collection conformance is a bit of an outlier -- that conformance is only valid if the buffer is fully initialized. This unenforced requirement matches the similarly unenforced/unchecked initialization requirements on most mutable buffer pointer operations, though -- it's just unusual because it is on an entire conformance, not on individual operations.
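A small sketch of what "client-managed initialization state" looks like in practice, using only existing standard library APIs. The function name fillPartially and the count variable are illustrative assumptions; the point is that the buffer itself has no idea which slots are live, so the caller must track that.

```swift
// Illustrative only: a partially initialized buffer with manual bookkeeping.
func fillPartially() -> [String] {
    // Four uninitialized slots; the buffer does not record which are live.
    let storage = UnsafeMutableBufferPointer<String>.allocate(capacity: 4)
    defer { storage.deallocate() }

    // Client-side bookkeeping: slots 0..<count are initialized.
    var count = 0
    for word in ["span", "buffer"] {
        (storage.baseAddress! + count).initialize(to: word)
        count += 1
    }

    // Only the initialized prefix may be read.
    let initialized = UnsafeMutableBufferPointer(
        start: storage.baseAddress, count: count)
    let result = Array(initialized)

    // Deinitialize exactly the slots we initialized before deallocating.
    storage.baseAddress!.deinitialize(count: count)
    return result
}
```

Reading slot 2 or 3 here would be undefined behavior, and nothing in the type prevents it.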

In contrast, the Span type we propose here -- as well as the MutableSpan and OutputSpan variants we've provisionally named -- all come with very clear expectations/guarantees on the initialization state of the region of memory they're representing. These requirements make these types much narrower than our standard buffer types; in exchange, our expectation is that spans will become the standard way to uniformly represent data in motion -- data that is being accessed, or copied/moved/consumed.

Spans tame the unstructured storage buffers by imposing order over (parts of) them. I'd like to continue referring to the "storage buffer" of a Deque to mean the raw slab of its allocated memory, containing both initialized and uninitialized parts. But I'd also like to start talking about exposing the "spans" of a Deque, as in providing direct access to its (potentially discontiguous) initialized parts.

I'd expect a hypothetical UnsafeSpan and UnsafeMutableSpan to still come with the expectation that their memory must always be fully initialized. (And I'd imagine an UnsafeOutputSpan to have exactly as many items initialized at its start as its count.) In particular, I would not expect an UnsafeMutableSpan type to come with anything like UnsafeMutableBufferPointer's deinitializeElement(at:) operation, as it would violate the type's crucial invariants. The Unsafe prefix would imply either the lack of bounds checking in production builds, and/or the possibility of the span surviving its owner.

On the other hand, I'd expect a (safe) Buffer or BufferPointer type to come with some sort of built-in way to keep track of the initialization state of its slots. (We did consider introducing such a type, using an out of line bitmap of initialized slots. I think such a type would be of marginal utility at best: the need to allocate/maintain the bitmap makes it wildly impractical outside the rare cases that already expect one, such as certain forms of hash tables.)

I think the "span" vs "buffer" terminological distinction is legitimate -- I consider these words to mean different things, each of which is important enough to deserve a name.

I fully agree with this! Luckily, the bare minimum API that this proposal offers does not include "great new APIs for accessing the elements of a Span" that don't also exist on UnsafeBufferPointer. It merely provides a couple of subscripts (very similar to UBP), and it engages in the same sort of desperate handwaving about how they work as SE-0437 did about UnsafeBufferPointer's subscript.

Citations

(SE-0437:) Note that the generalized indexing subscript cannot provide a regular getter, as that would work by returning a copy of the item -- so the Standard Library currently has to resort to an unstable/unsafe language feature to provide direct borrowing access. (This isn't new, as we previously relied on this scheme to optimize performance; but its use now becomes unavoidable. Defining a stable language feature to implement such accessors is expected to be a topic of a future proposal.)

(SE-0447:) Note that we use a _read accessor for the subscript, a requirement in order to yield a borrowed non-copyable Element (see "Coroutines".) This will be updated to a final syntax at a later time, understanding that we intend the replacement to be source-compatible.

As I noted earlier, we are now certain that the coroutine based _read accessor does not in fact have the right semantics for these subscripts; stabilizing it may be useful elsewhere, but it will not help Span (nor UnsafeBufferPointer). The fact is that we do not currently have the means to properly implement these subscripts. When it becomes possible to do so, I do expect Unsafe*BufferPointer and Span to both immediately adopt whatever solution we'll come up with. (If the solution will be stable enough to propose to Swift Evolution, I expect the document to formally require these subscripts to adopt it.)

Span's proposed unchecked subscript is new; it arises from the strong desire for the default span subscript to be fully checked, even in production builds. I think it would very much be desirable for UnsafeBufferPointer to follow the same model; unfortunately, changing how much validation its existing subscript performs would be a massively disruptive change, and we appear to have no appetite for it at this time. (The (reasonable) worry is that enabling bounds checking would lead to a sudden, significant performance drop throughout all Swift code that gets recompiled with the new stdlib.)
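The checked-by-default model can be sketched with a thin wrapper over UnsafeBufferPointer. CheckedView and readBoth are hypothetical names invented for this illustration; this is not how Span is implemented, and the wrapper returns copies of elements rather than borrowing them. It only shows the shape of an always-checked default subscript paired with an explicitly labeled unchecked one.

```swift
// Hypothetical wrapper (an assumption for illustration, not Span itself).
struct CheckedView<Element> {
    let base: UnsafeBufferPointer<Element>

    // Default subscript: bounds-checked in every build configuration,
    // because precondition is not removed in ordinary release builds.
    subscript(position: Int) -> Element {
        precondition(base.indices.contains(position), "Index out of bounds")
        return base[position]
    }

    // Opt-out: the caller takes responsibility for staying in bounds.
    subscript(unchecked position: Int) -> Element {
        base[position]
    }
}

func readBoth() -> [Int] {
    [7, 8, 9].withUnsafeBufferPointer { buffer in
        let view = CheckedView(base: buffer)
        return [view[1], view[unchecked: 2]]
    }
}
```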

We can still consider adding an explicitly unchecked subscript to the Unsafe*BufferPointer types; however, given that the existing unlabeled subscript is already unchecked, this would be more misleading than useful.

Andrew_Trick · October 1, 2024, 4:31am

Adding to Karoy's Span vs UnsafeBufferPointer distinctions... Span does away with strict aliasing, as would a hypothetical UnsafeSpan. For decoding use cases, it's convenient to be able to convert the element type:
Span<T: BitwiseCopyable>.unsafeView<U: BitwiseCopyable>(as type: U.Type) -> Span<U>
We should always encourage the least unsafe form of an operation. A direct conversion like this is far safer than forcing programmers to use more complicated and powerful unsafe APIs, which are easy to misuse.
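For comparison, this is roughly what such a decoding step looks like with today's raw-buffer APIs. The function name decodeUInt32s is made up, and the little-endian byte order is an assumption of this sketch; it relies on the existing withUnsafeBytes and loadUnaligned(fromByteOffset:as:) calls, not on the unsafeView signature above.

```swift
// Illustrative only: reinterpret a byte array as little-endian UInt32 values.
func decodeUInt32s(_ bytes: [UInt8]) -> [UInt32] {
    bytes.withUnsafeBytes { raw in
        let size = MemoryLayout<UInt32>.size
        // Trailing bytes that do not fill a whole UInt32 are ignored.
        let count = raw.count / size
        return (0..<count).map { i in
            // loadUnaligned avoids alignment requirements on the source bytes.
            let value = raw.loadUnaligned(fromByteOffset: i * size, as: UInt32.self)
            return UInt32(littleEndian: value)
        }
    }
}
```

Every UInt32 bit pattern is valid, so this particular conversion cannot produce an invalid value; the same is not true of every BitwiseCopyable type.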

The UnsafeBufferPointer family of types should be relegated to the C interop layer. The "pointer" terminology is a good way to associate a type with C. "Pointer" should not appear in any Swift types unless they are intended for modeling C pointers. By making it clear that pointer types are only intended for calls to C, it's more understandable that they have C-like semantics.

I also think it's a mistake to use the "Unsafe" prefix as an umbrella for multiple forms of unsafety. If we do have an UnsafeSpan, it should only actually be unsafe in the one way in which it is obviously unsafe: unenforced temporal lifetime. It could more accurately be called EscapableSpan.

lorentey · October 1, 2024, 7:38am

I of course second what you wrote, but this passage isn't right -- while C interop is indeed an important use case for Unsafe*Pointer (although classic C has no use for Unsafe*BufferPointer), it is not at all the only use case for these types. They have crucial use cases even in code that never interacts with C, and I expect many of those use cases to remain in place indefinitely, even after we've fully achieved every goal on the performance predictability roadmap.

In particular, I do not expect span types to ever be suitable to serve as the underlying storage representation of in-memory data structures. High-performance data structure implementations have a genuine need to be built on top of mutable buffers (in the stdlib sense) that don't try to dictate any particular initialization strategy over their storage slots. The Hypoarray illustration in SE-0437 (which I am hoping to pitch for real in the near(ish) future, provisionally renamed RigidArray) is a really basic example of the kind of UnsafeMutableBufferPointer use case that I do not expect to ever get replaced by spans. I also do not believe we'll figure out how to (usefully) build escapable types out of nonescapable parts; none of the possible approaches seem workable. (I expect there will also remain some legitimate general need for Swift code to allocate raw uninitialized memory, although it's fair to expect most of these will disappear if/when we manage to ship a rich enough selection of high-performance data structures.)

Importantly, these UnsafeBufferPointer use cases are hidden deep within the internal implementation details of high-performance data structure implementations (or other safe constructs); I do not expect they will ever need to rise to the level of their public API surface.

I do expect spans to entirely replace our current misuse of Unsafe*[Buffer]Pointer in public APIs such as Sequence.withUnsafeContiguousStorageIfAvailable or Array.init(unsafeUninitializedCapacity:initializingWith:).

xwu · October 1, 2024, 2:28pm

Mmm, I'm curious about your thinking particularly on the latter. Earlier you said the key invariant of Span types is that their memory is always fully initialized, but of course the purpose of Array(unsafeUninitializedCapacity:initializingWith:) is to get you to that point from uninitialized memory.
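For context, here is how the existing initializer is used today. The function name squares(upTo:) is illustrative. The closure receives an uninitialized UnsafeMutableBufferPointer and must report how many leading slots it initialized, which is exactly the state a fully initialized span type cannot describe.

```swift
// Illustrative only: filling an Array's uninitialized storage in place.
func squares(upTo n: Int) -> [Int] {
    Array(unsafeUninitializedCapacity: n) { buffer, initializedCount in
        for i in 0..<n {
            // Each slot starts uninitialized; initialize it exactly once.
            (buffer.baseAddress! + i).initialize(to: i * i)
        }
        // Tell Array how many leading elements are now valid.
        initializedCount = n
    }
}
```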

Nobody1707 · October 1, 2024, 3:21pm

I still think this API should require AnyBitPattern and not just BitwiseCopyable. Bool is BitwiseCopyable, but creating one from 0x02..<0xFF is instant UB and that's a bit too unsafe even for an unsafeView.

ksluder · October 1, 2024, 3:31pm

The same could be said about any BitwiseCopyable type that maintains invariants.

Nobody1707 · October 1, 2024, 3:33pm

Yes, but most other BitwiseCopyable types with invariants only have UB when you perform certain operations on them after the invariants are broken. In the case of Bool (or any other enum), merely existing in an invalid state is UB.

ksluder · October 1, 2024, 3:37pm

That would imply this API could never be used with a struct that contained a Bool stored property. I don’t think that’s tenable.

Nobody1707 · October 1, 2024, 3:42pm

That's not entirely true, you would just have to make sure the bytes corresponding to the Bool were valid first. Also, you can already hit this UB by rebinding the buffer from withUnsafe{Mutable}Bytes(of:_:) or with unsafeBitCast(_:to:). This is already something you should not be doing.
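One way to read "make sure the bytes were valid first" is to validate the byte before it is ever treated as a Bool. The helper decodeFlag below is a hypothetical name for illustration; the assumption that only 0 and 1 are acceptable encodings follows the values discussed in this thread.

```swift
// Illustrative only: never reinterpret an arbitrary byte as Bool directly.
func decodeFlag(_ byte: UInt8) -> Bool? {
    switch byte {
    case 0: return false
    case 1: return true
    default: return nil  // Any other bit pattern is rejected, not reinterpreted.
    }
}
```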

Andrew_Trick · October 1, 2024, 3:48pm

Good point. It's not worth introducing an UninitializedSpan type just to handle some implementation details that very few people will encounter. I'm mainly concerned with types that appear in public API.

ksluder · October 1, 2024, 3:53pm

I would appreciate @Andrew_Trick’s input on whether the existence of memory bound to Bool having a value other than 0x0 or 0x1 is undefined behavior. If that’s true, is Bool the only type with this very restrictive property? Or would the existence of a byte bound to String.UTF8View.Element with its high bit set also immediately invoke nasal demons?

I don’t think it would ever be practical to say that code must introspect an opaque byte buffer for all of the bytes a Bool (or any other “special” type) might be constructed from at the risk of invoking UB. The behavior of trying to initialize a Bool from an arbitrary byte should be well defined.