There is a vision document coming soon where I'd love to discuss these ideas further. However, let's try to keep the discussion here focused on Span.
Doug
Span does not own its storage and there is no concern about leaking larger allocations. It would benefit from being its own slice type.
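I don’t think this can work unless the Index type is not Int. There are three useful properties of slicing:
(1) a slice shares its indices with its base collection;
(2) the type is its own slice type;
(3) indices always start at zero.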
Unfortunately, you can only pick two of them. Collection requires (1), and I don’t think the hypothetical BorrowingCollection would change that. Array picked (3) as well, Data picked (2), and as a result Data is harder to use correctly.
Span doesn’t even have the option to pick (1) and (2) with an Int Index, since it is just a pointer-length pair. So it can follow UnsafeBufferPointer and pick (1) and (3), or it can carve its own path by not making Int the Index type. @lorentey has actually worked out that the Index can be a wrapper around UnsafePointer to satisfy (1) and (2), and still provide integer subscripts as an overload. It’s just the Collection operations that wouldn’t produce integers.
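The trade-off can be observed with today's standard library types. A small sketch using only Array and UnsafeBufferPointer (no Span involved): Array stays zero-based and still shares indices with its slices by using a separate slice type, and UnsafeBufferPointer does the same unless a slice is explicitly rebased.

```swift
// Array keeps zero-based indices (3) and slices that share the base's indices (1),
// so it needs a separate slice type: ArraySlice.
let numbers = [10, 20, 30, 40, 50]
let tail: ArraySlice<Int> = numbers[2...]
precondition(numbers.startIndex == 0)   // the base is zero-based
precondition(tail.startIndex == 2)      // the slice reuses the base's indices
precondition(tail[2] == 30)

// UnsafeBufferPointer makes the same choice: its SubSequence is Slice<UnsafeBufferPointer<Int>>.
numbers.withUnsafeBufferPointer { buffer in
    let slice = buffer[2...]
    precondition(slice.startIndex == 2)
    // Rebasing creates a new zero-based buffer over the same memory;
    // its indices no longer match those of `buffer`.
    let rebased = UnsafeBufferPointer(rebasing: slice)
    precondition(rebased.startIndex == 0 && rebased[0] == 30)
}
```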
Span does not own its storage and there is no concern about leaking larger allocations. It would benefit from being its own slice type.
Note that this sentence is in the Appendix; it summarizes some of the (many, many) alternative designs @glessard implemented. This section serves as a rationale for the proposed design. It details options that we carefully evaluated and found unworkable; it does not present anything that we'd still want to explore in the future. (Certainly not in the context of any span types -- including variants such as MutableSpan and OutputSpan.)
I don’t think this can work unless the Index type is not Int.
Indeed -- after exhaustively exploring the available options, we decided that there was no practical way to make self-slicing work.
The idea of Index wrapping an UnsafePointer was particularly attractive; unfortunately, it generally allows indices to fall "in between" valid elements, and Span would need to perform expensive validation on every access to rule those out. We went through several rounds of (increasingly desperate) refinements and compromises to make this idea work, but to our surprise we ultimately had to conclude that the best option overall is to follow UnsafeBufferPointer's preexisting design.
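A rough sketch of the validation problem with pointer-based indices, under stated assumptions: `PointerIndex` and `isValidIndex` are hypothetical names invented for this illustration, not anything from the proposal or its appendix. Because an address can point into the middle of an element, each access would need both a range check and a stride check.

```swift
// Hypothetical: an index that stores an address rather than an integer offset.
struct PointerIndex {
    let address: UnsafeRawPointer
}

// Hypothetical check a pointer-indexed container would need on every access:
// the address must be inside the region and land exactly on an element boundary.
func isValidIndex<T>(_ index: PointerIndex, base: UnsafeRawPointer, count: Int, of _: T.Type) -> Bool {
    let stride = MemoryLayout<T>.stride
    let offset = base.distance(to: index.address)
    return offset >= 0 && offset < count * stride && offset % stride == 0
}

let values: [Int32] = [1, 2, 3, 4]
let (onBoundary, inBetween) = values.withUnsafeBytes { raw -> (Bool, Bool) in
    let base = raw.baseAddress!
    let good = PointerIndex(address: base + 4)   // start of values[1]
    let bad = PointerIndex(address: base + 2)    // middle of values[0]
    return (isValidIndex(good, base: base, count: values.count, of: Int32.self),
            isValidIndex(bad, base: base, count: values.count, of: Int32.self))
}
```

An integer offset from the start of the span cannot land between elements, so only the bounds check remains.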
I did not expect that we'd end up adopting UnsafeBufferPointer's indexing model as is, but I cannot argue with these results -- we tried pretty much every other alternative, and nothing came close.
(I still think UnsafePointer would have been a better choice for UnsafeBufferPointer.Index, but for a 100% safe container like Span, offsets from the start are the way to go.
And I find that the extracting methods in SE-0437 provide a good enough alternative to self-slicing for these non-owning, quasi/fully referential container types.)
I'll make one more plea that we include some kind of polyfill for this that can be deployed back to earlier macOS, iOS, etc. versions. If that was possible for Concurrency and its associated types, it should be considerably easier here.
Otherwise, it's going to be years before a lot of us can use any of this...
Will Span fill the need for FixedArray in Swift?
We're looking into how best to give Swift the equivalent functionality of fixed-size arrays in C or other systems languages. Here are some directions I've been considering, and I'd love to hear the community's ideas and reactions: Tuples as fixed-size arrays This was the approach outlined in our most recent pitch on the subject: we already use homogeneous tuples to represent imported fixed-size array fields from C, and library developers sometimes reach for them manually (usually under duress)…
Will Span be used to represent imported fixed-size array fields from C instead of tuples?
No, because Span doesn’t own the storage it represents. There are other plans for an owning container to supersede homogeneous tuples (the integer generic parameters pitch is a stepping stone). It should be possible to use a Span to pass a Swift-based fixed array to C, though.
I am broadly in favor of the direction outlined in this proposal.
I think Spans will be a useful tool that I intend to make use of.
I am in favor of Span as a name. I have some previous familiarity with it via C++, and while it may not be self-explanatory to someone unfamiliar with it, it is short and concise, and I would expect it to be easy to pick up after coming across it for the first time and reading the related documentation or proposals.
I share some of the concerns raised about parts of the proposed API that may not be entirely fleshed out yet; in particular, @ktoso had a very good overview.
I don't have strong opinions as to the best approach but I wouldn't mind if easily severable parts were postponed to future proposals.
I would also be fine with postponing the addition of Span entirely until after lifetime dependencies have been proposed or accepted; I am more immediately interested in ~Escapable and lifetime dependencies generally.
I have previously read some of the related pitches and read this and the SE-0446 proposal for this review.
I am of course fully on board with adding the Span/RawSpan types to the Standard Library. But I question the wisdom of proposing them now: without initializers, and without any other way to produce actual span instances, the proposed span types aren't going to be actually usable for anything in practice.
(Dons review manager hat)
They're proposed now to gather feedback on the shape of the API, most of which is independent of the deep details of lifetimes. That's happening here, which is good. When the API "ships" is a separate matter.
The proposal suggests that it would be possible to implement withSpan/withBytes methods without the need to reason about lifetime dependencies. I don't think that is true -- in fact, I believe that such higher order functions establish far more complicated lifetime dependencies than a function that directly returns a Span.
(Dons personal-opinions hat)
I don't agree with your assessment above. The span provided to the closure has a lifetime tied to the execution of that closure's body: this describes the semantics that the with-style functions have had since Swift 1.0, and persists into newer APIs like withTaskGroup, despite not having a way to express this in the language. It also describes how arguments of non-@escaping function type work in practice.
With an expressive system to describe lifetime dependencies, one can invent other lifetimes for higher-order functions. We should not do anything here that would invalidate the simple description above, because it would undermine progressive disclosure in the language and create an unnecessary rift between existing Swift APIs like withUnsafeBufferPointer and the newer APIs proposed as part of this work on lifetimes. If the obvious lifetime semantics of withSpan don't work with the more general lifetime dependencies, it's the lifetime dependencies proposal that should be adjusted. @Joe_Groff discusses lifetimes and the higher-order generalizations more in SE-0446: Nonescapable Types - #28 by Joe_Groff.
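The closure-scoped lifetime described above is the same contract withUnsafeBufferPointer already has. A minimal sketch, assuming a hypothetical `withElementsView` method as a stand-in for `withSpan`: it hands out an UnsafeBufferPointer, so only convention keeps the view from escaping the closure, whereas with Span the lifetime rules would enforce it.

```swift
extension Array {
    // Hypothetical stand-in for a `withSpan`-style API: the view passed to `body`
    // is only valid until `body` returns, exactly like withUnsafeBufferPointer.
    func withElementsView<R>(_ body: (UnsafeBufferPointer<Element>) -> R) -> R {
        withUnsafeBufferPointer(body)
    }
}

let scores = [3, 1, 4, 1, 5]
// Everything that needs the view happens inside the closure; only the computed result escapes.
let total = scores.withElementsView { view in view.reduce(0, +) }
let largest = scores.withElementsView { view in view.max() }
```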
I'd actually take this one step further: I think we should consider adding withSpan to the standard library. It provides clear scoping for the lifetime of the span, fits in with existing APIs cleanly, and offers a direct safe replacement for a swath of heavily-used unsafe APIs. As much as I'd like to use a property solution (span or storage or whatever we call it), it is a very different model from what we've had in Swift for a decade, and it pushes developers further down the path of having to understand the lifetime model than withSpan does.
Doug
I think any differences between the API of this type and that of UnsafeBufferPointer should be clearly justified.
If we think that Span is a great name for this concept and clearly worth breaking with precedent over, then we should be proposing renaming UnsafeBufferPointer to UnsafeSpan and so on. We can separately decide whether to deprecate the old name.
Similarly, if there are great new APIs for accessing the elements of a Span, those APIs should also generally be available on UnsafeBufferPointer; this proposal should not be a vehicle for offering a better API to only Span. Conversely, every API on UBP should also be available on Span unless it would clearly undermine the safety story of Span. I understand that we can't yet make Span conform to Collection, and that's a fair justification for some amount of missing API; that should clearly be a long-term goal of this effort, however.
we should be proposing renaming UnsafeBufferPointer to UnsafeSpan and so on
I think that would be a mistake.
First of all, I don't think we should get to have second thoughts on the names of fundamental types of the stdlib, and we should especially not start messing with unsafe constructs. (The last time we did something like that was SE-0370. I deeply believe that the drawbacks/disruption of the assign → update API changes vastly outweighed their minor benefits.)
More interestingly though, my position is that naming these new types "spans" does not actually break with precedent -- the new constructs are different enough that they deserve a clean separation, even in terminology.
I don't think I've seen this spelt out anywhere, but the stdlib appears to consistently use the term "buffer" to mean a region of memory with unspecified (i.e., client-managed) initialization state. UnsafeBufferPointer and ManagedBuffer/ManagedBufferPointer do not (in general*) make type-level assumptions about which of their slots contain initialized values -- managing that is entirely outside their mandate, and they expect code that uses them to do custom bookkeeping to prevent misuse. This makes these constructs crucial for building efficient data structures (each of which comes with well-defined, but incompatible, ways of performing that bookkeeping), but it also covers them in a thick layer of ambiguity and danger that makes it wildly inappropriate to use them as a way to exchange or transfer data. I look at the existing types as primarily useful for representing "storage", or data in stasis.
* UnsafeBufferPointer's Collection conformance is a bit of an outlier -- that conformance is only valid if the buffer is fully initialized. This unenforced requirement matches the similarly unenforced/unchecked initialization requirements on most mutable buffer pointer operations, though -- it's just unusual because it is on an entire conformance, not on individual operations.
In contrast, the Span type we propose here -- as well as the MutableSpan and OutputSpan variants we've provisionally named -- all come with very clear expectations/guarantees on the initialization state of the region of memory they're representing. These requirements make these types much narrower than our standard buffer types; in exchange, our expectation is that spans will become the standard way to uniformly represent data in motion -- data that is being accessed, or copied/moved/consumed.
Spans tame the unstructured storage buffers by imposing order over (parts of) them. I'd like to continue referring to the "storage buffer" of a Deque to mean the raw slab of its allocated memory, containing both initialized and uninitialized parts. But I'd also like to start talking about exposing the "spans" of a Deque, as in providing direct access to its (potentially discontiguous) initialized parts.
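The storage-buffer-versus-spans terminology can be illustrated with a tiny ring buffer. `TinyRing` is a hypothetical sketch (far simpler than the Deque in swift-collections), and UnsafeBufferPointer stands in for a span because it is available today: the allocated slab is a buffer whose initialization state is tracked only by `head` and `count`, while the initialized elements are exposed as at most two contiguous regions.

```swift
// Hypothetical fixed-capacity ring buffer of Ints.
// `storage` is the raw slab; which slots are initialized is tracked only by `head` and `count`.
struct TinyRing {
    let storage: UnsafeMutableBufferPointer<Int>
    private(set) var head = 0
    private(set) var count = 0

    init(capacity: Int) {
        storage = .allocate(capacity: capacity)
    }

    mutating func append(_ value: Int) {
        precondition(count < storage.count, "ring is full")
        let slot = (head + count) % storage.count
        storage.initializeElement(at: slot, to: value)
        count += 1
    }

    mutating func removeFirst() -> Int {
        precondition(count > 0, "ring is empty")
        let value = storage.moveElement(from: head)  // the slot becomes uninitialized again
        head = (head + 1) % storage.count
        count -= 1
        return value
    }

    // The initialized elements as (at most) two contiguous regions: these play the role
    // of the ring's "spans", as opposed to its storage buffer.
    func initializedRegions() -> (UnsafeBufferPointer<Int>, UnsafeBufferPointer<Int>) {
        let firstEnd = min(head + count, storage.count)
        let first = UnsafeBufferPointer(rebasing: storage[head ..< firstEnd])
        let second = UnsafeBufferPointer(rebasing: storage[0 ..< count - first.count])
        return (first, second)
    }

    // Int is a trivial type, so no deinitialization is needed before freeing the slab;
    // a generic version would deinitialize both initialized regions first.
    func deallocate() {
        storage.deallocate()
    }
}
```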
I'd expect a hypothetical UnsafeSpan and UnsafeMutableSpan to still come with the expectation that their memory must always be fully initialized. (And I'd imagine an UnsafeOutputSpan to have exactly as many items initialized at its start as its count.) In particular, I would not expect an UnsafeMutableSpan type to come with anything like UnsafeMutableBufferPointer's deinitializeElement(at:) operation, as it would violate the type's crucial invariants. The Unsafe prefix would imply the lack of bounds checking in production builds and/or the possibility of the span surviving its owner.
On the other hand, I'd expect a (safe) Buffer or BufferPointer type to come with some sort of built-in way to keep track of the initialization state of its slots. (We did consider introducing such a type, using an out-of-line bitmap of initialized slots. I think such a type would be of marginal utility at best: the need to allocate/maintain the bitmap makes it wildly impractical outside the rare cases that already expect one, such as certain forms of hash tables.)
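For comparison, a safe buffer with an out-of-line bitmap might look roughly like the hypothetical `TrackedBuffer` below. It is only a sketch of the rejected idea, not a proposed design, but it shows where the overhead comes from: a second allocation for the bitmap, and a bitmap lookup on every read, write, and teardown.

```swift
// Hypothetical "safe buffer": each slot's initialization state lives in an out-of-line bitmap.
final class TrackedBuffer<Element> {
    let capacity: Int
    private let slots: UnsafeMutablePointer<Element>
    private var bits: [UInt64]  // one bit per slot: 1 = initialized

    init(capacity: Int) {
        precondition(capacity >= 0)
        self.capacity = capacity
        slots = .allocate(capacity: capacity)
        bits = Array(repeating: 0, count: (capacity + 63) / 64)
    }

    private func mask(_ i: Int) -> UInt64 { UInt64(1) << UInt64(i % 64) }

    func isInitialized(_ i: Int) -> Bool {
        precondition(i >= 0 && i < capacity, "index out of bounds")
        return bits[i / 64] & mask(i) != 0
    }

    func initialize(_ i: Int, to value: Element) {
        precondition(!isInitialized(i), "slot is already initialized")
        (slots + i).initialize(to: value)
        bits[i / 64] |= mask(i)
    }

    func deinitialize(_ i: Int) {
        precondition(isInitialized(i), "slot is not initialized")
        (slots + i).deinitialize(count: 1)
        bits[i / 64] &= ~mask(i)
    }

    // Every read pays for a bitmap lookup in addition to the bounds check.
    subscript(i: Int) -> Element {
        precondition(isInitialized(i), "slot is not initialized")
        return slots[i]
    }

    deinit {
        // Tear down only the slots the bitmap says are initialized.
        for i in 0 ..< capacity where isInitialized(i) {
            (slots + i).deinitialize(count: 1)
        }
        slots.deallocate()
    }
}
```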
I think the "span" vs "buffer" terminological distinction is legitimate -- I consider these words to mean different things, each of which is important enough to deserve a name.
Similarly, if there are great new APIs for accessing the elements of a Span, those APIs should also generally be available on UnsafeBufferPointer; this proposal should not be a vehicle for offering a better API to only Span.
I fully agree with this! Luckily, the bare minimum API that this proposal offers does not include "great new APIs for accessing the elements of a Span" that don't also exist on UnsafeBufferPointer. It merely provides a couple of subscripts (very similar to UBP's), and it engages in the same sort of desperate handwaving about how they work as SE-0437 did about UnsafeBufferPointer's subscript.
(SE-0437:) Note that the generalized indexing subscript cannot provide a regular getter, as that would work by returning a copy of the item -- so the Standard Library currently has to resort to an unstable/unsafe language feature to provide direct borrowing access. (This isn't new, as we previously relied on this scheme to optimize performance; but its use now becomes unavoidable. Defining a stable language feature to implement such accessors is expected to be a topic of a future proposal.)
(SE-0447:) Note that we use a _read accessor for the subscript, a requirement in order to yield a borrowed non-copyable Element (see "Coroutines".) This will be updated to a final syntax at a later time, understanding that we intend the replacement to be source-compatible.
As I noted earlier, we are now certain that the coroutine-based _read accessor does not in fact have the right semantics for these subscripts; stabilizing it may be useful elsewhere, but it will not help Span (nor UnsafeBufferPointer). The fact is that we do not currently have the means to properly implement these subscripts. When it becomes possible to do so, I do expect Unsafe*BufferPointer and Span to both immediately adopt whatever solution we'll come up with. (If the solution is stable enough to propose to Swift Evolution, I expect the document to formally require these subscripts to adopt it.)
Span's proposed unchecked subscript is new; it arises from the strong desire for the default span subscript to be fully checked, even in production builds. I think it would very much be desirable for UnsafeBufferPointer to follow the same model; unfortunately, changing how much validation its existing subscript performs would be a massively disruptive change, and we appear to have no appetite for it at this time. (The (reasonable) worry is that enabling bounds checking would lead to a sudden, significant performance drop throughout all Swift code that gets recompiled with the new stdlib.)
We can still consider adding an explicitly unchecked subscript to the Unsafe*BufferPointer types; however, given that the existing unlabeled subscript is already unchecked, this would be more misleading than useful.
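The checked-by-default model can be mimicked on top of today's UnsafeBufferPointer. `CheckedView` is hypothetical and only mirrors the shape being discussed (an unlabeled subscript that always checks, plus an explicitly labeled `unchecked:` subscript); it is not the proposed Span implementation.

```swift
// Hypothetical wrapper that mirrors the checked/unchecked subscript split.
struct CheckedView<Element> {
    let base: UnsafeBufferPointer<Element>

    var count: Int { base.count }

    // Default subscript: precondition is evaluated in -Onone and -O builds (not -Ounchecked).
    subscript(position: Int) -> Element {
        precondition(position >= 0 && position < base.count, "index out of bounds")
        return base[position]
    }

    // Explicit opt-out: relies on UnsafeBufferPointer's subscript,
    // which checks bounds only in debug builds.
    subscript(unchecked position: Int) -> Element {
        base[position]
    }
}

// A loop whose indices are in bounds by construction can use the unchecked form.
func sum(_ view: CheckedView<Int>) -> Int {
    var total = 0
    for i in 0 ..< view.count {
        total += view[unchecked: i]
    }
    return total
}
```

Keeping the label on the unchecked form makes the opt-out visible at each use site, which an unlabeled unchecked subscript cannot do.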
Adding to Karoy's Span vs UnsafeBufferPointer distinctions... Span does away with strict aliasing, as would a hypothetical UnsafeSpan. For decoding use cases, it's convenient to be able to convert the element type:
Span<T: BitwiseCopyable>.unsafeView<U: BitwiseCopyable>(as type: U.Type) -> Span<U>
We should always encourage the least unsafe form of an operation. A direct conversion like this is far safer than forcing programmers to use more complicated and powerful unsafe APIs, which are easy to misuse.
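Without such a conversion, decoding code typically copies values out of raw bytes. A minimal sketch using the existing UnsafeRawBufferPointer.loadUnaligned(fromByteOffset:as:); `decodeUInt16LE` is a hypothetical helper, and unlike the proposed unsafeView(as:) it produces a copy rather than a view over the same memory.

```swift
// Hypothetical helper: read a byte buffer as consecutive little-endian UInt16 values.
// This copies; the proposed unsafeView(as:) would reinterpret the memory in place instead.
func decodeUInt16LE(_ bytes: UnsafeRawBufferPointer) -> [UInt16] {
    let size = MemoryLayout<UInt16>.size
    precondition(bytes.count % size == 0, "byte count is not a multiple of 2")
    return (0 ..< bytes.count / size).map { i in
        // loadUnaligned tolerates any alignment; UInt16(littleEndian:) fixes the byte order.
        UInt16(littleEndian: bytes.loadUnaligned(fromByteOffset: i * size, as: UInt16.self))
    }
}

let packet: [UInt8] = [0x01, 0x00, 0x02, 0x01]
let words = packet.withUnsafeBytes { decodeUInt16LE($0) }
```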
The UnsafeBufferPointer family of types should be relegated to the C interop layer. The "pointer" terminology is a good way to associate a type with C. "Pointer" should not appear in any Swift types unless they are intended for modeling C pointers. By making it clear that pointer types are only intended for calls to C, it's more understandable that they have C-like semantics.
I also think it's a mistake to use the "Unsafe" prefix as an umbrella for multiple forms of unsafety. If we do have an UnsafeSpan, it should only actually be unsafe in the one way in which it is obviously unsafe: unenforced temporal lifetime. It could more accurately be called EscapableSpan.
The UnsafeBufferPointer family of types should be relegated to the C interop layer. The "pointer" terminology is a good way to associate a type with C. "Pointer" should not appear in any Swift types unless they are intended for modeling C pointers. By making it clear that pointer types are only intended for calls to C, it's more understandable that they have C-like semantics.
I of course second what you wrote, but this passage isn't right -- while C interop is indeed an important use case for Unsafe*Pointer (although classic C has no use for Unsafe*BufferPointer), it is not at all the only use case for these types. They have crucial use cases even in code that never interacts with C, and I expect many of those use cases to remain in place indefinitely, even after we've fully achieved every goal on the performance predictability roadmap.
In particular, I do not expect span types to ever be suitable to serve as the underlying storage representation of in-memory data structures. High-performance data structure implementations have a genuine need to be built on top of mutable buffers (in the stdlib sense) that don't try to dictate any particular initialization strategy over their storage slots. The Hypoarray illustration in SE-0437 (which I am hoping to pitch for real in the near(ish) future, provisionally renamed RigidArray) is a really basic example of the kind of UnsafeMutableBufferPointer use case that I do not expect to ever get replaced by spans. I also do not believe we'll figure out how to (usefully) build escapable types out of nonescapable parts; none of the possible approaches seem workable. (I expect there will also remain some legitimate general need for Swift code to allocate raw uninitialized memory, although it's fair to expect most of these cases will disappear if/when we manage to ship a rich enough selection of high-performance data structures.)
Importantly, these UnsafeBufferPointer use cases are hidden deep within the internal implementation details of high-performance data structure implementations (or other safe constructs); I do not expect they will ever need to rise to the level of their public API surface.
I do expect spans to entirely replace our current misuse of Unsafe*[Buffer]Pointer in public APIs such as Sequence.withUnsafeContiguousStorageIfAvailable or Array.init(unsafeUninitializedCapacity:initializingWith:).
I do expect spans to entirely replace our current misuse of Unsafe*[Buffer]Pointer in public APIs such as Sequence.withUnsafeContiguousStorageIfAvailable or Array.init(unsafeUninitializedCapacity:initializingWith:).
Mmm, I'm curious about your thinking, particularly on the latter. Earlier you said the key invariant of Span types is that their memory is always fully initialized, but of course the purpose of Array(unsafeUninitializedCapacity:initializingWith:) is to get you to that point from uninitialized memory.
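For reference, the existing initializer is used like this: the closure receives uninitialized memory and reports, through `initializedCount`, how many leading elements it initialized. That "initialized prefix plus count" contract has the same shape described earlier in the thread for OutputSpan; the sketch itself uses only current standard library API.

```swift
// Reserve room for up to 8 elements, but initialize only the even numbers below 8.
let evens = [Int](unsafeUninitializedCapacity: 8) { buffer, initializedCount in
    initializedCount = 0
    for n in 0 ..< 8 where n % 2 == 0 {
        buffer.initializeElement(at: initializedCount, to: n)  // fill the prefix in order
        initializedCount += 1
    }
    // Slots at initializedCount..<8 stay uninitialized and are not part of the array.
}
```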
Span<T: BitwiseCopyable>.unsafeView<U: BitwiseCopyable>(as type: U.Type) -> Span<U>
I still think this API should require AnyBitPattern and not just BitwiseCopyable. Bool is BitwiseCopyable, but creating one from any byte in 0x02...0xFF is instant UB, and that's a bit too unsafe even for an unsafeView.
The same could be said about any BitwiseCopyable type that maintains invariants.
Yes, but most other BitwiseCopyable types with invariants only have UB when you perform certain operations on them after the invariants are broken; in the case of Bool (or any other enum), merely existing in an invalid state is UB.
That would imply this API could never be used with a struct that contained a Bool stored property. I don’t think that’s tenable.
That's not entirely true; you would just have to make sure the bytes corresponding to the Bool were valid first. Also, you can already hit this UB by rebinding the buffer from withUnsafe{Mutable}Bytes(of:_:) or with unsafeBitCast(_:to:). This is already something you should not be doing.
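Checking the bytes first can be as simple as validating every byte meant to hold a Bool before converting it. A minimal sketch; `decodeFlags` is a hypothetical name, and it builds Bool values by comparison rather than by reinterpreting memory, so an out-of-range byte becomes a reported failure instead of UB.

```swift
// Hypothetical decoder: accept only the two valid Bool representations (0x00 and 0x01).
func decodeFlags(_ bytes: [UInt8]) -> [Bool]? {
    guard bytes.allSatisfy({ $0 <= 1 }) else {
        return nil  // 0x02...0xFF would be an invalid Bool if the memory were reinterpreted
    }
    return bytes.map { $0 == 1 }
}

let valid = decodeFlags([0, 1, 1])
let invalid = decodeFlags([0, 2])
```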
Importantly, these UnsafeBufferPointer use cases are hidden deep within the internal implementation details of high-performance data structure implementations (or other safe constructs); I do not expect they will ever need to rise to the level of their public API surface.
Good point. It's not worth introducing an UninitializedSpan type just to handle some implementation details that very few people will encounter. I'm mainly concerned with types that appear in public API.
I would appreciate @Andrew_Trick’s input on whether the existence of memory bound to Bool having a value other than 0x0 or 0x1 is undefined behavior. If that’s true, is Bool the only type with this very restrictive property? Or would the existence of a byte bound to String.UTF8View.Element with its high bit set also immediately invoke nasal demons?
I don’t think it would ever be practical to say that code must introspect an opaque byte buffer for all of the bytes a Bool (or any other “special” type) might be constructed from at the risk of invoking UB. The behavior of trying to initialize a Bool from an arbitrary byte should be well defined.