SE-0447: Span: Safe Access to Contiguous Storage

Jumhyn · September 23, 2024, 8:34pm

Of course, Span on its own doesn't really attempt to convey this non-owning-ness either.

Alejandro · September 23, 2024, 8:52pm

No, but like @glessard mentioned earlier, other languages already use this name for the same construct, at least C++ and C# (Rust names this slice, but it has a special syntax in source).

David_Smith · September 23, 2024, 9:03pm

Yeah, this is what I was getting at with "under this ontology". If we understand this to be a kind of buffer then the extra words describe what kind. If we understand it as its own standalone thing, then the separate word for that thing can have this meaning.

grynspan · September 23, 2024, 9:45pm

I think this API will be useful. I think we should look at deprecating some of the buffer pointer API on other types that, in a safe world, would use spans instead. If somebody really needs a buffer pointer, they can get it from the span.

I am a bit bothered by RawSpan as a different type. I understand that it mirrors our pointer types, but it does add complexity to the API surface. As a strawman… Span<Never>? It's sort of saying "this is a Span, and you can never access its elements [because it doesn't have any because it's untyped]"? (I'm not actually advocating for Span<Never> here. Weird. Just trying to think about how we could avoid a separate RawSpan type.)

glessard · September 23, 2024, 9:54pm

Something to consider is that this is the first of a family of types. We need to add a mutable variant (for delegating mutations), and an output-only variant (for delegating initialization safely), at least. The names for these could be MutableSpan and OutputSpan. Span definitely has the merit of being short and nearly particle-like.

j-f1 · September 23, 2024, 10:00pm

I actually like the idea of Span<Never> — it would handily make all of the typed APIs unavailable, and you could put the untyped APIs directly onto the typed type so there’s no need to upcast.

glessard · September 23, 2024, 10:08pm

We had initially considered putting the unsafeLoad(as:) and related API on Span<UInt8>, but that does two things that don't sit well: it hides useful API in an unusual place (in extensions specific to an element type), and raises the question of what's special about UInt8? RawSpan has an advantage of not relying on generic specialization for good performance.
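For readers who have not used the untyped pointer API, here is a minimal sketch of the kind of operation under discussion. It uses today's UnsafeRawBufferPointer.loadUnaligned(fromByteOffset:as:) (Swift 5.7 or later) as a stand-in; it is not the proposed RawSpan API, and the names bytes and value are only for illustration.

```swift
// Illustration only: today's raw buffer pointer API, not RawSpan.
let bytes: [UInt8] = [0x01, 0x02, 0x03, 0x04, 0x05]

// Read a UInt32 that starts at byte offset 1. The offset is not a
// multiple of 4, so an unaligned load is required.
let value: UInt32 = bytes.withUnsafeBytes { raw in
    // Interpret the four loaded bytes as little-endian so the result
    // is the same on any host.
    UInt32(littleEndian: raw.loadUnaligned(fromByteOffset: 1, as: UInt32.self))
}
```

The load is a property of the untyped byte view rather than of the UInt8 element type, which is the point being made about where such API belongs.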

John_McCall · September 23, 2024, 10:08pm

You could certainly add special operations to Span<Never>, but you'd also need to remove / specialize all of the normal operations that are specified to work with values of the element type. The operation sets are basically totally independent.

ellie20 · September 23, 2024, 10:30pm

I think an interesting direction would be to create safe pointer types with names directly corresponding to the unsafe ones. Not only BufferPointer and RawBufferPointer, but also Pointer, MutablePointer, etc.

It could be helpful to give the safe and unsafe pointer types the same APIs to facilitate switching between the two, like how CheckedContinuation and UnsafeContinuation have the same APIs. A safe Pointer type could have a pointee property consistent with the UnsafePointer type, instead of the property being named something different like referent.

I wonder to what extent the association of "pointer" with unsafety is inherent and to what extent it is changeable. It could be that C has irreversibly associated "pointer" with unsafety, but maybe Swift can overcome the association. In Rust, "reference" refers to safe pointers. In C++, "reference" refers to unsafe pointers that are automatically dereferenced, and "pointer" refers to both unsafe pointers and safe(-ish) smart pointers like std::unique_ptr. In Java, object references are referred to as both "pointers" (in NullPointerException) and "references" (in WeakReference). In Cyclone, "pointer" refers to both safe and unsafe pointers.

lorentey · September 24, 2024, 12:17am

I strongly oppose naming this core type something clumsy like BufferPointer. Please let's not do that. Spans aren't "buffers"; they also aren't pointers to buffers.

Span is an extremely strong name; it has a punch that befits the importance of this abstraction. It also happens to be in preexisting use in the same context.

Names this good are rare; when we find one, we need to recognize it as such and celebrate our good luck.

lorentey · September 24, 2024, 12:47am

extension Span where Element: ~Copyable {
  public subscript(_ position: Int) -> Element { _read }
}
Note that we use a _read accessor for the subscript, a requirement in order to yield a borrowed non-copyable Element (see "Coroutines".) This will be updated to a final syntax at a later time, understanding that we intend the replacement to be source-compatible.

This is not exactly correct. We now know that _read accessor coroutines will not suffice for element access -- the final form of Span's subscripts will need to use very different semantics than what is being promised here.

The elements of a span are guaranteed to exist as long as the span does. We'll need the subscript accessors to fully embrace this fact: the borrows of the elements that we produce through subscript access need to have precisely the same lifetime dependencies as the span itself.

(Read accessors are allowed to materialize their result on demand; so the borrows we get from them are necessarily tied to the specific access to the span itself. This is far too complicated and it results in unusably narrow lifetime dependencies. To be able to build useful abstractions around spans (and borrowing access in general), we need to model direct accesses as direct accesses.)

We've been assuming that a coroutine-based (or unsafeAddress-based) Span subscript will be source-compatible with the eventual borrowing accessor construct (which does not exist yet). However, this is just an assumption; it should be called out in the proposal, and it deserves critical review.

lorentey · September 24, 2024, 1:58am

API nitpicks:

Index validation members

extension Span where Element: ~Copyable {
  public func boundsContain(_ index: Int) -> Bool
  public func boundsContain(_ indices: Range<Int>) -> Bool
  public func boundsContain(_ indices: ClosedRange<Int>) -> Bool
}

As the proposal admits, these are not good names. We should not add badly named interfaces to the Swift Standard Library.

I suggest we should remove these members from the Span types. Alternatively, I suggest giving them the following names:

extension Span where Element: ~Copyable {
  public func isValidIndex(_ index: Int) -> Bool
  public func isValidSubrange(_ subrange: Range<Int>) -> Bool
  public func isValidSubrange(_ subrange: ClosedRange<Int>) -> Bool
}

extension RawSpan {
  public func isValidByteOffset(_ offset: Int) -> Bool
  public func isValidByteOffsetRange(_ subrange: Range<Int>) -> Bool
  public func isValidByteOffsetRange(_ subrange: ClosedRange<Int>) -> Bool
}

The ClosedRange overloads are rubbing me the wrong way -- why do we need those? Should we also have a generic isValidSubrange that takes any RangeExpression? I think I'd be happier if we didn't have any of these.

Update: I've now remembered that I already suggested these in public, way back in June. This makes me a bit confused.

If we can only find "not ideal" names for these predicates, then I don't see why we would want to propose them as public API. After all, no preexisting standard container comes with such operations. The proposal does not justify why they'd need to be unique to Span, but neither does it mention the obvious alternative: that we'd need to add similar predicates to the UnsafeBufferPointer types, too.

I therefore suggest that these operations be removed, deferring them to a potential future proposal that properly introduces them in the general stdlib, rather than isolating their discussion to a single type. Clearly, we are not ready to do that in this proposal.

(Any adopters of Span types will be able to validate indices and offsets the obvious way -- by manually checking that they're within 0 ..< count. This is similar to how we are currently forced to implement index validation in code that deals with UnsafeBufferPointer.) (End of update.)
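A minimal sketch of that manual validation, written as free functions so it runs against today's UnsafeBufferPointer. The function names borrow the spellings suggested above, but the free-function form and the count parameter are assumptions made for illustration; none of this is proposed API.

```swift
// Hypothetical helpers: validate positions against 0 ..< count by hand.
func isValidIndex(_ index: Int, count: Int) -> Bool {
    index >= 0 && index < count
}

func isValidSubrange(_ subrange: Range<Int>, count: Int) -> Bool {
    // An empty range at the end (count ..< count) is treated as valid.
    subrange.lowerBound >= 0 && subrange.upperBound <= count
}

let samples = [3, 1, 4, 1, 5]
let checks: [Bool] = samples.withUnsafeBufferPointer { buffer in
    [
        isValidIndex(4, count: buffer.count),         // last element
        isValidIndex(5, count: buffer.count),         // one past the end
        isValidSubrange(2 ..< 5, count: buffer.count),
        isValidSubrange(3 ..< 6, count: buffer.count),
    ]
}
```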

Structural containment checks

I second the idea of removing isWithin(_:), subsuming it into indicesWithin(_:).

However, I think the proposed behavior of the indicesWithin(_:) method is quite inconsistent with preexisting API design practice for container types. It is proposed to be a method on a container type that returns the container's index type, but the indices returned belong to some other container, not self. This is highly irregular. I believe it to be without precedent in/near the Standard Library, and I do not think we'd want to establish it as a viable pattern.

I suggest turning the indicesWithin method inside out (swapping its self with its argument), and renaming it to subrange(of:):

extension Span where Element: ~Copyable {
  /// Returns the index range where the memory of `span` is located within
  /// `self`, or `nil` if `span` is not a subspan of `self`.
  ///
  /// - Parameters:
  ///    - span: a potential subspan of `self`
  /// - Returns: A range of indices within `self` corresponding to `span`, or `nil` if there aren't any.
  /// - Complexity: O(1)
  public func subrange(of span: borrowing Self) -> Range<Int>?
}

indices(of:) would also be a viable alternative spelling for this.

This suggests byteOffsets(of:) to be the name of the analogue method on RawSpan:

extension RawSpan {
  public func byteOffsets(of span: borrowing Self) -> Range<Int>?
}
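
To make the suggested semantics concrete, here is a rough sketch of the same check over today's UnsafeBufferPointer. The free function subrange(of:in:) is hypothetical, and returning nil for a buffer without a base address is a choice made for this sketch, not something the proposal specifies.

```swift
// Hypothetical: where does `inner` sit inside `outer`, in element indices?
func subrange<T>(
    of inner: UnsafeBufferPointer<T>,
    in outer: UnsafeBufferPointer<T>
) -> Range<Int>? {
    guard let outerBase = outer.baseAddress,
          let innerBase = inner.baseAddress else { return nil }
    // Measure the distance in bytes so a misaligned start is rejected
    // rather than rounded.
    let byteOffset = UnsafeRawPointer(outerBase)
        .distance(to: UnsafeRawPointer(innerBase))
    let stride = MemoryLayout<T>.stride
    guard byteOffset >= 0, byteOffset % stride == 0 else { return nil }
    let start = byteOffset / stride
    guard start + inner.count <= outer.count else { return nil }
    return start ..< start + inner.count
}

let values = [10, 20, 30, 40, 50]
let found: Range<Int>? = values.withUnsafeBufferPointer { whole in
    let part = UnsafeBufferPointer(rebasing: whole[1 ..< 4])
    return subrange(of: part, in: whole)
}
let reversed: Range<Int>? = values.withUnsafeBufferPointer { whole in
    let part = UnsafeBufferPointer(rebasing: whole[1 ..< 4])
    // `whole` is not contained in `part`, so this is nil.
    return subrange(of: whole, in: part)
}
```

Note that the returned range indexes into the receiver (here, the `in:` argument), which is the property the inside-out spelling is meant to restore.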

On `indices`

I would not automatically expect a hypothetical future container protocol that supports noncopyable conforming types (and elements) to come with an indices property.

FWIW, the very early draft we have for such a protocol does not come with one.

Part of the problem is that there is a fundamental need to avoid clashing with the indices property in Collection. Some types will want to conform to both protocols, so the noncopyable-flavored protocol can only come with indices if it can be made fully compatible with Collection's requirements. Unfortunately, the new protocol's indices type would need to be nonescapable, which makes this unlikely to work: it seems highly unlikely that Collection will ever be generalized to support nonescapable Indices associated types.

(The default value of this new, noncopyable-flavored Indices associated type would need to be a type that holds a borrow of the original generic noncopyable container. Borrows are inherently nonescapable, and so a type that contains a borrow would itself need to be nonescapable. Note that we cannot express stored borrows in today's Swift, not even with experimental feature flags.)

Therefore, while it is not entirely impossible that a new container protocol would provide something equivalent to indices, (1) it would probably need to come with a different name, and (2) it would also require language constructs that do not exist yet.

I think this makes it a bad idea to ship Span with an indices property today. Its implementation would be effectively 0 ..< self.count; we do not urgently need a named property to avoid typing that. Once the dust has settled a bit on the new container protocols, then we may revisit this issue. (For example, if the container protocols end up not having anything like an indices view, then that makes Span free to add one.)

The lack of a practical use case

I am of course fully on board with adding the Span/RawSpan types to the Standard Library. But I question the wisdom of proposing them now: without initializers, and without any other way to produce actual span instances, the proposed span types aren't going to be actually usable for anything in practice.

Therefore, I think it would be pointless to actually add Span and RawSpan to the Standard Library in their proposed partial form. This proposal is incomplete and useless without crucial followup work that (1) properly defines/explains lifetime dependencies and (2) that introduces the missing span initializers. SE-0447 should not ship in a Swift release without that followup work.

At the moment, this proposal is a didactic crutch at best. It is going to allow the lifetimes proposal to tie its concepts to concrete motivating code examples that will be far more useful than the usual half-baked foo/bar illustrations. Those examples will breathe life into Span, giving it its true purpose. (At worst, it encourages us to make premature but binding API decisions without fully understanding the ultimate role (and unique constraints) of this type. It does not help that the proposal (for whatever reason) appears to intentionally propose suboptimal names for much of what little API it does define.)

The proposal suggests that it would be possible to implement withSpan/withBytes methods without the need to reason about lifetime dependencies. I don't think that is true -- in fact, I believe that such higher order functions establish far more complicated lifetime dependencies than a function that directly returns a Span. We currently lack the means to reason about these dependencies; and I don't believe it would be okay to introduce API that Swift developers cannot properly understand. (This issue also applies to the subscripts that are in the normative part of this proposal, as well as the UnsafePointer.pointee and the UnsafeBufferPointer.subscript generalizations we shipped in SE-0437. All of these have deep semantic problems, especially with noncopyable pointees/elements.)

rvsrvs · September 24, 2024, 1:17pm

I'm trying to get caught up on how Span will work and am confused about the provided toolchain.

Given that all initializers for Span depend on lifetime annotations, which are being proposed separately, are there flags in the toolchain I can turn on that would allow me to actually create and use Span in code? Or should we only be reviewing API here? I'd much prefer the former, as I have specific use cases in mind I'd like to try out.

Jumhyn · September 24, 2024, 1:19pm

The provided toolchains should have the withSpan API on Array. As noted at top-of-thread, the withSpan API is not actually being proposed, but is provided as a way to create a Span and play around with it.

rvsrvs · September 24, 2024, 1:22pm

Ah, I had misunderstood. I will defer a review. The use cases I have in mind need lifetime annotations. Thanks @Jumhyn !

orobio · September 26, 2024, 11:52am

For public use, it doesn’t seem to be very common then.

There also is @unchecked Sendable, but in that case ‘unsafe’ was deliberately not used, because @unsafe Sendable could give the impression that it would be unsafe to send. The unchecked subscript API is definitely unsafe to use, so that is a different situation.

I can imagine it could be good to be more specific about the type of unsafety, but I agree with @tbkka that it can be convenient to be able to grep for ‘unsafe’. Additionally, it is a clear trigger for users and code reviewers. Ideally, Swift would have the concept of an unsafe function, like Rust, and force visibility in that way; then the argument label becomes less important.

glessard · September 26, 2024, 8:32pm

I feel they're similar usages: the safety of an @unchecked Sendable conformance relies on a promise that necessary invariants aren't violated; it could involve undefined behaviour if the isolation isn't done correctly. The unchecked: subscript relies on a promise that you know the offset you're passing is in bounds; if it is, then it's perfectly safe; otherwise an out-of-bounds access could involve undefined behaviour.
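
As a rough illustration of that promise, the sketch below wraps today's UnsafeBufferPointer in a hypothetical BoundsCheckedView type with both a checked subscript and an unchecked: one. The type and its members are assumptions made for this example; it is not Span, and it does not model the @unsafe annotation.

```swift
// Hypothetical wrapper showing a checked and an unchecked subscript.
struct BoundsCheckedView<Element> {
    let base: UnsafeBufferPointer<Element>
    var count: Int { base.count }

    // Checked: traps on an out-of-bounds position.
    subscript(position: Int) -> Element {
        precondition(position >= 0 && position < count, "index out of bounds")
        return self[unchecked: position]
    }

    // Unchecked: the caller promises 0 <= position < count.
    // Breaking that promise is undefined behaviour.
    subscript(unchecked position: Int) -> Element {
        base.baseAddress.unsafelyUnwrapped[position]
    }
}

let result: (Int, Int) = [1, 2, 3].withUnsafeBufferPointer { buffer in
    let view = BoundsCheckedView(base: buffer)
    var sum = 0
    // The loop bounds already guarantee each position is valid, so the
    // promise behind `unchecked:` is kept here.
    for i in 0 ..< view.count {
        sum += view[unchecked: i]
    }
    return (view[0], sum)
}
```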

Note that the unchecked: subscripts will have an @unsafe annotation, as per the "WarnUnsafe" experimental feature.

xwu · September 26, 2024, 8:51pm

This feature has not yet been discussed or pitched ;)

glessard · September 26, 2024, 8:52pm

Indeed not yet.

orobio · September 26, 2024, 8:52pm

They are very similar, but the subtle difference is that in one case it’s the author of the API that makes the promise, and in the other case it’s the user of the API.

I hadn’t heard of this, but it sounds promising!
