[pitch] Add `Span`-providing Properties to Standard Library Types

glessard · November 21, 2024, 11:05pm

There hasn’t been a vision document about the container protocols. There is a draft in the “future” branch of the swift-collections repository.

Douglas_Gregor · November 22, 2024, 10:27pm

I prefer storage to elements, because storage is more specific about what you're getting: a handle to the underlying storage. The less-specific elements could just mean "give me a collection of all of the elements".

I love this proposal as-is.

Doug

Philippe_Hausler · November 23, 2024, 12:17am

Regarding adoption on Data: the type originally had a typed memory binding to UInt8, since its provenance was that of NSData. However, as Swift evolved after the introduction of Data, that behavior was changed to favor an untyped bag of bytes (not, per se, memory bound to UInt8). This means that we ideally should only offer the raw accessors, to promote Data's unbound nature.

So +1 for the raw var bytes: RawSpan { get } but -1 for the var storage: Span<UInt8>, if I understand the difference correctly.

glessard · November 23, 2024, 12:18am

Makes sense. I will make that change.

Philippe_Hausler · November 23, 2024, 12:19am

I presume that if we are incorrect in that case we could offer it later with little effort. Right?

glessard · November 23, 2024, 12:21am

Yeah. Even if we don't it will remain really easy to go to a Span<UInt8> from the bytes, by virtue of UInt8 being BitwiseCopyable. It would be one of the types that would be BitwiseLoadable if we make that layout constraint happen.
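For readers following along, here is a minimal sketch of the "raw bytes first, typed UInt8 view on demand" idea. It uses the `Data.withUnsafeBytes` API that exists today as a stand-in; it is not the proposed `bytes` property, and the variable names are purely illustrative.

```swift
import Foundation

// Illustrative payload; any Data value would do.
let payload = Data([0xDE, 0xAD, 0xBE, 0xEF])

// The closure receives untyped bytes (UnsafeRawBufferPointer).
// Reading them as UInt8 needs no typed memory binding, because any byte
// can be loaded as a UInt8.
let firstTwo = payload.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> [UInt8] in
  let first = raw.load(fromByteOffset: 0, as: UInt8.self)
  let second = raw.load(fromByteOffset: 1, as: UInt8.self)
  return [first, second]
}

print(firstTwo)
```

The raw view is the primitive here, and the UInt8 reading is layered on top of it, which is roughly the relationship described above between `bytes` and a `Span<UInt8>`.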

grynspan · November 23, 2024, 1:13pm

Perhaps some combination of the words? storedElements?

dnadoba · November 23, 2024, 7:02pm

I think we should take a step back and think about what the protocols look like and what semantics they have, as @lukasa suggested. Even if we don't introduce them just yet because of current compiler limitations, it is still useful to do the thought exercise and see if the semantics and names are a good fit.

I think this is why it shouldn't be called storage because you don't get a handle to the full underlying storage, but instead a potential subset of the underlying storage.

This is the case for Array as it only gives you access to its initialized storage.
Data is a potential slice of a larger Data and would only give access to a sub sequence of the full storage. In fact, this is the case for most SubSequences e.g. ArraySlice and the various types wrapped in Slice<>.

The term storage is often used as an implementation detail that is usually not exposed and is not a 1:1 mapping to what is proposed here. I think elements is more fitting, better than storage but I don't love elements either just yet.
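The "subset of the underlying storage" point can be observed with today's buffer-pointer APIs, used here only as an analogy for the proposed property. This sketch assumes nothing beyond the standard library; the variable names are illustrative.

```swift
var numbers = [1, 2, 3, 4, 5]
numbers.reserveCapacity(64)   // allocated storage is now larger than `count`

// Only the initialized elements are visible, not the reserved capacity.
let visibleCount = numbers.withUnsafeBufferPointer { $0.count }

// A slice shares the array's buffer but exposes only its own range.
let slice = numbers[1..<4]
let sliceElements = slice.withUnsafeBufferPointer { Array($0) }

print(visibleCount, numbers.capacity >= 64, sliceElements)
```

In both cases the view covers the elements of the value you asked, which may be a strict subset of the allocation behind it.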

Alejandro · November 23, 2024, 7:38pm

Can we not just call this .span?

glessard · November 23, 2024, 7:44pm

Yeah, but that’s just repeating the type signature; it doesn’t say what it is, just how it’s presented.

Alejandro · November 23, 2024, 8:04pm

It says exactly what you're getting from the API. It may be repeating the type signature, but at the use site you're rarely going to be typing Span out explicitly anyway:

let s = arr.span
foo(arr.span)
bar(s)
print(s[0])

// And in the not so distant future...
let ms = arr.mutableSpan
ms[0] = 123
ms.shuffle()

Dmitriy_Ignatyev · November 26, 2024, 9:42am

It is an Obj-C style design – NSMutableString, NSMutableArray...
What is the reason to do it this way when Swift has mutating methods?

glessard · November 26, 2024, 4:53pm

This is simply about the law of exclusivity. Span borrows read-only memory and, as such, supports multiple simultaneous read-only accesses. A "mutable borrow" is a writeable access that must be exclusive. The exclusivity requirement can be modeled nicely in the type system with non-copyability. From this, it follows that a mutable borrow must be modeled with its own type, MutableSpan.
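As a rough illustration of modeling exclusivity with non-copyability, the sketch below defines a hypothetical `ExclusiveView` type. It is a stand-in written for this example, not the proposed MutableSpan, and it assumes a Swift 5.9 or later compiler for `~Copyable`.

```swift
// Hypothetical stand-in for a "mutable borrow"; not the proposed MutableSpan.
struct ExclusiveView: ~Copyable {
  let base: UnsafeMutableBufferPointer<Int>

  // Writes go through the single, non-copyable handle.
  mutating func update(at index: Int, to value: Int) {
    base[index] = value
  }
}

var storage = [10, 20, 30]
storage.withUnsafeMutableBufferPointer { buffer in
  var view = ExclusiveView(base: buffer)
  view.update(at: 0, to: 123)
  // `let second = view` would move `view` rather than copy it, so two
  // writable handles to the same memory cannot coexist.
}

print(storage)
```

A read-only view has no such restriction and can stay copyable, which is why the two roles end up as two types in this model.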

Nevin · November 26, 2024, 4:56pm

I’m not weighing in with an opinion here, but I want to point out that “can be modeled” does not imply “must be modeled”.

One could imagine a world where the compiler recognizes the difference between var x: Span and let x: Span.

glessard · November 26, 2024, 5:04pm

I didn't say "must be modeled", but it definitely "can be modeled". The current compiler cannot do it another way as far as I can tell. The world where a non-owning reference type imposes different borrowing requirements depending on whether its binding is let or var is neither current nor near-future Swift.

xwu · November 26, 2024, 6:28pm

As no doubt you know, we have done some reckoning about what it "means" to use let versus var in the context of Swift atomics, so there is some precedent for fiddling with these.

Since we've already contemplated concessions to mark a type as "must-never-be-var" in that context, extending that design so that a paired type can be marked as "must-never-be-let" and having those known to the compiler as duals of each other might not be as far-fetched as at first blush.

...if it is wise to do so.

Alejandro · November 26, 2024, 6:58pm

This is just not how these types work. Span and MutableSpan are reference types; they are not value types in the general sense (yes, they themselves are values, but they are closer to UnsafeBufferPointer and UnsafeMutableBufferPointer than anything else). A type vending a var span: Span { get } getter is providing read-only access to some contiguous piece of memory which may or may not be mutable. Consider a type which gives you a span over some constant memory in the binary; this memory is not mutable whatsoever, and if we treated these things like value types (i.e. provided mutable accessors on Span) then it would be just fundamentally incorrect:

// s points to some __TEXT,__const memory
// e.g. [0x0, 0x1, 0x2]
var s = myType.span
s[0] = 123 // NOT OK

Another reason why we need two separate types is that getting a Span from some type is a read only access i.e. requires a regular { get }. If Span provided mutable accessors then this would require a { mutating get } (because we need to signal to the compiler that mutations on MutableSpan will directly mutate whatever type/container/parent vended you the MutableSpan) which is obviously not always available if you don't have a mutable reference to the type vending you the span:

func something(
  with arr: borrowing SomeNoncopyableArray<Atomic<Int>>
) {
  // if Span provided mutable accessors, this would compile:
  var s = arr.span
  s[0] = Atomic(123) // NOT OK

  // what we really want:
  let ms = arr.mutableSpan // error: cannot mutate 'arr'
  var s2 = arr.span
  s2[0] = Atomic(123) // error: Span.subscript is not mutable
}

Here we have a borrowing read access of some hypothetical noncopyable array containing some atomic integers. We do not have exclusive access to this array, and thus we cannot mutate it; otherwise we would run into undefined behavior. Therefore, we need to distinguish Span from MutableSpan: we can provide read-only getters (get) that don't require exclusive access (i.e. we don't need a var reference or an inout reference to some type to access some var span: Span { get }), which prevents mutation when it is either 1. unwelcome or 2. disallowed completely.
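The same split between a non-mutating read access and a mutating, exclusive access already exists in the closure-based Array APIs. The sketch below uses them as an analogy for `{ get }` versus a mutating accessor; the variable names are illustrative.

```swift
let constants = [1, 2, 3]

// Read-only access is a non-mutating call, so it works on a `let` binding.
let total = constants.withUnsafeBufferPointer { $0.reduce(0, +) }

// Mutable access is a mutating call and therefore requires a `var` binding;
// calling withUnsafeMutableBufferPointer on `constants` would not compile.
var values = [1, 2, 3]
values.withUnsafeMutableBufferPointer { buffer in
  buffer[0] = 123
}

print(total, values)
```

If a single view type offered both reads and writes, every access to it would need the stricter, mutating form, including in contexts that only hold a borrow.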

Karl · November 27, 2024, 2:01pm

Yeah, this is a known performance problem with UnicodeScalar.UTF8View - that it encodes the entire scalar again each time you read a byte. Unfortunately it's also @frozen and entirely @inlinable so we can't easily change that. However, it's trivial to implement a span view by just encoding the scalar once into a fixed-width integer/stack buffer and yielding a span over it.

It's still a constant-time operation, and doesn't allocate any heap memory, so I don't see any problem at all if we were to encode in the .span accessor - only huge benefits for users of the type.

I believe the idea is that eventually these spans will form the backbone of a replacement to the Collection protocol hierarchy. Given that, I think we should take this opportunity to address the flaws in these Collection conformances.

The UnicodeScalar.UTF16View is slightly less problematic as the encoding is simpler, but I think it should also get a .span, implemented in the same way as the UTF8View.

EDIT: Godbolt comparison of a for loop over UTF8View vs. eagerly encoding using withUTF8CodeUnits. The latter generates significantly less code. This is what I'm suggesting the .span view provide.
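A minimal sketch of the "encode once into a stack buffer" approach follows. The helper `withUTF8EncodedOnce` is a hypothetical name invented for this example; it relies only on the public `UTF8.encode(_:into:)` and hands out an unsafe buffer pointer, since a real `.span` accessor is exactly what is under discussion.

```swift
// Hypothetical helper: encodes the scalar once into a 4-byte tuple on the
// stack, then exposes the encoded bytes for the duration of `body`.
func withUTF8EncodedOnce<R>(
  _ scalar: Unicode.Scalar,
  _ body: (UnsafeBufferPointer<UInt8>) -> R
) -> R {
  var storage: (UInt8, UInt8, UInt8, UInt8) = (0, 0, 0, 0)
  return withUnsafeMutableBytes(of: &storage) { raw in
    var count = 0
    // A scalar encodes to at most 4 UTF-8 code units.
    UTF8.encode(scalar, into: { byte in
      raw[count] = byte
      count += 1
    })
    let bytes = raw.bindMemory(to: UInt8.self)
    return body(UnsafeBufferPointer(rebasing: bytes[..<count]))
  }
}

let euro: Unicode.Scalar = "€"
let encoded = withUTF8EncodedOnce(euro) { Array($0) }
print(encoded)
```

The closure shape is what makes temporary storage easy here; whether a borrowing property could offer the same thing without a coroutine is the open question in this part of the thread.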

glessard · November 28, 2024, 4:34pm

Adding a storage property to UnicodeScalar.UTF8View would require the storage property to be a coroutine. We would prefer these accessors to be borrowing, and not allow them to allocate temporary storage.

glessard · December 20, 2024, 7:05pm

I have made changes to the proposal, rounding out the explanatory text as well as the cases where a ~Escapable & Copyable value (like Span or RawSpan) can be returned by a computed property. In particular we can now return a computed RawSpan from a Span, enabling Ben Rimmington's suggestion of moving the bytes properties to Span rather than adding them to every type that can provide a Span.

The following paragraph is added to "detailed design":
"A computed property getter defined on a non-escapable and copyable (~Escapable & Copyable) type and returning a ~Escapable & Copyable value copies the lifetime dependency of the callee. The returned value becomes an additional borrow of the callee's dependency, but is otherwise independent from the callee."

This capability is useful for reducing the amount of API added by this proposal, and is also necessary for a future generalization of Optional, in particular its unsafelyUnwrapped property (and the ! operator).

Thanks to the above, Span gains the following property:

extension Span where Element: BitwiseCopyable {
  /// Share the raw bytes of this `Span`'s elements
  var bytes: RawSpan { get }
}

With this change we do not plan to add other bytes computed properties to standard library types.
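For a sense of what a raw-bytes view over typed elements means, today's `Array.withUnsafeBytes` offers a close analogy. This sketch uses that existing API rather than the new `bytes` property; the variable names are illustrative.

```swift
let samples: [UInt16] = [1, 2, 3]

// The raw view covers the same memory as the typed elements, so its length
// is the element count times the element stride.
let byteCount = samples.withUnsafeBytes { $0.count }

print(byteCount, samples.count * MemoryLayout<UInt16>.stride)
```

Having the typed view vend the raw one means a container only needs to provide the typed view, which is the API reduction described above.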

The updated document is here. Updated sections of interest are "Detailed Design", including a discussion of performance, as well as expanding on alternatives considered and future directions.