[Pitch] MutableSpan

glessard · February 10, 2025, 5:01pm

Hello everyone! Continuing on from adding Span/RawSpan (SE-0447) and the computed properties that return them (SE-0456), this is a pitch for MutableSpan and MutableRawSpan, helper types to delegate mutations of exclusively-borrowed memory.

Motivation

Many standard library container types can provide direct access to modify their internal representation. Up to now, it has only been possible to do so in an unsafe way. The standard library provides this unsafe functionality with closure-taking functions such as withUnsafeMutableBufferPointer() and withContiguousMutableStorageIfAvailable().
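For reference, a minimal sketch of the existing closure-based pattern that this pitch aims to replace. The helper name `doubleInPlace` is made up for illustration; `withUnsafeMutableBufferPointer` is the existing standard library API.

```swift
// Doubles every element in place through the existing unsafe, closure-based API.
// `doubleInPlace` is a hypothetical helper, not part of the pitch.
func doubleInPlace(_ values: inout [Int]) {
  values.withUnsafeMutableBufferPointer { buffer in
    // `buffer` is only valid inside this closure; nothing stops code from
    // escaping it, which is why the API is considered unsafe.
    for i in buffer.indices {
      buffer[i] *= 2
    }
  }
}

var numbers = [1, 2, 3]
doubleInPlace(&numbers)
print(numbers) // [2, 4, 6]
```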

These functions have several drawbacks, most prominently their reliance on unsafe types, which makes them unpalatable in security-conscious environments. We continue addressing these issues with MutableSpan and MutableRawSpan, new non-copyable and non-escapable types that manage mutations of typed and untyped memory, respectively.

In addition to the new types, we will propose adding new API to some standard library types to take advantage of MutableSpan and MutableRawSpan.

Proposed solution

We introduced Span to provide shared read-only access to containers. The natural next step is to provide a similar capability for mutable access. Mutability requires exclusive access, per Swift's [law of exclusivity][SE-0176]. Span is copyable, and must be copyable in order to properly model read access under the law of exclusivity: a value can be simultaneously accessed through multiple read-only accesses. Exclusive access cannot be modeled with a copyable type, since a copy would represent an additional access, in violation of the law of exclusivity. We therefore need a non-copyable type separate from Span in order to model mutations.
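As a small illustration of the rule using today's language only (no spans involved), the sketch below contrasts overlapping reads with an exclusive mutation. The functions `dot` and `scale` are hypothetical helpers.

```swift
// Two shared (read-only) accesses to the same array may overlap.
func dot(_ a: [Int], _ b: [Int]) -> Int {
  zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
}

// A mutation needs exclusive access for the duration of the call.
func scale(_ target: inout [Int], by factor: Int) {
  for i in target.indices { target[i] *= factor }
}

var samples = [1, 2, 3]
let squaredNorm = dot(samples, samples)  // OK: both accesses are reads
scale(&samples, by: 2)                   // OK: a single exclusive access
// swap(&samples, &samples)              // rejected: overlapping accesses to `samples`
print(squaredNorm, samples) // 14 [2, 4, 6]
```

A copyable span type fits the first call shape; a mutable span has to fit the second, which is why it cannot be copyable.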

MutableSpan

MutableSpan allows delegating mutations of a type's contiguous internal representation, by providing access to an exclusively-borrowed view of a range of contiguous, initialized memory. MutableSpan relies on guarantees that it has exclusive access to the range of memory it represents, and that the memory it represents will remain valid for the duration of the access. These provide data race safety and temporal safety. Like Span, MutableSpan performs bounds-checking on every access to preserve spatial safety.

A MutableSpan provided by a container represents a mutation of that container, via an exclusive borrow. Mutations are implemented by mutating functions and subscripts, which let the compiler statically enforce exclusivity.

MutableRawSpan

MutableRawSpan allows delegating mutations to memory representing possibly heterogeneously-typed values, such as memory intended for encoding. It makes the same safety guarantees as MutableSpan. A MutableRawSpan can be obtained from a MutableSpan whose Element is BitwiseCopyable.
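To make the "memory intended for encoding" case concrete, here is a sketch of the kind of heterogeneous write that is done today with unsafe raw buffers. It does not use MutableRawSpan; the functions `encodeHeader` and `decodeLength` and the 5-byte layout are invented for the example, and unaligned `storeBytes`/`loadUnaligned` assume Swift 5.7 or later.

```swift
// Writes a 1-byte tag followed by a 4-byte little-endian length.
// Hypothetical layout, used only to show a heterogeneously-typed write.
func encodeHeader(tag: UInt8, length: UInt32) -> [UInt8] {
  var bytes = [UInt8](repeating: 0, count: 5)
  bytes.withUnsafeMutableBytes { raw in
    raw.storeBytes(of: tag, toByteOffset: 0, as: UInt8.self)
    // Offset 1 is not 4-byte aligned; storeBytes accepts that for trivial types.
    raw.storeBytes(of: length.littleEndian, toByteOffset: 1, as: UInt32.self)
  }
  return bytes
}

// Reads the length field back out of the encoded header.
func decodeLength(_ bytes: [UInt8]) -> UInt32 {
  precondition(bytes.count >= 5, "header is too short")
  return bytes.withUnsafeBytes { raw in
    UInt32(littleEndian: raw.loadUnaligned(fromByteOffset: 1, as: UInt32.self))
  }
}

let header = encodeHeader(tag: 7, length: 258)
print(header)               // [7, 2, 1, 0, 0]
print(decodeLength(header)) // 258
```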

Extensions to standard library types

The standard library will provide mutableSpan computed properties. These return lifetime-dependent MutableSpan instances, and represent a mutation of the instance that provided them. These computed properties are the safe and composable replacements for the existing withUnsafeMutableBufferPointer closure-taking functions. For example,

func example(_ array: inout Array<Int>) {
  var ms = array.mutableSpan
  modify(&ms)        // call function that mutates a MutableSpan<Int>
  // array.append(2) // attempt to modify `array` would be an error here
  _ = consume ms     // access to `array` via `ms` ends here
  array.append(1)
}

These computed properties represent a case of lifetime relationships not covered in [SE-0456][SE-0456]. In SE-0456 we defined lifetime relationships for computed property getters of non-escapable and copyable types (~Escapable & Copyable). We propose defining them for properties of non-escapable and non-copyable types (~Escapable & ~Copyable). A ~Escapable & ~Copyable value borrows another binding; if this borrow is also a mutation then it is an exclusive borrow. The scope of the borrow, whether or not it is exclusive, extends until the last use of the dependent binding.

Please see the proposal documents for the detailed design:
[Pitch] MutableSpan and MutableRawSpan

glessard · February 10, 2025, 5:27pm

One area requiring more experimentation is the question of slicing MutableSpans, especially for bulk copies. There are two dimensions to this question: the slicing syntax itself and the nature of mutating functions. We have already established an approach for slicing for generalized buffers in SE-0437's extracting() methods. This seems like a reasonable approach for MutableSpan as well, but the compiler's reasoning about exclusivity depends on mutating functions and var bindings. The extracting() approach would mean that every copy into a sliced mutable span would require an explicit var binding. We are considering alternatives to mitigate this ergonomics issue.
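The ergonomics issue can be reproduced with a toy type. In the sketch below, `Window` is a hypothetical stand-in (it is not MutableSpan, and it is copyable and unsafe internally); it only shows why a slice returned by an `extracting()`-style method needs a `var` binding before a mutating method can be called on it.

```swift
// Toy stand-in for a mutable view; NOT the proposed MutableSpan.
struct Window {
  let base: UnsafeMutablePointer<Int>
  let count: Int

  // Returns a new view over a sub-range, in the style of SE-0437's extracting().
  func extracting(_ range: Range<Int>) -> Window {
    precondition(range.lowerBound >= 0 && range.upperBound <= count)
    return Window(base: base + range.lowerBound, count: range.count)
  }

  // Mutating, like the operations described for MutableSpan.
  mutating func fill(with value: Int) {
    for i in 0..<count { base[i] = value }
  }
}

// Hypothetical helper that zeroes the first `count` elements of an array.
func zeroPrefix(of values: inout [Int], count: Int) {
  values.withUnsafeMutableBufferPointer { buffer in
    guard let base = buffer.baseAddress else { return }
    let whole = Window(base: base, count: buffer.count)
    // whole.extracting(0..<count).fill(with: 0)
    //   ^ rejected: a function call returns an immutable value
    var prefix = whole.extracting(0..<count) // explicit var binding required
    prefix.fill(with: 0)
  }
}

var values = [1, 2, 3, 4]
zeroPrefix(of: &values, count: 2)
print(values) // [0, 0, 3, 4]
```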

Karl · February 10, 2025, 6:20pm

2 things:

I'm guessing that the mutating get disallows reassignment - if so, that's great!

I've used the mutable wrapper view pattern extensively in some of my libraries via the _modify accessor, and one of its major limitations has been that users can reassign the wrappers:
```
// I want this to work
url.pathComponents += ["usr", "bin"]

// But not this
urlA.pathComponents = urlB.pathComponents
```
In this example, each of the .pathComponents views really contains the storage of the entire URL (it's just a view), so the assignment is actually equivalent to urlA = urlB and affects all parts of the URL.
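A stripped-down sketch of the problem, using plain get/set instead of _modify. The `Location` and `SegmentsView` types are hypothetical and much simpler than a real URL type, but the view holds the whole value in the same way:

```swift
// Hypothetical value type with a mutable wrapper view.
struct Location {
  var scheme: String
  var segments: [String]

  // The view carries the entire Location, not just the segments.
  struct SegmentsView {
    var storage: Location

    static func += (view: inout SegmentsView, newSegments: [String]) {
      view.storage.segments += newSegments
    }
  }

  var pathComponents: SegmentsView {
    get { SegmentsView(storage: self) }
    set { self = newValue.storage } // writes back everything the view holds
  }
}

var a = Location(scheme: "https", segments: ["usr"])
let b = Location(scheme: "file", segments: ["tmp"])

a.pathComponents += ["bin"]          // intended use
print(a.scheme, a.segments)          // https ["usr", "bin"]

a.pathComponents = b.pathComponents  // also compiles, and replaces the scheme too
print(a.scheme, a.segments)          // file ["tmp"]
```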

And this is a general problem. I remember looking at swift-system's FilePath, which also uses mutable wrappers, and it also has to consider this (in FilePath's case, you could assign an empty components object via path.components = .init() and lose the existing path root, so it defensively copies that data into a local variable). I think some of the types in swift-collections have similar guards.

So if a mutating get returning a non-escaping, non-copyable type is a viable replacement for that pattern which avoids the reassignment problem, then I'm an enthusiastic +1 on it.

However, I would like to know if the same rule could be scaled to functions.
Can we add (Mutable)Span accessors to ManagedBuffer?

I can't construct either of these types by myself, and my data is stored in a ManagedBuffer. I believe it should be possible to do something like this... (written in browser):
```
extension ManagedBuffer {

  /// Important: the elements in the range must be initialized.
  ///
  subscript(elements: Range<Int>) -> Span<Element> {
     get {
       // The elements pointer is actually safe to escape in this case.
       let ptr = withUnsafeMutablePointerToElements { $0 }
       let buf = UnsafeBufferPointer(start: ptr + elements.lowerBound, count: elements.count)
       return buf.span
     }
  }
}
```
And something similar for MutableSpan. It relies on escaping a pointer value in defiance of its documentation, so it would be better if the standard library did this.

(Again, if these could be done as functions rather than subscripts, that would be even better)

Basically, as these proposals go on, it's becoming clear that libraries like mine which use mutable wrapper views will really want to migrate from the existing strategy of _modify and moving storage around (including moving it back in the cleanup phase), to a new strategy where we store borrows in non-escaping types and return them from these new coroutine-less accessors. I wasn't anticipating that part of moving to Span would involve moving from coroutine accessors and eliminating that post-yield cleanup phase, but it seems like this new approach has some significant benefits.

But if I want to eliminate that cleanup phase I need those borrows available where I store my data.

glessard · February 10, 2025, 6:47pm

Yes, mutating get disallows reassignment. Functions that would take an inout binding don't disallow reassignment, though the link to the original value would likely be severed. This is related to the slicing and mutating-functions issue mentioned above.

I'm afraid not, since ManagedBuffer makes no guarantee about the initialization state of its storage. We have been thinking about what to do about ManagedBuffer; it is an interesting and significant issue. I hope that OutputSpan will help in that regard (that's the next pitch.)
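To illustrate the initialization concern: in the sketch below the element storage starts out uninitialized, and only a convention chosen by the client (here, keeping the initialized count in the header) records which elements are valid. The three helper functions are hypothetical.

```swift
// Allocates storage whose header tracks how many elements are initialized.
// The elements themselves start out uninitialized.
func makeStorage(capacity: Int) -> ManagedBuffer<Int, Int> {
  ManagedBuffer<Int, Int>.create(minimumCapacity: capacity) { _ in 0 }
}

// Initializes the next free slot and bumps the count kept in the header.
func append(_ value: Int, to storage: ManagedBuffer<Int, Int>) {
  storage.withUnsafeMutablePointers { count, elements in
    precondition(count.pointee < storage.capacity, "storage is full")
    (elements + count.pointee).initialize(to: value)
    count.pointee += 1
  }
}

// Only the first `count` elements may be read; ManagedBuffer itself cannot know that.
func initializedElements(of storage: ManagedBuffer<Int, Int>) -> [Int] {
  storage.withUnsafeMutablePointers { count, elements in
    Array(UnsafeMutableBufferPointer(start: elements, count: count.pointee))
  }
}

let storage = makeStorage(capacity: 4)
append(42, to: storage)
append(7, to: storage)
print(initializedElements(of: storage)) // [42, 7]
```

A span over an arbitrary range of such a buffer would expose uninitialized memory unless the caller upholds that convention.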

fclout · February 12, 2025, 5:31pm

I left a small number of spelling-related comments on the PR. More substantively:

It's surprising to me that there is a start index argument on the bulk-update methods, but no end index, especially with the repeating: overload. It also seems to mean you can't update a part of a span with a part of another span (at least, not using standard-library-defined methods).

You can pass an Array to a function that takes an UnsafePointer/UnsafeMutablePointer using &array as the argument. Does that also work with MutableSpan?

Is the byteOffsets property on MutableRawSpan basically indices?

MutableRawSpan is automatically Sendable, but the constraint on unsafeLoad and storeBytes is that the element has to be BitwiseCopyable. Does BitwiseCopyable imply Sendable? I think it does, but I wanted to verify with you.

jrose · February 12, 2025, 8:05pm

Pointers are bitwise-copyable, so, no. (Pointers are also mostly non-Sendable to make you stop and think about what you’re doing, but still.)
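A quick way to check this, assuming a Swift 6 toolchain (where BitwiseCopyable is available). The two generic functions are hypothetical probes that only exist to exercise the constraints:

```swift
// Hypothetical probes: each call compiles only if the type satisfies the constraint.
func acceptsBitwiseCopyable<T: BitwiseCopyable>(_ type: T.Type) -> Bool { true }
func acceptsSendable<T: Sendable>(_ type: T.Type) -> Bool { true }

// A pointer is BitwiseCopyable...
let pointerIsBitwiseCopyable = acceptsBitwiseCopyable(UnsafeMutablePointer<Int>.self)
// ...and Int is both BitwiseCopyable and Sendable...
let intIsSendable = acceptsSendable(Int.self)
// ...but the pointer is not Sendable, so this line is diagnosed by the compiler:
// acceptsSendable(UnsafeMutablePointer<Int>.self)

print(pointerIsBitwiseCopyable, intIsSendable) // true true
```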

glessard · February 12, 2025, 10:11pm

We don't have any syntax sugar around Span or MutableSpan at the moment.

There is a clear usability gap we need to address here. We found that the old slicing semantics (involving shared indices) didn't work well with Span, and the attempted replacement (the extracting() functions) doesn't work well with MutableSpan due to all its operations being mutating functions. Having a some RangeExpression<Index> parameter is a possibility here, but that approach was rejected in favor of slicing during the run-up to Swift 3; I hope that this proposal doesn't end up having to overturn such an old precedent.

fclout · February 12, 2025, 11:45pm

Yeah, then I think there's a problem with the conformance of MutableRawSpan to Sendable. MutableRawSpan being Sendable allows you to ferry a MutableSpan of pointers to another concurrency domain. Seems like either the conformance has to go, or the mutableRawSpan property must be conditional on Element being Sendable, and unsafeLoad/storeBytes on MutableRawSpan must also require their T to be Sendable.

glessard · February 12, 2025, 11:47pm

~~It is already that way: `extension MutableSpan: @unchecked Sendable where Element: Sendable {}`~~

~~It's unchecked because the compiler sees a pointer as the internal representation, and that isn't Sendable.~~

I see what you meant now. That is a good point. We should do the same for Span.

dnadoba · February 12, 2025, 11:48pm

glessard:

func example(_ array: inout Array<Int>) {
  var ms = array.mutableSpan
  modify(&ms)        // call function that mutates a MutableSpan<Int>
  // array.append(2) // attempt to modify `array` would be an error here
  _ = consume ms     // access to `array` via `ms` ends here
  array.append(1)
}

Would the access to ms implicitly end at the last use or is an explicit consume required? e.g. would that be valid:

func example(_ array: inout Array<Int>) {
  var ms = array.mutableSpan
  modify(&ms)        // call function that mutates a MutableSpan<Int>
  // does the access end here?
  array.append(2) // if yes, is this valid?
}

glessard · February 12, 2025, 11:49pm

The access does end after the last use of ms, but the implicitness does not make it easy to write an example.

benrimmington · February 13, 2025, 9:25am

Should there be (conditional) extensions to the SIMD vector types?

extension SIMD{2,3,4,8,16,32,64} where Scalar == {U,}Int{,8,16,32,64} {
  var mutableSpan: MutableSpan<Scalar> { mutating get }
}

extension SIMD{2,3,4,8,16,32,64} where Scalar == Float{16,32,64} {
  var mutableSpan: MutableSpan<Scalar> { mutating get }
}

fclout · February 13, 2025, 5:35pm

Related but probably for another proposal, it would be really convenient to find out what it takes to safely turn a Span<Scalar> to a Span<SIMD##<Scalar>> and vice-versa, and make it easy. This is common in projects like codecs that contain a lot of unsafe code.

Karl · February 13, 2025, 9:17pm

The one major hangup that I have about this whole design is the separate Span and MutableSpan types. I believe I mentioned it briefly during a previous review.

This decision would propagate down into any mutable view types that want to make use of spans. For example, String currently exposes its unicodeScalars view (String.UnicodeScalarView), and on mutable strings you can mutate through it. There is one view type, but it acts like a value.

let constStr = "foo"
constStr.unicodeScalars.first       // "f"
constStr.unicodeScalars.append("t") // ❌ Error

var mutStr = "foo"
mutStr.unicodeScalars.first         // "f"
mutStr.unicodeScalars.append("t")   // ✅ OK - because the string is a 'var'
print(mutStr) // "foot"

If UnicodeScalarView were being written using Span and MutableSpan, we would need separate UnicodeScalarView and MutableUnicodeScalarView types, vended by separate .unicodeScalars and .mutableUnicodeScalars properties.

That's more code for library authors to write, test, and maintain. The type split is also more awkward for clients of the library and may require additional types (protocols) to paper over.

I can't help but feel that this is a regression, and that spans should act more like values than references.

glessard · February 13, 2025, 10:04pm

The String and UnicodeScalars case works because they are copyable, escapable, and (as a bonus) also share their underlying representation. UnicodeScalars isn't a mere view over a String: it is a first-class type that shares its representation with String. (It is unfortunate that this fact has been thoroughly deëmphasized.)

Span and MutableSpan are different beasts: they know nothing of the implementation of the type that vends them. They simply know how to represent a portion of memory, and are very much, unavoidably, reference types.

As for unifying them, the types they aim to mostly replace (UnsafeBufferPointer and UnsafeMutableBufferPointer) have the same dichotomy even without the safety, so I don't see this as a regression.
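For comparison, this is what the existing split looks like in practice. The helpers `total(of:)` and `zeroOut(_:)` are hypothetical; the buffer pointer types and the closure-taking Array methods are the existing standard library API.

```swift
// Read-only access goes through UnsafeBufferPointer...
func total(of buffer: UnsafeBufferPointer<Int>) -> Int {
  buffer.reduce(0, +)
}

// ...while mutation requires the separate UnsafeMutableBufferPointer type.
func zeroOut(_ buffer: UnsafeMutableBufferPointer<Int>) {
  for i in buffer.indices { buffer[i] = 0 }
}

var readings = [3, 4, 5]
let sum = readings.withUnsafeBufferPointer { total(of: $0) }
readings.withUnsafeMutableBufferPointer { zeroOut($0) }
print(sum, readings) // 12 [0, 0, 0]
```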

As I noted in the proposal text, a read-only Span wouldn't be particularly usable if it required exclusive access, but that would be the price to pay to somehow get both read-only and mutating access to the same region of memory. In Rust there is also a dichotomy between &a[0..2] and &mut a[0..2]; the syntax looks similar, but the underlying implementation is as separate as Span and MutableSpan.

I expect to make a prototype of MutableSpan available next week.

mpangburn · February 13, 2025, 10:45pm

I trust the authors here if separate Span and MutableSpan types are required (even if the naming pattern evokes comparison to Objective-C, which might be undesirable).

Is it useful to define any protocols over the *Span family of types for generic programming? I see var span: Span<Element> { borrowing get } as a way to access a read-only view here, so maybe a protocol has little added value; I'm wondering if your explorations in other members of the family (e.g. OutputSpan mentioned upthread) have revealed any patterns though.

glessard · February 13, 2025, 10:50pm

Protocols involving generalized containers (including the Span family) are planned. This involves changing a lot of machinery in the generics system. Those changes are partially done but notably have so far excluded non-copyable and non-escapable types from being used as associated types. For the generalized container protocols to work, however, we need to be able to use non-copyable and/or non-escapable Element types (primary associated type,) as well as (probably) use Span/MutableSpan as other associated types.

Karl · February 13, 2025, 11:20pm

I don't know what the Rust thing is about, but I'm not suggesting you would need exclusive access for a read-only Span.

If I understand you correctly (and I might not), I think your concern boils down to the fact that they both use get accessors. In other words, you're concerned that this won't fly:

extension Array {
  var span: Span<Element> {
    get { /* does NOT require exclusive access, returns readonly span */ }
    mutating get { /* requires exclusive access, returns read-write span */ }
  }
}

And if that's the case, I think we should look at our accessors design again to make it possible. Or use different accessors.

Types today already do it (without relying on their ability to escape), so I'm not sure why doing it in a non-escaping type would be impossible.

glessard · February 13, 2025, 11:45pm

We decided to defer extensions to SIMD for the moment. As you noted a few weeks ago, it seems actually possible to conform a type to SIMD without contiguous storage. We would have to define the extensions on the concrete types themselves, which is possible but somewhat onerous. We will therefore defer the SIMD-specific extensions (including the ones proposed in SE-0456) until later. We may end up defining them on the concrete types, or we may fix the SIMD protocol (it really should require contiguous storage,) or we may take advantage of InlineArray instead.

Alejandro · February 14, 2025, 12:08am

Karl:

I don't know what the Rust thing is about, but I'm not suggesting you would need exclusive access for a read-only Span.

If I understand you correctly (and I might not), I think your concern boils down to the fact that they both use get accessors. In other words, you're concerned that this won't fly:
extension Array {
  var span: Span<Element> {
    get { /* does NOT require exclusive access, returns readonly span */ }
    mutating get { /* requires exclusive access, returns read-write span */ }
  }
}
And if that's the case, I think we should look at our accessors design again to make it possible. Or use different accessors.

Types today already do it (without relying on their ability to escape), so I'm not sure why doing it in a non-escaping type would be impossible.

Let's ignore the accessor returning the span for a moment because it's largely irrelevant. Consider the following code:

func something(array: borrowing [Int]) {
  var s = array.span
  s[0] = 123
}

This code is illegal because we only have read-only access to the array. If we didn't have MutableSpan, how does Span, the type, know to disallow this? If we assume the subscript on Span looks something like:

subscript(_ index: Int) -> Element {
  // or _read
  get {
    ...
  }

  // or _modify
  set {
    ...
  }
}