[pitch] Add `Span`-providing Properties to Standard Library Types

glessard · November 20, 2024, 2:37pm

This proposes the addition of computed properties to standard library types, to provide direct read-only access to their internal storage via Span and RawSpan.

Introduction

We recently introduced the Span and RawSpan types, but did not provide ways to obtain instances of either from existing types. This proposal adds properties that vend a lifetime-dependent Span from a variety of standard library types, as well as vend a lifetime-dependent RawSpan when the underlying element type supports it.

Motivation

Many standard library container types can provide direct access to their internal representation. Up to now, it has only been possible to do so in an unsafe way. The standard library provides this unsafe functionality with closure-taking functions such as withUnsafeBufferPointer(), withContiguousStorageIfAvailable() and withUnsafeBytes(). These functions have a few different drawbacks, most prominently their reliance on unsafe types, which makes them unpalatable in security-conscious environments. Closure-taking API can also be difficult to compose with new features and with one another. These issues are addressed head-on with non-escapable types in general, and Span in particular. With this proposal, compatible standard library types will provide access to their internal representation via computed properties of type Span and RawSpan.
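
To make the status quo concrete, here is a minimal sketch of the three closure-taking functions named above, using only API that exists in the standard library today. The helper name `checksum` is hypothetical and exists only for illustration. In each call the pointer is valid only inside the closure body, and nothing in the type system prevents it from being copied out.

```swift
// `checksum` is a hypothetical helper, not part of any proposal.
func checksum(_ bytes: UnsafeRawBufferPointer) -> Int {
  bytes.reduce(0) { $0 + Int($1) }
}

let values: [UInt8] = [1, 2, 3, 4]

// Typed access: `buffer` must not be used after the closure returns.
let total = values.withUnsafeBufferPointer { buffer in
  buffer.reduce(0) { $0 + Int($1) }
}

// Untyped access to the same storage.
let rawTotal = values.withUnsafeBytes { checksum($0) }

// Generic code has to handle `nil` for sequences without contiguous storage.
let maybeCount = values.withContiguousStorageIfAvailable { $0.count }

print(total, rawTotal, maybeCount ?? -1)
```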

Proposed solution

Computed properties returning non-escapable copyable values represent a particular case of lifetime relationships between two bindings. While initializing a non-escapable value in general requires lifetime annotations in order to correctly describe the lifetime relationship, the specific case of computed properties returning non-escapable copyable values can only represent one type of relationship between the parent binding and the non-escapable instance it provides: a borrowing relationship.

In the example below, we have an instance of type A, with a well-defined lifetime because it is non-copyable. An instance of A can provide access to a value of type B, which borrows the instance of A:

struct A: ~Copyable, Escapable {}
struct B: ~Escapable, Copyable {
  init(_ a: borrowing A) {}
}
extension A {
  var b: B { B(self) }
}

func function() {
  var a = A()
  var b = a.b // access to `a` begins here
  read(b)
  // `b` has ended here, ending access to `a`
  modify(&a)  // `modify()` can have exclusive access to `a`
}

If we were to attempt using b again after the call to modify(&a), the compiler would report an overlapping access error, due to attempting to mutate a (with modify(&a)) while it is already being accessed through b's borrow. Note that the copyability of B means that it cannot represent a mutation of A; it therefore represents a non-exclusive borrowing relationship.

Given this, we propose to enable the definition of a borrowing relationship via a computed property. With this feature we then propose to add storage computed properties to standard library types that can share their internal typed storage, as well as bytes computed properties to those standard library types that can safely share their internal storage as untyped memory.

Detailed Design

A computed property getter of an Escapable type returning a non-escapable and copyable type (~Escapable & Copyable) establishes a borrowing lifetime relationship of the returned value on the callee's binding. As long as the returned value exists (including local copies), the callee's binding is being borrowed. In terms of the law of exclusivity, a borrow is a read-only access. Multiple borrows are allowed to overlap, but cannot overlap with any mutation.

By allowing the language to define lifetime dependencies in this limited way, we can add Span-providing properties to standard library types.

Extensions to Standard Library types

The standard library and Foundation will provide storage and bytes computed properties. These computed properties are the safe and composable replacements for the existing withUnsafeBufferPointer and withUnsafeBytes closure-taking functions.

extension Array {
  /// Share this `Array`'s elements as a `Span`
  var storage: Span<Element> { get }
}

extension Array where Element: BitwiseCopyable {
  /// Share the bytes of this `Array`'s elements as a `RawSpan`
  var bytes: RawSpan { get }
}

Please see the full list of extensions in the evolution pull request, or in the rendered document here.

Source compatibility

This proposal is additive and source-compatible with existing code.

ABI compatibility

This proposal is additive and ABI-compatible with existing code.

Implications on adoption

The additions described in this proposal require a new version of the Swift standard library and runtime.

Alternatives considered

Adding `withSpan()` and `withBytes()` closure-taking functions

The storage and bytes properties aim to be safe replacements for the withUnsafeBufferPointer() and withUnsafeBytes() closure-taking functions. We could consider withSpan() and withBytes() closure-taking functions that would provide a quicker migration away from the older unsafe functions. We do not believe the closure-taking functions are desirable in the long run. In the short run, there may be a desire to clearly mark the scope where a Span instance is used. The default method would be to explicitly consume a Span instance:

var a = ContiguousArray(0..<8)
var span = a.storage
read(span)
_ = consume span
a.append(8)

In order to visually distinguish this lifetime, we could simply use a do block:

var a = ContiguousArray(0..<8)
do {
  let span = a.storage
  read(span)
}
a.append(8)

A more targeted solution may be a consuming function that takes a non-escaping closure:

var a = ContiguousArray(0..<8)
var span = a.storage
consuming(span) { span in
  read(span)
}
a.append(8)

During the evolution of Swift, we have learned that closure-based API are difficult to compose, especially with one another. They can also require alterations to support new language features. For example, the generalization of closure-taking API for non-copyable values as well as typed throws is ongoing; adding more closure-taking API may make future feature evolution more labor-intensive. By instead relying on returned values, whether from computed properties or functions, we build for greater composability. Use cases where this approach falls short should be reported as enhancement requests or bugs.
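
As a rough illustration of the composition problem, the sketch below computes a dot product over two arrays with today's closure-taking API. Each additional container adds another level of nesting. The function name `dotProduct` is hypothetical, and the example uses only existing standard library calls, not the pitched properties.

```swift
// `dotProduct` is a hypothetical helper, not part of the proposal.
func dotProduct(_ lhs: [Double], _ rhs: [Double]) -> Double {
  precondition(lhs.count == rhs.count, "inputs must have equal length")
  // One closure per container: reading both buffers requires nesting.
  return lhs.withUnsafeBufferPointer { left in
    rhs.withUnsafeBufferPointer { right -> Double in
      var sum = 0.0
      for index in 0..<left.count {
        sum += left[index] * right[index]
      }
      return sum
    }
  }
}

print(dotProduct([1, 2, 3], [4, 5, 6]))
```

With value-returning properties, both views could presumably be obtained in a single flat scope, which is the composability the paragraph above argues for.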

Giving the properties different names

We chose the names storage and bytes because those reflect what they represent. Another option would be to name the properties after how they represent what they do, which would be span and rawSpan. It is possible the name storage would be deemed to clash too much with existing properties of types that would like to provide views of their internal storage with Span-providing properties. For example, the Standard Library's concrete SIMD-conforming types have a property var _storage. The current proposal means that making this property of SIMD types into public API would entail a name change more significant than simply removing its leading underscore.

Allowing the definition of non-escapable properties of copyable non-escapable types

The particular case of the lifetime dependence created by a property of a copyable non-escapable type is not as simple as when the parent type is escapable. There are two possible ways to define the lifetime of the new instance: it can either depend on the lifetime of the original instance, or it can acquire the lifetime of the original instance and be otherwise independent. We believe that both these cases can be useful, and therefore defer allowing either until there is a language annotation to differentiate between them.

dnadoba · November 20, 2024, 4:13pm

storage is usually mutable (for mutable collections at least).

Will they also expose a MutableSpan at some point? What would that property be called?

glessard · November 20, 2024, 4:16pm

Yes, there will be a MutableSpan, and properties providing it would be named mutableStorage. I didn't repeat the future directions from SE-0447, but maybe I should add this one!
[edit: added future directions here]

We don't have a way to overload over a quality such as mutability, so we must have a different property name to return Span vs. returning MutableSpan.

benrimmington · November 20, 2024, 6:40pm

Your previous proposal had:

extension String {
  var utf8Span: UTF8Span { _read }
}

extension Substring {
  var utf8Span: UTF8Span { _read }
}

extension UTF8Span {
  var storage: Span<UInt8> { get }
}

Your current proposal has:

extension String.UTF8View {
  var storage: Span<Element> { get }
  var bytes: RawSpan { get }
}

extension Substring.UTF8View {
  var storage: Span<Element> { get }
  var bytes: RawSpan { get }
}

If UTF8Span is still planned, should it be considered as an alternative?

Could the bytes property be moved alongside the existing withUnsafeBytes method?

extension Span where Element: BitwiseCopyable {
  var bytes: RawSpan { get }
}

For example:

myString.utf8.storage.bytes
myString.utf8.storage.withUnsafeBytes { … }

Should EmptyCollection and StaticString be included in this proposal?

glessard · November 20, 2024, 6:48pm

UTF8Span is still planned and will be pitched separately.

StaticString could vend Span/RawSpan, similar to UTF8View.
EmptyCollection could return nil-based, empty spans.

The idea of moving the bytes property from each individual type to instead be a property of Span where Element: BitwiseCopyable is quite interesting. It would reduce the amount of new API significantly. This being said, that is precisely the case of the property declaration that is excluded by the proposal: it would return a copyable non-escapable type from another copyable non-escapable type. We happen to know that it should copy/inherit the lifetime of the providing instance, but every other case we've defined works differently: the returned instance borrows the providing instance.

dnadoba · November 20, 2024, 6:51pm

Span and MutableSpan can only give access to the fully initialized part of Array's storage, not the entire allocated region. Will there also be a property that returns an OutputSpan (or similar) instead of MutableSpan that gives access to the initialized elements but also allows appending to the end of the initialized portion up to the limit of the allocated region?

Also, storage is likely to be confusing for existing users of Array. The public documentation of Array today calls the entire allocated region its storage. We would need to find a new name for it and update its documentation.

glessard · November 20, 2024, 6:55pm

Yes, the step after MutableSpan will be OutputSpan.

dnadoba · November 20, 2024, 6:58pm

What will that property be called?

glessard · November 20, 2024, 7:04pm

My expectation is that OutputSpan will be used as a parameter received in a closure, because there is a significant amount of implementation-specific housekeeping and cleanup to be done as part of changing the size of a data structure. The possible patterns are (1) closure-taking functions, or (2) each type providing its own OutputSpan wrapper to delegate initialization. (2) seems worse than (1).

Karl · November 20, 2024, 8:45pm

The shape I would like for OutputSpan is something like:

var someArray = Array(0..<10)

inout output = someArray.insertionSpan(at: 4, expectedCount: 6 /* optional */)

output += 42
output += -15...-10

_ = consume output
print(someArray)
// [0, 1, 2, 3, 42, -15, -14, -13, -12, -11, -10, 4, 5, 6, ...]

Ideally, there would be methods to handle appending, insertion at an index, and replacement of a subrange. We'd hold the cursor in a mutating borrow (inout), but we could otherwise use it in our regular control flow (including using it across await suspension points and yield in coroutine accessors).

I wonder if this kind of spelling could work, too:

inout output = someArray.beginInsertion(at: 4)

// someArray cannot be read here, because we began a mutating borrow.

output.end() // consumes 'output', ending the mutating borrow

grynspan · November 21, 2024, 12:08am

We want to adopt RawSpan in Swift Testing's experimental attachments feature, in particular in a protocol that various types will conform to. However, because our minimum deployment target on ABI-stable platforms is generally lower than whatever aligns with the current Swift release, we are unable to add a protocol requirement that produces a RawSpan. The current (incomplete) protocol specifies a withUnsafeBufferPointer-shaped requirement instead.

Do you have any advice for us or others in this predicament?

grynspan · November 21, 2024, 12:09am

What about elements instead of storage?

benrimmington · November 21, 2024, 8:09am

Character.UTF8View will have the proposed APIs, because it's an alias for String.UTF8View.
Unicode.Scalar.UTF8View doesn't store code units, but should it be included in the proposal?
Should fixed-width integer types be included? Their bytes and words would have different element orders, so myInt.words.bytes would be "mixed-endian".

glessard · November 21, 2024, 9:50am

elements is not bad; in fact that's the label we have for the provisional/underscored initializers. It might have a better equivalence with bytes.

lukasa · November 21, 2024, 10:19am

This looks great! One quick extension: should we be taking this opportunity to define one or more protocols to which these types conform? It's been a source of annoyance for some time that ContiguousBytes is defined in Foundation and not in the Standard Library, forcing a proliferation of protocols with the same shape. Should we consider nipping this in the bud?

Two protocols come to mind. One matches ContiguousBytes, something like (name deliberately bad to avoid people thinking I have a good idea for it)

protocol Bytesable {
    var bytes: RawSpan { get }
}

The other probably belongs as the natural cousin of withContiguousStorageIfAvailable on Sequence:

protocol Sequence {
    // existing protocol definition above
    var storageIfAvailable: Span<Element>? { get }
}

extension Sequence {
    @_alwaysEmitIntoClient
    var storageIfAvailable: Span<Element>? { nil }
}
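
For context, the existing withContiguousStorageIfAvailable that the second protocol would parallel behaves as in the sketch below: it returns nil when a sequence has no contiguous storage, so generic code needs a fallback path. The helper name `sum` and its tuple result are hypothetical, chosen only to make the two paths visible.

```swift
// Hypothetical generic helper: takes the fast path when contiguous storage exists.
func sum<S: Sequence>(_ sequence: S) -> (value: Int, usedFastPath: Bool)
  where S.Element == Int
{
  let fast = sequence.withContiguousStorageIfAvailable { buffer in
    buffer.reduce(0, +)
  }
  if let fast = fast {
    return (fast, true)
  }
  // Fallback: ordinary iteration over the sequence.
  return (sequence.reduce(0, +), false)
}

let fromArray = sum([1, 2, 3])  // Array exposes contiguous storage
let fromRange = sum(1...3)      // ClosedRange has no stored elements
print(fromArray, fromRange)
```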

glessard · November 21, 2024, 10:31am

UnicodeScalar.UTF8View generates its code units on the fly on each access, so there is no memory address to attach to. It could have been implemented by generating the code units up front and storing them, and then we could have a Span over them. While it's slightly disappointing, it's not clear whether this should be considered a problem.

I wouldn't want to put these properties on every individual FixedWidthInteger type; this might be a good use for CollectionOfOne's bytes property.
The Words view explicitly orders its elements from least to most significant, rather than in their in-memory order, so it is not Spannable.
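
The difference between in-memory byte order and the significance order used by words can be seen with existing API. This is a small sketch using a single-word value. It fixes the byte order explicitly so that the result does not depend on the host platform.

```swift
let value: UInt32 = 0x0102_0304

// Fixing the byte order first makes the byte sequences platform-independent.
let littleEndianBytes = withUnsafeBytes(of: value.littleEndian) { Array($0) }
let bigEndianBytes = withUnsafeBytes(of: value.bigEndian) { Array($0) }

// `words` is ordered from least to most significant word,
// regardless of how the value is laid out in memory.
let words = Array(value.words)

print(littleEndianBytes, bigEndianBytes, words)
```

The in-memory bytes depend on the representation chosen, while words is defined by significance. That is why a view over words cannot simply be a span over the integer's storage.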

glessard · November 21, 2024, 10:39am

We would like to consider it, but the issues we listed in SE-0447 aren't yet sufficiently resolved to do it. The accessors road map that was discussed in the "modify and read accessors" pitch thread is part of the solution. The other part, namely allowing protocol associated types with suppressed protocol requirements, will be solved as part of the generalized containers effort.

grynspan · November 21, 2024, 5:59pm

While we're here, the stdlib doesn't have a public HasContiguousBytes protocol which makes working with potentially discontiguous collections a potential perf footgun. It'd be great if we could introduce a new protocol to cover "types that provide these spans at O(1)".

glessard · November 21, 2024, 6:01pm

That’s in the plans as well.

filip-sakel · November 21, 2024, 10:55pm

Is that in some vision document? Because the recent ownership-related features/types don’t feel very connected yet.