[Pitch] Safe Access to Contiguous Storage

glessard · February 6, 2024, 11:03pm

We are proposing a new abstraction for safe, direct access to memory: StorageView. This is the proposal formerly known as BufferView, and is a companion to Non-Escapable Types and to Lifetime Dependency Annotations for Non-escapable Types, which were jointly pitched yesterday by @tbkka.

Introduction

We introduce StorageView<T>, an abstraction for container-agnostic access to contiguous memory. It will expand the expressivity of performant Swift code without giving up on the memory safety properties we rely on: temporal safety, spatial safety, definite initialization and type safety.

In the C family of programming languages, memory can be shared with any function by using a pointer and (ideally) a length. This allows contiguous memory to be shared with a function that doesn't know the layout of a struct being used by the caller. A heap-allocated array, contiguously-stored named fields or even a single stack-allocated instance can all be accessed through a C pointer. We aim to create a similar idiom in Swift, with no compromise to memory safety.

Motivation

Consider for example a program using multiple libraries, including one for base64 decoding. The program would obtain encoded data from one or more of its dependencies, which could supply it in the form of [UInt8], Foundation.Data or even String, among others. None of these types is necessarily more correct than another, but the base64 decoding library must pick an input format. It could declare its input parameter type to be some Sequence<UInt8>, but such a generic function significantly limits performance. This may force the library author to either declare its entry point as inlinable, or to implement an internal fast path using withContiguousStorageIfAvailable() and use an unsafe type. The ideal interface would have a combination of the properties of both some Sequence<UInt8> and UnsafeBufferPointer<UInt8>.
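Here is a rough sketch, in today's Swift, of the fast-path pattern described above. The names `countBase64Symbols`, `_countBase64Symbols(in:)` and `isBase64Symbol` are hypothetical, and a real decoder would do far more work. Only `withContiguousStorageIfAvailable` is standard library API.

```swift
// Hypothetical helper: is `byte` in the standard base64 alphabet (A-Z, a-z, 0-9, +, /)?
func isBase64Symbol(_ byte: UInt8) -> Bool {
    switch byte {
    case UInt8(ascii: "A")...UInt8(ascii: "Z"),
         UInt8(ascii: "a")...UInt8(ascii: "z"),
         UInt8(ascii: "0")...UInt8(ascii: "9"),
         UInt8(ascii: "+"), UInt8(ascii: "/"):
        return true
    default:
        return false
    }
}

// Internal fast path over contiguous memory. The unsafe buffer is only valid
// inside the closure that vends it, but nothing stops it from escaping.
func _countBase64Symbols(in buffer: UnsafeBufferPointer<UInt8>) -> Int {
    var count = 0
    for byte in buffer where isBase64Symbol(byte) {
        count += 1
    }
    return count
}

// Public generic entry point: use the contiguous fast path when the argument
// offers one, otherwise fall back to element-by-element iteration.
func countBase64Symbols<S: Sequence>(_ bytes: S) -> Int where S.Element == UInt8 {
    if let fast = bytes.withContiguousStorageIfAvailable({ _countBase64Symbols(in: $0) }) {
        return fast
    }
    var count = 0
    for byte in bytes where isBase64Symbol(byte) {
        count += 1
    }
    return count
}
```

The sketch shows both costs the pitch describes: the loop is duplicated across two code paths, and the fast path must go through an unsafe type.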

Proposed solution

StorageView will allow sharing the contiguous internal representation of a type, by providing access to a borrowed view of a span of contiguous memory. A view does not copy the underlying data: it instead relies on a guarantee that the original container cannot be modified or destroyed during the lifetime of the view. StorageView's lifetime is statically enforced as a lifetime dependency on a binding of the type vending it, preventing its escape from the scope where it is valid for use. This guarantee preserves temporal safety. StorageView also performs bounds-checking on every access to preserve spatial safety. Additionally, StorageView always represents initialized memory, preserving the definite initialization guarantee.
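StorageView itself relies on non-escapable types, which released Swift does not have. The sketch below therefore only approximates two of the guarantees:

- A hypothetical `CheckedView` wrapper bounds-checks every access (spatial safety).
- A hypothetical `withCheckedView` method confines the view to a closure. This is only a weak stand-in for the statically enforced lifetime, because nothing here prevents the view from escaping the closure.

```swift
// Hypothetical stand-in for the spatial-safety part of StorageView:
// a read-only, non-owning view whose subscript always checks bounds.
struct CheckedView<Element> {
    private let base: UnsafeBufferPointer<Element>

    init(_ base: UnsafeBufferPointer<Element>) {
        self.base = base
    }

    var count: Int { base.count }

    subscript(position: Int) -> Element {
        // Checked in every build configuration, unlike UnsafeBufferPointer.
        precondition(position >= 0 && position < base.count, "index out of bounds")
        return base[position]
    }
}

extension Array {
    // Closure scoping approximates the temporal rule, but the compiler does
    // not stop `body` from storing the view somewhere that outlives the call.
    func withCheckedView<R>(_ body: (CheckedView<Element>) throws -> R) rethrows -> R {
        try withUnsafeBufferPointer { try body(CheckedView($0)) }
    }
}
```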

By relying on borrowing, StorageView can provide simultaneous access to a non-copyable container, and can help avoid unwanted copies of copyable containers.

The full proposal can be found here:
Safe Access to Contiguous Storage
It depends on the capabilities pitched in Non-Escapable Types and Lifetime Dependency, and the related discussions:

tbkka · February 6, 2024, 11:05pm

I'll go update my writeup to use the new name. I am so excited about this!

mickeyl · February 8, 2024, 1:07pm

I like this a lot. Very well written and comprehensive, by the way. Will the primary use of this be to ease interfacing with C-style languages or is it rather targeted at implementing high performance algorithms that rely on contiguous storage in Swift?

glessard · February 8, 2024, 4:13pm

The primary use is for high-performance Swift code. The C interop affordances set up a pattern so that adopters of StorageView don’t have to explicitly add interop API to their type.

dmt · February 9, 2024, 1:16am

@glessard Could you please clarify some aspects of the detailed design?

What exactly do these two inits do? Why is there an additional generic parameter, and why is its name the same as the one in the outer scope (Element)?

  public init<Element: BitwiseCopyable, Owner>(
    unsafeBytes: UnsafeRawBufferPointer, as type: Element.Type, owner: borrowing Owner
  ) -> borrow(owner) Self

  public init<Element: BitwiseCopyable, Owner>(
    unsafeRawPointer: UnsafeRawPointer, as type: Element.Type, count: Int, owner: borrowing Owner
  ) -> borrow(owner) Self

Why is there a mix of functions that work with Index and functions that work with byte offsets?

  public func load<T: BitwiseCopyable>(
    fromByteOffset: Int = 0, as: T.Type
  ) -> T

  public func load<T: BitwiseCopyable>(
    from index: Index, as: T.Type
  ) -> T

  public func loadUnaligned<T: BitwiseCopyable>(
    fromByteOffset: Int = 0, as: T.Type
  ) -> T

  public func loadUnaligned<T: BitwiseCopyable>(
    from index: Index, as: T.Type
  ) -> T

It seems strange, because this API is designed around Element, but these functions accept raw byte offsets. We could achieve the same with StorageView<UInt8>, or with a dedicated type representing raw untyped data. A better alternative might be to provide a function that reinterprets a view as some other type, in this case StorageView<UInt8>. Something like:

let array: [UInt64] = ...
let value: UInt16 = array.storageView
  .view(
    of: 1..., // Range<StorageViewIndex<UInt64>>
    as: UInt8.self
  )
  .load(
    from: 42, // StorageViewIndex<UInt8>
    as: UInt16.self
  )
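dmt's example can be approximated today by going through the array's raw bytes. `readUnaligned(_:elementOffset:byteOffset:as:)` is a hypothetical helper built on the existing `UnsafeRawBufferPointer.loadUnaligned(fromByteOffset:as:)`, available in Swift 5.7 and later. It folds an element offset and a byte offset into a single raw byte offset. A typed-to-bytes reinterpretation would hide exactly this arithmetic.

```swift
// Hypothetical helper: skip `elementOffset` whole UInt64 elements, then
// `byteOffset` more bytes, and read a `T` with no alignment requirement.
// `T` is assumed to be a trivial type such as a fixed-width integer.
func readUnaligned<T>(
    _ array: [UInt64], elementOffset: Int, byteOffset: Int, as type: T.Type
) -> T {
    array.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> T in
        let start = elementOffset * MemoryLayout<UInt64>.stride + byteOffset
        precondition(start >= 0 && start + MemoryLayout<T>.size <= raw.count,
                     "read out of bounds")
        return raw.loadUnaligned(fromByteOffset: start, as: T.self)
    }
}
```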

The reading ability in this API works well when data is homogeneous and stored in the executable's native format. But this is not always the case. For example, consider reading a sequence of integers of different widths, encoded with a different endianness. For building things like binary format parsers, this is a highly necessary feature.
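The kind of parsing described above can be sketched today by assembling each value explicitly. `ByteReader` and `readBigEndian(_:)` are hypothetical names. Building the value one byte at a time avoids both alignment requirements and any dependence on the host's endianness.

```swift
// Hypothetical cursor over a byte array that reads big-endian integers
// of any fixed width, advancing past each value it reads.
struct ByteReader {
    private let bytes: [UInt8]
    private(set) var offset = 0

    init(_ bytes: [UInt8]) {
        self.bytes = bytes
    }

    // Returns nil (and does not advance) if too few bytes remain.
    mutating func readBigEndian<T: FixedWidthInteger>(_: T.Type) -> T? {
        let size = T.bitWidth / 8
        guard offset + size <= bytes.count else { return nil }
        var value: T = 0
        for byte in bytes[offset ..< offset + size] {
            // Most significant byte first; shifts never trap on overflow.
            value = (value << 8) | T(truncatingIfNeeded: byte)
        }
        offset += size
        return value
    }
}
```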

Sajjon · February 9, 2024, 7:23am

Does ~Escapable, Copyable mean that StorageView is NOT Escapable but IS Copyable?

public struct StorageView<Element: ~Copyable & ~Escapable>
: ~Escapable, Copyable {
  internal var _start: StorageViewIndex<Element>
  internal var _count: Int
}

Why the asymmetry with the bounds of Element which is neither Escapable nor Copyable? Naively I would think that the bounds on the Element would match the bounds on the wrapping type - much like e.g. Array is Equatable if its Element is Equatable (same for Sendable, Hashable, Codable etc...). So this asymmetry confused me. Or did I get it wrong (I'm not yet too used to ~ syntax rules)? Does ~Escapable, Copyable in fact mean "NOT Escapable NOR Copyable"?

glessard · February 9, 2024, 6:45pm

Yes, StorageView is copyable but not escapable. That is for usability: it allows you to use normal slicing operations to access different sub-parts of your initial view. If it weren't copyable, then the slicing operations would have to be either mutating or consuming operations, which would mean you'd lose access to parts of your view just because you tried to operate on another part of it.

As for Element: ~Copyable & ~Escapable, this is not a prescription. It means that "Element need be neither Copyable nor Escapable", but it can be either or both as well. If we did not specify Element like this, we would be requiring Copyable & Escapable elements.

fclout · February 9, 2024, 6:51pm

What is the interaction of StorageView with the law of exclusivity? It sounds like you would be able to get two inout references to the same object from two different StorageViews?

glessard · February 9, 2024, 6:57pm

Note that a mutable variant is left for a future proposal, in order to keep things manageable. StorageView is read-only.

Holding a StorageView is a borrow of the containing instance's binding. By the law of exclusivity, multiple simultaneous borrows (read-only accesses) are allowed, therefore multiple simultaneous StorageView instances to the same container are allowed. When a StorageView is consumed, its particular borrow ends, and only once all borrows have ended can a mutating access to the container begin.
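The same rule can be observed with today's closure-based API. Two overlapping read-only accesses to one array are allowed, and a mutation may start only after both have ended. This is a minimal sketch, and `numbers` and `sumOfSquares` are illustrative names.

```swift
var numbers = [1, 2, 3, 4]

// Two overlapping read-only accesses to the same array's storage are fine.
// Calling a mutating method such as `withUnsafeMutableBufferPointer` on
// `numbers` inside these closures would violate exclusivity instead.
let sumOfSquares = numbers.withUnsafeBufferPointer { outer in
    numbers.withUnsafeBufferPointer { inner in
        zip(outer, inner).reduce(0) { $0 + $1.0 * $1.1 }
    }
}

// Both read accesses have ended, so a mutating access may now begin.
numbers.append(5)
```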

An eventual mutable version of StorageView would need to be non-copyable (as well as non-escapable) in order to ensure exclusive access during mutations.

glessard · February 9, 2024, 7:14pm

I wrote these two initializers wrong. They should be in an extension where Element: BitwiseCopyable; I will change that in the proposal document.

Once written correctly, I hope they're clearer. They allow you to specify the type that's in your memory (and of the returned StorageView) even if you only have an untyped raw pointer to your memory.
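For comparison, released Swift already lets code state the element type of untyped memory within a limited scope. This sketch assumes Swift 5.7 or later, which provides `UnsafeRawBufferPointer.withMemoryRebound(to:_:)`. `sumOfElements(in:as:)` is a hypothetical helper playing the role of the raw-pointer initializer: the caller supplies only raw bytes plus the element type.

```swift
// Hypothetical helper: view untyped bytes as elements of a caller-chosen type.
// The memory must actually hold values of `T` (here it comes from a [T]).
func sumOfElements<T: FixedWidthInteger>(
    in raw: UnsafeRawBufferPointer, as type: T.Type
) -> T {
    precondition(raw.count % MemoryLayout<T>.stride == 0,
                 "byte count is not a whole number of elements")
    return raw.withMemoryRebound(to: T.self) { typed in
        typed.reduce(0) { $0 &+ $1 }
    }
}

let words: [UInt32] = [10, 20, 30]
// Only the untyped bytes are passed along; the element type is restated.
let total = words.withUnsafeBytes { sumOfElements(in: $0, as: UInt32.self) }
```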

1-877-547-7272 · February 9, 2024, 7:42pm

I don’t like the name StorageView. IMO “Buffer” more clearly expresses that its elements are contiguously stored than just “Storage”. I also think the reasoning for the name change is weak.

While the use of the word "buffer" would be consistent with the UnsafeBufferPointer type, it is nevertheless not a great name, since "buffer" is usually used in reference to transient storage.

While many buffers are transient, I don’t personally think a thing’s transient just because it was referred to as a buffer. APIs like UnsafeBufferPointer and MTLBuffer use the term “buffer” in a similar way and it doesn’t seem to cause any confusion.

On the other hand we already have a nomenclature using the term "Storage" in the withContiguousStorageIfAvailable() function, and the term "View" in the API of String.

I think it’s notable that the withContiguousStorageIfAvailable API specifies that it’s contiguous storage, not just storage. To me, the term “storage” suggests that the sequence’s elements are stored somewhere in memory, not that they are necessarily stored contiguously.

We could call this type ContiguousStorageView, but that’s a pretty long and verbose name. I think BufferView is the most concise and clear name for this type.

glessard · February 9, 2024, 8:17pm

One of the things we would like to avoid with StorageView is having an explicitly-untyped version. This is why the load and loadUnaligned functions are restricted to those StorageViews that contain BitwiseCopyable types. (BitwiseCopyable will soon be proposed; simply put, it describes a type that does not require deinitialization, explicitly or implicitly.)

I have not added type-reinterpreting functions on whole views, but we should add them. What is uncertain to me are the restrictions on the second type: should both types be required to be BitwiseCopyable? Note that it's unclear whether the aligned load() function should have that restriction; it does not on UnsafeRawPointer.

That being said, the existing load and loadUnaligned functions on UnsafeRawPointer and UnsafeRawBufferPointer use byte offsets exclusively, not indices. I hope that we can soon try a full prototype of StorageView with a parser, and ascertain whether either the offset approach or the index approach for load is unnecessary.
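On raw buffers, an index-based load amounts to a byte-offset load whose offset is index * stride. This minimal sketch shows that equivalence with the existing `UnsafeRawBufferPointer.load(fromByteOffset:as:)`. The array contents are arbitrary.

```swift
let values: [Int32] = [7, -1, 42]

// Index-based access to the typed array.
let viaIndex = values[2]

// The same element through the raw bytes: byte offset = index * stride.
// The offset is a multiple of Int32's alignment, so the aligned load is valid.
let viaByteOffset = values.withUnsafeBytes { raw in
    raw.load(fromByteOffset: 2 * MemoryLayout<Int32>.stride, as: Int32.self)
}
```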

It is an interesting question whether we should restrict load and loadUnaligned to StorageView<UInt8>. If we did so, should we also add them to StorageView<Int8>? Why not on StorageView<Int16>? The answer I find most natural is to not restrict their availability unduly. They do not make sense over a view to a non-BitwiseCopyable type such as AnyObject, but is there a reason not to allow them if your viewed type is (Int, SIMD8<Int32>)?

za_creature · February 9, 2024, 8:42pm

The key safety feature is that a StorageView cannot escape to a scope where the value it borrowed no longer exists.

This is the way!

Does this also hold true for non-escapable async?

glessard · February 9, 2024, 9:37pm

Sendability and Escapability are complementary, along different axes. A non-escapable value could be passed via a borrowed parameter of a function that's in another isolation domain, but once there the non-escapability would mean that the value can't be copied for safekeeping into e.g. the storage of that function's actor.
(I'm not certain I answered the question you meant to ask, though.)

za_creature · February 9, 2024, 9:57pm

I think you did.

I was concerned about async functions being conceptually @escaping and whether I would be able to do (apologies for not following the actual proposed API syntax):

await sock.send(buffer.view[0..<512])

But based on your answer (and ignoring Sendable), I guess that ~Escapable is a conceptual superset of ~Copyable?

glessard · February 9, 2024, 10:04pm

Escapability limits where a value can go. Noncopyability ensures there is only one copy, but a noncopyable value could be moved anywhere. This is why a MutableStorageView would need to be both non-copyable and non-escapable. The non-escapability ensures temporal soundness (that is, accesses happen only while the memory allocation is known to be valid), while the non-copyability ensures against data races by enforcing exclusivity.
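Non-escapable types are what is being pitched here, so only the noncopyable half of this distinction can be shown in released Swift. This minimal sketch assumes Swift 5.9 or later (SE-0390 noncopyable structs). `UniqueHandle`, `Holder` and `makeHolder(id:)` are hypothetical names. The handle can never be duplicated, yet it is moved into a class instance that outlives the function that created it.

```swift
// Noncopyable: the compiler rejects any attempt to duplicate a UniqueHandle.
struct UniqueHandle: ~Copyable {
    let id: Int
}

final class Holder {
    let handle: UniqueHandle

    // `consuming` moves the caller's handle into this object.
    init(_ handle: consuming UniqueHandle) {
        self.handle = handle
    }
}

func makeHolder(id: Int) -> Holder {
    let handle = UniqueHandle(id: id)
    // The only copy of `handle` moves into a heap object and leaves this
    // scope with it. A non-escapable value would not be allowed to do this.
    return Holder(handle)
}
```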

za_creature · February 9, 2024, 10:39pm

That's the pitch, but how does it work in practice?

await marks a (possibly long) suspension point, usually modeled as two calls that respectively store and restore state. This is semantically non-escaping, but any call after await is really naturally @escaping. I guess that the compiler can take care of that since it's ~~turtles~~ Tasks all the way down.

I think that my confusion was about @escaping fn vs ~Escapable struct

Approaching the problem from a different angle (re: atomics, continuations, int a[n]), it feels like ~Escapable is both a solution and a nice pun. Since there's already some bikeshedding about the name, could you, @glessard, come up with a few ~~non-temporal~~* use cases where non-escapable is usefully less strict than at-fixed-memory-address?

I would definitely love to see consuming / borrowing semantics being completely orthogonal to escaping semantics

*: EDIT too many double-negations

Karl · February 11, 2024, 6:38pm

I'm also not thrilled about the name "storage view". I think that name has connotations which are not always accurate - developers may wish to expose lexically-scoped buffers in their APIs for a variety of reasons.

For instance, perhaps I want to stream the bytes of a file, giving a closure temporary access to a stack-allocated buffer:

struct File {
  func stream(_ processBytes: (StorageView<UInt8>) -> Void)
}
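Karl's example can be approximated in released Swift with `withUnsafeTemporaryAllocation(of:capacity:_:)` (Swift 5.6 and later), which provides a scratch buffer that is stack-allocated when possible. An unsafe buffer takes the place of a StorageView. `ChunkedByteSource` is a hypothetical stand-in for `File` that reads from an in-memory array instead of a file.

```swift
// Hypothetical stand-in for `File`: each chunk is handed to the closure
// through a scratch buffer that only exists for the duration of `stream`.
struct ChunkedByteSource {
    let contents: [UInt8]
    let chunkSize: Int

    func stream(_ processBytes: (UnsafeBufferPointer<UInt8>) -> Void) {
        precondition(chunkSize > 0, "chunkSize must be positive")
        withUnsafeTemporaryAllocation(of: UInt8.self, capacity: chunkSize) { scratch in
            var start = 0
            while start < contents.count {
                let end = min(start + chunkSize, contents.count)
                // Copy the next chunk into the scratch buffer. UInt8 is trivial,
                // so overwriting the previous chunk needs no deinitialization.
                for (i, byte) in contents[start..<end].enumerated() {
                    scratch[i] = byte
                }
                // Only the filled prefix is passed on; the closure must not keep it.
                processBytes(UnsafeBufferPointer(rebasing: scratch[0..<(end - start)]))
                start = end
            }
        }
    }
}
```

Nothing in this version prevents `processBytes` from leaking the buffer pointer. That is the gap a non-escapable view type would close.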

And I've mentioned previously that if we had generator functions in the language, one could even imagine an async iterator using this pattern.

[Pitch] Generalize `AsyncSequence` and `AsyncIteratorProtocol`

And then you'd write your iterator like so:
mutating func next<T>(
  _ yield: ([UInt8]) async throws(T) -> IterationResult
) async throws(errorUnion(Failure, T)) {

  var buffer = [UInt8]()
  while await populateBuffer(&buffer) {
    guard try await yield(buffer) == .continue else { break }
  }
}
And now that we're yielding borrows of the element rather than returning copies of it, the element can instead be a non-copyable type -- and even a non-escaping type like some kind of stack buffer.

And so if we imagine some kind of File.ChunkedBytesGenerator (or whatever you want to call it), its associated Element type would be called StorageView. But those values don't represent the storage of the file or the bytes generator. It's not really storage of anything that you'd care about; it's just a non-escapable buffer.

I also think the name is unnecessarily scary/low-level. If this is a safe type, we should give it a nicer name. I think the term "array" is friendlier for Swift developers than "buffer", so I'd suggest something along the lines of SharedArray, NonOwnedArray (or maybe even "unowned"), or BorrowedArray.

I have more to say about this (in particular the idea of a ContiguousStorage protocol - I don't love it), but for now I wanted to mention something about the name.

glessard · February 12, 2024, 7:40pm

I do think that the term "Borrowed" gets to the essence of the type. Personally I don't think we should overload the term "Array" too much. Is it a BorrowedContiguousMemorySpan? A BorrowedBuffer? Just BorrowedMemory?

Jumhyn · February 12, 2024, 8:12pm

The former here feels quite wordy to me and I feel like UnsafeBufferPointer provides good precedent for 'buffer' as a simpler way to refer to this. So BorrowedBuffer SGTM.