[Pitch] Generalize ContiguousBytes to support Span et al

Hi all,

The ContiguousBytes protocol describes types that can provide access to their underlying bytes, such as unsafe buffer pointers, arrays, and Data. However, the current design of ContiguousBytes does not work with the Span family of types, because it assumes that the Self type is both Copyable and Escapable. This proposal generalizes ContiguousBytes to support non-copyable and non-escapable types, makes InlineArray and the various Span types conform to it, and provides a safe counterpart to the withUnsafeBytes requirement of ContiguousBytes.

I propose updating ContiguousBytes to support Span et al by changing its definition to this:

public protocol ContiguousBytes: ~Escapable, ~Copyable {
    /// Calls the given closure with the contents of underlying storage.
    ///
    /// - note: Calling `withUnsafeBytes` multiple times does not guarantee that
    ///         the same buffer pointer will be passed in every time.
    /// - warning: The buffer argument to the body should not be stored or used
    ///            outside of the lifetime of the call to the closure.
    func withUnsafeBytes<R>(_ body: (UnsafeRawBufferPointer) throws -> R) rethrows -> R

    /// Calls the given closure with the contents of underlying storage.
    ///
    /// - note: Calling `withBytes` multiple times does not guarantee that
    ///         the same span will be passed in every time.
    func withBytes<R, E>(_ body: (RawSpan) throws(E) -> R) throws(E) -> R
}

... and then making the Span, MutableSpan, RawSpan, MutableRawSpan, UTF8Span, and InlineArray types conform to it.
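To make the shape of the two requirements concrete, here is a minimal sketch of how a conforming type might get `withBytes` for free from `withUnsafeBytes`. It assumes a Swift 6.2 toolchain (for `RawSpan` and `~Escapable`). `SketchContiguousBytes` and `Packet` are hypothetical names invented for illustration; this is not Foundation's protocol and not necessarily how the linked implementation does it.

```swift
// Hypothetical stand-in that mirrors the pitched protocol shape.
protocol SketchContiguousBytes: ~Escapable, ~Copyable {
    func withUnsafeBytes<R>(_ body: (UnsafeRawBufferPointer) throws -> R) rethrows -> R
    func withBytes<R, E: Error>(_ body: (RawSpan) throws(E) -> R) throws(E) -> R
}

// Assumed default implementation. A plain extension like this one only
// applies to Copyable and Escapable conformers.
extension SketchContiguousBytes {
    func withBytes<R, E: Error>(_ body: (RawSpan) throws(E) -> R) throws(E) -> R {
        // withUnsafeBytes is untyped-rethrows, so carry the typed error out
        // through a Result instead of throwing across the closure boundary.
        let result: Result<R, E> = withUnsafeBytes { buffer in
            do throws(E) {
                return .success(try body(RawSpan(_unsafeBytes: buffer)))
            } catch {
                return .failure(error)
            }
        }
        return try result.get()
    }
}

// Hypothetical conformer that only implements the unsafe requirement.
struct Packet: SketchContiguousBytes {
    var payload: [UInt8]

    func withUnsafeBytes<R>(_ body: (UnsafeRawBufferPointer) throws -> R) rethrows -> R {
        try payload.withUnsafeBytes(body)
    }
}

let packet = Packet(payload: [0xDE, 0xAD, 0xBE, 0xEF])

// The closure does not throw, so E is inferred as Never and no `try` is needed.
let count = packet.withBytes { span in span.byteCount }
let first = packet.withBytes { span in
    span.unsafeLoad(fromByteOffset: 0, as: UInt8.self)
}
```

Callers of `withBytes` never touch an unsafe pointer themselves; the only unsafe step is the `RawSpan(_unsafeBytes:)` bridge inside the assumed default implementation.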

Full proposal is here. Implementation is here.

Thoughts?

Doug

Seems straightforwardly good. It occurs to me that we should probably figure out some sort of overarching strategy for this and withContiguousStorageIfAvailable in terms of which one is “preferred” (i.e. checked-for first) in situations where we’re dynamically discovering which fast paths are available.
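As a rough illustration of the ordering question, a generic function that discovers fast paths dynamically might look like the sketch below. The function name `checksum` and the order of the checks are assumptions made for illustration; nothing in the pitch settles which path should be tried first.

```swift
import Foundation

// Hypothetical helper: sums the bytes of any UInt8 sequence, trying
// progressively slower paths.
func checksum<S: Sequence>(_ bytes: S) -> UInt32 where S.Element == UInt8 {
    // Fast path 1 (assumed to be checked first): typed contiguous storage.
    if let sum = bytes.withContiguousStorageIfAvailable({ buffer in
        buffer.reduce(UInt32(0)) { $0 &+ UInt32($1) }
    }) {
        return sum
    }

    // Fast path 2: a dynamic cast to Foundation's ContiguousBytes.
    if let contiguous = bytes as? any ContiguousBytes {
        return contiguous.withUnsafeBytes { raw in
            raw.reduce(UInt32(0)) { $0 &+ UInt32($1) }
        }
    }

    // Slow path: plain element-by-element iteration.
    return bytes.reduce(UInt32(0)) { $0 &+ UInt32($1) }
}

let fromArray = checksum([1, 2, 3] as [UInt8])
let fromData = checksum(Data([1, 2, 3]))
let fromLazy = checksum((1...3).lazy.map { UInt8($0) })
```

All three calls produce the same sum; only the path taken differs, which is why a documented preference between the two checks would matter mostly for performance.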

Thanks for thinking through this! Overall, this looks like a good direction to me in order to make the existing ContiguousBytes protocol better in a world where we’re moving towards span types. I think that there’s still discussion to be had that the span proposals alluded to around what new APIs should be using to accept bytes as input (since ContiguousBytes has its issues as well and we may want to move to a world where APIs just use RawSpan directly / there is some automatic conversion to RawSpan). That being said, I don’t think that precludes us from improving ContiguousBytes via your proposal here to help in cases where ContiguousBytes is already used today. Overall just a few general questions:

Availability

Will these new declarations (either the requirement or the default implementation) come with availability? I assume the default implementation we can define as @_alwaysEmitIntoClient and therefore wouldn’t come with availability, but what about the protocol requirement - because the default implementation is always available will that not have availability either, or are there availability constraints on Darwin platforms to keep in mind for adoption here?

withUnsafeBytes Typed Throws Adoption

Your implementation PR also updates the ContiguousBytes.withUnsafeBytes requirement to adopt typed throws, but I don’t think I see that mentioned in the proposal. Just to clarify, is that also a proposed change here, and are there any source or ABI compatibility impacts when doing so that we should be aware of?

OutputSpan

Should we also be adding a conformance to Output(Raw)Span? It’s simple enough to take an output span and call .span to get a Span which would conform, but the same argument can be said for types like InlineArray and UTF8Span which are gaining a conformance here, so would it be consistent to include OutputSpan?

This is great!

I have a minor nit with the generic signature for withBytes() especially if it is to be codified in ABI: we should make sure that the order of the parameters in the angle brackets is the same as the order in the declaration. This has prevented us from updating the syntax of some stdlib functions to use anonymous generic types (some Protocol) in the past. I have no idea what syntax updates could occur here, but let’s not let this mistake occur again!

I imagine we have withBytes instead of a borrowing bytes property because we can’t have these yet in protocols?

There’s that, but also because a correct default implementation can’t be written.

I believe we should actually take this opportunity to move the protocol to the standard library. It is fundamentally valuable to be able to tell in advance if a particular sequence (or other value) has contiguous storage, which enables various fast paths. Having the protocol up in Foundation forces a dependency on a rather large library.

The Span, MutableSpan, and InlineArray types will conditionally conform to ContiguousBytes when the element type is UInt8, just like Array and Unsafe(Mutable)BufferPointer already do:

Should the new (and existing) conformances use Element: BitwiseCopyable instead? (Trivial is mentioned in some FIXME comments.)

The protocol requirement needs to have availability constraints for the Swift 6.3 runtime (or wherever this lands), as do the new conformances of Span et al to ContiguousBytes. Everything else follows the availability of the types, i.e., Swift 6.2-aligned for InlineArray and Swift 5.1-aligned for the back-deployment targets of Span et al.

The pull request only does this for Embedded Swift, where we need to use typed throws. I didn't think we needed to cover that in the proposal.

I had forgotten that OutputSpan had a span property for the already-initialized values. I'll add it.

I don't think this applies to any of the generic parameters I've added; they all need to be referenced in multiple places and none of the protocols here have primary associated types.

It's also that some types that conform to ContiguousBytes couldn't provide a borrowing bytes property even if we could express it now. See the first section of Alternatives Considered for more.

I agree that having to pull in Foundation(Essentials) to get this protocol and its conformances is a fairly heavy dependency. However, I think this protocol might be an evolutionary dead-end, and that more APIs should take RawSpan directly rather than a ContiguousBytes-conforming type.

They can't, because BitwiseCopyable is a marker protocol and one cannot define a conditional conformance whose requirements involve a marker protocol (because we could not match them at runtime).

I suppose that the concrete withBytes I've been adding to all of these types could use the more-general BitwiseCopyable constraint, although I don't know if that buys us all that much when the span property is right there.
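A small sketch of the distinction, using made-up names (`SketchByteSource`, `Wrapper`, and the free function `byteCount(of:)` are all hypothetical): a conditional conformance can be keyed on a same-type requirement such as `Element == UInt8`, while a marker protocol like BitwiseCopyable can still constrain a concrete generic function.

```swift
// Hypothetical protocol and wrapper type, for illustration only.
protocol SketchByteSource {
    func byteCount() -> Int
}

struct Wrapper<Element> {
    var storage: [Element]
}

// Allowed: the conformance is conditional on a same-type requirement.
extension Wrapper: SketchByteSource where Element == UInt8 {
    func byteCount() -> Int { storage.count }
}

// Rejected by the compiler, because the conformance would depend on a
// marker protocol that cannot be checked at runtime:
//
//   extension Wrapper: SketchByteSource where Element: BitwiseCopyable { ... }

// Allowed: a generic function (not a conformance) constrained on the
// marker protocol.
func byteCount<T: BitwiseCopyable>(of values: [T]) -> Int {
    values.count * MemoryLayout<T>.stride
}

let viaConformance = Wrapper<UInt8>(storage: [1, 2, 3]).byteCount()
let viaFunction = byteCount(of: [UInt32(1), UInt32(2)])
```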

Doug

Well, then why enhance it? :slight_smile:

(On that note, is there a generic way to express “this type has a bytes: RawSpan or a span: Span<Element> property”? Perhaps that's the protocol we need here?)

What happens if you conform a noncopyable type to this protocol and backwards deploy your program to an older Apple OS? It seems like that might result in weird behavior, and we cannot prevent that statically because we don’t yet have a way to express an availability requirement that says “this protocol has a Self: Copyable associated requirement before OS version N”.

Because some existing APIs are already using it, and it's easier to adopt Span if those APIs can be made to work with it in an ABI- and source-compatible way vs. introducing new ones. Maybe that's enough reason to sink it down into the standard library even in its somewhat-regrettable form.

We don't have a good way to do that yet. It's the API I would want a standard library protocol to provide.

The only problematic scenario I know of is if you have an existing API in the system that uses ContiguousBytes and makes a copy of values of that type, then you generalize it with @abi:

@abi(func encrypt<Bytes: ContiguousBytes>(_ bytes: Bytes) -> [UInt8])
func encrypt<Bytes: ContiguousBytes>(_ bytes: Bytes) -> [UInt8] 
    where Bytes: ~Copyable, Bytes: ~Escapable {
 ... 
}

The safer thing to do, as an owner of an ABI-stable library like this, would be to keep the Copyable requirement but still generalize to Bytes: ~Escapable. You'll lose out on passing a MutableSpan directly into the API (its span is okay), but it prevents any possibility of runtime issues. I'll update the discussion to capture this, thanks!

Doug

I wanted to raise exactly this question, so thanks for addressing this already. I was under the impression that RawSpan is our new currency type going forward, and APIs should start accepting that. It feels backwards if we expect new APIs to accept ContiguousBytes instead of RawSpan. However, I understand the motivation to let existing users of this protocol re-spell their implementation on top of RawSpan rather than on top of unsafe pointers. Could we create a new API design guideline that recommends the usage of RawSpan and Span<Element> over using unsafe pointers or this protocol?

I have been working on a lot of asynchronous code recently that leverages spans, and one limitation of a lot of our new APIs is that they only provide the span in a synchronous closure, e.g. the new APIs that provide an OutputSpan on InlineArray or Array. This makes it impossible to compose with asynchronous code. Do we want to add an async variant for the proposed RawSpan API here as well?

Why should we limit this to only Embedded? The thrown error is rethrown from the user-provided closure. We have already updated many APIs throughout the stdlib and ecosystem to use typed throws in these cases, since it makes composability with typed throws across the ecosystem easier and there is no downside such as locking in a concrete error type. If such a low-level protocol starts to provide dialects based on Embedded/non-Embedded, it is going to grow the #if conditions throughout the ecosystem.

This is a good idea. For the moment, I've added some documentation to ContiguousBytes into the PR for this change that nudges folks toward RawSpan. We'll need something more comprehensive for the API design guidelines.

The source-compatibility story for adding an async variant here isn't good, because you'd have to build it on top of the synchronous withUnsafeBytes.

We don't know how to make this change to an existing protocol in a manner that maintains both source and binary compatibility. Embedded needs this change because it doesn't support untyped throws, and it's possible in Embedded because there's no ABI.

The right answer here is to move to using withBytes, which is typed-throws everywhere and avoids this problem, while being safer.

Doug

It's tangential to this pitch, but I think the language needs a way to express "this protocol requirement is deprecated, use this one instead" that, at compile time, also automatically provides one or the other default implementations. Swift Testing has a similar problem that came up recently with its Attachable protocol taking an UnsafeRawBufferPointer instead of a RawSpan, but it's non-trivial to update the relevant protocol without either a) breaking existing adopters or b) having withUnsafeBytes() and withBytes() infinitely recurse through each other.

So I'm wondering if, for a pitch like this to succeed, we need a migration mechanism that isn't also a footgun.

Edit: None of my comments in this thread should be taken to mean we shouldn't move folks to RawSpan, of course. :smiley:

So what's been special-cased for Hashable, but more general?

You mean this? Yeah, I'd like to be able to express that here too.