SE-0456: Add Span-providing Properties to Standard Library Types

Douglas_Gregor · January 16, 2025, 6:15am

Hi everyone,

The review of SE-0456 "Add Span-providing Properties to Standard Library Types" begins now and runs through January 28, 2025.

In order to try this feature out, the proposal authors have provided toolchains:

Toolchain for macOS: https://ci.swift.org/job/swift-PR-toolchain-macos/1713/artifact/branch-main/swift-PR-78561-1713-osx.tar.gz
Toolchain for Linux (Ubuntu): https://download.swift.org/tmp/pull-request/78561/1217/ubuntu2004/PR-ubuntu2004.tar.gz
Toolchain for Windows: https://ci-external.swift.org/job/swift-PR-build-toolchain-windows/5655/artifact/*zip*/archive.zip

Reviews are an important part of the Swift evolution process. All review feedback should be either on this forum thread or, if you would like to keep your feedback private, directly to the review manager via the forum messaging feature. When contacting the review manager directly, please keep the proposal link at the top of the message.

What goes into a review?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the direction of Swift. When writing your review, here are some questions you might want to answer:

What is your evaluation of the proposal?
Is the problem being addressed significant enough to warrant a change to Swift?
Does this proposal fit well with the feel and direction of Swift?
If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

More information about the Swift evolution process is available at https://github.com/apple/swift-evolution/blob/main/process.md .

Thanks for contributing to Swift!

Doug Gregor
Review Manager

CrystDragon · January 16, 2025, 6:53am

Thanks.

I'm still in the process of reading, but I have a quick question: will the wording in this proposal align with future proposals?

I have read the pitch about lifetime dependency, and in that pitch there's a section on "dependency inference". Can I assume the behavior of "a computed property with ~Escaping type" from this proposal falls just into the category of "a method with no parameters" from that pitch?

If this is the case, I think it would help to use uniform terms. For example, instead of saying

establishes a borrowing lifetime relationship of the returned value on the callee's binding.

how about introducing the lifetime dependency terms first, and then just saying

the computed property becomes a borrowing one, and we infer a scoped dependency

glessard · January 16, 2025, 5:45pm

The pitch for lifetime dependency will be much revised and reworded before it comes to review. We are introducing as little new nomenclature as possible in this review. An updated lifetime dependency pitch will build on this proposal.

scanon · January 16, 2025, 6:50pm

+1, this is great.

I think that I would prefer to name these properties .span rather than .storage. I think in the long term this will be clearer at the point of use, but also I'm slightly worried about existing types that have a .storage property and which would like to provide a span property idiomatically. That's not an unsolvable problem, but using the name .span (which such types are much less likely to use today) will make migration and adoption easier.

David_Smith · January 16, 2025, 7:07pm

As a heavy user of withContiguousStorageIfAvailable and related methods inside the stdlib, I am of course extremely excited about both improved safety and eliminating pyramids of closures. Big +1

Karl · January 16, 2025, 7:28pm

I can give an example of this.

My WebURL type has a UTF8 view, and I would like it to expose its contiguous storage. However, in this case, "storage" refers to the underlying URL storage - which includes its code-units, but also includes other bookkeeping data.

extension WebURL {
  struct UTF8View {
    private var storage: URLStorage
  }
}

Even though I'm using "storage" for a private variable and the Span would be a public property, "storage" is still the best name for me to use internally, and it would be annoying if I had to rename it something less obvious.

My only other feedback is that while this property replaces withUnsafeBufferPointer, it does not replace withContiguousStorageIfAvailable because it is not available on Sequence -- therefore generic code still has no way to get a Span.

Correct me if I'm wrong, but I believe it is (conceptually) possible to implement something like the below on top of the existing withContiguousStorageIfAvailable:

extension Sequence {
  var spanIfAvailable: Span<Element>?
}

The only impediment is that the language doesn't let you yield from inside a closure:

extension Sequence {

  var spanIfAvailable: Span<Element>? {
    _read {
      withContiguousStorageIfAvailable { buffer in
        yield Span(buffer)
      }
    }
  }

}

I’m not sure that this is actually a problem, though. As I understand it, after the functions get split, we basically end up with the same sequence of events:

1. withContiguousStorageIfAvailable() entry
2. yield - use of the span by the caller
3. withContiguousStorageIfAvailable() exit

So we could actually allow yielding in this way. And even if (let's say) we couldn't work out how to expose that publicly in the language, the standard library could still do it internally to implement this.

glessard · January 16, 2025, 7:38pm

Karl:

Correct me if I'm wrong, but I believe it is (conceptually) possible to implement something like the below on top of the existing withContiguousStorageIfAvailable:
extension Sequence {
  var spanIfAvailable: Span<Element>?
}

No, this would in general be an escape. A type can vend a Span only if it can provide a stable address for a scope, and withContiguousStorageIfAvailable() does not do that. For example, NSArray instances bridged as Array currently allocate, copy, then deallocate on every call of wCSIA(). The changes that enable CollectionOfOne to vend a Span can't be retroactively applied, for example.
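To make the "allocate, copy, then deallocate" pattern concrete, here is a minimal sketch of a conformer whose contiguous storage exists only for the duration of the call. The `Squares` type is hypothetical and invented for illustration; it is not how NSArray bridging is implemented, only an example of what the semantics of the requirement allow.

```swift
// Hypothetical sequence with no stored contiguous buffer of its own.
struct Squares: Sequence {
  let count: Int

  func makeIterator() -> AnyIterator<Int> {
    var base = (0..<count).makeIterator()
    return AnyIterator { base.next().map { $0 * $0 } }
  }

  // Manifests temporary storage on entry and destroys it on exit.
  func withContiguousStorageIfAvailable<R>(
    _ body: (UnsafeBufferPointer<Int>) throws -> R
  ) rethrows -> R? {
    // Entry: allocate and copy.
    let temporary = UnsafeMutableBufferPointer<Int>.allocate(capacity: count)
    // Exit: the storage is gone as soon as this function returns or throws.
    defer { temporary.deallocate() }
    // Int is a trivial type, so plain assignment is enough to fill each slot.
    for i in 0..<count { temporary[i] = i * i }
    // The caller's work runs while the temporary storage is alive.
    return try body(UnsafeBufferPointer(temporary))
  }
}

// The buffer must not outlive the closure; only the computed result leaves it.
let sumOfSquares = Squares(count: 4).withContiguousStorageIfAvailable { buffer in
  buffer.reduce(0, +)
}
print(sumOfSquares as Any)
```

Anything that referred to the buffer after the call returned would point at deallocated memory, which is why a value derived from it cannot be handed back to the caller.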

Karl · January 16, 2025, 7:53pm

The idea is that it would be implemented as a _read, and would have all the same lifetime dependence annotations as the proposed properties.

wCSIA does - of course - give you a stable address within the closure scope, and I believe the following code snippets are equivalent:

var foo: String {
  _read {
    yield someStorage
  }
}

// Is the same as:

func withFoo(_ block: (borrowing String) -> Void) {
  block(someStorage)
}

So:

var spanIfAvailable: Span<Element>? {
  _read {
    withContiguousStorageIfAvailable { buffer in
      yield Span(buffer)
    }
  }
}

// Is the same as:

func withSpanIfAvailable<R>(_ block: (borrowing Span<Element>?) -> R) -> R {
  withContiguousStorageIfAvailable { buffer in
    block(Span(buffer))
  }
}

Right? Or am I missing something?

And the second (function) formulation is safe, so the first (_read) formulation is also safe.

scanon · January 16, 2025, 8:23pm

This is correct. Spans over generic collections will be built on new collection protocols ("Containers" is the term we use for talking informally about these), because the existing collection protocols cannot work with non[copyable,escapable] containers.

I expect that withContiguousStorageIfAvailable will be "replaced" in such future protocols with something that provides something like a sequence of contiguous spans. In the meantime, for using Span-consuming algorithms with existing collection/sequence conformers, the best bet is probably to call wCSIA and then convert to a span within that closure; this makes the lifetime scoping much clearer.
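As a rough sketch of the closure-scoped pattern for generic code, the example below calls wCSIA and does all of its work inside the closure, with a fallback for sequences that offer no contiguous storage. The helper `total(of:)` is a hypothetical name. The sketch works on the buffer directly so that it runs on existing toolchains; with a Span-enabled toolchain, the closure body is where the buffer would be converted to a Span and passed to a Span-consuming algorithm.

```swift
// Hypothetical helper, not part of the proposal: sums a sequence of Int,
// using contiguous storage when the sequence offers it.
func total<S: Sequence>(of elements: S) -> Int where S.Element == Int {
  // Fast path. The buffer is only valid inside this closure, so all work
  // on it (including any Span built from it) has to happen here.
  let fast = elements.withContiguousStorageIfAvailable { buffer -> Int in
    var sum = 0
    for value in buffer { sum += value }
    return sum
  }
  if let fast = fast { return fast }

  // Fallback: no contiguous storage was offered, so iterate normally.
  var sum = 0
  for value in elements { sum += value }
  return sum
}

print(total(of: [1, 2, 3]))                           // Array offers contiguous storage
print(total(of: stride(from: 1, through: 3, by: 1)))  // uses the fallback if none is offered
```

The closure boundary is what makes the lifetime visible at the call site: nothing derived from the buffer is returned except the computed sum.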

Karl · January 16, 2025, 8:43pm

We're suggesting many of the standard library types expose a single contiguous span, and lots of library types can do the same.

I don't see how the lifetime scoping would be any more difficult with a default-provided .spanIfAvailable than it would be with the proposed .storage properties - it's the same Span model of lifetime management, after all.

Unsafe APIs are unsafe in particular ways. wCSIA is unsafe in two ways:

1. It uses a closure scope, so it can't statically verify that references to the memory do not escape its live range.
2. It provides its data using an unsafe pointer, which does not include bounds checking.

It is possible for libraries to wrap the buffer pointer and implement their own bounds checking, solving #2 for themselves (this is what I did in my library), but everybody has lacked the language features to solve #1 until now.
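For illustration, a library-side bounds-checking wrapper of the kind described above might look roughly like this. `CheckedBuffer` and `element(at:in:)` are hypothetical names made up for this sketch; this is not the code from my library.

```swift
// Hypothetical wrapper: adds unconditional bounds checks to a buffer pointer.
struct CheckedBuffer<Element> {
  private let base: UnsafeBufferPointer<Element>

  init(_ base: UnsafeBufferPointer<Element>) { self.base = base }

  var count: Int { base.count }

  // Traps on an out-of-bounds index, in every build configuration.
  subscript(index: Int) -> Element {
    precondition(base.indices.contains(index), "index out of bounds")
    return base[index]
  }

  // Returns nil instead of trapping.
  subscript(checked index: Int) -> Element? {
    guard base.indices.contains(index) else { return nil }
    return base[index]
  }
}

// Hypothetical usage: the wrapper is created and used inside the closure only.
func element(at index: Int, in bytes: [UInt8]) -> UInt8? {
  bytes.withUnsafeBufferPointer { buffer in
    CheckedBuffer(buffer)[checked: index]
  }
}

print(element(at: 1, in: [10, 20, 30]) as Any)
print(element(at: 3, in: [10, 20, 30]) as Any)
```

Note that nothing stops a `CheckedBuffer` from being returned out of the closure and used after the memory is gone, so a wrapper like this addresses bounds checking only, not escaping.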

I really like it when I write software using some core primitives, and then later on, new software composes those primitives in new, interesting ways and improves my API for free. Here, we have an opportunity to provide all existing contiguous containers with an automatic uplift and expose a safer API to their users. Users in the future will be able to integrate Swift libraries that haven't been updated since 4.2 (or whenever), and still benefit from these safer APIs.

If we can do that, I think we should.

glessard · January 16, 2025, 9:30pm

But we can't. The semantics of withContiguousStorageIfAvailable() are that the storage can be manifested on entry and destroyed on exit. There's no way to maintain it after returning. If making it work requires a change in the code, then we might as well have a new requirement. The future container protocol will define that new requirement.

Karl · January 16, 2025, 9:36pm

Right - with coroutines, you don't return - you yield before the return.

The caller's work happens at the point of the yield (as if they'd passed in a closure containing that work), and the code after the yield (i.e. the return from wCSIA) happens when the caller is done.

I believe the compiler implements this by function-splitting, as it does for async functions, except for coroutines there are no suspensions or Task scheduling. It works out as if you called these "partial functions" in a row:

1. withContiguousStorageIfAvailable() entry
2. yield - use of the span by the caller
3. withContiguousStorageIfAvailable() exit

glessard · January 16, 2025, 9:38pm

withContiguousStorageIfAvailable() is a function, not a coroutine. Source-stability and ABI-stability mean that it can't be retrofitted to become a coroutine. Again, if a change requires a major new addition, we might as well make the new addition the correct one.

Karl · January 16, 2025, 9:45pm

Yeah, what I'm saying is that I think it can be done in a compatible way - again, the yield is basically just executing the caller's "partial function", which is the same pattern that wCSIA implements anyway.

To put it another way: we write the accessor as a coroutine, but I think that after function splitting, it basically becomes the withXYZ { ... } pattern again internally anyway. So it should be possible to wrap a withXYZ { ... }-style function and expose it via a coroutine (which allows us to not return).

I hesitate to discuss compiler internals because I'm not hugely familiar with them, but my understanding is that's how it works. @John_McCall would know.

scanon · January 16, 2025, 10:12pm

If you immediately convert the UBP to a Span, and write your closure in terms of that, then it absolutely does enforce both of these. This is not the long-term solution, but it's very much a solution that we can live with until we are able to define new protocols with the right semantics.

In particular, nontrivial usage (i.e. anything more than immediately passing it as a single function argument) of a spanIfAvailable read accessor requires a closure anyway to get the right lifetime, so it's not buying you any useful lifetime beyond what wCSIA already provides. We do want to provide API that generalizes wCSIA in terms of Span, but we want that API to have useful lifetime guarantees that do not require wrapping everything in nested closures.

Karl · January 16, 2025, 10:30pm

Would the proposed .storage properties not have the same limitations?

scanon · January 16, 2025, 10:32pm

No, because they are not implemented as read accessors. (They are get accessors that do a little dance to fix-up the lifetime of the returned span. In the fullness of time--specifically when we get an accepted lifetime model and annotations and accessors that support the semantics we need--this will become more natural to write without these measures for non-stdlib types. But what we have today is just enough to make it work for these types now, in a way that we're confident we can support in the future.)

Alejandro · January 16, 2025, 10:40pm

The problem with _read is that the value it yields cannot have the same lifetime as the thing that vended it because you have the freedom of generating a value on demand and cleaning up afterwards once you enter the coroutine again. This generated value does not share the same lifetime as the sequence that generated it, for example. We could in theory have something like you mentioned with:

var spanIfAvailable: Span<Element>? {
  _read {
    withContiguousStorageIfAvailable { buffer in
      yield Span(buffer)
    }
  }
}

but the lifetime guarantees of that Span are not the ones we generally want because you wouldn't be able to do something like this:

func futureSeq<S: FutureSequence>(
  _ s: S
) -> Span<S.Element>? {
  s.spanIfAvailable
}

This would be an escape because outside of the function the compiler does not know that it needs to perform a cleanup of the _read coroutine (it may not even know that the value returned from this function is from a coroutine). The only way to allow this in any sense is to have the function itself be a coroutine that can yield that value out, but that's not currently a thing in Swift.

The proposed properties have a very different lifetime guarantee; the spans they provide have the same lifetime as the instance that vended them. An array that gives you a span guarantees that no mutation will occur and that the location of its contents won't change as long as a span of it has been vended. Unlike the previous example with coroutines, the following example would actually work:

func spanFromArray(
  _ a: [Int]
) -> Span<Int> {
  a.span
}

CharlesS · January 16, 2025, 11:05pm

The one wrinkle in this is that Foundation.Data can be non-contiguously stored, particularly in instances that come from DispatchIO. Can we also provide a property that can return a collection of Spans instead of a single one, to read through these objects more performantly?

itaiferber · January 16, 2025, 11:47pm

Just a heads-up that this isn't true as of "DataProtocol and new inline Data" by phausler (Pull Request #20225, swiftlang/swift): Data instances are always contiguous; discontiguous DispatchData buffers are copied on bridging if needed to make them contiguous. (NSData can still be discontiguous under the hood.)

(See also the Data conformance to ContiguousBytes)