SE-0447: Span: Safe Access to Contiguous Storage

Douglas_Gregor · September 18, 2024, 4:40am

Hi everyone,

The review of SE-0447 "Span: Safe Access to Contiguous Storage " begins now and runs through October 1, 2024.

In order to try this feature out, the proposal authors have provided toolchains:

Toolchain for macOS: https://ci.swift.org/job/swift-PR-toolchain-macos/1498/artifact/branch-main/swift-PR-76406-1498-osx.tar.gz
Toolchain for Linux (Ubuntu): https://download.swift.org/tmp/pull-request/76406/1039/ubuntu2004/PR-ubuntu2004.tar.gz
Toolchain for Windows: https://ci-external.swift.org/job/swift-PR-build-toolchain-windows/5477/artifact/zip/archive.zip

This proposal is a library counterpart and first adopter of SE-0446 "Non-escapable types". In the toolchains, withSpan() functions are available with the Array family of types (Array , ArraySlice , ContiguousArray ), on UTF8View (String.UTF8View , Substring.UF8View ), as well as on the UnsafeBufferPointer family. Where the element type is BitwiseCopyable , a withBytes() function is also available, vending a RawSpan . While these with(Raw)Span operations are not specifically proposed, they offer a way to try out Span, its API, and how it interacts with Swift code as a non-escapable type.

Reviews are an important part of the Swift evolution process. All review feedback should be either on this forum thread or, if you would like to keep your feedback private, directly to the review manager via the forum messaging feature. When contacting the review manager directly, please keep the proposal link at the top of the message.

What goes into a review?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the direction of Swift. When writing your review, here are some questions you might want to answer in your review:

What is your evaluation of the proposal?
Is the problem being addressed significant enough to warrant a change to Swift?
Does this proposal fit well with the feel and direction of Swift?
If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

More information about the Swift evolution process is available at https://github.com/apple/swift-evolution/blob/main/process.md .

Thanks for contributing to Swift!

Doug Gregor
Review Manager

bok1 · September 19, 2024, 1:46am

I haven't reviewed the proposal in great detail yet but any chance we could reconsider on the name? apple/swift-distributed-tracing makes prolific use of Span and withSpan as do many other implementations of tracing/telemetry packages. I see some alternatives were already considered, but this potential conflict wasn't mentioned.

glessard · September 19, 2024, 3:52am

While we were not aware of the name Span in OpenTelemetry, we do note that other languages use the term in exactly the same sense as is proposed here (a contiguous region of memory): C# and C++. Another major similar type is Rust's slice, which is obviously not an available name for the Swift standard library. While having a naming collision with a package-provided type is unfortunate, it's likely to happen in a large ecosystem. Fortunately it should be fairly easy to disambiguate in practice – note that swift-distributed-tracing builds without issues using the test toolchain.

ktoso · September 19, 2024, 5:17am

I should probably chime in on that as I maintain swift-distributed-tracing and remain in touch with otel specification etc.

Short version: I think it's a bit unfortunate but ultimately fine for stdlib to use the Span name. Tracing would be able to disambiguate with Tracing.Span. They're unlikely to be used in the exact same contexts so I don't worry too much about naming confusion.

~

Personally I thought BufferView was an excellent self-descriptive name, and the proposal consistently explains Span in terms of "it's like a buffer", so there's something to be said about it being a good name.

I don't think BufferView implies anything misleading though as the alternatives considered are arguing, it just seems to come down to personal preference at this point. Might be worth a short reconsideration, but if it's to be Span I think we can all live with that just fine.

For what it's worth the term Span in distributed tracing systems dates back to ~2010 from the original dapper paper (and then zipkin, open tracing, open telemetry etc), and it is well known terminology in any tracing system to date. Having that said, I think it's fine if stdlib wants to use the name to remain in-line with C++ I suppose.

The use of the Span type in tracing systems is usually not spelling out the type, one await withSpan("name") { span in } and they're rarely passed around anywhere, so I don't think there's too much confusion here.

A tracing span would never be confused with an array.withSpan {} for example I think. Perhaps it is worth to acknowledge this naming clash and that we think it's fine in the alternatives considered where naming is discussed.

As far as type disambiguation, Tracing.Span and Swift.Span are both fine, so I don't think we need to change anything in the tracing ecosystem if this is to be called Span.

Back to actual review though

I'm very excited about this type and the related lifetime annotations, this has been something we've (people who shuffle bytes around all day over network or files) been waiting for

Overall the proposed API looks good, and I'm happy to see unchecked subscripts included as well. APIs I'd have some input on are mostly in future directions category tbh, so I'll save that for the future.

From a quick skim of the APIs it seems NIO's ByteBuffer will be able to implement these with ease – perhaps worth calling it out as important adopter, along side Foundation.Data etc?

Looks great and I look forward to seeing the lifetime proposals come in

Alejandro · September 19, 2024, 6:20am

Also, keep in mind that the standard library is special and has the ability to shadow user defined types with the same name.

struct Int {
  func hello() {
    print("hello")
  }
}

let x = Int()
x.hello()

So existing modules who use the Span name will automatically keep using their own definition. You can reference the standard library's version with Swift.Span. Of course, if the standard library is exposing a public Span type and another module B is exposing a Span type, a client of both of these modules would have to disambiguate I think? (unless the shadow rules apply here and prefers non-stdlib definitions)

j-f1 · September 19, 2024, 1:30pm

How could someone access a non-copyable value from an offset into a RawSpan? It seems like the corresponding methods require Copyable types only, but a typed Span does not require the element type to be copyable.

glessard · September 19, 2024, 1:53pm

Loading from a pointer is inherently a copying operation, and accordingly it seems unsound to allow loading a non-copyable value from a raw pointer. Note that we did not generalize UnsafeRawPointer.load(as:) for non-copyable return values either. Constructing non-copyable values from an arbitrary buffer will therefore require an explicit parsing step.

As for creating a RawSpan from a Span: safe construction of a RawSpan will require the underlying element type to be BitwiseCopyable, which excludes Spans over non-copyable values.

simonomi · September 19, 2024, 2:39pm

Overall, I'm strongly in favor of this proposal. As someone who uses swift to shuffle bytes around all day over files, I can't wait to adopt it! :)

My one nitpick is with the boundsContain methods. The much more idiomatic way to do this in swift would be .indices.contains(), is there a reason we don't use that instead? I see that in the implementation, _indices is underscored, but in my opinion it should be publicly visible, even if Span doesn't conform to Collection. (A hypothetical non-escapable non-copyable collection protocol would still have that requirement, right?)

glessard · September 19, 2024, 3:09pm

We didn't propose indices in order to ensure we wouldn't clobber the future namespace for the Collection protocol. As suggested by the underscored property, we expect to propose it at a later time, as part of a generalized containers proposal. This being said, the contains() method you refer to only applies for single indices, not ranges of indices. (While there is a way to get Range<T>.contains(_: some Collection<T>) that comes from the Regex module, it is an efficiency mistake in need of fixing (see issue).)

In general, just like in Collection, I would expect the type returned by an indices property to not be always cheap to produce, and it tends to be cheaper for the Collection instance itself to do index validation directly, rather than delegating to its Indices type. The same logic will apply to generalized collections, leading to these public index-validation API.

simonomi · September 19, 2024, 3:45pm

Ahh, I didn't realize that Range<T>.contains(_: some Collection<T>) wasn't RangeExpression<T>.contains(_: some RangeExpression<T>) (or something roughly equivalent). IMO this API should exist, but I guess that's outside the scope of this proposal.

It makes sense that indices might not always be cheap to produce, but for Span in particular, it is, so I would still prefer for it to be used instead of boundsContain.

Also, since Span doesn't work with for-loops yet, it'd be nice to use

for i in mySpan.indices {
  calculation(mySpan[i])
}

instead of

for i in 0..<mySpan.count {
  calculation(mySpan[i])
}

orobio · September 21, 2024, 9:14pm

Overall, the proposal looks good to me, but I wonder if the unchecked subscripting should use ‘unsafe’, which is commonly used for unsafe APIs:

public subscript(unsafe position: Int) -> Element { _read }

glessard · September 23, 2024, 3:55pm

This is an interesting discussion to have. The problem I see with just using "unsafe" everywhere it that it doesn't describe the type of unsafety at hand. We often see taxonomies of unsafety separate it in 5 categories (e.g. this post,) namely "temporal", "spatial", "type", "initialization" and "thread." I would personally like to see what kind of safety is being side-stepped in a given unsafe call; that is why I am proposing the word "unchecked" for functions where bounds-checking is suppressed, as a marker for the kind of unsafety. The concurrency effort is also developing a vocabulary for specifying possible thread unsafety. We also have Array.init(unsafeUninitializedCapacity:initializingWith:), which specifies that the memory it lends out is uninitialized.

Could that be overly pedantic, though? I am interested in that discussion.

tbkka · September 23, 2024, 6:35pm

I suppose unsafeBounds might work here -- it allows you to grep for unsafe to find unsafe uses throughout a large code base while also clarifying that this is a bounds-checking unsafety.

tbkka · September 23, 2024, 6:38pm

As someone who has written a fair bit of network, compression, serialization, and other code that needs to work with chunks of bytes, I'm very excited about this. I'm even more excited about the future directions and look forward to being able to someday use this to write the kind of safe, performant code that Swift has simply been unable to express until now.

glessard · September 23, 2024, 6:43pm

The word "unchecked" is all over the standard library, though often in underscored position. It does have at least one public use in Range.init(uncheckedBounds:).

glessard · September 23, 2024, 7:02pm

I have another topic of discussion to bring up, regarding the pair of span comparison/containment functions in the proposal, isWithin(_: Self) -> Bool and indicesWithin(_: Self) -> Range<Int>?. I had assumed that computing the range of indices would be a significant cost above and beyond computing the yes/no answer. However, after some benchmarking it's clear it isn't very much more:

With a BitwiseCopyable element, computing the bounds takes about 5% more time than computing the bool.
With a non-BitwiseCopyable element, computing the bounds takes about 10% more time than computing the bool.

The difference between the BitwiseCopyable case and otherwise is due to the fact that a Span<some BitwiseCopyable> is allowed to have an unaligned base pointer, whereas it is required to be aligned in the general case. Therefore when computing the Bool with a non-BitwiseCopyable element, an integer division can be replaced by sums and multiplications. Nevertheless, the difference is quite small.

This information leads me to suggest removing the isWithin(_: Self) -> Bool function from the proposal, leaving only the indicesWithin(_: Self) -> Range<Int>? function to fulfill this purpose.

John_McCall · September 23, 2024, 7:40pm

I'm excited to see this moving forward. I agree with some of the other reviewers that I'm not sure we need to introduce a new term here instead of e.g. simply dropping the "Unsafe" from UnsafeBufferPointer.

Jumhyn · September 23, 2024, 7:53pm

I have a slight allergy to introducing 'pointer' to safe Swift as well. I think previous pitches of this idea mentioned BufferView or BorrowedBuffer as potential names, which I liked. I agree invoking 'buffer' for this concept makes sense to me.

hisekaldma · September 23, 2024, 8:07pm

Can’t this type simply be called Buffer? The View suffix seems as unnecessary as the Pointer one.

David_Smith · September 23, 2024, 8:31pm

I think what both View and Pointer are trying to convey is the non-owning-ness of it. "This is someone else's buffer". Under this ontology, plain Buffer would be a name you'd use for the result of malloc rather than the result of a method that lets you peek at an Array's storage.