SE-0456: Add Span-providing Properties to Standard Library Types

lorentey · January 25, 2025, 1:22am

For the naming:

General naming patterns

We generally prefer to name properties primarily after their purpose. As a general rule, we do not simply repeat the name of the type, except for cases where the type is specifically created to back that particular property.

As an extreme example, consider the property Dictionary.keys. Its type is Dictionary.Keys, and that feels right: the only purpose for the Dictionary.Keys type is to be the result type for the keys property. Indeed, that property is the only way to produce dictionary key views. I think it would be fair to say that in this case the type name is in fact largely irrelevant, as we rarely (if ever) need to spell it out -- the type is legitimately named after the property.

At the other end of the spectrum, consider the description property of protocol CustomStringConvertible. The name tells us nothing about its result type; it focuses entirely on its purpose. Indeed, calling it string would be somewhat confusing, as it would tell us nothing about what sort of data we'd expect to find in the result. We would certainly cope if that property were named string, but I find description far better.
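A small Swift illustration of the two extremes, using only existing standard library API; the dictionary contents and variable names are made up for the example:

```swift
let scores = ["alice": 3, "bob": 5]

// Dictionary.keys: the result type exists solely to back this property,
// so it is named after the property. Spelling it out is possible but rarely needed.
let names: Dictionary<String, Int>.Keys = scores.keys

// CustomStringConvertible.description: named after its purpose, not after
// its general-purpose result type (String).
let summary: String = 42.description

print(names.sorted())  // ["alice", "bob"]
print(summary)         // 42
```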

Our Span types are somewhere in between these two extremes, but I think they're closer to String than they are to Dictionary.Keys.

In my view, Span is a universal interface type that safely provides direct access to regions of memory. The proposed properties are going to be just one of many ways to generate span instances.

The proposed properties will obviously be very handy for contiguous containers like Array or String; but not all containers are contiguous, and not all memory regions belong to a container.

I think it makes sense for the name to clarify what the returned span is supposed to be, as it will make code that uses it easier to understand.

Appeal to direct precedent

These new properties are intended to be the spiritual successor to the Sequence protocol requirement withContiguousStorageIfAvailable that was added in SE-0237, back in 2018. Note how that name does not include the name of the type it yields, either: it is providing direct access to contiguous storage -- the fact that it happens to be represented by an UnsafeBufferPointer is beside the point.

Notably, SE-0237 ended up breaking with the previous precedent of simply naming such methods after the types they provide. (For example, Array.withUnsafeBufferPointer and ManagedBuffer.withUnsafeMutablePointers both predate that proposal.)
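To make the naming contrast concrete, here is a minimal sketch that reaches the same array storage through both the older type-named API and the SE-0237 requirement. The helper sumViaContiguousStorage is hypothetical, introduced only for illustration. Note that both closures receive the same UnsafeBufferPointer<Int> type, even though only one of the method names mentions it:

```swift
// Hypothetical helper: sums a sequence of Ints through direct storage access
// if the sequence provides it, and returns nil otherwise.
func sumViaContiguousStorage<S: Sequence>(_ elements: S) -> Int? where S.Element == Int {
    // SE-0237 requirement: the closure receives an UnsafeBufferPointer<Int>,
    // which must not escape the closure.
    elements.withContiguousStorageIfAvailable { buffer in
        buffer.reduce(0, +)
    }
}

let numbers = [1, 2, 3, 4]

// Older Array-specific API, named after the pointer type it yields.
let viaBufferPointer = numbers.withUnsafeBufferPointer { buffer in
    buffer.reduce(0, +)
}

print(viaBufferPointer)                         // 10
print(sumViaContiguousStorage(numbers) as Any)  // Optional(10)
print(sumViaContiguousStorage(1...4) as Any)    // nil
```

A ClosedRange computes its elements rather than storing them, so it has no contiguous storage to expose and falls back to the default Sequence implementation, which returns nil.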

Remember that the initial versions of that proposal used the names withUnsafe[Mutable]BufferPointer (like the legacy Array or ManagedBuffer methods still do) -- the change to "contiguous storage" came as part of the Core Team's acceptance notes after the second review.

The notes do not provide a rationale for this change, and I do not remember our engineering discussions at the time. But my (fairly informed) hypothesis is that it came from the desire to clarify the purpose of these APIs. (Although I may be remembering my reaction after the change got announced, rather than the actual reasons that led to it.)

withContiguousStorageIfAvailable does not have much better "mouthfeel" than withUnsafeBufferPointerIfSupported -- both of these feel quite bureaucratic and verbose. But the accepted name is significantly better at explaining what the provided unsafe buffer pointer is supposed to be; I think it leaves far less room for confusion.

So it goes in our case. A property named span would be infuriatingly noncommittal about what its result is supposed to be -- calling it storage directly (and succinctly) documents that it is intended to expose direct access to the container's storage representation.

On the intuitive meaning of "storage"

The purpose of these properties is to expose direct access to the native storage representation behind these container types. We've made a significant effort to make sure they do exactly that. We aren't pussyfooting around this: the API signatures we're proposing intentionally prevent the returned Span from referring to memory that's allocated on the fly. (This is unlike most existing APIs, such as Sequence.withContiguousStorageIfAvailable, Array.withUnsafeBufferPointer or String.withCString, which are allowed to materialize a temporary buffer for the duration of the call.)

Obviously, this limits the number of types that can provide these properties; but in exchange, we vastly increase their value by providing predictable performance. Unlike with the old with* APIs (or coroutine-based alternatives), we want systems programmers to feel comfortable calling these new properties repeatedly in a loop. (Layering concerns often make it infeasible to avoid doing that.)

The proposed name directly reflects this. Beyond clearly communicating what data the returned Span is expected to hold, the word "storage" also strongly implies permanence -- if a memory region is only materialized for the duration of each access, then it shouldn't be called "storage"!

On clashes with internal names

I believe that good public API names should have clear precedence over internal/private names. We should not avoid choosing public names that clash with non-public names that can be trivially renamed. I believe authors will be able to swallow a one-time, minor inconvenience, in exchange for their clients getting better interface names as a result.

Not many container types can provide direct access to their storage representation as a single contiguous span. Even if we end up defining a protocol that requires this property, I do not expect many types would want to conform to it. But if the name storage feels like it interferes too much with existing implementation conventions, then by all means we can go with something else.

Naming alternatives

As I said above, I think the proposed name storage implies two separate things, both quite valuable:

It sets a clear expectation that the returned Span contains precisely the same elements (and in the same order) as the container on which the property is invoked.
It strongly hints that the Span directly accesses the actual, native storage representation, rather than something that's materialized on demand.

The "obvious" name span captures neither of these points. I think it's far too noncommittal to be a good choice -- it would not be the end of the world if we chose it, but it sets a mediocre precedent. We can and should aim higher.

@glessard's contents keeps the first point, but it mostly loses the second. It isn't bad, but it might be a bit too prominent -- I do not think we'll want to suggest that Swift programmers should look at this property as the preferred way to access the contents of containers. (I look at Span types as mostly under-the-hood plumbing -- we'd heavily rely on them, but typically they'd be hidden behind higher-level operations.)

How about something like storageSpan? It preserves both of the points above, it avoids clashing with anyone's internal names, and it is obscure enough not to attract too much unnecessary attention. (I often complain about unwieldy type names, but this is not that.)