SE-0527: RigidArray and UniqueArray

Hi all -

The review of SE-0527: RigidArray and UniqueArray begins now and runs through April 27, 2026.

Reviews are an important part of the Swift evolution process. All review feedback should be either on this forum thread or, if you would like to keep your feedback private, directly to the review manager via the forum messaging feature. When contacting the review manager directly, please keep the proposal link at the top of the message.

What goes into a review?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the direction of Swift. When writing your review, here are some questions you might want to answer in your review:

  • What is your evaluation of the proposal?
  • Is the problem being addressed significant enough to warrant a change to Swift?
  • Does this proposal fit well with the feel and direction of Swift?
  • If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
  • How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

More information about the Swift evolution process is available here.

Thank you,

Steve Canon
Review Manager


+1 from me. These solve real problems that I come across regularly in my performance-focused and embedded work.

I don’t love the reallocate wording on that particular function. It seems too “implementation-detail-y” and I don’t see a big motivation not to call it resize if that’s what it’s supposed to do. resize would be more specific and describes the user’s intention better IMO.


+1 from me.

I used the types in swift-dns/swift-idna (a high-performance, multi-platform implementation of Punycode and IDNA, Internationalized Domain Names in Applications) a few months ago and don't recall any particular problems; otherwise I would have filed issues for them in swift-collections.

They achieved exactly what they were supposed to achieve for my use case (non-copyable collection types, no ARC, no exclusivity checks, etc.; harder to use but generally more performant than Array). I also really liked the use of Span types everywhere instead of unsafe pointers, which makes a lot of the APIs much more elegant.

I have to say I don't like the use of "Unique", but I'll leave the name bikeshedding to other folks, as it gets too subjective. I've seen discussions about similar "Unique" usages elsewhere, and the discussion was longer than I could handle.


+1 on adding those types to the standard library. I have used these types extensively in recent projects, and they work great.

The only limitation I ran into with these two types, which also applies to any other type that offers OutputSpan-based initializers and methods, is that these OutputSpans are only available in synchronous closures. This sets them apart from the other span types, which can be used in asynchronous contexts. It prohibits code like this:

let array = UniqueArray<UInt8>(capacity: 64) { outputSpan in
    try await file.read(into: outputSpan) // error: Context isn't async
}

This problem is more general than just those two types, but this proposal introduces a significant amount of new APIs that are based on OutputSpans, so I would be interested in how we are going to solve this moving forward.

FileHandle example

This is a small nit, so I'm keeping it in a hidden section. Can we please stop using FileHandle/FileDescriptor examples in evolution proposals that use ~Copyable types and call close in deinit? We have already had multiple discussions on these forums about why calling close in a deinit is a bad example: close can throw and is potentially asynchronous. Continuing to use this as an example will just lead to developers copying the pattern around.


+1 from me.

It would be helpful to mention InlineArray and any other approved types to highlight the “Array” ecosystem when outlining the Rigid|Unique array features. Additionally, would it be appropriate to mark the unsafe buffer pointer methods as deprecated and recommend the Span versions? At a minimum, include that suggestion in their documentation. If a developer is using this new interface, it seems like a good opportunity to encourage them to avoid unsafe pointers.

We would not want to deprecate the buffer pointer API, because you really do need it sometimes, but yes, we can nudge people toward using Span.


A small comment, not a full review: the keepingCapacity argument of removeAll should default to true, or perhaps be omitted altogether, since there is another spelling of "remove all and discard the allocation"—assigning .init(). I realize this is a divergence from Array, but it's a convergence towards RigidArray, and I believe a better API design in general. (Rust's Vec::clear works the same way, and likewise "assign a new Vec over the old one" is the way to "remove all and discard the allocation".)
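To make the two spellings concrete, here is a rough sketch that uses today's Array as a stand-in. It assumes nothing about UniqueArray's final API, and the exact capacities are an implementation detail of the standard library.

```swift
// Stand-in sketch using Array; this is not the proposed UniqueArray API.
var scratch: [Int] = []
scratch.reserveCapacity(1024)
scratch.append(contentsOf: 0..<100)

// Spelling 1: drop the elements but keep the allocation for reuse.
scratch.removeAll(keepingCapacity: true)
let keptCapacity = scratch.capacity   // still at least the 1024 reserved above

// Spelling 2: drop the elements and the allocation by assigning a fresh value.
scratch = .init()
let freshCapacity = scratch.capacity  // the reserved storage is gone
```

With the second spelling available, a removeAll that always keeps capacity would lose no expressive power, which is the point of the suggestion.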


Very positive on the inclusion of this in the standard library. I've been using UniqueArray in a form similar to the swift-collections version in several projects, and it's great from a performance / usability standpoint.

I have three nontrivial concerns with the proposal.

Containers module isn't needed

I feel that noncopyable containers belong in the standard library alongside their corresponding copy-on-write collections. Copy-on-write is still the right default for most code, but the Unique variants should be right there, in the same core module, for folks who need them. Sequestering them in a separate Containers module makes them less discoverable and less likely to be used when folks should use them.

RigidArray isn't needed

Despite using UniqueArray in a whole bunch of places, I have yet to need RigidArray for anything. In my experience, I've needed either a truly fixed-size array or I've needed a resizable UniqueArray. A RigidArray that traps on an append that exceeds the capacity feels, to me, like more of a footgun than a data structure I would reach for. I think it's fine if it stays in swift-collections for those who need it, but it feels like putting it in the standard library (even in a standard Containers module) is going to lure folks into using it where they aren't thinking about preconditions.

I do wonder if the use cases that motivated RigidArray would be better served by a differently-named append variant that traps rather than reallocating. appendWithoutReallocation is very long, but it makes clear the extra precondition beyond what one would expect of just append.
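A minimal sketch of what such a variant could look like, written against Array as a stand-in because it already exposes count, capacity, and reserveCapacity. The method name comes from the paragraph above; everything else is an assumption, and on a copy-on-write Array the guarantee only holds while the buffer is uniquely referenced.

```swift
// Hypothetical spelling, sketched on Array as a stand-in for UniqueArray.
extension Array {
    /// Appends `element`, trapping instead of growing the allocation.
    mutating func appendWithoutReallocation(_ element: Element) {
        precondition(count < capacity, "append would exceed the reserved capacity")
        append(element)
    }
}

var samples: [UInt8] = []
samples.reserveCapacity(8)
let reserved = samples.capacity

for value in 0..<UInt8(8) {
    samples.appendWithoutReallocation(value)  // fits: count < capacity each time
}
// Further calls trap once count reaches capacity.
```

The call site states the no-growth requirement explicitly, so the precondition is visible where it matters rather than implied by the choice of container type.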

Sharing a representation with Array

This is covered in the alternatives considered, but I want to bring it up for more discussion.

The case for having UniqueArray and Array share their storage representation is that it allows O(1) conversion between the two. For example, a low-level API might produce a UniqueArray because that's the lowest-overhead solution. But a high-level library or user code consuming that UniqueArray wants to traffic in Array because it's easier to work with and commonly used throughout the rest of the program. If there is no shared representation, that requires copying the contents---an O(n) operation with a new heap allocation.

If there's a shared representation between the two, then you can have an Array initializer that consumes a UniqueArray and operates in O(1) time with no memory allocation, e.g.,

extension Array {
  init(consuming: consuming UniqueArray<Element>) { /* steal the guts of UniqueArray */ }
}

The other direction is possible when the array is uniquely referenced, so the API might look like this (and is also O(1)):

extension Array {
  mutating func consumeIfUnique() -> UniqueArray<Element>? {
    if isUniquelyReferenced { return /* steal the guts, then set self = [] */ }
    else { return nil }
  }
}
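The "steal the guts if unique" idea can be modelled outside the standard library. The sketch below is a toy: ToyStorage and ToyList are hypothetical names, the storage is an ordinary class rather than Array's real buffer, and it only illustrates how a uniqueness check gates a constant-time handoff.

```swift
// Toy model only: `ToyStorage` and `ToyList` are hypothetical, not stdlib types.
final class ToyStorage {
    var elements: [Int]
    init(_ elements: [Int]) { self.elements = elements }
}

struct ToyList {
    private var storage: ToyStorage
    init(_ elements: [Int]) { storage = ToyStorage(elements) }
    var elements: [Int] { storage.elements }

    /// Hands over the storage object without copying its elements if nobody
    /// else references it, leaving `self` empty; returns nil when shared.
    mutating func takeStorageIfUnique() -> ToyStorage? {
        guard isKnownUniquelyReferenced(&storage) else { return nil }
        let taken = storage
        storage = ToyStorage([])
        return taken
    }
}

var shared = ToyList([1, 2, 3])
let alias = shared                          // second reference to the same storage
let refused = shared.takeStorageIfUnique()  // nil: the storage is shared

var owned = ToyList([4, 5])
let stolen = owned.takeStorageIfUnique()    // non-nil: sole owner
```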

Now, the Array representation is larger than what UniqueArray would normally need: it has two pointers' worth of storage in it, one for the type metadata pointer (for the class that backs the Array) and one for the reference counts. Those would be untouched or set to constant values by UniqueArray, and filled in at the point where Array consumes the UniqueArray.

That's constant per-instance overhead of two extra pointers for UniqueArray to support this O(1) operation. I think that this kind of layering of libraries is going to become a lot more common in Swift, as lower-level libraries use more noncopyable types for performance/code size reasons while most code keeps on using the copy-on-write types for convenience.

Doug


I think it’s worth talking about the potential for using UniqueArray to optimize code that needs to work with Array. Array is a critical part of the Swift ecosystem, often used as a currency type for passing lists of values between components. As such, it has to balance a lot of competing demands, which ultimately comes out as performance overhead; that's why these alternative implementations are interesting. Some code will be able to benefit by simply trivially replacing a type: for example, storing a UniqueArray<T> instead of an Array<T>. But a lot of code will not be able to do this because it still has to interact with Array, and I think that code is still worth optimizing.

What I'm imagining would look like the addition of three APIs:

  • an API that mutates an Array to produce an optional UniqueArray by stealing the underlying buffer if it is uniquely referenced;
  • an API that consumes an Array to produce a UniqueArray by taking the underlying buffer if it is uniquely referenced and otherwise copying it; and
  • an API that consumes a UniqueArray to produce an Array by taking the underlying buffer.

All of these would rely on Array and UniqueArray sharing a common representation, which has repercussions that I'll discuss below.

These APIs could be used to optimize several common code patterns by avoiding an unnecessary deep copy when interacting with Array:

  • code that receives an Array value and then mutates it,
  • code that produces a collection in its own way but then specifically needs to pass it off as an Array, or
  • code that combines the two, such as a mutating algorithm on Array.

A lot of code does not have the choice to replace Array, either because it has a dependency or requirement that is out of its control or because the type is not a drop-in replacement for Array. As examples of the former, many SDK methods, especially in Apple's SDK, expect to be passed or return Arrays rather than some more abstract type. As examples of the latter, an algorithm on Array cannot simply be rewritten to work on MutableSpan unless it doesn't need to add or remove elements, and it cannot be rewritten to work on OutputSpan unless it can accurately determine the required array capacity up front.

The cost of adding these APIs would be committing to sharing a representation between UniqueArray and Array. This would not necessarily add reference-counting overhead to UniqueArray, because UniqueArray can still rely on an invariant that it holds a unique reference to its buffer. But it would require the array buffer to have the standard Swift object header, which has a two-word direct overhead as well as a global cost in required class metadata.[1] It would also require the size and capacity of the array to be stored in the array buffer as they are in Array, which slightly pessimizes reading the size, although this is a fairly small cost.

I think on balance that these are probably excessive costs for UniqueArray to bear just for the potential of optimizing interaction with Array. We want UniqueArray to have optimal performance, and that means not burdening it with extra compatibility requirements. Sharing a representation would also preclude having an unsafe initializer that constructs a UniqueArray with an unsafe buffer pointer, although I believe the authors are currently opposed to adding such an API because of its potential for introducing corruption at a distance.

But this would leave us without any reasonable tools for optimizing Array code; programmers who need to interact with Array will be left in an awkward position. So I think it's worth staking out a future direction of adding, say, an Array.UniqueBuffer which represents a unique reference to an Array buffer. That would have essentially the same API surface as UniqueArray, just accepting some small overheads in order to achieve toll-free interoperation with Array.

Edit: and now that I've written this, I see that Doug has made many of the same points above.


  1. Buffer objects must store a pointer to a class object that describes the class, which is generally unique per element type. In many cases, this can be allocated and initialized statically, but it must still be registered with the runtime dynamically on Apple platforms.


Hmm, is there a spare bit in UniqueArray that could be used as a flag to indicate whether its buffer shares the Array layout?

I would think that we could at least avoid eagerly populating the object header when a buffer is only ever used as a UniqueArray, leaving the metadata word set to null until forced when we transfer ownership of the buffer to a refcounted Array. That way code that only uses UniqueArray doesn't have to pay any costs other than the initial extra allocation.


Hmm, that's an interesting idea. We should be able to use the null value to track whether we've initialized the object header, which would side-step any concerns about re-initializing it. It would still impose a small execution-time overhead on UniqueArray in that we'd need to check for an initialized object header on the destruction path. Alternatively, we could tear the object header down when converting an Array into a UniqueArray if we're certain that there's never going to be something useful to keep alive there.

There's also the fact that memory allocators are "fuzzy", in that the actual size of an allocation is often slightly more than what was requested. So it's possible that the two extra pointers will often only use memory that would otherwise be unused. In contrast, the inline representation of UniqueArray is less likely to be in its own heap allocation, and more likely to be embedded within a larger data structure such as a struct or array, where the inline size matters more. So maybe the "thin" representation will actually slightly improve memory utilization overall.

Hypothetically, if the performance difference between the "thin" and "fat" representations ends up being negligible, then the main concern, I think, would be interoperability with other code. The "thin" representation would have better interoperability with code that expects a heap-allocated array without a header, perhaps low-level C code or Swift code that uses UnsafeBufferPointer. The "fat" representation would have better interoperability with Swift code that uses Array.

Larger allocations only really get hidden by allocation quantization when there's a non-uniform pattern that means the extra bytes fit within a quantum more often than would be expected from random chance alone. Otherwise, given an allocator with a quantum size of Q, increasing requested allocation size by B bytes requires an additional quantum at the rate B/Q, costing Q bytes whenever it does, giving an expected increase in memory use of B bytes per allocation.
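The expected-value argument above can be checked numerically. The sketch below assumes a simplified allocator that rounds every request up to a multiple of Q and request sizes spread evenly over whole quanta; the function names are illustrative, and real allocators use size classes that are not this uniform.

```swift
/// Rounds a requested size up to the allocator's quantum (a simplified model).
func allocatedSize(requested: Int, quantum: Int) -> Int {
    (requested + quantum - 1) / quantum * quantum
}

/// Average extra bytes actually allocated when every request grows by `extra`,
/// taken over request sizes 1...maxRequest with equal weight.
func averageGrowth(extra: Int, quantum: Int, maxRequest: Int) -> Double {
    var total = 0
    for size in 1...maxRequest {
        let before = allocatedSize(requested: size, quantum: quantum)
        let after = allocatedSize(requested: size + extra, quantum: quantum)
        total += after - before
    }
    return Double(total) / Double(maxRequest)
}

// Q = 64, B = 16 (two 8-byte words), sizes spread evenly over ten quanta.
let growth = averageGrowth(extra: 16, quantum: 64, maxRequest: 640)
```

In this model 16 of every 64 sizes spill into a new quantum and pay 64 bytes each, so the average growth comes out to the full 16 bytes: under a uniform size distribution, the rounding slack does not absorb the header for free.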


There are also some very real performance costs that arise from making the data storage of an array no longer cacheline-aligned; you frequently need to touch one more cacheline than you otherwise would, and some vectorized algorithms (either hand-vectorized or autovectorized) will end up spending more time in (generally) slower edge-handling code.
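The "one more cacheline" effect is simple arithmetic. This sketch assumes 64-byte cache lines, an allocation whose base address is line-aligned, and a 16-byte header standing in for the two extra pointers discussed earlier; all three numbers are assumptions chosen for illustration.

```swift
/// Number of cache lines overlapped by `length` bytes starting at byte `offset`
/// from a line-aligned allocation base. Assumes `length > 0`.
func cacheLinesTouched(offset: Int, length: Int, lineSize: Int = 64) -> Int {
    let firstLine = offset / lineSize
    let lastLine = (offset + length - 1) / lineSize
    return lastLine - firstLine + 1
}

// Eight 8-byte elements (64 bytes), with and without a 16-byte header in front.
let headerless = cacheLinesTouched(offset: 0, length: 64)
let afterHeader = cacheLinesTouched(offset: 16, length: 64)
```

Under these assumptions the header-free layout reads the elements from a single line, while the same elements placed after the header straddle two.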

If the goal for these types is “give me the lowest-overhead default array that doesn’t require me to think about this stuff,” that’s fairly compelling for me. If the goal is just “give me Array without uniqueness checks and CoW” we could probably accept the perf hit.
