[Pitch] RigidArray and UniqueArray

Hello Evolution,

We want to finally propose adding RigidArray and UniqueArray to the standard library after having some experience with them in the swift-collections package.

Most of the API being proposed is NOT present in this pitch due to body size, but you can read the full proposal here: Propose RigidArray and UniqueArray by Azoy · Pull Request #3202 · swiftlang/swift-evolution · GitHub

RigidArray and UniqueArray

Introduction

We propose to introduce two new array types to the standard library,
RigidArray and UniqueArray, that are both capable of storing noncopyable
elements.

Motivation

Swift 5.9 introduced noncopyable struct and enums into the language and we've
steadily been adding more new API that helps support these types. The standard
library has even implemented noncopyable API like Atomic and Mutex as well
as InlineArray that can store values of noncopyable types in inline storage.
One of the areas that's still a sore spot however, is that there are no data
structures in the standard library itself that support heap allocating a list
of noncopyable values.

Most users reach for Array when it comes to needing to store a list of values
somewhere that can be readily accessed. Array unfortunately does not support
noncopyable values and its default copy on write (:cow_face:) behavior is quite
problematic for these kinds of types as well:

struct File {
  let fd: UInt32

  init(_ path: String) { ... }

  deinit {
    close(fd)
  }
}

let file1 = File("file1.txt")
let file2 = File("file2.md")

var a = [file1, file2]

let file3 = File("file3.swift")

// Ok, 'a' is still uniquely referenced so performing this mutation
// doesn't need to copy on write.
a.append(file3)

var b = a

let file4 = File("file4.tar.gz")

// Error!
b.append(file4)

Appending to b triggers a copy on write on the underlying array buffer, but
we're holding noncopyable values! This particular example is problematic because
if we blindly copied the values within the array, then when we go to destroy b
we would call close on file1, file2, and file3. Working with these files
on a would all be closed!

Storing noncopyable elements in an array means that sharing the array cannot
mutate the elements unless we can dynamically ensure we have exclusive access to
the array buffer and we can unique the array buffer to guarantee exclusiveness.
Array can dynamically guarantee it has exclusive access, but it doesn't know
how to actually unique the buffer if there are noncopyable elements. There are
no hooks during something like an .append call on an array to inform it how to
potentially copy or clone a noncopyable element.

Proposed solution

The standard library proposes to add two new array types RigidArray and
UniqueArray.

RigidArray is a noncopyable, heap allocated, fixed capacity array type, while
UniqueArray is its dynamically resizing variant. UniqueArray provides the
ease-of-use benefits of an automatically self-resizing container type, but that
inherently comes with the cost of those implicit reallocations -- making it far
more difficult to reliably reason about how much storage the data structure will
allocate during its use, and precisely when those allocations may happen. In
contrast, RigidArray requires its storage to be carefully (and explicitly) sized
in advance; this makes it far more difficult to use, but in exchange its
operations have much tighter time and space complexity guarantees, making it
better suited for low-latency or memory-constrained use cases.

In a nutshell, UniqueArray gives us a slightly reimagined, ownership-aware
version of the familiar Array type with uniquely held storage. RigidArray
turns that into a fixed-capacity construct that feels quite inflexible (or
rigid), but its rigidity makes it far more suited for realtime or embedded use.

var a = UniqueArray<File>()
a.append(file1)
a.append(file2)

var b = a

// Ok
b.append(file3)

// Error: 'a' used after consume
a.append(file4)

var b = a is now a move rather than it copying the Array class reference.
This means that a is no longer valid to use. Statically we guarantee that
UniqueArray's are only uniquely held. Note that a doesn't perform
deinitialization anymore, it has transferred ownership to b and once b goes
out of scope we deinitialize the elements.

RigidArray shares all of those semantics, but it has a very strict capacity
limit that will cause it to fatal error the process if one tries to over append
to it:

var a = RigidArray<Int>(capacity: 2)
a.append(1)
a.append(2)

// Runtime error: out of capacity
a.append(3)

This is an extremely important detail. Performance critical contexts that
cannot sacrifice an implicit allocation from underneath their feet would greatly
prefer to use RigidArray where it's impossible to do so. Solutions that try to
carefully use UniqueArray without implicitly allocating will almost certainly
be met with bugs. RigidArray provides this guarantee at the type level. These
semantics come with concrete time and space complexity guarantees that aren't
present with UniqueArray.

Detailed design

RigidArray

/// A fixed capacity, heap allocated, noncopyable array of potentially
/// noncopyable elements.
///
/// `RigidArray` instances are created with a specific maximum capacity. Elements
/// can be added to the array up to that capacity, but no more: trying to add an
/// item to a full array results in a runtime trap.
///
///      var items = RigidArray<Int>(capacity: 2)
///      items.append(1)
///      items.append(2)
///      items.append(3) // Runtime error: RigidArray capacity overflow
///
/// Rigid arrays provide convenience properties to help verify that it has
/// enough available capacity: `isFull` and `freeCapacity`.
///
///     guard items.freeCapacity >= 4 else { throw CapacityOverflow() }
///     items.append(copying: newItems)
///
/// It is possible to extend or shrink the capacity of a rigid array instance,
/// but this needs to be done explicitly, with operations dedicated to this
/// purpose (such as ``reserveCapacity`` and ``reallocate(capacity:)``).
/// The array never resizes itself automatically.
///
/// It therefore requires careful manual analysis or up front runtime capacity
/// checks to prevent the array from overflowing its storage. This makes
/// this type more difficult to use than a dynamic array. However, it allows
/// this construct to provide predictably stable performance.
///
/// This trading of usability in favor of stable performance limits `RigidArray`
/// to the most resource-constrained of use cases, such as space-constrained
/// environments that require carefully accounting of every heap allocation, or
/// time-constrained applications that cannot accommodate unexpected latency
/// spikes due to a reallocation getting triggered at an inopportune moment.
///
/// For use cases outside of these narrow domains, we generally recommmend
/// the use of ``UniqueArray`` rather than `RigidArray`. (For copyable elements,
/// the standard `Array` is an even more convenient choice.)
public struct RigidArray<Element: ~Copyable>: ~Copyable {}

extension RigidArray: Sendable where Element: Sendable & ~Copyable {}

UniqueArray

/// A dynamically self-resizing, heap allocated, noncopyable array of
/// potentially noncopyable elements.
///
/// `UniqueArray` instances automatically resize their underlying storage as
/// needed to accommodate newly inserted items, using a geometric growth curve.
/// This frees code using `UniqueArray` from having to allocate enough
/// capacity in advance; on the other hand, it makes it difficult to tell
/// when and where such reallocations may happen.
///
/// For example, appending an element to a dynamic array has highly variable
/// complexity; often, it runs at a constant cost, but if the operation has to
/// resize storage, then the cost of an individual append suddenly becomes
/// proportional to the size of the whole array.
///
/// The geometric growth curve allows the cost of such latency spikes to
/// get amortized across repeated invocations, bringing the average cost back
/// to O(1); but they make this construct less suitable for use cases that
/// expect predictable, consistent performance on every operation.
///
/// Implicit growth also makes it more difficult to predict/analyze the amount
/// of memory an algorithm would need. Developers targeting environments with
/// stringent limits on heap allocations may prefer to avoid using dynamically
/// resizing array types as a matter of policy. The type `RigidArray` provides
/// a fixed-capacity array variant that caters specifically for these use cases,
/// trading ease-of-use for more consistent/predictable execution.
public struct UniqueArray<Element: ~Copyable>: ~Copyable {}

extension UniqueArray: Sendable where Element: Sendable & ~Copyable {}

Source compatibility

RigidArray and UniqueArray are new types within the standard library, so
source should still be compatible. For developers using these types from
swift-collections, those versions
of these types will still be preferred by the compiler due to the shadowing
rule for standard library type names.

ABI compatibility

The API introduced in this proposal are purely additive to the standard library's
ABI; thus existing ABI is compatible.

Implications on adoption

RigidArray and UniqueArray are new types within the standard library, so
adopters must use at least the version of Swift that introduced these types.
For developers using these types from
swift-collections, it may make
sense to continue using those versions of these types due to the backward
deployment nature of having the source from a package.

Future directions

Clonable

This proposal, like the UniqueBox
proposal, introduces clone() (and clone(capacity:)) for both RigidArray
and UniqueArray. Clone is an explicit deep copy returning an owned array
instance for the caller. Note that this API currently requires
Element: Copyable, but this disallows nested 2d arrays
UniqueArray<UniqueArray<Int>> for example. As mentioned in the UniqueBox
proposal, there is a hidden protocol here Cloneable that would enable such
functionality:

public protocol Cloneable: ~Copyable {
  func clone() -> Self
}

Rigid and Unique variants of other standard data structures

The swift-collections package
defines other flavors of Set and Dictionary that correlate to the Rigid
and Unique semantics defined for the proposed array types here. There's also
Deque variants that would be a potentially welcome change to the standard
library as well.

  • RigidDeque and UniqueDeque

  • RigidSet and UniqueSet

  • RigidDictionary and UniqueDictionary

Container protocols

While this proposal does add the BorrowingSequence conformance for both of
the proposed array types, we still aren't ready to propose any container
protocols on top of it. swift-collections
is currently exploring designs for such a protocol here: https://github.com/apple/swift-collections/blob/main/Sources/ContainersPreview/Protocols/Container.swift

Literal initialization

The proposed array types are explicitly not ExpressibleByArrayLiteral
regardless of if the element is copyable or not. It feels strange that it would
work for some cases and not in others. We would need to overhaul the expressible
protocol for noncopyable elements since that protocol traffics in an Array
which can never support such elements. We could also do a macro based solution
that desugars to insertions.

Alternatives considered

Allocator arguments

Since we're proposing new array types, we have the unique (pun intended)
opportunity to allow these data structures to be allocated with custom allocators.

public struct UniqueArray<Element: ~Copyable, Alloc: Allocator>: ~Copyable {}

Adding an allocator generic argument is very reminiscent of how it works with
std::vector in C++ and Vec in Rust.

This approach would require an Allocator protocol that custom allocators could
conform to and provide some SystemAllocator that comes by default in the
standard library (similar to SystemRandomNumberGenerator):

protocol Allocator {
  func allocate<T>(_: T.Type) -> UnsafePointer<T>

  func deallocate<T>(_: UnsafePointer<T>)

  ...
}

However, this makes working with these types a little more awkward:

func foo(with x: borrowing Unique<Int>)

error: generic type 'UniqueArray' specialized with too few type parameters (got 1, but expected 2)
1 | protocol Allocator {}
2 | 
3 | struct UniqueArray<Element: ~Copyable, Alloc: Allocator>: ~Copyable {
  |        `- note: generic struct 'UniqueArray' declared here
4 | 
5 | }
6 | 
7 | func foo(with x: borrowing UniqueArray<Int>) {}
  |                            `- error: generic type 'UniqueArray' specialized with too few type parameters (got 1, but expected 2)
8 | 

You could alleviate this with UniqueArray<Int, some Allocator> (or an explicit
generic parameter), but you’ve made working with this type much harder than it
needs to be. C++ and Rust solve this particular issue with default values for
generic parameters. So in our original Unique definition we could have:

public struct UniqueArray<Element: ~Copyable, Alloc: Allocator = SystemAllocator>: ~Copyable

Which helps working with this type significantly. Again, this is reliant on
language features that unfortunately do not exist at the time.

All that said, some folks are very hesitant to add default generic parameters
for a few reasons:

  • Forgetting to be generic over the allocator could lead to situations where you
    provide API for only the system allocator (which isn’t that bad!). ABI stable
    libraries wouldn’t be able to modify this function definition unless they
    deprecate the old symbol and add a new one (or used @export(implementation) to
    begin with).
  • Default values for generic parameters could lead to a worse developer
    experience especially when debugging stack traces. C++ is pretty infamous for
    having ridiculously long specializations that while in source are easy to grok,
    its output in a stack trace is less so.
  • Being generic over an Allocator specifically means you now need to care about
    the copyability of the allocator you’re storing. For SystemAllocator, it would
    just be a zero sized type that is a wrapper over UnsafeMutablePointer.allocate
    and deallocate, so we can easily copy it. However, in Rust especially there are
    many places where API is only available when the allocator is clonable which
    makes writing the most generic API possible a little more difficult. While the
    folks on the standard library are happy to deal with these challenges, we can’t
    guarantee that the community at a whole will. It’s very possible folks extending
    UniqueArray (or other potential future collections) won’t deal with it and
    just provide the API where Alloc = SystemAllocator which like I mentioned
    earlier, is a great default.
11 Likes

My very quick feedback is that these tools are great and the names are confusing. When I saw the proposal title, I immediately assumed that UniqueArray was an array whose values are equatable and unique under equality, and I imagine that confusion would be widespread and perpetual.

They’re verbose, but I’d prefer names that are less clever and more transparent, e.g. NoncopyableArray and FixedNoncopyableArray.

15 Likes

I personally would prefer to see the Allocator generic parameter with the eventual default the parameter.

I feel there’s no sense in forcing people to rewrite standard library components or migrate to yet another more generic type again when we could do it just once.

1 Like

Since this has been mentioned here and in the UniqueBox thread, folks should understand that adding "allocator" support is a much larger design space than just slapping a new parameter onto a type's generic signature or function signature. There are multiple reasons:

  • The allocator is viral when you want to write generic algorithms. Even if you can eventually default it at the usage site, you'd probably need to declare it explicitly on any function that wants to generically operate on these collections in order to support non-default allocators.
  • Using any custom allocator makes a safe-by-design type now unsafe, because the allocator can do literally anything it wants under the hood that its author wishes.
  • Generic arguments aren't the solution to all such problems. There's no way to provide a custom allocator for class instances either, and that's something that would need language support, not just a change to a generic signature.

Custom allocators are something we should look at holistically in the language, not rush to add them on hand-picked data structures.

8 Likes

Taking inspiration from Jon’s chart-based naming approach in the UniqueBox discussion, I think that RigidArray warrants a “Unique” in its name to indicate that it is noncopyable. Unless it’s on the table to make RigidArray conditionally copyable?

2 Likes

Sorry if this was brought up before, but it seems like a poor fit for these collections to be in the standard library as proposed, because I’d expect the existing Array and InlineArray to be extended to support non-copyable elements. I understand that there is a lot of compatibility issues to be ironed out, but if that is even remotely possible, I think it would be a bad idea to pollute the Swift module namespace with types that only exist as a stopgap. Third-party packages like swift-collections seem like a perfect place for these kind of types. The swift-atomics package comes to mind as a place to keep essential functionality until a better version of it lands in the standard library.

1 Like

Array's copy on write nature is fundamentally incompatible with non-copyable elements like I laid out in the motivation. There is no extending Array. InlineArray already supports non-copyable elements, but it solves a different use case than what's being proposed.

5 Likes

I understand that, but the copy-on-write behavior is an implementation detail. As far as I know, officially, Array only advertises value semantics. Since non-copyable types fundamentally conflict with CoW, it doesn't seem inconceivable that Array with non-copyable elements would have a different underlying storage behavior.

6 Likes

Agreed with the confusing names. Along with other proposals we now have so many prefixes or suffixes to be applied to original types. We do need them of course, but I feel they are not descriptive enough. Especially so for “Rigid”.

2 Likes

I agree that this proposal needs an explanation of why Array can't be extended to noncopyables rather than adding UniqueArray as a new type; it seems at first glance that the vast majority of this API is the same as Array, and that making Array conditionally-copyable would be sufficient to ensure that it doesn't CoW when it has noncopyable elements (the storage will effectively always be known to be uniquely referenced). NSArray bridging is inapplicable because ObjC types are always Copyable. Conversion to Any is inapplicable because Any is assumed to be Copyable... quite happy to believe there is a good reason, but I can't see it myself.

(the pitch attempts to explain this with a code example which does var b = a, which, if it's even legal syntax when a is noncopyable, should end a's lifetime, thereby returning the array's storage to being uniquely referenced)

5 Likes

As with UniqueBox, we should avoid making promises that values are heap-allocated when the compiler is at liberty to stack-promote them.

3 Likes

Agreed.

@Alejandro, could you help us understand why these UniqueX types should be in the standard library instead of only somewhere like swift-collections?

Source compatibility

RigidArray and UniqueArray are new types within the standard library, so
source should still be compatible. For developers using these types from
swift-collections, those versions
of these types will still be preferred by the compiler due to the shadowing
rule for standard library type names.

Have we brought data structures from Collections to Standard Library before? This might be the first time this has happened… correct?

What opinions are you planning to take on backwards compatibility with projects built from swift-collections that then migrate to Swift.UniqueArray? If an engineer migrates Collections.UniqueArray to Swift.UniqueArray is this pitch proposal planning to make any strong statements about how that migration should be supported? Would that cross-compatibility be an explicit goal of the proposal?

Swift.Array documents its CoW in public:

Good point, but making Array support non-copyable Element with non-CoW storage doesn't invalidate this, since all existing use cases of Array would continue to behave exactly the same way.

3 Likes

I feel the same way, especially for UniqueSet and UniqueArray

A quick question here: when these types are eventually added to the standard library, will they be removed from the swift-collections package?

I hope not, because I still want to use them on older systems. But I'm also not sure if there will be collision issues if we have two copies.

UniqueArray is not only valuable for ~Copyable element types. It is not even primarily useful for them. When applied to copyable element types, it allows the user to avoid reference counting, uniqueness checks, and spooky-action-at-a-distance copies, all of which are absolutely vital to write low-level code with predictable performance.

Extending Array to support non-copyable types would be useful, but nowhere near as useful as an array type that is always noncopyable, even with copyable elements. Even if non-copyable elements didn’t exist, we would still want to have something like UniqueArray.

7 Likes

Yeah, I would say UniqueArray and Array<~Copyable> are differently useful. Array<~Copyable> would be particularly useful for generalizing APIs that want to work will all types while still providing copyability where possible, but some of its overheads cannot be avoided when working with noncopyable types. Others have noted that it would need to preserve a refcounting header (or at least space for one), but in generic ~Copyable contexts, code will also still need to conservatively perform uniqueness checks for mutations, in case the concrete type may be Copyable and the buffer has multiple references.

8 Likes

I like the names of the RigidArray and UniqueArray types.

I don't like the name of the freeCapacity property (because "free" is also a verb), but OutputSpan already has the same API.


Are both of the CustomStringConvertible and CustomDebugStringConvertible conformances necessary? Will the description and debugDescription have different results?


Could the malloc_good_size function be used:

  • by RigidArray and UniqueArray when resizing explicitly?

  • by UniqueArray when resizing implicitly?

(I think malloc_good_size is only available on Apple platforms.)

1 Like