Hello Evolution,
We want to finally propose adding RigidArray and UniqueArray to the standard library after having some experience with them in the swift-collections package.
Most of the API being proposed is NOT present in this pitch due to body size, but you can read the full proposal here: Propose RigidArray and UniqueArray by Azoy · Pull Request #3202 · swiftlang/swift-evolution · GitHub
RigidArray and UniqueArray
- Proposal: SE-NNNN
- Authors: Karoy Lorentey, Alejandro Alonso
- Review Manager: TBD
- Status: Awaiting review
- Implementation: swiftlang/swift#87521
- Review: (pitch)
Introduction
We propose to introduce two new array types to the standard library,
RigidArray and UniqueArray, that are both capable of storing noncopyable
elements.
Motivation
Swift 5.9 introduced noncopyable struct and enums into the language and we've
steadily been adding more new API that helps support these types. The standard
library has even implemented noncopyable API like Atomic and Mutex as well
as InlineArray that can store values of noncopyable types in inline storage.
One of the areas that's still a sore spot however, is that there are no data
structures in the standard library itself that support heap allocating a list
of noncopyable values.
Most users reach for Array when it comes to needing to store a list of values
somewhere that can be readily accessed. Array unfortunately does not support
noncopyable values and its default copy on write (
) behavior is quite
problematic for these kinds of types as well:
struct File {
let fd: UInt32
init(_ path: String) { ... }
deinit {
close(fd)
}
}
let file1 = File("file1.txt")
let file2 = File("file2.md")
var a = [file1, file2]
let file3 = File("file3.swift")
// Ok, 'a' is still uniquely referenced so performing this mutation
// doesn't need to copy on write.
a.append(file3)
var b = a
let file4 = File("file4.tar.gz")
// Error!
b.append(file4)
Appending to b triggers a copy on write on the underlying array buffer, but
we're holding noncopyable values! This particular example is problematic because
if we blindly copied the values within the array, then when we go to destroy b
we would call close on file1, file2, and file3. Working with these files
on a would all be closed!
Storing noncopyable elements in an array means that sharing the array cannot
mutate the elements unless we can dynamically ensure we have exclusive access to
the array buffer and we can unique the array buffer to guarantee exclusiveness.
Array can dynamically guarantee it has exclusive access, but it doesn't know
how to actually unique the buffer if there are noncopyable elements. There are
no hooks during something like an .append call on an array to inform it how to
potentially copy or clone a noncopyable element.
Proposed solution
The standard library proposes to add two new array types RigidArray and
UniqueArray.
RigidArray is a noncopyable, heap allocated, fixed capacity array type, while
UniqueArray is its dynamically resizing variant. UniqueArray provides the
ease-of-use benefits of an automatically self-resizing container type, but that
inherently comes with the cost of those implicit reallocations -- making it far
more difficult to reliably reason about how much storage the data structure will
allocate during its use, and precisely when those allocations may happen. In
contrast, RigidArray requires its storage to be carefully (and explicitly) sized
in advance; this makes it far more difficult to use, but in exchange its
operations have much tighter time and space complexity guarantees, making it
better suited for low-latency or memory-constrained use cases.
In a nutshell, UniqueArray gives us a slightly reimagined, ownership-aware
version of the familiar Array type with uniquely held storage. RigidArray
turns that into a fixed-capacity construct that feels quite inflexible (or
rigid), but its rigidity makes it far more suited for realtime or embedded use.
var a = UniqueArray<File>()
a.append(file1)
a.append(file2)
var b = a
// Ok
b.append(file3)
// Error: 'a' used after consume
a.append(file4)
var b = a is now a move rather than it copying the Array class reference.
This means that a is no longer valid to use. Statically we guarantee that
UniqueArray's are only uniquely held. Note that a doesn't perform
deinitialization anymore, it has transferred ownership to b and once b goes
out of scope we deinitialize the elements.
RigidArray shares all of those semantics, but it has a very strict capacity
limit that will cause it to fatal error the process if one tries to over append
to it:
var a = RigidArray<Int>(capacity: 2)
a.append(1)
a.append(2)
// Runtime error: out of capacity
a.append(3)
This is an extremely important detail. Performance critical contexts that
cannot sacrifice an implicit allocation from underneath their feet would greatly
prefer to use RigidArray where it's impossible to do so. Solutions that try to
carefully use UniqueArray without implicitly allocating will almost certainly
be met with bugs. RigidArray provides this guarantee at the type level. These
semantics come with concrete time and space complexity guarantees that aren't
present with UniqueArray.
Detailed design
RigidArray
/// A fixed capacity, heap allocated, noncopyable array of potentially
/// noncopyable elements.
///
/// `RigidArray` instances are created with a specific maximum capacity. Elements
/// can be added to the array up to that capacity, but no more: trying to add an
/// item to a full array results in a runtime trap.
///
/// var items = RigidArray<Int>(capacity: 2)
/// items.append(1)
/// items.append(2)
/// items.append(3) // Runtime error: RigidArray capacity overflow
///
/// Rigid arrays provide convenience properties to help verify that it has
/// enough available capacity: `isFull` and `freeCapacity`.
///
/// guard items.freeCapacity >= 4 else { throw CapacityOverflow() }
/// items.append(copying: newItems)
///
/// It is possible to extend or shrink the capacity of a rigid array instance,
/// but this needs to be done explicitly, with operations dedicated to this
/// purpose (such as ``reserveCapacity`` and ``reallocate(capacity:)``).
/// The array never resizes itself automatically.
///
/// It therefore requires careful manual analysis or up front runtime capacity
/// checks to prevent the array from overflowing its storage. This makes
/// this type more difficult to use than a dynamic array. However, it allows
/// this construct to provide predictably stable performance.
///
/// This trading of usability in favor of stable performance limits `RigidArray`
/// to the most resource-constrained of use cases, such as space-constrained
/// environments that require carefully accounting of every heap allocation, or
/// time-constrained applications that cannot accommodate unexpected latency
/// spikes due to a reallocation getting triggered at an inopportune moment.
///
/// For use cases outside of these narrow domains, we generally recommmend
/// the use of ``UniqueArray`` rather than `RigidArray`. (For copyable elements,
/// the standard `Array` is an even more convenient choice.)
public struct RigidArray<Element: ~Copyable>: ~Copyable {}
extension RigidArray: Sendable where Element: Sendable & ~Copyable {}
UniqueArray
/// A dynamically self-resizing, heap allocated, noncopyable array of
/// potentially noncopyable elements.
///
/// `UniqueArray` instances automatically resize their underlying storage as
/// needed to accommodate newly inserted items, using a geometric growth curve.
/// This frees code using `UniqueArray` from having to allocate enough
/// capacity in advance; on the other hand, it makes it difficult to tell
/// when and where such reallocations may happen.
///
/// For example, appending an element to a dynamic array has highly variable
/// complexity; often, it runs at a constant cost, but if the operation has to
/// resize storage, then the cost of an individual append suddenly becomes
/// proportional to the size of the whole array.
///
/// The geometric growth curve allows the cost of such latency spikes to
/// get amortized across repeated invocations, bringing the average cost back
/// to O(1); but they make this construct less suitable for use cases that
/// expect predictable, consistent performance on every operation.
///
/// Implicit growth also makes it more difficult to predict/analyze the amount
/// of memory an algorithm would need. Developers targeting environments with
/// stringent limits on heap allocations may prefer to avoid using dynamically
/// resizing array types as a matter of policy. The type `RigidArray` provides
/// a fixed-capacity array variant that caters specifically for these use cases,
/// trading ease-of-use for more consistent/predictable execution.
public struct UniqueArray<Element: ~Copyable>: ~Copyable {}
extension UniqueArray: Sendable where Element: Sendable & ~Copyable {}
Source compatibility
RigidArray and UniqueArray are new types within the standard library, so
source should still be compatible. For developers using these types from
swift-collections, those versions
of these types will still be preferred by the compiler due to the shadowing
rule for standard library type names.
ABI compatibility
The API introduced in this proposal are purely additive to the standard library's
ABI; thus existing ABI is compatible.
Implications on adoption
RigidArray and UniqueArray are new types within the standard library, so
adopters must use at least the version of Swift that introduced these types.
For developers using these types from
swift-collections, it may make
sense to continue using those versions of these types due to the backward
deployment nature of having the source from a package.
Future directions
Clonable
This proposal, like the UniqueBox
proposal, introduces clone() (and clone(capacity:)) for both RigidArray
and UniqueArray. Clone is an explicit deep copy returning an owned array
instance for the caller. Note that this API currently requires
Element: Copyable, but this disallows nested 2d arrays
UniqueArray<UniqueArray<Int>> for example. As mentioned in the UniqueBox
proposal, there is a hidden protocol here Cloneable that would enable such
functionality:
public protocol Cloneable: ~Copyable {
func clone() -> Self
}
Rigid and Unique variants of other standard data structures
The swift-collections package
defines other flavors of Set and Dictionary that correlate to the Rigid
and Unique semantics defined for the proposed array types here. There's also
Deque variants that would be a potentially welcome change to the standard
library as well.
-
RigidDequeandUniqueDeque -
RigidSetandUniqueSet -
RigidDictionaryandUniqueDictionary
Container protocols
While this proposal does add the BorrowingSequence conformance for both of
the proposed array types, we still aren't ready to propose any container
protocols on top of it. swift-collections
is currently exploring designs for such a protocol here: https://github.com/apple/swift-collections/blob/main/Sources/ContainersPreview/Protocols/Container.swift
Literal initialization
The proposed array types are explicitly not ExpressibleByArrayLiteral
regardless of if the element is copyable or not. It feels strange that it would
work for some cases and not in others. We would need to overhaul the expressible
protocol for noncopyable elements since that protocol traffics in an Array
which can never support such elements. We could also do a macro based solution
that desugars to insertions.
Alternatives considered
Allocator arguments
Since we're proposing new array types, we have the unique (pun intended)
opportunity to allow these data structures to be allocated with custom allocators.
public struct UniqueArray<Element: ~Copyable, Alloc: Allocator>: ~Copyable {}
Adding an allocator generic argument is very reminiscent of how it works with
std::vector in C++ and Vec in Rust.
This approach would require an Allocator protocol that custom allocators could
conform to and provide some SystemAllocator that comes by default in the
standard library (similar to SystemRandomNumberGenerator):
protocol Allocator {
func allocate<T>(_: T.Type) -> UnsafePointer<T>
func deallocate<T>(_: UnsafePointer<T>)
...
}
However, this makes working with these types a little more awkward:
func foo(with x: borrowing Unique<Int>)
error: generic type 'UniqueArray' specialized with too few type parameters (got 1, but expected 2)
1 | protocol Allocator {}
2 |
3 | struct UniqueArray<Element: ~Copyable, Alloc: Allocator>: ~Copyable {
| `- note: generic struct 'UniqueArray' declared here
4 |
5 | }
6 |
7 | func foo(with x: borrowing UniqueArray<Int>) {}
| `- error: generic type 'UniqueArray' specialized with too few type parameters (got 1, but expected 2)
8 |
You could alleviate this with UniqueArray<Int, some Allocator> (or an explicit
generic parameter), but you’ve made working with this type much harder than it
needs to be. C++ and Rust solve this particular issue with default values for
generic parameters. So in our original Unique definition we could have:
public struct UniqueArray<Element: ~Copyable, Alloc: Allocator = SystemAllocator>: ~Copyable
Which helps working with this type significantly. Again, this is reliant on
language features that unfortunately do not exist at the time.
All that said, some folks are very hesitant to add default generic parameters
for a few reasons:
- Forgetting to be generic over the allocator could lead to situations where you
provide API for only the system allocator (which isn’t that bad!). ABI stable
libraries wouldn’t be able to modify this function definition unless they
deprecate the old symbol and add a new one (or used@export(implementation)to
begin with). - Default values for generic parameters could lead to a worse developer
experience especially when debugging stack traces. C++ is pretty infamous for
having ridiculously long specializations that while in source are easy to grok,
its output in a stack trace is less so. - Being generic over an
Allocatorspecifically means you now need to care about
the copyability of the allocator you’re storing. ForSystemAllocator, it would
just be a zero sized type that is a wrapper overUnsafeMutablePointer.allocate
and deallocate, so we can easily copy it. However, in Rust especially there are
many places where API is only available when the allocator is clonable which
makes writing the most generic API possible a little more difficult. While the
folks on the standard library are happy to deal with these challenges, we can’t
guarantee that the community at a whole will. It’s very possible folks extending
UniqueArray(or other potential future collections) won’t deal with it and
just provide the API whereAlloc = SystemAllocatorwhich like I mentioned
earlier, is a great default.