You don’t need UniqueArray for the pattern you show in your example; you can do it today via OutputSpan.
Here follow some silly questions… So I can use UniqueArray<MyCopyableType> too, right (i.e. I can use it for types which are copyable)? It's just that UniqueArray does not require the Element to be Copyable, and I might wanna do so if I wanna avoid copies of the array, e.g. when I want full control of performance? And an instance of this UniqueArray is still movable (consumed?), and even if its elements are copyable, moving the array does just that: it does not copy each element in it, it just moves the whole array without any copies, correct?
(I’ve almost only coded Rust for the last 2 years, and I have not really tried “ownership” in Swift yet, just glanced at the various proposals over the last years.)
Yes, yes, and yes.
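To make the "moving does not copy the elements" point concrete, here is a minimal sketch. `ToyUniqueArray`, `takeOwnership` and `demoMove` are hypothetical names invented for illustration; this is not the real UniqueArray, just a tiny noncopyable wrapper over a heap buffer that exposes its buffer address so the move can be observed.

```swift
// Hypothetical toy type for illustration only; NOT the real UniqueArray.
struct ToyUniqueArray<Element>: ~Copyable {
    private let storage: UnsafeMutablePointer<Element>
    let count: Int

    init(_ elements: [Element]) {
        count = elements.count
        storage = .allocate(capacity: Swift.max(count, 1))
        // Copies the elements once, at construction time.
        storage.initialize(from: elements, count: count)
    }

    deinit {
        storage.deinitialize(count: count)
        storage.deallocate()
    }

    // Address of the heap buffer, exposed only so we can observe moves.
    var storageAddress: Int { Int(bitPattern: storage) }

    subscript(i: Int) -> Element {
        precondition(i >= 0 && i < count, "index out of bounds")
        return storage[i]
    }
}

// Takes ownership of its argument and hands it back: a pure move.
func takeOwnership(_ array: consuming ToyUniqueArray<String>) -> ToyUniqueArray<String> {
    return array
}

func demoMove() -> (before: Int, after: Int, first: String) {
    let original = ToyUniqueArray(["a", "b", "c"])
    let before = original.storageAddress
    let moved = takeOwnership(original)  // `original` is consumed here
    // Using `original` past this point would be a compile-time error.
    return (before, moved.storageAddress, moved[0])
}
```

The elements are copyable `String`s, yet the buffer address is the same before and after the move: only the struct holding the pointer changes hands.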
I think I’ll have to agree here with Nate. Folks can continue using Array, but for those that just want the benefit of eliminating all uniqueness checks and retain/releases, UniqueArray is a perfect choice.
Consider scenarios like JSON deserialization or CLI parsing: you almost never want to use RigidArray here, because these parsers don’t know how many elements will go into the array until they have parsed everything. Clients writing Codable structs won’t want to use RigidArray, because the decode APIs would then have to bubble up customization hooks (or something similar) for clients to specify resizing behavior for these data structures. The alternative is to continue using Array, or to opt into UniqueArray for more control over the array's copying behavior.
Well, my viewpoint is hopelessly skewed by working so much on collection/container implementations, but if I were writing a high-performance deserializer, I would in fact not mind having control over precisely when storage grows and by how much. RigidArray isn't unusable in such contexts -- it "just" requires invoking a separate, explicit operation to resize its storage.
I am not saying UniqueArray has no valid use case -- but my experience suggests that the split may not be as clearly weighted toward it as we initially expected; and so I'm curious to see more people play with these.
Sorry if this is off topic. Is there a good example of an implementation that is compatible with, and was designed with, Memory Integrity Enforcement in mind? Does this mean that UnsafeMutableBufferPointer by itself is incompatible with MIE?
(Maybe this should be moved to another discussion.)
I have some questions about the implementation.
- Box and RigidArray store typed pointers. Span, MutableSpan, and OutputSpan store raw pointers. Is there a good reason to prefer raw pointers?
- Box, RigidArray, and UniqueArray have unavailable copies of their declarations for older compilers. Could they instead use conditional compilation for attributes?

#if compiler(>=6.2)
@safe
#else
@available(*, unavailable, message: "…")
#endif
@frozen
public struct Box<T: ~Copyable>: ~Copyable {
RigidArray is one example of that -- it allocates its storage as a single homogeneous block using the UnsafeMutableBufferPointer.allocate allocator. It stores its count and capacity in-place, outside of this storage buffer.
In a copy-on-write collection type, avoiding tail allocation means that the class instance should be a distinct allocation from any variable-sized storage buffer. The class instance is used to own storage (providing a deinit that deallocates it), and to keep track of whether it's uniquely referenced. At first approximation, allocating storage separately means that the class instance will have to now hold a pointer to storage -- meaning that to access an element, we first need to dereference the class reference to access this pointer, then dereference the storage pointer within it. (Extracting a copy of the pointer into the top-level struct wrapper that holds the class reference is one way to avoid the overhead (if any) of having to deal with this double indirection.)
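As a rough illustration of the layout described above, here is a minimal sketch of a fixed-size copy-on-write container whose class instance and element buffer are separate allocations. `StorageOwner` and `CowInts` are hypothetical names made up for this example; real containers are considerably more involved.

```swift
// Hypothetical sketch: the class instance owns a separately allocated buffer.
final class StorageOwner {
    let base: UnsafeMutablePointer<Int>
    let capacity: Int

    init(capacity: Int) {
        self.capacity = capacity
        base = .allocate(capacity: capacity)
        base.initialize(repeating: 0, count: capacity)
    }

    init(copying other: StorageOwner) {
        capacity = other.capacity
        base = .allocate(capacity: other.capacity)
        base.initialize(from: other.base, count: other.capacity)
    }

    deinit {
        base.deinitialize(count: capacity)
        base.deallocate()
    }
}

struct CowInts {
    private var owner: StorageOwner
    // Copy of `owner.base` kept in the struct wrapper, so element access
    // does not have to go through the class reference first.
    private var cachedBase: UnsafeMutablePointer<Int>
    let count: Int

    init(count: Int) {
        let owner = StorageOwner(capacity: count)
        self.owner = owner
        self.cachedBase = owner.base
        self.count = count
    }

    private mutating func ensureUnique() {
        if !isKnownUniquelyReferenced(&owner) {
            owner = StorageOwner(copying: owner)
            cachedBase = owner.base
        }
    }

    subscript(i: Int) -> Int {
        get {
            precondition(i >= 0 && i < count, "index out of bounds")
            return cachedBase[i]
        }
        set {
            precondition(i >= 0 && i < count, "index out of bounds")
            ensureUnique()  // copy the buffer only if it is shared
            cachedBase[i] = newValue
        }
    }
}
```

Reads go straight through `cachedBase`; only mutations consult the class instance, and only to check uniqueness.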
No. Unsafe[Mutable]BufferPointer (and its safe siblings, the span types), are merely tools for representing regions of contiguous memory. UMBP's allocate method knows the element type for which it is allocating memory, so it can pass that on to an underlying typed allocator.
It's important to note that the recommendation against using ManagedBuffer is not absolute; we have no intention of deprecating that construct, or any of the collection types that use it. The advent of memory integrity enforcement simply adds a new axis to the design space of Swift containers. We have started to consider it when evaluating potential solutions, and it will encourage us to avoid implementing designs that score especially poorly against the new metric -- particularly in the stdlib.
Notably, the use of tail allocations (whether done via ManagedBuffer, TrailingArray or other means) does not interfere with the goals of MIE when the size of the tail-allocated buffer does not vary across instances. (Such as when used to represent fixed-size nodes in a search tree.)
It's for special effects. We needed to allow two Span instances to overlap even if their element type is different, without violating memory binding rules. (This requires careful analysis to do safely, but we thought it was important to allow. I'm sure time will tell if this was the right move.)
I think MutableSpan and OutputSpan just went with the same setup, even though of course they must never overlap with any other span instance while they're exclusively accessed. (@glessard may remember a more convincing reason.)
To make up for not being able to assume bindings, the borrowing Span construct has a brand new guarantee that isn't present in UnsafePointer or UnsafeMutableBufferPointer -- it is that the compiler is safe to assume that for the entire duration that Span exists, the elements stored within must not be exclusively accessed/mutated. AIUI, this will allow the compiler to eventually implement optimizations that it isn't able to reliably perform over code using UMBP -- e.g. by allowing it to more often reuse the result of previous loads from a Span around operations with unknown effects.
RigidArray and Box exclusively own their storage and they do not need to allow for such complications.
There is no clever insight behind it. It was driven by my desire not to have to think about which parts of the real declaration might be using constructs that earlier compilers would fail to compile.
That requires that the capacity is known beforehand. I am working on an encoder where the capacity is unknown and the buffer needs to be resized as I append more elements. UniqueArray looks like it fits this use case well.
Is there a safe way to get an Array from a UniqueArray? This works:
someUniqueArray.span.withUnsafeBufferPointer { Array($0) }
but it would be great if this could be done without involving unsafe APIs. The UniqueArray is great while encoding, but a regular Array is better for passing the encoded data along in my software.
This is one area where this initial release is not fully fleshed out yet.
These should all be generic operations on any RangeReplaceableCollection:
extension RangeReplaceableCollection {
  init(copying source: borrowing Span<Element>) {
    // Build the collection from the span's contents.
    self = source.withUnsafeBufferPointer { Self($0) }
  }
  init(copying source: borrowing MutableSpan<Element>) {
    self.init(copying: source.span)
  }
  init(copying source: borrowing OutputSpan<Element>) {
    self.init(copying: source.span)
  }
  init(copying source: borrowing RigidArray<Element>) {
    self.init(copying: source.span)
  }
  init(copying source: borrowing UniqueArray<Element>) {
    self.init(copying: source.span)
  }
  // etc. etc.
}
As time passes, we'll need to add a wide range of new container implementations: RigidDeque, UniqueDeque, RigidSet, UniqueSet, RigidDictionary, UniqueDictionary, etc. etc. We'll naturally want to be able to efficiently copy/move/consume data between them and/or the existing collection types. It would be silly to try doing this by defining a direct overload for every combination of source and target.
What we rather want is a set of protocols that define the various shapes of generic containers, and then introduce a nice set of generic operations that can transfer data between them. One of these operations would be an initializer that efficiently copies data from a container into a range-replaceable collection:
extension RangeReplaceableCollection {
  init<Source: Container<Element>>(copying source: borrowing Source) {
    self.init()
    self.reserveCapacity(source.count)
    var it = source.startBorrowingIteration()
    while true {
      // Each call yields the next contiguous chunk; an empty span means we're done.
      let span = it.nextSpan()
      if span.isEmpty { break }
      self.append(copying: span)
    }
  }
}
After specialization and all the other optimizations, I'd expect this to come out exactly as fast as a concrete overload [Contiguous]Array.init(copying: borrowing UniqueArray<Element>). (To achieve this, we'll likely need to add additional customization hooks to RangeReplaceableCollection that generalize [Rigid,Unique]Array.append(count:initializingWith:) with support for piecewise contiguous initialization, and extend the standard collection types to support that. In the meantime, we may well add targeted specializations for the most frequent cases. Prototyping such customization hooks in a package is tricky, but one obvious way to do that is to introduce a (source-unstable) new protocol refining the standard one.)
Note how I'm careful to label the argument copying here: it's Array(copying: items), not Array(items). This explains and clarifies to the reader what is happening, distinguishing this from Array(consuming: items) that would destroy the source, moving its contents into the new array. The label also distinguishes these operations from the existing Sequence-based ones, that I fully expect to also gain similar span-based improvements.
Well, if your intent is to ultimately move the data into an Array, then it would likely be better to do this with an operation that consumes the source rather than copying it, thereby avoiding unnecessary refcounting overhead. (In case Element is not a trivial type.)
extension Array {
  init(consuming source: consuming UniqueArray<Element>) {
    self.init(
      capacity: source.count,
      initializingWith: { (target: inout OutputSpan<Element>) in
        source.consumeAll { (source: consuming InputSpan<Element>) in
          target.append(consuming: source)
        }
      }
    )
  }
}
// Or rather, expressed as a generic operation over consumable
// piecewise contiguous containers:
extension RangeReplaceableCollection {
  init<Source: ConsumableContainer<Element>>(
    consuming source: consuming Source
  ) {
    self.init()
    self.reserveCapacity(source.count)
    self.edit { (target: inout OutputSpan<Element>) in
      source.consumeAll { (source: consuming InputSpan<Element>) in
        // Called repeatedly to consume every storage chunk in `source`, in sequence
        target.append(consuming: source)
      }
    }
  }
}
But, crucially, if you need to eventually pass the data along as an Array, why not simply put that data in an Array in the first place? Constructing the data in a different type seems counter-productive. I believe Steve was referring to this initializer on Array:
extension Array {
  public init<E: Error>(
    capacity: Int,
    initializingWith: (_ span: inout OutputSpan<Element>) throws(E) -> Void
  ) throws(E)
}
This initializer gives us a direct way to safely initialize Array's underlying storage, cutting through all its various abstraction layers. I'm using it to implement Array(consuming: UniqueArray) in the draft above.
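For readers who haven't used it yet, here is a small usage sketch of that initializer. It assumes a Swift 6.2 toolchain; the `encodeHeader` function, its parameters, and the byte values are invented for illustration, and the availability annotation is my best guess at what Apple platforms require.

```swift
// Hypothetical example: fill a fresh Array through an OutputSpan,
// without touching any unsafe API. Assumes a Swift 6.2 standard library.
@available(macOS 26, iOS 26, tvOS 26, watchOS 26, visionOS 26, *)
func encodeHeader(version: UInt8, flags: UInt8) -> [UInt8] {
    return [UInt8](capacity: 4) { (span: inout OutputSpan<UInt8>) in
        span.append(0x53)     // made-up magic byte
        span.append(0x57)     // made-up magic byte
        span.append(version)
        span.append(flags)
    }
}
```

The capacity has to be known up front, which is exactly the limitation discussed in this thread; the closure may initialize fewer elements than the capacity, but not more.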
In my case Element is just UInt8, but thanks for the code sample!
I want to avoid using Array while encoding to avoid accidental exclusivity checks where they are not needed. I do not know the required capacity beforehand, thus I cannot use OutputSpan as far as I can tell.
In my limited usage thus far, I did miss some Sequence/Collection algorithms like elementsEqual to compare a container to a Data, for example. For now, escaping to an unsafe buffer pointer for these operations helps gradual adoption.
Looking forward to a Container protocol and the direction Swift is taking with these new collections.
FWIW, I'm using UniqueArray to implement something approximating a slab allocator where the slabs are ~Copyable structs (so they can deallocate their buffer) and are stored in a UniqueArray. Dynamic resizing helps here since you can have arbitrarily many slabs.
Alright. Do make sure to verify that the work you do to improve performance does not actually end up having the opposite result.
Appending to an Array is not slow. I do not expect that constructing things in a Rigid- or UniqueArray will speed things up enough to make up for the overhead of having to copy the final result into an actual Array value. I'd be curious to see actual numbers, though!
The full set of OutputSpan-based operations isn't limited to initialization. SE-0485 has introduced the following entry point:
extension Array {
  /// Grows the array to ensure capacity for the specified number of elements,
  /// then calls the closure with an OutputSpan covering the array's
  /// uninitialized memory.
  public mutating func append<E: Error>(
    addingCapacity: Int,
    initializingWith initializer: (inout OutputSpan<Element>) throws(E) -> Void
  ) throws(E)
}
RigidArray and UniqueArray ship with a more fully worked out set for "in-place" insertions and "in-place" removals.
extension UniqueArray where Element: ~Copyable {
  public mutating func edit<E: Error, R: ~Copyable>(
    _ body: (inout OutputSpan<Element>) throws(E) -> R
  ) throws(E) -> R

  public init<E: Error>(
    capacity: Int,
    initializingWith body: (inout OutputSpan<Element>) throws(E) -> Void
  ) throws(E)

  public mutating func append<E: Error, Result: ~Copyable>(
    count: Int,
    initializingWith body: (inout OutputSpan<Element>) throws(E) -> Result
  ) throws(E) -> Result

  public mutating func insert<Result: ~Copyable>(
    count: Int,
    at index: Int,
    initializingWith body: (inout OutputSpan<Element>) -> Result
  ) -> Result

  public mutating func replaceSubrange<Result: ~Copyable>(
    _ subrange: Range<Int>,
    newCount: Int,
    initializingWith body: (inout OutputSpan<Element>) -> Result
  ) -> Result

  // This one is not yet enabled
  public mutating func consumeSubrange<E: Error, Result: ~Copyable>(
    _ bounds: Range<Int>,
    consumingWith body: (inout InputSpan<Element>) throws(E) -> Result
  ) throws(E) -> Result
}
I expect we'll propose the same operations on the standard Array as well. (I'm sure the details will evolve; they always do. In particular, we will probably want a replaceSubrange variant that allows customizing the consumption of the old items.)
These are reasonably straightforward to generalize for piecewise contiguous containers (such as ring buffers or ropes), by simply allowing the closure argument to get called multiple times. As such, they would also provide a nice set of new customization hooks on RangeReplaceableCollection. (If we choose to do so, we can also define a range-replaceable container protocol using these as core requirements -- as long as we end up carrying over the idea of an Index from Collection.)
Most of the time I don’t need to convert to an Array, so I think it’s fine. Numbers would be great, but I find it difficult to test the same algorithm for UniqueArray and Array as I find that there is too little shared API between the two to write the code in a generic fashion.
I am sure that will improve as a Container protocol is introduced. Perhaps Array will gain append(copying:) for generic code of UniqueArray/Array. Or I should work around that by introducing my own bridging protocol and conform both UniqueArray and Array to it.
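A minimal sketch of what such a bridging protocol could look like, assuming Swift 6.0 or later (for `~Copyable` protocols). Everything here is hypothetical: `ByteSink`, `ToyUniqueBytes` (a stand-in for UniqueArray<UInt8>, so that the example does not depend on the package), `encodeVarint`, and `demoByteSinks` are names made up for illustration.

```swift
// Hypothetical bridging protocol that both copyable and noncopyable
// byte containers can conform to.
protocol ByteSink: ~Copyable {
    var count: Int { get }
    mutating func append(_ byte: UInt8)
}

// Array already has a matching `count` and `append(_:)`.
extension Array: ByteSink where Element == UInt8 {}

// Toy stand-in for UniqueArray<UInt8>; NOT the real type.
struct ToyUniqueBytes: ~Copyable, ByteSink {
    private var storage: UnsafeMutablePointer<UInt8>
    private var capacity: Int
    private(set) var count = 0

    init() {
        capacity = 4
        storage = .allocate(capacity: 4)
    }

    deinit { storage.deallocate() }  // UInt8 is trivial: nothing to deinitialize

    mutating func append(_ byte: UInt8) {
        if count == capacity {
            // Double the capacity and move the existing bytes over.
            let newStorage = UnsafeMutablePointer<UInt8>.allocate(capacity: capacity * 2)
            newStorage.moveInitialize(from: storage, count: count)
            storage.deallocate()
            storage = newStorage
            capacity *= 2
        }
        (storage + count).initialize(to: byte)
        count += 1
    }

    func copyToArray() -> [UInt8] {
        Array(UnsafeBufferPointer(start: storage, count: count))
    }
}

// One generic encoder (LEB128-style varint), usable with either container.
func encodeVarint<Sink: ByteSink & ~Copyable>(_ value: UInt64, into sink: inout Sink) {
    var v = value
    while v >= 0x80 {
        sink.append(UInt8(truncatingIfNeeded: v) | 0x80)
        v >>= 7
    }
    sink.append(UInt8(v))
}

func demoByteSinks() -> (array: [UInt8], unique: [UInt8]) {
    var array: [UInt8] = []
    encodeVarint(300, into: &array)
    var unique = ToyUniqueBytes()
    encodeVarint(300, into: &unique)
    return (array, unique.copyToArray())
}
```

With the algorithm written once against the protocol, the same code can be benchmarked against both container types.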
Span<T> is also useful for providing convenient element-wise access over a byte buffer (UnsafeRaw[Buffer]Pointer, RawSpan, Data, etc.). To make this easy and relatively safe, Span<T> could, in the future, support initialization directly from UnsafeRawPointer. This applies equally to MutableSpan and OutputSpan.
let typedPtr = UnsafeMutablePointer<T>.allocate(capacity: 1)
var span = MutableSpan<U>(_unsafeStart: UnsafeMutableRawPointer(typedPtr), byteCount: 1)
write(&span)
read(typedPtr.pointee)
This is valid (read will see the value written by write) because MutableSpan never assumes its memory is bound to U. If MutableSpan used a typed pointer internally, someone would need to be responsible for rebinding memory before its first use and after its last use.
Box and RigidArray own their memory, bind it to their element type, and don’t hand out pointers to some other type. Perfect for a typed pointer.
Could the Box also be implemented as a @propertyWrapper type?
- init(wrappedValue:) would forward to init(_:).
- var wrappedValue: T would be the same as subscript() -> T.
- init(projectedValue:) is for "API-level" property wrappers on function/closure parameters (SE-0293). I wasn't able to test this.
- var projectedValue: Self might need yielding (or non-yielding) borrow and mutate accessors (SE-0474). I wasn't able to use _read and _modify (Swift 6.2 compiler).
EDIT: issue swiftlang/swift#81624 is unresolved.