Swift Collections 1.3.0

Swift Collections 1.3.0 is out now; it is a new feature release primarily focused on an initial set of basic constructs that help Swift’s “low-level” systems programming use case.

This release supports Swift toolchain versions 6.0, 6.1 and 6.2, and it includes the following improvements:

BasicContainers module

This new module collects ownership-aware, low-level variants of existing data structures in the core standard library. In this release, this module consists of two array variants, UniqueArray and RigidArray.

These new types are provided as less flexible, noncopyable alternatives to the classic Array type. The standard Array implements value semantics with the copy-on-write optimization; this inherently requires elements to be copyable, and it is itself copyable.

struct UniqueArray<Element> is a noncopyable array variant that takes away Array's copy-on-write behavior, enabling support for noncopyable elements. This type's noncopyability means mutations can always assume that the array is uniquely owned, with no shared copies (hence the name!). This means that array mutations such as mutating an element at an index can behave much more predictably, with no unexpected performance spikes due to having to copy shared storage.

struct RigidArray<Element> goes even further, by also disabling dynamic resizing. Rigid arrays have a fixed capacity: they are initialized with room for a particular number of elements, and they never implicitly grow (nor shrink) their storage. When a rigid array's count reaches its capacity, it becomes unable to add any new items -- inserting into a full array is considered a programming error. This makes this a quite inflexible (or rigid) type indeed, as avoiding storage overflow requires careful, up front planning on the resource needs of the task at hand. In exchange, rigid arrays can have extremely predictable performance characteristics.

UniqueArray is a great default choice when a task just needs an array type that is able store noncopyable elements. RigidArray is best reserved for use cases that require absolute, pedantic control over memory use or latency -- such as control software running in environments with extremely limited memory, or when a certain task must always be completed in some given amount of time.

The Unique and Rigid prefixes applied here establish a general naming convention for low-level variants of the classic copy-on-write data structure implementations. Future releases are expected to flesh out our zoo of container types by adding Unique and Rigid variants of the existing Set, Dictionary, Deque, Heap and other constructs, with type names such as as RigidDictionary and UniqueDeque.

TrailingElementsModule module

This new module ships a new TrailingArray construct, a preview of a new low-level, ownership-aware variant of ManagedBuffer. This is primarily intended as a interoperability helper for C constructs that consist of a fixed-size header directly followed by variable-size storage buffer.

ContainersPreview module

This module is intended to contain previews of an upcoming ownership-aware container model. In this initial release, this module consists of just one construct: struct Box<T>.

Box is a wrapper type that forms a noncopyable, heap allocated box around an arbitrary value.

47 Likes

Thanks for sharing!

• • •

From a naming perspective, I think the Unique prefix could use some bikeshedding.

As an outsider who hasn’t been following the development of Swift Collections closely, when I read UniqueArray my first thought is:

“Oh, this must be an array that ensures its elements are unique. So it’s like an ordered set. Wait… isn’t OrderedSet already a thing?”

And then after reading what the type actually is, my second thought is:

“Hmm, but the arrays are not really unique. I can easily make two distinct instances with identical contents.”

I don’t know if a better prefix can be found, but I think it deserves some collective effort by the community.

17 Likes

I happened to stumble on this a couple days ago, and it piqued my interest because I have a use case where I'm using ManagedBuffer but the design of this would be a better fit (rather, TrailingPadding, because I have a raw block of memory that I'm binding different types to manually).

I can't see how I would use this directly as the storage underlying a CoW data type though. Since ManagedBuffer is a class I can use isKnownUniquelyReferenced to manage when I need to copy it. But TrailingPadding is a ~Copyable struct so I can't do the same thing here.

I'd also like to avoid wrapping the TrailingPadding in a custom class just to get the isKnownUniquelyReferenced behavior, because that turns every one of my values from one heap allocation into two.

Is what I'm looking for possible today? I've been thinking of pitching a "new ManagedBuffer" that supported raw memory access to the trailing buffer and Span APIs, and that overlaps a bit with what you have here.

1 Like

We ended up with the Unique prefix after a long deliberation within Apple’s Standard Library team, including with members of the LSG who are not at all shy about overruling my technical and aesthetic arguments.

UniqueArray is a fine name; I think it will serve this project well. I’m confident you’ll come to the same position, once you get a chance to sit down and use it for a bit.

This package is not governed by the Swift Evolution process. These names have already shipped in a stable package tag. This announcement was not an invitation for any bikeshedding. We’ll stick to these names; I think they’re quite easy to get used to.

(Yes, I’m aware of the clash with previous loose terminology of “unique” collections in a different sense; we deliberately chose this name despite that. One advantage of this prefix is that it evokes std::unique_ptr and Swift’s isKnownUniquelyReferenced API. It succinctly highlights the primary difference between this new type and the classic Array construct: UniqueArray is always uniquely owned.)

We do intend these constructs to serve as previews of future stdlib additions. There will be plenty of opportunity to suggest alternative names when/if these types get pitched and proposed to the Standard Library. That said, I’m personally hoping that inertia will prevent the S-E process from arbitrarily/capriciously messing with these names, unless for some reason they prove to be actually horrible. I have no wish to repeat the fiasco of SE–0453.

4 Likes

It is also not governed by a mandate to exclude community input.

Announcing these types as part of a public release, without (as far as I am aware) inviting participation and discussion by the community, was a choice, not a requirement.

2 Likes

TrailingArray is not intended to be used for building CoW data structures – its primary use case is interoperability, to allow Swift code to (more or less) safely deal with C structs with a trailing flexible array member. (And similar buffer-like constructs with a fixed-size header.)

For data structure implementations that aren’t yet baked into Swift’s ABI, we’re currently trying to move away from the classic ManagedBuffer model where copy-on-write data structures store their contents in a variable-sized tail-allocated buffer immediately following a constant header. This representation is not compatible with the security-in-depth strategy of using typed allocators in our Memory Integrity Enforcement initiative, and the general recommendation is to avoid using it.

Excitingly, it is quite possible to build a safe(!) construct for building a copy-on-write collection implementation by wrapping a noncopyable container like RigidArray – I’m very interested in exploring this direction by fleshing out this tool within this package. A very early, unstable draft of such a construct is in the swift-collections 1.3.0 tag, although it is disabled for now. (The caveat is that while the process of forwarding each operation from the wrapper to the underlying rigid variant is relatively straightforward to implement, it is an entirely manual process that requires some careful thought.)

1 Like

The goal of this package is to serve as a playground for potential future additions to the stdlib. These constructs are expected to graduate into stdlib proposals at some point – at which stage they will be subject to the full brunt of the Swift Evolution process. In the meantime, as a practical matter, what ships here has to be ultimately up to the careful consideration of the stdlib team.

Insightful API criticisms are of course always welcome. However, you have explicitly attempted to initiate a bikeshedding discussion. “Bikeshedding” refers to the unfortunate practice of engaging in prolonged, emotional discussions over trivial matters, ignoring deeper questions. I have come to loathe this practice, and I am doing my best to discourage it.

It is certainly possible for a naming discussion not to devolve into bikeshedding – but explicitly asking for a bikeshedding discussion is not the way to do that. Try to get used to the names we shipped; if they still feel wrong after a few weeks of actively using them, then please do complain.

1 Like

I will bow out of this discussion because I feel unwelcome here.

I shared my initial thoughts, and I suggested that community input on naming could be valuable.

The response I received has conveyed what I perceive as hostility toward community input. I will not speculate on whether that was the intent, but it is the effect.

If you think that some other wording would improve my comment, by all means please replace “bikeshedding” with your preferred alternative when you read it. This sort of collaborative input you are providing on my writing is a great way to improve its quality and clarity.

5 Likes

Then OwnedArray seems to sound better. Is it possible to make current names as deprecated typealiases and introduce new names in future?

No matter what the names were chosen, I’m very glad to see this announcement.

1 Like

Are there any plans for benchmarks (Array / ContiguousArray / RigidArray / UniqueArray)?

Perhaps might also be worth exploring this latter part of the description if considering alternative names: UnsharedArray, UncopiedArray, etc.

1 Like

Sure. I don't expect microbenchmarks to be very exciting; it's mostly just going to validate that we haven't done any obvious mess-ups. We already have some microbenchmarks for Array operations -- I often use them as a reference baseline while developing other container types.

One really surprising thing I discovered while using these types is that in low-level (high-performance or resource limited) contexts, I often end up overwhelmingly reaching for RigidArray rather than UniqueArray.

And if I'm not operating in such contexts, I don't really care to embrace the inherent pedantic constraints of ownership-aware types and operations, so naturally I tend to stick to the classic Array.

UniqueArray appears to sit in a semantic "no man's land" in between Array and RigidArray, not really filling any clear need. It seems dynamic sizing may not be all that important after all.

I find this interesting to contrast with the disproportionate amount of effort everyone seems to spend focusing on UniqueArray, as if it was the obviously more important type. (Granted, almost all of that effort is invariably spent on the name, not on any of its actual issues. But still.)

By its implementation, UniqueArray is in fact a relatively straightforward wrapper over a RigidArray instance. (I expect it may get a bit more complicated when we start implementing separate code paths for resizing vs in-place operations -- it's not right that insert's reallocating path moves the tail twice.)

I'm looking forward to finding out if others have the same experience, once folks have had a chance to play with these for a while.

1 Like

The current type names are fine for prototyping work (which is what this is). Please do kindly stop bikeshedding them.

The only way I can see for this package to remain in development over the next few years is for it to become what it was supposed to be from the start: a prototyping playground for figuring out the right container shapes before we actually propose them to the stdlib. This inherently means that we should be okay with making mistakes and changing our minds. The prototypes exist for us to learn where the problems are, so that we can fix them before putting these in production.

We are of course able to deprecate these types (or even the entire module) if we decide that they aren't expressing the right operations. I half-expect we'll do that, especially when we get to start defining generic algorithms -- the interactions with Sequence algorithms are going to be a nightmare to deal with, and I'm not so gently pushed directly into trying to tackle that.

4 Likes

A+ :slight_smile:

A+ :slight_smile:

1 Like

This is a very interesting insight! Do you think it's idiosyncratic to the kind of context you're working in, or is it likely to be reflective of overall usage?

With the Collections package serving both as an incubator for future stdlib improvements and as a specialized package, I wonder if this points to RigidArray and UniqueArray falling into two opposite sides of that spectrum. That is, would this usage experience suggest that the latter is best kept in a specialized library, particularly if it is a shiny but not ideal choice for most users?

1 Like

Have you used either of these in places where you're adapting existing, Array-based code to use the new types' noncopyability? When working on performance, that can be a powerful tool for finding and eliminating unintentional copies, while still allowing the "copying" that occurs when resizing/reallocating.

My expectation is that UniqueArray will be most important in these contexts, since the additional invariant that RigidArray not exceed the original capacity would be harder to satisfy.

1 Like

Is there an API to initialize Array from UniqueArray without copy cost (O(1))?
If it is not existing now, is it technically possible to provide such API?

If such an API exists, I think we could start using UniqueArray in certain parts of the code.

For example:

struct Foo {
    var name: String
}

func makeLargeData() -> [Foo] {
    var data = UniqueArray<Foo>()
    for i in 0 ..< 100 {
        data.append(Foo(name: "\(i)"))
    }
    // No exact matches in call to initializer
    return Array(data)
}
1 Like

There is no such API. It is technically possible, but it would require that UniqueArray set itself up to look like an Array from the get go in terms of object layout and such which we want to avoid for these new types.

1 Like

That is indeed what I'm very curious to find out as well!

Yes! But I've been primarily focused on improving performance, and (with few exceptions) the problems at hand generally were quite amenable to fixed-capacity solutions. So I think I need to gather more data.

It is also possible that my preference for RigidArray just indicates that I'm something of a control freak, and I when I find a reason to reach for the convenience/control slider, I tend to end up moving it all the way. I did not feel UniqueArray was giving me enough control. Others might not be willing to put up with the consequences -- the dynamic variant is unquestionably easier to use.

2 Likes