We've seen some increased discussion recently about the idea of a struct OrderedSet to complement the Foundation class NSOrderedSet type. This is something we've been thinking about for a while. I think this is a great opportunity to bring the community together to collaborate on a new API for Foundation.
I'd like to start a discussion here, and am looking for help from the community to develop a full API proposal for this type. I've discussed this idea with members of the Swift standard library team and we agree that we should start with this type as API in the Foundation project. We will target Swift 5 as the release for this new functionality.
We will run this review using the swift-evolution process, slightly tailored for Foundation. I think this will feel a lot like the Swift package manager proposals. I will act as the review manager.
As struct OrderedSet will be written as part of the Swift overlay for Foundation, we can have the implementation done completely in open source as part of Swift. There is no requirement for changes in the Objective-C implementation, or a requirement for a new corresponding Objective-C implementation.
Here are what I believe to be the basic requirements:
struct type and value semantics
Adoption of appropriate Swift standard library collection protocols
API that feels like it fits in with other Swift collection types
No bridging to the class NSOrderedSet type. Bridging introduces the possibility for a source incompatibility, and we have decided that we wish to stop introducing source incompatibilities.
Shared implementation with swift-corelibs-foundation
I'm excited to have the ability to collaborate with all of you on a major new API for Foundation.
Will the implementation be limited to <T: Hashable>? I get that it's a "Set", but it'd be nice if the generic constraint was only <T: Equatable>. You'd have O(n) performance for Equatable objects and O(1) for Hashable, but the ability to not require hashability would be nice.
Equatable is a bit limiting, Comparable would open more doors as far as order-maintaining collections go. The decision probably has bigger consequences, as one could imagine an OrderedDictionary type as well that would need to play consistently.
Such a line of thinking could lead to 3 distinct kinds of OrderedSet
hash-oriented (identity is intrinsic to the element, order is defined on the temporal order of insertion)
equality-oriented (identity is intrinsic to element, order is defined on the temporal order of insertion)
comparable-oriented (identity is intrinsic to element, order CAN be intrinsic to the type OR temporal order of insertion)
What the right choices will be here I think is a key part to this proposals success.
We had very similar discussions about 'What Does the phrase "an ordered set" mean to you', when initially designing the API for NSOrderedSet . The answer being so subjective is one of the reasons that NSOrderedSet feels both like an NSArray and a NSSet as needed (down to the proxying properties: .array and .set being available for when the ordered-set API doesn't fully cover the behavior you may need.
Counted seems like a worthy pursuit -- but at least in my mind, outside the scope of seeking to introduce the concept of 'Sorted Elements' or 'Ordered Elements' (for some yet to be determined definition of Sorted/Ordered) into the algebra of collection protocols.
For the first - I think we should probably come up with use scenarios, and the alternatives you can do in the absence of first-class Ordered/Sorted set.
For example, CoreData uses OrderedSets all over the place - but not necessarily in the same spirit as say NSMutableOrderedSet which models uniqueness + insertion order. CoreData vends OrderedSets because they semantically fit their domain (its a Set of unique things, that happens to have a very specific Order).
Another scenario I've seen folks use OrderedSets is in the order of Sort Criteria (e.g. columns with tabular data). Each column is unique, and the order matters for stable ordering of some secondary collection.
The cost/benefit analysis of these different scenarios would likely help to inform which is worth the maintenance burden.
One limitation we've encountered in the past (with regard to collections in Objective-C) is the lack of a runtime-only data structures. This would let you create say a 'Sorted' collection that takes the function it uses to determine the order of its elements. The NSPointerFunction family of collections try to fill in many common cases (by giving names to specific configurations that went outside the realm of pure Objective-C references), but even these raise exceptions if you try to persist something 'unpersistable'.
As for question 2, I feel like that has the potential to be the most artistic part of this effort.
An ordered set with extrinsic order (i.e. NSOrderedSet-style.)
An ordered set with intrinsic order (a SortedSet?)
with orthogonal Equatable/Comparable vs Hashable implementation.
On this last axis, it feels like we could have a struct OrderedSet<ElementType: Equatable> that becomes more efficient by selecting a separate implementation if the ElementType happens to also be Hashable?
This feels dangerous to me. I'd be afraid to forget making my types Hashable and unknowingly getting O(n) performance. I'd prefer if the performance was consistently O(1). For example, Dictionary could support Equatable keys, but it doesn't make sense from a performance point of view.
My first thought was to look at what other languages include in their standard libraries. I notice that for the C++ stdlib, there is no such thing as an ordered set. Instead they have boost.MultiIndex (more details here). From a cursory glance at collections in Rust, you might be able to use BTreeSet to the same effect.
I stopped there because these two examples already make me wonder... is an OrderedSet the right solution to the kinds of problems or situations they'd be deployed in? For example, would a Swift MultiIndex coupled with a Set be more appropriate and, having abstracted questions of storage away, would the MultiIndex have greater affordance for other API or user-defined types?
I’m not sure I understand why we need a “SortedSet” distinction. If we have a OrderedSet “foo”, you can trivially sort the contents just as you can an Array “bar”, no?
I’m firmly in the Hashable camp. Both for consistency and for Erica’s point about accidental O(n). Especially with the new features coming, Hashable isn’t particularly onerous a burden and we’re already paying it for Set.