RFP: OrderedSet

Tony_Parker · March 26, 2018, 8:21pm

Hello everyone,

We've seen some increased discussion recently about the idea of a struct OrderedSet to complement the Foundation class NSOrderedSet type. This is something we've been thinking about for a while. I think this is a great opportunity to bring the community together to collaborate on a new API for Foundation.

I'd like to start a discussion here, and am looking for help from the community to develop a full API proposal for this type. I've discussed this idea with members of the Swift standard library team and we agree that we should start with this type as API in the Foundation project. We will target Swift 5 as the release for this new functionality.

We will run this review using the swift-evolution process, slightly tailored for Foundation. I think this will feel a lot like the Swift package manager proposals. I will act as the review manager.

As struct OrderedSet will be written as part of the Swift overlay for Foundation, we can have the implementation done completely in open source as part of Swift. There is no requirement for changes in the Objective-C implementation, or a requirement for a new corresponding Objective-C implementation.

Here are what I believe to be the basic requirements:

struct type and value semantics
Adoption of appropriate Swift standard library collection protocols
API that feels like it fits in with other Swift collection types
No bridging to the class NSOrderedSet type. Bridging introduces the possibility for a source incompatibility, and we have decided that we wish to stop introducing source incompatibilities.
Shared implementation with swift-corelibs-foundation

I'm excited to have the ability to collaborate with all of you on a major new API for Foundation.

davedelong · March 26, 2018, 8:27pm

Question:

Will the implementation be limited to <T: Hashable>? I get that it's a "Set", but it'd be nice if the generic constraint was only <T: Equatable>. You'd have O(n) performance for Equatable objects and O(1) for Hashable, but the ability to not require hashability would be nice.

griotspeak · March 26, 2018, 8:34pm

Wouldn't the constraint have to be Comparable?

Tino · March 26, 2018, 8:38pm

It's not a sorted set, but rather an array that can't contain duplicates.

Michael_J_LeHew_Jr · March 26, 2018, 8:38pm

Equatable is a bit limiting, Comparable would open more doors as far as order-maintaining collections go. The decision probably has bigger consequences, as one could imagine an OrderedDictionary type as well that would need to play consistently.

Such a line of thinking could lead to 3 distinct kinds of OrderedSet

hash-oriented (identity is intrinsic to the element, order is defined on the temporal order of insertion)
equality-oriented (identity is intrinsic to element, order is defined on the temporal order of insertion)
comparable-oriented (identity is intrinsic to element, order CAN be intrinsic to the type OR temporal order of insertion)

What the right choices will be here I think is a key part to this proposals success.

Michael_J_LeHew_Jr · March 26, 2018, 8:42pm

We had very similar discussions about 'What Does the phrase "an ordered set" mean to you', when initially designing the API for NSOrderedSet . The answer being so subjective is one of the reasons that NSOrderedSet feels both like an NSArray and a NSSet as needed (down to the proxying properties: .array and .set being available for when the ordered-set API doesn't fully cover the behavior you may need.

griotspeak · March 26, 2018, 9:06pm

Ah. I forgot about that bit. I was thinking about Set implemented using only Comparable

millenomi · March 27, 2018, 12:34am

Two questions in my mind:

Which of these makes the most sense to implement (and perpetually maintain)?
Is there a synthesis possible between one or more of these types, or should they be kept separate?

Erica_Sadun · March 27, 2018, 12:46am

If we get ordered sets, can we get counted ones?

Michael_J_LeHew_Jr · March 27, 2018, 1:25am

Counted seems like a worthy pursuit -- but at least in my mind, outside the scope of seeking to introduce the concept of 'Sorted Elements' or 'Ordered Elements' (for some yet to be determined definition of Sorted/Ordered) into the algebra of collection protocols.

Michael_J_LeHew_Jr · March 27, 2018, 1:39am

Those are good questions.

For the first - I think we should probably come up with use scenarios, and the alternatives you can do in the absence of first-class Ordered/Sorted set.

For example, CoreData uses OrderedSets all over the place - but not necessarily in the same spirit as say NSMutableOrderedSet which models uniqueness + insertion order. CoreData vends OrderedSets because they semantically fit their domain (its a Set of unique things, that happens to have a very specific Order).

Another scenario I've seen folks use OrderedSets is in the order of Sort Criteria (e.g. columns with tabular data). Each column is unique, and the order matters for stable ordering of some secondary collection.

The cost/benefit analysis of these different scenarios would likely help to inform which is worth the maintenance burden.

One limitation we've encountered in the past (with regard to collections in Objective-C) is the lack of a runtime-only data structures. This would let you create say a 'Sorted' collection that takes the function it uses to determine the order of its elements. The NSPointerFunction family of collections try to fill in many common cases (by giving names to specific configurations that went outside the realm of pure Objective-C references), but even these raise exceptions if you try to persist something 'unpersistable'.

As for question 2, I feel like that has the potential to be the most artistic part of this effort.

millenomi · March 27, 2018, 3:10am

It feels the taxonomy here is:

An ordered set with extrinsic order (i.e. NSOrderedSet-style.)
An ordered set with intrinsic order (a SortedSet?)

with orthogonal Equatable/Comparable vs Hashable implementation.

On this last axis, it feels like we could have a struct OrderedSet<ElementType: Equatable> that becomes more efficient by selecting a separate implementation if the ElementType happens to also be Hashable?

Aciid · March 27, 2018, 3:13am

Does this mean there will be only one implementation and not two (overlay + corelibs-foundation)?

millenomi · March 27, 2018, 3:15am

I think it's safe to say a strongly desired goal is that we offer the same code on all platforms for this.

hartbit · March 27, 2018, 3:45am

This feels dangerous to me. I'd be afraid to forget making my types Hashable and unknowingly getting O(n) performance. I'd prefer if the performance was consistently O(1). For example, Dictionary could support Equatable keys, but it doesn't make sense from a performance point of view.

Erica_Sadun · March 27, 2018, 3:54am

I'd prefer <T: Hashable>, just like other sets:

public struct Set<Element> : SetAlgebra, Hashable, Collection, ExpressibleByArrayLiteral where Element : Hashable

Think about implementation, and re-balancing if needed, where the complexity grows.

To be fair, I don't think I've ever really needed ordered set outside of examples. But I've used counted sets (bags) a lot.

kiel · March 27, 2018, 4:09am

Great initiative, thank you!

My first thought was to look at what other languages include in their standard libraries. I notice that for the C++ stdlib, there is no such thing as an ordered set. Instead they have boost.MultiIndex (more details here). From a cursory glance at collections in Rust, you might be able to use BTreeSet to the same effect.

I stopped there because these two examples already make me wonder... is an OrderedSet the right solution to the kinds of problems or situations they'd be deployed in? For example, would a Swift MultiIndex coupled with a Set be more appropriate and, having abstracted questions of storage away, would the MultiIndex have greater affordance for other API or user-defined types?

karim · March 27, 2018, 11:58am

I’m not sure I understand why we need a “SortedSet” distinction. If we have a OrderedSet “foo”, you can trivially sort the contents just as you can an Array “bar”, no?

I’m firmly in the Hashable camp. Both for consistency and for Erica’s point about accidental O(n). Especially with the new features coming, Hashable isn’t particularly onerous a burden and we’re already paying it for Set.

DevAndArtist · March 27, 2018, 12:31pm

Silly question because I'm confused now. Where is the difference between OrderedSet and SortedSet? For me OrderedSet would look like this

struct OrderedSet<Element> where Element : Comparable & Hashable, SetAlgebra, Hashable ... { ... }

I really miss this type in our app I'm currently working on. I have unique items that must be ordered (sorted) by their date property.

DeFrenZ · March 27, 2018, 12:41pm

If I understood correctly:

OrderedSet has an order, in the same way that Array is ordered.
SortedSet would always keep its contents sorted according to their conformance to Comparable.

As an example

let unordered: Set = [3, 5, 1, 7]
print(unordered) // no assumptions about order.

let ordered: OrderedSet = [3, 5, 1, 7]
print(ordered) // 3, 5, 1, 7

let sorted: SortedSet = [3, 5, 1, 7]
print(sorted) // 1, 3, 5, 7