RFP: OrderedSet

Thank you. To be fair I personally really don't need the OrderedSet then and I don't see its shiny use-cases. What I really would wish for is something like a SortedSet by your definition.

C++'s bizarre nomenclature strikes again...

I believe the generic std::set in C++ is in fact an ordered set (or in the vernacular of this thread, a sorted set, since the comparator is specified once for the set). It's implemented as a red/black tree.

If you want a Swift-style set (that is, a hash table), I believe that's std::unordered_set.

One thing I notice from code-kata sites is the frequent use of std::multiset, which is an ordered collection that allows duplicate elements, implemented as a binary tree. (I can never see this name without my brain yelling "Leeloo Dallas multiset!")

The naming in Obj-c always tripped me up too, perhaps Ordered and Sorted are too close and can be given something more dissimilar.

I do reach for OrderedSet from time to time, and I think many API's can improve their semantics by using OrderedSet instead of Array. For example, arrangedSubviews on UIStackView is an array with a uniqueness guarantee according to the documentation, but this could be more clearly expressed if it returned an OrderedSet.

Iā€™m familiar with SortedSets from Scala, implemented by class TreeSet which is a red-black tree with ordering specified during initialization. If you leave out the ordering argument, the default ordering on the element type will be used.

I had to page swap-in the NSOrderedSet from NSHipster and official docs as this was the first time I have encountered itā€¦ and Iā€™m thoroughly confused by this :man_zombie:!

The immutable variant looks like an array with uniqued elements that promises faster membership tests ā€“ without actually mentioning any performance guarantees.

But the mutable variant is where it gets hairy: insert(_:,at:), moveObjects(at:,to:), exchangeObject(at:,withObjectAt:), sort, sortRangeā€¦ From this I gather the elements are implicitly sorted in insertion order, but you can muck around like itā€™s an array.

A set and an array at the same time!?! Iā€™m having a hard time estimating performance and memory characteristics of this novel (from CS POV) type. And because the documentation doesnā€™t say, Iā€™ll assume itā€™s backed by an array and a tree that stores array indices? Membership test doing tree traversal with dual pointer chasing for each comparison? Or is this sorted array with binary search and insertion indices in a parallel array?

I see a clear need for a SortedSet, thoroughly studied data structure with well known performance guarantees. In my opinion, merging SortedSet and Array into single datastructure is quite strange. Iā€™d prefer the collections in Swift standard library to evolve in modular and composable direction modeled after Scalaā€™s Collections.

@Tony_Parker could please expand what you mean by complement here:

Iā€™m really unclear on the requirements hereā€¦

Searching through a SortedSet has different characteristics, for one.

OrderedSet is useful for maintainingā€¦ order. If you need to know when an item was first added in, this is useful. Creating lists of people involved in something might be an example. There isn't a need to list someone twice, but maybe I want to know what order I added them to the list in.

6 Likes

Both ordered sets and sorted sets exist for JVM languages, Java, Scala, Kotlin, etc., in the form of LinkedHashSet and TreeSet and their derivatives respectively. Ordered sets, linked hash sets, are very common in JVM code. All the sets, not just LinkedHashSet and TreeSet, on the JVM require elements to be Hashable to give fast performance (not a big deal on the JVM since objects are automatically Hashable via their memory address).

2 Likes

As an example, i keep a list of observers in a custom OrderedSet-like struct. I want to ensure a single observer isn't registered twice, but the order in which they registered has to be respected (or at the very minimum deterministic).

2 Likes

All I meant by complement was: we should provide a struct that has approximately the same features as the reference type, so that people can use this new struct in Swift most of the time. Somewhat like Set vs NSSet.

I am open to several types (ordered and sorted) if we think that makes sense. I agree we should be more clear about performance guarantees than the existing NSOrderedSet type.

This new type doesn't have to have 100% the same behavior as NSOrderedSet. We can take this opportunity to improve the API, much like we did for many of the other value types we've introduced in Foundation. However, it would be weird if we introduce a struct OrderedSet which is totally unrelated in concept to the reference type.

Hopefully that clarifies things a bit.

8 Likes

For now I think the best we can do is to have one implementation that is copied into both places. There are some practical considerations about why the overlay is part of the compiler and why swift-corelibs-foundation is a separate git repository.

Someday (I'm hoping after ABI stability is done), I hope to consolidate the projects so we can have just one repository.

1 Like

Yes please! I find OrderedSets very useful.
A cache for example will use FIFO to drop an object when hitting a threshold (such as a count).
Another use case for me is keeping ā€œRecent Searchesā€ where duplicates donā€™t make sense and order is important.

Iā€™ve asked about it at WWDC 17 (along with JSON parsing) and Iā€™m excited to see these features get implemented.

2 Likes

If an element that is already in the ordered set is added again to the set will the element now be considered at the front of the ordered set?

1 Like

The semantics of NSMutableOrderedSet (which I believe control the intended behavior of OrderedSet) follow the NSSet convention, in that adding an object equal to an object in the set does nothing:

https://developer.apple.com/documentation/foundation/nsmutableorderedset/1408009-add

2 Likes

Set behave the same and will teturn false if you try to insert the same element.

1 Like

If weā€™re buidling the ā€œrecent searchesā€ UI element mentioned earlier and reordering the result based on recency, perhaps the SortedSet type is more useful for this scenario?

1 Like

It seems to me that building (or having the effects of) a ā€œSortedSetā€ is pretty easy given an ā€œOrderedSetā€. But an ā€œOrderedSetā€ type cannot be simulated with only a ā€œSetā€ and ā€œSortedSetā€

1 Like

I think this is the key question. By default, we should follow the convention of NSMutableOrderedSet (re-adding does nothing), but we should also have an option to override this behavior and move the element to the end.

func add(_ item : Element, reorderIfPresent : Bool  =  false)

Does that hold its own weight vs just:

// strawman syntax
orderedSet.remove(item)
orderedSet.append(item)

I don't have a strong opinion, really, just asking. I suppose there's a performance win with a single statement, but not sure if I grok all the use-cases for this situation.

1 Like

That's closer to splay semantics than existing NSMutableOrderedSet. @QuinceyMorris is right in that NSMutableOrderedSet is a set first, then an array only after the set membership test fails.

Even SortedSet isn't quite appropriate for the 'move the inserted item to the front', because it requires the sort-predicate to globally manipulate the data structure which is not typical.

@palimondo - the idea of ordered set is very common when yielding results from select distinct-style database queries - which is why CoreData uses them as heavily as they do. Yes they are 'sorted' in that there is a distinct order to them (whatever the order by clause states for instance), but the predicate that orders them is far less important than the fact that they are distinct and in a specific order.

In fact, thinking about that seems to be the crux of the difference between the two kinds of sets. And gosh yes, we need better names than sorted vs ordered! :)

Hi everyone,

Looks like we've made some great progress in this thread. Does anyone want to volunteer to draw up a simple starting point of what an interface for OrderedSet and SortedSet might look like?