RFP: OrderedSet

In the Java world sets that retain their insertion order are called linked sets and they don't have a seperate protocol they conform to the normal set protocol. Whereas sorted sets do have a seperate protocol. It would be great if in Swift sets had a common set protocol and a seperate sorted protocol, something like:

            Sequence
               |
           Collection
               |
             Unique
               |________ __________
               |        |          |
              Set    LinkedSet  Sorted
                                   |
                               SortedSet

PS Have used Swift terminology; in Java a protocol is an interface and some of the names are not exactly as I stated.

That matches collections in Scala that have following type hierarchy:
image
These are Traits, roughly equivalent to Protocols in Swift. The concrete implementation of SortedSet is TreeSet which is a Red-Black Tree data structure.

Then there is LinkedHashSet a mutable Set implementation offering iterator that maintains the insertion order.


Also forgot to mention before, that @lorentey already quite thoroughly explored the design space and implementation of ordered set. See his 20 min dotSwift 2017 talk, GitHub and excellent book.

2 Likes

Apologies, I haven't been following this thread closely. Indeed, I'm very much interested in sorted data structures, and I have extensive implementation experience that I'd be excited to put to use. My old BTree project has an example of what a SortedSet API could look like in Swift; that package's API may be an interesting starting point for the new "Sorted" thread.

However, OrderedSet/ArraySet is a very different animal. It has a different set of operations than a tree-based sorted set, and it has drastically different performance characteristics. It's its own unique thing, with some really interesting UI applications. (It's also a relatively uncommon data structure -- I don't think I have seen it implemented anywhere but Cocoa.) Smooth conversion to/from NSOrderedSet will likely be important, so we should not stray too far from Foundation's implementation -- operations should have the same asymptotic performance. For example, OrderedSet should not traverse fancy search trees for lookups, and as it's a hashing data structure, it too must have its elements Hashable.

I agree with @nnnnnnnn and @Ben_Cohen; I don't think it makes sense for OrderedSet/ArraySet to conform to any of the existing mutable collection protocols. It's perfectly fine for it to come with its own unique set of operations. If some of that overlaps with MutableCollection or RangeReplaceableCollection, then that's great, but we shouldn't force conformance on something that doesn't want it.

3 Likes

@lorentey I love the work you did with the revamped Hashable protocol, I'd be happy if you would tackle SortedSet and similar sorted types next. :+1: (I'll be there for feedback.)

Sorry, I only recently was able to join in and hopefully didn't miss anything when I was going through the (very long) thread. A few points I'd like to make to begin from my perspective of a long term heavy user of NSOrderedSet (it's my favorite Foundation collection!)...

  1. I like the idea of SortedSet but it looks like it deserves its own thread separate from this one. I'd like that idea to be explored further in isolation before I comment on it.

  2. It seems there is a bit of confusion with the "OrderedSet" name. Since we've already decided not to bridge to NSOrderedSet and not necessarily have the same API, maybe we should consider a different name. "IndexedSet" comes to mind, but maybe a better one is out there waiting to be found.

  3. My personal take on protocol adoption is that we should go for it, adopting both SetAlgebra and most everything that Array adopts but with the following caveats:
    3.1) The indexed API should be stricter than NSMutableOrderedSet. I.e. the postcondition of inserting A at index X is that the element at index X is A. If that cannot be guaranteed it should be treated as programmer error.
    3.2) Additional "safe" indexed API for those who don't care if their insertion ends up where they expect it.

@Tony_Parker suggested ArraySet, which seems promising. IndexedSet is misleading, because Set itself currently has indices, and in general having indices is orthogonal to maintaining insertion order.

What is your use case, that motivates this desire to always overwrite the equivalent set member with the latest call’s argument?

I’m getting a feeling OrderedSet under this definition is just a sorted Array, and we are arguing about the semantics of insert, whether it should overwrite the element at position returned by bisect

@jawbroken True that, thinking more about it "Index" is an incredibly overloaded term and IndexedSet would be as likely to cause confusion as OrderedSet.

Maybe, pulling off the Thesaurus, "ArrangedSet"?

@palimondo Ok let me elaborate. As has been mentioned earlier in the thread, if you insert an element in a NSMutableOrderedSet that is already present in the collection at a different index the operation just fails to do anything at all, silently.

I'm advocating for enforcing the contract of an insert operation, so either it succeeds at inserting the desired element at the desired index or it causes an error maybe up to a preconditionFailure.

I'd rather have that than semantics where a present element would be moved from elsewhere, which has some bizarre edge cases (for example, if you want to insert an element at the end of the collection and it's already present there's no way to move the existing element around to preserve the contract).

And yes, a OrderedSet (or whatever we end up calling it) is "just" an Array that also can perform set algebra on its elements, guaranteeing uniqueness of elements and with an O(1) containment check.

The conversation has been threaded with talk about a collection that keeps its elements sorted according to a comparator (a "SortedSet") which is why I was advocating moving that to a separate thread.

No, because searches have no inherent ordering. The ordering is based on when they were added to the set.

You’re right, we are crossing the streams by discussing both types at the same time. My previous point was more relevant to SortedSet. Can I ask, how are you using NSMutableOrderedSet in practice? I’m having trouble picturing in my head a use case, that requires O(1) containment check and at the same time it requires arbitrary, user maintained order. How big are these collections?

Specifically, I don’t understand when in practice does the added overhead of maintaining parallel hashing structures cross the pay-off threshold vis-a-vis just doing linear contains search on [E] where E: Equatable.

There are quite a few cases where I need to both guarantee uniqueness and have fast inclusion testing for a group of things that also need to be presented in a certain order. Most of my heavy usage of them has been UI related and they are used as the model for displaying tables and lists. In some of the cases I can think of the collections aren't usually that large (a few dozen at most) but I have others where they routinely hit thousands of items.

And of course for something like DB access where the items returned are necessarily guaranteed to be unique it works quite well, as in Core Data.

I've also done a fair bit of work to deal with collection change management and propagation and ordered sets are both easier and faster to diff (and they diff more deterministically) and the ability to treat them as sets has also come in handy.

In the end everything can be done with arrays and dictionaries but NSOrderedSet has, in many cases, made my life easier while introducing little in the way of issues.

To get this topic going again to keep pace for inclusion in Swift 5, I think there's some consensus that ArraySet is the best name for a Swift counterpart to Obj-C's NSOrderedSet.

The protocol conformance should likely include those of both Array and Set:

struct ArraySet<Element> : SetAlgebra, 
Hashable, 
Collection, 
RandomAccessCollection, 
RangeReplaceableCollection,
ExpressibleByArrayLiteral where Element : Hashable

with open question around how to handle MutableCollection and RangeReplaceableCollection to work with Set behavior.

Another open question is whether insertion should match NSMutableOrderedSet's insertObject method:

If the object is already a member, this method has no effect.

with some opinions that the API should support this behavior and also support an option to update the element's index.

Other Set types like SortedSet should not be included in this discussion but instead have their own threads as they're fundamentally different behaviors to support.

1 Like

I would dispute that ArraySet is a correct name for this, as using Array as part of the name can be confusing insofar as the guarantees we make for e.g. O(·) of access. (As far as I can tell, we document no guarantees for NSOrderedSet, and I don't think we should document or mislead by implying any here either.)

3 Likes

There's another source of confusion considering precedents like CharacterSet:
It would be reasonable to associate "ArraySet" with Set<Array<T>>...

2 Likes

A suitable name can be determined during review, but probably shouldn't prevent any proposal from going forward.

The O.P. made it sound like the core team was collaborating to include an NSOrderedSet implementation for Swift 5, is this still the case?

1 Like

I was hoping to aim for Swift 5, but there hasn't been much forward progress here. If we can find someone who is willing to pick up the torch and move this forward, perhaps we can still make that deadline.

I doubt we'll make Swift 5 now, however, I think all discussion points here have been great insofar.

The correct name can indeed be decided during review/submittal, however because we're bringing NSOrderedSet over to Swift, personally, I think it makes the most sense to merely drop the NS prefix.

There have been times in my own work, where NSOrderedSet has been required due to the need of omitting duplicates in a list, without resulting in a O(n) or worse implementation. NSOrderedSet has a complexity of O(1) which is a fantastic feature that Swift should have.

We already have Set, to filter out duplicates, but nothing to maintain the order, which is often more appropriate.


struct OrderedSet<Element> : SetAlgebra, Hashable, Collection, RandomAccessCollection, RangeReplaceableCollection, ExpressibleByArrayLiteral where Element : Hashable


Ideally, we would replicate the API functionality across the board, using Swift-ier function names as appropriate.

I also think that a Swift OrderedSet should be inherently mutable, like its Set counterpart.


I am happy to attempt to put together a proper proposal, if nobody has yet. Since I recently required its use... :smiley:

3 Likes

That would be really nice! See this answer for how to do so.

1 Like

Bumping this thread to see what, if anything, has come of it. A sorted set is the one data structure that I've wished for since Swift 1, mostly because I forget that Set itself isn't sorted.

Not trying to play a role as forum police, but please don't just bump threads as you wish. If you want a certain topic to be pushed forwards and if the thread itself is either huge or ancient, it's recommended that you take the time and sum up everything from the previous thread and open a new thread. Thank you. :slight_smile:

2 Likes