Introducing Swift Collections

tkremenek · April 5, 2021, 10:22pm

I’m thrilled to announce Swift Collections, a new open-source package focused on extending the set of available Swift data structures.

Please check out the blog post for details, or go right to the repository!

Questions on the blog post can be asked here!

CC @lorentey @kylemacomber

Karl · April 5, 2021, 10:46pm

Protocol conformance checkers? Comprehensive benchmarks?!?

I love it Great stuff!

nikitamounier · April 5, 2021, 10:48pm

Looking through the code, this seems like it was quite an undertaking (to say the least) to implement just these three data structures – congrats @lorentey and @kylemacomber! I can't wait to see more.

Jon_Shier · April 7, 2021, 7:01pm

For fun, and after a bug fix by @lorentey, I was able to run the announcement benchmarks on my M1 mini. Quite an interesting difference:

M1 mini benchmarks

05 OrderedDictionary lookups

Overall it took 11329s to run 20 iterations up to 16M elements.

vale-cocoa · April 9, 2021, 12:26am

I was wondering if using a Left-Leaning-Red-Black-Tree for the ordered dictionary would be beneficial.
I made one in Swift while studying an online class by Robert Sedgewick on Coursera and his book Algorightms.
This the swift package, in case it might be of any interest.

Nobody1707 · April 9, 2021, 7:15am

If it would be, then there's a likelihood that a B-tree would be better, as it's equivalent to an RB-tree, but with better cache locality. There's already a Swift implementation of one, but it hasn't been updated in quite a while.

lorentey · April 9, 2021, 8:27am

Fun! It seems most results are largely comparable; however, things tend to be (sometimes significantly) faster on the new chip (as one might expect), and there are some indications of differences between the two memory subsystems (e.g., look at the shape of the Deque subscript curves). These are fascinating, but be careful before drawing conclusions: I'm not sure how useful (& how fair) these microbenchmarks are when comparing architectures.

One thing I haven't noticed before is how Deque.prepend switched places with std::deque in one of the graphs, becoming the fastest. In the blog's benchmarks, Deque measured slower, because the compiler wasn't (yet!) able to move the call to isKnownUniquelyReferenced out of the tight benchmark loop. Evidently this either somehow isn't the case with the ARM backend, or something about the M1 is favoring the Swift implementation. It's an interesting result; I think I'll need to do some investigations.

lorentey · April 9, 2021, 9:01am

I would expect it would be less useful for the ordered dictionary case, where the dictionary is expected to be able to handle any arbitrary ordering -- but it would definitely be a valid implementation of a SortedDictionary.

However, I agree with @Nobody1707 -- in my previous work, I found that a B-tree implementation written in pure Swift can easily be as much as ~300-400% faster than the best red-black tree implementation I could come up with, while also using significantly less memory. (As long as the tree doesn't need to be (overly) persistent.)

So we're currently planning to base the sorted collection implementation around some variant of the B-tree data structure. (It won't exactly be the classic B-tree you'd find in a data structures book, because internal nodes will need to be kept much smaller than leaves so that copy-on-write copies won't need to duplicate too much memory.)

I think it would be interesting to benchmark your code against whatever we come up with (or just the original BTree package) -- if it comes out ahead, we should talk!

(p.s. Sedgewick's book was my Algorithms professor's favorite reference; I should check out his class!)

vale-cocoa · April 9, 2021, 12:46pm

Hi thanks a lot for the reply.
Actually I've also studied your Objc-io book Optimizing Collections too, thus I am aware of the B-Tree implementation having a better performance than the one you did with a Red-Black-Tree.
Indeed I am curious too to use your same benchmarking methodology for the Left-Leaning Red-Black-Tree, cause it should at least perform better than the other Red-Black-Trees as per Sedgewick's studies.
I doubt too it would beat your B-Tree with nodes sized as the L1-Cache pages, I'm curious to see to what degree though.
Thus I'll make an OrderedDictionary based on it and try to apply your benchmarking methodology.
Still I am not sure if these performance tests are good to be done in a swift package and against a data structure which is linked to the standard library as per this article by Matt Gallagher.

One more thing: don't you think a Swift Standard Library Protocol for Queue should be made? That's cause there are plenty types of Queues as Data Structures aside for a Deque.

All the best,
Valeriano Della Longa

vale-cocoa · April 9, 2021, 12:52pm

Hi Nobody1707 and thanks for your insight. I am aware of that implementation having studied the book Optimizing Collections.
Anyway based on Sedgewick's studies, the Left-Leaning Red Black Tree should be more performant than other implementations of Red-Black Tree.

DevAndArtist · April 9, 2021, 2:43pm

Just asking out of curiosity. Would a NonEmpty collection wrapper fit this package and be allowed to be proposed for its inclusion? Expressing non-empty collection is equally useful as expressing an ordered collection.

DevAndArtist · April 17, 2021, 10:15am

@lorentey I'd love to hear your opinion on that. cc @stephencelis & @mbrandonw

jawbroken · April 19, 2021, 1:24am

I wouldn't say that it is equally useful, given that many methods would require you to drop the non-empty guarantee (e.g. anything that returns a subsequence or can remove elements) and the only upside seems to be that it would remove optionality from a small number of return types (first, popFirst, randomElement, min, max). It's roughly like trying to represent a non-zero number in the type system so you can't divide by zero.