I'm looking for more information on the relative performance of associative collections, and with a little guidance I would be happy to generate some of that information myself or contribute new benchmarks, if the information isn't already out there.

I am working on a project that builds a number of sets. Ultimately, these sets will be represented as sorted regions (or possibly even Eytzinger regions) within a single large array. If the (outdated) graph here is a good indicator, I think that in practice it will probably be fastest to simply pay the N^2 cost of building these sets in place (due to linear insertion cost). But, I'll need a fallback for large sets to keep the worst-case performance in check. As such, I don't find these graphs all that useful: they are comparing a cache-friendly data structure with a C++ data structure we know not to be cache friendly, and not readily accessible to a Swift program. I'd be much more interested in seeing how the performance of these sorted sets/mappings compares with my Swift-available alternatives: the standard library's hashed sets/mappings and with a sorted array implementation.

So, two questions: 1. is the data I'm looking for available somewhere, and 2. if not, is there any guidance for adding benchmarks?

If the program's use of the set has distinct "building" and "using" phases, and you aren't frequently updating the set, you could also build the set as an unordered array first, and then sort and unique the array once after all of the elements have been added, which should be less than quadratic given a flat memory space, and is often better for cache locality on real hierarchical memory than incrementally inserting into a sorted or hashed data structure.

Thanks, but the sets are built by putting items in a work queue, and each item should only be processed once, so I need set lookups before the final stage. I'd be happy to discuss my data structure problem elsewhere and maybe we'll come up with some ideas that do work, but I wanted to keep this thread focused on benchmarks.