I believe the existing OrderedCollections
/OrderedSet
/OrderedDictionary
are eminently named, because they highlight the most salient distinguishing feature of these implementations with a succinct, unambiguous prefix. (OrderedDictionary
is a bit long, but oh well.) These are the right names, even though they do not convey that these are hashed collections, nor do they imply anything about the precise data structures that they implement.
Similarly, the future SortedCollections
, SortedSet
and SortedDictionary
look as perfect to me as Swift names can be, for the same reason. "Sorted set" does not spell it out that this will also be a (sort of) persistent data structure, or that it will implement an in-memory b-tree variant, or even that it requires Comparable rather than Hashable from its Element type. While these are all super interesting features, the primary reason one would choose to use these is that they keep their items sorted, and this is what should be reflected in their naming.
I bring up OrderedSet
and SortedSet
as excellent names, because they highlight a very obvious but also very much unavoidable "issue" -- the difference between the two is far from intuitive. We're using the words "ordered" and "sorted" to mean very different things, but I know many (most?) newcomers initially expect "ordered set" to mean the same as a "sorted set", and (as I have seen, repeatedly) they often reach for one when they would be better off with the other.
However, as long as the documentation is properly explaining the distinction, and as long as we always use these terms consistently, this is not at all a problem: if you're new to these collection types, you quickly learn the terminology when you first encounter them, as part of the general learning process. Equally importantly, these names have enough precedent that a veteran programmer who has used these in other languages will hopefully find them relatively familiar and intuitively obvious.
The same is true for PersistentSet
and PersistentDictionary
. I don't love the prefix "persistent", as (1) I find it hard to say it aloud without rolling my eyes and pausing for breath and (2) I find that it's precise meaning is even less obvious to people new to this concept than the "sorted" vs "ordered" distinction.
Note that as Steve has pointed out, the first issue here is my own personal problem, not an objective truth -- and also note that I do like these well enough to propose to use these names in the first place.
The second issue, on the other hand, is probably unsolvable. I very much doubt we can find a usable name for these whose meaning will be intuitively obvious to people who have never come across such data structures before.
What I'm hoping though is that someone would point out an obviously better alternative that is still cromulent to folks who have used these in other languages. (And perhaps avoids the superficial & temporary -- but still real -- confusion about the meaning of the word "persistent".)
Failing that, I'm very happy to stick with PersistentSet
and PersistentDictionary
.
SharedTreeSet
is a bit of a mouthful -- it would be nice if we could stick to the pattern, and prefix these names with a single, preferably short term. Highlighting that these are sharable would get rid of the need to describe their internal representation -- so we can delete Tree
. (I don't see how using the term Node
would add anything, either.) Instead of Shared
, Sharable
might be a more appropriate prefix.
This way, we'd end up with SharableSet
and SharableDictionary
. I don't think I have seen these used elsewhere, and a search did not pop up any prior art, either. But I also don't immediately hate them -- they are just generic enough to (maybe) let them go past my no-newly-invented-terminology rule.
The crucial question is: would these names be obvious enough to folks who have used such data structures elsewhere?
For example: would SharableSet
be likely to be misunderstood as a set that is safe to share across threads or actors?
I do think highlighting that these are trees does heavily hint at their persistency / shareable nature, but on reflection I strongly agree with your second point about the module name -- PersistentCollections
isn't nearly specific enough, and something like TreeCollections
/SharableCollections
would share the same drawback.
We got a free pass on omitting "hashed" from the names PersistentSet
and PersistentDictionary
, because it's implied from the naming of the standard Set
& Dictionary
. But PersistentCollections
reads like it would be the right module to hold all persistent data structure implementations, which isn't correct -- for example, SortedSet
and SortedDictionary
will also be (at least somewhat) persistent data structures, but they will ship in a separate SortedCollections
module, not in PersistentCollections
. So probably the module name should include some qualifier about the hashing nature of its contents.
The stdlib documentation calls Set
and Dictionary
"hashed collections", so PersistentHashedCollections
and SharableHashedCollections
are the current front runners for the module name for me.
That is generally not true. These types optimize for mutations of shared copies; however they aren't necessarily any better for large data sets than the standard Set
and Dictionary
, unless the collections do get copied before mutation, at least some of the time.
In general, I expect that unless snapshots are kept, insertions/removals into these will be slower (by a constant factor) than the standard types. (Lookups will often be quite close, as not having to probe through multiple candidate buckets seems to sometimes make up for the overhead of having to dereference a series of children down the tree. The effect is likely larger for complex keys like String
than it is for, say, simple integers.) To add insult to injury, the new types are typically using less memory than the standard types for really tiny counts (because of the clever node compression scheme), but if there are many items, the tree-based types (currently) tend to use a bit more memory overall.
"Set that optimizes for shared mutations" really is the thing that the names need to succinctly convey to people.