[API Review for 1.1] Hash-Array Mapped Prefix Trees

To me, this suggests that it is appropriate for Swift Collections to vend types which are explicitly implemented using HAMTs; that instead of trying to think of a somewhat vague name which describes the usage but is neutral to the implementation strategy (e.g. ShareableSet), it should go in the complete opposite direction.

Once you start naming things based on usage, I would argue that you are instead implementing one of those higher-level types as you mention in the second paragraph.

The building blocks should be explicit about their implementation strategies in order to clearly communicate their strengths and weaknesses - this is wood, or brick, or stone; it doesn't help me build a house if my supplier only describes the building blocks as consisting of 'some house-building material'.

3 Likes

That’s a very good point.

Perhaps HashTreeSet would make sense.

You're arguing here against the established naming conventions of the Swift Standard Library, and standard libraries in general. This is an uphill battle, and I'm not really interested in participating in it.

I suspect you are also underestimating the value and importance of persistency as a functional distinguishing property. To me, a persistent set is functionally very different to a flat hash table, even if they implement the same API.

As I said above, the only reason these types exist is to provide persistency; the precise data structure they internally implement is useful to know, but it's strictly secondary to this property -- it ought to be described in the documentation, but it does not need to be repeated ad nauseam in the type name.

However, I also previously wrote:

So the current options that I find palatable are:

set type dictionary type module name
PersistentSet PersistentDictionary PersistentHashedCollections
ShareableSet ShareableDictionary ShareableHashedCollections
TreeSet TreeDictionary HashTreeCollections

The first (and original) entry is what I think are the natural names for these. The second entry is the current choice, designed to eliminate confusion* about the meaning of "persistency" and to find a friendlier name than a bone dry latin one.

(* I am not particularly convinced the confusion is more than skin deep; and no matter what name we choose, the documentation will need to explain that these types are persistent data structures, as that is the established term of art for these.)

People have noted that "shareable" can also be misinterpreted -- however, the confusion in that case seems even less convincing to me: this precise term isn't currently in widespread use for anything else in this field; the root "share" does pop up in a lot of places, but I don't see it being applied to container types. "Shareable" does have the drawback of being far less specific than "persistent", and it is arguably venturing uncomfortably far into the practice of Inventing New Terms of Art.

The structural names TreeSet/TreeDictionary would be mediocre choices that do not directly emphasize the crucial persistence property. But (1) they do indirectly hint at it; (2) they have plenty of precedence in other languages; and (3) they are so tepidly noncommittal that there is zero chance of (non-performative) confusion -- all users will need to learn the true properties of these types by reading the documentation. They also have the (to me) very attractive benefit of being succinct.

Thank you all for your input over these two weeks! It has been an interesting and productive discussion.

I'll ponder the arguments, weigh the pros and cons, run things by some experienced folks off this forum, and reach a decision on naming soon. I suspect we will end up going with one of these three choices.

6 Likes

Having read the original "Pitch" and the discussion that ensued from that I still much prefer the names as suggested in the pitch.

2 Likes

My first thought when seeing the post title was persisting data to some type of storage.

This has been brought up already, but just to add another data point...

1 Like

Some proposals include a “Future Directions” section so I’m guessing there are no further plans for these types that may affect their naming. Is this a fair guess?

Or perhaps there could be future directions but the package might introduce new, dedicated types that compose with these types, e.g. for true “persistent data structures” which preserve and revert between versions?

Yes. 100%.

Two or more people have a shared history. You do not share your history with yourself.

1 Like

Indeed. I could live with "Shareable", but I think "Persistent" would be a grave mistake. It is so baked into our collective heads as pertaining to persistency between launches, ie, on disk, that it would certainly lead to confusion. The first question anyone is going to ask is "How do I make this PersistentSet persist?" Who's on first?

While "Shareable" is acceptable, I still think it would be better to go for a term that means basically the same, but is not widely used, to avoid any prejudice about what it is for. Eg. the term "Stake" is used in the blockchain world, but not in Swift world, and means basically the same thing (eg proof of stake = proof of your sharing, stake a claim = take a share). It is widely understood, but not overloaded in our community.

Thank you very much for taking the time to provide the detailed response, I really appreciate it. It really helps clarify some very important points. It also deepens my concern with the direction of the development of the language and the lack of the voice of the vast majority of the language users who don't regularly follow or participate in these forums.

I am not a typical user of the language myself, but enough of an outsider to see the ever increasing bias in the language's developers and curators.

An aside about me if you are curious

I am just an old humble civil engineer who started programming with FORTRAN IV using punch cards and have been lucky enough to be exposed to a variety of languages and paradigms over the decades. Not to mention being through revolution, war, immigration, etc.

As you noted, the language is moving away from providing currency types that just work well enough in almost any case (I believe this is what mere mortal programmers like) towards static and well defined performance characteristics that computer scientists like. I understand that these precise and predictable set of data structures are absolutely necessary as low-level building blocks for framework and system level programming, and I really love and appreciate what Swift Collections package is doing.

On the other hand, I thought Swift wants to be a language that helps ordinary developers write safe and performant code, even if they are not mathematicians and computer scientists.

I am too busy to write more, but there is so much more to say.

Meanwhile, to help younger folks get a sense of how Apple's currency data structures used to be, check out this 17 years old blog post.

1 Like

I am confused by this perception; nothing is being taken away. Array and Set and Dictionary are still there, any most programmers should (and will) still use them. This proposal is about adding another option for cases where those types are inadequate, rather than making those types more complicated and difficult to use correctly.

6 Likes

Sorry, I was not clear, and my post is technically off-topic:

We absolutely need the functionality provided by Swift Collections library and this particular API. I know we are not taking away anything from the standard library, and I understand that with ABI stability, Set, Dictionary and Array are here to stay as they are. Also, they are a good (to acceptable) fit for over 80% of use cases.

What concerns me is that we don't get the functionality provided by Swift Collections for (almost) free and as part of the currency types used with high level frameworks. I don't agree with this push away from providing high-level adaptable constructs like CFArray/NSArray (as linked above) which used to make life easier for average programmers by not forcing them to need something like Swift Collection library to get adequate performance for their ordinary boring needs.

I regret that Swift Collections (and knowing the details of data structure trade-offs) is becoming a mandatory requirement for writing performant every-day run of the mill apps. This will also push framework developers to either accept the lost performance or add dependency and API surface to cover different variants of the same conceptual container. I don't like having to write multiple versions of the same API to cover accepting/returning say Array<Bool> and BitArray.

I wish there was some more R&D to evaluate the feasibility of providing higher level and more opaque currency types for use by higher level Swift frameworks: A modern-day native Swift equivalent to the legacy Foundation types (CFArray/NSArray,...) instead of completely abandoning that approach.

Back to the topic:

Given all this, now I am convinced that we should stick with the CS terms of art and even using the full backing data structure names (as @lorentey already did in the title of this topic) to correctly communicate what these are.

Fundamental issues with the core design principles of the Swift Standard Library are best raised with (and best addressed by) the Core Team or the Language Workgroup, not this (project-specific) forum. If you believe these additions surface a major issue that requires a course correction in the stdlib, please raise it on the Standard Library forum or directly on one of the sub-forums for Swift Evolution.

(I'm sorry for snubbing you like this. I actually composed a detailed reply but I won't post it -- I can't take on the time commitment of opening this particular box, and I'm in fact a bit miffed that I let myself be nerd sniped like that. I do think it would be very important to explain e.g. how your entire premise is fundamentally wrong (:stuck_out_tongue_winking_eye:); or how the closest analogue to the NSArray class cluster in Swift is the RandomAccessCollection protocol, not Array; or why std::vector<bool> is bad design; or how that infamous article can be so misleading, etc. etc. Sadly this really isn't the appropriate time for me to do it, and (tragically) I wouldn't have capacity to track the inevitable discussion that follows, no matter how interesting it would be.)

Quick note: I don't know why the title reverted to Persistent -- I suspect it's either due to some technical issue, or it was some well-meaning but silent moderation. (I don't much care either way; the title is fine as is.)

In any case, the current title does not indicate anything about the eventual outcome of this review. (Which is not yet decided. Thanks y'all again for the valuable input; I need some time to digest it.)

4 Likes

6 posts were merged into an existing topic: Adding 0-based integer signed offset and offset range subscripts to RandomAccessCollection

Ah! I was baiting to get that reply and open that box . Since I also don't have time to fully participate, I was planning on just sitting back and enjoying the show :stuck_out_tongue_winking_eye:. OK, I am really busy this morning. I will open a new appropriate topic to raise these concerns. I will help pushing you guys to better communicate and document these design choices and the chosen trade-offs.

I am fully aware that generics and protocols are supposed to fill the role that class clusters used to play, but there is a lot to be desired. Until very recently, generics (and existentials) have been suffering from serious usability issues that have been intimidating mere mortal programmers. I am grateful that they are being addressed now. Still, there is much more to be done, especially when module boundary and resilience is involved.

Moderator note: the final part of this post has been removed to a new thread, linked below.

2 Likes

Are we talking about a Merkle tree?

Maybe "HashTrieSet"? The term "trie" is used for similar structures for some reason.

Shareable seems like a very reasonable choice to me.

Any name with "trie" or "prefix tree" in it seems like something you should steer well clear of. Those are specialized data structures that are occasionally extremely useful because of the specialized APIs they support around e.g. finding entries that share a maximal prefix with a given lookup key. We wouldn't want a programmer looking for them to think these data structures could be used that way.

3 Likes

I just saw the term HAMT used above which according to wikipedia is "Hash array mapped trie".

HAMTSet / HAMTDictionary looks good as well.

Shareable seems like a very reasonable choice to me.

Let me beat this dead horse one more time…

The vagueness of “shareable” worries me, especially since any set is shareable by the assumed definition until you edit it. And it seems to imply some sort of locking mechanism to allow cross thread sharing, which isn’t the case.

Why not SharedResourceSet? Says more succinctly what is shared, with no implication that it is threadsafe.

PS my personal favorite is PerpetuatingSet. Says what it does, and frankly, the name kicks butt

3 Likes

The compromise we ended up with is to use the structural type names TreeSet and TreeDictionary. The name for the new module is HashTreeCollections.

I rejected the prefixes Persistent and Shareable due to very persistent (sic) feedback that people consider them misleading.

TreeSet and TreeDictionary are structural rather than functional names, but they avoid any and all such confusion, which makes them a reasonable compromise.

In other programming languages, the names TreeSet and TreeDictionary are more typically used for sorted collections. However, we will use the proper, functional names SortedSet and SortedDictionary for those, leaving the Tree prefix free to reserve for persistent data structures.

When this package gains persistent variants of Array and String, we will therefore have the option to call them TreeArray and TreeString. PersistentSet, PersistentDictionary, PersistentArray and PersistentString would be the most technically correct names, but the association with long term storage seems too strong to accept them at this time. (We can revisit this if/when these types get proposed for inclusion in the Standard Library. It is very much possible that the confusion will dissipate as people become familiar with the concept.)

One more thing: Offline feedback indicated that the current TreeDictionary features for diffing/merging two dictionaries (keys.subtracting/keys.intersection) aren't enough to properly support immediate use cases: we will need a more elaborate interface for structurally combining two dictionary instances. This will require some non-trivial API additions -- I'll open a separate thread once I have a workable implementation.

This review thread is now closed. Thank you all for contributing to this discussion!

18 Likes