RFD: CountedSet

jawbroken · April 13, 2018, 3:52am

That's one question, though I think the ability to use generic algorithms for SetAlgebra is another one, at least if it's decided that a CountedSet should conform to SetAlgebra. The multiset definitions seem mostly safe to use generically as sets (because they behave essentially like you have a set where multiple items are distinguished by a sequential index, i.e. {a, a, b} represented as a set by {a₁, a₂, b}) but I fear the ad hoc nature of the alternative definitions (sometimes ignoring counts, sometimes adding them, sometimes breaking commutativity by retaining the counts from the lhs, etc) makes it hard to understand what you can safely do with them from a SetAlgebra perspective.

I had some time to poke around for CountedSet equivalents in other programming languages. Here are some quick notes:

C++ multiset: Maintains a sorted list of all items using a custom comparison function, rather than counting them. The count is the cardinality of the multiset (not the number of unique elements), and their set intersection method for multisets appears to behave like a mathematical multiset.
Java Bag (Apache Commons): Looks like it also maintains all inserted items, instead of counting them. Makes a mess of things, with most methods seemingly violating the semantics of Collection and tagged “(Violation)”, which is what I'm trying to avoid with SetAlgebra. size is the cardinality of the multiset (sum of all counts), uniqueSet returns the set of unique elements (like my suggestion above for a Set-like view that ignores counts).
Java Multiset (Google): This one is actually counted. size is the multiset cardinality, provides an elementSet view without the counts, doesn't appear to have union or intersection operations.
Objective-C NSCountedSet: The starting point for this proposal. The behaviour of this seems all over the place (and not well documented). I played around with it in a playground for a while and it seems like intersection ignores counts entirely, just intersecting the sets of unique elements and setting the counts to 1 (?). union also behaves weirdly, it seems to ignore the counts in the second NSCountedSet and just insert one of each unique element from it. e.g.
[a: 2, b: 1, c: 1].union([a: 3, b: 1]) = [a: 3, b: 2, c: 1]. I don't think NSCountedSet is a good example to emulate.
Python collections.Counter: Dictionary with counts. “Addition and subtraction combine counters by adding or subtracting the counts of corresponding elements. Intersection and union return the minimum and maximum of corresponding counts” i.e. the standard multiset definitions. Counts can be negative for some reason.

I guess my conclusion is that some languages provide a multiset where all the elements are actually stored, rather than counting them (I guess because sometimes elements considered “equal” might be distinct in other ways). Some languages have multisets that you can't do set operations on, or can only do a subset of them. Of languages that do have set operations, Python has the most reasonable implementation to me (if you ignore that counts can be negative), and the Objective-C behaviour seems very idiosyncratic in ways that don't seem particularly useful, let alone correct.

Another option to mixing this functionality together and coming up with distinct names, mentioned at the end of my previous post and with precedent in several of the languages above, would be to provide a Set/SetAlgebra view of the Elements with non-zero count, so you could do operations that only depend on the presence or absence of Elements. Perhaps you could also (or instead) provide some targeted APIs that answer questions like “does this CountedSet contain at least one of each of the elements in this Sequence.