Unordered Array / Counted Set

I think it would be very useful to have a data structure representing an unordered collection of non-unique elements. Using an Array ascribes a certain degree of significance to the order of elements, which can be misleading and detrimental to both hashing and equivalence checks.

There are reference-type implementations of the concept in Foundation (NSCountedSet) and CoreFoundation (CFBag).

As for my proposed implementation: if elements were Hashable, a simple Dictionary of type [Element: UInt] could be used to back the data structure, with the value being the quantity of elements. I’m not certain that constraint is truly necessary, though, so I’d welcome alternative suggestions.

1 Like

An example of when this is useful is storing summands: summation is associative and commutative, meaning order should not affect equivalence. That makes an Array unsuitable. It is not idempotent, however, so the number of times an element appears is consequential. That makes a Set unsuitable.

Note

Yes, I know that computer addition cannot be truly associative. That’s an implementation detail, not a property of my API, and it is handled accordingly.

Yep, a counted, unordered, hashed multiset could be a reasonable addition to the package. I think this is mostly an API design challenge, rather than a data structure implementation task, as [Element: Int] seems like the right data structure for implementing such a set.

We discussed this a little bit in https://github.com/apple/swift-collections/issues/20.

Whether this proves to be a worthy addition depends on how much better API the wrapper type provides over a raw dictionary of integers -- and on how well the design would translate to the other multiset variants that we will likely want to add, including the non-counting SortedMultiset. (There are fundamental differences between these two that I expect will be reflected in their API; but how much do they have in common? And what other multiset-like things do we need to consider?)

1 Like

It would not necessarily be difficult to implement. To optimize, to test, to achieve consistent use across the Swift ecosystem? That’s slightly harder. Not that the difficulty of the implementation should be a barrier to inclusion, of course.

In terms of API interface, I think it should be extremely similar to an Array. The key difference would be hashing, equivalence, and potentially certain performance characteristics. Storing many copies of an element could take far less memory, for instance.

How should Collection.count work, in terms of time complexity? Adding up every value in a backing dictionary would be an O(n) operation, obviously, so it might be preferable to cache count internally and update it on mutation.

I recognize that NSCountedSet.count returns the number of distinct elements rather than the number of elements overall, but I feel that is misleading at best. The type should probably expose a view for distinct elements, akin to Dictionary.Keys, that can serve that purpose.

There's some discussion of this in this previous thread, and I did a review of some other programming language's implementations there.

Having reviewed that thread, I might take a crack at an implementation myself.

The way I see it, a CountedSet is just a wrapper around a Dictionary. In light of that, I’ll literally start there. I’m going to ignore Foundation’s CountedSet: anyone who wants a drop-in replacement should probably petition the Foundation team instead.

SetAlgebra conformance seems fairly straightforward in my book: if the type conforms, it has to meet all the axioms. If that can’t be done without being unnecessarily confusing, it shouldn’t conform.

By the way, I find it sort of hilarious that the standard library literally defines a CountedSet type as a code example in documentation.

@lorentey I assume I should make a separate target for this. Since you expressed plans to potentially add other multisets in the future, should I call it MultiSets rather than CountedSet?

Should the CountedSet API trap if provided a count that isn’t greater than zero (for instance, in a dictionary literal)? The alternatives are ignoring such counts and trapping (using a precondition or fatal error).

I’m leaning towards trapping with a precondition.

extension CountedSet: ExpressibleByDictionaryLiteral {
  /// Creates a counted set initialized with a dictionary literal.
  ///
  /// Do not call this initializer directly. It is called by the compiler to
  /// handle dictionary literals. To use a dictionary literal as the initial
  /// value of a counted set, enclose a comma-separated list of key-value pairs
  /// in square brackets.
  ///
  /// For example, the code sample below creates a counted set with string
  /// keys.
  ///
  ///     let countriesOfOrigin = ["BR": 2, "GH": 1, "JP": 5]
  ///     print(countriesOfOrigin)
  ///     // Prints "["BR", "BR", "JP", "JP", "JP", "JP", "JP", "GH"]"
  ///
  /// - Parameter elements: The element-count pairs that will make up the new
  ///   counted set. Each key in `elements` must be unique.
  /// - Precondition: Each count must be greater than zero.
  @inlinable
  public init(dictionaryLiteral elements: (Element, Int)...) {
    _storage = .init(uniqueKeysWithValues: elements.lazy.map {
      precondition($0.1.signum() == 1)
      return $0
    })
  }
}

also consider "signed multisets" option (those allow negative, zero and infinite multiplicities).
perhaps via an initialiser parameter:

init(...., options: Multiset.Options = .default)
// .positive == .default
// .nonNegative
// .signed
// .weighted (real)
// .withInfinity (perhaps implemented as Int.max / .min for integral multiplicities)

also worth mentioning: other languages make the difference between hash and tree based data structures obvious (HashSet vs TreeSet / SortedSet / OrderedSet). the difference is important for custom types: whether to provide a hash or lessThan operation. if not too late for Swift to adopt this practice then the name could be HashMultiset.

There are tentative plans for other multisets in the future. This one mostly just has the semantics of an array.

Terms of Service

Privacy Policy

Cookie Policy