Adding more data structures to the standard library

I'm studying for job interviews for Swift positions by completing code problems. By far the most painful part of this preparation is the dismal state of data structures in Swift.

If I were programming in Java (which I really rather wouldn't), I would have easy access to...

Java's collections

Sets

  • Unordered sets (HashSet, CopyOnWriteArraySet, EnumSet)
  • Insertion ordered sets (LinkedHashSet)
  • Naturally ordered sets (TreeSet, ConcurrentSkipListSet)

Maps ("Dictionaries" in Swift)

  • Unordered dictionaries (HashMap, WeakHashMap, IdentityHashMap, EnumMap)
  • Insertion ordered dictionaries (LinkedHashMap)
  • Naturally ordered sets (TreeMap)

Ordered collections

  • Arrays (ArrayList, LinkedList, Vector (a synchronized array), LinkedList)
  • Queues (ArrayDeque, ConcurrentLinkedDeque, PriorityQueue, SyncrhonousQueue, ArrayBlockingQueue, DelayQueue, LinkedBlockingQueue, PriorityBLockingQueue, linkedTransferQueue, LinkedBlockingDequeue`)

Behold this beauty: Collection Hierarchy

and Map Simple

Amazingly, many Java programmers don't even find these sufficient! The Guava library (a ubiquitous Java library by Google) adds in all kinds of collections like multi maps (map keys to multiple values), BiMaps (which offer O(1) look up from key to value, and from value to key), and all kinds of immutable, concurrent, lazy, etc. variants of most things above.

Meanwhile in Swift land

We have just Set (unordered), Dictionary (unordered), and Array. No concurrent collections (value types solves most of the issues here, but it doesn't solve things like the classic produce-consumer, for example), no linked lists, queues, stacks, bimaps, multimaps, or sort ordered collections.

If we import the ObjC data structures available from Foundation, and have a masochistic desire to cast everywhere because of the lack of generics, we get ordered sets, counted set, and that's ... about it.

Anticipated responses

"The standard library is meant to be very minimal"

If we took that to its conclusions, then we shouldn't have Dictionary, Set or Array, either. People should just use Unsafe(Mutable)?(Raw)?(Buffer)?Pointer. All of those pointer types could be implemented in terms of UnsafeMutableRawPointer, so there isn't even a need for any of the other variants.

Clearly that's infeasible. These are too useful to pass up. But so are ordered dictionaries, counted sets, and synchronized collections. I can't count how many times have I have had to wrap a dictionary and an array of keys for ordering (it's real fun to keep them in sync!).

"But you can just import XYZ library"

Generally yes, but not in a programming challenge, interview or quick scratch project.

Also, there's no prevailing standards, so there's no established "common currency" types for the use-cases these structures fill.

"Just write your own"

... :man_facepalming:t2:

18 Likes

I'd prefer to have a blessed library to use. They're something people generally get wrong, so it's probably worthwhile to have.

If anything, many of these data structures don't match Swift's Collection contract; most notably, their natural subscript isn't O(1). So I don't think many of them will conform to the protocol.

Can I suggest starting from a slightly more constructive position? Between the opening title and the concluding facepalm, this post is a bit antagonistic.

Keeping the standard library so super-minimal that it doesn't cover the most basic of data structures is definitely a non-goal.

The main reasons the standard library doesn’t have a wider range of data structures are:

  1. we were pending ABI stability - both as a project focus and as a barrier to increasing the binary size since it shipped with every app; and
  2. no-one's yet done the work of researching, designing, implementing and proposing them.

Since 1) is now behind us, proposals for fundamental data structures would definitely be welcome! @lorentey is also planning to spend some time on this for the next couple of releases and I expect would be keen to work with some collaborators. I'd suggest the right place to start is a discussion of the suitable fundamental data structures, over on the standard library development forum. From there, an overview doc can be put together then people can take individual parts to work on.

My view is this would be: key first additions are ordered and sorted sets and dictionaries, a bimap, and some lower-level types that can be built on top of, like a ring buffer. Linked lists have been discussed a few times in the past as a candidate, but are really not that widely useful. Note, Swift's Array can act as a stack already. Concurrent collections are a topic that should probably be deferred for now.

50 Likes

That’s not even true :joy: - but admittedly, many people don’t even seem to use Set and are happy with nothing but arrays and dictionaries...

6 Likes

I don't even know this sub-category exist! Thanks for bringing this up!

4 Likes

For many data structures this isn't an issue in practice, only in theory. For example, with many tree-based Collection designs, while subscript access might theoretically be O(depth), the maximum depth the tree might reach is so low that you can effectively treat it as O(1).

(tbh the main problem with Collection is the requirement Index be Comparable... but it's not a showstopper in most cases, just more work)

3 Likes

I didn't intend for the title to sound antagonistic, it was just a genuine observation of mine. I haven't seen anything to suggest that data structures were being taken seriously. I suspect part of the cause is that a large portion of Swift's audience is iOS devs, where JSON is commonly used for interacting with restful APIs, which doesn't support anything besides arrays and dictionaries.

I would be glad to put together an initial proposal for a "first round" of data types (the more important ones like you mentioned), if that seems like something that people would be interested in. This thread is here to gauge interest.

Yeah, which is why I have been loving code problems that require use of a stack. But when I need a queue, and have O(n) head deletions... ouch.

2 Likes

Is that for the purpose of bounds checking? What indexes aren't comparable (raw pointers?)?

What are the general thoughts around marker protocols that have no API? e.g. SortedCollection, where the protocol can't really do anything to ensure conforming types are sorted, by can be used as documentation (e.g. in a return type, a parameter to a binary search function, etc.)

I've had issues with linked list indices before, I would guess indices for trees would have similar difficulties
The easy solution is to include a distance from start parameter, but then if you want to be a bidirectional collection you need an O(1) size to get the distance of the endIndex

Yeah, for a linked list you need to store a count from start, though the end doesn’t need it, you can just define .end > _.

Tree indices can be a path from the root down to the element.

I think they’re the source of much debate as to whether they’re a good or bad idea :)

Even if .end > _ what's .index(before: .end) on your doubly-linked list?

1 Like

Oh right. Yes, you’re stuck there. You could store the count on the node too, but then you lose O(1) inserts which is kinda one of the points of linked list... :frowning:

This gets me thinking... what is Dictionary<Key, Value>.Index, and how is it comparable?

2 Likes

AFAIK Dictionary indices are just indices into the underlying array. The only weirdness is that index(after:) has to skip over all the unused slots

Edit: You can see it in the repl:

  1> let dic = Dictionary(uniqueKeysWithValues: (0..<10000).lazy.map { ($0, $0) }) 
// Large dictionary printout removed
  2> dic.startIndex
$R0: Dictionary<Int, Int>.Index = {
  _variant = native {
    native = {
      bucket = {
        offset = 1
      }
      age = -1612773233
    }
  }
}

And there's no way to do that skip without linear probing, right? So O(bucketCount)?

That'd require more bookkeeping, and it's already O(1) amortized when looping. So I don't really see a point.

A single call would be O(bucketCount), but obviously looping over the whole dictionary isn't O(bucketCount^2) due to the fact that the dictionary can't have bucketCount gaps between every pair of indices

Edit: though because of this, calling index(after:) in a lazy collection wrapper's subscript will make that O(n) in unlucky cases (though other collections exhibit much bigger issues than Dictionary here, like LazyFilterCollection)

1 Like

YOU TAKE THAT BACK! :stuck_out_tongue_closed_eyes:

7 Likes