Assigning a dictionary without re-hashing

FWIW, I did a quick test: hash value calculation of UUID().uuidString based keys takes about 30ns on my old computer, if the whole access is 100ns it's quite significant chunk.

When is the dictionary element index invalidated or may become invalid? Haven't found it in the documentation :astonished:. Possible candidates:

  1. when removing the indexed element (even if we are not using the index itself)
  2. when removing an element (different from the indexed one)
  3. when adding another element
  4. when changing element values (including the indexed element). Excluding changing elements' values to nil which is equivalent to removing elements (see above).

Obviously in 1 and it also feels like in 2 & 3, but not 4, is this correct assumption?

That's a good question. In the absence of documentation, you must assume that all indexes are invalidated on any mutation. MutableCollection imposes some additional requirements, so you can assume that mutating through dict.values[index] does not invalidate indexes.

Strictly speaking, mutating through dict[key] could invalidate indexes when key is already in the dictionary. It would be reasonable to ask for that as a guarantee in the documentation.

#1 and #3 can absolutely invalidate. #2 may or may not, but it's an implementation decision that I suspect we don't want to promise anything about.

1 Like

Thank you.

Do you know if the following foo and bar are equivalent?

struct Item {
    var foo: Int = 0
    var bar: String = ""

var items: [String: Item] = ...

func change(_ item: inout Item) { += 1

func foo(key: String) {
    change(&items[key, default: Item()])

func bar(key: String) {
    var item = items[key, default: Item()]
    items[key] = item
1 Like

They should have the same effect; for a subscript implemented with get and set they would have the same behavior as well. However, a subscript implemented with _modify could provide an optimized implementation that only has to do one lookup of key.

1 Like

Dictionary never invalidates indices when changing the value of an existing key. This is not explicitly spelled out in the docs, but an important hint is that the Dictionary.Values view is a MutableCollection.

(We go to extraordinary lengths to preserve indices on value mutations even when the original dictionary is a "verbatim bridged" NSDictionary instance. There is some eldritch code hiding in there -- consider that NSDictionary doesn't really model indices at all. :sweat_smile:)

Dictionary does provide the following guarantee about index invalidation vs insertions:

A dictionary’s indices stay valid across additions to the dictionary as long as the dictionary has enough capacity to store the added values without allocating more buffer. When a dictionary outgrows its buffer, existing indices may be invalidated without any notification.

When you know how many new values you’re adding to a dictionary, use the init(minimumCapacity:) initializer to allocate the correct amount of buffer.

I believe this originated in C++. I find this is a rather unfortunate guarantee, as it prevents the Dictionary from reordering collision chains on insertion to improve average lookup performance (as in round robin hashing). For better or worse, we are stuck with it now, though.

Removing a key from a Dictionary always invalidates every index within it, as the removal operation involves rearranging the collision chain that included the original item.

Set operates the same way.


Can we formalize this guarantee?

Absolutely. This is a (semantic) API change, so I expect it'll require going through Swift Evolution. I've been collecting other ill-defined aspects of the stdlib -- perhaps we have a large enough batch of these now to be worth the expense of a small proposal.

(Edit: unifying and documenting the expected behavior of base properties on mutable slices could also be part of this, although that one also requires actual behavioral changes in some obscure corners of the stdlib.)

IMO, dealing with base semantics is big enough to tackle on its own, particularly if the rest of the semantic changes are going to be uncontroversial and require little debate.