[Pitch] - Add `DictionaryProtocol` to the standard library

Not yet, everything in a single swift file.

I made everything @inlinable but that's still COW-ing for me:

Code
public final class IntRef {
    public var value: Int
    @inlinable public init(_ value: Int) {
        self.value = value
    }
}

public struct IntVal {
    public var ref: IntRef
    @inlinable public init(_ value: Int = 0) {
        ref = IntRef(value)
    }
    @inlinable public var value: Int {
        @inlinable get { ref.value }
        @inlinable set {
            guard isKnownUniquelyReferenced(&ref) else {
                print("COW")
                ref = IntRef(newValue)
                return
            }
            print("NO COW")
            ref.value = newValue
        }
    }
    @inlinable public mutating func append(_ element: Int) {
        value += element
    }
}

extension [Int: IntVal] {
    @inlinable public subscript(newSubscript key: Key) -> IntVal {
        @inlinable get { self[key]! }
        @inlinable set {
            self[key] = newValue
        }
    }
}

@inlinable public func test_subscript() {
    var values: [Int: IntVal] = [0 : IntVal(1)]
    var a = values[newSubscript: 0]
    a.append(0) // COW 🤔
    values[newSubscript: 0] = a
    print("done")
}

Do you have an example of "_semantics" ?

you really shouldn’t be experiencing any COW at all if all your code is in one module unless you are doing something really weird. are you sure you’re compiling with optimizations enabled? if so, i’m inclined to say you may have found a compiler bug.

@inlinable only affects callers from outside the module the declaration lives in.

At first I thought having a protocol for a Map data type is not needed just because there are not so many data types like this in standard library.
But I have some counter points now.

  • Abstraction over interface follows Liskov substitution principle (kinda). What I mean by that is whenever a function consumes a generic constrained to Collection protocol I can supply any type conforming to Collection to that function and expectation of how that code works will still be fulfilled
  • There is more than one implementation of "associated array" or "map" data type: hashtable, binary search tree, red black tree, unbalanced binary search tree or literally array of tuples. Or even more than one type, like C++ provides map, unordered_map, multimap and unordered_multimap. All of those types/implementation share common interface: key-value pair. So why standard library should not provide a common interface to abstract on for it? Swift Collections could benefit from that quite a lot, providing SortedDictionary, OrderedDictionary and TreeDictionary.

One of the interesting issues with a DictionaryProtocol would be establishing performance guarantees.

Dictionary lookups are generally expected to be constant-time on average, but some implementations take different approaches. Something like a URL's query string would be linear (the backing storage is just a string - we need to search for the key every time).

I'm not sure how useful it is for algorithm writers when there can be such variation. Maybe it would be worth excluding types which only do linear lookups, and say they might as well just use Collection. Perhaps we would say that DictionaryProtocol lookups must be at least logarithmic on average.

And perhaps we would introduce a refined protocol, similar to RandomAccessCollection, for implementations which provide average constant-time lookups.

the systemic cause of this is the lack of OrderedDictionary in the standard library - i personally find myself drawing a bright line between “lookup” structures backed by Dictionary and “order-preserving” representations backed by contiguous storage. because we cannot have both at the same time without adding third-party dependencies.

there are certainly pros and cons to this, arguably separating lookup-optimized representations from order-preserving representations might actually be more efficient, though this is totally speculative. but i do feel like i am trapped in some local minima, and i wonder if linear lookup complexity would simply be less common in the first place if OrderedDictionary were more accessible.

It isn't, actually. At least not in this case. I considered using some kind of hash table but decided not to.

Eagerly extracting all key-value pairs into a Dictionary/OrderedDictionary/Array/etc can improve the performance of some operations, but it can lose information, and it comes with significant costs that may not be universally desirable. For instance, let's say I wanted to look up a single key - building a hash table of all key-value pairs in order to perform that one lookup would perform much, much worse than a single linear search.

IMO libraries should strive, as far as possible, to give clients the freedom to make such decisions for themselves. When your data is stored in a string (as it is for a URL component), iterating all subcomponents or searching for a single subcomponent are necessarily linear operations. It's just a natural consequence of the data's source. By embracing that, engineers of client applications/libraries are able to make full use of their own judgement and, if appropriate, copy the information in to a specialised structure of their choosing as efficiently as if they parsed it out themselves.

The idea that I would be introducing a third-party dependency was not a factor in the decision. I wouldn't have used an OrderedDictionary even if it were in the standard library.

In this case, no. I don't even consider the linear lookup complexity to be a problem.

That being said (and to bring this back on topic), I'm not convinced that it would be a good candidate for a DictionaryProtocol conformance. It provides key-value pairs, but the performance is such that you might as well stick to treating it like a Collection.

In other words, I'm suggesting that, for writers of generic algorithms, sub-linear average lookup complexity is one of the most useful traits that distinguishes what we might call "dictionaries" from your regular-old collections. It's not enough for the type just to model some kind of key-value pairs.

Maybe that's obvious to some (it wasn't to me). It may also begin to answer the question of which useful algorithms you could express, and when you would know if DictionaryProtocol is the appropriate generic constraint.

1 Like

Hey Ben!

It's not only sugar for the initializers. I proposed a modification to @lorentey's idea:

The current initializer (e.g. Dictionary.init(grouping:by:)) strongly couples instance creation and grouping, so it's impossible to do just the grouping on its own into an existing instance (without making a new dictionary and merging it into the old).

The only way to take an instance as a parameter would be if there was some protocol that describes a mutable associative collection.

1 Like