[Pitch] - Add `DictionaryProtocol` to the standard library

There are a multitude of protocols for collections:

Sequence
Collection
MutableCollection
RangeReplaceableCollection
BidirectionalCollection
RandomAccessCollection

And even one for sets:

SetAlgebra

I think we should have a protocol for dictionaries. There are already several types that could conform.

Dictionary<Key, Value> // from swift standard library
NSDictionary // from Foundation
NSCache<Key, Value> // from Foundation
NSMapTable<Key, Value> // from Foundation
OrderedDictionary<Key, Value> // from Swift Algorithms
4 Likes

Hi @aggie33 – the key when making a case for a new protocol like this is to show useful code that is generic over all these types.

What kind of operations on a generic protocol that lets you map a key onto a value would you write for all these types, if you had such a protocol? Are there equivalents of Sequence.map or MutableCollection.shuffle that could apply to all the potential conforming types you list?

13 Likes

Well, for example, subscript(_:default:) { get set } could be made an extension on DictionaryProtocol, and then all dictionary types would get it.

extension DictionaryProtocol {
  public subscript(key: Key, default defaultValue: @autoclosure () -> Value) -> Value { 
    get { self[key] ?? defaultValue() }
    set { self[key] = newValue }
  }
}

The filter(_:) method could also be implemented on DictionaryProtocol to return Self.

3 Likes

Surely defaultValue should be an autoclosure.

1 Like

You're right.

One concrete use case I've run into is to allow customization of my group(by:) algorithm to allow callers to choose to use OrderedDictionary, or others:

7 Likes

There’s a simpler solution to that, which is to just stick with the initializer versions instead, which works today without a protocol. Creating a protocol purely to allow sugaring initializers as chained methods seems insufficient justification.

This is a bit of an anti-example, because this would not be a good idea. Implementing this via a subscript get/set defeats copy-on-write, so values[grouping, default: []].append(element) becomes a linear operation. Even once _modify becomes a full language feature, the implementation for modifying an element, creating a default initially if it isn’t in the dictionary, may not have a straightforward default implementation – determining that needs some experimentation.

This and your filter example also raise another question that would need answering: is this one protocol, or a hierarchy of protocols. To implement filter requires multiple things:

  • the ability to iterate (which could be done by also requiring Sequence possibly)
  • the ability to create a new empty instance of the dictionary
  • the ability to write values for keys as well as read them.

In the Collection hiearchy, these are all modeled by separate protocols.

  • Sequence for looping over values – should DictionaryProtocol refine it?
  • Collection for indexed subscript, equivalent to lookup up a value for a key
  • MutableCollection for changing elements at an index, equivalent to mutating a value for a key
  • RangeReplaceableCollection for (amongst other things) creating empty collections and also adding new elements to collections.

So is this really a hierarchy of protocols? If yes… this becomes even more complex addition, and so in turn needs more justification to ensure that complexity is useful. If no, that keeps the protocol nice and simple but at the cost of potentially ruling out conformance of some types to this protocol.

There’s also the question of whether standard algorithms on the protocol should be customization points. For example, a win with Dictionary.filter is that it can potentially do it more efficiently, using knowledge of the existing keys and their bucketing. This would only work if filter were a customization point. But if every useful algorithm on a protocol ends up needing to be a customization point, that’s not very scalable. This isn’t true of the collection hierarchy, but it might be of a dictionary protocol.

All this is to point out that what seems like a simple ask (just add a unifying protocol for all these similar things) can end up being a deceptively complex task

3 Likes

to be fair, this is a rather severe limitation of swift protocols that affects a lot of other use cases besides the pitched DictionaryProtocol. i would rather we prioritize fixing the underlying problem (which would probably either involve named subscripts or the ability to declare _modify as a protocol accessor requirement), than see it be used as a justification to shoot down ideas like DictionaryProtocol.

6 Likes

Defeating CoW is a much bigger deal for collections than is not having _modify made official for, say, integer types. It is a fair critique of a proposal if its centerpiece use case is one that cannot be made performant in the absence of future unrelated features.

1 Like

I'm not sure this would really make such a great protocol.

If you look at the types which you suggested might conform:

Dictionary<Key, Value> // from swift standard library
NSDictionary // from Foundation
NSCache<Key, Value> // from Foundation
NSMapTable<Key, Value> // from Foundation
OrderedDictionary<Key, Value> // from Swift Algorithms

They all seem to be associative arrays, and so their APIs can superficially appear similar, but they also have wildly different behaviour which makes it difficult to write generic algorithms using them.

  • (NS)Dictionary is unordered. If you're copying its key-value pairs in to another data structure that isn't a dictionary, you will almost certainly want to sort them first.

  • OrderedDictionary is ordered. If you're copying its key-value pairs in to another data structure that isn't a dictionary, you should not sort them first, since the existing order is actually meaningful.

  • NSCache can drop values behind your back according to system resource pressure, so you can never really rely on it having any particular contents. Presumably for that reason, you can't iterate its contents, and it doesn't even have count or isEmpty properties. It is used in completely different contexts to the other types.

  • Much of NSMapTable is obsolete these days. Storing arbitrary pointers was only a useful feature because NSDictionary required keys and values to be subclasses of NSObject (a limitation which Swift's native Dictionary does not have) and most of NSPointerFunctions is necessarily unsafe.

    There may be some value in associating a collection of weak references with a key, but even then I think it would be better modelled as a Dictionary<Key, WeakCollection<MyObject>> rather than having the whole table be mutated from any thread any time an object gets deallocated. It might be worth a specialised type.

    Even then, it would suffer the same problem as NSCache that the contents change behind your back. Meaning I think such map-tables fill a similar role to caches in that they store ephemeral data, and aren't really used in the same contexts as the other dictionary types.

As swift-collections adds more specialised dictionary implementations, they might want to settle on some common API surface based on some expected generic algorithms (or not), but I think that is best done in the collections library at least for now.

2 Likes

just to be clear, the ask is not to make _modify “official” (it is already de facto official, the only kind of official that matters), the ask is to make it possible to declare an accessor requirement of _modify, one that cannot be witnessed by a set.

3 Likes
1 Like

Can you expand on this, what exactly are you doing here and what is the issue. The protocol talks about a dictionary subscript and you are showing a seemingly unrelated Array's append operation. What's "grouping" here? Please show the whole example.

1 Like

in the absence of @inlinable, this operation expands to something like:

var values:[Int: [String]] = ...

var array:[String] = values[group, default: []]

array.append(element) // COW!

values[group, default: []] = array

it also actually involves two hashing operations, one for the get and one for the set.

1 Like

You mean that if I do this as is:

    var values: [Int: [String]] = [0 : ["1", "2", "3"]]
    values[0, default: []].append("Hello")

it won't incur COW and if I do it via the above DictionaryProtocol it would incur COW?

if the compiler can’t prove that it is a Dictionary (or some other type that witnesses that requirement with a _modify accessor), yes it will.

1 Like

Thank you. I created this test app:

class IntRef {
    var value: Int
    init(_ value: Int) {
        self.value = value
    }
}

struct IntVal {
    private var ref: IntRef
    init(_ value: Int = 0) {
        ref = IntRef(value)
    }
    var value: Int {
        get { ref.value }
        set {
            guard isKnownUniquelyReferenced(&ref) else {
                print("COW")
                ref = IntRef(newValue)
                return
            }
            print("NO COW")
            ref.value = newValue
        }
    }
    mutating func append(_ element: Int) {
        value += element
    }
}

extension [Int: IntVal] {
    subscript(newSubscript key: Key) -> IntVal {
        get { self[key]! }
        set { self[key] = newValue }
    }
}

protocol DictionaryProtocol {
    subscript(protocolSubscript key: Int) -> IntVal { get set }
}

extension [Int: IntVal]: DictionaryProtocol {
    subscript(protocolSubscript key: Key) -> IntVal {
        get { self[key]! }
        set { self[key] = newValue }
    }
}

func test_builtinSubscript() {
    var values: [Int: IntVal] = [0 : IntVal(1)]
    values[0]!.append(0) // NO COW
}

func test_newSubscript() {
    var values: [Int: IntVal] = [0 : IntVal(1)]
    values[newSubscript: 0].append(0) // COW
}

func test_existential() {
    var values: DictionaryProtocol = [0 : IntVal(1)]
    values[protocolSubscript: 0].append(0) // COW
}

func test() {
    test_builtinSubscript() // NO COW
    test_newSubscript() // COW
    test_existential() // COW
}

test()

I can see COW triggered for the existential case as you describe :+1:
However I also see it being triggered when I declare my subscript, which doesn't go through a protocol conformance :thinking: Why would it be? So it's not just about going through protocol / existential which is problematic but the problem is more profound?


Edit:
I see. This fixes the COW issue with my custom subscript:

extension [Int: IntVal] {
    subscript(newSubscript key: Key) -> IntVal {
        get { self[key]! }
        _modify {
            yield &self[key]!
        }
    }
}

and we can't (yet?) do the same with protocols:

protocol DictionaryProtocol {
    subscript(protocolSubscript key: Int) -> IntVal { get _modify }
    // 🛑 Expected get or set in a protocol property
}

@taylorswift thank you for helping deciphering this.


Edit 2:

Hmm. But that solves the COW issue when using protocol:

protocol DictionaryProtocol {
    subscript(protocolSubscript key: Int) -> IntVal { get set }
}

extension [Int: IntVal]: DictionaryProtocol {
    subscript(protocolSubscript key: Key) -> IntVal {
        get { self[key]! }
        _modify {
            yield &self[key]!
        }
    }
}
1 Like
    var a = values[0]!
    a.append(0) // this should COW âś…
    print(values[0]!)
    values[0]! = a

    var a = values[0]!
    a.append(0) // but why this COW-ing? 🤔
    values[0]! = a

BTW, why is that COW-ing? Can't compiler see that the original array is not used but overwritten right away?

The workaround (besides using _modify instead of set) could be resetting the element to nil (if it's optional) or to some placeholder value (if it's non optional):

    let placeholder = IntVal()
    var a = values[0]!
    values[0] = placeholder
    a.append(0) // NO COW
    values[0] = a

did you try this across a module boundary?

the compiler does not know that the original array is overwritten after the append, unless the entire call-chain of the subscript is @inlinable to the caller. (or if you use the very dangerous @_semantics annotations.)