Dictionary Enhancements


(Ben Cohen) #1

Hi swift-evolution,

Following up on Ted’s post regarding the opening up of stage 2, I’m starting a thread to discuss improvements to the Dictionary type.

Here is a list of commonly requested changes/enhancements to Dictionary, all of which would probably be appropriate to put together into a single evolution proposal:

init from/merge in a Sequence of Key/Value pairs (already raised as SE-100: https://github.com/apple/swift-evolution/blob/master/proposals/0100-add-sequence-based-init-and-merge-to-dictionary.md).
make the Values view collection a MutableCollection (as in this PR: https://github.com/apple/swift-evolution/pull/555).
Add a defaulting subscript get (e.g. counts[key, default: 0] += 1 or grouped(key, default:[]].append(value)).
Add a group by-like init to create a Dictionary<K,[V]> from a sequence of V and a closure (V)->K.
Add Dictionary.filter to return a Dictionary.
Add Dictionary.mapValues to return a Dictionary (can be more efficiently implemented than composition as the storage layout remains the same).
Add capacity property and reserveCapacity() method.
Have Dictionary.removeAtIndex return the Index of the next entry.
(once we have conditional conformance) Make dictionaries with Equatable values Equatable.
Please reply here with any comments or questions on the above list, or any additions you believe are important that are missing from it.

All methods added to the standard library increase complexity, so need a strong justification to reduce the risk of API sprawl. When requesting additions/modifications, please keep the following questions in mind:

Is the suggested addition a common operation that many would find useful? Can it be flexible enough to cover different needs?
Will it encourage good practice? Might it be misused or encourage anti-patterns?
Can the operation be composed simply from existing std lib features? Is that composition intuitive/readable?
Is writing the equivalent by hand hard to get right? Are there common correctness traps that this addition would help avoid?
Is writing the equivalent by hand hard to make efficient? Are there common performance traps that this addition would help avoid?
Might a native implementation be able to execute more efficiently, by accessing internals, than the equivalent implementation using public APIs?
As he has already written up SE-100 and another Dictionary proposal, Nate Cook has kindly offered to collate a new omnibus proposal for Dictionary, which will then get pitched here.

I will send another email about enhancements to Sequence/Collection algorithms shortly.

Thanks!

Ben


(David Sweeris) #2

If we added an @injective annotation (https://en.wikipedia.org/wiki/Injective_function), we could add a .mapToDict function that returns another Dictionary with different keys instead of an array of tuples.

Come to think of it, we could give sets the same treatment, too.

- Dave Sweeris

···

On Feb 16, 2017, at 4:26 PM, Ben Cohen via swift-evolution <swift-evolution@swift.org> wrote:

Hi swift-evolution,

Following up on Ted’s post regarding the opening up of stage 2, I’m starting a thread to discuss improvements to the Dictionary type.

Here is a list of commonly requested changes/enhancements to Dictionary, all of which would probably be appropriate to put together into a single evolution proposal:

init from/merge in a Sequence of Key/Value pairs (already raised as SE-100: https://github.com/apple/swift-evolution/blob/master/proposals/0100-add-sequence-based-init-and-merge-to-dictionary.md).
make the Values view collection a MutableCollection (as in this PR: https://github.com/apple/swift-evolution/pull/555).
Add a defaulting subscript get (e.g. counts[key, default: 0] += 1 or grouped(key, default:[]].append(value)).
Add a group by-like init to create a Dictionary<K,[V]> from a sequence of V and a closure (V)->K.
Add Dictionary.filter to return a Dictionary.
Add Dictionary.mapValues to return a Dictionary (can be more efficiently implemented than composition as the storage layout remains the same).
Add capacity property and reserveCapacity() method.
Have Dictionary.removeAtIndex return the Index of the next entry.
(once we have conditional conformance) Make dictionaries with Equatable values Equatable.
Please reply here with any comments or questions on the above list, or any additions you believe are important that are missing from it.


(Haravikk) #3

Just wanted to quickly weigh in on this one, but I wonder if this might make most sense coming as a slight restructuring of the collection protocols. It could for example make sense as an ExtendableCollection protocol, which could also take a general purpose add/insert method (in the case of Dictionary this would take a Key/Value pair), which I believe is something that's also lacking from Dictionary?

Basically it's a protocol representing a Collection whose contents can grow, but which places no guarantees over ordering. So unlike append() you're not guaranteed to have the new element under .last. It makes sense to group with capacity since you don't need capacity in something that can't expand in some way.

···

On 17 Feb 2017, at 00:26, Ben Cohen via swift-evolution <swift-evolution@swift.org> wrote:

Hi swift-evolution,

Add capacity property and reserveCapacity() method.


(Jon Hull) #4

Thoughts inline.

Hi swift-evolution,

Following up on Ted’s post regarding the opening up of stage 2, I’m starting a thread to discuss improvements to the Dictionary type.

Here is a list of commonly requested changes/enhancements to Dictionary, all of which would probably be appropriate to put together into a single evolution proposal:

init from/merge in a Sequence of Key/Value pairs (already raised as SE-100: https://github.com/apple/swift-evolution/blob/master/proposals/0100-add-sequence-based-init-and-merge-to-dictionary.md).

+1. I have wanted this since Swift 1.

make the Values view collection a MutableCollection (as in this PR: https://github.com/apple/swift-evolution/pull/555).

I think Nate’s proposal covers this case well.

Add a defaulting subscript get (e.g. counts[key, default: 0] += 1 or grouped(key, default:[]].append(value)).

I am indifferent to this. I am happy using ??. I guess it could be slightly more efficient because it avoids wrapping and unwrapping the optional.

Add a group by-like init to create a Dictionary<K,[V]> from a sequence of V and a closure (V)->K.

+1. I would use this.

Add Dictionary.filter to return a Dictionary.

+1.

Add Dictionary.mapValues to return a Dictionary (can be more efficiently implemented than composition as the storage layout remains the same).

+1000. I have also been asking for this since the beginning. I built my own version (and use it frequently), but as you say, the standard library can do it much more efficiently.

I would also like to see an in-place version as well.

One design detail. Even though it is only mapping the values, I would like it to pass the key to the closure as well. It occasionally figures in to the mapping logic.

Add capacity property and reserveCapacity() method.

+0.5. I could see it being useful occasionally.

Have Dictionary.removeAtIndex return the Index of the next entry.

No opinion on this.

(once we have conditional conformance) Make dictionaries with Equatable values Equatable.

+1

Please reply here with any comments or questions on the above list, or any additions you believe are important that are missing from it.

I would also like to see a version of map which returns a dictionary and handles key collisions:

  let newDict = myDict.map(collision: {k,v1,v2 in v2}) { (k,v) in ... }

The collision parameter would take a throwing closure and handle the case of a key conflict (by returning the value to use, throwing, or trapping). It would have a default value so that it would only have to be specified if a different behavior was desired.

In advanced cases, the collision could be used to accumulate values together. Because of this, I would actually like to see this on *collection* (not just dictionary). The map closure is handed each element of the sequence (which in the case of dictionary is a key/value tuple), and expects a return value of a key/value tuple. The collision block is called when a key is returned which has already been used to figure out what value to use. This might choose a winner, or it could act like reduce, building a value from the components.

As a concrete example of what this allows, I could take in an array of words/strings [“apple”, “aardvark”, …] and then do the following to get a count of how many words start with each letter:
  
  let letterFrequency = words.map(collision:{$1+$2}) { (String($0.characters.first ?? “”) ,1)}
  
  print(letterFrequency[“a”]) //number of words starting with “a"

I am ok using a term other than ‘map' if that is easier on the compiler, but I would like this functionality. At the simple end, it allows map functionality for both keys and values. At the advanced end, it acts as a categorizing reduce over a sequence/collection.

You can even trivially implement the proposed groupBy with it (this is on Collection, but you could do an init on dict the same way):

  func grouped<K>(by categorizer: (Element)->K ) -> [K:Element] {
    return self.map(collision:{$1+$2}) {(categorizer($0), [$0])}
  }

Thanks,
Jon

···

On Feb 16, 2017, at 4:26 PM, Ben Cohen via swift-evolution <swift-evolution@swift.org> wrote:

All methods added to the standard library increase complexity, so need a strong justification to reduce the risk of API sprawl. When requesting additions/modifications, please keep the following questions in mind:

Is the suggested addition a common operation that many would find useful? Can it be flexible enough to cover different needs?
Will it encourage good practice? Might it be misused or encourage anti-patterns?
Can the operation be composed simply from existing std lib features? Is that composition intuitive/readable?
Is writing the equivalent by hand hard to get right? Are there common correctness traps that this addition would help avoid?
Is writing the equivalent by hand hard to make efficient? Are there common performance traps that this addition would help avoid?
Might a native implementation be able to execute more efficiently, by accessing internals, than the equivalent implementation using public APIs?
As he has already written up SE-100 and another Dictionary proposal, Nate Cook has kindly offered to collate a new omnibus proposal for Dictionary, which will then get pitched here.

I will send another email about enhancements to Sequence/Collection algorithms shortly.

Thanks!

Ben

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(David Waite) #5

Hi swift-evolution,

Following up on Ted’s post regarding the opening up of stage 2, I’m starting a thread to discuss improvements to the Dictionary type.

Here is a list of commonly requested changes/enhancements to Dictionary, all of which would probably be appropriate to put together into a single evolution proposal:

<snip>

Add a defaulting subscript get (e.g. counts[key, default: 0] += 1 or grouped(key, default:[]].append(value)).

Interesting, most of my scenarios aren’t one line subscript mutations like this, so I have been happy with counts[key] ?? 0, grouped(key) ?? [], etc.

Add capacity property and reserveCapacity() method.

would CoW copies also have the same reserved capacity? One way I could hypothetically copy around dictionaries that are using significantly more memory than I suspected, the other way algorithms would be quite confusing as an assignment to a variable might cause the capacity to revert.

In terms of additional features, I have needed to group by value before, e.g. turn from K->V to V -> [K], which isn’t a simple reduce statement.

-DW

···

On Feb 16, 2017, at 5:26 PM, Ben Cohen via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:


(Ole Begemann) #6

Out of interest, how would you implement this? Does it require a generics feature that's slated for Swift 4? I tried two approaches that don't compile in a current Swift 3.1 snapshot (and I'm getting a segfault with both examples in a dev snapshot from 2017-02-14):

1)

extension Dictionary {
    // error: same-type constraint 'Value' == '[S.Iterator.Element]' is recursive
    init<S: Sequence>(values: S, groupedBy: (S.Iterator.Element) -> Key)
        where Value == [S.Iterator.Element] {
        ...
        }
    }
}

2)

// error: reference to generic type 'Array' requires arguments in <...>
extension Dictionary where Value == Array {
    init<S: Sequence>(values: S, groupedBy: (S.Iterator.Element) -> Key)
        where S.Iterator.Element == Value.Element {
        ...
        }
    }
}

···

On 17 Feb 2017, at 01:26, Ben Cohen via swift-evolution <swift-evolution@swift.org> wrote:

Here is a list of commonly requested changes/enhancements to Dictionary, all of which would probably be appropriate to put together into a single evolution proposal:

init from/merge in a Sequence of Key/Value pairs (already raised as SE-100: https://github.com/apple/swift-evolution/blob/master/proposals/0100-add-sequence-based-init-and-merge-to-dictionary.md).
make the Values view collection a MutableCollection (as in this PR: https://github.com/apple/swift-evolution/pull/555).
Add a defaulting subscript get (e.g. counts[key, default: 0] += 1 or grouped(key, default:[]].append(value)).
Add a group by-like init to create a Dictionary<K,[V]> from a sequence of V and a closure (V)->K.


(Brent Royal-Gordon) #7

  • init from/merge in a Sequence of Key/Value pairs (already raised as SE-100: https://github.com/apple/swift-evolution/blob/master/proposals/0100-add-sequence-based-init-and-merge-to-dictionary.md).

Yes, I think this is a great idea.

  • make the Values view collection a MutableCollection (as in this PR: https://github.com/apple/swift-evolution/pull/555).

Sounds good to me.

  • Add a defaulting subscript get (e.g. counts[key, default: 0] += 1 or grouped(key, default:[]].append(value)).

I'd really like this to be a broader mechanism—either somehow permitting assignment through `??`:

  (counts[key] ?? 0) += 1

Or having some kind of `Dictionary` variant with a default value:

  var counts = DefaultedDictionary(elements, default: 0)
  counts[key] += 1

  • Add a group by-like init to create a Dictionary<K,[V]> from a sequence of V and a closure (V)->K.

Honestly, this is such a good idea that I might want it to be an extension method in `Sequence` or `Collection`:

  extension Sequence {
    func grouped<Key: Hashable>(by makeKey: (Element) throws -> Key) rethrows -> [Key: Element] {…}
  }

We don't expose `map` and `filter` as `Array` initializers; why would `grouped(by:)` be one?

  • Add Dictionary.filter to return a Dictionary.

Sure. If it took a `(Key, Value) -> Bool` predicate, that could even distinguish between the `Array`-returning version and the `Dictionary` one.

  • Add Dictionary.mapValues to return a Dictionary (can be more efficiently implemented than composition as the storage layout remains the same).

Yes, this is something I want. So do 114 Stack Overflow users: <http://stackoverflow.com/questions/24116271/whats-the-cleanest-way-of-applying-map-to-a-dictionary-in-swift/24219069#24219069>

I'd actually prefer that this *not* take a `Key` parameter, because that's more likely to interfere with point-free styles of programming. Reasonable people can disagree on that, though.

  • Add capacity property and reserveCapacity() method.

Eh, maybe.

  • Have Dictionary.removeAtIndex return the Index of the next entry.

I'm not sure what the point of this is, and I'm not sure if you think this should also be done to equivalent APIs on e.g. `RangeReplaceableCollection`.

  • (once we have conditional conformance) Make dictionaries with Equatable values Equatable.

No-brainer—definitely want this one.

Please reply here with any comments or questions on the above list, or any additions you believe are important that are missing from it.

I'd sort of like to see a protocol for key-value lookup types, sort of like `SetAlgebra` but for dictionaries. I've recently been building a custom dictionary (a `SortedDictionary` type which sorts its elements by key) and not having a protocol to describe dictionaries abstractly is a bit of an irritant.

···

On Feb 16, 2017, at 4:26 PM, Ben Cohen via swift-evolution <swift-evolution@swift.org> wrote:

--
Brent Royal-Gordon
Architechies


(Ben Cohen) #8

Hi swift-evolution,

Following up on Ted’s post regarding the opening up of stage 2, I’m starting a thread to discuss improvements to the Dictionary type.

Here is a list of commonly requested changes/enhancements to Dictionary, all of which would probably be appropriate to put together into a single evolution proposal:

init from/merge in a Sequence of Key/Value pairs (already raised as SE-100: https://github.com/apple/swift-evolution/blob/master/proposals/0100-add-sequence-based-init-and-merge-to-dictionary.md).
make the Values view collection a MutableCollection (as in this PR: https://github.com/apple/swift-evolution/pull/555).
Add a defaulting subscript get (e.g. counts[key, default: 0] += 1 or grouped(key, default:[]].append(value)).
Add a group by-like init to create a Dictionary<K,[V]> from a sequence of V and a closure (V)->K.
Add Dictionary.filter to return a Dictionary.
Add Dictionary.mapValues to return a Dictionary (can be more efficiently implemented than composition as the storage layout remains the same).
Add capacity property and reserveCapacity() method.
Have Dictionary.removeAtIndex return the Index of the next entry.
(once we have conditional conformance) Make dictionaries with Equatable values Equatable.
Please reply here with any comments or questions on the above list, or any additions you believe are important that are missing from it.

If we added an @injective annotation (https://en.wikipedia.org/wiki/Injective_function), we could add a .mapToDict function that returns another Dictionary with different keys instead of an array of tuples.

Assuming Dictionary also acquires an initializer from a sequence of key/value pairs, being able to map both keys and values doesn’t buy you much over just constructing a new dictionary. The win with mapping only the values is that the underlying hash table can retain the same physical layout, just with different values in the slots, so it can be done faster.

···

On Feb 16, 2017, at 5:06 PM, David Sweeris <davesweeris@mac.com> wrote:

On Feb 16, 2017, at 4:26 PM, Ben Cohen via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

Come to think of it, we could give sets the same treatment, too.

- Dave Sweeris


(Matthew Johnson) #9

Thoughts inline.

Hi swift-evolution,

Following up on Ted’s post regarding the opening up of stage 2, I’m starting a thread to discuss improvements to the Dictionary type.

Here is a list of commonly requested changes/enhancements to Dictionary, all of which would probably be appropriate to put together into a single evolution proposal:

init from/merge in a Sequence of Key/Value pairs (already raised as SE-100: https://github.com/apple/swift-evolution/blob/master/proposals/0100-add-sequence-based-init-and-merge-to-dictionary.md).

+1. I have wanted this since Swift 1.

make the Values view collection a MutableCollection (as in this PR: https://github.com/apple/swift-evolution/pull/555).

I think Nate’s proposal covers this case well.

Add a defaulting subscript get (e.g. counts[key, default: 0] += 1 or grouped(key, default:[]].append(value)).

I am indifferent to this. I am happy using ??. I guess it could be slightly more efficient because it avoids wrapping and unwrapping the optional.

Add a group by-like init to create a Dictionary<K,[V]> from a sequence of V and a closure (V)->K.

+1. I would use this.

Add Dictionary.filter to return a Dictionary.

+1.

Add Dictionary.mapValues to return a Dictionary (can be more efficiently implemented than composition as the storage layout remains the same).

+1000. I have also been asking for this since the beginning. I built my own version (and use it frequently), but as you say, the standard library can do it much more efficiently.

Agree this is a big one. Hopefully if we ever get higher-kinded types we might also have a way to map this method (pun intended) to a Dictionary implementation of Functor.

I would also like to see an in-place version as well.

This would be useful for transformations mapping values to the same type.

One design detail. Even though it is only mapping the values, I would like it to pass the key to the closure as well. It occasionally figures in to the mapping logic.

I can see how this could be useful but it should be an additional overload. I want to be able to give mapValues any function that just takes the value type as input.

Add capacity property and reserveCapacity() method.

+0.5. I could see it being useful occasionally.

Have Dictionary.removeAtIndex return the Index of the next entry.

No opinion on this.

(once we have conditional conformance) Make dictionaries with Equatable values Equatable.

+1

Please reply here with any comments or questions on the above list, or any additions you believe are important that are missing from it.

I would also like to see a version of map which returns a dictionary and handles key collisions:

  let newDict = myDict.map(collision: {k,v1,v2 in v2}) { (k,v) in ... }

The collision parameter would take a throwing closure and handle the case of a key conflict (by returning the value to use, throwing, or trapping). It would have a default value so that it would only have to be specified if a different behavior was desired.

In advanced cases, the collision could be used to accumulate values together. Because of this, I would actually like to see this on *collection* (not just dictionary). The map closure is handed each element of the sequence (which in the case of dictionary is a key/value tuple), and expects a return value of a key/value tuple. The collision block is called when a key is returned which has already been used to figure out what value to use. This might choose a winner, or it could act like reduce, building a value from the components.

As a concrete example of what this allows, I could take in an array of words/strings [“apple”, “aardvark”, …] and then do the following to get a count of how many words start with each letter:
  
  let letterFrequency = words.map(collision:{$1+$2}) { (String($0.characters.first ?? “”) ,1)}
  
  print(letterFrequency[“a”]) //number of words starting with “a"

I am ok using a term other than ‘map' if that is easier on the compiler, but I would like this functionality. At the simple end, it allows map functionality for both keys and values. At the advanced end, it acts as a categorizing reduce over a sequence/collection.

This is a very interesting idea.

···

Sent from my iPad

On Feb 17, 2017, at 8:50 PM, Jonathan Hull via swift-evolution <swift-evolution@swift.org> wrote:

On Feb 16, 2017, at 4:26 PM, Ben Cohen via swift-evolution <swift-evolution@swift.org> wrote:

You can even trivially implement the proposed groupBy with it (this is on Collection, but you could do an init on dict the same way):

  func grouped<K>(by categorizer: (Element)->K ) -> [K:Element] {
    return self.map(collision:{$1+$2}) {(categorizer($0), [$0])}
  }

Thanks,
Jon

All methods added to the standard library increase complexity, so need a strong justification to reduce the risk of API sprawl. When requesting additions/modifications, please keep the following questions in mind:

Is the suggested addition a common operation that many would find useful? Can it be flexible enough to cover different needs?
Will it encourage good practice? Might it be misused or encourage anti-patterns?
Can the operation be composed simply from existing std lib features? Is that composition intuitive/readable?
Is writing the equivalent by hand hard to get right? Are there common correctness traps that this addition would help avoid?
Is writing the equivalent by hand hard to make efficient? Are there common performance traps that this addition would help avoid?
Might a native implementation be able to execute more efficiently, by accessing internals, than the equivalent implementation using public APIs?
As he has already written up SE-100 and another Dictionary proposal, Nate Cook has kindly offered to collate a new omnibus proposal for Dictionary, which will then get pitched here.

I will send another email about enhancements to Sequence/Collection algorithms shortly.

Thanks!

Ben

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Nate Cook) #10

Hi Jon,

Thank you for the feedback, this is really valuable! A couple questions below.

Thoughts inline.

Hi swift-evolution,

Following up on Ted’s post regarding the opening up of stage 2, I’m starting a thread to discuss improvements to the Dictionary type.

Here is a list of commonly requested changes/enhancements to Dictionary, all of which would probably be appropriate to put together into a single evolution proposal:

init from/merge in a Sequence of Key/Value pairs (already raised as SE-100: https://github.com/apple/swift-evolution/blob/master/proposals/0100-add-sequence-based-init-and-merge-to-dictionary.md).

+1. I have wanted this since Swift 1.

make the Values view collection a MutableCollection (as in this PR: https://github.com/apple/swift-evolution/pull/555).

I think Nate’s proposal covers this case well.

Add a defaulting subscript get (e.g. counts[key, default: 0] += 1 or grouped(key, default:[]].append(value)).

I am indifferent to this. I am happy using ??. I guess it could be slightly more efficient because it avoids wrapping and unwrapping the optional.

Add a group by-like init to create a Dictionary<K,[V]> from a sequence of V and a closure (V)->K.

+1. I would use this.

Add Dictionary.filter to return a Dictionary.

+1.

Add Dictionary.mapValues to return a Dictionary (can be more efficiently implemented than composition as the storage layout remains the same).

+1000. I have also been asking for this since the beginning. I built my own version (and use it frequently), but as you say, the standard library can do it much more efficiently.

I would also like to see an in-place version as well.

One design detail. Even though it is only mapping the values, I would like it to pass the key to the closure as well. It occasionally figures in to the mapping logic.

Do you have any examples you can share where you use the key in this type of map? I'm not contesting its usefulness; it would just be helpful to see some real-world usage.

Add capacity property and reserveCapacity() method.

+0.5. I could see it being useful occasionally.

Have Dictionary.removeAtIndex return the Index of the next entry.

No opinion on this.

(once we have conditional conformance) Make dictionaries with Equatable values Equatable.

+1

Please reply here with any comments or questions on the above list, or any additions you believe are important that are missing from it.

I would also like to see a version of map which returns a dictionary and handles key collisions:

  let newDict = myDict.map(collision: {k,v1,v2 in v2}) { (k,v) in ... }

The collision parameter would take a throwing closure and handle the case of a key conflict (by returning the value to use, throwing, or trapping). It would have a default value so that it would only have to be specified if a different behavior was desired.

In advanced cases, the collision could be used to accumulate values together. Because of this, I would actually like to see this on *collection* (not just dictionary). The map closure is handed each element of the sequence (which in the case of dictionary is a key/value tuple), and expects a return value of a key/value tuple. The collision block is called when a key is returned which has already been used to figure out what value to use. This might choose a winner, or it could act like reduce, building a value from the components.

I think the uses you're describing are handled by the merging initializer proposed in SE-100:
  https://github.com/apple/swift-evolution/blob/master/proposals/0100-add-sequence-based-init-and-merge-to-dictionary.md

For example, you can use a combining closure to select specific elements when the keys collide:

let duplicates: DictionaryLiteral = ["a": 1, "b": 2, "a": 3, "b": 4]
// Using the first value only
Dictionary(merging: duplicates, combine: { (first, _) in first }) // ["b": 2, "a": 1]
// Using the maximum value
Dictionary(merging: duplicates, combine: max) // ["b": 4, "a": 3]

or to calculate the frequencies of values in a sequence:

extension Sequence where Iterator.Element: Hashable {
    func frequencies() -> [Iterator.Element: Int] {
        return Dictionary(merging: self.lazy.map { v in (v, 1) }, combine: +)
    }
}
[1, 2, 2, 3, 1, 2, 4, 5, 3, 2, 3, 1].frequencies()
// [2: 4, 4: 1, 5: 1, 3: 3, 1: 3]

Could you take a look and see if that provides the functionality you're looking for?

As a concrete example of what this allows, I could take in an array of words/strings [“apple”, “aardvark”, …] and then do the following to get a count of how many words start with each letter:
  
  let letterFrequency = words.map(collision:{$1+$2}) { (String($0.characters.first ?? “”) ,1)}
  
  print(letterFrequency[“a”]) //number of words starting with “a"

I am ok using a term other than ‘map' if that is easier on the compiler, but I would like this functionality. At the simple end, it allows map functionality for both keys and values. At the advanced end, it acts as a categorizing reduce over a sequence/collection.

You can even trivially implement the proposed groupBy with it (this is on Collection, but you could do an init on dict the same way):

  func grouped<K>(by categorizer: (Element)->K ) -> [K:Element] {
    return self.map(collision:{$1+$2}) {(categorizer($0), [$0])}
  }

Thanks!
Nate

···

On Feb 17, 2017, at 8:50 PM, Jonathan Hull via swift-evolution <swift-evolution@swift.org> wrote:

On Feb 16, 2017, at 4:26 PM, Ben Cohen via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:


(Ben Cohen) #11

Hi swift-evolution,

Following up on Ted’s post regarding the opening up of stage 2, I’m starting a thread to discuss improvements to the Dictionary type.

Here is a list of commonly requested changes/enhancements to Dictionary, all of which would probably be appropriate to put together into a single evolution proposal:

<snip>

Add a defaulting subscript get (e.g. counts[key, default: 0] += 1 or grouped(key, default:[]].append(value)).

Interesting, most of my scenarios aren’t one line subscript mutations like this, so I have been happy with counts[key] ?? 0, grouped(key) ?? [], etc.

As well as a readability/discoverability benefit, there’s also a performance benefit: d[key] = (d[key] ?? []) + [value] is a linear-time operation, whereas d[key, default: []].append(value) should be constant time (there is some work needed on dictionary internals for this to work but the API ought to enable it once that’s done).

Add capacity property and reserveCapacity() method.

would CoW copies also have the same reserved capacity?

Yes, it would mirror arrays, which behave like this. Note dictionaries already have this characteristic – they just don’t expose their capacity or allow you to preemptively expand it.

Any other behavior would be tricky. Until its mutated, a CoW copy is a literal copy, so must share the original’s capacity. Once the mutation forces a buffer copy, it would be pretty odd for that capacity to suddenly shrink.

One way I could hypothetically copy around dictionaries that are using significantly more memory than I suspected,the other way algorithms would be quite confusing as an assignment to a variable might cause the capacity to revert.

In terms of additional features, I have needed to group by value before, e.g. turn from K->V to V -> [K], which isn’t a simple reduce statement.

If there is a group-by operation from sequences, this ought to be composable from that given dictionaries are (Key,Value) sequences.

···

On Feb 17, 2017, at 11:20 PM, David Waite <david@alkaline-solutions.com> wrote:

On Feb 16, 2017, at 5:26 PM, Ben Cohen via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

-DW


(Ben Cohen) #12

Oops, looks like a bug in the same-type constraint implementation. Ought to work in 3.1. Minimal crasher repro:

extension Array {
  func f<S: Sequence>()
  where Element == S.Iterator.Element? {
        
  }
}

I’ve raised https://bugs.swift.org/browse/SR-4008

It ought to be done with the first one. For the second, you can’t constrain Value == Array because Array isn’t a type, needs to be Array<Something>.

As an (impractical) workaround, this compiles:

extension Dictionary {
    subscript(k: Key, default default: Value) -> Value {
        get { return self[k] ?? `default` }
        set { self[k] = newValue }
    }
}

extension Dictionary where Value: RangeReplaceableCollection {
    init<S: Sequence>(grouping values: S, by: (S.Iterator.Element) -> Key)
        where S.Iterator.Element == Value.Iterator.Element {
            self = [:]
            for x in values {
                let k = by(x)
                self[k, default: Value()].append(x)
            }
    }
}

let s = [10,20,22,30,31]
// have to explicitly type the Dictionary as a specific RRC
let d: [Int:[Int]] = Dictionary(grouping: s) { $0%10 }

print(d)

···

On Feb 19, 2017, at 11:22 AM, Ole Begemann <ole@oleb.net> wrote:

On 17 Feb 2017, at 01:26, Ben Cohen via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

Here is a list of commonly requested changes/enhancements to Dictionary, all of which would probably be appropriate to put together into a single evolution proposal:

init from/merge in a Sequence of Key/Value pairs (already raised as SE-100: https://github.com/apple/swift-evolution/blob/master/proposals/0100-add-sequence-based-init-and-merge-to-dictionary.md).
make the Values view collection a MutableCollection (as in this PR: https://github.com/apple/swift-evolution/pull/555).
Add a defaulting subscript get (e.g. counts[key, default: 0] += 1 or grouped(key, default:[]].append(value)).
Add a group by-like init to create a Dictionary<K,[V]> from a sequence of V and a closure (V)->K.

Out of interest, how would you implement this? Does it require a generics feature that's slated for Swift 4? I tried two approaches that don't compile in a current Swift 3.1 snapshot (and I'm getting a segfault with both examples in a dev snapshot from 2017-02-14):

1)

extension Dictionary {
    // error: same-type constraint 'Value' == '[S.Iterator.Element]' is recursive
    init<S: Sequence>(values: S, groupedBy: (S.Iterator.Element) -> Key)
        where Value == [S.Iterator.Element] {
        ...
        }
    }
}

2)

// error: reference to generic type 'Array' requires arguments in <...>
extension Dictionary where Value == Array {
    init<S: Sequence>(values: S, groupedBy: (S.Iterator.Element) -> Key)
        where S.Iterator.Element == Value.Element {
        ...
        }
    }
}


(Nate Cook) #13

Here is a list of commonly requested changes/enhancements to Dictionary, all of which would probably be appropriate to put together into a single evolution proposal:

init from/merge in a Sequence of Key/Value pairs (already raised as SE-100: https://github.com/apple/swift-evolution/blob/master/proposals/0100-add-sequence-based-init-and-merge-to-dictionary.md).
make the Values view collection a MutableCollection (as in this PR: https://github.com/apple/swift-evolution/pull/555).
Add a defaulting subscript get (e.g. counts[key, default: 0] += 1 or grouped(key, default:[]].append(value)).
Add a group by-like init to create a Dictionary<K,[V]> from a sequence of V and a closure (V)->K.

Out of interest, how would you implement this? Does it require a generics feature that's slated for Swift 4? I tried two approaches that don't compile in a current Swift 3.1 snapshot (and I'm getting a segfault with both examples in a dev snapshot from 2017-02-14):

1)

extension Dictionary {
    // error: same-type constraint 'Value' == '[S.Iterator.Element]' is recursive
    init<S: Sequence>(values: S, groupedBy: (S.Iterator.Element) -> Key)
        where Value == [S.Iterator.Element] {
        ...
        }
    }
}

2)

// error: reference to generic type 'Array' requires arguments in <...>
extension Dictionary where Value == Array {
    init<S: Sequence>(values: S, groupedBy: (S.Iterator.Element) -> Key)
        where S.Iterator.Element == Value.Element {
        ...
        }
    }
}

I don't totally have my head around this, but won't a Dictionary initializer like this require parameterized extensions as described in the generics manifesto?
  https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md#parameterized-extensions

Something like this:

extension Dictionary<T> where Value == [T] {
    init<S: Sequence>(_ values: S, groupedBy: (S.Iterator.Element) -> Key)
        where S.Iterator.Element == T
    { ... }
}

For what it's worth, I had been expecting that we'd add this as a method on Sequence, a la joined(by:), not an initializer.

extension Sequence {
    func grouped<Key: Hashable>(by grouping: (Iterator.Element) -> Key)
        -> [Key: [Iterator.Element]]
    { ... }
}
let d = (1...10).grouped(by: { $0 % 3 })
// [2: [2, 5, 8], 0: [3, 6, 9], 1: [1, 4, 7, 10]]

Nate

···

On Feb 19, 2017, at 6:13 PM, Ben Cohen via swift-evolution <swift-evolution@swift.org> wrote:

On Feb 19, 2017, at 11:22 AM, Ole Begemann <ole@oleb.net <mailto:ole@oleb.net>> wrote:

On 17 Feb 2017, at 01:26, Ben Cohen via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

Oops, looks like a bug in the same-type constraint implementation. Ought to work in 3.1. Minimal crasher repro:

extension Array {
  func f<S: Sequence>()
  where Element == S.Iterator.Element? {
        
  }
}

I’ve raised https://bugs.swift.org/browse/SR-4008

It ought to be done with the first one. For the second, you can’t constrain Value == Array because Array isn’t a type, needs to be Array<Something>.

As an (impractical) workaround, this compiles:

extension Dictionary {
    subscript(k: Key, default default: Value) -> Value {
        get { return self[k] ?? `default` }
        set { self[k] = newValue }
    }
}

extension Dictionary where Value: RangeReplaceableCollection {
    init<S: Sequence>(grouping values: S, by: (S.Iterator.Element) -> Key)
        where S.Iterator.Element == Value.Iterator.Element {
            self = [:]
            for x in values {
                let k = by(x)
                self[k, default: Value()].append(x)
            }
    }
}

let s = [10,20,22,30,31]
// have to explicitly type the Dictionary as a specific RRC
let d: [Int:[Int]] = Dictionary(grouping: s) { $0%10 }

print(d)

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution


(Ben Cohen) #14

Thanks Brent.

  • init from/merge in a Sequence of Key/Value pairs (already raised as SE-100: https://github.com/apple/swift-evolution/blob/master/proposals/0100-add-sequence-based-init-and-merge-to-dictionary.md).

Yes, I think this is a great idea.

  • make the Values view collection a MutableCollection (as in this PR: https://github.com/apple/swift-evolution/pull/555).

Sounds good to me.

  • Add a defaulting subscript get (e.g. counts[key, default: 0] += 1 or grouped(key, default:[]].append(value)).

I'd really like this to be a broader mechanism—either somehow permitting assignment through `??`:

  (counts[key] ?? 0) += 1

My main concern with this (aside from whether it's implementable) is that it isn’t discoverable.

Or having some kind of `Dictionary` variant with a default value:

  var counts = DefaultedDictionary(elements, default: 0)
  counts[key] += 1

I think this is a useful type to have at some point, maybe even this proposal, and complementary to a subscript that takes a default at the point of use.

  • Add a group by-like init to create a Dictionary<K,[V]> from a sequence of V and a closure (V)->K.

Honestly, this is such a good idea that I might want it to be an extension method in `Sequence` or `Collection`:

  extension Sequence {
    func grouped<Key: Hashable>(by makeKey: (Element) throws -> Key) rethrows -> [Key: Element] {…}
  }

We don't expose `map` and `filter` as `Array` initializers; why would `grouped(by:)` be one?

This is a good point (Nate made it earlier too). I could go either way. In general, we could probably do with a written set of guidelines about exactly when/why something should be an initializer vs a method.

  • Add Dictionary.filter to return a Dictionary.

Sure. If it took a `(Key, Value) -> Bool` predicate, that could even distinguish between the `Array`-returning version and the `Dictionary` one.

As it is right now, that wouldn’t distinguish it since Sequence.filter on Dictionary can already take a (Key,Value)->Bool transformation since (Key,Value) is the Element type:
let d = [1:"1"]
let f: (Int,String)->Bool = {_,_ in true }
d.filter(f) // [(key: 1, value: "1")]

Another suggestion in the opposite direction was that the Filtered type become an associated type on Sequence, which would also allow lazy things to stay lazy generically. This would be source breaking though where current generic code that filters a sequence assumes an array result.

  • Add Dictionary.mapValues to return a Dictionary (can be more efficiently implemented than composition as the storage layout remains the same).

Yes, this is something I want. So do 114 Stack Overflow users: <http://stackoverflow.com/questions/24116271/whats-the-cleanest-way-of-applying-map-to-a-dictionary-in-swift/24219069#24219069>

I'd actually prefer that this *not* take a `Key` parameter, because that's more likely to interfere with point-free styles of programming. Reasonable people can disagree on that, though.

  • Add capacity property and reserveCapacity() method.

Eh, maybe.

We also have a lack of symmetry the other way too, as there is Dictionary.init(minimumCapacity:), but no RangeReplaceableCollection.init(minimumCapacity:)

  • Have Dictionary.removeAtIndex return the Index of the next entry.

I'm not sure what the point of this is, and I'm not sure if you think this should also be done to equivalent APIs on e.g. `RangeReplaceableCollection`.

  • (once we have conditional conformance) Make dictionaries with Equatable values Equatable.

No-brainer—definitely want this one.

Please reply here with any comments or questions on the above list, or any additions you believe are important that are missing from it.

I'd sort of like to see a protocol for key-value lookup types, sort of like `SetAlgebra` but for dictionaries. I've recently been building a custom dictionary (a `SortedDictionary` type which sorts its elements by key) and not having a protocol to describe dictionaries abstractly is a bit of an irritant.

This probably isn’t necessary at the std lib level until there is more than one std lib type that would conform to it. If DefaultedDictionary or OrderedDictionary were added, certainly.

(relatedly, SetAlgebra also needs looking at before it gets ABI-locked-down, because while it doesn’t have to be a Collection, it does have a few methods that require you be able to enumerate the elements in the set such as isDisjoint(with:), which precludes things like intentional sets where membership is determined using closures)

···

On Feb 19, 2017, at 10:37 PM, Brent Royal-Gordon <brent@architechies.com> wrote:

On Feb 16, 2017, at 4:26 PM, Ben Cohen via swift-evolution <swift-evolution@swift.org> wrote:

--
Brent Royal-Gordon
Architechies


(David Sweeris) #15

Only if the mapped dictionary’s values are the same type (or at least the same size) as in the original dictionary, right? Would the proposed function allow you to mapValues from a [K:V] to a [K:T]?

- Dave Sweeris

···

On Feb 16, 2017, at 5:17 PM, Ben Cohen <ben_cohen@apple.com> wrote:

The win with mapping only the values is that the underlying hash table can retain the same physical layout, just with different values in the slots, so it can be done faster.


(Jon Hull) #16

Add Dictionary.mapValues to return a Dictionary (can be more efficiently implemented than composition as the storage layout remains the same).

+1000. I have also been asking for this since the beginning. I built my own version (and use it frequently), but as you say, the standard library can do it much more efficiently.

I would also like to see an in-place version as well.

One design detail. Even though it is only mapping the values, I would like it to pass the key to the closure as well. It occasionally figures in to the mapping logic.

Do you have any examples you can share where you use the key in this type of map? I'm not contesting its usefulness; it would just be helpful to see some real-world usage.

I use mapValues mostly for simple transformations. Calling .lowercased on Strings so that they are consistent or lightening/darkening colors.

Where the key comes in is when the key is actually gives context to the value (i.e. the value only makes sense in the context of the key). This isn’t the prettiest example, but I have some animation code where I store some values defining the animation details in a dictionary keyed by the view to be animated. Then I can run down each view and check if it needs setup/etc… based on those details. If I need to update those details based on the current state of the view, I need access to the associated view in map. Embarrassingly messy. I would most likely refactor before releasing the code publicly, but it does work.

I would be ok with a simpler mapValues as long as a version that can change the key is also available somewhere. That would cover any of my use cases (and I would just keep the key the same).

I actually almost use the key transform more often (though I still like mapValues for it’s simplicity when that is what is needed). I transform string keys to be uniform (lowercased & removing diacritics). For example I have a parser which has a built-in color keyword that can match color names and push the associated color on the stack. The programmer can set a [String:UIColor] dictionary of names/colors. I run through and normalize the names so that the normalized user input will match.

Please reply here with any comments or questions on the above list, or any additions you believe are important that are missing from it.

I would also like to see a version of map which returns a dictionary and handles key collisions:

  let newDict = myDict.map(collision: {k,v1,v2 in v2}) { (k,v) in ... }

The collision parameter would take a throwing closure and handle the case of a key conflict (by returning the value to use, throwing, or trapping). It would have a default value so that it would only have to be specified if a different behavior was desired.

In advanced cases, the collision could be used to accumulate values together. Because of this, I would actually like to see this on *collection* (not just dictionary). The map closure is handed each element of the sequence (which in the case of dictionary is a key/value tuple), and expects a return value of a key/value tuple. The collision block is called when a key is returned which has already been used to figure out what value to use. This might choose a winner, or it could act like reduce, building a value from the components.

I think the uses you're describing are handled by the merging initializer proposed in SE-100:
  https://github.com/apple/swift-evolution/blob/master/proposals/0100-add-sequence-based-init-and-merge-to-dictionary.md

For example, you can use a combining closure to select specific elements when the keys collide:

let duplicates: DictionaryLiteral = ["a": 1, "b": 2, "a": 3, "b": 4]
// Using the first value only
Dictionary(merging: duplicates, combine: { (first, _) in first }) // ["b": 2, "a": 1]
// Using the maximum value
Dictionary(merging: duplicates, combine: max) // ["b": 4, "a": 3]

or to calculate the frequencies of values in a sequence:

extension Sequence where Iterator.Element: Hashable {
    func frequencies() -> [Iterator.Element: Int] {
        return Dictionary(merging: self.lazy.map { v in (v, 1) }, combine: +)
    }
}
[1, 2, 2, 3, 1, 2, 4, 5, 3, 2, 3, 1].frequencies()
// [2: 4, 4: 1, 5: 1, 3: 3, 1: 3]

Could you take a look and see if that provides the functionality you're looking for?

Almost. From the document, the signature was slightly different. There it only works on sequences of tuples (i.e. dictionaries). If the signature was changed to work with any sequence (as you show above), then it would meet my needs.

I do kind of like being able to call it as a method on sequence (the same way you call map on sequence), but that is just a matter of style. You could always build one form from the other...

Thanks,
Jon


(Brent Royal-Gordon) #17

From what I understand, this is implicit tuple splatting and shouldn't be allowed anymore. Or am I wrong about that?

···

On Feb 20, 2017, at 10:19 AM, Ben Cohen <ben_cohen@apple.com> wrote:

As it is right now, that wouldn’t distinguish it since Sequence.filter on Dictionary can already take a (Key,Value)->Bool transformation since (Key,Value) is the Element type:
let d = [1:"1"]
let f: (Int,String)->Bool = {_,_ in true }
d.filter(f) // [(key: 1, value: "1")]

--
Brent Royal-Gordon
Architechies


(Slava Pestov) #18

The win with mapping only the values is that the underlying hash table can retain the same physical layout, just with different values in the slots, so it can be done faster.

Only if the mapped dictionary’s values are the same type (or at least the same size) as in the original dictionary, right? Would the proposed function allow you to mapValues from a [K:V] to a [K:T]?

In both cases the key/bucket structure would be the same, I believe.

Slava

···

On Feb 16, 2017, at 5:32 PM, David Sweeris via swift-evolution <swift-evolution@swift.org> wrote:

On Feb 16, 2017, at 5:17 PM, Ben Cohen <ben_cohen@apple.com <mailto:ben_cohen@apple.com>> wrote:

- Dave Sweeris
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Ben Cohen) #19

The win with mapping only the values is that the underlying hash table can retain the same physical layout, just with different values in the slots, so it can be done faster.

Only if the mapped dictionary’s values are the same type (or at least the same size) as in the original dictionary, right? Would the proposed function allow you to mapValues from a [K:V] to a [K:T]?

In both cases the key/bucket structure would be the same, I believe.

Right. Values could be different types, you can still reuse the layout i.e. allocate the same element-count buffer (even if the size of the element differs), then put the pairs in the same places they were per the keys, without going through the whole process of computing hashes, resolving collisions etc.

···

On Feb 16, 2017, at 5:38 PM, Slava Pestov <spestov@apple.com> wrote:

On Feb 16, 2017, at 5:32 PM, David Sweeris via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Feb 16, 2017, at 5:17 PM, Ben Cohen <ben_cohen@apple.com <mailto:ben_cohen@apple.com>> wrote:

Slava

- Dave Sweeris
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution


(David Sweeris) #20

That's a neat trick!

- Dave Sweeris

···

On Feb 16, 2017, at 17:45, Ben Cohen <ben_cohen@apple.com> wrote:

On Feb 16, 2017, at 5:38 PM, Slava Pestov <spestov@apple.com> wrote:

On Feb 16, 2017, at 5:32 PM, David Sweeris via swift-evolution <swift-evolution@swift.org> wrote:

On Feb 16, 2017, at 5:17 PM, Ben Cohen <ben_cohen@apple.com> wrote:

The win with mapping only the values is that the underlying hash table can retain the same physical layout, just with different values in the slots, so it can be done faster.

Only if the mapped dictionary’s values are the same type (or at least the same size) as in the original dictionary, right? Would the proposed function allow you to mapValues from a [K:V] to a [K:T]?

In both cases the key/bucket structure would be the same, I believe.

Right. Values could be different types, you can still reuse the layout i.e. allocate the same element-count buffer (even if the size of the element differs), then put the pairs in the same places they were per the keys, without going through the whole process of computing hashes, resolving collisions etc.