Proposal: Add a sequence-based initializer to Dictionary


(Nicola Salmoria) #1

To handle the case of duplicate keys, why not allow the caller to pass in a
'combine' function? This could default to a preconditionFailure to be
consistent with the DictionaryLiteral behavior, but be overridden by the
caller as needed.

extension Dictionary {
    /// Creates a dictionary with the keys and values in the given sequence.
    init<S: SequenceType where S.Generator.Element == Generator.Element>(
        _ sequence: S,
        combine: (existing: Value, other: Value) -> Value = {
            preconditionFailure("Sequence contains duplicate keys"); return $1 }
    ) {
        self.init()
        for (key, value) in sequence {
            if let existing = updateValue(value, forKey: key) {
                updateValue(combine(existing: existing, other: value), forKey: key)
            }
        }
    }
}

usage examples:
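
As a rough sketch only, here is how the combine idea could be written and called in current Swift syntax (Sequence rather than SequenceType). This is an illustration under assumptions, not code from the post: the unlabeled closure parameters, the sample data, and the three call sites are made up for the example.

```swift
extension Dictionary {
    /// Sketch: builds a dictionary from (key, value) pairs and calls `combine`
    /// whenever a key repeats. The default traps, mirroring the suggestion above.
    init<S: Sequence>(
        _ sequence: S,
        combine: (_ existing: Value, _ other: Value) -> Value = { _, _ in
            preconditionFailure("Sequence contains duplicate keys")
        }
    ) where S.Element == (Key, Value) {
        self.init()
        for (key, value) in sequence {
            if let existing = self[key] {
                // Key already present: let the caller decide what to keep.
                self[key] = combine(existing, value)
            } else {
                self[key] = value
            }
        }
    }
}

// Hypothetical sample data with a repeated key "a".
let pairs = [("a", 1), ("b", 2), ("a", 3)]

// Sum the values of repeated keys.
let summed = Dictionary(pairs, combine: +)

// Keep the first value seen for each key.
let firstWins = Dictionary(pairs, combine: { existing, _ in existing })

// Keep the last value seen for each key.
let lastWins = Dictionary(pairs, combine: { _, other in other })
```

With this shape the caller states the collision policy at the call site, and leaving combine out keeps the trapping behavior for sequences that are expected to have unique keys.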

A brief draft is below... I had mostly written this up before I saw the
thread and Gwendal's similar contribution -- happy to hear feedback and
fold in comments/revisions!

Nate

---

Introduction

The Dictionary type should allow initialization from a sequence of (Key,
Value) tuples.

Motivation

Array and Set both have initializers that create a new instance from a
sequence of elements. The Array initializer is useful for converting other
sequences and collections to the "standard" collection type, but the Set
initializer is essential for recovering set operations after performing any
functional operations on a set. For example, filtering a set produces a
collection without any kind of set operations available:

let numberSet = Set(1 ... 100)
let fivesOnly = numberSet.lazy.filter { $0 % 5 == 0 }

"fivesOnly" is a LazyFilterCollection<Set<Int>> instead of a Set --
sending that back through the Set sequence initializer restores the
expected methods:

let fivesOnlySet = Set(numberSet.lazy.filter { $0 % 5 == 0 })
fivesOnlySet.isSubsetOf(numberSet) // true

Dictionary, on the other hand, has no such initializer, so a similar
operation leaves no option but to build a mutable Dictionary via
iteration or to use functional methods with dubious performance. These techniques
also don't support type inference from the source sequence, increasing
verbosity:

var viaIteration: [String: Int] = [:]
for (key, value) in evenOnly {
    viaIteration[key] = value
}

let viaFunction: [String: Int] = evenOnly.reduce([:]) { (cumulative, keyValue) in
    var mutableDictionary = cumulative
    mutableDictionary[keyValue.0] = keyValue.1
    return mutableDictionary
}

Proposed solution

The proposed solution would add an initializer to Dictionary that accepts
any sequence of (Key, Value) tuple pairs, matching the Dictionary's element
type when treated as a sequence:

init<S: SequenceType where S.Generator.Element == Generator.Element>(_ sequence: S)

Instead of the techniques for recovering a Dictionary shown above, the
proposed initializer would allow a much cleaner syntax to be written:

let viaProposed = Dictionary(evenOnly)

Moreover, this new initializer would allow for some convenient uses that
aren't currently possible.

👍🏼 Initializing from an array of tuples:

let dictFromArray = Dictionary([("a", 1), ("b", 2), ("c", 3), ("d", 4)])

👏🏼 Initializing from a DictionaryLiteral (the type, not an actual
literal):

let literal: DictionaryLiteral = ["a": 1, "b": 2, "c": 3, "d": 4]
let dictFromDL = Dictionary(literal)

🎉 Initializing from a pair of zipped sequences (examples abound):

let letters = "abcdefghij".characters.lazy.map { String($0) }
let dictFromZip = Dictionary(zip(letters, 1...10))
// ["b": 2, "a": 1, "i": 9, "j": 10, "c": 3, "e": 5, "f": 6, "g": 7,
//  "d": 4, "h": 8]

Potential pitfalls

One caveat is that the new initializer doesn't prevent using a sequence
with multiple identical keys. In such a case, the last key/value would
"win" and exist in the dictionary. Such an initialization is a run-time
error with a dictionary literal, but succeeds under the new initializer:

let _ = ["z": 1, "z": 2, "z": 3, "z": 4]
// fatal error: Dictionary literal contains duplicate keys
Dictionary([("z", 1), ("z", 2), ("z", 3), ("z", 4)])
// ["z": 4]

This behavior is particularly troublesome when used in conjunction with a
mapping operation that modifies a dictionary's keys, since dictionaries
have no particular guaranteed order:

let overlapping = Dictionary(dictFromArray.lazy.map { (_, value) in ("z", value) })
// ["z": ???]

While a pitfall, this behavior is less a symptom of the proposed API and
more an inherent problem with recovering a dictionary after modifying its
keys. The current ways of rebuilding a dictionary (as shown above) are just
as susceptible to silently dropping values. Moreover, the sequence-based
initializer for Set exhibits the same behavior, though slightly less
problematic in most cases:

let dividedNumbers = Set(numberSet.map { $0 / 20 })
// {4, 5, 2, 0, 1, 3}

Given the potential lossiness of the initializer, should it use a
parameter name for the sequence? I would suggest not, to match the syntax
of Array.init(_:) and Set.init(_:), but a parameter like "collapsingKeys"
would make the risk clear to users.

Detailed design

The implementation is simple enough to show in the proposal:

extension Dictionary {
    /// Creates a dictionary with the keys and values in the given sequence.
    init<S: SequenceType where S.Generator.Element == Generator.Element>(_ sequence: S) {
        self.init()
        for (key, value) in sequence {
            updateValue(value, forKey: key)
        }
    }
}

(As part of the standard library, this could use the nativeUpdateValue
method.)
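
For anyone who wants to try the idea in current Swift syntax, a minimal sketch follows. It assumes the "collapsingKeys" label floated above (purely hypothetical, and used here mainly so the sketch cannot collide with any existing initializer) and writes the constraint against Sequence and Element rather than SequenceType and Generator.Element.

```swift
extension Dictionary {
    /// Sketch of the proposed initializer in current Swift syntax.
    /// The `collapsingKeys` label is hypothetical, borrowed from the discussion.
    init<S: Sequence>(collapsingKeys sequence: S) where S.Element == (Key, Value) {
        self.init()
        for (key, value) in sequence {
            // A later duplicate key overwrites the earlier value.
            updateValue(value, forKey: key)
        }
    }
}

// From an array of tuples.
let dictFromArray = Dictionary(collapsingKeys: [("a", 1), ("b", 2), ("c", 3), ("d", 4)])

// From a pair of zipped sequences.
let letters = "abcdefghij".map { String($0) }
let dictFromZip = Dictionary(collapsingKeys: zip(letters, 1...10))

// Duplicate keys: the last value wins, as described under "Potential pitfalls".
let collapsed = Dictionary(collapsingKeys: [("z", 1), ("z", 2), ("z", 3), ("z", 4)])
```

Key and Value are inferred from the sequence's element type, so none of the call sites needs a type annotation.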

Impact on existing code

As a new API, this will have no impact on existing code.

Alternatives considered

As suggested in the thread below, a method could be added to SequenceType
that would build a dictionary. This approach seems less of a piece with the
rest of the standard library, and overly verbose when used with a
Dictionary that is only passing through filtering or mapping operations. I
don't think the current protocol extension system could handle a
passthrough case (i.e., something like "extension SequenceType where
Generator.Element == (Key, Value)").
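
For comparison, here is a hedged sketch of what the method-based alternative might look like in later Swift versions, where a generic method can constrain Element to a tuple of its own type parameters. The name toDictionary comes from the thread; both signatures and the sample data are assumptions for illustration, not an agreed design.

```swift
extension Sequence {
    /// Hypothetical passthrough: collects (key, value) pairs into a dictionary.
    /// Later duplicates overwrite earlier ones.
    func toDictionary<Key: Hashable, Value>() -> [Key: Value]
        where Element == (Key, Value)
    {
        var result: [Key: Value] = [:]
        for (key, value) in self {
            result[key] = value
        }
        return result
    }

    /// Hypothetical mapping form: `transform` turns each element into a pair.
    func toDictionary<Key: Hashable, Value>(
        _ transform: (Element) -> (Key, Value)
    ) -> [Key: Value] {
        var result: [Key: Value] = [:]
        for element in self {
            let (key, value) = transform(element)
            result[key] = value
        }
        return result
    }
}

// Passthrough after a lazy filter over (key, value) pairs.
let numbers = [("one", 1), ("two", 2), ("three", 3), ("four", 4)]
let evens = numbers.lazy.filter { $0.1 % 2 == 0 }.toDictionary()

// Mapping form, modeled on the toDict example quoted in the thread.
let names = ["Tom", "Dick", "Harry"].enumerated().toDictionary { pair in
    (pair.offset + 1, pair.element)
}
```

Even when it compiles, the call reads right to left compared with an initializer, which is the verbosity concern raised above.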

Alternately, the status quo could be maintained. Which would be sad.

>
> Doesn’t Swift prefer initializers?
>
> So let’s build a Dictionary initializer that eats any sequence of (key,
> value) pairs:
>
> extension Dictionary {
>     init<S: SequenceType where S.Generator.Element == (Key, Value)>(keyValueSequence s: S) {
>         self.init()
>         for (key, value) in s {
>             self[key] = value
>         }
>     }
> }
>
> do {
>     // From array of (key, value) pairs
>     let input = [("foo", 1), ("bar", 2)]
>     let d = Dictionary(keyValueSequence: input)
>     print(d)
> }
> do {
>     // From another dictionary
>     let input = [1: "foo", 2: "bar"]
>     let d = Dictionary(keyValueSequence: input)
>     print(d)
> }
> do {
>     // Reverse key and values
>     let input = [1: "foo", 2: "bar"]
>     let d = Dictionary(keyValueSequence: input.map { ($1, $0) })
>     print(d)
> }
>
> Gwendal
>
>>
>> I'd prefer "mapToDict" otherwise it sounds like a dictionary gets
>> mapped, at least for me.
>>
>> -Thorsten
>>
>>>
>>> This solution looks great! How do you feel about “mapDict”?
>>>
>>> -Kenny
>>>
>>>
>>>>>
>>>>>
>>>>>
>>>>> I named the method(s) „toDict“ instead of „map“ because map
>>>>> normally returns a collection which is either the same as the
>>>>> receiver or a simple one.
>>>>>
>>>>> The second version is more general and makes it possible to do things like
>>>>>
>>>>> let dict = ["Tom", "Dick", "Harry"].enumerate().toDict { (index, value) in (index + 1, value) }
>>>>
>>>> Map would probably be more correct, mathematically speaking, but it
>>>> would be inconsistent with the naming convention already chosen for
>>>> Swift. So for Swift, toDict (or toDictionary) would be the best choice.


> On Jan 13, 2016, at 11:55 AM, Gwendal Roué via swift-evolution <swift-evolution at swift.org> wrote:
>> On 13 Jan 2016, at 18:41, Thorsten Seitz via swift-evolution <swift-evolution at swift.org> wrote:
>>> On 13 Jan 2016, at 17:13, Kenny Leung via swift-evolution <swift-evolution at swift.org> wrote:
>>>>> On Jan 12, 2016, at 10:28 AM, Craig Cruden <ccruden at novafore.com> wrote: