[Proposal Draft] Provide Custom Collections for Dictionary Keys and Values


(Nate Cook) #1

Introduction

This proposal addresses significant unexpected performance gaps when using dictionaries. It introduces type-specific collections for a Dictionary instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key lookup and mutable access to dictionary values, enabling updates to be performed in-place and allowing copy-on-write optimization of stored values.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#motivation>Motivation

This proposal address two problems:

The Dictionary type keys implementation is inefficient, because LazyMapCollection doesn't know how to forward lookups to the underlying dictionary storage.
Dictionaries do not offer value-mutating APIs. The mutating key-based subscript wraps values in an Optional. This prevents types with copy-on-write optimizations from recognizing they are singly referenced.
This proposal uses the following [String: [Int]] dictionary to demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-dictkeys-search>Inefficient dict.keys Search

Swift coders normally test key membership using nil checks or underscored optional bindings:

if dict["one"] != nil {
    // ...
}
if let _ = dict["one"] {
    // ...
}
These approaches provide the expected performance of a dictionary lookup but they read neither well nor "Swifty". Checking keys reads much better but introduces a serious performance penalty: this approach requires a linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}
A similar dynamic plays out when comparing dict.index(forKey:) and dict.keys.index(of:).

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-value-mutation>Inefficient Value Mutation

Dictionary values can be modified through the keyed subscript by direct reassignment or by using optional chaining. Both of these statements append 1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? []) + [1]

// Optional chaining
dict["one"]?.append(1)
Both approaches present problems. The first is complex and hard to read. The second ignores the case where "one" is not a key in the dictionary. It forces its check into a higher branch and encourages forced unwrapping. Furthermore, neither approach allows the array to grow in place. They introduce an unnecessary copy of the array's contents even though dict is the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't possible. Changing a key stored at a particular index would almost certainly modify its hash value, rendering the index incorrect. This violates the requirements of the MutableCollection protocol.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#proposed-solution>Proposed Solution

This proposal adds a custom collection for the keys and values dictionary properties. This follows the example set by String, which presents multiple views of its contents. A new DictionaryKeys collection introduces efficient key lookup, while a new DictionaryValues collection provides a mutable collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a dictionary includes a key:

// Performant
if dict.keys.contains("one") {
    // ...
}
As a mutable collection, values enables modification without copies or clumsy code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1) // no copy here
} else {
    dict["one"] = [1]
}
Both the keys and values collections share the same index type as Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#detailed-design>Detailed design

The standard library introduces two new collection types: DictionaryKeys and DictionaryValues.
A Dictionary's keys and values property types change from LazyMapCollection to these new types.
The new collection types are not directly constructable. They are presented only as views into a dictionary.
struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}

/// A collection view of a dictionary's keys.
struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}

/// A mutable collection view of a dictionary's values.
struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}
A sample implementation of this proposal can be found in this branch <https://github.com/natecook1000/swift/tree/nc-dictionary>.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#impact-on-existing-code>Impact on existing code

The performance improvements of using the new DictionaryKeys type and the mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this proposal should not produce a major impact on existing code. The only impact on existing code exists where a program explicitly specifies the type of a dictionary's keysor values property. The fix is to change the specified type.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#alternatives-considered>Alternatives considered

The Generics Manifesto <https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md> lists nested generics as a goal. This could impact the naming and structure of these new collection types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>, these types could be Dictionary<Key, Value>.Keys and Dictionary<Key, Value>.Values. However, because many types in the standard library may be revisited once such a feature is available (indices, iterators, etc.), the current lack of nesting shouldn't prevent consideration of this proposal.

It could be possible to add additional compiler features that manage mutation through existing key-based subscripting without the copy-on-write problems of the current implementation. I don't know enough about how that would be implemented to speak to its feasibility or level of effort. Such a feature would reduce the need for a mutable DictionaryValues collection.


(Xiaodi Wu) #2

Very elegant solution. Strong +1; no other feedback comes to mind atm.

···

On Tue, Oct 11, 2016 at 4:28 PM, Nate Cook via swift-evolution < swift-evolution@swift.org> wrote:

Introduction

This proposal addresses significant unexpected performance gaps when using
dictionaries. It introduces type-specific collections for a Dictionary
instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key
lookup and mutable access to dictionary values, enabling updates to be
performed in-place and allowing copy-on-write optimization of stored values.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#motivation>
Motivation

This proposal address two problems:

   - The Dictionary type keys implementation is inefficient, because
   LazyMapCollection doesn't know how to forward lookups to the
   underlying dictionary storage.
   - Dictionaries do not offer value-mutating APIs. The mutating
   key-based subscript wraps values in an Optional. This prevents types
   with copy-on-write optimizations from recognizing they are singly
   referenced.

This proposal uses the following [String: [Int]] dictionary to
demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-dictkeys-search>
Inefficient dict.keys Search

Swift coders normally test key membership using nil checks or underscored
optional bindings:

if dict["one"] != nil {
    // ...
}if let _ = dict["one"] {
    // ...
}

These approaches provide the expected performance of a dictionary lookup
but they read neither well nor "Swifty". Checking keys reads much better
but introduces a serious performance penalty: this approach requires a
linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}

A similar dynamic plays out when comparing dict.index(forKey:) and
dict.keys.index(of:).

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-value-mutation>Inefficient
Value Mutation

Dictionary values can be modified through the keyed subscript by direct
reassignment or by using optional chaining. Both of these statements append
1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? []) + [1]
// Optional chaining
dict["one"]?.append(1)

Both approaches present problems. The first is complex and hard to read.
The second ignores the case where "one" is not a key in the dictionary.
It forces its check into a higher branch and encourages forced unwrapping.
Furthermore, neither approach allows the array to grow in place. They
introduce an unnecessary copy of the array's contents even though dict is
the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't possible.
Changing a key stored at a particular index would almost certainly modify
its hash value, rendering the index incorrect. This violates the
requirements of the MutableCollection protocol.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#proposed-solution>Proposed
Solution

This proposal adds a custom collection for the keys and values dictionary
properties. This follows the example set by String, which presents
multiple views of its contents. A new DictionaryKeys collection
introduces efficient key lookup, while a new DictionaryValues collection
provides a mutable collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a
dictionary includes a key:

// Performantif dict.keys.contains("one") {
    // ...
}

As a mutable collection, values enables modification without copies or
clumsy code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1) // no copy here
} else {
    dict["one"] = [1]
}

Both the keys and values collections share the same index type as
Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#detailed-design>Detailed
design

   - The standard library introduces two new collection types:
   DictionaryKeys and DictionaryValues.
   - A Dictionary's keys and values property types change from
   LazyMapCollection to these new types.
   - The new collection types are not directly constructable. They are
   presented only as views into a dictionary.

struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}
/// A collection view of a dictionary's keys.struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}
/// A mutable collection view of a dictionary's values.struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}

A sample implementation of this proposal can be found in this branch
<https://github.com/natecook1000/swift/tree/nc-dictionary>.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#impact-on-existing-code>Impact
on existing code

The performance improvements of using the new DictionaryKeys type and the
mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this
proposal should not produce a major impact on existing code. The only
impact on existing code exists where a program explicitly specifies the
type of a dictionary's keysor values property. The fix is to change the
specified type.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#alternatives-considered>Alternatives
considered

The Generics Manifesto
<https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md> lists
nested generics as a goal. This could impact the naming and structure of
these new collection types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>,
these types could be Dictionary<Key, Value>.Keys and Dictionary<Key,
>.Values. However, because many types in the standard library may be
revisited once such a feature is available (indices, iterators, etc.), the
current lack of nesting shouldn't prevent consideration of this proposal.
It could be possible to add additional compiler features that manage
mutation through existing key-based subscripting without the copy-on-write
problems of the current implementation. I don't know enough about how that
would be implemented to speak to its feasibility or level of effort. Such a
feature would reduce the need for a mutable DictionaryValues collection.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Matthew Johnson) #3

Looks very nice. +1 here as well.

···

On Oct 11, 2016, at 4:28 PM, Nate Cook via swift-evolution <swift-evolution@swift.org> wrote:

Introduction

This proposal addresses significant unexpected performance gaps when using dictionaries. It introduces type-specific collections for a Dictionary instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key lookup and mutable access to dictionary values, enabling updates to be performed in-place and allowing copy-on-write optimization of stored values.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#motivation>Motivation

This proposal address two problems:

The Dictionary type keys implementation is inefficient, because LazyMapCollection doesn't know how to forward lookups to the underlying dictionary storage.
Dictionaries do not offer value-mutating APIs. The mutating key-based subscript wraps values in an Optional. This prevents types with copy-on-write optimizations from recognizing they are singly referenced.
This proposal uses the following [String: [Int]] dictionary to demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-dictkeys-search>Inefficient dict.keys Search

Swift coders normally test key membership using nil checks or underscored optional bindings:

if dict["one"] != nil {
    // ...
}
if let _ = dict["one"] {
    // ...
}
These approaches provide the expected performance of a dictionary lookup but they read neither well nor "Swifty". Checking keys reads much better but introduces a serious performance penalty: this approach requires a linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}
A similar dynamic plays out when comparing dict.index(forKey:) and dict.keys.index(of:).

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-value-mutation>Inefficient Value Mutation

Dictionary values can be modified through the keyed subscript by direct reassignment or by using optional chaining. Both of these statements append 1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? []) + [1]

// Optional chaining
dict["one"]?.append(1)
Both approaches present problems. The first is complex and hard to read. The second ignores the case where "one" is not a key in the dictionary. It forces its check into a higher branch and encourages forced unwrapping. Furthermore, neither approach allows the array to grow in place. They introduce an unnecessary copy of the array's contents even though dict is the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't possible. Changing a key stored at a particular index would almost certainly modify its hash value, rendering the index incorrect. This violates the requirements of the MutableCollection protocol.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#proposed-solution>Proposed Solution

This proposal adds a custom collection for the keys and values dictionary properties. This follows the example set by String, which presents multiple views of its contents. A new DictionaryKeys collection introduces efficient key lookup, while a new DictionaryValues collection provides a mutable collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a dictionary includes a key:

// Performant
if dict.keys.contains("one") {
    // ...
}
As a mutable collection, values enables modification without copies or clumsy code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1) // no copy here
} else {
    dict["one"] = [1]
}
Both the keys and values collections share the same index type as Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#detailed-design>Detailed design

The standard library introduces two new collection types: DictionaryKeys and DictionaryValues.
A Dictionary's keys and values property types change from LazyMapCollection to these new types.
The new collection types are not directly constructable. They are presented only as views into a dictionary.
struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}

/// A collection view of a dictionary's keys.
struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}

/// A mutable collection view of a dictionary's values.
struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}
A sample implementation of this proposal can be found in this branch <https://github.com/natecook1000/swift/tree/nc-dictionary>.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#impact-on-existing-code>Impact on existing code

The performance improvements of using the new DictionaryKeys type and the mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this proposal should not produce a major impact on existing code. The only impact on existing code exists where a program explicitly specifies the type of a dictionary's keysor values property. The fix is to change the specified type.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#alternatives-considered>Alternatives considered

The Generics Manifesto <https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md> lists nested generics as a goal. This could impact the naming and structure of these new collection types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>, these types could be Dictionary<Key, Value>.Keys and Dictionary<Key, Value>.Values. However, because many types in the standard library may be revisited once such a feature is available (indices, iterators, etc.), the current lack of nesting shouldn't prevent consideration of this proposal.

It could be possible to add additional compiler features that manage mutation through existing key-based subscripting without the copy-on-write problems of the current implementation. I don't know enough about how that would be implemented to speak to its feasibility or level of effort. Such a feature would reduce the need for a mutable DictionaryValues collection.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Jacob Bandes-Storch) #4

+1. Haven't hit this issue personally, but I agree it's important and the
proposed solution seems like the right fix.

···

On Tue, Oct 11, 2016 at 2:28 PM, Nate Cook via swift-evolution < swift-evolution@swift.org> wrote:

Introduction

This proposal addresses significant unexpected performance gaps when using
dictionaries. It introduces type-specific collections for a Dictionary
instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key
lookup and mutable access to dictionary values, enabling updates to be
performed in-place and allowing copy-on-write optimization of stored values.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#motivation>
Motivation

This proposal address two problems:

   - The Dictionary type keys implementation is inefficient, because
   LazyMapCollection doesn't know how to forward lookups to the
   underlying dictionary storage.
   - Dictionaries do not offer value-mutating APIs. The mutating
   key-based subscript wraps values in an Optional. This prevents types
   with copy-on-write optimizations from recognizing they are singly
   referenced.

This proposal uses the following [String: [Int]] dictionary to
demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-dictkeys-search>
Inefficient dict.keys Search

Swift coders normally test key membership using nil checks or underscored
optional bindings:

if dict["one"] != nil {
    // ...
}if let _ = dict["one"] {
    // ...
}

These approaches provide the expected performance of a dictionary lookup
but they read neither well nor "Swifty". Checking keys reads much better
but introduces a serious performance penalty: this approach requires a
linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}

A similar dynamic plays out when comparing dict.index(forKey:) and
dict.keys.index(of:).

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-value-mutation>Inefficient
Value Mutation

Dictionary values can be modified through the keyed subscript by direct
reassignment or by using optional chaining. Both of these statements append
1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? []) + [1]
// Optional chaining
dict["one"]?.append(1)

Both approaches present problems. The first is complex and hard to read.
The second ignores the case where "one" is not a key in the dictionary.
It forces its check into a higher branch and encourages forced unwrapping.
Furthermore, neither approach allows the array to grow in place. They
introduce an unnecessary copy of the array's contents even though dict is
the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't possible.
Changing a key stored at a particular index would almost certainly modify
its hash value, rendering the index incorrect. This violates the
requirements of the MutableCollection protocol.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#proposed-solution>Proposed
Solution

This proposal adds a custom collection for the keys and values dictionary
properties. This follows the example set by String, which presents
multiple views of its contents. A new DictionaryKeys collection
introduces efficient key lookup, while a new DictionaryValues collection
provides a mutable collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a
dictionary includes a key:

// Performantif dict.keys.contains("one") {
    // ...
}

As a mutable collection, values enables modification without copies or
clumsy code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1) // no copy here
} else {
    dict["one"] = [1]
}

Both the keys and values collections share the same index type as
Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#detailed-design>Detailed
design

   - The standard library introduces two new collection types:
   DictionaryKeys and DictionaryValues.
   - A Dictionary's keys and values property types change from
   LazyMapCollection to these new types.
   - The new collection types are not directly constructable. They are
   presented only as views into a dictionary.

struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}
/// A collection view of a dictionary's keys.struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}
/// A mutable collection view of a dictionary's values.struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}

A sample implementation of this proposal can be found in this branch
<https://github.com/natecook1000/swift/tree/nc-dictionary>.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#impact-on-existing-code>Impact
on existing code

The performance improvements of using the new DictionaryKeys type and the
mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this
proposal should not produce a major impact on existing code. The only
impact on existing code exists where a program explicitly specifies the
type of a dictionary's keysor values property. The fix is to change the
specified type.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#alternatives-considered>Alternatives
considered

The Generics Manifesto
<https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md> lists
nested generics as a goal. This could impact the naming and structure of
these new collection types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>,
these types could be Dictionary<Key, Value>.Keys and Dictionary<Key,
>.Values. However, because many types in the standard library may be
revisited once such a feature is available (indices, iterators, etc.), the
current lack of nesting shouldn't prevent consideration of this proposal.
It could be possible to add additional compiler features that manage
mutation through existing key-based subscripting without the copy-on-write
problems of the current implementation. I don't know enough about how that
would be implemented to speak to its feasibility or level of effort. Such a
feature would reduce the need for a mutable DictionaryValues collection.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Rien) #5

Beautiful, +1

Rien

···

On 11 Oct 2016, at 23:28, Nate Cook via swift-evolution <swift-evolution@swift.org> wrote:

Introduction

This proposal addresses significant unexpected performance gaps when using dictionaries. It introduces type-specific collections for a Dictionary instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key lookup and mutable access to dictionary values, enabling updates to be performed in-place and allowing copy-on-write optimization of stored values.

Motivation

This proposal address two problems:

  • The Dictionary type keys implementation is inefficient, because LazyMapCollection doesn't know how to forward lookups to the underlying dictionary storage.
  • Dictionaries do not offer value-mutating APIs. The mutating key-based subscript wraps values in an Optional. This prevents types with copy-on-write optimizations from recognizing they are singly referenced.
This proposal uses the following [String: [Int]] dictionary to demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
Inefficient dict.keys Search

Swift coders normally test key membership using nil checks or underscored optional bindings:

if dict["one"] != nil
{
    
// ...

}

if let _ = dict["one"
] {
    
// ...

}

These approaches provide the expected performance of a dictionary lookup but they read neither well nor "Swifty". Checking keys reads much better but introduces a serious performance penalty: this approach requires a linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}

A similar dynamic plays out when comparing dict.index(forKey:) and dict.keys.index(of:).

Inefficient Value Mutation

Dictionary values can be modified through the keyed subscript by direct reassignment or by using optional chaining. Both of these statements append 1 to the array stored by the key "one":

// Direct re-assignment

dict[
"one"] = (dict["one"] ?? []) + [1
]

// Optional chaining

dict[
"one"]?.append(1)
Both approaches present problems. The first is complex and hard to read. The second ignores the case where "one" is not a key in the dictionary. It forces its check into a higher branch and encourages forced unwrapping. Furthermore, neither approach allows the array to grow in place. They introduce an unnecessary copy of the array's contents even though dict is the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't possible. Changing a key stored at a particular index would almost certainly modify its hash value, rendering the index incorrect. This violates the requirements of the MutableCollection protocol.

Proposed Solution

This proposal adds a custom collection for the keys and values dictionary properties. This follows the example set by String, which presents multiple views of its contents. A new DictionaryKeys collection introduces efficient key lookup, while a new DictionaryValues collection provides a mutable collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a dictionary includes a key:

// Performant
if dict.keys.contains("one"
) {
    
// ...

}

As a mutable collection, values enables modification without copies or clumsy code:

if let i = dict.index(forKey: "one"
) {
    dict
.values[i].append(1) // no copy here

}
else
{
    dict[
"one"] = [1
]
}

Both the keys and values collections share the same index type as Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one"
) {
    dict
.values[i].append(1
)
}
else
{
    dict[
"one"] = [1
]
}

Detailed design

  • The standard library introduces two new collection types: DictionaryKeys and DictionaryValues.
  • A Dictionary's keys and values property types change from LazyMapCollection to these new types.
  • The new collection types are not directly constructable. They are presented only as views into a dictionary.
struct Dictionary<Key: Hashable, Value>: ...
{
    
var keys: DictionaryKeys<Key, Value> { get
}
    
var values: DictionaryValues<Key, Value> { get set
}

// Remaining declarations

}

/// A collection view of a dictionary's keys.
struct DictionaryKeys<Key: Hashable, Value>
: Collection {
    
typealias Index = DictionaryIndex<Key, Value>

subscript(i: Index) -> Key { get
}

// Other `Collection` requirements

}

/// A mutable collection view of a dictionary's values.
struct DictionaryValues<Key: Hashable, Value>
: MutableCollection {
    
typealias Index = DictionaryIndex<Key, Value>

subscript(i: Index) -> Value { get set
}

// Other `Collection` requirements

}

A sample implementation of this proposal can be found in this branch.

Impact on existing code

The performance improvements of using the new DictionaryKeys type and the mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this proposal should not produce a major impact on existing code. The only impact on existing code exists where a program explicitly specifies the type of a dictionary's keysor values property. The fix is to change the specified type.

Alternatives considered

The Generics Manifesto lists nested generics as a goal. This could impact the naming and structure of these new collection types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>, these types could be Dictionary<Key, Value>.Keys and Dictionary<Key, Value>.Values. However, because many types in the standard library may be revisited once such a feature is available (indices, iterators, etc.), the current lack of nesting shouldn't prevent consideration of this proposal.

It could be possible to add additional compiler features that manage mutation through existing key-based subscripting without the copy-on-write problems of the current implementation. I don't know enough about how that would be implemented to speak to its feasibility or level of effort. Such a feature would reduce the need for a mutable DictionaryValues collection.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(plx) #6

The issue addressed is real; I’m not sure this is the best approach.

In particular, unless I’m missing something obvious, the ownership strategy here would have to be:

- `DictionaryKeys` and `DictionaryValues` would each induce the expected +1 retain on the underlying storage
- `DictionaryValues`’s mutations avoid triggering COW on the underlying storage by skipping the usual ownership check

…as otherwise it’s unclear how you’d do those in-place mutations (and this seems to be how the implementation works...is that correct?).

With that design, it seems like you’d wind up allowing things like the below:

  // example A
  let foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  let bar = foo // shared storage, no COW
  foo.values[foo.index(of: “abc”)!].append(789) // update shared storage, no COW

  // shared storage mutated,
  // despite (a) both being `let` and (b) only foo.values getting touched
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 789]

  // example B
  var foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  var bar = foo // shared storage, no COW
  foo.values[foo.index(of: “abc”)!].append(789)

  // shared storage mutated only foo.values getting touched
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 789]

  // example C
  var foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  var bar = foo
  bar[“abc”] = [1, 2, 3, 4] // COW triggered here, no shared storage
  foo.values[foo.index(of: “abc”)!].append(789)

  // only `foo`’s storage mutated, b/c change to `bar` triggered COW
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 4]

…where both A (by itself) and the B/C contrast seem very unwelcome.

Also, even if we assume we only ever make *responsible* use, having the stdlib include such directly-mutating views would seem likely to complicate any future concurrency plans.

To reiterate, I think the issue being addressed here is extremely important…I just don’t think I can get behind this type of solution (unless I’m grossly misunderstanding its mechanics).

···

On Oct 11, 2016, at 4:28 PM, Nate Cook via swift-evolution <swift-evolution@swift.org> wrote:

Introduction

This proposal addresses significant unexpected performance gaps when using dictionaries. It introduces type-specific collections for a Dictionary instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key lookup and mutable access to dictionary values, enabling updates to be performed in-place and allowing copy-on-write optimization of stored values.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#motivation>Motivation

This proposal address two problems:

The Dictionary type keys implementation is inefficient, because LazyMapCollection doesn't know how to forward lookups to the underlying dictionary storage.
Dictionaries do not offer value-mutating APIs. The mutating key-based subscript wraps values in an Optional. This prevents types with copy-on-write optimizations from recognizing they are singly referenced.
This proposal uses the following [String: [Int]] dictionary to demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-dictkeys-search>Inefficient dict.keys Search

Swift coders normally test key membership using nil checks or underscored optional bindings:

if dict["one"] != nil {
    // ...
}
if let _ = dict["one"] {
    // ...
}
These approaches provide the expected performance of a dictionary lookup but they read neither well nor "Swifty". Checking keys reads much better but introduces a serious performance penalty: this approach requires a linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}
A similar dynamic plays out when comparing dict.index(forKey:) and dict.keys.index(of:).

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-value-mutation>Inefficient Value Mutation

Dictionary values can be modified through the keyed subscript by direct reassignment or by using optional chaining. Both of these statements append 1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? []) + [1]

// Optional chaining
dict["one"]?.append(1)
Both approaches present problems. The first is complex and hard to read. The second ignores the case where "one" is not a key in the dictionary. It forces its check into a higher branch and encourages forced unwrapping. Furthermore, neither approach allows the array to grow in place. They introduce an unnecessary copy of the array's contents even though dict is the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't possible. Changing a key stored at a particular index would almost certainly modify its hash value, rendering the index incorrect. This violates the requirements of the MutableCollection protocol.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#proposed-solution>Proposed Solution

This proposal adds a custom collection for the keys and values dictionary properties. This follows the example set by String, which presents multiple views of its contents. A new DictionaryKeys collection introduces efficient key lookup, while a new DictionaryValues collection provides a mutable collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a dictionary includes a key:

// Performant
if dict.keys.contains("one") {
    // ...
}
As a mutable collection, values enables modification without copies or clumsy code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1) // no copy here
} else {
    dict["one"] = [1]
}
Both the keys and values collections share the same index type as Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#detailed-design>Detailed design

The standard library introduces two new collection types: DictionaryKeys and DictionaryValues.
A Dictionary's keys and values property types change from LazyMapCollection to these new types.
The new collection types are not directly constructable. They are presented only as views into a dictionary.
struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}

/// A collection view of a dictionary's keys.
struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}

/// A mutable collection view of a dictionary's values.
struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}
A sample implementation of this proposal can be found in this branch <https://github.com/natecook1000/swift/tree/nc-dictionary>.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#impact-on-existing-code>Impact on existing code

The performance improvements of using the new DictionaryKeys type and the mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this proposal should not produce a major impact on existing code. The only impact on existing code exists where a program explicitly specifies the type of a dictionary's keysor values property. The fix is to change the specified type.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#alternatives-considered>Alternatives considered

The Generics Manifesto <https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md> lists nested generics as a goal. This could impact the naming and structure of these new collection types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>, these types could be Dictionary<Key, Value>.Keys and Dictionary<Key, Value>.Values. However, because many types in the standard library may be revisited once such a feature is available (indices, iterators, etc.), the current lack of nesting shouldn't prevent consideration of this proposal.

It could be possible to add additional compiler features that manage mutation through existing key-based subscripting without the copy-on-write problems of the current implementation. I don't know enough about how that would be implemented to speak to its feasibility or level of effort. Such a feature would reduce the need for a mutable DictionaryValues collection.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Daniel Vollmer) #7

Hi,

I very much think the points mentioned in the motivation are worth addressing (and IMO this is not an area where “maybe the optimizer can be made smarter” can cut it; I want performance guarantees, not hopes).

[snip]

On a shallow read I like presented approach, except for

Both the keys and values collections share the same index type as Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}

The asymmetry between the if / else branches seems ugly to me. That is once obtaining the value “directly” from dict, and once through the values-view. I don’t have a great solution here, but is is possible to subscript the dict by its `Index` as well as its `Key`?

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
    dict[i].append(1)
} else {
    dict["one"] = [1]
}

On another note, I’m not sure whether there is a way (or whether it’s even worth trying) to avoid hashing the key twice when the `else` branch is taken.

  Daniel.

···

On 11 Oct 2016, at 23:28, Nate Cook via swift-evolution <swift-evolution@swift.org> wrote:


(Jean-Denis Muys) #8

A clear win for me. +1

···

On 11 Oct 2016, at 23:28, Nate Cook via swift-evolution <swift-evolution@swift.org> wrote:

Introduction

This proposal addresses significant unexpected performance gaps when using dictionaries. It introduces type-specific collections for a Dictionary instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key lookup and mutable access to dictionary values, enabling updates to be performed in-place and allowing copy-on-write optimization of stored values.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#motivation>Motivation

This proposal address two problems:

The Dictionary type keys implementation is inefficient, because LazyMapCollection doesn't know how to forward lookups to the underlying dictionary storage.
Dictionaries do not offer value-mutating APIs. The mutating key-based subscript wraps values in an Optional. This prevents types with copy-on-write optimizations from recognizing they are singly referenced.
This proposal uses the following [String: [Int]] dictionary to demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-dictkeys-search>Inefficient dict.keys Search

Swift coders normally test key membership using nil checks or underscored optional bindings:

if dict["one"] != nil {
    // ...
}
if let _ = dict["one"] {
    // ...
}
These approaches provide the expected performance of a dictionary lookup but they read neither well nor "Swifty". Checking keys reads much better but introduces a serious performance penalty: this approach requires a linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}
A similar dynamic plays out when comparing dict.index(forKey:) and dict.keys.index(of:).

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-value-mutation>Inefficient Value Mutation

Dictionary values can be modified through the keyed subscript by direct reassignment or by using optional chaining. Both of these statements append 1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? []) + [1]

// Optional chaining
dict["one"]?.append(1)
Both approaches present problems. The first is complex and hard to read. The second ignores the case where "one" is not a key in the dictionary. It forces its check into a higher branch and encourages forced unwrapping. Furthermore, neither approach allows the array to grow in place. They introduce an unnecessary copy of the array's contents even though dict is the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't possible. Changing a key stored at a particular index would almost certainly modify its hash value, rendering the index incorrect. This violates the requirements of the MutableCollection protocol.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#proposed-solution>Proposed Solution

This proposal adds a custom collection for the keys and values dictionary properties. This follows the example set by String, which presents multiple views of its contents. A new DictionaryKeys collection introduces efficient key lookup, while a new DictionaryValues collection provides a mutable collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a dictionary includes a key:

// Performant
if dict.keys.contains("one") {
    // ...
}
As a mutable collection, values enables modification without copies or clumsy code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1) // no copy here
} else {
    dict["one"] = [1]
}
Both the keys and values collections share the same index type as Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#detailed-design>Detailed design

The standard library introduces two new collection types: DictionaryKeys and DictionaryValues.
A Dictionary's keys and values property types change from LazyMapCollection to these new types.
The new collection types are not directly constructable. They are presented only as views into a dictionary.
struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}

/// A collection view of a dictionary's keys.
struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}

/// A mutable collection view of a dictionary's values.
struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}
A sample implementation of this proposal can be found in this branch <https://github.com/natecook1000/swift/tree/nc-dictionary>.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#impact-on-existing-code>Impact on existing code

The performance improvements of using the new DictionaryKeys type and the mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this proposal should not produce a major impact on existing code. The only impact on existing code exists where a program explicitly specifies the type of a dictionary's keysor values property. The fix is to change the specified type.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#alternatives-considered>Alternatives considered

The Generics Manifesto <https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md> lists nested generics as a goal. This could impact the naming and structure of these new collection types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>, these types could be Dictionary<Key, Value>.Keys and Dictionary<Key, Value>.Values. However, because many types in the standard library may be revisited once such a feature is available (indices, iterators, etc.), the current lack of nesting shouldn't prevent consideration of this proposal.

It could be possible to add additional compiler features that manage mutation through existing key-based subscripting without the copy-on-write problems of the current implementation. I don't know enough about how that would be implemented to speak to its feasibility or level of effort. Such a feature would reduce the need for a mutable DictionaryValues collection.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Alexis) #9

Just to clarify: It seems like the only ABI-affecting change here is the type of keys/values. As you note at the end of your proposal, this should just be Dictionary.Keys/Dictionary.Values regardless of whether we implement this proposal or not, in which case this can be punted for Swift 4. It should be fine to keep .Keys/.Values resilient so that we can change their implementation details later if we want.

On the actual proposal: this is a pretty reasonable given Swift’s current design and constraints. That said, I expect pushing forward on this kind of thing right now is premature given the goals of Swift 4. A major aspect of Swift 4 is reworking the way CoW semantics function internally, which could drastically affect the way we approach this problem.

I’d really like if we could eliminate the “double search/hash” in the no-existing-key case. There are ways to do this really cleanly, but they probably involve more advanced CoW-safety propagation. In particular, you want some way for the collection to return its search state to the caller so that they can hand it back to insertion to just resume from there.

For instance:

map.entries[key] // An enum like Found(Value) | NotFound(SearchState)
   .withDefault(value: []) // Unwrap the enum by completing the NotFound(SearchState)
   .append(1) // Now we have a value in both cases, we can append!

Or more complex:

map.entries[key]
   .withDefault { /* logic that computes value */ }
   .append(1)

I think this can be made to work in the current system if withDefault is actually `[withDefault:]`, which is fine but a bit weird from a user’s perspective.

In an ideal world the user could actually pattern match on the result of `entries[key]`. In this way they could match on it and perform special logic in both cases for really complex situations. This would make withDefault “just a convenience”, so we aren’t pressured to add more methods like it every time someone has a new Even More Complex use-case. e.g.:

switch map.entries[key] {
case .Found(entry):
  if entry.value == 10 {
    entry.remove()
    print(“Found a value too many times! Moving key to fast-path auxiliary structure…”)
  } else {
    entry.value += 1
  }
case .NotFound(entry):
  entry.insert(1)
  print(“Found a value for the first time! Registering a bunch of extra stuff…”)
}

But again, this is all dependent on a much more powerful SIL/ARC, and we just don’t know what we’re going to get at this stage.


(Karoy Lorentey) #10

I think this is a lovely approach to solving this API design problem.

One thing I don’t quite understand yet is how these kinds of mutable views interact with copy on write semantics. COW rules can be subtle, and these views seem to put an extra twist on top that seems hard to understand/explain.

I would expect DictionaryValues to behave like a separate copy of the dictionary:

   var dict = [“foo”: 1, “bar”: 2]
   let i = dict.keys.index(of: “foo”)!
   var values = dict.values
   values[i] = 42 // COW copy
   print(dict[“foo”]) // => 1
   dict.values = values // Original storage is released
   print(dict[“foo”]) // => 42

Given that `values` contains a strong reference to the storage, I was curious to find out how this copy is elided when we write `dict.values[i] = 42`. So I tried building your branch, and I found that mutating `values` has an immediate effect on the dictionary as well:

   let dict = [“foo”: 1, “bar”: 2] // Note let, not var
   let copy = dict
   let i = dict.keys.index(of: “foo”)!
   var values = dict.values
   values[i] = 42
   print(dict[“foo”]) // => 42 (?!)
   print(copy[“foo”]) // => 42 (?!?!)

This is not the intended behavior, right?

Károly

···

On 2016-10-11, at 23:28, Nate Cook via swift-evolution <swift-evolution@swift.org> wrote:

Introduction

This proposal addresses significant unexpected performance gaps when using dictionaries. It introduces type-specific collections for a Dictionary instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key lookup and mutable access to dictionary values, enabling updates to be performed in-place and allowing copy-on-write optimization of stored values.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#motivation>Motivation

This proposal address two problems:

The Dictionary type keys implementation is inefficient, because LazyMapCollection doesn't know how to forward lookups to the underlying dictionary storage.
Dictionaries do not offer value-mutating APIs. The mutating key-based subscript wraps values in an Optional. This prevents types with copy-on-write optimizations from recognizing they are singly referenced.
This proposal uses the following [String: [Int]] dictionary to demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-dictkeys-search>Inefficient dict.keys Search

Swift coders normally test key membership using nil checks or underscored optional bindings:

if dict["one"] != nil {
    // ...
}
if let _ = dict["one"] {
    // ...
}
These approaches provide the expected performance of a dictionary lookup but they read neither well nor "Swifty". Checking keys reads much better but introduces a serious performance penalty: this approach requires a linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}
A similar dynamic plays out when comparing dict.index(forKey:) and dict.keys.index(of:).

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-value-mutation>Inefficient Value Mutation

Dictionary values can be modified through the keyed subscript by direct reassignment or by using optional chaining. Both of these statements append 1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? []) + [1]

// Optional chaining
dict["one"]?.append(1)
Both approaches present problems. The first is complex and hard to read. The second ignores the case where "one" is not a key in the dictionary. It forces its check into a higher branch and encourages forced unwrapping. Furthermore, neither approach allows the array to grow in place. They introduce an unnecessary copy of the array's contents even though dict is the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't possible. Changing a key stored at a particular index would almost certainly modify its hash value, rendering the index incorrect. This violates the requirements of the MutableCollection protocol.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#proposed-solution>Proposed Solution

This proposal adds a custom collection for the keys and values dictionary properties. This follows the example set by String, which presents multiple views of its contents. A new DictionaryKeys collection introduces efficient key lookup, while a new DictionaryValues collection provides a mutable collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a dictionary includes a key:

// Performant
if dict.keys.contains("one") {
    // ...
}
As a mutable collection, values enables modification without copies or clumsy code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1) // no copy here
} else {
    dict["one"] = [1]
}
Both the keys and values collections share the same index type as Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#detailed-design>Detailed design

The standard library introduces two new collection types: DictionaryKeys and DictionaryValues.
A Dictionary's keys and values property types change from LazyMapCollection to these new types.
The new collection types are not directly constructable. They are presented only as views into a dictionary.
struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}

/// A collection view of a dictionary's keys.
struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}

/// A mutable collection view of a dictionary's values.
struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}
A sample implementation of this proposal can be found in this branch <https://github.com/natecook1000/swift/tree/nc-dictionary>.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#impact-on-existing-code>Impact on existing code

The performance improvements of using the new DictionaryKeys type and the mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this proposal should not produce a major impact on existing code. The only impact on existing code exists where a program explicitly specifies the type of a dictionary's keysor values property. The fix is to change the specified type.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#alternatives-considered>Alternatives considered

The Generics Manifesto <https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md> lists nested generics as a goal. This could impact the naming and structure of these new collection types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>, these types could be Dictionary<Key, Value>.Keys and Dictionary<Key, Value>.Values. However, because many types in the standard library may be revisited once such a feature is available (indices, iterators, etc.), the current lack of nesting shouldn't prevent consideration of this proposal.

It could be possible to add additional compiler features that manage mutation through existing key-based subscripting without the copy-on-write problems of the current implementation. I don't know enough about how that would be implemented to speak to its feasibility or level of effort. Such a feature would reduce the need for a mutable DictionaryValues collection.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Russ Bishop) #11

I’m not commenting on this proposal directly but shouldn’t this optimization be something the compiler can resolve? I suppose it requires something akin to the Rust borrow checker to determine that ownership passes from a local to the Optional and then to the setter, or in the case of write-back that references don’t escape the current scope except through the setter.

In other words this is a significant problem in many places and it would be a huge win to solve it in the compiler. I’m not sure changing types here and there to work around it is worth the trouble (though some of the issues you point out with Dictionary apply regardless).

Russ

···

On Oct 11, 2016, at 2:28 PM, Nate Cook via swift-evolution <swift-evolution@swift.org> wrote:
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-value-mutation>Inefficient Value Mutation

Dictionary values can be modified through the keyed subscript by direct reassignment or by using optional chaining. Both of these statements append 1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? []) + [1]

// Optional chaining
dict["one"]?.append(1)
Both approaches present problems. The first is complex and hard to read. The second ignores the case where "one" is not a key in the dictionary. It forces its check into a higher branch and encourages forced unwrapping. Furthermore, neither approach allows the array to grow in place. They introduce an unnecessary copy of the array's contents even though dict is the sole holder of its storage.


(Dave Abrahams) #12

Introduction

This proposal addresses significant unexpected performance gaps when using dictionaries. It
introduces type-specific collections for a Dictionary instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key lookup and mutable access
to dictionary values, enabling updates to be performed in-place and allowing copy-on-write
optimization of stored values.

Doing this is a good idea for resilience reasons, if nothing else, as it
will allow us to adjust the implementations of these collections without
breaking clients.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#motivation>Motivation

This proposal address two problems:

The Dictionary type keys implementation is inefficient, because
LazyMapCollection doesn't know how to forward lookups to the
underlying dictionary storage.

I don't see why traversing a collection of keys should require
“lookups,” at least in the normal meaning of the word, which refers to
locating a hash entry given a key.

Dictionaries do not offer value-mutating APIs.

Subscript is a value-mutating API:

  var b = [1:"one"]
  b[1]! += "!"
  print(b[1]) // "one!"
  

The mutating key-based subscript wraps values in an Optional. This
prevents types with copy-on-write optimizations from recognizing they
are singly referenced.

Yes, that's an efficiency problem. I'm not making promises here, but I
have good reasons to think we'll address this issue by fixing the way
inout works. Specifically, we'd prevent (statically where possible)
aliased accesses to references that are participating in inout access
and thus be able to avoid incrementing their reference counts in these
scenarios.

This proposal uses the following [String: [Int]] dictionary to
demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-dictkeys-search>Inefficient
dict.keys Search

Swift coders normally test key membership using nil checks or underscored optional bindings:

if dict["one"] != nil {
    // ...
}
if let _ = dict["one"] {
    // ...
}
These approaches provide the expected performance of a dictionary lookup but they read neither well
nor "Swifty". Checking keys reads much better but introduces a serious performance penalty: this
approach requires a linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}

Heh. dict.keys could get you a Set<Key>

A similar dynamic plays out when comparing dict.index(forKey:) and dict.keys.index(of:).

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-value-mutation>Inefficient
Value Mutation

Dictionary values can be modified through the keyed subscript by
direct reassignment or by using optional chaining. Both of these
statements append 1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? []) + [1]

// Optional chaining
dict["one"]?.append(1)

Both approaches present problems. The first is complex and hard to
read. The second ignores the case where "one" is not a key in the
dictionary.

IMO the right way to phrase this is that the semantics of the first
approach are much more commonly useful than those of the second.

It forces its check into a higher branch and encourages forced
unwrapping.

I don't think that's a valid argument against anything. There's nothing
wrong with forced unwrapping.

Furthermore, neither approach allows the array to grow in place. They
introduce an unnecessary copy of the array's contents even though dict
is the sole holder of its storage.

Yes. Here's the horrible-but-reasonably-efficient way to do this today:

var x: [Int]? = []
swap(&x, &dict["one"])
x = x ?? Array(minimumCapacity: 1)
x!.append(1)
swap(&x, &dict["one"])

Adding mutation to a dictionary's index-based subscripting isn't possible. Changing a key stored at
a particular index would almost certainly modify its hash value, rendering the index incorrect. This
violates the requirements of the MutableCollection protocol.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#proposed-solution>Proposed
Solution

This proposal adds a custom collection for the keys and values
dictionary properties. This follows the example set by String, which
presents multiple views of its contents. A new DictionaryKeys
collection introduces efficient key lookup, while a new
DictionaryValues collection provides a mutable collection interface to
dictionary values.

These changes introduce a simple and efficient way of checking whether
a dictionary includes a key:

// Performant
if dict.keys.contains("one") {
    // ...
}

As a mutable collection, values enables modification without copies or
clumsy code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1) // no copy here
} else {
    dict["one"] = [1]
}

Well, I find that a bit clumsy still, but it's better than what we've
got. There's no reason dict.values shoudn't be a MutableCollection.

Both the keys and values collections share the same index type as
Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#detailed-design>Detailed
design

That is a good point. It makes me inclined to deprecate index(forKey:)

···

on Tue Oct 11 2016, Nate Cook <swift-evolution@swift.org> wrote:

The standard library introduces two new collection types:
DictionaryKeys and DictionaryValues. A Dictionary's keys and values
property types change from LazyMapCollection to these new types. The
new collection types are not directly constructable. They are
presented only as views into a dictionary.

struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}

/// A collection view of a dictionary's keys.
struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}

/// A mutable collection view of a dictionary's values.
struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}

A sample implementation of this proposal can be found in this branch
<https://github.com/natecook1000/swift/tree/nc-dictionary>.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#impact-on-existing-code>Impact
on existing code

The performance improvements of using the new DictionaryKeys type and
the mutability of the DictionaryValuescollection are both additive in
nature.

Most uses of these properties are transitory in nature. Adopting this
proposal should not produce a major impact on existing code. The only
impact on existing code exists where a program explicitly specifies
the type of a dictionary's keysor values property. The fix is to
change the specified type.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#alternatives-considered>Alternatives
considered

The Generics Manifesto
<https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md>
lists nested generics as a goal. This could impact the naming and
structure of these new collection types.

Instead of DictionaryKeys<Key, Value> and
DictionaryValues<Key, Value>, these types
could be Dictionary<Key, Value>.Keys and
Dictionary<Key, Value>.Values. However, because many types in the
standard library may be revisited once such a feature is available
(indices, iterators, etc.), the current lack of nesting shouldn't
prevent consideration of this proposal.

It could be possible to add additional compiler features that manage
mutation through existing key-based subscripting without the
copy-on-write problems of the current implementation. I don't know
enough about how that would be implemented to speak to its feasibility
or level of effort. Such a feature would reduce the need for a mutable
DictionaryValues collection.

--
-Dave


(Paul Cantrell) #13

A late-arriving strong +1 for me. The index-related stuff is elegant and much needed. I’m surprised to learn that dict.keys and dict.values are copies and not already views! Clearly they should be.

Question: I hit a closely related performance wall just last week, doing something like this:

    for k in dict.keys {
        dict.values[k].append(1)
    }

I assume / hope the proposal would also support this?

    for i in dict.indices {
        dict.values[i].append(1)
    }

…or would it be this?

    for i in dict.keys.indices {
        dict.values[i].append(1)
    }

…or either?

Cheers, P

···

On Oct 11, 2016, at 4:28 PM, Nate Cook via swift-evolution <swift-evolution@swift.org> wrote:

Introduction

This proposal addresses significant unexpected performance gaps when using dictionaries. It introduces type-specific collections for a Dictionary instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key lookup and mutable access to dictionary values, enabling updates to be performed in-place and allowing copy-on-write optimization of stored values.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#motivation>Motivation

This proposal address two problems:

The Dictionary type keys implementation is inefficient, because LazyMapCollection doesn't know how to forward lookups to the underlying dictionary storage.
Dictionaries do not offer value-mutating APIs. The mutating key-based subscript wraps values in an Optional. This prevents types with copy-on-write optimizations from recognizing they are singly referenced.
This proposal uses the following [String: [Int]] dictionary to demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-dictkeys-search>Inefficient dict.keys Search

Swift coders normally test key membership using nil checks or underscored optional bindings:

if dict["one"] != nil {
    // ...
}
if let _ = dict["one"] {
    // ...
}
These approaches provide the expected performance of a dictionary lookup but they read neither well nor "Swifty". Checking keys reads much better but introduces a serious performance penalty: this approach requires a linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}
A similar dynamic plays out when comparing dict.index(forKey:) and dict.keys.index(of:).

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-value-mutation>Inefficient Value Mutation

Dictionary values can be modified through the keyed subscript by direct reassignment or by using optional chaining. Both of these statements append 1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? []) + [1]

// Optional chaining
dict["one"]?.append(1)
Both approaches present problems. The first is complex and hard to read. The second ignores the case where "one" is not a key in the dictionary. It forces its check into a higher branch and encourages forced unwrapping. Furthermore, neither approach allows the array to grow in place. They introduce an unnecessary copy of the array's contents even though dict is the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't possible. Changing a key stored at a particular index would almost certainly modify its hash value, rendering the index incorrect. This violates the requirements of the MutableCollection protocol.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#proposed-solution>Proposed Solution

This proposal adds a custom collection for the keys and values dictionary properties. This follows the example set by String, which presents multiple views of its contents. A new DictionaryKeys collection introduces efficient key lookup, while a new DictionaryValues collection provides a mutable collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a dictionary includes a key:

// Performant
if dict.keys.contains("one") {
    // ...
}
As a mutable collection, values enables modification without copies or clumsy code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1) // no copy here
} else {
    dict["one"] = [1]
}
Both the keys and values collections share the same index type as Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#detailed-design>Detailed design

The standard library introduces two new collection types: DictionaryKeys and DictionaryValues.
A Dictionary's keys and values property types change from LazyMapCollection to these new types.
The new collection types are not directly constructable. They are presented only as views into a dictionary.
struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}

/// A collection view of a dictionary's keys.
struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}

/// A mutable collection view of a dictionary's values.
struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}
A sample implementation of this proposal can be found in this branch <https://github.com/natecook1000/swift/tree/nc-dictionary>.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#impact-on-existing-code>Impact on existing code

The performance improvements of using the new DictionaryKeys type and the mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this proposal should not produce a major impact on existing code. The only impact on existing code exists where a program explicitly specifies the type of a dictionary's keysor values property. The fix is to change the specified type.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#alternatives-considered>Alternatives considered

The Generics Manifesto <https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md> lists nested generics as a goal. This could impact the naming and structure of these new collection types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>, these types could be Dictionary<Key, Value>.Keys and Dictionary<Key, Value>.Values. However, because many types in the standard library may be revisited once such a feature is available (indices, iterators, etc.), the current lack of nesting shouldn't prevent consideration of this proposal.

It could be possible to add additional compiler features that manage mutation through existing key-based subscripting without the copy-on-write problems of the current implementation. I don't know enough about how that would be implemented to speak to its feasibility or level of effort. Such a feature would reduce the need for a mutable DictionaryValues collection.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Braeden Profile) #14

Awesome; +1. Does this address the lack of .init(keys:values:)? Would it make that addition easier?

···

On Oct 11, 2016, at 3:38 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

Very elegant solution. Strong +1; no other feedback comes to mind atm.

On Tue, Oct 11, 2016 at 4:28 PM, Nate Cook via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
Introduction

This proposal addresses significant unexpected performance gaps when using dictionaries. It introduces type-specific collections for a Dictionary instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key lookup and mutable access to dictionary values, enabling updates to be performed in-place and allowing copy-on-write optimization of stored values.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#motivation>Motivation

This proposal address two problems:

The Dictionary type keys implementation is inefficient, because LazyMapCollection doesn't know how to forward lookups to the underlying dictionary storage.
Dictionaries do not offer value-mutating APIs. The mutating key-based subscript wraps values in an Optional. This prevents types with copy-on-write optimizations from recognizing they are singly referenced.
This proposal uses the following [String: [Int]] dictionary to demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-dictkeys-search>Inefficient dict.keys Search

Swift coders normally test key membership using nil checks or underscored optional bindings:

if dict["one"] != nil {
    // ...
}
if let _ = dict["one"] {
    // ...
}
These approaches provide the expected performance of a dictionary lookup but they read neither well nor "Swifty". Checking keys reads much better but introduces a serious performance penalty: this approach requires a linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}
A similar dynamic plays out when comparing dict.index(forKey:) and dict.keys.index(of:).

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-value-mutation>Inefficient Value Mutation

Dictionary values can be modified through the keyed subscript by direct reassignment or by using optional chaining. Both of these statements append 1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? []) + [1]

// Optional chaining
dict["one"]?.append(1)
Both approaches present problems. The first is complex and hard to read. The second ignores the case where "one" is not a key in the dictionary. It forces its check into a higher branch and encourages forced unwrapping. Furthermore, neither approach allows the array to grow in place. They introduce an unnecessary copy of the array's contents even though dict is the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't possible. Changing a key stored at a particular index would almost certainly modify its hash value, rendering the index incorrect. This violates the requirements of the MutableCollection protocol.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#proposed-solution>Proposed Solution

This proposal adds a custom collection for the keys and values dictionary properties. This follows the example set by String, which presents multiple views of its contents. A new DictionaryKeys collection introduces efficient key lookup, while a new DictionaryValues collection provides a mutable collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a dictionary includes a key:

// Performant
if dict.keys.contains("one") {
    // ...
}
As a mutable collection, values enables modification without copies or clumsy code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1) // no copy here
} else {
    dict["one"] = [1]
}
Both the keys and values collections share the same index type as Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#detailed-design>Detailed design

The standard library introduces two new collection types: DictionaryKeys and DictionaryValues.
A Dictionary's keys and values property types change from LazyMapCollection to these new types.
The new collection types are not directly constructable. They are presented only as views into a dictionary.
struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}

/// A collection view of a dictionary's keys.
struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}

/// A mutable collection view of a dictionary's values.
struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}
A sample implementation of this proposal can be found in this branch <https://github.com/natecook1000/swift/tree/nc-dictionary>.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#impact-on-existing-code>Impact on existing code

The performance improvements of using the new DictionaryKeys type and the mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this proposal should not produce a major impact on existing code. The only impact on existing code exists where a program explicitly specifies the type of a dictionary's keysor values property. The fix is to change the specified type.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#alternatives-considered>Alternatives considered

The Generics Manifesto <https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md> lists nested generics as a goal. This could impact the naming and structure of these new collection types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>, these types could be Dictionary<Key, Value>.Keys and Dictionary<Key, Value>.Values. However, because many types in the standard library may be revisited once such a feature is available (indices, iterators, etc.), the current lack of nesting shouldn't prevent consideration of this proposal.

It could be possible to add additional compiler features that manage mutation through existing key-based subscripting without the copy-on-write problems of the current implementation. I don't know enough about how that would be implemented to speak to its feasibility or level of effort. Such a feature would reduce the need for a mutable DictionaryValues collection.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Erica Sadun) #15

I agree. I like this proposal.

-- E

···

On Oct 11, 2016, at 3:38 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

Very elegant solution. Strong +1; no other feedback comes to mind atm.

On Tue, Oct 11, 2016 at 4:28 PM, Nate Cook via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
Introduction

This proposal addresses significant unexpected performance gaps when using dictionaries. It introduces type-specific collections for a Dictionary instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key lookup and mutable access to dictionary values, enabling updates to be performed in-place and allowing copy-on-write optimization of stored values.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#motivation>Motivation

This proposal address two problems:

The Dictionary type keys implementation is inefficient, because LazyMapCollection doesn't know how to forward lookups to the underlying dictionary storage.
Dictionaries do not offer value-mutating APIs. The mutating key-based subscript wraps values in an Optional. This prevents types with copy-on-write optimizations from recognizing they are singly referenced.
This proposal uses the following [String: [Int]] dictionary to demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-dictkeys-search>Inefficient dict.keys Search

Swift coders normally test key membership using nil checks or underscored optional bindings:

if dict["one"] != nil {
    // ...
}
if let _ = dict["one"] {
    // ...
}
These approaches provide the expected performance of a dictionary lookup but they read neither well nor "Swifty". Checking keys reads much better but introduces a serious performance penalty: this approach requires a linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}
A similar dynamic plays out when comparing dict.index(forKey:) and dict.keys.index(of:).

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-value-mutation>Inefficient Value Mutation

Dictionary values can be modified through the keyed subscript by direct reassignment or by using optional chaining. Both of these statements append 1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? []) + [1]

// Optional chaining
dict["one"]?.append(1)
Both approaches present problems. The first is complex and hard to read. The second ignores the case where "one" is not a key in the dictionary. It forces its check into a higher branch and encourages forced unwrapping. Furthermore, neither approach allows the array to grow in place. They introduce an unnecessary copy of the array's contents even though dict is the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't possible. Changing a key stored at a particular index would almost certainly modify its hash value, rendering the index incorrect. This violates the requirements of the MutableCollection protocol.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#proposed-solution>Proposed Solution

This proposal adds a custom collection for the keys and values dictionary properties. This follows the example set by String, which presents multiple views of its contents. A new DictionaryKeys collection introduces efficient key lookup, while a new DictionaryValues collection provides a mutable collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a dictionary includes a key:

// Performant
if dict.keys.contains("one") {
    // ...
}
As a mutable collection, values enables modification without copies or clumsy code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1) // no copy here
} else {
    dict["one"] = [1]
}
Both the keys and values collections share the same index type as Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#detailed-design>Detailed design

The standard library introduces two new collection types: DictionaryKeys and DictionaryValues.
A Dictionary's keys and values property types change from LazyMapCollection to these new types.
The new collection types are not directly constructable. They are presented only as views into a dictionary.
struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}

/// A collection view of a dictionary's keys.
struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}

/// A mutable collection view of a dictionary's values.
struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}
A sample implementation of this proposal can be found in this branch <https://github.com/natecook1000/swift/tree/nc-dictionary>.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#impact-on-existing-code>Impact on existing code

The performance improvements of using the new DictionaryKeys type and the mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this proposal should not produce a major impact on existing code. The only impact on existing code exists where a program explicitly specifies the type of a dictionary's keysor values property. The fix is to change the specified type.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#alternatives-considered>Alternatives considered

The Generics Manifesto <https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md> lists nested generics as a goal. This could impact the naming and structure of these new collection types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>, these types could be Dictionary<Key, Value>.Keys and Dictionary<Key, Value>.Values. However, because many types in the standard library may be revisited once such a feature is available (indices, iterators, etc.), the current lack of nesting shouldn't prevent consideration of this proposal.

It could be possible to add additional compiler features that manage mutation through existing key-based subscripting without the copy-on-write problems of the current implementation. I don't know enough about how that would be implemented to speak to its feasibility or level of effort. Such a feature would reduce the need for a mutable DictionaryValues collection.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Nate Cook) #16

Awesome; +1. Does this address the lack of .init(keys:values:)? Would it make that addition easier?

No, I don't think this has any bearing on that question. There's a separate proposal for that sort of initializer that was deferred from the Swift 3 evolution process (probably until stage 2 of Swift 4 since it's additive):

https://github.com/apple/swift-evolution/blob/master/proposals/0100-add-sequence-based-init-and-merge-to-dictionary.md

Nate

···

On Oct 11, 2016, at 5:06 PM, Braeden Profile <jhaezhyr12@gmail.com> wrote:

On Oct 11, 2016, at 3:38 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

Very elegant solution. Strong +1; no other feedback comes to mind atm.

On Tue, Oct 11, 2016 at 4:28 PM, Nate Cook via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
Introduction

This proposal addresses significant unexpected performance gaps when using dictionaries. It introduces type-specific collections for a Dictionary instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key lookup and mutable access to dictionary values, enabling updates to be performed in-place and allowing copy-on-write optimization of stored values.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#motivation>Motivation

This proposal address two problems:

The Dictionary type keys implementation is inefficient, because LazyMapCollection doesn't know how to forward lookups to the underlying dictionary storage.
Dictionaries do not offer value-mutating APIs. The mutating key-based subscript wraps values in an Optional. This prevents types with copy-on-write optimizations from recognizing they are singly referenced.
This proposal uses the following [String: [Int]] dictionary to demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-dictkeys-search>Inefficient dict.keys Search

Swift coders normally test key membership using nil checks or underscored optional bindings:

if dict["one"] != nil {
    // ...
}
if let _ = dict["one"] {
    // ...
}
These approaches provide the expected performance of a dictionary lookup but they read neither well nor "Swifty". Checking keys reads much better but introduces a serious performance penalty: this approach requires a linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}
A similar dynamic plays out when comparing dict.index(forKey:) and dict.keys.index(of:).

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-value-mutation>Inefficient Value Mutation

Dictionary values can be modified through the keyed subscript by direct reassignment or by using optional chaining. Both of these statements append 1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? []) + [1]

// Optional chaining
dict["one"]?.append(1)
Both approaches present problems. The first is complex and hard to read. The second ignores the case where "one" is not a key in the dictionary. It forces its check into a higher branch and encourages forced unwrapping. Furthermore, neither approach allows the array to grow in place. They introduce an unnecessary copy of the array's contents even though dict is the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't possible. Changing a key stored at a particular index would almost certainly modify its hash value, rendering the index incorrect. This violates the requirements of the MutableCollection protocol.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#proposed-solution>Proposed Solution

This proposal adds a custom collection for the keys and values dictionary properties. This follows the example set by String, which presents multiple views of its contents. A new DictionaryKeys collection introduces efficient key lookup, while a new DictionaryValues collection provides a mutable collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a dictionary includes a key:

// Performant
if dict.keys.contains("one") {
    // ...
}
As a mutable collection, values enables modification without copies or clumsy code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1) // no copy here
} else {
    dict["one"] = [1]
}
Both the keys and values collections share the same index type as Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}
<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#detailed-design>Detailed design

The standard library introduces two new collection types: DictionaryKeys and DictionaryValues.
A Dictionary's keys and values property types change from LazyMapCollection to these new types.
The new collection types are not directly constructable. They are presented only as views into a dictionary.
struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}

/// A collection view of a dictionary's keys.
struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}

/// A mutable collection view of a dictionary's values.
struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}
A sample implementation of this proposal can be found in this branch <https://github.com/natecook1000/swift/tree/nc-dictionary>.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#impact-on-existing-code>Impact on existing code

The performance improvements of using the new DictionaryKeys type and the mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this proposal should not produce a major impact on existing code. The only impact on existing code exists where a program explicitly specifies the type of a dictionary's keysor values property. The fix is to change the specified type.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#alternatives-considered>Alternatives considered

The Generics Manifesto <https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md> lists nested generics as a goal. This could impact the naming and structure of these new collection types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>, these types could be Dictionary<Key, Value>.Keys and Dictionary<Key, Value>.Values. However, because many types in the standard library may be revisited once such a feature is available (indices, iterators, etc.), the current lack of nesting shouldn't prevent consideration of this proposal.

It could be possible to add additional compiler features that manage mutation through existing key-based subscripting without the copy-on-write problems of the current implementation. I don't know enough about how that would be implemented to speak to its feasibility or level of effort. Such a feature would reduce the need for a mutable DictionaryValues collection.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution


(Tony Allevato) #17

As someone who's hit this performance issue myself, a big +1 from me. The
solution looks like it fits perfectly into Swift.

···

On Tue, Oct 11, 2016 at 3:01 PM Matthew Johnson via swift-evolution < swift-evolution@swift.org> wrote:

Looks very nice. +1 here as well.

On Oct 11, 2016, at 4:28 PM, Nate Cook via swift-evolution < > swift-evolution@swift.org> wrote:

Introduction

This proposal addresses significant unexpected performance gaps when using
dictionaries. It introduces type-specific collections for a Dictionary
instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key
lookup and mutable access to dictionary values, enabling updates to be
performed in-place and allowing copy-on-write optimization of stored values.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#motivation>
Motivation

This proposal address two problems:

   - The Dictionary type keys implementation is inefficient, because
   LazyMapCollection doesn't know how to forward lookups to the
   underlying dictionary storage.
   - Dictionaries do not offer value-mutating APIs. The mutating
   key-based subscript wraps values in an Optional. This prevents types
   with copy-on-write optimizations from recognizing they are singly
   referenced.

This proposal uses the following [String: [Int]] dictionary to
demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-dictkeys-search>
Inefficient dict.keys Search

Swift coders normally test key membership using nil checks or underscored
optional bindings:

if dict["one"] != nil {
    // ...
}if let _ = dict["one"] {
    // ...
}

These approaches provide the expected performance of a dictionary lookup
but they read neither well nor "Swifty". Checking keys reads much better
but introduces a serious performance penalty: this approach requires a
linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}

A similar dynamic plays out when comparing dict.index(forKey:) and
dict.keys.index(of:).

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-value-mutation>Inefficient
Value Mutation

Dictionary values can be modified through the keyed subscript by direct
reassignment or by using optional chaining. Both of these statements append
1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? []) + [1]
// Optional chaining
dict["one"]?.append(1)

Both approaches present problems. The first is complex and hard to read.
The second ignores the case where "one" is not a key in the dictionary.
It forces its check into a higher branch and encourages forced unwrapping.
Furthermore, neither approach allows the array to grow in place. They
introduce an unnecessary copy of the array's contents even though dict is
the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't possible.
Changing a key stored at a particular index would almost certainly modify
its hash value, rendering the index incorrect. This violates the
requirements of the MutableCollection protocol.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#proposed-solution>Proposed
Solution

This proposal adds a custom collection for the keys and values dictionary
properties. This follows the example set by String, which presents
multiple views of its contents. A new DictionaryKeys collection
introduces efficient key lookup, while a new DictionaryValues collection
provides a mutable collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a
dictionary includes a key:

// Performantif dict.keys.contains("one") {
    // ...
}

As a mutable collection, values enables modification without copies or
clumsy code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1) // no copy here
} else {
    dict["one"] = [1]
}

Both the keys and values collections share the same index type as
Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#detailed-design>Detailed
design

   - The standard library introduces two new collection types:
   DictionaryKeys and DictionaryValues.
   - A Dictionary's keys and values property types change from
   LazyMapCollection to these new types.
   - The new collection types are not directly constructable. They are
   presented only as views into a dictionary.

struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}
/// A collection view of a dictionary's keys.struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}
/// A mutable collection view of a dictionary's values.struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}

A sample implementation of this proposal can be found in this branch
<https://github.com/natecook1000/swift/tree/nc-dictionary>.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#impact-on-existing-code>Impact
on existing code

The performance improvements of using the new DictionaryKeys type and the
mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this
proposal should not produce a major impact on existing code. The only
impact on existing code exists where a program explicitly specifies the
type of a dictionary's keysor values property. The fix is to change the
specified type.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#alternatives-considered>Alternatives
considered

The Generics Manifesto
<https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md> lists
nested generics as a goal. This could impact the naming and structure of
these new collection types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>,
these types could be Dictionary<Key, Value>.Keys and Dictionary<Key,
>.Values. However, because many types in the standard library may be
revisited once such a feature is available (indices, iterators, etc.), the
current lack of nesting shouldn't prevent consideration of this proposal.
It could be possible to add additional compiler features that manage
mutation through existing key-based subscripting without the copy-on-write
problems of the current implementation. I don't know enough about how that
would be implemented to speak to its feasibility or level of effort. Such a
feature would reduce the need for a mutable DictionaryValues collection.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Said Sikira) #18

+1

Beautiful, +1

Rien

Introduction

This proposal addresses significant unexpected performance gaps when

using dictionaries. It introduces type-specific collections for a
Dictionary instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key

lookup and mutable access to dictionary values, enabling updates to be
performed in-place and allowing copy-on-write optimization of stored values.

Motivation

This proposal address two problems:

• The Dictionary type keys implementation is inefficient, because

LazyMapCollection doesn't know how to forward lookups to the underlying
dictionary storage.

• Dictionaries do not offer value-mutating APIs. The mutating key-based

subscript wraps values in an Optional. This prevents types with
copy-on-write optimizations from recognizing they are singly referenced.

This proposal uses the following [String: [Int]] dictionary to

demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
Inefficient dict.keys Search

Swift coders normally test key membership using nil checks or underscored

optional bindings:

if dict["one"] != nil
{

// ...

}

if let _ = dict["one"
] {

// ...

}

These approaches provide the expected performance of a dictionary lookup

but they read neither well nor "Swifty". Checking keys reads much better
but introduces a serious performance penalty: this approach requires a
linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
// ...
}

A similar dynamic plays out when comparing dict.index(forKey:) and

dict.keys.index(of:).

Inefficient Value Mutation

Dictionary values can be modified through the keyed subscript by direct

reassignment or by using optional chaining. Both of these statements append
1 to the array stored by the key "one":

// Direct re-assignment

dict[
"one"] = (dict["one"] ?? []) + [1
]

// Optional chaining

dict[
"one"]?.append(1)
Both approaches present problems. The first is complex and hard to read.

The second ignores the case where "one" is not a key in the dictionary. It
forces its check into a higher branch and encourages forced unwrapping.
Furthermore, neither approach allows the array to grow in place. They
introduce an unnecessary copy of the array's contents even though dict is
the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't

possible. Changing a key stored at a particular index would almost
certainly modify its hash value, rendering the index incorrect. This
violates the requirements of the MutableCollection protocol.

Proposed Solution

This proposal adds a custom collection for the keys and values dictionary

properties. This follows the example set by String, which presents multiple
views of its contents. A new DictionaryKeys collection introduces efficient
key lookup, while a new DictionaryValues collection provides a mutable
collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a

dictionary includes a key:

// Performant
if dict.keys.contains("one"
) {

// ...

}

As a mutable collection, values enables modification without copies or

clumsy code:

if let i = dict.index(forKey: "one"
) {
dict
.values[i].append(1) // no copy here

}
else
{
dict[
"one"] = [1
]
}

Both the keys and values collections share the same index type as

Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one"
) {
dict
.values[i].append(1
)
}
else
{
dict[
"one"] = [1
]
}

Detailed design

• The standard library introduces two new collection types:

DictionaryKeys and DictionaryValues.

• A Dictionary's keys and values property types change from

LazyMapCollection to these new types.

• The new collection types are not directly constructable. They are

presented only as views into a dictionary.

struct Dictionary<Key: Hashable, Value>: ...
{

var keys: DictionaryKeys<Key, Value> { get
}

var values: DictionaryValues<Key, Value> { get set
}

// Remaining declarations

}

/// A collection view of a dictionary's keys.
struct DictionaryKeys<Key: Hashable, Value>
: Collection {

typealias Index = DictionaryIndex<Key, Value>

subscript(i: Index) -> Key { get
}

// Other `Collection` requirements

}

/// A mutable collection view of a dictionary's values.
struct DictionaryValues<Key: Hashable, Value>
: MutableCollection {

typealias Index = DictionaryIndex<Key, Value>

subscript(i: Index) -> Value { get set
}

// Other `Collection` requirements

}

A sample implementation of this proposal can be found in this branch.

Impact on existing code

The performance improvements of using the new DictionaryKeys type and the

mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this

proposal should not produce a major impact on existing code. The only
impact on existing code exists where a program explicitly specifies the
type of a dictionary's keysor values property. The fix is to change the
specified type.

Alternatives considered

The Generics Manifesto lists nested generics as a goal. This could impact

the naming and structure of these new collection types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>,

these types could be Dictionary<Key, Value>.Keys and Dictionary<Key,

.Values. However, because many types in the standard library may be

revisited once such a feature is available (indices, iterators, etc.), the
current lack of nesting shouldn't prevent consideration of this proposal.

It could be possible to add additional compiler features that manage

mutation through existing key-based subscripting without the copy-on-write
problems of the current implementation. I don't know enough about how that
would be implemented to speak to its feasibility or level of effort. Such a
feature would reduce the need for a mutable DictionaryValues collection.

···

On October 12, 2016 at 8:54:45 AM, Rien via swift-evolution ( swift-evolution@swift.org) wrote:

On 11 Oct 2016, at 23:28, Nate Cook via swift-evolution < swift-evolution@swift.org> wrote:
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Stephan Knitelius) #19

+1

On Wed, 12 Oct 2016 at 09:09 Said Sikira via swift-evolution <

···

swift-evolution@swift.org> wrote:

+1

On October 12, 2016 at 8:54:45 AM, Rien via swift-evolution ( > swift-evolution@swift.org) wrote:

Beautiful, +1

Rien

> On 11 Oct 2016, at 23:28, Nate Cook via swift-evolution < > swift-evolution@swift.org> wrote:
>
> Introduction
>
> This proposal addresses significant unexpected performance gaps when
using dictionaries. It introduces type-specific collections for a
Dictionary instance's keys and values properties.
>
> New DictionaryKeys and DictionaryValues collections provide efficient
key lookup and mutable access to dictionary values, enabling updates to be
performed in-place and allowing copy-on-write optimization of stored values.
>
> Motivation
>
> This proposal address two problems:
>
> • The Dictionary type keys implementation is inefficient, because
LazyMapCollection doesn't know how to forward lookups to the underlying
dictionary storage.
> • Dictionaries do not offer value-mutating APIs. The mutating key-based
subscript wraps values in an Optional. This prevents types with
copy-on-write optimizations from recognizing they are singly referenced.
> This proposal uses the following [String: [Int]] dictionary to
demonstrate these problems:
>
> var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
> Inefficient dict.keys Search
>
> Swift coders normally test key membership using nil checks or
underscored optional bindings:
>
> if dict["one"] != nil
> {
>
> // ...
>
> }
>
> if let _ = dict["one"
> ] {
>
> // ...
>
> }
>
> These approaches provide the expected performance of a dictionary lookup
but they read neither well nor "Swifty". Checking keys reads much better
but introduces a serious performance penalty: this approach requires a
linear search through a dictionary's keys to find a match.
>
> if dict.keys.contains("one") {
> // ...
> }
>
> A similar dynamic plays out when comparing dict.index(forKey:) and
dict.keys.index(of:).
>
> Inefficient Value Mutation
>
> Dictionary values can be modified through the keyed subscript by direct
reassignment or by using optional chaining. Both of these statements append
1 to the array stored by the key "one":
>
> // Direct re-assignment
>
> dict[
> "one"] = (dict["one"] ?? []) + [1
> ]
>
>
> // Optional chaining
>
> dict[
> "one"]?.append(1)
> Both approaches present problems. The first is complex and hard to read.
The second ignores the case where "one" is not a key in the dictionary. It
forces its check into a higher branch and encourages forced unwrapping.
Furthermore, neither approach allows the array to grow in place. They
introduce an unnecessary copy of the array's contents even though dict is
the sole holder of its storage.
>
> Adding mutation to a dictionary's index-based subscripting isn't
possible. Changing a key stored at a particular index would almost
certainly modify its hash value, rendering the index incorrect. This
violates the requirements of the MutableCollection protocol.
>
> Proposed Solution
>
> This proposal adds a custom collection for the keys and values
dictionary properties. This follows the example set by String, which
presents multiple views of its contents. A new DictionaryKeys collection
introduces efficient key lookup, while a new DictionaryValues collection
provides a mutable collection interface to dictionary values.
>
> These changes introduce a simple and efficient way of checking whether a
dictionary includes a key:
>
> // Performant
> if dict.keys.contains("one"
> ) {
>
> // ...
>
> }
>
> As a mutable collection, values enables modification without copies or
clumsy code:
>
> if let i = dict.index(forKey: "one"
> ) {
> dict
> .values[i].append(1) // no copy here
>
> }
> else
> {
> dict[
> "one"] = [1
> ]
> }
>
> Both the keys and values collections share the same index type as
Dictionary. This allows the above sample to be rewritten as:
>
> // Using `dict.keys.index(of:)`
> if let i = dict.keys.index(of: "one"
> ) {
> dict
> .values[i].append(1
> )
> }
> else
> {
> dict[
> "one"] = [1
> ]
> }
>
> Detailed design
>
> • The standard library introduces two new collection types:
DictionaryKeys and DictionaryValues.
> • A Dictionary's keys and values property types change from
LazyMapCollection to these new types.
> • The new collection types are not directly constructable. They are
presented only as views into a dictionary.
> struct Dictionary<Key: Hashable, Value>: ...
> {
>
> var keys: DictionaryKeys<Key, Value> { get
> }
>
> var values: DictionaryValues<Key, Value> { get set
> }
>
>
> // Remaining declarations
>
> }
>
>
> /// A collection view of a dictionary's keys.
> struct DictionaryKeys<Key: Hashable, Value>
> : Collection {
>
> typealias Index = DictionaryIndex<Key, Value>
>
>
> subscript(i: Index) -> Key { get
> }
>
>
> // Other `Collection` requirements
>
> }
>
>
> /// A mutable collection view of a dictionary's values.
> struct DictionaryValues<Key: Hashable, Value>
> : MutableCollection {
>
> typealias Index = DictionaryIndex<Key, Value>
>
>
> subscript(i: Index) -> Value { get set
> }
>
>
> // Other `Collection` requirements
>
> }
>
> A sample implementation of this proposal can be found in this branch.
>
> Impact on existing code
>
> The performance improvements of using the new DictionaryKeys type and
the mutability of the DictionaryValuescollection are both additive in
nature.
>
> Most uses of these properties are transitory in nature. Adopting this
proposal should not produce a major impact on existing code. The only
impact on existing code exists where a program explicitly specifies the
type of a dictionary's keysor values property. The fix is to change the
specified type.
>
> Alternatives considered
>
> The Generics Manifesto lists nested generics as a goal. This could
impact the naming and structure of these new collection types.
>
> Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>,
these types could be Dictionary<Key, Value>.Keys and Dictionary<Key,
>.Values. However, because many types in the standard library may be
revisited once such a feature is available (indices, iterators, etc.), the
current lack of nesting shouldn't prevent consideration of this proposal.
>
> It could be possible to add additional compiler features that manage
mutation through existing key-based subscripting without the copy-on-write
problems of the current implementation. I don't know enough about how that
would be implemented to speak to its feasibility or level of effort. Such a
feature would reduce the need for a mutable DictionaryValues collection.
> _______________________________________________
> swift-evolution mailing list
> swift-evolution@swift.org
> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(TJ Usiyan) #20

+1 from me. Seems like a solid change.

···

On Wed, Oct 12, 2016 at 12:39 AM, Jacob Bandes-Storch via swift-evolution < swift-evolution@swift.org> wrote:

+1. Haven't hit this issue personally, but I agree it's important and the
proposed solution seems like the right fix.

On Tue, Oct 11, 2016 at 2:28 PM, Nate Cook via swift-evolution < > swift-evolution@swift.org> wrote:

Introduction

This proposal addresses significant unexpected performance gaps when
using dictionaries. It introduces type-specific collections for a
Dictionary instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient
key lookup and mutable access to dictionary values, enabling updates to be
performed in-place and allowing copy-on-write optimization of stored values.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#motivation>
Motivation

This proposal address two problems:

   - The Dictionary type keys implementation is inefficient, because
   LazyMapCollection doesn't know how to forward lookups to the
   underlying dictionary storage.
   - Dictionaries do not offer value-mutating APIs. The mutating
   key-based subscript wraps values in an Optional. This prevents types
   with copy-on-write optimizations from recognizing they are singly
   referenced.

This proposal uses the following [String: [Int]] dictionary to
demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-dictkeys-search>
Inefficient dict.keys Search

Swift coders normally test key membership using nil checks or
underscored optional bindings:

if dict["one"] != nil {
    // ...
}if let _ = dict["one"] {
    // ...
}

These approaches provide the expected performance of a dictionary lookup
but they read neither well nor "Swifty". Checking keys reads much better
but introduces a serious performance penalty: this approach requires a
linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}

A similar dynamic plays out when comparing dict.index(forKey:) and
dict.keys.index(of:).

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#inefficient-value-mutation>Inefficient
Value Mutation

Dictionary values can be modified through the keyed subscript by direct
reassignment or by using optional chaining. Both of these statements append
1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? []) + [1]
// Optional chaining
dict["one"]?.append(1)

Both approaches present problems. The first is complex and hard to read.
The second ignores the case where "one" is not a key in the dictionary.
It forces its check into a higher branch and encourages forced unwrapping.
Furthermore, neither approach allows the array to grow in place. They
introduce an unnecessary copy of the array's contents even though dict is
the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't
possible. Changing a key stored at a particular index would almost
certainly modify its hash value, rendering the index incorrect. This
violates the requirements of the MutableCollection protocol.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#proposed-solution>Proposed
Solution

This proposal adds a custom collection for the keys and values dictionary
properties. This follows the example set by String, which presents
multiple views of its contents. A new DictionaryKeys collection
introduces efficient key lookup, while a new DictionaryValues collection
provides a mutable collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a
dictionary includes a key:

// Performantif dict.keys.contains("one") {
    // ...
}

As a mutable collection, values enables modification without copies or
clumsy code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1) // no copy here
} else {
    dict["one"] = [1]
}

Both the keys and values collections share the same index type as
Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#detailed-design>Detailed
design

   - The standard library introduces two new collection types:
   DictionaryKeys and DictionaryValues.
   - A Dictionary's keys and values property types change from
   LazyMapCollection to these new types.
   - The new collection types are not directly constructable. They are
   presented only as views into a dictionary.

struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}
/// A collection view of a dictionary's keys.struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}
/// A mutable collection view of a dictionary's values.struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}

A sample implementation of this proposal can be found in this branch
<https://github.com/natecook1000/swift/tree/nc-dictionary>.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#impact-on-existing-code>Impact
on existing code

The performance improvements of using the new DictionaryKeys type and
the mutability of the DictionaryValuescollection are both additive in
nature.

Most uses of these properties are transitory in nature. Adopting this
proposal should not produce a major impact on existing code. The only
impact on existing code exists where a program explicitly specifies the
type of a dictionary's keysor values property. The fix is to change the
specified type.

<https://gist.github.com/natecook1000/473720ba072fa5a0cd5e6c913de75fe1#alternatives-considered>Alternatives
considered

The Generics Manifesto
<https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md> lists
nested generics as a goal. This could impact the naming and structure of
these new collection types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>,
these types could be Dictionary<Key, Value>.Keys and Dictionary<Key,
>.Values. However, because many types in the standard library may
be revisited once such a feature is available (indices, iterators, etc.),
the current lack of nesting shouldn't prevent consideration of this
proposal.
It could be possible to add additional compiler features that manage
mutation through existing key-based subscripting without the copy-on-write
problems of the current implementation. I don't know enough about how that
would be implemented to speak to its feasibility or level of effort. Such a
feature would reduce the need for a mutable DictionaryValues collection.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution