[Proposal Draft] Provide Custom Collections for Dictionary Keys and Values

The issue addressed is real; I’m not sure this is the best approach.

In particular, unless I’m missing something obvious, the ownership strategy here would have to be:

- `DictionaryKeys` and `DictionaryValues` would each induce the expected +1 retain on the underlying storage
- `DictionaryValues`’s mutations avoid triggering COW on the underlying storage by skipping the usual ownership check

…as otherwise it’s unclear how you’d do those in-place mutations (and this seems to be how the implementation works...is that correct?).

That's not quite right—when you access these views through the dictionary, they do not increment the storage retain count. This is the way slicing and views currently work on other mutable types. For example, when you reverse a slice of an array in-place, the slice doesn't get its own duplicate storage:

var a = Array(1...10)
a[0..<5].reverse()
a == [5, 4, 3, 2, 1, 6, 7, 8, 9, 10]

However, if you create a new variable out of the slice and reverse that, the slice does get its own storage:

var b = Array(1...10)
var bSlice = b[0..<5]
bSlice.reverse()
bSlice == [5, 4, 3, 2, 1]
b == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Strings and their views work the same way:

var s = "abcdefg"
s.characters.append("H") // storage updated in place
s == "abcdefgH"

var sChars = s.characters // no copy yet
sChars.removeLast() // sChars gets its own copy before the mutation
s == "abcdefgH"
String(sChars) == "abcdefg"

var t = s // no copy yet
t.characters.removeLast() // t gets a new copy here
s == "abcdefgH"
t == "abcdefg"

I don't know the name of the compiler feature that enables this, but it's a critical part of the way views and slices work.

With that design, it seems like you’d wind up allowing things like the below:

  // example A
  let foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  let bar = foo // shared storage, no COW
  foo.values[foo.index(of: “abc”)!].append(789) // update shared storage, no COW

  // shared storage mutated,
  // despite (a) both being `let` and (b) only foo.values getting touched
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 789]

Example A isn't allowed—if foo and bar are both immutable, both of their `values` collections are also immutable, so there's no way to modify their shared storage.

  // example B
  var foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  var bar = foo // shared storage, no COW
  foo.values[foo.index(of: “abc”)!].append(789)

  // shared storage mutated only foo.values getting touched
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 789]

Example B is incorrect—the mutation at `foo.values[...].append(789)` triggers a copy of the entire dictionary's underlying storage before allowing the mutation, since it knows that storage isn't uniquely referenced.

  // example C
  var foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  var bar = foo
  bar[“abc”] = [1, 2, 3, 4] // COW triggered here, no shared storage
  foo.values[foo.index(of: “abc”)!].append(789)

  // only `foo`’s storage mutated, b/c change to `bar` triggered COW
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 4]

This is the current behavior and would remain the same after the proposed the changes.

…where both A (by itself) and the B/C contrast seem very unwelcome.

Also, even if we assume we only ever make *responsible* use, having the stdlib include such directly-mutating views would seem likely to complicate any future concurrency plans.

To reiterate, I think the issue being addressed here is extremely important…I just don’t think I can get behind this type of solution (unless I’m grossly misunderstanding its mechanics).

Nate

···

On Oct 12, 2016, at 9:32 AM, plx via swift-evolution <swift-evolution@swift.org> wrote:

Thanks for your feedback! Response below.

Hi,

I very much think the points mentioned in the motivation are worth addressing (and IMO this is not an area where “maybe the optimizer can be made smarter” can cut it; I want performance guarantees, not hopes).

[snip]

On a shallow read I like presented approach, except for

Both the keys and values collections share the same index type as Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
   dict.values[i].append(1)
} else {
   dict["one"] = [1]
}

The asymmetry between the if / else branches seems ugly to me. That is once obtaining the value “directly” from dict, and once through the values-view. I don’t have a great solution here, but is is possible to subscript the dict by its `Index` as well as its `Key`?

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
   dict[i].append(1)
} else {
   dict["one"] = [1]
}

I share your concern with this, and there is an approach that would make this kind of interface possible. Basically, what you're describing here would necessitate changing Dictionary to act like a mutable collection of values, and instead of presenting `keys` and `values` views, we would probably offer `keys` and `keysAndValues` (or something) views. With that new model, we'd have code like the following:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]

// Iterating the dictionary itself would be like we now iterate the values collection
for val in dict {
  print(val)
}
// Output is just the values (unordered as usual)
// [1]
// [3, 3, 3]
// [2, 2]

// Iterating a new view would act like the dictionary currently does
for (key, val) in dict.keysAndValues {
  print(key, val)
}
// "one", [1]
// "three", [3, 3, 3]
// "two", [2, 2]

Any sequence or collections operations on the dictionary itself would only interact with values:

print(dict.first)
// Optional([1])
print(dict.first(where: { $0.count > 2 }))
// Optional([3, 3, 3])

I'm not strictly opposed to this approach, but I do prefer the way the current dictionary implementation presents its elements as key-value pairs. When you iterate a dictionary you're seeing all of its contents, which I like, but there are clear tradeoffs between the two. Making Dictionary a collection of values is also a much more significant change than the one proposed—we'd need to do some research into the ways dictionaries are used to know how much larger an effect that would be.

What do others think of this as an alternative solution for the motivating issues? Does anyone actually use indices right now to work with dictionaries, or is key-based read/write pretty much the only interface?

On another note, I’m not sure whether there is a way (or whether it’s even worth trying) to avoid hashing the key twice when the `else` branch is taken.

This is outside the scope of the proposal, but as far as I can see I don't think there's a solution for this that wouldn't overly expose the internal workings of the dictionary.

Thanks again,
Nate

···

On Oct 12, 2016, at 5:40 AM, Daniel Vollmer via swift-evolution <swift-evolution@swift.org> wrote:

On 11 Oct 2016, at 23:28, Nate Cook via swift-evolution <swift-evolution@swift.org> wrote:

Thanks for your feedback! Response below.

Hi,

I very much think the points mentioned in the motivation are worth
addressing (and IMO this is not an area where “maybe the optimizer can be
made smarter” can cut it; I want performance guarantees, not hopes).

[snip]

On a shallow read I like presented approach, except for

Both the keys and values collections share the same index type as
Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
   dict.values[i].append(1)
} else {
   dict["one"] = [1]
}

The asymmetry between the if / else branches seems ugly to me. That is
once obtaining the value “directly” from dict, and once through the
values-view. I don’t have a great solution here, but is is possible to
subscript the dict by its `Index` as well as its `Key`?

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
   dict[i].append(1)
} else {
   dict["one"] = [1]
}

I share your concern with this, and there is an approach that would make
this kind of interface possible. Basically, what you're describing here
would necessitate changing Dictionary to act like a mutable collection of
values, and instead of presenting `keys` and `values` views, we would
probably offer `keys` and `keysAndValues` (or something) views. With that
new model, we'd have code like the following:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]

// Iterating the dictionary itself would be like we now iterate the values
collection
for val in dict {
print(val)
}
// Output is just the values (unordered as usual)
// [1]
// [3, 3, 3]
// [2, 2]

// Iterating a new view would act like the dictionary currently does
for (key, val) in dict.keysAndValues {
print(key, val)
}
// "one", [1]
// "three", [3, 3, 3]
// "two", [2, 2]

Any sequence or collections operations on the dictionary itself would only
interact with values:

print(dict.first)
// Optional([1])
print(dict.first(where: { $0.count > 2 }))
// Optional([3, 3, 3])

I'm not strictly opposed to this approach, but I do prefer the way the
current dictionary implementation presents its elements as key-value pairs.
When you iterate a dictionary you're seeing *all* of its contents, which
I like, but there are clear tradeoffs between the two. Making Dictionary a
collection of values is also a much more significant change than the one
proposed—we'd need to do some research into the ways dictionaries are used
to know how much larger an effect that would be.

What do others think of this as an alternative solution for the motivating
issues? Does anyone actually use indices right now to work with
dictionaries, or is key-based read/write pretty much the only interface?

I much, much prefer your proposed solution to this alternative. Most of it
is just a gut reaction, I have to admit. However, let me try to verbalize
some reasons:

For one, the performance issues are the motivating problem, and they are
solved with minimal disruption to the current Dictionary API in your
currently proposed solution. Second, I think the asymmetry identified
(while aesthetically less than perfectly pleasing) is in the nature of
opaque indices and is consistent with precedent in String. Finally, I think
it's ideal that we can introduce collection types to learners by pointing
out that arrays are notionally index-based and dictionaries are key-based
(however these types are actually implemented behind the scenes);
permitting direct subscripting by dictionary indices muddies that
distinction.

On another note, I’m not sure whether there is a way (or whether it’s even

···

On Wed, Oct 12, 2016 at 10:31 AM, Nate Cook via swift-evolution < swift-evolution@swift.org> wrote:

On Oct 12, 2016, at 5:40 AM, Daniel Vollmer via swift-evolution < > swift-evolution@swift.org> wrote:
On 11 Oct 2016, at 23:28, Nate Cook via swift-evolution < > swift-evolution@swift.org> wrote:
worth trying) to avoid hashing the key twice when the `else` branch is
taken.

This is outside the scope of the proposal, but as far as I can see I don't
think there's a solution for this that wouldn't overly expose the internal
workings of the dictionary.

Thanks again,
Nate

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

I think this is a lovely approach to solving this API design problem.

One thing I don’t quite understand yet is how these kinds of mutable views interact with copy on write semantics. COW rules can be subtle, and these views seem to put an extra twist on top that seems hard to understand/explain.

I would expect DictionaryValues to behave like a separate copy of the dictionary:

   var dict = [“foo”: 1, “bar”: 2]
   let i = dict.keys.index(of: “foo”)!
   var values = dict.values
   values[i] = 42 // COW copy
   print(dict[“foo”]) // => 1
   dict.values = values // Original storage is released
   print(dict[“foo”]) // => 42

Given that `values` contains a strong reference to the storage, I was curious to find out how this copy is elided when we write `dict.values[i] = 42`. So I tried building your branch, and I found that mutating `values` has an immediate effect on the dictionary as well:

   let dict = [“foo”: 1, “bar”: 2] // Note let, not var
   let copy = dict
   let i = dict.keys.index(of: “foo”)!
   var values = dict.values
   values[i] = 42
   print(dict[“foo”]) // => 42 (?!)
   print(copy[“foo”]) // => 42 (?!?!)

This is not the intended behavior, right?

Ha, no! Thanks for the bug report. :)

See my response here for more about COW: [swift-evolution] [Proposal Draft] Provide Custom Collections for Dictionary Keys and Values

Nate

···

On Oct 12, 2016, at 10:58 AM, Károly Lőrentey via swift-evolution <swift-evolution@swift.org> wrote:

On 2016-10-11, at 23:28, Nate Cook via swift-evolution <swift-evolution@swift.org> wrote:

Introduction

This proposal addresses significant unexpected performance gaps when using dictionaries. It introduces type-specific collections for a Dictionary instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key lookup and mutable access to dictionary values, enabling updates to be performed in-place and allowing copy-on-write optimization of stored values.

Motivation

This proposal address two problems:

The Dictionary type keys implementation is inefficient, because LazyMapCollection doesn't know how to forward lookups to the underlying dictionary storage.
Dictionaries do not offer value-mutating APIs. The mutating key-based subscript wraps values in an Optional. This prevents types with copy-on-write optimizations from recognizing they are singly referenced.
This proposal uses the following [String: [Int]] dictionary to demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
Inefficient dict.keys Search

Swift coders normally test key membership using nil checks or underscored optional bindings:

if dict["one"] != nil {
    // ...
}
if let _ = dict["one"] {
    // ...
}
These approaches provide the expected performance of a dictionary lookup but they read neither well nor "Swifty". Checking keys reads much better but introduces a serious performance penalty: this approach requires a linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}
A similar dynamic plays out when comparing dict.index(forKey:) and dict.keys.index(of:).

Inefficient Value Mutation

Dictionary values can be modified through the keyed subscript by direct reassignment or by using optional chaining. Both of these statements append 1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? ) + [1]

// Optional chaining
dict["one"]?.append(1)
Both approaches present problems. The first is complex and hard to read. The second ignores the case where "one" is not a key in the dictionary. It forces its check into a higher branch and encourages forced unwrapping. Furthermore, neither approach allows the array to grow in place. They introduce an unnecessary copy of the array's contents even though dict is the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't possible. Changing a key stored at a particular index would almost certainly modify its hash value, rendering the index incorrect. This violates the requirements of the MutableCollection protocol.

Proposed Solution

This proposal adds a custom collection for the keys and values dictionary properties. This follows the example set by String, which presents multiple views of its contents. A new DictionaryKeys collection introduces efficient key lookup, while a new DictionaryValues collection provides a mutable collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a dictionary includes a key:

// Performant
if dict.keys.contains("one") {
    // ...
}
As a mutable collection, values enables modification without copies or clumsy code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1) // no copy here
} else {
    dict["one"] = [1]
}
Both the keys and values collections share the same index type as Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}
Detailed design

The standard library introduces two new collection types: DictionaryKeys and DictionaryValues.
A Dictionary's keys and values property types change from LazyMapCollection to these new types.
The new collection types are not directly constructable. They are presented only as views into a dictionary.
struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}

/// A collection view of a dictionary's keys.
struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}

/// A mutable collection view of a dictionary's values.
struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}
A sample implementation of this proposal can be found in this branch.

Impact on existing code

The performance improvements of using the new DictionaryKeys type and the mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this proposal should not produce a major impact on existing code. The only impact on existing code exists where a program explicitly specifies the type of a dictionary's keysor values property. The fix is to change the specified type.

Alternatives considered

The Generics Manifesto lists nested generics as a goal. This could impact the naming and structure of these new collection types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>, these types could be Dictionary<Key, Value>.Keys and Dictionary<Key, Value>.Values. However, because many types in the standard library may be revisited once such a feature is available (indices, iterators, etc.), the current lack of nesting shouldn't prevent consideration of this proposal.

It could be possible to add additional compiler features that manage mutation through existing key-based subscripting without the copy-on-write problems of the current implementation. I don't know enough about how that would be implemented to speak to its feasibility or level of effort. Such a feature would reduce the need for a mutable DictionaryValues collection.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Ah, I see plx has already brought this up. So this is a bug in the implementation, and (presumably) DictionaryValue’s mutable subscript addressor is supposed to take care of COW semantics without introducing needless copying.

(It’s too bad we aren’t supposed to use these magical addressors outside stdlib; the same issues tend to come up in custom collections, too.) :-(

···

On 2016-10-12, at 17:58, Károly Lőrentey via swift-evolution <swift-evolution@swift.org> wrote:

I think this is a lovely approach to solving this API design problem.

One thing I don’t quite understand yet is how these kinds of mutable views interact with copy on write semantics. COW rules can be subtle, and these views seem to put an extra twist on top that seems hard to understand/explain.

I would expect DictionaryValues to behave like a separate copy of the dictionary:

   var dict = [“foo”: 1, “bar”: 2]
   let i = dict.keys.index(of: “foo”)!
   var values = dict.values
   values[i] = 42 // COW copy
   print(dict[“foo”]) // => 1
   dict.values = values // Original storage is released
   print(dict[“foo”]) // => 42

Given that `values` contains a strong reference to the storage, I was curious to find out how this copy is elided when we write `dict.values[i] = 42`. So I tried building your branch, and I found that mutating `values` has an immediate effect on the dictionary as well:

   let dict = [“foo”: 1, “bar”: 2] // Note let, not var
   let copy = dict
   let i = dict.keys.index(of: “foo”)!
   var values = dict.values
   values[i] = 42
   print(dict[“foo”]) // => 42 (?!)
   print(copy[“foo”]) // => 42 (?!?!)

This is not the intended behavior, right?

Károly

On 2016-10-11, at 23:28, Nate Cook via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

Introduction

This proposal addresses significant unexpected performance gaps when using dictionaries. It introduces type-specific collections for a Dictionary instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key lookup and mutable access to dictionary values, enabling updates to be performed in-place and allowing copy-on-write optimization of stored values.

<dictionary-proposal.md · GitHub

This proposal address two problems:

The Dictionary type keys implementation is inefficient, because LazyMapCollection doesn't know how to forward lookups to the underlying dictionary storage.
Dictionaries do not offer value-mutating APIs. The mutating key-based subscript wraps values in an Optional. This prevents types with copy-on-write optimizations from recognizing they are singly referenced.
This proposal uses the following [String: [Int]] dictionary to demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
<dictionary-proposal.md · GitHub dict.keys Search

Swift coders normally test key membership using nil checks or underscored optional bindings:

if dict["one"] != nil {
    // ...
}
if let _ = dict["one"] {
    // ...
}
These approaches provide the expected performance of a dictionary lookup but they read neither well nor "Swifty". Checking keys reads much better but introduces a serious performance penalty: this approach requires a linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}
A similar dynamic plays out when comparing dict.index(forKey:) and dict.keys.index(of:).

<dictionary-proposal.md · GitHub Value Mutation

Dictionary values can be modified through the keyed subscript by direct reassignment or by using optional chaining. Both of these statements append 1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? ) + [1]

// Optional chaining
dict["one"]?.append(1)
Both approaches present problems. The first is complex and hard to read. The second ignores the case where "one" is not a key in the dictionary. It forces its check into a higher branch and encourages forced unwrapping. Furthermore, neither approach allows the array to grow in place. They introduce an unnecessary copy of the array's contents even though dict is the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't possible. Changing a key stored at a particular index would almost certainly modify its hash value, rendering the index incorrect. This violates the requirements of the MutableCollection protocol.

<dictionary-proposal.md · GitHub Solution

This proposal adds a custom collection for the keys and values dictionary properties. This follows the example set by String, which presents multiple views of its contents. A new DictionaryKeys collection introduces efficient key lookup, while a new DictionaryValues collection provides a mutable collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a dictionary includes a key:

// Performant
if dict.keys.contains("one") {
    // ...
}
As a mutable collection, values enables modification without copies or clumsy code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1) // no copy here
} else {
    dict["one"] = [1]
}
Both the keys and values collections share the same index type as Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}
<dictionary-proposal.md · GitHub design

The standard library introduces two new collection types: DictionaryKeys and DictionaryValues.
A Dictionary's keys and values property types change from LazyMapCollection to these new types.
The new collection types are not directly constructable. They are presented only as views into a dictionary.
struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}

/// A collection view of a dictionary's keys.
struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}

/// A mutable collection view of a dictionary's values.
struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}
A sample implementation of this proposal can be found in this branch <https://github.com/natecook1000/swift/tree/nc-dictionary&gt;\.

<dictionary-proposal.md · GitHub on existing code

The performance improvements of using the new DictionaryKeys type and the mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this proposal should not produce a major impact on existing code. The only impact on existing code exists where a program explicitly specifies the type of a dictionary's keysor values property. The fix is to change the specified type.

<dictionary-proposal.md · GitHub considered

The Generics Manifesto <https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md&gt; lists nested generics as a goal. This could impact the naming and structure of these new collection types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>, these types could be Dictionary<Key, Value>.Keys and Dictionary<Key, Value>.Values. However, because many types in the standard library may be revisited once such a feature is available (indices, iterators, etc.), the current lack of nesting shouldn't prevent consideration of this proposal.

It could be possible to add additional compiler features that manage mutation through existing key-based subscripting without the copy-on-write problems of the current implementation. I don't know enough about how that would be implemented to speak to its feasibility or level of effort. Such a feature would reduce the need for a mutable DictionaryValues collection.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Just to clarify: It seems like the only ABI-affecting change here is the type of keys/values. As you
note at the end of your proposal, this should just be Dictionary.Keys/Dictionary.Values regardless
of whether we implement this proposal or not, in which case this can
be punted for Swift 4.

No, it can't, in the sense that just making them typealiases would be
insufficient to create resilience.

It should be fine to keep .Keys/.Values resilient so that we can
change their implementation details later if we want.

On the actual proposal: this is a pretty reasonable given Swift’s
current design and constraints. That said, I expect pushing forward on
this kind of thing right now is premature given the goals of Swift
4. A major aspect of Swift 4 is reworking the way CoW semantics
function internally, which could drastically affect the way we
approach this problem.

I’d really like if we could eliminate the “double search/hash” in the
no-existing-key case. There are ways to do this really cleanly, but
they probably involve more advanced CoW-safety propagation. In
particular, you want some way for the collection to return its search
state to the caller so that they can hand it back to insertion to just
resume from there.

For instance:

map.entries[key] // An enum like Found(Value) | NotFound(SearchState)
   .withDefault(value: ) // Unwrap the enum by completing the NotFound(SearchState)
   .append(1) // Now we have a value in both cases, we can append!

Or more complex:

map.entries[key]
   .withDefault { /* logic that computes value */ }
   .append(1)

I think this can be made to work in the current system if withDefault
is actually `[withDefault:]`, which is fine but a bit weird from a
user’s perspective.

IMO this should be written as follows:

  map[key, default: ].append(1)

In an ideal world the user could actually pattern match on the result
of `entries[key]`. In this way they could match on it and perform
special logic in both cases for really complex situations. This would
make withDefault “just a convenience”, so we aren’t pressured to add
more methods like it every time someone has a new Even More Complex
use-case. e.g.:

switch map.entries[key] {
case .Found(entry):
  if entry.value == 10 {
    entry.remove()
    print(“Found a value too many times! Moving key to fast-path auxiliary structure…”)
  } else {
    entry.value += 1
  }
case .NotFound(entry):
  entry.insert(1)
  print(“Found a value for the first time! Registering a bunch of extra stuff…”)
}

But again, this is all dependent on a much more powerful SIL/ARC, and
we just don’t know what we’re going to get at this stage.

This compiles today:

  extension Dictionary {
    subscript(k: Key, body: (inout Value?)->()) -> Void {
      get {
        // Exercise for the reader. Efficient and safe implementation only
        // possible inside the stdlib.
      }
    }
  }

  map[key] { v in
    if let found = v {
      if found == 10 { v = nil; print("Found too many") }
      else { v = found + 1 }
    }
    else {
      v = 1
      print("Found first")
    }
  }

No need, really, to use subscript; it could be spelled:

  map.withValue(forKey: key) { ... }

ersump'n.

···

on Wed Oct 12 2016, Alexis <swift-evolution@swift.org> wrote:

--
-Dave

A late-arriving strong +1 for me. The index-related stuff is elegant and much needed. I’m surprised
to learn that dict.keys and dict.values are copies and not already
views!

They are views.

···

on Fri Oct 14 2016, Paul Cantrell <swift-evolution@swift.org> wrote:

Clearly they should be.

Question: I hit a closely related performance wall just last week, doing something like this:

    for k in dict.keys {
        dict.values[k].append(1)
    }

I assume / hope the proposal would also support this?

    for i in dict.indices {
        dict.values[i].append(1)
    }

…or would it be this?

    for i in dict.keys.indices {
        dict.values[i].append(1)
    }

…or either?

Cheers, P

On Oct 11, 2016, at 4:28 PM, Nate Cook via swift-evolution > <swift-evolution@swift.org> wrote:

Introduction

This proposal addresses significant unexpected performance gaps when using dictionaries. It

introduces type-specific collections for a Dictionary instance's keys and values properties.

New DictionaryKeys and DictionaryValues collections provide efficient key lookup and mutable access to dictionary values, enabling updates to be performed in-place and allowing copy-on-write optimization of stored values.

<dictionary-proposal.md · GitHub

This proposal address two problems:

The Dictionary type keys implementation is inefficient, because LazyMapCollection doesn't know how to forward lookups to the underlying dictionary storage.
Dictionaries do not offer value-mutating APIs. The mutating key-based subscript wraps values in an Optional. This prevents types with copy-on-write optimizations from recognizing they are singly referenced.
This proposal uses the following [String: [Int]] dictionary to demonstrate these problems:

var dict = ["one": [1], "two": [2, 2], "three": [3, 3, 3]]
<dictionary-proposal.md · GitHub dict.keys Search

Swift coders normally test key membership using nil checks or underscored optional bindings:

if dict["one"] != nil {
    // ...
}
if let _ = dict["one"] {
    // ...
}
These approaches provide the expected performance of a dictionary lookup but they read neither well nor "Swifty". Checking keys reads much better but introduces a serious performance penalty: this approach requires a linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {
    // ...
}
A similar dynamic plays out when comparing dict.index(forKey:) and dict.keys.index(of:).

<dictionary-proposal.md · GitHub Value Mutation

Dictionary values can be modified through the keyed subscript by direct reassignment or by using optional chaining. Both of these statements append 1 to the array stored by the key "one":

// Direct re-assignment
dict["one"] = (dict["one"] ?? ) + [1]

// Optional chaining
dict["one"]?.append(1)
Both approaches present problems. The first is complex and hard to read. The second ignores the case where "one" is not a key in the dictionary. It forces its check into a higher branch and encourages forced unwrapping. Furthermore, neither approach allows the array to grow in place. They introduce an unnecessary copy of the array's contents even though dict is the sole holder of its storage.

Adding mutation to a dictionary's index-based subscripting isn't possible. Changing a key stored at a particular index would almost certainly modify its hash value, rendering the index incorrect. This violates the requirements of the MutableCollection protocol.

<dictionary-proposal.md · GitHub Solution

This proposal adds a custom collection for the keys and values dictionary properties. This follows the example set by String, which presents multiple views of its contents. A new DictionaryKeys collection introduces efficient key lookup, while a new DictionaryValues collection provides a mutable collection interface to dictionary values.

These changes introduce a simple and efficient way of checking whether a dictionary includes a key:

// Performant
if dict.keys.contains("one") {
    // ...
}
As a mutable collection, values enables modification without copies or clumsy code:

if let i = dict.index(forKey: "one") {
    dict.values[i].append(1) // no copy here
} else {
    dict["one"] = [1]
}
Both the keys and values collections share the same index type as Dictionary. This allows the above sample to be rewritten as:

// Using `dict.keys.index(of:)`
if let i = dict.keys.index(of: "one") {
    dict.values[i].append(1)
} else {
    dict["one"] = [1]
}
<dictionary-proposal.md · GitHub design

The standard library introduces two new collection types: DictionaryKeys and DictionaryValues.
A Dictionary's keys and values property types change from LazyMapCollection to these new types.
The new collection types are not directly constructable. They are presented only as views into a dictionary.
struct Dictionary<Key: Hashable, Value>: ... {
    var keys: DictionaryKeys<Key, Value> { get }
    var values: DictionaryValues<Key, Value> { get set }

    // Remaining declarations
}

/// A collection view of a dictionary's keys.
struct DictionaryKeys<Key: Hashable, Value>: Collection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Key { get }

    // Other `Collection` requirements
}

/// A mutable collection view of a dictionary's values.
struct DictionaryValues<Key: Hashable, Value>: MutableCollection {
    typealias Index = DictionaryIndex<Key, Value>
    subscript(i: Index) -> Value { get set }

    // Other `Collection` requirements
}
A sample implementation of this proposal can be found in this branch <https://github.com/natecook1000/swift/tree/nc-dictionary&gt;\.

<dictionary-proposal.md · GitHub on existing code

The performance improvements of using the new DictionaryKeys type and the mutability of the DictionaryValuescollection are both additive in nature.

Most uses of these properties are transitory in nature. Adopting this proposal should not produce a major impact on existing code. The only impact on existing code exists where a program explicitly specifies the type of a dictionary's keysor values property. The fix is to change the specified type.

<dictionary-proposal.md · GitHub considered

The Generics Manifesto <https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md&gt;

lists nested generics as a goal. This could impact the naming and structure of these new collection
types.

Instead of DictionaryKeys<Key, Value> and DictionaryValues<Key, Value>, these types could be

Dictionary<Key, Value>.Keys and Dictionary<Key, Value>.Values. However, because many types in the
standard library may be revisited once such a feature is available (indices, iterators, etc.), the
current lack of nesting shouldn't prevent consideration of this proposal.

It could be possible to add additional compiler features that manage mutation through existing

key-based subscripting without the copy-on-write problems of the current implementation. I don't
know enough about how that would be implemented to speak to its feasibility or level of effort. Such
a feature would reduce the need for a mutable DictionaryValues collection.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Dave

Thanks for the quick reply; given that I’m quite wrong about the important mechanics I rescind my criticisms.

I will say I care about this enough to reply because the inability to do in-place mutation of dictionary values has been an incredibly frustrating limitation and I’d just assumed the situation with slices/views would necessarily have similar issues for similar reasons…but glad to learn it’s not what I thought!

That said, I think efficient in-place mutation is too important to only expose so implicitly (seemingly due to the compiler eliding the otherwise-expected retain increments when the view is sufficiently “transient”…which seems like you perhaps can’t have an "in-place capable" view that’s implemented as a class, I’d think).

But none of this impacts my change to being in support for the proposal.

···

On Oct 12, 2016, at 10:07 AM, Nate Cook <natecook@gmail.com> wrote:

On Oct 12, 2016, at 9:32 AM, plx via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

The issue addressed is real; I’m not sure this is the best approach.

In particular, unless I’m missing something obvious, the ownership strategy here would have to be:

- `DictionaryKeys` and `DictionaryValues` would each induce the expected +1 retain on the underlying storage
- `DictionaryValues`’s mutations avoid triggering COW on the underlying storage by skipping the usual ownership check

…as otherwise it’s unclear how you’d do those in-place mutations (and this seems to be how the implementation works...is that correct?).

That's not quite right—when you access these views through the dictionary, they do not increment the storage retain count. This is the way slicing and views currently work on other mutable types. For example, when you reverse a slice of an array in-place, the slice doesn't get its own duplicate storage:

var a = Array(1...10)
a[0..<5].reverse()
a == [5, 4, 3, 2, 1, 6, 7, 8, 9, 10]

However, if you create a new variable out of the slice and reverse that, the slice does get its own storage:

var b = Array(1...10)
var bSlice = b[0..<5]
bSlice.reverse()
bSlice == [5, 4, 3, 2, 1]
b == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Strings and their views work the same way:

var s = "abcdefg"
s.characters.append("H") // storage updated in place
s == "abcdefgH"

var sChars = s.characters // no copy yet
sChars.removeLast() // sChars gets its own copy before the mutation
s == "abcdefgH"
String(sChars) == "abcdefg"

var t = s // no copy yet
t.characters.removeLast() // t gets a new copy here
s == "abcdefgH"
t == "abcdefg"

I don't know the name of the compiler feature that enables this, but it's a critical part of the way views and slices work.

With that design, it seems like you’d wind up allowing things like the below:

  // example A
  let foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  let bar = foo // shared storage, no COW
  foo.values[foo.index(of: “abc”)!].append(789) // update shared storage, no COW

  // shared storage mutated,
  // despite (a) both being `let` and (b) only foo.values getting touched
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 789]

Example A isn't allowed—if foo and bar are both immutable, both of their `values` collections are also immutable, so there's no way to modify their shared storage.

  // example B
  var foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  var bar = foo // shared storage, no COW
  foo.values[foo.index(of: “abc”)!].append(789)

  // shared storage mutated only foo.values getting touched
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 789]

Example B is incorrect—the mutation at `foo.values[...].append(789)` triggers a copy of the entire dictionary's underlying storage before allowing the mutation, since it knows that storage isn't uniquely referenced.

  // example C
  var foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  var bar = foo
  bar[“abc”] = [1, 2, 3, 4] // COW triggered here, no shared storage
  foo.values[foo.index(of: “abc”)!].append(789)

  // only `foo`’s storage mutated, b/c change to `bar` triggered COW
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 4]

This is the current behavior and would remain the same after the proposed the changes.

…where both A (by itself) and the B/C contrast seem very unwelcome.

Also, even if we assume we only ever make *responsible* use, having the stdlib include such directly-mutating views would seem likely to complicate any future concurrency plans.

To reiterate, I think the issue being addressed here is extremely important…I just don’t think I can get behind this type of solution (unless I’m grossly misunderstanding its mechanics).

Nate

The issue addressed is real; I’m not sure this is the best approach.

In particular, unless I’m missing something obvious, the ownership strategy here would have to be:

- `DictionaryKeys` and `DictionaryValues` would each induce the expected +1 retain on the

underlying storage

- `DictionaryValues`’s mutations avoid triggering COW on the underlying storage by skipping the

usual ownership check

…as otherwise it’s unclear how you’d do those in-place mutations
(and this seems to be how the implementation works...is that
correct?).

That's not quite right—when you access these views through the
dictionary, they do not increment the storage retain count. This is
the way slicing and views currently work on other mutable types. For
example, when you reverse a slice of an array in-place, the slice
doesn't get its own duplicate storage:

var a = Array(1...10)
a[0..<5].reverse()
a == [5, 4, 3, 2, 1, 6, 7, 8, 9, 10]

Oh, yes it certainly does. This is currently inefficient because
pinning is tied to addressors and addressors can only return pointers to
things that actually exist in memory somewhere. The slice doesn't.

However, if you create a new variable out of the slice and reverse that, the slice does get its own
storage:

var b = Array(1...10)
var bSlice = b[0..<5]
bSlice.reverse()
bSlice == [5, 4, 3, 2, 1]
b == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

This doesn't demonstrate anything; you're just showing the effects of
value semantics. If you go

b[0..<5] = bSlice

you'll then have

b == [5, 4, 3, 2, 1, 6, 7, 8, 9, 10]

And that is exactly what happens in the first example.

This is a major flaw that prevents Swift's collection model from being
fully general and efficient. I think we will be fixing it, by fixing
the way inout works, as a consequence of the work on ownership, for
Swift 4. But as noted elsewhere, that's a wild-ass guess at the moment.

Strings and their views work the same way:

var s = "abcdefg"
s.characters.append("H") // storage updated in place
s == "abcdefgH"

var sChars = s.characters // no copy yet
sChars.removeLast() // sChars gets its own copy before the mutation
s == "abcdefgH"
String(sChars) == "abcdefg"

var t = s // no copy yet
t.characters.removeLast() // t gets a new copy here
s == "abcdefgH"
t == "abcdefg"

I don't know the name of the compiler feature that enables this, but
it's a critical part of the way views and slices work.

I wish :-)

···

on Wed Oct 12 2016, Nate Cook <swift-evolution@swift.org> wrote:

On Oct 12, 2016, at 9:32 AM, plx via swift-evolution <swift-evolution@swift.org> wrote:

With that design, it seems like you’d wind up allowing things like the below:

  // example A
  let foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  let bar = foo // shared storage, no COW
  foo.values[foo.index(of: “abc”)!].append(789) // update shared storage, no COW

  // shared storage mutated,
  // despite (a) both being `let` and (b) only foo.values getting touched
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 789]

Example A isn't allowed—if foo and bar are both immutable, both of
their `values` collections are also immutable, so there's no way to
modify their shared storage.

  // example B
  var foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  var bar = foo // shared storage, no COW
  foo.values[foo.index(of: “abc”)!].append(789)

  // shared storage mutated only foo.values getting touched
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 789]

Example B is incorrect—the mutation at `foo.values[...].append(789)`
triggers a copy of the entire dictionary's underlying storage before
allowing the mutation, since it knows that storage isn't uniquely
referenced.

  // example C
  var foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  var bar = foo
  bar[“abc”] = [1, 2, 3, 4] // COW triggered here, no shared storage
  foo.values[foo.index(of: “abc”)!].append(789)

  // only `foo`’s storage mutated, b/c change to `bar` triggered COW
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 4]

This is the current behavior and would remain the same after the proposed the changes.

…where both A (by itself) and the B/C contrast seem very unwelcome.

Also, even if we assume we only ever make *responsible* use, having
the stdlib include such directly-mutating views would seem likely to
complicate any future concurrency plans.

To reiterate, I think the issue being addressed here is extremely
important…I just don’t think I can get behind this type of solution
(unless I’m grossly misunderstanding its mechanics).

Nate

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Dave

A late-arriving strong +1 for me. The index-related stuff is elegant and much needed. I’m surprised
to learn that dict.keys and dict.values are copies and not already
views!

They are views.

Oh, OK. I stand … er, type … corrected! I was thinking of this bit of the proposal:

this approach requires a linear search through a dictionary's keys to find a match.

if dict.keys.contains("one") {

I had wrongly assumed in the past that contains() would forward to an efficient lookup against the underlying dictionary. That’s what surprised me. It clearly deserves fixing.

From that, I made the incorrect leap to assuming that dict.keys made a copy. On more careful rereading and a visit to the docs for LazyMapCollection, the problem is much clearer to me now! Thanks as always for setting me right on the details.

Cheers, P

···

On Oct 15, 2016, at 8:34 PM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:
on Fri Oct 14 2016, Paul Cantrell <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

I believe the implementation of efficient in-place mutation is very explicit in the code -- it is done by implementing DictionaryValue’s subscript using a special “mutableAddressWithNativeOwner” addressor instead of a normal setter.

https://github.com/natecook1000/swift/blob/ed95aec4a20589a3b9c131f43444aa33705343cc/stdlib/public/core/HashedCollections.swift.gyb#L2169-L2173

AFAICU this would also work if DictionaryValue was a reference type.

Unfortunately, as far as I know, these addressors aren’t available outside stdlib, so custom collections cannot currently implement such mutable views (including mutable ranges) in a similarly efficient way.

We can, however, approximate a similar effect outside of stdlib by designing closure-based APIs like `mydict.withValues { values in values[i] = 42 }`, in which the collection moves its storage to the view while the closure is executing (temporarily making its own contents disappear / appear invalid). The syntax and underlying mental model is perhaps not as nice, but (assuming the compiler is able to optimize away the nonescaping closure) we can achieve some of the performance benefits.

···

On 2016-10-12, at 19:17, plx via swift-evolution <swift-evolution@swift.org> wrote:

Thanks for the quick reply; given that I’m quite wrong about the important mechanics I rescind my criticisms.

I will say I care about this enough to reply because the inability to do in-place mutation of dictionary values has been an incredibly frustrating limitation and I’d just assumed the situation with slices/views would necessarily have similar issues for similar reasons…but glad to learn it’s not what I thought!

That said, I think efficient in-place mutation is too important to only expose so implicitly (seemingly due to the compiler eliding the otherwise-expected retain increments when the view is sufficiently “transient”…which seems like you perhaps can’t have an "in-place capable" view that’s implemented as a class, I’d think).

But none of this impacts my change to being in support for the proposal.

On Oct 12, 2016, at 10:07 AM, Nate Cook <natecook@gmail.com <mailto:natecook@gmail.com>> wrote:

On Oct 12, 2016, at 9:32 AM, plx via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

The issue addressed is real; I’m not sure this is the best approach.

In particular, unless I’m missing something obvious, the ownership strategy here would have to be:

- `DictionaryKeys` and `DictionaryValues` would each induce the expected +1 retain on the underlying storage
- `DictionaryValues`’s mutations avoid triggering COW on the underlying storage by skipping the usual ownership check

…as otherwise it’s unclear how you’d do those in-place mutations (and this seems to be how the implementation works...is that correct?).

That's not quite right—when you access these views through the dictionary, they do not increment the storage retain count. This is the way slicing and views currently work on other mutable types. For example, when you reverse a slice of an array in-place, the slice doesn't get its own duplicate storage:

var a = Array(1...10)
a[0..<5].reverse()
a == [5, 4, 3, 2, 1, 6, 7, 8, 9, 10]

However, if you create a new variable out of the slice and reverse that, the slice does get its own storage:

var b = Array(1...10)
var bSlice = b[0..<5]
bSlice.reverse()
bSlice == [5, 4, 3, 2, 1]
b == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Strings and their views work the same way:

var s = "abcdefg"
s.characters.append("H") // storage updated in place
s == "abcdefgH"

var sChars = s.characters // no copy yet
sChars.removeLast() // sChars gets its own copy before the mutation
s == "abcdefgH"
String(sChars) == "abcdefg"

var t = s // no copy yet
t.characters.removeLast() // t gets a new copy here
s == "abcdefgH"
t == "abcdefg"

I don't know the name of the compiler feature that enables this, but it's a critical part of the way views and slices work.

With that design, it seems like you’d wind up allowing things like the below:

  // example A
  let foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  let bar = foo // shared storage, no COW
  foo.values[foo.index(of: “abc”)!].append(789) // update shared storage, no COW

  // shared storage mutated,
  // despite (a) both being `let` and (b) only foo.values getting touched
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 789]

Example A isn't allowed—if foo and bar are both immutable, both of their `values` collections are also immutable, so there's no way to modify their shared storage.

  // example B
  var foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  var bar = foo // shared storage, no COW
  foo.values[foo.index(of: “abc”)!].append(789)

  // shared storage mutated only foo.values getting touched
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 789]

Example B is incorrect—the mutation at `foo.values[...].append(789)` triggers a copy of the entire dictionary's underlying storage before allowing the mutation, since it knows that storage isn't uniquely referenced.

  // example C
  var foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  var bar = foo
  bar[“abc”] = [1, 2, 3, 4] // COW triggered here, no shared storage
  foo.values[foo.index(of: “abc”)!].append(789)

  // only `foo`’s storage mutated, b/c change to `bar` triggered COW
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 4]

This is the current behavior and would remain the same after the proposed the changes.

…where both A (by itself) and the B/C contrast seem very unwelcome.

Also, even if we assume we only ever make *responsible* use, having the stdlib include such directly-mutating views would seem likely to complicate any future concurrency plans.

To reiterate, I think the issue being addressed here is extremely important…I just don’t think I can get behind this type of solution (unless I’m grossly misunderstanding its mechanics).

Nate

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

The issue addressed is real; I’m not sure this is the best approach.

In particular, unless I’m missing something obvious, the ownership strategy here would have to be:

- `DictionaryKeys` and `DictionaryValues` would each induce the expected +1 retain on the

underlying storage

- `DictionaryValues`’s mutations avoid triggering COW on the underlying storage by skipping the

usual ownership check

…as otherwise it’s unclear how you’d do those in-place mutations
(and this seems to be how the implementation works...is that
correct?).

That's not quite right—when you access these views through the
dictionary, they do not increment the storage retain count. This is
the way slicing and views currently work on other mutable types. For
example, when you reverse a slice of an array in-place, the slice
doesn't get its own duplicate storage:

var a = Array(1...10)
a[0..<5].reverse()
a == [5, 4, 3, 2, 1, 6, 7, 8, 9, 10]

Oh, yes it certainly does. This is currently inefficient because
pinning is tied to addressors and addressors can only return pointers to
things that actually exist in memory somewhere. The slice doesn't.

Ack, sorry everyone! Listen to Dave.

I got carried away with examples that went further than my proposal. As far as I can tell from my testing, the examples in the proposal are still accurate.

However, if you create a new variable out of the slice and reverse that, the slice does get its own
storage:

var b = Array(1...10)
var bSlice = b[0..<5]
bSlice.reverse()
bSlice == [5, 4, 3, 2, 1]
b == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

This doesn't demonstrate anything; you're just showing the effects of
value semantics. If you go

b[0..<5] = bSlice

you'll then have

b == [5, 4, 3, 2, 1, 6, 7, 8, 9, 10]

And that is exactly what happens in the first example.

This is a major flaw that prevents Swift's collection model from being
fully general and efficient. I think we will be fixing it, by fixing
the way inout works, as a consequence of the work on ownership, for
Swift 4. But as noted elsewhere, that's a wild-ass guess at the moment.

Strings and their views work the same way:

var s = "abcdefg"
s.characters.append("H") // storage updated in place
s == "abcdefgH"

var sChars = s.characters // no copy yet
sChars.removeLast() // sChars gets its own copy before the mutation
s == "abcdefgH"
String(sChars) == "abcdefg"

var t = s // no copy yet
t.characters.removeLast() // t gets a new copy here
s == "abcdefgH"
t == "abcdefg"

I don't know the name of the compiler feature that enables this, but
it's a critical part of the way views and slices work.

I wish :-)

:flushed:

···

On Oct 13, 2016, at 1:28 AM, Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:
on Wed Oct 12 2016, Nate Cook <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Oct 12, 2016, at 9:32 AM, plx via swift-evolution <swift-evolution@swift.org> wrote:

With that design, it seems like you’d wind up allowing things like the below:

// example A
let foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
let bar = foo // shared storage, no COW
foo.values[foo.index(of: “abc”)!].append(789) // update shared storage, no COW

// shared storage mutated,
// despite (a) both being `let` and (b) only foo.values getting touched
foo[“abc”] // [1, 2, 3, 789]
bar[“abc”] // [1, 2, 3, 789]

Example A isn't allowed—if foo and bar are both immutable, both of
their `values` collections are also immutable, so there's no way to
modify their shared storage.

// example B
var foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
var bar = foo // shared storage, no COW
foo.values[foo.index(of: “abc”)!].append(789)

// shared storage mutated only foo.values getting touched
foo[“abc”] // [1, 2, 3, 789]
bar[“abc”] // [1, 2, 3, 789]

Example B is incorrect—the mutation at `foo.values[...].append(789)`
triggers a copy of the entire dictionary's underlying storage before
allowing the mutation, since it knows that storage isn't uniquely
referenced.

// example C
var foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
var bar = foo
bar[“abc”] = [1, 2, 3, 4] // COW triggered here, no shared storage
foo.values[foo.index(of: “abc”)!].append(789)

// only `foo`’s storage mutated, b/c change to `bar` triggered COW
foo[“abc”] // [1, 2, 3, 789]
bar[“abc”] // [1, 2, 3, 4]

This is the current behavior and would remain the same after the proposed the changes.

…where both A (by itself) and the B/C contrast seem very unwelcome.

Also, even if we assume we only ever make *responsible* use, having
the stdlib include such directly-mutating views would seem likely to
complicate any future concurrency plans.

To reiterate, I think the issue being addressed here is extremely
important…I just don’t think I can get behind this type of solution
(unless I’m grossly misunderstanding its mechanics).

Nate

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Dave

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

I agree that at least for stdlib purposes there’s something that looks like an explicit choice to make in-place mutation *available*.

What I was trying to explain is whether or not in-place mutation *happens* is a bit implicit. It’s one thing to say that the difference here is just an idiom to know:

  var foo = [0, 1, 2, 3, 4, 5]

  // not in place:
  var slice = foo[1…3]
  slice.reverse() // `foo` not mutated

  // in place:
  foo[1…3].reverse() // `foo` mutated

…but whether or not this function triggers an in-place mutation:

  func reverse<T>(_ slice: inout ArraySlice<T>) {
   slice.reverse()
  }

…depends on how it’s being called:

  var slice = foo[1…3]
  reverse(&slice) // `foo` unchanged
  
  reverse(&foo[1…3]) // `foo` mutated in-place

This seems consistent with the “in-place…or not?” behavior being largely just the usual COW, + the compiler eliding the typical retain/release on any sufficiently-transient slices; e.g. as if:

  // in place:
  foo[1…3].reverse() // `foo` mutated

  // is treated as the equivalent of:
  @noretain var slice = foo[1…3]
  slice.reverse()

…where the @noretain is some (fictional) attribute suppressing the retain/release you’d otherwise trigger when `foo[1…3]` is stored into `slice`.

That’s the mental model it suggests, at least…and it just seemed unlikely that the compiler would be able to propagate something like `@noretain` onto a specific instance variable in a specific instance of a class-based view that captured a reference to the viewed collection’s underlying storage…whence the comment about class-based views. But I’ve been very wrong already today and probably am here as well.

As this is getting off-topic for something that seems like it’ll get postponed until later anyways I’d just like to say thanks again for taking the time to propose this, for correcting my misunderstandings…and that I’m eagerly looking forward to any improvements into COW visibility and any steps towards having more-explicit control over the COW mechanism.

···

On Oct 12, 2016, at 1:11 PM, Károly Lőrentey <karoly@lorentey.hu> wrote:

I believe the implementation of efficient in-place mutation is very explicit in the code -- it is done by implementing DictionaryValue’s subscript using a special “mutableAddressWithNativeOwner” addressor instead of a normal setter.

https://github.com/natecook1000/swift/blob/ed95aec4a20589a3b9c131f43444aa33705343cc/stdlib/public/core/HashedCollections.swift.gyb#L2169-L2173

AFAICU this would also work if DictionaryValue was a reference type.

Unfortunately, as far as I know, these addressors aren’t available outside stdlib, so custom collections cannot currently implement such mutable views (including mutable ranges) in a similarly efficient way.

We can, however, approximate a similar effect outside of stdlib by designing closure-based APIs like `mydict.withValues { values in values[i] = 42 }`, in which the collection moves its storage to the view while the closure is executing (temporarily making its own contents disappear / appear invalid). The syntax and underlying mental model is perhaps not as nice, but (assuming the compiler is able to optimize away the nonescaping closure) we can achieve some of the performance benefits.

On 2016-10-12, at 19:17, plx via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

Thanks for the quick reply; given that I’m quite wrong about the important mechanics I rescind my criticisms.

I will say I care about this enough to reply because the inability to do in-place mutation of dictionary values has been an incredibly frustrating limitation and I’d just assumed the situation with slices/views would necessarily have similar issues for similar reasons…but glad to learn it’s not what I thought!

That said, I think efficient in-place mutation is too important to only expose so implicitly (seemingly due to the compiler eliding the otherwise-expected retain increments when the view is sufficiently “transient”…which seems like you perhaps can’t have an "in-place capable" view that’s implemented as a class, I’d think).

But none of this impacts my change to being in support for the proposal.

On Oct 12, 2016, at 10:07 AM, Nate Cook <natecook@gmail.com <mailto:natecook@gmail.com>> wrote:

On Oct 12, 2016, at 9:32 AM, plx via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

The issue addressed is real; I’m not sure this is the best approach.

In particular, unless I’m missing something obvious, the ownership strategy here would have to be:

- `DictionaryKeys` and `DictionaryValues` would each induce the expected +1 retain on the underlying storage
- `DictionaryValues`’s mutations avoid triggering COW on the underlying storage by skipping the usual ownership check

…as otherwise it’s unclear how you’d do those in-place mutations (and this seems to be how the implementation works...is that correct?).

That's not quite right—when you access these views through the dictionary, they do not increment the storage retain count. This is the way slicing and views currently work on other mutable types. For example, when you reverse a slice of an array in-place, the slice doesn't get its own duplicate storage:

var a = Array(1...10)
a[0..<5].reverse()
a == [5, 4, 3, 2, 1, 6, 7, 8, 9, 10]

However, if you create a new variable out of the slice and reverse that, the slice does get its own storage:

var b = Array(1...10)
var bSlice = b[0..<5]
bSlice.reverse()
bSlice == [5, 4, 3, 2, 1]
b == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Strings and their views work the same way:

var s = "abcdefg"
s.characters.append("H") // storage updated in place
s == "abcdefgH"

var sChars = s.characters // no copy yet
sChars.removeLast() // sChars gets its own copy before the mutation
s == "abcdefgH"
String(sChars) == "abcdefg"

var t = s // no copy yet
t.characters.removeLast() // t gets a new copy here
s == "abcdefgH"
t == "abcdefg"

I don't know the name of the compiler feature that enables this, but it's a critical part of the way views and slices work.

With that design, it seems like you’d wind up allowing things like the below:

  // example A
  let foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  let bar = foo // shared storage, no COW
  foo.values[foo.index(of: “abc”)!].append(789) // update shared storage, no COW

  // shared storage mutated,
  // despite (a) both being `let` and (b) only foo.values getting touched
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 789]

Example A isn't allowed—if foo and bar are both immutable, both of their `values` collections are also immutable, so there's no way to modify their shared storage.

  // example B
  var foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  var bar = foo // shared storage, no COW
  foo.values[foo.index(of: “abc”)!].append(789)

  // shared storage mutated only foo.values getting touched
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 789]

Example B is incorrect—the mutation at `foo.values[...].append(789)` triggers a copy of the entire dictionary's underlying storage before allowing the mutation, since it knows that storage isn't uniquely referenced.

  // example C
  var foo = [ “abc”: [1,2,3], “efg”: [4,5,6] ]
  var bar = foo
  bar[“abc”] = [1, 2, 3, 4] // COW triggered here, no shared storage
  foo.values[foo.index(of: “abc”)!].append(789)

  // only `foo`’s storage mutated, b/c change to `bar` triggered COW
  foo[“abc”] // [1, 2, 3, 789]
  bar[“abc”] // [1, 2, 3, 4]

This is the current behavior and would remain the same after the proposed the changes.

…where both A (by itself) and the B/C contrast seem very unwelcome.

Also, even if we assume we only ever make *responsible* use, having the stdlib include such directly-mutating views would seem likely to complicate any future concurrency plans.

To reiterate, I think the issue being addressed here is extremely important…I just don’t think I can get behind this type of solution (unless I’m grossly misunderstanding its mechanics).

Nate

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution