[Discussion] Dictionary Key as Index


(Andrew Bennett) #1

I've moved this discussion over from "[swift-evolution] [Proposal]
mapValues" which discusses mapValues, like map but it preserves keys and
transforms values, returning a Dictionary.

I'm in favour mapValues, but I feel like it should be the default behaviour
of map for a Dictionary. Being the default requires changes to protocols
(many already planned), language features (already planned), and addressing
many interface inconsistencies between Array and Dictionary.

---- continuing from the other thread ----

Thanks Brent for the in depth analysis!

As I mentioned at the top: I was focusing on the usage examples not the
implementation. I've made compromises in this implementation to conform to
the existing protocols. Many of these compromises would probably be
addressed by SE0065 (
https://github.com/apple/swift-evolution/blob/master/proposals/0065-collections-move-indices.md
).

Answers inline.

> This is a clarification on what I meant (I haven't had much time to test
it, but I think it's representative):
>
> https://gist.github.com/therealbnut/c223d90a34bb14448b65fc6cc0ec70ac

There are a number of problems with this:

* Given just an `Index`, you need to be able to calculate the next
`Index`, and check if it lies before the `endIndex`. The complexity of this
is hidden in your example because you're relying on a traditional
`DictionaryIndex` to perform these tasks. Thus, the `Index` type in your
dictionary design would be some sort of wrapper-around-the-key, not the key
itself. That's the source of the `subscript(_: Index) -> Value` which
apparently confused you.

This was an implementation detail, see the comment at the top of the file. If

SE0065 is implemented then the previous/next/lastIndex can be found from
the collection in constant amortized time. This is possible after SE0065.

I wasn't confused, but I was unsure of the best approach. Array and
Dictionary probably need different Indexable-style protocols.

The main issue is that Dictionary can get/set an optional Value to
insert/remove values. Array doesn't currently have Optional get/set. Being
Optional doesn't make sense for an Array unless it's sparse. It would be
unexpected if an Array's `.endIndex != .count`.

I also think that, ideally, `Index == Key` for a Dictionary.

There may be an index-like type which allows lookup with a lower constant
overhead (bucket index), but it's likely to only be used by extensions, if
at all.

* Generic `SequenceType` and `CollectionType` operations now use only the

values, not the keys and values. That means that `Array(newDictionary)`
gets you an array of values, `filter` returns an array of values, `reduce`
operates only on the values, etc; all of these operations throw away the
keys. That's not necessarily wrong, but I'm not sure that you're aware of
it.

I'm fine for `Array(newDictionary)` to discard keys, there's always
`Array(newDictionary.enumerate())`. I think it's more consistent for filter
to work on values. I think all collections would also benefit from versions
of map/filter/reduce etc. that also have an Index (so you could get every
second value of an array, for example).

I'd like the protocol to have associatedtype FilterType, for NewDictionary
it would be defined as something like:
    typealias FilterType = NewDictionary<Key,Value>
This would allow filter, reduce, etc. to preserve the keys.

* Your `map` is overloading `SequenceType.map` merely by return type. That

means you can no longer assign it directly to a newly-declared variable, or
use it in many other contexts where the type has to be inferred.

Overloading is an implementation detail at the moment, as mentioned in the
comment at the top of the file. Ideally this would be Self.MapType once we
have generic associatedtypes. For dictionaries:
    typealias MapType<T> = NewDictionary<Key,T>

* Your `enumerate` is similarly overloading by return type. It's also
further confusing an already confused semantic. `enumerate` does not pair
elements with their indices; it pairs them with integers starting at zero.
This *happens* to correspond to their array indices, but that's not
necessarily true of any other Collection. (I'm going to start another
thread about this.)

The overloaded return type is an unfortunate implementation detail, to fit

the existing protocols. The existing enumerate uses (Int,Value), as you
mentioned, i'm not sure if enumerate is changing after SE0065. Changing it
to (Index,Value) would solve the overload issue. I've seen a lot of array
code that uses enumerate() this way currently. I'm not sure what enumerate
is meant to be for if (Index,Value) is inappropriate.

Either way I think all collections would benefit from a property that
exposes a sequence of (Index,Value) pairs.

Basically, in this conception of Dictionary:

* The keys are disposable and the values are the important part—most
operations on a Dictionary would throw away the keys and use only the
values.

Agreed, this may add complexity to generic/default implementations, or
require changes to some of the protocols (my preference is protocol
changes).

* Index still would not be a Key—it would be some kind of instance which

wrapped or converted into a Key.

After SE0065 they can be the same.

* You still would need to name Dictionary-creating `map` and similar

functions differently from their Array-generating cousins.

Only if you want map to return an array. There's a good argument for
Self.MapType etc. once we have generic associatedtype.

* Iterating over keys and values together would require some kind of
Dictionary-specific API, not something that was available on other
collections or sequences.

I think that the existing `enumerate()` should do, if it is changed to
return (Index,Value) rather than (Int,Value). The downside is that
dictionaries would require `.enumerate()` added to many for loops, I agree
that this is not ideal. It may be that I'm misinterpreting enumerate, there
could probably be a shorter property name that would suffice.

···

On Saturday, 16 April 2016, Brent Royal-Gordon <brent@architechies.com> wrote:

Maybe that would be an improvement, but I'm skeptical.

Skeptical is good, it wasn't perfect, as long you have an open mind :slight_smile:

--

Brent Royal-Gordon
Architechies


(Brent Royal-Gordon) #2

* Given just an `Index`, you need to be able to calculate the next `Index`, and check if it lies before the `endIndex`. The complexity of this is hidden in your example because you're relying on a traditional `DictionaryIndex` to perform these tasks. Thus, the `Index` type in your dictionary design would be some sort of wrapper-around-the-key, not the key itself. That's the source of the `subscript(_: Index) -> Value` which apparently confused you.

This was an implementation detail, see the comment at the top of the file. If SE0065 is implemented then the previous/next/lastIndex can be found from the collection in constant amortized time. This is possible after SE0065.

I wasn't confused, but I was unsure of the best approach. Array and Dictionary probably need different Indexable-style protocols.

I suspect that's a non-starter.

The Collection protocol encapsulates this semantic:

  1. The type is a Sequence of finite length which always (unless it's mutated) contains the same elements in the same order.

  2. The Index represents a location in the Sequence.

  3. Indexing with an Index retrieves the element at that location.

  4. You can retrieve the successor index to a given index; sub-protocols provide other ways to transform indices.

  5. You can == one Index with another, and if they match, they refer to the same element.

  6. With SE-0065, you can also < one Index with another to see which one comes earlier in the collection.

Both Array and Dictionary (as well as other types like Set and String (or rather, String's views)) need to support these semantics. Array and String are ordered collections—any element can appear at any index—so the Index is the main means by which we access the collection's elements. Set and Dictionary are not ordered, so we have other ways to access their elements. This difference is

Also note that SE-0065 makes this harder, not easier. Requirement #6—that Index must be Comparable—would require Dictionary keys to be Comparable and Dictionary iteration to happen in sorted order. Since the natural order of Dictionary key access is (roughly) by hashValue, this would make iterating over the Dictionary very slow, or would require a separate, sorted storage for the keys, or something along those lines.

The alternative to that is to disconnect Index from Key. If you do that, then Index is basically just the index of the element in the linear storage backing the Dictionary, and iteration is almost as fast as looping over an Array—you just have to skip over unused buckets. That's a much saner way to handle things. So this:

I also think that, ideally, `Index == Key` for a Dictionary.

Turns out to be incorrect. Iterating over a Dictionary by Key is not a particularly good solution.

Alternatively, you might think of it this way: what you describe here:

There may be an index-like type which allows lookup with a lower constant overhead (bucket index), but it's likely to only be used by extensions, if at all.

*is* what Collection is for.

I'm fine for `Array(newDictionary)` to discard keys, there's always `Array(newDictionary.enumerate())`. I think it's more consistent for filter to work on values. I think all collections would also benefit from versions of map/filter/reduce etc. that also have an Index (so you could get every second value of an array, for example).

As I mentioned (and brought up on another thread for discussion), that's not what `enumerate()` is for.

I'd like the protocol to have associatedtype FilterType, for NewDictionary it would be defined as something like:
    typealias FilterType = NewDictionary<Key,Value>
This would allow filter, reduce, etc. to preserve the keys.

* Your `map` is overloading `SequenceType.map` merely by return type. That means you can no longer assign it directly to a newly-declared variable, or use it in many other contexts where the type has to be inferred.

Overloading is an implementation detail at the moment, as mentioned in the comment at the top of the file. Ideally this would be Self.MapType once we have generic associatedtypes. For dictionaries:
    typealias MapType<T> = NewDictionary<Key,T>

The use of "typealias" here is misleading, because these aren't really typealiases—they're associated types. Generic associated types are not currently a feature of Swift. I'm not sure if they're even in the generics manifesto.

Either way I think all collections would benefit from a property that exposes a sequence of (Index,Value) pairs.

Agreed.

* The keys are disposable and the values are the important part—most operations on a Dictionary would throw away the keys and use only the values.

Agreed, this may add complexity to generic/default implementations, or require changes to some of the protocols (my preference is protocol changes).

I'm not necessarily saying that a Value-oriented Dictionary is wrong, but I'm not sure it's right, either.

* Index still would not be a Key—it would be some kind of instance which wrapped or converted into a Key.

After SE0065 they can be the same.

They can't; see above.

···

--
Brent Royal-Gordon
Architechies