[Pitch] Add `mapValues` method to Dictionary


(Honza Dvorsky) #1

Hello everyone,

Over the last few weeks I have added a very simple but powerful method to a
Dictionary extension in multiple projects, so I'd like to bring up the idea
of adding it to the standard library, in case other people see its benefits
as well.

Currently, Dictionary conforms to Collection with its Element being the
tuple of Key and Value. Thus transforming a Dictionary with the regular map
results in [T], whereas I'd find it more useful to also have a method that
results in [Key: T].

Let me present an example of where this makes sense.

I recently used the GitHub API to crawl some information about
repositories. I started with just names (e.g. "/apple/swift",
"/apple/llvm") and fetched a JSON response for each repo. Each response was
a dictionary, and at the end of the operation they were all saved into one
large dictionary keyed by repository name, so the structure was something
like

{
  "/apple/swift": { "url":..., "size":...., "homepage":... },
  "/apple/llvm": { "url":..., "size":...., "homepage":... },
  ...
}

To perform analysis, I just needed a dictionary mapping the name of the
repository to its size, freeing me to discard the rest of the results.
This is where things get interesting, because you can't keep this action
nicely functional anymore. I had to do the following:

let repos: [String: JSON] = ...
var sizes: [String: Int] = [:]
for (key, value) in repos {
  sizes[key] = value["size"].int
}
// use sizes...

This isn't a huge amount of work, but it creates unnecessary mutable state
in the transformation pipeline (and in the current scope), and I've had to
write it enough times to justify bringing it up on this list.

I suggest we add the following method to Dictionary:

extension Dictionary {
    public func mapValues<T>(_ transform: @noescape (Value) throws -> T) rethrows -> [Key: T] {
        var transformed: [Key: T] = [:]
        for (key, value) in self {
            transformed[key] = try transform(value)
        }
        return transformed
    }
}
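
For readers on a current toolchain, here is a minimal sketch of the same method in present-day Swift. Since Swift 3, closure parameters are non-escaping by default, so `@noescape` is no longer written. Swift 4 also added a `Dictionary.mapValues(_:)` method to the standard library, so the sketch uses the hypothetical name `mappingValues` to avoid clashing with it. The `reserveCapacity` call is an optional addition that is not part of the pitch.

```swift
extension Dictionary {
    /// Hypothetical name: returns a dictionary with the same keys, where each
    /// value is the result of applying `transform` to the original value.
    func mappingValues<T>(_ transform: (Value) throws -> T) rethrows -> [Key: T] {
        var transformed: [Key: T] = [:]
        // The result has exactly `count` entries, so reserve space up front.
        transformed.reserveCapacity(count)
        for (key, value) in self {
            transformed[key] = try transform(value)
        }
        return transformed
    }
}

let prices = ["apple": 3, "pear": 5]
let doubled = prices.mappingValues { $0 * 2 }
// doubled == ["apple": 6, "pear": 10]
```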

It is modeled after Collection's `map` function, with the differences that
a) only values are transformed, instead of the (Key, Value) tuple, and
b) the returned structure is a transformed Dictionary [Key: T], instead of [T].

This now allows a much nicer workflow:

let repos: [String: JSON] = ...
let sizes = repos.mapValues { $0["size"].int }
// use sizes...
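
One subtlety worth checking: the loop and the `mapValues` call do not produce the same result when a repository has no size. Assigning `nil` through a dictionary subscript removes the key, so the loop yields a `[String: Int]` with such repositories omitted, whereas `mapValues` keeps every key and yields a `[String: Int?]`. The sketch below assumes a plain `[String: [String: Any]]` as a hypothetical stand-in for the `JSON` type, and uses `compactMapValues(_:)` (added in Swift 5) to recover the loop's behavior.

```swift
// Hypothetical stand-in for the decoded GitHub responses.
let repos: [String: [String: Any]] = [
    "/apple/swift": ["size": 120, "homepage": "https://swift.org"],
    "/apple/llvm": ["homepage": "https://llvm.org"],  // no "size" entry
]

// Loop version: assigning nil through the subscript removes the key.
var loopSizes: [String: Int] = [:]
for (key, value) in repos {
    loopSizes[key] = value["size"] as? Int
}

// mapValues keeps every key; the missing size becomes a nil value.
let mappedSizes = repos.mapValues { $0["size"] as? Int }  // [String: Int?]

// compactMapValues drops the nil results, matching the loop.
let compactSizes = repos.compactMapValues { $0["size"] as? Int }  // [String: Int]
```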

and even multi-step transformations on Dictionaries, previously only
possible on Arrays, e.g.
let descriptionTextLengths = repos
    .mapValues { $0["description"].string }
    .mapValues { $0?.characters.count }
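
The two-step chain can be checked against plain dictionaries. The first step produces optional strings, so the second step needs optional chaining; in Swift 4 and later, `String` is itself a collection, so `.count` replaces `.characters.count`. The `repos` data is a hypothetical stand-in for the JSON responses.

```swift
// Hypothetical stand-in data; one repository has no description.
let repos: [String: [String: Any]] = [
    "/apple/swift": ["description": "The Swift Programming Language"],
    "/apple/llvm": [:],
]

let descriptionTextLengths = repos
    .mapValues { $0["description"] as? String }  // [String: String?]
    .mapValues { $0?.count }                     // [String: Int?]
```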

You get the idea.

What do you think? I welcome all feedback; I'd like to see whether people
would support it before I write a proper proposal.

Thanks! :slight_smile:
Honza Dvorsky


(Brent Royal-Gordon) #2

> What do you think? I welcome all feedback, I'd like to see if people would support it before I write a proper proposal.

This has come up a couple of times, I believe most recently in the April 12 thread "[Proposal] mapValues". It always gets substantial, but not universal, support.

If I were you, I would search the list archives for previous threads about mapping dictionaries and see which arguments have come up before. If they change your mind, great. If they don't, they'll make up your proposal's "Alternatives Considered" section.

--
Brent Royal-Gordon
Architechies


(Honza Dvorsky) #3

Hi Brent,

thanks, I should have caught that. Unfortunately, I don't know of a nice way
to search the mailing list other than searching your own inbox, which
doesn't include messages from before you subscribed. How do you do it?

You're right: I found the previous conversation from over a month ago,
where almost everyone +1'd, but as far as I know it hasn't moved anywhere
since. Or is there a proposal already? (Nothing was linked from the thread.)

So apologies for the duplication, but let's take this as a signal to revive
that conversation. I haven't found any negative responses; I mostly saw
positive ones and ones that didn't feel strongly either way.

Honza

On Sat, May 21, 2016 at 12:58 PM Brent Royal-Gordon <brent@architechies.com> wrote:


(Haravikk) #4

I think that before this can be done there needs to be an abstraction of what a Dictionary is, for example a Map<Key, Value> protocol. This would allow us to also implement the important lazy variations of what you suggest, which would likely matter more for very large dictionaries, as dictionaries are rarely consumed in their entirety. In other words, calculating and storing the transformed value for every key/value pair is quite a performance overhead if only a fraction of those keys may actually be accessed. Even if you are consuming the whole transformed dictionary, the lazy version is better since it doesn’t store any intermediate values; you only really want a fully transformed dictionary if you know the transformation is either very costly or the transformed values will be accessed frequently.

Anyway, that’s a long way of saying that while the specific implementation is definitely wanted, the complete solution requires a few extra steps that should be done too, as lazy computation can have big performance benefits.

That, and it’d be nice to have a Map protocol in the stdlib for defining other map types, such as trees, which (unlike dictionaries) don’t require Hashable keys.
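
The lazy variant can be sketched as a small wrapper that stores the original dictionary and the transform, and applies the transform only when a key is looked up. `LazyValueMap` is a hypothetical name, not a standard-library type, and this is a minimal sketch rather than a full `Collection` conformance.

```swift
/// Hypothetical read-only view that applies `transform` on demand.
struct LazyValueMap<Key: Hashable, Value, T> {
    private let base: [Key: Value]
    private let transform: (Value) -> T

    init(_ base: [Key: Value], transform: @escaping (Value) -> T) {
        self.base = base
        self.transform = transform
    }

    /// Looks up `key` and transforms only that value; results are not cached.
    subscript(key: Key) -> T? {
        guard let value = base[key] else { return nil }
        return transform(value)
    }

    /// The keys are available without transforming any values.
    var keys: Dictionary<Key, Value>.Keys { base.keys }
}

// Count how many times the transform actually runs.
var transformCalls = 0
let sizes = ["a": 1, "b": 2, "c": 3]
let doubled = LazyValueMap(sizes) { (value: Int) -> Int in
    transformCalls += 1
    return value * 2
}
```

Because nothing is cached, a value that is read repeatedly is recomputed on every access, which is why an eager `mapValues` remains preferable when the transformed values are accessed frequently.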

On 21 May 2016, at 11:27, Honza Dvorsky via swift-evolution <swift-evolution@swift.org> wrote:

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Tim Vermeulen) #5

I really like this idea, because indeed this wasn’t possible to do functionally before. I have a small remark, though: wouldn’t it be better to let transform be of type (Key, Value) throws -> T instead of (Value) throws -> T? You can just ignore the key (with _) if you don’t need it, but I think it might come in handy in some cases.
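
For comparison, the key-aware signature suggested here could be sketched as a separate method. `mapValuesWithKey` is a hypothetical name chosen so it does not clash with the value-only `mapValues`; keys are still carried over unchanged, only the values are replaced.

```swift
extension Dictionary {
    /// Hypothetical variant: the transform sees both the key and the value,
    /// but the result keeps the original keys.
    func mapValuesWithKey<T>(_ transform: (Key, Value) throws -> T) rethrows -> [Key: T] {
        var result: [Key: T] = [:]
        result.reserveCapacity(count)
        for (key, value) in self {
            result[key] = try transform(key, value)
        }
        return result
    }
}

let stock = ["apple": 3, "pear": 0]
let labels = stock.mapValuesWithKey { name, qty in "\(name): \(qty)" }
// labels == ["apple": "apple: 3", "pear": "pear: 0"]
```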


(Matthew Johnson) #6

> I think that before this can be done there needs to be an abstraction of what a Dictionary is, for example a Map<Key, Value> protocol. This would allow us to also implement the important lazy variations of what you suggest, which would likely be more important for very large dictionaries as dictionaries are rarely consumed in their entirety; in other words, calculating and storing the transformed value for every key/value pair is quite a performance overhead if only a fraction of those keys may actually be accessed. Even if you are consuming the whole transformed dictionary the lazy version is better since it doesn’t store any intermediate values, you only really want a fully transformed dictionary if you know the transformation is either very costly, or transformed values will be accessed frequently.
>
> Anyway, long way of saying that while the specific implementation is definitely wanted, the complete solution requires a few extra steps which should be done too, as lazy computation can have big performance benefits.
>
> That and it’d be nice to have a Map protocol in stdlib for defining other map types, such as trees, since these don’t require Hashable keys, but dictionaries do.

+1 to defining map abstractions in the standard library (separating read-only from read-write). The value associatedtype should not take a position on optionality, allowing for maps that have a valid value for every possible key. I have done similar things in other languages and found it extremely useful. It is not uncommon to have code that just needs to read from and/or write to a map without any concern for how the map is implemented.

One issue I think we should sort out alongside this is some kind of abstraction that allows code to use functions or user-defined types without regard for which one it is accessing. The map abstraction would build on this abstraction, allowing single-argument functions to be viewed as read-only maps.

One option is to allow functions to conform to protocols that only have subscript { get } requirements (we would probably only allow them to be subscripted through the protocol interface). I think this feels like the most Swifty direction.

Another option is to take the path I have seen in several languages, which is to allow overloading of the function call "operator". I originally wanted this in Swift, but now I wonder whether the first option might be a better way to accomplish the same goals.
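
The read-only map idea can be approximated in today's Swift with a protocol whose only requirement is a subscript getter. Swift does not let function types conform to protocols, so this sketch wraps a closure in a struct instead; `ReadOnlyMap`, `FunctionMap`, and `lookUp` are hypothetical names. As suggested above, the value associated type takes no position on optionality: `Dictionary` exposes `Value?`, while a function-backed map is total.

```swift
/// Hypothetical read-only map abstraction: only a subscript getter is required.
protocol ReadOnlyMap {
    associatedtype MapKey
    associatedtype MapValue
    subscript(key: MapKey) -> MapValue { get }
}

/// Dictionary is a partial map: a missing key yields nil.
extension Dictionary: ReadOnlyMap {
    typealias MapKey = Key
    typealias MapValue = Value?
}

/// Wraps a single-argument function so it can be used as a total map.
struct FunctionMap<Input, Output>: ReadOnlyMap {
    let function: (Input) -> Output
    subscript(key: Input) -> Output { function(key) }
}

/// Generic code that reads from any map, regardless of its implementation.
func lookUp<M: ReadOnlyMap>(_ keys: [M.MapKey], in source: M) -> [M.MapValue] {
    keys.map { source[$0] }
}

let table = ["one": 1, "two": 2]
let square = FunctionMap { (x: Int) -> Int in x * x }

let fromTable = lookUp(["one", "three"], in: table)  // [1, nil] as [Int?]
let fromFunction = lookUp([2, 3], in: square)        // [4, 9]
```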

-Matthew

Sent from my iPad

On May 21, 2016, at 8:45 AM, Haravikk via swift-evolution <swift-evolution@swift.org> wrote:


(Honza Dvorsky) #7

While I agree that it'd be nice to add a Map abstraction into which we
could move a lot of the Dictionary-ness, my original pitch is *just* about
adding the specific implementation of `mapValues` in its regular, non-lazy
form. My example was about only keeping a subset of the information in
memory in a Dictionary to allow for quick and frequent access (lazy goes
against that). I think it'd be better to get that in first, or at least
evaluate that separately from a comprehensive refactoring of the
Dictionary, which would just accumulate more opinions and slow this
specific step down.

If either of you has specific ideas about the potential Map protocol, I
encourage you to start a separate thread for it, to focus the conversation
on the parameters of what it would look like.

So I guess I'm now asking: would you support a proposal for adding the basic
mapValues function as a first step, with the potential to extend it later to
a Map protocol that allows for a lazy version? I'd like to keep the proposal
as focused as possible to increase the chance of an on-point discussion.

Thanks,
Honza

On Sat, May 21, 2016 at 3:27 PM Matthew Johnson <matthew@anandabits.com> wrote:


(Dave Abrahams) #8

Try the search box at the bottom of
http://news.gmane.org/gmane.comp.lang.swift.evolution

HTH,

on Sat May 21 2016, Honza Dvorsky <swift-evolution@swift.org> wrote:

> Hi Brent,
>
> thanks, I should have caught that, unfortunately I don't know of a
> nice way to search the mailing list, other than searching your own
> inbox, which doesn't include messages from the time you weren't
> subscribed - how do you do it?

--
-Dave


(Brent Royal-Gordon) #9

> I have a small remark though, wouldn’t it be better to let transform be of type (Key, Value) throws -> T instead of (Value) throws -> T? You can just ignore the key (with _) if you don’t need it, but I think it might come in handy in some cases.

The problem is that it closes the door to writing many simple maps in functional style. For instance, this:

  dictionaryOfNumbers.mapValues(abs)

Would have to become this:

  dictionaryOfNumbers.mapValues { _, v in abs(v) }

(It *might* be possible to do it with `$1`, but I'm not sure; there are some limitations around that.)

A value-value map is just simpler and cleaner, while almost always giving you what you need.
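
The point-free style is easy to check with the `mapValues(_:)` that the standard library gained in Swift 4: any existing single-argument function with matching types can be passed by name, and such steps chain naturally. `square` is a hypothetical helper defined for the example.

```swift
func square(_ x: Int) -> Int { x * x }

let dictionaryOfNumbers = ["a": -3, "b": 2]

// Value-only transforms: existing functions are passed directly by name.
let absolute = dictionaryOfNumbers.mapValues(abs)
let squaredMagnitudes = dictionaryOfNumbers.mapValues(abs).mapValues(square)

// With a (Key, Value) transform, each step would instead need a closure
// that discards the key, such as { _, v in abs(v) }.
```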

--
Brent Royal-Gordon
Architechies


(Haravikk) #10

For this, the key/value pair constructor for Dictionary would be the better option, since the .map() method essentially already lets you do this. Adding key transformation to .mapValues() would only make it less specialised (it’s called .mapValues, after all ;) ). It would also be incompatible with any simultaneous or future lazy implementation, since you can’t just transform the keys lazily: you also need a way to transform them back if you intend to use them for lookups.
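
Since Swift 4 the standard library provides such key/value pair initializers, so key transformation can be written as `map` followed by a dictionary initializer. The sketch also shows one reason key transformation fits poorly inside `mapValues`: transformed keys can collide. `init(uniqueKeysWithValues:)` traps on duplicate keys, whereas `init(_:uniquingKeysWith:)` lets the caller decide how to merge them. The data is hypothetical.

```swift
let counts = ["Apple": 2, "apple": 3, "Pear": 1]

// "Apple" and "apple" collide after lowercasing, so a merge rule is needed;
// init(uniqueKeysWithValues:) would trap here instead.
let merged = Dictionary(
    counts.map { ($0.key.lowercased(), $0.value) },
    uniquingKeysWith: +
)

// Keys that are guaranteed to stay unique can use the simpler initializer.
let prefixed = Dictionary(uniqueKeysWithValues: counts.map { ("fruit." + $0.key, $0.value) })
```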

On 23 May 2016, at 13:03, Tim Vermeulen via swift-evolution <swift-evolution@swift.org> wrote:

> I really like this idea, because indeed this wasn’t possible functionally before. I have a small remark though, wouldn’t it be better to let transform be of type (Key, Value) throws -> T instead of (Value) throws -> T? You can just ignore the key (with _) if you don’t need it, but I think it might come in handy in some cases.


(Matthew Johnson) #11

> While I agree that it'd be nice to add a Map abstraction into which we could move a lot of the Dictionary-ness, my original pitch is *just* about adding the specific implementation of `mapValues` in its regular, non-lazy form. My example was about only keeping a subset of the information in memory in a Dictionary to allow for quick and frequent access (lazy goes against that). I think it'd be better to get that in first, or at least evaluate that separately from a comprehensive refactoring of the Dictionary, which would just accumulate more opinions and slow this specific step down.
>
> If one of you have specific ideas about the potential Map protocol, I encourage you to start a separate thread for that, to focus the conversation on the parameters of what it would look like.
>
> I guess I'm now asking - would you support a proposal for adding the basic mapValues function as the first step, with the potential extendability to a Map protocol allowing for a lazy version? Because I'd like to keep the proposal as focused as possible to increase the chance of an on-point discussion.

Yeah, sorry for the digression. I would support it. I don't think we'll see the Map abstraction in Swift 3, and the basic mapValues would be very useful on its own.

On May 21, 2016, at 9:47 AM, Honza Dvorsky <jan.dvorsky@me.com> wrote:

>> swift-evolution@swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution


(Honza Dvorsky) #12

Great! And FWIW, I thought the same thing: the Map abstraction might be
too large a task at this time (but I'll definitely support it if it
eventually comes up).

···

On Sat, May 21, 2016 at 3:54 PM Matthew Johnson <matthew@anandabits.com> wrote:


On May 21, 2016, at 9:47 AM, Honza Dvorsky <jan.dvorsky@me.com> wrote:

While I agree that it'd be nice to add a Map abstraction into which we
could move a lot of the Dictionary-ness, my original pitch is *just* about
adding the specific implementation of `mapValues` in its regular, non-lazy
form. My example was about only keeping a subset of the information in
memory in a Dictionary to allow for quick and frequent access (lazy goes
against that). I think it'd be better to get that in first, or at least
evaluate that separately from a comprehensive refactoring of the
Dictionary, which would just accumulate more opinions and slow this
specific step down.

If either of you has specific ideas about the potential Map protocol, I
encourage you to start a separate thread for that, to focus the
conversation on the parameters of what it would look like.

So I guess I'm now asking: would you support a proposal for adding the basic
mapValues function as the first step, with potential extensibility to a
Map protocol allowing for a lazy version? I'd like to keep the
proposal as focused as possible to increase the chance of an on-point
discussion.

Yeah, sorry for the digression. I would support it. I don't think we'll
see the Map abstraction in Swift 3, and mapValues would be very useful on its own.

Thanks,
Honza
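For reference, the regular, non-lazy form pitched here is what `Dictionary.mapValues(_:)` does in Swift 4 and later. The sketch below uses a hypothetical `[String: [String: Any]]` stand-in for the parsed JSON (the thread's `JSON` type is not shown), with placeholder repository names and sizes, and counts transform calls to show that every value is transformed up front.

```swift
// Hypothetical stand-in for the crawled GitHub responses; the names and
// sizes are placeholders, not real data.
let repos: [String: [String: Any]] = [
    "/owner/repo-a": ["url": "https://example.com/a", "size": 120],
    "/owner/repo-b": ["url": "https://example.com/b", "size": 80],
    "/owner/repo-c": ["url": "https://example.com/c"],  // no "size" field
]

var transformCalls = 0

// Eager: the closure runs once per entry immediately, and the result is a
// new dictionary with the same keys. A missing or non-Int "size" becomes nil.
let sizes: [String: Int?] = repos.mapValues { fields in
    transformCalls += 1
    return fields["size"] as? Int
}
// transformCalls == 3 here, before any value of `sizes` has been read.
```

Because the work is done once, later lookups in `sizes` are plain dictionary reads, which matches the "quick and frequent access" use case; the trade-off is that every value is computed even if it is never read.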

On Sat, May 21, 2016 at 3:27 PM Matthew Johnson <matthew@anandabits.com> wrote:


> On May 21, 2016, at 8:45 AM, Haravikk via swift-evolution <swift-evolution@swift.org> wrote:
>
> I think that before this can be done there needs to be an abstraction of what a Dictionary is, for example a Map<Key, Value> protocol. This would allow us to also implement the important lazy variations of what you suggest, which would likely be more important for very large dictionaries, as dictionaries are rarely consumed in their entirety; in other words, calculating and storing the transformed value for every key/value pair is quite a performance overhead if only a fraction of those keys may actually be accessed. Even if you are consuming the whole transformed dictionary, the lazy version is better since it doesn’t store any intermediate values; you only really want a fully transformed dictionary if you know the transformation is either very costly or transformed values will be accessed frequently.
>
> Anyway, that's a long way of saying that while the specific implementation is definitely wanted, the complete solution requires a few extra steps which should be done too, as lazy computation can have big performance benefits.
>
> That, and it’d be nice to have a Map protocol in the stdlib for defining other map types, such as trees, since these don’t require Hashable keys, but dictionaries do.
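A minimal sketch of the lazy variation described above, assuming a hypothetical `LazyMapValuesView` type and `lazyMapValues` method (the standard library has neither): the view stores the base dictionary and the transform, and applies the transform only when a key is looked up, so untouched entries are never transformed and nothing is cached.

```swift
// Hypothetical lazy view: keeps the original dictionary plus the transform,
// and transforms a value only when it is read.
struct LazyMapValuesView<Key: Hashable, Value, T> {
    let base: [Key: Value]
    let transform: (Value) -> T

    // Missing keys return nil; present keys are transformed on every access
    // (nothing is cached, so repeated reads repeat the work).
    subscript(key: Key) -> T? {
        base[key].map(transform)
    }

    var count: Int { base.count }
}

extension Dictionary {
    // Hypothetical entry point; the name is illustrative only.
    func lazyMapValues<T>(_ transform: @escaping (Value) -> T) -> LazyMapValuesView<Key, Value, T> {
        LazyMapValuesView(base: self, transform: transform)
    }
}

var calls = 0
let large = Dictionary(uniqueKeysWithValues: (0..<1_000).map { ($0, $0) })
let squares = large.lazyMapValues { (n: Int) -> Int in
    calls += 1
    return n * n
}
// calls == 0: building the view transformed nothing.
let nine = squares[3]  // Optional(9); calls is now 1
```

The lack of caching is the flip side of Honza's point: if the same keys are read many times, an eager `mapValues` result may be the better choice.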



(Haravikk) #13

Sorry, my point was that I think it’s better to wait until we can also do the lazy equivalent and have both done together; otherwise we end up with one map function that can work lazily and one that never does. Sure, that will require a refactoring into a protocol, but it seems to me that it’s better to do that as the first step, then add the feature after that. In the meantime, extensions have you well covered for convenience.

Another alternative to this feature might be to add a key/value pair constructor to Dictionary (it technically already has one, but it’s variadic only) so you could do something like this:

  let myTransformedDictionary = Dictionary(myIntegerDictionary.lazy.map { ($0, $1 + 5) })

This would be a useful initialiser both now and in the future. I dunno, it’s just my opinion, but I find it a bit weird to get half the implementation now, and it could lead to misunderstandings, with people trying to do myMap.lazy.mapValues (which won’t be recognised) and wondering why there isn’t one.
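For comparison, Swift 4 and later ship a key/value pair initializer of this kind, `Dictionary(uniqueKeysWithValues:)`, which accepts any sequence of pairs, including a lazy `map`. A minimal sketch of the example above written against it, with a hypothetical `myIntegerDictionary`:

```swift
let myIntegerDictionary = ["a": 1, "b": 2, "c": 3]

// The lazy map produces (key, value + 5) pairs on demand; the initializer
// consumes them once to build the new dictionary. It traps at runtime on a
// duplicate key, which cannot happen here because the keys come straight
// from an existing dictionary.
let myTransformedDictionary = Dictionary(
    uniqueKeysWithValues: myIntegerDictionary.lazy.map { ($0.key, $0.value + 5) }
)

// The same result via mapValues, which also shipped in Swift 4.
let viaMapValues = myIntegerDictionary.mapValues { $0 + 5 }
```

Both produce `["a": 6, "b": 7, "c": 8]` (dictionary iteration order is unspecified). The initializer route is more general, since the pairs can come from any sequence, while `mapValues` only fits the keep-the-keys case.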



(Honza Dvorsky) #14

Thanks, Dave! I have to learn how to use all these mailing list tools :slight_smile:

Honza

···

On Sun, May 22, 2016 at 8:24 PM Dave Abrahams via swift-evolution <swift-evolution@swift.org> wrote:

on Sat May 21 2016, Honza Dvorsky <swift-evolution@swift.org> wrote:

> Hi Brent,
>
> thanks, I should have caught that. Unfortunately, I don't know of a
> nice way to search the mailing list other than searching your own
> inbox, which doesn't include messages from the time you weren't
> subscribed. How do you do it?

Try the search box at the bottom of
http://news.gmane.org/gmane.comp.lang.swift.evolution

HTH,

--
-Dave



(Brent Royal-Gordon) #15

For this, the key/value pair constructor for Dictionary would be the better option, since the .map() method essentially already lets you do this. Adding key transformation to .mapValues() would only make it less specialised (it’s called .mapValues after all ;), plus it would be incompatible with any simultaneous/future lazy implementation, since you can’t just transform the keys lazily (you also need a way to transform them back if you intend to use them for lookups).

He was suggesting the transform be of type (Key, Value) -> NewValue; that is, it would be able to see what the key was, but it wouldn't be able to transform the key.

···

--
Brent Royal-Gordon
Architechies


(Dan Appel) #16

A value-value map is just simpler and cleaner, while almost always giving you what you need.

+1

···

On Mon, May 23, 2016 at 10:59 PM Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org> wrote:

> I have a small remark though: wouldn’t it be better to let transform be
> of type (Key, Value) throws -> T instead of (Value) throws -> T? You can
> just ignore the key (with _) if you don’t need it, but I think it might
> come in handy in some cases.

The problem is that this closes the door to writing many simple maps in
functional style. For instance, this:

        dictionaryOfNumbers.mapValues(abs)

Would have to become this:

        dictionaryOfNumbers.mapValues { _, v in abs(v) }

(It *might* be possible to do it with `$1`, but I'm not sure; there are
some limitations around that.)

A value-value map is just simpler and cleaner, while almost always giving
you what you need.

--
Brent Royal-Gordon
Architechies


--
Dan Appel
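A small sketch of the trade-off Brent describes, using the real `Dictionary.mapValues(_:)` (Swift 4 and later) next to a hypothetical `mapValuesWithKey` extension whose transform receives `(Key, Value)`. The value-only form accepts a function reference like `abs` directly; the two-argument form needs a closure that ignores the key.

```swift
extension Dictionary {
    // Hypothetical variant: the transform can see the key but cannot change it,
    // so the result keeps exactly the same keys.
    func mapValuesWithKey<T>(_ transform: (Key, Value) throws -> T) rethrows -> [Key: T] {
        var result = [Key: T](minimumCapacity: count)
        for (key, value) in self {
            result[key] = try transform(key, value)
        }
        return result
    }
}

let dictionaryOfNumbers = ["x": -3, "y": 4, "z": -5]

// Value-only transform: a function can be passed point-free.
let viaValues = dictionaryOfNumbers.mapValues(abs)

// (Key, Value) transform: the key has to be explicitly ignored.
let viaKeyAndValue = dictionaryOfNumbers.mapValuesWithKey { _, v in abs(v) }

// Where the key is genuinely useful, the two-argument form reads naturally.
let labels = dictionaryOfNumbers.mapValuesWithKey { key, value in "\(key)=\(value)" }
```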


(Matthew Johnson) #17

I have a small remark though: wouldn’t it be better to let transform be of type (Key, Value) throws -> T instead of (Value) throws -> T? You can just ignore the key (with _) if you don’t need it, but I think it might come in handy in some cases.

The problem is that this closes the door to writing many simple maps in functional style. For instance, this:

   dictionaryOfNumbers.mapValues(abs)

Would have to become this:

   dictionaryOfNumbers.mapValues { _, v in abs(v) }

(It *might* be possible to do it with `$1`, but I'm not sure; there are some limitations around that.)

A value-value map is just simpler and cleaner, while almost always giving you what you need.

+1.

I don't think I have ever mapped keys. Incidentally, that doesn't have the usual semantics of a map operation, as you can produce duplicate keys.
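To make the duplicate-key point concrete: transforming keys can send two different keys to the same new key, so the result needs a merge rule. In Swift 4 and later, `Dictionary(_:uniquingKeysWith:)` takes that rule explicitly (whereas `Dictionary(uniqueKeysWithValues:)` traps on a duplicate). A minimal sketch with made-up data:

```swift
let scores = ["Alice": 3, "alice": 4, "Bob": 5]

// Lowercasing the keys maps both "Alice" and "alice" to "alice", so this is
// not a one-to-one map; the combining closure decides what happens on a clash.
let merged = Dictionary(
    scores.map { ($0.key.lowercased(), $0.value) },
    uniquingKeysWith: +
)
// merged == ["alice": 7, "bob": 5]

// Mapping values never has this problem: the keys are untouched.
let doubledScores = scores.mapValues { $0 * 2 }
// doubledScores.count == scores.count
```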

···

On May 24, 2016, at 12:59 AM, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org> wrote:

--
Brent Royal-Gordon
Architechies



(Haravikk) #18

Whoops, right you are, sorry! Hmm, in that case, are there any examples that need the key for the transformation? I usually view keys as arbitrary, so needing them for this seems a little strange (hence my misunderstanding; at least that’s the reason I’m going to stand by, in favour of “I can’t read” :wink:).

···

On 24 May 2016, at 11:50, Brent Royal-Gordon <brent@architechies.com> wrote:

For this, the key/value pair constructor for Dictionary would be the better option, since the .map() method essentially already lets you do this. Adding key transformation to .mapValues() would only make it less specialised (it’s called .mapValues after all ;), plus it would be incompatible with any simultaneous/future lazy implementation, since you can’t just transform the keys lazily (you also need a way to transform them back if you intend to use them for lookups).

He was suggesting the transform be of type (Key, Value) -> NewValue; that is, it would be able to see what the key was, but it wouldn't be able to transform the key.


(Honza Dvorsky) #19

I see your point, I agree it would definitely be nicer to have both
variants, but is that a rule that *all* methods on Collection and
Dictionary etc also have lazy variants? I actually don't know (please tell
me if you do).

The question here is not really whether to add a lazy variant later (I
don't think anyone would object to that). Since that would require quite
substantial refactoring and an abstraction change into a Map protocol (which
will definitely gather a lot of feedback and could drag on for months), I
don't see why we can't add the regular variant already and benefit from it
now. It's also possible that there won't be demand for the lazy variant, in
which case it makes even less sense to block the regular variant right now
(and doing so could even hurt this proposal).

Maybe I'm missing some information about the core team always requiring
both non-lazy and lazy variants; if so, please do tell me so that I
can re-evaluate my approach. If not, I'd like to keep the thread focused
strictly on the one method I'm proposing we add, for the benefits and
examples I provided.

The Dictionary initializers could help things slightly, but they'd require
nesting if you do more than one level, whereas `mapValues` would allow
multiple transformations to be applied in sequence without additional
nesting. My example from the original pitch would turn into `var
descriptionTextLengths = Dictionary(Dictionary(repos.map {
($0, $1["description"].string) }).map { ($0, $1.characters.count) })`, which is
much uglier in my opinion (and kind of breaks the natural left-to-right
composition).
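A sketch of the two spellings being compared, using the `mapValues` that later shipped in Swift 4 and the `Dictionary(uniqueKeysWithValues:)` initializer. `repos` is a hypothetical stand-in for the parsed JSON, `$0.count` stands in for the older `$0.characters.count`, and a missing description is counted as length 0 (an assumption, since the original example does not say how `nil` is handled).

```swift
// Hypothetical stand-in for the crawled responses; placeholder data only.
let repos: [String: [String: Any]] = [
    "/owner/repo-a": ["description": "A compiler"],
    "/owner/repo-b": ["description": "Tools"],
    "/owner/repo-c": [:],  // no description
]

// Left to right: each step is another mapValues call on the previous result.
let chained = repos
    .mapValues { $0["description"] as? String }
    .mapValues { $0?.count ?? 0 }

// Same result via the pair initializer: each step needs its own
// Dictionary(...) wrapper, so the calls nest and read inside out.
let nested = Dictionary(uniqueKeysWithValues:
    Dictionary(uniqueKeysWithValues:
        repos.map { ($0.key, $0.value["description"] as? String) }
    ).map { ($0.key, $0.value?.count ?? 0) }
)
```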



(Dan Appel) #20

Correct me if I'm wrong, but isn't part of the goal of lazy collections to
be able to prefix a whole chain of operations with .lazy and have it still
compile (and be more efficient)? If so, then the core team would probably
want both variants in the standard library.
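A small illustration of the `.lazy` prefix described above, on a dictionary with made-up contents: the same `map`/`filter` chain compiles with or without `.lazy`, but the lazy version defers the work until the result is consumed. The counter exists only to make the evaluation visible.

```swift
let prices = ["a": 1, "b": 2, "c": 3]
var evaluations = 0

// Eager: map runs on every element immediately and builds an array.
let eager = prices.map { kv -> Int in
    evaluations += 1
    return kv.value * 10
}.filter { $0 > 10 }
// evaluations == 3

// Lazy: the same chain with a .lazy prefix; nothing is computed yet.
let deferred = prices.lazy.map { kv -> Int in
    evaluations += 1
    return kv.value * 10
}.filter { $0 > 10 }
// evaluations is still 3

// Consuming the lazy chain performs the work, once per element.
let total = deferred.reduce(0, +)
// total == 50, evaluations == 6
```

`mapValues` is a `Dictionary` method rather than a `Sequence` operation, so it has no counterpart on `prices.lazy`; that is the gap the lazy-variant discussion is about.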


--
Dan Appel