Efficient `Dictionary.mapValues` with key context

:growing_heart: you can read this proposal as formatted markdown on GitHub :growing_heart:


Efficient Dictionary.mapValues with key context

Introduction

I propose adding an overload to Dictionary.mapValues (and OrderedDictionary.mapValues) that passes the Key to the transformation closure.

This enables us to transform dictionary values with their associated key context without incurring the performance cost of rehashing (or in the case of reduce, reallocating) the dictionary storage, which is currently unavoidable when using init(uniqueKeysWithValues:) or reduce(into:).

Motivation

Currently, when it is necessary to compute the mapped dictionary value using the dictionary key, we must do one of the following:

let new: [Key: NewValue] = .init(
    uniqueKeysWithValues: old.lazy.map { ($0, transform(id: $0, payload: $1)) }
) 
// or
let new: [Key: NewValue] = old.reduce(into: [:]) { 
    $0[$1.key] = transform(id: $1.key, payload: $1.value) 
}

These are both highly pessimized patterns due to expensive hashing, although benchmarks frequently show that the first one is slightly “less bad” than the second one due to having fewer intermediate reallocations.

Although users occasionally also want to transform dictionary keys, this proposal is focused on the use case where dictionary keys are never modified and are only used to provide context (such as aggregation parameters) that is not part of the payload values.

Proposed solution

I propose adding the following overload to Dictionary:

extension Dictionary {
    @inlinable public func mapValues<T>(
        _ transform: (Key, Value) throws -> T
    ) rethrows -> Dictionary<Key, T>
}

And similarly for OrderedDictionary in swift-collections:

extension OrderedDictionary {
    @inlinable public func mapValues<T>(
        _ transform: (Key, Value) throws -> T
    ) rethrows -> OrderedDictionary<Key, T>
}

Usage example

let balances: [Currency: Int64] = [.USD: 13, .EUR: 15]
let displayText: [Currency: String] = balances.mapValues { 
    "\($0.alpha3) balance: \($1)"
}

Detailed design

Changes to Dictionary

The implementation would mirror the existing (Value) -> T overload of mapValues but inside the storage iteration loop it would pass the key along with the value to the transformation closure.

On Apple platforms, Dictionary may be backed by a Cocoa dictionary. This does not pose any major issues, as __CocoaDictionary can be retrofitted with essentially the same machinery as _NativeDictionary within the standard library, and the new mapValues can dispatch between the two exactly as the existing mapValues does.

Changes to OrderedDictionary (swift-collections)

The performance gain for OrderedDictionary could be even more significant. OrderedDictionary maintains a standard Array for keys and values, plus a sidecar hash table for lookups.

The current workaround (reduce or init) forces the reconstruction of the entire hash table and an eager copy of the keys array. We could instead use zipped iteration to map the underlying _keys and _values arrays to a new array of values, and then copy the _keys table – which includes the hash table __storage – and is an O(1) copy-on-write if not mutated, or O(n) on later mutation.

Source compatibility

This is an ABI and API-additive change.

Type inference will handle the overloading gracefully based on the closure’s arity:

dictionary.mapValues { v in ... }    // selects existing `(Value) -> T`
dictionary.mapValues { k, v in ... } // selects new `(Key, Value) -> T`

Alternatives considered

Alternative naming

I considered selecting a new name, such as mapPairs or mapContextual, to avoid overload resolution complexity. But Swift generally prefers overloading when the semantics – mapping values while preserving structure – remain identical.

Additional overload for compactMapValues

The new mapValues overload would introduce an API asymmetry with compactMapValues, which would not support key context. I believe this is justified, as compactMapValues is essentially a shorthand for calling reduce(into:), which makes the performance aspect considerably less motivating.

Doing nothing

As an extensively frozen type, it may be possible for developers to retrofit Dictionary in user space to support key context by relying on stable-but-unspecified implementation details. Similarly, the swift-collections package could be forked to add such functionality. But this would not be a good workflow and we should not encourage it.

Future directions

The proposed mapValues overload does not use typed throws, for symmetry with the existing overload. Both overloads could be mirrored with typed throws variants in the future.

17 Likes

This is a source-breaking change for dictionaries whose values are 2-tuples as the following becomes ambiguous:

let dict1: [K: (A, B)] = ...
let dict2 = dict1.mapValues { a, b in ... }

You could avoid the ambiguity by using a different name for the function.

7 Likes

This is subtly false, because the closure takes (Key, Value), not Element. (This causes other problems, but does solve the ambiguity in your exact example.)

That said, the name is not correct anyway.

mapValues is short for map values to values.

What is proposed here is map elements to values. Even adding in a third overload of the also-missing map values to elements would not cause compiler ambiguation, but it would necessitate explicit typing under certain conditions.

let dictionary = [1: 2]
dictionary.mapValues(\.self) // standard library's overload
dictionary.mapValues { $1 } // first overload below
dictionary.mapValues { ($0, $0) } // second overload below
dictionary.mapValues { ($0, $0) } as [_: (_, _)] // standard library's overload
extension Dictionary {
  func mapValues<NewValue>(
    _ transform: (Key, Value) -> NewValue
  ) -> [Key: NewValue] {
    .init(
      uniqueKeysWithValues: map { ($0, transform($0, $1)) }
    )
  }

  func mapValues<NewKey, NewValue>(
    _ transform: (Value) -> (NewKey, NewValue)
  ) -> [NewKey: NewValue] {
    .init(
      uniqueKeysWithValues: map { transform($0.value) }
    )
  }
}

All of these are useful but the shortcut naming should remain only for the existing signature.

And mapElementsToValues should transform Element, not (Key, Value). Otherwise you can't use transformation functions with the most-commonly labeled arguments.

Tuple desugaring allows for the syntax I described. And for Dictionary, Element is just a typealias of (key: Key, value: Value) (the labels don't make a difference in this particular context.)

2 Likes

Earlier, I suggested mapElements (“…to values”, although maybe not so obvious?) to cut the ambiguity.

But I think mapKeyValues could work better still, as one might just understand it as short for map key-values to values.

it’s much less of a problem than you are possibly imagining, the example you gave would not become ambiguous, the non-splatting overload would be preferred. moreover, type context would generally ensure that the correct overload is selected.

the only possible source break is in the rare scenario where types are being implicitly propagated through multiple levels of type inference, such as h below, which relies on the two-step type inference through the implicitly-typed variable g.

extension Dictionary {
    func f<T>(_ transform: (Key, Value) throws -> T) rethrows -> [Key: T] { [:] }
    func f<T>(_ transform: (Value) throws -> T) rethrows -> [Key: T] { [:] }
}

let d: [Int: (Int, Int)] = [:]
let e: [Int: (Int, Int)] = d.f { a, b in b }
let f: [Int: Int] = d.f { a, b in b }
let g = d.f { a, b in b }
let h: [Int: (Int, Int)] = g

Definitely a big +1 for having this as a built-in feature!

Personally, I'd prefer not to just mirror the existing overload, but instead, introduce this new mapValues function with a typed throw. This would give us two clear variants right off the bat – one that doesn’t throw, and another that rethrows.

That way, the throwable variant would only need to be added to the current mapValues implementation.

Which would be source-breaking, yes.

Given this function:

func f() {
  let dict: [Int: (Int, Double)] = [
    1: (2, 3.0)
  ]
  let dict2 = dict.mapValues { x, y in
    return String(describing: y)
  }
  print(dict2)
}

The output today is:

[1: "3.0"]

With this overload added, the output will silently change to:

[1: "(2, 3.0)"]

This specific example is contrived, obviously, but the point is that there would be a source-breaking change. The obvious solution is to just pick a different name such as mapValuesForKeys(_:). :slightly_smiling_face:

7 Likes

The most gnarly kind of source breakage, where existing working code would still compile but silently behave differently :frowning:

Indeed, a non-compiling ambiguity with a straightforward way of disambiguating might be tolerable if there are good reasons for the other overload, but compiling a silent change in behavior would (IMO) be a dealbreaker.

Now, whether @_disfavoredOverload would cure this, that might be something to investigate...

I'd suggest a variation of that: mapKeyedValues. A longer version of that might be mapKeyValuePairsToValues, but that's wordy and not very nice. I think the addition of "keyed" preserves the sense that values are being mapped over but with additional information about each one.

9 Likes

I think that just opens us up to confusing behaviour in the opposite direction, where we pick the 2-tuple-value overload when a developer writing new code doesn't expect that to happen.

Either way, somebody reading the code is going to see it as ambiguous even if the compiler doesn't, and that's a good enough reason to pick a different name IMHO.

I've harped enough, so I'll shush now. :slightly_smiling_face: For what it's worth, I am in favour of adding this API.

5 Likes

to bikeshed a little, i feel that mapValuesWithKeys is slightly better, if only to preserve the alphabetical prefix ordering with mapValues. i’ve come to accept IDE and documentation tooling relies a lot on alphabetical orderings, and i’ve learned to swim with the grain when it comes to naming.

i don’t feel that mapKeyedValues is a good name, it doesn’t evoke the idea that the keys are present and available, only that the values being passed have some associated key somewhere else in the data structure, and is just as true of mapValues as with the proposed overload.

6 Likes

This is what I’ve always named my (less performant) version of this API. To me it seems to do a good job of expressing that the values are getting mapped, just with some extra available info. Also, in my experience it is common to need to upgrade a normal mapValues to this new API, and it’s nice that doing so would be purely additive, textually speaking. If not for the ambiguity issues with splatting I think I would much prefer the overload

That sounds fine to me, with the one issue that it raises the question as to whether the arguments are to be given in the order (key, value) or (value, key).

(One other reason for having or wondering about the reversed order, given that we already have mapValues, might be that a user can write some code that maps only values using $0, then just start using $1 rather than rewriting the whole thing if they realize later on or halfway through writing that they also need to use the corresponding key.)

4 Likes

that’s a really keen observation, i find myself doing the same when designing closure APIs. i lean towards yes, we should reverse the arguments (among other things, failure to use the key parameter will raise an unused anonymous closure argument error, which serves as a Do You Really Need This signal), but it feels kind of “too clever”, so i’ll let other reviewers to chime in before committing to it.

Having lived with +dictionaryWithObjectsAndKeys:, I’m on the “too clever” side. It would just annoy people in practice, especially since we have “key, value” pairs elsewhere in Dictionary’s API.

8 Likes

You have not tried your own example. Please do and then get back to us with your findings.

Yes, I have in fact tried it and copied and pasted the output here from an actual run. :upside_down_face:

Jonathan’s observation is correct, as i acknowledged a few posts back. as someone who enforces a full type annotation coding style, i felt that it was an extremely rare edge scenario, but enough people disagreed that i conceded the point. also, the silently changing behavior is indeed disqualifying.

The key, again, is in the labeling.

let dictionary = [1: (2, 3.0)]
#expect(dictionary.mapValues { "\($1)" } == [1: "3.0"])
func transform(_ element: [Int: (Int, Double)].Element) -> String {
  "\(element.value)"
}
#expect(dictionary.mapValues(transform) == [1: "(2, 3.0)"])
#expect(dictionary.mapValues { "\($0.value.1)" } == [1: "3.0"])

#expect([1: 2].mapValues { "\($1)" } == [1: "2"])
extension Dictionary {
  func mapValues<NewValue>(
    _ transform: (Element) -> NewValue
  ) -> [Key: NewValue] {
    .init(
      uniqueKeysWithValues: map { ($0.key, transform($0)) }
    )
  }
}

Using Element removes the source breakage problem and is more versatile in the way I previously described. But I still don't think mapValues is the right name.

For example, using a different name supports this, whereas an overload will be unusable due to ambiguation:

func transform<Key, Value>(_ element: [Key: Value].Element) -> String {
  "\(element.value)"
}
#expect(dictionary.mapValues(transform) == [1: "3.0"])
#expect(dictionary.mapElementsToValues(transform) == [1: "(2, 3.0)"])

Yeah, I also agree it feels too clever and would rather not.

It is plausible enough a design, though, for the reasons enumerated above (including, as you point out, Obj-C precedent) that if we don't want (value, key) ordering, some name which mentions keys before values is preferable to one that mentions values before keys.

This would mean explicitly not trying to alphabetize this API alongside mapValues—which, IMO, is totally fine so long as all of these variations on "map" are listed roughly together (this would rule out something like withKeysMapValues, but those are obviously not good candidates).