[Pitch] Sequence grouped(by:) and keyed(by:)

AlexanderM · June 11, 2023, 7:34pm

Hey Swift Community!

I propose adding new APIs to simplfy the process of building dictionaries from Sequences:

let digitsGroupedByMod3 = (0...9).grouped(by: { $0 % 3 })
// Results in:
[
	0: [0, 3, 6, 9],
	1: [1, 4, 7],
	2: [2, 5, 8],
]

let fruitsByFirstLetter = ["Apple", "Banana", "Cherry"].keyed(by: { $0.first! })
// Results in:
[
	"A": "Apple",
	"B": "Banana",
	"C": "Cherry",
]

These methods are simpler, clearer, and play nice with method chaining, as compared to the current initializers on Dictionary.

For more motivation and detail, here's the full proposal.
^{(Please submit any formatting/typo corrections via comments right on the PR, to keep this thread focused its substance)}

Here's my sample implementation.

Open questions:

Should this target swift-algorithms, or is it useful enough to be in the Standard Library (via swift-standard-library-preview)?
grouped(by:) & keyed(by:) versus groupedBy() & keyedBy()
and more? :D

Would love to hear your thoughts, and if this is something you'd ran into in your own projects!

davedelong · June 11, 2023, 11:27pm

grogu.gif YES YES YES YES YES YES YES

I have these in my own personal libraries and I use them ALL THE TIME. It is so much nicer to do:

let existingValues: Array<ModelValue> = ...
let existingValuesByID = existingValues.keyed(by: \.id)

This is so much more succinct and clearer than the built-in approach:

let existingValues: Array<ModelValue> = ...
let keysAndValues = existingValues.map { ($0.id, $0) }
let existingValuesByID = Dictionary(uniqueKeysWithValues: keysAndValues)

Grouping is similarly useful, especially when I have a single array of values and need to divide it up in to sections for showing in a List or UITableView, etc.

davedelong · June 11, 2023, 11:28pm

To answer your questions:

Standard library. I literally put this in EVERY SINGLE PROJECT I do.

grouped(by:), so it matches keyed(by:)

I see in your implementation that both the keyed(by:) and grouped(by:) implementations require a non-optional key to be produced for bucketing the original value. I have sometimes found it useful to allow a nil value to be returned. That would result in the value being omitted from the returned dictionary. If that's too controversial it's fine to leave it out, but I mention it for the sake of completeness.

bbrk24 · June 11, 2023, 11:38pm

I thought the lazy algorithms weren’t rethrows?

AlexanderM · June 12, 2023, 12:41am

Same. But usually not immediately. I use the Dictionary initializers a few times (each time thinking "I probably won't need more than this"), before I see there's tons of usages and go to finally add it in.

The two would match. I was referring to this considered alternative. I'll edit my post to clarify!

Interesting! I hadn't considered this. I'd love to hear what others think about it.

AlexanderM · June 12, 2023, 12:54am

I was considering this. Perhaps there should be an overload that takes a collection to add elements into? You can give it an empty Dictionary, OrderedDictionary, or even a partially filled Dictionary you want to merge into:

extension Sequence {
	func grouped<GroupKey: Hashable, Result>(
		by keyForValue: (Element) throws -> GroupKey,
		into collection: Result = [GroupKey: [Element]]()
	) rethrows -> Result
}

You're right! That's just an implementation detail though, you can have a different implementation entirely, or even two two separate overloads (throwing vs non-throwing).

AlexanderM · June 12, 2023, 4:09pm

Could I get some more feedback on how to choose between targeting the stdlib vs swift-algorithms?

wowbagger · June 13, 2023, 12:22am

A relevant post from a few years ago:

taylorswift · June 13, 2023, 1:50am

if i remember correctly, the original idea was for swift-algorithms, swift-collections, swift-numerics, etc. to be “incubators” for additions to the standard library so that these APIs could get some real-world usage experience without committing them to the standard library (where there was originally anticipated to be a higher expectation of stability.)

this model broke down for a variety of reasons:

pushing legislation through evolution is a harrowing ordeal compared to adding things to third party packages, so types tend to accumulate in the packages and never “graduate”.
there’s no organized effort to actually collect “real-world experience” from library users through feedback.
these packages have a non-trivial number of users, so in practice they end up being almost as “stable” as the standard library and hard to change.
the standard library and the Swift module are not the same thing, and many have found it more productive to work on toolchain modules in the standard library (e.g. Regex, _Concurrency, etc.) which provide a lot of the same benefits as adding things to Swift (e.g. only have to think about one version tuple, don’t have to download anything from the internet to run CI, etc.) but fewer of the downsides (e.g. making swift harder to use on embedded, high expectation of stability, etc.)

we don’t have a toolchain module for generic algorithms (StringProcessing?), but i think we really should, and this proposal would be an excellent reason to start one.

pilky · June 14, 2023, 8:21am

+1 to this. I also have my own keyed(by:) method that I add to every project so this would be great to have in the standard library

lorentey · June 16, 2023, 10:45pm

I and some other engineers working on the stdlib-adjacent packages discussed this and we agree these would be helpful additions in swift-algorithms or swift-collections (and eventually, the Standard Library). Discoverability via code completion is my personal favorite reason for adding these.

I agree that ideally, the chainable Sequence methods would support returning an arbitrary Dictionary type, by e.g. taking the result type as an argument:

(0 ... 9).grouped(into: TreeDictionary<Int, Deque<Int>>.self) { $0 % 3 }

However, this is currently blocked on the lack of a mutable DictionaryProtocol. We now have enough examples of dictionary types to start designing dictionary protocols (likely starting by prototyping them within swift-collections), but until we get that done, we should follow the stdlib's existing practice of simply returning the canonical type, which is Dictionary in this case -- as @AlexanderM suggests in their original post.

At this time I think it would be premature to add specific variants that return other types, such as OrderedDictionary, TreeDictionary, or SortedDictionary. We should rather wait for the dictionary protocols to land, or include designing more generic variants of these as part of the protocol design effort. (In the meantime, people can continue to use the existing initializers, which aren't going away.)

It seems clear to me that we want the default grouped(by:) method to return a standard Dictionary, just like the default map(_:) returns an Array. The language gives special treatment to these two types, by dedicating custom syntax for them -- [Foo], [Foo: Bar] -- and the existing stdlib APIs reinforce this by publishing all the greedy sequence algorithms that are hardwired to return arrays.

As related work, I think it would be interesting to start experimenting with generalizations of the stdlib's map, flatMap, compactMap, filter etc. algorithms that take a RangeReplaceableCollection type to use as the type of the result, so that mapping things into custom collections would be easier:

(0 ..< 26).map(into: Data.self) { $0 + 65 }
"Café 🍽️".unicodeScalars.flatMap(into: String.self) { $0.escaped(asASCII: true) } 
// → "Caf\u{00E9} \u{0001F37D}\u{FE0F}"

Aside: the standard Dictionary.init(grouping:by:) is itself hardwiring Array as the value type of the resulting dictionary. That too ought to be more flexible -- e.g, in the grouped(into:by:) example earlier, I asked to get an ordered dictionary of deques. Is that going to feel too flexible, though? The only good way to tell is to try using it in practice.

For what it's worth, swift-collections has already started generalizing its own versions of this initializer, and grouped(by:) probably should do that from the start.

swift-algorithms seems like the most natural home for prototyping these new Sequence extensions.

(Annoyingly, swift-collections is likely the best place for prototyping DictionaryProtocol, but it isn't clear if we'd want grouped(into:by:)/keyed(into:by:) to live there, too. Perhaps DictionaryProtocol ought to be targeted for addition to the stdlib as soon as we have something good to propose.)

PRs to swift-algorithms for these are welcome! Sequence.grouped(by:) and Sequence.keyed(by:) can be added in one PR; if someone wants to experiment with variants of map/flatMap/etc with a generic result, then that could be done in another.

Once these get into a swift-algorithms release, we can start collecting example use cases to strengthen the case of adding them to the Standard Library.

I don't think this is true. For what it's worth, I continue to consider these packages as temporarily embarrassed parts of the Swift Standard Library, exactly as I looked at them when they were initially published. It is of course probable that some package constructs will never realize this ambition; and I expect constructs will not get added in the same order they were originally introduced.

Some of these packages have now existed for 2-3 or so years; this ought to be a good amount of time for people to gain experience with using the original constructs. I believe forum pitches and subsequent proposal reviews will be a great way to formally collect people's feedback about this experience. (I personally expect this will be a much more productive use of everyone's time than having us argue on the imagined pros and cons of APIs we never actually tried using in practice!) The issues filed on the packages themselves are part of this feedback loop too.

For convenience shortcuts like grouped(by:) and keyed(by:), I think the primary benefit of previewing them in a package is precisely to streamline the eventual proposal process -- if we can clearly demonstrate the utility of these based on year(s?) of widespread use in a standard package, then I hope that will help avoid some of the more predictable/tedious arguments that tend to pop up during such reviews. (Admittedly perhaps replacing them with objections that the package is already good enough.)

I (and presumably the other maintainers of these Standard Library-adjacent packages as well) would be very eager to start proposing individual constructs for adoption in the stdlib. However, I don't think we should hurry up and randomly add arbitrary things to the stdlib without a clear reason to do so.

Perhaps most importantly, in some of these cases we're still blocked on the language maturing enough to allow us to replace the temporary stand-ins that ship in these packages. swift-atomics is the most obvious example for this (not having atomics/locks in the stdlib is getting more painful with every passing day), but even for types in swift-collections and swift-algorithms, ideally we should hold off moving them into the stdlib until the dust settles on non-copiable types. Prematurely moving them would almost certainly make it more difficult to add support for non-copiable elements later.

AlexanderM · June 19, 2023, 10:26pm

I'd suggest taking an instance instead. You can still pass an empty instance, but this has the added utility of letting you accumulate into a pre-existing instance you might have. This would unlock the ability to use the convenience of map/filter/etc. in performance sensitive contexts, where you can skip the allocation (because you can reuse one "scratch" collection over and over).

I have some free time this week, so I'll take a crack at this! Where should I reach out if I have any questions or need support?

scanon · June 20, 2023, 12:52am

Some combination of @lorentey, @nnnnnnnn, and I should be able to help.

AlexanderM · June 21, 2023, 9:21pm

In the abstract, I like the idea of breaking up the complex behaviours into smaller orthogonal components, but I think the wordiness and hit to discoverability isn't worth it.

In a similar vein, I found that if grouped(by:) is implemented in a way that returns lazy groupings (instead of eager Arrays), then the keyed(by:) API is superfluous. This:

fruit.keyed(by: { $0.first! }, uniquingKeysWith: { old, new in new })

Could be instead expressed as:

fruit.grouped(by: { $0.first! })
    .transformValues { group in group.last! }

But the complexity, hit to readability and discoveraability isn't worth it, IMO.

AlexanderM · June 21, 2023, 9:22pm

I've opened a PR to swift-algorithms!

Would you like to have a look?

github.com/apple/swift-algorithms

Add Grouped and Keyed

apple:main ← amomchilov:grouped-and-keyed

opened 08:51PM - 21 Jun 23 UTC

amomchilov

+377 -1

### Description Pitch: https://forums.swift.org/t/pitch-sequence-grouped-by-a…nd-keyed-by/65511 Swift Evolution proposal: https://github.com/apple/swift-evolution/pull/2078/files?short_path=2767014 ### Detailed Design Include any additional information about the design here. At minimum, show any new API: ```swift extension Sequence { func grouped<GroupKey>( by keyForValue: (Element) throws -> GroupKey ) rethrows -> [GroupKey: [Element]] func keyed<Key>( by keyForValue: (Element) -> Key, uniquingKeysWith combine: ((Key, Element, Element) throws -> Element)? = nil ) rethrows -> [Key: Element] } ``` ### Documentation Plan I've updated new `Grouping.md` and `Keyed.md` guides, and updated README to point to thme. ### Test Plan I wrote tests loosely based on the Standard Library's [tests for `Dictionary.init(grouping:by:)`](https://github.com/apple/swift/blob/4d1d8a9de5ebc132a17aee9fc267461facf89bf8/validation-test/stdlib/Dictionary.swift#L1974-L1988). The Standard Library has a magical `expectCrashLater()` which lets you test fatalErrors and such. Without it, I'm not sure how to test properly test the `fatalError()` this PR introduces to `keyed(by:)`. ### Source Impact Purely additive. ### Checklist - [x] I've added at least one test that validates that my change is working, if appropriate - [x] I've followed the code style of the rest of the project - [x] I've read the [Contribution Guidelines](../../blob/main/CONTRIBUTING.md) - [x] I've updated the documentation if necessary

AlexanderM · October 12, 2023, 4:44pm

Heya guys! This fizzled out a bit.

The implementation is done, documents, and (IMO) high quality and production-ready. What are the next steps to get it reviewed, improved and merged?

John_McCall · October 30, 2023, 5:15pm

Hi, Alexander; sorry for the slow response. The standard library team would like to move forward with this just in the swift-algorithms package for now, so it doesn't need official evolution approval.

AlexanderM · October 30, 2023, 5:47pm

Sounds good to me!

Any suggestions on how do I should get reviewer attention to move this forward?

AlexanderM · November 7, 2023, 9:16pm

Wooo! It's merged! https://github.com/apple/swift-algorithms/pull/197

It'll be available in the next release of swift-algorithms, perhaps 1.1.1 or 1.2.0. Thanks for the reviews and getting this baby shipped!

scanon · November 7, 2023, 9:57pm

This is new public API, so it'll get a new minor version. I'll work with Nate to get 1.2.0 tagged.