[Pitch] Sequence grouped(by:) and keyed(by:)

lorentey · June 16, 2023, 10:45pm

I and some other engineers working on the stdlib-adjacent packages discussed this and we agree these would be helpful additions in swift-algorithms or swift-collections (and eventually, the Standard Library). Discoverability via code completion is my personal favorite reason for adding these.

I agree that ideally, the chainable Sequence methods would support returning an arbitrary Dictionary type, by e.g. taking the result type as an argument:

(0 ... 9).grouped(into: TreeDictionary<Int, Deque<Int>>.self) { $0 % 3 }

However, this is currently blocked on the lack of a mutable DictionaryProtocol. We now have enough examples of dictionary types to start designing dictionary protocols (likely starting by prototyping them within swift-collections), but until we get that done, we should follow the stdlib's existing practice of simply returning the canonical type, which is Dictionary in this case -- as @AlexanderM suggests in their original post.
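To make the missing piece concrete, here is a rough sketch of what such a method could look like if a protocol existed. `DictionaryProtocolSketch` is a hypothetical stand-in invented for this illustration, not a proposed design; the real protocol would need far more careful thought about its requirements.

```swift
/// Hypothetical stand-in for the missing mutable dictionary protocol.
/// The name and requirements are assumptions made for illustration only.
protocol DictionaryProtocolSketch {
    associatedtype Key
    associatedtype Value
    init()
    subscript(key: Key) -> Value? { get set }
}

// The standard Dictionary already provides both requirements.
extension Dictionary: DictionaryProtocolSketch {}

extension Sequence {
    /// Hypothetical `grouped(into:by:)` built on the stand-in protocol.
    func grouped<D: DictionaryProtocolSketch>(
        into resultType: D.Type,
        by keyForValue: (Element) throws -> D.Key
    ) rethrows -> D
    where D.Value: RangeReplaceableCollection, D.Value.Element == Element {
        var result = D()
        for element in self {
            let key = try keyForValue(element)
            // Read, append, write back. This is simple but may copy the
            // group on each append; a real design would avoid that.
            var group = result[key] ?? D.Value()
            group.append(element)
            result[key] = group
        }
        return result
    }
}

let remainders = (0 ... 9).grouped(into: [Int: [Int]].self) { $0 % 3 }
// remainders[0] == [0, 3, 6, 9]
```

Other dictionary types would opt in with a one-line conformance like the one for Dictionary above, provided they offer the same two members.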

At this time I think it would be premature to add specific variants that return other types, such as OrderedDictionary, TreeDictionary, or SortedDictionary. We should rather wait for the dictionary protocols to land, or include designing more generic variants of these as part of the protocol design effort. (In the meantime, people can continue to use the existing initializers, which aren't going away.)

It seems clear to me that we want the default grouped(by:) method to return a standard Dictionary, just like the default map(_:) returns an Array. The language gives special treatment to these two types, by dedicating custom syntax for them -- [Foo], [Foo: Bar] -- and the existing stdlib APIs reinforce this by publishing all the greedy sequence algorithms that are hardwired to return arrays.
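For reference, a minimal sketch of the two methods returning a standard Dictionary might look like the following. The signatures are my assumption of the obvious shape, and the collision policy in `keyed(by:)` (the last element wins) is just one possible choice, not a settled decision.

```swift
extension Sequence {
    /// Sketch: groups elements by key, forwarding to the existing initializer.
    func grouped<Key: Hashable>(
        by keyForValue: (Element) throws -> Key
    ) rethrows -> [Key: [Element]] {
        try Dictionary(grouping: self, by: keyForValue)
    }

    /// Sketch: stores each element under its key.
    /// Assumed policy: on duplicate keys, the later element replaces the earlier one.
    func keyed<Key: Hashable>(
        by keyForValue: (Element) throws -> Key
    ) rethrows -> [Key: Element] {
        var result: [Key: Element] = [:]
        for element in self {
            let key = try keyForValue(element)
            result[key] = element
        }
        return result
    }
}

let groups = (0 ... 9).grouped { $0 % 3 }
// groups[0] == [0, 3, 6, 9]

let byInitial = ["apple", "avocado", "banana"].keyed { $0.first! }
// byInitial["a"] == "avocado", because "avocado" comes after "apple"
```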

As related work, I think it would be interesting to start experimenting with generalizations of the stdlib's map, flatMap, compactMap, filter etc. algorithms that take a RangeReplaceableCollection type to use as the type of the result, so that mapping things into custom collections would be easier:

(0 ..< 26).map(into: Data.self) { $0 + 65 }
"Café 🍽️".unicodeScalars.flatMap(into: String.self) { $0.escaped(asASCII: true) } 
// → "Caf\u{00E9} \u{0001F37D}\u{FE0F}"
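A rough sketch of how such variants could be written today, using only existing RangeReplaceableCollection requirements, is below. The `map(into:_:)` and `flatMap(into:_:)` names and signatures are assumptions taken from the examples above, not existing stdlib API. The usage lines build `[UInt8]` rather than Data so the snippet does not need Foundation.

```swift
extension Sequence {
    /// Hypothetical `map` variant whose result type is chosen by the caller.
    func map<Output: RangeReplaceableCollection>(
        into resultType: Output.Type,
        _ transform: (Element) throws -> Output.Element
    ) rethrows -> Output {
        var result = Output()
        result.reserveCapacity(underestimatedCount)
        for element in self {
            let mapped = try transform(element)
            result.append(mapped)
        }
        return result
    }

    /// Hypothetical `flatMap` variant that concatenates segments into `Output`.
    func flatMap<Output: RangeReplaceableCollection, Segment: Sequence>(
        into resultType: Output.Type,
        _ transform: (Element) throws -> Segment
    ) rethrows -> Output where Segment.Element == Output.Element {
        var result = Output()
        for element in self {
            let segment = try transform(element)
            result.append(contentsOf: segment)
        }
        return result
    }
}

let letters = (0 ..< 26).map(into: [UInt8].self) { UInt8(65 + $0) }
let alphabet = String(decoding: letters, as: UTF8.self)
// alphabet == "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

let escaped = "Caf\u{E9}".unicodeScalars.flatMap(into: String.self) {
    $0.escaped(asASCII: true)
}
// escaped contains the literal characters: Caf\u{00E9}
```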

Aside: the standard Dictionary.init(grouping:by:) itself hardwires Array as the value type of the resulting dictionary. That too ought to be more flexible -- e.g., in the grouped(into:by:) example earlier, I asked to get a TreeDictionary of deques. Is that going to feel too flexible, though? The only good way to tell is to try using it in practice.
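As one way to try this out, here is a sketch of a grouping method that still returns a standard Dictionary but lets the caller choose the per-key container. The `grouped(valuesAs:by:)` name and label are hypothetical, picked only to keep this example distinct from the other spellings discussed here.

```swift
extension Sequence {
    /// Hypothetical variant: like `Dictionary(grouping:by:)`, but each group
    /// is collected into a caller-chosen RangeReplaceableCollection.
    func grouped<Key: Hashable, Group: RangeReplaceableCollection>(
        valuesAs groupType: Group.Type,
        by keyForValue: (Element) throws -> Key
    ) rethrows -> [Key: Group] where Group.Element == Element {
        var result: [Key: Group] = [:]
        for element in self {
            let key = try keyForValue(element)
            // The defaulting subscript appends in place, creating the group on first use.
            result[key, default: Group()].append(element)
        }
        return result
    }
}

let wordsByLength = ["to", "be", "or", "not"]
    .grouped(valuesAs: ContiguousArray<String>.self) { $0.count }
// wordsByLength[2] holds "to", "be", "or"; wordsByLength[3] holds "not"
```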

For what it's worth, swift-collections has already started generalizing its own versions of this initializer, and grouped(by:) probably should do that from the start.

swift-algorithms seems like the most natural home for prototyping these new Sequence extensions.

(Annoyingly, swift-collections is likely the best place for prototyping DictionaryProtocol, but it isn't clear if we'd want grouped(into:by:)/keyed(into:by:) to live there, too. Perhaps DictionaryProtocol ought to be targeted for addition to the stdlib as soon as we have something good to propose.)

PRs to swift-algorithms for these are welcome! Sequence.grouped(by:) and Sequence.keyed(by:) can be added in one PR; if someone wants to experiment with variants of map/flatMap/etc with a generic result, then that could be done in another.

Once these get into a swift-algorithms release, we can start collecting example use cases to strengthen the case for adding them to the Standard Library.

I don't think this is true. For what it's worth, I continue to consider these packages as temporarily embarrassed parts of the Swift Standard Library, exactly as I looked at them when they were initially published. It is of course probable that some package constructs will never realize this ambition; and I expect constructs will not get added in the same order they were originally introduced.

Some of these packages have now existed for two to three years or so; this ought to be a good amount of time for people to gain experience with using the original constructs. I believe forum pitches and subsequent proposal reviews will be a great way to formally collect people's feedback about this experience. (I personally expect this will be a much more productive use of everyone's time than having us argue about the imagined pros and cons of APIs we never actually tried using in practice!) The issues filed on the packages themselves are part of this feedback loop too.

For convenience shortcuts like grouped(by:) and keyed(by:), I think the primary benefit of previewing them in a package is precisely to streamline the eventual proposal process -- if we can clearly demonstrate the utility of these based on year(s?) of widespread use in a standard package, then I hope that will help avoid some of the more predictable/tedious arguments that tend to pop up during such reviews. (Admittedly perhaps replacing them with objections that the package is already good enough.)

I (and presumably the other maintainers of these Standard Library-adjacent packages as well) would be very eager to start proposing individual constructs for adoption in the stdlib. However, I don't think we should hurry up and randomly add arbitrary things to the stdlib without a clear reason to do so.

Perhaps most importantly, in some of these cases we're still blocked on the language maturing enough to allow us to replace the temporary stand-ins that ship in these packages. swift-atomics is the most obvious example of this (not having atomics/locks in the stdlib is getting more painful with every passing day), but even for types in swift-collections and swift-algorithms, ideally we should hold off moving them into the stdlib until the dust settles on noncopyable types. Prematurely moving them would almost certainly make it more difficult to add support for noncopyable elements later.