Add GroupBy method to Sequence

tomhamming · June 17, 2020, 5:29pm

Introduction

Early in my career I worked a lot with C#, and I found the IEnumerable<T> extension methods extremely useful. I think Swift would benefit from adding some equivalents. This pitch is for GroupBy, which would be a method on Sequence that takes a transformation closure to produce a key from each element and returns an array of Group objects that amount to all values that share one value of a key.

Motivation

Grouping objects by values of some transformation is a common need. Take the Contacts app in iOS - it displays a list of contacts grouped by the first letter of last name (by default in en-us, at least) and shows a scrubber with all letters on the right. Tons of apps have lists like this. GroupBy would make generating this display from a flat collection of contacts trivial. Given a Contact struct with firstName and lastName and a Group class that has a key property:

let groups = allContacts.GroupBy({$0.lastName.prefix(1).uppercased()})
    .sorted {lhs, rhs) -> Bool in 
    return lhs.key < rhs.key
    }

Something close to this can be done today with the grouping initializer on Dictionary, but I would suggest that having it available directly on Collection makes the flow a little more readable. And it's more applicable when you want a collection of groups and you care about the order.

Proposed Solution

struct Group<Key, Element> {
    let key: Key
    private var elements: [Element]
    init(_ key: Key) {
        self.key = key
        self.elements = []
    }
    
    mutating func append(_ newElement: Element) {
        self.elements.append(newElement)
    }
}

//implement Collection on Group by using the internal Array

extension Sequence {
    func groupBy<T: Hashable>(_ transform: (Self.Element) throws -> T) rethrows -> [Group<T, Self.Element>] {
        var groups: [T: Group<T, Self.Element>] = [:]
        for element in self {
            let key = try transform(element)
            if groups[key] == nil {
                groups[key] = Group(key)
            }
            
            groups[key]?.append(element)
        }
        
        return Array(groups.values)
    }
}

wowbagger · June 17, 2020, 5:46pm

I've been experimenting with something similar, with a different approach:

extension Sequence {
	/// Group the sequence by the given criterion.
	/// - Parameter criterion: The basis by which to group the sequence.
	/// - Throws: `Error`.
	/// - Returns: A dictionary consisting of the sequence grouped by the given criterion.
	func grouped<Group>(
		by criterion: (_ transforming: Element) throws -> Group
	) rethrows -> [Group: [Element]] {
		var groups: [Group: [Element]] = [:]
		for element in self {
			try groups[criterion(_: element), default: []].append(element)
		}
		return groups
	}
}

Although, something (impossible for now) like

func grouped<Group>(
	by key: (Element) throws -> Group
) rethrows -> OrderedDictionary<Group, Self>

will be even better imo.

jjatie · June 17, 2020, 5:59pm

Why not just use Dictionary.init(grouping:by:)?

tomhamming · June 17, 2020, 6:33pm

Functionally, we could indeed accomplish the same thing with the Dictionary init method.

let dict = Dictionary(grouping: allContacts, by: {$0.lastName.prefix(1).uppercased()})
let keys = dict.keys.sorted() //these are the letters in the scrubber and the section headers

But I'd argue that it reads more naturally as a method on the collection itself.

tomhamming · June 17, 2020, 6:34pm

Yeah the sorted collection functionality would make this a bit nicer.

xwu · June 17, 2020, 7:17pm

tomhamming:

Functionally, we could indeed accomplish the same thing with the Dictionary init method.
let dict = Dictionary(grouping: allContacts, by: {$0.lastName.prefix(1).uppercased()})
let keys = dict.keys.sorted() //these are the letters in the scrubber and the section headers
But I'd argue that it reads more naturally as a method on the collection itself.

As a general rule, the standard library does not offer multiple distinct spellings for the same functionality. Occasionally, two APIs which are meant to provide distinct functionality can converge to offer the same behavior in certain cases, but it's another thing to add an API which is duplicative of another.

In the Swift standard library, all conversions of a value from one type to another (which includes collections, such as converting an array to a set of unique elements) are spelled as initializers. This is an explicit guideline; that is why Dictionary(grouping:by:) is the API that is offered for this behavior.

You do propose here a way to get the grouped values in order; I would agree that an OrderedDictionary type would make this API fall out naturally. I see you've found the discussions about sorted collections :)

taylorswift · June 17, 2020, 7:32pm

this isn’t exactly what the thread is proposing, but a related operation that would be really useful to have and isn’t covered well by Dictionary.init(grouping:by:) would be to have some inverse to flatMap, that groups elements of a sequence by run, but is still aware of the ordering of the sequence, like for example, lexing tokens:

// classify is some function that returns a character class
" abcd89ef 123_".grouped(by: classify(_:))
// returns 
[
    (.whitespace, [" "]), 
    (.letter,     ["a", "b", "c", "d"]), 
    (.number,     ["8", "9"]),
    (.letter,     ["e", "f"]), 
    (.whitespace, [" "]), 
    (.number,     ["1", "2", "3"]), 
    (.punctuation,["_"])
]

Nevin · June 17, 2020, 7:51pm

If you put it on Collection rather than Sequence the return type can simply be an array of (Classification, Range<Index>), so it doesn’t need to allocate additional storage.

tomhamming · June 17, 2020, 7:51pm

I tend to want operations on my collections to read somewhat like sentences because of my early experiences with IEnumerable in .NET, but now that I think about it, an OrderedDictionary type would actually solve the use case I identified very nicely. It would automatically enforce order and key uniqueness.

I think there are still cases where returning an array of groups would be nice, like preserving the order of the occurrences of keys (with or without key uniqueness, as noted by @taylorswift), but whether that's worth an addition to the standard library is definitely up for debate.

wowbagger · June 17, 2020, 8:21pm

I think this is only true when the conversion is relatively lossless, e.g. Double(42) and String(describing: [foo, bar]). For conversions of Sequences that require custom transformations, they use methods like map and reduce instead of initialisers.

wowbagger · June 17, 2020, 8:24pm

or perhaps even better with RangeSet?

xwu · June 17, 2020, 10:39pm

They do not. Conversions that are lossless are unlabeled; others use labels. As you say, map and reduce arbitrarily transform elements in the general case, and their raison d'être isn't type conversion (to Array).

wowbagger · June 18, 2020, 3:27am

I think I agree with you here, but just to be sure, would you mind elaborating what their exact raison d'être is? Because the pitched group's seem to be similar.