(foreword: I'm aware the implementations are marked as preliminary)
I think there are multiple implementations that are each worth considering:
Algorithms
-
Linear searching: filter the collection, building up an Array of "seen" elements and using Array.contains(_:) (linear search, itself O(n) in time) it for every element.
- Preserves ordering
- Only requires
Equatable conformance
-
O(n) in time
-
O(n) in extra space
-
Set searching: filter the collection, building up a Set of "seen" elements and using Set.contains(_:) (hash-based search, O(1) in time) it for every element.
- Preserves ordering
- Requires
Hashable conformance
-
O(n) in time
-
O(n) in extra space
-
Convert to Set: Simply Set(sourceCollection).
- Loses ordering
- Requires
Hashable conformance
-
O(n) in time
- No extra space requirement
-
Convert to TreeSet: Simply TreeSet(sourceCollection). Requires TreeSet implementation (a tree-based Set that preserves natural ordering of elements.)
- Enforced "natural" ordering
- Requires
Comparable conformance
-
O(log(n)) in time
- No extra space requirement
Use cases:
| Element type |
Order Doesn't Matter |
Discover Order Desired |
Sorted Order Desired |
Equatable |
1 |
1 |
? |
Hashable |
3 |
2 |
? |
Comparable |
1 |
? |
4 |
All space complexity assessments assume that the original collection is discard.
Discussion
I think that the trade-offs between these algorithms are too large for us to prescribe a one-size-fits-all solution. I think we should implement them all. However, I don't think we should introduce separate functions for them, instead, I think we should try to unify them under the minimum possible number of functions, each parameterized with things like whether ordering is desired or not. Something like:
Note: I take "first unique element" to mean: "Out of a set of duplicate elements, take the one which was discovered earliest"
enum OutputOrderingType {
// Output contains first unique elements, in the order they were first seen.
// E.g. [4, 6, 4, 1] => [4, 6, 1]
case discoveryOrder
// Output contains first unique elements, in sorted order
// E.g. [4, 6, 4, 1] => [1, 4, 6]
case sortedOrder
// Output contains first unique elements, in an undefined order
// E.g. [4, 6, 4, 1] => [4, 1, 6] or [1, 4, 6] or [6, 4, 1], or ...
case undefinedOrder
}
//prototype:
func distinctElements(
by deriveKey: KeyDerivingFunction = { $0 },
ordering: OutputOrderingType
) -> ReturnType