Sorted Sequence Operations: Merge-Sort, Set Operations, Inclusion Testing

CTMacUser · November 26, 2019, 6:09pm

Working on another part of the C++ Standard Library to import.

Continuing the discussion from Computations, and where to save resources, are hard:

The set-subtraction functionality is actually part of a mini-framework. I uploaded it to a Gist.

MergedSortedSequence.swift

Sequence.whileMergingSorted(with: do: by:) is the core method. Both the receiver and the sequence passed as the first argument must be sorted according to the predicate passed into the third argument. The second argument is an action closure. The method reads from both sequences and passes to the action closure a value with one element from each sequence if they're equivalent, or a value with one element that's the least-ordered from the current front of the receiver or second sequence. The passed value is of a custom MergedSetElementSource generic enum type. If you prefer a pull interface instead of a push one, the MergeSortedSequence generic type is a lazy sequence (and collection if both sequences are collections too) that walks its input sequences and vends MergedSetElementSource values.

Segregating by elements that are exclusive to the first sequence, second sequence, or are shared is the key to all of the functions in this framework.

Since I didn't need it for my prime sieve research, I didn't directly provide an analogue to the C++ function std::merge, but the methods above can be used to make such a method. Just push-out/transform the source values the same order they came in, stripping out the MergedSetElementSource wrapper. For .shared values, output two values, the first member, then the second, if you want an interleaved merge-sort. But we usually want to have all the equivalent elements from the second sequence to follow those from the first; you have to output only the values from the first sequence and queue those from the second until the next upcoming element has a greater value. I can write that method if we go further along in the process.

The Sequence.overlapIfMergedSorted(with: by:) method is the analogue to the C++ std::includes function. It is more precise than a simple inclusion Bool; a custom TwoSetOverlap enumeration type is returned that specifies the exact relationship between the two sequences. Right now, I'm not sure we need to keep its initializer public.

There are two similar functions that came out of the above method. The Sequence.sourceCountWhileMergingSorted(with: by:) method gives the exact counts of the first-exclusive/shared/second-exclusive elements. The Sequence.whileMergingSortedConfirmFor method is a quasi-dual where you submit what shape of first-exclusive/shared/second-exclusive layout you want and the method confirms if the sorted merger would match. I'm not sure if either of these should go in our standard library.

All the methods that take a comparison predicate have a default Comparable version.

SetOperations.swift

The SetOperation enumeration type is a dual to TwoSetOverlap. While TSO is for reading combinations, SO is for writing them. Those operations are called by the Sequence.mergeSortedAsSets(with: keeping: by:) method. It has variants for overriding the returned collection, working lazily, or using Comparable by default. The lazy-sequence version is a collection when both sorted sequences are collections. This enum and method set combined are analogues to std::set_difference, std::set_intersection, std::set_symmetric_difference, std::set_union, and then some.

I wanted to use one enumeration type for set combinations, but SO and TSO have slightly different operations. And the enum case names are vastly different, so I don't see how I can merge them and still keep the names making sense.