Attemped zero-cost abstraction for composable comparator builders

AlexanderM · December 30, 2020, 4:49am

Background

Foundation has a type called Comparator. It's a type alias for just (Any, Any) -> ComparisonResult. It's used a lot in AppKit (and probably UIKit, too), such as for sorting table columns.

The interface is a bit brutal in Swift, because you're left having to cast the Any params into something more useful.

Further, it suffers the same issues as the < operator, in that it can be error-prone to implement correctly. In the case of <, it needs to implement a strict weak ordering. It can be surprisingly easy to accidentally write a < closure which is reflexive (returns true when both elements are equal) or non-transitive (a < b, and a < c, but somehow a >= c). This is particularly true when trying to sort by two boolean flags (e.g. nils should come before non-nils).

An example hand-rolled solution

In my app, I wanted to sort a lot of optional strings by the following criteria:

All the nulls (it's an NSArray, so that's NSNull, to be precise) first
All the empty strings after (there shouldn't be any according to my business logic, but just in case)
The rest of the strings, according to a standard lexicographic ordering.

My hand rolled Comparator looks like this:

let handRolledComparator: Comparator = { anyA, anyB in
	switch (anyA is NSNull, anyB is NSNull) {
		case ( true,  true): return .orderedSame
		case ( true, false): return .orderedAscending
		case (false,  true): return .orderedDescending
		case (false, false): break
	}
	
	guard let a = anyA as? NSString, let b = anyB as? NSString else {
		fatalError("Expected \(anyA) and \(anyB) to both be of type \(NSString.self)")
	}
	
	switch (a.length == 0, b.length == 0) {
		case ( true,  true): return .orderedSame
		case ( true, false): return .orderedAscending
		case (false,  true): return .orderedDescending
		case (false, false): break
	}
	
	return a.localizedStandardCompare(b as String)
}

It's a little brutal and error prone (I've written tests for it, it looks correct, but it's certainly not obvious that it's correct from a quick glance).

My question: is it possible to make an abstraction that improves this?

Well obviously, it is, but given that these comparators will run on hundreds of strings, on every table update, is there a way to build them elegantly, that's also fast (maybe even zero-cost?)

My attempt

In attempt to remedy this, and learn some more about the language, I tried to make a "Comparator Builder", which allows you to use a declarative syntax to build up a comparator, as a composition of simpler comparators. You can view the source here: AMomchilov / Public / Experiments / TypedComparator · GitLab

I came up with two designs:

The first is closure based. Every criteria of a comparison is expressed as a closure. If the criteria leads to a conclusive result (e.g. the lhs was NSNull and the rhs was not, that should be .orderedAscending regardless of the rest of the pipeline), it returns it. Otherwise, it delegates to the next closure to take a stab at reaching a conclusion.

In all, the chain forms a linked list of comparing closures. I thought this would have a lot of allocation/arc/indirection overhead, but from my toying in Instruments, it appears that the compiler did an impressively good job at inline the closure chain. IIRC, it flattened out into a single closure!
The second approach is struct based. The idea is similar to the first approach, but I tried to do it in a way which maximized static type information, in hopes the compiler could use it for more aggressive inlining.

In all, a chain of comparators here forms a nested structure of structs (hopefully free of heap allocations) that delegate from one to another, in a similar manner.

Performance

I wrote some performance benchmarks to compare these 2 approaches, and the original raw Comparator. Here's the results sorting 10,000 objects (6,000 strings varying between 2 and 20 characters, 1,000 blank strings, and 3,000 nulls) on a 2013 iMac (Core i7-4771, 4 cores, 8 threads, 3.5 GHz). This was run in release mode (-O) on Apple Swift version 5.3.2 (swiftlang-1200.0.45 clang-1200.0.32.28):

Approach	Time	Δ from best
Raw `Comparator`	`0.141 s`	`0.0 %`
Closure based	`0.176 s`	`24.8 %`
Struct based	`0.151 s`	`7.1 %`

Request for comments/feedback

Is there anything I can do to improve these implementations? Or perhaps there's a different approach there's fundamentally better. Are there any particular tricks SwiftUI, Combine, etc. use to decrease the cost of deeply nested trees of views or long chains of operators?