Implementing `Comparable` for types with optional fields is too hard

taylorswift · April 15, 2023, 9:36pm

i think in general we do not use Comparable as much as C++ uses their equivalent of that concept. which is great because Comparable is easily my least-favorite conformance to implement by hand. but sometimes it is unavoidable, because you need it for things like deterministic serialization.

which sucks because anything that has optional stored properties is really hard to make Comparable, because Optional<T> doesn’t have a conditional conformance to Comparable. (Array<T> has a similar problem.)

here’s an example. this type models the generic signature of a swift extension:

extension Extension
{
    @frozen public
    struct Signature:Equatable, Hashable, Sendable
    {
        /// The type extended by the relevant extension group.
        public
        let type:ScalarSymbolResolution
        /// The generic constraints of the relevant extension group.
        /// An empty array indicates an unconstrained extension, while
        /// nil indicates an unknown generic context.
        public
        let conditions:[GenericConstraint<ScalarSymbolResolution>]?
    }
}

(the conditions field is optional because SymbolGraphGen emits the wrong generic signatures for members inherited through protocol conformances. but SymbolGraphGen has a lot of bugs and it doesn’t seem like Apple considers it important, so i guess we just have to accept it is broken and find a workaround.)

conceptually, the < implementation should just be:

extension Extension.Signature:Comparable
{
    public static
    func < (lhs:Self, rhs:Self) -> Bool
    {
        (lhs.type, lhs.conditions) < (rhs.type, rhs.conditions)
    }
}

but that doesn’t work because neither Array nor Optional is conditionally Comparable. so instead we need this monstrosity:

extension Extension.Signature:Comparable
{
    public static
    func < (lhs:Self, rhs:Self) -> Bool
    {
        if      lhs.type < rhs.type
        {
            return true
        }
        else if lhs.type > rhs.type
        {
            return false
        }

        switch (lhs.conditions, rhs.conditions)
        {
        case (_, nil):
            return false
        
        case (nil, _?):
            return true
        
        case (let lhs?, let rhs?):
            return lhs.lexicographicallyPrecedes(rhs)
        }
    }
}

i say that this is hard and not just tedious because there is a lot of control flow to think about here, when the rule is conceptually just:

self.type has the highest sort precedence
nil self.conditions sort before non-nil conditions.

in another life, i thought Optional should not be Comparable, because maybe you want nil to sort after non-nil. so i left Optional out of SE-266. but now i am thinking we may have made the common use case disproportionally painful to avoid infringing on a less-common use case.

i can’t think of any reason why Array should not be Comparable, other than a really weak argument about new users falling into performance traps with expensive <’s. (but this trap already exists with String comparisons.)

so i propose:

we make Optional<T> conditionally Comparable, with nil sorting before everything else, which matches lexicographical intuition.
we make Array<T> conditionally Comparable, in recognition of the fact that sometimes we really do need to do a very slow array comparison.

thoughts?

Slava_Pestov · April 15, 2023, 10:38pm

For Array there are at least two orders, the prefix order and shortlex. Which one do you have in mind?

EDIT: in general, you can wrap Array or Optional in a new type that implements the conformance. You can then implement comparison on your type by constructing the “sorting keys” of both sides and comparing them (and the conformance on sorting keys could be derived)

taylorswift · April 15, 2023, 10:43pm

for me at least, the prefix order matches my intuition more than the shortlex order.

for no reason other than i would expect ["a", "b"] < ["a", "b", "c"] as [Character] to behave similarly to "ab" < "abc".

but at a higher level, any ordering is better than no ordering at all, which is what we have right now. my goal right now is just to achieve deterministic serialization output.

this is already my standard workaround for Array, but it is a lot of boilerplate and a more reusable generalization of it doesn’t really have a natural home in a typical module organization.

SE-266 can be used to deal with the Optional case, by replacing the optional with a custom enum type, but then we give up optional chaining, particularly optional mutations. so Optional is a bigger problem than Array. i think this would also be inconsistent with the general principle of “Optionoids are bad”.

Slava_Pestov · April 15, 2023, 10:44pm

ab precedes abc in both orders. The difference is that eg, cd < abc in shortlex but abc < cd in prefix order.

taylorswift · April 15, 2023, 10:48pm

sorry, that was an ambiguous example. i would expect ["c", "d"] < ["a", "b", "c"] as [Character] to behave similarly to "cd" < "abc".

mpilman · April 16, 2023, 12:47am

I see your point (and it can be especially bad if the implementation behaves in surprising ways), but this is not how I think we should think about this problem.

Generally implementing a data structure shouldn't necessarily be the easy task. Instead, using a data structure efficiently should be as easy as possible.

I have seen many examples of people doing very questionable things to shoehorn their data into some indexing structure. In the best case, this will make code very hard to read, worst-case the performance will be not only bad but, imho much worse, unpredictable.

Additionally I would agree that prefix ordering is probably more intuitive. But maybe my intuition is tainted by too many years of C++. But often memcmp(a, b) < 0 is roughly what I expect a < b to do.

Side-question: do people have thoughts about the spaceship operator (<=>)? I rather like how it reduces work when implementing Comparable (I understand that < is technically sufficient, but if < is expensive to compute one might want to explicitly implement other operators as well)

tera · April 16, 2023, 1:13am

Yes. That would also match Equatable, Hashable & Codable conditional conformances.

Possible implementation

extension Array: Comparable where Element: Comparable {
    public static func < (lhs: Array<Element>, rhs: Array<Element>) -> Bool {
        let lcount = lhs.count
        let rcount = rhs.count
        for i in 0 ..< Swift.min(lcount, rcount) {
            if lhs[i] != rhs[i] {
                return lhs[i] < rhs[i]
            }
        }
        return lcount < rcount
    }
}

extension Optional: Comparable where Wrapped: Comparable {
    public static func < (lhs: Optional, rhs: Optional) -> Bool {
        if let left = lhs {
            if let right = rhs {
                return left < right
            } else {
                return false
            }
        } else {
            return rhs != nil
        }
    }
}

Do you use this in your project and does it it work well? I'm trying to imagine a situation when Array/Optional conforming to Comparable could be harmful but struggling to find an example.

Worth considering Set and Dictionary as well.

lukasa · April 17, 2023, 7:55am

Just FWIW, but I strongly disagree with the characterisation that any ordering is better than no ordering at all, or that the synthesised conformances of Equatable/Hashable/Codable are comparable. Equatable and Hashable have clear, straightforward, "one correct definition" rules, and if you for some reason are using an Array that doesn't want to follow them it's going to be obvious. If your rule for equatability of your type is that you want to equate the elements of your contained array except for any element whose index is evenly divisible by 5, it's very obvious to you that the default Equatable conformance isn't going to work.

Conversely, @Slava_Pestov has quite rightly pointed out that it is less obvious how to compare two aggregates of potentially different sizes and contents. This widens the opportunity for surprise. So I think it is appropriate for Swift to decline to make this decision for users, some of whom are likely to be very surprised by whatever choice Swift makes.

filip-sakel · April 17, 2023, 9:34am

We could perhaps use property wrappers or macros to indicate the desired ordering in the type that uses optionals or arrays. This way, no custom implementation would be required and the comparison ordering would be explicit in the type definition.

tera · April 17, 2023, 9:58am

Note that we do have a precedent of making comparable conformance that might not necessarily meet user expectations:

print("B" < "Å") // always true
print("B".localizedCompare("Å") == .orderedAscending) // false or true (e.g. Norwegian)

One may question whether that's a good precedent to follow or not. On the upside having "the always defined and stable" comparable conformance (even the one that looks arbitrary and doesn't match user expectations) allows using the value as a key of an ordered tree data structure.

xwu · April 17, 2023, 1:48pm

It’s plainly unreasonable for a user to expect Swift’s locale-independent string handling to be based on Norwegian rules. We use Unicode for strings and IEEE-754 for floating-point values as there isn’t anything approaching a reasonable competing standard.

The bar for a default implementation isn’t that it can “necessarily” meet every user’s expectations however unreasonable. If there were an industry consensus that nil sorts either first or last, then we would not be having this conversation now about Optional.

taylorswift · April 17, 2023, 4:22pm

comparing on element count would be illogical because such a comparison rule would be agnostic of element type, and Array's Equatable is conditional on its element type being Equatable. if Array ordered itself on element count, that would imply that any two Arrays with the same number of elements are equal. therefore prefix order is the only sensible Array ordering.

tera · April 17, 2023, 4:27pm

That's not how shortlex order works. ['a', 'b'] is still less than ['a', 'c'] despite lengths being equal.

taylorswift · April 17, 2023, 4:31pm

yeah you're right never mind. i suppose there just isn't a clean way to say "sort on property x by some subrule, and then if they are equal, sort on property y by some other subrule." instead we have to traverse x to check for < and then traverse it a second time to check for >, etc etc.

tera · April 17, 2023, 5:08pm

In a way that's what shorlex ordeding is doing: checking lengths as a primary sort criterion and if they are equal checking contents next. I don't quite follow why to traverse x twice first with < and next with >, as if you are making == implementation via < and > or something to that account. See my implementation above how I'd do prefix (vocabulary) ordering, there the contents is checked first and length is checked only if contents are equal (up to the minimal count of two values). And shortlex implementation is even simpler.

Contrary to some others on this thread I'd prefer having a not so ideal and possibly arbitrary but stable and always available ordering, compared to not having built-in ordering at all. If swift was using trees instead of hashtables (for its dictionaries / sets) I'm pretty sure we'd have this implemented already, similar to how we have Hashability conformance everywhere now. It's worth checking how do other languages cope with this, specifically those languages that use tree storage for their dictionaries.

Slava_Pestov · April 17, 2023, 5:32pm

This is a great idea. My hope is that one day, the derived conformances built-in to the compiler can actually become macros, with a little bit of hard-coded logic remaining inside the compiler to implicitly attach the macro to a type declaration in the cases where we derive the conformance today. At that point, we should investigate allowing a property wrapper or another macro to customize hashing, equality, comparison, codable and whatever else behaviors, perhaps by wrapping the field in a new type before performing the conformance operation.

taylorswift · April 17, 2023, 5:59pm

must everything be a macro? i think maybe 80% of

could be implemented with type packs and custom sort predicates? of course type packs aren't in the language yet, but neither are macros.

there are three branches, for less than, greater than, and equal to. so at least two tests are needed.

tera · April 17, 2023, 6:14pm

I remember seeing this great idea a while ago. With Distillable, Equatable and Comparable implementations could be as simple as memcmp (†), plus you are automatically getting a compatible EQ/Hash/Less implementation adhering essential invariants:

a.hash != b.hash         ---->    a != b
!(a < b) && !(b < a)     <--->    a == b

The only thing I don't like about Distillable if its essence is always a dynamic data structure (say, implemented as an Array). Maybe it's possible implementing essence as an enum with two cases, one case for fixed sized structures (in which case the essence would be static / lock free value, implemented as a tuple or a new fixed array type) and another case for dynamic sized structures, in which case it could be backed by a normal array.

† - well, almost. There could be gaps between fields due to alignment, so it's a bit more complicated.

taylorswift · April 17, 2023, 6:34pm

i can see a lot of value for Distillable in other situations, but i don't think it really helps with types like the Extension.Signature example, because that thing doesn't have a nice binary representation. (indeed, it would be marvelous if it did, because that would make it trivial to serialize it to BSON.)