Mystery: StringProtocol.compare<T>(...) -> ComparisonResult: sort works as expect, map give incorrect result

import Foundation

let original = ["Kofi", "Abena", "Peter", "Kweku", "Akosua", "abena", "bee", "ábenā"]
// the compare closure works for sorting
let sorted = original.sorted { $0.compare($1, options: [.diacriticInsensitive, .caseInsensitive]) == .orderedAscending }
// but here doesn't work:
// find the accending order true/false value of the array compare to string "ábenā"
// using the exact same closure as above
// but this is not returning the expected result comparing to the sorted array
let orderTrueFalse = sorted.map { $0.compare("ábenā", options: [.diacriticInsensitive, .caseInsensitive]) == .orderedAscending }

print("original", original)
print("sorted", sorted)
print("orderTrueFalse", orderTrueFalse)
// print:
// original ["Kofi", "Abena", "Peter", "Kweku", "Akosua", "abena", "bee", "ábenā"]
// sorted ["Abena", "abena", "ábenā", "Akosua", "bee", "Kofi", "Kweku", "Peter"]
// orderTrueFalse [false, false, false, false, false, false, false, false]
// why it's not [true, true, false, false, false, false, false, false]

Try this:

let order = sorted.map { $0.compare("ábenā", options: [.diacriticInsensitive, .caseInsensitive]) }
let orderIsAscending = order.map { $0 == .orderedAscending }
let orderIsSame = order.map { $0 == .orderedSame }
let orderIsDescending = order.map { $0 == .orderedDescending }
print("orderIsAscending", orderIsAscending)
print("orderIsSame", orderIsSame)
print("orderIsDescending", orderIsDescending)

They're ignoring diacritics and cases, so they compare equal instead of ascending.

Couldn't we just give the explanation here, which is that (with those options) "ábenā" is not bigger than (.orderedAscending relative to) anything in the list?

(Edit: OK, I see you added the explanation. :slight_smile: )

TBH, I wasn't even sure that's where the confusion is in the beginning. Could be that @young forget that comparison has three results. I do that a few times, hehe (I mean, what even is trichonomy?). Could also be that they forgot that diacritic & case insensitivity is set, or how they work.

At least the code compiles, and I can see what they got vs expect. So that's better than a majority of questions already.

I see. So how did the sorted() with the same compare(...) == orderedAscending compute the ordering? So the comparison is always false, then the order should not change:

let hm = original.sorted {_, _ in false }
print(hm)
// print ["Kofi", "Abena", "Peter", "Kweku", "Akosua", "abena", "bee", "ábenā"]

I wasn't even thinking and was not aware of what this does:

$0.compare($1, options: [.diacriticInsensitive, .caseInsensitive]) == .orderedAscending

I got this from some example of how to sort string with diacritics because I need to sort things this way. I simply plug the compare expression into my search function and didn't get the result I expected.

Anyway, the sorted() did come to some new order, how did it find the order, if the compare expression is always false?

There're a few steps here. Firstly, sort requires you to adhere to a few assumptions:

areInIncreasingOrder must be a strict weak ordering over the elements. That is, for any elements a , b , and c , the following conditions must hold:

  • areInIncreasingOrder(a, a) is always false . (Irreflexivity)
  • If areInIncreasingOrder(a, b) and areInIncreasingOrder(b, c) are both true , then areInIncreasingOrder(a, c) is also true . (Transitive comparability)
  • Two elements are incomparable if neither is ordered before the other according to the predicate. If a and b are incomparable, and b and c are incomparable, then a and c are also incomparable. (Transitive incomparability)

Just like hash(into:), violating these requirements is generally a bug, and the result is undefined behaviour. Luckily always return false adhere to these rules, so the result is (relatively) defined. sort ensures that if areInIncreasingOrder(a, b) returns true, then a will precede b in the sorted list.

Now, what does it mean when areInIncreasingOrder always returns false? If areInIncreasingOrder(a, b) and areInIncreasingOrder(b, a) are both false, then it means that in a sorted list, a can be placed before or after b.

Is it required that the sorted list is in the exact same order? Not generally. If you sort [(1, "a"), (1, "b")] by the first member, then [(1, "b"), (1, "a")] is a perfectly valid result, 1 and 1 are next to each other in the sorted list, after all.

What you need is called "stable" sorts, where equal elements appear in the same order that it appears in the original list. Unfortunately, Array.sorted is apparently not stable.

The sorting algorithm is not guaranteed to be stable. A stable sort preserves the relative order of elements for which areInIncreasingOrder does not establish an order.

So what exactly is that order that the result returns? Well, that's an implementation detail. If it uses quicksort, it would be in one particular order and another order in merge sort. You see, many algorithms just assume that you can swap equal elements without problem (which is a pretty good optimization).

So if you want to make sure that "equal" elements appear in the same order that they appear, you need to encode it into the array somehow. One way is to add offset into the array before sorting, and strip it off afterward:

let ok = original.enumerated().sorted { false || $0.offset < $1.offset }.map { $0.element }
print(ok) // ["Kofi", "Abena", "Peter", "Kweku", "Akosua", "abena", "bee", "ábenā"]

Specifically, your problem is here in your question.

In orderTrueFalse, you didn't compare array elements with each other. You compared array elements with "ábenā".

"ábenā" is less than or equal to everything in the array. So, it isn't greater than ( .orderedAscending relative to) anything in the array. So you always get false.

Ok. I think I cannot sort the way I did and do binary search on that. I need to keep two separate "sorted" arrays: one sorted the way I did for UI presentation and another sorted with "<" comparison for search.

Is this right?

That's one way to do it. TBH, it's quite a hassle to synchronize the two objects, so I don't usually use that approach.

You can also use one array that sort with diacritic & case insensitive options, then break a tie with the order of the original list. If I understand your UI requirements correctly, that would work with your UI (it definitely works with binary search, given you appropriately handle duplicates). This is similar to what I showed earlier:

let sorted = original.enumerated().sorted {
    switch $0.element.compare($1.element, options: [.diacriticInsensitive, .caseInsensitive]) {
    case .orderedAscending: return true
    case .orderedSame: return $0.offset < $1.offset
    case .orderedDescending: return false
    }
} .map { $0.element }
print(sorted)
// ["Abena", "abena", "ábenā", "Akosua", "bee", "Kofi", "Kweku", "Peter"]

Here, "Abena", "abena", "ábenā" are grouped together, ordered by their appearance in original.

Binary search can handle duplicates. If you want the search to return "Abena", "abena", "ábenā" when you look for AbEnA, you can search twice, once to find the first element, and again to find the last.

let searchTerm = ...
let firstIndex = sorted.binarySearch {
  $0.element.compare(searchTerm.element, options: [.diacriticInsensitive, .caseInsensitive]) != .orderAscending
}
let endIndex = sorted.binarySearch {
  $0.element.compare(searchTerm.element, options: [.diacriticInsensitive, .caseInsensitive]) == .orderDescending
}

return sorted[firstIndex..<lastIndex]

Depending on how binarySearch is defined, you may want to tweak the condition, but that's the idea.


Edit:

One thing I forgot to mention is that, having separate arrays as you said always works simply because you're maintaining separate structure for UIs and searches. Though if your search criteria is stricter than the UI criteria (text that are the same from the UI perspective is different from search perspective), or the converse is true (text that are the same from the search perspective is different from UI perspective), then you can use one array, as I mentioned above.

1 Like

There is no duplicate, entries are unique. So "Abena", "abena", "ábenā" are different entries, just need to present to user in sorted order of "[.diacriticInsensitive, .caseInsensitive]" . My binary search would work if I can come up with a compare predicate that tell the ordering given a target like "ábenā".

For my example after sorted:

["Abena", "abena", "ábenā", "Akosua", "bee", "Kofi", "Kweku", "Peter"]

Search for "ábenā"

I need a compare predicate the give me the isInAccendingOrder:

[true, true, false, false, false, false, false, false]

I'd like to first warn that it's very fishy that the UI treats "Abena" and "abena" as the same string, but the search treats them differently. It doesn't sound like the best design if the list screams I'm aware of the diacritics and handled it for you, then the search bar screams actually not.

However, if you insist, I can see two ways that you can deal with that.

  1. Use separate arrays as you suggested, one sorted with StringProtocol.compare for UI appearance* and another with String.< for searching.

  2. Break the StringProtocol.compare ties with String.< so that you can use the same array,

    let sorted = original.sorted {
      switch $0.compare($1, options: [.diacriticInsensitive, .caseInsensitive]) {
      case .orderedAscending: return true
      case .orderedSame: return $0 < $1
      case .orderedDescending: return false
      }
    }
    print(sorted)
    // ["Abena", "abena", "ábenā", "Akosua", "bee", "Kofi", "Kweku", "Peter"]
    

    You can then do the binary search using that closure as the predicate. This gets quite complicated, so using two arrays may be better in this scenario.

* Even then, you want to make sure to break the ties as sorted is unstable. If you sort it again the entire UI may need to be updated if the sort algorithm puts incomparable things in a different order, which could happen even if you just add/remove unrelated strings.

1 Like

:+1: Now I understand why

.sorted { $0.compare($1, options: [.diacriticInsensitive, .caseInsensitive]) == .orderedAscending }

is enough to sort the sample elements. But for search it doesn't work and needs the more "able" predicate.

Thanks!