Same values equality complexity guarantees

tera · November 9, 2022, 8:02pm

String equality comparison complexity is not stated in the documentation. In particular I wonder about this typical scenario of comparing "same" values (when the underlying reference is presumably the same):

let foo: String = ....
let bar = foo
foo == bar // is this O(1) ?

is comparing foo and bar here constant time?

Does the same hold for other common types like Array, Dictionary, Set, Data, URL, UUID?

AlexanderM · November 9, 2022, 8:18pm

For collections like String, Array, Dictionary, Set, Data, etc., there may be a fast-path to check if the pointers to the underlying buffers are the same.

But if they're not, all the elements need to be checked in linear time, akin to zip(a, b).allSatisfy(==)

David_Smith · November 9, 2022, 8:19pm

The String fast path for this was implemented in [stdlib] Test if two ascii string pointers are equal before memcmp by airspeedswift · Pull Request #10018 · apple/swift · GitHub

I haven't looked at Array/Dictionary/Set yet, it's worth double checking

Alejandro · November 9, 2022, 8:45pm

String has the "pointer equality" fast path you can see here:

// swiftc -emit-assembly 1.swift
@_silgen_name("test")
func test() -> String

let x = test()
let y = x
let z = x == y

The x == y call gets optimized away to a true even though the value of x is unknown at compile time.

tera · November 9, 2022, 9:45pm

"may" word is worrisome here... I wonder if that means I have to implement a wrapper on top of common types to get guaranteed behaviour:

struct MyString: Equatable {
    var string: String
    var fingerprint: Int // e.g. a global auto incrementing counter 
    func == (a: Self, b: Self) -> Bool {
        a.fingerprint == b.fingerprint || a.string == b.string
    }
}

Or take somewhat easier path: check that current implementation is ok + in line with Hyrum's law depend on this behaviour + have unit tests that checks O(1) behaviour + start worrying at some future point but only when/if unit tests say so.

That's ASCII only optimization? Typical strings in my case are people names, text messages, etc, not ASCII unfortunately, apart from dictionary keys like "id", "name"..

I mean code more like this:

let x = test()
let y = test()
let z = x == y

where test() body is out of reach for the optimiser but returning the same value (and that value is not getting mutated or anything nasty like that, so the internal reference should be the same).

Alejandro · November 9, 2022, 11:02pm

tera:

I mean code more like this:
let x = test()
let y = test()
let z = x == y
where test() body is out of reach for the optimiser but returning the same value (and that value is not getting mutated or anything nasty like that, so the internal reference should be the same).

Even assuming they return the same value, the pointer to the backing storage is different so we'd have to walk the string to compare it. (That is ignoring compiler optimizations, if the implementation of test is known to the compiler and returns the same thing all the time it can optimize it away)

tera · November 10, 2022, 12:13am

Why do you think this is the case? Example:

let strings: [String: String] = ["key" : file.data.string]

func test() -> String { // this function is invisible to optimizer
    strings["key"]
}

// in another place:

let x = test()
let y = test()
let z = x == y

In this case the internal reference returned by "test()" is the same.

Alejandro · November 10, 2022, 12:55am

Sorry, I was assuming that this test() function returned a newly allocated string every time.

AlexanderM · November 10, 2022, 3:46pm

Seems like most of them have fast paths:

String
Array
Set
Dictionary
It's unspecified for Data, which calls out to memcmp, which makes no promises, as far as I can tell. Glibc doesn't appear to have a fast path for it, in its implementation

Of course, if it's not documented and you rely on it anyway, you're in Hyrum's law territory like you mentioned.

Perhaps the best path forward is to document these fast paths. I can't imagine an alternative implementation that would be crushed by a requirement to be constant-time for identical arrays.

tera · November 10, 2022, 5:46pm

This is super useful, thank you very much!
+1 for documenting this behaviour.

David_Smith · November 10, 2022, 7:46pm

memcmp on Darwin does not check for pointer equality first either