String equality comparison complexity is not stated in the documentation. In particular I wonder about this typical scenario of comparing "same" values (when the underlying reference is presumably the same):
let foo: String = ....
let bar = foo
foo == bar // is this O(1) ?
is comparing foo and bar here constant time?
Does the same hold for other common types like Array, Dictionary, Set, Data, URL, UUID?
For collections like String, Array, Dictionary, Set, Data, etc., there may be a fast-path to check if the pointers to the underlying buffers are the same.
But if they're not, all the elements need to be checked in linear time, akin to zip(a, b).allSatisfy(==)
"may" word is worrisome here... I wonder if that means I have to implement a wrapper on top of common types to get guaranteed behaviour:
struct MyString: Equatable {
var string: String
var fingerprint: Int // e.g. a global auto incrementing counter
func == (a: Self, b: Self) -> Bool {
a.fingerprint == b.fingerprint || a.string == b.string
}
}
Or take somewhat easier path: check that current implementation is ok + in line with Hyrum's law depend on this behaviour + have unit tests that checks O(1) behaviour + start worrying at some future point but only when/if unit tests say so.
That's ASCII only optimization? Typical strings in my case are people names, text messages, etc, not ASCII unfortunately, apart from dictionary keys like "id", "name"..
I mean code more like this:
let x = test()
let y = test()
let z = x == y
where test() body is out of reach for the optimiser but returning the same value (and that value is not getting mutated or anything nasty like that, so the internal reference should be the same).
Even assuming they return the same value, the pointer to the backing storage is different so we'd have to walk the string to compare it. (That is ignoring compiler optimizations, if the implementation of test is known to the compiler and returns the same thing all the time it can optimize it away)
let strings: [String: String] = ["key" : file.data.string]
func test() -> String { // this function is invisible to optimizer
strings["key"]
}
// in another place:
let x = test()
let y = test()
let z = x == y
In this case the internal reference returned by "test()" is the same.
It's unspecified for Data, which calls out to memcmp, which makes no promises, as far as I can tell. Glibc doesn't appear to have a fast path for it, in its implementation
Of course, if it's not documented and you rely on it anyway, you're in Hyrum's law territory like you mentioned.
Perhaps the best path forward is to document these fast paths. I can't imagine an alternative implementation that would be crushed by a requirement to be constant-time for identical arrays.