Difference between identity and equality

AlexSedykh · May 3, 2022, 11:09am

While learning SwiftUI, I noticed how strongly the framework relies on the Equatable protocol in a way that is not at all obvious to me. Understanding what was the matter, I discovered that I had previously misunderstood Equatable. It turns out I've always misunderstood the equality and the value identity. It seems to me that this confusion is partly caused by value programming, and sometimes it is difficult to understand whether two values are equal or identical, and if they are identical, are they equal? For example, if we take 5 and 5, are we talking about two fives or one? I was once answered on the forum that if we conform a structure to the Hashable protocol , then this conformation should follow from the equality of the values, that is, if two values are equal, then they should have the same hash. That is, first we conform to the Equatable protocol, and then Hashable, and in that order and not vice versa. I have a more philosophical question. Is it semantically correct if two values have different equality, but the same hash, to use values of this type in sets?

struct Ship: Equatable, Hashable {
    let name: String
    let cargo: Int
    static func == (lhs: Ship, rhs: Ship) -> Bool {
        lhs.cargo == rhs.cargo
    }
    func hash(into hasher: inout Hasher) {
        hasher.combine(name)
    }
}

lukasa · May 3, 2022, 11:49am

In general yes, that's fine. You'll get some odd behaviours and poor performance, but nothing much else is wrong.

However, your example is not acceptable. The short rule here is that if two objects are equal (that is, if x == y is true) then they must have the same hash. However, two objects may have the same hash but be non-equal. This is not true above, at least not by the definition of the type: it's entirely possible for two equal objects to have different hash values.

More broadly, to your question of identity, I will say this: a key identifier that a type is a value type is that it does not have identity. Your discussion of 5 reveals this perfectly. It does not make any sense to talk about the difference between one 5 and another 5 in Swift: there is no observable difference. In that sense, Int is a value type: the only thing that matters is its value, not its identity.

Types have identity in cases where it is possible to identify specifically which object we're talking about. Classes are an easy one here: they have identity by virtue of their memory location, which is unchanging. We can therefore easily talk about which instance of a class we mean by referring to its pointer location.

As a general rule, if a type has both identity and defined equality then a value that is identical to another must be equal to it. Put another way, a == a must be true.

Martin · May 3, 2022, 12:02pm

In addition to what @lukasa said: Your implementation will quickly lead to a runtime crash:

var set = Set<Ship>()
set.insert(Ship(name: "A", cargo: 1))
set.insert(Ship(name: "B", cargo: 1))

Fatal error: Duplicate elements of type 'Ship' were found in a Set.
This usually means either that the type violates Hashable's requirements, or
that members of such a set were mutated after insertion.

(Due to the hash randomization that program might not always crash, you may need to run it multiple times to reproduce the problem.)

tem · May 3, 2022, 12:15pm

Interesting! Running this in a playground I get 1 of 3 different results on each run:

Insertion succeeds.
Insertion fails without error.
Insertion fails with fatal error.

I would expect it to be consistent, and I don't understand what you mean by hash randomization. Does Set do certain checks only when the hash value has some property (e.g. it's negative, or an even number...), and if so, why?

Martin · May 3, 2022, 12:18pm

Since Swift 4.2, the standard hash function uses a per-execution random seed, so that generated hash values are be different in each execution of a Swift program. See swift-evolution/0206-hashable-enhancements.md at main · apple/swift-evolution · GitHub for the details.

tem · May 3, 2022, 12:25pm

I can see in the playground that the hash values are different on each run, but I don't see why that should affect how Set handles these violations. I read the link but that didn't clear it up for me. Maybe I'm just being dense, but I assume it's something more fundamental with hashing, like the "dispersion quality"? Or the "open addressing with linear probing", which, I'll be honest, I have no idea what that means. Anyway, if the hashing is done right it's nothing to worry about, right?

xwu · May 3, 2022, 1:38pm

Correct. There is no guaranteed behavior when the Hashable conformance is broken, not even a guaranteed way that Set will behave weirdly.

scanon · May 3, 2022, 3:16pm

This will necessarily happen for any type larger than the size of the hash key, because of the pigeonhole principle. It is unavoidable. There's nothing wrong with it.

This, however, is not that, and is a serious logic error:

struct Ship: Equatable, Hashable {
    let name: String
    let cargo: Int
    static func == (lhs: Ship, rhs: Ship) -> Bool {
        lhs.cargo == rhs.cargo
    }
    func hash(into hasher: inout Hasher) {
        hasher.combine(name)
    }
}

Specifically, while it's OK (and unavoidable) for two unequal values to have the same hash, it is a violation of the protocol semantics for two values that compare equal to have different hashes. In the example you give, two Ships with the same cargo but different names will be in this situation, and it will result in undefined behavior when using generic algorithms bound to Hashable.