Set Uniqueness of instances with the same hashValue

I stumbled on an unexpected behavior of Sets and Dictionaries while conforming to Hashable. How are Sets and Dictionaries expected to behave for values that have the same hashValue but aren't equal?

Given the ViewModel where we preserve uniqueness over id, but equality on all properties:

class ViewModel: Hashable {
    let id: String
    let name: String
    let price: String

    init(id: String, name: String, price: String) {
        self.id = id
        self.name = name
        self.price = price
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(id)
    }

    static func == (lhs: ViewModel, rhs: ViewModel) -> Bool {
        return lhs.id == rhs.id &&
            lhs.name == rhs.name &&
            lhs.price == rhs.price
    }
}

When running the following, I expect the Set to contain only 2 items, because the 3rd item has the same id as the 1st item. However, the test fails:

class ViewModelTests: XCTestCase {
    func testHashabilityIsMaintainedOnlyByID() {
        // Prepare
        var viewModels = Set<ViewModel>()

        let sut1 = ViewModel(id: "123", name: "Hello", price: "$10")
        let sut2 = ViewModel(id: "1234", name: "Hello", price: "$10")
        let sut3 = ViewModel(id: "123", name: "Hello World!", price: "$100")

        // Exercise
        viewModels.formUnion(Set([sut1, sut2, sut3]))

        // Verify
        XCTAssertEqual(sut1.hashValue, sut3.hashValue) // Succeeds with True
        XCTAssertEqual(viewModels.count, 2) // Fails with count = 3
    }
}

Do Sets and Dictionaries in Swift work on equality to decide if the Element gets added to the Collection rather than the hashValue?

I am also trying to figure out how DiffableDataSources behave given the above, because I am under the impression that it affects them. Say you have an item with a specific id, and you update it on an external service, changing its name. Later the change gets published downstream and you receive the updated item with the same id but a different name. You apply the list of items to the DiffCalculator. I would expect the DiffableDataSource to perform an update on that item only, because it is essentially the same item with changed content. Instead, it seems that the DiffableDataSource might delete the item and insert it again.

Hash value is just a heuristic to help with Set and Dictionary performance. It does not dictate the equality of two objects. == does.

When a hash collision occurs, i.e. you have two unequal (!=) objects with the same hash, adding or retrieving those objects might just be a little slower. You still have three unequal (!=) items.
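To make this concrete, here is a minimal sketch. The `Item` type is made up for illustration: it hashes only `id` but compares every property, like the ViewModel in the question.

```swift
// Hypothetical type: the hash uses only `id`, while == compares all properties.
struct Item: Hashable {
    let id: String
    let name: String

    func hash(into hasher: inout Hasher) {
        hasher.combine(id)
    }

    static func == (lhs: Item, rhs: Item) -> Bool {
        lhs.id == rhs.id && lhs.name == rhs.name
    }
}

let a = Item(id: "123", name: "Hello")
let b = Item(id: "123", name: "Hello World!") // same hash as `a`, but a != b
let c = Item(id: "123", name: "Hello")        // same hash as `a`, and a == c

let items = Set([a, b, c])

print(a.hashValue == b.hashValue) // true: the hashes collide
print(items.count)                // 2: `a` and `b` are both kept, `c` is a duplicate of `a`
print(items.contains(b))          // true
```

The colliding hash only decides where the Set looks first; == decides whether the element is already there.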


I don't know what the DiffableDataSources you're referring to are, so I can't help much in that regard.

What you're describing is much closer to the concept of Identifiable, where the data exposes a separate Hashable id. If DiffableDataSources support updating items by id, that may be closer to what the API wants of you.
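As a rough sketch of that idea, assuming a made-up `Product` type and a hypothetical `upsert` helper (neither comes from any framework), identity lives in `id` while equality still covers the whole value:

```swift
// Hypothetical model: `id` carries identity, Equatable compares the full content.
struct Product: Identifiable, Equatable {
    let id: String
    var name: String
    var price: String
}

// Hypothetical helper: replace the element with the same id, or append a new one.
func upsert<T: Identifiable>(_ item: T, into items: inout [T]) {
    if let index = items.firstIndex(where: { $0.id == item.id }) {
        items[index] = item // same identity, new content
    } else {
        items.append(item)
    }
}

var products = [Product(id: "123", name: "Hello", price: "$10")]
upsert(Product(id: "1234", name: "Hello", price: "$10"), into: &products)
upsert(Product(id: "123", name: "Hello World!", price: "$100"), into: &products)

print(products.count)   // 2
print(products[0].name) // Hello World!
```

The linear search is only there to keep the sketch short; a dictionary keyed by id would avoid it.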

Thank you very much @Lantua for the quick reply and explanation! It makes sense now on how things work underneath the hood.

As for the DiffableDataSource, I specifically meant the NSDiffableDataSourceSnapshot used with UICollectionViews and UITableViews. It uses Hashable instead of Identifiable, which is why I had some concerns about the behavior.
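The concern can be modelled with plain Set operations. This is only an approximation in pure Swift and not how UIKit computes its diff; the `Entry` type is made up for illustration.

```swift
// Hypothetical type mirroring the ViewModel: hash by `id`, equality on everything.
struct Entry: Hashable {
    let id: String
    let name: String

    func hash(into hasher: inout Hasher) {
        hasher.combine(id)
    }

    static func == (lhs: Entry, rhs: Entry) -> Bool {
        lhs.id == rhs.id && lhs.name == rhs.name
    }
}

let oldEntries: Set<Entry> = [Entry(id: "123", name: "Hello"),
                              Entry(id: "1234", name: "Hello")]
let newEntries: Set<Entry> = [Entry(id: "123", name: "Hello World!"),
                              Entry(id: "1234", name: "Hello")]

// Because the renamed entry is not == to the old one, it shows up on both sides.
let removed = oldEntries.subtracting(newEntries)
let inserted = newEntries.subtracting(oldEntries)

// Ids that appear as both removed and inserted are really content updates.
let updatedIDs = Set(removed.map(\.id)).intersection(inserted.map(\.id))

print(removed.map(\.name))  // ["Hello"]
print(inserted.map(\.name)) // ["Hello World!"]
print(updatedIDs)           // ["123"]
```

Under this model a renamed item looks like a removal plus an insertion unless the ids are compared separately.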

I suspect that what you actually want is a Dictionary<String, ViewModel> with ids as keys. This will give you the expected behaviour of merging view models by id.

let viewModels = Dictionary([sut1, sut2, sut3].map { ($0.id, $0) }, uniquingKeysWith: { $1 })
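For illustration, here is a self-contained sketch with a made-up `Row` type. Dictionary's `init(_:uniquingKeysWith:)` expects a sequence of key-value pairs, and the closure decides which value survives when two pairs share a key.

```swift
// Hypothetical stand-in for the ViewModel, reduced to two properties.
struct Row {
    let id: String
    let name: String
}

let rows = [
    Row(id: "123", name: "Hello"),
    Row(id: "1234", name: "Hello"),
    Row(id: "123", name: "Hello World!"),
]

// Build (key, value) pairs keyed by id.
let pairs = rows.map { ($0.id, $0) }

// The closure receives the value already stored and the incoming one.
let lastWins = Dictionary(pairs, uniquingKeysWith: { _, new in new })
let firstWins = Dictionary(pairs, uniquingKeysWith: { old, _ in old })

print(lastWins.count)          // 2
print(lastWins["123"]!.name)   // Hello World!
print(firstWins["123"]!.name)  // Hello
```

Returning the second argument, as `{ $1 }` does, keeps the most recent value for each id.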

This is perhaps a bug in the Swift overlay for the diffable data source APIs. They really should say Equatable instead of just Hashable. (It should be Equatable, not Identifiable, since the values are identifiers, they don't have identifiers.)

The whole thing makes sense in Obj-C, because both the section and item values must be objects in Obj-C (because they're in NSArrays), and Obj-C objects are always Equatable. It happens to work in Swift too, because pure Swift section/item values are going to get boxed into Obj-C objects, and so will acquire equatability in the process. (At least, that's what I think is going on here.)

The hashability is just a requirement that lets the values be stored in a hash table for performance reasons, nothing to do with identifying them (as @Lantua said).

Now that I know that the hashValue isn't what decides whether an element is stored in a Set or Dictionary, that would definitely be the solution to go for if I wanted to store by id only. However, I was just writing the unit test to ensure uniqueness over id, so that when the diff is later applied to NSDiffableDataSourceSnapshot the change is applied as an update instead of a delete followed by an insertion. Nevertheless, thank you for contributing to the thread!

Makes complete sense. Thank you for giving more insights about the problem!