Let Optional, Dictionary and Array conditionally conform to Hashable

I think I understand what you're getting at -- the name calls out the specific algorithm, and we lose that. On the other hand, there is value in setting up common expectations on what equality means for two collection instances.

Do you have an example for an ordered collection where elementsEqual would not be an suitable definition for equality?

It is--and that's useful--but there's no reason why it couldn't be spelled as heterogeneous == if we did not intend to distinguish equality of elements from equality of the collection.

More generally, is there a good reason not to just define that Equatable for collections means that they have the "same" contents?

This may well be (yet another) case where dynamic and static dispatch should differ. In a context where the constraint is “collection of Equatable elements,” set equality should care about element order, but if the constraint is “Equatable,” it should not.

Yeah, it’s ambiguous and unclear, IMO, as evidenced by the fact you had to quote “same”

1 Like

But it does at least clarify what "Equality implies substitutability" should mean for collections. (For example, it does not imply that equal collections would share the same indices.)

Collection types in general may not agree what exactly "same contents" means, but I think ordered collections do.

That's a good point—we've already answered that b and c are "equal" here, since we've defined == for ArraySlice since the beginning:

let a = [1, 2, 3, 4, 1, 2, 3, 4]
let b = a.prefix(4)
let c = a.suffix(4)
b == c // true

The way equality works for Set (equal elements, regardless of iteration order) says to me that "substitutable" can depend on the primary intended use of the type. I agree with @lorentey that slicing a set turns it into something different, where the order does matter. It's not unrelated that when you do this you also lose all the set operations.

2 Likes

Is there any concern about having potentially large collections conform to Hashable? This could be introducing a hidden performance issue that could bite people…

You don’t have to combine the hash value of every element in a collection to make a valid hash value for the overall collection. The implementation could only hash the first N elements and the last N elements, for instance. (There’s obviously more sophisticated methods to use than this illustrative example).

2 Likes

How would that work for unordered collections?

1 Like

Ah yea, it breaks down for Set and Dictionary. How do other languages handle this? You could return the count deterministically I guess, but that isn’t exactly great at avoiding collisions.

I think hashValue performance on large collections is less of an issue than it appears. The performance of calculating the hash value will be comparable to that of equality checks between equal Set and Dictionary values. (Hash tables will do one of those anyway for every successful lookup.)

Large collections are simply not appropriate for use in Set and as Dictionary keys.

1 Like

While I'd support something that leads to the parenthesized interpretation, I don't agree that “Equatable means ‘same’ contents” clarifies anything. If you'd said that “Equatable for a collection can only be based on the values of its elements,” that would begin to do that job.

“Equality implies substitutability” in our documentation is currently based on an informal notion of substitutability. Until you introduce some notion of essential-vs-inessential attributes, it's not something you can really pin down, and when you do, you find that “equality implies substitutability” is a bit circular: for a well-defined type, equal instances are substitutable insofar as operations only look at its essential parts, and its essential parts are what is compared by the equality operator.

While I'd agree that it's almost always a Bad Idea™ to define a collection type that includes essential parts outside of its elements, I'm not 100% comfortable (yet) with trying to make that a formal restriction.

I'm not sure anybody else will care, but I object to the term "unordered collection." All collections are inherently ordered, since they are multipass: you are guaranteed to see the same elements in the same order when you traverse them. I think what you're trying to capture is that the order of some collections, such as Set<T>, is not an essential attribute.

3 Likes

Indeed, that is a much clearer way to express what I was trying to say.

Okay! I think we should, however, apply this loose guideline on the concrete case of Slice<Base>. I.e.,

  • There are no essential parts of a Slice outside of the value and order of its elements. In particular, startIndex, endIndex or the identity of the base collection aren't essential attributes of a Slice.
  • The order of its elements is an essential attribute of a Slice, even that isn't the case for its base collection.

Therefore, I argue that the natural way to define Equatable over Slices is to use lexicographical equality. (Matching the definition we already have for == for ArraySlice.)

It is a misconception that order only starts to matter when you slice the Set.

extension Collection 
    where Element == Int // implies Self : Hashable
{
    func orderDependentComputation(inout cache: [Self : Int]) -> Int {
        if let r = cache[self] { return r }
        let r = orderDependentComputationImpl()
        resultCache[self] = r
        return r
   }
}

When you call this method on a Set, the equality operation used to look the Set up in cache had better be an order-dependent equality, or the code breaks.

I'm having trouble making sense of that assertion. There are important computations where the most efficient algorithm operates on sets-of-sets. When the inner sets are “large,” it's not like there's a better known answer.

+1 of course. Was that even in question?

1 Like

D'oh, you're right, that was way too general a statement. What I actually meant was

Large Set and Dictionary values are usually not appropriate for use in Set and as Dictionary keys.

Arguably, even that is an overstatement. The cost of a positive lookup in Set<Set<Foo>> is proportional to the size of the contained member, and we can't fix that without unacceptably slowing down important operations like Set.insert. But I'm sure some applications are happy with that cost.

It's certainly possible to store huge sets in Sets with O(1) lookups: we just need to define a custom collection that is optimized specifically for constant-time hashValue and ==. (One easy way is to define == probabilistically in terms of a multi-word, high-quality hash that is cached and incrementally updated on each mutation.)

So it appears this didn't make it into Swift 4.1 :( Any struct with optional properties does not get Hashable synthesized...this seems like a huge oversight and totally defeats the usefulness IMO

1 Like

It's on the master branch as of ≈yesterday:

https://github.com/apple/swift/pull/15382
https://github.com/apple/swift-evolution/pull/808

2 Likes