I was just looking at the implementation of AnyHashable, and I see it is split between the files AnyHashable.swift and AnyHashableSupport.cpp. So I’m wondering what shortcomings of a pure-Swift solution led to the decision to write part of it in C++.
For reference, a short, simple, pure-Swift implementation is possible (I just wrote one in a playground), but I don’t know how it stacks up efficiency-wise. Are there missing features which, if added, would make a pure-Swift version of AnyHashable competitive?
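For concreteness, the playground version was roughly along these lines (a simplified sketch; the names are mine, not the standard library's):
struct MyAnyHashable: Hashable {
    private let _base: Any
    private let _hashInto: (inout Hasher) -> Void
    private let _isEqual: (MyAnyHashable) -> Bool

    init<H: Hashable>(_ base: H) {
        _base = base
        _hashInto = { hasher in hasher.combine(base) }
        // Equal only if the other wrapper's base can be cast back to H.
        _isEqual = { other in (other._base as? H) == base }
    }

    var base: Any { _base }

    static func == (lhs: MyAnyHashable, rhs: MyAnyHashable) -> Bool {
        // Try the cast in both directions.
        lhs._isEqual(rhs) || rhs._isEqual(lhs)
    }

    func hash(into hasher: inout Hasher) { _hashInto(&hasher) }
}
It hashes by forwarding to the wrapped value and compares by attempting a cast in each direction, which is exactly the strategy that turns out to be insufficient, as described below.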
Okay, I’ve answered my own question, and discovered some unexpected behaviors in the process.
First, the answer: it is not possible to write an AnyHashable implementation in Swift which can compare for equality two instances of different subclasses whose Equatable conformance is provided by their superclass.
My attempt at implementing it checks whether at least one of the bases can be cast to the other's type, but for sibling classes in a cluster neither cast can succeed. Writing the implementation in pure Swift would require some way to find the common supertype of two types that actually provides the protocol conformance, which is exactly what the C++ part of AnyHashable does.
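A concrete illustration of the failure (the class names are hypothetical, and the last line relies on the canonicalization behavior described above):
class Base: Hashable {
    let id: Int
    init(id: Int) { self.id = id }
    static func == (lhs: Base, rhs: Base) -> Bool { lhs.id == rhs.id }
    func hash(into hasher: inout Hasher) { hasher.combine(id) }
}
// Both subclasses inherit their Equatable/Hashable conformance from Base.
final class SubA: Base {}
final class SubB: Base {}

let x = SubA(id: 1)
let y = SubB(id: 1)

// A pure-Swift wrapper only sees the bases as Any and can try casting each to
// the other's static type; for sibling subclasses both casts fail:
print((y as Any) is SubA)                // false
print((x as Any) is SubB)                // false

// The standard library's AnyHashable canonicalizes to the superclass that
// actually provides the Hashable conformance (that is what the C++ support
// does), so it can compare these through Base's ==:
print(AnyHashable(x) == AnyHashable(y))  // true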
• • •
The first unexpected behavior I found is that AnyHashable does not always preserve the hashValue of its wrapped instance. In particular, for floating-point values representing an integer in the range -0x1p63..<0x1p64, the hashValue of the wrapper is different from that of the wrapped value:
let x = 1.0
let y = AnyHashable(x)
x.hashValue == y.hashValue // false
I filed a bug report (SR-12091) for this, although it may be intended behavior.
• • •
Second, AnyHashable wrappers around values of different types which store the same integer compare as equal:
let a = AnyHashable(1)
let b = AnyHashable(1.0)
let c = AnyHashable(1 as UInt8)
(a == b, a == c, b == c) // (true, true, true)
type(of: a.base) == type(of: b.base) // false
type(of: a.base) == type(of: c.base) // false
type(of: b.base) == type(of: c.base) // false
I think this has come up before, so you may be able to find posts about the history of AnyHashable. But basically, it was a hack introduced in Swift 3 (with SE-0131) to better bridge NSDictionary values into Swift, since the keys of that type can't otherwise be expressed. It's unfortunate it's lasted this long, as it's rather painful to deal with. I don't think it's meant to be used in Swift code that isn't bridging to Obj-C.
This is the intended behavior. See here and here for the implementation and explanations. (It is also alluded to here by @lorentey in explaining the implementation of Hashable enhancements. See also this earlier discussion which lays out the problem in more detail.)
We have had several Swift Evolution proposals dealing with NSNumber behavior in Swift; this is not some temporary hack but part of the final design, and is now consistent across all platforms. A previous bridging behavior (SE-0139) was ultimately deemed suboptimal and abandoned with SE-0170; notably, this bridging design was adopted well after AnyHashable was added in SE-0131. In fact, in motivating SE-0170, the authors write:
No matter if you are using Objective-C in your app/framework in addition to Swift or not the behavior should be easily understood and consistent.
I'm not aware of any guarantee that AnyHashable preserves the hashValue of its wrapped instance. In fact, it seems that the point of the _HasCustomAnyHashableRepresentation protocol is to permit otherwise.
The example given in the documentation for AnyHashable (written for SE-0131) is now outdated; it was actually corrected for the next version of Swift in PR #21550. In that conversation, @Joe_Groff explains:
AnyHashable should behave this way for any types that are transitively as?-castable, so you can get a String out as an NSString and v.v., an Array<T> as an NSArray, and so on. The standard library integer and floating-point types as well as CGFloat and Decimal all bridge to NSNumber, and NSNumber bridges back to any of those types if the value is exactly representable.
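For example (an illustration only; the values are arbitrary, and the Foundation/Obj-C bridge on Darwin is assumed):
import Foundation

let number = AnyHashable(42)                  // wraps an Int
print(number.base as? Double)                 // Optional(42.0) -- bridged through NSNumber
print(number.base as? UInt8)                  // Optional(42)
print(AnyHashable(1.5).base as? Int)          // nil -- 1.5 isn't exactly representable as an Int
print(AnyHashable("hi").base as? NSString)    // Optional(hi)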
/// The `AnyHashable` type forwards equality comparisons and hashing operations
/// to an underlying hashable value, hiding its specific underlying type.
That strongly implies, to me at least, that the hashing operation will be forwarded to the underlying hashable value, and thus return its hash value without modification.
/// Two instances of `AnyHashable` compare as equal if and only if the
/// underlying types have the same conformance to the `Equatable` protocol
/// and the underlying values compare as equal.
It is evident from attempting to write (1 as Int) == (1 as Double) that the types Int and Double do not have the same conformance to Equatable, since that raises a compiler error.
Yes, this is misleading. The documentation you quote from was last touched four years ago, and the edge cases you and I have enumerated above were only fixed two years after that. See also the corresponding bug.
That pull request I linked to in my previous post was merged only three months ago; it corrected a glaringly out-of-date documentation example originally written when AnyHashable was first added to the standard library, but did not update the rest of the documentation.
To be fair, the type does forward the hashing operation to an underlying value (_box._canonicalBox), just not the one you're probably thinking of (_box._base). I think it would be fair to file a bug against the outdated documentation.
The general rule is that Swift never guarantees that distinct types will generate matching hash values (e.g., see String vs NSString, Int vs Int8, T vs Optional<T>, etc.). This applies to conversions to/from AnyHashable, too.
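A few examples of comparisons that are not guaranteed by the language (the printed results may differ across platforms or change between Swift releases):
import Foundation

let s = "café"
print(s.hashValue == (s as NSString).hashValue)       // not guaranteed to match
print((1 as Int).hashValue == (1 as Int8).hashValue)  // not guaranteed to match
print((1 as Int).hashValue == (1 as Int?).hashValue)  // not guaranteed to match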
AnyHashable could never guarantee to preserve hash values, because it has always ensured that Objective-C and Swift counterparts of the same bridged value compare the same when converted to AnyHashable -- even if the counterparts use different hashing algorithms. (It just didn't do a particularly good job of this before #17396; and some monsters may still be lurking in its murky depths.)
SE-0131, SE-0143, and SE-0170 certainly complicated things, but they just added to a preexisting pile of sadness. For example, we always needed String and NSString to have a consistent definition of equality+hashing under AnyHashable, even though they don't even agree on what equality means:
import Foundation

let s1 = "café"
let s2 = "cafe\u{301}"
let n1 = "café" as NSString
let n2 = "cafe\u{301}" as NSString
// String does Unicode normalization before its comparisons/hashing:
print(s1 == s2) // ⟹ true
print(s1.hashValue == s2.hashValue) // ⟹ true
// NSString doesn't:
print(n1 == n2) // ⟹ false
print(n1.hashValue == n2.hashValue) // ⟹ false
// However, we want all four of these strings to compare equal (and therefore,
// hash the same) under AnyHashable:
let set: Set<AnyHashable> = [s1, s2, n1, n2]
print(set.count) // ⟹ 1
print((s1 as AnyHashable) == (s2 as AnyHashable)) // ⟹ true
print((s2 as AnyHashable) == (n1 as AnyHashable)) // ⟹ true
print((n1 as AnyHashable) == (n2 as AnyHashable)) // ⟹ true
// Therefore, AnyHashable must not be using NSString's definitions
// for equality and hashing.
There may be other cases, including some we may not be handling correctly yet. (For example, I know downcasting from AnyHashable does not always work the way it should.)
Note that AnyHashable's hash encodings aren't considered part of its ABI -- we may need to change them between any two releases. (Besides fixing bugs like SR-9047, "Optional needs to have a custom AnyHashable representation" (apple/swift#51550), we may want to start using the value's underlying (canonical) type as a hash discriminator in the future, in which case AnyHashable would stop preserving hash values altogether.)