A hashability problem

But what about existential types? They might be treated as a single type, but be different concrete types.

protocol Opaque {}
extension Opaque: Hashable { //implement custom hashing }
struct Ghost: Hashable {}

So if some code sees it as Opaque type that it uses a different hashing scheme.

func useVirtual <T: Opaque>(_ arg: T) { ... } //use existential' hashing
func useReal (_ arg: Ghost) { ... } //uses concrete' hashing

I thought about something like this

//by default any type that conforms to hashable and only 
//feeds some values into hasher gets Int as a hash representing value.
//otherwise, this proto can be a customization point
protocol Hashable {
   associatedtype Hash: HasTypeRangeDescriptor = Int
   var hashValue: Hash { get }
}
protocol HasTypeRangeDescriptor {
    associatedtype Amount: UnsignedInteger
    static var numberOfUniqueValues: Amount { get }
}

Neither existentials are.

Isn't it still fine for Hash to be Equatable in that case still? You also seem to want multiple hashing functions for a single type, which is a separate matter. Though I find it... weird, to say the least.

Note that you can usually work with the resulting Int, e.g., compressing it into UInt8, converting it to UInt, etc.

If you need more entropy/bits than the native Int, chances are the provided hashing algorithm is ill-suited anyway, and you'd need to create your own CustomHashable protocol.

A lot just happened. I was thinking at first about something like this:

protocol KitchenItem {
	var name: String { get }
}
extension KitchenItem: Hashable {
	func hash(into hasher: inout Hasher) {
		hasher.combine(self.name)
	}
}

struct Knife: KitchenItem, Hashable { 
    //assume that we live in perfect world where
    //where it is possible to have more than one customization point
	var name: String
	var isFrightening: Bool = true
	func hash(into hasher: inout Hasher) {
		hasher.combine(self.name)
		hasher.combine(self.isFrightening)
	}
}
struct Pan: KitchenItem {
	var name: String
}

That felt fine to me at first: any type viewed as kitchen item should have its own implementation of hashability and every other descendant type may opt to supplement its own hashability scheme; until I realized that this would be actually impossible because in this hypothetical case notion of Hashable becomes invariant.

var diverseToolset = Set<KitchenItem>([Knife(name: "Fierce Chopper"), Pan(name: "PAN-9000")])
func doTrick <C: Collection>(with collection: C) where C.Element: Hashable { 
	let _ = collection.allSatisfy { item in
		collection.allSatisfy { $0.hashValue == item.hashValue }
	}
}
doTrick (with: diverseToolset)

The code above would be broken because the hash values are not tied to a single identity (Pan uses default hashing from KitchenItem, while Knife uses its own hashing). And if in swift more powerful customization facility would ever be considered, something has to be done about variance.
Although, the code above would be perfectly fine, if it have had been explicitly set to account for such case, in other words, if it was set to use always Hashable implementation from KitchenItem, the world would be a better place.

var diverseToolset = Set<KitchenItem>([Knife(name: "Fierce Chopper"), Pan(name: "PAN-9000")])
func doTrick <C: Collection>(with collection: C) 
where C.Element: Hashable..? { 
//this code uses recent, most common witness from protocol hierarchy
//ignoring any overrides 
	collection.allSatisfy { item in
		collection.allSatisfy { $0 == item }
	} ? print("phew!") : print("oops...")
//now it is okay
}
doTrick (with: diverseToolset)

Thoughts?


Btw. I have this annotation in mind:

var set = Set<Hashable..?>([Pan(...), Knife(...)]) 
//uses most common witness from hierarchy
//(from KitchenItem)
set.insert("Fake blender") //compilation error 
//String does not share common witness with ... <Pick some common type here>
//KitchenItem will do fine

var set2 = Set<?..Hashable>([Pan(...), Pan(...), Pan(...)]) 
//most current type within this hierarchy can only be found within 
//concrete type, which is Pan in this case
set2.insert(Knife(...)) //error 
//Knife most recent witness for Hashable 
//is different from Pan' most recent witness for Hashable

So we can basically get Java (with a LOT of runtime dispatch and inheritance graph resolutions), and as everybody knows, java killed its type inference. Would be fine to see, what direction the core team will choose. I'd prefer becoming java though.

To reiterate

Summary
protocol Demiurge {
	func demonstrateGodlyPower ()
}
extension Demiurge {
	func demonstrateGodlyPower () {
		print ("...")
	}
}
protocol HigherGod: Demiurge {}
extension HigherGod {
	func demonstrateGodlyPower () {
		print ("...thunderstorm growls..")
	}
}
protocol MatterTransformingGod: HigherGod {}
extension MatterTransformingGod {
	func demonstrateGodlyPower () {
		print ("...turning X into Y...")
	}
}
struct Midas: MatterTransformingGod {
	func demonstrateGodlyPower () {
		print ("...turning mud into gold...")
	}
}
//invariant
func invoke <T: Demiurge>(_ target: T) { target.demonstrateGodlyPower() }
invoke (Midas()) //...turning mud into gold...
invoke (Midas() as Demiurge) //...turning mud into gold...
invoke (Midas() as HigherGod) //...turning mud into gold...
invoke (Midas() as MatterTransformingGod) //...turning mud into gold...

//higher bound
func invoke <T: ?..Demiurge>(_ target: T) {}
invoke (Midas()) //...turning mud into gold...
invoke (Midas() as Demiurge) //...
invoke (Midas() as HigherGod) //...thunderstorm growls...
invoke (Midas() as MatterTransformingGod) //...turning X into Y...

//lower bound
func invoke <T: Demiurge..?>(_ target: T) {}
invoke (Midas()) //...
invoke (Midas() as Demiurge) //...
invoke (Midas() as HigherGod) //...
invoke (Midas() as MatterTransformingGod) //... 

Your notion of equality is missing. Do you regard two items with the same name to be the same item, even though they may not even have different type?

If you rely on name to be the sole indicator of equality, saying that two objects are the same iff they have the same name, you can just use name:

var tools = [String: KitchenItem]
tools[item.name] = item

Or you can wrap it:

struct AnyKitchenItem: Hashable {
  var value: KitchenItem

  public var hashValue: Int { value.name.hashValue }
  public static func ==(lhs: Self, rhs: Self) -> Bool {
    lhs.value.name == rhs.value.name
  }
}

OTOH, if you want objects of the same type to figure out themselves, you can use AnyHashable.

struct AnyHashableKitchenItem: Hashable {
  private var storage: AnyHashable
  init<Value: KitchenItem & Hashable>(_ value: Value) { storage = .init(value) }

  var value: KitchenItem { storage.base as! KitchenItem }
}

What problem do you want to address with this?

Making the hash function the responsibility of individual hashable types isn't a good idea -- this is why we implemented SE-0206. If you want to configure the type of hash values, a better way of doing this would be to create a hasher protocol and make hash(into:) generic over it. You could then feed any hashable value to any hash function, and you would be able to define a custom hasher that returned an Int16 or Float or String or whatever hash value type you want.

This is covered in SE-0206's Alternatives Considered section. In a nutshell, while this would be indeed neat, it would generally come with a stiff performance penalty that would slow down Set and Dictionary only to enable an extremely niche use case. Thus, the Standard Library's Hashable only supports its own built-in hash function -- but this doesn't mean you cannot build your own Hashable protocol variant in a standalone package! (This is in fact how the Hasher concept was initially implemented -- on top of the really bad original version of Hashable that required types to roll their own hash functions.)

3 Likes

I want some of my types to have a stable hash value across launches

Kinda true, but is 'Hashable.hashValue' is deprecated as a protocol requirement; conform type '_' to 'Hashable' by implementing 'hash(into:)' instead a problem?

Both problems can probably be addressed by the design of your CustomHashable. Swift designs the hashing system by having Hashable only supplies what is needed for computation, and Hasher decides which hashing algorithm to use. It's probably a good start for designing custom hash functionality.

Also, if you create a new CustomHashable, it'd suggest use different name from hashValue. It's quite dangerous to have the same function signature for something that are purposely different.

protocol CustomHashable {
  func hash(into: inout CustomHasher)
}
struct CustomHasher {
  ...
  func consume(_ value: UnsafeBufferPointer) { ... }
  func finalize() -> UInt32 { /* Computes stable hash */ }
}
1 Like

Uhm, the hashValue property is a requirement of Hashable and conformer can override it as necessary. It is apparent here

protocol CustomHashProvider: Hashable {}
extension CustomHashProvider {
    var hashValue: Int {
        Int.random(in: 0..<100) //stub
    }
}

Also, I want my custom protocol to have ability to be used wherever regular Hashable is usable. Being otherwise defeats the purpose of oo.

Using the hashValue var is deprecated, is only there for backward-compatibility, and is discouraged from use. At some point, it will be removed from the standard library types. You might want to look at the Apple documentation on the Hashable protocol, https://developer.apple.com/documentation/swift/hashable

2 Likes

I have trouble figuring out your requirement. You want a CustomHashable which:

  1. Provides a default implementation for Hashable to the conformer (type), and allows the conformer to use its own implementation to satisfy Hashable requirements,
  2. Normal Hashable usage prefers the conformer implementation,
  3. In some specific scenarios, forcibly use CustomHashable implementation of Hashable,
  4. Allow Set to use CustomHashable implementations.
  5. Supports type heterogeneity, i.e., allow different types, like Knife and Pan, to be in the same Set.

?

1 Like

Yes to all. Do you see that there is something wrong?

#1 and #2 are orthogonal to #3, #4, and #5.

In #3, #4, and #5, you're trying to differentiate them from a subset of their properties that CustomHashable can see. It sounds a lot more like Identifiable than Hashable, and that pattern probably works better (Identifiable has associated type, so we won't be using it):

protocol KitchenItem {
    var name: String { get }
}

private struct ID: Hashable {
    var name: String
}
extension KitchenItem {
    var id: some Hashable {
        ID(name: name) // Or just use `name` if there's one field.
    }
}

struct IdentifiedKitchenItem: Hashable {
    var item: KitchenItem
    
    public func hash(into hasher: inout Hasher) { item.id.hash(into: &hasher) }
    public static func ==(lhs: Self, rhs: Self) -> Bool { lhs.item.id == rhs.item.id }
}

which would run into the question I asked earlier:

and I'll just assume the answer is yes.


For #1 and #2, well, it's an extremely risky business to provide Hashable implementation for types that you only partially know. If you insist, however, you can use the good'ol trick:

public extension KitchenItem where Self: Hashable {
    func hash(into hasher: inout Hasher) { id.hash(into: &hasher) }
    static func ==(lhs: Self, rhs: Self) -> Bool { lhs.id == rhs.id }
}

This would be preferred over synthesized Hashable implementation (red flag!), and it would use only id to differentiate between two instances (red flag!).

They shouldn't be. Here is another bit of my old examples with made-up syntax from above replies.

let mainStageTonight = 
	Set<MusicPerformer & Comparable..?>([FreddieMerqurie(), JimmyHendrix()]) 
//can store any musician whose performance quality can be compared
protocol MusicPerformer: Comparable {
	var crowdExcitement: Int { Int.random(in: 0..<9) }
	func singASong() { print("...doing some performance...") }
	func < (lhs: MusicPerformer, rhs: MusicPerformer) -> Bool {
		if lhs.crowdExcitement < rhs.crowdExcitement { return true }
		if lhs.crowdExcitement > rhs.crowdExcitement { return false }
	}
}
struct FreddieMerqurie: MusicPerformer, Comparable { 
	func singASong() { print("You have been rocked!") }
	let crowdExcitement: Int
    let wasOnDrugs: Bool
    func < (...) ...
}
struct JimmyHendrix: MusicPerformer, Comparable {
	func singASong() { print("Such powerful riffs!") }
	let crowdExcitement = 10
    let whatever: ...
    func < (...) ... 
}

//this means that even though FreddieMerqurie and JimmyHendrix are
//different things, they still can be compared on how they do music
//while FreddieMerqurie's and JimmyHendrix's can have different measurements
//when comparing to themselves

//but there is a critical difference:
//while these musicians should be compared with identical standards
//(should be compared as MusicPerformer s)
//they also should expose unique singing

for performer in mainStageTonight { performer.singASong() } //You have been rocked etc
//whist
mainStageTonight.max() //shoud compare them as MusicPerformer s

Don't you agree that this is a sound semantic?

So this is not acceptable?

let mainStageTonight: Set<ComparableMusicPerformer> = [
  .init(FreddieMerqurie()),
  .init(JimmyHendrix()),
]

struct ComparableMusicPerformer: Hashable, Comparable {
  private var storage: MusicPerformer

  statuc func ==(lhs: Self, rhs: Self) -> Bool { ... }
  ...
}

Well, uhm, kinda, buuuut.... adapters, man! :sob:. They are boilerplate and does not compose well, for every combination of some traits, codebase would be bloated. I was baited by Protocol-oriented programming, sigh.

To quote @jrose:

which should also apply equally as well here.

Ugh, that's obviously a poor man' solution, for an expression problem.

I admit that I've not been able to follow exactly what you want to accomplish, but it seems that you want hashability to mean different things at different times for the same entities, all using the same protocol.

Protocols are not intended to be bags of syntax. If you need different ways of hashing the same type, you need a separate protocol per semantic.

Besides for the technical issues, I'm really struggling to understand why you'd want to write such impenetrable code in the first place.

Not for same entities: I want to have the ability to express different hashing for a concrete type and its existential type. For ex

protocol A: Hashable { //default impl here }
struct Beta: A, Hashable { //provide different hashing  }
//now code that wants Beta uses its hashability, 
//whilst code that expect A type, uses A's hashbility
//a bit simplified version of what I want, but correct

What do you mean by bags of syntax?

Depends on the case

Check this or this. It is true that semantic of the code might be covered from sight, but it can be mitigated with various assistance from IDE.

And why exactly would you ever want that? What specific problem requires you to have stable hash values?

Such a desire is fundamentally incompatible with the safety goals of the general-purpose hashing collection types in the Standard Library. I strongly recommend that you reconsider your requirements.

(Note: if you use a Hashable type that only implements hashValue in a Set or Dictionary, then the collection will still pass the returned integer through a Hasher to break up the overlong hash chains that are almost always produced by manual attempts at writing a hash function. Implementing hashValue is deprecated for good reasons.)

If you have a special use-case where you have full control over the input data you want to hash, then deterministic hashing may be a valid choice. In this case, the best way to go about it is to write your own custom hash table implementation. In most cases, this is considerably less difficult than implementing a good hash function.

You don't get to dictate what hash function the standard Set and Dictionary uses -- this is by deliberate design. However, in your own collection implementation, you are able to calculate hash values in whatever way you like!

No, this isn't a problem. If you must return a specific numerical value, simply define a separate property for that in your custom hashable-derived protocol, and in its default hash(into:) implementation, feed that value to the hasher. I can pretty much guarantee a hashValue-style protocol requirement will never be as convenient, reliable or efficient as simply implementing hash(into:) though.

I recommend you first concentrate on solving how to implement equality checks; the solution for hashing will follow directly from that. (The correct hash(into:) implementation can be (more or less) mechanically derived from the code for ==.)

(I don't see what this problem has to do with stable hash values, though.)

1 Like
Terms of Service

Privacy Policy

Cookie Policy