A hashability problem

total_swiftification · July 30, 2020, 5:31pm

Hello again.

I 'll get straight to the point

//suppose that I wanted to create a library to operate on graphs
//my design would be the following:
//Create a protocol to represent a graph
//and also create a protocol that would represent properties of such graph
 
protocol GraphProperty {
    static func proveProposition <G: Graph>(for graph: G) -> Bool
}
protocol Graph { var root: ...; var properties: Set<GraphProperty> } //rest is omitted

//Declare that all types which are GraphProperty
//should also provide a custom hashing based on nominal type identity.
//that seemes easy
protocol HashableIdentity {
    static var typeHash: String { get }
}
extension HashableIdentity {
    static var typeHash: String {
        return String(describing: self)
    }
}
extension GraphProperty: HashableIdentity {}

//So all ok? NO!
//Recall that in Graph declaration properties have to be a Set, 
//which requires its elements to provide Hashable conformance.
//"What's wrong with that?" - you might ask.
//"The fact that I cannot provide a custom hash representing type!"
//And moreover, Hashables are set to ALWAYS be identical concrete types
//Which means I cannot properly abstract the idea that
//set of my custom (existential) types should provide custom hashing within 
//that group
//in simpler words:

extension GraphProperty: Hashable {
    typealias Hash = String
    var hashValue: String { self.typeHash }
}
// IS NOT POSSIBLE. sigh

Don't know what feedback to expect. I guess what I want to say is "now live with that". Also please fix.

Lantua · July 30, 2020, 5:45pm

It doesn't sound quite right to have HashableIdentity instances compute their hashes with the type-wide typeHash. Are you trying to hash the metatypes?

struct GraphPropertySet {
  private var storage: [ObjectIdentifier: GraphProtocol.Type] = [:]

  mutating func insert(_ type: GraphProtocol.Type) {
    storage[.init(type)] = type
  }

  func contains(_ type: GraphProtocol.Type) -> Bool {
    storage[.init(type)] != nil
  }

  var values: Dictionary<ObjectIdentifier, GraphProtocol.Type>.Values {
    storage.values
  }
}

total_swiftification · July 30, 2020, 6:16pm

Yeap. Intended to use like such:

struct DoesntHaveCycles: GraphProperty { ... }
var properties: Set<GraphProperty.Type> = [DoesntHaveCycles.self, IsBinaryTree.self] //something like this

What is it?

Lantua · July 30, 2020, 6:22pm

Quite literally identifiers for class objects, and metatypes*. Each class instance and metatype will have a unique identifier, which is Comparable and Hashable.

It is one-way though. You can retrieve identifiers from class objects/metatypes, but you can't get back the objects/metatypes from identifiers. That's why I use Dictionary instead.

* IIRC, things get shaky if the class object that an identifier points to is deallocated. Since things here are only metatype, it doesn't really matter here.

It would be nice if metatype itself is Hashable, but they can't conform to protocols right now.

total_swiftification · July 30, 2020, 6:22pm

Oh no, please, put it back

total_swiftification · July 30, 2020, 6:50pm

Ok. Lantua was kind and smart to provide a solution that fits. But does anybody has an opinion about the actual problem which is that Hashables should also be Equatables and that Hashable is hardcoded to provide only Int as hash representing value and you cant customize it? Pls, don't offer me adopters

lorentey · July 30, 2020, 8:13pm

This is because hashing is defined in terms of equality, and is meaningless without it.

Hashing -- by definition -- works by mapping arbitrary types into a fixed-width bit sequence, which is usually modeled as an integer value.

But this is not how Hashable works -- you are supposed to implement hash(into:), not hashValue. See SE-0206 for the full explanation for this.

lorentey · July 30, 2020, 8:47pm

The root of the Set<GraphProtocol.Type> problem is that metatypes aren't currently Hashable. This is an unfortunate limitation that I hope we'll be able to resolve at some point!

The workaround that @Lantua suggested is the generally accepted solution. Another option is to wrap the metatype instances into a custom Hashable struct, and put that into a Set. Neither of these is as convenient as directly hashable metatypes would be, but the convenience/discoverability factor is pretty much the only drawback of these workarounds.

suyashsrijan · July 30, 2020, 9:06pm

You can do something like this perhaps:

struct GraphPropertyMetatypeWrapper {
  let metatype: GraphProperty.Type
}

extension GraphPropertyMetatypeWrapper: Equatable {
  static func ==(lhs: Self, rhs: Self) -> Bool {
      lhs.metatype == rhs.metatype
  }
}

extension GraphPropertyMetatypeWrapper: Hashable {
  func hash(into hasher: inout Hasher) {
      hasher.combine(ObjectIdentifier(metatype))
  }
}

extension Set {
  init(_ graphPropertyMetatypes: [GraphProperty.Type]) where Element == GraphPropertyMetatypeWrapper {
      self.init(graphPropertyMetatypes.map(GraphPropertyMetatypeWrapper.init))
  }
}

var properties: Set<GraphPropertyMetatypeWrapper> = .init([DoesntHaveCycles.self, IsBinaryTree.self]) // Okay

total_swiftification · July 31, 2020, 6:12pm

But what about existential types? They might be treated as a single type, but be different concrete types.

protocol Opaque {}
extension Opaque: Hashable { //implement custom hashing }
struct Ghost: Hashable {}

So if some code sees it as Opaque type that it uses a different hashing scheme.

func useVirtual <T: Opaque>(_ arg: T) { ... } //use existential' hashing
func useReal (_ arg: Ghost) { ... } //uses concrete' hashing

I thought about something like this

//by default any type that conforms to hashable and only 
//feeds some values into hasher gets Int as a hash representing value.
//otherwise, this proto can be a customization point
protocol Hashable {
   associatedtype Hash: HasTypeRangeDescriptor = Int
   var hashValue: Hash { get }
}
protocol HasTypeRangeDescriptor {
    associatedtype Amount: UnsignedInteger
    static var numberOfUniqueValues: Amount { get }
}

Neither existentials are.

Lantua · July 31, 2020, 6:21pm

Isn't it still fine for Hash to be Equatable in that case still? You also seem to want multiple hashing functions for a single type, which is a separate matter. Though I find it... weird, to say the least.

Note that you can usually work with the resulting Int, e.g., compressing it into UInt8, converting it to UInt, etc.

If you need more entropy/bits than the native Int, chances are the provided hashing algorithm is ill-suited anyway, and you'd need to create your own CustomHashable protocol.

total_swiftification · July 31, 2020, 9:09pm

A lot just happened. I was thinking at first about something like this:

protocol KitchenItem {
	var name: String { get }
}
extension KitchenItem: Hashable {
	func hash(into hasher: inout Hasher) {
		hasher.combine(self.name)
	}
}

struct Knife: KitchenItem, Hashable { 
    //assume that we live in perfect world where
    //where it is possible to have more than one customization point
	var name: String
	var isFrightening: Bool = true
	func hash(into hasher: inout Hasher) {
		hasher.combine(self.name)
		hasher.combine(self.isFrightening)
	}
}
struct Pan: KitchenItem {
	var name: String
}

That felt fine to me at first: any type viewed as kitchen item should have its own implementation of hashability and every other descendant type may opt to supplement its own hashability scheme; until I realized that this would be actually impossible because in this hypothetical case notion of Hashable becomes invariant.

var diverseToolset = Set<KitchenItem>([Knife(name: "Fierce Chopper"), Pan(name: "PAN-9000")])
func doTrick <C: Collection>(with collection: C) where C.Element: Hashable { 
	let _ = collection.allSatisfy { item in
		collection.allSatisfy { $0.hashValue == item.hashValue }
	}
}
doTrick (with: diverseToolset)

The code above would be broken because the hash values are not tied to a single identity (Pan uses default hashing from KitchenItem, while Knife uses its own hashing). And if in swift more powerful customization facility would ever be considered, something has to be done about variance.
Although, the code above would be perfectly fine, if it have had been explicitly set to account for such case, in other words, if it was set to use always Hashable implementation from KitchenItem, the world would be a better place.

var diverseToolset = Set<KitchenItem>([Knife(name: "Fierce Chopper"), Pan(name: "PAN-9000")])
func doTrick <C: Collection>(with collection: C) 
where C.Element: Hashable..? { 
//this code uses recent, most common witness from protocol hierarchy
//ignoring any overrides 
	collection.allSatisfy { item in
		collection.allSatisfy { $0 == item }
	} ? print("phew!") : print("oops...")
//now it is okay
}
doTrick (with: diverseToolset)

Thoughts?

Btw. I have this annotation in mind:

var set = Set<Hashable..?>([Pan(...), Knife(...)]) 
//uses most common witness from hierarchy
//(from KitchenItem)
set.insert("Fake blender") //compilation error 
//String does not share common witness with ... <Pick some common type here>
//KitchenItem will do fine

var set2 = Set<?..Hashable>([Pan(...), Pan(...), Pan(...)]) 
//most current type within this hierarchy can only be found within 
//concrete type, which is Pan in this case
set2.insert(Knife(...)) //error 
//Knife most recent witness for Hashable 
//is different from Pan' most recent witness for Hashable

So we can basically get Java (with a LOT of runtime dispatch and inheritance graph resolutions), and as everybody knows, java killed its type inference. Would be fine to see, what direction the core team will choose. I'd prefer becoming java though.

To reiterate

Summary

protocol Demiurge {
	func demonstrateGodlyPower ()
}
extension Demiurge {
	func demonstrateGodlyPower () {
		print ("...")
	}
}
protocol HigherGod: Demiurge {}
extension HigherGod {
	func demonstrateGodlyPower () {
		print ("...thunderstorm growls..")
	}
}
protocol MatterTransformingGod: HigherGod {}
extension MatterTransformingGod {
	func demonstrateGodlyPower () {
		print ("...turning X into Y...")
	}
}
struct Midas: MatterTransformingGod {
	func demonstrateGodlyPower () {
		print ("...turning mud into gold...")
	}
}
//invariant
func invoke <T: Demiurge>(_ target: T) { target.demonstrateGodlyPower() }
invoke (Midas()) //...turning mud into gold...
invoke (Midas() as Demiurge) //...turning mud into gold...
invoke (Midas() as HigherGod) //...turning mud into gold...
invoke (Midas() as MatterTransformingGod) //...turning mud into gold...

//higher bound
func invoke <T: ?..Demiurge>(_ target: T) {}
invoke (Midas()) //...turning mud into gold...
invoke (Midas() as Demiurge) //...
invoke (Midas() as HigherGod) //...thunderstorm growls...
invoke (Midas() as MatterTransformingGod) //...turning X into Y...

//lower bound
func invoke <T: Demiurge..?>(_ target: T) {}
invoke (Midas()) //...
invoke (Midas() as Demiurge) //...
invoke (Midas() as HigherGod) //...
invoke (Midas() as MatterTransformingGod) //...

Lantua · July 31, 2020, 11:22pm

Your notion of equality is missing. Do you regard two items with the same name to be the same item, even though they may not even have different type?

If you rely on name to be the sole indicator of equality, saying that two objects are the same iff they have the same name, you can just use name:

var tools = [String: KitchenItem]
tools[item.name] = item

Or you can wrap it:

struct AnyKitchenItem: Hashable {
  var value: KitchenItem

  public var hashValue: Int { value.name.hashValue }
  public static func ==(lhs: Self, rhs: Self) -> Bool {
    lhs.value.name == rhs.value.name
  }
}

OTOH, if you want objects of the same type to figure out themselves, you can use AnyHashable.

struct AnyHashableKitchenItem: Hashable {
  private var storage: AnyHashable
  init<Value: KitchenItem & Hashable>(_ value: Value) { storage = .init(value) }

  var value: KitchenItem { storage.base as! KitchenItem }
}

lorentey · August 1, 2020, 12:55am

total_swiftification:

I thought about something like this

//by default any type that conforms to hashable and only 
//feeds some values into hasher gets Int as a hash representing value.
//otherwise, this proto can be a customization point
protocol Hashable {
   associatedtype Hash: HasTypeRangeDescriptor = Int
   var hashValue: Hash { get }
}
protocol HasTypeRangeDescriptor {
    associatedtype Amount: UnsignedInteger
    static var numberOfUniqueValues: Amount { get }
}

What problem do you want to address with this?

Making the hash function the responsibility of individual hashable types isn't a good idea -- this is why we implemented SE-0206. If you want to configure the type of hash values, a better way of doing this would be to create a hasher protocol and make hash(into:) generic over it. You could then feed any hashable value to any hash function, and you would be able to define a custom hasher that returned an Int16 or Float or String or whatever hash value type you want.

This is covered in SE-0206's Alternatives Considered section. In a nutshell, while this would be indeed neat, it would generally come with a stiff performance penalty that would slow down Set and Dictionary only to enable an extremely niche use case. Thus, the Standard Library's Hashable only supports its own built-in hash function -- but this doesn't mean you cannot build your own Hashable protocol variant in a standalone package! (This is in fact how the Hasher concept was initially implemented -- on top of the really bad original version of Hashable that required types to roll their own hash functions.)

total_swiftification · August 1, 2020, 2:45pm

I want some of my types to have a stable hash value across launches

Kinda true, but is 'Hashable.hashValue' is deprecated as a protocol requirement; conform type '_' to 'Hashable' by implementing 'hash(into:)' instead a problem?

Lantua · August 1, 2020, 3:53pm

Both problems can probably be addressed by the design of your CustomHashable. Swift designs the hashing system by having Hashable only supplies what is needed for computation, and Hasher decides which hashing algorithm to use. It's probably a good start for designing custom hash functionality.

Also, if you create a new CustomHashable, it'd suggest use different name from hashValue. It's quite dangerous to have the same function signature for something that are purposely different.

protocol CustomHashable {
  func hash(into: inout CustomHasher)
}
struct CustomHasher {
  ...
  func consume(_ value: UnsafeBufferPointer) { ... }
  func finalize() -> UInt32 { /* Computes stable hash */ }
}

total_swiftification · August 1, 2020, 4:31pm

Uhm, the hashValue property is a requirement of Hashable and conformer can override it as necessary. It is apparent here

protocol CustomHashProvider: Hashable {}
extension CustomHashProvider {
    var hashValue: Int {
        Int.random(in: 0..<100) //stub
    }
}

Also, I want my custom protocol to have ability to be used wherever regular Hashable is usable. Being otherwise defeats the purpose of oo.

jonprescott · August 1, 2020, 4:44pm

Using the hashValue var is deprecated, is only there for backward-compatibility, and is discouraged from use. At some point, it will be removed from the standard library types. You might want to look at the Apple documentation on the Hashable protocol, Apple Developer Documentation

Lantua · August 1, 2020, 4:56pm

I have trouble figuring out your requirement. You want a CustomHashable which:

Provides a default implementation for Hashable to the conformer (type), and allows the conformer to use its own implementation to satisfy Hashable requirements,
Normal Hashable usage prefers the conformer implementation,
In some specific scenarios, forcibly use CustomHashable implementation of Hashable,
Allow Set to use CustomHashable implementations.
Supports type heterogeneity, i.e., allow different types, like Knife and Pan, to be in the same Set.

?

total_swiftification · August 1, 2020, 5:11pm

Yes to all. Do you see that there is something wrong?