Pluggable hash algorithm for containers


(Ray Fix) #1

Last year, Howard Hinnant proposed a composable hash algorithm framework for C++ that I was impressed with.

His proposal is here: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html
A live presentation is here: https://www.youtube.com/watch?v=Njjp_MJsgt8 <https://www.youtube.com/watch?v=Njjp_MJsgt8>

A type with multiple fields you inevitably want to do a hash_combine but this pollutes the hashing algorithm. It turns out that (nearly) all hashing algorithms can be abstracted into 3 phases: initialization, digestion and finalization. It might be cool if Swift Hashable adopted this pluggable architecture.

1. Easily adopt Hashable
2. Easily change and compare hash algorithms based on performance / security needs
3. Make good hash implementations trivial for users.

Wondering if anyone has thought about this already / if it would be worthwhile pursuing more.

Best wishes,
Ray

PS: I have always thought that it is a bummer that compound types such as tuples can’t conform to protocols and therefore be Hashable.


(Dmitri Gribenko) #2

Hi Ray,

A prototype is here (written before Howard's talk, so it probably uses
different terms):

https://github.com/apple/swift/blob/master/validation-test/stdlib/HashingPrototype.swift

However, I couldn't make it interoperate with NSObject.hash in both
directions: you should be able to override 'var hash' and get an
implementation of Hashable based on that, and vice versa. It could be
solvable with protocol extensions now, I haven't looked at that
prototype for more than a year -- help and patches appreciated.

Dmitri

···

On Thu, Dec 3, 2015 at 4:14 PM, Ray Fix <rayfix@gmail.com> wrote:

Last year, Howard Hinnant proposed a composable hash algorithm framework for C++ that I was impressed with.

His proposal is here: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html
A live presentation is here: https://www.youtube.com/watch?v=Njjp_MJsgt8

A type with multiple fields you inevitably want to do a hash_combine but this pollutes the hashing algorithm. It turns out that (nearly) all hashing algorithms can be abstracted into 3 phases: initialization, digestion and finalization. It might be cool if Swift Hashable adopted this pluggable architecture.

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Ray Fix) #3

A prototype is here (written before Howard's talk, so it probably uses
different terms):

https://github.com/apple/swift/blob/master/validation-test/stdlib/HashingPrototype.swift

Thank your for the excellent pointer, Dmitri. I will look into that as I get setup to build, test, and make actual changes.

However, I couldn't make it interoperate with NSObject.hash in both
directions: you should be able to override 'var hash' and get an
implementation of Hashable based on that, and vice versa. It could be
solvable with protocol extensions now, I haven't looked at that
prototype for more than a year -- help and patches appreciated.

As you know, a hashing algorithm just needs to consume raw bytes. The HasherType protocol could use protocol extensions to provide a clean API to standard types like Int and Float, Double, String, Bool, UInt32, Sequences, etc.

extension CGPoint : NewHashable {
  func combineInto(hasher: Hasher) {
    x.combineInto(hasher)
    y.combineInto(hasher)
  }

What would be even better is if there was some interspection with a default implementation so the user code could look like this:

extension CGPoint : NewHashable {}

but that is dependent on another language feature.

The default implementation of hash is to call combineInto(Hasher) with a legacy hasher and immediately squeeze. Might even want this to be statically dispatched (an extension with no protocol definition).

extension NewHashable {
    var hash: Int {
         var h = LegacyHasher()
         self.combineInto(h)
         return h.squeezeHashValue()
    }
}

Any thoughts?

Ray

···

On Dec 3, 2015, at 8:29 PM, Dmitri Gribenko <gribozavr@gmail.com> wrote:


(Dmitri Gribenko) #4

A prototype is here (written before Howard's talk, so it probably uses
different terms):

https://github.com/apple/swift/blob/master/validation-test/stdlib/HashingPrototype.swift

Thank your for the excellent pointer, Dmitri. I will look into that as I get setup to build, test, and make actual changes.

However, I couldn't make it interoperate with NSObject.hash in both
directions: you should be able to override 'var hash' and get an
implementation of Hashable based on that, and vice versa. It could be
solvable with protocol extensions now, I haven't looked at that
prototype for more than a year -- help and patches appreciated.

As you know, a hashing algorithm just needs to consume raw bytes. The HasherType protocol could use protocol extensions to provide a clean API to standard types like Int and Float, Double, String, Bool, UInt32, Sequences, etc.

extension CGPoint : NewHashable {
  func combineInto(hasher: Hasher) {
    x.combineInto(hasher)
    y.combineInto(hasher)
  }

I definitely agree.

What would be even better is if there was some interspection with a default implementation so the user code could look like this:

extension CGPoint : NewHashable {}

but that is dependent on another language feature.

Right, there are a lot of interesting things we could do if we had
more introspection language features.

The default implementation of hash is to call combineInto(Hasher) with a legacy hasher and immediately squeeze. Might even want this to be statically dispatched (an extension with no protocol definition).

extension NewHashable {
    var hash: Int {
         var h = LegacyHasher()
         self.combineInto(h)
         return h.squeezeHashValue()
    }
}

Any thoughts?

The interop issue is the following. Consider a subclass of NSObject,
possibly defined in Objective-C, that does not conform to NewHashable:
we want to use its 'hash' implementation. If someone subclasses
NSObject in Swift and implements NewHashable, we should use its API
instead. If someone further subclasses that Swift class in
Objective-C, they should be able to add more properties, and override
'var hash', since that's the only API available in Objective-C.

Dmitri

···

On Fri, Dec 4, 2015 at 10:45 AM, Ray Fix <rayfix@gmail.com> wrote:

On Dec 3, 2015, at 8:29 PM, Dmitri Gribenko <gribozavr@gmail.com> wrote:

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/