[Draft] Hasher & HashVisitable

hartbit · March 14, 2017, 6:27pm

Source compatibility

Making use of "extending protocols to conform to protocols":

extension Hashable: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.hashValue.hash(&
hasher)
}
}

We're unlikely to add this feature soon. It seems reasonable to me to instead have `HashVisitable` refine `Hashable` and provide a default implementation of `hashValue` using a default hasher. I think we still want `Hashable` to be the currency protocol most APIs work with for performance in unspecialized code, since we could inline the visitation and hasher implementation together inside the specialized `hashValue` witness.

Can you explain the performance argument? How does it fare (in your opinion) compared to the arguments in the proposal?

How about:

protocol Hashable {
func hash<H: Hasher>(with hasher: inout H)
}

extension Hashable {
    var hashValue: Int {
        var hasher = StdLibDefaultHasher()
        hash(with: hasher)
        return hash.finish()
    }
}

···

On 14 Mar 2017, at 16:41, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:

On Mar 13, 2017, at 8:38 AM, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:

-Joe
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

VladimirS · March 14, 2017, 5:00pm

I’m dumb when it comes to proper hashing, but it’s such a tediously
common thing in my experience to need to add Hashable to some kind of
a struct so I can stash it in a set or use it as a dictionary key. Is
there really no way to make this all more automatic? I have to be
honest - this proposal seems *harder* to understand than the way it
works now. Of course the easiest would be if the language could just
do this “good enough" for me using reflection or whatever and if I
really did run into a problem where I wanted to do this myself, I
could override something.

Perfect is the enemy of good.

The compiler totally ought to derive reasonable default Equatable and
Hashable implementations for you. That would allow the standard library
to do the right thing without burdening most users with the need to
sweat the details.

I believe many of us dream of an feature of auto-generated Equatable and Hashable implementations.

Btw, what is your opinion, can we expect (at some point of time) Equatable&Hashable for tuples?
It is very handy to have Dictionary<Tuple<type1,type2>, type3> in C# and sad that we can't have [(type1,type2):type3] in Swift without declaration of new temporary struct type and implement Equatable&Hashable for it.

···

On 14.03.2017 18:34, Joe Groff via swift-evolution wrote:

On Mar 13, 2017, at 10:54 AM, Sean Heber via swift-evolution >> <swift-evolution@swift.org> wrote:

-Joe _______________________________________________ swift-evolution
mailing list swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

xwu · March 14, 2017, 1:44am

Adding the list back to this reply because I don't believe you meant to
reply only to me, and I think it's an important discussion to have :)

Automatic generation of Hashable implementation code only "solves" the
problem of having to implement `var hashValue`.
It however only works for some types. By no means for all.

Certainly, and I never suggested that it would work for every case. I was
responding to Sean's point that the compiler should be able to do "good
enough" in the common cases and I offered a way that that can be achieved.

Take this hypothetical implementation of a HSB color:

struct Color {
let hue: UInt8
let saturation: UInt8
let brightness: UInt8
}

Let the semantic of this type be this:
Any two colors with a `brightness` of `0` are to be considered equal
regardless of their respective `hue` or `saturation`. At night all cats are
gray.

An auto-generated conformance impl for `Color` would most likely be wrong
afaict.
And those kind of semantics are everywhere.

Of course, and in those cases, you wouldn't want to use an auto-derived
Equatable or Hashable implementation. Those kinds of semantics are
"everywhere" for certain definitions of "everywhere", but similarly,
semantics where the hash value of a thing can be determined simply via
combination of the hash values of its components are also "everywhere".

I wouldn't suggest that auto-deriving Hashable solves *all* problems, and
similarly, your proposal doesn't help in the situation you described
either. Your API provides a different way to mix the hue, saturation, and
brightness components in an implementation of hashValue, but it doesn't
force the user to mix them *correctly* (by excluding hue/saturation if
brightness is zero).

So in both cases, users still have to provide custom implementations of ==
and hashValue. But without auto-deriving, users have to provide them even
for the cases where the auto-derived implementation *would have been
correct.* Auto-deriving is about reducing the number of types that need to
have custom implementations, thereby reducing the chance that a user will
do it wrong.

When considering a type with auto-derived Hashable and Equatable, there
are two ways it can be wrong:

1) The auto-generated implementations of both == and hashValue don't honor
the semantic contract of the type (for example, don't ignore brightness
when it's zero).
2) The user overrides the auto-generated implementation of one of
==/hashValue but not both, and violates the contracts between them.

For #1, yes, the compiler generated an incorrect implementation in that
case. However, I would argue it's the developer's responsibility to fix it
by overriding it if they need different semantics. As I mentioned above,
without derivation, the developer could still implement it incorrectly,
just as they could with your API.

For #2, this could be solved by requiring users to override both if they
override one of them. Now they're back in the same situation as #1—they
either did it right, or they did it wrong.

Auto-generation clearly is no panacea when it comes to hashing. Especially
if it leads to invalid and unsafe default implementations.

Auto-deriving cannot produce "unsafe" implementations because it's defined
in terms of a function operating over the hash values of its components. It
can produce an implementation that does not match the intended semantics of
the class that are defined outside of its component values, but that's not
the compiler's job to decide; it's up to the user to provide those.

Regarding unsafety, it's worth noting that your ContiguouslyHashable
protocol is inherently unsafe and fragile. Imagine that a user implements a
struct that they make conform to ContiguouslyHashable because at the time,
it's a simple fixed layout type with primitive fields. If they take that
type and add a new field to it that isn't contiguously hashable (say, a
class reference) and they forget to replace the ContiguouslyHashable
conformance, they now have a very broken type, and that behavior isn't
caught until *runtime* when things go wrong.

At least with derived conformances, in that situation the *compiler* would
see that the type was no longer hashable and emit an error when it's used
anywhere that Hashable/hashValue was expected.

So if safety is your motivation, ContiguouslyHashable kind of fights back
against that because it gives the user a door they can open—in some cases,
accidentally—that produces invalid results.

It would however be nice to be able to annotate types for which you want
HashVisitable implementations to be generated.

That's one of the still-open questions from the last discussion thread on
the topic; whether auto-deriving should be automatic for any type where it
is possible (as it is now for enums without associated values) or whether
the user should have to opt-in.

I actually don't see auto-generation as an alternative to a proper hashing
API. But rather as an addition.
HashVisitable and auto-deriving would actually work great together!

I completely agree that these ideas complement each other very well! And I
also agree that the language would do well to make it easier for users to
easily do the right thing. I just feel that *the API proposed here
specifically* adds too much complexity for the problem it's trying to solve.

I'd find it more compelling if it was simplified a bit:

* The standard library doesn't need to provide a number of hash
implementations; it should just provide one that works "well enough" in
most cases
* Hashable doesn't have tie itself to a visitor pattern. It's not
necessarily safer because users can still mix the wrong values; in that
case, I'd rather the stdlib implementation of the "good enough" hash just
be a standalone mixer that the language can encourage users to use.

I agree with this assessment. If SipHash is deemed "good enough," then
Swift should offer these as standalone facilities and encourage users to
call them in `Hashable`. I think the API design proposed here is much too
complex, but offering tried-and-true hash functions is certainly reasonable
and would be an improvement over xor.

Certainly also, the documentation for `Hashable` can be improved. In
general, however, it's not very convincing to motivate an entire proposal
for a new feature based on a documentation bug. Recognize that there are
(or at least have been, in past shipped versions of Swift) code snippets in
the documentation that _don't even compile_! If we accepted the logic put
forward here, we should give up on Swift entirely; after all, if even the
official Swift documentation can't provide code that compiles, what hope do
the rest of us have?!

···

On Mon, Mar 13, 2017 at 8:30 PM, Tony Allevato via swift-evolution < swift-evolution@swift.org> wrote:

On Mon, Mar 13, 2017 at 4:32 PM Vincent Esche <regexident.mailinglists@ > gmail.com> wrote:

In fact they already do in other languages <https://is.gd/iy5770>\.

On Mon, Mar 13, 2017 at 8:51 PM, Tony Allevato via swift-evolution < > swift-evolution@swift.org> wrote:

I'm not convinced this is the right approach: it couples the fact that a
type is hashable with a specific strategy for implementing the hash value
computation.

Instead, the language should make efforts to avoid requiring the user to
think about hashing algorithms *at all*; to answer Sean's question a couple
messages up, I've proposed in the past
<https://gist.github.com/allevato/2fd10290bfa84accfbe977d8ac07daad> (rough
working draft) adding derivation of Equatable and Hashable for structs and
enums where it's possible to automatically do so (for example, if all of
the enum's associated values are Equatable/Hashable). I've been
experimenting with an implementation in the compiler and I have something
that works for enums, except for recursive ones. I haven't started structs
yet because I think there are more open questions there, and I hope to
propose enums and structs independently so that the enum one doesn't get
bogged down by the struct one.

The core team seems to be aware of the need for this; the logic that
derives Equatable/Hashable for enums without associated values also has
TODO comments to handle those with associated values, and to handle structs
as well. Slava Pestov mentioned a while back that they encouraged PRs on
it, which is why I started, but my free time has been limited lately.

That being said, I *do* think there would be value in having some kind of
hash "mixer" as part of the standard library and strongly encouraging
through documentation that hashValue implementors use it. Getting the
function right is the hard part, and if Swift both (1) took care of it for
you by default in many cases and (2) made the harder cases easier by
providing a high quality canned implementation, I think we'd have a much
cleaner solution than what is being proposed here.

On Mon, Mar 13, 2017 at 12:18 PM Sean Heber via swift-evolution < > swift-evolution@swift.org> wrote:

Is there any reason the API couldn’t be expressed as something along these
lines?

func hash() -> [Hashable] {
return [x, y]
}

l8r
Sean

> On Mar 13, 2017, at 2:15 PM, David Hart <david@hartbit.com> wrote:
>
>>
>> On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution < > swift-evolution@swift.org> wrote:
>>
>> I’m dumb when it comes to proper hashing, but it’s such a tediously
common thing in my experience to need to add Hashable to some kind of a
struct so I can stash it in a set or use it as a dictionary key. Is there
really no way to make this all more automatic? I have to be honest - this
proposal seems *harder* to understand than the way it works now.
>
> It's not really harder: just call hash on each of your type's
significant values:
>
> x.hash(&hasher)
> y.hash(&hasher)
>
> How would you implement hashValue in a simpler way, remembering that 'x
^ y' is an incorrect implementation?
>
>> Of course the easiest would be if the language could just do this “good
enough" for me using reflection or whatever and if I really did run into a
problem where I wanted to do this myself, I could override something.
>>
>> Perfect is the enemy of good.
>>
>> l8r
>> Sean
>>
>>
>>> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution < > swift-evolution@swift.org> wrote:
>>>
>>> Hi there,
>>>
>>> I've written up a proposal for an overhaul/replacement of the Hashable
protocol and would love to hear your feedback on it!
>>>
>>> Rendered | Blog Post
>>>
>>> Cheers,
>>> Vincent
>>>
>>> Ps: I'd like to thank David Hart (@hartbit) for his great editorial
feedback on this proposal.
>>>
>>> HashVisitable
>>>
>>> • Proposal: SE-NNNN
>>> • Authors: Vincent Esche
>>> • Review Manager: TBD
>>> • Status: Awaiting review
>>> Introduction
>>>
>>> Replace the Hashable protocol by two new procotols (Hasher and
HashVisitable) to improve safety, versatility and learnability.
>>>
>>> Motivation
>>>
>>> Implementing Hashable is difficult and the consequences if not done
well have dire performance and safety repercussions.
>>>
>>> The documentation of Hashable lists a sample implementation of var
hashValue:
>>>
>>> /// A point in an x-y coordinate system.
>>> struct GridPoint {
>>> var x: Int
>>> var y: Int
>>> }
>>>
>>> extension GridPoint: Hashable {
>>> var hashValue: Int {
>>> return x.hashValue ^ y.hashValue
>>> }
>>>
>>> static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
>>> return lhs.x == rhs.x && lhs.y == rhs.y
>>> }
>>> }
>>>
>>> Calculating the hashes of all GridPoints (given the above
implementation) on a 1000 × 1000 grid …
>>>
>>> let (width, height) = (1000, 1000)
>>> let total = width * height
>>> var hashes = Set<Int>()
>>> for x in 0..<width {
>>> for y in 0..<height {
>>> hashes.insert(GridPoint(x: x, y: y).hashValue)
>>> }
>>> }
>>> print("\(hashes.count) unique hashes out of a total of \(total).")
>>>
>>> … results in just 1024 unique hash values for 1_000_000 unique values.
>>>
>>> In other words: The recommended implementation causes 99.9% of values
to trigger a hash collision.
>>>
>>> Out of those 1_000_000 values the median collision count was 976 with
min and max being 976 and 1000respectively.
>>>
>>> The collision rate will have negative impact in algorithms which
heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it
increases the vulnerability to DDOS attacks when exposed to the web.
>>>
>>> If even the official Swift documentation gets the implementation of
hashValue wrong, then who is to expect the average Swift programmer to do
any better?
>>>
>>> In contrast, running the same snippet using HashVisitable and the
semi-secure Fnv1aHash (see below) results in zero collisions!
>>>
>>> Finally, the design of the Hashable protocol forces the use of one
implementation without the possibility of switching between multiple
hashing algorithms.
>>>
>>> Proposed solution
>>>
>>> Instead of coupling the hashing algorithm with each and every Swift
type, we should provide a hashing API based on the visitor-pattern. By
freeing application developers from the burden of having to implement
hashing algorithms, the Standard Library can provide default ones which
fulfill collision, performance and security goals. Furthermore, it would
allow developers to swap to different algorithms based on the use case.
>>>
>>> Detailed design
>>>
>>> The proposal deprecates the Hashable protocol and introduces the
following two:
>>>
>>> protocol Hasher
>>> {
>>>
>>> mutating func finish() -> Int
>>>
>>>
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer)
>>> }
>>>
>>>
>>> protocol HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H)
>>> }
>>>
>>> Hasher is the protocol which represents a hashing algorithm, and
HashVisitable replaces Hashable. For types entirely represented by their
memory layout, the following protocol would provide a default
implementation:
>>>
>>> protocol ContiguouslyHashable: HashVisitable {}
>>>
>>> extension ContiguouslyHashable {
>>> func hash<H: Hasher>(_ hasher: inout H) {
>>> var mutableSelf = self
>>> try! Swift.withUnsafeBytes(of: &mutableSelf) {
>>> hasher.write(bytes: $0)
>>> }
>>> }
>>> }
>>>
>>> extension Bool : ContiguouslyHashable {}
>>> extension UInt8 : ContiguouslyHashable {}
>>> extension UInt16 : ContiguouslyHashable {}
>>> extension UInt32 : ContiguouslyHashable {}
>>> extension UInt64 : ContiguouslyHashable {}
>>> extension UInt : ContiguouslyHashable {}
>>> extension Int8 : ContiguouslyHashable {}
>>> extension Int16 : ContiguouslyHashable {}
>>> extension Int32 : ContiguouslyHashable {}
>>> extension Int64 : ContiguouslyHashable {}
>>> extension Int : ContiguouslyHashable {}
>>>
>>> The Standard-Library would then provide a set of hashing
implementations specific to each purpose. A possible choice for hashing
algorithms would be the reasonably fast SipHash-2-4, and the reasonably
secure SipHash-4-8.
>>>
>>> FNV-1A is another popular semi-secure but blazingly fast hash
algorithm, which – for the sake of demonstration – could be implemented as
follows:
>>>
>>> struct Fnv1aHash
>>> {
>>>
>>> fileprivate var state: UInt
>>>
>>>
>>> init(seed: UInt
>>> ) {
>>>
>>> self.state = seed &+ 14695981039346656037
>>>
>>> }
>>> }
>>>
>>>
>>> extension Fnv1aHash: Hasher
>>> {
>>>
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer) {
>>>
>>> for byte in
>>> bytes {
>>>
>>> self.state = (self.state ^ UInt(byte)) &* 1099511628211
>>>
>>> }
>>> }
>>>
>>> mutating func finish() -> Int
>>> {
>>>
>>> return unsafeBitCast(self.state, to: Int.self
>>> )
>>> }
>>> }
>>>
>>> Coming back to the sample code present in the Hashable documentation,
the new implementation would look like:
>>>
>>> extension GridPoint: HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>>
>>> self.x.hash(&
>>> hasher)
>>>
>>> self.y.hash(&
>>> hasher)
>>> }
>>> }
>>>
>>> Source compatibility
>>>
>>> Making use of "extending protocols to conform to protocols":
>>>
>>> extension Hashable: HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>>
>>> self.hashValue.hash(&
>>> hasher)
>>> }
>>> }
>>>
>>> Effect on ABI stability
>>>
>>> n/a
>>>
>>> Effect on API resilience
>>>
>>> This feature should be possible to add/remove without breaking ABI.
>>>
>>> Alternatives considered
>>>
>>> n/a
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution@swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>
>> _______________________________________________
>> swift-evolution mailing list
>> swift-evolution@swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

allevato · March 13, 2017, 10:15pm

I'm not convinced this is the right approach: it couples the fact that a
type is hashable with a specific strategy for implementing the hash value
computation.

I’m confused. This proposal specifically *decouples* the implementation
of hashable from hashing strategies: that’s its main goal. Could you
explain how you see coupling here?

Hashable as it is currently defined says "this type provides a hashValue
property that allows it to be used in a hashing container". The proposed
API says "this type is hashable using a specific visitor API and hasher
interface". The "visitor API and hasher interface" is what I mean by
"strategy" in my original message, compared to the current Hashable which
simply says "it's hashable and here's its hash—a number".

I don't buy the argument that we need to support the same type to support
being hashed with arbitrary hashing algorithms at runtime. Hash values in
languages like Swift are intended to be used specifically for hashing
collections. What value for that use case is provided by allowing a
collection to plug in various algorithms for the same type? I don't think
the added flexibility pays for the increased complexity.

A simpler solution to increase safety/force people to consider the
resiliency of their hash function would just be to have Hashable.hashValue
return a concrete "HashValue" type that provides the mixing interface, with
a concrete implementation in the standard library—but I think a simple
interface that allows mixing (1) integer constants and (2) hash values of
other Hashables would be more sufficient and easier to understand. If users
need something more complex than that, then it stands to reason that they
*should* have to do a little more work.

Instead, the language should make efforts to avoid requiring the user to
think about hashing algorithms *at all*; to answer Sean's question a couple
messages up, I've proposed in the past
<https://gist.github.com/allevato/2fd10290bfa84accfbe977d8ac07daad> (rough
working draft) adding derivation of Equatable and Hashable for structs and
enums where it's possible to automatically do so (for example, if all of
the enum's associated values are Equatable/Hashable).

This proposal does so with the *ContinuouslyHashable* protocol, which
provides a default implementation of hashable that works for enums and
structs that have “inclusive” memory layouts. I think this is good thing,
because it means you have to make a conscious choice to use
*ContinuouslyHashable* instead of relying on an automatic implementation
which will sometimes be wrong.

From what I can tell, ContiguouslyHashable only works with fixed layout

types, whereas a derived implementation would work for anything that is
composed entirely of other Hashable types by mixing the hash values of
those components in a known way.

In such a situation, why do you expect that such an automatically generated
implementation would be "wrong"?

···

On Mon, Mar 13, 2017 at 1:43 PM David Hart <david@hartbit.com> wrote:

On 13 Mar 2017, at 20:51, Tony Allevato <tony.allevato@gmail.com> wrote:

I've been experimenting with an implementation in the compiler and I have
something that works for enums, except for recursive ones. I haven't
started structs yet because I think there are more open questions there,
and I hope to propose enums and structs independently so that the enum one
doesn't get bogged down by the struct one.

The core team seems to be aware of the need for this; the logic that
derives Equatable/Hashable for enums without associated values also has
TODO comments to handle those with associated values, and to handle structs
as well. Slava Pestov mentioned a while back that they encouraged PRs on
it, which is why I started, but my free time has been limited lately.

That being said, I *do* think there would be value in having some kind of
hash "mixer" as part of the standard library and strongly encouraging
through documentation that hashValue implementors use it. Getting the
function right is the hard part, and if Swift both (1) took care of it for
you by default in many cases and (2) made the harder cases easier by
providing a high quality canned implementation, I think we'd have a much
cleaner solution than what is being proposed here.

On Mon, Mar 13, 2017 at 12:18 PM Sean Heber via swift-evolution < > swift-evolution@swift.org> wrote:

Is there any reason the API couldn’t be expressed as something along these
lines?

func hash() -> [Hashable] {
return [x, y]
}

l8r
Sean

> On Mar 13, 2017, at 2:15 PM, David Hart <david@hartbit.com> wrote:
>
>>
>> On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution < > swift-evolution@swift.org> wrote:
>>
>> I’m dumb when it comes to proper hashing, but it’s such a tediously
common thing in my experience to need to add Hashable to some kind of a
struct so I can stash it in a set or use it as a dictionary key. Is there
really no way to make this all more automatic? I have to be honest - this
proposal seems *harder* to understand than the way it works now.
>
> It's not really harder: just call hash on each of your type's
significant values:
>
> x.hash(&hasher)
> y.hash(&hasher)
>
> How would you implement hashValue in a simpler way, remembering that 'x
^ y' is an incorrect implementation?
>
>> Of course the easiest would be if the language could just do this “good
enough" for me using reflection or whatever and if I really did run into a
problem where I wanted to do this myself, I could override something.
>>
>> Perfect is the enemy of good.
>>
>> l8r
>> Sean
>>
>>
>>> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution < > swift-evolution@swift.org> wrote:
>>>
>>> Hi there,
>>>
>>> I've written up a proposal for an overhaul/replacement of the Hashable
protocol and would love to hear your feedback on it!
>>>
>>> Rendered | Blog Post
>>>
>>> Cheers,
>>> Vincent
>>>
>>> Ps: I'd like to thank David Hart (@hartbit) for his great editorial
feedback on this proposal.
>>>
>>> HashVisitable
>>>
>>> • Proposal: SE-NNNN
>>> • Authors: Vincent Esche
>>> • Review Manager: TBD
>>> • Status: Awaiting review
>>> Introduction
>>>
>>> Replace the Hashable protocol by two new procotols (Hasher and
HashVisitable) to improve safety, versatility and learnability.
>>>
>>> Motivation
>>>
>>> Implementing Hashable is difficult and the consequences if not done
well have dire performance and safety repercussions.
>>>
>>> The documentation of Hashable lists a sample implementation of var
hashValue:
>>>
>>> /// A point in an x-y coordinate system.
>>> struct GridPoint {
>>> var x: Int
>>> var y: Int
>>> }
>>>
>>> extension GridPoint: Hashable {
>>> var hashValue: Int {
>>> return x.hashValue ^ y.hashValue
>>> }
>>>
>>> static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
>>> return lhs.x == rhs.x && lhs.y == rhs.y
>>> }
>>> }
>>>
>>> Calculating the hashes of all GridPoints (given the above
implementation) on a 1000 × 1000 grid …
>>>
>>> let (width, height) = (1000, 1000)
>>> let total = width * height
>>> var hashes = Set<Int>()
>>> for x in 0..<width {
>>> for y in 0..<height {
>>> hashes.insert(GridPoint(x: x, y: y).hashValue)
>>> }
>>> }
>>> print("\(hashes.count) unique hashes out of a total of \(total).")
>>>
>>> … results in just 1024 unique hash values for 1_000_000 unique values.
>>>
>>> In other words: The recommended implementation causes 99.9% of values
to trigger a hash collision.
>>>
>>> Out of those 1_000_000 values the median collision count was 976 with
min and max being 976 and 1000respectively.
>>>
>>> The collision rate will have negative impact in algorithms which
heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it
increases the vulnerability to DDOS attacks when exposed to the web.
>>>
>>> If even the official Swift documentation gets the implementation of
hashValue wrong, then who is to expect the average Swift programmer to do
any better?
>>>
>>> In contrast, running the same snippet using HashVisitable and the
semi-secure Fnv1aHash (see below) results in zero collisions!
>>>
>>> Finally, the design of the Hashable protocol forces the use of one
implementation without the possibility of switching between multiple
hashing algorithms.
>>>
>>> Proposed solution
>>>
>>> Instead of coupling the hashing algorithm with each and every Swift
type, we should provide a hashing API based on the visitor-pattern. By
freeing application developers from the burden of having to implement
hashing algorithms, the Standard Library can provide default ones which
fulfill collision, performance and security goals. Furthermore, it would
allow developers to swap to different algorithms based on the use case.
>>>
>>> Detailed design
>>>
>>> The proposal deprecates the Hashable protocol and introduces the
following two:
>>>
>>> protocol Hasher
>>> {
>>>
>>> mutating func finish() -> Int
>>>
>>>
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer)
>>> }
>>>
>>>
>>> protocol HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H)
>>> }
>>>
>>> Hasher is the protocol which represents a hashing algorithm, and
HashVisitable replaces Hashable. For types entirely represented by their
memory layout, the following protocol would provide a default
implementation:
>>>
>>> protocol ContiguouslyHashable: HashVisitable {}
>>>
>>> extension ContiguouslyHashable {
>>> func hash<H: Hasher>(_ hasher: inout H) {
>>> var mutableSelf = self
>>> try! Swift.withUnsafeBytes(of: &mutableSelf) {
>>> hasher.write(bytes: $0)
>>> }
>>> }
>>> }
>>>
>>> extension Bool : ContiguouslyHashable {}
>>> extension UInt8 : ContiguouslyHashable {}
>>> extension UInt16 : ContiguouslyHashable {}
>>> extension UInt32 : ContiguouslyHashable {}
>>> extension UInt64 : ContiguouslyHashable {}
>>> extension UInt : ContiguouslyHashable {}
>>> extension Int8 : ContiguouslyHashable {}
>>> extension Int16 : ContiguouslyHashable {}
>>> extension Int32 : ContiguouslyHashable {}
>>> extension Int64 : ContiguouslyHashable {}
>>> extension Int : ContiguouslyHashable {}
>>>
>>> The Standard-Library would then provide a set of hashing
implementations specific to each purpose. A possible choice for hashing
algorithms would be the reasonably fast SipHash-2-4, and the reasonably
secure SipHash-4-8.
>>>
>>> FNV-1A is another popular semi-secure but blazingly fast hash
algorithm, which – for the sake of demonstration – could be implemented as
follows:
>>>
>>> struct Fnv1aHash
>>> {
>>>
>>> fileprivate var state: UInt
>>>
>>>
>>> init(seed: UInt
>>> ) {
>>>
>>> self.state = seed &+ 14695981039346656037
>>>
>>> }
>>> }
>>>
>>>
>>> extension Fnv1aHash: Hasher
>>> {
>>>
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer) {
>>>
>>> for byte in
>>> bytes {
>>>
>>> self.state = (self.state ^ UInt(byte)) &* 1099511628211
>>>
>>> }
>>> }
>>>
>>> mutating func finish() -> Int
>>> {
>>>
>>> return unsafeBitCast(self.state, to: Int.self
>>> )
>>> }
>>> }
>>>
>>> Coming back to the sample code present in the Hashable documentation,
the new implementation would look like:
>>>
>>> extension GridPoint: HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>>
>>> self.x.hash(&
>>> hasher)
>>>
>>> self.y.hash(&
>>> hasher)
>>> }
>>> }
>>>
>>> Source compatibility
>>>
>>> Making use of "extending protocols to conform to protocols":
>>>
>>> extension Hashable: HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>>
>>> self.hashValue.hash(&
>>> hasher)
>>> }
>>> }
>>>
>>> Effect on ABI stability
>>>
>>> n/a
>>>
>>> Effect on API resilience
>>>
>>> This feature should be possible to add/remove without breaking ABI.
>>>
>>> Alternatives considered
>>>
>>> n/a
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution@swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>
>> _______________________________________________
>> swift-evolution mailing list
>> swift-evolution@swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

lorentey · March 14, 2017, 5:35am

Adding the list back to this reply because I don't believe you meant to reply only to me, and I think it's an important discussion to have :)

Automatic generation of Hashable implementation code only "solves" the problem of having to implement `var hashValue`.
It however only works for some types. By no means for all.

Certainly, and I never suggested that it would work for every case. I was responding to Sean's point that the compiler should be able to do "good enough" in the common cases and I offered a way that that can be achieved.

Take this hypothetical implementation of a HSB color:

struct Color {
  let hue: UInt8
  let saturation: UInt8
  let brightness: UInt8
}

Let the semantic of this type be this:
Any two colors with a `brightness` of `0` are to be considered equal regardless of their respective `hue` or `saturation`. At night all cats are gray.

An auto-generated conformance impl for `Color` would most likely be wrong afaict.
And those kind of semantics are everywhere.

Of course, and in those cases, you wouldn't want to use an auto-derived Equatable or Hashable implementation. Those kinds of semantics are "everywhere" for certain definitions of "everywhere", but similarly, semantics where the hash value of a thing can be determined simply via combination of the hash values of its components are also "everywhere".

I wouldn't suggest that auto-deriving Hashable solves *all* problems, and similarly, your proposal doesn't help in the situation you described either. Your API provides a different way to mix the hue, saturation, and brightness components in an implementation of hashValue, but it doesn't force the user to mix them *correctly* (by excluding hue/saturation if brightness is zero).

So in both cases, users still have to provide custom implementations of == and hashValue. But without auto-deriving, users have to provide them even for the cases where the auto-derived implementation *would have been correct.* Auto-deriving is about reducing the number of types that need to have custom implementations, thereby reducing the chance that a user will do it wrong.

When considering a type with auto-derived Hashable and Equatable, there are two ways it can be wrong:

1) The auto-generated implementations of both == and hashValue don't honor the semantic contract of the type (for example, don't ignore brightness when it's zero).
2) The user overrides the auto-generated implementation of one of ==/hashValue but not both, and violates the contracts between them.

For #1, yes, the compiler generated an incorrect implementation in that case. However, I would argue it's the developer's responsibility to fix it by overriding it if they need different semantics. As I mentioned above, without derivation, the developer could still implement it incorrectly, just as they could with your API.

For #2, this could be solved by requiring users to override both if they override one of them. Now they're back in the same situation as #1—they either did it right, or they did it wrong.

Auto-generation clearly is no panacea when it comes to hashing. Especially if it leads to invalid and unsafe default implementations.

Auto-deriving cannot produce "unsafe" implementations because it's defined in terms of a function operating over the hash values of its components. It can produce an implementation that does not match the intended semantics of the class that are defined outside of its component values, but that's not the compiler's job to decide; it's up to the user to provide those.

Regarding unsafety, it's worth noting that your ContiguouslyHashable protocol is inherently unsafe and fragile. Imagine that a user implements a struct that they make conform to ContiguouslyHashable because at the time, it's a simple fixed layout type with primitive fields. If they take that type and add a new field to it that isn't contiguously hashable (say, a class reference) and they forget to replace the ContiguouslyHashable conformance, they now have a very broken type, and that behavior isn't caught until *runtime* when things go wrong.

At least with derived conformances, in that situation the *compiler* would see that the type was no longer hashable and emit an error when it's used anywhere that Hashable/hashValue was expected.

So if safety is your motivation, ContiguouslyHashable kind of fights back against that because it gives the user a door they can open—in some cases, accidentally—that produces invalid results.

It would however be nice to be able to annotate types for which you want HashVisitable implementations to be generated.

That's one of the still-open questions from the last discussion thread on the topic; whether auto-deriving should be automatic for any type where it is possible (as it is now for enums without associated values) or whether the user should have to opt-in.

I actually don't see auto-generation as an alternative to a proper hashing API. But rather as an addition.
HashVisitable and auto-deriving would actually work great together!

I completely agree that these ideas complement each other very well! And I also agree that the language would do well to make it easier for users to easily do the right thing. I just feel that *the API proposed here specifically* adds too much complexity for the problem it's trying to solve.

I'd find it more compelling if it was simplified a bit:

* The standard library doesn't need to provide a number of hash implementations; it should just provide one that works "well enough" in most cases
* Hashable doesn't have tie itself to a visitor pattern. It's not necessarily safer because users can still mix the wrong values; in that case, I'd rather the stdlib implementation of the "good enough" hash just be a standalone mixer that the language can encourage users to use.

I agree with this assessment. If SipHash is deemed "good enough," then Swift should offer these as standalone facilities and encourage users to call them in `Hashable`. I think the API design proposed here is much too complex, but offering tried-and-true hash functions is certainly reasonable and would be an improvement over xor.

One important point about SipHash and similar secure hash functions is that they work better if all nested components of a hashable value are (recursively) appended to a single hasher, rather than having each component create its own separate hasher instance in its hashValue implementation. Initializing and finalizing the hasher state is not free; it has some nontrivial cost. So ideally only collections would create and finalize hashers; composite types that implement Hashable should not do that.

Therefore, implementing SipHash as a separate, optional facility for use inside individual hashValue implementations (as opposed to being an outright replacement for it) would be somewhat inefficient. It also wouldn't discourage/prevent people from continuing to use xor and similar ad hoc hashing methods; people would need to discover these useful new hash utilities on their own.

Certainly also, the documentation for `Hashable` can be improved. In general, however, it's not very convincing to motivate an entire proposal for a new feature based on a documentation bug. Recognize that there are (or at least have been, in past shipped versions of Swift) code snippets in the documentation that _don't even compile_! If we accepted the logic put forward here, we should give up on Swift entirely; after all, if even the official Swift documentation can't provide code that compiles, what hope do the rest of us have?!

IIUC, the proposal suggests that HashVisitable should replace Hashable as the official hashing protocol. This is certainly a major change; but given the sorry state of not just the example given in the Hashable docs, but an alarmingly large fraction of every hand-written hashValue implementation out there (including almost all hashValues I ever made before I gave up & switched to SipHash), perhaps it'd be a good idea to make a clean break here and force people to switch to an API that's a lot easier to get right.

<rant>
Consider that hashValue is pretty much the only place in "everyday" Swift code where people need to routinely use the overflowing arithmetic operators and bitwise operators from the "Advanced Operators" section in the Swift manual. Typical hashValue implementations look like they were written in a different language than the code that surrounds them. They're full of arcane symbols, magical numbers and mystical incantations that are superstitiously repeated over and over, often with no real understanding of their original meaning or function.

It took a lot of effort to learn C-style for loops (ding!), and they were often tricky to use correctly; but it was certainly possible to learn them well enough to write decent code. In contrast, practically nobody I know is able to write a good hashValue implementation from scratch for even simple structs like CGPoint with any confidence. Not just junior programmers: nobody. One has to be some sort of wizard to do it. The hashValue API is a miserable failure.
</rant>

Automatic generation of hashValue implementations using SipHash would (mostly) resolve the usability issue, but not the efficiency thing. Perhaps it'd be acceptable to simultaneously implement autogeneration and to switch over to (simplified!) visitor-based hashing in the same compiler release—so that the migrator's fixits would simply suggest removing hashValue implementations instead of replacing them with something that looks dramatically different.

If the boilerplate generator would include some sort of syntax to explicitly add/remove specific properties from/to the scope of hashing & equality, then only people implementing hashed collections would ever need to care about the specific hashing APIs, leaving other programmers blissfully and safely unaware of such esoteric concerns.

···

On 2017. Mar 14., at 2:44, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:
On Mon, Mar 13, 2017 at 8:30 PM, Tony Allevato via swift-evolution <swift-evolution@swift.org> wrote:
On Mon, Mar 13, 2017 at 4:32 PM Vincent Esche <regexident.mailinglists@gmail.com> wrote:

In fact they already do in other languages.

On Mon, Mar 13, 2017 at 8:51 PM, Tony Allevato via swift-evolution <swift-evolution@swift.org> wrote:
I'm not convinced this is the right approach: it couples the fact that a type is hashable with a specific strategy for implementing the hash value computation.

Instead, the language should make efforts to avoid requiring the user to think about hashing algorithms *at all*; to answer Sean's question a couple messages up, I've proposed in the past (rough working draft) adding derivation of Equatable and Hashable for structs and enums where it's possible to automatically do so (for example, if all of the enum's associated values are Equatable/Hashable). I've been experimenting with an implementation in the compiler and I have something that works for enums, except for recursive ones. I haven't started structs yet because I think there are more open questions there, and I hope to propose enums and structs independently so that the enum one doesn't get bogged down by the struct one.

The core team seems to be aware of the need for this; the logic that derives Equatable/Hashable for enums without associated values also has TODO comments to handle those with associated values, and to handle structs as well. Slava Pestov mentioned a while back that they encouraged PRs on it, which is why I started, but my free time has been limited lately.

That being said, I *do* think there would be value in having some kind of hash "mixer" as part of the standard library and strongly encouraging through documentation that hashValue implementors use it. Getting the function right is the hard part, and if Swift both (1) took care of it for you by default in many cases and (2) made the harder cases easier by providing a high quality canned implementation, I think we'd have a much cleaner solution than what is being proposed here.

On Mon, Mar 13, 2017 at 12:18 PM Sean Heber via swift-evolution <swift-evolution@swift.org> wrote:
Is there any reason the API couldn’t be expressed as something along these lines?

func hash() -> [Hashable] {
return [x, y]
}

l8r
Sean

> On Mar 13, 2017, at 2:15 PM, David Hart <david@hartbit.com> wrote:
>
>>
>> On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution <swift-evolution@swift.org> wrote:
>>
>> I’m dumb when it comes to proper hashing, but it’s such a tediously common thing in my experience to need to add Hashable to some kind of a struct so I can stash it in a set or use it as a dictionary key. Is there really no way to make this all more automatic? I have to be honest - this proposal seems *harder* to understand than the way it works now.
>
> It's not really harder: just call hash on each of your type's significant values:
>
> x.hash(&hasher)
> y.hash(&hasher)
>
> How would you implement hashValue in a simpler way, remembering that 'x ^ y' is an incorrect implementation?
>
>> Of course the easiest would be if the language could just do this “good enough" for me using reflection or whatever and if I really did run into a problem where I wanted to do this myself, I could override something.
>>
>> Perfect is the enemy of good.
>>
>> l8r
>> Sean
>>
>>
>>> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:
>>>
>>> Hi there,
>>>
>>> I've written up a proposal for an overhaul/replacement of the Hashable protocol and would love to hear your feedback on it!
>>>
>>> Rendered | Blog Post
>>>
>>> Cheers,
>>> Vincent
>>>
>>> Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback on this proposal.
>>>
>>> HashVisitable
>>>
>>> • Proposal: SE-NNNN
>>> • Authors: Vincent Esche
>>> • Review Manager: TBD
>>> • Status: Awaiting review
>>> Introduction
>>>
>>> Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) to improve safety, versatility and learnability.
>>>
>>> Motivation
>>>
>>> Implementing Hashable is difficult and the consequences if not done well have dire performance and safety repercussions.
>>>
>>> The documentation of Hashable lists a sample implementation of var hashValue:
>>>
>>> /// A point in an x-y coordinate system.
>>> struct GridPoint {
>>> var x: Int
>>> var y: Int
>>> }
>>>
>>> extension GridPoint: Hashable {
>>> var hashValue: Int {
>>> return x.hashValue ^ y.hashValue
>>> }
>>>
>>> static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
>>> return lhs.x == rhs.x && lhs.y == rhs.y
>>> }
>>> }
>>>
>>> Calculating the hashes of all GridPoints (given the above implementation) on a 1000 × 1000 grid …
>>>
>>> let (width, height) = (1000, 1000)
>>> let total = width * height
>>> var hashes = Set<Int>()
>>> for x in 0..<width {
>>> for y in 0..<height {
>>> hashes.insert(GridPoint(x: x, y: y).hashValue)
>>> }
>>> }
>>> print("\(hashes.count) unique hashes out of a total of \(total).")
>>>
>>> … results in just 1024 unique hash values for 1_000_000 unique values.
>>>
>>> In other words: The recommended implementation causes 99.9% of values to trigger a hash collision.
>>>
>>> Out of those 1_000_000 values the median collision count was 976 with min and max being 976 and 1000respectively.
>>>
>>> The collision rate will have negative impact in algorithms which heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it increases the vulnerability to DDOS attacks when exposed to the web.
>>>
>>> If even the official Swift documentation gets the implementation of hashValue wrong, then who is to expect the average Swift programmer to do any better?
>>>
>>> In contrast, running the same snippet using HashVisitable and the semi-secure Fnv1aHash (see below) results in zero collisions!
>>>
>>> Finally, the design of the Hashable protocol forces the use of one implementation without the possibility of switching between multiple hashing algorithms.
>>>
>>> Proposed solution
>>>
>>> Instead of coupling the hashing algorithm with each and every Swift type, we should provide a hashing API based on the visitor-pattern. By freeing application developers from the burden of having to implement hashing algorithms, the Standard Library can provide default ones which fulfill collision, performance and security goals. Furthermore, it would allow developers to swap to different algorithms based on the use case.
>>>
>>> Detailed design
>>>
>>> The proposal deprecates the Hashable protocol and introduces the following two:
>>>
>>> protocol Hasher
>>> {
>>>
>>> mutating func finish() -> Int
>>>
>>>
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer)
>>> }
>>>
>>>
>>> protocol HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H)
>>> }
>>>
>>> Hasher is the protocol which represents a hashing algorithm, and HashVisitable replaces Hashable. For types entirely represented by their memory layout, the following protocol would provide a default implementation:
>>>
>>> protocol ContiguouslyHashable: HashVisitable {}
>>>
>>> extension ContiguouslyHashable {
>>> func hash<H: Hasher>(_ hasher: inout H) {
>>> var mutableSelf = self
>>> try! Swift.withUnsafeBytes(of: &mutableSelf) {
>>> hasher.write(bytes: $0)
>>> }
>>> }
>>> }
>>>
>>> extension Bool : ContiguouslyHashable {}
>>> extension UInt8 : ContiguouslyHashable {}
>>> extension UInt16 : ContiguouslyHashable {}
>>> extension UInt32 : ContiguouslyHashable {}
>>> extension UInt64 : ContiguouslyHashable {}
>>> extension UInt : ContiguouslyHashable {}
>>> extension Int8 : ContiguouslyHashable {}
>>> extension Int16 : ContiguouslyHashable {}
>>> extension Int32 : ContiguouslyHashable {}
>>> extension Int64 : ContiguouslyHashable {}
>>> extension Int : ContiguouslyHashable {}
>>>
>>> The Standard-Library would then provide a set of hashing implementations specific to each purpose. A possible choice for hashing algorithms would be the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8.
>>>
>>> FNV-1A is another popular semi-secure but blazingly fast hash algorithm, which – for the sake of demonstration – could be implemented as follows:
>>>
>>> struct Fnv1aHash
>>> {
>>>
>>> fileprivate var state: UInt
>>>
>>>
>>> init(seed: UInt
>>> ) {
>>>
>>> self.state = seed &+ 14695981039346656037
>>>
>>> }
>>> }
>>>
>>>
>>> extension Fnv1aHash: Hasher
>>> {
>>>
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer) {
>>>
>>> for byte in
>>> bytes {
>>>
>>> self.state = (self.state ^ UInt(byte)) &* 1099511628211
>>>
>>> }
>>> }
>>>
>>> mutating func finish() -> Int
>>> {
>>>
>>> return unsafeBitCast(self.state, to: Int.self
>>> )
>>> }
>>> }
>>>
>>> Coming back to the sample code present in the Hashable documentation, the new implementation would look like:
>>>
>>> extension GridPoint: HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>>
>>> self.x.hash(&
>>> hasher)
>>>
>>> self.y.hash(&
>>> hasher)
>>> }
>>> }
>>>
>>> Source compatibility
>>>
>>> Making use of "extending protocols to conform to protocols":
>>>
>>> extension Hashable: HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>>
>>> self.hashValue.hash(&
>>> hasher)
>>> }
>>> }
>>>
>>> Effect on ABI stability
>>>
>>> n/a
>>>
>>> Effect on API resilience
>>>
>>> This feature should be possible to add/remove without breaking ABI.
>>>
>>> Alternatives considered
>>>
>>> n/a
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution@swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>
>> _______________________________________________
>> swift-evolution mailing list
>> swift-evolution@swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

hartbit · March 14, 2017, 6:36am

Adding the list back to this reply because I don't believe you meant to reply only to me, and I think it's an important discussion to have :)

Automatic generation of Hashable implementation code only "solves" the problem of having to implement `var hashValue`.
It however only works for some types. By no means for all.

Certainly, and I never suggested that it would work for every case. I was responding to Sean's point that the compiler should be able to do "good enough" in the common cases and I offered a way that that can be achieved.

Take this hypothetical implementation of a HSB color:

struct Color {
  let hue: UInt8
  let saturation: UInt8
  let brightness: UInt8
}

Let the semantic of this type be this:
Any two colors with a `brightness` of `0` are to be considered equal regardless of their respective `hue` or `saturation`. At night all cats are gray.

An auto-generated conformance impl for `Color` would most likely be wrong afaict.
And those kind of semantics are everywhere.

Of course, and in those cases, you wouldn't want to use an auto-derived Equatable or Hashable implementation. Those kinds of semantics are "everywhere" for certain definitions of "everywhere", but similarly, semantics where the hash value of a thing can be determined simply via combination of the hash values of its components are also "everywhere".

I wouldn't suggest that auto-deriving Hashable solves *all* problems, and similarly, your proposal doesn't help in the situation you described either. Your API provides a different way to mix the hue, saturation, and brightness components in an implementation of hashValue, but it doesn't force the user to mix them *correctly* (by excluding hue/saturation if brightness is zero).

So in both cases, users still have to provide custom implementations of == and hashValue. But without auto-deriving, users have to provide them even for the cases where the auto-derived implementation *would have been correct.* Auto-deriving is about reducing the number of types that need to have custom implementations, thereby reducing the chance that a user will do it wrong.

When considering a type with auto-derived Hashable and Equatable, there are two ways it can be wrong:

1) The auto-generated implementations of both == and hashValue don't honor the semantic contract of the type (for example, don't ignore brightness when it's zero).
2) The user overrides the auto-generated implementation of one of ==/hashValue but not both, and violates the contracts between them.

For #1, yes, the compiler generated an incorrect implementation in that case. However, I would argue it's the developer's responsibility to fix it by overriding it if they need different semantics. As I mentioned above, without derivation, the developer could still implement it incorrectly, just as they could with your API.

For #2, this could be solved by requiring users to override both if they override one of them. Now they're back in the same situation as #1—they either did it right, or they did it wrong.

Auto-generation clearly is no panacea when it comes to hashing. Especially if it leads to invalid and unsafe default implementations.

Auto-deriving cannot produce "unsafe" implementations because it's defined in terms of a function operating over the hash values of its components. It can produce an implementation that does not match the intended semantics of the class that are defined outside of its component values, but that's not the compiler's job to decide; it's up to the user to provide those.

Regarding unsafety, it's worth noting that your ContiguouslyHashable protocol is inherently unsafe and fragile. Imagine that a user implements a struct that they make conform to ContiguouslyHashable because at the time, it's a simple fixed layout type with primitive fields. If they take that type and add a new field to it that isn't contiguously hashable (say, a class reference) and they forget to replace the ContiguouslyHashable conformance, they now have a very broken type, and that behavior isn't caught until *runtime* when things go wrong.

At least with derived conformances, in that situation the *compiler* would see that the type was no longer hashable and emit an error when it's used anywhere that Hashable/hashValue was expected.

So if safety is your motivation, ContiguouslyHashable kind of fights back against that because it gives the user a door they can open—in some cases, accidentally—that produces invalid results.

It would however be nice to be able to annotate types for which you want HashVisitable implementations to be generated.

That's one of the still-open questions from the last discussion thread on the topic; whether auto-deriving should be automatic for any type where it is possible (as it is now for enums without associated values) or whether the user should have to opt-in.

I actually don't see auto-generation as an alternative to a proper hashing API. But rather as an addition.
HashVisitable and auto-deriving would actually work great together!

I completely agree that these ideas complement each other very well! And I also agree that the language would do well to make it easier for users to easily do the right thing. I just feel that *the API proposed here specifically* adds too much complexity for the problem it's trying to solve.

I'd find it more compelling if it was simplified a bit:

* The standard library doesn't need to provide a number of hash implementations; it should just provide one that works "well enough" in most cases
* Hashable doesn't have tie itself to a visitor pattern. It's not necessarily safer because users can still mix the wrong values; in that case, I'd rather the stdlib implementation of the "good enough" hash just be a standalone mixer that the language can encourage users to use.

I agree with this assessment. If SipHash is deemed "good enough," then Swift should offer these as standalone facilities and encourage users to call them in `Hashable`. I think the API design proposed here is much too complex, but offering tried-and-true hash functions is certainly reasonable and would be an improvement over xor.

Could you describe what you see as too complex in the current proposal?

Certainly also, the documentation for `Hashable` can be improved. In general, however, it's not very convincing to motivate an entire proposal for a new feature based on a documentation bug.

The motivation of the entire proposal is not based on a documentation bug. The documentation bug is there simply as an example to prove that it is hard to write a good hashing algorithm and that the kind of naive implementations we go for have bad hashing characteristics.

The motivation could be resumed as such: generating good hashes for types is hard and the current Standard Library design forces the algorithmic complexity on the shoulders of the user. By encapsulating the algorithmic component in a protocol (and having the Standard Library provide an implementation), we fix that problem while also gaining the versatility of being able to swap the algorithm for one suited for each job.

···

On 14 Mar 2017, at 02:44, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

On Mon, Mar 13, 2017 at 8:30 PM, Tony Allevato via swift-evolution <swift-evolution@swift.org> wrote:
On Mon, Mar 13, 2017 at 4:32 PM Vincent Esche <regexident.mailinglists@gmail.com> wrote:

Recognize that there are (or at least have been, in past shipped versions of Swift) code snippets in the documentation that _don't even compile_! If we accepted the logic put forward here, we should give up on Swift entirely; after all, if even the official Swift documentation can't provide code that compiles, what hope do the rest of us have?!

In fact they already do in other languages.

On Mon, Mar 13, 2017 at 8:51 PM, Tony Allevato via swift-evolution <swift-evolution@swift.org> wrote:
I'm not convinced this is the right approach: it couples the fact that a type is hashable with a specific strategy for implementing the hash value computation.

Instead, the language should make efforts to avoid requiring the user to think about hashing algorithms *at all*; to answer Sean's question a couple messages up, I've proposed in the past (rough working draft) adding derivation of Equatable and Hashable for structs and enums where it's possible to automatically do so (for example, if all of the enum's associated values are Equatable/Hashable). I've been experimenting with an implementation in the compiler and I have something that works for enums, except for recursive ones. I haven't started structs yet because I think there are more open questions there, and I hope to propose enums and structs independently so that the enum one doesn't get bogged down by the struct one.

The core team seems to be aware of the need for this; the logic that derives Equatable/Hashable for enums without associated values also has TODO comments to handle those with associated values, and to handle structs as well. Slava Pestov mentioned a while back that they encouraged PRs on it, which is why I started, but my free time has been limited lately.

That being said, I *do* think there would be value in having some kind of hash "mixer" as part of the standard library and strongly encouraging through documentation that hashValue implementors use it. Getting the function right is the hard part, and if Swift both (1) took care of it for you by default in many cases and (2) made the harder cases easier by providing a high quality canned implementation, I think we'd have a much cleaner solution than what is being proposed here.

On Mon, Mar 13, 2017 at 12:18 PM Sean Heber via swift-evolution <swift-evolution@swift.org> wrote:
Is there any reason the API couldn’t be expressed as something along these lines?

func hash() -> [Hashable] {
return [x, y]
}

l8r
Sean

> On Mar 13, 2017, at 2:15 PM, David Hart <david@hartbit.com> wrote:
>
>>
>> On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution <swift-evolution@swift.org> wrote:
>>
>> I’m dumb when it comes to proper hashing, but it’s such a tediously common thing in my experience to need to add Hashable to some kind of a struct so I can stash it in a set or use it as a dictionary key. Is there really no way to make this all more automatic? I have to be honest - this proposal seems *harder* to understand than the way it works now.
>
> It's not really harder: just call hash on each of your type's significant values:
>
> x.hash(&hasher)
> y.hash(&hasher)
>
> How would you implement hashValue in a simpler way, remembering that 'x ^ y' is an incorrect implementation?
>
>> Of course the easiest would be if the language could just do this “good enough" for me using reflection or whatever and if I really did run into a problem where I wanted to do this myself, I could override something.
>>
>> Perfect is the enemy of good.
>>
>> l8r
>> Sean
>>
>>
>>> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:
>>>
>>> Hi there,
>>>
>>> I've written up a proposal for an overhaul/replacement of the Hashable protocol and would love to hear your feedback on it!
>>>
>>> Rendered | Blog Post
>>>
>>> Cheers,
>>> Vincent
>>>
>>> Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback on this proposal.
>>>
>>> HashVisitable
>>>
>>> • Proposal: SE-NNNN
>>> • Authors: Vincent Esche
>>> • Review Manager: TBD
>>> • Status: Awaiting review
>>> Introduction
>>>
>>> Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) to improve safety, versatility and learnability.
>>>
>>> Motivation
>>>
>>> Implementing Hashable is difficult and the consequences if not done well have dire performance and safety repercussions.
>>>
>>> The documentation of Hashable lists a sample implementation of var hashValue:
>>>
>>> /// A point in an x-y coordinate system.
>>> struct GridPoint {
>>> var x: Int
>>> var y: Int
>>> }
>>>
>>> extension GridPoint: Hashable {
>>> var hashValue: Int {
>>> return x.hashValue ^ y.hashValue
>>> }
>>>
>>> static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
>>> return lhs.x == rhs.x && lhs.y == rhs.y
>>> }
>>> }
>>>
>>> Calculating the hashes of all GridPoints (given the above implementation) on a 1000 × 1000 grid …
>>>
>>> let (width, height) = (1000, 1000)
>>> let total = width * height
>>> var hashes = Set<Int>()
>>> for x in 0..<width {
>>> for y in 0..<height {
>>> hashes.insert(GridPoint(x: x, y: y).hashValue)
>>> }
>>> }
>>> print("\(hashes.count) unique hashes out of a total of \(total).")
>>>
>>> … results in just 1024 unique hash values for 1_000_000 unique values.
>>>
>>> In other words: The recommended implementation causes 99.9% of values to trigger a hash collision.
>>>
>>> Out of those 1_000_000 values the median collision count was 976 with min and max being 976 and 1000respectively.
>>>
>>> The collision rate will have negative impact in algorithms which heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it increases the vulnerability to DDOS attacks when exposed to the web.
>>>
>>> If even the official Swift documentation gets the implementation of hashValue wrong, then who is to expect the average Swift programmer to do any better?
>>>
>>> In contrast, running the same snippet using HashVisitable and the semi-secure Fnv1aHash (see below) results in zero collisions!
>>>
>>> Finally, the design of the Hashable protocol forces the use of one implementation without the possibility of switching between multiple hashing algorithms.
>>>
>>> Proposed solution
>>>
>>> Instead of coupling the hashing algorithm with each and every Swift type, we should provide a hashing API based on the visitor-pattern. By freeing application developers from the burden of having to implement hashing algorithms, the Standard Library can provide default ones which fulfill collision, performance and security goals. Furthermore, it would allow developers to swap to different algorithms based on the use case.
>>>
>>> Detailed design
>>>
>>> The proposal deprecates the Hashable protocol and introduces the following two:
>>>
>>> protocol Hasher
>>> {
>>>
>>> mutating func finish() -> Int
>>>
>>>
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer)
>>> }
>>>
>>>
>>> protocol HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H)
>>> }
>>>
>>> Hasher is the protocol which represents a hashing algorithm, and HashVisitable replaces Hashable. For types entirely represented by their memory layout, the following protocol would provide a default implementation:
>>>
>>> protocol ContiguouslyHashable: HashVisitable {}
>>>
>>> extension ContiguouslyHashable {
>>> func hash<H: Hasher>(_ hasher: inout H) {
>>> var mutableSelf = self
>>> try! Swift.withUnsafeBytes(of: &mutableSelf) {
>>> hasher.write(bytes: $0)
>>> }
>>> }
>>> }
>>>
>>> extension Bool : ContiguouslyHashable {}
>>> extension UInt8 : ContiguouslyHashable {}
>>> extension UInt16 : ContiguouslyHashable {}
>>> extension UInt32 : ContiguouslyHashable {}
>>> extension UInt64 : ContiguouslyHashable {}
>>> extension UInt : ContiguouslyHashable {}
>>> extension Int8 : ContiguouslyHashable {}
>>> extension Int16 : ContiguouslyHashable {}
>>> extension Int32 : ContiguouslyHashable {}
>>> extension Int64 : ContiguouslyHashable {}
>>> extension Int : ContiguouslyHashable {}
>>>
>>> The Standard-Library would then provide a set of hashing implementations specific to each purpose. A possible choice for hashing algorithms would be the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8.
>>>
>>> FNV-1A is another popular semi-secure but blazingly fast hash algorithm, which – for the sake of demonstration – could be implemented as follows:
>>>
>>> struct Fnv1aHash
>>> {
>>>
>>> fileprivate var state: UInt
>>>
>>>
>>> init(seed: UInt
>>> ) {
>>>
>>> self.state = seed &+ 14695981039346656037
>>>
>>> }
>>> }
>>>
>>>
>>> extension Fnv1aHash: Hasher
>>> {
>>>
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer) {
>>>
>>> for byte in
>>> bytes {
>>>
>>> self.state = (self.state ^ UInt(byte)) &* 1099511628211
>>>
>>> }
>>> }
>>>
>>> mutating func finish() -> Int
>>> {
>>>
>>> return unsafeBitCast(self.state, to: Int.self
>>> )
>>> }
>>> }
>>>
>>> Coming back to the sample code present in the Hashable documentation, the new implementation would look like:
>>>
>>> extension GridPoint: HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>>
>>> self.x.hash(&
>>> hasher)
>>>
>>> self.y.hash(&
>>> hasher)
>>> }
>>> }
>>>
>>> Source compatibility
>>>
>>> Making use of "extending protocols to conform to protocols":
>>>
>>> extension Hashable: HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>>
>>> self.hashValue.hash(&
>>> hasher)
>>> }
>>> }
>>>
>>> Effect on ABI stability
>>>
>>> n/a
>>>
>>> Effect on API resilience
>>>
>>> This feature should be possible to add/remove without breaking ABI.
>>>
>>> Alternatives considered
>>>
>>> n/a
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution@swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>
>> _______________________________________________
>> swift-evolution mailing list
>> swift-evolution@swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Vincent_Esche · March 14, 2017, 8:34am

The documentation bug is not the sole motivation for the proposal.
I did in fact only find it while writing the accompanying blog post
<Medium,
in which I present a couple of additional motivations besides hash
collisions.

Namely:
- decoupling
- independence from a one-size-fits-all hasher impl
- multiple hashers per instance (for implementing Bloom Filters, etc.)
- different hashers for different use-cases (fast&weak for safe
bottlenecks, slow&secure for web-exposed APIs)
- …

I’m actually much more interested in the independence gained, than pure
performance.

···

On Tue, Mar 14, 2017 at 2:44 AM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Mon, Mar 13, 2017 at 8:30 PM, Tony Allevato via swift-evolution < > swift-evolution@swift.org> wrote:

Adding the list back to this reply because I don't believe you meant to
reply only to me, and I think it's an important discussion to have :)

On Mon, Mar 13, 2017 at 4:32 PM Vincent Esche < >> regexident.mailinglists@gmail.com> wrote:

Automatic generation of Hashable implementation code only "solves" the
problem of having to implement `var hashValue`.
It however only works for some types. By no means for all.

Certainly, and I never suggested that it would work for every case. I was
responding to Sean's point that the compiler should be able to do "good
enough" in the common cases and I offered a way that that can be achieved.

Take this hypothetical implementation of a HSB color:

struct Color {
let hue: UInt8
let saturation: UInt8
let brightness: UInt8
}

Let the semantic of this type be this:
Any two colors with a `brightness` of `0` are to be considered equal
regardless of their respective `hue` or `saturation`. At night all cats are
gray.

An auto-generated conformance impl for `Color` would most likely be wrong
afaict.
And those kind of semantics are everywhere.

Of course, and in those cases, you wouldn't want to use an auto-derived
Equatable or Hashable implementation. Those kinds of semantics are
"everywhere" for certain definitions of "everywhere", but similarly,
semantics where the hash value of a thing can be determined simply via
combination of the hash values of its components are also "everywhere".

I wouldn't suggest that auto-deriving Hashable solves *all* problems, and
similarly, your proposal doesn't help in the situation you described
either. Your API provides a different way to mix the hue, saturation, and
brightness components in an implementation of hashValue, but it doesn't
force the user to mix them *correctly* (by excluding hue/saturation if
brightness is zero).

So in both cases, users still have to provide custom implementations of
== and hashValue. But without auto-deriving, users have to provide them
even for the cases where the auto-derived implementation *would have been
correct.* Auto-deriving is about reducing the number of types that need to
have custom implementations, thereby reducing the chance that a user will
do it wrong.

When considering a type with auto-derived Hashable and Equatable, there
are two ways it can be wrong:

1) The auto-generated implementations of both == and hashValue don't
honor the semantic contract of the type (for example, don't ignore
brightness when it's zero).
2) The user overrides the auto-generated implementation of one of
==/hashValue but not both, and violates the contracts between them.

For #1, yes, the compiler generated an incorrect implementation in that
case. However, I would argue it's the developer's responsibility to fix it
by overriding it if they need different semantics. As I mentioned above,
without derivation, the developer could still implement it incorrectly,
just as they could with your API.

For #2, this could be solved by requiring users to override both if they
override one of them. Now they're back in the same situation as #1—they
either did it right, or they did it wrong.

Auto-generation clearly is no panacea when it comes to hashing.
Especially if it leads to invalid and unsafe default implementations.

Auto-deriving cannot produce "unsafe" implementations because it's
defined in terms of a function operating over the hash values of its
components. It can produce an implementation that does not match the
intended semantics of the class that are defined outside of its component
values, but that's not the compiler's job to decide; it's up to the user to
provide those.

Regarding unsafety, it's worth noting that your ContiguouslyHashable
protocol is inherently unsafe and fragile. Imagine that a user implements a
struct that they make conform to ContiguouslyHashable because at the time,
it's a simple fixed layout type with primitive fields. If they take that
type and add a new field to it that isn't contiguously hashable (say, a
class reference) and they forget to replace the ContiguouslyHashable
conformance, they now have a very broken type, and that behavior isn't
caught until *runtime* when things go wrong.

At least with derived conformances, in that situation the *compiler*
would see that the type was no longer hashable and emit an error when it's
used anywhere that Hashable/hashValue was expected.

So if safety is your motivation, ContiguouslyHashable kind of fights back
against that because it gives the user a door they can open—in some cases,
accidentally—that produces invalid results.

It would however be nice to be able to annotate types for which you want
HashVisitable implementations to be generated.

That's one of the still-open questions from the last discussion thread on
the topic; whether auto-deriving should be automatic for any type where it
is possible (as it is now for enums without associated values) or whether
the user should have to opt-in.

I actually don't see auto-generation as an alternative to a proper
hashing API. But rather as an addition.
HashVisitable and auto-deriving would actually work great together!

I completely agree that these ideas complement each other very well! And
I also agree that the language would do well to make it easier for users to
easily do the right thing. I just feel that *the API proposed here
specifically* adds too much complexity for the problem it's trying to solve.

I'd find it more compelling if it was simplified a bit:

* The standard library doesn't need to provide a number of hash
implementations; it should just provide one that works "well enough" in
most cases
* Hashable doesn't have tie itself to a visitor pattern. It's not
necessarily safer because users can still mix the wrong values; in that
case, I'd rather the stdlib implementation of the "good enough" hash just
be a standalone mixer that the language can encourage users to use.

I agree with this assessment. If SipHash is deemed "good enough," then
Swift should offer these as standalone facilities and encourage users to
call them in `Hashable`. I think the API design proposed here is much too
complex, but offering tried-and-true hash functions is certainly reasonable
and would be an improvement over xor.

Certainly also, the documentation for `Hashable` can be improved. In
general, however, it's not very convincing to motivate an entire proposal
for a new feature based on a documentation bug. Recognize that there are
(or at least have been, in past shipped versions of Swift) code snippets in
the documentation that _don't even compile_! If we accepted the logic put
forward here, we should give up on Swift entirely; after all, if even the
official Swift documentation can't provide code that compiles, what hope do
the rest of us have?!

In fact they already do in other languages <Rust Playground.

On Mon, Mar 13, 2017 at 8:51 PM, Tony Allevato via swift-evolution < >> swift-evolution@swift.org> wrote:

I'm not convinced this is the right approach: it couples the fact that a
type is hashable with a specific strategy for implementing the hash value
computation.

Instead, the language should make efforts to avoid requiring the user to
think about hashing algorithms *at all*; to answer Sean's question a couple
messages up, I've proposed in the past
<https://gist.github.com/allevato/2fd10290bfa84accfbe977d8ac07daad> (rough
working draft) adding derivation of Equatable and Hashable for structs and
enums where it's possible to automatically do so (for example, if all of
the enum's associated values are Equatable/Hashable). I've been
experimenting with an implementation in the compiler and I have something
that works for enums, except for recursive ones. I haven't started structs
yet because I think there are more open questions there, and I hope to
propose enums and structs independently so that the enum one doesn't get
bogged down by the struct one.

The core team seems to be aware of the need for this; the logic that
derives Equatable/Hashable for enums without associated values also has
TODO comments to handle those with associated values, and to handle structs
as well. Slava Pestov mentioned a while back that they encouraged PRs on
it, which is why I started, but my free time has been limited lately.

That being said, I *do* think there would be value in having some kind of
hash "mixer" as part of the standard library and strongly encouraging
through documentation that hashValue implementors use it. Getting the
function right is the hard part, and if Swift both (1) took care of it for
you by default in many cases and (2) made the harder cases easier by
providing a high quality canned implementation, I think we'd have a much
cleaner solution than what is being proposed here.

On Mon, Mar 13, 2017 at 12:18 PM Sean Heber via swift-evolution < >> swift-evolution@swift.org> wrote:

Is there any reason the API couldn’t be expressed as something along
these lines?

func hash() -> [Hashable] {
return [x, y]
}

l8r
Sean

> On Mar 13, 2017, at 2:15 PM, David Hart <david@hartbit.com> wrote:
>
>>
>> On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution < >> swift-evolution@swift.org> wrote:
>>
>> I’m dumb when it comes to proper hashing, but it’s such a tediously
common thing in my experience to need to add Hashable to some kind of a
struct so I can stash it in a set or use it as a dictionary key. Is there
really no way to make this all more automatic? I have to be honest - this
proposal seems *harder* to understand than the way it works now.
>
> It's not really harder: just call hash on each of your type's
significant values:
>
> x.hash(&hasher)
> y.hash(&hasher)
>
> How would you implement hashValue in a simpler way, remembering that 'x
^ y' is an incorrect implementation?
>
>> Of course the easiest would be if the language could just do this
“good enough" for me using reflection or whatever and if I really did run
into a problem where I wanted to do this myself, I could override something.
>>
>> Perfect is the enemy of good.
>>
>> l8r
>> Sean
>>
>>
>>> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution < >> swift-evolution@swift.org> wrote:
>>>
>>> Hi there,
>>>
>>> I've written up a proposal for an overhaul/replacement of the
Hashable protocol and would love to hear your feedback on it!
>>>
>>> Rendered | Blog Post
>>>
>>> Cheers,
>>> Vincent
>>>
>>> Ps: I'd like to thank David Hart (@hartbit) for his great editorial
feedback on this proposal.
>>>
>>> HashVisitable
>>>
>>> • Proposal: SE-NNNN
>>> • Authors: Vincent Esche
>>> • Review Manager: TBD
>>> • Status: Awaiting review
>>> Introduction
>>>
>>> Replace the Hashable protocol by two new procotols (Hasher and
HashVisitable) to improve safety, versatility and learnability.
>>>
>>> Motivation
>>>
>>> Implementing Hashable is difficult and the consequences if not done
well have dire performance and safety repercussions.
>>>
>>> The documentation of Hashable lists a sample implementation of var
hashValue:
>>>
>>> /// A point in an x-y coordinate system.
>>> struct GridPoint {
>>> var x: Int
>>> var y: Int
>>> }
>>>
>>> extension GridPoint: Hashable {
>>> var hashValue: Int {
>>> return x.hashValue ^ y.hashValue
>>> }
>>>
>>> static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
>>> return lhs.x == rhs.x && lhs.y == rhs.y
>>> }
>>> }
>>>
>>> Calculating the hashes of all GridPoints (given the above
implementation) on a 1000 × 1000 grid …
>>>
>>> let (width, height) = (1000, 1000)
>>> let total = width * height
>>> var hashes = Set<Int>()
>>> for x in 0..<width {
>>> for y in 0..<height {
>>> hashes.insert(GridPoint(x: x, y: y).hashValue)
>>> }
>>> }
>>> print("\(hashes.count) unique hashes out of a total of \(total).")
>>>
>>> … results in just 1024 unique hash values for 1_000_000 unique values.
>>>
>>> In other words: The recommended implementation causes 99.9% of values
to trigger a hash collision.
>>>
>>> Out of those 1_000_000 values the median collision count was 976 with
min and max being 976 and 1000respectively.
>>>
>>> The collision rate will have negative impact in algorithms which
heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it
increases the vulnerability to DDOS attacks when exposed to the web.
>>>
>>> If even the official Swift documentation gets the implementation of
hashValue wrong, then who is to expect the average Swift programmer to do
any better?
>>>
>>> In contrast, running the same snippet using HashVisitable and the
semi-secure Fnv1aHash (see below) results in zero collisions!
>>>
>>> Finally, the design of the Hashable protocol forces the use of one
implementation without the possibility of switching between multiple
hashing algorithms.
>>>
>>> Proposed solution
>>>
>>> Instead of coupling the hashing algorithm with each and every Swift
type, we should provide a hashing API based on the visitor-pattern. By
freeing application developers from the burden of having to implement
hashing algorithms, the Standard Library can provide default ones which
fulfill collision, performance and security goals. Furthermore, it would
allow developers to swap to different algorithms based on the use case.
>>>
>>> Detailed design
>>>
>>> The proposal deprecates the Hashable protocol and introduces the
following two:
>>>
>>> protocol Hasher
>>> {
>>>
>>> mutating func finish() -> Int
>>>
>>>
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer)
>>> }
>>>
>>>
>>> protocol HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H)
>>> }
>>>
>>> Hasher is the protocol which represents a hashing algorithm, and
HashVisitable replaces Hashable. For types entirely represented by their
memory layout, the following protocol would provide a default
implementation:
>>>
>>> protocol ContiguouslyHashable: HashVisitable {}
>>>
>>> extension ContiguouslyHashable {
>>> func hash<H: Hasher>(_ hasher: inout H) {
>>> var mutableSelf = self
>>> try! Swift.withUnsafeBytes(of: &mutableSelf) {
>>> hasher.write(bytes: $0)
>>> }
>>> }
>>> }
>>>
>>> extension Bool : ContiguouslyHashable {}
>>> extension UInt8 : ContiguouslyHashable {}
>>> extension UInt16 : ContiguouslyHashable {}
>>> extension UInt32 : ContiguouslyHashable {}
>>> extension UInt64 : ContiguouslyHashable {}
>>> extension UInt : ContiguouslyHashable {}
>>> extension Int8 : ContiguouslyHashable {}
>>> extension Int16 : ContiguouslyHashable {}
>>> extension Int32 : ContiguouslyHashable {}
>>> extension Int64 : ContiguouslyHashable {}
>>> extension Int : ContiguouslyHashable {}
>>>
>>> The Standard-Library would then provide a set of hashing
implementations specific to each purpose. A possible choice for hashing
algorithms would be the reasonably fast SipHash-2-4, and the reasonably
secure SipHash-4-8.
>>>
>>> FNV-1A is another popular semi-secure but blazingly fast hash
algorithm, which – for the sake of demonstration – could be implemented as
follows:
>>>
>>> struct Fnv1aHash
>>> {
>>>
>>> fileprivate var state: UInt
>>>
>>>
>>> init(seed: UInt
>>> ) {
>>>
>>> self.state = seed &+ 14695981039346656037
>>>
>>> }
>>> }
>>>
>>>
>>> extension Fnv1aHash: Hasher
>>> {
>>>
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer) {
>>>
>>> for byte in
>>> bytes {
>>>
>>> self.state = (self.state ^ UInt(byte)) &* 1099511628211
>>>
>>> }
>>> }
>>>
>>> mutating func finish() -> Int
>>> {
>>>
>>> return unsafeBitCast(self.state, to: Int.self
>>> )
>>> }
>>> }
>>>
>>> Coming back to the sample code present in the Hashable documentation,
the new implementation would look like:
>>>
>>> extension GridPoint: HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>>
>>> self.x.hash(&
>>> hasher)
>>>
>>> self.y.hash(&
>>> hasher)
>>> }
>>> }
>>>
>>> Source compatibility
>>>
>>> Making use of "extending protocols to conform to protocols":
>>>
>>> extension Hashable: HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>>
>>> self.hashValue.hash(&
>>> hasher)
>>> }
>>> }
>>>
>>> Effect on ABI stability
>>>
>>> n/a
>>>
>>> Effect on API resilience
>>>
>>> This feature should be possible to add/remove without breaking ABI.
>>>
>>> Alternatives considered
>>>
>>> n/a
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution@swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>
>> _______________________________________________
>> swift-evolution mailing list
>> swift-evolution@swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Joe_Groff · March 14, 2017, 5:01pm

It would be reasonable, IMO, even before we support the full generalization of extensions on non-nominal types, to make tuples of Equatable/Hashable/Comparable elements conform to the same protocols by fiat.

-Joe

···

On Mar 14, 2017, at 10:00 AM, Vladimir.S <svabox@gmail.com> wrote:

On 14.03.2017 18:34, Joe Groff via swift-evolution wrote:

On Mar 13, 2017, at 10:54 AM, Sean Heber via swift-evolution >>> <swift-evolution@swift.org> wrote:

I’m dumb when it comes to proper hashing, but it’s such a tediously
common thing in my experience to need to add Hashable to some kind of
a struct so I can stash it in a set or use it as a dictionary key. Is
there really no way to make this all more automatic? I have to be
honest - this proposal seems *harder* to understand than the way it
works now. Of course the easiest would be if the language could just
do this “good enough" for me using reflection or whatever and if I
really did run into a problem where I wanted to do this myself, I
could override something.

Perfect is the enemy of good.

The compiler totally ought to derive reasonable default Equatable and
Hashable implementations for you. That would allow the standard library
to do the right thing without burdening most users with the need to
sweat the details.

I believe many of us dream of an feature of auto-generated Equatable and Hashable implementations.

Btw, what is your opinion, can we expect (at some point of time) Equatable&Hashable for tuples?
It is very handy to have Dictionary<Tuple<type1,type2>, type3> in C# and sad that we can't have [(type1,type2):type3] in Swift without declaration of new temporary struct type and implement Equatable&Hashable for it.

Joe_Groff · March 14, 2017, 6:30pm

For unspecialized code that takes a generic T: Hashable, that will place the only dynamic dispatch point on `hash`, so that will place an abstraction barrier between the Hasher and Self type being hashed, so would likely mean a dynamic call for every component of the value being hashed. Having `hashValue` be a dynamic dispatch point allows the hasher to be inlined together with the type's visitor implementation.

-Joe

···

On Mar 14, 2017, at 11:27 AM, David Hart <david@hartbit.com> wrote:

On 14 Mar 2017, at 16:41, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:

On Mar 13, 2017, at 8:38 AM, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:

Source compatibility

Making use of "extending protocols to conform to protocols":

extension Hashable: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.hashValue.hash(&
hasher)
  }
}

We're unlikely to add this feature soon. It seems reasonable to me to instead have `HashVisitable` refine `Hashable` and provide a default implementation of `hashValue` using a default hasher. I think we still want `Hashable` to be the currency protocol most APIs work with for performance in unspecialized code, since we could inline the visitation and hasher implementation together inside the specialized `hashValue` witness.

Can you explain the performance argument? How does it fare (in your opinion) compared to the arguments in the proposal?

How about:

protocol Hashable {
   func hash<H: Hasher>(with hasher: inout H)
}

extension Hashable {
   var hashValue: Int {
       var hasher = StdLibDefaultHasher()
       hash(with: hasher)
       return hash.finish()
   }
}

dwaite · March 14, 2017, 11:45pm

It would be possible for a Hasher to return varying types, but then people might want the interface to support cryptographic hashing as well as variable length hashes (for things like HAMT)

-DW

···

On Mar 14, 2017, at 4:56 PM, Greg Parker via swift-evolution <swift-evolution@swift.org> wrote:

On Mar 14, 2017, at 12:01 PM, David Sweeris via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

Are we committed to having `hashValue` always be an `Int`, or could it be an associated type like `(Int, Int, Int, Int)`? Seems like especially for something like a BigNum type or an Array, there might simple not be a reasonably efficient way to uniquely-ish represent 1024 bits with just 64 bits.

(This kinda feels like one of those questions where the answer starts out with a variation on “you’re missing the point”)

This would lead to an implementation nightmare for hashing containers. Consider what the implementation of Dictionary<Any, String> would look like if any of its keys could have incompatible hash outputs.

--
Greg Parker gparker@apple.com <mailto:gparker@apple.com> Runtime Wrangler

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Karl · March 14, 2017, 8:08pm

Couldn’t we solve that by adding hashValue with the protocol requirements (and with a default implementation)? IIRC, the standard library already does this for things like map. They are defined as requirements even though they are completely optional in practice.

protocol Hashable {
func hash<H: Hasher>(with hasher: inout H)
var hashValue: Int { get }
}

extension Hashable {
  var hashValue: Int {
      var hasher = StdLibDefaultHasher()
      hash(with: hasher)
      return hash.finish()
  }
}
- Karl

···

On 14 Mar 2017, at 19:30, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:

On Mar 14, 2017, at 11:27 AM, David Hart <david@hartbit.com> wrote:

On 14 Mar 2017, at 16:41, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:

On Mar 13, 2017, at 8:38 AM, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:

Source compatibility

Making use of "extending protocols to conform to protocols":

extension Hashable: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.hashValue.hash(&
hasher)
}
}

We're unlikely to add this feature soon. It seems reasonable to me to instead have `HashVisitable` refine `Hashable` and provide a default implementation of `hashValue` using a default hasher. I think we still want `Hashable` to be the currency protocol most APIs work with for performance in unspecialized code, since we could inline the visitation and hasher implementation together inside the specialized `hashValue` witness.

Can you explain the performance argument? How does it fare (in your opinion) compared to the arguments in the proposal?

How about:

protocol Hashable {
  func hash<H: Hasher>(with hasher: inout H)
}

extension Hashable {
  var hashValue: Int {
      var hasher = StdLibDefaultHasher()
      hash(with: hasher)
      return hash.finish()
  }
}

For unspecialized code that takes a generic T: Hashable, that will place the only dynamic dispatch point on `hash`, so that will place an abstraction barrier between the Hasher and Self type being hashed, so would likely mean a dynamic call for every component of the value being hashed. Having `hashValue` be a dynamic dispatch point allows the hasher to be inlined together with the type's visitor implementation.

-Joe
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Vincent_Esche · March 14, 2017, 9:36pm

Pardon my ignorance here, but shouldn’t `foo<T: Protocol>(t: T)` always
monomorphize due to being generic? Isn’t that how `T: …` differs from,
say `foo(t: Protocol)`?

···

On Tue, Mar 14, 2017 at 7:30 PM, Joe Groff <jgroff@apple.com> wrote:

> On Mar 14, 2017, at 11:27 AM, David Hart <david@hartbit.com> wrote:
>
>
>
>> On 14 Mar 2017, at 16:41, Joe Groff via swift-evolution < > swift-evolution@swift.org> wrote:
>>
>>
>>> On Mar 13, 2017, at 8:38 AM, Vincent Esche via swift-evolution < > swift-evolution@swift.org> wrote:
>>>
>>> Source compatibility
>>>
>>> Making use of "extending protocols to conform to protocols":
>>>
>>> extension Hashable: HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>>
>>> self.hashValue.hash(&
>>> hasher)
>>> }
>>> }
>>
>> We're unlikely to add this feature soon. It seems reasonable to me to
instead have `HashVisitable` refine `Hashable` and provide a default
implementation of `hashValue` using a default hasher. I think we still want
`Hashable` to be the currency protocol most APIs work with for performance
in unspecialized code, since we could inline the visitation and hasher
implementation together inside the specialized `hashValue` witness.
>
> Can you explain the performance argument? How does it fare (in your
opinion) compared to the arguments in the proposal?
>
> How about:
>
> protocol Hashable {
> func hash<H: Hasher>(with hasher: inout H)
> }
>
> extension Hashable {
> var hashValue: Int {
> var hasher = StdLibDefaultHasher()
> hash(with: hasher)
> return hash.finish()
> }
> }

For unspecialized code that takes a generic T: Hashable, that will place
the only dynamic dispatch point on `hash`, so that will place an
abstraction barrier between the Hasher and Self type being hashed, so would
likely mean a dynamic call for every component of the value being hashed.
Having `hashValue` be a dynamic dispatch point allows the hasher to be
inlined together with the type's visitor implementation.

-Joe

allevato · March 13, 2017, 10:29pm

I'm not convinced this is the right approach: it couples the fact that a
type is hashable with a specific strategy for implementing the hash value
computation.

I’m confused. This proposal specifically *decouples* the implementation
of hashable from hashing strategies: that’s its main goal. Could you
explain how you see coupling here?

Hashable as it is currently defined says "this type provides a hashValue
property that allows it to be used in a hashing container". The proposed
API says "this type is hashable using a specific visitor API and hasher
interface". The "visitor API and hasher interface" is what I mean by
"strategy" in my original message, compared to the current Hashable which
simply says "it's hashable and here's its hash—a number".

I don't buy the argument that we need to support the same type to support
being hashed with arbitrary hashing algorithms at runtime. Hash values in
languages like Swift are intended to be used specifically for hashing
collections. What value for that use case is provided by allowing a
collection to plug in various algorithms for the same type? I don't think
the added flexibility pays for the increased complexity.

A simpler solution to increase safety/force people to consider the
resiliency of their hash function would just be to have Hashable.hashValue
return a concrete "HashValue" type that provides the mixing interface, with
a concrete implementation in the standard library

[self-follow up] ...to clarify what I meant here, since I was trying to
juggle too many things at once: hashValue could *return* a HashValue
protocol that has a default concrete implementation in the standard
library, but which users could write conforming types for as well.

I'll admit that does go counter to my original point about coupling hashing
to a specific API, but I was trying to offer a better balance between
safety and flexibility without the added complexity of requiring the
eventual consumer of the hash code to always choose a hash algorithm.

That being said, even though it's somewhat less safe/still opens the door
to user error, I think a simple hashValue: Int interface + a canned hasher
in the standard library is more than sufficient and we don't need to
overcomplicate the general case.

···

On Mon, Mar 13, 2017 at 3:14 PM Tony Allevato <tony.allevato@gmail.com> wrote:

On Mon, Mar 13, 2017 at 1:43 PM David Hart <david@hartbit.com> wrote:
On 13 Mar 2017, at 20:51, Tony Allevato <tony.allevato@gmail.com> wrote:

—but I think a simple interface that allows mixing (1) integer constants
and (2) hash values of other Hashables would be more sufficient and easier
to understand. If users need something more complex than that, then it
stands to reason that they *should* have to do a little more work.

Instead, the language should make efforts to avoid requiring the user to
think about hashing algorithms *at all*; to answer Sean's question a couple
messages up, I've proposed in the past
<https://gist.github.com/allevato/2fd10290bfa84accfbe977d8ac07daad> (rough
working draft) adding derivation of Equatable and Hashable for structs and
enums where it's possible to automatically do so (for example, if all of
the enum's associated values are Equatable/Hashable).

This proposal does so with the *ContinuouslyHashable* protocol, which
provides a default implementation of hashable that works for enums and
structs that have “inclusive” memory layouts. I think this is good thing,
because it means you have to make a conscious choice to use
*ContinuouslyHashable* instead of relying on an automatic implementation
which will sometimes be wrong.

From what I can tell, ContiguouslyHashable only works with fixed layout
types, whereas a derived implementation would work for anything that is
composed entirely of other Hashable types by mixing the hash values of
those components in a known way.

In such a situation, why do you expect that such an automatically
generated implementation would be "wrong"?

I've been experimenting with an implementation in the compiler and I have
something that works for enums, except for recursive ones. I haven't
started structs yet because I think there are more open questions there,
and I hope to propose enums and structs independently so that the enum one
doesn't get bogged down by the struct one.

The core team seems to be aware of the need for this; the logic that
derives Equatable/Hashable for enums without associated values also has
TODO comments to handle those with associated values, and to handle structs
as well. Slava Pestov mentioned a while back that they encouraged PRs on
it, which is why I started, but my free time has been limited lately.

That being said, I *do* think there would be value in having some kind of
hash "mixer" as part of the standard library and strongly encouraging
through documentation that hashValue implementors use it. Getting the
function right is the hard part, and if Swift both (1) took care of it for
you by default in many cases and (2) made the harder cases easier by
providing a high quality canned implementation, I think we'd have a much
cleaner solution than what is being proposed here.

On Mon, Mar 13, 2017 at 12:18 PM Sean Heber via swift-evolution < > swift-evolution@swift.org> wrote:

Is there any reason the API couldn’t be expressed as something along these
lines?

func hash() -> [Hashable] {
return [x, y]
}

l8r
Sean

> On Mar 13, 2017, at 2:15 PM, David Hart <david@hartbit.com> wrote:
>
>>
>> On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution < > swift-evolution@swift.org> wrote:
>>
>> I’m dumb when it comes to proper hashing, but it’s such a tediously
common thing in my experience to need to add Hashable to some kind of a
struct so I can stash it in a set or use it as a dictionary key. Is there
really no way to make this all more automatic? I have to be honest - this
proposal seems *harder* to understand than the way it works now.
>
> It's not really harder: just call hash on each of your type's
significant values:
>
> x.hash(&hasher)
> y.hash(&hasher)
>
> How would you implement hashValue in a simpler way, remembering that 'x
^ y' is an incorrect implementation?
>
>> Of course the easiest would be if the language could just do this “good
enough" for me using reflection or whatever and if I really did run into a
problem where I wanted to do this myself, I could override something.
>>
>> Perfect is the enemy of good.
>>
>> l8r
>> Sean
>>
>>
>>> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution < > swift-evolution@swift.org> wrote:
>>>
>>> Hi there,
>>>
>>> I've written up a proposal for an overhaul/replacement of the Hashable
protocol and would love to hear your feedback on it!
>>>
>>> Rendered | Blog Post
>>>
>>> Cheers,
>>> Vincent
>>>
>>> Ps: I'd like to thank David Hart (@hartbit) for his great editorial
feedback on this proposal.
>>>
>>> HashVisitable
>>>
>>> • Proposal: SE-NNNN
>>> • Authors: Vincent Esche
>>> • Review Manager: TBD
>>> • Status: Awaiting review
>>> Introduction
>>>
>>> Replace the Hashable protocol by two new procotols (Hasher and
HashVisitable) to improve safety, versatility and learnability.
>>>
>>> Motivation
>>>
>>> Implementing Hashable is difficult and the consequences if not done
well have dire performance and safety repercussions.
>>>
>>> The documentation of Hashable lists a sample implementation of var
hashValue:
>>>
>>> /// A point in an x-y coordinate system.
>>> struct GridPoint {
>>> var x: Int
>>> var y: Int
>>> }
>>>
>>> extension GridPoint: Hashable {
>>> var hashValue: Int {
>>> return x.hashValue ^ y.hashValue
>>> }
>>>
>>> static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
>>> return lhs.x == rhs.x && lhs.y == rhs.y
>>> }
>>> }
>>>
>>> Calculating the hashes of all GridPoints (given the above
implementation) on a 1000 × 1000 grid …
>>>
>>> let (width, height) = (1000, 1000)
>>> let total = width * height
>>> var hashes = Set<Int>()
>>> for x in 0..<width {
>>> for y in 0..<height {
>>> hashes.insert(GridPoint(x: x, y: y).hashValue)
>>> }
>>> }
>>> print("\(hashes.count) unique hashes out of a total of \(total).")
>>>
>>> … results in just 1024 unique hash values for 1_000_000 unique values.
>>>
>>> In other words: The recommended implementation causes 99.9% of values
to trigger a hash collision.
>>>
>>> Out of those 1_000_000 values the median collision count was 976 with
min and max being 976 and 1000respectively.
>>>
>>> The collision rate will have negative impact in algorithms which
heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it
increases the vulnerability to DDOS attacks when exposed to the web.
>>>
>>> If even the official Swift documentation gets the implementation of
hashValue wrong, then who is to expect the average Swift programmer to do
any better?
>>>
>>> In contrast, running the same snippet using HashVisitable and the
semi-secure Fnv1aHash (see below) results in zero collisions!
>>>
>>> Finally, the design of the Hashable protocol forces the use of one
implementation without the possibility of switching between multiple
hashing algorithms.
>>>
>>> Proposed solution
>>>
>>> Instead of coupling the hashing algorithm with each and every Swift
type, we should provide a hashing API based on the visitor-pattern. By
freeing application developers from the burden of having to implement
hashing algorithms, the Standard Library can provide default ones which
fulfill collision, performance and security goals. Furthermore, it would
allow developers to swap to different algorithms based on the use case.
>>>
>>> Detailed design
>>>
>>> The proposal deprecates the Hashable protocol and introduces the
following two:
>>>
>>> protocol Hasher
>>> {
>>>
>>> mutating func finish() -> Int
>>>
>>>
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer)
>>> }
>>>
>>>
>>> protocol HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H)
>>> }
>>>
>>> Hasher is the protocol which represents a hashing algorithm, and
HashVisitable replaces Hashable. For types entirely represented by their
memory layout, the following protocol would provide a default
implementation:
>>>
>>> protocol ContiguouslyHashable: HashVisitable {}
>>>
>>> extension ContiguouslyHashable {
>>> func hash<H: Hasher>(_ hasher: inout H) {
>>> var mutableSelf = self
>>> try! Swift.withUnsafeBytes(of: &mutableSelf) {
>>> hasher.write(bytes: $0)
>>> }
>>> }
>>> }
>>>
>>> extension Bool : ContiguouslyHashable {}
>>> extension UInt8 : ContiguouslyHashable {}
>>> extension UInt16 : ContiguouslyHashable {}
>>> extension UInt32 : ContiguouslyHashable {}
>>> extension UInt64 : ContiguouslyHashable {}
>>> extension UInt : ContiguouslyHashable {}
>>> extension Int8 : ContiguouslyHashable {}
>>> extension Int16 : ContiguouslyHashable {}
>>> extension Int32 : ContiguouslyHashable {}
>>> extension Int64 : ContiguouslyHashable {}
>>> extension Int : ContiguouslyHashable {}
>>>
>>> The Standard-Library would then provide a set of hashing
implementations specific to each purpose. A possible choice for hashing
algorithms would be the reasonably fast SipHash-2-4, and the reasonably
secure SipHash-4-8.
>>>
>>> FNV-1A is another popular semi-secure but blazingly fast hash
algorithm, which – for the sake of demonstration – could be implemented as
follows:
>>>
>>> struct Fnv1aHash
>>> {
>>>
>>> fileprivate var state: UInt
>>>
>>>
>>> init(seed: UInt
>>> ) {
>>>
>>> self.state = seed &+ 14695981039346656037
>>>
>>> }
>>> }
>>>
>>>
>>> extension Fnv1aHash: Hasher
>>> {
>>>
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer) {
>>>
>>> for byte in
>>> bytes {
>>>
>>> self.state = (self.state ^ UInt(byte)) &* 1099511628211
>>>
>>> }
>>> }
>>>
>>> mutating func finish() -> Int
>>> {
>>>
>>> return unsafeBitCast(self.state, to: Int.self
>>> )
>>> }
>>> }
>>>
>>> Coming back to the sample code present in the Hashable documentation,
the new implementation would look like:
>>>
>>> extension GridPoint: HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>>
>>> self.x.hash(&
>>> hasher)
>>>
>>> self.y.hash(&
>>> hasher)
>>> }
>>> }
>>>
>>> Source compatibility
>>>
>>> Making use of "extending protocols to conform to protocols":
>>>
>>> extension Hashable: HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>>
>>> self.hashValue.hash(&
>>> hasher)
>>> }
>>> }
>>>
>>> Effect on ABI stability
>>>
>>> n/a
>>>
>>> Effect on API resilience
>>>
>>> This feature should be possible to add/remove without breaking ABI.
>>>
>>> Alternatives considered
>>>
>>> n/a
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution@swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>
>> _______________________________________________
>> swift-evolution mailing list
>> swift-evolution@swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Joe_Groff · March 14, 2017, 8:10pm

Sure, that would be fine. Introducing HashVisitable as a new protocol refining Hashable might make it easier to introduce later without disturbing the ABI.

-Joe

···

On Mar 14, 2017, at 1:08 PM, Karl Wagner <razielim@gmail.com> wrote:

On 14 Mar 2017, at 19:30, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:

On Mar 14, 2017, at 11:27 AM, David Hart <david@hartbit.com> wrote:

On 14 Mar 2017, at 16:41, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:

On Mar 13, 2017, at 8:38 AM, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:

Source compatibility

Making use of "extending protocols to conform to protocols":

extension Hashable: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.hashValue.hash(&
hasher)
}
}

We're unlikely to add this feature soon. It seems reasonable to me to instead have `HashVisitable` refine `Hashable` and provide a default implementation of `hashValue` using a default hasher. I think we still want `Hashable` to be the currency protocol most APIs work with for performance in unspecialized code, since we could inline the visitation and hasher implementation together inside the specialized `hashValue` witness.

Can you explain the performance argument? How does it fare (in your opinion) compared to the arguments in the proposal?

How about:

protocol Hashable {
  func hash<H: Hasher>(with hasher: inout H)
}

extension Hashable {
  var hashValue: Int {
      var hasher = StdLibDefaultHasher()
      hash(with: hasher)
      return hash.finish()
  }
}

For unspecialized code that takes a generic T: Hashable, that will place the only dynamic dispatch point on `hash`, so that will place an abstraction barrier between the Hasher and Self type being hashed, so would likely mean a dynamic call for every component of the value being hashed. Having `hashValue` be a dynamic dispatch point allows the hasher to be inlined together with the type's visitor implementation.

-Joe
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Couldn’t we solve that by adding hashValue with the protocol requirements (and with a default implementation)? IIRC, the standard library already does this for things like map. They are defined as requirements even though they are completely optional in practice.

protocol Hashable {
  func hash<H: Hasher>(with hasher: inout H)
  var hashValue: Int { get }
}

extension Hashable {
  var hashValue: Int {
      var hasher = StdLibDefaultHasher()
      hash(with: hasher)
      return hash.finish()
  }
}

Joe_Groff · March 14, 2017, 9:40pm

The compiler ought to be able to monomorphize either signature, given that it can see the implementation of `foo`. If `foo` is a public function from another module, you can't see its implementation.

-Joe

···

On Mar 14, 2017, at 2:36 PM, Vincent Esche <regexident.mailinglists@gmail.com> wrote:

Pardon my ignorance here, but shouldn’t `foo<T: Protocol>(t: T)` always monomorphize due to being generic? Isn’t that how `T: …` differs from, say `foo(t: Protocol)`?

jadengeller · March 15, 2017, 12:10am

If this were desired, an associated type would not be sufficient. Perhaps `Dictionary` only supports keys where `Hash == Int`. Then any type that might key a dictionary must use `Int` as it’s hash type. It wouldn’t be possible to define a second implementation with `Int128` or any other types as the hash type since each type can only conform to a protocol once.

Now, if we had generic protocols, this would be possible…

extension Foo: Hashable<Int> { … }
extension Foo: Hashable<Int128> { … }

I’m not sure if that would even be a reasonable design though; I’m just pointing out that the associated type would be problematic.

Cheers,
Jaden Geller

···

On Mar 14, 2017, at 4:45 PM, David Waite via swift-evolution <swift-evolution@swift.org> wrote:

It would be possible for a Hasher to return varying types, but then people might want the interface to support cryptographic hashing as well as variable length hashes (for things like HAMT)

-DW

On Mar 14, 2017, at 4:56 PM, Greg Parker via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Mar 14, 2017, at 12:01 PM, David Sweeris via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

Are we committed to having `hashValue` always be an `Int`, or could it be an associated type like `(Int, Int, Int, Int)`? Seems like especially for something like a BigNum type or an Array, there might simple not be a reasonably efficient way to uniquely-ish represent 1024 bits with just 64 bits.

(This kinda feels like one of those questions where the answer starts out with a variation on “you’re missing the point”)

This would lead to an implementation nightmare for hashing containers. Consider what the implementation of Dictionary<Any, String> would look like if any of its keys could have incompatible hash outputs.

--
Greg Parker gparker@apple.com <mailto:gparker@apple.com> Runtime Wrangler

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Brent_Royal-Gordon · March 15, 2017, 12:04pm

One important point about SipHash and similar secure hash functions is that they work better if all nested components of a hashable value are (recursively) appended to a single hasher, rather than having each component create its own separate hasher instance in its hashValue implementation. Initializing and finalizing the hasher state is not free; it has some nontrivial cost. So ideally only collections would create and finalize hashers; composite types that implement Hashable should not do that.

Are there any decent hash functions which *don't* have this property? Specifically, are there any which are transitive (but not commutative), so this property holds?

hash(a) ++ (hash(b) ++ hash(c)) == (hash(a) ++ hash(b)) ++ hash(c)

(Where `++` is a "combine two hashes" function.)

What I sort of envision is a `Hash` type in the standard library, with a `Hashable` protocol which simply looks like:

  protocol Hashable: Equatable {
    var hashValue: Hash { get }
  }

A Hashable conformance would probably just look like:

  extension Person: Hashable {
    var hashValue: Hash {
      return Hash(firstName, lastName, birthDate)
    }
  }

Where all three types themselves were `Hashable` and returned `Hash`es of their own.

A transitive function would surely not be cryptographically strong—but we're not looking for cryptographic strength here, just good mixing and entropy preservation with some distinction between the left-hand and right-hand sides of the operation. I'm not about to do a literature search when I need to be awake in four hours, but I suspect somebody has tackled this problem before. If they have, we can get much better hashing without using generics or radically changing the way people hash—

Therefore, implementing SipHash as a separate, optional facility for use inside individual hashValue implementations (as opposed to being an outright replacement for it) would be somewhat inefficient. It also wouldn't discourage/prevent people from continuing to use xor and similar ad hoc hashing methods; people would need to discover these useful new hash utilities on their own.

and a design where you returned a special Hash type, rather than a plain Int, would largely avoid this very real human factors concern.

···

On Mar 13, 2017, at 10:35 PM, Károly Lőrentey via swift-evolution <swift-evolution@swift.org> wrote:

--
Brent Royal-Gordon
Architechies

Karl · March 15, 2017, 4:06am

I believe the associated type would be on the Hasher in that case (not the Hashee, which just feeds it bytes). So if the Hasher was a generic parameter on Dictionary with suitable default, its keys would all produce values of the same type anyway. The “hashValue” property could be defined as having to return a word-sized hash; if you override it and use your own “standard hasher” for a particular type, you would have to compact it in to an Int somehow.

The big problem with making Dictionary<K, V> in to Dictionary<K, V, H> is that from an API perspective, almost nobody who uses a Dictionary cares about the hashing function being used. We could solve this by allowing partially-bound generic types (e.g. Dictionary<K, V, _>) and either define a typealias, or formalise the notion of a private/optional generic type parameter which does not affect a type’s API and is never required to be specified. Alternatively, we could refactor Dictionary’s implementation as SecureDictionary<K, V, H> and keep Dictionary<K, V> as a wrapper (with the obvious caveat that your custom dictionaries won’t work with most other library code).

- Karl

···

On 15 Mar 2017, at 01:10, Jaden Geller via swift-evolution <swift-evolution@swift.org> wrote:

If this were desired, an associated type would not be sufficient. Perhaps `Dictionary` only supports keys where `Hash == Int`. Then any type that might key a dictionary must use `Int` as it’s hash type. It wouldn’t be possible to define a second implementation with `Int128` or any other types as the hash type since each type can only conform to a protocol once.

Now, if we had generic protocols, this would be possible…

extension Foo: Hashable<Int> { … }
extension Foo: Hashable<Int128> { … }

I’m not sure if that would even be a reasonable design though; I’m just pointing out that the associated type would be problematic.

Cheers,
Jaden Geller

jjrscott · July 6, 2022, 1:42pm

I appreciate this thread is a little old but couldn't find any other relevant discussion ...

My API is forced to receive Hashable which I then need to convert into a stable value. However I see that Hasher is a struct rather than a protocol.

What do people think of the idea of moving Hasher to be a protocol with the existing struct renamed to something else?

Apart from fixing my issue this would have the side effect of fixing this issue to:

Finalizing consumes the hasher: it is illegal to finalize a hasher you don't own, or to perform operations on a finalized hasher. (These may become compile-time errors in the future.)

Alvae · July 6, 2022, 3:16pm

Why would you need Hasher to be a protocol?
Some code example would help understand your problem.

Also, I don't understand why you think it is an issue that finalizing a hasher consumes it, or why using a protocol would change that semantic property.