[Draft] Hasher & HashVisitable


(Vincent Esche) #1

Hi there,

I've written up a proposal for an overhaul/replacement of the Hashable
protocol and would love to hear your feedback on it!

Rendered
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25> | Blog
Post <https://blog.definiteloops.com/ha-r-sh-visitors-8c0c3686a46f>

Cheers,
Vincent

Ps: I'd like to thank David Hart (@hartbit) for his great editorial
feedback on this proposal. :+1:

HashVisitable

   - Proposal: SE-NNNN <https://gist.github.com/regexident/NNNN-filename.md>
   - Authors: Vincent Esche <https://github.com/regexident>
   - Review Manager: TBD
   - Status: Awaiting review

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#introduction>
Introduction

Replace the Hashable protocol by two new procotols (Hasher and HashVisitable)
to improve safety, versatility and learnability.
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#motivation>
Motivation

Implementing Hashable is difficult and the consequences if not done well
have dire performance and safety repercussions.

The documentation of Hashable lists a sample implementation
<https://developer.apple.com/reference/swift/hashable> of var hashValue:

/// A point in an x-y coordinate system.
struct GridPoint {
    var x: Int
    var y: Int
}

extension GridPoint: Hashable {
    var hashValue: Int {
        return x.hashValue ^ y.hashValue
    }

    static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
        return lhs.x == rhs.x && lhs.y == rhs.y
    }
}

Calculating the hashes of all GridPoints (given the above implementation)
on a 1000 × 1000 grid …

let (width, height) = (1000, 1000)
let total = width * height
var hashes = Set<Int>()
for x in 0..<width {
    for y in 0..<height {
        hashes.insert(GridPoint(x: x, y: y).hashValue)
    }
}
print("\(hashes.count) unique hashes out of a total of \(total).")

… results in just 1024 unique hash values for 1_000_000 unique values.

In other words: The recommended implementation causes 99.9% of values to
trigger a hash collision.

Out of those 1_000_000 values the median collision count was 976 with min
and max being 976 and 1000respectively.

The collision rate will have negative impact in algorithms which heavily
use hashValue like the ones in Dictionaryand Set. Furthermore, it increases
the vulnerability to DDOS attacks when exposed to the web
<https://arstechnica.com/business/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack/>
.

If even the official Swift documentation gets the implementation of
hashValue wrong, then who is to expect the average Swift programmer to do
any better?

In contrast, running the same snippet using HashVisitable and the
semi-secure Fnv1aHash (see below) results in zero collisions!

Finally, the design of the Hashable protocol forces the use of one
implementation without the possibility of switching between multiple
hashing algorithms.
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#proposed-solution>Proposed
solution

Instead of coupling the hashing algorithm with each and every Swift type,
we should provide a hashing API based on the visitor-pattern. By freeing
application developers from the burden of having to implement hashing
algorithms, the Standard Library can provide default ones which fulfill
collision, performance and security goals. Furthermore, it would allow
developers to swap to different algorithms based on the use case.
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#detailed-design>Detailed
design

The proposal deprecates the Hashable protocol and introduces the following
two:

protocol Hasher {
    mutating func finish() -> Int
    mutating func write(bytes: UnsafeRawBufferPointer)
}
protocol HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H)
}

Hasher is the protocol which represents a hashing algorithm, and
HashVisitable replaces Hashable. For types entirely represented by their
memory layout, the following protocol would provide a default
implementation:

protocol ContiguouslyHashable: HashVisitable {}

extension ContiguouslyHashable {
    func hash<H: Hasher>(_ hasher: inout H) {
        var mutableSelf = self
        try! Swift.withUnsafeBytes(of: &mutableSelf) {
            hasher.write(bytes: $0)
        }
    }
}

extension Bool : ContiguouslyHashable {}
extension UInt8 : ContiguouslyHashable {}
extension UInt16 : ContiguouslyHashable {}
extension UInt32 : ContiguouslyHashable {}
extension UInt64 : ContiguouslyHashable {}
extension UInt : ContiguouslyHashable {}
extension Int8 : ContiguouslyHashable {}
extension Int16 : ContiguouslyHashable {}
extension Int32 : ContiguouslyHashable {}
extension Int64 : ContiguouslyHashable {}
extension Int : ContiguouslyHashable {}

The Standard-Library would then provide a set of hashing implementations
specific to each purpose. A possible choice for hashing algorithms would be
the reasonably fast SipHash-2-4 <https://en.wikipedia.org/wiki/SipHash>,
and the reasonably secure SipHash-4-8
<https://en.wikipedia.org/wiki/SipHash>.

FNV-1A is another popular semi-secure but blazingly fast hash algorithm,
which – for the sake of demonstration – could be implemented as follows:

struct Fnv1aHash {
    fileprivate var state: UInt
    init(seed: UInt) {
        self.state = seed &+ 14695981039346656037
    }
}
extension Fnv1aHash: Hasher {
    mutating func write(bytes: UnsafeRawBufferPointer) {
        for byte in bytes {
            self.state = (self.state ^ UInt(byte)) &* 1099511628211
        }
    }
    mutating func finish() -> Int {
        return unsafeBitCast(self.state, to: Int.self)
    }
}

Coming back to the sample code present in the Hashable documentation, the
new implementation would look like:

extension GridPoint: HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H) {
        self.x.hash(&hasher)
        self.y.hash(&hasher)
    }
}

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#source-compatibility>Source
compatibility

Making use of "extending protocols to conform to protocols
<https://github.com/apple/swift-evolution/blob/d33c129f0920af0404f42219db56981411b20e76/proposals/0143-conditional-conformances.md#extending-protocols-to-conform-to-protocols>
":

extension Hashable: HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H) {
        self.hashValue.hash(&hasher)
    }
}

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#effect-on-abi-stability>Effect
on ABI stability

n/a
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#effect-on-api-resilience>Effect
on API resilience

This feature should be possible to add/remove without breaking ABI.
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#alternatives-considered>Alternatives
considered

n/a


Combining hashes
Combining hashes
(Sean Heber) #2

I’m dumb when it comes to proper hashing, but it’s such a tediously common thing in my experience to need to add Hashable to some kind of a struct so I can stash it in a set or use it as a dictionary key. Is there really no way to make this all more automatic? I have to be honest - this proposal seems *harder* to understand than the way it works now. Of course the easiest would be if the language could just do this “good enough" for me using reflection or whatever and if I really did run into a problem where I wanted to do this myself, I could override something.

Perfect is the enemy of good.

l8r
Sean

···

On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:

Hi there,

I've written up a proposal for an overhaul/replacement of the Hashable protocol and would love to hear your feedback on it!

Rendered | Blog Post

Cheers,
Vincent

Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback on this proposal. :+1:

HashVisitable

  • Proposal: SE-NNNN
  • Authors: Vincent Esche
  • Review Manager: TBD
  • Status: Awaiting review
Introduction

Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) to improve safety, versatility and learnability.

Motivation

Implementing Hashable is difficult and the consequences if not done well have dire performance and safety repercussions.

The documentation of Hashable lists a sample implementation of var hashValue:

/// A point in an x-y coordinate system.
struct GridPoint {
    var x: Int
    var y: Int
}

extension GridPoint: Hashable {
    var hashValue: Int {
        return x.hashValue ^ y.hashValue
    }

    static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
        return lhs.x == rhs.x && lhs.y == rhs.y
    }
}

Calculating the hashes of all GridPoints (given the above implementation) on a 1000 × 1000 grid …

let (width, height) = (1000, 1000)
let total = width * height
var hashes = Set<Int>()
for x in 0..<width {
    for y in 0..<height {
        hashes.insert(GridPoint(x: x, y: y).hashValue)
    }
}
print("\(hashes.count) unique hashes out of a total of \(total).")

… results in just 1024 unique hash values for 1_000_000 unique values.

In other words: The recommended implementation causes 99.9% of values to trigger a hash collision.

Out of those 1_000_000 values the median collision count was 976 with min and max being 976 and 1000respectively.

The collision rate will have negative impact in algorithms which heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it increases the vulnerability to DDOS attacks when exposed to the web.

If even the official Swift documentation gets the implementation of hashValue wrong, then who is to expect the average Swift programmer to do any better?

In contrast, running the same snippet using HashVisitable and the semi-secure Fnv1aHash (see below) results in zero collisions!

Finally, the design of the Hashable protocol forces the use of one implementation without the possibility of switching between multiple hashing algorithms.

Proposed solution

Instead of coupling the hashing algorithm with each and every Swift type, we should provide a hashing API based on the visitor-pattern. By freeing application developers from the burden of having to implement hashing algorithms, the Standard Library can provide default ones which fulfill collision, performance and security goals. Furthermore, it would allow developers to swap to different algorithms based on the use case.

Detailed design

The proposal deprecates the Hashable protocol and introduces the following two:

protocol Hasher
{
    
mutating func finish() -> Int

mutating func write(bytes
: UnsafeRawBufferPointer)
}

protocol HashVisitable
{
    
func hash<H: Hasher>(_ hasher: inout
H)
}

Hasher is the protocol which represents a hashing algorithm, and HashVisitable replaces Hashable. For types entirely represented by their memory layout, the following protocol would provide a default implementation:

protocol ContiguouslyHashable: HashVisitable {}

extension ContiguouslyHashable {
    func hash<H: Hasher>(_ hasher: inout H) {
        var mutableSelf = self
        try! Swift.withUnsafeBytes(of: &mutableSelf) {
            hasher.write(bytes: $0)
        }
    }
}

extension Bool : ContiguouslyHashable {}
extension UInt8 : ContiguouslyHashable {}
extension UInt16 : ContiguouslyHashable {}
extension UInt32 : ContiguouslyHashable {}
extension UInt64 : ContiguouslyHashable {}
extension UInt : ContiguouslyHashable {}
extension Int8 : ContiguouslyHashable {}
extension Int16 : ContiguouslyHashable {}
extension Int32 : ContiguouslyHashable {}
extension Int64 : ContiguouslyHashable {}
extension Int : ContiguouslyHashable {}

The Standard-Library would then provide a set of hashing implementations specific to each purpose. A possible choice for hashing algorithms would be the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8.

FNV-1A is another popular semi-secure but blazingly fast hash algorithm, which – for the sake of demonstration – could be implemented as follows:

struct Fnv1aHash
{
    
fileprivate var state: UInt

init(seed: UInt
) {
        
self.state = seed &+ 14695981039346656037

    }
}

extension Fnv1aHash: Hasher
{
    
mutating func write(bytes
: UnsafeRawBufferPointer) {
        
for byte in
bytes {
            
self.state = (self.state ^ UInt(byte)) &* 1099511628211

        }
    }
    
mutating func finish() -> Int
{
        
return unsafeBitCast(self.state, to: Int.self
)
    }
}

Coming back to the sample code present in the Hashable documentation, the new implementation would look like:

extension GridPoint: HashVisitable
{
    
func hash<H: Hasher>(_ hasher: inout
H) {
        
self.x.hash(&
hasher)
        
self.y.hash(&
hasher)
    }
}

Source compatibility

Making use of "extending protocols to conform to protocols":

extension Hashable: HashVisitable
{
    
func hash<H: Hasher>(_ hasher: inout
H) {
        
self.hashValue.hash(&
hasher)
    }
}

Effect on ABI stability

n/a

Effect on API resilience

This feature should be possible to add/remove without breaking ABI.

Alternatives considered

n/a
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Haravikk) #3

I really like the proposal!

Only thing I'm wondering is whether a convenience function on Hasher might be nice to have, like so:

  extension Hasher {
    mutating func hash<H:Hashable>(_ hashable:H) -> Int {
      hashable.hash(self)
      return self.finish()
    }
  }

This just feels like it would be easier for the most common case where you just need to hash a single value right away, as you'd just do:

  let hashValue = hasher.hash(foo)

While leaving the inout style available for when you do need to combine multiple values together or otherwise feed in data gradually. This way we're not losing as much of the convenience of the .hashValue that's being deprecated. On that note, should Hashable be updated to do something like the above as well, so it still immediately returns a value from a default hasher?

Anyway, definitely in favour of this change, as you're right; .hashValue is a hard thing to do right, and is often an after-thought, which can make it of limited usefulness to types that depend upon it!

···

On 13 Mar 2017, at 15:38, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:

Hi there,

I've written up a proposal for an overhaul/replacement of the Hashable protocol and would love to hear your feedback on it!

Rendered <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25> | Blog Post <https://blog.definiteloops.com/ha-r-sh-visitors-8c0c3686a46f>

Cheers,
Vincent

Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback on this proposal. :+1:

HashVisitable

Proposal: SE-NNNN <https://gist.github.com/regexident/NNNN-filename.md>
Authors: Vincent Esche <https://github.com/regexident>
Review Manager: TBD
Status: Awaiting review
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#introduction>Introduction

Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) to improve safety, versatility and learnability.

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#motivation>Motivation

Implementing Hashable is difficult and the consequences if not done well have dire performance and safety repercussions.

The documentation of Hashable lists a sample implementation <https://developer.apple.com/reference/swift/hashable> of var hashValue:

/// A point in an x-y coordinate system.
struct GridPoint {
    var x: Int
    var y: Int
}

extension GridPoint: Hashable {
    var hashValue: Int {
        return x.hashValue ^ y.hashValue
    }

    static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
        return lhs.x == rhs.x && lhs.y == rhs.y
    }
}
Calculating the hashes of all GridPoints (given the above implementation) on a 1000 × 1000 grid …

let (width, height) = (1000, 1000)
let total = width * height
var hashes = Set<Int>()
for x in 0..<width {
    for y in 0..<height {
        hashes.insert(GridPoint(x: x, y: y).hashValue)
    }
}
print("\(hashes.count) unique hashes out of a total of \(total).")
… results in just 1024 unique hash values for 1_000_000 unique values.

In other words: The recommended implementation causes 99.9% of values to trigger a hash collision.

Out of those 1_000_000 values the median collision count was 976 with min and max being 976 and 1000respectively.

The collision rate will have negative impact in algorithms which heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it increases the vulnerability to DDOS attacks when exposed to the web <https://arstechnica.com/business/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack/>.

If even the official Swift documentation gets the implementation of hashValue wrong, then who is to expect the average Swift programmer to do any better?

In contrast, running the same snippet using HashVisitable and the semi-secure Fnv1aHash (see below) results in zero collisions!

Finally, the design of the Hashable protocol forces the use of one implementation without the possibility of switching between multiple hashing algorithms.

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#proposed-solution>Proposed solution

Instead of coupling the hashing algorithm with each and every Swift type, we should provide a hashing API based on the visitor-pattern. By freeing application developers from the burden of having to implement hashing algorithms, the Standard Library can provide default ones which fulfill collision, performance and security goals. Furthermore, it would allow developers to swap to different algorithms based on the use case.

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#detailed-design>Detailed design

The proposal deprecates the Hashable protocol and introduces the following two:

protocol Hasher {
    mutating func finish() -> Int
    mutating func write(bytes: UnsafeRawBufferPointer)
}

protocol HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H)
}
Hasher is the protocol which represents a hashing algorithm, and HashVisitable replaces Hashable. For types entirely represented by their memory layout, the following protocol would provide a default implementation:

protocol ContiguouslyHashable: HashVisitable {}

extension ContiguouslyHashable {
    func hash<H: Hasher>(_ hasher: inout H) {
        var mutableSelf = self
        try! Swift.withUnsafeBytes(of: &mutableSelf) {
            hasher.write(bytes: $0)
        }
    }
}

extension Bool : ContiguouslyHashable {}
extension UInt8 : ContiguouslyHashable {}
extension UInt16 : ContiguouslyHashable {}
extension UInt32 : ContiguouslyHashable {}
extension UInt64 : ContiguouslyHashable {}
extension UInt : ContiguouslyHashable {}
extension Int8 : ContiguouslyHashable {}
extension Int16 : ContiguouslyHashable {}
extension Int32 : ContiguouslyHashable {}
extension Int64 : ContiguouslyHashable {}
extension Int : ContiguouslyHashable {}
The Standard-Library would then provide a set of hashing implementations specific to each purpose. A possible choice for hashing algorithms would be the reasonably fast SipHash-2-4 <https://en.wikipedia.org/wiki/SipHash>, and the reasonably secure SipHash-4-8 <https://en.wikipedia.org/wiki/SipHash>.

FNV-1A is another popular semi-secure but blazingly fast hash algorithm, which – for the sake of demonstration – could be implemented as follows:

struct Fnv1aHash {
    fileprivate var state: UInt
    init(seed: UInt) {
        self.state = seed &+ 14695981039346656037
    }
}

extension Fnv1aHash: Hasher {
    mutating func write(bytes: UnsafeRawBufferPointer) {
        for byte in bytes {
            self.state = (self.state ^ UInt(byte)) &* 1099511628211
        }
    }
    mutating func finish() -> Int {
        return unsafeBitCast(self.state, to: Int.self)
    }
}
Coming back to the sample code present in the Hashable documentation, the new implementation would look like:

extension GridPoint: HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H) {
        self.x.hash(&hasher)
        self.y.hash(&hasher)
    }
}
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#source-compatibility>Source compatibility

Making use of "extending protocols to conform to protocols <https://github.com/apple/swift-evolution/blob/d33c129f0920af0404f42219db56981411b20e76/proposals/0143-conditional-conformances.md#extending-protocols-to-conform-to-protocols>":

extension Hashable: HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H) {
        self.hashValue.hash(&hasher)
    }
}
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#effect-on-abi-stability>Effect on ABI stability

n/a

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#effect-on-api-resilience>Effect on API resilience

This feature should be possible to add/remove without breaking ABI.

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#alternatives-considered>Alternatives considered

n/a
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(David Sweeris) #4

And that’s why I tend to look at Dictionary<> and Set<> kinda like they’re that kid in The Exorcist whenever I start thinking about how they work…

I understand hashing in theory, but can’t for the life of me come up with an algorithm for creating them the wouldn't almost immediately result in collisions for the kinds of use-cases I envision. I haven’t finished reading the proposal yet, but I wanted to get a firm and enthusiastic +1 in for its intent.

···

On Mar 13, 2017, at 8:38 AM, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:

Hi there,

I've written up a proposal for an overhaul/replacement of the Hashable protocol and would love to hear your feedback on it!

Rendered <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25> | Blog Post <https://blog.definiteloops.com/ha-r-sh-visitors-8c0c3686a46f>

Cheers,
Vincent

Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback on this proposal. :+1:

HashVisitable

Proposal: SE-NNNN <https://gist.github.com/regexident/NNNN-filename.md>
Authors: Vincent Esche <https://github.com/regexident>
Review Manager: TBD
Status: Awaiting review
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#introduction>Introduction

Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) to improve safety, versatility and learnability.

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#motivation>Motivation

Implementing Hashable is difficult and the consequences if not done well have dire performance and safety repercussions.

The documentation of Hashable lists a sample implementation <https://developer.apple.com/reference/swift/hashable> of var hashValue:

/// A point in an x-y coordinate system.
struct GridPoint {
    var x: Int
    var y: Int
}

extension GridPoint: Hashable {
    var hashValue: Int {
        return x.hashValue ^ y.hashValue
    }

    static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
        return lhs.x == rhs.x && lhs.y == rhs.y
    }
}
Calculating the hashes of all GridPoints (given the above implementation) on a 1000 × 1000 grid …

let (width, height) = (1000, 1000)
let total = width * height
var hashes = Set<Int>()
for x in 0..<width {
    for y in 0..<height {
        hashes.insert(GridPoint(x: x, y: y).hashValue)
    }
}
print("\(hashes.count) unique hashes out of a total of \(total).")
… results in just 1024 unique hash values for 1_000_000 unique values.

In other words: The recommended implementation causes 99.9% of values to trigger a hash collision.


(Dimitri Racordon) #5

While I understand and agree with the goal of the proposal, I too feel like it’s more difficult to understand than the current approach. HashVisitable looks odd, and one would have to understand the role and API of the Hasher protocol.

I would argue that if Swift provided a good default hashing API, then we could simply call it when computing the hashValue.

struct GridPoint: Hashable {
  var x: Int
  var y: Int
  var hashValue: Int {
    let hasher = Swift.DefaultHasher()
    hasher.hash(self.x)
    hasher.hash(self.y)
    return hasher.value
  }
  static func == (lhs: GridPoint, rhs: GridPoint) -> Bool { /* … */ }
}

It might seem wordy, but it would allow one to leverage good hashing algorithms, while not forcing people who do have a good hashing function for their particular type to define their own hasher.

On an unrelated note, maybe I missed something but I think HashVisitable would still need to conform to Equatable as well, if it were to replace Hashable. This is unclear in your proposal, as the equality operator is present in the motivational example, but absent from the proposed solution.

···

On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution <swift-evolution@swift.org<mailto:swift-evolution@swift.org>> wrote:

I’m dumb when it comes to proper hashing, but it’s such a tediously common thing in my experience to need to add Hashable to some kind of a struct so I can stash it in a set or use it as a dictionary key. Is there really no way to make this all more automatic? I have to be honest - this proposal seems *harder* to understand than the way it works now. Of course the easiest would be if the language could just do this “good enough" for me using reflection or whatever and if I really did run into a problem where I wanted to do this myself, I could override something.

Perfect is the enemy of good.

l8r
Sean

On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution <swift-evolution@swift.org<mailto:swift-evolution@swift.org>> wrote:

Hi there,

I've written up a proposal for an overhaul/replacement of the Hashable protocol and would love to hear your feedback on it!

Rendered | Blog Post

Cheers,
Vincent

Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback on this proposal. :+1:

HashVisitable

• Proposal: SE-NNNN
• Authors: Vincent Esche
• Review Manager: TBD
• Status: Awaiting review
Introduction

Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) to improve safety, versatility and learnability.

Motivation

Implementing Hashable is difficult and the consequences if not done well have dire performance and safety repercussions.

The documentation of Hashable lists a sample implementation of var hashValue:

/// A point in an x-y coordinate system.
struct GridPoint {
   var x: Int
   var y: Int
}

extension GridPoint: Hashable {
   var hashValue: Int {
       return x.hashValue ^ y.hashValue
   }

   static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
       return lhs.x == rhs.x && lhs.y == rhs.y
   }
}

Calculating the hashes of all GridPoints (given the above implementation) on a 1000 × 1000 grid …

let (width, height) = (1000, 1000)
let total = width * height
var hashes = Set<Int>()
for x in 0..<width {
   for y in 0..<height {
       hashes.insert(GridPoint(x: x, y: y).hashValue)
   }
}
print("\(hashes.count) unique hashes out of a total of \(total).")

… results in just 1024 unique hash values for 1_000_000 unique values.

In other words: The recommended implementation causes 99.9% of values to trigger a hash collision.

Out of those 1_000_000 values the median collision count was 976 with min and max being 976 and 1000respectively.

The collision rate will have negative impact in algorithms which heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it increases the vulnerability to DDOS attacks when exposed to the web.

If even the official Swift documentation gets the implementation of hashValue wrong, then who is to expect the average Swift programmer to do any better?

In contrast, running the same snippet using HashVisitable and the semi-secure Fnv1aHash (see below) results in zero collisions!

Finally, the design of the Hashable protocol forces the use of one implementation without the possibility of switching between multiple hashing algorithms.

Proposed solution

Instead of coupling the hashing algorithm with each and every Swift type, we should provide a hashing API based on the visitor-pattern. By freeing application developers from the burden of having to implement hashing algorithms, the Standard Library can provide default ones which fulfill collision, performance and security goals. Furthermore, it would allow developers to swap to different algorithms based on the use case.

Detailed design

The proposal deprecates the Hashable protocol and introduces the following two:

protocol Hasher
{

mutating func finish() -> Int

mutating func write(bytes
: UnsafeRawBufferPointer)
}

protocol HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H)
}

Hasher is the protocol which represents a hashing algorithm, and HashVisitable replaces Hashable. For types entirely represented by their memory layout, the following protocol would provide a default implementation:

protocol ContiguouslyHashable: HashVisitable {}

extension ContiguouslyHashable {
   func hash<H: Hasher>(_ hasher: inout H) {
       var mutableSelf = self
       try! Swift.withUnsafeBytes(of: &mutableSelf) {
           hasher.write(bytes: $0)
       }
   }
}

extension Bool : ContiguouslyHashable {}
extension UInt8 : ContiguouslyHashable {}
extension UInt16 : ContiguouslyHashable {}
extension UInt32 : ContiguouslyHashable {}
extension UInt64 : ContiguouslyHashable {}
extension UInt : ContiguouslyHashable {}
extension Int8 : ContiguouslyHashable {}
extension Int16 : ContiguouslyHashable {}
extension Int32 : ContiguouslyHashable {}
extension Int64 : ContiguouslyHashable {}
extension Int : ContiguouslyHashable {}

The Standard-Library would then provide a set of hashing implementations specific to each purpose. A possible choice for hashing algorithms would be the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8.

FNV-1A is another popular semi-secure but blazingly fast hash algorithm, which – for the sake of demonstration – could be implemented as follows:

struct Fnv1aHash
{

fileprivate var state: UInt

init(seed: UInt
) {

self.state = seed &+ 14695981039346656037

   }
}

extension Fnv1aHash: Hasher
{

mutating func write(bytes
: UnsafeRawBufferPointer) {

for byte in
bytes {

self.state = (self.state ^ UInt(byte)) &* 1099511628211

       }
   }

mutating func finish() -> Int
{

return unsafeBitCast(self.state, to: Int.self
)
   }
}

Coming back to the sample code present in the Hashable documentation, the new implementation would look like:

extension GridPoint: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.x.hash(&
hasher)

self.y.hash(&
hasher)
   }
}

Source compatibility

Making use of "extending protocols to conform to protocols":

extension Hashable: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.hashValue.hash(&
hasher)
   }
}

Effect on ABI stability

n/a

Effect on API resilience

This feature should be possible to add/remove without breaking ABI.

Alternatives considered

n/a
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org<mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org<mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution


(David Hart) #6

I’m dumb when it comes to proper hashing, but it’s such a tediously common thing in my experience to need to add Hashable to some kind of a struct so I can stash it in a set or use it as a dictionary key. Is there really no way to make this all more automatic? I have to be honest - this proposal seems *harder* to understand than the way it works now.

It's not really harder: just call hash on each of your type's significant values:

x.hash(&hasher)
y.hash(&hasher)

How would you implement hashValue in a simpler way, remembering that 'x ^ y' is an incorrect implementation?

···

On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution <swift-evolution@swift.org> wrote:

Of course the easiest would be if the language could just do this “good enough" for me using reflection or whatever and if I really did run into a problem where I wanted to do this myself, I could override something.

Perfect is the enemy of good.

l8r
Sean

On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:

Hi there,

I've written up a proposal for an overhaul/replacement of the Hashable protocol and would love to hear your feedback on it!

Rendered | Blog Post

Cheers,
Vincent

Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback on this proposal. :+1:

HashVisitable

   • Proposal: SE-NNNN
   • Authors: Vincent Esche
   • Review Manager: TBD
   • Status: Awaiting review
Introduction

Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) to improve safety, versatility and learnability.

Motivation

Implementing Hashable is difficult and the consequences if not done well have dire performance and safety repercussions.

The documentation of Hashable lists a sample implementation of var hashValue:

/// A point in an x-y coordinate system.
struct GridPoint {
   var x: Int
   var y: Int
}

extension GridPoint: Hashable {
   var hashValue: Int {
       return x.hashValue ^ y.hashValue
   }

   static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
       return lhs.x == rhs.x && lhs.y == rhs.y
   }
}

Calculating the hashes of all GridPoints (given the above implementation) on a 1000 × 1000 grid …

let (width, height) = (1000, 1000)
let total = width * height
var hashes = Set<Int>()
for x in 0..<width {
   for y in 0..<height {
       hashes.insert(GridPoint(x: x, y: y).hashValue)
   }
}
print("\(hashes.count) unique hashes out of a total of \(total).")

… results in just 1024 unique hash values for 1_000_000 unique values.

In other words: The recommended implementation causes 99.9% of values to trigger a hash collision.

Out of those 1_000_000 values the median collision count was 976 with min and max being 976 and 1000respectively.

The collision rate will have negative impact in algorithms which heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it increases the vulnerability to DDOS attacks when exposed to the web.

If even the official Swift documentation gets the implementation of hashValue wrong, then who is to expect the average Swift programmer to do any better?

In contrast, running the same snippet using HashVisitable and the semi-secure Fnv1aHash (see below) results in zero collisions!

Finally, the design of the Hashable protocol forces the use of one implementation without the possibility of switching between multiple hashing algorithms.

Proposed solution

Instead of coupling the hashing algorithm with each and every Swift type, we should provide a hashing API based on the visitor-pattern. By freeing application developers from the burden of having to implement hashing algorithms, the Standard Library can provide default ones which fulfill collision, performance and security goals. Furthermore, it would allow developers to swap to different algorithms based on the use case.

Detailed design

The proposal deprecates the Hashable protocol and introduces the following two:

protocol Hasher
{

mutating func finish() -> Int

mutating func write(bytes
: UnsafeRawBufferPointer)
}

protocol HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H)
}

Hasher is the protocol which represents a hashing algorithm, and HashVisitable replaces Hashable. For types entirely represented by their memory layout, the following protocol would provide a default implementation:

protocol ContiguouslyHashable: HashVisitable {}

extension ContiguouslyHashable {
   func hash<H: Hasher>(_ hasher: inout H) {
       var mutableSelf = self
       try! Swift.withUnsafeBytes(of: &mutableSelf) {
           hasher.write(bytes: $0)
       }
   }
}

extension Bool : ContiguouslyHashable {}
extension UInt8 : ContiguouslyHashable {}
extension UInt16 : ContiguouslyHashable {}
extension UInt32 : ContiguouslyHashable {}
extension UInt64 : ContiguouslyHashable {}
extension UInt : ContiguouslyHashable {}
extension Int8 : ContiguouslyHashable {}
extension Int16 : ContiguouslyHashable {}
extension Int32 : ContiguouslyHashable {}
extension Int64 : ContiguouslyHashable {}
extension Int : ContiguouslyHashable {}

The Standard-Library would then provide a set of hashing implementations specific to each purpose. A possible choice for hashing algorithms would be the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8.

FNV-1A is another popular semi-secure but blazingly fast hash algorithm, which – for the sake of demonstration – could be implemented as follows:

struct Fnv1aHash
{

fileprivate var state: UInt

init(seed: UInt
) {

self.state = seed &+ 14695981039346656037

   }
}

extension Fnv1aHash: Hasher
{

mutating func write(bytes
: UnsafeRawBufferPointer) {

for byte in
bytes {

self.state = (self.state ^ UInt(byte)) &* 1099511628211

       }
   }

mutating func finish() -> Int
{

return unsafeBitCast(self.state, to: Int.self
)
   }
}

Coming back to the sample code present in the Hashable documentation, the new implementation would look like:

extension GridPoint: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.x.hash(&
hasher)

self.y.hash(&
hasher)
   }
}

Source compatibility

Making use of "extending protocols to conform to protocols":

extension Hashable: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.hashValue.hash(&
hasher)
   }
}

Effect on ABI stability

n/a

Effect on API resilience

This feature should be possible to add/remove without breaking ABI.

Alternatives considered

n/a
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Karoy Lorentey) #7

Yes please! I’ve a package on GitHub to implement roughly the same thing. I’ve been happily using it for months now, and I wouldn’t ever write a hashValue implementation by hand again.

https://github.com/lorentey/SipHash

I think the fact that we’ve both come up with essentially the same API is an interesting data point; it definitely underlines the need for a better Hashable protocol.

My comments:

* In an ideal world, this would be a replacement for Hashable, not a protocol with a different name. Once visitor-based hashing becomes available, nobody should implement a hashValue property.

* All standard Hashable types in standard library should implement the new hash function directly, rather than relying on the default implementation.

* Why is the HashVisitable.hash a generic function? Hasher could just as well be a concrete type defined in stdlib. Making it generic may have some performance implications.

* I find that I prefer to hash components by calling a mutating method on the hasher, rather than directly calling the components' hash implementations. Putting the hasher first is much more readable to me, primarily because it gets rid of all the distracting &s. It also makes it possible to find slightly better names, eliminating the repetitiveness of "foo.hash(&hasher)":

extension GridPoint: SipHashable {
    func appendHashes(to hasher: inout
SipHasher) {
         hasher.append(x)
         hasher.append(y)
    }
}

* I suggest using SipHash instead of FNV-1a. The standard library already contains an implementation for SipHash, as undocumented internal API, complete with a note that it should be made public. AFAICR, it is currently only used for String hashing, but it would be very much worth making it universal. (Accomodating SipHash's random key is one reason why hashValue’s documentation explicitly notes that its value is "not guaranteed to be equal across different executions of your program.”)

* ContiguouslyHashable seems like an unsafe construct to me. It is quite dangerous to base hashing on the raw byte sequence underlying a value: for example, struct values may include uninitialized gaps between some of their stored properties due to alignment constraints. So two otherwise identical values may very well have different in-memory representations. Therefore, I suggest ContiguouslyHashable should be removed from the proposal. Swift programmers who know what they’re doing would still be able to call withUnsafeBytes(of:) when they want to, but regular schmoes like myself will not be tempted to shoot themselves in the foot by using it inappropriately.

Cheers,
Karoly

···

On 2017-03-13, at 16:38, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:

Hi there,

I've written up a proposal for an overhaul/replacement of the Hashable protocol and would love to hear your feedback on it!

Rendered <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25> | Blog Post <https://blog.definiteloops.com/ha-r-sh-visitors-8c0c3686a46f>

Cheers,
Vincent

Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback on this proposal. :+1:

HashVisitable

Proposal: SE-NNNN <https://gist.github.com/regexident/NNNN-filename.md>
Authors: Vincent Esche <https://github.com/regexident>
Review Manager: TBD
Status: Awaiting review
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#introduction>Introduction

Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) to improve safety, versatility and learnability.

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#motivation>Motivation

Implementing Hashable is difficult and the consequences if not done well have dire performance and safety repercussions.

The documentation of Hashable lists a sample implementation <https://developer.apple.com/reference/swift/hashable> of var hashValue:

/// A point in an x-y coordinate system.
struct GridPoint {
    var x: Int
    var y: Int
}

extension GridPoint: Hashable {
    var hashValue: Int {
        return x.hashValue ^ y.hashValue
    }

    static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
        return lhs.x == rhs.x && lhs.y == rhs.y
    }
}
Calculating the hashes of all GridPoints (given the above implementation) on a 1000 × 1000 grid …

let (width, height) = (1000, 1000)
let total = width * height
var hashes = Set<Int>()
for x in 0..<width {
    for y in 0..<height {
        hashes.insert(GridPoint(x: x, y: y).hashValue)
    }
}
print("\(hashes.count) unique hashes out of a total of \(total).")
… results in just 1024 unique hash values for 1_000_000 unique values.

In other words: The recommended implementation causes 99.9% of values to trigger a hash collision.

Out of those 1_000_000 values the median collision count was 976 with min and max being 976 and 1000respectively.

The collision rate will have negative impact in algorithms which heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it increases the vulnerability to DDOS attacks when exposed to the web <https://arstechnica.com/business/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack/>.

If even the official Swift documentation gets the implementation of hashValue wrong, then who is to expect the average Swift programmer to do any better?

In contrast, running the same snippet using HashVisitable and the semi-secure Fnv1aHash (see below) results in zero collisions!

Finally, the design of the Hashable protocol forces the use of one implementation without the possibility of switching between multiple hashing algorithms.

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#proposed-solution>Proposed solution

Instead of coupling the hashing algorithm with each and every Swift type, we should provide a hashing API based on the visitor-pattern. By freeing application developers from the burden of having to implement hashing algorithms, the Standard Library can provide default ones which fulfill collision, performance and security goals. Furthermore, it would allow developers to swap to different algorithms based on the use case.

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#detailed-design>Detailed design

The proposal deprecates the Hashable protocol and introduces the following two:

protocol Hasher {
    mutating func finish() -> Int
    mutating func write(bytes: UnsafeRawBufferPointer)
}

protocol HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H)
}
Hasher is the protocol which represents a hashing algorithm, and HashVisitable replaces Hashable. For types entirely represented by their memory layout, the following protocol would provide a default implementation:

protocol ContiguouslyHashable: HashVisitable {}

extension ContiguouslyHashable {
    func hash<H: Hasher>(_ hasher: inout H) {
        var mutableSelf = self
        try! Swift.withUnsafeBytes(of: &mutableSelf) {
            hasher.write(bytes: $0)
        }
    }
}

extension Bool : ContiguouslyHashable {}
extension UInt8 : ContiguouslyHashable {}
extension UInt16 : ContiguouslyHashable {}
extension UInt32 : ContiguouslyHashable {}
extension UInt64 : ContiguouslyHashable {}
extension UInt : ContiguouslyHashable {}
extension Int8 : ContiguouslyHashable {}
extension Int16 : ContiguouslyHashable {}
extension Int32 : ContiguouslyHashable {}
extension Int64 : ContiguouslyHashable {}
extension Int : ContiguouslyHashable {}
The Standard-Library would then provide a set of hashing implementations specific to each purpose. A possible choice for hashing algorithms would be the reasonably fast SipHash-2-4 <https://en.wikipedia.org/wiki/SipHash>, and the reasonably secure SipHash-4-8 <https://en.wikipedia.org/wiki/SipHash>.

FNV-1A is another popular semi-secure but blazingly fast hash algorithm, which – for the sake of demonstration – could be implemented as follows:

struct Fnv1aHash {
    fileprivate var state: UInt
    init(seed: UInt) {
        self.state = seed &+ 14695981039346656037
    }
}

extension Fnv1aHash: Hasher {
    mutating func write(bytes: UnsafeRawBufferPointer) {
        for byte in bytes {
            self.state = (self.state ^ UInt(byte)) &* 1099511628211
        }
    }
    mutating func finish() -> Int {
        return unsafeBitCast(self.state, to: Int.self)
    }
}
Coming back to the sample code present in the Hashable documentation, the new implementation would look like:

extension GridPoint: HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H) {
        self.x.hash(&hasher)
        self.y.hash(&hasher)
    }
}
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#source-compatibility>Source compatibility

Making use of "extending protocols to conform to protocols <https://github.com/apple/swift-evolution/blob/d33c129f0920af0404f42219db56981411b20e76/proposals/0143-conditional-conformances.md#extending-protocols-to-conform-to-protocols>":

extension Hashable: HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H) {
        self.hashValue.hash(&hasher)
    }
}
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#effect-on-abi-stability>Effect on ABI stability

n/a

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#effect-on-api-resilience>Effect on API resilience

This feature should be possible to add/remove without breaking ABI.

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#alternatives-considered>Alternatives considered

n/a
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Karoy Lorentey) #8

I think standardizing on a “correct” way to combine multiple hashable components into a single hash is an important step toward automating it. Visitor-based hashing simplifies the implementation of Hashable by essentially making it into a simple decision about which components to include in the hash. The details of how to mix bits into a single fixed-size bitstring is abstracted away into the hasher, so hashing becomes as easy to implement as the equality operator.

Swift already includes very limited support for automated generation of Equatable and Hashable implementations (for basic enum types, at least). Having this new form of hashing would make it a little easier to extend this mechanism to cover other cases—at least, a proposal for that wouldn’t need to invent and describe a solution for combining hash values.

Cheers,
Karoly

···

On 2017-03-13, at 18:54, Sean Heber via swift-evolution <swift-evolution@swift.org> wrote:

I’m dumb when it comes to proper hashing, but it’s such a tediously common thing in my experience to need to add Hashable to some kind of a struct so I can stash it in a set or use it as a dictionary key. Is there really no way to make this all more automatic? I have to be honest - this proposal seems *harder* to understand than the way it works now. Of course the easiest would be if the language could just do this “good enough" for me using reflection or whatever and if I really did run into a problem where I wanted to do this myself, I could override something.

Perfect is the enemy of good.

l8r
Sean

On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:

Hi there,

I've written up a proposal for an overhaul/replacement of the Hashable protocol and would love to hear your feedback on it!

Rendered | Blog Post

Cheers,
Vincent

Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback on this proposal. :+1:

HashVisitable

  • Proposal: SE-NNNN
  • Authors: Vincent Esche
  • Review Manager: TBD
  • Status: Awaiting review
Introduction

Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) to improve safety, versatility and learnability.

Motivation

Implementing Hashable is difficult and the consequences if not done well have dire performance and safety repercussions.

The documentation of Hashable lists a sample implementation of var hashValue:

/// A point in an x-y coordinate system.
struct GridPoint {
   var x: Int
   var y: Int
}

extension GridPoint: Hashable {
   var hashValue: Int {
       return x.hashValue ^ y.hashValue
   }

   static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
       return lhs.x == rhs.x && lhs.y == rhs.y
   }
}

Calculating the hashes of all GridPoints (given the above implementation) on a 1000 × 1000 grid …

let (width, height) = (1000, 1000)
let total = width * height
var hashes = Set<Int>()
for x in 0..<width {
   for y in 0..<height {
       hashes.insert(GridPoint(x: x, y: y).hashValue)
   }
}
print("\(hashes.count) unique hashes out of a total of \(total).")

… results in just 1024 unique hash values for 1_000_000 unique values.

In other words: The recommended implementation causes 99.9% of values to trigger a hash collision.

Out of those 1_000_000 values the median collision count was 976 with min and max being 976 and 1000respectively.

The collision rate will have negative impact in algorithms which heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it increases the vulnerability to DDOS attacks when exposed to the web.

If even the official Swift documentation gets the implementation of hashValue wrong, then who is to expect the average Swift programmer to do any better?

In contrast, running the same snippet using HashVisitable and the semi-secure Fnv1aHash (see below) results in zero collisions!

Finally, the design of the Hashable protocol forces the use of one implementation without the possibility of switching between multiple hashing algorithms.

Proposed solution

Instead of coupling the hashing algorithm with each and every Swift type, we should provide a hashing API based on the visitor-pattern. By freeing application developers from the burden of having to implement hashing algorithms, the Standard Library can provide default ones which fulfill collision, performance and security goals. Furthermore, it would allow developers to swap to different algorithms based on the use case.

Detailed design

The proposal deprecates the Hashable protocol and introduces the following two:

protocol Hasher
{

mutating func finish() -> Int

mutating func write(bytes
: UnsafeRawBufferPointer)
}

protocol HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H)
}

Hasher is the protocol which represents a hashing algorithm, and HashVisitable replaces Hashable. For types entirely represented by their memory layout, the following protocol would provide a default implementation:

protocol ContiguouslyHashable: HashVisitable {}

extension ContiguouslyHashable {
   func hash<H: Hasher>(_ hasher: inout H) {
       var mutableSelf = self
       try! Swift.withUnsafeBytes(of: &mutableSelf) {
           hasher.write(bytes: $0)
       }
   }
}

extension Bool : ContiguouslyHashable {}
extension UInt8 : ContiguouslyHashable {}
extension UInt16 : ContiguouslyHashable {}
extension UInt32 : ContiguouslyHashable {}
extension UInt64 : ContiguouslyHashable {}
extension UInt : ContiguouslyHashable {}
extension Int8 : ContiguouslyHashable {}
extension Int16 : ContiguouslyHashable {}
extension Int32 : ContiguouslyHashable {}
extension Int64 : ContiguouslyHashable {}
extension Int : ContiguouslyHashable {}

The Standard-Library would then provide a set of hashing implementations specific to each purpose. A possible choice for hashing algorithms would be the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8.

FNV-1A is another popular semi-secure but blazingly fast hash algorithm, which – for the sake of demonstration – could be implemented as follows:

struct Fnv1aHash
{

fileprivate var state: UInt

init(seed: UInt
) {

self.state = seed &+ 14695981039346656037

   }
}

extension Fnv1aHash: Hasher
{

mutating func write(bytes
: UnsafeRawBufferPointer) {

for byte in
bytes {

self.state = (self.state ^ UInt(byte)) &* 1099511628211

       }
   }

mutating func finish() -> Int
{

return unsafeBitCast(self.state, to: Int.self
)
   }
}

Coming back to the sample code present in the Hashable documentation, the new implementation would look like:

extension GridPoint: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.x.hash(&
hasher)

self.y.hash(&
hasher)
   }
}

Source compatibility

Making use of "extending protocols to conform to protocols":

extension Hashable: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.hashValue.hash(&
hasher)
   }
}

Effect on ABI stability

n/a

Effect on API resilience

This feature should be possible to add/remove without breaking ABI.

Alternatives considered

n/a
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Joe Groff) #9

The compiler totally ought to derive reasonable default Equatable and Hashable implementations for you. That would allow the standard library to do the right thing without burdening most users with the need to sweat the details.

-Joe

···

On Mar 13, 2017, at 10:54 AM, Sean Heber via swift-evolution <swift-evolution@swift.org> wrote:

I’m dumb when it comes to proper hashing, but it’s such a tediously common thing in my experience to need to add Hashable to some kind of a struct so I can stash it in a set or use it as a dictionary key. Is there really no way to make this all more automatic? I have to be honest - this proposal seems *harder* to understand than the way it works now. Of course the easiest would be if the language could just do this “good enough" for me using reflection or whatever and if I really did run into a problem where I wanted to do this myself, I could override something.

Perfect is the enemy of good.


(Joe Groff) #10

We're unlikely to add this feature soon. It seems reasonable to me to instead have `HashVisitable` refine `Hashable` and provide a default implementation of `hashValue` using a default hasher. I think we still want `Hashable` to be the currency protocol most APIs work with for performance in unspecialized code, since we could inline the visitation and hasher implementation together inside the specialized `hashValue` witness.

-Joe

···

On Mar 13, 2017, at 8:38 AM, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:

Source compatibility

Making use of "extending protocols to conform to protocols":

extension Hashable: HashVisitable
{
    
func hash<H: Hasher>(_ hasher: inout
H) {
        
self.hashValue.hash(&
hasher)
    }
}


(David Sweeris) #11

Are we committed to having `hashValue` always be an `Int`, or could it be an associated type like `(Int, Int, Int, Int)`? Seems like especially for something like a BigNum type or an Array, there might simple not be a reasonably efficient way to uniquely-ish represent 1024 bits with just 64 bits.

(This kinda feels like one of those questions where the answer starts out with a variation on “you’re missing the point”)

- Dave Sweeris

···

On Mar 13, 2017, at 8:38 AM, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:

Hi there,

I've written up a proposal for an overhaul/replacement of the Hashable protocol and would love to hear your feedback on it!

Rendered <https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25> | Blog Post <https://blog.definiteloops.com/ha-r-sh-visitors-8c0c3686a46f>


(Sean Heber) #12

Is there any reason the API couldn’t be expressed as something along these lines?

func hash() -> [Hashable] {
  return [x, y]
}

l8r
Sean

···

On Mar 13, 2017, at 2:15 PM, David Hart <david@hartbit.com> wrote:

On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution <swift-evolution@swift.org> wrote:

I’m dumb when it comes to proper hashing, but it’s such a tediously common thing in my experience to need to add Hashable to some kind of a struct so I can stash it in a set or use it as a dictionary key. Is there really no way to make this all more automatic? I have to be honest - this proposal seems *harder* to understand than the way it works now.

It's not really harder: just call hash on each of your type's significant values:

x.hash(&hasher)
y.hash(&hasher)

How would you implement hashValue in a simpler way, remembering that 'x ^ y' is an incorrect implementation?

Of course the easiest would be if the language could just do this “good enough" for me using reflection or whatever and if I really did run into a problem where I wanted to do this myself, I could override something.

Perfect is the enemy of good.

l8r
Sean

On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:

Hi there,

I've written up a proposal for an overhaul/replacement of the Hashable protocol and would love to hear your feedback on it!

Rendered | Blog Post

Cheers,
Vincent

Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback on this proposal. :+1:

HashVisitable

  • Proposal: SE-NNNN
  • Authors: Vincent Esche
  • Review Manager: TBD
  • Status: Awaiting review
Introduction

Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) to improve safety, versatility and learnability.

Motivation

Implementing Hashable is difficult and the consequences if not done well have dire performance and safety repercussions.

The documentation of Hashable lists a sample implementation of var hashValue:

/// A point in an x-y coordinate system.
struct GridPoint {
  var x: Int
  var y: Int
}

extension GridPoint: Hashable {
  var hashValue: Int {
      return x.hashValue ^ y.hashValue
  }

  static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
      return lhs.x == rhs.x && lhs.y == rhs.y
  }
}

Calculating the hashes of all GridPoints (given the above implementation) on a 1000 × 1000 grid …

let (width, height) = (1000, 1000)
let total = width * height
var hashes = Set<Int>()
for x in 0..<width {
  for y in 0..<height {
      hashes.insert(GridPoint(x: x, y: y).hashValue)
  }
}
print("\(hashes.count) unique hashes out of a total of \(total).")

… results in just 1024 unique hash values for 1_000_000 unique values.

In other words: The recommended implementation causes 99.9% of values to trigger a hash collision.

Out of those 1_000_000 values the median collision count was 976 with min and max being 976 and 1000respectively.

The collision rate will have negative impact in algorithms which heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it increases the vulnerability to DDOS attacks when exposed to the web.

If even the official Swift documentation gets the implementation of hashValue wrong, then who is to expect the average Swift programmer to do any better?

In contrast, running the same snippet using HashVisitable and the semi-secure Fnv1aHash (see below) results in zero collisions!

Finally, the design of the Hashable protocol forces the use of one implementation without the possibility of switching between multiple hashing algorithms.

Proposed solution

Instead of coupling the hashing algorithm with each and every Swift type, we should provide a hashing API based on the visitor-pattern. By freeing application developers from the burden of having to implement hashing algorithms, the Standard Library can provide default ones which fulfill collision, performance and security goals. Furthermore, it would allow developers to swap to different algorithms based on the use case.

Detailed design

The proposal deprecates the Hashable protocol and introduces the following two:

protocol Hasher
{

mutating func finish() -> Int

mutating func write(bytes
: UnsafeRawBufferPointer)
}

protocol HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H)
}

Hasher is the protocol which represents a hashing algorithm, and HashVisitable replaces Hashable. For types entirely represented by their memory layout, the following protocol would provide a default implementation:

protocol ContiguouslyHashable: HashVisitable {}

extension ContiguouslyHashable {
  func hash<H: Hasher>(_ hasher: inout H) {
      var mutableSelf = self
      try! Swift.withUnsafeBytes(of: &mutableSelf) {
          hasher.write(bytes: $0)
      }
  }
}

extension Bool : ContiguouslyHashable {}
extension UInt8 : ContiguouslyHashable {}
extension UInt16 : ContiguouslyHashable {}
extension UInt32 : ContiguouslyHashable {}
extension UInt64 : ContiguouslyHashable {}
extension UInt : ContiguouslyHashable {}
extension Int8 : ContiguouslyHashable {}
extension Int16 : ContiguouslyHashable {}
extension Int32 : ContiguouslyHashable {}
extension Int64 : ContiguouslyHashable {}
extension Int : ContiguouslyHashable {}

The Standard-Library would then provide a set of hashing implementations specific to each purpose. A possible choice for hashing algorithms would be the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8.

FNV-1A is another popular semi-secure but blazingly fast hash algorithm, which – for the sake of demonstration – could be implemented as follows:

struct Fnv1aHash
{

fileprivate var state: UInt

init(seed: UInt
) {

self.state = seed &+ 14695981039346656037

  }
}

extension Fnv1aHash: Hasher
{

mutating func write(bytes
: UnsafeRawBufferPointer) {

for byte in
bytes {

self.state = (self.state ^ UInt(byte)) &* 1099511628211

      }
  }

mutating func finish() -> Int
{

return unsafeBitCast(self.state, to: Int.self
)
  }
}

Coming back to the sample code present in the Hashable documentation, the new implementation would look like:

extension GridPoint: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.x.hash(&
hasher)

self.y.hash(&
hasher)
  }
}

Source compatibility

Making use of "extending protocols to conform to protocols":

extension Hashable: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.hashValue.hash(&
hasher)
  }
}

Effect on ABI stability

n/a

Effect on API resilience

This feature should be possible to add/remove without breaking ABI.

Alternatives considered

n/a
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Vincent Esche) #13

That sure looks like a nice addition. :+1:

I tried to keep the proposal as small as possible, as it already looks much
more complicated than it is.

···

On Mon, Mar 13, 2017 at 6:25 PM, Haravikk <swift-evolution@haravikk.me> wrote:

I really like the proposal!

Only thing I'm wondering is whether a convenience function on Hasher might
be nice to have, like so:

extension Hasher {
mutating func hash<H:Hashable>(_ hashable:H) -> Int {
hashable.hash(self)
return self.finish()
}
}

This just feels like it would be easier for the most common case where you
just need to hash a single value right away, as you'd just do:

let hashValue = hasher.hash(foo)

While leaving the inout style available for when you do need to combine
multiple values together or otherwise feed in data gradually. This way
we're not losing as much of the convenience of the .hashValue that's being
deprecated. On that note, should Hashable be updated to do something like
the above as well, so it still immediately returns a value from a default
hasher?

Anyway, definitely in favour of this change, as you're right; .hashValue
is a hard thing to do right, and is often an after-thought, which can make
it of limited usefulness to types that depend upon it!

On 13 Mar 2017, at 15:38, Vincent Esche via swift-evolution < > swift-evolution@swift.org> wrote:

Hi there,

I've written up a proposal for an overhaul/replacement of the Hashable
protocol and would love to hear your feedback on it!

Rendered
<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25> | Blog
Post <https://blog.definiteloops.com/ha-r-sh-visitors-8c0c3686a46f>

Cheers,
Vincent

Ps: I'd like to thank David Hart (@hartbit) for his great editorial
feedback on this proposal. :+1:

HashVisitable

   - Proposal: SE-NNNN
   <https://gist.github.com/regexident/NNNN-filename.md>
   - Authors: Vincent Esche <https://github.com/regexident>
   - Review Manager: TBD
   - Status: Awaiting review

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#introduction>
Introduction

Replace the Hashable protocol by two new procotols (Hasher and
HashVisitable) to improve safety, versatility and learnability.

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#motivation>
Motivation

Implementing Hashable is difficult and the consequences if not done well
have dire performance and safety repercussions.

The documentation of Hashable lists a sample implementation
<https://developer.apple.com/reference/swift/hashable> of var hashValue:

/// A point in an x-y coordinate system.
struct GridPoint {
    var x: Int
    var y: Int
}

extension GridPoint: Hashable {
    var hashValue: Int {
        return x.hashValue ^ y.hashValue
    }

    static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
        return lhs.x == rhs.x && lhs.y == rhs.y
    }
}

Calculating the hashes of all GridPoints (given the above implementation)
on a 1000 × 1000 grid …

let (width, height) = (1000, 1000)
let total = width * height
var hashes = Set<Int>()
for x in 0..<width {
    for y in 0..<height {
        hashes.insert(GridPoint(x: x, y: y).hashValue)
    }
}
print("\(hashes.count) unique hashes out of a total of \(total).")

… results in just 1024 unique hash values for 1_000_000 unique values.

In other words: The recommended implementation causes 99.9% of values to
trigger a hash collision.

Out of those 1_000_000 values the median collision count was 976 with min
and max being 976 and 1000respectively.

The collision rate will have negative impact in algorithms which heavily
use hashValue like the ones in Dictionaryand Set. Furthermore, it
increases the vulnerability to DDOS attacks when exposed to the web
<https://arstechnica.com/business/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack/>
.

If even the official Swift documentation gets the implementation of
hashValue wrong, then who is to expect the average Swift programmer to do
any better?

In contrast, running the same snippet using HashVisitable and the
semi-secure Fnv1aHash (see below) results in zero collisions!

Finally, the design of the Hashable protocol forces the use of one
implementation without the possibility of switching between multiple
hashing algorithms.

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#proposed-solution>Proposed
solution

Instead of coupling the hashing algorithm with each and every Swift type,
we should provide a hashing API based on the visitor-pattern. By freeing
application developers from the burden of having to implement hashing
algorithms, the Standard Library can provide default ones which fulfill
collision, performance and security goals. Furthermore, it would allow
developers to swap to different algorithms based on the use case.

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#detailed-design>Detailed
design

The proposal deprecates the Hashable protocol and introduces the
following two:

protocol Hasher {
    mutating func finish() -> Int
    mutating func write(bytes: UnsafeRawBufferPointer)
}
protocol HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H)
}

Hasher is the protocol which represents a hashing algorithm, and
HashVisitable replaces Hashable. For types entirely represented by their
memory layout, the following protocol would provide a default
implementation:

protocol ContiguouslyHashable: HashVisitable {}

extension ContiguouslyHashable {
    func hash<H: Hasher>(_ hasher: inout H) {
        var mutableSelf = self
        try! Swift.withUnsafeBytes(of: &mutableSelf) {
            hasher.write(bytes: $0)
        }
    }
}

extension Bool : ContiguouslyHashable {}
extension UInt8 : ContiguouslyHashable {}
extension UInt16 : ContiguouslyHashable {}
extension UInt32 : ContiguouslyHashable {}
extension UInt64 : ContiguouslyHashable {}
extension UInt : ContiguouslyHashable {}
extension Int8 : ContiguouslyHashable {}
extension Int16 : ContiguouslyHashable {}
extension Int32 : ContiguouslyHashable {}
extension Int64 : ContiguouslyHashable {}
extension Int : ContiguouslyHashable {}

The Standard-Library would then provide a set of hashing implementations
specific to each purpose. A possible choice for hashing algorithms would be
the reasonably fast SipHash-2-4 <https://en.wikipedia.org/wiki/SipHash>,
and the reasonably secure SipHash-4-8
<https://en.wikipedia.org/wiki/SipHash>.

FNV-1A is another popular semi-secure but blazingly fast hash algorithm,
which – for the sake of demonstration – could be implemented as follows:

struct Fnv1aHash {
    fileprivate var state: UInt
    init(seed: UInt) {
        self.state = seed &+ 14695981039346656037
    }
}
extension Fnv1aHash: Hasher {
    mutating func write(bytes: UnsafeRawBufferPointer) {
        for byte in bytes {
            self.state = (self.state ^ UInt(byte)) &* 1099511628211
        }
    }
    mutating func finish() -> Int {
        return unsafeBitCast(self.state, to: Int.self)
    }
}

Coming back to the sample code present in the Hashable documentation, the
new implementation would look like:

extension GridPoint: HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H) {
        self.x.hash(&hasher)
        self.y.hash(&hasher)
    }
}

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#source-compatibility>Source
compatibility

Making use of "extending protocols to conform to protocols
<https://github.com/apple/swift-evolution/blob/d33c129f0920af0404f42219db56981411b20e76/proposals/0143-conditional-conformances.md#extending-protocols-to-conform-to-protocols>
":

extension Hashable: HashVisitable {
    func hash<H: Hasher>(_ hasher: inout H) {
        self.hashValue.hash(&hasher)
    }
}

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#effect-on-abi-stability>Effect
on ABI stability

n/a

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#effect-on-api-resilience>Effect
on API resilience

This feature should be possible to add/remove without breaking ABI.

<https://gist.github.com/regexident/1b8e84974da2243e5199e760508d2d25#alternatives-considered>Alternatives
considered
n/a
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Tony Allevato) #14

I'm not convinced this is the right approach: it couples the fact that a
type is hashable with a specific strategy for implementing the hash value
computation.

Instead, the language should make efforts to avoid requiring the user to
think about hashing algorithms *at all*; to answer Sean's question a couple
messages up, I've proposed in the past
<https://gist.github.com/allevato/2fd10290bfa84accfbe977d8ac07daad> (rough
working draft) adding derivation of Equatable and Hashable for structs and
enums where it's possible to automatically do so (for example, if all of
the enum's associated values are Equatable/Hashable). I've been
experimenting with an implementation in the compiler and I have something
that works for enums, except for recursive ones. I haven't started structs
yet because I think there are more open questions there, and I hope to
propose enums and structs independently so that the enum one doesn't get
bogged down by the struct one.

The core team seems to be aware of the need for this; the logic that
derives Equatable/Hashable for enums without associated values also has
TODO comments to handle those with associated values, and to handle structs
as well. Slava Pestov mentioned a while back that they encouraged PRs on
it, which is why I started, but my free time has been limited lately.

That being said, I *do* think there would be value in having some kind of
hash "mixer" as part of the standard library and strongly encouraging
through documentation that hashValue implementors use it. Getting the
function right is the hard part, and if Swift both (1) took care of it for
you by default in many cases and (2) made the harder cases easier by
providing a high quality canned implementation, I think we'd have a much
cleaner solution than what is being proposed here.

···

On Mon, Mar 13, 2017 at 12:18 PM Sean Heber via swift-evolution < swift-evolution@swift.org> wrote:

Is there any reason the API couldn’t be expressed as something along these
lines?

func hash() -> [Hashable] {
  return [x, y]
}

l8r
Sean

> On Mar 13, 2017, at 2:15 PM, David Hart <david@hartbit.com> wrote:
>
>>
>> On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution < > swift-evolution@swift.org> wrote:
>>
>> I’m dumb when it comes to proper hashing, but it’s such a tediously
common thing in my experience to need to add Hashable to some kind of a
struct so I can stash it in a set or use it as a dictionary key. Is there
really no way to make this all more automatic? I have to be honest - this
proposal seems *harder* to understand than the way it works now.
>
> It's not really harder: just call hash on each of your type's
significant values:
>
> x.hash(&hasher)
> y.hash(&hasher)
>
> How would you implement hashValue in a simpler way, remembering that 'x
^ y' is an incorrect implementation?
>
>> Of course the easiest would be if the language could just do this “good
enough" for me using reflection or whatever and if I really did run into a
problem where I wanted to do this myself, I could override something.
>>
>> Perfect is the enemy of good.
>>
>> l8r
>> Sean
>>
>>
>>> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution < > swift-evolution@swift.org> wrote:
>>>
>>> Hi there,
>>>
>>> I've written up a proposal for an overhaul/replacement of the Hashable
protocol and would love to hear your feedback on it!
>>>
>>> Rendered | Blog Post
>>>
>>> Cheers,
>>> Vincent
>>>
>>> Ps: I'd like to thank David Hart (@hartbit) for his great editorial
feedback on this proposal. :+1:
>>>
>>> HashVisitable
>>>
>>> • Proposal: SE-NNNN
>>> • Authors: Vincent Esche
>>> • Review Manager: TBD
>>> • Status: Awaiting review
>>> Introduction
>>>
>>> Replace the Hashable protocol by two new procotols (Hasher and
HashVisitable) to improve safety, versatility and learnability.
>>>
>>> Motivation
>>>
>>> Implementing Hashable is difficult and the consequences if not done
well have dire performance and safety repercussions.
>>>
>>> The documentation of Hashable lists a sample implementation of var
hashValue:
>>>
>>> /// A point in an x-y coordinate system.
>>> struct GridPoint {
>>> var x: Int
>>> var y: Int
>>> }
>>>
>>> extension GridPoint: Hashable {
>>> var hashValue: Int {
>>> return x.hashValue ^ y.hashValue
>>> }
>>>
>>> static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
>>> return lhs.x == rhs.x && lhs.y == rhs.y
>>> }
>>> }
>>>
>>> Calculating the hashes of all GridPoints (given the above
implementation) on a 1000 × 1000 grid …
>>>
>>> let (width, height) = (1000, 1000)
>>> let total = width * height
>>> var hashes = Set<Int>()
>>> for x in 0..<width {
>>> for y in 0..<height {
>>> hashes.insert(GridPoint(x: x, y: y).hashValue)
>>> }
>>> }
>>> print("\(hashes.count) unique hashes out of a total of \(total).")
>>>
>>> … results in just 1024 unique hash values for 1_000_000 unique values.
>>>
>>> In other words: The recommended implementation causes 99.9% of values
to trigger a hash collision.
>>>
>>> Out of those 1_000_000 values the median collision count was 976 with
min and max being 976 and 1000respectively.
>>>
>>> The collision rate will have negative impact in algorithms which
heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it
increases the vulnerability to DDOS attacks when exposed to the web.
>>>
>>> If even the official Swift documentation gets the implementation of
hashValue wrong, then who is to expect the average Swift programmer to do
any better?
>>>
>>> In contrast, running the same snippet using HashVisitable and the
semi-secure Fnv1aHash (see below) results in zero collisions!
>>>
>>> Finally, the design of the Hashable protocol forces the use of one
implementation without the possibility of switching between multiple
hashing algorithms.
>>>
>>> Proposed solution
>>>
>>> Instead of coupling the hashing algorithm with each and every Swift
type, we should provide a hashing API based on the visitor-pattern. By
freeing application developers from the burden of having to implement
hashing algorithms, the Standard Library can provide default ones which
fulfill collision, performance and security goals. Furthermore, it would
allow developers to swap to different algorithms based on the use case.
>>>
>>> Detailed design
>>>
>>> The proposal deprecates the Hashable protocol and introduces the
following two:
>>>
>>> protocol Hasher
>>> {
>>>
>>> mutating func finish() -> Int
>>>
>>>
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer)
>>> }
>>>
>>>
>>> protocol HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H)
>>> }
>>>
>>> Hasher is the protocol which represents a hashing algorithm, and
HashVisitable replaces Hashable. For types entirely represented by their
memory layout, the following protocol would provide a default
implementation:
>>>
>>> protocol ContiguouslyHashable: HashVisitable {}
>>>
>>> extension ContiguouslyHashable {
>>> func hash<H: Hasher>(_ hasher: inout H) {
>>> var mutableSelf = self
>>> try! Swift.withUnsafeBytes(of: &mutableSelf) {
>>> hasher.write(bytes: $0)
>>> }
>>> }
>>> }
>>>
>>> extension Bool : ContiguouslyHashable {}
>>> extension UInt8 : ContiguouslyHashable {}
>>> extension UInt16 : ContiguouslyHashable {}
>>> extension UInt32 : ContiguouslyHashable {}
>>> extension UInt64 : ContiguouslyHashable {}
>>> extension UInt : ContiguouslyHashable {}
>>> extension Int8 : ContiguouslyHashable {}
>>> extension Int16 : ContiguouslyHashable {}
>>> extension Int32 : ContiguouslyHashable {}
>>> extension Int64 : ContiguouslyHashable {}
>>> extension Int : ContiguouslyHashable {}
>>>
>>> The Standard-Library would then provide a set of hashing
implementations specific to each purpose. A possible choice for hashing
algorithms would be the reasonably fast SipHash-2-4, and the reasonably
secure SipHash-4-8.
>>>
>>> FNV-1A is another popular semi-secure but blazingly fast hash
algorithm, which – for the sake of demonstration – could be implemented as
follows:
>>>
>>> struct Fnv1aHash
>>> {
>>>
>>> fileprivate var state: UInt
>>>
>>>
>>> init(seed: UInt
>>> ) {
>>>
>>> self.state = seed &+ 14695981039346656037
>>>
>>> }
>>> }
>>>
>>>
>>> extension Fnv1aHash: Hasher
>>> {
>>>
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer) {
>>>
>>> for byte in
>>> bytes {
>>>
>>> self.state = (self.state ^ UInt(byte)) &* 1099511628211
>>>
>>> }
>>> }
>>>
>>> mutating func finish() -> Int
>>> {
>>>
>>> return unsafeBitCast(self.state, to: Int.self
>>> )
>>> }
>>> }
>>>
>>> Coming back to the sample code present in the Hashable documentation,
the new implementation would look like:
>>>
>>> extension GridPoint: HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>>
>>> self.x.hash(&
>>> hasher)
>>>
>>> self.y.hash(&
>>> hasher)
>>> }
>>> }
>>>
>>> Source compatibility
>>>
>>> Making use of "extending protocols to conform to protocols":
>>>
>>> extension Hashable: HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>>
>>> self.hashValue.hash(&
>>> hasher)
>>> }
>>> }
>>>
>>> Effect on ABI stability
>>>
>>> n/a
>>>
>>> Effect on API resilience
>>>
>>> This feature should be possible to add/remove without breaking ABI.
>>>
>>> Alternatives considered
>>>
>>> n/a
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution@swift.org
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>
>> _______________________________________________
>> swift-evolution mailing list
>> swift-evolution@swift.org
>> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(David Hart) #15

While I understand and agree with the goal of the proposal, I too feel like it’s more difficult to understand than the current approach. HashVisitable looks odd, and one would have to understand the role and API of the Hasher protocol.

I would argue that if Swift provided a good default hashing API, then we could simply call it when computing the hashValue.

struct GridPoint: Hashable {
  var x: Int
  var y: Int
  var hashValue: Int {
    let hasher = Swift.DefaultHasher()
    hasher.hash(self.x)
    hasher.hash(self.y)
    return hasher.value
  }
  static func == (lhs: GridPoint, rhs: GridPoint) -> Bool { /* … */ }
}

It might seem wordy, but it would allow one to leverage good hashing algorithms, while not forcing people who do have a good hashing function for their particular type to define their own hasher.

This is exactly what this proposal is proposing, with a slightly different API that allows you to more easily handle subtypes.

What in the current proposal is more complicated to you than what you propose?

···

Sent from my iPhone

On 13 Mar 2017, at 19:30, Dimitri Racordon via swift-evolution <swift-evolution@swift.org> wrote:

On an unrelated note, maybe I missed something but I think HashVisitable would still need to conform to Equatable as well, if it were to replace Hashable. This is unclear in your proposal, as the equality operator is present in the motivational example, but absent from the proposed solution.

On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution <swift-evolution@swift.org> wrote:

I’m dumb when it comes to proper hashing, but it’s such a tediously common thing in my experience to need to add Hashable to some kind of a struct so I can stash it in a set or use it as a dictionary key. Is there really no way to make this all more automatic? I have to be honest - this proposal seems *harder* to understand than the way it works now. Of course the easiest would be if the language could just do this “good enough" for me using reflection or whatever and if I really did run into a problem where I wanted to do this myself, I could override something.

Perfect is the enemy of good.

l8r
Sean

On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:

Hi there,

I've written up a proposal for an overhaul/replacement of the Hashable protocol and would love to hear your feedback on it!

Rendered | Blog Post

Cheers,
Vincent

Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback on this proposal. :+1:

HashVisitable

• Proposal: SE-NNNN
• Authors: Vincent Esche
• Review Manager: TBD
• Status: Awaiting review
Introduction

Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) to improve safety, versatility and learnability.

Motivation

Implementing Hashable is difficult and the consequences if not done well have dire performance and safety repercussions.

The documentation of Hashable lists a sample implementation of var hashValue:

/// A point in an x-y coordinate system.
struct GridPoint {
   var x: Int
   var y: Int
}

extension GridPoint: Hashable {
   var hashValue: Int {
       return x.hashValue ^ y.hashValue
   }

   static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
       return lhs.x == rhs.x && lhs.y == rhs.y
   }
}

Calculating the hashes of all GridPoints (given the above implementation) on a 1000 × 1000 grid …

let (width, height) = (1000, 1000)
let total = width * height
var hashes = Set<Int>()
for x in 0..<width {
   for y in 0..<height {
       hashes.insert(GridPoint(x: x, y: y).hashValue)
   }
}
print("\(hashes.count) unique hashes out of a total of \(total).")

… results in just 1024 unique hash values for 1_000_000 unique values.

In other words: The recommended implementation causes 99.9% of values to trigger a hash collision.

Out of those 1_000_000 values the median collision count was 976 with min and max being 976 and 1000respectively.

The collision rate will have negative impact in algorithms which heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it increases the vulnerability to DDOS attacks when exposed to the web.

If even the official Swift documentation gets the implementation of hashValue wrong, then who is to expect the average Swift programmer to do any better?

In contrast, running the same snippet using HashVisitable and the semi-secure Fnv1aHash (see below) results in zero collisions!

Finally, the design of the Hashable protocol forces the use of one implementation without the possibility of switching between multiple hashing algorithms.

Proposed solution

Instead of coupling the hashing algorithm with each and every Swift type, we should provide a hashing API based on the visitor-pattern. By freeing application developers from the burden of having to implement hashing algorithms, the Standard Library can provide default ones which fulfill collision, performance and security goals. Furthermore, it would allow developers to swap to different algorithms based on the use case.

Detailed design

The proposal deprecates the Hashable protocol and introduces the following two:

protocol Hasher
{

mutating func finish() -> Int

mutating func write(bytes
: UnsafeRawBufferPointer)
}

protocol HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H)
}

Hasher is the protocol which represents a hashing algorithm, and HashVisitable replaces Hashable. For types entirely represented by their memory layout, the following protocol would provide a default implementation:

protocol ContiguouslyHashable: HashVisitable {}

extension ContiguouslyHashable {
   func hash<H: Hasher>(_ hasher: inout H) {
       var mutableSelf = self
       try! Swift.withUnsafeBytes(of: &mutableSelf) {
           hasher.write(bytes: $0)
       }
   }
}

extension Bool : ContiguouslyHashable {}
extension UInt8 : ContiguouslyHashable {}
extension UInt16 : ContiguouslyHashable {}
extension UInt32 : ContiguouslyHashable {}
extension UInt64 : ContiguouslyHashable {}
extension UInt : ContiguouslyHashable {}
extension Int8 : ContiguouslyHashable {}
extension Int16 : ContiguouslyHashable {}
extension Int32 : ContiguouslyHashable {}
extension Int64 : ContiguouslyHashable {}
extension Int : ContiguouslyHashable {}

The Standard-Library would then provide a set of hashing implementations specific to each purpose. A possible choice for hashing algorithms would be the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8.

FNV-1A is another popular semi-secure but blazingly fast hash algorithm, which – for the sake of demonstration – could be implemented as follows:

struct Fnv1aHash
{

fileprivate var state: UInt

init(seed: UInt
) {

self.state = seed &+ 14695981039346656037

   }
}

extension Fnv1aHash: Hasher
{

mutating func write(bytes
: UnsafeRawBufferPointer) {

for byte in
bytes {

self.state = (self.state ^ UInt(byte)) &* 1099511628211

       }
   }

mutating func finish() -> Int
{

return unsafeBitCast(self.state, to: Int.self
)
   }
}

Coming back to the sample code present in the Hashable documentation, the new implementation would look like:

extension GridPoint: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.x.hash(&
hasher)

self.y.hash(&
hasher)
   }
}

Source compatibility

Making use of "extending protocols to conform to protocols":

extension Hashable: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.hashValue.hash(&
hasher)
   }
}

Effect on ABI stability

n/a

Effect on API resilience

This feature should be possible to add/remove without breaking ABI.

Alternatives considered

n/a
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(David Hart) #16

Is there any reason the API couldn’t be expressed as something along these lines?

func hash() -> [Hashable] {
return [x, y]
}

From a pure performance standpoint, it forces the creation of an array. But if you really want something like that, you can define:

extension Hasher {
    mutating func write(_ visitables: [HashVisitable]) {
        for visitable in visitables {
            visitable.hash(&self)
        }
    }
}

So you can do:

func hash<H: Hasher>(_ hasher: inout H) {
    hasher.write([x, y])
}

···

On 13 Mar 2017, at 20:18, Sean Heber <sean@fifthace.com> wrote:

l8r
Sean

On Mar 13, 2017, at 2:15 PM, David Hart <david@hartbit.com> wrote:

On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution <swift-evolution@swift.org> wrote:

I’m dumb when it comes to proper hashing, but it’s such a tediously common thing in my experience to need to add Hashable to some kind of a struct so I can stash it in a set or use it as a dictionary key. Is there really no way to make this all more automatic? I have to be honest - this proposal seems *harder* to understand than the way it works now.

It's not really harder: just call hash on each of your type's significant values:

x.hash(&hasher)
y.hash(&hasher)

How would you implement hashValue in a simpler way, remembering that 'x ^ y' is an incorrect implementation?

Of course the easiest would be if the language could just do this “good enough" for me using reflection or whatever and if I really did run into a problem where I wanted to do this myself, I could override something.

Perfect is the enemy of good.

l8r
Sean

On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution <swift-evolution@swift.org> wrote:

Hi there,

I've written up a proposal for an overhaul/replacement of the Hashable protocol and would love to hear your feedback on it!

Rendered | Blog Post

Cheers,
Vincent

Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback on this proposal. :+1:

HashVisitable

• Proposal: SE-NNNN
• Authors: Vincent Esche
• Review Manager: TBD
• Status: Awaiting review
Introduction

Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) to improve safety, versatility and learnability.

Motivation

Implementing Hashable is difficult and the consequences if not done well have dire performance and safety repercussions.

The documentation of Hashable lists a sample implementation of var hashValue:

/// A point in an x-y coordinate system.
struct GridPoint {
var x: Int
var y: Int
}

extension GridPoint: Hashable {
var hashValue: Int {
     return x.hashValue ^ y.hashValue
}

static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
     return lhs.x == rhs.x && lhs.y == rhs.y
}
}

Calculating the hashes of all GridPoints (given the above implementation) on a 1000 × 1000 grid …

let (width, height) = (1000, 1000)
let total = width * height
var hashes = Set<Int>()
for x in 0..<width {
for y in 0..<height {
     hashes.insert(GridPoint(x: x, y: y).hashValue)
}
}
print("\(hashes.count) unique hashes out of a total of \(total).")

… results in just 1024 unique hash values for 1_000_000 unique values.

In other words: The recommended implementation causes 99.9% of values to trigger a hash collision.

Out of those 1_000_000 values the median collision count was 976 with min and max being 976 and 1000respectively.

The collision rate will have negative impact in algorithms which heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it increases the vulnerability to DDOS attacks when exposed to the web.

If even the official Swift documentation gets the implementation of hashValue wrong, then who is to expect the average Swift programmer to do any better?

In contrast, running the same snippet using HashVisitable and the semi-secure Fnv1aHash (see below) results in zero collisions!

Finally, the design of the Hashable protocol forces the use of one implementation without the possibility of switching between multiple hashing algorithms.

Proposed solution

Instead of coupling the hashing algorithm with each and every Swift type, we should provide a hashing API based on the visitor-pattern. By freeing application developers from the burden of having to implement hashing algorithms, the Standard Library can provide default ones which fulfill collision, performance and security goals. Furthermore, it would allow developers to swap to different algorithms based on the use case.

Detailed design

The proposal deprecates the Hashable protocol and introduces the following two:

protocol Hasher
{

mutating func finish() -> Int

mutating func write(bytes
: UnsafeRawBufferPointer)
}

protocol HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H)
}

Hasher is the protocol which represents a hashing algorithm, and HashVisitable replaces Hashable. For types entirely represented by their memory layout, the following protocol would provide a default implementation:

protocol ContiguouslyHashable: HashVisitable {}

extension ContiguouslyHashable {
func hash<H: Hasher>(_ hasher: inout H) {
     var mutableSelf = self
     try! Swift.withUnsafeBytes(of: &mutableSelf) {
         hasher.write(bytes: $0)
     }
}
}

extension Bool : ContiguouslyHashable {}
extension UInt8 : ContiguouslyHashable {}
extension UInt16 : ContiguouslyHashable {}
extension UInt32 : ContiguouslyHashable {}
extension UInt64 : ContiguouslyHashable {}
extension UInt : ContiguouslyHashable {}
extension Int8 : ContiguouslyHashable {}
extension Int16 : ContiguouslyHashable {}
extension Int32 : ContiguouslyHashable {}
extension Int64 : ContiguouslyHashable {}
extension Int : ContiguouslyHashable {}

The Standard-Library would then provide a set of hashing implementations specific to each purpose. A possible choice for hashing algorithms would be the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8.

FNV-1A is another popular semi-secure but blazingly fast hash algorithm, which – for the sake of demonstration – could be implemented as follows:

struct Fnv1aHash
{

fileprivate var state: UInt

init(seed: UInt
) {

self.state = seed &+ 14695981039346656037

}
}

extension Fnv1aHash: Hasher
{

mutating func write(bytes
: UnsafeRawBufferPointer) {

for byte in
bytes {

self.state = (self.state ^ UInt(byte)) &* 1099511628211

     }
}

mutating func finish() -> Int
{

return unsafeBitCast(self.state, to: Int.self
)
}
}

Coming back to the sample code present in the Hashable documentation, the new implementation would look like:

extension GridPoint: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.x.hash(&
hasher)

self.y.hash(&
hasher)
}
}

Source compatibility

Making use of "extending protocols to conform to protocols":

extension Hashable: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.hashValue.hash(&
hasher)
}
}

Effect on ABI stability

n/a

Effect on API resilience

This feature should be possible to add/remove without breaking ABI.

Alternatives considered

n/a
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(David Hart) #17

I'm not convinced this is the right approach: it couples the fact that a type is hashable with a specific strategy for implementing the hash value computation.

I’m confused. This proposal specifically decouples the implementation of hashable from hashing strategies: that’s its main goal. Could you explain how you see coupling here?

Instead, the language should make efforts to avoid requiring the user to think about hashing algorithms *at all*; to answer Sean's question a couple messages up, I've proposed in the past <https://gist.github.com/allevato/2fd10290bfa84accfbe977d8ac07daad> (rough working draft) adding derivation of Equatable and Hashable for structs and enums where it's possible to automatically do so (for example, if all of the enum's associated values are Equatable/Hashable).

This proposal does so with the ContinuouslyHashable protocol, which provides a default implementation of hashable that works for enums and structs that have “inclusive” memory layouts. I think this is good thing, because it means you have to make a conscious choice to use ContinuouslyHashable instead of relying on an automatic implementation which will sometimes be wrong.

···

On 13 Mar 2017, at 20:51, Tony Allevato <tony.allevato@gmail.com> wrote:

I've been experimenting with an implementation in the compiler and I have something that works for enums, except for recursive ones. I haven't started structs yet because I think there are more open questions there, and I hope to propose enums and structs independently so that the enum one doesn't get bogged down by the struct one.

The core team seems to be aware of the need for this; the logic that derives Equatable/Hashable for enums without associated values also has TODO comments to handle those with associated values, and to handle structs as well. Slava Pestov mentioned a while back that they encouraged PRs on it, which is why I started, but my free time has been limited lately.

That being said, I *do* think there would be value in having some kind of hash "mixer" as part of the standard library and strongly encouraging through documentation that hashValue implementors use it. Getting the function right is the hard part, and if Swift both (1) took care of it for you by default in many cases and (2) made the harder cases easier by providing a high quality canned implementation, I think we'd have a much cleaner solution than what is being proposed here.

On Mon, Mar 13, 2017 at 12:18 PM Sean Heber via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
Is there any reason the API couldn’t be expressed as something along these lines?

func hash() -> [Hashable] {
  return [x, y]
}

l8r
Sean

> On Mar 13, 2017, at 2:15 PM, David Hart <david@hartbit.com <mailto:david@hartbit.com>> wrote:
>
>>
>> On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
>>
>> I’m dumb when it comes to proper hashing, but it’s such a tediously common thing in my experience to need to add Hashable to some kind of a struct so I can stash it in a set or use it as a dictionary key. Is there really no way to make this all more automatic? I have to be honest - this proposal seems *harder* to understand than the way it works now.
>
> It's not really harder: just call hash on each of your type's significant values:
>
> x.hash(&hasher)
> y.hash(&hasher)
>
> How would you implement hashValue in a simpler way, remembering that 'x ^ y' is an incorrect implementation?
>
>> Of course the easiest would be if the language could just do this “good enough" for me using reflection or whatever and if I really did run into a problem where I wanted to do this myself, I could override something.
>>
>> Perfect is the enemy of good.
>>
>> l8r
>> Sean
>>
>>
>>> On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
>>>
>>> Hi there,
>>>
>>> I've written up a proposal for an overhaul/replacement of the Hashable protocol and would love to hear your feedback on it!
>>>
>>> Rendered | Blog Post
>>>
>>> Cheers,
>>> Vincent
>>>
>>> Ps: I'd like to thank David Hart (@hartbit) for his great editorial feedback on this proposal. :+1:
>>>
>>> HashVisitable
>>>
>>> • Proposal: SE-NNNN
>>> • Authors: Vincent Esche
>>> • Review Manager: TBD
>>> • Status: Awaiting review
>>> Introduction
>>>
>>> Replace the Hashable protocol by two new procotols (Hasher and HashVisitable) to improve safety, versatility and learnability.
>>>
>>> Motivation
>>>
>>> Implementing Hashable is difficult and the consequences if not done well have dire performance and safety repercussions.
>>>
>>> The documentation of Hashable lists a sample implementation of var hashValue:
>>>
>>> /// A point in an x-y coordinate system.
>>> struct GridPoint {
>>> var x: Int
>>> var y: Int
>>> }
>>>
>>> extension GridPoint: Hashable {
>>> var hashValue: Int {
>>> return x.hashValue ^ y.hashValue
>>> }
>>>
>>> static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
>>> return lhs.x == rhs.x && lhs.y == rhs.y
>>> }
>>> }
>>>
>>> Calculating the hashes of all GridPoints (given the above implementation) on a 1000 × 1000 grid …
>>>
>>> let (width, height) = (1000, 1000)
>>> let total = width * height
>>> var hashes = Set<Int>()
>>> for x in 0..<width {
>>> for y in 0..<height {
>>> hashes.insert(GridPoint(x: x, y: y).hashValue)
>>> }
>>> }
>>> print("\(hashes.count) unique hashes out of a total of \(total).")
>>>
>>> … results in just 1024 unique hash values for 1_000_000 unique values.
>>>
>>> In other words: The recommended implementation causes 99.9% of values to trigger a hash collision.
>>>
>>> Out of those 1_000_000 values the median collision count was 976 with min and max being 976 and 1000respectively.
>>>
>>> The collision rate will have negative impact in algorithms which heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it increases the vulnerability to DDOS attacks when exposed to the web.
>>>
>>> If even the official Swift documentation gets the implementation of hashValue wrong, then who is to expect the average Swift programmer to do any better?
>>>
>>> In contrast, running the same snippet using HashVisitable and the semi-secure Fnv1aHash (see below) results in zero collisions!
>>>
>>> Finally, the design of the Hashable protocol forces the use of one implementation without the possibility of switching between multiple hashing algorithms.
>>>
>>> Proposed solution
>>>
>>> Instead of coupling the hashing algorithm with each and every Swift type, we should provide a hashing API based on the visitor-pattern. By freeing application developers from the burden of having to implement hashing algorithms, the Standard Library can provide default ones which fulfill collision, performance and security goals. Furthermore, it would allow developers to swap to different algorithms based on the use case.
>>>
>>> Detailed design
>>>
>>> The proposal deprecates the Hashable protocol and introduces the following two:
>>>
>>> protocol Hasher
>>> {
>>>
>>> mutating func finish() -> Int
>>>
>>>
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer)
>>> }
>>>
>>>
>>> protocol HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H)
>>> }
>>>
>>> Hasher is the protocol which represents a hashing algorithm, and HashVisitable replaces Hashable. For types entirely represented by their memory layout, the following protocol would provide a default implementation:
>>>
>>> protocol ContiguouslyHashable: HashVisitable {}
>>>
>>> extension ContiguouslyHashable {
>>> func hash<H: Hasher>(_ hasher: inout H) {
>>> var mutableSelf = self
>>> try! Swift.withUnsafeBytes(of: &mutableSelf) {
>>> hasher.write(bytes: $0)
>>> }
>>> }
>>> }
>>>
>>> extension Bool : ContiguouslyHashable {}
>>> extension UInt8 : ContiguouslyHashable {}
>>> extension UInt16 : ContiguouslyHashable {}
>>> extension UInt32 : ContiguouslyHashable {}
>>> extension UInt64 : ContiguouslyHashable {}
>>> extension UInt : ContiguouslyHashable {}
>>> extension Int8 : ContiguouslyHashable {}
>>> extension Int16 : ContiguouslyHashable {}
>>> extension Int32 : ContiguouslyHashable {}
>>> extension Int64 : ContiguouslyHashable {}
>>> extension Int : ContiguouslyHashable {}
>>>
>>> The Standard-Library would then provide a set of hashing implementations specific to each purpose. A possible choice for hashing algorithms would be the reasonably fast SipHash-2-4, and the reasonably secure SipHash-4-8.
>>>
>>> FNV-1A is another popular semi-secure but blazingly fast hash algorithm, which – for the sake of demonstration – could be implemented as follows:
>>>
>>> struct Fnv1aHash
>>> {
>>>
>>> fileprivate var state: UInt
>>>
>>>
>>> init(seed: UInt
>>> ) {
>>>
>>> self.state = seed &+ 14695981039346656037
>>>
>>> }
>>> }
>>>
>>>
>>> extension Fnv1aHash: Hasher
>>> {
>>>
>>> mutating func write(bytes
>>> : UnsafeRawBufferPointer) {
>>>
>>> for byte in
>>> bytes {
>>>
>>> self.state = (self.state ^ UInt(byte)) &* 1099511628211
>>>
>>> }
>>> }
>>>
>>> mutating func finish() -> Int
>>> {
>>>
>>> return unsafeBitCast(self.state, to: Int.self
>>> )
>>> }
>>> }
>>>
>>> Coming back to the sample code present in the Hashable documentation, the new implementation would look like:
>>>
>>> extension GridPoint: HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>>
>>> self.x.hash(&
>>> hasher)
>>>
>>> self.y.hash(&
>>> hasher)
>>> }
>>> }
>>>
>>> Source compatibility
>>>
>>> Making use of "extending protocols to conform to protocols":
>>>
>>> extension Hashable: HashVisitable
>>> {
>>>
>>> func hash<H: Hasher>(_ hasher: inout
>>> H) {
>>>
>>> self.hashValue.hash(&
>>> hasher)
>>> }
>>> }
>>>
>>> Effect on ABI stability
>>>
>>> n/a
>>>
>>> Effect on API resilience
>>>
>>> This feature should be possible to add/remove without breaking ABI.
>>>
>>> Alternatives considered
>>>
>>> n/a
>>> _______________________________________________
>>> swift-evolution mailing list
>>> swift-evolution@swift.org <mailto:swift-evolution@swift.org>
>>> https://lists.swift.org/mailman/listinfo/swift-evolution
>>
>> _______________________________________________
>> swift-evolution mailing list
>> swift-evolution@swift.org <mailto:swift-evolution@swift.org>
>> https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution


(Tony Allevato) #18

Adding the list back to this reply because I don't believe you meant to
reply only to me, and I think it's an important discussion to have :slight_smile:

Automatic generation of Hashable implementation code only "solves" the
problem of having to implement `var hashValue`.
It however only works for some types. By no means for all.

Certainly, and I never suggested that it would work for every case. I was
responding to Sean's point that the compiler should be able to do "good
enough" in the common cases and I offered a way that that can be achieved.

Take this hypothetical implementation of a HSB color:

struct Color {
let hue: UInt8
let saturation: UInt8
let brightness: UInt8
}

Let the semantic of this type be this:
Any two colors with a `brightness` of `0` are to be considered equal
regardless of their respective `hue` or `saturation`. At night all cats are
gray.

An auto-generated conformance impl for `Color` would most likely be wrong
afaict.
And those kind of semantics are everywhere.

Of course, and in those cases, you wouldn't want to use an auto-derived
Equatable or Hashable implementation. Those kinds of semantics are
"everywhere" for certain definitions of "everywhere", but similarly,
semantics where the hash value of a thing can be determined simply via
combination of the hash values of its components are also "everywhere".

I wouldn't suggest that auto-deriving Hashable solves *all* problems, and
similarly, your proposal doesn't help in the situation you described
either. Your API provides a different way to mix the hue, saturation, and
brightness components in an implementation of hashValue, but it doesn't
force the user to mix them *correctly* (by excluding hue/saturation if
brightness is zero).

So in both cases, users still have to provide custom implementations of ==
and hashValue. But without auto-deriving, users have to provide them even
for the cases where the auto-derived implementation *would have been
correct.* Auto-deriving is about reducing the number of types that need to
have custom implementations, thereby reducing the chance that a user will
do it wrong.

When considering a type with auto-derived Hashable and Equatable, there are
two ways it can be wrong:

1) The auto-generated implementations of both == and hashValue don't honor
the semantic contract of the type (for example, don't ignore brightness
when it's zero).
2) The user overrides the auto-generated implementation of one of
==/hashValue but not both, and violates the contracts between them.

For #1, yes, the compiler generated an incorrect implementation in that
case. However, I would argue it's the developer's responsibility to fix it
by overriding it if they need different semantics. As I mentioned above,
without derivation, the developer could still implement it incorrectly,
just as they could with your API.

For #2, this could be solved by requiring users to override both if they
override one of them. Now they're back in the same situation as #1—they
either did it right, or they did it wrong.

Auto-generation clearly is no panacea when it comes to hashing. Especially
if it leads to invalid and unsafe default implementations.

Auto-deriving cannot produce "unsafe" implementations because it's defined
in terms of a function operating over the hash values of its components. It
can produce an implementation that does not match the intended semantics of
the class that are defined outside of its component values, but that's not
the compiler's job to decide; it's up to the user to provide those.

Regarding unsafety, it's worth noting that your ContiguouslyHashable
protocol is inherently unsafe and fragile. Imagine that a user implements a
struct that they make conform to ContiguouslyHashable because at the time,
it's a simple fixed layout type with primitive fields. If they take that
type and add a new field to it that isn't contiguously hashable (say, a
class reference) and they forget to replace the ContiguouslyHashable
conformance, they now have a very broken type, and that behavior isn't
caught until *runtime* when things go wrong.

At least with derived conformances, in that situation the *compiler* would
see that the type was no longer hashable and emit an error when it's used
anywhere that Hashable/hashValue was expected.

So if safety is your motivation, ContiguouslyHashable kind of fights back
against that because it gives the user a door they can open—in some cases,
accidentally—that produces invalid results.

It would however be nice to be able to annotate types for which you want
HashVisitable implementations to be generated.

That's one of the still-open questions from the last discussion thread on
the topic; whether auto-deriving should be automatic for any type where it
is possible (as it is now for enums without associated values) or whether
the user should have to opt-in.

I actually don't see auto-generation as an alternative to a proper hashing
API. But rather as an addition.
HashVisitable and auto-deriving would actually work great together!

I completely agree that these ideas complement each other very well! And I
also agree that the language would do well to make it easier for users to
easily do the right thing. I just feel that *the API proposed here
specifically* adds too much complexity for the problem it's trying to solve.

I'd find it more compelling if it was simplified a bit:

* The standard library doesn't need to provide a number of hash
implementations; it should just provide one that works "well enough" in
most cases
* Hashable doesn't have tie itself to a visitor pattern. It's not
necessarily safer because users can still mix the wrong values; in that
case, I'd rather the stdlib implementation of the "good enough" hash just
be a standalone mixer that the language can encourage users to use.

In fact they already do in other languages <https://is.gd/iy5770>.

I'm not convinced this is the right approach: it couples the fact that a
type is hashable with a specific strategy for implementing the hash value
computation.

Instead, the language should make efforts to avoid requiring the user to
think about hashing algorithms *at all*; to answer Sean's question a couple
messages up, I've proposed in the past
<https://gist.github.com/allevato/2fd10290bfa84accfbe977d8ac07daad> (rough
working draft) adding derivation of Equatable and Hashable for structs and
enums where it's possible to automatically do so (for example, if all of
the enum's associated values are Equatable/Hashable). I've been
experimenting with an implementation in the compiler and I have something
that works for enums, except for recursive ones. I haven't started structs
yet because I think there are more open questions there, and I hope to
propose enums and structs independently so that the enum one doesn't get
bogged down by the struct one.

The core team seems to be aware of the need for this; the logic that
derives Equatable/Hashable for enums without associated values also has
TODO comments to handle those with associated values, and to handle structs
as well. Slava Pestov mentioned a while back that they encouraged PRs on
it, which is why I started, but my free time has been limited lately.

That being said, I *do* think there would be value in having some kind of
hash "mixer" as part of the standard library and strongly encouraging
through documentation that hashValue implementors use it. Getting the
function right is the hard part, and if Swift both (1) took care of it for
you by default in many cases and (2) made the harder cases easier by
providing a high quality canned implementation, I think we'd have a much
cleaner solution than what is being proposed here.

Is there any reason the API couldn’t be expressed as something along these
lines?

func hash() -> [Hashable] {
  return [x, y]
}

l8r
Sean

I’m dumb when it comes to proper hashing, but it’s such a tediously

common thing in my experience to need to add Hashable to some kind of a
struct so I can stash it in a set or use it as a dictionary key. Is there
really no way to make this all more automatic? I have to be honest - this
proposal seems *harder* to understand than the way it works now.

It's not really harder: just call hash on each of your type's significant

values:

x.hash(&hasher)
y.hash(&hasher)

How would you implement hashValue in a simpler way, remembering that 'x ^

y' is an incorrect implementation?

Of course the easiest would be if the language could just do this “good

enough" for me using reflection or whatever and if I really did run into a
problem where I wanted to do this myself, I could override something.

Perfect is the enemy of good.

l8r
Sean

Hi there,

I've written up a proposal for an overhaul/replacement of the Hashable

protocol and would love to hear your feedback on it!

Rendered | Blog Post

Cheers,
Vincent

Ps: I'd like to thank David Hart (@hartbit) for his great editorial

feedback on this proposal. :+1:

HashVisitable

  • Proposal: SE-NNNN
  • Authors: Vincent Esche
  • Review Manager: TBD
  • Status: Awaiting review
Introduction

Replace the Hashable protocol by two new procotols (Hasher and

HashVisitable) to improve safety, versatility and learnability.

Motivation

Implementing Hashable is difficult and the consequences if not done

well have dire performance and safety repercussions.

The documentation of Hashable lists a sample implementation of var

hashValue:

/// A point in an x-y coordinate system.
struct GridPoint {
  var x: Int
  var y: Int
}

extension GridPoint: Hashable {
  var hashValue: Int {
      return x.hashValue ^ y.hashValue
  }

  static func == (lhs: GridPoint, rhs: GridPoint) -> Bool {
      return lhs.x == rhs.x && lhs.y == rhs.y
  }
}

Calculating the hashes of all GridPoints (given the above

implementation) on a 1000 × 1000 grid …

let (width, height) = (1000, 1000)
let total = width * height
var hashes = Set<Int>()
for x in 0..<width {
  for y in 0..<height {
      hashes.insert(GridPoint(x: x, y: y).hashValue)
  }
}
print("\(hashes.count) unique hashes out of a total of \(total).")

… results in just 1024 unique hash values for 1_000_000 unique values.

In other words: The recommended implementation causes 99.9% of values

to trigger a hash collision.

Out of those 1_000_000 values the median collision count was 976 with

min and max being 976 and 1000respectively.

The collision rate will have negative impact in algorithms which

heavily use hashValue like the ones in Dictionaryand Set. Furthermore, it
increases the vulnerability to DDOS attacks when exposed to the web.

If even the official Swift documentation gets the implementation of

hashValue wrong, then who is to expect the average Swift programmer to do
any better?

In contrast, running the same snippet using HashVisitable and the

semi-secure Fnv1aHash (see below) results in zero collisions!

Finally, the design of the Hashable protocol forces the use of one

implementation without the possibility of switching between multiple
hashing algorithms.

Proposed solution

Instead of coupling the hashing algorithm with each and every Swift

type, we should provide a hashing API based on the visitor-pattern. By
freeing application developers from the burden of having to implement
hashing algorithms, the Standard Library can provide default ones which
fulfill collision, performance and security goals. Furthermore, it would
allow developers to swap to different algorithms based on the use case.

Detailed design

The proposal deprecates the Hashable protocol and introduces the

following two:

protocol Hasher
{

mutating func finish() -> Int

mutating func write(bytes
: UnsafeRawBufferPointer)
}

protocol HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H)
}

Hasher is the protocol which represents a hashing algorithm, and

HashVisitable replaces Hashable. For types entirely represented by their
memory layout, the following protocol would provide a default
implementation:

protocol ContiguouslyHashable: HashVisitable {}

extension ContiguouslyHashable {
  func hash<H: Hasher>(_ hasher: inout H) {
      var mutableSelf = self
      try! Swift.withUnsafeBytes(of: &mutableSelf) {
          hasher.write(bytes: $0)
      }
  }
}

extension Bool : ContiguouslyHashable {}
extension UInt8 : ContiguouslyHashable {}
extension UInt16 : ContiguouslyHashable {}
extension UInt32 : ContiguouslyHashable {}
extension UInt64 : ContiguouslyHashable {}
extension UInt : ContiguouslyHashable {}
extension Int8 : ContiguouslyHashable {}
extension Int16 : ContiguouslyHashable {}
extension Int32 : ContiguouslyHashable {}
extension Int64 : ContiguouslyHashable {}
extension Int : ContiguouslyHashable {}

The Standard-Library would then provide a set of hashing

implementations specific to each purpose. A possible choice for hashing
algorithms would be the reasonably fast SipHash-2-4, and the reasonably
secure SipHash-4-8.

FNV-1A is another popular semi-secure but blazingly fast hash

algorithm, which – for the sake of demonstration – could be implemented as
follows:

struct Fnv1aHash
{

fileprivate var state: UInt

init(seed: UInt
) {

self.state = seed &+ 14695981039346656037

  }
}

extension Fnv1aHash: Hasher
{

mutating func write(bytes
: UnsafeRawBufferPointer) {

for byte in
bytes {

self.state = (self.state ^ UInt(byte)) &* 1099511628211

      }
  }

mutating func finish() -> Int
{

return unsafeBitCast(self.state, to: Int.self
)
  }
}

Coming back to the sample code present in the Hashable documentation,

the new implementation would look like:

···

On Mon, Mar 13, 2017 at 4:32 PM Vincent Esche < regexident.mailinglists@gmail.com> wrote:
On Mon, Mar 13, 2017 at 8:51 PM, Tony Allevato via swift-evolution < swift-evolution@swift.org> wrote:
On Mon, Mar 13, 2017 at 12:18 PM Sean Heber via swift-evolution < swift-evolution@swift.org> wrote:

On Mar 13, 2017, at 2:15 PM, David Hart <david@hartbit.com> wrote:

On 13 Mar 2017, at 18:54, Sean Heber via swift-evolution < swift-evolution@swift.org> wrote:

On Mar 13, 2017, at 10:38 AM, Vincent Esche via swift-evolution < swift-evolution@swift.org> wrote:

extension GridPoint: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.x.hash(&
hasher)

self.y.hash(&
hasher)
  }
}

Source compatibility

Making use of "extending protocols to conform to protocols":

extension Hashable: HashVisitable
{

func hash<H: Hasher>(_ hasher: inout
H) {

self.hashValue.hash(&
hasher)
  }
}

Effect on ABI stability

n/a

Effect on API resilience

This feature should be possible to add/remove without breaking ABI.

Alternatives considered

n/a
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Vincent Esche) #19

Turns out I’m rather bad at mailing lists. Hit “reply” (vs. "reply all")
again.

Yes please! I’ve a package on GitHub to implement roughly the same thing.
I’ve been happily using it for months now, and I wouldn’t ever write a
hashValue implementation by hand again.

https://github.com/lorentey/SipHash

I think the fact that we’ve both come up with essentially the same API is
an interesting data point; it definitely underlines the need for a better
Hashable protocol.

It's not just us, actually. Rust does the very same thing
<https://doc.rust-lang.org/std/hash/>.
A very similar concept has been proposed
<https://isocpp.org/files/papers/n3980.html> and implemented
<https://github.com/HowardHinnant/hash_append> for C++ by Howard Hinnant.

Hm. It is possible I may have been influenced by one of these.

My comments:

* In an ideal world, this would be a replacement for Hashable, not a
protocol with a different name. Once visitor-based hashing becomes
available, nobody should implement a hashValue property.

Same here. I fear however that turning a protocol upside down would do
more harm than good.

* All standard Hashable types in standard library should implement the
new hash function directly, rather than relying on the default
implementation.

Yes.

* Why is the HashVisitable.hash a generic function? Hasher could just as

well be a concrete type defined in stdlib. Making it generic may have some
performance implications.

Because we specifically don't want to
<https://blog.definiteloops.com/ha-r-sh-visitors-8c0c3686a46f#.6emnc0d5x>
be limited to a single hash implementation.

Allowing custom hashers indeed sounds nice in theory, but are there any
more practical examples where you'd want to actually do that? Are Bloom
Filters an important enough use case on their own to justify the extra
complexity? A concrete Hasher type would still allow stdlib to switch to a
different hash algorithm; this is already a long way beyond what people are
typically doing with Hashable.)

Bloom Filters are – while very interesting – an edge case, admittedly.
Being able to choose a different hasher for certain Dictionaries, Sets, …
however is a key goal for me. You want secure defaults, yet be able to
exchange them with a weaker impl for safe bottlenecks.

Especially in the scope of Swift going more and more into systems
programming territory. You don’t want to wait yourself into a corner now.

I'm also worried people would find clever ways to abuse the visitor

parameter in ways not related to hashing. (E.g., you can implement one half
of an amusingly broken serialization library in just two minutes by making
OutputStream conform to Hasher. Implementing deserialization might take a
*little* more work, though.) :smiling_imp:

While technically true that’s kind of a straw man argument, isn’t it? :wink:
I could easily turn it around and argue that nothing hinders me from
implementing half of my app in `var hashValue`. :smiling_imp:

* I find that I prefer to hash components by calling a mutating method on

the hasher, rather than directly calling the components' hash
implementations. Putting the hasher first is much more readable to me,
primarily because it gets rid of all the distracting &s. It also makes it
possible to find slightly better names, eliminating the repetitiveness of
"foo.hash(&hasher)":

extension GridPoint: SipHashable {
    func appendHashes(to hasher: inout
SipHasher) {
         hasher.append(x)
         hasher.append(y)
    }
}

That would require SipHasher to overload `append` for every type
imaginable. And it would require SipHasher know about the type's memory
layout, which violates encapsulation. "What should be hashed?" should be
specified in the type. "How should it be hashed?" however in the given
hasher.

Hasher's mutating method would be implemented as a one-line generic
function:

https://github.com/lorentey/SipHash/blob/master/sources/
SipHashable.swift#L56

The actual hashing would of course still be done by implementations of the
HashVisitable protocol. This is just a convenience thing.

Oh, I misread then.

(Admittedly this would introduce a generic function, while at the same time

I'm also arguing for making the protocol requirement non-generic partly
due to (not very convincing) worries about performance. But this would be a
single fragile generic in stdlib, not many generics sprinkled over separate
user-defined modules.)

* I suggest using SipHash instead of FNV-1a. The standard library already

contains an implementation for SipHash, as undocumented internal API,
complete with a note that it should be made public. AFAICR, it is currently
only used for String hashing, but it would be very much worth making it
universal. (Accomodating SipHash's random key is one reason why hashValue’s
documentation explicitly notes that its value is "not guaranteed to be
equal across different executions of your program.”)

I do not propose making FNV-1a part of stdlib. I just included FNV-1a
because it is trivial to implement and easily fits in a proposal as a
sample impl of Hasher.

I do however propose to include SipHash-2-4, SipHash-4-8 and maybe even
SipHash-1-3.

Oh, indeed, I misread the proposal. (Sorry!) But if the API allows for
custom hashers, and stdlib comes with more than one, then which one would
Set and Dictionary use? Would their choice of hasher be user configurable?
stdlib has been remarkably free of such elaborations so far. The
alternative is to bake a particular hash into the standard hashed
collections, but then the overall API would be strangely lopsided, with an
inconsistent mix of flexible and rigid parts.

I would propose to make them generic, but default to a Hasher provided by
stdlib (like `Set<Element, Hasher = SipHash13>`).

All these questions go away if there is but a single standard Hasher.
Well, except the question about which algorithm should be the One. But I
think any keyed hash function would be much better than the status quo.

* ContiguouslyHashable seems like an unsafe construct to me. It is quite

dangerous to base hashing on the raw byte sequence underlying a value: for
example, struct values may include uninitialized gaps between some of their
stored properties due to alignment constraints. So two otherwise identical
values may very well have different in-memory representations.

These type are clearly not ContiguouslyHashable then. Which was also
explained in detail in the accompanying blog post:

"ContiguouslyHashable marks a special family of types which allow for
simply passing their contiguous memory to the given hasher en bloc, which
Float, Double and String would decidedly not conform to, while for types
matching these criteria then it would be a simple matter of tagging them
with the protocol." Blog Post
<https://blog.definiteloops.com/ha-r-sh-visitors-8c0c3686a46f#.6emnc0d5x>

Therefore, I suggest ContiguouslyHashable should be removed from the
proposal. Swift programmers who know what they’re doing would still be able
to call withUnsafeBytes(of:) when they want to, but regular schmoes like
myself will not be tempted to shoot themselves in the foot by using it
inappropriately.

Fine with me. I don't care to much about how those stdlib primitive types
gain their hashing powers. :wink:

To be clear, I think it would be fine to keep ContiguouslyHashable as an
internal implementation detail in stdlib; it just shouldn't be a public,
documented protocol in its current form. (Perhaps as
UnsafeContiguouslyHashable? :slightly_smiling_face:) I don't think Swift programmers in general
are aware of such subtle aspects of how their structs are laid out in
memory. It's certainly a fuzzy area for me, at least; for example, I
remember getting surprised by the fact that Float80 values end with six
uninitialized bytes; no other floating point type in stdlib is padded like
that.

I’d be absolutely fine with just auto-generating HashVisitable for all the
types I had tagged as ContiguouslyHashVisitable instead. Thinking about it
I’d actually prefer that now. Keep it an implementation detail and for all
those who really want to hash a type from its contiguous memory slice the
code necessary is easy to whip up yourself.

To pick another nit: is it necessary to put a label on the parameter of
Hasher.write(bytes:)? I may not be up to date on the naming convention, but
it seems to me it would read just as well without the label. (I also think
there are problems with the name HashVisitable.hash(_:), but my best
attempt at naming it is writeHashes(to:) which isn't much of an
improvement.)

I haven’t put too much effort into the naming, to be honest. As such I
don’t really like HashVisitable either. But just changing the semantics of
Hashable by 180° for the sake of a better name isn’t feasible.

···

On Tue, Mar 14, 2017 at 3:34 AM, Károly Lőrentey <karoly@lorentey.hu> wrote:

On 2017. Mar 13., at 23:56, Vincent Esche <regexident.mailinglists@ > gmail.com> wrote:
On Mon, Mar 13, 2017 at 9:56 PM, Károly Lőrentey <karoly@lorentey.hu> > wrote:

Karoly


(Greg Parker) #20

This would lead to an implementation nightmare for hashing containers. Consider what the implementation of Dictionary<Any, String> would look like if any of its keys could have incompatible hash outputs.

···

On Mar 14, 2017, at 12:01 PM, David Sweeris via swift-evolution <swift-evolution@swift.org> wrote:

Are we committed to having `hashValue` always be an `Int`, or could it be an associated type like `(Int, Int, Int, Int)`? Seems like especially for something like a BigNum type or an Array, there might simple not be a reasonably efficient way to uniquely-ish represent 1024 bits with just 64 bits.

(This kinda feels like one of those questions where the answer starts out with a variation on “you’re missing the point”)

--
Greg Parker gparker@apple.com <mailto:gparker@apple.com> Runtime Wrangler