Combining hashes

I'd say in that case, the (presumably precalculated) hash becomes a standin for the structure itself, so it can simply be the basis for that type's Equatable and Hashable implementation. Reifying the concept of identity like that is an important technique, and it does not preclude types from implementing Equatable — quite the opposite!

(Note that Hasher would not be a good choice to generate such checksums: the function it implements was not designed for this purpose and Hasher may switch to other algorithms in subsequent Swift releases, breaking any previously generated checksums. But Hasher is perfectly happy to be fed checksums rather than the real data.)

2 Likes

I did carefully consider this approach. I believe it would work fine for a regular protocol inside or outside the stdlib, even after we have stable ABI. Availability attributes already support this kind of protocol evolution reasonably well.

However, Hashable is special, because SE-0185 added compiler support for synthesizing conformances to it. Those synthesized conformances must always implement hash(into:), and having default implementations in the stdlib interferes with that. Doing the whole thing in the compiler really is the right choice in this case.

1 Like

Is there any reason such types can't or shouldn't be Equatable? That is, you're free to use the hash as a probabilistic comparison function (that's what it's for after all) and then decide it's not worth a full-precision comparison.

However, == on such types shouldn't just compare hashes. At the very least, hashing each large structure and then comparing hashes is more computational work than just comparing them directly.

1 Like

I'm not sure that the issue is really compiler synthesis. The language is not ABI stable, so all code must be compiled with the same compiler to be guaranteed to work. We can ensure everybody has the correct synthesised methods.

The actual issue is source compatibility for people who implemented Hashable manually (probably lots of people, since synthesis is quite recent). If we are talking about outright replacing the requirement, they would get non-conformance errors when compiling in Swift 5 mode, and have to migrate their code. There's no way around that.

There are less-magic options:

  • Fix-its/Migrator
    A kind of "tolerable" compiler magic (It just brings your code up to speed, once). We have a simple implementation in terms of the existing hashValue, so that should be easy.
  • A protocol refinement
    Naming might be awkward, but this community loves bikeshedding, it avoids all of these issues and fits within the current model very well. The compiler could automatically upgrade the protocol it synthesises anyway (you write Hashable, it synthesises HashVisitable or whatever it's called). Generic code doesn't need to care and can continue to use Hashable.
  • A new where constraint?
    I could imagine something like extension Hashable where implements(hashValue). That would be really nice...

The problem with this solution is that users will end up with both hashValue and hash(into:) implemented, which increases the risk of implementations becoming out of sync.

If we end up automatically synthesising that new protocol, then it is no less magic than @lorentey's solution while increasing the API surface area and discoverability.

I'm even less enthusiastic about this solution as it introduces new syntax, complexifying the language to solve one use-case.

I would prefer automatically generating hash(into:) and hashValue in the compiler for now, and wait until we get a good meta-programming/macro system that would allow us to remove those from the compiler.

An important feature of the compiler/stdlib is that it supports code written for previous versions of Swift. Swift 3 and 4 code can live together in the same binary today. Even if we deprecate hashValue in a future release, we will still have to support compiling code written for previous versions of Swift, without forcing users through a disruptive all-or-nothing migration step.

So we need to fully support compiling code that only implements hashValue. For a normal protocol, this should be easy enough to do by supplying default implementations, carefully gated to specific Swift versions:

extension Hashable {
  @available(swift, introduced: 3.x, obsoleted: y)
  public func hash(into hasher: inout Hasher) {
    hasher.append(self.hashValue)
  }

  @available(swift, introduced: 3.x, deprecated: y)
  public var hashValue: Int {
    var hasher = Hasher()
    hash(into: &hasher)
    return hasher.finalize()
  }
}

So far, so good. But then consider what happens when we need to synthesize Hashable in Swift 4 mode.

// Compiled with -swift-version 4
struct Foo: Hashable {
  let a: Int
  let b: String
}

Ideally, the synthesized implementation would look like this:

@derived func hash(into hasher: inout Hasher) {
  hasher.append(a)
  hasher.append(b)
}
@derived var hashValue: Int {
    var hasher = Hasher()
    hash(into: &hasher)
    return hasher.finalize()
}  

There are two problems here:

  • The synthesized body of hashValue is the same as the default implementation of it in Swift 5 mode; we're duplicating code between the stdlib and the compiler.
  • Far, far worse is that in Swift 4 mode, hash(into:) already has an implementation in the stdlib, so the compiler wouldn't consider synthesizing it at all. So Hashable synthesis falls into a recursive hole and doesn't work at all in Swift 4 mode.

This is really unfortunate. I see four ways to get out of this conundrum:

  1. Add magic to the compiler to recognize that we're compiling older code, and to do the actual hashing work in the synthesized hashValue implementation instead:

    @derived var hashValue: Int {
      var hasher = Hasher()
      hasher.append(a)
      hasher.append(b)
      return hasher.finalize()
    }
    

    I find this is an unpalatable solution -- Foo may be hashed as part of a larger structure, and in that case spawning a new hasher is not nearly as efficient as simply feeding Foo's components to the existing one. It also means that we need to have two distinct paths for the actual synthesis code (where we need to iterate over components), which is deeply unsatisfying.

  2. Add deep type checker magic to the compiler to ignore the default implementation of hash(into:) in this particular case, so that we forcibly generate a synthesized implementation. I actually implemented a variant of this in #14935. It did happen to work most of the time; however, it was such a shameful, fragile abuse of the compiler's type checker that I quickly retracted it.

  3. Remove both default implementations from the stdlib, and move them to the compiler instead. hashValue is always synthesized as the default implementation above. When we need to synthesize hash(into:), we can simply switch between the two alternative implementations depending on whether hashValue resolves to a synthesized or explicit implementation. This is what I ended up implementing in #15122, and it seems to be the cleanest approach. The compiler can easily produce deprecation warnings for certain combinations of implementations and language modes.

  4. Add a third, hidden requirement to Hashable, called _hash(into:), which is always synthesized by the compiler, and either forwards to hashValue/hash(into:) (if one of them has an explicit implementation) or implements the synthesis itself. This puts several extra twists on top of solution 3 above; in my opinion, it messes up the ABI for no good reason.

I believe solution 3 is the clear winner here; it provides the most satisfactory solution to both of the original problems. By necessity, it involves some compiler work, because SE-0185 added special language support for Hashable. But this is just a fact of life: stdlib and compiler work often goes hand in hand. I don't think changing that would be a viable goal here.

If you do have a better solution, help us implement it! The stdlib part is on the master branch today (although underscored), and #15122 would be a nice starting point for the synthesizer update.

All of this is implementation detail that may change without notice and is only tangentially related to the topic at hand. I suspect that in this topic, we should concentrate a little more on what we intend to achieve and a little less on how we're getting there.

Removing the hashValue requirement is not a viable option; the most we can do is to deprecate it. And even that probably requires some change to how we handle @available attributes on protocol members. In particular, getting a deprecation warning for hashValue implementations is not currently possible. (But it would be nice to get a warning in Swift 5 mode when hashValue conformance is resolved to anything other than a synthesized implementation.)

We shouldn't merely "tolerate" compiler changes; we should cherish them! The raison d'être of swift-evolution is to discuss changing the compiler in interesting new ways.

So far, I kind of assumed we can get away with just asking people to either rewrite hashValue to hash(into:) or (preferably) to just remove their custom implementation and let automatic synthesis do the work for them. Fix-its would work fine for that! I think there should be two of them, so that users can select the correct option:

  • Remove hashValue (when automatic synthesis is available for the declaration context)
  • Change the hashValue signature to hash(into:), leaving the body intact, ready for manual migration. (The body shouldn't be commented out, because an empty hash(into:) implementation would compile successfully and do the wrong thing.)

A (worse) alternative to full deprecation is to silently provide a default implementation of hashValue so that the compiler starts complaining about missing hash(into:) when people forget to implement anything. That and a documentation update would painlessly cover most of the migration in the long term. However, keeping hashValue around indefinitely would clearly be a mistake; I'd prefer to force people to change existing code at some point.

The Swift test suite includes a prototype implementation of this; in a previous life, some random developer implemented essentially the same thing as an external package, and this is also what @regexident proposed in his original pitch.

I strongly believe that this approach is simply not the right choice within the stdlib. The prospect of a naming discussion alone is a really strong argument against considering this route: we already have the perfect name for hashable things! Also, auto-conforming to the new protocol would truly be a one-off compiler hack.

Hashable needs to be a single protocol -- hashValue already caused far too much confusion, and throwing a second protocol (with a worse name!) in the mix would just make things even worse.

This is a really interesting idea! It needs to be fleshed out and discussed in a separate pitch. If it was implemented in a future language version, we could probably use it to migrate some parts of Hashable synthesis into the stdlib.

The most general solution to this problem is one we sorely need in other circumstances as well, and potentially not terribly onerous to implement.

We already have other circumstances in the standard library where (for convenience or backwards compatibility) two protocol requirements have default implementations that reference each other (heterogeneous generic operators and homogeneous ones; Comparable requirements and Strideable requirements). Users who stumble into this situation find that their "conforming" types compile but then crash at runtime.

The solution I'm thinking of is teaching the compiler to recognize these circular references as scenarios in which either requirement A or B must be implemented at minimum, rather than blindly permitting both default implementations to call each other endlessly. (Of course, this would have to be limited to default implementations that are inlinable--i.e., where the compiler can actually determine what's in the body of the implementations.) As a bonus, if one of these default implementations is marked as deprecated, the compiler might always recommend that the user implement the other one.

With ABI stability, such a feature would be hugely important in allowing us to evolve protocols in backward-compatible ways. For this particular circumstance, it would permit us to rip out custom compiler magic except where it pertains to synthesized conformance to Hashable.

5 Likes

Supporting either/or protocol requirements by providing default implementations based on the presence/absence of "direct" implementations for other requirements sounds like an intriguing idea. It needs to be explored further: For example, in the case of Hashable, what would extension Hashable where Self.implements(hashValue) mean, exactly? When is an implementation of hashValue not good enough for implements? Default implementations aren't currently marked as such.

I'd love to find out where this discussion takes us; however, it should be pitched in a new topic.

Would @available attributes not work in the usual case? As I explained above, the only reason Hashable evolution needs special compiler support is because it can be synthesized by the compiler. There are only a few protocols like that, and I really don't think we should bend over backwards to make it easier to evolve these. The need to replace a requirement in a synthesized protocol is an extremely rare thing.

I have wanted something like this for quite a while. I generally don't like to provide mutually recursive default implementations but have run into a number of cases where mutually exclusive default implementations would have been really nice. I'll start a new thread.

4 Likes

Thanks very much for the detailed replies on this; it's clear that you've thoroughly explored the implementation space. We'll continue that discussion in another topic.

I think I explained already why I dislike magic in general, but if there really is no other way... ¯\_(ツ)_/¯. It shouldn't stop us making things better.

Thank you everyone for the discussion. I collected what we learned into a formal proposal; please find it at the URL below:

https://github.com/apple/swift-evolution/pull/821

1 Like