Combining hashes

Sounds like we're in full agreement there. The stdlib is often a testbed for potential future language enhancements -- some of the "magic" is only magic because we don't yet know enough about the problem to produce a well-designed language feature for it. For example, consider how stdlib code has always been magically inlinable/specializable across module boundaries -- and compare that to the universally available @inlinable attribute we're now discussing in SE-0193.

Custom conformance synthesis for user protocols is not currently possible within Swift; however, we do have synthesized protocols in the stdlib, and they provide a nice set of examples for problems such a feature would be expected to solve. The either/or requirement logic of the new Hashable protocol we're discussing adds an interesting new dimension to this design space -- the compiler will probably need a little refactoring to handle such rules well. Conformance synthesis may not be ready to progress beyond magic yet; but we're collecting data and gaining practice.

3 Likes

Note: the dividing line between the standard library and Foundation is precisely whether the feature requires a special relationship with the compiler. If a type doesn’t require that relationship, it doesn’t belong in the standard library:

How do we decide if something belongs in the standard library or Foundation?

In general, the dividing line should be drawn in overlapping area of what people consider the language and what people consider to be a library feature.

For example, Optional is a type provided by the standard library. However, the compiler understands the concept to provide support for things like optional-chaining syntax. The compiler also has syntax for creating Arrays and Dictionaries.

On the other hand, the compiler has no built-in support for types like URL. URL also ties into more complex functionality like basic networking support. Therefore this type is more appropriate for Foundation.

If at all possible, I would prefer deprecating hashValue in newer versions of Swift so that we can keep the name Hashable for the latest hotness. I don't know how much flexibility @available can give us here, @moiseev probably knows.

4 Likes

Related, maybe slightly OT, but more relevant now that Hashable is getting hotter: does it make sense to keep Hashable a subprotocol of Equatable?
My understanding is that the current Hashable doesn't really mean "something that can produce an hash", but more like "an object that can be used in data structures that use hashes".
That has been quite fine but if someone wants to hop on the new good hashing functionality without having to go through equality, they just can't...

Equality is necessary requirement of hashing collections to detect/resolve conflicts. Do you have some use cases of hashing that doesn't require equality?

2 Likes

My main idea is when you have some type representing a big structure, i.e. a typed representation of a big file. In these cases you usually "trust" the hash to be enough and don't follow with full equality check, given the computational cost of it.

I think @Michael_Ilseman is not saying that values with the same hash are equal, but that a value needs to have a concept of equability to be hashable. Ie, he's not saying that a.hashValue == b.hashValue IMPLIES a == b, he's just saying that a is Hashable SHOULD IMPLY a is Equatable.

I would love to replace hashValue with this. I'm still trying to think of alternatives to magic, or if we can limit its scope somehow.

It's interesting because this should be possible, but introduces new @available constraints. Both protocols are entirely compatible with each-other. We would like to:

  • Add a protocol requirement, with a default implementation up to version X
  • Add a default implementation of the old requirement, from version X.

I know ABI stability is an important goal for the project, but I'm not sure when we can expect real versioning out of @available.

as I said before, I support the proposal but I'm uncomfortable with all the one-off magic we've been adding (in general).

Still thinking...

I'd say in that case, the (presumably precalculated) hash becomes a standin for the structure itself, so it can simply be the basis for that type's Equatable and Hashable implementation. Reifying the concept of identity like that is an important technique, and it does not preclude types from implementing Equatable — quite the opposite!

(Note that Hasher would not be a good choice to generate such checksums: the function it implements was not designed for this purpose and Hasher may switch to other algorithms in subsequent Swift releases, breaking any previously generated checksums. But Hasher is perfectly happy to be fed checksums rather than the real data.)

2 Likes

I did carefully consider this approach. I believe it would work fine for a regular protocol inside or outside the stdlib, even after we have stable ABI. Availability attributes already support this kind of protocol evolution reasonably well.

However, Hashable is special, because SE-0185 added compiler support for synthesizing conformances to it. Those synthesized conformances must always implement hash(into:), and having default implementations in the stdlib interferes with that. Doing the whole thing in the compiler really is the right choice in this case.

1 Like

Is there any reason such types can't or shouldn't be Equatable? That is, you're free to use the hash as a probabilistic comparison function (that's what it's for after all) and then decide it's not worth a full-precision comparison.

However, == on such types shouldn't just compare hashes. At the very least, hashing each large structure and then comparing hashes is more computational work than just comparing them directly.

1 Like

I'm not sure that the issue is really compiler synthesis. The language is not ABI stable, so all code must be compiled with the same compiler to be guaranteed to work. We can ensure everybody has the correct synthesised methods.

The actual issue is source compatibility for people who implemented Hashable manually (probably lots of people, since synthesis is quite recent). If we are talking about outright replacing the requirement, they would get non-conformance errors when compiling in Swift 5 mode, and have to migrate their code. There's no way around that.

There are less-magic options:

  • Fix-its/Migrator
    A kind of "tolerable" compiler magic (It just brings your code up to speed, once). We have a simple implementation in terms of the existing hashValue, so that should be easy.
  • A protocol refinement
    Naming might be awkward, but this community loves bikeshedding, it avoids all of these issues and fits within the current model very well. The compiler could automatically upgrade the protocol it synthesises anyway (you write Hashable, it synthesises HashVisitable or whatever it's called). Generic code doesn't need to care and can continue to use Hashable.
  • A new where constraint?
    I could imagine something like extension Hashable where implements(hashValue). That would be really nice...

The problem with this solution is that users will end up with both hashValue and hash(into:) implemented, which increases the risk of implementations becoming out of sync.

If we end up automatically synthesising that new protocol, then it is no less magic than @lorentey's solution while increasing the API surface area and discoverability.

I'm even less enthusiastic about this solution as it introduces new syntax, complexifying the language to solve one use-case.

I would prefer automatically generating hash(into:) and hashValue in the compiler for now, and wait until we get a good meta-programming/macro system that would allow us to remove those from the compiler.

An important feature of the compiler/stdlib is that it supports code written for previous versions of Swift. Swift 3 and 4 code can live together in the same binary today. Even if we deprecate hashValue in a future release, we will still have to support compiling code written for previous versions of Swift, without forcing users through a disruptive all-or-nothing migration step.

So we need to fully support compiling code that only implements hashValue. For a normal protocol, this should be easy enough to do by supplying default implementations, carefully gated to specific Swift versions:

extension Hashable {
  @available(swift, introduced: 3.x, obsoleted: y)
  public func hash(into hasher: inout Hasher) {
    hasher.append(self.hashValue)
  }

  @available(swift, introduced: 3.x, deprecated: y)
  public var hashValue: Int {
    var hasher = Hasher()
    hash(into: &hasher)
    return hasher.finalize()
  }
}

So far, so good. But then consider what happens when we need to synthesize Hashable in Swift 4 mode.

// Compiled with -swift-version 4
struct Foo: Hashable {
  let a: Int
  let b: String
}

Ideally, the synthesized implementation would look like this:

@derived func hash(into hasher: inout Hasher) {
  hasher.append(a)
  hasher.append(b)
}
@derived var hashValue: Int {
    var hasher = Hasher()
    hash(into: &hasher)
    return hasher.finalize()
}  

There are two problems here:

  • The synthesized body of hashValue is the same as the default implementation of it in Swift 5 mode; we're duplicating code between the stdlib and the compiler.
  • Far, far worse is that in Swift 4 mode, hash(into:) already has an implementation in the stdlib, so the compiler wouldn't consider synthesizing it at all. So Hashable synthesis falls into a recursive hole and doesn't work at all in Swift 4 mode.

This is really unfortunate. I see four ways to get out of this conundrum:

  1. Add magic to the compiler to recognize that we're compiling older code, and to do the actual hashing work in the synthesized hashValue implementation instead:

    @derived var hashValue: Int {
      var hasher = Hasher()
      hasher.append(a)
      hasher.append(b)
      return hasher.finalize()
    }
    

    I find this is an unpalatable solution -- Foo may be hashed as part of a larger structure, and in that case spawning a new hasher is not nearly as efficient as simply feeding Foo's components to the existing one. It also means that we need to have two distinct paths for the actual synthesis code (where we need to iterate over components), which is deeply unsatisfying.

  2. Add deep type checker magic to the compiler to ignore the default implementation of hash(into:) in this particular case, so that we forcibly generate a synthesized implementation. I actually implemented a variant of this in #14935. It did happen to work most of the time; however, it was such a shameful, fragile abuse of the compiler's type checker that I quickly retracted it.

  3. Remove both default implementations from the stdlib, and move them to the compiler instead. hashValue is always synthesized as the default implementation above. When we need to synthesize hash(into:), we can simply switch between the two alternative implementations depending on whether hashValue resolves to a synthesized or explicit implementation. This is what I ended up implementing in #15122, and it seems to be the cleanest approach. The compiler can easily produce deprecation warnings for certain combinations of implementations and language modes.

  4. Add a third, hidden requirement to Hashable, called _hash(into:), which is always synthesized by the compiler, and either forwards to hashValue/hash(into:) (if one of them has an explicit implementation) or implements the synthesis itself. This puts several extra twists on top of solution 3 above; in my opinion, it messes up the ABI for no good reason.

I believe solution 3 is the clear winner here; it provides the most satisfactory solution to both of the original problems. By necessity, it involves some compiler work, because SE-0185 added special language support for Hashable. But this is just a fact of life: stdlib and compiler work often goes hand in hand. I don't think changing that would be a viable goal here.

If you do have a better solution, help us implement it! The stdlib part is on the master branch today (although underscored), and #15122 would be a nice starting point for the synthesizer update.

All of this is implementation detail that may change without notice and is only tangentially related to the topic at hand. I suspect that in this topic, we should concentrate a little more on what we intend to achieve and a little less on how we're getting there.

Removing the hashValue requirement is not a viable option; the most we can do is to deprecate it. And even that probably requires some change to how we handle @available attributes on protocol members. In particular, getting a deprecation warning for hashValue implementations is not currently possible. (But it would be nice to get a warning in Swift 5 mode when hashValue conformance is resolved to anything other than a synthesized implementation.)

We shouldn't merely "tolerate" compiler changes; we should cherish them! The raison d'être of swift-evolution is to discuss changing the compiler in interesting new ways.

So far, I kind of assumed we can get away with just asking people to either rewrite hashValue to hash(into:) or (preferably) to just remove their custom implementation and let automatic synthesis do the work for them. Fix-its would work fine for that! I think there should be two of them, so that users can select the correct option:

  • Remove hashValue (when automatic synthesis is available for the declaration context)
  • Change the hashValue signature to hash(into:), leaving the body intact, ready for manual migration. (The body shouldn't be commented out, because an empty hash(into:) implementation would compile successfully and do the wrong thing.)

A (worse) alternative to full deprecation is to silently provide a default implementation of hashValue so that the compiler starts complaining about missing hash(into:) when people forget to implement anything. That and a documentation update would painlessly cover most of the migration in the long term. However, keeping hashValue around indefinitely would clearly be a mistake; I'd prefer to force people to change existing code at some point.

The Swift test suite includes a prototype implementation of this; in a previous life, some random developer implemented essentially the same thing as an external package, and this is also what @regexident proposed in his original pitch.

I strongly believe that this approach is simply not the right choice within the stdlib. The prospect of a naming discussion alone is a really strong argument against considering this route: we already have the perfect name for hashable things! Also, auto-conforming to the new protocol would truly be a one-off compiler hack.

Hashable needs to be a single protocol -- hashValue already caused far too much confusion, and throwing a second protocol (with a worse name!) in the mix would just make things even worse.

This is a really interesting idea! It needs to be fleshed out and discussed in a separate pitch. If it was implemented in a future language version, we could probably use it to migrate some parts of Hashable synthesis into the stdlib.

The most general solution to this problem is one we sorely need in other circumstances as well, and potentially not terribly onerous to implement.

We already have other circumstances in the standard library where (for convenience or backwards compatibility) two protocol requirements have default implementations that reference each other (heterogeneous generic operators and homogeneous ones; Comparable requirements and Strideable requirements). Users who stumble into this situation find that their "conforming" types compile but then crash at runtime.

The solution I'm thinking of is teaching the compiler to recognize these circular references as scenarios in which either requirement A or B must be implemented at minimum, rather than blindly permitting both default implementations to call each other endlessly. (Of course, this would have to be limited to default implementations that are inlinable--i.e., where the compiler can actually determine what's in the body of the implementations.) As a bonus, if one of these default implementations is marked as deprecated, the compiler might always recommend that the user implement the other one.

With ABI stability, such a feature would be hugely important in allowing us to evolve protocols in backward-compatible ways. For this particular circumstance, it would permit us to rip out custom compiler magic except where it pertains to synthesized conformance to Hashable.

5 Likes

Supporting either/or protocol requirements by providing default implementations based on the presence/absence of "direct" implementations for other requirements sounds like an intriguing idea. It needs to be explored further: For example, in the case of Hashable, what would extension Hashable where Self.implements(hashValue) mean, exactly? When is an implementation of hashValue not good enough for implements? Default implementations aren't currently marked as such.

I'd love to find out where this discussion takes us; however, it should be pitched in a new topic.

Would @available attributes not work in the usual case? As I explained above, the only reason Hashable evolution needs special compiler support is because it can be synthesized by the compiler. There are only a few protocols like that, and I really don't think we should bend over backwards to make it easier to evolve these. The need to replace a requirement in a synthesized protocol is an extremely rare thing.

I have wanted something like this for quite a while. I generally don't like to provide mutually recursive default implementations but have run into a number of cases where mutually exclusive default implementations would have been really nice. I'll start a new thread.

4 Likes

Thanks very much for the detailed replies on this; it's clear that you've thoroughly explored the implementation space. We'll continue that discussion in another topic.

I think I explained already why I dislike magic in general, but if there really is no other way... ¯\_(ツ)_/¯. It shouldn't stop us making things better.

Thank you everyone for the discussion. I collected what we learned into a formal proposal; please find it at the URL below:

https://github.com/apple/swift-evolution/pull/821

1 Like