Improved operator lookup semantics

Something else. As @xwu mentions, this is the payoff that never materialized due to structural concerns in the type system. I'm arguing (at the limit) for something like this:

protocol LessThanable {
  // This is not spelled "func <" and not injected into the global scope!!
  static func lessThan(a: Self, b: Self) -> Bool
}

// There can be only one, and it shall be (T,T)->Bool   (*)
func < <T: LessThanable>(a: T, b: T) -> Bool {
  return T.lessThan(a: a, b: b)
}

// not actually allowed :(  - you know what I mean.
extension BinaryInteger: LessThanable {
  static func lessThan(a: Self, b: Self) -> Bool {
    ...
  }
}

Similarly, you do that with the FP types and all of the other concrete types. The point is that the compiler does name lookup of < and finds exactly one match. This means that the exponential search of various possibilities has to consider exactly one operator instead of N when it sees "42 < x".
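
To make that concrete, the conformances would live on the concrete types instead; a minimal sketch using the LessThanable protocol above (the Int conformance here is purely illustrative):

// Purely illustrative: each concrete type conforms directly, since you
// can't add the conformance to the BinaryInteger protocol itself.
extension Int: LessThanable {
  static func lessThan(a: Int, b: Int) -> Bool {
    // A real implementation would bottom out in a compiler builtin;
    // forwarding to the existing comparison is just for the sketch.
    return a < b
  }
}

// In this world, name lookup for "42 < x" sees exactly one `<`.
let x: Int = 100
let result = 42 < x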

Now, we have a complication here: for comparisons (but not normal arithmetic) we went down the path of allowing mixed operands (which was perhaps a bad idea in hindsight). This means you need a second, mixed-type operator like:

func < <T, S>(a: T, b: S) -> Bool {
   ...
}

It is unfortunate that we have to support two implementations (meaning we get a disjunction in the constraint solver), but one disjunction with two possibilities should be FAR FAR better than a disjunction with N possibilities, each of which needs to be exponentially searched. Also, this issue doesn't affect + or * or other non-comparison operations.

-Chris


Why not:

protocol LessThanable {
  // This is not spelled "func <" and not injected into the global scope!!
  func lessThan(b: Self) -> Bool
}

// There can be only one, and it shall be (T,T)->Bool   (*)
func < <T: LessThanable>(a: T, b: T) -> Bool {
  return a.lessThan(b: b)
}

Which is shorter and reads better.

Do we know how high the scores for selected solutions usually get? Is it a thing where it's usually an exact match, occasionally has a couple conversions, and once in a blue moon requires a ton of finagling, or are there usually a couple little conversions in most expressions? Is there a point of diminishing returns where, once we start looking at solutions with a particular score, we're almost certainly wasting our time looking for a solution that isn't there?

I ask because I could imagine us evaluating all solutions with a score of zero, then all with a score of one, then all with a score of two, and so on. Basically, the very moment you realize an overload is going to require more type fudging than the solutions we're currently exploring, set it aside for later. That might make sense if scores are usually low, but probably won't if viable high-score solutions are common.
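
Roughly the shape I'm imagining, with made-up names (this is not the actual constraint-solver API): explore partial solutions cheapest score first, and park anything more expensive for later.

// Sketch only: scores are assumed to only grow as a partial solution is
// expanded, so the first complete solution popped is also the cheapest.
struct Candidate {
  var score: Int        // accumulated conversion/"fudging" cost
  var isComplete: Bool  // stand-in for "all constraints solved"
}

func cheapestFirstSolve(start: Candidate,
                        expand: (Candidate) -> [Candidate]) -> Candidate? {
  var frontier = [start]
  while !frontier.isEmpty {
    // A real implementation would use a priority queue; sorting the whole
    // frontier each iteration just keeps the sketch short.
    frontier.sort { $0.score < $1.score }
    let next = frontier.removeFirst()
    if next.isComplete { return next }
    frontier.append(contentsOf: expand(next))
  }
  return nil
}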

How often does ranking break ties? Can we move more of the ranking rules into the scoring system so the constraint solver can use this information to guide its decisions earlier?

Also, has someone tried dragging a folder full of images of correctly type-checked ASTs into Create ML? 🤯

That sounds a lot like A*, and given we're optimizing a goal function as well as trying to create consistent solutions, that kind of approach might be really good.

// There can be only one, and it shall be (T,T)->Bool   (*)

Would this literally mean "there cannot be another func < (...)", or would it mean "the standard library only defines this one to keep complexity low"?
I.e. would it still be possible to use other overloads of this operator?
Otherwise, this would make tests à la expect(n) < 5 (as provided by e.g. Nimble) impossible.
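
For reference, the kind of overload I mean looks roughly like this (types and signatures simplified; these are not Nimble's actual declarations):

// Simplified stand-in for the kind of operator a matcher library defines.
struct Expectation<T: Comparable> {
  let value: T
}

func expect<T: Comparable>(_ value: T) -> Expectation<T> {
  return Expectation(value: value)
}

// A mixed-type `<`: wrapper on the left, plain value on the right. This is
// exactly the shape that a literal "only one global <" rule would forbid.
func < <T: Comparable>(lhs: Expectation<T>, rhs: T) -> Bool {
  return lhs.value < rhs
}

let n = 3
_ = expect(n) < 5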

I don't really get why the compiler has such a problem. If we (programmers) write:

func < (a: SomeType, b: SomeType) -> Bool { ... }

Then the compiler is free to transform that into the following (assuming LessThanable and the < for LessThanable suggested above exist):

extension SomeType: LessThanable {
    func lessThan(b: Self) -> Bool { ... }
}

I.e., why can't the compiler do the work for us?

Because this work is done early in compilation. What you're saying is easy to do when you have type information, but this phase (type checking) is the step that gives you that info.

The idea that Chris is proposing, if I got it right, makes this type-info search easier by using a clever protocol hierarchy.

No, I am not saying the compiler should do more when looking up the type; I'm saying it should do more when the operator is declared. As you said, when it is doing the lookup it might also be resolving the type. But when you declare <, you also declare the operand types (or their bounds if they are generic), so the compiler can do the transformation for you at that point.


To add to this, I actually ran into a pretty annoying case a little while back with Nimble where the compiler was forced to decide between Nimble's == and one that I'd defined. There ended up being no way to specify which module's == to use, so I defaulted to using a plain function instead of an operator overload internally. It would have been nice to be able to write something like a MyModule.== b.
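
The workaround looked roughly like this (type and function names invented for illustration):

// Invented names; the real types were project-specific.
struct Fixture {
  var name: String
}

// The custom == I'd defined collided with Nimble's == overloads, and there
// was no way to spell "use MyModule's ==", so internally I fell back to a
// plain named function:
func fixturesMatch(_ lhs: Fixture, _ rhs: Fixture) -> Bool {
  return lhs.name == rhs.name
}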

I've been busy with other things but am back to looking at this.

I hadn't considered this exactly (in part perhaps because, as you point out, you cannot actually have that extension), but I did consider the second alternative that I posted.

Unless we ban overloading, I don't see how this is realistically going to help the situation, since existing libraries already overload operators that we use in the stdlib, and do so with both same-type and mixed-type variants. We do actually have mixed-type overloads of + in the stdlib already (although not for math types). One of the relatively common reports we get is that code works until the user adds "import Foo", and then things break because Foo added (for example) three more + operators.

To be fair, my pitch doesn't magically fix that case either. It requires some potential restructuring on the part of library authors if they've defined global operators. What it does attempt to do is provide a predictable mechanism that makes it possible to implement libraries that scope down operator lookup, so that in the common case we'll find only a single overload for most operators that appear in the expression (and in the cases where we find multiple, we should usually be able to "fail fast" after looking at the argument and parameter types).
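
One way to picture that "scoping down" is operators declared as static members of the relevant protocol or type, so the candidates come from the operand types rather than from everything visible at global scope. This sketch is just illustrative of the idea, not the pitch itself:

// Illustrative only: the operator lives on the protocol/type, so lookup can
// start from the operand types instead of the global scope.
protocol Distance {
  static func + (lhs: Self, rhs: Self) -> Self
}

struct Meters: Distance {
  var value: Double
  static func + (lhs: Meters, rhs: Meters) -> Meters {
    return Meters(value: lhs.value + rhs.value)
  }
}

let total = Meters(value: 1.5) + Meters(value: 2.0)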

Perhaps @ben-cohen or @moiseev have some input on the approach you're suggesting?

We currently bail out as soon as we see a score that is worse than the best solution we've already found. This is honestly a really bad idea, but we rely on it today to be able to typecheck things in a reasonable amount of time. I'm looking at restructuring the way the solver works to try to speed up some of the non-operator cases, and I hope to remove this optimization as part of that work. The problem you run into is that we basically have no way to predict how things will typecheck without knowing the intimate details of all these hacks and bailouts. I'm hoping that restructuring things will make them more easily understood and easier to explain.
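
For anyone trying to picture that bail-out, it amounts to something like this (illustrative only; nothing like the real solver code):

// Illustrative only: the moment a partial solution's score is already worse
// than the best complete solution found so far, stop exploring that branch.
struct PartialSolution {
  var score: Int
}

func shouldAbandon(_ partial: PartialSolution,
                   bestSoFar: PartialSolution?) -> Bool {
  guard let best = bestSoFar else { return false }
  return partial.score > best.score
}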
