Rationale for Swift's overload resolution?

Jens · March 11, 2018, 9:48am

I assume that there are previous discussions on this, but I couldn't find any, so pointers to such would be appreciated.

As an example, this program will compile and work as expected:

import Foundation

func log(base: Double, of value: Double) -> Double {
    return log(value) / log(base)
}

But the exact same function is not accepted by Swift if it's not declared in global scope, for example:

import Foundation

func enclosingFunc() {
    func log(base: Double, of value: Double) -> Double {
        return log(value) / log(base) // Error: Missing argument for parameter 'of' in call
    }
}

And another example:

import Foundation

struct EnclosingType {
    static func log(base: Double, of value: Double) -> Double {
        return log(value) / log(base) // Error: Use of 'log' nearly matches global function 'log' in module 'Darwin' rather than instance method 'log(base:of:)'
    }
}

Related bug reports:
SR-7143
SR-7018

SR-7018 was closed because this works as designed, and there is this explanation:

Overload resolution works on a set of functions in the same context, so in this case once the nested function is found the compiler stops looking for other overloads.

Given this explanation, I can understand the current behavior, but I still think it's very counterintuitive, and I'm not sure I understand why it has to be like this.

Nevin · March 11, 2018, 4:02pm

This looks a lot like SR-2450.

@Slava_Pestov mentioned that we might be able to solve most of the common occurrences with a fairly simple change:

To me that sounds like the right thing to do in light of SE-0111, which made argument labels part of the function’s name.

Jens · March 11, 2018, 5:28pm

Thanks! That thread answers my question.

Slava_Pestov · March 13, 2018, 6:21am

We were discussing this today again, because it also comes up with self.min() vs Swift.min() on various types. Unfortunately the name lookup change I described on Twitter is not quite sufficient, because of default arguments. Something similar could still work, but would require more design.

Nevin · March 13, 2018, 6:25am

How about a “two-phase” approach?

First try the current way (preferably augmented by including argument labels), and if it resolves to a function then we’re done.

If not, then expand the search to include global functions and try again.

Slava_Pestov · March 13, 2018, 6:28am

Something like that would be possible, but the main difficulty is that it would worsen existing exponential behavior in the type checker. An expression containing multiple overloaded unqualified references would require evaluating each combination of overload choices. If there are multiple valid combinations we would need to rank the solutions and pick the best one (or diagnose ambiguity). We already do this with overloaded references at the same level of nesting, but overloading across multiple levels of nesting would increase overload set sizes further.

hlovatt · March 13, 2018, 6:31am

Why not get rid of global function and always make them static functions associated with a type that is the type of one of their arguments. So the original example would become:

extension Double {
    static func log(base: Double, of value: Double) -> Double { ... }
}

That way you never have to search global scope, because there is nothing in global scope! Hence should be quick.

Nevin · March 13, 2018, 6:41am

Technically, it would only risk exponential behavior in cases that are currently compiler errors. So it is a change from “does not compile at all” to “compiles, though perhaps slowly”.

After all:
1. If the existing algorithm succeeds, then there is no change.
2. If the existing algorithm fails, and there are no additional candidates, then no extra type-checking needs to be done.
3. Only when the existing algorithm fails *and* there are additional candidates, does the type-checker do more work.

Since case 3 is currently a compiler error (because the existing algorithm failed), there is no regression in type-checking.

Edit:

The other situation is when there really is no matching function, and the compiler *ought* to raise an error. Then it might take longer for the compiler to reach that conclusion. So a possible regression in type-checking statements that in fact do not type-check.

Slava_Pestov · March 15, 2018, 7:22am

Unfortunately that is not quite right, because even if we found one possible solution, we have to keep looking to find all solutions, and rank them.

Nevin · March 15, 2018, 8:19am

Perhaps I did not explain clearly, or perhaps I misunderstand the existing process. My understanding is:

1. For each function call in an expression, we currently look for functions with the correct name, starting in the local scope. If no function with that name is found in the local scope, then we look in the next-higher enclosing scope, and so forth.

2. As soon as we reach a scope with at least one function having the correct name, we take all functions in that scope with the correct name as our candidates for that call. We then stop looking for candidates. If no candidates are found in any scope, we raise an error.

3. Then, after repeating the above process for all function calls in the expression, we attempt to solve for valid combinations of types and functions from among our candidates. If no solutions are found, we raise an error. Otherwise, we rank the solutions. If there is exactly one top-ranked solution, we use it. Otherwise we raise an ambiguity error.

Is this reasonably close to correct?

If so, my idea is:

Do everything exactly as above, except if no valid solutions are found in paragraph 3 then, instead of raising an error, repeat the scope-widening process for all function calls in the expression all the way to global scope. If at least one new candidate was found, then repeat the solving process and continue from there.

• • •

As you can see, the process stays exactly the same as it currently is, *except* in the case where no valid solutions were found in paragraph 3. In that case, there is currently a compiler error. And it is exactly and exclusively there that I am saying we could do some extra work to look for more candidates.

Thus I claim, for expressions which currently compile, everything will work exactly as it does today, with no extra compilation time. The only change is in certain cases which *don’t* compile today: *after* the existing attempt to solve them fails, *then* go back and look for more candidates and try again.

There may be other problems with this approach, possibly involving how to rank solutions using candidates from different scopes. But taking more time to compile code which already compiles is not one of them.

GarthSnyder · March 15, 2018, 8:22pm

Once the behavior is implemented, it ceases to be a rescue pattern and becomes just part of the overload resolution system. Some proportion of future code will intentionally use and rely on it. It doesn't seem unreasonable to consider performance here.

Nevin · March 15, 2018, 9:48pm

All right, coming back to your idea for modifying name-lookup, that is a good point about default arguments making the full name not match. Similarly, trailing closures can also hide argument labels. So, when considering a function, how about this:

If the function’s base name does not match the call site, skip it.
If the function has no default arguments, and the call site does not use a trailing closure:
2a. If the function’s full name matches the call site, make it a candidate.
2b. Otherwise, skip it.
If the call site uses a trailing closure and the function does not accept them, skip it.
If the sequence of argument labels at the call site is a subsequence* of the argument labels of the function, where all missing labels have default arguments, make that function a candidate.

Does that cover all the important points?

* Using “subsequence” in the mathematical sense, not the “contiguous slice” sense.

• • •

In thinking about default arguments, I made a simple test:

func foo(a x: Int = 0, a y: Int = 1) {
    print("\((x, y))")
}

foo(a: 2)   // prints "(2, 1)"

It seems that multiple parameters with identical argument labels can all have defaults, and the compiler will greedily use the first match it finds. So we don’t have to worry about ambiguity in that regard.

Slava_Pestov · March 16, 2018, 5:17am

Yep.

Instead of "for all function calls", you would have to perform your "widening" one function call at a time, and try all possible combinations.

However, this approach introduces valid code that was not valid before, which may now require more time to type check.

Nevin · March 16, 2018, 4:20pm

Fair enough, I’ll toss out the two-step approach.

• • •

Does the name-lookup strategy I outlined above look workable?

In light of the foo(a:a:) example showing the compiler matches arguments greedily, we can simplify step 4 and preserve that behavior by checking only the first way the call-site’s argument labels form a subsequence of the function’s.

…and it probably needs another caveat saying “if the call-site uses a trailing closure, ignore the function’s last argument label.”

• • •

Putting it all together:

If the function’s base name does not match the call site, skip it.

If the call site uses a trailing closure {

If the function does not accept trailing closures, skip it.
Consider the sequence of argument labels of the function, dropping the final (trailing closure) label.

} else {

If the function’s full name matches the call site, make it a candidate.
Consider the sequence of argument labels of the function.

}

Greedily match the argument labels from the call site into the sequence from step 3:
4a. If they are not a subsequence, skip the function.
4b. If they cover all non-defaulted parameters, make the function a candidate.

Nevin · March 16, 2018, 9:06pm

It has come to my attention that there is an O(n²) algorithm to solve this, where n is the number of parameters to the function. That is, we can answer, “Can the argument labels at the call site be matched with argument labels of the function, in order, covering all non-defaulted parameters, and if so how?” in worst-case quadratic time.

In particular, we currently face a compiler error in scenarios like this:

func foo(a x: Int = 0, a y: Int) {
    print("\((x, y))")
}

foo(a: 2)   // error: missing argument for parameter 'a' in call

But we could make the compiler correctly ascertain that there is exactly one way for that to be a valid call (namely, through the second parameter).