[Proposal] Static member lookup on protocol metatypes

xedin · November 12, 2020, 8:09pm

Let me try to explain why we need this in more technical detail.

Consider a call such as applyStyle(.bordered) from the type-checker's perspective. Here is how it sees this expression:

applyStyle<$T_param : Style>($T_base.Type[.bordered == $T_bordered]) where $T_<something> denotes a type variable as a placeholder for type to be inferred.

The parameter type in this case is $T_param and the argument type is $T_chain_result (which is not visible here); the two are connected through an argument conversion constraint. A valid solution in such a case can only be formed if all of the type variables are inferred to be concrete types. What we know about the call is:

$T_param <conforms to> Style
$T_base.Type <should have a member named .bordered> which type is $T_bordered
$T_base <equals> $T_chain_result
$T_bordered <convertible to> $T_chain_result
$T_chain_result <argument conversion> $T_param

The type-checker could make progress towards a solution by trying to infer any of the listed type variables ($T_param, $T_base, $T_bordered, $T_chain_result), but unfortunately there are no concrete types here to replace any of them with. And enumerating the possible types which could conform to Style would be inefficient because: a. such a list is not available, and b. it would mean checking every possible type in sequence and trying to disambiguate.
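For reference, the call being analyzed could come from declarations like the ones below. This is only a sketch: the `name` requirement and the `String` return value are hypothetical additions to make the result observable, and the `where Self == BorderedStyle` constraint is the spelling required by the feature as it later shipped (SE-0299, Swift 5.5), not the result-type rule discussed in this thread.

```swift
// Hypothetical declarations; only Style, BorderedStyle, applyStyle and
// .bordered are named in the discussion.
protocol Style {
    var name: String { get }
}

struct BorderedStyle: Style {
    let name = "bordered"
}

// Swift 5.5+ spelling: Self is pinned to one concrete conforming type.
extension Style where Self == BorderedStyle {
    static var bordered: BorderedStyle { BorderedStyle() }
}

// The parameter type S plays the role of $T_param in the constraints above.
func applyStyle<S: Style>(_ style: S) -> String {
    "applied \(style.name)"
}

// Spelling the concrete type gives the solver something to start from.
let longhand = applyStyle(BorderedStyle())

// The leading-dot form has no concrete type written anywhere in the call.
let shorthand = applyStyle(.bordered)

print(longhand, shorthand)
```

Both calls pass the same value; the only difference is whether the source mentions a concrete type that the solver can start from.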

That's where the new proposal comes in. It suggests that we should try to propagate the Style requirement through equality/subtype/conversion relationships down to $T_base. Based on that protocol conformance requirement, as a last resort, we could infer $T_base to be Style and try to look up any static members on the protocol metatype to make forward progress. Once $T_base is bound to Style, it's possible to look up .bordered on it. After doing so we end up with updated constraints like so:

$T_base <bound to> Style
$T_bordered <bound to> BorderedStyle
$T_base == $T_chain_result
$T_bordered <convertible> $T_chain_result
$T_chain_result <argument conversion> $T_param

Here is where the type-checker encounters another problem, and that's why we'd have to make a lot more changes if we wanted to support unrestricted member references on protocol metatypes:

$T_base == $T_chain_result and $T_bordered <convertible> $T_chain_result mean that $T_chain_result has to be (existential type) Style and be a concrete type convertible to BorderedStyle at the same time.

So, to fit into the existing model, the proposal suggests that the new references are only allowed if it's possible to replace $T_base with a concrete type which satisfies the requirement of conforming to Style. Implicitly replacing $T_base with BorderedStyle right after lookup would, in this case, mean that all of the constraints are satisfied and the type-checker can reach a solution.
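The effect of that substitution can be observed from inside the generic function. The sketch below assumes the Swift 5.5+ `where Self ==` spelling of the member, and `inferredStyleType` is a hypothetical helper that simply reports which type the generic parameter was bound to.

```swift
protocol Style {}
struct BorderedStyle: Style {}

// Swift 5.5+ spelling of the member; the pitch used a result-type rule instead.
extension Style where Self == BorderedStyle {
    static var bordered: BorderedStyle { BorderedStyle() }
}

// Hypothetical helper: reports the type the generic parameter was bound to.
func inferredStyleType<S: Style>(_ style: S) -> String {
    String(describing: S.self)
}

// The base of `.bordered` ends up being the concrete conforming type,
// not the protocol itself.
let inferred = inferredStyleType(.bordered)
print(inferred)
```

The generic parameter is bound to the conforming type rather than to the protocol, which matches the description of $T_base being replaced with BorderedStyle.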

I think this is a balanced step which covers all of the shortcomings of the syntax without introducing anything new to the language, and would allow for lifting the result type requirement in the future (by ranking such members found on conforming types lower), at the expense of having additional members attached to conforming types (which is also how protocols work in general).

Jumhyn · November 12, 2020, 8:24pm

Thank you for the thorough write-up, @xedin! That all makes sense.

Do we need to substitute a concrete type in order to make forward progress here? Would it not be feasible to attempt binding $T_base (and $T_chain_result) to $T_param? In most cases that might be futile, but in this specific circumstance we would have:

$T_param <conforms to> Style
$T_param.Type <should have a member named .bordered> which type is $T_bordered

which seems like enough to make forward progress on the member constraint, since we could look up bordered in Style.

xedin · November 12, 2020, 8:29pm

Unfortunately yes (for now). The reality of the situation is that lookup can only be performed on a concrete type, so the type variable has to be bound before the member constraint can be simplified. Once we get a way to associate a type variable with its possible set of bindings on every step (currently gathering is done separately), we might be able to refactor simplifyMemberConstraint to avoid binding the base, but this is more of an implementation detail than anything: it only affects when the base becomes a concrete type in that case.

xedin · November 12, 2020, 8:31pm

Also note that "$T_base conforms to Style" is not an explicit constraint; it can only be deduced by examining the constraints in the equivalence/subtyping chain. That's why I'm talking about bindings here.

xedin · November 12, 2020, 8:38pm

And one more thing to address here: type variables with different l-value requirements can't be eagerly merged together. That's why the constraint between the base and the result of the chain isn't eagerly simplified into a type variable equivalence.

Jumhyn · November 12, 2020, 8:44pm

Gotcha, that's basically what I was wondering.

Am I wrong in thinking that when (if?) we could proceed with lookup without binding $T_base to a concrete type, a solution to the problem posed in this proposal would sort of... naturally fall out of that change?

In any case, something I'd like to see in the proposal is a fully-explicated version of the new rule that will govern implicit member chains. As I understand it, the current rule, in its full complexity, is:

For an implicit member chain .member1.(...).memberN with contextual type T, the expression will behave as though the programmer had written T.member1.(...).memberN, with the following exceptions:

If T is an optional type U?, then member1 may be looked up in U if it cannot be found in T.

If T is a generic type S<...>, then generic parameter inference may proceed as though the user had written S.member1.(...).memberN.
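To make the current rule concrete, here is a small sketch of the base case and its two exceptions. `Color` and `Wrapper` are hypothetical types invented for the illustration, and the multi-member chain in the first line assumes a compiler with implicit member chains (Swift 5.4 or later).

```swift
// Hypothetical types used only to illustrate the rule.
struct Color: Equatable {
    var hue: Double
    var alpha: Double = 1.0

    static let red = Color(hue: 0.0)

    func withAlpha(_ alpha: Double) -> Color {
        Color(hue: hue, alpha: alpha)
    }
}

struct Wrapper<T: Equatable>: Equatable {
    var value: T

    static func wrap(_ value: T) -> Wrapper<T> {
        Wrapper(value: value)
    }
}

// Base rule: behaves as though `Color.red.withAlpha(0.5)` had been written.
let faded: Color = .red.withAlpha(0.5)

// Optional exception: `red` is not a member of Optional<Color>,
// so it is looked up in Color instead.
let maybeRed: Color? = .red

// Generic exception: lookup proceeds as though `Wrapper.wrap(1)` had been
// written; the annotation fixes T to Int in this simple case.
let wrapped: Wrapper<Int> = .wrap(1)

print(faded, maybeRed as Any, wrapped)
```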

I still can't quite wrap my head around the precise new behavior that this proposal would adopt, so I'd love to see it spelled out as explicitly as possible.

xedin · November 12, 2020, 8:49pm

How about this:

If T is a generic parameter conforming to a protocol P or a protocol composition, e.g. (P & Q), then generic parameter inference may proceed as though the user had written P.member1.(...).memberN or Q.member1.(...).memberN, where the result type of 'member1' should conform to the contextual protocol or protocol composition type.

anandabits · November 12, 2020, 8:54pm

I understand how this fits into the existing implementation well. It's a lot less clear to me why this is the best programming model. Why does it make sense for these members to live on conforming types? Why isn't it better for the protocol metatype to have its own namespace that is independent of conforming types?

xedin · November 12, 2020, 9:04pm

We are trying to fit into the existing model, and not just the implementation as you mentioned. This member behavior is something which already exists and IMHO could be improved upon incrementally. Leading dot syntax dictates the relationship between base and result types etc. As I tried to express in my previous reply, I think what we propose here is a good incremental step forward instead of trying to take an all-or-nothing approach. I also think it would take a higher level proposal about protocols to make further progress here and lift the result type requirement; we'd also have to re-evaluate what it would mean for leading dot syntax as well as other language features.

ricketson · November 12, 2020, 9:18pm

I think we would submit that defining these on the metatype directly would be better, if that were possible, and we can update the pitch to make that more clear.

However, that is not likely to be possible anytime soon due to the large technical lift required to enable that (as @xedin has addressed). The proposed solution is an incremental step, making lookup available in more existing places than it is today, which I think is generally a good thing, and poses few downsides for the language itself.

The question for libraries is: is it a good idea to leverage this extension to static member lookup to improve the syntax for generic APIs, as mentioned in the pitch, instead of waiting for metatype extensions (if they ever arrive)? We believe it is:

It is a meaningful improvement in certain use cases as covered in the pitch.
If metatype extensions were made possible in the future, there is an evolution path that would allow us to deprecate-and-replace these members with new ones in a source compatible way, without generating warnings in non-edge cases, by having those metatype extensions be preferred during lookup.
There are downsides, like the availability of these members on conforming types. However, while odd, we don't consider the existence of those symbols to be harmful, and there are potentially other solutions in tooling that could limit their impact on documentation and autocomplete in the future.
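The downside mentioned in the last point can be seen in a short sketch. It assumes the Swift 5.5+ `where Self ==` spelling; `PlainStyle` is a hypothetical second conforming type, and `applyStyle` returns a `String` only so the result can be inspected.

```swift
protocol Style {}
struct BorderedStyle: Style {}
struct PlainStyle: Style {}   // hypothetical second conforming type

extension Style where Self == BorderedStyle {
    static var bordered: BorderedStyle { BorderedStyle() }
}

func applyStyle<S: Style>(_ style: S) -> String {
    String(describing: type(of: style))
}

// Intended use: the leading-dot shorthand in a generic context.
let shorthand = applyStyle(.bordered)

// Side effect: the same member is also reachable on the conforming type,
// so it shows up there in documentation and autocomplete.
let explicit = applyStyle(BorderedStyle.bordered)

// With this spelling the member is not added to other conforming types:
// `PlainStyle.bordered` would be rejected, since Self is not BorderedStyle.

print(shorthand, explicit)
```

Under this spelling the extra symbol lands only on the one conforming type named in the constraint, which keeps the impact on other types small.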

anandabits · November 12, 2020, 10:08pm

Thanks for laying out the rationale behind taking this incremental step. I can support it from that perspective. Hopefully metatype extensions will be tackled someday and library authors will eagerly migrate to them where appropriate.

Is this true in general, or only true of the intended dot shorthand syntax usage? For example, wouldn't the absence of BorderedStyle.bordered technically be a source break even though it is not an intended use?

I would consider them mildly harmful in some contexts. For that reason, I think the return type restriction is actually a good thing. I don't think we want people using the protocol as a namespace for arbitrary static API related to the protocol itself (rather than conforming types) until that API can be isolated to the metatype. I think a narrow carve out for advanced library developers to support dot shorthand is as far down this path as we should go.

Jumhyn · November 12, 2020, 10:17pm

Okay, that begins to make a little more sense... though I'd like it much better if we placed the conformance requirement on memberN rather than member1 (which, incidentally, should already be the case since the final member type must be convertible to the context type).

I'm also just now registering that the P.member syntax is made possible by this proposal and not earlier. I don't love that under this proposal, writing P.member doesn't actually mean P.member. It feels very strange to me to have this get rewritten to S.member under the hood (where S is the actual concrete type of member).

ETA:

Also, what about weird cases like

protocol P {}
struct R: P {}
struct S: P {}

extension P {
    static var bar: S { S() }
}

extension P where Self == S {
    static var foo: S { S() }
}

extension P where Self == R {
    static var foo: R { R() }
}

func takesP<T: P>(_: T) {}

takesP(.foo)

? Will this just be an "ambiguous use of 'foo'" error?

xedin · November 12, 2020, 10:48pm

Yes, this is because there is no witness to form a valid reference on the protocol type itself; it has to be a valid concrete type which conforms to the protocol P. That's why I think that to enable actual use of protocol types here we'd need to address the design of protocols.

The example you have posted would indeed be "ambiguous use of 'foo'" because there is absolutely no context to figure out which "foo" to use here. Even if there was no restriction on the result type and no rewriting under the hood, both members would form a valid solution, because Self is an existential metatype so it's always replaced with the underlying conforming type. The observation, though, is that such members always have different names based on their type name, so it should not be a big issue in practice.
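If such a name clash did come up, the call site could supply the missing context itself. The sketch below is an assumption-laden illustration: it keeps only the two `foo` members from the example, uses the Swift 5.5+ `where Self ==` spelling, and has `takesP` return the inferred type name purely so the outcome is visible.

```swift
protocol P {}
struct R: P {}
struct S: P {}

extension P where Self == S {
    static var foo: S { S() }
}

extension P where Self == R {
    static var foo: R { R() }
}

// Returns the inferred generic argument so the choice is observable.
func takesP<T: P>(_ value: T) -> String {
    String(describing: T.self)
}

// takesP(.foo)   // nothing here selects between the two `foo` members

// Option 1: spell the base type explicitly.
let viaBase = takesP(S.foo)

// Option 2: keep the leading dot but provide a contextual type.
let viaCoercion = takesP(.foo as R)

print(viaBase, viaCoercion)
```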

Jumhyn · November 12, 2020, 10:55pm

Right, I understand why this transformation is necessary under this design, I just don't think it's good that P.member ends up referring to something other than a member named member on P.Protocol. While it may not actually preclude future evolution around protocol metatype extensions, it certainly will result in a confusing state of affairs since P.member could refer to member on P.Protocol, or a member of some arbitrary S.Type where S: P. That seems like a pretty confusing model to me.

anandabits · November 12, 2020, 11:00pm

I think it would be reasonable to ban access of these members through the protocol metatype and only allow usage of them through contextual shorthand (and on the concrete types, although that usage would be discouraged).

Jumhyn · November 12, 2020, 11:04pm

Yeah, I like this idea. In fact, if the rule is updated to have the base take on the concrete type of the final member of the chain (rather than the first member of the chain), then implementation concerns aside I think this ends up producing the same result as my "expected" behavior that I outlined a few posts above.

xedin · November 12, 2020, 11:19pm

Note that first and last are effectively equivalent here, because the last member is convertible to the context, and the base is supposed to conform to it and be equal to the chain result through the base == chain_result requirement.

Jumhyn · November 12, 2020, 11:22pm

Ah, by "first member of the chain" I really did mean the result of the first member access (not the base). I.e., the following should work:

protocol P {}
struct R: P {}
struct S {
  var r = R()
}

extension P {
  static var s: S { S() }
}

let _: P = .s.r // equivalent to 'let _: P = R.s.r'

xedin · November 12, 2020, 11:23pm

Why does it matter at all what the type-checked AST would look like? For all intents and purposes S.foo means referencing a member declared on a protocol; what happens to make sure the reference works is secondary if we state that the restriction is that the result of such a member should conform to the declaring protocol. Do you know any expression where that would matter?

xedin · November 12, 2020, 11:26pm

I'd disagree on that since I think a better rule is that result type of a member declared on the protocol should conform to that protocol.