Need Help Understanding Protocols and Generics

BigSur · June 23, 2020, 8:14pm

I like it~

jrose · June 24, 2020, 8:10pm

That's not it, and the sidetrack into talking about @_nonoverride probably didn't help. You said that you got it from another thread, but here's some hopefully helpful extra explanation.

Let's go back to the (near-)simplest example:

protocol P {
  var id: String { get } // A
}
extension P {
  var id: String { "P" } // B
  var idFromP: String { self.id }
}
protocol Q: P {}
extension Q {
  var id: String { "Q" } // C
  var idFromQ: String { self.id }
}

struct Z: P, Q {
  var id: String { "Z" } // D
}

Z().id

There are four declarations of properties called id here (labeled "A", "B", "C", and "D"), and there are four places where the compiler has to decide which id to use.

idFromP can call either A (dynamic dispatch through the protocol witness table for Self: P) or B (static dispatch). It prefers A because "that'll be the best possible choice for the concrete type".
idFromQ can call A (dynamic dispatch through the protocol witness table for Self: P, accessible through Self: Q), B (static dispatch, based on Q: P), or C (static dispatch). It prefers A for the same reason.
Conforming Z: P also does a lookup, choosing between B (because we're conforming to P), C (because we can see we're later conforming to Q), and D (right there on the concrete type). A isn't an option because we're trying to choose what will go into the protocol witness table for Z: P, which means choosing A would be an infinite loop! D of course gets chosen.
Actually calling id on an instance of Z can choose between all of A, B, C, and D. D again is the most specific.
You'll notice there's no choice for Z: Q. That's because there's no new requirement defined in Q, so the PWT for Q isn't going to contain an entry for id.
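To check the first two decision points, you can call the helper properties on Z. This is a minimal sketch that repeats the declarations above so it stands alone; the values in the comments follow from D being the entry in the witness table for Z: P.

```swift
protocol P {
  var id: String { get } // A
}
extension P {
  var id: String { "P" } // B
  var idFromP: String { self.id } // goes through the requirement (A)
}
protocol Q: P {}
extension Q {
  var id: String { "Q" } // C
  var idFromQ: String { self.id } // also goes through the requirement (A)
}

struct Z: P, Q {
  var id: String { "Z" } // D
}

let z = Z()
print(z.id)      // "Z": D is the most specific choice on the concrete type
print(z.idFromP) // "Z": the witness table for Z: P points at D
print(z.idFromQ) // "Z": same witness table, reached through Self: Q
```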

Now we can add in the original conditional conformance example:

struct Y<T> {}
extension Y: P {}
extension Y: Q where T: Equatable {}

This only adds one new decision point, at Y: P. The only valid single option here is B, so that's what goes in the PWT for Y: P, and therefore that's what gets called from idFromP.
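Here is a hedged sketch of what that looks like in practice. It repeats the declarations so it stands alone. NotEquatable is a hypothetical helper type added only to show the case where the Q conformance does not apply.

```swift
protocol P {
  var id: String { get } // A
}
extension P {
  var id: String { "P" } // B
  var idFromP: String { self.id }
}
protocol Q: P {}
extension Q {
  var id: String { "Q" } // C
  var idFromQ: String { self.id }
}

struct Y<T> {}
extension Y: P {}                    // witness for id is fixed here: B
extension Y: Q where T: Equatable {}

struct NotEquatable {} // hypothetical, for illustration only

let y = Y<Int>()
print(y.id)      // "Q": overload resolution on the concrete type sees C
print(y.idFromP) // "P": goes through the witness table for Y: P
print(y.idFromQ) // "P": same witness table, reached through Self: Q
print(Y<NotEquatable>().id) // "P": C is not available without Y: Q
```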

I've emphasized "single" here because one possibility—that @Douglas_Gregor's brought up here and elsewhere—is to record more than one option, along with their conditions, and resolve them at some point at run time. That would need a fair bit of design, though, and we'd be trading off more run time to support it (and maybe also more optimization work by the compiler to try to avoid that cost).

Does this answer the questions about mixing class and protocol constraints, in either existentials or generics? No, it doesn't. I don't think we have good rules there and I'm not sure what they should be. But I do think that when we ignore classes and just look at protocol inheritance, we have a self-consistent and sensible model. It might just not be the one we want.

mattrips · June 27, 2020, 3:07pm

@jrose, sorry to be slow in getting back to you. Your answer is much appreciated, and gives great insight into the internal semantics of the construction and use of protocol witness tables. Still, I'm not quite asking the question correctly to get at the "reason why" that I'm hoping for.

So, I've given it (a lot of) thought, and have a simple, familiar example that should starkly frame the question.

protocol P { var id: String { get } }
extension P { var id: String { "P" } }

protocol Q: P {}
extension Q { var id: String { "Q" } }

struct X<T>: P {}
extension X: Q where T == Int {}

let x = X<Int>()
print( x.id,  (x as Q).id ) // "Q P"

Both expressions inside the print statement access an id property. Both expressions access the id property via the id protocol requirement.

In both expressions, an instance of the type X<Int> is used. That type conforms to protocol Q. In the first expression, the instance of the concrete type is used directly. In the second expression, the existential type Q is used, with the same instance of the concrete type wrapped inside of the existential.

There are two implementations of the id requirement available. For each expression, it is necessary to determine which implementation of id will serve as the witness for the id requirement.

Both expressions access the id requirement through the protocol Q interface.

Yet, they access different implementations of id. The first expression accesses the implementation of id declared on Q. The second expression accesses the implementation of id declared on P.

Is there a semantic reason as to why these two accesses of the same requirement via the same protocol yield two different results?

Isn't it true that a type's conformance to a protocol is an invariant determined at the point at which the declaration of conformance is made?

I understand the PWT algorithm that leads to the second expression ultimately being dispatched dynamically to P.id. I do not yet understand the algorithm that leads to the compiler statically dispatching the first expression to Q.id. More importantly, I do not understand why those algorithms should result in different witnesses being used for the same requirement.

Shouldn't the logic and results of protocol conformance be the same regardless of whether compile-time static or run-time dynamic dispatch is used?

dabrahams · June 27, 2020, 9:58pm

Yes, that's currently how it's defined. The problem is that X<Int>'s conformance to P is declared in this line:

struct X<T>: P {}

We want to say X<T> conforms to P differently when X<T> conforms to Q, but there's no way to express that using the current language semantics.

Yeah, but you're not using the X: P protocol conformance when you write x.id; that extension in Q is just a dangling overload of id that is never a witness to a P conformance. To observe the "static" dispatch behavior of that conformance, you need to pass x to something constrained to P.
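A minimal sketch of that observation follows. The generic functions idThroughP and idThroughQ are hypothetical names introduced here for illustration; the rest repeats the example under discussion.

```swift
protocol P { var id: String { get } }
extension P { var id: String { "P" } }

protocol Q: P {}
extension Q { var id: String { "Q" } }

struct X<T>: P {}
extension X: Q where T == Int {}

// Hypothetical helpers: inside them, only the constraint on T is known,
// so id is reached through the protocol requirement.
func idThroughP<T: P>(_ value: T) -> String { value.id }
func idThroughQ<T: Q>(_ value: T) -> String { value.id }

let x = X<Int>()
print(x.id)          // "Q": plain overload resolution on the concrete type
print(idThroughP(x)) // "P": uses the witness recorded for X: P
print(idThroughQ(x)) // "P": Q adds no id requirement of its own
print((x as Q).id)   // "P": the existential also goes through X: P
```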

mattrips · June 28, 2020, 5:43am

Actually, I would have liked to say that X<Int> conforms to Q, and has its own set of pairings of requirements and witnesses. I continue to bang my head against the (lack of) documentation wall. The documentation says:

A protocol can inherit one or more other protocols and can add further requirements on top of the requirements it inherits.

It seems to say that protocol requirements are inherited.

The notion of inherited requirements could lead a reasonable reader to believe that Q inherits the requirements of P, and then deals with those requirements as its own. And so, when X<Int> conforms to Q, that conformance would set up its own set of pairings of requirements to witnesses. The X<Int>: Q conformance would pair Q's inherited id requirement with the implementation of id that Q declares in its extension.

Apparently, that is not how the system works.

Unfortunately, the language documentation says almost nothing about how it actually works. As a whole, the documentation could include much more detail.

Two thoughts, here:

First, we have the same documentation problem. A reasonable reader of the documentation might expect that accessing a protocol requirement on a type that conforms to a protocol involves use of the protocol system. The documentation does not say one way or the other how that sort of access is handled, but there is an implication that the protocol system is directing the dispatch when a type accesses a protocol requirement that is not declared on the type itself.

Second, this dichotomy seems odd, and that oddity is at the core of the question that I'm asking:

Why use one set of rules (i.e., routine overload resolution) to handle the x.id access and a different set of rules (i.e., protocol conformance) to handle the access of id through the existential Q? The different rules sometimes lead to different results. Does the difference in how they are handled serve a purpose?

dabrahams · June 28, 2020, 3:33pm

Okay, my fault for introducing the vague term “want” and then attributing a single want to the vague entity “we,” but I'm not quite sure what you mean by that. What you've just described sounds like it means roughly what you get when you declare Q like this in your example:

protocol Q: P { @_nonoverride var id: String { get } } // evil

X<Int> (and everything else that conforms to Q) now has its own set of pairings of requirements and witnesses for its conformance to Q, separate from the pairings used for its conformance to P. I'm not sure why you'd want that; it means an efficient implementation of id that exists on X<Int> by virtue of its Q conformance won't get used from a P-but-not-Q-constrained context.

I continue to bang my head against the (lack of) documentation wall.

Okay, stop. Just stop. First of all we need your head to be in good working order so we can fix the very problem you're pointing to. Second of all, TSPL is not meant to be used as a technical reference, and never will be sufficient for that. Editorial policy for that document prioritizes being nonthreatening over precision. It's certainly not going to meet the standards of (language) lawyers. Not that there's anything wrong with that: books that are more narrative and casual fill an important niche. We just need something else, thus the project you and I have started.

The documentation says:

A protocol can inherit one or more other protocols and can add further requirements on top of the requirements it inherits.

It seems to say that protocol requirements are inherited.

Yes it does, but it's not that they're inherited as in “transferred upon death!” This inheritance is similar to the way non-final methods are inherited by classes. A subclass override of an inherited method describes how that method is implemented even when an instance of the subclass is handled as a base class in the static type system, in the same way that a default implementation of an inherited protocol requirement also describes how that method is implemented when an instance of a type conforming to that protocol is handled as an instance of a type the protocol refines.
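The analogy can be sketched side by side. Base, Derived, W, and idThroughP are hypothetical names used only for this illustration. Note that W conforms to Q unconditionally, so the default from Q is visible when the witness for P's requirement is chosen.

```swift
// Class side: the override is used even through a Base-typed reference.
class Base {
  func name() -> String { "Base" }
}
class Derived: Base {
  override func name() -> String { "Derived" }
}

let asBase: Base = Derived()
print(asBase.name()) // "Derived"

// Protocol side: Q refines P and supplies a more specific default for id.
protocol P { var id: String { get } }
extension P { var id: String { "P" } }
protocol Q: P {}
extension Q { var id: String { "Q" } }

struct W: Q {} // hypothetical type; conforms to Q (and so to P) unconditionally

// Only the P constraint is known here, yet the default from Q is what runs.
func idThroughP<T: P>(_ value: T) -> String { value.id }
print(idThroughP(W())) // "Q"
```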

Incidentally, we generic programmers (and for the most part, the language designers) don't talk about protocols this way. We say that a protocol can refine one or more other protocols and add requirements. That makes it clearer that there's an “is-a” relationship with the protocol being refined and that there aren't two disjoint sets of requirements some of which might have the same name.

Why use one set of rules (i.e., routine overload resolution) to handle the x.id access and a different set of rules (i.e., protocol conformance) to handle the access of id through the existential Q? The different rules sometimes lead to different results. Does the difference in how they are handled serve a purpose?

The rules should line up in many more cases than they do today, but there is actually a good reason for the ability to create methods that hide implementations from less-refined protocols without overriding them. You can see it in action here:

// Module A
extension Collection where Element: Numeric & Comparable { // Comparable is needed for x > 0
  public func firstPositive() -> Element? {
    var count = 0 // debugging purposes
    defer { assert(self.count == count, "sanity check") }
    return self.filter { x in  // not explicitly lazy
        count += 1
        return x > 0
    }.first
  }
}
// Module B
print((-10...10).lazy.filter { $0 % 3 == 0 }.firstPositive())

The implementation of filter used in firstPositive had better not be the same as the one used outside, or the assertion will fail.
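To see why, compare how often the predicate runs under the eager and lazy versions of filter. This is a rough sketch, not part of the example above; the counter names are made up for illustration.

```swift
// Eager filter (Sequence.filter): builds an Array, so the predicate
// runs once for every element before .first is taken.
var eagerCalls = 0
let eagerFirst = (-10...10).filter { x in
  eagerCalls += 1
  return x > 0
}.first

// Lazy filter: the predicate runs only as far as needed to find
// the first matching element.
var lazyCalls = 0
let lazyFirst = (-10...10).lazy.filter { x in
  lazyCalls += 1
  return x > 0
}.first

print(eagerFirst as Any, eagerCalls) // first positive is 1; 21 predicate calls
print(lazyFirst as Any, lazyCalls)   // first positive is 1; fewer than 21 calls
```

If firstPositive picked up the lazy filter from the caller's context, its call counter would stop short of self.count and the sanity check would trip.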