An inheritance problem

Ah, I cannot be so sure, because neither I see your code nor I can confidently state the exact behaviour if the correct dispatching would be implemented, because it is not rigorously defined yet. But despite that, I already don't see how this would incur performance degradation for static libs (which yours seems to be). Moreover, I think that in the examples I have shown the dynamic dispatch is unnecessary.
All I have talked about can be implemented completely in compile-time - compiler just has to resolve conformance graph and pick a single witness that is set in the current context (module) and then generate assembly - and nothing prevents it from inlining (as nothing prevents inlining of generic functions and methods now).
To demonstrate:

Prelude
extension A: Mixin {
  func fin() -> Int { return 1 }
}

extension B: Mixin {
  func fin() -> Int { return 2 }
}
struct Concrete: A, B {} //collision found. Resolve manually

struct Concrete: Mixin, A, B { func fin () -> Int { return Self.A.fin() } } //1
//now Concrete have implementation from A: Mixin

//or just redeclare it. Both impls from A: Mixin and B: Mixin would be
//suppressed
struct Concrete: Mixin, A, B { func fin () -> Int { 666 } } 
//ok, now implementation is in Concrete: Mixin
//module S
struct Concrete: Mixin {}
//module T
extension Concrete: Mixin { func fin () -> Int { return 99 } } 
//the most recent refinement
//a single redeclaration within one module should be ok

//at this point if you use module T 
//runtime shouldn't give a damn about all previous 
//refinements of Concrete, because in this module it is explicitly set 
//to be with exact implementation (return 99).
//it just has to construct a correct kind 
//(Archetype it is called?)

So I don't see how this would get it not better, but worse.

ps. I have some hypothesis about the way to nail it:
First, the runtime should be separated into two different domains: private domain for internal, private, fileprivate code etc), and public domain for public, open code. The witness picking should then have the following algorithm:

  1. Construct all the code that is internal for any external modules used
  2. Collect all most recent public declarations of witnesses in the current module
//module S
protocol A { func lp () }
struct Tau: A { func lp () {print("Tau from S")} }
//module R
import S
extension Tau: A { func lp () {print("Tau from R")} }
//this is the most recent declaration
//choose this witness and forget about S.Tau.lp()
  1. Descend the hierarchy of refinements and pick the most recent public witness from imported modules
//module S
protocol A { func lp (); func zx () }
extension A { func zx () {print("A from S")} }
struct Tau: A { func lp () {print("Tau from S")} }
Tau().lp() //"Tau from S"
Tau().zx() //"A from S"

//module R
import S
extension Tau: A { func lp () {print("Tau from R")} }
Tau().lp() //"Tau from R"
Tau().zx() //"A from S"
//this is the most recent declaration
//inherit zx() implementation
//choose new witness for lp() and forget about S.Tau.lp()
  1. If any collisions were found, throw an error and make the programmer to choose single witness manually
extension A: Mixin {
  func fin() -> Int { return 1 }
}

extension B: Mixin {
  func fin() -> Int { return 2 }
}
struct Concrete: A, B {} //Error, collision found. Resolve manually

struct Concrete: Mixin, A, B { func fin () -> Int { return Self.A.fin() } } //1
//now Concrete have implementation from A: Mixin
//Now compiler can eliminate unnecesary code for this particular type
//(Implementation from B: Mixin gets annihilated)

//or just redeclare it. Both impls from A: Mixin and B: Mixin would be
//suppressed and never get to assembly
struct Concrete: Mixin, A, B { func fin () -> Int { 666 } } 
//ok, now implementation is in Concrete: Mixin
//Destroy A: Mixin and B: Mixin impls

Regarding variable parameterization by existential, then I think the best way is to first:

  1. Track all assignments of all valid concrete types to it
  2. Construct a box with the biggest size that could be allocated on the stack (kind of how enums work) or put it on the heap
//module R
var u: A = Tau()
u.lp() //"Tau from R". Yes, the most recent is picked
u.zx() //"A from S"

To spill a bit about generic functions, behaviour that differs from this is basically inferiority of current implementation:

//module R
func useA<T: A>(_ value: T) {
  value.lp()
}
useA(Tau()) 
//if it prints "Tau from R" than we are golden
//otherwise it just a bold bug, because a witness is explicitly set to be 
{print("Tau from R")}