Glad you asked -- I probably should have talked about this in my post.
So to begin with, every generic signature is in math terms is finitely presented. This means the original number of generic requirements written by the user is necessarily finite.
Starting from a finite set of requirements, we build a confluent rewrite system, that is, one where a canonical type can be computed in a finite number of steps by applying rules in any order.
The process of building a confluent rewrite system from a finite set of initial requirements is not guaranteed to terminate, so it runs up to a specified number of iterations, failing with an error if the iteration limit is exceeded. If the process succeeds, let's call the original set of requirements convergent. So we have this set inclusion to begin with:
- Convergent generic signatures ⊂ Finitely-presented generic signatures
So let's try the undecidable example from my original post:
protocol Undecidable
where A.C == C.A,
A.D == D.A,
B.C == C.B,
B.D == D.B,
C.E == E.C.A,
D.E == E.D.B,
C.C.A == C.C.A.E {
associatedtype A : Undecidable
associatedtype B : Undecidable
associatedtype C : Undecidable
associatedtype D : Undecidable
associatedtype E : Undecidable
}
This is rejected with a somewhat cryptic error message:
undecidable.swift:1:10: error: cannot build rewrite system for protocol; rule length limit exceeded
protocol Undecidable
^
undecidable.swift:1:10: note: failed rewrite rule is [Undecidable:A].[Undecidable:B].[Undecidable:D].[Undecidable:C].[Undecidable:C].[Undecidable:C].[Undecidable:E].[Undecidable:B].[Undecidable:A].[Undecidable:A].[Undecidable:E].[Undecidable:C].[Undecidable:E] => [Undecidable:A].[Undecidable:B].[Undecidable:D].[Undecidable:C].[Undecidable:C].[Undecidable:C].[Undecidable:E].[Undecidable:B].[Undecidable:A].[Undecidable:A].[Undecidable:E].[Undecidable:C]
protocol Undecidable
^
So we know there are undecidable generic signatures that we simply cannot reason about, and we're limited to decidable generic signatures. This gives us another set inclusion in the middle:
- Convergent generic signatures ⊂ Decidable generic signatures ⊂ Finitely-presented generic signatures
However, the undecidable cases are arguably not interesting, since they're extremely hard to construct and would never arise in practice.
The question is, are there decidable generic signatures that are not convergent? The answer is unfortunately "yes". Indeed, there is no single algorithm that can solve the word problem (in our lingo, compute canonical types) for every decidable monoid (generic signature). Everything we do is necessarily an approximation.
I'll show some examples of those shortly, but let's try to refine the set inclusions from the other end first.
First of all, let's say a generic signature is finite if there are only finitely many unique type parameters. (This is a stricter condition than there only being finitely many requirements!). A finite generic signature is always convergent:
- Finite generic signatures ⊂ Convergent generic signatures ⊂ Decidable generic signatures ⊂ Finitely-presented generic signatures
Any generic signature that does not involve a protocol with a recursive conformance is always finite.
A signature with a recursive conformance is finite if the recursive conformance is "tied off" with a same-type requirement. For example,
protocol Z5 {
associatedtype T : Z5 where T.T.T.T.T == Self
}
The above protocol only has 5 distinct type parameters -- Self, Self.T, Self.T.T, Self.T.T.T, Self.T.T.T.T -- so any signature with a conformance to this protocol (eg, <T where T : Z5>) is finite.
(Incidentally, the GenericSignatureBuilder had trouble with this protocol, showing that it's view of the world was incorrect even for finite generic signatures. For reasons I never understood and no longer have any reason to investigate further, the "Z3" example works, but "Z4" and "Z5" do not).
Generalizing further, a recursive conformance that is completely unconstrained also gives rise to a convergent rewrite system, despite there being infinitely many type parameters now. The SwiftUI View protocol is one example:
protocol View {
associatedtype Body : View
}
If I have a function generic over Views with the signature <T where T : View>, then I have infinitely many type parameters: T, T.Body, T.Body.Body, ... etc. However a confluent rewrite system can still be constructed.
Another example is the Collection protocol's Indices type. There are some requirements placed on it, for example Indices.Element == Index, but the recursive expansion of Indices itself goes on forever, so if you have <T where T : Collection>, then T.Indices, T.Indices.Indices and so on are all distinct types. (Arguably, this was an oversight in the design; I highly doubt any concrete conforming type actually has unique instances for each of those nested types in practice). While there are infinitely many unique type parameters here, there is a confluent rewrite system with finitely-many rewrite rules.
So this means that in order to break the confluent rewrite system formalism, you need a generic signature that simultaneously:
- involves a protocol with at least two recursive associated types, and same-type requirements between them
- despite the same-type requirements, still has infinitely many unique type parameters
(Note that these conditions are necessary, but not sufficient. There are plenty of examples that satisfy both but still work.)
The simplest example I've seen is the following:
protocol XYX where X.Y == X.Y.X {
associatedtype X : XYX
associatedtype Y : XYX
}
With this protocol, completion will construct an infinite sequence of rules:
Self.X.Y.X => Self.X.Y
Self.X.Y.Y.X => Self.X.Y.Y
Self.X.Y.Y.Y.X => Self.X.Y.Y.Y
...
If you try to compile this protocol in Swift 5.5 , it appears to work! So how much expressivity did we lose when the Requirement Machine was introduced? Well, in reality, not much, because if you play around with it a bit, you'll see the GenericSignatureBuilder doesn't actually reason about it correctly, failing to discover the longer equivalences:
protocol XYX where X.Y == X.Y.X {
associatedtype X : XYX
associatedtype Y : XYX
}
func sameType<T>(_: T.Type, _: T.Type) {}
func foo<T : XYX>(_: T) {
// The GSB allows this
sameType(T.X.Y.Y.Y.X.self, T.X.Y.Y.Y.self)
// The GSB does not allow this, even though it should work
sameType(T.X.Y.Y.Y.Y.X.self, T.X.Y.Y.Y.Y.self)
}
Also while this protocol looks simple, I don't believe it is as simple as it seems. You have two recursive conformances on the associated types X and Y, and a non-trivial same-type requirement between a nested type of length 2 and 3. If someone can think of a practical use-case for this, involving a conforming concrete type, I'd be curious to see it.
However, this protocol certainly does have a decidable word problem (algorithm for finding a canonical type). You look for occurrences of "X" that appear after "X.Y" followed by zero or more "Y"'s, and delete the final "X".
Another interesting thing is that this protocol shows that the choice of the initial requirements can affect whether completion succeeds or not. Let's try adding a third associated type W that is equivalent to X.Y:
protocol XYX where X.Y == X.Y.X, X.Y == W {
associatedtype W : XYX
associatedtype X : XYX
associatedtype Y : XYX
}
Incidentally, this also shows that the word problem here is decidable; by fiddling with the initial set of requirements, we've constructed an isomorphic signature that has a convergent rewrite system.
Amusingly, if you run this new version with -requirement-machine-protocol-signatures=verify, it hits the verify assertion because the GSB produces a ridiculous "minimal" signature:
RequirementMachine protocol signature minimization is broken:
Protocol: XYX
RequirementMachine says: <Self where Self.[XYX]W == Self.[XYX]W.[XYX]X, Self.[XYX]X : XYX, Self.[XYX]Y : XYX, Self.[XYX]W.[XYX]X == Self.[XYX]X.[XYX]Y>
GenericSignatureBuilder says: <Self where Self.[XYX]W == Self.[XYX]W.[XYX]X, Self.[XYX]X : XYX, Self.[XYX]Y : XYX, Self.[XYX]W.[XYX]W == Self.[XYX]W.[XYX]W.[XYX]X, Self.[XYX]W.[XYX]X == Self.[XYX]X.[XYX]Y, Self.[XYX]X.[XYX]W == Self.[XYX]X.[XYX]W.[XYX]X, Self.[XYX]Y.[XYX]W == Self.[XYX]Y.[XYX]W.[XYX]X, Self.[XYX]W.[XYX]W.[XYX]W == Self.[XYX]W.[XYX]W.[XYX]Y.[XYX]X, Self.[XYX]W.[XYX]W.[XYX]X == Self.[XYX]W.[XYX]Y.[XYX]X, Self.[XYX]W.[XYX]Y.[XYX]X == Self.[XYX]W.[XYX]Y.[XYX]X.[XYX]X, Self.[XYX]X.[XYX]W.[XYX]W == Self.[XYX]X.[XYX]W.[XYX]W.[XYX]X, Self.[XYX]Y.[XYX]W.[XYX]W == Self.[XYX]Y.[XYX]W.[XYX]W.[XYX]X, Self.[XYX]Y.[XYX]X.[XYX]W == Self.[XYX]Y.[XYX]X.[XYX]W.[XYX]X, Self.[XYX]W.[XYX]W.[XYX]Y.[XYX]X == Self.[XYX]W.[XYX]Y.[XYX]W.[XYX]X, Self.[XYX]W.[XYX]Y.[XYX]X.[XYX]X == Self.[XYX]X.[XYX]Y.[XYX]W.[XYX]X, Self.[XYX]X.[XYX]W.[XYX]W.[XYX]X == Self.[XYX]X.[XYX]W.[XYX]Y.[XYX]X, Self.[XYX]X.[XYX]W.[XYX]Y.[XYX]W == Self.[XYX]X.[XYX]W.[XYX]Y.[XYX]W.[XYX]X, Self.[XYX]Y.[XYX]X.[XYX]W.[XYX]W == Self.[XYX]Y.[XYX]X.[XYX]W.[XYX]W.[XYX]X, Self.[XYX]X.[XYX]W.[XYX]Y.[XYX]W.[XYX]X == Self.[XYX]X.[XYX]W.[XYX]Y.[XYX]Y.[XYX]X, Self.[XYX]Y.[XYX]X.[XYX]W.[XYX]W.[XYX]X == Self.[XYX]Y.[XYX]X.[XYX]W.[XYX]Y.[XYX]X>
Please submit a bug report (https://swift.org/contributing/#reporting-bugs) and include the project and the crash backtrace.
Of course in this case, it's not the Requirement Machine that's broken, it's the GSB. If you pass -requirement-machine-protocol-signatures=on, the GSB is bypassed, and the code compiles and both calls to sameType() in the body of foo() type check as expected.
Now, a more realistic example of a non-convergent rewrite system occurs when you have two infinite recursive conformances that are "smashed together". For example,
protocol P1 { // this is totally fine
associatedtype A : P1
}
protocol P2 { // this is totally fine
associatedtype A : P2
}
protocol P3 { // this will not work
associatedtype X : P1 & P2
}
Unfortunately, this is actually the only example I've seen where the GSB handles things correctly, but the Requirement Machine's formalism fails. What happens is that when building the rewrite system for <T where T : P3>, the completion procedure will try and fail to introduce an infinite series of same-type requirements which notionally would be written as follows, if we had a syntax to disambiguate associated types with the same name from different protocols:
T.[P1]A == T.[P2]A
T.[P1]A.[P2]A == T.[P1]A.[P2]A
T.[P1]A.[P1]A.[P2]A == T.[P1]A.[P1]A.[P1]A
...
To resolve this, you either need to define a third protocol that inherits from both P1 and P2 and imposes the additional requirement on the inherited associated type:
protocol P1and2 : P1, P2 where A : P1and2 {}
protocol P3 { // this is fine
associatedtype X : P1and2
}
Or, you tie off the infinite recursion in some spot, either P1, P2 or P3:
protocol P1 {
associatedtype A : P1
}
protocol P2 {
associatedtype A : P2
}
protocol P3 {
associatedtype X : P1 & P2 where X.A.A.A == X // or whatever
}
This is unfortunate but I haven't been able to come up with a good solution that preserves the properties we want. Swift 5.6 included a hack that made this work in the simplest examples (such as the P3 written as above) but the hack did not generalize to the minimization algorithm, and it also failed in a real-world setup that was slightly more complex than P3.
Here is the one real-world example where I've seen this come up, in RealmSwift -- [SR-15792] error: Segmentation fault: 11 on compile of project · Issue #58069 · apple/swift · GitHub (the bug report mentions a Segfault which was an unrelated problem in the GSB, which is still used in Swift 5.6 for signature minimization; once that crash was fixed the failure to construct a confluent rewrite system was revealed as soon as the type checker went on to check the body of a generic function, performing a generic signature query against the protocol in question).
In RealmSwift's case, the desired design was that the recursion reaches a fixed point; ironically, one of the maintainers mentioned they were unable to declare this requirement in previous versions of Swift because of problems with the GSB. It seems that now they can add this requirement as intended.
If this comes up in other projects I'll need to come up with a more general solution, but for now, I believe the Requirement Machine handles the overwhelming majority of realistic generic signatures that previously worked with the GSB, while also supporting many more complex examples that did not work before.
For writing new code, if you follow these rules of thumb you'll be unlikely to hit the limits of the formalism:
- Prefer to define your protocols so that there are only finitely-many type parameters
- Just one recursive associated type, whether it is ultimately finite or infinite, is also fine
- If you need more than one infinite recursive conformance, avoid complex same-type requirements that relate deeply-nested types of those recursive associated types
- Avoid "horizontal" composition that forces the Requirement Machine to introduce implied same-type requirements between associated types with the same name in unrelated protocols