Hi all,
I just flipped a feature flag enabling a new generic signature minimization algorithm for protocol signatures. Since I've only posted a little bit about my work in this area here so I thought I'd give an more complete update.
A year and a half ago I wrote how Swift type checking is undecidable, then last year I sketched out a new way of formalizing Swift generics as a term rewrite system. Since then I've been working on implementing this idea with the goal of replacing the GenericSignatureBuilder.
Overview
The GenericSignatureBuilder is used for two distinct tasks in the compiler:
- Computing a generic signature from user-written requirements in where clauses. A generic signature contains a "minimal, canonical" set of requirements from which the properties of all type parameters (generic parameters and their nested types) can be proven.
- Answering queries about type parameters in a previously-built generic signature.
The GenericSignatureBuilder has several long-standing problems that are quite hard to fix in an incremental way.
It is slow, with exponential time and memory usage when a protocol hierarchy has high "fan-out" (if protocol P1 has 4 associated types that conform to P2, and P2 has 4 associated types that conform to P3, and so on, it constructs 4*4*4*...
rather large objects to represent each equivalence class for example).
It uses a lot of memory, because answering generic signature queries requires keeping these rather large lazily-built GenericSignatureBuilder instances around for each generic signature the type checker encounters. In extreme examples you can make it allocate gigabytes of memory and end up with an effectively non-terminating compilation.
It suffers from various correctness issues, mostly in the minimization algorithm. For example, consider this protocol from SR-12120:
protocol Swappable where
Swapped: Swappable,
Swapped.Swapped == Self,
Swapped.A == B, Swapped.B == A
{
associatedtype A
associatedtype B
associatedtype Swapped
}
The two requirements Swapped.A == B
and Swapped.B == A
imply each other, one or the other can be dropped from the minimal generic signature for the protocol Swappable
, however the GenericSignatureBuilder would drop both, resulting in an incorrect protocol that did not satisfy the properties expected by the user.
Another example concerns superclass requirements. Suppose you have two classes, and a Holder<T, U>
type where T
is required to be some subclass of Base<U>
:
class Base<U> {}
class Derived : Base<Int> {}
struct Holder<T : Base<U>, U> {}
Now I write an extension of Holder
for the case where T
is further constrained to be a subclass of Derived
:
extension Holder where T : Derived {}
The requirement T : Derived
in the extension's where
clause supercedes the requirement T : Base<U>
in the definition of struct Holder
. The GenericSignatureBuilder would drop the latter requirement. However, that was not enough, because T : Base<U>
together with T : Derived
also imply that U == Int
. The Requirement Machine correctly handles this:
extension Holder where T : Derived {
func foo(_ u: U) -> Int {
// fails to type check with GenericSignatureBuilder; RequirementMachine
// correctly infers that "U == Int" therefore "u" can be returned here
return u
}
}
One last example from SR-14485 where the GenericSignatureBuilder completely mangles the generic signature:
func truncate<Input: FixedWidthInteger & UnsignedInteger & BinaryInteger,
Output: FixedWidthInteger & SignedInteger & BinaryInteger>(_ input: Input, outputType: Output.Type = Output.self) -> Output
where Output.Magnitude == Input {
}
The above truncate()
function would receive the generic signature <Input, Output where Input == Input.Magnitude, Output : FixedWidthInteger, Output : SignedInteger, Input.Magnitude == Input.Magnitude.Magnitude.Magnitude>
. This has the nonsense requirement Input.Magnitude == Input.Magnitude.Magnitude.Magnitude
, and also Input
itself loses it's conformance requirement, so it doesn't even have a nested type named Magnitude
at all.
Finally, the fact that uncomputable generic signatures exist together with the GenericSignatureBuilder making no attempt to define which signatures it supports means the underlying approach is fundamentally unsound. A valid Swift compiler cannot claim to accept all "syntactically valid" generic signatures.
Current state
The new implementation is called the "Requirement Machine". The source code is in the lib/AST/RequirementMachine directory of the source tree. Unlike the GenericSignatureBuilder which is one big file, I tried to split it up into smaller files with lots of comments.
In Swift 5.6, the Requirement Machine is enabled for generic signature queries. The GenericSignatureBuilder is still used for constructing new canonical minimal signatures. This setup doesn't address most of the bugs, but it reduces memory usage, since the Requirement Machine's rewrite system implementation is inherently more compact than the GenericSignatureBuilder's "expanded" graph of type parameters.
Swift 5.6 still allows falling back to the GenericSignatureBuilder for generic signature queries by passing the -requirement-machine=off
frontend flag. This flag is gone on the main
branch, where the old implementation of queries has been removed. We now always build a rewrite system for generic signatures encountered by the type checker.
On the main
branch, I've been working on an implementation of generic signature minimization using the Requirement Machine. This is controlled with three frontend flags:
-requirement-machine-protocol-signatures={off,on,verify}
-requirement-machine-inferred-signatures={off,on,verify}
-requirement-machine-abstract-signatures={off,on,verify}
These flags enable the new code path for protocol signatures, generic signatures of concrete types and functions, and finally for the last one, some generic signatures the compiler constructs internally as part of conformance checking and other things. The new algorithm is mostly implemented, except emission of requirement redundancy and conflict diagnostics is still in progress.
For each flag, the "off" state runs the old GenericSignatureBuilder logic as before. The "on" state runs the new logic. "verify" mode runs both and compares the results, triggering an assertion if they mismatch. The purpose of verify mode is to emit diagnostics (since so far only the GenericSignatureBuilder can do it) and to test the new code (on at least the generic signatures that the GenericSignatureBuilder could support, we expect them to match to avoid breaking the ABI for the calling convention and function mangling).
Today I landed a PR to enable the -requirement-machine-protocol-signatures
flag, setting it to "verify" by default. This is a temporary state of affairs; once we do some more qualification and finish diagnostics, I'll set it to "on" by default. What this means is that if you try an example that caused problems for the GenericSignatureBuilder, such as the Swappable
protocol above, you'll get an assertion. If you see assertions on valid code, please let me know.
Once all three flags are set to on
by default, the GenericSignatureBuilder will not run at all. At some unspecified future point when everyone is happy with the new state of things, I will remove the GenericSignatureBuilder entirely.
Performance
I've measured a significant memory usage reduction when the Requirement Machine was enabled for queries in Swift 5.6. I've also seen performance gains with the new minimization algorithm on main. I have not optimized the minimization algorithm much yet, so there is a lot of room for improvement, but it's dramatically faster on pathological examples, with more modest gains on codebases like the standard library. I have yet to see any projects where it is slower, despite the unoptimized state.
Debugging
There are various frontend flags that produce interesting output if you're curious:
-
-dump-requirement-machine
-- dumps all constructed rewrite systems -
-debug-requirement-machine=...
-- specify a comma-separated combination of debugging flags -
-analyze-requirement-machine
-- dumps various histograms I used for performance tuning
Theory
For solving generic signature queries and computing canonical types, the techniques are pretty standard, such as the Knuth-Bendix completion algorithm and so on. Incidentally, this is where the undecidable generic signatures are rejected; completion is a semi-decision procedure that either terminates with a correct solution or runs forever. We run it up to a limit set by command line flags, which you should hopefully never need to change:
-requirement-machine-max-rule-count
-requirement-machine-max-term-length
-requirement-machine-max-concrete-nesting
The problem of coming up with a general algorithm to find a minimal set of rules that generate a rewrite system, and picking a "canonical" minimal set among many possibilities was surprisingly difficult to solve in a general way (where "general" means fixing the cases that the GenericSignatureBuilder failed to find a correct answer for, while producing identical results on formerly-valid code). I came up with an algorithm based on these three papers:
- A Homotopical Completion Procedure with Applications to Coherence of Monoids - Inria
- Homotopy reduction systems for monoid presentations Asphericity and low-dimensional homology - ScienceDirect
- [math/0507344] Logged Rewriting for Monoids
Documentation
I started writing a paper about the Requirement Machine. I'm not going to post a PDF yet because it is still a work in progress. In particular, the minimization algorithm is not documented, and some parts are out of date since it represents the state of the design from June last year. If you're curious, install a TeX distribution and run "make" in the docs/RequirementMachine directory of the source tree.