Swift type checking is undecidable

You're absolutely right. It's been a while since I've done any abstract algebra so my definitions are rusty. I think you can replace "semi-group" with "monoid" everywhere in my post and it would be valid (the cited paper does talk about semi-groups though, because the undecidability of the word problem doesn't actually depend on there being an identity element).

10 Likes

Yes, absolutely. But if the program is accepted by the compiler* (i.e it type checks) and "miscompiles and crashes" (at runtime) as OP stated, wouldn't that imply unsoundness? I mean, the definition of sound type systems that I'm aware of is "all of type-checked programs are correct".

Actually, semigroup is what you want, not monoid: your initial writeup is just fine if you remove the identity law.

It might be better to say that the word problem is known to be undecidable for SEMIgroups, because it might not be immediately clear how undecidability for groups implies undecidability for semigroups. (But it's not hard either.) The result for semigroups was the first one established, by Emil Post, in 1947: Search 2.5 million pages of mathematics and statistics articles

Now, I wonder if you'll end up making use of (some parts of) the Knuth-Bendix algorithm in the typechecker. :smile:

4 Likes

There’s a difference between the case where the type system is sound but the compiler implements it incorrectly, and the case where the type system itself is unsound. That may not matter to someone with a crashing app, though!

8 Likes

If I understand the problem that Slava is outlining here correctly, it's the introduction of constraints (a.k.a. relations) that leads to being able to express undecidable statements in a free semi-group, so it's hard to see how the lack of a constraint could cause a decidability problem.

Also, I don't understand why you say that constraint is “missing,” or what conceptual difficulty that gives you; the lack of such a constraint merely means that the Indices of a Collection's Indices can be represented by any other Collection with the right Element type. In a world of finite types, that would indeed have to bottom out at some kind of cycle in the type-relationship graph, but since we have infinite types, you could imagine the index type of A is I<A> and therefore its index type is I<I<A>> and so on.

…which leads me to wonder, @Slava_Pestov, if in some crazy world where we could think about banning infinite type structures, this problem would go away?

1 Like

If by "this problem" you refer only to undecidability, then yes. The word problem on finite (as opposed to finitely-presented) semigroups is trivially decidable[1]. This may not actually be practically useful, however, without a specific concrete bound on the sizes of types.

[1] Because the (semi)group is finite, you can "just" write down the whole multiplication table, then you "just" evaluate each expression and see if you get the same result.

2 Likes

Yeah, it's pretty interesting. As @scanon explains, once you add "enough" constraints to make the nested type graph finite, you can completely solve the problem; but also, in the absence of protocol "where" clauses, the problem is trivially solvable too (every nested type is unique). It's the uncanny valley in between where you have some same-type constraints but some infinite structure as well that we run into difficulties.

7 Likes

While it seems we never intentionally allowed unlimited recursive constraints in the Standard Library, this generally isn't true outside of it -- for example, SwiftUI's View protocol presents a clear use case for them.

FWIW, the existing cases in the stdlib I can find are all mistakes -- we forgot to add an obvious terminator clause:

Collection.Indices: Indices
// missing: where Indices.Indices == Indices

Numeric.Magnitude: Numeric 
// missing: where Magnitude.Magnitude == Magnitude

Before SE-0191, we also had a mutually recursive case:

Collection.IndexDistance: SignedInteger
SignedInteger.Words: Collection 
// missing: Words.IndexDistance == Int

I think that it would be clearly desirable to add such a clause whenever possible, but DSLs need to be able to work without an artificial limit. (Unless they rely on type erasure.)

IIUC (which is a big if), undecidability can only raise its ugly head while type checking generic code. Conformance checking is always decidable because that only ever involves matching a finite set of types against a finite set of constraints. Allowing tuples to conform to arbitrary protocols would perhaps change this, though.

7 Likes

IIUC, the undecidability is introduced by the combination of same-type constraints and unlimited recursivity. In SwiftUI's case, there shouldn't be a need for same type constraints on the Body associated type.

3 Likes

Nice work in all of this. I hope the solution still supports mutually referential protocols (like in SR-8966), since we've seen in practice that people use them! I suspect they won't be a problem because they're all constraints of the form A.B[.C] == Self, which means canonicalization can always pick a shorter name for the type.

5 Likes

Considering that we want to change the language semantics in a way that will force some ambiguities to be diagnosed/resolved at runtime when infinite types are allowed, I wonder whether the right answer to both problems is to close the infinite type door.

I'm drawing a blank trying to think of the connection between the linked post and generic signatures that generate infinitely many unique types.

However, in theory I'm not opposed to limiting the language here, but it seems like at least for SwiftUI, we have to allow infinitely many unconstrained types, no?

Can we tie down Collection.Indices.Indices == Collection.Indices without breaking the world?

2 Likes

Ah, but you can't actually do that :slight_smile: A same-type constraint A == B has to be thought of as two rules, A -> B and B -> A. Consider this example

protocol P {
  associatedtype A : P
  associatedtype B : P where A.A.A == Self, B.A == Self
}

For example, we can prove that B is equivalent to A.A, but only by applying Self -> A.A.A first, to get B.A.A.A. Then we apply B.A -> Self leaving us with A.A.

If you prove the other direction, you similarly have to rewrite Self -> B.A first to get B.A.A.A, before you can simplify down to B.

So while we define the "canonical" form of a type as the smallest representative with respect to some order, the set of same-type constraints as written in a protocol do not form a rewrite system where every step must either terminate or make the type shorter.

7 Likes

I've thought about it some more and finiteness of the type graph is a red herring.

It is true that if a set of mutually-recursive protocol definitions is known to generate a finite type graph, the compiler could exhaustively enumerate all the associated types and compute the conformance and same-type relations via brute force.

However, checking if a finitely-presented monoid is finite is an undecidable problem. So given a set of mutually-recursive protocol definitions, the compiler has no way to enforce that the type graph is finite, short of setting an upper limit on the number of types and exhaustively enumerating them until it hits the limit.

6 Likes

I don't think that's true. There are definitely structural rules that are sufficient to establish finiteness and which are decidable to check; disallowing recursive protocols is the most obvious example. We could also just detect recursion and require the knot to be tied, i.e. all recursive positions have to be tied to a structurally-smaller type. Those rules might not be workable for source/binary-compatibility reasons, but they're not impossible.

2 Likes

Right, we can introduce a restriction that has the side effect that all valid graphs are finite. But there will necessarily exist finite graphs that it will reject, too.

I think it would be better if we allowed some infinite graphs, since clearly they’re useful, and looked for other restrictions that give the rewrite system the desired comparability properties.

3 Likes

Right, of course. Otherwise it would be decidable. :slight_smile:

If you think you can find that, that's great.

1 Like

Here’s an interesting class of decidable word problems that is sadly also insufficient. If you restrict all same-type constraints to having equal length on two sides, then any two equivalent types must have the same length, too. Since there are only finitely many strings of a fixed length, we can enumerate all of them and compute our rewrite relation exhaustively.

1 Like

I'm really glad you pushed me on this because it has forced me to clarify my thinking.

I don't think what SwiftUI needs is the same as having “infinite types” in the sense I mean it. I think what I mean is that today, any given concretization of a generic type can imply the existence of an unbounded number of other concrete types, like C.Indices.Indices.Indices…, each of which is allowed to be different. Here's a really reduced example of a program that uses that fact:

struct X<T> {
  init(_ n: Int) {
    if n != 0 { _ = X<Self>(n - 1) }
  }
}

So I guess what I'm talking about is not a property of types but of programs. Today, the set of concrete types used by a program is unknowable at compile-time. If it were possible to explore type generation paths until the set of types reached a fixed point at compile time, it would allow all such ambiguities to be diagnosed and/or resolved at compile-time. I'm pretty sure this is the same as monomorphizability, and I have to admit I'm not yet sure whether it's compatible with module encapsulation.

I think this is really the other side of the C.Indices.Indices.Indices… coin, in the sense that if we had a finite set of types, maybe(?) it wouldn't matter if the constraint loop on Indices were closed.

Can we tie down Collection.Indices.Indices == Collection.Indices without breaking the world?

Certainly; the number of user-defined collections is (relatively) small and the number with non-defaulted Indices is even smaller. So we might break a small continent, but not the whole world :wink:

4 Likes

But all programs builds on variables. And all variables must be defined by user before program will be compiled. How to get the variable of ambiguities type?