Avoiding unintentional infinite recursion

Jens · March 2, 2020, 8:01am

I'm starting this thread to discuss any code analysis tools or strategies that can be used for avoiding unintentional infinite recursion.

For the most trivial case, we'll get notified via a warning:

struct S {
    func foo() { foo() } // Warning: All paths through this function will call itself.
}

But AFAICS, there's no support for identifying mutual infinite recursion:

struct S {
    func foo() { bar() }
    func bar() { foo() }
}

let alone cases where only some paths result in infinite recursion:

struct S {
    func foo(_ a: Int) -> Int {
        return a < 0 ? foo(a) : a
    }
}

And, especially in more involved scenarios (large API with lots of members calling other members or even default implementations of protocol requirements etc.) it would be nice if there was some kind of code analysis that could optionally report all potential loops in some selected part of the call graph.

In lack of better options, I've adopted a manual "layered-extensions-discipline" where any member of layer (extension) N, must not call any members of layer >= N. That is, members of layer N can only call members of layers < N. But this is tedious and/or still prone to manual errors.

Are there any existing options already available?
If not, could/should Swift support something like this?

(Examples of related/motivating posts are here, here and here.)

Karl · March 2, 2020, 11:33am

Not really; such analysis could only handle simple cases, before running in to the halting problem. There may be restricted subsets of the language for which an answer is computable, but even then, it is typically very expensive.

Jens · March 2, 2020, 12:43pm

I'm not sure if we're imagining the same functionality or not, or if I'm just missing something obvious.

It wouldn't have to be perfect to be useful. For my purposes, it would be OK if it reported any "possible" loop in the call graph, including cases like this:

struct S {
    func foo() {
        if false { bar() }
    }
    func bar() {
        if false { foo() }
    }
}

Ie, it wouldn't have to care about whether actual (runtime) infinite recursion would happen or not, just obviously potential loops in the call hierarchy (Xcode already has the "find in call hierarchy" for any member, which suggests something along these lines might be possible).

I'm currently doing my manual "analysis" by trying to stick to the the principle I mentioned above, and "jumping to definition" on every member call I'm unsure of within the member I'm currently implementing in layer/extension N, to check that it's defined in a layer/extension < N. And it's also possible to comment out all layers/extensions above N to see that nothing <= N is calling anything > N.

The related problem of accidental infinite recursion caused by default implementations might be a bit harder to analyze / avoid though.

suyashsrijan · March 2, 2020, 1:10pm

Your example above leads to the following warnings:

/Users/suyashsrijan/Desktop/test.swift:6:20: warning: will never be executed
        if false { foo() }
                   ^
/Users/suyashsrijan/Desktop/test.swift:6:12: note: condition always evaluates to false
        if false { foo() }
           ^
/Users/suyashsrijan/Desktop/test.swift:3:20: warning: will never be executed
        if false { bar() }
                   ^
/Users/suyashsrijan/Desktop/test.swift:3:12: note: condition always evaluates to false
        if false { bar() }
           ^

Jens · March 2, 2020, 1:31pm

Yes sure, but that's beside the point I was trying to make. I should probably have written something like:
if Int.random(1 ... 6) == 7 { … }
instead of
if false { … }

Lantua · March 2, 2020, 1:38pm

what is layered-extensions-discipline?

Karl · March 2, 2020, 1:40pm

Hmm, so apparently clang will indeed flag the following as an obvious infinite loop:

Because there is no path which might exit that cycle, and the same pure function is called with the same argument.

However, flagging something like this is far more complex, because it depends on the input. Even though there is an exit path, it might never be invoked:

In general, compiler writers don’t tend to invest much time in to this AFAIK. It can cost a lot to check, and in general isn’t worth it because it only handles obvious cases.

Jens · March 2, 2020, 2:45pm

Again, the last example would just be a matter of seeing that foo has a call to foo within itself, ignoring whether the path to that call is possible or not (at runtime or even compile time).

Put differently: I could perform some of the work of this hypothetical feature manually by exploring "find in call hierarchy" on each member call in each member, and see if I find a loop or not.

It's just this:

codafi · March 2, 2020, 3:54pm

No, Clang will not diagnose inter-procedural cycles in general - the linked patchset was abandoned. -Winfinite-recursion is implemented via a depth-first-search for an exit node reachable via any path from the function entry node. I should know, I built Swift’s warning and revised Clang’s to look like ours!

Nevin · March 2, 2020, 4:07pm

Perhaps a simpler and more general feature would be some way to generate and output the entire static call-graph of a program (ie. is there any path through function f which calls g).

Then such a graph could be analysed by a standard cycle-finding algorithm to produce what Jens wants.

codafi · March 2, 2020, 4:10pm

Brute-force, flow-insensitive, interprocedural analysis is going to be extremely expensive, both in runtime and in memory consumption, and will yield a wealth of false positives. The following functions are terminating, but would be considered infinitely-recursive by this metric because they appear as members of each other’s call graphs:


indirect enum Nat {
    case z
    case s(Nat)
    
    var isEven: Bool {
        switch self {
        case .z:
            return true
        case .s(let n):
            return n.isOdd
        }
    }
    
    var isOdd: Bool {
        switch self {
        case .z:
            return false
        case .s(let n):
            return n.isEven
        }
    }
}

Directed graph reachability algorithms have absolutely abysmal performance even in isolation. Running them over entire programs is a non-starter.

Now, can we be a little more clever here to catch some domain-specific cycles? Yeah! I had the idea at one point to try to use Swiftc’s infinite recursion pass to detect infinitely-recursive uses of lazy variables. The trouble there is there is a level of indirection between the user’s getter and the initialization thunk. Checking for that isn’t so bad, and can use the existing machinery since it is already general enough to look for “a program point” not just “the exit”.

codafi · March 2, 2020, 4:51pm

SILPassManager::viewCallGraph() is backed by such an algorithm.

github.com

apple/swift/blob/ef7ddda98e74557d3838631dd1ff6d078ba12279/lib/SILOptimizer/PassManager/PassManager.cpp#L977


      
                                                   const CallGraph *Graph) {
                CallGraph::Edge *Edge = I.baseIter;
                if (Edge->Incomplete)
                  return "color=\"red\"";
                return "";
              }
            };
          } // namespace llvm
          #endif
          
          
void SILPassManager::viewCallGraph() {
            /// When asserts are disabled, this should be a NoOp.
          #ifndef NDEBUG
            CallGraph OCG(getModule(), getAnalysis<BasicCalleeAnalysis>());
            llvm::ViewGraph(&OCG, "callgraph");
          #endif
          }

Jens · March 2, 2020, 7:20pm

For my purposes (making sure some part of my code doesn't contain any possible recursive calls at all) it is OK if every recursive call is reported, whether they are terminating (or even never-actually-happening-at-runtime) or not.

Yes, this sound exactly like something I might be looking for.

I guess it's not possible to generate the call graph in any less involved way than by directly using that source code?

codafi · March 2, 2020, 7:43pm

It's not hard to see how this analysis is going to quickly spiral out of hand as the length of cycles you look for starts to increase. For the sake of argument, let's take program entrypoints to be vertices (V) and calls to be edges (E) and assume the rather expensive work of (re)building the whole-program CFG has already completed. For interprocedural cycles of fixed length k, you can do it naively in O(|V| * |E|) time - consider, then, having to repeat the experience for each k. You can try to amortize with hash tables, but you're trading a whole lot of time for a whole lot of space and still won't be able to escape a general bound of O(|V| * log(|V|))ish. As others have mentioned before, you have to consider the absolute worst case here, which would be a nice big honking Hamiltonian Cycle. (This specific case is often cited as a reason why cycle-finding is in NP).

Not as far as I'm aware.

Jens · March 2, 2020, 8:11pm

OK, thank you for the explanation. I thought that it might still be possible if the analysis was limited to only selected parts of a program, eg a single type like my (educational, recreational) implementation of a Float8 type, piggybacking on Float32 for arithmetic operations etc, which turned out to easily end up in unintentional infinite recursion that was both hard to detect and prevent.

Nevin · March 2, 2020, 9:59pm

Perhaps I’m missing something, but once you have the graph of which functions call which others, can’t you just find the strongly-connected-components (eg. with Tarjan’s algorithm)?

That is, taking functions as vertices and function calls as edges, find which functions are strongly connected to each other?

codafi · March 3, 2020, 1:26am

Right, Tarjan's Algorithm will help you find connected components in near linear time, and every non-singular such component will contain at least one cycle - that you may still have to go scrobble around for. There is a utility for doing that analysis in particular called BottomUpFunctionOrder.

Nevin · March 3, 2020, 2:04am

It’s hardly “scrobbling” when every path within a strongly-connected component leads to a cycle. You don’t even need to specifically enumerate the cycles for Jens’s purpose, just output “this set of functions is a strongly-connected component”.

codafi · March 3, 2020, 3:59am

For anybody that wants to learn more Graph Theory, I can’t recommend Diestel’s Graph Theory enough - it’s the bog standard text. Puts Wikipedia to shame.

akyrtzi · March 3, 2020, 6:04am

You could generate the index data for your project (the directory that -index-store-path points to) and use indexstore-db to read the data, do queries, and generate a call-graph.