Detecting symbol dependency cycles in order to minimize them for simpler incremental builds

kevinc · August 9, 2020, 8:39pm

I'm interested in finding a way to warn, or lint, about dependency cycles in Swift code. I'm talking about symbol dependencies, like two types in different files of a module that use one another, not package dependencies.

The hypothesis is that eliminating or reducing these cycles could improve incremental build time, and therefore that identifying cycles would be good for developers who want to reduce them.

This idea arrived when I read an article about the design of Go, which is pretty uncompromising about compile time. Here's an excerpt. Note they are talking about dependencies between modules, not files or symbols, but the principle may hold water anyway. Emphasis mine:

Another feature of the Go dependency graph is that it has no cycles. The language defines that there can be no circular imports in the graph, and the compiler and linker both check that they do not exist. Although they are occasionally useful, circular imports introduce significant problems at scale. They require the compiler to deal with larger sets of source files all at once, which slows down incremental builds. More important, when allowed, in our experience such imports end up entangling huge swaths of the source tree into large subpieces that are difficult to manage independently, bloating binaries and complicating initialization, testing, refactoring, releasing, and other tasks of software development.

It does make sense that when dependencies are acyclic, a code change would incur fewer dependent rebuilds than otherwise, and that the area surrounding a cycle can form a group of nodes that must all rebuild when one does. On the larger Swift projects I've built, I went to the trouble of splitting the app into several frameworks with the goal of reducing build work by straightening out the graph into more of a DAG, even if only at module boundaries. So it would be worth it to me to pursue that benefit, if cycles were identified.

I understand that .swiftdeps files exist, and what's in them, and that the format changes. I feel like I could at least do an experiment with the most recent text .swiftdeps format to see if locating and removing cycles improves build time to a degree that matters.

I have questions about this:

Is this worthwhile or misguided? Why?
Is parsing .swiftdeps the best way to go about this experiment, or does something more convenient exist? Would it be better to find a way to run this experiment against the latest .swiftdeps format?
If experiments pan out, what's the right way to implement a cycle warning for real such that it continues to work given changes to the .swiftdeps format? Maybe, contributing an output option to swiftc? Or is it the kind of thing you could just do in a script build step? Would SwiftLint be the right home for this?

codafi · August 10, 2020, 1:43am

The article is correct in that cycles in the build graph have a detrimental effect on both application architecture and build performance. But I'd note here that XCBuild and the package manager are already setup to detect these cycles and present you diagnostics about them.

For example, I set up three framework targets in Xcode called A, B, and C and created a simple 3-cycle between them. XCBuild is, rightfully, not entertained.

error: Cycle in dependencies between targets 'A' and 'C'; building could produce unreliable results.

Cycle path: A → B → C → A

.swiftdeps files address a separate problem entirely. Though they currently encode "external dependencies" on other files, they can pretty much only express relationships between built products and the current module - say, "my module depends on 'Foo.swiftmodule' being at X file path". There isn't really a formal encoding of the entire build graph because the Swift driver reasons about just its local piece of it - which is usually a single swift module. If you wanted to help improve cycle detection at the level of the build graph, it would involve changes to the package manager, which does not read swiftdeps files.

We are currently exploring a lot of different avenues for improving incremental builds, and there's a lot of low-hanging fruit around cross-module dependencies that we'd like to pick. For example, our encoding of external dependencies leaves a lot of be desired. We also would like the ability to reason about dependencies on names imported across module boundaries instead of just names within a module. These would enable the driver to more effectively weed out files that aren't actually touched by changes to downstream dependencies. Today, we basically only have the ability to observe entire modules changing, and our response is to completely invalidate the current module and reschedule everything. That currently means the lower down in the stack you make a change, the more detrimental to overall compile times it will be because the change will appear to "ripple outward" and touch every other module in the build graph with some kind of transitive edge to the changed module.

kevinc · August 10, 2020, 2:26am

For example, I set up three framework targets in Xcode called A , B , and C and created a simple 3-cycle between them. XCBuild is, rightfully, not entertained.

There isn't really a formal encoding of the entire build graph because the Swift driver reasons about just its local piece of it - which is usually a single swift module.

cross-module dependencies

To restate, my goal is not to address cycles in package dependencies among many modules, but in symbol dependencies within a single module:

I'm talking about symbol dependencies, like two types in different files of a module that use one another, not package dependencies.

codafi · August 10, 2020, 3:23am

You'll have to forgive my misunderstanding here, because I think we're talking past each other. Go's compilation model does not map cleanly onto Swift's compilation model, and the linked article talks about cycles in what would be inter-module dependency graphs. I don't believe that extending that logic to intramodule dependencies makes much sense in the context of our current incremental build setup. I further worry that optimizing for mostly/completely cycle-free intramodule dependencies is going to result in poor application architecture and may work to actually pessimize incremental builds by needlessly increasing the number of files required to lay out a cycle-free module.

Let's see a concrete example of this to check both of our understandings. Suppose you have two Swift files

// File 1
protocol P { /**/ }
struct A: Q { /**/ }

// File 2
protocol Q: P { /**/ }

Here, there's a cycle between the "symbols" (Swift does not model dependencies at this level, I'm going to use "names" from now on) across files. File1(A) -> File2(Q) -> File1(P). According to your reasoning, this cycle should always be broken, so we will introduce a third file

// File 1
protocol P { /**/ }

// File 2
protocol Q: P { /**/ }

// File 3
struct A: Q { /**/ }

File3(A) -> File2(Q) -> File1(P)

In an incremental build, you want to mostly optimize for the number of frontend jobs required to rebuild the average change, because the cost of creating new frontend processes and blocking parts of the intramodule build graph on them far outweighs most other costs. In addition, we must dynamically discover additional work during the incremental build, which results in observable compilation "waves". The driver therefore runs all incremental builds to a fixpoint, so the more waves we introduce the slower the entire build runs. In fact, the worst-case scenario for such a system is actually a linearization like this where a chain of dependencies can cause O(n) waves that completely serializes the incremental build schedule. I don't mean to make this out as some kind of especially common disaster scenario. Today, swiftdeps files result in quite dense dependency graphs, which helps to reduce the number of waves, but at the cost of over-compilation.

Linearization attacks the wrong problem in my opinion. You actually want to increase the granularity of dependency information, not require users to restructure the dependency graph itself. Because at the end of the day, it shouldn't matter whether the user spreads their app out in 3 files or 3000 files. Increasing the granularity of intramodule dependencies allows the driver to more effectively invalidate parts of the build graph, and may allow a future compiler to achieve incrementally inside of frontend jobs themselves.

kevinc · August 10, 2020, 4:06am

OK, I think I'm following. Thanks for your patience. I think what this means is this experiment would not make progress in the right direction.

The need to dynamically discover dependency graph information makes sense, so I think I get what you mean by waves of compilation and discovery. Would you indulge a few questions?

By fix point, you mean a fan-in kind of synchronization that parallel work has to wait on before the next wave can be dispatched? And that's the reason that removing cycles is costly, because it means lengthening the graph and entails more waves to discover, and you don't get to use all your cores and do have to wait more on job setup cost because you have to fan in at the end of a wave and back out again if you've discovered new work?

Then, the over-compilation in the dense graph tradeoff is extra work on cores that you throw out because you didn't discover everything you needed to first? Is that relevant to your June 10 merge that made name dependencies explicit rather than transient?

Within a given module, are there time gains to be made, not by strictly removing cycles and lengthening the graph, but by making the total graph shallower, or by targeting a certain minimum breadth?

codafi · August 10, 2020, 4:30am

I appreciate your enthusiasm to make changes in this area. Making incremental builds better has a direct impact on the quality of life of every Swift developer.

It's slightly more complex than that. Our current model of dependency information does not necessarily fully capture the transitive closure of all dependencies between files up front. Even after my changes to use "direct dependencies", it is still possible to observe distinct compilation waves.

The costly effect here is the increased likelihood that a change in a linearized dependency graph will allow for strictly more compilation waves than a similar compilation session that uses a cycle-bearing dependency graph. I believe the waves themselves are the problem we should aim to eliminate there, and I don't think linearization is a prerequisite to tackling that problem.

We certainly hope that my work there will result in fewer incremental waves. But like I said before we don't have evidence that we can rule out the worst case scenario here even after my changes, so we cannot remove waves from the compilation model.

Bingo.

To eliminate waves, I think you want a corresponding increase in density (at least over the dependencies we have today) so the driver knows precisely the set of files to rebuild for a given change.

kevinc · August 10, 2020, 4:37am

Thanks very much for the details!