Package Manager Extensible Build Tools

jakepetroules · February 5, 2019, 12:17am

One major issue with this proposal (and indeed with most build systems in general, not just SwiftPM) is that for many build tasks it is not possible to statically know the paths of the output files produced by a task from the set of input file paths alone; there is also a dependency on the contents of those inputs.

For example, implementing a C compiler task is easy: there's always one input file, and always one output file. We can compute a suitable output file path based on the input file path, i.e. input.c produces output.o, and we pass both of these paths to the compiler. The contents of input.c are completely irrelevant when constructing the build graph.
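As a rough illustration of the easy case, the output path can be a pure function of the input path. The helper name `objectFilePath(forSource:buildDirectory:)` and the naming scheme (same stem, `.o` extension) are assumptions made for this sketch, not anything SwiftPM defines:

```swift
// Hypothetical helper: derives the object file path from the source path alone.
// The contents of the source file are never read.
func objectFilePath(forSource source: String, buildDirectory: String) -> String {
    // Take the last path component, e.g. "src/input.c" -> "input.c".
    let fileName = source.split(separator: "/").last.map { String($0) } ?? source
    // Drop the extension, if any: "input.c" -> "input".
    let stem: String
    if let dot = fileName.lastIndex(of: "."), dot != fileName.startIndex {
        stem = String(fileName[..<dot])
    } else {
        stem = fileName
    }
    return buildDirectory + "/" + stem + ".o"
}

// The whole build graph edge is known before the compiler ever runs.
let objectPath = objectFilePath(forSource: "src/input.c", buildDirectory: "build")
```

Because nothing here depends on what input.c contains, the task and its output node can be created when the graph is constructed.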

However, other tools, such as the protobuf compiler, can be problematic. Given a file such as input.proto, any number of output files may be generated, depending on the output language. You can control the output directory, but you can't know from the file paths alone which files the tool will generate there. To know this, you must also understand the contents of input.proto.
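To make the content dependence concrete, here is a toy stand-in for such a generator. It is not how protoc actually decides its outputs; the function `toyGeneratedFiles(forContents:outputDirectory:)` and its one-file-per-`message` rule are invented purely for illustration:

```swift
// Toy generator model (an assumption for this sketch, not protoc's real logic):
// it "emits" one file per line that starts with the word `message`.
func toyGeneratedFiles(forContents contents: String, outputDirectory: String) -> [String] {
    var outputs: [String] = []
    for line in contents.split(separator: "\n") {
        let words = line.split(separator: " ")
        // A line such as "message Person {}" contributes "Person.java".
        if words.count >= 2, words[0] == "message" {
            outputs.append(outputDirectory + "/" + String(words[1]) + ".java")
        }
    }
    return outputs
}

// Same input path (input.proto), different contents, different output sets.
let firstRevision = toyGeneratedFiles(
    forContents: "message Person {}\nmessage Address {}", outputDirectory: "gen")
let secondRevision = toyGeneratedFiles(
    forContents: "message Order {}", outputDirectory: "gen")
```

The input path never appears in the function at all: the only way to enumerate the outputs is to read, and in effect parse, the file.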

With a solution that requires outputs to be listed at task construction time, you either have to give developers a way to hardcode which output files a given protoc invocation and input file will generate (this is not scalable and pushes the problem to the wrong audience), or you have to forgo declaring some of the outputs to the build system (this harms parallelism and correctness, if it's even possible at all in a given scenario).

Essentially, we need some sort of two-part solution: a mechanism for rule authors to declare to the build system what WILL happen, and a mechanism for the build system to report back to rule authors what DID happen, providing the opportunity to feed additional information back into the build graph (i.e. newly discovered output nodes that now need to be attached to the task we just ran). This also makes ordering more difficult (how do you guarantee the discovered outputs don't affect tasks which already ran, and how do you know to defer tasks which might have been or will be affected?), but it will need to be solved for proper integration of arbitrary build tools.
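A minimal sketch of what such a two-part interface could look like follows. Every type and method name here (`TaskDeclaration`, `TaskResult`, `BuildGraph`, `add`, `record`) is hypothetical; none of them exist in SwiftPM or llbuild, and the sketch deliberately ignores the ordering problem:

```swift
// Hypothetical model of a declare-then-report build graph.
struct TaskDeclaration {
    let name: String
    let inputs: [String]
    // Outputs known before the task runs (may be incomplete or empty).
    let declaredOutputs: [String]
}

struct TaskResult {
    // Outputs the tool reported only after it ran.
    let discoveredOutputs: [String]
}

final class BuildGraph {
    // Maps each output path to the name of the task that produces it.
    private(set) var producers: [String: String] = [:]

    // Part 1: the rule author declares what WILL happen.
    func add(_ task: TaskDeclaration) {
        for output in task.declaredOutputs {
            producers[output] = task.name
        }
    }

    // Part 2: the build system reports what DID happen. Outputs that were
    // not known before are attached to the task that just ran.
    // Returns only the newly discovered paths.
    func record(_ result: TaskResult, for task: TaskDeclaration) -> [String] {
        var newOutputs: [String] = []
        for output in result.discoveredOutputs where producers[output] == nil {
            producers[output] = task.name
            newOutputs.append(output)
        }
        return newOutputs
    }
}

let graph = BuildGraph()
let protocTask = TaskDeclaration(
    name: "protoc input.proto", inputs: ["input.proto"], declaredOutputs: [])
graph.add(protocTask)

// After the tool runs, its real outputs are fed back into the graph.
let newlyAttached = graph.record(
    TaskResult(discoveredOutputs: ["gen/Person.java"]), for: protocTask)
```

The list returned by `record` is where a real build system would have to do the hard part: deciding which downstream tasks must be deferred, created, or re-run now that new nodes exist.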