Future of Conditional Compilation

owenv · January 28, 2020, 1:28am

Over the past couple of years, there have been a few attempts to make incremental improvements to conditional compilation directives, like allowing them inside array/dictionary literals or around catch clauses. The core team response to the first proposal does a good job of explaining why these improvements have never moved forward, even though there is near-universal support for the features in theory: the current implementation of #if creates complexity throughout the compiler that scales with the number of contexts a conditional compilation directive is permitted to appear in. Having worked on the catch clause implementation myself, and taken a look at the array literal one, I think it's become clear at this point that in order to support use cases like conditionally compiled catch clauses, literals, attributes, etc. which have been requested by the community, a new approach is required. This post is intended to begin exploring what that new approach should look like.

It's useful to begin by surveying some notable approaches taken by others with regard to #if-like features:

Rust

The Rust docs contain a short, but very helpful chapter on conditional compilation. In Rust, the primary mechanism to conditionally compile code is through the cfg attribute. Translated to Swift, this might look something like:


@if(os: Linux) struct OnlyCompiledOnLinux {}

However, Rust also allows the use of attributes to annotate arbitrary code blocks, statements, parameters, etc. which makes the feature much more flexible. The language also provides a cfg! macro which is expanded by the compiler to true or false and can be used in an arbitrary expression context. Rust, like Swift, requires that excluded code is parseable.

I think this design is probably not the right one for Swift, even ignoring the source compatibility implications. It's unclear whether Swift should support attributes on code blocks, statements, etc., and even if it did, given how we currently model attributes, this probably wouldn't solve the problem of AST complexity. I do think it illustrates an interesting point, though: it's valuable to consider conditional compilation of declarations and statements as a closely related but separate problem from intra-declaration (e.g. attributes), intra-statement (e.g. switch and do-catch), and intra-expression (e.g. array/dictionary literals) conditionals. I'd argue that #if as it stands today supports the former cases very well, and it's the latter cases we should direct our focus to.
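For reference, here is a small sketch of the declaration-level and statement-level forms that #if handles well today. The names `OnlyCompiledOnLinux` and `platformName` are just illustrative.

```swift
// Declaration-level: the whole declaration is included or excluded.
#if os(Linux)
struct OnlyCompiledOnLinux {}
#endif

// Statement-level: whole statements inside a function body are selected.
func platformName() -> String {
    #if os(Linux)
    return "Linux"
    #elseif os(macOS)
    return "macOS"
    #else
    return "something else"
    #endif
}

print(platformName())
```

In both cases the directive wraps complete declarations or statements, which is why the current AST representation copes with them.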

C/C++/C#

C-family languages rely on the preprocessor for conditional compilation, so the compiler never has to worry about representing compiled-out blocks in the AST. This allows for the use of #ifdef et al. pretty much anywhere (even where it can result in very confusing code). C does not require that excluded code is parseable.

It's been suggested in the past that Swift could use some kind of "integrated preprocessing" approach in between the lexing and parsing stages. This would allow conditional inclusion of individual tokens. Integrating this step into the compiler is also key: it would avoid some problems C faces due to the fact that the C preprocessor and compiler tokenize code slightly differently. This is similar to the approach taken by C#. Downsides if Swift were to take this approach include:

- Compiled-out code would no longer be parsed. This would make cross-platform code somewhat harder to maintain, and would hurt the quality of tools like formatters that rely on the syntax tree.
- There are some open questions as to how such a change would interact with module interface printing.
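To make the idea concrete, here is a minimal sketch of a preprocessing step that runs before parsing. It is a simplification under stated assumptions: it works on whole lines rather than tokens, it only understands `#if FLAG`, `#else`, and `#endif`, and `stripInactiveRegions` is a hypothetical name, not something in the compiler.

```swift
/// Removes inactive `#if FLAG` / `#else` / `#endif` regions before parsing.
/// Hypothetical helper: line-based for brevity, with no `#elseif` or compound conditions.
func stripInactiveRegions(from source: String, defining flags: Set<String>) -> String {
    // One entry per open `#if`: whether the enclosing code is active, and
    // whether the branch currently being read is active.
    var regions: [(parentActive: Bool, active: Bool)] = []
    var kept: [Substring] = []

    for line in source.split(separator: "\n", omittingEmptySubsequences: false) {
        let trimmed = line.drop(while: { $0 == " " || $0 == "\t" })
        if trimmed.hasPrefix("#if ") {
            let flag = String(trimmed.dropFirst(4))
            let parentActive = regions.last?.active ?? true
            regions.append((parentActive: parentActive,
                            active: parentActive && flags.contains(flag)))
        } else if trimmed == "#else" {
            // The `#else` branch is active only if the `#if` branch was not.
            if let region = regions.popLast() {
                regions.append((parentActive: region.parentActive,
                                active: region.parentActive && !region.active))
            }
        } else if trimmed == "#endif" {
            _ = regions.popLast()
        } else if regions.last?.active ?? true {
            // Ordinary line in an active region: pass it through to the parser.
            kept.append(line)
        }
    }
    return kept.joined(separator: "\n")
}

let source = "let arr = [\n  1,\n#if SOMETHING\n  2,\n#else\n  3,\n#endif\n  4\n]"
print(stripInactiveRegions(from: source, defining: ["SOMETHING"]))
```

Note that the excluded lines are dropped without ever being inspected, which is exactly the first downside listed above.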

Other possible techniques

While researching this topic I came across a few papers ("Refactoring C with conditional compilation", IEEE Xplore; https://cs.nyu.edu/rgrimm/papers/pldi12.pdf) on refactoring C in the presence of #ifdef blocks which contain some interesting techniques that might apply to this problem. Most operate by hoisting conditional compilation directives. For example, the following program:

let arr = [
  1,
#if SOMETHING
  2,
#else
  3,
#endif
  4
]

Would be transformed into this program:

#if SOMETHING
let arr = [
  1,
  2,
  4
]
#else
let arr = [
  1,
  3,
  4
]
#endif

Such a preprocessing step transforms a program which uses the C/C++ conditional compilation model into one which is supported by the current Swift model by hoisting conditional compilation directives to the innermost point at which they can be parsed by the current grammar. This means the conditionally compiled blocks could still be checked by the parser and used by tooling. The downside to this approach, of course, is that actually implementing such a program transformation is very difficult, and there's not a lot of prior art to draw on. Many possible approaches involve "forking" the parser state, which would also have a nontrivial performance impact.
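As a rough illustration of the hoisting idea and its "forking" cost, here is a toy sketch that models an array literal as a list of items and expands it into one plain literal per flag assignment. `LiteralItem`, `Variant`, and `hoist` are hypothetical names made up for this example. A real implementation would have to do this on parser state, not on a list of integers.

```swift
/// One entry in a toy array literal: an ordinary element, or an
/// `#if FLAG ... #else ... #endif` region with alternative elements.
enum LiteralItem {
    case element(Int)
    case conditional(flag: String, then: [Int], otherwise: [Int])
}

/// A hoisted variant: the flag values that select it and the plain literal it yields.
struct Variant: Equatable {
    var assumptions: [String: Bool]
    var elements: [Int]
}

func hoist(_ items: [LiteralItem]) -> [Variant] {
    var variants = [Variant(assumptions: [:], elements: [])]
    for item in items {
        switch item {
        case .element(let value):
            // Ordinary elements are shared by every variant.
            for index in variants.indices {
                variants[index].elements.append(value)
            }
        case .conditional(let flag, let then, let otherwise):
            var forked: [Variant] = []
            for variant in variants {
                if let known = variant.assumptions[flag] {
                    // This variant already fixed the flag, so no new fork is needed.
                    var copy = variant
                    copy.elements += known ? then : otherwise
                    forked.append(copy)
                } else {
                    // Fork: one copy assumes the flag is set, the other that it is not.
                    var enabled = variant
                    enabled.assumptions[flag] = true
                    enabled.elements += then
                    var disabled = variant
                    disabled.assumptions[flag] = false
                    disabled.elements += otherwise
                    forked.append(enabled)
                    forked.append(disabled)
                }
            }
            variants = forked
        }
    }
    return variants
}

let variants = hoist([
    .element(1),
    .conditional(flag: "SOMETHING", then: [2], otherwise: [3]),
    .element(4),
])
for variant in variants {
    print(variant.assumptions, variant.elements)
}
```

In this sketch each new independent flag doubles the number of variants, which hints at why forking could get expensive.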

What to do about #warning and #error

It's worth noting that the #warning and #error directives have many of the same AST representation and usability problems as #if. Historically, it seems like most users haven't found this to be an issue in practice. Nevertheless, in the interests of consistency and maintainability, any redesign of conditional compilation should probably apply to #warning and #error as well.

Bringing it all together

I think we've seen enough proposals related to conditional compilation to indicate most of the community is interested in making some kind of change to the status quo. It also seems pretty clear to me that any solution we pick is going to involve some kind of compromise, whether it's tooling, code readability, or something else entirely. In my opinion, the most important considerations are going to be:

- Source compatibility
- Intra-declaration/statement/expression conditional compilation
- Tooling support for understanding and manipulating excluded code blocks
- Implementation maintenance burden
- How hard it is to expand the model to hypothetical future language features

So far, the languages I've looked at fall into one of two broad categories:

- Preprocessor-based conditional compilation: C, C++, C#, etc.
- Grammar-integrated conditional compilation: Swift today, Rust, D, etc.

As I see it, that leaves us with three options when determining the future of conditional compilation in Swift:

1. Keep the existing grammar-integrated model, accepting its current limitations. This would likely mean it would never support every syntactic construct, only those with clear and compelling use cases.
2. Move to an integrated-preprocessor model, allowing conditional compilation directives to appear almost anywhere. This would likely be a source compatible change, and it resolves the issue of intra-declaration/statement/expression conditionals. It would likely have a negative impact on source tooling.
3. Pursue some kind of hybrid model, perhaps involving conditional hoisting or some other program transformation. This isn't particularly well-defined and would require more investigation. It might be able to mitigate some of the downsides of the integrated preprocessor approach, but would come at the cost of significant technical complexity.

With all that said, I'm very interested in hearing everyone's thoughts on this. Do you know of any languages which take an approach different from the ones described here? Are there any other criteria we should be considering? What direction do you think we should go in from here? I'm leaning towards the second option at the moment (integrated preprocessor), but I haven't made up my mind yet.

Karl · January 29, 2020, 9:35am

IIUC we can't do things like #if Int.bitWidth == 32 today because the constant evaluator is invoked after conditional-compilation branches have been resolved.

If we're taking a serious and thorough look at conditional compilation, @constantEvaluable support should be at the top of the list IMO. Do you have any thoughts on that @owenv ?
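For contrast, here is a sketch of what can be written today: an ordinary runtime `if` on `Int.bitWidth`. Unlike a hypothetical `#if Int.bitWidth == 32`, both branches are parsed and type-checked on every platform, so this only helps when both sides compile everywhere. `wordSizeDescription` is just an illustrative name.

```swift
func wordSizeDescription() -> String {
    // Ordinary `if`, not `#if`: neither branch is excluded from compilation.
    if Int.bitWidth == 32 {
        return "32-bit Int"
    } else {
        return "\(Int.bitWidth)-bit Int"
    }
}

print(wordSizeDescription())
```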

owenv · January 29, 2020, 4:43pm

IMO constant-evaluation-based conditional compilation is not easily supported by any traditional model. The current constant evaluation infrastructure runs as a mandatory SIL pass, which is unlikely to change, whereas conditional compilation has to occur before Sema to remain useful across different platforms. This makes any #if directive body that can't be independently lowered to SIL problematic, including the existing support for conditionally compiled case blocks.

Chris_Lattner3 · February 7, 2020, 11:18pm

Hi Owen,

Thank you for re-raising this question, I'd still love to see this get addressed and improved!

I think that this approach is the right way to go: this would eliminate the AST representation of unparsed code, but if there were a reason to care about this, other representations could be found (similar to how comments are attached to ASTs in certain representations).

On your second point, I don't think there are any questions about module interface printing: module interfaces and the generated binary modules are target-specific anyway, so #ifdef's should just be stripped by the parser, and not appear in the generated interface.

AFAIK, the places that this would cause problems are things that try to produce a "cross-architecture" view of an API, e.g. documentation tools. These tools generally have to merge/diff different modules anyway, because they can have lots of differences (e.g. the size of imported "C long" types differ), so this doesn't seem like an added burden.

-Chris

anandabits · February 7, 2020, 11:26pm

I missed this earlier. This would be incredibly useful. I work on a library that has a build for 3rd parties and a build for use within the organization (from the same source). In this library we have a lot of symbols that need to be public within the organization and internal in the 3rd party builds.

Conditional compilation of individual tokens (access modifiers in this case) would be a significant improvement over what we have to do right now.

Dante-Broggi · February 7, 2020, 11:46pm

If Swift moves to a model of having compiled-out code no longer be parsed, I think it is still important that the entirety of the extended #if block (or similar) itself still be completely parseable, including all conditions.
Specifically I do not think Swift should ever support code like this C:

#if true
// …
#elif &#$* // as an example of an un-parseable condition.
// …
#endif

where C allows (actually requires) the condition of trailing else-if blocks to not be parsed, if an earlier block has a true condition.

Chris_Lattner3 · February 7, 2020, 11:58pm

I agree completely.

blangmuir · February 8, 2020, 12:39am

This moves the problem from the AST to the Swift Syntax tree. We still need to model conditionally compiled code accurately in Swift Syntax, and it would be a regression to have only a sequence of tokens. We don't know which of the #if ... #else branches to pick. Part of the promise of Swift tooling is that we don't need compiler arguments to get a parse tree. I would hate to lose that.

Chris_Lattner3 wrote: "AFAIK, the places that this would cause problems are things that try to produce a "cross-architecture" view of an API, e.g. documentation tools."

I agree that many documentation tools may already handle this, because it's already a problem at the semantic level. I think the bigger issue is all the syntactic tools like code formatters, non-semantic linter rules, code folding, smart selection, syntactic refactoring, etc. All of those tools are harder to write and less complete if they cannot reliably get a parse tree that covers the whole file.

SDGGiesbrecht · February 8, 2020, 1:06am

My documentation tool uses SwiftSyntax, and it relies on the #if statements being parsable to do its job. I first wrote it because the only other tool available at the time insisted on performing a separate build for each platform, and could not run on Linux. My current tool produces identical output from both macOS and Linux and automatically documents differences for Windows, Android, SWIFT_PACKAGE, Xcode, Debug, arch(...) etc. The single‐pass strategy also compressed what had been 2+ hour jobs down to 1 minute. I would not want a world where you have to perform separate passes on at least three different machines (to satisfy macOS, Linux, Windows) and then consolidate them somewhere to merge them. That would be a horrible degradation from the status quo where you can just do it locally on one arbitrary machine. I strongly oppose any steps that would put that in jeopardy.

Now, I primarily care about the API‐level declarations. I would not mind nearly as much if function bodies were to become a free‐for‐all. However, even now I run into frustration when using swift‐format, because its results differ between platforms. Sometimes I run it locally on macOS, then push to a Linux CI host, which then objects and tells me I forgot to run swift‐format. I hope these situations occur less often over time, not more.

Over the lifetime of SwiftPM, we’ve seen the use of #if os(...) in the manifest often seem like a blessing at first, but then turn out to be the source of chronic problems. Posts involving it on these forums tend to be immediately pounced on by people vehemently objecting to its use whatsoever. The newer way with BuildSettingCondition adopted by SE‐0238 and SE‐0273 has been a push to make the world a better place where all platforms understand each other’s requirements and can reason about them.

With that as an approximate forerunner, I would like to suggest recognizing two separate levels of conditionals, and going about the two differently:

1. The compiler requires something that is unavailable in order to continue, so we have to hide it. Stuff in this category includes:
- Parsing inside #if compiler(>=x).
- Resolving symbols inside #if os(Windows) with import WinSDK.
While necessary at times, I think these things should be used as little as possible, because of the interference they cause.
2. The compiler can completely understand it, it just needs to refrain from actually applying it. I think these deserve something better than #if. These are where I think real improvement could be made and time could be better spent. They could all be done on a model more like SwiftPM’s .when(), and use clearly defined syntax and semantics:
- Modifiers and attributes could all have a final, optional parameter tagged on:
```
public internal(set, when: [.os([.macOS]), .config(.debug)])
func foo() {}
```
- Array literals could take a hint from string interpolation:
```
let string = "1, \(condition ? "2" : "3"), 4"
let array = [1, \(condition ? [2] : [3]), 4]
```
  Combined with the following “magic” standard library function, it could be used with compile time conditions:
```
func when<T>(
  _ conditions: [CompilerCondition],
  _ ifValues: [T],
  else elseValues: [T]
) -> [T] { /* */ }


let arr = [
  1,
  \(when([.arch([.arm, .i386]), .flag("CUSTOM")], [2], else: [3])),
  4
]
```

Chris_Lattner3 · February 8, 2020, 1:30am

I agree, but I don't think that is avoidable. We have the (unacceptable to me) situation where people cannot #ifdef out individual attributes, and there are other weird limitations that cause significant problems for users. We need to fix this somehow, and encoding this all into the AST isn't scalable. We discussed that in the review conversation mentioned in the original post upthread. Here is a link to part of the discussion.

-Chris

blangmuir · February 8, 2020, 1:46am

I'm sympathetic to wanting to move this out of the (semantic) AST. If we were generating a syntax tree in the parser that was lowered to the semantic AST, it would isolate the rest of the compiler from having "special cases" for #if-handling. I think adding new high-value places for #if, such as around attributes and list items, would be very reasonable in Swift Syntax.

beccadax · February 8, 2020, 1:50am

One more issue that you sort of skip over: #sourceLocation. Parsing #sourceLocation as a declaration severely limits its applicability to code generation, which is supposed to be its domain. Even the Swift project itself doesn't use #sourceLocation in GYB output because it can't be reliably inserted between any two tokens. If we're talking about reforming #warning and #error, #sourceLocation should be part of the conversation too.

allevato · February 8, 2020, 1:54am

I came here to say something to the same effect. Any changes that eliminate the tree representation of conditional compilation blocks (or reduce them to a stream of tokens or some other untyped text blob) would cause a huge regression in swift-format as well as the ecosystem of other tools that are being built up using SwiftSyntax. I'd hate to see us start building this entire ecosystem and then debilitate the underlying model so badly that users' abilities to build successful tooling around it are severely harmed.

I can sympathize with this problem though. When working on swift-format, we've had our share of bugs that were caused by us writing rules that didn't expect #if blocks in certain places. For example, we had a rule that combined cases with fallthroughs, and we just completely dropped code in situations like this, because who would have expected this was allowed:

switch foo {
case .bar:
  doSomething()
#if FEATURE_ENABLED
case .baz:
  doSomethingElse()
#endif
}

But this feels like a problem that can be solved if we generalize it properly, and without giving up all the advantages we have of a tree-based representation. For example, I don't think there are any situations where you'd want an #if/#endif block that couldn't be represented as a node that completely encompasses another subtree (or subtrees with the same parent, for collections) of the syntax tree. I don't think there would be value in supporting something that cross-cuts through multiple nodes in a way that can't be represented as a parent/child relationship, like this:

#ifdef FOO
func foo(
#else
init(
#endif
x: Int) {
  // why?
}

With that in mind, if we audited all of the nodes where we felt that compile-time conditionals should be allowed, could a solution be achieved with some metaprogramming? Could the syntax node definitions in gyb_syntax_support be tagged in such a way that they say they support being conditionally present, and some codegen in the parser could handle the process of "check for #if, handle it if present, then delegate to the underlying node"?

It could make the consumption side of the syntax tree a little more complicated (more SwiftSyntax users would have to look "through" conditional blocks), but I don't think that can be avoided. It can also be made better by improving the type safety of SwiftSyntax. For example, in the switch/case situation above, the children of the switch are represented as a generic Syntax node that needs to be downcast to the right type, which makes it easy for users to omit cases they aren't aware of. They could instead be represented as an enum that exhaustively lists the allowed child types, roughly something like:

enum SwitchCaseListElement {
  case switchCase(SwitchCaseSyntax)
  case ifConfigDecl(IfConfigDeclSyntax)
}
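To show how a consumer might look "through" conditional blocks with such an enum, here is a self-contained sketch. It uses simplified stand-in types, not the real SwiftSyntax API: `SwitchCase`, the `ifConfig` payload, and `caseLabels(in:)` are all assumptions made for illustration.

```swift
/// Stand-in for a switch case node (not the real SwitchCaseSyntax).
struct SwitchCase {
    var label: String
}

/// Exhaustive list of what may appear in a switch body in this toy model.
enum SwitchCaseListElement {
    case switchCase(SwitchCase)
    /// An `#if` block wrapping further list elements of the same kind.
    case ifConfig(condition: String, elements: [SwitchCaseListElement])
}

/// Collects every case label, including those nested inside `#if` blocks.
func caseLabels(in elements: [SwitchCaseListElement]) -> [String] {
    var labels: [String] = []
    for element in elements {
        // The switch must be exhaustive, so a rule cannot silently
        // forget about conditionally compiled cases.
        switch element {
        case .switchCase(let switchCase):
            labels.append(switchCase.label)
        case .ifConfig(_, let nested):
            labels += caseLabels(in: nested)
        }
    }
    return labels
}

let body: [SwitchCaseListElement] = [
    .switchCase(SwitchCase(label: ".bar")),
    .ifConfig(condition: "FEATURE_ENABLED", elements: [
        .switchCase(SwitchCase(label: ".baz")),
    ]),
]
print(caseLabels(in: body))
```

If a rule author leaves out the `.ifConfig` case, the compiler rejects the switch as non-exhaustive, which is the kind of safety net the fallthrough rule was missing.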

owenv · February 8, 2020, 2:04am

Thanks for the feedback everyone. I'm going to try and respond to some of your points here, sorry if I miss anything!

This is good to know. Originally, I thought there may have been some edge cases, but I don't remember where I got that impression and I haven't found any evidence to back it up.

Like Chris said, I totally agree. There's no good reason to match the behavior of C here.

This is a major concern of mine as well, and it's why I've been hesitant to recommend a specific solution so far.

This would help somewhat, but even just parsing #if in more places adds a significant amount of complexity. For example, adding #if support to switch cases doubled the number of productions in the switch statement grammar. That doesn't map 1-1 to parser complexity, but they're not completely unrelated. It's probably worth investigating more though. I assume what you have in mind is something similar to what's on the syntax-parse branch?