[Pitch] Expression macros

blangmuir · November 18, 2022, 1:29am

@cachemeifyoucan and I reviewed this proposal with an eye to how it could affect the possibility of caching in the Swift compiler. The Swift compiler's request evaluator model is already being used to get correct incremental builds, which is a form of build caching, and there's a lot of potential to do more cross-build caching in the future. There is also ongoing related work in Clang to add compilation caching.

What do we need to cache a macro expansion?

- The macro expansion function must be deterministic.
- The execution must be isolated (e.g. from the filesystem and network).
  - If there is a good reason to allow file access, we must have a way to model the external dependencies.
- We must be able to identify the macro expansion function version.
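To make the three requirements concrete, here is a rough sketch of what a cache lookup could key on. It is written in Python purely as language-neutral illustration; `expansion_cache_key`, `ExpansionCache`, and the `stringify` stand-in are all hypothetical names, not anything from the proposal or the compiler. The assumption is that the expansion depends only on the macro name, the plugin version, and the expression syntax.

```python
import hashlib
import json


def expansion_cache_key(macro_name, plugin_version, expression_source):
    """Hypothetical key: a digest of everything the expansion is allowed to depend on."""
    payload = json.dumps(
        {"macro": macro_name, "plugin_version": plugin_version, "syntax": expression_source},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ExpansionCache:
    """In-memory cache; only sound if expansions are deterministic and isolated."""

    def __init__(self):
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def expand(self, macro_name, plugin_version, expression_source, expand_fn):
        key = expansion_cache_key(macro_name, plugin_version, expression_source)
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        result = expand_fn(expression_source)
        self._entries[key] = result
        return result


def stringify(expression_source):
    # Stand-in for a pure expansion function (assumption: string in, string out).
    return f'({expression_source}, "{expression_source}")'


cache = ExpansionCache()
cache.expand("stringify", "1.0.0", "x + y", stringify)  # miss: first time seen
cache.expand("stringify", "1.0.0", "x + y", stringify)  # hit: identical inputs
cache.expand("stringify", "1.0.1", "x + y", stringify)  # miss: plugin version changed
```

If the expansion read a file or the clock, none of that would show up in the key, and the cache would silently return stale results. The same is true if a package were republished under the same version with different behaviour.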

These concerns - at least about isolation and determinism - appear to be known pain points for Rust proc-macros[0]. In addition to caching during builds, they also impact IDE tools’ (rust-analyzer) ability to incrementally update. I think we should learn from this and require isolation and determinism from the beginning.

Specifically, we should:

- require macros to be deterministic, and reserve the right to behave as if they are and/or report errors if we detect non-determinism;
- require macros to be pure functions, and reserve the right to execute them in a sandbox;
- require macro packages to be versioned dependencies, and never republish the same package version with different behaviour.

Some ideas for how we can check these properties in the implementation:

- Re-run macro expansions and report an error if the results differ.
  - In the compiler: if we are able to cache a macro expansion result, we can run the expansion again, either probabilistically or as an opt-in verification, and check that it matches. This could be done without caching by running the expansion twice in a row, but that may be more limited in what it can catch.
  - Tooling: we could fuzz the macro for cases with non-deterministic behaviour. Macro packages can have unit tests, so someone could provide a testing API to support this.
- Sandbox the macro expansion process to prevent access to the filesystem/network.
  - This depends on access to a sandbox, which may be platform-dependent. On macOS I would like us to always sandbox.
  - Without a real sandbox we could at least try to run the expansion in a unique directory and avoid providing paths to the Swift source code/etc. in the tool’s arguments.
- (Longer term) Maybe compile macros to a non-native, sandboxed runtime like wasm (e.g. watt is an experiment to do this for Rust proc-macros).
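As a sketch of the re-run idea, the two variants might look something like the following. This is Python used as illustration only; `expand_twice`, `verify_cached`, `NonDeterministicExpansionError`, and the two example macros are hypothetical, and the expansion function is assumed to map a source string to a source string.

```python
import itertools
import random


class NonDeterministicExpansionError(Exception):
    """Raised when two runs of the same expansion disagree."""


def expand_twice(expand_fn, expression_source):
    """No cache available: run the expansion twice in a row and compare."""
    first = expand_fn(expression_source)
    second = expand_fn(expression_source)
    if first != second:
        raise NonDeterministicExpansionError(
            f"expansion of {expression_source!r} differed between runs"
        )
    return first


def verify_cached(expand_fn, expression_source, cached_result,
                  sample_rate=0.01, rng=random.random):
    """Cache available: re-run a sampled fraction of hits and compare to the cache.

    Returns True if a verification run happened, False if this hit was skipped.
    """
    if rng() >= sample_rate:
        return False
    fresh = expand_fn(expression_source)
    if fresh != cached_result:
        raise NonDeterministicExpansionError(
            f"cached expansion of {expression_source!r} no longer reproduces"
        )
    return True


def stringify(expression_source):
    # Deterministic stand-in macro.
    return f'({expression_source}, "{expression_source}")'


def make_counter_macro():
    # Non-deterministic stand-in: hidden state changes the output on every call.
    counter = itertools.count()
    return lambda expression_source: f"{expression_source} /* run {next(counter)} */"
```

Setting `sample_rate=1.0` corresponds to the opt-in "always verify" mode. Running twice in a row catches the counter-style macro above, but it would likely miss a macro whose output depends on, say, the date or the machine, which is where comparing against a result cached earlier can catch more.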

Can you say more about the execution model for how we will compile code that contains macros? This would help us better understand what is possible to cache and what kind of power is being given to the macro expansion. For example, is the compiler itself directly running an executable to expand the macro, or would this process be determined at dependency scanning time and launched in coordination with the build system? How often do we need to run a new process (per module, per file, per macro)? Will a macro expansion be limited to a single expansion or will it have global visibility to all expansions? What is the behavior for incremental builds? The answers to these questions will limit what macros can do, even in the future, unless we are willing to redesign the build system. Until we have finer-grained caching that can make decisions based on knowledge inside the compiler, the inputs to macro expansion are important for caching in the near term, with the possibility of using the caching system for incremental builds.

Beyond soundness, we can also consider what impact macros could have on cache hit rates, particularly if we get to a point of doing fine-grained caching inside the compiler.

What do we need for caching to be fine-grained?

- The proposed MacroExpansionExprSyntax is a good fit, since it provides only the minimal expression syntax.
- MacroEvaluationContext
  - Has a SourceLocationConverter that gives you line/column, which could cause spurious cache misses due to code motion. The existing #sourceLocation support has the same issue, but in that case you can detect statically that the macro requires source locations, whereas the proposed expression macros all have access to this information and would need to be conservatively modeled even if they do not use it.
    - Is there some way we could identify macros that need absolute source locations statically?
    - Another idea would be to detect the use of the location dynamically (i.e. if the converter is ever used) and return that fact along with the other result data.
    - Alternatively, rather than providing a location converter to the expansion function, macros that want source locations could expand to an expression that uses #sourceLocation; we could then detect statically in the expanded result whether the location is an input. This would of course prevent subsuming #sourceLocation itself as a “normal” macro.
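The dynamic-detection idea could be sketched roughly as below. Again this is Python as illustration only: `TrackingLocationConverter`, `ExpansionResult`, `run_expansion`, and `cache_inputs` are hypothetical names and do not reflect the actual SourceLocationConverter API. The assumption is that the host hands the macro a wrapper that records whether the location was ever asked for.

```python
from dataclasses import dataclass


class TrackingLocationConverter:
    """Wrapper that remembers whether the macro ever asked for its location."""

    def __init__(self, line, column):
        self._line = line
        self._column = column
        self.was_used = False

    def location(self):
        self.was_used = True
        return (self._line, self._column)


@dataclass(frozen=True)
class ExpansionResult:
    expanded_source: str
    used_source_location: bool  # reported back alongside the expansion


def run_expansion(expand_fn, expression_source, line, column):
    converter = TrackingLocationConverter(line, column)
    expanded = expand_fn(expression_source, converter)
    return ExpansionResult(expanded, converter.was_used)


def cache_inputs(expression_source, line, column, result):
    """Only treat line/column as cache inputs if the expansion actually read them."""
    inputs = {"syntax": expression_source}
    if result.used_source_location:
        inputs["line"] = line
        inputs["column"] = column
    return inputs


def stringify(expression_source, converter):
    # Never touches the converter, so code motion should not invalidate it.
    return f'({expression_source}, "{expression_source}")'


def here(expression_source, converter):
    # Reads the location, so its result depends on where it appears.
    line, column = converter.location()
    return f"({line}, {column})"
```

With this, moving a `stringify` use from line 10 to line 42 leaves its cache inputs unchanged, while a `here` use correctly picks up the new position. One caveat: the flag is only known after a first run, so a cache would presumably need to record it per expansion and consult it on later lookups.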

In general, we want to minimize the inputs to the expansion in common cases. The more information that is provided to the macro expansion, the more likely it is to trigger a cache miss. This could be in tension with providing semantic context, since information about a type is global. The current proposal does not provide type information to the expansion function, but it seems like that is the direction things may go. The other direction we can take is exposing our caching infrastructure to macro expansion so whoever is writing the plugin can add caching internally. But that may be hard to do efficiently.

Ben & Steven

[0] Fun things about Rust proc-macros

- Rust procedural macros can be non-deterministic, e.g. tkaitchuck/constrandom, a macro to generate random constants in Rust (see also https://xkcd.com/221/).
- Rust procedural macros can access the filesystem, e.g. #[template(path = "hello.html")] from djc/askama (type-safe, compiled Jinja-like templates for Rust). This is outside the knowledge of the compiler.
- rust-analyzer (the LSP server for Rust) assumes macros are deterministic, which can cause issues when they are not; see “IDEs and Macros”.
- Experimental tool to use wasm to fully isolate proc-macros: watt.