[Pitch] Expression macros

Woohoo!

Do we want macros to be able to care about this information? It's not immediately obvious that we would. What's an example of a situation where it might matter?

Type-checked macro arguments and results

One question I had reading this that I didn't see answered elsewhere is whether the context-provided signature of the macro will be used in type-checking the argument expression. It seems like it would be, but it would be nice to have this stated explicitly.

  • Syntactic macro expansions are prone to compile-time failures, because we're effectively working with source code as strings, and it's easy to introduce (e.g.) syntax errors or type errors in the macro implementation.

I see how this is true when using the string interpolation facilities for building the resultant syntax nodes, but isn't it perfectly possible to write "safe" macros that instantiate the syntax nodes directly in order to remove the possibility of introducing additional syntax errors?
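To make that concrete, here's a toy illustration (deliberately not SwiftSyntax's actual API) of the point: when the macro builds nodes directly, every value is well-formed by construction, and escaping lives in exactly one place instead of at every interpolation site.

```swift
// Toy AST, not SwiftSyntax: every value of `Expr` is well-formed by
// construction, so a macro built this way cannot emit a syntax error.
indirect enum Expr {
    case identifier(String)
    case tuple([Expr])
    case stringLiteral(String)

    var source: String {
        switch self {
        case .identifier(let name):
            return name
        case .tuple(let elements):
            return "(" + elements.map(\.source).joined(separator: ", ") + ")"
        case .stringLiteral(let text):
            // Escaping lives in exactly one place.
            var escaped = ""
            for c in text {
                if c == "\\" || c == "\"" { escaped.append("\\") }
                escaped.append(c)
            }
            return "\"\(escaped)\""
        }
    }
}

let expansion = Expr.tuple([.identifier("x"), .stringLiteral("x")])
// expansion.source renders as: (x, "x")
```

The trade-off, of course, is verbosity: real SwiftSyntax node construction is considerably noisier than string interpolation.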

The StringifyMacro struct is the implementation for the stringify macro declared earlier. We will need to tie these together in the source code via some mechanism. One approach is to name the module and ExpressionMacro struct name within the macro declaration, e.g.,

macro stringify<T>(_: T) -> (T, String) = ExampleMacros.StringifyMacro

This reads fairly noncommittally but immediately following in Detailed design it appears this is actually the truly proposed behavior. Should this statement be strengthened?

This sort of solution always bugs me a little since it potentially results in errors that are highly unreproducible and dependent on both general machine performance and particular system load. It would be cool if we could come up with some heuristics here (based on the total size of the macro argument? depth of the syntax tree?) that were a little bit more machine independent than raw compilation time required...

Will the macro evaluation context provide information about the compilation target? If not, how do we support cross-compiling?

Also, is the macro declaration parsed in the context of the build machine, or the compilation target? Let's say I have a macro declaration which claims to return an Int, and I'm building on a 64-bit machine but the target is a 32-bit machine - which type does it return?

Also, the idea that compiling code will require the ability to run arbitrary executables has me a little concerned. Does some kind of fallback interpreter already exist for platforms which can't support that?

Here's an admittedly contrived example, because it's the first thing that popped into my mind (my LaTeX is rusty, I've been out of academia for a while!):

// Produce a LaTeX equation for the given expression
#latexify(a + b)

If a and b are scalars, then this could produce $a + b$ (rendered as a + b) whereas if a and b are collection/vector-like types, it could produce $\mathbf{a} + \mathbf{b}$ (rendered as a + b). To render that correctly would require type information for the component parts.
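A hypothetical sketch of how such a macro could branch on operand types. Here `isVectorLike` stands in for whatever type query the evaluation context might eventually provide; none of this is real API.

```swift
// Render one operand: bold (\mathbf) for vector-like types, plain
// otherwise. The Bool is a stand-in for a real type query.
func latexOperand(_ name: String, isVectorLike: Bool) -> String {
    isVectorLike ? "\\mathbf{\(name)}" : name
}

// Render `a + b` as a LaTeX inline equation.
func latexSum(_ a: (String, Bool), _ b: (String, Bool)) -> String {
    "$\(latexOperand(a.0, isVectorLike: a.1)) + \(latexOperand(b.0, isVectorLike: b.1))$"
}

// latexSum(("a", false), ("b", false))  ->  $a + b$
// latexSum(("a", true),  ("b", true))   ->  $\mathbf{a} + \mathbf{b}$
```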

More generally, I want to make sure that we're not walling ourselves in to capabilities that we may need in the future. Given a macro invocation #someMacro(expr), it's reasonable to want to know the type of expr itself, but since we can traverse the syntax tree of expr, and since the whole thing has already been type-checked, it seems reasonable that some operations may need to work on the constituent components as well. For example, when we get to a future proposal on declaration macros, an implementation of Equatable synthesis as a macro would need to iterate over the type members and verify their conformances, so having just the type information for the thing the macro is applied to is insufficient.

This can all be hashed out more in a future proposal, but I think one interesting approach to this, rather than trying to provide a full semantic AST API similar to the AST nodes in the compiler today, would be to provide APIs that let you query AST information from the syntax nodes. Effectively, have the AST be a lightweight projection overlaid on top of the syntax tree, and from the syntax node traversal the user can already do, they can ask questions like "what's the type of this ExprSyntax?" or "for each MemberDeclListItemSyntax that is a VarDeclSyntax, what is its type?"

1 Like

Sorry, I should have been a bit clearer. I'm broadly on board with the idea that having some type information would be desirable, my skepticism was mostly in regards to the subsequent items ("whether they are local variables, members, globals from the same module, globals from different modules, what is their visibility, etc").

Those feel like declaration-level concerns, which feel like they're a level above what an expression macro should be doing. (I think this is the same discomfort I felt with move(_:) being originally modeled as a function.)

It seems to me as though, conceptually, an expression macro should be operating in an expression world, and producing syntax trees which operate on types/values rather than having its behavior change based on declaration-level information about the specific references it's working with. OTOH, it seems perfectly reasonable for declaration macros to be working with declaration-level concerns, for the reasons you note.

This is a fairly hand-wavy objection, though, and I could be convinced otherwise. I'd want to see compelling examples, though (and I think your #latexify example is a decent one for demonstrating why type information would be useful to expression macros).

2 Likes

On the subject of the protocol hierarchy here:

Macro protocols

The Macro protocol is the root protocol for all kinds of macro definitions. At present, it does not have any requirements:

public protocol Macro { }

The ExpressionMacro protocol is used to describe expression macros:

public protocol ExpressionMacro: Macro {
  /// Evaluate a macro described by the given macro expansion expression
  /// within the given context to produce a replacement expression.
  static func apply(
    _ macro: MacroExpansionExprSyntax, in context: inout MacroEvaluationContext
  ) -> MacroResult<ExprSyntax>
}

Do we foresee macros that wouldn't have an apply operation? Should we plan for the generalization and have it be a root requirement of Macro? Something like:

protocol Macro {
  associatedtype MacroExpansionSyntax
  associatedtype ResultSyntax

  func apply(_ macro: MacroExpansionSyntax, in context: inout MacroEvaluationContext)
    -> MacroResult<ResultSyntax>
}

protocol ExpressionMacro : Macro
  where Self.MacroExpansionSyntax == MacroExpansionExprSyntax,
        Self.ResultSyntax == ExprSyntax {}

(And if we do that, could ExpressionMacro be a typealias of Macro with those constraints instead of being a formally inherited protocol?)
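For what it's worth, with primary associated types (SE-0346, Swift 5.7) a typealias spelling like that is expressible today. Here's a self-contained sketch, with stand-in types for the real SwiftSyntax nodes and `apply` trimmed down (no context, no MacroResult):

```swift
// Stand-ins for the real SwiftSyntax types, so the sketch compiles alone.
struct MacroExpansionExprSyntax { var text: String }
struct ExprSyntax { var text: String }

// `Macro` with primary associated types (the angle-bracket clause).
protocol Macro<ExpansionSyntax, ResultSyntax> {
    associatedtype ExpansionSyntax
    associatedtype ResultSyntax
    static func apply(_ node: ExpansionSyntax) -> ResultSyntax
}

// The constrained form as a typealias rather than an inherited protocol.
typealias ExpressionMacro = Macro<MacroExpansionExprSyntax, ExprSyntax>

enum StringifyMacro: Macro {
    static func apply(_ node: MacroExpansionExprSyntax) -> ExprSyntax {
        ExprSyntax(text: "(\(node.text), \"\(node.text)\")")
    }
}

// The typealias works anywhere a generic constraint is expected:
func expand<M: ExpressionMacro>(_ macro: M.Type, _ node: MacroExpansionExprSyntax) -> ExprSyntax {
    M.apply(node)
}
```

Whether that spelling is preferable to an inherited protocol is a separate question; a real protocol can carry its own documentation and future requirements, which a typealias cannot.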

2 Likes

It seems like it would be useful to have the properties of MacroResult be var properties as well:

public struct MacroResult<Rewritten: SyntaxProtocol> {
  public let rewritten: Rewritten
  public let diagnostics: [Diagnostic]

  public init(_ rewritten: Rewritten, diagnostics: [Diagnostic] = []) {
    self.rewritten = rewritten
    self.diagnostics = diagnostics
  }
}

especially since the public initializer already lets you construct an equivalent struct with only one of the fields modified. It seems to me like it would be natural for a macro to want to append diagnostics to the diagnostics field in-place as it does its work.
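A minimal sketch of that variant, with a stand-in Diagnostic type (and access modifiers plus the SyntaxProtocol constraint dropped so it stays self-contained):

```swift
struct Diagnostic { var message: String }  // stand-in for the real type

// Same shape as the proposal's MacroResult, but with `var` properties
// so a macro can accumulate diagnostics in place as it works.
struct MacroResult<Rewritten> {
    var rewritten: Rewritten
    var diagnostics: [Diagnostic]

    init(_ rewritten: Rewritten, diagnostics: [Diagnostic] = []) {
        self.rewritten = rewritten
        self.diagnostics = diagnostics
    }
}

var result = MacroResult("(x, \"x\")")
result.diagnostics.append(Diagnostic(message: "note: expanded stringify"))
```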

3 Likes

macro stringify<T>(_: T) -> (T, String)

Although they are not in scope for this proposal, we know that there will be other kinds of macros in the future. What marks #stringify(_:) as an expression macro, as opposed to some other kind of macro that also has arguments and a return value?

macro stringify<T>(_: T) -> (T, String) = ExampleMacros.StringifyMacro

Oooh, I like this = syntax. Nice and concise.

    return MacroResult("(\(argument), #\"\(argument.description)\"#)")

I'd like to see stronger typing of the code fragments a macro works with. In particular, I'm convinced that treating interpolation of a plain String into a SwiftSyntax node as inserting raw source code is a mistake. It's the same kind of hazard we see in SQL injection—people don't think about the edge cases, or don't handle them robustly enough to work correctly in all situations. For instance, this line in StringifyMacro would break if the argument itself used a raw string literal.

What we ought to do is rename the SyntaxStringInterpolation.appendInterpolation(_: some CustomStringConvertible) method to something like appendInterpolation(raw:), and add a new appendInterpolation(_:) overload that automatically chooses a robust way to escape and quote the text. For that matter, we should have overloads for inserting integer literals, boolean literals, and array/dictionary literals of these types in the same way. Then StringifyMacro can write:

     return MacroResult("(\(argument), \(argument.description))")

to automatically get appropriate escaping, or:

    return MacroResult("(\(argument), #\"\(raw: argument.description)\"#)")

if for some reason it really wants to control the quoting and escaping itself.

(If you hate having quote marks magically appear around interpolated strings, you could instead add a label so it's more like "(\(argument), \(literal: argument.description))". The important point is that an unlabeled appendInterpolation(_:) must not treat a random String that could have come from anywhere as though it were valid source code. You should have to do something to acknowledge that the string needs to contain valid source.)
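To make the hazard concrete, here's a pure-Swift sketch of the difference between raw splicing and an escape-by-default overload (the function names are illustrative, not the actual SwiftSyntax API):

```swift
// Naive splicing pastes the text into a raw literal and breaks as soon
// as the argument itself contains the `"#` terminator sequence.
func naiveQuote(_ text: String) -> String {
    "#\"\(text)\"#"
}

// Robust quoting escapes the text so that *any* String round-trips as
// a valid ordinary string literal.
func literalQuote(_ text: String) -> String {
    var body = ""
    for c in text {
        switch c {
        case "\\": body += "\\\\"
        case "\"": body += "\\\""
        case "\n": body += "\\n"
        default: body.append(c)
        }
    }
    return "\"\(body)\""
}
```

With `naiveQuote`, an argument whose description contains `"#` produces source that no longer parses; `literalQuote` has no such failure mode.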

(Aside: SwiftSyntax does not currently evolve through Language Workgroup evolution proposals, does it? Should it? I suppose it's not as vital if you can use whatever SwiftSyntax version you want, but it seems strange that the design of the types used in macros isn't really governed by evolution.)

The signature of a macro is either function-like ((T) -> (T, String)) or value-like (: Int), depending on the form of the macro-signature.

This sentence doesn't seem to be quite correct—a function-like signature has a parameter clause, not a function type, so (T) should be (_: T).

public mutating func createUniqueLocalName() -> String

Should this return a TokenSyntax with an identifier or something?

macro column<T: ExpressibleByIntegerLiteral>: T = BuiltinMacros.ColumnMacro

Is it a little strange that both function-style and constant-style macros start with the same keywords and the difference between them only becomes clear when we see whether there's a : or an argument list? Perhaps we should use macro var and macro func to convey this difference more quickly:

macro func stringify<T>(_: T) -> (T, String)
macro var column<T: ExpressibleByIntegerLiteral>: T = BuiltinMacros.ColumnMacro

The error question is really interesting. I'm sorely tempted to suggest that we should emit warnings through a function on the context and emit errors by throwing them from the apply(...) method. That would limit us to one error per macro, but I think it might be okay. It would mean we don't need a MacroResult, so the macro doesn't have to make up a fake expansion when there's a failure, and it would mean that nested type-checking failures could throw a special "macro aborted, but we have already diagnosed the failure so you don't need to do it again" error that would simply pass through the macro without explicit error handling beyond a try.

extension MacroEvaluationContext {
  mutating func type(of expression: ExprSyntax) throws -> Type
  mutating func declaration(referencedBy expression: ExprSyntax) throws -> Decl

  mutating func warn(_: Diagnostic)
  mutating func remark(_: Diagnostic)
}

public enum LatexifyMacro: ExpressionMacro {
  static func apply(
    _ macro: MacroExpansionExprSyntax, in context: inout MacroEvaluationContext
  ) throws -> ExprSyntax {
    "\(try convert(macro.argumentList.first!.expression, in: &context))"
  }

  static func convert(_ syntax: Syntax, in context: inout MacroEvaluationContext) throws -> String {
    switch syntax.as(SyntaxEnum.self) {
    case .identifierExpr(let expr):
      switch try context.type(of: expr) {
      ...
      case let unknownType:
        throw Diagnostic(location: expr, message: "can't convert variable of unknown type \(unknownType) to LaTeX")
      }
      ...
    }
  }
}

EDIT:

macro-signature -> parameter-clause '->' type

Should this support omitting the return type for a Void-returning macro? Should it support throws, requiring a try before the macro? What about rethrows? Should it support async, requiring an await before the macro?

9 Likes

Huh, I thought it was stated explicitly. I'll see if I can strengthen the wording here.

Yes, it's possible to do it correctly, but constructing syntax nodes programmatically is a rather painful process. String interpolation is so much better.

Sure, will do.

I agree, but I also have no idea how to do this well :(.

Yes, it will be extended in the future to provide more of this information. At a minimum, I'd like to have enough information to evaluate #if checks appropriately, because that's the kind of compilation-target checking one could do if one had a function that's built for the target.

Doug

It returns Int, which will be 32 bits. Now, the actual implementation of the macro (the one that transforms syntax trees) will be built with a 64-bit Int for the build machine. If you try to do math on the integer literals---say, you want to implement your own kind of constant folding---you would need to be very careful to do so using the target's bit-width.
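A sketch of the kind of care that implies, assuming the macro learns the target's Int width from the evaluation context (an API that is hypothetical here):

```swift
// Fold `a + b` as the *target* would compute it. The host's Int may be
// 64-bit even when the target's is 32-bit, so do the math in a wide
// type and then check that the result fits the target's Int.
func foldAddForTarget(_ a: Int64, _ b: Int64, targetIntBits: Int) -> Int64? {
    let (sum, overflow) = a.addingReportingOverflow(b)
    guard !overflow else { return nil }
    switch targetIntBits {
    case 32:
        // Refuse to fold if the sum doesn't fit a 32-bit Int; let the
        // target's own overflow diagnostics fire instead.
        return (Int64(Int32.min)...Int64(Int32.max)).contains(sum) ? sum : nil
    case 64:
        return sum
    default:
        return nil
    }
}

// foldAddForTarget(2_000_000_000, 2_000_000_000, targetIntBits: 32) == nil
// foldAddForTarget(1, 2, targetIntBits: 32) == 3
```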

No, we do not have a fallback interpreter. Our options in that case would be to build or load the macro implementation into the compiler (e.g., via the plugin mechanism we've been using for our prototypes) or pre-expand all of the macro uses in the code base.

FWIW, the need for type information of subexpressions was called out in the power assertions post.

This is very much the direction we'd like to go.

I can give a couple of examples where it would be useful to know what declarations are being referenced within an expression:

  • You might want to transform + operations that refer to a specific implementation of + without affecting normal math.

  • You might want to implement something like #selector or #keyPath as a macro, where you need to know something about the declarations referenced along the way.

  • You might want to distinguish between references to local variables and references to global declarations, because you want to alter how local variable references are captured in a closure you generate.

Doug

9 Likes

I think they'll all have an "apply" operation, but I don't know if we want to push them all through the same signature. The section on macro contexts in the vision document has things like property-wrapper and synthesized-conformance macro protocols that need different information than expression macros. We could make all of that fit a general apply, but I'm not sure if it's a good idea---especially if a given macro might support several contexts by conforming to different protocols, because then there would be no single MacroExpansionSyntax or ResultSyntax.

Sure.

Doug

6 Likes

It’s of course possible I overlooked something :) but specifically, this snippet:

Macro arguments are type-checked against the parameter types of the macro. For example, the macro argument x + y will be type-checked; if it is ill-formed (for example, if x is an Int and y is a String), the macro will never be expanded. If it is well-formed, the generic parameter T will be inferred to the result of x + y, and that type is carried through to the result type of the macro.

is clear about the fact that the type information propagates from the macro argument to the macro result, but I’m wondering whether a more narrowly typed macro argument would provide a contextual type to the argument expression during type checking. And, similarly, can type information propagate backwards over the macro? E.g., if we knew the result of #stringify(1 + 2) was being passed to a context expecting (Double, String), would type checking succeed, or no? Is there a passage I’m missing that addresses this more directly?

Nothing marks it as an expression macro in the source; that information is in the definition, via the conformance to ExpressionMacro. I'm mixed on this: I don't like the repetition if we have to say it in both places, and the conformance is the one that has to be there. OTOH, tools could get further in understanding the source code without loading any macro definitions if we had the information repeated.

That's more of a concern with the design of the SwiftSyntaxBuilder library, but I think I agree: if we limit ourselves to interpolating syntax nodes, you're less likely to make a silly error.

To be clear, the real implementation of such a macro should use the StringLiteralExprSyntax initializer that does escaping for you.

I sense a pull request coming ;)

I do not think SwiftSyntax should be governed by the evolution process. It's too large, too fast-moving, too tied to the implementation details that don't belong here. Maybe that means we should put big disclaimers around the types described in this proposal that the details can change.

Good point, thanks.

Good idea!

With that spelling, 'macro' feels more like a modifier on a func or var than a new kind of entity. Having '(' vs. ':' be the difference between the two forms is, for me, sufficient; unlike functions, where forgetting to call them still produces a value expression, forgetting a macro's arguments makes your program immediately ill-formed.

That's an interesting approach; I hadn't considered separating out warnings from errors to have completely different approaches. I do worry about only being able to produce a single error, because it means your users could be stuck fixing one error at a time with rebuilds in between. That said, we could have some form for throwing multiple diagnostics.

() is valid ExprSyntax, so making up a fake expansion doesn't really seem that hard... but yes, I see how being able to throw out of your macro makes this a little easier.

Yeah, probably.

I don't think any of these should apply. Checking for proper await and try will only be done post-expansion.

Oh, I see. "Yes", and "no", respectively: you can think of this like a function call, so type information propagates in both directions, but there's nothing that says so in the proposal. I'll see if I can clarify this.
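The function-call analogy is easy to see with an ordinary generic function: type information flows from the argument into T, and from the surrounding context back into the call. (The body here only approximates #stringify; the point is the inference.)

```swift
// If #stringify type-checks like this generic function, a contextual
// type of (Double, String) forces T == Double, so `1 + 2` is checked
// as a Double addition rather than an Int one.
func stringify<T>(_ value: T) -> (T, String) {
    (value, String(describing: value))
}

let inferred = stringify(1 + 2)                     // T == Int
let contextual: (Double, String) = stringify(1 + 2) // T == Double
```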

Doug

5 Likes

@cachemeifyoucan and I reviewed this proposal with an eye to how it could affect the possibility of caching in the Swift compiler. The Swift compiler's request evaluator model is already being used to get correct incremental builds, which is a form of build caching, and there's a lot of potential to do more cross-build caching in the future. There is also ongoing related work in Clang to add compilation caching.

What do we need to cache a macro expansion?

  • The macro expansion function must be deterministic.
  • The execution must be isolated (e.g. from the filesystem and network).
    • If there is a good reason to allow file access, we must have a way to model the external dependencies.
  • We must be able to identify the macro expansion function version.

These concerns - at least about isolation and determinism - appear to be known pain points for Rust proc-macros[0]. In addition to caching during builds, they also impact IDE tools’ (rust-analyzer) ability to incrementally update. I think we should learn from this and require isolation and determinism from the beginning.

Specifically, require macros to be deterministic, and reserve the right to behave as if they are and/or report errors if we detect non-determinism; require macros to be pure functions, and reserve the right to execute them in a sandbox; require macro packages to be versioned dependencies and never republish the same package version with different behaviour.

Some ideas for how we can check these properties in the implementation

  • Re-run macro expansions and report an error if the results differ.
    • In the compiler: If we are able to cache a macro expansion result we can probabilistically, or as an opt-in verification run the expansion again and check that it matches. This could be done without caching by running the expansion twice in a row, but that may be more limited in what it can catch.
    • Tooling: Could fuzz the macro for cases with non-deterministic behaviour. Macro packages can have unit tests, so someone could provide testing API to support this.
  • Sandbox macro expansion process to prevent access to filesystem/network.
    • This depends on access to a sandbox, which may be platform-dependent. On macOS I would like us to always sandbox.
    • Without a real sandbox we could at least try to run the expansion in a unique directory and avoid providing paths to the swift source code/etc. in the tool’s arguments.
  • (Longer term) Maybe compile macros to a non-native and sandboxed runtime like wasm (e.g. there’s an experiment to do this for rust proc-macros watt - Rust)
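The re-run check in the first bullet is cheap to sketch: expand twice and compare, surfacing nondeterminism from a macro that consults the clock, an RNG, or mutable global state.

```swift
// Opt-in verification: run the expansion twice and report whether the
// results agree. A nondeterministic macro will eventually be caught.
func verifyDeterministic(_ expand: () -> String) -> (result: String, deterministic: Bool) {
    let first = expand()
    let second = expand()
    return (first, first == second)
}

let stable = verifyDeterministic { "(x, \"x\")" }   // deterministic == true
var counter = 0
let unstable = verifyDeterministic { counter += 1; return "\(counter)" }
// unstable.deterministic == false
```

As noted above, running twice in a row is more limited than comparing against a cached result from an earlier build, since some sources of nondeterminism (e.g. dates) are stable within a single compilation.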

Can you say more about the execution model for how we will compile code with macro inside? This would help us better-understand what is possible to cache and what kind of power is being given to the macro expansion. For example, is the compiler itself directly running an executable to expand the macro, or would this process be determined at dependency scanning time and launched in coordination with the build system? How often do we need to run a new process (per module, per file, per macro)? Will a macro expansion be limited to a single expansion or will it have global visibility to all expansions? What is the behavior for incremental build? Those questions will put a limit on what macros can do even in the future unless we are willing to redesign its build system. Before we have finer-grained caching that can make decisions based on the knowledge inside compiler, the inputs to macro expansion are important for caching in near term, with the possibility of using the caching system for incremental build.

Beyond soundness, we can also consider what impact macros could have on cache hit rates, particularly if we get to a point of doing fine-grained caching inside the compiler.

What do we need for caching to be fine-grained?

  • The proposed MacroExpansionExprSyntax is a good fit, since it provides only the minimal expression syntax.
  • MacroEvaluationContext
    • Has a SourceLocationConverter that gives you line/column, which could cause spurious caching failures due to code motion. The existing #sourceLocation support has the same issue, but in that case you can detect statically that the macro requires source locations, whereas the proposed expression macros all have access to this information and would need to be conservatively modeled even if they do not use it.
      • Is there some way we could identify macros that need absolute source locations statically?
      • Another idea would be to detect the use of the location dynamically (ie. if the converter is ever used) and return that fact along with the other result data.
      • Alternatively, what if rather than provide a location converter to the expansion function, we have macros that want source locations expand to an expression that uses #sourceLocation and then we can detect statically in the expanded result if the location is an input. This would of course prevent subsuming sourceLocation itself as a “normal” macro.

In general, we want to minimize the inputs to the expansion in common cases. The more information that is provided to the macro expansion, the more likely it is to trigger a cache miss. This could be in tension with providing semantic context, since information about a type is global. The current proposal does not provide type information to the expansion function, but it seems like that is the direction things may go. The other direction we can take is exposing our caching infrastructure to macro expansion so whoever is writing the plugin can add caching internally. But that may be hard to do efficiently.

Ben & Steven

[0] Fun things about Rust proc-macros

14 Likes

C'mon, Doug, you can't just nerd-snipe people like this, especially on a Friday before a vacation.

12 Likes

For the record, I nerd-sniped you on a Thursday afternoon :).

Doug

8 Likes

Apologies in advance, because I assume this is already answered somewhere, but I don’t see it in the proposal:

What are the build & dependency constraints for macro defs?

I take it that, unlike Rust, a macro cannot be defined in the same file where it’s used…? And not even a separate file in the same module?

What about a different module in the same package? It it possible to define “package-private” macros? If so, what does the package manifest look like?

Or does a macro def have to live in an entirely separate package? If so, is it possible to use a nested package-in-package approach for macros tightly coupled to a specific package, and not suitable for standalone release?

Macro definitions will need to be in a different module, but can be from the same package.

Rust has more than one kind of macro. Rust's declarative macros can be in the same file. Rust's procedural macros need to be in a separate crate (akin to a Swift package) that is specially marked as defining a procedural macro.

Access control will be based on the macro declaration, so if your macro declaration is public it can be used outside of the module, if it's internal you can only use it inside the module, etc. Package-level visibility would allow one to create a macro only visible within the package.

I don't have a specific design yet, but my thought is that it'll be a special kind of target in the manifest so that SwiftPM knows it is building a macro definition and can pass the appropriate compiler flags down.

That's how Rust's procedural macros work, but I'd rather not require that. I think we should be able to have the macro definition module be part of the same package as the macro declaration. The macro definition module will have very different dependencies (e.g., it'll depend on swift-syntax) and be built for a different platform/architecture (because it runs on the host, not the target).

These things need to be designed further and written down somewhere. I don't know if it should be this proposal, or whether it should be a separate SwiftPM proposal focused on how macro definitions are built.

Doug

10 Likes

Immensely helpful, and I like all of these design decisions very much. Thank you!

1 Like

Could macros have access to build settings and preprocess Swift source with them (e.g., to initialise a property)?

I had a use case today where I needed the value for the DEVELOPMENT_TEAM build setting to properly access shared Keychains. I am currently relying on the Info.plist (or some other property list) to be preprocessed and reading that value from it at runtime. But it would have been simpler and easier to use to have it known at compile time and fail to build the app if a team ID couldn't be found.

Now I'm daydreaming of an alternate universe where we could replace Info.plist files with Swift files!

2 Likes

Yes, that's the eventual plan. I've added that to the proposal in Note that MacroEvaluationContext will grow to include build environme… · DougGregor/swift-evolution@d101cff · GitHub

Doug

5 Likes

Folks, I've made some significant revisions to the proposal and posted a second pitch. Thank you all so much for the discussion!

Doug

5 Likes