[Pitch #2] Expression macros

Douglas_Gregor · December 6, 2022, 6:56am

Hi all,

Based on discussion in the first pitch of expression macros, I've revised the proposal. The updated proposal is here. Changes include:

- Rename MacroEvaluationContext to MacroExpansionContext.
- Remove MacroResult and instead allow macros to emit diagnostics via the macro expansion context.
- Remove sourceLocationConverter from the macro expansion context; it provides access to the whole source file, which interferes with incremental builds.
- Rename ExpressionMacro.apply to expansion(of:in:) to make it clear that it's producing the expansion of a syntax node within a given context.
- Remove the implementations of #column, as well as the implication that things like #line can be implemented with macros. Based on the above changes, they cannot.
- Introduce a new section providing declarations of macros for the various # expressions that exist in the language, but will be replaced with (built-in) macros.
- Replace the external-macro-name production for defining macros with the more-general macro-expansion-expression, and a builtin macro externalMacro that makes it far more explicit that we're dealing with external types that are looked up by name. This also provides additional capabilities for defining macros in terms of other macros.
- Add much more detail about how macro expansion works in practice.
- Introduce SwiftPM manifest extensions to define macro plugins.
- Added some future directions and alternatives considered.

The prototype implementation is also making good progress: it's integrated in the compiler behind some experimental feature flags and is able to handle some interesting macros. I'll follow up with some instructions on writing your own macros to try it out.

I'd love to hear your thoughts on this new revision!

Doug

mtsrodrigues · December 6, 2022, 4:11pm

I'm confused here because the proposal mentions implementations for these in builtin macro declarations

xwu · December 6, 2022, 5:20pm

It gives the declarations for these, not the implementations; you may have missed the text in the revised introductory paragraph of that section:

The actual macro implementations are provided by the compiler, and may even involve things that aren't necessarily implementable with the pure syntactic macro.

davedelong · December 6, 2022, 6:02pm

After my previous comment about #printArguments, I'm now wondering about #context.

It doesn't look like a macro evaluation can recursively evaluate other macros as part of the expansion (without requiring the user to manually include those macros at the call site, such as that #addBlocker(#stringify(x + 1)) example). This implies to me that I cannot make a #context macro that would automatically pick up the file, line, column, dsohandle, etc of its call site.

Therefore... could that information be retrievable from the MacroExpansionContext? There's moduleName and fileName, but being able to get the #line, #column, #function, and #dsohandle would make building a context structure much more straightforward.

blangmuir · December 6, 2022, 6:33pm

Do you actually need the values at expansion time, or just in the final program? I don't think you can evaluate #line to an Int for your caller in your macro, but you could embed #line in the syntax tree you return and let it be evaluated in turn. Doing it that way gives the compiler visibility into when line numbers, etc. are actually needed, which improves the ability to cache or do incremental compilation.
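For comparison, Swift's existing default-argument behavior already defers evaluation in a similar way: #fileID, #line, and #function written as default arguments are evaluated where the call is written, not where the function is declared. The sketch below uses that existing mechanism rather than the proposed macro API, and `CallSite`, `captureCallSite`, and `doWork` are hypothetical names chosen for illustration.

```swift
// Hypothetical container for call-site information (not part of the proposal).
struct CallSite {
    let fileID: String
    let line: Int
    let function: String
}

// #fileID, #line, and #function in default-argument position are
// evaluated at the caller's location, not at this declaration.
func captureCallSite(
    fileID: String = #fileID,
    line: Int = #line,
    function: String = #function
) -> CallSite {
    CallSite(fileID: fileID, line: line, function: function)
}

func doWork() -> CallSite {
    // Records the location of this call, inside doWork().
    return captureCallSite()
}

let site = doWork()
print("\(site.fileID):\(site.line) in \(site.function)")
```

A macro that returns syntax containing #line would, by the reasoning above, leave the compiler to fill in the value at a later step in much the same way.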

Note: This was explicitly removed from MacroExpansionContext in the current update (removal of sourceLocationConverter).

Joe_Groff · December 6, 2022, 6:37pm

Should the macro declaration declare what kind of macro it is, as in expression macro foo(...)? Even if we can infer the sort of macro from the implementing type, it might be good for human readability to state it up front, and doing so might also open up the grammar so that other kinds of macro can use different signature syntax if the type-based signature productions don't make sense for them.

Douglas_Gregor · December 6, 2022, 6:44pm

Right. I was planning on following up with @davedelong after posting this, because my earlier answer about building a #context is no longer valid. Exposing complete line/column/source file text to macro declarations breaks incremental compilation because we cannot track where a macro implementation uses that information. If we want macros to be able to use this information, we need to provide it via the MacroExpansionContext.

Doug

beccadax · December 7, 2022, 12:19am

The struct label here brings up a minor nit with the examples: We don't need to instantiate instances of the macro types, so they ought to be written as uninhabited enums, right? And if so, we probably don't want to use struct as the label—we'd want type or maybe enum instead.

For what it's worth, the vision document originally included the macro kind(s) and various other information in a long argument list on a macro modifier; I asked if we could remove or reformat that information for better readability, and one of the changes Doug made in response was to get the macro kinds from the conformances on the implementation, since it already had to be written there.

From what I can see, we could write the macro kind on the decl, but (a) it would be entirely redundant and (b) we would have to decide if it makes sense to have more than one kind and how that should be written. I'll let Doug chime in on those issues.

On the gripping hand, ExpressionMacro only has one function in it anyway. Do we anticipate that other kinds of macros will require several functions? If not, perhaps we should drop the type entirely, define the macro expansions as free functions, and write the macro kind only in the macro decl, removing the redundancy in the other direction.

// module declaring the macro
expression macro stringify<T>(_: T) -> (T, String) = #externalMacro(module: "ExampleMacros", func: "expandStringify(of:in:)")

// module defining the expansion
public func expandStringify(
  of node: MacroExpansionExprSyntax, in context: inout MacroExpansionContext
) -> ExprSyntax {
  guard let argument = node.argumentList.first?.expression else {
    fatalError("compiler bug: the macro does not have any arguments")
  }

  return "(\(argument), \(literal: argument.description))"
}

kishikawakatsumi · December 8, 2022, 1:45pm

I am considering using macros to embed binaries into Swift source code.

EXAMPLE:

let data = #embed("images/icon.png")

// ^ This macro will generate the code below.
let data = Data([0x4d, 0x49, 0x54, ..., 0x65, 0x6e, 0x73])

Similar to the #embed macro in C23.

This would be useful for applications that do not have bundles, such as command-line tools.

To make this work, I need to access the file system from the macro processor.

Macro implementations will be executed in a sandbox like other SwiftPM plugins, preventing file system and network access.

The proposal says that the macro runs in a sandbox and does not have a file system or network access, but is there a mechanism planned to access read-only files or get permissions explicitly, like SwiftPM's command plugin?
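As a rough sketch of the generated code in the example above, the expansion amounts to rendering bytes as a Swift array literal. The helper below is hypothetical (`embedExpansionText` is not a proposed API) and builds plain source text rather than a real syntax tree. It deliberately takes the bytes as a parameter, since how a macro would obtain them is exactly the open question.

```swift
// Hypothetical helper: renders bytes as the source text that an
// #embed-style macro might produce. File reading is left out on purpose.
func embedExpansionText(for bytes: [UInt8]) -> String {
    let literals = bytes.map { byte -> String in
        let hex = String(byte, radix: 16)
        // Pad to two hex digits so every byte reads as 0xNN.
        return "0x" + (hex.count < 2 ? "0" + hex : hex)
    }
    return "Data([" + literals.joined(separator: ", ") + "])"
}

let expansionText = embedExpansionText(for: [0x4d, 0x49, 0x54])
print(expansionText) // Data([0x4d, 0x49, 0x54])
```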

Douglas_Gregor · December 8, 2022, 6:10pm

Yes, I consider this a natural extension to the proposal, and probably one of the first ones we should do. I think it'll need both SwiftPM manifest file changes and new API in MacroExpansionContext.

I'm not sure where to put this information in the SwiftPM manifest file; we need to know the files that could be read by the macro implementation so the build system can form appropriate dependencies on them. It likely needs to be on the target, e.g., as a new kind of Resource.

Doug

Douglas_Gregor · December 8, 2022, 6:37pm

beccadax:

macro stringify<T>(_: T) -> (T, String) = #externalMacro(module: "ExampleMacros", struct: "StringifyMacro")
The struct label here brings up a minor nit with the examples: We don't need to instantiate instances of the macro types, so they ought to be written as uninhabited enums, right? And if so, we probably don't want to use struct as the label—we'd want type or maybe enum instead.

The macro protocols don't provide a way to create instances of the macro types, but I don't think that should limit what kinds of types people can use to define macros. Maybe their expansion(of:in:) method creates an instance and performs operations on it, because that's convenient.

If your goal is to replace the need to specify struct / enum / actor / class so we have just one version of this macro, we can do that. I used the keywords because it's easier for the implementation (we know exactly what mangled name to look for) and it emphasizes that you need to have one of these concrete type kinds---typealiases or non-nominal types won't work. I do like the idea of having a single externalMacro(module:type:), though.

It is technically redundant, yes. However, the macro implementations are mostly hidden from clients of the macro, so there would still be value in having something there on the macro declaration that you can see in the code / documentation / generated interface.

The main reason I held back on adding something like expression macro is a concern that we'd be adding a bunch of very specific declaration modifiers. expression is a simple declaration modifier, but the vision document lays out a bunch more potential ones: things like declaration, propertyWrapper, or functionBody. Some of these might need arguments (e.g., to say what kinds of names they introduce) as well. The declaration-modifier part of the grammar isn't all that easy to extend without affecting source compatibility.

Perhaps this pushes us to attributes. @expression macro isn't so bad here, and we get to re-use @propertyWrapper if we want a macro form of property wrappers. But we'd still be adding a bunch of attributes, most of which only make any sense on a macro, which bothers me a little bit. It's probably better than deriving information about where/how a macro can be used from the macro implementation, though.

There are several benefits of using protocols that I don't want to give up. For one, the compiler checks that you've provided the right signature, which we wouldn't get for free functions. Also, we can evolve the protocol a bit---for example, maybe we want to add more requirements (with default implementations) to help customize the interaction, e.g., "does this macro want to see a raw syntax tree or one on which operator folding has occurred?". Or perhaps we add an async version of the requirement later on. None of that is straightforward with the free-function approach.
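To illustrate the protocol-evolution point, here is a toy model in plain Swift. It does not use the real ExpressionMacro protocol or SwiftSyntax: strings stand in for syntax nodes, and `ToyContext`, `ToyExpressionMacro`, `wantsFoldedOperators`, `ToyStringify`, and `ToyFolded` are all hypothetical names. It only shows how a requirement added later, with a default implementation, leaves existing conforming types compiling.

```swift
// Toy stand-in for an expansion context; strings play the role of syntax.
struct ToyContext {
    var diagnostics: [String] = []
}

protocol ToyExpressionMacro {
    // Imagine this requirement was added after the protocol first shipped.
    static var wantsFoldedOperators: Bool { get }

    static func expansion(of argument: String, in context: inout ToyContext) -> String
}

extension ToyExpressionMacro {
    // The default keeps older conforming types source-compatible.
    static var wantsFoldedOperators: Bool { false }
}

// Written before the new requirement existed; needs no changes.
enum ToyStringify: ToyExpressionMacro {
    static func expansion(of argument: String, in context: inout ToyContext) -> String {
        "(\(argument), \"\(argument)\")"
    }
}

// A newer type that opts in to the added behavior.
enum ToyFolded: ToyExpressionMacro {
    static var wantsFoldedOperators: Bool { true }

    static func expansion(of argument: String, in context: inout ToyContext) -> String {
        argument
    }
}

var toyContext = ToyContext()
let expanded = ToyStringify.expansion(of: "x + 1", in: &toyContext)
print(expanded) // (x + 1, "x + 1")
```

With free functions there is no comparable place to hang an optional customization point, which is the trade-off described above.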

Doug

allevato · December 8, 2022, 6:48pm

Since each of these is a partitioning of the space of all macros, what about adopting a parenthesized syntax? That would avoid filling up the attribute space or introducing a bunch of one-off contextual keywords as declaration starters:

macro(expression) stringify<T>(...)
macro(declaration) deriveEquality<T>(...)
macro(functionBody) traceCalls<T>(...)

Since macro implementations conform to protocols like ExpressionMacro (and in the future, likely DeclarationMacro, FunctionBodyMacro, etc.), it's possible for one type to support multiple types of macros via multiple conformances. Expanding the above to a comma delimited list mirrors that nicely, as long as the signatures are compatible (I guess we need to see more of these to know how realistic that is in practice):

macro(expression, declaration) someMacro<T>(...)

// Elsewhere...
public struct SomeMacro: ExpressionMacro, DeclarationMacro { ... }

Douglas_Gregor · December 8, 2022, 7:10pm

allevato:

Expanding the above to a comma delimited list mirrors that nicely, as long as the signatures are compatible (I guess we need to see more of these to know how realistic that is in practice):
macro(expression, declaration) someMacro<T>(...)

Yeah, let's add some parameterization for the cases that are more complicated than expression macros:

- Declaration macros might need to say what names they declare, e.g., "I create a declaration with the name foo" or "I create a declaration by applying the prefix orig_ to the name of the declaration I apply to", so something like declaration(prefixedName: "orig_").
- A function-body macro might want to say whether it's able to synthesize a complete implementation from nothing, e.g., functionBody(.synthesized).
- A conformance-synthesizing macro might want to say what protocol it synthesizes, e.g., conformance(to: Hashable.self).

Sure, it's unlikely that all of those are going to be applied to a single macro, but it's interesting to think about. As declaration modifiers, this is a source-compatibility minefield:

declaration(prefixedName: "orig_")
functionBody(.synthesized)
conformance(to: Hashable.self)
macro myDoAllMacro: Void

Your suggestion of a parenthesized syntax works fine:

macro(
  declaration(prefixedName: "orig_"),
  functionBody(.synthesized),
  conformance(to: Hashable.self)
) myDoAllMacro: Void

although it feels a little odd to put so much information in the introducer. Even with a realistic example where you have one of the parameterized ones, e.g.,

macro(conformance(to: Hashable.self)) myDoAllMacro: Void

it feels busy in the introducer. This is why I'm leaning toward attributes:

@declaration(prefixedName: "orig_")
@functionBody(.synthesized)
@conformance(to: Hashable.self)
macro myDoAllMacro: Void

We already have rules for custom attribute parsing, so adding new attributes is easy in a source-compatible way.

Doug

blangmuir · December 8, 2022, 7:12pm

This may already be what you're thinking here, but I think we want the file contents to be loaded via the expansion context or by some external means (swiftpm itself?) so that we can model the dependency and invalidate caches if the contents change without giving the macro direct filesystem access to the potentially mutable file.

Douglas_Gregor · December 8, 2022, 7:15pm

Yes, that's what I'm thinking. The MacroExpansionContext isn't going to directly touch the file system---it's going to ask the compiler/SourceKit (whomever it is talking to) to give it the contents of the file as a buffer.
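A toy model of that arrangement, with every name hypothetical (`ToyFileAccessContext`, `host`, `contents(ofFile:)`, and `requestedPaths` are not proposed API): the context never touches the file system itself. It forwards each request to a host closure that stands in for the compiler or SourceKit, and it records what was asked for so dependencies could be tracked.

```swift
// Toy model only; none of these names come from the proposal.
struct ToyFileAccessContext {
    // Stands in for the compiler/SourceKit side that owns file access.
    let host: (String) -> [UInt8]?
    // Every path the macro asked for, so the host can model dependencies.
    private(set) var requestedPaths: [String] = []

    init(host: @escaping (String) -> [UInt8]?) {
        self.host = host
    }

    mutating func contents(ofFile path: String) -> [UInt8]? {
        requestedPaths.append(path)
        return host(path)
    }
}

// The host serves only the files it was explicitly told about.
let allowedFiles: [String: [UInt8]] = ["images/icon.png": [0x4d, 0x49, 0x54]]
var fileContext = ToyFileAccessContext(host: { path in allowedFiles[path] })

let iconBytes = fileContext.contents(ofFile: "images/icon.png")
let deniedBytes = fileContext.contents(ofFile: "/etc/passwd")
print(iconBytes?.count ?? 0, deniedBytes == nil)
```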

Doug

allevato · December 8, 2022, 7:22pm

It's a future direction so I'm not looking to go into a lot of detail here today, but I'm very invested in what the non-SPM driver interface for this would look like, so that we can ensure that Swift targets in Bazel can pass additional input files to the compiler for macros to use. Since the compiler is setting up the sandbox and doing all the communication with the macro process, would we just need a flag like (contrived name) -allow-macro-to-read-this-file <PATH>?

Douglas_Gregor · December 8, 2022, 7:35pm

Yes, exactly.

Doug

jrose · December 12, 2022, 12:00am

Expression macros are the least interesting kind of macros for me, but this design seems pretty reasonable. How do I disambiguate in case of collisions, though? #file.suffix(4) is currently valid syntax that lexically looks a lot like #Swift.selector(doStuff).

Douglas_Gregor · December 13, 2022, 9:42pm

At present, the grammar of macro-expansion-expression only allows an identifier after #:

macro-expansion-expression -> '#' identifier generic-argument-clause[opt] function-call-argument-clause[opt] trailing-closures[opt]

So #Swift.selector(doStuff) is parsed as (#Swift).selector(doStuff), and we have no way of disambiguating if you do import two modules that define the macros with the same name, and overload resolution doesn't suffice for disambiguation. That's a bit like the existing limitation if you have two declarations in extensions on a type that come in from different modules. I expect that can only be addressed with new syntax (e.g., a.Swift::b()) and that the new syntax would also apply here to macro expansions (#Swift::selector(a.b)).
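The parsing consequence can be sketched with a small string splitter. This is a simplification and not the compiler's parser: `splitMacroExpansion` is a hypothetical helper that reads one identifier after '#' and treats everything else as the remainder, and it does not enforce that identifiers cannot start with a digit.

```swift
// Reads a single identifier after '#', mirroring the production quoted above,
// and returns it along with whatever text follows.
func splitMacroExpansion(_ source: String) -> (name: String, rest: String)? {
    guard source.first == "#" else { return nil }
    let afterHash = source.dropFirst()
    // Simplified identifier rule: letters, digits, and underscores.
    let name = afterHash.prefix(while: { $0.isLetter || $0.isNumber || $0 == "_" })
    guard !name.isEmpty else { return nil }
    return (String(name), String(afterHash.dropFirst(name.count)))
}

let parsedExpansion = splitMacroExpansion("#Swift.selector(doStuff)")
print(parsedExpansion?.name ?? "", parsedExpansion?.rest ?? "")
// Swift .selector(doStuff)
```

The macro name stops at `Swift`, so `.selector(doStuff)` becomes a member access on the expansion rather than part of the macro's name.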

Doug

Douglas_Gregor · December 14, 2022, 6:03am

Hey folks,

Based on discussion here, I've revised the proposal again, albeit with a smaller set of changes focused on getting us to what I believe is a reviewable state:

- Moved SwiftPM manifest changes to a separate proposal that can explore the building of macros in depth. This proposal will focus only on the language aspects.
- Simplified the type signature of the #externalMacro built-in macro.
- Added @expression to the macro to distinguish it from other kinds of macros that could come in the future.
- Make expansion(of:in:) throwing, and have that error be reported back to the user.
- Expand on how the various builtin standard library macros will work.
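As a rough illustration of the throwing change, here is a toy version in plain Swift where a missing argument is thrown as an error instead of trapping with fatalError. Strings stand in for syntax nodes, and `ToyExpansionError`, `toyExpandStringify`, and `toyExpandOrDiagnose` are hypothetical names; how the compiler actually surfaces the error is an assumption here.

```swift
// Hypothetical error type for the toy expansion.
struct ToyExpansionError: Error, CustomStringConvertible {
    let description: String
}

// Toy expansion: throws rather than crashing when the argument is missing.
func toyExpandStringify(arguments: [String]) throws -> String {
    guard let argument = arguments.first else {
        throw ToyExpansionError(description: "#stringify requires one argument")
    }
    return "(\(argument), \"\(argument)\")"
}

// Stand-in for the compiler side: turn a thrown error into a diagnostic string.
func toyExpandOrDiagnose(arguments: [String]) -> String {
    do {
        return try toyExpandStringify(arguments: arguments)
    } catch {
        return "error: \(error)"
    }
}

print(toyExpandOrDiagnose(arguments: ["x + 1"])) // (x + 1, "x + 1")
print(toyExpandOrDiagnose(arguments: []))        // error: #stringify requires one argument
```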

Doug