A Possible Vision for Macros in Swift

I'd prefer that we not debate the validity of dependency injection specifically in this thread; I merely mentioned it as a possible use case. I know it's an opinionated topic, but from a technical standpoint, the APIs presented here could permit such designs.

4 Likes

Is it possible to do incremental compilation on a file that includes a macro written as arbitrary Swift code, given that the macro can access things like Date()? Or can we impose a limitation that, given the same input, the macro must return the same output? Or otherwise prevent complete re-compilation?

Illustration of concern

So imagine you had a file like this:

func longComplicatedFunction() {
  // ...
  print(#messThingsUp())
  // ...
}

Which calls this macro:

public struct MessThingsUp: ExpressionMacro {
  public static func apply(
    expansion: MacroExpansionExprSyntax, in context: MacroEvaluationContext
  ) -> (ExprSyntax, [Diagnostic]) {
    if isTuesday() {
      return ("2", [])
    } else {
      return ("1", [])
    }
  }
}

Is that allowed? And if so, does every use of a macro have to pay for the fact that this is possible? Does every single swift build require re-building that Swift file, even if nothing changed? How about:

struct Foo {
    var value: Int
    #memberwiseInit // From proposal
}

Or

struct Baz: Equatable { // If we implemented Equatable with macros
    var value: Int
}

Sorry, let me clarify: I don't mean to knock dependency injection; I use it all the time, in fact. I meant to highlight the challenges and complexity of moving things from obvious Swift code into macros / annotations that do a lot of work behind the scenes. Such a move made my work more difficult and less enjoyable, because I was not able to debug everything going on during compilation. I think that's relevant to the discussion in that moving certain things into the compilation process can actually hurt the dev experience rather than help it. Good diagnostics help, but if I can't inspect the generated code then it can get very painful. Maybe there's something we can learn from Dagger that will help us make better macros (or "annotation processors", whatever you want to call them) in Swift. Hope that makes sense!

9 Likes

I think we'd want to assume that macros have no dependencies or statefulness other than their direct inputs, at least by default. We can't really enforce that for external macros, of course, but we can say that those are the constraints you're expected to live under. With internal macros, we could just make that kind of API unavailable to the compile-time interpreter.

Some kinds of macros would find it useful to be able to access files (and thus have dependencies), but we'd want that to be opt-in. Also, we'd need to be careful about how that works; we wouldn't want to make Swift impossible to use with specific build systems or rule out future improvements to incremental compilation.

Accessing Date() directly in a build is basically never the right thing to do. If you want to capture the build time in the build, something at the start of the build should determine that and pass it down.
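
For example, here's a minimal sketch (not part of the proposal; the BuildInfo file and type are invented) of capturing the build time once, up front, so that every macro expansion remains a pure function of its inputs:

// BuildInfo.swift: generated once at the very start of the build by the
// build system itself, never from inside a macro.
enum BuildInfo {
    // Captured once per build; macros and ordinary code read this constant
    // instead of calling Date(), so identical inputs yield identical output.
    static let timestamp = "2022-10-04T12:00:00Z"
}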

7 Likes

I like the idea of building a macro system, and I love the concept of using SwiftSyntax in a compile-time-run module to do it. In fact, I'm inclined to suggest that perhaps we should go all-in on this model and drop any notion that you might ever define a macro directly in the module that will use it. I don't think it's a terrible idea to add some extra ceremony to macro definitions; it would discourage overuse.

If we did give up on inline macros, we could repurpose the body of a macro, treating it as a set of key-value pairs like a precedencegroup declaration. This would avoid the large, ungainly argument lists that would otherwise be required on most (all?) macro modifiers:

// Instead of this:
macro(contexts: [.expression, .parameter], external: "MyMacros.Stringify")
func stringify<T>(_ value: T) -> (T, String)

// You'd write this:
macro func stringify<T>(_ value: T) -> (T, String) {
  contexts: expression, parameter
  external: MyMacros.Stringify
}

I'm tempted to suggest that the macro declarations should be written in the compile-time module, not a runtime module, and you should write something like macro import MyMacros to bring them in. It looks to me like the macro decls and their implementations will usually be tightly coupled, so I think it makes sense to keep them together. (If a run-time library wanted to automatically export macros for its clients to use, it could use @_exported macro import MyMacros.) If macro names are always written in source code with a distinguishing symbol like #, we should always know whether we need to look up that name in runtime modules or macro modules.

(Note that this doesn't stop us from, say, automatically applying a macro to an argument if the parameter is marked with the macro's name: the # at the parameter declaration is enough to figure it out.)

I'm slightly bothered by the fact that the contexts argument and the conformances on the type duplicate the same information. I feel like I can almost see a way to eliminate the separate macro declarations entirely and just use the types, conformances, and members to define macros, but the piece that I can't figure out how to do in plain Swift is conveying generic type signatures like <T> (T) -> (T, String) in a way that operates on runtime types but is still statically comprehensible to the type checker.

If we did crack that nut and people just used the macro type's name directly when applying the macro, rather than indirecting through a separate macro declaration with its own name, it would naturally provide a nice separation between uppercased user macros and lowercased built-in macros, just as we already have for property wrappers.


My brain must be broken from all my module loading work, because I find myself (perhaps prematurely) wondering how compile-time macro modules should be laid out on disk and discovered by the compiler. Do we need a separate set of macro search paths, or would they be installed right alongside the runtime .frameworks and .swiftmodules and we'd just include the actual dynamic libraries instead of the TBDs? Or would we want to distribute them as source code and execute them in immediate mode to better accommodate cross-compilation?


I don't think it's discussed in this document, but I'd like to talk about macro ordering and interactions between macros.

If you have multiple macros all operating on the AST at once, it's possible for one of them to manipulate the AST in a way that is of interest to another. For example, suppose you have a #synthesizeStoredProperty macro that would add a stored property to conforming types that don't declare it:

protocol Versioned: Identifiable where ID == UUID {
  var generation: Int { get }
}
extension Versioned {
  #synthesizeStoredProperty var id: UUID
  #synthesizeStoredProperty var generation: Int
}

And you also have a #memberwiseInit macro that creates a memberwise initializer. Implementations for these:

struct SynthesizeStoredProperty: ConformanceMacro {
  public static func apply(
    conformingType: TypeDecl, protocol: ProtocolDecl,
    witness: FunctionDeclSyntax,
    in context: MacroEvaluationContext
  ) -> (FunctionDeclSyntax, [Diagnostic]) {
    return (witness, [])
  }
}

// Copied from the vision
struct MemberwiseInit: MemberDeclarationMacro {
  public static func apply(
    enclosingType: TypeDecl, in context: MacroEvaluationContext
  ) -> (FunctionDeclSyntax, [Diagnostic]) {
    let parameters: [FunctionParameterSyntax] = enclosingType.storedProperties.map { property in
      let paramDecl: FunctionParameterSyntax = "\(property.name): \(property.type)"                                                                         
      guard let initializer = property.initializer else {
        return paramDecl
      }
      return paramDecl.withDefaultArgument(
        InitializerClauseSyntax(
          equal: TokenSyntax(.equal, presence: .present),
          value: "\(initializer)"
        )
      )
    }
    
    let assignments: [ExprSyntax] = enclosingType.storedProperties.map { property in
      "self.\(property.name) = \(property.name)"
    }

    return (
      #"""
      public init(\(parameters.map { $0.description }.joined(separator: ", "))) {
        \(assignments.map { $0.description }.joined(separator: "\n"))
      }
      """#,
      []
    )
  }
}

So, what happens if you try to use them on the same type?

struct Document: Versioned {
  var title, text: String

  #memberwiseInit
}

One possible answer is that each macro runs independently and works purely from what was written in the source code, but although this means the macros will work reliably, they won't actually work together: the memberwise init will be unaware of the stored properties added by the conformance and will not fully initialize the type. Ideally, we'd like the macros to apply in a particular order (first adding the stored properties, then synthesizing the initializer) so that the initializer would know about the stored properties and produce the correct result.
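
Concretely, a sketch of what each ordering would produce for Document, assuming Versioned's macros contribute id and generation:

// If #memberwiseInit works purely from the written source, it sees only
// `title` and `text`, and `id`/`generation` are never initialized:
public init(title: String, text: String) { ... }

// With the desired ordering (stored properties first), it sees all four:
public init(id: UUID, generation: Int, title: String, text: String) { ... }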

You might achieve this by having multiple entry points which run in a defined order, each after some aspect of the AST has been finalized. For instance, MemberwiseInit might implement a variant of the apply method that has an additional storedProperties: parameter and is defined to execute later, after all macros which do not need information about stored properties have been run. If there was a rule that this entry point was not allowed to synthesize additional stored properties, then there would be no danger of stored properties being added after the macro had been expanded.

struct MemberwiseInit: MemberDeclarationMacro {
  public static func apply(
    enclosingType: TypeDecl,
    storedProperties: TypeDecl.StoredProperties,    // <--- NEW
    in context: MacroEvaluationContext
  ) -> (FunctionDeclSyntax, [Diagnostic]) {
    let parameters: [FunctionParameterSyntax] = storedProperties.map { property in 
      ...
    }
    
    let assignments: [ExprSyntax] = storedProperties.map { property in
      ...
    }

    return (...)
  }
}

Another, rather cooler, way to implement this might be to have all macros notionally begin executing immediately, but require them to await information that might be influenced by other macros. The macro would only be continued once all of the macros that didn't read that piece of information had finished executing. For instance:

struct MemberwiseInit: MemberDeclarationMacro {
  public static func apply(
    enclosingType: TypeDecl, in context: MacroEvaluationContext
  ) async -> (FunctionDeclSyntax, [Diagnostic]) {
    // Suspends until all macros that don't read stored properties have finished.
    //                                          vvvvv--- NEW
    let parameters: [FunctionParameterSyntax] = await enclosingType.storedProperties.map { property in 
      ...
    }
    
    // We've already gotten the stored properties, so in practice this will never suspend.
    //                              vvvvv--- NEW
    let assignments: [ExprSyntax] = await enclosingType.storedProperties.map { property in
      ...
    }

    return (...)
  }
}

A further complication is that you might want a macro to look at either enclosing or nested types. This would likely come up in Codable synthesis:

extension Encodable {
    #DeriveCodingKeys enum CodingKeys
    // Needs to look at the stored properties/cases of `Self`

    #DeriveEncodableEncode func encode(to encoder: Encoder) throws
    // Needs to look at the cases of `CodingKeys`
}

Perhaps the request evaluator would suffice to handle these interdependencies and diagnose any circularity that might be present. (Or maybe CodingKeys has to stay as compiler magic. It certainly seems to be an especially delicate derivation.)


AttributedStringKey seems like a similar use case where there's a bunch of boilerplate to make a type work with dynamic member lookup:

enum OutlineColorAttribute : AttributedStringKey {
    typealias Value = Color
    static let name = "OutlineColor"
}

struct MyFrameworkAttributes : AttributeScope {
    let outlineColor = OutlineColorAttribute()
}

extension AttributeDynamicLookup {
    subscript<T: AttributedStringKey>(dynamicMember keyPath: KeyPath<MyFrameworkAttributes, T>) -> T {
        return self[T.self]
    }
} 

One could imagine at least generating the AttributeDynamicLookup extension automatically when the AttributeScope conformance is declared.
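
A hypothetical use site for that idea (the macro name here is invented):

#deriveAttributeLookup // hypothetical: would expand to the
                       // AttributeDynamicLookup extension shown above
struct MyFrameworkAttributes: AttributeScope {
    let outlineColor = OutlineColorAttribute()
}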


As @stuchlej, @allevato, and @Finagolfin have pointed out, tool support is going to be absolutely vital to this effort. On the bright side, though, if we have a way to display macro expansions in our tools, that means any existing feature that we port to macros (or at least to a sufficiently macro-like implementation, probably meaning that it synthesizes a syntax tree instead of an AST) magically gets that feature too. This could finally give us a viable approach to improving the legibility of features like Codable synthesis, property wrappers, or the result builder transform. There's a lot of potential for synergy here.


Overall, I'm really excited about this effort. It seems like a surprisingly practical way to put a lot of incredibly powerful techniques into the hands of ordinary users.

34 Likes

Things I'd like that a macro system might perhaps support:

  • Automated custom protocol conformance synthesis, in a way similar to Hashable, Codable, etc. A concrete example from this morning: I have a protocol somewhat similar to Codable, but if I adopt it I don't get automatic conformance, which is terrible ergonomics. Instead I'll use Codable (with user values), but this is less type-safe and doesn't document itself well.
  • Effects currently accomplished by code generation, such as populating a module with let constants for image assets or strings (see the sketch below)
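
For the second point, a hypothetical sketch of what such a macro might generate from an asset catalog (the macro and asset names are invented):

// #generateAssetConstants("Assets.xcassets") might expand to:
enum ImageAsset {
    static let appIcon = "AppIcon"
    static let launchBackground = "LaunchBackground"
}

replacing the separate code-generation step that projects use for this today.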

Maybe I'm missing something and this is possible, or I'm completely off base (still digesting the pitch); if so, apologies.

One thing we'd like to do, if possible, is to have access to the context of a type or protocol declaration, such that we could generate a stand-alone function/class/... at the top level referencing e.g. the type or protocol name (and possibly stored properties as a bonus, if available). That would allow us to generate e.g. boilerplate code for dynamic loading and related helper functions/classes.

Something with a use site like:

#dynamicLoadingMacro // this one gets access to stored properties
struct MyType {
  ...
}

or

#someOtherMacro
protocol MyProtocol {
  ...
}
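
For illustration, the kind of top-level boilerplate we'd hope such a macro could emit (everything here is hypothetical):

// Hypothetically generated by #dynamicLoadingMacro for `struct MyType`:
public func _registerMyType(in registry: inout [String: Any.Type]) {
    registry["MyType"] = MyType.self
}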

Overall very excited about the pitch, especially for being able to generate e.g. customized encoding/ decoding of types that we currently do with Sourcery.

Not to discount the ordering concerns you've raised, but on this specific example it does feel to me like there's a bit of an inversion with a macro like this. It's... odd that #synthesizeStoredProperty is able to not only reach 'out' of its own scope but somehow 'into' the scope of conforming types. This seems like the type of macro that would violate my expectations about the types of transformations macros can do, unless we already had a feature that allowed protocols to do some boilerplate-heavy introduction of storage into conforming types and #synthesizeStoredProperty is just shorthand for that boilerplate.

This (or one of the automatic approaches you mention) would be super cool, and feels like the best user experience to me. If we could let non-circular dependency chains 'just work' while providing useful error messages when there is a circular dependency (e.g. 'macro #a modifies the stored properties of T, which #b depends on, and #b modifies the methods of T which #a depends on'), that would be awesome. I don't know how technically feasible this would be, though.

3 Likes

I have a use case for regex literals: I would like to reference my own custom character classes (consisting of ranges and single code points) in regex literals, e.g. something like /a\{my_character_class}/, and, without fully thinking it through, I would say that you would need some kind of macro system to do that, replacing \{my_character_class} with e.g. [\u2000-\u200B\u0009] during compilation. It would be even better if the replacement [\u2000-\u200B\u0009] could be the result of a calculation, i.e. if this character class were actually defined in a proper way (not just as replacement text).
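
A hedged sketch of what that might look like (both the #characterClass macro and the \{...} splice syntax are invented here):

// Hypothetical: a character class computed and expanded at compile time.
let myCharacterClass = #characterClass(
    ranges: ["\u{2000}"..."\u{200B}"], singles: ["\u{0009}"]
)
// /a\{myCharacterClass}/ would then compile as if /a[\u{2000}-\u{200B}\u{0009}]/
// had been written directly.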

I love the idea of making things like automatic protocol conformance available outside of the compiler. I think the requirement of making the definition exist in a separate module will avoid some of the abuse you see with other macro systems.

3 Likes

I think we should be careful not to make it hard to abuse macros by making it hard to use macros. I'm concerned that adding overhead to using them will make many legitimate use cases painful, or prohibitively verbose.

Are there specific anti-patterns you expect it to prevent that don't apply to all macro use?

4 Likes

I think it wasn't about macros being hard to use once defined, but about introducing a slight tension / pressure by having to define a separate macro package? That seems like a great tradeoff, I think: it makes macros something you don't reach for in everyday situations, but that exists as a sharp(ish) tool to bring in when it's right for the job, similar to SwiftPM plug-ins.

Glad to see something similar to Rust's procedural macros. It's not perfect, but it's the best experience I've had with a macro system, and the tools the community produces with it are fantastic too. I'd be very happy to have something like this added to Swift.

One thing Rust macros are (generally) very careful about is generating code with unambiguous, fully-qualified type names, so that the behavior of the generated code can't be altered by an unexpected import or shadowing declaration. Right now, Swift doesn't have any mechanism that allows this to work. As soon as there's a package 'X' and a type 'X', it's no longer possible to refer to the package explicitly. I suspect Swift would need to add an equivalent of C++'s "prefix ::" to force name resolution to start at the root, to avoid this problem.
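
A minimal illustration of the shadowing problem:

// Module X declares, at the top level:
public struct X {}
public struct Helper {}

// In a client module:
import X
let h = X.Helper() // fails today: `X` resolves to the type, which shadows
                   // the module, so this looks for a type nested in struct X;
                   // there is no way to force `X` to mean "the module X".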

Perhaps obviously, the context passed to the macro evaluation should also have a way to generate unique local names (with a given prefix, perhaps) so that local hygiene can be taken into account, too.
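
For instance (makeUniqueName and argument are invented here, just to sketch the shape):

// Inside an ExpressionMacro's apply(...):
let tmp = context.makeUniqueName(prefix: "tmp") // e.g. yields "$tmp42"
// Using the gensym'd name means the expansion can't capture or shadow
// anything that happens to be in scope at the call site:
let expansion: ExprSyntax = "{ let \(tmp) = \(argument); return \(tmp) }()"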

Generally though, this kind of procedural macro leaves hygiene up to the macro author. I wonder if it's possible to build the code generation API in such a way that correct hygiene is at least a little more guaranteed?

14 Likes

I'm excited about the new kinds of things that these macros would enable!

I'd like to ask the authors to consider and expand on the user experience for implicit macro expansions, both for clients and maintainers of a library. During the design process, it would be great to make sure we're paying attention to traps that might add confusion to the process of reading and writing code.

For example, the vision document includes this example of an assert method that "stringifies" its parameter before evaluation:

// the macro declaration:
macro func stringify<T>(_ value: T) -> (T, String)

// the assert function:
func assert(#stringify result: Bool) {
  if !result.0 {
    fatalError("assertion failed: \(result.1)")
  }
}
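
For reference, my understanding of the call-site behavior this implies:

// A caller writes:
assert(x > 0)
// and, because the parameter is marked #stringify, the compiler notionally
// rewrites the argument as:
assert(#stringify(x > 0)) // i.e. assert((x > 0, "x > 0"))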

Clients
As a caller of the assert(_:) function, it doesn't look like I'm impacted by the #stringify macro at all. For cases like this, it would be great if the documentation (and quick help, autocomplete, etc.) could omit the #stringify annotation, which only serves to confuse.

If there are cases where I do need to know about the macro as a caller (are there examples of this?), only then would I want to see the annotation. Is this a distinction that needs to be made when defining the macro? Or would the macro-less version of the function always provide all the type information that a caller needs (in this case, that assert(_:) is a (Bool) -> Void function)?

Maintainers
As someone in charge of the library that declares assert(_:), on the other hand, the transformation performed by #stringify is salient but unfortunately invisible at the declaration site. In the declaration of the function, result appears to simply be a Bool, but in the body of the function it has the type (Bool, String). I worry that this change from the visibly annotated type will be confusing, since it is a significant change from how Swift declarations work elsewhere.

Are there ways we could mitigate this confusion? Unfortunately, most of my ideas for solutions require more code at the declaration site, which I think is the opposite of this feature's intended effect.

9 Likes

To me, the best way to ensure hygiene is to switch away from textual macros and towards something that's expressed directly in terms of declarations. That means e.g. that you aren't reliant on finding names for synthetic variables that don't otherwise appear in the program, and instead you just directly emit a reference to the declaration you mean. And if you operate on type-checked AST, there's no possibility of references in the operands being disturbed by anything (unless, of course, you intentionally transform it to force name lookup to be re-done).

Very late edit: to be clear, achieving that will have to be a long-term move. Using external textual/lexical macros is a fine alternative until then; it's just yet another reason why they are mostly a short-term strategy.

7 Likes

One area of Swift that I think could benefit from macro capabilities is literal initialization. It would be nice if these things were allowed:

let number: UInt24 = 0xFFFFFF
let string: EncodedString<Unicode.UTF32> = "hello world"
let version: Version = 1.0.0

struct UInt24 {
    var value: UInt32

    macro init(integerLiteral value: UInt32)
    // sets `self.value` to `value` if under 2^24, otherwise produces a compile-time error
}

struct EncodedString<Encoding: UnicodeCodec> {
    var contents: [Encoding.CodeUnit]

    macro init(stringLiteral value: String)
    // converts the `String` instance to an array of code units at compile time then sets `contents` to that array
}

struct Version {
    var major: Int
    var minor: Int
    var patch: Int

    macro init(numericLiteral major: Int, _ minor: Int, _ patch: Int)
    // initializes a `Version` based on a series of dot-separated numeric values
}
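
For example, under this model the range check would happen entirely at compile time:

let ok: UInt24 = 0xABCDE       // fits in 24 bits, so this compiles
let bad: UInt24 = 0x1_000_000  // 2^24: rejected with a compile-time error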

2 Likes

Maybe this could also help with the initialization of larger structures? (Of course this would concern serialization?) Currently you can easily run into problems even when trying to compile regular code that initializes large structures.

My biggest concern with any macro system design is debuggability. Especially for statement macros, the IDE would have to show the generated code and allow you to step through it. As long as we get to actually step through the macro-evaluated code, debuggability won't be a problem (unless I forgot about something)

14 Likes

I am familiar with Dylan's macro system (and even wrote a guide on the subject), so I'll comment in relation to that, to provide some examples in the solution space.

Dylan macros are normally hygienic in that variables declared within a macro expansion are only visible there; they are always gensym'd. However, the system provides an escape hatch so that the caller can supply a variable name that is given a definition by the macro and used elsewhere by the caller, akin to a let without an initialization expression that is given its value within a subsequent if...else statement.
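
In Swift terms, the construct being alluded to is definite initialization of a let declared without an initial value:

func pick(_ condition: Bool) -> Int {
    let value: Int   // no initialization expression here...
    if condition {
        value = 1    // ...it is assigned exactly once in each branch instead
    } else {
        value = 2
    }
    return value
}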

(Dylan macros work by pattern-matching, so the system takes the caller-supplied name from that spot in the call-site as specified by the pattern.)

Additionally, the macro can define a variable that the caller can see and use within its call to the macro but once execution leaves the text comprising the macro call, the variable goes out of scope. The macro can use this facility to create helpers or references to macro-relevant concepts that the caller can manipulate in some fashion. An example of this would be some identifier akin to a break statement, within a macro analogous to a switch statement. Quite useful.

4 Likes

I don't have anything to add other than that I'm very excited to move away from GYBs in my projects!