How would we change error detection and reporting to handle macros? (#dup, the Duplication Macro, part 4)

CTMacUser · September 6, 2020, 10:01pm

I've posted about a duplication macro before. Here's a revised grammar:

Under identifiers (first two augment an existing production):
- identifier → duplication-marker
- implicit-parameter-name → $ ( duplication-marker )
- duplication-marker → $$ decimal-digits
Under keywords that begin with a number sign ( # ): #dup
Under declarations, but in the generic section along with top-level code and code blocks:
- duplication-directive → #dup ( duplication-amount ; balanced-tokens_opt )
- duplication-amount → decimal-literal | duplication-marker

A directive can duplicate one term, or multiple terms if the right separator is in the token sequence at the top level.

// Both of these initialize to zero through five.
let myArray1 = [ #dup(6; $$0) ]
let myArray2 = [ #dup(3; 2 * $$0, 2 * $$0 + 1) ]
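
For comparison, here is a rough sketch of what those two directives would produce, written in today's Swift. This is a run-time imitation only: `#dup` is a proposal and would expand at compile time, and the names `expanded1` and `expanded2` are made up for illustration.

```swift
// #dup(6; $$0): one term per iteration, with $$0 replaced by the iteration index.
let expanded1 = (0..<6).map { index in index }

// #dup(3; 2 * $$0, 2 * $$0 + 1): two comma-separated terms per iteration.
let expanded2 = (0..<3).flatMap { index in [2 * index, 2 * index + 1] }
```

Both arrays hold the values zero through five, matching the comment on the directive version.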

A duplication directive can normally appear anywhere a term of a comma-separated list can appear. It can also duplicate statements (which are semicolon-separated) or protocols/classes (which are ampersand-separated). I'll put together a complete list later, including a decision on whether protocols/classes stay included.

A few days ago, I realized there's an open question: what about compiler errors that occur within a duplication directive? There can be three main kinds:

- The directive resolves to a null token sequence, but it's in a context where the number of items must be positive, and there are no non-directive and/or non-empty directive sibling terms.
- There's a structural error in the token sequence that applies to every iteration.
- The structure is valid, but certain values of duplication markers end up out of range in particular iterations.

Right now, I guess the compiler tracks an error's module and file, line, and column. With this, or any other macro, it will also have to track the iteration indices of the expansion. (Indices, not index, because directives can be nested.) How hard would it be to add this capability to error detection and reporting?
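
To make the question concrete, here is a minimal sketch of a diagnostic that carries iteration indices alongside the usual position. Every name in it (`SourcePosition`, `ExpansionDiagnostic`, `iterationIndices`, `rendered()`) is hypothetical and not taken from the Swift compiler, and the message format is only one possible choice.

```swift
// Hypothetical model; none of these types exist in the Swift compiler.
struct SourcePosition: Equatable {
    var file: String
    var line: Int
    var column: Int
}

struct ExpansionDiagnostic {
    var position: SourcePosition
    // Iteration index of each enclosing #dup directive, outermost first.
    // Empty when the error is not inside any directive.
    var iterationIndices: [Int]
    var message: String

    // Formats the diagnostic, appending the iteration path when there is one.
    func rendered() -> String {
        var text = "\(position.file):\(position.line):\(position.column): error: \(message)"
        if !iterationIndices.isEmpty {
            let path = iterationIndices.map { String($0) }.joined(separator: ", ")
            text += " [in #dup iteration \(path)]"
        }
        return text
    }
}

let diagnostic = ExpansionDiagnostic(
    position: SourcePosition(file: "main.swift", line: 3, column: 25),
    iterationIndices: [2, 0],
    message: "index out of range")
```

An array is used rather than a single integer because nested directives each contribute one index. An error of the second kind (one that applies to every iteration) could presumably be reported once with an empty array instead of once per iteration.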

CTMacUser · September 6, 2020, 11:43pm

More thoughts:

A duplication directive ends up as a null token sequence if:

- The token sequence is empty.
- The duplication amount is zero.
- Every top-level nested duplication directive ends up as a null token sequence and there are no other tokens at the top level besides separators.
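
The three conditions above can be sketched as a recursive check. The `Token` enum and `expandsToNothing` are hypothetical names for illustration. The sketch assumes each nested directive's amount has already been resolved to an integer (so marker-controlled amounts are not modeled), and it treats a body holding only separators as null, which is one reading of the third condition.

```swift
// Hypothetical token model; only the distinctions the rule needs are kept.
enum Token {
    case separator                               // a top-level comma, semicolon, or ampersand
    case other(String)                           // any other top-level token
    case directive(amount: Int, body: [Token])   // a nested #dup with a resolved amount
}

// Returns true when #dup(amount; body) would expand to a null token sequence.
func expandsToNothing(amount: Int, body: [Token]) -> Bool {
    // Conditions 1 and 2: empty token sequence, or zero duplication amount.
    if amount == 0 || body.isEmpty { return true }
    // Condition 3: only separators and nested directives that are themselves null.
    return body.allSatisfy { token -> Bool in
        switch token {
        case .separator:
            return true
        case .other:
            return false
        case .directive(let nestedAmount, let nestedBody):
            return expandsToNothing(amount: nestedAmount, body: nestedBody)
        }
    }
}
```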

A duplication marker can only appear within a directive. A marker-based implicit parameter name can only appear within contexts surrounded by both a closure and a duplication directive. Within a directive's token sequence, "$$0" is expanded to the index of the current iteration, and all other marker values are decremented by 1. A marker within a directive's duplication amount is controlled by the surrounding directive (it is an error if there is no such directive). Directives are expanded from the outside in (so marker-controlled amounts can work), but the result is not emitted into the file's context until all nested expansions are complete.
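
As an illustration of the marker rule, here is a small sketch of one expansion step for a single iteration. `Piece` and `resolveMarkers` are made-up names, and the sketch only handles a flat token sequence: how markers inside the bodies of nested directives are counted is not modeled here.

```swift
// Hypothetical flat token model for one directive's token sequence.
enum Piece: Equatable {
    case text(String)   // any token that is not a duplication marker
    case marker(Int)    // $$n
}

// Applies the rule for one iteration: $$0 becomes the iteration index,
// and every other marker $$n becomes $$(n - 1).
func resolveMarkers(in body: [Piece], iteration: Int) -> [Piece] {
    return body.map { piece -> Piece in
        switch piece {
        case .marker(0):
            return .text(String(iteration))
        case .marker(let level):
            return .marker(level - 1)
        case .text:
            return piece
        }
    }
}

// $$0 + $$1 during iteration 2 of the innermost directive.
let step = resolveMarkers(in: [.marker(0), .text("+"), .marker(1)], iteration: 2)
```

After this step the remaining `$$0` belongs to the next directive outward, so a second call with that directive's iteration index would resolve it.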

If a duplication directive is in a context expecting statements, any statement can be used except compiler-control statements. That's because those are the only statements that can't end with a semicolon. Compiler-control statements can appear if they're protected by a nested scope.

The duplication-directive production will be an alternative to many other productions that take a list. First the context for that outer list is determined, then the directive's interior is checked to see whether its expansion will produce compatible terms. The kind of list also determines which of the top-level separators (comma, semicolon, or ampersand) is permitted.

Comma-separated lists
- identifier-list (not sure it can be empty)
- tuple-type-element-list
- function-type-argument-list (can't be alone and empty if an ellipsis follows)
- type-inheritance-list (can't be alone and empty)
- expression-list (not sure it can be empty)
- array-literal-items
- dictionary-literal-items
- closure-parameter-list
- capture-list-items (can't be alone and empty)
- tuple-element-list
- function-call-argument-list
- condition-list (can't be alone and empty)
- case-item-list (can't be alone and empty)
- catch-pattern-list
- pattern-initializer-list (can't be alone and empty)
- parameter-list
- union-style-enum-case-list (can't be alone and empty)
- raw-value-style-enum-case-list (can't be alone and empty)
- precedence-group-names (can't be alone and empty)
- tuple-pattern-element-list
- generic-parameter-list (can't be alone and empty)
- requirement-list (can't be alone and empty)
- generic-argument-list (can't be alone and empty)
Ampersand-separated list
- protocol-composition-type (can't be alone and empty)
Semicolon-separated list (for all of these, semicolons are not optional except after the last term, and compiler-control statements can't be present at the top level)
- statements
- struct-members
- class-members
- extension-members
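
The separator rule for the three list kinds above could be sketched like this. `ListKind` and `disallowedSeparators` are hypothetical names, and the sketch assumes the top-level separators of a directive's token sequence have already been collected.

```swift
// Hypothetical classification of the contexts listed above.
enum ListKind {
    case commaSeparated
    case ampersandSeparated
    case semicolonSeparated

    // The only top-level separator permitted inside a directive in this context.
    var separator: Character {
        switch self {
        case .commaSeparated: return ","
        case .ampersandSeparated: return "&"
        case .semicolonSeparated: return ";"
        }
    }
}

// Returns the top-level separators that do not fit the surrounding list kind.
func disallowedSeparators(_ topLevelSeparators: [Character], in kind: ListKind) -> [Character] {
    return topLevelSeparators.filter { $0 != kind.separator }
}
```

For example, a directive inside an array literal is in a comma-separated context, so a top-level semicolon in its token sequence would be flagged.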