New TableGen backends: llvm-tblgen or a new swift-tblgen?

owenv · November 23, 2019, 8:57pm

Over the past couple of days I've been exploring moving Swift diagnostic definitions from our current macro-based approach over to TableGen. This should make it significantly easier to define diagnostics with more complex metadata. However, it requires adding a new TableGen backend to generate headers from the definition files.

Initially, I followed clang's approach to adding custom backends by creating a new swift-tblgen tool in the swift repo. This wouldn't replace llvm-tblgen entirely, as we'd still use it for option parsing. It would support any other new swift-specific backends which don't make sense to upstream to LLVM. The main issue with this approach is that it potentially adds quite a bit of complexity to the swift build, especially when cross-compiling.

The other alternative I see at the moment is just contributing new swift-specific backends to the downstream apple/llvm-project. This would require very few changes to the build, but I'm not sure it's a great idea from an organizational standpoint.

I figured this was worth bringing up for discussion since others who have more experience with the build system might have different opinions on the tradeoffs involved! I don't plan on making any of these changes (if we agree they're a good idea) until after the holidays, so there's no rush.

compnerd · November 23, 2019, 11:51pm

I think that adding more to the local "fork" of llvm/clang is definitely less than desirable. It adds a lot of complexity in maintaining the LLVM/clang repositories, and also makes it more difficult to pick up functionality from upstream.

Many of the LLVM projects have custom TableGen binaries (e.g. clang, lldb). I think that adding swift-tblgen isn't really that terrible, with the minor caveat that the build system support for this can be a bit more complicated to set up (so that you can continue to do cross-compilation). However, this is not without precedent, as LLVM, clang, and LLDB all support cross-compilation. Unfortunately, the swift build is a bit more convoluted, especially due to the unfortunate setup with the runtime/standard library.

Basically, I think that structurally and maintenance-wise, it makes sense to put this into the swift tree, but it would require a little more work upfront to structure the build properly.

DaveZ · November 24, 2019, 10:51am

Can you give some more background on your thinking? What additional metadata do you want to add, and why do you see the current macro-based approach as insufficient?

LucianoPAlmeida · November 24, 2019, 3:09pm

That seems interesting.
Can you elaborate on why this would make it easier to define diagnostics? Maybe some examples :))

owenv · November 24, 2019, 5:40pm

There are two things that are particularly hard to express in the macro-based system: lists and optional attributes.

Lists are difficult to make work because {"one", "two"} is multiple preprocessor tokens, ({"one", "two"}) isn't a valid array literal, etc. As a result, I had to create EducationalNotes.def and a separate set of macros to associate a list of filenames with a diagnostic id for the experimental educational notes feature.
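
A minimal sketch of the list problem and the separate-macro workaround, written in C++. All names here (NoteList, countArgs, EDUCATIONAL_NOTES, hypothetical_diag) are assumptions made up for illustration and are not the actual contents of EducationalNotes.def:

```cpp
#include <cstddef>

// Illustrative only: NoteList, countArgs, and EDUCATIONAL_NOTES are
// hypothetical names, not the real contents of EducationalNotes.def.
struct NoteList {
  const char *diagID;
  const char *files[4]; // fixed capacity keeps the table constexpr
  std::size_t count;
};

// Returns the number of arguments it was given.
template <typename... Ts>
constexpr std::size_t countArgs(Ts...) { return sizeof...(Ts); }

// A braced list cannot be passed as one macro argument, because the
// preprocessor splits arguments on every comma outside parentheses:
//   SOME_MACRO(some_diag, {"a.md", "b.md"})  // parsed as three arguments
// Taking the filenames as the trailing variadic parameter avoids that.
#define EDUCATIONAL_NOTES(ID, ...) \
  NoteList{#ID, {__VA_ARGS__}, countArgs(__VA_ARGS__)},

constexpr NoteList kEducationalNotes[] = {
  EDUCATIONAL_NOTES(non_nominal_extension, "nominal-types.md")
  EDUCATIONAL_NOTES(hypothetical_diag, "first.md", "second.md")
};

#undef EDUCATIONAL_NOTES
```

The cost of this shape is that the list lives in a second table keyed by diagnostic id, rather than next to the diagnostic's own definition.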

Optional attributes are also annoying to work with because they require either defining a whole bunch of macro variants or explicitly specifying their absence. This isn't a problem today because the only optional tags are PointsToFirstBadToken and Fatal, but it does add a useless none specifier to 95% of diagnostics, and a diagnostic can't be both PointsToFirstBadToken and Fatal.
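
To make the single-slot limitation concrete, here is a simplified C++ model of a macro-defined diagnostic table. DiagInfo, kDiagnostics, and both diagnostic ids are hypothetical, and the Signature argument is ignored to keep the sketch short:

```cpp
// Illustrative only: a simplified model of a macro-defined diagnostic table.
// DiagInfo and the two diagnostics below are hypothetical.
enum class DiagnosticOptions { none, PointsToFirstBadToken, Fatal };

struct DiagInfo {
  const char *id;
  DiagnosticOptions options; // a single slot, so only one tag fits
  const char *text;
};

// Signature is accepted but ignored in this sketch.
#define ERROR(ID, Options, Text, Signature) \
  DiagInfo{#ID, DiagnosticOptions::Options, Text},

constexpr DiagInfo kDiagnostics[] = {
  // The common case still has to spell out `none`.
  ERROR(hypothetical_plain, none, "plain message", ())
  ERROR(hypothetical_tagged, PointsToFirstBadToken, "tagged message", ())
  // There is no way to write "PointsToFirstBadToken and Fatal" in one slot.
};

#undef ERROR
```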

The TableGen format I have in mind looks roughly like this:

def non_nominal_extension:
  Error<"non-nominal type %0 cannot be extended", [Type]>,
  PointsToFirstBadToken,
  EducationalNotes<["nominal-types.md"]>;

Where only the Error supertype is required.

This can potentially extend in the future to support a lot of the improvements discussed here like:

user-facing public diagnostic names (PublicName<"...">)
an extended diagnostic message format to, for example, surface more detailed type inference info (ExtendedMessage<[Args...], [{ // C++ to generate an extended message }]>)

Moving to TableGen makes it much easier to explore these future directions, even if not all of them end up being implemented.

DaveZ · November 24, 2019, 6:47pm

Having hacked on various *.defs in Swift and being the guy to blame for the design of the inline bitfield macro hacks, I feel your pain. Yes, macros are crude and frustrating, but investing the time and energy to set up and maintain a TableGen-based approach is a serious endeavor that shouldn't be taken lightly.

I'm not saying you shouldn't create a table gen based approach, but have you exhausted your macro skills yet? For example, why do you need to add none to "95% of diagnostics"? Why does the following not work?

// Forward the old macros to a new "complex" macro
#define ERROR(ID, Options, Text, Signature) ERROR_COMPLEX(ID, Options, Text, Signature, none)

owenv · November 24, 2019, 7:20pm

This works well now, and we already do something like this with the ERROR/WARNING/NOTE macros that forward to DIAG. It becomes a problem as new, independently optional arguments are added, though, because the number of required forwarding macros grows exponentially. Similarly, __VA_ARGS__ can accommodate some of my use cases for lists, but the UX isn't always great when you're limited to a single list that must appear at the end.
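
A rough C++ sketch of that growth, assuming two independently optional attributes (a note file and a public name). Every name here is hypothetical. Two optional arguments already need four macro spellings, and k of them need 2^k:

```cpp
#include <cstddef>

// Illustrative only: every name in this sketch is hypothetical.
struct OptionalDiag {
  const char *id;
  const char *text;
  const char *note;       // optional: nullptr when absent
  const char *publicName; // optional: nullptr when absent
};

// One "full" macro, plus one forwarding macro for each combination of
// optional arguments a caller may want to leave out.
#define ERROR_FULL(ID, Text, Note, Name) OptionalDiag{#ID, Text, Note, Name},
#define ERROR(ID, Text)                  ERROR_FULL(ID, Text, nullptr, nullptr)
#define ERROR_NOTE(ID, Text, Note)       ERROR_FULL(ID, Text, Note, nullptr)
#define ERROR_NAME(ID, Text, Name)       ERROR_FULL(ID, Text, nullptr, Name)

constexpr OptionalDiag kOptionalDiags[] = {
  ERROR(hypothetical_a, "a")
  ERROR_NOTE(hypothetical_b, "b", "b.md")
  ERROR_NAME(hypothetical_c, "c", "PublicC")
  ERROR_FULL(hypothetical_d, "d", "d.md", "PublicD")
};

// Macro spellings needed for k independently optional arguments: 2^k.
constexpr std::size_t forwardingSpellings(unsigned k) {
  return std::size_t{1} << k;
}
```

Adding a third optional attribute to this sketch would mean changing ERROR_FULL, updating every existing forwarder, and adding four new ones.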

That's fair. TableGen is certainly the most complex solution to this problem, and it's debatable whether the benefits are worth it. I suppose it's also worth considering whether other parts of Swift might benefit from custom TableGen backends in the future, or if this would be creating a bunch of infrastructure that's only used in one part of the compiler.

DaveZ · November 26, 2019, 10:49am

Well, before you dive deep into the world of table gen, have you looked at some of the other defs files to see how they handle this problem? In particular:

Attr.def has to deal with lots of options
The current state of ReferenceStorage.def is my fault, but it shows how one can have different macro forwarding dimensions depending on what a client wants. See also DeclNodes.def for a similar but simpler example.

akyrtzi · November 27, 2019, 5:13pm

Have you considered using gyb? It should be much easier to set up.

owenv · November 27, 2019, 6:12pm

I never thought of that. It might actually work really well! I plan on experimenting more with some of the suggestions from this thread next week, and I’ll take a look at gyb too.