On the second point, I think the idea is that generating diagnostics is something that any macro should be able to do, so there isn’t any reason to call it out.
The purpose of tracking other side-effects is to know exactly when the implementation needs to evaluate a macro, so that we aren’t forced to eagerly evaluate every macro just to do incremental type-checking (e.g. when resolving things in an IDE). But if you want all of the diagnostics for a module, you have to expand all of the macros anyway, because the code a macro emits can produce diagnostics even if the macro doesn’t emit one directly.
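
To make the distinction concrete, here is a rough toy model in Python. It is only a sketch of the reasoning above, not any real compiler's API: `Macro`, `declared_names`, `resolve`, and `all_diagnostics` are all names made up for illustration, and I'm assuming the tracked side-effect is "which names the expansion introduces".

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass
class Macro:
    """Hypothetical stand-in for a macro with declared side-effects."""
    name: str
    declared_names: FrozenSet[str]     # assumed: names the expansion introduces
    expand: Callable[[], List[str]]    # returns diagnostics found in the expanded code
    expanded: bool = False

    def run(self) -> List[str]:
        self.expanded = True
        return self.expand()


def resolve(name: str, macros: List[Macro]) -> None:
    """Lazy path: expand only the macros that declare the name being looked up."""
    for macro in macros:
        if name in macro.declared_names and not macro.expanded:
            macro.run()


def all_diagnostics(macros: List[Macro]) -> List[str]:
    """Whole-module path: every macro is expanded, whatever it declares."""
    diagnostics: List[str] = []
    for macro in macros:
        diagnostics.extend(macro.run())
    return diagnostics


macros = [
    Macro("addAccessors", frozenset({"value"}), lambda: []),
    # This macro reports nothing itself; the diagnostic comes from checking its output.
    Macro("addLogger", frozenset({"log"}), lambda: ["type mismatch in generated code"]),
]

resolve("value", macros)
print([m.expanded for m in macros])  # [True, False]
print(all_diagnostics(macros))       # ['type mismatch in generated code']
```

The declared names let `resolve` skip `addLogger` entirely, but nothing comparable would let `all_diagnostics` skip anything, which is why declaring "may emit diagnostics" wouldn't buy the implementation any laziness.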