Extended Error Messages + Diagnostic Naming

Agreed. I think this is an important prerequisite. Whatever naming/error code scheme we decide on should be stable across Swift versions, and it would be useful on its own as an incremental step towards better diagnostics. Thanks for linking that bug!

@AlexanderM's suggestion:

would also be a great incremental improvement IMO. We're actually pretty close to being able to support a nicer combined presentation of errors and associated notes, like Rust's; there's just a little more work needed to prepare the diagnostic emission infrastructure.

One reason to use names is that if you ask on a forum "what is error unsupported_existential_type?", experienced users will reply immediately, but if you ask "what is error 102547?", everybody will first have to look up what the error is before being able to reply.

That’s what the terse error message is for, though, isn’t it?

Agreed. I think there are two use cases to consider here: discussion and search. When discussing an error online or in person, an error code doesn't provide any extra value, because it's easy to tell which error is being discussed even though the message has been customized with the user's own types and names. The real value of error codes is in searching Google/Stack Overflow/documentation, where you need a consistent, machine-readable way of identifying the error.

I'm beginning to lean towards assigning a simple ascending integer error code to each error or warning. My reasoning is:

  • When juxtaposed with a terse diagnostic, a name doesn't provide additional information
  • There are currently over 2,100 errors and warnings, so choosing appropriate names is nontrivial
  • Error codes have a (mostly) fixed length, making them easy to add to every existing diagnostic message without raising concerns around length
  • Only errors and warnings should be assigned a code. Notes should always be considered attached to their parent, even though the compiler implementation doesn't reflect that yet.
  • An error and a warning should never share an ID. The count should go E0001, W0002, E0003 rather than E0001, W0001, E0002 to avoid ambiguity.
  • I don't think we should attempt to categorize diagnostics (e.g. TypeChecking0001). I could see it being problematic in the future if we wanted to split a large category in two, which couldn't be done in a stable manner.

Existing diagnostic messages could be changed slightly to something like the following:

E0042: protocol 'P' can only be used as a generic constraint because it has Self or associated type requirements
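A minimal sketch of the numbering scheme above, in C++ (the language of the compiler's diagnostic machinery). It assumes codes come from one shared counter, so an error and a warning can never collide, and that notes never receive a code. `Severity`, `CodeAssigner` and `format` are hypothetical names, not compiler APIs; in practice each code would be recorded once in a definitions file rather than assigned at runtime.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical severities that receive a public code.
// Notes are deliberately absent: they stay attached to their parent.
enum class Severity { Error, Warning };

struct CodedDiagnostic {
  Severity severity;
  int code;  // unique across errors AND warnings
};

// Hands out codes from one shared ascending counter, so the sequence
// reads E0001, W0002, E0003, ... and no number is ever reused.
class CodeAssigner {
  int next_ = 1;

public:
  CodedDiagnostic assign(Severity s) {
    // Keep the "(mostly) fixed length" property: four digits.
    assert(next_ <= 9999 && "four-digit codes exhausted");
    return {s, next_++};
  }
};

// Formats a coded diagnostic as "E0042: message".
std::string format(const CodedDiagnostic &d, const std::string &message) {
  char prefix[16];
  std::snprintf(prefix, sizeof prefix, "%c%04d",
                d.severity == Severity::Error ? 'E' : 'W', d.code);
  return std::string(prefix) + ": " + message;
}
```

Because errors and warnings draw from the same counter, the letter prefix is purely informative; the number alone identifies the diagnostic.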

I'm personally against error codes, because we add and remove errors and warnings all the time, and we would now have to keep track of what the next available code is. It's bad enough remembering to bump the serialization version number.

We may not be able to remove errors and warnings after this (except for prerelease features), but we can at least not make it harder to add them.

[still planning to reply with thoughts about the general effort, but the basic feeling is "yes please!"]

We could just say that the compiler which generated the error is the only one that might be able to explain it. There’s no guarantee that Swift 7.0 can explain an error that occurred while compiling with Swift 4.2.

We could, but that makes it harder to look things up online.

That completely defeats the point of having error codes.

This is essentially about adding Identifiable conformance to Swift compiler messages. The only advantage over the existing, implicit equality method (copy-pasting the string into Stack Overflow) is a stable identifier that uniquely and persistently identifies the diagnostic, irrespective of how the human-readable message might change (over time, between languages, or otherwise).

Actually maintaining that over time isn’t necessarily trivial, though: diagnostics can evolve in subtle ways, so you have to remember to ask (and then answer) potentially tricky questions about whether a revised diagnostic should keep using the same code or adopt a new one. I don’t work on the Swift compiler, so I have no horse in this race. It’s entirely up to those who do whether they’re willing to accept this maintenance burden.

Aside from the issue of whether to use error codes or names, I think the question of how we maintain stable identifiers deserves some additional thought. There are a few complicating factors to consider:

  • Internally, there are a number of cases where what appears to the user to be a single diagnostic is actually one of a few different, but closely related variants. For the purposes of documentation and searchability, we should maintain that illusion by having them share an identifier.
  • Not every diagnostic defined internally should have a stable, public identifier. For example, there's no good reason to make SIL parsing diagnostics 'public API'.
  • It may be beneficial to incrementally assign stable identifiers to diagnostics so that it can be done deliberately and we're not pressured to audit 2,000+ all at once.

This suggests to me that we may want to define a 'public diagnostic' as an identifier paired with one or more compiler-internal diagnostics (and in the future, perhaps additional documentation). This would potentially make it easier to maintain 'diagnostic stability', in that new variants could be added easily, and new diagnostics would not need to immediately commit to stability. The downside is that adding a new diagnostic would then become a two-stage process.

Edit: Separating public and internal diagnostics would also mean that if a new version of the compiler no longer emitted a diagnostic, we could mark the public diagnostic as deprecated, but remove all of the associated internal diagnostics.
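One way to picture the 'public diagnostic' idea is a table entry that pairs a stable identifier with the internal variants behind it. This is a hypothetical sketch: `InternalDiagID`, `PublicDiagnostic` and `lookupPublic` are illustrative names, and the version fields assume the only metadata is the release that introduced or deprecated the identifier.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for internal diagnostic IDs.
enum class InternalDiagID {
  EnumElementEmptyArglist,
  EnumElementEmptyArglistSwift4,
  SILParserExpectedType,  // internal-only: never made public
};

// A stable, user-facing identifier backed by one or more internal variants.
struct PublicDiagnostic {
  std::string name;
  std::vector<InternalDiagID> variants;
  std::string introducedIn;
  std::optional<std::string> deprecatedIn;  // set once no longer emitted
};

// Finds the public diagnostic an internal ID belongs to, if any.
// Internal-only diagnostics (e.g. SIL parsing errors) have no entry.
const PublicDiagnostic *
lookupPublic(const std::vector<PublicDiagnostic> &table, InternalDiagID id) {
  for (const PublicDiagnostic &pub : table)
    for (InternalDiagID v : pub.variants)
      if (v == id)
        return &pub;
  return nullptr;
}
```

Deprecating a public diagnostic then amounts to setting `deprecatedIn` and emptying `variants`; the entry itself stays in the table so the identifier is never reused.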

Hi all,

Wanted to give a quick update on where this is at. I've opened a PR at https://github.com/apple/swift/pull/27299 which contains an initial implementation of diagnostic identifiers, along with a docs update drafting some guidelines for naming and maintaining stability. Feel free to let me know what you think; I'd like to pick the naming discussion back up and see if we can reach a consensus!

For now, I'm assuming this won't require an Evolution proposal; if I'm wrong, I can write one up once this is finalized. I've implemented this so that, once the naming situation is sorted out, it should be relatively straightforward to integrate with documentation/extended error messages in the future.

I thought about this too, but I think we could solve it by switching to camel case for the "public" part of the identifier and using an underscore for the "variant". For instance, where we currently have enum_element_empty_arglist and enum_element_empty_arglist_swift4, we would now internally have enumElementEmptyArglist and enumElementEmptyArglist_swift4, and would always omit the part after the underscore when printing the latter.
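The printing rule described above is small enough to sketch directly. This assumes public names are pure camelCase (so they never contain an underscore) and that everything after the first underscore is a variant suffix; `publicName` is a hypothetical helper, not an existing compiler function.

```cpp
#include <cassert>
#include <string>

// Maps an internal diagnostic name to the name shown to users:
//   enumElementEmptyArglist         -> enumElementEmptyArglist
//   enumElementEmptyArglist_swift4  -> enumElementEmptyArglist
std::string publicName(const std::string &internalName) {
  std::string::size_type underscore = internalName.find('_');
  return underscore == std::string::npos
             ? internalName
             : internalName.substr(0, underscore);
}
```

The rule only stays unambiguous as long as no public name contains an underscore, which is exactly what the camelCase convention guarantees.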

I like the simplicity of that approach, but I don't think it completely removes the need for a public diagnostic definition independent of the internal diags. There still needs to be some way to attach metadata to the public identifier. Right now this is only the compiler version it was introduced and removed in, but I'd like to leave open the possibility of extended messages/docs as well, which would clutter the existing definition files if included inline. It might also mean we'd never be able to remove an internal diagnostic once it was made public, because the two definitions would be one and the same.

I don't think these are necessarily insurmountable problems though, and the WIP approach I've taken has its own issues. Specifically, I'd like to reduce the effort required to "publicize" a diag if possible.

I do like the camel-cased naming convention quite a bit though. Subjectively I think UpperCamelCase looks a little nicer but I'd be fine adopting either.

The idea here is to help users understand diagnostics better, so I find it a bit counterproductive to require error codes or public names to look diagnostics up; that is still an extra step which tooling could eliminate. A better solution to this problem might be similar to what @brentdax mentioned in his previous message. We could maintain a "regular" and a "descriptive" style of each error/warning and control how much information we produce: in the REPL it might make sense to produce "descriptive" messages by default, while in an IDE we might produce a one-liner that can be expanded into the more descriptive version. I think if descriptiveness is handled by the diagnostic infrastructure/engine, it would be easier to incrementally build up a set of descriptive diagnostics, as well as a tool like swift explain, because such a tool could then operate on snippets of code instead of individual diagnostic names/errors.
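The per-context default described above could be expressed as a small policy function. This is a sketch under assumptions: the `Consumer` and `DiagnosticStyle` enums and `defaultStyle` are hypothetical, and the plain command-line default is a guess, since the post only covers the REPL and IDE cases.

```cpp
#include <cassert>

// "Regular" is today's one-line message; "Descriptive" is the longer form.
enum class DiagnosticStyle { Regular, Descriptive };

// Hypothetical consumers of diagnostics.
enum class Consumer { REPL, IDE, CommandLine };

// Chooses how much detail to produce for a given consumer.
DiagnosticStyle defaultStyle(Consumer consumer, bool userExpanded = false) {
  switch (consumer) {
  case Consumer::REPL:
    // Interactive learning context: show the full explanation by default.
    return DiagnosticStyle::Descriptive;
  case Consumer::IDE:
    // Start with the one-liner; expand only when the user asks.
    return userExpanded ? DiagnosticStyle::Descriptive
                        : DiagnosticStyle::Regular;
  case Consumer::CommandLine:
    // Assumption: keep build logs terse unless configured otherwise.
    return DiagnosticStyle::Regular;
  }
  return DiagnosticStyle::Regular;
}
```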

Forgot to mention: in conjunction with the new diagnostic infrastructure this could yield very nice results. Currently we only have diagnoseAsError or diagnoseAsNote (for ambiguity cases), but we could add a third option to diagnose "with maximum detail", which means we could have custom diagnostic logic and not just textual messages.

I agree with this in principle, but I'm concerned the scope (in terms of LOC added to the compiler and time required) would get out of hand quickly if we tried to implement a tailored message for all nontrivial, user facing errors/warnings (~1100 by my count). Elm is the only language I know of that has attempted this successfully, and it has far fewer diagnostics.

That said, if we went in this direction maybe we could start with a goal of adding extended versions for just the top 50 or 100 diagnostics. I just think it's important to acknowledge we need to make some kind of tradeoff between the detail provided and the % of diagnostics that can be covered in a reasonable timeframe.

I agree! And I think if we go the route I described, it should be easy to add descriptive diagnostics incrementally: the "descriptive" version of a not-yet-implemented diagnostic could just return the regular diagnostic plus a warning, or something similar.
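Combining the third "maximum detail" option with this fallback might look like the following sketch. `Failure`, `diagnoseWithMaximumDetail` and the two example failures are hypothetical; only the diagnoseAsError naming comes from the thread, and here the methods simply return message text for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class for a type-checking failure.
// (diagnoseAsNote, used for ambiguities, is omitted for brevity.)
class Failure {
public:
  virtual ~Failure() = default;

  // The regular, one-line message every failure must provide.
  virtual std::string diagnoseAsError() const = 0;

  // Third option: maximum detail. Failures that have not been migrated
  // yet fall back to the regular message, so coverage can grow one
  // diagnostic at a time.
  virtual std::string diagnoseWithMaximumDetail() const {
    return diagnoseAsError();
  }
};

// Not migrated yet: only the regular message exists.
class MissingArgument : public Failure {
public:
  std::string diagnoseAsError() const override {
    return "missing argument for parameter 'x' in call";
  }
};

// Migrated: appends an explanation to the regular message.
class ProtocolAsType : public Failure {
public:
  std::string diagnoseAsError() const override {
    return "protocol 'P' can only be used as a generic constraint";
  }
  std::string diagnoseWithMaximumDetail() const override {
    return diagnoseAsError() +
           "\n'P' has Self or associated type requirements; consider a "
           "generic parameter constrained to 'P' instead.";
  }
};
```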

I've been thinking about this a bit more over the past couple days and I think we can break most of the improvements proposed in this thread into one of the following categories:

  1. More detailed educational content in errors. For example, an extra paragraph explaining why a protocol with Self or associated type requirements can only be used as a generic constraint.

  2. More detailed annotations of a user's code when they encounter an error. Instead of producing an error and collection of notes, produce a unified code snippet with more detailed annotations pointing out the problem.

  3. A better way of discussing/documenting diagnostics on external platforms. This is the issue naming diagnostics tries to solve.

What I originally pitched was a combination of 1 and 3, which would make it harder to achieve 2 in the long run. While I think 2 is going to be the hardest item to accomplish, it's pretty clear at this point that we shouldn't just give up on making improvements in this area. If we build 1 and 2, 3 becomes less important because it's no longer a necessary part of the compiler tooling. However, it's still potentially useful for indexing educational content if we host it on the web, and for use by external tooling and Q&A platforms.
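Item 2 is the closest to the Rust-style presentation mentioned earlier in the thread. A minimal sketch of rendering one error together with its attached annotations as a single snippet follows; `Annotation`, `AnnotatedError` and `render` are hypothetical, and a real renderer would also need to handle multi-line spans, tabs and Unicode column widths.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// A note-like annotation stored inside its parent error.
struct Annotation {
  unsigned column;  // 1-based column where the underline starts
  unsigned length;  // number of characters to underline
  std::string label;
};

struct AnnotatedError {
  std::string message;
  unsigned line;
  std::string sourceLine;
  std::vector<Annotation> annotations;
};

// Prints the message, the offending source line, and one underline row
// per annotation, with the gutter aligned under the line number.
std::string render(const AnnotatedError &e) {
  assert(!e.annotations.empty());
  std::ostringstream out;
  std::string gutter = std::to_string(e.line) + " | ";
  std::string blank(gutter.size() - 2, ' ');
  out << "error: " << e.message << "\n";
  out << gutter << e.sourceLine << "\n";
  for (const Annotation &a : e.annotations) {
    assert(a.column >= 1);
    out << blank << "| " << std::string(a.column - 1, ' ')
        << std::string(a.length, '^') << " " << a.label << "\n";
  }
  return out.str();
}
```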


I'd like to propose a new "roadmap" which should hopefully allow us to tackle all of these issues incrementally:

  1. Introduce a new compiler flag, -enable-descriptive-diagnostics. This will gate all of the features described below, and can remain a private frontend option until enough has been accomplished to officially ship the improvements.

  2. If the flag is passed, the compiler will produce a "descriptive" annotated code snippet rather than an error and a list of notes. This will follow the approach @xedin describes above. If a particular diagnostic hasn't been migrated yet, fall back to the current system.

  3. If the flag is passed, also check to see whether the diagnostic has educational documentation associated with it. If so, print this before/after the message. By separating educational content which isn't specific to the context from annotations, we can make it easier to make incremental improvements to a diagnostic because this content could be worked on by contributors who are less comfortable writing C++. By putting this behind a flag, we can also expose it in the terminal, through SourceKit, etc. without requiring a swift-explain lookup step.

  4. (Maybe) Introduce public diagnostic names. Even if they're not necessary to interact with the compiler, they might still be useful for looking things up on Google/StackOverflow. They could also be used to index the educational content if we wanted to host it on Swift.org. The question here is, would they be worth the additional maintenance burden they create?
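Steps 1 and 3 of the roadmap amount to a gate plus a lookup. Here is a hedged sketch: only the flag name comes from the roadmap, while `FrontendOptions`, `EducationalNotes`, `emit` and keying the notes by diagnostic name are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical frontend options; only the flag name is from the roadmap.
struct FrontendOptions {
  bool enableDescriptiveDiagnostics = false;  // -enable-descriptive-diagnostics
};

// Hypothetical store of context-independent educational content, keyed by
// diagnostic name. It could live in plain text files maintained by
// contributors who don't write C++.
using EducationalNotes = std::map<std::string, std::string>;

std::string emit(const FrontendOptions &opts, const EducationalNotes &notes,
                 const std::string &diagName, const std::string &message) {
  std::string out = "error: " + message;
  if (!opts.enableDescriptiveDiagnostics)
    return out;  // flag off: identical to today's output
  auto it = notes.find(diagName);
  if (it != notes.end())
    out += "\n\n" + it->second;  // educational content after the message
  return out;  // no content yet: fall back to the current output
}
```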


Apologies for the long post; I'd be interested in hearing everyone's thoughts on this! Once some of the details are sorted out, I plan on submitting one PR to add the -enable-descriptive-diagnostics flag and another to add a formalized roadmap document. I expect that actually accomplishing everything listed above will be a very long-term effort, but a lot of these improvements are interconnected and I think it makes sense to properly establish the long-term vision.

The plan looks great to me! I'd be more than happy to help with an implementation.

I like this plan a lot. It lets us develop the content and use it to improve the user experience. #2 is of particular interest to me, because especially in the expression type checker, there's a lot more "annotation" information that I would love for us to gather and present to the user (such as why we inferred a particular type that shows up in a diagnostic), but for which we have no presentation mechanism. That said, #1 is the easiest to get started on.

I suspect we'll end up with stable identifiers at some point, but I'm fine with not introducing them immediately if it helps make progress.

Thank you for working on this!

Doug

Related: the latest release of the Elm language is focused on improved error diagnostics. The author writes at length about the feature: https://elm-lang.org/news/the-syntax-cliff

I find this post interesting because of the work on the goal, tone, and contents of the error messages. It's clearly not only a matter of technical challenges, but also communication.
