RFC: Better diagnostics through simplification

Hello,

I’d like to propose improving the quality, consistency, and future preparedness of diagnostics through simplification. Right now, a lot of diagnostics output tends to look like this (for example):

  type ‘Foo’ cannot subclass type ‘Bar’

Instead of hard coding the word “type” into the message, I’d like to change the diagnostics engine to dynamically introspect the types and print out the type kind instead. Therefore, after this change, the message output would look like this:

  class ‘Foo’ cannot subclass struct ‘Bar’

Which is clearly more better. And it requires less work and specialization on behalf of diagnostics clients because the messages within the compiler “defs” files would change from:

  “type %0 cannot subclass type %1”

To:

  “%0 cannot subclass %1”

Which is obviously simpler, better, and more future proof. In particular, clients of the diagnostics engine would no longer need to haphazardly customize messages to acknowledge optionals, metatypes, etc as they do today. The engine itself would deal with that.

What do people think? The meat of the change is tiny and below. The vast majority of the work is the long and painful fallout in the test suite, which I’m cautiously (and perhaps foolishly) stepping up to doing.

Cheers,
Dave

diff --git c/lib/AST/DiagnosticEngine.cpp i/lib/AST/DiagnosticEngine.cpp
index c99833d373..32ba196f04 100644
--- c/lib/AST/DiagnosticEngine.cpp
+++ i/lib/AST/DiagnosticEngine.cpp
@@ -373,6 +373,121 @@ static bool shouldShowAKA(Type type, StringRef typeName) {
   return true;
}

+static std::string computeDiagnosticTypeKindString(Type &type) {
+ switch (type->getKind()) {
+#define TYPE(id, parent)
+#define BUILTIN_TYPE(id, parent) \
+ case TypeKind::id: return "builtin ";
+#include <swift/AST/TypeNodes.def>
+ case TypeKind::InOut:
+ type = type->getInOutObjectType();
+ return "inout " + computeDiagnosticTypeKindString(type);
+ case TypeKind::Function: return "function ";
+ case TypeKind::GenericFunction: return "function ";
+ case TypeKind::Struct: return "struct ";
+ case TypeKind::BoundGenericStruct: return "struct ";
+ case TypeKind::Class: return "class ";
+ case TypeKind::BoundGenericClass: return "class ";
+ case TypeKind::Enum: return "enum ";
+ case TypeKind::BoundGenericEnum: return "enum ";
+ case TypeKind::ProtocolComposition: return "protocol ";
+ case TypeKind::Protocol: return "protocol ";
+ case TypeKind::Module: return "module ";
+ case TypeKind::Unresolved: return "unresolved type ";
+ case TypeKind::DynamicSelf: return "dynamic type ";
+ case TypeKind::TypeVariable: return "type-variable ";
+ case TypeKind::SILFunction: return "SIL function ";
+ case TypeKind::SILBlockStorage: return "SIL block storage ";
+ case TypeKind::SILBox: return "SIL box ";
+ case TypeKind::GenericTypeParam: return "generic parameter ";
+ case TypeKind::Archetype: {
+ auto archetypeTy = dyn_cast<ArchetypeType>(type.getPointer());
+ if (auto openedTy = archetypeTy->getOpenedExistentialType()) {
+ type = openedTy;
+ return computeDiagnosticTypeKindString(type);
+ }
+ if (archetypeTy->getAssocType())
+ return "associated type ";
+ return "generic parameter ";
+ }
+ case TypeKind::DependentMember: {
+ while (auto depTy = dyn_cast<DependentMemberType>(type.getPointer()))
+ type = depTy->getBase();
+ return "dependent " + computeDiagnosticTypeKindString(type);
+ }
+ case TypeKind::Metatype: {
+ auto metaTy = cast<MetatypeType>(type.getPointer());
+ type = metaTy->getInstanceType();
+ if (!type->isExistentialType())
+ return "metatype of " + computeDiagnosticTypeKindString(type);
+ return "concrete metatype of " + computeDiagnosticTypeKindString(type);
+ }
+ case TypeKind::ExistentialMetatype: {
+ auto metaTy = cast<ExistentialMetatypeType>(type.getPointer());
+ type = metaTy->getInstanceType();
+ return "metatype of " + computeDiagnosticTypeKindString(type);
+ }
+ case TypeKind::ArraySlice:
+ // we could in dig further, but then diag messages might become unwieldy
+ return "array slice ";
+ case TypeKind::Dictionary:
+ // we could in dig further, but then diag messages would become unwieldy
+ return "dictionary ";
+ case TypeKind::Optional:
+ type = type->getAnyOptionalObjectType();
+ return "optional " + computeDiagnosticTypeKindString(type);
+ case TypeKind::ImplicitlyUnwrappedOptional:
+ type = type->getAnyOptionalObjectType();
+ return "implicitly unwrapped optional "
+ + computeDiagnosticTypeKindString(type);
+ case TypeKind::Paren:
+ type = type->getWithoutParens();
+ return computeDiagnosticTypeKindString(type);
+ case TypeKind::Tuple: {
+ auto tempTy = type->getWithoutImmediateLabel();
+ if (tempTy->isEqual(type))
+ return "tuple ";
+ type = tempTy;
+ return computeDiagnosticTypeKindString(type);
+ }
+ case TypeKind::NameAlias: {
+ // We need to stop updating our caller's 'type' at this point.
+ auto aliasTy = cast<NameAliasType>(type.getPointer());
+ Type discardTy = aliasTy->getSinglyDesugaredType();
+ return computeDiagnosticTypeKindString(discardTy);
+ }
+ case TypeKind::Error: {
+ auto errorTy = cast<ErrorType>(type.getPointer());
+ if (auto origTy = errorTy->getOriginalType()) {
+ type = origTy;
+ return "invalid " + computeDiagnosticTypeKindString(type);
+ }
+ return "invalid type ";
+ }
+ case TypeKind::UnboundGeneric:
+ switch (type->castTo<UnboundGenericType>()->getDecl()->getKind()) {
+ case DeclKind::TypeAlias: return "typealias ";
+ case DeclKind::Protocol: return "protocol ";
+ case DeclKind::Struct: return "struct ";
+ case DeclKind::Class: return "class ";
+ case DeclKind::Enum: return "enum ";
+ default: break;
+ }
+ llvm_unreachable("Unknown unbound generic");
+ }
+ case TypeKind::LValue: {
+ // While LValues are not a part of the user visible type system, they do
+ // sometimes appear inside of tuple expressions. Just ignore them.
+ auto lvTy = cast<LValueType>(type.getPointer());
+ type = lvTy->getObjectType();
+ return computeDiagnosticTypeKindString(type);
+ }
+ case TypeKind::WeakStorage:
+ case TypeKind::UnownedStorage:
+ case TypeKind::UnmanagedStorage:
+ llvm_unreachable("Not a part of the user facing type system");
+}

···

+
/// \brief Format a single diagnostic argument and write it to the given
/// stream.
static void formatDiagnosticArgument(StringRef Modifier,
@@ -439,16 +554,18 @@ static void formatDiagnosticArgument(StringRef Modifier,
     
     // Strip extraneous parentheses; they add no value.
     auto type = Arg.getAsType()->getWithoutParens();
+ auto origType = type->getInOutObjectType();
+ std::string declKind = computeDiagnosticTypeKindString(type);
     std::string typeName = type->getString();

     if (shouldShowAKA(type, typeName)) {
       llvm::SmallString<256> AkaText;
       llvm::raw_svector_ostream OutAka(AkaText);
- OutAka << type->getCanonicalType();
- Out << llvm::format(FormatOpts.AKAFormatString.c_str(), typeName.c_str(),
- AkaText.c_str());
+ OutAka << origType->getCanonicalType();
+ Out << declKind << llvm::format(FormatOpts.AKAFormatString.c_str(),
+ typeName.c_str(), AkaText.c_str());
     } else {
- Out << FormatOpts.OpeningQuotationMark << typeName
+ Out << declKind << FormatOpts.OpeningQuotationMark << typeName
           << FormatOpts.ClosingQuotationMark;
     }
     break;

Hello,

I’d like to propose improving the quality, consistency, and future preparedness of diagnostics through simplification. Right now, a lot of diagnostics output tends to look like this (for example):

  type ‘Foo’ cannot subclass type ‘Bar’

Instead of hard coding the word “type” into the message, I’d like to change the diagnostics engine to dynamically introspect the types and print out the type kind instead. Therefore, after this change, the message output would look like this:

  class ‘Foo’ cannot subclass struct ‘Bar’

Which is clearly more better. And it requires less work and specialization on behalf of diagnostics clients because the messages within the compiler “defs” files would change from:

  “type %0 cannot subclass type %1”

To:

  “%0 cannot subclass %1”

Which is obviously simpler, better, and more future proof. In particular, clients of the diagnostics engine would no longer need to haphazardly customize messages to acknowledge optionals, metatypes, etc as they do today. The engine itself would deal with that.

What do people think? The meat of the change is tiny and below. The vast majority of the work is the long and painful fallout in the test suite, which I’m cautiously (and perhaps foolishly) stepping up to doing.

I think that's an interesting improvement in this specific message, but it's not obvious that it would be an improvement to all messages. Generally the way we did things like this in Clang was to add a modifier to the diagnostic string, so that it would look something like "%type1", which would be understood to expand to a noun appropriate for the type. That way, it was easy to take advantage of it for a specific diagnostic, but you weren't forced into it by default. I think that's still the right approach here.

A few comments on the nouns you've picked:

+ case TypeKind::Function: return "function ";
+ case TypeKind::GenericFunction: return "function ";

'function' would be misleading here, since it's actually referring to a function *type*. Consider your example above: "class 'A' cannot subclass function '(Int) -> ()'".

+ case TypeKind::ProtocolComposition: return "protocol ";

"cannot convert from protocol 'A & B' to ..."? Hmm. Not sure about this one.

+ case TypeKind::Module: return "module ";

This has a lot of the same problems as "function", I think.

+ case TypeKind::DynamicSelf: return "dynamic type ";

This is almost certainly a situation where you're not going to do better than "type".

+ case TypeKind::Unresolved: return "unresolved type ";
+ case TypeKind::TypeVariable: return "type-variable ";
+ case TypeKind::SILFunction: return "SIL function ";
+ case TypeKind::SILBlockStorage: return "SIL block storage ";
+ case TypeKind::SILBox: return "SIL box ";

If these ever do show up in diagnostics, I'm not sure a good noun will help.

+ case TypeKind::WeakStorage:
+ case TypeKind::UnownedStorage:
+ case TypeKind::UnmanagedStorage:
+ llvm_unreachable("Not a part of the user facing type system");

This seems inconsistent given the treatment for SILFunctionType.

+ case TypeKind::GenericTypeParam: return "generic parameter ";
+ case TypeKind::Archetype: {
+ auto archetypeTy = dyn_cast<ArchetypeType>(type.getPointer());
+ if (auto openedTy = archetypeTy->getOpenedExistentialType()) {
+ type = openedTy;
+ return computeDiagnosticTypeKindString(type);
+ }
+ if (archetypeTy->getAssocType())
+ return "associated type ";
+ return "generic parameter ";
+ }

I'm not sure that being more specific about archetypes is actually helpful to users.

+ case TypeKind::DependentMember: {
+ while (auto depTy = dyn_cast<DependentMemberType>(type.getPointer()))
+ type = depTy->getBase();
+ return "dependent " + computeDiagnosticTypeKindString(type);
+ }

This is definitely going to be more confusing than not.

+ case TypeKind::Metatype: {
+ auto metaTy = cast<MetatypeType>(type.getPointer());
+ type = metaTy->getInstanceType();
+ if (!type->isExistentialType())
+ return "metatype of " + computeDiagnosticTypeKindString(type);
+ return "concrete metatype of " + computeDiagnosticTypeKindString(type);
+ }
+ case TypeKind::ExistentialMetatype: {
+ auto metaTy = cast<ExistentialMetatypeType>(type.getPointer());
+ type = metaTy->getInstanceType();
+ return "metatype of " + computeDiagnosticTypeKindString(type);
+ }

Interesting. "metatype of optional struct" for Array?.Type.

+ case TypeKind::ArraySlice:
+ // we could in dig further, but then diag messages might become unwieldy
+ return "array slice ";
+ case TypeKind::Dictionary:
+ // we could in dig further, but then diag messages would become unwieldy
+ return "dictionary ";

Same problem as "function".

+ case TypeKind::Optional:
+ type = type->getAnyOptionalObjectType();
+ return "optional " + computeDiagnosticTypeKindString(type);

This is interesting but could get pretty verbose.

+ case TypeKind::ImplicitlyUnwrappedOptional:
+ type = type->getAnyOptionalObjectType();
+ return "implicitly unwrapped optional "
+ + computeDiagnosticTypeKindString(type);

This is *already* pretty verbose to throw into random diagnostics.

John.

···

On Aug 23, 2017, at 11:00 PM, David Zarzycki via swift-dev <swift-dev@swift.org> wrote:

+ case TypeKind::Paren:
+ type = type->getWithoutParens();
+ return computeDiagnosticTypeKindString(type);
+ case TypeKind::Tuple: {
+ auto tempTy = type->getWithoutImmediateLabel();
+ if (tempTy->isEqual(type))
+ return "tuple ";
+ type = tempTy;
+ return computeDiagnosticTypeKindString(type);
+ }
+ case TypeKind::NameAlias: {
+ // We need to stop updating our caller's 'type' at this point.
+ auto aliasTy = cast<NameAliasType>(type.getPointer());
+ Type discardTy = aliasTy->getSinglyDesugaredType();
+ return computeDiagnosticTypeKindString(discardTy);
+ }
+ case TypeKind::Error: {
+ auto errorTy = cast<ErrorType>(type.getPointer());
+ if (auto origTy = errorTy->getOriginalType()) {
+ type = origTy;
+ return "invalid " + computeDiagnosticTypeKindString(type);
+ }
+ return "invalid type ";
+ }
+ case TypeKind::UnboundGeneric:
+ switch (type->castTo<UnboundGenericType>()->getDecl()->getKind()) {
+ case DeclKind::TypeAlias: return "typealias ";
+ case DeclKind::Protocol: return "protocol ";
+ case DeclKind::Struct: return "struct ";
+ case DeclKind::Class: return "class ";
+ case DeclKind::Enum: return "enum ";
+ default: break;
+ }
+ llvm_unreachable("Unknown unbound generic");
+ }
+ case TypeKind::LValue: {
+ // While LValues are not a part of the user visible type system, they do
+ // sometimes appear inside of tuple expressions. Just ignore them.
+ auto lvTy = cast<LValueType>(type.getPointer());
+ type = lvTy->getObjectType();
+ return computeDiagnosticTypeKindString(type);
+ }
+}
+
/// \brief Format a single diagnostic argument and write it to the given
/// stream.
static void formatDiagnosticArgument(StringRef Modifier,
@@ -439,16 +554,18 @@ static void formatDiagnosticArgument(StringRef Modifier,
     
     // Strip extraneous parentheses; they add no value.
     auto type = Arg.getAsType()->getWithoutParens();
+ auto origType = type->getInOutObjectType();
+ std::string declKind = computeDiagnosticTypeKindString(type);
     std::string typeName = type->getString();

     if (shouldShowAKA(type, typeName)) {
       llvm::SmallString<256> AkaText;
       llvm::raw_svector_ostream OutAka(AkaText);
- OutAka << type->getCanonicalType();
- Out << llvm::format(FormatOpts.AKAFormatString.c_str(), typeName.c_str(),
- AkaText.c_str());
+ OutAka << origType->getCanonicalType();
+ Out << declKind << llvm::format(FormatOpts.AKAFormatString.c_str(),
+ typeName.c_str(), AkaText.c_str());
     } else {
- Out << FormatOpts.OpeningQuotationMark << typeName
+ Out << declKind << FormatOpts.OpeningQuotationMark << typeName
           << FormatOpts.ClosingQuotationMark;
     }
     break;
_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

What do people think? The meat of the change is tiny and below. The vast majority of the work is the long and painful fallout in the test suite, which I’m cautiously (and perhaps foolishly) stepping up to doing.

I think that’s an interesting improvement in this specific message, but it's not obvious that it would be an improvement to all messages.

Hi John,

That is certainly true. I’ve already seen that after a few rounds of testing this and watching the changing output from the test suite. That’s why I emailed this list before getting too deep into this change. In general, I thought the output was better than before, but reasonable people might disagree. The only place where I felt the extra verbosity was clearly distracting was/is the ‘where’ clause diagnostics.

Generally the way we did things like this in Clang was to add a modifier to the diagnostic string, so that it would look something like "%type1", which would be understood to expand to a noun appropriate for the type. That way, it was easy to take advantage of it for a specific diagnostic, but you weren't forced into it by default. I think that’s still the right approach here.

Interesting. Good to know. Setting aside what the default should be, and unless I’m missing something, the clang style “%type1” approach has the same underlying challenges as this patch:
The verbosity snowballs because the goal of printing the inner decl kind is complicated by having layered type qualifiers (optional, metatype, etc).
Having two very different approaches to printing a type during diagnostics is lame (i.e. dense native syntax versus verbose English descriptions).
Maybe the “right” approach – and I’m only half serious about this – is to have a way for the compiler to print ‘(/*struct*/Foo).Type?’. Is it ugly? Arguably. Is it dense? Yes. Does it snowball? No. Does it scale? Absolutely! For example, consider a nontrivial “optional dictionary of tuple” type. It might look like this: ‘[/*struct*/String : (/*class*/Foo, /*enum*/Bar)]?’. If knowing the decl kind is helpful, then the latter output is a huge improvement.

For whatever it may be worth, I’m not wed to this proposal. The motivation behind this patch was preparation for the eventual day when a new nominal type is created (like Chris’s “actor” proposal or something else). I could certainly narrow the scope of this work to rewording some of the diagnostics to be more decl kind agnostic. For example: “subclass” becomes “subtype”, “superclass type” becomes “inherited type”, etc. If people are open to that, I can narrow the focus of this work.

Cheers,
Dave

···

On Aug 24, 2017, at 00:08, John McCall <rjmccall@apple.com> wrote:

I think making it easy to have a better noun than "type" before a type in a diagnostic is valuable, but yeah, I think you need to:
  - take it as a mission to not use more than two words as an introducer
  - be cool with falling back to "type" if you're not sure what else to say
  - think carefully about whether using a more specific noun than "type" is really providing any non-obvious information

For example, "type" is probably fine to introduce function types, because it's usually visually obvious that a type is a function type. On the other hand, you could consider using a more precise noun in the same sort of situations where you might print an aka clause, e.g.:
  can't frobonicate with function type 'HappyCallback' (aka '(HappyContext) -> ()')

Finally, of course, it's okay just to make incremental improvements instead of trying to make everything perfect all at once. :)

John.

···

On Aug 24, 2017, at 10:41 AM, David Zarzycki <zarzycki@icloud.com> wrote:

On Aug 24, 2017, at 00:08, John McCall <rjmccall@apple.com <mailto:rjmccall@apple.com>> wrote:

What do people think? The meat of the change is tiny and below. The vast majority of the work is the long and painful fallout in the test suite, which I’m cautiously (and perhaps foolishly) stepping up to doing.

I think that’s an interesting improvement in this specific message, but it's not obvious that it would be an improvement to all messages.

Hi John,

That is certainly true. I’ve already seen that after a few rounds of testing this and watching the changing output from the test suite. That’s why I emailed this list before getting too deep into this change. In general, I thought the output was better than before, but reasonable people might disagree. The only place where I felt the extra verbosity was clearly distracting was/is the ‘where’ clause diagnostics.

Generally the way we did things like this in Clang was to add a modifier to the diagnostic string, so that it would look something like "%type1", which would be understood to expand to a noun appropriate for the type. That way, it was easy to take advantage of it for a specific diagnostic, but you weren't forced into it by default. I think that’s still the right approach here.

Interesting. Good to know. Setting aside what the default should be, and unless I’m missing something, the clang style “%type1” approach has the same underlying challenges as this patch:
The verbosity snowballs because the goal of printing the inner decl kind is complicated by having layered type qualifiers (optional, metatype, etc).
Having two very different approaches to printing a type during diagnostics is lame (i.e. dense native syntax versus verbose English descriptions).
Maybe the “right” approach – and I’m only half serious about this – is to have a way for the compiler to print ‘(/*struct*/Foo).Type?’. Is it ugly? Arguably. Is it dense? Yes. Does it snowball? No. Does it scale? Absolutely! For example, consider a nontrivial “optional dictionary of tuple” type. It might look like this: ‘[/*struct*/String : (/*class*/Foo, /*enum*/Bar)]?’. If knowing the decl kind is helpful, then the latter output is a huge improvement.

For whatever it may be worth, I’m not wed to this proposal. The motivation behind this patch was preparation for the eventual day when a new nominal type is created (like Chris’s “actor” proposal or something else). I could certainly narrow the scope of this work to rewording some of the diagnostics to be more decl kind agnostic. For example: “subclass” becomes “subtype”, “superclass type” becomes “inherited type”, etc. If people are open to that, I can narrow the focus of this work.