Is Decl semantic checking two or three pass?

In lib/Sema/TypeCheckDecl.cpp, the file refers to declaration checking as having "two passes", but in practice the state engine has three (confusing) states: "first", "second", and "neither".

What is the long-term direction for the declaration checking state engine? Two states? Or formalization of the existing behavior into an enum with three cases?

Hoo boy.

Decl checking is already broken into two pieces: "validation", which means "give me enough information to use this decl, possibly from another file", and "full checking", which means "I plan to emit this". What you're referring to is "full checking".

I don't think we actually need two passes any more—there should be no information that pass #2 needs that can't be put into the "validation" part—but I've never tried to verify that, and no one has looked at it recently.

Furthermore, because "validation" came after "full checking", there are (were?) some parts of validation that are implemented by calling the full-checking logic, instead of the other way around. This might be one of the places where the ridiculous "neither" state shows up, though I seem to remember it wasn't the only one.

Cleaning this up would be great, which in my mind means eliminating the "pass" enum altogether. At the same time, there's the notion of the "iterative decl checker", which knows how to do much more fine-grained things than "validate" and "full-check". This is all the files that start with "ITC" in the Sema folder, and unfortunately it's still in the "experiment" phase of implementation. The tantalizing redesign of the ITC (IDC?) probably contributed to us not improving the existing DeclChecker—well, that and "if it ain't broke, don't touch it cause it might fall over". :-/

cc @Douglas_Gregor if he has anything to add

Wow, thanks for the quick reply! That was way more helpful than I imagined. Thanks!

So how should a programmer decide where to place new logic among the three states?

More concretely, how should one think about the difference between "validation" and "ready for emission"?

Most of the time I follow what an existing similar feature for the same declaration kind does. If I were making a completely new declaration, I'd have it use the second pass only and see if that works.

The declaration checker basically works as follows:

  • You have a set of source files in a module. One or more of those files are "primary files". In non-WMO mode, there is one primary file. In WMO mode, all files are primary files. Batch mode is in between.

  • All source files are parsed. Function bodies are parsed for the primary file; in the rest they're skipped.

  • We bind extensions, which means adding each extension's members to its nominal type.

  • The declarations in the primary files are walked and we do a "first pass" type check.

  • The declarations in the primary files are walked and we do a "second pass" type check. This is where function bodies are added to a list to be type checked later.

  • We type check all function bodies.

During the course of the steps above, we perform name lookup to find other declarations referenced from the ones being type checked. This causes them to be validated.
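As a rough illustration of the ordering in the list above, here is a minimal sketch. `SourceFile` and `describeChecking` are hypothetical names made up for this example, not compiler types or functions; the sketch only records the order of work and does not model name lookup or validation.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a source file; not a compiler type.
struct SourceFile {
  std::string name;
  bool isPrimary;
};

// Returns the order of work as readable step names.
std::vector<std::string> describeChecking(const std::vector<SourceFile> &files) {
  std::vector<std::string> steps;

  // All files are parsed; bodies are only parsed for primary files.
  for (const SourceFile &f : files)
    steps.push_back("parse " + f.name +
                    (f.isPrimary ? " (with bodies)" : " (bodies skipped)"));

  // Extension members are added to their nominal types.
  steps.push_back("bind extensions");

  // Two walks over the declarations of the primary files.
  for (const SourceFile &f : files)
    if (f.isPrimary)
      steps.push_back("first pass " + f.name);
  for (const SourceFile &f : files)
    if (f.isPrimary)
      steps.push_back("second pass " + f.name);

  // Bodies queued during the second walk are checked last.
  steps.push_back("type check function bodies");
  return steps;
}
```

With one primary and one non-primary file, only the primary file shows up in the two walks, while both files are parsed.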

Note that some declarations, like functions, don't distinguish validation from first-pass type checking. The ones that do, like classes, do it because first-pass type checking also type checks all members, which you generally don't want to do when name lookup just references a class by name, etc.

Sometimes, though, when you reference a declaration we have to validate all of its members. For example, if a struct is referenced we will eventually need to compute its layout. This validation of required members is done by placing the declarations on a list and walking them at the end.
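The "place them on a list and walk them at the end" idea can be sketched as a simple worklist. Everything here is hypothetical: `DeferredMemberValidation`, `require`, and `drain` are names invented for this example, and the de-duplication and the ability to enqueue more work while draining are assumptions, not a description of what the compiler actually does.

```cpp
#include <deque>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical worklist of declarations whose members must be validated later.
class DeferredMemberValidation {
public:
  // Record that a declaration's members need validating; duplicates are ignored.
  void require(const std::string &decl) {
    if (seen_.insert(decl).second)
      pending_.push_back(decl);
  }

  // Walk the list at the end. The callback may call require() again,
  // so the loop runs until no work is left. Returns the order visited.
  template <typename Fn>
  std::vector<std::string> drain(Fn validateMembers) {
    std::vector<std::string> order;
    while (!pending_.empty()) {
      std::string decl = pending_.front();
      pending_.pop_front();
      validateMembers(decl, *this);
      order.push_back(decl);
    }
    return order;
  }

private:
  std::deque<std::string> pending_;
  std::unordered_set<std::string> seen_;
};
```

The point of deferring is that a reference by name stays cheap, and the expensive member validation happens once per declaration, after the main walk.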

Validation of a declaration can trigger validation of other declarations, which can cause various crashes.

The "first pass" / "second pass" thing is mostly a misguided attempt at fixing some circular validation issues, and it's not really right to think of any of this as having multiple formally-defined "passes". It's all just a pile of hacks.

As Jordan mentioned, the hope is that the ITC will eventually replace all of this: swift/DeclarationTypeChecker.rst at main · apple/swift · GitHub


I was curious about this and it seems like the only time we see the "neither" state is for declarations that are not top-level:

  bool isSecondPass =
    !isFirstPass && D->getDeclContext()->isModuleScopeContext();
  DeclChecker(*this, isFirstPass, isSecondPass).visit(D);

I don't really remember why though. I think it has something to do with how nested declarations are visited recursively, but the details are hard to untangle.

@DaveZ patches welcome ;)

I figured it out. If both IsFirstPass and IsSecondPass are false, it means the declaration is nested inside a local context, i.e. a function body or something of that sort. This is used to skip certain checks, like accessibility checks (if you have a public function it should only reference public methods in its type, etc). Quite confusing.
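For what it's worth, the "formalize it as an enum" option from the original question might look something like the sketch below. `CheckingMode`, `classify`, and `skipsAccessibilityChecks` are hypothetical names for illustration only; the mapping just restates the two booleans from the snippet quoted earlier and is not a proposal for the actual fix.

```cpp
// Hypothetical three-case replacement for the two booleans.
enum class CheckingMode {
  FirstPass,    // isFirstPass is true
  SecondPass,   // not first pass, declaration at module scope
  LocalContext  // the "neither" state: nested in a local context
};

// Restates the logic of the quoted snippet: a declaration is in the
// second pass only if it is not in the first pass and is at module scope.
inline CheckingMode classify(bool isFirstPass, bool isModuleScope) {
  if (isFirstPass)
    return CheckingMode::FirstPass;
  return isModuleScope ? CheckingMode::SecondPass : CheckingMode::LocalContext;
}

// Per the explanation above, some checks (such as accessibility checks)
// are skipped when neither flag is set.
inline bool skipsAccessibilityChecks(CheckingMode mode) {
  return mode == CheckingMode::LocalContext;
}
```

Naming the third case at least makes the "neither" state visible at call sites, though it doesn't address whether two passes are needed at all.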

I have an idea for getting rid of the two passes, or at least the separate IsSecondPass flag: Declaration checker cleanups by slavapestov · Pull Request #15408 · apple/swift · GitHub

No promises -- as Jordan pointed out this piece of the code is rather tricky and it's easy to break.