Where is the real grammar for Swift?

Let me simplify the grammar for class-declaration by removing unrelated or optional fields:

class-declaration → class class-name class-body
class-name → identifier
class-body → { class-members opt }
class-members → class-member class-members opt
class-member → declaration | compiler-control-statement

declaration → import-declaration
declaration → constant-declaration
declaration → variable-declaration
declaration → typealias-declaration
declaration → function-declaration
declaration → enum-declaration
declaration → struct-declaration
declaration → class-declaration
declaration → protocol-declaration
... and so on

If we took this literally (as far as I understand it), it would mean that even an import-declaration or a protocol-declaration is valid inside a class-declaration, which is obviously not the case.
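To make the literal reading concrete, here is a minimal sketch of a recognizer for the simplified grammar above. It is a toy, not the Swift compiler's parser: `ToyParser` and `isSyntacticallyValid` are hypothetical names, tokens are assumed to be separated by whitespace, only three kinds of declaration are modeled, and a protocol body is assumed to reuse the class-body rule for brevity.

```swift
// Toy recognizer for the simplified grammar quoted above.
// Assumption: tokens are whitespace-separated, so write "{ }" rather than "{}".
struct ToyParser {
    private let tokens: [String]
    private var position = 0

    init(_ source: String) {
        tokens = source.split(whereSeparator: { $0.isWhitespace }).map(String.init)
    }

    // The token under the cursor, or nil at end of input.
    private var current: String? {
        position < tokens.count ? tokens[position] : nil
    }

    private mutating func consume(_ expected: String) -> Bool {
        guard current == expected else { return false }
        position += 1
        return true
    }

    // identifier: any token that is not a keyword or a brace.
    private mutating func identifier() -> Bool {
        let reserved = ["class", "protocol", "import", "{", "}"]
        guard let token = current, !reserved.contains(token) else { return false }
        position += 1
        return true
    }

    // declaration → import-declaration | protocol-declaration | class-declaration
    private mutating func declaration() -> Bool {
        guard let token = current else { return false }
        switch token {
        case "import":
            position += 1
            return identifier()
        case "class", "protocol":
            position += 1
            guard identifier() else { return false }
            return body()
        default:
            return false
        }
    }

    // class-body → { class-members opt }
    private mutating func body() -> Bool {
        guard consume("{") else { return false }
        while let token = current, token != "}" {
            guard declaration() else { return false }
        }
        return consume("}")
    }

    // A program is a sequence of zero or more declarations.
    mutating func parseProgram() -> Bool {
        while current != nil {
            guard declaration() else { return false }
        }
        return true
    }
}

func isSyntacticallyValid(_ source: String) -> Bool {
    var parser = ToyParser(source)
    return parser.parseProgram()
}
```

Under these assumptions, `isSyntacticallyValid("class C { protocol P { } }")` and `isSyntacticallyValid("class C { import Foundation }")` both return `true`, because nothing in the productions distinguishes which declarations may appear inside a class body. Only malformed token sequences such as `class class Foo { ]` are rejected.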

I know that "The grammar described here is intended to help you understand the language in more detail, rather than to allow you to directly implement a parser or compiler", but my question is: where should I look for a 100% reliable reference?

Maybe you can take a look at the code for the parser.

Yeah, the code is certainly the ultimate source of truth. Can you point me in the right direction here, for example to the parser source file that handles class declarations? Is there anything in between that an average Swift user could digest more easily? Thanks for the reply anyway.

The key thing is that there is an important distinction between syntactic and semantic validity.

For example, something like class class Foo { ] is not well-formed syntactically. However, class C { protocol P {} } is valid syntax but doesn't make sense semantically. There is nothing in the language grammar preventing a protocol declaration from being nested inside of a class. Instead, it is the type checker's job to enforce such semantic rules.

I would argue that it makes sense both syntactically and semantically. It is only grammatically that the language prohibits it.

Syntactically, it is obviously well-formed.

Semantically, there is a protocol P declared in the C namespace, i.e. C.P. While C happens to be a class, that fact is (semantically) immaterial to the protocol declaration: the only effect it should have is to place an upper bound on the visibility of the protocol equal to the visibility of the class.

Disallowing nested protocol declarations is an essentially arbitrary choice. That choice is a grammatical rule of the language, and it happens to be undocumented in the grammar.

I'm not sure the semantics are as obvious as you claim once you start taking generics into consideration. E.g., is C<Int>.P the same protocol as C<String>.P? There are legitimate design questions around this sort of construction that go beyond grammatical issues. Even if it were possible to capture all the desired rules in terms of a CFG, it's not clear that that would be desirable. It may make the grammar significantly more complex for something that could easily be rejected in the type checking phase.

Nobody here is asking for generic protocols, or even nested protocols, to be allowed. We are asking for the formal description of the grammar to accurately reflect the fact that they are not allowed.

I think @Slava_Pestov's point is that one can write something rejected by the compiler for reasons other than syntax, and that the term "grammar" is meant only to describe those syntactic rules:

A grammar lets us transform a program, which is normally represented as a linear sequence of ASCII characters, into a syntax tree.

But an accurate description of the grammar does not prevent a class inside a protocol, because this is a semantic restriction enforced during semantic analysis, not a grammatical restriction enforced during parsing.

Let’s try to make the situation clearer. This code is invalid:

// There are no other declarations in this module.
print(undeclaredVariable)

But it is not grammatically invalid. The compiler parses it correctly, but during semantic analysis it determines that undeclaredVariable was never declared anywhere and rejects it. The requirement that variables be defined is a semantic, not a grammatical, rule.

Similarly, the requirement that protocols not be nested in other types is diagnosed during semantic analysis, not parsing. It is a semantic, not grammatical, rule.
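The split between the two phases can be sketched with a hypothetical mini-AST and a separate checking pass. This is only an illustration of the idea, not the compiler's real data structures or diagnostics: `Decl` and `semanticErrors` are made-up names, and the nesting rule encoded here is simply the one described in this thread.

```swift
// Hypothetical syntax tree, as a parser might produce it.
// The parser builds this without asking whether the nesting is meaningful.
enum Decl {
    case classDecl(name: String, members: [Decl])
    case protocolDecl(name: String, members: [Decl])
}

// A later pass walks the already-parsed tree and reports rule violations.
// Assumed rule (from the discussion): a protocol may not be nested in another type.
func semanticErrors(in decl: Decl, enclosingType: String? = nil) -> [String] {
    switch decl {
    case let .classDecl(name, members):
        return members.flatMap { semanticErrors(in: $0, enclosingType: name) }
    case let .protocolDecl(name, members):
        var errors: [String] = []
        if let outer = enclosingType {
            errors.append("protocol '\(name)' cannot be nested inside '\(outer)'")
        }
        errors += members.flatMap { semanticErrors(in: $0, enclosingType: name) }
        return errors
    }
}

// Tree for: class C { protocol P {} }
let nested = Decl.classDecl(
    name: "C",
    members: [.protocolDecl(name: "P", members: [])]
)

// Tree for: protocol P {}
let topLevel = Decl.protocolDecl(name: "P", members: [])
```

In this sketch both trees exist, so both inputs were "grammatical". `semanticErrors(in: nested)` returns one message and `semanticErrors(in: topLevel)` returns an empty array, which is the sense in which the rule belongs to semantic analysis rather than to the grammar.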

Having said that, there are things that are grammatical rules but aren’t in TSPL’s grammar. There is no better documentation of the grammar than TSPL’s—to get a more accurate view of the grammar, you have to read the parser’s source code.
