Potential Swift grammar simplification

Could we use parentheses for grouping in the Swift grammar, or is this prohibited for some reason? It could simplify quite a few things.

A few examples:


Current:

  1. function-signature → parameter-clause async ? throws ? function-result ?
  2. function-signature → parameter-clause async ? rethrows function-result ?

Simplified:

  1. function-signature → parameter-clause async ? ( throws ? | rethrows ) function-result ?
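For readers less familiar with these rules, the two original function-signature alternatives match declarations like the ones below. This is a minimal sketch; the function and error names are hypothetical and only illustrate which alternative each signature falls under.

```swift
// Hypothetical error type, for illustration only.
struct EmptyInputError: Error {}

// Matches: parameter-clause async? throws? function-result?
func fetchCount(from values: [Int]) async throws -> Int {
    guard !values.isEmpty else { throw EmptyInputError() }
    return values.count
}

// Matches: parameter-clause async? rethrows function-result?
// A rethrows function can only throw errors thrown by its function-typed parameter.
func applyTwice(_ value: Int, _ transform: (Int) throws -> Int) rethrows -> Int {
    return try transform(transform(value))
}

// The closure doesn't throw, so no `try` is needed at the call site.
let doubledTwice = applyTwice(3) { $0 * 2 }
print(doubledTwice) // 12
```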

Current:

  1. subscript-declaration → subscript-head subscript-result generic-where-clause ? code-block
  2. subscript-declaration → subscript-head subscript-result generic-where-clause ? getter-setter-block
  3. subscript-declaration → subscript-head subscript-result generic-where-clause ? getter-setter-keyword-block

Simplified:

  1. subscript-declaration → subscript-head subscript-result generic-where-clause ? ( code-block | getter-setter-block | getter-setter-keyword-block )
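To make the three alternatives concrete, here is a sketch with one subscript of each kind. The `Grid` type and `Indexable2D` protocol are hypothetical; the keyword-only block is the form used for protocol requirements.

```swift
// Hypothetical 3x3 grid illustrating the subscript-declaration alternatives.
struct Grid {
    private var cells = [Int](repeating: 0, count: 9)

    // code-block: a read-only subscript whose body is the getter
    subscript(row row: Int) -> [Int] {
        return Array(cells[(row * 3)..<(row * 3 + 3)])
    }

    // getter-setter-block: explicit get and set clauses
    subscript(row: Int, column: Int) -> Int {
        get { cells[row * 3 + column] }
        set { cells[row * 3 + column] = newValue }
    }
}

// getter-setter-keyword-block: only the `get`/`set` keywords, as in a protocol
protocol Indexable2D {
    subscript(row: Int, column: Int) -> Int { get set }
}

extension Grid: Indexable2D {}

var grid = Grid()
grid[1, 2] = 7
print(grid[1, 2])   // 7
print(grid[row: 1]) // [0, 0, 7]
```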

Current:

  1. precedence-group-relation → higherThan : precedence-group-names
  2. precedence-group-relation → lowerThan : precedence-group-names

Simplified:

  1. precedence-group-relation → ( higherThan | lowerThan ) : precedence-group-names

Current:

  1. precedence-group-associativity → associativity : left
  2. precedence-group-associativity → associativity : right
  3. precedence-group-associativity → associativity : none

Simplified:

  1. precedence-group-associativity → associativity : ( left | right | none )
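Both of the precedence-group rules above show up in an ordinary precedence group declaration. The sketch below uses a hypothetical `**` exponentiation operator (not part of the standard library) purely to show a relation and an associativity in use.

```swift
// Hypothetical exponentiation operator, for illustration only.
precedencegroup ExponentiationPrecedence {
    higherThan: MultiplicationPrecedence // precedence-group-relation
    associativity: right                 // precedence-group-associativity
}

infix operator ** : ExponentiationPrecedence

func ** (base: Int, exponent: Int) -> Int {
    // Repeated multiplication; assumes a non-negative exponent.
    var result = 1
    for _ in 0..<exponent { result *= base }
    return result
}

print(2 ** 3 ** 2) // 512, parsed as 2 ** (3 ** 2) because of right associativity
print(2 * 3 ** 2)  // 18, because ** binds tighter than *
```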

Current:

  1. variable-declaration → variable-declaration-head variable-name type-annotation code-block
  2. variable-declaration → variable-declaration-head variable-name type-annotation getter-setter-block
  3. variable-declaration → variable-declaration-head variable-name type-annotation getter-setter-keyword-block

Simplified:

  1. variable-declaration → variable-declaration-head variable-name type-annotation ( code-block | getter-setter-block | getter-setter-keyword-block )
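The same three-way split applies to computed variables. In this sketch (the `Temperature` type and `FahrenheitConvertible` protocol are hypothetical), each declaration has an explicit type annotation, as all three rules require.

```swift
struct Temperature {
    var celsius: Double

    // code-block: a read-only computed property
    var kelvin: Double {
        return celsius + 273.15
    }

    // getter-setter-block: explicit get and set
    var fahrenheit: Double {
        get { celsius * 9 / 5 + 32 }
        set { celsius = (newValue - 32) * 5 / 9 }
    }
}

// getter-setter-keyword-block: only the keywords, as in a protocol requirement
protocol FahrenheitConvertible {
    var fahrenheit: Double { get set }
}

extension Temperature: FahrenheitConvertible {}

var t = Temperature(celsius: 100)
print(t.fahrenheit) // 212.0
t.fahrenheit = 32
print(t.celsius)    // 0.0
```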

Current:

  1. access-level-modifier → private | private ( set )
  2. access-level-modifier → fileprivate | fileprivate ( set )
  3. access-level-modifier → internal | internal ( set )
  4. access-level-modifier → public | public ( set )
  5. access-level-modifier → open | open ( set )

Simplified:

  1. access-level-modifier → ( private | fileprivate | internal | public | open ) ( ( set ) ) ?

Alternatively:

  1. access-level-suffix → ( set )
  2. access-level-modifier → ( private | fileprivate | internal | public | open ) access-level-suffix ?

Edit: italicised the BNF parentheses to make them visually distinct from parentheses in the language; we do the same thing with the question mark. Also added the alternative for the last example, as discussed below.


As a general point, I would dispute the assumption that less verbose means simpler. In some cases I agree your alternative form is easier to understand (precedence-group-associativity, for example), but in others I think the verbose version is better (access-level-modifier, for example).

I think the last one might be why they are so reluctant to use symbols from the language as part of the syntax of the grammar specification. The nested parentheses, one pair of which is in bold, don't look great.

We could fix the aesthetics of that particular case by factoring out the optional suffix:

access-level-suffix → ( set )
access-level-modifier → ( private | fileprivate | internal | public | open ) access-level-suffix ?
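One quick way to convince yourself that the factored form matches exactly the same modifiers as the original five rules is to expand both into the strings they accept and compare. This is a minimal sketch: the rules are hand-encoded as string lists, not parsed from the grammar.

```swift
let levels = ["private", "fileprivate", "internal", "public", "open"]

// Original form: five rules, each "level | level ( set )".
var original: Set<String> = []
for level in levels {
    original.insert(level)
    original.insert("\(level) ( set )")
}

// Factored form: access-level-suffix → ( set ), used as an optional suffix.
let accessLevelSuffix = "( set )"
let optionalSuffix: [String?] = [nil, accessLevelSuffix]
var factored: Set<String> = []
for level in levels {
    for suffix in optionalSuffix {
        // nil models the absent optional suffix.
        factored.insert(suffix.map { "\(level) \($0)" } ?? level)
    }
}

print(original == factored) // true
print(factored.count)       // 10
```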


I've edited this reply and the head post to italicise the BNF parentheses; we use the same trick for the question mark.

Found this: "Omit grouping parentheses. Our formal grammar doesn't use grouping parentheses."

It doesn't explain why, just stating the fact. Can we change it or is it already deep in the Swift DNA?

Motivation

When reading grammar stanzas like these:

  1. initializer-declaration → initializer-head generic-parameter-clause ? parameter-clause async ? throws ? generic-where-clause ? initializer-body

  2. initializer-declaration → initializer-head generic-parameter-clause ? parameter-clause async ? rethrows generic-where-clause ? initializer-body

my brain does a "diff" exercise to see what's actually different (sometimes I don't trust it and check the differences in a text editor), then substitutes a "compact" version of the syntax. The result my brain ends up working with is this, with the change capitalised:

  1. initializer-declaration → initializer-head generic-parameter-clause ? parameter-clause async ? THROWS-OR-RETHROWS ? generic-where-clause ? initializer-body

This particular change can be written that way today, with throws-or-rethrows defined on a separate line, but that line (throws-or-rethrows → throws | rethrows) is effectively noise that needn't be there.

Can this topic be moved to the Swift Documentation category?


Done!


Some of the design choices in the grammar itself are eyebrow-raising. "watchOS", "macCatalystApplicationExtension", "x86_64" and "arm64" as part of the grammar, seriously?!

We could probably cut the grammar in half by moving things up to the semantic level. Perhaps not quite to that extent, but still substantially.

For example, following the precedent of attributes, instead of the access-level-modifier stanzas above we could have had merely:

With some built-in definitions like these assumed internally:

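// Pseudo type: could equally be a protocol, struct, or enum provided by the compiler.
enum Meta {
	struct AccessLevelModifier {}
}

// Built-in declarations, assumed to be provided internally:
extension Meta.AccessLevelModifier {
	static let `private` = Meta.AccessLevelModifier()
	static let `fileprivate` = Meta.AccessLevelModifier()
	static let `internal` = Meta.AccessLevelModifier()
	...
}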

Optionally, beyond simplifying the grammar, this could open up some interesting further opportunities for customisation:

var pink: Meta.AccessLevelModifier {
	#if os(macOS)
	return Meta.AccessLevelModifier.internal
	#else
	return Meta.AccessLevelModifier.public
	#endif
}

Omit grouping parentheses. Our formal grammar doesn’t use grouping parentheses.

It doesn't explain why, just stating the fact. Can we change it or is it already deep in the Swift DNA?

This was an intentional decision when we started writing the formal grammar for Swift. The primary reason for our choice of language constructs in the formal grammar was to keep individual production rules from becoming too complex. That's why we omitted operators for things like grouping and repetition: especially when combined, these kinds of nesting can produce a very complex grammar. We found that having to factor out and name groups of optional or repeated elements, along with guidelines for naming those things consistently, resulted in a grammar that's readable and made up of simple pieces.

Our choice of grammar came after a survey of how documentation describes the grammar of a number of other languages, including C, C++, and Java, which tend to use a style similar to TSPL's, and also languages like Pascal, AppleScript, and COBOL, which tend to be described in very different ways. Because the grammar is meant for people to read, not for parsing Swift, we use a notation that relies on different typographical styles rather than quoting, unlike the BNF grammars you might see in RFCs.

Using a different syntax is possible, but you'd need to make a strong case either that the current syntax is actively harmful or that the proposed one is a significant improvement. Think along the lines of writing a Swift Evolution proposal that includes source-breaking changes.

Some of the design choices in the grammar itself are eyebrow-raising. "watchOS", "macCatalystApplicationExtension", "x86_64" and "arm64" as part of the grammar, seriously?!

There's a balance when writing the grammar between overproduction (rules that match invalid Swift code) and readability. The general goal is to minimize the places where the grammar overproduces, but to allow some overproduction where it's needed for readability. For example, the formal grammar for attributes enforces balanced ( and ) but doesn't enforce much structure inside the argument list, because encoding the grammar of each individual attribute would make the grammar very long and very hard to read, even though it isn't generally valid to write arbitrary tokens there.

In this case, if I remember correctly, that's the list of names the compiler accepts. (Swift has experimental or unofficial support for additional platforms, but those aren't included in this list; see also include/swift/AST/PlatformKinds.def in the compiler source code.) The grammar could just write "identifier" there, but that would overproduce. So we include the specific list, because it gives a programmer reading the grammar a useful signal: these are the only things you should expect to write here.
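For context, these names appear in conditions like the ones below, which is why the grammar lists them explicitly rather than accepting any identifier. This is a sketch; `platformDescription()` is a hypothetical helper, and the branch taken depends on the machine it is compiled for.

```swift
// Hypothetical helper showing where platform and architecture names appear.
func platformDescription() -> String {
    #if os(watchOS)
    let osName = "watchOS"
    #elseif os(macOS)
    let osName = "macOS"
    #else
    let osName = "another OS"
    #endif

    #if arch(x86_64)
    let archName = "x86_64"
    #elseif arch(arm64)
    let archName = "arm64"
    #else
    let archName = "another architecture"
    #endif

    return "\(osName) on \(archName)"
}

// Availability checks use the same platform names; `*` covers all others.
if #available(macOS 13, watchOS 9, *) {
    print(platformDescription())
}
```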
