Allow regular letters as operator names

It's much worse, since it's an explosion underneath the overloads.

Shouldn't it try +a+ first, then +a? Also, how does it know if the second + is infix?

Admittedly, I have a hard time parsing (ha!) what you're trying to suggest.

After it encounters the second +, and since it knows that there are infix operators with this name, it should scan forward to resolve the ambiguity.

You can't do that in the early stages of parsing unless the operator is declared in the same file in which it's used. Despite your dismissal of the goal, Swift parses each file individually.

What is "it"?

Suppose there is a file like this:

import Foundation
prefix ...

Doesn't that mean that, in order to compile it at all, the compiler has to deal with all the files in Foundation? At least the first time?

The Swift compiler.

The parsing part? No, Swift's lexical structure allows you to parse the file alone and figure out things like: this is an identifier, this is an operator, this is a keyword, etc.

Checking whether the operators, identifiers, and functions actually exist happens at a later stage.
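A minimal sketch of why a single file is enough for this step, assuming a heavily simplified, ASCII-only lexer: whether a run of characters is an operator or an identifier is decided from the characters alone, with no other file, module, or declaration consulted. `Token`, `TokenKind`, and `tokenize` are hypothetical names, not compiler APIs; the real grammar also admits Unicode operator and identifier characters, dot-operators, and many more keywords.

```swift
enum TokenKind: Equatable {
    case identifier, keyword, operatorToken, integerLiteral
}

struct Token: Equatable {
    let text: String
    let kind: TokenKind
}

// ASCII characters from Swift's operator-head production (Unicode ones omitted).
let operatorCharacters: Set<Character> = ["/", "=", "-", "+", "!", "*", "%", "<", ">", "&", "|", "^", "~", "?"]

// A small, incomplete sample of Swift keywords, for illustration only.
let sampleKeywords: Set<String> = ["let", "var", "func", "import", "operator"]

func isOperatorCharacter(_ c: Character) -> Bool { operatorCharacters.contains(c) }
func isDigit(_ c: Character) -> Bool { c.isASCII && c.isNumber }
func isIdentifierHead(_ c: Character) -> Bool { c == "_" || (c.isASCII && c.isLetter) }
func isIdentifierCharacter(_ c: Character) -> Bool { isIdentifierHead(c) || isDigit(c) }

/// Splits source text into tokens using only the characters themselves.
func tokenize(_ source: String) -> [Token] {
    var tokens: [Token] = []
    var index = source.startIndex
    while index < source.endIndex {
        let c = source[index]
        // Choose the character class of the run that starts at `c`.
        let kind: TokenKind
        let continues: (Character) -> Bool
        if isOperatorCharacter(c) {
            kind = .operatorToken
            continues = isOperatorCharacter
        } else if isIdentifierHead(c) {
            kind = .identifier
            continues = isIdentifierCharacter
        } else if isDigit(c) {
            kind = .integerLiteral
            continues = isDigit
        } else {
            // Whitespace and anything this sketch does not model are skipped.
            index = source.index(after: index)
            continue
        }
        // Maximal munch: consume the longest run of characters in that class.
        var end = source.index(after: index)
        while end < source.endIndex && continues(source[end]) {
            end = source.index(after: end)
        }
        let text = String(source[index..<end])
        let finalKind: TokenKind = (kind == .identifier && sampleKeywords.contains(text)) ? .keyword : kind
        tokens.append(Token(text: text, kind: finalKind))
        index = end
    }
    return tokens
}
```

For example, `tokenize("+example")` yields the operator `+` followed by the identifier `example`, and `a+-b` lexes `+-` as a single operator, all without knowing which operators are actually declared anywhere.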

But how can it construct a parse tree if it doesn't know the precedence at this stage?

Compilation has several steps. The first is lexical parsing. This is done on a per-file basis, without opening any other source files (or module interface files, etc.). I believe the next step is Sema, or Semantic Analysis. This is where symbols are resolved into types. It uses the syntax tree generated by the first phase. It is only here that operators will be resolved, which in turn requires that the types of the operands be determined, to select the appropriate overloads.

Why would it need precedence at this stage? The parse tree is of lexical tokens, not language rules.

Well, then it is a token stream, not any kind of tree.

Not entirely true. Swift syntax has hierarchy, and that's reflected in the tree. Scopes, for example. And parentheses, especially for function declarations and for parameters at call sites. At this stage, expressions will just be a stream, that's true.

In Sema, the expressions will be grouped by operator and function precedence, and the flow control representation will reflect this analysis.

OK, I see now. In the current compiler implementation it is easier to parse operators and identifiers unambiguously. But does that actually rule out feasibility? Because precedence declarations always appear at the top level, couldn't the compiler collect them in an intermediate stage early in parsing, before semantic analysis, and then parse any occurrence of those names as a possible operator, with the correctness of each application checked at later stages?

Well, we could do anything. But you have to consider the trade-offs in complexity, and the fact that it's a specific goal, not a mere accident, that Swift be parseable without seeing other files.

It's important to note that, unlike many languages, the parse tree in Swift doesn't encode operator precedence at all.

If you have the expression a + b * c, traditional languages that have fixed operator precedence would encode that directly in the syntax tree as something like:

AddOp(
  Identifier("a"),
  MultiplyOp(
    Identifier("b"),
    Identifier("c")
  )
)

But in Swift, the precedences of + and * are defined in the standard library, not in the compiler, so the parser can only surface the expression as a sequence of alternating (term OP term OP ...) elements:

Sequence(
  Identifier("a"),
  BinaryOp("+"),
  Identifier("b"),
  BinaryOp("*"),
  Identifier("c")
)

Later, during semantic analysis, after modules have been loaded and the precedence of operators is known, the compiler uses that information to fold those sequences into an actual tree.
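To make the folding step concrete, here is a minimal sketch using precedence climbing over the flat sequence. The numeric precedence table is an assumption for illustration: real Swift relates named precedence groups through `higherThan`/`lowerThan` rather than numbers, and this sketch treats every operator as left-associative. `Expr`, `SequenceElement`, and `fold` are hypothetical names, not the compiler's types.

```swift
// Hypothetical tree type for a folded expression.
indirect enum Expr: Equatable {
    case identifier(String)
    case binary(op: String, lhs: Expr, rhs: Expr)
}

// One element of the flat sequence the parser produces: term OP term OP ...
enum SequenceElement {
    case term(Expr)
    case op(String)
}

// Hypothetical numeric precedences standing in for precedence groups.
let samplePrecedences: [String: Int] = ["+": 1, "-": 1, "*": 2, "/": 2]

/// Folds an alternating [term, op, term, op, term] sequence into a tree.
/// Returns nil for malformed input or operators missing from the table.
func fold(_ elements: [SequenceElement]) -> Expr? {
    var position = 0

    func nextTerm() -> Expr? {
        guard position < elements.count, case .term(let e) = elements[position] else { return nil }
        position += 1
        return e
    }

    func peekOperator() -> String? {
        guard position < elements.count, case .op(let name) = elements[position] else { return nil }
        return name
    }

    func parse(minPrecedence: Int) -> Expr? {
        guard var lhs = nextTerm() else { return nil }
        while let name = peekOperator(), let p = samplePrecedences[name], p >= minPrecedence {
            position += 1 // consume the operator
            // Left-associative: the right operand only absorbs tighter-binding operators.
            guard let rhs = parse(minPrecedence: p + 1) else { return nil }
            lhs = .binary(op: name, lhs: lhs, rhs: rhs)
        }
        return lhs
    }

    let result = parse(minPrecedence: 0)
    // Reject leftovers such as unknown operators or trailing elements.
    return position == elements.count ? result : nil
}
```

Fed the sequence for `a + b * c`, this produces the same shape as the `AddOp(a, MultiplyOp(b, c))` tree above; the point is that the grouping only exists once a precedence table is available.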

So how does swift-format handle this, since we only have access to the flat sequences from the parse tree? We had to implement the same logic as the compiler, except that we hardcoded the precedences of the standard operators and assign an arbitrary precedence to any that we don't recognize. (Eventually we'll have some configuration options to let people add their own custom operators if there's a library they use frequently that defines some.)
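As an illustration of what such a tool cannot see, here is a sketch of a library-style custom operator. The `**` operator and `ExponentiationPrecedence` group are assumptions made up for this example, not part of the standard library; a formatter that only sees the call site has no way to learn that `**` binds tighter than `*` without loading the declaring module, so it would have to guess.

```swift
// Declared in some (hypothetical) library module.
precedencegroup ExponentiationPrecedence {
    higherThan: MultiplicationPrecedence
    associativity: right
}

infix operator ** : ExponentiationPrecedence

/// Integer power; this sketch only supports non-negative exponents.
func ** (base: Int, exponent: Int) -> Int {
    precondition(exponent >= 0, "negative exponents are not supported in this sketch")
    var result = 1
    for _ in 0..<exponent { result *= base }
    return result
}

// At a call site in another file, the grouping depends entirely on the
// declaration above: this is 2 + ((3 ** 2) * 2).
let value = 2 + 3 ** 2 * 2
```

Because the group is right-associative, `2 ** 3 ** 2` groups as `2 ** (3 ** 2)`; a tool that assigned `**` a default precedence and associativity could group both expressions differently.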

So the parser being able to distinguish between identifiers and operators is even more important when it comes to the ability to define custom operators, because it needs to be able to recognize "hey, this thing is definitely an operator" or "this thing is definitely an identifier" even before the module is loaded.

Otherwise, too much logic gets pushed to the semantic analysis stage, and that would make many tools like swift-format and other IDE functionality infeasible, because they wouldn't be able to distinguish between operators and identifiers at all without the same semantic information, and requiring a user to compile their code (and all of its dependencies!) before formatting it would be a non-starter.

Just because an implementation is possible doesn't mean it's a good idea or that it's practical. You've already been quoted Chris Lattner's explanation of why the idea has been rejected.

You claim it's “mythical complexity”. I don't know what experience you have in language implementation. I do know that the core team has decades of experience implementing languages. It is on you to convince them, and the community, that your opinion outweighs their decades of experience.

You're not going to make any allies by asserting that your opinion trumps everyone else's.

What I understand here is that you would like to introduce a new syntax for prefix operators that would always be separated by (at least) a space from their operand.

That would avoid the parsing problem of +example since, if the operator is named +ex, the only correct syntax would be +ex ample.

However, it seems to introduce breaking changes to existing code. Should current prefix operators still be valid? Would you still be able to write -42, or would you have to write - 42?
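For context on why `- 42` matters: today Swift already decides whether an operator is prefix, postfix, or infix from the whitespace around it. A minimal sketch of that rule, assuming a simplified version that only looks at spaces and line boundaries (the real rule also treats certain punctuation, such as parentheses, brackets, and commas, as whitespace on the appropriate side). `Fixity` and `fixity(of:in:)` are hypothetical names.

```swift
enum Fixity: Equatable { case prefix, postfix, infix }

/// Classifies the first occurrence of `op` in `line` by the whitespace around it:
/// space only before -> prefix, space only after -> postfix, both or neither -> infix.
func fixity(of op: String, in line: String) -> Fixity? {
    guard let range = line.range(of: op) else { return nil }
    // Line boundaries count as whitespace in this sketch.
    let spaceBefore = range.lowerBound == line.startIndex || line[line.index(before: range.lowerBound)] == " "
    let spaceAfter = range.upperBound == line.endIndex || line[range.upperBound] == " "
    switch (spaceBefore, spaceAfter) {
    case (true, false): return .prefix
    case (false, true): return .postfix
    default: return .infix
    }
}
```

Under this rule `let x = -42` uses `-` as a prefix operator, while in `let x = - 42` the `-` has whitespace on both sides and is not a prefix use, so requiring a space after every prefix operator would conflict with the existing rule rather than extend it.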

And now, how would you highlight the following code?

let value = please tell me what is what

Is it (var) (infix op) (prefix op) (var) (infix op) (var) or (prefix op) (var) (infix op) (var) (infix op) (var)?
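To see how quickly this gets out of hand, here is a sketch that enumerates every role assignment for the words, assuming that any word might be declared, in some other file, as a variable, a prefix operator, or an infix operator, and that a valid expression follows `prefix* operand (infix prefix* operand)*`. `Role`, `isValid`, and `parses(of:)` are hypothetical names; since a single-file lexer cannot tell the names apart, the sketch only depends on how many words there are.

```swift
enum Role: Equatable { case operand, prefixOp, infixOp }

/// Checks a role assignment against: prefixOp* operand (infixOp prefixOp* operand)*
func isValid(_ roles: [Role]) -> Bool {
    var expectingOperand = true
    for role in roles {
        switch (role, expectingOperand) {
        case (.prefixOp, true):
            break // prefix operators may stack in front of an operand
        case (.operand, true):
            expectingOperand = false
        case (.infixOp, false):
            expectingOperand = true
        default:
            return false
        }
    }
    return !expectingOperand
}

/// Enumerates every way of assigning a role to each word and keeps the valid ones.
func parses(of words: [String]) -> [[Role]] {
    var results: [[Role]] = [[]]
    for _ in words {
        results = results.flatMap { partial in [Role.operand, .prefixOp, .infixOp].map { partial + [$0] } }
    }
    return results.filter(isValid)
}
```

Both readings from the question are accepted, along with others such as five stacked prefix operators applied to the final `what`, so without the declarations the line has no single correct highlighting.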

I think it will always be ambiguous if the compiler does not parse everything beforehand.

However, I don't know the real impact of having to parse every single file to be able to parse a single line of code. Since some other languages do this (such as C++), are there tooling problems for those languages?

This request has been rejected many times because of its implementation complexity, and it has been requested many times because of its readability benefits.

I'm just dropping an idea here: how about a compromise that gives us the best of both worlds by using a distinct syntactic element reserved for named operators? Something like value1 #infixop# value2, or #prefixop value, so that the compiler can detect whether a name is an operator.
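A minimal sketch of why a marker like this would keep lexing file-local, assuming the proposed (hypothetical) spelling `#name#` for infix and `#name` for prefix named operators: each word is classified from its spelling alone. Since Swift already uses `#` for directives and literals such as `#if` and `#available`, the sketch excludes a sample of those; a real design would need the complete list. `NamedOperatorToken`, `classifyWord`, and `lexNamedOperators` are hypothetical names.

```swift
enum NamedOperatorToken: Equatable {
    case infix(String)   // #name#
    case prefix(String)  // #name
    case other(String)   // anything else, passed through unchanged
}

// A sample of existing Swift # keywords that must not become operator names.
let reservedPoundKeywords: Set<String> = ["if", "else", "elseif", "endif", "available", "selector", "file", "line"]

/// Classifies one whitespace-separated word under the proposed syntax.
func classifyWord(_ word: String) -> NamedOperatorToken {
    guard word.hasPrefix("#") else { return .other(word) }
    var name = word.dropFirst()
    let isInfix = name.hasSuffix("#")
    if isInfix { name = name.dropLast() }
    // Names must be non-empty identifiers that don't collide with existing # keywords.
    guard !name.isEmpty,
          name.allSatisfy({ $0 == "_" || $0.isLetter || $0.isNumber }),
          !reservedPoundKeywords.contains(String(name)) else {
        return .other(word)
    }
    return isInfix ? .infix(String(name)) : .prefix(String(name))
}

func lexNamedOperators(_ source: String) -> [NamedOperatorToken] {
    source.split(separator: " ").map { classifyWord(String($0)) }
}
```

No declaration lookup is involved: `value1 #infixop# value2` is recognized as an infix use of `infixop` purely from the delimiters, which is the property the parser needs.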

Yes, I was going to mention that. Haskell allows you to use backticks to treat any (I think) binary function as an infix operator:

-- "of" is a reserved word in Haskell (case ... of), so the function is named of' here
of' :: Rank -> Suit -> PokerCard
r `of'` s = PokerCard { rank = r, suit = s }

The nice thing about this feature is that it helps code read more naturally in some cases. Swift might need a different character, given that backticks are already used for escaping identifiers.

The single quote (') hasn't been used yet. Theoretically, # followed by a non-keyword should also work.