[Proposal] Refining Identifier and Operator Symbology

Would it be possible to do the following:

• Have 1 group which are always used as identifiers. This would probably be the identifiers from this proposal.

• Have a 2nd group which are always used as operators (quite a bit larger than the group proposed by this proposal). At a minimum, the following ASCII + set operators + math operators:
(± ≠ ≤ ≥ ¿ ¡ ™ ¢ ¶ • ° ƒ © √ ∆ ◊ § ≈ ∫ ÷ ¬).

• Everything else is in a 3rd group. These can be used as identifiers unless they have been defined as an operator somewhere within the project, at which point they switch to being only usable as operators. As long as the 2nd group is large enough, this shouldn’t cause conflict issues too often. It would prove useful to lots of subdomains (especially mathematicians).

Alternatively, we could just allow anything outside of the 2nd group to be an operator (including letter-like characters) as long as:
1) It is wrapped in parentheses without spaces, e.g. (∆) or even (op)
2) It has been declared as an operator
3) It is not a keyword or otherwise taken

Taken to its extreme, that might someday even allow anonymous operators:
let (©) = { lhs, rhs in …}
let ans = a (©) b

Thoughts? Is this possible?

Thanks,
Jon

Imagine you're the compiler and you've been handed this source file:

  let party = :tada::birthday:
  prefix operator :tada:

You're going to see a keyword "let", an identifier "party", an "=" operator, and then the sequence ":tada::birthday:". How will you interpret that sequence? There are a few possibilities:

1. A two-character variable named ":tada::birthday:".
2. A prefix ":tada:" operator followed by a one-character variable name ":birthday:".
3. A two-character operator ":tada::birthday:" with an operand to follow on the next line.
4. A one-character variable name ":tada:" followed by a postfix ":birthday:" operator.

The operator declaration on the next line will make all of this clear—but we can't understand the next line until we've parsed this line (see #3).
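
To make the ambiguity concrete, here is a minimal Swift sketch of a toy tokenizer whose output depends on which characters have been declared as operators. It is not how the Swift lexer works: `Token`, `tokenize`, and the `declaredOperators` parameter are hypothetical names invented for this illustration, and the two emoji (written above as :tada: and :birthday:) appear literally in the string.

```swift
// Toy model only: `declaredOperators` stands in for "operator declarations
// seen somewhere in the project". All names here are hypothetical.
enum Token: Equatable {
    case identifier(String)
    case op(String)
}

/// Splits `source` into maximal runs of operator characters and
/// maximal runs of non-operator (identifier) characters.
func tokenize(_ source: String, declaredOperators: Set<Character>) -> [Token] {
    var tokens: [Token] = []
    var run = ""
    var runIsOperator = false
    for ch in source {
        let isOperator = declaredOperators.contains(ch)
        // A change of character class ends the current run.
        if !run.isEmpty && isOperator != runIsOperator {
            tokens.append(runIsOperator ? .op(run) : .identifier(run))
            run = ""
        }
        run.append(ch)
        runIsOperator = isOperator
    }
    if !run.isEmpty {
        tokens.append(runIsOperator ? .op(run) : .identifier(run))
    }
    return tokens
}

let text = "🎉🎂"
print(tokenize(text, declaredOperators: []))            // reading 1
print(tokenize(text, declaredOperators: ["🎉"]))        // reading 2
print(tokenize(text, declaredOperators: ["🎉", "🎂"]))  // reading 3
print(tokenize(text, declaredOperators: ["🎂"]))        // reading 4
```

The same two characters yield four different token streams, and the only thing that changes is a set that, under the proposal, would not be known until other code has been parsed.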

Now, one way around this would be to require all operator declarations to be at the top of the source file. But this would be very strange in Swift, which is otherwise completely insensitive to the order of statements in a declaration scope. And it leads to a strange two-phase behavior when parsing multiple files: You would need to parse each file through the end of its operator declarations before parsing any other code in any of the other files.

I don't think this is something we want to do.

···

On Oct 20, 2016, at 12:25 PM, Jonathan Hull via swift-evolution <swift-evolution@swift.org> wrote:

These can be used as identifiers unless they have been defined as an operator somewhere within the project, at which point they switch to being only usable as operators.

--
Brent Royal-Gordon
Architechies

No. This is far, far too complicated, and it mixes up the layers of the
compilation process.

···

On Thu, Oct 20, 2016 at 12:25 PM, Jonathan Hull via swift-evolution <swift-evolution@swift.org> wrote:

Would it be possible to do the following:

• Have 1 group which are always used as identifiers. This would probably
be the identifiers from this proposal.

• Have a 2nd group which are always used as operators (quite a bit larger
than the group proposed by this proposal). At a minimum, the following
ascii + set operators + math operators:
(± ≠ ≤ ≥ ¿ ¡ ™ ¢ ¶ • ° ƒ © √ ∆ ◊ § ≈ ∫ ÷ ¬ ).

• Everything else is in a 3rd group.

> These can be used as identifiers unless they have been defined as an
> operator somewhere within the project, at which point they switch to being
> only usable as operators.

Imagine you're the compiler and you've been handed this source file:

        let party = :tada::birthday:
        prefix operator :tada:

You're going to see a keyword "let", an identifier "party", an "="
operator, and then the sequence ":tada::birthday:". How will you interpret that
sequence? There are a few possibilities:

1. A two-character variable named ":tada::birthday:".
2. A prefix ":tada:" operator followed by a one-character variable name ":birthday:".
3. A two-character operator ":tada::birthday:" with an operand to follow on the next
line.
4. A one-character variable name ":tada:" followed by a postfix ":birthday:" operator.

The operator declaration on the next line will make all of this clear—but
we can't understand the next line until we've parsed this line (see #3).

Nope. There is no scenario in which any of this is clarified by an operator
declaration, and certainly not by one that is not in the lexical scope. What
would happen is this:

1. At the LEXICAL level, the sequence :tada::birthday: will be handled according to
the tokenization rules. If emojis are admitted in Swift, they are
definitely in the identifier space, so both of these are "normal"
identifiers.

2. At the parse level, no operator is in scope when you encounter the :tada::birthday:,
so it's simply an identifier.

3. At symbol resolution, no binding is found for that identifier, and an
error is raised.

Now let's consider real examples:

    let c = a +++ b   // +++ is a user infix operator

Fails, because +++ is not defined in the lexical scope.

    func +++ (a, b) -> int { ...}
    ...
    let c = a +++ b

Fails because this tokenizes as LET ident = ident ident ident, which is a
parse error. So how about:

    func +++ (a, b) -> int { ...}
    ...
    let c = +++(a, b)

Succeeds. Tokenizes as LET ident = ident ( ident, ident), which is a
function call. OK. Now how about:

    infix operator +++ : SomePrecedence
    ...
    let c = a +++ b

Builds a parse tree in which +++ is bound in the lexical contour of
operator symbols, but fails because +++ is *not* bound in the lexical
contour of identifiers. What was needed was:

    func +++(a, b) -> int { ... }
    infix operator +++ : SomePrecedence
    ...
    let c = a +++ b

THIS works. Builds an operator expression parse tree because +++ is bound
in the lexical contour of operator reserved words. Identifier resolution
then succeeds because +++ is *also* bound in the lexical contour of
identifiers.
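
The cases above can be summarized with a small Swift sketch. It only models the argument as stated in this message (one scope of operator declarations consulted at the parse level, one scope of identifier bindings consulted at symbol resolution); it is not a description of the actual compiler, and `Outcome`, `checkInfix`, and both parameter names are hypothetical.

```swift
// Toy model of the two "lexical contours" described above.
enum Outcome: Equatable {
    case parseError          // "ident ident ident" is not an expression
    case unboundIdentifier   // operator expression built, but no function bound
    case ok
}

/// Checks an expression of the form `a <name> b` against the two scopes.
func checkInfix(_ name: String,
                operatorsInScope: Set<String>,
                identifiersInScope: Set<String>) -> Outcome {
    // Parse level: only a declared operator yields an operator expression.
    guard operatorsInScope.contains(name) else { return .parseError }
    // Symbol resolution: the same spelling must also name a function.
    guard identifiersInScope.contains(name) else { return .unboundIdentifier }
    return .ok
}

// Only `func +++` is declared.
print(checkInfix("+++", operatorsInScope: [], identifiersInScope: ["+++"]))
// Only `infix operator +++` is declared.
print(checkInfix("+++", operatorsInScope: ["+++"], identifiersInScope: []))
// Both are declared.
print(checkInfix("+++", operatorsInScope: ["+++"], identifiersInScope: ["+++"]))
```

In this model neither check ever changes how the characters are tokenized, which is the point being made: the declarations are consulted after lexing, not during it.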

But this would be very strange in Swift, which is otherwise completely
insensitive to the order of statements in a declaration scope. And it leads
to a strange two-phase behavior when parsing multiple files: You would need
to parse each file through the end of its operator declarations before
parsing any other code in any of the other files.

If I understand the language reference correctly, what you say is true
within a class definition, but not at file or local scope. Normal scopes
follow the normal rules for lexical contours. Member scopes have something
like the behavior you suggest, but the rules there are more complex than
you are describing.

Jonathan

···

On Thu, Oct 20, 2016 at 4:02 PM, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org> wrote:

> On Oct 20, 2016, at 12:25 PM, Jonathan Hull via swift-evolution <swift-evolution@swift.org> wrote:

Right. Any proposal that changes parser behavior based on a visible operator declaration will break the ability for Swift to separately compile files. This will have massive tooling ramifications that are almost certainly a non-starter.

-Chris

···

On Oct 20, 2016, at 5:03 PM, Jonathan S. Shapiro via swift-evolution <swift-evolution@swift.org> wrote:

On Thu, Oct 20, 2016 at 12:25 PM, Jonathan Hull via swift-evolution <swift-evolution@swift.org> wrote:
Would it be possible to do the following:

• Have 1 group which are always used as identifiers. This would probably be the identifiers from this proposal.

• Have a 2nd group which are always used as operators (quite a bit larger than the group proposed by this proposal). At a minimum, the following ascii + set operators + math operators:
(± ≠ ≤ ≥ ¿ ¡ ™ ¢ ¶ • ° ƒ © √ ∆ ◊ § ≈ ∫ ÷ ¬ ).

• Everything else is in a 3rd group.

No. This is far, far too complicated, and it mixes up the layers of the compilation process.

Just to play devil’s advocate (since I am honestly OK with the less draconian form of the proposal now), wouldn’t it be possible to declare the relevant Unicode characters in the module’s plist (or similar file)? I know it isn’t the most elegant solution, but it does move the bar on using those characters from impossible to possible, and it would no longer mix up the compilation layers.
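
As a rough sketch of what that could look like, the Swift below reads a module-level list of operator characters from a property list before any source file is looked at. Everything specific here is an assumption made for illustration: the `OperatorCharacters` key, the `ModuleManifest` type, and the `isOperatorCharacter` helper do not correspond to any existing Swift or Xcode setting. Only `PropertyListDecoder` is a real Foundation API.

```swift
import Foundation

// Hypothetical manifest shape; the key name is an assumption.
struct ModuleManifest: Decodable {
    let operatorCharacters: [String]

    enum CodingKeys: String, CodingKey {
        case operatorCharacters = "OperatorCharacters"
    }
}

// Inline stand-in for a module's plist file.
let plist = """
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>OperatorCharacters</key>
    <array>
        <string>∆</string>
        <string>©</string>
    </array>
</dict>
</plist>
"""

// Decode once, up front, before any source file is tokenized.
let manifest = try PropertyListDecoder().decode(ModuleManifest.self,
                                                from: Data(plist.utf8))
let moduleOperators = Set(manifest.operatorCharacters.compactMap { $0.first })

// Every file is classified against the same, already-known set, so no
// file has to be parsed before another one can be lexed.
func isOperatorCharacter(_ ch: Character) -> Bool {
    moduleOperators.contains(ch)
}

print(isOperatorCharacter("∆"), isOperatorCharacter("x"))
```

The design point is that the set is fixed before lexing starts, so the classification of a character cannot depend on declarations found later in this file or in another one.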

Thanks,
Jon

···

On Oct 24, 2016, at 10:10 PM, Chris Lattner <clattner@apple.com> wrote:


No. This is far, far too complicated, and it mixes up the layers of the compilation process.

Right. Any proposal that changes parser behavior based on a visible operator declaration will break the ability for Swift to separately compile files. This will have massive tooling ramifications that are almost certainly a non-starter.

-Chris