[Proposal] Refining Identifier and Operator Symbology

xwu · October 19, 2016, 4:39pm

Personally, a *very* limited cherrypicking might be OK, on the
understanding that it must be done with very stringent inclusion criteria.
However, it's specifically the set algebra operators that I have the
greatest objection to cherrypicking:

* Each of these operators has a specific meaning; it is an anti-goal to
support repurposing the union operator for any other purpose than forming a
union of two sets, for instance.

* Thus, the only rationale for inclusion of these operators is to support
aliasing the set algebra protocol members.

* Now, if it is appropriate for these set algebra operations to be
accessible through these operators, then the standard library should be
providing them.

* However, exactly such an API has been trialled on a Swift branch, and the
set algebra renaming debate of 2015 (or was it early 2016?) on this very
list resulted in a renaming that *rejected* the use of these operators.

* Given that these operators have been deemed not appropriate for the only
methods that should use them, we should *not* specifically enable these
symbols as valid operator characters.

It is of course true that a user can essentially choose to alias any method
using any other identifier. However, this particular scenario is different
because you are advocating for Swift to make express allowance in the
grammar for solely that purpose. In other words, the language design would
be saying to the user, "We don't think this is a good idea at all;
otherwise, it'd be in the standard library. We literally tried that and
decided against it. However, we're specifically going to let you act on
this not-very-good idea by explicitly leaving room for it."

Now, if we had a complete set of operator characters, which would include
set algebra operators, then it wouldn't be problematic in the same way. In
that scenario, the language would be no more responsible for your renaming
formUnion() to the union symbol than to nonsenseAsdf(), since in neither
case would the underlying characters have been included among valid
operator or identifier characters, respectively, for the express purpose of
making possible that renaming.


On Wed, Oct 19, 2016 at 23:48 plx via swift-evolution <swift-evolution@swift.org> wrote:

+💯 on the emoji-related parts, +1 in general spirit, +1 for the
identifier cleanup, -103 for being needlessly overly-restrictive for
operators; net -1 overall.

Operator abuse is a social problem, and even if a technical fix is
possible this isn’t that…and despite the messiness of the relevant unicode
categories, this proposal goes far too far.

For operators, the reasonable thing to do at this time would be to
hand-select a small subset of the mathematical characters to allow as
operators—the “greatest hits” so to speak—and move on. If any grave
oversights are discovered those characters can be included in subsequent
major revisions; if the consortium ever finishes its recommendation it can
be adopted at that time.

There’s no need to exhaustively re-do the consortium’s work and there’s no
need to make a correct-for-all-time decision on each character at this
time; pick the low-hanging fruit and leave the rest for later.

That not everyone will be perfectly happy with any specific subset is
predictable and not interesting; not everyone is going to be perfectly
happy with this proposal’s proposed subset, either.

In any case, I’d specifically hate to lose these:

- approximate equality: ≈
- set operations: ∩, ∪
- set relations: ⊂, ⊃, ⊄, ⊅, ⊆, ⊇, ⊈, ⊉, ⊊, ⊋
- set membership: ∌, ∋, ∈, ∉
- logical operators: ¬, ∧, ∨

…although there are many more that would be nice to keep available.

On Oct 19, 2016, at 1:34 AM, Jacob Bandes-Storch via swift-evolution <swift-evolution@swift.org> wrote:

Dear Swift-Evolution community,

A few of us have been preparing a proposal to refine the definitions of
identifiers & operators. This includes some changes to the permitted
Unicode characters.

The latest (perhaps final?) draft is available here:

https://github.com/jtbandes/swift-evolution/blob/unicode-id-op/proposals/NNNN-refining-identifier-and-operator-symbology.md

We'd welcome your initial thoughts, and will probably submit a PR soon to
the swift-evolution repo for a formal review. Full text follows below.

—Jacob Bandes-Storch, Xiaodi Wu, Erica Sadun, Jonathan Shapiro

Refining Identifier and Operator Symbology

   - Proposal: SE-NNNN
   <https://github.com/jtbandes/swift-evolution/blob/unicode-id-op/proposals/NNNN-refining-identifier-and-operator-symbology.md>
   - Authors: Jacob Bandes-Storch <https://github.com/jtbandes>, Erica
   Sadun <https://github.com/erica>, Xiaodi Wu <https://github.com/xwu>,
   Jonathan Shapiro
   - Review Manager: TBD
   - Status: Awaiting review

Introduction

This proposal seeks to refine and rationalize Swift's identifier and
operator symbology. Specifically, this proposal:

   - adopts the Unicode recommendation for identifier characters, with
   some minor exceptions;
   - restricts the legal operator set to the current ASCII operator
   characters;
   - changes where dots may appear in operators; and
   - disallows Emoji from identifiers and operators.

Prior discussion threads & proposals

   - Proposal: Normalize Unicode identifiers
   <https://github.com/apple/swift-evolution/pull/531>
   - Unicode identifiers & operators
   <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160912/027108.html>,
   with pre-proposal
   <https://gist.github.com/jtbandes/c0b0c072181dcd22c3147802025d0b59> (a
   precursor to this document)
   - Lexical matters: identifiers and operators
   <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160926/027479.html>
   - Proposal: Allow Single Dollar Sign as Valid Identifier
   <https://github.com/apple/swift-evolution/pull/354>
   - Free the '$' Symbol!
   <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151228/005133.html>
   - Request to add middle dot (U+00B7) as operator character?
   <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/003176.html>

Guiding principles

Chris Lattner has written:

…our current operator space (particularly the unicode segments covered) is
not super well considered. It would be great for someone to take a more
systematic pass over them to rationalize things.

We need a token to be unambiguously an operator or identifier - we can
have different rules for the leading and subsequent characters though.

…any proposal that breaks:

let 🐶🐮 = "moof"

will not be tolerated. :-) :-)

Motivation

By supporting custom Unicode operators and identifiers, Swift attempts to
accommodate programmers and programming styles from many languages and
cultures. It deserves a well-thought-out specification of which characters
are valid. However, Swift's current identifier and operator character sets
do not conform to any Unicode standards, nor have they been rationalized in
the language or compiler documentation.

Identifiers, which serve as *names* for various entities, are linguistic
in nature and must permit a variety of characters to properly serve
non–English-speaking coders. This issue has been considered by the
communities of many programming languages already, and the Unicode
Consortium has published recommendations on how to choose identifier
character sets — Swift should make an effort to conform to these
recommendations.

Operators, on the other hand, should be rare and carefully chosen, because
they suffer from low discoverability and difficult readability. They are by
nature *symbols*, not names. This places a cognitive cost on users with
respect to both recall ("What is the operator that applies the behavior I
need?") and recognition ("What does the operator in this code do?"). While *almost
every* nontrivial program defines many new identifiers, most programs do
not define new operators.

As operators become more esoteric or customized, the cognitive cost rises.
Recognizing a function name like formUnion(with:) is simpler for many
programmers than recalling what the ∪ operator does. Swift's current
operator character set includes many characters that aren't traditional and
recognizable operators — this encourages problematic and frivolous uses in
an otherwise safe language.

Today, there are many discrepancies and edge cases motivating these
changes:

   - · is an identifier, while • is an operator.
   - The Greek question mark ; is a valid identifier.
   - Braille patterns ⠟ seem letter-like, but are operator characters.
   - 🂡 are identifiers, while are operators.
   - Some *non-combining* diacritics ´ ¨ ꓻ are valid in identifiers.
   - Some completely non-linguistic characters, such as ۞ and ༒, are
   valid in identifiers.
   - Some symbols such as ⚄ and ♄ are operators, despite not really being
   "operator-like".
   - A small handful of characters 〡〢〣〤〥〦〧〨〩〪〭〮〯〫〬 are valid in both identifiers
   and operators.
   - Some non-printing characters such as 2064 INVISIBLE PLUS and 200B
   ZERO WIDTH SPACE are valid identifiers.
   - Currency symbols are split across operators (¢ £ ¤ ¥) and
   identifiers ($ ₪ € ₱ ₹ ฿ ...).

This matter should be considered in a near timeframe (Swift 3.1 or 4) as
it is both fundamental to Swift and will produce source-breaking changes.

Precedent in other languages

Haskell distinguishes identifiers/operators by their general category
<http://www.fileformat.info/info/unicode/category/index.htm> such as "any
Unicode lowercase letter", "any Unicode symbol or punctuation", and so
forth. Identifiers can start with any lowercase letter or _, and may
contain any letter/digit/'/_. This includes letters like δ and Я, and
digits like ٢.

   - Haskell Syntax Reference
   <https://www.haskell.org/onlinereport/syntax-iso.html>
   - Haskell Lexer
   <https://github.com/ghc/ghc/blob/714bebff44076061d0a719c4eda2cfd213b7ac3d/compiler/parser/Lexer.x#L1949-L1973>

Scala similarly allows letters, numbers, $, and _ in identifiers,
distinguishing by general categories Ll, Lu, Lt, Lo, and Nl. Operator
characters include mathematical and other symbols (Sm and So) in addition
to other ASCII symbol characters.

   - Scala Lexical Syntax
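The general categories that Haskell and Scala key off can be inspected directly. Below is a minimal sketch using Python's standard unicodedata module; the sample characters are taken from this proposal's own examples, and the variable names are just for illustration.

```python
import unicodedata

# Characters mentioned in the proposal text: letters, a digit,
# a math operator, two "non-operator-like" symbols, and two ASCII extras.
samples = ["δ", "Я", "٢", "∪", "⚄", "♄", "$", "_"]

# Map each character to its Unicode general category (Ll, Lu, Nd, Sm, So, ...).
categories = {ch: unicodedata.category(ch) for ch in samples}

for ch, cat in categories.items():
    print(f"U+{ord(ch):04X} {cat} {unicodedata.name(ch)}")
```

Note that ∪ lands in Sm while ⚄ and ♄ land in So, which is why a category-based operator definition sweeps in both kinds of symbol.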

ECMAScript 2015 ("ES6") uses ID_Start and ID_Continue, as well as
Other_ID_Start / Other_ID_Continue, for identifiers.

   - ECMAScript Specification: Names and Keywords

Python 3 uses XID_Start and XID_Continue.

   - The Python Language Reference: Identifiers and Keywords
   - PEP 3131: Supporting Non-ASCII Identifiers
   <https://www.python.org/dev/peps/pep-3131/>
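As a rough illustration of what an XID_Start / XID_Continue rule accepts, Python's built-in str.isidentifier() can be probed with a few strings. This shows Python's behaviour only; it is not the ID_Start / ID_Continue profile proposed for Swift, and the candidate strings are chosen here purely as examples.

```python
# Probe Python's identifier rule (XID_Start followed by XID_Continue).
candidates = ["δ", "Я", "a٢", "٢a", "_x", "🐶", "x∪y"]

results = {s: s.isidentifier() for s in candidates}

for s, ok in results.items():
    print(f"{s!r}: {'identifier' if ok else 'not an identifier'}")
```

Letters from any script are accepted, a digit such as ٢ may continue but not start an identifier, and symbols such as 🐶 or ∪ are rejected.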

Proposed solution

For identifiers, adopt the recommendations made in UAX #31 Identifier and
Pattern Syntax <http://unicode.org/reports/tr31/>, deriving the sets of
valid characters from ID_Start and ID_Continue. Normalize identifiers
using Normalization Form C (NFC).
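To make the NFC point concrete, here is a small sketch (Python's unicodedata, not compiler code) of two spellings that differ code point by code point but compare equal once normalized. The helper name same_identifier is hypothetical.

```python
import unicodedata

composed = "caf\u00e9"      # é as a single precomposed code point
decomposed = "cafe\u0301"   # e followed by COMBINING ACUTE ACCENT

def same_identifier(a: str, b: str) -> bool:
    """Compare two spellings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(composed == decomposed)                 # raw comparison of code points
print(same_identifier(composed, decomposed))  # comparison after NFC
```

Without normalization, the two spellings would name different entities even though they render identically.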

(For operators, no such recommendation currently exists, although active
work is in progress to update UAX #31 to address "operator identifiers".)

Restrict operators to those ASCII characters which are currently
operators. All other operator characters are removed from the language.

Allow dots in operators in any location, but only in runs of two or more.

(Overall, this proposal is aggressive in its removal of problematic
characters. We are not attempting to prevent the addition or re-addition of
characters in the future, but by paring the set down now, we require any
future changes to pass the high bar of the Swift Evolution process.)

Detailed design

Identifiers

Swift identifier characters will conform to UAX #31
<http://unicode.org/reports/tr31/> as follows:

   - UAX31-C1. The conformance described herein refers to the Unicode 9.0.0
   version of UAX #31 (dated 2016-05-31 and retrieved 2016-10-09).
   - UAX31-C2. Swift shall observe the following requirements:
      - UAX31-R1. Swift shall augment the definition of "Default
      Identifiers" with the following profiles:
         1. ID_Start and ID_Continue shall be used for Start and Continue
         (replacing XID_Start and XID_Continue). This excludes characters
         in Other_ID_Start and Other_ID_Continue.
         2. _ 005F LOW LINE shall additionally be allowed as a Start
         character.
         3. The emoji characters 1F436 DOG FACE and 1F42E COW FACE
         shall be allowed as Start and Continue characters.
         4. (UAX31-R1a.) The join-control characters ZWJ and ZWNJ are
         strictly limited to the special cases A1, A2, and B described in
         UAX #31. (This requirement is covered in the Normalize Unicode
         Identifiers proposal
         <https://github.com/apple/swift-evolution/pull/531>.)
      - UAX31-R4. Swift shall consider two identifiers equivalent when they
      have the same normalized form under NFC
      <http://unicode.org/reports/tr15/>. (This requirement is covered in
      the Normalize Unicode Identifiers proposal
      <https://github.com/apple/swift-evolution/pull/531>.)

These changes result
in the removal of some 5,500 valid code points from the identifier
characters, as well as hundreds of thousands of unassigned code points.
(Though it does not appear on this unicode.org utility, which currently
supports only Unicode 8 data, the · 00B7 MIDDLE DOT is no longer an
identifier character.) Adopting ID_Start and ID_Continue does not add any
new identifier characters.

Grammar changes

identifier-head → [:ID_Start:]
identifier-head → _
identifier-character → identifier-head
identifier-character → [:ID_Continue:]

Operators

Swift operator characters will be limited to only the following ASCII
characters:

! % & * + - . / < = > ? ^ | ~

The current restrictions on reserved tokens and operators will remain: =,
->, //, /*, */, ., ?, prefix <, prefix &, postfix >, and postfix ! are
reserved.

Dots in operators

The current requirements for dots in operator names are:

If an operator doesn’t begin with a dot, it can’t contain a dot elsewhere.

This proposal changes the rule to:

Dots may only appear in operators in runs of two or more.

Under the revised rule, ..< and ... are allowed, but <.< is not. We also reserve
the .. operator, permitting the compiler to use .. for a "method cascade"
syntax in the future, as supported by Dart
<http://news.dartlang.org/2012/02/method-cascades-in-dart-posted-by-gilad.html>.
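The revised rule as stated in English can be sketched as a small checker. This is an illustration in Python, not the compiler's lexer: the name dots_rule_ok is hypothetical, and it deliberately ignores the separately reserved tokens (such as = and the lone .. operator).

```python
import re

# The ASCII operator characters listed in this proposal.
OPERATOR_CHARS = set("!%&*+-./<=>?^|~")

def dots_rule_ok(op: str) -> bool:
    """True if op uses only operator characters and every '.' in it
    belongs to a run of two or more consecutive dots."""
    if not op or any(ch not in OPERATOR_CHARS for ch in op):
        return False
    # Each maximal run of dots must have length >= 2.
    return all(len(run) >= 2 for run in re.findall(r"\.+", op))

for op in ["..<", "...", "<.<", "<..", "<..<", ".+"]:
    print(op, dots_rule_ok(op))
```

Under this reading, ..<, ..., <.. and <..< pass, while <.< and .+ fail because each contains a lone dot.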

Motivations for incorporating the two-dot rule are:

   - It helps avoid future lexical complications arising from lone .s.
   - It's a conservative approach, erring towards overly restrictive.
   Dropping the rule in future (thereby allowing single dots) may be possible.
   - It doesn't require special cases for existing infix dot operators in
   the standard library, ... (closed range) and ..< (half-open range). It
   also leaves the door open for the standard library to add analogous
   half-open and fully-open range operators <.. and <..<.
   - If we fail to adopt this rule now, then future backward-compatibility
   requirements will preclude the introduction of some potentially useful
   language enhancements.

Grammar changes

operator → operator-head operator-characters[opt]

operator-head → ! % & * + - / < = > ? ^ | ~
operator-head → operator-dot operator-dots
operator-character → operator-head
operator-characters → operator-character operator-character[opt]

operator-dot → .
operator-dots → operator-dot operator-dots[opt]

Emoji

If adopted, this proposal eliminates emoji from Swift identifiers and
operators. Despite their novelty and utility, emoji characters introduce
significant challenges to the language:

   - Their categorization into identifiers and operators is not
   semantically motivated, and is fraught with discrepancies.
   - Emoji characters are not displayed consistently and uniformly across
   different systems and fonts. Including all Unicode emoji introduces
   characters that don't render as emoji on Apple platforms without a variant
   selector, but which also wouldn't normally be used as identifier characters
   (e.g. ).
   - Some emoji nearly overlap with existing operator syntax:
   - Full emoji support necessitates handling a variety of use cases for
   joining characters and variant selectors, which would not otherwise be
   useful in most cases. It would be hard to avoid permitting sequences of
   characters which aren't valid emoji, or being overly restrictive and not
   properly supporting emoji introduced in future versions of Unicode.

As an exception, in homage to Swift's origins, we permit 🐶 and 🐮 in
identifiers.

Source compatibility

This change is source-breaking in cases where developers have incorporated
emoji or custom non-ASCII operators, or identifiers with characters which
have been disallowed. This is unlikely to be a significant breakage for the
majority of serious Swift code.

Code using the middle dot · in identifiers may be slightly more common. · is
now disallowed entirely.

Diagnostics for invalid characters are already produced today. We can
improve them easily if needed.

Maintaining source compatibility for Swift 3 should be easy: just keep the
old parsing & identifier lookup code.

Effect on ABI stability

This proposal does not affect the ABI format itself, although the Normalize
Unicode Identifiers proposal
<https://github.com/apple/swift-evolution/pull/531> affects the ABI of
compiled modules.

The standard library will not be affected; it uses ASCII symbols with no
combining characters.

Effect on API resilience

This proposal doesn't affect API resilience.

Alternatives considered

   - Define operator characters using Unicode categories such as Sm and So
   <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[%3ASm%3A][%3ASo%3A]]>.
   This approach would include many "non-operator-like" characters and doesn't
   seem to provide a significant benefit aside from a simpler definition.
   - Hand-pick a set of "operator-like" characters to include. The proposal
   authors tried this painstaking approach, and came up with a relatively
   agreeable set of about 650 code points
   <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[!\%24%25\%26*%2B\-%2F<%3D>%3F\^|~ \u00AC \u00B1 \u00B7 \u00D7 \u00F7 \u2208-\u220D \u220F-\u2211 \u22C0-\u22C3 \u2212-\u221D \u2238 \u223A \u2240 \u228C-\u228E \u2293-\u22A3 \u22BA-\u22BD \u22C4-\u22C7 \u22C9-\u22CC \u22D2-\u22D3 \u2223-\u222A \u2236-\u2237 \u2239 \u223B-\u223E \u2241-\u228B \u228F-\u2292 \u22A6-\u22B9 \u22C8 \u22CD \u22D0-\u22D1 \u22D4-\u22FF \u22CE-\u22CF \u2A00-\u2AFF \u27C2 \u27C3 \u27C4 \u27C7 \u27C8 \u27C9 \u27CA \u27CE-\u27D7 \u27DA-\u27DF \u27E0-\u27E5 \u29B5-\u29C3 \u29C4-\u29C9 \u29CA-\u29D0 \u29D1-\u29D7 \u29DF \u29E1 \u29E2 \u29E3-\u29E6 \u29FA \u29FB \u2308-\u230B \u2336-\u237A \u2395]> (although
   this set would require further refinement), but ultimately felt the
   motivation for including non-ASCII operators is much lower than for
   identifiers, and the harm to readers/writers of programs outweighs their
   potential utility.
   - Use Normalization Form KC (NFKC) instead of NFC. The decision to use
   NFC comes from the Normalize Unicode Identifiers proposal
   <https://github.com/apple/swift-evolution/pull/531>. Also, UAX #31
   states:

   Generally if the programming language has case-sensitive identifiers,
   then Normalization Form C is appropriate; whereas, if the programming
   language has case-insensitive identifiers, then Normalization Form KC is
   more appropriate.

   NFKC may also produce surprising results; for example, "ſ" and "s" are
   equivalent under NFKC.
   - Continue to allow single .s in operators, and perhaps even expand the
   original rule to allow them anywhere (even if the operator does not begin
   with .).

   This would allow a wider variety of custom operators (for some
   interesting possibilities, see the operators in Haskell's Lens
   <https://github.com/ekmett/lens/wiki/Operators> package). However,
   there are a handful of potential complications:
      - Combining prefix or postfix operators with member access: foo*.bar would
      need to be parsed as foo *. bar rather than (foo*).bar. Parentheses
      could be required to disambiguate.
      - Combining infix operators with contextual members: foo*.bar would
      need to be parsed as foo *. bar rather than foo * (.bar).
      Whitespace or parentheses could be required to disambiguate.
      - Hypothetically, if operators were accessible as members such as
      MyNumber.+, allowing operators with single .s would require
      escaping operator names (perhaps with backticks, such as
      MyNumber.`+`).

   This would also require operators of the form [!?]*\. (for example . ?.
    !. !!.) to be reserved, to prevent users from defining custom
   operators that conflict with member access and optional chaining.

   We believe that requiring dots to appear in groups of at least two,
   while in some ways more restrictive, will prevent a significant amount of
   future pain, and does not require special-case considerations such as the
   above.
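The NFKC alternative discussed above (the "ſ" and "s" example) can be checked with Python's unicodedata. This is only a sketch of the normalization behaviour itself, with illustrative variable names; it says nothing about how a Swift compiler would apply it.

```python
import unicodedata

long_s = "\u017f"  # ſ LATIN SMALL LETTER LONG S

nfc = unicodedata.normalize("NFC", long_s)    # canonical composition only
nfkc = unicodedata.normalize("NFKC", long_s)  # also folds compatibility forms

print(f"NFC : U+{ord(nfc):04X}")
print(f"NFKC: U+{ord(nfkc):04X}")
```

NFC leaves the long s alone, whereas NFKC folds it into a plain "s", so two visibly different identifiers would collide under NFKC.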

Future directions

While not within the scope of this proposal, the following considerations
may provide useful context for the proposed changes. We encourage the
community to pick up these topics when the time is right.

   - Re-expand operators to allow some non-ASCII characters. There is work
   in progress to update UAX #31 with definitions for "operator identifiers";
   when this work is completed, it would be worth considering for Swift.
   - Introduce a syntax for method cascades. The Dart language supports method
   cascades
   <http://news.dartlang.org/2012/02/method-cascades-in-dart-posted-by-gilad.html>,
   whereby multiple methods can be called on an object within one expression:
   foo..bar()..baz() effectively performs foo.bar(); foo.baz(). This
   syntax can also be used with assignments and subscripts. Such a feature
   might be very useful in Swift; this proposal reserves the .. operator
   so that it may be added in the future.
   - Introduce "mixfix" operator declarations. Mixfix operators are based
   on pattern matching, and would allow more than two operands. For example,
   the ternary operator ? : can be defined as a mixfix operator with
   three "holes": _ ? _ : _. Subscripts might be subsumed by mixfix
   declarations such as _ [ _ ]. Some holes could be made @autoclosure,
   and there might even be holes whose argument is represented as an AST,
   rather than a value or thunk, supporting advanced metaprogramming (for
   instance, F#'s code quotations
   <https://docs.microsoft.com/en-us/dotnet/articles/fsharp/language-reference/code-quotations>).
   - Diminish or remove the lexical distinction between operators and
   identifiers. If precedence and fixity applied to traditional
   identifiers as well as operators, it would be possible to incorporate ASCII
   equivalents for standard operators (e.g. and for &&, to allow A and B).
   If additionally combined with mixfix operator support, this might enable
   powerful DSLs (for instance, C#'s LINQ
   <https://en.wikipedia.org/wiki/Language_Integrated_Query>).

_______________________________________________

swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


jtbandes · October 20, 2016, 4:44am

There is a typo in that operator-character[opt] should be
operator-characters[opt]. Aside from that, though, I believe the grammar as
written accepts +..+ already. Take a look at the following series of
substitutions based on the grammar rules:

operator
operator-head operator-characters
+ operator-characters
+ operator-character operator-characters[opt]
+ operator-head operator-head
+ operator-dot operator-dots operator-head
+ . . +
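The substitution chain above can be mirrored by a tiny recognizer that follows the productions directly, with the operator-characters[opt] correction applied. This is a sketch in Python under that assumption; parse_head and is_operator are hypothetical names, and reserved tokens are not modelled.

```python
# operator-head → one of these symbols, or a dot followed by one or more dots.
SYMBOLS = set("!%&*+-/<=>?^|~")

def parse_head(s: str, i: int):
    """Consume one operator-head starting at index i.
    Returns the index just past it, or None if no head matches."""
    if i < len(s) and s[i] in SYMBOLS:
        return i + 1
    # operator-dot operator-dots: greedily take the whole run of dots,
    # which must contain at least two.
    j = i
    while j < len(s) and s[j] == ".":
        j += 1
    return j if j - i >= 2 else None

def is_operator(s: str) -> bool:
    """operator → operator-head operator-characters[opt], where each
    operator-character is itself an operator-head."""
    i = parse_head(s, 0)
    while i is not None and i < len(s):
        i = parse_head(s, i)
    return i is not None

for op in ["+..+", "..<", "...", "<.<", "."]:
    print(op, is_operator(op))
```

With the correction in place, +..+ is accepted and <.< is rejected, which matches the English "runs of two or more" rule.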


On Wed, Oct 19, 2016 at 10:12 AM, Alex Martini <amartini@apple.com> wrote:

Grammar changes

operator → operator-head operator-characters[opt]

operator-head → ! % & * + - / < = > ? ^ | ~
operator-head → operator-dot operator-dots
operator-character → operator-head
operator-characters → operator-character operator-character[opt]

operator-dot → .
operator-dots → operator-dot operator-dots[opt]


I think there's a mismatch between the English and grammar. For example,
is +..+ allowed or not?

The English rule does allow +..+ because its dots appear in a run of two.

The grammar allows a run of one or more dots as an operator head, but
never allows dots as characters appearing in the middle of an operator,
regardless of how many dots appear next to each other. The grammar
wouldn't allow +..+ because the dots don't come at the beginning.

Here's an alternate version of the grammar that matches the "two or more"
rule. Because we no longer distinguish between which characters are
allowed as the first character of an operator vs a character inside,
there's no longer a need for a separate operator-head.

operator --> operator-character operator-OPT

operator-character --> ! % & * + - / < = > ? ^ | ~
operator-character --> operator-dots

operator-dots --> .. operator-additional-dots-OPT
operator-additional-dots --> . operator-additional-dots-OPT

Colin_Barrett · October 22, 2016, 11:02pm

I’m a -1 on the original proposal. I can see the logic in doing things that way, but it’s really unclear to me why we need to act *now*. In fact it seems like waiting might be a better option, given the things mentioned upthread about revisions to the Unicode standard.

Also, I think the message quoted below is a promising direction worth exploring. How would something like this work in the front-end? Swift’s grammar currently distinguishes between operators and identifiers, right?

-Colin


On Oct 19, 2016, at 12:17 PM, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:

I think this is a promising direction. Getting us in line with Unicode recommendations is an important first step, and being conservative about the treatment of operator characters and emoji is a good engineering approach, though certainly unfortunate in the short term for users who've adopted custom operators or found interesting uses for emoji identifiers in Swift 3 and earlier.

In the discussion about operators, I wonder whether it makes sense to formally separate "identifier" and "operator" characters at all. My hunch is that there isn't going to be any perfect categorization; there are so many symbols and scripts out there that it's going to be difficult to definitively characterize many symbols as "obviously" an operator or identifier. Not every developer has the mathematical background to even recognize common math operators beyond the elementary arithmetic ones. Something to consider would be to change the way operators work in the language so that they can use *any* symbols (subject to canonicalization, visibility, and confusability constraints), but require their use to always be explicitly declared in a source file that uses an operator outside of the standard library. For example, you would have to say something like:

import Sets
import operator Sets.∪

to make the '∪' symbol available as an operator in the import declaration's scope. This would provide more obvious evidence in the source code of what tokens are being employed as operators, and lessen the need to have formally distinct identifier and operator character sets.

-Joe

Joe_Groff · October 24, 2016, 4:43pm

Dear Swift-Evolution community,

A few of us have been preparing a proposal to refine the definitions of identifiers & operators. This includes some changes to the permitted Unicode characters.

The latest (perhaps final?) draft is available here:

https://github.com/jtbandes/swift-evolution/blob/unicode-id-op/proposals/NNNN-refining-identifier-and-operator-symbology.md

We'd welcome your initial thoughts, and will probably submit a PR soon to the swift-evolution repo for a formal review. Full text follows below.

I haven’t had a chance to read the entire proposal, nor the tons of great discussion down thread, but here are a few thoughts, just MHO:

- I’m loving that you’re taking a detail-oriented approach to the problem. I agree with you that our current approach is unprincipled, and we need to get this right for Swift 4.
- I think that it is perfectly fine to err on the side of conservatism: if it isn’t clear how to classify something (e.g. Braille patterns), we should just reject them in both operators and identifiers (make them be unassigned). If these unclear cases are important to someone, then we can consider (as a separate additive proposal) adding them back later.
- As to conservatism, explicitly reserving “..” (for possible future language directions) seems reasonable to me. Are there any other similar things we should consider reserving?

- I applaud the creativity keeping a valid identifier :-), but it is really missing the point. *All* of the non-symbol-like emoji’s should be valid in identifiers. With a quick unscientific look at Apple’s character picker, all the emojis other than a few in “Symbols” seem like they should be identifiers. It would be fine to conservatively leave all emoji “symbols” as unassigned.

The problem with this is that "emoji" is not a well-defined category by Unicode. Whether a character is rendered as emoji or a traditional symbol in a given font on a given platform can depend on variation selectors, and the exact variation selectors (or lack thereof) that choose emoji or traditional representation are non-portable, even among different text rendering APIs on the same platform (e.g. ATSUI vs TextKit vs CoreText vs WebKit on Darwin).

-Joe

···

On Oct 23, 2016, at 9:41 PM, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 18, 2016, at 11:34 PM, Jacob Bandes-Storch via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

- I really think we should keep symbols as operators, including much of the math symbols (e.g. ∪). In a later separate proposal, we can consider whether it makes sense for emoji symbols (like to be usable as operators), I can see arguments both ways.

-Chris

Benjamin_Spratling · October 19, 2016, 1:04pm

How did you decide on those operators instead of ASCII ones?

They are the “correct” mathematical symbols. On my machine, there is a block of characters explicitly marked as “Math Symbols”. It’s a pretty good start (until you get to the section where Latin characters are just drawn in different fonts, at which point I make no assertions). ASCII is not the future. We have Unicode for a reason, so we can type what we mean.

Obviously, we would want to enable as many operators as possible to continue functioning.

I don’t find this statement to be consistent with the others. From my perspective, it looks like the proposal deleted as many operators as possible.

• Operators suffer from low discoverability and difficult readability. They use symbols, not names. This places a cognitive cost on users with respect to both recall ("What is the operator that applies the behavior I need?") and recognition ("What does the operator in this code do?").

Requiring a developer to read “formUnion(with:)” is a large cognitive burden for someone trying to use Swift to solve problems in their field. Not all developers are only UI app makers or coding philosophers. Many are engineers and scientists trying to find a tool to get their job done. For these fields, using the correct mathematical operators significantly reduces the difficulty of reading code, which in turn reduces the difficulty of finding bugs. Swift offers a major performance improvement over the tools they may currently be using to get their jobs done, e.g. MATLAB and Mathematica. While those tools offer other features Swift never will, there is always a push in research fields for better performance and independence from a desktop meta-tool, so battle-tested algorithms are frequently moved out into stand-alone libraries. Yes, at times, that has been my job. Occasionally, someone working on advanced math libraries chimes in on the forum. I know there are folks working on vector libraries out there, and I’m sure their code would be more legible using the correct vector-oriented operators. With ASCII-only operators, there is no distinction between scalar products and vector products, for example, something that can be represented easily with extended Unicode operators.
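As a rough illustration of the scalar-versus-vector point, a vector library could spell the two products with distinct symbols. This is only a sketch: the `Vector3` type, the choice of `⋅` (U+22C5) and `×` (U+00D7), and the precedence group are all assumptions, not an existing API.

```swift
// Hypothetical 3-vector type; the names and operator choices are illustrative.
struct Vector3: Equatable {
    var x, y, z: Double
}

infix operator ⋅: MultiplicationPrecedence   // U+22C5 DOT OPERATOR
infix operator ×: MultiplicationPrecedence   // U+00D7 MULTIPLICATION SIGN

// Scalar (dot) product.
func ⋅ (a: Vector3, b: Vector3) -> Double {
    return a.x * b.x + a.y * b.y + a.z * b.z
}

// Vector (cross) product.
func × (a: Vector3, b: Vector3) -> Vector3 {
    return Vector3(x: a.y * b.z - a.z * b.y,
                   y: a.z * b.x - a.x * b.z,
                   z: a.x * b.y - a.y * b.x)
}

let i = Vector3(x: 1, y: 0, z: 0)
let j = Vector3(x: 0, y: 1, z: 0)
```

With ASCII-only operators both products would have to share `*` or fall back to named methods.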

If someone really wants to make the operator sets so restricted, fine, make that a policy at your company. Please don’t delete it from the language. In the course on Swift 3 that I’m teaching, I even recommend users don’t create operators they can’t easily type, but keyboards are going to get better. Heck, I’m paying as much for an Apple keyboard these days as I paid for a cell phone back in 2001.

Look, go easy on me. I’m still reeling from discovering that weak references cause memory to be persisted until the app itself acts as a garbage collector. I was unable to bring myself to write code for 2 days following that. Please don’t take away my icing, too. I’m not sure I can keep writing code if you do. Removing standard math operators would make my code look like I’m constantly writing 1980s workarounds instead of clean code.

I have a hard time accepting that essentially reverting the character set for Swift to ASCII is really a good move forward. I think the proposal reflects some good work on codifying identifiers, but I think the removal of emoji and almost every operator means there’s more work to be done before this proposal is ready for acceptance.

-Ben

···

On Oct 19, 2016, at 7:18 AM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

Jonathan_S_Shapiro · October 19, 2016, 5:26pm

All of these seem reasonable to me. Can we please confirm that all of these
lie within

[:S:] - [:Sc:]

Thanks!

Rationale will become clear in just a moment.

Jonathan

···

On Wed, Oct 19, 2016 at 10:21 AM, Rob Mayoff via swift-evolution <swift-evolution@swift.org> wrote:

On Wed, Oct 19, 2016 at 10:47 AM, plx via swift-evolution <swift-evolution@swift.org> wrote:

In any case, I’d specifically hate to lose these:

- approximate equality: ≈
- set operations: ∩, ∪
- set relations: ⊂, ⊃, ⊄, ⊅, ⊆, ⊇, ⊈, ⊉, ⊊, ⊋
- set membership: ∌, ∋, ∈, ∉
- logical operators: ¬, ∧, ∨

I'd add ≤ ≥ ≠ to that set.

anandabits · October 19, 2016, 6:20pm

I think that depends on who you ask. I think I understand the argument for taking that approach. I just don’t necessarily agree with it. I haven’t seen a compelling enough argument that this is actually causing a problem *in practice* or in some way preventing the language from moving forward.

If we can find a way to include a sizable subset of mathematical operators we believe will be included that goes beyond those suggested by plx, I would support that. I just think going all the way back to basic ASCII operators is much too far and believe we should be able to find a better “temporary” solution while waiting on the Unicode Consortium.

···

On Oct 19, 2016, at 12:27 PM, Erica Sadun <erica@ericasadun.com> wrote:

On Oct 19, 2016, at 7:41 AM, Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:

I very much support the proposal to rationalize our handling of identifier characters.

I also support doing something similar for operator symbols. However, I agree with feedback from others that this proposal goes way too far in removing our ability to use mathematical operators.

If I’m reading the proposal and discussion properly, the group has not been able to reach consensus on the right criteria for operator symbols, but is hopeful that will be possible after the Unicode Consortium completes its work. I think it would be far better to defer the changes to valid operator symbols until that time (removing only symbols which are currently treated as operators but which the proposal suggests should be available for identifiers instead).

It's more practical to make breaking changes now and introduce the "right set" (that is, a standards-based set of mathematical operators) at a future date, than to justify keeping things as is and removing operators at a future date.

anandabits · October 19, 2016, 6:22pm

If I’m reading the proposal and discussion properly, the group has not been able to reach consensus on the right criteria for operator symbols, but is hopeful that will be possible after the Unicode Consortium completes its work. I think it would be far better to defer the changes to valid operator symbols until that time (removing only symbols which are currently treated as operators but which the proposal suggests should be available for identifiers instead).

Beginning with Swift 4, there will be a major push to ensure that backwards compatibility with existing code is not broken. It will be possible to expand the operator character set, but very difficult to shrink it.

Given the current state of the discussion over in Unicode land, I think it would probably be safe from a compatibility standpoint to admit code points that fall into the following (Unicode-style) code point set:

Am I reading this correctly that you are suggesting we expand the proposal to include this set of operator characters? If this is what you are suggesting I would drop my opposition to the proposal as it would no longer take away a bunch of very common mathematical operators. I believe defining the included set this way would also address Xiaodi’s concerns about including the set operators.

···

On Oct 19, 2016, at 12:29 PM, Jonathan S. Shapiro <jonathan.s.shapiro@gmail.com> wrote:
On Wed, Oct 19, 2016 at 6:41 AM, Matthew Johnson via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

[:S:] - [:Sc:] - [:xidcontinue:] - [:nfcqc=n:] & [:scx=Common:] - pictographics - emoji

into operator characters. In English, this would be:

All symbols excluding currency symbols, provided they are not already in regular identifiers, requiring that they are legal under NFC normalization and also that they live in the Common script.

Explicitly exclude pictographics and emojis, not as a value judgment of UAX31, but because different languages seem to be choosing to go different ways about whether these are part of normal identifiers or operator identifiers.

Similar rationale for currency symbols, though I personally believe those should be operators rather than regular identifiers.
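To make the set expression concrete, here is a rough sketch of the same filter as a Swift predicate. It assumes Swift 5's `Unicode.Scalar.Properties`, which did not exist at the time of this thread, and it covers only part of the expression: the NFC quick check, the `scx=Common` test, and the "pictographics" exclusion are omitted because the standard library does not expose them. The function name is hypothetical.

```swift
// Rough approximation of: [:S:] - [:Sc:] - [:xidcontinue:] - emoji
// Assumptions: requires Swift 5 (Unicode.Scalar.Properties); the NFC quick
// check, the scx=Common test, and the "pictographics" exclusion are omitted
// because the standard library does not expose them.
func isCandidateOperatorScalar(_ scalar: Unicode.Scalar) -> Bool {
    let props = scalar.properties
    switch props.generalCategory {
    case .mathSymbol, .modifierSymbol, .otherSymbol:
        break                       // Sm, Sk, So: symbols other than currency
    default:
        return false                // includes .currencySymbol (Sc)
    }
    return !props.isXIDContinue && !props.isEmoji
}
```

A real implementation would more likely be generated from the Unicode data files than computed at run time.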

It's possible that other things will go in to UAX31, but it's very hard to imagine that anything in the set above will end up getting excluded. In particular, there is some inclination to add some punctuation symbols in UAX31, but that's going to take some work to ensure that we don't make a mess inadvertently.

As a transitional matter, I think it would be conservatively safe to add the code points identified above. Note that it's important to exclude ASCII code points that are currently "punctuation reserved words". In Swift this (at least) includes:

. (period, when it does not appear [at least] two times in sequence)
; (Semicolon)
: (Full colon)
$ (Dollar sign - used in special identifiers, which I consider a flaw)
any and all brackets (for now).

IMO, the best argument against using unicode symbols for operators defined by mathematics is that they are currently difficult to type.

And there is no realistic hope of that changing. This issue is so compelling that C and C++ introduced standardized text-ascii alternatives for the punctuation operators to relieve stress on non-english keyboard users.

This is an argument with a limited lifespan and should not carry more weight than it deserves in the design of a language positioned to be the language for the next 20 years. I strongly believe that removing them, even temporarily, is a mistake.

I think it's good to be a little conservative given the fact that the issue is more broadly "in flight". That said, I personally believe that the current proposal has cut back too far.

Jonathan

Brent_Royal-Gordon · October 19, 2016, 8:46pm

I was in the middle of writing about my opposition to the original proposal when I went to bed last night, and was going to advocate something like this:

Given the current state of the discussion over in Unicode land, I think it would probably be safe from a compatibility standpoint to admit code points that fall into the following (Unicode-style) code point set:

[:S:] - [:Sc:] - [:xidcontinue:] - [:nfcqc=n:] & [:scx=Common:] - pictographics - emoji

I suspect we can probably also do something about emoji, since I doubt UAX #31 is going to. Given that they are all static pictures of people or things, I think we can decide they are all nouns and thus all identifier characters. If we think there are some which might be declared operators later, we can exclude them for now, but I'd like to at least see the bulk of them brought in.

I think addressing emoji is important not for any technical reason, but for nontechnical ones. Emoji are a statement about Swift's modern approach; modernity is important. They are fun and whimsical; whimsy is important.

And most importantly, emoji identifiers are part of Swift's culture. It's widely understood that you don't use them in real code, but they are very common in examples. Just as we worry about source compatibility and binary compatibility, so we should worry about culture compatibility. Removing emoji would cause a gratuitous cultural regression.

···

--
Brent Royal-Gordon
Architechies

xwu · October 19, 2016, 1:54pm

Sorry, I've not been very clear on my question. What non-ASCII operators
are in use in your production code? What ASCII equivalents did you consider
and discard for those particular operators?

It's important to remember that math *symbols* are not all operators. We
cannot merely transpose Unicode characters labeled "mathematical" into our
set of operators because many (such as the null set symbol) do not operate
on anything and instead represent mathematical objects, which therefore
suggests that if anything those should be valid identifier characters and
not operator characters. This task becomes impossible once one considers
that, in mathematics, symbols such as nabla are both used as operators
_and_ may stand alone.

The bottom line is that this laborious work of classification is what the
Unicode Consortium is for. I am convinced that we, the Swift community, are
not capable of undertaking this task with any semblance of coherence. Jacob
and I spent two or three emails going back and forth about the inclusion of
"tiny" and "miny," two symbols I'll bet you've never contemplated using in
your code. We also discussed whether pentagons and hexagons were
appropriate (and yes, these are classed as mathematical symbols; on the one
hand, these are plausibly operator-looking characters, but on the other
hand, angles are not considered operators, and shapes can no more take an
operand than can angles). Extending this work throughout the disparate
ranges of mathematical symbols in Unicode 9 is untenable.

Note that while Unicode does not _yet_ have recommendations on operator
characters, and while we cannot wait until it does in order to move
forward, it is not out of the question that a future version of Swift could
incorporate that work even if we remove non-ASCII characters today.
However, if we move ahead with an ad-hoc selection of Unicode characters,
we may never be able to converge with a future Unicode recommendation
without breaking backwards source compatibility.

Thus, I think it's important to be specific about what code you can no
longer write if non-ASCII operators are removed. It is not true that we
have removed "almost every operator." Although we are proposing the removal
of the vast majority of currently valid operator characters, based on our
survey of code bases available to us, we believe that we are removing _zero
or very nearly zero operators_ in usage today. Should you have a use case
we haven't contemplated, I'd be very very keen to hear about it.

Paul_Cantrell · October 19, 2016, 7:49pm

I have production code that uses ± and ≈.

As with all operator overloads, there is room for abuse — but I’ve been using these operators to mean what it looks like they should mean, and am very pleased with the readability benefits.
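For readers who have not seen such an operator, a definition of `≈` for floating-point values might look roughly like the sketch below. This is not the production code mentioned above; the tolerance and the precedence group are arbitrary illustrative choices.

```swift
// Hypothetical definition; the tolerance is an arbitrary illustrative choice.
infix operator ≈: ComparisonPrecedence

func ≈ (lhs: Double, rhs: Double) -> Bool {
    // Scale the tolerance with the magnitude of the operands.
    let tolerance = 1e-9 * max(1.0, abs(lhs), abs(rhs))
    return abs(lhs - rhs) <= tolerance
}
```

With this in scope, `0.1 + 0.2 ≈ 0.3` reads the way it would on paper.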

At the very least, Swift ought to support operators using symbols from the Unicode blocks called “Mathematical Operators” and “Supplemental Mathematical Operators.”

It’s right there in the name!™

Cheers, P

···

On Oct 19, 2016, at 12:21 PM, Rob Mayoff via swift-evolution <swift-evolution@swift.org> wrote:

On Wed, Oct 19, 2016 at 10:47 AM, plx via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
In any case, I’d specifically hate to lose these:

- approximate equality: ≈
- set operations: ∩, ∪
- set relations: ⊂, ⊃, ⊄, ⊅, ⊆, ⊇, ⊈, ⊉, ⊊, ⊋
- set membership: ∌, ∋, ∈, ∉
- logical operators: ¬, ∧, ∨

I'd add ≤ ≥ ≠ to that set.

David_Sweeris · October 19, 2016, 8:37pm

Wait, what? They’re only hard to type because people don’t seem to realize they can make their own keyboard layouts to use while they’re waiting for the USB Consortium to notice that it’s not the '80s anymore and update the class driver spec to allow keyboards to directly type unicode characters.
For macOS, I use Ukelele (http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=ukelele). I don’t know what tools there are for Windows or Linux, but I’d be *shocked* if they didn’t exist.

- Dave Sweeris

···

On Oct 19, 2016, at 12:29 PM, Jonathan S. Shapiro via swift-evolution <swift-evolution@swift.org> wrote:

On Wed, Oct 19, 2016 at 6:41 AM, Matthew Johnson via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
IMO, the best argument against using unicode symbols for operators defined by mathematics is that they are currently difficult to type.

And there is no realistic hope of that changing. This issue is so compelling that C and C++ introduced standardized text-ascii alternatives for the punctuation operators to relieve stress on non-english keyboard users.

anandabits · October 19, 2016, 9:53pm

IMO, the best argument against using unicode symbols for operators defined by mathematics is that they are currently difficult to type.

And there is no realistic hope of that changing. This issue is so compelling that C and C++ introduced standardized text-ascii alternatives for the punctuation operators to relieve stress on non-english keyboard users.

I don’t agree that there is no realistic hope of that changing. It appears to be pretty reasonable to anticipate that we’ll all be using software-driven keyboards that can display software-defined symbols on the keys in the relatively near future (probably 5 years, certainly 10). All kinds of interesting things become possible when that happens, including the ability to make unicode operators much easier to discover and type in a programmer’s editor.

Nevin · October 20, 2016, 5:26am

Thinking about it further, I am not convinced we need to make *any* change
to the set of operator characters at this time. It’s not like people are
clamoring to have Braille variable names after all. And as much as I’d like
to see the upside-down ampersand (⅋) as an operator, that too can wait.

I am hopeful that this proposal will be revised to focus solely on adopting
UAX-31. I am not yet familiar with the specifics of that document, and I
expect I am not alone in that regard. Since the proposal indicates several
thousand characters will no longer be valid in identifiers, it seems quite
possible that some of them may be controversial.

I think it is far more productive to spend our collective efforts on making
sure we get identifiers right for Swift 4. We can deal with operators in a
similar manner once official Unicode guidelines are put forth, so we should
not spend time on them now.

Nevin

···

On Thu, Oct 20, 2016 at 12:44 AM, Jacob Bandes-Storch via swift-evolution < swift-evolution@swift.org> wrote:

On Wed, Oct 19, 2016 at 10:12 AM, Alex Martini <amartini@apple.com> wrote:

Grammar changes

operator → operator-head operator-characters[opt]

operator-head → ! % & * + - / < = > ? ^ | ~
operator-head → operator-dot operator-dots
operator-character → operator-head
operator-characters → operator-character operator-character[opt]

operator-dot → .
operator-dots → operator-dot operator-dots[opt]

<https://github.com/jtbandes/swift-evolution/blob/unicode-id-op/proposals/NNNN-refining-identifier-and-operator-symbology.md#emoji>

I think there's a mismatch between the English and grammar. For example,
is +..+ allowed or not?

The English rule does allow +..+ because its dots appear in a run of two.

The grammar allows a run of one or more dots as an operator head, but
never allows dots as characters appearing in the middle of an operator,
regardless of how many dots appear next to each other. The grammar
wouldn't allow +..+ because the dots don't come at the beginning.

Here's an alternate version of the grammar that matches the "two or more"
rule. Because we no longer distinguish between which characters are
allowed as the first character of an operator vs a character inside,
there's no longer a need for a separate operator-head.

operator --> operator-character operator-OPT

operator-character --> ! % & * + - / < = > ? ^ | ~
operator-character --> operator-dots

operator-dots --> .. operator-additional-dots-OPT
operator-additional-dots --> . operator-additional-dots-OPT

There is a typo in that operator-character[opt] should be
operator-characters[opt]. Aside from that, though, I believe the grammar as
written accepts +..+ already. Take a look at the following series of
substitutions based on the grammar rules:

operator
operator-head operator-characters
+ operator-characters
+ operator-character operator-characters[opt]
+ operator-head operator-head
+ operator-dot operator-dots operator-head
+ . . +
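The English rule under discussion (dots are allowed anywhere in an operator, but only in runs of two or more) can be sketched as a small checker. The function name is hypothetical, and this implements the prose rule for the ASCII operator characters only, not either grammar verbatim.

```swift
// Sketch of the English rule: an operator is a non-empty sequence of the
// ASCII operator characters, where "." may appear only in runs of two or more.
// The function name is hypothetical.
func isValidOperator(_ text: String) -> Bool {
    let operatorCharacters: Set<Character> = ["!", "%", "&", "*", "+", "-", "/",
                                              "<", "=", ">", "?", "^", "|", "~"]
    guard !text.isEmpty else { return false }

    var dotRun = 0   // length of the run of dots currently being scanned
    for character in text {
        if character == "." {
            dotRun += 1
        } else {
            // A run that just ended must not have been a single dot.
            if dotRun == 1 { return false }
            dotRun = 0
            if !operatorCharacters.contains(character) { return false }
        }
    }
    return dotRun != 1   // reject a trailing lone dot
}
```

Under this reading `+..+` and `..<` are accepted, while `+.+` and a lone `.` are rejected.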

Jonathan_S_Shapiro · October 23, 2016, 12:53am

I missed this earlier posting from Joe Groff, who wrote:

In the discussion about operators, I wonder whether it makes sense to
formally separate "identifier" and "operator" characters at all. ...

The consequence if we do not formally separate the operators (verbs) from
the identifiers (nouns) is that white space will be needed around all
operators. That's not necessarily a bad thing, but it would be a
significant and incompatible departure from today's Swift, both in terms of
actual source code breakage and in terms of the "look and feel" that many
people feel passionate about.

Jonathan

Colin_Barrett · October 23, 2016, 1:49am

That’s one way, yeah. Or you could make the grammar context-sensitive / apply a “lexer hack”. There are probably other ways to deal w/ context sensitivity as well. Joe’s proposed syntax seems pretty explicit, and hopefully it’s simple to plumb / capture that info in the lexer. (I’m ignorant of the implementation of Swift’s front-end, unfortunately!)

-Colin

···

On Oct 22, 2016, at 8:53 PM, Jonathan S. Shapiro <jonathan.s.shapiro@gmail.com> wrote:

I missed this earlier posting from Joe Groff, who wrote:

In the discussion about operators, I wonder whether it makes sense to formally separate "identifier" and "operator" characters at all. ...

The consequence if we do not formally separate the operators (verbs) from the identifiers (nouns) is that white space will be needed around all operators. That's not necessarily a bad thing, but it would be a significant and incompatible departure from today's Swift, both in terms of actual source code breakage and in terms of the "look and feel" that many people feel passionate about.

Joe_Groff · October 24, 2016, 4:40pm

That's not a strict requirement. If we require operator usage to be declared explicitly, the lexer can accommodate those declarations. Since operators only appear as part of expressions inside bodies, the operator import or declaration doesn't even necessarily have to be ordered at the top of the file since we can still skip function bodies when parsing declarations (though I think we'd want to encourage imports on top anyway for the benefit of readers). This wouldn't be unprecedented—operators as they stand already effectively require an extra pass of parsing.
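As a toy illustration of the idea, the sketch below splits source text using only the operator characters a file has declared. Everything here is hypothetical: the function, its parameters, and the one-character-class model are assumptions, and this is not how the Swift lexer is implemented.

```swift
// Hypothetical sketch: split source text into tokens using only the operator
// characters a file has declared. This is not how the Swift lexer works.
func tokenize(_ source: String, declaredOperators: Set<Character>) -> [String] {
    var tokens: [String] = []
    var current = ""
    var currentIsOperator = false

    for character in source {
        if character == " " {
            // Whitespace always ends the current token.
            if !current.isEmpty { tokens.append(current) }
            current = ""
            continue
        }
        let isOperator = declaredOperators.contains(character)
        // Start a new token whenever we cross an identifier/operator boundary.
        if !current.isEmpty && isOperator != currentIsOperator {
            tokens.append(current)
            current = ""
        }
        current.append(character)
        currentIsOperator = isOperator
    }
    if !current.isEmpty { tokens.append(current) }
    return tokens
}
```

With `∪` declared, `a∪b` splits into three tokens without whitespace; with nothing declared, the same text stays a single token unless the author adds spaces.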

-Joe

···

On Oct 22, 2016, at 5:53 PM, Jonathan S. Shapiro <jonathan.s.shapiro@gmail.com> wrote:

I missed this earlier posting from Joe Groff, who wrote:

In the discussion about operators, I wonder whether it makes sense to formally separate "identifier" and "operator" characters at all. ...

The consequence if we do not formally separate the operators (verbs) from the identifiers (nouns) is that white space will be needed around all operators. That's not necessarily a bad thing, but it would be a significant and incompatible departure from today's Swift, both in terms of actual source code breakage and in terms of the "look and feel" that many people feel passionate about.

Nevin · October 24, 2016, 8:42pm

Here are several more Unicode blocks that contain characters we may want as
operator symbols
<http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[\p{Block%3DMiscellaneous+Technical} \p{Block%3DOptical+Character+Recognition} \p{Block%3DBox+Drawing} \p{Block%3DBlock+Elements} \p{Block%3DGeometric+Shapes} \p{Block%3DMiscellaneous+Symbols} \p{Block%3DDingbats} \p{Block%3DBraille} \p{Block%3DMiscellaneous+Symbols+And+Arrows} \p{Block%3DYijing+Hexagram+Symbols} \p{Block%3DMusical+Symbols} \p{Block%3DAncient+Greek+Musical+Notation} \p{Block%3DTai+Xuan+Jing+Symbols} \p{Block%3DMahjong+Tiles} \p{Block%3DDomino+Tiles} \p{Block%3DPlaying+Cards} \p{Block%3DOrnamental+Dingbats} \p{Block%3DAlchemical+Symbols} \p{Block%3DGeometric+Shapes+Extended} \p{Block%3DSupplemental+Arrows+C}] &g=&i=>.
I have not gone through them at a detailed level, and many blocks I am
simply unsure how we want to categorize.

[\p{Block=Miscellaneous Technical}
\p{Block=Optical Character Recognition}
\p{Block=Box Drawing}
\p{Block=Block Elements}
\p{Block=Geometric Shapes}
\p{Block=Miscellaneous Symbols}
\p{Block=Dingbats}
\p{Block=Braille}
\p{Block=Miscellaneous Symbols And Arrows}
\p{Block=Yijing Hexagram Symbols}
\p{Block=Musical Symbols}
\p{Block=Ancient Greek Musical Notation}
\p{Block=Tai Xuan Jing Symbols}
\p{Block=Mahjong Tiles}
\p{Block=Domino Tiles}
\p{Block=Playing Cards}
\p{Block=Ornamental Dingbats}
\p{Block=Alchemical Symbols}
\p{Block=Geometric Shapes Extended}
\p{Block=Supplemental Arrows C}]

Nevin

Russ_Bishop1 · October 25, 2016, 5:40am

Dear Swift-Evolution community,

A few of us have been preparing a proposal to refine the definitions of identifiers & operators. This includes some changes to the permitted Unicode characters.

The latest (perhaps final?) draft is available here:

https://github.com/jtbandes/swift-evolution/blob/unicode-id-op/proposals/NNNN-refining-identifier-and-operator-symbology.md

We'd welcome your initial thoughts, and will probably submit a PR soon to the swift-evolution repo for a formal review. Full text follows below.

I haven’t had a chance to read the entire proposal, nor the tons of great discussion down thread, but here are a few thoughts, just MHO:

- I’m loving that you’re taking a detail oriented approach to the problem. I agree with you that our current approach is unprincipled, and we need to get this right for Swift 4.
- I think that it is perfectly fine to err on the side of conservatism: if it isn’t clear how to classify something (e.g. Braille patterns), we should just reject them in both operators and identifiers (make them be unassigned). If these unclear cases are important to someone, then we can consider (as a separate additive proposal) adding them back later.
- As to conservatism, explicitly reserving “..” (for possible future language directions) seems reasonable to me. Are there any other similar things we should consider reserving?

- I applaud the creativity keeping a valid identifier :-), but it is really missing the point. *All* of the non-symbol-like emoji’s should be valid in identifiers. With a quick unscientific look at Apple’s character picker, all the emojis other than a few in “Symbols” seem like they should be identifiers. It would be fine to conservatively leave all emoji “symbols” as unassigned.

The problem with this is that "emoji" is not a well-defined category by Unicode. Whether a character is rendered as emoji or a traditional symbol in a given font on a given platform can depend on variation selectors, and the exact variation selectors (or lack thereof) that choose emoji or traditional representation are non-portable, even among different text rendering APIs on the same platform (e.g. ATSUI vs TextKit vs CoreText vs WebKit on Darwin).

-Joe

I’m not sure that is true. Unicode gives the list: http://unicode.org/emoji/charts/full-emoji-list.html.

If a platform can’t render the ZWJ sequences, it can render them as separate emoji, but Swift can still treat that as the same identifier.

== 🏼

If you don’t have a font capable of displaying the character at all, that isn’t any different from not having a Chinese font available. You should get the missing character glyph. The list of emoji base characters is not unrestricted: there is a specific and limited list of valid base characters that accept modifiers.

If we wanted to go further and say that all Emoji modifiers are preserved and rendered if possible but not considered part of the identifier that would be OK with me. Same for variation selectors.
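As a rough sketch of that idea (not how the Swift compiler handles identifiers), the comparison could drop the skin-tone modifiers U+1F3FB through U+1F3FF and the variation selectors U+FE00 through U+FE0F before comparing. The function name `identifierKey` is hypothetical, and a real rule would also need to decide how to treat ZWJ sequences and the supplementary variation selectors:

```swift
// Hypothetical helper: reduces a spelling to the scalars that would count
// toward identifier identity under the rule suggested above.
func identifierKey(_ spelling: String) -> String {
    var kept = String.UnicodeScalarView()
    for scalar in spelling.unicodeScalars {
        switch scalar.value {
        case 0x1F3FB...0x1F3FF:
            continue // emoji skin-tone modifiers: kept in source, ignored for identity
        case 0xFE00...0xFE0F:
            continue // variation selectors VS1 through VS16: likewise ignored
        default:
            kept.append(scalar)
        }
    }
    return String(kept)
}

let plainThumb = "\u{1F44D}"             // thumbs up
let tonedThumb = "\u{1F44D}\u{1F3FC}"    // thumbs up + medium-light skin tone
let sameIdentifier = identifierKey(plainThumb) == identifierKey(tonedThumb)
```

Under this assumption the two spellings above would name the same identifier, while the source text keeps whatever modifier the author typed.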

Russ


On Oct 24, 2016, at 9:43 AM, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 23, 2016, at 9:41 PM, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 18, 2016, at 11:34 PM, Jacob Bandes-Storch via swift-evolution <swift-evolution@swift.org> wrote:

- I really think we should keep symbols as operators, including many of the math symbols (e.g. ∪). In a later separate proposal, we can consider whether it makes sense for emoji symbols to be usable as operators; I can see arguments both ways.

-Chris

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


Benjamin_Spratling · October 19, 2016, 2:01pm

They’ve been selling products to mathematicians and scientists for decades. Some of those symbols are their own, i.e. not included in Unicode.


On Oct 19, 2016, at 8:54 AM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

Sorry, I've not been very clear on my question. What non-ASCII operators are in use in your production code? What ASCII equivalents did you consider and discard for those particular operators?