Personally, I think a *very* limited cherrypicking might be OK, on the
understanding that it must be done with very stringent inclusion criteria.
However, it's specifically the set algebra operators that I have the
greatest objection to cherrypicking:
* Each of these operators has a specific meaning; it is an anti-goal to
support repurposing the union operator for any other purpose than forming a
union of two sets, for instance.
* Thus, the only rationale for inclusion of these operators is to support
aliasing the set algebra protocol members.
* Now, if it is appropriate for these set algebra operations to be
accessible through these operators, then the standard library should be
providing them.
* However, exactly such an API has been trialled on a Swift branch, and the
set algebra renaming debate of 2015 (or was it early 2016?) on this very
list resulted in a renaming that *rejected* the use of these operators.
* Given that these operators have been deemed not appropriate for the only
methods that should use them, we should *not* specifically enable these
symbols as valid operator characters.
It is of course true that a user can essentially choose to alias any method
using any other identifier. However, this particular scenario is different
because you are advocating for Swift to make express allowance in the
grammar for solely that purpose. In other words, the language design would
be saying to the user, "We don't think this is a good idea at all;
otherwise, it'd be in the standard library. We literally tried that and
decided against it. However, we're specifically going to let you act on
this not-very-good idea by explicitly leaving room for it."
Now, if we had a complete set of operator characters, which would include
set algebra operators, then it wouldn't be problematic in the same way. In
that scenario, the language would be no more responsible for your renaming
formUnion() to the union symbol than to nonsenseAsdf(), since in neither
case would the underlying characters have been included among valid
operator or identifier characters, respectively, for the express purpose of
making possible that renaming.
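
For concreteness, the kind of aliasing at issue looks roughly like the sketch below. It assumes a toolchain that still accepts ∪ as an operator character (the status quo the proposal would change). The precedence group is an arbitrary choice made for illustration, and nothing like this is declared by the standard library.

```swift
// Hypothetical user-side alias; not part of the standard library.
// Assumes ∪ (U+222A) is accepted as an operator character.
infix operator ∪: AdditionPrecedence

/// Forwards to SetAlgebra.union(_:).
func ∪ <S: SetAlgebra>(lhs: S, rhs: S) -> S {
    return lhs.union(rhs)
}

let evens: Set<Int> = [2, 4]
let odds: Set<Int> = [1, 3]
let all = evens ∪ odds   // same result as evens.union(odds)
```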
On Wed, Oct 19, 2016 at 23:48 plx via swift-evolution <swift-evolution@swift.org> wrote:
+💯 on the emoji-related parts, +1 in general spirit, +1 for the
identifier cleanup, -103 for being needlessly overly-restrictive for
operators; net -1 overall.

Operator abuse is a social problem, and even if a technical fix is
possible this isn’t that…and despite the messiness of the relevant unicode
categories, this proposal goes far too far.

For operators, the reasonable thing to do at this time would be to
hand-select a small subset of the mathematical characters to allow as
operators—the “greatest hits” so to speak—and move on. If any grave
oversights are discovered those characters can be included in subsequent
major revisions; if the consortium ever finishes its recommendation it can
be adopted at that time.

There’s no need to exhaustively re-do the consortium’s work and there’s no
need to make a correct-for-all-time decision on each character at this
time; pick the low-hanging fruit and leave the rest for later.

That not everyone will be perfectly happy with any specific subset is
predictable and not interesting; not everyone is going to be perfectly
happy with this proposal’s proposed subset, either.

In any case, I’d specifically hate to lose these:
- approximate equality: ≈
- set operations: ∩, ∪
- set relations: ⊂, ⊃, ⊄, ⊅, ⊆, ⊇, ⊈, ⊉, ⊊, ⊋
- set membership: ∌, ∋, ∈, ∉
- logical operators: ¬, ∧, ∨

…although there are many more that would be nice to keep available.
On Oct 19, 2016, at 1:34 AM, Jacob Bandes-Storch via swift-evolution <swift-evolution@swift.org> wrote:
Dear Swift-Evolution community,
A few of us have been preparing a proposal to refine the definitions of
identifiers & operators. This includes some changes to the permitted
Unicode characters.

The latest (perhaps final?) draft is available here:

We'd welcome your initial thoughts, and will probably submit a PR soon to
the swift-evolution repo for a formal review. Full text follows below.

—Jacob Bandes-Storch, Xiaodi Wu, Erica Sadun, Jonathan Shapiro
Refining Identifier and Operator Symbology
- Proposal: SE-NNNN
<https://github.com/jtbandes/swift-evolution/blob/unicode-id-op/proposals/NNNN-refining-identifier-and-operator-symbology.md>
- Authors: Jacob Bandes-Storch <https://github.com/jtbandes>, Erica
Sadun <https://github.com/erica>, Xiaodi Wu <https://github.com/xwu>,
Jonathan Shapiro
- Review Manager: TBD
- Status: Awaiting review

This proposal seeks to refine and rationalize Swift's identifier and
operator symbology. Specifically, this proposal:

- adopts the Unicode recommendation for identifier characters, with
some minor exceptions;
- restricts the legal operator set to the current ASCII operator
characters;
- changes where dots may appear in operators; and
- disallows Emoji from identifiers and operators.

Prior discussion threads & proposals

- Proposal: Normalize Unicode identifiers
<https://github.com/apple/swift-evolution/pull/531>
- Unicode identifiers & operators
<https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160912/027108.html>,
with pre-proposal
<https://gist.github.com/jtbandes/c0b0c072181dcd22c3147802025d0b59> (a
precursor to this document)
- Lexical matters: identifiers and operators
<https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160926/027479.html>
- Proposal: Allow Single Dollar Sign as Valid Identifier
<https://github.com/apple/swift-evolution/pull/354>
- Free the '$' Symbol!
<https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151228/005133.html>
- Request to add middle dot (U+00B7) as operator character?
<https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151214/003176.html>

Chris Lattner has written:

…our current operator space (particularly the unicode segments covered) is
not super well considered. It would be great for someone to take a more
systematic pass over them to rationalize things.

We need a token to be unambiguously an operator or identifier - we can
have different rules for the leading and subsequent characters though.

…any proposal that breaks:
let 🐶🐮 = "moof"
will not be tolerated. :-) :-)
By supporting custom Unicode operators and identifiers, Swift attempts to
accommodate programmers and programming styles from many languages and
cultures. It deserves a well-thought-out specification of which characters
are valid. However, Swift's current identifier and operator character sets
do not conform to any Unicode standards, nor have they been rationalized in
the language or compiler documentation.

Identifiers, which serve as *names* for various entities, are linguistic
in nature and must permit a variety of characters to properly serve
non–English-speaking coders. This issue has been considered by the
communities of many programming languages already, and the Unicode
Consortium has published recommendations on how to choose identifier
character sets — Swift should make an effort to conform to these
recommendations.

Operators, on the other hand, should be rare and carefully chosen, because
they suffer from low discoverability and difficult readability. They are by
nature *symbols*, not names. This places a cognitive cost on users with
respect to both recall ("What is the operator that applies the behavior I
need?") and recognition ("What does the operator in this code do?"). While *almost
every* nontrivial program defines many new identifiers, most programs do
not define new operators.

As operators become more esoteric or customized, the cognitive cost rises.
Recognizing a function name like formUnion(with:) is simpler for many
programmers than recalling what the ∪ operator does. Swift's current
operator character set includes many characters that aren't traditional and
recognizable operators — this encourages problematic and frivolous uses in
an otherwise safe language.

Today, there are many discrepancies and edge cases motivating these
changes:

- · is an identifier, while • is an operator.
- The Greek question mark ; (U+037E) is a valid identifier.
- Braille patterns ⠟ seem letter-like, but are operator characters.
- Some pictographic characters (such as 🂡) are identifiers, while others are operators.
- Some *non-combining* diacritics ´ ¨ ꓻ are valid in identifiers.
- Some completely non-linguistic characters, such as ۞ and ༒, are
valid in identifiers.
- Some symbols such as ⚄ and ♄ are operators, despite not really being
"operator-like".
- A small handful of characters 〡〢〣〤〥〦〧〨〩 〪 〫 〬 〭 〮 〯 are valid in both identifiers
and operators.
- Some non-printing characters such as 2064 INVISIBLE PLUS and 200B
ZERO WIDTH SPACE are valid identifiers.
- Currency symbols are split across operators (¢ £ ¤ ¥) and
identifiers ($ ₪ € ₱ ₹ ฿ ...).

This matter should be considered in a near timeframe (Swift 3.1 or 4) as
it is both fundamental to Swift and will produce source-breaking changes.

Precedent in other languages

Haskell distinguishes identifiers/operators by their general category
<http://www.fileformat.info/info/unicode/category/index.htm> such as "any
Unicode lowercase letter", "any Unicode symbol or punctuation", and so
forth. Identifiers can start with any lowercase letter or _, and may
contain any letter/digit/'/_. This includes letters like δ and Я, and
digits like ٢.

- Haskell Syntax Reference
<https://www.haskell.org/onlinereport/syntax-iso.html>
- Haskell Lexer
<https://github.com/ghc/ghc/blob/714bebff44076061d0a719c4eda2cfd213b7ac3d/compiler/parser/Lexer.x#L1949-L1973>

Scala similarly allows letters, numbers, $, and _ in identifiers,
distinguishing by general categories Ll, Lu, Lt, Lo, and Nl. Operator
characters include mathematical and other symbols (Sm and So) in addition
to other ASCII symbol characters.

- Scala Lexical Syntax

ECMAScript 2015 ("ES6") uses ID_Start and ID_Continue, as well as
Other_ID_Start / Other_ID_Continue, for identifiers.

- ECMAScript Specification: Names and Keywords

Python 3 uses XID_Start and XID_Continue.

- The Python Language Reference: Identifiers and Keywords
- PEP 3131: Supporting Non-ASCII Identifiers
<https://www.python.org/dev/peps/pep-3131/>

For identifiers, adopt the recommendations made in UAX #31 Identifier and
Pattern Syntax <http://unicode.org/reports/tr31/>, deriving the sets of
valid characters from ID_Start and ID_Continue. Normalize identifiers
using Normalization Form C (NFC).

(For operators, no such recommendation currently exists, although active
work is in progress to update UAX #31 to address "operator identifiers".)

Restrict operators to those ASCII characters which are currently
operators. All other operator characters are removed from the language.

Allow dots in operators in any location, but only in runs of two or more.
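
To illustrate the normalization step for identifiers, here is a minimal sketch of what comparing names under NFC means. It relies on Foundation's `precomposedStringWithCanonicalMapping` (NFC) and, for contrast, `precomposedStringWithCompatibilityMapping` (NFKC). The helper names are hypothetical and this is not how the compiler implements the check.

```swift
import Foundation

// "é" spelled two ways: precomposed U+00E9, and "e" followed by
// U+0301 COMBINING ACUTE ACCENT.
let precomposed = "caf\u{00E9}"
let decomposed = "cafe\u{0301}"

/// Scalar values of the NFC form (illustrative helper, not compiler code).
func nfcScalars(_ s: String) -> [UInt32] {
    return s.precomposedStringWithCanonicalMapping.unicodeScalars.map { $0.value }
}

/// Scalar values of the NFKC form, shown only for contrast.
func nfkcScalars(_ s: String) -> [UInt32] {
    return s.precomposedStringWithCompatibilityMapping.unicodeScalars.map { $0.value }
}

// The raw spellings differ, but their NFC forms are identical, so under the
// proposal they would name the same identifier.
let rawFormsDiffer = precomposed.unicodeScalars.map { $0.value }
    != decomposed.unicodeScalars.map { $0.value }
let sameUnderNFC = nfcScalars(precomposed) == nfcScalars(decomposed)

// NFC keeps "ſ" (U+017F LATIN SMALL LETTER LONG S) distinct from "s";
// NFKC folds the two together.
let longSDistinctUnderNFC = nfcScalars("\u{017F}") != nfcScalars("s")
let longSFoldedUnderNFKC = nfkcScalars("\u{017F}") == nfkcScalars("s")
```

The comparison is done on scalar values rather than with `==` on `String`, because Swift's string equality already treats canonically equivalent strings as equal.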
(Overall, this proposal is aggressive in its removal of problematic
characters. We are not attempting to prevent the addition or re-addition of
characters in the future, but by paring the set down now, we require any
future changes to pass the high bar of the Swift Evolution process.)

Detailed design

Identifiers

Swift identifier characters will conform to UAX #31 as follows:

- UAX31-C1. The conformance described herein refers to the Unicode 9.0.0
version of UAX #31 (dated 2016-05-31 and retrieved 2016-10-09).
- UAX31-C2. Swift shall observe the following requirements:
  - UAX31-R1. Swift shall augment the definition of "Default Identifiers"
  with the following profiles:
    1. ID_Start and ID_Continue shall be used for Start and Continue
    (replacing XID_Start and XID_Continue). This excludes characters
    in Other_ID_Start and Other_ID_Continue.
    2. _ 005F LOW LINE shall additionally be allowed as a Start
    character.
    3. The emoji characters 🐶 1F436 DOG FACE and 🐮 1F42E COW FACE
    shall be allowed as Start and Continue characters.
    4. (UAX31-R1a.) The join-control characters ZWJ and ZWNJ are strictly
    limited to the special cases A1, A2, and B described in UAX #31. (This
    requirement is covered in the Normalize Unicode Identifiers proposal
    <https://github.com/apple/swift-evolution/pull/531>.)
  - UAX31-R4. Swift shall consider two identifiers equivalent when they
  have the same normalized form under NFC
  <http://unicode.org/reports/tr15/>. (This requirement is covered in the
  Normalize Unicode Identifiers proposal
  <https://github.com/apple/swift-evolution/pull/531>.)

These changes (Unicode Utilities: UnicodeSet Comparison) result
in the removal of some 5,500 valid code points from the identifier
characters, as well as hundreds of thousands of unassigned code points.
(Though it does not appear on this unicode.org utility, which currently
supports only Unicode 8 data, the · 00B7 MIDDLE DOT is no longer an
identifier character.) Adopting ID_Start and ID_Continue does not add any
new identifier characters.

identifier-head → [:ID_Start:]
identifier-head → _
identifier-character → identifier-head
identifier-character → [:ID_Continue:]

Swift operator characters will be limited to only the following ASCII
characters:

! % & * + - . / < = > ? ^ | ~
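
The two character classes defined above can be approximated in a few lines of Swift. This is only a sketch under stated assumptions: it uses `Unicode.Scalar.Properties.isIDStart` and `isIDContinue` from the standard library, whose results follow whatever Unicode version the runtime ships rather than Unicode 9.0.0, and it does not subtract Other_ID_Start / Other_ID_Continue, apply the ZWJ/ZWNJ restrictions, or exclude keywords. All function and constant names are hypothetical.

```swift
// The 15 ASCII operator characters listed above.
let asciiOperatorCharacters: Set<Character> = [
    "!", "%", "&", "*", "+", "-", ".", "/", "<", "=", ">", "?", "^", "|", "~",
]

// The two emoji the proposal keeps: U+1F436 DOG FACE and U+1F42E COW FACE.
let permittedEmoji: Set<Unicode.Scalar> = ["\u{1F436}", "\u{1F42E}"]

/// identifier-head: ID_Start, the low line, or one of the two permitted emoji.
func isIdentifierHead(_ scalar: Unicode.Scalar) -> Bool {
    return scalar == "_" || scalar.properties.isIDStart || permittedEmoji.contains(scalar)
}

/// identifier-character: anything allowed as a head, plus ID_Continue.
func isIdentifierCharacter(_ scalar: Unicode.Scalar) -> Bool {
    return isIdentifierHead(scalar) || scalar.properties.isIDContinue
}

/// Approximate check that every scalar of `text` fits the identifier grammar.
func isPlausibleIdentifier(_ text: String) -> Bool {
    guard let first = text.unicodeScalars.first, isIdentifierHead(first) else {
        return false
    }
    return text.unicodeScalars.dropFirst().allSatisfy(isIdentifierCharacter)
}
```

For example, `isPlausibleIdentifier("δx٢")` accepts a Greek letter followed by an Arabic-Indic digit, while `isPlausibleIdentifier("٢x")` rejects a leading digit and `isPlausibleIdentifier("a∪b")` rejects a mathematical symbol.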
The current restrictions on reserved tokens and operators will remain: =,
->, //, /*, */, ., ?, prefix <, prefix &, postfix >, and postfix ! are
reserved.

The current requirements for dots in operator names are:
If an operator doesn’t begin with a dot, it can’t contain a dot elsewhere.
This proposal changes the rule to:
Dots may only appear in operators in runs of two or more.
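
The difference between the two rules can be sketched as a pair of predicates over an operator's spelling. These are hypothetical helpers written for illustration; they look only at dot placement and ignore the permitted character set and the reserved tokens.

```swift
/// Current rule: a dot may appear only if the operator begins with a dot.
func satisfiesCurrentDotRule(_ op: String) -> Bool {
    return !op.contains(".") || op.hasPrefix(".")
}

/// Proposed rule: every maximal run of dots has length two or more.
func satisfiesProposedDotRule(_ op: String) -> Bool {
    // Splitting on every non-dot character leaves only the maximal dot runs.
    let dotRuns = op.split(whereSeparator: { $0 != "." })
    return dotRuns.allSatisfy { $0.count >= 2 }
}
```

Under this sketch, `".+"` passes the current rule but fails the proposed one, while `"<..<"` fails the current rule and passes the proposed one.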
Under the revised rule, ..< and ... are allowed, but <.< is not. We also reserve
the .. operator, permitting the compiler to use .. for a "method cascade"
syntax in the future, as supported by Dart
<http://news.dartlang.org/2012/02/method-cascades-in-dart-posted-by-gilad.html>.

Motivations for incorporating the two-dot rule are:

- It helps avoid future lexical complications arising from lone .s.
- It's a conservative approach, erring towards overly restrictive.
Dropping the rule in future (thereby allowing single dots) may be possible.
- It doesn't require special cases for existing infix dot operators in
the standard library, ... (closed range) and ..< (half-open range). It
also leaves the door open for the standard library to add analogous
half-open and fully-open range operators <.. and <..<.
- If we fail to adopt this rule now, then future backward-compatibility
requirements will preclude the introduction of some potentially useful
language enhancements.

operator → operator-head operator-characters[opt]
operator-head → ! % & * + - / < = > ? ^ | ~
operator-head → operator-dot operator-dots
operator-character → operator-head
operator-characters → operator-character operator-characters[opt]

operator-dot → .
operator-dots → operator-dot operator-dots[opt]

If adopted, this proposal eliminates emoji from Swift identifiers and
operators. Despite their novelty and utility, emoji characters introduce
significant challenges to the language:

- Their categorization into identifiers and operators is not
semantically motivated, and is fraught with discrepancies.
- Emoji characters are not displayed consistently and uniformly across
different systems and fonts. Including all Unicode emoji introduces
characters that don't render as emoji on Apple platforms without a variant
selector, but which also wouldn't normally be used as identifier characters.
- Some emoji nearly overlap with existing operator syntax.
- Full emoji support necessitates handling a variety of use cases for
joining characters and variant selectors, which would not otherwise be
useful in most cases. It would be hard to avoid permitting sequences of
characters which aren't valid emoji, or being overly restrictive and not
properly supporting emoji introduced in future versions of Unicode.

As an exception, in homage to Swift's origins, we permit 🐶 and 🐮 in
identifiers.

This change is source-breaking in cases where developers have incorporated
emoji or custom non-ASCII operators, or identifiers with characters which
have been disallowed. This is unlikely to be a significant breakage for the
majority of serious Swift code.

Code using the middle dot · in identifiers may be slightly more common. · is
now disallowed entirely.

Diagnostics for invalid characters are already produced today. We can
improve them easily if needed.

Maintaining source compatibility for Swift 3 should be easy: just keep the
old parsing & identifier lookup code.

Effect on ABI stability

This proposal does not affect the ABI format itself, although the Normalize
Unicode Identifiers proposal
<https://github.com/apple/swift-evolution/pull/531> affects the ABI of
compiled modules.

The standard library will not be affected; it uses ASCII symbols with no
combining characters.

Effect on API resilience

This proposal doesn't affect API resilience.
- Define operator characters using Unicode categories such as Sm and So
<http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[%3ASm%3A][%3ASo%3A]]>.
This approach would include many "non-operator-like" characters and doesn't
seem to provide a significant benefit aside from a simpler definition.
- Hand-pick a set of "operator-like" characters to include. The proposal
authors tried this painstaking approach, and came up with a relatively
agreeable set of about 650 code points
<http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[!\%24%25\%26*%2B\-%2F<%3D>%3F\^|~ \u00AC \u00B1 \u00B7 \u00D7 \u00F7 \u2208-\u220D \u220F-\u2211 \u22C0-\u22C3 \u2212-\u221D \u2238 \u223A \u2240 \u228C-\u228E \u2293-\u22A3 \u22BA-\u22BD \u22C4-\u22C7 \u22C9-\u22CC \u22D2-\u22D3 \u2223-\u222A \u2236-\u2237 \u2239 \u223B-\u223E \u2241-\u228B \u228F-\u2292 \u22A6-\u22B9 \u22C8 \u22CD \u22D0-\u22D1 \u22D4-\u22FF \u22CE-\u22CF \u2A00-\u2AFF \u27C2 \u27C3 \u27C4 \u27C7 \u27C8 \u27C9 \u27CA \u27CE-\u27D7 \u27DA-\u27DF \u27E0-\u27E5 \u29B5-\u29C3 \u29C4-\u29C9 \u29CA-\u29D0 \u29D1-\u29D7 \u29DF \u29E1 \u29E2 \u29E3-\u29E6 \u29FA \u29FB \u2308-\u230B \u2336-\u237A \u2395]> (although
this set would require further refinement), but ultimately felt the
motivation for including non-ASCII operators is much lower than for
identifiers, and the harm to readers/writers of programs outweighs their
potential utility.
- Use Normalization Form KC (NFKC) instead of NFC. The decision to use
NFC comes from the Normalize Unicode Identifiers proposal
<https://github.com/apple/swift-evolution/pull/531>. Also, UAX #31
states:

Generally if the programming language has case-sensitive identifiers,
then Normalization Form C is appropriate; whereas, if the programming
language has case-insensitive identifiers, then Normalization Form KC is
more appropriate.

NFKC may also produce surprising results; for example, "ſ" and "s" are
equivalent under NFKC.
- Continue to allow single .s in operators, and perhaps even expand the
original rule to allow them anywhere (even if the operator does not begin
with .).

This would allow a wider variety of custom operators (for some
interesting possibilities, see the operators in Haskell's Lens
<https://github.com/ekmett/lens/wiki/Operators> package). However,
there are a handful of potential complications:

  - Combining prefix or postfix operators with member access: foo*.bar would
  need to be parsed as foo *. bar rather than (foo*).bar. Parentheses
  could be required to disambiguate.
  - Combining infix operators with contextual members: foo*.bar would
  need to be parsed as foo *. bar rather than foo * (.bar).
  Whitespace or parentheses could be required to disambiguate.
  - Hypothetically, if operators were accessible as members such as
  MyNumber.+, allowing operators with single .s would require
  escaping operator names (perhaps with backticks, such as
  MyNumber.`+`).

This would also require operators of the form [!?]*\. (for example . ?.
!. !!.) to be reserved, to prevent users from defining custom
operators that conflict with member access and optional chaining.

We believe that requiring dots to appear in groups of at least two,
while in some ways more restrictive, will prevent a significant amount of
future pain, and does not require special-case considerations such as the
above.

While not within the scope of this proposal, the following considerations
may provide useful context for the proposed changes. We encourage the
community to pick up these topics when the time is right.

- Re-expand operators to allow some non-ASCII characters. There is work
in progress to update UAX #31 with definitions for "operator identifiers" —
when this work is completed, it would be worth considering for Swift.
- Introduce a syntax for method cascades. The Dart language supports method
cascades
<http://news.dartlang.org/2012/02/method-cascades-in-dart-posted-by-gilad.html>,
whereby multiple methods can be called on an object within one expression:
foo..bar()..baz() effectively performs foo.bar(); foo.baz(). This
syntax can also be used with assignments and subscripts. Such a feature
might be very useful in Swift; this proposal reserves the .. operator
so that it may be added in the future.
- Introduce "mixfix" operator declarations. Mixfix operators are based
on pattern matching, and would allow more than two operands. For example,
the ternary operator ? : can be defined as a mixfix operator with
three "holes": _ ? _ : _. Subscripts might be subsumed by mixfix
declarations such as _ [ _ ]. Some holes could be made @autoclosure,
and there might even be holes whose argument is represented as an AST,
rather than a value or thunk, supporting advanced metaprogramming (for
instance, F#'s code quotations
<https://docs.microsoft.com/en-us/dotnet/articles/fsharp/language-reference/code-quotations>).
- Diminish or remove the lexical distinction between operators and
identifiers. If precedence and fixity applied to traditional
identifiers as well as operators, it would be possible to incorporate ASCII
equivalents for standard operators (e.g. and for &&, to allow A and B).
If additionally combined with mixfix operator support, this might enable
powerful DSLs (for instance, C#'s LINQ
<https://en.wikipedia.org/wiki/Language_Integrated_Query>).
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution