[Proposal] Refining Identifier and Operator Symbology

Martin:

My "operators" proposal relies on resolving what will be a symbol
identifier, so I'm afraid we can't duck this issue.

I'm writing up a revised proposal as we speak. I'm hopeful that it will
address *most* of the concerns that have been expressed here, but it adopts
a slightly different conceptual approach than the previous one - enough so
that I don't think of it as a revision.

Because of the concern that Xiaodi just raised about emojis and modifiers,
I'm going to identify that as an open issue with an "if feasible" proposed
resolution to place them in the traditional identifier space.

I hope to be able to send a link to this list in a few hours.

Jonathan

···

On Fri, Oct 21, 2016 at 1:51 PM, Martin Waitz via swift-evolution <swift-evolution@swift.org> wrote:

Given the current difficulty of coming up with an agreed set of identifiers
vs. operators, maybe we should really try to sidestep this issue entirely
and study Jonathan Shapiro’s idea of using identifiers as operators.

Another painful reminder of why "oops" is a four-letter word...

Sorry!

Jonathan

···

On Fri, Oct 21, 2016 at 9:26 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

You kinda sent that to swift-evolution :P

All:

Jacob has already identified a *big* hole in the proposal, which is that it
doesn't define how operator-bound identifiers are treated by import. That
definitely needs to be addressed by the proposal. It's straightforward, but
easy to get wrong. I will address that early tomorrow.

Jonathan

It’s worth pointing out that the proposal to add ‘$’ to identifiers is still under active review and has generated much controversy. I wouldn’t put much weight on “backward compatibility” for that proposal.

···

On Oct 21, 2016, at 9:38 PM, Jonathan S. Shapiro via swift-evolution <swift-evolution@swift.org> wrote:

Well, it seems that I jumped the gun and sent my document link to swift-evolution by mistake. Since I can't take it back, it might be good to say what it's about. Because of my mistake, this version has not had any review by the rest of the author group, which probably would have improved it a lot. I had intended one more round of edits to deal with a few things that still say "FIX" and a few minor cleanups, but I think the substance of the proposal is sound.

This revised proposal <https://github.com/jsshapiro/swift-evolution/blob/unicode-id-op/proposals/NNNN-refining-identifiers-and-operators.md> tries to take into account most of the feedback we have received here. Lots of nitty-gritty changes, but here's the big picture of the changes:
* Emojis are admitted, subject to reasonable sanity conditions.
* A (significantly) broader, but still conservative, set of code points is admitted for symbol identifiers. Hopefully this addresses current need, but I remain open to adopting full-on [:Sm:] and [:So:] if there is a strong push for that.
* Operator definition is orthogonal to identifier specification, which deals with the noun/verb confusion and also addresses the widely-expressed feeling that some symbols aren't operators and their conventional meaning should be usable. The term "operator" no longer has anything to do with identifiers.
* A laundry list of potential parsing gotchas is addressed. The previous proposal would have broken the generics syntax and also the binding syntax. This isn't a substantive conceptual change, but it's important if the proposal is going to, you know, actually work. :-)
* Dollar is admitted in identifiers.
* Explicitly addresses anonymous closure parameters in a way that reflects how the compiler actually needs to deal with such things. Might be I've written a compiler or two in my career. :-)
* Consistent with the current direction of UAX31 on these issues.
* Susan Kare's legacy is preserved. :-) If you don't know who Susan is, look her up and learn why Chris loves the dogcow emoji pair.
* The new proposal remains entirely compatible with Swift 3, except where existing source runs up against the narrower symbol identifier space. It's a specific goal to avoid breaking reasonable current practice where possible, though we're surely going to break something with this one.

I was trained to write specifications in a school that favored rigorous writing. In order to make sure I didn't lose track of something I rewrote the proposal in a form that I know how to use effectively. Any loss of "fun" in the text is my fault alone.

Interested to see how this will be received.

Jonathan
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Jonathan, that’s a really nice proposal! :-)

You also already achieved some points from your „future directions“ ;-)

— Martin

This is looking like great progress over the original proposal and (to restate the obvious) I’m a huge fan of your (and Jacob, Erica, and Xiaodi’s) work on this proposal. Thank you ALL for driving this forward, it is a very messy but critical task to get right for Swift 4.

My biggest concern is the Operator Definition section (https://github.com/jsshapiro/swift-evolution/blob/unicode-id-op/proposals/NNNN-refining-identifiers-and-operators.md#operator-definition)

The issue is that this breaks the ability to separately parse files. The C family of languages generally requires the parser to reason about a “translation unit”, and uses a technique colloquially known as the “lexer hack” to determine whether an identifier token refers to a type or a value definition. Once the lexer determines this, the parser behaves differently. Your proposal introduces an analogous “lexer hack”, for the purposes of determining whether a token is an identifier or an operator.

This is deeply concerning to me, because this would break the ability to parse a file without parsing all of its dependencies. Our current support for this has some well known problems (e.g. you can’t resolve expressions to a particularly useful representation: you get SequenceExprs unless you can perform name lookup) but we do have the ability to parse *declarations* with high fidelity.

This ability is extremely important for us to be able to get incremental compilation of an individual source file (which typically depends on declarations from other source files, but does not depend on the contents of their bodies) and is important for various latency-critical IDE features. For example, if you perform a code completion within a function, you generally only need to fully type check the function body that you’re inside, everything else in the file can be ignored, and only referenced declarations need to be (recursively) checked.
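
To make the concern concrete, here is a minimal Swift sketch of the two classification strategies. All names (`TokenKind`, `classifyLexically`, `classifyWithDeclarations`) are hypothetical and come from neither the proposal nor the Swift compiler, and the character set is a deliberately crude stand-in for a real operator-character set.

```swift
// Hypothetical sketch: two ways a lexer could decide whether a token is an operator.
enum TokenKind: Equatable {
    case identifier
    case operatorToken
}

// Strategy 1: purely lexical. The answer depends only on the token's own
// characters, so a file can be tokenized without consulting any other file.
func classifyLexically(_ token: String) -> TokenKind {
    // Crude stand-in for an operator-character set (assumption for illustration).
    let operatorCharacters: Set<Character> = [
        "+", "-", "*", "/", "<", ">", "=", "!", "%", "&", "|", "^", "~", "?",
    ]
    let allOperatorCharacters = token.allSatisfy { operatorCharacters.contains($0) }
    return (!token.isEmpty && allOperatorCharacters) ? .operatorToken : .identifier
}

// Strategy 2: declaration-dependent, in the style of the "lexer hack". The
// answer depends on which names have been declared as operators, possibly in
// other files, so those declarations must be known before parsing this file.
func classifyWithDeclarations(_ token: String, declaredOperators: Set<String>) -> TokenKind {
    return declaredOperators.contains(token) ? .operatorToken : .identifier
}

let lexical = classifyLexically("+")
let withoutImports = classifyWithDeclarations("dot", declaredOperators: [])
let withImports = classifyWithDeclarations("dot", declaredOperators: ["dot"])
```

In the second strategy the same token `dot` is classified differently depending on what has been imported, which is the dependency between files described above.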

I think you have a good set of goals and motivations, and I agree that aligning with UAX31’s ultimate resolution is important. That said, I hope that we can tackle this in Swift 4 by taking a small but reasonable subset of what UAX31 is “certainly” going to support, and then wait for UAX31 to be finalized before expanding the rest out.

I’m not a Unicode expert by any stretch of the imagination, but would it be possible to carve off some obvious blocks of emoji support as identifiers (e.g. not symbols, flags, or anything else complicated), and carve off the most obvious blocks of the math operators as operators? For the operator set, maybe we could start with some small subset of 100 (totally random number here) operators that are commonly requested and seem obvious, then expand it out to a principled set once UAX31 is resolved? This would avoid regressing too much in Swift 4 from Swift 3, but also dramatically limit the risk of painting ourselves into a corner that turns out to be incompatible with UAX31.

-Chris

···

All:

Xiaodi has pointed out that the existing Unicode emoji set pulls in a host
of stuff we don't want. My bad for failing to adequately understand what
that file represented.

We can either pull in selected blocks with appropriate sanity filters or
possibly consider adopting the Unicode pictographics. I'll look at cleaning
this up in the morning. If somebody has put forth a clean proposal by then,
so much the better.

Jonathan

I just read through your new proposal, and I have to say it is extremely
well-written. There is a vast quantity of information presented quite
clearly, and it gives me a lot to think about.

I think it is plainly evident that the well-defined criteria you would
like to use *have not yet been defined* by Unicode. That is a large part of
why I recommend that we postpone a major overhaul of our operator
characters.

That's a feasible way to go, but keep in mind that the UAX31 changes are
being co-designed with and informed by the current discussion. There are a
bunch of things that have come up here that will allow UAX31 to side-step
some "might have happened" mistakes, so this discussion has been very
useful.

The Swift community can and should make its own decision about whether to
remain engaged. The risk of disengagement is that messy compatibility
issues will probably have to be faced later that we can easily head-off now.

Ah, I had not previously understood that. Well then, in light of the fact
that the Unicode recommendations may be influenced by our decisions, and
given that Swift is an opinionated language, it follows that we ought to
make our best effort at separating out what we have been calling “operator
characters” (and your revised proposal calls “symbol identifier”
characters).

In particular, since there does not yet exist a categorization of symbols
which fits our needs, and since our needs may help shape such a
categorization as it forms, it behooves us to fully undertake the endeavor
of defining which symbols we would like to see in which roles for Swift.

Your proposal mentions and links to a set of 650 code points that your
group identified by hand as operators. It also links to the combined Sm
and So categories. However, what you actually propose is the far more
limited Mathematical Operators block.

I will take it upon myself to go through code-points by hand and see what I
can find.

It is worth noting that your proposed “symbol identifier” category, by its
very name, suggests it should have broader membership than just operators.
I am not sure if that was intentional, however I will restrict my attention
to symbols that may reasonably function as operators.

After a preliminary glance through the code blocks, I believe there are
operator-like characters in these blocks:
Basic Latin
Latin-1 Supplement
General Punctuation
Letterlike Symbols
Arrows
Mathematical Operators
Miscellaneous Technical
Miscellaneous Mathematical Symbols-A
Supplemental Arrows-A
Supplemental Arrows-B
Miscellaneous Mathematical Symbols-B
Supplemental Mathematical Operators
Miscellaneous Symbols and Arrows
Supplemental Punctuation

Furthermore, the following blocks *may* have symbols that we want to allow
in operators:
Box Drawing
Block Elements
Geometric Shapes
Miscellaneous Symbols
Dingbats
Braille Patterns
CJK Symbols and Punctuation
Yijing Hexagram Symbols
Ancient Symbols
Musical Symbols
Tai Xuan Jing Symbols

I think that covers all the blocks with potentially operator-like
characters. When I have had time to go through character by character I
will report back my findings.

Nevin

···

On Fri, Oct 21, 2016 at 5:38 PM, Jonathan S. Shapiro <jonathan.s.shapiro@gmail.com> wrote:

On Fri, Oct 21, 2016 at 1:54 PM, Nevin Brackett-Rozinsky via swift-evolution <swift-evolution@swift.org> wrote:

On Sat, Oct 22, 2016 at 12:59 AM, Jonathan S. Shapiro via swift-evolution < swift-evolution@swift.org> wrote:

All:

Jacob has already identified a *big* hole in the proposal, which is that
it doesn't define how operator-bound identifiers are treated by import.
That definitely needs to be addressed by the proposal. It's
straightforward, but easy to get wrong. I will address that early tomorrow.

Jonathan

All right, I have gone through and formulated a set of characters to serve
as the core of our operator symbols. I started with [:Sm:] and removed
blocks and subheaders which are not clearly useful as operators (though may
be reincorporated selectively in the future). Then I added the rest of the
Arrows block, as well as punctuation symbols that are “operator-like”.

In particular, I kept Swift’s existing ASCII operators, and all of Swift’s
Latin-1 operators except for currency signs and the copyright and
registered trademark symbols. I also kept most of Swift’s existing General
Punctuation operators.

The end result is a set of 1,020 operator characters
<http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[%3ASm%3A] -\p{Block%3DSuperscripts+And+Subscripts} -\p{Block%3DMiscellaneous+Technical} -\p{Block%3DGeometric+Shapes} -\p{Block%3DMiscellaneous+Symbols} -\p{Block%3DAlphabetic+Presentation+Forms} -\p{Block%3DSmall+Form+Variants} -\p{Block%3DHalfwidth+And+Fullwidth+Forms} -\p{Block%3DMathematical+Alphanumeric+Symbols} -\p{Block%3DArabic+Mathematical+Alphabetic+Symbols} -\p{subhead%3DVariant+letterforms+and+symbols} -\p{subhead%3DLetterlike+symbol} \p{Block%3DArrows} [%2F+%3D+\-+%2B+!+*+%25+<+>+\%26+|+\^+~+%3F] [¡+¢+£+¤+¥+¦+§+©+«+¬+®+°+±+¶+»+¿]+-+[¢+£+¤+¥+©+®] \p{subhead%3DGeneral+punctuation}+-+[U%2B203F+U%2B2040+U%2B2045+U%2B2046+U%2B2054] \p{subhead%3DDouble+punctuation+for+vertical+text} \p{subhead%3DArchaic+punctuation}+-+[U%2B2E31+U%2B2E33+U%2B2E34+U%2B2E3F] U%2B214B]&g=&i=>,
which removes 1,628 symbols from Swift’s existing operator set
and adds just 4 new ones
<http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[%3ASm%3A] -\p{Block%3DSuperscripts+And+Subscripts} -\p{Block%3DMiscellaneous+Technical} -\p{Block%3DGeometric+Shapes} -\p{Block%3DMiscellaneous+Symbols} -\p{Block%3DAlphabetic+Presentation+Forms} -\p{Block%3DSmall+Form+Variants} -\p{Block%3DHalfwidth+And+Fullwidth+Forms} -\p{Block%3DMathematical+Alphanumeric+Symbols} -\p{Block%3DArabic+Mathematical+Alphabetic+Symbols} -\p{subhead%3DVariant+letterforms+and+symbols} -\p{subhead%3DLetterlike+symbol} \p{Block%3DArrows} [%2F+%3D+\-+%2B+!+*+%25+<+>+\%26+|+\^+~+%3F] [¡+¢+£+¤+¥+¦+§+©+«+¬+®+°+±+¶+»+¿]+-+[¢+£+¤+¥+©+®] \p{subhead%3DGeneral+punctuation}+-+[U%2B203F+U%2B2040+U%2B2045+U%2B2046+U%2B2054] \p{subhead%3DDouble+punctuation+for+vertical+text} \p{subhead%3DArchaic+punctuation}+-+[U%2B2E31+U%2B2E33+U%2B2E34+U%2B2E3F] U%2B214B] -[%2F+%3D+\-+%2B+!+*+%25+<+>+\%26+|+\^+~+%3F U%2B00A1+-+U%2B00A7 U%2B00A9+U%2B00AB+U%2B00AC+U%2B00AE U%2B00B0+-+U%2B00B1 U%2B00B6+U%2B00BB+U%2B00BF+U%2B00D7+U%2B00F7 U%2B2016+-+U%2B2017 U%2B2020+-+U%2B2027 U%2B2030+-+U%2B203E U%2B2041+-+U%2B2053 U%2B2055+-+U%2B205E U%2B2190+-+U%2B23FF U%2B2500+-+U%2B2775 U%2B2794+-+U%2B2BFF U%2B2E00+-+U%2B2E7F U%2B3001+-+U%2B3003 U%2B3008+-+U%2B3030]&g=&i=>
(⅀ ؆ ؇ ⅋). I left out the “Full Stop” character, to be dealt with by
whatever rules we decide upon for dots in operators.

Here is the classification of the 1,020 characters I have identified as
operators:

[[:Sm:]
-\p{Block=Superscripts And Subscripts}
-\p{Block=Miscellaneous Technical}
-\p{Block=Geometric Shapes}
-\p{Block=Miscellaneous Symbols}
-\p{Block=Alphabetic Presentation Forms}
-\p{Block=Small Form Variants}
-\p{Block=Halfwidth And Fullwidth Forms}
-\p{Block=Mathematical Alphanumeric Symbols}
-\p{Block=Arabic Mathematical Alphabetic Symbols}
-\p{subhead=Variant letterforms and symbols}
-\p{subhead=Letterlike symbol}
\p{Block=Arrows}
[/ = \- + ! * % < > \& | \^ ~ ?]
[¡ ¢ £ ¤ ¥ ¦ § © « ¬ ® ° ± ¶ » ¿] - [¢ £ ¤ ¥ © ®]
\p{subhead=General punctuation} - [U+203F U+2040 U+2045 U+2046 U+2054]
\p{subhead=Double punctuation for vertical text}
\p{subhead=Archaic punctuation} - [U+2E31 U+2E33 U+2E34 U+2E3F]
U+214B]
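
As a rough illustration of how a set expression like the one above could be tested in code, the following Swift sketch approximates only part of it: general category Sm, minus a few of the excluded blocks, plus the whole Arrows block. The function name `isCandidateOperatorScalar` is hypothetical, the subhead-based terms and the punctuation additions are omitted, and the block ranges are written out by hand, so it will not reproduce the 1,020-character set exactly.

```swift
// Hypothetical sketch: approximates part of the set expression above.
// Requires Swift 5 or later for Unicode.Scalar.Properties.

// Blocks subtracted from [:Sm:] (only a subset of the exclusions listed above).
let excludedBlocks: [ClosedRange<UInt32>] = [
    0x2070...0x209F,    // Superscripts and Subscripts
    0x2300...0x23FF,    // Miscellaneous Technical
    0x25A0...0x25FF,    // Geometric Shapes
    0x2600...0x26FF,    // Miscellaneous Symbols
    0x1D400...0x1D7FF,  // Mathematical Alphanumeric Symbols
]

// The Arrows block is added in full, regardless of general category.
let arrowsBlock: ClosedRange<UInt32> = 0x2190...0x21FF

func isCandidateOperatorScalar(_ scalar: Unicode.Scalar) -> Bool {
    if arrowsBlock.contains(scalar.value) { return true }
    guard scalar.properties.generalCategory == .mathSymbol else { return false }
    let isExcluded = excludedBlocks.contains { $0.contains(scalar.value) }
    return !isExcluded
}
```

A complete implementation would need the block and subhead data from the Unicode Character Database rather than hand-written ranges.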

Additionally, I think it is worthwhile to consider including the “Drafting
symbols” subheader and most of the “Miscellaneous technical” subheader.
This would add 34 more operator characters.

I did not consider non-head operator characters, which are predominantly
combining marks and variation selectors, and should probably stay
essentially as they are. Also, I kept the empty set and infinity sign as
operators, though we may want to change that.

There are a lot more symbols that could potentially become operators (e.g.
shapes, currency signs, APL, etc.). However in light of the prevailing view
that we should start conservatively and add more in the future, I believe
this set of 1,020 characters is a good place to begin.

Nevin

+1 to this approach.

Regarding emoji, I don’t really use them much myself, so I’m in favor of the minimal amount of work needed to get them under control (for now). I’d rather see the effort spent elsewhere. The operator stuff is interesting though.

-Matt

···

On Oct 24, 2016, at 22:33, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:
would it be possible to carve off some obvious blocks of emoji support as identifiers (e.g. not symbols, flags, or anything else complicated), and carve off the most obvious blocks of the math operators as operators? For the operator set, maybe we could start with some small subset of 100 (totally random number here) operators that are commonly requested and seem obvious, then expand it out to a principled set once UAX31 is resolved?

I think that covers all the blocks with potentially operator-like
characters. When I have had time to go through character by character I
will report back my findings.

A previous version of our proposal went through these blocks
character-by-character. It was not a fruitful exercise, and absent a
compelling use case, I would not recommend it, as Jonathan has made clear
that whatever UAX#31 ends up doing, it's definitely the case that Unicode
will not be adopting this character-by-character approach.

···

On Sat, Oct 22, 2016 at 1:37 AM, Nevin Brackett-Rozinsky via swift-evolution <swift-evolution@swift.org> wrote:

On Fri, Oct 21, 2016 at 5:38 PM, Jonathan S. Shapiro <jonathan.s.shapiro@gmail.com> wrote:

On Fri, Oct 21, 2016 at 1:54 PM, Nevin Brackett-Rozinsky via swift-evolution <swift-evolution@swift.org> wrote:

Nevin

Ah, I had not previously understood that. Well then, in light of the fact
that the Unicode recommendations may be influenced by our decisions, and
given that Swift is an opinionated language, it follows that we ought to
make our best effort at separating out what we have been calling “operator
characters” (and your revised proposal calls “symbol identifier”
characters).

In particular, since there does not yet exist a categorization of symbols
which fits our needs, and since our needs may help shape such a
categorization as it forms, it behooves us to fully undertake the endeavor
of defining which symbols we would like to see in which roles for Swift.

The Unicode standard has four well-established and general categories for
symbols:

Sc: Symbol, Currency
Sk: Symbol, Modifier
Sm: Symbol, Math
So: Symbol, Other

For our purposes these aren't a terribly helpful set of assignments. The
assignments in some places were arbitrary between categories S and P, and
the conceptual model used wasn't necessarily appropriate for programming
purposes. It might be a good idea to start by reading Chapter 22 (Symbols
<http://www.unicode.org/versions/Unicode9.0.0/ch22.pdf>) of the current
Unicode standard.

It would be a mistake to equate symbol identifiers with operators (which
are verbs). For example: my initial thought was to want to exclude the
LetterLike symbols block, but once the notion of operator and symbol
identifier are distinct it is no longer clear to me that this should be
done. The more pertinent question would seem to be "What kind of
identifiers should Letterlike Symbols be?" That seems to be driven more by
whitespace considerations than anything else. If we want to be able to
write something like

3*ℇ // three times the Euler constant (U+2107)

without white space, then we want ℇ to fall into normal identifiers. What
we'd ideally like to have is "noun identifiers" and "verb identifiers",
because this would correspond to the best white space outcome.
Unfortunately that ship sailed well before FORTRAN was standardized, and
the best we can do now is "get it right enough" and accept the need for
white space when that can't be done adequately. The conceptual difficulty
with defining symbol identifiers as nouns or verbs is that symbols in
mathematical use are assigned to meanings arbitrarily on a case-by-case
basis for the convenience of that individual paper. There really *isn't* a
general consensus in mathematical symbols about what is a noun and what is
a verb. There *certainly* isn't a consensus for symbol identifiers
involving more than one glyph; the style of formal mathematics favors
single-letter variables in various scripts with decorative modifiers. For
this reason, the best outcome we can achieve will be "right enough" rather
than perfect, and white space will still be required in some cases.
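
For reference, the whitespace point can be tried in Swift as it stands today. In the sketch below, `ℇ` (U+2107) is accepted as an ordinary identifier by the current grammar, so `3*ℇ` needs no spaces, whereas `∘` (U+2218) is an operator character and is declared as an infix operator. The numeric value bound to `ℇ` and the choice of `∘` for function composition are assumptions made only for illustration.

```swift
// A noun-like symbol used as an ordinary identifier (the value is an assumption).
let ℇ = 0.5772156649
let tripled = 3*ℇ   // no whitespace needed around the identifier

// A verb-like symbol declared as an operator in today's Swift.
infix operator ∘

// Function composition: (f ∘ g)(x) == f(g(x)).
func ∘ <A, B, C>(f: @escaping (B) -> C, g: @escaping (A) -> B) -> (A) -> C {
    return { x in f(g(x)) }
}

let addOne: (Int) -> Int = { $0 + 1 }
let double: (Int) -> Int = { $0 * 2 }
let addOneThenDouble = double ∘ addOne
```

Here the noun/verb split happens to line up with the identifier/operator split, but as noted above that cannot be expected in general.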

I'm coming around to the view that a new Unicode property is actually
warranted, but that will take time. My reasoning in proposing that we start
with the Mathematical Operators block had four parts:

   1. It's enough to make forward progress
   2. It allows most existing code to survive without breakage
   3. All of the code points in that particular block are pretty clearly
   things that want to be symbol identifiers
   4. It buys time for the Unicode group to negotiate the creation of a new
   property.

If we really want a good, fine-grain organization of code points, we need
to buy time for a property definition to happen over in Unicode-land, and
we need to avoid stepping on future backward-compatibility issues while
that happens. For this reason, I think we should be looking to define the
smallest set of symbol identifier codepoints that we think we can live with
for now given the current state of source code in the field. Everything we
add poses a risk of future backwards compatibility concerns.

What *would* be useful would be to go through each of the blocks mentioned
at the top of Chapter 22 of the standard, and characterize each one as
"mostly normal identifier" or "mostly symbol identifier". We can then go
through and identify the exceptions in each block. We should *avoid* the
Punctuation category and associated blocks at this time; category
assignments in that space were pretty arbitrary, so those will take a fair
bit of work to sort out.

Ideally, I'd like to see all of Sc (currency symbols) end up in "symbol
identifiers" for consistency. The hold-out at the moment is '$'. As it turns
out, we could probably get away with adding decimal digits to
symbol-identifier-continue and then admit '$' in symbol identifiers rather
than conventional identifiers without breaking existing code, but I'm not
sure this clean-up is worth the consternation and worry that it will cause.
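
The interaction with anonymous closure parameters can be sketched as follows. The classifier is hypothetical: `isDollarSymbolIdentifier` and its head/continue rule are assumptions that model only names of the form '$' followed by one or more decimal digits, such as `$0`, and not the proposal's actual grammar.

```swift
// Existing practice that has to keep working: anonymous closure parameters.
let doubled = [1, 2, 3].map { $0 * 2 }

// Hypothetical rule: '$' as a symbol-identifier head, with ASCII decimal
// digits allowed as continue characters.
func isDollarSymbolIdentifier(_ text: String) -> Bool {
    let digits: ClosedRange<Character> = "0"..."9"
    guard text.first == "$" else { return false }
    let rest = text.dropFirst()
    let restIsDigits = rest.allSatisfy { digits.contains($0) }
    return !rest.isEmpty && restIsDigits
}
```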

It is worth noting that your proposed “symbol identifier” category, by its
very name, suggests it should have broader membership than just operators.
I am not sure if that was intentional.

That is very much intentional. Our attention should *not* be restricted
solely to operators. It's a general identifier space.

Jonathan

···

On Fri, Oct 21, 2016 at 11:37 PM, Nevin Brackett-Rozinsky <nevin.brackettrozinsky@gmail.com> wrote:

Re: <https://github.com/jsshapiro/swift-evolution/blob/unicode-id-op/proposals/NNNN-refining-identifiers-and-operators.md>

### Conventional Identifiers

`conventional-ident-head → [:Emoji:]`
will include the keycap base characters ('#', '*', ASCII digits).
<http://www.unicode.org/Public/emoji/4.0/emoji-data.txt>
<http://www.unicode.org/Public/emoji/4.0/emoji-sequences.txt>

Does the NFC format prevent emoji ZWJ sequences?
<http://www.unicode.org/Public/emoji/4.0/emoji-zwj-sequences.txt>
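
The keycap observation can be checked against the Unicode properties exposed by the Swift standard library (Swift 5 or later). This sketch only queries the `Emoji` property and counts the scalars in one ZWJ sequence; it does not settle the NFC question, and the helper name `emojiPropertyScalars` is hypothetical.

```swift
// Which scalars of a string carry the Unicode Emoji property?
func emojiPropertyScalars(in text: String) -> [Unicode.Scalar] {
    return Array(text.unicodeScalars.filter { $0.properties.isEmoji })
}

// '#', '*' and the ASCII digits are keycap bases, so they have Emoji=Yes;
// the letter 'a' does not.
let keycapBases = emojiPropertyScalars(in: "#*7a")

// A ZWJ sequence (man, ZWJ, woman, ZWJ, girl) is several scalars joined by U+200D.
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}"
let familyScalarCount = family.unicodeScalars.count
```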

### Symbol Identifiers

You could include both blocks of mathematical operators.
* U+2200 ... U+22FF [:Block=Mathematical_Operators:]
* U+2A00 ... U+2AFF [:Block=Supplemental_Mathematical_Operators:]
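
A membership test for these two blocks is a simple range check. The function and constant names below are hypothetical; the ranges are the ones listed above.

```swift
let mathematicalOperators: ClosedRange<UInt32> = 0x2200...0x22FF
let supplementalMathematicalOperators: ClosedRange<UInt32> = 0x2A00...0x2AFF

// True when the scalar lies in either block of mathematical operators.
func isInMathOperatorBlocks(_ scalar: Unicode.Scalar) -> Bool {
    return mathematicalOperators.contains(scalar.value)
        || supplementalMathematicalOperators.contains(scalar.value)
}
```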

-- Ben