[Proposal] Refining Identifier and Operator Symbology


(J.E. Schotsman) #1

+1

Jan E.

···

On 20 Oct 2016, at 08:34, Nevin Brackett-Rozinsky wrote:

So I am strongly—adamantly—opposed to the operator-eviscerating portion of
this proposal.

We should make Braille characters, Hangzhou numerals, the empty set and the
infinity sign into identifiers. All other operators should remain as they
are until official Unicode recommendations exist, at which point we should
deprecate as necessary.


(Jonathan S. Shapiro) #2

Quick poll as a sanity check on a possible alternative for operators:

If we admitted [:Sm:] and [:So:] and the traditional ASCII operator
characters, would that cover the things that people currently feel
passionate about? That would almost certainly be compliant with UAX31 once
it settles, and I *think* it covers all of the cases people have raised
here.

Useful links if you want to check:

[:Sm:] Symbol, Math
<http://www.fileformat.info/info/unicode/category/Sm/list.htm>

[:So:] Symbol, Other
<http://www.fileformat.info/info/unicode/category/So/list.htm>

Having looked it over, I'm concerned about including [:Sk:] in UAX31
operators, and I'm probably going to recommend in the UAX31 discussion that
we shouldn't do so.

Jonathan
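
For readers who want to try the poll against concrete characters, here is a minimal sketch. It assumes a modern Swift toolchain (Swift 5 or later), whose `Unicode.Scalar.Properties` API did not exist when this thread was written. The helper name `isPollOperatorCandidate` is hypothetical, and the sketch applies only the three inclusions named in the poll, without any further exclusions.

```swift
// ASCII characters that Swift's grammar already accepts in operators.
let asciiOperatorCharacters: Set<Unicode.Scalar> = [
    "/", "=", "-", "+", "!", "*", "%", "<", ">", "&", "|", "^", "~", "?",
]

// Hypothetical helper: true if the scalar is in [:Sm:], [:So:],
// or the traditional ASCII operator set.
func isPollOperatorCandidate(_ scalar: Unicode.Scalar) -> Bool {
    if asciiOperatorCharacters.contains(scalar) { return true }
    switch scalar.properties.generalCategory {
    case .mathSymbol, .otherSymbol:
        return true
    default:
        return false
    }
}

// Example queries (no output is asserted here):
// isPollOperatorCandidate("∞"), isPollOperatorCandidate("a")
```

Under this unrefined reading, anything with General_Category Sm or So qualifies, which makes it easy to probe individual characters before arguing about them.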


(Matthew Johnson) #3

Quick poll as a sanity check on a possible alternative for operators:

If we admitted [:Sm:] and [:So:] and the traditional ASCII operator characters, would that cover the things that people currently feel passionate about? That would almost certainly be compliant with UAX31 once it settles, and I think it covers all of the cases people have raised here.

Useful links if you want to check:

[:Sm:] Symbol, Math <http://www.fileformat.info/info/unicode/category/Sm/list.htm>
[:So:] Symbol, Other <http://www.fileformat.info/info/unicode/category/So/list.htm>

Having looked it over, I'm concerned about including [:Sk:] in UAX31 operators, and I'm probably going to recommend in the UAX31 discussion that we shouldn't do so.

On a quick glance, I think this would be acceptable to me.

···

On Oct 20, 2016, at 9:29 AM, Jonathan S. Shapiro via swift-evolution <swift-evolution@swift.org> wrote:

Jonathan
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Rob Mayoff) #4

This would define both ∞ and ∅ as operators.

···

On Thu, Oct 20, 2016 at 9:29 AM, Jonathan S. Shapiro via swift-evolution <swift-evolution@swift.org> wrote:



(Alex Blewitt) #5

The "Symbol, Other" category contains "Sign of the Horns" 🤘 which was one of the problems with the identifier/operator that kicked off these discussions.

http://www.fileformat.info/info/unicode/char/1f918/index.htm

So it would break some existing cases, e.g.:

  1> let 🤓 = "nerd face"
🤓: String = "nerd face"

http://www.fileformat.info/info/unicode/char/1f913/index.htm

On the other hand, there are some symbols in [:So:] that may be useful e.g. the APL Functional Symbol * series

It might be easier to have just [:Sm:] to start with, and review the [:So:] subsequently (or have those addressed in UAX31).

Alex

···

On 20 Oct 2016, at 15:29, Jonathan S. Shapiro via swift-evolution <swift-evolution@swift.org> wrote:



(Ben Rimmington) #6

I'd exclude [:So:], arrows, brackets and "Miscellaneous Mathematical Symbols B".

  [:Math_Symbol:]
  - [:name=/\bANGLE\b/:]
  - [:name=EMPTY SET:]
  - [:name=INFINITY:]
  - [:Emoji:]
  - [:ID_Continue:]
  - [:NFC_Quick_Check=No:]
  & [:Script_Extensions=Common:]
  & [[:Block=Latin_1_Supplement:]
     [:Block=General_Punctuation:]
     [:Block=Mathematical_Operators:]
     [:Block=Miscellaneous_Mathematical_Symbols_A:]
     [:Block=Supplemental_Mathematical_Operators:]]

<http://www.unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3AMath_Symbol%3A] -+[%3Aname%3D%2F\bANGLE\b%2F%3A] -+[%3Aname%3DEMPTY+SET%3A] -+[%3Aname%3DINFINITY%3A] -+[%3AEmoji%3A] -+[%3AID_Continue%3A] -+[%3ANFC_Quick_Check%3DNo%3A] %26+[%3AScript_Extensions%3DCommon%3A] %26+[[%3ABlock%3DLatin_1_Supplement%3A] +++[%3ABlock%3DGeneral_Punctuation%3A] +++[%3ABlock%3DMathematical_Operators%3A] +++[%3ABlock%3DMiscellaneous_Mathematical_Symbols_A%3A] +++[%3ABlock%3DSupplemental_Mathematical_Operators%3A]]>

Is there a property to test for "pictographic" characters?

-- Ben
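
The set expression above can be approximated in Swift as a rough sketch. Several things here are assumptions: the block ranges are hard-coded because the standard library exposes no Block property, the `NFC_Quick_Check` and `Script_Extensions` terms are omitted because the standard library does not expose them either, and the names `allowedBlocks`, `nameContainsWord` and `isInProposedSet` are hypothetical. It needs Swift 5 or later.

```swift
// Code point ranges of the five blocks named in the expression.
let allowedBlocks: [ClosedRange<UInt32>] = [
    0x0080...0x00FF, // Latin-1 Supplement
    0x2000...0x206F, // General Punctuation
    0x2200...0x22FF, // Mathematical Operators
    0x27C0...0x27EF, // Miscellaneous Mathematical Symbols-A
    0x2A00...0x2AFF, // Supplemental Mathematical Operators
]

// Stand-in for the /\bANGLE\b/ name test: matches ANGLE as a whole word.
func nameContainsWord(_ word: String, in scalar: Unicode.Scalar) -> Bool {
    guard let name = scalar.properties.name else { return false }
    let words = name.split(whereSeparator: { $0 == " " || $0 == "-" })
    return words.contains { $0 == word }
}

// Math symbols, minus the listed exclusions, restricted to the blocks.
// NFC_Quick_Check and Script_Extensions are NOT checked in this sketch.
func isInProposedSet(_ scalar: Unicode.Scalar) -> Bool {
    let p = scalar.properties
    guard p.generalCategory == .mathSymbol else { return false }
    if nameContainsWord("ANGLE", in: scalar) { return false }
    if p.name == "EMPTY SET" || p.name == "INFINITY" { return false }
    if p.isEmoji || p.isIDContinue { return false }
    return allowedBlocks.contains { $0.contains(scalar.value) }
}
```

Because two terms of the original expression are missing, this predicate may accept a few scalars that the UnicodeSet query rejects; the linked utility remains the reference.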

···

On 20 Oct 2016, at 15:29, Jonathan S. Shapiro wrote:

Quick poll as a sanity check on a possible alternative for operators:

If we admitted [:Sm:] and [:So:] and the traditional ASCII operator characters, would that cover the things that people currently feel passionate about? That would almost certainly be compliant with UAX31 once it settles, and I think it covers all of the cases people have raised here.


(Jonathan S. Shapiro) #7

Actually not. In the interests of sanity I didn't give the full
specification statement, which excludes anything in XIDC, emojis, or
pictographics. From the UAX31 perspective those are sanity conditions, and
I think they would be prudent for Swift as well.

You can still put the horns on somebody, but you have to use a conventional
identifier.... 🙂

Jonathan
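
A minimal sketch of those sanity conditions, assuming Swift 5 or later. The function name `isOperatorCandidate` is hypothetical, and the "pictographic" exclusion is not modelled because, as asked earlier in the thread, it is unclear which property would test for it. Only the XID_Continue and Emoji exclusions are applied here.

```swift
// Hypothetical refinement of the Sm + So rule: a symbol qualifies only if
// it is neither an identifier-continue character nor an emoji.
func isOperatorCandidate(_ scalar: Unicode.Scalar) -> Bool {
    let p = scalar.properties
    let isSymbol = p.generalCategory == .mathSymbol
        || p.generalCategory == .otherSymbol
    // Exclusions: XID_Continue and Emoji. Pictographics are not checked.
    return isSymbol && !p.isXIDContinue && !p.isEmoji
}
```

With these exclusions the horns no longer qualify, and neither do arrows that carry the Emoji property, such as U+2194.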

···

On Thu, Oct 20, 2016 at 8:22 AM, Alex Blewitt <alblue@apple.com> wrote:

The "Symbol, Other" category contains "Sign of the Horns" 🤘 which was
one of the problems with the identifier/operator that kicked off these
discussions.


(Benjamin Spratling) #8

Brackets and symbols are definitely operators. Different brackets are used to represent various quantum mechanical forms and operations.
Arrows are also useful as operators, including but not restricted to chemical reactions.

-Ben

···

Sent from my iPhone.

On Oct 21, 2016, at 7:20 AM, Ben Rimmington via swift-evolution <swift-evolution@swift.org> wrote:



(Ben Rimmington) #9

Brackets and symbols are definitely operators. Different brackets are used to represent various quantum mechanical forms and operations.

The brackets are mostly "bracket pieces":

  [:Math_Symbol:]
  - [:name=/\bANGLE\b/:]
  - [:Emoji:]
  - [:ID_Continue:]
  - [:NFC_Quick_Check=No:]
  & [:Script_Extensions=Common:]
  & [:Block=Miscellaneous_Technical:]

<http://www.unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3AMath_Symbol%3A] -+[%3Aname%3D%2F\bANGLE\b%2F%3A] -+[%3AEmoji%3A] -+[%3AID_Continue%3A] -+[%3ANFC_Quick_Check%3DNo%3A] %26+[%3AScript_Extensions%3DCommon%3A] %26+[%3ABlock%3DMiscellaneous_Technical%3A]>

Arrows are also useful as operators, including but not restricted to chemical reactions.

Including arrows, there are 740 operators:

  [:Math_Symbol:]
  - [:name=/\bANGLE\b/:]
  - [:name=EMPTY SET:]
  - [:name=INFINITY:]
  - [:Emoji:]
  - [:ID_Continue:]
  - [:NFC_Quick_Check=No:]
  & [:Script_Extensions=Common:]
  & [[:Block=Arrows:]
     [:Block=General_Punctuation:]
     [:Block=Latin_1_Supplement:]
     [:Block=Mathematical_Operators:]
     [:Block=Miscellaneous_Mathematical_Symbols_A:]
     [:Block=Miscellaneous_Symbols_And_Arrows:]
     [:Block=Supplemental_Arrows_A:]
     [:Block=Supplemental_Arrows_B:]
     [:Block=Supplemental_Mathematical_Operators:]]

<http://www.unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3AMath_Symbol%3A] -+[%3Aname%3D%2F\bANGLE\b%2F%3A] -+[%3Aname%3DEMPTY+SET%3A] -+[%3Aname%3DINFINITY%3A] -+[%3AEmoji%3A] -+[%3AID_Continue%3A] -+[%3ANFC_Quick_Check%3DNo%3A] %26+[%3AScript_Extensions%3DCommon%3A] %26+[[%3ABlock%3DArrows%3A] +++[%3ABlock%3DGeneral_Punctuation%3A] +++[%3ABlock%3DLatin_1_Supplement%3A] +++[%3ABlock%3DMathematical_Operators%3A] +++[%3ABlock%3DMiscellaneous_Mathematical_Symbols_A%3A] +++[%3ABlock%3DMiscellaneous_Symbols_And_Arrows%3A] +++[%3ABlock%3DSupplemental_Arrows_A%3A] +++[%3ABlock%3DSupplemental_Arrows_B%3A] +++[%3ABlock%3DSupplemental_Mathematical_Operators%3A]]>

-- Ben

···

On 21 Oct 2016, at 13:42, Benjamin Spratling wrote:


(Nevin Brackett-Rozinsky) #10

I think it is important that we as a community discuss the non-operator
portion of this proposal. And, given the strong opinions about operators
that have been expressed, I think it is unlikely we will do so while major
operator changes are on the table.

Thus I would suggest that either the operator changes should be separated
out into their own proposal, or we should make only minor (and generally
consensus-agreed) changes to the operator set as part of this one.

Here is what I propose:

Emoji shall be identifiers, not operators.
The turned ampersand shall be an operator, not an identifier.
The empty set and infinity symbols shall be identifiers, not operators.

All other potential changes to the set of operator characters then go in
their own proposal, which I am sure will receive a lot of attention.

It may turn out that the non-operator portion of this proposal nonetheless
touches characters that Swift has designated for operators, in which case
we may address those as they arise.

Does that sound like a reasonable way forward?

Nevin

···

On Fri, Oct 21, 2016 at 9:27 AM, Ben Rimmington via swift-evolution <swift-evolution@swift.org> wrote:



(Xiaodi Wu) #11

I disagree pretty strongly with this approach. Firstly, because as you've
pointed out, changes to the operator characters will have effects on valid
operator character sets; secondly, because a major question about operators
is whether it is feasible or not to manually exclude empty set and infinity
given that Unicode is likely to reject that approach. For these reasons I
feel very strongly that reforming the operator and identifier
characters must move in tandem.

···

On Fri, Oct 21, 2016 at 14:28 Nevin Brackett-Rozinsky via swift-evolution <swift-evolution@swift.org> wrote:



(Jonathan S. Shapiro) #12

A few other thoughts:

* One of our (or at least my) overarching goals for this proposal is to
refine identifier and operator sets by moving away from an ad-hoc
character-by-character approach to a systematic treatment that relies on
well-defined criteria to select blocks of characters for inclusion.

I completely agree with this, up to the point where you wrote "select
blocks". That doesn't seem to be the way things are selected in the Unicode
universe.

We can choose to adopt blocks as an interim measure, but we should not
lose sight of the notion that this should eventually be property-driven.

* Particularly if the community insists on emoji being identifiers, we will
have to critically evaluate how to proceed on that in tandem with
operators, because there exist emojified operators and arrows, and certain
emoji are encoded in blocks that are otherwise symbols, which Unicode is
likely to deem as operators in the future.

This is not correct. The current view in the UAX31 discussion is that
emojis and pictographics should be excluded from *both* types of
identifiers so that individual programming languages can make
language-specific choices.

The main current problem is that there is no clear-cut Unicode property
covering Emojis at the present time, which is something that needs to be
resolved over in Unicode-land. There is a list given in the antique texty
UCD file format, but it isn't part of the XML formulation of the UCD
database. I'll be generating a proposed update shortly, and when I have
that I can provide a working list in the form of a C file that can be used
by Swift.

I have a mild preference that emojis should live in conventional
identifiers if they are adopted.

Moreover, IIUC, certain codepoints can be either emoji or non-emoji
symbols and variation selectors can specify which they are...

Can you expand on this and (hopefully) point me at the appropriate spot in
one of the Unicode TRs?

Thanks!

Jonathan

···

On Fri, Oct 21, 2016 at 1:10 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:


(Jonathan S. Shapiro) #13

That's a feasible way to go, but keep in mind that the UAX31 changes are
being co-designed with and informed by the current discussion. There are a
bunch of things that have come up here that will allow UAX31 to side-step
some "might have happened" mistakes, so this discussion has been very
useful.

The Swift community can and should make its own decision about whether to
remain engaged. The risk of disengagement is that messy compatibility
issues will probably have to be faced later that we can easily head-off now.

Jonathan

···

On Fri, Oct 21, 2016 at 1:54 PM, Nevin Brackett-Rozinsky via swift-evolution <swift-evolution@swift.org> wrote:

I think it is plainly evident that the well-defined criteria you would
like to use *have not yet been defined* by Unicode. That is a large part of
why I recommend that we postpone a major overhaul of our operator
characters.


(Xiaodi Wu) #14

A few other thoughts:

* One of our (or at least my) overarching goals for this proposal is to
refine identifier and operator sets by moving away from an ad-hoc
character-by-character approach to a systematic treatment that relies on
well-defined criteria to select blocks of characters for inclusion. My
personal opinion is that reverting to discussions of individual characters,
such as the turned ampersand, means we're simply re-shuffling the current
set of identifier and operator characters to a different one, having failed
to discover any systematic basis for doing so. I think one of the strongest
parts of this proposal is the straight-up statement that identifier start
characters shall be IDC_Start and identifier continuation characters shall
be IDC_Continue, modulo a few ASCII characters that require special
treatment in Swift.

* Particularly if the community insists on emoji being identifiers, we will
have to critically evaluate how to proceed on that in tandem with
operators, because there exist emojified operators and arrows, and certain
emoji are encoded in blocks that are otherwise symbols, which Unicode is
likely to deem as operators in the future. Moreover, IIUC, certain
codepoints can be either emoji or non-emoji symbols and variation selectors
can specify which they are, but in the absence of a variation selector
*either* the emoji or non-emoji version can be correctly displayed
depending on the platform. For some of these, the non-emoji version is
either clearly or plausibly an operator character. Therefore, without
dealing *very* carefully with emoji and with operators at the same time, we
are failing to address a key motivation of this proposal, which is to fix
the incorrect separation between emoji operators and emoji identifiers.
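
The identifier rule in the first point can be sketched as follows, assuming Swift 5 or later. Two choices here are assumptions rather than anything settled in this thread: the sketch uses the `XID_Start` and `XID_Continue` properties as the concrete form of the start/continue rule, and it treats the underscore as the only ASCII special case. The name `isPlausibleIdentifier` is hypothetical, and keywords, backticks and `$` identifiers are ignored.

```swift
// Hypothetical check: the first scalar must be "_" or XID_Start, and every
// following scalar must be XID_Continue (which already includes "_").
func isPlausibleIdentifier(_ text: String) -> Bool {
    var scalars = text.unicodeScalars.makeIterator()
    guard let first = scalars.next() else { return false }
    guard first == "_" || first.properties.isXIDStart else { return false }
    while let next = scalars.next() {
        guard next.properties.isXIDContinue else { return false }
    }
    return true
}
```

A rule of this shape rejects any string containing a math symbol such as ∞, which is why the operator and identifier sets cannot be discussed in complete isolation.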

···

On Fri, Oct 21, 2016 at 2:36 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:



(Xiaodi Wu) #15

A few other thoughts:

* One of our (or at least my) overarching goals for this proposal is to
refine identifier and operator sets by moving away from an ad-hoc
character-by-character approach to a systematic treatment that relies on
well-defined criteria to select blocks of characters for inclusion.

I completely agree with this, up to the point where you wrote "select
blocks". That doesn't seem to be the way things are selected in the Unicode
universe.

We can choose to adopt blocks as an interim measure, but we should not
lose sight of the notion that this should eventually be property-driven.

* Particularly if the community insists on emoji being identifiers, we
will have to critically evaluate how to proceed on that in tandem with
operators, because there exist emojified operators and arrows, and certain
emoji are encoded in blocks that are otherwise symbols, which Unicode is
likely to deem as operators in the future.

This is not correct. The current view in the UAX31 discussion is that
emojis and pictographics should be excluded from *both* types of
identifiers so that individual programming languages can make
language-specific choices.

The main current problem is that there is no clear-cut Unicode property
covering Emojis at the present time, which is something that needs to be
resolved over in Unicode-land. There is a list given in the antique texty
UCD file format, but it isn't part of the XML formulation of the UCD
database. I'll be generating a proposed update shortly, and when I have
that I can provide a working list in the form of a C file that can be used
by Swift.

I have a mild preference that emojis should live in conventional
identifiers if they are adopted.

Moreover, IIUC, certain codepoints can be either emoji or non-emoji
symbols and variation selectors can specify which they are...

Can you expand on this and (hopefully) point me at the appropriate spot in
one of the Unicode TRs?

Indeed. This issue first popped up on my radar when John Gruber wrote that
he had issues with his website displaying the curly arrow symbol as emoji
in recent browsers:

https://twitter.com/gruber/status/590355262281744384

Turns out, certain characters have emoji variants and text variants, which
can be selected for explicitly by appending a variation selector:

http://mts.io/2015/04/21/unicode-symbol-render-text-emoji/

However, whether the *default* presentation in the absence of a variation
selector is text or emoji can change from platform to platform, or from use
case to use case. This is explicitly spelled out in UTR#51:

"The presentation of a given emoji character depends on the environment,
whether or not there is an emoji or text variation selector, and the
default presentation style (emoji vs text). In informal environments like
texting and chats, it is more appropriate for most emoji characters to
appear with a colorful emoji presentation, and only get a text presentation
with a text variation selector. Conversely, in formal environments such as
word processing, it is generally better for emoji characters to appear with
a text presentation, and only get the colorful emoji presentation with the
emoji variation selector."

http://unicode.org/reports/tr51/#Presentation_Style
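
To make the selector mechanics concrete, here is a small sketch assuming Swift 5 or later. It uses U+21AA (RIGHTWARDS ARROW WITH HOOK) as one example of a scalar that has both presentations; the choice of that arrow and the helper name `withoutVariationSelectors` are assumptions made for illustration.

```swift
let hookArrow: Unicode.Scalar = "\u{21AA}"       // RIGHTWARDS ARROW WITH HOOK
let textStyle = String(hookArrow) + "\u{FE0E}"   // text variation selector
let emojiStyle = String(hookArrow) + "\u{FE0F}"  // emoji variation selector

// Hypothetical helper: the scalars of a string with selectors dropped,
// which is one way a lexer could treat both spellings as the same token.
func withoutVariationSelectors(_ text: String) -> [Unicode.Scalar] {
    return text.unicodeScalars.filter { !$0.properties.isVariationSelector }
}

// The base scalar has the Emoji property but no default emoji presentation.
let isEmoji = hookArrow.properties.isEmoji
let defaultsToEmoji = hookArrow.properties.isEmojiPresentation
```

Each spelling is a single `Character` made of two scalars, and the two strings compare unequal, so a lexer has to decide explicitly whether the selector is significant.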

···

On Fri, Oct 21, 2016 at 4:32 PM, Jonathan S. Shapiro <jonathan.s.shapiro@gmail.com> wrote:

On Fri, Oct 21, 2016 at 1:10 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

Thanks!

Jonathan


(Nevin Brackett-Rozinsky) #16

I think it is plainly evident that the well-defined criteria you would like
to use *have not yet been defined* by Unicode. That is a large part of why
I recommend that we postpone a major overhaul of our operator characters.

Furthermore, just like during the Great Renaming—when we used some general
rules to automate the name changes, then went through and fine-tuned by
hand to deal with cases where the automated rules produced suboptimal
results—I think it will be great if we can classify the majority of
characters all at once with certain criteria, but we should expect and plan
to go through by hand afterward to clean up the edge cases that humans can
tell were misplaced by the broad rules.

It is more important that we get the definitions of operator and identifier
characters *right* than that we make them *rigidly conform to a certain
rule*. If our rule says “∞” should be an operator, but we as humans
recognize that to be a mistake, then we should change it. Heck, maybe we
should make “∞” parse as a floating-point literal!

The point is, bikeshedding over operators has been and continues to be
diverting our collective attention away from the rest of the proposal, such
as using the IDC_Start and IDC_Continue categories that you mentioned.

Nevin

···

On Fri, Oct 21, 2016 at 4:10 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

A few other thoughts:

* One of our (or at least my) overarching goals for this proposal is to
refine identifier and operator sets by moving away from an ad-hoc
character-by-character approach to a systematic treatment that relies on
well-defined criteria to select blocks of characters for inclusion. My
personal opinion is that reverting to discussions of individual characters,
such as the turned ampersand, means we're simply re-shuffling the current
set of identifier and operator characters to a different one, having failed
to discover any systematic basis for doing so. I think one of the strongest
parts of this proposal is the straight-up statement that identifier start
characters shall be IDC_Start and identifier continuation characters shall
be IDC_Continue, modulo a few ASCII characters that require special
treatment in Swift.
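
A rough sketch of what a property-driven start/continue rule looks like, written in Python for illustration only. It assumes Python's `str.isidentifier`, which follows XID_Start and XID_Continue plus the underscore; that is close to, but not the same as, the rule described above, and the function names are hypothetical:

```python
def is_id_start(ch: str) -> bool:
    """True if `ch` alone is a valid identifier (XID_Start or '_')."""
    return ch.isidentifier()


def is_id_continue(ch: str) -> bool:
    """True if `ch` may follow a start character (XID_Continue)."""
    return ("a" + ch).isidentifier()


def is_identifier(s: str) -> bool:
    """Apply the start rule to the first character, continue to the rest."""
    if not s:
        return False
    return is_id_start(s[0]) and all(is_id_continue(c) for c in s[1:])


for candidate in ["x1", "1x", "na\u00EFve", "\u221E", "a\u2192b"]:
    print(repr(candidate), is_identifier(candidate))
```

The point of the sketch is that no individual character is listed: membership falls out of the Unicode properties, and only the ASCII special cases would need to be spelled out by hand.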

* Particularly if the community insists on emoji being identifiers, we
will have to critically evaluate how to proceed on that in tandem with
operators, because there exist emojified operators and arrows, and certain
emoji are encoded in blocks that are otherwise symbols, which Unicode is
likely to deem as operators in the future. Moreover, IIUC, certain
codepoints can be either emoji or non-emoji symbols and variant selectors
can specify which they are, but in the absence of a variant selector
*either* the emoji or non-emoji version can be correctly displayed
depending on the platform. For some of these, the non-emoji version is
either clearly or plausibly an operator character. Therefore, without
dealing *very* carefully with emoji and with operators at the same time, we
are failing to address a key motivation of this proposal, which is to fix
the incorrect separation between emoji operators and emoji identifiers.

On Fri, Oct 21, 2016 at 2:36 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

I disagree pretty strongly with this approach. Firstly, because, as you've
pointed out, changes to the operator characters will have effects on valid
operator character sets; secondly, because a major question about operators
is whether it is feasible to manually exclude the empty set and infinity
symbols, given that Unicode is likely to reject that approach. For these
reasons I feel very strongly that reform of the operator and identifier
characters must move in tandem.

On Fri, Oct 21, 2016 at 14:28 Nevin Brackett-Rozinsky via swift-evolution <swift-evolution@swift.org> wrote:

I think it is important that we as a community discuss the non-operator
portion of this proposal. And, given the strong opinions about operators
that have been expressed, I think it is unlikely we will do so while major
operator changes are on the table.

Thus I would suggest that either the operator changes should be
separated out into their own proposal, or we should make only minor (and
generally consensus-agreed) changes to the operator set as part of this one.

Here is what I propose:

Emoji shall be identifiers, not operators.
The turned ampersand shall be an operator, not an identifier.
The empty set and infinity symbols shall be identifiers, not operators.

All other potential changes to the set of operator characters then go in
their own proposal, which I am sure will receive a lot of attention.

It may turn out that the non-operator portion of this proposal
nonetheless touches characters that Swift has designated for operators, in
which case we may address those as they arise.

Does that sound like a reasonable way forward?

Nevin

On Fri, Oct 21, 2016 at 9:27 AM, Ben Rimmington via swift-evolution <swift-evolution@swift.org> wrote:

> On 21 Oct 2016, at 13:42, Benjamin Spratling wrote:
>
> Brackets and symbols are definitely operators. Different brackets are
> used to represent various quantum mechanical forms and operations.

The brackets are mostly "bracket pieces":

        [:Math_Symbol:]
        - [:name=/\bANGLE\b/:]
        - [:Emoji:]
        - [:ID_Continue:]
        - [:NFC_Quick_Check=No:]
        & [:Script_Extensions=Common:]
        & [:Block=Miscellaneous_Technical:]

<http://www.unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AMath_Symbol%3A%5D%0D%0A-+%5B%3Aname%3D%2F%5CbANGLE%5Cb%2F%3A%5D%0D%0A-+%5B%3AEmoji%3A%5D%0D%0A-+%5B%3AID_Continue%3A%5D%0D%0A-+%5B%3ANFC_Quick_Check%3DNo%3A%5D%0D%0A%26+%5B%3AScript_Extensions%3DCommon%3A%5D%0D%0A%26+%5B%3ABlock%3DMiscellaneous_Technical%3A%5D>

> Arrows are also useful as operators, including but not restricted to
> chemical reactions.

Including arrows, there are 740 operators:

        [:Math_Symbol:]
        - [:name=/\bANGLE\b/:]
        - [:name=EMPTY SET:]
        - [:name=INFINITY:]
        - [:Emoji:]
        - [:ID_Continue:]
        - [:NFC_Quick_Check=No:]
        & [:Script_Extensions=Common:]
        & [[:Block=Arrows:]
           [:Block=General_Punctuation:]
           [:Block=Latin_1_Supplement:]
           [:Block=Mathematical_Operators:]
           [:Block=Miscellaneous_Mathematical_Symbols_A:]
           [:Block=Miscellaneous_Symbols_And_Arrows:]
           [:Block=Supplemental_Arrows_A:]
           [:Block=Supplemental_Arrows_B:]
           [:Block=Supplemental_Mathematical_Operators:]]

<http://www.unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AMath_Symbol%3A%5D%0D%0A-+%5B%3Aname%3D%2F%5CbANGLE%5Cb%2F%3A%5D%0D%0A-+%5B%3Aname%3DEMPTY+SET%3A%5D%0D%0A-+%5B%3Aname%3DINFINITY%3A%5D%0D%0A-+%5B%3AEmoji%3A%5D%0D%0A-+%5B%3AID_Continue%3A%5D%0D%0A-+%5B%3ANFC_Quick_Check%3DNo%3A%5D%0D%0A%26+%5B%3AScript_Extensions%3DCommon%3A%5D%0D%0A%26+%5B%5B%3ABlock%3DArrows%3A%5D%0D%0A+++%5B%3ABlock%3DGeneral_Punctuation%3A%5D%0D%0A+++%5B%3ABlock%3DLatin_1_Supplement%3A%5D%0D%0A+++%5B%3ABlock%3DMathematical_Operators%3A%5D%0D%0A+++%5B%3ABlock%3DMiscellaneous_Mathematical_Symbols_A%3A%5D%0D%0A+++%5B%3ABlock%3DMiscellaneous_Symbols_And_Arrows%3A%5D%0D%0A+++%5B%3ABlock%3DSupplemental_Arrows_A%3A%5D%0D%0A+++%5B%3ABlock%3DSupplemental_Arrows_B%3A%5D%0D%0A+++%5B%3ABlock%3DSupplemental_Mathematical_Operators%3A%5D%5D>

-- Ben
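
The second set expression above can be approximated with Python's standard library, as a sketch only. Several pieces are assumptions rather than the real properties: `EMOJI_SAMPLE` is a hand-picked stand-in for [:Emoji:] (the standard library has no Emoji property), `str.isidentifier` stands in for ID_Continue, a round trip through NFC stands in for NFC_Quick_Check=No, and the Script_Extensions=Common filter is omitted. The result will therefore not necessarily reproduce the count quoted above.

```python
import re
import unicodedata

# Code point ranges of the blocks named in the set expression.
BLOCKS = [
    (0x0080, 0x00FF),  # Latin-1 Supplement
    (0x2000, 0x206F),  # General Punctuation
    (0x2190, 0x21FF),  # Arrows
    (0x2200, 0x22FF),  # Mathematical Operators
    (0x27C0, 0x27EF),  # Miscellaneous Mathematical Symbols-A
    (0x27F0, 0x27FF),  # Supplemental Arrows-A
    (0x2900, 0x297F),  # Supplemental Arrows-B
    (0x2A00, 0x2AFF),  # Supplemental Mathematical Operators
    (0x2B00, 0x2BFF),  # Miscellaneous Symbols and Arrows
]

# Assumption: tiny hand-picked stand-in for [:Emoji:].
EMOJI_SAMPLE = {"\u2194"}  # LEFT RIGHT ARROW

EXCLUDED_NAMES = {"EMPTY SET", "INFINITY"}
ANGLE = re.compile(r"\bANGLE\b")


def is_candidate_operator(ch: str) -> bool:
    """Approximate membership test for the set expression."""
    cp = ord(ch)
    if not any(lo <= cp <= hi for lo, hi in BLOCKS):
        return False
    if unicodedata.category(ch) != "Sm":        # [:Math_Symbol:]
        return False
    name = unicodedata.name(ch, "")
    if ANGLE.search(name) or name in EXCLUDED_NAMES:
        return False
    if ch in EMOJI_SAMPLE:                      # stand-in for [:Emoji:]
        return False
    if ("a" + ch).isidentifier():               # approximates ID_Continue
        return False
    if unicodedata.normalize("NFC", ch) != ch:  # approximates NFC_Quick_Check=No
        return False
    return True


for ch in ["\u2192", "\u2218", "\u00D7", "\u221E", "\u2205", "\u2220"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), is_candidate_operator(ch))
```

A real implementation would read the Emoji and Script_Extensions data from the Unicode Character Database rather than rely on these stand-ins.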

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Xiaodi Wu) #17

A few other thoughts:

* One of our (or at least my) overarching goals for this proposal is to
refine identifier and operator sets by moving away from an ad-hoc
character-by-character approach to a systematic treatment that relies on
well-defined criteria to select blocks of characters for inclusion.

I completely agree with this, up to the point where you wrote "select
blocks". That doesn't seem to be the way things are selected in the Unicode
universe.

We can choose to adopt blocks as an interim measure, but we should not
lose sight of the notion that this should eventually be property-driven.

* Particularly if the community insists on emoji being identifiers, we
will have to critically evaluate how to proceed on that in tandem with
operators, because there exist emojified operators and arrows, and certain
emoji are encoded in blocks that are otherwise symbols, which Unicode is
likely to deem as operators in the future.

This is not correct. The current view in the UAX31 discussion is that
emojis and pictographics should be excluded from *both* types of
identifiers so that individual programming languages can make
language-specific choices.

This proposed approach raises some issues with apparent inconsistency as
well as forward compatibility issues. What I'm saying is that today, among
a chunk of symbols which Unicode may deem to be operators, there will be
some with emoji variants and others without. It seems kinda arbitrary to
exclude specific arrows or specific dingbats from valid operator characters
on the criterion that they have an emoji variant. For instance, if curly
leftwards arrow has an emoji variant but curly upwards arrow does not, one
is considered an invalid operator but the other is valid? Now, going
forward, if an existing codepoint is today part of IDC_Start but tomorrow
gains an emoji variant, what happens then?

···

On Fri, Oct 21, 2016 at 4:32 PM, Jonathan S. Shapiro <jonathan.s.shapiro@gmail.com> wrote:

On Fri, Oct 21, 2016 at 1:10 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

The main current problem is that there is no clear-cut Unicode property
covering Emojis at the present time, which is something that needs to be
resolved over in Unicode-land. There is a list given in the antique texty
UCD file format, but it isn't part of the XML formulation of the UCD
database. I'll be generating a proposed update shortly, and when I have
that I can provide a working list in the form of a C file that can be used
by Swift.

I have a mild preference that emojis should live in conventional
identifiers if they are adopted.

Moreover, IIUC, certain codepoints can be either emoji or non-emoji
symbols and variant selectors can specify which they are...

Can you expand on this and (hopefully) point me at the appropriate spot in
one of the Unicode TRs?

Thanks!

Jonathan


(Jonathan S. Shapiro) #18

Umm... It would probably help if I sent a link, wouldn't it? :-)

Here it is:
https://github.com/jsshapiro/swift-evolution/blob/unicode-id-op/proposals/NNNN-refining-identifiers-and-operators.md

Jonathan, sheepishly.


(Jonathan S. Shapiro) #19

Well, it seems that I jumped the gun and sent my document link to
swift-evolution by mistake. Since I can't take it back, it might be good to
say what it's about. Because of my mistake, this version has *not* had any
review by the rest of the author group, which probably would have improved
it a lot. I had intended one more round of edits to deal with a few things
that still say "FIX" and a few minor cleanups, but I think the substance of
the proposal is sound.

This revised proposal
<https://github.com/jsshapiro/swift-evolution/blob/unicode-id-op/proposals/NNNN-refining-identifiers-and-operators.md>
tries to take into account most of the feedback we have received here. Lots
of nitty-gritty changes, but here's the big picture of the changes:

   - Emojis are admitted, subject to reasonable sanity conditions.
   - A (significantly) broader, but still conservative set of code points
   are admitted for symbol identifiers. Hopefully this addresses current need,
   but I remain open to adopting full-on [:Sm:] and [:So:] if there is a
   strong push for that.
   - Operator definition is orthogonal to identifier specification, which
   deals with the noun/verb confusion and also addresses the widely-expressed
   feeling that some symbols aren't operators and their conventional meaning
   should be usable. The term "operator" no longer has anything to do with
   identifiers.
   - A laundry list of potential parsing gotchas is addressed. The
   previous proposal would have broken the generics syntax and also the
   binding syntax. This isn't a substantive conceptual change, but it's
   important if the proposal is going to, you know, *actually work*. :-)
   - Dollar is admitted in identifiers.
   - Explicitly addresses anonymous closure parameters in a way that
   reflects how the compiler actually needs to deal with such things. Might be
   I've written a compiler or two in my career. :-)
   - Consistent with the current direction of UAX31 on these issues.
   - Susan Kare's legacy is preserved. :-) If you don't know who Susan is,
   look her up and learn why Chris loves the dogcow emoji pair.

The new proposal remains entirely compatible with Swift 3, except where
existing source runs up against the narrower symbol identifier space. It's
a specific goal to avoid breaking reasonable current practice where
possible, though we're surely going to break *something* with this one.

I was trained to write specifications in a school that favored rigorous
writing. In order to make sure I didn't lose track of something I rewrote
the proposal in a form that I know how to use effectively. Any loss of
"fun" in the text is my fault alone.

Interested to see how this will be received.

Jonathan


(Martin Waitz) #20

Given the current difficulty of agreeing on a set of identifiers vs. operators, maybe we should really try to sidestep this issue entirely and study Jonathan Shapiro’s idea of using identifiers as operators.

— Martin

···

On 21 Oct 2016, at 22:10, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

A few other thoughts:

* One of our (or at least my) overarching goals for this proposal is to refine identifier and operator sets by moving away from an ad-hoc character-by-character approach to a systematic treatment that relies on well-defined criteria to select blocks of characters for inclusion. My personal opinion is that reverting to discussions of individual characters, such as the turned ampersand, means we're simply re-shuffling the current set of identifier and operator characters to a different one, having failed to discover any systematic basis for doing so. I think one of the strongest parts of this proposal is the straight-up statement that identifier start characters shall be IDC_Start and identifier continuation characters shall be IDC_Continue, modulo a few ASCII characters that require special treatment in Swift.

* Particularly if the community insists on emoji being identifiers, we will have to critically evaluate how to proceed on that in tandem with operators, because there exist emojified operators and arrows, and certain emoji are encoded in blocks that are otherwise symbols, which Unicode is likely to deem as operators in the future. Moreover, IIUC, certain codepoints can be either emoji or non-emoji symbols and variant selectors can specify which they are, but in the absence of a variant selector *either* the emoji or non-emoji version can be correctly displayed depending on the platform. For some of these, the non-emoji version is either clearly or plausibly an operator character. Therefore, without dealing *very* carefully with emoji and with operators at the same time, we are failing to address a key motivation of this proposal, which is to fix the incorrect separation between emoji operators and emoji identifiers.