[Proposal] Refining Identifier and Operator Symbology

dwaite · October 19, 2016, 8:31pm

The problem is that this set does not just contain mathematical operators, but includes among other examples \u2205 (Empty Set) and \u221E (infinity).

-DW

···

On Oct 19, 2016, at 1:49 PM, Paul Cantrell via swift-evolution <swift-evolution@swift.org> wrote:

At the very least, Swift ought to support operators using symbols from the Unicode blocks called “Mathematical Operators” and “Supplemental Mathematical Operators.”
Mathematical operators and symbols in Unicode - Wikipedia

It’s right there in the name!™

Nevin · October 20, 2016, 12:32am

I strongly oppose the proposed mass-removal of operator characters. It
would be a major loss to the Swift language if that were to occur even
temporarily.

The long-term goal is for Swift to adopt the official Unicode guidance for
operator characters, which is still under development. Therefore I believe
we should make only minor and obvious changes right now, because there is
no sense in “jumping the gun” and causing unnecessary source-breaking
changes.

In particular, we should make it clear that Swift will most likely adopt
the Unicode operator conventions when they become available, so people are
aware and prepared.

When the time comes, we should deprecate any operator characters that
Unicode recommends against (unless we have a good reason not to), before
removing them in the next major release. The deprecation period ensures
that source-breaking changes result in a warning at first, so developers
have time to adapt.

I just went through all the valid operator characters in Swift, and the
only ones I would recommend to eliminate at this time are:
U+2800 – U+28FF (Braille patterns)
U+3021 – U+3029 (Hangzhou numerals)
U+2205 and U+221E (Empty set and Infinity)

Additionally, I propose to *add* one operator that is missing:
U+214B (Turned ampersand)
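
For anyone who wants to check how Unicode itself classifies the code points listed above, here is a rough sketch. It is in Python only because the standard library bundles the Unicode Character Database; `describe` is a hypothetical helper name, and results depend on the Unicode version shipped with the interpreter.

```python
import unicodedata

def describe(code_point):
    """Return (general category, character name) for one code point."""
    ch = chr(code_point)
    return unicodedata.category(ch), unicodedata.name(ch, "<unnamed>")

# First code point of each range mentioned above, plus the single characters.
candidates = [0x2800, 0x3021, 0x2205, 0x221E, 0x214B]

for cp in candidates:
    category, name = describe(cp)
    print(f"U+{cp:04X}  {category}  {name}")
```

The Braille pattern comes back as `So` (other symbol) and the Hangzhou numeral as `Nl` (letter number), while the empty set, infinity, and the turned ampersand all share `Sm` (math symbol). General category alone therefore does not separate operators from symbols like ∅ and ∞.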

• • •

As for the rest of the proposal, I support normalizing identifiers and
dealing with confusable characters in a sensible way.

Regarding emoji, I look at them rather like the “I’m feeling lucky” button
on Google: essentially nobody uses it, but when they tried getting rid of it
people didn’t like the result. So I agree with Brent that we should keep
them for cultural, not technical, reasons.

• • •

Returning to the discussion of operators, I am reminded of what happened
when we eliminated argument labels from functions passed as parameters. The
intent was and still is to reinstate them in a more robust manner.

However, during the interim the result has been a regression that goes
against the core tenets and philosophy of Swift. I would not want to repeat
that while waiting for official Unicode operator guidelines.

So I am strongly—adamantly—opposed to the operator-eviscerating portion of
this proposal.

We should make Braille characters, Hangzhou numerals, the empty set and the
infinity sign into identifiers. All other operators should remain as they
are until official Unicode recommendations exist, at which point we should
deprecate as necessary.

Nevin

···

On Wed, Oct 19, 2016 at 5:53 PM, Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:

IMO, the best argument against using unicode symbols for operators defined
by mathematics is that they are currently difficult to type.

And there is no realistic hope of that changing. This issue is so
compelling that C and C++ introduced standardized ASCII text alternatives
for the punctuation operators to relieve stress on non-English keyboard
users.

I don’t agree that there is no realistic hope of that changing. It
appears to be pretty reasonable to anticipate that we’ll all be using
software-driven keyboards that can display software-defined symbols on the
keys in the relatively near future (probably 5 years, certainly 10). All
kinds of interesting things become possible when that happens, including
the ability to make unicode operators much easier to discover and type in a
programmer’s editor.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Jonathan_S_Shapiro · October 20, 2016, 5:17am

The problem is that this set does not just contain mathematical operators,
but includes among other examples \u2205 (Empty Set) and \u221E (infinity).

Both of which are perfectly reasonable symbols to include.

From a UAX31 standpoint, the practical problem is that operator symbols are
going to get defined largely in terms of the existing symbol category. It's
not going to be perfect. Traditionally, Unicode standards have been defined
in terms of properties rather than blocks. I do think it's worth asking
whether "mathematical symbols" is too broad and we may wish to consider
only "mathematical operators". I'll take that up with Mark.

This is one reason that I was briefly exploring whether operator
identifiers could actually be used as identifiers generally. The answer
boils down to: "not if operator symbols admit . (period)". Unfortunately,
the existing Swift standard library is *already* using .

Even if we were prepared to slog through all of the math symbols one by one
and decide which ones are operators, I'm not convinced that the UAX31
effort would be prepared to adopt the result. Part of the problem is that
it's not just about singleton code points. It's about codepoints that get
combined into operator identifiers that are then interpreted as operators.

Jonathan

···

On Wed, Oct 19, 2016 at 1:31 PM, David Waite via swift-evolution <swift-evolution@swift.org> wrote:

Jonathan_S_Shapiro · October 20, 2016, 5:36am

Thinking about it further, I am not convinced we need to make *any* change
to the set of operator characters at this time. It’s not like people are
clamoring to have Braille variable names after all. And as much as I’d like
to see the upside-down ampersand (⅋) as an operator, that too can wait.

Unfortunately we *do* need to make changes. At the very least, the current
definition of operators includes completely undefined codepoints. That's
just not OK. There are also *many* elements that are unlikely to be
incorporated in UAX31, and we want to be careful about backwards
compatibility issues and also cross-language interop issues. Including too
much risks incompatibility with future evolutions of UAX31 that other
languages are likely to adopt as a gold standard of interop.
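
One way to see the undefined-code-point problem is to scan a range and list everything the Unicode database marks as unassigned (general category `Cn`). This is only a sketch, in Python because its standard library bundles that database. The range U+2E00 to U+2E7F is used as an example of a block that the operator definition of the time is assumed to admit wholesale; treat that as an assumption. `unassigned_in` is a hypothetical helper name.

```python
import unicodedata

def unassigned_in(first, last):
    """List code points in [first, last] whose general category is Cn."""
    return [cp for cp in range(first, last + 1)
            if unicodedata.category(chr(cp)) == "Cn"]

# Assumed example range: the Supplemental Punctuation block.
holes = unassigned_in(0x2E00, 0x2E7F)
print(f"{len(holes)} unassigned code points in U+2E00..U+2E7F")
```

The exact count depends on the Unicode version the interpreter ships with, which is the point: a definition written in terms of whole ranges silently changes meaning as code points get assigned.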

I am hopeful that this proposal will be revised to focus solely on adopting
UAX-31.

That's definitely the goal, but we don't yet have a draft "operator
identifier" proposal in UAX31 to adopt. I think Xiaodi's goal here was to
arrive at a subset that would be future proof.

I've put in an email to the proposing group, and I expect there will be a
response, but I don't want to speak as if I'm representing the consensus
until I hear back from them. Can I ask everyone to be patient on this
issue until some time tomorrow?

You are all very definitely being heard!

Jonathan

···

On Wed, Oct 19, 2016 at 10:26 PM, Nevin Brackett-Rozinsky via swift-evolution <swift-evolution@swift.org> wrote:

Russ_Bishop1 · October 20, 2016, 6:37am

I fully agree. It’s hella presumptuous to decide that I’m not allowed to express whimsy, frustration, humor, or any other emotions in my code. Or to tell an 8 year old using Playgrounds on the iPad that he/she can’t name a variable purely because they find it funny. We don’t have to squash the joy out of everything.

Russ

···

On Oct 19, 2016, at 1:46 PM, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org> wrote:

I was in the middle of writing about my opposition to the original proposal when I went to bed last night, and was going to advocate something like this:

Given the current state of the discussion over in Unicode land, I think it would probably be safe from a compatibility standpoint to admit code points that fall into the following (Unicode-style) code point set:

[:S:] - [:Sc:] - [:xidcontinue:] - [:nfcqc=n:] & [:scx=Common:] - pictographics - emoji
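
To make the set expression above a little more concrete, here is a partial sketch of the first four terms, read left to right. It is in Python because the standard library exposes the needed properties; `in_candidate_set` is a hypothetical name. The `[:scx=Common:]`, pictographics, and emoji terms are not covered, since the standard library does not expose script extensions or emoji properties. The `nfcqc=n` term is approximated by checking whether NFC normalization changes the character.

```python
import unicodedata

def in_candidate_set(ch):
    """Approximate [:S:] - [:Sc:] - [:xidcontinue:] - [:nfcqc=n:] for one character."""
    category = unicodedata.category(ch)
    if not category.startswith("S"):            # [:S:] any symbol category
        return False
    if category == "Sc":                        # - [:Sc:] currency symbols
        return False
    if ("a" + ch).isidentifier():               # - [:xidcontinue:]
        return False
    if unicodedata.normalize("NFC", ch) != ch:  # - [:nfcqc=n:] (approximation)
        return False
    return True

for ch in ["+", "\u2205", "$", "\u2118", "\u2ADC"]:
    print(f"U+{ord(ch):04X}", in_candidate_set(ch))
```

Here `+` and ∅ pass, `$` is dropped as a currency symbol, ℘ (U+2118) is dropped because it is already an identifier character, and U+2ADC is dropped because NFC normalization rewrites it.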

I suspect we can probably also do something about emoji, since I doubt UAX #31 is going to. Given that they are all static pictures of people or things, I think we can decide they are all nouns and thus all identifier characters. If we think there are some which might be declared operators later, we can exclude them for now, but I'd like to at least see the bulk of them brought in.

I think addressing emoji is important not for any technical reason, but for nontechnical ones. Emoji are a statement about Swift's modern approach; modernity is important. They are fun and whimsical; whimsy is important.

And most importantly, emoji identifiers are part of Swift's culture. It's widely understood that you don't use them in real code, but they are very common in examples. Just as we worry about source compatibility and binary compatibility, so we should worry about culture compatibility. Removing emoji would cause a gratuitous cultural regression.

xwu · October 20, 2016, 12:11am

I actually take the opposite view of emoji, and I was convinced of this by
arguments from some of the other authors (though they may not come to the
same conclusions as I do):

The real and very weighty reason Swift should support Unicode identifiers
is that naming things is hard, and it is serious, and we should be adamant
about this one thing:

Even if your primary language is not English, and even if your use of Swift
takes you beyond emulating the narrow set of standard library and
Foundation API names, you can still take all the care and attention in
naming things that we would want to promote in Swift by using your own
primary language. We want this to be the case wherever you were born,
whatever language your mother taught you, and we want to support this on
principle, whether or not we can find empiric evidence of open-source
projects on GitHub that make use of any particular language which we know
to be used in the world.

Previously, as we tackled this Unicode problem, a not-illegitimate critique
was that Swift's support of Unicode identifiers appeared to be frivolous,
because the only examples given in documentation are of emoji, and as you
say, it is there to be cute or whimsical. This appearance undermines that
very serious idea described above.

UAX#31 makes room for the removal of obsolete scripts such as Egyptian
hieroglyphics from the range of valid identifier characters on the basis
(at least in my reading of the document) that supporting them adds to the
burden of a programming language without serving the weighty purpose of
letting people express themselves in their primary language. By analogy,
emoji similarly do not serve that purpose, and since their parsing changes
with every Unicode update, we would be making changes to Swift every minor
release for the purpose of chasing a modish whimsy.


···

On Thu, Oct 20, 2016 at 04:46 Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org> wrote:

--
Brent Royal-Gordon
Architechies


Chris_Lattner · October 25, 2016, 4:55am

Very well said Brent: +1 from me.

-Chris

···

On Oct 19, 2016, at 1:46 PM, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org> wrote:


Chris_Lattner · October 25, 2016, 5:15am

Ok, but to clarify the requirement, *every* file would have to declare the operators it is using at the top of the file. It isn’t enough for them to be declared in some file within the current module. Not having this property breaks the ability to do a quick parse of a file without doing name lookup.

In addition to the tooling impact, going with such an approach would be very inconsistent with the rest of Swift’s grammar, which aims to be order independent (except in script files / top level code).

-Chris

···

On Oct 24, 2016, at 9:40 AM, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 22, 2016, at 5:53 PM, Jonathan S. Shapiro <jonathan.s.shapiro@gmail.com> wrote:

I missed this earlier posting from Joe Groff, who wrote:

In the discussion about operators, I wonder whether it makes sense to formally separate "identifier" and "operator" characters at all. ...

The consequence if we do not formally separate the operators (verbs) from the identifiers (nouns) is that white space will be needed around all operators. That's not necessarily a bad thing, but it would be a significant and incompatible departure from today's Swift, both in terms of actual source code breakage and in terms of the "look and feel" that many people feel passionate about.

That's not a strict requirement. If we require operator usage to be declared explicitly, the lexer can accommodate those declarations. Since operators only appear as part of expressions inside bodies, the operator import or declaration doesn't even necessarily have to be ordered at the top of the file since we can still skip function bodies when parsing declarations (though I think we'd want to encourage imports on top anyway for the benefit of readers). This wouldn't be unprecedented—operators as they stand already effectively require an extra pass of parsing.
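
The trade-off being discussed here can be illustrated with a toy lexer. This is a sketch in Python under heavy assumptions, not a description of how the Swift lexer works: `lex` is a hypothetical function, and the "declared operators" are just a set of strings handed to it. With no formal split between identifier and operator characters, the lexer can only split `x∪y` if it already knows `∪` was declared; otherwise whitespace is needed.

```python
def lex(source, declared_operators):
    """Split source into tokens, using declared operators as the only split points
    besides whitespace (longest declared operator wins)."""
    ops = sorted(declared_operators, key=len, reverse=True)
    tokens = []
    current = ""
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            if current:
                tokens.append(current)
                current = ""
            i += 1
            continue
        match = next((op for op in ops if source.startswith(op, i)), None)
        if match is not None:
            if current:
                tokens.append(current)
                current = ""
            tokens.append(match)
            i += len(match)
        else:
            current += ch
            i += 1
    if current:
        tokens.append(current)
    return tokens

print(lex("x∪y", {"∪"}))    # operator declared: three tokens
print(lex("x∪y", set()))    # not declared: one opaque token
print(lex("x ∪ y", set()))  # whitespace does the splitting instead
```

Because the result depends on the declared set, a tool cannot tokenize a file in isolation unless the declarations are visible in that same file, which is the tooling concern raised above.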

xwu · October 25, 2016, 10:19am

Unfortunately, Joe is correct on this point. As I stated earlier in the
thread, there are a series of characters that can be either text or emoji
in presentation, where the default presentation differs depending on
platform, technology, use case, or context. This is also not a bug, but
explicitly contemplated by Unicode technical recommendations. You can
convince yourself of this fact by looking up the Wikipedia page on the
Unicode "dingbats" block and comparing the rendering on Safari on iOS and
Safari on macOS. You will see that they are different.

Unfortunately, you are incorrect about the behavior of missing glyphs.
Unlike, say, Chinese displayed on a machine without the necessary fonts,
Unicode 9 emoji not yet supported by Apple do not display at all on that
platform, which is a security concern. No placeholder appears. This includes
what is according to Emojipedia the #1 most popular emoji, the shrug.
(Check out Emojipedia on a Mac.) It appears that there is no required
placeholder glyph for unsupported emoji, so any of them can legitimately
disappear on a non-supported platform. This is an issue worth serious
consideration.

Finally, the issue remains live as to whether we can have some way of
confronting the issue that Apple platforms now have emoji that differ both
in name and appearance from other platforms. In the latest version of macOS
Sierra, a wide swath of emoji have been renamed to diverge from Unicode
guidelines. Some, like "pile of poo" have been renamed only subtly ("pile
of poop"). However, others like "imp" have been completely renamed, so that
where formerly the Apple rendering was a stretch as compared to Unicode
recommendations, now Apple platforms are literally drawing and describing a
completely different thing at the same codepoint. We in the Swift community
obviously cannot tell Apple how to draw or name their emoji. But, Apple is
clearly willing to unilaterally revise arbitrary numbers of emoji in a
single release of their OS and may continue to do so. Is this appropriate
for programming language identifiers? I think not. But I can't square that
with the clear demand for emoji identifiers in the community.

···

On Tue, Oct 25, 2016 at 00:41 Russ Bishop via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 24, 2016, at 9:43 AM, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 23, 2016, at 9:41 PM, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 18, 2016, at 11:34 PM, Jacob Bandes-Storch via swift-evolution <swift-evolution@swift.org> wrote:

Dear Swift-Evolution community,

A few of us have been preparing a proposal to refine the definitions of
identifiers & operators. This includes some changes to the permitted
Unicode characters.

The latest (perhaps final?) draft is available here:

https://github.com/jtbandes/swift-evolution/blob/unicode-id-op/proposals/NNNN-refining-identifier-and-operator-symbology.md

We'd welcome your initial thoughts, and will probably submit a PR soon to
the swift-evolution repo for a formal review. Full text follows below.

I haven’t had a chance to read the entire proposal, nor the tons of great
discussion down thread, but here are a few thoughts, just MHO:

- I’m loving that you’re taking a detail oriented approach to the
problem. I agree with you that our current approach is unprincipled, and
we need to get this right for Swift 4.
- I think that it is perfectly fine to err on the side of conservatism: if
it isn’t clear how to classify something (e.g. Braille patterns), we should
just reject them in both operators and identifiers (make them be
unassigned). If these unclear cases are important to someone, then we can
consider (as a separate additive proposal) adding them back later.
- As to conservatism, explicitly reserving “..” (for possible future
language directions) seems reasonable to me. Are there any other similar
things we should consider reserving?

- I applaud the creativity keeping a valid identifier :-), but it is
really missing the point. *All* of the non-symbol-like emoji should be
valid in identifiers. With a quick unscientific look at Apple’s character
picker, all the emoji other than a few in “Symbols” seem like they should
be identifiers. It would be fine to conservatively leave all emoji
“symbols” as unassigned.

The problem with this is that "emoji" is not a well-defined category by
Unicode. Whether a character is rendered as emoji or a traditional symbol
in a given font on a given platform can depend on variation selectors, and
the exact variation selectors (or lack thereof) that choose emoji or
traditional representation are non-portable, even among different text
rendering APIs on the same platform (e.g. ATSUI vs TextKit vs CoreText vs
WebKit on Darwin).

-Joe

I’m not sure that is true. Unicode gives the list:
http://unicode.org/emoji/charts/full-emoji-list.html

If a platform can’t render the ZWJ sequences it can render them as
separate Emoji, but Swift can still treat that as the same identifier.

== 🏼

If you don’t have a font capable of displaying the character at all that
isn’t any different from not having a Chinese font available. You should
get the missing character glyph. The list of emoji base characters is not
unrestricted - there is a specific and limited list of valid base
characters that accept modifiers.

If we wanted to go further and say that all Emoji modifiers are preserved
and rendered if possible but not considered part of the identifier that
would be OK with me. Same for variation selectors.
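
As a rough illustration of that last idea (modifiers and variation selectors kept in the source text but ignored when comparing identifiers), here is a minimal sketch in Python. `identifier_key` is a hypothetical name, and the two code point ranges (emoji modifiers U+1F3FB to U+1F3FF, variation selectors U+FE00 to U+FE0F) are the only things it strips; a real rule would need more care.

```python
def identifier_key(name):
    """Drop emoji skin-tone modifiers and variation selectors before comparing."""
    def ignorable(ch):
        cp = ord(ch)
        return (0x1F3FB <= cp <= 0x1F3FF      # emoji modifiers (skin tones)
                or 0xFE00 <= cp <= 0xFE0F)    # variation selectors VS1..VS16
    return "".join(ch for ch in name if not ignorable(ch))

thumbs_up = "\U0001F44D"
thumbs_up_medium_light = "\U0001F44D\U0001F3FC"
print(identifier_key(thumbs_up) == identifier_key(thumbs_up_medium_light))
```

Under this sketch the two spellings compare equal as identifiers, while an editor remains free to render the modifier if it can.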

Russ

- I really think we should keep symbols as operators, including many of
the math symbols (e.g. ∪). In a later, separate proposal we can consider
whether it makes sense for emoji symbols to be usable as operators; I can
see arguments both ways.

-Chris


Joe_Groff · October 25, 2016, 5:50pm

Good to hear that there's some work being done on standardizing emoji; this wasn't the case three years ago when we adopted the N1518 character set. If there's a Unicode standard we can cite for what constitutes an emoji, that'd be a good foundation for "supporting emoji" in Swift syntax, though as Xiaodi notes, implementations still vary quite a bit in what they present as emoji in practice, and the fact that unsupported emoji are allowed to be rendered completely invisible is troubling.

-Joe

···

On Oct 24, 2016, at 10:40 PM, Russ Bishop <xenadu@gmail.com> wrote:


xwu · October 20, 2016, 7:06am

Point well taken, but FWIW, there is a large difference between *you*
expressing whimsy and committing the language to an open-ended series of
continuous revisions for the sole purpose of enabling one particular form
of whimsy. It's rather an overstatement to say that we are proposing to
"squash the joy out of everything," as though we all lived our lives in
states of ascetic deprivation before the advent of emoji.

···

On Thu, Oct 20, 2016 at 14:38 Russ Bishop via swift-evolution <swift-evolution@swift.org> wrote:


Jean-Daniel · October 20, 2016, 6:46am

I actually take the opposite view of emoji, and I was convinced of this by arguments from some of the other authors (though they may not come to the same conclusions as I do):

The real and very weighty reason Swift should support Unicode identifiers is that naming things is hard, and it is serious, and we should be adamant about this one thing:

Even if your primary language is not English, and even if your use of Swift takes you beyond emulating the narrow set of standard library and Foundation API names, you can still take all the care and attention in naming things that we would want to promote in Swift by using your own primary language. We want this to be the case wherever you were born, whatever language your mother taught you, and we want to support this on principle, whether or not we can find empiric evidence of open-source projects on GitHub that make use of any particular language which we know to be used in the world.

Previously, as we tackled this Unicode problem, a not-illegitimate critique was that Swift's support of Unicode identifiers appeared to be frivolous, because the only examples given in documentation are of emoji, and as you say, it is there to be cute or whimsical. This appearance undermines that very serious idea described above.

And removing emoji removes the possibility of writing simple sample code in a universal language that is understandable whatever language your mother taught you. If you want to have a car variable or a dog variable, just use the emoji and everybody can read it without hesitation.

···

On Oct 20, 2016, at 02:11, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:


_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

Erica_Sadun · October 20, 2016, 3:08pm

I really liked Jonathan's suggestion that removed the distinction between operators and identifiers entirely. You could mark a one-argument function as postfix or prefix, and a two-argument function as infix and use them as a kind of pseudo keyword.
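For comparison, here is a minimal sketch of how fixity is attached to operators in Swift today. It is not Jonathan's suggestion (which would put the marking on ordinary functions); the operators `√` and `∪` and their implementations are chosen purely for illustration.

```swift
// Today's Swift: fixity is declared on the operator symbol at file scope,
// separately from the function that implements it.
prefix operator √
infix operator ∪: AdditionPrecedence

// Illustrative implementation: square root as a prefix operator.
prefix func √ (x: Double) -> Double {
    return x.squareRoot()
}

// Illustrative implementation: set union as an infix operator.
func ∪ <T: Hashable>(lhs: Set<T>, rhs: Set<T>) -> Set<T> {
    return lhs.union(rhs)
}

let root = √16.0
let merged = Set([1, 2]) ∪ Set([2, 3])
```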

-- E

···

On Oct 19, 2016, at 11:17 PM, Jonathan S. Shapiro via swift-evolution <swift-evolution@swift.org> wrote:

On Wed, Oct 19, 2016 at 1:31 PM, David Waite via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
The problem is that this set does not just contain mathematical operators, but includes among other examples \u2205 (Empty Set) and \u221E (infinity).

Both of which are perfectly reasonable symbols to include.

From a UAX31 standpoint, the practical problem is that operator symbols are going to get defined largely in terms of the existing symbol category. It's not going to be perfect. Traditionally, Unicode standards have been defined in terms of properties rather than blocks. I do think it's worth asking whether "mathematical symbols" is too broad and we may wish to consider only "mathematical operators". I'll take that up with Mark.

This is one reason that I was briefly exploring whether operator identifiers could actually be used as identifiers generally. The answer boils down to: "not if operator symbols admit . (period)". Unfortunately, the existing Swift standard library is already using .

Erica_Sadun · October 20, 2016, 3:18pm

The problem isn't whimsy so much as it's selecting the right set. If you can point to a standard (or create one) that provides a good set, which does not introduce the issues described in the proposal, that would be a great starting step for adapting the proposed approach. The same goes for the mathematical operators.

-- E

···

On Oct 20, 2016, at 12:37 AM, Russ Bishop via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 19, 2016, at 1:46 PM, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

I was in the middle of writing about my opposition to the original proposal when I went to bed last night, and was going to advocate something like this:

Given the current state of the discussion over in Unicode land, I think it would probably be safe from a compatibility standpoint to admit code points that fall into the following (Unicode-style) code point set:

[:S:] - [:Sc:] - [:xidcontinue:] - [:nfcqc=n:] & [:scx=Common:] - pictographics - emoji
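As a rough illustration of what that set expression selects, here is a partial sketch using the Swift standard library's `Unicode.Scalar.Properties`. It assumes the expression is read left to right and covers only `[:S:] - [:Sc:] - [:xidcontinue:] - emoji`. The `[:nfcqc=n:]`, `[:scx=Common:]` and "pictographics" terms are left out because the standard library does not expose those properties. The function name is hypothetical.

```swift
// Partial approximation of: [:S:] - [:Sc:] - [:xidcontinue:] - emoji
// Not covered: [:nfcqc=n:], [:scx=Common:], "pictographics".
func isCandidateOperatorScalar(_ scalar: Unicode.Scalar) -> Bool {
    let properties = scalar.properties
    let isNonCurrencySymbol: Bool
    switch properties.generalCategory {
    case .mathSymbol, .modifierSymbol, .otherSymbol:
        isNonCurrencySymbol = true   // [:S:] minus [:Sc:]
    default:
        isNonCurrencySymbol = false  // currency symbols and all non-symbols
    }
    return isNonCurrencySymbol
        && !properties.isXIDContinue  // minus [:xidcontinue:]
        && !properties.isEmoji        // minus emoji (per the runtime's Unicode data)
}
```

The results depend on the Unicode data shipped with the Swift runtime, so a given scalar may be classified differently across toolchain versions.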

I suspect we can probably also do something about emoji, since I doubt UAX #31 is going to. Given that they are all static pictures of people or things, I think we can decide they are all nouns and thus all identifier characters. If we think there are some which might be declared operators later, we can exclude them for now, but I'd like to at least see the bulk of them brought in.

I think addressing emoji is important not for any technical reason, but for nontechnical ones. Emoji are a statement about Swift's modern approach; modernity is important. They are fun and whimsical; whimsy is important.

And most importantly, emoji identifiers are part of Swift's culture. It's widely understood that you don't use them in real code, but they are very common in examples. Just as we worry about source compatibility and binary compatibility, so we should worry about culture compatibility. Removing emoji would cause a gratuitous cultural regression.

I fully agree. It’s hella presumptuous to decide that I’m not allowed to express whimsy, frustration, humor, or any other emotions in my code. Or to tell an 8 year old using Playgrounds on the iPad that he/she can’t name a variable purely because they find it funny. We don’t have to squash the joy out of everything.

Russ

Joe_Groff · October 25, 2016, 3:24pm

I missed this earlier posting from Joe Groff, who wrote:

In the discussion about operators, I wonder whether it makes sense to formally separate "identifier" and "operator" characters at all. ...

The consequence if we do not formally separate the operators (verbs) from the identifiers (nouns) is that white space will be needed around all operators. That's not necessarily a bad thing, but it would be a significant and incompatible departure from today's Swift, both in terms of actual source code breakage and in terms of the "look and feel" that many people feel passionate about.

That's not a strict requirement. If we require operator usage to be declared explicitly, the lexer can accommodate those declarations. Since operators only appear as part of expressions inside bodies, the operator import or declaration doesn't even necessarily have to be ordered at the top of the file since we can still skip function bodies when parsing declarations (though I think we'd want to encourage imports on top anyway for the benefit of readers). This wouldn't be unprecedented—operators as they stand already effectively require an extra pass of parsing.

Ok, but to clarify the requirement, *every* file would have to declare the operators it is using at the top of the file. It isn’t enough for them to be declared in some file within the current module. Not having this property breaks the ability to do a quick parse of a file without doing name lookup.

Yeah, that's a tradeoff. I think that requiring non-standard operator use to be explicitly declared could be a good thing, though, since I don't think that we can realistically expect users to learn or intuitively agree on what glyphs are "operator" or "identifier", no matter what character set we design.

In addition to the tooling impact, going with such an approach would be very inconsistent with the rest of Swift’s grammar, which aims to be order independent (except in script files / top level code).

As long as { } aren't in the operator character set, we should still be able to skip function bodies without parsing, so operator use declarations could still be order-independent at the top level of declarations. (Whether it's a good idea to bury your import declarations in the middle of your other decls is another story.)
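To make the body-skipping point concrete, here is a toy sketch of skipping a brace-delimited body without lexing what is inside it. It assumes `{` and `}` can never be operator characters, and it ignores string literals and comments, which a real lexer would have to handle. `skipBody` is a hypothetical helper, not part of the Swift compiler.

```swift
/// Returns the index just past the `}` that matches the `{` at `open`,
/// or nil if the braces are unbalanced. Everything between the braces is
/// skipped without being tokenized, so unknown operator characters inside
/// the body never need to be classified.
func skipBody(in source: String, from open: String.Index) -> String.Index? {
    precondition(source[open] == "{", "skipBody must start at an opening brace")
    var depth = 0
    var index = open
    while index < source.endIndex {
        switch source[index] {
        case "{":
            depth += 1
        case "}":
            depth -= 1
            if depth == 0 { return source.index(after: index) }
        default:
            break
        }
        index = source.index(after: index)
    }
    return nil
}
```

Because only braces are counted, an operator use declaration found later in the file can be collected before any body is actually parsed.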

-Joe

···

On Oct 24, 2016, at 10:15 PM, Chris Lattner <clattner@apple.com> wrote:

On Oct 24, 2016, at 9:40 AM, Joe Groff via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 22, 2016, at 5:53 PM, Jonathan S. Shapiro <jonathan.s.shapiro@gmail.com> wrote:

akashivskyy · October 25, 2016, 6:45pm

I’m also -1 on disallowing emojis as identifiers. As it was stated many times before, emojis are an important part of modern communication, especially between young people and kids.

I understand the complexity of keeping them around, especially if they are not well-defined by Unicode and if they are not rendered correctly in certain environments, but that seems like a valid argument to defer this discussion and not to make a rash decision.

In the end, as a compromise, I would vote to perhaps restrict the range of allowed emojis until they become more standardized.

– Adrian

Russ_Bishop1 · October 28, 2016, 12:35am

Unfortunately, Joe is correct on this point. As I stated earlier in the thread, there are a series of characters that can be either text or emoji in presentation, where the default presentation differs depending on platform, technology, use case, or context. This is also not a bug, but explicitly contemplated by Unicode technical recommendations. You can convince yourself of this fact by looking up the Wikipedia page on the Unicode "dingbats" block and comparing the rendering on Safari on iOS and Safari on macOS. You will see that they are different.

Unfortunately, you are incorrect about the behavior of missing glyphs. Unlike, say, Chinese displayed on a machine without the necessary fonts, there is a security concern that Unicode 9 emoji not yet supported by Apple are non-displaying on that platform. No placeholder appears. This includes what is according to Emojipedia the #1 most popular emoji, the shrug. (Check out Emojipedia on a Mac.) It appears that there is no required placeholder glyph for unsupported emoji, so any of them can legitimately disappear on a non-supported platform. This is an issue worth serious consideration.

IMHO I don’t think Swift needs to be designed around rendering bugs with specific fonts on specific platforms. We can file a radar to have this corrected. I’m not aware of anything in Unicode that says it is acceptable to just drop unknown characters. I think some ZWJ sequences or modifiers can be ignored; anything that can be ignored for rendering should be ignored for uniqueness of identifiers too.

Russ

···

On Oct 25, 2016, at 3:19 AM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Tue, Oct 25, 2016 at 00:41 Russ Bishop via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Oct 24, 2016, at 9:43 AM, Joe Groff via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Oct 23, 2016, at 9:41 PM, Chris Lattner via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Oct 18, 2016, at 11:34 PM, Jacob Bandes-Storch via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

Dear Swift-Evolution community,

A few of us have been preparing a proposal to refine the definitions of identifiers & operators. This includes some changes to the permitted Unicode characters.

The latest (perhaps final?) draft is available here:

https://github.com/jtbandes/swift-evolution/blob/unicode-id-op/proposals/NNNN-refining-identifier-and-operator-symbology.md

We'd welcome your initial thoughts, and will probably submit a PR soon to the swift-evolution repo for a formal review. Full text follows below.

I haven’t had a chance to read the entire proposal, nor the tons of great discussion down thread, but here are a few thoughts, just MHO:

- I’m loving that you’re taking a detail oriented approach to the problem. I agree with you that our current approach is unprincipled, and we need to get this right for Swift 4.
- I think that it is perfectly fine to err on the side of conservatism: if it isn’t clear how to classify something (e.g. Braille patterns), we should just reject them in both operators and identifiers (make them be unassigned). If these unclear cases are important to someone, then we can consider (as a separate additive proposal) adding them back later.
- As to conservatism, explicitly reserving “..” (for possible future language directions) seems reasonable to me. Are there any other similar things we should consider reserving?

- I applaud the creativity keeping a valid identifier :-), but it is really missing the point. *All* of the non-symbol-like emoji should be valid in identifiers. With a quick unscientific look at Apple’s character picker, all the emojis other than a few in “Symbols” seem like they should be identifiers. It would be fine to conservatively leave all emoji “symbols” as unassigned.

The problem with this is that "emoji" is not a well-defined category by Unicode. Whether a character is rendered as emoji or a traditional symbol in a given font on a given platform can depend on variation selectors, and the exact variation selectors (or lack thereof) that choose emoji or traditional representation are non-portable, even among different text rendering APIs on the same platform (e.g. ATSUI vs TextKit vs CoreText vs WebKit on Darwin).

-Joe

I’m not sure that is true. Unicode gives the list: http://unicode.org/emoji/charts/full-emoji-list.html.

If a platform can’t render the ZWJ sequences it can render them as separate Emoji, but Swift can still treat that as the same identifier.

== 🏼

If you don’t have a font capable of displaying the character at all that isn’t any different from not having a Chinese font available. You should get the missing character glyph. The list of emoji base characters is not unrestricted - there is a specific and limited list of valid base characters that accept modifiers.

If we wanted to go further and say that all Emoji modifiers are preserved and rendered if possible but not considered part of the identifier that would be OK with me. Same for variation selectors.
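A minimal sketch of that last idea: compare identifiers by a key that drops emoji skin-tone modifiers and variation selectors while leaving the source text untouched. The function name and the exact ranges handled are assumptions for illustration. It covers U+1F3FB–U+1F3FF and U+FE00–U+FE0F only, and it does not touch ZWJ or the supplementary variation selectors.

```swift
/// Builds a comparison key for an identifier by removing scalars that,
/// under this sketch, would be displayed but not treated as part of the name.
func identifierKey(_ identifier: String) -> String {
    var key = String.UnicodeScalarView()
    for scalar in identifier.unicodeScalars {
        // Emoji skin-tone modifiers (Fitzpatrick types 1-2 through 6).
        let isSkinToneModifier = (0x1F3FB...0x1F3FF).contains(scalar.value)
        // Variation selectors VS1 through VS16 (text vs. emoji presentation).
        let isVariationSelector = (0xFE00...0xFE0F).contains(scalar.value)
        if !isSkinToneModifier && !isVariationSelector {
            key.append(scalar)
        }
    }
    return String(key)
}
```

Under this scheme two spellings of an identifier would name the same entity whenever their keys are equal, regardless of how each platform chooses to render them.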

Russ

- I really think we should keep symbols as operators, including much of the math symbols (e.g. ∪). In a later separate proposal, we can consider whether it makes sense for emoji symbols (like to be usable as operators), I can see arguments both ways.

-Chris

trs · October 28, 2016, 4:13pm

+1

-Thorsten

···

Am 20.10.2016 um 02:32 schrieb Nevin Brackett-Rozinsky via swift-evolution <swift-evolution@swift.org>:

I strongly oppose the proposed mass-removal of operator characters. It would be a major loss to the Swift language if that were to occur even temporarily.

The long-term goal is for Swift to adopt the official Unicode guidance for operator characters, which is still under development. Therefore I believe we should make only minor and obvious changes right now, because there is no sense in “jumping the gun” and causing unnecessary source-breaking changes.

In particular, we should make it clear that Swift will most likely adopt the Unicode operator conventions when they become available, so people are aware and prepared.

When the time comes, we should deprecate any operator characters that Unicode recommends against (unless we have a good reason not to), before removing them in the next major release. The deprecation period ensures that source-breaking changes result in a warning at first, so developers have time to adapt.

I just went through all the valid operator characters in Swift, and the only ones I would recommend to eliminate at this time are:
U+2800 – U+28FF (Braille patterns)
U+3021 – U+3029 (Hangzhou numerals)
U+2205 and U+221E (Empty set and Infinity)

Additionally, I propose to *add* one operator that is missing:
U+214B (Turned ampersand)

• • •

As for the rest of the proposal, I support normalizing identifiers and dealing with confusable characters in a sensible way.

Regarding emoji, I look at them rather like the “I’m feeling lucky” button on Google: essentially nobody uses it, but when they tried getting rid of it people didn’t like the result. So I agree with Brent that we should keep them for cultural, not technical, reasons.

• • •

Returning to the discussion of operators, I am reminded of what happened when we eliminated argument labels from functions passed as parameters. The intent was and still is to reinstate them in a more robust manner.

However, during the interim the result has been a regression that goes against the core tenets and philosophy of Swift. I would not want to repeat that while waiting for official Unicode operator guidelines.

So I am strongly—adamantly—opposed to the operator-eviscerating portion of this proposal.

We should make Braille characters, Hangzhou numerals, the empty set and the infinity sign into identifiers. All other operators should remain as they are until official Unicode recommendations exist, at which point we should deprecate as necessary.
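The recommendation above amounts to a small lookup over code point ranges. A sketch, with a hypothetical enum and function name, and with every scalar outside the listed ranges simply reported as unchanged:

```swift
enum ProposedRole {
    case identifier          // move from the operator set to the identifier set
    case operatorCharacter   // newly added to the operator set
    case unchanged           // keep whatever classification Swift has today
}

/// Encodes only the changes listed in this message.
func proposedRole(of scalar: Unicode.Scalar) -> ProposedRole {
    switch scalar.value {
    case 0x2800...0x28FF:   // Braille patterns
        return .identifier
    case 0x3021...0x3029:   // Hangzhou numerals
        return .identifier
    case 0x2205, 0x221E:    // Empty set and infinity
        return .identifier
    case 0x214B:            // Turned ampersand
        return .operatorCharacter
    default:
        return .unchanged
    }
}
```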

Nevin

On Wed, Oct 19, 2016 at 5:53 PM, Matthew Johnson via swift-evolution <swift-evolution@swift.org> wrote:

IMO, the best argument against using unicode symbols for operators defined by mathematics is that they are currently difficult to type.

And there is no realistic hope of that changing. This issue is so compelling that C and C++ introduced standardized ASCII text alternatives for the punctuation operators to relieve stress on non-English keyboard users.

I don’t agree that there is no realistic hope of that changing. It appears to be pretty reasonable to anticipate that we’ll all be using software-driven keyboards that can display software-defined symbols on the keys in the relatively near future (probably 5 years, certainly 10). All kinds of interesting things become possible when that happens, including the ability to make unicode operators much easier to discover and type in a programmer’s editor.

Austin · October 20, 2016, 7:12am

Is there a compromise we can come up with, maybe? Allow emoji in identifiers, but freeze the set of allowed emoji to whatever the current version of the Unicode spec defines with the intention that 'automatic expansion' of the allowed character set to accommodate future emoji is a non-goal? (Does Unicode even provide a way to express "the set of emoji characters supported by Specific Unicode Specification X"?)
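On the parenthetical question: Unicode does record, for every code point, the version in which it was first assigned (the Age property), and Swift exposes it as `Unicode.Scalar.Properties.age`. A sketch of a frozen check built on that, with a hypothetical function name. Note the caveats: `isEmoji` reflects the Unicode data of the running Swift toolchain rather than the frozen version, and it is also true for a few non-pictographic characters such as ASCII digits.

```swift
/// True if the scalar has the Emoji property and was assigned in the given
/// Unicode version or earlier.
func isEmoji(_ scalar: Unicode.Scalar, asOfMajor major: Int, minor: Int) -> Bool {
    let properties = scalar.properties
    guard properties.isEmoji, let age = properties.age else { return false }
    // Compare (major, minor) pairs lexicographically.
    return (age.major, age.minor) <= (major, minor)
}
```

With a cutoff of 9.0, the shrug (U+1F937, assigned in Unicode 9.0) would be admitted, while emoji assigned in later versions would not.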

Austin

···

On Oct 20, 2016, at 12:06 AM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

Point well taken, but FWIW, there is a large difference between *you* expressing whimsy and committing the language to an open-ended series of continuous revisions for the sole purpose of enabling one particular form of whimsy. It's rather an overstatement to say that we are proposing to "squash the joy out of everything," as though we all lived our lives in states of ascetic deprivation before the advent of emoji.

On Thu, Oct 20, 2016 at 14:38 Russ Bishop via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Oct 19, 2016, at 1:46 PM, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

I was in the middle of writing about my opposition to the original proposal when I went to bed last night, and was going to advocate something like this:

Given the current state of the discussion over in Unicode land, I think it would probably be safe from a compatibility standpoint to admit code points that fall into the following (Unicode-style) code point set:

[:S:] - [:Sc:] - [:xidcontinue:] - [:nfcqc=n:] & [:scx=Common:] - pictographics - emoji

I suspect we can probably also do something about emoji, since I doubt UAX #31 is going to. Given that they are all static pictures of people or things, I think we can decide they are all nouns and thus all identifier characters. If we think there are some which might be declared operators later, we can exclude them for now, but I'd like to at least see the bulk of them brought in.

I think addressing emoji is important not for any technical reason, but for nontechnical ones. Emoji are a statement about Swift's modern approach; modernity is important. They are fun and whimsical; whimsy is important.

And most importantly, emoji identifiers are part of Swift's culture. It's widely understood that you don't use them in real code, but they are very common in examples. Just as we worry about source compatibility and binary compatibility, so we should worry about culture compatibility. Removing emoji would cause a gratuitous cultural regression.

I fully agree. It’s hella presumptuous to decide that I’m not allowed to express whimsy, frustration, humor, or any other emotions in my code. Or to tell an 8 year old using Playgrounds on the iPad that he/she can’t name a variable purely because they find it funny. We don’t have to squash the joy out of everything.

Russ

Jonathan_S_Shapiro · October 20, 2016, 2:03pm

Is there a compromise we can come up with, maybe?

So speaking just for myself, I strongly oppose emojis because every example
of emoji code I have seen has been truly obfuscated. Emojis therefore
present very serious and active source-level security risks that will
require significant engineering investment to manage and will never be
fully managed successfully.

That said, I'm very glad that some people here have pointed out the "kid
use case", because I had not considered that one. I think that's actually
pretty compelling.

Let me ask a question: would single-character emoji identifiers be enough,
or do we need multi-character emojis? Single-character emoji identifiers
would go a long way toward limiting the capacity for obfuscation, but I'm
guessing it won't be enough for a bunch of people here.

Freeze the set of allowed emoji to whatever the current version of the
Unicode spec defines...

UAX31 won't include emojis in either space, because there is no clear
consensus about where they belong (identifiers or operators). Individual
languages can certainly add them to one space or the other, but should take
care not to cross-contaminate. So if we add them to operators, we need to
exclude any that are already part of normal identifiers and vice versa.
That sanity restriction is technically necessary, but it shouldn't be an
inconvenience in practical terms.

Jonathan

···

On Thu, Oct 20, 2016 at 12:12 AM, Austin Zheng via swift-evolution < swift-evolution@swift.org> wrote: