A path forward on rationalizing unicode identifiers and operators

fclout · October 2, 2017, 8:13am

If you tried hard enough, you could probably create a variable that looks like it's shadowing one from an outer scope while it actually isn't, and use the two to confuse readers. This could trick people into thinking that some dangerous/backdoor code is actually good and safe, especially in the open-source world where you can't always trust your contributors.

On one hand, other than the complexity of telling if two characters are lookalikes, I don't know why Αrray (GREEK CAPITAL LETTER ALPHA) and Array (LATIN CAPITAL LETTER A) should be considered different identifiers. On the other hand, I struggle to imagine the specifics of an exploit that uses that. You'd have to work pretty hard to assemble all the pieces of a backdoor in visually-similar variable names without arousing suspicion.
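Félix's Αrray/Array example can be made concrete with a minimal Swift sketch. The helper `firstScalarDifference` is hypothetical. The two names render almost identically, but they differ in their first Unicode scalar. Any lookup that compares identifiers scalar by scalar, as a compiler presumably does, will therefore treat them as unrelated names.

```swift
// "Αrray" begins with U+0391 GREEK CAPITAL LETTER ALPHA;
// "Array" begins with U+0041 LATIN CAPITAL LETTER A.
let greek = "\u{0391}rray"
let latin = "Array"

/// Hypothetical helper: the offset and scalars of the first position where
/// two strings' Unicode scalars differ, or nil if the sequences are identical.
func firstScalarDifference(_ a: String, _ b: String)
    -> (index: Int, lhs: Unicode.Scalar?, rhs: Unicode.Scalar?)? {
    let lhs = Array(a.unicodeScalars)
    let rhs = Array(b.unicodeScalars)
    for i in 0..<max(lhs.count, rhs.count) {
        let l: Unicode.Scalar? = i < lhs.count ? lhs[i] : nil
        let r: Unicode.Scalar? = i < rhs.count ? rhs[i] : nil
        if l != r { return (index: i, lhs: l, rhs: r) }
    }
    return nil
}

print(greek == latin) // false: the two names are genuinely different strings
if let diff = firstScalarDifference(greek, latin),
   let l = diff.lhs, let r = diff.rhs {
    // Show the mismatching code points in hexadecimal.
    print(diff.index, String(l.value, radix: 16), String(r.value, radix: 16))
}
```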

Félix

···

On Oct 1, 2017, at 22:30, Kenny Leung via swift-evolution <swift-evolution@swift.org> wrote:

I guess theoretically you could have two variables that look alike, but are actually different values, allowing you to insert some obfuscated malicious code somehow.

-Kenny

On Oct 1, 2017, at 10:01 PM, Chris Lattner <clattner@nondot.org> wrote:

On Oct 1, 2017, at 9:26 PM, Kenny Leung via swift-evolution <swift-evolution@swift.org> wrote:

Hi All.

I’d like to help as well. I have fun with operators.

There is also the issue of code security with invisible unicode characters and characters that look exactly alike.

Unless there is a compelling reason to add them, I think we should ban invisible characters. What is the harm of characters that look alike?
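Chris's "ban invisible characters" idea could be prototyped as in the following minimal sketch. It assumes Swift 5 or later, because it relies on `Unicode.Scalar.Properties`, and `containsInvisibleScalar` is a hypothetical name. The sketch rejects every Default_Ignorable code point, including ZERO WIDTH JOINER and ZERO WIDTH NON-JOINER. UAX #31 allows those two in specific contexts because some scripts need them, so a real rule would likely be narrower.

```swift
/// Hypothetical check: does `name` contain any scalar with the Unicode
/// Default_Ignorable_Code_Point property (characters that normally render
/// as nothing at all)?
func containsInvisibleScalar(_ name: String) -> Bool {
    return name.unicodeScalars.contains { $0.properties.isDefaultIgnorableCodePoint }
}

let plain = "total"
let sneaky = "to\u{200B}tal" // U+200B ZERO WIDTH SPACE hidden in the middle

print(plain == sneaky)                 // false, although both display as "total"
print(containsInvisibleScalar(plain))  // false
print(containsInvisibleScalar(sneaky)) // true
```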

-Chris

(They should make a coding font that ensures all characters look different.) Was that ever resolved? Googling, I found this:

[swift-evolution] Prohibit invisible characters in identifier names

Which seems to have been left at this:

[swift-evolution] [Proposal] Normalize Unicode Identifiers

The swift-evolution archive for the week of Monday 19 September 2016, by thread

Should we throw all of this into the same pot, and make any characters that aren’t on the approved list illegal?

-Kenny

On Sep 30, 2017, at 4:13 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

I’m happy to participate in the reshaping of the proposal. It would be nice to gather a group of people again to help drive it forward.

That said, it’s unclear to me that superscript T is clearly an operator, any more than would be superscript H (Hermitian), superscript 2, superscript 3, etc. But at any rate, this would be discussion for the future workgroup.

I would strongly advocate that the things-that-are-identifiers group be strongly tied to the existing, complete Unicode standard for such, and that the critical parts of the previous document about normalization be retained.
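One reading of "tied to the existing Unicode standard" is the default identifier syntax of UAX #31: one XID_Start scalar followed by any number of XID_Continue scalars. The following sketch assumes Swift 5 or later, and `isDefaultUnicodeIdentifier` is a hypothetical name. It is not Swift's actual grammar, which also admits a leading underscore and is defined by its own explicit code point ranges.

```swift
/// Sketch of the UAX #31 default identifier syntax:
/// one XID_Start scalar, then zero or more XID_Continue scalars.
func isDefaultUnicodeIdentifier(_ name: String) -> Bool {
    var scalars = name.unicodeScalars.makeIterator()
    // The first scalar must be able to start an identifier.
    guard let first = scalars.next(), first.properties.isXIDStart else {
        return false
    }
    // Every remaining scalar must be able to continue one.
    while let next = scalars.next() {
        if !next.properties.isXIDContinue { return false }
    }
    return true
}

print(isDefaultUnicodeIdentifier("caf\u{E9}")) // true: "café"
print(isDefaultUnicodeIdentifier("2x"))        // false: digits cannot start
print(isDefaultUnicodeIdentifier("_x"))        // false here, unlike in Swift
```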

On Sat, Sep 30, 2017 at 17:59 Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:

The core team recently met to discuss PR609 - Refining identifier and operator symbology:
https://github.com/xwu/swift-evolution/blob/7c2c4df63b1d92a1677461f41bc638f31926c9c3/proposals/NNNN-refining-identifier-and-operator-symbology.md

The proposal correctly observes that the partitioning of unicode codepoints into identifiers and operators is a mess in some cases. It really is an outright bug for to be an identifier, but to be an operator. That said, the proposal itself is complicated and is defined in terms of a bunch of unicode classes that may evolve in the “wrong way for Swift” in the future.

The core team would really like to get this sorted out for Swift 5, and sooner is better than later :-). Because it seems that this is a really hard problem and that perfection is becoming the enemy of good <https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good>, the core team requests the creation of a new proposal with a different approach. The general observation is that there are three kinds of characters: things that are obviously identifiers, things that are obviously math operators, and things that are non-obvious. Things that are non-obvious can be made into invalid code points, and legislated later in follow-up proposals if/when someone cares to argue for them.

To make progress on this, we suggest a few separable steps:

First, please split out the changes to the ASCII characters (e.g. . and \ operator parsing rules) into their own (small) proposal: they are unrelated to the unicode changes, so that proposal can make progress independently.

Second, someone should take a look at the concrete set of unicode identifiers that are accepted by Swift 4 and write a new proposal that splits them into the three groups: those that are clearly identifiers (which become identifiers), those that are clearly operators (which become operators), and those that are unclear or don’t matter (these become invalid code points).

I suggest that the criteria be based on *utility for Swift code*, not on the underlying unicode classification. For example, the discussion thread for PR609 mentions that the T character in “xᵀ” is defined in unicode as a latin “letter”. Despite that, its use in Swift would clearly be as a postfix operator, so we should classify it as an operator.
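The utility-first rule can be illustrated with a small sketch, assuming Swift 5 or later. A hand-written table decides the class of each code point. The table is hypothetical and has only three illustrative entries. Unicode's own general category is consulted only to show where it disagrees for ᵀ. Unlisted code points fall into the "non-obvious, therefore invalid" bucket.

```swift
enum SymbolClass { case identifier, operatorSymbol, invalid }

/// Hypothetical utility-based table; a real proposal would list many more
/// code points. Anything not listed is "non-obvious" and therefore invalid.
let swiftClassification: [Unicode.Scalar: SymbolClass] = [
    "\u{03B1}": .identifier,     // α GREEK SMALL LETTER ALPHA
    "\u{2207}": .operatorSymbol, // ∇ NABLA
    "\u{1D40}": .operatorSymbol, // ᵀ MODIFIER LETTER CAPITAL T, as in xᵀ
]

func classify(_ scalar: Unicode.Scalar) -> SymbolClass {
    return swiftClassification[scalar] ?? .invalid
}

let superscriptT: Unicode.Scalar = "\u{1D40}"
print(superscriptT.properties.generalCategory) // modifierLetter: Unicode calls it a letter
print(classify(superscriptT))                  // operatorSymbol: the table decides otherwise
```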

Other suggestions:
- Math symbols are operators excepting those primarily used as identifiers like “alpha”. If there are any characters that are used for both, this proposal should make them invalid.
- While there may be useful ranges for some identifiers (e.g. to handle European accented characters), the Emoji range should probably have each codepoint independently judged, and currently unassigned codepoints should not get a meaning defined for them.
- Unicode “faces”, “people”, “animals”, etc. are all identifiers.
- In order to reduce the scope of the proposal, it is a safe default to exclude characters that are unlikely to be used by Swift code today, including Braille, weird currency symbols, or any set of characters that are so broken and useless in Swift 4 that it isn’t worth worrying about.
- The proposal is likely to turn a large number of code points into rejected characters. In the discussions, some people will be tempted to argue endlessly about individual rejections. To control that, we can require that people point out an example where the character is already in use, or where it has a clear application to a domain that is known today: the discussion needs to be grounded and practical, not theoretical.
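Several of these suggestions can be expressed as a default-deny policy. The sketch below assumes Swift 5 or later, and every name and table entry is a hypothetical illustration, not a proposed list. Whole ranges are used only for clearly useful blocks, and emoji are listed individually. Unassigned code points are always refused, and anything unlisted, Braille included, is rejected. The sketch covers only the non-ASCII side.

```swift
/// Whole ranges only where a block is clearly useful: here the Latin-1
/// accented letters, skipping × (U+00D7) and ÷ (U+00F7).
let identifierRanges: [ClosedRange<UInt32>] = [
    0x00C0...0x00D6, 0x00D8...0x00F6, 0x00F8...0x00FF,
]

/// Emoji are judged one code point at a time rather than by range.
let emojiIdentifiers: Set<Unicode.Scalar> = [
    "\u{1F436}", // 🐶 DOG FACE
    "\u{1F431}", // 🐱 CAT FACE
]

func isAcceptedIdentifierScalar(_ scalar: Unicode.Scalar) -> Bool {
    // Never give a meaning to a code point Unicode has not assigned yet,
    // even if a broad range above would otherwise cover it.
    if scalar.properties.generalCategory == .unassigned { return false }
    if identifierRanges.contains(where: { $0.contains(scalar.value) }) { return true }
    // Everything else (Braille, unusual currency symbols, ...) is rejected
    // by default and can be argued for later in a grounded proposal.
    return emojiIdentifiers.contains(scalar)
}
```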

Third, if there is interest sometime in the future, we can have subsequent proposals that expand the range of accepted code points, motivated by the specific application domain that cares about them. These proposals will not be source breaking, so they can happen at any time.

Is anyone interested in helping to push this effort forward?

-Chris

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

VladimirS · October 2, 2017, 11:59am

I guess theoretically you could have two variables that look alike, but are actually different values, allowing you to insert some obfuscated malicious code somehow.

Also, IIRC, there is a similar problem with the Right-To-Left "modifier": when it is inserted inside a variable name, what you *see* (in a browser or editor) is not the same variable name that the *compiler* will use. I can't find the link right now, but if it would be helpful, I will try to find it.
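A tool, or the compiler, could flag the right-to-left problem with a check like the sketch below. It assumes Swift 5 or later, and `bidiControls(in:)` is a hypothetical name. Whether such characters should be rejected or only produce a warning is left open.

```swift
/// Hypothetical lint helper: collects Unicode Bidi_Control scalars such as
/// U+202E RIGHT-TO-LEFT OVERRIDE, which can make the displayed order of
/// characters differ from the order in which they are stored.
func bidiControls(in text: String) -> [Unicode.Scalar] {
    var found: [Unicode.Scalar] = []
    for scalar in text.unicodeScalars where scalar.properties.isBidiControl {
        found.append(scalar)
    }
    return found
}

// Stored as "abc", RLO, "def"; many editors will display the tail reversed.
let suspicious = "abc\u{202E}def"
for scalar in bidiControls(in: suspicious) {
    print("U+" + String(scalar.value, radix: 16, uppercase: true)) // U+202E
}
```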

Vladimir.

···

On 02.10.2017 8:30, Kenny Leung via swift-evolution wrote:

-Kenny


xwu · October 2, 2017, 4:14pm

What is your use case for this?

···

On Mon, Oct 2, 2017 at 10:56 David Sweeris via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 1, 2017, at 22:01, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 1, 2017, at 9:26 PM, Kenny Leung via swift-evolution <swift-evolution@swift.org> wrote:

Hi All.

I’d like to help as well. I have fun with operators.

There is also the issue of code security with invisible unicode characters
and characters that look exactly alike.

Unless there is a compelling reason to add them, I think we should ban
invisible characters. What is the harm of characters that look alike?

Especially if people want to use the character in question as both an
identifier and an operator: We can make the character an identifier and its
lookalike an operator (or the other way around).

- Dave Sweeris

David_Sweeris · October 2, 2017, 5:58pm

Off the top of my head...
In calculus, “𝖽” (MATHEMATICAL SANS-SERIF SMALL D) would be a fine substitute for "d" in “𝖽y/𝖽x” ("the derivative of y(x) with respect to x").
In statistics, we could use "𝖢" (MATHEMATICAL SANS-SERIF CAPITAL C), as in "5𝖢3" to mimic the "5C3" notation ("5 choose 3"). And although not strictly an issue of identifiers vs. operators, “！” (FULLWIDTH EXCLAMATION MARK) would be an OK substitution (that extra space on the right looks funny) for "!" in “4！” ("4 factorial").

I'm sure there are other examples from math/science/<insert any "symbology"-heavy DSL here>, but “d” in particular is one that I’ve wanted for a while, since Swift classifies "∂" (the partial derivative operator) as an operator rather than an identifier. That makes it impossible to use a consistent syntax between normal derivatives and partial derivatives: normal derivatives are "d(y)/d(x)", whereas partial derivatives get to drop the parens, "∂y/∂x".
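Dave's asymmetry can be reproduced in today's Swift with hypothetical `Variable`, `Differential`, and `Derivative` types. Because "∂" (U+2202) is an operator character, it can be declared as a prefix operator. "d" is an identifier character, so it can only name an ordinary function, which means one spelling needs parentheses and the other does not. The sketch writes `∂y / ∂x` with spaces around `/`. Without the spaces, Swift's operator lexing would likely treat `/∂` as a single operator token.

```swift
// Hypothetical types for symbolic differentials.
struct Variable { let name: String }
struct Differential { let name: String }
struct Derivative: Equatable { let of: String; let withRespectTo: String }

// "∂" is an operator character, so it can be a prefix operator.
prefix operator ∂
prefix func ∂ (v: Variable) -> Differential { return Differential(name: v.name) }

// "d" is an identifier character, so it can only be an ordinary function.
func d(_ v: Variable) -> Differential { return Differential(name: v.name) }

// Dividing two differentials yields a derivative.
func / (lhs: Differential, rhs: Differential) -> Derivative {
    return Derivative(of: lhs.name, withRespectTo: rhs.name)
}

let x = Variable(name: "x")
let y = Variable(name: "y")

let total = d(y) / d(x) // parentheses required
let partial = ∂y / ∂x   // prefix operator, no parentheses
print(total == partial) // true
```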

- Dave Sweeris


xwu · October 2, 2017, 11:56am

This is why I’m advocating for the sections of the previous draft that deal
with this issue to be maintained going forward. In that document and in the
links provided in that document, there are very extensive previous
discussions on lookalike characters and invisibles.

No need to rehash this very complex topic again. I will just say that there are
languages for which invisible modifiers are essential, but there are
well-defined Unicode guidelines about restricting their use so as to
maximize security without impeding legitimate use cases. Lookalikes are
dealt with by Unicode in several flavors, and again the previous draft
discusses why a certain flavor of normalization is most appropriate for
Swift.
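The normalization form matters because canonically equivalent spellings compare equal as Swift `String`s but remain different scalar sequences. The following sketch shows only that canonical-equivalence fact. It does not reproduce whichever normalization form the earlier draft chose, and the example word is an arbitrary illustration.

```swift
// The same word spelled two ways: "é" as one precomposed scalar (U+00E9),
// and as "e" followed by U+0301 COMBINING ACUTE ACCENT.
let precomposed = "caf\u{00E9}"
let decomposed = "cafe\u{0301}"

// Swift's String equality follows Unicode canonical equivalence...
print(precomposed == decomposed) // true

// ...but the raw scalar sequences, and even their lengths, differ.
print(Array(precomposed.unicodeScalars) == Array(decomposed.unicodeScalars)) // false
print(precomposed.unicodeScalars.count, decomposed.unicodeScalars.count)     // 4 5
```

A compiler that compared identifiers scalar by scalar without first normalizing would therefore see two different names here.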

···

On Mon, Oct 2, 2017 at 03:13 Félix Cloutier via swift-evolution <swift-evolution@swift.org> wrote:

If you tried hard enough, you could probably create a variable that looks
like it's shadowing one from an outer scope while it actually isn't, and
use the two to confuse readers. This could trick people into thinking that
some dangerous/backdoor code is actually good and safe, especially in the
open-source world where you can't always trust your contributors.

On one hand, other than the complexity of telling if two characters are
lookalikes, I don't know why Αrray (GREEK CAPITAL LETTER ALPHA) and Array
(LATIN CAPITAL LETTER A) should be considered different identifiers. On the
other hand, I struggle to imagine the specifics of an exploit that uses
that. You'd have to work pretty hard to assemble all the pieces of a
backdoor in visually-similar variable names without arousing suspicion.

Félix


xwu · October 2, 2017, 10:24pm

Allowing a custom operator that looks like `!` to be anything other than
the force-unwrap operator would be unwise, IMO, and not a desirable goal.
Likewise characters that look like `d` not being the character `d`, etc. In
the previous PR, the authors deliberately created a system where these will
not be possible.
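Compatibility normalization (NFKC) is one way to get the property described here, where a lookalike of `!` or `d` cannot become a distinct symbol, because it folds many such characters onto their plain forms. The sketch below uses Foundation's `precomposedStringWithCompatibilityMapping`, and `nfkc` is a hypothetical wrapper. It makes no claim about exactly which normalization form the earlier draft specified. NFKC also folds ᵀ to a plain T, which bears on the superscript T discussion earlier in the thread.

```swift
import Foundation

/// Hypothetical wrapper: NFKC (compatibility composition) via Foundation.
func nfkc(_ s: String) -> String {
    return s.precomposedStringWithCompatibilityMapping
}

// Lookalikes mentioned in the thread, folded to their plain forms.
print(nfkc("\u{1D5BD}") == "d") // 𝖽 MATHEMATICAL SANS-SERIF SMALL D: prints true
print(nfkc("\u{FF01}") == "!")  // ！ FULLWIDTH EXCLAMATION MARK: prints true
print(nfkc("\u{1D40}") == "T")  // ᵀ MODIFIER LETTER CAPITAL T: prints true
```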

I think we should specify from the outset of re-examining this topic that
supporting arbitrary math/science notation without demonstrable improvement
in code clarity for actual Swift code is a non-goal. Since manipulating
matrices is a common programming task, and the current BLAS syntax is
terribly cumbersome, being able to use operators for matrix multiplication,
inversion, etc. is eminently reasonable. Having a way of writing
`4.factorial()` that looks like an equation in a math textbook, however,
wouldn't pass that bar.
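For the matrix case, Swift already lets an operator character such as "×" (U+00D7) name matrix multiplication. The sketch uses a hypothetical 2×2 `Matrix2` type. Transpose is written as a method because, as discussed earlier in the thread, ᵀ is classified as a letter and so apparently cannot currently be declared as an operator.

```swift
/// Hypothetical 2×2 matrix [[a, b], [c, d]] for illustration.
struct Matrix2: Equatable {
    var a, b, c, d: Double

    static let identity = Matrix2(a: 1, b: 0, c: 0, d: 1)

    /// Transpose as a method, since ᵀ is not usable as an operator character.
    func transposed() -> Matrix2 { return Matrix2(a: a, b: c, c: b, d: d) }
}

// "×" is an operator character, so it can name matrix multiplication.
infix operator × : MultiplicationPrecedence

func × (lhs: Matrix2, rhs: Matrix2) -> Matrix2 {
    return Matrix2(
        a: lhs.a * rhs.a + lhs.b * rhs.c,
        b: lhs.a * rhs.b + lhs.b * rhs.d,
        c: lhs.c * rhs.a + lhs.d * rhs.c,
        d: lhs.c * rhs.b + lhs.d * rhs.d)
}

let m = Matrix2(a: 1, b: 2, c: 3, d: 4)
print(Matrix2.identity × m == m) // true
```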

···

On Mon, Oct 2, 2017 at 12:58 PM, David Sweeris <davesweeris@mac.com> wrote:


Off the top of my head...
In calculus, “𝖽” (MATHEMATICAL SANS-SERIF SMALL D) would be a fine
substitute for "d" in “𝖽y/𝖽x” ("the derivative of y(x) with respect to
x").
In statistics, we could use "𝖢" (MATHEMATICAL SANS-SERIF CAPITAL C), as
in "5𝖢3" to mimic the "5C3" notation ("5 choose 3"). And although not
strictly an issue of identifiers vs. operators, “！” (FULLWIDTH EXCLAMATION
MARK) would be an OK substitution (that extra space on the right looks
funny) for "!" in “4！” ("4 factorial").

I'm sure there are other examples from math/science/<insert any
"symbology"-heavy DSL here>, but “d” in particular is one that I’ve wanted
for a while, since Swift classifies "∂" (the partial derivative operator) as
an operator rather than an identifier, making it impossible to use a
consistent syntax between normal derivatives and partial derivatives
(normal derivatives are "d(y)/d(x)", whereas partial derivatives get to
drop the parens: "∂y/∂x").

Ethan_Tira-Thompson · October 3, 2017, 12:28am

I’m all for fixing pressing issues requested by Xiaodi, but beyond that I request we give a little more thought to the long-term direction.

My 2¢ is that I’ve been convinced very few characters are “obviously” either an operator or an identifier across all contexts where they might be used. That relegates the vast majority of thousands of ambiguous characters to a committee that must decide a single global usage for each. But that approach is both a huge time sink and fundamentally flawed, because whether a character should be an operator depends on who is using it.

For example, if a developer finds a set of symbols which perfectly denote some niche concept, do you really expect the developer to submit a proposal and wait months/years to get the characters classified and then a new compiler version to be distributed, all so that developer can adopt his/her own notation?

And then after that is done, now say a member of some distant tribe complains they wanted to use one of those characters to write identifiers using their native language. Even though there may be zero intersection between these two user groups, this path forces Swift itself to pick a side of one vs. the other.

Surely there is some way to enable the local developer to resolve these choices rather than putting the swift language definition on the critical path?

The goals I know of:
1. Performance: don’t require parsing all imports to get the operator set
2. Security: don’t let imports do surprising/obfuscated stuff
3. Functionality: do let users write what they want, or import/share libraries for niche domains
4. Well defined: resolve conflicts, e.g. between libraries

I’m a little out of my league, but let’s say we want to use the operator ᵀ from some matrixlib. How about:
import matrixlib (operator: ᵀ)

Or if you want several operators:
import matrixlib (operators: [ᵀ,·,⊗])

Ideally, any local operator definitions “just work” across their own module, but if it requires an “import (operator: ×)” in each file for performance, so be it.

A whitelist of “standard” operators would automatically import (i.e. initialize the operator character list) to maintain compatibility with current usage. But you can imagine additional arguments to the import call, such as “standardOperators: false” to import only the explicitly listed operators and reduce potential surprises.

My rationale vs. the goals:
1. Performance: the operator character set vs. identifiers (everything else) can be determined within the file itself
2. Security: developer explicitly opts-in to the special operators they want to use, and readers can see where an operator comes from
3. Functionality: user is able to define their operators without getting committee involved
4. Well defined: potential conflict between libraries resolved by client’s choice to import or exclude the operator
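For comparison, this is roughly how a library exposes a custom operator in Swift today. It is a minimal sketch: `Vector` and the use of ⊗ as an outer product are illustrative assumptions, not part of any real matrix library. Operator declarations such as `infix operator ⊗` are not scoped to individual imports, so there is currently no per-import opt-in like the one proposed above. Also, going by the lexical grammar in The Swift Programming Language (as far as I can tell), ᵀ (U+1D40) and · (U+00B7) are classified as identifier characters, so of the three example symbols only ⊗ could be declared as an operator today.

```swift
// Hypothetical example: `Vector` and ⊗-as-outer-product are illustrative only.

// The operator declaration is not tied to any import: any file that imports
// the declaring module can use ⊗ without opting in.
infix operator ⊗: MultiplicationPrecedence

struct Vector {
    var elements: [Double]
}

/// Outer product: result[i][j] == lhs.elements[i] * rhs.elements[j].
func ⊗ (lhs: Vector, rhs: Vector) -> [[Double]] {
    lhs.elements.map { a in rhs.elements.map { b in a * b } }
}

let u = Vector(elements: [1, 2])
let v = Vector(elements: [3, 4, 5])
let outer = u ⊗ v   // [[3.0, 4.0, 5.0], [6.0, 8.0, 10.0]]
```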

Does this have potential?

-Ethan

···

On Oct 2, 2017, at 10:59 AM, David Sweeris via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 2, 2017, at 09:14, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

What is your use case for this?

On Mon, Oct 2, 2017 at 10:56 David Sweeris via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 1, 2017, at 22:01, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 1, 2017, at 9:26 PM, Kenny Leung via swift-evolution <swift-evolution@swift.org> wrote:

Hi All.

I’d like to help as well. I have fun with operators.

There is also the issue of code security with invisible unicode characters and characters that look exactly alike.

Unless there is a compelling reason to add them, I think we should ban invisible characters. What is the harm of characters that look alike?

Especially if people want to use the character in question as both an identifier and an operator: We can make the character an identifier and its lookalike an operator (or the other way around).

Off the top of my head...
In calculus, “𝖽” (MATHEMATICAL SANS-SERIF SMALL D) would be a fine substitute for "d" in “𝖽y/𝖽x” ("the derivative of y(x) with respect to x").
In statistics, we could use "𝖢" (MATHEMATICAL SANS-SERIF CAPITAL C), as in "5𝖢3" to mimic the "5C3" notation ("5 choose 3"). And although not strictly an issue of identifiers vs operators, “！” (FULLWIDTH EXCLAMATION MARK) would be an ok substitution (that extra space on the right looks funny) for "!" in “4！” ("4 factorial").

I'm sure there are other examples from math/science/<insert any "symbology"-heavy DSL here>, but “d” in particular is one that I’ve wanted for a while since Swift classifies "∂" (the partial derivative operator) as an operator rather than an identifier, making it impossible to use a consistent syntax between normal derivatives and partial derivatives (normal derivatives are "d(y)/d(x)", whereas partial derivatives get to drop the parens "∂y/∂x")
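To make the ∂-versus-d asymmetry concrete: because ∂ (U+2202) is an operator character, it can be declared as a prefix operator and applied without parentheses, while `d` is an identifier and can only be an ordinary function call. The sketch below assumes a simple numerical central difference with an arbitrary step size; `centralDifference` and `d` are hypothetical helpers, and it only illustrates the call syntax, not a way to spell a full 𝖽y/𝖽x quotient.

```swift
// Hypothetical helpers; the default step size is an arbitrary assumption.

/// Numerical derivative via a central difference: (f(x+h) - f(x-h)) / 2h.
func centralDifference(_ f: @escaping (Double) -> Double,
                       step h: Double = 1e-5) -> (Double) -> Double {
    return { x in (f(x + h) - f(x - h)) / (2 * h) }
}

// ∂ is an operator character, so it can be a prefix operator...
prefix operator ∂
prefix func ∂ (f: @escaping (Double) -> Double) -> (Double) -> Double {
    return centralDifference(f)
}

// ...whereas d is an identifier, so it can only be an ordinary function.
func d(_ f: @escaping (Double) -> Double) -> (Double) -> Double {
    return centralDifference(f)
}

let square: (Double) -> Double = { $0 * $0 }
let viaOperator = (∂square)(3)   // approximately 6
let viaFunction = d(square)(3)   // approximately 6
```

Because function-call syntax binds more tightly than a prefix operator, `∂square(3)` would parse as `∂(square(3))`, so the operator form still needs parentheses around `∂square` when its result is called immediately.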

- Dave Sweeris
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Chris_Lattner · October 3, 2017, 4:40am

I don’t think this is something we have to try hard to avoid. It is true that some characters look similar, particularly in some fonts, but this isn’t new:

   let a1 = 42
   let al = 12
   let b = al + a1

If there were real code that was maliciously shadowing to try to cause confusion, then you have a more serious problem on your hands than someone accidentally misunderstanding which one to use.

All I’m saying is that we shouldn’t complicate the design to solve this problem (IMO). If it falls out of the solution somehow (e.g. just disallow invisible characters) then that’s great of course!
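If the "just disallow invisible characters" route were taken, one plausible definition of "invisible" is the Unicode Default_Ignorable_Code_Point property, which Swift exposes as `Unicode.Scalar.Properties.isDefaultIgnorableCodePoint`. This is a minimal sketch of such a check under that assumption, not something the compiler does; `invisibleScalars(in:)` is a hypothetical helper.

```swift
// Hypothetical lint helper: treats any scalar with the Unicode
// Default_Ignorable_Code_Point property as "invisible".
func invisibleScalars(in identifier: String) -> [Unicode.Scalar] {
    var found: [Unicode.Scalar] = []
    for scalar in identifier.unicodeScalars
    where scalar.properties.isDefaultIgnorableCodePoint {
        found.append(scalar)
    }
    return found
}

let clean = "parameters"
let sneaky = "param\u{200B}eters"   // contains U+200B ZERO WIDTH SPACE

let cleanHits = invisibleScalars(in: clean)    // empty
let sneakyHits = invisibleScalars(in: sneaky)  // the single U+200B scalar
```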

-Chris

···

On Oct 2, 2017, at 1:13 AM, Félix Cloutier via swift-evolution <swift-evolution@swift.org> wrote:

If you tried hard enough, you could probably create a variable that looks like it's shadowing one from an outer scope while it actually isn't, and use the two to confuse readers. This could trick people into thinking that some dangerous/backdoor code is actually good and safe, especially in the open-source world where you can't always trust your contributors.

On one hand, other than the complexity of telling if two characters are lookalikes, I don't know why Αrray (GREEK CAPITAL LETTER ALPHA) and Array (LATIN CAPITAL LETTER A) should be considered different identifiers. On the other hand, I struggle to imagine the specifics of an exploit that uses that. You'd have to work pretty hard to assemble all the pieces of a backdoor in visually-similar variable names without arousing suspicion.

Jon_Gilbert1 · October 3, 2017, 12:38am

Ethan-

This is brilliant. I don’t know if it’s technically feasible, but this is how it *should be.*

- Jonathan

···

On Oct 2, 2017, at 17:28, Ethan Tira-Thompson via swift-evolution <swift-evolution@swift.org> wrote:

I’m all for fixing pressing issues requested by Xiaodi, but beyond that I request we give a little more thought to the long term direction.

My 2¢ is I’ve been convinced that very few characters are “obviously” either an operator or identifier across all contexts where they might be used. That would relegate the vast majority of thousands of ambiguous characters to a committee deciding a single global usage. But that is both a huge time sink and fundamentally flawed in approach due to the contextual dependency of who is using them.

For example, if a developer finds a set of symbols which perfectly denote some niche concept, do you really expect the developer to submit a proposal and wait months/years to get the characters classified and then a new compiler version to be distributed, all so that developer can adopt his/her own notation?

And then after that is done, now say a member of some distant tribe complains they wanted to use one of those characters to write identifiers using their native language. Even though there may be zero intersection between these two user groups, this path forces Swift itself to pick a side of one vs. the other.

Surely there is some way to enable the local developer to resolve these choices rather than putting the swift language definition on the critical path?

The goals I know of:
1. Performance: don’t require parsing all imports to get the operator set
2. Security: don’t let imports do surprising/obfuscated stuff
3. Functionality: do let users write what they want, or import/share libraries for niche domains
4. Well defined: resolve conflicts, e.g. between libraries

I’m a little out of my league, but let’s say we want to use operator ᵀ from some matrixlib, how about:
import matrixlib (operator: ᵀ)

Or if you want several operators:
import matrixlib (operators: [ᵀ,·,⊗])

Ideally, any local operator definitions “just work” across their own module, but if it requires an “import (operator: ×)” in each file for performance, so be it.

A whitelist of “standard” operators would automatically import (i.e. initialize the operator character list) to maintain compatibility with current usage. But you can imagine additional arguments to the import call, such as “standardOperators: false” to import only the explicitly listed operators and reduce potential surprises.

My rationale vs. the goals:
1. Performance: the operator character set vs. identifiers (everything else) can be determined within the file itself
2. Security: developer explicitly opts-in to the special operators they want to use, and readers can see where an operator comes from
3. Functionality: user is able to define their operators without getting committee involved
4. Well defined: potential conflict between libraries resolved by client’s choice to import or exclude the operator

Does this have potential?

-Ethan

On Oct 2, 2017, at 10:59 AM, David Sweeris via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 2, 2017, at 09:14, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

What is your use case for this?

On Mon, Oct 2, 2017 at 10:56 David Sweeris via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 1, 2017, at 22:01, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 1, 2017, at 9:26 PM, Kenny Leung via swift-evolution <swift-evolution@swift.org> wrote:

Hi All.

I’d like to help as well. I have fun with operators.

There is also the issue of code security with invisible unicode characters and characters that look exactly alike.

Unless there is a compelling reason to add them, I think we should ban invisible characters. What is the harm of characters that look alike?

Especially if people want to use the character in question as both an identifier and an operator: We can make the character an identifier and its lookalike an operator (or the other way around).

Off the top of my head...
In calculus, “𝖽” (MATHEMATICAL SANS-SERIF SMALL D) would be a fine substitute for "d" in “𝖽y/𝖽x” ("the derivative of y(x) with respect to x").
In statistics, we could use "𝖢" (MATHEMATICAL SANS-SERIF CAPITAL C), as in "5𝖢3" to mimic the "5C3" notation ("5 choose 3"). And although not strictly an issue of identifiers vs operators, “！” (FULLWIDTH EXCLAMATION MARK) would be an ok substitution (that extra space on the right looks funny) for "!" in “4！” ("4 factorial").

I'm sure there are other examples from math/science/<insert any "symbology"-heavy DSL here>, but “d” in particular is one that I’ve wanted for a while since Swift classifies "∂" (the partial derivative operator) as an operator rather than an identifier, making it impossible to use a consistent syntax between normal derivatives and partial derivatives (normal derivatives are "d(y)/d(x)", whereas partial derivatives get to drop the parens "∂y/∂x")

- Dave Sweeris

David_Sweeris · October 3, 2017, 2:01am

I gave up on trying to get the restrictions on the normal "!" removed a while ago... I guess we'll just have to agree to disagree on !-lookalikes.

Supporting arbitrary math/science notation, though, is almost by definition an increase in code clarity for the people who are used to it. Is that everyone? Of course not. Is it the majority? I doubt it; the days when Computer Science was part of the Math department are long gone, and it's common for people to become developers without getting any formal education in the field at all (which is great, IMHO... that means we're successfully making computing more accessible). That doesn't seem to me like a good reason not to support such symbolic notations, though. I'm not suggesting a change to the standard library here, to be forced on everyone -- I'm merely suggesting a way to help people who prefer the more symbol-heavy notations to use them if they and their teams (and their clients, if they're a library vendor) want to.

I would never claim that the particular cases I raised are “critical to Swift's long-term success” or anything (I think the # of people who care about "𝖽y" vs "d(y)" enough to let it dictate their language choice is probably zero), but I would like to point out that a few of the threads here have demonstrated just how much opinions on this matter differ, even within the relatively small group of people who participate on this list. If Swift’s long-term goal is to take over the world, that means the language needs to “work” for very diverse groups of people... We probably shouldn’t be restricting syntax at the language level unless we actually have to.

- Dave Sweeris

···

On Oct 2, 2017, at 3:24 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Mon, Oct 2, 2017 at 12:58 PM, David Sweeris <davesweeris@mac.com> wrote:

On Oct 2, 2017, at 09:14, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

What is your use case for this?

On Mon, Oct 2, 2017 at 10:56 David Sweeris via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 1, 2017, at 22:01, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 1, 2017, at 9:26 PM, Kenny Leung via swift-evolution <swift-evolution@swift.org> wrote:

Hi All.

I’d like to help as well. I have fun with operators.

There is also the issue of code security with invisible unicode characters and characters that look exactly alike.

Unless there is a compelling reason to add them, I think we should ban invisible characters. What is the harm of characters that look alike?

Especially if people want to use the character in question as both an identifier and an operator: We can make the character an identifier and its lookalike an operator (or the other way around).

Off the top of my head...
In calculus, “𝖽” (MATHEMATICAL SANS-SERIF SMALL D) would be a fine substitute for "d" in “𝖽y/𝖽x” ("the derivative of y(x) with respect to x").
In statistics, we could use "𝖢" (MATHEMATICAL SANS-SERIF CAPITAL C), as in "5𝖢3" to mimic the "5C3" notation ("5 choose 3"). And although not strictly an issue of identifiers vs operators, “！” (FULLWIDTH EXCLAMATION MARK) would be an ok substitution (that extra space on the right looks funny) for "!" in “4！” ("4 factorial").

I'm sure there are other examples from math/science/<insert any "symbology"-heavy DSL here>, but “d” in particular is one that I’ve wanted for a while since Swift classifies "∂" (the partial derivative operator) as an operator rather than an identifier, making it impossible to use a consistent syntax between normal derivatives and partial derivatives (normal derivatives are "d(y)/d(x)", whereas partial derivatives get to drop the parens "∂y/∂x")

I think we should specify from the outset of re-examining this topic that supporting arbitrary math/science notation without demonstrable improvement in code clarity for actual, Swift code is a non-goal.

Chris_Lattner · October 3, 2017, 4:44am

Especially if people want to use the character in question as both an identifier and an operator: We can make the character an identifier and its lookalike an operator (or the other way around).

Off the top of my head...
In calculus, “𝖽” (MATHEMATICAL SANS-SERIF SMALL D) would be a fine substitute for "d" in “𝖽y/𝖽x” ("the derivative of y(x) with respect to x").
In statistics, we could use "𝖢" (MATHEMATICAL SANS-SERIF CAPITAL C), as in "5𝖢3" to mimic the "5C3" notation ("5 choose 3"). And although not strictly an issue of identifiers vs operators, “！” (FULLWIDTH EXCLAMATION MARK) would be an ok substitution (that extra space on the right looks funny) for "!" in “4！” ("4 factorial").

I'm sure there are other examples from math/science/<insert any "symbology"-heavy DSL here>, but “d” in particular is one that I’ve wanted for a while since Swift classifies "∂" (the partial derivative operator) as an operator rather than an identifier, making it impossible to use a consistent syntax between normal derivatives and partial derivatives (normal derivatives are "d(y)/d(x)", whereas partial derivatives get to drop the parens "∂y/∂x")

Allowing a custom operator that looks like `!` to be anything other than the force-unwrap operator would be unwise, IMO, and not a desirable goal. Likewise characters that look like `d` not being the character `d`, etc. In the previous PR, the authors deliberately created a system where these will not be possible.

I completely agree with Xiaodi here. Even if this were technically possible by the rules that are defined, it would still be very poor form to do this. It would violate the swift goal of clarity of code. If these operations need to be an operator, it would be better to define a *new* operator for these operations, so that a human at least knows that they don’t know what the operation does - rather than being misled into thinking they are a familiar construct.

I think we should specify from the outset of re-examining this topic that supporting arbitrary math/science notation without demonstrable improvement in code clarity for actual, Swift code is a non-goal. Since manipulating matrices is a common programming task, and the current BLAS syntax is terribly cumbersome, being able to use operators for matrix multiplication, inversion, etc. is eminently reasonable. Having a way of writing `4.factorial()` that looks like an equation in a math textbook, however, wouldn't pass that bar.

+100. Clarity is the important thing, and sometimes operators are the right way to get that. Emulating math syntax exactly is a non-goal.
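For reference, the non-operator spellings discussed here already work in current Swift. A minimal sketch, with `factorial()` and `choose(_:)` as hypothetical extensions on `Int` (neither is a standard-library API); both trap on invalid input, and `factorial()` overflows (and therefore traps) for n > 20 with a 64-bit `Int`.

```swift
extension Int {
    /// Hypothetical helper: n! computed iteratively. Traps for negative n.
    func factorial() -> Int {
        precondition(self >= 0, "factorial is undefined for negative numbers")
        return self < 2 ? 1 : (2...self).reduce(1, *)
    }

    /// Hypothetical helper: binomial coefficient "n choose k", computed
    /// multiplicatively so each intermediate value is itself a binomial.
    func choose(_ k: Int) -> Int {
        precondition(k >= 0 && k <= self, "k must be in 0...n")
        let m = Swift.min(k, self - k)   // C(n, k) == C(n, n - k)
        var result = 1
        for i in 0..<m {
            result = result * (self - i) / (i + 1)
        }
        return result
    }
}

let fourFactorial = 4.factorial()   // 24
let fiveChooseThree = 5.choose(3)   // 10
```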

-Chris

···

On Oct 2, 2017, at 3:24 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

xwu · October 3, 2017, 12:45am

I’m all for fixing pressing issues requested by Xiaodi, but beyond that I
request we give a little more thought to the long term direction.

My 2¢ is I’ve been convinced that very few characters are “obviously”
either an operator or identifier across all contexts where they might be
used. That would relegate the vast majority of thousands of ambiguous
characters to a committee deciding a single global usage. But that is both
a huge time sink and fundamentally flawed in approach due to the contextual
dependency of who is using them.

For example, if a developer finds a set of symbols which perfectly denote
some niche concept, do you really expect the developer to submit a proposal
and wait months/years to get the characters classified and then a new
compiler version to be distributed, all so that developer can adopt his/her
own notation?

The Unicode Consortium already has a document describing which Unicode
characters are suitable identifiers in programming languages, with guidance
as to how to customize that list around the edges. This is already adopted
by other programming languages. So, with little design effort, that task is
not only doable but largely done.
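The document referred to here is Unicode Standard Annex #31 (Unicode Identifiers and Syntax), whose default identifier syntax is one XID_Start character followed by any number of XID_Continue characters. Swift 5.0 and later expose both properties on `Unicode.Scalar.Properties`, so the default rule can be sketched directly. Allowing `_` as a start character is shown as one example of the "customize around the edges" tailoring; `isUAX31Identifier` is a hypothetical name.

```swift
/// Default identifier syntax from UAX #31: <XID_Start> <XID_Continue>*,
/// tailored (as an example) to also allow "_" as a start character.
func isUAX31Identifier(_ text: String) -> Bool {
    let scalars = text.unicodeScalars
    guard let first = scalars.first,
          first.properties.isXIDStart || first == "_" else {
        return false   // empty string, or an invalid first character
    }
    return scalars.dropFirst().allSatisfy { $0.properties.isXIDContinue }
}
```

Under this rule Αrray (with GREEK CAPITAL LETTER ALPHA) is a perfectly valid identifier, so UAX #31 on its own does nothing about lookalikes; Unicode addresses those separately with the confusable-detection mechanisms in UTS #39.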

As to operators, again, I am of the strong opinion that making it possible
for developers to adopt any preferred notation for any purpose (a) is
fundamentally incompatible with the division between operators and
identifiers, as I believe you’re saying here; and (b) should be a non-goal
from the outset. The only task, so far as I can tell, left to do is to
identify what pragmatic set of (mostly mathematical) symbols are used as
operators in the wider world and are likely to be already used in Swift
code or part of common use cases where an operator is clearly superior to
alternative spellings. In my view, the set of valid operator characters not
only shouldn’t require parsing or import directives, but should be small
enough to be knowable by memory.

···

On Mon, Oct 2, 2017 at 19:28 Ethan Tira-Thompson via swift-evolution <swift-evolution@swift.org> wrote:

And then after that is done, now say a member of some distant tribe
complains they wanted to use one of those characters to write identifiers
using their native language. Even though there may be zero intersection
between these two user groups, this path forces Swift itself to pick a side
of one vs. the other.

Surely there is some way to enable the local developer to resolve these
choices rather than putting the swift language definition on the critical
path?

The goals I know of:
1. Performance: don’t require parsing all imports to get the operator set
2. Security: don’t let imports do surprising/obfuscated stuff
3. Functionality: do let users write what they want, or import/share
libraries for niche domains
4. Well defined: resolve conflicts, e.g. between libraries

I’m a little out of my league, but let’s say we want to use operator ᵀ
from some matrixlib, how about:
import matrixlib (operator: ᵀ)

Or if you want several operators:
import matrixlib (operators: [ᵀ,·,⊗])

Ideally, any local operator definitions “just work” across their own
module, but if it requires an “import (operator: ×)” in each file for
performance, so be it.

A whitelist of “standard” operators would automatically import (i.e.
initialize the operator character list) to maintain compatibility with
current usage. But you can imagine additional arguments to the import
call, such as “standardOperators: false” to import only the explicitly
listed operators and reduce potential surprises.

My rationale vs. the goals:
1. Performance: the operator character set vs. identifiers (everything
else) can be determined within the file itself
2. Security: developer explicitly opts-in to the special operators they
want to use, and readers can see where an operator comes from
3. Functionality: user is able to define their operators without getting
committee involved
4. Well defined: potential conflict between libraries resolved by client’s
choice to import or exclude the operator

Does this have potential?

-Ethan

On Oct 2, 2017, at 10:59 AM, David Sweeris via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 2, 2017, at 09:14, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

What is your use case for this?

On Mon, Oct 2, 2017 at 10:56 David Sweeris via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 1, 2017, at 22:01, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 1, 2017, at 9:26 PM, Kenny Leung via swift-evolution <swift-evolution@swift.org> wrote:

Hi All.

I’d like to help as well. I have fun with operators.

There is also the issue of code security with invisible unicode
characters and characters that look exactly alike.

Unless there is a compelling reason to add them, I think we should ban
invisible characters. What is the harm of characters that look alike?

Especially if people want to use the character in question as both an
identifier and an operator: We can make the character an identifier and its
lookalike an operator (or the other way around).

Off the top of my head...
In calculus, “𝖽” (MATHEMATICAL SANS-SERIF SMALL D) would be a fine
substitute for "d" in “𝖽y/𝖽x” ("the derivative of y(x) with respect to
x").
In statistics, we could use "𝖢" (MATHEMATICAL SANS-SERIF CAPITAL C), as
in "5𝖢3" to mimic the "5C3" notation ("5 choose 3"). And although not
strictly an issue of identifiers vs operators, “！” (FULLWIDTH EXCLAMATION
MARK) would be an ok substitution (that extra space on the right looks
funny) for "!" in “4！” ("4 factorial").

I'm sure there are other examples from math/science/<insert any
"symbology"-heavy DSL here>, but “d” in particular is one that I’ve wanted
for a while since Swift classifies "∂" (the partial derivative operator) as
an operator rather than an identifier, making it impossible to use a
consistent syntax between normal derivatives and partial derivatives
(normal derivatives are "d(y)/d(x)", whereas partial derivatives get to
drop the parens "∂y/∂x")

- Dave Sweeris

fclout · October 3, 2017, 4:43pm

If you tried hard enough, you could probably create a variable that looks like it's shadowing one from an outer scope while it actually isn't, and use the two to confuse readers. This could trick people into thinking that some dangerous/backdoor code is actually good and safe, especially in the open-source world where you can't always trust your contributors.

On one hand, other than the complexity of telling if two characters are lookalikes, I don't know why Αrray (GREEK CAPITAL LETTER ALPHA) and Array (LATIN CAPITAL LETTER A) should be considered different identifiers. On the other hand, I struggle to imagine the specifics of an exploit that uses that. You'd have to work pretty hard to assemble all the pieces of a backdoor in visually-similar variable names without arousing suspicion.

I don’t think this is something we have to try hard to avoid. It is true that some characters look similar, particularly in some fonts, but this isn’t new:

  let a1 = 42
  let al = 12
  let b = al + a1

There is a fundamental difference between similar characters and characters that are meant to be visually identical. People judge the quality of a font by its Unicode support, and that means that only "low-quality" fonts would render, say, LATIN CAPITAL LETTER T and GREEK CAPITAL LETTER TAU differently.

If there were real code that was maliciously shadowing to try to cause confusion, then you have a more serious problem on your hands than someone accidentally misunderstanding which one to use.

I'm not sure I understand. If the "more serious problem" you're talking about is that your popular project is a valuable target to subvert, then there is no question that being backdoored would be more serious than people not reading your code right. I don't see how it pushes the problem out of scope, though.

As a security guy, I take my role of thinking about how anything can be abused very seriously. Backdoored open source projects turn up every now and then.

This code is backdoored. I challenge you to spot the bug:

func shellEscape(_ args: [String]) -> [String]?   // implementation elided
func isWhitelisted(_ tool: String) -> Bool        // implementation elided

func execute(externalTool: String, parameters: [String]) {
    if isWhitelisted(externalTool), let pаrameters = shellEscape(parameters) {
        print("Running tool \(pаrameters[0])")
        system(parameters.joined(separator: " "))
    }
}

All I’m saying is that we shouldn’t complicate the design to solve this problem (IMO). If it falls out of the solution somehow (e.g. just disallow invisible characters) then that’s great of course!

How did you identify the bug in the snippet from above? Is it practical enough that you would, for instance, recommend that the server group do that test on every PR that they receive going forward?

I think that it's hard to build something meaningful without making it look suspicious. It's already kind of fishy that my shellEscape function returns an Optional, and people will eventually figure out that the parameters are not, in fact, shell-escaped. Still, I feel that it should be recognized that security is more than buffer overflows and integer overflows, and if there ever is an underhanded Swift code contest, that'll be my entry.
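One cheap check a reviewer or CI job could run against code like the snippet above is to flag identifiers that mix ASCII letters with non-ASCII letters, since the shadowing `pаrameters` there appears to pair Latin letters with CYRILLIC SMALL LETTER A (U+0430). This is a crude heuristic, not Unicode's full confusable-detection algorithm from UTS #39; it would miss, for example, an identifier spelled entirely in Cyrillic letters that happen to look Latin. `mixesASCIIAndNonASCIILetters` is a hypothetical name.

```swift
/// Crude heuristic: true if the identifier contains both ASCII letters and
/// non-ASCII letters, as in a Latin word with one Cyrillic letter swapped in.
func mixesASCIIAndNonASCIILetters(_ identifier: String) -> Bool {
    var sawASCII = false
    var sawNonASCII = false
    for scalar in identifier.unicodeScalars where scalar.properties.isAlphabetic {
        if scalar.isASCII {
            sawASCII = true
        } else {
            sawNonASCII = true
        }
    }
    return sawASCII && sawNonASCII
}

let honest = "parameters"
let spoofed = "p\u{0430}rameters"       // U+0430 CYRILLIC SMALL LETTER A
let greek = "\u{03B1}\u{03B2}\u{03B3}"  // αβγ: non-ASCII only, not flagged
```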

Félix

···

On Oct 2, 2017, at 9:40 PM, Chris Lattner <clattner@nondot.org> wrote:

On Oct 2, 2017, at 1:13 AM, Félix Cloutier via swift-evolution <swift-evolution@swift.org> wrote:

David_Sweeris · October 3, 2017, 2:04am

The set notation operators should be identifiers, then? Because the impression I got from the Set Algebra proposal a few months ago is that there are a lot of people who’ve never even seen those operators, let alone memorized them.

- Dave Sweeris

···

On Oct 2, 2017, at 5:45 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

On Mon, Oct 2, 2017 at 19:28 Ethan Tira-Thompson via swift-evolution <swift-evolution@swift.org> wrote:
I’m all for fixing pressing issues requested by Xiaodi, but beyond that I request we give a little more thought to the long term direction.

My 2¢ is I’ve been convinced that very few characters are “obviously” either an operator or identifier across all contexts where they might be used. That would relegate the vast majority of thousands of ambiguous characters to a committee deciding a single global usage. But that is both a huge time sink and fundamentally flawed in approach due to the contextual dependency of who is using them.

For example, if a developer finds a set of symbols which perfectly denote some niche concept, do you really expect the developer to submit a proposal and wait months/years to get the characters classified and then a new compiler version to be distributed, all so that developer can adopt his/her own notation?

The Unicode Consortium already has a document describing which Unicode characters are suitable identifiers in programming languages, with guidance as to how to customize that list around the edges. This is already adopted by other programming languages. So, with little design effort, that task is not only doable but largely done.

As to operators, again, I am of the strong opinion that making it possible for developers to adopt any preferred notation for any purpose (a) is fundamentally incompatible with the division between operators and identifiers, as I believe you’re saying here; and (b) should be a non-goal from the outset. The only task, so far as I can tell, left to do is to identify what pragmatic set of (mostly mathematical) symbols are used as operators in the wider world and are likely to be already used in Swift code or part of common use cases where an operator is clearly superior to alternative spellings. In my view, the set of valid operator characters not only shouldn’t require parsing or import directives, but should be small enough to be knowable by memory.

David_Sweeris · October 3, 2017, 4:11am

I’m all for fixing pressing issues requested by Xiaodi, but beyond that I request we give a little more thought to the long term direction.

My 2¢ is I’ve been convinced that very few characters are “obviously” either an operator or identifier across all contexts where they might be used. That would relegate the vast majority of thousands of ambiguous characters to a committee deciding a single global usage. But that is both a huge time sink and fundamentally flawed in approach due to the contextual dependency of who is using them.

For example, if a developer finds a set of symbols which perfectly denote some niche concept, do you really expect the developer to submit a proposal and wait months/years to get the characters classified and then a new compiler version to be distributed, all so that developer can adopt his/her own notation?

The Unicode Consortium already has a document describing which Unicode characters are suitable identifiers in programming languages, with guidance as to how to customize that list around the edges. This is already adopted by other programming languages. So, with little design effort, that task is not only doable but largely done.

As to operators, again, I am of the strong opinion that making it possible for developers to adopt any preferred notation for any purpose (a) is fundamentally incompatible with the division between operators and identifiers, as I believe you’re saying here; and (b) should be a non-goal from the outset. The only task, so far as I can tell, left to do is to identify what pragmatic set of (mostly mathematical) symbols are used as operators in the wider world and are likely to be already used in Swift code or part of common use cases where an operator is clearly superior to alternative spellings. In my view, the set of valid operator characters not only shouldn’t require parsing or import directives, but should be small enough to be knowable by memory.

The set notation operators should be identifiers, then?

Set notation operators aren't valid identifier characters; to be clear, the alternative to being a valid operator character would be simply not listing that character among valid operator or identifier characters.

Because the impression I got from the Set Algebra proposal a few months ago is that there are a lot of people who’ve never even seen those operators, let alone memorized them.

That's not the impression I got; the argument was that these symbols are hard to type and _not more recognizable than the English text_, which is certainly a plausible argument and the appropriate bar for deciding on a standard library API name.

MHO is that the bar for a potentially valid operator character _for potential use in third-party APIs_ needn't be so high that we demand the character to be more recognizable to most people than alternative notations. Instead, we can probably justify including a character if it is (a) plausibly useful for some relatively common Swift use case and (b) at least somewhat recognizable for many people. Since set algebra has a well-accepted mathematical notation that's taught (afaict) at the _high school_ level if not earlier, and since set algebra functions are a part of the standard library, that surely meets those bars of usefulness and recognizability.
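To illustrate the third-party case described here, this is a sketch of how a library could layer the usual set symbols over `SetAlgebra`'s existing named methods. The operator declarations and precedence choices are assumptions made for this example, and the standard library does not ship them. ∪ (U+222A), ∩ (U+2229) and ⊆ (U+2286) all fall in the U+2190–U+23FF range that, as far as I can tell, the current grammar accepts as operator characters.

```swift
// Hypothetical operator declarations; precedence follows the Boolean-algebra
// analogy (intersection like multiplication, union like addition).
infix operator ∪: AdditionPrecedence
infix operator ∩: MultiplicationPrecedence
infix operator ⊆: ComparisonPrecedence

// Thin wrappers over the standard library's SetAlgebra requirements.
func ∪ <S: SetAlgebra>(lhs: S, rhs: S) -> S { lhs.union(rhs) }
func ∩ <S: SetAlgebra>(lhs: S, rhs: S) -> S { lhs.intersection(rhs) }
func ⊆ <S: SetAlgebra>(lhs: S, rhs: S) -> Bool { lhs.isSubset(of: rhs) }

let evens: Set = [0, 2, 4, 6]
let smalls: Set = [0, 1, 2, 3]

let both = evens ∩ smalls     // {0, 2}
let either = evens ∪ smalls   // {0, 1, 2, 3, 4, 6}
let isSub = both ⊆ evens      // true
```

The space between each operator name and its generic parameter clause matters: without it, `∪<` would lex as a single operator token.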

Maybe they've started teaching it earlier than when I went through school... I don't think I learned it until Discrete Math, which IIRC was a 2nd or 3rd year course at my college and only required for Math, CS, and maybe EE majors. Anyway, WRT a), if Swift achieves its "take over the world" goal, all use cases will be Swift use cases. WRT b), "many" as in the numerical quantity or "many" as in the percentage? There are probably millions of people who recognize calculus's operators, but there are 7.5 billion people in the world.

Keep in mind that Swift already goes far above and beyond in terms of operators

Yep, that's a large part of why I'm such a Swift fan :-D

in that: (a) it allows overloading of almost all standard operators; (b) it permits the definition of effectively an infinite number of custom operators using characters found in standard operators; (c) it permits the definition of custom precedences for custom operators; and (d) it additionally permits the use of a wide number of Unicode characters for custom operators. Most systems programming languages don't even allow (a), let alone (b) or (c). Even dramatically curtailing (d) leaves Swift with an unusually expansive support for custom operators.
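To make points (b) and (c) concrete, here is a minimal sketch of a custom operator with its own precedence group. The `**` operator and the `ExponentiationPrecedence` group are hypothetical names chosen for illustration; neither is part of the standard library.

```swift
// Hypothetical example: a right-associative exponentiation operator
// with its own precedence group (points (b) and (c) above).
precedencegroup ExponentiationPrecedence {
    associativity: right
    higherThan: MultiplicationPrecedence
}

infix operator ** : ExponentiationPrecedence

/// Integer power by repeated multiplication; traps on overflow like `*`.
func ** (base: Int, exponent: Int) -> Int {
    precondition(exponent >= 0, "negative exponents are not supported")
    var result = 1
    for _ in 0..<exponent {
        result *= base
    }
    return result
}

let a = 2 ** 3 ** 2   // parsed as 2 ** (3 ** 2)
let b = 2 * 3 ** 2    // parsed as 2 * (3 ** 2)
print(a, b)           // 512 18
```

Declaring the group `higherThan: MultiplicationPrecedence` is what lets `2 * 3 ** 2` parse without parentheses.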

Yes, but many of those custom operators won't have a clear meaning because operators are rarely limited to pre-existing symbols like "++++++++" (which doesn't mean anything at all AFAIK), so operators that are widely known within some field probably won't be widely known to the general public, which, IIUC, seems to be your standard for inclusion(?). Please let me know if that's not your position... I hate being misunderstood probably more than the next person, and I wouldn't want to be guilty of that myself.

What it does conclusively foreclose is something which ought to be stated firmly as a non-goal, which is the typesetting of arbitrary mathematical equations as valid Swift code.

I'm not arguing for adding arbitrary typesetting (not now anyway... maybe in a decade or so when more important things have been dealt with). What I am arguing for is the ability to treat operators from <pick your field> as operators within Swift. As much as possible, anyway.

Quite simply, Swift is not math; simple addition doesn't even behave as it does in grade-school arithmetic,

Perhaps not for the built-in numeric types, but how do you know somebody won't create a type which does behave that way?

so there is no sense in attempting to shoehorn calculus into the language.

(I'm assuming you mean calculus's syntax, not calculus itself, right?) What's the point of having unicode support if half the symbols will get rejected because they aren't well-known enough? Sometimes only having token support for something can be worse than no support at all from the PoV of someone trying to do something that relies on it.

In any case, I only meant to point out a use-case for lookalikes, not spark a debate about whether we should support more than a handful of operators... shall we consider the idea withdrawn?

- Dave Sweeris

···

On Oct 2, 2017, at 7:57 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Mon, Oct 2, 2017 at 9:04 PM, David Sweeris <davesweeris@mac.com> wrote:
On Oct 2, 2017, at 5:45 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

On Mon, Oct 2, 2017 at 19:28 Ethan Tira-Thompson via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

xwu · October 3, 2017, 2:57am

I’m all for fixing pressing issues requested by Xiaodi, but beyond that I
request we give a little more thought to the long term direction.

My 2¢ is I’ve been convinced that very few characters are “obviously”
either an operator or an identifier across all contexts where they might be
used. That relegates the vast majority of thousands of ambiguous
characters to a committee that must decide a single global usage. But that
is both a huge time sink and fundamentally flawed in approach due to the
contextual dependency of who is using them.

For example, if a developer finds a set of symbols which perfectly denote
some niche concept, do you really expect the developer to submit a proposal
and wait months/years to get the characters classified and then a new
compiler version to be distributed, all so that developer can adopt his/her
own notation?

The Unicode Consortium already has a document describing which Unicode
characters are suitable identifiers in programming languages, with guidance
as to how to customize that list around the edges. This is already adopted
by other programming languages. So, with little design effort, that task is
not only doable but largely done.
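The Unicode document referred to here is UAX #31 (Unicode Identifier and Pattern Syntax). As a rough sketch, assuming Swift 5's `Unicode.Scalar.Properties`, its default identifier rule can be checked as below. The function name is made up, and Swift's actual identifier grammar is not identical to this rule.

```swift
/// Sketch of the UAX #31 default identifier rule: one XID_Start scalar
/// (or "_", a common language-specific addition) followed by any number
/// of XID_Continue scalars.
func isDefaultUnicodeIdentifier(_ text: String) -> Bool {
    let scalars = Array(text.unicodeScalars)
    guard let first = scalars.first else { return false }
    guard first == "_" || first.properties.isXIDStart else { return false }
    return scalars.dropFirst().allSatisfy { $0.properties.isXIDContinue }
}

print(isDefaultUnicodeIdentifier("Array"))        // true
print(isDefaultUnicodeIdentifier("\u{391}rray"))  // true: GREEK CAPITAL LETTER ALPHA is a letter
print(isDefaultUnicodeIdentifier("a\u{2229}b"))   // false: U+2229 INTERSECTION is a math symbol
print(isDefaultUnicodeIdentifier("1st"))          // false: digits cannot start an identifier
```

Note that the Greek-alpha spelling passes, so this rule on its own says nothing about lookalikes.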

As to operators, again, I am of the strong opinion that making it possible
for developers to adopt any preferred notation for any purpose (a) is
fundamentally incompatible with the division between operators and
identifiers, as I believe you’re saying here; and (b) should be a non-goal
from the outset. The only task, so far as I can tell, left to do is to
identify what pragmatic set of (mostly mathematical) symbols are used as
operators in the wider world and are likely to be already used in Swift
code or part of common use cases where an operator is clearly superior to
alternative spellings. In my view, the set of valid operator characters not
only shouldn’t require parsing or import directives, but should be small
enough to be knowable by memory.

The set notation operators should be identifiers, then?

Set notation operators aren't valid identifier characters; to be clear, the
alternative to being a valid operator character would be simply not listing
that character among valid operator or identifier characters.

Because the impression I got from the Set Algebra proposal a few months
ago is that there are a lot of people who’ve never even seen those
operators, let alone memorized them.

That's not the impression I got; the argument was that these symbols are
hard to type and _not more recognizable than the English text_, which is
certainly a plausible argument and the appropriate bar for deciding on a
standard library API name.

MHO is that the bar for a potentially valid operator character _for
potential use in third-party APIs_ needn't be so high that we demand the
character to be more recognizable to most people than alternative
notations. Instead, we can probably justify including a character if it is
(a) plausibly useful for some relatively common Swift use case and (b) at
least somewhat recognizable for many people. Since set algebra has a
well-accepted mathematical notation that's taught (afaict) at the _high
school_ level if not earlier, and since set algebra functions are a part of
the standard library, that surely meets those bars of usefulness and
recognizability.

Keep in mind that Swift already goes far above and beyond in terms of
operators, in that: (a) it allows overloading of almost all standard
operators; (b) it permits the definition of effectively an infinite number
of custom operators using characters found in standard operators; (c) it
permits the definition of custom precedences for custom operators; and (d)
it additionally permits the use of a wide number of Unicode characters for
custom operators. Most systems programming languages don't even allow (a),
let alone (b) or (c). Even dramatically curtailing (d) leaves Swift with an
unusually expansive support for custom operators. What it does conclusively
foreclose is something which ought to be stated firmly as a non-goal, which
is the typesetting of arbitrary mathematical equations as valid Swift code.
Quite simply, Swift is not math; simple addition doesn't even behave as it
does in grade-school arithmetic, so there is no sense in attempting to
shoehorn calculus into the language.
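A quick illustration of the claim about addition, using only standard-library numeric types (the values are chosen for illustration):

```swift
// Floating-point addition rounds, so familiar identities can fail.
let sum = 0.1 + 0.2
print(sum == 0.3)   // false

// Fixed-width integers overflow: `+` traps, while `&+` wraps around.
let wrapped = Int8.max &+ 1
print(wrapped)      // -128

// The overflow can also be observed without trapping.
let (_, overflow) = Int.max.addingReportingOverflow(1)
print(overflow)     // true
```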

···

On Mon, Oct 2, 2017 at 9:04 PM, David Sweeris <davesweeris@mac.com> wrote:

On Oct 2, 2017, at 5:45 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:
On Mon, Oct 2, 2017 at 19:28 Ethan Tira-Thompson via swift-evolution <swift-evolution@swift.org> wrote:

Chris_Lattner · October 3, 2017, 5:06am

Keep in mind that Swift already goes far above and beyond in terms of operators

Yep, that's a large part of why I'm such a Swift fan :-D

Fortunately, no one is seriously proposing a major curtailing of the capabilities here, we’re just trying to rationalize the operator set, which is a bit of a mess at present.

in that: (a) it allows overloading of almost all standard operators; (b) it permits the definition of effectively an infinite number of custom operators using characters found in standard operators; (c) it permits the definition of custom precedences for custom operators; and (d) it additionally permits the use of a wide number of Unicode characters for custom operators. Most systems programming languages don't even allow (a), let alone (b) or (c). Even dramatically curtailing (d) leaves Swift with an unusually expansive support for custom operators.

Yes, but many of those custom operators won't have a clear meaning because operators are rarely limited to pre-existing symbols like "++++++++" (which doesn't mean anything at all AFAIK), so operators that are widely known within some field probably won't be widely known to the general public, which, IIUC, seems to be your standard for inclusion(?). Please let me know if that's not your position... I hate being misunderstood probably more than the next person, and I wouldn't want to be guilty of that myself.

The approach to operator handling in Swift is very intentional. IMO, it is well known that:

1) Operators can make code significantly easier to understand by reducing noise from complex expressions: writing x.matmul(y) is insane <https://www.python.org/dev/peps/pep-0465/> if you’re doing a lot of matrix multiplies.
2) Operators can be completely opaque to someone who doesn’t know them, and sometimes named functions are more clear.
3) Named functions can also sometimes be completely opaque if you don't know them, e.g. "let x = cholesky(y)"
4) Languages with fixed operator sets that also allow overloading (e.g. C++) end up with those operators being abused.
5) Some code can only be written and maintained by domain experts, and those experts often know the operators.
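To illustrate point 1, here is a hypothetical 2×2 matrix type that offers both spellings. `Matrix2` and its `matmul` method are invented for this sketch and do not come from any library.

```swift
/// Hypothetical 2×2 matrix, laid out as [[a, b], [c, d]].
struct Matrix2: Equatable {
    var a, b, c, d: Double

    /// Named-function spelling of matrix multiplication.
    func matmul(_ o: Matrix2) -> Matrix2 {
        Matrix2(a: a * o.a + b * o.c, b: a * o.b + b * o.d,
                c: c * o.a + d * o.c, d: c * o.b + d * o.d)
    }

    /// Operator spelling; simply forwards to `matmul`.
    static func * (lhs: Matrix2, rhs: Matrix2) -> Matrix2 {
        lhs.matmul(rhs)
    }
}

let x = Matrix2(a: 1, b: 2, c: 3, d: 4)
let y = Matrix2(a: 0, b: 1, c: 1, d: 0)
let named = x.matmul(y).matmul(x)
let symbolic = x * y * x     // same computation, less noise
print(named == symbolic)     // true
```

The difference is small for one product but grows quickly in longer expressions, which is the situation the linked PEP describes.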

Swift’s approach is basically to say to users: “ok we allow overloaded operators, but at least if you encounter some operation that you don’t know… you know that you don’t know it”. If you encounter “if ¬x {” or “a ∩ b” in some source code, at least you can command-click, jump to the definition and read what it does: you aren’t misled into thinking that the expression is some familiar thing, but find out later it was overloaded to do something crazy (bitshifts for i/o? really??? :).

Set algebra is an illustrative example, because it is both used by people who are experts and people who are not. As far as policies go, I think it makes sense for Swift libraries to define operator-like things as named functions (e.g. “intersection”) and also define operators (“∩”) which can optionally be used in source bases that want them for convenience. The compiler and language cannot know whether a code base is written and maintained by experts who know the symbols and who value their clarity (over the difficulty typing and recognizing them), and this approach allows maintainers of the codebase to pick their own policies.
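A minimal sketch of that policy, assuming a codebase opts in by declaring its own operators. The `∩` and `∪` declarations and their precedences are illustrative choices, not standard library API; `intersection(_:)` and `union(_:)` are the existing `SetAlgebra` methods they forward to.

```swift
// Opt-in convenience operators layered on top of the named methods.
infix operator ∩ : MultiplicationPrecedence
infix operator ∪ : AdditionPrecedence

func ∩ <S: SetAlgebra>(lhs: S, rhs: S) -> S { lhs.intersection(rhs) }
func ∪ <S: SetAlgebra>(lhs: S, rhs: S) -> S { lhs.union(rhs) }

let evens: Set = [0, 2, 4, 6]
let small: Set = [0, 1, 2, 3]
print(evens.intersection(small) == evens ∩ small)  // true
print((evens ∪ small).sorted())                     // [0, 1, 2, 3, 4, 6]
```

The space before the generic parameter list matters: without it, `∩<` would be lexed as a single operator.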

I do think that Ethan’s suggestion upthread is interesting; it suggests considering something like:
import matrixlib (operators: [ᵀ,·,⊗])

Three concerns I see:
- Requiring them today would be a source incompatibility with Swift 4
- Multiple modules can define operators, so it is unclear whether this refers to the operator decl or to implementations of operators.
- Imports are per-module, not per-source-file, so this couldn’t be used to “user-partition” the identifier and operator space. It could be a way to make it clear that the user is opting into these explicitly.

-Chris

···

On Oct 2, 2017, at 9:12 PM, David Sweeris via swift-evolution <swift-evolution@swift.org> wrote:

taylorswift · October 3, 2017, 5:21am

I’m 19 and for what it’s worth, set notation is “taught” in 9th grade but
no one really “learns” it until they get to discrete structures in college.
There’s a ton of random things that get introduced in high school/middle
school that no one ever retains. Believe it or not they teach set closure
in 6th grade, at least in my state.

It’s still my opinion that ⊆, ⊇, ∪, and friends make for obfuscated code
and I consider unicode operators to be one of the “toy” features of Swift.

···

On Mon, Oct 2, 2017 at 11:12 PM, David Sweeris via swift-evolution <swift-evolution@swift.org> wrote:

Maybe they've started teaching it earlier than when I went through
school... I don't think I learned it until Discrete Math, which IIRC was a
2nd or 3rd year course at my college and only required for Math, CS, and
maybe EE majors. Anyway, WRT a), if Swift achieves its "take over the
world" goal, *all* use cases will be Swift use cases. WRT b), "many" as
in the numerical quantity or "many" as in the percentage? There are
probably millions of people who recognize calculus's operators, but there
are 7.5 *billion* people in the world.

xwu · October 3, 2017, 4:20am

I'm not arguing for adding arbitrary *typesetting* (not now anyway...
maybe in a decade or so when more important things have been dealt with).
What I *am* arguing for is the ability to treat operators from <pick your
field> as operators within Swift. As much as possible, anyway.

Yes, and to be clear, I am arguing that this should be an explicit non-goal
at the moment for arbitrary "pick your fields"; instead, work for Swift 5
should address some pragmatic, fixed, and limited set of plausible use
cases for Swift.

···

On Mon, Oct 2, 2017 at 23:11 David Sweeris <davesweeris@mac.com> wrote:

On Oct 2, 2017, at 7:57 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Mon, Oct 2, 2017 at 9:04 PM, David Sweeris <davesweeris@mac.com> wrote:

On Oct 2, 2017, at 5:45 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:
On Mon, Oct 2, 2017 at 19:28 Ethan Tira-Thompson via swift-evolution <swift-evolution@swift.org> wrote:

davedelong · October 3, 2017, 5:00pm

IMO, trying to restrict allowed operator characters based on their visual similarity to other characters is folly. The unicode representation of a character is an independent thing from its visual representation.

Because, ideally, I’d love to be able to do:

infix operator and: LogicalConjunctionPrecedence // or whatever the precedence is called
func and(lhs: Bool, rhs: Bool) -> Bool { return lhs && rhs }

let truthyValue = true and false

That would make teaching simple predicate calculus much simpler. :)
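For what it's worth, `and` can't be declared as an operator today, because Swift operators must be spelled with operator characters. A hedged sketch of the closest current equivalent uses U+2227 LOGICAL AND, which falls within Swift's operator-character ranges, as a stand-in; the `@autoclosure` parameter keeps `&&`-style short-circuiting.

```swift
infix operator ∧ : LogicalConjunctionPrecedence

/// Logical AND spelled "∧"; the right operand is only evaluated when
/// `lhs` is true, mirroring `&&`.
func ∧ (lhs: Bool, rhs: @autoclosure () throws -> Bool) rethrows -> Bool {
    guard lhs else { return false }
    return try rhs()
}

let truthyValue = true ∧ false
print(truthyValue)  // false
```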

Dave

···

On Oct 3, 2017, at 10:43 AM, Félix Cloutier via swift-evolution <swift-evolution@swift.org> wrote:

On Oct 2, 2017, at 21:40, Chris Lattner <clattner@nondot.org> wrote:

On Oct 2, 2017, at 1:13 AM, Félix Cloutier via swift-evolution <swift-evolution@swift.org> wrote:

If you tried hard enough, you could probably create a variable that looks like it's shadowing one from an outer scope while it actually isn't, and use the two to confuse readers. This could trick people into thinking that some dangerous/backdoor code is actually good and safe, especially in the open-source world where you can't always trust your contributors.

On one hand, other than the complexity of telling if two characters are lookalikes, I don't know why Αrray (GREEK CAPITAL LETTER ALPHA) and Array (LATIN CAPITAL LETTER A) should be considered different identifiers. On the other hand, I struggle to imagine the specifics of an exploit that uses that. You'd have to work pretty hard to assemble all the pieces of a backdoor in visually-similar variable names without arousing suspicion.

I don’t think this is something we have to try hard to avoid. It is true that some characters look similar, particularly in some fonts, but this isn’t new:

  let a1 = 42
  let al = 12
  let b = al + a1

There is a fundamental difference between similar characters and characters that are meant to be visually identical. People judge the quality of a font by its Unicode support, and that means that only "low-quality" fonts would render, say, LATIN CAPITAL LETTER T and GREEK CAPITAL LETTER TAU differently.
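A small sketch of the distinction being drawn: the two letters below can render identically, but they are different scalars and are not canonically equivalent, so even Swift's `String` comparison treats them as different.

```swift
let latinT = "\u{54}"     // LATIN CAPITAL LETTER T
let greekTau = "\u{3A4}"  // GREEK CAPITAL LETTER TAU
print(latinT == greekTau) // false

// Print the code point of each scalar.
for text in [latinT, greekTau] {
    for scalar in text.unicodeScalars {
        print("U+" + String(scalar.value, radix: 16, uppercase: true))
    }
}
// U+54
// U+3A4
```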

If there were real code that was maliciously shadowing to try to cause confusion, then you have a more serious problem on your hands than someone accidentally misunderstanding which one to use.

I'm not sure I understand. If the "more serious problem" you're talking about is that your popular project is a valuable target to subvert, then there is no question that being backdoored would be more serious than people not reading your code right. I don't see how it pushes the problem out of scope, though.

As a security guy, I take my role of thinking about how anything can be abused very seriously. Backdoored open source projects turn up every now and then.

This code is backdoored. I challenge you to spot the bug:

func shellEscape(_ args: [String]) -> [String]?
func isWhitelisted(_ tool: String) -> Bool

func execute(externalTool: String, parameters: [String]) {
    if isWhitelisted(externalTool), let pаrameters = shellEscape(parameters) {
        print("Running tool \(pаrameters[0])")
        system(parameters.joined(separator: " "))
    }
}

All I’m saying is that we shouldn’t complicate the design to solve this problem (IMO). If it falls out of the solution somehow (e.g. just disallow invisible characters) then that’s great of course!
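If invisible characters were disallowed, one plausible criterion is the Unicode general category Cf (format characters). This is an assumption for illustration, not a description of what Swift does, and the helper name is hypothetical. Cf does not cover every invisible character, and some Cf scalars (such as U+200D ZERO WIDTH JOINER in emoji sequences) are legitimate in string literals.

```swift
/// Hypothetical helper: returns the scalars in `source` whose Unicode
/// general category is Cf (format), e.g. U+200B ZERO WIDTH SPACE.
func invisibleScalars(in source: String) -> [Unicode.Scalar] {
    var found: [Unicode.Scalar] = []
    for scalar in source.unicodeScalars
    where scalar.properties.generalCategory == .format {
        found.append(scalar)
    }
    return found
}

// An identifier with a zero-width space glued to its end.
let line = "let total\u{200B} = 1"
for scalar in invisibleScalars(in: line) {
    print("U+" + String(scalar.value, radix: 16, uppercase: true))  // U+200B
}
```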

How did you identify the bug in the snippet from above? Is it practical enough that you would, for instance, recommend that the server group do that test on every PR that they receive going forward?
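One cheap heuristic a reviewer could run on a diff is to flag every identifier-like word that contains a non-ASCII scalar. The sketch below is hypothetical (not an existing tool) and deliberately crude: it does not parse Swift, so it would also flag legitimate non-ASCII names and text inside string literals and comments.

```swift
/// Hypothetical review helper: returns each distinct "word" (a run of
/// letters, digits, or underscores) containing a non-ASCII scalar.
func nonASCIIWords(in source: String) -> [String] {
    var found: [String] = []
    var current = ""
    // Records the word collected so far if it is non-ASCII and not yet seen.
    func finishWord() {
        if current.unicodeScalars.contains(where: { !$0.isASCII }),
           !found.contains(current) {
            found.append(current)
        }
        current = ""
    }
    for character in source {
        if character.isLetter || character.isNumber || character == "_" {
            current.append(character)
        } else {
            finishWord()
        }
    }
    finishWord()
    return found
}

// "\u{430}" is CYRILLIC SMALL LETTER A, a lookalike of Latin "a".
let suspect = "let p\u{430}rameters = shellEscape(parameters)"
print(nonASCIIWords(in: suspect))  // one flagged word, containing U+0430
```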

I think that it's hard to build something meaningful without making it look suspicious. It's already kind of fishy that my shellEscape function returns an Optional, and people will eventually figure out that the parameters are not, in fact, shell-escaped. Still, I feel that it should be recognized that security is more than buffer overflows and integer overflows, and if there ever is an underhanded Swift code contest, that'll be my entry.

Félix

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution