Prohibit invisible characters in identifier names

Perhaps stupid but: why was Swift designed to accept most Unicode
characters in identifier names? Wouldn’t it be simpler to go back to
a model where only standard ASCII characters are accepted in
identifier names?

I assume it has something to do with the fact that 94.6% of the
world's population speak a first language which is not English. That
outweighs the inconvenience for Anglo developers, IMHO.

Yes, but the SDKs (frameworks, system libraries) are all in English,
including the Swift standard library. I remember a few languages attempting
localized versions so that kids could learn more easily; they failed terribly
because you learned something that had very limited use.

I support Charlie's opinion. For me (as a non-native English speaker), non-ASCII characters in identifiers made no sense, even when I started learning programming as a child. Expressions composed from identifiers written in my native language are nowhere near correct sentences.

What's more, all the other parts of the language are still in English: for, while, guard, let, var, func, etc.

When it comes to maintaining code, using localized identifier names is a
bad practice, since anyone from outside that country who comes to the code
can't really work with it. I personally can't imagine having to maintain
Swift code with identifiers in Chinese, Japanese, Arabic, ...

While allowing non-ASCII characters in identifiers (a feature that was
held up high, with Apple giving emoji examples) may seem cool, I can only
see it being helpful in the future, given a different keyboard layout (as
someone pointed out here a while ago), for introducing one-character
operators that would otherwise be impossible. But if someone came to me
with code where a variable was a dog emoji, they'd get fired on the spot.

Yes, but I don't believe Apple will accept limiting the character set for identifiers to ASCII *after* those presentations with the dog emoji ;-)

I'd personally vote to keep the zero-width-joiner characters forbidden
in code outside of string literals (where they may make sense).
I agree that this could easily be handled by linters, but I think this
particular set of characters should be restricted by the language
itself, since it's something easily missed during code review and,
given the upcoming package manager, it could lead to hard-to-find
malware being distributed among developers who include these packages
in their projects - since you usually do not run a linter on
third-party code.

I also think the main problem that could be caused by such tricks with zero-width joiners or right-to-left markers is injecting malicious code into sources on GitHub, into the package manager, *or* even just into a code snippet on a web page (so you copy-paste it into your source). Right now I don't know an exact method for implementing such malware, but I believe this vulnerability could be exploited some day. A small illustration of how invisible the difference is follows below.
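Here is a minimal sketch of what I mean (the names are made up, and string values stand in for identifier tokens, which, as comes up later in this thread, the compiler also compares as raw scalar sequences):

// Hypothetical example. U+200D is ZERO WIDTH JOINER; the two values render
// identically in most fonts, yet they are different strings - just as two
// identifiers differing only by an invisible character are different names.
let plain   = "removeBackups"
let spoofed = "remove\u{200D}Backups"   // contains an invisible U+200D

print(plain == spoofed)             // false
print(plain.unicodeScalars.count)   // 13
print(spoofed.unicodeScalars.count) // 14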

Btw, regarding the package manager: will we have any protection from typosquatting? (See incolumitas.com – "Typosquatting programming language package managers".)

As for the confusables - this depends a lot on the rendering and what
font you have set. I've tried 𝛎 → v with current Xcode and it looks
really different, especially when you use a fixed-width font, which usually
doesn't have the non-ASCII characters; those are then rendered using a
different font, making the distinction easy to spot.

In Russian we have these characters:
у к е г х а р о с ь
which look similar to the English:
y k e r x a p o c b

So you most likely can't tell `рос` from `poc`, `хае` from `xae`, etc.
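As far as I can tell, both of these declarations compile today and look the same in most editor fonts (hypothetical names, the first spelled with Cyrillic letters and the second with Latin ones):

// Two unrelated constants whose names are visually indistinguishable.
let рос = "Cyrillic: U+0440 U+043E U+0441"
let poc = "Latin:    U+0070 U+006F U+0063"

print(рос)   // which of the two did you just read?
print(poc)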

I don't think the compiler should somehow decide whether one non-English letter looks like another, English, letter. But I don't see any other way to protect myself than using linters/checking tools on third-party code as well.

···

On 21.06.2016 7:37, Charlie Monroe via swift-evolution wrote:

On Jun 21, 2016, at 2:23 AM, Brent Royal-Gordon via swift-evolution <swift-evolution@swift.org> wrote:

Honestly, this seems to me like a concern for linters and security
auditing tools, not for the compiler. Swift identifiers are
case-sensitive; I see no reason they shouldn't be script-sensitive or
zero-width-joiner-sensitive. (Though basic Unicode normalization seems
like a good idea, since differently-normalized strings are `==`
anyway.)

-- Brent Royal-Gordon Architechies


>
>>> IIRC, some languages require zero-width joiners (though not zero-width
spaces, which are distinct) to properly encode some of their characters.
I'd be very leery of having Swift land on a model where identifiers can be
used with some languages and not others; that smacks of ethnocentrism.
>>
>> None of those languages require zero-width characters between two Latin
letters, or between a Latin letter and an Arabic numeral, or at the end of
a word. Since standard / system APIs will (barring some radical shift) use
those code points exclusively, it's justifiable to give them some special
attention.
>>
>> Although the practical implementation may need to be more limited in
scope, the general principle doesn't need to privilege Latin letters and
Arabic numerals. If, in any context, the presence or absence of a
zero-width glyph cannot possibly be distinguished by a human reading the
text, then the compiler should also be indifferent to its presence or
absence (or, alternatively, its presence should be a compile-time error).
>
> Sure, that's obvious. Jordan was observing that the simplest way to
enforce that, banning such characters from identifiers completely, would
still interfere with some languages, and I was pointing out that just doing
enough to protect English would get most of the practical value because it
would protect every use of the system and standard library. A program
would then only become attackable in this specific way for its own
identifiers using non-Latin characters.
>
> All that said, I'm not convinced that this is worthwhile; the
identifier-similarity problem in Unicode is much broader than just
invisible characters. In fact, Swift still doesn't canonicalize
identifiers, so canonically equivalent compositions of the same glyph will
actually produce different names. So unless we're going to fix that and
then ban all sorts of things that are known to generally be represented
with a confusable glyph in a typical fixed-width font (like the
mathematical alphabets), this is just a problem that will always exist in
some form.

Any discussion about this ought to start from UAX #31, the Unicode
consortium's recommendations on identifiers in programming languages:

UAX #31: Unicode Identifiers and Syntax <http://unicode.org/reports/tr31/>

Section 2.3 specifically calls out the situations in which ZWJ and ZWNJ
need to be allowed. The document also describes a stability policy for
handling new Unicode versions, other confusability issues, and many of the
other problems with adopting Unicode in a programming language's syntax.

That's a fantastic document--a very edifying read. Given Swift's robust
support for Unicode in its core libraries, it's kind of surprising to me
that identifiers aren't canonicalized at compile time. From a quick first
read, faithful adoption of UAX #31 recommendations would address most if
not all of the confusability and zero-width security issues raised in this
conversation.

···

On Tue, Jun 21, 2016 at 1:16 PM, Joe Groff <jgroff@apple.com> wrote:

> On Jun 21, 2016, at 8:47 AM, John McCall via swift-evolution <swift-evolution@swift.org> wrote:
>> On Jun 20, 2016, at 7:07 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
>> On Mon, Jun 20, 2016 at 8:58 PM, John McCall via swift-evolution <swift-evolution@swift.org> wrote:
>>> On Jun 20, 2016, at 5:22 PM, Jordan Rose via swift-evolution <swift-evolution@swift.org> wrote:

-Joe

As far as I can see, forcing the programmer to write identifiers in an ASCII-only language, and requiring that identifier names be meaningful to the programmer, means that the programmer has to know a language that is written in ASCII only. Most people don't, and requiring that they first learn, for example, English in order to write programs is absurd.

Of course, if we relax the rule that identifier names have to be meaningful to the programmer, then it works out, but that is not something that I feel the Swift community should encourage.

The few symbols that are in English can be learned as what they are, symbols, without knowing their English meaning, just like &, |, <, >, *, and so on are symbols. In fact, you all learn these words as symbols that are distinct from their English connotations, since Swift is definitely not English text.

/Magnus

···

21 June 2016 16:48 Vladimir.S via swift-evolution wrote:


Doesn't Unicode have a standard for this that specifies which characters
are look-alikes?

Russ

···

On Jun 21, 2016, at 7:48 AM, Vladimir.S via swift-evolution <swift-evolution@swift.org> wrote:


+1

-Chris

···

On Jun 21, 2016, at 12:15 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:


From what I've read of UAX #31 <http://unicode.org/reports/tr31/> it does seem to address all of the invisible character issues raised in the discussion. Given their Unicode status as Default_Ignorable_Code_Points, I believe the best course of action would be to canonicalise identifiers by allowing invisible characters only where appropriate and ignoring them everywhere else, roughly as sketched below.
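As a rough sketch of the "ignore" option (a hypothetical helper, not the actual lexer; the ZWJ/ZWNJ contexts that UAX #31 §2.3 allows are omitted for brevity, and only a few of the relevant code points are listed):

// Drop default-ignorable scalars from an identifier before comparing names.
// A real implementation would consult the full Default_Ignorable_Code_Point
// property and keep ZWJ/ZWNJ in the contexts that §2.3 permits.
let ignorables: Set<UnicodeScalar> = [
    "\u{200B}", // ZERO WIDTH SPACE
    "\u{200C}", // ZERO WIDTH NON-JOINER
    "\u{200D}", // ZERO WIDTH JOINER
    "\u{2060}", // WORD JOINER
    "\u{FEFF}", // ZERO WIDTH NO-BREAK SPACE
]

func ignoringInvisibles(_ identifier: String) -> String {
    var scalars = String.UnicodeScalarView()
    for scalar in identifier.unicodeScalars where !ignorables.contains(scalar) {
        scalars.append(scalar)
    }
    return String(scalars)
}

ignoringInvisibles("remove\u{200D}Backups") == "removeBackups"   // true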

The alternative to ignoring them would be to not canonicalise identifiers and treat invisible characters as an error instead.

This doesn't address the issue of unicode confusable characters, but solving that has additional problems of its own and would probably be better addressed in a different proposal entirely.

I'd like to start writing the proposal if there is agreement that this would be the best course of action.

Sincerely,
João Pinheiro

···

On 21 Jun 2016, at 20:15, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:


As far as I can see, forcing the programmer to write identifiers in an ASCII-only language, and requiring that identifier names be meaningful to the programmer, means that the programmer has to know a language that is written in ASCII only. Most people don't, and requiring that they first learn, for example, English in order to write programs is absurd.

You can always write your identifiers using ASCII. Languages that use the Latin script, just extended with various accents, can be written without them (this is how I was taught). Any language I'm aware of can eventually be written in the Latin script (Chinese, Cyrillic, ...).

BTW, how far along with programming do you think you'd get without any knowledge of English? All libraries and SDKs use English identifiers. The documentation is in English. For someone to learn programming without actually knowing any English would require the language to have localizable identifiers. Can you imagine those? Given how much time is spent here standardizing the naming of a few methods in the standard library, how would it look in other languages?

···

On Jun 25, 2016, at 12:21 AM, Magnus Ahltorp <map@kth.se> wrote:


Yes. See earlier in the discussion about the Unicode confusables list. The
security issues that arise from confusable URLs aren't the same as those
for identifiers in Swift, and I think the short version of the previous
discussion is that prohibiting the use of Greek nu and mathematical bold
italic v, for instance, isn't necessary for security.

Thus, we're working off of a different Unicode recommendation specifically
drawn up for identifier normalization, which does not involve forbidding
confusables.

···

On Tue, Jun 28, 2016 at 14:43 Russ Bishop via swift-evolution <swift-evolution@swift.org> wrote:


Sounds great, please do. Thanks!

-Chris

···

On Jun 23, 2016, at 9:17 AM, João Pinheiro via swift-evolution <swift-evolution@swift.org> wrote:


I’m no unicode expert, but this sounds like the way to go to me.

l8r
Sean

···

On Jun 23, 2016, at 11:17 AM, João Pinheiro via swift-evolution <swift-evolution@swift.org> wrote:


I think this issue is bigger than that. As UAX #31 suggests, the most
appropriate approach is canonicalizing identifiers by NFC, with specific
treatment of ZWJ and ZWNJ by allowing them in three contexts, which will
require thought as to how to implement.

Given that there is a specifically recommended algorithm on how to handle
this issue, I'm also not sure anymore that this requires a proposal;
"process Unicode correctly" is really more of a bug fix because, given the
strict limits of what's canonicalized, there shouldn't be a user-facing
effect if we are merely proposing to prohibit glyphs from appearing in
certain contexts where they are never in fact encountered in real language.

···

On Thu, Jun 23, 2016 at 11:19 AM Sean Heber <sean@fifthace.com> wrote:


There are two different issues here, individual character normalisation and identifier canonicalisation. NFC handles character normalisation and it definitely should be part of the proposal since identifier canonicalisation doesn't make sense if the individual character representation isn't normalised first.

Swift currently doesn't normalise Unicode characters in identifiers, as can be seen in the following code example:

let Å = "Hello" // Angstrom
let Å = "Swift" // Latin Capital Letter A With Ring Above
let Å = "World" // Latin Capital Letter A + Combining Ring Above

print(Å)
print(Å)
print(Å)

According to the Unicode standard, all 3 of these characters should be normalised into the same representation (U+00C5).
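Foundation already exposes the NFC transform, so the intended behaviour is easy to demonstrate (a small sketch, assuming Foundation is available):

import Foundation

// All three spellings of Å normalise to the single scalar U+00C5 under NFC.
let angstromSign = "\u{212B}"   // ANGSTROM SIGN
let precomposed  = "\u{00C5}"   // LATIN CAPITAL LETTER A WITH RING ABOVE
let decomposed   = "A\u{030A}"  // LATIN CAPITAL LETTER A + COMBINING RING ABOVE

print(decomposed.unicodeScalars.count)                                        // 2
print(decomposed.precomposedStringWithCanonicalMapping.unicodeScalars.count)  // 1
print(angstromSign.precomposedStringWithCanonicalMapping ==
      precomposed.precomposedStringWithCanonicalMapping)                      // true

String equality is already canonical, but the compiler appears to compare identifier spellings at the code-point level, which is why the three declarations above are accepted as three different names.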

Sincerely,
João Pinheiro

···

On 23 Jun 2016, at 17:40, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:


Speaking of which, hypothetically, if we wanted to support translations of Swift itself (and the standard library), would it be better to have the compiler figure out how to make object files work across languages, or would it be better for the on-disk file to always be in the "canonical" language and have the IDE do the translation?

I'm *not* proposing we do this... Just thinking about what would need to be done and how hard it would be.

- Dave Sweeris

···

On Jun 24, 2016, at 23:13, Charlie Monroe via swift-evolution <swift-evolution@swift.org> wrote:


You can always write your identifiers using ASCII. Languages that use the Latin script, just extended with various accents, can be written without them (this is how I was taught). Any language I'm aware of can eventually be written in the Latin script (Chinese, Cyrillic, ...).

Take this small example from the Swift book:

let favoriteSnacks = [
    "Alice": "Chips",
    "Bob": "Licorice",
    "Eve": "Pretzels",
]

func buyFavoriteSnack(person: String, vendingMachine: VendingMachine) throws {
    let snackName = favoriteSnacks[person] ?? "Candy Bar"
    try vendingMachine.vend(itemNamed: snackName)
}

Imagine being forced to write all your identifiers in Chinese script, based on the Mandarin transliteration of the English pronunciation:

假设 费弗勒特斯纳克斯 = [
    "Alice": "Chips",
    "Bob": "Licorice",
    "Eve": "Pretzels",
]

函数 拜费弗勒特斯纳克(珀尔瑟恩: 字符串, 文丁默欣: 文丁默欣) 会投掷 {
    假设 斯纳克内姆 = 费弗勒特斯纳克斯[珀尔瑟恩] ?? "Candy Bar"
    尝试 文丁默欣.文德(艾德姆内姆德: 斯纳克内姆)
}

At least you can write the strings in your familiar script. Since you probably don't know Chinese, it is best illustrated by converting it to (non-accented) Pinyin.

jiashe feifuletesinakesi = [
    "Alice": "Chips",
    "Bob": "Licorice",
    "Eve": "Pretzels",
]

hanshu baifeifuletesinake(poerseen: zifuchuan, wendingmoxin: wendingmoxin) huitouzhi {
    jiashe sinakeneimu = feifuletesinakesi[poerseen] ?? "Candy Bar"
    changshi wendingmoxin.wende(aidemuneimude: sinakeneimu)
}

You would probably quickly learn words like jiashe and hanshu, but every other identifier is actually still in English, just horribly garbled.

Please excuse any errors in the transliterations, but you would probably make some errors too if you had to write your code in Chinese.

BTW, how far along with programming do you think you'd get without any knowledge of English? All libraries and SDKs use English identifiers. The documentation is in English. For someone to learn programming without actually knowing any English would require the language to have localizable identifiers. Can you imagine those? Given how much time is spent here standardizing the naming of a few methods in the standard library, how would it look in other languages?

There are actually programming resources in other languages than English. This is especially true for Chinese.

/Magnus

···

25 June 2016 06:13 Charlie Monroe <charlie@charliemonroe.net> wrote:


Speaking of which, hypothetically, if we wanted to support translations of Swift itself (and the standard library), would it be better to have the compiler figure out how to make object files work across languages, or would it be better for the on-disk file to always be in the "canonical" language and have the IDE do the translation?

Historically, these languages were 100% translated and required localized compiler support (we're talking about BASIC, Pascal) since back then IDE support was quite poor. Nowadays, on-the-fly translation by the IDE would probably work out the best.

···

On Jun 25, 2016, at 7:12 AM, David Sweeris <davesweeris@mac.com> wrote:


I've submitted a draft of the proposal in the thread "Normalize Unicode Identifiers" <http://thread.gmane.org/gmane.comp.lang.swift.evolution/25126>. Please make any comments and recommendations there.

Sincerely,
João Pinheiro

···

On 23 Jun 2016, at 18:30, Chris Lattner <clattner@apple.com> wrote:


There are two different issues here, individual character normalisation
and identifier canonicalisation. NFC handles character normalisation and it
definitely should be part of the proposal since identifier canonicalisation
doesn't make sense if the individual character representation isn't
normalised first.

I think we're using terminology differently here. What you call "character
normalization" is what I'm calling canonicalization. NFC is described in
UAX #15 as "canonical decomposition followed by canonical composition" and
I'm just using the word "canonicalization" because it's shorter. If Swift
represents each identifier in an NFC-transformed form (what I call
canonicalized), then I understand the identifier to be canonicalized. What
is the distinction you're drawing here?

···

On Thu, Jun 23, 2016 at 12:41 PM, João Pinheiro <joao@joaopinheiro.org> wrote:


There are two different issues here, individual character normalisation
and identifier canonicalisation. NFC handles character normalisation and it
definitely should be part of the proposal since identifier canonicalisation
doesn't make sense if the individual character representation isn't
normalised first.


Just re-read UAX #31. I see two different issues here too--do these match
up with what you're saying above?

* Disallowing certain glyphs in identifiers. To do so, we can implement the
recommendation to disallow all glyphs in UAX #31 Table 4, except ZWJ and
ZWNJ in the specific scenarios outlined in section 2.3.

* Internally, when comparing two identifiers A and B, compare NFC(A) and
NFC(B) without modifying or otherwise restricting the actual user-facing
code to contain only NFC-normalized strings. This would be the approach
recommended in section 1.3.
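In practice I understand the second point to mean something like this (a rough sketch using Foundation's NFC transform; the helper name is made up):

import Foundation

// Hypothetical helper: compare two identifier spellings by their NFC forms,
// code point by code point, without rewriting the source text itself.
func sameIdentifier(_ a: String, _ b: String) -> Bool {
    let nfcA = a.precomposedStringWithCanonicalMapping
    let nfcB = b.precomposedStringWithCanonicalMapping
    return nfcA.unicodeScalars.elementsEqual(nfcB.unicodeScalars)
}

sameIdentifier("\u{00C5}", "A\u{030A}")   // true  - same name after NFC
sameIdentifier("\u{00C5}", "A")           // false - genuinely different names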

···

On Thu, Jun 23, 2016 at 12:56 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:


I think we're using terminology differently here. What you call "character normalization" is what I'm calling canonicalization. NFC is described in UAX #15 as "canonical decomposition followed by canonical composition" and I'm just using the word "canonicalization" because it's shorter. If Swift represents each identifier in an NFC-transformed form (what I call canonicalized), then I understand the identifier to be canonicalized. What is the distinction you're drawing here?

There is a small difference between normalisation and canonicalisation, but it's mostly splitting hairs. They both ensure something is represented properly, but canonicalisation implies establishing a single base representation for something. Web addresses are a good example. Both http://www.apple.com and http://apple.com are valid normalised addresses, but only the former is the canonical address for the Apple website.

Just re-read UAX #31. I see two different issues here too--do these match up with what you're saying above?

* Disallowing certain glyphs in identifiers. To do so, we can implement the recommendation to disallow all glyphs in UAX #31 Table 4, except ZWJ and ZWNJ in the specific scenarios outlined in section 2.3.

* Internally, when comparing two identifiers A and B, compare NFC(A) and NFC(B) without modifying or otherwise restricting the actual user-facing code to contain only NFC-normalized strings. This would be the approach recommended in section 1.3.

Yes, that's correct. The proposal would be to normalise the encoding via NFC and then canonicalise the identifiers by ignoring invisible characters except in the scenarios described in UAX #31.
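Putting the two steps together, the intent is roughly the following (a sketch only; a hypothetical helper that lists just a few of the ignorable code points and leaves out the §2.3 contexts in which ZWJ/ZWNJ must be kept):

import Foundation

// Step 1: normalise the spelling to NFC.
// Step 2: drop default-ignorable scalars (the real set is the
//         Default_Ignorable_Code_Point property, and the §2.3 ZWJ/ZWNJ
//         contexts would be preserved rather than dropped).
let invisible: Set<UnicodeScalar> = ["\u{200B}", "\u{200C}", "\u{200D}", "\u{2060}", "\u{FEFF}"]

func canonicalSpelling(of identifier: String) -> String {
    let nfc = identifier.precomposedStringWithCanonicalMapping
    var result = String.UnicodeScalarView()
    for scalar in nfc.unicodeScalars where !invisible.contains(scalar) {
        result.append(scalar)
    }
    return String(result)
}

canonicalSpelling(of: "A\u{030A}\u{200D}") == canonicalSpelling(of: "\u{00C5}")   // true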

Sincerely,
João Pinheiro

The problem with depending on the IDE is that not everyone is using
Xcode… or even a modern IDE. There are those who use basic text
editors, and they must be considered as well.

···

On Sun, Jun 26, 2016 at 9:25 PM Charlie Monroe via swift-evolution <swift-evolution@swift.org> wrote:


--
-Saagar Jha