Prohibit invisible characters in identifier names

Sure, but if you want to have translated identifiers, there's really no other (better) option unless you want to create ABI incompatible code (given that Swift 4 has a finalized ABI) that only runs on your localized system.

···

On Jun 27, 2016, at 7:59 AM, Saagar Jha <saagarjha28@gmail.com> wrote:

The problem with depending on the IDE is that not everyone is using Xcode…or even a modern IDE. There are those that are using basic text editors, which must be considered as well.

On Sun, Jun 26, 2016 at 9:25 PM Charlie Monroe via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

> On Jun 25, 2016, at 7:12 AM, David Sweeris <davesweeris@mac.com <mailto:davesweeris@mac.com>> wrote:
>
>
>> On Jun 24, 2016, at 23:13, Charlie Monroe via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
>>
>> BTW how far along with programming do you think you'd get without the knowledge of English? All libraries, SDKs use English identifiers. The documentation is in English. For one to learn programming without actually knowing any English would require the language to have localizable identifiers. Can you imagine those? Given how much time is put here to standardize the naming of a few methods in the standard library, how would it look in other languages?
>
> Speaking of which, hypothetically, if we wanted to support translations of Swift itself (and the standard library), would it be better to have the compiler figure out how to make object files work across languages, or would it be better for the on-disk file to always be in the "canonical" language and have the IDE do the translation?

Historically, these languages were 100% translated and required localized compiler support (we're talking about BASIC, Pascal) since back then IDE support was quite poor. Nowadays, on-the-fly translation by the IDE would probably work out the best.

> I'm *not* proposing we do this... Just thinking about what would need to be done and how hard it would be.
>
> - Dave Sweeris

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution
--
-Saagar Jha

That's cool, although my preferred solution would be more closely aligned
with UAX #31: overtly disallow the glyphs in Table 4 (instead of ignoring
them) except in the specific scenarios for ZWJ and ZWNJ identified in UAX
#31, then afterwards internally represent the identifier as its
NFC-normalized string.
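A rough sketch of that two-step approach, in case it helps the discussion. Everything named here is hypothetical: `illustrativeInvisibles` is a hand-picked handful of invisible scalars and not the actual contents of Table 4, `storedForm(of:)` is an invented helper, and the context rules for ZWJ and ZWNJ are not implemented (those two scalars are simply left out of the set).

```swift
import Foundation

// Hypothetical, hand-picked sample of invisible scalars.
// This is NOT the real UAX #31 Table 4; it only illustrates the idea.
let illustrativeInvisibles: Set<Unicode.Scalar> = [
    "\u{200B}", // ZERO WIDTH SPACE
    "\u{200E}", // LEFT-TO-RIGHT MARK
    "\u{200F}", // RIGHT-TO-LEFT MARK
    "\u{202E}", // RIGHT-TO-LEFT OVERRIDE
    "\u{2060}", // WORD JOINER
    "\u{FEFF}", // ZERO WIDTH NO-BREAK SPACE
]

// Step 1: reject the identifier outright if it contains a listed scalar.
// Step 2: otherwise return its NFC form as the internal representation.
// ZWJ (U+200D) and ZWNJ (U+200C) pass through unchecked in this sketch.
func storedForm(of identifier: String) -> String? {
    for scalar in identifier.unicodeScalars {
        if illustrativeInvisibles.contains(scalar) {
            return nil
        }
    }
    // Foundation's precomposedStringWithCanonicalMapping yields NFC.
    return identifier.precomposedStringWithCanonicalMapping
}
```

With this, `storedForm(of: "var\u{200B}1")` is rejected, while `storedForm(of: "cafe\u{0301}")` is accepted and stored with a single precomposed scalar for the accented letter.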

···

On Thu, Jun 23, 2016 at 2:29 PM, João Pinheiro <joao@joaopinheiro.org> wrote:

> I think we're using terminology differently here. What you call
"character normalization" is what I'm calling canonicalization. NFC is
described in UAX #15 as "canonical decomposition followed by canonical
composition" and I'm just using the word "canonicalization" because it's
shorter. If Swift represents each identifier in an NFC-transformed form
(what I call canonicalized), then I understand the identifier to be
canonicalized. What is the distinction you're drawing here?

There is a small difference between normalisation and canonicalisation,
but it's mostly splitting hairs. They both ensure something is represented
properly, but canonicalisation implies establishing a single base
representation for something. Web addresses are a good example. Both
http://www.apple.com and http://apple.com are valid normalised addresses,
but only the former is the canonical address for the Apple website.

> Just re-read UAX #31. I see two different issues here too--do these
match up with what you're saying above?
>
> * Disallowing certain glyphs in identifiers. To do so, we can implement
the recommendation to disallow all glyphs in UAX #31 Table 4, except ZWJ
and ZWNJ in the specific scenarios outlined in section 2.3.
>
> * Internally, when comparing two identifiers A and B, compare NFC(A) and
NFC(B) without modifying or otherwise restricting the actual user-facing
code to contain only NFC-normalized strings. This would be the approach
recommended in section 1.3.

Yes, that's correct. The proposal would be to normalise the encoding via
NFC and then canonicalise the identifiers by ignoring invisible characters
except in the scenarios described in UAX #31
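To make the NFC comparison concrete, here is a minimal sketch. It assumes Foundation's `precomposedStringWithCanonicalMapping` as the NFC transform; `nfcScalars` is a helper name invented for this example, and the sample strings are mine.

```swift
import Foundation

let precomposed = "caf\u{00E9}"   // "é" as one scalar (U+00E9)
let decomposed = "cafe\u{0301}"   // "e" followed by COMBINING ACUTE ACCENT

// Hypothetical helper: the scalars of the NFC form of a string.
func nfcScalars(_ s: String) -> [Unicode.Scalar] {
    return Array(s.precomposedStringWithCanonicalMapping.unicodeScalars)
}

// The source text differs scalar by scalar...
let rawEqual = Array(precomposed.unicodeScalars) == Array(decomposed.unicodeScalars)

// ...but comparing NFC(A) with NFC(B) treats both spellings as the same name.
let nfcEqual = nfcScalars(precomposed) == nfcScalars(decomposed)

print(rawEqual, nfcEqual)
```

The comparison is done on scalar arrays on purpose: Swift's own `String` equality already follows canonical equivalence, so comparing the strings directly would hide the difference this is meant to show.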

Explicitly disallowing them was my initial idea, but I think it would end up being a confusing error for users to encounter. Ignoring the invisible characters and leaving it up to a linter to remove them is less likely to cause confusion for users.

I'll be sure to describe the alternative of explicitly prohibiting them in the proposal though.

Sincerely,
João Pinheiro

···

On 23 Jun 2016, at 20:43, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
That's cool, although my preferred solution would be more closely aligned with UAX #31: overtly disallow the glyphs in Table 4 (instead of ignoring them) except in the specific scenarios for ZWJ and ZWNJ identified in UAX #31, then afterwards internally represent the identifier as its NFC-normalized string.

It’s leaving a lot of Linux users out to dry; a better option may be a sort
of hybrid approach with a “middleman” tool that does the translation, which
people could simply add as a build step if they needed translation.

···

On Sun, Jun 26, 2016 at 11:08 PM Charlie Monroe <charlie@charliemonroe.net> wrote:

Sure, but if you want to have translated identifiers, there's really no
other (better) option unless you want to create ABI incompatible code
(given that Swift 4 has a finalized ABI) that only runs on your localized
system.

On Jun 27, 2016, at 7:59 AM, Saagar Jha <saagarjha28@gmail.com> wrote:

The problem with depending on the IDE is that not everyone is using
Xcode…or even a modern IDE. There are those that are using basic text
editors, which must be considered as well.

On Sun, Jun 26, 2016 at 9:25 PM Charlie Monroe via swift-evolution <swift-evolution@swift.org> wrote:

> On Jun 25, 2016, at 7:12 AM, David Sweeris <davesweeris@mac.com> wrote:
>
>
>> On Jun 24, 2016, at 23:13, Charlie Monroe via swift-evolution <swift-evolution@swift.org> wrote:
>>
>> BTW how far along with programming do you think you'd get without the
knowledge of English? All libraries, SDKs use English identifiers. The
documentation is in English. For one to learn programming without actually
knowing any English would require the language to have localizable
identifiers. Can you imagine those? Given how much time is put here to
standardize the naming of a few methods in the standard library, how would
it look in other languages?
>
> Speaking of which, hypothetically, if we wanted to support translations
of Swift itself (and the standard library), would it be better to have the
compiler figure out how to make object files work across languages, or
would it be better for the on-disk file to always be in the "canonical"
language and have the IDE do the translation?

Historically, these languages were 100% translated and required localized
compiler support (we're talking about BASIC, Pascal) since back then IDE
support was quite poor. Nowadays, on-the-fly translation by the IDE would
probably work out the best.

> I'm *not* proposing we do this... Just thinking about what would need
to be done and how hard it would be.
>
> - Dave Sweeris


--
-Saagar Jha

> That's cool, although my preferred solution would be more closely
aligned with UAX #31: overtly disallow the glyphs in Table 4 (instead of
ignoring them) except in the specific scenarios for ZWJ and ZWNJ identified
in UAX #31, then afterwards internally represent the identifier as its
NFC-normalized string.

Explicitly disallowing them was my initial idea, but I think it would end
up being a confusing error for users to encounter. Ignoring the invisible
characters and leaving it up to a linter to remove them is less likely to
cause confusion for users.

I'll be sure to describe the alternative of explicitly prohibiting them in
the proposal though.

I would strongly urge you to propose explicitly prohibiting them just as
UAX #31 recommends. Their reasoning is that these characters, which include
those that reverse text direction or control joining, can cause one
identifier to be maliciously changed to look like another. If you ignore
these characters instead of prohibiting them, an identifier that visually
appears as one string could in fact be a different one to the compiler.

Moreover, a compiler error can be made helpful by saying that the offending
character is potentially invisible and it can come with a fix-it to remove
the offending character. I don't think that would confuse the user at all.
It would be more confusing if invisible characters that caused one thing to
look identical to another were silently permitted.
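As a sketch of what such a diagnostic could look like: the type `InvisibleCharacterDiagnostic`, the function `diagnose(_:prohibited:)`, and the message wording below are all invented for illustration and are not how the Swift compiler reports anything today.

```swift
// Hypothetical diagnostic: names the offending scalar and carries a fix-it.
struct InvisibleCharacterDiagnostic {
    let message: String
    let fixIt: String   // the identifier with the prohibited scalars removed
}

func diagnose(_ name: String,
              prohibited: Set<Unicode.Scalar>) -> InvisibleCharacterDiagnostic? {
    // Find the first prohibited scalar, if there is one.
    guard let offender = name.unicodeScalars.first(where: { prohibited.contains($0) }) else {
        return nil
    }
    let hex = String(offender.value, radix: 16, uppercase: true)

    // Build the fix-it text by dropping every prohibited scalar.
    var cleaned = String.UnicodeScalarView()
    for scalar in name.unicodeScalars where !prohibited.contains(scalar) {
        cleaned.append(scalar)
    }
    return InvisibleCharacterDiagnostic(
        message: "identifier contains invisible character U+\(hex)",
        fixIt: String(cleaned)
    )
}

if let d = diagnose("var\u{200B}1", prohibited: ["\u{200B}"]) {
    print(d.message)
    print("fix-it: replace with '\(d.fixIt)'")
}
```

Because the message spells out the code point, the user can see what is wrong even though the character itself cannot be seen in the editor.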

···

On Thu, Jun 23, 2016 at 2:54 PM, João Pinheiro <joao@joaopinheiro.org> wrote:

> On 23 Jun 2016, at 20:43, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

Sincerely,
João Pinheiro

+1
I didn't even know there were any invisible characters until this thread came up.

- Dave Sweeris

···

On Jun 23, 2016, at 15:13, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

On Thu, Jun 23, 2016 at 2:54 PM, João Pinheiro <joao@joaopinheiro.org> wrote:

> On 23 Jun 2016, at 20:43, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
> That's cool, although my preferred solution would be more closely aligned with UAX #31: overtly disallow the glyphs in Table 4 (instead of ignoring them) except in the specific scenarios for ZWJ and ZWNJ identified in UAX #31, then afterwards internally represent the identifier as its NFC-normalized string.

Explicitly disallowing them was my initial idea, but I think it would end up being a confusing error for users to encounter. Ignoring the invisible characters and leaving it up to a linter to remove them is less likely to cause confusion for users.

I'll be sure to describe the alternative of explicitly prohibiting them in the proposal though.

I would strongly urge you to propose explicitly prohibiting them just as UAX #31 recommends. Their reasoning is that these characters, which include those that reverse text direction or control joining, can cause one identifier to be maliciously changed to look like another. If you ignore these characters instead of prohibiting them, an identifier that visually appears as one string could in fact be a different one to the compiler.

Moreover, a compiler error can be made helpful by saying that the offending character is potentially invisible and it can come with a fix-it to remove the offending character. I don't think that would confuse the user at all. It would be more confusing if invisible characters that caused one thing to look identical to another were silently permitted.

Sincerely,
João Pinheiro


+1 on this. Josh Wisenbaker’s example says enough. Yikes!

···

On Jun 23, 2016, at 3:18 PM, David Sweeris via swift-evolution <swift-evolution@swift.org> wrote:

+1
I didn't even know there were any invisible characters until this thread came up.

- Dave Sweeris

On Jun 23, 2016, at 15:13, Xiaodi Wu via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Thu, Jun 23, 2016 at 2:54 PM, João Pinheiro <joao@joaopinheiro.org <mailto:joao@joaopinheiro.org>> wrote:

> On 23 Jun 2016, at 20:43, Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>> wrote:
> That's cool, although my preferred solution would be more closely aligned with UAX #31: overtly disallow the glyphs in Table 4 (instead of ignoring them) except in the specific scenarios for ZWJ and ZWNJ identified in UAX #31, then afterwards internally represent the identifier as its NFC-normalized string.

Explicitly disallowing them was my initial idea, but I think it would end up being a confusing error for users to encounter. Ignoring the invisible characters and leaving it up to a linter to remove them is less likely to cause confusion for users.

I'll be sure to describe the alternative of explicitly prohibiting them in the proposal though.

I would strongly urge you to propose explicitly prohibiting them just as UAX #31 recommends. Their reasoning is that these characters, which include those that reverse text direction or control joining, can cause one identifier to be maliciously changed to look like another. If you ignore these characters instead of prohibiting them, an identifier that visually appears as one string could in fact be a different one to the compiler.

Moreover, a compiler error can be made helpful by saying that the offending character is potentially invisible and it can come with a fix-it to remove the offending character. I don't think that would confuse the user at all. It would be more confusing if invisible characters that caused one thing to look identical to another were silently permitted.

Sincerely,
João Pinheiro


Thanks Xiaodi. That’s a relief to know.

···

On Jun 23, 2016, at 3:32 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

FWIW, Josh's example would be fixed whether we prohibit or ignore invisible characters, but there are other potential strings for which prohibition would be more secure.

On Thu, Jun 23, 2016 at 15:27 James Hillhouse <jdhillhouse4@icloud.com <mailto:jdhillhouse4@icloud.com>> wrote:
+1 on this. Josh Wisenbaker’s example says enough. Yikes!

On Jun 23, 2016, at 3:18 PM, David Sweeris via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

+1
I didn't even know there were any invisible characters until this thread came up.

- Dave Sweeris

On Jun 23, 2016, at 15:13, Xiaodi Wu via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Thu, Jun 23, 2016 at 2:54 PM, João Pinheiro <joao@joaopinheiro.org <mailto:joao@joaopinheiro.org>> wrote:

> On 23 Jun 2016, at 20:43, Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>> wrote:
> That's cool, although my preferred solution would be more closely aligned with UAX #31: overtly disallow the glyphs in Table 4 (instead of ignoring them) except in the specific scenarios for ZWJ and ZWNJ identified in UAX #31, then afterwards internally represent the identifier as its NFC-normalized string.

Explicitly disallowing them was my initial idea, but I think it would end up being a confusing error for users to encounter. Ignoring the invisible characters and leaving it up to a linter to remove them is less likely to cause confusion for users.

I'll be sure to describe the alternative of explicitly prohibiting them in the proposal though.

I would strongly urge you to propose explicitly prohibiting them just as UAX #31 recommends. Their reasoning is that these characters, which include those that reverse text direction or control joining, can cause one identifier to be maliciously changed to look like another. If you ignore these characters instead of prohibiting them, an identifier that visually appears as one string could in fact be a different one to the compiler.

Moreover, a compiler error can be made helpful by saying that the offending character is potentially invisible and it can come with a fix-it to remove the offending character. I don't think that would confuse the user at all. It would be more confusing if invisible characters that caused one thing to look identical to another were silently permitted.

Sincerely,
João Pinheiro


Allowing invisibles has already resulted in being able to do things like this which is intensely confusing to say the very least.

Josh

···

On Jun 23, 2016, at 3:54 PM, João Pinheiro via swift-evolution <swift-evolution@swift.org> wrote:

On 23 Jun 2016, at 20:43, Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>> wrote:
That's cool, although my preferred solution would be more closely aligned with UAX #31: overtly disallow the glyphs in Table 4 (instead of ignoring them) except in the specific scenarios for ZWJ and ZWNJ identified in UAX #31, then afterwards internally represent the identifier as its NFC-normalized string.

Explicitly disallowing them was my initial idea, but I think it would end up being a confusing error for users to encounter. Ignoring the invisible characters and leaving it up to a linter to remove them is less likely to cause confusion for users.

I asked my colleague who played the prank on me and got the details:

"Lines 4 and 5 declare variables with embedded Unicode Zero Width Spaces (U+200B) in their names. Line 4 is actually “var\U+200B1”, not “var1”. Isn’t it nice of Swift to be this flexible :blush:"

Josh

Josh Wisenbaker
macshome@mac.com

···

On Jun 23, 2016, at 4:45 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

Let me correct myself: what I think Josh's example is should be corrected whether we prohibit or ignore. However, since no one can see the invisible characters he used, I can't say for sure.

If he found a clever way to reorder or change spacing between letters (e.g. superimpose two characters so that "var11" looks like "var1"), then the problem can only be fixed by prohibition.

FWIW, Josh's example would be fixed whether we prohibit or ignore invisible
characters, but there are other potential strings for which prohibition
would be more secure.

···

On Thu, Jun 23, 2016 at 15:27 James Hillhouse <jdhillhouse4@icloud.com> wrote:

+1 on this. Josh Wisenbaker’s example says enough. Yikes!

On Jun 23, 2016, at 3:18 PM, David Sweeris via swift-evolution <swift-evolution@swift.org> wrote:

+1
I didn't even know there were any invisible characters until this thread
came up.

- Dave Sweeris

On Jun 23, 2016, at 15:13, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

On Thu, Jun 23, 2016 at 2:54 PM, João Pinheiro <joao@joaopinheiro.org> wrote:

> On 23 Jun 2016, at 20:43, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
> That's cool, although my preferred solution would be more closely
aligned with UAX #31: overtly disallow the glyphs in Table 4 (instead of
ignoring them) except in the specific scenarios for ZWJ and ZWNJ identified
in UAX #31, then afterwards internally represent the identifier as its
NFC-normalized string.

Explicitly disallowing them was my initial idea, but I think it would end
up being a confusing error for users to encounter. Ignoring the invisible
characters and leaving it up to a linter to remove them is less likely to
cause confusion for users.

I'll be sure to describe the alternative of explicitly prohibiting them
in the proposal though.

I would strongly urge you to propose explicitly prohibiting them just as
UAX #31 recommends. Their reasoning is that these characters, which include
those that reverse text direction or control joining, can cause one
identifier to be maliciously changed to look like another. If you ignore
these characters instead of prohibiting them, an identifier that visually
appears as one string could in fact be a different one to the compiler.

Moreover, a compiler error can be made helpful by saying that the
offending character is potentially invisible and it can come with a fix-it
to remove the offending character. I don't think that would confuse the
user at all. It would be more confusing if invisible characters that caused
one thing to look identical to another were silently permitted.

Sincerely,
João Pinheiro


Let me correct myself: what I think Josh's example is would be fixed
whether we prohibit or ignore invisible characters. However, since no one
can see the invisible characters he used, I can't say for sure.

If he found a clever way to reorder or change spacing between letters (e.g.
superimpose two characters so that "var11" looks like "var1"), then the
problem can only be fixed by prohibition.
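A small sketch of the reordering case, with the caveat that it is my own construction and not Josh's example. It assumes a RIGHT-TO-LEFT OVERRIDE (U+202E) placed inside a name, which would typically make the trailing letters display in reverse; `ignoringInvisibles` is a hypothetical helper standing in for the "ignore" approach.

```swift
// Hypothetical "ignore" strategy: silently drop the override scalar.
func ignoringInvisibles(_ name: String) -> String {
    var kept = String.UnicodeScalarView()
    for scalar in name.unicodeScalars where scalar != "\u{202E}" {
        kept.append(scalar)
    }
    return String(kept)
}

// Stored as "a", U+202E, "c", "b"; a renderer honouring the override
// would typically display the last two letters reversed, i.e. as "abc".
let spoofed = "a\u{202E}cb"

// After ignoring the override, the compiler would be working with "acb",
// which is not the "abc" the reader believes they are looking at.
print(ignoringInvisibles(spoofed) == "abc")
```

Under prohibition the spelling is rejected before this mismatch can arise, which is the distinction being drawn here.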

···

On Thu, Jun 23, 2016 at 15:36 James Hillhouse <jdhillhouse4@icloud.com> wrote:

Thanks Xiaodi. That’s a relief to know.

On Jun 23, 2016, at 3:32 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

FWIW, Josh's example would be fixed whether we prohibit or ignore
invisible characters, but there are other potential strings for which
prohibition would be more secure.

On Thu, Jun 23, 2016 at 15:27 James Hillhouse <jdhillhouse4@icloud.com> wrote:

+1 on this. Josh Wisenbaker’s example says enough. Yikes!

On Jun 23, 2016, at 3:18 PM, David Sweeris via swift-evolution <swift-evolution@swift.org> wrote:

+1
I didn't even know there were any invisible characters until this thread
came up.

- Dave Sweeris

On Jun 23, 2016, at 15:13, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

On Thu, Jun 23, 2016 at 2:54 PM, João Pinheiro <joao@joaopinheiro.org> wrote:

> On 23 Jun 2016, at 20:43, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
> That's cool, although my preferred solution would be more closely
aligned with UAX #31: overtly disallow the glyphs in Table 4 (instead of
ignoring them) except in the specific scenarios for ZWJ and ZWNJ identified
in UAX #31, then afterwards internally represent the identifier as its
NFC-normalized string.

Explicitly disallowing them was my initial idea, but I think it would
end up being a confusing error for users to encounter. Ignoring the
invisible characters and leaving it up to a linter to remove them is less
likely to cause confusion for users.

I'll be sure to describe the alternative of explicitly prohibiting them
in the proposal though.

I would strongly urge you to propose explicitly prohibiting them just as
UAX #31 recommends. Their reasoning is that these characters, which include
those that reverse text direction or control joining, can cause one
identifier to be maliciously changed to look like another. If you ignore
these characters instead of prohibiting them, an identifier that visually
appears as one string could in fact be a different one to the compiler.

Moreover, a compiler error can be made helpful by saying that the
offending character is potentially invisible and it can come with a fix-it
to remove the offending character. I don't think that would confuse the
user at all. It would be more confusing if invisible characters that caused
one thing to look identical to another were silently permitted.

Sincerely,
João Pinheiro


Indeed, the case shown in Josh's example was the motivation for this thread and will be solved by the proposal.

The current discussion has been around whether it should be solved by ignoring invisible characters or by prohibiting them and explicitly highlighting them as an error. I originally proposed prohibiting them and was then persuaded that ignoring them would suffice. Upon further reading of the Unicode normalisation and security documents, I agree that prohibiting them outside of the situations described in UAX #31 is the best and safest choice.

Sincerely,
João Pinheiro

···

On 23 Jun 2016, at 21:45, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

Let me correct myself: what I think Josh's example is should be corrected whether we prohibit or ignore. However, since no one can see the invisible characters he used, I can't say for sure.

If he found a clever way to reorder or change spacing between letters (e.g. superimpose two characters so that "var11" looks like "var1"), then the problem can only be fixed by prohibition.
On Thu, Jun 23, 2016 at 15:36 James Hillhouse <jdhillhouse4@icloud.com <mailto:jdhillhouse4@icloud.com>> wrote:
Thanks Xiaodi. That’s a relief to know.

On Jun 23, 2016, at 3:32 PM, Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>> wrote:

FWIW, Josh's example would be fixed whether we prohibit or ignore invisible characters, but there are other potential strings for which prohibition would be more secure.

On Thu, Jun 23, 2016 at 15:27 James Hillhouse <jdhillhouse4@icloud.com <mailto:jdhillhouse4@icloud.com>> wrote:
+1 on this. Josh Wisenbaker’s example says enough. Yikes!

On Jun 23, 2016, at 3:18 PM, David Sweeris via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

+1
I didn't even know there were any invisible characters until this thread came up.

- Dave Sweeris

On Jun 23, 2016, at 15:13, Xiaodi Wu via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Thu, Jun 23, 2016 at 2:54 PM, João Pinheiro <joao@joaopinheiro.org <mailto:joao@joaopinheiro.org>> wrote:

> On 23 Jun 2016, at 20:43, Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>> wrote:
> That's cool, although my preferred solution would be more closely aligned with UAX #31: overtly disallow the glyphs in Table 4 (instead of ignoring them) except in the specific scenarios for ZWJ and ZWNJ identified in UAX #31, then afterwards internally represent the identifier as its NFC-normalized string.

Explicitly disallowing them was my initial idea, but I think it would end up being a confusing error for users to encounter. Ignoring the invisible characters and leaving it up to a linter to remove them is less likely to cause confusion for users.

I'll be sure to describe the alternative of explicitly prohibiting them in the proposal though.

I would strongly urge you to propose explicitly prohibiting them just as UAX #31 recommends. Their reasoning is that these characters, which include those that reverse text direction or control joining, can cause one identifier to be maliciously changed to look like another. If you ignore these characters instead of prohibiting them, an identifier that visually appears as one string could in fact be a different one to the compiler.

Moreover, a compiler error can be made helpful by saying that the offending character is potentially invisible and it can come with a fix-it to remove the offending character. I don't think that would confuse the user at all. It would be more confusing if invisible characters that caused one thing to look identical to another were silently permitted.

Sincerely,
João Pinheiro


This was exactly the motivation for the proposal and a similar example was given on the first email of the thread.

Try this:

func test() { print("A") }
func t​est() { print("B") }
func te​st() { print("C") }

let abc = 1
let a​bc = 2
let ab​c = 3

test()
t​est()
te​st()

print(abc)
print(a​bc)
print(ab​c)
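To see what the compiler sees in the names above, one option is to dump the scalars of each spelling. This is only a sketch: `scalarDump` is a helper name made up for this example, and the names are written with explicit `\u{200B}` escapes so the hidden character is visible in the source.

```swift
// Hypothetical helper: list the code points of a string as "U+XXXX" values.
func scalarDump(_ s: String) -> String {
    return s.unicodeScalars
        .map { scalar -> String in
            let hex = String(scalar.value, radix: 16, uppercase: true)
            // Pad to at least four hex digits, the usual U+ notation.
            let padding = String(repeating: "0", count: max(0, 4 - hex.count))
            return "U+" + padding + hex
        }
        .joined(separator: " ")
}

print(scalarDump("test"))          // U+0074 U+0065 U+0073 U+0074
print(scalarDump("t\u{200B}est"))  // U+0074 U+200B U+0065 U+0073 U+0074
```

The two names look the same on screen but differ by one scalar, which is why all three declarations are accepted as distinct.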

Sincerely,
João Pinheiro

···

On 23 Jun 2016, at 22:59, Josh Wisenbaker via swift-evolution <swift-evolution@swift.org> wrote:

On Jun 23, 2016, at 4:45 PM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

Let me correct myself: what I think Josh's example is should be corrected whether we prohibit or ignore. However, since no one can see the invisible characters he used, I can't say for sure.

If he found a clever way to reorder or change spacing between letters (e.g. superimpose two characters so that "var11" looks like "var1"), then the problem can only be fixed by prohibition.

I asked my colleague who played the prank on me and got the details:

"Lines 4 and 5 declare variables with embedded Unicode Zero Width Spaces (U+200B) in their names. Line 4 is actually “var\U+200B1”, not “var1”. Isn’t it nice of Swift to be this flexible :blush:"

Josh

Josh Wisenbaker
macshome@mac.com <mailto:macshome@mac.com>


Indeed, the case shown in Josh's example was the motivation for this thread
and will be solved by the proposal.

The current discussion has been around whether it should be solved by
ignoring invisible characters or prohibiting them and explicitly
highlighting them as an error. I originally proposed prohibiting them and
was convinced into thinking that ignoring them would suffice. Upon further
reading of the unicode normalisation and security documents, I agree that
prohibiting them outside of the situations described in UAX #31 is the best
and safest choice.

I do believe the *safest* variant should be chosen. After all, do we actually see many sources with Unicode identifiers? I believe they are a very small percentage of real code. IMO we should first protect Swift from problems with Unicode identifiers, and only after that support as much Unicode as we can.
(Personally, I really don't understand why we need anything other than ASCII for identifiers. That would solve all the problems with invisible spaces, left-to-right flags, complicated rules, graphemes, etc. But someone needs to be able to use a dog emoji as an identifier... well... OK)

···

On 24.06.2016 0:57, João Pinheiro via swift-evolution wrote:

Sincerely,
João Pinheiro

On 23 Jun 2016, at 21:45, Xiaodi Wu via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

Let me correct myself: what I think Josh's example is should be corrected
whether we prohibit or ignore. However, since no one can see the
invisible characters he used, I can't say for sure.

If he found a clever way to reorder or change spacing between letters
(e.g. superimpose two characters so that "var11" looks like "var1"), then
the problem can only be fixed by prohibition.
On Thu, Jun 23, 2016 at 15:36 James Hillhouse <jdhillhouse4@icloud.com <mailto:jdhillhouse4@icloud.com>> wrote:

    Thanks Xiaodi. That’s a relief to know.

    On Jun 23, 2016, at 3:32 PM, Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>> wrote:

    FWIW, Josh's example would be fixed whether we prohibit or ignore
    invisible characters, but there are other potential strings for
    which prohibition would be more secure.

    On Thu, Jun 23, 2016 at 15:27 James Hillhouse <jdhillhouse4@icloud.com <mailto:jdhillhouse4@icloud.com>> wrote:

        +1 on this. Josh Wisenbaker’s example says enough. Yikes!

        On Jun 23, 2016, at 3:18 PM, David Sweeris via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

        +1
        I didn't even know there were any invisible characters until
        this thread came up.

        - Dave Sweeris

        On Jun 23, 2016, at 15:13, Xiaodi Wu via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

        On Thu, Jun 23, 2016 at 2:54 PM, João Pinheiro <joao@joaopinheiro.org <mailto:joao@joaopinheiro.org>> wrote:

            > On 23 Jun 2016, at 20:43, Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>> wrote:
            > That's cool, although my preferred solution would be more closely aligned with UAX #31: overtly disallow the glyphs in Table 4 (instead of ignoring them) except in the specific scenarios for ZWJ and ZWNJ identified in UAX #31, then afterwards internally represent the identifier as its NFC-normalized string.

            Explicitly disallowing them was my initial idea, but I
            think it would end up being a confusing error for users to
            encounter. Ignoring the invisible characters and leaving
            it up to a linter to remove them is less likely to cause
            confusion for users.

            I'll be sure to describe the alternative of explicitly
            prohibiting them in the proposal though.

        I would strongly urge you to propose explicitly prohibiting
        them just as UAX #31 recommends. Their reasoning is that these
        characters, which include those that reverse text direction or
        control joining, can cause one identifier to be maliciously
        changed to look like another. If you ignore these characters
        instead of prohibiting them, an identifier that visually
        appears as one string could in fact be a different one to the
        compiler.
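To make the difference between the two policies concrete, here is a minimal sketch in Python (not Swift, and not how any compiler actually does it). It assumes that "invisible" can be approximated by the Unicode general category Cf (format characters), which includes ZWJ, ZWNJ and the bidirectional controls; the helper names are hypothetical. The spoofed identifier uses U+202E RIGHT-TO-LEFT OVERRIDE, which in a renderer that honors bidi controls can make the stored sequence `a`, `c`, `b` display like `abc`.

```python
import unicodedata

def has_format_chars(identifier: str) -> bool:
    """True if any character has general category Cf (format character)."""
    return any(unicodedata.category(ch) == "Cf" for ch in identifier)

def strip_format_chars(identifier: str) -> str:
    """The 'ignore' policy: silently drop Cf characters."""
    return "".join(ch for ch in identifier if unicodedata.category(ch) != "Cf")

RLO = "\u202E"               # RIGHT-TO-LEFT OVERRIDE
spoofed = "a" + RLO + "cb"   # may be displayed like "abc"
genuine = "abc"

# "Ignore" policy: the compiler would see "acb", which is not what the
# reader believes they are looking at.
seen_by_compiler = strip_format_chars(spoofed)
print(seen_by_compiler == genuine)

# "Prohibit" policy: the identifier is rejected before it can mislead anyone.
print(has_format_chars(spoofed), has_format_chars(genuine))
```

Under the ignore policy the spoofed name is accepted and quietly becomes a different identifier from the one it resembles; under the prohibit policy it never compiles. The sketch deliberately leaves out the ZWJ/ZWNJ exceptions that UAX #31 allows.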

        Moreover, a compiler error can be made helpful by saying that
        the offending character is potentially invisible and it can
        come with a fix-it to remove the offending character. I don't
        think that would confuse the user at all. It would be more
        confusing if invisible characters that caused one thing to
        look identical to another were silently permitted.
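As a rough illustration of what such a diagnostic could look like, the following Python sketch reports each offending character by code point, name and offset, and proposes a fix-it with those characters removed. The `Diagnostic` type, the `check_identifier` function and the message wording are all hypothetical, and the sketch again assumes that general category Cf is a reasonable stand-in for "potentially invisible"; it does not implement the ZWJ/ZWNJ exceptions from UAX #31.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional

@dataclass
class Diagnostic:
    message: str  # human-readable error text
    fixit: str    # the identifier with offending characters removed

def check_identifier(identifier: str) -> Optional[Diagnostic]:
    """Return a diagnostic if the identifier contains Cf characters, else None."""
    offenders = [(i, ch) for i, ch in enumerate(identifier)
                 if unicodedata.category(ch) == "Cf"]
    if not offenders:
        return None
    described = ", ".join(
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')} at offset {i}"
        for i, ch in offenders
    )
    fixed = "".join(ch for ch in identifier
                    if unicodedata.category(ch) != "Cf")
    return Diagnostic(
        message=f"identifier contains potentially invisible character(s): {described}",
        fixit=fixed,
    )

diag = check_identifier("var\u200b1")  # zero width space hidden before the "1"
if diag is not None:
    print(diag.message)
    print("fix-it: replace with", repr(diag.fixit))
```

Naming the character and its position is what keeps the error from being confusing: the user is told exactly what cannot be seen and is offered a one-step repair.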

            Sincerely,
            João Pinheiro

        _______________________________________________
        swift-evolution mailing list
        swift-evolution@swift.org <mailto:swift-evolution@swift.org>
        https://lists.swift.org/mailman/listinfo/swift-evolution

Did you even watch the WWDC keynote? This was basically the transcript of it:

"Emoji! Emoji emoji emoji. Emoji emoji. Emoji! Emoji emoji emoji emoji. Emoji Emoji? Emoji. Emoji!"

(cue a bunch of emoji bouncing around on an iPad screen)

Charles

···

On Jun 24, 2016, at 8:27 AM, Vladimir.S via swift-evolution <swift-evolution@swift.org> wrote:

(Personally I really don't understand why we need anything other than ASCII characters for identifiers. This could solve all the problems with invisible spaces/left-to-right flags/complicated rules/graphemes etc. But someone needs to be able to use a dog emoji as an identifier.. well.. OK)

Math symbols make everything better (at least if you're into math).

- Dave Sweeris

···

On Jun 24, 2016, at 08:27, Vladimir.S via swift-evolution <swift-evolution@swift.org> wrote:

(Personally I really don't understand why we need anything other than ASCII characters for identifiers. This could solve all the problems with invisible spaces/left-to-right flags/complicated rules/graphemes etc. But someone needs to be able to use a dog emoji as an identifier.. well.. OK)