InternalString class for easy String manipulation


(Michael Savich) #1

Back in Swift 1.0, subscripting a String was easy: you could just subscript it in a very Python-like way. But now, things are a bit more complicated. I recognize why we need syntax like str.startIndex.advancedBy(x), but it has its downsides. Namely, it makes things hard on beginners. If one of Swift's goals is to be a great first language, this syntax fights that. Imagine having to explain Unicode and character size to an 8-year-old. This is doubly problematic because String manipulation is one of the first things new coders might want to do.

What about having an InternalString subclass that only supports one encoding, allowing it to be subscripted with Ints? The idea is that an InternalString is for Strings that are more or less hard-coded into the app: dictionary keys, enum raw values, that kind of stuff. This also has the added benefit of forcing the programmer to think about what the String is being used for. Is it user-facing? Or is it just for internal use? And of course, it makes code dealing with String manipulation much more concise and readable.

It follows that something like this would need to be entered as a literal to make it as easy as using String. One way would be to make all String literals InternalStrings, but that sounds far too drastic. Maybe by appending an exclamation point, like "this"! Or even by wrapping the whole thing in exclamation marks, like !"this"! Of course, we could go old school and write it like @"this"… That last one is a joke.

I'll be the first to admit I'm way in over my head here, so I'm very open to suggestions and criticism. Thanks!


Sent from my iPad


(William Sumner) #2

If desired for educational purposes, subscripting can be added to String as an extension. On the other hand, the current APIs convey the performance costs of traversing strings. In my opinion, that is the correct approach to learn in a Unicode-aware world.
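For reference, a minimal sketch of such an extension, assuming Swift 4 or later (where `index(_:offsetBy:)` replaced `advancedBy`). The `offset:` label is a hypothetical choice, not standard-library API. Walking from `startIndex` keeps the O(n) traversal cost visible in the implementation, even though the call site looks like array access.

```swift
extension String {
    /// Hypothetical convenience subscript: returns the Character at an
    /// integer offset. It walks from startIndex, so each access is O(n)
    /// in the offset, unlike Array's O(1) subscript.
    subscript(offset offset: Int) -> Character {
        precondition(offset >= 0 && offset < count, "offset out of range")
        let i = index(startIndex, offsetBy: offset)
        return self[i]
    }
}

let greeting = "Hello, world!"
print(greeting[offset: 7])  // prints w
```

Because the cost is hidden behind subscript syntax, a loop over `0..<count` using this subscript becomes quadratic, which is the trade-off the current API is designed to make obvious.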

Preston


On Aug 14, 2016, at 4:41 PM, Michael Savich via swift-evolution <swift-evolution@swift.org> wrote:



(Xiaodi Wu) #3


I can sympathize, but this is tricky.

Fundamentally, if it's going to be a learning and teaching issue, then this
"easy" string should be the default. That is to say, if I write `var a =
"Hello, world!"`, then `a` should be inferred to be of type InternalString
or EasyString, whatever you want to call it.

But, we also want Swift to support Unicode by default, and we want that
support to do things The Right Way(TM) by default. In other words, a user
should not have to reach for a special type in order to handle arbitrary
strings correctly, and I should be able to reassign `a = "你好"` and have
things work as expected. So, we also can't have the "easy" string type be
the default...

I can't think of a way to square that circle.
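To make the `a = "你好"` case concrete, here is a short sketch using only the standard library (assuming Swift 4 or later). `String` counts Characters, while a byte-indexed "easy" type would see the first UTF-8 byte of 你, which is not a character on its own.

```swift
// With the standard String type, reassigning to a non-ASCII literal
// just works, and count/iteration operate on Characters.
var a = "Hello, world!"
print(a.count)        // 13

a = "你好"
print(a.count)        // 2 Characters
print(a.utf8.count)   // 6 UTF-8 bytes (3 per character)
print(a.first!)       // 你

// A byte-indexed type would instead hand back the first UTF-8 byte,
// which is only one third of the encoding of 你.
let firstByte = Array(a.utf8)[0]
print(String(firstByte, radix: 16))  // e4
```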


On Sun, Aug 14, 2016 at 5:41 PM, Michael Savich via swift-evolution <swift-evolution@swift.org> wrote:


_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Slava Pestov) #4

If you actually just want an ASCII string, you can just use an Array<UInt8> and avoid all the complexity of Unicode altogether. You won’t get the string literal syntax, but if you’re really adventurous you can wrap it in a new struct type, and define an ExpressibleByStringLiteral conformance.
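A sketch of what Slava describes, assuming Swift 4.2 or later (for `allSatisfy`). The `ASCIIString` name and its members are hypothetical, not a standard-library type. Non-ASCII literals are rejected with a runtime trap, because the compiler cannot check this at the literal site.

```swift
/// Hypothetical ASCII-only string: bytes live in an Array<UInt8>,
/// so integer subscripting is O(1).
struct ASCIIString: ExpressibleByStringLiteral, CustomStringConvertible {
    private(set) var bytes: [UInt8]

    init(stringLiteral value: String) {
        // Runtime check only: traps if the literal contains non-ASCII text.
        precondition(value.utf8.allSatisfy { $0 < 0x80 },
                     "ASCIIString literal contains non-ASCII characters")
        bytes = Array(value.utf8)
    }

    var count: Int { return bytes.count }

    /// O(1) random access; each byte is a complete ASCII character.
    subscript(i: Int) -> Character {
        return Character(Unicode.Scalar(bytes[i]))
    }

    var description: String {
        return String(decoding: bytes, as: UTF8.self)
    }
}

let key: ASCIIString = "user_id"
print(key[5])      // prints i
print(key.count)   // prints 7
```

The literal syntax comes for free from the `ExpressibleByStringLiteral` conformance once the type annotation is present, which sidesteps the need for a new `"this"!`-style literal.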

Slava


On Aug 14, 2016, at 3:41 PM, Michael Savich via swift-evolution <swift-evolution@swift.org> wrote:



(Jacob Bandes-Storch) #5

Here's a little prior discussion about ASCIIString:
https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151207/002138.html

Jacob


On Sun, Aug 14, 2016 at 3:41 PM, Michael Savich via swift-evolution <swift-evolution@swift.org> wrote:



(Dave Abrahams) #6

Just to correct the record: no, it was really never like that in Swift.


on Sun Aug 14 2016, Michael Savich <swift-evolution@swift.org> wrote:

Back in Swift 1.0, subscripting a String was easy, you could just use
subscripting in a very Python like way.

--
-Dave


(Kenny Leung) #7

I agree with both points of view. I think we need to bring back subscripting on strings that does the thing people would most commonly expect.

I would say that subscript indexes should correspond to visual glyphs. This seems reasonable to me for most character sets, like Roman, Cyrillic, and Chinese. There is some doubt in my mind for things like subscripted Japanese or connected (ligatured?) languages like Arabic, Hindi, or Thai.
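For comparison, Swift's `Character` already models an extended grapheme cluster, which is roughly the "visual glyph" described here. A minimal sketch (assuming Swift 4 or later): precomposed and decomposed "café" both count as 4 Characters and compare equal, even though their Unicode scalar counts differ. Grapheme clusters are not guaranteed to match what a font draws as a single glyph, so the doubts about scripts like Hindi or Thai still apply.

```swift
// é as a single scalar (U+00E9) versus e + COMBINING ACUTE ACCENT (U+0301).
let precomposed = "caf\u{E9}"
let decomposed  = "cafe\u{301}"

print(precomposed.count, decomposed.count)   // 4 4
print(precomposed.unicodeScalars.count,
      decomposed.unicodeScalars.count)       // 4 5

// String equality uses canonical equivalence, so these compare equal.
print(precomposed == decomposed)             // true
```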

-Kenny


On Aug 15, 2016, at 10:42 AM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:



(Xiaodi Wu) #8

Swift supports arbitrary Unicode for identifier names, so Unicode would
have to be supported even for debugging strings.
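To illustrate the point, a minimal sketch (assuming a recent Swift compiler): a function with a non-ASCII name reports that name through `#function`, so even a "debugging" string built from it contains non-ASCII characters. The names `grüße` and `π` are arbitrary examples.

```swift
// Identifiers may contain non-ASCII letters, and the compiler's
// debugging literal #function reproduces them verbatim.
func grüße() -> String {
    return #function
}

let π = 3.14159
print(grüße())      // grüße()
print("π = \(π)")   // π = 3.14159
```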


On Mon, Aug 15, 2016 at 12:54 Michael Savich <savichmichael@icloud.com> wrote:

Well, the thing I've been thinking is that InternalString doesn't have to
be just for learning. There is value in distinguishing between whether a
String is for UI or just for code. I get that Swift wants to be
Unicode-friendly, but I think that only really benefits when dealing with UI
and possibly APIs.

I think String wears too many hats; the best way forward is to split it
up. As it is, String is a bit like an NSImage, and I'm not sure that is a
good thing for a basic type.

Sent from my iPhone



(Félix Cloutier) #9

The major problem with this approach is that visual glyphs themselves have one level of variable-length encoding, and they sit on top of another variable-length encoding used to represent the Unicode characters (Swift-native Strings are currently encoded as UTF-8). For instance, the visual glyph 🇺🇸 is the result of putting side by side the Unicode characters 🇺 and 🇸 ("REGIONAL INDICATOR SYMBOL LETTER U" and "REGIONAL INDICATOR SYMBOL LETTER S"), which are themselves encoded in UTF-8 using 4 bytes each. A design in which you can "just write" string[4544] hides the fact that indexing is a linear-time operation that needs to recompose Unicode characters from UTF-8 and then recompose visual glyphs on top of that.
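The layers described above can be observed directly through String's views. A minimal sketch, assuming Swift 4 or later; the counts are properties of the string's contents and do not depend on how `String` stores them internally.

```swift
// One user-perceived character, several layers of encoding underneath.
let flag = "\u{1F1FA}\u{1F1F8}"   // REGIONAL INDICATOR U + REGIONAL INDICATOR S

print(flag.count)                  // 1 Character (grapheme cluster)
print(flag.unicodeScalars.count)   // 2 Unicode scalars
print(flag.utf16.count)            // 4 UTF-16 code units (2 surrogate pairs)
print(flag.utf8.count)             // 8 UTF-8 bytes (4 per scalar)

// Reaching the n-th Character means walking from the start and grouping
// scalars into clusters, so it is O(n) rather than O(1).
let text = String(repeating: flag, count: 3) + "!"
let bang = text[text.index(text.startIndex, offsetBy: 3)]
print(bang)                        // !
```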

Generally speaking, I *think* that I agree that human-geared "long strings", on which you probably won't need random access, and machine-geared smaller strings that encode commands could benefit from not being considered the same fundamental thing. However, I'm also afraid that this will end with more applications and websites that think that first names only contain 7-bit-clean characters in the A-Z range. (I live in the US and I can attest that this is still very common.)

You could also make the point that better facilities for parsing strings would probably address this issue.
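As one example of what "better parsing facilities" can already look like, here is a hypothetical `parseAssignment` helper (our own name, not an existing API) that splits a machine-geared `key=value` string with the standard library's `split` and never touches integer offsets. Assumes Swift 4 or later.

```swift
/// Hypothetical helper: splits "key=value" at the first "=" only,
/// returning nil when there is no "=" or the key is empty.
func parseAssignment(_ line: String) -> (key: String, value: String)? {
    let parts = line.split(separator: "=", maxSplits: 1,
                           omittingEmptySubsequences: false)
    guard parts.count == 2, !parts[0].isEmpty else { return nil }
    return (key: String(parts[0]), value: String(parts[1]))
}

if let pair = parseAssignment("city=Montréal") {
    print(pair.key, pair.value)   // city Montréal
}
```

Because the library handles the indices, the non-ASCII value passes through untouched, which addresses the "first names are A-Z" worry without a separate string type.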

Félix


On Aug 15, 2016, at 10:52:02, Kenny Leung via swift-evolution <swift-evolution@swift.org> wrote:



(Xiaodi Wu) #10

Nice, thanks 🙂
FWIW, there are at least some ASCII-specific optimizations internally in
String (this was a question asked and not answered in the prior thread).


On Tue, Aug 16, 2016 at 11:21 PM, Jacob Bandes-Storch via swift-evolution <swift-evolution@swift.org> wrote:



(Austin Zheng) #11

I just want to mention that the standard library team (or, more
specifically, Dave and Dmitri) is planning a rewrite of Swift's String
subsystem, in part to make it easier to work with strings in the "common
case". We may want to get their input, or wait until they've prepared a
proposal.

Best,
Austin


On Mon, Aug 15, 2016 at 10:58 AM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:



(Xiaodi Wu) #12

On Mon, Aug 15, 2016 at 1:07 PM, Michael Savich <savichmichael@icloud.com> wrote:

> At the risk of treading on sacred ground, does Swift need arbitrary
> Unicode in identifiers? The Swift guidebook uses emojis as variable names
> as an example of the benefits of this, which is… not convincing...

The ship has sailed on that decision, I'd imagine.

But in any case, debugging strings would still need to support Unicode
because file paths, etc., need to be Unicode-aware. IMO, you're right that
a clear distinction between UI and non-UI is useful, but that doesn't break
down into Unicode vs. ASCII, and it's not clear to me that strings used for
these two purposes need or should have distinct APIs.

> If international users are the concern, my understanding is that
> they usually use English for everything because AFAIK the Swift keywords
> are English no matter where you are.
>
> One reason I could see for keeping Unicode strings is if we localized
> keywords into other languages, but that is a whole other discussion.



(Xiaodi Wu) #13

The utf16 property can already be subscripted with an Int, just as you
desire, if you import Foundation. (See the code for corelibs-foundation for
an intriguing discussion of why you must import Foundation at the moment.)
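Whether `utf16` accepts an `Int` directly depends on the Swift and Foundation versions in use, so here is a sketch that does not rely on it, using only the standard library (assuming Swift 4 or later): either copy the code units into an Int-indexed array, or convert an integer offset into a `String.Index`.

```swift
let s = "na\u{EF}ve \u{1F1FA}\u{1F1F8}"   // "naïve 🇺🇸"

// Option 1: copy the UTF-16 code units into an Array, which is Int-indexed
// with O(1) access after an O(n) copy up front.
let units = Array(s.utf16)
print(units.count)                     // 10

// Option 2: turn an integer offset into an index of the utf16 view.
let i = s.utf16.index(s.utf16.startIndex, offsetBy: 2)
print(String(s.utf16[i], radix: 16))   // ef (U+00EF, "ï")
```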


On Tue, Aug 16, 2016 at 02:00 Michael Savich <savichmichael@icloud.com> wrote:

What about adding a property to String called naiveCharacters (someone
come up with a better name) that can be subscripted with an Int à la Swift
1.0, so that if it leads to runtime errors, at least the programmer was
warned? We could make this property an optional that is only non-nil if the
String was created using literal syntax.

Of course there is the risk that tying arbitrary behavior to the literal
syntax might be too weird for a language feature, but it's an idea.



(Félix Cloutier) #14

I'd just like to leave it here that Microsoft called me "F+¬lix" in corporate communications this morning. I've never seen that variation before. If Microsoft used Swift, I would like this to be borderline impossible for them to screw up. 🙂

Félix


On Aug 16, 2016, at 21:27:54, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

Nice, thanks :slight_smile:
FWIW, there are at least some ASCII-specific optimizations internally in String (this was a question asked and not answered in the prior thread).

On Tue, Aug 16, 2016 at 11:21 PM, Jacob Bandes-Storch via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
Here's a little prior discussion about ASCIIString: https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20151207/002138.html

Jacob

On Sun, Aug 14, 2016 at 3:41 PM, Michael Savich via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
Back in Swift 1.0, subscripting a String was easy, you could just use subscripting in a very Python like way. But now, things are a bit more complicated. I recognize why we need syntax like str.startIndex.advancedBy(x) but it has its downsides. Namely, it makes things hard on beginners. If one of Swift's goals is to make it a great first language, this syntax fights that. Imagine having to explain Unicode and character size to an 8 year old. This is doubly problematic because String manipulation is one of the first things new coders might want to do.

What about having an InternalString subclass that only supports one encoding, allowing it to be subscripted with Ints? The idea is that an InternalString is for Strings that are more or less hard-coded into the app: dictionary keys, enum raw values, that kind of stuff. This also has the added benefit of forcing the programmer to think about what the String is being used for. Is it user-facing? Or is it just for internal use? And of course, it makes code dealing with String manipulation much more concise and readable.

It follows that something like this would need to be entered as a literal to make it as easy as using String. One way would be to make all String literals InternalStrings, but that sounds far too drastic. Maybe appending an exclamation point like "this"! Or even just wrapping the whole thing in exclamation marks like !"this"! Of course, we could go old school and write it like @"this" …That last one is a joke.

I'll be the first to admit I'm way in over my head here, so I'm very open to suggestions and criticism. Thanks!

Sent from my iPad

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Kenny Leung) #15

I understand that the most friendly approach may not be the most efficient, but that’s not what I’m pushing for. I’m pushing for “does the thing people would most commonly expect”. Take a first-time programmer who reads any (human) language, and that is what they would expect.

Why couldn’t String’s internal storage format be glyph-based? If I were, say, writing a text editor, it would certainly be the easiest and most efficient format to work in.

-Kenny
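One rough way to get the glyph-based storage described above, assuming Swift 5.x and text small enough to copy, is to keep an `Array<Character>`: integer subscripting becomes O(1) random access, at the cost of extra memory per glyph and a conversion back to `String` whenever a `String` API is needed. `GlyphBuffer` is a hypothetical name for illustration, not a standard-library type.

```swift
/// Hypothetical glyph-indexed text buffer (illustration only).
struct GlyphBuffer {
    // One element per extended grapheme cluster ("visual glyph").
    private var glyphs: [Character]

    init(_ text: String) {
        glyphs = Array(text)
    }

    var count: Int { glyphs.count }

    // O(1) random access by integer offset, like any Array.
    subscript(i: Int) -> Character {
        get { glyphs[i] }
        set { glyphs[i] = newValue }
    }

    // O(n): later glyphs shift to make room.
    mutating func insert(_ c: Character, at i: Int) {
        glyphs.insert(c, at: i)
    }

    // Converting back is needed for most String-based APIs.
    var text: String { String(glyphs) }
}

var buffer = GlyphBuffer("Fe\u{301}lix \u{1F1FA}\u{1F1F8}") // "Félix 🇺🇸"
print(buffer.count) // 7
print(buffer[1])    // é
buffer[0] = "f"
print(buffer.text)  // félix 🇺🇸
```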

···

On Aug 15, 2016, at 9:20 PM, Félix Cloutier <felixcca@yahoo.ca> wrote:

The major problem with this approach is that visual glyphs themselves have one level of variable-length encoding, and they sit on top of another variable-length encoding used to represent the Unicode characters (Swift-native Strings are currently encoded as UTF-8). For instance, the visual glyph 🇺🇸 is the result of putting the Unicode characters 🇺 and 🇸 ("REGIONAL INDICATOR SYMBOL LETTER U" and "REGIONAL INDICATOR SYMBOL LETTER S") side by side, and those are themselves encoded in UTF-8 using 4 bytes each. A design in which you can "just write" string[4544] hides the fact that indexing is a linear-time operation that needs to recompose UTF-8 characters and then recompose visual glyphs on top of that.

Generally speaking, I *think* I agree that human-geared "long strings", on which you probably won't need random access, and machine-geared smaller strings that encode a command could benefit from not being considered the same fundamental thing. However, I'm also afraid that this will end with more applications and websites that think first names only contain 7-bit-clean characters in the A-Z range. (I live in the US and I can attest that this is still very common.)

You could also argue that better facilities for parsing strings would probably address this issue.

Félix
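The layering Félix describes can be observed directly. The sketch below assumes current Swift (5.x) rather than the Swift 3 of this thread, where the `Character` view was reached through `.characters`. It counts the same flag in each of `String`'s views and shows that reaching the nth `Character` means stepping from `startIndex`, because `index(_:offsetBy:)` on `String` is O(n) in the offset.

```swift
// One visible glyph built from two regional-indicator scalars.
let flag = "\u{1F1FA}\u{1F1F8}" // 🇺🇸: REGIONAL INDICATOR SYMBOL LETTER U + S

// Each view counts a different unit.
let glyphs = flag.count                 // extended grapheme clusters
let scalars = flag.unicodeScalars.count // Unicode scalar values
let utf8Bytes = flag.utf8.count         // UTF-8 code units (bytes)
let utf16Units = flag.utf16.count       // UTF-16 code units

print(glyphs, scalars, utf8Bytes, utf16Units) // 1 2 8 4

// String.Index is opaque: getting to the nth Character walks from the start.
let text = "USA: \u{1F1FA}\u{1F1F8}!"
let i = text.index(text.startIndex, offsetBy: 5)
print(text[i]) // 🇺🇸
```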

Le 15 août 2016 à 10:52:02, Kenny Leung via swift-evolution <swift-evolution@swift.org> a écrit :

I agree with both points of view. I think we need to bring back subscripting on strings which does the thing people would most commonly expect.

I would say that subscript indexes should correspond to visual glyphs. This seems reasonable to me for most character sets, like Roman, Cyrillic, and Chinese. There is some doubt in my mind for things like subscripted Japanese or connected (ligatured?) languages like Arabic, Hindi, or Thai.

-Kenny
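In current Swift, a `Character` already represents one extended grapheme cluster, which is close to the "visual glyph" proposed here. A small sketch, assuming Swift 5.x, in which a base letter plus a combining accent counts as a single `Character` and canonically equivalent spellings compare equal:

```swift
// "é" spelled two ways: one precomposed scalar, or "e" plus a combining accent.
let precomposed = "caf\u{E9}"  // U+00E9 LATIN SMALL LETTER E WITH ACUTE
let decomposed = "cafe\u{301}" // U+0065 followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed.count, decomposed.count)                               // 4 4
print(precomposed.unicodeScalars.count, decomposed.unicodeScalars.count) // 4 5

// String and Character comparison use Unicode canonical equivalence.
print(precomposed == decomposed)    // true
print(decomposed.last == "\u{E9}") // true
```

Whether grapheme clusters match what readers of scripts such as Hindi or Thai perceive as one unit is a separate question this sketch does not settle.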

On Aug 15, 2016, at 10:42 AM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

I can sympathize, but this is tricky.

Fundamentally, if it's going to be a learning and teaching issue, then this "easy" string should be the default. That is to say, if I write `var a = "Hello, world!"`, then `a` should be inferred to be of type InternalString or EasyString, whatever you want to call it.

But, we also want Swift to support Unicode by default, and we want that support to do things The Right Way(TM) by default. In other words, a user should not have to reach for a special type in order to handle arbitrary strings correctly, and I should be able to reassign `a = "你好"` and have things work as expected. So, we also can't have the "easy" string type be the default...

I can't think of a way to square that circle.
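To make the inference problem concrete, here is a minimal sketch of an ASCII-only type with integer subscripting, assuming Swift 5.x. `ASCIIString` and `init?(validating:)` are hypothetical names, not a proposed API. The type adopts `ExpressibleByStringLiteral`, so a literal can become one, but only with an explicit annotation: an unannotated literal is still inferred as `String`, which is exactly the default described above.

```swift
/// Hypothetical ASCII-only string type sketching the "InternalString" idea.
struct ASCIIString: ExpressibleByStringLiteral, CustomStringConvertible {
    private let bytes: [UInt8]

    /// Returns nil if `s` contains any non-ASCII scalar.
    init?(validating s: String) {
        guard s.utf8.allSatisfy({ $0 < 0x80 }) else { return nil }
        bytes = Array(s.utf8)
    }

    /// Used for annotated literals; traps on non-ASCII content.
    init(stringLiteral value: String) {
        guard let ascii = ASCIIString(validating: value) else {
            preconditionFailure("non-ASCII literal: \(value)")
        }
        self = ascii
    }

    var count: Int { bytes.count }

    /// O(1): one byte per ASCII scalar.
    subscript(i: Int) -> Character {
        Character(Unicode.Scalar(bytes[i]))
    }

    var description: String { String(decoding: bytes, as: UTF8.self) }
}

let key: ASCIIString = "status"                        // annotated: init(stringLiteral:)
let plain = "status"                                   // unannotated: inferred as String
let rejected = ASCIIString(validating: "\u{4F60}\u{597D}") // "你好": nil
```

The failable initializer carries a label because, since Swift 5 (SE-0213), an unlabeled `ASCIIString("…")` call with a literal argument is treated as literal coercion and would go through `init(stringLiteral:)`. Note also that "\r\n" is one `Character` in `String` but two elements here, so the two types would not agree on counts.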


(Zachary Waldowski) #16

It's 2016: "the thing people would most commonly expect" is
impossible-to-screw-up Unicode support that's also performant. Optimizing
developer experience for beginning developers is just going to lead to
software that screws up in situations the developer doesn't anticipate,
as F+¬lix notes above.

Zachary

···

On Wed, Aug 17, 2016, at 09:40 AM, Kenny Leung via swift-evolution wrote:

I understand that the most friendly approach may not be the most
efficient, but that’s not what I’m pushing for. I’m pushing for “does the
thing people would most commonly expect”. Take a first-time programmer
who reads any (human) language, and that is what they would expect.

Why couldn’t String’s internal storage format be glyph-based? If I were,
say, writing a text editor, it would certainly be the easiest and most
efficient format to work in.

-Kenny


(William Sumner) #17

Can you be more specific about the improvements you’d like to see? Based on an earlier message, you want to be able to use subscripting on strings to retrieve visual glyphs, but you can do this now via the .characters property, which presents a view of the string’s contents as a collection of extended grapheme clusters.

Preston
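In Swift 5.x, `String` is itself the collection of `Character`s (extended grapheme clusters) that `.characters` exposed in Swift 3. A minimal sketch of the usual alternative to integer offsets, assuming that newer API: find a position by content, then slice from it.

```swift
let name = "F\u{E9}lix Cloutier" // "Félix Cloutier"

// Locate a position by content instead of computing an integer offset.
let space = name.firstIndex(of: " ")!
let firstName = String(name[..<space])                   // "Félix"
let lastName = String(name[name.index(after: space)...]) // "Cloutier"

// Character-level helpers need no index arithmetic at all.
let initial = name.first!    // "F"
let prefix5 = name.prefix(5) // Substring "Félix"
```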

···

On Aug 17, 2016, at 10:40 AM, Kenny Leung via swift-evolution <swift-evolution@swift.org> wrote:

I understand that the most friendly approach may not be the most efficient, but that’s not what I’m pushing for. I’m pushing for “does the thing people would most commonly expect”. Take a first-time programmer who reads any (human) language, and that is what they would expect.

Why couldn’t String’s internal storage format be glyph-based? If I were, say, writing a text editor, it would certainly be the easiest and most efficient format to work in.

-Kenny


(Kenny Leung) #18

It seems to me that UTF-8 is the best choice for storing strings in English and English-like character sets, but it’s not clear that it is the most useful or performant internal representation for working with strings. In my opinion, conflating the preferred storage format with the best internal representation is a mistake; the internal representation should be chosen on its own criteria. Even as an experienced programmer, I assert that the most useful indexing system is glyph-based.

In Félix’s case, I would expect to have to ask for a mail-friendly representation of his name, just as you have to ask for a filesystem-friendly representation of a filename, regardless of the internal representation. Using UTF-8 as the internal format does not by itself guarantee universal support.

In response to this statement: “Optimizing developer experience for beginning developers is just going to lead to software that screws…”, the current system not only trips up beginning developers but also differs from pretty much every other programming language I have used.

-Kenny
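The storage-size side of this argument is easy to check. This sketch (Swift 5.x assumed; `encodedSizes` is a hypothetical helper) compares the bytes needed for the same strings in UTF-8 and UTF-16:

```swift
// Bytes needed to store a string in UTF-8 vs UTF-16.
func encodedSizes(_ s: String) -> (utf8: Int, utf16: Int) {
    (s.utf8.count, s.utf16.count * 2) // each UTF-16 code unit is 2 bytes
}

let samples = ["hello", "F\u{E9}lix", "\u{4F60}\u{597D}"] // "hello", "Félix", "你好"
for s in samples {
    let sizes = encodedSizes(s)
    print(s, "UTF-8:", sizes.utf8, "UTF-16:", sizes.utf16)
}
// hello UTF-8: 5 UTF-16: 10
// Félix UTF-8: 6 UTF-16: 10
// 你好 UTF-8: 6 UTF-16: 4
```

Neither encoding gives constant-time access by grapheme cluster, which is the separate indexing question raised here.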

···

On Aug 17, 2016, at 11:48 AM, Zach Waldowski via swift-evolution <swift-evolution@swift.org> wrote:

It's 2016: "the thing people would most commonly expect" is
impossible-to-screw-up Unicode support that's also performant. Optimizing
developer experience for beginning developers is just going to lead to
software that screws up in situations the developer doesn't anticipate,
as F+¬lix notes above.

Zachary


(Shawn Erickson) #19

I would also like to understand the perceived problem for first-time
programmers. To me, first-time programmers would be working with string
literals ("hello world"), string literals with values in them ("Hello
\(name)"), doing basic string concatenation, using the higher-level String
API to do and find things in a string, etc.

I think indexing into a string is actually a complex programming task and
usually is the last thing you want to be doing outside specific problem
domains.
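The tasks listed above need no index arithmetic. A sketch assuming Swift 5.x; Foundation is imported because substring `contains(_:)` came from Foundation on older Swift versions.

```swift
import Foundation // substring contains(_:) on older Swift versions

let name = "Kenny"
let greeting = "Hello, \(name)!"        // string interpolation
let farewell = "Goodbye, " + name + "." // concatenation

// Higher-level queries, no indices involved.
let mentionsKenny = greeting.contains("Kenny") // true
let isGreeting = greeting.hasPrefix("Hello")   // true
let shout = greeting.uppercased()              // "HELLO, KENNY!"
```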

···

On Wed, Aug 17, 2016 at 12:38 PM William Sumner via swift-evolution <swift-evolution@swift.org> wrote:

Can you be more specific about the improvements you’d like to see? Based
on an earlier message, you want to be able to use subscripting on strings
to retrieve visual glyphs, but you can do this now via the .characters
property, which presents a view of the string’s contents as a collection of
extended grapheme clusters.

Preston


(Kenny Leung) #20

William Sumner says:
Can you be more specific about the improvements you’d like to see? Based on an earlier message, you want to be able to use subscripting on strings to retrieve visual glyphs, but you can do this now via the .characters property, which presents a view of the string’s contents as a collection of extended grapheme clusters.

I did not know about .characters. I would say this addresses the glyph portion of my issues.

I still have a problem with not being able to use simple integers as subscripts or to form ranges. I wrote my own “split” function and found it extremely frustrating to have to work with String.Index values instead of integer subscripts and ranges. Compared to other languages, this almost obviates the usefulness of subscripts altogether. I understand that there are performance implications in translating integer subscripts into actual indexes in the string, but I guess this is a case where even generating another view on the string doesn’t do enough (internally) to make it worthwhile. Perhaps if it did… Again, this is very beginner-unfriendly. I will amend my definition of beginner to include not only people new to programming, but also people already experienced in languages other than Swift. Now that I think about it, NSString is as powerful as Swift.String when it comes to Unicode, and it still allows integer-based indexing.

Another issue I have is that a String itself is not subscriptable; you have to get a view of it first. I think String should have some default subscriptability that “does the right thing”, whatever that is decided to be.
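As suggested earlier in the thread, integer subscripting can be layered on as an extension. A minimal sketch assuming Swift 5.x; the `offset:` and `offsets:` labels are hypothetical, chosen so the O(n) walk from `startIndex` stays visible at the call site instead of looking like constant-time array access.

```swift
extension String {
    /// Hypothetical convenience: the Character at an integer offset.
    /// O(n): walks from startIndex on every call.
    subscript(offset offset: Int) -> Character {
        self[index(startIndex, offsetBy: offset)]
    }

    /// Hypothetical convenience: a Substring for a half-open range of offsets.
    /// O(n) in the range's upper bound.
    subscript(offsets range: Range<Int>) -> Substring {
        let lower = index(startIndex, offsetBy: range.lowerBound)
        let upper = index(lower, offsetBy: range.count)
        return self[lower..<upper]
    }
}

let s = "na\u{EF}ve \u{1F1FA}\u{1F1F8} text" // "naïve 🇺🇸 text"
let c = s[offset: 2]         // "ï"
let flag = s[offset: 6]      // "🇺🇸"
let word = s[offsets: 0..<5] // "naïve"
```

Because each call is O(n), looping `s[offset: i]` over `0..<s.count` is O(n²); iterating the string directly avoids that.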

<heart-to-heart on>
Now that we’re getting to the heart of the problem (thanks for prompting me to think more deeply about it): Swift may be more frustrating to learn for experienced programmers coming from C, Objective-C, Java, Ruby, etc. You try to do the simplest thing, like indexing into a string, and then find out you can’t. You think to yourself, “I’ve been programming in Objective-C for 20 years. Why can’t I do this? Am I stupid? Is the Swift team purposely trying to make this hard for me?”

I’ve been reading swift-evolution for a long time now, and a reason often given for design decisions is “term of art”. I believe that integer-based subscriptability is a term of art that should be supported.
<heart-to-heart off>

···

On Aug 17, 2016, at 12:51 PM, Shawn Erickson <shawnce@gmail.com> wrote:

I would also like to understand the perceived problem for first-time programmers. To me, first-time programmers would be working with string literals ("hello world"), string literals with values in them ("Hello \(name)"), doing basic string concatenation, using the higher-level String API to do and find things in a string, etc.

I guess it’s a matter of opinion what features beginner programmers will dip their toes into, but I think string manipulation is not that far up the totem pole. Would you consider splitting a comma-separated string an advanced task?

Also, see my revised definition of beginner programmer above.

-Kenny
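For the comma-separated case mentioned above, the standard library's `split(separator:)` already avoids index bookkeeping. A sketch assuming Swift 5.x: `split` drops empty fields by default, so `omittingEmptySubsequences: false` is passed to keep them, and nothing here handles quoted fields the way a real CSV parser must.

```swift
let line = "apple,,banana,cherry"

// Default: empty subsequences are omitted.
let compact = line.split(separator: ",") // ["apple", "banana", "cherry"]

// Keep empty fields, as a CSV reader usually should.
let fields = line.split(separator: ",", omittingEmptySubsequences: false)
    .map(String.init) // ["apple", "", "banana", "cherry"]
```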