Strings in Swift 4

TedvG · February 10, 2017, 1:09am

Hello Shawn
Just google with any programming language name and “string manipulation”
and you have enough reading for a week or so :o)
TedvG

That truly doesn't answer the question. It's not, “why do people index
strings with integers when that's the only tool they are given for
decomposing strings?” It's, “what do you have to do with strings that's
hard in Swift *because* you can't index them with integers?”

Hi Dave,
OK, here are just a few examples:
Parsing and validating an ISBN code, a (freight) container ID, or perhaps an EAN-13?
Or any of the many typical combined article codes and product IDs that factories and shops use?

or:

E.g. processing legacy files from IBM mainframes:
extract fields from ancient data records read from very old sequential files,
say, a product data record like this from a file from 1978 you’d have to unpack and process:
123534-09EVXD4568,991234,89ABCYELLOW12AGRAINESYSTEMZ3453
into:
123, 534, -09, EVXD45, 68,99, 1234,89, ABC, YELLOW, 12A, GRAIN, ESYSTEM, Z3453.
The fields are: product category, pcs, discount code, product code, price in yen, price in $, class code, etc.
In Cobol and PL/1, records are nearly always defined with a fixed field layout like the one below.
(Storage was limited and very, very expensive; XML would have been regarded as a
"scandalous waste", and so would even the commas in CSV files!)

01 MAILING-RECORD.
       05 COMPANY-NAME PIC X(30).
       05 CONTACTS.
           10 PRESIDENT.
               15 LAST-NAME PIC X(15).
               15 FIRST-NAME PIC X(8).
           10 VP-MARKETING.
               15 LAST-NAME PIC X(15).
               15 FIRST-NAME PIC X(8).
           10 ALTERNATE-CONTACT.
               15 TITLE PIC X(10).
               15 LAST-NAME PIC X(15).
               15 FIRST-NAME PIC X(8).
       05 ADDRESS PIC X(15).
       05 CITY PIC X(15).
       05 STATE PIC XX.
       05 ZIP PIC 9(5).

All of these are character data fields except the numeric ZIP field, which Cobol can nevertheless treat as character data.
So here I am, having to get the data from these old Cobol production files
into a brand-new Swift-based accounting system of 2017. What can I do?

How do I unpack these records and bring the data into a Swift structure or class?
(In Cobol I don’t have to, because of the predefined fixed-format record layout.)

AFAIK there are no similar record structures with fixed fields like this available in Swift?

So, the only way I can think of right now is to do it like this:

// MailingRecord is a Swift structure
struct MailingRecord
{
    var companyName: String = "no Name"
    var contacts: CompanyContacts
    // ... etc.
}

// recordStr was read here with ASCII encoding

// unpack data into the structure’s properties; in this case all are Strings
mailingRecord.companyName = recordStr[0..<30]
mailingRecord.contacts.president.lastName = recordStr[30..<45]
mailingRecord.contacts.president.firstName = recordStr[45..<53]

// and so on..

Ever worked for, e.g., a bank with thousands of these files, their formats unchanged for years?

Are there any alternative, more convenient and simpler methods available in Swift?

Kind Regards
TedvG
( example of the above Cobol record borrowed from here:
http://www.3480-3590-data-conversion.com/article-reading-cobol-layouts-1.html )
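
For readers who want to try the fixed-width unpacking described above in current Swift, here is a minimal sketch that walks the record from left to right with `prefix` and `dropFirst` instead of integer subscripts. The names `MailingRecordHead`, `takeField` and `parseHead` are hypothetical and not from the thread, only the first three fields of the layout are covered, and the sketch assumes an ASCII record in which every column is exactly one `Character`.

```swift
// Hypothetical, shortened version of the record: only the first three fields.
struct MailingRecordHead {
    var companyName: String
    var presidentLastName: String
    var presidentFirstName: String
}

// Hypothetical helper: removes the next `width` characters from `rest`
// and returns them without their trailing space padding.
func takeField(_ rest: inout Substring, width: Int) -> String {
    var field = String(rest.prefix(width))
    rest = rest.dropFirst(width)
    while field.last == " " { field.removeLast() }
    return field
}

// Consumes the fields in layout order, so no integer index is ever needed.
func parseHead(_ recordStr: String) -> MailingRecordHead {
    var rest = Substring(recordStr)
    let company = takeField(&rest, width: 30)   // COMPANY-NAME PIC X(30)
    let last = takeField(&rest, width: 15)      // LAST-NAME    PIC X(15)
    let first = takeField(&rest, width: 8)      // FIRST-NAME   PIC X(8)
    return MailingRecordHead(companyName: company,
                             presidentLastName: last,
                             presidentFirstName: first)
}
```

Because each field is taken off the front of the remaining `Substring`, the whole record is traversed once; a record shorter than the layout simply yields shorter or empty fields rather than trapping.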

···

On 10 Feb 2017, at 00:11, Dave Abrahams <dabrahams@apple.com> wrote:
on Thu Feb 09 2017, "Ted F.A. van Gaalen" <tedvgiosdev-AT-gmail.com> wrote:

On 9 Feb 2017, at 16:48, Shawn Erickson <shawnce@gmail.com> wrote:

I also wonder what folks are actually doing that require indexing
into strings. I would love to see some real world examples of what
and why indexing into a string is needed. Who is the end consumer of
that string, etc.

Do folks have some examples?

-Shawn

On Thu, Feb 9, 2017 at 6:56 AM Ted F.A. van Gaalen via swift-evolution <swift-evolution@swift.org> wrote:
Hello Hooman
That invalidates my assumptions; thanks for evaluating them.
It's more complex than I thought.
Kind Regards
Ted

On 8 Feb 2017, at 00:07, Hooman Mehr <hooman@mac.com> wrote:

On Feb 7, 2017, at 12:19 PM, Ted F.A. van Gaalen via swift-evolution <swift-evolution@swift.org> wrote:

I now assume that:
     1. -= a “plain” Unicode character (codepoint?) can result in one glyph.=-

What do you mean by “plain”? Characters in some Unicode scripts are
by no means “plain”. They can affect (and be affected by) the
characters around them, they can cause glyphs around them to
rearrange or combine (like ligatures) or their visual
representation (glyph) may float in the same space as an adjacent
glyph (and seem to be part of the “host” glyph), etc. So, the
general relationship of a character and its corresponding glyph (if
there is one) is complex and depends on context and surrounding
characters.

     2. -= a grapheme cluster always results in just a single glyph, true? =-

False

     3. The only things that I can see on screen or in print are glyphs (“carvings”, visual elements that stand on their own)

The visible effect might not be a visual shape. It may be, for example, the way the surrounding shapes change or rearrange.

    4. In this context, a glyph is a humanly recognisable visual form of a character,

Not in a straightforward one to one fashion, not even in Latin / Roman script.

    5. On this level (the glyph, what I can see as a user) it is neither relevant nor detectable
        how many Unicode scalars (code points?) or graphemes the glyph was built from, or even
        which encoding it was based on.

False

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


--
-Dave

Shawn_Erickson · February 10, 2017, 2:50am

The end goal of this string is for human consumption, right? So such
manipulation would need to be Unicode-aware in the modern world? Or is
it for some other reason?

-Shawn

···

On Thu, Feb 9, 2017 at 3:45 PM Hooman Mehr <hooman@mac.com> wrote:

On Feb 9, 2017, at 3:11 PM, Dave Abrahams <dabrahams@apple.com> wrote:

on Thu Feb 09 2017, "Ted F.A. van Gaalen" <tedvgiosdev-AT-gmail.com> wrote:

Hello Shawn
Just google with any programming language name and “string manipulation”
and you have enough reading for a week or so :o)
TedvG

That truly doesn't answer the question. It's not, “why do people index
strings with integers when that's the only tool they are given for
decomposing strings?” It's, “what do you have to do with strings that's
hard in Swift *because* you can't index them with integers?”

I have done some string processing. I have not encountered any algorithm
where an integer index is absolutely needed, but sometimes it might be the
most convenient.

For example, there are valid reasons to keep side tables that hold indexes
into a string (such as maintaining attributes that apply to a substring, or
things like pre-computed positions of soft line breaks). This does not
require the index to be an *integer*, but keeping those indexes valid
after the string is mutated requires being able to offset them backward
or forward from some position on. These operations could be less verbose and
easier if the index happened to be an integer or (efficiently) supported the
+ and - operators. I know there are other ways to deal with such things,
and mutating a large string is generally a bad idea, but sometimes it is
the easiest and most convenient solution to the problem at hand.

Jon_Hull · February 11, 2017, 1:49am

This is the biggest need I have from strings (and collections) that is not being met, and is I think why people reach for integers. I have a stored index which points to something important, and if the string/collection is edited, I now have to update the index to be correct. Lots of chances to screw up (e.g. off by 1 errors) if I am not super careful.

I would much rather have that dealt with by the string/collection itself, so that I can think about my larger project instead of keeping everything in sync.

My preferred design for this would be to have two types of index. An internal index (what we have now) which is fast, efficient and transient, and a stable index which will always point to the same item despite having added or removed other items (or be testably invalid if the item pointed to has been removed). For strings, this means the stable index would point to the same characters even if the string has been edited (as long as those characters are still there).

I know the second isn’t useful for algorithms in the standard library, but it is sooooo useful for things like storing user selections… and it is very easy to foot-gun when trying to do it yourself. Keeping stored indexes in sync is among my top annoyances while programming.

An easier to implement, but slightly less useful approach, would be to have methods which take an array of indexes along with the proposed change, and then it adjusts the indexes (or replaces them with nil if they are invalid) as it makes the update. For example:

func append(_ element:Element, adjusting: [Index]) -> [Index?]
func appending(_ element:Element, adjusting: [Index]) -> (Self, [Index?])
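
To make the "adjusting" idea concrete, here is a minimal sketch for `Array`, whose indices are plain integers, using insert and remove rather than append. The two methods and their `adjusting:` label are hypothetical; nothing like them exists in the standard library, and a version for `String` would need more care because its indices are opaque.

```swift
extension Array {
    // Hypothetical API: inserts `element` at `position` and returns `indices`
    // shifted so that each one still refers to the element it referred to before.
    mutating func insert(_ element: Element, at position: Int, adjusting indices: [Int]) -> [Int] {
        insert(element, at: position)
        return indices.map { $0 >= position ? $0 + 1 : $0 }
    }

    // Hypothetical API: removes the element at `position`. An index that pointed
    // at the removed element becomes nil; later indices shift down by one.
    mutating func remove(at position: Int, adjusting indices: [Int]) -> [Int?] {
        remove(at: position)
        return indices.map { index -> Int? in
            if index == position { return nil }
            return index > position ? index - 1 : index
        }
    }
}
```

A caller would pass in its stored indices with every mutation and keep the returned array, which moves the off-by-one bookkeeping into one place.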

Thanks,
Jon


dabrahams · February 11, 2017, 5:30pm

Hello Shawn
Just google with any programming language name and “string manipulation”
and you have enough reading for a week or so :o)
TedvG

That truly doesn't answer the question. It's not, “why do people index
strings with integers when that's the only tool they are given for
decomposing strings?” It's, “what do you have to do with strings that's
hard in Swift *because* you can't index them with integers?”

I have done some string processing. I have not encountered any algorithm where an integer index is absolutely needed, but sometimes it might be the most convenient.

For example, there are valid reasons to keep side tables that hold indexes into a string. (such as maintaining attributes that apply to a substring or things like pre-computed positions of soft line breaks). It does not require the index to be integer, but maintaining validity of those indexes after the string is mutated requires being able to offset them back or forth from some position on. These operations could be less verbose and easier if the index happens to be integer or (efficiently) supports + - operators. Also, I know there are other methods to deal with such things and mutating a large string generally is a bad idea, but sometimes it is the easiest and most convenient solution to the problem at hand.

As noted in the manifesto, it will be trivial to translate string indices to/from integer code unit offsets, most likely via a property and an initializer.

String indices as proposed will not, however, support + and -.
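
As an illustration of what such a translation looks like in later Swift releases (Swift 5 and up, so not the API under discussion in this thread), `String.Index` offers `utf16Offset(in:)` and `init(utf16Offset:in:)`. The sample string below is an assumption made for the sketch.

```swift
let text = "naïve café"

// Find an index the usual way, then translate it to an integer
// UTF-16 code unit offset that can be stored in a side table.
let index = text.firstIndex(of: "c")!
let offset = index.utf16Offset(in: text)

// Translate the stored integer back into a String.Index.
let restored = String.Index(utf16Offset: offset, in: text)

// Indices still do not support `+`; offsetting goes through the string.
let later = text.index(index, offsetBy: 3)
```

The round trip gives back the original index, while moving by a number of characters remains an explicit call on the string, which is consistent with indices not supporting + and -.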

···

Sent from my moss-covered three-handled family gradunza


Shawn_Erickson · February 10, 2017, 2:56am

These look like examples of fixed data formats that could be parsed from a
byte buffer into strings, etc. There is likely little need to force them through a
higher-order string concept, at least not until they are unpacked from their compact
byte form.

-Shawn
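
A minimal sketch of that byte-buffer approach, assuming the record is ASCII (a real mainframe file may well be EBCDIC and would need transcoding first) and at least as long as the ranges used. The helper name `field` and the sample data are hypothetical. Integer ranges work here because `[UInt8]` is an ordinary random-access array.

```swift
// Hypothetical helper: decodes one fixed-width field from the raw record bytes,
// dropping the trailing space padding before a String is ever created.
func field(_ bytes: [UInt8], _ range: Range<Int>) -> String {
    var slice = bytes[range]
    while slice.last == UInt8(ascii: " ") { slice.removeLast() }
    return String(decoding: slice, as: UTF8.self)
}

// Sample record built for the sketch: 30 + 15 + 8 columns, space padded.
let sampleRecord = "ACME" + String(repeating: " ", count: 26)
    + "Smith" + String(repeating: " ", count: 10)
    + "Ann" + String(repeating: " ", count: 5)
let recordBytes = Array(sampleRecord.utf8)

let companyName = field(recordBytes, 0..<30)   // COMPANY-NAME PIC X(30)
let lastName = field(recordBytes, 30..<45)     // LAST-NAME    PIC X(15)
let firstName = field(recordBytes, 45..<53)    // FIRST-NAME   PIC X(8)
```

Note that slicing past the end of the array traps, so production code would check the record length before unpacking.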

hooman · February 10, 2017, 6:38pm

For an example of what I mean, see the source code of NS(Mutable)AttributedString <https://github.com/apple/swift-corelibs-foundation/blob/master/Foundation/NSAttributedString.swift> and note how most of the mutating methods of the Mutable variant are not implemented yet...

So a good example of where such indexing would be convenient is writing a Swift-native AttributedString backed by a native Swift String.


Jon_Hull · February 11, 2017, 2:04am

Ok, appending was a dumb example (It has been a long week). Imagine the same idea with insert/remove…

Thanks,
Jon


Brent_Royal-Gordon · February 11, 2017, 3:23am

This is a very interesting idea. A couple observations:

1. The problem of adjusting indices is not just a String one. It also applies to Array, among other things.

2. This logic could be encapsulated and reused in a separate type. For instance, imagine:

  var myStringProxy = IndexTracking(collection: myString, trackedIndices: [someIndex, otherIndex])
  myStringProxy.insert("foo", at: otherIndex)
  (someIndex, otherIndex) = (myStringProxy.trackedIndices[0], myStringProxy.trackedIndices[1])

Or, with a helper method:

  myString.withTracked(&someIndex) { myStringProxy in
    myStringProxy.insert("foo", at: otherIndex)
  }

3. An obstacle to doing this correctly is that a collection's index invalidation behavior is not expressed in the type system. If there were a protocol like:

protocol RangeReplaceableWithEarlierIndexesStableCollection: RangeReplaceableCollection {}

That would help us here.
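
One way the proxy from point 2 might look, specialised to `String` and to insertion only, is sketched below. `IndexTracking` is the hypothetical type named above, not an existing one. The sketch converts the tracked indices to character offsets before mutating and rebuilds them afterwards, which costs linear time per index, and it assumes the inserted text does not combine with its neighbours into a single grapheme cluster.

```swift
// Hypothetical sketch of the IndexTracking proxy, for String and insertion only.
struct IndexTracking {
    private(set) var string: String
    private(set) var trackedIndices: [String.Index]

    init(collection: String, trackedIndices: [String.Index]) {
        self.string = collection
        self.trackedIndices = trackedIndices
    }

    mutating func insert(_ new: String, at position: String.Index) {
        // Record everything as character offsets while the old indices are still valid.
        let start = string.startIndex
        let insertOffset = string.distance(from: start, to: position)
        let offsets = trackedIndices.map { string.distance(from: start, to: $0) }

        string.insert(contentsOf: new, at: position)

        // Indices at or after the insertion point move right by the inserted length.
        let added = new.count
        trackedIndices = offsets.map { offset in
            let shifted = offset >= insertOffset ? offset + added : offset
            return string.index(string.startIndex, offsetBy: shifted)
        }
    }
}
```

A removal method would follow the same pattern and would need an optional result for indices whose character was deleted.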

···

On Feb 10, 2017, at 5:49 PM, Jonathan Hull via swift-evolution <swift-evolution@swift.org> wrote:

An easier to implement, but slightly less useful approach, would be to have methods which take an array of indexes along with the proposed change, and then it adjusts the indexes (or replaces them with nil if they are invalid) as it makes the update. For example:

func append(_ element:Element, adjusting: [Index]) -> [Index?]
func appending(_ element:Element, adjusting: [Index]) -> (Self, [Index?])

--
Brent Royal-Gordon
Architechies

dabrahams · February 11, 2017, 5:33pm

All of these examples should be efficiently and expressively handled by the pattern matching API mentioned in the proposal. They definitely do not require random access or integer indexing.
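
The pattern matching API is not available yet, so I can't show it here. As a stopgap, though, a fixed-width record can already be taken apart with what String offers today, without integer subscripts, by repeatedly peeling a prefix off a Substring. The following is only a minimal sketch: `takeField` and `pad` are hypothetical helpers, the sample data is made up, and the widths are the first three fields of the MAILING-RECORD layout.

```swift
/// Hypothetical helper: removes the first `width` characters from `rest`
/// and returns them with trailing padding spaces stripped.
func takeField(_ rest: inout Substring, width: Int) -> String {
    var field = rest.prefix(width)
    rest = rest.dropFirst(width)
    while field.last == " " { field.removeLast() }
    return String(field)
}

/// Hypothetical helper: right-pads with spaces, used only to build sample data.
func pad(_ text: String, to width: Int) -> String {
    return text + String(repeating: " ", count: max(0, width - text.count))
}

// Made-up sample record: COMPANY-NAME X(30), LAST-NAME X(15), FIRST-NAME X(8).
let sampleRecord = pad("Acme Trading Co.", to: 30) + pad("Smith", to: 15) + pad("Jane", to: 8)

var rest = Substring(sampleRecord)
let companyName = takeField(&rest, width: 30)
let presidentLastName = takeField(&rest, width: 15)
let presidentFirstName = takeField(&rest, width: 8)

print(companyName, presidentLastName, presidentFirstName, separator: " | ")
// prints "Acme Trading Co. | Smith | Jane"
```

Each field is visited once, so unpacking the whole record is linear in its length, whereas an integer subscript on String has to walk from the start of the string on every access.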

···

Sent from my moss-covered three-handled family gradunza

TedvG · February 10, 2017, 5:20pm

Please see in-line response below

These look like examples of a fixed data format

Hi Shawn,
No, it could also be a UTF-8 String.

that could be parsed from a byte buffer into strings, etc.

How would you do that? Could you please provide an example of how to do this with a byte buffer?
E.g. read from a flat ASCII file -> unpack fields -> store in structure props?

Likely little need to force them via a higher order string concept,

What do you mean here by “higher order string concept”?
Swift is a high level language; I expect to do this with Strings directly,
instead of being forced to use low-level coding with byte arrays etc.
(I have/want no time for that)
Surely, one doesn’t have to resort to that in a high level language like Swift?
If I am certain that all characters in a file etc. are of fixed width, even in UTF-32
(in the above example I am 100% sure of that), then
using str[n1..<n2] is in that case legitimate, because there are no
multi-code-point graphemes involved.
Therefore, IMHO, direct String subscripting should be available in Swift
for all Unicode types, and the responsibility whether or not to use
this feature lies with the programmer, not the language designer.

at least not until unpacked from its compact byte form.

I am sorry, but to me, it all sounds a bit like:
“why solve the problem with a simple solution, when one can make it much
more complicated?” Be more pragmatic.

TedvG,


dabrahams · February 11, 2017, 5:38pm

One of the major (so far unstated) goals of the String rethink is to eliminate reasons for people to process textual data outside of String, though. You shouldn't have to use an array of bytes to get performance processing of ASCII, for example.
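
As an illustration of ASCII processing that stays inside String, here is a small sketch that validates an EAN-13 code (one of the examples raised earlier in the thread) by walking the string's `utf8` view. The function name `isValidEAN13` is made up for this example. It relies only on the standard EAN-13 rule: digits are weighted 1, 3, 1, 3, ... from the left, and the weighted sum including the check digit must be a multiple of 10.

```swift
/// Hypothetical helper: validates an EAN-13 code using only the UTF-8 view.
/// No integer subscripting and no separate byte array are involved.
func isValidEAN13(_ code: String) -> Bool {
    let bytes = code.utf8
    guard bytes.count == 13 else { return false }

    var weightedSum = 0
    for (position, byte) in bytes.enumerated() {
        // ASCII "0"..."9" are the byte values 48...57.
        guard byte >= 48 && byte <= 57 else { return false }
        let digit = Int(byte) - 48
        // Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
        weightedSum += (position % 2 == 0) ? digit : digit * 3
    }
    return weightedSum % 10 == 0
}

print(isValidEAN13("4006381333931"))  // true
print(isValidEAN13("4006381333932"))  // false: wrong check digit
print(isValidEAN13("40063813339A1"))  // false: not all digits
```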

···

Sent from my moss-covered three-handled family gradunza

On Feb 9, 2017, at 6:56 PM, Shawn Erickson <shawnce@gmail.com> wrote:

These look like examples of a fixed data format that could be parsed from a byte buffer into strings, etc. Likely little need to force them via a higher order string concept, at least not until unpacked from its compact byte form.

-Shawn

Ronald_Bell · February 11, 2017, 3:59pm

On that last topic, NSAttributedString has always seemed like a strange design — a class with a bunch of attribute methods with a property that lets you interrogate the String behind it all.

It always seemed to me that there was a false distinction. Strings should have optional properties, the way GameplayKit does GKEntities and GKComponents.

Simple Strings would return nil if you asked them for their attributes.

Attributed Strings would return attributes.

I think it would be a lot more intuitive how to parse an attributed string in blocks and then refer back to the attributes of each chunk, for one thing.

Is there a reason why composition was chosen to be the way it is in NSAttributedString, instead?

- Ron

···

On Feb 10, 2017, at 12:38 PM, Hooman Mehr via swift-evolution <swift-evolution@swift.org> wrote:

On Feb 9, 2017, at 6:50 PM, Shawn Erickson <shawnce@gmail.com> wrote:

On Thu, Feb 9, 2017 at 3:45 PM Hooman Mehr <hooman@mac.com> wrote:

On Feb 9, 2017, at 3:11 PM, Dave Abrahams <dabrahams@apple.com> wrote:

on Thu Feb 09 2017, "Ted F.A. van Gaalen" <tedvgiosdev-AT-gmail.com> wrote:

Hello Shawn
Just google with any programming language name and “string manipulation”
and you have enough reading for a week or so :o)
TedvG

That truly doesn't answer the question. It's not, “why do people index
strings with integers when that's the only tool they are given for
decomposing strings?” It's, “what do you have to do with strings that's
hard in Swift *because* you can't index them with integers?”

I have done some string processing. I have not encountered any algorithm where an integer index is absolutely needed, but sometimes it might be the most convenient.

For example, there are valid reasons to keep side tables that hold indexes into a string. (such as maintaining attributes that apply to a substring or things like pre-computed positions of soft line breaks). It does not require the index to be integer, but maintaining validity of those indexes after the string is mutated requires being able to offset them back or forth from some position on. These operations could be less verbose and easier if the index happens to be integer or (efficiently) supports + - operators. Also, I know there are other methods to deal with such things and mutating a large string generally is a bad idea, but sometimes it is the easiest and most convenient solution to the problem at hand.

The end goal of this string is for human consumption, right? So such manipulation would need to be Unicode aware in the modern world? ..or is it for some other reason?

-Shawn

For an example of what I mean, see the source code of NS(Mutable)AttributedString <https://github.com/apple/swift-corelibs-foundation/blob/master/Foundation/NSAttributedString.swift> and note how most of the mutating methods of Mutable variant are not implemented yet...

So, a good example of where such indexing would be convenient, could be writing a swift-native AttributedString backed by Swift native String.

Karl · February 11, 2017, 1:16pm

I mentioned this much earlier in the thread. My preferred solution would be some kind of RRC-like protocol where mutating methods return an associated “IndexDisplacement” type. That IndexDisplacement would store, for each operation, the offset and number of index-positions which have been inserted/removed, and know how to translate an index in the previous state into one in the new state.

You would still need to manually adjust your stored indexes using that IndexDisplacement, but it’d be less error-prone as the logic is written for you.

The standard (non-IndexDisplacement-returning) RRC methods could then be implemented as wrappers which discard the displacement.
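
To give an idea of the shape, here is a minimal sketch for Array, where indices are plain integers. None of these names exist in the standard library: the `IndexDisplacement` struct, its `translate` method and `replaceSubrangeReportingDisplacement` are assumptions for illustration only, and a real design would need an associated type per collection.

```swift
/// Hypothetical value describing one mutation: where it happened and how many
/// positions were removed and inserted there.
struct IndexDisplacement {
    let position: Int
    let removed: Int
    let inserted: Int

    /// Maps an index from the old state to the new state.
    /// Returns nil if the index pointed into the removed range.
    func translate(_ index: Int) -> Int? {
        if index < position { return index }
        if index < position + removed { return nil }
        return index - removed + inserted
    }
}

extension Array {
    /// Hypothetical variant of replaceSubrange that reports the displacement.
    /// Callers that do not care can simply ignore the result.
    @discardableResult
    mutating func replaceSubrangeReportingDisplacement<C: Collection>(
        _ range: Range<Int>, with newElements: C
    ) -> IndexDisplacement where C.Element == Element {
        let displacement = IndexDisplacement(position: range.lowerBound,
                                             removed: range.count,
                                             inserted: newElements.count)
        replaceSubrange(range, with: newElements)
        return displacement
    }
}

var values = [10, 20, 30, 40, 50]
let storedIndex = 3  // points at 40
let displacement = values.replaceSubrangeReportingDisplacement(1..<3, with: [7, 8, 9, 6])

if let adjusted = displacement.translate(storedIndex) {
    print(values[adjusted])  // 40
}
```

An index that pointed into the replaced range (1 or 2 here) translates to nil, which forces the caller to decide what to do with it rather than silently pointing at the wrong element.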

- Karl

TedvG · February 12, 2017, 6:17pm

Hi Dave,
then I am very interested to know how to unpack a String (e.g. read from a file record such as in the previous example:
123534-09EVXD4568,991234,89ABCYELLOW12AGRAINESYTEMZ3453 )
without using direct subscripting like str[n1..<n2]?
(which, btw, is for me the most straightforward and ideal method)
Conditions:
   - The source string contains fields of known position (offset) and length, concatenated together
     without any separators (unlike in a CSV).
   - The contents of each field are unpredictable,
     which excludes the use of pattern matching.
   - The source string needs to be unpacked into independent strings.
I made this example: (the comments also stress my point)
//: Playground - noun: a place outside the mean and harsh production environment
// No presidents were harmed during the production of this example.
import Foundation

// The following String extension with subscript "direct access"
// functionality, included in almost each and every app I create,
// wouldn't be necessary if str[a..<b] was an integral part of Swift strings!
//
// However, if str[a..<b] is ever implemented in Swift,
// then, by all means, make sure the documentation, notably the
// Swift language manual, states that string-element positions and counts
// do not necessarily correspond 1:1 with the positions and lengths
// on graphical presentation devices, e.g. displays and printers.
//
// Leave it to the programmer to decide whether or not to use
// direct subscripting like str[a..<b],
// as -in most cases- it only makes sense where fixed length characters
// are used, like in the example below.
//
// Like in any other programming language, an important focus should be
// to make things as intuitively simple as possible, as this:
// - reduces and prevents errors caused by indirect programming
// - also note that it might reduce the risk of the normally
// very friendly but mostly stressed guys of the maintenance
// department coming after you with dangerous intentions...
extension String
{
    // Note: each of these subscripts walks the string from startIndex,
    // so every access is O(n), not O(1).
    subscript(i: Int) -> String
    {
        guard i >= 0 && i < count else { return "" }
        return String(self[index(startIndex, offsetBy: i)])
    }

    subscript(range: Range<Int>) -> String
    {
        let lowerIndex = index(startIndex, offsetBy: max(0, range.lowerBound), limitedBy: endIndex) ?? endIndex
        let upperIndex = index(lowerIndex, offsetBy: range.upperBound - range.lowerBound, limitedBy: endIndex) ?? endIndex
        return String(self[lowerIndex..<upperIndex])
    }

    subscript(range: ClosedRange<Int>) -> String
    {
        let lowerIndex = index(startIndex, offsetBy: max(0, range.lowerBound), limitedBy: endIndex) ?? endIndex
        let upperIndex = index(lowerIndex, offsetBy: range.upperBound - range.lowerBound + 1, limitedBy: endIndex) ?? endIndex
        return String(self[lowerIndex..<upperIndex])
    }
}
// In the following example, the record's field positions and lengths are fixed format
// and will never change.
// Also, the record's contents has been validated completely by the sending application.

// Normally it is an input record, read from a storage medium,
// however for the purpose of this example it is defined here:

// Fields are space-padded to their fixed widths (name: 16, description: 30).
let record = "123A.534.CMCU3Arduino Due     Arm 32-bit Micro controller.  000000034100000005680000002250$"

// Define a product data structure:
struct Product
{
    var id :String // is key
    var group: String
    var name: String
    var description : String
    var inStock: Int
    var ordered : Int
    var price: Int // in cents: no Money data type in Swift available.
    var currency: String

    // of course one could use "set/get" properties here
    // which could validate the input to this structure.

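    var priceFormatted: String // computed property.
    {
        get
        {
            let whole = price / 100
            let cents = price % 100
            // Pad the cents to two digits, so that 2205 prints as 22.05 and not 22.5
            let paddedCents = cents < 10 ? "0\(cents)" : "\(cents)"
            return currency + " \(whole).\(paddedCents)"
        }
    }

    // TODO: disable other default initializers.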
    init(inputrecord: String)
    {
       id = inputrecord[ 0..<10]
       group = inputrecord[10..<14]
       name = inputrecord[14..<30]
       description = inputrecord[30..<60]
       inStock = Int(inputrecord[60..<70])!
       ordered = Int(inputrecord[70..<80])!
       price = Int(inputrecord[80..<90])!
       currency = inputrecord[90]
    }

    // Add necessary business and DB logic for products here.
}

func test()
{
    let product = Product(inputrecord: record)

    print("=== Product data for the item with ID: \(product.id) ====")
    print("ID : \(product.id)")
    print("group : \(product.group)")
    print("name : \(product.name)")
    print("description : \(product.description)")
    print("items in stock : \(product.inStock)")
    print("items ordered : \(product.ordered)")
    print("price per item : \(product.priceFormatted)")
    print("=========================================================")
}

test()

Which emitted the following output:

=== Product data for the item with ID: 123A.534.C ====
ID : 123A.534.C
group : MCU3
name : Arduino Due
description : Arm 32-bit Micro controller.
items in stock : 341
items ordered : 568
price per item : $ 22.50

====================================================

Isn’t that an elegant solution or what?
I might start a very lengthy discussion here about the threshold of where and how
to protect the average programmer (like me :o) from falling into language pitfalls
and to what extent these have an effect on working with a PL. One cannot make
a PL idiot-proof. Of course, I agree a lot of it makes sense, as does the “intelligence”
of the Swift compiler (sometimes it almost feels as if it sits next to me looking at
the screen and shaking its head from time to time). But hey, remember most of
us in our profession have a brain too.
(btw, if you know of a way to let Xcode respect in-between spaces when auto-formatting, please let me know, thanks)

@Ben Cohen:
Hi, you wrote:
"p.s. as someone who has worked in a bank with thousands of ancient file formats, no argument from me that COBOL rules :)"
Although most accounting software is still Cobol (mostly because it is too expensive
and risky to convert to newer technologies), I don’t think that Cobol rules, and new apps definitely should
not be written in Cobol. I wouldn’t be doing Swift if I thought otherwise.
If I were doing a Cobol project again, it would be with the same enjoyment as, say,
a 2017 mechanical engineer working on a steam locomotive of a tourist railroad,
which I would do with dedication as well. However, never use this comparison
at the hiring interview..:o)

Kind Regards
TedvG

Sent from my moss-covered three-handled family gradunza

On Feb 9, 2017, at 5:09 PM, Ted F.A. van Gaalen <tedvgiosdev@gmail.com <mailto:tedvgiosdev@gmail.com>> wrote:

On 10 Feb 2017, at 00:11, Dave Abrahams <dabrahams@apple.com <mailto:dabrahams@apple.com>> wrote:

on Thu Feb 09 2017, "Ted F.A. van Gaalen" <tedvgiosdev-AT-gmail.com <http://tedvgiosdev-at-gmail.com/>> wrote:

Hello Shawn
Just google with any programming language name and “string manipulation”
and you have enough reading for a week or so :o)
TedvG

That truly doesn't answer the question. It's not, “why do people index
strings with integers when that's the only tool they are given for
decomposing strings?” It's, “what do you have to do with strings that's
hard in Swift *because* you can't index them with integers?”

Hi Dave,
Ok. here are just a few examples:
Parsing and validating an ISBN code? or a (freight) container ID? or EAN13 perhaps?
of many of the typical combined article codes and product IDs that many factories and shops use?

or:

E.g. processing legacy files from IBM mainframes:
extract fields from ancient data records read from very old sequential files,
say, a product data record like this from a file from 1978 you’d have to unpack and process:
123534-09EVXD4568,991234,89ABCYELLOW12AGRAINESYTEMZ3453
into:
123, 534, -09, EVXD45, 68,99, 1234,99, ABC, YELLOW, 12A, GRAIN, ESYSTEM, Z3453.
product category, pcs, discount code, product code, price Yen, price $, class code, etc…
in Cobol and PL/1 records are nearly always defined with a fixed field layout like this.:
(storage was limited and very, very expensive, e.g. XML would be regarded as a
"scandalous waste" even the commas in CSV files! )

01 MAILING-RECORD.
       05 COMPANY-NAME PIC X(30).
       05 CONTACTS.
           10 PRESIDENT.
               15 LAST-NAME PIC X(15).
               15 FIRST-NAME PIC X(8).
           10 VP-MARKETING.
               15 LAST-NAME PIC X(15).
               15 FIRST-NAME PIC X(8).
           10 ALTERNATE-CONTACT.
               15 TITLE PIC X(10).
               15 LAST-NAME PIC X(15).
               15 FIRST-NAME PIC X(8).
       05 ADDRESS PIC X(15).
       05 CITY PIC X(15).
       05 STATE PIC XX.
       05 ZIP PIC 9(5).

These are all character data fields here, except for the numeric ZIP field , however in Cobol it can be treated like character data.
So here I am, having to get the data of these old Cobol production files
into a brand new Swift based accounting system of 2017, what can I do?

How do I unpack these records and being the data into a Swift structure or class?
(In Cobol I don’t have to because of the predefined fixed format record layout).

AFAIK there are no similar record structures with fixed fields like this available Swift?

So, the only way I can think of right now is to do it like this:

// mailingRecord is a Swift structure
struct MailingRecord
{
    var companyName: String = “no Name”
     var contacts: CompanyContacts
     .
     etc..
}

// recordStr was read here with ASCII encoding

// unpack data in to structure’s properties, in this case all are Strings
mailingRecord.companyName = recordStr[ 0..<30]
mailingRecord.contacts.president.lastName = recordStr[30..<45]
mailingRecord.contacts.president.firstName = recordStr[45..<53]

// and so on..

Ever worked for e.g. a bank with thousands of these files unchanged formats for years?

Any alternative, convenient en simpler methods in Swift present?

Kind Regards
TedvG
( example of the above Cobol record borrowed from here:
http://www.3480-3590-data-conversion.com/article-reading-cobol-layouts-1.html )

On 9 Feb 2017, at 16:48, Shawn Erickson <shawnce@gmail.com <mailto:shawnce@gmail.com>> wrote:

I also wonder what folks are actually doing that require indexing
into strings. I would love to see some real world examples of what
and why indexing into a string is needed. Who is the end consumer of
that string, etc.

Do folks have so examples?

-Shawn

On Thu, Feb 9, 2017 at 6:56 AM Ted F.A. van Gaalen via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org> <mailto:swift-evolution@swift.org <mailto:swift-evolution@swift.org>>> wrote:
Hello Hooman
That invalidates my assumptions, thanks for evaluating
it's more complex than I thought.
Kind Regards
Ted

On 8 Feb 2017, at 00:07, Hooman Mehr <hooman@mac.com <mailto:hooman@mac.com> <mailto:hooman@mac.com <mailto:hooman@mac.com>>> wrote:

On Feb 7, 2017, at 12:19 PM, Ted F.A. van Gaalen via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org> <mailto:swift-evolution@swift.org <mailto:swift-evolution@swift.org>>> wrote:

I now assume that:
     1. -= a “plain” Unicode character (codepoint?) can result in one glyph.=-

What do you mean by “plain”? Characters in some Unicode scripts are
by no means “plain”. They can affect (and be affected by) the
characters around them, they can cause glyphs around them to
rearrange or combine (like ligatures) or their visual
representation (glyph) may float in the same space as an adjacent
glyph (and seem to be part of the “host” glyph), etc. So, the
general relationship of a character and its corresponding glyph (if
there is one) is complex and depends on context and surrounding
characters.

     2. -= a grapheme cluster always results in just a single glyph, true? =-

False

     3. The only thing that I can see on screen or print are glyphs (“carvings”,visual elements that stand on their own )

The visible effect might not be a visual shape. It may be for example, the way the surrounding shapes change or re-arrange.

    4. In this context, a glyph is a humanly recognisable visual form of a character,

Not in a straightforward one to one fashion, not even in Latin / Roman script.

    5. On this level (the glyph, what I can see as a user) it is neither relevant nor detectable
        how many Unicode scalars (codepoints?) or graphemes, or even what kind
        of encoding, the glyph was based upon.

False
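
A small sketch related to points 1, 2 and 5, assuming a current Swift toolchain (Swift 5 or later). It only covers the text layers that Swift itself exposes (Characters, Unicode scalars); how a font and the rendering system turn these into glyphs is not visible at this level. The variable names are illustrative only.

```swift
// Two spellings of the same perceived character "é".
let precomposed = "\u{E9}"      // U+00E9 LATIN SMALL LETTER E WITH ACUTE
let decomposed = "e\u{301}"     // "e" followed by U+0301 COMBINING ACUTE ACCENT

// Each one is a single Character (extended grapheme cluster)...
print(precomposed.count, decomposed.count)

// ...but they are built from different numbers of Unicode scalars.
print(precomposed.unicodeScalars.count, decomposed.unicodeScalars.count)

// String comparison uses canonical equivalence, so the two compare equal.
print(precomposed == decomposed)
```

So the number of scalars behind one perceived character can be inspected from code, even when the two forms look identical on screen.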

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

--
-Dave

Ben_Cohen · February 10, 2017, 7:47pm

Hi Ted,

Here’s a sketch of one way to handle this kind of processing without requiring integer indexing. Hopefully not too buggy though I haven’t tested it extensively :).

Here I’m stashing the parsed values in a dictionary, but you could also write code to insert them into a proper data structure where the dictionary set is happening (or maybe stick with the dictionary build, and then use that dictionary to populate your data structure, along with some more data validation and error handling).

import Foundation
extension String: Collection { }

let fieldLengths: DictionaryLiteral = [
    "CompanyName":30,
    "PresidentLastName":15,
    "PresidentFirstName":8,
    "VPMarketingLastName":15,
    "VPMarketingFirstName":8,
    "AlternateContactTitle":10,
    "AlternateContactLastName":15,
    "AlternateContactFirstName":8,
    "Address":15,
    "City":15,
    "State":2,
    "Zip":5,
]

var data = "Premier Properties Murray Mitch Ricky Roma Office MgrWilliamson John 350 Fifth Av New York NY10118"
var keyedRecord: [String:String] = [:]

for (key,length) in fieldLengths {
    let field = data.prefix(length)

    guard field.count == length
    else { fatalError("Input too short while reading \(key)") }
    // or however you want to handle it

    keyedRecord[key] = field.trimmingCharacters(in: CharacterSet.whitespaces)

    data = data.dropFirst(length)
}
guard data.isEmpty
else { fatalError("Input too long") }

print(keyedRecord)

I think it’s worth noting how seductive it is, with the integer indexing, to perform unchecked indexing into the data: recordStr[ 0..<30] is great until you have to process a corrupt record. Working in terms of higher-level APIs encourages handling of the failure cases. As an added bonus, when you upgrade your system and now the incoming data turns out to be utf8, your system doesn’t crash when a bored intern inserts some emoji into the president’s name.
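
To make the point about failure handling concrete, here is a minimal sketch, assuming a current Swift toolchain (Swift 5 or later). The helper `field(in:start:length:)` is hypothetical and not part of the standard library; it wraps `index(_:offsetBy:limitedBy:)` so that a record that is too short yields nil instead of a crash.

```swift
/// Hypothetical helper: returns the `length` Characters starting at
/// Character offset `start`, or nil if the record is too short.
func field(in record: String, start: Int, length: Int) -> Substring? {
    guard start >= 0, length >= 0,
          let lower = record.index(record.startIndex, offsetBy: start,
                                   limitedBy: record.endIndex),
          let upper = record.index(lower, offsetBy: length,
                                   limitedBy: record.endIndex)
    else { return nil }
    return record[lower..<upper]
}

// A 10-character name field followed by a 2-character state field.
let good = "ACME" + String(repeating: " ", count: 6) + "NY"
let corrupt = "ACME"   // truncated record

if let state = field(in: good, start: 10, length: 2) {
    print("state:", state)
}
if field(in: corrupt, start: 10, length: 2) == nil {
    print("corrupt record: too short")
}
```

The offsets here count Characters, so each lookup walks the string from its start; for large files one would probably advance a single index through the record instead.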

There is still definitely room to make this easier/more discoverable for users:

- The “patterns” concept that is briefly touched on in the string manifesto would hopefully provide another way of expressing this, with patterns matching fixed numbers of characters.
- The need to walk over the field multiple times (first prefix, then count, then dropFirst) should be better-handled by some other scanning APIs mentioned in the manifesto e.g. if let field = data.dropPrefix(lengthPattern). Note that if the underlying String held only ASCII/Latin1, these should still be constant-time operations under the hood.
- Another approach is to provide generic operations on Collection that chunk collections into subsequences of given lengths and serve them up, possibly via a lazy view. This would have the advantage of not requiring mutable state in the loop.
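
The last bullet could be sketched roughly as follows, assuming a current Swift toolchain (Swift 5 or later). `chunks(ofLengths:)` is a hypothetical name, not an existing standard library API, and this version is eager rather than a lazy view. The lengths are assumed to be non-negative.

```swift
extension Collection {
    /// Hypothetical helper: splits the collection into consecutive
    /// subsequences of the given lengths. Returns nil if the collection
    /// is shorter or longer than the sum of the lengths.
    func chunks(ofLengths lengths: [Int]) -> [SubSequence]? {
        var result: [SubSequence] = []
        var start = startIndex
        for length in lengths {
            guard let end = index(start, offsetBy: length, limitedBy: endIndex)
            else { return nil }                    // input too short
            result.append(self[start..<end])
            start = end
        }
        return start == endIndex ? result : nil    // nil if input too long
    }
}

let record = "AB123XYZ"
if let parts = record.chunks(ofLengths: [2, 3, 3]) {
    for part in parts { print(part) }
}
```

The mutable state is confined to the helper, so the calling code only sees the resulting chunks or nil.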

But the above is what we can achieve with the tools we have today.

p.s. as someone who has worked in a bank with thousands of ancient file formats, no argument from me that COBOL rules :)


On Feb 10, 2017, at 9:20 AM, Ted F.A. van Gaalen via swift-evolution <swift-evolution@swift.org> wrote:

Please see in-line response below

On 10 Feb 2017, at 03:56, Shawn Erickson <shawnce@gmail.com> wrote:

These look like examples of fixed data formats

Hi Shawn,
No, it could also be a UTF-8 String.


that could be parsed from a byte buffer into strings, etc.

How would you do that? could you please provide an example how to do this, with a byte buffer?
eg. read from flat ascii file —> unpack fields —> store in structure props?
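
One way this could look, as a minimal sketch rather than a definitive answer, assuming a current Swift toolchain (Swift 5 or later). The `Contact` struct, the `textField` helper and the two-field layout are hypothetical and only stand in for the full record; the bytes are built in code instead of being read from a file.

```swift
// Hypothetical two-field layout for illustration; not the full MAILING-RECORD.
struct Contact {
    var lastName: String
    var firstName: String
}

/// Decodes `length` bytes starting at `offset` as UTF-8 (ASCII is a subset)
/// and strips trailing space padding. Returns nil if the buffer is too short.
func textField(_ bytes: [UInt8], offset: Int, length: Int) -> String? {
    guard offset >= 0, length >= 0, offset + length <= bytes.count
    else { return nil }
    let slice = bytes[offset..<(offset + length)]
    var text = String(decoding: slice, as: UTF8.self)
    while text.last == " " { text.removeLast() }
    return text
}

// Stand-in for bytes read from a flat file: 15 + 8 bytes, space padded.
let line = "Murray" + String(repeating: " ", count: 9)
         + "Mitch" + String(repeating: " ", count: 3)
let bytes = Array(line.utf8)

if let last = textField(bytes, offset: 0, length: 15),
   let first = textField(bytes, offset: 15, length: 8) {
    let contact = Contact(lastName: last, firstName: first)
    print(contact.lastName, contact.firstName)   // Murray Mitch
}
```

Slicing the byte array by integer offsets is constant time, and `String(decoding:as:)` replaces invalid UTF-8 with replacement characters instead of failing, so a stricter reader may want to validate the bytes first.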

Likely little need to force them via a higher order string concept,

What do you mean here with “high order string concept” ??
Swift is a high level language, I expect to do this with Strings directly,
instead of being forced to use low-level coding with byte arrays etc.
(I have/want no time for that)
Surely, one doesn’t have to resort to that in a high level language like Swift?
If I am certain that all characters in a file etc. are of fixed width, even in UTF-32
(in the above example I am 100% sure of that), then
using str[n1..<n2] is in that case legitimate, because there are no
grapheme characters involved.
Therefore IMHO direct String subscripting should be available in Swift
for all Unicode types, and the responsibility whether or not to use
this feature is with the programmer, not the language designer.

at least not until unpacked from its compact byte form.

I am sorry, but to me, it all sounds a bit like:
“why solve the problem with a simple solution, when one can make it much
more complicated?” Be more pragmatic.

TedvG,

-Shawn


dabrahams · February 11, 2017, 6:06pm

An easier to implement, but slightly less useful approach, would be to have methods which take an array of indexes along with the proposed change, and then it adjusts the indexes (or replaces them with nil if they are invalid) as it makes the update. For example:

func append(_ element:Element, adjusting: [Index]) -> [Index?]
func appending(_ element:Element, adjusting: [Index]) -> (Self, [Index?])

This is a very interesting idea. A couple observations:

1. The problem of adjusting indices is not just a String one. It also applies to Array, among other things.

You can think of this as a generalization of AttributedString

2. This logic could be encapsulated and reused in a separate type. For instance, imagine:

   let myStringProxy = IndexTracking(collection: myString, trackedIndices: [someIndex, otherIndex])
   myStringProxy.insert("foo", at: otherIndex)
   (someIndex, otherIndex) = (myStringProxy.trackedIndices[0], myStringProxy.trackedIndices[1])

Or, with a helper method:

   myString.withTracked(&someIndex) { myStringProxy in
       myStringProxy.insert("foo", at: otherIndex)
   }

You can't adjust indices in arbitrary RangeReplaceableCollections without penalizing the performance of all RangeReplaceableCollections. Also, to do it without introducing reference semantics you need to bundle the index storage with the collection or explicitly make the collection of indices to be updated available as input to the range replacement methods. Given the latter, you could build something like this to implement the former:

struct IndexTracked<C: RangeReplaceableCollection>

Also, you probably want this thing to adjust ranges rather than indices because otherwise you need to decide whether to adjust an index when there is an insertion at that position. Does it stick to the left or right element?

3. An obstacle to doing this correctly is that a collection's index invalidation behavior is not expressed in the type system.

I don't see why that's an issue.

If there were a protocol like:

protocol RangeReplaceableWithEarlierIndexesStableCollection: RangeReplaceableCollection {}

There's one interesting wrinkle on invalidation I discovered recently: there is an important class of indices that are not invalidated as positions when they precede the change, but may be invalidated for movement: those that store some cached information about following elements, such as transcoded Unicode code units.


Sent from my moss-covered three-handled family gradunza

On Feb 11, 2017, at 5:16 AM, Karl Wagner <razielim@gmail.com> wrote:

On 11 Feb 2017, at 04:23, Brent Royal-Gordon <brent@architechies.com> wrote:
On Feb 10, 2017, at 5:49 PM, Jonathan Hull via swift-evolution <swift-evolution@swift.org> wrote:

That would help us here.

--
Brent Royal-Gordon
Architechies

I mentioned this much earlier in the thread. My preferred solution would be some kind of RRC-like protocol where mutating methods returned an associated “IndexDisplacement” type. That IndexDisplacement would store, for each operation, the offset and number of index-positions which have been inserted/removed, and know how to translate an index in the previous state into one in the new state.

You would still need to manually adjust your stored indexes using that IndexDisplacement, but it’d be less error-prone as the logic is written for you.

The standard (non-IndexDisplacement-returning) RRC methods could then be implemented as wrappers which discard the displacement.
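
A rough sketch of how such a displacement value might look for Int-indexed collections like Array, assuming a current Swift toolchain (Swift 5 or later). Both `IndexDisplacement` as written here and `replaceSubrangeTracking` are hypothetical names for illustration; no such protocol or type exists in the standard library.

```swift
/// Hypothetical type sketching the "IndexDisplacement" idea for Array.
struct IndexDisplacement {
    let position: Int   // where the edit started
    let removed: Int    // number of elements removed at `position`
    let inserted: Int   // number of elements inserted at `position`

    /// Maps an index valid before the edit to one valid after it,
    /// or nil if the element it referred to was removed.
    func translate(_ index: Int) -> Int? {
        if index < position { return index }
        if index < position + removed { return nil }
        return index - removed + inserted
    }
}

extension Array {
    /// Like replaceSubrange(_:with:), but reports how indices moved.
    @discardableResult
    mutating func replaceSubrangeTracking(_ range: Range<Int>,
                                          with newElements: [Element]) -> IndexDisplacement {
        replaceSubrange(range, with: newElements)
        return IndexDisplacement(position: range.lowerBound,
                                 removed: range.count,
                                 inserted: newElements.count)
    }
}

var letters = ["a", "b", "c", "d"]
let saved = 3                                   // refers to "d"
let shift = letters.replaceSubrangeTracking(1..<2, with: ["x", "y", "z"])
if let moved = shift.translate(saved) {
    print(letters[moved])                       // d
}
```

In this sketch an index that sits exactly at a pure insertion point moves along with the element to its right; that choice is arbitrary and a real design would have to settle it.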

- Karl

TedvG · February 20, 2017, 8:59pm

Hi Ben, Dave (you should not read this now, you’re on vacation :o) & Others

As described in the Swift Standard Library API Reference:

The Character type represents a character made up of one or more Unicode scalar values,
grouped by a Unicode boundary algorithm. Generally, a Character instance matches what
the reader of a string will perceive as a single character. The number of visible characters is
generally the most natural way to count the length of a string.
The smallest discrete unit we (app programmers) are mostly working with is this
perceived visible character, what else?

If that is the case, my reasoning is that Strings (could / should?) be relatively simple,
because most, if not all, complexity of Unicode is confined within the Character object and
completely hidden** from the average application programmer, who normally only needs
to work with Strings which contain these visible Characters, right?
It then makes no difference at all what is in the Character (excellent implementation btw)
(Unicode, ASCII, EBCDIC, Elvish, KlingonIV, IntergalacticV.2, whatever),
because we rely in sublime oblivion for the visual representation of whatever is in
the Character on miraculous font processors hidden in the dark depths of the OS.

Then, in this perspective, my question is: why is String not implemented as
directly based upon an array [Character]? In that case one can refer to the Characters of the
String directly, not only for direct subscripting but also for other String functionality, in an efficient way.
(I do have a scope of independent Swift here, that is, interaction with libraries should be
solved by the compiler, so as not to be restricted by legacy ObjC etc.)

** (except if one needs to, e.g., access individual elements and/or compose graphics directly?
but for this purpose the Character’s properties are accessible)

For the sake of convenience, based upon the above reasoning, I now “emulate” this in
a string extension, thereby ignoring the rare cases that a visible character could be based
upon more than a single Character (extended grapheme cluster). If that were to occur,
they should be merged into one extended grapheme cluster, a single Character that is.

//: Playground - implement direct subscripting using a Character array
// of course, when the String is defined as an array of Characters, directly
// accessible it would be more efficient as in these extension functions.
extension String
{
    var count: Int
        {
        get
        {
            return self.characters.count
        }
    }

    subscript (n: Int) -> String
    {
        return String(Array(self.characters)[n])
    }

    subscript (r: Range<Int>) -> String
    {
        return String(Array(self.characters)[r])
    }

    subscript (r: ClosedRange<Int>) -> String
    {
        return String(Array(self.characters)[r])
    }
}

func test()
{
    let zoo = "Koala 🐨, Snail 🐌, Penguin 🐧, Dromedary 🐪"
    print("zoo has \(zoo.count) characters (discrete extended graphemes):")
    for i in 0..<zoo.count
    {
        print(i,zoo[i],separator: "=", terminator:" ")
    }
    print("\n")
    print(zoo[0..<7])
    print(zoo[9..<16])
    print(zoo[18...26])
    print(zoo[29...39])
    print("images:" + zoo[6] + zoo[15] + zoo[26] + zoo[39])
}

test()

this works as intended and generates the following output:

zoo has 40 characters (discrete extended graphemes):
0=K 1=o 2=a 3=l 4=a 5= 6=🐨 7=, 8= 9=S 10=n 11=a 12=i 13=l 14= 15=🐌 16=, 17=
18=P 19=e 20=n 21=g 22=u 23=i 24=n 25= 26=🐧 27=, 28= 29=D 30=r 31=o 32=m
33=e 34=d 35=a 36=r 37=y 38= 39=🐪

Koala 🐨
Snail 🐌
Penguin 🐧
Dromedary 🐪
images:🐨🐌🐧🐪

I don’t know how (in)efficient this method is,
but in many cases this is not as important as, e.g., with numerical computation.
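
On the efficiency question: each subscript in the extension above builds a fresh Array of all the string's Characters, so a single access costs time proportional to the length of the string, and the loop over every index is quadratic. A minimal sketch of one alternative, assuming a current Swift toolchain (Swift 5 or later), is to convert once and then index the array; the name `chars` is illustrative.

```swift
let zoo = "Koala 🐨, Snail 🐌, Penguin 🐧, Dromedary 🐪"

// Convert once: linear time, plus memory for one Character per element.
let chars = Array(zoo)

// Each access below is now a constant-time array operation.
print(chars.count)
print(String(chars[0..<7]))
print(String(chars[29...39]))
print(String(chars[6]) + String(chars[15]) + String(chars[26]) + String(chars[39]))
```

This trades memory for speed, and the array has to be rebuilt whenever the underlying string changes.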

I still fail to understand why direct subscripting strings would be unnecessary,
and would like to see this built-in in Swift asap.

Btw, I do share the concern as expressed by Rien regarding the increasing complexity of the language.

Kind Regards,

TedvG

Ben_Cohen · February 21, 2017, 12:31am

Hi Ted,

While Character is the Element type for String, it would be unsuitable for a String’s implementation to actually use Character for storage. Character is fairly large (currently 9 bytes), very little of which is used for most values. For unusual graphemes that require more storage, it allocates more memory on the heap. By contrast, String’s actual storage is a buffer of 1- or 2-byte elements, and all graphemes (what we expose as Characters) are held in that contiguous memory no matter how many code points they comprise. When you iterate over the string, the graphemes are unpacked into a Character on the fly. This gives you a user interface of a collection that superficially appears to resemble [Character], but this does not mean that this would be a workable implementation.
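
The difference between the Character view and the underlying code units can be observed from code. The following is a small sketch, assuming a current Swift toolchain (Swift 5 or later); the sample string is made up for illustration, and the printed size of Character is an implementation detail that differs between Swift versions.

```swift
// "e" plus a combining accent, then a flag made of two regional indicators.
let s = "Cafe\u{301} \u{1F1F3}\u{1F1F1}"

print(s.count)                  // Characters (extended grapheme clusters)
print(s.unicodeScalars.count)   // Unicode scalars
print(s.utf16.count)            // UTF-16 code units
print(s.utf8.count)             // UTF-8 code units

// In-memory size of one Character value on this toolchain.
print(MemoryLayout<Character>.size)
```

The same six Characters correspond to eight scalars and even more code units, which is why a Character has no fixed size in the string's buffer.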


TedvG · February 22, 2017, 3:56pm

Hi Ben,
thank you, yes, I know all that by now.

I have seen that one goes to great lengths to optimise, not only for storage but also for speed. But how far does this need to go? In any case, optimisation should not be used
as an argument for restricting a programming language's functionality, that is, for refraining from language elements which are common and useful.

I wouldn’t worry so much over storage (unless one wants to load a complete book into memory…). In iOS, the average app is about 15-50 MB; String data is mostly a fraction of that. In macOS or similar I’d think it is even less significant…

I wonder how much performance and memory consumption would differ from the current contiguous memory implementation if a String were just a plain row of (references to) Character (extended grapheme cluster) objects, Array<Character>, which would simplify the basic logic and (sub)string handling significantly, because then one has direct access to the String’s elements, using the reasonably fast access methods of a Swift Collection/Array.

I have experimented with an alternative String struct based upon Array<Character>, seeing how easy it was to implement most popular string handling functions as one can work with the Character array directly.

Currently at deep-dive-depth in the standard lib sources, especially String & Co.

Kind Regards
TedvG


David_Sweeris · February 24, 2017, 11:25pm

It mutates because the String has to instantiate the Array<Character> into which you're indexing, if it doesn't already exist. It may not make any externally visible changes, but it's still a change.
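
A minimal sketch of that effect, assuming a current Swift toolchain (Swift 5 or later). `CachedString` is a hypothetical wrapper used only to illustrate the point; it is not how String is implemented.

```swift
/// Hypothetical wrapper that builds its Character array on first access.
struct CachedString {
    let text: String
    private var cache: [Character]? = nil   // filled lazily

    init(_ text: String) { self.text = text }

    /// Must be `mutating`: the first call stores the Character array in
    /// `cache`, even though nothing changes from the caller's point of view.
    mutating func character(at n: Int) -> Character {
        if cache == nil { cache = Array(text) }
        return cache![n]
    }
}

var a = CachedString("hello")
print(a.character(at: 1))   // e

let b = CachedString("hello")
// b.character(at: 1)       // does not compile: `b` is a constant,
//                          // and the accessor is mutating
```

With value semantics, a read that fills a cache is only available on `var` values, which is the cost being pointed out here.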

- Dave Sweeris


On Feb 24, 2017, at 13:41, Ted F.A. van Gaalen <tedvgiosdev@gmail.com> wrote:

Hi David & Dave

can you explain that in more detail?

Wouldn’t that turn simple character access into a mutating function?

Assigning, like s[11…14] = str, is of course mutating, yes.
Only then, that is, if the character array has thus been changed,
does it have to update the string in storage, yes.

But str = s[n..<m] doesn’t mutate.
So you’d have to keep a (private) isChanged: Bool or bit,
or a checksum over the character array?