Make offset index available for String

Hi,

I would like to discuss the String.Index problem in Swift. I know the current design of String.Index follows from the nature of the underlying data structure of the string.

But could we just make String.Index contain offset information? Or make an offset-index subscript available for accessing the characters in a String?

For example:
let a = "01234"
print(a[0]) // 0
print(a[0...4]) // 01234
print(a[...]) // 01234
print(a[..<2]) // 01
print(a[...2]) // 012
print(a[2...]) // 234
print(a[2...3]) // 23
print(a[2...2]) // 2
if let number = a.index(of: "1") {
    print(number) // 1
    print(a[number...]) // 1234
}

0 is equivalent to the Collection.Index returned by collection.index(startIndex, offsetBy: 0)
1 is equivalent to the Collection.Index returned by collection.index(startIndex, offsetBy: 1)
...
We keep String.Index, but allow another kind of index, called "offsetIndex", to reach the String.Index and the character in the string.
Any Collection could use the offset index to access its elements, regardless of its real index type.

I have made an OffsetIndexableCollection protocol available here, and made it more accessible for StringProtocol by covering all the index-related APIs:

https://github.com/frogcjn/OffsetIndexableCollection-String-Int-Indexable-

If someone wants to make the offset index/range available for any collection, they just need to extend the collection:
extension String : OffsetIndexableCollection {
}

extension Substring : OffsetIndexableCollection {
}
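
As a rough idea of what such a protocol could look like, here is a minimal sketch. It is not the code from the linked repository: the method name `index(atOffset:)` and the labelled subscripts `[offset:]` and `[offsets:]` are assumptions made for illustration, chosen so that they cannot clash with the Int subscripts that Array already has.

```swift
// Hypothetical sketch of an offset-indexable protocol; all names are illustrative.
protocol OffsetIndexableCollection: Collection {}

extension OffsetIndexableCollection {
    // Translate an integer offset into the collection's real index.
    // For String this walks from startIndex, so it costs O(n), not O(1).
    func index(atOffset offset: Int) -> Index {
        return index(startIndex, offsetBy: offset)
    }

    // Element at the given offset from the start.
    subscript(offset offset: Int) -> Element {
        return self[index(atOffset: offset)]
    }

    // Slice covering the given half-open range of offsets.
    subscript(offsets range: Range<Int>) -> SubSequence {
        return self[index(atOffset: range.lowerBound)..<index(atOffset: range.upperBound)]
    }
}

extension String: OffsetIndexableCollection {}
extension Substring: OffsetIndexableCollection {}

let a = "01234"
print(a[offset: 0])        // 0
print(a[offsets: 2..<4])   // 23
```

The labels make the linear cost visible at the call site, which an unlabelled `a[0]` would hide.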

I hope the Swift core team will consider bringing the offset index to String, or making it available to other collections, so that developers can decide whether their collection should use offset indices as an aid to the collection's real index.

Thanks!
Jiannan

We really don't want to make subscripting a non-O(1) operation. That just provides false convenience and encourages people to do the wrong thing with Strings anyway.
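
To make the cost concern concrete, here is a small sketch using only existing standard-library calls. The variable names are illustrative. Both loops rebuild the same string, but the first one re-walks the string from the start on every iteration.

```swift
let s = "h\u{E9}llo w\u{F6}rld"   // "héllo wörld"

// Offset-based access: every lookup walks from startIndex, so the loop is O(n^2).
var viaOffsets = ""
for i in 0..<s.count {
    viaOffsets.append(s[s.index(s.startIndex, offsetBy: i)])
}

// Index-based access: each step advances by one Character, so the loop is O(n).
var viaIndices = ""
for i in s.indices {
    viaIndices.append(s[i])
}

print(viaOffsets == viaIndices)   // true
```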

I'm always interested in why people want this kind of ability. Yes, it's nice for teaching programming to be able to split strings on character boundaries indexed by integers, but where does it come up in real life? The most common cases I see are trying to strip off the first or last character, or a known prefix or suffix, and I feel like we should have better answers for those than "use integer indexes" anyway.
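
For the common cases named above, the standard library already has index-free calls. The following is only a sketch of one way to write them; the sample string and variable names are made up for illustration.

```swift
let s = "Hello, world!"

// Strip the first or last character without any index arithmetic.
let withoutFirst = s.dropFirst()   // "ello, world!"
let withoutLast = s.dropLast()     // "Hello, world"

// Strip a known prefix or suffix, leaving the string unchanged if it is absent.
let prefix = "Hello, "
let rest = s.hasPrefix(prefix) ? s.dropFirst(prefix.count) : Substring(s)

let suffix = "!"
let trimmed = s.hasSuffix(suffix) ? s.dropLast(suffix.count) : Substring(s)

print(withoutFirst, withoutLast, rest, trimmed)
```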

Jordan

In reply to Cao, Jiannan via swift-dev <swift-dev@swift.org>, Dec 13, 2017, at 22:30.

I think there’s a middle ground here, as the current API is overly complex
and general. 95% of the time people are dealing with ASCII strings and
don’t care about things like Unicode cluster boundaries and whatnot.

In reply to Jordan Rose via swift-dev <swift-dev@swift.org>, Thu, Dec 14, 2017 at 12:13 PM.

95% ASCII? Even if you forget the rest of the world and consider only the US,
this doesn’t seem right.

But of course people in general don’t care about Unicode clusters.
Because of that, we should focus on keeping Unicode string manipulation easy,
instead of just forgetting the Unicode rules.
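
A short illustration of why cluster boundaries matter even for ordinary-looking text: the same string has a different length in each view. This is only a sketch with a made-up sample string.

```swift
// "café" written with a combining acute accent (U+0301) after the "e".
let cafe = "cafe\u{301}"

print(cafe.count)                  // 4 Characters (grapheme clusters)
print(cafe.unicodeScalars.count)   // 5 Unicode scalars
print(cafe.utf16.count)            // 5 UTF-16 code units
print(cafe.utf8.count)             // 6 UTF-8 bytes
```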

In reply to Kelvin Ma via swift-dev <swift-dev@swift.org>, Thu, Dec 14, 2017 at 17:28.

People are used to the offset index system rather than String.Index. Using offset indices to name and count elements is normal and natural.

The offset index system has a long history and a real meaning for collections. The subscript s[i] has the fixed meaning of "getting the i-th element in this collection", which is normal and direct. Getting a range with offset indices is also direct: it means the substring runs from the i-th character up to the j-th character of the original string.

People are used to working with subscripts and ranges in terms of offset indices. Writing string[string.index(i, offsetBy: 5)] is not as direct or easy as string[i + 5]. Likewise, Range<String.Index> is not as direct as Range<Offset>. Developers need to convert the Range<String.Index> result of string.range(of:) to a Range<OffsetIndex> to know the exact range of the substring. Range<String.Index> has a real meaning to the machine, as the location of the substring in the underlying data, but Range<OffsetIndex> carries direct location information for human beings and represents the abstract notion of location in a collection (this is the most UNIMPEACHABLE REASON I can provide).
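
The conversion described above can be sketched with today's API as follows. The sample string and the `offsetRange` variable are illustrative, and `range(of:)` on String requires Foundation.

```swift
import Foundation  // range(of:) on String comes from Foundation

let s = "hello world"
var offsetRange: Range<Int>? = nil

if let r = s.range(of: "world") {
    // Convert each bound from String.Index to an element offset (O(n) each).
    let lower = s.distance(from: s.startIndex, to: r.lowerBound)
    let upper = s.distance(from: s.startIndex, to: r.upperBound)
    offsetRange = lower..<upper
    print(lower..<upper)   // 6..<11
}
```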

The offset index system is based on the nature of collections. Each element of a collection can be located by its offset, which is a direct and simple concept for any collection. Right? Even String, with its String.Index, already has some offset-index properties. For example, the `count` of a String is the offset index of its endIndex. enumerated() generates a sequence whose elements carry the same offsets that the offset index system would provide. And when we apply Array(string), the string is divided into characters and offset indices become available on the new array.
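
These three observations can be checked directly. A minimal sketch with an illustrative sample string:

```swift
let s = "h\u{E9}llo"   // "héllo"

// count equals the offset of endIndex from startIndex.
let endOffset = s.distance(from: s.startIndex, to: s.endIndex)
print(endOffset == s.count)   // true

// enumerated() pairs each Character with its offset.
let pairs = s.enumerated().map { "\($0.offset):\($0.element)" }
print(pairs)

// Array(s) copies the characters once; after that, Int subscripts are O(1).
let chars = Array(s)
print(chars[1])               // é
```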

The offset index system is just an aid for collections, not a replacement for String.Index. We use String.Index to represent the underlying storage of the String. We could also use offset indices to represent the nature of the Collection and its elements. Providing offset indices as a second way to access elements is not only for the String struct but for all collections, since it is part of the nature of the collection concept, and developers could choose whether to use it.

We don't make String.Index O(1); we translate offset indices to the underlying String.Index. Each time a subscript is used with an offset index, we just need to translate between offset indices and underlying indices using c.index(c.startIndex, offsetBy: i) and c.distance(from: c.startIndex, to: i).
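
The two translations named above look like this in practice. This is only a sketch with an illustrative string; note that both directions walk the string, so each call is O(n).

```swift
let s = "a\u{1F600}b"   // "a😀b"

// Offset -> String.Index: walks forward from startIndex.
let i = s.index(s.startIndex, offsetBy: 2)
print(s[i])        // b

// String.Index -> offset: also walks the string.
let offset = s.distance(from: s.startIndex, to: i)
print(offset)      // 2
```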

We can make the offset indices available through an extension to Collection (as in my GitHub repo demo: https://github.com/frogcjn/OffsetIndexableCollection-String-Int-Indexable-).

Or we could do it at compile time.
For example:

  c[1...]
compiles to
  c[c.index(c.startIndex, offsetBy: 1)...]

  let index: Int = s.index(of: "a")
compiles to
  let index: Int = s.distance(from: s.startIndex, to: s.index(of: "a"))

  let index = 1 // if used in s only
  s[index..<index+2]
compiles to
  let index = s.index(s.startIndex, offsetBy: 1)
  s[index..<s.index(index, offsetBy: 2)]

  let index = 1 // if used in both s1 and s2
  s1[index..<index+2]
  s2[index..<index+2]
compiles to
  let index = 1
  let index1 = s1.index(s1.startIndex, offsetBy: index)
  let index2 = s2.index(s2.startIndex, offsetBy: index)
  s1[index1..<s1.index(index1, offsetBy: 2)]
  s2[index2..<s2.index(index2, offsetBy: 2)]

I really want the team to consider providing the offset index system as an aid to collections. It is a very necessary, basic concept of Collection.

Thanks!
Jiannan

In reply to Jordan Rose <jordan_rose@apple.com>, Dec 15, 2017, at 2:13 AM.

> People are used to the offset index system rather than String.Index. Using offset indices to name and count elements is normal and natural.

The offset system that you’re referring to is totally available in String today, if you’re willing for it to be the offset into the encoding. That’s the offset that the “people” you’re referring to are likely used to and consider normal and natural. On String.Index, there is the following:

init(encodedOffset offset: Int)

and

var encodedOffset: Int { get }

[1] String.Index | Apple Developer Documentation
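
A minimal sketch of offsets into the encoding. One caveat: `encodedOffset` as quoted above was later deprecated (in Swift 5) in favour of `utf16Offset(in:)` and `String.Index(utf16Offset:in:)`, so the sketch uses those newer spellings. For an all-ASCII string the encoding offset and the character offset coincide.

```swift
let s = "01234"

// Build an index directly from an offset into the UTF-16 encoding.
let i = String.Index(utf16Offset: 2, in: s)
print(s[i])                   // 2
print(s[i...])                // 234

// Read the encoding offset back out of an index.
print(i.utf16Offset(in: s))   // 2
```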

In reply to Cao, Jiannan via swift-dev <swift-dev@swift.org>, Dec 14, 2017, at 4:49 PM.

Perhaps I’m biased because most of the work I do is on strings that are
meant to be machine read and written, but if you ask me, user-facing Unicode
strings are the special case, and it seems the entire API is optimized for
them and not for the more common ASCII strings.
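
For machine-readable ASCII data, one existing option (not something proposed in this thread) is to work on the UTF-8 bytes, which an Array indexes by Int in O(1). A sketch with a made-up input line:

```swift
let line = "GET /index.html HTTP/1.1"

// Copy the UTF-8 bytes once; the resulting [UInt8] is integer-indexed.
let bytes = Array(line.utf8)

// Find the first space and decode everything before it.
let space = UInt8(ascii: " ")
let end = bytes.firstIndex(of: space) ?? bytes.endIndex
let method = String(decoding: bytes[..<end], as: UTF8.self)

print(method)     // GET
print(bytes[0])   // 71
```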

In reply to Wallacy <wallacyf@gmail.com>, Thu, Dec 14, 2017 at 3:40 PM.

That offset is the Unicode encoding offset, not the offset of the element.
For example, the index returned by index(startIndex, offsetBy: 1) may have an encodedOffset of 4 or 8 rather than 1, depending on how many code units the first character occupies.

The offset index is based on counting elements/indices; it gives the same result as s.index(s.startIndex, offsetBy: i).
The encodedOffset is the offset into the underlying Unicode encoding of the string, which is not the same concept as the offset index of a collection.

The offset index is meaningful in terms of the elements and indices of the collection (the i-th element of the collection); it is unrelated to the Unicode offset (which is an offset into the underlying data, meaningful for the UTF-16 String).

These two offsets are totally different.
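
The difference can be seen with a string whose first character spans several code units. This is only a sketch: the sample string is illustrative, and it uses `utf16Offset(in:)`, the later replacement for `encodedOffset`.

```swift
// "🇨🇦bc": the flag is one Character made of two scalars, four UTF-16 code units.
let s = "\u{1F1E8}\u{1F1E6}bc"

let second = s.index(s.startIndex, offsetBy: 1)
let elementOffset = s.distance(from: s.startIndex, to: second)
let utf16Offset = second.utf16Offset(in: s)

print(s[second])       // b
print(elementOffset)   // 1
print(utf16Offset)     // 4
```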

Best,
Jiannan

···

在 2017年12月15日,上午9:17,Michael Ilseman <milseman@apple.com <mailto:milseman@apple.com>> 写道:

On Dec 14, 2017, at 4:49 PM, Cao, Jiannan via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

People used to the offset index system instead of the String.Index. Using offset indices to name the elements, count the elements is normal and nature.

The offset system that you’re referring to is totally available in String today, if you’re willing for it to be the offset into the encoding. That’s the offset “people” you’re referring to are likely used to and consider normal and natural. On String.Index, there is the following:

init(encodedOffset offset: Int <https://developer.apple.com/documentation/swift/int&gt;\)

and

var encodedOffset: Int <https://developer.apple.com/documentation/swift/int&gt; { get }

[1] String.Index | Apple Developer Documentation

This offset index system has a long history and a real meaning to the collection. The subscript s[i] has a fix meaning of "getting the i-th element in this collection", which is normal and direct. Get the range with offset indices, is also direct. It means the substring is from the i-th character up to the j-th character of the original string.

People used to play subscript, range with offset indices. Use string[string.index(i, offsetBy: 5)] is not as directly and easily as string[i + 5]. Also the Range<String.Index> is not as directly as Range<Offset>. Developers need to transfer the Range<String.Index> result of string.range(of:) to Range<OffsetIndex> to know the exact range of the substring. Range<String.Index> has a real meaning to the machine and underlaying data location for the substring, but Range<OffsetIndex> also has a direct location information for human being, and represents the abstract location concept of the collection (This is the most UNIMPEACHABLE REASON I could provide).

Offset index system is based on the nature of collection. Each element of the collection could be located by offset, which is a direct and simple conception to any collection. Right? Even the String with String.Index has some offset index property within it. For example: the `count` of the String, is the offset index of the endIndex.The enumerated() generated a sequence with elements contains the same offset as the offset index system provided. And when we apply Array(string), the string divided by each character and make the offset indices available for the new array.

The offset index system is just an assistant for collection, not a replacement to String.Index. We use String.Index to represent the normal underlaying of the String. We also could use offset indices to represent the nature of the Collection with its elements. Providing the offset index as a second choice to access elements in collections, is not only for the String struct, is for all collections, since it is the nature of the collection concept, and developer could choose use it or not.

We don't make the String.Index O(1), but translate the offset indices to the underlaying String.Index. Each time using subscript with offset index, we just need to translate offset indices to underlaying indices using c.index(startIndex, offsetBy:i), c.distance(from: startIndex, to:i)

We can make offset indices available through an extension to Collection (as in my GitHub repo demo: https://github.com/frogcjn/OffsetIndexableCollection-String-Int-Indexable-).

Or we could do it at compile time.
For example:

  c[1...]
compiles to
  c[c.index(c.startIndex, offsetBy: 1)...]

  let index: Int? = s.index(of: "a")
compiles to
  let index: Int? = s.index(of: "a").map { s.distance(from: s.startIndex, to: $0) }

  let index = 1 // if used in s only
  s[index..<index+2]
compiles to
  let index = s.index(s.startIndex, offsetBy: 1)
  s[index..<s.index(index, offsetBy: 2)]

  let index = 1 // if used in both s1 and s2
  s1[index..<index+2]
  s2[index..<index+2]
compiles to
  let index = 1
  let index1 = s1.index(s1.startIndex, offsetBy: index)
  let index2 = s2.index(s2.startIndex, offsetBy: index)
  s1[index1..<s1.index(index1, offsetBy: 2)]
  s2[index2..<s2.index(index2, offsetBy: 2)]

I would really like the team to consider providing the offset index system as an assistant to collections. It is a necessary and basic concept of Collection.

Thanks!
Jiannan

_______________________________________________
swift-dev mailing list
swift-dev@swift.org <mailto:swift-dev@swift.org>
https://lists.swift.org/mailman/listinfo/swift-dev


Yes, I was trying to highlight that they are different and should be treated differently. This was because it seemed you were conflating the two in your argument. You claim that people expect it, and I’m pointing out that what people actually expect (assuming they’re coming from C or languages with a similar model) already exists as those models deal in encoded offsets.

More important than expectations surrounding what to provide to a subscript are expectations surrounding algorithmic complexity. This has security implications. The expectation of subscript is that it is “constant-ish”, for a fuzzy hand-wavy definition of “constant-ish” which includes amortized constant or logarithmic.

Now, I agree with the overall sentiment that `index(offsetBy:)` is unwieldy. I am interested in approaches to improve this. But, we cannot throw linear complexity into subscript without extreme justification.
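To make the complexity concern concrete, here is a minimal sketch. The function names are made up for illustration and nothing here is a proposed API. The first loop looks like ordinary integer indexing, but each access walks the string from its start, so the loop as a whole does a quadratic number of index steps. The second produces the same answer in a single pass.

```swift
let vowels: Set<Character> = ["a", "e", "i", "o", "u"]

// Hypothetical offset-based access: every call walks from startIndex, O(i).
func character(in s: String, atOffset i: Int) -> Character {
    return s[s.index(s.startIndex, offsetBy: i)]
}

// Reads like a constant-time indexed loop, but performs O(n^2) index steps
// in total because each iteration re-walks the string from the beginning.
func countVowelsByOffset(_ s: String) -> Int {
    var total = 0
    for i in 0..<s.count {
        if vowels.contains(character(in: s, atOffset: i)) { total += 1 }
    }
    return total
}

// Single pass over the characters: O(n).
func countVowelsByIteration(_ s: String) -> Int {
    var total = 0
    for ch in s where vowels.contains(ch) { total += 1 }
    return total
}

let sample = "offset index"
let byOffset = countVowelsByOffset(sample)
let byIteration = countVowelsByIteration(sample)
```

If the offset access were spelled as a plain subscript, nothing at the call site would hint at the difference between the two loops.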

···

On Dec 14, 2017, at 5:25 PM, Cao, Jiannan <frogcjn@163.com> wrote:

That offset is the Unicode (encoded) offset, not the offset of an element.
For example, the index returned by index(startIndex, offsetBy: 1) can have an encodedOffset of 4 or 8, not 1.

Offset indexing is based on counting elements/indices; the offset i designates the same position as s.index(s.startIndex, offsetBy: i).
The encodedOffset is the underlying offset into the Unicode string, which is not the same concept as the offset index of a collection.

Offset indexing refers to the elements and indices of the collection (the i-th element of the collection); it is unrelated to the Unicode offset (the offset into the underlying data, which is meaningful for the UTF-16 string).

These two kinds of offset are completely different.

Best,
Jiannan

On Dec 15, 2017, at 9:17 AM, Michael Ilseman <milseman@apple.com> wrote:

On Dec 14, 2017, at 4:49 PM, Cao, Jiannan via swift-dev <swift-dev@swift.org> wrote:

People are used to the offset index system rather than String.Index. Using offset indices to name and count elements is normal and natural.

The offset system that you’re referring to is totally available in String today, if you’re willing for it to be the offset into the encoding. That’s the offset “people” you’re referring to are likely used to and consider normal and natural. On String.Index, there is the following:

init(encodedOffset offset: Int)

and

var encodedOffset: Int { get }

[1] String.Index | Apple Developer Documentation

This offset index system has a long history and a real meaning for a collection. The subscript s[i] has the fixed meaning of "get the i-th element of this collection", which is normal and direct. Taking a range with offset indices is just as direct: it means the substring runs from the i-th character up to the j-th character of the original string.


For example:
caféz

The offset index of z is always 4, meaning the character at offset 4 in the string. You can always use s[s.index(s.startIndex, offsetBy: 4)] to access the z.
But the encodedOffset of z depends on how the string is encoded: counted in UTF-16 code units, it is 4 if é is a single precomposed scalar and 5 if é is an e followed by a combining accent. That is not the collection's notion of offset but the encoded-offset notion of UTF-16.
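A small sketch of the two kinds of offset for this example. It measures the encoded offset through the string's utf16 view, on the assumption that encodedOffset counts UTF-16 code units as the Swift 4 documentation describes. The helper name `offsets(ofZIn:)` is made up for illustration.

```swift
let precomposed = "caf\u{E9}z"     // é as the single scalar U+00E9
let decomposed = "cafe\u{301}z"    // e followed by U+0301 COMBINING ACUTE ACCENT

// Returns the character offset and the UTF-16 code unit offset of "z".
// Force-unwraps because both sample strings are known to contain "z".
func offsets(ofZIn s: String) -> (character: Int, utf16: Int) {
    let z = s.firstIndex(of: "z")!
    let characterOffset = s.distance(from: s.startIndex, to: z)
    let utf16Offset = s.utf16.distance(from: s.utf16.startIndex, to: z)
    return (characterOffset, utf16Offset)
}

let a = offsets(ofZIn: precomposed)   // character: 4, utf16: 4
let b = offsets(ofZIn: decomposed)    // character: 4, utf16: 5
```

The character offset is the same for both spellings, while the encoded offset changes with the way é is represented.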


Sorry that my term "offset indexable" was unclear.
What I mean by "offset indexable" is collection-element-level offset indexing. This kind of indexing could be provided for any collection, because it is a basic concept of collections.
The Unicode offset is a different thing, and it is important for String. So I just want the team to consider providing collections with collection-element-level offset indexing as an assistant to String.Index (which is Unicode-level offset indexing).

···

On Dec 15, 2017, at 9:34 AM, Michael Ilseman <milseman@apple.com> wrote:

Yes, I was trying to highlight that they are different and should be treated differently. This was because it seemed you were conflating the two in your argument. You claim that people expect it, and I’m pointing out that what people actually expect (assuming they’re coming from C or languages with a similar model) already exists as those models deal in encoded offsets.

More important than expectations surrounding what to provide to a subscript are expectations surrounding algorithmic complexity. This has security implications. The expectation of subscript is that it is “constant-ish”, for a fuzzy hand-wavy definition of “constant-ish” which includes amortized constant or logarithmic.

Now, I agree with the overall sentiment that `index(offsetBy:)` is unwieldy. I am interested in approaches to improve this. But, we cannot throw linear complexity into subscript without extreme justification.


Or we could copy the design of C++: the index keeps a reference to the sequence.
The index can then be offset with the + operator. Since it keeps a reference to the owning sequence, it asks the owner to offset the index.

let startIndex = s.startIndex
s[startIndex+1]

public struct MyIndex<T: Collection> : Comparable where T.Index == MyIndex {
    public let owner: T
    public static func + (lhs: MyIndex, rhs: T.IndexDistance) -> MyIndex {
        return lhs.owner.index(lhs, offsetBy: rhs)
    }
}
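A self-contained variation on the sketch above, for anyone who wants to try the idea. It differs from the sketch in two ways that are assumptions of this example: the type wraps one of the collection's existing indices instead of being the collection's own Index type, and the distance is a plain Int. The names `OwnedIndex`, `ownedStartIndex` and the `owned:` subscript label are hypothetical.

```swift
// Pairs a collection with one of its indices so that `+` can ask the
// owning collection to advance the index.
struct OwnedIndex<C: Collection>: Comparable {
    let owner: C
    let base: C.Index

    // Comparison only looks at the wrapped index; both sides are assumed
    // to come from the same owner.
    static func == (lhs: OwnedIndex, rhs: OwnedIndex) -> Bool {
        return lhs.base == rhs.base
    }

    static func < (lhs: OwnedIndex, rhs: OwnedIndex) -> Bool {
        return lhs.base < rhs.base
    }

    // Still O(n) for collections such as String that are not random-access.
    static func + (lhs: OwnedIndex, rhs: Int) -> OwnedIndex {
        return OwnedIndex(owner: lhs.owner,
                          base: lhs.owner.index(lhs.base, offsetBy: rhs))
    }
}

extension Collection {
    var ownedStartIndex: OwnedIndex<Self> {
        return OwnedIndex(owner: self, base: startIndex)
    }

    subscript(owned index: OwnedIndex<Self>) -> Element {
        return self[index.base]
    }
}

let word = "caf\u{E9}z"
let start = word.ownedStartIndex
let second = word[owned: start + 1]   // "a"
let last = word[owned: start + 4]     // "z"
```

The `+` makes the call site shorter, but each index now carries a copy of (or reference to) its collection, and the advance still costs linear time on String.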

···

在 2017年12月15日,上午9:34,Michael Ilseman <milseman@apple.com> 写道:

Yes, I was trying to highlight that they are different and should be treated different. This was because it seemed you were conflating the two in your argument. You claim that people expect it, and I’m pointing out that what people actually expect (assuming they’re coming from C or languages with a similar model) already exists as those models deal in encoded offsets.

More important than expectations surrounding what to provide to a subscript are expectations surrounding algorithmic complexity. This has security implications. The expectation of subscript is that it is “constant-ish”, for a fuzzy hand-wavy definition of “constant-ish” which includes amortized constant or logarithmic.

Now, I agree with the overall sentiment that `index(offsetBy:)` is unwieldy. I am interested in approaches to improve this. But, we cannot throw linear complexity into subscript without extreme justification.

On Dec 14, 2017, at 5:25 PM, Cao, Jiannan <frogcjn@163.com> wrote:

This offset is the Unicode (encoded) offset, not the offset of an element.
For example, index(startIndex, offsetBy: 1) can have an encodedOffset of 4 or 8, not 1.

An offset index counts elements/indices: offset i gives the same result as s.index(s.startIndex, offsetBy: i).
The encodedOffset is an offset into the underlying Unicode string, which is not the same concept as the offset index of a collection.

An offset index is meaningful in terms of the elements and indices of the collection (the i-th element of the collection). It is not related to the Unicode offset, which is an offset into the underlying data (the UTF-16 storage of the String).

These two offsets are totally different.
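
A small sketch of the difference between the two offsets. It assumes Swift 5 or later and uses utf16Offset(in:), the later counterpart of the encodedOffset property mentioned in this thread; the string is just an example chosen so that the first Character spans four UTF-16 code units.

```swift
// A flag emoji (two regional-indicator scalars) followed by "x": two Characters.
let s = "\u{1F1E8}\u{1F1F3}x"

// The index of the second Character.
let second = s.index(s.startIndex, offsetBy: 1)

// Element offset: how many Characters precede this index.
let elementOffset = s.distance(from: s.startIndex, to: second)

// Encoded offset: how many UTF-16 code units precede this index.
let codeUnitOffset = second.utf16Offset(in: s)

print(elementOffset, codeUnitOffset) // 1 4
```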

Best,
Jiannan

On Dec 15, 2017, at 9:17 AM, Michael Ilseman <milseman@apple.com> wrote:

On Dec 14, 2017, at 4:49 PM, Cao, Jiannan via swift-dev <swift-dev@swift.org> wrote:

People are used to the offset index system rather than String.Index. Using offset indices to name and count elements is normal and natural.

The offset system that you’re referring to is totally available in String today, if you’re willing for it to be the offset into the encoding. That’s the offset “people” you’re referring to are likely used to and consider normal and natural. On String.Index, there is the following:

init(encodedOffset offset: Int)

and

var encodedOffset: Int { get }

[1] String.Index | Apple Developer Documentation

This offset index system has a long history and a real meaning for collections. The subscript s[i] has the fixed meaning of "get the i-th element of this collection", which is normal and direct. Taking a range with offset indices is just as direct: it means the substring from the i-th character up to the j-th character of the original string.

People are used to working with subscripts and ranges based on offset indices. Writing string[string.index(i, offsetBy: 5)] is not as direct or easy as string[i + 5]. Likewise, Range<String.Index> is not as direct as Range<Offset>. Developers need to convert the Range<String.Index> returned by string.range(of:) to a Range<OffsetIndex> to know the exact range of the substring. Range<String.Index> has a real meaning for the machine, as the location of the substring in the underlying data, but Range<OffsetIndex> gives location information that is directly meaningful to a human being and represents the abstract notion of location within the collection (this is the most UNIMPEACHABLE REASON I can provide).

The offset index system is based on the nature of collections. Each element of a collection can be located by its offset, which is a direct and simple concept for any collection. Right? Even String, with its String.Index, already has some offset-index properties. For example, the `count` of a String is the offset index of its endIndex. enumerated() generates a sequence whose elements carry the same offsets that the offset index system would provide. And when we apply Array(string), the string is divided into its characters and offset indices become available on the new array.
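
The three observations in this paragraph can be checked with standard-library calls only. This is a minimal illustration with made-up variable names, not code from the repo mentioned in the thread.

```swift
let s = "hello"

// `count` equals the element offset of endIndex from startIndex.
let endOffset = s.distance(from: s.startIndex, to: s.endIndex)

// enumerated() pairs each Character with a zero-based offset.
let pairs = s.enumerated().map { (offset: $0.offset, character: $0.element) }

// Array(s) copies the Characters into Int-indexed storage with O(1) access.
let characters = Array(s)

print(endOffset == s.count)                 // true
print(pairs[1].offset, pairs[1].character)  // 1 e
print(characters[4])                        // o
```

Note that Array(s) pays an O(n) conversion up front, after which integer subscripting really is constant time.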

The offset index system is just an assistant for collections, not a replacement for String.Index. We use String.Index to represent the underlying layout of the String. We could also use offset indices to represent the collection in terms of its elements. Providing offset indices as a second way to access elements is not only for the String struct but for all collections, since it is inherent in the concept of a collection, and developers could choose whether to use it.

We are not trying to make offset access O(1); we simply translate offset indices to the underlying String.Index. Each time a subscript is used with an offset index, we just need to translate between offsets and underlying indices using c.index(c.startIndex, offsetBy: i) and c.distance(from: c.startIndex, to: i).

We can make offset indices available through an extension to Collection (as in my GitHub demo repo: https://github.com/frogcjn/OffsetIndexableCollection-String-Int-Indexable-).
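
For readers who want to see the shape of such an extension, here is a minimal sketch. It is not the OffsetIndexableCollection code from the linked repo; the argument labels `offset:` and `offsets:` are hypothetical and chosen so that the linear cost stays visible at the call site.

```swift
extension Collection {
    /// Hypothetical labeled subscript: the element `offset` positions after startIndex.
    /// O(n) unless the collection is a RandomAccessCollection.
    subscript(offset offset: Int) -> Element {
        return self[index(startIndex, offsetBy: offset)]
    }

    /// Hypothetical labeled subscript for a half-open range of offsets.
    subscript(offsets range: Range<Int>) -> SubSequence {
        let lower = index(startIndex, offsetBy: range.lowerBound)
        let upper = index(lower, offsetBy: range.count)
        return self[lower..<upper]
    }
}

let a = "01234"
print(a[offset: 0])      // 0
print(a[offsets: 2..<4]) // 23
```

Using labels rather than a bare a[0] is one possible compromise: the translation through index(_:offsetBy:) is the same, but the call no longer looks like a constant-time subscript.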

Or we could do it at compile time.
For example:

  c[1...]
compiles to
  c[c.index(c.startIndex, offsetBy: 1)...]

  let index: Int = s.index(of: "a")
compiles to
  let index: Int = s.distance(from: s.startIndex, to: s.index(of: "a"))

  let index = 1 // if used in s only
  s[index..<index+2]
compiles to
  let index = s.index(s.startIndex, offsetBy: 1)
  s[index..<s.index(index, offsetBy: 2)]

  let index = 1 // if used in both s1 and s2
  s1[index..<index+2]
  s2[index..<index+2]
compiles to
  let index = 1
  let index1 = s1.index(s1.startIndex, offsetBy: index)
  let index2 = s2.index(s2.startIndex, offsetBy: index)
  s1[index1..<s1.index(index1, offsetBy: 2)]
  s2[index2..<s2.index(index2, offsetBy: 2)]

I really want the team to consider providing the offset index system as an assistant for collections. It is a basic and necessary concept of Collection.

Thanks!
Jiannan

Or we could copy the design of std::vector::iterator in C++: the index could keep a reference to the collection.
When the index is offset with the + operator, it could call its owner to offset the index, since it keeps a reference to the owning collection.

let startIndex = s.startIndex
s[startIndex+1]

public struct MyIndex<T: Collection> : Comparable where T.Index == MyIndex {
    public let owner: T
...
    public static func + (lhs: MyIndex, rhs: T.IndexDistance) -> MyIndex {
        return lhs.owner.index(lhs, offsetBy: rhs)
    }
}

Or we could copy the design of std::vector::iterator in C++: the index could keep a reference to the collection.
When the index is offset with the + operator, it could call its owner to offset the index, since it keeps a reference to the owning collection.

let startIndex = s.startIndex
s[startIndex+1]

public struct MyIndex<T: Collection> : Comparable where T.Index == MyIndex {
    public let owner: T
...
    public static func + (lhs: MyIndex, rhs: T.IndexDistance) -> MyIndex {
        return lhs.owner.index(lhs, offsetBy: rhs)
    }
}
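
A runnable variation on this idea, offered only as a sketch: instead of requiring T.Index == MyIndex, it wraps the collection's existing index together with a copy of its owner. The names OwnedIndex, base, element, and ownedStartIndex are hypothetical and not part of the standard library or of the proposal above.

```swift
/// Hypothetical wrapper: an index paired with the collection it belongs to.
struct OwnedIndex<C: Collection>: Comparable {
    let owner: C
    let base: C.Index

    // Comparison looks only at the wrapped index.
    static func == (lhs: OwnedIndex, rhs: OwnedIndex) -> Bool {
        return lhs.base == rhs.base
    }

    static func < (lhs: OwnedIndex, rhs: OwnedIndex) -> Bool {
        return lhs.base < rhs.base
    }

    /// Asks the owner to advance the wrapped index. Still O(n) for String.
    static func + (lhs: OwnedIndex, rhs: Int) -> OwnedIndex {
        let advanced = lhs.owner.index(lhs.base, offsetBy: rhs)
        return OwnedIndex(owner: lhs.owner, base: advanced)
    }

    /// The element this index refers to.
    var element: C.Element {
        return owner[base]
    }
}

extension Collection {
    /// Hypothetical entry point: startIndex wrapped together with its owner.
    var ownedStartIndex: OwnedIndex<Self> {
        return OwnedIndex(owner: self, base: startIndex)
    }
}

let s = "hello"
let second = s.ownedStartIndex + 1
print(second.element) // e
```

The + operator reads nicely, but every index now carries its collection around, and advancing is exactly as expensive as calling index(_:offsetBy:) directly.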

An index that keeps a reference to the collection is an iterator, e.g. the IndexingIterator[1], which String already provides.

[1] IndexingIterator | Apple Developer Documentation
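
As a small illustration of this point, the sketch below walks a string with the iterator that makeIterator() returns. It relies only on makeIterator() and next(); the concrete iterator type that String vends is not named here, since it may differ between Swift versions.

```swift
let s = "hello"

// The iterator carries both the collection and a current position.
var iterator = s.makeIterator()

let first = iterator.next()   // advances past "h"
let second = iterator.next()  // advances past "e"

print(first == "h", second == "e") // true true

// Draining the rest shows the iterator remembers where it was.
var rest = ""
while let character = iterator.next() {
    rest.append(character)
}
print(rest) // llo
```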

···

On Dec 18, 2017, at 12:53 AM, Cao, Jiannan <frogcjn@163.com> wrote:

Or we could copy the design of std::vector::iterator in C++: the index could keep a reference to the collection.
When the index is offset with the + operator, it could call its owner to offset the index, since it keeps a reference to the owning collection.

let startIndex = s.startIndex
s[startIndex+1]

public struct MyIndex<T: Collection> : Comparable where T.Index == MyIndex {
    public let owner: T
...
    public static func + (lhs: MyIndex, rhs: T.IndexDistance) -> MyIndex {
        return lhs.owner.index(lhs, offsetBy: rhs)
    }
}

在 2017年12月15日,上午9:34,Michael Ilseman <milseman@apple.com <mailto:milseman@apple.com>> 写道:

Yes, I was trying to highlight that they are different and should be treated different. This was because it seemed you were conflating the two in your argument. You claim that people expect it, and I’m pointing out that what people actually expect (assuming they’re coming from C or languages with a similar model) already exists as those models deal in encoded offsets.

More important than expectations surrounding what to provide to a subscript are expectations surrounding algorithmic complexity. This has security implications. The expectation of subscript is that it is “constant-ish”, for a fuzzy hand-wavy definition of “constant-ish” which includes amortized constant or logarithmic.

Now, I agree with the overall sentiment that `index(offsetBy:)` is unwieldy. I am interested in approaches to improve this. But, we cannot throw linear complexity into subscript without extreme justification.

On Dec 14, 2017, at 5:25 PM, Cao, Jiannan <frogcjn@163.com <mailto:frogcjn@163.com>> wrote:

This offset is unicode offset, is not the offset of element.
For example: index(startIndex, offsetBy:1) is encodedOffset 4 or 8, not 1.

Offset indexable is based on the offset of count of each element/index. it is the same result of s.index(s.startIndex, offsetBy:i)
The encodedOffset is the underlaying offset of unicode string, not the same concept of the offset index of collection.

The offset indexable is meaning to the elements and index of collection (i-th element of the collection), not related to the unicode offset (which is the underlaying data offset meaning to the UTF-16 String).

These two offset is totally different.

Best,
Jiannan

在 2017年12月15日,上午9:17,Michael Ilseman <milseman@apple.com <mailto:milseman@apple.com>> 写道:

On Dec 14, 2017, at 4:49 PM, Cao, Jiannan via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

People used to the offset index system instead of the String.Index. Using offset indices to name the elements, count the elements is normal and nature.

The offset system that you’re referring to is totally available in String today, if you’re willing for it to be the offset into the encoding. That’s the offset “people” you’re referring to are likely used to and consider normal and natural. On String.Index, there is the following:

init(encodedOffset offset: Int <https://developer.apple.com/documentation/swift/int&gt;\)

and

var encodedOffset: Int <https://developer.apple.com/documentation/swift/int&gt; { get }

[1] String.Index | Apple Developer Documentation

This offset index system has a long history and a real meaning to the collection. The subscript s[i] has a fix meaning of "getting the i-th element in this collection", which is normal and direct. Get the range with offset indices, is also direct. It means the substring is from the i-th character up to the j-th character of the original string.

People used to play subscript, range with offset indices. Use string[string.index(i, offsetBy: 5)] is not as directly and easily as string[i + 5]. Also the Range<String.Index> is not as directly as Range<Offset>. Developers need to transfer the Range<String.Index> result of string.range(of:) to Range<OffsetIndex> to know the exact range of the substring. Range<String.Index> has a real meaning to the machine and underlaying data location for the substring, but Range<OffsetIndex> also has a direct location information for human being, and represents the abstract location concept of the collection (This is the most UNIMPEACHABLE REASON I could provide).

Offset index system is based on the nature of collection. Each element of the collection could be located by offset, which is a direct and simple conception to any collection. Right? Even the String with String.Index has some offset index property within it. For example: the `count` of the String, is the offset index of the endIndex.The enumerated() generated a sequence with elements contains the same offset as the offset index system provided. And when we apply Array(string), the string divided by each character and make the offset indices available for the new array.

The offset index system is just an assistant for collection, not a replacement to String.Index. We use String.Index to represent the normal underlaying of the String. We also could use offset indices to represent the nature of the Collection with its elements. Providing the offset index as a second choice to access elements in collections, is not only for the String struct, is for all collections, since it is the nature of the collection concept, and developer could choose use it or not.

We don't make String.Index O(1); we translate the offset indices to the underlying String.Index. Each time a subscript is used with an offset index, we just need to translate between offset indices and underlying indices using c.index(c.startIndex, offsetBy: i) and c.distance(from: c.startIndex, to: i).
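A minimal sketch of that translation, for illustration only. The names `subscript(offset:)` and `offset(of:)` are hypothetical: they are not standard library API and are not necessarily how the linked repo does it. `firstIndex(of:)` is the current spelling of the `index(of:)` used elsewhere in this thread.

```swift
extension Collection {
    /// Hypothetical helper: the element at the given distance from `startIndex`.
    /// O(1) for a RandomAccessCollection, O(offset) otherwise (String included).
    subscript(offset offset: Int) -> Element {
        return self[index(startIndex, offsetBy: offset)]
    }

    /// Hypothetical helper: the distance of `i` from `startIndex`.
    func offset(of i: Index) -> Int {
        return distance(from: startIndex, to: i)
    }
}

let digits = "01234"
let third = digits[offset: 2]                                    // offset -> index -> element
let offsetOfThree = digits.firstIndex(of: "3").map { digits.offset(of: $0) }  // index -> offset
```

Both helpers only wrap the two standard calls named above, so their cost is whatever `index(_:offsetBy:)` and `distance(from:to:)` cost for the collection at hand.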

We can make the offset indices available through an extension to Collection (as in my GitHub repo demo: https://github.com/frogcjn/OffsetIndexableCollection-String-Int-Indexable-).

Or we could do it at compile time. For example,

  c[1...]
compiles to
  c[c.index(c.startIndex, offsetBy: 1)...]

  let index: Int = s.index(of: "a")
compiles to
  let index: Int = s.distance(from: s.startIndex, to: s.index(of: "a"))

  let index = 1 // if used in s only
  s[index..<index+2]
compiles to
  let index = s.index(s.startIndex, offsetBy: 1)
  s[index..<s.index(index, offsetBy: 2)]

  let index = 1 // if used in both s1 and s2
  s1[index..<index+2]
  s2[index..<index+2]
compiles to
  let index = 1
  let index1 = s1.index(s1.startIndex, offsetBy: index)
  let index2 = s2.index(s2.startIndex, offsetBy: index)
  s1[index1..<s1.index(index1, offsetBy: 2)]
  s2[index2..<s2.index(index2, offsetBy: 2)]

I really want the team to consider providing the offset index system as an assistant to the collection. It is a very necessary, basic concept of Collection.

Thanks!
Jiannan

On Dec 15, 2017, at 2:13 AM, Jordan Rose <jordan_rose@apple.com> wrote:

We really don't want to make subscripting a non-O(1) operation. That just provides false convenience and encourages people to do the wrong thing with Strings anyway.

I'm always interested in why people want this kind of ability. Yes, it's nice for teaching programming to be able to split strings on character boundaries indexed by integers, but where does it come up in real life? The most common cases I see are trying to strip off the first or last character, or a known prefix or suffix, and I feel like we should have better answers for those than "use integer indexes" anyway.
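For the common cases mentioned above, a short sketch using only existing standard library calls (`dropFirst`, `dropLast`, `hasPrefix`). The sample string and variable names are made up for illustration, and dropping `prefix.count` characters assumes the prefix matches character for character, which holds for plain ASCII like this.

```swift
let path = "/usr/local/bin/"

// Strip the first or last character without any integer index.
let withoutFirst = path.dropFirst()   // Substring
let withoutLast = path.dropLast()     // Substring

// Strip a known prefix only if it is actually there.
let prefix = "/usr"
let rest = path.hasPrefix(prefix) ? path.dropFirst(prefix.count) : Substring(path)
```

The results are Substrings that share storage with `path`; wrap them in `String(...)` if they need to outlive it.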

Jordan

On Dec 13, 2017, at 22:30, Cao, Jiannan via swift-dev <swift-dev@swift.org> wrote:

Hi,

I would like to discuss the String.Index problem in Swift. I know the current design of String.Index is based on the nature of the underlying data structure of the string.

But could we just make String.Index contain offset information? Or make an offset index subscript available for accessing the characters in a String?

For example:
let a = "01234"
print(a[0]) // 0
print(a[0...4]) // 01234
print(a[...]) // 01234
print(a[..<2]) // 01
print(a[...2]) // 012
print(a[2...]) // 234
print(a[2...3]) // 23
print(a[2...2]) // 2
if let number = a.index(of: "1") {
    print(number) // 1
    print(a[number...]) // 1234
}

0 is equivalent to the Collection.Index collection.index(startIndex, offsetBy: 0)
1 is equivalent to the Collection.Index collection.index(startIndex, offsetBy: 1)
...
We keep String.Index, but allow another kind of index, called "offsetIndex", to reach the String.Index and the character in the string.
Any Collection could use the offset index to access its elements, regardless of its real index type.

I have made an OffsetIndexable protocol for Collection available here, and made it more accessible for StringProtocol by covering all the APIs related to indices.

https://github.com/frogcjn/OffsetIndexableCollection-String-Int-Indexable-

If someone wants to make the offset index/range available for a collection, they just need to extend the collection:
extension String : OffsetIndexableCollection {
}

extension Substring : OffsetIndexableCollection {
}

I hope the Swift core team will consider bringing the offset index to String, or making it available to other collections, and so let developers decide whether their collections can use offset indices as an assistant to the real index of the collection.

Thanks!
Jiannan

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

OK, but how do you use IndexingIterator to get an element without changing the iterator itself?

In C++, an iterator is more like an index: it can be used to get the element without changing itself, and it can be offset back and forth by more than one step.
  *it
  *(it+5)
  *(it-5)
But in Swift, you can only use next() to get the element, which also moves the iterator to the next item:
  it.next()
  it.next()
  it.next()
  it.next()
  it.next()
and you cannot go back to the previous item.

An iterator in Swift cannot be used as an index.
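The difference can be seen in a few lines. This is only an illustrative sketch; the sample string and variable names are made up.

```swift
let s = "01234"

// An iterator is consumed as it is read: every next() advances it.
var it = s.makeIterator()
let first = it.next()    // reads "0" and moves on
let second = it.next()   // reads "1" and moves on

// An index is a position: reading through it does not change it,
// and it can be moved forward or backward by any distance.
let i = s.index(s.startIndex, offsetBy: 3)
let atI = s[i]
let stillAtI = s[i]                       // same position, same element
let back = s[s.index(i, offsetBy: -2)]    // two characters earlier
```

Moving an index backward works here because String is a BidirectionalCollection; the walk is still linear in the distance moved.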

On Dec 19, 2017, at 3:00 AM, Michael Ilseman <milseman@apple.com> wrote:

An index that keeps a reference to the collection is an iterator, e.g. the IndexingIterator[1], which String already provides.

[1] IndexingIterator | Apple Developer Documentation

On Dec 18, 2017, at 12:53 AM, Cao, Jiannan <frogcjn@163.com> wrote:

Or we can copy the design of std::vector::iterator in C++. The index could keep a reference to the collection.
When the index is offset with the + operator, it could ask its owner to offset it, since it keeps a reference to the owning collection.

let startIndex = s.startIndex
s[startIndex+1]

public struct MyIndex<T: Collection> : Comparable where T.Index == MyIndex {
    public let owner: T
...
    public static func + (lhs: MyIndex, rhs: T.IndexDistance) -> MyIndex {
        return lhs.owner.index(lhs, offsetBy: rhs)
    }
}

On Dec 15, 2017, at 9:34 AM, Michael Ilseman <milseman@apple.com> wrote:

Yes, I was trying to highlight that they are different and should be treated differently. This was because it seemed you were conflating the two in your argument. You claim that people expect it, and I’m pointing out that what people actually expect (assuming they’re coming from C or languages with a similar model) already exists as those models deal in encoded offsets.

More important than expectations surrounding what to provide to a subscript are expectations surrounding algorithmic complexity. This has security implications. The expectation of subscript is that it is “constant-ish”, for a fuzzy hand-wavy definition of “constant-ish” which includes amortized constant or logarithmic.

Now, I agree with the overall sentiment that `index(offsetBy:)` is unwieldy. I am interested in approaches to improve this. But, we cannot throw linear complexity into subscript without extreme justification.
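To illustrate the complexity concern, here is a sketch under stated assumptions. `subscript(offset:)` is a hypothetical integer subscript built on `index(_:offsetBy:)`, and the two function names are made up. Both functions return the same result, but the first hides a linear walk behind every subscript.

```swift
extension String {
    /// Hypothetical integer subscript: walks `offset` characters from the start.
    subscript(offset offset: Int) -> Character {
        return self[index(startIndex, offsetBy: offset)]
    }
}

// Looks like a linear loop, but each access re-walks from the start,
// so the whole loop is O(n^2) in the number of characters.
func uppercasedByOffset(_ s: String) -> String {
    var result = ""
    for i in 0..<s.count {
        result += s[offset: i].uppercased()
    }
    return result
}

// Walking the indices once keeps the loop O(n).
func uppercasedByIndex(_ s: String) -> String {
    var result = ""
    for i in s.indices {
        result += s[i].uppercased()
    }
    return result
}
```

Nothing at the call site of the first version hints at the extra cost, which is the expectation problem described above.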

On Dec 14, 2017, at 5:25 PM, Cao, Jiannan <frogcjn@163.com> wrote:

This offset is a Unicode (encoding) offset, not the offset of an element.
For example, index(startIndex, offsetBy: 1) can have encodedOffset 4 or 8, not 1.

The offset index is based on counting elements/indices. It gives the same result as s.index(s.startIndex, offsetBy: i).
The encodedOffset is the underlying offset into the Unicode string, which is not the same concept as the offset index of a collection.

The offset index is meaningful in terms of the elements and indices of the collection (the i-th element of the collection), and is unrelated to the Unicode offset (which is the offset into the underlying UTF-16 data of the String).

These two offsets are totally different.
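A small sketch of the two offsets side by side. As an assumption, it uses `utf16Offset(in:)` from current Swift as a stand-in for the `encodedOffset` property discussed in this thread; the sample string is made up.

```swift
// "🇨🇦" is one Character built from two regional-indicator scalars,
// each of which takes two UTF-16 code units.
let s = "\u{1F1E8}\u{1F1E6}x"

// Position of the second Character ("x").
let second = s.index(s.startIndex, offsetBy: 1)

// Offset counted in Characters (elements of the collection).
let characterOffset = s.distance(from: s.startIndex, to: second)

// Offset counted in UTF-16 code units (the encoding).
let utf16Offset = second.utf16Offset(in: s)
```

Here the same position is character offset 1 but UTF-16 offset 4, which matches the "4, not 1" example above.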

Best,
Jiannan

On Dec 15, 2017, at 9:17 AM, Michael Ilseman <milseman@apple.com> wrote:

On Dec 14, 2017, at 4:49 PM, Cao, Jiannan via swift-dev <swift-dev@swift.org> wrote:

People are used to the offset index system rather than String.Index. Using offset indices to name and count elements is normal and natural.

The offset system that you’re referring to is totally available in String today, if you’re willing for it to be the offset into the encoding. That’s the offset “people” you’re referring to are likely used to and consider normal and natural. On String.Index, there is the following:

init(encodedOffset offset: Int)

and

var encodedOffset: Int { get }

[1] https://developer.apple.com/documentation/swift/string.index

I implemented the second approach: SuperIndex

https://github.com/frogcjn/SuperStringIndex/

SuperString is a special version of String. Its SuperIndex keeps a reference to the string, which lets the index calculate the offset.

struct SuperIndex : Comparable, Strideable, CustomStringConvertible {

    var owner: Substring
    var wrapped: String.Index

    ...

    // Offset
    var offset: Int {
        return owner.distance(from: owner.startIndex, to: wrapped)
    }

    // Strideable
    func advanced(by n: SuperIndex.Stride) -> SuperIndex {
        return SuperIndex(owner.index(wrapped, offsetBy: n), owner)
    }

    static func +(lhs: SuperIndex, rhs: SuperIndex.Stride) -> SuperIndex {
        return lhs.advanced(by: rhs)
    }
}

let a: SuperString = "01234"
let o = a.startIndex
let o1 = o + 4
print(a[o]) // 0
print(a[...]) // 01234
print(a[..<(o+2)]) // 01
print(a[...(o+2)]) // 012
print(a[(o+2)...]) // 234
print(a[o+2..<o+3]) // 2
print(a[o1-2...o1-1]) // 23

if let number = a.index(of: "1") {
    print(number) // 1
    print(a[number...]) // 1234
}

if let number = a.index(where: { $0 > "1" }) {
    print(number) // 2
}

let b = a[(o+1)...]
let z = b.startIndex
let z1 = z + 4
print(b[z]) // 1
print(b[...]) // 1234
print(b[..<(z+2)]) // 12
print(b[...(z+2)]) // 123
print(b[(z+2)...]) // 34
print(b[z+2...z+3]) // 34
print(b[z1-2...z1-2]) // 3
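Since parts of the code above are elided, here is a separate, self-contained sketch of the same idea. `OwnedIndex` is a hypothetical type written only for this illustration; it is not the SuperIndex from the repo and skips Strideable and CustomStringConvertible.

```swift
/// Hypothetical wrapper: a String.Index bundled with the text it belongs to.
struct OwnedIndex: Comparable {
    let owner: Substring
    let wrapped: String.Index

    /// Character offset from the start of the owner; O(n) to compute.
    var offset: Int {
        return owner.distance(from: owner.startIndex, to: wrapped)
    }

    static func == (lhs: OwnedIndex, rhs: OwnedIndex) -> Bool {
        return lhs.wrapped == rhs.wrapped
    }

    static func < (lhs: OwnedIndex, rhs: OwnedIndex) -> Bool {
        return lhs.wrapped < rhs.wrapped
    }

    /// Moving the index asks the owner to do the walk; O(|n|).
    static func + (lhs: OwnedIndex, n: Int) -> OwnedIndex {
        return OwnedIndex(owner: lhs.owner,
                          wrapped: lhs.owner.index(lhs.wrapped, offsetBy: n))
    }
}

let text = "01234"
let start = OwnedIndex(owner: text[...], wrapped: text.startIndex)
let third = start + 2
let character = text[third.wrapped]
```

The trade-off in this sketch is that every index carries its owner along with it, so indices are larger than a plain String.Index and keep the string's storage alive, while `+` and `offset` remain linear walks.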

在 2017年12月18日,下午4:53,Cao, Jiannan <frogcjn@163.com> 写道:

Or we can copy the design of std::vector::iterator in C++.The index could keep a reference to the collection.
When the index being offset by + operator, it could call the owner to offset the index, since it keeps a reference to the collection owner.

let startIndex = s.startIndex
s[startIndex+1]

public struct MyIndex<T: Collection> : Comparable where T.Index == MyIndex {
    public let owner: T
...
    public static func + (lhs: MyIndex, rhs: T.IndexDistance) -> MyIndex {
        return lhs.owner.index(lhs, offsetBy: rhs)
    }
}

在 2017年12月15日,上午9:34,Michael Ilseman <milseman@apple.com <mailto:milseman@apple.com>> 写道:

Yes, I was trying to highlight that they are different and should be treated different. This was because it seemed you were conflating the two in your argument. You claim that people expect it, and I’m pointing out that what people actually expect (assuming they’re coming from C or languages with a similar model) already exists as those models deal in encoded offsets.

More important than expectations surrounding what to provide to a subscript are expectations surrounding algorithmic complexity. This has security implications. The expectation of subscript is that it is “constant-ish”, for a fuzzy hand-wavy definition of “constant-ish” which includes amortized constant or logarithmic.

Now, I agree with the overall sentiment that `index(offsetBy:)` is unwieldy. I am interested in approaches to improve this. But, we cannot throw linear complexity into subscript without extreme justification.

On Dec 14, 2017, at 5:25 PM, Cao, Jiannan <frogcjn@163.com <mailto:frogcjn@163.com>> wrote:

This offset is unicode offset, is not the offset of element.
For example: index(startIndex, offsetBy:1) is encodedOffset 4 or 8, not 1.

Offset indexable is based on the offset of count of each element/index. it is the same result of s.index(s.startIndex, offsetBy:i)
The encodedOffset is the underlaying offset of unicode string, not the same concept of the offset index of collection.

The offset indexable is meaning to the elements and index of collection (i-th element of the collection), not related to the unicode offset (which is the underlaying data offset meaning to the UTF-16 String).

These two offset is totally different.

Best,
Jiannan

在 2017年12月15日,上午9:17,Michael Ilseman <milseman@apple.com <mailto:milseman@apple.com>> 写道:

On Dec 14, 2017, at 4:49 PM, Cao, Jiannan via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

People used to the offset index system instead of the String.Index. Using offset indices to name the elements, count the elements is normal and nature.

The offset system that you’re referring to is totally available in String today, if you’re willing for it to be the offset into the encoding. That’s the offset “people” you’re referring to are likely used to and consider normal and natural. On String.Index, there is the following:

init(encodedOffset offset: Int <https://developer.apple.com/documentation/swift/int&gt;\)

and

var encodedOffset: Int <https://developer.apple.com/documentation/swift/int&gt; { get }

[1] String.Index | Apple Developer Documentation

This offset index system has a long history and a real meaning to the collection. The subscript s[i] has a fix meaning of "getting the i-th element in this collection", which is normal and direct. Get the range with offset indices, is also direct. It means the substring is from the i-th character up to the j-th character of the original string.

People used to play subscript, range with offset indices. Use string[string.index(i, offsetBy: 5)] is not as directly and easily as string[i + 5]. Also the Range<String.Index> is not as directly as Range<Offset>. Developers need to transfer the Range<String.Index> result of string.range(of:) to Range<OffsetIndex> to know the exact range of the substring. Range<String.Index> has a real meaning to the machine and underlaying data location for the substring, but Range<OffsetIndex> also has a direct location information for human being, and represents the abstract location concept of the collection (This is the most UNIMPEACHABLE REASON I could provide).

Offset index system is based on the nature of collection. Each element of the collection could be located by offset, which is a direct and simple conception to any collection. Right? Even the String with String.Index has some offset index property within it. For example: the `count` of the String, is the offset index of the endIndex.The enumerated() generated a sequence with elements contains the same offset as the offset index system provided. And when we apply Array(string), the string divided by each character and make the offset indices available for the new array.

The offset index system is just an assistant for collections, not a replacement for String.Index. We use String.Index to represent the underlying storage of the String. We could also use offset indices to represent the nature of the Collection and its elements. Providing offset indices as a second way to access elements is not only for String; it is for all collections, since it is part of the nature of the collection concept, and developers could choose whether to use it.

We don't make String.Index O(1); we translate the offset indices into the underlying String.Index. Each time a subscript is used with an offset index, we just need to translate between offsets and underlying indices using c.index(c.startIndex, offsetBy: i) and c.distance(from: c.startIndex, to: i).

We can make the offset indices available through an extension to Collection (as in my GitHub repo demo: https://github.com/frogcjn/OffsetIndexableCollection-String-Int-Indexable-).
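As a rough illustration of the translation described above, here is a minimal sketch in current Swift. The `subscript(offset:)`, `subscript(offsets:)` and `offset(of:)` names are hypothetical helpers invented for this example; they are not the API of the linked repository or of the standard library. The labels are there so the cost stays visible at the call site and so the subscripts do not clash with Int-indexed collections such as Array.

```swift
extension Collection {
    /// Hypothetical helper: the element `offset` positions after `startIndex`.
    /// O(n) for collections that are not random-access, such as String.
    subscript(offset offset: Int) -> Element {
        return self[index(startIndex, offsetBy: offset)]
    }

    /// Hypothetical helper: the slice covering a half-open range of offsets.
    subscript(offsets range: Range<Int>) -> SubSequence {
        let lower = index(startIndex, offsetBy: range.lowerBound)
        let upper = index(lower, offsetBy: range.count)
        return self[lower..<upper]
    }

    /// Hypothetical helper: converts an index back into its offset.
    func offset(of i: Index) -> Int {
        return distance(from: startIndex, to: i)
    }
}

let a = "01234"
print(a[offset: 0])          // 0
print(a[offsets: 2..<4])     // 23
if let i = a.firstIndex(of: "1") {
    print(a.offset(of: i))   // 1
}
```

Every call still pays for the walk from startIndex, so this only changes the spelling, not the complexity.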

Or we could do it at compile time. For example,

  c[1...]
compiles to
  c[c.index(c.startIndex, offsetBy: 1)...]

  let index: Int = s.index(of: "a")
compiles to
  let index: Int = s.distance(from: s.startIndex, to: s.index(of: "a"))

  let index = 1 // if used in s only
  s[index..<index+2]
compiles to
  let index = s.index(s.startIndex, offsetBy: 1)
  s[index..<s.index(index, offsetBy: 2)]

  let index = 1 // if used in both s1 and s2
  s1[index..<index+2]
  s2[index..<index+2]
compiles to
  let index = 1
  let index1 = s1.index(s1.startIndex, offsetBy: index)
  let index2 = s2.index(s2.startIndex, offsetBy: index)
  s1[index1..<s1.index(index1, offsetBy: 2)]
  s2[index2..<s2.index(index2, offsetBy: 2)]

I really want the team to consider providing the offset index system as an assistant to collections. It is a very necessary, basic concept of Collection.

Thanks!
Jiannan

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

Swift used to do this, but we switched it around so indexes couldn’t self-increment.

One of the problems was that strings are value-types. So you would get an index, then append stuff to the string, but when you tried to advance the index again it would blow up. The index retained the backing, which means the “append” caused a copy, and the index was suddenly pointing to a different String backing.

Basically, self-incrementing indexes require that the Collection has reference semantics. Otherwise there simply is no concept of an independent “owning” Collection which your Index can hold a reference to.
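To make the value-semantics problem concrete, here is a small sketch. `OwningIndex` is a hypothetical type written for this example; it is not how the old standard library index was implemented, it only shows why an index that carries its "owner" ends up with a stale copy.

```swift
// Hypothetical index that carries its "owner", as a self-incrementing index would.
struct OwningIndex {
    var owner: String          // a value copy, not a reference
    var wrapped: String.Index

    func advanced(by n: Int) -> OwningIndex {
        // Advances within the snapshot captured when the index was created.
        return OwningIndex(owner: owner, wrapped: owner.index(wrapped, offsetBy: n))
    }
}

var s = "abc"
let i = OwningIndex(owner: s, wrapped: s.startIndex)

s.append("def")   // mutates `s`; the copy held by `i` is unaffected

print(s.count)        // 6
print(i.owner.count)  // 3

// Advancing inside the snapshot works as long as we stay within "abc"...
let j = i.advanced(by: 2)
print(i.owner[j.wrapped])  // c

// ...but the snapshot cannot reach the characters appended to `s` later.
let beyond = i.owner.index(i.wrapped, offsetBy: 5, limitedBy: i.owner.endIndex)
print(beyond == nil)  // true
```

The index and the string have silently diverged, which is the situation described above: without reference semantics there is no single owning collection for the index to follow.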

Anyway, that doesn’t mean you’re wrong. Collection-slicing syntax is still way too ugly. We need to keep it safe, and communicative, but it should also be obvious and not tiring.

Currently, you have to write:

<collection>[<collection>.index(<collection>.<member>, offsetBy: <distance>)]

And an example...

results[results.index(results.startIndex, offsetBy: 3)]

Which is safe, and communicative, and obvious, but also really, really tiring. There are ways we can make it less tiring without sacrificing the good parts:

1) Add a version of index(_: offsetBy:) which takes a KeyPath<Self, Self.Index> as its first argument. That’s a minor convenience you can add today in your own projects. It removes one repetition of <collection>, in many common cases.

extension Collection {
  func index(_ i: KeyPath<Self, Index>, offsetBy n: IndexDistance) -> Index {
    return index(self[keyPath: i], offsetBy: n)
  }
  func index(_ i: KeyPath<Self, Index>, offsetBy n: IndexDistance, limitedBy: Index) -> Index? {
    return index(self[keyPath: i], offsetBy: n, limitedBy: limitedBy)
  }
}

results[results.index(\.startIndex, offsetBy: 3)]

Seriously, man, KeyPaths are just the business. I love them.

2) Bind <collection> to something like an anonymous closure argument within the subscript. Or just allow “.” syntax, as for static members. That removes another <collection>.

results[.index(\.startIndex, offsetBy: 3)]

or

results[$.index(\.startIndex, offsetBy: 3)]

If anybody’s interested, I was playing around with an “IndexExpression” type for this kind of thing. The language lets you get pretty far, but it doesn’t work and I can’t figure out why. It looks like a simple-enough generic struct, but it fails with a cyclic metadata dependency.

- Karl


On 19. Dec 2017, at 08:38, Cao, Jiannan via swift-dev <swift-dev@swift.org> wrote:

I implemented the second approach: SuperIndex

GitHub - frogcjn/SuperStringIndex: StringIndex with reference of string to calculate the offset

SuperString is a special version of String. Its SuperIndex keeps a reference to the string, which lets the index calculate the offset.

struct SuperIndex : Comparable, Strideable, CustomStringConvertible {
    
    var owner: Substring
    var wrapped: String.Index
   
  ...

    // Offset
    var offset: Int {
        return owner.distance(from: owner.startIndex, to: wrapped)
    }

    // Strideable
    func advanced(by n: SuperIndex.Stride) -> SuperIndex {
        return SuperIndex(owner.index(wrapped, offsetBy: n), owner)
    }

    static func +(lhs: SuperIndex, rhs: SuperIndex.Stride) -> SuperIndex {
        return lhs.advanced(by: rhs)
    }
}

let a: SuperString = "01234"
let o = a.startIndex
let o1 = o + 4
print(a[o]) // 0
print(a[...]) // 01234
print(a[..<(o+2)]) // 01
print(a[...(o+2)]) // 012
print(a[(o+2)...]) // 234
print(a[o+2..<o+3]) // 2
print(a[o1-2...o1-1]) // 23

if let number = a.index(of: "1") {
    print(number) // 1
    print(a[number...]) // 1234
}

if let number = a.index(where: { $0 > "1" }) {
    print(number) // 2
}

let b = a[(o+1)...]
let z = b.startIndex
let z1 = z + 4
print(b[z]) // 1
print(b[...]) // 1234
print(b[..<(z+2)]) // 12
print(b[...(z+2)]) // 123
print(b[(z+2)...]) // 34
print(b[z+2...z+3]) // 34
print(b[z1-2...z1-2]) // 3

On Dec 18, 2017, at 4:53 PM, Cao, Jiannan <frogcjn@163.com> wrote:

Or we can copy the design of std::vector::iterator in C++. The index could keep a reference to the collection.
When the index is offset with the + operator, it can ask its owner to perform the offset, since it keeps a reference to the owning collection.

let startIndex = s.startIndex
s[startIndex+1]

public struct MyIndex<T: Collection> : Comparable where T.Index == MyIndex {
    public let owner: T
...
    public static func + (lhs: MyIndex, rhs: T.IndexDistance) -> MyIndex {
        return lhs.owner.index(lhs, offsetBy: rhs)
    }
}

On Dec 15, 2017, at 9:34 AM, Michael Ilseman <milseman@apple.com> wrote:

Yes, I was trying to highlight that they are different and should be treated differently. This was because it seemed you were conflating the two in your argument. You claim that people expect it, and I’m pointing out that what people actually expect (assuming they’re coming from C or languages with a similar model) already exists as those models deal in encoded offsets.

More important than expectations surrounding what to provide to a subscript are expectations surrounding algorithmic complexity. This has security implications. The expectation of subscript is that it is “constant-ish”, for a fuzzy hand-wavy definition of “constant-ish” which includes amortized constant or logarithmic.

Now, I agree with the overall sentiment that `index(offsetBy:)` is unwieldy. I am interested in approaches to improve this. But, we cannot throw linear complexity into subscript without extreme justification.
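A rough way to see the complexity concern: the sketch below models the number of index steps needed to visit every character, once through per-lookup offsets and once by walking indices. The step counts are computed from the definition of index(_:offsetBy:) on a non-random-access collection, not measured, and the 1,000-character string is an arbitrary choice.

```swift
let text = String(repeating: "é", count: 1_000)

// Offset-based access: each lookup walks from startIndex,
// so the lookup at offset k has to take k steps.
var offsetSteps = 0
for offset in 0..<text.count {
    _ = text[text.index(text.startIndex, offsetBy: offset)]
    offsetSteps += offset
}

// Index-based access: each iteration advances the index by exactly one step.
var indexSteps = 0
var i = text.startIndex
while i != text.endIndex {
    _ = text[i]
    i = text.index(after: i)
    indexSteps += 1
}

print(offsetSteps)  // 499500
print(indexSteps)   // 1000
```

An innocent-looking loop over integer offsets is therefore quadratic, which is the kind of hidden cost a plain `s[i]` spelling would invite.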

On Dec 14, 2017, at 5:25 PM, Cao, Jiannan <frogcjn@163.com> wrote:

That offset is the Unicode (encoded) offset, not the offset of an element.
For example, index(startIndex, offsetBy: 1) may have encodedOffset 4 or 8 rather than 1.

Offset indexing is based on counting elements/indices: offset i gives the same result as s.index(s.startIndex, offsetBy: i).
The encodedOffset is the underlying offset into the Unicode string, which is not the same concept as the offset index of a collection.

The offset index refers to the elements and indices of the collection (the i-th element of the collection); it is not related to the Unicode offset (the offset into the underlying data, which is meaningful for the UTF-16 string).

These two offsets are totally different.
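The difference between the two offsets can be shown with a string whose first character needs several UTF-16 code units. This is a minimal sketch in current Swift, where, as far as I know, `encodedOffset` has since been deprecated in favor of `utf16Offset(in:)`; the flag emoji is just an example of a multi-unit character.

```swift
// Two regional-indicator scalars forming one flag Character, followed by "a".
let s = "\u{1F1E8}\u{1F1E6}a"

// The index of the second Character, found by counting elements.
let second = s.index(s.startIndex, offsetBy: 1)

// Element offset: how many Characters precede the index.
print(s.distance(from: s.startIndex, to: second))  // 1

// Encoded offset: how many UTF-16 code units precede the index.
print(second.utf16Offset(in: s))                   // 4

print(s[second])                                   // a
```

So the same position is offset 1 when counting characters and offset 4 when counting code units.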

Best,
Jiannan

On Dec 15, 2017, at 9:17 AM, Michael Ilseman <milseman@apple.com> wrote:

On Dec 14, 2017, at 4:49 PM, Cao, Jiannan via swift-dev <swift-dev@swift.org> wrote:

People are used to the offset index system rather than String.Index. Using offset indices to name and count elements is normal and natural.

The offset system that you’re referring to is totally available in String today, if you’re willing for it to be the offset into the encoding. That’s the offset “people” you’re referring to are likely used to and consider normal and natural. On String.Index, there is the following:

init(encodedOffset offset: Int)

and

var encodedOffset: Int { get }

[1] String.Index | Apple Developer Documentation

This offset index system has a long history and a real meaning for collections. The subscript s[i] has the fixed meaning of "getting the i-th element in this collection", which is normal and direct. Getting a range with offset indices is also direct: it means the substring runs from the i-th character up to the j-th character of the original string.

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

And really, this is more an issue for swift-evolution, since what you’re talking about (self-incrementing indexes) would be a new language feature.

- Karl
