[Pitch] Changing the behavior of Subsequences of String Views

Hi swift-evolution :blush:

String’s Views have a few odd properties that have bothered me for a while. I initially did not bring them up because I thought a String redesign was coming. But since Swift 3 will be released very soon, and given the recent focus on breaking changes, I thought now might be a good time to talk about them.

## Subsequences of UTF16View and CharacterView don’t use the same indices as the original collection

One requirement of the Collection protocol is

public subscript(bounds: Range<Self.Index>) -> Self.SubSequence { get }

whose documentation says:

/// Accesses a contiguous subrange of the collection's elements.
///
/// The accessed slice uses the same indices for the same elements as the
/// original collection uses.

However, it appears that UTF16View and CharacterView don’t follow the documentation. For example:

let str = "Hello World!".utf16
let (start, end) = (str.index(str.startIndex, offsetBy: 2), str.index(str.startIndex, offsetBy: 9))

let sub1 = str[start ..< end]
print(sub1) // llo Wor

let sub2 = str[sub1.startIndex ..< sub1.endIndex]
print(sub2) // Hello W

Here, using `sub1`’s indices on the original collection `str` returns a completely different subsequence.
I think that, ideally, `sub2` should be equal to `sub1`, just like when using UTF8View and UnicodeScalarView.
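
For comparison, here is a minimal sketch of the documented contract as a reusable check. `sliceSharesIndices` is a hypothetical helper written for this illustration, not a standard library API, and it assumes a toolchain where `Collection.Element` is available (Swift 4 or later). It takes a slice, feeds the slice's own indices back into the base collection, and compares the two results.

```swift
// Hypothetical helper (not part of the standard library): checks whether a
// slice's own indices select the same elements in the base collection.
func sliceSharesIndices<C: Collection>(_ base: C, from lower: Int, to upper: Int) -> Bool
    where C.Element: Equatable
{
    let start = base.index(base.startIndex, offsetBy: lower)
    let end = base.index(base.startIndex, offsetBy: upper)
    let slice = base[start ..< end]
    // Feed the slice's indices back into the base collection.
    let roundTrip = base[slice.startIndex ..< slice.endIndex]
    return slice.elementsEqual(roundTrip)
}

// Array follows the documented contract: ArraySlice keeps the base indices.
let bytes = Array("Hello World!".utf8)
print(sliceSharesIndices(bytes, from: 2, to: 9)) // true

// The same check can be pointed at any String view; the result depends on
// the toolchain, so no expected output is given here.
print(sliceSharesIndices("Hello World!".utf16, from: 2, to: 9))
```

A view that behaves as described above for UTF16View would make this check return `false`, which is one way such a regression could be caught in a test.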

## Accessing elements past the end of the subsequence

Consider this piece of code:

let str = "Hello World!".utf8
let (start, end) = (str.index(str.startIndex, offsetBy: 2), str.index(str.startIndex, offsetBy: 9))

let sub1 = str[start ..< end]
print(sub1) // llo Wor

let pastEnd = sub1.index(sub1.endIndex, offsetBy: 2)

let sub2 = sub1[sub1.startIndex ..< pastEnd]
print(sub2) // llo World

I was able to access elements of the original string that should be beyond the reach of `sub1`.
Using a UnicodeScalarView gives an odd result too: indices past the end are seemingly ignored, and `sub2` is equal to `sub1`.
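
To illustrate what "safe, consistent behavior" could look like, here is a small sketch of a bounds-checked range subscript. `checkedSlice` is a hypothetical name introduced only for this example. It returns `nil` rather than reading outside the receiver's own bounds, which is one possible policy (trapping would be another). The demonstration uses an `ArraySlice`, whose indices are plain integers, so a past-the-end index can be formed without touching the collection.

```swift
extension Collection {
    // Hypothetical helper: returns nil when `bounds` reaches outside
    // startIndex ..< endIndex of this collection, instead of reading
    // elements that belong only to the underlying storage.
    func checkedSlice(_ bounds: Range<Index>) -> SubSequence? {
        guard bounds.lowerBound >= startIndex, bounds.upperBound <= endIndex else {
            return nil
        }
        return self[bounds]
    }
}

let utf8Bytes = Array("Hello World!".utf8)
let window = utf8Bytes[2 ..< 9]      // the bytes of "llo Wor"
let pastEnd = window.endIndex + 2    // valid for the base array, not for the slice

// Reaching past the end of the slice is rejected...
print(window.checkedSlice(window.startIndex ..< pastEnd) == nil) // true

// ...while a range inside the slice's bounds is returned as usual.
print(window.checkedSlice(window.startIndex ..< window.endIndex) != nil) // true
```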

## Conclusion

I think String’s Views should
1. Follow Collection’s documentation by using the same indices for their subsequences
2. Provide safe, consistent behavior when using a subscript operation with a past-the-end index

However, this means more breaking changes that won’t be easy to detect.

Thoughts?

Loïc

Hi Loïc,

These are bugs. Fixes for these bugs don't need to be approved by
swift-evolution. We would appreciate patches for these issues.

I believe, though, that the fixes might be non-trivial. I'd be happy to
discuss ideas for how to fix this and will help review the patches.

Dmitri

On Tue, Jun 28, 2016 at 9:46 AM, Loïc Lecrenier <swift-evolution@swift.org> wrote:

I think String’s Views should
1. Follow Collection’s documentation by using the same indices for their subsequences
2. Provide safe, consistent behavior when using a subscript operation with a past-the-end index

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/

Hi Dmitri,

Thanks for the quick answer. I filed a bug here: [SR-1927] Subsequences of String Views don’t behave correctly · Issue #44536 · apple/swift · GitHub

(Unfortunately, I don’t think I could write a patch myself.)

Loïc

In reply to Dmitri Gribenko's message of Jun 28, 2016, 7:25 PM.