[Pitch] Changing the behavior of Subsequences of String Views


(Loïc Lecrenier) #1

Hi swift-evolution :blush:

String’s Views have a few odd properties that have bothered me for a while. I initially did not bring it up because I thought a String redesign was coming. But since Swift 3 will be released very soon, and with the recent focus on breaking changes, I thought now might be a good time to talk about it.

## Subsequences of UTF16View and CharacterView don’t use the same indices as the original collection

One requirement of the Collection protocol is

```swift
public subscript(bounds: Range<Self.Index>) -> Self.SubSequence { get }
```

whose documentation says:

```swift
/// Accesses a contiguous subrange of the collection's elements.
///
/// The accessed slice uses the same indices for the same elements as the
/// original collection uses.
```

However, it appears that UTF16View and CharacterView don’t follow the documentation. For example:

```swift
let str = "Hello World!".utf16
let (start, end) = (str.index(str.startIndex, offsetBy: 2), str.index(str.startIndex, offsetBy: 9))

let sub1 = str[start ..< end]
print(sub1) // llo Wor

let sub2 = str[sub1.startIndex ..< sub1.endIndex]
print(sub2) // Hello W
```

Here, using `sub1`’s indices on the original collection `str` returns a completely different subsequence.
I think that, ideally, `sub2` should be equal to `sub1`, just like when using UTF8View and UnicodeScalarView.
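
For comparison, here is a minimal sketch of the documented contract using `Array` and `ArraySlice`, which do share indices with their parent. The helper `sliceRoundTrips` is a hypothetical name introduced only for this illustration (it is not a standard library API), and the sketch assumes a Swift 5 toolchain rather than the Swift 3 preview discussed in this thread.

```swift
// Array and ArraySlice follow the Collection contract quoted above:
// a slice keeps the indices of its parent collection.
let numbers = [10, 20, 30, 40, 50, 60]
let slice = numbers[2 ..< 5]
print(slice.startIndex, slice.endIndex)  // 2 5

// Feeding the slice's own indices back to the parent yields the same elements.
let roundTrip = numbers[slice.startIndex ..< slice.endIndex]
print(Array(roundTrip))  // [30, 40, 50]

// Hypothetical helper (not a standard library API): checks that slicing at
// the given offsets produces a slice whose indices match the parent's.
func sliceRoundTrips<C: Collection>(_ collection: C, offsets: Range<Int>) -> Bool
    where C.Element: Equatable
{
    let lower = collection.index(collection.startIndex, offsetBy: offsets.lowerBound)
    let upper = collection.index(collection.startIndex, offsetBy: offsets.upperBound)
    let slice = collection[lower ..< upper]
    // Re-slice the parent with the slice's own bounds.
    let again = collection[slice.startIndex ..< slice.endIndex]
    return slice.startIndex == lower
        && slice.endIndex == upper
        && slice.elementsEqual(again)
}

print(sliceRoundTrips(numbers, offsets: 2 ..< 5))  // true
```

The same helper can be pointed at any collection, including a string view, to check whether it honors the index-sharing requirement on a given toolchain.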

## Accessing elements past the end of the subsequence

Consider this piece of code:

```swift
let str = "Hello World!".utf8
let (start, end) = (str.index(str.startIndex, offsetBy: 2), str.index(str.startIndex, offsetBy: 9))

let sub1 = str[start ..< end]
print(sub1) // llo Wor

let pastEnd = sub1.index(sub1.endIndex, offsetBy: 2)

let sub2 = sub1[sub1.startIndex ..< pastEnd]
print(sub2) // llo World
```

I was able to access elements of the original string that should be beyond the reach of `sub1`.
Using a UnicodeScalarView gives an odd result too: indices past the end are seemingly ignored, and `sub2` is equal to `sub1`.
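
As a rough illustration of what a consistent bounds check could look like, the sketch below validates a range against the slice's own `startIndex` and `endIndex` before subscripting. `checkedSlice` is a hypothetical helper made up for this example, not a standard library API, and returning `nil` is only one possible policy. Standard library slices such as `ArraySlice` instead stop the program with a runtime error when given an out-of-bounds range.

```swift
extension Collection {
    /// Hypothetical helper (not part of the standard library): returns nil
    /// when `bounds` reaches outside this collection's own index range,
    /// instead of slicing.
    func checkedSlice(_ bounds: Range<Index>) -> SubSequence? {
        guard bounds.lowerBound >= startIndex, bounds.upperBound <= endIndex else {
            return nil
        }
        return self[bounds]
    }
}

let values = [10, 20, 30, 40, 50, 60]
let window = values[2 ..< 5]  // elements 30, 40, 50 at indices 2, 3, 4

// Index 6 is valid for `values` but lies past the end of `window`.
let pastEnd = window.endIndex + 1

if let widened = window.checkedSlice(window.startIndex ..< pastEnd) {
    print("unexpectedly widened to", Array(widened))
} else {
    print("rejected")  // prints "rejected"
}

// A range that stays inside the slice is accepted as usual.
if let inner = window.checkedSlice(3 ..< 5) {
    print(Array(inner))  // [40, 50]
}
```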

## Conclusion

I think String’s Views should
1. Follow Collection’s documentation by using the same indices for their subsequences
2. Provide safe, consistent behavior when using a subscript operation with a past-the-end index

However, this means more breaking changes that won’t be easy to detect.

Thoughts?

Loïc


(Dmitri Gribenko) #2

Hi Loïc,

These are bugs. Fixes for these bugs don't need to be approved by
swift-evolution. We would appreciate patches for these issues.

I believe, though, that the fixes might be non-trivial. I'd be happy to
discuss ideas for how to fix this and will help review the patches.

Dmitri

On Tue, Jun 28, 2016 at 9:46 AM, Loïc Lecrenier <swift-evolution@swift.org> wrote:

I think String’s Views should
1. Follow Collection’s documentation by using the same indices for their subsequences
2. Provide safe, consistent behavior when using a subscript operation with a past-the-end index

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(Loïc Lecrenier) #3

Hi Dmitri,

Thanks for the quick answer. I filed a bug here: https://bugs.swift.org/browse/SR-1927

(Unfortunately, I don’t think I could write a patch myself.)

Loïc

On Jun 28, 2016, at 7:25 PM, Dmitri Gribenko <gribozavr@gmail.com> wrote:

On Tue, Jun 28, 2016 at 9:46 AM, Loïc Lecrenier <swift-evolution@swift.org> wrote:

I think String’s Views should
1. Follow Collection’s documentation by using the same indices for their subsequences
2. Provide safe, consistent behavior when using a subscript operation with a past-the-end index

Hi Loïc,

These are bugs. Fixes for these bugs don't need to be approved by
swift-evolution. We would appreciate patches for these issues.

I believe, though, that the fixes might be non-trivial. I'd be happy to
discuss ideas for how to fix this and will help review the patches.

Dmitri
