Hi swift-evolution,
String’s Views have a few odd properties that have bothered me for a while. I initially did not bring them up because I thought a String redesign was coming. But since Swift 3 will be released very soon, and with the recent focus on breaking changes, I thought now might be a good time to talk about them.
## Subsequences of UTF16View and CharacterView don’t use the same indices as the original collection
One requirement of the Collection protocol is
public subscript(bounds: Range<Self.Index>) -> Self.SubSequence { get }
whose documentation says:
/// Accesses a contiguous subrange of the collection's elements.
///
/// The accessed slice uses the same indices for the same elements as the
/// original collection uses.
However, it appears that UTF16View and CharacterView don’t follow the documentation. For example:
let str = "Hello World!".utf16
let (start, end) = (str.index(str.startIndex, offsetBy: 2), str.index(str.startIndex, offsetBy: 9))
let sub1 = str[start ..< end]
print(sub1) // llo Wor
let sub2 = str[sub1.startIndex ..< sub1.endIndex]
print(sub2) // Hello W
Here, using `sub1`’s indices on the original collection `str` returns a completely different subsequence.
I think that, ideally, `sub2` should be equal to `sub1`, just like when using UTF8View and UnicodeScalarView.
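The documented requirement can be written down as a small generic check. This is only a sketch: `sliceSharesIndices` is a hypothetical helper name, not a standard library function, and it assumes a toolchain in which `SubSequence` is itself a `Collection` with the same `Index` type as its base.

```swift
// Hypothetical helper (not part of the standard library): returns true when
// slicing `base` with a slice's own bounds yields the same elements as the slice.
func sliceSharesIndices<C: Collection>(_ base: C, from lower: Int, to upper: Int) -> Bool
    where C.Element: Equatable
{
    let start = base.index(base.startIndex, offsetBy: lower)
    let end = base.index(base.startIndex, offsetBy: upper)
    let slice = base[start ..< end]
    // Feed the slice's bounds back into the original collection.
    let roundTrip = base[slice.startIndex ..< slice.endIndex]
    return slice.elementsEqual(roundTrip)
}

let text = "Hello World!"
print(sliceSharesIndices(Array(text.utf16), from: 2, to: 9))
print(sliceSharesIndices(text.utf16, from: 2, to: 9))
print(sliceSharesIndices(text.utf8, from: 2, to: 9))
print(sliceSharesIndices(text.unicodeScalars, from: 2, to: 9))
```

For an `Array` the check holds, because `ArraySlice` keeps the indices of the array it was taken from. A view that behaves like the UTF16View example above would fail it.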
## Accessing elements past the end of the subsequence
Consider this piece of code:
let str = "Hello World!".utf8
let (start, end) = (str.index(str.startIndex, offsetBy: 2), str.index(str.startIndex, offsetBy: 9))
let sub1 = str[start ..< end]
print(sub1) // llo Wor
let pastEnd = sub1.index(sub1.endIndex, offsetBy: 2)
let sub2 = sub1[sub1.startIndex ..< pastEnd]
print(sub2) // llo World
I was able to access elements of the original string that should be beyond the reach of `sub1`.
Using a UnicodeScalarView gives an odd result too: indices past the end are seemingly ignored, and `sub2` is equal to `sub1`.
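For comparison, here is a rough sketch of a slice that refuses any range leaving its own bounds. `BoundedSlice` and its members are hypothetical names used only for illustration; this is not how the standard library implements its views, and returning `nil` is just one possible policy (trapping would be another).

```swift
// Hypothetical wrapper, not part of the standard library.
struct BoundedSlice<Base: Collection> {
    let base: Base
    let bounds: Range<Base.Index>

    var startIndex: Base.Index { return bounds.lowerBound }
    var endIndex: Base.Index { return bounds.upperBound }

    // The elements visible through this slice.
    var elements: Base.SubSequence { return base[bounds] }

    // Returns nil instead of reaching into `base` when the requested
    // range is not fully contained in this slice's bounds.
    func slice(_ range: Range<Base.Index>) -> BoundedSlice<Base>? {
        guard range.lowerBound >= bounds.lowerBound,
              range.upperBound <= bounds.upperBound else { return nil }
        return BoundedSlice(base: base, bounds: range)
    }
}

let utf8 = "Hello World!".utf8
let start = utf8.index(utf8.startIndex, offsetBy: 2)
let end = utf8.index(utf8.startIndex, offsetBy: 9)
let sub1 = BoundedSlice(base: utf8, bounds: start ..< end)

// Two positions past the slice's end, but still inside the original string.
let pastEnd = utf8.index(end, offsetBy: 2)
let sub2 = sub1.slice(sub1.startIndex ..< pastEnd)

print(String(decoding: sub1.elements, as: UTF8.self)) // llo Wor
print(sub2 == nil)                                    // true
```

With this policy the past-the-end request is rejected outright, rather than exposing extra elements or being silently clamped.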
## Conclusion
I think String’s Views should:
1. Follow Collection鈥檚 documentation by using the same indices for their subsequences
2. Provide safe, consistent behavior when using a subscript operation with a past-the-end index
However, this means more breaking changes that won’t be easy to detect.
Thoughts?
Loïc