Is the problem being addressed significant enough to warrant a change to Swift?
Yes.
Does this proposal fit well with the feel and direction of Swift?
Undecided.
If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
I used lots of languages, none of them had an explicit solution for indexing into strings.
How much effort did you put into your review? A glance, a quick reading, or an in-depth study?
I took some time to try to understand the problem but I have to admit that I'm not sure if I succeeded. I do understand that having a single Int as index does not work for the four different views that string supports. On the other hand, I do not see how String.Index actually solves that problem.
One of the problems I see is that String.Index does not seem to have any knownledge about the string. Thus there is no defined error behaviour for
let i = String.Index(offset: 1, within: "one")
let x = i.offset(within: "other".utf8)
I do understand that Swift has different views on strings that use different index values for the same position in the string. As shown in the previous point, the proposed solution seems to have problems here. Isn't this the main reason for not using Int as index type? I boiled down this comment to: âUsers will mix up the index into different *Views or into the Collection of Characters that String is.â
With a dedicated index type that supports convertion (by retaining the string it comes from) this would indeed not be a problem.
On the other hand, the argument in the comment is not too strong in my opinion as subscript with Int is unquestioned for arrays (where array.sorted() is a âviewâ on array) .
TL;DR
I think String.Index is too complicated to use when the benefit is unclear.
I would be fine if String.Index was the return type for all offset calculation methods but the views should (also) accept Int as input.
This thread shows that users do not understand why you have to write
let x = "abc"
let c = x[x.index(x.startIndex, offsetBy: 2)]
That's a violation of collection index semantics in general. You can't take an index from one collection and use in another in the general case. In the specific case of collections whose index is Int you can do this as long as the index is in-bounds in the new collection, but in the general case, you can only safely use an index in a collection if the index is a member of col.indices, and that's not going to be true for index types other than Int.
Which proposal are you reviewing? SE-0241 is not proposing String.Index, which already exists. It's deprecating an existing property and method on the type (which was misused in the wild), and replacing it with a more explicit alternative.
You are right, that's a violation of collection index semantics but in my opinion the API should be designed in a way that either prevents incorrect usage or has a defined error behaviour.
But, as Michael_Ilseman correctly pointed out: My point if off-topic here.
As I understood it, SE-0241 seeks to improve String.Index to minimize misuse of encodedOffset. I think that the proposed change will not fix this problem because people (well: me), will still fail to understand the reason for String.Index in the first place. E.g. Iâm not even sure if this will be correct:
let iutf8 = String.Index(offset: 4, in: myString.utf8)
let c = myString[iutf8.offset(in: myString)]
My opinion (which might well be wrong because I, again, admit that I do not get all the hard edges that unicode imposes on Swift strings) is: 1500 people tried to do myString[i..<j] and failed. SE-0241 will not fix this. Making myString[i..<j] behave like String(Array(myString)[i..<j]) will fix it. It will come with its own problems but in most cases it does what people expect it to do. Removing String.Index is the ultimate fix of encodedOffset misuse.
String.Index(offset:in:) is not in the proposal anymore, since this is now an emergency fix for 5.0. Even with the originally proposed API, this code doesn't make any sense. What are you trying to do?
I agree that indices are the wrong level for the majority of people programming against String, and it's a shame people still have to do it. Personally, I don't think String.Index makes much sense, but it's what we have right now and there are bugs in the wild being exposed in Swift 5.0.
Back to the proposal:
The behavior of encodedOffset shifted in a way that exposes bugs in existing misuses of it, and there is no way to recover the original behavior without access to the string content (even though this is still amortized O(1)). Another way of saying that is that SE-0180 had a bug. This proposal deprecates the encodedOffset pair, and replaces it with a semantics-preserving alternative utf16Offset:in: pair, which explicitly specifies what the user is asking for.
Is the problem being addressed significant enough to warrant a change to Swift?
Yes, there is definitely a need for convenient-to-use functions for converting between String.Index and a utf16 offset.
There is nothing wrong with the implementation of the two methods in the detailed design, but I feel they are declared on the wrong type. In Swift the convention is that the collection type is responsible for constructing indices. These two functions are simply convenience functions built on the existing collection index functions. Therefore, they should be in an extension on String.UTF16View. This avoids polluting the String.Index namespace with a bunch of methods of unrelated encodings.
Does this proposal fit well with the feel and direction of Swift?
Sceptical. It is okay-ish only considering Swift 5, but it doesn't make much sense for the future when adding similar functions for utf8 and other encodings.
How much effort did you put into your review? A glance, a quick reading, or an in-depth study?
In-depth study of proposal and previous string index proposals.
Alternative proposal
extension String.UTF16View {
/// Returns the UTF-16 code unit offset corresponding to the index, as measured from the views startIndex
public func codeUnitOffset(_ index: String.Index) -> Int {
return distance(from: startIndex, to: index)
}
/// Creates a new index at the specified UTF-16 code unit offset
///
/// - Parameter offset: An offset in UTF-16 code units.
public func index(codeUnitOffset offset: Int) -> String.Index {
let (start, end) = (startIndex, endIndex)
guard offset >= 0,
let idx = index(start, offsetBy: offset, limitedBy: end)
else {
return end.nextEncoded
}
return idx
}
}
Example usage:
let string = "foo"
let index = string.utf16.index(codeUnitOffset: 2)
let codeUnitOffset = string.utf16.codeUnitOffset(index)
We already have this. Use the pre-existing Collection.distance(from:to:) to calculate an offset from the start index, and Collection.index(_:offsetBy:) to calculate an index from an offset.
extension Collection {
public func distance(to i: Index) -> Int {
return distance(from: startIndex, to: i)
}
public func index(offsetBy offset: Int) -> Index {
return index(startIndex, offsetBy: offset)
// Could use index(_:offsetBy:limitedBy:) and return Optional,
// or even allow negative offsets from the endIndex, but those
// arenât my preference.
}
}
(If protocols permitted default arguments, these could be default arguments instead. But they donât, so they canât.)
Modifying this so that something like str.utf8[10..<25, in: \.self] slices the UTF-8 view by character offsetsâor, for that matter, str.indices[10..<25, in: \.self] gives you a slice of the stringâs indicesâis left as an exercise for the reader. (Or me if I get bored late at night again.)
The change feels to me like it replaces something broken with something less broken. There is no time to get âthe real fixâ for Swift 5 and I fear that it will not come afterwards if it is ânot that broken anymoreâ and âalready got changed twiceâ.
So itâs not that I am opposing this change but itâs more that I do not want this change to block the future change I would like to see.