Generally, I try to define all my string processing code on Substring (or StringProtocol) to avoid forcing constant reallocations of the string buffer. This usually works fine, but on methods like this:
import Foundation  // for CharacterSet
func firstWord(in str: String) -> Substring {
    let index = str.unicodeScalars.firstIndex {
        CharacterSet.whitespacesAndNewlines.contains($0)
    } ?? str.unicodeScalars.endIndex
    // The character set should only match the first UnicodeScalar in a Character
    let strIndex = index.samePosition(in: str)!
    return str[..<strIndex]
}
it breaks if I attempt to change the input type to a Substring (and breaks even more if I attempt to accept StringProtocol). Substring.unicodeScalars is a String.UnicodeScalarView, which uses String.UnicodeScalarView.Index as its index. The samePosition(in:) method on these indexes doesn't work with Substring.
Possible Solutions
Add a samePosition(in: Substring) method to String.*View.Index. This would (I assume) be the easiest change, but I would guess this would be much harder to implement on StringProtocol if support for that is wanted.
Move samePosition to StringProtocol, as an index(samePositionAs: Index) method. This would allow all StringProtocol members to provide their own implementations of the operation for their own *View Index types, but is a much bigger change than the first option.
Something else. These are the two solutions I could think of; if you can think of another, please share it.
That still doesn't work because samePosition(in:) is only defined for String, String.UnicodeScalarView, String.UTF16View, and String.UTF8View. It's not generic at all.
String's index interchangeability doesn't seem to be reflected in StringProtocol currently, so we add those constraints ourselves. When the future of StringProtocol is more certain, we can add these constraints directly on the protocol.
edit: Added firstWord5, which shows that we don't need the S.Index == String.Index constraint
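For readers on a current toolchain, here is a minimal sketch of a first-word function that accepts both String and Substring. It is not the code referred to in the post above. It assumes Swift 5 or later, where StringProtocol already requires Index == String.Index, and it swaps CharacterSet for Character.isWhitespace so that it never leaves the character view:

```swift
// Sketch only: generic over StringProtocol, with no extra constraints needed
// in Swift 5+. Uses Character.isWhitespace rather than CharacterSet.
func firstWord<S: StringProtocol>(in str: S) -> S.SubSequence {
    // Searching the character view yields an index that is already
    // character-aligned, so no samePosition(in:) conversion is required.
    let end = str.firstIndex(where: { $0.isWhitespace }) ?? str.endIndex
    return str[..<end]
}

let fromString = firstWord(in: "hello world")
let fromSubstring = firstWord(in: "hello world".dropFirst(2))
```

This sidesteps the index conversion rather than solving it, so it only helps when the predicate can be expressed on Character.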
Ahh I see. Either way, this is just a method to do what I did without using samePosition(in:) (which I didn't realize would compile). What if you wanted to do this:
func prefix(of str: String, until condition: (UnicodeScalar) -> Bool) -> Substring {
    var index = str.unicodeScalars.firstIndex(where: condition) ?? str.unicodeScalars.endIndex
    // Step back one scalar at a time until the index lands on a Character boundary.
    while true {
        if let characterIndex = index.samePosition(in: str) {
            return str[..<characterIndex]
        }
        index = str.unicodeScalars.index(before: index)
    }
}
I guess when simplifying my use of the function for my example I accidentally made it into something that didn't actually need to use samePosition(in:), but the fact that samePosition(in:) doesn't work with Substring still seems to be an issue.
Ah, so you want to know when the index is grapheme-aligned, that is, the index is a member of str.indices. Yes, this is a gap in Substring.
Off the top of my head, I can't think of a reason why this couldn't be defined for Substring. There might be some more effort involved in having this be present on StringProtocol, as StringProtocol might need further constraints on its associated types.
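Until something like this exists in the standard library, the "is this index a member of indices" check can be approximated by hand. The following is a rough sketch under the assumption that a linear scan is acceptable; characterIndex(atOrBefore:) is a hypothetical name, not an existing API:

```swift
extension StringProtocol {
    /// Hypothetical helper (not a standard library API): returns the last
    /// character boundary of `self` that is at or before `i`.
    /// Runs in O(n) because it walks `indices` from the start.
    func characterIndex(atOrBefore i: Index) -> Index {
        if i >= endIndex { return endIndex }
        var result = startIndex
        for candidate in indices {
            if candidate > i { break }
            result = candidate
        }
        return result
    }
}

let base = "ae\u{301}b"            // "a", "e" + combining acute, "b"
let sub = base.dropFirst()         // Substring covering "e" + accent + "b"
let subScalars = sub.unicodeScalars
// An index pointing at U+0301, in the middle of the first character of `sub`.
let accent = subScalars.index(after: subScalars.startIndex)
let aligned = sub.characterIndex(atOrBefore: accent)
```

Because the helper is defined on StringProtocol and only consults the receiver's own indices, it works the same way for String and Substring.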
Sorry to resurrect this thread, but it seems that String.Index.samePosition(in:) still does not offer a way to do index conversion on Substring views, even though Substring uses String.Index. Has this gap been closed in another way?
Swift 5.7 made an important step in the right direction by having Substring.index(before:) and Substring.index(after:) automatically and implicitly round the supplied index down to the nearest character boundary within the substring.
I think it would make sense to provide this rounding-down operation as explicit methods on String and Substring. SE-0180 called this out as important future work; it’s time we followed up on that.
Note: Character boundaries within a Substring do not necessarily match character boundaries within its base string — so we cannot use the String-based methods to infer character positions within substrings: we need dedicated methods for rounding down indices within a substring.
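A small illustration of that note, assuming Swift 5.7 or later (the variable names are made up for the example). A substring that starts in the middle of one of the base string's characters has a start index that is a character boundary in the substring but not in the base:

```swift
let composed = "e\u{301}"          // a single Character: "e" + U+0301
let composedScalars = composed.unicodeScalars
let accentIndex = composedScalars.index(after: composedScalars.startIndex)

// A substring that begins at the combining accent, that is, inside the
// base string's only character.
let tail = Substring(composedScalars[accentIndex...])

// The same index is a character boundary in `tail` but not in `composed`.
let isBoundaryInBase = composed.indices.contains(tail.startIndex)
let isBoundaryInTail = tail.indices.contains(tail.startIndex)
```

This is why a String-based rounding method cannot simply be reused for substrings.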