import Foundation  // needed for NSString
let s = "😀"
let range = s.startIndex..<s.endIndex
let _ = ((s as NSString) as String)[range] // crashes
Is this expected to crash? And if so, can someone tell me why or point me to an explanation? I'm trying to move to Swift's native ranges instead of NSRange, and this crash really surprised me!
String indices are valid only for the original, unmodified instance. (s as NSString) as String is not the same as s, so it is invalid to subscript with indices of s.
Specifically, bridging to NSString changes how the data is stored in memory. But the details do not really matter; any use of the indices of one string to subscript another is liable to fail in any future version of Swift even if it does not today. This includes even:
let s = "abc"
let t = "abc"
t[s.startIndex..<s.endIndex] // invalid, may crash
let u = s
u[s.startIndex..<s.endIndex] // invalid, may crash
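When you genuinely need "the same position" in a different string with the same contents, one option is to carry it across as an integer offset instead of reusing the index. The following is a minimal sketch: `translate` is a hypothetical helper name, and the UTF-8 view is an arbitrary choice (any view works as long as both sides use the same one). It only makes sense when the two strings' contents match up to the positions involved.

```swift
/// Hypothetical helper: rebuilds in `target` the position that `index` denotes in `source`
/// by measuring an integer offset in `source`'s UTF-8 view and re-applying it to `target`'s.
/// Returns nil if the offset lies past the end of `target`.
func translate(_ index: String.Index, from source: String, to target: String) -> String.Index? {
    let offset = source.utf8.distance(from: source.utf8.startIndex, to: index)
    return target.utf8.index(target.utf8.startIndex, offsetBy: offset, limitedBy: target.utf8.endIndex)
}

/// Same idea for a range: translate both bounds independently.
func translate(_ range: Range<String.Index>, from source: String, to target: String) -> Range<String.Index>? {
    guard let lower = translate(range.lowerBound, from: source, to: target),
          let upper = translate(range.upperBound, from: source, to: target) else { return nil }
    return lower..<upper
}
```

The resulting indices were produced by `target` itself, so subscripting `target` with them is valid.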
I see very little about the possibility of indices becoming invalid on the String documentation page. I would expect it to be more prominent, but it's good to know now. I guess I'll just go back to using NSRange... but I was trying to become modern!
Note that the fact that you cannot use indices from one instance on another is true for Collections in general, not just Strings. Here's an example with ArraySlice:
let arr = [1, 2, 3, 1, 2, 3]
let slice1 = arr[..<3]
let slice2 = arr[3...]
print(slice1 == slice2) // true
let idx = slice1.startIndex
print(slice1[idx]) // 1
print(slice2[idx]) // Fatal error: Index out of bounds
Or, closer to your original example:
let s = [1, 2, 3, 1, 2, 3][3...]
let range = s.startIndex..<s.endIndex
let _ = ArraySlice(Array(s))[range] // crashes: the copy's indices are 0..<3, not 3..<6
NSString isn’t strictly needed to be modern, either. In particular, you may want to look at this initializer to work with NSRange. We may also be able to help better if you want to give a more specific example of what you’re trying to modernize!
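For reference, Foundation's `NSRange(_:in:)` and `Range(_:in:)` initializers convert between NSRange (which counts UTF-16 code units) and `Range<String.Index>` for a specific string. A minimal sketch with an arbitrary sample string:

```swift
import Foundation

let text = "😀 hello"

// Swift range -> NSRange. NSRange counts UTF-16 code units, and "😀" takes two of them,
// so "hello" starts at location 3 (two units for the emoji plus one for the space).
let swiftRange = text.firstIndex(of: "h")!..<text.endIndex
let ns = NSRange(swiftRange, in: text)

// NSRange -> Swift range. This initializer is failable: it returns nil
// if the NSRange does not fit inside `text`.
if let back = Range(ns, in: text) {
    print(text[back])  // hello
}
```

Converting at the boundary like this lets you keep NSRange wherever an AppKit/Foundation API wants it, without subscripting one string with another string's indices.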
For that last one, wouldn't that rule out using substring indices on their original strings, like:
let s = "abcacd"
let nextA = s.dropFirst().firstIndex(of: "a")!
s[..<nextA]
That is how I thought you were supposed to do this kind of thing (finding indices in, or searching, a subsection of a string and then using them with the whole string), and it's the whole reason we don't have APIs that take ranges for every operation.
If you're not supposed to do this kind of thing, what should you do instead?
Note, though, that a SubSequence shares indices with the collection it was sliced from.
So a valid ArraySlice index is also valid for the corresponding Array, and a valid Array index is valid for a corresponding ArraySlice as long as it lies within the slice's bounds.
That’s how we make slicing collections very cheap.
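A short illustration of that sharing with Array and ArraySlice (the values are arbitrary):

```swift
let arr = [10, 20, 30, 40, 50, 60]
let slice = arr[2..<5]  // [30, 40, 50]; slice.indices is 2..<5, not 0..<3

// An index found in the slice is also an index into the base array...
let i = slice.firstIndex(of: 40)!  // 3
print(arr[i])  // 40

// ...and an Array index within the slice's bounds works on the slice.
print(slice[4])  // 50

// A fresh Array copy starts at 0 again, so convert explicitly by subtracting startIndex.
let copy = Array(slice)
print(copy[i - slice.startIndex])  // 40
```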
As explained in the previous thread, to implement index invalidation detection, a future version of String.Index could mix in bits of the storage address. Therefore, two bitwise-identical string values could nonetheless have incompatible indices. (Now, String is a copy-on-write type, so a copy could keep the same address, but IIUC, there's no guarantee of that.)
Substrings share indices with their parent strings.
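So searching a Substring and then using the result on the whole String, as in the `s.dropFirst().firstIndex(of: "a")!` example, is valid. A sketch of the same pattern in a loop; `indices(of:in:)` is a hypothetical helper that finds every occurrence of a character by repeatedly searching the remaining substring:

```swift
/// Hypothetical helper: returns the index of every occurrence of `character` in `text`.
/// Each search runs on a Substring (`text[position...]`); because a Substring shares
/// indices with its parent String, the indices it returns are valid in `text`.
func indices(of character: Character, in text: String) -> [String.Index] {
    var result: [String.Index] = []
    var position = text.startIndex
    while let found = text[position...].firstIndex(of: character) {
        result.append(found)
        position = text.index(after: found)
    }
    return result
}

let s = "abcacd"
let found = indices(of: "a", in: s)
print(found.map { s.distance(from: s.startIndex, to: $0) })  // [0, 3]
print(s[..<found[1]])  // abc
```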
I'm calling a Rust API and getting back items. Each item has a title that comes in as UTF-8 bytes, and highlight ranges that come in as pairs of UTF-8 offsets into that text.
I will display these items in an NSTableView with the ranges highlighted, so my end goal is to get the UTF-8 bytes into an NSAttributedString and the offsets into NSRanges so that I can add highlights to the attributed string.
Right now I'm converting the Rust data into this struct:
import Foundation  // for NSRange
struct Item {
    let text: String
    let highlights: [NSRange]  // ranges into `text`, used to build the attributed string
}
These are the items I return from my NSOutlineView data source. It works fine as is, but I thought it would be more Swift-like to use Swift's native Range instead of NSRange, which is why I tried storing the highlights as [Range<String.Index>] and ran into the problem described above.
Anyway, I'm happy enough using NSRange for storage; it's what I need in the end to build the NSAttributedString anyway.
You need to be dead sure that no middleman between the two APIs treats the data as text and possibly subjects it to Unicode normalization. Otherwise, you need to be prepared for the UTF-8 offsets to be incorrect or even out of bounds.
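Keeping that caution in mind, here is a minimal sketch of turning the Rust side's UTF-8 offset pairs into the NSRange values the Item struct stores. NSRange counts UTF-16 code units, so raw UTF-8 byte offsets cannot be used directly once the text contains non-ASCII characters. Assumptions: each pair is a half-open (start, end) byte range into the title bytes, and `makeItem` and `nsRange(utf8Start:utf8End:in:)` are hypothetical names. Pairs that are reversed, out of bounds, or split a Unicode scalar are dropped instead of trapping.

```swift
import Foundation

// Mirrors the Item struct from the post.
struct Item {
    let text: String
    let highlights: [NSRange]
}

/// Hypothetical helper: converts a half-open UTF-8 byte-offset pair into an NSRange
/// (UTF-16 based). Returns nil if the pair is reversed, out of bounds,
/// or does not fall on Unicode scalar boundaries.
func nsRange(utf8Start: Int, utf8End: Int, in text: String) -> NSRange? {
    let utf8 = text.utf8
    guard 0 <= utf8Start, utf8Start <= utf8End,
          let lower = utf8.index(utf8.startIndex, offsetBy: utf8Start, limitedBy: utf8.endIndex),
          let upper = utf8.index(lower, offsetBy: utf8End - utf8Start, limitedBy: utf8.endIndex),
          // samePosition(in:) returns nil for an index in the middle of a scalar.
          lower.samePosition(in: text.unicodeScalars) != nil,
          upper.samePosition(in: text.unicodeScalars) != nil
    else { return nil }
    return NSRange(lower..<upper, in: text)
}

/// Hypothetical glue: builds an Item from raw title bytes and (start, end) UTF-8 offset pairs.
func makeItem(titleBytes: [UInt8], highlightOffsets: [(Int, Int)]) -> Item {
    // Invalid UTF-8 is replaced with U+FFFD (3 bytes), which shifts later byte offsets;
    // validate the bytes upstream if that can happen.
    let text = String(decoding: titleBytes, as: UTF8.self)
    let highlights = highlightOffsets.compactMap { nsRange(utf8Start: $0.0, utf8End: $0.1, in: text) }
    return Item(text: text, highlights: highlights)
}
```

Dropping bad pairs keeps the table view from crashing on unexpected input; returning an error instead would make such mismatches easier to notice.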