String to NSString to String invalidates Range<String.Index>

Jesse_Grosjean · September 24, 2019, 7:10pm

When I try this I get a crash on the last line:

import Foundation // needed for NSString bridging

let s = "😀"
let range = s.startIndex..<s.endIndex
let _ = ((s as NSString) as String)[range] // crashes

Is this expected to crash? And if so can someone tell me why or point me to an explanation? I'm trying to move to using Swift native ranges instead of NSRange and this crash really surprised me!

Thanks,
Jesse

xwu · September 24, 2019, 7:24pm

String indices are valid only for the original, unmodified instance. (s as NSString) as String is not the same as s, so it is invalid to subscript with indices of s.

Specifically, bridging to NSString changes how the data is stored in memory. But the details do not really matter; any use of the indices of one string to subscript another is liable to fail in any future version of Swift even if it does not today. This includes even:

let s = "abc"

let t = "abc"
t[s.startIndex..<s.endIndex] // invalid, may crash

let u = s
u[s.startIndex..<s.endIndex] // invalid, may crash
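
If a range really does have to move from one string to another, one option is to carry it as offsets and rebuild it against the target. A minimal sketch, where `translate` is a hypothetical helper (not a standard library API) and offsets are counted in Characters, so it only makes sense when both strings have the same contents:

```swift
// Hypothetical helper: re-creates `range` (taken from `source`) inside `target`
// by measuring Character offsets instead of reusing the indices directly.
func translate(_ range: Range<String.Index>,
               from source: String,
               to target: String) -> Range<String.Index>? {
    let start = source.distance(from: source.startIndex, to: range.lowerBound)
    let length = source.distance(from: range.lowerBound, to: range.upperBound)
    guard let lower = target.index(target.startIndex, offsetBy: start,
                                   limitedBy: target.endIndex),
          let upper = target.index(lower, offsetBy: length,
                                   limitedBy: target.endIndex)
    else { return nil } // target is too short for this range
    return lower..<upper
}

let s = "a😀b"
let t = String(decoding: Array(s.utf8), as: UTF8.self) // separate instance, equal contents
let range = s.index(after: s.startIndex)..<s.endIndex

if let moved = translate(range, from: s, to: t) {
    print(t[moved]) // 😀b
}
```

Measuring distances is O(n) on String, so this is a trade of speed for safety rather than a free conversion.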

Jesse_Grosjean · September 24, 2019, 7:37pm

Thanks for your help.

I see very little about the possibility of indices becoming invalid on the String documentation page. I would have expected it to be highlighted more, but it's good to know now. I guess I'll just go back to using NSRanges... but I was trying to become modern!

suyashsrijan · September 24, 2019, 7:45pm

See NSString Indices invalidated potentially after it has been bridged to String and manipulated

cukr · September 24, 2019, 7:49pm

Note that this is true for Collections in general, not just Strings: you cannot use indices from one instance on another. Here's an example with ArraySlice:

let arr = [1, 2, 3, 1, 2, 3]
let slice1 = arr[..<3]
let slice2 = arr[3...]

print(slice1 == slice2) // true

let idx = slice1.startIndex
print(slice1[idx]) // 1
print(slice2[idx]) // Fatal error: Index out of bounds

or more similar to your original example:

let s = [1, 2, 3, 1, 2, 3][3...]
let range = s.startIndex..<s.endIndex // 3..<6
let _ = ArraySlice(Array(s))[range] // crashes: the new slice's indices are 0..<3

Jesse_Grosjean · September 24, 2019, 8:08pm

Ok, that explains why it isn't more highlighted in String docs, thanks.

Lantua · September 24, 2019, 9:08pm

Why would this fail? AFAICT the assigner and assignee of a value type should be indistinguishable from one another.

zwaldowski · September 24, 2019, 11:33pm

NSString isn’t strictly needed to be modern, either. In particular, you may want to look at this initializer to work with NSRange. We may also be able to help better if you want to give a more specific example of what you’re trying to modernize!
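
A minimal sketch of the kind of conversion this points at, assuming Foundation's `Range(_:in:)` and `NSRange(_:in:)` initializers are the ones meant (the link target is not shown here, so that choice is an assumption):

```swift
import Foundation

let text = "😀 hello"
// NSRange counts UTF-16 code units: the emoji takes 2, the space 1.
let nsRange = NSRange(location: 3, length: 5)

// Failable: returns nil if the NSRange does not fit inside `text`.
if let range = Range(nsRange, in: text) {
    print(text[range])                       // hello
    let roundTrip = NSRange(range, in: text) // back to UTF-16 offsets
    print(roundTrip == nsRange)              // true
}
```

Both conversions take the string as an argument, so the resulting indices belong to that exact instance.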

TellowKrinkle · September 25, 2019, 2:17am

For that last one, wouldn't that include usage of substring indices on their original strings, like

let s = "abcacd"
let nextA = s.dropFirst().firstIndex(of: "a")!
s[..<nextA]

which is how I thought you were supposed to do this kind of thing (getting indices of, or searching within, subsections of a string for use with the whole string), and is the whole reason we don't have APIs everywhere that take a range to operate on.

If you're not supposed to do this kind of thing, what should you do instead?

Lantua · September 25, 2019, 4:55am

Note, though, that a SubSequence shares indices with its corresponding Collection.

So a valid ArraySlice index will also be valid for the corresponding Array, and a valid Array index will be valid for the corresponding ArraySlice if it falls within the slice's range.

That’s how we make slicing collections very cheap.
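
A small sketch of that sharing, using only standard library calls (the values are made up for illustration):

```swift
let arr = [10, 20, 30, 40, 50]
let slice = arr[2...]            // a view into arr, keeping arr's index space

print(slice.startIndex)          // 2, not 0
print(slice[2] == arr[2])        // true

// The same holds for Substring and its base String.
let s = "abcacd"
let tail = s.dropFirst()         // Substring
if let i = tail.firstIndex(of: "a") {
    print(s[..<i])               // abc
}
```

This only covers a slice and the collection it was taken from; an equal but separately created collection is a different case.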

xwu · September 25, 2019, 10:40am

As explained in the previous thread, to implement index invalidation detection, a future version of String.Index could mix in bits of the address. Therefore, two bitwise identical values could nonetheless have incompatible indices. (Now, String is a copy-on-write type, so the address could remain the same, but IIUC, there’s no guarantee of such.)

Substrings share indices with their parent strings.

Jesse_Grosjean · September 25, 2019, 12:07pm

I'm calling a Rust API and getting back items. Each item has a title that comes in as UTF-8 encoded bytes, and highlight ranges that come in as UTF-8 offset pairs representing ranges in that text.

I will display these items in an NSTableView with the ranges highlighted, so my end target is to get the UTF-8 bytes into an NSAttributedString and get the offsets into NSRanges so that I can add highlight ranges to the attributed string.

Right now I'm converting the Rust data over to these structs:

struct Item {
  let text: String
  let highlights: [NSRange]
}

And they are the "Items" I return from my NSOutlineView data source. It's working fine as is, but I thought it would be more in the Swift style to use Swift's native Range instead of NSRange... so that's why I was trying to store the highlights as [Range<String.Index>] and ran into the problems described above.

Anyway, I'm happy enough using NSRange for storage... it's what I need in the end to build up the NSAttributedString anyway.
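
For reference, a rough sketch of the offset conversion I mean. The helper name `nsRange(utf8Start:utf8End:in:)` is made up for illustration, and it assumes each pair is a half-open (start, end) byte offset into the title:

```swift
import Foundation

// Hypothetical helper: converts a half-open pair of UTF-8 byte offsets
// into an NSRange (UTF-16 code units) over the same text.
func nsRange(utf8Start: Int, utf8End: Int, in text: String) -> NSRange? {
    let utf8 = text.utf8
    guard utf8Start >= 0, utf8Start <= utf8End,
          let lower = utf8.index(utf8.startIndex, offsetBy: utf8Start,
                                 limitedBy: utf8.endIndex),
          let upper = utf8.index(lower, offsetBy: utf8End - utf8Start,
                                 limitedBy: utf8.endIndex),
          // Reject offsets that land in the middle of a multi-byte scalar.
          lower.samePosition(in: text.unicodeScalars) != nil,
          upper.samePosition(in: text.unicodeScalars) != nil
    else { return nil }
    return NSRange(lower..<upper, in: text)
}

let title = "a😀b"
if let r = nsRange(utf8Start: 1, utf8End: 5, in: title) {
    print(r.location, r.length) // 1 2
}
```

Returning nil for out-of-bounds or mid-scalar offsets keeps bad input from turning into a crash later, when the attributed string is built.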

SDGGiesbrecht · September 25, 2019, 7:21pm

You need to be dead sure no middleman between the two APIs is treating the communications as text and possibly subjecting it to normalization. Otherwise you need to be prepared for the UTF‐8 offsets to be incorrect or even out of bounds.
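
A short illustration of why normalization matters for byte offsets, using two canonically equivalent spellings of the same character (the example strings are chosen for illustration):

```swift
let composed = "\u{E9}"        // é as a single scalar (NFC form)
let decomposed = "e\u{301}"    // e followed by a combining acute accent (NFD form)

print(composed == decomposed)  // true: String compares by canonical equivalence
print(composed.utf8.count)     // 2
print(decomposed.utf8.count)   // 3
```

The two strings compare equal, yet every UTF-8 offset after the accented character differs by one byte between them.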