NSString Indices invalidated potentially after it has been bridged to String and manipulated

Has index invalidation detection ever been explored? A new version: Int could be added to String (well, its StringGuts), and String.Index.

  • When an index is created, its version is set to that of the string from which it was made.
  • Index-invalidating operations like replaceSubrange will increment the string's version
  • When an index is attempted to be used, its version is compared to that of the string, and an error is thrown if the two are different.

This is similar to the AbstractList.modCount in Java, which is used to detect modifications of a list that's undergoing iteration, throwing a ConcurrentModificationException.

This would make String and String.indices instances bigger, which would obviously have a negative effect on memory use and cache efficiency, but the correctness it buys might be worth it. Thoughts?

It wouldn't make String.Index bigger; we have seven reserved bits in the index encoding precisely for uses like this.

1 Like

How would we know what string it was made from?

As Steve mentioned, we have bits available to us (potentially a couple more if we're really careful about backwards deployment, or less if we fix SE-0180). We can store a tiny hash in those bits which should successfully flag mis-uses the majority of the time. All-zeroes skips the check for back/forwards compatibility.

For native large strings, we can mix in relevant bits from the object address with a mutation count. Foreign strings would defer to the foreign string itself, and for NSString it would be just the address (they're not mutated). For small strings, we could mix in the small count with a quick in-register mixing of the contents.

It's possible that some special indices would continue to not be checked, such as startIndex and endIndex.

1 Like

Cool. So was an idea like this ever considered? What are your thoughts on it?

Yes, the standard library team has considered it in the past. @Michael_Ilseman and @lorentey would know more. There are some non-trivial performance tradeoffs to consider, but I think it's likely that we'll do something along these lines eventually.

Thank you for all your replies, especially for the example that @Michael_Ilseman provided.

Here is my another question, if you are interested, please help.

String.UnicodeScalarView.Index issue

As @scanon said, the answer is "yes", and indeed some data structures do attempt to detect index invalidation where the performance cost is not too high. For example, Set and Dictionary in the standard library attempt to detect index invalidation (there's a description of the scheme, with the implementation using the age field in the Index itself).

SwiftNIO's CircularBuffer attempts this as well, though for performance reasons we didn't attempt a perfect invalidation detection scheme, just a probabilistic one, and we only check in debug mode. CircularBuffer.Index stores three bytes of data that correspond to the (clipped) size of the backing storage. This is sufficient for us because we only invalidate the indices when we resize the backing storage, and we are likely to detect that here.

I think in general this is another thing that well-designed Collection objects may want to attempt, though retrofitting this to the standard library is not likely to be easy in many cases.

1 Like

Thanks for sharing all that.

As a safeguard against using invalid indices, Set and Dictionary maintain a mutation counter in their storage header (_age). This counter gets bumped every time an element is removed and whenever the contents are rehashed. Native indices include a copy of this counter so that index validation can verify it matches with current storage.

This is precisely the idea I "came up with independently" (i.e. it was an idea that occurred to me, having not already seen that it exists). I take it as a sign that I'm growing and starting to really understand things. It's super cool to see it in practice!