Behavior of index(after:)

The documentation for func index(after i: Index) -> Index states that:

Parameters:
i: A valid index of the collection. i must be less than endIndex.

It does not, however, state what should happen when the method is called with endIndex, and the behavior seems to differ depending on the concrete collection type.

For example:

let string = "Hello, World!"
let pastEndIndex = string.index(after: string.endIndex)

... causes a runtime error:

error: Execution was interrupted, reason: EXC_BAD_INSTRUCTION

On the other hand:

let array = [1, 2, 3, 4]
let pastEndIndex = array.index(after: array.endIndex)

... executes without problems and even produces the "pseudo-correct" result that pastEndIndex == 5.


To me this looks like undefined behavior, which we tend to avoid in Swift. I feel like String's behavior is good, because it enforces the semantics of index(after:) in a sensible way. And I think Array's implementation should take the same approach of trapping on invalid inputs.

Any thoughts on this?


This could be one of the behaviours that can only reasonably be enforced on a best-effort basis for an arbitrary collection, like only using an index with the collection it came from, or index invalidation on mutation. For some collections, for example, it could be too expensive to perform this check, though I don't have any examples to hand. In general, you need to be careful not to misuse collection indices.

For Array specifically, it could probably trap in this case (at least in -Onone), but forming an invalid Array index isn't particularly dangerous, because the failure simply happens later, when you try to use it. Similar situations occur when calculating an Array index manually: an expression like .endIndex + 1 - length / 2 might pass through an invalid intermediate value but still end up at a valid index.
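A minimal sketch of both points, using made-up values: because Array's Index is a plain Int, index(after:) amounts to adding 1 without a bounds check, and only the subscript enforces the bounds.

```swift
// Array.Index is Int, so forming an out-of-range index is just integer arithmetic.
let numbers = [10, 20, 30, 40]

// Array's index(after:) does not check bounds; this yields endIndex + 1.
let pastEnd = numbers.index(after: numbers.endIndex)
print(pastEnd)  // 5

// An intermediate value can be out of range while the final index is valid.
let length = numbers.count
let intermediate = numbers.endIndex + 1   // 5: not a valid index on its own
let middle = intermediate - length / 2    // 5 - 2 = 3: a valid index
print(numbers[middle])  // 40

// Only *using* the invalid index traps, e.g. `numbers[pastEnd]` fails at run time.
```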

This has been discussed before, and as @jawbroken alludes to, it was decided to leave this up to concrete types.

It’s not undefined behavior; the behavior is knowable for any concrete type and the compiler does not assume that incrementing past the end can never happen.

The protocol simply does not place its own semantic requirements for what the concrete behavior should be in that case. As a general rule, protocols only guarantee what they guarantee; it is a non-goal to specify the behavior in all cases for all conforming types.
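Generic code that cannot know the concrete type's behavior can stay within what Collection does guarantee by letting the collection check the bound itself. One way, sketched below, is Collection's index(_:offsetBy:limitedBy:), which returns nil rather than stepping past the limit. The helper name nextIndex(after:in:) is hypothetical.

```swift
// Hypothetical helper: advances one position, or returns nil at endIndex,
// relying only on guarantees that Collection itself makes.
func nextIndex<C: Collection>(after i: C.Index, in collection: C) -> C.Index? {
    // index(_:offsetBy:limitedBy:) returns nil if the offset would pass `limit`.
    return collection.index(i, offsetBy: 1, limitedBy: collection.endIndex)
}

let string = "Hello, World!"
let array = [1, 2, 3, 4]

// Neither call traps or produces endIndex + 1; both return nil.
print(nextIndex(after: string.endIndex, in: string) as Any)  // nil
print(nextIndex(after: array.endIndex, in: array) as Any)    // nil

// From the last valid index, the result is endIndex itself.
print(nextIndex(after: array.index(before: array.endIndex), in: array) == array.endIndex)  // true
```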

It is reasonable, though, to ask whether a particular concrete type's behavior can be guaranteed when that type doesn't document it.


Undefined behavior means something quite specific in C-family languages. This is not undefined behavior. It is unspecified behavior at the protocol level, but that does not introduce memory unsafety in the way that undefined behavior would, and concrete types should have fully-specified behavior.

I think it is reasonable for the protocol to recommend, but not require, certain behavior in cases like these, merely as guidance to implementors of conforming types ("if you don't have a good reason to do something else, do this").
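As a sketch of what following such a recommendation might look like for an implementor, here is a hypothetical minimal collection (Countdown is made up for illustration, not a standard library type) whose index(after:) checks its precondition and traps, the way String does, instead of silently returning an out-of-range value the way Array does.

```swift
// Hypothetical collection counting down from `top` to 1.
struct Countdown: Collection {
    let top: Int

    var startIndex: Int { 0 }
    var endIndex: Int { top }

    subscript(position: Int) -> Int {
        precondition(position >= startIndex && position < endIndex, "Index out of range")
        return top - position
    }

    func index(after i: Int) -> Int {
        // Trap on an argument that does not refer to an element,
        // instead of silently returning endIndex + 1.
        precondition(i >= startIndex && i < endIndex, "Cannot advance past endIndex")
        return i + 1
    }
}

let countdown = Countdown(top: 3)
print(Array(countdown))  // [3, 2, 1]
// countdown.index(after: countdown.endIndex) would trap at run time.
```

The check costs a comparison per step, which is the kind of trade-off a concrete type would weigh when deciding whether to follow the recommendation.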


This works:

let string = "Hello, World!"
let arrayOfString = Array(string)
let pastEndIndex = arrayOfString.index(after: arrayOfString.endIndex) //14

The crux of the behaviour observed in the original post lies in the difference between a String and an Array. In my understanding, strings can be treated like arrays in some respects (.count, iterating over them, etc.), but not every function that can be used with arrays applies to strings in the same way.

Check this out from "A Swift Tour" (see link below):

Each String value has an associated index type, String.Index, which corresponds to the position of each Character in the string. In order to determine which Character is at a particular position, you must iterate over each Unicode scalar from the start or end of that String.

Use the startIndex property to access the position of the first Character of a String. The endIndex property is the position after the last character in a String. As a result, the endIndex property isn't a valid argument to a string's subscript. If a String is empty, startIndex and endIndex are equal.
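To illustrate the passage, a small sketch: String.Index is opaque, so positions are reached by walking from startIndex or endIndex with the index methods, and endIndex itself serves only as a bound.

```swift
let greeting = "Hello, World!"

// The first character lives at startIndex.
print(greeting[greeting.startIndex])  // H

// The last character is one step before endIndex; endIndex itself is not subscriptable.
let lastIndex = greeting.index(before: greeting.endIndex)
print(greeting[lastIndex])  // !

// Reaching the character at an offset means walking from startIndex.
let seventh = greeting.index(greeting.startIndex, offsetBy: 7)
print(greeting[seventh])  // W
```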
A Swift Tour

Of course, since there are so many cases where strings can be handled like arrays, as highlighted above, I completely understand why one might scratch one's head, as we all have to varying degrees.

It could be worded better, but the argument to index(after:) must be a current index for the collection that refers to a valid element. The endIndex value does not qualify for the latter, and is just as illegal an input as a random value completely outside the bounds of the indices. Using endIndex as an argument is a precondition violation; whether it actually triggers the Standard Library's run-time failure routines or runs some other wacky code instead doesn't matter, since you're not supposed to use it as an input to begin with!
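In practice, a hand-written traversal satisfies this precondition by comparing against endIndex before advancing, so index(after:) is only ever called with an index that refers to an element. A minimal sketch of that pattern, generic over any Collection (the function name is hypothetical):

```swift
// Hypothetical helper: collects every element by walking indices manually.
func elementsByWalking<C: Collection>(_ collection: C) -> [C.Element] {
    var result: [C.Element] = []
    var i = collection.startIndex
    // Compare against endIndex *before* advancing, so `i` always refers
    // to a valid element when it is passed to index(after:).
    while i != collection.endIndex {
        result.append(collection[i])
        i = collection.index(after: i)
    }
    return result
}

print(elementsByWalking("Swift"))    // ["S", "w", "i", "f", "t"]
print(elementsByWalking([1, 2, 3]))  // [1, 2, 3]
```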

endIndex is a valid result, though; it must be the result when the argument is the last-occurring element's index. (This means that it's a valid input argument to index(before:) for non-empty collections, where startIndex is an invalid input instead.)
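This relationship can be checked on any non-empty BidirectionalCollection: endIndex is a valid argument to index(before:), and stepping forward from the resulting last-element index yields endIndex again. A small sketch (the function name is hypothetical):

```swift
// Hypothetical check of the round trip between the last element's index and endIndex.
func endIndexRoundTrips<C: BidirectionalCollection>(_ collection: C) -> Bool {
    // For an empty collection endIndex == startIndex, so index(before:) would be invalid.
    guard !collection.isEmpty else { return false }
    let last = collection.index(before: collection.endIndex)     // endIndex as an input
    return collection.index(after: last) == collection.endIndex  // endIndex as a result
}

print(endIndexRoundTrips("Hello, World!"))  // true
print(endIndexRoundTrips([1, 2, 3, 4]))     // true
print(endIndexRoundTrips([Int]()))          // false
```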

Is there a reason why the Developer Documentation doesn't link to the Swift.org examples and documentation?