Is it okay to ignore overflow when incrementing/decrementing Collection indexes?

I've been wondering about whether or not is generally safe to ignore overflow when incrementing/decrementing Collection indexes. I'm quite sure that it is, but I'd like to check.

My understanding has been that Collection doesn't require invalid operations to necessarily trap at runtime: only to produce predictable results. So incrementing a Collection's endIndex is allowed to keep returning endIndex forever, or it might return some sentinel "invalid" index, or it might trap, or loop back to startIndex... if that's what the implementors want to do. By extension, no part of Collection's documented requirements require a conformer to trap if subscripted with an invalid index; a custom Collection of UInt8s might decide to return 0, 1, 2, or any arbitrary value in that case.

I've seen this come up in various threads over the years, and the prevailing wisdom is always that Collection doesn't require you to trigger runtime errors.

The caveat to all of this is that you must not compromise memory safety: so if you use the index as, say, an offset in to an UnsafeBufferPointer, you must bounds-check that access first. That's the thing that provides safety; not overflow checks.

If we look at how Array is implemented, we see that it does something similar to this: index(after:) will keep returning ever-greater integers, even beyond the bounds of the Array - which does surprise some users, but is not considered unsafe because all accesses are bounds-checked.

So am I correct in saying that if Array (or UnsafeBufferPointer, or any custom Collection) were to use wrapping addition for its indexes, and not fail on overflow, that it would not compromise memory safety as long as accesses are bounds-checked? Is that a principle which can scale to the general case?

The "Advanced Operators" section of TSPL appears to indicate that signed integer overflow is defined in Swift - so AFAICT there should be no problem with this. Such a Collection would not invoke undefined behaviour, not produce unpredictable results, and not access any memory it shouldn't. Seems okay to me.

My next question is: assuming that this behaviour is safe, why don't any types in the standard library do it? Array, and even the UnsafeBufferPointer types, all use addition which can trap on overflow.

Yes: you violate the contract of the Collection API when you increment the index beyond the bounds of the Collection. What happens after that point is unspecified, and may include integer overflow.

My guess is that the answer is that + is easier to spell than &+, and it requires positive mental effort to confirm to yourself that &+ is the right thing to do. As you might remember, I removed some checked arithmetic from the buffer pointer rebasing initializer recently. I think similarly-motivated patches in other places would likely be accepted.

3 Likes

Indeed @lorentey nearly says as much in a comment on that issue:

  1. Collection.distance(from:to:) is only defined to operate on indices that are valid in the collection value at the time distance is called. Crucially, there is no requirement for distance to diagnose invalid indices -- which is why it is okay for e.g. Array.distance(from:to:) not to perform bounds checking on its input. Checking is still preferable to no checking, but performance concerns may override this, and in the case of U[M]BP , they probably should.

    If start < end , then ideally base.distance(from: start, to: end) wouldn't return a negative value. But if start and end are invalid indices, then comparing them or calculating their distance are meaningless operations -- so it is probably okay for them to return inconsistent results (such as when the subtraction overflows in distance ).

3 Likes