Allow `index(_:offsetBy:limitedBy:)` to return nil if limit is invalid

index(_:offsetBy:limitedBy:)

Returns an index that is the specified distance from the given index, unless that distance is beyond a given limiting index.

func index(
   _ i: Self.Index,
   offsetBy distance: Int,
   limitedBy limit: Self.Index
) -> Self.Index?

Parameters

i
A valid index of the collection.

distance
The distance to offset i. distance must not be negative unless the collection conforms to the BidirectionalCollection protocol.

limit
A valid index of the collection to use as a limit. If distance > 0, a limit that is less than i has no effect. Likewise, if distance < 0, a limit that is greater than i has no effect.

Note the specification of limit: it "has no effect" if it does not lie in the direction of travel from the starting index. I would argue that this behaviour is not useful, and I've found that it can force a suboptimal implementation, so I'd like to suggest that we change it to either:

a) Always return nil (since, technically, all results would be beyond the given limit).
b) Not specify the result in this case.
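
For context, the documented behaviour falls naturally out of a simple loop-based implementation. The sketch below is illustrative only (offsetForward is a made-up name, not the standard library's actual code): a forward walk can only stop early by landing exactly on the limit, so a limit behind the starting index is simply never encountered.

func offsetForward<C: Collection>(
  _ i: C.Index, by distance: Int, limitedBy limit: C.Index, in c: C
) -> C.Index? {
  precondition(distance >= 0, "this sketch only handles forward offsets")
  var i = i
  for _ in 0..<distance {
    if i == limit { return nil }  // The only early exit is stepping onto the limit.
    c.formIndex(after: &i)
  }
  return i
}

With this shape, offsetting index 2 by 3 with a limit of 1 never meets the limit and simply returns index 5, which matches the "has no effect" wording above.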

Concrete Example

I've been experimenting with a SIMD variant of Collection's contains function:

extension Collection where Element == UInt8 {

  @inlinable
  internal func contains_simd(_ element: UInt8) -> Bool {
    typealias Vector = SIMD8<Element>

    var i = startIndex

    // Compare scalarCount-sized chunks against a vector splatted with `element`.
    while let j = index(i, offsetBy: Vector.scalarCount, limitedBy: endIndex) {
      if any(Vector(self[i..<j]) .== Vector(repeating: element)) {
        return true
      }
      i = j
    }

    // Check any remaining tail elements one at a time.
    while i < endIndex {
      if self[i] == element { return true }
      formIndex(after: &i)
    }
    return false
  }
}
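
As a quick, hypothetical usage example (just to show the shape of the call; the test string is arbitrary):

let bytes: [UInt8] = Array("a reasonably long test string".utf8)
print(bytes.contains_simd(UInt8(ascii: "g")))  // true
print(bytes.contains_simd(0x00))               // false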

In particular, I've been looking at how well it performs when applied to my custom UnsafeBoundsCheckedBufferPointer type.

In the first instance, I implement index(_:offsetBy:limitedBy:) like this:

@inlinable
internal func index(_ i: UInt, offsetBy distance: Int, limitedBy limit: UInt) -> UInt? {
  if distance >= 0 {

    // All valid 'i' are <= Int.max, so this will not overflow.
    // An invalid 'i' is allowed to return a nonsense result.
    let result = i &+ UInt(distance)
    return result <= limit ? result : nil

  } else {

    // All valid 'i' are >= 0 and <= Int.max, so this will not underflow.
    let result = Int(bitPattern: i) &+ distance
    return result >= limit ? UInt(bitPattern: result) : nil
  }
}

This generates really nice assembly, which loads a chunk of bytes directly into a SIMD register with a single bounds check (and even that, I think, could probably be hoisted if #71919 were fixed):

.LBB2_9:
        add     r8, 8
        jb      .LBB2_13                     ; bounds check
        movq    xmm1, qword ptr [rsi + rdx]  ; single load into SIMD reg
        pcmpeqb xmm1, xmm0
        pmovmskb        eax, xmm1

Behaviour-wise, this is option (a):

// Incrementing, but limit < initial index.
buffer.index(2, offsetBy: 1, limitedBy: 1) // nil
buffer.index(2, offsetBy: 2, limitedBy: 1) // nil
buffer.index(2, offsetBy: 3, limitedBy: 1) // nil

// Subtracting, but limit > initial index.
buffer.index(2, offsetBy: -3, limitedBy: 3) // nil
buffer.index(2, offsetBy: -2, limitedBy: 3) // nil
buffer.index(2, offsetBy: -1, limitedBy: 3) // nil

And I think this is quite reasonable: if you're incrementing, you're saying you don't want the result to be greater than the limit. If the limit had no effect, the results (3, 4, 5) would indeed be greater than the limit (1), so returning nil is preferable, IMO.

If I were to implement the behaviour precisely as specified, so that limits in the wrong direction have no effect, I'd need to make the following changes:

@inlinable
internal func index(_ i: UInt, offsetBy distance: Int, limitedBy limit: UInt) -> UInt? {
  if distance >= 0 {

    // All valid 'i' are <= Int.max, so this will not overflow.
    // An invalid 'i' is allowed to return a nonsense result.
    let result = i &+ UInt(distance)
+   guard limit >= i else { return result }
    return result <= limit ? result : nil

  } else {

    // All valid 'i' are >= 0 and <= Int.max, so this will not underflow.
    let result = Int(bitPattern: i) &+ distance
+   guard limit <= i else { return UInt(bitPattern: result) }
    return result >= limit ? UInt(bitPattern: result) : nil
  }
}

The result is much worse assembly. Because the compiler cannot tell whether the limit actually had an effect, it must bounds-check every position before loading data into a SIMD register:

.LBB2_9:
        cmp     r8, -9
        ja      .LBB2_21  ; bounds check
        cmp     rax, rcx
        ja      .LBB2_22  ; bounds check
        lea     r8, [rdx - 3]
        cmp     r8, rax
        jae     .LBB2_23  ; bounds check
        lea     r8, [rdx - 2]
        cmp     r8, rax
        jae     .LBB2_23  ; bounds check
        lea     r8, [rdx - 1]
        cmp     r8, rax
        jae     .LBB2_23  ; bounds check
        cmp     rdx, rax
        jae     .LBB2_23  ; bounds check
        lea     r8, [rdx + 1]
        cmp     r8, rax
        jae     .LBB2_23  ; bounds check
        lea     r8, [rdx + 2]
        cmp     r8, rax
        jae     .LBB2_23  ; bounds check
        lea     r8, [rdx + 3]
        cmp     r8, rax
        jae     .LBB2_23  ; bounds check
        movq    xmm1, qword ptr [rsi + rdx - 4]  ; load into SIMD register
        pcmpeqb xmm1, xmm0
        pmovmskb        eax, xmm1

Moreover, I would argue this behaviour is not useful. Here's what the difference means in practice:

// Incrementing, but limit < initial index.
buffer.index(2, offsetBy: 1, limitedBy: 1) // 3
buffer.index(2, offsetBy: 2, limitedBy: 1) // 4
buffer.index(2, offsetBy: 3, limitedBy: 1) // 5

// Subtracting, but limit > initial index.
buffer.index(2, offsetBy: -3, limitedBy: 3) // 18446744073709551615
buffer.index(2, offsetBy: -2, limitedBy: 3) // 0
buffer.index(2, offsetBy: -1, limitedBy: 3) // 1

The limit parameter is essentially useless (it "has no effect"), and we can even get junk values which do not correspond to any valid position in the collection (18446744073709551615 is just UInt(bitPattern: -1), i.e. index 2 offset by -3). This is still safe, because actual reads from memory are bounds-checked, but the previous implementation would have prevented the junk index from being produced in the first place.

So I'd prefer to return nil in this case, or at least for the standard library to leave enough room that I can choose to return nil when the combination of (i, distance, limit) is not valid.

Previous Discussions

This has come up before in the distant past:

I don't think this necessarily needs to trap (although implementations may do that if they want to). This function already returns an optional, and the condition where the start and end point are in the wrong order is compatible with the existing conditions where the function returns nil.

In general, we can't change the requirements on protocols because that would break existing implementations outside the standard library. This might have been the wrong choice, but without introducing a new method we're stuck with it.

Jordan is right, the existing protocol requirement is what it is. We cannot change it.

Another, even bigger semantic issue with this operation is that it does not indicate how many steps it did manage to take before hitting the limit. This makes it impractical to implement piecewise navigation in a container that's internally split into a series of sub-containers (such as, say, tree-based container types).

One way to fix that problem would be to introduce a new operation with a different signature, such as:

// Returns true if the operation did manage to take `distance` steps
func formIndex(
   _ i: inout Self.Index,
   offsetBy distance: inout Int,
   limitedBy limit: Self.Index
) -> Bool

If we did this, then that would open an opportunity for us to adjust required behavior if the limit lies in the wrong direction, fixing both of these issues.
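
As a minimal sketch of that inout-distance idea (ChunkedBytes and everything in it is hypothetical, and the limit parameter is omitted for brevity):

// A hypothetical segmented container, to illustrate why reporting the untaken
// distance matters for piecewise navigation.
struct ChunkedBytes {
  var chunks: [[UInt8]]

  struct Index: Equatable { var chunk: Int; var offset: Int }

  var startIndex: Index { Index(chunk: 0, offset: 0) }
  var endIndex: Index { Index(chunk: chunks.count, offset: 0) }

  /// Returns true if all `distance` steps were taken; otherwise stops at
  /// endIndex and leaves the untaken remainder in `distance`.
  func formIndex(_ i: inout Index, offsetBy distance: inout Int) -> Bool {
    while distance > 0, i.chunk < chunks.count {
      let available = chunks[i.chunk].count - i.offset
      if distance < available {
        i.offset += distance
        distance = 0
        return true
      }
      // Consume the rest of this chunk and continue in the next one.
      distance -= available
      i.chunk += 1
      i.offset = 0
    }
    return distance == 0
  }
}

let storage = ChunkedBytes(chunks: [[1, 2, 3], [4, 5]])
var i = storage.startIndex
var remaining = 7
let tookAllSteps = storage.formIndex(&i, offsetBy: &remaining)
print(tookAllSteps, remaining)  // false 2; the caller knows exactly how far it got.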

Would your codegen look acceptable if you had to maintain inout index & distance values?

Noncopyable & nonescapable types will require a fundamental overhaul (perhaps outright replacement) of Swift's container protocol hierarchy. Because of this, I don't have much appetite for one-off tweaks to the current protocol Collection; however, I do want minor API adjustments like this to be (to some extent) rolled into that effort.
