Forming index crashes when the Data's representation is .empty

I just run into a crash with a helper function I wrote which is abstract and simplified as follows:

func foo<C: Collection>(source: C) where C.Index: Strideable {
    ...
    let offset = ...
    let firstMatchIndex = source.index(source.startIndex, offsetBy: offset)
    let distance = source.startIndex.distance(to: firstMatchIndex)
    ...
}

It crashes at the place where the firstMatchIndex is calculated when the passed-in Collection is a Data instance whose _Representation is .empty. To be specific, this will crash:

let data = Data()
let index = data.index(data.startIndex, offsetBy: 1) /// crashes

Is this expected? I mean, crashing the program if we are accessing an index out of bound, that I can understand. But I don't think forming an index (even it is out of bound) should crash as well, because other Collection type does not as far as I know.

I found using index(after:) is fine in this case:

let data = Data()
let index = data.index(after: data.startIndex) /// form the index fine

So I changed my function to the following as a workaround:

func foo<C: Collection>(source: C) where C.Index: Strideable {
    ...
    let offset = ...
    let firstMatchIndex = (0..< offset).reduce(source.startIndex) { current, _ in source.index(after: current) }
    let distance = source.startIndex.distance(to: firstMatchIndex)
    ...
}

The value passed as distance must not offset i beyond the bounds of the collection.

2 Likes

[Edit: @suyashsrijan’s post landed first. I removed a section that was almost verbatim what he posted.]


The where clause is unwise. Just because an index happens to be strideable does not mean its strides in isolation correspond to the same strides in the collection. For example:

let array = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
let evens = array.lazy.filter { $0.isMultiple(of: 2) }
let indexStride = evens.startIndex + 4
let collectionStride = evens.index(evens.startIndex, offsetBy: 4)
XCTAssertEqual(indexStride, collectionStride)
// Test failure: 4 ≠ 8

Thanks for the detailed documentation.

Can you elaborate on what is concern behind this decision, and why the same decision is not made for Collection.index(after:) as well?

Thanks for the reminder!

In my helper function, I am just using the distance to stride through the source, do you think if it is safe to do so?

func foo<C: Collection>(source: C) where C.Index: Strideable {
    ...
    let offset = ...
    let firstMatchIndex = (0..< offset).reduce(source.startIndex) { current, _ in source.index(after: current) }
    let distance = source.startIndex.distance(to: firstMatchIndex)
    ...
    Swift.stride(from: source.startIndex, to: source.endIndex, by: distance).map { index in
        doSomething(source[index ..< index.advanced(by: distance)])
    }
    ...
}

index(after:) has the same requirement. (Data just neglects to trap.)

A valid index of the collection. i must be less than endIndex.


If you are not using Strideable, then you can remove the where clause. At that point, if it compiles, you’re implementation is safe (and will work whether the indices are strideable or not). If it doesn’t compile without the where, then it wasn’t really logically sound to begin with.

Is this what you are fishing for?

source.indices.map { /* ... */ }

[Edit: This comment was kinda stupid. Sorry. I guess I cannot read.]

No, I don't think that is the same requirement. It only says the parameter needs to be a valid one (which in the example, startIndex is valid), doesn't say if the result is out of bound or not.

Let me correct my former idiocy to something along the lines of this:

var cursor: Index? = source.startIndex
var ranges: [Range<C.Index>] = []
while let boundary = cursor {
  if let end = source.index(boundary, offsetBy: distance, limitedBy: source.endIndex) {
    ranges.append(boundary ..< end)
    cursor = end
  } else {
    cursor = nil
  }
}
let whatever = ranges.map { doSomething(source[$0]) }

If the parameter is a valid index and less than endIndex, then the next index is either valid or endIndex. The result will never be greater than endIndex. (endIndex itself is a valid result of either method.)

No.

let array: [Int] = []
let indexAfter5Advances = (0..<5).reduce(array.startIndex) { current, _ in array.index(after: current) }

array.endIndex /// 0
indexAfter5Advances /// 5

And that, is my question.

I mean the result from a valid input will never be greater than endIndex. The code you just posted is repeatedly calling index(after:) with an argument that violates the preconditions by not being less than the endIndex. That is an undefined situation and many collections will trap (e.g. String). The Data and Array implementations are simply ambivalent about the undefined territory and not bothering to check for it.

Sorry that 5 times iteration makes you misunderstand.

What I am trying to say is, even with the valid input, the result is still possible to be greater than endIndex:

let array: [Int] = []
let validIndex = array.startIndex // valid index
let indexAfterValidIndex = array.index(after: validIndex)

indexAfterValidIndex // 1
array.endIndex // 0

indexAfterValidIndex > array.endIndex // true

Second thought:
The startIndex is not a valid index for an empty collection.

Because of the way Index is structured.

For a valid index:

  • index < endIndex if-and-only-if
  • collection.index(after: index) is valid

All your examples so far agree with that. So yes, they are equivalent.

1 Like

Determining neighboring indices and/or the spacing between them must be done with the index and/or distance API within Collection. Collection only mandates that later elements have higher Index values and that endIndex is higher still. There is no requirement that indices are 1 unit apart, or even have a constant number of units between two indices.

For the rest of your sample code, there is no guarantee that going past endIndex (or before startIndex) results in a valid state for the Index type. Even if it did (like for Int), it is not legal to use an out-of-bound Index value for a collection. The semantics of Collection demand that you respect this, use the provided index-manipulation API, and not create random Index states and hope they work.

@SDGGiesbrecht is entirely correct here: the documented requirements of Collection supersede whatever code happens to run on a specific collection instance. startIndex is not guaranteed to be a valid index for a collection: specifically, if it is == endIndex then it will not be a valid index. It is therefore not safe to unconditionally ask for the index after startIndex. Neither is it safe to calculate arbitrary offsets within a Collection without first confirming that the Collection has sufficiently many indices afterwards.

Whether this traps or does not trap on a given Collection is entirely unrelated to whether or not the operation is acceptable. This goes double when you are coding in a generic context (which you are): the only rules that matter there are the rules imposed by the protocol.

Some concrete examples of where there is no easy answer for how to represent the index after endIndex:

0..<Int.max
"foo"

It may be worth noting that the documentation uses the word valid to describe two separate things.

  • In some contexts, it is referring to indices which are valid to create and manipulate. This definition includes endIndex. Here some examples from the documentation for Collection:

    You can access an element of a collection through its subscript by using any valid index except the collection’s endIndex property. This property is a “past the end” index that does not correspond with any element of the collection.

    You can pass only valid indices to collection operations. You can find a complete set of a collection’s valid indices by starting with the collection’s startIndex property and finding every successor up to, and including, the endIndex property. All other values of the Index type, such as the startIndex property of a different collection, are invalid indices for this collection.

  • In other contexts, it is referring to indices which are valid to query the collection for. This definition excludes endIndex. Here some examples from the same page:

    var endIndex: Self.Index

    The collection’s “past the end” position—that is, the position one greater than the last valid subscript argument.

    var indices: Self.Indices

    The indices that are valid for subscripting the collection, in ascending order.

Confusion about the two separate uses of the word may have made this thread harder follow—especially since I often didn’t explicitly clarify which of the two I meant at any given time.

1 Like