Collection's "past the end" endIndex

timv · July 2, 2016, 4:48pm

Why is endIndex one greater than the last valid subscript argument, rather than simply the last valid subscript argument? I’m sure there are great reasons for this, but I just can’t think of them myself.

I’m implementing a binary tree and I want to implement BidirectionalCollection. Each index contains a (private) reference to the corresponding node in the tree. However, endIndex shouldn’t point to anything, and it makes things very complicated if I want a constant time implementation of index(before:). How are we supposed to deal with endIndex if the index is something more complex than a simple integer?

snej · July 2, 2016, 6:40pm

Why is endIndex one greater than the last valid subscript argument, rather than simply the last valid subscript argument? I’m sure there are great reasons for this, but I just can’t think of them myself.

It’s hard to handle empty collections, otherwise. If endIndex were the last valid subscript, then in an empty collection what would its value be? Presumably it would have to point to one before the start, which is rather weird because that’s never a valid index (in an int-indexed collection it would be negative.) Even worse, endIndex is never reachable from startIndex, so an iteration can’t simply continue until it reaches the endIndex. Instead the Indexable type would have to implement a (cheap, constant-time) >= operator to use as the test for an iteration.

These are probably the reasons why C++ also uses the convention of a collection’s end() pointing to one past the end.

I’m implementing a binary tree and I want to implement BidirectionalCollection. Each index contains a (private) reference to the corresponding node in the tree. However, endIndex shouldn’t point to anything, and it makes things very complicated if I want a constant time implementation of index(before:).

Have the tree object keep a pointer to its last node. That makes it cheap to get that node as the ‘before’ of the last node. You’ll also have the benefit of making endIndex() itself constant-time, whereas presumably it’s now O(log n) since you have to walk down the tree to find it.

Or alternatively, you could add a flag to your index type that indicates it’s pointing to the end of the collection. Then the endIndex() really points to the last node but has the flag set.

—Jens

···

On Jul 2, 2016, at 9:48 AM, Tim Vermeulen via swift-users <swift-users@swift.org> wrote:

Brent_Royal-Gordon · July 2, 2016, 9:34pm

To add to Jens's point, all of this adds up to simplify iterating over a collection with a `while` loop (remember, all loops are `while` loops eventually):

  // Without invalid endIndex:

  var index = c.startIndex
  var done = c.isEmpty

  while !done {
    ...

    if index == c.endIndex {
      done = true
    }
    else {
      index = c.index(after: index)
    }
  }

  // With invalid endIndex:

  var index = c.startIndex
  while index != c.endIndex {
    ...
    index = c.index(after: index)
  }

There are even more reasons than this one. For instance, having `endIndex` be past the end means that `insert(_:at:)` can insert after the last element of the collection, in addition to before any other element. It gives other range-based APIs similar powers. It means that `distance(from: startIndex, to: endIndex)` is the count of the elements in the collection. It means you natively use `Range` instead of `ClosedRange`, which (perhaps counterintuitively) is easier to reason about. And so on.

The manuscripts of famed computer scientist Edsger Dijkstra include a little note about ranges and zero-based indexes; its logic is intimately related to this: E.W. Dijkstra Archive: Why numbering should start at zero (EWD 831)

···

On Jul 2, 2016, at 11:40 AM, Jens Alfke via swift-users <swift-users@swift.org> wrote:

It’s hard to handle empty collections, otherwise. If endIndex were the last valid subscript, then in an empty collection what would its value be? Presumably it would have to point to one before the start, which is rather weird because that’s never a valid index (in an int-indexed collection it would be negative.) Even worse, endIndex is never reachable from startIndex, so an iteration can’t simply continue until it reaches the endIndex. Instead the Indexable type would have to implement a (cheap, constant-time) >= operator to use as the test for an iteration.

--
Brent Royal-Gordon
Architechies

timv · July 2, 2016, 7:30pm

Why is endIndex one greater than the last valid subscript argument, rather than simply the last valid subscript argument? I’m sure there are great reasons for this, but I just can’t think of them myself.

It’s hard to handle empty collections, otherwise. If endIndex were the last valid subscript, then in an empty collection what would its value be? Presumably it would have to point to one before the start, which is rather weird because that’s never a valid index (in an int-indexed collection it would be negative.) Even worse, endIndex is never reachable from startIndex, so an iteration can’t simply continue until it reaches the endIndex. Instead the Indexable type would have to implement a (cheap, constant-time) >= operator to use as the test for an iteration.

These are probably the reasons why C++ also uses the convention of a collection’s end() pointing to one past the end.

Thanks Jens, that makes sense. An alternative to the current model would be to require an implementation of isEmpty, and in the case that it evaluates to true, startIndex and endIndex aren’t considered. That isn’t as clean as the current model though, and I don’t doubt that the core Swift team thought this out very well.

I’m implementing a binary tree and I want to implement BidirectionalCollection. Each index contains a (private) reference to the corresponding node in the tree. However, endIndex shouldn’t point to anything, and it makes things very complicated if I want a constant time implementation of index(before:).

Have the tree object keep a pointer to its last node. That makes it cheap to get that node as the ‘before’ of the last node. You’ll also have the benefit of making endIndex() itself constant-time, whereas presumably it’s now O(log n) since you have to walk down the tree to find it.

You’re right, I need to keep a reference to the last node anyways to ensure constant time access to endIndex. This was a bigger issue when I implemented a singly linked list, but I was able to implement a nice workaround.

Or alternatively, you could add a flag to your index type that indicates it’s pointing to the end of the collection. Then the endIndex() really points to the last node but has the flag set.

I also did this before, but it made everything less elegant than it needed to be. Thanks for your suggestions, I’ll stick with keeping a reference to the last node.

···

On 2 Jul 2016, at 20:40, Jens Alfke <jens@mooseyard.com> wrote:

On Jul 2, 2016, at 9:48 AM, Tim Vermeulen via swift-users <swift-users@swift.org <mailto:swift-users@swift.org>> wrote:

—Jens