[Draft] Adding Safe indexing to Array

I think it depends how you think about it. At what level is the checking taking place? I tried to address this in my first post by giving the example of the hypothetical Range.init?(checkedBounds:). I think it follows that an initializer written like that would validate the bounds and return nil upon failure. The arr[checked: i] subscript follows that logic — at least in my head.

Naming it "checked" implies that the existing subscript is unchecked, which is confusing, simple as that. For further "evidence" of the currently established meaning of the terms checked/unchecked, look at eg the compiler flag -Ounchecked, or the Range initializer uncheckedBounds or the identical behavior of the existing Index-subscripts of other collection types. I will not spend more time debating this / repeating myself.

Dear Swift Core Team, is the existing array subscript unchecked? Yes or no?
: )

If this is something I use in my tests, and others use in their tests and elsewhere, we'll end up with everyone defining something different. I don't care that much, but I think it would be worth adding, if we can find a good name for it.

You've given me an idea by highlighting how suitable it is for unit tests, where you must always prepare for failure.

array[test: 0]

The test label adds a sense of uncertainty about the result. We're testing the given index... and it might fail, otherwise it wouldn't be a test. That's also a perfect name for use in unit tests.

subscript(test position: Int) -> Element? { get }

If this is a "test", it would presumably return a boolean, no? This looks like it's semantically equivalent to array.indices.contains(0) at a glance.

Well, this code without the guard is obviously fundamentally wrong either way, whether it's using regular subscripting or safe indexing, since the arrays are of different length. By default, I'd assume that users will try "normal" subscripting, which will crash, and they'll quickly realize that they went out of bounds of the array (which the runtime check would clearly indicate to them) and fix the issue. Why would they opt to using arr[checked:]? Personally, I feel that subscripts that have labels are relatively difficult to discover, since they don't really show up in autocomplete and are difficult to search for, so I don't feel as if this is something that is easy to pick up without understanding what it does.

I think newcomers to the language will reach for array[checked:] (or whatever it is called) in the same way people misuse postifx ? today, as documented by endless examples elsewhere on the web.

That's sort of the point of the example. If you guarded at the top of the dummy example I gave with a condition like array2.count == array1.count then you spell out the preconditions early, and can consciously decide the right behaviour.
.

array[checked:] will not have a fix-it in Xcode that tells you to use it, as ! and ? do (and notice that thread is about removing the fix-it, which will supposedly make the issue go away). The only way for anyone to pick this up is to see it in someone else's code and search up what it does or have it taught to them, which are presumably better places to learn about it than a fix-it that teaches them "use this to make my errors go away".

But it's also a subscript, and that adds to the meaning. If I were to put it in function form, I'd call it:

array.element(testingAt: 0)

A subscript in essence is a more convenient expression for element(at:), and if you add a label to the subscript it's like you are adding more words to that function.

That said, there are other words in the vein of seeking something with uncertainty about success. We could use probe for instance.

I hope we use subscript syntax instead of a method syntax.

I think part of the problem is that philosophically, Arrays in Swift ought to have returned optionals from the start.

Has anyone suggested "optionally" as the label yet?

let el = arr[optionally:i]

Hmm,... for a linked list, each node (which would be a class) would have an Int (or similar) ID, and the main body of the list would have an internal [Int: Node] property to index all of its nodes. The list's Index would be those key IDs. It's up to the list to make sure all IDs are strictly increasing whenever a RangeReplaceableCollection or a custom equivalent event occurs, which may involve recalculating all indexes.

Maybe something similar for search trees? (I have to find out what they are first.)

Ooh, time to get distracted from my BigInt obsession for a Linked-List one.

Nope. You're thinking of when RandomAccessCollection.Index conforms to Strideable; and even then, nope.

Using indices or index(after:) and similar properties and methods from a Collection are the only way to get its valid indexes. Even if the Index type is Strideable or otherwise can be incremented/decremented, the collection can use a custom skipping-around scheme that does not match what Index natively provides! It did that in my custom chunking Collection; the collection groups every 5 base elements, so my collection's index routines scale all the jumps by 5. As long as I keep my jumping routines (amortized) O(1), I'm good at keeping my collection a RandomAccessCollection.

If you're using an Index instance's native routines to jump to another value, without going through the owning Collection for approval, you're technically doing it wrong.

3 Likes

I was first going to encourage generalizing this idea to Collection, then I realized the consequences made it a bad idea, possibly even for Array.

If I understand correctly, all dereferencing and other methods that peek at Index values can assume that the value is valid for the corresponding collection instance. You can start with a valid Index state for a Collection by using indices, startIndex, or endIndex. This is a precondition; you have to ensure it before calling any code, but the very premise of your proposal violates this!

I guess this idea can work with IndexDistance, and better: offsets can work with all Collection types, not just Array. It's weird appearance-wise since Index and IndexDistance are the same for Array.

You just have to redo your documentation, but the equivalence between Index and IndexDistance for the most common case (Array) will still make it confusing for newbs. So I still advise not going ahead.

And your 3 motivating examples are bad programming given the current constraints on collection indexes.

  • The set of indexes for two collection instances may not be compatible, even when the same types are involved. So just sticking an index from one collection into another is already illegal. Offsets are compatible, but even then you're supposed to check relative lengths first.
  • Your wrapper's equivalent to Index better be definitively mapping to ones of the inner collection. Your programming is just wrong otherwise. (I'm not completely sure, since your second example is vague. But the vagueness itself is a red flag that you're already not using the inner collection's indexes correctly.)
  • This one is like the first case; you need to check the bounds beforehand.
2 Likes

It is checked. You can see the source code here.

I am skeptical of this proposal. I think the need for legitimate use cases (i.e. combining a check that an index is in range with fetching of the element) is fairly rare. OTOH the defensive use of this technique i.e. "just in case the logic of my code is totally wrong, I don't want to trap" is highly dubious IMO. We should discourage this kind of use of optionals, for the same reason neverNil! should be favored over neverNil ?? butJustInCaseItIs.

Then again, given the legitimate "combined bounds check and fetch" use case, there's grounds to add it for readability/expressibility; and we've just been through a round of "but what if people misuse this feature" discussion with the python interop proposal so we probably shouldn't have that again... I think the best solution to that is to give it an appropriate argument label such as Collection.subscript(ifInBounds:).

I'm strongly opposed to making it a function. It should match the regular trapping subscript, and definitely shouldn't be more prominent. Discoverability issues are more of a tooling question than a driving reason to deviate from the current language idioms.

If we do add a nil-returning subscript (or indeed even if we don't), we should also add a parallel Collection.subscript(unchecked:) customization point, which collections can override to access the underlying elements without bounds checks. This would be an unsafe operation (so perhaps the argument label should be unsafe:, but I think I prefer unchecked:).

This is important for the implementation of other operations that also do this checking and want to avoid duplicate checks. For example, the default implementation of Collection.subscript(ifInBounds:) would check the bounds, and if not nil, call the unchecked implementation. Slice, which ought to check if the index is in the bounds of the slice not just the base, could also use this.

8 Likes

I would argue the exact opposite. That for naive programmers, the nil returning version is better because swift forces them to deal with the failure case explicitly.

guard let x = array[i] else {
    //The failure case is much clearer here
    return ...
}

I understand the decision was made the way it was for pragmatic reasons (not wanting the unwrap cost everywhere), but with the trapping version we are back to the C++ days of having to remember to add checks... and some invariably slip through.

In an ideal world, the failable version would be the default and the faster/trapping version would have a label.

My ideal would actually be having a special optimization for ! after failable subscripts/functions that doesn't incur wrapping costs (it just traps or returns the value). Then you would just use:

let x = array[i]! 

for the current version, and it is explicitly shown in code that it will trap. But when it was suggested, people said having the ! everywhere would be ugly. I still think it adds clarity though...

1 Like

This is very much not the reason. It is not about performance. Since bounds checks are still performed on subscript access, just to trap rather than decide on whether to return nil, that cost is going to be there regardless (though in many cases the optimizer can eliminate them).

The important piece elided from your guard example is: how was i generated? In the vast majority of straight-line code, i cannot possibly be outside the bounds, because it was generated from logic that ensures i is kept within the bounds.

In those cases, the failure case for the else is unreachable. It is meaningless, other than to swallow the error silently or trap or some compromise like assert in debug but silently fail in prod.

Now, you can take issue with a trap taking down the whole program – and it might perhaps be better in some future Swift if we can find some way of isolating and safely winding down rather than hard panicking. But the idea of forcing the user to unwrap every time – even in a for i in array.indices loop – is safety theatre not actual safety. Instead, the user will just wear out their ! key, and become so used to banging stuff that the one time they shouldn't, they will.

13 Likes

Not to get too far off-topic, but having arrays use optional indexing by default would also match our dictionaries. Either both should return optionals, or Dictionaries should require checking that a key exists, as Arrays require checking their count.

This is a misconception. You need to take a closer look at Array, Dictionary and Collection, and how they relate to each other.


tl;dr:

Dictionary has both Element and Index, and an Index-taking subscript, just like Array, since they are both Collections and these are requirements of Collection.

Dictionary's Element == (Key, Value) and its Index-taking subscript is checked and safe, ie it traps/crashes on Index out of bounds, exactly like Array' s Index-taking subscript! (see demo below)

Dictionary also has its own special Key-subscript that returns an optional Value, which is not required by Collection.

A Key is not an Index in this context. If we think it is, in some situation, perhaps that is indicating that we should be using a Dictionary (ie an associative array) instead of an Array?


Longer Version

Assuming that we all can now finally agree that the existing Array subscript is indeed checked and safe (since it is checking whether index is out of bounds or not and traps if its not, and that If it was unchecked, it would not check for index out of bounds and thus let us cause undefined behavior), I'd like to offer this additional gentle reminder:

Other collection types, for example Dictionary, is no different than Array when it comes to trapping on an out of bounds Index.

For example the following program will print the key-value-pairs of dict and then dict[index] will cause it to crash/trap once index has gotten out of bounds:

let dict = ["abc": 12.34, "def": 56.78]
var index = dict.startIndex
while true {
    let (key, value) = dict[index]
    print("key:", key, ", value:", value)
    index = dict.index(after: index)
}

Array's only subscript is an Index-taking subscript, and it shouldn't be compared to or confused with Dictionary's Key-taking subscript, especially since Dictionary has an Index-taking subscript, with the exact same behavior as that of Array, as shown above. This is because they are both Collections.

Note that Dictionary has an Element type and an Index type, as does all Collections. So the dict[index] above returns a value of type Dictionary<String, Double>.Element (which is == (Key, Value)).

Perhaps some of the situations that lead people into wanting to add a special Optional<Element>-returning subscript for Array, would be easier to deal with by using an associative array instead of an array? Swift's associative array is Dictionary.

Since @Jens brought this up anyway: I've been thinking for a while that an alternate conceptual approach would be to think of a new Array subscript as a key lookup, along the lines of a dictionary. Since the parameter type conflicts with the standard subscript, it has to be a keyword, but perhaps it could be:

  array [key: i]

or:

  array [lookup: i]

However…

There's yet another possible direction, which is to stop thinking of the parameter as an index (valid or invalid), but rather as an offset from the start of the array, which kind of links this thread up with this other thread: Shorthand for Offsetting startIndex and endIndex.

The various suggestions in that thread are mostly about slicing, so don't necessarily propose a single-argument lookup method/subscript, and those that do provide a non-optional result. I suppose they could have an optional result instead, since this is not a vital case in that proposal.

The advantage would be a single, consistent keyword convention (whether method or array), that "explains" the optional return type in terms of a semantic difference in the parameter type (index vs. offset), even though both types are Int for an array.

The optional result doesn't quite "sit" naturally in that thread, but I don't see all that much consensus emerging in this thread. A joint solution might be more palatable.


Edit: To clarify (since that thread is long), the idea would be to define a "complete" API set on Sequence (the first 2) or Collection (the 3rd one):

func index (offset: Int) -> Collection.Index?
func element (offset: Int) -> Collection.Element?
subscript (offset offset: Int) -> Collection.Element?

(with a keyword TBD in place of offset).

1 Like

Yes. This other thread and its offset ideas were what I was thinking of in my last reply here.

Maybe this will sound like "but this one goes to 11" but I still don't think it was a good decision.

arr[i] // must check index is valid

dict[key] // no need to check if key exists

These are two really common pieces of syntax. Would a naive user expect one to return optional, and the other to require a check? I doubt it