[Draft] Adding Safe indexing to Array

Ben_Cohen · March 14, 2018, 5:38pm

It is checked. You can see the source code here.

I am skeptical of this proposal. I think the need for legitimate use cases (i.e. combining a check that an index is in range with fetching of the element) is fairly rare. OTOH the defensive use of this technique i.e. "just in case the logic of my code is totally wrong, I don't want to trap" is highly dubious IMO. We should discourage this kind of use of optionals, for the same reason neverNil! should be favored over neverNil ?? butJustInCaseItIs.

Then again, given the legitimate "combined bounds check and fetch" use case, there's grounds to add it for readability/expressibility; and we've just been through a round of "but what if people misuse this feature" discussion with the python interop proposal so we probably shouldn't have that again... I think the best solution to that is to give it an appropriate argument label such as Collection.subscript(ifInBounds:).

I'm strongly opposed to making it a function. It should match the regular trapping subscript, and definitely shouldn't be more prominent. Discoverability issues are more of a tooling question than a driving reason to deviate from the current language idioms.

If we do add a nil-returning subscript (or indeed even if we don't), we should also add a parallel Collection.subscript(unchecked:) customization point, which collections can override to access the underlying elements without bounds checks. This would be an unsafe operation (so perhaps the argument label should be unsafe:, but I think I prefer unchecked:).

This is important for the implementation of other operations that also do this checking and want to avoid duplicate checks. For example, the default implementation of Collection.subscript(ifInBounds:) would check the bounds, and if not nil, call the unchecked implementation. Slice, which ought to check if the index is in the bounds of the slice not just the base, could also use this.

Jon_Hull · March 14, 2018, 8:10pm

I would argue the exact opposite. That for naive programmers, the nil returning version is better because swift forces them to deal with the failure case explicitly.

guard let x = array[i] else {
    //The failure case is much clearer here
    return ...
}

I understand the decision was made the way it was for pragmatic reasons (not wanting the unwrap cost everywhere), but with the trapping version we are back to the C++ days of having to remember to add checks... and some invariably slip through.

In an ideal world, the failable version would be the default and the faster/trapping version would have a label.

My ideal would actually be having a special optimization for ! after failable subscripts/functions that doesn't incur wrapping costs (it just traps or returns the value). Then you would just use:

let x = array[i]!

for the current version, and it is explicitly shown in code that it will trap. But when it was suggested, people said having the ! everywhere would be ugly. I still think it adds clarity though...

Ben_Cohen · March 14, 2018, 8:19pm

This is very much not the reason. It is not about performance. Since bounds checks are still performed on subscript access, just to trap rather than decide on whether to return nil, that cost is going to be there regardless (though in many cases the optimizer can eliminate them).

The important piece elided from your guard example is: how was i generated? In the vast majority of straight-line code, i cannot possibly be outside the bounds, because it was generated from logic that ensures i is kept within the bounds.

In those cases, the failure case for the else is unreachable. It is meaningless, other than to swallow the error silently or trap or some compromise like assert in debug but silently fail in prod.

Now, you can take issue with a trap taking down the whole program – and it might perhaps be better in some future Swift if we can find some way of isolating and safely winding down rather than hard panicking. But the idea of forcing the user to unwrap every time – even in a for i in array.indices loop – is safety theatre not actual safety. Instead, the user will just wear out their ! key, and become so used to banging stuff that the one time they shouldn't, they will.

Charles_Constant · March 14, 2018, 9:58pm

Not to get too far off-topic, but having arrays use optional indexing by default would also match our dictionaries. Either both should return optionals, or Dictionaries should require checking that a key exists, as Arrays require checking their count.

Jens · March 14, 2018, 10:06pm

This is a misconception. You need to take a closer look at Array, Dictionary and Collection, and how they relate to each other.

tl;dr:

Dictionary has both Element and Index, and an Index-taking subscript, just like Array, since they are both Collections and these are requirements of Collection.

Dictionary's Element == (Key, Value) and its Index-taking subscript is checked and safe, ie it traps/crashes on Index out of bounds, exactly like Array' s Index-taking subscript! (see demo below)

Dictionary also has its own special Key-subscript that returns an optional Value, which is not required by Collection.

A Key is not an Index in this context. If we think it is, in some situation, perhaps that is indicating that we should be using a Dictionary (ie an associative array) instead of an Array?

Longer Version

Assuming that we all can now finally agree that the existing Array subscript is indeed checked and safe (since it is checking whether index is out of bounds or not and traps if its not, and that If it was unchecked, it would not check for index out of bounds and thus let us cause undefined behavior), I'd like to offer this additional gentle reminder:

Other collection types, for example Dictionary, is no different than Array when it comes to trapping on an out of bounds Index.

For example the following program will print the key-value-pairs of dict and then dict[index] will cause it to crash/trap once index has gotten out of bounds:

let dict = ["abc": 12.34, "def": 56.78]
var index = dict.startIndex
while true {
    let (key, value) = dict[index]
    print("key:", key, ", value:", value)
    index = dict.index(after: index)
}

Array's only subscript is an Index-taking subscript, and it shouldn't be compared to or confused with Dictionary's Key-taking subscript, especially since Dictionary has an Index-taking subscript, with the exact same behavior as that of Array, as shown above. This is because they are both Collections.

Note that Dictionary has an Element type and an Index type, as does all Collections. So the dict[index] above returns a value of type Dictionary<String, Double>.Element (which is == (Key, Value)).

Perhaps some of the situations that lead people into wanting to add a special Optional<Element>-returning subscript for Array, would be easier to deal with by using an associative array instead of an array? Swift's associative array is Dictionary.

QuinceyMorris · March 14, 2018, 10:37pm

Since @Jens brought this up anyway: I've been thinking for a while that an alternate conceptual approach would be to think of a new Array subscript as a key lookup, along the lines of a dictionary. Since the parameter type conflicts with the standard subscript, it has to be a keyword, but perhaps it could be:

  array [key: i]

or:

  array [lookup: i]

However…

There's yet another possible direction, which is to stop thinking of the parameter as an index (valid or invalid), but rather as an offset from the start of the array, which kind of links this thread up with this other thread: Shorthand for Offsetting startIndex and endIndex.

The various suggestions in that thread are mostly about slicing, so don't necessarily propose a single-argument lookup method/subscript, and those that do provide a non-optional result. I suppose they could have an optional result instead, since this is not a vital case in that proposal.

The advantage would be a single, consistent keyword convention (whether method or array), that "explains" the optional return type in terms of a semantic difference in the parameter type (index vs. offset), even though both types are Int for an array.

The optional result doesn't quite "sit" naturally in that thread, but I don't see all that much consensus emerging in this thread. A joint solution might be more palatable.

Edit: To clarify (since that thread is long), the idea would be to define a "complete" API set on Sequence (the first 2) or Collection (the 3rd one):

func index (offset: Int) -> Collection.Index?
func element (offset: Int) -> Collection.Element?
subscript (offset offset: Int) -> Collection.Element?

(with a keyword TBD in place of offset).

CTMacUser · March 14, 2018, 11:46pm

QuinceyMorris:

However…

There’s yet another possible direction, which is to stop thinking of the parameter as an index (valid or invalid), but rather as an offset from the start of the array, which kind of links this thread up with this other thread: Shorthand for Offsetting startIndex and endIndex.

The various suggestions in that thread are mostly about slicing, so don’t necessarily propose a single-argument lookup method/subscript, and those that do provide a non-optional result. I suppose they could have an optional result instead, since this is not a vital case in that proposal.

The advantage would be a single, consistent keyword convention (whether method or array), that “explains” the optional return type in terms of a semantic difference in the parameter type (index vs. offset), even though both types are Int for an array.

The optional result doesn’t quite “sit” naturally in that thread, but I don’t see all that much consensus emerging in this thread. A joint solution might be more palatable.

Yes. This other thread and its offset ideas were what I was thinking of in my last reply here.

Charles_Constant · March 16, 2018, 12:16am

Maybe this will sound like "but this one goes to 11" but I still don't think it was a good decision.

arr[i] // must check index is valid

dict[key] // no need to check if key exists

These are two really common pieces of syntax. Would a naive user expect one to return optional, and the other to require a check? I doubt it

DeFrenZ · March 16, 2018, 12:23am

The mistake — in regards to clarity — was to make Dictionary's subscript unlabelled, as this has led people to associate it with index subscripting... but that ship has sailed I believe...

QuinceyMorris · March 16, 2018, 1:40am

I'd say "yes" is about as likely an answer as "no".

Let me give you two additional reasons why the current way is IMO the "right" one, and see if that moves your intuition on the subject at all:

Arrays are the natural data structure for modeling important mathematical constructs such as vectors, matrices and … um … arrays. In those contexts, the idea of being unable to retrieve subscripted items without needing to unwrap seems extremely unpalatable. It also seems likely to kill the possibility of good performance in highly compute-bound calculations using those data structures, such as matrix multiplication.
Arrays are different from dictionaries in that the domain of their subscripts (that is, what indexes are valid) is easily and quickly checkable before subscripting. This means that bounds checks can often be hoisted out of loops, if they're not implicit in the loop bounds to begin with. For dictionaries, the domain of their valid keys is sparse, and it's (likely) as expensive to check a key for validity as it is to look up the value for a key. It makes sense to combine the two operations in that case.

Any parity between "indexing" on Array and Dictionary is more an artifact of using Collection as an abstraction of their implementation than a reflection of common semantics.

Charles_Constant · March 16, 2018, 2:52am

You've given me food for thought and I appreciate the explanation. I think I'm in danger of derailing the thread at this point

Michael_Ilseman · March 16, 2018, 4:26pm

I think you're mixing up two different functions here, which is easy to do with function overloading, where the same name can take a different argument type and return a different result type.

Dictionary has a subscript that takes an Index and returns a non-optional Element. Array has a subscript that takes an Index and returns a non-optional Element. Both Dictionary's and Array's do bounds checking and trap on out of bounds, to ensure memory safety.

Your example uses Dictionary's additional subscript that does member lookup, which Array can't have for a couple reasons:

An element-lookup subscript on Array<Int> would be ambiguous, given that Array's Index type is Int. Dictionary has a distinct Key type that cannot be ambiguous with its Index type.
Unadorned subscript strongly implies O(1) access, while Array would have to O(n) scan. Dictionary is a hashed collection that can support O(1) element lookup.

Nicole_Jacque · March 16, 2018, 6:20pm

In the interest of keeping threads relevant, I'm closing this thread, as the draft in question has been withdrawn and will not be moving forward. Feel free to take any spillover conversations to another thread!