[Draft] Adding Safe indexing to Array

gwendal.roue · March 12, 2018, 8:08am

Yes, please. Something short and easily recognizable. Conventions are a powerful tool: we don't always need to fully describe a method/operator/subscript behavior in its name. That's a classic reflex in front of anything new. But we're talking about a convenience method that should have a big success, here. It won't remain new very long. Proper API design should take this fact in account.

saagarjha · March 12, 2018, 9:36am

Without providing an opinion on this suggestion, I'd like to point out that this is how C++ does it.

xwu · March 12, 2018, 10:07pm

Methods should adhere to the API naming guidelines, whether they’re often used or not. In fact, if the most commonly used APIs don’t adhere to the guidelines, that’s much more inconsistent for the end user than if the least commonly used APIs don’t adhere to the guidelines.

If this is going to be a method, I’d imagine the name could be element(at:), but certainly at(_:) won’t cut it.

jarod · March 12, 2018, 11:32pm

I don't agree -- I feel pretty strongly that similar APIs should be named in a way that distinguishes them on their functionality unless there's a strong reason not to.

array[index]
array.at(index) or array.element(at: index)

There's nothing about these names that indicates a difference in functionality. You could assign the safe / unsafe behavior to any of them arbitrarily. It's obvious in the context of this discussion that whichever one returns an optional is the safe version, but I think it would be less obvious in the real world, and potentially a real source of confusion for learning developers in particular.

It's worth exploring whether a method would work better than a labeled subscript, but imo the behavior of the method needs to be made clear in its name.

xwu · March 12, 2018, 11:49pm

Remember, though, that types are part of the API and of the signature. That was a huge factor in the "omit needless words" part of the API naming guidelines.

A method that returns Element? versus a method that returns Element have clearly different error handling; there are other reasons why we can't simply overload the same name (including practical ones like optional promotion).

jarod · March 12, 2018, 11:56pm

I understand, although in this case I feel that it wouldn't be clear enough why one version returns an optional while the other does not if the name didn't help to clarify.

Nevin · March 13, 2018, 12:24am

Thinking more about this, I believe the functionality we should actually be pushing for is a fast way to test “is a given index valid for this collection”. Currently we can write “indices.contains(i)”, but that could be O(n) on a non-RandomAccessCollection.

Perhaps we should consider adding a method like this:

extension Collection {
  func isValidIndex(_ i: Index) -> Bool
}

Jens · March 13, 2018, 2:23am

My initial reaction is to agree with this. But, for a collection type on which indices.contains(i) is O(n), and thus probably also its (Comparable) .endIndex, is it really possible to write this isValidIndex(_:)-method so that it is faster than O(n)? Wouldn't it need some kind of caching/memoization?

Erica_Sadun · March 13, 2018, 2:26am

Despite all the bikeshedding fun, I'm not really seeing motivating use-cases being discussed beyond those mentioned in the pitch at the top of the thread.

If you feel you would use this feature in your code, regardless of whether it is for production code, scripting, pedagogy, or any other purpose, can you please help me expand the list of motivating scenarios. Without those, despite the positive reactions I get from many developers, I don't feel there's enough here to move forward. Thanks in advance for your use-cases.

(cc @Chris_Lattner3)

Nevin · March 13, 2018, 2:57am

I don’t know how to write it, but the compiler already guarantees a deterministic crash on subscripting with invalid indices. It should be possible to use whatever the existing strategy is, and return “false” from “isValid”.

The fact that this probably requires compiler support is why I think it is worthwhile to include in the standard library.

saagarjha · March 13, 2018, 3:42am

That's only for Array, right? Custom Collections can do whatever they want.

ckeithray · March 13, 2018, 4:05am

Yeah... I probably wouldn't use "if let x = arr.at(index)" or whatever for arrays or similar data types.

I might be more interested in default values. "let x = arr.at(index) ?? (-1)"

(BTW, a few more container types would be nice in the standard library: queue, deque, etc.)

duan · March 13, 2018, 7:09am

The 3rd item in the Motivation section actually decreases my desire to have this API:

Unintentional out of bounds issues can happen easily in array slices. Support is provided here for both Array and ArraySlice types.

Are we just admitting we are making it easy to do wrong things? iOS app developers of certain school of thoughts are highly incentivized to use this way of accessing arrays/slices ("look ma, no crash!"). Many UIKit APIs will gladly take a nil, and there'd be more hidden out-of-bounds bugs in Swift apps.

Even if we somehow establish a community-wise best practice that discourage abuse, this API would be that asterisk in the Array chapter of Swift books and team style guides. To introduce such a caveat to a critical data structure for a few edge-case usefulness seems bad, from an educational point-of-view.

(In case this was accepted and we have to document the caveat, I propose we hide it in introductory materials and call it "progressively disclosed convenience". )

gwendal.roue · March 13, 2018, 7:12am

Indeed. But if you have a problem with array.at(index), you should have a problem with array[index] too.

array[index] was first to introduce a complex behavior behind a plain subscript. I'm sure that the API design guidelines have been carefully written to accommodate with such exceptions to the clarity rule, and that some will be happy posting the guidelines excerpt that cleanses array[index]. But this would not be very sensible, IMHO.

jarod · March 13, 2018, 7:30am

I don't mind either one in isolation -- array[index] is a widely-established convention, and it's intuitive to me that array.at(index) would do the same thing as array[index]. My issue is that because they seem like different ways to spell the same thing, it would be confusing if both of them existed at the same time and functioned differently, and I don't think the optional return type of one vs. the other is enough to alleviate that confusion.

gwendal.roue · March 13, 2018, 7:37am

On the contrary, I put all bets on this difference. You just can't use one as you use the other.

gwendal.roue · March 13, 2018, 7:42am

Intuitions can be shaped. array.at(index) has enough potential to enter the basic fundamental Swift toolkit that everybody learns quickly. The official documentation will of course talk about it, since it is so fundamental. All online Swift newbie tutorials will talk about it as well. It will get voted as the best answer about all "index out of bounds" errors on StackOverflow. Etc.

Karl · March 13, 2018, 8:42am

Obj-C even has this "amazing" behaviour where you can send messages to NULL with no crash.

As for this: I don't think it replaces subscripting, but I can see it being helpful in some particular situations:

if array.count >= 3 {
  let value = array[2]
}

// will become

if let value = array.element(at: 2) {
 ...
}

and

let value: String?
if array.count >= 3 {
 value = array[2] 
} else { 
 value = nil 
}

// will become

let value = array.element(at: 2)

Some more thoughts:

I think it should be a function, not a subscript
If it was a subscript, I would prefer unchecked rather than checked. The latter implies something better than a regular subscript. Really, we're saying that the Index is unchecked, so I think it's a better parameter label.
I like the name element(at:) because it mirrors the name of the Collection.Element associated type. I don't we need to say unchecked for a function.

DevAndArtist · March 13, 2018, 8:58am

The only issue I personally have with all these different getters is the inconsistency across all our Sequence / Collection types. In some we can always assume that the value can be nil and in other the program will crash due to out of bounds issue. I understand that this is more or less a philosophical issue, but still (sigh). I really would wish that we had a consistent checked / unchecked access across all of those types, but that ship has sailed long time ago.

Alexey · March 13, 2018, 9:48am

Yes, possibly we will have NSOrderedSet Swift equivalent in future.

But just checks for out of bounds is not first-class necessity as I see. It is useful though.
As Erica_Sadun told us we should find sufficient list of motivating scenarios. I think new features should evolve Swift code but not just add one more function that everyone can write in a couple of minutes.