Safe Random Access Collection Element

wendyliga · May 15, 2020, 9:03am

Hi, this is my first pitch, so please go easy on me.

In Swift, we access an Array element using its index:

let array = ["Hello", "World"]
print(array[0]) // Hello

But it turns out that if you access an out-of-bounds index, you get "Fatal error: Index out of range":

let array = ["Hello", "World"]
print(array[2]) // Fatal error: Index out of range

To avoid this, there are just two solutions:

Use an index that you got from the array's indices:

zip(array.indices, array).forEach { index, value in
   ...
}

Make sure that the array contains the index you want to access:

if array.indices.contains(2) {
   print(array[2])
}

But this creates a lot of boilerplate code, and the first solution is not practical if you want to access a random index in the array.

So I propose adding a new subscript on Collection (Array and Set) that would look like this:

** Update thanks to @lukasa

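extension Collection where Indices.Iterator.Element == Index {
   // Returns the element at `index`, or nil if `index` is not in `indices`.
   subscript(safe index: Index) -> Iterator.Element? {
      indices.contains(index) ? self[index] : nil
   }
}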

let array = ["Hello", "World"]
print(array[safe: 0]) // Hello
print(array[safe: 2]) // nil

This way, you can easily access array elements, and if the index is out of bounds you get nil instead of a fatal error. The new subscript is also clearly separated from the existing default subscript.

lukasa · May 15, 2020, 9:25am

This is a good idea! Because you haven't written down the signature of the function you propose, it's a bit difficult to know exactly what you want here. I can conceive of two signatures:

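extension RandomAccessCollection {
    // Index-based: takes one of the collection's own indices.
    subscript(safe index: Index) -> Optional<Element>

    // Offset-based: takes an integer offset.
    subscript(safe offset: Int) -> Optional<Element>
}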

These are slightly different in semantics. The first is a straightforward wrapper around the basic Index operation that performs a bounds-check before accessing the element. The second is a more complex beast that moves forward into offset-based indexing.
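To make the difference concrete, here is a minimal sketch of both variants. The bodies are an assumption about how each might be implemented, and the `safeOffset` label is hypothetical, chosen only so that the two subscripts can coexist on collections whose Index is Int.

```swift
extension RandomAccessCollection {
    // Index-based variant: bounds-check one of the collection's own indices.
    // `safe` is the label from the pitch, not an existing stdlib API.
    subscript(safe index: Index) -> Element? {
        (startIndex..<endIndex).contains(index) ? self[index] : nil
    }

    // Offset-based variant: count `offset` positions forward from startIndex.
    // `safeOffset` is a hypothetical label used only for this sketch.
    subscript(safeOffset offset: Int) -> Element? {
        guard offset >= 0,
              let i = index(startIndex, offsetBy: offset, limitedBy: endIndex),
              i != endIndex
        else { return nil }
        return self[i]
    }
}

let slice = [10, 20, 30, 40][2...]   // ArraySlice<Int> whose indices are 2 and 3

let byIndex = slice[safe: 0]          // nil: 0 is not an index of the slice
let byOffset = slice[safeOffset: 0]   // 30: the first element of the slice
```

On a plain Array the two agree, because its indices are the offsets 0..<count. On the slice they diverge, which is why the two signatures are not interchangeable.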

I think we should restrict this discussion to the former. The latter is part of a wider discussion around SE-0265 (see [Returned for revision] SE-0265: Offset-Based Access to Indices, Elements, and Slices for more).

The former is an extremely straightforward method. It's simple, does not currently exist, and simplifies some reasonably common code for accessing collections at arbitrary indices. If it were not for the fact that it operates on Index I'd say it's a slam-dunk.

However, it's slightly less useful than it seems on most collections. There are very few collections that behave like Array, where both a) Index == Int and b) startIndex == 0 hold: that is, where they are indexed by an integer type and zero-indexed. The Swift Standard Library contains only one such Collection: Array itself. Other major Swift libraries often don't contain such Collections either: SwiftNIO doesn't, for example.

For collections that have either got non-integer indices or whose startIndex may not be 0, it is much less common to directly index them with an Index that may not be in the Collection. Usually you need to actually create an Index, and most of the ways of doing so guarantee that the Index is valid for the collection. For example, things like firstIndex(of:) and firstIndex(where:) will only return valid indices for the Collection.

More or less the only thing I can think of that would do this is calling index(after:), which may produce endIndex, which cannot safely subscript the Collection. In this case there's some mild utility, but if you were doing that you're almost always doing so in a loop. Not always (indeed, I just recently reviewed some code that did exactly this outside a loop, and where we saw runtime crashes), so there's definitely some utility still there, but I don't think it's terribly common.
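The endIndex case can be sketched in a few lines. This is only an illustration using Array; the variable names are made up for the example.

```swift
let letters = ["a", "b"]
let last = letters.index(before: letters.endIndex)   // 1, a valid position
let next = letters.index(after: last)                // 2, which equals endIndex

// Subscripting with `next` directly would trap, so it is checked first.
let following: String? = next < letters.endIndex ? letters[next] : nil
```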

All-in-all, I remain kinda neutral on this proposal. I don't think it detracts from the language, but I don't think it adds huge value for most Collections aside from Array. That raises the question of whether we should either a) just define this for Array, or b) try to get SE-0265 moving forward again instead. I don't have a strong view either way, but I think those are the other options on the table here.

wendyliga · May 15, 2020, 9:49am

I didn't realize how big Swift's Collection is. In my mind, Collection was a protocol used only for Dictionary, Set, and Array. That proves me wrong.

So, from your perspective, I guess what I really mean is a Collection that uses Index as the element type of its Indices. That would be Array and Set.

lukasa · May 15, 2020, 9:51am

All Collections meet the constraint that Indices.Element == Index. Not all Collections meet the constraint that Index can be constructed without reference to the Collection itself.

Even Set doesn't meet your requirement. In Swift, Set.Index is an opaque type.

lukasa · May 15, 2020, 9:55am

To be clear what I mean here, I mean this:

  1> let s = Set([1, 2, 3, 4])
  2> type(of: s.startIndex)
$R1: Set<Int>.Index.Type = Set<Int>.Index
  3> type(of: s.first)
$R2: Int?.Type = Int?

Note that the type of startIndex is not Int, but Set<Int>.Index. These types are not the same.

ktoso · May 15, 2020, 9:58am

Just to throw it out there... another way to look at the issue is that such seemingly simple-to-make mistakes have such fatal (pun intended) consequences.

Rather, we would want to abort execution, yes, but without taking down the entire process: only the thread that was running the computation would go down, with the failure bubbling up to the event loop, thread pool, dispatch queue, or something else. These are the "panics" or "soft faults" that have been discussed here and there (e.g. [stdlib] Cleanup callback for fatal Swift errors - #4 by ktoso) a while ago.

This would allow for some more control of the crashing behaviors; by default the behaviors would remain the same, but depending on runtime they could mean not necessarily taking down the entire process (noted, my use cases for this are server biased, though wanted to bring it up here as well – since that seems to be the root of why fatal errors are such a pain IMHO).

That said, I would not mind such a safer (optional-returning) way of indexing into arrays... but I'm also not sure it solves the actual root cause of these troubles.

xwu · May 15, 2020, 10:43am

This idea has been discussed in the past, extensively but without conclusion. I would suggest that you may be interested in the following preceding threads. Many very important points have already been made in those discussions, and I hope you’ll find it helpful in shaping your idea:

It is important to note that the existing unlabeled subscript is safe (because crashing is safe, since it prevents execution from continuing in an unexpected state) and it is checked (since it crashes precisely because it checks whether the index is out of bounds).

CTMacUser · May 15, 2020, 5:27pm

If you're just sticking in random numbers for an index with no clue whether that value is valid, your application logic is already fundamentally broken. That's why out-of-range indices are a program-crashing logic error.

If you want a random index, use indices.randomElement(). It returns nil for empty collections, but will always point to a valid element otherwise. And it works for all collections, not just ones indexed with 0..<count like Array. Note that calls work in linear time for collections that aren't random-access.
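A short sketch of that suggestion, using only the standard library's `randomElement()` on `indices`; the sample data is made up.

```swift
let words = ["Hello", "World"]

// A non-nil result is always a valid index of `words`.
if let i = words.indices.randomElement() {
    print(words[i])   // prints either "Hello" or "World"
}

let empty: [String] = []
let none = empty.indices.randomElement()   // nil, because there are no indices
```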

wowbagger · May 15, 2020, 6:17pm

Adding on to @xwu's list of past discussions, this is one of the proposed changes that will not likely be accepted, according to the write-up in Swift Evolution's GitHub repository.

Jumhyn · May 15, 2020, 6:27pm

The commonly rejected change noted there seems to be specifically about changing the unlabeled subscript's behavior, rather than precluding the possibility of adding an additional optional-returning variant.

wowbagger · May 15, 2020, 6:31pm

That's a good point.

Jens · May 15, 2020, 6:54pm

This has already been mentioned but I think it's worth repeating: If this failable subscript should ever be added to the stdlib, the label cannot be "safe" or "checked", because that would be misleading and confusing, ie:

I have never felt the need for this failable subscript myself, and have no strong opinion on whether it should be added or not.

Jens · May 15, 2020, 7:05pm

Or maybe what you actually want is an associative array, in which case you should just use one, ie Swift's Dictionary.

Perhaps when people think they need an Array with a failable subscript, what they actually want is a Dictionary with Int keys?
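For comparison, a minimal sketch of what Dictionary already does with Int keys; the contents are made up for the example.

```swift
let sparse: [Int: String] = [0: "Hello", 1: "World"]

let hit = sparse[0]    // Optional("Hello")
let miss = sparse[2]   // nil, with no trap
```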

jarod · May 15, 2020, 7:34pm

You misunderstand the purpose of this API. The use case isn't to toss in an arbitrary index and hope you get something useful, or to get a random element from a collection. It's simply a convenient way to combine a bounds check + element access and is useful whenever you have an index that isn't known to be valid.

I use this extension pretty regularly in my own code. There are a few different specific use cases that come up, but the most common is when I have a known-valid index, and I need to get the element before or after that index. Those adjacent indices might be out of bounds, so using this optional subscript is a very clean and concise way to get the elements without having to do the bounds check manually.
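A minimal sketch of that adjacent-element use case, assuming the `safe` subscript from the pitch with an `indices.contains` body. Note that the `i - 1` and `i + 1` arithmetic relies on Array's Int indices and would not carry over to collections with other index types.

```swift
extension Collection {
    // The pitch's subscript; the body is an assumed implementation.
    subscript(safe index: Index) -> Element? {
        indices.contains(index) ? self[index] : nil
    }
}

let tokens = ["let", "x", "=", "1"]
let i = tokens.firstIndex(of: "let")!   // a known-valid index (0)

let previous = tokens[safe: i - 1]      // nil: there is nothing before the first element
let next = tokens[safe: i + 1]          // "x"
```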

I think the discussion here needs to focus on finding an acceptable spelling since that has always been the sticking point for this proposal. I don't have any suggestions that haven't been explored thoroughly in the past, but maybe this thread will generate some new ideas.

Lantua · May 15, 2020, 7:47pm

I gotta say, that's one of the few legit conform-to-Swift-collection-model usages I've seen in this particular class of API pitches.

Jens · May 15, 2020, 8:39pm

Including whether it should be a subscript or a method, getter-only or getter-and-setter:

array[ifWithinBounds: index] = "Hello" // Does nothing if index out of bounds I guess?
array[ifWithinBounds: index] = nil // ?

A setter should probably not be supported (too strange); the following is clearer and not much longer:

if array.indices.contains(index) { array[index] = "Hello" }

And for the getter case, the pitch would enable us to write something like this:

let maybeValue = array[ifWithinBounds: index]

instead of the slightly longer currently possible:

let maybeValue = array.indices.contains(index) ? array[index] : nil

CTMacUser · May 15, 2020, 11:41pm

Looking at the API for the Collection single-element subscript:

The position of the element to access. `position` must be a valid index of the collection that is not equal to the `endIndex` property.

As I said, if you have no idea whether the index value is valid, your code is already broken. The only valid ways to get an Index value are startIndex, endIndex (not valid for dereference), and the various other Collection APIs. And you have to remember that index values are invalidated after their collection gets a RangeReplaceableCollection operation applied to it.

We already have API for this: index(after:) and index(before:), where the latter is only for bi-directional collections and you need to check for endIndex/startIndex before use.

It's called using index-incrementing methods provided by the Collection API. Using the Index type's own incrementing methods, if it has any, has been illegal ever since the Swift 3(?) new-world order.

Index doesn't have to be Int or any other type that models real numbers
Even if it is, startIndex doesn't have to be zero
Separate from that, the span between consecutive indices doesn't have to be 1 from Index's perspective
The span between consecutive elements doesn't even have to be equally spaced from Index's perspective
Note that the space between these consecutive indices is still 1 from the perspective of Collection.distance(from: to:)
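These points can be seen with two standard-library types. This is only a sketch: ArraySlice shows a non-zero startIndex, and String shows an opaque index whose underlying spacing is uneven. `utf16Offset(in:)` is used here purely to peek at that spacing.

```swift
let numbers = [10, 20, 30, 40]
let tail = numbers[2...]          // ArraySlice shares its indices with the base array
let tailStart = tail.startIndex   // 2, not 0

let text = "a😀b"
let i0 = text.startIndex          // String.Index, not an Int
let i1 = text.index(after: i0)
let i2 = text.index(after: i1)

// From the index's own perspective the gaps are uneven...
let offsets = [i0, i1, i2].map { $0.utf16Offset(in: text) }   // [0, 1, 3]

// ...but from the collection's perspective each step is exactly 1.
let steps = [text.distance(from: i0, to: i1),
             text.distance(from: i1, to: i2)]                  // [1, 1]
```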

How are you generating these potential Index states in the first place without knowing whether or not they're valid for a given collection? Note that a given Collection.Index type doesn't have to have publicly accessible initializers.

CTMacUser · May 15, 2020, 11:42pm

Yikes! myCollection.indices.contains(somePotentialIndexValue) can be a linear search. Indices doesn't have to be Range<Index>.
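One way around the linear search is to compare against startIndex and endIndex instead. The helper below is hypothetical (the name `element(ifInBounds:)` is made up), and it only checks that the index lies in the collection's range, which is a weaker guarantee than the index actually being valid for that collection.

```swift
extension Collection {
    // Hypothetical helper: two comparisons instead of `indices.contains`.
    // Assumes `index` came from this collection; only the range is checked.
    func element(ifInBounds index: Index) -> Element? {
        (index >= startIndex && index < endIndex) ? self[index] : nil
    }
}

let sample = [1, 2, 3]
let inBounds = sample.element(ifInBounds: 1)     // 2
let outOfBounds = sample.element(ifInBounds: 3)  // nil
```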

Lantua · May 16, 2020, 12:29am

Actually, you need index(_:offsetBy:limitedBy:). Your point still stands, if that's not obvious.
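A quick sketch of how `index(_:offsetBy:limitedBy:)` behaves on an Array. Note that landing exactly on the limit still returns a non-nil index, so a result equal to endIndex has to be checked separately before subscripting.

```swift
let values = [1, 2, 3]
let start = values.startIndex

let inside = values.index(start, offsetBy: 2, limitedBy: values.endIndex)   // Optional(2)
let atEnd  = values.index(start, offsetBy: 3, limitedBy: values.endIndex)   // Optional(3), which is endIndex
let beyond = values.index(start, offsetBy: 4, limitedBy: values.endIndex)   // nil
```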

ALSO, any properly constructed index would result in true (except endIndex). It's questionable whether anyone needs to evaluate this expression. Sooo, double yikes!!

CTMacUser · May 16, 2020, 3:44pm

I said to check, which you still need to do with index(_: offsetBy: limitedBy:). But all of you want something like:

extension Collection {

    /// Returns the position immediately after the given index, if that position
    /// exists and points to a valid element.
    ///
    /// - Parameter i: A valid index of the collection.
    /// - Returns: The index value immediately after `i`, unless that would be
    ///   at or past `endIndex`, then `nil`.
    public func elementIndex(after i: Index) -> Index? {
        let end = endIndex
        guard i < end else { return nil }

        let result = index(after: i)
        return result != end ? result : nil
    }

    /// Returns the position that is the specified distance from the given
    /// index, as long as that position exists and points to a valid element.
    ///
    /// - Parameters:
    ///    - i: A valid index of the collection.
    ///    - distance: The distance to offset `i`.  `distance` must not be
    ///      negative unless the collection conforms to the
    ///      `BidirectionalCollection` protocol.
    /// - Returns: A value *x* such that `distance(from: i, to: x)` is parameter
    ///   `distance`, as long as `x` would be within `startIndex..<endIndex`;
    ///   otherwise, `nil`.
    ///
    /// - Complexity: O(1) if the collection conforms to
    ///   `RandomAccessCollection`; otherwise, O(*k*), where *k* is the absolute
    ///   value of `distance`.
    public func elementIndex(_ i: Index, offsetBy distance: Int) -> Index? {
        let end = endIndex
        guard distance != 0 else { return i != end ? i : nil }

        if distance > 0 {
            let result = index(i, offsetBy: distance, limitedBy: end)
            return result != end ? result : nil
        } else {
            return index(i, offsetBy: distance, limitedBy: startIndex)
        }
    }

}

extension BidirectionalCollection {

    /// Returns the position immediately before the given index, if that
    /// position exists.
    ///
    /// - Parameter i: A valid index of the collection.
    /// - Returns: The index value immediately before `i`, unless that would be
    ///   past `startIndex`, then `nil`.
    public func elementIndex(before i: Index) -> Index? {
        guard i > startIndex else { return nil }

        return index(before: i)
    }

}

?