[Draft] Adding Safe indexing to Array

MutatingFunk · March 13, 2018, 11:15am

Interestingly, my typical use cases for this don't seem to overlap at all with your motivations:

Optional chaining, usually for succinct comparisons:

if items[ifPresent: 1]?.property == value { … }

Safe use of ?? to provide a default value.

let value = items[ifPresent: 1]?.property ?? defaultValue

Note the current system is:

let value = items.indices.contains(1) ? items[1] : defaultValue
//equivalent to
let value = value != nil ? value! : defaultValue

PS: I'll second the motivation for forwarding optional-returning methods to an array access.

DevAndArtist · March 13, 2018, 11:38am

One more thing about the inconsistency. Does Swift consider renaming API as a huge source breaking change, even if the migration should be straight forward? (One flatMap was renamed to compactMap.)

From my perspective it would make sense to require a label on the existing subscripts.

public protocol Collection : Sequence {
  public subscript(unchecked position: Self.Index) -> Self.Element { get }
  public subscript(unchecked bounds: Range<Self.Index>) -> Self.SubSequence { get }
}

public protocol MutableCollection : Collection where Self.SubSequence : Collection {
  public subscript(unchecked position: Self.Index) -> Self.Element { get set }
  public subscript(unchecked bounds: Range<Self.Index>) -> Self.SubSequence { get set }
}

Then we could introduce failable subscripts and extend Mutable-/Collection so that all types would benefit from that.

extension Collection {
  // Checked by default
  public subscript?(position: Self.Index) -> Self.Element {
  //              ^ again pay attention to `?`
    ...
  }

  // Checked by default
  public subscript?(bounds: Range<Self.Index>) -> Self.SubSequence {
  //              ^ again pay attention to `?`
    ...
  }
}

Similar for MutableCollection as I showed in this thread already. Or at least some kind of retrofitting the checked subscripts into all Mutable-/Collection types. (On Dictionary we already have the checked subscript but it also uses a different type for position, so we might need to add an extra assciated CheckedIndex type.)

I would advocate for a checked access as the default, same analogy that we use with Unsafe types. So we'd rotate the default to be checked and require an unchecked label for the unsafe access to the elements of a collection.

let foo: [Int: Int] = [0:0, 1:1]
let bar: [Int: Int] = [2:2]

if let index = foo.index(forKey: 1) {
  bar[index] // current unchecked subscript which will crash
}

Jens · March 13, 2018, 11:59am

But, again, as mentioned several times in this thread, these existing ones are checked (and safe). If we decide to call it something else, then we'd have to also eg rename the compiler flag -Ounchecked.

DevAndArtist · March 13, 2018, 12:01pm

Okay I'm confused why are the existing subscripts checked? Maybe it's a different context that you and I are talking about. (The fatal error isn't the check I meant.)

Jens · March 13, 2018, 12:16pm

Note that the existing subscript traps on out of bounds, and it can do this because it is checking whether the index is out of bounds or not. So it is safe because it is checking that the index is within bounds and traps otherwise. If it really was unchecked and unsafe then it would allow us to read and write outside the allocated memory of the array (with undefined behavior as the result).

This meaning of the words "checked" and "unchecked" is already established in Swift (and probably other languages) through the existing compiler flag -Ounchecked, and for example:

/// Creates an instance with the given bounds.
///
/// Because this initializer does not perform any checks, it should be used
/// as an optimization only when you are absolutely certain that `lower` is
/// less than or equal to `upper`. Using the half-open range operator
/// (`..<`) to form `Range` instances is preferred.
///
/// - Parameter bounds: A tuple of the lower and upper bounds of the range.
public init(uncheckedBounds bounds: (lower: Range.Bound, upper: Range.Bound))

Which we can see in action here:

var a = 1
var b = -1
let r1 = a ..< b // Will trap (at runtime) because it is checked and safe!
let r2 = Range<Int>(uncheckedBounds: (a, b)) // OK, but dangerous/unsafe!

Here is a list of the previous mentions in this thread (some of them are only pointing out the confusion caused by trying to make checked/unchecked mean something else than it already does in Swift):

DevAndArtist · March 13, 2018, 12:38pm

Okay that makes sense to me now. So such a naming rotation won't and cannot happen. I can see one solution to the major problem of this thread, but I still have to reuse the non existing failable subscript (subscript?) to achieve it.

public struct FailableAccess<Elements> where Elements : SomeProtocol {
  public subscript?(position: Elements.Index) -> Elements.Element {
    /* with failable subscript semantics */
  }
}

public struct FailableMutableAccess<Elements> where Elements : SomeProtocol {
  public subscript?(position: Elements.Index) -> Elements.Element {
    get { /* with failable subscript semantics */ }
    set { /* with failable subscript semantics */ }
  }
}

public protocol Collection ... {
  associatedtype FailableView = FailableAccess<Self>
  var failable: FailableView { get }
}

extension Collection where FailableView == FailableAccess<Self> {
  public var failable: FailableView { ... }
}

public protocol MutableCollection ... {
  associatedtype FailableView = FailableMutableAccess<Self>
  var failable: FailableView { get set }
}

extension Collection where FailableView == FailableMutableAccess<Self>{
  public var failable: FailableView { ... }
}

var array = [0,1]
let element: Int? = array.failable[2]
array.failable[2] = 42 // will fail without error
array.failable[2] = nil // that is a compile time error because of the failable subscript semantics

What this bikeshedding code does?

It intorduces two things:

a failable subscript
a failable view for all collection and mutable collection types that implements both the read only (failable) subscript and the read write (failable) subscript

bzamayo · March 13, 2018, 12:57pm

Regarding the naming controversy, I think arr[checked: i] is appropriate and inline with standard library naming. Range has an initializer .init(uncheckedBounds:) which indicates that a check is skipped over in the name of performance when the caller knows the preconditions are met. It follows from that a natural signature for an initializer that does validate the parameters would be named .init?(checkedBounds:), if it was to exist.

My primary concern is with the motivation as to whether this should exist at all. Taking the three examples in the draft proposal for instance:

Who's to say that a nil value is better than trapping here? In the zipped sequence case, the intention of the code might be for the arrays to be equal length. Surely, a better way to program this would be to check the preconditions ahead of time using guard and an early return, or similar, rather than relying on nil values that can be easily discarded.

var sum = 0

for i in array1.indices {
    let value1 = array1[i]

    if let value2 = array2[checked: i] {
       sum += value1 * value2
    }
}

print(sum)

This will lead to an incorrect sum if array2 is shorter than expected. It won't crash, but it's not really a correct program if the intent was to multiply every value in array1 with every value in array2. Now, you are going to say 'well in this case the checked subscript shouldn't be used at all'. This is true, but I see a danger that developers new to the language will over-use arr[checked:] with a mentality that 'it doesn't crash, it must be right'. It's opening up another avenue for the implicit misuse of optionals. A guard formulation (which is ergonomic and perfectly possible with Swift as is) is a better way to teach how preconditions should be considered, and the code is more efficient to boot.

Again it depends on the exact scenario, but the third bullet point overlaps with the argument I already made for the first example. Your second bullet point is most compelling but I'd ask how often does this happen in practice. A defined struct with named properties is a much better representation of a data object (and I would guess much more common in real APIs) than an untyped array of indeterminate length. In my experience, I have seen some lower-level high-performance C code ported into Swift that does return arrays like this, but in this case you are probably caring about the performance profile of the code already, which means you probably want to avoid the overhead of arr[checked:] anyway.

For completeness, here's another look at an example usage posed downthread.

michelf:

One use case for me is writing unit tests. I often check the results of some operation by checking all the values in the array in a sequence, like this:

let pathString = "#f=FFF/g=GGG/u=UUU/___"
let path = FragmentPath(fragment: pathString)
XCTAssertEqual(path.selectors.count, 4)
XCTAssertEqual(path.selectors.at(0), FragmentPath.Selector(prefix: .file, identifier: "FFF"))
XCTAssertEqual(path.selectors.at(1), FragmentPath.Selector(prefix: .group, identifier: "GGG"))
XCTAssertEqual(path.selectors.at(2), FragmentPath.Selector(prefix: .unit, identifier: "UUU"))
XCTAssertEqual(path.selectors.at(3), FragmentPath.Selector(prefix: nil, identifier: "___"))

In this case, you are already checking that the count of the selectors array is 4 so the optional subscript checks are redundant. And the reality is this is how good tests are formed, explicitly and tediously laying out each expectation.

At the end of the day, if a codebase really warrants this kind of checked subscript for use in their test suite or whatever, they can write their own extension in literally two lines. In summary, I do not want to support this proposal without better motivating examples.

Jens · March 13, 2018, 2:29pm

Before I give up on trying to explain why "checked" really would be a confusing name, I ask you to please read my last post in which I used exactly this init(uncheckedBounds:) of Range as "evidence" for the established meaning of checked/unchecked (the meaning which makes it a confusing choice for the subscript being discussed here).

The checked version of Range's init(uncheckedBounds:) is mentioned as the preferred version to use in its docs, and it's the operator: a ..< b, which is checking the bounds, and traps(!) if the bounds are invalid, and is thereby safe! Just like the existing Array subscript!

bzamayo · March 13, 2018, 2:36pm

I think it depends how you think about it. At what level is the checking taking place? I tried to address this in my first post by giving the example of the hypothetical Range.init?(checkedBounds:). I think it follows that an initializer written like that would validate the bounds and return nil upon failure. The arr[checked: i] subscript follows that logic — at least in my head.

Jens · March 13, 2018, 2:38pm

Naming it "checked" implies that the existing subscript is unchecked, which is confusing, simple as that. For further "evidence" of the currently established meaning of the terms checked/unchecked, look at eg the compiler flag -Ounchecked, or the Range initializer uncheckedBounds or the identical behavior of the existing Index-subscripts of other collection types. I will not spend more time debating this / repeating myself.

Dear Swift Core Team, is the existing array subscript unchecked? Yes or no?
: )

michelf · March 13, 2018, 2:38pm

If this is something I use in my tests, and others use in their tests and elsewhere, we'll end up with everyone defining something different. I don't care that much, but I think it would be worth adding, if we can find a good name for it.

You've given me an idea by highlighting how suitable it is for unit tests, where you must always prepare for failure.

array[test: 0]

The test label adds a sense of uncertainty about the result. We're testing the given index... and it might fail, otherwise it wouldn't be a test. That's also a perfect name for use in unit tests.

subscript(test position: Int) -> Element? { get }

saagarjha · March 13, 2018, 4:49pm

If this is a "test", it would presumably return a boolean, no? This looks like it's semantically equivalent to array.indices.contains(0) at a glance.

saagarjha · March 13, 2018, 4:56pm

bzamayo:

var sum = 0

for i in array1.indices {
    let value1 = array1[i]

    if let value2 = array2[checked: i] {
       sum += value1 * value2
    }
}

print(sum)
This will lead to an incorrect sum if array2 is shorter than expected. It won’t crash, but it’s not really a correct program if the intent was to multiply every value in array1 with every value in array2. Now, you are going to say ‘well in this case the checked subscript shouldn’t be used at all’. This is true, but I see a danger that developers new to the language will over-use arr[checked:] with a mentality that ‘it doesn’t crash, it must be right’. It’s opening up another avenue for the implicit misuse of optionals. A guard formulation (which is ergonomic and perfectly possible with Swift as is) is a better way to teach how preconditions should be considered, and the code is more efficient to boot.

Well, this code without the guard is obviously fundamentally wrong either way, whether it's using regular subscripting or safe indexing, since the arrays are of different length. By default, I'd assume that users will try "normal" subscripting, which will crash, and they'll quickly realize that they went out of bounds of the array (which the runtime check would clearly indicate to them) and fix the issue. Why would they opt to using arr[checked:]? Personally, I feel that subscripts that have labels are relatively difficult to discover, since they don't really show up in autocomplete and are difficult to search for, so I don't feel as if this is something that is easy to pick up without understanding what it does.

bzamayo · March 13, 2018, 5:04pm

I think newcomers to the language will reach for array[checked:] (or whatever it is called) in the same way people misuse postifx ? today, as documented by endless examples elsewhere on the web.

That's sort of the point of the example. If you guarded at the top of the dummy example I gave with a condition like array2.count == array1.count then you spell out the preconditions early, and can consciously decide the right behaviour.
.

saagarjha · March 13, 2018, 5:24pm

array[checked:] will not have a fix-it in Xcode that tells you to use it, as ! and ? do (and notice that thread is about removing the fix-it, which will supposedly make the issue go away). The only way for anyone to pick this up is to see it in someone else's code and search up what it does or have it taught to them, which are presumably better places to learn about it than a fix-it that teaches them "use this to make my errors go away".

michelf · March 13, 2018, 6:02pm

But it's also a subscript, and that adds to the meaning. If I were to put it in function form, I'd call it:

array.element(testingAt: 0)

A subscript in essence is a more convenient expression for element(at:), and if you add a label to the subscript it's like you are adding more words to that function.

That said, there are other words in the vein of seeking something with uncertainty about success. We could use probe for instance.

Charles_Constant · March 14, 2018, 8:15am

I hope we use subscript syntax instead of a method syntax.

I think part of the problem is that philosophically, Arrays in Swift ought to have returned optionals from the start.

Has anyone suggested "optionally" as the label yet?

let el = arr[optionally:i]

CTMacUser · March 14, 2018, 4:53pm

Hmm,... for a linked list, each node (which would be a class) would have an Int (or similar) ID, and the main body of the list would have an internal [Int: Node] property to index all of its nodes. The list's Index would be those key IDs. It's up to the list to make sure all IDs are strictly increasing whenever a RangeReplaceableCollection or a custom equivalent event occurs, which may involve recalculating all indexes.

Maybe something similar for search trees? (I have to find out what they are first.)

Ooh, time to get distracted from my BigInt obsession for a Linked-List one.

CTMacUser · March 14, 2018, 5:10pm

Nope. You're thinking of when RandomAccessCollection.Index conforms to Strideable; and even then, nope.

Using indices or index(after:) and similar properties and methods from a Collection are the only way to get its valid indexes. Even if the Index type is Strideable or otherwise can be incremented/decremented, the collection can use a custom skipping-around scheme that does not match what Index natively provides! It did that in my custom chunking Collection; the collection groups every 5 base elements, so my collection's index routines scale all the jumps by 5. As long as I keep my jumping routines (amortized) O(1), I'm good at keeping my collection a RandomAccessCollection.

If you're using an Index instance's native routines to jump to another value, without going through the owning Collection for approval, you're technically doing it wrong.

CTMacUser · March 14, 2018, 5:37pm

I was first going to encourage generalizing this idea to Collection, then I realized the consequences made it a bad idea, possibly even for Array.

If I understand correctly, all dereferencing and other methods that peek at Index values can assume that the value is valid for the corresponding collection instance. You can start with a valid Index state for a Collection by using indices, startIndex, or endIndex. This is a precondition; you have to ensure it before calling any code, but the very premise of your proposal violates this!

I guess this idea can work with IndexDistance, and better: offsets can work with all Collection types, not just Array. It's weird appearance-wise since Index and IndexDistance are the same for Array.

You just have to redo your documentation, but the equivalence between Index and IndexDistance for the most common case (Array) will still make it confusing for newbs. So I still advise not going ahead.

And your 3 motivating examples are bad programming given the current constraints on collection indexes.

The set of indexes for two collection instances may not be compatible, even when the same types are involved. So just sticking an index from one collection into another is already illegal. Offsets are compatible, but even then you're supposed to check relative lengths first.
Your wrapper's equivalent to Index better be definitively mapping to ones of the inner collection. Your programming is just wrong otherwise. (I'm not completely sure, since your second example is vague. But the vagueness itself is a red flag that you're already not using the inner collection's indexes correctly.)
This one is like the first case; you need to check the bounds beforehand.