Add subscript with unchecked index to Array

I sometimes use the following (for cases where the overhead of the bounds check matters and the optimizer is unable to get rid of it):

extension Array {
  subscript(unchecked index: Index) -> Element {
    self.withUnsafeBufferPointer { $0[index] }
  }
}

Has this been pitched before (I couldn't find it if so)?

Any reason why the Standard Library shouldn't let us opt out of bounds checking (for both get and set, and for both Array and ContiguousArray)

let x = myArray[unchecked: i]
...
myArray[unchecked: i] = y

?

In Swift, that's simply called UnsafeBufferPointer, as you show...

I'm not sure what you mean, my code could involve some myArray that makes sense to define as an Array (rather than an UnsafeBufferPointer) and using myArray[unchecked: i] could be motivated at some place in my code, while at most other places myArray[i] would be used.

Do you mean that the Standard Library should not provide an unchecked subscript because in the rare cases that we'd need it, we're supposed to write:

let x = myArray.withUnsafeBufferPointer { $0[i] }

instead of:

let x = myArray[unchecked: i]

(or implement that subscript ourselves)
?

Perhaps I should have worded my question differently: Any reason why the Standard Library shouldn't provide this more succinct API for opting out of bounds checking?

1 Like

Ohhhh! Whoops. I blame my morning brain.

1 Like

If we added this "more succinct API" to Array and added the bounds-checked API to UnsafeBufferPointer, then they would be one and the same type.

That Swift has chosen to separate these APIs in the type system reflects a deliberate judgment that interleaving both bounds-checked and bounds-unchecked access on the same instance is either rare, to be discouraged, or both.

If we decide that this judgment is incorrect, and that interleaving these two different kinds of access is sufficiently important to require support in the standard library, then the "safe by default" tentpole design philosophy of Swift would argue for adding the bounds-checked API to UnsafeBufferPointer and not the bounds-unchecked API to Array.

1 Like

I guess I'm not sure if/how eg these are in line with that design philosophy:

Array.init(unsafeUninitializedCapacity: initializingWith:)
Range<T>.init(uncheckedBounds: (lower:, upper:))
&+ &- &*

Ie, aren't these examples of similar succinct unsafe / unchecked APIs?

1 Like

Nowhere do I suggest that standard library types shouldn't have any unsafe or unchecked APIs (which, by the way, the overflowing arithmetic operators are not). You asked specifically about bypassing bounds checking on accessing elements of an array when the compiler can't prove that it's safe, which is the paradigm of an unsafe operation. The standard library elects to offer a distinct type for this functionality; this is a deliberate design choice.

1 Like

OK, fine.

FWIW, my (sloppy) motivation for why the stdlib could/should allow this:

Swift is safe by default, so myArray[i] does bounds checking and traps if i is out of bounds,
but of course, if you really have to, you can opt out of this default behavior by being explicit about that: myArray[unchecked: i].

When do you "really have to"?

Also, we should clarify: the stdlib already allows this. What you're asking for is the stdlib to give a concise spelling for it. That's a pretty high bar to clear, IMO.

eh, I wouldn't say that UnsafeBufferPointer is "Array without bounds-checking". It's a very different beast - it has reference semantics, no ARC, etc.

I'd like to see an unchecked Array subscript. Whenever I'm checking things out with Godbolt, I notice just how much our bounds-checking adds to the generated assembly. Typically it's the biggest difference between our output and an equivalent C program.

At the same time, you can't always enforce that a particular file is compiled with -Ounchecked (e.g. in a SPM package), so dropping it down to the API level makes sense -- of course, with an appropriate "unsafe" name, since you may be accessing uninitialised memory if your code is incorrect.

1 Like

Do you have a concrete example of when it would be absolutely essential to have an API like this? What would you like to happen if you go outside the array bounds?

Except that our output is memory safe, and the equivalent C program is not.

I think it's a mistake to assume that bounds checks are costs, to be elided whenever we the programmer can convince ourselves we don't need them. I would be happy to bet that most Godbolt programs showing bounds checks are quite right to keep them in place because the programmer has not, in fact, correctly determined that they are unneeded. In bigger programs, the cost of bounds checking is low.

I would be profoundly opposed to adding an unchecked array subscript. I am aware of very few programs that would be improved by removing bounds checking, and in those there is a straightforward solution: use the unsafe pointer to access the storage.

3 Likes

Well, it's more accurate to say that the memory safety of the C program is not guaranteed.

Bounds checks are costs, and the compiler will remove them whenever it can. It can't do it in all cases, or it may not be reliable, so there are times when we need to help it (see the other unchecked constructs above). Ultimately, as smart as the compiler is, it doesn't know the purpose and functioning of my code as well as I do (and if I don't understand it and start throwing around unsafe constructs in inappropriate places, that's just bad and even the compiler can't save me).

Myeh... well the unsafe pointer thing exists, but it's not entirely practical. I think being "profoundly opposed" as opposed to, well, just regular-old "opposed" is perhaps exaggerating the impact a bit.

1 Like

IMO, this will easily be abused by programmers who knows better than compilers and leads to a lot more unsafe codes. When in fact such level of optimisation should really be after determining that the bound-check is the real bottleneck of the operation (I believe almost never). At which point, it'd be appropriate that the syntax itself is salted. And withUnsafePointer seems appropriately salted to me.

Sure.

What I think you mean is that unnecessary bounds checks are costs, and the compiler will remove them whenever it can. I agree, provably unnecessary bounds checks are costs, and if you can prove a bounds check is unnecessary and have a program that still includes it, that should be considered a compiler bug.

This is not an argument for having an unchecked subscript, it's an argument for not doing automatic bounds checking. Automatic pervasive bounds checking is valuable because human beings make a lot of mistakes, and if a computer can catch that mistake and fail safely instead of dangerously, that's really useful.

Also, and this is not a personal comment but it applies to all of us: you don't know the functioning of your code very well. The purpose of your code is not really relevant when it comes to memory safety, only what your code actually does. The computer doesn't look into your heart and weigh your intentions before deciding to allow remote code execution. If you have misunderstood the function of your code, it is good for the compiler to protect you.

Now, for pragmatic reasons we continue to have languages that allow unsafe constructs. I think this pragmatism is reasonable: sometimes the unsafe construct is truly necessary. But my view is that we should be resisting making the unsafe constructs too tempting to use.

Human beings are demonstrably terrible at using unsafe constructs safely. We have decades of experience in the matter. We have programming languages that are safe-by-default with unsafe hatches where memory safety bugs creep in on the unsafe hatches.

And remember, we're not arguing about whether it should be possible, only whether it should have a shorthand spelling. I'm all in favour of it being possible: I just don't think there's a good motivation to add a shorthand for it.

Nope, I'm pretty happy being profoundly opposed.

4 Likes

A minor but important detail: these are not unsafe unchecked APIs. These are safe APIs defined to implement wrapping arithmetic (Swift used to have unsafe unchecked arithmetic, and it was the mother of all footguns, so we removed it: Removing unsafe arithmetic methods).

4 Likes

While I agree that it should be considered a compiler bug, or rather a missed optimization opportunity, anyone who has profiled and tried to optimize some array-involving code will know that it is very common that the optimizer does not remove all provably unnecessary bounds checks.

I'm just guessing now but according to my experience I wouldn't be surprised at all if it kept the bounds check for something like this:

let myArray = (0 ..< 1024).map { ... } // Just some array for which count is statically knowable.
...
let value = myArray[someIndexWhichCanBeAnyValue & 1023] // It's statically knowable that out of bounds cannot happen here, yet I'd guess the optimizer would miss this opportunity.
...

Reporting every such missed optimization opportunity I bumped into would take a lot of my time.


I'm fine with not adding the unchecked subscript though, if most people think it wouldn't be wise, as I agree that it's just a convenience that'd probably not be very commonly used (but perhaps commonly misused), so I can just keep adding it myself.

Yes, I know, I just thought that they were somewhat (loosely) similar to the pitched API in that:

  • There is a "safe by default API" (+) (safe in that it traps instead of going ahead and returning a potentially surprising (wrapped) result)
    (Analogous to ar[i].)

  • And there is a convenient "unsafe/potentially-surprising but faster" alternative (&+) (I know that it isn't unchecked or unsafe, it's just a different op, but anyway).
    (Analogous to the pitched ar[unchecked: i].)

BTW: I noticed that the term "unchecked arithmetic/operations" is similarly misused in Swift's OptimizationTips.

1 Like

The gulf between "a result you may not have expected" and "undefined behavior" is as vast as any distinction in computation. Calling both of these "unchecked" is dangerously misleading.

Also, https://github.com/apple/swift/pull/31463

7 Likes

Yup, it does. Filed as SR-12715.

It's worth noting, however, that while it's statically knowable in your current case, several years later someone can come along and change the size of the array without changing the subscript, and now your code is broken. The downside of doing things that aren't safe is that while your code may be safe today, it can easily be wrong tomorrow. Having the compiler back you up provides value.

We should also try to be clear about the performance conversation we're having. In the straightforward case, we're talking about an extra cmp and a (highly-predictable) branch. Two instructions, one very cheap and one often cheap. While there are some code paths where this cost matters, in most cases it does not. Benchmarking the code in question will usually reveal a much more important performance optimisation available to you than the cost of the bounds check.

There is one major exception here: Swift's logic for bounds checks inhibit vectorisation if they can't be elided. This often is a major performance delta on large arrays. Anytime you see that fail to happen it really genuinely is worth a bug report, because here the optimiser will achieve a substantial improvement if it can vectorise.

6 Likes