Suggestion: Tackle bounds safety

ConfusedVorlon · December 10, 2024, 5:16pm

Swift has been putting a lot of thought into data-race safety recently.

That hasn't been a problem for me in practice, but fatal errors from out-of-bounds access have caused me a lot of pain.

Specifically, in my own code I have had two (maybe three?) production crashes this year that amounted to this bug (though, of course, arrived at less obviously).

And I have spent many, many hours trying to work around an Apple framework crash caused by an array index error:

An uncaught exception was raised
*** -[NSMutableArray removeObjectsAtIndexes:]: index 1 in index set beyond bounds [0 .. 0]
(
	0   CoreFoundation                      0x000000019f998300 __exceptionPreprocess + 176
	1   libobjc.A.dylib                     0x000000019f47ecd8 objc_exception_throw + 88
	2   CoreFoundation                      0x000000019f930400 -[NSArray isEqualToArray:] + 0
	3   Vision                              0x00000001b4eb0f48 -[VNWeakTypeWrapperCollection _enumerateObjectsDroppingWeakZeroedObjects:usingBlock:] + 252
	4   Vision                              0x00000001b4eb0d10 -[VNWeakTypeWrapperCollection addObject:droppingWeakZeroedObjects:] + 180
	5   Vision                              0x00000001b4f6c834 -[VNSession initWithCachingBehavior:] + 344
	6   Vision                              0x00000001b4fd9300 -[VNCoreMLTransformer initWithOptions:model:error:] + 192
	7   Vision                              0x00000001b5041a4c -[VNCoreMLRequest internalPerformRevision:inContext:error:] + 260
	8   Vision                              0x00000001b4e9be54 -[VNRequest performInContext:error:] + 624
	9   Vision                              0x00000001b4e9c514 __73-[VNRequest performInContextAsync:asyncDispatchQueue:asyncDispatchGroup:]_block_invoke + 128

For the classic case of accessing an out-of-bounds index on an Array, I have my own helper:

public extension Collection {
    /// Returns the element at the specified index iff it is within bounds, otherwise nil.
    subscript(safe index: Index) -> Element? {
        return indices.contains(index) ? self[index] : nil
    }
}
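A quick usage sketch of that helper (the extension is repeated so the snippet compiles on its own; the example values are made up). One detail worth knowing: because it checks against indices rather than offsets, an ArraySlice is checked against the parent array's index values, not zero-based positions.

```swift
public extension Collection {
    /// Returns the element at the specified index iff it is within bounds, otherwise nil.
    subscript(safe index: Index) -> Element? {
        return indices.contains(index) ? self[index] : nil
    }
}

let numbers = [10, 20, 30, 40]
let inBounds = numbers[safe: 2]      // Optional(30)
let outOfBounds = numbers[safe: 9]   // nil

// A slice keeps the parent array's indices (here 2 and 3),
// so index 0 is out of bounds for it even though the slice is non-empty.
let tail = numbers[2...]
let sliceStart = tail[safe: 0]       // nil
let sliceFirst = tail[safe: 2]       // Optional(30)
```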

But I wonder if there might be a broader language-supported approach to provide safer variants of these basic language features.

For example:

start?...finish
returns nil if start > finish

or perhaps:

let range = try start...finish
throws if start > finish

and for an array:

array[? index]
returns nil if index out of bounds
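Until something like this exists in the language, the first two ideas can be approximated as ordinary library code. This is a minimal sketch only; the names ClosedRange(validating:_:), ClosedRange.validated(_:_:) and RangeBoundsError are hypothetical and not standard library API.

```swift
/// Hypothetical error for reversed bounds.
enum RangeBoundsError: Error {
    case lowerBoundAboveUpperBound
}

extension ClosedRange {
    /// Hypothetical stand-in for the imagined `start?...finish`: nil instead of a trap when lower > upper.
    init?(validating lower: Bound, _ upper: Bound) {
        guard lower <= upper else { return nil }
        self = lower...upper
    }

    /// Hypothetical stand-in for the imagined `try start...finish`: throws instead of trapping.
    static func validated(_ lower: Bound, _ upper: Bound) throws -> ClosedRange<Bound> {
        guard lower <= upper else { throw RangeBoundsError.lowerBoundAboveUpperBound }
        return lower...upper
    }
}

let start = 5, finish = 1
if let range = ClosedRange(validating: start, finish) {
    print("valid range \(range)")
} else {
    print("start > finish, no range formed")
}
```

The trade-off is the one discussed later in the thread: the caller still has to decide what a nil or a thrown error means at that point.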

I'm not suggesting this would be the right syntax, just raising the idea that there might be some syntax.

These would of course have to be optional, but they might encourage people towards more defensive programming.

I'd love to see some stats on how many Apple crashes relate to out-of-bounds type errors. I bet it's a meaningful proportion...

xwu · December 10, 2024, 5:36pm

As has been said before, halting execution is safe. It is continuing execution in an unexpected state that is potentially unsafe.

The idea of a "lenient" subscript has been discussed at length and you can review the existing conversations over the past decade using the search function. As an example:

When resurrecting ideas such as this with an extensive history, it is helpful to the community if you can take some time to summarize existing contributions in such a way that the conversation doesn't end up as a rehash: after all, if the goal is to make progress, repeating the same things won't get us there.

Slava_Pestov · December 10, 2024, 5:47pm

An array indexing model that is statically correct would introduce conceptual complexity and annotation burden that far exceeds that of data race safety, though.

JuneBash · December 10, 2024, 8:37pm

I have also run into issues with crashes in production due to indexes being out of bounds and ranges being improperly formed, but I don't really blame the language for that happening; I think it's the right thing to do. In the past couple of years, it's become less and less common for me to encounter these crashes as I've gotten to know the cases where forming a range from unknown values or using indices directly is truly necessary. I don't yet have a conscious understanding of what exactly has changed in my approach, but there are often other ways of accomplishing things that avoid these sorts of issues. My view is that it is not the language's job to provide these extra "safety" nets.
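The post doesn't say which approaches are meant, so the following is only an assumption about the kind of thing involved: standard library APIs that sidestep raw index arithmetic, each of which either returns an Optional or clamps instead of trapping.

```swift
let scores = [10, 20, 30]

// first/last return nil for an empty array, where scores[0] would trap.
let firstScore = scores.first

// enumerated() pairs each element with its offset, so no 0..<count loop is needed.
for (offset, score) in scores.enumerated() {
    print("\(offset): \(score)")
}

// zip with dropFirst() walks neighbouring pairs without computing i + 1.
let deltas = zip(scores, scores.dropFirst()).map { previous, next in next - previous }

// prefix(_:) returns at most the available elements, where scores[0..<5] would trap.
let topFive = scores.prefix(5)
```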

Slava_Pestov · December 10, 2024, 9:09pm

Yeah, exactly. Replacing all subscripting operations with a “safe” variant is one of those things that often just creates a new problem: how do you handle a nil return?

While it might be technically possible to recover in some way in every case, if some execution path that leads to out of bounds access isn’t properly tested, you’re still going to end up in an undefined state, and it’s very possible that something will end up crashing anyway later, in which case you’ve gained nothing, and now the root cause is harder to debug.

Ironically, given that the crash is inside a weak-reference collection of some sort, it's possible that this is caused by a data race in your app.

ConfusedVorlon · December 10, 2024, 10:02pm

I'm certainly not suggesting replacing all subscript operations.

By the same logic, you should get rid of try? and try, since try! is fine if you have properly tested your code.

if let and guard can also go: you should know what you're dealing with without needing to explicitly handle possibilities.

JuneBash · December 11, 2024, 1:20am

Optional and error-handling are very different beasts from fatal errors caused by programmer error. They are tools for dealing with the inherent unknowability of interacting with the unpredictable runtime environment. A user might not input a valid number, and you need to account for that, etc. There's a lot of ways to handle it, but it's up to the programmer to decide how they want to handle it depending on the context.

Of course, one could say the same thing about subscripts and range literals that fatalError, but every programming language is full of tradeoffs. Swift's creators decided early on that Collection subscripts would not return an Optional<Element>. Same with x...y. You can argue with whether that was correct, but I don't think it's likely to get very far. That ship has sailed. And again, I don't think it's up to the language to provide alternative tools for every single case that might cause a fatal error (integer overflow/underflow is another one, for example; hell, even a stack overflow caused by a recursive function call).

mayoff · December 11, 2024, 2:16am

This is the heart of it. I dug up the following quote from @beccadax for a Stack Overflow answer last year:

The use cases for arrays and dictionaries are different, though.

I’d say about 80% of the time you subscript an array, you’re using an index that was somehow derived from the array—for instance, a range like 0..<array.count, or array.indices, or array[indexPath.row] where tableView(_:numberOfRowsInSection:) returns array.count. This is very different from dictionaries, where the key is usually some piece of data from somewhere else and you’re trying to look up the value corresponding to it. You rarely say, for instance, array[2] or array[someRandomNumberFromSomewhere], but dictionary[“myKey”] or dictionary[someRandomValueFromSomewhere] are pretty common.

Because the use cases are different, arrays have a non-optional subscriptor which fails a precondition when the index is invalid, while dictionaries have an optional subscriptor which returns nil when the index is invalid.
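The asymmetry described in the quote, in code (a small illustration; the data is made up):

```swift
let fruits = ["apple", "banana", "cherry"]
let stock = ["apple": 3, "cherry": 0]

// Dictionary keys usually come from elsewhere, so lookup returns an Optional...
let bananaCount: Int? = stock["banana"]            // nil
// ...or takes an explicit fallback.
let bananaOrZero = stock["banana", default: 0]     // 0

// Array indices here come from the array itself, so they are valid by construction
// and the non-optional subscript cannot trap in this loop.
for index in fruits.indices {
    print(index, fruits[index], stock[fruits[index], default: 0])
}
```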

None of which prevents you from adding your own lenient subscript implementation or lenient range initializer, if you find them useful. The ability to add operations to existing types as if they were built-in is one of the delights of writing Swift.

QuinceyMorris · December 11, 2024, 4:14am

I'd also suggest that perma-checking indexes at subscripting time is too big a hammer. IRL there are usually "gates" in your code where untrusted indexes will cause problems if allowed through. However, if indexes are validated at the gates, they're essentially trustworthy for as long as they're on the inside, and re-checking them at each subscripting operation is pointless.

Another way of looking at this, without the hokey metaphors, is that indexes should be trusted by default when arrived at via local reasoning, and you should write your own validation checks when they're arrived at via globally complex reasoning.
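A minimal sketch of the "gate" idea, with hypothetical function names: an untrusted index (say, from user input or a network message) is checked once where it enters, and code behind the gate subscripts directly.

```swift
// Behind the gate: callers guarantee `index` is valid, so there is no re-check here.
func label(for index: Int, in items: [String]) -> String {
    "\(index + 1). \(items[index])"
}

// The gate: the only place an untrusted index is validated.
func handleSelection(rawIndex: Int, items: [String]) -> String? {
    guard items.indices.contains(rawIndex) else {
        return nil  // reject at the boundary instead of trapping deeper inside
    }
    return label(for: rawIndex, in: items)
}
```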

ConfusedVorlon · December 11, 2024, 9:45am

It surprises me how keen people are to enforce onerous requirements in order to prevent one source of problems - whilst being uninterested in providing options to avoid another.

But hey-ho, I'm clearly way out of sync with the Swift mood music...

Dmitriy_Ignatyev · December 11, 2024, 10:40am

There is an old pitch for resolving this problem: ClosedRange init with unordered bounds

Now I have some free time to make the pitch more descriptive with meaningful examples.

Does anybody know whether that is enough motivation to add such an extension to the standard library?

xwu · December 11, 2024, 2:10pm

Probably worth pitching for swift-algorithms first if you have meaningful examples where swapping the bounds on initialization is actively desired.
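For reference, the kind of extension under discussion is small. This sketch assumes the pitched behaviour is "order the bounds rather than trap"; the initializer label is hypothetical and may not match the pitch.

```swift
extension ClosedRange {
    /// Hypothetical initializer: accepts the bounds in either order instead of trapping.
    init(orderingBounds x: Bound, _ y: Bound) {
        self = x <= y ? x...y : y...x
    }
}

let a = 8, b = 3
let span = ClosedRange(orderingBounds: a, b)   // 3...8
```

Whether silently reordering is the right behaviour, rather than a symptom of a bug upstream, is exactly the motivation question raised above.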

bbrk24 · December 11, 2024, 2:17pm

Just to throw my 2¢ in, the only time I've had a crash from out-of-bounds array access that wasn't because of an obvious bug that was caught early in development, it was because I said array[0] rather than array.first.

Karl · December 11, 2024, 2:51pm

The current industry push to memory safe languages is because this approach has proven not to scale.

Moreover, it fails to account for bugs. For instance, it is trivial to make an index invalid "remotely" by mutating the backing storage -- now that index no longer points to a valid element, even though the index itself didn't change. Code inside the "gate" would have to be vigilant to avoid such issues, but what if it isn't? What if somebody forgets, or a calculation fails to account for some edge case?
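The "remote" invalidation described here takes very little code: the index value never changes, but a mutation of the array leaves it pointing past the end.

```swift
var queue = ["a", "b", "c"]

// Captured while valid: the index of the last element (2).
let lastIndex = queue.index(before: queue.endIndex)

// Some other code path mutates the storage.
queue.removeFirst()

// lastIndex is still 2, but the array now only has indices 0..<2.
// queue[lastIndex] would trap at this point; only a fresh check reveals the problem.
let stillValid = queue.indices.contains(lastIndex)
print(stillValid)  // false
```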

If that bug leads to invalid accesses to memory, the effects can be particularly severe. Not only can it lead to crashes on its own (just much more difficult to debug crashes), it can also make your code vulnerable to being exploited.

There are still things that can be done to optimise bounds checking. In WebURL, I implemented my own bounds checking on top of unsafe pointers with a focus on speed. It drops certain checks related to Collection protocol semantics but not strictly required for memory safety. Last time I checked, the bounds-checking it implements can almost entirely be optimised out by the compiler; you can change the UnsafeBufferPointer.boundsChecked accessor to return self (i.e. disabling bounds-checking), and the performance is exactly the same. Part of it is that I've written the library very deliberately to make that easier on the compiler, though.

Once we have Span to guarantee lifetimes, I'll see about implementing this bounds-checking strategy on top of it and releasing it as a package.
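WebURL's actual code isn't reproduced here. Purely as an illustration of bounds checking layered directly on an unsafe buffer, this hypothetical wrapper keeps only the check memory safety needs and nothing else from Collection; whether the optimizer removes it in a given loop is not claimed.

```swift
/// Hypothetical type for illustration; not WebURL's actual implementation.
struct BoundsCheckedBuffer<Element> {
    let base: UnsafeBufferPointer<Element>

    subscript(position: Int) -> Element {
        // Reinterpreting a negative Int as UInt yields a huge value, so this single
        // comparison covers both position < 0 and position >= count.
        precondition(UInt(bitPattern: position) < UInt(base.count), "index out of bounds")
        return base[position]
    }
}

let storage = [10, 20, 30]
let total = storage.withUnsafeBufferPointer { buffer -> Int in
    let checked = BoundsCheckedBuffer(base: buffer)
    var sum = 0
    for i in 0..<buffer.count {
        sum += checked[i]
    }
    return sum
}
```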

tgoyne · December 11, 2024, 4:43pm

No amount of testing will make it so that a network request cannot fail. try and more generally Swift Errors are for expected runtime errors that are impossible to guarantee will succeed. There is no amount of precondition checking that will let you know if you can open a file; you have to just try and handle the failure if it turns out you can't. Indexing an array does not work like that because no one is changing the array behind your back†. You can check if the index you want to access is in bounds, and if that check succeeds then indexing the array will also always succeed.

† Assuming no data races, which is why Swift is trying to solve that problem.

ConfusedVorlon · December 11, 2024, 5:05pm

What do you think the lower-level code is doing before deciding whether to throw?

The try approach here is just a wrapper that transforms a common scenario into a pattern where people are encouraged, or required, to think about error cases.

The language is helping the frameworks surface common issues and encouraging you to think about them.

QuinceyMorris · December 12, 2024, 12:30am

I'm a bit confused by this. I was suggesting that adding (a) some kind of syntax to wrap the existing compiler-generated array bounds check, and (b) your own code to check your optional subscripting result for nil, would count as an "onerous" requirement you've been imposing on yourself. I'm suggesting an option: writing a much smaller amount of checking code in a much smaller number of places.

I was genuinely trying to suggest something easier, not something harder.

As @xwu already pointed out above, this thread isn't about safety (in Swift terms), but about the question, "Is there an easier way to avoid having my app crash at array subscript bounds checks, other than writing boilerplate at every such access to turn the failure into a more ordinary error?"

I wonder if you thought I was suggesting removing the index bounds check from the compiler's code generation? I wasn't. I was suggesting a methodology to avoid wrapping that bounds check in additional boilerplate to avoid the crash.

tera · December 12, 2024, 1:11am

Just "DIY". Here are some ideas for you:

var array = [1, 2, 3]

_ = array[100, default: 42] // returns 42
array[100, default: 42] = 24 // silently ignored, sadly

_ = array[100, default: .log] // logs out-of-bounds to console, returns nil
array[100, default: .log] = 24 // logs out-of-bounds to console, does nothing else

_ = try! array[100, default: .throw] // throws
// try! array[100, default: .throw] = 24 // sadly impossible in current Swift
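The snippet above shows only the call sites. One way the three variants could be implemented, as a sketch; IndexOutOfBounds, LogOnOutOfBounds and ThrowOnOutOfBounds are hypothetical names. Swift only allows throwing accessors on read-only subscripts, which is why the throwing setter cannot be written.

```swift
/// Hypothetical error thrown by the `.throw` variant.
struct IndexOutOfBounds: Error {
    let index: Int
    let count: Int
}

/// Hypothetical marker types selecting the out-of-bounds behaviour.
enum LogOnOutOfBounds { case log }
enum ThrowOnOutOfBounds { case `throw` }

extension Array {
    /// Reads fall back to `defaultValue`; writes to an invalid index are silently dropped.
    subscript(index: Int, default defaultValue: @autoclosure () -> Element) -> Element {
        get { indices.contains(index) ? self[index] : defaultValue() }
        set { if indices.contains(index) { self[index] = newValue } }
    }

    /// Out-of-bounds reads log and return nil; out-of-bounds writes log and do nothing.
    /// Assigning nil to a valid index is ignored as well.
    subscript(index: Int, default policy: LogOnOutOfBounds) -> Element? {
        get {
            guard indices.contains(index) else {
                print("index \(index) out of bounds for count \(count)")
                return nil
            }
            return self[index]
        }
        set {
            guard indices.contains(index) else {
                print("index \(index) out of bounds for count \(count)")
                return
            }
            if let value = newValue { self[index] = value }
        }
    }

    /// Read-only and throwing: Swift does not allow a throwing setter.
    subscript(index: Int, default policy: ThrowOnOutOfBounds) -> Element {
        get throws {
            guard indices.contains(index) else {
                throw IndexOutOfBounds(index: index, count: count)
            }
            return self[index]
        }
    }
}
```

With these in place the call sites above should compile as written. Note that try! on an out-of-bounds .throw read still traps; try? or do/catch is what actually avoids the crash.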

Plus consider the "unchecked exceptions" approach, see the draft implementation in the other thread.

ConfusedVorlon · December 12, 2024, 10:03am

As described in my original post, I do have my own safe accessor for arrays, though I should build something for ranges.

My suggestion was that by building support for this approach into the language, you might encourage broader adoption of direct checking of boundaries.

In the limit, you could imagine an unchecked range initialisation causing eyebrows to be raised in the way that try! does now.

joshappdev · January 9, 2025, 8:10pm

@ConfusedVorlon out of curiosity, did you ever find a workaround for your Vision crash? We are facing the same issue and haven't found a solution yet.