What's up with _failEarlyRangeCheck?

See title. What's up with _failEarlyRangeCheck?

  1. The documentation says it is a no-op for non-RandomAccessCollections.

  2. That's not true; actually, the default implementation always performs a range-check, even for plain Collections.

  3. That default implementation also performs _precondition checks, which, IIUC, are retained in release builds (?). This seems a bit heavy for a check which the documentation says is only for QoI, and should never be relied upon for memory safety. Surely it should defer to the actual Collection subscript for safety in release builds? (See the sketch after this list.)

  4. It seems to be inconsistently used in the standard library. The only times it seems to be called to perform a range check are (I think this is all of them?):

    a. Slice's subscripts, where it bounds-checks every access - even if the underlying collection's subscript already does bounds-checking. It also uses a range literal, which adds another potential runtime failure if the compiler can't absolutely prove that startIndex <= endIndex.

    b. RandomAccessCollection default implementations for Index == Int

    c. MutableCollection's slicing subscript, which returns a Slice (see above). Also _writeBackMutableSlice, even though it also sets elements via the underlying collection's subscript.

    d. Range's default index(after:) when Bound: Strideable. But only index(after:); the other functions use regular bounds checks :man_shrugging:

  5. Quite a few types in the standard library override it with a no-op for performance reasons (including Array, ContiguousArray, ArraySlice, AnyCollection and friends). Non-standard-library types would need to implement an underscored method to do the same.

  6. If you write your own Collection wrapper, you can't inherit any customised behaviour from the base unless you implement this underscored method.
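
For reference, the default implementation mentioned in points 2 and 3 looks roughly like this (a paraphrase of the stdlib source, not exact code; _precondition is a stdlib-internal check that stays active in release builds):

extension Collection {
    // Paraphrased; the real code lives in the stdlib's Collection.swift.
    public func _failEarlyRangeCheck(_ index: Index, bounds: Range<Index>) {
        // These are _precondition checks, so they run in release builds too.
        _precondition(bounds.lowerBound <= index,
            "Out of bounds: index < startIndex")
        _precondition(index < bounds.upperBound,
            "Out of bounds: index >= endIndex")
    }
}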

I think something needs to be done about this function. Either it should be removed, lowered to a debug-mode check, or made non-underscored, so collection wrappers can inherit the behaviour of their base collections without implementing underscored methods. It seems like it's been around for quite a long time, maybe even predating the Swift 3 indexing model, so perhaps it's worth rethinking whether it should be kept at all.

What do people think?


With regard to Slice in particular, it can lead to double bounds-checking if you implement a collection which does its own bounds-checking (e.g. if its underlying storage is unsafe). I have a toy example which demonstrates the issue:


Consider the following example code, which demonstrates the code generated to
read an element from a slice of some custom collection which performs its own bounds-checking:

func accessElement(from c: Slice<MyCollection>, at index: Int) -> String {
    c[index]
}

struct MyCollection: Collection {
    var startIndex: Int
    var endIndex: Int
    func index(after i: Int) -> Int {
        i + 1
    }

    subscript(index: Int) -> String {
        precondition(index >= startIndex)
        precondition(index < endIndex)
        return String(index) // pretend this was an access to an unsafe buffer, hence we bounds-checked it like a good citizen
    }
}

This results in the following assembly (Godbolt):

output.accessElement(from: Swift.Slice<output.MyCollection>, at: Swift.Int) -> Swift.String:
        push    r13
        sub     rsp, 16
        cmp     rsi, rdi
        jl      .LBB1_6
        cmp     r8, rdi
        jl      .LBB1_7
        cmp     r8, rsi
        jge     .LBB1_8
        cmp     r8, rdx
        jl      .LBB1_9
        cmp     r8, rcx
        jge     .LBB1_10
        mov     qword ptr [rsp + 8], r8
        call    (lazy protocol witness table accessor for type Swift.Int and conformance Swift.Int : Swift.BinaryInteger in Swift)
        mov     rdi, qword ptr [rip + ($sSiN)@GOTPCREL]
        lea     r13, [rsp + 8]
        mov     rsi, rax
        call    ($sSzsE11descriptionSSvg)@PLT
        add     rsp, 16
        pop     r13
        ret
.LBB1_6:
        ud2
.LBB1_7:
        ud2
.LBB1_8:
        ud2
.LBB1_9:
        ud2
.LBB1_10:
        ud2

Notice that even for this minimal example, the function ends up with 5 (!!!) potential runtime errors.

Let's see what happens if we no-op _failEarlyRangeCheck (Godbolt):

output.accessElement(from: Swift.Slice<output.MyCollection>, at: Swift.Int) -> Swift.String:
        push    r13
        sub     rsp, 16
        cmp     rsi, rdi
        jl      .LBB1_4
        cmp     r8, rdx
        jl      .LBB1_5
        cmp     r8, rcx
        jge     .LBB1_6
        mov     qword ptr [rsp + 8], r8
        call    (lazy protocol witness table accessor for type Swift.Int and conformance Swift.Int : Swift.BinaryInteger in Swift)
        mov     rdi, qword ptr [rip + ($sSiN)@GOTPCREL]
        lea     r13, [rsp + 8]
        mov     rsi, rax
        call    ($sSzsE11descriptionSSvg)@PLT
        add     rsp, 16
        pop     r13
        ret
.LBB1_4:
        ud2
.LBB1_5:
        ud2
.LBB1_6:
        ud2

Now we only have 3 traps. So let's look at them:

        cmp     rsi, rdi
        jl      .LBB1_4
        cmp     r8, rdx
        jl      .LBB1_5
        cmp     r8, rcx
        jge     .LBB1_6

It's pretty clear that the last 2 are the bounds checks from MyCollection, and Godbolt confirms that. But what about that first one - the one that jumps to .LBB1_4 if one value is less than the other? I believe it comes from Slice's implementation of _failEarlyRangeCheck (even when no-oped): in order to form its range literal, Slice needs to check that the Range's lowerBound <= upperBound, and neither the later bounds checks nor the fact that MyCollection's _failEarlyRangeCheck is an empty function allows the compiler to eliminate that check. It has nothing to do with how MyCollection implements its bounds-checking - MyCollection explicitly opts out of Slice's FERC - yet it still has to pay for it.
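
For context, Slice's element subscript is shaped roughly like this (a paraphrase, not the stdlib's exact code):

extension Slice {
    public subscript(index: Base.Index) -> Base.Element {
        // Forming this range literal is what emits the extra
        // lowerBound <= upperBound check, even when the base's
        // _failEarlyRangeCheck implementation is an empty function.
        _failEarlyRangeCheck(index, bounds: startIndex ..< endIndex)
        return base[index]
    }
}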

In short: if we removed _failEarlyRangeCheck, made it a no-op in release builds, or made it non-underscored, I believe we could reduce those 5 potential runtime errors in this toy example down to 2. Right now we're bounds-checking twice - once against the slice's bounds and once against the underlying collection's bounds - and paying for the range-literal check on top of that. It's really inefficient.

Fewer checks could very well mean code-size reductions, as well as less work for the compiler to reason about, more effective bounds-check elimination, and more effective optimisations once those checks have been proven redundant.

6 Likes

Thanks for all this investigation!

Like most underscored stuff, it's a bit of a mess, and it isn't fit for general consumption. I believe looking through the Swift mailing list archives for discussions about bounds checks around the time the new indexing model was introduced might uncover some of the original thinking behind it.

My vaguely educated guess is that _failEarlyRangeCheck (or rather, its original ForwardIndex-based incarnation) was originally introduced to work around the inability to directly compare indices, and it has since been repositioned as a minor performance hack.

FWIW though, I don't think the original intentions are worth much -- what matters is what the code does in its current, ABI-stable incarnation.

I believe this part of the doc comment originates from a time when Index wasn't Comparable -- it's out of date and it's misleading, and so it should be removed. AFAICT, the default implementation now performs a check because we can now compare indices even in forward-only collections, and we evidently assume that Index.< will be O(1).

(I think this assumption is reasonable, even though Comparable doesn't come with any performance requirements, and IIRC Collection doesn't say anything about the expected performance of index comparisons, either.)

I deeply, deeply mistrust the term "QoI", but if I understand it correctly, the doc comment doesn't mean to use it in a disparaging sense. Labeling index validation as a quality-of-implementation issue (as opposed to a memory safety concern) doesn't mean that it's fine to not do it.

Making _failEarlyRangeCheck a no-op must never allow code to reach undefined behavior through an out-of-bounds memory access, but custom implementations are allowed to sometimes produce incorrect (yet still well-defined) results when an invalid index is passed to a Collection operation that calls _fERC.

Public collection operations performing O(1) index validation in release builds (whether or not through _fERC) is, in general, a feature, not a bug; these checks must not be disabled. I strongly prefer Swift functions to always return correct values, and to trap when that's not possible.

(I'd personally be okay with Collection providing a separate unchecked subscript, though, in cases where we (think) we can guarantee that the index (or index range) is valid. (One example would be the subscript invocation in IndexingIterator.next(), at least in cases where the base collection implements value semantics.) The point is that the C++ STL got this exactly backwards -- the most convenient, default way to spell access must be the safest, fully validated variant.)
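
To make that concrete, such an API might look something like this on the toy MyCollection from the first post (the unchecked label is invented for illustration; nothing like it exists in the stdlib today):

extension MyCollection {
    // Hypothetical: validated only in debug builds; the caller promises
    // that `index` is in bounds.
    subscript(unchecked index: Int) -> String {
        assert(index >= startIndex && index < endIndex, "Index out of bounds")
        return String(index) // stands in for a direct, unchecked storage access
    }
}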

This is crucial for correctness (although not for memory safety). In general, we do not want slices to allow access outside of their bounds:

let slice = foo[3 ..< 10]
print(slice[0]) // Should trap

I do think that these checks are necessary, and omitting them would be bad engineering. Saying that these are merely a quality of implementation concern (as opposed to a memory safety issue) does not mean that we are allowed to ignore them. The stdlib is supposed to be a well-engineered, quality piece of software -- partial functions like the indexing subscript generally mustn't return nonsensical results on unexpected input.

Slice operations perform bounds checking via the _fERC hooks in order to allow Collection authors to have some control over how these checks are done without having to implement a custom SubSequence type. (Personally, I do not think this is a particularly great idea, and these hooks should rarely if ever be exercised. But again, opinions aren't worth much; what matters is what we've committed to the ABI, which is that Slice performs bounds checking through _failEarlyRangeCheck.)

We have previously established (e.g., in PR #34961) that we don't need a runtime check to verify that Range values are valid (i.e., lowerBound <= upperBound), and unless I'm badly mistaken, Slice can only be initialized via its init(base:bounds:) initializer, so this relation also holds for its start/endIndex.

Therefore, I believe it is safe to assume that startIndex <= endIndex for every Slice, which means that it would be okay to replace all _failEarlyRangeCheck(foo, bounds: startIndex ..< endIndex) invocations in Slice with _failEarlyRangeCheck(foo, bounds: Range(_uncheckedBounds: (startIndex, endIndex))). (I think we don't even need debug checks here, so the underscored variant would be fine.)
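
Concretely, the mechanical change inside Slice would be:

    // before (in Slice's subscripts):
    _failEarlyRangeCheck(foo, bounds: startIndex ..< endIndex)

    // after: the underscored initializer skips even the debug-mode
    // lowerBound <= upperBound check, which Slice's invariants make redundant
    _failEarlyRangeCheck(foo, bounds: Range(_uncheckedBounds: (startIndex, endIndex)))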

Note that removing the precondition isn't necessarily going to be a clear win. The compiler doesn't understand that a range's lower bound is guaranteed to never exceed the upper bound, so I expect it may sometimes generate extra code to unnecessarily cover that case in downstream code.

But still, a PR that implements this would be welcome -- especially if you can add a new benchmark that highlights the beneficial effect of removing the condition. (So as to justify accepting the inevitable regressions and to have a chance to catch any future regressions.)

I believe this is to allow custom array-like collections to get the same deferred index validation behavior as Array without having to manually implement these. In my opinion, this is largely pointless -- manually implementing index(after:) etc. seems vastly preferable to implementing an underscored requirement. On the other hand, if I already need to implement _fERC to customize Slice behavior, this seems like a reasonable (if a little dubious) freebie.

These implementations already use the init(uncheckedBounds:) initializers, so they only perform checks in debug builds (and, sadly, opaque parts of the stdlib). Unfortunately, I think the debug mode checks are somewhat load-bearing -- they help catch invalid Collection implementations that violate the startIndex <= endIndex requirement. So replacing these with init(_uncheckedBounds:) calls seems unwise. (It's not entirely impossible though -- the benefits of these debug checks are marginal enough that we could consider removing them if someone could demonstrate a clear performance advantage that couldn't be achieved in some other way. But I don't think debug-only checks are worth spending much time on!)
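
For reference, the default in question is shaped roughly like this (a paraphrase; note how init(uncheckedBounds:) limits the bounds-ordering check to debug builds):

extension RandomAccessCollection
where Index: Strideable, Index.Stride == Int, Indices == Range<Index> {
    public func index(after i: Index) -> Index {
        // init(uncheckedBounds:) validates lowerBound <= upperBound
        // in debug builds only.
        _failEarlyRangeCheck(
            i, bounds: Range(uncheckedBounds: (lower: startIndex, upper: endIndex)))
        return i.advanced(by: 1)
    }
}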

That this default implementation exists is horrifying, because unlike Collection's variant, this implementation isn't in a conditional extension -- every MutableCollection gets it, whether or not its SubSequence is Slice. @glessard is investigating the practical effects of this; it may well require some sort of fix. Thanks for pointing it out!

That MutableCollection's default bounds subscript uses _fERC doesn't seem like a problem to me -- the range we pass to this method needs to be validated, and like with the default RAC index(after:), this may just as well be done through these hooks. However, it would be better if the implementation was using Range.init(uncheckedBounds:) to construct the range, not ..<. (We do need the debug checks for the same reason as above.)

As I mentioned above, Collection has a similar default implementation where SubSequence == Slice<Self>. That too uses _fERC, as expected. It also uses ..<, which can similarly be replaced with the uncheckedBounds: initializer.
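
Concretely, the suggested change would make that default look something like this (again a paraphrase, not the exact stdlib code):

extension Collection where SubSequence == Slice<Self> {
    public subscript(bounds: Range<Index>) -> Slice<Self> {
        // Keep the debug-mode startIndex <= endIndex check, but drop the
        // release-mode one by avoiding the ..< operator:
        _failEarlyRangeCheck(
            bounds,
            bounds: Range(uncheckedBounds: (lower: startIndex, upper: endIndex)))
        return Slice(base: self, bounds: bounds)
    }
}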

(That the default bounds subscript uses _fERC means that if we implement it as a no-op in a custom collection that uses Slice, then that collection will allow slices to be created with out-of-bounds indices -- unless we also implement a custom slicing subscript. However, Slice always accesses elements through its base, so this will not lead to memory safety issues, as long as the base collection properly validates its indices. Note that I'm conveniently ignoring withContiguousStorageIfAvailable here -- although that family of methods raises some complications we should probably address at some point.)
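
For example, using the toy MyCollection from the first post, if it implemented _fERC as a no-op and kept the default slicing subscript:

let c = MyCollection(startIndex: 0, endIndex: 5)
let s = c[3 ..< 100] // no-op _fERC: the out-of-bounds slice is created anyway
_ = s[50]            // still traps -- but only in MyCollection's own subscript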

This feels like a cosmetic oversight that we should probably fix. Either Range.index(before:) and Range.index(_:offsetBy:) should also be calling _fERC, or index(after:) should be using direct _preconditions. AFAICT, neither of these options would have any functional impact in optimized code, as Range's _fERC implementation is effectively the same as the standalone preconditions. (In debug builds, the fERC variant does come with a better error message, which is a nice bonus!)

Allowing a select few collection types to customize it is the entire point of having _fERC, I believe.

The Array family does it for no immediately obvious technical reason -- as far as I can tell, for them fERC is just a public entry point; they themselves never call it. I think the same goes for AnyCollection/AnyBidirectionalCollection/AnyRandomAccessCollection -- they are self-sliced and AFAIK they forward all other actual requirements to the underlying base collection.

I'm guessing(!) that the fERC implementations in these types are there in case some collection algorithm decided to call fERC to do its own index validation. (This seems unlikely to ever be necessary, to be honest. A full precondition would work better in all but the most trivial cases.)

The type where fERC really shines is Unsafe[Mutable]BufferPointer, which defines these to use debug assertions. (For the record, I don't like this; but the ABI doesn't at all care about my opinions.)

One important additional type that could benefit from having _fERC implementations is Unsafe[Mutable]RawBufferPointer -- there is a todo item in the code somewhere to add these, and we can do it at any time; all that needs to be done is for someone to submit a PR with the implementation (+ tests). Following the precedent established with Unsafe[Mutable]BufferPointer, the custom implementations should perform debug assertions.
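
If someone does pick this up, the implementations would presumably mirror the Unsafe[Mutable]BufferPointer ones -- something like this sketch, which would have to live inside the stdlib, since _debugPrecondition is internal:

extension UnsafeRawBufferPointer {
    // Debug-only validation, following the UnsafeBufferPointer precedent.
    public func _failEarlyRangeCheck(_ index: Int, bounds: Range<Int>) {
        _debugPrecondition(index >= bounds.lowerBound)
        _debugPrecondition(index < bounds.upperBound)
    }
    public func _failEarlyRangeCheck(_ range: Range<Int>, bounds: Range<Int>) {
        _debugPrecondition(range.lowerBound >= bounds.lowerBound)
        _debugPrecondition(range.upperBound <= bounds.upperBound)
    }
}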

What exactly do you mean? Can you give an example? (I can forward member implementations to a base collection to wrap its behavior just fine, with or without implementing _fERC.)

I think I'd personally be fine with replacing some or all of the calls to the _fERC methods with direct, unconditional preconditions. However, eliminating these checks altogether, or indiscriminately downgrading them to debug-only checks seems like a bad idea to me.

The _fERC methods are performance hacks that allow customizing the behavior of some default collection implementations. It is not at all crucial for Collection authors to implement them, or even know about them; but it's sort of nice that special types like UnsafeBufferPointer are able to customize these to trade a bit of correctness for a bit of performance, without having to swallow the API surface bloat of implementing a custom SubSequence.

I don't think it would be worth pushing _fERC through Swift Evolution though -- I doubt we'd want to generally encourage people to eliminate index validation. I really, really hope people aren't trying to implement their own UnsafeBufferPointer; but if for some reason they do, I expect having to learn about underscored and underdocumented methods isn't going to be the most difficult problem.

12 Likes

In this context, "QoI" means "Quality of dIagnostics" :wink:

IIRC, when the check was introduced, correct implementations of non-random-access collections would still have prevented memory safety violations due to later checks in the code, but with a less informative diagnostic. I think your suspicions about the original reasons for the check's existence might indeed be correct, and I don't know what it's doing now.

HTH,
Dave

3 Likes

Thanks a lot for the insightful responses!

I'm a bit on the fence about this. There are many aspects to quality, or good engineering, and often they depend on what you're trying to do. If you're trying to make the software "fool-proof" (so to speak), trapping on every nonsensical input seems like good practice; but if you're trying to optimise code in which you have high confidence, and where performance is a serious concern, you might consider a good-quality implementation to be one that performs only the bare-minimum bounds checks needed to ensure memory safety.

As I understand it, accessing an invalid index is unspecified, and there is no requirement to trap. The best example of this, IMO, is Array's index(after:) and related index-manipulation methods. You could make an equally good argument that advancing an Array's index outside of its bounds should trap; but it doesn't, and AFAIK that is not considered to be bad quality or poor engineering:

let arr = Array(10..<20)
arr.index(arr.startIndex, offsetBy: 1000)  // Returns 1000. No runtime error.
arr.index(arr.startIndex, offsetBy: -1000) // Returns -1000. No runtime error.

You could also make an argument for something in-between: for debug-mode checks to be strict about slice bounds, but for release-mode checks to be as lenient as possible (without violating memory safety, of course). Or even for a new build mode which enables safety-critical checks but omits other kinds of correctness checks - something between release mode and -Ounchecked (which is too extreme; it's really useful as a developer tool, but IMO nobody should be shipping software compiled at -Ounchecked).

Well, it depends on what you use the Range for and precisely how you write it. It requires some careful coding to be robust against invalid Ranges. It's certainly possible, and the UBP types do the right thing AFAICT, but it can be tricky. That said, if Slice checks the bounds at construction, it should be fine to use that as justification to omit repeated checks later. It's unfortunate that we don't have a ValueSemantics generic constraint to really codify why this is safe.

Personally, I've found that Range preconditions are one of the most difficult things for the compiler to eliminate in complex generic Collection code. I expect that there would be some performance gain, but it can be hard for a single PR to demonstrate that because it can depend on the containing code and whether eliminating the check enables further optimisations.

If you wrap a collection which customises FERC, but your wrapper doesn't customise FERC, it'll get a fresh default implementation instead of inheriting its base's version. If you then take a Slice of that wrapper, Slice will call that default implementation.
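
A minimal sketch of the problem (Wrapper is a made-up type):

struct Wrapper<Base: Collection>: Collection {
    var base: Base
    var startIndex: Base.Index { base.startIndex }
    var endIndex: Base.Index { base.endIndex }
    func index(after i: Base.Index) -> Base.Index { base.index(after: i) }
    subscript(position: Base.Index) -> Base.Element { base[position] }

    // Without this explicit forwarding, Wrapper silently picks up the default,
    // always-checking _failEarlyRangeCheck -- even if Base customised its own:
    func _failEarlyRangeCheck(_ index: Base.Index, bounds: Range<Base.Index>) {
        base._failEarlyRangeCheck(index, bounds: bounds)
    }
}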

And because it's an underscored requirement, you may not know to implement it. And because it appears to be used a bit sporadically, with out-of-date documentation, it's hard to judge what the effects of customising it are.

I'd really like it to be my choice whether this unspecified operation actually results in a runtime error or not. It would be easy to say "well, write your own Slice, then!", but when it comes to wrappers, that's quite a lot of work - it means introducing additional hooks in protocols which refine Collection (as well as more protocols to refine BidirectionalCollection, RandomAccessCollection, MutableCollection, etc), changing the wrapper's generic constraints to require those refined protocols, writing adapter wrappers for the world of existing Collections, etc. Basically, it means writing your own Collection hierarchy, more-or-less. And all of that is to handle the couple of specialised collections containing special logic which enables saving some bounds-checks.

So yeah, I think dropping the underscore and giving this method some tighter semantics would be appreciated.

:slightly_smiling_face:

All I'm going to say is that there are good reasons.

Seriously, though, it's not so much about learning the underscored methods - more that, since they are underscored, it's generally bad practice to implement them and difficult to understand what the effects are. Developers should generally be sceptical if they see code using underscored methods.

And it isn't only the custom UBP that needs to implement it - wrapper types should implement it, too, lest they get fresh, default implementations and fail to acknowledge the wishes of their base collection.

For the record, I do have vague doubts about whether it was the responsible thing to do. (I think I'd have preferred if we kept debug-mode checks in index manipulation methods, exactly like you suggest.)

But as Dave pointed out above, not diagnosing the index(after:) case does seem less harmful than not diagnosing the subscript(_:) case. If someone uses index(after:) to walk off the end of the collection, the error will still be caught when the resulting invalid index is passed to the subscript or another operation that accesses an actual element. (Unless they proceed to walk back into the bounds of the collection, but then getting a value may actually be the desired outcome.) The subscript diagnostic will not pinpoint the exact source of the problem, but there is still an error.
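
For example, with Array (whose index methods don't validate their arguments, as shown upthread):

let arr = Array(10..<20)
let i = arr.index(after: arr.endIndex) // invalid index, but no trap here
_ = arr[i]                             // the error is caught here instead,
                                       // just without pinpointing its source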

There is a case to be made that Slice should be using direct _preconditions rather than _fERC to implement bounds checking in its subscripts -- however, that could introduce bincompat headaches, so it might not be practical to change it at this point. But it could be worth a try!

Considering index(after:) diagnosis less critical seems especially reasonable when the Index type is strideable and can be advanced on its own, like an Int -- in this case, I expect most people just use += to increment indices anyway.

My position is stronger -- I believe that Range values can be safely assumed to be always valid; there is therefore no need to ever explicitly check that lowerBound <= upperBound, because it cannot possibly* be false -- this is a fully enforced invariant of [Closed]Range.

* Without someone invoking undefined behavior, like manually overwriting memory or misusing the unchecked-unsafe Range initializer in release builds. (Which I don't think are things most code can be reasonably expected to defend against.)

I agree it would be interesting if we were able to express such invariants in the type system. (Having to prove them to the compiler could prove tricky in general though.)

I don't see how not customizing _fERC would lead to any major hardship; if the wrapper uses the default Slice for its SubSequence, then it will incur the trivial costs of proper index validation. (Which are usually (way) below 20%, even in heavily index-oriented workloads.)

To be honest, I don't believe it is possible to write a completely transparent collection wrapper like you describe without implementing every single underscored requirement and forwarding/adapting those as well as the documented requirements.

I'm not even certain it is possible to implement a wrapper that behaves 100% transparently when it comes to wrapping random combinations of conditional Bidirectional/RandomAccess/Mutable/RangeReplaceable conformances that randomly stack on top of Collection (cf. @timv's complex work on SE-0312), with or without the exponentially higher complexities of trying to also wrap SubSequence and Indices the same way.

It would probably be illuminating if you could describe the actual use case you have in mind. What wrapper type are you implementing?

/// A re-imagined contiguous buffer type, which attempts to protect against buffer over-reads in release builds with a negligible performance hit, and in some
/// cases even an overall performance _improvement_ over the standard library's `UnsafeBufferPointer`.

Now this seems right up my alley! :+1: I'm all for adding more bounds checking to UBP, whether or not there is a performance benefit. Cc @glessard.

(If this proves out, ideally we should be able to adopt some of it in the stdlib. Changing U[M][R]BP's Index/SubSequence types or internal layout would be ABI- and source-breaking though, so I don't believe such things would be feasible -- unless we find an incredibly convincing argument to deprecate the existing types and replace them with a new set. The migration work would be unbelievably painful, as these types are used in all sorts of tricky low-level code; I think the potential benefits need to be extremely high to pay for the (resource & goodwill) costs of another Swift 3-style update.)

This UBP alternative isn't a collection wrapper, though, so implementing a custom SubSequence doesn't seem that big an issue. This type does not need to implement _fERC at all, especially since _fERC isn't suitable for it -- AIUI, it wants to mix-and-match which SubSequence operations perform bounds checks, while _fERC is (at the moment) an all-or-nothing affair. (Unless we do attempt to replace the _fERC calls in the Slice subscripts with raw preconditions... :thinking:)

It definitely does need to implement and use Sequence._copyContents though, to get adequate performance for its copy operations. Is there a reason you're more comfortable with that requirement? (We could use the investigations in this thread to update the documentation of the _fERC methods like PR 360004 did for _copyContents.)

(Btw, replacing that particular underscored method with a couple of public IteratorProtocol+Collection requirements is on my personal (unbudgeted) todo list, to enable better support for copying data between piecewise contiguous collection types, such as the new `Deque` in the Collections package. I have a prototype that seems fairly flexible & efficient.)

The way I see it, the main technical difficulty with U[M]BP isn't necessarily its Collection conformance (which is somewhat questionable in the first place, since UBP values can also reference partially or wholly uninitialized memory, which mustn't be accessed through the Collection operations) -- rather, it's fleshing out a convenient set of operations that can be used to initialize/deinitialize/copy/move data from these buffers, without reaching down to naked pointers. The current buffer pointer types aren't particularly great at this.

1 Like