See title. What's up with _failEarlyRangeCheck
?
-
The documentation says it is a no-op for non-
RandomAccessCollection
s. -
That's not true; actually, the default implementation always performs a range-check, even for plain
Collection
s. -
That default implementation also performs
_precondition
checks, which IIUC, are maintained in release builds (?). This seems a bit heavy for a check which the documentation says is only for QoI, and should never relied upon for memory safety. Surely it should defer to the actualCollection
subscript for safety in release builds? -
It seems to be inconsistently used in the standard library. The only times it seems to be called to perform a range check are (I think this is all of them?):
a. Slice's subscripts, where it bounds-checks every access - even if the underlying collection's subscript already does bounds-checking. It also uses a range literal, which adds another potential runtime failure if the compiler can't absolutely prove that
startIndex <= endIndex
.b. RandomAccessCollection default implementations for
Index == Int
c. MutableCollection's slicing subscript, which returns a Slice (see above). Also
_writeBackMutableSlice
, even though it also sets elements via the underlying collection's subscript.d.
Range
's default index(after:) whenBound: Strideable
. But onlyindex(after:)
; the other functions use regular bounds checks -
Quite a few types in the standard library override it with a no-op for performance reasons (including
Array
,ContiguousArray
,ArraySlice
,AnyCollection
and friends). Non-standard-library types would need to implement an underscored method to do the same. -
If you write your own
Collection
wrapper, you can't inherit any customised behaviour from the base unless you implement this underscored method.
I think something needs to be done about this function. Either it should be removed, lowered to a debug-mode check, or made non-underscored, so collection wrappers can inherit the behaviour of their base collections without implementing underscored methods. It seems like it's been around for quite a long time, maybe even predating the Swift 3 indexing model, so perhaps it's worth rethinking if it's worth keeping.
What do people think?
With regard to Slice
in particular, it can lead to double bounds-checking if you implement a collection which does its own bounds-checking (e.g. if their underlying storage is unsafe). I have a toy example which demonstrates the issue:
Summary
Consider the following example code, which demonstrates the code generated to
read an element from a slice of some custom collection which performs its own bounds-checking:
func accessElement(from c: Slice<MyCollection>, at index: Int) -> String {
c[index]
}
struct MyCollection: Collection {
var startIndex: Int
var endIndex: Int
func index(after i: Int) -> Int {
i + 1
}
subscript(index: Int) -> String {
precondition(index >= startIndex)
precondition(index < endIndex)
return String(index) // pretend this was an access to an unsafe buffer, hence we bounds-checked it like a good citizen
}
}
This results in the following assembly (Godbolt):
output.accessElement(from: Swift.Slice<output.MyCollection>, at: Swift.Int) -> Swift.String:
push r13
sub rsp, 16
cmp rsi, rdi
jl .LBB1_6
cmp r8, rdi
jl .LBB1_7
cmp r8, rsi
jge .LBB1_8
cmp r8, rdx
jl .LBB1_9
cmp r8, rcx
jge .LBB1_10
mov qword ptr [rsp + 8], r8
call (lazy protocol witness table accessor for type Swift.Int and conformance Swift.Int : Swift.BinaryInteger in Swift)
mov rdi, qword ptr [rip + ($sSiN)@GOTPCREL]
lea r13, [rsp + 8]
mov rsi, rax
call ($sSzsE11descriptionSSvg)@PLT
add rsp, 16
pop r13
ret
.LBB1_6:
ud2
.LBB1_7:
ud2
.LBB1_8:
ud2
.LBB1_9:
ud2
.LBB1_10:
ud2
Notice that even for this minimal example, the function ends up with 5 (!!!) potential runtime errors.
Let's see what happens if we no-op _failEarlyRangeCheck
(Godbolt):
output.accessElement(from: Swift.Slice<output.MyCollection>, at: Swift.Int) -> Swift.String:
push r13
sub rsp, 16
cmp rsi, rdi
jl .LBB1_4
cmp r8, rdx
jl .LBB1_5
cmp r8, rcx
jge .LBB1_6
mov qword ptr [rsp + 8], r8
call (lazy protocol witness table accessor for type Swift.Int and conformance Swift.Int : Swift.BinaryInteger in Swift)
mov rdi, qword ptr [rip + ($sSiN)@GOTPCREL]
lea r13, [rsp + 8]
mov rsi, rax
call ($sSzsE11descriptionSSvg)@PLT
add rsp, 16
pop r13
ret
.LBB1_4:
ud2
.LBB1_5:
ud2
.LBB1_6:
ud2
Now we only have 3 traps. So let's look at them:
cmp rsi, rdi
jl .LBB1_4
cmp r8, rdx
jl .LBB1_5
cmp r8, rcx
jge .LBB1_6
It's pretty clear that the last 2 are the bounds checks from MyCollection
, and Godbolt confirms that. But what about that first one? The one that jumps to .LBB1_4
if one value is less than the other? I believe that one is happening because of Slice's implementation of _failEarlyRangeCheck
(even when no-oped); in order to form its range literal, it needs to check that the Range's lowerBound <= upperBound
, and nothing about the later bounds checks, nor the fact that MyCollection's _failEarlyRangeCheck
is an empty function, allows the compiler to eliminate that. It isn't actually relevant to how MyCollection
implements its bounds-checking, and it actually opts-out of Slice's FERC, but it still has to pay for it.
In short: if we removed _failEarlyRangeCheck
, or made it a no-op in release builds, or made it non-underscored, I believe we could reduce those 5 potential runtime errors in this toy example down to 2. Right now, we're doing bounds-checking twice: one for the slice bounds, once for the underlying collection's bounds, and then one more for the range literal as well. It's really inefficient.
Fewer checks could very well mean code-size reductions, as well less stuff for the compiler to try and reason about, more effective bounds-check elimination, and more effective optimisations after those checks have been proven redundant.