Pitch: enable bounds-checking for BufferPointers

This is precisely what the new [unchecked:] will give you. It will compile to a bare load in both debug and release builds. Currently the only way to spell that is to go to offset from the baseAddress.

1 Like

I understand that, but it feels pretty tiresome to have to opt in at every call site to get something so fundamental. Iā€™m imagining looking at a page full of code like my2dArray[0, .unsafe].pointee[24, .unsafe].pointee and wondering why I didnā€™t just write the thing in C or C++. There are so many words compared to something that scans easily: my2dArray[0][24]. And I donā€™t have to remember to throw the word unsafe into every pointer access or risk falling off a performance cliff.

6 Likes

Presumably because you wanted to benefit from bounds checks and other memory safety features, so write it as my2DArray[0, 24] and get those. If you need to opt out for specific accesses after using a profiling tool to identify them, you might write my2DArray[unchecked: 0, 24]. I (personally) think that's acceptable.

High-performance code in C and C++ is astonishingly verbose, too. Clearly communicating intended program semantics to the compiler requires a lot of words.

3 Likes

I donā€™t expect the things that are verbose on C/C++ to be less verbose in Swift. The pain comes from things which are concise and commonplace in C/C++ being verbose or presenting gotchas.

I think I can adapt @Karl ā€™s opinion a little bit: ā€œunsafeā€ should be a superset of ā€œuncheckedā€. Iā€™ve clearly indicated my desire to trade off safety for performance by using an unsafe type; please donā€™t sneak the opposite trade off back in unless I ask, and especially donā€™t give me a false sense of security that my trade-off is being respected until I happen to exceed the optimizerā€™s current level of intelligence.

2 Likes

Is it not like this today (without "pointee")?

var matrix: UnsafeMutableBufferPointer<UnsafeMutableBufferPointer<Double>> = ...
let element: Double = matrix[0][24] // āœ…

The "pointee" is analogous to C's * prefix operator:

let element: Double = ((matrix.baseAddress! + 0).pointee.baseAddress! + 24).pointee

However, I do understand what you mean ā€“ that you don't want to add extra words to get the current "unchecked" behaviour:

let element: Double = matrix[unchecked: 0][unchecked: 24]

Considering some alternatives:

  1. Here:
((matrix.baseAddress! + 0).pointee.baseAddress! + 24).pointee

we could consider (ab)using &+ operator to get the "non-default" unchecked behaviour:

((matrix.baseAddress! &+ 0).pointee.baseAddress! &+ 24).pointee
  1. C/C++'s #pragma way of doing this would look like:
#pragma optimize(push, checks, off)
func foo() {
    // no checks here
}
#pragma optimize(pop)

There's good and bad in that approach, it doesn't look nice and greatly pollutes the code for starters.

  1. One other possibility would be to achieve almost the same as in (2) on the per-function granularity basis by having an attribute:
@optimization(...) func foo() {
}
  1. We should also consider leaving the current "unchecked" behaviour the default one and opting in to the checked behaviour by using a long winded form. Personally I don't like it but worth to mention it here for completeness.
1 Like

Pointer arithmetic is already unchecked, and this proposal would not change that. You're doing pointer arithmetic. There's no need for any shenanigans.

At worst, this would be matrix.baseAddress![0].baseAddress![24]. But surely one would wrap this into a struct that provided multi indexing if you were using it more than once, and render it as matrix[unchecked: 0, 24].

3 Likes

Got you. Maybe this then?

((matrix  + 0).pointee  + 24).pointee // checked version
((matrix &+ 0).pointee &+ 24).pointee // unchecked version

A questionable point would be: is this legal way to get the 1st element:

(matrix + 100 - 99).pointee

or would it overflow mid-term.

This does look like only "x" is unchecked :wink:

There is no checking for pointer arithmetic currently, and this proposal would not change that. Not just within a buffer; you can wrap around the memory space a couple times and that's totally fine.Ā¹

  1> let a = UnsafeMutablePointer<Int>.allocate(capacity: 1)
a: UnsafeMutablePointer<Int> = 0x600002458000 {
  pointee = 0
}
  2> a + Int.max - Int.max
$R0: UnsafeMutablePointer<Int> = 0x600002458000 {
  pointee = 0
}
  3>  

Ā¹ "Fine" as far as Swift checks are concerned. There is a question about the interaction of this with LLVM-level optimization passes that may assume C-like semantics for such things and make it "not fine";Ā² I would need to investigate that further.

Ā² No more "not fine" than it is in C (which is very not fine).

2 Likes

Ah, no, I didn't mean pointer arithmetic there. More like this:

extension UnsafeMutableBufferPointer {
    struct Something {
        let buffer: UnsafeMutableBufferPointer
        let offset: Int
        var pointee: UnsafeMutableBufferPointer.Element {
            get { buffer[checked_or_unchecked: offset] }
            set { buffer[checked_or_unchecked: offset] = newValue }
        }
        static func + (something: Self, offset: Int) -> Something {
            Something(buffer: something.buffer, offset: something.offset + offset)
        }
    }
    static func + (buffer: Self, offset: Int) -> Something {
        Something(buffer: buffer, offset: offset)
    }
}

Or something similar to that account (that potentially could check "offset" validity inside "+" operation instead of (or in addition to) the "pointee" operation).

All of my yes, it's bad enough that the C++ equivalent (std::span) doesn't bound check, there's no need for Swift to continue the same mistake. +1 to the bound checks and +1 to the additional unchecked subscript.

1 Like

In sympathy to @Karl's line of thinking, would it make sense to extend this [unchecked: x] syntax to safer collection types too, like Array? Would that perhaps lessen the need to use Unsafe[Mutable]BufferPointer to begin with?

2 Likes

So while I'm sympathetic to this argument, I want to reinforce that @karl's fear is entirely reasonable. The Swift optimizer is consistently horrible at hoisting bounds checks. The only reason this hasn't been more of a disaster is that Array has magic semantics annotations that allow the code to be wholesale replaced.

As an example, below is the world's simplest collection type: a stupid reimplementation of Range:

struct BoundsCheckedRange: RandomAccessCollection {
    var startIndex: Int
    var endIndex: Int

    subscript(_ index: Int) -> Int {
        precondition(index >= startIndex)
        precondition(index < endIndex)

        return index
    }
}

We can call firstIndex(where:) in the safest possible way like this:

func funComputing(_ range: BoundsCheckedRange, element: Int) -> Int? {
    return range.firstIndex(where: { $0 >= element })
}

Here is the generated asm for this function (godbolt link for comparison]:

output.funComputing(_: output.BoundsCheckedRange, element: Swift.Int) -> Swift.Int?:
        mov     rax, rdx
        mov     dl, 1
        cmp     rdi, rsi
        jne     .LBB47_2
.LBB47_1:
        xor     edi, edi
        mov     rax, rdi
        ret
.LBB47_2:
        jge     .LBB47_12
        cmp     rdi, rax
        jge     .LBB47_8
        inc     rdi
        cmp     rdi, rsi
        je      .LBB47_1
.LBB47_5:
        cmp     rdi, rsi
        jge     .LBB47_11
        cmp     rax, rdi
        je      .LBB47_10
        inc     rdi
        cmp     rsi, rdi
        jne     .LBB47_5
        jmp     .LBB47_1
.LBB47_8:
        xor     edx, edx
        mov     rax, rdi
        ret
.LBB47_10:
        xor     edx, edx
        ret
.LBB47_11:
        ud2
.LBB47_12:
        ud2

We have two precondition checks here: one defensible, one not. One is in the setup block LBB47_2. We confirm that the endIndex is greater than the startIndex, a justifiable check (failure is LBB47_12).

The other is in the loop body at LBB47_5. Here we immediately check that our current index (rdi) is not greater than the endIndex (rsi). This is one of our two bounds checks in the subscript, and it has not been eliminated. Swift has successfully dropped one, but not the other.

Now, this leaves aside the fact that an actually optimized version of this code can correctly observe that the correct implementation of this function is validity checks on startIndex and endIndex, followed by if element < endIndex { return element} else { return nil }. It's totally understandable that the optimizer wasn't able to see its way to that, but we're a long way from that goal.

To be absolutely clear about this, this is an optimizer miss, not a stdlib issue. We performed the following checks:

    cmp     rdi, rsi
    jne     .LBB47_2  // taken
    jge     .LBB47_12  // not taken

This means we know that rdi is not equal to rsi, and not greater than rsi: that is, it's strictly less than. All subsequent operations on rdi are inc: there is no assignment, and no other math.

Therefore, the subsequent cmp rdi, rsi; jge .LBB47_11 cannot be taken because it's strictly dominated by the loop-end check at the bottom of the loop body. The optimizer had enough knowledge to know this, but failed to.


I want to stress that none of this is me saying that this patch is reasonable. My personal use of the buffer pointers has not been to remove bounds checks, and I've often wanted to add them back. I accept the motivation.

I just want us to be cautious of promising that the optimizer will fix things. Swift's optimizer has never done a good job of removing bounds checks for anything but Array.

16 Likes

Unsafe prefix tells that using of api is .. unsafe.
So seeing in code something like unsafe[unchecked: Int] is a little bit misleading.
What are your thoughts about keeping current api as is and add new [checked: Int]?
There will be no source or abi breakage, existing code will behave the same and in Swift 6 mode it become possible to use new [checked: Int].
It is worth to mention that such kind of design doesn't match well with Swift's safety by default on the one side, and the fact that unsafe is unsafe by design on the other.

1 Like

Yes, absolutely, this will be part of the actual proposal.

2 Likes

Unless we have "safe" / "unsafe" as formal attributes/qualifiers on functions, we have to rely on the naming convention of "UnsafeSomething" to denote unsafe constructs. "unchecked" array subscript is unsafe (instead of safely crashing the app on the relevant out of bounds precondition check the app would likely crash or misbehave unsafely at some later point), hence I'd expect to see "Unsafe" embedded in the name for the unchecked array subscript. array[x, .unsafeUnchecked] or something to that account. Otherwise when auditing code for unsafely we'd have to search for both "unsafe" and "unchecked" terms.

2 Likes

This isn't totally clear. Yes, it's an optimizer miss, but maybe we should be working around that in the standard library instead of waiting for the optimizer to get smarter. Explicitly unchecked subscripts can be one of the tools we use to build those workarounds.

1 Like

This is a strange divergence. If bounds checking is so cheap to be enabled by default for subscripting, why shouldnā€™t UnsafeBufferPointer.baseAddress return a type whose arithmetic operations perform bounds checking?

I would much rather this functionality be introduced under a new safe(r) but still unmanaged BufferPointer type.

2 Likes

How would that type be different from UnsafeBufferPointer itself? A raw pointer without size can't reasonably provide arithmetic with bounds checking. Perhaps one could argue that pointers alone shouldn't have arithmetic operations at all (but probably as a different proposal from this one).

1 Like

It would need to be a fat pointer.

I oppose this proposal and the comments suggesting to weaken Swift's pointer types.