Validatable protocol [Was: Prohibit unsafeBitCast(integer → pointer)]

tera · February 8, 2023, 4:36am

Edit: Please note that as discussion progresses the main focus of this topic is about Validatable (pseudo) protocol and the related runtime validation checks, done in debug mode only or in debug/release modes, perhaps controllable similar to how other diagnostic / sanitising checks are controlled).

Currently this code results in a warning:

let intValue = 12345
unsafeBitCast(intValue, to: Unsafe[Mutablle][Raw]Pointer[<Type>].self)
// 🔶 Warning: unsafeBitCast' from 'Int' to 'UnsafeRawPointer' can be replaced with 'bitPattern:' initializer on 'UnsafeRawPointer'

I believe this warning must be promoted to an error (with appropriate fix-it).

Normally users won't have an issue changing their exiting code from unsafeBitCast(intValue → pointer) to pointer.init(bitPattern:), unless they are creating a null pointer:

unsafeBitCast(0, Unsafe[Mutablle][Raw]Pointer[<Type>].self)

Which crafts an invalid value and invokes an undefined behaviour making the app invalid. Note that creating an invalid null pointer with the init + bitPattern is impossible:

UnsafeRawPointer(bitPattern: intValue)! // crashes if intValue == 0

~~The "workaround" fix to the user code that wants it would be something like this:~~
~~let pointer = integer != 0 ? UnsafeRawPointer(bitPattern: integer) : nil~~

Note that it would no longer be possible to create a null pointer to pass to a C-API that was annotated incorrectly:

void foo(uint8_t * dst_buffer, size_t dst_size) // known to accept null but not annotated
foo(UnsafePointer<UInt8>(bitPattern: intValue), ...)
// 🛑 Value of optional type 'UnsafePointer<UInt8>?' must be unwrapped to a value of type 'UnsafePointer<UInt8>'

The fix could be fixing the annotation (preferred), or using a wrapper C api:

void bar(uint8_t * _Nullable dst_buffer, size_t dst_size) {
    foo(dst_buffer, dstSize);
}
bar(UnsafePointer<UInt8>(bitPattern: intValue), ...)  // ✅

See the relevant recent discussion.

Karl · February 8, 2023, 11:54am

Meh, I don't think it's really worth trying to make unsafeBitCast safer. It exists to defeat the type system (and there are some very rare but legit situations where you need to do that). If somebody is using it without carefully studying the documentation and is ignoring the compiler warnings, that's the real problem.

This is not even close to the worst thing you can do if you misuse unsafeBitCast. Try this on for size:

print("Hello \(oh_dear())")

@inline(never)
func oh_dear() -> Int {
  unsafeBitCast((), to: Never.self)
}

Yes - you can actually use it to instantiate a value of type Never - and of course, that then makes all other control-flow analysis incorrect, and the compiler basically produces junk. We misused the API and got undefined behaviour.

xwu · February 8, 2023, 12:06pm

I don't understand: it is not a problem that unsafeBitCast makes undefined behavior possible (it's explicitly unsafe and it's explicitly for doing very naughty things), and if the problem in your view is that the warning tells you about an alternative that is less unsafe and crashes (which is precisely why the alternative is less unsafe), why do you want to make that a hard error?

What's to stop someone from writing unsafeBitCast(nil as UnsafeRawPointer?, UnsafePointer<UInt8>.self)...

tera · February 8, 2023, 12:52pm

It's a low hanging fruit: someone already went into trouble implementing the "can be replaced with 'bitPattern:' initializer" (so that people are prompted to switch to a "somewhat safer" alternative), making that warning an error is the further small step in the same direction.

By the same account if there was currently a warning IRT:

unsafeBitCast((), to: Never.self)

I'd suggest making it an error as well (and as we don't have that warning yet it's probably worth introducing it).

Nothing at the moment, as well as a simpler:

unsafeBitCast(0.0, to: UnsafePointer<UInt8>.self)

Which is not ideal. I believe we should have a runtime assert in this case (if not a precondition, perhaps enabled by one of the diagnostics sanitising options) – that would be quite a big deterrent. Many ways to have an invalid app, would be one less.

tera · February 8, 2023, 1:12pm

As for the runtime checks – for this and similar cases we may introduce "Validatable" protocol:

protocol Validatable {
    var isValid: Bool { get }
}

public func unsafeBitCast<T, U>(_ x: T, to type: U.Type) -> U {
    let result = original_unsafeBitCast(x, to: type)
    #if DEBUG // release? controlled with diagnostic/sanitising options?
    if let v = result as? Validatable {
        precondition(v.isValid)
    }
    #endif
    return result
}

// example conformance
extension UnsafePointer: Validatable {
    var isValid: Bool {
        Int(bitPattern: self) != 0
    }
}
extension Never: Validatable {
    var isValid: Bool { false }
}

cukr · February 8, 2023, 1:57pm

Unfortunately Validatable wouldn't work, because UB cannot be "undone". You have to check for it before triggering it. As far as compiler is concerned, every UnsafePointer instance is a valid instance.

tera · February 8, 2023, 3:12pm

From a pure computer theory perspective – no it won't work. And there's a chance that just before terminating the app (or even instead of terminating the app) this code will do something very bad:

let result = unsafeBitCast(x, to: type) // triggers UB
if let v = result as? Validatable {
    precondition(v.isValid) // quits

In practice – it can work, and do its job as good as (or better than)
guard edges, and other Xcode diagnostic tools.

Definitely better than this, which worked reasonably well for its time.

Karl · February 8, 2023, 3:29pm

Fundamentally, if you're using unsafeBitCast in a context where you can't be sure you are using it correctly, you shouldn't be using it.

It doesn't exist to just construct arbitrary values out of thin air. It's an extremely low-level tool and while there are some legit uses, they are extremely, extremely rare - basically only situations where you have information that cannot currently be expressed in the type system, but where you know for a fact you are using it correctly.

If there is any doubt at all that you are using this function correctly, do not use it

tera · February 8, 2023, 3:36pm

As well as 1000's of other developers I can tell that "without a doubt I know for a fact I am using this correctly". I just can be wrong (and be aware of that only post factum) – and want the system to protect me when ~~and just before~~ I am getting wrong (in fact I should've said: "just after I become wrong, to minimise the further damage").

Karl · February 8, 2023, 3:39pm

That's not how unsafe code works. With unsafe code, you assert that what you are doing is correct, and the compiler goes along with it even if the result is nonsense. The onus is on you to get it right, and mistakes are allowed to have catastrophic consequences. That's why it's unsafe.

It is not for general consumption. It demands exceptional care.

tera · February 8, 2023, 4:15pm

This is not entirely true. If it was we wouldn't have this precondition:

public func unsafeBitCast<T, U>(_ x: T, to type: U.Type) -> U {
  _precondition(MemoryLayout<T>.size == MemoryLayout<U>.size,
    "Can't unsafeBitCast between types of different sizes")
  return Builtin.reinterpretCast(x)
}

I checked just now - Builtin.reinterpretCast happily "works" with types of different sizes, so there is no inherent need for that precondition check, and according to your logic above we should not have that check and compiler should accept different sizes even if the result is nonsense.

By and large I am cool to accept the POV that we do not need Validatable ("because it can't work in theory even if it works in practice") and that we don't need promoting the unsafeBitCast(intValue → pointer) warning into an error ("because there are so many other ways to crash"), what I am less cool with is having a precondition check in-place to make the unsafe api "somewhat safer" and being reluctant to consider adding another precondition/assert check nearby ("because ... – see all good reasons above") - it just doesn't add up. You either have it or not have it, but please be consistent!

Jumhyn · February 8, 2023, 5:10pm

There is, I think, an important distinction between general validation conditions based on type information that is independent of type identity (and indeed, independent of any specific value information), and bespoke checks implemented for concrete types that verify the bitwise representation for a constructed instance obeys the constraints of that type.

I remain pretty unconvinced by the idea that this particular use of unsafeBitCast is so problematic that it justifies even further protection than the warning already emitted. If folks are using unsafe functionality in Swift and ignoring warnings emitted on the unsafe code, I’d want to see some sort of empirical demonstration that this change would significantly improve the safety of Swift before I’d support it; I don’t think I’m likely to be convinced by different framings of the theoretical argument.

tera · February 8, 2023, 10:07pm

Validatable can be used in other places as well, making unsafe programming much safer overall. Consider this example:

// sample user structure
struct S {
    var b: Bool = true              // offset 0, size 1
    var int: Int32 = 0x12345678     // offset 4, size 4
    var c: NSObject = NSObject()    // offset 8, size 8
}

In the following code I'm deliberately messing with the underlying bytes simulating a developer error (e.g. due to a typo, lack of knowledge, lack of sleep, etc):

var s = S()
withUnsafeMutableBytes_analogue(&s) { p in
    p[7] = 0xFF                 // messing with integer value - ok
// uncomment one of the following to trigger precondition failure:
//    p[0] = 123                // mess with Bool
//    p[8] |= 1                 // mess with the pointer: make it odd
//    memset(p + 8, 0xF0, 8);   // mess with the pointer: make it bad
//    memset(p + 8, 0, 8)       // mess with the pointer: make it null
}
print("next line")

In all of the uncommented examples the precondition failure was triggered and the app safely terminated right away before the app reaches "next line" - a very nice behaviour compared to what could happen without the checks in-place.

For this user structure the following code is supposed to be autogenerated by the compiler:

// should be automatically generated
extension S: Validatable {
    static func isValid(_ v: UnsafeRawPointer) -> Bool {
        isBoolValid((v + 0).assumingMemoryBound(to: UInt8.self).pointee) &&
        isObjectValid((v + 8).assumingMemoryBound(to: Int.self).pointee)
        // won't check Int - all Int bit patterns are valid
    }
}

Where Validatable protocol is slightly changed compared to the version in one of the messages above:

protocol Validatable {
    static func isValid(_ v: UnsafeRawPointer) -> Bool
}

And validity checks themselves for specific library types are shown here.

func isBoolValid(_ value: UInt8) -> Bool {
    if value == 0 || value == 1 { return true }
    print("strange bool value: \(value)")
    return false
}

func isObjectValid(_ object: Int) -> Bool {
    if object == 0 {
        print("object == nil -> invalid object")
        return false
    }
    if (object & 0x0F) != 0 {
        print("object is not aligned properly -> invalid object")
        return false
    }
    let size = objectSize(object)
    if size == 0 {
        print("object size is 0 -> invalid object")
        return false
    }
    if size < 16 {
        print("object size is too small -> invalid object")
        return false
    }
    if objectClass(object) == nil {
        print("not an object")
        return false
    }
    return true
}

This is a quick prototype of withUnsafeMutableBytes_analogue (obviously in this simple form it lacks all the bells and whistles like throw / re throw, etc). I'm using the C-helper API withUnsafeMutableBytesInternal (see below).

func withUnsafeMutableBytes_analogue<T: Validatable>(_ v: UnsafeMutablePointer<T>, execute block: @escaping (UnsafeMutablePointer<UInt8>) -> Void) {
    withUnsafeMutableBytesInternal(v, block)
    #if DEBUG // release? controllable via diagnostic/sanitising options?
    precondition(T.isValid(v), "invalid value")
    #endif
}

Finally some simple C helpers.

// C.h
typedef void (^BytesBlock)(unsigned char* _Nonnull);
void withUnsafeMutableBytesInternal(void* _Nonnull value, BytesBlock _Nonnull block);
long objectSize(long object);
Class _Nullable objectClass(long object);

// C.m
// compile with -fno-objc-arc
#include "C.h"
#include <objc/runtime.h>
#include <malloc/malloc.h>

void withUnsafeMutableBytesInternal(void* value, BytesBlock block) {
    block(value);
}
long objectSize(long object) {
    return malloc_size((void*)object);
}
Class objectClass(long object) {
    return object_getClass((void*)object);
}

As this discussion progresses it is now clear to me that this is a much bigger safety measure than just one particular check in one particular API. Let me change the subject to match the current vector of discussion.

bbrk24 · February 8, 2023, 11:27pm

Just a nitpick but isn’t that what assert is for?

Does this work for classes that don’t subclass NSObject (i.e. Swift-only classes)? If that doesn’t fail the object_getClass call almost certainly will.

tera · February 8, 2023, 11:53pm

Normally so, I just wanted to leave the door open for this to be triggered in release mode under some circumstances (hence the comment) and at the end of the day that should probably be "_precondition" (without #if DEBUG), like in the other similar places in stdlib – that call checks some config variables for debug / release configuration (according to stdlib sources). As for the "print" - it's for my internal debugging, maybe there's some established way for it in stdlib – IDK – just at the point where "precondition" fires there's no information about "what" went wrong, and during debugging I wanted to see that information.

Yes, I checked with native swift objects (albeit on 64 bit CPU only) - there instance size was indeed 16+ bytes (8 bytes for ISA and 4+4 bytes for RC and something else) and object_getClass and even ivar_xxx all worked absolutely fine (surprise). Obviously in the checks there should be some #if 32/64, and perhaps other changes related to the tagged pointers, perhaps NSObjects v native objects and whatnot – the version I presented is just a quick sketch as at this point we are establishing the answer for a bigger question: do we want this feature at all.