Why is `UnsafeBufferPointer`'s `baseAddress` optional?

Prevent?

// C.h
void ccall1(const void* _Nullable ptr, long size);
void ccall2(const void* _Nonnull ptr, long size);

// test.swift
ccall1(nil, 0)   // or: ccall1(.none, 0)
ccall2(null, 0)  // or: ccall2(.null, 0)

// utils.swift
extension UnsafeRawPointer {
    /// A "null" raw pointer crafted by bit-casting 0 (the cast discussed at length below).
    static var null: UnsafeRawPointer {
        unsafeBitCast(0, to: UnsafeRawPointer.self)
    }
}
let null = UnsafeRawPointer.null

The call site would've been very similar.

Edit:

Can you clarify please?

var a: UnsafeBufferPointer<UInt8> = ...
var b: UnsafeBufferPointer<UInt8> = ...

ccall1(a.baseAddress, b.count) // mistakenly using b's count with a's base address

N.B. (emphasis added):

On the topic that @xwu noted: unsafeBitCast is "unsafe", it's right there in the name. You're required to obey the rules when you use it, and if you don't then you've broken the rules and you get what you get. In this case, the unsafe pointer types are defined not to have an all-zero representation (that's left for the Optional versions). Your unsafe bit cast has crafted an invalid value, and so you've invoked undefined behaviour and your program is invalid.
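(A small illustration of that "no all-zero representation" point, assuming a 64-bit platform: because the zero bit pattern is reserved for Optional's nil, the Optional wrapper costs no extra space.)

// Sketch: the unsafe pointer types leave the all-zero pattern to Optional.
print(MemoryLayout<UnsafeRawPointer>.size)   // 8
print(MemoryLayout<UnsafeRawPointer?>.size)  // also 8 – nil reuses the 0 bit pattern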

1 Like

I have a hard time imagining correct code where I am passing a null pointer with a non-zero size to a C API... all the cases I can think of are "incorrect".

So is "UnsafeBufferPointer" solely based on its name.

Right, I have macOS, iOS, watchOS, and tvOS to test on, and I can ask someone to test this for me on Linux: how do I make that code snippet crash or misbehave on any of these platforms? Or do you mean some hypothetical platform where nil is different from unsafeBitCast(0, to: ...) – a platform that may not yet exist?

The essence of undefined behavior is that it is undefined. There are no semantics that you can validate by testing when you violate these invariants. Your tests may happen to pass, they may happen to fail, but neither one makes any guarantee about program behavior, because the behavior is undefined. It might "work" today, but fail when the compiler updates, when you get different user input, when a linked library changes, or when the program is run on different hardware. Do not do this. It is always a bug, even if you can't yet make the bug manifest in observable errors.

6 Likes

All of those cases are incorrect, which is exactly why that is the case the buffer pointers forbid. What they do allow is a null pointer with zero size, which is what the discussion in this thread is about.
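For concreteness, a minimal sketch of the allowed versus forbidden combination (illustration only):

// Allowed: no base address together with a count of zero.
let empty = UnsafeBufferPointer<UInt8>(start: nil, count: 0)
print(empty.baseAddress == nil, empty.count)   // true 0
// UnsafeBufferPointer<UInt8>(start: nil, count: 1) would be the forbidden combination:
// a null pointer claiming a non-zero size.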

Yes, it is, and if you misuse that type you also invoke undefined behaviour.

If your requirement is that invoking undefined behaviour produces a crash at exactly a specific location in your program, then I'm sorry, that can't be done because that's not how undefined behaviour works. Borrowing John Regehr's excellent quote:

It is very common for people to say — or at least think — something like this:

The x86 ADD instruction is used to implement C’s signed add operation, and it has two’s complement behavior when the result overflows. I’m developing for an x86 platform, so I should be able to expect two’s complement semantics when 32-bit signed integers overflow.

THIS IS WRONG. You are saying something like this:

Somebody once told me that in basketball you can’t hold the ball and run. I got a basketball and tried it and it worked just fine. He obviously didn’t understand basketball.

Of course it is physically possible to pick up a basketball and run with it. It is also possible you will get away with it during a game. However, it is against the rules; good players won’t do it and bad players won’t get away with it for long.

In this instance, Swift is very clear that UnsafePointer<T> corresponds to T * _Nonnull and that UnsafePointer<T>? corresponds to T * _Nullable. If you lie to the compiler about whether your pointer is null, the compiler can and will punish you. Here's the canonical example (compiler explorer link):

func good(_ ptr: UnsafePointer<Int>?) -> Int? {
    if let ptr {
        return ptr.pointee
    } else {
        return nil
    }
}

func youBrokeIt(_ ptr: UnsafePointer<Int>) -> Int? {
    return good(ptr)
}

This is entirely correct Swift code. No undefined behaviour is exhibited in the presence of this program. We correctly check for nil, and if the pointer is nil, we'll happily return nil. Otherwise, we'll dereference it.

Here's the compiled assembly:

output.good(Swift.UnsafePointer<Swift.Int>?) -> Swift.Int?:
        test    rdi, rdi
        je      .LBB1_1
        mov     rax, qword ptr [rdi]
        jmp     .LBB1_3
.LBB1_1:
        xor     eax, eax
.LBB1_3:
        test    rdi, rdi
        sete    dl
        ret

output.youBrokeIt(Swift.UnsafePointer<Swift.Int>) -> Swift.Int?:
        mov     rax, qword ptr [rdi]
        xor     edx, edx
        ret

Observe that the second function has optimised down to always dereference the pointer. The null-pointer check is gone! Your code generates a pointer that will deterministically trigger a crash when passed to the second function. The compiler's optimisation is safe in code that doesn't invoke UB, because the second function promises the pointer is non-nil. The bug isn't in this code, it's in yours, because you created a value that is explicitly invalid by the language of the data type.

3 Likes

I can see your point that it is incorrect to use the "crafted" zero bit-cast pointer willy-nilly in Swift. However, in my example above it is only used when calling the C API:

ccall2(unsafeBitCast(0, to: UnsafeRawPointer.self), 0)

similar to running with a basketball in between games.

If you are saying that there is no guarantee that a future Swift compiler won't itself insert some code between the "Swift caller" and the "C callee" that causes this example to crash or misbehave – I agree, it can – it's a low but non-zero probability.

I'm not saying any of that: I'm saying that the act of creating the invalid data structure is itself running with the ball. The fact that you are sometimes able to get away with it doesn't change the fact that you've constructed something Swift forbids.

Using John Regehr's post again, your var null is a "type 3 function":

These functions admit no well-defined executions. They are, strictly speaking, completely meaningless: the compiler is not even obligated to generate a return instruction.

If this function is actually dead, you're safe. Otherwise, you're in danger:

If any step in a program’s execution has undefined behavior, then the entire execution is without meaning. This is important: it’s not that evaluating (1<<32) has an unpredictable result, but rather that the entire execution of a program that evaluates this expression is meaningless. Also, it’s not that the execution is meaningful up to the point where undefined behavior happens: the bad effects can actually precede the undefined operation.

In this instance, yeah, you might get away with it: moving a 0 into that value will probably just move it into a register, inadvertently producing the correct representation for NULL, which is probably tolerated by the relevant function. But the code you've written is fundamentally incorrect, and getting away with it now doesn't mean you'll get away with it forever: swiftc is free to assume you'll never do this, and to make optimisations accordingly.

2 Likes

I am not surprised C/C++ was used in your quotes – those languages are known to be heavily littered with "unspecified/undefined behaviours". I had the impression that Swift tried to avoid that wording in its documentation.

How about this: instead of having a potential situation where an unaware developer like me triggers "undefined behaviour" and thereby produces "an invalid program", how about we add the relevant precondition check inside unsafeBitCast(...) – a check that prevents creating the crafted 0 pointers – so it would be impossible to break the law to begin with?

C has three kinds of behaviours: specification-defined, implementation-defined, and undefined. Swift lacks a specification, so there can be no specification-defined behaviour. That means there's only implementation-defined and undefined. Swift doesn't very explicitly call out what is undefined behaviour, but in this instance the documentation does (emphasis added):

`type` and the type of `x` must have the same size of memory representation and compatible memory layout.

Warning

Calling this function breaks the guarantees of the Swift type system; use with extreme care.

I think it's hard for the documentation to be clearer on this point. There are rules to how to use unsafeBitCast, but it will not enforce them: that's why it's "unsafe".

There already exists a safe way to perform Int to UnsafePointer conversion in Swift: .init(bitPattern:). unsafeBitCast even warns about this in your example:

test.swift:1:1: warning: 'unsafeBitCast' from 'Int' to 'UnsafeRawPointer' can be replaced with 'bitPattern:' initializer on 'UnsafeRawPointer'
unsafeBitCast(0, to: UnsafeRawPointer.self)
^~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(Though this warning is IMO insufficient, as it fails to clarify that the two operations aren't identical.) This is pretty close to giving you what you want in terms of warning users.
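To sketch how the two spellings differ (values chosen purely for illustration):

// init(bitPattern:) is failable: it rejects 0 by returning nil rather than
// constructing an invalid non-optional pointer.
let p = UnsafeRawPointer(bitPattern: 0)    // nil
let q = UnsafeRawPointer(bitPattern: 42)   // non-nil, though of course still a dangling address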

Swift could do that, and some constrained checking might be worthwhile here, at least in debug mode. However, the problem with unsafe code is that it's always possible to break the rules. There are many ways to produce an invalid representation for a given type in unsafe Swift code, and the compiler simply cannot check them all without incurring a performance penalty.

More broadly, though, adding these checks misunderstands what unsafeBitCast is for. unsafeBitCast is for those moments when you cannot convince the compiler that what you're doing is sound, but you know it is. It's the "shut up and get out of my way" hammer. This means that having the compiler verify that what you're doing is safe would defeat the point of the function: the compiler doesn't know, and that's why you're calling this function to begin with. While a small number of cases could be checked, the general case cannot be without defeating the point of the function altogether.

3 Likes

If what you are saying is true, and (rewording the quotes above) the mere use of unsafeBitCast(0, to: UnsafeRawPointer.self) triggers undefined behaviour, makes the program invalid, makes the execution of such an app meaningless, and the bad effects in the app can actually precede the call to unsafeBitCast(0, to: ...), then this is very serious and we should not stop at having this conversation in this topic (which is visible to some thousands of developers here, while many more developers out in the field will miss it).

In the standard library we must check for this frequent special case of 0 being cast to a pointer type and specifically prohibit it – definitely in debug mode, perhaps even in release mode if the check can be made performant. In addition, the documentation must specifically call out and outlaw the special case of 0 being cast to a pointer type, as the current wording is definitely not strong enough (I considered 0 to have "a compatible layout" with UnsafeRawPointer's, and I thought I was already using "extreme care", which is apparently not the case). The Xcode fix-it warning (which, in all fairness, I considered to be a lie, as it is impossible to change unsafeBitCast(0, to: UnsafeRawPointer.self) to "UnsafeRawPointer(bitPattern: 0)!" – that would simply trap) should be changed from a warning to an error.

Edit:

0 is an important and frequent enough special case to be worth having checks for. For example, unsafeBitCast(1234, to: UnsafeRawPointer.self), while making a bogus and dangerous pointer, is not trying to make an "invalid" UnsafeRawPointer (from UnsafeRawPointer's perspective: non-zero == valid, zero == invalid).

To allow another analogy: a special check for 0 in unsafeBitCast would be similar to having a fence around a railway – it won't stop everyone, but it would definitely help the majority avoid making the wrong choice.

Edit2. Maybe it would be possible to prohibit it without any performance penalty by using some overload?

func unsafeBitCast(_ x: Int, to type: UnsafeRawPointer.Type) -> UnsafeRawPointer {
    fatalError("prohibited")
}

(not sure this is the right way to write such an overload).

I think you may be misunderstanding the point — unsafeBitCast(...) is an extraordinarily low-level operation which states that "I have already done the checking necessary to ensure that what I am about to do is valid. Make it so, at any possible cost." This is the point of the unsafe labeling of this operation (and all other unsafe operations): by using this function, you are accepting the tradeoff that if you haven't upheld your end of the bargain, undefined behavior is what you get in return. So besides the fact that unsafeBitCast(...) doesn't actually compile to any code, adding any sort of checking defeats the entire purpose of the operation: if you need safety, you use the appropriate initializers, which help prevent invalid calls by doing that checking for you up-front; if you've typed unsafe in your codebase, you have already crossed the line into informing the compiler that you know what you're doing and that you don't want any additional safety checks.

5 Likes

Actually, I'm starting to see a bigger picture here: there are two types of "unsafeness". In the first, unsafeBitCast(1234, to: UnsafeRawPointer.self), the compiler is telling me "OK, you are doing something dangerous, but I trust you". In the second, unsafeBitCast(0, to: UnsafeRawPointer.self), it could well tell me: "come on, you are not even making an UnsafeRawPointer here, you are making a different type altogether!" – which is a whole new level of unsafeness.

I just edited my post above (see Edit2) – would it be possible to prohibit it at compile time without any runtime penalty, or with a minimal runtime penalty paid only by those who break the law?

I'm not sure I understand—as @lukasa mentions above, you do get a diagnostic from the compiler when you attempt to write out unsafeBitCast(0, to: UnsafeRawPointer.self):

'unsafeBitCast' from 'Int' to 'UnsafeRawPointer' can be replaced with 'bitPattern:' initializer on 'UnsafeRawPointer'

What more do you want from the compiler here?

1 Like

An error instead of a warning. Plus a runtime assert in debug builds that would catch it even if I somehow managed to bypass the compile-time check.

Assuming you want to prohibit this behavior altogether: possible to a very limited extent, but unlikely to be worth it. The compiler can catch unsafeBitCast(0, to: UnsafeRawPointer.self) if typed literally, but what about

let x = 0
let ptr = unsafeBitCast(x, to: UnsafeRawPointer.self)

or

let x = UInt.random(...)
let ptr = unsafeBitCast(x, to: UnsafeRawPointer.self)

There are certain values of x for which the cast is correct — but those can only be known at runtime.
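(A sketch of such a value, purely for illustration: an Int that genuinely came from a live allocation round-trips through the cast just fine, but only a runtime check could tell it apart from 0 or a random number.)

let storage = UnsafeMutableRawPointer.allocate(byteCount: 8, alignment: 8)
let bits = Int(bitPattern: storage)        // a value of x for which the cast is correct
let back = unsafeBitCast(bits, to: UnsafeRawPointer.self)
assert(back == UnsafeRawPointer(storage))  // same address, recovered from an Int
storage.deallocate()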

There are also so many other possible invalid casts you can perform with unsafeBitCast, both at compile time and at runtime; it doesn't make sense for the compiler to try to catch them. Conversely, it's also unlikely to be worth expanding the compiler to check for one, two, or even a small handful of cases.


In the end, the simplest rule wins out: unsafeBitCast(...) is you documenting that you've accepted undefined behavior for invalid input. There is already a way to spell the operation you're actually trying to perform — there's only so much the compiler can do to try to help you use it.

4 Likes

And the valid/invalid values are also going to be platform-specific. It seems like a bit of a layering issue long-term to have the Swift frontend try to diagnose whether specific casts here are correct or not for arbitrarily many backends.

1 Like

[quote="itaiferber, post:36, topic:62791"]
The compiler can catch unsafeBitCast(0, to: UnsafeRawPointer.self) if typed literally, but what about ...

A literal 0 can be caught, the above "let x = 0" could be caught, and "var x = 0" in some cases – but I see your point: not everything can be caught by the compiler. And we don't want to pay a penalty at runtime in release builds, which leaves us with runtime (debug-only) checks; see below.

Can you name a couple? I can't think of anything obvious.

OK, not a compiler check, but here's a sketch of the next best thing – runtime (debug-only) checks:

#if DEBUG
func isValid(_ a: Any) -> Bool {
    true // default case
}
func isValid(_ a: UnsafeRawPointer) -> Bool {
    unsafeBitCast(a, to: Int.self) != 0
}
func isValid<T>(_ a: UnsafePointer<T>) -> Bool {
    unsafeBitCast(a, to: Int.self) != 0
}
#endif

#if DEBUG
// `original_unsafeBitCast` stands in for the real standard-library unsafeBitCast.
func unsafeBitCast<T, U>(_ x: T, to type: U.Type) -> U {
    let y = original_unsafeBitCast(x, to: type)
    assert(isValid(y))
    return y
}
#else
func unsafeBitCast<T, U>(_ x: T, to type: U.Type) -> U {
    original_unsafeBitCast(x, to: type)
}
#endif

Presumably, the compiler will be free to completely elide your isValid checks because it is of course impossible for the result of unsafeBitCast of an Unsafe*Pointer to Int to result in the value 0. :)

3 Likes

Well, you can sometimes map memory to the zero page (apparently there are legit reasons for doing so). It's determined by system policies.

I'm reminded of a fascinating exploit in the Linux kernel which was caused by the system sometimes failing to check those policies, together with GCC optimisations for null pointers (which become invalid if null is a valid pointer). It's such a mess :(

I don't think we should worry too much about preventing things like that. If you're accidentally passing along an address and count from different buffers, you're already playing with fire. And even if buffer1 is empty, its pointer does not need to be nil, so - I mean, it's nice on those occasions when the mistake does lead to an early crash, but you can't rely on it happening.
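(A quick illustration of that last point – for an empty collection, the standard library leaves the base address unspecified, so you can't count on a nil to surface the bug early:)

let bytes: [UInt8] = []
bytes.withUnsafeBufferPointer { buf in
    // count is 0 either way; baseAddress may be nil or a valid non-nil address.
    print(buf.count, buf.baseAddress as Any)
}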

I'm not sure why this has become a discussion about unsafeBitCast; it appears to have nothing to do with the original question AFAICT.

2 Likes