Why is `UnsafeBufferPointer`'s `baseAddress` optional?

I thought I remembered something from long ago about a nil baseAddress denoting an empty buffer with zero count, but I've recently encountered buffer pointers where the count was zero and the base address wasn't nil, and upon doing some searching, it appears that this is legal. So what is the point of the nillable baseAddress? When would one want baseAddress to be nil rather than, say, just having the whole UnsafeBufferPointer be nil? Does this serve a purpose beyond simply making APIs like compression_encode_buffer annoying to use?

An empty buffer may be denoted by a (nil, 0) pair, or it may be denoted by a (validPointer, 0) pair. Empty buffers are not canonicalized to (nil, 0) because sometimes the pointer still describes a position (like a zero-length slice).
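To make the "zero-length slice still describes a position" point concrete, here is a minimal sketch. The function name `demoEmptySlice` and the 4-byte scratch allocation are made up for illustration; only standard library calls are used.

```swift
// Compares an empty slice taken from the middle of a buffer with an empty
// buffer that has no base address at all.
func demoEmptySlice() -> (sliceIsEmpty: Bool, sliceKeepsPosition: Bool, nilIsEmpty: Bool) {
    let storage = UnsafeMutableBufferPointer<UInt8>.allocate(capacity: 4)
    storage.initialize(repeating: 0)
    defer { storage.deallocate() }

    let whole = UnsafeBufferPointer(storage)

    // An empty slice at offset 2, rebased into a standalone buffer pointer.
    let emptyAtTwo = UnsafeBufferPointer(rebasing: whole[2..<2])

    // An empty buffer that describes no position at all.
    let emptyNowhere = UnsafeBufferPointer<UInt8>(start: nil, count: 0)

    return (
        sliceIsEmpty: emptyAtTwo.isEmpty,
        sliceKeepsPosition: emptyAtTwo.baseAddress == whole.baseAddress! + 2,
        nilIsEmpty: emptyNowhere.isEmpty && emptyNowhere.baseAddress == nil
    )
}

print(demoEmptySlice())
```

Both buffers are empty, but only the first one still says where in the original storage it sits.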

So why is this like this? Because there are C APIs that produce pointer/length pairs, and some of them produce (nil, 0) for empty results, and if UnsafeBufferPointer didn't accept that you'd have to handle those cases specially.

This has long been controversial; I had a strong hand in its current design, and several other early Swift developers at Apple argued the other way. But it's a trade-off no matter which way you look at it. (The alternative would probably have been what you said, using Optional<UnsafeBufferPointer>.none to represent an empty buffer with a nil pointer, and .some to represent a possibly-empty buffer with a known-non-nil pointer.)

Adding onto what Jordan said, the rule is as follows:

  • If baseAddress is nil, count is zero.
  • If count is zero, baseAddress may or may not be nil.

See "discussion" in the baseAddress documentation.
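The rule can be written down as a small validating constructor. This is only a sketch: `makeBuffer` is a hypothetical helper, not a standard library API, and it returns nil for an illegal pair instead of relying on the initializer's own runtime check.

```swift
/// Hypothetical helper: builds a buffer pointer only if the (start, count)
/// pair obeys the rule "nil base address implies zero count".
func makeBuffer(start: UnsafePointer<UInt8>?, count: Int) -> UnsafeBufferPointer<UInt8>? {
    guard count >= 0, start != nil || count == 0 else { return nil }
    return UnsafeBufferPointer(start: start, count: count)
}

var byte: UInt8 = 7
let results: [Bool] = withUnsafePointer(to: &byte) { p in
    [
        makeBuffer(start: nil, count: 0) != nil, // nil base, zero count: allowed
        makeBuffer(start: p, count: 0) != nil,   // non-nil base, zero count: allowed
        makeBuffer(start: p, count: 1) != nil,   // non-nil base, nonzero count: allowed
        makeBuffer(start: nil, count: 1) != nil, // nil base, nonzero count: rejected
    ]
}
print(results) // [true, true, true, false]
```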

The thing that would have been nicer about representing this case with a nil UnsafeBufferPointer would be when using things like .withUnsafeBytes on the ContiguousBytes protocol. On every type I've tried that with when empty, the result has been a non-nil baseAddress and a zero count, but since the API docs don't guarantee this, one can't just use the ! postfix operator safely when passing the pointer to something like compression_encode_buffer, necessitating some fairly obnoxious gymnastics to avoid a condition that probably won't actually even occur in practice.

If this were just represented by a nil buffer pointer, withUnsafeBytes could simply pass a non-nil buffer pointer to its closure, and it would thereby have promised that there will be a pointer.

Ah well, water under the bridge I guess.

Yeah, I'm not going to say it's not inconvenient in those cases. However, the way to avoid it is

guard !$0.isEmpty else {
  return
}
use_buffer($0.baseAddress!, $0.count)

which I'd classify only as "minorly obnoxious".
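For context, here is how that guard might look in a complete function. `useBuffer` is a hypothetical stand-in for a C function that requires a non-null pointer, and `sumBytes` is likewise invented for this sketch.

```swift
/// Hypothetical stand-in for a C API that takes a non-null pointer and a length.
func useBuffer(_ base: UnsafeRawPointer, _ count: Int) -> Int {
    var sum = 0
    for i in 0..<count {
        sum += Int(base.load(fromByteOffset: i, as: UInt8.self))
    }
    return sum
}

func sumBytes(_ bytes: [UInt8]) -> Int {
    bytes.withUnsafeBytes { raw -> Int in
        // Bail out before touching baseAddress. After this guard the
        // force-unwrap cannot fail, because a nil base implies a zero count.
        guard !raw.isEmpty else { return 0 }
        return useBuffer(raw.baseAddress!, raw.count)
    }
}

print(sumBytes([]), sumBytes([1, 2, 3])) // 0 6
```

This only works when skipping the call is an acceptable result for empty input, which is the limitation raised in the next reply.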

Well, the thing is that that sometimes isn't what you need in the situation. After all, a zero-sized buffer is a valid thing to send to a compression algorithm; the results won't be particularly useful (for LZFSE it's just <6276782d 00000000 62767824>), but it is valid, so if you're writing a function that can take arbitrary input and compress it, you have to be able to handle that case. So there's a few ways to deal with it, but they all involve creating an empty pointer somewhere, and then only conditionally using it. And if you don't want to eat the cost of the allocation when it's not necessary, you have to keep track of whether you've done this somehow so you know whether to deallocate the thing afterward. It's not hard, but it is annoying. 🤷
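One way to sketch the "create a pointer only when needed" idea without a heap allocation, and so without any deallocation bookkeeping, is to lend out the address of a local byte. `withNonNilBase` is a hypothetical helper, and the sketch assumes the callee never reads through the pointer when the count is zero.

```swift
/// Hypothetical helper: always hands `body` a non-nil pointer, even when the
/// buffer is empty and has a nil base address.
func withNonNilBase<R>(
    of buffer: UnsafeRawBufferPointer,
    _ body: (UnsafeRawPointer, Int) throws -> R
) rethrows -> R {
    if let base = buffer.baseAddress {
        return try body(base, buffer.count)
    }
    // baseAddress is nil, so count is 0. Lend the address of a local byte for
    // the duration of the call; the callee must not read through it.
    var placeholder: UInt8 = 0
    return try withUnsafePointer(to: &placeholder) { p in
        try body(UnsafeRawPointer(p), 0)
    }
}

let emptyInput = UnsafeRawBufferPointer(start: nil, count: 0)
let seenCount = withNonNilBase(of: emptyInput) { _, count in count }
print(seenCount) // 0
```

The placeholder only exists for the duration of the closure, so there is nothing to track or free afterward.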

Provided the buffer size is 0, could that pointer ever be read? Assuming the answer is "never", this could work without needing to allocate or deallocate a dummy pointer (not even on the stack):

extension UnsafeBufferPointer {
    var nonNilBaseAddress: UnsafePointer<Element> {
        baseAddress ?? UnsafePointer(bitPattern: 1)! // but see † below
    }
}

var x: UnsafeBufferPointer<UInt8> = ...
let baseAddress = x.nonNilBaseAddress

BTW, it's "safe" in the way Swift uses this term, but I understand what you mean.

† Might not be your cup of tea: I'd actually prefer a future crash here (to learn about some peculiar case, or to learn that it never actually happens in my app) over knowing that my app contains code that is very probably dead, where without further investigation (like intensive logging) I won't even know whether it is dead or not.

I also wish it was non-optional. If you're dealing with a nullable base pointer, I think it makes sense for that nullability to propagate to the entire buffer, rather than just its base address.

And if you have a null base pointer but your processing code doesn't care about the specific value of the address, you can promote your null buffer to a zero-length, non-null buffer using any base address you like. And that happens once, at construction time, rather than every time somebody accesses the buffer.

If the processing code does care about the specific address, then propagating the nullability seems reasonable and even desirable.

I don't think you need to allocate/deallocate anything. As above - if the specific value of the address isn't important (which I would imagine is the case for compression), I think you should be able to use any address.

Yeah, I'd actually thought about that, but... it's just so gross.

Also thought of just passing the destination buffer pointer, since I know that one's going to be non-nil, because I'm allocating it myself. Seems misleading, though.

I'm currently thinking of just generally special-casing the situation where we receive any zero-length collection as an argument. At least that case will be easier to unit test.

In the general case, code that can’t read from the base address may still be relying on alignment (maybe they are putting extra data in the lower bits). We considered adding a standard dangling() factory method to pointers as well, but were worried it would have ill effects to produce a pointer that could actually be in existence (like the last aligned pointer on a particular page) or a pointer that was known to not exist (like a properly-aligned pointer on the first page on Linux or macOS), in case the code that used it also made assumptions about uniqueness or validity. Paralyzed with these hypotheticals, we didn’t end up adding any of these to the standard library.
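For illustration only, a factory of the kind described might have looked roughly like the sketch below. The name `hypotheticalDangling` is invented here, nothing like it exists in the standard library, and the concerns above about uniqueness and validity apply to it in full.

```swift
extension UnsafePointer {
    /// Hypothetical: a non-null, properly aligned pointer that must never be
    /// dereferenced. It uses the type's alignment as the address, so code that
    /// stashes data in the low bits still sees an aligned value.
    static var hypotheticalDangling: UnsafePointer<Pointee> {
        // Alignment is always at least 1, so the bit pattern is never zero
        // and the failable initializer cannot return nil here.
        UnsafePointer<Pointee>(bitPattern: MemoryLayout<Pointee>.alignment)!
    }
}

let dangling = UnsafePointer<UInt64>.hypotheticalDangling
print(Int(bitPattern: dangling) % MemoryLayout<UInt64>.alignment == 0) // true
```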

Interesting.

Maybe this, then, if a particular target API call (like compression_encode_buffer above) is known to tolerate "nil" being passed:

extension UnsafeBufferPointer {
    var veryUnsafeBaseAddress: UnsafePointer<Element> {
        baseAddress ?? unsafeBitCast(0, to: UnsafePointer<Element>.self)
        // 🔶 Warning: 'unsafeBitCast' from 'Int' to 'UnsafePointer<Element>' can be replaced with 'bitPattern:' initializer on 'UnsafePointer<Element>'
        // (The warning in this case is somewhat misleading.)
    }
}

let p: UnsafeBufferPointer<UInt8> = ...
someApiCall(p.veryUnsafeBaseAddress, count: p.count)

That's undefined behavior. If the C API can tolerate being passed a null pointer for an empty buffer, then ideally it should not annotate itself as requiring a _Nonnull argument to begin with, and you can pass p.baseAddress along without any ceremony.
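As a sketch of the "no ceremony" case: `byteSum` below is a hypothetical Swift stand-in for a C function whose pointer parameter is not annotated `_Nonnull` and is therefore seen from Swift as an optional pointer.

```swift
/// Hypothetical stand-in for a C API imported with a nullable pointer
/// parameter, i.e. `UnsafePointer<UInt8>?` on the Swift side.
func byteSum(_ bytes: UnsafePointer<UInt8>?, _ count: Int) -> Int {
    guard let bytes = bytes else { return 0 }
    return (0..<count).reduce(0) { $0 + Int(bytes[$1]) }
}

// The optional baseAddress is forwarded as-is: no unwrap, no fake pointer.
let emptyBuffer = UnsafeBufferPointer<UInt8>(start: nil, count: 0)
let emptySum = byteSum(emptyBuffer.baseAddress, emptyBuffer.count)

let filledSum = [UInt8]([1, 2, 3]).withUnsafeBufferPointer { buf -> Int in
    byteSum(buf.baseAddress, buf.count)
}
print(emptySum, filledSum) // 0 6
```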

One thing that might be nice is an API that optionally gives you the nonnull base address and count for a nonempty buffer, so you can combine the isEmpty check and baseAddress unwrap into one check:

extension UnsafeBufferPointer {
  var nonemptyBaseAddressAndCount: (UnsafePointer<Element>, Int)? {
    if isEmpty { return nil }
    return (baseAddress.unsafelyUnwrapped, count)
  }
}

guard let (base, count) = $0.nonemptyBaseAddressAndCount else {
  return
}
use_buffer(base, count)

I will say that the principal downside of constructing the buffer pointer to allow nil base addresses is that we're playing whack-a-mole with unnecessary traps and branches in that code. Every now and then I trip over some case where the buffer pointers are force-unwrapping or otherwise asserting a condition that cannot possibly be true. Here is my most recent example.

I've actually used that unsafeBitCast(0, to: UnsafePointer<Something>.self) trick before. DADiskMountWithArguments's last argument used to be imported as an UnsafePointer<CFString>, but the trouble was that it was supposed to be a nil-terminated array. So the only way to do it was to put an unsafe bit-casted 0 at the end. Fortunately that bug eventually got fixed.

Not something I'd really want to do other than as a last resort, though.

Then the compiler won't catch the most typical error, passing a null pointer with a nonzero size, which would be a greater evil.

Null is probably the safest "unsafe" value a pointer can have, in either C or Swift, since you'll almost certainly crash if you try to dereference a null pointer or anything close to one on any contemporary platform, so it's hard for me to believe that's worse than the alternatives when we're talking about raw C APIs. Jordan noted that there are hazards to making up fake non-null base addresses for empty buffers, such as alignment, but another one is programmer error—imagine if you fat-finger a call involving function(buffer1.baseAddress, buffer2.count) where buffer1 is empty, but buffer2 is not, and the artificial baseAddress ends up being a valid pointer to program state, or an attacker-controlled location.

Making a C API parameter nullable just to allow the (exceptional) case of passing a nil pointer with size=0 (while at the same time opening the door to passing a nil pointer in all cases) is a greater evil, because runtime errors are harder and more expensive to catch than compile-time errors.

// C
void a_call(void* _Nonnull pointer, long size);
void b_call(void* _Nullable pointer, long size);

// Swift
var bp: UnsafeBufferPointer<UInt8> = ...
var somePointer: UnsafePointer<UInt8>? = ...
var someSize: Int = ...

a_call(somePointer, someSize)     // Compiles: NO ✅
a_call(bp.baseAddress, bp.count)  // Compiles: NO ✅
a_call(somePointer!, someSize)    // Compiles: YES, Runtime: ok or crash 🔶
a_call(bp.baseAddress!, bp.count) // Compiles: YES, Runtime: ok or crash 🔶

b_call(somePointer, someSize)     // Compiles: YES, Runtime: ok or crash 🔶
b_call(bp.baseAddress, bp.count)  // Compiles: YES, Runtime: ok ✅
b_call(somePointer!, someSize)    // Compiles: YES, Runtime: ok or crash 🔶
b_call(bp.baseAddress!, bp.count) // Compiles: YES, Runtime: ok or crash 🔶

That "a_call(somePointer, someSize)" doesn't compile is a big help in preventing nil-pointer bugs. The issue discussed in this thread is a mere "o-small inconvenience" in comparison.

The overwhelming majority of C code in the world is not nullability-annotated, and the pattern of "null + 0" is extremely common. Annotating pointers as non-null doesn't prevent the dangerous bugs (passing wild or dangling pointers), but does prevent the fairly safe one (dereferencing null).

Additionally, if you're calling the C API from Swift then UnsafeBufferPointer does prevent you passing a non-null pointer with zero length.

No it doesn't:

  1> let ptr = UnsafeMutablePointer<UInt8>.allocate(capacity: 16)
ptr: UnsafeMutablePointer<UInt8> = 0x600000004080 {
  pointee = 0
}
  2> let buf = UnsafeBufferPointer(start: ptr, count: 0)
buf: UnsafeBufferPointer<UInt8> = 0 values (0x600000004080)
  3> print(buf.baseAddress!)
0x0000600000004080
  4> print(buf.count)
0

Apologies, put my "non" in the wrong place. I meant "passing a null pointer with non-zero length". If you're passing a zero length, correct code can never safely dereference the provided pointer, so whether it's null or not is immaterial. Passing a non-zero length with a null pointer is the only case where the pointer can be dereferenced in correct code, and the buffer pointers prevent that.
