Why is `UnsafeBufferPointer`'s `baseAddress` optional?

Joe_Groff · January 30, 2023, 11:12pm

Well, you can convince the processor to map memory there, but if the compiler uses 0 as the null pointer representation, then you can't use 0 as the address of a C object or Swift value.

tera · January 31, 2023, 12:25am

It's on me. We started with the original example like this:

compression_encode_buffer(... x.baseAddress!, x.count ...)

Quickly skimmed through a somewhat bulky:

var dummyVar: Int = 0
withUnsafePointer(to: dummyVar) { dummy in
    compression_encode_buffer(... x.baseAddress ?? dummy, x.count ...)
}

promptly discussed the idea of having this dummy pointer built-in:

compression_encode_buffer(... x.baseAddress ?? .dangling, x.count ...)

or the equivalent:

compression_encode_buffer(... x.veryUnsafeBaseAddress, x.count ...)

and then side-tracked into a discussion of constructing a hand crafted zero-bit pointer:

let null = unsafeBitCast(0, to: UnsafePointer<Int>.self)
compression_encode_buffer(... x.baseAddress ?? null, x.count ...)

which works in practice (at least for this specific example on today's OS/hardware) but triggers undefined behaviour, turned out to be an excellent Pandora's box opener, and naturally led to a further sub-discussion of tightening the "unsafeBitCast" API so that it rejects crafted zero pointers by means of compile-time and runtime checks.
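For reference, the dummy-pointer idea can be packaged so that the call site needs neither `!` nor a zero-bit cast. This is a minimal sketch, assuming a hypothetical helper named `withNonNullBaseAddress` (not part of the standard library) and assuming the callee never dereferences the pointer when the count is 0:

```swift
extension UnsafeBufferPointer {
    /// Hypothetical helper: calls `body` with a pointer that is never nil.
    /// For an empty buffer with a nil base address it hands out a temporary
    /// one-element allocation that must not be read or written.
    func withNonNullBaseAddress<R>(
        _ body: (UnsafePointer<Element>, Int) throws -> R
    ) rethrows -> R {
        if let base = baseAddress {
            return try body(base, count)
        }
        // A nil base address implies count == 0, so the scratch memory
        // stays uninitialized and is only used as a valid, non-null address.
        let scratch = UnsafeMutablePointer<Element>.allocate(capacity: 1)
        defer { scratch.deallocate() }
        return try body(UnsafePointer(scratch), 0)
    }
}

let empty = UnsafeBufferPointer<UInt8>(start: nil, count: 0)
let seenCount = empty.withNonNullBaseAddress { _, count in count }
print(seenCount)
```

The allocation costs more than a static dummy variable, but it works for any `Element` type and avoids manufacturing a pointer from a bit pattern.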

Sorry to sidetrack this thread, please continue with the original question.

CharlesS · January 31, 2023, 1:34am

One could say that the mention of unsafeBitCast, with its subsequent effects on the functioning of this thread, is itself a pretty good metaphor for undefined behavior. ;-)

Geordie_J · January 31, 2023, 6:15pm

FWIW, I also wish UnsafeBufferPointer itself was nil in the case where its baseAddress is nil. That's how UnsafePointer itself works (AFAIK), so the inconsistency is very weird to me. As others have pointed out, using ! is an unfortunate workaround that sometimes hides bugs, and I'd personally rather never use it in my codebases (but this is one case where it feels clunky and unnecessary not to use it, which is a shame).

Also, it's still possible to have Optional<UnsafeBufferPointer>, which seems redundant.

jrose · March 11, 2023, 10:41pm

It really does come down to whether you consider (nil, 0) a valid empty buffer or not. If you do, then there’s a difference between .some((nil, 0)) (an empty buffer) and .none (no buffer), just like there’s a difference between .some(NSArray()) and .none.
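To make the distinction concrete, here is a small sketch; the `describe` function is a hypothetical name used only for illustration:

```swift
// Hypothetical helper that separates "no buffer" from "an empty buffer".
func describe(_ buffer: UnsafeBufferPointer<UInt8>?) -> String {
    switch buffer {
    case .none:
        return "no buffer"
    case .some(let b) where b.isEmpty:
        return "empty buffer"
    case .some(let b):
        return "buffer with \(b.count) bytes"
    }
}

let noBuffer: UnsafeBufferPointer<UInt8>? = nil
let emptyBuffer: UnsafeBufferPointer<UInt8>? =
    UnsafeBufferPointer<UInt8>(start: nil, count: 0)

print(describe(noBuffer))
print(describe(emptyBuffer))
```

Whether an API needs both states is a design decision; the type system simply allows them to be told apart.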

EDIT: I had this open in a tab and didn’t see how long ago the conversation left off, whoops.

Geordie_J · March 18, 2023, 11:54pm

I wonder what “valid” means in this sense. To me the most important question is, “is the difference meaningful and beneficial”? Maybe I’m in the minority with this but I almost always initialise and use UnsafeBufferPointers as immutable containers, so the fact that the buffer itself can be .some without any contents does not seem useful to me. Mutable or not, I don’t understand why we’d want to make the obvious footgun of allowing subscripting into a nil buffer accessible, for example.

I’d be genuinely interested to find out why others find the idea of an empty buffer that isn’t itself .none compelling though.

To share a personal anecdote, when I was starting out learning Swift in 2014 I knew nothing about pointers or memory management (the stack vs the heap meant nothing to me at the time, for example). When this API was introduced, its design made me wonder whether the baseAddress could suddenly become nil based on something out of my control. These APIs are advanced tools and one could argue that “one should know what one is doing” when using them, but I feel this particular part could help beginners learn if it was Swiftier by design.

lukasa · March 20, 2023, 1:07pm

This pattern is in-line with other Collection types in Swift, which can be empty without being nil.

This is much less scary than the other thing you can do, which is subscripting a non-nil buffer past its count. That is, UnsafeRawBufferPointer(start: nil, count: 0)[5] is far safer than UnsafeRawBufferPointer(start: UnsafeRawPointer(bitPattern: 0x00007f0000000000), count: 0)[5].

As noted up-thread, quite a lot of C APIs vend data in the form of one parameter that is a pointer and one that is a length. In those cases, it is almost always true that the pointer is allowed to be nil if the count is 0. UnsafeBufferPointer models that situation naturally, which is really its original purpose.
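As a rough illustration of that pointer-plus-length shape, here is a sketch where `byteSum` is a hypothetical stand-in for a callback that receives data from a C API:

```swift
// Hypothetical function with a C-style (pointer, length) signature.
// The pointer may be nil when length is 0.
func byteSum(_ bytes: UnsafePointer<UInt8>?, _ length: Int) -> Int {
    // The pair maps directly onto a buffer pointer; no nil check is needed
    // because (nil, 0) is a valid empty buffer.
    let buffer = UnsafeBufferPointer(start: bytes, count: length)
    return buffer.reduce(0) { $0 + Int($1) }
}

let payload: [UInt8] = [1, 2, 3]
let total = payload.withUnsafeBufferPointer { byteSum($0.baseAddress, $0.count) }
let emptyTotal = byteSum(nil, 0)
print(total, emptyTotal)
```

Iterating the empty buffer never touches memory, so the nil base address is harmless here.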

ksluder · March 20, 2023, 2:23pm

Why is this safer? As noted upthread, not all environments trap on access to the zero page.

lukasa · March 20, 2023, 2:52pm

It's safer because there is no scenario where it's more dangerous (it's at worst equivalent to the other), and many scenarios where it's less dangerous.

As a practical matter, I'll also add that environments that do not trap on the zero page definitely exist, but fall into the realm of edge cases. They exist, they have historical value, but they are deep into the long tail of practical programming problems to solve. Swift reflects this in its design, in that the all-zero pointer value is definitionally uninhabited in the Unsafe[Mutable][Raw]Pointer[<T>] family of types. It's definitely not a good idea to design this type around a behaviour that other types in Swift cannot handle, and that is so uncommon.
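One way to observe this on current toolchains is to compare layouts. This is only a sketch of what mainstream platforms do today, not a statement about every possible target:

```swift
// An optional pointer needs no extra storage: nil reuses the bit pattern
// that a non-optional pointer can never hold.
let pointerSize = MemoryLayout<UnsafeRawPointer>.size
let optionalPointerSize = MemoryLayout<UnsafeRawPointer?>.size

// By contrast, Int has no spare bit pattern, so Int? needs an extra tag byte.
let intSize = MemoryLayout<Int>.size
let optionalIntSize = MemoryLayout<Int?>.size

print(pointerSize == optionalPointerSize)
print(optionalIntSize > intSize)
```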

ksluder · March 20, 2023, 3:08pm

I disagree with this characterization. For example, on x86, Linux uses the zeroth page to communicate from the boot loader to the kernel. And setting up the zeroth page to trap requires being able to construct a zero pointer. Kernel programming in Swift is not esoteric or historical; in fact, it’s aspirational.

I think this is a mistake that will require correction.

lukasa · March 20, 2023, 3:14pm

This is taking the conversation way into the long grass and is deeply off topic for this thread, so I'll omit a lengthy response. Instead, I'll just say that a dialect of Swift suitable for kernel programming will contain approximately nothing from Swift's standard library, which includes the unsafe pointer types. Their design for kernel programming need not match the one used here.

ksluder · March 20, 2023, 5:30pm

It makes sense that kernel programming would only be able to use a subset of the standard library, but it would be extremely surprising for a fundamental type such as UnsafeRawBufferPointer to be excluded in that subset. Would UnsafeRawBufferPointer be missing from other non-userland environments such as DriverKit or embedded microcontrollers?

Kernel code written in C uses the exact same array and pointer syntax as userland code. Swift ought to strive for the same. Perhaps when linked into userland code, the standard library can elide checking UnsafeRawBufferPointer.baseAddress for nil, but the API shape should accommodate the environments where that check is meaningful.

lukasa · March 20, 2023, 5:39pm

The raw pointer types are a slightly different question, but the typed ones certainly risk being excluded. Generics are one of the things that cannot safely be used in a kernel context today due to their requirement to use runtime-allocated metadata.

The syntax can be the same, but the types can't. This brings us full circle to my original point, which is the question of a pointer allowing the nil representation as a valid value. The version of Swift running in a kernel context will be different in huge ways from the one that runs in user context: this difference is one of the smallest.

I think we're drifting way off track here because I'll note that URBP does allow a nil base address: that's the whole premise of the thread. It just doesn't allow it for buffers with nonzero size. The shape of URBP does not forbid having a NULL pointer with non-zero size: the shape of UnsafeRawPointer, however, does.

Max_Desiatov · March 20, 2023, 5:45pm

This isn't a requirement though, there are certain conditions under which generics are specialized and that's something one would rely on if needed.

ksluder · March 20, 2023, 5:49pm

Is it not possible for the type system to treat Optional<UnsafeRawPointer>.none and UnsafeRawPointer(bitPattern: 0) as distinct values, even though their bit representation is identical?

lukasa · March 20, 2023, 5:56pm

This is spiralling way, way, off-topic, but again I'll note that this is only true if we prevent debug mode from being used in kernel programming or substantively change how it works. I really don't want this thread to try to be "let's enumerate exactly what things we can and cannot use in kernel-mode Swift". The important take-away here is that unless you substantially change the way the language works, the answer is "you can't use the vast majority of it".

I don't see how it could. To what would the following code compile?

func returnsAnOptionalPointer() -> UnsafeRawPointer?

func functionUnderTest() -> Int {
    if returnsAnOptionalPointer() != nil {
        return 1
    } else {
        return 0
    }
}

ksluder · March 20, 2023, 6:03pm

It would compile to bl returnsAnOptionalPointer ; cmp x0, #0 ; cset x0, ne ; ret as expected. The type system guarantees that the only sense in which that comparison can be true is when the function has returned nil.

Implicit in this is that nil is not shorthand for “pointer to zero”, which I know contradicts my original train of thought.

lukasa · March 20, 2023, 6:05pm

But then what did the function return if it returned the pointer with the all-zero bit pattern? That is, what is the value of x0 for the "returned nil" case and for the "returned the all-zero bit pattern pointer" case?

ksluder · March 20, 2023, 6:13pm

I think the more illustrative example for that case is if let ptr = someFunc(), ptr == .zero { return 1 } else { return 0 }.

There’s no way around the fact that in-band signaling of nil causes ambiguity here, but the speed advantage is too valuable to expand optional pointers beyond a single machine word. But unlike C, Swift always forces you to check for nil before you can compare against zero, and once you have a value of pointer type you know it must not be referring to nil. The only case you have to deal with is disambiguating return values of optional pointer type.

Since there is currently no ABI representation for the zero pointer, it is conceivable that one could be invented specifically for function returns. Or the compiler could just warn if a pointer value that was created by unwrapping is compared to zero, forcing the developer to use some sort of special syntax to signal their intent.

jrose · March 20, 2023, 6:33pm

@ksluder is correct. Swift, like C and C++, assumes the existence of an invalid address; it does not inherently assume that the bit pattern for that address is 0. A lot of people’s code probably does make that second assumption, so things may not port cleanly to such an environment, but there is no requirement that UnsafeRawPointer(bitPattern: 0) == nil. (This is equivalent to reinterpret_cast<void*>(0) in C++, which is also not required to be nullptr.)
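For comparison, this is what the failable initializer does on the mainstream platforms in use today. It is a sketch of current behaviour and not evidence of what a port to a target with a different invalid address would have to do:

```swift
// On today's platforms the zero bit pattern is the invalid address,
// so the failable initializer returns nil for it.
let zeroPattern = UnsafeRawPointer(bitPattern: 0)
print(zeroPattern == nil)

// Any other bit pattern produces a non-nil pointer. The address 0x1000 is an
// arbitrary illustrative value; the pointer is never dereferenced here.
let nonZeroPattern = UnsafeRawPointer(bitPattern: 0x1000)
print(nonZeroPattern != nil)
```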