Making pointer nullability explicit (using Optional)

jrose · March 18, 2016, 8:22pm

In an offline conversation Doug pointed out that Clang and LLVM also assume that NULL is 0, so there'd be a lot more work than just improving Swift to make Swift work on such a platform. I think when the time comes the implementer for such a platform can designate a non-zero bit pattern to use as an invalid pointer instead.

Jordan

On Mar 18, 2016, at 13:11 , Félix Cloutier <felixcca@yahoo.ca> wrote:

GCC may have some hints:

-fdelete-null-pointer-checks
Assume that programs cannot safely dereference null pointers, and that no code or data element resides at address zero. This option enables simple constant folding optimizations at all optimization levels. In addition, other optimization passes in GCC use this flag to control global dataflow analyses that eliminate useless checks for null pointers; these assume that a memory access to address zero always results in a trap, so that if a pointer is checked after it has already been dereferenced, it cannot be null.
Note however that in some environments this assumption is not true. Use -fno-delete-null-pointer-checks to disable this optimization for programs that depend on that behavior.

This option is enabled by default on most targets. On Nios II ELF, it defaults to off. On AVR and CR16, this option is completely disabled.

Passes that use the dataflow information are enabled independently at different optimization levels.

I recall that my first time being bit by null being valid was on AVR32 (though with GCC 3.something), and this seems to confirm it.

jrose · March 18, 2016, 10:13pm

Well, yes, optional chaining is a branch, which is a minor performance cost that can't always be optimized away. The problem cases I've seen are either using a for-loop over the buffer (something that's come up on the list before) or passing the base address to a C function (which could probably be handled with ??).

The main thing is just that UnsafeBufferPointer doesn't act like a pointer; it acts like a buffer, and a zero-element buffer turns out to be a perfectly useful degenerate case (and something that will happen anyway for an array with capacity reserved but no elements). In the cases where you use it as a Collection, nothing cares whether the base pointer is null.
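A minimal sketch of the two problem cases mentioned above, assuming the optional `baseAddress` design. `sumBytes` is a hypothetical stand-in for a C function that takes a non-null pointer and a length, and the placeholder pointer is an assumption of this sketch, not part of the proposal.

```swift
// Hypothetical stand-in for a C function taking a non-null pointer and a length.
func sumBytes(_ start: UnsafePointer<UInt8>, _ count: Int) -> Int {
    var total = 0
    for i in 0..<count { total += Int(start[i]) }
    return total
}

// Used as a Collection, the buffer never exposes its base address,
// so a nil base with a count of 0 needs no special handling.
func sumAsCollection(_ buffer: UnsafeBufferPointer<UInt8>) -> Int {
    var total = 0
    for byte in buffer { total += Int(byte) }   // body runs zero times when empty
    return total
}

// Passing the base address on means dealing with nil; `??` supplies a fallback.
func sumViaPointer(_ buffer: UnsafeBufferPointer<UInt8>) -> Int {
    // Assumption: a dangling, non-null placeholder that is never dereferenced,
    // because a nil base address implies a count of 0.
    let placeholder = UnsafePointer<UInt8>(bitPattern: MemoryLayout<UInt8>.alignment)!
    return sumBytes(buffer.baseAddress ?? placeholder, buffer.count)
}

let empty = UnsafeBufferPointer<UInt8>(start: nil, count: 0)
```

Both functions treat `empty` like any other buffer; only the second one has to say what happens when there is no base address.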

Jordan

On Mar 18, 2016, at 14:54, Russ Bishop <xenadu@gmail.com> wrote:

On Mar 18, 2016, at 9:49 AM, Jordan Rose <jordan_rose@apple.com> wrote:

On Mar 17, 2016, at 21:08 , Russ Bishop <xenadu@gmail.com> wrote:

I’m very much +1 on this idea.

On Mar 17, 2016, at 6:59 PM, Jordan Rose via swift-evolution <swift-evolution@swift.org> wrote:

Open Issue: UnsafeBufferPointer <https://github.com/jrose-apple/swift-evolution/tree/optional-pointers#open-issue-unsafebufferpointer>

The type UnsafeBufferPointer represents a bounded typed memory region with no ownership or lifetime semantics; it is logically a bare typed pointer (its baseAddress) and a length (count). For a buffer with 0 elements, however, there's no need to provide the address of allocated memory, since it can't be read from. Previously this case would be represented as a nil base address and a count of 0.

With optional pointers, this now imposes a cost on clients that want to access the base address: they need to consider the nil case explicitly, where previously they wouldn't have had to. There are several possibilities here, each with their own possible implementations:

1. Like UnsafePointer, UnsafeBufferPointer should always have a valid base address, even when the count is 0. An UnsafeBufferPointer with a potentially-nil base address should be optional.

   i. UnsafeBufferPointer's initializer accepts an optional pointer and becomes failable, returning nil if the input pointer is nil.

   ii. UnsafeBufferPointer's initializer accepts an optional pointer and synthesizes a non-null aligned pointer value if given nil as a base address.

   iii. UnsafeBufferPointer's initializer only accepts non-optional pointers. Clients such as withUnsafeBufferPointer must synthesize a non-null aligned pointer value if they do not have a valid pointer to provide.

   iv. UnsafeBufferPointer's initializer only accepts non-optional pointers. Clients using withUnsafeBufferPointer must handle a nil buffer.

2. UnsafeBufferPointer should allow nil base addresses, i.e. the baseAddress property will be optional. Clients will need to handle this case explicitly.

   i. UnsafeBufferPointer's initializer accepts an optional pointer, but no other changes are made.

   ii. UnsafeBufferPointer's initializer accepts an optional pointer. Additionally, any buffers initialized with a count of 0 will be canonicalized to having a base address of nil.

I'm currently leaning towards option (2i). Clients that expect a pointer and length probably shouldn't require the pointer to be non-null, but if they do then perhaps there's a reason for it. It's also the least work.

Chris (Lattner) is leaning towards option (1ii), which treats UnsafeBufferPointer similar to UnsafePointer while not penalizing the common case of withUnsafeBufferPointer.
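To make the difference between the two families of options concrete, here is a rough sketch of what options (1i) and (2i) could look like. `NonNullBuffer` and `NullableBuffer` are hypothetical types invented for this illustration; they are not the standard library's UnsafeBufferPointer.

```swift
// Option (1i), sketched: failable initializer, non-optional base address.
struct NonNullBuffer<Element> {
    let baseAddress: UnsafePointer<Element>
    let count: Int

    init?(start: UnsafePointer<Element>?, count: Int) {
        guard let start = start else { return nil }   // nil input gives a nil buffer
        self.baseAddress = start
        self.count = count
    }
}

// Option (2i), sketched: optional base address, stored exactly as given.
struct NullableBuffer<Element> {
    let baseAddress: UnsafePointer<Element>?
    let count: Int

    init(start: UnsafePointer<Element>?, count: Int) {
        precondition(count >= 0, "count must not be negative")
        precondition(start != nil || count == 0, "a nil base address requires a count of 0")
        self.baseAddress = start
        self.count = count
    }
}
```

With (1i) the nil case moves into the type of the buffer itself (`NonNullBuffer<Element>?`); with (2i) it stays inside the buffer, and only code that reads `baseAddress` has to deal with it.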

What’s the use of an UnsafeBufferPointer with zero count? Semantically that is making a claim that it can’t back up (“I have allocated memory at location X”, which isn’t compatible with the idea of “zero count/size”).

Without knowing more context I’d strongly favor (1i). If an array is empty the natural expectation for withUnsafeBufferPointer is you get UnsafeBufferPointer<Element>? = nil, which follows the behavior of the rest of the language and things like guard let make it trivial to handle properly. If someone really has a problem with it they can add ifUnsafeBufferPointer() that returns a non-optional pointer and skips executing the closure if the Array is empty (which is the behavior of your standard for loop).

The important use case here is that "array.withUnsafeBufferPointer" should always do something (i.e. it usually can't just skip the closure), and it turns out it's easiest if the zero-element case is treated the same as everything else. When converting over the standard library I found that very few of them wanted to do something different in the zero-element case, and then it would be bad to force Array to allocate memory just to not use it. That is, there aren't actually any clients interested in knowing whether the base address is valid, and all of the ones that do have to think about it (because they use it directly) aren't getting any value out of it.

Jordan

Does optional chaining (ptr?.count ?? 0) or the guard check (guard let ptr = ptr else { return }) impose a performance (or cognitive) burden here? I’m OK with (2i), it just seems less Swift-ish than (1i).
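For reference, the two handling styles from the question look roughly like this when a client receives an optional buffer, as it would under option (1i). The helper names are made up for this sketch.

```swift
// Optional chaining with a default: one expression, one branch.
func elementCount(_ buffer: UnsafeBufferPointer<Int>?) -> Int {
    return buffer?.count ?? 0
}

// Early exit with guard: the rest of the body works with a non-optional buffer.
func firstElement(_ buffer: UnsafeBufferPointer<Int>?) -> Int? {
    guard let buffer = buffer else { return nil }
    return buffer.first
}
```

Either way the nil check is a branch that the client has to write, which is the cost being asked about.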

I don’t use UnsafeBufferPointer a lot so I’ll happily live with whatever the choice is.

jrose · March 21, 2016, 6:18pm

I think it's important to preserve alignment; LLVM will sometimes optimize based on this.

Jordan

On Mar 19, 2016, at 10:36, Chris Lattner <clattner@apple.com> wrote:

On Mar 18, 2016, at 9:49 AM, Jordan Rose via swift-evolution <swift-evolution@swift.org> wrote:

On Mar 17, 2016, at 21:08 , Russ Bishop <xenadu@gmail.com> wrote:

I’m very much +1 on this idea.

On Mar 17, 2016, at 6:59 PM, Jordan Rose via swift-evolution <swift-evolution@swift.org> wrote:

Open Issue: UnsafeBufferPointer <https://github.com/jrose-apple/swift-evolution/tree/optional-pointers#open-issue-unsafebufferpointer>

The type UnsafeBufferPointer represents a bounded typed memory region with no ownership or lifetime semantics; it is logically a bare typed pointer (its baseAddress) and a length (count). For a buffer with 0 elements, however, there's no need to provide the address of allocated memory, since it can't be read from. Previously this case would be represented as a nil base address and a count of 0.

With optional pointers, this now imposes a cost on clients that want to access the base address: they need to consider the nil case explicitly, where previously they wouldn't have had to. There are several possibilities here, each with their own possible implementations:

1. Like UnsafePointer, UnsafeBufferPointer should always have a valid base address, even when the count is 0. An UnsafeBufferPointer with a potentially-nil base address should be optional.

   i. UnsafeBufferPointer's initializer accepts an optional pointer and becomes failable, returning nil if the input pointer is nil.

   ii. UnsafeBufferPointer's initializer accepts an optional pointer and synthesizes a non-null aligned pointer value if given nil as a base address.

   iii. UnsafeBufferPointer's initializer only accepts non-optional pointers. Clients such as withUnsafeBufferPointer must synthesize a non-null aligned pointer value if they do not have a valid pointer to provide.

   iv. UnsafeBufferPointer's initializer only accepts non-optional pointers. Clients using withUnsafeBufferPointer must handle a nil buffer.

2. UnsafeBufferPointer should allow nil base addresses, i.e. the baseAddress property will be optional. Clients will need to handle this case explicitly.

   i. UnsafeBufferPointer's initializer accepts an optional pointer, but no other changes are made.

   ii. UnsafeBufferPointer's initializer accepts an optional pointer. Additionally, any buffers initialized with a count of 0 will be canonicalized to having a base address of nil.

I'm currently leaning towards option (2i). Clients that expect a pointer and length probably shouldn't require the pointer to be non-null, but if they do then perhaps there's a reason for it. It's also the least work.

Chris (Lattner) is leaning towards option (1ii), which treats UnsafeBufferPointer similar to UnsafePointer while not penalizing the common case of withUnsafeBufferPointer.

What’s the use of an UnsafeBufferPointer with zero count? Semantically that is making a claim that it can’t back up (“I have allocated memory at location X”, which isn’t compatible with the idea of “zero count/size”).

Without knowing more context I’d strongly favor (1i). If an array is empty the natural expectation for withUnsafeBufferPointer is you get UnsafeBufferPointer<Element>? = nil, which follows the behavior of the rest of the language and things like guard let make it trivial to handle properly. If someone really has a problem with it they can add ifUnsafeBufferPointer() that returns a non-optional pointer and skips executing the closure if the Array is empty (which is the behavior of your standard for loop).

The important use case here is that "array.withUnsafeBufferPointer" should always do something (i.e. it usually can't just skip the closure), and it turns out it's easiest if the zero-element case is treated the same as everything else. When converting over the standard library I found that very few of them wanted to do something different in the zero-element case, and then it would be bad to force Array to allocate memory just to not use it. That is, there aren't actually any clients interested in knowing whether the base address is valid, and all of the ones that do have to think about it (because they use it directly) aren't getting any value out of it.

Why would you have to allocate memory for this case? The pointer only needs to be non-null, not valid and dereferenceable. You could use the address of a global or even 0x1.
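One way to synthesize such a pointer without allocating, sketched under the assumption that the value should also be properly aligned for the element type. `placeholderPointer(for:)` is a hypothetical helper, not a standard library function.

```swift
// Builds a non-null pointer that must never be dereferenced.
// Using the type's alignment as the bit pattern keeps the value aligned:
// alignment is at least 1, and any number is a multiple of itself.
func placeholderPointer<T>(for type: T.Type) -> UnsafePointer<T> {
    return UnsafePointer<T>(bitPattern: MemoryLayout<T>.alignment)!
}

// An empty buffer over the placeholder: non-nil base address, nothing to read.
func emptyBuffer<T>(of type: T.Type) -> UnsafeBufferPointer<T> {
    return UnsafeBufferPointer(start: placeholderPointer(for: type), count: 0)
}
```

A fixed value such as 0x1 would be non-null but misaligned for most element types, which is why this sketch derives the bit pattern from `MemoryLayout<T>.alignment` instead.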

jrose · March 24, 2016, 6:09pm

I updated the proposal before it got accepted into the queue; the consensus was for the "round-trips cleanly" case. A (valid, 0) pair could still represent a range to replace in a C API, so canonicalizing to nil might be a bad idea.
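A small check of what "round-trips cleanly" means in practice, written against the UnsafeBufferPointer API as it exists in current Swift (optional `baseAddress`, initializer taking an optional `start`). The function names are made up for this sketch.

```swift
// True when the (pointer, count) pair that went in is the pair that comes out.
func roundTrips(_ start: UnsafePointer<Int32>?, count: Int) -> Bool {
    let buffer = UnsafeBufferPointer(start: start, count: count)
    return buffer.baseAddress == start && buffer.count == count
}

func roundTripDemo() -> (nilCase: Bool, validEmptyCase: Bool) {
    let storage = UnsafeMutablePointer<Int32>.allocate(capacity: 4)
    defer { storage.deallocate() }

    // A (valid, 0) pair, e.g. an insertion point in the middle of some storage.
    // The memory is never read, so it does not need to be initialized here.
    let insertionPoint = UnsafePointer(storage + 2)

    return (roundTrips(nil, count: 0), roundTrips(insertionPoint, count: 0))
}
```

If empty buffers were canonicalized to a nil base address, the second case would lose the position that the (valid, 0) pair was carrying.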

You can see the current version here as SE-0055: https://github.com/apple/swift-evolution/blob/master/proposals/0055-optional-unsafe-pointers.md

Jordan

On Mar 24, 2016, at 11:02, David Waite <david@alkaline-solutions.com> wrote:

From "[swift-evolution] Notes from Swift core team 2016-03-23 design discussion":

Make pointer nullability explicit using Optional
"Make pointer nullability explicit using Optional" by jrose-apple · Pull Request #219 · apple/swift-evolution · GitHub
Biggest open issue is what to do with UnsafeBufferPointer which has a base address and a count of the number of elements at that address. The most common use is to do fast things with an array. The problem is when you have an empty array.

We have a statically initialized empty array, so this doesn’t apply to array. But slices and Cocoa arrays can do it.

Half of the use cases are subscripting off of the buffer, so they don’t actually use the base address. They can’t actually subscript an empty array, but it’s not a syntax error — the loop is run zero times, so it doesn’t matter. The other half pass the pointers down to a C API that takes an address and count.

Someone might expect that the base address doesn’t change when something is initialized.

We can’t easily use the zero pointer because SIL already uses it for nil. But there are issues with using the same representation as C to avoid bridging costs.

We’re mapping two things in C onto one thing in Swift. In C, the buffer pointer would be __nullable long * and the length is ulong.

Given everything else in the system, it’s more like pointer. We didn’t call it a buffer because that tends to imply ownership.

Sketching out the state space:

Pointer   Length   Static type
null      0        UBP?
valid     >= 0     UBP
valid     < 0      X
null      != 0     ???

This issue would go away if we got rid of the base address on UnsafeBufferPointer, but that would get rid of a number of valid C operations like calling memcpy.
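The state space in the table can be written down as a small classifier. This is only an illustration of the four rows; `BufferState` and `classify` are names invented for this sketch.

```swift
// One case per row of the table.
enum BufferState {
    case nilBuffer       // (null, 0): would map to a nil UBP?
    case buffer          // (valid, >= 0): an ordinary UBP
    case invalidLength   // (valid, < 0): not representable
    case unclear         // (null, != 0): the "???" row
}

func classify(pointer: UnsafeRawPointer?, length: Int) -> BufferState {
    if pointer == nil {
        return length == 0 ? .nilBuffer : .unclear
    }
    return length < 0 ? .invalidLength : .buffer
}
```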

It seems like withUnsafeBufferPointer should never produce nil. With that in mind, why should UnsafeBufferPointer need to?

We do need a properly-aligned “valid” invalid pointer. LLVM makes assumptions about things being aligned.

Dominant feedback on the list has been that people want something that round-trips cleanly. Making the base address non-optional adds overhead and removes the ability to round-trip.

It’s unfortunate that we don’t have a way to represent in the type system a buffer pointer that isn’t nullable, from within withUnsafeBufferPointer which wouldn’t even call its closure if the buffer has a null base address.

In my mind UBP is primarily meant to be a collection. In that case, I imagine (nil, 0) as an input wouldn’t necessarily represent a nil UBP? - it could represent an empty UBP.

My question is whether a valid pointer, length 0 is a valid UBP or not - I have trouble imagining an API which wants a UBP which would differentiate this value over the (nil, 0) one and not have it either be an abuse of UBP (using it to transport just a pointer and not representing a buffer) or an error. I suspect it actually would be ok to always represent a length 0 UBP as having a nil base address.

Brent_Royal-Gordon · March 19, 2016, 1:23am

The main thing is just that UnsafeBufferPointer doesn't act like a pointer; it acts like a buffer, and a zero-element buffer turns out to be a perfectly useful degenerate case (and something that will happen anyway for an array with capacity reserved but no elements). In the cases where you use it as a Collection, nothing cares whether the base pointer is null.

If UnsafeBufferPointer is primarily a buffer, not a pointer, then I think it's okay for the base address to be optional.

(And maybe it should just be called `UnsafeBuffer`.)

I really hate the "just make up an address" solution because the address is only "valid" in the narrow, technical sense that it's properly aligned and perhaps in an allocated page. In an empty buffer, it is never valid to actually *access* the memory at that address. There is no data there, so you have no business trying to do anything with the memory at that address.

If the buffer base address is nil, the situation is accurately represented: no memory has been allocated to store this buffer's contents. If the buffer base address is some random pointer, the buffer is telling a dangerous lie.

So I think the only way to make UnsafeBufferPointer accurately represent the situation is with either 2i (optional okay, base address of empty buffers can be arbitrary) or 2ii (optional okay, base address of empty buffers must be nil). There's a certain appeal to 2ii, but I think 2i is probably more useful. I could imagine somebody constructing an UnsafeBufferPointer that represents a slice of the actual buffer; in that case, there really *is* a buffer at that memory address, it's just that your slice of it doesn't have any elements. And the ability to construct a larger buffer from a zero-sized one might be useful.
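A sketch of the slice scenario described above, using the current UnsafeBufferPointer API: a zero-length view into real storage keeps its base address, so a larger view can later be rebuilt from it. `widenDemo` is a name made up for this example.

```swift
func widenDemo() -> [Int] {
    let values = [10, 20, 30, 40]
    return values.withUnsafeBufferPointer { whole -> [Int] in
        // Zero-length view positioned at index 1: a real address, no elements.
        // `values` is non-empty, so force-unwrapping the base address is safe here.
        let emptyView = UnsafeBufferPointer(start: whole.baseAddress! + 1, count: 0)

        // Because the base address was kept (the 2i behavior), the view can be
        // widened again, as long as it stays inside the original storage.
        let widened = UnsafeBufferPointer(start: emptyView.baseAddress, count: 2)
        return Array(widened)
    }
}
```

Under 2ii the empty view would have been canonicalized to a nil base address, and there would be nothing left to widen from.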

Actually, I could make an argument for a design like this:

  public struct UnsafeBufferPointer<Element> {
    public init(baseAddress: UnsafePointer<Element>?, count: Int)

    private var _baseAddress: UnsafePointer<Element>?

    /// - Precondition: the base address provided at construction was not `nil`.
    public var baseAddress: UnsafePointer<Element> {
      return _baseAddress!
    }

    // Use these methods to safely modify a buffer pointer whether or not it has a base address.
    func rebased(at newBasePointer: UnsafePointer<Element>?) -> UnsafeBufferPointer
    func resized(to count: Int) -> UnsafeBufferPointer
  }

Or even a design that used an auto-unwrapped Optional. But with the same behavior just an `!` away, I don't think that's a good idea.
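For anyone who wants to experiment with that shape, here is a filled-in version. It uses a hypothetical name (`SketchBufferPointer`) to avoid clashing with the real type, and the method bodies, the stored `count`, and the `hasBaseAddress` property are assumptions added for this sketch.

```swift
struct SketchBufferPointer<Element> {
    private var _baseAddress: UnsafePointer<Element>?
    let count: Int

    init(baseAddress: UnsafePointer<Element>?, count: Int) {
        precondition(count >= 0, "count must not be negative")
        _baseAddress = baseAddress
        self.count = count
    }

    /// - Precondition: the base address provided at construction was not `nil`.
    var baseAddress: UnsafePointer<Element> {
        return _baseAddress!
    }

    // Assumption: a way to test for a base address without trapping.
    var hasBaseAddress: Bool {
        return _baseAddress != nil
    }

    // These work whether or not the buffer has a base address.
    func rebased(at newBasePointer: UnsafePointer<Element>?) -> SketchBufferPointer {
        return SketchBufferPointer(baseAddress: newBasePointer, count: count)
    }

    func resized(to count: Int) -> SketchBufferPointer {
        return SketchBufferPointer(baseAddress: _baseAddress, count: count)
    }
}
```

The trade-off is visible in `baseAddress`: clients get a non-optional pointer, but reading it from a buffer built with nil traps at run time instead of being caught by the type checker.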

--
Brent Royal-Gordon
Architechies