Uninitialized memory + trivial types

Consider the code:

let ptr = UnsafeMutablePointer<UInt8>.allocate(capacity: 1)
let val = ptr.pointee

Is the behavior of this code undefined? Or is it defined, and val is "some bit pattern interpreted as UInt8"? A strict interpretation of Swift's documentation would conclude that it's undefined.
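
For comparison, here is a minimal sketch of the same read performed on memory that has been initialized first, which should be defined under any reading of the documentation. The helper name `readInitializedByte` and the value 42 are illustrative assumptions, not part of the original snippet.

```swift
// Hypothetical helper: allocate one byte, initialize it, then read it back.
func readInitializedByte() -> UInt8 {
    let ptr = UnsafeMutablePointer<UInt8>.allocate(capacity: 1)
    defer {
        // Return the memory to the uninitialized state before freeing it.
        ptr.deinitialize(count: 1)
        ptr.deallocate()
    }
    ptr.initialize(to: 42)   // the memory now holds a valid UInt8
    return ptr.pointee       // reading initialized memory
}

print(readInitializedByte())
```

The only difference from the snippet in question is the `initialize(to:)` call before the read.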

Also, the documentation for UnsafeRawPointer.load<T> states that the referenced memory must be initialized. Is this true when T is trivial?

Perhaps. One thing that I've found interesting when reading the documentation is that it says:

When reading from the pointee property, the instance referenced by this pointer must already be initialized. When pointee is used as the left side of an assignment, the instance is updated. The instance must be initialized or this pointer’s Pointee type must be a trivial type.

While the first sentence says the location must be initialised, the final sentence contains an "or", which makes it sound as though you may read trivial types from uninitialised memory.

As you note, the docs for load are stricter/clearer:

The memory at this pointer plus offset must be properly aligned for accessing T and initialized to T or another type that is layout compatible with T.
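
A small sketch of a load that satisfies both stated conditions (alignment and prior initialization), assuming a trivial `UInt32` payload. The helper name `storeThenLoad` is hypothetical.

```swift
// Hypothetical helper: store a UInt32 into raw memory, then load it back.
func storeThenLoad(_ value: UInt32) -> UInt32 {
    let raw = UnsafeMutableRawPointer.allocate(
        byteCount: MemoryLayout<UInt32>.size,
        alignment: MemoryLayout<UInt32>.alignment)
    defer { raw.deallocate() }

    // storeBytes may target uninitialized memory when the type is trivial;
    // afterwards the memory is initialized to UInt32.
    raw.storeBytes(of: value, as: UInt32.self)

    // The memory is aligned for UInt32 and initialized to UInt32.
    return raw.load(as: UInt32.self)
}

print(storeThenLoad(0xDEADBEEF))
```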


Whether or not the load itself is UB, there are some particular situations where invalid loads will create values which will cause UB -- and importantly, there is no possible check you could write to validate the value after it has been loaded.

For instance, simple enums are trivial types, but if one holds an invalid bit pattern (likely, if the value came from uninitialised memory), switch statements that the compiler believes to be exhaustive will not be. I don't believe there is any defined behaviour in that circumstance, and there can be lots of secondary UB as a result:

enum SimpleEnum { case a, b }

let simpleEnumPtr = UnsafeMutablePointer<SimpleEnum>.allocate(capacity: 1)
let anotherValue: String

switch simpleEnumPtr.pointee {
case .a: anotherValue = "a"
case .b: anotherValue = "b"
}

print(anotherValue)
// What if the enum was neither .a nor .b?
// There is no check you can write to validate it, either.
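
One possible way to sidestep the problem, sketched under the assumption that the enum can be given a `UInt8` raw value (which `SimpleEnum` above does not have): keep the untrusted byte as a `UInt8`, where every bit pattern is a valid value, and only convert through the failable `init?(rawValue:)`. The names `CheckedEnum` and `decode` are hypothetical.

```swift
// Assumption: a raw-value variant of the enum, so conversion can fail safely.
enum CheckedEnum: UInt8 { case a = 0, b = 1 }

// Hypothetical helper: validate a byte before it ever becomes an enum value.
func decode(_ byte: UInt8) -> String {
    guard let value = CheckedEnum(rawValue: byte) else {
        return "invalid"   // rejected while still a plain UInt8
    }
    switch value {
    case .a: return "a"
    case .b: return "b"
    }
}

print(decode(0), decode(1), decode(7))
```

This validates the bits before an enum value exists, rather than after. The byte itself must still come from initialized memory.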

In the extreme case, this will cause UB basically everywhere:

// Never is a trivial type, right?
let oh_no = UnsafeMutablePointer<Never>.allocate(capacity: 1).pointee

IMO, the documentation for .pointee should lose the condition about Pointee being a trivial type.

Reading uninitialized¹ memory (via pointee or load or any other mechanism) is undefined behavior.

This should be clarified, but the correct interpretation is as follows:

  1. When reading, the pointed-to instance must already be initialized.
  2. When assigning, the pointed-to instance must either already be initialized or be trivial.
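
A minimal sketch of the two rules for a trivial type, following the documentation wording quoted earlier in the thread. The helper name `assignThenRead` is hypothetical.

```swift
// Hypothetical helper: Int is trivial, so plain assignment may target fresh memory.
func assignThenRead() -> Int {
    let p = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    defer { p.deallocate() }

    // Rule 2: assigning to uninitialized memory is permitted for a trivial type.
    p.pointee = 7

    // Rule 1: this read is fine only because the assignment above initialized the memory.
    return p.pointee
}

print(assignThenRead())
```

Swapping the two statements would turn the read into an access of uninitialized memory.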

¹ What exactly qualifies as "initialization" here is quite subtle, because we have to allow for memory initialized by non-Swift code that doesn't have the same notion of "initialization" as Swift. "Must either be initialized in the Swift sense or contain a valid value of compatible type" is probably close enough if we wave our hands a little bit.

The way I remember this rule (and, relatedly, the practical difference between assign and initialize on unsafe pointer types) is to ask myself: “if the type is an object type, what will happen when it tries to release the value that’s currently there?”
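
A sketch of that heuristic using a class whose deinit records when it is released. `ReleaseLog`, `Tracker`, and `releaseOrder` are hypothetical names introduced only for illustration.

```swift
// Hypothetical types: a log that records which objects were released.
final class ReleaseLog { var ids: [Int] = [] }

final class Tracker {
    let id: Int
    let log: ReleaseLog
    init(id: Int, log: ReleaseLog) { self.id = id; self.log = log }
    deinit { log.ids.append(id) }
}

func releaseOrder() -> [Int] {
    let log = ReleaseLog()
    let p = UnsafeMutablePointer<Tracker>.allocate(capacity: 1)

    // initialize(to:) writes without looking at the old contents: nothing is released.
    p.initialize(to: Tracker(id: 1, log: log))

    // Assignment replaces the stored reference and releases the old one.
    // On uninitialized memory, that release would act on garbage bits.
    p.pointee = Tracker(id: 2, log: log)

    p.deinitialize(count: 1)   // releases the second object
    p.deallocate()
    return log.ids
}

print(releaseOrder())
```

The log should come back as `[1, 2]`: the first object is released by the assignment, the second by `deinitialize(count:)`. With a trivial type there is no release step, which is why assignment to uninitialized memory is tolerated there.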

You are right to cast light on the enigmatic bit patterns of enums. This is also the case for booleans. It's very dangerous to depend on the bit patterns of types whose values do not cover every possible bit pattern (that is, the mapping from values to bit patterns is not surjective).

Regarding your footnote, if we widen the notion of initialization to "...or contain a valid value of compatible type", then aren't we compelled to conclude that the behavior of my example code is defined? The value of val is "random", but otherwise the program executes like a healthy program.

No, because someone might someday build Swift for an architecture that e.g. used memory coloring to track memory regions that have been initialized, and then we might make that load trap when the coloring check failed.

Accessing uninitialized memory is UB. Accessing initialized memory after the lifetime of the allocation used to initialize it has ended (e.g. a use-after-free bug in C or C++) is UB. All sorts of similar mechanisms to get a wild read are UB. I would expect some mainstream hardware implementations to detect some or all of these and trap accordingly sometime in the next decade.

Okay, that makes sense. Trapping on a failed coloring check would be a good outcome, however. You might say that it would be "defined to trap" on that platform.

It seems the rule is "when working with pointers to trivial types, Swift's abstract machine isn't very abstract", which is the only tenable position in the absence of a sublimely clever compiler.

Thanks for your guidance.