Possible unsoundness with RawSpan and MutableRawSpan

The RawSpan and MutableRawSpan APIs make it possible to arbitrarily change the bit pattern of a BitwiseCopyable value without using unsafe code. For example, the following function returns a Bool value with an invalid bit pattern:

func createInvalidBool() -> Bool {
    var dst: [1 of Bool] = [false]
    var dstSpan = dst.mutableSpan
    var dstBytes = dstSpan.mutableBytes
    dstBytes.storeBytes(of: 2, toByteOffset: 0, as: UInt8.self)
    return dst[0]
}

I've also noticed that the RawSpan and MutableRawSpan APIs make it possible to read uninitialized memory by reading the padding bytes of BitwiseCopyable values, without using unsafe code. For example, the following function returns the contents of uninitialized memory on the stack:

func readUninitializedBytesFromStack() -> [UInt8] {
    let src: [100 of Void] = InlineArray(repeating: ())
    var dst = Array(repeating: 0 as UInt8, count: 100)
    var dstSpan = dst.mutableSpan
    var dstBytes = dstSpan.mutableBytes
    dstBytes.storeBytes(of: src, toByteOffset: 0, as: [100 of Void].self)
    return dst
}

I'm not sure if this is undefined behavior, or if this has "freeze" semantics (where the returned values are indeterminately chosen but self-consistent, and there is no undefined behavior). Even if it has "freeze" semantics, it still arguably breaks memory safety, because exposing uninitialized memory like this can lead to the same kinds of security vulnerabilities associated with memory-unsafe code in languages like C.

7 Likes

Thanks for bringing this up. This is the motivation for unsafeLoad being @unsafe, since looking at MutableRawSpan in isolation, it's arguably safe to store anything, but not necessarily OK to load those bytes back as any type that isn't fully inhabited. But because you can get a MutableRawSpan temporarily from a MutableSpan, and then return to the original typed MutableSpan, you can effectively end up in the same situation. So perhaps mutableBytes should also be made @unsafe.

6 Likes

There's also the somewhat orthogonal problem of reading uninitialized memory from padding bytes. I'm not sure whether or not RawSpan and MutableRawSpan are supposed to be able to point to uninitialized memory. If so, then unsafeLoad should be @unsafe even for fully-inhabited types. If not, then the immutable bytes property should be @unsafe for types like Void that have padding bytes when stored in an array, and store should be an @unsafe operation as well (unless perhaps it guarantees that padding bytes are zeroed).

1 Like

Thanks for pointing out the mistake with MutableSpan.mutableBytes. It was marked with @unsafe in the proposal, but I forgot to add the annotation in the implementation. This is fixed in [stdlib] mark `Span.mutableBytes` as unsafe by glessard · Pull Request #86420 · swiftlang/swift · GitHub.

3 Likes

We discussed the subject of padding bytes recently. The de facto behavior today of loading padding bytes would be UB, but giving the raw byte load operation "freeze" semantics seems like it would strike a reasonable balance between flexibility, usability, and safety, without imposing undue burden on optimizations. Loading a value of a fully-inhabited type out of uninitialized memory wouldn't immediately violate safety so long as the value loaded is self-consistent with other loads without intervening stores (presupposing that we don't ever consider types with pointers in them to be fully inhabited). In Swift, type layout also sometimes has partially-undefined bytes, where for instance the non-tag bits of an enum might be left unspecified when the enum's value matches a no-payload or smaller-payload case. Freeze semantics would ensure that the raw span APIs can safely load partially-initialized bytes and mask out or test the relevant bits.

Note that we probably still do not want to make any strong guarantees about the padding behavior of intervening stores of padded types, preserving freedom for the compiler to overwrite padding bytes as part of a typed store in situations like this:

// for the purposes of this example, assume that `HasPadding` has a layout of:
// byte  | 0 | 1 | 2 | 3 | 4 | 5 |
// field |   x   | y |pad|   z   |
struct HasPadding {
  var x: UInt16
  var y: UInt8
  var z: UInt16
}

var x: [1 of HasPadding] = [.init(x: 1, y: 2, z: 3)]
var typedSpan = x.mutableSpan
do {
  var rawSpan = typedSpan.mutableBytes
  rawSpan[3] = 0x5a
  print(rawSpan[3]) // guaranteed to read back 0x5a
}
// ... but it's still unspecified whether this clobbers that byte
x[0] = .init(x: 4, y: 5, z: 6)
do {
  var rawSpanB = typedSpan.mutableBytes
  let value = rawSpanB[3]
  print(value) // unspecified
  let valueB = rawSpanB[3]
  print(value == valueB) // but this is guaranteed to be true, at least
}
4 Likes

It doesn't immediately violate safety, but it's still dangerous, especially when considering things like side-channel attacks.


The idea of introducing a "freeze" operation has caused quite a bit of debate in the Rust community. I know Swift has historically taken a more pragmatic approach to safety, but the relevant issues may be relevant to Swift as well:

To summarize (my interpretation of) the discussions, especially the points made by Ralf Jung:

The Rust community is heavily invested in verifying that unsafe code is "sound", meaning that it does not cause the typical issues associated with memory-unsafe code, including security vulnerabilities. Right now, unsoundness is equivalent to possible-UB, and UB is easy to formally reason about. For example, Miri, a Rust interpeter which detects UB at runtime, is used to test unsafe code. As another example, RustBelt is a proof framework used to formally verify the soundness of unsafe code.

Introducing a "freeze" operation breaks the property that a Rust program never depends on uninitialized memory unless it has UB, which breaks the equivalence of unsoundness and possible-UB. Formally reasoning about the soundness of "freeze" operations would likely require some notion of non-interference, which in the general case requires reasoning about things like side-channel attacks. That would be much more difficult to check formally using something like RustBelt, and maybe impossible to check using a tool like Miri.

My own thoughts: Maybe there's a conservative approximation of non-interference, such as the property that a program never branches on the result of a "freeze" operation. But formulating those kinds of rules, and making sure they hold across compiler optimizations, could be its own can of worms. On the other hand, maybe Swift could give up on these mostly-theoretical benefits and end up like C, where the existence of memcmp already breaks the equivalence of unsoundness and possible-UB.

4 Likes

One of the Rust community's discussions over on GitHub also brings up another issue: "freezing" a range of memory in place requires actual writes to memory in some situations, such as when the kernel optimizes memory pages under MADV_FREE. For example, maybe a large [Void] array which spans multiple pages could have some of those pages hold nondeterministic values at runtime. In practice, I think that would mean either getting a RawSpan is a potentially-expensive mutating operation, or the "freeze" occurs on each individual load and different loads are not guaranteed to be consistent.

Edited to add: the RFC discussed in the link above takes another direction that I didn't mention: make exposing the contents of uninitialized memory considered sound, so that a function which returns a "frozen" value doesn't even need to be marked unsafe. This has the benefit of allowing "freeze" while preserving the equivalence of unsoundness and possible-UB. In my opinion, it's an extremely bad idea, because it stretches the definition of "memory safe" if security vulnerabilities like MongoBleed become possible in "safe" code.

2 Likes

A useful ability we might want in the future is to safely cast a RawSpan to a Span<T> if T is a fully-inhabited type (with the correct alignment). That would be impossible if each individual load from a RawSpan performed a "freeze" operation.

(Apologies for the three replies in a row; I thought it would be more courteous to create a new reply for a substantially different point than to edit it into an existing reply and have it potentially be missed by those interested in this thread.)

1 Like

Span<T> is the programmer interface for contiguous elements within a RawSpan. Supporting that is non-negotiable. I think the memory range needs to be frozen (before the load) to avoid UB.

1 Like

for MutableRawSpan to be any safer than raw pointers, i think it needs to have a safety guarantee that all of its bytes are always initialized across API boundaries like function calls, just like other safe types.

if users of a MutableRawSpan are allowed to store uninitialized bytes, then no function that takes one as an argument can trust that it's safe to use. the same goes for yielding/mutating accessors. [UInt8], Span<UInt8>, or RawSpan can't provide a safe mutableBytes property if users can write uninitialized bytes into their storage (unless they immediately overwrite their entire storage after... which would be hilarious but completely useless).

IIUC, unsafe code could be allowed to write undefined bytes, so long as it guarantees that those bytes are defined at some point before they are ever read. this the same safety guarantee as any other safe type: if you get one through a safe API boundary, it's guaranteed to be completely valid, but if you muck around with unsafe code, it's your responsibility to ensure it's valid by the time you're done with it.

it is impossible for MutableRawSpan.storeBytes(of:as:) to be a safe API if it takes a BitwiseCopyable, as BitwiseCopyable types are allowed to have uninitialized padding bytes[1]. we could provide a safe version if we required the input to conform to a hypothetical FullyInhabited protocol (or BitwiseStorable :eyes:).


  1. well or storeBytes could be defined to write any padding bytes as 0s, though that would add performance overhead ↩︎

2 Likes

I agree that it's important to be able to convert a RawSpan to a Span. I mentioned the option of performing a freeze on each individual load because it avoids the difficulties I mentioned earlier of freezing the entire memory range.

To further elaborate on those difficulties: I don't think LLVM has an instruction to freeze a range of memory. It's possible to do it manually with something like rawSpan[i] = freeze(rawSpan[i]), but that has two problems:

  1. It mutates the memory, so it risks LLVM-level data races unless there's a guarantee of exclusivity. To avoid UB, getting a RawSpan would have to be a mutating operation.
  2. It would have to compile down to actually writing to memory, because of the MADV_FREE-related problems, making it an O(n) operation. The Rust RFC I linked above has given up on providing an in-place freeze operation for that reason.

There are multiple dimensions of memory safety. Even if RawSpan doesn't guarantee initialization safety, I think it would still be a meaningful improvement over unsafe pointers by guaranteeing bounds safety and lifetime safety.

I also think it's probably the best option. We want RawSpan to be a safer way to manipulate the raw bytes of values, and restricting it to types without padding would severely limit that use case. With that use case, dealing with uninitialized memory is generally unavoidable.

What I do think is bad for safety, though, is the status quo, where RawSpan has ambiguous safety guarantees.

I agree that a function taking a RawSpan most likely relies on additional safety guarantees that aren't ensured by the type itself. For that reason, I think RawSpan should be marked as @unsafe. (To my understanding, marking a type as @unsafe is basically a heuristic to interoperate with code without unsafety annotations; strictly speaking, it is operations on types that are unsafe, not types themselves.)


The other thread suggests that RawSpan could be used as a safe abstraction over fully-initialized memory, such as arrays of integers. That's useful, but it conflicts with the other major use case. In my opinion, the problem is that it's inadequate to address multiple dimensions of unsafety with only one safe type.

Maybe it would be a good idea to let RawSpan just ensure lifetime safety and bounds safety, and introduce a new type that also ensures initialization safety. Maybe the type could be called ByteSpan.

There's also another dimension of safety: type safety, but like @Joe_Groff mentioned earlier, that would be addressed by marking MutableSpan.mutableBytes as @unsafe, which has already been done.

Freezing the result of each load is attractive, because it has (essentially) no cost, but it doesn’t actually solve the problem of unsoundness, because two loads of the same byte may observe different frozen values. Thus it would be possible for, e.g. a hypothetical “NullTerminatedBytes” type’s failable initializer to act as though a (padding) byte were 0, even though it “really” was non-zero (meaning the actual value in that byte of memory was non-zero), resulting in an out-of-bounds access when passed to a subsequent safe operation that assumes “NullTerminatedBytes” was checked.

Storing out the frozen values would solve that problem (but introduces other, probably worse problems, as you note).

1 Like