[Prospective vision] Optional Strict Memory Safety for Swift

For example, if we assume the invariant lowerBound <= upperBound, it is valid to transform x >= lowerBound && x < upperBound into unsignedLess(x &- lowerBound, upperBound &- lowerBound); on some architectures, this is a significant performance improvement, so this is an optimization that is likely to be performed (either explicitly in the standard library or implicitly by the compiler backend).

However, this transform is not valid if lowerBound > upperBound. Consider lowerBound = 3, upperBound = 0, x = 4; this will fail the original check, but pass the transformed check. You can "fix" this by asserting that upperBound &- lowerBound does not overflow as an unsigned operation, but that's exactly the check that was being eliminated by using the uncheckedBounds spelling to begin with.
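
To make that concrete, here is a minimal sketch of the failure, where unsignedLess is a hypothetical helper standing in for the single unsigned comparison the backend would emit:

func unsignedLess(_ a: Int, _ b: Int) -> Bool {
  UInt(bitPattern: a) < UInt(bitPattern: b)
}

let lowerBound = 3, upperBound = 0, x = 4  // invariant violated: lowerBound > upperBound

// The original check correctly rejects x:
let original = x >= lowerBound && x < upperBound  // false

// The transformed check wrongly accepts it, because upperBound &- lowerBound
// wraps around to a huge unsigned value:
let transformed = unsignedLess(x &- lowerBound, upperBound &- lowerBound)  // true

print(original, transformed)  // false true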

6 Likes

By the way, we used to have these unchecked math operations available in all optimization modes: they were explicitly documented as having UB in case of overflow, and of course, they were named as follows:

  • unsafeAdding(_:)
  • unsafeSubtracting(_:)
  • unsafeMultiplied(by:)
  • unsafeDivided(by:)

1 Like

Yeah, I think this is the key observation. If UB can occur, then there needs to be at least one spot we're able to point to and identify as the 'cause' of the unsafety. The range-based subscripts which rely on a documented guarantee of how Range behaves seem like an odd spot to place the blame.

2 Likes

Because this initialiser exists, safety-critical code already - today - should not be making the assumption that lowerBound is always <= upperBound. A safe API must account for invalid values.

That said, subscript(Range<Int>) is never required to be unsafe, even if the range is invalid, because:

  1. In general, it does not need to access memory ("memory" meaning the collection's contents). The standard library's default implementation returns a Slice, which just wraps the existing collection but modifies its Collection conformance to override startIndex and endIndex. That's all you need to do to make a slice of a Swift collection (see the sketch after this list).

    The indices could be invalid, the underlying buffer could have been deallocated already, and it wouldn't matter. No safety would be violated by constructing the Slice.

  2. If some custom implementation does need to access memory, it can trap if asked to access an out-of-bounds element. Just like subscript(Index) does.
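
For illustration, here is a minimal sketch of such a slice type; MySlice is hypothetical and simplified, not the standard library's actual Slice:

struct MySlice<Base: Collection>: Collection {
  let base: Base
  let startIndex: Base.Index
  let endIndex: Base.Index

  // Element access (and any bounds checking) is deferred to the base
  // collection; constructing the slice touches no element memory at all.
  subscript(position: Base.Index) -> Base.Element {
    base[position]
  }

  func index(after i: Base.Index) -> Base.Index {
    base.index(after: i)
  }
}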

This is the tail wagging the dog: because safe APIs rely on the satisfaction of the precondition, this initializer is unsafe. As @fclout and @Jumhyn have argued above, putting the onus everywhere else so that we can call this initializer "safe" would be less desirable than what we've settled on here.

3 Likes

Not at all - it only seems that way because you want Range to offer stricter guarantees than it actually does.

There are many ways that the invariant could be violated, even without using that initialiser. For example, the range's bounds may be classes with shared mutable state.

In every case, it is unacceptable for a safe API to invoke undefined behaviour because the invariant couldn't be upheld.

And in the case that we are talking about here, where non-bounds-checked subscripts are entirely banned, I don't think there is any way for an invalid range to defeat bounds-checking. If you think there is, I would be most interested to see an example. Users of "strict memory safe Swift" will not need to care about these details and are free to use Range.init(uncheckedBounds:) without fear.

As @scanon explains above, there is, because the bounds check itself can (correctly, in a strictly safe mode) rely on the invariant.

Steve describes a potential optimisation that could be possible with stronger invariants, but that has nothing to do with safety.

However, in order to have that invariant in the first place, we would need to pay the cost of the check that lowerBound <= upperBound and the non-mergeable trap if it isn't (and we wouldn't have the initialiser to bypass that check when we already know it holds), so effectively these wash out to the same thing - one operation becomes faster, others become slower.
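
As a rough sketch of that tradeoff (makeRange is a hypothetical checked constructor, not an existing API):

func makeRange(_ lower: Int, _ upper: Int) -> Range<Int> {
  // The cost we would have to pay everywhere: a comparison plus a trap.
  precondition(lower <= upper, "Range requires lowerBound <= upperBound")
  // Having paid it, the unchecked spelling is sound here:
  return Range(uncheckedBounds: (lower: lower, upper: upper))
}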

By the way, this quote is out of context - it refers to loading invalid bit-patterns, not loading valid bit-patterns that violate higher-level (library/application-level) invariants.

The former is UB, the latter is not.

struct Foo {
  private var x: Int
  init() { x = 42 }
}

withUnsafeBytes(of: 99) { $0.load(as: Foo.self) }

// Foo = {
//   x = 99
// }

There is no UB in this example. It is deterministic and its result is guaranteed by the language semantics.

The philosophy behind the strict memory safety vision for Swift is to accept that Swift code interacts with unsafe modules. The goal is to have a strong statement that a given module does not introduce memory unsafety. Over time, the list of safe modules grows, and the memory safety bugs "die of old age" in the modules that remain unsafe.

As a result, in this vision, it is expressly not suggested that you verify that a value is consistent with its safe initializers; that is "baseline correct". You should of course verify that a safely constructed value is adequate for your use case. For instance, you still need to check that the lower bound and upper bound of the range are compatible with the container, because you can very safely construct a range that is out of bounds for it.
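
A small sketch of that last point, assuming a plain Array as the container:

let numbers = [10, 20, 30]
let range = 1 ..< 7  // safely constructed, yet out of bounds for `numbers`

// "Baseline correct" is not enough; validate the range against the
// container before trusting it:
if range.lowerBound >= numbers.startIndex, range.upperBound <= numbers.endIndex {
  print(Array(numbers[range]))
} else {
  print("\(range) is out of bounds for a collection of count \(numbers.count)")
}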

Even if that initializer didn't exist, as you correctly said, code with access to @unsafe functionality could still create an invalid value, even without a dedicated @unsafe initializer for doing so. Most invalid values are fundamentally impossible to verify. For instance, you cannot verify that an enum value is valid, because it's "impossible" to construct an invalid one. You famously cannot verify that a pointer is valid, and by extension, that a value laundered to you as a safe reference is valid, either.

The only reason a range is verifiable is that, as an implementation detail, it has enough redundancy for that to be possible. Do you find that this is a sound argument for what should be verified versus what doesn't need to be verified?

1 Like

I could be misunderstanding him, but I believe he's saying that this is a likely optimization today of our current bounds checking—which, again, can allow an invalid range initialized using init(uncheckedBounds:) to result in out-of-bounds memory access.

I understand a type to be the set of its valid values, which it defines, and the set of valid bit patterns to be those which are the memory representations of those valid values. I'm not aware of anything in the language semantics which prohibits the compiler from taking arbitrarily greater advantage of optimizations that flow from assuming valid initialization of library types, on pain of UB.

As you well know, with literally just this code, the compiler can elide the whole thing, and nothing is guaranteed to be executed at all. But supposing you bound the instance to a variable inside withUnsafeBytes, as far as I can tell, there is nothing in the language semantics that prevents a future version of the compiler from giving you an instance of Foo which observably has the same memory representation as 42 (or, for that matter, anything at all). Would be interested if I'm wrong on this...

1 Like

If the compiler actually did make that assumption, then I would agree that the initialiser is unsafe (and that the documentation would be shockingly deficient).

I don't think it does, though:

func isValid(_ r: Range<Int>) -> Bool {
    r.lowerBound <= r.upperBound
}

Does not get constant-folded to true - it performs the comparison.

I'm not discounting the possibility that it may be worthwhile to remove this initialiser and attempt to make this invariant stricter on performance grounds, so that assumption can be made, but that's a separate issue from safety. If it's worth pursuing for performance, we should discuss doing it in every language mode.

Right—it is my understanding that the compiler is free to make this assumption at any time under the current language semantics. Hence why it is proposed that the initializer be marked unsafe, and thus prohibited in the optional strict memory safety mode.

If we look at the documentation:

Because this initializer does not perform any checks, it should be used as an optimization only when you are absolutely certain that lower is less than or equal to upper. Using the half-open range operator (..<) to form Range instances is preferred.

I don't think that language is strong enough to justify the compiler making that assumption.

For example, if lower were not less than or equal to upper under such a model, passing the constructed value to a library may cause it to read out-of-bounds memory even if it exclusively uses safe data types such as Array. Does that language really make the severity of a mistake clear?

If the compiler doesn't make the assumption, then I return to my argument that it isn't a safety issue.

This is a good point. If we have a class like this:

class IntRef: Comparable {
  var value: Int
  
  init(_ value: Int) {
    self.value = value
  }
  
  static func == (lhs: IntRef, rhs: IntRef) -> Bool {
    lhs.value == rhs.value
  }
  
  static func < (lhs: IntRef, rhs: IntRef) -> Bool {
    lhs.value < rhs.value
  }
}

And a function like this:

func badRange() -> Range<IntRef> {
  let range = IntRef(0) ..< IntRef(0)
  range.lowerBound.value = 1
  return range
}

Then calling that function will produce a range with broken invariants:

let range = badRange()
print(range.lowerBound <= range.upperBound)  // false

1 Like

I think it's fair to file a bug against the documentation if you feel that way—but what in your view is stronger than absolute certainty?

This:

It is undefined behaviour for lower not to be less than or equal to upper

Sure—or, if we go with this vision, @unsafe.

So, should we consider a notion of ValueSemantics to achieve some of our safety goals? Probably a separate conversation from this one.

2 Likes

Perhaps. Definitely a separate conversation though.

Also worth noting that “class” is not required. We can break the invariant with structs alone. For example, we could make a struct CachedValue that stores an Int and uses it as an index into a lookup table.

All the comparisons would be done with respect to the cached values from the table, so if we have a Range<CachedValue> and then modify the values in the table we can end up with a broken invariant in the range.
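
A hedged sketch of that idea; CachedValue and lookupTable are hypothetical, and concurrency checking is ignored for brevity:

var lookupTable = [0, 0]

struct CachedValue: Comparable {
  let index: Int

  // Comparisons go through shared mutable state: the lookup table.
  static func == (lhs: CachedValue, rhs: CachedValue) -> Bool {
    lookupTable[lhs.index] == lookupTable[rhs.index]
  }
  static func < (lhs: CachedValue, rhs: CachedValue) -> Bool {
    lookupTable[lhs.index] < lookupTable[rhs.index]
  }
}

let range = CachedValue(index: 0) ..< CachedValue(index: 1)  // 0 <= 0, check passes
lookupTable[0] = 1  // mutate the table after the range is formed
print(range.lowerBound <= range.upperBound)  // false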

1 Like

Such a CachedValue would (presumably) not validly conform to the hypothetical ValueSemantics protocol, though.

2 Likes