I am still concerned about the operation that casts an unsafe type to a safe type.
I think we should remove it, or at minimum create a separately enabled diagnostic for it that is not silenced by the `unsafe` keyword (a diagnostic I predict all memory safe projects will enable).
In order to show why, let's write out an explicit wrapper type that has the effect of casting unsafe to safe. Once it's written out, we can point out where, and how, it departs from our intended practice of memory safety.
Consider a memory safe parser:
```swift
func parse(_ input: some Collection<UInt8>) {
    // ...parses `input` using only safe Collection operations
}
```
And a caller that has an unsafe buffer:
```swift
let buffer = unsafe getUnsafeBuffer() // UnsafeBufferPointer<UInt8>
parse(unsafe buffer) // Unsafe cast to Collection<UInt8>
```
If we did not have a language feature to cast an unsafe type to a safe type, the caller would need to write a wrapper type instead:
```swift
struct UnsafeBufferWrapper<Element> : Collection {
    let buffer: UnsafeBufferPointer<Element>
    init(_ buffer: UnsafeBufferPointer<Element>) {
        self.buffer = unsafe buffer
    }
    subscript(index: Index) -> Element { unsafe buffer[index] }
    /* Also: Index, startIndex, endIndex, index(after:) */
}
```
…and invoke the parser like this:
```swift
let wrapper = unsafe UnsafeBufferWrapper(unsafe getUnsafeBuffer())
parse(wrapper)
```
Now that it's written out, UnsafeBufferWrapper includes some obvious memory safety contradictions:
- Lifetime. `UnsafeBufferWrapper` advertises `safe` and `Escapable` lifetime, but it has no way of knowing the lifetime of its buffer.
- Bounds. `UnsafeBufferWrapper` advertises `safe` subscripting, but it does no bounds checking.
`UnsafeBufferWrapper` doesn't just temporarily depart from features the programming language can verify; instead, it entirely departs from the practice of programming we're trying to achieve.
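To make both contradictions concrete, here is one way to fill in the members the `UnsafeBufferWrapper` sketch elides. This is a minimal sketch that assumes `Int` indices. Its `parse` is a hypothetical stand-in that just sums bytes. The `unsafe` markers are omitted so the code builds without strict memory safety checking, and comments mark the operations that would need them.

```swift
// Hypothetical completion of UnsafeBufferWrapper with Int indices.
// Lines marked "unsafe" would need the `unsafe` keyword under strict checking.
struct UnsafeBufferWrapper<Element>: Collection {
    let buffer: UnsafeBufferPointer<Element>

    init(_ buffer: UnsafeBufferPointer<Element>) {
        self.buffer = buffer  // unsafe
    }

    var startIndex: Int { 0 }
    var endIndex: Int { buffer.count }
    func index(after i: Int) -> Int { i + 1 }

    // Bounds: no check of its own. UnsafeBufferPointer's subscript checks
    // only in debug builds, so an optimized build can read past the end.
    subscript(index: Int) -> Element { buffer[index] }  // unsafe
}

// Hypothetical stand-in for the memory safe parser: sums its input bytes.
func parse(_ input: some Collection<UInt8>) -> Int {
    input.reduce(0) { $0 + Int($1) }
}

let bytes: [UInt8] = [1, 2, 3, 4]
let total = bytes.withUnsafeBufferPointer { buffer in
    // Lifetime: the wrapper is an ordinary Escapable value; nothing stops it
    // from being stored and used after this closure returns.
    parse(UnsafeBufferWrapper(buffer))  // unsafe
}
```

The type checker accepts this wrapper anywhere a safe `Collection` is expected, and that acceptance is the source of both the lifetime problem and the bounds problem.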
To cure these memory safety contradictions, we need to write a different wrapper: one that resolves our contradictory facts.
For example, let's say that the caller knows that its pointer points to data that is embedded in the program binary, and immortal. They can write this wrapper instead:
```swift
struct ImmortalBufferWrapper<Element> : Collection {
    let buffer: UnsafeBufferPointer<Element>
    init(withImmortalBuffer buffer: UnsafeBufferPointer<Element>) {
        self.buffer = unsafe buffer
    }
    subscript(index: Index) -> Element {
        precondition(index >= 0 && index < buffer.count)
        return unsafe buffer[index]
    }
    /* Also: Index, startIndex, endIndex, index(after:) */
}
```
...and invoke the parser like this:
```swift
let wrapper = unsafe ImmortalBufferWrapper(withImmortalBuffer: unsafe getUnsafeBuffer())
parse(wrapper)
```
This wrapper has some significant memory safety advantages:
- Bounds checking. Its subscript checks bounds before every access.
- Lifetime. Though it uses an outside-the-language means, `ImmortalBufferWrapper` clearly communicates a lifetime requirement. If `getUnsafeBuffer()` is not documented to return an immortal pointer, the contradiction is visible to any programmer or reviewer who checks, based solely on local reasoning about the caller and the callee's documented interface.
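Below is a fuller, runnable sketch of `ImmortalBufferWrapper` that assumes `Int` indices. The hypothetical `makeImmortalBuffer` stands in for `getUnsafeBuffer()` by allocating memory and never freeing it, which is one simple way to make the immortality claim true. The hypothetical `parse` just sums bytes. The `unsafe` markers are omitted so the code builds without strict memory safety checking, and comments show where they would go.

```swift
struct ImmortalBufferWrapper<Element>: Collection {
    let buffer: UnsafeBufferPointer<Element>

    // The argument label is the caller's textual attestation of immortality.
    init(withImmortalBuffer buffer: UnsafeBufferPointer<Element>) {
        self.buffer = buffer  // unsafe
    }

    var startIndex: Int { 0 }
    var endIndex: Int { buffer.count }
    func index(after i: Int) -> Int { i + 1 }

    subscript(index: Int) -> Element {
        // Checked in every build configuration, not only debug builds.
        precondition(index >= 0 && index < buffer.count, "index out of range")
        return buffer[index]  // unsafe
    }
}

// Hypothetical stand-in for getUnsafeBuffer(): copies the bytes into fresh
// memory that is deliberately never deallocated, so it lives until exit.
func makeImmortalBuffer(_ values: [UInt8]) -> UnsafeBufferPointer<UInt8> {
    let start = UnsafeMutablePointer<UInt8>.allocate(capacity: values.count)
    start.initialize(from: values, count: values.count)
    return UnsafeBufferPointer(start: start, count: values.count)
}

// Hypothetical stand-in for the memory safe parser: sums its input bytes.
func parse(_ input: some Collection<UInt8>) -> Int {
    input.reduce(0) { $0 + Int($1) }
}

let wrapper = ImmortalBufferWrapper(withImmortalBuffer: makeImmortalBuffer([10, 20, 30]))
let total = parse(wrapper)
```

A reviewer who sees the `withImmortalBuffer:` label at the call site knows exactly which promise to check against the callee's documentation.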
Alternatively, if the programmer knows that the pointer is an in-scope temporary, and that the parser does not retain sub-ranges of its input, they can update the `parse` API to offer a `~Copyable` or `~Escapable` input type, or they can write a `NoescapeBufferWrapper` that uses weak references or `isKnownUniquelyReferenced` to verify at runtime that the pointer's wrapper does not escape.
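One possible shape for the runtime check is sketched below. It is a hypothetical design, not an established API. Every copy of the wrapper retains a private `EscapeToken`. After the parser returns, `isKnownUniquelyReferenced` on the token reveals whether any copy survived. `EscapeToken`, `withNoescapeWrapper`, and the summing `parse` are illustrative names. The `unsafe` markers appear only as comments, so the code builds without strict memory safety checking.

```swift
// Hypothetical sentinel: every copy of a wrapper holds a strong reference.
final class EscapeToken {}

struct NoescapeBufferWrapper<Element>: Collection {
    let buffer: UnsafeBufferPointer<Element>
    let token: EscapeToken

    var startIndex: Int { 0 }
    var endIndex: Int { buffer.count }
    func index(after i: Int) -> Int { i + 1 }

    subscript(index: Int) -> Element {
        precondition(index >= 0 && index < buffer.count, "index out of range")
        return buffer[index]  // unsafe
    }
}

// Lends a wrapper to `body`, then traps if any copy of it is still alive.
// (If `body` throws, this sketch skips the check.)
func withNoescapeWrapper<Element, R>(
    _ buffer: UnsafeBufferPointer<Element>,
    _ body: (NoescapeBufferWrapper<Element>) throws -> R
) rethrows -> R {
    var token = EscapeToken()
    let result = try body(NoescapeBufferWrapper(buffer: buffer, token: token))  // unsafe
    // The temporary wrapper is gone; only our local `token` should remain.
    precondition(isKnownUniquelyReferenced(&token), "NoescapeBufferWrapper escaped")
    return result
}

// Hypothetical stand-in for the memory safe parser: sums its input bytes.
func parse(_ input: some Collection<UInt8>) -> Int {
    input.reduce(0) { $0 + Int($1) }
}

let bytes: [UInt8] = [5, 6, 7]
let total = bytes.withUnsafeBufferPointer { buffer in
    withNoescapeWrapper(buffer) { parse($0) }
}
```

Suppose a parser stored its input, for example by assigning `$0` to an outer `var leaked: NoescapeBufferWrapper<UInt8>?`. That assignment would leave a second reference to the token and trip the precondition. This design detects the escape after the fact rather than preventing it, so it is weaker than the `~Escapable` option, but the check is still local and visible to a reviewer.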
What if the programmer can't do any of these things, because they truly don't know the pointer's lifetime, can't verify the lifetimes created by the parser at compile time or at runtime, or both? Without knowing more, all we can say about that case is that whatever happens next will not be memory safe programming, so a strictly safe environment needs to signal an error that invites the programmer to rethink their design or ask for help.
You could argue that casting unsafe to safe is a more concise way to say what the programmer would have said anyway with a wrapper. But I think this example shows that a strictly safe project would not accept such a wrapper, so we should not include a language feature that conjures such a wrapper on demand. And this example also shows that the programmer can say something safer instead.
You could argue that all this unsafety is OK because the programmer said the word `unsafe`, thereby calling out and agreeing to unbounded consequences.
But the purpose of the `unsafe` indicator is not a simple callout. We are not trying to tell programmers "this is not preferred". That feature already exists in default Swift, since the programmer usually has to say `withUnsafePointer`. In practice, we have found a simple callout to be insufficient to achieve our memory safety goals.
The purpose of the unsafe indicator is also not to ask the programmer to promise that they have done global reasoning to ensure memory safety. That feature is the flawed premise of all unsafe languages. In practice, we have found global reasoning to be insufficient to achieve our memory safety goals.
The purpose of the unsafe indicator is to communicate a programming discipline that achieves memory safety based on local reasoning, even when that reasoning depends on an outside-the-language invariant.
Casting unsafe to safe does not express any reasoning; it turns off reasoning.
You could argue that the programmer could have created an equal consequence by invoking a memory-unsafe parser:
```
/* C code */
// 327 CVEs have been reported in the history of this API
void parse(const char*, size_t);

/* Swift code */
// `string` must be a `var`: String.withUTF8 is a mutating method
unsafe string.withUTF8 { unsafe parse($0.baseAddress!, $0.count) }
```
But this is not an equal consequence. In the example case, we started with a memory safe parser (something that may have taken a year or more to create), and then casting unsafe to safe yeeted all its value. We also created a communication landmine, since one team might ask, "Did you adopt the strictly safe parser?" and another team might reply, "Sure did!"
Casting unsafe to safe creates a system that, in one important sense, does less rigorous textual verification than we do today in C and C++. Consider the case of bounds safety ensured by `-fbounds-safety` or C++ Buffer Hardening. These systems do include a local unsafe operation to conjure a pointer or a bounds from the unverified world. But they do not include a casting operation that can cause a callee that expects to do bounds checking to just… not. The same goes for the reference counting verification we do by static analysis.
You could argue that a C or C++ programmer could always conjure a pointer with infinite bounds, which has the same effect of turning off bounds checking in a callee. Though this is possible, it leaves behind a local textual record of a manual attestation of infinite bounds, which makes the memory safety contradiction clear, by local reasoning, to any programmer or reviewer who looks.
We would really like Swift to do strictly more textual verification of memory safety than we can do today in C or C++. A Swift that could verify more safety properties in total than C or C++, but each one with less rigor, would present a difficult tradeoff.