SE-0458: Opt-in Strict Memory Safety Checking

It's not entirely clear to me if that would correctly (from the point of view of Swift Testing) propagate to inner scopes like closures generated by the macro (i.e. the same problem we've run into with try and await.)

I'm not trying to throw a wrench in the works, of course. :smile: Just want to make sure we can properly support unsafe when using Swift Testing. We may want to set some time aside in the next few weeks to brainstorm off-forums.

It feels important to avoid this kind of false positive warning. Are there situations where an implicit self access should be treated as unsafe? In this case, accessing UMBP.endIndex can't pose a safety issue, but are there other type designs where it could?
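For concreteness, the kind of false positive I mean looks something like this (a sketch; the property name is illustrative, and UMBP is UnsafeMutableBufferPointer):

extension UnsafeMutableBufferPointer {
    var midpoint: Int {
        // `endIndex` here is implicitly `self.endIndex`, and `self` has
        // unsafe type, so the proposal as reviewed would demand
        // `unsafe endIndex`, even though reading this Int property
        // cannot by itself violate memory safety.
        endIndex / 2
    }
}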

ETA that I'm not convinced about the tradeoff of identifying specific unsafe lines as a reason to have unsafe expressions vs unsafe { ... } blocks. I find it much easier to miss the "unsafety" of the examples with the unsafe expressions mixed into each line, compared to an unsafe block, which is always specified at the start of a line and has its own scope and indentation.

Not that there isn't a use for the expressions, but the block to me is much clearer about saying "there's some unsafe usage happening here: be sure to check it over". This feels particularly true since unsafe usage doesn't propagate upwards. There may be nothing in a function's declaration that gives me a reason to look for the unsafe keyword, so having one or two potentially buried mid-line may not exactly jump out the way a block would.
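To make the comparison concrete (the block form below is hypothetical syntax, not part of the proposal; assume `buffer` is an UnsafeBufferPointer<UInt8>):

// Expression form: the unsafety is threaded through each line.
let base = unsafe buffer.baseAddress!
let first = unsafe base.pointee

// Hypothetical block form: the unsafety is announced once, up front,
// with its own scope and indentation.
unsafe {
    let base = buffer.baseAddress!
    let first = base.pointee
}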


From first principles (and also replying to @nnnnnnnn), the reason we want unsafe when referencing a pointer variable, and generally when passing them as arguments, is because copy and move operations on pointers are unsafe, as the copy/move could be the last visible operation before the pointer escapes. If we want to do away with the general requirement that any use of an @unsafe type is unsafe, we need to tie it to escapability to avoid regressing from the current proposal. However, we also need to keep in mind that @unsafe doesn't have to only be about lifetimes, so we probably can't tie it too hard to that.
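A sketch of the hazard that rule guards against: nothing below dereferences the pointer, yet the program is broken, and the copy into `stash` is the step that lets the pointer outlive its memory.

var stash: UnsafePointer<Int>?

func demo() {
    var x = 42
    withUnsafePointer(to: &x) { p in
        stash = p   // a plain copy, and the moment of escape
    }
    // `stash` now dangles: the pointer was only valid inside the
    // closure. Under the proposal, the copy above is what carries
    // the `unsafe` requirement.
}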

Functions that take a pointer closure and return a generic value all have the opportunity to return the pointer they passed in as an argument, which makes me err on the side of them still being unsafe. (Although, this is about escapability, so maybe not if we have good rules for the above?)
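For example (a sketch; `withScratch` is a hypothetical API of the shape I mean):

func withScratch<R>(_ body: (UnsafeMutableBufferPointer<UInt8>) -> R) -> R {
    let buffer = UnsafeMutableBufferPointer<UInt8>.allocate(capacity: 64)
    defer { buffer.deallocate() }
    return body(buffer)
}

// R is inferred as the buffer type itself: the pointer escapes, and is
// already deallocated by the time the caller holds it.
let escaped = withScratch { $0 }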

I wonder if we could have : @unsafe Escapable instead of @unsafe on a type? We could make this behave like : ~Escapable except that you get an unsafety warning (cleared by unsafe) when passing the value somewhere it isn't proven to not escape. We could allow escaping through an escaping closure with an explicit [unsafe] capture. Instead of marking a function @safe, you put a borrowing lifetime on pointer arguments. As lifetime support improves for safe values, it also does for unsafe values.
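Roughly like this, where every line of syntax is hypothetical:

// Behaves like ~Escapable, except that letting a value escape is a
// diagnosable unsafety (cleared with `unsafe`) rather than a hard error.
struct Cursor: @unsafe Escapable {
    var base: UnsafeRawPointer
}

// Instead of @safe, a borrowing lifetime on the parameter proves the
// value cannot escape, so callers need no `unsafe`.
func process(_ cursor: borrowing Cursor) { /* ... */ }

// Escaping through an escaping closure requires an explicit capture.
func retain(_ cursor: Cursor) -> () -> Void {
    { [unsafe cursor] in _ = cursor }
}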

It does remove the ability to use @unsafe on a type to specify the type is globally unsafe for reasons other than lifetimes.

I like the motivation here, but I have a hard time seeing how this would help. Unsafe operations are already pretty thoroughly named “unsafe”. In theory this would highlight APIs that use unsafe code under the hood, but I would expect any developer who wasn’t going to name their code with the unsafe prefix would also confidently mark their code as safe, assuming they had taken the proper precautions.


I think the main difference between these assumptions is a lot of C APIs leak through as appearing to be safe despite not being safe, so this would give the compiler knowledge a linter that is looking for unsafe* wouldn't have.


I must have missed this in the proposal. So is the idea that any C API that takes an "unsafe" argument would be imported with the equivalent of @unsafe? I'd be interested in seeing an example like what it might look like to use something like SQLite with this enabled.

I think where I'm most struggling to get on board with this is that it doesn't seem to actually add any assurances that the code you are writing is safe. With strict concurrency warnings, there is a benefit to the user because they can show you areas of your code that may seem concurrency-safe but are not, and there are reasonable ways to write code without using too many @unchecked Sendable annotations. With how I'm understanding this feature, I'm not sure what the solution to these warnings would be other than to do your best to check your usage of unsafe types and then mark a wrapper with @safe.

As I understand it, that is the main point, yes. It is possible to use unsafe APIs correctly, but you have to manually verify that you're satisfying certain requirements. People who want to use this feature are saying that they would find language enforcement valuable as an auditing tool for finding the places where they need to manually argue correctness, as well as providing pressure to reduce casual use of unsafe APIs. But it has to be understood that that imposes a lot of secondary costs on those programmers to systematically design and adopt safe abstractions for their Swift code — for example, adopting an idiomatic Swift wrapper around the SQLite API. That's not something a lot of programmers are eager to take on, especially when the core problem is just that somebody else's API is less idiomatic than it should be. That is probably the most important reason the feature is opt-in.


Here's a random SQLite API we use in WebKit:

SQLITE_API const char *sqlite3_column_name(sqlite3_stmt*, int N);

The ultimate goal is to annotate this API to make it safe:

SQLITE_API const char * __null_terminated sqlite3_column_name(sqlite3_stmt* __single NOESCAPE LIFETIMEBOUND, int N);

Then you see a @safe Swift overlay, something like:

func sqlite3_column_name(_ statement: sqlite3_stmt, _ N: Int) -> String

Before the API is annotated, you see an @unsafe Swift overlay. Initially you get a compile-time safety diagnostic when you call the overlay. Then you schedule a project to add safety annotations to this API, and in the meantime you write a safety wrapper by hand. A by-hand safety wrapper looks something like:

func sqlite3_column_name(
    _ statement: sqlite3_stmt, _ N: Int) -> String {
    let cString = unsafe sqlite3_column_name(statement.pointer, Int32(N))
    return unsafe String(cString: cString)
}

(This example presumes a Swift sqlite3_stmt wrapper, which I haven't typed out.)
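For what it's worth, a minimal version of that presumed wrapper might be (hypothetical, glossing over finalization and the name collision with the imported C type):

@unsafe struct sqlite3_stmt {
    let pointer: OpaquePointer
}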

The by-hand wrapper is not all that impressive. It just uses some unsafe pointers and says so.

The main value of strict safety is not this wrapper; it is all your other code. In default Swift, any expression might use an unsafe type, or invoke an unsafe function. So any expression might be as unsafe as, or more unsafe than, this wrapper. You just don't know. And when programmers working on your project add more un-safety, they may not know, and they may not even know that it's a goal in your project not to do that.

In a strictly safe module, you know that all your other code is memory safe, because it does not say unsafe. And you communicate your goal to other programmers because, when first typed out, unsafe code doesn't compile.

Once your unsafe code doesn't compile, you are empowered to have an informed conversation, such as:

  • Is there a safe version of this API I can call instead?
  • Can I write a wrapper that verifies safety, either manually, or automatically, or through some combination, either at compile time, or at runtime, or through some combination?
  • If no safety strategy is possible, is the use trivial enough to mark unsafe and just move on? Or should I move this code to a less privileged process? Or ask for help?

I am still concerned about the operation that casts an unsafe type to a safe type.

I think we should remove it, or at minimum create a separately enabled diagnostic for it that is not silenced by the unsafe keyword (which I predict all memory safe projects will enable).

In order to show why, let’s write out an explicit wrapper type that has the effect of casting unsafe to safe. Once it’s written out, we can point out where, and how, it departs from our intended practice of memory safety.

Consider a memory safe parser:

func parse(_ input: some Collection<UInt8>) {
}

And a caller that has an unsafe buffer:

let buffer = unsafe getUnsafeBuffer() // UnsafeBufferPointer<UInt8>
parse(unsafe buffer) // Unsafe cast to Collection<UInt8>

If we did not have a language feature to cast an unsafe type to a safe type, the caller would need to write a wrapper type instead:

struct UnsafeBufferWrapper<Element> : Collection {
    let buffer: UnsafeBufferPointer<Element>

    init(_ buffer: UnsafeBufferPointer<Element>) {
        self.buffer = unsafe buffer
    }
    
    subscript(index: Index) -> Element { unsafe buffer[index] }

    /* Also: Index, startIndex, endIndex, index(after:) */
}

…and invoke the parser like this:

let wrapper = unsafe UnsafeBufferWrapper(unsafe getUnsafeBuffer())
parse(wrapper)

Now that it’s written out, UnsafeBufferWrapper includes some obvious memory safety contradictions:

  1. Lifetime. UnsafeBufferWrapper advertises safe and Escapable lifetime, but it has no way of knowing the lifetime of its buffer.

  2. Bounds. UnsafeBufferWrapper advertises safe subscripting, but it does no bounds checking.

UnsafeBufferWrapper doesn’t just temporarily depart from features the programming language can verify; instead, it entirely departs from the practice of programming we’re trying to achieve.

To cure these memory safety contradictions, we need to write a different wrapper — one that resolves our contradictory facts.

For example, let’s say that the caller knows that its pointer points to data that is embedded in the program binary, and immortal. They can write this wrapper instead:

struct ImmortalBufferWrapper<Element> : Collection {
    let buffer: UnsafeBufferPointer<Element>

    init(withImmortalBuffer buffer: UnsafeBufferPointer<Element>) {
        self.buffer = unsafe buffer
    }
    
    subscript(index: Index) -> Element {
        precondition(index >= 0 && index < buffer.count)
        return unsafe buffer[index]
    }

    /* Also: Index, startIndex, endIndex, index(after:) */
}

...and invoke the parser like this:

let wrapper = unsafe ImmortalBufferWrapper(withImmortalBuffer: unsafe getUnsafeBuffer())
parse(wrapper)

This wrapper has some significant memory safety advantages:

  1. Bounds checking. It does bounds checking.

  2. Lifetime. Though it uses an outside-the-language means, ImmortalBufferWrapper clearly communicates a lifetime requirement. If getUnsafeBuffer() is not documented to return an immortal pointer, the contradiction is visible to any programmer or reviewer who checks, based solely on local reasoning about the caller and the callee’s documented interface.

Alternatively, if the programmer knows that the pointer is an in-scope temporary, and that the parser does not retain sub-ranges of its input, they can update the parse API to offer a ~Copyable or ~Escapable input type, or they can write a NoescapeBufferWrapper that uses weak pointers or isKnownUniquelyReferenced to verify at runtime that the pointer’s wrapper does not escape.
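For instance, using the standard library's ~Escapable Span type from SE-0447 (a sketch; whether a given parser can be expressed this way depends on what it retains):

func parse(_ input: Span<UInt8>) {
    for i in 0 ..< input.count {
        // Bounds-checked access; the lifetime dependency prevents the
        // parser from retaining `input` or sub-ranges of it.
        _ = input[i]
    }
}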

What if the programmer can’t do any of these things, because they truly don’t know the pointer’s lifetime, can’t verify at compile time or runtime the lifetimes created by the parser, or both? All we can say about that case without knowing more is that, whatever happens next, it is not going to be memory safe programming, so a strictly safe environment needs to signal an error to invite the programmer to rethink their design, or ask for help.

You could argue that casting unsafe to safe is a more concise way to say what the programmer would have said anyway with a wrapper. But I think this example shows that a strictly safe project would not accept such a wrapper, so we should not include a language feature that conjures such a wrapper on demand. And this example also shows that the programmer can say something safer instead.

You could argue that all this unsafety is OK because the programmer said the word unsafe, thereby calling out and agreeing to unbounded consequences.

But the purpose of the unsafe indicator is not a simple callout. We are not trying to tell programmers “this is not preferred”. That feature already exists in default Swift, since the programmer usually has to say “withUnsafePointer”. In practice, we have found a simple callout to be insufficient to achieve our memory safety goals.

The purpose of the unsafe indicator is also not to ask the programmer to promise that they have done global reasoning to ensure memory safety. That feature is the flawed premise of all unsafe languages. In practice, we have found global reasoning to be insufficient to achieve our memory safety goals.

The purpose of the unsafe indicator is to communicate a programming discipline that achieves memory safety based on local reasoning, even when that reasoning depends on an outside-the-language invariant.

Casting unsafe to safe does not express any reasoning; it turns off reasoning.

You could argue that the programmer could have created an equal consequence by invoking a memory-unsafe parser:

/* C code */
// 327 CVEs have been reported in the history of this API
void parse(const uint8_t *, size_t);

/* Swift code */
unsafe string.withUTF8 { unsafe parse($0.baseAddress!, $0.count) }

But this is not an equal consequence. In the example case, we started with a memory safe parser -- something that may have taken a year or more to create -- and then casting unsafe to safe yeeted all its value. We also created a communication landmine, since one team might ask, "Did you adopt the strictly safe parser?" and another team might reply, "Sure did!"

Casting unsafe to safe creates a system that, in one important sense, does less rigorous textual verification than we do today in C and C++. Consider the case of bounds safety ensured by -fbounds-safety or C++ Buffer Hardening. These systems do include a local unsafe operation to conjure a pointer or a bounds from the unverified world. But they do not include a casting operation that can cause a callee that expects to do bounds checking to just… not. The same goes for the reference counting verification we do by static analysis.

You could argue that a C or C++ programmer could always conjure a pointer with infinite bounds, which has the same effect of turning off bounds checking in a callee. Though this is possible, it leaves behind a local textual record of a manual attestation of infinite bounds, which makes the memory safety contradiction clear, by local reasoning, to any programmer or reviewer who looks.

We would really like Swift to do strictly more textual verification of memory safety than we can do today in C or C++. A Swift that could verify more safety properties in total than C or C++, but each one with less rigor, would present a difficult tradeoff.

Here's another try at explaining the problem of casting unsafe types to safe types:

A safe interface means “I will verify my safety invariants”.

An unsafe interface means “My caller will verify my safety invariants”.

When a function declares a safe argument, and receives an unsafe argument, we have a contradiction. The argument says the function will verify safety; the function says the argument will verify safety; in practice no such verification can possibly happen.

If you interpret “unsafe” to mean “ignore safety”, it makes sense to accept this contradiction. But “unsafe” exists within the broader context of a project that turned on strict safety. “Unsafe” does not mean “ignore safety”, and if it currently invites that understanding, we need a different keyword. “Unsafe” should mean “the language cannot verify safety here, but my caller must”. And the language should signal an error when that contract is demonstrably broken.


I've been thinking about this more, and I think that @safe declarations should take responsibility for variables that are their direct arguments. So if you pass a variable of unsafe type into a function that is @safe, we won't diagnose that use of the variable is unsafe. That covers the buffer.endIndex example (so we don't diagnose it) without opening up a more general hole around local variables or parameters.
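In code, the effect of the amendment is (a sketch; `buf` is a local UnsafeBufferPointer<Int>):

let n = buf.count      // OK without `unsafe`: `count` is @safe, and a
let e = buf.endIndex   // @safe member takes responsibility for `self`
let x = unsafe buf[0]  // still unsafe: the unchecked subscript is not @safe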

I have a PR up to amend the proposal this way, here: SE-0458: Allow @safe declarations to subsume some responsibility for their arguments by DougGregor ¡ Pull Request #2680 ¡ swiftlang/swift-evolution ¡ GitHub

Doug

As the proposal author, Doug has asked to make a handful of revisions to SE-0458:

  • Using a local variable of unsafe type as an argument of an explicitly @safe API will no longer be treated as unsafe. For example, if buf is an UnsafeBufferPointer<Int>, it will be possible to write buf.count or buf.endIndex without writing the unsafe operator.
  • Types that contain stored properties or enum payloads of unsafe type will no longer default to being considered safe and must be explicitly annotated as @safe or @unsafe (see the sketch after this list).
  • There is a new Alternatives Considered section on prohibiting unsafe conformances and overrides.
  • There is a new Future Directions section on handling unsafe code in macro expansions.
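As a sketch of that second change (type names are illustrative):

// Previously this defaulted to safe; now the author must choose.
@unsafe struct RawView {
    var base: UnsafeRawPointer   // stored property of unsafe type
    var count: Int
}

// Or assert safety, taking responsibility for the invariants:
@safe struct CheckedView {
    var base: UnsafeRawPointer
    var count: Int
}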

You can read the PR to see the exact set of changes.

I have applied these changes to the proposal document, and I am extending the review until February 11th, 2025.

John McCall
Review Manager


Thank you for the update. I had a few questions after reading the PR, which John advised that I post here:

In the new "types with unsafe storage" section, we describe how Swift types are required to pick @safe or @unsafe, with the implication being that we will either need to use unsafe for (approximately) every use of values of that type or not. It does not describe the default that Swift will use for types that are imported from Clang:

  • C structs/unions that contain pointers to C objects
  • C structs/unions that contain pointers to ObjC objects
  • ObjC classes that contain @private, @protected or @public pointer ivars to C objects
  • C++ classes that contain pointers
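As a sketch of the first case (hypothetical names): a C struct such as struct Row { const char *name; int id; } imports into Swift as

struct Row {
    var name: UnsafePointer<CChar>!   // stored property of unsafe type
    var id: Int32
}

and the question is whether that imported struct defaults to @safe or @unsafe.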

I have guesses for what the "best default" would be for several of these cases, but I was wondering if this is something that you have already considered.

Following on from this question: earlier on this thread, we brought up the possibility of using an @unsafe Copyable conformance. I imagine that @safe will largely repeat whether the function borrows the unsafe value, so @unsafe Copyable could be an alternative to it. We had an offline interaction where you said that there would be issues with that, but you weren't available to expand on it at the time.

For macros: is it OK to use unsafe outside of strict memory safety mode? If so, is it reasonable to make macro developers responsible for emitting code that is correct both with and without strict memory safety enabled?

For unsafe conformances: you bring up specifically the case of UBP's conformance to Collection. Earlier in the thread, we proposed that it could be OK to make UBP's Collection.subscript implementation bounds-check even when UBP used directly would not. Have we arrived at a more solid resolution on that?
