[Pitch] Opt-in Strict Memory Safety Checking

xwu · December 29, 2024, 5:49pm

Douglas_Gregor:

Grabbing a random @safe(unchecked) function out of my pull request, let's see how it would look:
  static func getProcessName(pid: pid_t) -> String {
    let buffer = unsafe UnsafeMutableBufferPointer<UInt8>.allocate(capacity: 4096)
    defer {
      unsafe buffer.deallocate()
    }
    let ret = unsafe proc_name(pid, buffer.baseAddress, UInt32(buffer.count))
    if ret <= 0 {
      return "<unknown>"
    } else {
      return unsafe String(decoding: buffer[0..<Int(ret)], as: UTF8.self)
    }
  }
As we probably expected, it's going to be fairly noisy to use unsafe types like UnsafeMutableBufferPointer with this approach. That's probably a good thing: it's indicating everywhere that we have an unsafe operation, and "count of uses of unsafe" is a decent proxy for how many unsafe operations there are.

This turns the feature from one that involves (initially, though of course the idea is to promote vending safer APIs) opting in and annotating your interfaces to one that requires invasively decorating implementation bodies.

For anyone writing in this opt-in mode, it causes any unsafe API to gain a hoistable unsafe prefix: for example, it turns the spelling of UnsafeMutableRawBufferPointer.init into unsafe UnsafeMutableRawBufferPointer.init.

If APIs need to retain names that mark them as clearly unsafe for use in the default language mode, then each such API is necessarily redundantly marked in the opt-in mode. And, we would nudge authors of any API newly created in this opt-in mode to abandon the existing naming practice because it would be seemingly redundant in their hands, leaving clients writing in the default language mode with less warning than the status quo.

fclout · December 29, 2024, 6:45pm

It'll be easiest to explain my position by breaking down the pieces of try/do ... catch/throws:

a function that can throw errors must be annotated with throws
an expression that includes a call that can throw must be annotated with try (expr) or one of its variants
do ... catch allows you to show the compiler that errors are handled, and may let you skip the throws

As I understand your current proposal:

@unsafe maps to throws: a function annotation that says the function leaks its unsafety to callers
the unsafe { ... } block maps to do ... catch: a syntax to tell the compiler the unsafety does not leak out of that block
nothing maps to try (expr)

My suggestion instead had nothing mapping to do ... catch, and has unsafe (expr) mapping to try (expr).

The piece of feedback that's very important for me is that we need something like try (expr). I'll have to check after the break for how the others feel about something mapping to do ... catch, but from my perspective, there is no real security difference between the proposed unsafe { ... } block and @safe(unchecked) on a function, so we could also find a way to fit that in. For instance, we could do this:

unsafe (expr) is required to mark unsafe expressions
a function with "unhandled" unsafe code must be annotated @unsafe
inside a function, you can "handle" unsafe code by putting it inside a safe(unchecked) { ... } block; you then do not need to annotate the function as @unsafe. This (maybe?) removes the need for a @safe(unchecked) attribute.

I agree that the compiler can have a fixit to add unsafe in front of unsafe expressions for you. What I'm trying to avoid is normalizing laundering unsafe code as safe. It's easier to set people on the right foot if they need to ask for help the first time, rather than if the compiler tells them their options without having a chance to explain them.

I also agree (and expect) that this is more verbose. That will hopefully drive people to use a little less unsafe code. For instance, engineers could choose to use an Array to manage the backing buffer instead, and lose all the unsafe except for the call to proc_name:

static func getProcessName(pid: pid_t) -> String {
  var buffer = ContiguousArray(repeating: 0, count: PATH_MAX)
  let ret: CInt
  safe(unchecked) {
  	ret = unsafe proc_name(pid, &buffer, UInt32(buffer.count))
  }
  if ret <= 0 {
    return "<unknown>"
  } else {
    return String(decoding: buffer[0..<Int(ret)], as: UTF8.self)
  }
}

This is not "ideal" because there is a cost to initializing the array fully that was avoided before, but for a lot of people, that could be better than coming up with the unsafe equivalent.

fclout · December 29, 2024, 7:25pm

Our first goal should be to make new code safer. Making existing code safer is nice, but a lower priority, both because it is harder (as you said) and less useful (security bugs are the most dense in new code). The compromise we made with -fbounds-safety is that libraries that do not enable it can still annotate their headers with bounds annotations, so that clients that do enable it can benefit from them even if the implementation might not be safe. Swift can do the same thing: let modules that don't enable strict memory safety still annotate functions with @unsafe. This is an alright place to land for a module that pervasively uses unsafe pointers today, until someone has the time and energy to put all of these new safety features to work and use fewer unsafe pointers, when possible.

The convention of using "unsafe" in the name of unsafe things is very unevenly respected today. For instance, Unmanaged, OpaquePointer and VaList are all unsafe and they don't say "unsafe" in the name. C-family functions that use pointers aren't imported with "unsafe" in their name. Most members of UnsafePointer are unsafe and don't have "unsafe" in their name. The new Span has an unsafe subscript that is labeled "unchecked" instead of "unsafe".

Second, any time that there are safe and unsafe variants of the same thing, there will have to be a naming difference. Since @unsafe is not ABI-visible, there is no hope of overloading on @unsafe. For instance, the unsafe Span subscript has the "unchecked" label because otherwise, it cannot be syntactically disambiguated from the safe one. Another example of this would be withCheckedContinuation/withUnsafeContinuation.

If that sounds right to you, then it seems the only APIs at a higher risk of being "sneakily unsafe" are those that only have unsafe variants and that are vended from modules that enable strict memory safety.

xwu · December 29, 2024, 11:14pm

That is actively desirable. Overload resolution is underspecified, and we would not want, say, the order in which files are compiled or in which types are sorted alphabetically to change whether user code ends up calling either a safe or an unsafe API in the default language mode.

This “only” would be a big caveat: if modules that specifically care a lot about supporting strict memory safety are nudged to vend APIs with less obviously unsafe names that specifically impact users who write in the default language mode (and thus actually rely on those names to know they’re using unsafe APIs).

fclout · December 29, 2024, 11:44pm

I don’t want to overload on unsafe either. There are a lot of reasons it’s a bad idea, but the fact it’s ABI-infeasible is the least appealable one.

I thought the main way we expected you would get worse names was that people who enable strict memory safety wouldn’t bother calling things “unsafe” again because they would pay less attention to the unsafe world. If authors are asked to specifically cater to the unsafe crowd, it seems we are no longer in that situation, but rather in the case where you need a different name anyway to disambiguate with existing safe features. What am I missing?

xwu · December 30, 2024, 12:44am

Yes that’s the primary scenario which would be counterproductive in my view: this new opt-in mode mustn’t nudge the default language mode in the direction of being worse off than the status quo, including when using strict memory safety libraries.

After all, we are staking a position that the default language mode is the right default, including for new code. As libraries (including of course the standard library) opt to support strict memory safety, if our default is correct and, say, 99% of “leaf” clients (apps, server executables, etc.) continue to be most suited for the default language mode, then any slight regression in helping users know when they’re using unsafe APIs in that default language mode offsets the benefit of all of this work to a very large degree.

j-f1 · January 5, 2025, 2:49pm

Is @exclusivity(unchecked) considered an unsafe construct since it removes runtime checks, or is it the case that strict concurrency checking will properly diagnose all possible exclusivity violations at compile time? (And should the mode that diagnoses unsafe constructs therefore require the Swift 6 language mode?)

Douglas_Gregor · January 7, 2025, 9:06pm

Yes, it should be considered an unsafe construct. The strict concurrency model isn't sufficient here; you can create dynamic exclusivity violations with classes in single-threaded code. I'll update the proposal, thank you!

Doug

geoffreygaren · January 9, 2025, 12:28am

I don’t think it’s sound to allow an @unsafe function override (or protocol conformance), when the base function was declared safe (even if the overriding type is marked @unsafe).

If we allow it, we create a system in which an unbounded number of call sites might invoke an unsafe function with no local acknowledgement that they’re doing so. Therefore, we do not meet our stated goal to “identify those places in Swift code that make use of unsafe language constructs and APIs”.

Pragmatically, @unsafe means that it is the caller’s job to ensure some set of outside-the-language invariants. Without local knowledge at the call site that such invariants exist, there’s no way for the caller to ensure them.

You could argue that this problem is the “fault” of the @safe(unchecked) function that returned an unsafe type implicitly up-casted to its safe base type. However, the only purpose of this feature is to enable such a function; and given that such a function cannot meet our stated goal, it is not desirable to enable it.

I propose instead that, in strictly safe code, an @unsafe override to a safe function is an error. This is identical to the problem of an override changing a function return type: In both cases, the override produces a change of requirements for the caller, so it is no longer possible to represent a single call site that can call both functions. The solution to this error is to add code inside the override to verify its outside-the-language invariants.

And what if that solution is impossible? What if (a) only the caller can verify the callee’s outside-the-language invariants and (b) the caller has no way to know that the callee has outside-the-language invariants? Well, that’s a contradiction. The only rational outcome is an error.

geoffreygaren · January 9, 2025, 12:34am

I think the "### Unsafe compiler flags" section needs to include

-enforce-exclusivity=unchecked
-enforce-exclusivity=none
SWIFT_STRICT_CONCURRENCY != complete (i.e., not enabling Swift 6)

The first two break lifetime and bounds safety. The last one breaks thread safety.

geoffreygaren · January 9, 2025, 12:59am

I agree with the jist of @fclout's feedback:

..."a human reviewing the code should have a clear, definitive understanding of what is unsafe"), which the current implementation only does at a very coarse granularity.

A function or type level opt out of safety is too coarse grained to meet our stated goal to “identify those places in Swift code that make use of unsafe language constructs and APIs”.

For context, Rust originally implemented function-level unsafety granularity, but came to regret it. I don't mean to say that we should copy Rust; but I do think that we should learn from their experience.

You could argue that it is bad style to write large unsafe functions or types, but:

I have found in code review that it only takes a few lines of code before I lose track of which named variables have safe types and which have unsafe types -- and when an expression uses type inference, or temporaries, or a type that has some safe interfaces and some unsafe interfaces, I actually don't know at all which expressions are safe or unsafe.
If we are really going to stand behind writing small unsafe functions and types as an essential component of our memory safety strategy, then we need a proposal for enforcing a maximum character and/or line length (of course, nobody wants that -- there are other, better options to improve granularity here, but that's what a self-consistent position would require)

This looks pretty nice to me:

`ret = unsafe proc_name(pid, &buffer, UInt32(buffer.count))`

I would like this even better, if it's feasible:

`ret = proc_name(pid, unsafe &buffer, UInt32(buffer.count))`

What I like about the second formulation is that it draws attention to the fact that what I need to verify is the use of buffer as an unsafe pointer in this argument context. (In contrast, if the whole call were labeled unsafe, I would expect that to mean that the function itself had unsafe semantics, and I needed to read about and reason about its preconditions and/or side effects.)

fclout · January 9, 2025, 4:59pm

I think that we align on this. When it comes to the second formulation, the current proposal is that all functions that take an unsafe pointer need to be @unsafe themselves. To accept proc_name(unsafe &buffer) when both &buffer is unsafe (because it has a reference-to-pointer conversion) and proc_name is also unsafe (because it takes a pointer), at least one of the following things has to happen:

we need to accept that some functions that take an unsafe pointer are not themselves unsafe;
we need to create exceptions to the rule that @unsafe functions and features must be enclosed in an unsafe expression.

IMO, it's not worth changing either to get more specific unsafe expressions.

Joe_Groff · January 9, 2025, 5:05pm

One way to approach this would be to avoid considering types themselves as wholly unsafe or not, and only judge operations as unsafe. That would allow for pointers to exist and be passed through safe code and only infect unsafety on operations that actually perform load/store/arithmetic/other unsafe operations on them.

fclout · January 9, 2025, 5:31pm

The proposal already allows you to make specific operations @unsafe instead of making an entire type @unsafe and it's a conscious choice (with which I agree) that Unsafe*Pointer types are @unsafe at the type level. Copying or moving unsafe pointers are unsafe operations, and since Swift doesn't syntactically expose copy and move constructors, the only way we have to encode that copying and moving is dangerous is to put @unsafe at the type level.

Karl · January 9, 2025, 5:38pm

How do you figure?

What are the preconditions that must be met by the programmer for those operations to not encounter undefined behaviour?

fclout · January 9, 2025, 5:47pm

The precondition is that the backing storage has to stay alive until the last time the pointer is dereferenced. This is not encoded in the type and not knowable at runtime in the general case, so there's no way to know aside from reading the source.

There's other preconditions, but they don't tend to cause memory corruption as often (for instance, you can't read from your pointer as type A while you also write to it as type B).

Karl · January 9, 2025, 5:54pm

I don't believe the backing storage needs to be alive for either of those operations to have defined semantics - I mean, UnsafePointer<T> is BitwiseCopyable after all, so I don't think it can depend on what lies at the end of the pointer. I don't believe aliasing affects the semantics of pointer copies, either.

There are times where you may want to refer to a pointer before it is alive - for instance, mmap allows you to specify a starting address. You may want to create pointers to that address before it is live, and while that is unsafe (you need to make sure not to read before it is mapped), I don't think it is undefined behaviour to construct or copy those pointer values.

fclout · January 9, 2025, 6:50pm

"It is undefined behavior" and "it is safe/unsafe" are different concepts. "It is safe" is used here to mean "it never leads to memory corruption", and "it is unsafe" means "you are in charge of verifying that the invariants hold to prevent memory corruption".

You might as well say that a copyable struct File { let fd: CInt } is always valid because it's BitwiseCopyable. The better understanding is that assuming an existing, correctly-initialized File (or UnsafePointer), you can safely pass it down to other users so long as they commit to not escaping the scope in which it is existing and correctly-initialized. Copy and move are the two operations that escape values. Hence, copy and move are unsafe operations when they exist despite a general requirement that the value cannot escape. (A move-only type would make things different, because the copy operation wouldn't exist and we would assume that the value would clean up after itself after its last use; this is specifically about copy and move when applied to types that are both copyable and movable.)

You are correct that an additional precondition on UnsafePointer is that you know whether it's in an initialized state or not. This makes even borrows an unsafe operation (even when you start off with a correctly-initialized value) because UnsafePointer allows you to interrupt the lifetime of its pointee at any time.

"There are times where you may want to refer to a pointer before it is alive" is an argument that pointers are unsafe but that maybe there exists a way to use the correctly, which, it seems to me, validates what I'm saying.

Joe_Groff · January 9, 2025, 7:01pm

Copying and moving a pointer doesn't ever by itself introduce undefined behavior, though (at least on our existing platforms; perhaps a CHERI-style architeture might eagerly blow up if you touch an expired or uninitialized pointer value?)

fclout · January 9, 2025, 7:12pm

This is technically correct, but I think it misses the point. If there was a way to check at the point you load from or store to a pointer that its lifetime was valid in the way that we can check that bounds are valid, I would agree that we don't need to think of copy and move operations as safe or not, and we could punt all unsafety onto unchecked load/store operations. However, lifetime safety can't be checked at runtime in the general case. You make lifetime safety problems go away by proving the pointer doesn't escape, so operations that can escape the pointer (as opposed to passing the pointer as an inout or borrow, which don't involve semantic moves or copies) have to be considered unsafe.

(Edit: in other words, the copy/move operation is the last operation in a sequence leading to memory corruption where we have a chance of catching a lifetime issue. If we don't take that opportunity, we should consider it unsafe.)