[Prospective vision] Optional Strict Memory Safety for Swift

Douglas_Gregor · October 1, 2024, 9:54pm

Hello Swift Community,

The Language Steering Group would like to gather feedback on a prospective vision for optional strict memory safety for Swift. Vision documents help describe an overall direction for Swift. The actual Swift changes for executing on the vision will come as a series of separate proposals, so concrete details (e.g., specific syntax, API names, etc.) are less important than the overall direction. There is more information about the role of vision documents in the evolution process here.

The text of the proposed vision follows.

[Prospective vision] Optional Strict Memory Safety for Swift

Swift is a memory-safe language by default, meaning that the major language features and standard library APIs are memory-safe. However, it is possible to opt out of memory safety when it’s pragmatic using certain “unsafe” language or library constructs. This document proposes a path toward an optional “strict” subset of Swift that prohibits any unsafe features. This subset is intended to be used for Swift code bases where memory safety is an absolute requirement, such as security-critical libraries.

Introduction

Memory safety is a popular topic in programming languages nowadays. Essentially, memory safety is a property that prevents programmer errors from manifesting as undefined behavior at runtime. Undefined behavior effectively breaks the semantic model of a language, with unpredictable results including crashes, data corruption, and otherwise-impossible program states. Much of the recent focus on memory safety is motivated by security, because memory safety issues offer a fairly direct way to compromise a program: in fact, the lack of memory safety in C and C++ has been found to be the root cause for ~70% of reported security issues in various analyses [1][2].

Memory safety in Swift

While there are a number of potential definitions for memory safety, the one provided by this blog post breaks it down into five dimensions of safety:

Lifetime safety: all accesses to a value are guaranteed to occur during its lifetime. Violations of this property, such as accessing a value after its lifetime has ended, are often called use-after-free errors.
Bounds safety: all accesses to memory are within the intended bounds of the memory allocation, such as accessing elements in an array. Violations of this property are called out-of-bounds accesses.
Type safety: all accesses to a value use the type to which it was initialized, or a type that is compatible with that type. For example, one cannot access a String value as if it were an Array. Violations of this property are called type confusions.
Initialization safety: all values are initialized prior to being used, so they cannot contain unexpected data. Violations of this property often lead to information disclosures (where data that should be invisible becomes available) or even other memory-safety issues like use-after-frees or type confusions.
Thread safety: all values are accessed concurrently in a manner that is synchronized sufficiently to maintain their invariants. Violations of this property are typically called data races, and can lead to any of the other memory safety problems.

Since its inception, Swift has provided memory safety for the first four dimensions. Lifetime safety is provided for reference types by automatic reference counting and for value types via memory exclusivity; bounds safety is provided by bounds-checking on Array and other collections; type safety is provided by safe features for casting (as?, is) and enums; and initialization safety is provided by “definite initialization”, which doesn’t allow a variable to be accessed until it has been defined. Swift 6’s strict concurrency checking extends Swift’s memory safety guarantees to the last dimension.
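As a rough illustration of three of these mechanisms, here is a minimal sketch. The function names describe and element(at:in:) are hypothetical and chosen only for this example; they are not part of the vision document or the standard library.

```swift
// Type safety and initialization safety in one function.
func describe(_ value: Any) -> String {
    // Definite initialization: `label` is declared without a value, and the
    // compiler rejects the `return` unless every path below assigns it.
    let label: String
    if let number = value as? Int {
        // `as?` produces nil on a type mismatch instead of reinterpreting memory.
        label = "Int \(number)"
    } else if let text = value as? String {
        label = "String \(text)"
    } else {
        label = "something else"
    }
    return label
}

// Bounds safety: subscripting an Array out of range traps at run time, so a
// caller that wants a recoverable result checks the index first.
func element(at index: Int, in values: [Int]) -> Int? {
    return values.indices.contains(index) ? values[index] : nil
}
```

Calling element(at: 5, in: [10, 20, 30]) returns nil, whereas writing [10, 20, 30][5] directly would stop the program rather than read past the end of the array.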

Providing memory safety does not imply the absence of run-time failures. Good language design often means defining away runtime failures in the type system. However, memory safety requires only that an error in the program cannot be escalated into a violation of one of the safety properties. For example, having reference types be non-nullable by default defines away most problems with NULL pointers. With explicit optional types, the force-unwrap operator (postfix !) meets the definition of memory safety by trapping at runtime if the unwrapped optional is nil. The standard library also provides the unsafelyUnwrapped property that does not check for nil in release builds: this does not meet the definition of memory safety because it admits violations of initialization and lifetime safety that could be exploited.
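The distinction between "safe" and "never fails" can be sketched as follows. The function names and the fallback value of 80 are assumptions made up for this example.

```swift
// Recoverable handling: a failed parse becomes a default value.
func parsePort(_ text: String) -> Int {
    let parsed: Int? = Int(text)   // nil when `text` is not a valid integer
    if let port = parsed {
        return port
    }
    return 80
}

// Memory-safe but not failure-free: if `Int(text)` is nil, the postfix `!`
// traps deterministically instead of continuing with garbage data.
func parsePortOrTrap(_ text: String) -> Int {
    return Int(text)!
}

// By contrast, `Int(text).unsafelyUnwrapped` skips the nil check in
// optimized builds, so it is deliberately not demonstrated here.
```

Both functions stay within the definition of memory safety given above; they differ only in whether the failure is recoverable.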

Unsafe code

Swift is a memory-safe language by default, meaning that the major language features and standard library APIs are memory-safe. However, there exist opt-outs that allow one to write memory-unsafe code in Swift:

Language features like unowned(unsafe) and nonisolated(unsafe) that disable language safety features locally.
Library constructs like UnsafeMutableBufferPointer or unsafeBitCast(to:) that provide lower-level access than existing language constructs provide.
Interoperability with C-family APIs, which are implemented in a non-memory-safe language and tend to traffic in unsafe pointer types.

The convention of using unsafe or unchecked in the names of unsafe constructs works fairly well in practice: memory-unsafe code in Swift tends to stick out because of the need for withUnsafe<...> operations, and for large swaths of Swift code there is no need to reach down for the unsafe APIs.

However, the convention is not entirely sufficient for identifying all Swift code that makes use of unsafe constructs. For example, it is possible to call the C memcpy directly from Swift as, e.g., memcpy(&to, &from, numBytes), which can easily violate memory-safety along any dimension: to and from might be arrays with incompatible types, the number of bytes might be incorrect, etc. However, “unsafe” or “unchecked” do not appear in this code except as the (unseen) type of the parameters to memcpy.
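A small sketch of this situation, assuming a platform where importing Foundation makes the C library's memcpy visible (as it does on Apple platforms and Linux). The wrapper function copyBytes is hypothetical.

```swift
import Foundation

// Copies `source` into a new array by calling the C `memcpy` directly.
// Note that the word "unsafe" appears nowhere in this function, even though
// the arrays are implicitly converted to raw pointers at the call site.
func copyBytes(from source: [UInt8]) -> [UInt8] {
    let byteCount = source.count
    var destination = [UInt8](repeating: 0, count: byteCount)
    guard byteCount > 0 else { return destination }
    // This call is correct only because both arrays hold exactly `byteCount`
    // bytes. Nothing checks that: a larger third argument would read and
    // write past the end of both buffers.
    memcpy(&destination, source, byteCount)
    return destination
}
```

The call happens to be correct here, but only by the programmer's care, which is the gap the naming convention cannot close on its own.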

Moreover, some tasks require lower-level access to memory that is only expressible today via the unsafe pointer types, meaning that one must choose between using only safe constructs, or having access to certain APIs and optimizations. For example, all access to contiguous memory requires an UnsafeMutableBufferPointer, which compromises on both lifetime and bounds safety. However, it fulfills a vital role for various systems-programming tasks, including interacting directly with specialized hardware or using lower-level system libraries written in the C family.

Strictly-safe subset of Swift

Swift’s by-default memory safety is a pragmatic choice that provides the benefits of memory safety to most Swift code while not requiring excessive ceremony for those places where some code needs to drop down to use unsafe constructs. However, there are code bases where memory safety is more important than programmer convenience, such as security-critical subsystems that handle untrusted data or execute with elevated privileges in an OS.

For such code bases, it’s important to ensure that the code is staying within the strictly-safe subset of Swift. This can be accomplished with a compiler option that produces an error for any use of unsafe code, whether it’s an unsafe language feature or unsafe library construct. Any code written within this strictly-safe subset also works as “normal” Swift and can interoperate with existing Swift code.

The compiler would flag any use of the following unsafe language features:

@unchecked Sendable
unowned(unsafe)
nonisolated(unsafe)
unsafeAddressor, unsafeMutableAddressor

In addition, an @unsafe attribute would be added to the language and would be used to mark any declaration that is unsafe to use. In the standard library, the following functions and types would be marked @unsafe:

Unsafe(Mutable)(Raw)(Buffer)Pointer
(Closed)Range.init(uncheckedBounds:)
OpaquePointer
CVaListPointer
Unmanaged
unsafeBitCast, unsafeDowncast
Optional.unsafelyUnwrapped
UnsafeContinuation, withUnsafe(Throwing)Continuation
UnsafeCurrentTask
Mutex's unsafeTryLock, unsafeLock, unsafeUnlock
VolatileMappedRegister.init(unsafeBitPattern:)
The subscript(unchecked:) introduced by the Span proposal.

Any use of these APIs would be flagged by the compiler as a use of an unsafe construct. In addition to the direct @unsafe annotation, any API that uses an @unsafe type is considered to itself be unsafe. This includes C-family APIs that use unsafe types, such as the aforementioned memcpy that uses Unsafe(Mutable)RawPointer in its signature:

func memcpy(
  _: UnsafeMutableRawPointer?,
  _: UnsafeRawPointer?,
  _: Int
) -> UnsafeMutableRawPointer?

The rules described above make it possible to detect and report the use of unsafe constructs in Swift.

An @unsafe function should be allowed to use other unsafe constructs without emitting any diagnostics. However, there are also library functions that encapsulate unsafe behavior in a safe API, such as the standard library’s Array and Span, which are necessarily built from unsafe primitives. Such functions need some way to acknowledge the unsafe behavior while still being considered safe from the outside, such as an unsafe { ... } code block or a @safe(unchecked) attribute.
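To make the idea of encapsulation concrete, here is a minimal sketch of a type that is built from an unsafe pointer but exposes only checked operations. FixedIntBuffer is a hypothetical type invented for this example, and it is written in today's Swift: it does not use the unsafe { ... } or @safe(unchecked) spellings floated above, since those are only suggestions in this vision. The sketch addresses lifetime, bounds, and initialization safety and makes no claim about thread safety.

```swift
// A fixed-size buffer of Ints whose unsafe storage never leaves the type.
final class FixedIntBuffer {
    private let storage: UnsafeMutablePointer<Int>
    let count: Int

    init(count: Int) {
        precondition(count >= 0, "count must not be negative")
        self.count = count
        storage = UnsafeMutablePointer<Int>.allocate(capacity: count)
        // Initialization safety: every slot holds a defined value before use.
        storage.initialize(repeating: 0, count: count)
    }

    deinit {
        // Lifetime safety: the memory is released only when no reference to
        // the owning object remains, and the pointer is never handed out.
        storage.deinitialize(count: count)
        storage.deallocate()
    }

    subscript(index: Int) -> Int {
        get {
            // Bounds safety: checked in every build configuration.
            precondition(index >= 0 && index < count, "index out of range")
            return storage[index]
        }
        set {
            precondition(index >= 0 && index < count, "index out of range")
            storage[index] = newValue
        }
    }
}
```

A caller can only reach the memory through the checked subscript, so the unsafe operations are confined to the few lines inside the class that would need auditing.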

The following sections describe language features and library constructs that improve on what can be expressed within the strictly-safe subset of Swift. These improvements will also benefit Swift in general, making it easier to correctly work with contiguous memory and interoperate with APIs from the C family of languages.

Accessing contiguous memory

Nearly every “unsafe” language feature and standard library API described in the previous section already has safe counterparts in the language: safe concurrency patterns via actors and Mutex, safe casting via as?, runtime-checked access to optionals (via !) and continuations (withChecked(Throwing)Continuation), and so on.

One of the primary places where this doesn’t hold is with low-level access to contiguous memory. Even with ContiguousArray, which stores its elements contiguously, the only way to access elements is either one-by-one (e.g., subscripting) or to use an operation like withUnsafeBufferPointer that provides temporary access to the storage via an Unsafe(Mutable)BufferPointer argument to a closure. These APIs are memory-unsafe along at least two dimensions:

Lifetime safety: the unsafe buffer pointer should only be used within the closure, but there is no checking to establish that the pointer does not escape the closure. If it does escape, it could be used after the closure has returned and the pointer could have effectively been “freed.”
Bounds safety: the unsafe buffer pointer types do not perform bounds checking in release builds.
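For reference, a correct use of the existing closure-based API looks roughly like the sketch below. The function name sum is an assumption for this example; withUnsafeBufferPointer is the standard library method discussed above.

```swift
// Sums an array by iterating its contiguous storage directly.
func sum(_ values: [Int]) -> Int {
    return values.withUnsafeBufferPointer { (buffer: UnsafeBufferPointer<Int>) -> Int in
        var total = 0
        // `buffer` is valid only inside this closure. Nothing in the type
        // system stops it from being stored somewhere and used later, and
        // `buffer[buffer.count]` would not be checked in release builds.
        for value in buffer {
            total += value
        }
        return total
    }
}
```

The code is correct because it iterates rather than indexing by hand and lets only the computed Int leave the closure, but both properties are conventions the programmer has to uphold.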

Non-escapable types provide the ability to create types whose instances cannot escape out of the context in which they were created with no runtime overhead. Non-escapable types allow the creation of a memory-safe counterpart to the unsafe buffer types, proposed under the name Span. With Span, it becomes possible to access contiguous memory in an array in a manner that maintains memory safety. For example:

myInts.withSpan { span in
  globalSpan = span // error: span value cannot escape the closure
  print(span[myInts.count]) // runtime error: out-of-bounds access
  return span.first ?? 0
}

Lifetime dependencies can greatly improve the expressiveness of non-escaping types, providing the ability to work with types like Span without requiring deeply-nested with blocks. Additionally, they make it possible to build more complex data structures out of non-escaping types, extending Swift’s capabilities while maintaining memory safety.

Expressing memory-safe interfaces for the C family of languages

The C family of languages do not provide memory safety along any of the dimensions described in this document. As such, a Swift program that makes use of C APIs is never fully “memory safe” in the strict sense, because any C code called from Swift could undermine the memory safety guarantees Swift is trying to provide. Requiring that all such C code be rewritten in Swift would go against Swift’s general philosophy of incremental adoption into existing ecosystems. Therefore, this document proposes a different strategy: code written in Swift will be auditably memory-safe so long as the C APIs it uses follow reasonable conventions with respect to memory safety. As such, writing new code (or incrementally rewriting code from the C family) will not introduce new memory safety bugs, so that adopting Swift in an existing code base will incrementally improve on memory safety.

In the C family of languages, the primary memory safety issue for APIs is the widespread use of pointers that have neither lifetime annotations (who owns the pointer?) nor bounds annotations (how many elements does it point to?). As such, the pointers used in C APIs are reflected in Swift as unsafe pointer types, as shown above with memcpy.

Despite the lack of this information, C APIs often follow a reasonable set of conventions that make them usable in Swift without causing memory-safety problems. Swift has a long history of utilizing annotations in C headers to describe these conventions and improve the projection of C APIs into Swift, including:

Nullability annotations (_Nullable, _Nonnull) that describe what values can be NULL, and affect whether a C type is reflected as optional in Swift.
Non-escaping annotations (e.g., __attribute__((noescape))) on function/block pointer parameters, which results in them being imported as non-escaping function parameters.
@MainActor and Sendable annotations on C APIs that support Swift 6’s data-race safety model.

To provide safer interoperability with C APIs, additional annotations can be provided in C that Swift can use to project those C APIs into Swift APIs without any use of unsafe pointers. For example, the Clang bounds-safety attributes allow one to express when a C pointer’s size is described by another value:

double average(const double *__counted_by(N) ptr, int N);

Today, this function would be projected into a Swift function like the following:

/*@unsafe*/ func average(_ ptr: UnsafePointer<Double>!, _ N: CInt) -> Double

However, Swift could use the __counted_by attribute to provide a more convenient API that bundles the pointer and count together, e.g.,

/*@unsafe*/ func average(_ ptr: UnsafeBufferPointer<Double>) -> Double

Now, a Swift caller that passes a local Double array would not need to pass the count separately, and cannot get it wrong:

var values = [3.14159, 2.71828]
average(values) // ok, no need to pass count separately

This call is still technically unsafe, because we’re passing a temporary pointer into the array’s storage down to the average function. That function could save that pointer into some global variable that gets accessed some time after the call, causing a memory safety violation. The actual implementation of average is unlikely to do so, and could express this constraint using the existing noescape attribute as follows:

double average(const double *__counted_by(N) __attribute__((noescape)) ptr, int N);

The average function is now expressing that it takes in a pointer to N double values but will not retain the pointer beyond the call. These are the semantic requirements needed to provide a memory-safe Swift projection as follows:

func average(_ ptr: Span<Double>) -> Double
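The difference between the first two projections of average can be sketched in today's Swift with stand-in functions. Both functions below are hypothetical and written in Swift rather than imported from C; they only mimic the shapes of the signatures shown above, and the choice to return 0 for empty input is an assumption of this sketch. The Span-based form is left out because it depends on the proposed type.

```swift
// Shape of today's projection: the pointer and the count travel separately,
// so the caller can pass a count that does not match the storage.
func averageSeparate(_ ptr: UnsafePointer<Double>, _ n: Int) -> Double {
    guard n > 0 else { return 0 }
    var total = 0.0
    for i in 0..<n {
        total += ptr[i]   // unchecked: trusts that `n` is accurate
    }
    return total / Double(n)
}

// Shape of the bundled projection: the buffer pointer carries its own count,
// so there is no second argument to get wrong.
func averageBundled(_ buffer: UnsafeBufferPointer<Double>) -> Double {
    guard !buffer.isEmpty else { return 0 }
    return buffer.reduce(0.0, +) / Double(buffer.count)
}
```

With let values = [3.0, 5.0], the first form is called as averageSeparate(values, values.count) and the second as values.withUnsafeBufferPointer { averageBundled($0) }. Both still hand out a temporary pointer into the array's storage; bundling removes the count mismatch, not the lifetime concern.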

More expressive Swift lifetime features can also have corresponding C annotations, allowing more C semantics to be reflected into safe APIs in Swift. For example, consider a C function that finds the minimal element in an array and returns a pointer to it:

const double *min_element(const double *__counted_by(N) __attribute__((noescape)) ptr, int N);

The returned pointer will point into the buffer passed in, so its lifetime is tied to that of the pointer argument. The aforementioned lifetime dependencies proposal allows this kind of dependency to be expressed in Swift, where the resulting non-escaping value (e.g., a Span containing one element) has its lifetime tied to the input argument.

C++ offers a number of further opportunities for improved safety by modeling lifetimes. For example, std::list<T> has a front() method that returns a reference to the element at the front of the list:

T& front();

The returned reference is valid so long as the list is valid, i.e., its lifetime depends on the this parameter. Describing that lifetime dependency in C++ can lead to a safe mapping of this API into Swift without the need to introduce an extra copy of the element.

sveinhal · October 2, 2024, 6:58am

My understanding of the ABI is vague at best, so this question may be nonsensical:

Could this be somehow encoded into the ABI so that a project that is compiled with strict memory safety, would not be able to call into linked libraries that aren't?

That is, will functions that use unsafe constructs, be themselves @unsafe, even in a non-strict compilation, so that calling them later from a strict context will produce a warning/error?

Sajjon · October 2, 2024, 7:08am

Can you please clarify this a bit: where would one write unsafe { ... } or @safe(unchecked)? Is that something that the Swift Standard Library itself has to make use of, which does not propagate (like throws -> try propagates)? Or do you mean that any caller of those Standard Library APIs (Array / Span) in Swift authors' code would need to wrap calls to the Standard Library with unsafe { ... }? You probably mean the former, because if the latter, then that would propagate again, right?

fclout · October 2, 2024, 2:50pm

I believe this would be a non-goal. The objective is to ensure that a project does not introduce memory safety bugs. The security community recognizes that most memory safety bugs that are exploited in the wild come from new code. Therefore, the first step towards ending the supply of memory safety bugs is to stop writing new memory-unsafe code.

bbrk24 · October 2, 2024, 3:00pm

Or until they remove the element from the list, right? This could be a call to pop_front(), or calling pop_back() enough times, or probably certain things in the algorithms header...

cocoaphony · October 2, 2024, 3:19pm

We are right in the midst of a highly disruptive safety move in the form of strict concurrency and Swift 6. I believe we should absorb the lessons of that effort before we take a step onto another of these arcs. I know this is optional, but so is Swift 6, and it still is incredibly disruptive to the community. I'm not saying it's not valuable. It really, really is. But it is also exhausting.

We need time and core-team focus to bring that concurrency effort to a stable point, learn its lessons, and then apply them to a new safety vision. I'm asking to delay this vision until we can do that. I want the team to be focused on nailing down what we have before continuing down the path. This seems a distraction from that key mission.

Please not yet.

When we are ready to proceed, a key thing I want in these early vision documents is what kind of impact it is expected to have on key groups. Will this require new annotations across Foundation? Combine? Is it expected to make popular libraries (including Apple frameworks) unusable within the subset? If popular libraries make use of this subset, will they still be easily consumable (without warnings) by projects that don't opt-in? Is this expected to impact back-deployment to older OSes (Apple and otherwise)? Do we expect these annotations to start showing up in app-level code, or are they expected to live entirely in low-level modules?

I know this is vision, and we don't have or want full APIs yet. But we should have from the beginning a sense of the blast radius and how long it will take to stabilize.

What is the canonical use case? Are there specific teams or projects that should be considered an obvious customer? Does this vision align with their actual (rather than theorized) needs? How will we be able to tell, 5 years from now, if this vision was successful?

I don't know about everyone else right now, but Swift concurrency has exhausted me, and I feel it is still shifting under my feet. I want every major effort right now to be about addressing that, and then use those lessons to launch into this next chapter.

austintatious · October 2, 2024, 8:50pm

Big agree

KeithBauerANZ · October 3, 2024, 1:06am

A little confused at the approach here.

Rust does well with unsafe blocks to delimit known-unsafe code, and everything else being safe-by-default. Swift could certainly go the same way (requiring an unsafe block to access any of the types currently named with Unsafe in their name, as well as any bridged C/C++/ObjC API, with potential scope to annotate known-safe APIs at import). This approach also allows individual targets to opt into strict safety via a #![deny(unsafe_code)] attribute, which only disallows usage of the unsafe keyword. Lint tools get to key off the unsafe keyword to ensure that comments document why a particular unsafe block is presumed safe in context, too, and the keyword provides a nice "speed bump" forcing consideration on the way to making a crash.

Honestly I presumed we were on our way here anyway — it seems the obvious next destination after Swift 6, and given the direction with Span, etc. I'm only unsure why it wasn't done in the early days of Swift, given Rust's unsafe block was certainly known back then.

What's to gain from making this a separate mode, other than further division of developers and codebases?

fclout · October 3, 2024, 3:23am

“Different mode” here is more or less a glorified diagnostic switch. The alternative is to break source compatibility for everyone using unsafe code today, which would be onerous for a feature that most projects already existing today are not expected to enable.

KeithBauerANZ · October 3, 2024, 3:55am

Why would you assume most projects don't want this? My gut instinct is that most projects do want this — they're not exactly full of Unsafe types and functions anyway, and a nice easy safeguard against accidentally introducing unsafety is probably going to be quite popular.

Failing that, efforts like Hummingbird will want it ('cos memory unsafety in a webserver is "you getting hacked"), and pressure from big projects which do want it will push the whole ecosystem in this direction.

I guess the big "unknown" is "Apple's future plans for its SDK", but I would presume that given they're all-in on Swift, whilst legacy ObjC frameworks might never be safe, SwiftUI and essentially all new APIs would be...

It feels strangely noncommittal, to me. To have got to 99% memory safety in Swift 6, and then say "the last 1% is a niche interest", rather than "the last 1% is occasionally and unfortunately unavoidable, but deserves a first-class audit trail"

Jon_Shier · October 3, 2024, 4:20am

If you can’t use anything unsafe, and your dependencies can’t use anything unsafe, and their dependencies can’t, ad infinitum, that rules out most existing Swift code. Unsafe constructs are the only way to get good performance in many situations, and we see that across the Swift ecosystem. Banning unsafe constructs is an extremely high bar, one that the vast majority of code never needs to clear.

KeithBauerANZ · October 3, 2024, 4:30am

I mean, I suspect that most users don't want to 100% forbid it in all dependencies (as you say, it's a critical technique, and taking that to extremes would also ban the stdlib!) But like Rust has found, people will want an audit trail for unsafe code — something that can be searched for, linted against, etc. People will want to know it's not being used gratuitously.

sveinhal · October 3, 2024, 10:36am

What is a non-goal? Either I don't understand you, or you don't understand me. I'm asking to make it harder to call unsafe functions by always coloring them @unsafe, even in non-strict compilation modes.

Vinicius_Vendramini · October 3, 2024, 12:08pm

I think I have an issue with the part about C APIs.

From my understanding of the document, the idea is that someone making an app or a library that needs strict memory safety would also want to ensure the libraries they use adhere to this strict mode. However, allowing library authors to annotate C APIs in a way that makes the Swift compiler view them as safe without checking could break this safety guarantee. If the C code is unsafe (because of a coding mistake) but is annotated as safe, that introduces an issue that its users wouldn’t know about.

I would maybe suggest:

Adding some granularity to the strict mode, so users can decide if they want to only allow code that the compiler checked or if they want to allow user-checked C code, etc.
Making it easy to trace user-checked code, maybe by propagating an @unchecked flag or something, so that users can avoid that code if they want to, and so that it doesn’t get introduced in libraries they use without their knowledge.

xwu · October 3, 2024, 2:25pm

So you're suggesting a toggle to disable C interop?

fclout · October 3, 2024, 3:46pm

It is a non-goal to prevent safe Swift code from calling into unsafe libraries.

The C family of languages do not provide memory safety along any of the dimensions described in this document. As such, a Swift program that makes use of C APIs is never fully “memory safe” in the strict sense, because any C code called from Swift could undermine the memory safety guarantees Swift is trying to provide. Requiring that all such C code be rewritten in Swift would go against Swift’s general philosophy of incremental adoption into existing ecosystems. Therefore, this document proposes a different strategy: code written in Swift will be auditably memory-safe so long as the C APIs it uses follow reasonable conventions with respect to memory safety. As such, writing new code (or incrementally rewriting code from the C family) will not introduce new memory safety bugs, so that adopting Swift in an existing code base will incrementally improve on memory safety.

This is not a language feature that guarantees that your code is transitively memory-safe. It guarantees that your code is not introducing memory unsafety.

dnadoba · October 3, 2024, 4:28pm

I'm really excited to see this! swift-asn1 and swift-certificates fit the bill for this feature and would want to adopt Strict Memory Safety. These are libraries that parse untrusted binary data, and Memory Safety is critical, as we can see from security issues in other libraries in this space.

swift-asn1 has only three unsafe functions that make use of UnsafeBufferPointer<UInt8> (here, here and here) that could entirely be replaced with the proposed Span type.

swift-certificates has only two unsafe load operations that can even today be rewritten using the safe integer parsing we use in the rest of the project. This is a good example of an accidental and unnecessary use of unsafe Swift that could have been prevented by the proposed Strict Memory Safety mode.

Both libraries will benefit in the long run from making sure unsafe Swift isn't accidentally used while the library evolves. This will make contributing and reviewing PRs smoother, and new contributors will get early feedback about their code before opening a PR. They will not start going down the unsafe path just to hear from us after opening a PR that this is not allowed.
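For readers unfamiliar with what "safe integer parsing" in place of an unsafe load might look like, here is a generic sketch. It is not the code from swift-certificates or swift-asn1; the function name, the big-endian byte order, and the choice of UInt32 are all assumptions made for this example.

```swift
// Reads a big-endian UInt32 starting at `offset`, or returns nil when fewer
// than four bytes are available. Only bounds-checked array operations are
// used, so a bad offset cannot read outside `bytes`.
func readUInt32BigEndian(_ bytes: [UInt8], at offset: Int) -> UInt32? {
    guard offset >= 0, offset <= bytes.count, bytes.count - offset >= 4 else {
        return nil
    }
    var value: UInt32 = 0
    for byte in bytes[offset ..< offset + 4] {
        // Shift the accumulated value up one byte and append the next byte.
        value = (value << 8) | UInt32(byte)
    }
    return value
}
```

The unsafe alternative would obtain a raw buffer pointer and load the integer from it, which pushes the length and alignment checks onto the author instead of the language.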

Vinicius_Vendramini · October 4, 2024, 2:57am

I’m honestly not sure what this could look like, it’s more of a vague idea.

It doesn’t though, right? If a library I use starts using unsafe C code, or if I start using it, I’m introducing memory unsafety even with the toggle on.

I’m not arguing that this should be banned or anything, only that if we’re discussing the vision for these features going forward, it might be worth considering cases where users also want to avoid potentially unsafe C interops.

taylorswift · October 4, 2024, 4:11am

i think the basic idea that security goes hand in hand with memory safety is correct. a lot of security issues involve unsafe memory usage in some way, and programs that use memory more safely have fewer security issues than code that makes heavy use of unsafe constructs.

this is an area where Swift has historically done well — like you say, Swift is a memory-safe language by default, and this guides developers towards writing programs that use fewer unsafe constructs and have fewer security issues.

but that’s not really what this vision is about, this vision is about defining guarantees that no unsafe constructs are being used anywhere in a stack and using static coloring to actively prevent code that lacks this guarantee from being used.

one way to think about this is to imagine that the likelihood of security issues scales proportionately to the amount of unsafe constructs being used. under this model, Swift (by being Swift) has already eliminated 99 percent of the occurrences of unsafe constructs, and the remaining 1 percent are tough joins where it is really impractical to avoid using unsafe constructs. sure, getting rid of that last 1 percent would make programs even slightly more secure, but that last 1 percent is much harder to achieve than the first 99 percent.

at this point, i think we should really step back and think about other rough edges in the language where it is really easy to write “clean” code that looks like it does the right thing but doesn’t — sometimes with catastrophic consequences!

these pitfalls are already well known to experienced Swift programmers, are perennial tripwires for teams, also count as potential attack vectors, and addressing them is less likely to involve a massively disruptive layer of annotations that needs to propagate through the ecosystem to reach people. for some of them, the fix is as simple as adding a compiler diagnostic.

these are low-effort, high-impact improvements we can make to the language, that can be tackled individually without needing a master plan integrating them all together, and would make Swift programs as a whole more secure than a narrow focus on safety proofs will.

fclout · October 4, 2024, 4:43am

It does guarantee that your code is not introducing memory unsafety, where "introducing" sort of means "taking responsibility for". If you have a memory-safe interface to a library dependency (i.e., it does not accept unsafe pointers, and it does not fundamentally defeat memory safety by its existence), even if the implementation of that library has a memory safety bug, then that bug can be fixed without you making any changes to your code. This model allows you, at scale, to make progress towards a system that is fully memory-safe.

You would be correct that your program, as a whole, could still contain memory safety bugs transitively. However, you would not be responsible for that bug, and over time, the likelihood that your program as a whole contains a memory safety bug will go down because (ideally) new code guarantees its own memory safety, and old code is gradually being displaced with memory-safe code.