[Prospective vision] Optional Strict Memory Safety for Swift

Finagolfin · October 4, 2024, 4:47am

That's great, hopefully you are already communicating these to the Swift team, privately where needed. Can you share some of the public ones in a thread here, say that diagnostic?

I don't think such simple "static coloring" really qualifies as "safety proofs."

My impression is that there is an overall effort underway by the Swift team on performance and C/C++ interoperability, both of which are needed by Apple and the broader software market that doesn't use Swift yet, while still maintaining security. One can certainly disagree with elements of how this particular vision implements those, but I don't think you can argue that this isn't where the market is going.

I think the Swift devs can do both, ie fix your list and work on a new feature like this, and the community can give feedback on this vision.

fclout · October 4, 2024, 5:23am

This is a defensible view, but it is visibly influenced by an app/server development background. Most developers in this area were brought up on memory-safe languages and respect unsafe constructs. As Swift is moving towards system-ey areas, it is being used more and more by people who are, for lack of a better description, very C-brained, sometimes with long careers behind them.

It is recognized with mild irony that being proficient at Rust makes you a better C++ engineer, because Rust instills good memory safety habits. The flipside is that C and C++ engineers who do not know Rust may have accumulated habits that are incompatible with consistent memory safety. This transfers to Swift too. When engineers who are primarily comfortable with C or C++ try out Swift, they tend to reach for unsafe constructs with a laissez-faire attitude. This is both an educational problem and a technical problem. Addressing situations where there is no memory-safe way forward is part of the solution, because sometimes you need unsafe pointers when you shouldn't have to; but sometimes people use unsafe pointers even though they don't have to. (You may have heard me harp on that every time there's a proposal to add something unsafe to the standard library.) Merely creating the tools isn't enough to ensure that they are being used when they should be.

I disagree that this is massively disruptive or high-effort. I think that you wouldn't have noticed if this was already implemented in the Swift compiler. With that said, this vision certainly does not preclude solutions to existing problems where people have to reach for unsafe constructs for lack of better options. For instance, the new annotations for C source will help people avoid using unsafe pointers to talk to C libraries (one very common reason that unsafe pointers are used), and there are, of course, separate ongoing efforts to give Swift better options to work with lifetime safety and raw data. The overarching vision would most likely welcome simple suggestions that are not already being tracked.

sveinhal · October 4, 2024, 7:43am

Then we agree! I asked (perhaps poorly) if libraries will correctly be marked as @unsafe to prevent unsafe code from being called from safe.

Or, in other words, marking functions as unsafe could be orthogonal to allowing unsafe functions to be called.

If we always mark @unsafe functions as such, regardless of compiler mode, there would be no safe/unsafe libraries, only safe/unsafe members (the same way some functions are async, throwing, etc). Then the only difference between strict memory safety and non-strict would be whether calling unsafe functions produces an error or not.

I'm suggesting that (or rather, asking if) compilation always perform this strict analysis and always color functions accordingly. That way, it is always possible to link against any library — even one with unsafe code in it! But calling unsafe functions would not be! Kinda similar to how calling async functions from non-concurrent contexts produces an error:

error: 'unsafe' call in a function that requires strict memory-safety

grynspan · October 4, 2024, 4:07pm

This was a topic that came up when we pitched temporary allocations, and I think it's a good addition to the language.

I wonder if this should be modelled as an effect like throws and async? Something like:

func dereferenceZero() unsafe -> Int {
  ...
}

let x = unsafely dereferenceZero()

tera · October 4, 2024, 4:34pm

FWIW: New function colour: unsafe

fclout · October 4, 2024, 4:35pm

Yes, the unsafe attribute attaches to declarations, not entire libraries. A declaration is unsafe if it has the @unsafe attribute or if it references a type that has the @unsafe attribute. (Mostly) anything else is fair game. You can have types that are not wholly @unsafe but that have a few properties or methods which are.

I think that the exact override mechanism (how to use unsafe symbols in a function declared safe) is still to-be-specified. This will trivially work when you don't enable strict memory safety: it's foundational that nothing changes for you when you don't enable it, and that means you need to be able to continue to expose safe functions that use unsafe symbols. Doug threw out there an unsafe { ... } syntax or a @safe(unchecked) declaration attribute that could be used in modules that enable strict memory safety. (I personally think there's also an option to not even do that, and require people to have a separate module that disables strict memory safety to wrap their unsafe operations. That wishfully could motivate engineers to create safe primitives instead of sprinkling unsafe code less diligently.)
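
As a rough illustration of "a safe function that uses unsafe symbols", here is a minimal sketch in today's Swift. The function name `byteSum(of:)` is hypothetical and not from the vision document; it only shows the shape of a safe primitive whose unsafe pointer never escapes the wrapper.

```swift
// Hypothetical example: a safe API whose body uses an unsafe buffer pointer.
// Callers only ever see [UInt32] in and UInt32 out.
func byteSum(of values: [UInt32]) -> UInt32 {
    values.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> UInt32 in
        var sum: UInt32 = 0
        // The raw buffer is only valid inside this closure and is not stored.
        for byte in raw {
            sum &+= UInt32(byte)
        }
        return sum
    }
}
```

Under the vision, a function like this is what would need either an explicit opt-out (whatever spelling ends up being chosen) or a home in a module that does not enable strict memory safety.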

grynspan · October 4, 2024, 4:39pm

Yup—I don't claim to be clever enough to have thought of it first!

taylorswift · October 5, 2024, 9:08pm

overload selection in Swift is famously hard to understand, and this creates a lot of opportunities for accidental infinite recursion. here’s one of my favorite examples:

public
class Storage
{
    init() {}
}

public
protocol StatsCounter:ExpressibleByDictionaryLiteral
    where Key == Never, Value == Never
{
    init(_:Storage)
}
extension StatsCounter
{
    public
    init()
    {
        self.init(Storage.init())
    }
}
extension StatsCounter
{
    @inlinable public
    init(dictionaryLiteral:(Never, Never)...)
    {
        self.init()
    }
}

struct Views:StatsCounter
{
    init(_:Storage) {}
}

let views:Views = [:]

do you see the bug?

the issue, if you’ve run into this enough times, is that the empty [:] initializer is accidentally calling itself, instead of the Self.init() helper method. on Swift 6.0.1, this produces no diagnostic, and crashes at run time only on the paths that actually initialize the empty counter.
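
One way to sidestep the ambiguity, sketched here under the assumption that the zero-argument helper can be dropped, is to delegate straight to the `init(_:)` requirement, which cannot be confused with the variadic literal initializer. The stored `storage` property and the `makeEmptyViews()` helper are additions for illustration only.

```swift
final class Storage {
    init() {}
}

protocol StatsCounter: ExpressibleByDictionaryLiteral
    where Key == Never, Value == Never
{
    init(_ storage: Storage)
}

extension StatsCounter {
    init(dictionaryLiteral elements: (Never, Never)...) {
        // An unlabeled Storage argument can only match init(_:), so this call
        // cannot resolve back to init(dictionaryLiteral:) with zero elements.
        self.init(Storage())
    }
}

struct Views: StatsCounter {
    let storage: Storage   // kept only so the result can be inspected
    init(_ storage: Storage) { self.storage = storage }
}

// Hypothetical helper: builds a counter from the empty literal.
func makeEmptyViews() -> Views {
    let views: Views = [:]
    return views
}
```

This does not fix the underlying diagnostic gap; it only avoids the call that overload resolution gets "wrong".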

here’s another example that crops up from time to time:

@frozen public
enum AutomaticRenewal:Equatable, Sendable
{
    case enabled
    case disabled
}
extension AutomaticRenewal:RawRepresentable
{
    @inlinable public
    init?(rawValue:Bool) { self = rawValue ? .enabled : .disabled }

    @inlinable public
    var rawValue:Bool { self == .enabled }
}

func f(autorenew:AutomaticRenewal)
{
    guard autorenew == .enabled
    else
    {
        return
    }
}

f(autorenew: .enabled)

the bug here is that RawRepresentable suppresses the synthesized Equatable conformance that would otherwise appear for AutomaticRenewal, and replaces it with its own a.rawValue == b.rawValue implementation, which recurses infinitely leading to a crash at run time.

this produces no diagnostic as of Swift 6.0.1.
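
a possible workaround, sketched under the assumption that the type can be edited freely, is to compute rawValue with a switch so that it never goes through ==. the isEnabled helper is hypothetical and only there to exercise the comparison.

```swift
enum AutomaticRenewal: Equatable, Sendable {
    case enabled
    case disabled
}

extension AutomaticRenewal: RawRepresentable {
    init?(rawValue: Bool) { self = rawValue ? .enabled : .disabled }

    var rawValue: Bool {
        // Enum case patterns do not call ==, so this getter cannot re-enter
        // a rawValue-based == implementation.
        switch self {
        case .enabled: return true
        case .disabled: return false
        }
    }
}

// Hypothetical helper mirroring the guard in f(autorenew:).
func isEnabled(_ autorenew: AutomaticRenewal) -> Bool {
    autorenew == .enabled
}
```

whichever == the compiler ends up selecting, the comparison now terminates.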

these are examples of code that looks clean, that i could easily imagine a team member (or myself) adding to a code base and passing review, and frequently finds its way into untested execution paths where they turn into DoS vulnerabilities when running on, for example, a server.

xwu · October 6, 2024, 1:56pm

I've been meaning to bring it up later, but since @taylorswift sort of touched on it tangentially, it's worth pointing out that we'll need a comprehensive understanding of "use" here when it comes to standard library facilities that intersect with language features. By this I mean (not a comprehensive list):

let x = 42 uses init(integerLiteral:) implicitly
for i in someCollection { ... } uses makeIterator() implicitly
let x: [Foo: Bar] = [someFoo: someBar] uses hash(into:), which uses Hasher.combine(_:), and so on...
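
To make the first two items concrete, here is a small sketch that writes each implicit use next to its explicit spelling. The `Meters` type and the two helper functions are hypothetical names used only for this illustration, and the hand-written loop is an approximation of what the compiler generates rather than its exact lowering.

```swift
struct Meters: ExpressibleByIntegerLiteral, Equatable {
    var value: Int
    init(integerLiteral value: Int) { self.value = value }
}

// The literal form calls init(integerLiteral:) without naming it.
func literalForms() -> (implicit: Meters, explicit: Meters) {
    let implicit: Meters = 42
    let explicit = Meters(integerLiteral: 42)
    return (implicit, explicit)
}

// A for...in loop uses makeIterator() and next() without naming them.
func loopForms(_ values: [Int]) -> (viaForIn: Int, viaIterator: Int) {
    var viaForIn = 0
    for value in values { viaForIn += value }

    var viaIterator = 0
    var iterator = values.makeIterator()
    while let value = iterator.next() { viaIterator += value }

    return (viaForIn, viaIterator)
}
```

If any of these entry points were ever marked unsafe for some type, the source in the "implicit" column would have no visible call site to attach a diagnostic to, which is the concern here.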

When it comes to something like print, it'll just be a plain old Swift implementation, but some of these uses are not, yet we'll want to have an understanding that stays in sync with generated code which could change from version to version—cf. the reworking of for...in loops in Swift 5.

It's tempting to say that these are special cases, but a lot of these are pretty fundamental to any sort of idiomatic Swift that I don't think we can punt their consideration from the MVP. We will also want to make sure there is some forward-looking mechanism so that, for example, if we ever do something like SE-0213 again, we don't forget (or, ideally, it doesn't require human intervention to remember) to update the definition of "use" and allow holes to slip into this strict memory safety mode.

Joannis_Orlandos · October 6, 2024, 6:40pm

I'm really happy to see this vision document. Not so much because I think it needs to be adopted everywhere, but because it'll push the language in a healthy direction where improving performance of low-level code doesn't mean having to resort to (potentially) unsafe code.

I'm also really fond of the thorough explanations and examples. I don't think anything's unclear or missing from my perspective.

fclout · October 6, 2024, 9:54pm

It's always better to have fewer issues than more, but it needs to be said that not all security bugs are memory safety bugs, and in my professional opinion, it would be a stretch to say that stack overflows resulting in a deterministic trap are memory safety bugs. Among other languages considered memory-safe-enough, Java, C# and Rust all rely on the OS delivering an exception to your process to handle stack overflows. There is a memory safety bug when the stack pointer can decrement so much that it would jump over guard pages, but modern non-embedded OSes (and even some embedded OSes) know how to deal with that safely. Assuming that to be guarded against, the consequences of stack overflows don't line up with the problems that any of the 5 memory safety guarantees try to prevent.

Our opportunity for memory safety problems is to virtually stop their proliferation. There isn't at this time an equivalent opportunity to stop Swift programs from deterministically trapping. This is not to say that it shouldn't happen (I have use cases for a Swift dialect/mode where trapping operations would be recoverable somehow), but I think that we're not at this point yet, whereas it's a stone's throw away for memory safety. It's not necessary to tie the two together for either to be successful.

taylorswift · October 6, 2024, 11:25pm

yes, but the purpose of a vision document is to signal a direction for language development, and the purpose of circulating a vision document is to gather feedback on that proposed direction. discussing whether this direction itself is worthy of elevated priority is certainly in scope.

as you’ve alluded to, security is more than just about memory safety. i personally feel that Swift has pretty good memory safety today. it’s not perfect, and this proposal would take the language from pretty good to even better, but i feel like we are quickly approaching a regime of diminishing returns, and that we would be better off taking a more well-rounded approach to “security” in the language. security is very much a game of weak links, and it doesn’t matter how well-secured your front door is if your first-floor window is unlocked.

it is tough to argue that we “should not make the language more memory-safe”, and i appreciate your concerns about C/C++ developers who are happily tracking unsafe constructs into code bases today. but in my opinion, this is a human behavior problem that should be addressed by training within organizations, and that adding ceremony to the language itself is not the most effective way to force this behavioral change. there are a lot of subtle pitfalls in the language today that cannot be avoided by educating developers on best practices, and we should focus our energies on addressing those.

software development isn’t a zero-sum game, but it’s also wrong to presume that a team can simply “do everything” with finite personnel and resources. if everything is a P0, nothing is.

Karl · October 23, 2024, 3:32pm

Douglas_Gregor:

The compiler would flag any use of the following unsafe language features:

@unchecked Sendable

unowned(unsafe)

nonisolated(unsafe)

unsafeAddressor, unsafeMutableAddressor

In addition, an @unsafe attribute would be added to the language and would be used to mark any declaration that is unsafe to use. In the standard library, the following functions and types would be marked @unsafe :

Unsafe(Mutable)(Raw)(Buffer)Pointer

(Closed)Range.init(uncheckedBounds:)

OpaquePointer

CVaListPointer

Unmanaged

unsafeBitCast, unsafeDowncast

Optional.unsafelyUnwrapped

UnsafeContinuation, withUnsafe(Throwing)Continuation

UnsafeCurrentTask

Mutex's unsafeTryLock, unsafeLock, unsafeUnlock

VolatileMappedRegister.init(unsafeBitPattern:)

The subscript(unchecked:) introduced by the Span proposal.

I have more to say on this proposal overall, but just briefly to add to this list: custom serial executors are potentially unsafe, as a buggy implementation could execute serial jobs in parallel, leading to data races.

Actors also expose unowned references to their executor - the expectation is that the actor itself retains the executor, but nothing guarantees this and it is certainly possible that a complex implementation would get it wrong.

I'm not sure about custom Task executors. It's possible they are safe, but it needs a more thorough analysis.

I don't think it is appropriate to mark (Closed)Range.init(uncheckedBounds:) as unsafe. This initialiser stands out because (Closed)Range does not intrinsically have anything to do with the kinds of safety mentioned.

I think it is inconsistent for a language which performs Array accesses using signed integers to claim that merely constructing a malformed Range is a safety issue. Accesses to memory should be checked and that's where any safety issues may arise, but the construction of an invalid location is not.

xwu · October 23, 2024, 5:01pm

I think perhaps you're misunderstanding what's unchecked here about (Closed)Range.init(uncheckedBounds:): it allows construction of a range without checking that lower is less than or equal to upper, which is useful in circumstances where you've already checked that invariant but is very much unsafe otherwise, in the same way that -Ounchecked math is unsafe.
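
A minimal sketch of the "already checked" case: the caller validates the invariant once and then skips the redundant check inside the range initializer. The `slice(_:from:to:)` helper is a hypothetical name for illustration.

```swift
// Hypothetical helper: validates the bounds itself, then builds the range
// with the unchecked initializer because lower <= upper is already known.
func slice(_ values: [Int], from lower: Int, to upper: Int) -> ArraySlice<Int>? {
    guard 0 <= lower, lower <= upper, upper <= values.count else {
        return nil
    }
    let range = Range(uncheckedBounds: (lower: lower, upper: upper))
    return values[range]
}
```

The point of contention in this thread is what should happen, and what it should be called, when the `guard` is missing.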

Karl · October 23, 2024, 5:22pm

I understand that, and I don't think it's a safety issue.

Neither is "math" unsafe in -Ounchecked builds, for that matter. The reason -Ounchecked is unsafe is because it disables precondition checks which are used to implement bounds checking; it doesn't make math unsafe anywhere.

xwu · October 23, 2024, 5:37pm

Disabling these checks results in undefined behavior, which is not always a memory bounds safety issue but, as summarized in the pitch text, puts the program in an otherwise impossible state and so breaks the language's semantic models, with unpredictable results.

In the five-fold classification of memory safety given in the text, init(uncheckedBounds:) permits violations of initialization safety, in that it permits values to be initialized other than "properly" and to contain unexpected data.

Karl · October 23, 2024, 5:45pm

I don't think this is what initialisation safety means. When we're talking about formally defined and undefined behaviour in the language, there is no distinction between the values of any type. There is no such thing as expected/unexpected values in this context.

For example:

struct Foo {
  private var x: Int
  init() { x = 42 }
}

There is typically no way for me to create a Foo where x != 42. However, if I were to use unsafe APIs to load a value from raw memory, I could.

The creation of such a Foo is not undefined behaviour - formally speaking, it is perfectly fine for Foo.x to have any Int value. Your program may get an unexpected result, but it will be deterministic and stable across compiler versions. You won't get "nasal demons" where the observed behaviour changes wildly with the whims of the optimiser.
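
For concreteness, a sketch of what "using unsafe APIs" could look like here. The `value` accessor and the `forgeFoo(withRawValue:)` helper are hypothetical additions so the result can be observed, and the sketch assumes Foo has the same size as Int, which holds for a struct with a single Int field. Whether the forged value is formally defined is exactly what is being debated in this thread.

```swift
struct Foo {
    private var x: Int
    init() { x = 42 }

    // Hypothetical read-only accessor, added only to inspect x.
    var value: Int { x }
}

// Hypothetical helper: bypasses init() by reinterpreting an Int's bits as a Foo.
func forgeFoo(withRawValue raw: Int) -> Foo {
    unsafeBitCast(raw, to: Foo.self)
}
```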

fclout · October 23, 2024, 6:18pm

It's indirectly a safety issue because range-based accessors on Array and possibly other collections assume that the lower bound is <= the upper bound. If the unchecked initializer is safe, then subscript(Range<Int>) needs to be unsafe (or all implementations that make the assumption today need to stop making it). This is the less desirable of the two options. In an "ideal world", it would be impossible to create a range without checking that the lower bound is <= the upper bound; that's effectively what sticking @unsafe on it gives us in strictly memory-safe mode.

xwu · October 23, 2024, 6:32pm

Mmm, that is undefined behavior—it may not be "practical" UB when limited to only this code, but UB it is nonetheless.

fclout · October 23, 2024, 6:47pm

Complementing what @xwu said:

enum Bar {
	case bar, baz, frob
}

Without strict memory safety, it is entirely possible for you to reinterpret a random piece of memory as a Bar. The behavior is undefined when you use that invalid value. The compiler doesn't create an implicit default block for uninhabited values, so what happens is up to the emergent properties of Swift's codegen. It could well break memory safety, for instance by skipping over the initialization of a variable.

Undefined behavior isn't a function of whether you can ascribe meaning to an illegal operation. It's a function of what breaks when you do that illegal thing anyways. In the future, one very powerful optimization that Swift could start doing is build assumptions based on the values of immutable fields at the end of initializers. For instance:

struct LookupTable {
	let array: [Int]
	
	init() {
		array = Array(repeating: 10, count: 256)
	}
	
	subscript(byte: UInt8) -> Int {
		array[Int(byte)]
	}
}

If we accept that creating a LookupTable without going through init is undefined, and we accept that modifying a let field is undefined, then the compiler could infer that the bounds check in subscript is unnecessary because the array always has 256 elements and byte can never exceed 255. This relies on there being no well-defined programs in which that wouldn't be the case. On the other hand, if we decide instead that init is just a recommendation and it's perfectly fine to create LookupTable out of random bits, or that it's OK to modify array with unsafe pointers despite it being a let binding, then we can't do that.
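
To give a feel for what that inference would buy, here is a hand-written approximation, not anything the compiler does today. It reads through an unsafe buffer pointer, whose subscript is, to my understanding, only bounds-checked in debug builds. The type name `HandOptimizedLookupTable` is hypothetical, and the code is only sound as long as the 256-element invariant really holds.

```swift
struct HandOptimizedLookupTable {
    let array: [Int]

    init() {
        array = Array(repeating: 10, count: 256)
    }

    subscript(byte: UInt8) -> Int {
        // Relies on the invariant established in init(): array.count == 256,
        // so every UInt8 value is a valid index.
        array.withUnsafeBufferPointer { buffer in
            buffer[Int(byte)]
        }
    }
}
```

If a LookupTable could legally be conjured from arbitrary bits, this version would read out of bounds in release builds, which is why the optimization depends on such programs being undefined.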