New function colour: unsafe

tera · June 7, 2023, 7:43pm

This got me thinking: is it ok to record the important notion of "safe/unsafe" just in the unchecked word "Unsafe" or its absence in function & type names? It feels rather fragile ad-hock poor man's approach, a substitute for a missing feature. Bike shedding how that new feature could look like:

unsafe struct SomeType { ... }

func foo(_ param: SomeType) { // 🛑 error, `foo` must be marked unsafe because it is using unsafe parameter
// fixit 1: don't do that
// fixit 2: mark `foo` unsafe
    ...

    var x: SomeType = ... // 🛑 error, can't use unsafe type from a safe function
    // fixit 1: don't do that
    // fixit 2: mark `foo` unsafe
}

unsafe func bar(_ param: SomeType) { // ok
    ...
    var x: SomeType = ... // ok
}

func baz() {
    bar(param) // 🛑 error, can't call unsafe function from a safe function.
    // fixit 1: don't do that
    // fixit 2: mark `baz` unsafe
}

If that's maintained universally, then looking at the top level "func main()" vs "unsafe main()" it would be clear if we are looking at a safe or an unsafe function / app.

Without this, I may be looking at a large function and as it doesn't use types with "Unsafe" in their names and doesn't call (directly or indirectly) functions with "Unsafe" in their names, after a thorough and lengthy audit I'd come to a conclusion that the function is safe. Then some time later someone changes the function (or one of the things it calls indirectly) voiding that hard earned conclusion.

NB, we do similar function colouring with @noallocations / @nolocks attributes.

Karl · June 7, 2023, 8:33pm

I think the precedent for not annotating unsafe APIs came from withContiguousStorageIfAvailable. I had a look and I'm not able to find those discussions, but I'm quite sure it was argued that this should have "unsafe" in the name, and that the core team rejected that idea. Does anybody else remember that?

String.withUTF8 simply follows that precedent.

It's not necessarily just a problem of lacking the tools to annotate unsafe APIs; I think there has been what I can only describe as a cultural shift to put brevity above consistently communicating safety - we could have added the word "unsafe" to the above APIs to make things clearer at the call-site, but it was decided not to do that.

Now, especially when it comes to things like the concurrency, I feel that new APIs are being somewhat rushed, and that we should take the time to innovate in this space and consider ways to protect developers from making mistakes (because everyone - even experts - can make mistakes). For example, I outlined one way in which I think we could make custom actor executors safer:

[Second review] SE-0392: Custom Actor Executors

I'm really uncomfortable with the unownedExecutor requirement for actors. I know that it already exists, but I think it goes against everything we've tried to achieve in Swift with regards to safety. In the best case, it's a pointer and we just hope that keeping the actor alive is enough to keep the executor alive as well; and in the worst case, you can return different executors every time and defeat all of the data race safety that we hope to achieve with language-integrated concurrency.

I think it it worth exploring safer designs here. For instance, distributed actors have a synthesised actorSystem stored property which must be set. I think a similar design could work for the executor of a regular actor:
distributed actor Greeter {
  init(name: String, actorSystem: ActorSystem) {
    self.name = name
    self.actorSystem = actorSystem // required (!) initialization of implicit actorSystem property
  }
}
That way, we could guarantee the invariant that executors will live for as long as there are any actors which use them, that the executor cannot be changed after initialisation, and that all accesses to the synthesised property return the same executor instance. That's at least something.

If an actor does not explicitly initialise its executor, it would be initialised to a default executor.

This didn't happen. It was called "interesting" and suggested that maybe we'll investigate it one day - but personally I'm not optimistic; I think we're going to move on and forget all about it, or it'll become impossible to add later due to backwards compatibility. Personally, I think safety should be a major blocking issue rather than an afterthought or something to try patching in later, and that it warrants in-depth investigation before a feature is allowed to ship, but it seems the language workgroup do not agree.

Which leads me to reiterate the point I made above about APIs such as wCSIA and withUTF8 - I feel like it's a cultural shift. It's not a problem that can be fixed by adding a new function colour.

Joe_Groff · June 7, 2023, 10:06pm

Having a proper, viral "unsafe" annotation and unsafe blocks would be a much better approach than the ad-hoc naming convention we sometimes follow today. Getting rid of the repeated unsafe in APIs would make code more concise while strict checking would make it more likely that unsafe functions or code blocks are clearly marked and can be easily found for auditing. This would also be nice to allow for developers to impose a "no unsafe code" policy for some or all imported packages. I think this is the right direction to go.

taylorswift · June 7, 2023, 10:13pm

i still do not know why is it legal to escape self into a child Task from a deinit. i bring this up because this is a lego i have stepped on several times, and it couldn’t have been prevented by sprinkling the word “unsafe” around that API, because it is not an API.

Jumhyn · June 7, 2023, 10:15pm

IIRC this should now be safe in the sense that it will crash when we attempt to dealloc self and find that it has a nonzero ref count.

taylorswift · June 7, 2023, 10:17pm

that is certainly an improvement (i recall it used to manifest as a segfault on 5.7). but really i think this ought to be something the language can diagnose/prevent at compile time.

a related observation: i think we’ve also indirectly benefited from some of the tooling improvements on the VSCode plugin side recently. these kinds of unsafe things were really difficult to debug a year ago.

tera · June 7, 2023, 10:22pm

It's definitely better than nothing, but obviously not as good as a static check, as it could be missed during testing:

    deinit {
        if rare_condition() {
            Task {
                print("\(self)")
            }
        }
    }

beccadax · June 7, 2023, 10:25pm

An important piece that’s missing from this model is the ability to write a function or type which is considered safe but has an unsafe implementation. It is often possible to use unsafe primitives to write an operation that (assuming you have implemented it correctly) is safe, and packaging unsafe implementations in a safe form is good programming practice that reduces the amount of code that might have safety bugs.

(This would be explicit, of course—you would have to mark the code somehow as safe-with-unsafe-internals.)

With that in place, it would be extremely rare, possibly even illegal, to have an “unsafe main”.

Karl · June 7, 2023, 10:27pm

That kind of thing is more likely to be a bug, IMO.

The thing that I'm referring to is that Swift used to be a language which decided that all Array accesses would be bounds-checked, and even basic arithmetic operations such as + and - would trap on overflow (despite the fact that we actually define signed integer overflow -- the idea was that overflowing integers can lead to UB later if they are used). Clearly safety was taken very seriously, even at the cost of performance.

When I look at some recent (and even less-recent) proposals, I feel that that thoroughness has gone, in favour of other things.

It should be clear that I'm very much in favour of stricter checking of unsafe code. It should still be possible to build safe code using unsafe constructs, but anything which expects you (the caller) to validate preconditions yourself should be very obvious at that call site. It isn't today.

taylorswift · June 7, 2023, 10:31pm

yeah, the only way this is safe is if you await on the task (or use some other kind of synchronization with the deinit procedure) and we cannot suspend within a deinit.

scanon · June 7, 2023, 10:40pm

This is (at least from my perspective) a complete misapprehension of history. Recent proposals are much more carefully written and vetted than they ever were in the early days of Swift, and Swift was never a language that only permitted safe operations. If anything, we have been steadily removing unsafely holes in the language and library over time. We removed the unsafe arithmetic operations, we're slowly sanding down Unsafe[Buffer]Pointer to make it less pointy, the compiler and library have gotten much more sophisticated at flagging memory safety hazards like unaligned access and type confusion, while adding much clearer tools to do them explicitly when needed.

Some proposals are for things that cannot be done in safe code with the language we have today. One option is to sit and wait until the language has the necessary features, but in the meantime people will still do those unsafe things, and in general they will implement them with more bugs and much worse code coverage than we'll get by implementing them unsafely in the standard library today, with the tools we have. We can adopt new, better versions as the language evolves.

tera · June 7, 2023, 10:43pm

Good point. Something like so I think will do:

unsafe func bar() { ... }

func baz() { // no unsafe means safe
    // bar() // 🛑 error: can't call unsafe function from safe function
    i_solemnly_swear_its_safe {
        bar() // ✅ ok
    }
}

beccadax · June 7, 2023, 10:45pm

Yup, that’s the idea. But maybe we’d spell it unsafe do { … } or something.

Karl · June 7, 2023, 10:52pm

So are you saying that I'm just making up that the core team rejected the "unsafe" label for wCSIA or withUTF8? That nobody ever suggested that those unsafe operations should be called out? Or that for actor executors, it wasn't suggested that we can't explore safer designs because we need to ship something in 5.9 (as we now know, for the new SwiftData framework)?

There have been improvements to ownership, that's true, and that's good. But I still think we let far too many things through and do not make safety enough of a priority in the design of new features. Safety holes have been added as well as removed, and as I've already said, the holes are much more subtle because they sometimes just neglect to carry the "unsafe" moniker (or, worse IMO, they use terms like "unowned" as a substitute for "unsafe", despite "unowned" in Swift generally being safe).

It's easy (or at least much easier) to design systems that work when people do what you expect; it's much more difficult to design systems that are guaranteed to fail when somebody ignores the rules.

scanon · June 7, 2023, 11:00pm

I'm saying that you're over-indexing on a couple decisions that you disagree with, and missing the bigger picture of a language that has very clearly been driving towards more and more safety for most of a decade. The improvements are not just in ownership. They are all over the language and libraries and tooling, are too numerous to list, and the overwhelming majority of them have made Swift a safer language, even when they involve unsafe constructs.

I would also note that we're looking pretty seriously about turning on bounds-checking for UnsafeBufferPointers, which would entirely eliminate one of the major classes of unsafety that these very necessary API introduce.

Karl · June 7, 2023, 11:15pm

Perhaps this more forgiving, big-picture approach should also apply when we judge other language ecosystems? As was being discussed in the post this thread was spun-off from...

lack of memory safety is a major defect. Full stop.

if we can't be honest about the defects in the tools we work with, we're going to shoot our damn feet off, and that's not OK.

But apparently we're accused of "over-indexing" if we point out deficiencies in Swift... even deliberate, systematic deficiencies (well, some of us apparently can, some of us apparently are not allowed)

tera · June 7, 2023, 11:25pm

In my codebase I take a practical approach: if (when) I'm disagreeing with the language / framework I just change it locally, in this case that would be (not that I am literally doing this, just as an example):

extension String {
    public mutating func withUnsafeUTF8<R>(_ body: (UnsafeBufferPointer<UInt8>) throws -> R) rethrows -> R {
        try withUTF8(body)
    }
}

The only drawback is that I can't mark withUTF8 deprecated somehow (other than using some external lint tool which I don't usually do):

extension String {
    public mutating func withUnsafeUTF8<R>(_ body: (UnsafeBufferPointer<UInt8>) throws -> R) rethrows -> R {
        // somehow call through the original implementation 🤔
    }
    @available(*, deprecated, renamed: "withUnsafeUTF8")
    public mutating func withUTF8<R>(_ body: (UnsafeBufferPointer<UInt8>) throws -> R) rethrows -> R {
        // somehow call through the original implementation 🤔
    }
}

I did the spin-off hoping to have a thread focused on issues related to unsafe colour idea.

Exploring the colourisation idea further. The escape hatch to "change colour" is alright for safe/unsafe, although in some other instances it shouldn't be allowed, for example:

takesLocks func bar() {
    lock.lock()
    ...
    lock.unlock()
} 

noLocks func baz() {
    i_solemnly_swear_its_fine { // 🤔. No
        bar()
    }
}

Joe_Groff · June 7, 2023, 11:37pm

tera:

Exploring the colourisation idea further. The escape hatch to "change colour" is alright for safe/unsafe, although in some other instances it shouldn't be allowed, for example:
takesLocks func bar() {
    lock.lock()
    ...
    lock.unlock()
} 

noLocks func baz() {
    i_solemnly_swear_its_fine { // 🤔. No
        bar()
    }
}

In the case of the performance annotations, the escape hatch already exists as _unsafePerformance { }. Fundamentally, it's on you to be honest when you use it, since otherwise it's not really an "escape hatch" if the compiler still gets in your way when you use it. (Best effort optional static analysis and/or dynamic checking could still be useful nonetheless.)

For unsafe, I think D and Rust both have a decent model to emulate: unsafe things either have to happen inside of an unsafe { } block, which confines the unsafe-ness to the block (and it's on the programmer to ensure that's the case), or in a function that is marked unsafe on the outside, which means that the unsafety propagates to callers which must in turn either be unsafe functions or use unsafe blocks to call them. That's not so different in nature from the relationship of unsafePerformance to the performance annotations. It would also make sense IMO to have a compiler flag to ban unsafe in both forms, which could be used by the build system to allow projects to disallow unsafe code when building untrusted dependencies.

tera · June 7, 2023, 11:44pm

Good to know. Just the name is not ideal IMHO:

func foo() { // safe
    unsafe {
        ... some unsafe stuff here ...
    }
}
// hmm, looks like safe function can call unsafe stuff?

IMHO better name would be something along the lines:

func foo() { // safe
    i_know_this_is_safe {
        ... some unsafe stuff here ...
    }
}

obviously a different name, but with the connotation that this is actually a "safe" block.

maybe even:

not unsafe { ... }

or just:

safe { ... }

Joe_Groff · June 7, 2023, 11:50pm

The problem with that is that there isn't a good language rule to distinguish "actually safe with an unsafe implementation" from "supposed to be safe but has safety holes" (since if there were, we'd ideally be implementing the function in a way that's safe by construction rather than resorting to unsafe code). In my opinion, the purpose of the unsafe { } block (or however you want to spell it) is to provide an easily-searchable signal for auditors to find and focus on the unsafe parts of a codebase. From that perspective, having a consistent spelling for the "this function is outwardly unsafe" annotation and the "this block contains unsafe code that hopefully is self contained in its unsafeness" block is a feature, so you only have to search one thing to find both flavors.