I will note that unsafe structs, unlike pointers, are not viral in C#. In that regard, it's more like D's @trusted than @system, if the descriptions upthread are correct. One side effect of this is that you can implement manual refcounting in C# in a way such that Retain() and Release() can be called from "safe" code, by wrapping the pointer in an unsafe struct.
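For context, a rough Swift transliteration of that C# pattern, using the hypothetical unsafe struct / unsafe block syntax discussed in this thread (the RefCountedBox type and its memory layout are invented for illustration, and the sketch ignores thread safety):

```swift
// Hypothetical syntax: the type is unsafe internally, but retain/release
// are exposed as safe entry points, mirroring the C# pattern described above.
unsafe struct RefCountedBox {
    unsafe var object: UnsafeMutableRawPointer  // points at { count, payload }

    // Callable from safe code: the wrapper, not the caller, owns the danger.
    func retain() {
        unsafe {
            let count = object.assumingMemoryBound(to: Int.self)
            count.pointee += 1
        }
    }

    func release() {
        unsafe {
            let count = object.assumingMemoryBound(to: Int.self)
            count.pointee -= 1
            if count.pointee == 0 { object.deallocate() }
        }
    }
}
```

The point being: whether holding a RefCountedBox in safe code is allowed is exactly the question being debated below.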
I'd say if it's literally "unsafe struct/class/enum S { ... }" - then it should be viral and prohibited from being held in safe functions (otherwise, why were those types marked unsafe to begin with?).
And if it's merely:
struct S {
    unsafe func foo() { ... }
    unsafe func bar() { ... }
}
then I can safely hold values/instances of that type in a safe function, and it's only calling those unsafe methods on it that is unsafe.
If you aren't doing anything with it, and are just moving it between @trusted functions A and B, that could be fine. In that sense, marking the type unsafe would be equivalent to marking every field/method unsafe, without it necessarily being unsafe to hold. I'm not convinced that's the best approach in general, though.
I don't like this approach. Doesn't it prefer brevity over clarity? Imagine you did that and have:
unsafe struct S {
    // a bunch of methods
}
and now you want to add a single safe method. What would you do? Reintroduce all those "unsafe" words before "func"? Or this?
unsafe struct S {
    // a bunch of methods
    safe func bar() { ... }
}
That looks like a complication to me. IMHO there should be some difference between:
unsafe struct S {
    func foo() { ... }
    func bar() { ... }
}
and
struct S {
    unsafe func foo() { ... }
    unsafe func bar() { ... }
}
and this difference can well be whether I can or can't hold its value in safe code.
Has unsafe deinit been brought up yet? Like, would the following be allowed?
struct StaticLifetimeArray<T>: ~Copyable {
    unsafe let ptr: UnsafeMutableBufferPointer<T>
    unsafe deinit {
        // deallocate() rather than free(ptr): free takes a raw C pointer,
        // not a Swift buffer pointer
        ptr.deallocate()
    }
}
I'd say unsafe should be allowed on inits / deinits / subscripts / operators / dynamic vars with get/set/didSet and closures (†).
Good question. Should we have unsafe on types (struct/class/enum) and on var/let (as in your example)? Perhaps, for simplicity, we should not (unless anyone has a killer example). "unsafe func" (with the † variations) should be a good start. We could then end up with an analogue of "UnsafePointer" which is not marked unsafe itself, just its operations.
FWIW I like the direction suggested in this thread, would be +1 to a proposal if we can sort out the specifics.
Also FWIW I think Rust gets it right here. The assertion that code in the WHATEVER_WE_CALL_IT block is safe is literally and exclusively (from the perspective of the proposed feature) not the case.
The point of the unsafe block in Rust is not to make an assertion about (un)safety. It's not saying "this is wrong!" but instead indicates to the reader that this particular part of the codebase has a higher likelihood (or any likelihood) of memory safety bugs. It's saying "here is a part of the code that could be unsafe", just like the unsafe func declaration would.
By the logic presented in the thread, we should also write safe func to assert that, "yes, we're using unsafe APIs, but we're doing it safely" - to be very clear, I humbly disagree with that direction.
We can assert all we like that something is "safe" even though it uses unsafe constructs. That's obviously our goal as engineers. But the whole point of the safety features is that we sometimes get things wrong. The unsafe marker (and the unsafeXYZ naming before it) indicates that the function admittedly might not be safe, as opposed to other code that can reasonably be expected to be safe by default.
I truly believe the two markers require very different (even the opposite) level of attention:
unsafe func foo() {
    // We don't need to look inside it.
    // We can even keep this function body collapsed.
    // It may be as unsafe as it wants.
    // We would not be allowed to call it if we are writing safe software,
    // e.g. software that controls nuclear power stations (the compiler just won't allow us).
}
vs
func foo() { // safe by default
    unchecked_safe {
        // In this case the compiler will not prohibit calling this block from safe functions,
        // even if we are writing software that controls nuclear power stations.
        // Thus we must verify every single line of it, and do so on every change.
        // The level of attention this code demands is "absolute".
    }
}
It's important to be clear what unsafe code actually is. We throw around these terms "safe" and "unsafe" to the point where they kind of become like slogans, where "safe" is a synonym for "good" and "unsafe" is a synonym for "bad". Why would anybody choose the bad thing?! We shouldn't lose sight of what these terms actually refer to.
Unsafe constructs are not broken by definition; they're not "please break my program" operations. There is a well-defined way to use them, and they are useful when used correctly; that's obviously the goal of whoever created the construct.
But almost all APIs come with some sort of preconditions (in the English-language sense, not the Swift precondition function sense); they don't accept fully arbitrary inputs. For example, Array's subscript operation accepts an integer index -- but it doesn't support subscripting arbitrary integer values; the values you give it must correspond to occupied positions in the array.
The difference between a safe API and an unsafe API is that safe APIs validate those preconditions which are required for memory safety, while unsafe APIs rely on you to use them correctly (or write your own checks) and do not have any built-in precondition validation. If it is at all possible to use the unsafe construct (and we must assume it is), it is also possible to create a safe version which validates its preconditions.
Here's a concrete example - we often get people looking to load POD types from data buffers (e.g. load a UInt32 from position x). Currently that is only offered as an unsafe API -- but it is totally possible to use that primitive to write a safe version, by validating those preconditions described in the primitive's documentation which pertain to memory safety:
This function only supports loading trivial types. A trivial type does not contain any reference-counted property within its in-memory stored representation.
The memory to read for the new instance must not extend beyond the buffer pointer's memory region—that is, offset + MemoryLayout<T>.size must be less than or equal to the buffer pointer's count.
(AFAIK the other requirement, "The memory at offset bytes into the buffer must be laid out identically to the in-memory representation of T.", isn't required for memory safety specifically if we know that T is a trivial type, but obviously it's worth abiding by, otherwise you'll read a bunch of safe, junk values).
extension Array where Element == UInt8 {
    func loadUnaligned<T>(from offset: Index = 0, as: T.Type) -> T {
        withUnsafeBufferPointer { buffer in
            precondition(offset + MemoryLayout<T>.size <= buffer.count)
            precondition(_isPOD(T.self))
            return UnsafeRawBufferPointer(buffer).loadUnaligned(fromByteOffset: offset, as: T.self)
        }
    }
}
Yes, we used unsafe constructs (again, constructs which do not validate their preconditions), but those constructs have knowable, sensible preconditions which we can validate ourselves to create a memory-safe construct.
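To illustrate the point with a usage sketch (the byte values here are just for demonstration):

```swift
let bytes: [UInt8] = [0x78, 0x56, 0x34, 0x12]

// The preconditions are validated inside the extension, so this call is
// memory-safe; an out-of-bounds offset would trap instead of reading
// stray memory.
let value = bytes.loadUnaligned(from: 0, as: UInt32.self)
// On a little-endian platform this is 0x12345678.
```

The caller never touches a pointer, yet the implementation is built entirely from unsafe primitives.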
I agree with everything you wrote there, so I'm not sure if that was directed at me or intended to be an extension of my post. Either way, we're on the same page
The thing is that these two use cases are effectively the same. The unsafe marker indicates a part of the code where unsafe constructs may be used. By extension, you'd probably need to be able to also call other unsafe functions within the WHATEVER_IT_IS_CALLED { } block.
Since you can call your first foo() from the second without ceremony, both variants deserve equal levels of scrutiny.
So, calling it uncheckedSafe { } would be ok in my opinion (it seems much clearer and more semantically correct than safe { }). That said, I don't see the benefit in introducing a second signifier for really the same thing we're already expressing with unsafe func.
To give a concrete example,
extension Data {
    // Anyone using this function shouldn't have to worry that what it's doing is actually unsafe;
    // that's why it's not marked as unsafe. That's the "unchecked unsafe" thing in action.
    func getByteArray() -> [UInt8] {
        unsafe {
            return self.withUnsafeBytes { Array($0.assumingMemoryBound(to: UInt8.self)) }
        }
    }
}
On the other hand, withUnsafeBytes itself should of course be marked with unsafe func because it's a (hopefully rare) part of the codebase that hands you a foot gun directly. The fact that it works internally with an unsafe type, UnsafeRawBufferPointer, necessitates that the function itself be marked as unsafe. You can use it within an unsafe block within an otherwise safe function, but doing so will (rightly) cause it to be scrutinised more closely in code review and while debugging.
I can see your point. To me the difference is: while you can call "unsafe foo" in an "uncheckedSafe" (or whatever the name is) block, you would only need to check it if it's actually called. Imagine you have tens of thousands of unsafe functions and only a few of them are called directly or indirectly from "uncheckedSafe" blocks. In a way the difference is similar to "trusted" vs "system" in D; there they just decided to put the marker at the function level rather than at the closure level.
If this moves on, imo unsafe { // danger! } is just fine, and I would not like a second keyword.
I'd see this as a marker that unsafe things may happen in the marked block, but the danger is confined to the braces — just like with try (it is not called "ensure_success" either).
Note that "try" and "throws" have different spellings. In languages having "nothrow" - again, it's a different spelling instead of trying to recycle existing "try" name.
Having said that, I'd like this to happen regardless of a particular naming scheme. Having the same "unsafe" name for both usages is better than not having this feature at all.
I was somewhat extending your post, but also adding a bit more precision about how I think of it.
It’s not saying “this is wrong!” but instead indicates to the reader that this particular part of the codebase has a higher likelihood (or any likelihood) of memory safety bugs.
I don't entirely think it's saying that some code has a higher likelihood of safety bugs - at least, I'm not 100% happy with that formulation; it's about whether memory-safety-related preconditions are being checked or propagated to callers. The value of seeing unsafe, to me, isn't that I think "this is where the bugs probably are" but more "what do I need to do to use this correctly?".
In other words, I'd describe it like this:
An unsafe function has unchecked preconditions. To call such a function, you will either need to:
- Mark that you have checked all of those requirements (in which case, your calling function can lose the unsafe colour). There are a variety of ways you might implement those checks - in order of robustness and cost, from precondition calls for release-mode checks, to assert calls for things you expect to be guaranteed in other ways, to code comments at the low end. The compiler isn't going to be able to verify the correctness of any of those checks; the point is that there's a clear border between safe/unsafe land where you are encouraged to explain why what you're doing is valid.

Or

- Propagate some/all of the precondition checking to your own callers (in which case, your calling function must also be marked unsafe).
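A sketch of that border, using the hypothetical unsafe func / unsafe block syntax from this thread (fillBuffer and makeZeroedArray are made-up names for illustration):

```swift
// Hypothetical syntax: the primitive states, but does not check, its precondition.
// Precondition: `count` must not exceed the allocated capacity behind `base`.
unsafe func fillBuffer(_ base: UnsafeMutablePointer<UInt8>, count: Int) {
    base.update(repeating: 0, count: count)
}

// Safe wrapper: it validates the precondition itself, so it loses the unsafe colour.
func makeZeroedArray(count: Int) -> [UInt8] {
    precondition(count >= 0, "count must be non-negative")
    var result = [UInt8](repeating: 0xFF, count: count)
    result.withUnsafeMutableBufferPointer { buffer in
        if let base = buffer.baseAddress {
            // The buffer's own count guarantees the precondition holds here,
            // which is exactly what this unsafe block asserts to reviewers.
            unsafe { fillBuffer(base, count: buffer.count) }
        }
    }
    return result
}
```

The comment at the unsafe block is the "explain why what you're doing is valid" part of the border.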
I do not think so. If unsafe is viral, then any usage of any unsafe functions or types will taint the calling function as well. This goes all the way up the call hierarchy, all the way to the top.
As an app developer, I will almost never write unsafe primitives, but I will sometimes consume them — and as a result I will have to mark my call site as unsafe. As part of my continuous work on rewriting and improving my app, I will need a way to search for unsafe stuff, so I can review or replace it. This is ongoing work, and as long as my Swift files contain the word unsafe, I want to easily find them and treat them with the same care. At least very similar care is needed.
Consider you have this bunch of "helper" functions:
unsafe func abort() {
    // platform-specific code that is very unsafe
}
unsafe func foo1() {
    if someCondition1() { abort() }
}
unsafe func foo2() {
    if someCondition2() { abort() }
}
...
unsafe func foo999() {
    if someCondition999() { abort() }
}
It so happens that in the main app you only use a fraction of those helper "library" functions, although you have all of them in the project (the linker would strip unused functions, so code size is not a concern):
/*safe*/ func bar() {
    THE_MARKER_WE_ARE_TALKING_ABOUT {
        foo1()
    }
}

/*safe*/ func main() {
    bar()
}
How many "foo's" will you be paying attention to in this app? I'd say just one, not all 999 of them, and the way I'd gather the "set" of functions to audit would be: find all THE_MARKER_WE_ARE_TALKING_ABOUT
marked blocks, and collect all unsafe calls they make (directly or indirectly).
Thanks for bringing this up. That's actually the idea behind having this option: following the precedent established with "try" / "await", mark all unsafe calls with "unsafe" at the call site (in addition to having a marker in the function signature):
unsafe func foo() { ... }
/*safe*/ func bar() { ... }
/*safe*/ func baz() { ... }

/*safe*/ func main() {
    THE_MARKER_WE_ARE_TALKING_ABOUT {
        bar()
        unsafe foo()
        baz()
    }
}
With this approach it is easier to see during audits what exactly is unsafe.
On this particular one I am on the fence. On one hand it might be too noisy at the call site, and we could instead use some fancy IDE highlighting for brevity. On the other hand, clarity matters more than brevity, the IDE feature is not always the right approach, and we do exactly this for "try" and "await", so if there are cons they must have been outweighed by the pros.
However many are in my source tree. If the unused functions are in a third-party library, then I don't care about them, but then they also won't show up in my source code search, just like unsafe system primitives. But if the unused functions are in my source tree, I want to find them just as I want to find other unsafe functions.
Out of curiosity (and sorry if this has already been mentioned above), would you propose that there be a corresponding monadic type for unsafe? I.e., given that I can make the following trivial transformations with the other function colours:
func `catch`<T, U>(f: @escaping (T) throws -> U) -> (T) -> Result<U, Swift.Error>
func wait<T, U>(f: @escaping (T) async -> U) -> (T) -> Task<U, Never>
func catchAndWait<T, U>(f: @escaping (T) async throws -> U) -> (T) -> Task<U, Swift.Error>
shouldn't there be an equivalent type for unsafe?
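One could imagine such a wrapper (purely a sketch; the Unsafe type, its trust method, the lift helper, and the `(T) unsafe -> U` function colour are all invented names on top of the hypothetical syntax in this thread):

```swift
// Hypothetical: a value produced by unsafe code, which must be explicitly
// "trusted" (the analogue of try/await at the use site) before safe code
// can extract it.
struct Unsafe<T> {
    private let value: T
    init(_ value: T) { self.value = value }

    // The single escape hatch; calling it would itself require an unsafe context.
    unsafe func trust() -> T { value }
}

// The transformation analogous to catch/wait above.
func lift<T, U>(f: @escaping (T) unsafe -> U) -> (T) -> Unsafe<U> {
    { input in Unsafe(unsafe { f(input) }) }
}
```

Whether wrapping the danger in a value, rather than a scope, actually buys anything is exactly the open question.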