Unsafe Functions

Alejandro · February 4, 2019, 7:43pm

Following the discussion about pointer type sugar (Int* as sugar for UnsafePointer<Int>, etc.; see Swift Pointer Syntatic Sugar), Slava mentioned "unsafe" blocks plus a function annotation where such sugar could be used. This feature is found in languages like Rust and C#, where a function can be marked unsafe, which allows its whole body to use things like pointer arithmetic or raw pointer dereferencing. To call an "unsafe" function, one must call it either from another unsafe function or within an unsafe block. Rust, for example, has "safe" Rust and "unsafe" Rust. Safe Rust guarantees there is no undefined behavior, whereas unsafe Rust makes no such guarantee. Swift, on the other hand, is considered a "safe" language but includes many unsafe features, such as the pointer types whose names all carry an Unsafe prefix and the withUnsafe family of functions.
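For reference, here is a minimal sketch of what this looks like in Swift today, using only the existing standard library function withUnsafeMutablePointer(to:_:). The variable name counter is purely illustrative. The point is that nothing at the call site beyond the API name marks the operation as unsafe:

```swift
// Today's Swift: the only hint that this is unsafe is the API name itself.
var counter = 41

withUnsafeMutablePointer(to: &counter) { pointer in
    // Mutating through the pointer needs no extra marker or annotation.
    pointer.pointee += 1
}

print(counter) // 42
```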

One of the reasons why this feature could make sense going forward is that it might allow working with pointers more concisely, as in C. It might also allow us to make some bold claims like Rust and say that safe Swift does not introduce undefined behavior. Currently you can mess around with unsafe pointers, but some operations are inherently safe and some aren't, so it's difficult to tell which APIs can introduce UB and which can't. I've fiddled with this idea enough to come up with an early implementation to play around with. Below are some examples of this in action:

// No pointer type sugar in this pitch/discussion
// just simply unsafe functions
unsafe func add1(to ptr: UnsafeMutableRawPointer) {
  // Unsafe and probably wrong
  // Add 1 to the first byte to this pointer
  ptr.assumingMemoryBound(to: UInt8.self).pointee &+= 1
}

var x = 0

// error: call is unsafe but is not marked with 'unsafe'
add1(to: &x)

// Nothing unsafe or worthy of throws, but for the sake
// of displaying why this was a good fit for do statements
unsafe func zero() throws -> Int {
  return 0
}

// Unsafe do block
//
// This flows nicely with throwing unsafe functions
unsafe do {
  // ok because this is inside an unsafe context
  add1(to: &x)

  // ok because this is inside an unsafe context
  // (named z so it does not shadow the outer x used just above)
  let z = try zero()
} catch {
  // It is also acceptable to use unsafe functions without
  // marking them unsafe in the catch block of an 'unsafe' do
}

print(x) // 1 (if little endian)

// Unsafe expression
//
// This differs from Rust and C#, but allows for
// staying in the same scope while also ensuring
// the caller understands this is an unsafe operation.
unsafe add1(to: &x)

print(x) // 2 (if little endian)

// warning: no calls to unsafe functions occur within 'unsafe do'
//          statement
unsafe do {
  print(x) // 2 (if little endian)
}

// Adding 1 to enums
enum ABC {
  case a, b, c
}

var y = ABC.c

unsafe add1(to: &y) // this is probably very unsafe

print(y) // ()

switch y {
case .a:
  print(1)
case .b:
  print(2)
case .c:
  print(3)
}

// 1

// stdlib function to get ptr to value
unsafe prefix func &<T>(_ value: inout T) -> UnsafeMutablePointer<T> {
  // implementation here
}

var a = 0
let b = unsafe &a

For this feature to really make sense, it would require some stdlib involvement (auditing unsafe APIs and possibly marking them 'unsafe'), or something like Rust's model for FFI. Foreign Function Interface (FFI) functions in Rust are all treated as unsafe and must be called within unsafe blocks, because Rust can't guarantee that calling them won't introduce UB. Because this is a simple annotation, we could mark some stdlib functions as unsafe without affecting ABI, but doing so might affect source stability if we produced an error (we may be able to emit a warning for the stdlib instead, to preserve source stability).

This is intended to be more of a "pre-pitch"/discussion post regarding the merits of such an annotation in Swift rather than an actual proposal ready to go. I'm very interested in hearing thoughts and opinions about exploring ideas regarding unsafe Swift and whether this could make sense going forward. (I can also see a world where we may introduce inline assembly, and it would make sense to only allow such statements within unsafe contexts.)

Karl · February 4, 2019, 8:14pm

Alejandro:

unsafe func add1(to ptr: UnsafeMutableRawPointer) {
  // Unsafe and probably wrong
  // Add 1 to the first byte to this pointer
  ptr.assumingMemoryBound(to: UInt8.self).pointee += 1
}

So, if I understand Swift's memory model (and to be perfectly honest, I'm not 100% sure I do), this is not defined behaviour. I don't think we really have very good high-level documentation for this, and right now memory binding is a no-op with no runtime sanitisers to help you catch mistakes, so it's common for people to get frustrated fighting the type-checker. If you understand what's going on, I think Swift provides a pretty elegant interface to work with memory safely.

Also, FWIW, new ARM chips will include memory-tagging. So safe memory access may be enforced some way in the future.

The best docs are the source comments. Pasting snippets for reference:

/// Binds the memory to the specified type and returns a typed pointer to the
/// bound memory.
///
/// Use the `bindMemory(to:capacity:)` method to bind the memory referenced
/// by this pointer to the type `T`. The memory must be uninitialized or
/// initialized to a type that is layout compatible with `T`. If the memory
/// is uninitialized, it is still uninitialized after being bound to `T`.
///
/// ...
///
/// - Warning: A memory location may only be bound to one type at a time. The
///   behavior of accessing memory as a type unrelated to its bound type is
///   undefined.
///
/// ...
public func bindMemory<T>(to type: T.Type, capacity count: Int) -> UnsafeMutablePointer<T>

/// Returns a typed pointer to the memory referenced by this pointer,
/// assuming that the memory is already bound to the specified type.
///
/// Use this method when you have a raw pointer to memory that has *already*
/// been bound to the specified type. The memory starting at this pointer
/// must be bound to the type `T`. Accessing memory through the returned
/// pointer is undefined if the memory has not been bound to `T`. To bind
/// memory to `T`, use `bindMemory(to:capacity:)` instead of this method.
///
/// ...
public func assumingMemoryBound<T>(to: T.Type) -> UnsafeMutablePointer<T>

Note the documentation for assumingMemoryBound goes to great pains to explain not to use the pattern you're using.

If you already have a typed pointer and want to switch types, you want to use withMemoryRebound:

/// Executes the given closure while temporarily binding the memory referenced 
/// by this buffer to the given type.
///
/// Use this method when you have a buffer of memory bound to one type and
/// you need to access that memory as a buffer of another type. Accessing
/// memory as type `T` requires that the memory be bound to that type. A
/// memory location may only be bound to one type at a time, so accessing
/// the same memory as an unrelated type without first rebinding the memory
/// is undefined.
///
/// ...
///
/// After executing `body`, this method rebinds memory back to the original
/// `Element` type.
///
/// - Note: Only use this method to rebind the buffer's memory to a type
///   with the same size and stride as the currently bound `Element` type.
///   To bind a region of memory to a type that is a different size, convert
///   the buffer to a raw buffer and use the `bindMemory(to:)` method.
///
/// ...
public func withMemoryRebound<T, Result>(to type: T.Type, _ body: (UnsafeBufferPointer<T>) throws -> Result) rethrows -> Result

Although now that I look at the docs for that function, I think that note about using bindMemory for types of different sizes is wrong. Converting to a raw buffer is a good idea, but you should use the load and storeBytes methods instead of binding the memory to another type. AFAIK creating the raw pointer over bound memory does not "unbind" the memory. @Andrew_Trick?
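To make the load/storeBytes suggestion concrete, here is a small sketch of reading and writing values of different sizes through a raw buffer without rebinding any memory. The function name readAndPatch and the byte values are made up for illustration. The buffer is allocated with UInt32 alignment because load(fromByteOffset:as:) expects a suitably aligned address.

```swift
// Hypothetical helper: accesses raw bytes as differently sized integers
// using only load and storeBytes, never bindMemory.
func readAndPatch() -> (word: UInt32, firstByte: UInt8) {
    let bytes: [UInt8] = [0x01, 0x02, 0x03, 0x04, 0xAA, 0xBB, 0xCC, 0xDD]
    let raw = UnsafeMutableRawBufferPointer.allocate(
        byteCount: bytes.count,
        alignment: MemoryLayout<UInt32>.alignment)
    defer { raw.deallocate() }
    raw.copyBytes(from: bytes)

    // Read four bytes at offset 4 as a UInt32. Interpreting the result as
    // little-endian makes the value the same on every platform.
    let word = UInt32(littleEndian: raw.load(fromByteOffset: 4, as: UInt32.self))

    // Overwrite the first two bytes with a UInt16, then read one byte back.
    raw.storeBytes(of: UInt16(0xFFFF), toByteOffset: 0, as: UInt16.self)
    let firstByte = raw.load(fromByteOffset: 0, as: UInt8.self)

    return (word, firstByte)
}

let result = readAndPatch()
print(String(result.word, radix: 16), result.firstByte) // ddccbbaa 255
```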

Alejandro · February 4, 2019, 8:54pm

Oh for sure. This was merely an example to show that it can be pretty easy to reach undefined behavior territory without "feeling" the implications of such operations and just being an example for unsafe functions. Arguably anyone using unsafe pointers should read the documentation for anything and everything they do, but that won't always be the case.

Andrew_Trick · February 4, 2019, 9:50pm

@Karl withMemoryRebound requires that the memory capacity not change because it doesn't do the math on MemoryLayout.stride to figure out the ratio. You can do that manually with bindMemory but using that API correctly is even harder.

As for

ptr.assumingMemoryBound(to: UInt8.self).pointee += 1

...don't do this. The safe thing is:

ptr.storeBytes(of: ptr.load(as: UInt8.self) + 1, as: UInt8.self)

assumingMemoryBound was supposed to be so ugly that no one would use it, but the safe thing is still uglier.
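Spelled out as a complete function in today's Swift (without the proposed unsafe keyword), the load/storeBytes version of add1 might look like the sketch below. It uses &+ so that the addition wraps on overflow, as in the original post's example, and goes through withUnsafeMutableBytes(of:_:) to obtain the raw pointer.

```swift
// Adds 1 to the first byte behind a raw pointer without assuming
// anything about the type the memory is bound to.
func add1(to ptr: UnsafeMutableRawPointer) {
    let firstByte = ptr.load(as: UInt8.self)
    // &+ wraps around instead of trapping if the byte is 0xFF.
    ptr.storeBytes(of: firstByte &+ 1, as: UInt8.self)
}

var x = 0
withUnsafeMutableBytes(of: &x) { buffer in
    // Int is never zero-sized, so baseAddress is non-nil here.
    add1(to: buffer.baseAddress!)
}

print(x) // 1 on a little-endian machine
```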

I don't want to make this post entirely about memory binding, since the OP makes a very good point without delving into that, but since it came up ;)

Here's a summary of the terrible three memory binding APIs. At this point, they serve as a minimal formalization of the model for circumventing Swift pointer aliasing rules. Swift pointers are strict primarily because, to be safe in the presence of C code, they need to be more strict than C, and adding special cases to the language (to mimic C) would, I believe, just encourage more rule breaking. Finding undefined behavior related to pointer types in a Swift project boils down to grepping for any of these APIs, which is actually much better than the likely alternative in which implicit casting could induce undefined behavior.

The usability and ergonomics just aren't there, partly because:

There are higher priorities.
With the exception of calling a few legacy C APIs (pthread/socket), well-designed code doesn't use type punning.
It's hard to motivate designing a feature that we don't want anyone to use.
We need to see what people really want before designing it.

I do think it will be much easier for users to know what to reach for when we introduce something like ReinterpretedPointer.

`assumingMemoryBound(to:)`

I consider any Swift code calling assumingMemoryBound to be unsafe unless:

It is accompanied by a comment explaining why the assumption holds in this particular case.
That logic can be easily confirmed by locally reasoning about the code.

I know of two expected use cases for assumingMemoryBound:

Writing a wrapper around a C API taking a void * callback such as pthread_create.
Passing a Swift Unsafe[Mutable]RawPointer to a C API taking char * or unsigned char *.

In both cases, the typed pointer produced by assumingMemoryBound(to:) should be passed directly to the imported C code without accessing it in Swift. There's always a danger that the imported C function could be replaced by a Swift shim. However, I don't know how to avoid assumingMemoryBound(to:) in these cases without special support in the Swift type system, which is a fairly high bar.
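As an illustration of the second use case, here is a hedged sketch that hands a Swift raw pointer to strlen, which takes a const char *. The string contents and variable names are made up for the example. Note the accompanying comment justifying the assumption, and that the typed pointer is never dereferenced in Swift:

```swift
import Foundation

// A NUL-terminated byte string copied into raw, untyped memory.
let text: [UInt8] = Array("unsafe".utf8) + [0]
let raw = UnsafeMutableRawPointer.allocate(byteCount: text.count, alignment: 1)
raw.copyMemory(from: text, byteCount: text.count)

// Assumption (for this sketch): the typed pointer is only consumed by C,
// where char * may alias any object, and Swift code never reads through it.
let length = strlen(raw.assumingMemoryBound(to: CChar.self))

raw.deallocate()
print(length) // 6
```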

Within the standard library, there are legitimate uses of assumingMemoryBound in which a raw buffer's bound type can be dynamically determined with some tag or discriminator bit. I didn't count that as an "expected" use case though. In the future, I think it would be better to do this sort of thing with a ReinterpretedPointer<T> type.

`withMemoryRebound(to:capacity:)`

withMemoryRebound(to:capacity:) is safer than assumingMemoryBound, but is also only intended for C interop when two distinct imported C types are known to have a compatible layout. The canonical example of this from the original Swift 3 migration guide is:

var addr = sockaddr_in()
let sock = socket(PF_INET, SOCK_STREAM, 0)

let result = withUnsafePointer(to: &addr) {
  // Temporarily bind the memory at &addr to a single instance of type sockaddr.
  $0.withMemoryRebound(to: sockaddr.self, capacity: 1) {
    connect(sock, $0, socklen_t(MemoryLayout<sockaddr_in>.stride))
  }
}
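For readers without a socket handy, the same mechanics can be sketched in pure Swift with two integer types of identical size and stride. This is only an illustration of how the API behaves, not a recommended pattern, since the method is meant for C interop. The array contents are made up:

```swift
let signed: [Int32] = [-1, 0, 1]

// Int32 and UInt32 have the same size and stride, so the capacity is
// unchanged while the memory is temporarily viewed as UInt32.
let unsigned: [UInt32] = signed.withUnsafeBufferPointer { buffer in
    buffer.baseAddress!.withMemoryRebound(to: UInt32.self, capacity: buffer.count) { pointer in
        // Copy the values out; the rebound pointer must not escape the closure.
        Array(UnsafeBufferPointer(start: pointer, count: buffer.count))
    }
}

print(unsigned) // [4294967295, 0, 1]
```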

`bindMemory(to:capacity:)`

bindMemory(to:capacity:) is the only one of the three meant for use in pure Swift. It allows Swift code to decouple memory allocation from knowledge of the type that memory will hold. So, it's useful for layering a raw memory allocator or raw byte buffer underneath a strictly typed view of the buffer.
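A minimal sketch of that layering, with a made-up function name makeSquares: the memory is allocated as raw bytes with no knowledge of its eventual element type, then bound to Int and used through a typed pointer.

```swift
// Hypothetical example: raw allocation first, typed view second.
func makeSquares(count: Int) -> [Int] {
    // Allocation knows only sizes and alignment, not the element type.
    let raw = UnsafeMutableRawPointer.allocate(
        byteCount: count * MemoryLayout<Int>.stride,
        alignment: MemoryLayout<Int>.alignment)
    defer { raw.deallocate() }

    // Bind the raw memory to Int, then initialize it through the typed pointer.
    let typed = raw.bindMemory(to: Int.self, capacity: count)
    typed.initialize(repeating: 0, count: count)
    defer { typed.deinitialize(count: count) }

    for index in 0..<count {
        typed[index] = index * index
    }
    // Copy the values into an array before the memory is released.
    return Array(UnsafeBufferPointer(start: typed, count: count))
}

let squares = makeSquares(count: 4)
print(squares) // [0, 1, 4, 9]
```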

In a recent thread about SwiftNIO, an unanticipated need for bindMemory(to:capacity:) for C interop came up. The C struct z_stream exposes a typed pointer into a raw byte buffer.

struct z_stream {
  unsigned char *next_in;
  //...
}

zstream.next_in = dataPtr.bindMemory(to: UInt8.self, capacity: count)

I'm not sure how else to get around this without defining your own layout compatible z_stream struct in C that takes void *. Then you would need to deal with the additional type mismatch when calling zlib.

Another way to deal with this would be adding some type system support for importing [unsigned] char * "differently" in select cases.

t-ae · February 5, 2019, 1:48am

A do-catch block requires try for every throwing call.
But this unsafe block requires no such thing.
It'll be unclear which call is unsafe, won't it?

But requiring the keyword inside an unsafe block would be redundant if we also have unsafe expressions.

unsafe do {
    // requires `unsafe` keyword for example
    unsafe someFuncA()
    unsafe someFuncB()
}

// We don't need block
unsafe someFuncA()
unsafe someFuncB()