[RFC] UnsafeBytePointer API for In-Memory Layout

Hello Swift development community,

Swift does a great job of protecting against undefined behavior--as long as you avoid "unsafe" APIs, that is. However, unsafe APIs are important for giving developers control over implementation details and performance. Naturally, the contract between unsafe APIs and the optimizer is crucial. When a developer uses an unsafe API, the rules governing safe, well-defined behavior must be clear. On the opposite end, the optimizer must know which assumptions it can make based on those rules. Simply saying that anything goes because "unsafe" is in the name is not helpful to this effort.

For a long time, I've wanted these rules nailed down. We have more users taking advantage of advanced features, and more optimizations that take advantage of assumptions guided by the type system. This seems like a particularly good time to resolve UnsafePointer semantics, considering the type system and UnsafePointer work that's been going on recently. Strict aliasing is something I would like addressed. If we do nothing here, then we will end up by default inheriting C/C++ semantics, as with any language that relies on a C/C++ backend. In other words, developers will be forced to write code with technically undefined behavior and rely on the compiler to be smart enough to recognize and recover from common patterns. Or we can take advantage of this opportunity and instead adopt a sound memory model with respect to aliasing.

This proposal is only an RFC at this point. I'm sending it out now to allow for plenty of time for discussion (or advance warning). Keep in mind that it could change considerably before it goes up for review.

-Andy

UnsafeBytePointer API for In-Memory Layout

Proposal: SE-NNNN <https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md>
Author(s): Andrew Trick <https://github.com/atrick>
Status: Awaiting review <https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md#rationale>
Review manager: TBD

Introduction

UnsafePointer and UnsafeMutablePointer refer to a typed region of memory, and the compiler must be able to assume that the UnsafePointer element (Pointee) type is consistent with other access to the same memory. See the proposed Type Safe Memory Access documentation <https://github.com/atrick/swift/blob/type-safe-mem-docs/docs/TypeSafeMemory.rst>. Consequently, inferred conversion between UnsafePointer element types exposes an easy way to abuse the type system. No alternative currently exists for manual memory layout and direct access to untyped memory, and that leads to an overuse of UnsafePointer. These uses of UnsafePointer, which depend on pointer type conversion, make accidental type punning likely. Type punning via UnsafePointer is semantically undefined behavior and de facto undefined behavior given the optimizer's long-time treatment of UnsafePointer.

In this document, all mentions of UnsafePointer also apply to UnsafeMutablePointer.

Motivation

To avoid accidental type punning, we should prohibit inferred conversion between UnsafePointer<T> and UnsafePointer<U> unless the target of the conversion is an untyped or nondereferenceable pointer (currently represented as UnsafePointer<Void>).

To support this change we should introduce a new pointer type that does not bind the type of its Pointee. Such a new pointer type would provide an ideal foundation for an API that allows byte-wise pointer arithmetic and a legal, well-defined means to access an untyped region of memory.

As motivation for such an API, consider that an UnsafePointer<Void> or OpaquePointer may currently be obtained from an external API. However, the developer may know the memory layout and may want to read or write elements whose types are compatible with that layout. This is a reasonable use case, but unless the developer can guarantee that all accesses to the same memory location have the same type, they cannot use UnsafePointer to access the memory without risking undefined behavior.

An UnsafeBytePointer example, using the newly proposed API, is included below.

Proposed solution

Introduce an UnsafeBytePointer type along with an API for obtaining an UnsafeBytePointer value at a relative byte offset and loading and storing arbitrary types at that location.

Statically prohibit inferred UnsafePointer conversion while allowing inferred UnsafePointer to UnsafeBytePointer conversion.

UnsafeBytePointer meets multiple requirements:

- An untyped pointer to memory
- Pointer arithmetic within byte-addressable memory
- Type-unsafe access to memory (legal type punning)

UnsafeBytePointer will replace UnsafeMutablePointer<Void> as the representation for untyped memory. For API clarity we could consider a typealias for VoidPointer. I don't think a separate VoidPointer type would be useful--there's no danger that UnsafeBytePointer will be casually dereferenced, and I don't see the danger in allowing pointer arithmetic since the only reasonable interpretation is that of a byte-addressable memory.

Providing an API for type-unsafe memory access would not serve a purpose without the ability to compute byte offsets. Of course, we could require users to convert back and forth using bitPatterns, but I think that would be awkward and only obscure the purpose of the UnsafeBytePointer type.
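
As a rough illustration of combining byte offsets with typed loads and stores, here is a minimal sketch. It does not use the API proposed here: it assumes the UnsafeMutableRawPointer type and its allocate, storeBytes, and load methods from the current Swift standard library as a stand-in for UnsafeBytePointer, and the function name roundTrip is made up for the example.

```swift
// Stand-in sketch: UnsafeMutableRawPointer plays the role of the proposed
// UnsafeBytePointer. Method names and labels differ from this proposal.
func roundTrip() -> (UInt32, Float, UInt32) {
    // Eight bytes of untyped memory, aligned for 4-byte values.
    let buffer = UnsafeMutableRawPointer.allocate(byteCount: 8, alignment: 4)
    defer { buffer.deallocate() }

    // Write a UInt32 at byte offset 0 and a Float at byte offset 4.
    buffer.storeBytes(of: 66, as: UInt32.self)
    (buffer + 4).storeBytes(of: 41.625, as: Float.self)

    // Read the values back with their stored types.
    let first = buffer.load(as: UInt32.self)
    let second = buffer.load(fromByteOffset: 4, as: Float.self)

    // Deliberate type punning: reinterpret the Float's bytes as a UInt32.
    let punned = buffer.load(fromByteOffset: 4, as: UInt32.self)
    return (first, second, punned)
}

let (first, second, punned) = roundTrip()
print(first, second, punned == Float(41.625).bitPattern)
```

The pointer arithmetic here is in bytes, not in strides of some element type, which is the property the paragraph above argues for.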

In this proposal, UnsafeBytePointer does not specify mutability. Adding an UnsafeMutableBytePointer would be straightforward, but adding another pointer type needs strong justification. I expect to get input from the community on this. If we agree that the imported type for const void* should be UnsafeBytePointer, then we probably need UnsafeMutableBytePointer to handle interoperability.

Detailed design

The public API is shown here. For details and comments, see the unsafeptr_convert branch <https://github.com/atrick/swift/commits/unsafeptr_convert>.

struct UnsafeBytePointer : Hashable, _Pointer {

  let _rawValue: Builtin.RawPointer

  var hashValue: Int {...}

  init<T>(_ : UnsafePointer<T>)
  init<T>(_ : UnsafeMutablePointer<T>)
  init?<T>(_ : UnsafePointer<T>?)
  init?<T>(_ : UnsafeMutablePointer<T>?)

  init(_ : OpaquePointer)
  init?(_ : OpaquePointer?)

  init?(bitPattern: Int)
  init?(bitPattern: UInt)

  func load<T>(_ : T.Type) -> T

  @warn_unused_result
  init(allocatingBytes size: Int, alignedTo: Int)

  @warn_unused_result
  init<T>(allocatingCapacity count: Int, of: T.Type)

  func deallocateBytes(_ size: Int, alignedTo: Int)

  func deallocateCapacity<T>(_ num: Int, of: T.Type)

  // Returns a pointer one byte after the initialized memory.
  func initialize<T>(with newValue: T, count: Int = 1) -> UnsafeBytePointer

  // Returns a pointer one byte after the initialized memory.
  func initialize<T>(from: UnsafePointer<T>, count: Int) -> UnsafeBytePointer

  func initializeBackward<T>(from source: UnsafePointer<T>, count: Int)

  func deinitialize<T>(_ : T.Type, count: Int = 1)
}

extension OpaquePointer {
  init(_ : UnsafeBytePointer)
}

extension Int {
  init(bitPattern: UnsafeBytePointer)
}

extension UInt {
  init(bitPattern: UnsafeBytePointer)
}

extension UnsafeBytePointer : RandomAccessIndex {
  typealias Distance = Int

  func successor() -> UnsafeBytePointer
  func predecessor() -> UnsafeBytePointer
  func distance(to : UnsafeBytePointer) -> Int
  func advanced(by : Int) -> UnsafeBytePointer
}

func == (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Bool

func < (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Bool

func + (lhs: UnsafeBytePointer, rhs: Int) -> UnsafeBytePointer

func + (lhs: Int, rhs: UnsafeBytePointer) -> UnsafeBytePointer

func - (lhs: UnsafeBytePointer, rhs: Int) -> UnsafeBytePointer

func - (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Int

func += (lhs: inout UnsafeBytePointer, rhs: Int)

func -= (lhs: inout UnsafeBytePointer, rhs: Int)

Occasionally, we need to convert from an UnsafeBytePointer to an UnsafePointer. This should only be done in very rare circumstances when the author understands the compiler's strict type rules for UnsafePointer. Although this could be done by casting through an OpaquePointer, an explicit, designated unsafe pointer cast API would make the risks more obvious and self-documenting. For example:

extension UnsafePointer {
  init(_ from: UnsafeBytePointer, toPointee: Pointee.Type)
}
extension UnsafeMutablePointer {
  init(_ from: UnsafeBytePointer, toPointee: Pointee.Type)
}

Similarly, conversion between UnsafePointer types must now be spelled with an explicit Pointee type:

extension UnsafePointer {
  init<U>(_ from: UnsafePointer<U>, toPointee: Pointee.Type)
  init<U>(_ from: UnsafeMutablePointer<U>, toPointee: Pointee.Type)
}
extension UnsafeMutablePointer {
  init<U>(_ from: UnsafeMutablePointer<U>, toPointee: Pointee.Type)
}

Impact on existing code

The largest impact of this change is that void* and const void* are imported as UnsafeBytePointer. This impacts many public APIs, but with implicit argument conversion should not affect typical uses of those APIs.

Any Swift projects that rely on type inference to convert between UnsafePointer types will need to take action. The developer needs to determine whether type punning is necessary. If so, they must migrate to the UnsafeBytePointer API. Otherwise, they can work around the new restriction by using a toPointee or mutating label.

Disallowing inferred UnsafePointer direct conversion requires some standard library code to use an explicit toPointee label for unsafe conversions that may violate strict aliasing.

All occurrences of Unsafe[Mutable]Pointer<Void> in the standard library are converted to UnsafeBytePointer. e.g. unsafeAddress() now returns UnsafeBytePointer, not UnsafePointer<Void>.

Some occurrences of Unsafe[Mutable]Pointer<Pointee> in the standard library are replaced with UnsafeBytePointer, either because the code was playing too loosely with strict aliasing rules, or because the code actually wanted to perform pointer arithmetic on byte-addresses.

StringCore.baseAddress changes from OpaquePointer to UnsafeBytePointer because it is computing byte offsets and accessing the memory. OpaquePointer is meant for bridging, but should be truly opaque; that is, nondereferenceable and not involved in address computation.

The StringCore implementation does a considerable amount of casting between different views of the String storage. The current implementation already demonstrates some awareness of strict aliasing rules. The rules are generally followed by ensuring that the StringBuffer is only accessed using the appropriate CodeUnit within Swift code. For interoperability and optimization, String buffers frequently need to be cast to and from CChar. This is valid as long as access to the buffer from Swift is guarded by dynamic checks of the encoding type. These unsafe but dynamically legal conversion points will now be labeled with toPointee.

CoreAudio utilities now use an UnsafeBytePointer.

Implementation status

On my unsafeptr_convert branch <https://github.com/atrick/swift/commits/unsafeptr_convert>, I've made most of the necessary changes to support the addition of UnsafeBytePointer and the removal of inferred UnsafePointer conversion.

There are several things going on here in order to make it possible to build the standard library with the changes:

A new UnsafeBytePointer type is defined.

The type system imports void* as UnsafeBytePointer.

The type system handles implicit conversions to UnsafeBytePointer.

UnsafeBytePointer replaces both UnsafePointer<Void> and UnsafeMutablePointer<Void>.

The standard library was relying on inferred UnsafePointer conversion in over 100 places. Most of these conversions now take an explicit label, such as 'toPointee' or 'mutating'. Some have been rewritten.

Several places in the standard library that were playing loosely with strict aliasing or doing bytewise pointer arithmetic now use UnsafeBytePointer instead.

Explicit labeled Unsafe[Mutable]Pointer initializers are added.

The inferred Unsafe[Mutable]Pointer conversion is removed.

TODO:

Once this proposal is accepted, and the rules for casting between pointer types have been decided, we need to finish implementing the type system support. The current implementation (intentionally) breaks a few tests in pointer_conversion.swift. We also need to ensure that interoperability requirements are met. Currently, many argument casts need to be explicitly labeled. The current implementation also makes it easy for users to hit an "ambiguous use of 'init'" error when relying on implicit argument conversion.

Additionally:

A mangled-name abbreviation needs to be created for UnsafeBytePointer.

The StringAPI tests should probably be rewritten with UnsafeBytePointer.

The NSStringAPI utilities and tests may need to be ported to UnsafeBytePointer.

The CoreAudio utilities and tests may need to be ported to UnsafeBytePointer.

Alternatives considered

Existing workaround

In some cases, developers can safely reinterpret values to achieve the same effect as type punning:

let ptrI32 = UnsafeMutablePointer<Int32>(allocatingCapacity: 1)
ptrI32[0] = Int32()
let u = unsafeBitCast(ptrI32[0], to: UInt32.self)

Note that all access to the underlying memory is performed with the same element type. This is perfectly legitimate, but simply isn't a complete solution. It also does not eliminate the inherent danger in declaring a typed pointer and expecting it to point to values of a different type.

Discarded alternatives

We considered adding a typePunnedMemory property to the existing Unsafe[Mutable]Pointer API. This would provide a legal way to access a potentially type punned Unsafe[Mutable]Pointer. However, it would certainly cause confusion without doing much to reduce the likelihood of programmer error. Furthermore, there are no good use cases for such a property evident in the standard library.

The opaque _RawByte struct is a technique that allows for byte-addressable buffers while hiding the dangerous side effects of type punning (a _RawByte could be loaded but its value cannot be directly inspected). UnsafePointer<_RawByte> is a clever alternative to UnsafeBytePointer. However, it doesn't do enough to prevent undefined behavior. The loaded _RawByte would naturally be accessed via unsafeBitCast, which would mislead the author into thinking that they have legally bypassed the type system. In actuality, this API blatantly violates strict aliasing. It theoretically results in undefined behavior as it stands, and may actually exhibit undefined behavior if the user recovers the loaded value.

To solve the safety problem with UnsafePointer<_RawByte>, the compiler could associate special semantics with an UnsafePointer bound to this concrete generic parameter type. Statically enforcing casting rules would be difficult if not impossible without new language features. It would also be impossible to distinguish between typed and untyped pointer APIs. For example, UnsafePointer<T>.load<U> would be a nonsensical vestige.

Alternate proposal for void* type

Changing the imported type for void* will be somewhat disruptive. Furthermore, this proposal currently drops the distinction between void* and const void*--an obvious loss of API information.

We could continue to import void* as UnsafeMutablePointer<Void> and const void* as UnsafePointer<Void>, which will continue to serve as an "opaque" untyped pointer. Converting to UnsafeBytePointer would be necessary to perform pointer arithmetic or to conservatively handle possible type punning.

This alternative is much less disruptive, but we are left with two forms of untyped pointer, one of which (UnsafePointer) the type system somewhat conflates with typed pointers.

Given the current restrictions of the language, it's not clear how to statically enforce the necessary rules for casting UnsafePointer<Void> once general UnsafePointer<T> conversions are disallowed. The following conversions should be inferred, and implied for function arguments (ignoring mutability):

UnsafePointer<T> to UnsafePointer<Void>

UnsafePointer<Void> to UnsafeBytePointer

I did not implement this simpler design because my primary goal was to enforce legal pointer conversion and rid Swift code of undefined behavior. I can't do that while allowing UnsafePointer conversions.

API improvements

As proposed, the initialize API infers the stored value:

func initialize<T>(with newValue: T, count: Int = 1) -> UnsafeBytePointer

This is somewhat dangerous because the developer may not realize the size of the object(s) that will be written to memory. This can be easily asserted by checking the return pointer:

let newptr = ptr.initialize(with: 3)
assert(newptr - ptr == 8)

As an alternative, we could force the user to provide the expected type name in the initialize invocation:

func initialize<T>(_ : T.Type, with newValue: T, count: Int = 1)
  -> UnsafeBytePointer
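
To make the size hazard concrete, the following sketch shows how the inferred type of a literal changes the number of bytes that a generic store would write. It uses only MemoryLayout from the standard library; the helper name byteCount(of:) is hypothetical and exists only for this illustration.

```swift
// Hypothetical helper: the number of bytes a value of the inferred type
// T occupies, which is what a generic initialize<T> would write.
func byteCount<T>(of value: T) -> Int {
    return MemoryLayout<T>.size
}

// An unannotated integer literal is inferred as Int, so the size is
// platform dependent (8 bytes on 64-bit targets, 4 on 32-bit targets).
let inferred = byteCount(of: 3)

// Naming the type pins the size regardless of platform.
let explicit = byteCount(of: 3 as UInt32)

print(inferred, explicit)
```

This is why the assert in the earlier snippet compares against 8: it assumes a 64-bit Int.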

Future improvements

UnsafeBytePointer should eventually support unaligned memory access. I believe that we will eventually have a modifier that allows "packed" struct members. At that time we may also want to add a "packed" flag to UnsafeBytePointer's load and initialize methods.

When accessing a memory buffer, it is generally convenient to cast to a type with known layout and compute offsets relative to the type's size. This is how UnsafePointer<Pointee> works. A generic UnsafeTypePunnedPointer<Pointee> could be introduced with the same interface as UnsafePointer<Pointee>, but without the strict aliasing requirements. This seems like an overdesign simply to avoid calling sizeof() in a rare use case, but nothing prevents adding this type later.

UnsafeBytePointer example

/// An example of using UnsafeBytePointer to implement manual memory layout.

/// A Buffer for reading and writing basic types at a fixed address.
/// Indirection allows the buffer to refer to mutable state elsewhere.
struct MessageBuffer {
  let ptr: UnsafeBytePointer

  enum IndirectFlag { case Direct, Indirect }

  private func getPointer(atOffset n: Int, _ isIndirect: IndirectFlag)
  -> UnsafeBytePointer {
    switch isIndirect {
    case .Indirect:
      return (ptr + n).load(UnsafeBytePointer.self)
    case .Direct:
      return ptr + n
    }
  }

  func readUInt32(atOffset n: Int, _ isIndirect: IndirectFlag) -> UInt32 {
    return getPointer(atOffset: n, isIndirect).load(UInt32.self)
  }
  func readFloat32(atOffset n: Int, _ isIndirect: IndirectFlag) -> Float32 {
    return getPointer(atOffset: n, isIndirect).load(Float32.self)
  }

  func writeUInt32(_ val: UInt32, atOffset n: Int) {
    getPointer(atOffset: n, .Direct).initialize(with: val)
  }
  func writeFloat32(_ val: Float32, atOffset n: Int) {
    getPointer(atOffset: n, .Direct).initialize(with: val)
  }
  func writeIndirect(_ ptr: UnsafeBytePointer, atOffset n: Int) {
    getPointer(atOffset: n, .Direct).initialize(with: ptr)
  }
}

/// Encoded message format.
struct MessageFormat : Sequence, IteratorProtocol {
  typealias Element = MessageFormat

  private static let maxFormatFields = 32 / 4
  static let maxBufferBytes = maxFormatFields * sizeof(UInt)

  var formatCode: UInt32 = 0
  var elementCode: UInt32 = 0
  var offset: Int = 0

  init(bitPattern: UInt32) {
    formatCode = bitPattern
  }

  enum Kind {
    case None, Reserved, UInt32, Float32, IndirectUInt32, IndirectFloat32
  }

  /// The first field's kind.
  var kind : Kind {
    get {
      switch elementCode {
      case 0x0: return Kind.None
      case 0x2: return Kind.UInt32
      case 0x3: return Kind.Float32
      case 0x6: return Kind.IndirectUInt32
      case 0x7: return Kind.IndirectFloat32
      default: return Kind.Reserved
      }
    }
  }

  func elementSize() -> Int {
    return (elementCode & 0x4) != 0 ? sizeof(UInt) : 4
  }

  /// Get the format for the next element.
  mutating func next() -> Element? {
    if elementCode != 0 {
      offset += elementSize()
    }
    elementCode = formatCode & 0xF
    formatCode >>= 4
    if kind == .None {
      return nil
    }
    // align to the next element size
    let offsetMask = elementSize() - 1
    offset = (offset + offsetMask) & ~offsetMask
    return self
  }
}

func createBuffer() -> MessageBuffer {
  return MessageBuffer(ptr: UnsafeBytePointer(
      allocatingBytes: MessageFormat.maxBufferBytes, alignedTo: sizeof(UInt)))
}

func destroy(buffer: MessageBuffer) {
  buffer.ptr.deallocateBytes(MessageFormat.maxBufferBytes,
    alignedTo: sizeof(UInt))
}

var sharedInt: UInt32 = 42
var sharedFloat: Float32 = 16.25

func generateMessage(inBuffer mb: MessageBuffer) -> MessageFormat {
  let mf = MessageFormat(bitPattern: 0x06727632)
  for field in mf {
    switch field.kind {
    case .UInt32:
      mb.writeUInt32(66, atOffset: field.offset)
    case .Float32:
      mb.writeFloat32(41.625, atOffset: field.offset)
    case .IndirectUInt32:
      mb.writeIndirect(&sharedInt, atOffset: field.offset)
    case .IndirectFloat32:
      mb.writeIndirect(&sharedFloat, atOffset: field.offset)
    case .None:
      fallthrough
    case .Reserved:
      return MessageFormat(bitPattern: 0)
    }
  }
  return mf
}

func handleMessage(buffer mb: MessageBuffer, format: MessageFormat) -> Bool {
  for field in format {
    switch field.kind {
    case .UInt32:
      print(mb.readUInt32(atOffset: field.offset, .Direct))
    case .Float32:
      print(mb.readFloat32(atOffset: field.offset, .Direct))
    case .IndirectUInt32:
      print(mb.readUInt32(atOffset: field.offset, .Indirect))
    case .IndirectFloat32:
      print(mb.readFloat32(atOffset: field.offset, .Indirect))
    case .None:
      fallthrough
    case .Reserved:
      return false
    }
  }
  return true
}

func runProgram() {
  let mb = createBuffer()
  let mf = generateMessage(inBuffer: mb)
  if handleMessage(buffer: mb, format: mf) {
    print("Done")
  }
  destroy(buffer: mb)
}
runProgram()
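
The alignment step in MessageFormat.next() rounds the running offset up to a multiple of the element size with a mask, which is only correct when the size is a power of two. A minimal standalone sketch of that step follows; the function name alignUp is made up for this illustration and is not part of the proposal.

```swift
// Round `offset` up to the next multiple of `alignment`.
// The mask trick requires `alignment` to be a power of two.
func alignUp(_ offset: Int, to alignment: Int) -> Int {
    precondition(alignment > 0 && (alignment & (alignment - 1)) == 0,
                 "alignment must be a power of two")
    let mask = alignment - 1
    return (offset + mask) & ~mask
}

print(alignUp(5, to: 4))   // 8
print(alignUp(8, to: 8))   // 8, already aligned
print(alignUp(28, to: 8))  // 32
```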

Hi, Andy. Thanks for working on this! I have a pile of scattered comments which are hopefully helpful.

On the model itself:

- I’m uncomfortable with using the term “undefined behavior” as if it’s universally understood. Up until now we haven't formally had that notion in Swift, just “type safety” and “memory safety” and “invariant-preserving” and the like. Maybe we need it now, but I think it needs to be explicitly defined. (I’d actually talk to Dave about exactly what terms make the most sense for users.)

- "thin function, C function, and block function types” <-- block functions are not layout-compatible with C functions, and they are layout-compatible with AnyObject. (I mean, they’re both pointers at the moment, but so are non-weak object references.)

- "nonresilient structs” <-- nitpick: the term “nonresilient” is not defined here, and isn’t a formal term in the Library Evolution doc. I guess I would actually prefer “fragile” if you needed a generic term across structs and enums, but either way you should put a small definition somewhere in this doc.

- "homogeneous tuples, fixed-sized array storage, and homogeneous nonresilient structs in which the element type has no spare bits (structs may be bit packed).” <-- I would leave the structs out of this, even if it’s true. Also, Swift doesn’t have fixed-size arrays at the moment, right?

- "In particular, they apply to access that originates from stored property getter and setters, reading from and assigning into inout variables, and reading or assigning subscripts (including the Unsafe[Mutable]Pointer pointee property and subscripts).” I’m unhappy with inout variables being called out specially here. An inout variable should be exactly like a local variable that happens to be stack-allocated, rather than just in registers. Closure captures probably figure in here too.

- "unsafeBitCast is valid for pointer to integer conversions” <-- we have better APIs to do this now ('init(bitPattern:)’ in both directions).

- "It is also used internally to convert between nondereferenceable pointer types, which avoids the need to add builtin conversions for all combinations of pointer types.” <-- I’d be happy to get rid of this and just go through Builtin.RawPointer when necessary.

- On the flip side, I think we do need to preserve the ability to reference-cast in order to send Objective-C messages, at least for now. I don’t know how I want to expose that to users, though. (In general it’s probably worth seeing how unsafeBitCast is used in the wild and what we’d recommend instead.)

Some concerns with UnsafeBytePointer:

- I was concerned about having a store() to go with load(). It’s just deinitialize + initialize with a count of 1, but that’s easily the common case when you do need to write to something. That said, I’m not sure which people are more likely to mess up: using initialize and forgetting to deinitialize before, or using store when there wasn’t anything there before.

- I am concerned about eliminating the distinction between mutable and immutable memory. That is, I think we’ll want the Mutable variant to be a separate type.

- Is there a good way to do a mass copy or move from an UnsafeBytePointer?

Thoughts on the diff:

- What was the thought behind putting UnsafeBytePointer in PointerTypeKind? OpaquePointer isn’t there, and I’m concerned about places that test if something’s a pointer by checking that the pointee type is non-null (by far the common pattern).

- The PrintAsObjC test can’t possibly pass as is—it’s checking that one pointer is const and the other isn’t. I’m guessing there’s actually more work to do here.

I think that’s about it. Sorry for the laundry list, but I feel good about most of them being small issues. The model seems sensible, other than lingering fears about it still being too easy to do the wrong thing. Maybe we need a Pointer Sanitizer for dynamic verification.

Jordan

On May 6, 2016, at 22:28, Andrew Trick via swift-dev <swift-dev@swift.org> wrote:

Hello Swift development community,

Swift does a great job of protecting against undefined behavior--as long as you avoid "unsafe" APIs, that is. However, unsafe APIs are important for giving developers control over implementation details and performance. Naturally, the contract between unsafe APIs and the optimizer is crucial. When a developer uses an unsafe API, the rules governing safe, well-defined behavior must be clear. On the opposite end, the optimizer must know which assumptions it can make based on those rules. Simply saying that anything goes because "unsafe" is in the name is not helpful to this effort.

For a long time, I've wanted these rules nailed down. We have more users taking advantage of advanced features, and more optimizations that take advantage of assumptions guided by the type system. This seems like a particularly good time to resolve UnsafePointer semantics, considering the type system and UnsafePointer work that's been going on recently. Strict aliasing is something I would like addressed. If we do nothing here, then we will end up by default inheriting C/C++ semantics, as with any language that relies on a C/C++ backend. In other words, developers will be forced to write code with technically undefined behavior and rely on the compiler to be smart enough to recognize and recover from common patterns. Or we can take advantage of this opportunity and instead adopt a sound memory model with respect to aliasing.

This proposal is only an RFC at this point. I'm sending it out now to allow for plenty of time for discussion (or advance warning). Keep in mind that it could change considerably before it goes up for review.

-Andy

UnsafeBytePointer API for In-Memory Layout

Proposal: SE-NNNN <https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md&gt;
Author(s): Andrew Trick <https://github.com/atrick&gt;
Status: Awaiting review <https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md#rationale&gt;
Review manager: TBD
<https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md#introduction&gt;Introduction

UnsafePointer and UnsafeMutable refer to a typed region of memory, and the compiler must be able to assume that UnsafePointer element (Pointee) type is consistent with other access to the same memory. See proposed Type Safe Memory Access documentation <https://github.com/atrick/swift/blob/type-safe-mem-docs/docs/TypeSafeMemory.rst&gt;\. Consequently, inferred conversion between UnsafePointer element types exposes an easy way to abuse the type system. No alternative currently exists for manual memory layout and direct access to untyped memory, and that leads to an overuse of UnsafePointer. These uses of UnsafePointer, which depend on pointer type conversion, make accidental type punning likely. Type punning via UnsafePointer is semantically undefined behavior and de facto undefined behavior given the optimizer's long-time treatment of UnsafePointer.

In this document, all mentions of UnsafePointer also apply to UnsafeMutablePointer.

<https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md#motivation&gt;Motivation

To avoid accidental type punning, we should prohibit inferred conversion between UnsafePointer<T> and UnsafePointer<U> unless the target of the conversion is an untyped or nondereferenceable pointer (currently represented as UnsafePointer<Void>).

To support this change we should introduce a new pointer type that does not bind the type of its Pointee. Such a new pointer type would provide an ideal foundation for an API that allows byte-wise pointer arithmetic and a legal, well-defined means to access an untyped region of memory.

As motivation for such an API, consider that an UnsafePointer<Void> or OpaquePointer may be currently be obtained from an external API. However, the developer may know the memory layout and may want to read or write elements whose types are compatible with that layout. This a reasonable use case, but unless the developer can guarantee that all accesses to the same memory location have the same type, then they cannot use UnsafePointer to access the memory without risking undefined behavior.

An UnsafeBytePointer example, using a new proposed API is included below.

Proposed solution

Introduce an UnsafeBytePointer type along with an API for obtaining an UnsafeBytePointer value at a relative byte offset and loading and storing arbitrary types at that location.

Statically prohibit inferred UnsafePointer conversion while allowing inferred UnsafePointer to UnsafeBytePointer conversion.

UnsafeBytePointer meets multiple requirements:

An untyped pointer to memory
Pointer arithmetic within byte-addressable memory
Type-unsafe access to memory (legal type punning)

UnsafeBytePointer will replace UnsafeMutablePointer<Void> as the representation for untyped memory. For API clarity we could consider a typealias for VoidPointer. I don't think a separate VoidPointer type would be useful: there's no danger that UnsafeBytePointer will be casually dereferenced, and I don't see the danger in allowing pointer arithmetic, since the only reasonable interpretation is that of byte-addressable memory.

Providing an API for type-unsafe memory access would not serve a purpose without the ability to compute byte offsets. Of course, we could require users to convert back and forth using bitPatterns, but I think that would be awkward and only obscure the purpose of the UnsafeBytePointer type.
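To make the byte-offset argument concrete, here is a minimal sketch of storing and loading differently typed values at explicit byte offsets. UnsafeBytePointer exists only in this proposal, so the sketch assumes the standard library's UnsafeMutableRawPointer as a stand-in with roughly the same role; the function name roundTripFields is hypothetical.

```swift
// Sketch only: UnsafeMutableRawPointer stands in for the proposed
// UnsafeBytePointer. roundTripFields is a hypothetical name.
func roundTripFields() -> (UInt32, Float) {
  // Eight bytes of untyped memory, aligned for 4-byte fields.
  let base = UnsafeMutableRawPointer.allocate(byteCount: 8, alignment: 4)
  defer { base.deallocate() }

  // Byte-wise pointer arithmetic: the second field lives 4 bytes in.
  let second = base + 4

  // Store two differently typed (trivial) values at explicit offsets.
  base.storeBytes(of: 0xDEAD_BEEF as UInt32, as: UInt32.self)
  second.storeBytes(of: 16.25 as Float, as: Float.self)

  // Load them back by naming the type expected at each offset.
  let word = base.load(as: UInt32.self)
  let real = base.load(fromByteOffset: 4, as: Float.self)
  return (word, real)
}

let fields = roundTripFields()
print(fields.0 == 0xDEAD_BEEF, fields.1)
```

Without byte-wise arithmetic on the untyped pointer, the offset of the second field would have to be computed through an integer bit pattern, which is the awkwardness the paragraph above describes.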

In this proposal, UnsafeBytePointer does not specify mutability. Adding an UnsafeMutableBytePointer would be straightforward, but adding another pointer type needs strong justification. I expect to get input from the community on this. If we agree that the imported type for const void* should be UnsafeBytePointer, then we probably need UnsafeMutableBytePointer to handle interoperability.

Detailed design

The public API is shown here. For details and comments, see the unsafeptr_convert branch <https://github.com/atrick/swift/commits/unsafeptr_convert>.

struct UnsafeBytePointer : Hashable, _Pointer {

  let _rawValue: Builtin.RawPointer

  var hashValue: Int {...}

  init<T>(_ : UnsafePointer<T>)
  init<T>(_ : UnsafeMutablePointer<T>)
  init?<T>(_ : UnsafePointer<T>?)
  init?<T>(_ : UnsafeMutablePointer<T>?)

  init(_ : OpaquePointer)
  init?(_ : OpaquePointer?)

  init?(bitPattern: Int)
  init?(bitPattern: UInt)

  func load<T>(_ : T.Type) -> T

  @warn_unused_result
  init(allocatingBytes size: Int, alignedTo: Int)

  @warn_unused_result
  init<T>(allocatingCapacity count: Int, of: T.Type)

  func deallocateBytes(_ size: Int, alignedTo: Int)

  func deallocateCapacity<T>(_ num: Int, of: T.Type)

  // Returns a pointer one byte after the initialized memory.
  func initialize<T>(with newValue: T, count: Int = 1) -> UnsafeBytePointer

  // Returns a pointer one byte after the initialized memory.
  func initialize<T>(from: UnsafePointer<T>, count: Int) -> UnsafeBytePointer

  func initializeBackward<T>(from source: UnsafePointer<T>, count: Int)

  func deinitialize<T>(_ : T.Type, count: Int = 1)
}

extension OpaquePointer {
  init(_ : UnsafeBytePointer)
}

extension Int {
  init(bitPattern: UnsafeBytePointer)
}

extension UInt {
  init(bitPattern: UnsafeBytePointer)
}

extension UnsafeBytePointer : RandomAccessIndex {
  typealias Distance = Int

  func successor() -> UnsafeBytePointer
  func predecessor() -> UnsafeBytePointer
  func distance(to : UnsafeBytePointer) -> Int
  func advanced(by : Int) -> UnsafeBytePointer
}

func == (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Bool

func < (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Bool

func + (lhs: UnsafeBytePointer, rhs: Int) -> UnsafeBytePointer

func + (lhs: Int, rhs: UnsafeBytePointer) -> UnsafeBytePointer

func - (lhs: UnsafeBytePointer, rhs: Int) -> UnsafeBytePointer

func - (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Int

func += (lhs: inout UnsafeBytePointer, rhs: Int)

func -= (lhs: inout UnsafeBytePointer, rhs: Int)

Occasionally, we need to convert from an UnsafeBytePointer to an UnsafePointer. This should only be done in very rare circumstances when the author understands the compiler's strict type rules for UnsafePointer. Although this could be done by casting through an OpaquePointer, an explicit, designated unsafe pointer cast API would make the risks more obvious and self-documenting. For example:

extension UnsafePointer {
  init(_ from: UnsafeBytePointer, toPointee: Pointee.Type)
}
extension UnsafeMutablePointer {
  init(_ from: UnsafeBytePointer, toPointee: Pointee.Type)
}

Similarly, conversion between UnsafePointer types must now be spelled with an explicit Pointee type:

extension UnsafePointer {
  init<U>(_ from: UnsafePointer<U>, toPointee: Pointee.Type)
  init<U>(_ from: UnsafeMutablePointer<U>, toPointee: Pointee.Type)
}
extension UnsafeMutablePointer {
  init<U>(_ from: UnsafeMutablePointer<U>, toPointee: Pointee.Type)
}
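
As a point of comparison for the labeled conversions above, here is a minimal sketch of an explicit, named raw-to-typed conversion. The toPointee initializers are only proposed here; the sketch assumes the standard library's UnsafeRawPointer and its assumingMemoryBound(to:) method as the closest available analogue. The function name sumAsInt32 is hypothetical.

```swift
// Sketch only: assumingMemoryBound(to:) stands in for the proposed
// `toPointee:` initializer. sumAsInt32 is a hypothetical name.
func sumAsInt32(_ raw: UnsafeRawPointer, count: Int) -> Int32 {
  // The conversion is spelled out with the pointee type. The caller is
  // asserting that this memory really does hold Int32 values.
  let typed = raw.assumingMemoryBound(to: Int32.self)
  var total: Int32 = 0
  for index in 0..<count {
    total += typed[index]
  }
  return total
}

let values: [Int32] = [1, 2, 3, 4]
let total = values.withUnsafeBufferPointer { buffer in
  // Typed-to-untyped conversion is always safe, so it needs no label.
  sumAsInt32(UnsafeRawPointer(buffer.baseAddress!), count: buffer.count)
}
print(total)  // prints 10
```

The asymmetry is the point: dropping type information is harmless, while reintroducing it is where the author takes responsibility for strict aliasing.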

Impact on existing code

The largest impact of this change is that void* and const void* are imported as UnsafeBytePointer. This impacts many public APIs, but with implicit argument conversion should not affect typical uses of those APIs.

Any Swift projects that rely on type inference to convert between UnsafePointer types will need to take action. The developer needs to determine whether type punning is necessary. If so, they must migrate to the UnsafeBytePointer API. Otherwise, they can work around the new restriction by using a toPointee or mutating label.

Disallowing inferred UnsafePointer direct conversion requires some standard library code to use an explicit toPointee label for unsafe conversions that may violate strict aliasing.

All occurrences of Unsafe[Mutable]Pointer<Void> in the standard library are converted to UnsafeBytePointer. e.g. unsafeAddress() now returns UnsafeBytePointer, not UnsafePointer<Void>.

Some occurrences of Unsafe[Mutable]Pointer<Pointee> in the standard library are replaced with UnsafeBytePointer, either because the code was playing too loosely with strict aliasing rules, or because the code actually wanted to perform pointer arithmetic on byte-addresses.

StringCore.baseAddress changes from OpaquePointer to UnsafeBytePointer because it is computing byte offsets and accessing the memory. OpaquePointer is meant for bridging, but should be truly opaque; that is, nondereferenceable and not involved in address computation.

The StringCore implementation does a considerable amount of casting between different views of the String storage. The current implementation already demonstrates some awareness of strict aliasing rules. The rules are generally followed by ensuring that the StringBuffer is only accessed using the appropriate CodeUnit within Swift code. For interoperability and optimization, String buffers frequently need to be cast to and from CChar. This is valid as long as access to the buffer from Swift is guarded by dynamic checks of the encoding type. These unsafe but dynamically legal conversion points will now be labeled with toPointee.

CoreAudio utilities now use an UnsafeBytePointer.

Implementation status

On my unsafeptr_convert branch <https://github.com/atrick/swift/commits/unsafeptr_convert>, I've made most of the necessary changes to support the addition of UnsafeBytePointer and the removal of inferred UnsafePointer conversion.

There are several things going on here in order to make it possible to build the standard library with the changes:

A new UnsafeBytePointer type is defined.

The type system imports void* as UnsafeBytePointer.

The type system handles implicit conversions to UnsafeBytePointer.

UnsafeBytePointer replaces both UnsafePointer<Void> and UnsafeMutablePointer<Void>.

The standard library was relying on inferred UnsafePointer conversion in over 100 places. Most of these conversions now take an explicit label, such as 'toPointee' or 'mutating'. Some have been rewritten.

Several places in the standard library that were playing loosely with strict aliasing or doing bytewise pointer arithmetic now use UnsafeBytePointer instead.

Explicit labeled Unsafe[Mutable]Pointer initializers are added.

The inferred Unsafe[Mutable]Pointer conversion is removed.

TODO:

Once this proposal is accepted, and the rules for casting between pointer types have been decided, we need to finish implementing the type system support. The current implementation (intentionally) breaks a few tests in pointer_conversion.swift. We also need to ensure that interoperability requirements are met. Currently, many argument casts need to be explicitly labeled. The current implementation also makes it easy for users to hit an "ambiguous use of 'init'" error when relying on implicit argument conversion.

Additionally:

A name mangled abbreviation needs to be created for UnsafeBytePointer.

The StringAPI tests should probably be rewritten with UnsafeBytePointer.

The NSStringAPI utilities and tests may need to be ported to UnsafeBytePointer.

The CoreAudio utilities and tests may need to be ported to UnsafeBytePointer.

Alternatives considered

Existing workaround

In some cases, developers can safely reinterpret values to achieve the same effect as type punning:

let ptrI32 = UnsafeMutablePointer<Int32>(allocatingCapacity: 1)
ptrI32[0] = Int32()
let u = unsafeBitCast(ptrI32[0], to: UInt32.self)

Note that all access to the underlying memory is performed with the same element type. This is perfectly legitimate, but simply isn't a complete solution. It also does not eliminate the inherent danger in declaring a typed pointer and expecting it to point to values of a different type.
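
For readers who want to try the workaround, here is a minimal self-contained sketch of the same idea. It assumes the allocation spelling of current toolchains (UnsafeMutablePointer<T>.allocate(capacity:)) rather than the allocatingCapacity initializer shown above, and the function name reinterpretedBits is hypothetical.

```swift
// Sketch only: reinterpretedBits is a hypothetical name, and the
// allocate(capacity:) spelling is assumed from current toolchains.
func reinterpretedBits(of value: Int32) -> UInt32 {
  let ptrI32 = UnsafeMutablePointer<Int32>.allocate(capacity: 1)
  defer { ptrI32.deallocate() }

  // Every access to this memory uses the element type Int32.
  ptrI32.initialize(to: value)
  let loaded = ptrI32.pointee

  // Only the loaded value is reinterpreted; the memory is never
  // viewed through a pointer of a different type.
  return unsafeBitCast(loaded, to: UInt32.self)
}

print(reinterpretedBits(of: -1) == UInt32.max)
```

The reinterpretation happens on a value that has already been loaded, so no pointer ever disagrees with the type of the memory it addresses.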

Discarded alternatives

We considered adding a typePunnedMemory property to the existing Unsafe[Mutable]Pointer API. This would provide a legal way to access a potentially type punned Unsafe[Mutable]Pointer. However, it would certainly cause confusion without doing much to reduce the likelihood of programmer error. Furthermore, there are no good use cases for such a property evident in the standard library.

The opaque _RawByte struct is a technique that allows for byte-addressable buffers while hiding the dangerous side effects of type punning (a _RawByte could be loaded but its value cannot be directly inspected). UnsafePointer<_RawByte> is a clever alternative to UnsafeBytePointer. However, it doesn't do enough to prevent undefined behavior. The loaded _RawByte would naturally be accessed via unsafeBitCast, which would mislead the author into thinking that they have legally bypassed the type system. In actuality, this API blatantly violates strict aliasing. It theoretically results in undefined behavior as it stands, and may actually exhibit undefined behavior if the user recovers the loaded value.

To solve the safety problem with UnsafePointer<_RawByte>, the compiler could associate special semantics with an UnsafePointer bound to this concrete generic parameter type. Statically enforcing casting rules would be difficult if not impossible without new language features. It would also be impossible to distinguish between typed and untyped pointer APIs. For example, UnsafePointer<T>.load<U> would be a nonsensical vestige.

Alternate proposal for void* type

Changing the imported type for void* will be somewhat disruptive. Furthermore, this proposal currently drops the distinction between void* and const void*--an obvious loss of API information.

We could continue to import void* as UnsafeMutablePointer<Void> and const void* as UnsafePointer<Void>, which will continue to serve as an "opaque" untyped pointer. Converting to UnsafeBytePointer would be necessary to perform pointer arithmetic or to conservatively handle possible type punning.

This alternative is much less disruptive, but we are left with two forms of untyped pointer, one of which (UnsafePointer) the type system somewhat conflates with typed pointers.

Given the current restrictions of the language, it's not clear how to statically enforce the necessary rules for casting UnsafePointer<Void> once general UnsafePointer<T> conversions are disallowed. The following conversions should be inferred, and implied for function arguments (ignoring mutability):

UnsafePointer<T> to UnsafePointer<Void>

UnsafePointer<Void> to UnsafeBytePointer

I did not implement this simpler design because my primary goal was to enforce legal pointer conversion and rid Swift code of undefined behavior. I can't do that while allowing UnsafePointer conversions.

API improvements

As proposed, the initialize API infers the stored value:

func initialize<T>(with newValue: T, count: Int = 1) -> UnsafeBytePointer

This is somewhat dangerous because the developer may not realize the size of the object(s) that will be written to memory. This can be easily asserted by checking the returned pointer:

let newptr = ptr.initialize(with: 3)
assert(newptr - ptr == 8)

As an alternative, we could force the user to provide the expected type name in the initialize invocation:

func initialize<T>(_ : T.Type, with newValue: T, count: Int = 1)
  -> UnsafeBytePointer
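
To illustrate why naming the type matters, here is a minimal sketch that measures how far one initialization advances for the same literal under two different types. The initialize signatures above are only proposed; the sketch assumes the standard library's UnsafeMutableRawPointer.initializeMemory(as:repeating:count:) as a stand-in, and the function name bytesAdvanced is hypothetical.

```swift
// Sketch only: initializeMemory(as:repeating:count:) stands in for the
// proposed typed `initialize`. bytesAdvanced is a hypothetical name.
func bytesAdvanced<T>(storing value: T, as type: T.Type) -> Int {
  let raw = UnsafeMutableRawPointer.allocate(
    byteCount: MemoryLayout<T>.stride,
    alignment: MemoryLayout<T>.alignment)
  defer { raw.deallocate() }

  // The type is named at the call site, so the footprint is explicit.
  let typed = raw.initializeMemory(as: T.self, repeating: value, count: 1)
  defer { typed.deinitialize(count: 1) }

  // Distance in bytes from the start to just past the stored element
  // (one stride of T).
  let end = UnsafeMutableRawPointer(typed + 1)
  return raw.distance(to: end)
}

// The same literal occupies a different number of bytes depending on
// which type the caller names.
print(bytesAdvanced(storing: 3, as: Int.self) == MemoryLayout<Int>.size)
print(bytesAdvanced(storing: 3, as: UInt8.self) == 1)
```

With an inferred type, the literal 3 silently becomes an Int; with the type spelled out, the reader can see the footprint at the call site.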

Future improvements

UnsafeBytePointer should eventually support unaligned memory access. I believe that we will eventually have a modifier that allows "packed" struct members. At that time we may also want to add a "packed" flag to UnsafeBytePointer's load and initialize methods.
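
As a rough picture of what unaligned access could look like, here is a minimal sketch that reads a 32-bit value starting at an odd byte offset. It does not use any "packed" flag, which is only an idea here; it assumes the loadUnaligned(fromByteOffset:as:) method available on raw buffer pointers in Swift 5.7 and later. The function name readUnalignedUInt32 is hypothetical.

```swift
// Sketch only: loadUnaligned (Swift 5.7+) is assumed as a stand-in for
// a future "packed" load. readUnalignedUInt32 is a hypothetical name.
func readUnalignedUInt32(from bytes: [UInt8], atByteOffset offset: Int) -> UInt32 {
  precondition(offset >= 0 && offset + MemoryLayout<UInt32>.size <= bytes.count)
  return bytes.withUnsafeBytes { raw in
    // loadUnaligned copies the bytes out, so the offset does not need
    // to be a multiple of the type's alignment. The bytes are treated
    // as little-endian and converted to host order.
    return UInt32(littleEndian: raw.loadUnaligned(fromByteOffset: offset, as: UInt32.self))
  }
}

// One padding byte followed by 0x12345678 in little-endian order.
let packed: [UInt8] = [0xFF, 0x78, 0x56, 0x34, 0x12]
print(readUnalignedUInt32(from: packed, atByteOffset: 1) == 0x1234_5678)
```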

When accessing a memory buffer, it is generally convenient to cast to a type with known layout and compute offsets relative to the type's size. This is how UnsafePointer<Pointee> works. A generic UnsafeTypePunnedPointer<Pointee> could be introduced with the same interface as UnsafePointer<Pointee>, but without the strict aliasing requirements. This seems like an overdesign simply to avoid calling sizeof() in a rare use case, but nothing prevents adding this type later.

UnsafeBytePointer example

/// An example of using UnsafeBytePointer to implement manual memory layout.

/// A Buffer for reading and writing basic types at a fixed address.
/// Indirection allows the buffer to refer to mutable state elsewhere.
struct MessageBuffer {
  let ptr: UnsafeBytePointer

  enum IndirectFlag { case Direct, Indirect }

  private func getPointer(atOffset n: Int, _ isIndirect: IndirectFlag)
  -> UnsafeBytePointer {
    switch isIndirect {
    case .Indirect:
      return (ptr + n).load(UnsafeBytePointer.self)
    case .Direct:
      return ptr + n
    }
  }

  func readUInt32(atOffset n: Int, _ isIndirect: IndirectFlag) -> UInt32 {
    return getPointer(atOffset: n, isIndirect).load(UInt32.self)
  }
  func readFloat32(atOffset n: Int, _ isIndirect: IndirectFlag) -> Float32 {
    return getPointer(atOffset: n, isIndirect).load(Float32.self)
  }

  func writeUInt32(_ val: UInt32, atOffset n: Int) {
    getPointer(atOffset: n, .Direct).initialize(with: val)
  }
  func writeFloat32(_ val: Float32, atOffset n: Int) {
    getPointer(atOffset: n, .Direct).initialize(with: val)
  }
  func writeIndirect(_ ptr: UnsafeBytePointer, atOffset n: Int) {
    getPointer(atOffset: n, .Direct).initialize(with: ptr)
  }
}

/// Encoded message format.
struct MessageFormat : Sequence, IteratorProtocol {
  typealias Element = MessageFormat

  private static let maxFormatFields = 32 / 4
  static let maxBufferBytes = maxFormatFields * sizeof(UInt)

  var formatCode: UInt32 = 0
  var elementCode: UInt32 = 0
  var offset: Int = 0

  init(bitPattern: UInt32) {
    formatCode = bitPattern
  }

  enum Kind {
    case None, Reserved, UInt32, Float32, IndirectUInt32, IndirectFloat32
  }

  /// The first field's kind.
  var kind : Kind {
    get {
      switch elementCode {
      case 0x0: return Kind.None
      case 0x2: return Kind.UInt32
      case 0x3: return Kind.Float32
      case 0x6: return Kind.IndirectUInt32
      case 0x7: return Kind.IndirectFloat32
      default: return Kind.Reserved
      }
    }
  }

  func elementSize() -> Int {
    return (elementCode & 0x4) != 0 ? sizeof(UInt) : 4
  }

  /// Get the format for the next element.
  mutating func next() -> Element? {
    if elementCode != 0 {
      offset += elementSize()
    }
    elementCode = formatCode & 0xF
    formatCode >>= 4
    if kind == .None {
      return nil
    }
    // align to the next element size
    let offsetMask = elementSize() - 1
    offset = (offset + offsetMask) & ~offsetMask
    return self
  }
}

func createBuffer() -> MessageBuffer {
  return MessageBuffer(ptr: UnsafeBytePointer(
      allocatingBytes: MessageFormat.maxBufferBytes, alignedTo: sizeof(UInt)))
}

func destroy(buffer: MessageBuffer) {
  buffer.ptr.deallocateBytes(MessageFormat.maxBufferBytes,
    alignedTo: sizeof(UInt))
}

var sharedInt: UInt32 = 42
var sharedFloat: Float32 = 16.25

func generateMessage(inBuffer mb: MessageBuffer) -> MessageFormat {
  let mf = MessageFormat(bitPattern: 0x06727632)
  for field in mf {
    switch field.kind {
    case .UInt32:
      mb.writeUInt32(66, atOffset: field.offset)
    case .Float32:
      mb.writeFloat32(41.625, atOffset: field.offset)
    case .IndirectUInt32:
      mb.writeIndirect(&sharedInt, atOffset: field.offset)
    case .IndirectFloat32:
      mb.writeIndirect(&sharedFloat, atOffset: field.offset)
    case .None:
      fallthrough
    case .Reserved:
      return MessageFormat(bitPattern: 0)
    }
  }
  return mf
}

func handleMessage(buffer mb: MessageBuffer, format: MessageFormat) -> Bool {
  for field in format {
    switch field.kind {
    case .UInt32:
      print(mb.readUInt32(atOffset: field.offset, .Direct))
    case .Float32:
      print(mb.readFloat32(atOffset: field.offset, .Direct))
    case .IndirectUInt32:
      print(mb.readUInt32(atOffset: field.offset, .Indirect))
    case .IndirectFloat32:
      print(mb.readFloat32(atOffset: field.offset, .Indirect))
    case .None:
      fallthrough
    case .Reserved:
      return false
    }
  }
  return true
}

func runProgram() {
  let mb = createBuffer()
  let mf = generateMessage(inBuffer: mb)
  if handleMessage(buffer: mb, format: mf) {
    print("Done")
  }
  destroy(buffer: mb)
}
runProgram()
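
To make the layout in the example easier to follow, here is a minimal sketch that traces only the offset arithmetic of MessageFormat.next() for the format code 0x06727632. It is a simplified restatement, not part of the proposed API: the function name fieldOffsets and the pointerSize parameter are hypothetical, and reserved element codes are not treated specially.

```swift
// Sketch only: mirrors the offset arithmetic of MessageFormat.next().
// fieldOffsets and pointerSize are hypothetical names.
func fieldOffsets(formatCode: UInt32, pointerSize: Int) -> [Int] {
  var code = formatCode
  var offset = 0
  var previousSize = 0
  var offsets: [Int] = []
  while true {
    // Each field is described by one 4-bit element code; 0 ends the list.
    let element = code & 0xF
    code >>= 4
    if element == 0 { break }

    // Step past the previous field.
    offset += previousSize

    // Bit 0x4 marks an indirect field, which stores a pointer.
    let size = (element & 0x4) != 0 ? pointerSize : 4

    // Round the offset up to a multiple of the (power-of-two) size.
    let mask = size - 1
    offset = (offset + mask) & ~mask

    offsets.append(offset)
    previousSize = size
  }
  return offsets
}

print(fieldOffsets(formatCode: 0x06727632, pointerSize: 8))
// prints [0, 4, 8, 16, 24, 32, 40]
```

With 8-byte pointers the last field ends at byte 48, which fits within the 64 bytes that maxBufferBytes reserves on such a platform.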

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

On May 12, 2016, at 9:27 AM, Jordan Rose via swift-dev <swift-dev@swift.org> wrote:

- I’m uncomfortable with using the term “undefined behavior” as if it’s universally understood. Up until now we haven't formally had that notion in Swift, just “type safety” and “memory safety” and “invariant-preserving” and the like. Maybe we need it now, but I think it needs to be explicitly defined. (I’d actually talk to Dave about exactly what terms make the most sense for users.)

We do have undefined behavior, and use that term in the standard library docs where appropriate:

stdlib/public/core/Optional.swift- /// `!` (forced unwrap) operator. However, in optimized builds (`-O`), no
stdlib/public/core/Optional.swift- /// check is performed to ensure that the current instance actually has a
stdlib/public/core/Optional.swift- /// value. Accessing this property in the case of a `nil` value is a serious
stdlib/public/core/Optional.swift: /// programming error and could lead to undefined behavior or a runtime
stdlib/public/core/Optional.swift- /// error.
stdlib/public/core/Optional.swift- ///
stdlib/public/core/Optional.swift- /// In debug builds (`-Onone`), the `unsafelyUnwrapped` property has the same
--
stdlib/public/core/StringBridge.swift- /// The caller of this function guarantees that the closure 'body' does not
stdlib/public/core/StringBridge.swift- /// escape the object referenced by the opaque pointer passed to it or
stdlib/public/core/StringBridge.swift- /// anything transitively reachable form this object. Doing so
stdlib/public/core/StringBridge.swift: /// will result in undefined behavior.
stdlib/public/core/StringBridge.swift- @_semantics("self_no_escaping_closure")
stdlib/public/core/StringBridge.swift- func _unsafeWithNotEscapedSelfPointer<Result>(
stdlib/public/core/StringBridge.swift- _ body: @noescape (OpaquePointer) throws -> Result
--
stdlib/public/core/Unmanaged.swift- /// reference's lifetime fixed for the duration of the
stdlib/public/core/Unmanaged.swift- /// '_withUnsafeGuaranteedRef' call.
stdlib/public/core/Unmanaged.swift- ///
stdlib/public/core/Unmanaged.swift: /// Violation of this will incur undefined behavior.
stdlib/public/core/Unmanaged.swift- ///
stdlib/public/core/Unmanaged.swift- /// A lifetime of a reference 'the instance' is fixed over a point in the
stdlib/public/core/Unmanaged.swift- /// programm if:

-Joe

On the model itself:

Responding to your feedback on the model (thanks!).

With just one follow up question at the bottom of this email…

- "thin function, C function, and block function types” <-- block functions are not layout-compatible with C functions, and they are layout-compatible with AnyObject. (I mean, they’re both pointers at the moment, but so are non-weak object references.)

   - pointer types (e.g. ``OpaquePointer``, ``UnsafePointer``)
- - thin function, C function, and block function types
+ - block function types and ``AnyObject``
+ - thin function and C function types
   - imported C types that have the same layout in C

- "nonresilient structs” <-- nitpick: the term “nonresilient” is not defined here, and isn’t a formal term in the Library Evolution doc. I guess I would actually prefer “fragile” if you needed a generic term across structs and enums, but either way you should put a small definition somewhere in this doc.

   - imported C types that have the same layout in C
- - nonresilient structs with one stored property and their stored
+ - fragile structs with one stored property and their stored
     property type
- - nonresilient enums with one case and their payload type
+ - fragile enums with one case and their payload type

.. note::

- `Library Evolution Support in Swift`__
+ "Fragile" enums and structs have strict layout rules that ensure
+ binary compatibility. `Library Evolution Support in Swift`__
    explains the impact of resilience on object layout.

- "homogeneous tuples, fixed-sized array storage, and homogeneous nonresilient structs in which the element type has no spare bits (structs may be bit packed).” <-- I would leave the structs out of this, even if it’s true. Also, Swift doesn’t have fixed-size arrays at the moment, right?

Hmm, I think we want to say that raw allocated memory, arrays, homogeneous tuples and structs are layout compatible with 'strideof'. I'll leave out structs for now and this can be hashed out in ABI specs. I want to avoid naming specific APIs and I think it's ok to be a bit vague in this (non-ABI) document as long as the intent is obvious:

  - contiguous array storage and homogeneous tuples which
    have the same number and type of elements.
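
As a small illustration of the stride-based layout being discussed, here is a minimal sketch that reads the elements of a homogeneous tuple at multiples of the element stride. It assumes the layout compatibility described above rather than proving it, and the function name elements(of:) is hypothetical.

```swift
// Sketch only: relies on the homogeneous-tuple layout described above.
// elements(of:) is a hypothetical name.
func elements(of triple: (Int32, Int32, Int32)) -> [Int32] {
  return withUnsafeBytes(of: triple) { (raw: UnsafeRawBufferPointer) -> [Int32] in
    // Element i is assumed to live at byte offset i * stride, exactly
    // as it would in contiguous array storage.
    return (0..<3).map { index in
      raw.load(fromByteOffset: index * MemoryLayout<Int32>.stride, as: Int32.self)
    }
  }
}

print(elements(of: (10, 20, 30)))  // prints [10, 20, 30]
```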

- "In particular, they apply to access that originates from stored property getter and setters, reading from and assigning into inout variables, and reading or assigning subscripts (including the Unsafe[Mutable]Pointer pointee property and subscripts).” I’m unhappy with inout variables being called out specially here. An inout variable should be exactly like a local variable that happens to be stack-allocated, rather than just in registers. Closure captures probably figure in here too.

Agreed. I'm not sure what I was thinking.

- "unsafeBitCast is valid for pointer to integer conversions” <-- we have better APIs to do this now ('init(bitPattern:)’ in both directions).

+``unsafeBitCast`` should generally be avoided on pointer types,
+particularly class types. For pointer to integer conversions,
+``bitPattern`` initializers are available in both
+directions. ``unsafeBitCast`` may be used to convert between
+nondereferenceable pointer types, but as with any conversion to and
+from opaque pointers, this presents an opportunity for type punning
+when converting back to a dereferenceable pointer type.
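
For reference, a minimal sketch of the bitPattern round trip mentioned in the revised text, using the standard library's UnsafeRawPointer. The function name roundTrip is hypothetical.

```swift
// Sketch only: roundTrip is a hypothetical name.
func roundTrip(_ pointer: UnsafeRawPointer) -> UnsafeRawPointer? {
  // Pointer to integer: no unsafeBitCast needed.
  let address = Int(bitPattern: pointer)
  // Integer to pointer: failable, because address 0 maps to nil.
  return UnsafeRawPointer(bitPattern: address)
}

var value: Int = 7
let same = withUnsafePointer(to: &value) { pointer -> Bool in
  let raw = UnsafeRawPointer(pointer)
  return roundTrip(raw) == raw
}
print(same)
print(UnsafeRawPointer(bitPattern: 0) == nil)
```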

- "It is also used internally to convert between nondereferenceable pointer types, which avoids the need to add builtin conversions for all combinations of pointer types.” <-- I’d be happy to get rid of this and just go through Builtin.RawPointer when necessary.

...I do like to get feedback that eliminating unsafeBitCast is a good thing. I think it should only be needed for genuine reinterpretation of the bits as opposed to working around the type system. I'd like to see only a tiny handful of occurrences in stdlib. I have a branch where I've cleaned up many unsafeBitCasts, which never got checked in, so I can spend some time on that again after UnsafePointer changes land. Then maybe we should prohibit it from being called on certain pointer types. For starters AnyObject, UnsafePointer, and UnsafeBytePointer.

- On the flip side, I think we do need to preserve the ability to reference-cast in order to send Objective-C messages, at least for now. I don’t know how I want to expose that to users, though. (In general it’s probably worth seeing how unsafeBitCast is used in the wild and what we’d recommend instead.)

Does ``X as Y`` fail for some reason? We have unchecked versions of ``X as Y`` for performance reasons: ``unsafeDowncast`` and ``_unsafeReferenceCast``.

-Andy

···

On May 12, 2016, at 9:27 AM, Jordan Rose <jordan_rose@apple.com> wrote:

Thoughts on the diff:

https://github.com/atrick/swift/tree/unsafeptr_convert

- What was the thought behind putting UnsafeBytePointer in PointerTypeKind? OpaquePointer isn’t there, and I’m concerned about places that test if something’s a pointer by checking that the pointee type is non-null (by far the common pattern).

In general I wanted UnsafeBytePointer to stand-in for UnsafePointer<Void> throughout the type system and handle most of the same implicit conversions. Specifically, I wanted getAnyPointerElementType to do the same thing as UnsafePointer<Void> and return an empty tuple pointee type so that the calling code could be reused. Also, I thought that supporting PointerToPointerExpr was necessary.

The only extra burden of doing this that I could find was that getPointerPointeePropertyDecl may return null. The only code that calls this is emitStoreToForeignErrorSlot.

I'm very open to alternate implementations, especially once the proposal is accepted.

- The PrintAsObjC test can’t possibly pass as is—it’s checking that one pointer is const and the other isn’t. I’m guessing there’s actually more work to do here.

That's right. PrintAsObjC is one of several tests that are still failing. Before fixing them I want to:

- get reassurance that we really want to replace the imported type of 'void*'. Then I'll introduce an UnsafeMutableBytePointer.

- determine precisely which implicit conversions we want to allow and update test cases accordingly.

These are the tests that were failing on my branch (last time I successfully rebased):

POSIX.swift - requires String -> UnsafeBytePointer arg conversion
UnsafeBufferPointer.swift - UnsafePointer conversion rules?
UnsafePointer.swift - UnsafePointer conversion rules?
ClangModules/* - UnsafePointer conversion rules?
IRGen/objc_pointer - name mangling
SDK/c_pointers - [Double] to UnsafeBytePointer
SDK/objc_inner_pointer - [UInt8] to UnsafeBytePointer
Parse/pointer_conversion - 'UnsafePointer<Int>' to 'UnsafeBytePointer'
PrintAsObjC/classes.swift - void* export
SILGen/objc_currying - name mangling
SILGen/pointer_conversion - String to UnsafeBytePointer
SourceKit/DocSupport - formatting
sil-opt/emit-sib. - mangling
OpenCLSDKOverlay - [Float] to UnsafeBytePointer

-Andy

···

On May 12, 2016, at 9:27 AM, Jordan Rose <jordan_rose@apple.com> wrote:

Those latter two are in stdlib-internal declarations. I think I have the same objection with using the term for 'unsafelyUnwrapped'.

Jordan

···

On May 12, 2016, at 10:44, Joe Groff <jgroff@apple.com> wrote:

On May 12, 2016, at 9:27 AM, Jordan Rose via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

- I’m uncomfortable with using the term “undefined behavior” as if it’s universally understood. Up until now we haven't formally had that notion in Swift, just “type safety” and “memory safety” and “invariant-preserving” and the like. Maybe we need it now, but I think it needs to be explicitly defined. (I’d actually talk to Dave about exactly what terms make the most sense for users.)

We do have undefined behavior, and use that term in the standard library docs where appropriate:

stdlib/public/core/Optional.swift- /// `!` (forced unwrap) operator. However, in optimized builds (`-O`), no
stdlib/public/core/Optional.swift- /// check is performed to ensure that the current instance actually has a
stdlib/public/core/Optional.swift- /// value. Accessing this property in the case of a `nil` value is a serious
stdlib/public/core/Optional.swift: /// programming error and could lead to undefined behavior or a runtime
stdlib/public/core/Optional.swift- /// error.
stdlib/public/core/Optional.swift- ///
stdlib/public/core/Optional.swift- /// In debug builds (`-Onone`), the `unsafelyUnwrapped` property has the same
--
stdlib/public/core/StringBridge.swift- /// The caller of this function guarantees that the closure 'body' does not
stdlib/public/core/StringBridge.swift- /// escape the object referenced by the opaque pointer passed to it or
stdlib/public/core/StringBridge.swift- /// anything transitively reachable form this object. Doing so
stdlib/public/core/StringBridge.swift: /// will result in undefined behavior.
stdlib/public/core/StringBridge.swift- @_semantics("self_no_escaping_closure")
stdlib/public/core/StringBridge.swift- func _unsafeWithNotEscapedSelfPointer<Result>(
stdlib/public/core/StringBridge.swift- _ body: @noescape (OpaquePointer) throws -> Result
--
stdlib/public/core/Unmanaged.swift- /// reference's lifetime fixed for the duration of the
stdlib/public/core/Unmanaged.swift- /// '_withUnsafeGuaranteedRef' call.
stdlib/public/core/Unmanaged.swift- ///
stdlib/public/core/Unmanaged.swift: /// Violation of this will incur undefined behavior.
stdlib/public/core/Unmanaged.swift- ///
stdlib/public/core/Unmanaged.swift- /// A lifetime of a reference 'the instance' is fixed over a point in the
stdlib/public/core/Unmanaged.swift- /// programm if:

The particular case I’m thinking about is where we reinterpret an AnyObject as an @objc protocol because we don’t know the dynamic type we would need to add a method to. This doesn’t happen much, and maybe we just say sending arbitrary messages requires an ObjC-side workaround. Or we commit to ‘_unsafeReferenceCast’ and de-underscore it.

(_unsafeReferenceCast is probably a better choice for StdlibUnittest, which is where I used this.)

Jordan

···

On May 12, 2016, at 18:34, Andrew Trick <atrick@apple.com> wrote:

- On the flip side, I think we do need to preserve the ability to reference-cast in order to send Objective-C messages, at least for now. I don’t know how I want to expose that to users, though. (In general it’s probably worth seeing how unsafeBitCast is used in the wild and what we’d recommend instead.)

Does ``X as Y`` fail for some reason? We have unchecked versions of ``X as Y`` for performance reasons: ``unsafeDowncast`` and ``_unsafeReferenceCast``.
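For readers less familiar with these casts, here is a minimal sketch of the difference between a checked cast and `unsafeDowncast`, written against current Swift rather than the 2016 toolchain. The `Animal` and `Dog` classes are hypothetical and exist only for illustration. `_unsafeReferenceCast` is left out because it is an underscored entry point, not supported API.

```swift
// Hypothetical class hierarchy, for illustration only.
class Animal {
    func name() -> String { return "animal" }
}

final class Dog: Animal {
    override func name() -> String { return "dog" }
}

let pet: Animal = Dog()

// Checked cast: verifies the dynamic type at run time and yields nil on a mismatch.
let checked = pet as? Dog

// Unchecked cast: the caller promises that `pet` really is a Dog.
// Debug builds still check the promise; optimized builds skip the check,
// so a wrong promise there is undefined behavior.
let unchecked = unsafeDowncast(pet, to: Dog.self)

print(checked?.name() ?? "not a dog")
print(unchecked.name())
```

Both casts return the same object here, since the dynamic type really is `Dog`. The unchecked form only saves the run-time type check.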

I’m sorry, I got ASTContext::getPointerPointeePropertyDecl and TypeBase::getAnyPointerElementType mixed up. I’m still a little unsure that this is the right way to go, and I wonder how many uses of getAnyPointerElementType actually make sense for UnsafeBytePointer, but I see now that it’s the most incremental way to make this change.

For fun I looked through the (relatively small) set of uses for getAnyPointerElementType, and it looks like only the inout-to-pointer ones are relevant for UnsafeBytePointer. (These are the ones in lib/Sema, plus SILGen’s RValueEmitter::visitInOutToPointerExpr.) So it could be dropped as a pointer type. But it doesn’t seem to be doing any harm.

Jordan

···

On May 12, 2016, at 18:56, Andrew Trick <atrick@apple.com> wrote:

- What was the thought behind putting UnsafeBytePointer in PointerTypeKind? OpaquePointer isn’t there, and I’m concerned about places that test if something’s a pointer by checking that the pointee type is non-null (by far the common pattern).

In general I wanted UnsafeBytePointer to stand in for UnsafePointer<Void> throughout the type system and handle most of the same implicit conversions. Specifically, I wanted getAnyPointerElementType to do the same thing as it does for UnsafePointer<Void> and return an empty tuple pointee type so that the calling code could be reused. Also, I thought that supporting PointerToPointerExpr was necessary.

The only extra burden of doing this that I could find was that getPointerPointeePropertyDecl may return null. The only code that calls this is emitStoreToForeignErrorSlot.

I'm very open to alternate implementations, especially once the proposal is accepted.

Well, we can say "A program has undefined behavior if it does X or Y", or we can say "A program which does X or Y lacks type safety". In all cases we are referring to a concept defined elsewhere. If we say "undefined behavior", we are using an easily-googled term whose popular discussions will quickly inform the reader of the consequences of the violation. If we say "type safety", we are using a term that's popularly used in very vague, hand-wavy ways and whose consequences aren't usually discussed outside of formal contexts. If we say "memory safety", we're using a term that doesn't even have that precedent. So we can use the latter two terms if we want, but that just means we need to have a standard place where we define them and describe the consequences of violating them, probably with at least a footnote saying "this is analogous to the undefined behavior rules of C and C++".

John.

···

On May 12, 2016, at 10:45 AM, Jordan Rose via swift-dev <swift-dev@swift.org> wrote:

On May 12, 2016, at 10:44, Joe Groff <jgroff@apple.com> wrote:

On May 12, 2016, at 9:27 AM, Jordan Rose via swift-dev <swift-dev@swift.org> wrote:

- I’m uncomfortable with using the term “undefined behavior” as if it’s universally understood. Up until now we haven't formally had that notion in Swift, just “type safety” and “memory safety” and “invariant-preserving” and the like. Maybe we need it now, but I think it needs to be explicitly defined. (I’d actually talk to Dave about exactly what terms make the most sense for users.)

We do have undefined behavior, and use that term in the standard library docs where appropriate:

stdlib/public/core/Optional.swift- /// `!` (forced unwrap) operator. However, in optimized builds (`-O`), no
stdlib/public/core/Optional.swift- /// check is performed to ensure that the current instance actually has a
stdlib/public/core/Optional.swift- /// value. Accessing this property in the case of a `nil` value is a serious
stdlib/public/core/Optional.swift: /// programming error and could lead to undefined behavior or a runtime
stdlib/public/core/Optional.swift- /// error.
stdlib/public/core/Optional.swift- ///
stdlib/public/core/Optional.swift- /// In debug builds (`-Onone`), the `unsafelyUnwrapped` property has the same
--
stdlib/public/core/StringBridge.swift- /// The caller of this function guarantees that the closure 'body' does not
stdlib/public/core/StringBridge.swift- /// escape the object referenced by the opaque pointer passed to it or
stdlib/public/core/StringBridge.swift- /// anything transitively reachable form this object. Doing so
stdlib/public/core/StringBridge.swift: /// will result in undefined behavior.
stdlib/public/core/StringBridge.swift- @_semantics("self_no_escaping_closure")
stdlib/public/core/StringBridge.swift- func _unsafeWithNotEscapedSelfPointer<Result>(
stdlib/public/core/StringBridge.swift- _ body: @noescape (OpaquePointer) throws -> Result
--
stdlib/public/core/Unmanaged.swift- /// reference's lifetime fixed for the duration of the
stdlib/public/core/Unmanaged.swift- /// '_withUnsafeGuaranteedRef' call.
stdlib/public/core/Unmanaged.swift- ///
stdlib/public/core/Unmanaged.swift: /// Violation of this will incur undefined behavior.
stdlib/public/core/Unmanaged.swift- ///
stdlib/public/core/Unmanaged.swift- /// A lifetime of a reference 'the instance' is fixed over a point in the
stdlib/public/core/Unmanaged.swift- /// programm if:

Those latter two are in stdlib-internal declarations. I think I have the same objection to using the term for 'unsafelyUnwrapped'.
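As a concrete reference for the `unsafelyUnwrapped` documentation quoted above, here is a small sketch in current Swift. The variable names are made up for illustration. The property is only read after an explicit nil check, which is the condition the doc comment asks the caller to guarantee.

```swift
// Hypothetical value, for illustration only.
let maybeCount: Int? = 42

var total = 0
if maybeCount != nil {
    // Safe use: the nil check above establishes the precondition.
    // In -Onone builds this behaves like `!` and would trap on nil;
    // in -O builds no check is performed, so reading it on nil
    // would be undefined behavior.
    total += maybeCount.unsafelyUnwrapped
}

// The checked alternative always traps on nil, in any build mode.
let forced = maybeCount!

print(total, forced)
```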

One thing Andy and I discussed is whether we could support aliasing addressors. The answer is yes, although it will be important to force a copy in cases where a non-aliasing address is required, i.e. when passing the l-value as an inout argument or when implementing materializeForSet. An aliasing addressor would naturally return an UnsafeBytePointer value (or better yet a typed version of it).

John.

···

On May 12, 2016, at 7:47 PM, Jordan Rose via swift-dev <swift-dev@swift.org> wrote:

I’m sorry, I got ASTContext::getPointerPointeePropertyDecl and TypeBase::getAnyPointerElementType mixed up. I’m still a little unsure that this is the right way to go, and I wonder how many uses of getAnyPointerElementType actually make sense for UnsafeBytePointer, but I see now that it’s the most incremental way to make this change.

For fun I looked through the (relatively small) set of uses for getAnyPointerElementType, and it looks like only the inout-to-pointer ones are relevant for UnsafeBytePointer. (These are the ones in lib/Sema, plus SILGen’s RValueEmitter::visitInOutToPointerExpr.) So it could be dropped as a pointer type. But it doesn’t seem to be doing any harm.

IMHO “lacks type safety” implies you might get nonsense values but that’s it. I assume that’s the average lay-programmer’s interpretation of the concept too.

I would wager that the vast majority of developers don’t understand undefined behavior or don’t fully understand its consequences. The whole “the program is free to format your hard drive or print SAUSAGE, beep, and exit” discussion almost never fails to surprise anyone I’ve had it with. Doesn’t the C standard still say that the behavior of the entire program is undefined, not just from the point of executing undefined behavior onward? I bet if you took a poll you’d be lucky to get 5% of respondents who knew that.

IMHO the standard library docs around Unmanaged, UnsafePointer, and OpaquePointer would benefit greatly from much more in-depth explanations of the concepts and of the consequences of failing to adhere to the rules. In theory developers should read the docs, check Google, etc. In reality a huge number of them will at most bother to check the headers or run with the QuickHelp pane open, and that’s the extent of the exposure they’ll have. One could say they shouldn’t be messing with such unsafe features if that’s the case, but that won’t stop someone from writing the next Heartbleed by abusing UnsafeMutablePointer (or, perhaps less dramatically, just creating exploitable security holes in a framework used by a bunch of apps).

Again just my opinion but I’d love to see Swift set a standard of over-explaining anytime potentially unsafe operations are involved. I think it would promote a culture of careful consideration around the use of unsafe operations.

Russ

···

On May 12, 2016, at 11:21 AM, John McCall via swift-dev <swift-dev@swift.org> wrote:

Well, we can say "A program has undefined behavior if it does X or Y", or we can say "A program which does X or Y lacks type safety". In all cases we are referring to a concept defined elsewhere. If we say "undefined behavior", we are using an easily-googled term whose popular discussions will quickly inform the reader of the consequences of the violation. If we say "type safety", we are using a term that's popularly used in very vague, hand-wavy ways and whose consequences aren't usually discussed outside of formal contexts. If we say "memory safety", we're using a term that doesn't even have that precedent. So we can use the latter two terms if we want, but that just means we need to have a standard place where we define them and describe the consequences of violating them, probably with at least a footnote saying "this is analogous to the undefined behavior rules of C and C++".

John.

In other places where the standard library intentionally has undefined behavior, it looks like we use the term "serious programming error", for instance in the doc comment for `assert`:

/// * In -Ounchecked builds, `condition` is not evaluated, but the
/// optimizer may assume that it *would* evaluate to `true`. Failure
/// to satisfy that assumption in -Ounchecked builds is a serious
/// programming error.

which feels a bit colloquial to me, and doesn't provide much insight into the full consequences of UB. I think we're better off using an established term.
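To make the build-mode distinction in that doc comment concrete, here is a hedged sketch contrasting `assert` with `precondition` in current Swift. The `average` function and its messages are hypothetical, not from the standard library.

```swift
// Hypothetical helper, for illustration only.
func average(_ values: [Double]) -> Double {
    // Checked in -Onone and -O builds. In -Ounchecked builds it is not
    // checked, and the optimizer may assume the condition holds.
    precondition(!values.isEmpty, "average requires at least one value")

    // Evaluated only in -Onone builds. In -O builds it is skipped entirely;
    // in -Ounchecked builds it is skipped, but the optimizer may assume it
    // would have been true.
    assert(values.allSatisfy { $0.isFinite }, "values should be finite")

    return values.reduce(0, +) / Double(values.count)
}

let result = average([1.0, 2.0, 3.0])
print(result)
```

Calling `average([])` traps in debug and `-O` builds. Under `-Ounchecked` the same call is the "serious programming error" case, and nothing about the result can be relied on.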

-Joe

···

Agreed.

Do we have a good place to document common terms? Preferably one that isn't a book?

John.

···

On May 12, 2016, at 3:21 PM, Joe Groff <jgroff@apple.com> wrote:

In other places where the standard library intentionally has undefined behavior, it looks like we use the term "serious programming error", for instance in the doc comment for `assert`:

/// * In -Ounchecked builds, `condition` is not evaluated, but the
/// optimizer may assume that it *would* evaluate to `true`. Failure
/// to satisfy that assumption in -Ounchecked builds is a serious
/// programming error.

which feels a bit colloquial to me, and doesn't provide much insight into the full consequences of UB. I think we're better off using an established term.

Am I the only one who sees defining "undefined behavior" as a paradox?

I'm not disagreeing with better documentation, but there's no way to specify the behavior of compiled code once you feed the compiler an incorrect fact. Violating a simple constraint that two pointers cannot alias can easily lead to executing code paths that would not otherwise be executed, hence unknown side effects. We could make statements about the current implementation of the compiler, but that would only be misleading, as it's impossible to make any guarantee about future compilers once you've violated the contract. The implementation should make common cases less surprising, but limits on the possible side effects can't be specified. Once you intentionally step beyond the protection that the Swift language provides, you're firmly in C/C++ compiler territory. So for more on that, see one of the many discussions out there on the topic in general.

What we should try really hard to do is to make it clear what rules programmers need to follow to safely use "unsafe" constructs. Once you have those rules, you have a contract with future compilers and you can write code sanitizers.

I'm specifically focussing on UnsafePointer's Pointee type, because making that safer requires source breaking changes, and because the rules were so nonobvious. This is an API that programmers use when they are comfortable taking responsibility for the lifetime and bounds of an object. They are probably not expecting to take responsibility for type safety, and likely not even aware of strict aliasing rules.
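As an illustration of the kind of rule being asked for, here is a minimal sketch of reinterpreting memory through an untyped pointer instead of a typed `UnsafePointer<Pointee>`. It uses the raw-pointer API of current Swift (`withUnsafeBytes(of:_:)` and `load(as:)`), which as I understand it is the form in which this proposal eventually shipped (`UnsafeRawPointer`, SE-0107). It is not the `UnsafeBytePointer` spelling under discussion in this thread, and the variable names are hypothetical.

```swift
// Hypothetical value, for illustration only.
var number: Float = 1.0

// Read the Float's storage as a UInt32 through an untyped (raw) buffer.
// No UnsafePointer<UInt32> is ever formed over Float storage, so no
// typed-pointer aliasing assumption is violated.
let bits = withUnsafeBytes(of: &number) { rawBuffer in
    rawBuffer.load(as: UInt32.self)
}

// Float already offers a fully safe way to get the same bits.
let sameBits = number.bitPattern

print(String(bits, radix: 16))  // prints "3f800000"
print(bits == sameBits)         // prints "true"
```

The programmer still takes responsibility for lifetime and bounds, but no longer has to reason about which typed accesses may alias.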

-Andy

···

On May 12, 2016, at 4:03 PM, John McCall via swift-dev <swift-dev@swift.org> wrote:

Agreed.

Do we have a good place to document common terms? Preferably one that isn't a book?

John.

We cite Wikipedia in the stdlib doc comments, why can't we cite TSPL, and put the detailed discussion there?

-Chris

···

I don't feel comfortable editing TSPL. Should I?

John.

···

On May 20, 2016, at 7:56 AM, Chris Lattner <clattner@apple.com> wrote:
On May 12, 2016, at 4:03 PM, John McCall via swift-dev <swift-dev@swift.org> wrote:

On May 12, 2016, at 3:21 PM, Joe Groff <jgroff@apple.com> wrote:

On May 12, 2016, at 11:21 AM, John McCall <rjmccall@apple.com> wrote:

On May 12, 2016, at 10:45 AM, Jordan Rose via swift-dev <swift-dev@swift.org> wrote:
On May 12, 2016, at 10:44, Joe Groff <jgroff@apple.com> wrote:

On May 12, 2016, at 9:27 AM, Jordan Rose via swift-dev <swift-dev@swift.org> wrote:

- I’m uncomfortable with using the term “undefined behavior” as if it’s universally understood. Up until now we haven't formally had that notion in Swift, just “type safety” and “memory safety” and “invariant-preserving” and the like. Maybe we need it now, but I think it needs to be explicitly defined. (I’d actually talk to Dave about exactly what terms make the most sense for users.)

We do have undefined behavior, and use that term in the standard library docs where appropriate:

stdlib/public/core/Optional.swift- /// `!` (forced unwrap) operator. However, in optimized builds (`-O`), no
stdlib/public/core/Optional.swift- /// check is performed to ensure that the current instance actually has a
stdlib/public/core/Optional.swift- /// value. Accessing this property in the case of a `nil` value is a serious
stdlib/public/core/Optional.swift: /// programming error and could lead to undefined behavior or a runtime
stdlib/public/core/Optional.swift- /// error.
stdlib/public/core/Optional.swift- ///
stdlib/public/core/Optional.swift- /// In debug builds (`-Onone`), the `unsafelyUnwrapped` property has the same
--
stdlib/public/core/StringBridge.swift- /// The caller of this function guarantees that the closure 'body' does not
stdlib/public/core/StringBridge.swift- /// escape the object referenced by the opaque pointer passed to it or
stdlib/public/core/StringBridge.swift- /// anything transitively reachable form this object. Doing so
stdlib/public/core/StringBridge.swift: /// will result in undefined behavior.
stdlib/public/core/StringBridge.swift- @_semantics("self_no_escaping_closure")
stdlib/public/core/StringBridge.swift- func _unsafeWithNotEscapedSelfPointer<Result>(
stdlib/public/core/StringBridge.swift- _ body: @noescape (OpaquePointer) throws -> Result
--
stdlib/public/core/Unmanaged.swift- /// reference's lifetime fixed for the duration of the
stdlib/public/core/Unmanaged.swift- /// '_withUnsafeGuaranteedRef' call.
stdlib/public/core/Unmanaged.swift- ///
stdlib/public/core/Unmanaged.swift: /// Violation of this will incur undefined behavior.
stdlib/public/core/Unmanaged.swift- ///
stdlib/public/core/Unmanaged.swift- /// A lifetime of a reference 'the instance' is fixed over a point in the
stdlib/public/core/Unmanaged.swift- /// programm if:

Those latter two are in stdlib-internal declarations. I think I have the same objection to using the term for 'unsafelyUnwrapped'.

Well, we can say "A program has undefined behavior if it does X or Y", or we can say "A program which does X or Y lacks type safety". In all cases we are referring to a concept defined elsewhere. If we say "undefined behavior", we are using an easily-googled term whose popular discussions will quickly inform the reader of the consequences of the violation. If we say "type safety", we are using a term that's popularly used in very vague, hand-wavy ways and whose consequences aren't usually discussed outside of formal contexts. If we say "memory safety", we're using a term that doesn't even have that precedent. So we can use the latter two terms if we want, but that just means we need to have a standard place where we define them and describe the consequences of violating them, probably with at least a footnote saying "this is analogous to the undefined behavior rules of C and C++".

In other places where the standard library intentionally has undefined behavior, it looks like we use the term "serious programming error", for instance in the doc comment for `assert`:

/// * In -Ounchecked builds, `condition` is not evaluated, but the
/// optimizer may assume that it *would* evaluate to `true`. Failure
/// to satisfy that assumption in -Ounchecked builds is a serious
/// programming error.

which feels a bit colloquial to me, and doesn't provide much insight into the full consequences of UB. I think we're better off using an established term.

Agreed.

Do we have a good place to document common terms? Preferably one that isn't a book?

We cite Wikipedia in the stdlib doc comments; why can't we cite TSPL and put the detailed discussion there?