[RFC] UnsafeBytePointer API for In-Memory Layout


(Andrew Trick) #1

Hello Swift evolution,

I sent this to swift-dev last week. Sorry to post on two lists!

Swift does a great job of protecting against undefined behavior--as long as you avoid "unsafe" APIs, that is. However, unsafe APIs are important for giving developers control over implementation details and performance. Naturally, the contract between unsafe APIs and the optimizer is crucial. When a developer uses an unsafe API, the rules governing safe, well-defined behavior must be clear. On the opposite end, the optimizer must know which assumptions it can make based on those rules. Simply saying that anything goes because "unsafe" is in the name is not helpful to this effort.

For a long time, I've wanted these rules nailed down. We have more users taking advantage of advanced features, and more optimizations that take advantage of assumptions guided by the type system. This seems like a particularly good time to resolve UnsafePointer semantics, considering the type system and UnsafePointer work that's been going on recently. Strict aliasing is something I would like addressed. If we do nothing here, then we will end up by default inheriting C/C++ semantics, as with any language that relies on a C/C++ backend. In other words, developers will be forced to write code with technically undefined behavior and rely on the compiler to be smart enough to recognize and recover from common patterns. Or we can take advantage of this opportunity and instead adopt a sound memory model with respect to aliasing.

This proposal is only an RFC at this point. I'm sending it out now to allow for plenty of time for discussion (or advance warning). Keep in mind that it could change considerably before it goes up for review.

-Andy

UnsafeBytePointer API for In-Memory Layout

Proposal: SE-NNNN <https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md>
Author(s): Andrew Trick <https://github.com/atrick>
Status: Awaiting review <https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#rationale>
Review manager: TBD

Introduction

UnsafePointer and UnsafeMutablePointer refer to a typed region of memory, and the compiler must be able to assume that the UnsafePointer element (Pointee) type is consistent with other access to the same memory. See the proposed Type Safe Memory Access documentation <https://github.com/atrick/swift/blob/type-safe-mem-docs/docs/TypeSafeMemory.rst>. Consequently, inferred conversion between UnsafePointer element types exposes an easy way to abuse the type system. No alternative currently exists for manual memory layout and direct access to untyped memory, and that leads to an overuse of UnsafePointer. These uses of UnsafePointer, which depend on pointer type conversion, make accidental type punning likely. Type punning via UnsafePointer is semantically undefined behavior and de facto undefined behavior given the optimizer's long-time treatment of UnsafePointer.

In this document, all mentions of UnsafePointer also apply to UnsafeMutablePointer.

Motivation

To avoid accidental type punning, we should prohibit inferred conversion between UnsafePointer<T> and UnsafePointer<U> unless the target of the conversion is an untyped or nondereferenceable pointer (currently represented as UnsafePointer<Void>).

To support this change we should introduce a new pointer type that does not bind the type of its Pointee. Such a new pointer type would provide an ideal foundation for an API that allows byte-wise pointer arithmetic and a legal, well-defined means to access an untyped region of memory.

As motivation for such an API, consider that an UnsafePointer<Void> or OpaquePointer may currently be obtained from an external API. However, the developer may know the memory layout and may want to read or write elements whose types are compatible with that layout. This is a reasonable use case, but unless the developer can guarantee that all accesses to the same memory location have the same type, they cannot use UnsafePointer to access the memory without risking undefined behavior.

An UnsafeBytePointer example using the newly proposed API is included below.

Proposed solution

Introduce an UnsafeBytePointer type along with an API for obtaining an UnsafeBytePointer value at a relative byte offset and loading and storing arbitrary types at that location.

Statically prohibit inferred UnsafePointer conversion while allowing inferred UnsafePointer to UnsafeBytePointer conversion.

UnsafeBytePointer meets multiple requirements:

1. An untyped pointer to memory
2. Pointer arithmetic within byte-addressable memory
3. Type-unsafe access to memory (legal type punning)

UnsafeBytePointer will replace UnsafeMutablePointer<Void> as the representation for untyped memory. For API clarity we could consider a typealias for VoidPointer. I don't think a separate VoidPointer type would be useful--there's no danger that UnsafeBytePointer will be casually dereferenced, and I don't see the danger in allowing pointer arithmetic since the only reasonable interpretation is that of a byte-addressable memory.

Providing an API for type-unsafe memory access would not serve a purpose without the ability to compute byte offsets. Of course, we could require users to convert back and forth using bitPatterns, but I think that would be awkward and only obscure the purpose of the UnsafeBytePointer type.
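
To illustrate how byte offsets and type-unsafe access fit together, here is a minimal sketch. It is not the proposed API: UnsafeBytePointer is not available in a shipping toolchain, so the sketch assumes the standard library's UnsafeMutableRawPointer (Swift 3 and later) as a stand-in, and the helper name roundTrip is hypothetical. The storeBytes/load spellings belong to that stand-in type, not to this proposal.

```swift
// Stand-in sketch: UnsafeMutableRawPointer plays the role of the proposed
// UnsafeBytePointer. `roundTrip` is a hypothetical helper for illustration.
func roundTrip() -> (word: UInt32, real: Float, punned: UInt32) {
  // Eight bytes of untyped memory, 8-byte aligned.
  let raw = UnsafeMutableRawPointer.allocate(byteCount: 8, alignment: 8)
  defer { raw.deallocate() }

  // Store a UInt32 at byte offset 0 and a Float at byte offset 4.
  raw.storeBytes(of: 0xDEADBEEF, as: UInt32.self)
  raw.advanced(by: 4).storeBytes(of: 16.25, as: Float.self)

  // Load each value back with the type it was stored as.
  let word = raw.load(as: UInt32.self)
  let real = raw.load(fromByteOffset: 4, as: Float.self)

  // Type punning through the untyped pointer: read the Float's bytes as UInt32.
  let punned = raw.load(fromByteOffset: 4, as: UInt32.self)
  return (word, real, punned)
}

let result = roundTrip()
```

Computing the offset on the pointer itself, rather than round-tripping through a bitPattern integer, keeps the intent visible at each access.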

In this proposal, UnsafeBytePointer does not specify mutability. Adding an UnsafeMutableBytePointer would be straightforward, but adding another pointer type needs strong justification. I expect to get input from the community on this. If we agree that the imported type for const void* should be UnsafeBytePointer, then we probably need UnsafeMutableBytePointer to handle interoperability.

Detailed design

The public API is shown here. For details and comments, see the unsafeptr_convert branch <https://github.com/atrick/swift/commits/unsafeptr_convert>.

struct UnsafeBytePointer : Hashable, _Pointer {

  let _rawValue: Builtin.RawPointer

  var hashValue: Int {...}

  init<T>(_ : UnsafePointer<T>)
  init<T>(_ : UnsafeMutablePointer<T>)
  init?<T>(_ : UnsafePointer<T>?)
  init?<T>(_ : UnsafeMutablePointer<T>?)

  init<T>(_ : OpaquePointer<T>)
  init?<T>(_ : OpaquePointer<T>?)

  init?(bitPattern: Int)
  init?(bitPattern: UInt)

  func load<T>(_ : T.Type) -> T

  @warn_unused_result
  init(allocatingBytes size: Int, alignedTo: Int)

  @warn_unused_result
  init<T>(allocatingCapacity count: Int, of: T.Type)

  func deallocateBytes(_ size: Int, alignedTo: Int)

  func deallocateCapacity<T>(_ num: Int, of: T.Type)

  // Returns a pointer one byte after the initialized memory.
  func initialize<T>(with newValue: T, count: Int = 1) -> UnsafeBytePointer

  // Returns a pointer one byte after the initialized memory.
  func initialize<T>(from: UnsafePointer<T>, count: Int) -> UnsafeBytePointer

  func initializeBackward<T>(from source: UnsafePointer<T>, count: Int)

  func deinitialize<T>(_ : T.Type, count: Int = 1)
}

extension OpaquePointer {
  init(_ : UnsafeBytePointer)
}

extension Int {
  init(bitPattern: UnsafeBytePointer)
}

extension UInt {
  init(bitPattern: UnsafeBytePointer)
}

extension UnsafeBytePointer : RandomAccessIndex {
  typealias Distance = Int

  func successor() -> UnsafeBytePointer
  func predecessor() -> UnsafeBytePointer
  func distance(to : UnsafeBytePointer) -> Int
  func advanced(by : Int) -> UnsafeBytePointer
}

func == (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Bool

func < (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Bool

func + (lhs: UnsafeBytePointer, rhs: Int) -> UnsafeBytePointer

func + (lhs: Int, rhs: UnsafeBytePointer) -> UnsafeBytePointer

func - (lhs: UnsafeBytePointer, rhs: Int) -> UnsafeBytePointer

func - (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Int

func += (lhs: inout UnsafeBytePointer, rhs: Int)

func -= (lhs: inout UnsafeBytePointer, rhs: Int)

Occasionally, we need to convert from an UnsafeBytePointer to an UnsafePointer. This should only be done in very rare circumstances when the author understands the compiler's strict type rules for UnsafePointer. Although this could be done by casting through an OpaquePointer, an explicit, designated unsafe pointer cast API would make the risks more obvious and self-documenting. For example:

extension UnsafePointer {
  init(_ from: UnsafeBytePointer, toPointee: Pointee.Type)
}
extension UnsafeMutablePointer {
  init(_ from: UnsafeBytePointer, toPointee: Pointee.Type)
}

Similarly, conversion between UnsafePointer types must now be spelled with an explicit Pointee type:

extension UnsafePointer {
  init<U>(_ from: UnsafePointer<U>, toPointee: Pointee.Type)
  init<U>(_ from: UnsafeMutablePointer<U>, toPointee: Pointee.Type)
}
extension UnsafeMutablePointer {
  init<U>(_ from: UnsafeMutablePointer<U>, toPointee: Pointee.Type)
}
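
As a rough illustration of what an explicitly labeled cast looks like at a call site, the sketch below recovers a typed pointer from an untyped one. The toPointee initializers above are only proposed, so this assumes the shipping standard library's UnsafeRawPointer and its assumingMemoryBound(to:) method as the closest available analogue; sumInt32s is a hypothetical helper.

```swift
// Hypothetical helper: receives an untyped "context" pointer, as a C-style
// callback might, and must name the pointee type explicitly to read it.
func sumInt32s(context: UnsafeRawPointer, count: Int) -> Int32 {
  // Explicit, labeled cast: the caller promises this memory holds Int32 values.
  let typed = context.assumingMemoryBound(to: Int32.self)
  var total: Int32 = 0
  for i in 0..<count {
    total += typed[i]
  }
  return total
}

let values: [Int32] = [1, 2, 3, 4]
let total = values.withUnsafeBufferPointer { buffer in
  // Typed-to-untyped conversion is the direction that stays cheap to write.
  sumInt32s(context: UnsafeRawPointer(buffer.baseAddress!), count: buffer.count)
}
```

The cast is only well defined here because the memory really was written as Int32; the label documents that promise rather than checking it.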

Impact on existing code

The largest impact of this change is that void* and const void* are imported as UnsafeBytePointer. This impacts many public APIs, but with implicit argument conversion should not affect typical uses of those APIs.

Any Swift projects that rely on type inference to convert between UnsafePointer types will need to take action. The developer needs to determine whether type punning is necessary. If so, they must migrate to the UnsafeBytePointer API. Otherwise, they can work around the new restriction by using a toPointee or mutating label.

Disallowing inferred UnsafePointer direct conversion requires some standard library code to use an explicit toPointee label for unsafe conversions that may violate strict aliasing.

All occurrences of Unsafe[Mutable]Pointer<Void> in the standard library are converted to UnsafeBytePointer. For example, unsafeAddress() now returns UnsafeBytePointer, not UnsafePointer<Void>.

Some occurrences of Unsafe[Mutable]Pointer<Pointee> in the standard library are replaced with UnsafeBytePointer, either because the code was playing too loosely with strict aliasing rules, or because the code actually wanted to perform pointer arithmetic on byte-addresses.

StringCore.baseAddress changes from OpaquePointer to UnsafeBytePointer because it is computing byte offsets and accessing the memory. OpaquePointer is meant for bridging, but should be truly opaque; that is, nondereferenceable and not involved in address computation.

The StringCore implementation does a considerable amount of casting between different views of the String storage. The current implementation already demonstrates some awareness of strict aliasing rules. The rules are generally followed by ensuring that the StringBuffer only be accessed using the appropriate CodeUnit within Swift code. For interoperability and optimization, String buffers frequently need to be cast to and from CChar. This is valid as long as access to the buffer from Swift is guarded by dynamic checks of the encoding type. These unsafe but dynamically legal conversion points will now be labeled with toPointee.

CoreAudio utilities now use an UnsafeBytePointer.

Implementation status

On my unsafeptr_convert branch <https://github.com/atrick/swift/commits/unsafeptr_convert>, I've made most of the necessary changes to support the addition of UnsafeBytePointer and the removal of inferred UnsafePointer conversion.

There are several things going on here in order to make it possible to build the standard library with the changes:

A new UnsafeBytePointer type is defined.

The type system imports void* as UnsafeBytePointer.

The type system handles implicit conversions to UnsafeBytePointer.

UnsafeBytePointer replaces both UnsafePointer<Void> and UnsafeMutablePointer<Void>.

The standard library was relying on inferred UnsafePointer conversion in over 100 places. Most of these conversions now take an explicit label, such as 'toPointee' or 'mutating'. Some have been rewritten.

Several places in the standard library that were playing loosely with strict aliasing or doing bytewise pointer arithmetic now use UnsafeBytePointer instead.

Explicit labeled Unsafe[Mutable]Pointer initializers are added.

The inferred Unsafe[Mutable]Pointer conversion is removed.

TODO:

Once this proposal is accepted, and the rules for casting between pointer types have been decided, we need to finish implementing the type system support. The current implementation (intentionally) breaks a few tests in pointer_conversion.swift. We also need to ensure that interoperability requirements are met. Currently, many argument casts need to be explicitly labeled. The current implementation also makes it easy for users to hit an "ambiguous use of 'init'" error when relying on implicit argument conversion.

Additionally:

A name mangled abbreviation needs to be created for UnsafeBytePointer.

The StringAPI tests should probably be rewritten with UnsafeBytePointer.

The NSStringAPI utilities and tests may need to be ported to UnsafeBytePointer.

The CoreAudio utilities and tests may need to be ported to UnsafeBytePointer.

Alternatives considered

Existing workaround

In some cases, developers can safely reinterpret values to achieve the same effect as type punning:

let ptrI32 = UnsafeMutablePointer<Int32>(allocatingCapacity: 1)
ptrI32[0] = Int32()
let u = unsafeBitCast(ptrI32[0], to: UInt32.self)

Note that all access to the underlying memory is performed with the same element type. This is perfectly legitimate, but simply isn't a complete solution. It also does not eliminate the inherent danger in declaring a typed pointer and expecting it to point to values of a different type.
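
For readers on a current toolchain, the same workaround can be sketched with today's spellings. This assumes the allocate(capacity:), initialize(to:), deinitialize(count:) and deallocate() methods of the shipping UnsafeMutablePointer, which postdate the allocatingCapacity: initializer shown above; the stored value -1 is chosen only to make the reinterpretation visible.

```swift
// All memory access goes through Int32; only the loaded value is reinterpreted.
let ptrI32 = UnsafeMutablePointer<Int32>.allocate(capacity: 1)
ptrI32.initialize(to: -1)

// Reinterpret the loaded value's bits, not the memory it came from.
let u = unsafeBitCast(ptrI32[0], to: UInt32.self)

// For integer types the standard library offers an equivalent, safer spelling.
let viaBitPattern = UInt32(bitPattern: ptrI32[0])

ptrI32.deinitialize(count: 1)
ptrI32.deallocate()
```

Because the cast applies to a value that has already been loaded, the optimizer never sees two differently typed accesses to the same address.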

Discarded alternatives

We considered adding a typePunnedMemory property to the existing Unsafe[Mutable]Pointer API. This would provide a legal way to access a potentially type punned Unsafe[Mutable]Pointer. However, it would certainly cause confusion without doing much to reduce the likelihood of programmer error. Furthermore, there are no good use cases for such a property evident in the standard library.

The opaque _RawByte struct is a technique that allows for byte-addressable buffers while hiding the dangerous side effects of type punning (a _RawByte could be loaded but its value cannot be directly inspected). UnsafePointer<_RawByte> is a clever alternative to UnsafeBytePointer. However, it doesn't do enough to prevent undefined behavior. The loaded _RawByte would naturally be accessed via unsafeBitCast, which would mislead the author into thinking that they have legally bypassed the type system. In actuality, this API blatantly violates strict aliasing. It theoretically results in undefined behavior as it stands, and may actually exhibit undefined behavior if the user recovers the loaded value.

To solve the safety problem with UnsafePointer<_RawByte>, the compiler could associate special semantics with an UnsafePointer bound to this concrete generic parameter type. Statically enforcing casting rules would be difficult if not impossible without new language features. It would also be impossible to distinguish between typed and untyped pointer APIs. For example, UnsafePointer<T>.load<U> would be a nonsensical vestige.

Alternate proposal for void* type

Changing the imported type for void* will be somewhat disruptive. Furthermore, this proposal currently drops the distinction between void* and const void*--an obvious loss of API information.

We could continue to import void* as UnsafeMutablePointer<Void> and const void* as UnsafePointer<Void>, which will continue to serve as an "opaque" untyped pointer. Converting to UnsafeBytePointer would be necessary to perform pointer arithmetic or to conservatively handle possible type punning.

This alternative is much less disruptive, but we are left with two forms of untyped pointer, one of which (UnsafePointer) the type system somewhat conflates with typed pointers.

Given the current restrictions of the language, it's not clear how to statically enforce the necessary rules for casting UnsafePointer<Void> once general UnsafePointer<T> conversions are disallowed. The following conversions should be inferred, and implied for function arguments (ignoring mutability):

UnsafePointer<T> to UnsafePointer<Void>

UnsafePointer<Void> to UnsafeBytePointer

I did not implement this simpler design because my primary goal was to enforce legal pointer conversion and rid Swift code of undefined behavior. I can't do that while allowing UnsafePointer conversions.

API improvements

As proposed, the initialize API infers the type of the stored value:

func initialize<T>(with newValue: T, count: Int = 1) -> UnsafeBytePointer

This is somewhat dangerous because the developer may not realize the size of the object(s) that will be written to memory. This can be easily asserted by checking the returned pointer:

let newptr = ptr.initialize(with: 3)
assert(newptr - ptr == 8)

As an alternative, we could force the user to provide the expected type name in the initialize invocation:

func initialize<T>(_ : T.Type, with newValue: T, count: Int = 1)
  -> UnsafeBytePointer
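
The inferred-size hazard can be sketched on a current toolchain. This assumes the shipping UnsafeMutableRawPointer and its initializeMemory(as:repeating:count:) method in place of the proposed initialize(with:count:); bytesWritten is a hypothetical helper that reports how far the write advanced.

```swift
// Hypothetical helper: initialize `count` copies of `value` in fresh untyped
// memory and report the number of bytes covered by the write.
func bytesWritten<T>(storing value: T, count: Int = 1) -> Int {
  let byteCount = MemoryLayout<T>.stride * count
  let raw = UnsafeMutableRawPointer.allocate(
    byteCount: byteCount, alignment: MemoryLayout<T>.alignment)
  defer { raw.deallocate() }

  // The element type, and therefore the size, is inferred from `value`.
  let typed = raw.initializeMemory(as: T.self, repeating: value, count: count)
  let end = UnsafeMutableRawPointer(typed + count)
  typed.deinitialize(count: count)

  // Distance in bytes from the start to one past the initialized memory.
  return raw.distance(to: end)
}

let inferred = bytesWritten(storing: 3)           // literal infers Int
let explicit = bytesWritten(storing: 3 as Int32)  // type spelled out
```

An unannotated literal writes a full Int, while the annotated call writes four bytes, which is the difference an explicit type argument would force the author to state.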

Future improvements

UnsafeBytePointer should eventually support unaligned memory access. I believe that we will eventually have a modifier that allows "packed" struct members. At that time we may also want to add a "packed" flag to UnsafeBytePointer's load and initialize methods.
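
Until something like a "packed" flag exists, an unaligned read can be approximated by copying bytes into an aligned local. The sketch below is one such fallback, not the proposed feature; it assumes the shipping UnsafeRawPointer, withUnsafeMutableBytes(of:) and copyMemory(from:), and loadUnalignedUInt32 is a hypothetical name.

```swift
// Hypothetical helper: read a UInt32 from any byte offset by copying its bytes
// into an aligned local instead of performing a typed load at that address.
func loadUnalignedUInt32(from base: UnsafeRawPointer,
                         atByteOffset offset: Int) -> UInt32 {
  var value: UInt32 = 0
  withUnsafeMutableBytes(of: &value) { destination in
    let source = UnsafeRawBufferPointer(
      start: base + offset, count: MemoryLayout<UInt32>.size)
    destination.copyMemory(from: source)
  }
  return value
}

// One padding byte followed by a UInt32 stored in little-endian byte order.
let packed: [UInt8] = [0xFF, 0x78, 0x56, 0x34, 0x12]
let loaded = packed.withUnsafeBytes { buffer in
  loadUnalignedUInt32(from: buffer.baseAddress!, atByteOffset: 1)
}
```

The copy is bytewise, so the result is in the buffer's byte order; callers that care would convert with UInt32(littleEndian:) or UInt32(bigEndian:).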

When accessing a memory buffer, it is generally convenient to cast to a type with known layout and compute offsets relative to the type's size. This is how UnsafePointer<Pointee> works. A generic UnsafeTypePunnedPointer<Pointee> could be introduced with the same interface as UnsafePointer<Pointee>, but without the strict aliasing requirements. This seems like an overdesign simply to avoid calling strideof() in a rare use case, but nothing prevents adding this type later.

UnsafeBytePointer example

/// An example of using UnsafeBytePointer to implement manual memory layout.

/// A Buffer for reading and writing basic types at a fixed address.
/// Indirection allows the buffer to refer to mutable state elsewhere.
struct MessageBuffer {
  let ptr: UnsafeBytePointer

  enum IndirectFlag { case Direct, Indirect }

  private func getPointer(atOffset n: Int, _ isIndirect: IndirectFlag)
  -> UnsafeBytePointer {
    switch isIndirect {
    case .Indirect:
      return (ptr + n).load(UnsafeBytePointer.self)
    case .Direct:
      return ptr + n
    }
  }

  func readUInt32(atOffset n: Int, _ isIndirect: IndirectFlag) -> UInt32 {
    return getPointer(atOffset: n, isIndirect).load(UInt32.self)
  }
  func readFloat32(atOffset n: Int, _ isIndirect: IndirectFlag) -> Float32 {
    return getPointer(atOffset: n, isIndirect).load(Float32.self)
  }

  func writeUInt32(_ val: UInt32, atOffset n: Int) {
    getPointer(atOffset: n, .Direct).initialize(with: val)
  }
  func writeFloat32(_ val: Float32, atOffset n: Int) {
    getPointer(atOffset: n, .Direct).initialize(with: val)
  }
  func writeIndirect(_ ptr: UnsafeBytePointer, atOffset n: Int) {
    getPointer(atOffset: n, .Direct).initialize(with: ptr)
  }
}

/// Encoded message format.
struct MessageFormat : Sequence, IteratorProtocol {
  typealias Element = MessageFormat

  private static let maxFormatFields = 32 / 4
  static let maxBufferBytes = maxFormatFields * strideof(UInt)

  var formatCode: UInt32 = 0
  var elementCode: UInt32 = 0
  var offset: Int = 0

  init(bitPattern: UInt32) {
    formatCode = bitPattern
  }

  enum Kind {
    case None, Reserved, UInt32, Float32, IndirectUInt32, IndirectFloat32
  }

  /// The first field's kind.
  var kind : Kind {
    get {
      switch elementCode {
      case 0x0: return Kind.None
      case 0x2: return Kind.UInt32
      case 0x3: return Kind.Float32
      case 0x6: return Kind.IndirectUInt32
      case 0x7: return Kind.IndirectFloat32
      default: return Kind.Reserved
      }
    }
  }

  func elementStride() -> Int {
    return (elementCode & 0x4) != 0 ? strideof(UInt) : 4
  }

  /// Get the format for the next element.
  mutating func next() -> Element? {
    if elementCode != 0 {
      offset += elementStride()
    }
    elementCode = formatCode & 0xF
    formatCode >>= 4
    if kind == .None {
      return nil
    }
    // align to the next element size
    let offsetMask = elementStride() - 1
    offset = (offset + offsetMask) & ~offsetMask
    return self
  }
}

func createBuffer() -> MessageBuffer {
  return MessageBuffer(ptr: UnsafeBytePointer(
      allocatingBytes: MessageFormat.maxBufferBytes, alignedTo: strideof(UInt)))
}

func destroy(buffer: MessageBuffer) {
  buffer.ptr.deallocateBytes(MessageFormat.maxBufferBytes,
    alignedTo: strideof(UInt))
}

var sharedInt: UInt32 = 42
var sharedFloat: Float32 = 16.25

func generateMessage(inBuffer mb: MessageBuffer) -> MessageFormat {
  let mf = MessageFormat(bitPattern: 0x06727632)
  for field in mf {
    switch field.kind {
    case .UInt32:
      mb.writeUInt32(66, atOffset: field.offset)
    case .Float32:
      mb.writeFloat32(41.625, atOffset: field.offset)
    case .IndirectUInt32:
      mb.writeIndirect(&sharedInt, atOffset: field.offset)
    case .IndirectFloat32:
      mb.writeIndirect(&sharedFloat, atOffset: field.offset)
    case .None:
      fallthrough
    case .Reserved:
      return MessageFormat(bitPattern: 0)
    }
  }
  return mf
}

func handleMessage(buffer mb: MessageBuffer, format: MessageFormat) -> Bool {
  for field in format {
    switch field.kind {
    case .UInt32:
      print(mb.readUInt32(atOffset: field.offset, .Direct))
    case .Float32:
      print(mb.readFloat32(atOffset: field.offset, .Direct))
    case .IndirectUInt32:
      print(mb.readUInt32(atOffset: field.offset, .Indirect))
    case .IndirectFloat32:
      print(mb.readFloat32(atOffset: field.offset, .Indirect))
    case .None:
      fallthrough
    case .Reserved:
      return false
    }
  }
  return true
}

func runProgram() {
  let mb = createBuffer()
  let mf = generateMessage(inBuffer: mb)
  if handleMessage(buffer: mb, format: mf) {
    print("Done")
  }
  destroy(buffer: mb)
}
runProgram()
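
The offset computation in next() relies on rounding up to a power-of-two stride. As a standalone sketch of just that arithmetic, the code below recomputes the field offsets for the format code used in generateMessage. It assumes 8-byte pointers (the pointerStride parameter stands in for strideof(UInt)) and does not distinguish Reserved kinds; alignUp and fieldOffsets are hypothetical helpers, not part of the example above.

```swift
// Round `offset` up to the next multiple of a power-of-two `alignment`.
func alignUp(_ offset: Int, toMultipleOf alignment: Int) -> Int {
  precondition(alignment > 0 && (alignment & (alignment - 1)) == 0,
               "alignment must be a power of two")
  let mask = alignment - 1
  return (offset + mask) & ~mask
}

// Walk the 4-bit element codes from the low nibble upward, as next() does,
// and collect the aligned byte offset of each field.
func fieldOffsets(formatCode: UInt32, pointerStride: Int = 8) -> [Int] {
  var code = formatCode
  var offset = 0
  var offsets: [Int] = []
  while code & 0xF != 0 {
    // Bit 0x4 marks an indirect field, which is stored as a pointer.
    let stride = (code & 0x4) != 0 ? pointerStride : 4
    offset = alignUp(offset, toMultipleOf: stride)
    offsets.append(offset)
    offset += stride
    code >>= 4
  }
  return offsets
}

let offsets = fieldOffsets(formatCode: 0x06727632)
```

With these assumptions the seven fields end at byte 48, which fits inside the 64-byte maxBufferBytes the example allocates on a 64-bit target.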


(Joe Groff) #2

Regarding the UnsafeBytePointer API:

struct UnsafeBytePointer : Hashable, _Pointer {

  let _rawValue: Builtin.RawPointer

  var hashValue: Int {...}

  init<T>(_ : UnsafePointer<T>)
  init<T>(_ : UnsafeMutablePointer<T>)
  init?<T>(_ : UnsafePointer<T>?)
  init?<T>(_ : UnsafeMutablePointer<T>?)

  init<T>(_ : OpaquePointer<T>)
  init?<T>(_ : OpaquePointer<T>?)

  init?(bitPattern: Int)
  init?(bitPattern: UInt)

  func load<T>(_ : T.Type) -> T

  @warn_unused_result
  init(allocatingBytes size: Int, alignedTo: Int)

  @warn_unused_result
  init<T>(allocatingCapacity count: Int, of: T.Type)

  func deallocateBytes(_ size: Int, alignedTo: Int)

  func deallocateCapacity<T>(_ num: Int, of: T.Type)

  // Returns a pointer one byte after the initialized memory.
  func initialize<T>(with newValue: T, count: Int = 1) -> UnsafeBytePointer

  // Returns a pointer one byte after the initialized memory.
  func initialize<T>(from: UnsafePointer<T>, count: Int) -> UnsafeBytePointer

  func initializeBackward<T>(from source: UnsafePointer<T>, count: Int)

  func deinitialize<T>(_ : T.Type, count: Int = 1)
}

Should we also have 'assign' methods, matching 'initialize'? Should 'deinitialize' be called 'destroy', matching 'UnsafeMutablePointer's API?

-Joe


(Guillaume Lessard) #3

I’m sympathetic to the elimination of UnsafePointer<Void> as general shorthand for an arbitrary pointer, but I lose the plot of this very long proposal. It seems to me that this increases API surface, yet everything I could do before, I could still do; it just involves more typing. What exactly does this make better?

Cheers,
Guillaume Lessard


(Geordie J) #4

I read this proposal and I'm a bit unsure what its purpose would be:

Basically you want to prevent UnsafePointer<XYZ>(UnsafePointer<Void>)
conversions and/or vice-versa? And you'd achieve this by replacing
UnsafePointer<Void> with UnsafeBytePointer that has no bound pointer type?

In one sense the change seems fine to me, but as someone who uses a lot of
C APIs and a lot of CoreAudio/CoreMIDI in Swift already I can't really see
what benefit it'd bring. Presumably we'd still want an option of converting
UnsafeBytePointer to UnsafePointer<SomeActualType> for things like C
function pointer callback "context"/"userInfo" uses, so it's not like we'd
be preventing programmer error in that way.

Call me conservative but to me the current system seems to work as well as
it can. If anything it's already enough boilerplate going through hoops
converting an UnsafeMutablePointer<Void> into a [Float] even when I know
and the C API knows perfectly well what it actually contains... Would
happily be convinced otherwise about this proposal though, I'm pretty new
at all this.

Geordie

Andrew Trick via swift-evolution <swift-evolution@swift.org> wrote on
Mon, 9 May 2016 at 20:15:

···

Hello Swift evolution,

I sent this to swift-dev last week. Sorry to post on two lists!

Swift does a great job of protecting against undefined behavior--as long
as you avoid "unsafe" APIs, that is. However, unsafe APIs are important for
giving developers control over implementation details and performance.
Naturally, the contract between unsafe APIs and the optimizer is crucial.
When a developer uses an unsafe API, the rules governing safe, well-defined
behavior must be clear. On the opposite end, the optimizer must know which
assumptions it can make based on those rules. Simply saying that anything
goes because "unsafe" is in the name is not helpful to this effort.

For a long time, I've wanted these rules nailed down. We have more users
taking advantage of advanced features, and more optimizations that take
advantage of assumptions guided by the type system. This seems like a
particularly good time to resolve UnsafePointer semantics, considering the
type system and UnsafePointer work that's been going on recently. Strict
aliasing is something I would like addressed. If we do nothing here, then
we will end up by default inheriting C/C++ semantics, as with any language
that relies on a C/C++ backend. In other words, developers will be forced
to write code with technically undefined behavior and rely on the compiler
to be smart enough to recognize and recover from common patterns. Or we can
take advantage of this opportunity and instead adopt a sound memory model
with respect to aliasing.

This proposal is only an RFC at this point. I'm sending it out now to
allow for plenty of time for discussion (or advance warning). Keep in mind
that it could change considerably before it goes up for review.

-Andy

UnsafeBytePointer API for In-Memory Layout

   - Proposal: SE-NNNN
   <https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md>
   - Author(s): Andrew Trick <https://github.com/atrick>
   - Status: Awaiting review
   <https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#rationale>
   - Review manager: TBD

<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#introduction>
Introduction

UnsafePointer and UnsafeMutable refer to a typed region of memory, and
the compiler must be able to assume that UnsafePointer element (Pointee)
type is consistent with other access to the same memory. See proposed
Type Safe Memory Access documentation
<https://github.com/atrick/swift/blob/type-safe-mem-docs/docs/TypeSafeMemory.rst>.
Consequently, inferred conversion between UnsafePointer element types
exposes an easy way to abuse the type system. No alternative currently
exists for manual memory layout and direct access to untyped memory, and
that leads to an overuse of UnsafePointer. These uses of UnsafePointer,
which depend on pointer type conversion, make accidental type punning
likely. Type punning via UnsafePointer is semantically undefined behavior
and de facto undefined behavior given the optimizer's long-time treatment
of UnsafePointer.

In this document, all mentions of UnsafePointer also apply to
UnsafeMutablePointer.

<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#motivation>
Motivation

To avoid accidental type punning, we should prohibit inferred conversion
between UnsafePointer<T> and UnsafePointer<U> unless the target of the
conversion is an untyped or nondereferenceable pointer (currently
represented as UnsafePointer<Void>).

To support this change we should introduce a new pointer type that does
not bind the type of its Pointee. Such a new pointer type would provide
an ideal foundation for an API that allows byte-wise pointer arithmetic and
a legal, well-defined means to access an untyped region of memory.

As motivation for such an API, consider that an UnsafePointer<Void> or
OpaquePointer may be currently be obtained from an external API. However,
the developer may know the memory layout and may want to read or write
elements whose types are compatible with that layout. This a reasonable use
case, but unless the developer can guarantee that all accesses to the same
memory location have the same type, then they cannot use UnsafePointer to
access the memory without risking undefined behavior.

An UnsafeBytePointer example, using a new proposed API is included below.

<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#proposed-solution>Proposed
solution

Introduce an UnsafeBytePointer type along with an API for obtaining a
UnsafeBytePointer value at a relative byte offset and loading and storing
arbitrary types at that location.

Statically prohibit inferred UnsafePointer conversion while allowing
inferred UnsafePointer to UnsafeBytePointerconversion.

UnsafeBytePointer meets multiple requirements:

   1. An untyped pointer to memory
   2. Pointer arithmetic within byte-addressable memory
   3. Type-unsafe access to memory (legal type punning)

UnsafeBytePointer will replace UnsafeMutablePointer<Void> as the
representation for untyped memory. For API clarify we could consider a
typealias for VoidPointer. I don't think a separate VoidPointer type
would be useful--there's no danger that UnsafeBytePointer will be
casually dereferenced, and don't see the danger in allowing pointer
arithmetic since the only reasonable interpretation is that of a
byte-addressable memory.

Providing an API for type-unsafe memory access would not serve a purpose
without the ability to compute byte offsets. Of course, we could require
users to convert back and forth using bitPatterns, but I think that would
be awkward and only obscure the purpose of the UnsafeBytePointer type.

In this proposal, UnsafeBytePointer does not specify mutability. Adding an
UnsafeMutableBytePointer would be straightforward, but adding another
pointer type needs strong justification. I expect to get input from the
community on this. If we agree that the imported type for const void* should
be UnsafeBytePointer, then we probably need UnsafeMutableBytePointer to
handle interoperability.

<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#detailed-design>Detailed
design

The public API is shown here. For details and comments, see the unsafeptr_convert
branch <https://github.com/atrick/swift/commits/unsafeptr_convert>.

struct UnsafeBytePointer : Hashable, _Pointer {

  let _rawValue: Builtin.RawPointer

  var hashValue: Int {...}

  init<T>(_ : UnsafePointer<T>)
  init<T>(_ : UnsafeMutablePointer<T>)
  init?<T>(_ : UnsafePointer<T>?)
  init?<T>(_ : UnsafeMutablePointer<T>?)

  init<T>(_ : OpaquePointer<T>)
  init?<T>(_ : OpaquePointer<T>?)

  init?(bitPattern: Int)
  init?(bitPattern: UInt)

  func load<T>(_ : T.Type) -> T

  @warn_unused_result
  init(allocatingBytes size: Int, alignedTo: Int)

  @warn_unused_result
  init<T>(allocatingCapacity count: Int, of: T.Type)

  func deallocateBytes(_ size: Int, alignedTo: Int)

  func deallocateCapacity<T>(_ num: Int, of: T.Type)

  // Returns a pointer one byte after the initialized memory.
  func initialize<T>(with newValue: T, count: Int = 1) -> UnsafeBytePointer

  // Returns a pointer one byte after the initialized memory.
  func initialize<T>(from: UnsafePointer<T>, count: Int) -> UnsafeBytePointer

  func initializeBackward<T>(from source: UnsafePointer<T>, count: Int)

  func deinitialize<T>(_ : T.Type, count: Int = 1)
}
extension OpaquePointer {
  init(_ : UnsafeBytePointer)
}
extension Int {
  init(bitPattern: UnsafeBytePointer)
}
extension UInt {
  init(bitPattern: UnsafeBytePointer)
}
extension UnsafeBytePointer : RandomAccessIndex {
  typealias Distance = Int

  func successor() -> UnsafeBytePointer
  func predecessor() -> UnsafeBytePointer
  func distance(to : UnsafeBytePointer) -> Int
  func advanced(by : Int) -> UnsafeBytePointer
}
func == (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Bool
func < (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Bool
func + (lhs: UnsafeBytePointer, rhs: Int) -> UnsafeBytePointer
func + (lhs: Int, rhs: UnsafeBytePointer) -> UnsafeBytePointer
func - (lhs: UnsafeBytePointer, rhs: Int) -> UnsafeBytePointer
func - (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Int
func += (lhs: inout UnsafeBytePointer, rhs: Int)
func -= (lhs: inout UnsafeBytePointer, rhs: Int)

Occasionally, we need to convert from an UnsafeBytePointer to an
UnsafePointer. This should only be done in very rare circumstances when
the author understands the compiler's strict type rules for UnsafePointer.
Although this could be done by casting through an OpaquePointer, an
explicit, designated unsafe pointer cast API would make the risks more
obvious and self-documenting. For example:

extension UnsafePointer {
  init(_ from: UnsafeBytePointer, toPointee: Pointee.Type)
}
extension UnsafeMutablePointer {
  init(_ from: UnsafeBytePointer, toPointee: Pointee.Type)
}

Similarly, conversion between UnsafePointer types must now be spelled
with an explicit Pointee type:

extension UnsafePointer {
  init<U>(_ from: UnsafePointer<U>, toPointee: Pointee.Type)
  init<U>(_ from: UnsafeMutablePointer<U>, toPointee: Pointee.Type)
}
extension UnsafeMutablePointer {
  init<U>(_ from: UnsafeMutablePointer<U>, toPointee: Pointee.Type)
}
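
To show what spelling the destination type at the conversion site looks like in practice, here is a hedged sketch. It does not use the toPointee initializers proposed above; it assumes the standard library's UnsafeRawPointer and its assumingMemoryBound(to:) method as the closest available analogue, and the function names are hypothetical.

```swift
// Sketch only: `assumingMemoryBound(to:)` stands in for the proposed
// `init(_:toPointee:)`. In both spellings the Pointee type is written out
// rather than inferred.
func sumAsInt32(_ raw: UnsafeRawPointer, count: Int) -> Int32 {
    let typed = raw.assumingMemoryBound(to: Int32.self)
    var total: Int32 = 0
    for index in 0..<count {
        total += typed[index]
    }
    return total
}

func demoExplicitConversion() -> Int32 {
    let values: [Int32] = [1, 2, 3, 4]
    // The array storage really does hold Int32 values, so the typed view
    // created inside sumAsInt32 matches the memory's actual contents.
    return values.withUnsafeBytes { bytes in
        sumAsInt32(bytes.baseAddress!, count: values.count)
    }
}

print(demoExplicitConversion())
```

The conversion is only well-defined because the memory already holds values of the named type; the label makes that obligation visible to the reader.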

<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#impact-on-existing-code>Impact
on existing code

The largest impact of this change is that void* and const void* are
imported as UnsafeBytePointer. This impacts many public APIs, but with
implicit argument conversion should not affect typical uses of those APIs.

Any Swift projects that rely on type inference to convert between
UnsafePointer types will need to take action. The developer needs to
determine whether type punning is necessary. If so, they must migrate to
the UnsafeBytePointer API. Otherwise, they can work around the new
restriction by using a toPointee or mutating label.

Disallowing inferred UnsafePointer direct conversion requires some
standard library code to use an explicit toPointee label for unsafe
conversions that may violate strict aliasing.

All occurrences of Unsafe[Mutable]Pointer<Void> in the standard library
are converted to UnsafeBytePointer. e.g. unsafeAddress() now returns
UnsafeBytePointer, not UnsafePointer<Void>.

Some occurrences of Unsafe[Mutable]Pointer<Pointee> in the standard
library are replaced with UnsafeBytePointer, either because the code was
playing too loosely with strict aliasing rules, or because the code
actually wanted to perform pointer arithmetic on byte-addresses.

StringCore.baseAddress changes from OpaquePointer to UnsafeBytePointer because
it is computing byte offsets and accessing the memory. OpaquePointer is
meant for bridging, but should be truly opaque; that is, nondereferenceable
and not involved in address computation.

The StringCore implementation does a considerable amount of casting
between different views of the String storage. The current implementation
already demonstrates some awareness of strict aliasing rules. The rules are
generally followed by ensuring that the StringBuffer only be accessed
using the appropriate CodeUnit within Swift code. For interoperability
and optimization, String buffers frequently need to be cast to and from
CChar. This is valid as long as access to the buffer from Swift is guarded
by dynamic checks of the encoding type. These unsafe, but dynamically legal
conversion points will now be labeled with toPointee.

CoreAudio utilities now use an UnsafeBytePointer.

<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#implementation-status>Implementation
status

On my unsafeptr_convert branch
<https://github.com/atrick/swift/commits/unsafeptr_convert>, I've made
most of the necessary changes to support the addition of UnsafeBytePointer and
the removal of inferred UnsafePointer conversion.

There are several things going on here in order to make it possible to
build the standard library with the changes:

   - A new UnsafeBytePointer type is defined.
   - The type system imports void* as UnsafeBytePointer.
   - The type system handles implicit conversions to UnsafeBytePointer.
   - UnsafeBytePointer replaces both UnsafePointer<Void> and
     UnsafeMutablePointer<Void>.
   - The standard library was relying on inferred UnsafePointer conversion
     in over 100 places. Most of these conversions now take an explicit
     label, such as 'toPointee' or 'mutating'. Some have been rewritten.
   - Several places in the standard library that were playing loosely with
     strict aliasing or doing bytewise pointer arithmetic now use
     UnsafeBytePointer instead.
   - Explicit labeled Unsafe[Mutable]Pointer initializers are added.
   - The inferred Unsafe[Mutable]Pointer conversion is removed.

TODO:

Once this proposal is accepted, and the rules for casting between pointer
types have been decided, we need to finish implementing the type system
support. The current implementation (intentionally) breaks a few tests in
pointer_conversion.swift. We also need to ensure that interoperability
requirements are met. Currently, many argument casts need to be explicitly
labeled. The current implementation also makes it easy for users to hit an
"ambiguous use of 'init'" error when relying on implicit argument
conversion.

Additionally:

   - A name mangled abbreviation needs to be created for UnsafeBytePointer.
   - The StringAPI tests should probably be rewritten with UnsafeBytePointer.
   - The NSStringAPI utilities and tests may need to be ported to
     UnsafeBytePointer.
   - The CoreAudio utilities and tests may need to be ported to
     UnsafeBytePointer.

<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#alternatives-considered>Alternatives
considered
<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#existing-workaround>Existing
workaround

In some cases, developers can safely reinterpret values to achieve the
same effect as type punning:

let ptrI32 = UnsafeMutablePointer<Int32>(allocatingCapacity: 1)
ptrI32[0] = Int32()
let u = unsafeBitCast(ptrI32[0], to: UInt32.self)

Note that all access to the underlying memory is performed with the same
element type. This is perfectly legitimate, but simply isn't a complete
solution. It also does not eliminate the inherent danger in declaring a
typed pointer and expecting it to point to values of a different type.
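
For readers who want to try this workaround with a current toolchain, here is a small sketch of the same idea. It assumes today's standard library spelling (UnsafeMutablePointer.allocate(capacity:), initialize(to:), pointee) rather than the allocatingCapacity initializer shown above, and the function name is hypothetical.

```swift
// Sketch only: every access to the allocation uses Int32. Only the value
// that was already loaded is reinterpreted, never the memory itself.
func reinterpretLoadedValue() -> UInt32 {
    let ptrI32 = UnsafeMutablePointer<Int32>.allocate(capacity: 1)
    defer { ptrI32.deallocate() }
    ptrI32.initialize(to: -1)

    let loaded: Int32 = ptrI32.pointee
    return unsafeBitCast(loaded, to: UInt32.self)
}

print(reinterpretLoadedValue() == UInt32.max)
```

For integer types of the same width, UInt32(bitPattern:) expresses the same reinterpretation without reaching for unsafeBitCast.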

<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#discarded-alternatives>Discarded
alternatives

We considered adding a typePunnedMemory property to the existing
Unsafe[Mutable]Pointer API. This would provide a legal way to access a
potentially type punned Unsafe[Mutable]Pointer. However, it would
certainly cause confusion without doing much to reduce the likelihood of
programmer error. Furthermore, there are no good use cases for such a
property evident in the standard library.

The opaque _RawByte struct is a technique that allows for
byte-addressable buffers while hiding the dangerous side effects of type
punning (a _RawByte could be loaded but its value cannot be directly
inspected). UnsafePointer<_RawByte> is a clever alternative to
UnsafeBytePointer. However, it doesn't do enough to prevent undefined
behavior. The loaded _RawByte would naturally be accessed via
unsafeBitCast, which would mislead the author into thinking that they
have legally bypassed the type system. In actuality, this API blatantly
violates strict aliasing. It theoretically results in undefined behavior as
it stands, and may actually exhibit undefined behavior if the user recovers
the loaded value.

To solve the safety problem with UnsafePointer<_RawByte>, the compiler
could associate special semantics with an UnsafePointer bound to this
concrete generic parameter type. Statically enforcing casting rules would
be difficult if not impossible without new language features. It would also
be impossible to distinguish between typed and untyped pointer APIs. For
example, UnsafePointer<T>.load<U> would be a nonsensical vestige.

<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#alternate-proposal-for-void-type>Alternate
proposal for void* type

Changing the imported type for void* will be somewhat disruptive.
Furthermore, this proposal currently drops the distinction between void*
and const void*--an obvious loss of API information.

We could continue to import void* as UnsafeMutablePointer<Void> and const
void* as UnsafePointer<Void>, which will continue to serve as an "opaque"
untyped pointer. Converting to UnsafeBytePointer would be necessary to
perform pointer arithmetic or to conservatively handle possible type
punning.

This alternative is *much* less disruptive, but we are left with two
forms of untyped pointer, one of which (UnsafePointer) the type system
somewhat conflates with typed pointers.

Given the current restrictions of the language, it's not clear how to
statically enforce the necessary rules for casting UnsafePointer<Void> once
general


(Joe Groff) #5

Future improvements

UnsafeBytePointer should eventually support unaligned memory access. I believe that we will eventually have a modifier that allows "packed" struct members. At that time we may also want to add a "packed" flag to UnsafeBytePointer's load and initialize methods.

We should probably call out the fact that `load` and `initialize` require alignment in the meantime.
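
To make the alignment requirement concrete, here is a hedged sketch of the check a caller could perform before a typed load. It assumes the standard library's UnsafeRawPointer and MemoryLayout as stand-ins for the proposed API; isAligned and alignmentDemo are hypothetical helpers, not part of the proposal.

```swift
// Hypothetical helper: true when `pointer` satisfies T's alignment.
func isAligned<T>(_ pointer: UnsafeRawPointer, for type: T.Type) -> Bool {
    let alignment = UInt(MemoryLayout<T>.alignment)
    return UInt(bitPattern: pointer) % alignment == 0
}

func alignmentDemo() -> (base: Bool, offsetByOne: Bool) {
    let raw = UnsafeMutableRawPointer.allocate(byteCount: 16, alignment: 8)
    defer { raw.deallocate() }
    let base = UnsafeRawPointer(raw)
    // The allocation was requested with 8-byte alignment; one byte past it
    // cannot be aligned for any type whose alignment is greater than 1.
    return (isAligned(base, for: UInt64.self),
            isAligned(base + 1, for: UInt64.self))
}

print(alignmentDemo())
```

A typed load at the second address would violate the precondition being discussed here, which is why the documentation should state it.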

When accessing a memory buffer, it is generally convenient to cast to a type with known layout and compute offsets relative to the type's size. This is how UnsafePointer<Pointee> works. A generic UnsafeTypePunnedPointer<Pointee> could be introduced with the same interface as UnsafePointer<Pointee>, but without the strict aliasing requirements. This seems like an overdesign simply to avoid calling strideof() in a rare use case, but nothing prevents adding this type later.

This need could also be addressed with some additional convenience methods on UnsafeBytePointer to load or store at a given index, something like:

  func load<T>(asArrayOf type: T.Type, at index: Int) -> T {
    return (self + strideof(T) * index).load(T)
  }
  func initialize<T>(asArrayOf type: T.Type, initialValue: T, at index: Int)
    -> UnsafeBytePointer {
    return (self + strideof(T) * index).initialize(with: initialValue)
  }
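
For comparison, the indexed load can be sketched against a type that compiles today. This assumes the standard library's UnsafeRawPointer in place of UnsafeBytePointer and MemoryLayout<T>.stride in place of strideof(T); the extension method is a hypothetical helper that only mirrors the name suggested above.

```swift
extension UnsafeRawPointer {
    // Hypothetical convenience: treat `self` as the start of an array of T
    // and load the element at `index`. The byte offset is stride * index.
    func load<T>(asArrayOf type: T.Type, at index: Int) -> T {
        return load(fromByteOffset: MemoryLayout<T>.stride * index, as: T.self)
    }
}

func thirdElement() -> UInt16 {
    let values: [UInt16] = [10, 20, 30, 40]
    return values.withUnsafeBytes { bytes in
        bytes.baseAddress!.load(asArrayOf: UInt16.self, at: 2)
    }
}

print(thirdElement())
```

The helper keeps the stride computation in one place, which is the whole convenience being asked for.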

-Joe


(Andrew Trick) #6

Responding to this on the swift-evolution thread...
https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md

Some concerns with UnsafeBytePointer:

- I was concerned about having a store() to go with load(). It’s just deinitialize + initialize with a count of 1, but that’s easily the common case when you do need to write to something. That said, I’m not sure which people are more likely to mess up: using initialize and forgetting to deinitialize before, or using store when there wasn’t anything there before.

store() is definitely a common case, but it is subtly broken if the user doesn't realize that the overwritten value must be exactly the same type. The problem is that this API is advertised as supporting type punning. The only way to make it safe as advertised is to force users to deinitialize<T>() + initialize<U>. I think this needs to be clearly explained in the API comments.

+ ///
+ /// - Note: The converse of loading a value, storing a value `T` into
+ /// initialized memory requires the user to know the previously initialized
+ /// value's type. Full 'store' semantics can be achieved with:
+ /// `deinitialize(PreviousType)`
+ /// `initialize(NewType, with: value)`
+ /// If the previously initialized type cannot reference any managed objects,
+ /// then the `deinitialize` call can be skipped.
   public func load<T>(_ : T.Type) -> T {
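
The deinitialize-then-initialize sequence in that note can be sketched as follows. This is an illustration under assumptions, not the proposed API: it uses the standard library's UnsafeMutableRawPointer, initializeMemory(as:repeating:count:) and deinitialize(count:), and the Resource class and function name are invented for the example.

```swift
final class Resource {}

// Sketch only: overwrite a managed reference with a value of another type.
func overwriteWithNewType() -> (oldValueReleased: Bool, stored: Int) {
    let raw = UnsafeMutableRawPointer.allocate(
        byteCount: MemoryLayout<Resource>.stride,
        alignment: MemoryLayout<Resource>.alignment)
    defer { raw.deallocate() }

    // Initialize the memory with a class reference and watch it weakly.
    weak var observer: Resource?
    do {
        let resource = Resource()
        observer = resource
        raw.initializeMemory(as: Resource.self, repeating: resource, count: 1)
    }

    // Step 1: deinitialize as the previously stored type. This releases
    // the reference that the buffer was holding.
    raw.assumingMemoryBound(to: Resource.self).deinitialize(count: 1)
    let oldValueReleased = (observer == nil)

    // Step 2: initialize the same bytes as the new type.
    raw.initializeMemory(as: Int.self, repeating: 42, count: 1)
    return (oldValueReleased, raw.load(as: Int.self))
}

print(overwriteWithNewType())
```

Skipping step 1 here would leak the Resource instance, which is the hazard a one-step store() would hide.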

- I am concerned about eliminating the distinction between mutable and immutable memory. That is, I think we’ll want the Mutable variant to be a separate type.

Yes. I haven't gotten feedback yet on that, or on importing 'void*' as UnsafeBytePointer. If the feeling on both is positive, then I think it's worth amending my proposal at this point to include UnsafeMutableBytePointer.

- Is there a good way to do a mass copy or move from an UnsafeBytePointer?

I didn’t add ‘move’ from UnsafeBytePointer for the same reason that I didn’t add ‘store’.

I could add the following functions to Unsafe[Mutable]Pointer though for completeness:

- initialize(from: UnsafeBytePointer, count)
- assign(from: UnsafeBytePointer, count)

I could also add a mass move from UnsafePointer *to* UnsafeBytePointer.

  func moveInitializeFrom<T>(_ source: UnsafePointer<T>, count: Int)
  func moveInitializeBackwardFrom<T>(_ source: UnsafePointer<T>, count: Int)

-Andy


On May 12, 2016, at 9:27 AM, Jordan Rose <jordan_rose@apple.com> wrote:


(Andrew Trick) #7

Hello Swift evolution,

I'm sending this proposal out again for another round of RFC. The first round did not get much specific feedback, and nothing has fundamentally changed. In this updated version I beefed up the explanation a bit and clarified the language.

-Andy

UnsafeBytePointer API for In-Memory Layout

Proposal: SE-NNNN <https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md>
Author(s): Andrew Trick <https://github.com/atrick>
Status: Awaiting review <https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md#rationale>
Review manager: TBD
<https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md#introduction>Introduction

UnsafePointer and UnsafeMutablePointer refer to a typed region of memory, and the compiler must be able to assume that the UnsafePointer element (Pointee) type is consistent with other access to the same memory. See proposed Type Safe Memory Access documentation <https://github.com/atrick/swift/blob/type-safe-mem-docs/docs/TypeSafeMemory.rst>. Consequently, conversion between UnsafePointer element types exposes an easy way to abuse the type system.

In the following example, takesUIntPtr accesses a memory location as a UInt, which is incompatible with the declared type of the pointer passed to takesIntPtr, yet the statement that performs the pointer conversion provides no indication that type punning may be taking place:

func takesUIntPtr(_ p: UnsafeMutablePointer<UInt>) -> UInt {
  return p[0]
}
func takesIntPtr(q: UnsafeMutablePointer<Int>) -> UInt {
  return takesUIntPtr(UnsafeMutablePointer(q))
}

If this pointer conversion was accidental, then it is likely a serious bug. Type punning via UnsafePointer is semantically undefined behavior and de facto undefined behavior given the optimizer's long-time treatment of UnsafePointer.

If the user's intention is to perform type punning, then UnsafePointer is the wrong API. Swift does not currently provide an API that permits safe, legal, type punning.

Swift-evolution thread: [RFC] UnsafeBytePointer API for In-Memory Layout <https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20160509/thread.html#16909>
In this document, all mentions of UnsafePointer also apply to UnsafeMutablePointer.

<https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md#motivation>Motivation

Type punning is not a normal use case for UnsafePointer and will likely lead to undefined behavior. To avoid accidental type punning, we should prohibit inferred conversion between UnsafePointer<T> and UnsafePointer<U> unless the target of the conversion is an untyped or nondereferenceable pointer (currently represented as UnsafePointer<Void>). An "inferred conversion" is one in which a generic type is initialized via type inference without the need to spell the destination type:

struct S {
  let ptr : UnsafePointer<T>
}

let p = UnsafePointer<U>(...)
S(ptr: UnsafePointer(p))

To support this change we should introduce a new pointer type that does not bind the type of its Pointee. Such a new pointer type would allow inferred and implicit conversion from typed to untyped pointers, which is the common, safe use case for UnsafePointer conversion. More importantly it would provide an ideal foundation for an API that allows byte-wise pointer arithmetic and a legal, well-defined means to access an untyped region of memory (safe type punning). No alternative currently exists for manual memory layout and direct access to untyped memory. Not only is this a legitimate use case that Swift should support, but the lack of a proper API has already led to dangerous overuse of UnsafePointer.

As motivation for such an API, consider that an UnsafePointer<Void> or OpaquePointer may currently be obtained from an external API. However, the developer may know the memory layout and may want to read or write elements whose types are compatible with that layout. This is a reasonable use case, but unless the developer can guarantee that all accesses to the same memory location have the same type, then they cannot use UnsafePointer to access the memory without risking undefined behavior.

An example based on the proposed new UnsafeBytePointer API is included below.

<https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md#proposed-solution>Proposed solution

This proposal introduces UnsafeBytePointer and UnsafeMutableBytePointer types, along with an API for obtaining an UnsafeMutableBytePointer value at a relative byte offset and loading and storing arbitrary types at that location.

UnsafePointer<T> to UnsafeBytePointer conversion will be allowed via an unlabeled initializer and in some cases may be implicit. However, inferred UnsafePointer<T> conversion will now be statically prohibited. Converting arbitrary UnsafePointer<T> types will instead require a labeled initializer that accepts the destination type:

struct S {
  let ptr : UnsafePointer<T>
}

let p = UnsafePointer<U>(...)
S(ptr: UnsafePointer(p, to: T))

Just as with unsafeBitCast, although the destination of the cast can usually be inferred, we want the developer to explicitly state the intended destination type, both because type inference can be surprising, and because it's important to the reader for code comprehension.
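
As a point of comparison, the takesIntPtr example from the introduction can be written so that the destination type is stated at the conversion site. This sketch does not use the proposed to: initializer; it assumes the standard library's withMemoryRebound(to:capacity:) as an existing way to name the target Pointee explicitly, and reboundDemo is a hypothetical driver.

```swift
func takesUIntPtr(_ p: UnsafeMutablePointer<UInt>) -> UInt {
    return p[0]
}

func takesIntPtr(_ q: UnsafeMutablePointer<Int>) -> UInt {
    // The reader can see that Int memory is being viewed as UInt here.
    // Int and UInt share size and stride, which the rebinding relies on.
    return q.withMemoryRebound(to: UInt.self, capacity: 1) { rebound in
        takesUIntPtr(rebound)
    }
}

func reboundDemo() -> UInt {
    var value = -1
    return takesIntPtr(&value)
}

print(reboundDemo() == UInt.max)
```

Whatever the final spelling, the point is the same: the punning is visible in the source rather than hidden behind inference.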

While the new UnsafeBytePointer type supports the removal of dangerous UnsafePointer conversion, its API also meets multiple requirements:

   1. An untyped pointer to memory
   2. Type-unsafe access to memory (legal type punning)
   3. Pointer arithmetic within byte-addressable memory

UnsafeMutableBytePointer will replace UnsafeMutablePointer<Void> and UnsafeBytePointer will replace UnsafePointer<Void> as the standard representations for untyped memory. The Swift imported type for void* and const void* will be UnsafeMutableBytePointer and UnsafeBytePointer respectively.

Note: For API clarity we could consider a typealias for VoidPointer. A separate VoidPointer type would not be very useful--there's no danger that UnsafeBytePointer will be casually dereferenced, and no danger in allowing pointer arithmetic since the only reasonable interpretation is that of a byte-addressable memory.

Loading from and storing to memory via an Unsafe[Mutable]BytePointer is safe independent of the type of value being loaded or stored and independent of the memory's allocated type as long as layout guarantees are met (per the ABI). This allows legal type punning within Swift and allows Swift code to access a common region of memory that may be shared across an external interface that does not provide type safety guarantees. Accessing type punned memory directly through a designated Unsafe[Mutable]BytePointer type provides a sound basis for compiler implementation of strict aliasing. This is in contrast with the approach of simply providing a special unsafe pointer cast operation for bypassing type safety, which cannot be reliably implemented.

Providing an API for type-unsafe memory access would not serve much purpose without the ability to compute byte offsets. Of course, we could require users to convert back and forth using bitPatterns, but that would be awkward and only obscure the purpose of the UnsafeBytePointer type.

<https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md#detailed-design>Detailed design

The public API is shown here. For details, see the unsafeptr_convert branch <https://github.com/atrick/swift/commits/unsafeptr_convert>.

struct UnsafeMutableBytePointer : Hashable, _Pointer {

  let _rawValue: Builtin.RawPointer

  var hashValue: Int {...}

  init<T>(_ : UnsafePointer<T>)
  init<T>(_ : UnsafeMutablePointer<T>)
  init?<T>(_ : UnsafePointer<T>?)
  init?<T>(_ : UnsafeMutablePointer<T>?)

  init<T>(_ : OpaquePointer<T>)
  init?<T>(_ : OpaquePointer<T>?)

  init?(bitPattern: Int)
  init?(bitPattern: UInt)

  /// Load a single `T` value from memory.
  ///
  /// - Precondition: The underlying pointer is properly aligned for
  /// accessing `T`.
  ///
  /// - Note: The converse of loading a value, storing a value `T` into
  /// initialized memory requires the user to know the previously initialized
  /// value's type. Full 'store' semantics can be achieved with:
  /// `deinitialize(PreviousType)`
  /// `initialize(NewType, with: value)`
  /// If the previously initialized type cannot reference any managed objects,
  /// then the `deinitialize` call can be skipped.
  func load<T>(_ : T.Type) -> T

  /// Load a `T` value at the specified `index` from `self` as if it
  /// contains at least `index` + 1 contiguous values of type `T`.
  ///
  /// - Precondition: The underlying pointer is properly aligned for
  /// accessing `T`.
  func load<T>(asArrayOf _: T.Type, at index: Int) -> T

  /// Initialize this memory location with `count` consecutive copies
  /// of `newValue`
  ///
  /// Returns an `UnsafeBytePointer` to memory one byte past the last
  /// initialized value.
  ///
  /// - Precondition: The memory is not initialized.
  ///
  /// - Precondition: The underlying pointer is properly aligned for
  /// accessing `T`.
  ///
  /// - Precondition: `count` is non-negative.
  ///
  /// - Postcondition: The memory is initialized; the value should eventually
  /// be destroyed or moved from to avoid leaks.
  func initialize<T>(_: T.Type, with newValue: T, count: Int = 1)
    -> UnsafeBytePointer

  /// Initialize the memory location at `index` with `newValue` as if `self`
  /// holds at least `index` + 1 contiguous values of type `T`.
  ///
  /// Returns an `UnsafeBytePointer` to memory one byte past the
  /// initialized value.
  ///
  /// - Precondition: The memory at `index` is not initialized.
  ///
  /// - Precondition: The underlying pointer is properly aligned for
  /// accessing `T`.
  ///
  /// - Postcondition: The memory is initialized; the value should eventually
  /// be destroyed or moved from to avoid leaks.
  func initialize<T>(asArrayOf _: T.Type, initialValue: T, at index: Int)
    -> UnsafeBytePointer

  /// De-initialize the `count` `T`s starting at `self`, returning
  /// their memory to an uninitialized state.
  ///
  /// - Precondition: The `T`s at `self..<self + count` are initialized.
  ///
  /// - Postcondition: The memory is uninitialized.
  func deinitialize<T>(_ : T.Type, count: Int = 1)

  /// De-initialize `T` at the memory location `index` as if `self` holds at
  /// least `index` + 1 contiguous values of type `T`, returning
  /// the memory to an uninitialized state.
  ///
  /// - Precondition: The `T` value at `index` is initialized.
  ///
  /// - Postcondition: The memory at `index` is uninitialized.
  func deinitialize<T>(asArrayOf _: T.Type, at index: Int)

  /// Allocate and point at uninitialized memory for `size` bytes with
  /// `alignedTo` alignment.
  ///
  /// - Postcondition: The memory is allocated, but not initialized.
  @warn_unused_result
  init(allocatingBytes size: Int, alignedTo: Int)

  /// Allocate and point at uninitialized memory for `count` values of `T`.
  ///
  /// - Postcondition: The memory is allocated, but not initialized.
  @warn_unused_result
  init<T>(allocatingCapacity count: Int, of: T.Type)

  /// Deallocate uninitialized memory allocated for `size` bytes with
  /// `alignedTo` alignment.
  ///
  /// - Precondition: The memory is not initialized.
  ///
  /// - Postcondition: The memory has been deallocated.
  func deallocateBytes(_ size: Int, alignedTo: Int)

  /// Deallocate uninitialized memory allocated for `count` values of `T`.
  ///
  /// - Precondition: The memory is not initialized.
  ///
  /// - Postcondition: The memory has been deallocated.
  func deallocateCapacity<T>(_ num: Int, of: T.Type)

  /// Omitting comments for the following convenient variations on
  /// initialize...

  func initialize<T>(from: UnsafePointer<T>, count: Int) -> UnsafeBytePointer

  func initializeBackward<T>(from source: UnsafePointer<T>, count: Int)

  func moveInitialize<T>(from source: UnsafeMutablePointer<T>, count: Int)

  func moveInitializeBackward<T>(from source: UnsafeMutablePointer<T>,
    count: Int)
}

extension OpaquePointer {
  init(_ : UnsafeBytePointer)
}

extension Int {
  init(bitPattern: UnsafeBytePointer)
}

extension UInt {
  init(bitPattern: UnsafeBytePointer)
}

extension UnsafeBytePointer : Strideable {
  func distance(to : UnsafeBytePointer) -> Int
  func advanced(by : Int) -> UnsafeBytePointer
}

func == (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Bool

func < (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Bool

func + (lhs: Int, rhs: UnsafeBytePointer) -> UnsafeBytePointer

func - (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Int

UnsafeBytePointer provides the same interface as UnsafeMutablePointer except for store, variations on initialize, and deinitialize.

Occasionally, we need to convert from an UnsafeBytePointer to an UnsafePointer. This should only be done in very rare circumstances when the author understands the compiler's strict type rules for UnsafePointer. Although this could be done by casting through an OpaquePointer, an explicit, designated unsafe pointer cast API would make the risks more obvious and self-documenting. For example:

extension UnsafePointer {
  init(_ from: UnsafeBytePointer, to: Pointee.Type)
}
extension UnsafeMutablePointer {
  init(_ from: UnsafeBytePointer, to: Pointee.Type)
}

Similarly, conversion between UnsafePointer types must now be spelled with an explicit Pointee type:

extension UnsafePointer {
  init<U>(_ from: UnsafePointer<U>, to: Pointee.Type)
  init<U>(_ from: UnsafeMutablePointer<U>, to: Pointee.Type)
}
extension UnsafeMutablePointer {
  init<U>(_ from: UnsafeMutablePointer<U>, to: Pointee.Type)
}

Some existing conversions between UnsafePointer types do not convert Pointee types but instead coerce an UnsafePointer to an UnsafeMutablePointer. This is no longer an inferred conversion, but must be explicitly requested:

extension UnsafeMutablePointer {
  init(mutating from: UnsafePointer<Pointee>)
}
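
A short sketch of how the mutating label reads at a call site. Current Swift toolchains provide an UnsafeMutablePointer initializer with this same label, so the sketch uses it directly; the function name and the stored values are illustrative only.

```swift
// Sketch only: the Pointee type stays Int throughout; only mutability
// changes, and that change has to be requested by name.
func incrementThroughConstPointer() -> Int {
    let storage = UnsafeMutablePointer<Int>.allocate(capacity: 1)
    defer { storage.deallocate() }
    storage.initialize(to: 41)

    let readOnly = UnsafePointer(storage)                    // mutable to immutable
    let writable = UnsafeMutablePointer(mutating: readOnly)  // explicit label
    writable.pointee += 1
    return readOnly.pointee
}

print(incrementThroughConstPointer())
```

This is legal only because the underlying allocation was mutable to begin with; the label documents that the author has checked this.
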
<https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md#impact-on-existing-code>Impact on existing code

The largest impact of this change is that void* and const void* are imported as UnsafeMutableBytePointer and UnsafeBytePointer. This impacts many public APIs, but with implicit argument conversion should not affect typical uses of those APIs.

Any Swift projects that rely on type inference to convert between UnsafePointer types will need to take action. The developer needs to determine whether type punning is necessary. If so, they must migrate to the UnsafeBytePointer API. Otherwise, they can work around the new restriction by using a to: Pointee or mutating label.

Disallowing inferred UnsafePointer direct conversion requires some standard library code to use an explicit to: Pointee label for unsafe conversions that may violate strict aliasing.

All occurrences of Unsafe[Mutable]Pointer<Void> in the standard library are converted to UnsafeBytePointer. e.g. unsafeAddress() now returns UnsafeBytePointer, not UnsafePointer<Void>.

Some occurrences of Unsafe[Mutable]Pointer<Pointee> in the standard library are replaced with UnsafeBytePointer, either because the code was playing too loosely with strict aliasing rules, or because the code actually wanted to perform pointer arithmetic on byte-addresses.

StringCore.baseAddress changes from OpaquePointer to UnsafeBytePointer because it is computing byte offsets and accessing the memory. OpaquePointer is meant for bridging, but should be truly opaque; that is, nondereferenceable and not involved in address computation.
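The difference between typed and byte-wise address computation can be sketched as follows. This assumes a current Swift toolchain and uses UnsafeMutableRawPointer as a stand-in for the proposed byte pointer; it is not the UnsafeBytePointer API itself, and the variable names are hypothetical.

```swift
// Assumption: UnsafeMutableRawPointer stands in for the proposed byte pointer.
let typed = UnsafeMutablePointer<UInt32>.allocate(capacity: 2)
typed.initialize(repeating: 0, count: 2)
let raw = UnsafeMutableRawPointer(typed)

// Advancing a typed pointer by 1 moves by one element stride, in bytes.
let typedStep = raw.distance(to: UnsafeMutableRawPointer(typed + 1))

// Advancing an untyped pointer by 1 moves by exactly one byte.
let rawStep = raw.distance(to: raw + 1)

print(typedStep, rawStep)

typed.deinitialize(count: 2)
typed.deallocate()
```

Code that computes byte offsets into storage wants the second behavior, which is why a byte-addressable pointer type fits better than either a typed pointer or an opaque one.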

The StringCore implementation does a considerable amount of casting between different views of the String storage. The current implementation already demonstrates some awareness of strict aliasing rules. The rules are generally followed by ensuring that the StringBuffer only be accessed using the appropriate CodeUnit within Swift code. For interoperability and optimization, String buffers frequently need to be cast to and from CChar. This is valid as long as access to the buffer from Swift is guarded by dynamic checks of the encoding type. These unsafe, but dynamically legal conversion points will now be labeled with to: Pointee.

CoreAudio utilities now use an UnsafeBytePointer.

<https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md#implementation-status>Implementation status

On my unsafeptr_convert branch <https://github.com/atrick/swift/commits/unsafeptr_convert>, I've made most of the necessary changes to support the addition of UnsafeBytePointer and the removal of inferred UnsafePointer conversion.

There are several things going on here in order to make it possible to build the standard library with the changes:

A new UnsafeBytePointer type is defined.

The type system imports void* as UnsafeBytePointer.

The type system handles implicit conversions to UnsafeBytePointer.

UnsafeBytePointer replaces both UnsafePointer<Void> and UnsafeMutablePointer<Void> (Recent feedback suggests that UnsafeMutableBytePointer should also be introduced).

The standard library was relying on inferred UnsafePointer conversion in over 100 places. Most of these conversions now take an explicit label, such as to: Pointee or mutating. Some have been rewritten.

Several places in the standard library that were playing loosely with strict aliasing or doing bytewise pointer arithmetic now use UnsafeBytePointer instead.

Explicit labeled Unsafe[Mutable]Pointer initializers are added.

The inferred Unsafe[Mutable]Pointer conversion is removed.

TODO:

Once this proposal is accepted, and the rules for casting between pointer types have been decided, we need to finish implementing the type system support. The current implementation (intentionally) breaks a few tests in pointer_conversion.swift. We also need to ensure that interoperability requirements are met. Currently, many argument casts need to be explicitly labeled. The current implementation also makes it easy for users to hit an "ambiguous use of 'init'" error when relying on implicit argument conversion.

Additionally:

UnsafeMutableBytePointer needs to be introduced, and we need to distinguish between void* and const void* import types.

A name mangled abbreviation needs to be created for UnsafeBytePointer.

The StringAPI tests should probably be rewritten with UnsafeBytePointer.

The NSStringAPI utilities and tests may need to be ported to UnsafeBytePointer.

The CoreAudio utilities and tests may need to be ported to UnsafeBytePointer.

<https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md#alternatives-considered>Alternatives considered

<https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md#existing-workaround>Existing workaround

In some cases, developers can safely reinterpret values to achieve the same effect as type punning:

let ptrI32 = UnsafeMutablePointer<Int32>(allocatingCapacity: 1)
ptrI32[0] = Int32()
let u = unsafeBitCast(ptrI32[0], to: UInt32.self)
Note that all access to the underlying memory is performed with the same element type. This is perfectly legitimate, but simply isn't a complete solution. It also does not eliminate the inherent danger in declaring a typed pointer and expecting it to point to values of a different type.

<https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md#discarded-alternatives>Discarded alternatives

We considered adding a typePunnedMemory property to the existing Unsafe[Mutable]Pointer API. This would provide a legal way to access a potentially type punned Unsafe[Mutable]Pointer. However, it would certainly cause confusion without doing much to reduce the likelihood of programmer error. Furthermore, there are no good use cases for such a property evident in the standard library.

The opaque _RawByte struct is a technique that allows for byte-addressable buffers while hiding the dangerous side effects of type punning (a _RawByte could be loaded but its value cannot be directly inspected). UnsafePointer<_RawByte> is a clever alternative to UnsafeBytePointer. However, it doesn't do enough to prevent undefined behavior. The loaded _RawByte would naturally be accessed via unsafeBitCast, which would mislead the author into thinking that they have legally bypassed the type system. In actuality, this API blatantly violates strict aliasing. It theoretically results in undefined behavior as it stands, and may actually exhibit undefined behavior if the user recovers the loaded value.

To solve the safety problem with UnsafePointer<_RawByte>, the compiler could associate special semantics with an UnsafePointer bound to this concrete generic parameter type. Statically enforcing casting rules would be difficult if not impossible without new language features. It would also be impossible to distinguish between typed and untyped pointer APIs. For example, UnsafePointer<T>.load<U> would be a nonsensical vestige.

<https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md#alternate-proposal-for-void-type>Alternate proposal for void* type

Changing the imported type for void* will be somewhat disruptive. Furthermore, this proposal currently drops the distinction between void* and const void*--an obvious loss of API information.

We could continue to import void* as UnsafeMutablePointer<Void> and const void* as UnsafePointer<Void>, which will continue to serve as an "opaque" untyped pointer. Converting to UnsafeBytePointer would be necessary to perform pointer arithmetic or to conservatively handle possible type punning.

This alternative is much less disruptive, but we are left with two forms of untyped pointer, one of which (UnsafePointer) the type system somewhat conflates with typed pointers.

Given the current restrictions of the language, it's not clear how to statically enforce the necessary rules for casting UnsafePointer<Void> once general UnsafePointer<T> conversions are disallowed. The following conversions should be inferred, and implied for function arguments (ignoring mutability):

UnsafePointer<T> to UnsafePointer<Void>

UnsafePointer<Void> to UnsafeBytePointer

I did not implement this simpler design because my primary goal was to enforce legal pointer conversion and rid Swift code of undefined behavior. I can't do that while allowing UnsafePointer conversions.

<https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md#future-improvements>Future improvements

UnsafeBytePointer should eventually support unaligned memory access. I believe that we will eventually have a modifier that allows "packed" struct members. At that time we may also want to add a "packed" flag to UnsafeBytePointer's load and initialize methods.
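To make the alignment concern concrete, the sketch below reads a UInt32 from an odd byte address by copying its bytes into an aligned local instead of loading directly. It assumes a current Swift toolchain, with UnsafeMutableRawPointer standing in for the proposed byte pointer; roundTripUnaligned is a hypothetical helper, not part of the proposal.

```swift
// Hypothetical helper: stores a UInt32 at an odd address and reads it back
// byte-wise, which is what a "packed" load would have to do under the hood.
func roundTripUnaligned(_ value: UInt32) -> UInt32 {
    let raw = UnsafeMutableRawPointer.allocate(byteCount: 8, alignment: 4)
    defer { raw.deallocate() }

    // One byte past a 4-byte-aligned base is never 4-byte aligned.
    let unaligned = raw + 1

    // Copy the value's bytes to the unaligned address.
    withUnsafeBytes(of: value) { bytes in
        unaligned.copyMemory(from: bytes.baseAddress!, byteCount: bytes.count)
    }

    // Copy them back into an aligned local rather than loading in place.
    var result: UInt32 = 0
    withUnsafeMutableBytes(of: &result) { dest in
        dest.copyMemory(from: UnsafeRawBufferPointer(
            start: UnsafeRawPointer(unaligned), count: dest.count))
    }
    return result
}

let roundTripped = roundTripUnaligned(0x11223344)
print(roundTripped == 0x11223344)
```

A dedicated "packed" flag on load and initialize would let callers express the same intent without writing the byte copy by hand.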

<https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md#unsafebytepointer-example>UnsafeBytePointer example

/// An example of using UnsafeMutableBytePointer to implement manual memory
/// layout.

/// A Buffer for reading and writing basic types at a fixed address.
/// Indirection allows the buffer to refer to mutable state elsewhere.
struct MessageBuffer {
  let ptr: UnsafeMutableBytePointer

  enum IndirectFlag { case Direct, Indirect }

  private func getPointer(atOffset n: Int, _ isIndirect: IndirectFlag)
  -> UnsafeMutableBytePointer {
    switch isIndirect {
    case .Indirect:
      return (ptr + n).load(UnsafeMutableBytePointer.self)
    case .Direct:
      return ptr + n
    }
  }

  func readUInt32(atOffset n: Int, _ isIndirect: IndirectFlag) -> UInt32 {
    return getPointer(atOffset: n, isIndirect).load(UInt32.self)
  }
  func readFloat32(atOffset n: Int, _ isIndirect: IndirectFlag) -> Float32 {
    return getPointer(atOffset: n, isIndirect).load(Float32.self)
  }

  func writeUInt32(_ val: UInt32, atOffset n: Int) {
    getPointer(atOffset: n, .Direct).initialize(with: val)
  }
  func writeFloat32(_ val: Float32, atOffset n: Int) {
    getPointer(atOffset: n, .Direct).initialize(with: val)
  }
  func writeIndirect(_ ptr: UnsafeMutableBytePointer, atOffset n: Int) {
    getPointer(atOffset: n, .Direct).initialize(with: ptr)
  }
}

/// Encoded message format.
struct MessageFormat : Sequence, IteratorProtocol {
  typealias Element = MessageFormat

  private static let maxFormatFields = 32 / 4
  static let maxBufferBytes = maxFormatFields * strideof(UInt)

  var formatCode: UInt32 = 0
  var elementCode: UInt32 = 0
  var offset: Int = 0

  init(bitPattern: UInt32) {
    formatCode = bitPattern
  }

  enum Kind {
    case None, Reserved, UInt32, Float32, IndirectUInt32, IndirectFloat32
  }

  /// The first field's kind.
  var kind : Kind {
    get {
      switch elementCode {
      case 0x0: return Kind.None
      case 0x2: return Kind.UInt32
      case 0x3: return Kind.Float32
      case 0x6: return Kind.IndirectUInt32
      case 0x7: return Kind.IndirectFloat32
      default: return Kind.Reserved
      }
    }
  }

  func elementStride() -> Int {
    return (elementCode & 0x4) != 0 ? strideof(UInt) : 4
  }

  /// Get the format for the next element.
  mutating func next() -> Element? {
    if elementCode != 0 {
      offset += elementStride()
    }
    elementCode = formatCode & 0xF
    formatCode >>= 4
    if kind == .None {
      return nil
    }
    // align to the next element size
    let offsetMask = elementStride() - 1
    offset = (offset + offsetMask) & ~offsetMask
    return self
  }
}

func createBuffer() -> MessageBuffer {
  return MessageBuffer(ptr: UnsafeMutableBytePointer(
      allocatingBytes: MessageFormat.maxBufferBytes, alignedTo: strideof(UInt)))
}

func destroy(buffer: MessageBuffer) {
  buffer.ptr.deallocateBytes(MessageFormat.maxBufferBytes,
    alignedTo: strideof(UInt))
}

var sharedInt: UInt32 = 42
var sharedFloat: Float32 = 16.25

func generateMessage(inBuffer mb: MessageBuffer) -> MessageFormat {
  let mf = MessageFormat(bitPattern: 0x06727632)
  for field in mf {
    switch field.kind {
    case .UInt32:
      mb.writeUInt32(66, atOffset: field.offset)
    case .Float32:
      mb.writeFloat32(41.625, atOffset: field.offset)
    case .IndirectUInt32:
      mb.writeIndirect(&sharedInt, atOffset: field.offset)
    case .IndirectFloat32:
      mb.writeIndirect(&sharedFloat, atOffset: field.offset)
    case .None:
      fallthrough
    case .Reserved:
      return MessageFormat(bitPattern: 0)
    }
  }
  return mf
}

func handleMessage(buffer mb: MessageBuffer, format: MessageFormat) -> Bool {
  for field in format {
    switch field.kind {
    case .UInt32:
      print(mb.readUInt32(atOffset: field.offset, .Direct))
    case .Float32:
      print(mb.readFloat32(atOffset: field.offset, .Direct))
    case .IndirectUInt32:
      print(mb.readUInt32(atOffset: field.offset, .Indirect))
    case .IndirectFloat32:
      print(mb.readFloat32(atOffset: field.offset, .Indirect))
    case .None:
      fallthrough
    case .Reserved:
      return false
    }
  }
  return true
}

func runProgram() {
  let mb = createBuffer()
  let mf = generateMessage(inBuffer: mb)
  if handleMessage(buffer: mb, format: mf) {
    print("Done")
  }
  destroy(buffer: mb)
}
runProgram()
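As a worked check of the layout rules in the example above, the following sketch decodes the format word 0x06727632 into field codes and byte offsets using the same stride and alignment logic as MessageFormat.next(). It is written for a current Swift toolchain; decodeLayout and its pointerStride parameter are hypothetical names, and the pointer stride is assumed to be 8 bytes (a 64-bit platform).

```swift
/// Hypothetical helper mirroring MessageFormat.next(): each 4-bit code
/// describes one field, and bit 0x4 marks a pointer-sized (indirect) field.
func decodeLayout(_ bitPattern: UInt32,
                  pointerStride: Int = 8) -> [(code: UInt32, offset: Int)] {
    var formatCode = bitPattern
    var offset = 0
    var previousStride = 0
    var fields: [(code: UInt32, offset: Int)] = []
    while true {
        let code = formatCode & 0xF
        formatCode >>= 4
        if code == 0 { break }  // Kind.None terminates the message
        let stride = (code & 0x4) != 0 ? pointerStride : 4
        // Step past the previous field, then align up to this field's size.
        offset += previousStride
        offset = (offset + stride - 1) & ~(stride - 1)
        fields.append((code: code, offset: offset))
        previousStride = stride
    }
    return fields
}

let layout = decodeLayout(0x06727632)
for field in layout {
    print(field.code, field.offset)
}
```

Under the 8-byte assumption the fields land at offsets 0, 4, 8, 16, 24, 32 and 40. The jump from 28 to 32 before the second indirect Float32 is the alignment step at work, and the whole message fits in the 64 bytes that maxBufferBytes reserves on such a platform.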

···

On May 9, 2016, at 11:14 AM, Andrew Trick via swift-evolution <swift-evolution@swift.org> wrote:


(Joe Groff) #8

Andy, I think it's worth clarifying the primary purpose of this proposal. Our main goal here is to provide a legal means for "type-punning" memory access. Like C and C++, it's technically undefined behavior in Swift to cast an UnsafePointer<T> to an UnsafePointer<U> of a different type and load a value out of memory that's of a different type from what was stored there. We don't take much advantage of this yet in Swift's optimizer, since we don't have a good alternative API. UnsafeBytePointer seeks to fill this gap by providing a type that can safely do type-punned loads and stores.
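A type-punned load of the kind described here can be sketched with an untyped pointer. The example assumes a current Swift toolchain and uses UnsafeMutableRawPointer as a stand-in for the proposed UnsafeBytePointer; the storeBytes and load spellings belong to that stand-in, not to the API in this proposal.

```swift
// Assumption: UnsafeMutableRawPointer stands in for the proposed byte pointer.
let raw = UnsafeMutableRawPointer.allocate(
    byteCount: MemoryLayout<Float>.size,
    alignment: MemoryLayout<Float>.alignment)

// Store the bytes of a Float...
raw.storeBytes(of: Float(1.0), as: Float.self)

// ...and load the same bytes back as a UInt32. The pointer has no bound
// Pointee type, so the type named at each access is the only one in play.
let bits = raw.load(as: UInt32.self)
print(String(bits, radix: 16))

raw.deallocate()
```

Doing the same thing by converting an UnsafePointer<Float> to an UnsafePointer<UInt32> and reading pointee is the pattern the proposal treats as undefined.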

-Joe

···

On May 9, 2016, at 12:38 PM, Geordie Jay via swift-evolution <swift-evolution@swift.org> wrote:

I read this proposal and I'm a bit unsure what its purpose would be:

Basically you want to prevent UnsafePointer<XYZ>(UnsafePointer<Void>) conversions and/or vice-versa? And you'd achieve this by replacing UnsafePointer<Void> with UnsafeBytePointer that has no bound pointer type?

In one sense the change seems fine to me, but as someone who uses a lot of C APIs and a lot of CoreAudio/CoreMIDI in Swift already I can't really see what benefit it'd bring. Presumably we'd still want an option of converting UnsafeBytePointer to UnsafePointer<SomeActualType> for things like C function pointer callback "context"/"userInfo" uses, so it's not like we'd be preventing programmer error in that way.

Call me conservative but to me the current system seems to work as well as it can. If anything it's already enough boilerplate going through hoops converting an UnsafeMutablePointer<Void> into a [Float] even when I know and the C API knows perfectly well what it actually contains... Would happily be convinced otherwise about this proposal though, I'm pretty new at all this.

Geordie

On May 9, 2016, at 12:57 PM, Guillaume Lessard via swift-evolution <swift-evolution@swift.org> wrote:

I’m sympathetic to the elimination of UnsafePointer<Void> as general shorthand for an arbitrary pointer, but I lose the plot of this very long proposal. It seems to me that this increases API surface, yet everything I could do before, I could still do; it just involves more typing. What exactly does this make better?

Cheers,
Guillaume Lessard


(Andrew Trick) #9

I read this proposal and I'm a bit unsure what its purpose would be:

Basically you want to prevent UnsafePointer<XYZ>(UnsafePointer<Void>) conversions and/or vice-versa? And you'd achieve this by replacing UnsafePointer<Void> with UnsafeBytePointer that has no bound pointer type?

I want to prevent UnsafePointer<U>(UnsafePointer<T>) *except* when the destination is UnsafePointer<Void>.

UnsafePointer<Void>(UnsafePointer<T>) is fine.

UnsafeBytePointer provides two things:
- A means to prevent the conversion above
- An API for legal type punning, which does not exist today

In one sense the change seems fine to me, but as someone who uses a lot of C APIs and a lot of CoreAudio/CoreMIDI in Swift already I can't really see what benefit it'd bring. Presumably we'd still want an option of converting UnsafeBytePointer to UnsafePointer<SomeActualType> for things like C function pointer callback "context"/"userInfo" uses, so it's not like we'd be preventing programmer error in that way.

It’s possible to cast UnsafeBytePointer to UnsafePointer<SomeActualType>. I want the programmer to make their intent explicit by writing a cast and spelling SomeActualType at the point of the cast. In the proposal, that’s done using a labeled initializer.
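For the callback "context" case, an explicit cast that names the pointee type might look like the sketch below. It assumes a current Swift toolchain, where assumingMemoryBound(to:) plays the role of the labeled initializer discussed here; onEvent and the Int payload are hypothetical.

```swift
// Hypothetical callback, shaped like a C API that hands back a `void *` context.
func onEvent(context: UnsafeMutableRawPointer) {
    // The cast spells out the pointee type at the point of use.
    // Assumption: the memory really does hold an initialized Int.
    let counter = context.assumingMemoryBound(to: Int.self)
    counter.pointee += 1
}

let payload = UnsafeMutablePointer<Int>.allocate(capacity: 1)
payload.initialize(to: 0)

// Typed to untyped is the direction that needs no label.
onEvent(context: UnsafeMutableRawPointer(payload))
onEvent(context: UnsafeMutableRawPointer(payload))

let finalCount = payload.pointee
print(finalCount)

payload.deinitialize(count: 1)
payload.deallocate()
```

The cast does not make the access any safer by itself; it only records, at the call site, which type the author is claiming the memory holds.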

Call me conservative but to me the current system seems to work as well as it can. If anything it's already enough boilerplate going through hoops converting an UnsafeMutablePointer<Void> into a [Float] even when I know and the C API knows perfectly well what it actually contains... Would happily be convinced otherwise about this proposal though, I'm pretty new at all this.

I think you are asking for implicit conversions when calling C APIs. That’s good feedback. When implementing this proposal I tried to allow implicit conversions in reasonable cases, but leaned toward being conservative. I would rather see more explicit casts now and eliminate them if people find it awkward.

I'm looking for some consensus on core aspects of the proposal, then we can take into consideration precisely which implicit conversions should be supported.

-Andy

···

On May 9, 2016, at 12:38 PM, Geordie Jay <geojay@gmail.com> wrote:

Geordie
Andrew Trick via swift-evolution <swift-evolution@swift.org> wrote on Mon, May 9, 2016 at 20:15:
Hello Swift evolution,

I sent this to swift-dev last week. Sorry to post on two lists!

Swift does a great job of protecting against undefined behavior--as long as you avoid "unsafe" APIs, that is. However, unsafe APIs are important for giving developers control over implementation details and performance. Naturally, the contract between unsafe APIs and the optimizer is crucial. When a developer uses an unsafe API, the rules governing safe, well-defined behavior must be clear. On the opposite end, the optimizer must know which assumptions it can make based on those rules. Simply saying that anything goes because "unsafe" is in the name is not helpful to this effort.

For a long time, I've wanted these rules nailed down. We have more users taking advantage of advanced features, and more optimizations that take advantage of assumptions guided by the type system. This seems like a particularly good time to resolve UnsafePointer semantics, considering the type system and UnsafePointer work that's been going on recently. Strict aliasing is something I would like addressed. If we do nothing here, then we will end up by default inheriting C/C++ semantics, as with any language that relies on a C/C++ backend. In other words, developers will be forced to write code with technically undefined behavior and rely on the compiler to be smart enough to recognize and recover from common patterns. Or we can take advantage of this opportunity and instead adopt a sound memory model with respect to aliasing.

This proposal is only an RFC at this point. I'm sending it out now to allow for plenty of time for discussion (or advance warning). Keep in mind that it could change considerably before it goes up for review.

-Andy

UnsafeBytePointer API for In-Memory Layout

Proposal: SE-NNNN <https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md>
Author(s): Andrew Trick <https://github.com/atrick>
Status: Awaiting review <https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#rationale>
Review manager: TBD
<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#introduction>Introduction

UnsafePointer and UnsafeMutablePointer refer to a typed region of memory, and the compiler must be able to assume that the UnsafePointer element (Pointee) type is consistent with other access to the same memory. See proposed Type Safe Memory Access documentation <https://github.com/atrick/swift/blob/type-safe-mem-docs/docs/TypeSafeMemory.rst>. Consequently, inferred conversion between UnsafePointer element types exposes an easy way to abuse the type system. No alternative currently exists for manual memory layout and direct access to untyped memory, and that leads to an overuse of UnsafePointer. These uses of UnsafePointer, which depend on pointer type conversion, make accidental type punning likely. Type punning via UnsafePointer is semantically undefined behavior and de facto undefined behavior given the optimizer's long-time treatment of UnsafePointer.

In this document, all mentions of UnsafePointer also apply to UnsafeMutablePointer.

<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#motivation>Motivation

To avoid accidental type punning, we should prohibit inferred conversion between UnsafePointer<T> and UnsafePointer<U> unless the target of the conversion is an untyped or nondereferenceable pointer (currently represented as UnsafePointer<Void>).

To support this change we should introduce a new pointer type that does not bind the type of its Pointee. Such a new pointer type would provide an ideal foundation for an API that allows byte-wise pointer arithmetic and a legal, well-defined means to access an untyped region of memory.

As motivation for such an API, consider that an UnsafePointer<Void> or OpaquePointer may currently be obtained from an external API. However, the developer may know the memory layout and may want to read or write elements whose types are compatible with that layout. This is a reasonable use case, but unless the developer can guarantee that all accesses to the same memory location have the same type, then they cannot use UnsafePointer to access the memory without risking undefined behavior.

An UnsafeBytePointer example, using a new proposed API, is included below.

<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#proposed-solution>Proposed solution

Introduce an UnsafeBytePointer type along with an API for obtaining an UnsafeBytePointer value at a relative byte offset and loading and storing arbitrary types at that location.

Statically prohibit inferred UnsafePointer conversion while allowing inferred UnsafePointer to UnsafeBytePointer conversion.

UnsafeBytePointer meets multiple requirements:

An untyped pointer to memory
Pointer arithmetic within byte-addressable memory
Type-unsafe access to memory (legal type punning)
UnsafeBytePointer will replace UnsafeMutablePointer<Void> as the representation for untyped memory. For API clarity we could consider a typealias for VoidPointer. I don't think a separate VoidPointer type would be useful--there's no danger that UnsafeBytePointer will be casually dereferenced, and I don't see the danger in allowing pointer arithmetic since the only reasonable interpretation is that of a byte-addressable memory.

Providing an API for type-unsafe memory access would not serve a purpose without the ability to compute byte offsets. Of course, we could require users to convert back and forth using bitPatterns, but I think that would be awkward and only obscure the purpose of the UnsafeBytePointer type.

In this proposal, UnsafeBytePointer does not specify mutability. Adding an UnsafeMutableBytePointer would be straightforward, but adding another pointer type needs strong justification. I expect to get input from the community on this. If we agree that the imported type for const void* should be UnsafeBytePointer, then we probably need UnsafeMutableBytePointer to handle interoperability.

<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#detailed-design>Detailed design

The public API is shown here. For details and comments, see the unsafeptr_convert branch <https://github.com/atrick/swift/commits/unsafeptr_convert>.

struct UnsafeBytePointer : Hashable, _Pointer {

  let _rawValue: Builtin.RawPointer

  var hashValue: Int {...}

  init<T>(_ : UnsafePointer<T>)
  init<T>(_ : UnsafeMutablePointer<T>)
  init?<T>(_ : UnsafePointer<T>?)
  init?<T>(_ : UnsafeMutablePointer<T>?)

  init<T>(_ : OpaquePointer<T>)
  init?<T>(_ : OpaquePointer<T>?)

  init?(bitPattern: Int)
  init?(bitPattern: UInt)

  func load<T>(_ : T.Type) -> T

  @warn_unused_result
  init(allocatingBytes size: Int, alignedTo: Int)

  @warn_unused_result
  init<T>(allocatingCapacity count: Int, of: T.Type)

  func deallocateBytes(_ size: Int, alignedTo: Int)

  func deallocateCapacity<T>(_ num: Int, of: T.Type)

  // Returns a pointer one byte after the initialized memory.
  func initialize<T>(with newValue: T, count: Int = 1) -> UnsafeBytePointer

  // Returns a pointer one byte after the initialized memory.
  func initialize<T>(from: UnsafePointer<T>, count: Int) -> UnsafeBytePointer

  func initializeBackward<T>(from source: UnsafePointer<T>, count: Int)

  func deinitialize<T>(_ : T.Type, count: Int = 1)
}

extension OpaquePointer {
  init(_ : UnsafeBytePointer)
}

extension Int {
  init(bitPattern: UnsafeBytePointer)
}

extension UInt {
  init(bitPattern: UnsafeBytePointer)
}

extension UnsafeBytePointer : RandomAccessIndex {
  typealias Distance = Int

  func successor() -> UnsafeBytePointer
  func predecessor() -> UnsafeBytePointer
  func distance(to : UnsafeBytePointer) -> Int
  func advanced(by : Int) -> UnsafeBytePointer
}

func == (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Bool

func < (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Bool

func + (lhs: UnsafeBytePointer, rhs: Int) -> UnsafeBytePointer

func + (lhs: Int, rhs: UnsafeBytePointer) -> UnsafeBytePointer

func - (lhs: UnsafeBytePointer, rhs: Int) -> UnsafeBytePointer

func - (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Int

func += (lhs: inout UnsafeBytePointer, rhs: Int)

func -= (lhs: inout UnsafeBytePointer, rhs: Int)
Occasionally, we need to convert from an UnsafeBytePointer to an UnsafePointer. This should only be done in very rare circumstances when the author understands the compiler's strict type rules for UnsafePointer. Although this could be done by casting through an OpaquePointer, an explicit, designated unsafe pointer cast API would make the risks more obvious and self-documenting. For example:

extension UnsafePointer {
  init(_ from: UnsafeBytePointer, toPointee: Pointee.Type)
}
extension UnsafeMutablePointer {
  init(_ from: UnsafeBytePointer, toPointee: Pointee.Type)
}
Similarly, conversion between UnsafePointer types must now be spelled with an explicit Pointee type:

extension UnsafePointer {
  init<U>(_ from: UnsafePointer<U>, toPointee: Pointee.Type)
  init<U>(_ from: UnsafeMutablePointer<U>, toPointee: Pointee.Type)
}
extension UnsafeMutablePointer {
  init<U>(_ from: UnsafeMutablePointer<U>, toPointee: Pointee.Type)
}
<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#impact-on-existing-code>Impact on existing code

The largest impact of this change is that void* and const void* are imported as UnsafeBytePointer. This impacts many public APIs, but with implicit argument conversion should not affect typical uses of those APIs.

Any Swift projects that rely on type inference to convert between UnsafePointer types will need to take action. The developer needs to determine whether type punning is necessary. If so, they must migrate to the UnsafeBytePointer API. Otherwise, they can work around the new restriction by using a toPointee or mutating label.

Disallowing inferred UnsafePointer direct conversion requires some standard library code to use an explicit toPointee label for unsafe conversions that may violate strict aliasing.

All occurrences of Unsafe[Mutable]Pointer<Void> in the standard library are converted to UnsafeBytePointer. e.g. unsafeAddress() now returns UnsafeBytePointer, not UnsafePointer<Void>.

Some occurrences of Unsafe[Mutable]Pointer<Pointee> in the standard library are replaced with UnsafeBytePointer, either because the code was playing too loosely with strict aliasing rules, or because the code actually wanted to perform pointer arithmetic on byte-addresses.

StringCore.baseAddress changes from OpaquePointer to UnsafeBytePointer because it is computing byte offsets and accessing the memory. OpaquePointer is meant for bridging, but should be truly opaque; that is, nondereferenceable and not involved in address computation.

The StringCore implementation does a considerable amount of casting between different views of the String storage. The current implementation already demonstrates some awareness of strict aliasing rules. The rules are generally followed by ensuring that the StringBuffer only be accessed using the appropriate CodeUnit within Swift code. For interoperability and optimization, String buffers frequently need to be cast to and from CChar. This is valid as long as access to the buffer from Swift is guarded by dynamic checks of the encoding type. These unsafe, but dynamically legal conversion points will now be labeled with toPointee.

CoreAudio utilities now use an UnsafeBytePointer.

<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#implementation-status>Implementation status

On my unsafeptr_convert branch <https://github.com/atrick/swift/commits/unsafeptr_convert>, I've made most of the necessary changes to support the addition of UnsafeBytePointer and the removal of inferred UnsafePointer conversion.

There are several things going on here in order to make it possible to build the standard library with the changes:

A new UnsafeBytePointer type is defined.

The type system imports void* as UnsafeBytePointer.

The type system handles implicit conversions to UnsafeBytePointer.

UnsafeBytePointer replaces both UnsafePointer<Void> and UnsafeMutablePointer<Void>.

The standard library was relying on inferred UnsafePointer conversion in over 100 places. Most of these conversions now take an explicit label, such as 'toPointee' or 'mutating'. Some have been rewritten.

Several places in the standard library that were playing loosely with strict aliasing or doing bytewise pointer arithmetic now use UnsafeBytePointer instead.

Explicit labeled Unsafe[Mutable]Pointer initializers are added.

The inferred Unsafe[Mutable]Pointer conversion is removed.

TODO:

Once this proposal is accepted, and the rules for casting between pointer types have been decided, we need to finish implementing the type system support. The current implementation (intentionally) breaks a few tests in pointer_conversion.swift. We also need to ensure that interoperability requirements are met. Currently, many argument casts need to be explicitly labeled. The current implementation also makes it easy for users to hit an "ambiguous use of 'init'" error when relying on implicit argument conversion.

Additionally:

A name mangled abbreviation needs to be created for UnsafeBytePointer.

The StringAPI tests should probably be rewritten with UnsafeBytePointer.

The NSStringAPI utilities and tests may need to be ported to UnsafeBytePointer.

The CoreAudio utilities and tests may need to be ported to UnsafeBytePointer.

<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#alternatives-considered>Alternatives considered

<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#existing-workaround>Existing workaround

In some cases, developers can safely reinterpret values to achieve the same effect as type punning:

let ptrI32 = UnsafeMutablePointer<Int32>(allocatingCapacity: 1)
ptrI32[0] = Int32()
let u = unsafeBitCast(ptrI32[0], to: UInt32.self)
Note that all access to the underlying memory is performed with the same element type. This is perfectly legitimate, but simply isn't a complete solution. It also does not eliminate the inherent danger in declaring a typed pointer and expecting it to point to values of a different type.

<https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#discarded-alternatives>Discarded alternatives

We considered adding a typePunnedMemory property to the existing Unsafe[Mutabale]Pointer API. This would provide a legal way to access a potentially type punned Unsafe[Mutabale]Pointer. However, it would certainly cause confusion without doing much to reduce likelihood of programmer error. Furthermore, there are no good use cases for such a property evident in the standard library.

The opaque _RawByte struct is a technique that allows for byte-addressable buffers while hiding the dangerous side effects of type punning (a _RawByte could be loaded but it's value cannot be directly inspected). UnsafePointer<_RawByte> is a clever alternative to UnsafeBytePointer. However, it doesn't do enough to prevent undefined behavior. The loaded _RawByte would naturally be accessed via unsafeBitCast, which would mislead the author into thinking that they have legally bypassed the type system. In actuality, this API blatantly violates strict aliasing. It theoretically results in undefined behavior as it stands, and may actually exhibit undefined behavior if the user recovers the loaded value.

To solve the safety problem with UnsafePointer<_RawByte>, the compiler could associate special semantics with an UnsafePointer bound to this concrete generic parameter type. Statically enforcing casting rules would be difficult if not impossible without new language features. It would also be impossible to distinguish between typed and untyped pointer APIs. For example, UnsafePointer<T>.load<U> would be a nonsensical vestige.

Alternate proposal for void* type

Changing the imported type for void* will be somewhat disruptive. Furthermore, this proposal currently drops the distinction between void* and const void*--an obvious loss of API information.

We could continue to import void* as UnsafeMutablePointer<Void> and const void* as UnsafePointer<Void>, which will continue to serve as an "opaque" untyped pointer. Converting to UnsafeBytePointer would be necessary to perform pointer arithmetic or to conservatively handle possible type punning.

This alternative is much less disruptive, but we are left with two forms of untyped pointer, one of which (UnsafePointer) the type system somewhat conflates with typed pointers.

Given the current restrictions of the language, it's not clear how to statically enforce the necessary rules for casting UnsafePointer<Void> once general


(Andrew Trick) #10

I was wondering if anyone would ask for ‘assign’. It presumes that you are storing the same type of object that was previously stored in your buffer. I didn’t want to proactively support that case because it’s a convenience and not really consistent with the pointer being type punned. If you need to, you can always call ‘deinitialize’ first and then call ‘initialize’. I used ‘deinitialize’ to be consistent with UnsafeMutablePointer.
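
A rough sketch of that deinitialize-then-initialize pattern, for illustration only. It assumes a current toolchain where the standard library's `UnsafeMutableRawPointer` plays the role of the proposed UnsafeBytePointer, and `Token` is a hypothetical payload type.

```swift
// Sketch only: UnsafeMutableRawPointer stands in for the proposed UnsafeBytePointer.
final class Token {  // hypothetical payload type
  let id: Int
  init(id: Int) { self.id = id }
}

let raw = UnsafeMutableRawPointer.allocate(
  byteCount: MemoryLayout<Token>.stride,
  alignment: MemoryLayout<Token>.alignment)

// First store: the memory is uninitialized, so initialize it.
let first = raw.initializeMemory(as: Token.self, repeating: Token(id: 1), count: 1)
let firstID = first.pointee.id

// Instead of an 'assign' method: destroy the old value, then initialize again.
first.deinitialize(count: 1)
let second = raw.initializeMemory(as: Token.self, repeating: Token(id: 2), count: 1)
let secondID = second.pointee.id

second.deinitialize(count: 1)
raw.deallocate()
```

The explicit `deinitialize` is what releases the first `Token`; skipping it would leak the object when the second value overwrites it.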

-Andy

···

On May 9, 2016, at 1:20 PM, Joe Groff <jgroff@apple.com> wrote:

Regarding the UnsafeBytePointer API:

struct UnsafeBytePointer : Hashable, _Pointer {

  let _rawValue: Builtin.RawPointer

  var hashValue: Int {...}

  init<T>(_ : UnsafePointer<T>)
  init<T>(_ : UnsafeMutablePointer<T>)
  init?<T>(_ : UnsafePointer<T>?)
  init?<T>(_ : UnsafeMutablePointer<T>?)

  init<T>(_ : OpaquePointer<T>)
  init?<T>(_ : OpaquePointer<T>?)

  init?(bitPattern: Int)
  init?(bitPattern: UInt)

  func load<T>(_ : T.Type) -> T

  @warn_unused_result
  init(allocatingBytes size: Int, alignedTo: Int)

  @warn_unused_result
  init<T>(allocatingCapacity count: Int, of: T.Type)

  func deallocateBytes(_ size: Int, alignedTo: Int)

  func deallocateCapacity<T>(_ num: Int, of: T.Type)

  // Returns a pointer one byte after the initialized memory.
  func initialize<T>(with newValue: T, count: Int = 1) -> UnsafeBytePointer

  // Returns a pointer one byte after the initialized memory.
  func initialize<T>(from: UnsafePointer<T>, count: Int) -> UnsafeBytePointer

  func initializeBackward<T>(from source: UnsafePointer<T>, count: Int)

  func deinitialize<T>(_ : T.Type, count: Int = 1)
}

Should we also have 'assign' methods, matching 'initialize'? Should 'deinitialize' be called 'destroy', matching 'UnsafeMutablePointer's API?


(Andrew Trick) #11

Absolutely, that’s the main point. I’ll work on the proposal's language. But I should point out that the optimizer has taken advantage of UnsafePointer’s type for a long time. We’re only saved because when we convert UMP types, we usually pass the memory off to an external C function, which acts as a boundary to optimization.

The current proposal grew out of my auditing the standard library, attempting to weed out undefined behavior.

Also note that I initially wanted to propose a much less ambitious API that allowed type punning, but otherwise left UMP unchanged. However, I got some strong feedback early on that if converting UMP types leads to undefined behavior, then it should be prohibited in the API, unless the programmer explicitly requests the conversion. I happen to agree with that feedback. Since you and others also wanted a more complete API for manual memory layout, I saw that as one solution to both problems.

-Andy

···

On May 9, 2016, at 1:16 PM, Joe Groff <jgroff@apple.com> wrote:

Andy, I think it's worth clarifying the primary purpose of this proposal. Our main goal here is to provide a legal means for "type-punning" memory access. Like C and C++, it's technically undefined behavior in Swift to cast an UnsafePointer<T> to an UnsafePointer<U> of a different type and load a value out of memory that's of a different type from what was stored there. We don't take much advantage of this yet in Swift's optimizer, since we don't have good alternative API. UnsafeBytePointer seeks to fill this gap by providing a type that can safely do type-punned loads and stores.
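
As an illustration of a type-punned load through an untyped pointer, here is a minimal sketch. It assumes a current toolchain and uses `UnsafeMutableRawPointer` in place of the proposed UnsafeBytePointer; the choice of `Float` and `UInt32` is arbitrary.

```swift
// Sketch only: UnsafeMutableRawPointer stands in for the proposed UnsafeBytePointer.
let raw = UnsafeMutableRawPointer.allocate(
  byteCount: MemoryLayout<Float>.size,
  alignment: MemoryLayout<Float>.alignment)

// Store the bytes of a Float...
raw.storeBytes(of: Float(1.0), as: Float.self)

// ...and read the same bytes back as a UInt32. No typed pointer to
// this memory ever claims that it holds a UInt32.
let bits = raw.load(as: UInt32.self)

raw.deallocate()
```

The loaded value equals `Float(1.0).bitPattern`, and the program never forms an `UnsafePointer<UInt32>` to memory written as `Float`.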


(Xiaodi Wu) #12

Along similar lines, with the indexing model change, isn't the following
outdated?


extension UnsafeBytePointer : RandomAccessIndex {
  typealias Distance = Int

  func successor() -> UnsafeBytePointer
  func predecessor() -> UnsafeBytePointer
  func distance(to : UnsafeBytePointer) -> Int
  func advanced(by : Int) -> UnsafeBytePointer
}
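
For comparison, a sketch of byte-wise pointer arithmetic without any index protocol. It assumes a current toolchain, where `UnsafeMutableRawPointer` (used here in place of UnsafeBytePointer) provides `advanced(by:)` and `distance(to:)` through `Strideable`.

```swift
// Sketch only: UnsafeMutableRawPointer stands in for the proposed UnsafeBytePointer.
let base = UnsafeMutableRawPointer.allocate(byteCount: 16, alignment: 8)

// Offsets are always measured in bytes for an untyped pointer.
let later = base.advanced(by: 12)
let byteDistance = base.distance(to: later)
let viaOperator = base + 12

base.deallocate()
```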



(Geordie J) #13

So what's in it for us as Swift devs?

It may be technically undefined behaviour (by that I think you mean there's
no real knowing what could happen), but it seems to be rampant throughout
pretty much all the C code I've come in contact with (I'm less familiar
with C++).

If we lose type information by calling a C API that takes a void pointer,
how can we hope to retrieve it in any safe way, other than saying "we
assume with good reason and hope to hell that this is what we say it is".
And if we can't do that, what advantage does this proposal provide over
what we already have?

···

Joe Groff <jgroff@apple.com> wrote on Mon, May 9, 2016 at 22:16:

> On May 9, 2016, at 12:38 PM, Geordie Jay via swift-evolution <swift-evolution@swift.org> wrote:
>
> I read this proposal and I'm a bit unsure what its purpose would be:
>
> Basically you want to prevent UnsafePointer<XYZ>(UnsafePointer<Void>) conversions and/or vice-versa? And you'd achieve this by replacing UnsafePointer<Void> with UnsafeBytePointer that has no bound pointer type?
>
> In one sense the change seems fine to me, but as someone who uses a lot of C APIs and a lot of CoreAudio/CoreMIDI in Swift already I can't really see what benefit it'd bring. Presumably we'd still want an option of converting UnsafeBytePointer to UnsafePointer<SomeActualType> for things like C function pointer callback "context"/"userInfo" uses, so it's not like we'd be preventing programmer error in that way.
>
> Call me conservative but to me the current system seems to work as well as it can. If anything it's already enough boilerplate going through hoops converting an UnsafeMutablePointer<Void> into a [Float] even when I know and the C API knows perfectly well what it actually contains... Would happily be convinced otherwise about this proposal though, I'm pretty new at all this.
>
> Geordie

> On May 9, 2016, at 12:57 PM, Guillaume Lessard via swift-evolution <swift-evolution@swift.org> wrote:
>
> I’m sympathetic to the elimination of UnsafePointer<Void> as general shorthand for an arbitrary pointer, but I lose the plot of this very long proposal. It seems to me that this increases API surface, yet everything I could do before, I could still do; it just involves more typing. What exactly does this make better?
>
> Cheers,
> Guillaume Lessard

Andy, I think it's worth clarifying the primary purpose of this proposal.
Our main goal here is to provide a legal means for "type-punning" memory
access. Like C and C++, it's technically undefined behavior in Swift to
cast an UnsafePointer<T> to an UnsafePointer<U> of a different type and
load a value out of memory that's of a different type from what was stored
there. We don't take much advantage of this yet in Swift's optimizer, since
we don't have good alternative API. UnsafeBytePointer seeks to fill this
gap by providing a type that can safely do type-punned loads and stores.

-Joe


(Geordie J) #14

I read this proposal and I'm a bit unsure what its purpose would be:

Basically you want to prevent UnsafePointer<XYZ>(UnsafePointer<Void>) conversions and/or vice-versa? And you'd achieve this by replacing UnsafePointer<Void> with UnsafeBytePointer that has no bound pointer type?

I want to prevent UnsafePointer<U>(UnsafePointer<T>) *except* when the destination is UnsafePointer<Void>.

UnsafePointer<Void>(UnsafePointer<T>) is fine.

UnsafeBytePointer provides two things:
- A means to prevent the conversion above
- An API for legal type punning, which does not exist today
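
To make the distinction concrete, here is a small sketch under the assumption of a current toolchain, with `UnsafeRawPointer` standing in for UnsafeBytePointer. Converting a typed pointer to the untyped form needs no label, and reads of a different type then go through the untyped pointer instead of through a re-typed `UnsafePointer`.

```swift
// Sketch only: UnsafeRawPointer stands in for the proposed UnsafeBytePointer.
var value: UInt32 = 0x0102_0304

let byteSum = withUnsafePointer(to: &value) { typed -> Int in
  // Typed -> untyped conversion is always allowed.
  let raw = UnsafeRawPointer(typed)
  // Read the four bytes through the untyped view. Summing them keeps the
  // result independent of the platform's byte order.
  return (0..<4).reduce(0) { $0 + Int(raw.load(fromByteOffset: $1, as: UInt8.self)) }
}
```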

So you mean to enable UnsafePointer<Void> aka. UnsafeBytePointer(UnsafePointer<T>), but disable other type-to-type pointer recasts? I guess that’s a worthy goal at some level, but is there anything stopping someone just saying UnsafePointer(UnsafeBytePointer(myPointerToMemoryContainingTypeT), toPointee: U.type)?

It still just seems like we can do the same thing spelled differently. I don’t see how changing how that happens could benefit us or the compiler, but maybe this is one we should just take your word on.

Assuming the likely case that this is just beyond my understanding, I do wonder why we’d need to change the API. I guess there are a lot of assumptions made about both UnsafePointer<Void> and UnsafePointer<T> that don’t necessarily apply to both to an equal degree?

In one sense the change seems fine to me, but as someone who uses a lot of C APIs and a lot of CoreAudio/CoreMIDI in Swift already I can't really see what benefit it'd bring. Presumably we'd still want an option of converting UnsafeBytePointer to UnsafePointer<SomeActualType> for things like C function pointer callback "context"/"userInfo" uses, so it's not like we'd be preventing programmer error in that way.

It’s possible to cast UnsafeBytePointer to UnsafePointer<SomeActualType>. I want the programmer to make their intent explicit by writing a cast and spelling SomeActualType at the point of the cast. In the proposal, that’s done using a labeled initializer.
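
A sketch of what spelling the type at the point of the cast can look like for the callback "context" use case. This assumes a current toolchain, where `assumingMemoryBound(to:)` on `UnsafeMutableRawPointer` fills the role of the labeled initializer in the proposal; `callback` is a hypothetical function shaped like a C API callback.

```swift
// Sketch only: UnsafeMutableRawPointer stands in for the proposed UnsafeBytePointer.
// 'callback' is hypothetical; it mimics a C callback that receives an untyped context.
func callback(context: UnsafeMutableRawPointer) -> Int {
  // The pointee type is written out exactly where the untyped pointer
  // becomes typed, so the assumption is visible to the reader.
  let typed = context.assumingMemoryBound(to: Int.self)
  return typed.pointee
}

let storage = UnsafeMutablePointer<Int>.allocate(capacity: 1)
storage.initialize(to: 42)

let result = callback(context: UnsafeMutableRawPointer(storage))

storage.deinitialize(count: 1)
storage.deallocate()
```

This stays within the rules because the memory really does hold an `Int`; the cast only recovers type information that was dropped at the API boundary.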

How is this different from what we do now, namely UnsafePointer<SomeActualType>(myUnsafePointer) <— I’m also spelling out SomeActualType there. I think I’m still misunderstanding something critical here.

From your email that just came in:

if converting UMP types leads to undefined behavior, then it should be prohibited in the API, unless the programmer explicitly requests the conversion

This is the point I’d really like to try and understand: can you clarify how the new API is any more or less explicit than the old one?

Call me conservative but to me the current system seems to work as well as it can. If anything it's already enough boilerplate going through hoops converting an UnsafeMutablePointer<Void> into a [Float] even when I know and the C API knows perfectly well what it actually contains... Would happily be convinced otherwise about this proposal though, I'm pretty new at all this.

I think you are asking for implicit conversions when calling C APIs. That’s good feedback. When implementing this proposal I tried to allow implicit conversions in reasonable cases, but leaned toward being conservative. I would rather see more explicit casts now and eliminate them if people find it awkward.

Maybe, but I’m not sure how that’d look under this proposal. I mean Strings and literals currently being accepted as UnsafePointer<CChar> is a nice touch, and last I checked I can use [T, T, T, ...] array literals in place of UnsafePointer<T>, I certainly wouldn’t want to go below that level of conservatism here.
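
For reference, a sketch of the two implicit argument conversions mentioned here, as they behave on a current toolchain. `firstByte` and `sum3` are hypothetical functions standing in for imported C APIs.

```swift
// Sketch only: hypothetical stand-ins for C functions taking typed pointers.
func firstByte(_ p: UnsafePointer<CChar>) -> CChar { return p[0] }
func sum3(_ p: UnsafePointer<Int>) -> Int { return p[0] + p[1] + p[2] }

// A string literal is passed as a pointer to its null-terminated UTF-8 bytes.
let b = firstByte("A")

// An array literal is passed as a pointer to its elements.
let s = sum3([1, 2, 3])
```

In both calls the pointer is only valid for the duration of the call.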

···

On May 9, 2016, at 23:04, Andrew Trick <atrick@apple.com> wrote:

On May 9, 2016, at 12:38 PM, Geordie Jay <geojay@gmail.com> wrote:

I'm looking for some consensus on core aspects of the proposal, then we can take into consideration precisely which implicit conversions should be supported.

-Andy

Geordie


(Andrew Trick) #15

Future improvements

UnsafeBytePointer should eventually support unaligned memory access. I believe that we will eventually have a modifier that allows "packed" struct members. At that time we may also want to add a "packed" flag to UnsafeBytePointer's load and initialize methods.

We should probably call out the fact that `load` and `initialize` require alignment in the meantime.

Will do.
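
To illustrate the alignment requirement, here is a sketch assuming a current toolchain with `UnsafeRawPointer` in place of UnsafeBytePointer. `readUInt32` is a hypothetical helper: it does a typed load only when the address is suitably aligned and otherwise copies the bytes into an aligned local.

```swift
// Sketch only: UnsafeRawPointer stands in for the proposed UnsafeBytePointer.
// readUInt32 is a hypothetical helper, not part of any proposal.
func readUInt32(_ p: UnsafeRawPointer) -> UInt32 {
  if Int(bitPattern: p) % MemoryLayout<UInt32>.alignment == 0 {
    // Aligned address: a direct typed load is fine.
    return p.load(as: UInt32.self)
  }
  // Misaligned address: copy the bytes into an aligned local instead.
  var value: UInt32 = 0
  withUnsafeMutableBytes(of: &value) { dst in
    dst.copyMemory(from: UnsafeRawBufferPointer(start: p, count: dst.count))
  }
  return value
}

let base = UnsafeMutableRawPointer.allocate(byteCount: 12, alignment: 4)
let original: UInt32 = 0xDEAD_BEEF

// Place one copy at a misaligned offset (bytes 1...4).
withUnsafeBytes(of: original) { src in
  (base + 1).copyMemory(from: src.baseAddress!, byteCount: src.count)
}
// Place another copy at an aligned offset (bytes 8...11).
base.storeBytes(of: original, toByteOffset: 8, as: UInt32.self)

let unalignedValue = readUInt32(UnsafeRawPointer(base + 1))
let alignedValue = readUInt32(UnsafeRawPointer(base + 8))
base.deallocate()
```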

When accessing a memory buffer, it is generally convenient to cast to a type with known layout and compute offsets relative to the type's size. This is how UnsafePointer<Pointee> works. A generic UnsafeTypePunnedPointer<Pointee> could be introduced with the same interface as UnsafePointer<Pointee>, but without the strict aliasing requirements. This seems like an overdesign simply to avoid calling strideof() in a rare use case, but nothing prevents adding this type later.

This need could also be addressed with some additional convenience methods on UnsafeBytePointer to load or store at a given index, something like:

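  func load<T>(asArrayOf type: T.Type, at index: Int) -> T {
    // Step by whole elements of T, then do a typed load at that byte offset.
    return (self + strideof(T) * index).load(T)
  }
  func initialize<T>(asArrayOf type: T.Type, initialValue: T, at index: Int) {
    // Same offset computation; the 'with:' label follows the proposed initialize signature.
    (self + strideof(T) * index).initialize(with: initialValue)
  }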

Yep. I like that.
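
A runnable approximation of those convenience methods, assuming a current toolchain: `UnsafeMutableRawPointer` stands in for UnsafeBytePointer, `MemoryLayout<T>.stride` replaces `strideof`, and the two methods are hypothetical extensions rather than standard library API.

```swift
// Sketch only: hypothetical helpers on UnsafeMutableRawPointer,
// which stands in for the proposed UnsafeBytePointer.
extension UnsafeMutableRawPointer {
  func load<T>(asArrayOf type: T.Type, at index: Int) -> T {
    return (self + MemoryLayout<T>.stride * index).load(as: T.self)
  }
  func initialize<T>(asArrayOf type: T.Type, initialValue: T, at index: Int) {
    _ = (self + MemoryLayout<T>.stride * index)
      .initializeMemory(as: T.self, repeating: initialValue, count: 1)
  }
}

let buffer = UnsafeMutableRawPointer.allocate(
  byteCount: MemoryLayout<Int32>.stride * 3,
  alignment: MemoryLayout<Int32>.alignment)

// Fill three Int32 slots by index, then read the last one back.
for i in 0..<3 {
  buffer.initialize(asArrayOf: Int32.self, initialValue: Int32(i * 10), at: i)
}
let third = buffer.load(asArrayOf: Int32.self, at: 2)

buffer.deallocate()
```

The caller works in element indices while the pointer itself stays untyped, which is the convenience being asked for.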

-Andy

···

On May 12, 2016, at 8:41 AM, Joe Groff <jgroff@apple.com> wrote:


(Austin Zheng) #16

I think this is a good change with an admirable premise: that unsafety should not necessarily be a binary proposition - either avoid it completely, or use it and silently run into all sorts of potential UB pitfalls. (This is compounded by the problem that exactly what UB is is poorly understood - for example, how many engineers working with C assume that signed integer overflow must wrap? What happens when unsigned integers overflow?)

While it is not possible (or desirable) to protect developers from making any possible mistake, it would be great if Swift eventually reached a state where there were only a few straightforward ways to use the APIs to produce UB, and those few ways could be learned by developers wishing to work with the unsafe APIs.

Someone with more experience working with C APIs or raw memory will have to comment on the ergonomics of the change, and whether or not it introduces any unforeseen problems therein.

Austin

P.S. On an unrelated note, it might be better to host a proposal in a Gist or elsewhere; the first time I sent this message the mailing list software caused it to bounce. I suspect the same might have happened to other people's responses.

···

On May 19, 2016, at 12:08 AM, Andrew Trick via swift-evolution <swift-evolution@swift.org> wrote:

Hello Swift evolution,

I'm sending this proposal out again for another round of RFC. The first round did not get much specific feedback, and nothing has fundamentally changed. In this updated version I beefed up the explanation a bit and clarified the language.

-Andy

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Chris Lattner) #17

Hi Andy,

I think this is a reasonable proposal. It seems like the real win here is to be able to define TBAA rules for Unsafe[Mutable]Pointer references, instead of having to treat them *all* conservatively (something I’m generally supportive of). A few questions/observations:

- It seems like the proposal should include a discussion about that, because that’s a pretty substantial change to the programming model.

- Does TBAA for these accesses actually produce better performance in practice on any existing known use cases?

- Would it be possible for tools like UBSAN to catch violations of this? I’m not familiar with what ubsan does for C TBAA violations (if anything).

- It isn’t clear to me why it is important to change how “void*” is imported. Since you can’t dereference an UnsafePointer<Void> anyway, why does it matter for this proposal?

-Chris



(Russ Bishop) #18

UnsafeBytePointer API for In-Memory Layout

UnsafePointer and UnsafeMutablePointer refer to a typed region of memory, and the compiler must be able to assume that the UnsafePointer element (Pointee) type is consistent with other access to the same memory. See proposed Type Safe Memory Access documentation <https://github.com/atrick/swift/blob/type-safe-mem-docs/docs/TypeSafeMemory.rst>. Consequently, conversion between UnsafePointer element types exposes an easy way to abuse the type system.

I don’t necessarily disagree with the proposal but I think we should clearly answer the following question:

Why doesn’t UnsafePointer<T>(_: UnsafePointer<U>) read as UnsafePointer<T>(_: UnsafePointer<Void>)? That is to say, you can only “type pun” through a Void pointer. A convenience method could be offered, something like UnsafePointer.reinterpretBytes<U>(_ ptr: UnsafePointer<U>, as: U.Type) -> U, so all valid cases of type punning can be explicit.
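To show the shape such a convenience could take, here is one possible reading of it as a free function. The two generic parameters, the preconditions, and the use of the current standard library's `UnsafeRawPointer` are all assumptions made for illustration; none of this is part of the proposal.

```swift
// Hypothetical helper: reinterpret the bytes behind a typed pointer as a U.
// Only meaningful for trivial (plain-data) U.
func reinterpretBytes<T, U>(_ ptr: UnsafePointer<T>, as type: U.Type) -> U {
    // Stay within the pointee's storage and respect U's alignment.
    precondition(MemoryLayout<U>.size <= MemoryLayout<T>.size)
    precondition(MemoryLayout<U>.alignment <= MemoryLayout<T>.alignment)
    // The pun goes through an untyped pointer, so it is explicit at the call site.
    return UnsafeRawPointer(ptr).load(as: U.self)
}

let pi: Float = 3.14159
let bits = withUnsafePointer(to: pi) { reinterpretBytes($0, as: UInt32.self) }
```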

As motivation for such an API, consider that an UnsafePointer<Void> or OpaquePointer may currently be obtained from an external API. However, the developer may know the memory layout and may want to read or write elements whose types are compatible with that layout. This is a reasonable use case, but unless the developer can guarantee that all accesses to the same memory location have the same type, then they cannot use UnsafePointer to access the memory without risking undefined behavior.

IMHO if we had a @packed attribute a lot of this nonsense could be made explicit by defining a Swift struct that had the appropriate memory layout. This is how a lot of “PInvoke” stuff was done in the C# world. It also gives you an “out” if you need a very specific layout in memory for some other reason.

Just as with unsafeBitCast, although the destination of the cast can usually be inferred, we want the developer to explicitly state the intended destination type, both because type inference can be surprising, and because it's important to the reader for code comprehension.

I’d definitely prefer a labelled initializer, especially one with an uncommon name. IMHO it should immediately stand out in code reviews.

Note: For API clarity we could consider a typealias for VoidPointer. A separate VoidPointer type would not be very useful--there's no danger that UnsafeBytePointer will be casually dereferenced, and no danger in allowing pointer arithmetic since the only reasonable interpretation is that of a byte-addressable memory.

Agreed; even today messing with UnsafeMutablePointer<Void> requires you to understand that the size corresponds to bytes which is not intuitive.

Loading from and storing to memory via an Unsafe[Mutable]BytePointer is safe independent of the type of value being loaded or stored and independent of the memory's allocated type as long as layout guarantees are met (per the ABI). This allows legal type punning within Swift and allows Swift code to access a common region of memory that may be shared across an external interface that does not provide type safety guarantees. Accessing type punned memory directly through a designated Unsafe[Mutable]BytePointer type provides a sound basis for compiler implementation of strict aliasing. This is in contrast with the approach of simply providing a special unsafe pointer cast operation for bypassing type safety, which cannot be reliably implemented.

I’m not sure how to word it but I feel like some of this might help if it were included at the very beginning so people understand why this is a problem. I also think the stdlib docs should have a lot more to say about the rules, undefined behavior, and the consequences thereof. That will be all that a lot of developers ever bother to learn on the subject (a shame but out of scope for a swift evolution proposal :) )

Russ


(Joe Groff) #19

So what's in it for us as Swift devs?

It may be technically undefined behaviour (by that I think you mean there's no real knowing what could happen), but it seems to be rampant throughout pretty much all the C code I've come in contact with (I'm less familiar with C++).

Undefined behavior means that the compiler can optimize as if it couldn't happen. For example, in this C code:

  int foo(int *x, float *y) {
    *x = 2;
    *y = 3.0;
    return *x;
  }

the compiler will likely optimize 'foo' to always return 2, since it's allowed to assume its pointer parameters x and y are different types so don't alias. If code calls `foo` with aliasing pointers such as `foo(&x, (float*)&x)`, it'll break.

If we lose type information by calling a C API that takes a void pointer, how can we hope to retrieve it in any safe way, other than saying "we assume with good reason and hope to hell that this is what we say it is"?

This doesn't change anything in that respect. The aliasing rules in C and Swift refer to the type of value that's dynamically stored in memory, not the static type of a pointer. It's legal to cast a pointer from T* to void* and back to T*, and load a T from the resulting pointer, so long as a T value resides in the referenced memory at the time the load occurs.
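As a small illustration of that round trip, here is a sketch of a C-style "context" callback. It uses `UnsafeMutableRawPointer` and `assumingMemoryBound(to:)` from the current standard library as an assumed stand-in for the proposed API, and the `callback` function is hypothetical.

```swift
// The callback only sees an untyped pointer and must know, out of band,
// that an Int currently lives at that address.
func callback(context: UnsafeMutableRawPointer) -> Int {
    let typed = context.assumingMemoryBound(to: Int.self)
    typed.pointee += 1
    return typed.pointee
}

let storage = UnsafeMutablePointer<Int>.allocate(capacity: 1)
storage.initialize(to: 41)

// Int pointer -> untyped pointer -> Int pointer: fine, because the value
// stored in the memory really is an Int when it is accessed.
let result = callback(context: UnsafeMutableRawPointer(storage))

storage.deinitialize(count: 1)
storage.deallocate()
```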

And if we can't do that, what advantage does this proposal provide over what we already have?

This API gives you a way to legally perform pointer type punning, when you do want to reinterpret memory as a different type. In C and C++ the only standard way to do so is to `memcpy`.
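Here is a sketch of what that looks like for the `foo` example above, rewritten so that both stores and the final load go through an untyped pointer. It assumes `UnsafeMutableRawPointer` from the current standard library in place of the proposed UnsafeBytePointer.

```swift
let raw = UnsafeMutableRawPointer.allocate(byteCount: 4, alignment: 4)

raw.storeBytes(of: 2 as Int32, as: Int32.self)    // like *x = 2
raw.storeBytes(of: 3.0 as Float, as: Float.self)  // like *y = 3.0, same address

// Because the accesses are untyped, the compiler may not assume the two
// stores are unrelated; this load observes the Float's bit pattern.
let reread = raw.load(as: Int32.self)

raw.deallocate()
```

Unlike the C version, the result here does not depend on what the optimizer assumes about aliasing.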

-Joe

···

On May 9, 2016, at 1:25 PM, Geordie Jay <geojay@gmail.com> wrote:

Joe Groff <jgroff@apple.com> wrote on Mon, May 9, 2016 at 22:16:
> On May 9, 2016, at 12:38 PM, Geordie Jay via swift-evolution <swift-evolution@swift.org> wrote:
>
> I read this proposal and I'm a bit unsure what its purpose would be:
>
> Basically you want to prevent UnsafePointer<XYZ>(UnsafePointer<Void>) conversions and/or vice-versa? And you'd achieve this by replacing UnsafePointer<Void> with UnsafeBytePointer that has no bound pointer type?
>
> In one sense the change seems fine to me, but as someone who uses a lot of C APIs and a lot of CoreAudio/CoreMIDI in Swift already I can't really see what benefit it'd bring. Presumably we'd still want an option of converting UnsafeBytePointer to UnsafePointer<SomeActualType> for things like C function pointer callback "context"/"userInfo" uses, so it's not like we'd be preventing programmer error in that way.
>
> Call me conservative but to me the current system seems to work as well as it can. If anything it's already enough boilerplate going through hoops converting an UnsafeMutablePointer<Void> into a [Float] even when I know and the C API knows perfectly well what it actually contains... Would happily be convinced otherwise about this proposal though, I'm pretty new at all this.
>
> Geordie

> On May 9, 2016, at 12:57 PM, Guillaume Lessard via swift-evolution <swift-evolution@swift.org> wrote:
>
> I’m sympathetic to the elimination of UnsafePointer<Void> as general shorthand for an arbitrary pointer, but I lose the plot of this very long proposal. It seems to me that this increases API surface, yet everything I could do before, I could still do; it just involves more typing. What exactly does this make better?
>
> Cheers,
> Guillaume Lessard

Andy, I think it's worth clarifying the primary purpose of this proposal. Our main goal here is to provide a legal means for "type-punning" memory access. As in C and C++, it's technically undefined behavior in Swift to cast an UnsafePointer<T> to an UnsafePointer<U> of a different type and load a value out of memory that's of a different type from what was stored there. We don't take much advantage of this yet in Swift's optimizer, since we don't have a good alternative API. UnsafeBytePointer seeks to fill this gap by providing a type that can safely do type-punned loads and stores.

-Joe


(Andrew Trick) #20

Along similar lines, with the indexing model change, isn't the following outdated?

Yes. Thanks. I’m working on updating both the proposal and implementation.
-Andy

···

On May 9, 2016, at 1:23 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

extension UnsafeBytePointer : RandomAccessIndex {
  typealias Distance = Int

  func successor() -> UnsafeBytePointer
  func predecessor() -> UnsafeBytePointer
  func distance(to : UnsafeBytePointer) -> Int
  func advanced(by : Int) -> UnsafeBytePointer
}

On Mon, May 9, 2016 at 3:20 PM, Joe Groff via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
Regarding the UnsafeBytePointer API:

struct UnsafeBytePointer : Hashable, _Pointer {

  let _rawValue: Builtin.RawPointer

  var hashValue: Int {...}

  init<T>(_ : UnsafePointer<T>)
  init<T>(_ : UnsafeMutablePointer<T>)
  init?<T>(_ : UnsafePointer<T>?)
  init?<T>(_ : UnsafeMutablePointer<T>?)

  init(_ : OpaquePointer)
  init?(_ : OpaquePointer?)

  init?(bitPattern: Int)
  init?(bitPattern: UInt)

  func load<T>(_ : T.Type) -> T

  @warn_unused_result
  init(allocatingBytes size: Int, alignedTo: Int)

  @warn_unused_result
  init<T>(allocatingCapacity count: Int, of: T.Type)

  func deallocateBytes(_ size: Int, alignedTo: Int)

  func deallocateCapacity<T>(_ num: Int, of: T.Type)

  // Returns a pointer one byte after the initialized memory.
  func initialize<T>(with newValue: T, count: Int = 1) -> UnsafeBytePointer

  // Returns a pointer one byte after the initialized memory.
  func initialize<T>(from: UnsafePointer<T>, count: Int) -> UnsafeBytePointer

  func initializeBackward<T>(from source: UnsafePointer<T>, count: Int)

  func deinitialize<T>(_ : T.Type, count: Int = 1)
}

Should we also have 'assign' methods, matching 'initialize'? Should 'deinitialize' be called 'destroy', matching 'UnsafeMutablePointer's API?
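For reference, here is a sketch of the allocate / initialize / load / deinitialize / deallocate sequence that the listing implies, including the difference between initializing and assigning. It is written against `UnsafeMutableRawPointer` in the current standard library, which is only an assumed approximation of the API quoted above.

```swift
let count = 3
let stride = MemoryLayout<String>.stride
let raw = UnsafeMutableRawPointer.allocate(
    byteCount: stride * count,
    alignment: MemoryLayout<String>.alignment)

// initialize: the memory is uninitialized, so no old value is destroyed.
let typed = raw.initializeMemory(as: String.self, repeating: "a", count: count)

// assign: the memory already holds a String, so the old value is released first.
typed[1] = "b"

// Load the second element back through the untyped pointer (this copies it).
let second = raw.load(fromByteOffset: stride, as: String.self)

// deinitialize ("destroy" in the UnsafeMutablePointer API of the time)
// must run before the memory is freed, because String is not trivial.
typed.deinitialize(count: count)
raw.deallocate()
```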

-Joe
