[RFC] UnsafeBytePointer API for In-Memory Layout

Joe_Groff · May 9, 2016, 9:23pm

Regarding the UnsafeBytePointer API:

struct UnsafeBytePointer : Hashable, _Pointer {

 let _rawValue: Builtin.RawPointer

 var hashValue: Int {...}

 init<T>(_ : UnsafePointer<T>)
 init<T>(_ : UnsafeMutablePointer<T>)
 init?<T>(_ : UnsafePointer<T>?)
 init?<T>(_ : UnsafeMutablePointer<T>?)

 init<T>(_ : OpaquePointer<T>)
 init?<T>(_ : OpaquePointer<T>?)

 init?(bitPattern: Int)
 init?(bitPattern: UInt)

 func load<T>(_ : T.Type) -> T

 @warn_unused_result
 init(allocatingBytes size: Int, alignedTo: Int)

 @warn_unused_result
 init<T>(allocatingCapacity count: Int, of: T.Type)

 func deallocateBytes(_ size: Int, alignedTo: Int)

 func deallocateCapacity<T>(_ num: Int, of: T.Type)

 // Returns a pointer one byte after the initialized memory.
 func initialize<T>(with newValue: T, count: Int = 1) -> UnsafeBytePointer

 // Returns a pointer one byte after the initialized memory.
 func initialize<T>(from: UnsafePointer<T>, count: Int) -> UnsafeBytePointer

 func initializeBackward<T>(from source: UnsafePointer<T>, count: Int)

 func deinitialize<T>(_ : T.Type, count: Int = 1)
}

Should we also have 'assign' methods, matching 'initialize'? Should 'deinitialize' be called 'destroy', matching 'UnsafeMutablePointer's API?

I was wondering if anyone would ask for ‘assign’. It presumes that you are storing the same type of object that was previously stored in your buffer. I didn’t want to proactively support that case because it’s a convenience and not really consistent with the pointer being type punned. You can always call deinitialize() first, if you need to, before calling ‘initialize’.

I see. I guess it makes sense that, once you've initialized as the new type, you can cast to UMP and type-safe-ly reassign via that interface.

I used ‘deinitialize’ to be consistent with UnsafeMutablePointer.

My mistake, I hadn't noticed that 'destroy' was renamed there too.

-Joe


On May 9, 2016, at 2:18 PM, Andrew Trick <atrick@apple.com> wrote:

On May 9, 2016, at 1:20 PM, Joe Groff <jgroff@apple.com> wrote:

Geordie_J · May 9, 2016, 8:58pm

Thanks for your patient clarification Joe.

My understanding was that type punning == your example with T* -> Void* ->
T* -> T. Assuming it's not, I now imagine you're talking about
reinterpreting the layout of C structs and the like for some horrifically
beautiful optimisation or low-level trick purpose, which sounds nice but is
way beyond my level of understanding or needs.

I'll take your word for it that the example with aliasing pointers is
something that might actually happen, as it stands it just looks like an
unfortunate programmer error, not sure if that was your point (to catch
that kind of thing before it happens).


Joe Groff <jgroff@apple.com> wrote on Mon, May 9, 2016 at 22:48:
> On May 9, 2016, at 1:25 PM, Geordie Jay <geojay@gmail.com> wrote:

>
> So what's in it for us as Swift devs?
>
> It may be technically undefined behaviour (by that I think you mean
there's no real knowing what could happen), but it seems to be rampant
throughout pretty much all the C code I've come in contact with (I'm less
familiar with C++).

Undefined behavior means that the compiler can optimize as if it couldn't
happen. For example, in this C code:

 int foo(int *x, float *y) {
 *x = 2;
 *y = 3.0;
 return *x;
 }

the compiler will likely optimize 'foo' to always return 2, since it's
allowed to assume that its pointer parameters x and y, which point to
different types, don't alias. If code calls `foo` with aliasing pointers
such as `foo(&x, (float*)&x)`, it'll break.

> If we lose type information by calling a C API that takes a void
pointer, how can we hope to retrieve it in any safe way, other than saying
"we assume with good reason and hope to hell that this is what we say it
is".

This doesn't change anything in that respect. The aliasing rules in C and
Swift refer to the type of value that's dynamically stored in memory, not
the static type of a pointer. It's legal to cast a pointer from T* to void*
and back to T*, and load a T from the resulting pointer, so long as a T
value resides in the referenced memory at the time the load occurs.

> And if we can't do that, what advantage does this proposal provide over
what we already have?

This API gives you a way to legally perform pointer type punning, when you
do want to reinterpret memory as a different type. In C and C++ the only
standard way to do so is to `memcpy`.
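
As a rough illustration of what a legal type-punned load looks like, here is a minimal sketch. It assumes the present-day standard library's UnsafeRawBufferPointer as a stand-in for the proposed UnsafeBytePointer, which is not available to compile against; the helper name bitsOfFloat is hypothetical.

```swift
// Sketch only: UnsafeRawBufferPointer stands in for the proposed
// UnsafeBytePointer. `bitsOfFloat` is a hypothetical helper name.
func bitsOfFloat(_ value: Float) -> UInt32 {
    // withUnsafeBytes exposes the storage of `value` as untyped bytes.
    return withUnsafeBytes(of: value) { rawBuffer in
        // A raw load reinterprets the bytes without claiming that the
        // memory holds a UInt32, so no typed-pointer rule is violated.
        return rawBuffer.load(as: UInt32.self)
    }
}

let bits = bitsOfFloat(3.0)
```

The load goes through an untyped view of the memory, which is the role `memcpy` plays in C.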

-Joe

> Joe Groff <jgroff@apple.com> wrote on Mon, May 9, 2016 at 22:16:
> > On May 9, 2016, at 12:38 PM, Geordie Jay via swift-evolution <swift-evolution@swift.org> wrote:
> >
> > I read this proposal and I'm a bit unsure what its purpose would be:
> >
> > Basically you want to prevent UnsafePointer<XYZ>(UnsafePointer<Void>)
conversions and/or vice-versa? And you'd achieve this by replacing
UnsafePointer<Void> with UnsafeBytePointer that has no bound pointer type?
> >
> > In one sense the change seems fine to me, but as someone who uses a
lot of C APIs and a lot of CoreAudio/CoreMIDI in Swift already I can't
really see what benefit it'd bring. Presumably we'd still want an option of
converting UnsafeBytePointer to UnsafePointer<SomeActualType> for things
like C function pointer callback "context"/"userInfo" uses, so it's not
like we'd be preventing programmer error in that way.
> >
> > Call me conservative but to me the current system seems to work as
well as it can. If anything it's already enough boilerplate going through
hoops converting an UnsafeMutablePointer<Void> into a [Float] even when I
know and the C API knows perfectly well what it actually contains... Would
happily be convinced otherwise about this proposal though, I'm pretty new
at all this.
> >
> > Geordie
>
> > On May 9, 2016, at 12:57 PM, Guillaume Lessard via swift-evolution <swift-evolution@swift.org> wrote:
> >
> > I’m sympathetic to the elimination of UnsafePointer<Void> as general
shorthand for an arbitrary pointer, but I lose the plot of this very long
proposal. It seems to me that this increases API surface, yet everything I
could do before, I could still do; it just involves more typing. What
exactly does this make better?
> >
> > Cheers,
> > Guillaume Lessard
>
> Andy, I think it's worth clarifying the primary purpose of this
proposal. Our main goal here is to provide a legal means for "type-punning"
memory access. Like C and C++, it's technically undefined behavior in Swift
to cast an UnsafePointer<T> to an UnsafePointer of a different type and
load a value out of memory that's of a different type from what was stored
there. We don't take much advantage of this yet in Swift's optimizer, since
we don't have good alternative API. UnsafeBytePointer seeks to fill this
gap by providing a type that can safely do type-punned loads and stores.
>
> -Joe

Andrew_Trick · May 10, 2016, 7:12am

I read this proposal and I'm a bit unsure what its purpose would be:

Basically you want to prevent UnsafePointer<XYZ>(UnsafePointer<Void>) conversions and/or vice-versa? And you'd achieve this by replacing UnsafePointer<Void> with UnsafeBytePointer that has no bound pointer type?

I want to prevent UnsafePointer<U>(UnsafePointer<T>) *except* when the destination is UnsafePointer<Void>.

UnsafePointer<Void>(UnsafePointer<T>) is fine.

UnsafeBytePointer provides two things:
- A means to prevent the conversion above
- An API for legal type punning, which does not exist today

So you mean to enable UnsafePointer<Void> aka. UnsafeBytePointer(UnsafePointer<T>), but disable other type-to-type pointer recasts? I guess that’s a worthy goal at some level, but is there anything stopping someone just saying UnsafePointer(UnsafeBytePointer(myPointerToMemoryContainingTypeT), toPointee: U.type)?

Nothing prevents that. But I want to be able to see when it happens and want it to be very deliberate.

It still just seems like we can do the same thing spelled differently. I don’t see how changing how that happens could benefit us or the compiler, but maybe this is one we should just take your word on.

Assuming the likely case that this is just beyond my understanding, I do wonder why we’d need to change the API. I guess there are a lot of assumptions made about both UnsafePointer<Void> and UnsafePointer<T> that don’t necessarily apply to both to an equal degree?

Just to reiterate: there’s currently no way to legally type pun. UnsafeBytePointer solves that problem. I’m looking for feedback on whether it should also replace UnsafePointer<Void>. However, if UnsafeBytePointer does not replace UnsafePointer<Void> then I can’t enforce the casting rules that I want. I’ll dedicate the rest of this message to that problem.

In one sense the change seems fine to me, but as someone who uses a lot of C APIs and a lot of CoreAudio/CoreMIDI in Swift already I can't really see what benefit it'd bring. Presumably we'd still want an option of converting UnsafeBytePointer to UnsafePointer<SomeActualType> for things like C function pointer callback "context"/"userInfo" uses, so it's not like we'd be preventing programmer error in that way.

It’s possible to cast UnsafeBytePointer to UnsafePointer<SomeActualType>. I want the programmer to make their intent explicit by writing a cast and spelling SomeActualType at the point of the cast. In the proposal, that’s done using a labeled initializer.
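
To make the "context"/"userInfo" case concrete, here is a minimal sketch of a typed pointer being erased and then recovered with the pointee type spelled at the conversion point. It assumes the present-day standard library's UnsafeMutableRawPointer and its assumingMemoryBound(to:) method rather than the labeled initializer from the proposal; callback and sumViaContext are hypothetical names.

```swift
// Sketch only: UnsafeMutableRawPointer stands in for UnsafeBytePointer.
// `callback` and `sumViaContext` are hypothetical names.
func callback(context: UnsafeMutableRawPointer) -> Float {
    // The pointee type is written out where the typed pointer is recovered.
    let samples = context.assumingMemoryBound(to: Float.self)
    return samples[0] + samples[1]
}

func sumViaContext() -> Float {
    var samples: [Float] = [1.5, 2.5]
    return samples.withUnsafeMutableBufferPointer { buffer in
        // Erase the element type, as a C API taking void* would.
        let erased = UnsafeMutableRawPointer(buffer.baseAddress!)
        return callback(context: erased)
    }
}

let total = sumViaContext()
```

This is well defined only because the memory really does hold Float values when it is read back.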

How is this different from what we do now, namely UnsafePointer<SomeActualType>(myUnsafePointer) <— I’m also spelling out SomeActualType there. I think I’m still misunderstanding something critical here.

I'd be fine with UnsafePointer<ToType>(p), but type inference means that you can usually omit the generic parameter. I do also find the label helpful. I think a standalone function for converting pointer types would work, but I feel like we've been moving toward initializers for conversion.

I used the term "inferred initialization" in the proposal without defining it. What I mean by an "inferred initializer" is that the generic type being initialized is inferred by the argument type and/or the overloaded initializer itself is resolved based on the argument type. These behaviors can be prevented by providing an argument label and requiring the initialized type parameter to be passed as an argument.

Here's an example of initializing generic UnsafePointer via inference:

func takesUMP(_ p: UnsafeMutablePointer<UInt>) -> UInt {
 return p[0]
}

func scary(q: UnsafeMutablePointer<Int>) -> UInt {
 return takesUMP(UnsafeMutablePointer(q))
}

I don't see any cues that 'scary' may be introducing undefined behavior.

What I'm proposing with UnsafePointer initialization is the same thing that we do with unsafeBitCast. The destination of the cast can usually be inferred, but we want the developer to explicitly state the expected destination type both because type inference can be surprising, and because it's important to the reader for code comprehension.
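
For comparison, here is a minimal sketch of the same call with the destination type stated at the cast. It assumes the present-day standard library's withMemoryRebound(to:capacity:) rather than the labeled initializer proposed in this thread; explicitRebind is a hypothetical name.

```swift
func takesUMP(_ p: UnsafeMutablePointer<UInt>) -> UInt {
    return p[0]
}

// Sketch only: `explicitRebind` is a hypothetical counterpart to 'scary'.
func explicitRebind(q: UnsafeMutablePointer<Int>) -> UInt {
    // UInt is spelled at the point of conversion, so a reader can find it.
    return q.withMemoryRebound(to: UInt.self, capacity: 1) { takesUMP($0) }
}

let storage = UnsafeMutablePointer<Int>.allocate(capacity: 1)
storage.initialize(to: -1)
let reinterpreted = explicitRebind(q: storage)
storage.deinitialize(count: 1)
storage.deallocate()
```

Nothing here is safer at runtime than 'scary'; the difference is that the reinterpretation is visible and searchable in the source.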

"How does this make the developer's life easier" is the wrong question. I'm trying to make something that developers probably shouldn't be doing harder to do. Type punning is not a normal use case for UnsafePointer. It's likely going to lead to undefined behavior. For that reason, it should not be something that users of UnsafePointer fall into unaware. I also think it's critical for code inspection to be able to identify points where this conversion happens. The pointer conversion by itself is not undefined behavior, but it is the only point in code where we can identify possible type punning. It's the only anchor we have for finding code that makes extremely subtle assumptions about types and requires intense scrutiny.

I had to sift through hundreds of occurrences in the standard library of type-inferred UnsafePointer initializers of the form: UnsafePointer(p). Most of those were not actually doing "unsafe" conversion, but I had no way of knowing.

Most of the conversion in the standard library turned out to be:
- to and from UnsafePointer<Void>
- to and from mutating pointers
- to and from Autoreleasing pointers

I cannot emphasize enough how much easier it would have been for me to understand the code if each cast's intention had been spelled out.

From your email that just came in:

if converting UMP types leads to undefined behavior, then it should be prohibited in the API, unless the programmer explicitly requests the conversion

This is the point I’d really like to try and understand: can you clarify how the new API is any more or less explicit than the old one?

Let me first claim that we should not rely on implicit argument conversion to indicate which casts are safe. Clearly an implicit conversion should be safe. But implicit argument conversion doesn't apply everywhere, and writing an explicit initialization doesn't tell me that something unsafe is happening. I maintain that it is reasonable to write "UnsafePointer(p)" whenever the programmer wants to be explicit about the expression's type, and that should not by itself indicate any unsafe conversion is happening.

So with that in mind, I'll make everything explicit in this example:

func takesVoid (_: UnsafePointer<Void>) {}
func takesUP_T (_: UnsafePointer<T>) {}
func takesUP_X (_: UnsafePointer<X>) {}
func takesUMP_T (_: UnsafeMutablePointer<T>) {}
func takesUMP_X (_: UnsafeMutablePointer<X>) {}

let p = UnsafePointer<T>(...)

Before:

takesVoid(UnsafePointer(p))

takesUP_T(UnsafePointer(p))

takesUP_X(UnsafePointer(p))

takesUMP_T(UnsafeMutablePointer(p))

takesUMP_X(UnsafeMutablePointer(p))

After:

takesVoid(UnsafeBytePointer(p))

takesUP_T(UnsafePointer(p))

takesUP_X(UnsafePointer(p, to: X.self))

takesUMP_T(UnsafeMutablePointer(mutating: p))

takesUMP_X(UnsafeMutablePointer(mutating: UnsafePointer(p, to: X.self)))

(I admit the ".self" syntax is ugly and it would be nice if it went away).

This will result in redundancies such as:

- self = String.decodeCString(UnsafePointer(cString), as: UTF8.self,
+ self = String.decodeCString(UnsafePointer(cString, to: UTF8.CodeUnit.self), as: UTF8.self,

In this case the API was designed for inferred initializers, and we could probably do away with the 'as' label now. But keep in mind that my goal is not to make type punning look pretty. Let's expose its ugliness.

Call me conservative but to me the current system seems to work as well as it can. If anything it's already enough boilerplate going through hoops converting an UnsafeMutablePointer<Void> into a [Float] even when I know and the C API knows perfectly well what it actually contains... Would happily be convinced otherwise about this proposal though, I'm pretty new at all this.

I think you are asking for implicit conversions when calling C APIs. That’s good feedback. When implementing this proposal I tried to allow implicit conversions in reasonable cases, but leaned toward being conservative. I would rather see more explicit casts now and eliminate them if people find it awkward.

Maybe, but I’m not sure how that’d look under this proposal. I mean Strings and literals currently being accepted as UnsafePointer<CChar> is a nice touch, and last I checked I can use [T, T, T, ...] array literals in place of UnsafePointer<T>, I certainly wouldn’t want to go below that level of conservatism here.

Right. Implicit conversion from String and Array literals to UnsafeBytePointer is one of the main things I didn't implement yet (it still works the same for UnsafePointer). I can do that, but wanted feedback on other parts of the proposal first.

-Andy


On May 9, 2016, at 2:43 PM, Geordie J <geojay@gmail.com> wrote:

On 09.05.2016 at 23:04, Andrew Trick <atrick@apple.com> wrote:

On May 9, 2016, at 12:38 PM, Geordie Jay <geojay@gmail.com> wrote:

I'm looking for some consensus on core aspects of the proposal, then we can take into consideration precisely which implicit conversions should be supported.

-Andy

Geordie
Andrew Trick via swift-evolution <swift-evolution@swift.org> wrote on Mon, May 9, 2016 at 20:15:
Hello Swift evolution,

I sent this to swift-dev last week. Sorry to post on two lists!

Swift does a great job of protecting against undefined behavior--as long as you avoid "unsafe" APIs, that is. However, unsafe APIs are important for giving developers control over implementation details and performance. Naturally, the contract between unsafe APIs and the optimizer is crucial. When a developer uses an unsafe API, the rules governing safe, well-defined behavior must be clear. On the opposite end, the optimizer must know which assumptions it can make based on those rules. Simply saying that anything goes because "unsafe" is in the name is not helpful to this effort.

For a long time, I've wanted these rules nailed down. We have more users taking advantage of advanced features, and more optimizations that take advantage of assumptions guided by the type system. This seems like a particularly good time to resolve UnsafePointer semantics, considering the type system and UnsafePointer work that's been going on recently. Strict aliasing is something I would like addressed. If we do nothing here, then we will end up by default inheriting C/C++ semantics, as with any language that relies on a C/C++ backend. In other words, developers will be forced to write code with technically undefined behavior and rely on the compiler to be smart enough to recognize and recover from common patterns. Or we can take advantage of this opportunity and instead adopt a sound memory model with respect to aliasing.

This proposal is only an RFC at this point. I'm sending it out now to allow for plenty of time for discussion (or advance warning). Keep in mind that it could change considerably before it goes up for review.

-Andy

UnsafeBytePointer API for In-Memory Layout

Proposal: SE-NNNN <https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md>
Author(s): Andrew Trick <https://github.com/atrick>
Status: Awaiting review <https://github.com/atrick/swift-evolution/tree/voidpointer/proposals#rationale>
Review manager: TBD

Introduction

UnsafePointer and UnsafeMutablePointer refer to a typed region of memory, and the compiler must be able to assume that the UnsafePointer element (Pointee) type is consistent with other access to the same memory. See the proposed Type Safe Memory Access documentation <https://github.com/atrick/swift/blob/type-safe-mem-docs/docs/TypeSafeMemory.rst>. Consequently, inferred conversion between UnsafePointer element types exposes an easy way to abuse the type system. No alternative currently exists for manual memory layout and direct access to untyped memory, and that leads to an overuse of UnsafePointer. These uses of UnsafePointer, which depend on pointer type conversion, make accidental type punning likely. Type punning via UnsafePointer is semantically undefined behavior and de facto undefined behavior given the optimizer's long-time treatment of UnsafePointer.

In this document, all mentions of UnsafePointer also apply to UnsafeMutablePointer.

Motivation

To avoid accidental type punning, we should prohibit inferred conversion between UnsafePointer<T> and UnsafePointer<U> unless the target of the conversion is an untyped or nondereferenceable pointer (currently represented as UnsafePointer<Void>).

To support this change we should introduce a new pointer type that does not bind the type of its Pointee. Such a new pointer type would provide an ideal foundation for an API that allows byte-wise pointer arithmetic and a legal, well-defined means to access an untyped region of memory.

As motivation for such an API, consider that an UnsafePointer<Void> or OpaquePointer may currently be obtained from an external API. However, the developer may know the memory layout and may want to read or write elements whose types are compatible with that layout. This is a reasonable use case, but unless the developer can guarantee that all accesses to the same memory location have the same type, then they cannot use UnsafePointer to access the memory without risking undefined behavior.

An UnsafeBytePointer example using the newly proposed API is included below.

Proposed solution

Introduce an UnsafeBytePointer type along with an API for obtaining an UnsafeBytePointer value at a relative byte offset and loading and storing arbitrary types at that location.

Statically prohibit inferred UnsafePointer conversion while allowing inferred UnsafePointer to UnsafeBytePointer conversion.

UnsafeBytePointer meets multiple requirements:

- An untyped pointer to memory
- Pointer arithmetic within byte-addressable memory
- Type-unsafe access to memory (legal type punning)

UnsafeBytePointer will replace UnsafeMutablePointer<Void> as the representation for untyped memory. For API clarity we could consider a typealias for VoidPointer. I don't think a separate VoidPointer type would be useful--there's no danger that UnsafeBytePointer will be casually dereferenced, and I don't see the danger in allowing pointer arithmetic since the only reasonable interpretation is that of a byte-addressable memory.

Providing an API for type-unsafe memory access would not serve a purpose without the ability to compute byte offsets. Of course, we could require users to convert back and forth using bitPatterns, but I think that would be awkward and only obscure the purpose of the UnsafeBytePointer type.
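
To illustrate why byte offsets and type-unsafe access go together, here is a minimal sketch of manual memory layout: two values of different types stored in one untyped allocation. It assumes the present-day standard library's UnsafeMutableRawPointer as a stand-in for the proposed UnsafeBytePointer; roundTrip and the 0/8 byte layout are hypothetical choices for the example.

```swift
// Sketch only: UnsafeMutableRawPointer stands in for UnsafeBytePointer.
// `roundTrip` and the byte offsets are hypothetical, chosen for illustration.
func roundTrip() -> (UInt32, Double) {
    // 16 untyped bytes: a UInt32 tag at offset 0, a Double at offset 8.
    let buffer = UnsafeMutableRawPointer.allocate(
        byteCount: 16, alignment: MemoryLayout<Double>.alignment)
    defer { buffer.deallocate() }

    buffer.storeBytes(of: 0xCAFE, toByteOffset: 0, as: UInt32.self)
    buffer.storeBytes(of: 2.5, toByteOffset: 8, as: Double.self)

    // Read the tag in place, then advance by bytes to reach the payload.
    let tag = buffer.load(as: UInt32.self)
    let payload = buffer.advanced(by: 8).load(as: Double.self)
    return (tag, payload)
}

let (tag, payload) = roundTrip()
```

Without pointer arithmetic on the untyped pointer, the second load would need a round trip through a bit pattern or a typed pointer.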

In this proposal, UnsafeBytePointer does not specify mutability. Adding an UnsafeMutableBytePointer would be straightforward, but adding another pointer type needs strong justification. I expect to get input from the community on this. If we agree that the imported type for const void* should be UnsafeBytePointer, then we probably need UnsafeMutableBytePointer to handle interoperability.

Detailed design

The public API is shown here. For details and comments, see the unsafeptr_convert branch <https://github.com/atrick/swift/commits/unsafeptr_convert>.

struct UnsafeBytePointer : Hashable, _Pointer {

 let _rawValue: Builtin.RawPointer

 var hashValue: Int {...}

 init<T>(_ : UnsafePointer<T>)
 init<T>(_ : UnsafeMutablePointer<T>)
 init?<T>(_ : UnsafePointer<T>?)
 init?<T>(_ : UnsafeMutablePointer<T>?)

 init<T>(_ : OpaquePointer<T>)
 init?<T>(_ : OpaquePointer<T>?)

 init?(bitPattern: Int)
 init?(bitPattern: UInt)

 func load<T>(_ : T.Type) -> T

 @warn_unused_result
 init(allocatingBytes size: Int, alignedTo: Int)

 @warn_unused_result
 init<T>(allocatingCapacity count: Int, of: T.Type)

 func deallocateBytes(_ size: Int, alignedTo: Int)

 func deallocateCapacity<T>(_ num: Int, of: T.Type)

 // Returns a pointer one byte after the initialized memory.
 func initialize<T>(with newValue: T, count: Int = 1) -> UnsafeBytePointer

 // Returns a pointer one byte after the initialized memory.
 func initialize<T>(from: UnsafePointer<T>, count: Int) -> UnsafeBytePointer

 func initializeBackward<T>(from source: UnsafePointer<T>, count: Int)

 func deinitialize<T>(_ : T.Type, count: Int = 1)
}

extension OpaquePointer {
 init(_ : UnsafeBytePointer)
}

extension Int {
 init(bitPattern: UnsafeBytePointer)
}

extension UInt {
 init(bitPattern: UnsafeBytePointer)
}

extension UnsafeBytePointer : RandomAccessIndex {
 typealias Distance = Int

 func successor() -> UnsafeBytePointer
 func predecessor() -> UnsafeBytePointer
 func distance(to : UnsafeBytePointer) -> Int
 func advanced(by : Int) -> UnsafeBytePointer
}

func == (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Bool

func < (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Bool

func + (lhs: UnsafeBytePointer, rhs: Int) -> UnsafeBytePointer

func + (lhs: Int, rhs: UnsafeBytePointer) -> UnsafeBytePointer

func - (lhs: UnsafeBytePointer, rhs: Int) -> UnsafeBytePointer

func - (lhs: UnsafeBytePointer, rhs: UnsafeBytePointer) -> Int

func += (lhs: inout UnsafeBytePointer, rhs: Int)

func -= (lhs: inout UnsafeBytePointer, rhs: Int)

Occasionally, we need to convert from an UnsafeBytePointer to an UnsafePointer. This should only be done in very rare circumstances when the author understands the compiler's strict type rules for UnsafePointer. Although this could be done by casting through an OpaquePointer, an explicit, designated unsafe pointer cast API would make the risks more obvious and self-documenting. For example:

extension UnsafePointer {
 init(_ from: UnsafeBytePointer, toPointee: Pointee.Type)
}
extension UnsafeMutablePointer {
 init(_ from: UnsafeBytePointer, toPointee: Pointee.Type)
}

Similarly, conversion between UnsafePointer types must now be spelled with an explicit Pointee type:

extension UnsafePointer {
 init<U>(_ from: UnsafePointer<U>, toPointee: Pointee.Type)
 init<U>(_ from: UnsafeMutablePointer<U>, toPointee: Pointee.Type)
}
extension UnsafeMutablePointer {
 init<U>(_ from: UnsafeMutablePointer<U>, toPointee: Pointee.Type)
}

Impact on existing code

The largest impact of this change is that void* and const void* are imported as UnsafeBytePointer. This impacts many public APIs, but with implicit argument conversion should not affect typical uses of those APIs.

Any Swift projects that rely on type inference to convert between UnsafePointer types will need to take action. The developer needs to determine whether type punning is necessary. If so, they must migrate to the UnsafeBytePointer API. Otherwise, they can work around the new restriction by using a toPointee or mutating label.

Disallowing inferred UnsafePointer direct conversion requires some standard library code to use an explicit toPointee label for unsafe conversions that may violate strict aliasing.

All occurrences of Unsafe[Mutable]Pointer<Void> in the standard library are converted to UnsafeBytePointer. e.g. unsafeAddress() now returns UnsafeBytePointer, not UnsafePointer<Void>.

Some occurrences of Unsafe[Mutable]Pointer<Pointee> in the standard library are replaced with UnsafeBytePointer, either because the code was playing too loosely with strict aliasing rules, or because the code actually wanted to perform pointer arithmetic on byte-addresses.

StringCore.baseAddress changes from OpaquePointer to UnsafeBytePointer because it is computing byte offsets and accessing the memory. OpaquePointer is meant for bridging, but should be truly opaque; that is, nondereferenceable and not involved in address computation.

The StringCore implementation does a considerable amount of casting between different views of the String storage. The current implementation already demonstrates some awareness of strict aliasing rules. The rules are generally followed by ensuring that the StringBuffer only be accessed using the appropriate CodeUnit within Swift code. For interoperability and optimization, String buffers frequently need to be cast to and from CChar. This is valid as long as access to the buffer from Swift is guarded by dynamic checks of the encoding type. These unsafe but dynamically legal conversion points will now be labeled with toPointee.

CoreAudio utilities now use an UnsafeBytePointer.

Implementation status

On my unsafeptr_convert branch <https://github.com/atrick/swift/commits/unsafeptr_convert>, I've made most of the necessary changes to support the addition of UnsafeBytePointer and the removal of inferred UnsafePointer conversion.

There are several things going on here in order to make it possible to build the standard library with the changes:

A new UnsafeBytePointer type is defined.

The type system imports void* as UnsafeBytePointer.

The type system handles implicit conversions to UnsafeBytePointer.

UnsafeBytePointer replaces both UnsafePointer<Void> and UnsafeMutablePointer<Void>.

The standard library was relying on inferred UnsafePointer conversion in over 100 places. Most of these conversions now take an explicit label, such as 'toPointee' or 'mutating'. Some have been rewritten.

Several places in the standard library that were playing loosely with strict aliasing or doing bytewise pointer arithmetic now use UnsafeBytePointer instead.

Explicit labeled Unsafe[Mutable]Pointer initializers are added.

The inferred Unsafe[Mutable]Pointer conversion is removed.

TODO:

Once this proposal is accepted, and the rules for casting between pointer types have been decided, we need to finish implementing the type system support. The current implementation (intentionally) breaks a few tests in pointer_conversion.swift. We also need to ensure that interoperability requirements are met. Currently, many argument casts need to be explicitly labeled. The current implementation also makes it easy for users to hit an "ambiguous use of 'init'" error when relying on implicit argument conversion.

Additionally:

A name mangled abbreviation needs to be created for UnsafeBytePointer.

The StringAPI tests should probably be rewritten with UnsafeBytePointer.

The NSStringAPI utilities and tests may need to be ported to UnsafeBytePointer.

The CoreAudio utilities and tests may need to be ported to UnsafeBytePointer.

Alternatives considered

Existing workaround

In some cases, developers can safely reinterpret values to achieve the same effect as type punning:

let ptrI32 = UnsafeMutablePointer<Int32>(allocatingCapacity: 1)
ptrI32[0] = Int32()
let u = unsafeBitCast(ptrI32[0], to: UInt32.self)

Note that all access to the underlying memory is performed with the same element type. This is perfectly legitimate, but simply isn't a complete solution. It also does not eliminate the inherent danger in declaring a typed pointer and expecting it to point to values of a different type.

Discarded alternatives

We considered adding a typePunnedMemory property to the existing Unsafe[Mutable]Pointer API. This would provide a legal way to access a potentially type punned Unsafe[Mutable]Pointer. However, it would certainly cause confusion without doing much to reduce the likelihood of programmer error. Furthermore, there are no good use cases for such a property evident in the standard library.

The opaque _RawByte struct is a technique that allows for byte-addressable buffers while hiding the dangerous side effects of type punning (a _RawByte could be loaded but its value cannot be directly inspected). UnsafePointer<_RawByte> is a clever alternative to UnsafeBytePointer. However, it doesn't do enough to prevent undefined behavior. The loaded _RawByte would naturally be accessed via unsafeBitCast, which would mislead the author into thinking that they have legally bypassed the type system. In actuality, this API blatantly violates strict aliasing. It theoretically results in undefined behavior as it stands, and may actually exhibit undefined behavior if the user recovers the loaded value.

To solve the safety problem with UnsafePointer<_RawByte>, the compiler could associate special semantics with an UnsafePointer bound to this concrete generic parameter type. Statically enforcing casting rules would be difficult if not impossible without new language features. It would also be impossible to distinguish between typed and untyped pointer APIs. For example, UnsafePointer<T>.load would be a nonsensical vestige.

Alternate proposal for void* type

Changing the imported type for void* will be somewhat disruptive. Furthermore, this proposal currently drops the distinction between void* and const void*--an obvious loss of API information.

We could continue to import void* as UnsafeMutablePointer<Void> and const void* as UnsafePointer<Void>, which will continue to serve as an "opaque" untyped pointer. Converting to UnsafeBytePointer would be necessary to perform pointer arithmetic or to conservatively handle possible type punning.

This alternative is much less disruptive, but we are left with two forms of untyped pointer, one of which (UnsafePointer) the type system somewhat conflates with typed pointers.

Given the current restrictions of the language, it's not clear how to statically enforce the necessary rules for casting UnsafePointer<Void> once general

Andrew_Trick · May 20, 2016, 4:55am

I totally agree, and was only trying to follow convention. From now on I’ll attach the text and only provide a link that renders a page:

https://github.com/atrick/swift-evolution/blob/voidpointer/proposals/XXXX-unsafebytepointer.md

-Andy

···

On May 19, 2016, at 9:46 PM, Austin Zheng <austinzheng@gmail.com> wrote:

P.S. On an unrelated note, it might be better to host a proposal in a Gist or elsewhere; the first time I sent this message the mailing list software caused it to bounce. I suspect the same might have happened to other people's responses.

Andrew_Trick · May 20, 2016, 6:35am

Hello Swift evolution,

I'm sending this proposal out again for another round of RFC. The first round did not get much specific feedback, and nothing has fundamentally changed. In this updated version I beefed up the explanation a bit and clarified the language.

Hi Andy,

I think this is a reasonable proposal. It seems like the real win here is to be able to define TBAA rules for Unsafe[Mutable]Pointer references, instead of having to treat them *all* conservatively (something I’m generally supportive of). A few questions/observations:

- It seems like the proposal should include a discussion about that, because that’s a pretty substantial change to the programming model.

Right, although we don’t have a formal spec, I thought it was worthwhile to capture Swift’s notion of TBAA rules for the sake of discussion. This design document is based on today’s reality, not the proposed change:

github.com

atrick/swift/blob/type-safe-mem-docs/docs/TypeSafeMemory.rst

:orphan:

=======================
Type Safe Memory Access
=======================

.. contents:: :local:
   
Introduction
============

Swift enforces type safe access to memory and follows strict aliasing
rules. However, code that uses unsafe APIs or imported types can
circumvent the language's natural type safety. Consider the following
example of *type punning* using the ``UnsafePointer`` type::

  let ptrT = UnsafeMutablePointer<T>.allocate(capacity: 1)
  // Store T at this address.
  ptrT[0] = T()
  // Load U at this address


(and there’s an even lower-level SIL description)

- Does TBAA for these accesses actually produce better performance in practice on any existing known use cases?

Yes, but I’d have to rerun the numbers. Under the hood we do a lot of conversion to UnsafePointer. My feeling is that UnsafePointer is intended to be a tool for building low-level, high-performance data structures.

- Would it be possible for tools like UBSAN to catch violations of this? I’m not familiar with what ubsan does for C TBAA violations (if anything).

Yes, we definitely want to be able to feed TBAA information to a sanitizer. That’s a big part of my argument for clarifying the rules for safe/unsafe operation. But I think it’s quite different from the things that UBSAN checks now.

- It isn’t clear to me why it is important to change how "void*" is imported. Since you can’t dereference an UnsafePointer<Void> anyway, why does it matter for this proposal?

Briefly, it’s not essential for the TBAA story. That has to do with allowing safe casts to be inferred and implicit, while type-unsafe casts require explicit conversion. Also, the UnsafeMutablePointer API is nonsense with a Void Pointee.

Whether it’s part of this proposal, or done later really depends on how we want to handle migration. Here’s how it plays out:
- I got strong feedback that I should first be able to prevent UnsafePointer<T> to UnsafePointer<U> coercion before introducing a new aliasing pointer type
- Wherever users are attempting that conversion, some additional cast will need to be introduced
- A lot of UnsafePointer conversion is actually safe conversion to UnsafePointer<Void>; it would be a shame to penalize all that code
- Changing the import type of UnsafePointer<Void> allows that code to be migrated without introducing useless casts

It might be nice to introduce UnsafeBytePointer first without restricting UnsafePointer conversion, then there’s no immediate need to change the import type. Leaving the API in that state just doesn’t make as much sense conceptually. I also actually tried this and was not able to implement all the expected implicit pointer conversions without running into ambiguous overloads. I think eliminating the UnsafePointer<T> conversion nicely simplifies the type checker’s job.

-Andy

···

On May 19, 2016, at 10:21 PM, Chris Lattner <clattner@apple.com> wrote:

On May 19, 2016, at 12:08 AM, Andrew Trick via swift-evolution <swift-evolution@swift.org> wrote:

-Chris

Andrew_Trick · May 20, 2016, 7:14am

UnsafeBytePointer API for In-Memory Layout

UnsafePointer and UnsafeMutablePointer refer to a typed region of memory, and the compiler must be able to assume that the UnsafePointer element (Pointee) type is consistent with other access to the same memory. See proposed Type Safe Memory Access documentation <https://github.com/atrick/swift/blob/type-safe-mem-docs/docs/TypeSafeMemory.rst>. Consequently, conversion between UnsafePointer element types exposes an easy way to abuse the type system.

I don’t necessarily disagree with the proposal but I think we should clearly answer the following question:

I think these are two questions:

Why doesn’t UnsafePointer<T>(_: UnsafePointer<U>) read as UnsafePointer<T>(_: UnsafePointer<Void>)? That is to say you can only “type pun” through a Void pointer.

That’s a reasonable request and would definitely ease migration. However, it still communicates to the user that type punning is a normal, expected use of UnsafePointer. Also, it doesn’t allow all potential uses of type punning to be identified through code inspection. I know how helpful that feature is because I’ve been auditing code for potential undefined behavior.

A convenience method could be offered, something like UnsafePointer.reinterpretBytes(_ ptr: UnsafePointer, as: U.Type) -> U so all valid cases of type punning can be explicit.

Maybe you mean UnsafePointer.reinterpretBytes(as: U.Type) -> U

That’s a possibility. It’s slightly reminiscent of my first attempt to deal with this problem.

However, simply as a convenience it’s too similar to unsafeBitCast(p[0], to: U.self) assuming you know the types are layout compatible.

With my proposal you could now do UnsafeBytePointer(p).load(U.self), which is overall a much more clear, safer design.
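
For illustration, the two spellings compared above can be sketched in current Swift. UnsafeBytePointer is the name used in this proposal, so the sketch substitutes today's UnsafeRawPointer and its load(as:) method, on the assumption that it plays the same role. UInt32 and Float are chosen only because they are layout compatible.

```swift
// Sketch only: `UnsafeRawPointer.load(as:)` stands in for the proposed
// `UnsafeBytePointer.load(_:)`; treating them as equivalent is an assumption.
let p = UnsafeMutablePointer<UInt32>.allocate(capacity: 1)
p.initialize(to: 0x3F80_0000)   // IEEE 754 bit pattern of Float(1.0)

// Spelling 1: load as the pointee type, then reinterpret the loaded value.
let viaBitCast = unsafeBitCast(p[0], to: Float.self)

// Spelling 2: go through an untyped pointer and name the loaded type.
let viaRawLoad = UnsafeRawPointer(p).load(as: Float.self)

p.deinitialize(count: 1)
p.deallocate()
```

Both spellings yield the same value here. The difference is that the second one makes the untyped access visible at the point of the load.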

As motivation for such an API, consider that an UnsafePointer<Void> or OpaquePointer may currently be obtained from an external API. However, the developer may know the memory layout and may want to read or write elements whose types are compatible with that layout. This is a reasonable use case, but unless the developer can guarantee that all accesses to the same memory location have the same type, then they cannot use UnsafePointer to access the memory without risking undefined behavior.
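
A minimal sketch of that use case, assuming a made-up buffer layout and using today's UnsafeRawPointer in place of the proposed UnsafeBytePointer. The function makeExternalBuffer and the tag/length layout are hypothetical stand-ins for whatever the external API actually returns.

```swift
// Hypothetical layout of a buffer handed back by an external API:
//   offset 0: UInt16 tag
//   offset 4: UInt32 payload length
// `makeExternalBuffer` is a stand-in for that API, not a real function.
func makeExternalBuffer() -> UnsafeMutableRawPointer {
    let buf = UnsafeMutableRawPointer.allocate(byteCount: 8, alignment: 4)
    buf.storeBytes(of: 7, toByteOffset: 0, as: UInt16.self)
    buf.storeBytes(of: 512, toByteOffset: 4, as: UInt32.self)
    return buf
}

// The caller only sees an untyped pointer but knows the layout,
// so each field is loaded with an explicit type and byte offset.
let opaque = UnsafeRawPointer(makeExternalBuffer())
let tag = opaque.load(fromByteOffset: 0, as: UInt16.self)
let length = opaque.load(fromByteOffset: 4, as: UInt32.self)
opaque.deallocate()
```

No typed pointer is ever formed, so nothing in the code claims that the whole buffer holds a single element type.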

IMHO if we had a @packed attribute a lot of this nonsense could be made explicit by defining a Swift struct that had the appropriate memory layout. This is how a lot of “PInvoke” stuff was done in the C# world. It also gives you an “out” if you need a very specific layout in memory for some other reason.

I think that’s complementary and addresses the usability of doing manual layout.

Just as with unsafeBitCast, although the destination of the cast can usually be inferred, we want the developer to explicitly state the intended destination type, both because type inference can be surprising, and because it's important to the reader for code comprehension.

I’d definitely prefer a labelled initializer, especially one with an uncommon name. IMHO It should immediately stand out in code reviews.

Well, I agree with that sentiment. I would even be fine with a freestanding function to make it really clear, but that defies convention. I’m looking for people to weigh in.

One of the reasons I finished migrating the stdlib (multiple times) is so that proposal reviewers can look at my branch and see the real effects of the proposal. My first implementation was something like UnsafePointer.init(unsafePointerCast: p). I’ve actually gotten strong feedback to force the destination type to be spelled, and shorten the label. I could probably dig up an earlier version of the changes.

I have to admit though that the UnsafePointer(p, to: U.self) syntax tends to read better and at least there’s an easy regex that can pick up on it.

Note: For API clarity we could consider a typealias for VoidPointer. A separate VoidPointer type would not be very useful--there's no danger that UnsafeBytePointer will be casually dereferenced, and no danger in allowing pointer arithmetic since the only reasonable interpretation is that of a byte-addressable memory.

Agreed; even today messing with UnsafeMutablePointer<Void> requires you to understand that the size corresponds to bytes which is not intuitive.

Ah, that’s good feedback in favor of replacing UnsafePointer<Void> and its status as imported type!
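
To make the byte-addressing point concrete, here is a small sketch in current Swift, with UnsafeMutableRawPointer assumed as the stand-in for the proposed byte pointer. It compares how far a typed pointer and an untyped pointer move when advanced by one.

```swift
// Sketch only: UnsafeMutableRawPointer is assumed to behave like the
// byte-addressable pointer discussed in this thread.
let typed = UnsafeMutablePointer<UInt64>.allocate(capacity: 2)
typed.initialize(repeating: 0, count: 2)
let raw = UnsafeMutableRawPointer(typed)

// Typed arithmetic advances by MemoryLayout<UInt64>.stride bytes.
let typedStep = raw.distance(to: UnsafeMutableRawPointer(typed + 1))

// Untyped arithmetic advances by exactly one byte.
let rawStep = raw.distance(to: raw.advanced(by: 1))

typed.deinitialize(count: 2)
typed.deallocate()
```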

Loading from and storing to memory via an Unsafe[Mutable]BytePointer is safe independent of the type of value being loaded or stored and independent of the memory's allocated type as long as layout guarantees are met (per the ABI). This allows legal type punning within Swift and allows Swift code to access a common region of memory that may be shared across an external interface that does not provide type safety guarantees. Accessing type punned memory directly through a designated Unsafe[Mutable]BytePointer type provides a sound basis for compiler implementation of strict aliasing. This is in contrast with the approach of simply providing a special unsafe pointer cast operation for bypassing type safety, which cannot be reliably implemented.

I’m not sure how to word it but I feel like some of this might help if it were included at the very beginning so people understand why this is a problem. I also think the stdlib docs should have a lot more to say about the rules, undefined behavior, and the consequences thereof. That will be all that a lot of developers ever bother to learn on the subject (a shame but out of scope for a swift evolution proposal :) )

I thought that including the link to proposed Type Safe Memory Access documentation <https://github.com/atrick/swift/blob/type-safe-mem-docs/docs/TypeSafeMemory.rst> in the first paragraph was sufficient. But reviewers seem to be skipping over it!

Andy

···

On May 19, 2016, at 10:35 PM, Russ Bishop <xenadu@gmail.com> wrote: