[Pitch] Constrain implicit raw pointer conversion to bitwise-copyable values

Swift has a known safety issue with implicit conversions that we've been planning to do something about for several months now. I believe the timing is right for adding compiler warnings in Swift 5.9 so we can promote the most dangerous cases to errors in Swift 6.

Here is a proposal to add restrictions on implicit casts to raw pointers, allowing them only when the source value is "bitwise copyable". Raw pointers are primarily intended for use with bitwise-copyable types because it is extremely difficult and dangerous to work with raw bytes in the presence of object references. This new restriction is consistent with our plan to gate raw pointer APIs on a bitwise-copyable layout constraint, which will be introduced in a separate proposal.

Implementation: Swift PR #63825.

The implicit raw pointer casts that we want to prohibit happen accidentally, without any source-level indication of unsafety. They nonetheless pose a serious safety and security concern. They are a source of programming mistakes leading to runtime crashes but have no discernable benefit. Here, the user likely wants to inspect the contents of a string, but instead they've leaked the internal representation:

func readBytes(_ pointer: UnsafeRawPointer)

func inspectString(string: inout String) {
  readBytes(&string) // reads the string's internal representation
}

This is a pernicious security issue because the code will happen to work during testing for small strings. Removing the '&' sigil changes the string conversion into an array-like implicit conversion to its elements:

func inspectString(string: inout String) {
  readBytes(string) // reads the string's characters
}

In the next example, the author clearly expected Foundation.Data to have the same sort of implicit conversion support as Array:

func foo(data: inout Data) {
  readBytes(&data)
}

This compiles without error, but it unintentionally exposes data object's internal storage representation rather than its elements. Balancing safety with C interoperability usability is often difficult, but in this case we can achieve much greater safety without sacrificing much overall usability. Given the recent improvements to pointer interoperability in Swift 5.7, and with a major language update on the horizon, this good time to correct this behavior.

20 Likes

I like the idea and think it’s a net win, even if idioms like associated objects need to be written differently in the future. I do think the warning text should be phrased differently since the compiler will continue to make these invalid pointers for now; saying “cannot” is incorrect. Perhaps “forming a pointer to variable 'foo' rather than its contents; this is likely incorrect because 'foo' may contain object references”.

And I do think that “may” is important. Any non-frozen Swift type can change its layout in the future, of course, but even types built from source outside the current module are suspect; it’s not a SemVer-breaking change to add a String property to a struct or a payload case to an enum. So probably any non-frozen Swift type outside the current module should get diagnosed, besides @objc enum…but that’s unfortunate for newtypes, especially when we have documented that a struct containing a single property is zero-cost in Swift. I don’t really have an answer there (other than that people do in fact want @frozen for non-ABI-stable modules already, and this is one more case where that would be interesting). Maybe it’s okay since there’s a workaround?

On the other side we have C structs, but it’s probably okay to say C structs that don’t currently have object references in them shouldn’t grow them in the future, even by replacing an existing member.

2 Likes

From the "Proposed Solution" section:

It is safe to assume that fixed-width integers are bitwise-copyable.

Should this assumption be dropped when BitwiseCopyable is available?

I actually expect BitwiseCopyable to formalize this assumption. That proposal should include the standard protocols for which we'll introduce BitwiseCopyable conformance. FixedWidthInteger should be among those.

I'll mention that in Source Compatibility. Once programmers assume physical layout, they're outside the protection of semantic compatibility. The general workaround here only hides that problem. So preemptively forcing people into the workaround does more harm. In general, we can't guard against layout changes. But when layout changes in a way that adds another dimension of memory unsafety, we want to force a source to break at that time so the maintainer of the client sees the problem.

2 Likes

+1 to this proposal. AFAIK it solves everything we discussed recently, does this to a much greater level of detail and is backed with the implementation already, so well done.

memcpy works at the byte level (i.e. it can't copy 7 bits of a byte without copying the 8th (memory I/O is done at a byte granularity)) and thus memcpy doesn't even talk about bits in its man page. However I agree that "BitwiseCopyable" is a good name to use here and not any worse than a potential "BytewiseCopyable" alternative.

I guess Double could follow the suit.

What about the tuples? Will there be some automatic conformance that says: the tuple of "BitwiseCopyable" elements is "BitwiseCopyable" itself? Or will it be possible to manually conform a tuple class to "BitwiseCopyable" (provided its elements are "BitwiseCopyable")?

On a point of clarification, would the "BitwiseCopyable" applied to a struct mean what "POD" currently means? In other words will it follow (1) or (2) below?

On that front, could BitwiseCopyable be applied to a reference type? (obviously the one that doesn't contain reference fields in its payload). I guess – "no" and it is worth to mention it in the proposal (unless it is already mentioned and I just missed it).

1 Like

I'm assuming unstated here is that BitwiseCopyable will be a formalization of _Trivial as a layout constraint and thus retrofittable into an existing hierarchy, since ordinary protocols are impossible to slide into existing standard library protocol hierarchies in an ABI-compatible way?

Anyway, whether possible or not, do we get much mileage in baking in an assumption ahead of time like this about FixedWidthInteger, versus blessing just the concrete built-in fixed-width integer types (and can we include floating-point types, as mentioned above)? Do you have concrete use cases in mind for generic or opaque some FixedWidthInteger that we'd like to support specifically?

1 Like

Please don’t forget about SIMD types, which aren’t themselves numeric but are composed of numeric values. This extends to some SIMD types which aren’t currently modeled in the stdlib, like matrices (e.g. simd_float4x4) and packed vectors (e.g. simd_packed_float16). These types are critical to efficiently working with Metal buffers.

4 Likes

This all needs to be discussed in a separate BitwiseCopyable proposal. But since that's delayed, we're forced to review this somewhat out-of-order. Meanwhile, I can answer these questions with fair confidence...

BitwiseCopyable is a formalization of the _Trivial layout constraint and the _isPOD() runtime type check.

BitwiseCopyable is a "layout constraint" as opposed to a marker protocol, meaning that concrete types automatically conform when their elements are BitwiseCopyable. So, yes, that applies to tuples. It does not apply to reference types.

FixedWidthInteger catches reasonable use cases where code is generic over some integer size and still wants to manipulate raw bytes. Once we have BitwiseCopyable, all types that currently conform to FixedWidthInteger will automatically conform. That gives us an extra convenient workaround today, and saves generic code from obvious redundancy in the future: <T: FixedWidthInteger & BitwiseCopyable>.

For example, String's internal implementation is generic over CodeUnit:

protocol _UnicodeEncoding {
  /// The basic unit of encoding
  associatedtype CodeUnit: UnsignedInteger, FixedWidthInteger

SwiftNIO has some great examples of working with raw bytes in a way that's generic over FixedWidthInteger:

@glessard pointed out that all these protocols should by constrained to be BitwiseCopyable: FixedWidthInteger, _Pointer, SIMDStorage, and SIMDScalar

2 Likes

Since this pitch is not blocked on implementation of BitwiseCopyable, should it not explicitly enumerate all such reasonable types? Once BitwiseCopyable is accepted and implemented, the rule can switch over.

That's a good point, but I think there should be practical benefit to complicating the compiler (it's a layering violation to expose these individual protocols). This proposal rests on the assumption that we'll have BitwiseCopyable before converting the diagnostic to an error. In the meantime withUnsafe APIs are fine workarounds for the extremely rare case of manipulating raw bytes in generic code. We're only making an exception for FixedWidthInteger because it occurs in the wild in Swift packages.

Found a use case that would not be supported by the pitch:

struct S {
    var a: Int
    var b: Double
    .. many many POD fields
   .. and then
    var cachedEntry: RefType?
}

whereas the struct has many POD fields but also a few ref field(s) which are unimportant for struct recreation. Usage pattern:

func write(_ s: S) {
    var copy = s
    copy.cachedEntry = nil
    fwrite(&copy, ...)
}
init() {
    var s: S!
    fread(&s)
    self = s
}

This pattern could work in principle (nil pointers are bitwise copyable) but understandably it won't be supported because we are after a static compile time checked solution that won't know if the pointer is nil or non nil during compilation time. The workaround would be to split the struct above into the main POD-only part and the secondary ref-part and embed them into a single parent struct to work with.

An explicit "unsafe" cast tells the compiler that you know what you're doing:
withUnsafeBytes(&copy) { ... }

At the slightly annoying cost of extra indentation and having to spell out $0.baseAddress, because withUnsafeBytes(of:) gives you an UnsafeRawBufferPointer.

If there aren't any serious concerns, I'd like to enable this warning to get initial developer feedback before the evolution PR goes into formal review.

3 Likes