[Pitch] Constrain implicit raw pointer conversion to bitwise-copyable values

Andrew_Trick · February 23, 2023, 5:04pm

Swift has a known safety issue with implicit conversions that we've been planning to do something about for several months now. I believe the timing is right for adding compiler warnings in Swift 5.9 so we can promote the most dangerous cases to errors in Swift 6.

Here is a proposal to add restrictions on implicit casts to raw pointers, allowing them only when the source value is "bitwise copyable". Raw pointers are primarily intended for use with bitwise-copyable types because it is extremely difficult and dangerous to work with raw bytes in the presence of object references. This new restriction is consistent with our plan to gate raw pointer APIs on a bitwise-copyable layout constraint, which will be introduced in a separate proposal.

Implementation: Swift PR #63825.

The implicit raw pointer casts that we want to prohibit happen accidentally, without any source-level indication of unsafety. They nonetheless pose a serious safety and security concern. They are a source of programming mistakes leading to runtime crashes but have no discernable benefit. Here, the user likely wants to inspect the contents of a string, but instead they've leaked the internal representation:

func readBytes(_ pointer: UnsafeRawPointer)

func inspectString(string: inout String) {
  readBytes(&string) // reads the string's internal representation
}

This is a pernicious security issue because the code will happen to work during testing for small strings. Removing the '&' sigil changes the string conversion into an array-like implicit conversion to its elements:

func inspectString(string: inout String) {
  readBytes(string) // reads the string's characters
}

In the next example, the author clearly expected Foundation.Data to have the same sort of implicit conversion support as Array:

func foo(data: inout Data) {
  readBytes(&data)
}

This compiles without error, but it unintentionally exposes data object's internal storage representation rather than its elements. Balancing safety with C interoperability usability is often difficult, but in this case we can achieve much greater safety without sacrificing much overall usability. Given the recent improvements to pointer interoperability in Swift 5.7, and with a major language update on the horizon, this good time to correct this behavior.

jrose · February 23, 2023, 5:24pm

I like the idea and think it’s a net win, even if idioms like associated objects need to be written differently in the future. I do think the warning text should be phrased differently since the compiler will continue to make these invalid pointers for now; saying “cannot” is incorrect. Perhaps “forming a pointer to variable 'foo' rather than its contents; this is likely incorrect because 'foo' may contain object references”.

And I do think that “may” is important. Any non-frozen Swift type can change its layout in the future, of course, but even types built from source outside the current module are suspect; it’s not a SemVer-breaking change to add a String property to a struct or a payload case to an enum. So probably any non-frozen Swift type outside the current module should get diagnosed, besides @objc enum…but that’s unfortunate for newtypes, especially when we have documented that a struct containing a single property is zero-cost in Swift. I don’t really have an answer there (other than that people do in fact want @frozen for non-ABI-stable modules already, and this is one more case where that would be interesting). Maybe it’s okay since there’s a workaround?

On the other side we have C structs, but it’s probably okay to say C structs that don’t currently have object references in them shouldn’t grow them in the future, even by replacing an existing member.

benrimmington · February 23, 2023, 6:20pm

From the "Proposed Solution" section:

It is safe to assume that fixed-width integers are bitwise-copyable.

Should this assumption be dropped when BitwiseCopyable is available?

Andrew_Trick · February 23, 2023, 6:33pm

I actually expect BitwiseCopyable to formalize this assumption. That proposal should include the standard protocols for which we'll introduce BitwiseCopyable conformance. FixedWidthInteger should be among those.

Andrew_Trick · February 23, 2023, 6:43pm

I'll mention that in Source Compatibility. Once programmers assume physical layout, they're outside the protection of semantic compatibility. The general workaround here only hides that problem. So preemptively forcing people into the workaround does more harm. In general, we can't guard against layout changes. But when layout changes in a way that adds another dimension of memory unsafety, we want to force a source to break at that time so the maintainer of the client sees the problem.

tera · February 23, 2023, 9:44pm

+1 to this proposal. AFAIK it solves everything we discussed recently, does this to a much greater level of detail and is backed with the implementation already, so well done.

memcpy works at the byte level (i.e. it can't copy 7 bits of a byte without copying the 8th (memory I/O is done at a byte granularity)) and thus memcpy doesn't even talk about bits in its man page. However I agree that "BitwiseCopyable" is a good name to use here and not any worse than a potential "BytewiseCopyable" alternative.

I guess Double could follow the suit.

What about the tuples? Will there be some automatic conformance that says: the tuple of "BitwiseCopyable" elements is "BitwiseCopyable" itself? Or will it be possible to manually conform a tuple class to "BitwiseCopyable" (provided its elements are "BitwiseCopyable")?

On a point of clarification, would the "BitwiseCopyable" applied to a struct mean what "POD" currently means? In other words will it follow (1) or (2) below?

On that front, could BitwiseCopyable be applied to a reference type? (obviously the one that doesn't contain reference fields in its payload). I guess – "no" and it is worth to mention it in the proposal (unless it is already mentioned and I just missed it).

xwu · February 23, 2023, 10:55pm

I'm assuming unstated here is that BitwiseCopyable will be a formalization of _Trivial as a layout constraint and thus retrofittable into an existing hierarchy, since ordinary protocols are impossible to slide into existing standard library protocol hierarchies in an ABI-compatible way?

Anyway, whether possible or not, do we get much mileage in baking in an assumption ahead of time like this about FixedWidthInteger, versus blessing just the concrete built-in fixed-width integer types (and can we include floating-point types, as mentioned above)? Do you have concrete use cases in mind for generic or opaque some FixedWidthInteger that we'd like to support specifically?

ksluder · February 23, 2023, 11:53pm

Please don’t forget about SIMD types, which aren’t themselves numeric but are composed of numeric values. This extends to some SIMD types which aren’t currently modeled in the stdlib, like matrices (e.g. simd_float4x4) and packed vectors (e.g. simd_packed_float16). These types are critical to efficiently working with Metal buffers.

Andrew_Trick · February 24, 2023, 6:31pm

This all needs to be discussed in a separate BitwiseCopyable proposal. But since that's delayed, we're forced to review this somewhat out-of-order. Meanwhile, I can answer these questions with fair confidence...

BitwiseCopyable is a formalization of the _Trivial layout constraint and the _isPOD() runtime type check.

BitwiseCopyable is a "layout constraint" as opposed to a marker protocol, meaning that concrete types automatically conform when their elements are BitwiseCopyable. So, yes, that applies to tuples. It does not apply to reference types.

FixedWidthInteger catches reasonable use cases where code is generic over some integer size and still wants to manipulate raw bytes. Once we have BitwiseCopyable, all types that currently conform to FixedWidthInteger will automatically conform. That gives us an extra convenient workaround today, and saves generic code from obvious redundancy in the future: <T: FixedWidthInteger & BitwiseCopyable>.

For example, String's internal implementation is generic over CodeUnit:

protocol _UnicodeEncoding {
  /// The basic unit of encoding
  associatedtype CodeUnit: UnsignedInteger, FixedWidthInteger

SwiftNIO has some great examples of working with raw bytes in a way that's generic over FixedWidthInteger:

github.com

apple/swift-nio/blob/b2629903ca496716a81f263288ae74f99b9dd89c/Sources/NIOCore/ByteBuffer-multi-int.swift

//===----------------------------------------------------------------------===//
//
// This source file is part of the SwiftNIO open source project
//
// Copyright (c) 2021 Apple Inc. and the SwiftNIO project authors
// Licensed under Apache License v2.0
//
// See LICENSE.txt for license information
// See CONTRIBUTORS.txt for the list of SwiftNIO project authors
//
// SPDX-License-Identifier: Apache-2.0
//
//===----------------------------------------------------------------------===//

/// NOTE: THIS FILE IS AUTO-GENERATED BY dev/generate-bytebuffer-multi-int.sh

extension ByteBuffer {
    @inlinable
    @_alwaysEmitIntoClient
    public mutating func readMultipleIntegers<T1: FixedWidthInteger, T2: FixedWidthInteger>(endianness: Endianness = .big, as: (T1, T2).Type = (T1, T2).self) -> (T1, T2)? {

This file has been truncated. show original

@glessard pointed out that all these protocols should by constrained to be BitwiseCopyable: FixedWidthInteger, _Pointer, SIMDStorage, and SIMDScalar

ksluder · February 24, 2023, 6:45pm

Since this pitch is not blocked on implementation of BitwiseCopyable, should it not explicitly enumerate all such reasonable types? Once BitwiseCopyable is accepted and implemented, the rule can switch over.

Andrew_Trick · February 24, 2023, 7:18pm

That's a good point, but I think there should be practical benefit to complicating the compiler (it's a layering violation to expose these individual protocols). This proposal rests on the assumption that we'll have BitwiseCopyable before converting the diagnostic to an error. In the meantime withUnsafe APIs are fine workarounds for the extremely rare case of manipulating raw bytes in generic code. We're only making an exception for FixedWidthInteger because it occurs in the wild in Swift packages.

tera · February 25, 2023, 10:28am

Found a use case that would not be supported by the pitch:

struct S {
    var a: Int
    var b: Double
    .. many many POD fields
   .. and then
    var cachedEntry: RefType?
}

whereas the struct has many POD fields but also a few ref field(s) which are unimportant for struct recreation. Usage pattern:

func write(_ s: S) {
    var copy = s
    copy.cachedEntry = nil
    fwrite(&copy, ...)
}
init() {
    var s: S!
    fread(&s)
    self = s
}

This pattern could work in principle (nil pointers are bitwise copyable) but understandably it won't be supported because we are after a static compile time checked solution that won't know if the pointer is nil or non nil during compilation time. The workaround would be to split the struct above into the main POD-only part and the secondary ref-part and embed them into a single parent struct to work with.

Andrew_Trick · February 25, 2023, 11:21pm

An explicit "unsafe" cast tells the compiler that you know what you're doing:
withUnsafeBytes(&copy) { ... }

ksluder · February 26, 2023, 2:04am

At the slightly annoying cost of extra indentation and having to spell out $0.baseAddress, because withUnsafeBytes(of:) gives you an UnsafeRawBufferPointer.

Andrew_Trick · February 28, 2023, 5:41am

If there aren't any serious concerns, I'd like to enable this warning to get initial developer feedback before the evolution PR goes into formal review.

Ling_Wang · June 22, 2023, 4:05am

I found this proposal from this warning

Forming 'UnsafeRawPointer' to an inout variable of type String exposes the internal representation rather than the string contents.

in my objc_getAssociatedObject code with an inout String key.

However, I couldn't find the final proposal in Swift Evolution. What's the status?

I'm not sure this suggested workaround is correct:

class Container {
  static var key = "key"

  func getObject() -> Any? {
    withUnsafePointer(to: Container.key) {
      return objc_getAssociatedObject(self, $0)
    }
  }
}

Notice it is using the non-inout version of withUnsafePointer which I believe gives the address of a temp var in each invocation, how could it be used to access the associated object reliably?

Even with the inout version, is it guaranteed to give a consistent pointer value across different calls for a static var?

Andrew_Trick · June 22, 2023, 4:25pm

The core team deferred this proposal because they want to see BitwiseCopyable added to the language before it is accepted. Meanwhile, we still need to issue warnings to prevent some serious problems from creeping into more code.

The 5.9 release notes refer to a github issue instead: Constrain implicit raw pointer conversion to bitwise-copyable values #64927.

Which, in turn, refers to the unaccepted proposal text :)
To workaround these warnings, please refer to Workarounds for common cases.

Great point! Although I personally think the compiler is wrong for creating a temporary in this case, rather than enforcing exclusivity, that is indeed what it does. You don't need to use withUnsafeMutablePointer but you do need to pass the key 'inout':

withUnsafePointer(to: &Container.key) {

I added a note about this in the workarounds and corrected one of the examples where the '&' had been incorrectly removed at some point.

Even this inout usage is not formally supported, but it will work under the same conditions as the code without the workaround. In other words, if you were able to pass an 'inout' reference to objc_getAssociatedObject, then passing the same inout reference to withUnsafePointer will also work. So, in general, this is a safe workaround.

Ling_Wang · June 23, 2023, 4:37am

I'm still not sure about the inout case. @Joe_Groff's SE proposal 0205 says:

Even for inout arguments, the semantics of withUnsafeBytes and
withUnsafePointer are extremely narrow: the pointer cannot be used after
the closure returns, cannot be mutably aliased, and is not guaranteed to
have a stable identity across withUnsafe* applications to the same
property.

Andrew_Trick · June 26, 2023, 10:09pm

the pointer...is not guaranteed to
have a stable identity across withUnsafe* applications to the same
property

This caveat applies to inout arguments in general. Using withUnsafePointer(to: &property) only moves the same inout-to-pointer conversion up a level, making it more explicit. It does not make the pointer any less stable than it would be with an implicit conversion of an argument passed to a C function. In practice, you can expect static properties (globals) to have a stable address, and adding the workaround makes it no less safe.

Swift does not have a formally supported way to convert a value to a stable address. But it does have stable object identities, and it has a way to convert an object reference into a pointer. If you want to follow the letter of the law, you would need the associated object key to be a separate object:

Alternatively, you can use the key's object identity rather the address of the property. But this only works with objects that require separate allocation at instantiation. Neither NSString, nor NSNumber can be safely used. NSObject is a safe bet:
class Container {
  static var key = NSObject()

  func getID(_ object: AnyObject) -> UnsafeRawPointer {
    return UnsafeRawPointer(Unmanaged.passUnretained(object).toOpaque())
  }
  func getObject() -> Any? {
    return objc_getAssociatedObject(self, getID(Container.key))
  }
}

drewster99 · August 23, 2023, 4:08pm

Hey all!

For associatedObject keys, the pitch listed this as a "safer and more principled approach" for dealing with associatedObject keys:

@propertyWrapper
struct UniqueAddress {
  var _placeholder: Int8 = 0
 
  var wrappedValue: UnsafeRawPointer {
    mutating get {
      // This is "ok" only as long as the wrapped property appears
      // inside of something with a stable address (a global/static
      // variable or class property) and the pointer is never read or
      // written through, only used for its unique value
      return withUnsafeBytes(of: &self) {
        return $0.baseAddress.unsafelyUnwrapped
      }
    }
  }
}

class Container {
  @UniqueAddress static var key
  
  func getObject() -> Any? {
    return objc_getAssociatedObject(self, Container.key)
  }
}

We tried this implementation but Thread Sanitizer gave us a lot of warnings when multiple threads were reading, because every get is "mutating" (even though it's really not). Switching the property wrapper to a class instead of a struct, we ended up with this, which now uses the non-inout version of withUnsafeBytes:

@propertyWrapper
public class UniqueAddress {
    private var _placeholder: Int8 = 0
    
    public var wrappedValue: UnsafeRawPointer {
        // This is "ok" only as long as the wrapped property appears
        // inside of something with a stable address (a global/static
        // variable or class property) and the pointer is never read or
        // written through, only used for its unique value
        return withUnsafeBytes(of: self) {
            return $0.baseAddress.unsafelyUnwrapped
        }
    }
    
    private init(_placeholder: Int8) {
        self._placeholder = _placeholder
    }
    
    public init() { }
}

That gets rid of all the warnings, both compile-time and run-time, but is it correct?

Alternatively -- why is this wrong?:

private struct Container {
    static var myKey: Void?
}

//elsewhere
objc_setAssociatedObject(self, &Container.myKey, value)

Thanks!!