What is a "copy"?

It might sound like an "easy" answer… but I am struggling to find where the definition of this might be documented.

For example:

This tells me that a type which is Copyable is a type that can be copied… but then what does that mean?

There might be some more clues in SE-0390:

All currently existing types in Swift are copyable, meaning it is possible to create multiple identical, interchangeable representations of any value of the type.

But then what does that mean? Are "identical" and "interchangeable" defined anywhere else across Swift? I don't see much more discussion about what that means or what that implies.

If we try and work to a formal definition of this:

if b = copy(a) then a.isIdentical(to: b)
if b = copy(a) then a.isInterchangeable(to: b)

Unstated and undefined are axioms of Reflexivity:

a.isIdentical(to: a) is always true
a.isInterchangeable(to: a) is always true

Symmetry:

a.isIdentical(to: b) implies b.isIdentical(to: a)
a.isInterchangeable(to: b) implies b.isInterchangeable(to: a)

and Transitivity:

a.isIdentical(to: b) and b.isIdentical(to: c) implies a.isIdentical(to: c)
a.isInterchangeable(to: b) and b.isInterchangeable(to: c) implies a.isInterchangeable(to: c)
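
To make the three axioms concrete, here is a minimal sketch that checks them for an arbitrary relation over a handful of sample values. `isEquivalence` is a hypothetical helper, not a standard library API, and passing the check on a finite sample only fails to find a counterexample; it does not prove the laws.

```swift
/// Checks reflexivity, symmetry, and transitivity of `relation` over `samples`.
/// Hypothetical helper for illustration only.
func isEquivalence<T>(_ samples: [T], _ relation: (T, T) -> Bool) -> Bool {
    for a in samples {
        // Reflexivity: every value is related to itself.
        guard relation(a, a) else { return false }
        for b in samples {
            // Symmetry: the relation holds in both directions or in neither.
            guard relation(a, b) == relation(b, a) else { return false }
            for c in samples where relation(a, b) && relation(b, c) {
                // Transitivity: a~b and b~c must give a~c.
                guard relation(a, c) else { return false }
            }
        }
    }
    return true
}

let integers = [1, 2, 2, 3]
let equalityHolds = isEquivalence(integers) { $0 == $1 }     // true
let lessOrEqualHolds = isEquivalence(integers) { $0 <= $1 }  // false: <= is not symmetric
```

Any candidate definition of "identical" or "interchangeable" could be plugged in as the relation.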

If the definition of Copyable presented in SE-0390 was meant to imply that a representation that is identical must be interchangeable, and that a representation that is interchangeable must be identical, that would simplify things a little… but AFAIK it still leaves some more basic questions unanswered. For example: if a representation is identical and interchangeable, does this imply that it is a copy? Or are some representations that are identical and interchangeable not copies?

A slightly different idea is presented in SE-0426:

[Types that are "bitwise-copyable"] can be moved or copied with direct calls to memcpy and […] require no special destroy operation.

I believe the unstated implication is that a copy produced by memcpy must then compare equal using memcmp:

if b = bitwiseCopy(a) then memcmp(&a, &b, sizeof(a)) == 0
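
As a rough illustration of that claim (memcmp reports equality by returning 0), here is a sketch that compares the in-memory bytes of a value and its copy without importing a C library. `sameBytes` is a hypothetical helper, and the example deliberately uses UInt64, which has no padding; whether the claim holds for types with padding is a separate matter.

```swift
/// Returns true when the in-memory bytes of `a` and `b` are equal.
/// Hypothetical stand-in for `memcmp(...) == 0`.
func sameBytes<T>(_ a: T, _ b: T) -> Bool {
    withUnsafeBytes(of: a) { lhs in
        withUnsafeBytes(of: b) { rhs in
            // Both buffers are collections of UInt8 with MemoryLayout<T>.size elements.
            lhs.elementsEqual(rhs)
        }
    }
}

let original: UInt64 = 0xDEAD_BEEF
let duplicate = original
let bytesMatch = sameBytes(original, duplicate)   // true for this padding-free type
```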

But I see no mention in SE-0426 of whether or not we consider a and b to be identical or interchangeable. A "future direction" does suggest that "BitwiseCopyable could be defined as the composition of several protocols":

typealias BitwiseCopyable = Bitwise & Copyable & DefaultDeinit

But I don't see where we currently "inherit" the concepts of a copy being identical and interchangeable from Copyable on BitwiseCopyable.

Another slightly different idea is presented in the definition of Equatable:

If we accepted the axiom of Reflexivity on our original identical and interchangeable properties and we add the axiom of Reflexivity from Equatable that seems to imply that a representation that is identical and interchangeable is also "substitutable". But this doesn't help me much to look for a definition of identical or interchangeable on a type that is not Equatable. There also seems to be a subtle implication here that it would be possible for a representation to be substitutable and not identical and interchangeable.

Could there be any other place in the repos where there might be more documentation here on these topics? Would there have been any evolution proposals that spent more time on these questions?

For a Copyable type, copies are literally interchangeable with each other without changing the surface meaning of the program (though obviously can affect the performance characteristics). If you have

foo(a)

then

let b = copy a
foo(b)

will behave exactly the same. The compiler by this principle is free to insert or remove copies to improve performance or maintain language semantics.
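
A self-contained version of that snippet, as a sketch: `Point`, `foo`, and `callBothWays` are made-up names, and the `copy` operator assumes a Swift 5.9 or later toolchain.

```swift
struct Point {
    var x: Int
    var y: Int
}

/// Any function of the value will do; this one just combines the fields.
func foo(_ p: Point) -> Int { p.x * 10 + p.y }

func callBothWays() -> (direct: Int, viaCopy: Int) {
    let a = Point(x: 1, y: 2)
    let direct = foo(a)      // call with the original
    let b = copy a           // explicit copy
    let viaCopy = foo(b)     // call with the copy
    return (direct, viaCopy)
}

let results = callBothWays()
// results.direct and results.viaCopy are both 12
```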

This is not implied, nor is it implied in C where memcpy() originates. Values may contain padding bytes or even padding bits of undefined value, and those bytes may on some CPU architectures (hi Alpha) not even contain valid bit patterns representable as bytes!

If a value’s type is Copyable, you can make a copy of the value, and this copy has its own lifetime independent of the original value.

The primitive operation of copying a value could do anything (if you have an imported C++ type with a copy constructor), but for types defined in Swift the copy is always bitwise identical to the original value, although copying may also have to update reference counts.

Now, your question was perhaps asking for a precise definition of the sense in which the original value and the copy are “identical”. I don’t think there’s a really formal definition here because at the end of the day, operations on the value could read alignment padding, or look at the address of the value, and then the copy is definitely not going to behave the same.

But to get around the lack of equality, you can say this. For every pure “value semantics” function f that takes your value’s concrete type and returns a Bool, you should have f(x) == f(copyOfX). So if two values cannot be distinguished by any property you can test, then they are identical or interchangeable in the sense meant by the proposal.
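
One way to picture this test, as a sketch only: `Account`, `indistinguishable`, and the particular predicates are invented for illustration, and a finite list of predicates merely samples the "every pure function f" quantifier rather than establishing it.

```swift
struct Account {
    var owner: String
    var balance: Int
}

/// True when no predicate in `observations` can tell `value` and `other` apart.
func indistinguishable<T>(_ value: T, _ other: T,
                          under observations: [(T) -> Bool]) -> Bool {
    observations.allSatisfy { f in f(value) == f(other) }
}

let observations: [(Account) -> Bool] = [
    { $0.balance > 100 },
    { $0.owner.hasPrefix("A") },
    { $0.owner.count.isMultiple(of: 2) },
]

let x = Account(owner: "Ada", balance: 250)
let copyOfX = x
var changed = x
changed.balance = 50

let copyPasses = indistinguishable(x, copyOfX, under: observations)      // true
let mutationPasses = indistinguishable(x, changed, under: observations)  // false
```

Note that `Account` is not Equatable, so this works even for types without `==`.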

This all makes sense to me. I think one remaining question here is if we have any definitions of "identical" and "interchangeable" that are independent of each other. Do we believe there is ever a representation that is identical and is not interchangeable? Or is this an "if and only if" relationship where identical and interchangeable are just two different words that mean the same thing?

If a representation a is always interchangeable with itself — which sounds like a reasonable assumption — and we have the if and only if relationship back to identical, I think we then have the axiom of Reflexivity on identical where a representation a is always identical to itself. Correct?

What about going from a representation being identical to a also being a copy of a? Was that also an if and only if relationship such that all identical representations are also legit copies?

Edit: Utter reading comprehension failure.

original text

Trivial example:

struct UInt48: BitwiseCopyable, Comparable, BinaryInteger, ... {
  var lo: UInt16
  // var __implicit_padding: UInt16
  var hi: UInt32
}

The bits in __implicit_padding don't matter for this type. Copy an instance of this type, randomize the bits in __implicit_padding, and it will still be identical to the original as far as well-formed code is concerned.

You don't even need to manually stir the bits: the compiler is allowed to make a copy by splatting the fields into a new location rather than copying every bit à la memcpy(). This, consequently, is why memcmp() on composite types is unsafe in the general case, and why the C standard has always been weaselly about "trap representations" and the like.
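
The padding in a layout like the UInt48 example above can be observed with MemoryLayout. This is a sketch assuming a typical platform where UInt32 is 4-byte aligned; `Padded` is a hypothetical stand-in for the struct above without its protocol conformances.

```swift
struct Padded {
    var lo: UInt16
    // Two bytes of implicit padding go here so that `hi` is 4-byte aligned.
    var hi: UInt32
}

let fieldBytes = MemoryLayout<UInt16>.size + MemoryLayout<UInt32>.size  // 2 + 4
let totalBytes = MemoryLayout<Padded>.size
let hiOffset = MemoryLayout<Padded>.offset(of: \Padded.hi)

// Bytes that belong to the value's storage but to none of its fields.
let paddingBytes = totalBytes - fieldBytes
```

Those padding bytes are what a bytewise comparison would look at and what a field-by-field comparison would ignore.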

Lemme try this again… yes, sorta? An example would be a lock such as os_unfair_lock or pthread_mutex_t where the bits of the value might be the same, but the address of the value provides its true identity. So the bits of two values may compare equal, but you cannot interchange them.

I don't know if that satisfies the constraints you're thinking of though, since the address of a value is not an innate property of the value itself (doubly not so in Swift.)

Hmm… so from what I understand so far about these two examples:

Note that pthread_mutex_t, pthread_rwlock_t, and os_unfair_lock are value types, not reference types. That means that if you use = on them, you make a copy. This is important, because these types can't be copied! […] If you use these types, you must be careful never to copy them, whether explicitly with a = operator, or implicitly by, for example, embedding them in a struct or capturing them in a closure.

So to maybe tie this back to the discussion about copying… could we say that if a type T is copyable then a representation t of type T that is identical is also interchangeable? And we then condition the reflexivity of identical on this type being copyable?

I believe Mike was writing in re C and Objective-C there. In C, a value always has an address unless declared with the register keyword (although it may be promoted to a register or constant value by the compiler if the address of the value is never observed).

Swift is stricter: a value only has an address while you're explicitly asking for it via withUnsafePointer(to:) or similar. When nothing is observing a value's address, a value in Swift may be copied arbitrarily. Technically, withUnsafePointer(to:) makes a copy of the value you pass to it, and the original value may never actually be assigned to any address at all. A value like os_unfair_lock is identified by its address, so it cannot even be touched by Swift. This is why we introduced OSAllocatedUnfairLock (which heap-allocates its lock to ensure it has a fixed address.)
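
The general idea of giving a value a fixed address by heap-allocating it can be sketched as follows. `StableBox` is a hypothetical type written for this illustration; it is not how OSAllocatedUnfairLock is actually implemented, and it does no locking at all.

```swift
/// Owns one heap-allocated `Value` whose address never changes while the box is alive.
final class StableBox<Value> {
    private let storage: UnsafeMutablePointer<Value>

    init(_ value: Value) {
        storage = UnsafeMutablePointer<Value>.allocate(capacity: 1)
        storage.initialize(to: value)
    }

    deinit {
        storage.deinitialize(count: 1)
        storage.deallocate()
    }

    /// The address of the boxed value, fixed for the lifetime of the box.
    var address: UInt { UInt(bitPattern: storage) }

    /// Hands out the stable pointer for the duration of `body`.
    func withPointer<R>(_ body: (UnsafeMutablePointer<Value>) throws -> R) rethrows -> R {
        try body(storage)
    }
}

let box = StableBox<UInt32>(0)
let boxAlias = box   // copies the reference, not the boxed value
let firstAddress = box.withPointer { UInt(bitPattern: $0) }
let secondAddress = boxAlias.withPointer { UInt(bitPattern: $0) }
// firstAddress, secondAddress, and box.address all name the same location
```

Copying the reference is harmless because every copy leads to the same storage, which is the property an address-identified value needs.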

(The less said about Swift's Synchronization.Mutex here, the better. It's effectively special-cased.)

I'm still unsure if this discussion about mutexes is relevant to your question, but at the very least if I exchange the values assigned to two variables of type os_unfair_lock, I didn't actually exchange the locks themselves, just their internal states. But if their bit patterns are identical and the exchange is performed atomically with respect to an observer, then with respect to that observer nothing has changed and the rest of the system will chug along happily without knowing you did anything at all.

Of course, this all assumes that there's no DMA or mapped memory involved, because it's possible that poking a value into a memory-mapped register, even if that value equals the old value, will perturb some external state. Again, maybe beyond the scope of your question, but worth mentioning?

Edit: :scream:

You must be careful with the pthread locks, because you can create a value using the empty () initializer, but that value won't be a valid lock. These locks must be separately initialized using pthread_mutex_init or pthread_rwlock_init:

var mutex = pthread_mutex_t()
pthread_mutex_init(&mutex, nil)

Yeah, don't do that. This is UB in Swift. See Russ Bishop's perennial post here.

It might be possible to define such a distinction. As Slava noted, depending on the type, there may be types with machine-level representations with padding, for which values that vary in the padding bits only are arguably not truly identical, but according to the language semantics are fully interchangeable, since the padding bits are considered undefined. Types could also have multiple logical representations that are not entirely identical in their defined bits but are still interchangeable, such as two closures over the same captures for the same function.

Wouldn’t the simple example of NaN for floating-point numbers fulfill that?
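
For reference, a small sketch of how NaN behaves. The specific payload in `c` is an arbitrary choice for illustration, and the bit-pattern comparison of a copy assumes mainstream hardware where moving a Double does not alter a NaN's bits.

```swift
let a = Double.nan
let b = a                                         // a plain copy

let equalToCopy = (a == b)                        // false: NaN never compares equal
let sameBits = (a.bitPattern == b.bitPattern)     // the copy has the same bits

// A second quiet NaN with a different payload (arbitrary, for illustration).
let c = Double(bitPattern: 0x7FF8_0000_0000_0001)
let bothNaN = a.isNaN && c.isNaN                  // both are NaN...
let differentBits = (a.bitPattern != c.bitPattern) // ...with different representations
```

So a and b are bitwise the same yet not `==`, while a and c differ in their bits yet are both "NaN" to most code.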

While you could define a distinction, I believe that in all of the documentation sources you cite, these words are used as synonyms:

  • interchangeable
  • identical
  • substitutable

The meaning is essentially that two values behave the same in all observable ways.

You might find Stepanov's book "Elements of Programming" interesting. It's an attempt to formalize "value semantics", so it touches upon all of the above concepts. It also tries to address some of the things Joe mentioned with representational issues around alignment and padding.

A tangent about class instance variable address stability.

I assumed the above code is safe, considering var mutex is an instance variable of a class. I understand the worry about the & operator being magical in Swift (compared to C), in that it might give not the true address of a variable but the address of some temporary variable. In pseudocode:

foo(&a) // some C call
// var temp = a // getter
// temp address is passed to foo
// upon return: a = temp // setter

However in reality I could not see this happening.
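
For comparison, the getter/temporary/setter sequence in the pseudocode is directly observable when the property is computed rather than stored. This is a sketch with made-up names (`Recorder`, `poke`, `runWritebackDemo`); it says nothing about what the compiler does for stored properties like the one in the test below.

```swift
final class Recorder {
    private var storage: UInt8 = 0
    private(set) var log: [String] = []

    // A computed property: `&value` cannot point at `storage` directly.
    var value: UInt8 {
        get { log.append("get"); return storage }
        set { log.append("set"); storage = newValue }
    }
}

/// Stands in for "some C call" that writes through the pointer.
func poke(_ pointer: UnsafeMutablePointer<UInt8>) {
    pointer.pointee += 1
}

func runWritebackDemo() -> (log: [String], value: UInt8) {
    let recorder = Recorder()
    poke(&recorder.value)      // getter -> temporary -> poke -> setter
    let log = recorder.log     // capture the log before reading `value` again
    return (log, recorder.value)
}

let demo = runWritebackDemo()
// demo.log is ["get", "set"] and demo.value is 1
```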

How do I break this test?

import Foundation // for replacingOccurrences(of:with:)

final class Test {
    var value: UInt8 = 0
    
    func foo() {
        let selfAddress = unsafeBitCast(self, to: UInt.self)
        // print("mallocSize: ", malloc_size(UnsafeRawPointer(bitPattern: selfAddress))) // 17 in tests
        let s = "\(ObjectIdentifier(self))"
            .replacingOccurrences(of: "ObjectIdentifier(0x", with: "")
            .replacingOccurrences(of: ")", with: "")
        let objectIdentifier = UInt(s, radix: 16)!
        
        let withUnsafeMutablePointerAddress = withUnsafeMutablePointer(to: &value) { UInt(bitPattern: $0) }
        let withUnsafePointerAddress = withUnsafePointer(to: &value) { UInt(bitPattern: $0) }
        let withUnsafeMutableBytesAddress = withUnsafeMutableBytes(of: &value) { UInt(bitPattern: $0.baseAddress!) }
        let withUnsafeBytesAddress = withUnsafeBytes(of: &value) { UInt(bitPattern: $0.baseAddress!) }
        let f1Address = f1(&value)
        let f2Address = f2(&value)
        let f3Address = f3(&value)
        let f4Address = f4(&value)

        precondition(objectIdentifier == selfAddress)
        precondition(withUnsafeMutablePointerAddress - selfAddress <= 32) // was 16 in tests
        precondition(withUnsafePointerAddress == withUnsafeMutablePointerAddress)
        precondition(withUnsafeMutableBytesAddress == withUnsafeMutablePointerAddress)
        precondition(withUnsafeBytesAddress == withUnsafeMutablePointerAddress)
        precondition(f1Address == withUnsafeMutablePointerAddress)
        precondition(f2Address == withUnsafeMutablePointerAddress)
        precondition(f3Address == withUnsafeMutablePointerAddress)
        precondition(f4Address == withUnsafeMutablePointerAddress)
    }
    
    func f1(_ value: UnsafePointer<UInt8>) -> UInt { UInt(bitPattern: value) }
    func f2(_ value: UnsafeMutablePointer<UInt8>) -> UInt { UInt(bitPattern: value) }
    func f3(_ value: UnsafeRawPointer) -> UInt { UInt(bitPattern: value) }
    func f4(_ value: UnsafeMutableRawPointer) -> UInt { UInt(bitPattern: value) }
}

Test().foo()
print("ok")

I tried debug/release, with various diagnostic settings like thread sanitiser / UB sanitiser. Tried on godbolt and Xcode.

Please help me break this test.

The compiler is smart enough not to do extra work here, but if you make this class public, enable library evolution, and access it from another module, sadness will likely occur.

Great. Making the class public did indeed help to break it (I also made all members public, just in case). Specifically, tests f1 and f3 failed (the immutable unsafe pointer variants). The other tests still pass (though I haven't yet tried your other hints about library evolution and modules).

Worth checking if the pointers you get back are on the stack or the heap. The compiler may reuse stack addresses here which would cause false positives (pointer has same bit pattern, but different provenance.)

FWIW, in the Xcode version I used malloc_size (now commented out in the snippet), which gave sane results (like 17), hinting that it's a heap address. I don't know what to import to be able to use it on godbolt. The stack address check should be easy (a simple check against the current thread's stack base and size), but how do I do an isHeap() check?

I believe I also found references in standard library to "distinguishability":

So if equality implies substitutability by definition we then have elements that can be substitutable and distinguishable.

The === operator is "informally" also referred to as the "identical-to" operator:

Which helps tie distinguishability back to identity.

So far I see "identical-indistinguishable" mostly implying "interchangeable-substitutable"… but I don't usually see the other direction as holding.
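
A small sketch of that asymmetry for a class type: `Token` and its `==` are invented for illustration. Two distinct instances can be equal (and so substitutable as far as `==` is concerned) without being identical, while an identical reference is trivially equal.

```swift
final class Token: Equatable {
    let id: Int
    init(id: Int) { self.id = id }

    // Equality looks only at `id`, not at which object this is.
    static func == (lhs: Token, rhs: Token) -> Bool { lhs.id == rhs.id }
}

let tokenA = Token(id: 7)
let tokenB = Token(id: 7)   // a separate object with the same contents
let tokenAlias = tokenA     // another reference to the same object

let equalButDistinct = (tokenA == tokenB) && (tokenA !== tokenB)        // true
let identicalAndEqual = (tokenA === tokenAlias) && (tokenA == tokenAlias) // true
```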