[Pitch] BitwiseCopyable marker protocol

ksluder · February 11, 2024, 5:20pm

Whether a type is bitwise-copyable is orthogonal to the internal organization of those bits (or bytes, for that matter). struct Foo { let a: Int16; let b: Float } is bitwise-copyable, but if I copy bits [8, 32), aka bytes [1, 4), I don’t get any sensible value.
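As a rough illustration of the point about Foo, the sketch below asks MemoryLayout how the struct is laid out. The numbers in the comments assume the usual layout on current 64-bit Apple and Linux targets, where Float is 4-byte aligned; they are an observation, not a language guarantee.

```swift
struct Foo {
    let a: Int16
    let b: Float
}

// All of these values are reported in bytes, not bits.
let fooSize = MemoryLayout<Foo>.size            // 8: 2 bytes for `a`, 2 bytes of padding, 4 bytes for `b`
let fooAlignment = MemoryLayout<Foo>.alignment  // 4, inherited from Float
let offsetOfB = MemoryLayout<Foo>.offset(of: \Foo.b)  // Optional(4)
```

Under that layout, bytes [1, 4) cover half of `a` plus two padding bytes and none of `b`, which is why copying just that range yields nothing meaningful.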

dnadoba · February 11, 2024, 6:19pm

That's right. Copying bits vs bytes doesn't make a difference for any types in Swift today. However, all types in Swift are always sized in bytes, not bits. E.g. the numbers MemoryLayout<T> returns are all in bytes, not in bits. The implementation of the proposed StorageView.loadUnaligned will need to use MemoryLayout<T>.size to get the size of the type and therefore can in fact only copy full bytes.
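A minimal sketch of a whole-byte unaligned load, using the existing standard library method UnsafeRawBufferPointer.loadUnaligned(fromByteOffset:as:) rather than the proposed StorageView API, whose final shape is not assumed here. The byte values are made up for illustration.

```swift
// Five bytes; the UInt32 of interest starts at the odd offset 1.
let packet: [UInt8] = [0x00, 0x78, 0x56, 0x34, 0x12]

let decodedValue: UInt32 = packet.withUnsafeBytes { raw in
    // loadUnaligned copies MemoryLayout<UInt32>.size (4) whole bytes,
    // regardless of the alignment of the source address.
    let native = raw.loadUnaligned(fromByteOffset: 1, as: UInt32.self)
    // Interpret the bytes as little-endian so the result is host-independent.
    return UInt32(littleEndian: native)
}
// decodedValue == 0x1234_5678
```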

FixedWidthInteger, on the other hand, has a static bitWidth property. C has bit fields that allow bit-sized properties.
I'm looking more into the future where we might want to support reading/writing integers that are not complete bytes and use the term BitwiseCopyable for this instead. This protocol could then also require a static sizeInBits property, similar to FixedWidthInteger. Note that the memory layout of these types will likely still be full bytes, they can just be read/written from/to a packed representation e.g. a C bit field or binary networking protocols.
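To show what reading from a packed representation could look like today, here is a small sketch. readBits is a hypothetical helper written for this example, not an existing or proposed API, and the MSB-first bit order is an assumption.

```swift
/// Hypothetical helper: reads `count` bits (1...64) starting at bit `bitOffset`,
/// most significant bit first, from a packed byte array.
func readBits(_ bytes: [UInt8], bitOffset: Int, count: Int) -> UInt64 {
    precondition(count >= 1 && count <= 64)
    precondition(bitOffset >= 0 && bitOffset + count <= bytes.count * 8)
    var result: UInt64 = 0
    for i in bitOffset..<(bitOffset + count) {
        let byte = bytes[i / 8]
        // Pick out bit `i % 8` of this byte, counting from the most significant bit.
        let bit = (byte >> UInt8(7 - i % 8)) & 1
        result = (result << 1) | UInt64(bit)
    }
    return result
}

// A 4-bit field followed by a 6-bit field, then padding.
let packedFields: [UInt8] = [0b1011_0110, 0b0100_0000]
let firstField = readBits(packedFields, bitOffset: 0, count: 4)   // 0b1011 = 11
let secondField = readBits(packedFields, bitOffset: 4, count: 6)  // 0b011001 = 25
```

The values still land in whole-byte integers in memory; only the packed source uses sub-byte widths.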

ksluder · February 11, 2024, 6:59pm

You can’t store values in anything other than an integer multiple of bytes on any modern architecture. Therefore, there’s no such thing as “reading/writing integers that are not complete bytes”. Swift could invent some pseudo bit type without inspectable storage, but it would ultimately be backed by bytes, so would it be worth it?

dnadoba · February 11, 2024, 11:06pm

This would be an argument for calling the protocol BytewiseCopyable.

There might not be a single instruction, but you can build it from multiple instructions using shifting and masking. This can get quite complex; e.g., see how AnyObject actually implements reference counting and packs the strong reference count, unowned reference count, and flags into one word. On 64-bit platforms the strong reference count uses 30 bits and the unowned reference count 31 bits; on 32-bit platforms they use 22 bits and 8 bits respectively.
Note the FIXME: redo this abstraction more cleanly.
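The shifting-and-masking idea can be sketched as follows. The field widths reuse the 64-bit numbers quoted above, but the bit positions and the PackedCounts type are invented for this example and do not mirror the Swift runtime's actual layout.

```swift
struct PackedCounts {
    private(set) var bits: UInt64 = 0

    // Hypothetical positions: flags in bits 0..<3, unowned count in 3..<34,
    // strong count in 34..<64.
    private static let unownedShift: UInt64 = 3
    private static let strongShift: UInt64 = 34
    private static let unownedMask: UInt64 = (1 << 31) - 1
    private static let strongMask: UInt64 = (1 << 30) - 1

    var strongCount: UInt64 {
        get { (bits >> Self.strongShift) & Self.strongMask }
        set {
            precondition(newValue <= Self.strongMask)
            // Clear the old field, then OR in the new value.
            bits = (bits & ~(Self.strongMask << Self.strongShift))
                | (newValue << Self.strongShift)
        }
    }

    var unownedCount: UInt64 {
        get { (bits >> Self.unownedShift) & Self.unownedMask }
        set {
            precondition(newValue <= Self.unownedMask)
            bits = (bits & ~(Self.unownedMask << Self.unownedShift))
                | (newValue << Self.unownedShift)
        }
    }
}

var counts = PackedCounts()
counts.strongCount = 5
counts.unownedCount = 7
```

Each field is read or written with a few instructions, and updating one field leaves the other untouched.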

The compiler already makes use of spare bits of types in enums and Swift could expose this as API in the future and extend it to structs.
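One way to observe this today is to compare MemoryLayout sizes. The sketch below reflects the behavior of current compilers, which reuse unused bit patterns of the payload where they exist; Node is a placeholder class for the example.

```swift
final class Node {}

// A class reference has unused bit patterns (e.g. null), so Optional needs no extra byte.
let referenceSize = MemoryLayout<Node>.size
let optionalReferenceSize = MemoryLayout<Node?>.size

// Bool only uses two of its 256 byte patterns, so Bool? still fits in one byte.
let optionalBoolSize = MemoryLayout<Bool?>.size

// Int uses every bit pattern, so Int? needs an extra tag byte.
let optionalIntSize = MemoryLayout<Int?>.size
```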

John_McCall · February 12, 2024, 10:11am

I think you may be misunderstanding what Jordan’s asking about. You’re talking about a constraint that would say that values of a type can be relocated in memory as a bitwise copy, without making any promises about destruction. You’re right that many types would satisfy this without being trivial to destroy. However, Jordan is talking about types that are trivial to destroy and can be bitwise-relocated but for some semantic reason are non-copyable. He’s saying that your suggested name for the constraint forces it to rule out those types when in principle they could be orthogonal — you could have an API that requires Bitwise & ~Copyable, for example.

I do struggle to think of a type that would fall into that category, though. Maybe something that’s using non-copyability to enforce serial accesses without having ownership of any underlying value? Such a type would often also be non-escaping, but I suppose there’s no reason per se to tie this optimization to escapability either. MutableBufferView would be an example.

Joe_Groff · February 12, 2024, 4:33pm

No matter what, Sendable is a special case IMO. It is the only thing in the core language so far that matches the core properties of a "marker protocol"—it has no strict formal requirements, allows for retroactive conformances, and leaves behind no runtime metadata at all. By contrast, the layout constraints we've been talking about are dynamically recoverable from metadata, impose formal requirements on the layout of the type, and can't be retroactively implemented.

With strict concurrency enforced, maybe there's a path to promote Sendable to becoming a layout constraint too; when it becomes a strictly enforced part of the API, then maybe it would start to make sense to emit metadata for it now that conformances can't be added and removed without breaking code. Allowing for retroactive @unchecked Sendable conformances also makes less sense in a legacy-free codebase. I suspect it'll take some time for the ecosystem to get to that point, though.

Outside of the standard library, developers in the broader ecosystem have also already found use for marker protocols in their strictest definition. It might be interesting to develop the concept on its own in a direction separate from these layout constraints. For instance, devs coming from C++ and Rust often want generics that just monomorphize, all the time, and maybe a more general "static protocol" which never emits runtime metadata, like a marker protocol, but also supports formal requirements as long as they're used in situations where the compiler can always specialize them.

tbkka · February 12, 2024, 9:17pm

For a single operation, it's not much. But those small costs can really add up: A container that has an overloaded path for BitwiseCopyable types can safely use bulk memcpy operations to deal with collections of objects at a time. That can make the difference between a single million-byte memcpy and a million calls to a separate function that loads and stores a single byte. That difference can make or break a video- or audio-processing hot path.
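A minimal sketch of such a two-path copy. Because the pitched BitwiseCopyable constraint may not be available in a given toolchain, this uses the underscored standard library query _isPOD as a stand-in; copyElements is a hypothetical helper, and the standard library may already perform a similar optimization internally.

```swift
/// Copies `source` into uninitialized memory at `destination`.
func copyElements<T>(from source: UnsafeBufferPointer<T>,
                     to destination: UnsafeMutablePointer<T>) {
    guard let base = source.baseAddress, source.count > 0 else { return }
    if _isPOD(T.self) {
        // Fast path: a single bulk byte copy, equivalent to one memcpy.
        UnsafeMutableRawPointer(destination).copyMemory(
            from: UnsafeRawPointer(base),
            byteCount: source.count * MemoryLayout<T>.stride)
    } else {
        // General path: copy element by element, retaining references as needed.
        destination.initialize(from: base, count: source.count)
    }
}

let samples: [Int16] = [1, 2, 3, 4]
let output = UnsafeMutablePointer<Int16>.allocate(capacity: samples.count)
samples.withUnsafeBufferPointer { copyElements(from: $0, to: output) }
let copiedSamples = Array(UnsafeBufferPointer(start: output, count: samples.count))
output.deallocate()
```

With a static constraint, the branch would become an overload chosen at compile time instead of a runtime check.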

Thanks to @John_McCall for clarifying. I also misunderstood @jrose's point.

So this is suggesting a "bitwise syntax" that would indicate that a particular code path (type or function) supports values that satisfy these constraints:

Nothing needs to be done to destroy it
If it is copyable, copying can be done with memcpy
If it is moveable, moving can be done with memcpy

And the claim is that combinations of this syntax with copyability and/or moveability constraints would cover a lot of different cases. Hmmmm..... I'm also struggling to come up with a motivating use case...

Joe_Groff · February 12, 2024, 9:24pm

Being able to check independently whether a type has a no-op deinit, and skip an elementwise destructor loop if so, can also be a significant optimization. It doesn't necessarily need to be exposed as a static type constraint for that to be possible, though. Some developers have raised the point that there could be useful types with a no-op deinit but which contain interior pointers to themselves or address-discriminated signed pointers, so they wouldn't be bitwise-movable, so we wouldn't necessarily want "no-op deinit and movable" to have to imply bitwise-movable.

jrose · February 12, 2024, 9:30pm

Well, let’s see:

Nontrivial deinit, trivially copyable: (probably impossible)
Nontrivial deinit, trivially movable, nontrivially copyable: strong class references
Nontrivial deinit, trivially movable, non-Copyable: file handles, most Rust types
Nontrivial deinit, nontrivial move: ObjC weak references, many C++ types
Nontrivial deinit, non-movable: some C++ types, usually self-referential

And then:

Trivial deinit, trivially copyable: integers
Trivial deinit, trivially movable, nontrivially copyable: ??
Trivial deinit, trivially movable, non-Copyable: some kind of token type
Trivial deinit, nontrivial move: self-referential storage using pointers
Trivial deinit, non-movable: tokens that hand out pointers? and self-referential storage that didn’t bother to provide a move operator

So even though there’s a missing cell here, I think it’s clear enough that there are valid uses in most of these combinations. Which might even be enough to split out “trivial deinit” as its own base thing (it’s the main requirement for a heterogeneous arena allocator, for example). We don’t currently have a formal notion of “non-movable” either, so it’s hard to speculate that far. But I do think it’s reasonable to not tie it to copying.
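For the "trivial deinit, trivially movable, non-Copyable" row, a token type might look like the sketch below. SlotToken and redeem are invented names, and the example assumes a toolchain with noncopyable types (Swift 5.9 or later).

```swift
// A token with no deinit and no owned resources: destroying it is a no-op and
// moving it is a plain byte copy, but the type opts out of copying.
struct SlotToken: ~Copyable {
    let slot: Int
}

// Taking the token as `consuming` ends its lifetime at the call site.
func redeem(_ token: consuming SlotToken) -> Int {
    return token.slot
}

func redeemOnce() -> Int {
    let token = SlotToken(slot: 3)
    // A second `redeem(token)` after this call would be rejected by the compiler.
    return redeem(token)
}
```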

John_McCall · February 12, 2024, 9:45pm

Lifetime-restricted values often have the property that they don't specifically need cleanup — whatever cleanup is necessary is done in aggregate at the end of the lifetime — but may be non-copyable for semantic reasons, such as being a "first-class" reflection of a mutable access. For example, if you made an array of (disjoint) references into another array, the elements of that array would not be copyable, but they would be bitwise movable and trivial to destroy.

tbkka · February 12, 2024, 9:52pm

So now it sounds like there's interest in having three independent markers:

This value can be trivially destroyed ("TriviallyDestroyable"?)
This value can be bitwise copied in memory ("BitwiseCopyable"?)
This value can be bitwise moved in memory ("BitwiseMoveable"?)

Being able to overload independently on each of these three would be the maximally flexible approach, especially if these were exposed to developers in a way that the optimizer and runtime both had access to the result. A container type could then optimize a resize operation for BitwiseMoveable values, a copy operation for BitwiseCopyable values, and a deinit for TriviallyDestroyable values.
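As a sketch of overloading on independent markers, the example below models two of them as ordinary empty protocols. This is only an approximation: the names are hypothetical, nothing is verified by the compiler, and the conformance for Int is asserted by hand.

```swift
// Hypothetical markers, modeled as plain protocols for illustration.
protocol TriviallyDestroyable {}
protocol BitwiseMoveable {}

extension Int: TriviallyDestroyable, BitwiseMoveable {}

enum Strategy { case elementwise, bulk }

// Destruction: the more constrained overload lets a container skip its destructor loop.
func destroyStrategy<T>(for type: T.Type) -> Strategy { .elementwise }
func destroyStrategy<T: TriviallyDestroyable>(for type: T.Type) -> Strategy { .bulk }

// Resizing: the more constrained overload lets a container relocate storage with one byte copy.
func resizeStrategy<T>(for type: T.Type) -> Strategy { .elementwise }
func resizeStrategy<T: BitwiseMoveable>(for type: T.Type) -> Strategy { .bulk }

let intDestroy = destroyStrategy(for: Int.self)        // .bulk
let intResize = resizeStrategy(for: Int.self)          // .bulk
let stringDestroy = destroyStrategy(for: String.self)  // .elementwise
```

Overloads like these are resolved statically, so an unconstrained generic caller would always pick the elementwise versions; that is one reason the thread asks for markers the optimizer and runtime can both see.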

Joe_Groff · February 12, 2024, 9:58pm

If this cell is truly empty, then it seems like it'd be safe to say:

typealias BitwiseCopyable = TrivialDeinit & BitwiseMovable & Copyable

The only counterexamples I can think of would be if the copy operation is nontrivial only for the sake of logging or instrumentation purposes that don't otherwise affect the main program's behavior, which is not something we currently support.

jrose · February 12, 2024, 10:19pm

Oh, I guess a RwLock based on an atomic could be trivially movable and trivially destructible but not trivially copyable, since copying would require locking but moving ensures no one is currently borrowing the thing. That’s a bit contrived, especially given our atomics being more constricted than Rust’s, but it’s not totally implausible.

Joe_Groff · February 12, 2024, 10:23pm

Atomics bring up another possible distinction between "bitwise movable" and "bitwise borrowable". Atomics are literally bitwise-movable, since as you noted you can only move one when you have ownership, but "bitwise movable" in most other cases also carries an expectation that the value is address-independent, so values of the type can be passed in registers, a borrowed value can be memcpy'd to create another valid borrowed value, and things like that, which aren't true for atomics or lightweight locks because they do need to have fixed addresses at least while they're being borrowed.

wadetregaskis · February 12, 2024, 11:27pm

That's actually an interesting thread to pull on, as I can think of more than one scenario in which these behaviours might vary depending on debug vs release builds (or other pseudo-dynamic factors). e.g. in debug builds you might register each instance [as a pointer] in some global pool to aid debugging and testing, but omit that for release builds. So trivially moveable, copyable, and deinitable only in release builds [at best].

I can't think of any particularly compelling cases, though - just relatively esoteric or "neat trick" examples.

carlynorama · February 12, 2024, 11:30pm

As someone who loves a good memcpy, what about BitwiseCodeable? That tells me that either the memory layout is simple, or I will be given a memory layout plan that I can trust. Which is all "BitwiseCopyable" wants to promise, right?

One could have a type that's BitwiseCodeable & ~Copyable without feeling there's a contradiction in the names.

So that means there wouldn't need to be BitwiseCopyable and BitwiseMovable. If it's BitwiseCodeable and Copyable, it would be "BitwiseCopyable"; if it's BitwiseCodeable and Movable, it's "BitwiseMovable".

Jumhyn · February 12, 2024, 11:40pm

My interpretation of 'BitwiseCodable' would be 'a type that can be written to a binary serialization format with just raw byte copies' which is a stronger promise than what BitwiseCopyable makes.

carlynorama · February 12, 2024, 11:45pm

A type that conforms to the Codable protocol today can either take advantage of the default implementation because it's all very simple, which would line up with that kind of raw situation... or one can provide a custom decoder, which gives more flexibility. The proposal really read as very similar to Codable to me, but I'm totally prepared to be wrong.

Jumhyn · February 13, 2024, 12:00am

What I'm getting at is that I would expect a type which declares itself BitwiseCodable to be, at a minimum, BitwiseCopyable, because I should be able to do:

file.write(value.codableBytes)
let valueCopy = file.readCodableBytes(as: SomeBitwiseCodableType.self)

jrose · February 13, 2024, 12:14am

BitwiseCodable raises a lot of questions about endianness and validation, not to mention field additions over time. It might be worth coming back to, but I think it’s still different from this initial proposal.
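The endianness concern can be illustrated with a short sketch: the in-memory bytes of an integer depend on the host, so a serialization format has to pin the byte order explicitly. The value is arbitrary, and big-endian is chosen here only as an example wire order.

```swift
let original: UInt32 = 0x1234_5678

// In-memory bytes: the order depends on the host's endianness.
let nativeBytes = withUnsafeBytes(of: original) { Array($0) }

// Bytes after pinning the byte order: identical on every host.
let wireBytes = withUnsafeBytes(of: original.bigEndian) { Array($0) }  // [0x12, 0x34, 0x56, 0x78]

// Decoding has to apply the inverse conversion.
let roundTripped = wireBytes.withUnsafeBytes {
    UInt32(bigEndian: $0.loadUnaligned(as: UInt32.self))
}
```

A raw bitwise copy would write nativeBytes, which only round-trips between hosts that share a byte order, and it does nothing to validate the bytes it reads back.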