Awful codegen for ==(_:_:) of plain-old-data type

taylorswift · October 24, 2023, 11:14pm

i have a plain-old-data IPv6 type defined with a layout matching the “natural” UInt16x8 format:

@frozen public
struct IPv6:Equatable, Hashable, Sendable
{
    public
    var a:UInt16
    public
    var b:UInt16
    public
    var c:UInt16
    public
    var d:UInt16
    public
    var e:UInt16
    public
    var f:UInt16
    public
    var g:UInt16
    public
    var h:UInt16
}

the stored properties are just raw storage and hold the words in big-endian representation.
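A minimal sketch of the big-endian convention in use. It assumes addresses are built from host-order 16-bit groups with `.bigEndian` and read back with `UInt16(bigEndian:)`. The `makeIPv6(_:)` helper and the `loopback` constant are hypothetical, and `@frozen`/`public` are dropped to keep the snippet self-contained.

```swift
// Same field layout as the post above: each field holds one 16-bit group
// of the address in network (big-endian) byte order.
struct IPv6: Equatable, Hashable, Sendable {
    var a: UInt16, b: UInt16, c: UInt16, d: UInt16
    var e: UInt16, f: UInt16, g: UInt16, h: UInt16
}

// Hypothetical helper: build an address from host-order groups,
// converting each one to big-endian storage.
func makeIPv6(_ w: [UInt16]) -> IPv6 {
    precondition(w.count == 8, "an IPv6 address has eight 16-bit groups")
    return IPv6(a: w[0].bigEndian, b: w[1].bigEndian, c: w[2].bigEndian, d: w[3].bigEndian,
                e: w[4].bigEndian, f: w[5].bigEndian, g: w[6].bigEndian, h: w[7].bigEndian)
}

// ::1, the loopback address
let loopback = makeIPv6([0, 0, 0, 0, 0, 0, 0, 1])

// Reading a group back means undoing the byte swap.
let lastGroup = UInt16(bigEndian: loopback.h)   // 1 on any host

// The raw bytes are in network order: fifteen zeros, then 0x01
// (assuming fields are laid out in declaration order, which is what
// current Swift compilers do for a struct like this).
let bytes = withUnsafeBytes(of: loopback) { Array($0) }
```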

out of curiosity, i plugged this into godbolt to see what the synthesized Equatable conformance looks like:

public
func eq(a:IPv6, b:IPv6) -> Bool
{
    a == b
}

-O
-whole-module-optimization

output.eq(a: output.IPv6, b: output.IPv6) -> Swift.Bool:
        xor     eax, eax
        cmp     rdi, rdx
        jne     .LBB41_5
        cmp     si, cx
        jne     .LBB41_5
        mov     rdx, rsi
        shr     rdx, 16
        mov     rdi, rcx
        shr     rdi, 16
        cmp     dx, di
        jne     .LBB41_5
        mov     rdx, rsi
        shr     rdx, 32
        mov     rdi, rcx
        shr     rdi, 32
        cmp     dx, di
        jne     .LBB41_5
        shr     rcx, 48
        shr     rsi, 48
        cmp     si, cx
        sete    al
.LBB41_5:
        ret

that’s… yikes! is it really comparing the addresses 16 bits at a time?
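For comparison, the synthesized `==` means the same thing as comparing each stored property in declaration order and stopping at the first mismatch. The hand-written version below only illustrates those semantics. It is not the compiler's actual output.

```swift
struct IPv6: Hashable, Sendable {
    var a: UInt16, b: UInt16, c: UInt16, d: UInt16
    var e: UInt16, f: UInt16, g: UInt16, h: UInt16

    // Semantically what synthesized Equatable does: one 16-bit comparison
    // per stored property, joined with short-circuiting &&.
    static func == (lhs: IPv6, rhs: IPv6) -> Bool {
        lhs.a == rhs.a && lhs.b == rhs.b && lhs.c == rhs.c && lhs.d == rhs.d &&
        lhs.e == rhs.e && lhs.f == rhs.f && lhs.g == rhs.g && lhs.h == rhs.h
    }
}
```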

if i lay out the type like:

@frozen public
struct IPv6:Equatable, Hashable, Sendable
{
    /// The prefix address, in big-endian byte order.
    public
    var prefix:UInt64
    /// The subnet address, in big-endian byte order.
    public
    var subnet:UInt64

    @inlinable public
    init(prefix:UInt64, subnet:UInt64)
    {
        self.prefix = prefix
        self.subnet = subnet
    }
}

instead, i get:

output.eq(a: output.IPv6, b: output.IPv6) -> Swift.Bool:
        xor     rdi, rdx
        xor     rsi, rcx
        or      rsi, rdi
        sete    al
        ret

any reason why the compiler can’t do that on its own?
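One possible workaround keeps the `UInt16` fields and hand-writes `==` so that it compares two 64-bit values, mirroring the `prefix`/`subnet` layout. This is only a sketch. The `pack` helper and the `prefix`/`subnet` computed properties are hypothetical additions, and nobody has checked here whether the optimizer actually reduces the shifts to the two-`xor` sequence above.

```swift
struct IPv6: Hashable, Sendable {
    var a: UInt16, b: UInt16, c: UInt16, d: UInt16
    var e: UInt16, f: UInt16, g: UInt16, h: UInt16

    // Hypothetical helper: pack four 16-bit fields into one 64-bit word.
    // Any fixed packing order works for equality; it need not match memory order.
    private static func pack(_ w0: UInt16, _ w1: UInt16, _ w2: UInt16, _ w3: UInt16) -> UInt64 {
        let high = UInt64(w0) << 48 | UInt64(w1) << 32
        let low = UInt64(w2) << 16 | UInt64(w3)
        return high | low
    }

    var prefix: UInt64 { IPv6.pack(a, b, c, d) }
    var subnet: UInt64 { IPv6.pack(e, f, g, h) }

    // Branch-free: XOR each pair of words, OR the results, test for zero.
    static func == (lhs: IPv6, rhs: IPv6) -> Bool {
        ((lhs.prefix ^ rhs.prefix) | (lhs.subnet ^ rhs.subnet)) == 0
    }
}
```

Because the packing is lossless, this `==` agrees with field-by-field equality. The synthesized `hash(into:)`, which still hashes all eight fields, therefore stays consistent with it.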

wadetregaskis · October 24, 2023, 11:35pm

I'm surprised it doesn't default to (essentially) inlining memcmp. It would only need to do something more complicated if it detected something unusual about the struct's layout, like interior padding (assuming there's no guarantee that padding bytes always hold the same values?).
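Here is a sketch of the `memcmp` approach. It assumes the type has no padding: eight `UInt16` fields give a 16-byte size with no gaps, so equal bytes imply equal fields. The imports assume a Darwin or Glibc platform where `memcmp` is available. Whether the call gets inlined into plain loads and compares is up to the optimizer and is not shown here.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

struct IPv6: Hashable, Sendable {
    var a: UInt16, b: UInt16, c: UInt16, d: UInt16
    var e: UInt16, f: UInt16, g: UInt16, h: UInt16

    // Byte-wise equality. Only valid because the struct has no padding:
    // its size (16) equals the sum of the field sizes (8 * 2).
    static func == (lhs: IPv6, rhs: IPv6) -> Bool {
        withUnsafeBytes(of: lhs) { l in
            withUnsafeBytes(of: rhs) { r in
                memcmp(l.baseAddress, r.baseAddress, l.count) == 0
            }
        }
    }
}
```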

taylorswift · October 24, 2023, 11:37pm

it shouldn’t be calling memcmp either; based on @frozen, it should know there is no interior padding and that its size = stride = 16, enabling:

        xor     rdi, rdx
        xor     rsi, rcx
        or      rsi, rdi
        sete    al
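The size = stride = 16 claim can be checked directly with `MemoryLayout`. In this small sketch, the `precondition` is just one hypothetical way to guard a byte-wise `==` against a future layout change.

```swift
struct IPv6: Hashable, Sendable {
    var a: UInt16, b: UInt16, c: UInt16, d: UInt16
    var e: UInt16, f: UInt16, g: UInt16, h: UInt16
}

// Eight 2-byte fields with 2-byte alignment leave no room for padding.
let ipv6Size = MemoryLayout<IPv6>.size            // 16
let ipv6Stride = MemoryLayout<IPv6>.stride        // 16
let ipv6Alignment = MemoryLayout<IPv6>.alignment  // 2

// Fails at startup if a later change (e.g. a field of a different width)
// introduces padding that a byte-wise comparison would trip over.
precondition(ipv6Size == ipv6Stride && ipv6Size == 8 * MemoryLayout<UInt16>.size,
             "IPv6 is expected to be 16 bytes with no padding")
```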

wadetregaskis · October 25, 2023, 12:37am

Right, but that's why I specified inlining memcmp. Most types are small enough that the inlined implementation should optimise down to just a few bespoke loads and compares. But for larger types it might actually make sense not to inline it: once you get above a certain size and have to start using an actual loop, it might be better to just call memcmp (trading off call overhead against code size).

(Unless you're on x86 or somesuch where you have some many-word monstrosity of an instruction that basically is memcmp.)

Come to think of it, I haven't actually used memcmp for an equality implementation in Swift yet, but I've used it in the past in C/C++ and the like for essentially the same thing.