As I noted, the x86_64 psABI currently specifies two different alignments(!) for `__int128` and `_BitInt(128)` (16B and 8B, respectively). The AAPCS specifies 16B for both "quadword integer" (`__int128_t`, pretty much) and `_BitInt(n)` where 64 < n <= 128, but is not faithfully implemented by clang.
On 32-bit ARM, 8B alignment is desirable for 64-bit and 128-bit types because the load/store dual and load/store multiple instructions (LDRD/STRD, LDM/STM) take an extra cycle on many microarchitectures when the address is not 8B aligned. I expect similar considerations apply to some other 32-bit CPUs. The proposal to match the alignment of `UInt64` lets us benefit from this.