[Pitch] 128 bit Integer Types

scanon · February 22, 2024, 12:14am

128-bit Integer Types

Author: Stephen Canon
Review Manager: TBD
Status: Pitch

Motivation

128b integers are the largest fixed-size type that is currently commonly used in "general-purpose" code. They are much less common than 64b types, but common enough that adding them to the standard library makes sense.

We use them internally in the standard library already (e.g. as an implementation detail of Duration).

Proposed solution

Introduce two new structs, UInt128 and Int128, conforming to all of the usual fixed-width integer protocols.

Detailed design

The API of these types is entirely constrained by their protocol conformances. No other API is being invented or introduced. They will have the same API as any other fixed-width integer type.
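As a rough illustration of what "constrained by their protocol conformances" means in practice, here is a minimal sketch of a generic function written against FixedWidthInteger. The helper name checkedSquare is hypothetical and not part of the pitch. The point is only that code like this already accepts every existing width, and would be expected to accept UInt128 and Int128 unchanged once they conform to the same protocols.

```swift
// Hypothetical helper (not part of the pitch): squares a value of any
// fixed-width integer type, returning nil if the result overflows.
func checkedSquare<T: FixedWidthInteger>(_ x: T) -> T? {
    let (result, overflow) = x.multipliedReportingOverflow(by: x)
    return overflow ? nil : result
}

// Works today with the existing widths; under this pitch the same calls
// would be expected to compile with UInt128 or Int128 arguments too.
let small = checkedSquare(UInt8(15))    // 225 fits in UInt8
let tooBig = checkedSquare(UInt8(16))   // 256 does not fit, so nil
let wide = checkedSquare(Int64(-3))     // 9
print(small as Any, tooBig as Any, wide as Any)
```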

While the API of these types is fully determined, their ABI must be resolved. Specifically, we must address the question of their endianness and alignment. This has not been a decision that we had to make for other basic integer types--their layout was determined by the constraint of interoperability with the corresponding C type. However, C does not have _a_ 128-bit integer type--it has between zero and two.

There's no good reason not to maintain platform endianness, so we should simply do that. However, alignment is quite a bit murkier. Let's dive in.

Clang and GCC have historically exposed the extension types __uint128_t and __int128_t on 64b platforms only. These types basically behave like C builtin integer types--their size and alignment are 16B.

The C23 standard introduces _BitInt(N) as a means to spell arbitrary-width integer types, but these still have some warts. In particular, _BitInt(128) as implemented in clang has 8B alignment on x86_64 and arm64. For arm64, this is clearly a bug; the AAPCS specifies that it should have 16B alignment. For x86_64, the situation is less clear. The x86_64 psABI document specifies that it should have 8B alignment, but the authors of the proposal that added the feature tell me that it should be 16B aligned and that they are attempting to change the psABI.

I propose that [U]Int128 be 16B aligned on 64b platforms,¹ matching __[u]int128_t (and _BitInt(128) assuming that C fixes their mess before the ABI becomes locked in). On 32b platforms, I propose that it has whatever alignment UInt64 has. This matches the behavior of _BitInt() on platforms where it's been defined, and is really the only sensible definition for platforms that do not have a clear precedent. It certainly should not be less-aligned than UInt64, and there's not much reason to give it higher alignment on any 32b architecture either.

The clang importer will be updated to bridge __uint128_t to UInt128 and __int128_t to Int128. We will not import _BitInt() types until the ABI problems with those types have been clearly resolved.

¹ For the purposes of this discussion, arm64_32 is a 64b platform; i.e. UInt128 will be 16B aligned on that target; I would expect the same to apply to Swift targeting other "32-bit pointer in a 64b environment" platforms.

Source compatibility

This proposal has no effect on source compatibility.

ABI compatibility

This proposal has no effect on ABI compatibility.

Implications on adoption

Adopting this feature will require a target with runtime support.

Future directions

Implement clang importer support for _BitInt(128) on any platforms where the finalized ABI is compatible with our layout.

Alternatives considered

Rather than adding [U]Int128, we could implement some form of generic-sized fixed-width integer (like _BitInt() in C). Given both the lack of consensus around what integer generic parameters ought to look like in Swift (or if they ought to exist at all), and the growing pains that _BitInt() is currently going through, such a design would be premature.

While other fixed-width integer types are interesting, 128 bits is a couple orders of magnitude more useful than all the others for general-purpose software at this point in time. So doing only 128b integers now makes good sense.

I'm interested in feedback about my thinking regarding alignment; everything else about this pitch is entirely determined by existing protocol conformances. I'll make a package available sometime in the next few days for people to experiment with using these types.

Joe_Groff · February 22, 2024, 12:23am

If existing platform ABIs for C specify an alignment for a 128-bit integer type (and hopefully only one), then I agree we should follow the C ABI in order to maintain seamless interop. Taken in isolation, aligning integer types more than the platform's largest GPR size (so, 4 bytes on a 32-bit platform, or 8 bytes on a 64-bit platform) seems like a bit of a waste of memory to me, since the alignment doesn't seem like it would buy much.
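To put a number on the memory cost being discussed, here is a small sketch of the padding arithmetic. It assumes the usual rule that fields are laid out in declaration order, each rounded up to its own alignment, with the struct's stride rounded up to the largest field alignment. The function names are hypothetical, and the code models the arithmetic only; it does not query any real type.

```swift
// Rounds `offset` up to the next multiple of `alignment` (a power of two).
func alignUp(_ offset: Int, to alignment: Int) -> Int {
    precondition(alignment > 0 && alignment & (alignment - 1) == 0)
    return (offset + alignment - 1) & ~(alignment - 1)
}

// Stride of a hypothetical struct holding a one-byte tag followed by a
// 16-byte integer whose alignment is `alignment`.
func strideOfTaggedValue(alignment: Int) -> Int {
    let tagSize = 1
    let valueSize = 16
    let valueOffset = alignUp(tagSize, to: alignment)
    let size = valueOffset + valueSize
    // The struct's own alignment is that of its most-aligned field.
    return alignUp(size, to: alignment)
}

let with8 = strideOfTaggedValue(alignment: 8)    // 8 + 16 = 24
let with16 = strideOfTaggedValue(alignment: 16)  // 16 + 16 = 32
print(with8, with16)
```

Under these assumptions, an array of such tagged values costs 32 bytes per element with 16B alignment versus 24 bytes with 8B alignment.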

scanon · February 22, 2024, 12:29am

As I noted, the x86_64 psABI currently specifies two different alignments(!) for __int128 and _BitInt(128) (16B and 8B, respectively). The AAPCS specifies 16B for both "quadword integer" (__int128_t, pretty much) and _BitInt(n) where 64 < n <= 128, but is not faithfully implemented by clang.

On 32b ARM, 8B alignment is desirable for 64b and 128b types because the load/store dual/multiple instructions require an extra cycle on many uArches when the address is not 8B aligned. I expect similar considerations apply on some other 32b CPUs. The proposal that we match the alignment of UInt64 allows us to benefit from these considerations.

jrose · February 22, 2024, 12:31am

Prior art: Rust has had i128 and u128 for years, though on the topic of alignment the docs only say

Most primitives are generally aligned to their size, although this is platform-specific behavior. In particular, on x86 u64 and f64 are only aligned to 32 bits.

and I know Gankra has caught differences in the C ABIs of __int128_t on different compilers, not just differences between C and Rust. (I think these have to do with whether you can split a u128 over a register and the stack if you’re about to run out of registers.)

scanon · February 22, 2024, 12:31am

Rust is dealing with the same mess w.r.t. C types, FWIW.

Joe_Groff · February 22, 2024, 1:00am

For Windows and Apple platforms at least, it seems like we can look to what the platform SDKs do in practice, if they have any __int128, __int128_t, or _BitInt(128) types in their APIs at all, as a potentially stronger source of authority if C implementors are waffling. I don't know if that helps though.

ksluder · February 22, 2024, 1:16am

My instinct is:

Int128 and UInt128 should use 16B alignment on all platforms.
For now, the importer should import __uint128_t as (hi: UInt64, lo: UInt64), and __int128_t as (hi: Int64, lo: UInt64), unless the type is also marked as alignas(16), in which case it should import as Int128 or UInt128.
_BitInt(128) would be treated as __int128_t, and unsigned _BitInt(128) would be treated as __uint128_t.
When Swift is built with a clang that has 16B alignment for __(u)int128_t, then the importer can import it as (U)Int128 without the explicit alignas.
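For a sense of what client code would look like against the two-halves import suggested above, here is a minimal sketch. The typealias UInt128Halves and the function add are hypothetical names used only for illustration; nothing here is an actual importer behavior.

```swift
// Hypothetical stand-in for how an imported __uint128_t might look under
// this suggestion: a labeled tuple of two 64-bit halves.
typealias UInt128Halves = (hi: UInt64, lo: UInt64)

// Wrapping 128-bit addition on the two-halves representation.
func add(_ a: UInt128Halves, _ b: UInt128Halves) -> UInt128Halves {
    let (lo, carry) = a.lo.addingReportingOverflow(b.lo)
    // Propagate the carry out of the low half into the high half.
    let hi = a.hi &+ b.hi &+ (carry ? 1 : 0)
    return (hi: hi, lo: lo)
}

// (2^64 - 1) + 1 carries into the high half.
let sum = add((hi: 0, lo: UInt64.max), (hi: 0, lo: 1))
print(sum.hi, sum.lo)
```

Code written this way depends on the .hi and .lo members, so it would stop compiling if the import later changed to a single integer type.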

jrose · February 22, 2024, 2:00am

That would be a source-breaking change and we must not do something like that.

ksluder · February 22, 2024, 2:01am

What would be source-breaking? You mean eventually changing how we import C types? If we start importing __uint128_t as UInt128 from the outset, whenever Swift adopts a clang that changes its alignment it would pick up an ABI breaking change.

jrose · February 22, 2024, 2:15am

If someone has code that manipulates the structure of an int128_t using lo and hi, removing those members would break.

One option to avoid this is to have an opaque type CInt128, much like CGFloat. But too many people disliked CGFloat, enough to add bespoke implicit conversions to the language.

The more conservative option is to say “if the C type doesn’t match whatever we pick for the Swift type in ABI, it’s not imported at all”, and then clients can manually write helper functions if they need it anyway, as they do today. That is strictly more powerful, and forward-compatible with C compiler changes (even if those changes would be ABI-breaking on the C side…), but does make more code platform-dependent than it might be otherwise.

scanon · February 22, 2024, 2:24am

I’m not sure why we would do this; those types are always 16B aligned on platforms that support them, and so would never have such an alignment annotation in C.

ksluder · February 22, 2024, 2:28am

Sorry, I got very confused by this reply and thought __int128_t was getting an 8B alignment too.

Karl · February 22, 2024, 7:23am

I was curious if this was being tracked, and found the following discussions which folk may find interesting.

psABI Gitlab issue:

Discussion then continues on this mailing list thread:

https://groups.google.com/g/x86-64-abi/c/-JeR9HgUU20?pli=1

wadetregaskis · February 22, 2024, 5:31pm

Atomic support

ARMv8-A already supports 128-bit atomic reads & writes, to a limited extent, with ldaxp/stlxp. They require full (128-bit) alignment:

Memory accesses generated by Load-Acquire pair or Store-Release pair instructions must be aligned to the size of the pair, otherwise the access generates an Alignment fault¹.

(it doesn't seem to explicitly say that they're atomic over the whole pair, but that seems implied by their purpose)

Granted Atomic (or similar) wrappers could force a coarser alignment (I assume…?).

General future support

I see the appeal in memory efficiency of using 64-bit alignment today because the implementation will happen to be based on 64-bit loads & stores, right now, so it happens to work - but it might be short-sighted.

If Arm adds [more] 128-bit load/store support in future², it will almost certainly require natural (128-bit) alignment (as is the convention for all existing load/store widths).

This will probably "just" mean "for full performance" for plain load/stores to regular ("Normal") memory accesses, given A-profile requires support for unaligned accesses there (I think?), but it will probably mean "to not fault" for ldx/stx, the various LSE instructions³, etc. It will almost certainly be the only way to guarantee the load/store doesn't tear, in any case. And I expect unaligned load/store will continue to be hard disallowed for Device memory.

That said, currently LDP/STP (load/store GPR pairs) are guaranteed atomic (for the whole lot) - if the whole load/store is 64 bits or less - even if only half aligned. It's likely that would be extended up to 128 bits, while preserving the half-aligned special exception, if 128 bit support were more broadly added. So that might provide a partial escape hatch. It doesn't help for LSE instructions etc, though, so it'd still have a hefty performance cost.

"Smaller" [micro]architectures

If Swift is to consider non-A-profile ARMv8 architectures (e.g. M- or R- for Embedded Swift) it's almost certainly going to require full alignment since those tend not to support unaligned accesses (last I checked). Although the pertinent question there is if & when they support intrinsic 128-bit load/stores.

References

B2.2.1 Requirements for single-copy atomicity (and subsection: Changes to single-copy atomicity in Armv8.4)
B2.5.2 Alignment of data accesses
C3.2.7 Load-Acquire/Store-Release
C3.2.12 Atomic instructions

¹ Unless LSE2 is implemented, in which case it's a slightly looser rule that the operation must not straddle a 16-byte boundary - although that is of course equivalent, for pairs of 64-bit values.

² And not necessarily for a hypothetical AArch128. AArch64 can of course support types larger than 64 bits (it already does, with NEON & SVE). Though admittedly that's not a perfect comparison since NEON & SVE load/stores maybe (?!) don't guarantee atomicity at all currently:

Reads to SIMD and floating-point registers of a 128-bit value that is 64-bit aligned in memory are treated as a pair of single-copy atomic 64-bit reads.

Atomicity rules for SIMD load and store instructions also apply to SVE load and store instructions.

But:

…load/store [scalar or vector] SIMD and floating-point instructions make no guarantee of atomicity, even when the address is naturally aligned to the size of the data.

(C3.2.9 Load/store scalar SIMD and floating-point, and C3.2.10 Load/store Advanced SIMD)

SVE unpredicated load and store instructions do not guarantee that any access larger than a byte will be performed as a single-copy atomic access.

³ LSE1 doesn't support unaligned accesses at all, and LSE2 loosens that slightly but still requires essentially 16-byte alignment; its operands cannot straddle a 16-byte boundary, irrespective of their size [less than or equal to 16 bytes].
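The "must not straddle a 16-byte boundary" rule from footnotes ¹ and ³ can be sketched as plain address arithmetic. The function name is hypothetical, and this models only the rule as stated here, not actual hardware behavior.

```swift
// True if an access of `size` bytes starting at `address` stays within a
// single 16-byte-aligned window (the rule described for LSE2 above).
func fitsInOne16ByteWindow(address: UInt, size: UInt) -> Bool {
    precondition(size >= 1 && size <= 16)
    let firstWindow = address / 16
    let lastWindow = (address + size - 1) / 16
    return firstWindow == lastWindow
}

// An 8-byte access may sit at offset 8, but not at offset 12.
print(fitsInOne16ByteWindow(address: 8, size: 8))    // true
print(fitsInOne16ByteWindow(address: 12, size: 8))   // false
// A 16-byte access fits only when it is 16-byte aligned.
print(fitsInOne16ByteWindow(address: 16, size: 16))  // true
print(fitsInOne16ByteWindow(address: 8, size: 16))   // false
```

For a full 16-byte access the rule collapses to natural alignment, which is the equivalence noted in footnote ¹.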

Tangentially, LSE2 sounds like a big boon for memory efficiency regarding atomics since it lets you bin-pack structs much better. I don't know if any of Apple's current microarchitectures support it, and certainly some do not, but hopefully one day it can be taken as a given (on Apple platforms at least).

scanon · February 22, 2024, 5:41pm

Atomicity is really neither here nor there for the purposes of this question; we use the WordPair type for 128b atomic accesses on 64b platforms that support it, which would not be replaced with [U]Int128. On 32b platforms there is no 128b atomic support, but if there were one it would follow a similar pattern.

[U]Int128 will conform to AtomicRepresentable via WordPair, so Atomic<[U]Int128> will be 16B aligned on any platform that requires it, even if [U]Int128 does not end up being 16B aligned in that platform's Swift ABI.

wadetregaskis · February 22, 2024, 5:41pm

Also, re. C compatibility, perhaps it's fair to apply the principle that people should only pay for what they use? If [U]Int128 were forced to be only half-aligned just for C compatibility, that's penalising everyone who uses pure Swift.

(there may inevitably be C bridges in lower layers, but better to abstract away the problems with C compatibility at those integration points, than let them leak through)

wadetregaskis · February 22, 2024, 5:43pm

Why is that?

Joe_Groff · February 22, 2024, 5:49pm

Is there a real performance hit to half-aligning a 128-bit integer? It's going into two machine registers already. Steve noted that 32-bit ARM cores benefit from 64-bit aligning ldm/stm, which is an argument for 64 bit aligning them at least. I might be spoiled by Apple CPUs but are there AArch64 implementations where ldp/stp take a similar hit? With the ever-widening speed gap between memory and computation, the wasted memory from over-alignment seems like the bigger hit to worry about. (All this is speaking in isolation, of course; I still think maintaining C/platform ABI compatibility is worth whatever tradeoff we make.)

scanon · February 22, 2024, 5:51pm

A few reasons: first it would introduce needless churn for existing users of swift-atomics that are adopting the standard library feature, but also because most uses of 128b atomics are not semantically a 128b integer; it's two pointers or a pointer and length, or some other similar representation; the .first / .second properties make this quite a bit less error prone than thinking about endianness or other unpacking considerations. We can also expose a means to access it as a 128b integer, of course, for people who need that.

Alejandro · February 22, 2024, 5:53pm

It also eliminates the need for folks who want to write double wide atomics for both 32 bit and 64 bit systems to say:

```swift
#if ...
let x: Atomic<UInt128>
#else
let x: Atomic<UInt64>
#endif
```