[Pitch] 128-bit Integer Types

On some uArches cacheline-straddling accesses incur a significant penalty, but on most recent cores that penalty is generally less than the broader system impact of padding out memory (and therefore touching more cachelines overall). Similar considerations apply to page-straddling access.

As you go to wider and wider types, the balance swings further. For 128b you can plausibly argue both ways, but by the time you get to 256b it's pretty clear that you should not be trying to make them 32B aligned by default.

8 Likes

Right, but for [U]Int128 why wouldn't Atomic<Int128> just use Int128 under the hood? It'd be weird if it used a different type if Int128 were inherently [amenable to being] atomic anyway.

I mean, it's probably not a big deal in any sense - most people would never know or care. It just seems odd in principle.

Kind of like how you can ldp/stp two 32-bit words on AArch64, and sometimes that's convenient when you really want them in separate registers for some reason, but it adds conceptual complexity and generally it seems unnecessary (you save a little bit of instruction space, but how often would that actually end up mattering?). The complexity can be dismissed if the compiler's doing it under the hood, as merely an optimisation, but less so if it surfaces into the visible language.

On A-profile for non-atomic use of Normal memory, probably not. I seem to recall some Arm CPU architects lamenting that unaligned accesses have to be fast because so much legacy (i.e. x86) code assumes they are. For better or worse.

Keep in mind "atomic" here is in the purest definition, meaning it encompasses tearing too, even for standalone loads or stores. Whether compare-and-swap is fast is one thing, but not being able to reliably read or write a single integer is really bad.

I don't know.

I vaguely recall discussing alignment with various CPU architects in the [Arm] server space years ago, and I seem to recall them being mildly annoyed at having to handle misaligned accesses, but I don't remember what the actual micro-architecture implementations ended up doing. I suspect they had to accommodate misalignment, because [again] x86.

Hard to predict that more than a few years out, though. And the memory story (latency, bandwidth, etc.) varies a lot when you consider the whole Arm ecosystem (let alone other architectures!), beyond just Apple's little corner.

Within Apple's space, memory bandwidth and latency are relatively high - but memory capacity is arguably a bit low. So yes, density is somewhat favoured on Apple's platforms.

That all said, I'm not sure how much difference any of this really makes to the end user - it seems likely that 128-bit data types will be rare, and if they're not, they'll come in large numbers (arrays or matrices), in which case they'll almost certainly be 16-byte aligned anyway, if only because malloc does that implicitly.

Hah.

I think the alignment question is maybe not as scary as it sounds. It seems perfectly reasonable to say it's 4- or 8-byte aligned on 32-bit and 16-byte aligned on 64-bit and that, if a 128-bit C integer has worse alignment, that's C's problem to solve.

I do think importing _BitInt(128) is the right thing to do even if it forces copies on some architectures/platforms, because interop with C remains a very important feature of Swift, especially when doing any sort of systems programming. (But since _BitInt is brand new, I don't feel so strongly that I'd write my local MPP about it.) If we don't import it, but the layout ends up the same ignoring alignment, we could document that a pointer cast and unaligned load is allowed.
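
For illustration, that documented escape hatch could look something like this (just a sketch, not proposed API; readCInt128 is a made-up helper name, and it assumes the pitched Int128 shares its in-memory layout with C's 128-bit integer):

// Hypothetical helper: `p` came from C, where __int128 / _BitInt(128) may
// be only 8-byte aligned. If Swift's Int128 demands 16-byte alignment, a
// plain `load(as:)` here would not be guaranteed to work; loadUnaligned
// copies the bytes with no alignment requirement at all.
func readCInt128(_ p: UnsafeRawPointer) -> Int128 {
    p.loadUnaligned(as: Int128.self)
}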

We should also make a best effort to bridge to __int128 and unsigned __int128 on Win64, if possible, otherwise there is no clear interop path for Windows developers using that type.

Anyway—ship it! Int128 would be a great addition to the stdlib.

2 Likes

If Swift said "we're making 128-bit values naturally aligned", would that have any impact on C? It seems like - from the discussions @Karl pointed to - there's general regret within the C community that half alignment has been used, but that it's also potentially changeable.

The problem comes when we get an __int128 * from C, or conversely try to pass an UnsafePointer<Int128> off from Swift to C. If Swift and C don't agree on alignment, then one side or the other is going to face UB dealing with the other side's pointers.

It is possible to say we simply don't interoperate with pointers to Int128, and treat it as a "bridged" type that only interoperates at the scalar level - like how you can get an NSString * from Objective-C as a Swift String, but you obviously can't get an UnsafePointer<String> from an NSString **.

10 Likes

There's some appeal to that, because this presumably wouldn't apply for collections of [U]Int128 - those would be governed by 16-byte alignment anyway, right?

Or, more to the point: would there be a conversion cost for Arrays / UnsafeBufferPointers of [U]Int128?

If it's only individual [U]Int128s that have a bridging cost - e.g. function arguments & return values - that seems less of a concern to me. Either they're not used often enough that the bridging cost matters, or the C function you're calling is small enough that you can probably just rewrite it in Swift (well, maybe… assuming Swift supports any compiler intrinsics or other special stuff that you might otherwise be forced into C for).

And/or, could WordPair (or equivalent) be used as the imported type for any C 128-bit types that aren't known to be naturally aligned? That way you still have full access and compatibility with C, with at most some tiny inconvenience (assuming WordPair gains integer: Int128 & unsignedInteger: UInt128 properties, and corresponding initialisers).
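
A sketch of what those hypothetical additions could look like - none of these members exist today, and it assumes a 64-bit platform (so two UInt words cover 128 bits) while arbitrarily treating first as the low word:

import Synchronization

extension WordPair {
    // Hypothetical: reassemble the pair as a UInt128
    // (first = low word, an arbitrary choice for this sketch).
    var unsignedInteger: UInt128 {
        (UInt128(second) &<< 64) | UInt128(first)
    }

    // Hypothetical: the same bits, viewed as a signed Int128.
    var integer: Int128 {
        Int128(truncatingIfNeeded: unsignedInteger)
    }

    // Hypothetical corresponding initialiser.
    init(_ value: UInt128) {
        self.init(first: UInt(truncatingIfNeeded: value),
                  second: UInt(truncatingIfNeeded: value &>> 64))
    }
}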

1 Like

We also have enormous caches these days. I believe the M1 has 12MB of L2 cache per cluster of 4 performance cores, and the M2 has 16MB. Then there's a system-level cache shared with the GPU (and Neural Engine?) that's 24-96MB depending on Pro/Max/Ultra. Or something like that.

AMD's 7900X3D has 128MB of L3 cache. That's a very high-end processor, but still. I remember when having 2MB of L2+L3 cache was considered a lot. Now we're getting into the hundreds of MBs in consumer parts.

3 Likes

This is something we have to deal with more broadly, and we should provide some mechanism to import under-aligned and over-aligned types relatively painlessly and correctly, but that's a separate proposal.

2 Likes

One Swift specific thing to consider is the size of Int128?. Unless we make a particular value invalid so it can represent nil directly, the tag will take up an extra byte, and the alignment of Int128 will directly affect how much padding follows. So if it's 16-byte-aligned, then [Int128?] will use 32 bytes per element, whereas if it's 8-byte-aligned, then [Int128?] will use 24 bytes per element, neither of which is great, though the latter would at least not be "worse" than [Int?] already is on 64-bit platforms. Swift at least reclaims tail padding in aggregates so an (Int128?, Int8) should fit the second element in the padding, unlike C.
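
To make that arithmetic concrete, here's how it would show up in MemoryLayout (a sketch; the numbers assume a 16-byte-aligned Int128 with no spare bit patterns, on a 64-bit platform - they're the prediction above, not something the compiler guarantees):

// Assuming 16-byte alignment and no extra inhabitants:
MemoryLayout<Int128>.size          // 16
MemoryLayout<Int128?>.size         // 17 - payload plus one tag byte
MemoryLayout<Int128?>.stride       // 32 - the tag byte padded out to alignment
MemoryLayout<(Int128?, Int8)>.size // 18 - the Int8 reuses the tail padding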

5 Likes

I think it’s less likely we’d regret giving Int128 natural alignment, since anyone storing large arrays of optional Int128s can do a much better job of optimizing by packing the “optional” bits into their own UInt128, especially if they plan to process the array with SIMD.

4 Likes

Int64 being 4-byte aligned on i386 keeps popping up as a surprise; it might be nice if we could avoid intentionally repeating that here.

It may not be best to overthink the costs of padding. If wasted memory is a concern, engineers have the option to split their 128-bit integers into smaller chunks and store those, like Duration does.

Duration contains a 128-bit integer in spirit, but it does this by storing it in two halves, which has the effect of reducing its alignment.

@frozen
public struct Duration: Sendable {
  /// The low 64 bits of a 128-bit signed integer value counting attoseconds.
  public var _low: UInt64

  /// The high 64 bits of a 128-bit signed integer value counting attoseconds.
  public var _high: Int64

  internal var _attoseconds: _Int128 {
    _Int128(high: _high, low: _low)
  }
}

Incidentally, I think it would be useful if a Duration.attoseconds property and a corresponding init(attoseconds:) initializer would land as part of this work. (The current APIs do work, but they are a bit too indirect.)
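
Something like this, perhaps (a sketch of the suggested convenience, built only on Duration's existing public components API rather than anything new; note the initialiser as written traps beyond ±Int64.max seconds, which a real implementation could avoid by filling in the storage directly):

extension Duration {
    // Sketch: the whole value as a single 128-bit count of attoseconds.
    var attoseconds: Int128 {
        let (seconds, attos) = components
        return Int128(seconds) * 1_000_000_000_000_000_000 + Int128(attos)
    }

    init(attoseconds: Int128) {
        // Swift's remainder keeps the dividend's sign, so both
        // components come out with matching signs here.
        let (seconds, attos) = attoseconds.quotientAndRemainder(
            dividingBy: 1_000_000_000_000_000_000)
        self.init(secondsComponent: Int64(seconds),
                  attosecondsComponent: Int64(attos))
    }
}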

We should also call out that [U]Int128 will conform to AtomicRepresentable on architectures that support it. One interesting curiosity is that we have one arch with 32-bit words that still has support for "quadword" atomics: arm64_32. We should clarify whether we want to ship the AtomicRepresentable conformance there.
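
On targets where it ships, the conformance would enable things like the following (a sketch; it assumes the Synchronization module's Atomic and 128-bit atomic support on the target):

import Synchronization

// Sketch: a 128-bit atomic, available only where
// UInt128 conforms to AtomicRepresentable.
let counter = Atomic<UInt128>(0)
counter.store(1 &<< 100, ordering: .relaxed)  // a single, untearable 16-byte store
let snapshot = counter.load(ordering: .acquiring)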

9 Likes

If Int128 existed at the time Duration was defined, would it have been defined differently? e.g. to just use an Int128, or even to just be an Int128?

I'm not sure how important the answer is to this discussion, but maybe it hints at which way the balance is leaning currently.

I assume Duration won't change as a result of this proposal, for ABI compatibility reasons if nothing else?

2 Likes

That seems like the right middle ground if we can't guarantee the pointers are equivalent, yeah. In terms of prior art, I was thinking about things like NSError which bridges to Error as a type, but if passed by reference (i.e. NSError **) is left alone as NSErrorPointer (ignoring throws bridging of course.)

No, I don't think so! The alignment of Duration was very intentionally chosen to be the same as Int64 everywhere (usually 8, except i386, where presumably it's 4).

We did consider explicitly setting its alignment to 16, and we deliberately chose not to do that, precisely because we worried about unnecessarily wasting memory on padding. Optional Durations definitely do frequently pop up as stored properties, sometimes perhaps even in bulk.

(I think I was the one finalizing the Duration ABI -- hence my confident assertion about the choice being deliberate & independent of the availability of a proper Int128 implementation. But my hand was lovingly guided by expert opinion, Patrick Swayze style -- hence the royal "we".)

struct Duration is a frozen type, so its layout is what it is. We do need new APIs on it, though!

5 Likes

100% in agreement there - the intent was originally to do that but we were limited by the constraints of ABI at the time.

@scanon, regarding this type: do we envision that UInt128/Int128 would be castable to a reference via some sort of bridging? It might be tricky for an NSNumber to represent that properly.

2 Likes

We could have Int128/UInt128 be like Float80, which is a numeric type that does not bridge with NSNumber at all!

8 Likes

I'm guessing the fear is wasting space, due to alignment, from introducing an extra byte for the optional flag. Duration is capable of representing a span of ~700 ages of the universe to attosecond precision... could we reserve one of those myriad bit patterns to represent .none?
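
You can approximate that by hand today, at the cost of giving up Optional's sugar (a sketch with an arbitrarily chosen, hypothetical sentinel; true extra-inhabitant packing would need compiler support):

// Sketch: reserve one of Duration's bit patterns to mean "no value".
struct CompactOptionalDuration {
    // The sentinel choice is arbitrary, for illustration only.
    private static let sentinel = Duration(secondsComponent: .min,
                                           attosecondsComponent: 0)
    private var storage: Duration

    init(_ value: Duration?) {
        storage = value ?? Self.sentinel
    }

    var value: Duration? {
        storage == Self.sentinel ? nil : storage
    }
}

// MemoryLayout<CompactOptionalDuration>.stride is 16,
// versus 24 for Duration? (Duration being 8-byte aligned).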

1 Like

Optional is one example that's likely to be common, but any aggregate containing an Int128 would potentially be affected. For instance, if you had a struct { var timestamp: Int128; var otherValue: Int }, the alignment of Int128 would affect the alignment of the enclosing struct, causing it to have the same alignment and potentially extra padding between consecutive values in memory.
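
Concretely (illustrative numbers for a 64-bit platform, not something the compiler guarantees):

struct Event {
    var timestamp: Int128
    var otherValue: Int
}

// With a 16-byte-aligned Int128: size 24, stride 32 - 8 bytes of padding
//   between consecutive elements of an [Event].
// With an 8-byte-aligned Int128: size 24, stride 24 - no padding at all.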

1 Like

Mentioning for completeness (not because I'm sure it's worth the effort): if there were [U]Int127 types they'd be nicely amenable to use with Optional, without any 'unnatural' modifications to their range. And would still be sufficient for 99% of use cases, I expect.

Sidenote: I'd love to see more powerful integer constraints, but I'm not sure the C-style "_BitInt" is the right solution, on many levels. e.g. what if I want to limit the value to positive integers (≥1), or to even numbers, or to powers of two? All common desires (some with distinct optimiser opportunities, like powers of two). My point being: that might be worth pursuing at some point, but if so it's best done properly, not half-way like "_BitInt". [U]Int128 in the meantime, even if it has memory-efficiency concerns with Optional, seems valuable.

2 Likes

Great idea! [U]Int127, [U]Int63, [U]Int31, [U]Int15 and [U]Int7 would fit very nicely with optionals. It wastes a whole bit (so half the range) rather than eating just a single bit pattern, but it still looks reasonable.