[Pitch] 128 bit Integer Types

Note that we already have this, in a sense, for unsigned integers - you can use the sign bit for the 'none' value, e.g. use an Int64 where values less than zero mean nil, so it's essentially an Optional<UInt63>.

Of course, Swift won't do this for you, which makes it relatively tedious and error-prone.

There is some appeal to being able to express that to Swift; to give it permission to use a spare bit for nil rather than as many as 64 additional bits (once alignment requirements are accounted for).

But, that's tangential to this pitch; I don't think it obviates the need for nor merit of [U]Int128.
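The manual version of that sign-bit trick might look something like the following sketch (the type name and API are hypothetical, purely to illustrate the point):

```swift
// Hypothetical sketch: an Int64 whose sign bit encodes "none", so the
// whole thing fits in 8 bytes, versus Optional<UInt64>'s 9 (often padded to 16).
struct CompactOptionalUInt63 {
    private var raw: Int64  // negative means nil

    init(_ value: UInt64?) {
        if let value {
            precondition(value <= UInt64(Int64.max), "value must fit in 63 bits")
            raw = Int64(value)
        } else {
            raw = -1
        }
    }

    var value: UInt64? {
        raw < 0 ? nil : UInt64(raw)
    }
}

assert(MemoryLayout<CompactOptionalUInt63>.size == 8)
assert(CompactOptionalUInt63(42).value == 42)
assert(CompactOptionalUInt63(nil).value == nil)
```

The tedium and error-proneness mentioned above shows up in exactly this kind of boilerplate: every access has to remember the sentinel convention.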

My take on limited-range integer values in general is that it's better to keep the limited values, and corresponding smaller representations, confined to the storage where those smaller representations are necessary, while still providing a common big-enough Int type as your API. A property macro might be able to take a Pascal-ish declaration like

  @LimitedRange(0...9000) var powerLevel: Int

and produce a storage property of smaller type, with bounds checks on get/set when converting to Int at the API level. If the field is Optional typed it could also use an out-of-range value to represent the nil case.
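Without macro support, the expansion such a macro might produce can be written by hand; a rough sketch (the @LimitedRange macro, the Robot type, and the choice of UInt16 storage are all assumptions for illustration):

```swift
// Hand-written equivalent of what a hypothetical @LimitedRange(0...9000)
// macro might expand to: smaller storage, Int-typed API with bounds checks.
struct Robot {
    private var _powerLevel: UInt16  // 0...9000 fits comfortably in 16 bits

    var powerLevel: Int {
        get { Int(_powerLevel) }
        set {
            precondition((0...9000).contains(newValue), "powerLevel out of range")
            _powerLevel = UInt16(newValue)
        }
    }

    init(powerLevel: Int) {
        precondition((0...9000).contains(powerLevel), "powerLevel out of range")
        _powerLevel = UInt16(powerLevel)
    }
}

var r = Robot(powerLevel: 8999)
r.powerLevel = 9000
assert(r.powerLevel == 9000)
assert(MemoryLayout<Robot>.size == 2)
```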

I think the main issue with that is that it turns trivial assignment into something that can crash, e.g. if you write powerLevel = 9001.

You can try to work around it with bespoke setters that throw instead (not great ergonomics without additional language support), or you can just silently clamp the value (but that doesn't make sense in many cases).

But we already have that with normal integers: assigning a too-big or negative value to, say, UInt8 traps.

Unchecked exceptions could work great in this and similar cases.


How can I make a 1 byte struct like Int8, and optional of it that doesn't add another byte?

struct Int7 { // one byte, values in -64 ... +63 range
    private var rawValue: Int8
    ...
} 
struct Int8x { // one byte, values in -127 ... +127 range
    private var rawValue: Int8
    ...
} 
var x: Optional<Int7> = .some(42)  // still one byte wanted!
var y: Optional<Int8x> = .some(42) // still one byte wanted!

Does it require compiler support?

If you're forcing the type conversion onto your clients, then a crash is still there when they write powerLevel = UInt16(someInteger) or something like that. Maybe a developer could use one of the other non-trapping initializers, but I doubt they would in practice. Trapping when preconditions are unmet is usually the right thing to do.

The supported way would be to define an enum with fewer than 128 cases. The less-supported way would be to -enable-experimental-feature BuiltinModule and wrap the Builtin.Int7 type in a struct.

I guess this won't work for Int31 / Int32x :sweat_smile:

That's to support the "one bit short" types like Int7, Int31, etc.?

What about the Int8x / Int32x types above (the ones that sacrifice a single value of the whole range to represent nil)?

If Optional<T> can't do this, maybe we could introduce some "OptionalProtocol" that types could conform to?

Pseudocode:

struct Int32x: OptionalProtocol { ... }
var x = Int32x(42)
if let y = x { // ✅
    ...
}

Aligning to the size of the type does have one significant advantage; it stops reads or writes of that type from splitting across cache line and page boundaries. It's hard to be certain, but I would expect many uses of these large types to want to align to the size of the type even if the underlying implementation supported a smaller alignment boundary, to avoid cross-boundary reads and writes.

I also note that malloc() on many platforms already aligns to 16 byte boundaries because of the alignment requirements of vector instructions, and further that sensible choice of ordering within a struct or class would allow the compiler to pack data into the space that would otherwise be used as padding. Given that some platforms will be 16-byte aligned, it seems likely that structs that contain Int128s would be ordered appropriately for that situation anyway, so the memory saving is probably more limited in practice.
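The packing-into-padding point is easy to observe even without Int128. A small sketch (this relies on the current compiler laying out non-@frozen structs in declaration order, which is observed behavior rather than a documented guarantee):

```swift
// Field ordering determines where padding falls: placing the most-aligned
// field first lets a trailing small field occupy what would otherwise be
// interior padding.
struct SmallFirst {
    var flag: Int8   // offset 0, followed by 7 bytes of padding
    var value: Int64 // offset 8
}

struct LargeFirst {
    var value: Int64 // offset 0
    var flag: Int8   // offset 8, no interior padding
}

assert(MemoryLayout<SmallFirst>.size == 16)
assert(MemoryLayout<LargeFirst>.size == 9)
assert(MemoryLayout<LargeFirst>.stride == 16) // stride still rounds up
```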

Maybe the fix here is to fix things so that Array<Optional<SomeType>> keeps the optional flags in a side table instead of with the data, unless SomeType has extra inhabitants. That would be more efficient in the general case anyway, and the only problem is if we've somehow guaranteed the memory layout of Array<Optional<SomeType>>.

It would be strange (and a breaking change) for Array of Optional to not be able to yield contiguous storage.

The Array type actually doesn't guarantee contiguous storage, not least because it can be bridged from NSArray, which also doesn't guarantee contiguous storage. ContiguousArray does, of course.

I was too quick to jump to a conclusion; it works perfectly:

enum Enum255 {
    case x00, x01, ... xFE // enum with 255 cases
}
struct Int8x { var rawValue: Enum255 }
struct Int16x { var rawValue: (UInt8, Enum255) }
struct Int32x { var rawValue: (UInt8, UInt8, UInt8, Enum255) }
struct Int64x { var rawValue: (UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, UInt8, Enum255) }

precondition(MemoryLayout<Optional<Int8x>>.size == 1)
precondition(MemoryLayout<Optional<Int16x>>.size == 2)
precondition(MemoryLayout<Optional<Int32x>>.size == 4)
precondition(MemoryLayout<Optional<Int64x>>.size == 8)

When working with this type I need to ensure that the "nil" pattern (which, experimentally, happens to be 00 00 ... 00 FF) doesn't arise during normal operations on the type (like +, -, etc.). For example, for Int16x the valid range is -32767 ... +32767; when converting between user-facing values and the internal representation I can offset that range so that the prohibited number 0x8000 corresponds to the nil pattern 0x00FF (taking endianness into account).

I can do a similar trick with Optional<struct holding UnsafeRawPointer> which is also 8 bytes, but that's only for 8 byte sized types.

PS. For good or bad such types won't be aligned the same way as their normal integer counterparts.
PPS. I'm missing C unions :sweat_smile:
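For the one-byte case the offset conversion described above is straightforward; a sketch for Int8x (just the biasing step, not the full Enum255 plumbing, and the function names are illustrative):

```swift
// Sketch of the offset conversion for Int8x: user-facing values in
// -127...+127 are biased by +127 into raw bytes 0...254, so the stored
// byte never takes the value 0xFF that Optional claims for nil.
func encode(_ value: Int8) -> UInt8 {
    precondition(value != -128, "-128 is sacrificed to make room for nil")
    return UInt8(Int16(value) + 127)  // 0...254
}

func decode(_ raw: UInt8) -> Int8 {
    precondition(raw != 0xFF, "0xFF is the nil pattern")
    return Int8(Int16(raw) - 127)
}

assert(encode(-127) == 0)
assert(encode(127) == 254)
assert(decode(encode(42)) == 42)
```

The arithmetic operators then work on the decoded values, re-encoding on store, so the forbidden pattern can never be produced by + or -.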

It might be interesting to expand our notion of type layout to have a "preferred alignment" that can be more than the ABI-required alignment, which we would use when allocating or emplacing values of the type without any other constraints, while generating code that is tolerant of values being only at the required alignment. Since most instructions on contemporary CPUs can tolerate less-than-ideally-aligned memory accesses, you can get the benefit of well aligned data "for free" without making that good alignment a global structural requirement.

We generally want people to think of it as guaranteeing contiguous storage, since even if it's backed by an NSArray, you can force it into a linear array using withUnsafeBufferPointer, and developers use that as if it's a "free" operation so they'll notice if it starts getting more expensive. I agree with you and @ksluder that, in most cases where you're consciously storing an array of sparsely-present values, a better data structure is appropriate, but [Int128?] could nonetheless arise from generic standard library or common utility packages.

If the NSArray grows beyond a certain size, that will be expensive already (because large NSArrays are backed by a 2-3 tree, not by contiguous memory). But yes, that is the trip hazard here if we pulled the optional flags out into a separate block of memory.

We could definitely build such a thing, but I think we'd make it a different type than Array, because it violates a bunch of basic expectations people have for Array.

It's the good old std::vector<bool> problem all over again, I guess :slight_smile:

The existing trip hazard is only there for code that has to interoperate with Objective-C, where the Objective-C side uses NSArray subclasses that don't themselves implement a contiguous storage SPI. So it's not a concern for "pure" Swift code or code on non-Apple platforms. It would be a new thing if we started making Array produce not-necessarily-contiguous storage on behalf of native Swift code.

Exactly. I wouldn’t want to see Swift go down the C++ route of “specialized generics”. Language concepts are better when they compose.

It's not in a numeric protocol, but would init(bitPattern:) also be included?

extension Int128 {
  public init(bitPattern: UInt128)
}

extension UInt128 {
  public init(bitPattern: Int128)
}
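The existing 64-bit pair shows the semantics the 128-bit versions would presumably match: init(bitPattern:) reinterprets the bits without any range check or trap.

```swift
// bitPattern initializers reinterpret bits rather than convert values,
// so out-of-range patterns never trap.
let allOnes = UInt64.max                 // 0xFFFF_FFFF_FFFF_FFFF
let signed = Int64(bitPattern: allOnes)  // reinterpreted, not converted
assert(signed == -1)
assert(UInt64(bitPattern: signed) == allOnes)
```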

Codable only supports 8, 16, 32, and 64-bit integers directly, so how should 128-bit integers be encoded?

Yes, as well as the big/littleEndian API.

Presumably Encodable and Decodable would be updated, in Foundation, once [U]Int128 is available to them?