How to find rounding error in floating-point integer literal initializer?

I have a type like this:

struct ExtraPrecision<T: FloatingPoint> {
  var large: T
  var small: T

Which represents the sum of large plus small, with small being less than large.ulp.

I want to conform it to ExpressibleByIntegerLiteral, and while large is easy to handle, I’m not sure how to calculate small:

extension ExtraPrecision: ExpressibleByIntegerLiteral {
  init(integerLiteral value: T.IntegerLiteralType) {
    large = T(integerLiteral: value)
    small = ???

Specifically, small should be the nearest representable T to the exact result of value minus large.

Is there any way to achieve this?

1 Like

A working solution:

struct ExtraPrecision<T: BinaryFloatingPoint> {
  var large: T
  var small: T

extension ExtraPrecision: ExpressibleByIntegerLiteral where T.IntegerLiteralType: FixedWidthInteger {
  init(integerLiteral value: T.IntegerLiteralType) {
    typealias I = T.IntegerLiteralType

    large = T(value)
    while I(exactly: large) == nil {
        if value < 0 {
            large = large.nextUp
        } else {
            large = large.nextDown
    small = T(value - I(exactly: large)!)

It's not exactly pretty, and it loses some precision near Int.max and Int.min

In the ideal solution when trying to represent Int.max, large would be Int.max+1, and small would be -1. In my solution large becomes (Int.max+1).nextDown:(

Hmm, that certainly works for the constraints you’ve included, but I’m not sure I want to require either T: BinaryFloatingPoint or T.IntegerLiteralType: FixedWidthInteger.

The ExtraPrecision mechanism ought to work for any FloatingPoint type regardless of radix, and it ought to work with an arbitrary-width literal type as well. For example, a floating-point type might have an arbitrary-width exponent and a fixed-width significand.

What constraint do you have in mind? ExpressibleByIntegerLiteral.IntegerLiteralType doesn't seem to have any extra requirement other than _ExpressibleByBuiltinIntegerLiteral which isn't terribly useful. I can't even (lossy) roundtrip with that.

1 Like

I used BinaryFloatingPoint because I have no idea how non-binary floating point numbers behave. Does it work with non-IEEE floats? No idea. You can remove the constraint, and check!

As for the FixedWidthInteger, as far as I know all types that are _ExpressibleByBuiltinIntegerLiteral are also FixedWidthInteger, so there's nothing to lose.

The initializer requirement for _ExpressibleByBuiltinIntegerLiteral takes a Builtin.IntLiteral parameter.

I’m having trouble locating the implementation for Builtin.IntLiteral (does anyone know where to find it?) but while searching I came across BuiltinIntegerLiteralType which is documented as “The builtin arbitrary-precision integer type. Useful for constructing integer literals.”

@John_McCall is the authority on Builtin.IntLiteral. It was introduced here if you want to dig into the history. I should note that since it's in the Builtin module, it's not really usable in code outside of the standard library.

1 Like

My hope was that we would eventually "productize" the concept into a library type. Note that it cannot support normal integer operations, though; it is not designed to be a runtime arbitrary-precision type.

It's an oversight that the ABI isn't documented. It is quite simple:

// Terminology:
//   chunkWidth := sizeof(size_t) * CHAR_BITS.
//   bit i of an integer is the bit set by (1 << i)
struct IntLiteral {
  // Chunks of the integer value.
  // Chunks individually have native endianness, but the chunks
  // are in little-endian order; i.e. data[i] represents bits
  //   (i*chunkWidth) ..< ((i+1)*chunkWidth)
  // Negative numbers are stored in two's complement, and
  // the last chunk is sign-extended.
  // The length of this array is:
  //   (bitWidth + chunkWidth - 1) / chunkWidth
  size_t *data;
  // Bit 0 is whether the value is negative.
  // Bits 1..7 are reserved and currently always set to 0.
  // Bits 8..<chunkWidth are the bit-width of the integer,
  // i.e. the minimum number of bits necessary to represent
  // the integer including a sign bit.
  size_t flags;

So suppose we wanted to represent the number -182716:

  182716 == 0x2c9bc == 010 1100 1001 1011 1100 (note leading zero)
 -182716            == 101 0011 0110 0100 0100 (note leading one)

In both cases, the bit width is 19. If we were using a 4-bit chunk (for exposition purposes), then the chunks array for -182716 would look like the following; note the sign-extension of the final chunk:

  0100, 0100, 0110, 0011, 1101