SE-0368: StaticBigInt

Douglas_Gregor · August 2, 2022, 10:52pm

Hello Swift community,

The review of SE-0368 "StaticBigInt" begins now and runs through August 16, 2022.

Reviews are an important part of the Swift evolution process. All review feedback should be either on this forum thread or, if you would like to keep your feedback private, directly to the review manager. When emailing the review manager directly, please keep the proposal link at the top of the message.

What goes into a review?

The goal of the review process is to improve the proposal under review through constructive criticism and, eventually, determine the direction of Swift. When writing your review, here are some questions you might want to answer in your review:

What is your evaluation of the proposal?
Is the problem being addressed significant enough to warrant a change to Swift?
Does this proposal fit well with the feel and direction of Swift?
If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?
How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

More information about the Swift evolution process is available at

https://github.com/apple/swift-evolution/blob/main/process.md

Thank you,

Doug Gregor
Review Manager

allevato · August 2, 2022, 11:24pm

This looks great overall and resolves a real limitation for folks doing low-level numerics work. A couple questions about finer details:

The proposal states:

The implementation of init(integerLiteral:) must avoid calling APIs that may use Self -typed literals, which would trigger infinite recursion.

Can we detect this in the compiler and make it an error with an appropriate diagnostic like error: Using a literal of type $TYPE_NAME in its own literal initializer would cause infinite recursion?

  /// Returns the given value unchanged.
  public static prefix func + (_ rhs: Self) -> Self

Does this operator exist only to provide symmetry with negative literals, as pointed out by the inconsistency in the Future Directions section?

Just to clarify the behavior of this operator, since all the examples in its doc comment use UInt64:

public subscript<Element>(_ index: Int) -> Element

This isn't restricted to UInt64, right? If I coerced the results to UInt8, I'd get byte-sized slices of the integer, for example?

Should we change the signature of this to be:

public subscript<Element>(_ index: Int, as type: Element.Type = Element.self) -> Element

so that users have a choice in writing this instead?

let value: StaticBigInt = ...
_ = value[0] as UInt64
_ = value[0, as: UInt64.self]

AFAICT, it's more common in the standard library to not rely only on return type coercion but instead allow the option of feeding the type in as an optional argument (last time I checked, only numericCast didn't permit that).

tbkka · August 2, 2022, 11:33pm

This looks good. I'm especially happy that you didn't try to also solve the similar (but considerably more intricate) FP literal problems at the same time. ;-)

benrimmington · August 3, 2022, 1:48am

The literal is more likely to be hidden in an API call.

See also: Default implementations and unintentional infinite recursion.

Yes, and it's implemented as @_alwaysEmitIntoClient, in case we can remove it later.

Right, although using a UInt8 element would involve accessing the same word 4 or 8 times.

It's only restricted to integers from the standard library, to avoid unintentional infinite recursion.

Examples in the standard library are:

AsyncStream.init and AsyncThrowingStream.init
where the element type could also be provided in angle brackets,
i.e. AsyncStream<T>() vs AsyncStream(T.self)
MainActor.run, withTaskGroup, and withThrowingTaskGroup
where the argument is also used as the return type of a trailing closure.

I did consider adding an Element.Type = UInt.self parameter, but SE-0347 suggests that I'd need a different feature:

public subscript<Element = UInt>(_ index: Int) -> Element
//                       ^^^^^^

ksluder · August 3, 2022, 3:09am

  /// The elements are ordered from least significant to most significant, with
  /// an infinite sign extension. Negative values are in two's complement.
  ///
  ///     let value: StaticBigInt = -0x8000000000000000_0000000000000000
  ///     value.bitWidth        //-> 128
  ///     value[0] as UInt64    //-> 0x0000000000000000
  ///     value[1] as UInt64    //-> 0x8000000000000000
  ///     value[2] as UInt64    //-> 0xFFFFFFFFFFFFFFFF

The choice of value in this example isn’t the most illuminating, because the bit patterns for positive and negative powers of 2 are so similar. This example does illustrate something that confused me at first: you can’t look at any particular element of values and know whether the overall number is negative or positive. 0x8000… could be the most significant nonzero element of 2^127, or it could be the element preceding the infinite sign extension of -2^127.

Maybe the documentation could provide an algorithm for reconstructing the number from value?

xwu · August 3, 2022, 3:23am

This is a very useful addition to the language. I appreciate how the final API turned out. Two points of clarification—

First, regarding the subscript:
(Besides echoing @allevato's suggestion to allow for an optional explicit argument to specify the return type,) is there a reason that the examples use UInt64 as opposed to the recommended word-sized UInt? Your replies above suggest that UInt is indeed the preferred default here.

Relatedly, if it's desired to adopt @allevato's suggestion and also desired to default it to UInt, can that be accomplished/would it be worth it to manually add a second concrete overload of the subscript?

Second, regarding the prefix + inconsistency:
What is the exact error for StaticBigInt(+42) as currently implemented? Could that be overcome with an (admittedly somewhat gross) init(_: StaticBigInt) overload (made @alwaysEmitIntoClient in order to make it deletable in the future with zero ABI impact)?

One nit:
Although reasonable to assume, it may be helpful (in the proposal text and in the eventual documentation) to clarify explicitly that not only the subscript but also bitWidth is in reference to the two's complement representation for negative values.

benrimmington · August 3, 2022, 4:33am

Using value.signum() on a negative StaticBigInt will return -1 as Int.

StaticBigInt is always signed, so its bitWidth always includes the sign bit:

-0x80000000000000000000000000000000 is 128 bits, and the sign extension is all ones.
+0x80000000000000000000000000000000 is 129 bits, and the sign extension is all zeros.

I don't think the documentation of this specific API should be explaining two's complement representation.

However, the documentation comments can be improved during (or after) review of the implementation, if this proposal is accepted.

ksluder · August 3, 2022, 4:57am

That’s not quite what I was suggesting. The documentation currently doesn’t explain how to put signum, bitWidth, and values together.

Thinking out loud, I would try concatenating values[(bitWidth / MemoryLayout<U: UnsignedInteger>) >.. 0], and then, depending on the chosen runtime representation, possibly use signum to determine whether to undo the two’s complement transformation.

An example implementation of such an algorithm would show how to use the different pieces of API together, which does seem like the kind of thing I would expect to see in documentation.

Edit: or maybe a better example would be printing out a decimal representation of a StaticBigInt. That would avoid tempting the reader to adopt the sample code as a basis for their own BigInt implementation, freeing the sample up to demonstrate the APIs without concern for performance.

benrimmington · August 3, 2022, 6:37am

Word-sized types are preferred, but examples/tests for UInt32 and UInt64 elements are easier, because they don't need to account for different architectures.

Multi-word storage is also verbose. For example:

#if arch(i386) || arch(arm) || arch(arm64_32) || arch(wasm32) || arch(powerpc)
  public typealias Vector = SIMD8<UInt>
#elseif arch(x86_64) || arch(arm64) || arch(powerpc64) || arch(powerpc64le) || arch(s390x)
  public typealias Vector = SIMD4<UInt>
#endif

Perhaps, but duplicating the documentation comment is a pain. I was hoping that a generic subscript with @_specialize(exported: true, where Element: _Trivial) might have the same performance (and constant evaluation) as a concrete overload.

The error (and fixit) are unhelpful:

let f = StaticBigInt(+42)
// error: missing argument label 'integerLiteral:' in call

There are issues for other types, but they're mostly hidden by conversion initializers:

let g = Int8(+42)  // OK

let h = UInt64(+0xFFFF_FFFF_FFFF_FFFF)
// error: integer literal '18446744073709551615' overflows
//        when stored into 'Int'

That only works if called via StaticBigInt.init(+42) rather than StaticBigInt(+42), to avoid the SE-0213 behavior. It would only be needed when StaticBigInt is used directly. If used as an ExpressibleByIntegerLiteral associated type, then the conforming type will either provide their own + operator function, or use the default implementation from AdditiveArithmetic.

benrimmington · August 3, 2022, 7:16am

The implementation of StaticBigInt: CustomDebugStringConvertible creates a hexadecimal string, but it wouldn't make a good documentation example.

Converting to a decimal string would require quotientAndRemainder(dividingBy: 10), or in other words a mutable BigInt: SignedInteger type.

The proposed solution has an example with overflow checking, followed by copying of the infinitely-sign-extended words.

If your type stores sign-and-magnitude rather than two's complement, then you'll have to negate a negative value. (This is also needed for the hexadecimal string representation.)

xwu · August 3, 2022, 12:09pm

Is that so? My read of the error is that prefix + “exempts” the expression from SE-0213, causing the root of the problem. Is there really no change to the error that arises if you add a concrete init(_: StaticBigInt) trampoline?

~~I think the same workaround is applicable for these.~~ [A concrete overload to try to work around this problem doesn’t work because it’s never selected over the generic one that causes the literal to be inferred to have the default literal type, but I’m hoping the same issue may not affect the StaticBigInt case where it complains of lacking an appropriate overload?]

That said, this is one symptom of a general issue with operators, literals, and generics which rears its ugly head when you write x == ~0 in the generic context, in which case you are inadvertently comparing your bit pattern to (~0 as Int) regardless of the type of x. (Outside the generic context, we already have concrete overloads in the same vein as what I’m suggesting to mask this problem.)

benrimmington · August 3, 2022, 3:52pm

I was mistaken earlier. Your suggestion does appear to fix the error (but only when StaticBigInt is used directly as a stand-alone type, not as an associated type). I think the issue still needs to be fixed at the language level, for all integer and floating-point literals.

ksluder · August 3, 2022, 5:06pm

I think I understand why I was initially struck by a desire for additional documentation: the term values is at odds with the term “place value” that is commonly taught in elementary school.

According to the elementary school definition, the place values of -123 are -3×1, -2×10, and -1×100. Extending this to hex, the place values of the number -0xABCD are -D×1, -C×10, -B×100, and -A×1000. But in this API, the (UInt8) elements of StaticBigInt(-0xABCD).values are [32 /* ×11 */, 54 /* ×1100 */, FF /* ×110000 */, …].

The elements of values aren’t actually related to the value of the StaticBigInt at all; it’s their bit pattern which is related. Integer initializers in the standard library use the term bitPattern consistently; I think bitPatterns, bitGroupings, or even the original term words would be a better name for this property than values. (words is probably only a good idea if the generic subscript is changed to returning UInt.)

benrimmington · August 3, 2022, 6:08pm

There isn't a values property, so I think you've misread the example code.

The words property was replaced by a subscript, because the "infinitely-sign-extended" model can't be implemented as a finite Words collection.

It then became a generic subscript, because I wanted the convenience of choosing between UInt, UInt32, and UInt64 elements. (UInt8, UInt16, and potentially Swift.UInt128 elements are also supported.)

ksluder · August 3, 2022, 6:15pm

Oh gosh, you’re absolutely right. This is the example that got stuck in my memory as .values:

let value: StaticBigInt = -0x8000000000000000_0000000000000000
  ///     value.bitWidth        //-> 128
  ///     value[0] as UInt64    //-> 0x0000000000000000
  ///     value[1] as UInt64    //-> 0x8000000000000000
  ///     value[2] as UInt64

Can it be modeled as an infinite multi-pass sequence that also happens to have a subscript operator? It feels strange to index into a StaticBigInt as if it were itself a collection.

beccadax · August 3, 2022, 7:30pm

allevato:

Should we change the signature of this to be:
public subscript<Element>(_ index: Int, as type: Element.Type = Element.self) -> Element
so that users have a choice in writing this instead?
let value: StaticBigInt = ...
_ = value[0] as UInt64
_ = value[0, as: UInt64.self]

I’m actually thinking we should require the as: parameter. The closest analogies to this API that I can think of are pointer APIs like UnsafeRawPointer.load(fromByteOffset:as:); these require a type-pinning parameter not for any technical reason, but purely for clarity.

An alternative might be to make StaticBigInt generic on some underlying unsigned integer type, and then make it a Collection with that element type:

extension UInt256: ExpressibleByIntegerLiteral {

  public init(integerLiteral value: StaticBigInt<UInt64>) {
    precondition(
      value.signum() >= 0 && value.bitWidth <= Self.bitWidth + 1,
      "integer literal '\(value)' overflows when stored into '\(Self.self)'"
    )
    self.words = Words()
    for index in value.indices {
      self.words[index] = value[index]
    }
  }
}

This would make the type and indexing a little less squishy, but I’m not sure the added complexity of a generic type is worth it.

While the subscript typing could be improved, I don’t think it’s a deal-breaker, and the proposal otherwise looks pretty solid to me.

ksluder · August 3, 2022, 7:56pm

Also, it would be nice to be able to pass an existential directly as an argument, rather than having to create a local function that just unwraps it into a type parameter so it can be used with value[n] as T.

benrimmington · August 3, 2022, 8:43pm

SIMD types have subscripts, but they're not collections (or sequences).
BigInt wouldn't want an infinite sequence, and other types may not want the sign bit, so indexing depends on the use case.
Collection operations may produce subsequences or lazy wrappers, whereas Sequence operations may unexpectedly produce arrays.
https://belkadan.com/blog/2021/08/Swift-Regret-Sequence/

In which case, should both arguments be labeled? Would a method be preferred?

The problem with making it a collection is that we can't subscript past endIndex, in order to copy the sign extension. So we'd have to pre- or post-initialize the rest of the elements (if your example was a signed Int256 type).

Nickolas_Pohilets · August 3, 2022, 9:38pm

Would BigInt need additional compiler support to be able to be initialized with a pointer to immortal data in const section? Or StaticBigInt provides enough compiler/runtime support to implement BigInt in pure Swift code?

Is StaticBigInt binary compatible with immortal array buffer of words that BigInt could use? Would construction of BigInt from StaticBigInt require heap allocation and deep copy?

BTW, do array literals already use immortal buffers?

xwu · August 3, 2022, 9:51pm

Or, we make this subscript non-generic, returning UInt only, that being the preferred type anyway. The performance pitfall that can occur with using types of narrower bit width, and the lack of performance benefit (as I understand it) for this more complicated API for types of wider bit width, along with trivial composability, argue against the generic subscript bearing its own weight particularly if it requires these gymnastics to design.