StaticBigInt

benrimmington · January 10, 2022, 5:24pm

StaticBigInt

Status: Pitch
Implementation: apple/swift#40722

Introduction

Arbitrary-precision integer literals were implemented in Swift 5.0, but they're currently unavailable to types outside of the standard library. This proposal would productize them into a StaticBigInt type (reminiscent of the StaticString type).

Motivation

There are two compiler-intrinsic protocols for integer literals.

public protocol _ExpressibleByBuiltinIntegerLiteral {
    init(_builtinIntegerLiteral value: Builtin.IntLiteral)
}

public protocol ExpressibleByIntegerLiteral {
    associatedtype IntegerLiteralType: _ExpressibleByBuiltinIntegerLiteral
    init(integerLiteral value: IntegerLiteralType)
}

All integer (and floating-point) types in the standard library conform to both protocols.
Types outside of the standard library can only conform to the second protocol.
Therefore, the associated IntegerLiteralType must be a standard library type.

For example, if larger fixed-width integers (such as UInt256) were added to the Swift Numerics package, they would currently have to use smaller literals (such as UInt64).

let value: UInt256 = 0x1_0000_0000_0000_0000
//                   ^
// error: integer literal '18446744073709551616' overflows when stored into 'UInt256'

Proposed solution

Swift Numerics could (perhaps conditionally) use StaticBigInt as an associated type.

extension UInt256: ExpressibleByIntegerLiteral {

#if compiler(>=9999) && COMPILATION_CONDITION
    public typealias IntegerLiteralType = StaticBigInt
#else
    public typealias IntegerLiteralType = UInt64
#endif

    public init(integerLiteral value: IntegerLiteralType) {
        precondition(
            value.signum() >= 0 && value.bitWidth <= 257,
            "integer literal overflows when stored into 'UInt256'"
        )
        // Copy the elements of `value.words` into this instance.
        // Avoid numeric APIs that may trigger infinite recursion.
    }
}

Overflow would be a runtime error, unless compile-time evaluation can be used in the future.

Detailed design

StaticBigInt is an immutable arbitrary-precision signed integer. It can't conform to any numeric protocols, but it does implement some numeric APIs: signum(), bitWidth, and words.

public struct StaticBigInt: Sendable, _ExpressibleByBuiltinIntegerLiteral {
    public init(_builtinIntegerLiteral value: Builtin.IntLiteral)
}

// `Self` Literals
extension StaticBigInt: ExpressibleByIntegerLiteral {
    public init(integerLiteral value: Self)
    public static prefix func + (_ rhs: Self) -> Self
}

// Numeric APIs
extension StaticBigInt {
    public func signum() -> Int
    public var bitWidth: Int { get }
    public var words: Words { get }
}

// Collection APIs
extension StaticBigInt {
    public struct Words: RandomAccessCollection, Sendable {
        public typealias Element = UInt
        public typealias Index = Int
    }
}

Alternatives considered

A mutable BigInt: SignedInteger type could (eventually) be implemented in the standard library.

Future directions

StaticBigInt (or a similar type) might be useful for auto-generated constant data, if we also had multiline integer literals.

let _swift_stdlib_graphemeBreakProperties: StaticBigInt = (((0x_
    0x____________________________3DEE0100_0FEE0080_2BEE0020_03EE0000_B701F947_ // 620...616
    0x_8121F93C_85C1F90C_8A21F8AE_80E1F888_80A1F85A_80E1F848_8061F80C_8541F7D5_ // 615...608
    /* [74 lines redacted] */
    0x_2280064B_0000061C_21400610_40A00600_200005C7_202005C4_202005C1_200005BF_ // 15...8
    0x_25800591_20C00483_2DE00300_800000AE_000000AD_800000A9_0400007F_03E00000_ // 7...0
)))

Acknowledgments

John McCall implemented arbitrary-precision integer literals (in Swift 5.0).

StaticBigInt is a thin wrapper around the existing Builtin.IntLiteral type.

xwu · January 10, 2022, 7:38pm

This is fantastic and would enable some degree of parity for third-party libraries.

One piece of feedback:

It’s likely we’ll eventually want to be able to fix (extend, more likely) floating-point protocols so that arbitrary-precision float literals are possible too. I recall there are also a set of floating-point types in IEEE standards (I forget the term of art here) that are meant only for storage and, like StaticBigInt, don’t provide many—or possibly any—mathematical operations (but unlike StaticBigInt are not restricted to be immutable). I would also not be surprised if Swift eventually acquires some of these down the line.

Therefore, I’ve often thought it’d be nice to have some terminology for these kinds of numeric types and a consistent naming scheme for them. “Static” is good here but not all storage-but-not-math numeric types will have the restriction that values are immutable; crucially, though, there’s nothing about “static” that clearly conveys the notion that it’s a type meant for storing a numeric value but not for computation. I think it would be cool if we could come up with such a term (or even protocol) with StaticBigInt suitably named to be the first of its kind.

taylorswift · January 10, 2022, 8:57pm

this is a good start, but it would be better if it took into account Decimal-related future directions that would be unblocked by this. (i’ve been sitting on a proposal for that for a while now.)

in the oft chance that the train has not yet left the station, i would note:

the current design where we have literals of multiple numeric bases map to the same literal-initialization pathways is bad.
magic protocols like StringProtocol and UnicodeCodec that only exist to express a static union type are a form of technical debt, which is also bad.
compiler-checked overflow errors do not scale well, and sooner or later we will have to abandon this idea. the StaticBigInt pitched here is only needed because we wanted to implement compile-time overflow errors in the past, and yet does not actually enable us to diagnose any compile-time overflow errors.
we are sorely missing an IntegerLiteralProtocol, whose home _ExpressibleByBuiltinIntegerLiteral is currently occupying.
such an IntegerLiteralProtocol should probably require a [UInt] array of words (like the pitched StaticBigInt), and for reasons that will soon become apparent when we try to design decimal types, a var places:Int { get } which measures the lexical digit count (100 != 0100), and an associatedtype Base:IntegerLiteralBase.
Base can be an enum.
Base would allow us to keep all the compile-time overflow checking we have now, without forcing us to accrue more technical debt every time we try to expand the language’s literals functionality.

John_McCall · January 11, 2022, 3:18am

Hmm. Guaranteeing that this is an array — and therefore also the count and order of the array elements — really ties this to the details of Swift's ABI. It would not be implementable to create one of these from generic code under a different ABI, and that seems like a line we shouldn't cross. Do you think it's especially important to expose this as an array (e.g. to allow the efficient creation of BigInts from a StaticBigInt), or can we reasonably abstract access in some way?

Sajjon · January 11, 2022, 9:24am

Care to expand a bit on this? Do you mean that you see runtime-checked errors as a scalabe solution? I think being able to capture overflow errors at compile time is an amazing and important feature that protects be from lots of silly bugs.

But you have identified it as non scalable? Do you mean that it is cumbersome to implement these checks? Or that they lead to bad performance?

benrimmington · January 11, 2022, 3:17pm

The proposed type seems closest to StaticString — both provide low-level access to the representation of a literal value. However, we can discuss other names, if there are any suggestions.

I haven't seen your proposal, so I can't imagine how this will unblock it.

It didn't occur to me that the ABI for this feature might change on other platforms.

A nested Words type could implement withContiguousStorageIfAvailable(_:).
A generic view type with UInt{,8,16,32,64} elements might be useful in the Proposed solution and Future directions examples.

Saklad5 · January 12, 2022, 4:35pm

I feel this is another illustration of why the numeric protocols need to be overhauled. They impose an irrational hierarchy through unnecessary protocol inheritance, making them unusable outside narrow (albeit common) use cases.

benrimmington · January 12, 2022, 6:29pm

I don't think that's true in this case. Most of the numeric APIs (regardless of their place in the protocol hierarchy) can't be implemented by StaticBigInt. The following are the only other exceptions AFAIK.

// Not proposed.
extension StaticBigInt {
    public static var isSigned: Bool { get }

    public static var zero: Self { get }
    public init()

    public var leadingZeroBitCount: Int { get }
    public var nonzeroBitCount: Int { get }
    public var trailingZeroBitCount: Int { get }
}

The leadingZeroBitCount and nonzeroBitCount are from FixedWidthInteger, so they might not be appropriate anyway.

The proposed signum() returns Int rather than Self. A previous design had relational operators (removed in commit 1286e0a).

taylorswift · January 12, 2022, 9:01pm

i worded this badly. what i was referring to is the current one-to-one association between cases of _ExpressibleByBuiltinIntegerLiteral and the compile-time overflow checks.

as UInt8 // statically asserts 0 ... 255
as Int8 // statically asserts -128 ... 127 
... 
as Int // statically asserts Int.min ... Int.max

this is what does not scale well, because we need to add a new integer type to the standard library for every static assertion.

StaticBigInt leads to Decimal. (or at least, i really hope we design it in a way that can.) one problem we will have is that the following will probably end up being legal by oversight:

0xffp-2 as Decimal<UInt16>

and we will have to figure out a way to ban it so people always write Decimal literals in decimal base. if we don’t get ourselves off of _ExpressibleByBuiltin{Integer/Float}Literal, we’ll probably end up having to do one of:

making Decimal literals a weird subcase of floating point literals that look syntactically identical but are viewed by the compiler as two completely different things, kind of like the current mess with Unicode.Scalar, Character, and String literals.
punting this into a @_decimalBaseOnly attribute, since that seems to be how we solve all our problems these days

historically, #1 sounds more likely to me, and this is the path that will lead to StaticBigFloat and StaticBigDecimal, and no way to reconcile them.

as a more nitty-gritty matter, a user-facing Decimal type will probably be generic over backing storage, and option #1 will force us into uncomfortable conditional conformances over _ExpressibleByBuiltinIntegerLiteral.

extension Float:_ExpressibleByBuiltinFloatLiteral {}
extension Double:_ExpressibleByBuiltinFloatLiteral {}
extension Float80:_ExpressibleByBuiltinFloatLiteral {}
extension StaticBigFloat:_ExpressibleByBuiltinFloatLiteral {}
extension StaticBigDecimal:_ExpressibleByBuiltinFloatLiteral {}

extension Decimal:_ExpressibleByBuiltinFloatLiteral 
    where Storage:_ExpressibleByBuiltinIntegerLiteral // ???
{}

we could probably make this work, but experience from the SIMD APIs suggests this will be complicated and doesn’t seem to match what we are trying to model very well. it will also force us to bake a Decimal type into the standard library very early on in the process, whereas i would rather see it prototyped in a preview module before promoting it to the standard library proper.

sorry if i am derailing the original pitch with Decimal-related talk, but BigInts are very important for Decimals so it would be better to get a Decimal-friendly API in place while we are still sketching out StaticBigInt.

John_McCall · January 12, 2022, 10:21pm

In order to get static literal overflow diagnostics for a type, we inherently need static information about that type at a point where we know the specific literal. That implies being able to do some sort of general simulated computation on the literal value. To do this on a user-defined type, we will need general features for constant evaluation and static diagnostic emission based on the results. (Swift already has the framework for such a feature; it's just driven by some underscored attributes. We don't hard-code Int8 in the compiler.)

Assuming we have that general feature, Ben's proposal lets us trigger it for an arbitrary integer literal, because you can do the constant evaluation and static diagnostics in an init(integerLiteral:) that takes a StaticBigInt.

taylorswift · January 12, 2022, 10:53pm

if this is the end goal, what would be stopping us from getting rid of IntegerLiteralType and just feeding everyone a StaticBigInt? going even further, why not just make the source code string of the literal available (as UTF-8) to the Expressible implementation?

what happens if the ExpressibleByIntegerLiteral type lives in a different module? will downstream users experience source breakage if a library author marks an init(integerLiteral:) @inlinable?

John_McCall · January 12, 2022, 10:56pm

With that said, if StaticBigInt only exposes the underlying data as an UnsafeBufferPointer, that will probably interfere with constant evaluation. That might be another argument for providing a more "ABI-neutral" access to the data, like a method to return the kth least significant word.

John_McCall · January 12, 2022, 11:02pm

If a resilient type has stable overflow conditions, it can put those conditions in an inlinable initializer that delegates to a non-inlinable initializer that does the actual initialization. If it doesn't have stable overflow conditions, it probably shouldn't be triggering diagnostics in out-of-module clients.

taylorswift · January 12, 2022, 11:06pm

i meant in the more practical sense that library authors often release 1.0.0 with no @inlinables, and then add it in where there is a performance problem, since you can go from non-inlinable to inlinable, but you can’t go back. this sounds like it would make the decision to not include @inlinable also a “long term implications” decision, which is the opposite of resilient.

if we just do the simple thing and expose a [UInt] array, how much of a performance hit are we talking about?

John_McCall · January 12, 2022, 11:13pm

It would make a lot of source details unexpectedly semantically important, like whether a number is written as 65536 or 0x10000 or 0x1_0000.
It's really hard to imagine a problem that would be solved by preserving this distinction that wouldn't be better solved in some higher-level way (for integers, at least).
Generic code that constructs an opaque type from a literal would have to actually parse the integer literal at runtime, which is comparatively expensive.
Any future improvements to how you can write integer literals would potentially need deployment restrictions, which many programmers would rightly see as ridiculous.

taylorswift · January 12, 2022, 11:15pm

can an @inlinable initializer opt-out of the compile time validation? if someone has unreachable code somewhere that’s overflowing an initializer, they shouldn’t have to lint to upgrade to the latest library version.

John_McCall · January 12, 2022, 11:22pm

The trade-off you're talking about here is inherent to the problem, not to its solutions. Either the library statically diagnoses overflows in its clients or it does not. The former will be opt-in, through one mechanism or another, and libraries are unlikely to do so in a 1.0 release.

With that said, as a general rule, Swift has been fairly aggressive about adding diagnostics for source constructs that will literally always trap, and I don't think it would be unreasonable for a library to opt in to such diagnostics in a later release.

Well, it would generally require allocating an array dynamically, copying a bunch of values into it, and then throwing that array away when the operation is done because approximately 0% of integer types will internally store a [UInt], including BigInt.

taylorswift · January 12, 2022, 11:28pm

yes, this only matters for Decimals.

for what it’s worth, a lot of people are already doing this with their own BigInt/Decimal APIs. it’s either that or mental exponents.

let x:Decimal = .init(units: 3_14, places: 2), 
    y:Decimal = "3.14"

well, we would only need one array per literal, no matter how many times that code path gets hit, right? we don’t need to throw it away and recreate it every time.

i could see this being a problem for some heavily gybbed package plugins, but i just don’t think most codebases contain so many bigint literals that this would be a problem in practice.

John_McCall · January 12, 2022, 11:37pm

I’ve been hesitant to bring this up, but we do already have an ABI for constructing integer literals that we’re happy with and that generally aligns with Ben’s proposal. We’re not going to make that ABI worse just because it’d arguably be “simpler” according to specific metrics we don’t care about. The ABI prioritizes the things which are actually important, namely, efficiently testing for overflow while constructing a value of a fixed-width binary integer type, without burdening code and binary sizes with additional, irrelevant information.

taylorswift · January 12, 2022, 11:42pm

i don’t have a problem with the pointer-based ABI at all, i only brought up [UInt] as an alternative to this

if the constant evaluation works with the pointer-based ABI, there’s really no reason to not go with that.