StaticBigInt

benrimmington · January 13, 2022, 8:25pm

The current ABI looks similar to the _BitInt Clang ABI. One difference is that a "signed _BitInt must be at least two bits wide," while Builtin.IntLiteral only needs one bit for values -1 and 0.

Please could we discuss some design changes. The proposed numeric APIs are:

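extension StaticBigInt {

  // Returns -1 for negative values, 0 for zero, and +1 for positive values.
  // Zero is the only non-negative value with a bitWidth of 1.
  public func signum() -> Int {
    Bool(Builtin.isNegative_IntLiteral(_value)) ? -1 : (bitWidth == 1) ? 0 : +1
  }

  // The width of the literal in bits (1 for the values -1 and 0).
  public var bitWidth: Int {
    Int(Builtin.bitWidth_IntLiteral(_value))
  }

  // The literal's words; the count is bitWidth / UInt.bitWidth, rounded up.
  public var words: UnsafeBufferPointer<UInt> {
    let start = UnsafePointer<UInt>(Builtin.startOfWords_IntLiteral(_value))
    let count = (bitWidth + UInt.bitWidth - 1) / UInt.bitWidth
    return UnsafeBufferPointer<UInt>(start: start, count: count)
  }
}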

I think overflow diagnostics for fixed-width integers would only need the signum() and bitWidth APIs, which can hopefully be evaluated at compile time. Can we rely on always having those flags? Would something like the experimental #assert be used?
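To make the overflow check concrete, here is a minimal sketch in plain Swift. LiteralModel and fits(_:) are hypothetical names, not part of the pitch; the model wraps an ordinary Int and assumes the convention described above, where bitWidth is the minimal two's-complement width including the sign bit.

```swift
// Hypothetical stand-in for an integer literal. It models only the two
// properties discussed above, computed from an ordinary Int.
struct LiteralModel {
  let value: Int

  // Minimal two's-complement width, including the sign bit,
  // so both 0 and -1 have a bitWidth of 1.
  var bitWidth: Int {
    let bits = value < 0 ? ~value : value
    return Int.bitWidth - bits.leadingZeroBitCount + 1
  }

  func signum() -> Int { value.signum() }

  // Overflow check that uses only signum() and bitWidth.
  func fits<T: FixedWidthInteger>(_ type: T.Type) -> Bool {
    if T.isSigned { return bitWidth <= T.bitWidth }
    // Unsigned targets do not need room for the sign bit.
    return signum() >= 0 && bitWidth - 1 <= T.bitWidth
  }
}

let fitsByte = LiteralModel(value: 255).fits(UInt8.self)     // true
let overflows = LiteralModel(value: 256).fits(UInt8.self)    // false
```

Whether the real check could run at compile time is exactly the open question in the post; this sketch only shows that no access to the words is needed.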

If the words property returns a nested Words type, should we have builtin functions for both the count and subscript of a random-access collection? Will the stored chunk type be UInt on all platforms?

John_McCall · January 13, 2022, 8:39pm

I think you're right that overflow diagnostics for fixed-width integers only need signum and bitWidth. If we want to do more compile-time evaluation than that, though, it might be tricky to have an API built around the use of an unsafe pointer, which generally won't support constant-evaluation.

We can easily support both constant and dynamic evaluation of an operation like Builtin.literalWordAtIndex(_: Builtin.IntLiteral, _: Int) -> UInt. I'm happy to leave it to you how best to surface that as actual API.
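One way to picture a pointer-free API built on such a builtin is a subscript that returns one word at a time. The sketch below is a model, not the builtin itself: WordsModel, storedWords and isNegative are hypothetical, and sign-extending past the stored words is an assumption that the thread does not state.

```swift
// Hypothetical model of word-at-index access without unsafe pointers.
struct WordsModel {
  let storedWords: [UInt]   // two's-complement words, illustrative storage
  let isNegative: Bool

  var count: Int { storedWords.count }

  subscript(wordIndex: Int) -> UInt {
    precondition(wordIndex >= 0, "word index must be non-negative")
    if wordIndex < storedWords.count { return storedWords[wordIndex] }
    // Assumption: indices past the stored words yield the sign extension.
    return isNegative ? UInt.max : 0
  }
}

let minusTwo = WordsModel(storedWords: [UInt(bitPattern: -2)], isNegative: true)
let lowWord = minusTwo[0]    // UInt.max - 1
let highWord = minusTwo[3]   // UInt.max
```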

xwu · January 17, 2022, 5:45am

Sure, we can discuss the bikeshedding part sometime; that wasn't entirely the point I was trying to make.

Rather, I was trying to raise the point that this type (however it is named) could be the first of possibly a whole family of types that cannot conform to numeric protocols and that, moreover, we deliberately do not want to conform to them. For example, as I mentioned above, IEEE 754 defines families of "interchange formats" to "support the exchange of floating-point data" that don't also have to be supported "arithmetic formats."

It would be nice™, therefore, if we could have a term (maybe not "interchange") to describe this family of numeric types as distinct from the standard types we support, so that it is clear—as in your answer to @Saklad5, quoted below—that it is not some design shortcoming or temporary implementation limit that causes the numeric APIs not to be there, but rather that it's inherent to the purpose for which the type exists.

My claim is that the "non-arithmetic interchange-ness" of StaticBigInt distinguishes this type from Int along a distinct axis from its static-ness (à la StaticString), which I agree also distinguishes this type from Int. I am less interested in debating the precise name of the type at this point, or even whether the "static-ness" of the type is more salient to its name than the "non-arithmetic interchange-ness," only to put it out there that there is this second axis along which the proposed type here is distinct from Int.

xwu · January 17, 2022, 6:00am

taylorswift:

one problem we will have is that the following will probably end up being legal by oversight:
0xffp-2 as Decimal<UInt16>
and we will have to figure out a way to ban it so people always write Decimal literals in decimal base.

This is getting away from the pitch at hand, I think, but I don't understand why you've come to this conclusion. We don't ban people from expressing binary floating-point values with literals in decimal base, and it is not clear to me why we "have to" ban people from expressing decimal floating-point values with literals in hexadecimal or binary base.

In fact, a long time ago now but on this very list, we clarified that the floating-point types in Swift model the uncountably many real numbers, not the countably many exactly representable values. Note how Int(x) traps when x is not representable as an integer, but Double(x) does not trap when x cannot be represented exactly. Many IEEE arithmetic operations, moreover, have semantics that demand notionally infinite precision that is rounded at the last step.

To remain consistent with the existing design, I'm in fact strongly of the opinion that it's actively undesirable to ban people from expressing decimal floating-point values in other bases. After all, for a Decimal type, if users need to know that their literal value is represented exactly without rounding, they can invoke init(exactly:).
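A small illustration of the existing behaviour being described, using only standard library initializers. The non-failable conversion rounds, while init(exactly:) reports a value that cannot be represented; the variable names are just for this example.

```swift
// A hexadecimal floating-point literal is already legal for a binary type:
// 0xff * 2^-2 == 63.75.
let quarterOf255: Double = 0xffp-2

// 2^53 + 1 is not exactly representable as a Double.
let big: Int64 = (1 << 53) + 1
let rounded = Double(big)           // rounds instead of trapping
let exact = Double(exactly: big)    // nil, because rounding would be needed

// Integer conversions offer the same failable form.
let wide = 300
let narrowed = Int8(exactly: wide)  // nil, 300 does not fit in Int8
```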

Saklad5 · January 17, 2022, 1:29pm

That’s exactly what I consider a design shortcoming: it is unreasonable to conflate those “numeric APIs” with being numeric. These APIs shouldn’t be there, and the relevant protocols shouldn’t require them. The types are fine, it’s the protocols they can’t use that are flawed.

In other words, a protocol named FloatingPoint should not require arithmetic support, as there are well-established examples of floating-point types that do not have them.

Protocol inheritance should be reserved for cases where all requirements in the inherited protocol are relevant to the inheriting protocol. Furthermore, protocols should not have unrelated requirements: if there’s a scenario that calls for “partial conformance,” that indicates that the protocol should have been split up such that you can express that directly.

xwu · January 17, 2022, 3:08pm

I’m not sure what you mean; the type pitched here—and similar types like it meant for interchange—are not meant for use in numeric operations. That is the point that I’m stressing here: the protocols aren’t flawed; rather, these types aren’t intended to conform to them.

Saklad5 · January 17, 2022, 3:09pm

The description of FloatingPoint is:

A floating-point numeric type.

It should not require support for arithmetic operations in the first place. Interchange types are floating-point numeric types, but do not support those operations.

xwu · January 17, 2022, 3:11pm

My point is that, by construction, they wouldn’t be numeric types. We choose to define numeric types as those that support numeric operations. And I’m suggesting that we clarify this point with a term (maybe “interchange,” maybe not) that indicates this concept: a type used to represent numeric values that isn’t (necessarily) a “numeric” type.

I think maybe ultimately you’re getting at the same point, which is that there should be a concept in the language for all such types which share common requirements—namely, that they store a numeric value. Whether there should be a named protocol for such a concept is another matter, which hinges on whether there are useful algorithms that could be made generic over all such types.

Saklad5 · January 17, 2022, 3:24pm

My point is that it doesn’t make sense to define “numeric” that way, and the disconnect leads to all of these problems.

As is, Numeric can be described as three unrelated capabilities: the potential to be initialized exactly from a type conforming to BinaryInteger, support for taking the magnitude (absolute value), and support for multiplication.

I don’t think the average person or mathematicians define “numeric” that way, nor do I think these capabilities should be all or nothing (that is, a single protocol) in terms of conformance.

Exactly. As for whether useful algorithms could be made generic over a protocol, consider that it need not be useful in isolation. Even marker protocols can provide valuable information for an interface: I doubt many people use Sendable alone as a constraint, and it obviously doesn’t add any functionality. Sure, it’ll be required by the compiler in the future, but simply being able to explicitly require that guarantee now is invaluable.

I can’t really conceive of a good description for a protocol called Numeric, though: storing “numeric values” is a recursive definition. That’s probably for the best, as it’s easier to deprecate it in a future language mode than it is to change meaning and keep the name.

A protocol for having a binary representation (maybe just a typealiased specialization of RawRepresentable), combined with a protocol for representing an IEEE 754 floating point, would be much easier to work with. Arithmetic support could be implemented in distinct (not floating-point specific) protocols, then protocol composition and a typealias (or an actual protocol with inheritance if there are additional requirements specific to them) could be used to describe the arithmetic formats.
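As a rough sketch of what that composition could look like: every protocol and typealias name below (BinaryRepresentable, IEEE754Format, ArithmeticFormat) is hypothetical and exists only for illustration; only the Double members they rely on are real.

```swift
// Hypothetical: a type with a fixed-width binary representation.
protocol BinaryRepresentable {
  associatedtype BitPattern: FixedWidthInteger & UnsignedInteger
  init(bitPattern: BitPattern)
  var bitPattern: BitPattern { get }
}

// Hypothetical: non-computational IEEE 754 queries only.
protocol IEEE754Format {
  var isNaN: Bool { get }
  var isFinite: Bool { get }
}

// An "arithmetic format" expressed purely by composition.
typealias ArithmeticFormat = BinaryRepresentable & IEEE754Format & Numeric

// Double already has every required member.
extension Double: BinaryRepresentable, IEEE754Format {
  typealias BitPattern = UInt64
}

// Needs only the representation.
func roundTrip<T: BinaryRepresentable>(_ x: T) -> T {
  T(bitPattern: x.bitPattern)
}

// Needs the composed arithmetic format.
func squareIfFinite<T: ArithmeticFormat>(_ x: T) -> T? {
  x.isFinite ? x * x : nil
}
```

This compiles for new protocols declared alongside the standard library; it says nothing about retrofitting the existing hierarchy.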

xwu · January 17, 2022, 4:17pm

Well, I don’t agree; and in any case, disagreement is immaterial as the ship has sailed.

Salient to this pitch is how we distinguish types that aren’t Numeric but store numeric values from those that are—redesigning the hierarchy of numeric protocols in ways already rejected in the past isn’t one of the options.

scanon · January 17, 2022, 4:21pm

"FloatingPoint" the protocol is shorthand for "IEEE 754 arithmetic format", which requires:

Arithmetic operations (add, subtract, multiply, divide, square root, fused multiply–add)
Conversions (between formats, to and from strings, etc.)
Scaling and (for decimal) quantizing.
Copying and manipulating the sign (abs, negate, etc.)
Comparisons and total ordering
Classification and testing for NaNs, etc.
...

(All the required clause 5 operations).
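For a sense of why those requirements are bundled, here is a small generic function that leans on several of them at once (classification, sign manipulation, fused multiply-add and square root). The function name hypotenuse is made up for this example, and it is a naive formula that does not guard against intermediate overflow.

```swift
// Generic over any FloatingPoint type; uses only protocol requirements.
func hypotenuse<T: FloatingPoint>(_ a: T, _ b: T) -> T {
  // Classification: propagate NaN explicitly.
  if a.isNaN || b.isNaN { return .nan }
  // Sign manipulation: work with magnitudes.
  let x = a.magnitude
  let y = b.magnitude
  // Fused multiply-add followed by square root: sqrt(x*x + y*y).
  return (x * x).addingProduct(y, y).squareRoot()
}

let five = hypotenuse(3.0, 4.0)
```

A storage-only format could not satisfy this constraint, which is the distinction being drawn.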

For that matter, "numeric" in that sentence references the Numeric protocol, which also requires a number of arithmetic operations. If/when we added a protocol for non-arithmetic formats, we would make that explicit (FloatingPointStorageFormat or whatever), as that's by far the less useful thing to have a name for.

scanon · January 17, 2022, 4:23pm

Numeric as it exists in the standard library binds roughly to the notion of a ring. It's not intended to be a precise mathematical abstraction; there are some compromises to how "normal people" needed to work with the protocols. If we were going to redo it, we might tweak some things, but such a change would be massively source and binary breaking, so here we are, let's move on.
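A short example of the ring-like reading: the function below needs only a zero, addition and multiplication, so it works for any Numeric type. The name dot is illustrative.

```swift
// Sum of pairwise products, generic over Numeric.
func dot<T: Numeric>(_ a: [T], _ b: [T]) -> T {
  zip(a, b).reduce(T.zero) { partial, pair in
    partial + pair.0 * pair.1
  }
}

let integerDot = dot([1, 2, 3], [4, 5, 6])     // 4 + 10 + 18
let doubleDot = dot([0.5, 2.0], [2.0, 0.25])   // 1.0 + 0.5
```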

(Note that the far, far bigger issues with the numeric protocol hierarchy are (a) the existence of magnitude on Numeric, (b) Strideable's stride type conforming to Numeric, and (c) the way signed- and unsignedness work in the integer protocols. The relatively minor quibbles with floating-point don't even register, and we definitely will not break source or binary compatibility for them.)

Saklad5 · January 17, 2022, 4:25pm

In that case the documentation simply needs to be updated to specify that.

OK.

Could we introduce a new protocol for requirements shared by both arithmetic and interchange formats, move said requirements from FloatingPoint into this protocol, then make FloatingPoint inherit them?

scanon · January 17, 2022, 4:27pm

Not with the existing compiler (this is currently an ABI-breaking change).

You can define a protocol for storage formats, but you can't move operations from FloatingPoint onto that protocol.

Saklad5 · January 17, 2022, 4:28pm

What if they were duplicated outright instead of being moved? That’d be functionally equivalent, right?

scanon · January 17, 2022, 4:29pm

The change to make FloatingPoint refine the new protocol is a breaking change (because existing types conforming to FloatingPoint in compiled binaries will not have the witness for it).

xwu · January 17, 2022, 4:30pm

No, it is not possible to add protocols above the existing hierarchy in an ABI-stable way. If there is a demonstrated use case for algorithms generic over storage-only formats and arithmetic formats, then separate protocols can be made with duplicative requirements that existing types can also conform to. It remains to be seen if there is such a need—as @scanon alludes to above.
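A minimal sketch of that alternative, assuming a separate protocol that simply repeats a couple of FloatingPoint requirements. The protocol name follows the placeholder suggested earlier in the thread, and countNaNs is a hypothetical helper.

```swift
// Hypothetical protocol with duplicated requirements. It sits beside
// FloatingPoint rather than above it, so FloatingPoint is left unchanged.
protocol FloatingPointStorageFormat {
  var isNaN: Bool { get }
  var isFinite: Bool { get }
}

// Existing types already have these members and can opt in.
extension Float: FloatingPointStorageFormat {}
extension Double: FloatingPointStorageFormat {}

// An algorithm generic over storage-only and arithmetic formats alike.
func countNaNs<S: Sequence>(in values: S) -> Int
where S.Element: FloatingPointStorageFormat {
  values.filter { $0.isNaN }.count
}

let nanCount = countNaNs(in: [1.0, Double.nan, 2.0])
```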

To be clear, it is not and has never been the goal to dice up numeric protocols (or any other protocol hierarchy) into the smallest quanta of shared functionality; flexibility and usability are balanced against each other and the protocols that exist reflect what’s judged to be the most useful groupings of APIs.

Saklad5 · January 17, 2022, 4:40pm

Got it, thanks for explaining.

Is there a formal resource for what constitutes an ABI break? I’ve tried poking around the repository, but most of the resources I find are old manifestos for ABI stability and calling conventions.

I feel that made more sense before Swift 3 added the ability to make type aliases for protocol composition. The smallest meaningful quanta of functionality can be composed into larger groupings that actually see use. You cannot do the reverse.

benrimmington · January 17, 2022, 6:48pm

Double has init(bitPattern:) and bitPattern APIs, where the "binary interchange format" is stored as UInt64. I presume that "decimal interchange formats" could also use unsigned integers: UInt32 for Decimal32, etc. Are you referring to these formats, or something else?
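For reference, the existing Double APIs mentioned here round-trip through UInt64 like this (the variable names are just for the example):

```swift
let one = 1.0
let bits = one.bitPattern                  // UInt64 holding the binary64 encoding
let restored = Double(bitPattern: bits)    // back to 1.0

// A bit pattern with all exponent bits set and a nonzero fraction is a NaN.
let quietNaN = Double(bitPattern: 0x7FF8_0000_0000_0000)
```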

xwu · January 17, 2022, 7:04pm

No, something else. I mean that the IEEE standard contemplates implementations supporting a Float128 type, for instance (or Float256, etc., or Float192 for that matter), that can store 128-bit floating-point data losslessly without necessarily offering floating-point computation. These types wouldn’t be just a sequence of bits (which is what UInt64 is to Double); they would offer at minimum conversions to and from other binary floating-point types, and the standard further specifies that it “should” (but not “shall”) provide certain other non-computational operations such as isFinite and isNaN. (Notably, floating-point comparison of values is not required.) For these purposes, Double is its own IEEE interchange format.

As I said above, there are some commonalities here where StaticBigInt can provide some of the non-computational numeric APIs—it’s not just a sequence of words like UnsafeMutablePointer<UInt>—but not all.

If we were to decide that being able to store 128-bit floating-point values is important without necessarily having 128-bit floating-point operations, we might want to make the type something other than bare Float128—which would make the type seem defective since it can’t do math. If, say, we namespaced it as Interchange.Float128, then the distinction is clear. How (or if, but I think so) the design and naming of StaticBigInt could facilitate or save room for a family of such types (or at least not make future additions inconsistent or awkward) is the thought I want to raise.
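To illustrate the shape of such a type, here is a sketch of a namespaced storage-only format. It uses the 16-bit binary format rather than Float128 so that the conversion can target Float; the names Interchange, Binary16 and floatValue are hypothetical, and the type deliberately offers no arithmetic or comparison.

```swift
// Hypothetical namespace for storage-only formats.
enum Interchange {
  // IEEE 754 binary16: 1 sign bit, 5 exponent bits, 10 fraction bits.
  struct Binary16 {
    var bitPattern: UInt16

    // Non-computational queries.
    var isNaN: Bool { exponentBits == 0x1F && fractionBits != 0 }
    var isFinite: Bool { exponentBits != 0x1F }

    // Lossless widening conversion (NaN payloads are not preserved here).
    var floatValue: Float {
      let sign: FloatingPointSign = (bitPattern & 0x8000) == 0 ? .plus : .minus
      let fraction = Float(fractionBits)
      switch exponentBits {
      case 0:
        // Zero or subnormal: fraction * 2^-24.
        return Float(sign: sign, exponent: -24, significand: fraction)
      case 0x1F:
        if fraction != 0 { return .nan }
        return sign == .minus ? -Float.infinity : Float.infinity
      default:
        // Normal: (1 + fraction / 2^10) * 2^(exponent - 15).
        return Float(sign: sign,
                     exponent: Int(exponentBits) - 15,
                     significand: 1 + fraction / 1024)
      }
    }

    private var exponentBits: UInt16 { (bitPattern >> 10) & 0x1F }
    private var fractionBits: UInt16 { bitPattern & 0x03FF }
  }
}

let stored = Interchange.Binary16(bitPattern: 0x3C00)
let widened = stored.floatValue   // 1.0
```

With the namespace in front, the absence of + and * reads as intentional rather than as a gap.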