# Integers and their protocols

(Xiaodi Wu) #1

Hi all,

I promised some time ago to provide some thoughts based on the experience
of using the newly revised integer protocols. I've finally synthesized
these thoughts into one document. It is somewhat long, and for that I
apologize. I hope that it is at least clear in terms of how it lays out the
issues.

Text:

Notes on the user experience of new integer protocols

Xiaodi Wu
June 17, 2017
Introduction

The design, re-design, and implementation of SE-0104
<https://github.com/apple/swift-evolution/blob/master/proposals/0104-improved-integers.md>,
a proposal to revise integer protocols in Swift, is now largely complete in
shipping previews of Swift 4. As an exercise, I have used the new APIs to
develop a set of additional numeric facilities
<https://github.com/xwu/NumericAnnex>. Here are some insights gained from
that experience--and from two unpublished exercises to implement a BigInt
type (one from scratch, one wrapping GMP)--as well as suggestions for
improvement.

1. Performing heterogeneous comparison with integer literals
2. Conforming to _ExpressibleByBuiltinIntegerLiteral
3. Overriding implementations of heterogeneous comparison and bit shift
operators
4. Defining the semantics of masking shifts for arbitrary-precision
integers
5. Using ArithmeticOverflow
6. Initializing integers from a floating-point source

Performing heterogeneous comparison with integer literals

SE-0104 added heterogeneous comparison and bit shift operators to the
language to improve the user experience (for example, you can now check if
an Int value is equal to a UInt value).1

1 Similar enhancements for operations such as addition are not yet possible
because a design is lacking for how to express *promotion*. When (if) Swift
allows integer constants in generic constraints, this may become a more
realistic prospect as promotion would then be expressible using generic
constraints (e.g. func + <T, U>(lhs: T, rhs: U) -> U where T :
FixedWidthInteger, U : FixedWidthInteger, T.BitWidth < U.BitWidth). This is
meant to be an aside and is certainly outside the scope of correcting or
incrementally improving upon SE-0104.

These operators behave as intended with concrete types, but comparisons in
generic algorithms behave differently. This was encountered during review
of the standard library's implementation of DoubleWidth, which in fact had
a bug as a consequence of the following behavior:

func f() -> Bool {
    return UInt.max == ~0
}
func g() -> Bool {
    return UInt.max == .max
}
func h<T : FixedWidthInteger>(_: T.Type) -> Bool {
    return T.max == ~0
}
func i<T : FixedWidthInteger>(_: T.Type) -> Bool {
    return T.max == .max
}

f()          // Returns `true`.
g()          // Returns `true`.
h(UInt.self) // Returns `false`.
i(UInt.self) // Returns `true`.

The reason that h(_:) gives a surprising result is explained as follows:


- Each concrete integer type implements its own overload of the
homogeneous comparison operator, whereas protocol extension methods
implement heterogeneous comparison operators. When an integer literal is
used on the right-hand side of the comparison, the compiler looks first to
the concrete type for a suitable implementation of the operator and finds
the homogeneous overload. Therefore, it does *not* traverse the protocol
hierarchy and instead infers the literal to be of the same type as the
left-hand side.

- In generic code, *even if* the most refined protocol implements its own
overload of the homogeneous comparison operator, the compiler will look for
all overloads of the operator by traversing the entire protocol hierarchy.
Since heterogeneous comparison operators are defined somewhere along the
hierarchy, the compiler will *always* find an overload that accepts the
"preferred" integer literal type (Swift.IntegerLiteralType, aka Int) and
infers the literal to be of type Int.

Therefore, in the invocation h(UInt.self), we are actually comparing
UInt.max to ~(0 as Int). This is a surprising result, as evidenced by the
fact that a bug nearly slipped into the standard library itself.

Suggestions for improvement

Based on the demonstration that the expression T.max == .max infers the
correct type in both concrete and generic code, I would suggest that type
inference for integer literals be changed based on the following (notional)
rule:

- Any use of an integer literal x should be equivalent to the use of an
implicit member expression .init(integerLiteral: x).

This generally preserves the current behavior that an integer literal will
preferentially be inferred to be of type IntegerLiteralType:

func j(_ x: Int) {
    print("Int", x)
}
func j(_ x: UInt) {
    print("UInt", x)
}

j(42)                        // Prints "Int 42".
j(.init(integerLiteral: 42)) // Prints "Int 42".

Intriguingly, T.init(integerLiteral: 0) is currently ambiguous where T :
FixedWidthInteger, suggesting that some re-design of the protocol would
also be necessary to have this rule give the appropriate behavior.

func k<T : FixedWidthInteger>(_: T.Type) -> Bool {
    return (0 as T) == T.init(integerLiteral: 0)
    // error: ambiguous reference to member 'init(integerLiteral:)'
}

Conforming to _ExpressibleByBuiltinIntegerLiteral

For a numeric type modeling rational numbers, it is natural to conform that
type to ExpressibleByIntegerLiteral. For instance, let x = 42 as Rational would
create an instance of Rational<Int> with numerator 42 and denominator 1. A
similar line of reasoning applies to a numeric type modeling complex
numbers (and likely other types as well).

To conform to the protocol ExpressibleByIntegerLiteral, the type must
implement an initializer of the form init(integerLiteral: U). However, U must
conform to an underscored protocol _ExpressibleByBuiltinIntegerLiteral.

For a type such as Rational<T>, where the numerator and denominator are of
type T, it is natural to have this initializer take an integer literal of
type T. Currently, therefore, this requires that T be constrained to an
underscored protocol. This is suboptimal for two reasons: (1)
_ExpressibleByBuiltinIntegerLiteral is an underscored protocol not meant
for public use; (2) it prevents an implementation of Rational<T> where T :
_ExpressibleByBuiltinIntegerLiteral from having, say, arbitrary-precision
integers as numerator and denominator (i.e., Rational<BigInt>) unless the
arbitrary-precision integer type itself conforms to
_ExpressibleByBuiltinIntegerLiteral.
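To make the constraint concrete, here is a minimal sketch (the Rational type is hypothetical). The only way to accept a literal without touching the underscored protocol is to pin the literal type to a concrete type such as Int, which then has to be converted to T rather than typed as T directly:

```swift
// Minimal sketch of a hypothetical Rational type. Pinning the literal
// type to Int avoids constraining T to the underscored protocol, but
// the literal is converted to T instead of being inferred as T.
struct Rational<T: BinaryInteger>: ExpressibleByIntegerLiteral {
    var numerator: T
    var denominator: T

    init(integerLiteral value: Int) {
        self.numerator = T(value)
        self.denominator = 1
    }
}

let x = 42 as Rational<Int>
// x.numerator == 42, x.denominator == 1
```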

Suggestions for improvement

From a user perspective, the semantics of ExpressibleByIntegerLiteral suggest
that a generic type that has exclusively stored properties of type T where
T : ExpressibleByIntegerLiteral should be conformable to
ExpressibleByIntegerLiteral by implementing an initializer that accepts a
literal value of type T--without being constrained to any underscored
protocol. I suspect, however, that this is not a trivial detail to
implement in the compiler.

Overriding implementations of heterogeneous comparison and bit shift operators

In Swift, there is a distinction between default implementations of
protocol requirements and protocol extension methods which are not
requirements. Both are defined in extensions to protocols, but default
implementations are dynamically dispatched and can be overridden by
conforming types, while extension methods can be shadowed but never
overridden. (For example, homogeneous == is a protocol requirement of
Equatable, but homogeneous != is a protocol extension method that cannot be
overridden.)
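The distinction can be demonstrated with a toy protocol (not from the standard library): a method defined only in a protocol extension is statically dispatched in generic code, so a conforming type's own version shadows it rather than overriding it.

```swift
// Toy illustration of shadowing vs. overriding.
protocol Describable {}

extension Describable {
    // Extension method, not a protocol requirement: statically dispatched.
    func describe() -> String { return "protocol extension" }
}

struct Concrete: Describable {
    // Shadows, but does not override, the extension method.
    func describe() -> String { return "concrete shadow" }
}

func genericDescribe<T: Describable>(_ value: T) -> String {
    return value.describe()
}

Concrete().describe()       // "concrete shadow"
genericDescribe(Concrete()) // "protocol extension"
```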

The paragraph above is meant to provide some background on existing
features of the language, but it is meant neither to question their
existence in the language nor to question the design of Equatable and
Comparable, which are far outside the scope of improving the revised
integer protocols.

The heterogeneous comparison and bit shift operators introduced in Swift 4
by SE-0104 are protocol extension methods.

Consider a custom BigInt type and the comparison (42 as BigInt) == (21 as
UInt). There is an efficient way to perform that comparison which does not
involve first converting the UInt value to a BigInt value. However, even if
BigInt manually implements this specialized comparison operator, a generic
algorithm implemented for all binary integers (say, for integer
exponentiation) will use the less efficient standard library implementation
of heterogeneous ==. By contrast, the same algorithm will use the most
specialized implementation of homogeneous ==, because that function is a
protocol requirement that is dynamically dispatched.

Suggestions for improvement

Heterogeneous <, <=, ==, >, >= should be requirements of the protocol on
which they are now extension methods, as should heterogeneous masking
shifts. That would permit conforming types to provide more specialized
implementations. This may, however, increase the work of the type checker
at compile time.

Defining the semantics of masking shifts for arbitrary-precision integers

SE-0104 introduced two different types of bit shift operations for
integers, called masking shifts and smart shifts.

Smart shifts, spelled << and >>, are protocol extension methods that always
behave in a well-defined way when overshifting or undershifting. For
example, x << -2 is now equivalent to x >> 2 in Swift 4, where previously
the behavior was undefined.
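For example, with a concrete unsigned type:

```swift
// Smart shifts handle negative and oversized shift amounts without trapping.
let x: UInt8 = 0b0001_0000  // 16

x << -2   // 4, same as x >> 2
x >> -2   // 64, same as x << 2
x << 100  // 0, overshift is well-defined
```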

However, because there is sometimes a performance cost for branches that
handle overshifting and undershifting that cannot be optimized away, Swift
now offers masking shifts, spelled &<< and &>>. As explained in SE-0104,
"[a] masking shift logically preprocesses the right[-]hand operand by
masking its bits to produce a value in the range 0...(x-1) where x is the
number of bits in the left[-]hand operand." These semantics for masking
shifts make sense when the number of bits in the left-hand operand is
fixed, as is the case for all fixed-width integers.
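For a fixed-width type, the mask is simply bitWidth - 1:

```swift
// Masking shifts reduce the right-hand operand modulo the bit width
// (for UInt8, the shift amount is masked with 0b111).
let y: UInt8 = 1

y &<< 3   // 8
y &<< 8   // 1, because 8 & 0b111 == 0
y &<< 11  // 8, because 11 & 0b111 == 3
```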

However, *all* binary integers must implement &<< and &>>. For an
arbitrary-width integer type (which, for various reasons, is best
represented as sign-and-magnitude rather than two's-complement), the
literal interpretation of those semantics is problematic. For example, most
users would expect that (1 as BigInt) << 1000 should not result in
overshift. However, the number of bits in the left-hand operand is 2, and
therefore a plausible interpretation of the semantics of &<< would have (1
as BigInt) &<< 1000 == 1. By extension, therefore, (1 as BigInt) << 1000 ==
0; this is a counterintuitive result.

Suggestions for improvement

The semantics of smart shift should be clarified to state that the
right-hand operand is preprocessed by masking its bits to produce a value
in the range 0..<x where x is the maximum number of bits in a value of the
same type as the left-hand operand. This is equivalent to viewing
arbitrary-width integers *for the purposes of bitwise operations* as
notionally infinite sequences of bits sign-extended from the
two's-complement representation of the integral value.

For the purposes of implementation, BinaryInteger may require a new static
property maxBitWidth, which would be equal to bitWidth for fixed-width
integers and could be defined as Int.max for arbitrary-precision integers.
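A sketch of what that might look like (maxBitWidth is hypothetical and not part of the standard library; the conforming types here are stand-ins, not real integer implementations):

```swift
// Hypothetical refinement; nothing below exists in the standard library.
protocol SketchBinaryInteger {
    static var maxBitWidth: Int { get }
}

struct SketchInt64: SketchBinaryInteger {
    // For fixed-width types, maxBitWidth would equal bitWidth.
    static var maxBitWidth: Int { return 64 }
}

struct SketchBigInt: SketchBinaryInteger {
    // For arbitrary-precision types, maxBitWidth could be Int.max.
    static var maxBitWidth: Int { return Int.max }
}
```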

Using ArithmeticOverflow

With SE-0104, the addWithOverflow family of arithmetic operations is no
longer static. They have been renamed addingReportingOverflow and the like,
and their return type has been changed from (T, Bool) to (partialValue: T,
overflow: ArithmeticOverflow). ArithmeticOverflow is an enum with two
cases, overflow and none. Initially, this may appear to be an improvement.

However, in actual use, ArithmeticOverflow is extremely lacking in
ergonomics. Where previously it was possible to destructure the tuple and
evaluate overflow by writing if overflow { ... }, it is now required to use
a much more verbose incantation: if overflow == .overflow { ... }. There is
little else one can do with an ArithmeticOverflow result besides converting
it into a value of type Bool. When working with these operations
repeatedly--and, chances are that if you need such an operation, you don't
need it just once--this quickly becomes very cumbersome.

Inside the standard library, the ArithmeticOverflow values returned from
*ReportingOverflow functions are created by calling an initializer that
takes a Bool argument. Therefore, with every call to a
*ReportingOverflow function,
a Bool is converted to an ArithmeticOverflow, returned, then converted by
the user back into a Bool. Worse, based on comments in the standard
library, it appears that the compiler cannot currently elide these
operations.

Suggestions for improvement

1. Change the return type of *ReportingOverflow functions to (partialValue:
T, overflow: Bool) (or, didOverflow) and eliminate the ArithmeticOverflow
type. In practice, this would require deprecating the current functions and
introducing replacements.

2. Since the presence of type inference may cause (1) to be source-breaking
even with proper deprecations, avoid creating a source-breaking change by
naming the revised functions something else. Specifically, take the
opportunity to improve the name of these functions, changing
addingReportingOverflow to addReportingOverflow (and *mutatis mutandis* for
the remaining functions). This avoids the "-ing -ing" clash and eliminates
the nonsensical interpretation of the phrase "adding, reporting overflow by
42."
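Under suggestion (1), overflow handling would collapse back to a plain Bool test. A sketch, assuming the Bool-returning (partialValue: T, overflow: Bool) signature:

```swift
// Sketch assuming the suggested (partialValue: T, overflow: Bool)
// return type for *ReportingOverflow functions.
let (partialValue, overflow) = UInt8.max.addingReportingOverflow(1)

if overflow {
    // Handle wraparound; partialValue holds the truncated result.
}
// partialValue == 0, overflow == true
```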

Initializing integers from a floating-point source

The new BinaryInteger protocol has the following requirements:

- init?<T : FloatingPoint>(exactly source: T)
- init<T : FloatingPoint>(_ source: T)

However, in my experience, it does not appear possible (or at least, it is
not apparent even after some careful thought) to implement these
requirements solely with the properties and methods required by
FloatingPoint. There do, however, exist straightforward ways to convert
BinaryFloatingPoint values to BinaryInteger values.
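That is because BinaryFloatingPoint exposes the radix-2 representation as integer values, which is exactly what an exact integer conversion needs. For example:

```swift
// BinaryFloatingPoint makes the pieces of the radix-2 representation
// available as integer values.
let d = 10.5
let e = d.exponent               // 3, since 10.5 == 1.3125 * 2^3
let f = d.significandBitPattern  // fraction bits of the significand (UInt64 for Double)
```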

In the standard library, concrete integer types implement conversions to
and from concrete built-in floating-point types. However, whether
coincidentally or not, the standard library itself has not yet implemented
these protocol requirements. At last check, the entire implementation of
the first requirement was:

public init?<T : FloatingPoint>(exactly source: T) {
    // FIXME(integers): implement
    fatalError()
}

Suggestions for improvement

Consider changing the protocol requirement so that T is constrained as
follows: T : BinaryFloatingPoint.

(Ben Rimmington) #2

There are also some commented-out requirements, which were accepted for SE-0067.

(One requirement is obsoleted by the Numeric protocol).

Can these be implemented yet, based on your experience?

FloatingPoint
-------------

/* TODO: Implement the following APIs once a revised integer protocol is
introduced that allows for them to be implemented. In particular, we
need to have an "index of most significant bit" operation and "get
absolute value as unsigned type" operation on the Integer protocol.

/// Creates the closest representable value to the given integer.
///
/// - Parameter value: The integer to represent as a floating-point value.
init<Source: Integer>(_ value: Source)

/// Creates a value that exactly represents the given integer.
///
/// If the given integer is outside the representable range of the type, the
/// result is `nil`.
///
/// - Parameter value: The integer to represent as a floating-point value.
init?<Source: Integer>(exactly value: Source)
*/

BinaryFloatingPoint
-------------------

/* TODO: Implement these once it becomes possible to do so (requires revised
Integer protocol).
/// Creates a new instance from the given value, rounded to the closest
/// possible representation.
///
/// - Parameter value: A floating-point value.
init<Source: BinaryFloatingPoint>(_ value: Source)

/// Creates a new instance equivalent to the exact given value.
///
/// If the value you pass as `value` can't be represented exactly, the result
/// of this initializer is `nil`.
///
/// - Parameter value: A floating-point value to represent.
init?<Source: BinaryFloatingPoint>(exactly value: Source)
*/

(Xiaodi Wu) #3

There are also some commented-out requirements, which were accepted for
SE-0067.

(One requirement is obsoleted by the Numeric protocol).

Can these be implemented yet, based on your experience?

Based on a cursory thinking through only:

The first one, yes (if we observe that Integer is now BinaryInteger), but
probably inefficiently as it would have to use `words`, I think. If the
requirement were on BinaryFloatingPoint and T were constrained to
BinaryInteger & FixedWidthInteger, I think it could be more efficient. Of
course, both could be added. We could not remove the concrete initializer
from a UInt source, though, as initializing from `words` would rely on that.

The second one, I think so, but I haven't tried to make sure.


On Sun, Jun 18, 2017 at 05:30 Ben Rimmington <me@benrimmington.com> wrote:

FloatingPoint
-------------

> /* TODO: Implement the following APIs once a revised integer protocol is
>    introduced that allows for them to be implemented. In particular, we
>    need to have an "index of most significant bit" operation and "get
>    absolute value as unsigned type" operation on the Integer protocol.
>
>    /// Creates the closest representable value to the given integer.
>    ///
>    /// - Parameter value: The integer to represent as a floating-point value.
>    init<Source: Integer>(_ value: Source)
>
>    /// Creates a value that exactly represents the given integer.
>    ///
>    /// If the given integer is outside the representable range of the type, the
>    /// result is `nil`.
>    ///
>    /// - Parameter value: The integer to represent as a floating-point value.
>    init?<Source: Integer>(exactly value: Source)
> */

BinaryFloatingPoint
-------------------

> /* TODO: Implement these once it becomes possible to do so (requires revised
>    Integer protocol).
>    /// Creates a new instance from the given value, rounded to the closest
>    /// possible representation.
>    ///
>    /// - Parameter value: A floating-point value.
>    init<Source: BinaryFloatingPoint>(_ value: Source)
>
>    /// Creates a new instance equivalent to the exact given value.
>    ///
>    /// If the value you pass as `value` can't be represented exactly, the result
>    /// of this initializer is `nil`.
>    ///
>    /// - Parameter value: A floating-point value to represent.
>    init?<Source: BinaryFloatingPoint>(exactly value: Source)
> */