Literal initialization via coercion

xwu · April 1, 2018, 10:36pm

Sorry for the long delay. My longer thoughts are below.

As it's been agreed in the interim that type-checking performance is only a side benefit and not the main motivation, I've omitted discussion of how we might be able to improve type-checking performance without making a source-breaking change.

Notes on literal initialization

Background

As detailed by John McCall:

The official way to build a literal of a specific type is to write the literal
in an explicitly-typed context, like so:
let x: UInt16 = 7
// or
let y = 7 as UInt16
Nonetheless, programmers often try the following:
UInt16(7)
Unfortunately, this does not attempt to construct the value using the
appropriate literal protocol; it instead performs overload resolution using
the standard rules.... Often this leads to static ambiguities or, worse,
causes the literal to be built using a default type (such as Int); this may
have semantically very different results which are only caught at runtime.

Differences in behavior can be witnessed not only in diagnostics:

let a = 32768 as Int16
// Causes a compile time error:
// integer literal '32768' overflows when stored into 'Int16'

let b = Int16(32768)
// Causes a **runtime** error:
// Not enough bits to represent a signed value

...but also in initialized results:

// (Float80 is available only on x86 targets.)
let c = 3.14159265358979323846 as Float80
// 3.14159265358979323851
let d = Float80(3.14159265358979323846)
// 3.141592653589793116

let e = 8388608.5000000001 as Float
// 8388609
let f = Float(8388608.5000000001)
// 8388608
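
The mechanism behind these differences can be made explicit by spelling out the intermediate type by hand. The following is only a sketch: it assumes that the initializer spelling first builds the literal as the default type (Double or Int), and the variable names are mine, chosen for illustration.

```swift
// Assumption: an explicitly typed Double stands in for the default literal
// type that the initializer spelling falls back to.
let viaDouble: Double = 8388608.5000000001

// Double cannot hold the trailing 1e-10, so the stored value is exactly 8388608.5.
let isExactHalf = (viaDouble == 8388608.5)

// Converting to Float now has to round an exact tie, which goes to the even
// neighbor (8388608) rather than up to 8388609.
let narrowed = Float(viaDouble)

// The same effect for integers: 32768 is a perfectly valid Int, so the range
// failure can only surface during the later conversion to Int16.
let wide: Int = 32768
let checked = Int16(exactly: wide)             // nil: 32768 does not fit
let wrapped = Int16(truncatingIfNeeded: wide)  // wraps around to Int16.min
```

In both cases the information needed to diagnose or round correctly is already lost by the time the conversion initializer runs.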

Notes on the proposed solution

The proposed change here is as follows:

[A]ll initializer expressions involving literal types behave like coercion of
[the] literal to specified type if such type conforms to the expected literal
protocol.

...or expressed alternatively:

Given a function call expression of the form A(B) (that is, an expr-call
with a single, unlabeled argument) where B is an expr-literal or
expr-collection, if A has type T.Type for some type T and there is a
declared conformance of T to an appropriate literal protocol for B, then
the expression always resolves as a literal construction of type T (as if
the expression were written B as A) rather than as a general initializer
call.

Such a rule change would bring about two desired results:

A(42) and 42 as A would have identical behavior.
Type-checker logic would be simplified, potentially speeding up type checking
of some complex expressions.

Drawbacks are:

It is a source-breaking change that would have to be limited to Swift 5+.

It is a special-case rule that, as proposed, would cause differences in
behavior between the following expressions:

let x = UInt(42)      // As proposed, coercion
let y = UInt.init(42) // As proposed by Pavel Yaskevich, not a coercion
let z = UInt((42))    // As proposed by John McCall, not a coercion
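
To make the distinction between these spellings observable, here is a minimal sketch using a hypothetical type (`Tagged` is not from the discussion) that records which initializer produced it. Which path the plain `Tagged(42)` spelling takes is exactly what the proposed rule would decide, so the sketch makes no claim about it.

```swift
// Hypothetical type: records which initializer produced the value.
struct Tagged: ExpressibleByIntegerLiteral {
    let value: Int
    let builtFromLiteral: Bool

    // Literal path, used by `42 as Tagged` and `let t: Tagged = 42`.
    init(integerLiteral value: Int) {
        self.value = value
        self.builtFromLiteral = true
    }

    // Ordinary conversion path.
    init(_ value: Int) {
        self.value = value
        self.builtFromLiteral = false
    }
}

let coerced = 42 as Tagged          // always the literal path
let explicitInit = Tagged.init(42)  // ordinary overload resolution picks init(_:)
let called = Tagged(42)             // path depends on the rule the compiler applies
```

Inspecting `builtFromLiteral` on each value shows which path was taken; only the third line is affected by the rule under discussion.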

Generalizing the special-case rule

If it is desired first to coerce a literal to type B and then convert to type
A, it is straightforward to write A(B(42)) or A(42 as B). Therefore, it is
not clearly desirable to preserve subtle differences between A(42) and
A.init(42).
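
As an illustration of the explicit two-step spelling, here is a small sketch with A = Double and B = Float. The literal and the variable names are my own choices; the value is picked so that the intermediate Float cannot represent it exactly.

```swift
// One step: the literal is built directly as a Double and is exact.
let direct = 16777217.0 as Double

// Two steps, spelled out: first coerce the literal to Float (which rounds to
// 16777216, since Float has only 24 significand bits), then convert to Double.
let viaFloat = Double(16777217.0 as Float)

// The two results differ, and the source text makes it obvious why.
let differs = (direct != viaFloat)
```

Because the intermediate type is written out, nobody reading the second line can be surprised by the loss of precision.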

Fortunately, it is not necessary to do so. The special-case rule proposed above
can be generalized, and in the process, another major weakness involving the
inferred type of literals can also be addressed:

Since the addition of heterogeneous comparison and bit shift operators to the
language, there has been a little-known footgun which is encountered in generic
code--and which has actually been encountered within the standard library
itself:

func f() -> Bool {
  return UInt.max == ~0
}
f() // true

func h<T : FixedWidthInteger>(_: T.Type) -> Bool {
  return T.max == ~0
}
h(UInt.self) // false!

Comparison with an integer literal now defaults to heterogeneous comparison
with the default IntegerLiteralType (aka Int). In concrete code, Max Moiseev
has hard-coded workarounds into the standard library, but the same workarounds
cannot be used for generic code.

Why? Because the concrete workarounds in turn rely on a type-checker hack that
prefers concrete operator implementations over generic operator implementations
for performance!

The problem is not merely a theoretical or historical one: the continued
presence of this footgun is holding back implementation of heterogeneous
comparison for floating-point types.
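
The footgun can be reproduced deterministically by giving the literal an explicit type on each side. This is a sketch with names of my own; the generic helper pins the literal's type by hand, which is a user-level workaround I am assuming for illustration and not the standard library workaround mentioned above.

```swift
let minusOne: Int = ~0   // all bits set, read as a signed value: -1
let allOnes: UInt = ~0   // all bits set, read as an unsigned value: UInt.max

// Heterogeneous comparison (UInt against Int) compares numeric values,
// so UInt.max and -1 are not equal.
let heterogeneous = (UInt.max == minusOne)

// Homogeneous comparison (UInt against UInt) gives the result most users expect.
let homogeneous = (UInt.max == allOnes)

// Hypothetical generic helper: writing `0 as T` keeps the literal from
// defaulting to Int inside generic code.
func isAllOnes<T: FixedWidthInteger>(_ value: T) -> Bool {
    return value == ~(0 as T)
}
```

With the literal's type pinned, `isAllOnes(UInt.max)` is true, whereas the unannotated `T.max == ~0` is free to pick the heterogeneous operator.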

What do the two seemingly distinct issues have in common? Let's consider the
following four examples:

// Example 1:
UInt(0xffff_ffff_ffff_ffff)
// Users expect `0xffff_ffff_ffff_ffff` to be coerced to type `UInt`.

// Example 2:
UInt.init(0xffff_ffff_ffff_ffff)
// The same expectation is reasonable here.

// Example 3:
extension UInt {
  static func makeValue<T: BinaryInteger>(_ x: T) -> UInt {
    return UInt(x)
  }
}
UInt.makeValue(0xffff_ffff_ffff_ffff)
// The same expectation is reasonable here.

// Example 4:
infix operator <=> : ComparisonPrecedence
extension UInt {
  static func <=> <T: BinaryInteger>(lhs: UInt, rhs: T) -> Int {
    if lhs == rhs { return 0 }
    return lhs < rhs ? -1 : 1
  }
}
UInt.max <=> 0xffff_ffff_ffff_ffff
// The same expectation is reasonable here.

The proposed solution discussed above is a special-case rule that addresses
example 1 only. But a simpler rule would address all of the use cases above:

Any {foo} literal argument in a call to a static or instance method of type
T, where T: ExpressibleByFooLiteral, should be coerced to type T [edit: by default and if possible]
instead of the default FooLiteralType. [*]
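
To see what this rule would change, here is a sketch that reports the type inferred for a generic literal argument. The helper `inferredType(of:)` is hypothetical and exists only for this illustration.

```swift
extension UInt {
    // Hypothetical helper: reports the type inferred for the generic argument.
    static func inferredType<T: BinaryInteger>(of value: T) -> String {
        return String(describing: T.self)
    }
}

// Under the existing rules, the literal falls back to the default
// IntegerLiteralType, so T is inferred as Int.
let defaulted = UInt.inferredType(of: 42)

// Pinning the literal by hand gives the type the rule above would choose by default.
let pinned = UInt.inferredType(of: 42 as UInt)
```

Under the generalized rule, the first call would behave like the second without any annotation.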

I bring this up because, if we are to implement a source-breaking change to the
Swift programming language, it would be ideal to make a single source-breaking
change that addresses both of these very present, very real pain points. It
would help us to avoid creating a special-case rule and (potentially--I'm
certainly no expert on this) allow us to reap type-checker performance
improvements in a greater proportion of expressions that use literals.

[*] This is 100% sufficient for numeric literals; for string literals, where the
"currency type" is intended to be String, the existing behavior may be
preferable, although that may be debatable. (It may still prove to be the case
that users expect implicit "same-type coercion" behavior.) However, if
desired, a more elaborate proposed solution can allow ExpressibleByFooLiteral
protocols to state a type alias which becomes the default type to which {foo}
literal arguments are coerced:

protocol ExpressibleByIntegerLiteral {
  typealias IntegerLiteralType = Self
  // ...
}

protocol ExpressibleByStringLiteral {
  typealias StringLiteralType = String
  // ...
}
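
The declarations above are shorthand for the idea, not valid protocol syntax as written. The closest existing language feature is probably a defaulted associated type, sketched below with hypothetical stand-in protocols (these are not the real literal protocols, and today's compiler would not consult them when typing a literal).

```swift
// Hypothetical stand-ins; the real literal protocols are not declared this way.
protocol IntegerLiteralDefaulting {
    // Defaults to the conforming type itself.
    associatedtype DefaultLiteralType = Self
}

protocol StringLiteralDefaulting {
    // Defaults to the "currency type" regardless of the conforming type.
    associatedtype DefaultLiteralType = String
}

struct Celsius: IntegerLiteralDefaulting {}
struct Identifier: StringLiteralDefaulting {}

// Each conforming type picks up the default stated by its protocol.
let integerDefaultIsSelf = (Celsius.DefaultLiteralType.self == Celsius.self)
let stringDefaultIsString = (Identifier.DefaultLiteralType.self == String.self)
```

The sketch only shows that a per-protocol default can be stated and overridden per type; wiring it into literal type inference is the part that would need compiler support.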