Literal initialization via coercion

xedin · March 22, 2018, 12:41am

Currently types conforming to literal protocols are type-checked using regular
initializer rules, which means that for expressions like UInt32(42) the type-checker
is going to look up a set of available initializer choices and attempt them
one-by-one trying to deduce the best solution.

This is not always the desired behavior when it comes to numeric and
other literals, because it means that the argument is going to be type-checked
separately (most likely as some default literal type like Int) and then passed
to an initializer call. Coercion, by contrast, would treat
the expression above as 42 as UInt32, where 42 is ascribed the type UInt32
and constructed without an intermediate type.

The proposed change makes all initializer expressions involving literal types
behave like coercion of literal to specified type if such type conforms to the
expected literal protocol. As a result expressions like UInt64(0xffff_ffff_ffff_ffff),
which result in compile-time overflow under current rules, become valid. It
also simplifies type-checker logic and leads to speed-up in some complex expressions.
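
As a small sketch of the coercion behavior described above (the constant names are only illustrative), the same literal is accepted when it is written in an explicitly typed context, because it is built directly as UInt64 and never passes through Int:

```swift
// Coercion: the literal is built directly as UInt64, with no Int intermediate.
let viaCoercion = 0xffff_ffff_ffff_ffff as UInt64

// A type annotation gives the literal the same typed context.
let viaAnnotation: UInt64 = 0xffff_ffff_ffff_ffff

print(viaCoercion == UInt64.max)    // true
print(viaCoercion == viaAnnotation) // true
```

Under the pitch, UInt64(0xffff_ffff_ffff_ffff) would be expected to behave like the first line.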

This is a source breaking change because it's possible to declare a conformance to
a literal protocol and also have a failable initializer with the same parameter type:

struct Q: ExpressibleByStringLiteral {
  typealias StringLiteralType = String

  var question: String

  init?(_ possibleQuestion: StringLiteralType) {
    return nil
  }

  init(stringLiteral str: StringLiteralType) {
    self.question = str
  }
}

_ = Q("ultimate question")    // 'nil'
_ = "ultimate question" as Q  // Q(question: 'ultimate question')

Although such situations are possible, we consider them to be quite rare
in practice. FWIW, none were found in the compatibility test suite.

Implementation (currently only for numerics) is available at: https://github.com/apple/swift/pull/15311

Nevin · March 22, 2018, 1:20am

This is great!

It lets people use the natural initializer syntax while reaping the benefits of compile-time type conversion.

blangmuir · March 22, 2018, 6:31pm

What exactly does this mean? Are we creating a new implicit init(_:) for all literal convertible types?

xedin · March 22, 2018, 6:39pm

We are not going to create a new initializer; calls like UInt32(42) are going to be implicitly converted to 42 as UInt32, which is going to use init(integerLiteral:).

blangmuir · March 22, 2018, 6:46pm

What about UInt32.init(42)? Does that still get today's behaviour?

xedin · March 22, 2018, 6:48pm

Sure, if the init is explicitly specified we'll honor that.
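
To make the distinction observable, here is a minimal sketch using a hypothetical type, Tracked, that records which initializer built it. Tracked and its path property are assumptions for illustration only and do not come from the pitch or its implementation.

```swift
// Hypothetical type that remembers which initializer produced it.
struct Tracked: ExpressibleByIntegerLiteral {
  let path: String

  // Used when a literal is built directly as Tracked.
  init(integerLiteral value: Int) {
    path = "literal"
  }

  // Ordinary unlabeled initializer taking an Int.
  init(_ value: Int) {
    path = "initializer"
  }
}

let viaCoercion = 42 as Tracked         // uses init(integerLiteral:)
let viaExplicitInit = Tracked.init(42)  // explicit init: regular overload resolution

// Which path this takes is exactly what the pitch changes:
// "initializer" under the rules described as current, "literal" as proposed.
let viaCallSyntax = Tracked(42)

print(viaCoercion.path)     // literal
print(viaExplicitInit.path) // initializer
print(viaCallSyntax.path)
```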

xwu · March 23, 2018, 3:40am

I have some thoughts on this which will take longer than I can write up completely in an evening. Please bear with me while I sketch them out over the next few days. I'll also try to roll in some expository text so hopefully we can rope some more people into the discussion.

The short form of my feedback is that:

1. I agree that, in the end state (hopefully as soon as Swift 5), UInt64(0xffff_ffff_ffff_ffff) should become valid and behave just as you propose.
2. I agree that we have a problem with type checking performance which your pitch would likely address.
3. I think the language issue (item 1) and the type checking performance issue (item 2) can be resolved completely independently of each other (and I'll outline, in the later follow-up feedback, how exactly I think it can be done).
4. I think that the language issue should be broadened to address closely related issues with type checking literal expressions that are pressing pain points today (and that such broadening won't explode the size of the required fix in an unreasonable way).

More soon.

xedin · March 23, 2018, 4:08am

Thanks @xwu, I'm really curious what you have to say here! It might also make sense to do a separate pitch and keep this one focused on initialization, since it's low-hanging fruit.

allevato · March 23, 2018, 4:18am

Thanks for bringing this up! This is one of those parts of Swift that I've wanted to see cleaned up for a while.

It may be a good idea to read up on John McCall's thread from a couple years ago where he brought up the same issue, just to see if there's any valuable discussion about design/implementation. It's a shame this didn't get fixed back then.

You mention the case of UInt64(0xffff_ffff_ffff_ffff) becoming valid, as it should. But I would also mention that this turns some expressions that are currently only runtime errors into compiler errors (as they also should be, if the conversion is invalid). For example, "ab" as Character is currently a compile-time error but Character("ab") compiles and traps at runtime. Having the latter caught at compile time falls nicely out of this change.

xedin · March 23, 2018, 6:13pm

Thanks for bringing up John's thread, @allevato! I wasn't aware that one existed, but I think I could borrow some stuff from it when this gets into proposal form, and I'll mention added safety from this change.

jrose · March 23, 2018, 6:19pm

I'm in favor of this purely from a language standpoint. The fact that it has performance benefits is a bonus.

Tony's example does show that this may break existing source more than we thought, and so it might make sense to limit this change to Swift 5 mode. It would also encourage people to switch to Swift 5. :-)

John_McCall · March 25, 2018, 1:03am

I am definitely still in favor of making construction syntax with an unlabelled literal argument just directly construct the literal in the constructed type if possible, as a general language rule.

Chris_Lattner3 · April 1, 2018, 3:11pm

+1

-Chris

xwu · April 1, 2018, 10:36pm

Sorry for the long delay. Here are my longer thoughts below.

As it's been agreed in the interim that type-checking performance is only a side benefit and not the main motivation, I've omitted discussion on how we might be able to improve type-checking performance without making a source-breaking change.

Notes on literal initialization

Background

As detailed by John McCall:

The official way to build a literal of a specific type is to write the literal
in an explicitly-typed context, like so:

let x: UInt16 = 7
// or
let y = 7 as UInt16

Nonetheless, programmers often try the following:

UInt16(7)

Unfortunately, this does not attempt to construct the value using the
appropriate literal protocol; it instead performs overload resolution using
the standard rules.... Often this leads to static ambiguities or, worse,
causes the literal to be built using a default type (such as Int); this may
have semantically very different results which are only caught at runtime.

Differences in behavior can be witnessed not only in diagnostics:

let a = 32768 as Int16
// Causes a compile time error:
// integer literal '32768' overflows when stored into 'Int16'

let b = Int16(32768)
// Causes a **runtime** error:
// Not enough bits to represent a signed value

...but also in initialized results:

let c = 3.14159265358979323846 as Float80
// 3.14159265358979323851
let d = Float80(3.14159265358979323846)
// 3.141592653589793116

let e = 8388608.5000000001 as Float
// 8388609
let f = Float(8388608.5000000001)
// 8388608
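
The intermediate-type path behind these differences can be spelled out by hand. The sketch below is only an illustration with made-up constant names: it first puts the value into the default literal type explicitly and then converts, which is roughly what the initializer path is described as doing. The direct coercion 8388608.5000000001 as Float is left out on purpose; the thread reports 8388609 for it, and that figure is not re-verified here.

```swift
// Integer case: once the value is an Int, the Int16 range check happens at runtime.
let wide: Int = 32768
let checked = Int16(exactly: wide)             // nil, because 32768 > Int16.max
let wrapped = Int16(truncatingIfNeeded: wide)  // -32768, the low 16 bits reinterpreted

// Floating-point case: going through Double rounds twice.
// The Double nearest to the literal is exactly 8388608.5, which sits halfway
// between two adjacent Float values, and the tie rounds to the even one.
let viaDouble = Float(8388608.5000000001 as Double)  // 8388608

print(checked as Any, wrapped, viaDouble)
```

Int16(exactly:) is used instead of Int16(_:) so that the sketch reports the out-of-range value without trapping.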

Notes on the proposed solution

The proposed change here is as follows:

[A]ll initializer expressions involving literal types behave like coercion of
[the] literal to specified type if such type conforms to the expected literal
protocol.

...or expressed alternatively:

Given a function call expression of the form A(B) (that is, an expr-call
with a single, unlabeled argument) where B is an expr-literal or
expr-collection, if A has type T.Type for some type T and there is a
declared conformance of T to an appropriate literal protocol for B, then
the expression always resolves as a literal construction of type T (as if
the expression were written B as A) rather than as a general initializer
call.

Such a rule change would bring about two desired results:

A(42) and 42 as A would have identical behavior.
Type-checker logic would be simplified, potentially speeding up type checking
of some complex expressions.

Drawbacks are:

It is a source-breaking change that would have to be limited to Swift 5+.

It is a special-case rule that, as proposed, would cause differences in
behavior between the following expressions:

let x = UInt(42)      // As proposed, coercion
let y = UInt.init(42) // As proposed by Pavel Yaskevich, not a coercion
let z = UInt((42))    // As proposed by John McCall, not a coercion

Generalizing the special-case rule

If it is desired first to coerce a literal to type B and then convert to type
A, it is straightforward to write A(B(42)) or A(42 as B). Therefore, it is
not clearly desirable to preserve subtle differences between A(42) and
A.init(42).

Fortunately, it is not necessary to do so. The special-case rule proposed above
can be generalized, and in the process, another major weakness involving the
inferred type of literals can also be addressed:

Since the addition of heterogeneous comparison and bit shift operators to the
language, there has been a little-known footgun which is encountered in generic
code--and which has actually been encountered within the standard library
itself:

func f() -> Bool {
  return UInt.max == ~0
}
f() // true

func h<T : FixedWidthInteger>(_: T.Type) -> Bool {
  return T.max == ~0
}
h(UInt.self) // false!

Comparison with an integer literal now defaults to heterogeneous comparison
with the default IntegerLiteralType (aka Int). In concrete code, Max Moiseev
has hard-coded workarounds into the standard library, but the same workarounds
cannot be used for generic code.

Why? Because the concrete workarounds in turn rely on a type-checker hack that
prefers concrete operator implementations over generic operator implementations
for performance!

The problem is not merely a theoretical or historical one: the continued
presence of this footgun is holding back implementation of heterogeneous
comparison for floating-point types.
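
The two readings of the generic comparison can be made explicit by giving the literal a type by hand. This is a sketch with hypothetical function names (sameType, mixedType); it does not depend on which type the type checker would pick for a bare literal.

```swift
// Literal typed as T: a homogeneous comparison between two T values.
func sameType<T: FixedWidthInteger>(_: T.Type) -> Bool {
  return T.max == ~(0 as T)
}

// Literal typed as Int: a heterogeneous comparison of T.max against -1.
func mixedType<T: FixedWidthInteger>(_: T.Type) -> Bool {
  return T.max == ~(0 as Int)
}

print(sameType(UInt.self))  // true
print(mixedType(UInt.self)) // false
```

Writing 0 as T is the manual form of the coercion that generic code currently has to spell out to get the first result.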

What do the two seemingly distinct issues have in common? Let's consider the
following four examples:

// Example 1:
UInt(0xffff_ffff_ffff_ffff)
// Users expect `0xffff_ffff_ffff_ffff` to be coerced to type `UInt`.

// Example 2:
UInt.init(0xffff_ffff_ffff_ffff)
// The same expectation is reasonable here.

// Example 3:
extension UInt {
  static func makeValue<T: BinaryInteger>(_ x: T) -> UInt {
    return UInt(x)
  }
}
UInt.makeValue(0xffff_ffff_ffff_ffff)
// The same expectation is reasonable here.

// Example 4:
infix operator <=> : ComparisonPrecedence
extension UInt {
  static func <=> <T: BinaryInteger>(lhs: UInt, rhs: T) -> Int {
    if lhs == rhs { return 0 }
    return lhs < rhs ? -1 : 1
  }
}
UInt.max <=> 0xffff_ffff_ffff_ffff
// The same expectation is reasonable here.

The proposed solution discussed above is a special-case rule that addresses
example 1 only. But a simpler rule would address all of the use cases above:

Any {foo} literal argument in a call to a static or instance method of type
T, where T: ExpressibleByFooLiteral, should be coerced to type T [edit: by default and if possible]
instead of the default FooLiteralType. [*]
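
For comparison, this is roughly what examples 3 and 4 look like when the coercion is written out at the call site by hand, which is the effect the rule above would make implicit. It is a sketch only: UInt64 is substituted for UInt so that the literal fits regardless of platform word size, and makeValue and <=> are the same hypothetical helpers as in the examples.

```swift
infix operator <=> : ComparisonPrecedence

extension UInt64 {
  // Hypothetical helper mirroring example 3.
  static func makeValue<T: BinaryInteger>(_ x: T) -> UInt64 {
    return UInt64(x)
  }

  // Hypothetical three-way comparison mirroring example 4.
  static func <=> <T: BinaryInteger>(lhs: UInt64, rhs: T) -> Int {
    if lhs == rhs { return 0 }
    return lhs < rhs ? -1 : 1
  }
}

// The explicit `as UInt64` keeps the literal from defaulting to Int.
let made = UInt64.makeValue(0xffff_ffff_ffff_ffff as UInt64)
let ordering = UInt64.max <=> (0xffff_ffff_ffff_ffff as UInt64)

print(made == UInt64.max) // true
print(ordering)           // 0
```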

I bring this up because, if we are to implement a source-breaking change to the
Swift programming language, it would be ideal to make a single source-breaking
change that addresses both of these very present, very real pain points. It
would help us to avoid creating a special-case rule and (potentially--I'm
certainly not expert on this) allow us to reap type-checker performance
improvements in a greater proportion of expressions that use literals.

[*] This is 100% sufficient for numeric literals; for string literals, where the
"currency type" is intended to be String, the existing behavior may be
preferable, although that may be debatable. (It may still prove to be the case
that users expect implicit "same-type coercion" behavior.) However, if
desired, a more elaborate proposed solution can allow ExpressibleByFooLiteral
protocols to state a type alias which becomes the default type to which {foo}
literal arguments are coerced:

protocol ExpressibleByIntegerLiteral {
  typealias IntegerLiteralType = Self
  // ...
}

protocol ExpressibleByStringLiteral {
  typealias StringLiteralType = String
  // ...
}

John_McCall · April 1, 2018, 10:48pm

I take it that that's supposed to be a defaulting rule, where literals in call arguments are defaulted to the Self type of the call instead of the global default type for the literal? Is there any sort of restriction about how the corresponding parameter has to be declared, like maybe this only applies if the parameter is of type Self, or of some generic parameter type T?

Anyway, that is a much more aggressive rule, and the repercussions are quite a bit harder to predict in advance than the syntactical initializer rule.

xwu · April 1, 2018, 10:50pm

Yes, sorry, as a defaulting rule, whenever there is more than one overload or when the parameter is of a generic type; I've just edited to clarify.

The repercussions, in practice, should be limited to scenarios in which there's possible unintended behavior today, based on the observations (as detailed above) that we've been trying to do all sorts of creative things to make Swift behave in this way in the first place. You are certainly correct to remark that it's deliberately broader, but the purpose of putting forward this possible solution is that the alternative, limiting each change to one specific kind of expression, feels like we're playing whack-a-mole.

taylorswift · April 1, 2018, 11:25pm

did i just read that UInt(13) and UInt((13)) are going to mean different things¿?

xwu · April 1, 2018, 11:26pm

In John McCall's earlier pitch, yes:

Note that, as specified, it is possible to suppress this typing rule by wrapping the literal in parentheses.

taylorswift · April 1, 2018, 11:28pm

im getting war flashbacks to C preprocessor parentheses

anthonylatsis · April 1, 2018, 11:49pm

Is there a case where we would want to suppress the rule?