Literal initialization via coercion

Thanks for bringing this up! This is one of those parts of Swift that I've wanted to see cleaned up for a while.

It may be a good idea to read up on John McCall's thread from a couple years ago where he brought up the same issue, just to see if there's any valuable discussion about design/implementation. It's a shame this didn't get fixed back then.

You mention the case of UInt64(0xffff_ffff_ffff_ffff) becoming valid, as it should. But I would also mention that this turns some expressions that are currently only runtime errors into compiler errors (as they also should be, if the conversion is invalid). For example, "ab" as Character is currently a compile-time error but Character("ab") compiles and traps at runtime. Having the latter caught at compile time falls nicely out of this change.

Thanks for bringing up John's thread, @allevato! I wasn't aware that one existed, but I think I could borrow some stuff from it when this gets into proposal form, and I'll mention added safety from this change.

I'm in favor of this purely from a language standpoint. The fact that it has performance benefits is a bonus.

Tony's example does show that this may break existing source more than we thought, and so it might make sense to limit this change to Swift 5 mode. It would also encourage people to switch to Swift 5. :-)

I am definitely still in favor of making construction syntax with an unlabelled literal argument just directly construct the literal in the constructed type if possible, as a general language rule.

+1

-Chris

Sorry for the long delay. Here are my longer thoughts below.

As it's been agreed in the interim that type-checking performance is only a side benefit and not the main motivation, I've omitted discussion on how we might be able to improve type-checking performance without making a source-breaking change.


Notes on literal initialization

Background

As detailed by John McCall:

The official way to build a literal of a specific type is to write the literal
in an explicitly-typed context, like so:

let x: UInt16 = 7
// or
let y = 7 as UInt16

Nonetheless, programmers often try the following:

UInt16(7)

Unfortunately, this does not attempt to construct the value using the
appropriate literal protocol; it instead performs overload resolution using
the standard rules.... Often this leads to static ambiguities or, worse,
causes the literal to be built using a default type (such as Int); this may
have semantically very different results which are only caught at runtime.

Differences in behavior can be witnessed not only in diagnostics:

let a = 32768 as Int16
// Causes a compile time error:
// integer literal '32768' overflows when stored into 'Int16'

let b = Int16(32768)
// Causes a **runtime** error:
// Not enough bits to represent a signed value

...but also in initialized results:

let c = 3.14159265358979323846 as Float80
// 3.14159265358979323851
let d = Float80(3.14159265358979323846)
// 3.141592653589793116

let e = 8388608.5000000001 as Float
// 8388609
let f = Float(8388608.5000000001)
// 8388608
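
The gap is easier to see when the intermediate default type is spelled out. The sketch below is not what the compiler literally does; it assumes the literal is first built in a default literal type (Int or Double) and only then converted, which is the path described in the quote above. The `as Int` and `as Double` coercions are added purely for illustration.

```swift
// Step 1: the literal is built in the default type. Step 2: it is converted.
let viaInt = 32768 as Int

// The failable initializer reports the overflow instead of trapping.
let narrowed = Int16(exactly: viaInt)
print(narrowed == nil) // true: 32768 does not fit in Int16

// Double cannot hold the trailing ...0000000001, so it is lost before Float sees it.
let viaDouble = 8388608.5000000001 as Double
print(viaDouble == 8388608.5) // true

// 8388608.5 sits exactly halfway between two Floats and rounds to the even one.
let rounded = Float(viaDouble)
print(rounded == 8388608) // true
```

Once the literal has passed through the wider default type, the information needed to diagnose or round it correctly for the destination type is gone.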

Notes on the proposed solution

The proposed change here is as follows:

[A]ll initializer expressions involving literal types behave like coercion of
[the] literal to specified type if such type conforms to the expected literal
protocol.

...or expressed alternatively:

Given a function call expression of the form A(B) (that is, an expr-call
with a single, unlabeled argument) where B is an expr-literal or
expr-collection, if A has type T.Type for some type T and there is a
declared conformance of T to an appropriate literal protocol for B, then
the expression always resolves as a literal construction of type T (as if
the expression were written B as A) rather than as a general initializer
call.

Such a rule change would bring about two desired results:

  1. A(42) and 42 as A would have identical behavior.
  2. Type-checker logic would be simplified, potentially speeding up type checking
    of some complex expressions.

Drawbacks are:

  1. It is a source-breaking change that would have to be limited to Swift 5+.

  2. It is a special-case rule that, as proposed, would cause differences in
    behavior between the following expressions:

    let x = UInt(42)      // As proposed, coercion
    let y = UInt.init(42) // As proposed by Pavel Yaskevich, not a coercion
    let z = UInt((42))    // As proposed by John McCall, not a coercion
    

Generalizing the special-case rule

If it is desired first to coerce a literal to type B and then convert to type
A, it is straightforward to write A(B(42)) or A(42 as B). Therefore, it is
not clearly desirable to preserve subtle differences between A(42) and
A.init(42).

Fortunately, it is not necessary to do so. The special-case rule proposed above
can be generalized, and in the process, another major weakness involving the
inferred type of literals can also be addressed:

Since the addition of heterogeneous comparison and bit shift operators to the
language, there has been a little-known footgun which is encountered in generic
code--and which has actually been encountered within the standard library
itself:

func f() -> Bool {
  return UInt.max == ~0
}
f() // true

func h<T : FixedWidthInteger>(_: T.Type) -> Bool {
  return T.max == ~0
}
h(UInt.self) // false!

Comparison with an integer literal now defaults to heterogeneous comparison
with the default IntegerLiteralType (aka Int). In concrete code, Max Moiseev
has hard-coded workarounds into the standard library, but the same workarounds
cannot be used for generic code.

Why? Because the concrete workarounds in turn rely on a type-checker hack that
prefers concrete operator implementations over generic operator implementations
for performance!

The problem is not merely a theoretical or historical one: the continued
presence of this footgun is holding back implementation of heterogeneous
comparison for floating-point types.
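
For what it's worth, generic code can sidestep the footgun today by spelling out the coercion, at the cost of having to know the problem exists in the first place. A minimal sketch; `allBitsSet` is a hypothetical name, not anything from the standard library:

```swift
// Hypothetical helper: the same check as h above, with the literal pinned to T.
func allBitsSet<T: FixedWidthInteger>(_: T.Type) -> Bool {
  // `~0 as T` makes both operands of == have type T.
  return T.max == (~0 as T)
}

print(allBitsSet(UInt.self)) // true
print(allBitsSet(Int.self))  // false: ~0 is -1 for signed types
```

This is a call-site fix rather than a language fix: every author of generic integer code has to remember to write it.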

What do the two seemingly distinct issues have in common? Let's consider the
following four examples:

// Example 1:
UInt(0xffff_ffff_ffff_ffff)
// Users expect `0xffff_ffff_ffff_ffff` to be coerced to type `UInt`.

// Example 2:
UInt.init(0xffff_ffff_ffff_ffff)
// The same expectation is reasonable here.

// Example 3:
extension UInt {
  static func makeValue<T: BinaryInteger>(_ x: T) -> UInt {
    return UInt(x)
  }
}
UInt.makeValue(0xffff_ffff_ffff_ffff)
// The same expectation is reasonable here.

// Example 4:
infix operator <=> : ComparisonPrecedence
extension UInt {
  static func <=> <T: BinaryInteger>(lhs: UInt, rhs: T) -> Int {
    if lhs == rhs { return 0 }
    return lhs < rhs ? -1 : 1
  }
}
UInt.max <=> 0xffff_ffff_ffff_ffff
// The same expectation is reasonable here.

The proposed solution discussed above is a special-case rule that addresses
example 1 only. But a simpler rule would address all of the use cases above:

Any {foo} literal argument in a call to a static or instance method of type
T, where T: ExpressibleByFooLiteral, should be coerced to type T [edit: by default and if possible]
instead of the default FooLiteralType.
[*]
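
In the meantime, example 3 only behaves as users expect when the coercion is written out at the call site. A minimal sketch based on that example; UInt64 is substituted for UInt purely so that the literal also fits on 32-bit platforms, which is an assumption of this sketch and not part of the original example:

```swift
// Assumption: UInt64 stands in for UInt so the sketch is platform-independent.
extension UInt64 {
  // Same shape as example 3: the generic parameter gives the literal no type hint.
  static func makeValue<T: BinaryInteger>(_ x: T) -> UInt64 {
    return UInt64(x)
  }
}

// The explicit coercion makes T == UInt64, so the literal no longer has to fit in Int.
let big = UInt64.makeValue(0xffff_ffff_ffff_ffff as UInt64)
print(big == UInt64.max) // true
```

Under the generalized rule, the `as UInt64` would become the default and would no longer need to be written.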

I bring this up because, if we are to implement a source-breaking change to the
Swift programming language, it would be ideal to make a single source-breaking
change that addresses both of these very present, very real pain points. It
would help us to avoid creating a special-case rule and (potentially--I'm
certainly no expert on this) allow us to reap type-checker performance
improvements in a greater proportion of expressions that use literals.


[*] This is 100% sufficient for numeric literals; for string literals, where the
"currency type" is intended to be String, the existing behavior may be
preferable, although that may be debatable. (It may still prove to be the case
that users expect implicit "same-type coercion" behavior.) However, if desired,
a more elaborate proposed solution can allow ExpressibleByFooLiteral protocols
to state a type alias which becomes the default type to which {foo} literal
arguments are coerced:

protocol ExpressibleByIntegerLiteral {
  typealias IntegerLiteralType = Self
  // ...
}

protocol ExpressibleByStringLiteral {
  typealias StringLiteralType = String
  // ...
}

I take it that that's supposed to be a defaulting rule, where literals in call arguments are defaulted to the Self type of the call instead of the global default type for the literal? Is there any sort of restriction about how the corresponding parameter has to be declared, like maybe this only applies if the parameter is of type Self, or of some generic parameter type T?

Anyway, that is a much more aggressive rule, and the repercussions are quite a bit harder to predict in advance than the syntactical initializer rule.

Yes, sorry, as a defaulting rule, whenever there is more than one overload or when the parameter is of a generic type; I've just edited to clarify.

The repercussions, in practice, should be limited to scenarios in which there's possible unintended behavior today, based on the observations (as detailed above) that we've been trying to do all sorts of creative things to make Swift behave in this way in the first place. You are certainly correct to remark that it's deliberately broader, but the purpose of putting forward this possible solution is that the alternative, limiting each change to one specific kind of expression, feels like we're playing whack-a-mole.

did i just read that UInt(13) and UInt((13)) are going to mean different things¿?

In John McCall's earlier pitch, yes:

Note that, as specified, it is possible to suppress this typing rule by wrapping the literal in parentheses.

im getting war flashbacks to C preprocessor parentheses

Is there a case where we would want to suppress the rule?

I see a problem here: sometimes the “wanted” behavior isn’t actually wanted at all. like:

extension Int8 
{
    func asOffset<I>(from base:I) -> I where I:BinaryInteger 
    {
        return I(self) + base
    }
}

let field:Int8 = 4
print(field.asOffset(from: 0x400000))

The 0x400000 is probably a hardcoded pointer and the intent here was really

field.asOffset(from: 0x400000 as Int)

but with Xiaodi’s proposal it becomes an error.

Yes, it's certainly possible for such a scenario to arise, but I think in real-world usage this particular example is unlikely to be such a case: the user is likely to actually use the result for something other than printing, in which scenario the type is no longer determined by literal type inference rules.

extension Int8 
{
    func asOffset<I>(from base:I) -> UnsafeRawPointer? where I:BinaryInteger 
    {
        return UnsafeRawPointer(bitPattern: UInt(base)).flatMap{ $0 + Int(self) }
    }
}

let field:Int8 = 4
let pointer:UnsafeRawPointer? = field.asOffset(from: 0x400000)

Yes, again, possible. But isn't this API best declared with a parameter of type UInt? (The desired default inferred type is, in any case, UInt and not today's Int, meaning today's Swift doesn't behave ideally in the general case of hardcoded memory addresses either.) Is this real-world code? What's the use case for supporting differently typed memory addresses?

no but i thought the point was to brainstorm possible counterexamples so problems get caught before they affect real-world code

Yes, I think this is the behavior people expect from the syntax Int(5), and we've seen enough confusion with the current semantics to warrant a change.

Doug

Slightly OT:
Isn't anyone else a little bit unsatisfied that even with the proposed changes Swift can't deal with literals of really big numbers?
Thanks to operator overloading, you can easily create your own numeric types, but there is a fundamental restriction on their initializer.
I know that this isn't an issue for serious applications, but wouldn't it be nice if pupils could just query the Swift REPL for sqrt(10000000000000000000000000000000) without having to think about the limitations of Int64?

Wasn’t some work on something like this done by @xwu and others with the DoubleWidth types? Or are you saying that we should automatically create those types to fit the size of the literal?

We can move this discussion to another thread though, since it’s not directly related.