Prepitch: Character integer literals

So, even though you cannot add availability to the conformance, you can add the functions that fulfill the conformance. So it would be possible to add the init(unicodeScalarLiteral:) in the proposal, guarded by availability. If it's within that function that the static checks happen, this would still allow users to add the conformance to Int8 themselves while getting the compile-time warning. It's worth an experiment to confirm.

The tightest type that unicodeScalarLiteral can be constrained to is Unicode.Scalar, which says nothing about whether the scalar value (which is 32 bits long) is actually an ASCII scalar. Are you suggesting we add Int8 to the list of allowed Self.UnicodeScalarLiteralType associatedtypes?
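To make the width mismatch concrete, here is a small sketch that uses only existing standard library API (`Unicode.Scalar.value`, `isASCII`, and `Int8(exactly:)`). The two scalars are arbitrary examples chosen for illustration, not anything from the PR.

```swift
// A Unicode.Scalar carries a 32-bit value, so the type alone does not
// tell you whether that value fits in an 8-bit integer.
let a: Unicode.Scalar = "a"
let euro: Unicode.Scalar = "€"

print(a.value, a.isASCII)        // `value` is a UInt32
print(euro.value, euro.isASCII)

// A checked conversion succeeds for the ASCII scalar and
// yields nil for the non-ASCII one.
let fits = Int8(exactly: a.value)
let overflow = Int8(exactly: euro.value)
print(fits as Any, overflow as Any)
```

Any ASCII guarantee therefore has to come from somewhere other than the literal type itself, either from the compiler or from a check in the initializer.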

I haven't looked at this part of the implementation PR, but how was the desired feature of a compile-time error being handled previously?

The question is, if everything except the actual conformance of Int8 to ExpressibleByUnicodeScalarLiteral were implemented, but the conformance itself left to the user (sidestepping the need for availability), would you still get that compile-time feedback? It's worth running a quick experiment to confirm (and if it doesn't work, seeing if the implementation can be phrased slightly differently so it does work).

It would be possible only if Int8 were conformed to _ExpressibleByBuiltinUnicodeScalarLiteral, because this protocol is what constrains Self.UnicodeScalarLiteralType. I can’t do this test because that would involve modifying a builtin compiler protocol. This is the only way that would make it possible to write an initializer for UInt8 and Int8 that takes an Int8 statically-checked input and conforms them to ExpressibleByUnicodeScalarLiteral.

The test I'm describing involves changing the standard library i.e. modify the PR to still add the needed initializers, but just don't add the conformances, and instead try adding them from outside the std lib using that compiler and see if all the expected behaviors still work.

Now, it's possible that approach won't work with _ExpressibleByBuiltinUnicodeScalarLiteral since it's special (you certainly can't write your own implementation of the init because it uses a Builtin.Int32 – but maybe if that's already there you can add your own conformance). It's also more dubious a practice to encourage adding conformance to an underscored protocol. But it's worth an experiment at least.

I am confused. Doesn’t a UnicodeScalarLiteral use double-quotes?

And aren’t we talking about adding an entirely new Expressible protocol that uses single-quotes?

PR updated. Seems to work fine with users opting into ExpressibleByUnicodeScalarLiteral.

/// Conformances for character literals
/// Users will need to opt-in for now.
extension ${Self} :
    _ExpressibleByBuiltinUnicodeScalarLiteral/*,
    ExpressibleByUnicodeScalarLiteral*/ {
  @_transparent
  public init(_builtinUnicodeScalarLiteral value: Builtin.Int32) {
    self = ${Self}(truncatingIfNeeded: UInt32(value))
  }
  @_transparent
  @available(swift 5.1)
  public init(unicodeScalarLiteral value: ${Self}) {
    self = value
  }
}

Great. Does the same hold for the _ExpressibleByBuiltinUnicodeScalarLiteral protocol as well?

I took the middle route and added the conformance to _ExpressibleByBuiltinUnicodeScalarLiteral and didn’t gate the implementation, so users only have to add ExpressibleByUnicodeScalarLiteral; you can’t use one without the other. I experimented with something like the following:

@available(swift 5.1)
extension ${Self} : ExpressibleByUnicodeScalarLiteral {}

But the @available seems to be ignored for extensions.

But that middle route isn't valid unfortunately. We cannot add ungated features post-ABI stability.

It's probably a bug that you can add the availability on an extension but have it ignored. It should either work or fail to compile.

I’m running out of ideas. Getting @available working for extensions seems the most promising option.

It seems to me that the best path forward is to drop the additional conformances to _ExpressibleByBuiltinUnicodeScalarLiteral and have invalid non-ASCII scalars trap at runtime with preconditions rather than at compile time for now. This isn't quite as nice, but any coverage testing at all would pick up any issues in code that went outside the ASCII range. This would allow us to unblock and run the review now with an implementation that could ship in 5.1.
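As a rough illustration of what "trap at runtime with preconditions" could look like, here is a minimal sketch. `ASCIIByte` is a hypothetical wrapper type invented for this example (it is not part of the PR or the standard library); it is used here so the sketch compiles with today's double-quoted literals without touching Int8 itself.

```swift
// Hypothetical wrapper type, used only for illustration.
struct ASCIIByte: ExpressibleByUnicodeScalarLiteral, Equatable {
    var value: UInt8

    // The literal arrives as a full Unicode.Scalar, so the ASCII check
    // has to happen here, at runtime, rather than in the type checker.
    init(unicodeScalarLiteral scalar: Unicode.Scalar) {
        precondition(scalar.isASCII, "literal is outside the ASCII range")
        value = UInt8(truncatingIfNeeded: scalar.value)
    }
}

let x: ASCIIByte = "X"
print(x.value)

// A literal such as "é" would still compile, but would trap the
// first time this line runs:
// let bad: ASCIIByte = "é"
```

The trade-off is the one described above: the failure is deterministic, so any test that executes the line catches it, but nothing is reported at compile time.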

Does this mean all the rationales and stuff in the proposal document relating to the compile time checks should be taken out before review? I think it makes the proposal much less likely to pass, considering how big a selling point the compile time validation was. I also don’t think runtime precondition fails are an acceptable long-term solution, and if this goes in after the ABI freeze, every solution is going to be a long-term solution.

Also, if we can’t add ABI-stable conformances to _ExpressibleByBuiltinUnicodeScalarLiteral, how would we be able to add conformances to ExpressibleByUnicodeScalarLiteral? Or will this remain a “hidden” feature?

I'd suggest they be moved to a "future directions" section.

A compile-time error for an invalid literal is not so significant compared to a runtime failure that I'd imagine it making or breaking a proposal. Yes it is much nicer and desirable whenever practical. But an invalid literal will fail 100% of the time you run it, so any test that exercises that line of code will immediately show the problem.

The short-term solution would be to add everything except the conformance itself, which could be added by the user via e.g. extension Int8: ExpressibleByUnicodeScalarLiteral { } to enable the feature in full.

i’m told static #assert has already been implemented in the compiler as an experimental feature. maybe we could enable its use, just within the standard library. is that a realistic alternative?

in the chance that it becomes possible someday to include this in the standard library, would it have to go through evolution again?

I'm not sure how experimental it is – I wouldn't rely on an experimental feature as a lynchpin of a proposal you want to get into 5.1. Like I say, I don't see having compile-time instead of run-time errors as make or break for whether this proposal is accepted.

When we add availability for conformances that itself will have to go through review. I expect we can tack adding this conformance onto that.

@Ben_Cohen, I must be dim but I’m having trouble mapping what you suggest onto my understanding of how these protocols work. _ExpressibleByBuiltinUnicodeScalarLiteral is a constraint on the associated type UnicodeScalarLiteralType:

public protocol ExpressibleByUnicodeScalarLiteral {
  /// A type that represents a Unicode scalar literal.
  ///
  /// Valid types for `UnicodeScalarLiteralType` are `Unicode.Scalar`,
  /// `Character`, `String`, and `StaticString`.
  associatedtype UnicodeScalarLiteralType : _ExpressibleByBuiltinUnicodeScalarLiteral

  /// Creates an instance initialized to the given value.
  ///
  /// - Parameter value: The value of the new instance.
  init(unicodeScalarLiteral value: UnicodeScalarLiteralType)
}

so it is not possible to use ExpressibleByUnicodeScalarLiteral on a type without a conformance to _ExpressibleByBuiltinUnicodeScalarLiteral, and I know of no other way to initialise an int-like type from a character/string literal. I also can’t see how this affects whether values can be compile-time checked, as it all happens in CSApply.cpp when the destination type looks like an Int.

I’d like to take another tack however. The implementation of these conformances is never used at run time. I don’t know how, but if you look at the assembly language for when a character literal is used, the ASCII value has been transformed all the way to an integer constant in a single instruction, e.g. cmp.

let i: Int8 = 99
if i == 'a' {
}

bin/swiftc -target armv7-apple-ios10.3 -S i.swift -o ~/a.s gives:

	.section	__TEXT,__text,regular,pure_instructions
	.ios_version_min 10, 3
	.syntax unified
	.globl	_main
	.p2align	1
	.code	16
	.thumb_func	_main
_main:
	sub	sp, #8
	movs	r2, #99
	movw	r3, :lower16:(_$s1a1is4Int8Vvp-(LPC0_0+4))
	movt	r3, :upper16:(_$s1a1is4Int8Vvp-(LPC0_0+4))
LPC0_0:
	add	r3, pc
	strb	r2, [r3]
	ldrsb.w	r2, [r3]
	cmp	r2, #97
	str	r0, [sp, #4]
	str	r1, [sp]
	bne	LBB0_2
	b	LBB0_3
LBB0_2:
	b	LBB0_3
LBB0_3:
	movs	r0, #0
	add	sp, #8
	bx	lr

There is no reference or linkage to the witness tables or symbols of the standard library, so this is not an ABI issue at all. Were it me, I’d go to review as originally proposed with both conformances, as the use of these integer conformances and implementations never makes it past the compiler in a way that would affect a program, as far as I can see.

You can use Unicode.Scalar, which does have the needed conformance already. It exposes the numeric value:

extension Int8: ExpressibleByUnicodeScalarLiteral {
  public typealias UnicodeScalarLiteralType = Unicode.Scalar
  
  public init(unicodeScalarLiteral value: Unicode.Scalar) {
    precondition(value.isASCII)
    self.init(truncatingIfNeeded: value.value)
  }
}

let i: Int8 = "X" // i = 88

Like you say, the validation happens in the compiler, so it's possible the precondition is redundant.

Under most circumstances in concrete or specialized code the actual literal processing ought to be optimized away (I would hope!). But for the change to be ABI stable, this would need to be guaranteed under all circumstances. The most strenuous test of this would be to have this non-inlinable function inside one module:

func f<T: ExpressibleByUnicodeScalarLiteral>() -> T {
  return 'X'
}

called from another module, and confirm the conformance is still never referenced.
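For readers who want to try the shape of that test with a released compiler, here is a single-file sketch. It assumes today's double-quoted literal syntax instead of the proposed single quotes, and `makeLiteral` is a hypothetical name. It only shows that one generic body produces values of different conforming types; whether the conformance is referenced in optimized code would still need to be checked by inspecting the generated SIL or assembly across a real module boundary.

```swift
// Generic function whose body contains only a literal. When it is not
// specialized, the literal has to be built through T's conformance.
@inline(never)
func makeLiteral<T: ExpressibleByUnicodeScalarLiteral>() -> T {
    return "X"
}

// The same body yields different concrete types depending on T.
let scalar: Unicode.Scalar = makeLiteral()
let character: Character = makeLiteral()
let string: String = makeLiteral()

print(scalar.value, character, string)
```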

Your suggestion checks out and the ASCII constraint is checked at compile time so it looks like we’re good to go. Amazingly, the compiler still optimises it all down to an integer constant as before. So, the only thing we’re missing at this stage is the conformance which users will have to opt into to enable the integer conversions. Shame, but difficult to avoid I guess.

btw: we’re trying to migrate people off double quotes for Unicode.Scalar and Character literals, and we certainly don’t want them used for the new integer literals, so your example gives an error in the proposed implementation.

(swift) let i: Int8 = "X" // i = 88
<REPL Input>:1:15: error: integers can only be expressed by single quoted character literals
let i: Int8 = "X" // i = 88
              ^

Bug or feature? PR updated.

/// Intended for character literals. Per integer type
/// conformances to ExpressibleByUnicodeScalarLiteral
/// will have to wait until they can be gated to 5.1.
extension FixedWidthInteger {
  @_transparent
  @available(swift 5.1)
  public init(unicodeScalarLiteral value: Unicode.Scalar) {
    self.init(truncatingIfNeeded: value.value)
  }
}

actually, you can call the initializers dynamically at run-time

> let u:Unicode.Scalar = "a"
u: Unicode.Scalar = U'a'
> Unicode.Scalar.init(unicodeScalarLiteral: u)
$R0: Unicode.Scalar = U'a'

I can’t think of any legitimate use cases for this and if we’re being real it should not be possible to call them from any function that is not also a literal initializer. but if we don’t ban this before ABI stability, we won’t be able to guarantee this.
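A short sketch of why a direct call matters: a value passed at runtime never goes through the compiler's literal validation. `truncatedInt8(from:)` is a hypothetical free function that mirrors the body of the proposed FixedWidthInteger initializer; it stands in for calling that initializer directly and is not part of the PR.

```swift
// Hypothetical stand-in for the proposed initializer body.
func truncatedInt8(from scalar: Unicode.Scalar) -> Int8 {
    return Int8(truncatingIfNeeded: scalar.value)
}

let ascii: Unicode.Scalar = "a"
let nonASCII = Unicode.Scalar(UInt8(0xE9))   // U+00E9, built at runtime

// Literal initializers are ordinary public initializers, so they can
// be called with any value, not just a literal.
let direct = Unicode.Scalar(unicodeScalarLiteral: ascii)

print(direct, truncatedInt8(from: ascii))

// No compile-time check applies to a runtime value, so the non-ASCII
// scalar is silently truncated to a negative Int8.
print(truncatedInt8(from: nonASCII))
```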
