Improved compile-time validation for types expressible by integer literals

tanner0101 · June 13, 2019, 5:09pm

Swift currently supports integer literal validation for common programming errors like over- and underflow:

// error: integer literal '600' overflows when stored into 'UInt8'
let a: UInt8 = 600
//             ^

It would be useful if custom types that conform to a protocol like ExpressibleByIntegerLiteral could take advantage of similar compile-time validation. Here are a few example use cases where this functionality could be helpful.

Custom Integer Types

User-defined integer types like UInt24 could benefit from compile-time validation. Currently, custom integers must use runtime checks to verify the literal integer values are within range: https://github.com/apple/swift-nio/blob/master/Sources/NIO/IntegerTypes.swift#L70

Furthermore, user-defined integer types that over- or underflow Int / UInt could take advantage of this protocol to store big numbers.
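To make the current situation concrete, here is a minimal sketch of a custom integer type whose literal range can only be checked at runtime. The `UInt24` below is my own illustration, not the swift-nio implementation linked above, and the `storage` property and `max` constant are assumed names.

```swift
// Hypothetical 24-bit unsigned integer; names are illustrative only.
struct UInt24: ExpressibleByIntegerLiteral, Equatable {
    // Largest value representable in 24 bits.
    static let max: UInt32 = (1 << 24) - 1

    // Backing storage; only the low 24 bits are ever used.
    private(set) var storage: UInt32

    // Failable initializer for values that are not known until runtime.
    init?(exactly value: UInt32) {
        guard value <= UInt24.max else { return nil }
        self.storage = value
    }

    // The literal is first converted to UInt32, so the compiler's existing
    // overflow check only covers the UInt32 range. Values above 2^24 - 1
    // that still fit in UInt32 are caught here, at runtime.
    init(integerLiteral value: UInt32) {
        precondition(value <= UInt24.max, "integer literal overflows UInt24")
        self.storage = value
    }
}

let ok: UInt24 = 16_777_215      // largest valid value, accepted
// let bad: UInt24 = 16_777_216  // compiles today, but traps at runtime
```

The commented-out line is the case this pitch is about: the mistake is visible in the source, yet it is only reported once the program runs.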

Type-safe time / date APIs

Types like a theoretical MinuteInHour that have a clear range (in this case, 0..<60) could take advantage of compile-time validation for concise, expressive APIs. Take the following example (ignoring all of the numerous time edge-cases, please haha).

func schedule(at hour: HourInDay, _ minute: MinuteInHour) { ... }

schedule(at: 12, 43)
schedule(at: 17, 32)
// error: integer literal '61' is not valid for 'MinuteInHour'
schedule(at: 5, 61)
//              ^

Advanced cases

While a simple valid range description for ExpressibleByIntegerLiteral could solve most of the use cases, I think there are some advanced use cases that are more interesting. For example, OddNumber / EvenNumber or even FibonacciNumber.

Logic beyond simply checking min/max would be required for these types, but could be very interesting:

let a: OddNumber = 1
// error: integer literal '2' is not valid for 'OddNumber'
let b: OddNumber = 2
let c: OddNumber = 3

let a: FibonacciNumber = 1
let b: FibonacciNumber = 2
let c: FibonacciNumber = 3
// error: integer literal '4' is not valid for 'FibonacciNumber'
let d: FibonacciNumber = 4
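For reference, the kind of logic these types would need to run against a literal can be sketched as ordinary functions. This is only an illustration under the assumption that the literal has already been converted to `Int`. The names `isOdd` and `isFibonacci` are hypothetical, and nothing here is evaluated at compile time today.

```swift
// Hypothetical predicates a compile-time check would have to evaluate.
func isOdd(_ n: Int) -> Bool {
    // The remainder is -1 for negative odd numbers, so compare against 0.
    return n % 2 != 0
}

// Walks the Fibonacci sequence until it reaches or passes n.
func isFibonacci(_ n: Int) -> Bool {
    guard n >= 0 else { return false }
    var (a, b) = (0, 1)
    while a < n {
        // Stop before the sequence overflows Int; at that point b is the
        // only remaining candidate that can still equal n.
        let (next, overflow) = a.addingReportingOverflow(b)
        if overflow { return b == n }
        (a, b) = (b, next)
    }
    return a == n
}
```

Neither check can be expressed as a simple min/max range, which is why these cases need more than a range annotation.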

Rough Sketch

Here's a very rough sketch assuming a new protocol, ExpressibleByRawIntegerLiteral. I'm not really sure how Swift stores an integer literal that has yet to be converted, but I imagine it could be done as something like a collection of digits alongside some metadata, like isNegative.

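public struct RawIntegerLiteral {
    public enum Digit: Int {
        // zero is needed too, e.g. for the literal '10'
        case zero = 0, one, two, three, four, five, six, seven, eight, nine
    }
    // true if the integer-literal has a `-` at the beginning
    public var isNegative: Bool
    // decimal digits, most significant first
    public var digits: [Digit]
}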

Assuming we have RawIntegerLiteral, here's what ExpressibleByRawIntegerLiteral for a type like MinuteInHour could look like:

public struct MinuteInHour {
    let number: Int
    
    // not type-safe, does runtime checks
    public init(_ number: Int) {
        assert(number >= 0, "Minute cannot precede 0")
        assert(number < 60, "Minute cannot exceed 60")
        self.number = number
    }
}

extension MinuteInHour: ExpressibleByRawIntegerLiteral {
    // type safe, does compile-time checks
    public init(rawIntegerLiteral value: RawIntegerLiteral) throws {
        guard !value.isNegative else {
            throw "minute cannot be negative"
        }

        switch value.digits.count {
        case 1:
            // the raw integer only has one digit in the ones' place
            self.number = value.digits[0].rawValue
        case 2:
            // the raw integer has two digits, check the tens' place
            // to ensure it is less than six, this will verify 0..<60 range
            guard value.digits[0].rawValue < 6 else {
                throw "minute cannot exceed 60"
            }
            // add the tens' and ones' places to get the actual number
            self.number = (value.digits[0].rawValue * 10) + value.digits[1].rawValue
        default:
            // there is a number in the hundreds' place, definitely too big
            throw "minute cannot exceed 60"
        }
    }
}

// for the sake of concision 
extension String: Error { }

Usage would look something like:


let a: MinuteInHour = 0
let b: MinuteInHour = 23
let c: MinuteInHour = 61 // error: minute cannot exceed 60
let d: MinuteInHour = -5 // error: minute cannot be negative
let e: MinuteInHour = 123 // error: minute cannot exceed 60
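Since neither RawIntegerLiteral nor ExpressibleByRawIntegerLiteral exists, the behavior can only be simulated at runtime for now. Below is a self-contained sketch that does this by parsing the literal's source text. The `init?(source:)` parser, the `LiteralError` type, and the `check` helper are all assumptions I added for illustration. The digit check is also simplified to fold the digits into an `Int` rather than switching on the digit count.

```swift
// Hypothetical stand-in for the literal representation sketched above.
// Nothing here is provided by the compiler or the standard library.
struct RawIntegerLiteral {
    enum Digit: Int {
        case zero = 0, one, two, three, four, five, six, seven, eight, nine
    }

    var isNegative: Bool
    var digits: [Digit]   // most significant digit first

    // Parses literal text such as "-5" or "61"; returns nil for anything else.
    init?(source: String) {
        let negative = source.hasPrefix("-")
        let body = negative ? source.dropFirst() : Substring(source)
        var parsed: [Digit] = []
        for character in body {
            guard character.isASCII,
                  let value = character.wholeNumberValue,
                  let digit = Digit(rawValue: value) else { return nil }
            parsed.append(digit)
        }
        guard !parsed.isEmpty else { return nil }
        isNegative = negative
        digits = parsed
    }
}

struct LiteralError: Error {
    let message: String
}

struct MinuteInHour {
    let number: Int

    init(rawIntegerLiteral value: RawIntegerLiteral) throws {
        guard !value.isNegative else {
            throw LiteralError(message: "minute cannot be negative")
        }
        // Three or more digits are rejected outright (this includes
        // literals written with leading zeros, such as 007).
        guard value.digits.count <= 2 else {
            throw LiteralError(message: "minute cannot exceed 60")
        }
        // Fold the decimal digits into an Int, most significant first.
        let number = value.digits.reduce(0) { $0 * 10 + $1.rawValue }
        guard number < 60 else {
            throw LiteralError(message: "minute cannot exceed 60")
        }
        self.number = number
    }
}

// Simulates the diagnostic the compiler would emit for `let m: MinuteInHour = <text>`.
func check(_ text: String) -> String {
    guard let literal = RawIntegerLiteral(source: text) else {
        return "error: not an integer literal"
    }
    do {
        let minute = try MinuteInHour(rawIntegerLiteral: literal)
        return "ok: \(minute.number)"
    } catch let error as LiteralError {
        return "error: \(error.message)"
    } catch {
        return "error: \(error)"
    }
}
```

Calling `check("23")` accepts the value, while `check("61")`, `check("-5")`, and `check("123")` produce the same messages as the usage comments above.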

I'm really interested to know if this is something that has been considered before or whether the necessary machinery exists in the compiler to make it work.

Looking forward to seeing your thoughts, thanks!

Jon_Shier · June 13, 2019, 5:14pm

Yes, some kind of support for constrained numeric types would be huge. However, whether we do it by enhancing the literal protocols, the numeric protocols, or by compile time evaluation, I don't know.

xwu · June 13, 2019, 5:40pm

tanner0101:

public struct MinuteInHour {
    let number: Int
    
    // not type-safe, does runtime checks
    public init(_ number: Int) {
        assert(number >= 0, "Minute cannot precede 0")
        assert(number < 60, "Minute cannot exceed 60")
        self.number = number
    }
}

Better validation at compile time would benefit many types, not just those expressible by integer literals. The groundwork is being laid for this with compile-time constant expressions.

Once built out, one imagines that simply adding @compilerEvaluable to the initializer would get you the diagnostics you want. I don't see a role for additional protocols.

tanner0101 · June 13, 2019, 8:17pm

Thanks for sharing! That does seem like it would cover most of what I mentioned here. There is still one bit that @compilerEvaluable alone wouldn't cover, though:

For literals that over- or underflow Int / UInt (the big-number case from my first post), there would need to be an intermediate representation of an integer literal that you can use before Swift converts it to an actual integer type.

Perhaps we could achieve this by creating a new conformer to _ExpressibleByBuiltinIntegerLiteral like RawIntegerLiteral that offers you access to the raw data parsed by the compiler.

With both of these combined, we could have something like:

struct Int128: ExpressibleByIntegerLiteral {
    ...

    @compilerEvaluable
    init(integerLiteral value: RawIntegerLiteral) {
        print(value.digits) // [Digit]
        print(value.isNegative) // Bool
        assert(!value.digits.isEmpty, "foo") // assertion happens at compile time
    }
}

Maybe RawIntegerLiteral is worth breaking out as a separate pitch with a note that compile-time evaluation will make it even more powerful.

taylorswift · June 13, 2019, 8:23pm

worth reading this post over: SE-0243: Codepoint and Character Literals - #167 by taylorswift (ignore the as coercion part lol)

xwu · June 13, 2019, 8:40pm

How to rework integer and float literals to support arbitrary precision is tracked as SR-920. Many people have discussed this over the years. The problem is actually much more acute for float literals, as the status quo is completely unsuitable for decimal floating-point types.

Chris_Lattner3 · June 15, 2019, 4:27am

FWIW, I agree with Xiaodi here: the constexpr evaluation logic should be able to prove that various call sites result in a (dynamic) assert that always fails, and thus hoist that dynamic failure to a static compile-time failure.

This will be a huge feature (when anyone gets around to implementing it :-), that allows detection of a wide range of different sorts of things. I expect it to dovetail well with the eDSL discussion, with the recent "better string literal interpolation discussion" (for os_log etc), and catch lots of other bugs in general.

I'd love to see this get baked out over time.

-Chris

John_McCall · June 15, 2019, 5:07am

You can see a description here, but long story short, it's an array of words (unmanaged, presumed constant, of minimum length required to express the signed value, sign-extended in the most significant word, little-endian across words) plus a bit-width (that minimum length, not rounded up to a multiple of the word size) and an isNegative flag. This makes it very efficient to do a checked conversion to a fixed-width binary integer type, which is of course the primary supported operation. Doing a checked conversion to a fixed-range integer type can then be done by first converting to an enclosing fixed-width binary integer type and then doing a range check.
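As a rough illustration of that two-step conversion, here is a small model of the representation. It is a sketch under stated assumptions, not the compiler's actual type. `LiteralWords`, its `init(_:)` from `Int`, `converted(to:)`, and `minuteInHour(from:)` are all hypothetical names, and the conversion only handles targets no wider than one machine word.

```swift
// Hypothetical model of the representation described above.
struct LiteralWords {
    var words: [UInt]     // little-endian, two's complement, sign-extended
    var bitWidth: Int     // minimum bits needed for the signed value
    var isNegative: Bool

    // Builds the model from an Int, standing in for the compiler's parser.
    init(_ value: Int) {
        words = [UInt(bitPattern: value)]
        isNegative = value < 0
        // Drop redundant leading sign bits, then keep one sign bit.
        let redundant = value < 0
            ? (~value).leadingZeroBitCount
            : value.leadingZeroBitCount
        bitWidth = Int.bitWidth - redundant + 1
    }

    // Checked conversion to a fixed-width binary integer type.
    func converted<T: FixedWidthInteger>(to type: T.Type) -> T? {
        precondition(T.bitWidth <= UInt.bitWidth,
                     "this sketch only handles single-word targets")
        // Unsigned targets do not need room for the sign bit.
        let needed = T.isSigned ? bitWidth : bitWidth - 1
        guard T.isSigned || !isNegative, needed <= T.bitWidth else {
            return nil
        }
        // The value is known to fit, so truncating the low word is exact.
        return T(truncatingIfNeeded: words[0])
    }
}

// Fixed-range check: convert to an enclosing fixed-width type first,
// then compare against the range.
func minuteInHour(from literal: LiteralWords) -> Int? {
    guard let value = literal.converted(to: UInt8.self), value < 60 else {
        return nil
    }
    return Int(value)
}
```

The width comparison is the cheap part: deciding whether a literal fits a fixed-width type takes one integer comparison, and only the range check afterwards looks at the value itself.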

All of this is pretty well-abstracted already in the compiler; we just don't have (1) a library type that exposes it or (2) a user-visible protocol for working with it. The path of least resistance for exposing it to users is probably just to add a library type that wraps it, then make that conform to _ExpressibleByBuiltinIntegerLiteral, so that you can use that as the builtin literal type for your own ExpressibleByIntegerLiteral types.

Whether that'd be sufficient in practice to allow normal expression folding to provide your compile-time validation, I don't know.

We don't have a floating-point literal equivalent, in part because you can't just eliminate source-code base differences with FP literals. I'm not sure what the right FP-literal equivalent would be, since we'd want to preserve decimal-base literal values with perfect precision while also understanding that converting to a binary FP type would still be the dominant use case.