Edge case enum

xwu · November 1, 2023, 6:12pm

My read of the guarantees (I could be wrong) is that this is explicitly what Unicode states will never happen.

According to the Unicode policy for Normalization Forms, applicable to Unicode 4.1 and all later versions, the results of normalizing a string on one version will always be the same as normalizing it on any other version, as long as the string contains only assigned characters according to both versions.

(Source: UAX #15)

The code that would stop compiling after the update would be an enum you created whose cases have string raw values that are different representations of the same just-added emoji. And it can't be just any new emoji sequence: it would have to be one with new emoji codepoints, and those new codepoints would have to compose with each other or have a "strongly discourage[d]" new composition with an existing codepoint.
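To make "different representations" concrete, here is a minimal sketch that uses an accented Latin letter as a stand-in for an emoji (the constant names are just for illustration). Swift's `String` equality is based on canonical equivalence, so two spellings built from different Unicode scalars compare equal, which is exactly the kind of collision two raw values would need to have.

```swift
// "é" written two ways: one precomposed scalar vs. "e" followed by a combining accent.
let precomposed = "\u{E9}"    // LATIN SMALL LETTER E WITH ACUTE
let decomposed = "e\u{301}"   // "e" + COMBINING ACUTE ACCENT

// String equality uses canonical equivalence, so these compare equal...
print(precomposed == decomposed)          // true
// ...even though they are made of different Unicode scalars.
print(precomposed.unicodeScalars.count)   // 1
print(decomposed.unicodeScalars.count)    // 2
```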

Which, I mean, you can contrive to do in theory (although, now that I reflect, I'm not actually aware of any emoji that have decomposed and composed codepoint counterparts introduced in the same Unicode version, so I'm not sure you could contrive this in practice). But (a) it's nigh impossible to do by accident; and (b) as it is, the behavior of an enum with non-unique raw values is un(der?)specified, so having it stop compiling after you update your compiler would arguably be an improvement?

xwu · November 1, 2023, 6:14pm

As I recall, this is already tracked as a bug.

tera · November 1, 2023, 6:31pm

~~However, I do not see EQ on rawValues being called when comparing enumeration constants in another example with a custom ExpressibleByXXXLiteral enumeration:~~

Edit: I've been corrected below.

Quite elaborate setup (any way to simplify it?)

```swift
struct C: ExpressibleByIntegerLiteral, RawRepresentable {
    typealias RawValue = Int
    var rawValue: RawValue
    init(rawValue value: RawValue) {
        rawValue = 0 // let's ignore value
    }
    init(integerLiteral value: RawValue) {
        rawValue = 0 // let's ignore value
    }
    init() {
        rawValue = 0
    }
    static func == (lhs: Self, rhs: Self) -> Bool {
        print("EQ \(lhs) and \(rhs)")
        return true
    }
    var hashValue: Int { 0 }
}

enum E: C {
    init?(rawValue: C) {
        fatalError("unused")
    }
    var rawValue: C {
        switch self {
        case .x: return C(rawValue: 1)
        case .y: return C(rawValue: 2)
        }
    }

    case x = 1, y = 2
}

var x = E.x
var y = E.y
print(x.rawValue == y.rawValue) // true, EQ called
print(x == y) // false, EQ not called
```

jrose · November 1, 2023, 6:49pm

C has to be Equatable, which yours isn’t. (And C doesn’t need to be RawRepresentable, which yours is.)

tera · November 1, 2023, 6:59pm

You are absolutely right!

Corrected code

```swift
struct C: ExpressibleByIntegerLiteral, Equatable {
    var value: Int
    init(integerLiteral value: Int) {
        self.value = value
    }
    init(value: Int) {
        self.value = value
    }
    static func == (lhs: Self, rhs: Self) -> Bool {
        print("EQ \(lhs) and \(rhs)")
        return true
    }
    func hash(into hasher: inout Hasher) {
        hasher.combine(0)
    }
}

enum E: C {
    case x = 1, y = 2
}

var x = E.x
var y = E.y
print(x.rawValue == y.rawValue) // EQ C(value: 1) and C(value: 2), true
print(x) // x
print(y) // y
print(x == y) // EQ C(value: 1) and C(value: 2), true !!!
```

Indeed, comparing enumeration constants compares their rawValues instead of their discriminators.
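A sketch that shows the difference without relying on print side effects. `AlwaysEqual`, `Pair`, `rawValueEquals` and `sameCase` are hypothetical names. `rawValueEquals` mirrors the shape of the standard library's generic `==` for `RawRepresentable` types whose `RawValue` is `Equatable`; my assumption (not verified against the compiler) is that this overload is what `x == y` resolves to above. A `switch` on enum case patterns, by contrast, matches on the case itself and never calls the raw type's `==`.

```swift
// Hypothetical raw type whose == claims every pair of values is equal.
struct AlwaysEqual: ExpressibleByIntegerLiteral, Equatable {
    var value: Int
    init(integerLiteral value: Int) { self.value = value }
    static func == (lhs: AlwaysEqual, rhs: AlwaysEqual) -> Bool { true }
}

enum Pair: AlwaysEqual {
    case x = 1, y = 2
}

// Same shape as the stdlib's generic == for RawRepresentable: compares rawValues only.
func rawValueEquals<T: RawRepresentable>(_ lhs: T, _ rhs: T) -> Bool where T.RawValue: Equatable {
    lhs.rawValue == rhs.rawValue
}

// Compares discriminators only: enum case patterns match by case, not via ==.
func sameCase(_ lhs: Pair, _ rhs: Pair) -> Bool {
    switch (lhs, rhs) {
    case (.x, .x), (.y, .y): return true
    default: return false
    }
}

print(rawValueEquals(Pair.x, Pair.y)) // true
print(sameCase(Pair.x, Pair.y))       // false
```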

tera · November 1, 2023, 7:13pm

Do you mean this one or another?

sveinhal · November 2, 2023, 9:31am

I expected, and tested for, this exact case yesterday, but found that the compiler did give an error: "Raw value for enum case is not unique". So I shrugged and decided not to share that finding here.

However, I now see that what I actually wrote was:

```swift
enum Number: Double {
    case a = 0
    case b = -0
}
```

Maybe not surprising, but I was surprised that -0 and -0.0 are parsed differently when the type isn't inferred to be Int. It seems -0 is first parsed as an Int and then converted to a Double?

tera · November 2, 2023, 10:50am

Yep, this is a known feature: -0 is treated as integer zero and then converted to double zero, whilst -0.0 is treated as negative double zero.

```swift
print(Double(-0).sign)  // plus
print((-0.0).sign)      // minus
```
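The same distinction, written with explicit type annotations as in the enum above, plus a check that the two zeros are equal under `==` while having different bit patterns. This is a small sketch of my own; the constant names are only for illustration.

```swift
// An integer literal has no negative zero, so -0 is just 0 before it becomes a Double.
let fromIntegerLiteral: Double = -0
// A floating-point literal keeps its sign, giving IEEE 754 negative zero.
let fromFloatLiteral: Double = -0.0

print(fromIntegerLiteral.sign)                 // plus
print(fromFloatLiteral.sign)                   // minus
// IEEE 754 treats +0 and -0 as equal, but their bit patterns differ.
print(fromIntegerLiteral == fromFloatLiteral)  // true
print(fromIntegerLiteral.bitPattern == fromFloatLiteral.bitPattern) // false
```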

Other known issues are that '+0', '+0.0', '-(0)' and '-(0.0)' are not literal expressions, and that '0.0 == 0.0' and '0.0 < 0.0' are ambiguous.

tera · November 2, 2023, 11:52am

Another class of oddball enums is related to floating-point overflow:

```swift
enum E: Float {
    case x = 0xCp127 // Warning: '0xCp127' overflows to inf during conversion to 'Float'
    case y = 0xCp128 // Warning: '0xCp128' overflows to inf during conversion to 'Float'
}
print(E.x.rawValue == E.y.rawValue) // true
print(E.x == E.y) // true
print(E(rawValue: .infinity)!) // x
```

Perhaps it should be an error instead of a warning.
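For comparison, the overflow behind that warning can be reproduced at run time with an explicit conversion. This is a sketch of my own, not from the enum machinery: it stores the literal as a `Double` (where it is finite) and then narrows it to `Float`. `Float(exactly:)` is shown as one way to detect the loss instead of silently getting infinity.

```swift
// The raw-value literal from the enum, stored as a Double where it fits comfortably.
let big: Double = 0xCp127   // 12 * 2^127, roughly 2.04e39

// Float's largest finite value is roughly 3.40e38, so big is out of range.
print(big > Double(Float.greatestFiniteMagnitude))  // true

// Plain conversion rounds to the nearest representable value: +infinity.
print(Float(big))                                    // inf
// Both raw values from the enum collapse to the same Float.
print(Float(big) == Float(0xCp128 as Double))        // true

// Float(exactly:) returns nil rather than producing infinity.
print(Float(exactly: big) == nil)                    // true
```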