My read of the guarantees—I could be wrong—is that this is explicitly what Unicode states will never happen.
According to the Unicode policy for Normalization Forms, applicable to Unicode 4.1 and all later versions, the results of normalizing a string on one version will always be the same as normalizing it on any other version, as long as the string contains only assigned characters according to both versions.
The code that would stop compiling after the update would be an enum you created containing cases whose string raw values are different representations of the same just-added emoji. And it can't be just any new emoji sequence: it would have to be one with new emoji codepoints, and those new codepoints would have to compose with each other or have a "strongly discourage[d]" new composition with an existing codepoint.
Which, I mean, you can contrive to do in theory (although, now that I reflect, I'm actually not aware of any emoji that have decomposed and composed codepoint counterparts introduced in the same Unicode version, so I'm not sure you could actually contrive this in practice). But (a) it's nigh impossible to do by accident; and (b) as it is, the behavior of an enum with non-unique raw values is un(der?)specified, so not compiling it after you update your compiler would arguably be an improvement?
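For anyone following along, here is a minimal sketch of what "different representations of the same character" means for Swift strings. It uses a long-established accented letter rather than a new emoji, and the variable names are just for illustration:

```swift
// Two spellings of "é": one precomposed scalar, and "e" plus a combining accent.
let composed = "\u{E9}"      // U+00E9
let decomposed = "e\u{301}"  // U+0065 followed by U+0301

// The scalar counts differ...
print(composed.unicodeScalars.count)   // 1
print(decomposed.unicodeScalars.count) // 2

// ...but String equality is based on canonical equivalence.
print(composed == decomposed)          // true
```

Two enum cases whose raw values were spelled like this would have equal raw values even though the source text differs.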
However, I do not see EQ on rawValues being called when comparing enumeration constants in another example with a custom ExpressibleByXXXLiteral raw type:
Edit: I've been corrected below.
Quite elaborate setup (any way to simplify it?)
struct C: ExpressibleByIntegerLiteral, RawRepresentable {
    typealias RawValue = Int
    var rawValue: RawValue
    init(rawValue value: RawValue) {
        rawValue = 0 // let's ignore value
    }
    init(integerLiteral value: RawValue) {
        rawValue = 0 // let's ignore value
    }
    init() {
        rawValue = 0
    }
    static func == (lhs: Self, rhs: Self) -> Bool {
        print("EQ \(lhs) and \(rhs)")
        return true
    }
    var hashValue: Int { 0 }
}

enum E: C {
    init?(rawValue: C) {
        fatalError("unused")
    }
    var rawValue: C {
        switch self {
        case .x: return C(rawValue: 1)
        case .y: return C(rawValue: 2)
        }
    }
    case x = 1, y = 2
}

var x = E.x
var y = E.y
print(x.rawValue == y.rawValue) // true, EQ called
print(x == y) // false, EQ not called
I expected, and tested for, this exact case yesterday, but found that the compiler did give an error: "Raw value for enum case is not unique". So I shrugged and decided not to share that finding here.
However, I now see that what I had actually written was:
enum Number: Double {
    case a = 0
    case b = -0
}
Maybe not surprising, but I was surprised that -0 and -0.0 are parsed differently when the type isn't inferred to be Int. It seems -0 is first parsed as an Int and then converted to a Double?
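A small sketch of the underlying floating-point behavior, in case it helps. The two zeros compare equal even though they are distinct values, which is why raw-value uniqueness gets murky here. The variable names are mine, and I've left the result of the last line uncommented because it depends on how the compiler handles the integer-style literal:

```swift
let positiveZero: Double = 0.0
let negativeZero: Double = -0.0

// IEEE 754: the two zeros compare equal...
print(positiveZero == negativeZero)  // true

// ...but they carry different sign bits.
print(positiveZero.sign == .plus)    // true
print(negativeZero.sign == .minus)   // true

// The spelling without a fractional part, as in the enum above.
let fromIntegerLiteral: Double = -0
print(fromIntegerLiteral.sign)
```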
Another class of oddball enums is related to floating point overflow:
enum E: Float {
    case x = 0xCp127 // Warning: '0xCp127' overflows to inf during conversion to 'Float'
    case y = 0xCp128 // Warning: '0xCp128' overflows to inf during conversion to 'Float'
}

print(E.x.rawValue == E.y.rawValue) // true
print(E.x == E.y) // true
print(E(rawValue: .infinity)!) // x
Perhaps it should be an error instead of a warning.
While I'm somewhat in favor of this change, as I would like the perf improvement it brings, I don't think it was changed intentionally; I think this really is a bug.
I'm especially suspicious because hashValue is still equal for .a and .b, which shouldn't be the case if this had truly been "fixed": Compiler Explorer
Good find. I'd say that the hash values being equal is a bug, and just that!
The values are different; that their rawValues are the same should not make a difference. You are right, the hashes had better be different for different enum cases.
Perhaps the enum's hashValue should be based solely on the discriminator (and the associated values if any) and not take the case's rawValue into account.
(Just in case, remember that it’s acceptable for hashes to be the same when content is different, just not ideal. It’s the opposite that’s fully breaking.)
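To illustrate that point, here is a deliberately bad sketch: a hypothetical `CollidingKey` type (not from this thread) whose hash ignores its content entirely. Everything collides, yet Set still behaves correctly because equality breaks the tie; it is only slower.

```swift
struct CollidingKey: Hashable {
    let id: Int

    // Deliberately feeds nothing to the hasher, so every value hashes alike.
    // Equatable is still synthesized and compares `id`.
    func hash(into hasher: inout Hasher) {}
}

let keys: Set = [CollidingKey(id: 1), CollidingKey(id: 2), CollidingKey(id: 3)]

// Same hash, different content: legal, just inefficient.
print(CollidingKey(id: 1).hashValue == CollidingKey(id: 2).hashValue) // true
print(keys.count)                                                     // 3
print(keys.contains(CollidingKey(id: 2)))                             // true
```

The reverse situation (equal values producing different hashes) is the one that would actually break Set and Dictionary lookups.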
You mean + specifically or an arbitrary prefix operator?
IMHO Swift could benefit from some simple constant expression folding done at compile time (without investing in full-tilt const expressions), for example:
enum E: Int {
    case a = +123
    case b = -2*(3+4)/5
}
i.e. "zero syntax cost" const expressions, perhaps with a very limited subset of allowed operators.