Edge case enum

tera · November 1, 2023, 2:08am

Found this edge case scenario:

enum E: String {
    case a = "\u{00c5}"     // Å precomposed form
    case b = "A\u{030A}"    // Å decomposed form
}

Here the two enum cases have raw values that are "equal" according to string equivalence rules but distinct as far as the compiler's uniqueness check is concerned: the first string is written in precomposed form and the second in decomposed form. The Swift compiler allows that.

print(E.a.rawValue) // Å
print(E.b.rawValue) // Å

print(E.a.rawValue.utf8.count) // 2
print(E.b.rawValue.utf8.count) // 3

print(E.a.rawValue == E.b.rawValue) // true, understandably

So far so good, nothing extraordinary above.

print(MemoryLayout<E>.size) // 1
print(unsafeBitCast(E.a, to: UInt8.self)) // 0
print(unsafeBitCast(E.b, to: UInt8.self)) // 1

Business as usual.

But then:

print(E.a == E.b) // true

This is unexpected. Note that during enum comparison enum constants themselves are compared (in this case byte quantities), not their raw values.

xwu · November 1, 2023, 2:44am

Seems the root of the error is that the compiler should be rejecting this code.

taylorswift · November 1, 2023, 3:09am

i disagree, i would expect == to be completely independent of choice of raw value.

going a bit further down the rabbit hole, i don’t think the compiler could reject this kind of problem generally, because you are allowed to use any ExpressibleByStringLiteral type for the raw value, and the init(stringLiteral:) witness might be opaque.

wadetregaskis · November 1, 2023, 3:12am

I suspect this is a bug in the compiler that's now stuck with us forever due to ABI compatibility concerns… I vaguely recall someone (@scanon?) saying as much for a similar case last week.

(the bug being that the compiler doesn't compare string values like String does, in a nutshell)

xwu · November 1, 2023, 4:23am

Not sure what you mean here; raw values for enum cases are required to be "unique"—it doesn't make sense that this would be some notion of uniqueness other than ==.

taylorswift · November 1, 2023, 4:27am

they are required to be unique in the way Comparable is required to be consistent, but when you say

the compiler cannot enforce that the same way the compiler cannot enforce that your Equatable, Hashable, Comparable, etc. is correct.

xwu · November 1, 2023, 4:31am

The compiler already enforces "uniqueness" for Int raw values; I'm saying it can and should do so for the String raw values "\u{00c5}" and "A\u{030A}"—that it cannot do the same in the general case of some esoteric custom type doesn't mean it's meaningless to diagnose the common, specific cases.

This is par for the course for how Swift compiler diagnostics handle literals: you get a compile-time error that 1024 overflows Int8, for example, when of course there is no telling how a custom third-party type might implement init(integerLiteral:).

No ABI compatibility concerns need apply to compile-time diagnostics :)

taylorswift · November 1, 2023, 4:41am

i think a limited expansion of the raw value compiler diagnostics to String and Substring makes sense.

i think one reason why strings were treated differently from numbers in the diagnostics system was the compiler’s ICU dependency. since the compiler started providing its own unicode tables a lot of compile time diagnostics that were not possible then are reasonable now.

ksluder · November 1, 2023, 4:48am

But it might mean there are two separately addressable bugs. The compiler should prevent you from assigning equal literal values to two enum cases. But perhaps == should be changed to perform discriminator matching instead of raw value comparison.

I’m curious if the behavior of case let diverges from if == in this case.
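One way to check this is to put the two forms side by side. The sketch below assumes the enum E from the first post; matchesB is a hypothetical helper added only for illustration. Enum case patterns match on the case itself rather than going through ==, so the two checks would be expected to disagree.

```swift
enum E: String {
    case a = "\u{00C5}"     // Å, precomposed form
    case b = "A\u{030A}"    // Å, decomposed form
}

// Hypothetical helper: matches with an enum case pattern, not with ==.
func matchesB(_ value: E) -> Bool {
    if case .b = value { return true }
    return false
}

let value = E.a
print(value == E.b)     // goes through ==, reported as true in the first post
print(matchesB(value))  // goes through pattern matching on the case
```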

jrose · November 1, 2023, 6:02am

There’s not a bug here (in the sense that things are behaving as sorta-designed), just an unfortunate == provided by RawRepresentable as a default Equatable conformance. Because the enum has an == method, the enum-specific synthesis does not kick in. This is bad for this particular case, but also because of performance—comparing enum values is, in general, faster than comparing strings!

Ideally RawRepresentable wouldn’t provide that method at all, but it would be a breaking change to remove it. Having the compiler hardcode that that specific implementation shouldn’t count is ugly but technically possible; if we’re concerned about it being a breaking change in edge cases like this, the change could be limited to Swift 6 mode.

You can work around this by manually implementing == with a switch, but that definitely stinks as an answer.
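A minimal sketch of that workaround, assuming the enum E from the first post. The manually written == is an illustration, not a recommended pattern: it has to be kept in sync by hand whenever a case is added, because the default branch silently treats any unlisted pair as unequal.

```swift
enum E: String {
    case a = "\u{00C5}"     // Å, precomposed form
    case b = "A\u{030A}"    // Å, decomposed form

    // Compare the cases themselves instead of their raw values.
    static func == (lhs: E, rhs: E) -> Bool {
        switch (lhs, rhs) {
        case (.a, .a), (.b, .b):
            return true
        default:
            return false
        }
    }
}

print(E.a == E.b)                     // uses the == defined on E
print(E.a.rawValue == E.b.rawValue)   // still compares the strings
```

The raw values themselves still compare equal as strings; only the comparison of the enum values changes.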

jrose · November 1, 2023, 6:05am

(Addendum: why doesn’t the compiler attempt to enforce the uniqueness of string literals by String equality? Because that would require Unicode tables in the compiler, and in particular you could get different behavior with different versions of Unicode. So the compiler doesn’t even try. There’s also no guarantee that a custom string literal type even cares about String equality, but the check could be hardcoded to String and Substring, or even just say “if you really want this behavior, implement it yourself”, so the “version of Unicode” thing is really the important limiting factor.)
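As a rough illustration of the "implement it yourself" option, the check can be done at run time, for example in a unit test. This is only a sketch: hasUniqueRawValues and the enum G are hypothetical names, and the CaseIterable conformance is added here just to enumerate the cases.

```swift
enum E: String, CaseIterable {
    case a = "\u{00C5}"     // Å, precomposed form
    case b = "A\u{030A}"    // Å, decomposed form
}

// Hypothetical enum with clearly distinct raw values, for comparison.
enum G: String, CaseIterable {
    case x = "x"
    case y = "y"
}

// Hypothetical helper: true when no two cases share a raw value
// according to the raw value type's own == and hashing.
func hasUniqueRawValues<T>(_ type: T.Type) -> Bool
    where T: CaseIterable & RawRepresentable, T.RawValue: Hashable
{
    let rawValues = type.allCases.map { $0.rawValue }
    return Set(rawValues).count == rawValues.count
}

print(hasUniqueRawValues(E.self))
print(hasUniqueRawValues(G.self))
```

Because the comparison uses String equality at run time, the answer depends on the Unicode tables of whichever runtime executes it, which is the same version concern described above.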

xwu · November 1, 2023, 7:10am

Unicode has stability guarantees with respect to normalization forms such that a normalized string containing only assigned code points at time of compilation will remain normalized in the future. Unless I'm mistaken, this should be sufficient for the purposes of uniquing string raw values.

tera · November 1, 2023, 10:52am

Could there be two different issues here? I don't see EQ being called during enum case comparison (and it would be a bug in itself if it was called, as it shouldn't be!). Testing code:

// obviously bad, but just to see if it's being called and when:
extension String {
    static func == (lhs: Self, rhs: Self) -> Bool {
        print("comparing \(lhs) and \(rhs)")
        return false // always false
    }
}

print(E.a.rawValue) // Å
print(E.b.rawValue) // Å
print(E.a.rawValue == E.b.rawValue) // "comparing Å and Å", false
print(E.a == E.b) // still true, EQ was not called

xwu · November 1, 2023, 2:52pm

tera:

// obviously bad, but just to see if it's being called and when:
extension String {
    static func == (lhs: Self, rhs: Self) -> Bool {
        print("comparing \(lhs) and \(rhs)")
        return false // always false
    }
}

Just like in the other thread, you're shadowing == with another method that shares the same name, but you are not changing and cannot change the conformance of String to Equatable, so you're not observing what you think you are.
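A small sketch of the difference, reusing the same deliberately broken overload. equalViaConformance is a hypothetical helper; the point is that generic code reaches == through String's Equatable conformance, which an extension in another module cannot replace.

```swift
// Deliberately broken overload, as in the experiment above. It shadows
// String's == for direct calls in this module but is not the Equatable witness.
extension String {
    static func == (lhs: Self, rhs: Self) -> Bool {
        return false
    }
}

// Hypothetical helper: generic code calls == through the conformance.
func equalViaConformance<T: Equatable>(_ lhs: T, _ rhs: T) -> Bool {
    return lhs == rhs
}

let s: String = "x"
print(s == s)                      // direct call, picks up the shadowing overload
print(equalViaConformance(s, s))   // uses String's real conformance
```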

tera · November 1, 2023, 3:48pm

Found the equivalent edge case that doesn't use Strings and behaves exactly the same way:

enum F: Double {
    case x = -0.0
    case y = 0.0
}

var x = F.x
var y = F.y
var xraw = x.rawValue
var yraw = y.rawValue
print(xraw == yraw) // true
dumpHex(&x, MemoryLayout.size(ofValue: x)) // 00
dumpHex(&y, MemoryLayout.size(ofValue: y)) // 01
dumpHex(&xraw, MemoryLayout.size(ofValue: xraw)) // 00 00 00 00 00 00 00 80
dumpHex(&yraw, MemoryLayout.size(ofValue: yraw)) // 00 00 00 00 00 00 00 00
print(x == y) // true! unexpected
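dumpHex is not defined in the snippet above. A possible stand-in, written as an assumption about what it does (print the bytes at an address in memory order), might look like this. hexBytes is a hypothetical helper split out so the formatting can be checked separately.

```swift
// Hypothetical helper: formats `count` bytes starting at `pointer` as hex pairs.
func hexBytes(_ pointer: UnsafeRawPointer, _ count: Int) -> String {
    let bytes = UnsafeRawBufferPointer(start: pointer, count: count)
    return bytes.map { byte -> String in
        let digits = String(byte, radix: 16)
        return digits.count == 1 ? "0" + digits : digits
    }.joined(separator: " ")
}

// Assumed shape of the dumpHex used above: print the bytes in memory order.
func dumpHex(_ pointer: UnsafeRawPointer, _ count: Int) {
    print(hexBytes(pointer, count))
}

var negativeZero = -0.0
var positiveZero = 0.0
dumpHex(&negativeZero, MemoryLayout.size(ofValue: negativeZero))
dumpHex(&positiveZero, MemoryLayout.size(ofValue: positiveZero))
print(negativeZero == positiveZero) // the two zeros compare equal as Doubles
```

The byte order shown for the raw values in the post corresponds to a little-endian platform.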

jrose · November 1, 2023, 3:57pm

It doesn’t, however, guarantee that an older version of Unicode will normalize the same way (in the worst case, because the codepoints are unassigned). So the Unicode tables have to be bundled with the compiler at the very least.

xwu · November 1, 2023, 5:38pm

Well, unless I'm missing something, it's not a worst case but the only case, where the string contains codepoints that are unassigned in that older version. It seems reasonable (to me at least) that Swift's compiler diagnostics for string literal uniqueness are limited to strings without unassigned codepoints just as it is to only String and Substring.

There are a number of Unicode recommendations relevant for a language like Swift that allows Unicode identifiers, for which the compiler could very much benefit from these Unicode tables. So to my mind the availability of these tables and APIs for the compiler is a question of when, not if.

jrose · November 1, 2023, 5:48pm

Still without reading the guarantees, my concern is that text that is not normalized will be normalized differently in a future version of Unicode. But even the unassigned case is bad: if I use an older compiler to work with some just-added emoji, and then update my compiler, the code could stop compiling. (Or it’s just a warning, as long as there’s a way to silence the warning.)

michelf · November 1, 2023, 6:05pm

I think the most surprising thing here is not that the compiler is allowing the enum to exist, but that a == b is equivalent to a.rawValue == b.rawValue. I would have expected the comparison to be based on the discriminant, not the raw value.

Comparing the raw values can make the comparison more expensive, especially with strings, and it can make the result different based on runtime considerations, like the version of Unicode used.
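To make the equivalence concrete, here is a sketch that mirrors the shape of a raw-value-based comparison. rawValuesEqual is a hypothetical stand-in written for illustration, not the standard library's declaration.

```swift
enum E: String {
    case a = "\u{00C5}"     // Å, precomposed form
    case b = "A\u{030A}"    // Å, decomposed form
}

// Hypothetical stand-in: compares two values only through their raw values.
func rawValuesEqual<T: RawRepresentable>(_ lhs: T, _ rhs: T) -> Bool
    where T.RawValue: Equatable
{
    return lhs.rawValue == rhs.rawValue
}

print(rawValuesEqual(E.a, E.b)) // the raw strings are canonically equivalent
```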

tera · November 1, 2023, 6:08pm

Note that there's also a possibility of code starting to crash at runtime, e.g. if the two strings were different in the old version of Unicode but started to compare equal in the new version:

let x = ["Å" : 1, "Å" : 2]
// 🛑 Fatal runtime error: Dictionary literal contains duplicate keys

Besides, as the Double example above shows, it's not just about Strings and Unicode.
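For the dictionary case specifically, one way to avoid the trap is to build the dictionary with an explicit merge rule instead of a literal. This is a sketch, with the two forms of Å spelled out as escapes so they can be told apart; keeping the first value is an arbitrary choice made for the example.

```swift
let precomposed = "\u{00C5}"    // Å, precomposed form
let decomposed = "A\u{030A}"    // Å, decomposed form

// Keys that compare equal are merged by the closure instead of trapping.
let merged = Dictionary(
    [(precomposed, 1), (decomposed, 2)],
    uniquingKeysWith: { first, _ in first }
)

print(merged.count)
print(merged[decomposed] ?? -1)
```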