Substring violates Hashable's requirements - stdlib/compiler bug?

I bumped into what I guess might be a bug in the standard library or compiler.

The actual code and situation where it happened is quite involved and this is the smallest program I could make that reproduces the bug:

func randomString() -> String {
    (0..<10_000).map({ _ in Bool.random() ? "å" : "ä" }).joined()
}

extension String {
    func randomSubstring() -> Substring {
        let i = indices.dropLast(1).randomElement()!
        return self[i...i]
    }
}

func test() {
    let s1 = randomString()
    let s2 = randomString()
    var dict = [Substring : Int]()
    for _ in 0 ..< 1000 {
        let a = s1.randomSubstring()
        let b = s2.randomSubstring()
        dict[a, default: 0] += 1
        dict[b, default: 0] += 1
    }
    dict.forEach { print($0, $1) }
}
test()

Typical results when compiling and running this program a couple of times (the dictionary sometimes contains duplicate keys, and sometimes it crashes):

% swiftc --version
swift-driver version: 1.45.2 Apple Swift version 5.6.1 (swiftlang-5.6.0.323.66 clang-1316.0.20.12)
Target: x86_64-apple-macosx12.0
% swiftc -O test.swift && ./test
ä 492
å 1002
ä 506
% swiftc -O test.swift && ./test
ä 970
å 1030
% swiftc -O test.swift && ./test
å 498
ä 479
å 520
ä 503
% swiftc -O test.swift && ./test
Fatal error: Duplicate keys of type 'Substring' were found in a Dictionary.
This usually means either that the type violates Hashable's requirements, or
that members of such a dictionary were mutated after insertion.

Note that if the "å" and "ö" Characters are changed to eg "x" and "y", then it will behave as expected (never print duplicate Dictionary keys and never crash):

% swiftc -O test.swift && ./test
y 1004
x 996
% swiftc -O test.swift && ./test
y 1016
x 984
% swiftc -O test.swift && ./test
x 993
y 1007

Neat...I think this is enough to file a bug as-is!

https://github.com/apple/swift/issues/59388

1 Like

Try this version, does it return "true + false" for you? For me it returns correct true+true in some compiler versions and incorrect "true + false" in others (e.g. 5.6), although I might have oversimplified it and thrown the baby out with the bathwater.

func test() {
    let a = "ä1"
    let b = "ä2"
    let sa = a[a.startIndex ... a.startIndex]
    let sb = b[b.startIndex ... b.startIndex]
    print("strings equal: \(sa == sb)")
    print("hashes equal: \(sa.hashValue == sb.hashValue)")
}

test()

strings equal: true
hashes equal: false

1 Like

Thanks for raising this issue!

Yes, this is unfortunately indeed broken in the Swift 5.6 stdlib -- the fix is present on the Swift 5.7 release branch.

(Note: This fix is not included in the beta OS releases that Apple released earlier this week.)

Currently, the least painful workaround is to refrain from using Substring as a Set/Dictionary key for code that needs to run on a 5.6 stdlib. (I know that's still quite painful. :disappointed:)

2 Likes

Thanks for confirming!

Worth putting this ultimate check (debug + opt-in diagnostic mode only)?

    // ideally have this within every equatable implementation:
    public static func == (_ a: T, _ b: T) -> Bool {
        let equal = a.equal(b) // original eq code
        #if DEBUG
        if equal && some_debug_diagnostic_on {
            assert(a.hashValue == b.hashValue)
        }
        #endif
        return equal
    }
1 Like

We're thinking about ways to make the fix deploy back to earlier releases.

3 Likes