Substring violates Hashable's requirements - stdlib/compiler bug?

Jens · June 11, 2022, 9:30pm

I bumped into what I guess might be a bug in the standard library or compiler.

The actual code and situation where it happened is quite involved and this is the smallest program I could make that reproduces the bug:

func randomString() -> String {
    (0..<10_000).map({ _ in Bool.random() ? "å" : "ä" }).joined()
}

extension String {
    func randomSubstring() -> Substring {
        let i = indices.dropLast(1).randomElement()!
        return self[i...i]
    }
}

func test() {
    let s1 = randomString()
    let s2 = randomString()
    var dict = [Substring : Int]()
    for _ in 0 ..< 1000 {
        let a = s1.randomSubstring()
        let b = s2.randomSubstring()
        dict[a, default: 0] += 1
        dict[b, default: 0] += 1
    }
    dict.forEach { print($0, $1) }
}
test()

Typical results when compiling and running this program a couple of times (the dictionary sometimes contains duplicate keys, and sometimes it crashes):

% swiftc --version
swift-driver version: 1.45.2 Apple Swift version 5.6.1 (swiftlang-5.6.0.323.66 clang-1316.0.20.12)
Target: x86_64-apple-macosx12.0
% swiftc -O test.swift && ./test
ä 492
å 1002
ä 506
% swiftc -O test.swift && ./test
ä 970
å 1030
% swiftc -O test.swift && ./test
å 498
ä 479
å 520
ä 503
% swiftc -O test.swift && ./test
Fatal error: Duplicate keys of type 'Substring' were found in a Dictionary.
This usually means either that the type violates Hashable's requirements, or
that members of such a dictionary were mutated after insertion.

Note that if the "å" and "ö" Characters are changed to eg "x" and "y", then it will behave as expected (never print duplicate Dictionary keys and never crash):

% swiftc -O test.swift && ./test
y 1004
x 996
% swiftc -O test.swift && ./test
y 1016
x 984
% swiftc -O test.swift && ./test
x 993
y 1007

xwu · June 11, 2022, 9:37pm

Neat...I think this is enough to file a bug as-is!

Jens · June 11, 2022, 9:51pm

https://github.com/apple/swift/issues/59388

tera · June 11, 2022, 10:56pm

Try this version, does it return "true + false" for you? For me it returns correct true+true in some compiler versions and incorrect "true + false" in others (e.g. 5.6), although I might have oversimplified it and thrown the baby out with the bathwater.

func test() {
    let a = "ä1"
    let b = "ä2"
    let sa = a[a.startIndex ... a.startIndex]
    let sb = b[b.startIndex ... b.startIndex]
    print("strings equal: \(sa == sb)")
    print("hashes equal: \(sa.hashValue == sb.hashValue)")
}

test()

strings equal: true
hashes equal: false

lorentey · June 11, 2022, 11:23pm

Thanks for raising this issue!

Yes, this is unfortunately indeed broken in the Swift 5.6 stdlib -- the fix is present on the Swift 5.7 release branch.

(Note: This fix is not included in the beta OS releases that Apple released earlier this week.)

Currently, the least painful workaround is to refrain from using Substring as a Set/Dictionary key for code that needs to run on a 5.6 stdlib. (I know that's still quite painful. )

tera · June 11, 2022, 11:48pm

Thanks for confirming!

Worth putting this ultimate check (debug + opt-in diagnostic mode only)?

    // ideally have this within every equatable implementation:
    public static func == (_ a: T, _ b: T) -> Bool {
        let equal = a.equal(b) // original eq code
        #if DEBUG
        if equal && some_debug_diagnostic_on {
            assert(a.hashValue == b.hashValue)
        }
        #endif
        return equal
    }

lorentey · June 11, 2022, 11:52pm

We're thinking about ways to make the fix deploy back to earlier releases.