Type Safety of String Indices

SDGGiesbrecht · January 30, 2019, 4:16am

After tracking down and reading the pitch, review and re‐review of SE‐0180, it appears these people may be interested too: @dabrahams , @jrose, @Lily_Ballard, @xwu, @Drew_Crawford1.

Yup. Here’s a real‐world one that used to make sense, but is now brittle:

(It’s still somewhat reduced; some generics have been specialized for clarity—forgive me if I introduced typos in the process.)

extension String.UnicodeScalarView.Index {

    /// Returns the position of the cluster that contains this index.
    func cluster(in clusters: String) -> String.Index {
        let string = String(clusters)

        var copy = self
        var position = samePosition(in: string)

        while position == nil {
            // What does the next line do now if it’s not really a UnicodeScalarView.Index?
            copy = string.unicodeScalars.index(before: copy)
            position = copy.samePosition(in: string)
        }

        guard let result = copy.samePosition(in: string) else {
            unreachable()
        }
        return result
    }
}

And from there:

extension Range where Bound == String.UnicodeScalarView.Index {

    /// Returns the range of clusters that contains this range.
    func clusters(in clusters: String) -> Range<String.Index> {
        let lower = lowerBound.cluster(in: clusters)
        if let upper = upperBound.samePosition(in: String(clusters)) {
            return lower ..< upper
        } else {
            return lower ..< clusters.index(after: upperBound.cluster(in: clusters))
        }
    }
}

It is then used in ways like this:

let nfd = "café".decomposedStringWithCanonicalMapping()
for scalarRange in nfd.unicodeScalars.matches(for: "\u{301}") {
    let converted = scalarRange.clusters(in: nfd)
    let offsets = nfd.distance(from: nfd.startIndex, to: converted.lowerBound)
        ..< nfd.distance(from: nfd.startIndex, to: converted.upperBound)
    print("Don’t use that symbol at \(offsets.lowerBound)!")

    let replacement = nfd[converted].unicodeScalars.filter { $0 != "\u{301}" }
    let stringifiedReplacement = String(String.UnicodeScalarView(replacement))
    print("Replace it with \(stringifiedReplacement)")
}

Does (or should) the commented line in the bottom‐level function trap? The documentation does not say. If it rounds down, we could be indirectly asking for what comes before the start index. If it rounds up, the range conversion method one level up could ask for what comes after the end index. The function could start with a precondition check verifying that the index converts to itself, but that is extra runtime overhead and extra code.
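
For illustration, that kind of up‐front check might look like the minimal sketch below. `isScalarAligned(in:)` is a hypothetical helper name, not part of the standard library; it relies only on the standard `samePosition(in:)` conversion to the unicode scalar view. The sample deliberately builds a UTF‐8 index that lands in the middle of U+0301, which the compiler accepts as a `String.UnicodeScalarView.Index` without complaint.

```swift
extension String.Index {
    /// Hypothetical helper: true if this index lies on a unicode scalar
    /// boundary of `string`, i.e. it converts to the scalar view.
    func isScalarAligned(in string: String) -> Bool {
        return samePosition(in: string.unicodeScalars) != nil
    }
}

let sample = "e\u{301}" // "e" followed by a combining acute accent

// U+0301 occupies two UTF-8 code units (CC 81), so UTF-8 offset 2
// points at its second code unit, in the middle of the scalar.
let misaligned = sample.utf8.index(sample.utf8.startIndex, offsetBy: 2)

// All views share one index type, so this assignment compiles.
let pretender: String.UnicodeScalarView.Index = misaligned

// A genuine scalar index, for comparison.
let aligned = sample.unicodeScalars.index(after: sample.unicodeScalars.startIndex)

print(pretender.isScalarAligned(in: sample))
print(aligned.isScalarAligned(in: sample))
```

A function like `cluster(in:)` could then open with something like `precondition(isScalarAligned(in: string))`, at the cost of the extra conversion on every call.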

So is there (or can there be again) a better way for the intermediate functions to declare that they accept only the kind of index they are designed to handle, without introducing a long cascade of unnecessary optionality? Before Swift 4, this sort of thing just worked and the compiler enforced its integrity. Now it hurts my head just trying to reason about it, and it is really easy to forget a conversion and enter the twilight zone. It just occurred to me that even if optionality were embraced all the way up the chain, the top‐level caller would likely force‐unwrap, which sets everything back to square one if some conversion was neglected before the call.
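
To make the question concrete, here is a rough sketch of one possible shape for such a declaration: a wrapper type whose only initializer performs the conversion. Everything named here (`ScalarAlignedIndex`, `scalarOffset(of:in:)`) is hypothetical and only meant as an illustration, not a proposal for the standard library.

```swift
/// Hypothetical wrapper: an index that was checked to lie on a
/// unicode scalar boundary of the string it was created with.
struct ScalarAlignedIndex {
    let base: String.Index

    /// Fails if `index` is not on a scalar boundary of `string`.
    init?(_ index: String.Index, in string: String) {
        guard let scalar = index.samePosition(in: string.unicodeScalars) else {
            return nil
        }
        base = scalar
    }
}

/// An intermediate function that accepts only the wrapper,
/// so it needs no optionality of its own.
func scalarOffset(of index: ScalarAlignedIndex, in string: String) -> Int {
    let scalars = string.unicodeScalars
    return scalars.distance(from: scalars.startIndex, to: index.base)
}

let text = "e\u{301}"
let middle = text.utf8.index(text.utf8.startIndex, offsetBy: 2)   // inside U+0301
let boundary = text.utf8.index(text.utf8.startIndex, offsetBy: 1) // start of U+0301

print(ScalarAlignedIndex(middle, in: text) == nil)
if let checked = ScalarAlignedIndex(boundary, in: text) {
    print(scalarOffset(of: checked, in: text))
}
```

This only moves the single optional to the point of construction, and nothing ties the wrapper to one particular string, so it is at best a partial answer to the question above.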

Edit: Found and fixed at least two typos already. Wrong time of night to be trying to do this...