String Index unification vs BidirectionalCollection requirements

This would also be useful for hierarchical structures, like a Markdown tree. It would be so useful if you could iterate the same structure by section, by paragraph, or by inline span.

This is what I use to avoid the pain of that verbosity:
struct StringIndex {
    private let string: String
    let index: String.Index
    
    init(_ string: String, _ index: String.Index) {
        self.string = string
        self.index = index
    }
    
    // Distances are counted in Characters. String.IndexDistance is a
    // deprecated alias for Int, so Int is used directly.
    static func + (a: Self, distance: Int) -> Self {
        Self(a.string, a.string.index(a.index, offsetBy: distance))
    }
    static func - (a: Self, distance: Int) -> Self {
        Self(a.string, a.string.index(a.index, offsetBy: -distance))
    }
    // Number of Characters from b to a (both must index the same string).
    static func - (a: Self, b: Self) -> Int {
        a.string.distance(from: b.index, to: a.index)
    }
    static func += (a: inout Self, distance: Int) {
        a = a + distance
    }
    static func -= (a: inout Self, distance: Int) {
        a = a - distance
    }
    
    @discardableResult
    static prefix func ++ (a: inout Self) -> Self {
        a += 1
        return a
    }
    
    @discardableResult
    static prefix func -- (a: inout Self) -> Self {
        a -= 1
        return a
    }
    
    @discardableResult
    static postfix func ++ (a: inout Self) -> Self {
        let v = a
        ++a
        return v
    }
    
    @discardableResult
    static postfix func -- (a: inout Self) -> Self {
        let v = a
        --a
        return v
    }
    
    static prefix func *(a: Self) -> Character {
        a.string[a.index]
    }
}

prefix operator ++
postfix operator ++
prefix operator --
postfix operator --
prefix operator *

extension String {
    var start: StringIndex {
        StringIndex(self, startIndex)
    }
    var end: StringIndex {
        StringIndex(self, endIndex)
    }
    subscript(_ i: StringIndex) -> Character {
        self[i.index]
    }
}

Used like this:

var p = string.start + 4
print(string[p++])
print(string[p])
print(string[--p])
print(*p++)
print(*p--)
etc

PS. Interestingly, I can comment out the operator declarations for "operator ++" and "operator --"; it looks like they are already declared by Swift. Yet I cannot declare them twice in my own code, so there is some magic with those built-in declarations.

PPS: The StringIndex implementation can enforce that only indices within the same string are comparable:

extension StringIndex: Equatable {
    public static func == (a: Self, b: Self) -> Bool {
        assert(a.string == b.string) // 🤔
        return a.index == b.index
    }
}

But there is a bug here, as the check gives false negatives: String's == compares by Unicode canonical equivalence, so two different strings (e.g. different Unicode compositions) can still compare equal and slip past the assertion. Is there any way to compare the internal string references with === ?
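
To make that false negative concrete, here is a minimal sketch (the two literals are just illustrative). The strings compare equal under ==, yet their storage differs, so an index taken from one is not meaningful in the other:

```swift
// Two spellings of "café": precomposed é (U+00E9) versus
// "e" followed by a combining acute accent (U+0301).
let precomposed = "caf\u{E9}"
let decomposed = "cafe\u{301}"

// String equality uses Unicode canonical equivalence, so a check like
// assert(a.string == b.string) passes for indices from either string.
print(precomposed == decomposed)

// The underlying storage is different, though: the scalar and UTF-8
// counts disagree, so offsets into one do not line up with the other.
print(precomposed.unicodeScalars.count, decomposed.unicodeScalars.count)
print(precomposed.utf8.count, decomposed.utf8.count)
```

So an equality-based assertion cannot tell these two apart, which is why an identity-style comparison would be needed to catch this case.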

Your statement is true, but not all substrings are slices. Any generic collection algorithm operating on a string is only capable of acquiring proper slices, and these slices hold to all your expectations. Other kinds of substrings can be acquired in string land which are not slices of the string (they are only slices of its memory). Once you have such a substring, any collection algorithm operating on it must also only be capable of acquiring proper slices of the substring, and these slices must hold to all your expectations.

That last sentence is the part that is currently broken, which opens the door to the invalid state where all the other bizarre bugs start to cascade into each other. The two ways of fixing it and plugging the hole are to add the substring’s bounds into an index list inherited from its parent, or to treat its memory slice like a distinct string. @lorentey prefers the latter; I have no preference. Neither way ensures a substring has the same elements as its parent—that would be impossible, because it is fundamentally at odds with the functionality of subscripting a string with an endpoint in the middle of an element. Would a type distinction between a ProperSlice and a PseudoSlice have been a good idea at the time of SE‐0180? Maybe, but I am pretty sure it is too late now. (Though I myself would not be averse.)
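
As a rough illustration of a substring whose endpoint falls in the middle of an element, here is a small sketch. The string and variable names are made up for the example, and the results in the comments are what I would expect on Swift 5.7 or later; earlier versions may behave less consistently for the reasons discussed in this thread:

```swift
// One Character made of two Unicode scalars: "e" + combining acute accent.
let s = "e\u{301}"

// An index that is a valid scalar boundary, but sits inside the Character.
let mid = s.unicodeScalars.index(after: s.unicodeScalars.startIndex)

// Slicing at that index splits the Character's memory in two.
let head = s[..<mid]   // contains only "e"
let tail = s[mid...]   // contains only the combining accent

print(s.count)                      // the parent has a single Character
print(Array(head.unicodeScalars))   // scalars of the first piece
print(Array(tail.unicodeScalars))   // scalars of the second piece
```

Neither piece contains the parent's one element, which is the sense in which such a substring is a slice of the memory rather than a slice of the collection.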


To be precise, SE-0180 requires the latter. There is an ocean of Swift code already written that potentially assumes this "memory slicing" behavior, however broken it currently is in the stdlib. (I'm merely an engineer drifting in this ocean; I can paddle very hard, but I can only go so far against its currents.)

Substrings with bounds that aren't valid indices in their base exhibit some clearly incorrect/nonsensical behavior in Swifts 4...5.6, but they do reliably print the way SE-0180 wanted to define them. IIRC, appending such a substring to a string also works like SE-0180 wanted it to work. A handful of other operations are also safe & quite reliable, I expect.

To avoid having to go through Swift Evolution to revoke or amend SE-0180 (and, more importantly, to avoid the subsequent source/binary compatibility problems, which seem like a whirlpool of pain I really don't want to get sucked into), the smoothest way forward seems to be to fix the implementation to follow the original intent of this proposal as closely as possible.

Luckily, it seems possible to do this with (relatively!) minor compatibility risks.


The fix for this has landed on main now and is expected to get cherry-picked to release/5.7 soon.

The fix rounds indices passed to String and Substring operations down to the nearest Character, without any runtime warnings or conditional traps. (Subscript operations do not round to Character boundaries, only scalars, following the requirements of SE-0180. If a substring's startIndex isn't valid (reachable) in its base, then the substring's break positions are expected to diverge from the ones in base.)
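
For readers who want to see what "rounding down to the nearest Character" means, here is a minimal sketch written in terms of public API only. The helper name roundedDownToCharacter is hypothetical (it is not a stdlib method, and it is not how the stdlib does this internally); it simply walks the string's Character boundaries in linear time:

```swift
extension String {
    /// Hypothetical helper, not a stdlib API: returns the last Character
    /// boundary that is not after `i`. Linear time; for illustration only.
    func roundedDownToCharacter(_ i: String.Index) -> String.Index {
        if i >= endIndex { return endIndex }
        var result = startIndex
        for boundary in indices {
            if boundary > i { break }
            result = boundary
        }
        return result
    }
}

let s = "e\u{301}x"   // two Characters: "é" (two scalars) and "x"

// A scalar-aligned index that falls inside the first Character.
let mid = s.unicodeScalars.index(after: s.unicodeScalars.startIndex)

let rounded = s.roundedDownToCharacter(mid)
print(rounded == s.startIndex)   // true: rounded down to the start of "é"
```

Indices that are already on a Character boundary, including endIndex, come back unchanged from this helper.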

The PR also fixes a large pile of index-related issues in String and its various views, but it does not include updating the RangeReplaceableCollection docs or adding public rounding methods.
