[Pre-Pitch] Getting the index to the next valid element

CTMacUser · June 19, 2019, 1:37am

Since my misadventures in creating an enum-based trie resulted in discovering a compiler bug, I've gone back to manually creating a CR/LF/CRLF/CR-CRLF parser. I tried this years ago, and I'm now updating it with my improved experience with Swift.

I start with a protocol to indicate the CR and LF values:

/// A type that supports values for the ASCII line feed and carriage return.
protocol InternetLineBreakerValues {
    /// The value of the ASCII+ carriage return code point.
    static var crValue: Self { get }
    /// The value of the ASCII+ line feed code point.
    static var lfValue: Self { get }
}

I could have extended just UInt8, but I made this protocol, then extended it with default implementations for ExpressibleByIntegerLiteral and ExpressibleByUnicodeScalarLiteral types, so that I can add extensions for UnicodeScalar, Int8, etc. (Just in case I go back to the Swift-language token iterator.)
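A minimal sketch of how such default implementations could look. The protocol is repeated so the snippet compiles on its own. The literal values chosen for each default and the list of conforming types are assumptions based on the description above, not the original code.

```swift
/// Repeated from above so this sketch compiles on its own.
protocol InternetLineBreakerValues {
    static var crValue: Self { get }
    static var lfValue: Self { get }
}

// Assumed defaults for integer-like types: the ASCII code-point numbers.
extension InternetLineBreakerValues where Self: ExpressibleByIntegerLiteral {
    static var crValue: Self { return 0x0D }
    static var lfValue: Self { return 0x0A }
}

// Assumed defaults for scalar-like types: the matching escape sequences.
extension InternetLineBreakerValues where Self: ExpressibleByUnicodeScalarLiteral {
    static var crValue: Self { return "\r" }
    static var lfValue: Self { return "\n" }
}

// With the defaults in place, each conformance needs no body.
extension UInt8: InternetLineBreakerValues {}
extension Int8: InternetLineBreakerValues {}
extension Unicode.Scalar: InternetLineBreakerValues {}
```

A type that adopts both literal protocols would see two candidate defaults and would have to supply its own implementations.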

Now add an iterator to search for line breaks:

/// An iterator over the locations of the line-breaking sequences within a
/// given collection.
struct LineTerminatorLocationIterator<Base: Collection> where Base.Element: Equatable & InternetLineBreakerValues {
    /// The remaining sub-collection to search.
    var collection: Base.SubSequence
    /// Which line-breaking sequences to search for.
    let targets: LineTerminatorSearchTargets
}
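LineTerminatorSearchTargets is not shown in this post. One plausible shape, assuming it is an OptionSet whose members match the names used in the iterator (.cr, .lf, .crlf, .crcrlf), is sketched here. The raw-value type, the bit assignments, and the `all` convenience member are hypothetical.

```swift
/// Hypothetical definition; the real type is not shown in the post.
struct LineTerminatorSearchTargets: OptionSet {
    let rawValue: UInt8

    static let cr = LineTerminatorSearchTargets(rawValue: 1 << 0)
    static let lf = LineTerminatorSearchTargets(rawValue: 1 << 1)
    static let crlf = LineTerminatorSearchTargets(rawValue: 1 << 2)
    static let crcrlf = LineTerminatorSearchTargets(rawValue: 1 << 3)

    /// Assumed convenience member: every supported terminator.
    static let all: LineTerminatorSearchTargets = [.cr, .lf, .crlf, .crcrlf]
}

// Search only for Unix-style and Windows-style breaks.
let targets: LineTerminatorSearchTargets = [.lf, .crlf]
```

An option set keeps each `targets.contains(...)` check in the iterator cheap and lets callers combine terminators freely.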

Since I need to look at up to three code points at a time, I originally arranged manually to get the next three values, but had to add special cases when there were fewer than three elements left. Having a function that simply returns nil once it goes out of bounds was a lot easier than having to stop the flow to create if-let-else branches in my initializations.

extension Collection {
    /// Returns the position immediately after the given index, if that
    /// position is dereferenceable; otherwise returns `nil`.
    func elementIndex(after i: Index) -> Index? {
        precondition(i < endIndex)
        let next = index(after: i)
        return next < endIndex ? next : nil
    }
}
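A short usage sketch of the helper, with the extension repeated so it compiles on its own. The sample byte array is made up for illustration. Chaining through flatMap lets each lookup fall through to nil without any if-let branches.

```swift
extension Collection {
    /// Repeated from above so this sketch compiles on its own.
    func elementIndex(after i: Index) -> Index? {
        precondition(i < endIndex)
        let next = index(after: i)
        return next < endIndex ? next : nil
    }
}

let bytes: [UInt8] = [0x41, 0x0D, 0x0A]  // "A", CR, LF

// Each step is nil as soon as the previous one ran out of elements.
let first: Int? = bytes.isEmpty ? nil : bytes.startIndex
let second = first.flatMap { bytes.elementIndex(after: $0) }   // 1
let third = second.flatMap { bytes.elementIndex(after: $0) }   // 2
let fourth = third.flatMap { bytes.elementIndex(after: $0) }   // nil
```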

extension LineTerminatorLocationIterator: IteratorProtocol {

    mutating func next() -> Range<Base.Index>? {
        var result: Range<Base.Index>?
        var first = collection.isEmpty ? nil : collection.startIndex
        var second = first.flatMap { collection.elementIndex(after: $0) }
        var third = second.flatMap { collection.elementIndex(after: $0) }
        while let firstIndex = first, result == nil {
            defer {
                first = second
                second = third
                third = third.flatMap { collection.elementIndex(after: $0) }
            }

            let secondValue = second.map { collection[$0] }
            let thirdValue = third.map { collection[$0] }
            switch (collection[firstIndex], secondValue, thirdValue) {
            case (Base.Element.crValue, Base.Element.crValue?, Base.Element.lfValue?) where targets.contains(.crcrlf):
                result = firstIndex..<collection.index(after: third!)
            case (Base.Element.crValue, Base.Element.lfValue?, _) where targets.contains(.crlf):
                result = firstIndex..<collection.index(after: second!)
            case (Base.Element.crValue, _, _) where targets.contains(.cr),
                 (Base.Element.lfValue, _, _) where targets.contains(.lf):
                result = firstIndex..<collection.index(after: firstIndex)
            default:
                break
            }
        }
        collection = collection[(result?.upperBound ?? collection.endIndex)...]
        return result
    }

}
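The core of next() is the switch over one required value and two optional look-ahead values. That matching can be sketched in isolation as follows, assuming UInt8 elements and ignoring the `targets` filter. The function `terminatorLength` is a hypothetical helper for illustration only and is not part of the iterator.

```swift
/// Hypothetical helper: returns how many elements the line terminator
/// starting at `first` spans, or 0 if `first` does not start one.
func terminatorLength(_ first: UInt8, _ second: UInt8?, _ third: UInt8?) -> Int {
    let cr: UInt8 = 0x0D
    let lf: UInt8 = 0x0A
    switch (first, second, third) {
    case (cr, cr?, lf?):
        // CR-CR-LF: both look-ahead values must exist and match.
        return 3
    case (cr, lf?, _):
        // CR-LF: the third value is irrelevant (and may be nil).
        return 2
    case (cr, _, _), (lf, _, _):
        // A lone CR or a lone LF.
        return 1
    default:
        return 0
    }
}
```

The `value?` pattern matches only a non-nil optional that equals the value, so a missing look-ahead element simply fails the longer cases and falls through to the shorter ones. Ordering the cases longest-first is what makes CR-CR-LF win over CR-LF and a lone CR.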

I made a Sequence for this iterator, then upgraded it with Collection extensions, then optionally added BidirectionalCollection extensions. Implementing index(before:) required an elementIndex(before:) method to help in the same way.
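The backward helper is not shown here. A minimal sketch of what elementIndex(before:) might look like, assuming it mirrors the forward version by returning nil instead of stepping before startIndex, and assuming endIndex is an acceptable argument:

```swift
extension BidirectionalCollection {
    /// Returns the position immediately before the given index, or `nil`
    /// when `i` is already `startIndex`. (Assumed semantics; the original
    /// method is not shown in the post.)
    func elementIndex(before i: Index) -> Index? {
        precondition(i <= endIndex)
        return i > startIndex ? index(before: i) : nil
    }
}

let sample: [UInt8] = [0x41, 0x0D, 0x0A]  // "A", CR, LF

// Walk backward from the end; the chain turns nil at the front.
let last = sample.elementIndex(before: sample.endIndex)            // 2
let middle = last.flatMap { sample.elementIndex(before: $0) }      // 1
let front = middle.flatMap { sample.elementIndex(before: $0) }     // 0
let beforeFront = front.flatMap { sample.elementIndex(before: $0) } // nil
```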