Prefix(while:) removes one element too much

Hi,
for learning Swift i write a little parser at the moment. The Idea is to make heavy use of Sequences and lazy evaluation. I want to go from "Sequence of Character" to "Sequence of Token" ...

The problem i have with prefix(while:) is, that it also removes the first element that doesn't fit the predicate.

class Charstream : Sequence, IteratorProtocol
{
    private var remaining: Substring

    init(_ from: Substring) {
        remaining = from
    }

    func next() -> Character? {
        return remaining.popFirst()
    }
}

let test = Charstream("01234abcd")

let num = test.prefix(while: { $0 >= "0" && $0 <= "9" } )

for c in test {
    print(c)
}

This code prints:
b
c
d

What would be a swifty way to get around this?

You can use drop(while:) to get the sequence with the leading digits removed

let letters = test.drop(while: { $0 >= "0" && $0 <= "9" } )

for c in letters {
    print(c)
}

Btw, you might consider making your Sequence reusable, so that you can access it multiple times

example
class Charstream : Sequence
{
    private var remaining: Substring

    init(_ from: Substring) {
        remaining = from
    }

    func makeIterator() -> IndexingIterator<Substring> {
        remaining.makeIterator()
    }
}

Oh, I didn't notice that you want to make it all as a stream. I don't think there's a good way to do that - to check if a character is a digit, you have to remove it from a stream first. You would have to somehow put it back to the stream, before feeding it forward (even drop(while:) has to save that one nondigit, before fetching the rest of them from the stream)

As @cukr says, prefix(while:) and drop(while:) require reading one extra element (in this case, the first non-digit) in order to know if they're done discarding non-matching elements.

first(where:) would have the same effect of discarding those non-matching elements, but has the added benefit of returning that first non-digit instead of discarding it.

let test = Charstream("01234abcd")

// Notice the condition here is inverted, since we're now _looking_ for
// matches instead of dropping non-matches
guard let firstMatching = test.first(where: {
    $0 < "0" || $0 > "9" }) else { return }

for c in sequence(first: firstMatching, next: { _ in test.next() }) {
    print(c)
}

// Prints:
// a
// b
// c
// d

I'm not sure of the full context of your parser implementation, but I'm sure that you could simply return nil (or some other sentinel behaviour) in the else clause of the guard.

Thanks for the replys. I think i should have made the example more concise.
Lets assume we have two functions:
matchNumber() consumes Characters until it hits a non digit
matchOperator() consumes Characters which are valid as operators
But if I now write "10+1" the "+" will be dropped by the matchNumber() function.

In my original example i would like that num is ["0", "1", "2", "3", "4"] but without removing the "a".

I have to admit that my thinking here is biased from another language. I know a similar concept to Sequences from Dlang (they call it Ranges). The difference is that in D taking the first element and removing the first element are two different operations, whereas in Swift next() does both.

There happen to be a lot places in parsing where you want to look at the next element before deciding what to do. I know i can just use Arrays and Slices with first and removeFirst(), but then i have unnecessary allocations.

ArraySlice and Substring are very cheap. Unless you're using thousands and thousands of them, you shouldn't be worried. Or maybe you're referring to the fact that they prevent the deallocation of the original collection?

Yes, your current Charstream consumes one character per next call, so since prefix(while: condition) repeatedly calls next until it gets a (consumed!) character that does not satisfy the condition, it will have to consume "a" in order to conclude that "a" is not a digit.

Unless your parser keeps the last consumed and discarded character in some kind of memory you cannot build a parser around your Charstream.

I probably shouldn't have called it stream. It is really just a Sequence. So you should say "You can't build a parser around Swifts Sequence".

This is obvious... The whole point of this post was to ask, if there is a way to work around this limitation of Sequences. I just thought there must be a way, because this would be braindead simple with Dlang's Ranges (which are very similar to Swift's Sequences beside that).

That was not obvious from the title or content of the OP. Well, Sequence is single pass, so:

  1. If all input characters are available at once: Build your parser around a Collection instead.

  2. If the input characters are not available all at once (ie they are a stream): Build your parser around a Sequence and a buffer to hold any "peeked/discarded/non-consumed" characters, to make them available when parsing the next token.

Please post an implementation in D and I'm sure someone could translate it to idiomatic Swift.

1 Like

I actually found a cool solution which i want to share. The problem was that next() does two things,
removing and returning the first element.
On the other hand, if you create your own Sequence/Range protocol you lose all the nice standard library functionality.... actually you don't if you use protocol extension.

This creates a D like range (it actually looks better than in D because you don't have optionals there).

protocol MyRange : Sequence, IteratorProtocol {
    associatedtype Element
    var first: Element? {get}
    func removeFirst()
}

It's easy to make a Sequence out of this Range.

extension Sequence where Self: MyRange {
    func next() -> Element? {
        let result = first
        if result != nil {
            removeFirst()
        }
        return result
    }
}

And now let a more sane version of prefix(while:) shadow the original one

extension MyRange {
    func prefix(while predicate: (Element) -> Bool) -> [Element] {
        var result: [Element] = []
        while let tmp = first {
            if predicate(tmp) {
                result.append(tmp)
                removeFirst()
            } else {
                return result
            }
        }
        return result
    }
}

The example in the first post looks now like this.


class Charstream : MyRange
{
    private var remaining: Substring

    init(_ from: Substring) {
        remaining = from
    }

    var first: Character? {
        return remaining.first
    }

    func removeFirst() {
        remaining.removeFirst()
    }
}

let test = Charstream("01234abcd")

let num = test.prefix(while: { $0 >= "0" && $0 <= "9" } )

print(num)

for c in test {
    print(c)
}