How can I fix "Type 'MyType' does not conform to protocol 'Collection'"?

CTMacUser · September 13, 2019, 7:41am

Continuing the discussion from Transitioning a lazy sequence to a collection:

I'm writing this in anger. I searched for the phrase in the subject in case anyone had a solution. Turns out, I already asked the question (at the above link), for the same kind of type! The question was never answered, so I need it here.

What causes this problem; is there some cache in Xcode or the Swift compiler that gets fried? Cleaning the build materials doesn't seem to purge the problem. It's a worse version of a problem I've seen before; usually the compiler will add protocol stubs for you, but sometimes it gets stuck on members you've already added. Now it sometimes gives up on advising you what's wrong.

Here's the type:

public struct LazyDeduplicationSequence<Base: Sequence> {
    @usableFromInline
    let base: Base
    @usableFromInline
    let areEquivalent: (Base.Element, Base.Element) -> Bool
    @usableFromInline
    init(_ base: Base, by areEquivalent: @escaping (Base.Element, Base.Element) -> Bool) {
        self.base = base
        self.areEquivalent = areEquivalent
    }
}

extension LazyDeduplicationSequence: Sequence {
    public typealias Iterator = LazyDeduplicationIterator<Base.Iterator>
    public typealias Element = Iterator.Element
    public __consuming func makeIterator() -> Iterator {
        return LazyDeduplicationIterator(base.makeIterator(), by: areEquivalent)
    }
    @inlinable
    public var underestimatedCount: Int {
        return base.underestimatedCount.signum()
    }
}

extension LazyDeduplicationSequence: LazySequenceProtocol {}

public typealias LazyDeduplicationCollection<T: Collection> = LazyDeduplicationSequence<T>

extension LazyDeduplicationSequence: Collection where Base: Collection {  // ERROR HERE
    public typealias SubSequence = LazyDeduplicationSequence<Base.SubSequence>
    public struct Index: Comparable {
        let isoRange: Range<Base.Index>
        public static func < (lhs: Index, rhs: Index) -> Bool {
            return lhs.isoRange.lowerBound < rhs.isoRange.lowerBound
        }
    }
    public var startIndex: Index {
        let start = base.startIndex
        guard let firstValue = base.first else { return endIndex }

        let end = base.firstIndex { !areEquivalent($0, firstValue) } ?? base.endIndex
        return Index(isoRange: start..<end)
    }
    public var endIndex: Index {
        return Index(isoRange: base.endIndex..<base.endIndex)
    }
    public var isEmpty: Bool { return base.isEmpty }
    public func index(after i: Index) -> Index {
        let nextStart = i.isoRange.upperBound
        precondition(i.isoRange.lowerBound < base.endIndex)
        guard nextStart < base.endIndex else { return endIndex }

        let nextValue = base[nextStart]
        let nextEnd = base[nextStart...].firstIndex { !areEquivalent($0, nextValue) } ?? base.endIndex
        return Index(isoRange: nextStart..<nextEnd)
    }
    public subscript(position: Index) -> Element {
        return base[position.isoRange.lowerBound]
    }
    public subscript(bounds: Range<Index>) -> SubSequence {
        return LazyDeduplicationSequence<Base.SubSequence>(base[bounds.lowerBound.isoRange.lowerBound ..< bounds.upperBound.isoRange.upperBound], by: areEquivalent)
    }
}

extension LazyDeduplicationSequence.Index: Hashable where Base.Index: Hashable {}

The error at the start of the Collection extension block is:

Type 'LazyDeduplicationSequence<Base>' does not conform to protocol 'Collection'

Am I triggering some bug?

Lantua · September 13, 2019, 12:07pm

Could you also include LazyDeduplicationIterator<Base.Iterator> ?

CTMacUser · September 13, 2019, 7:43pm

I first commented out the Collection conformance

extension LazyDeduplicationSequence/*: Collection*/ where Base: Collection {

And the Collection members themselves gave no error. Then I wondered if adding the conformance at the same time as the members was the problem. I first diverted myself with the backwards code:

extension LazyDeduplicationSequence/*: BidirectionalCollection*/ where Base: BidirectionalCollection {
    public func index(before i: Index) -> Index {
        let previousEnd = i.isoRange.lowerBound
        let previousLast = base.index(before: previousEnd)  // Crashes here if at startIndex
        let previousValue = base[previousLast]
        let previousStart: Base.Index
        if let doublePreviousLast = base[...previousLast].lastIndex(where: { !areEquivalent($0, previousValue) }) {
            previousStart = base.index(after: doublePreviousLast)
        } else {
            previousStart = base.startIndex
        }
        return Index(previousStart..<previousEnd)
    }
}

Then added conformance in separated statements:

extension LazyDeduplicationSequence: Collection where Base: Collection {}
extension LazyDeduplicationSequence: LazyCollectionProtocol where Base: Collection {}
extension LazyDeduplicationSequence: BidirectionalCollection where Base: BidirectionalCollection {}

But the same error for the Collection block showed up. But the BidirectionalCollection error had a fix-it attached. When I opened it, it asked to provide stubs. To my surprise, the stub wasn't a dummy Index type-alias, but a dummy Indices type-alias.

Could the default definition of Indices been the problem? I re-read "Indices.swift," where DefaultIndices lives, but I didn't see a problem. Then I made up my own indices-bundle type:

public struct LazyDeduplicationIndices<Elements: Collection> where Elements.SubSequence == Elements {
    @usableFromInline
    let elements: LazyDeduplicationCollection<Elements>
    @inlinable
    init(_ base: LazyDeduplicationCollection<Elements>) {
        elements = base
    }
}

extension LazyDeduplicationIndices: Collection {
    @inlinable
    public var underestimatedCount: Int { return elements.underestimatedCount }

    public struct Index: Comparable {
        @usableFromInline
        let isoRange: Range<Elements.Index>
        @inlinable
        init(_ base: Range<Elements.Index>) {
            isoRange = base
        }
        @inlinable
        public static func < (lhs: Index, rhs: Index) -> Bool {
            return lhs.isoRange.lowerBound < rhs.isoRange.lowerBound
        }
    }

    @inlinable
    public var startIndex: Index { return elements.startIndex }
    @inlinable
    public var endIndex: Index { return elements.endIndex }
    @inlinable
    public subscript(position: Index) -> Index {
        return position
    }
    @inlinable
    public subscript(bounds: Range<Index>) -> LazyDeduplicationIndices {
        return LazyDeduplicationIndices(elements[bounds])
    }
    @inlinable
    public var indices: LazyDeduplicationIndices { return self }
    @inlinable
    public var isEmpty: Bool { return elements.isEmpty }
    @inlinable
    public func index(after i: Index) -> Index {
        return elements.index(after: i)
    }
}

extension LazyDeduplicationIndices.Index: Hashable where Elements.Index: Hashable {}

extension LazyDeduplicationIndices: BidirectionalCollection where Elements: BidirectionalCollection {
    @inlinable
    public func index(before i: Index) -> Index {
        return elements.index(before: i)
    }
}

Since I didn't erase the collection code in LazyDeduplicationSequence, I could re-use it, even though the sequence type itself doesn't (yet) conditionally conform to Collection. I added a custom indices member to the extension then uncommented the Collection-conformance statement, and... it worked! Or at least didn't trigger a pre-compilation error. Uncommenting the LazyCollectionProtocol and BidirectionalCollection statements worked too, and the project worked when I did a full unit-test.

Why did supplying a custom Indices work? Is there a bug in the pre-checker in that it didn't tell me what was wrong in Collection conformance? I only got it due to trying BidirectionalCollection conformance and that's what BC suggested I fill in a stub for.

CTMacUser · September 13, 2019, 7:46pm

Here it is, but as mentioned in the post I just wrote before this, the iterator type wasn't the problem.

public struct LazyDeduplicationIterator<Base: IteratorProtocol> {
    @usableFromInline
    var base: Base
    @usableFromInline
    let areEquivalent: (Base.Element, Base.Element) -> Bool
    @usableFromInline
    var previous: Base.Element?
    @inlinable
    init(_ base: Base, by areEquivalent: @escaping (Base.Element, Base.Element) -> Bool) {
        self.base = base
        self.areEquivalent = areEquivalent
        previous = nil
    }
}

extension LazyDeduplicationIterator: IteratorProtocol {
    public typealias Element = Base.Element
    public mutating func next() -> Element? {
        if let previous = previous {
            self.previous = nil
            while let current = base.next() {
                guard !areEquivalent(previous, current) else { continue }

                self.previous = current
                break
            }
        } else {
            previous = base.next()
        }
        return previous
    }
}

Lantua · September 13, 2019, 7:47pm

Yeah, likely not. Just wanna get some “working” example.

Lantua · September 13, 2019, 8:24pm

Ok, a few things:

It looks like debouncing rather than deduplication.
startIndex can take O(n) time. It should be documented, as it departs from Collection's contract.
~~index(after:) can take O(n) time. It should be documented for violation of Collection contract.~~ It's for RandomAccessCollection, nvm.

The subscript(bounds:) implementation is incorrect. You shouldn't use upperBound.upperBound. It'd accidentally include value at upperBound if it has duplicated. Use upperBound.lowerBound instead.

let a = LazyDeduplicationCollection([1, 2, 3, 3, 3], by: ==)
let start = a.startIndex, end = a.index(start, offsetBy: 2)
a[start] // 1
a[end] // 3
Array(a[start..<end]) // [1, 2, 3], but should exclude 3

The problem lies with the Index, try to move it outside. I don't know why either. It seems like a bug.

public struct LazyDeduplicationIndex<Base: Comparable>: Comparable {
    ...
}

extension LazyDeduplicationIndex: Hashable where Base: Hashable {}

extension LazyDeduplicationSequence: Collection where Base: Collection {
    public typealias SubSequence = LazyDeduplicationSequence<Base.SubSequence>
    public typealias Index = LazyDeduplicationIndex<Base.Index>

    ...
}

Though I'd suggest an easier implementation by using just Base.Index that points to first entry in the duplicated subsequence. This also fix the subscript(bounds:) bug mentioned above.

extension LazyDeduplicationSequence: Collection where Base: Collection {
    public typealias SubSequence = LazyDeduplicationSequence<Base.SubSequence>
    public typealias Index = Base.Index

    public var startIndex: Index { return base.startIndex }
    public var endIndex: Index { return base.endIndex }
    public var isEmpty: Bool { return base.isEmpty }
    public subscript(position: Index) -> Element { return base[position] }

    public func index(after i: Index) -> Index {
        let value = base[i]
        return base[i...].firstIndex { !areEquivalent($0, value) } ?? base.endIndex
    }

    public subscript(bounds: Range<Index>) -> SubSequence {
        return .init(base[bounds], by: areEquivalent)
    }
}

Which results in

let a = LazyDeduplicationCollection([1, 2, 3, 3, 3], by: ==)
let start = a.startIndex, end = a.index(start, offsetBy: 2)
a[start] // 1
a[end] // 3
Array(a[start..<end]) // [1, 2]

CTMacUser · September 14, 2019, 3:07am

What's the difference? When I looked at your sample code, it looked odd until I realized that you exclude the "3" values entirely. In my algorithm, a run of repeats is reduced to a single instance. But your code obliterates a run of repeats completely (i.e. to zero instances).

Is reduction to zero instances what you call "deduplicate?" As far as I can tell when I look up that term, deduplication still reduces to one instance. I wouldn't think reducing a run of repeats to zero instances is very useful; I don't think it's actually a thing (in computer science terms).

My source code file has warnings about that. I stripped the comments for here.

I caught that when I was doing my unit tests.

I ended up doing that as I moved the Index code to my Indices type and used a type-alias in the collection type. But then why does the nested Index definition work in Indices?! The only difference I see is that the collection type is actually the sequence type with conditional conformance in play, while the Indices collection type is always definitive.

I had the index cache the duplicated range so I wouldn't have to keep calculating it. Then again, I still have to do a search each time through index(after:) and index(before:) to find the full adjacent range. Maybe it's not worth it. Switching back would give me O(1) startIndex again.

Lantua · September 14, 2019, 3:22am

Deduplication generally maintains the structure of the original data (order, hierarchy, etc), while removing the duplicates and replacing them with lightweight references, thus saving space. If you dedup some data, you wouldn't notice any structural change, only (possible) performance gain.
Debounce simply removes the repeated consecutive elements (rather, merging bouncing data when appropriate, but I think this is the closest term here).

If you have ["foo", "foo", "bar", "foo", "hello"], deduplication would, in a strict sense, result in Reference [0, 0, 1, 0, 2] + Storage ["foo", "bar", "hello"] however it looks like in the end, which doesn't happen here.

I'm not reducing them to zero instances. a should still be [1, 2, 3], you can even try [1, 1, 2, 2, 3, 3, 3] which yields the same result. The sample code shows that range expression is incorrect. start points to 1 and end points to 3. So start..<end should start from 1 up to but excluding 3.

original       // [1, 1, 2, 2, 2, 3, 3, 3, 3, 3]
a              // [1, 2, 3]
start          //  ^     ^
end            //        |
a[start..<end] // [1, 2) exclude the end element.

And the above would be the result if you do it by hand.

You seem to have already realised this, so I don't know where the confusion is.

You're not caching anything. It's the same amount of computation whether you store upperBound or not.

You cache upperBound to get next.lowerBound quickly,
but still use next.lowerBound to slowly compute next.upperBound
You only use lowerBound to slowly compute next.lowerBound.

And you're not using upperBound outside the context of advancing either.

CTMacUser · September 14, 2019, 3:04pm

Lantua:

I'm not reducing them to zero instances. a should still be [1, 2, 3] , you can even try [1, 1, 2, 2, 3, 3, 3] which yields the same result. The sample code shows that range expression is incorrect. start points to 1 and end points to 3 . So start..<end should start from 1 up to but excluding 3 .
original       // [1, 1, 2, 2, 2, 3, 3, 3, 3, 3]
a              // [1, 2, 3]
start          //  ^     ^
end            //        |
a[start..<end] // [1, 2) exclude the end element.
And the above would be the result if you do it by hand.

You seem to have already realised this, so I don't know where the confusion is.

You're using endIndex as a pointer to the last element. It's not, it's to a virtual position one past the last element. Since Swift uses half-open ranges by default, such a virtual position is a requirement. Or how did you think we represent cases where we want to target every element, including the last? Or how startIndex..<endIndex still worked for an empty collection?

Lantua · September 14, 2019, 3:05pm

No, not endIndex. The code uses end, which is start shifted by 2.

(Shoulda named them s and e I guess).