It turns out that it is not as straightforward as I thought it would be and would love some feedback from the community.
Out-of-date part of original post.
Implementation
Overload zipLongest(_:_:) specifically for Collections. It acts as the constructor for a ZipLongest2Collection:
@inlinable public func zipLongest
<Collection1: Collection, Collection2: Collection>
(_ collection1: Collection1, _ collection2: Collection2)
-> ZipLongest2Collection<Collection1, Collection2> {
return ZipLongest2Collection(
_collection1: collection1, _collection2: collection2
)
}
The general definition of ZipLongest2Collection:
@_fixed_layout public struct ZipLongest2Collection
<Collection1: Collection, Collection2: Collection> {
@usableFromInline internal let _collection1: Collection1
@usableFromInline internal let _collection2: Collection2
@inlinable public init (
_collection1 collection1: Collection1,
_collection2 collection2: Collection2
) {
(_collection1, _collection2) = (collection1, collection2)
}
}
Conformance to Collection:
extension ZipLongest2Collection: Collection {
public typealias Index = Int
public typealias Element = (Collection1.Element?, Collection2.Element?)
@inlinable public var startIndex: Index { return 0 }
@inlinable public var endIndex: Index {
return Swift.max(collection1.count, collection2.count)
}
@inlinable public subscript (index: Index) -> Element {
let index1 = collection1.index(collection1.startIndex, offsetBy: index)
let index2 = collection2.index(collection2.startIndex, offsetBy: index)
return (
collection1.indices.contains(index1) ? .some(collection1[index1]) : .none,
collection2.indices.contains(index2) ? .some(collection2[index2]) : .none
)
}
@inlinable public func index (after i: Index) -> Index { return i + 1 }
}
For completion's sake, the implementations for BidirectionalCollection and RandomAccessCollection:
extension ZipLongest2Collection: BidirectionalCollection
where Collection1: BidirectionalCollection, Collection2: BidirectionalCollection {
@inlinable public func index (before i: Index) -> Index { return i - 1 }
}
extension ZipLongest2Collection: RandomAccessCollection
where Collection1: RandomAccessCollection, Collection2: RandomAccessCollection {}
A simple example (for context and to gain an intuition of how the algorithm works):
print(Array(zipLongest([0, 1, 2], [0])))
// Prints: "[(Optional(0), Optional(0)), (Optional(1), nil), (Optional(2), nil)]"
Rationale
Index
We need a concrete type for Index, but which one should we choose?
-
A pair is a compound type, this means that we have two indices, one for Collection1 and one for Collection2. They might have the same Index, but we can't make this a requirement as it would leave out too many use-cases.
-
We can't make Index a compound type from the indices of both Collections.
-
We can't use the Index of the longest Collection since we can only confirm the longest Collection at run-time.
From deduction, the most obvious choice here is Int. It's got precedence in Array, other types from the Standard Library and C. This leads to the next issue.
Bottlenecks
Since our Index differs from the underlying indices, we can't compare them. This means that we need alternative solutions to implement endIndex and subscript(_:).
endIndex
We can use count to retrieve our endIndex:
@inlinable public var endIndex: Index {
return Swift.max(collection1.count, collection2.count)
}
But now we've hit a bottleneck. We're expected to provide our endIndex in O(1).
count is O(1) if the Collection conforms to RandomAccessCollection, otherwise it's O(n), where n is the length of the Collection.
subscript(_:)
Here we use index(_:offsetBy:) to retrieve the underlying indices and contains(_:) and subscript(_:) to check for the existence and retrievement of its associated Element:
@inlinable public subscript (index: Index) -> Element {
let index1 = collection1.index(collection1.startIndex, offsetBy: index)
let index2 = collection2.index(collection2.startIndex, offsetBy: index)
return (
collection1.indices.contains(index1) ? .some(collection1[index1]) : .none,
collection2.indices.contains(index2) ? .some(collection2[index2]) : .none
)
}
Again, its time complexity is O(n) where O(1) is expected unless the underlying Collections conform to RandomAccessCollection.
(contains(_:) is needed here for safety. We're not certain if one of the elements lies outside of the shorter Collection.)
last
The default implementation of last, by way of BidirectionalCollection, is the following:
public var last: Element? {
return isEmpty ? nil : self[index(before: endIndex)]
}
Same story here, our implementation of endIndex makes this O(n) and RandomAccessCollection saves the day once again.
Points of Discussion
Taking performance semantics and usability into consideration, what would be the best way forward? How should these types be implemented? (Or maybe you see room for improvement?)
Currently I see two options:
-
If the implementation specified above meets the threshold of Standard Library inclusion, properly document all complexity exceptions.
-
Only conditionally conform to RandomAccessCollection iff Collection1 and Collection2 conform to RandomAccessCollection. This excludes all other Collections down the protocol chain. For these we've still got ZipLongest2Sequence as a back-up.