Introduction
I'm currently implementing zipLongest(_:_:)
(apple/swift#18769).
In the pitch phase, @Ben_Cohen mentioned the possibility of providing Collection
versions of zip. Indeed, this would be a nice feature to have for both zipLongest(_:_:)
and zip(_:_:)
.
This means that we have to introduce two new types, namely ZipLongest2Collection
and Zip2Collection
. (I'll primarily focus on the first, but everything is applicable to the latter as well.)
It turns out that it is not as straightforward as I thought it would be and would love some feedback from the community.
Current Implementation
An always up-to-date implementation can be found as a Swift package here.
Current Points Of Discussion
The current points of discussion can be found here.
Out-of-date part of original post.
Implementation
Overload zipLongest(_:_:)
specifically for Collection
s. It acts as the constructor for a ZipLongest2Collection
:
@inlinable public func zipLongest
<Collection1: Collection, Collection2: Collection>
(_ collection1: Collection1, _ collection2: Collection2)
-> ZipLongest2Collection<Collection1, Collection2> {
return ZipLongest2Collection(
_collection1: collection1, _collection2: collection2
)
}
The general definition of ZipLongest2Collection
:
@_fixed_layout public struct ZipLongest2Collection
<Collection1: Collection, Collection2: Collection> {
@usableFromInline internal let _collection1: Collection1
@usableFromInline internal let _collection2: Collection2
@inlinable public init (
_collection1 collection1: Collection1,
_collection2 collection2: Collection2
) {
(_collection1, _collection2) = (collection1, collection2)
}
}
Conformance to Collection
:
extension ZipLongest2Collection: Collection {
public typealias Index = Int
public typealias Element = (Collection1.Element?, Collection2.Element?)
@inlinable public var startIndex: Index { return 0 }
@inlinable public var endIndex: Index {
return Swift.max(collection1.count, collection2.count)
}
@inlinable public subscript (index: Index) -> Element {
let index1 = collection1.index(collection1.startIndex, offsetBy: index)
let index2 = collection2.index(collection2.startIndex, offsetBy: index)
return (
collection1.indices.contains(index1) ? .some(collection1[index1]) : .none,
collection2.indices.contains(index2) ? .some(collection2[index2]) : .none
)
}
@inlinable public func index (after i: Index) -> Index { return i + 1 }
}
For completion's sake, the implementations for BidirectionalCollection
and RandomAccessCollection
:
extension ZipLongest2Collection: BidirectionalCollection
where Collection1: BidirectionalCollection, Collection2: BidirectionalCollection {
@inlinable public func index (before i: Index) -> Index { return i - 1 }
}
extension ZipLongest2Collection: RandomAccessCollection
where Collection1: RandomAccessCollection, Collection2: RandomAccessCollection {}
A simple example (for context and to gain an intuition of how the algorithm works):
print(Array(zipLongest([0, 1, 2], [0])))
// Prints: "[(Optional(0), Optional(0)), (Optional(1), nil), (Optional(2), nil)]"
Rationale
Index
We need a concrete type for Index
, but which one should we choose?
-
A pair is a compound type, this means that we have two indices, one for
Collection1
and one forCollection2
. They might have the sameIndex
, but we can't make this a requirement as it would leave out too many use-cases. -
We can't make
Index
a compound type from the indices of bothCollection
s. -
We can't use the
Index
of the longestCollection
since we can only confirm the longestCollection
at run-time.
From deduction, the most obvious choice here is Int
. It's got precedence in Array
, other types from the Standard Library and C. This leads to the next issue.
Bottlenecks
Since our Index
differs from the underlying indices, we can't compare them. This means that we need alternative solutions to implement endIndex
and subscript(_:)
.
endIndex
We can use count
to retrieve our endIndex
:
@inlinable public var endIndex: Index {
return Swift.max(collection1.count, collection2.count)
}
But now we've hit a bottleneck. We're expected to provide our endIndex
in O(1).
count
is O(1) if the Collection
conforms to RandomAccessCollection
, otherwise it's O(n), where n is the length of the Collection
.
subscript(_:)
Here we use index(_:offsetBy:)
to retrieve the underlying indices and contains(_:)
and subscript(_:)
to check for the existence and retrievement of its associated Element
:
@inlinable public subscript (index: Index) -> Element {
let index1 = collection1.index(collection1.startIndex, offsetBy: index)
let index2 = collection2.index(collection2.startIndex, offsetBy: index)
return (
collection1.indices.contains(index1) ? .some(collection1[index1]) : .none,
collection2.indices.contains(index2) ? .some(collection2[index2]) : .none
)
}
Again, its time complexity is O(n) where O(1) is expected unless the underlying Collection
s conform to RandomAccessCollection
.
(contains(_:)
is needed here for safety. We're not certain if one of the elements lies outside of the shorter Collection
.)
last
The default implementation of last
, by way of BidirectionalCollection
, is the following:
public var last: Element? {
return isEmpty ? nil : self[index(before: endIndex)]
}
Same story here, our implementation of endIndex
makes this O(n) and RandomAccessCollection
saves the day once again.
Points of Discussion
Taking performance semantics and usability into consideration, what would be the best way forward? How should these types be implemented? (Or maybe you see room for improvement?)
Currently I see two options:
-
If the implementation specified above meets the threshold of Standard Library inclusion, properly document all complexity exceptions.
-
Only conditionally conform to
RandomAccessCollection
iffCollection1
andCollection2
conform toRandomAccessCollection
. This excludes all otherCollection
s down theprotocol
chain. For these we've still gotZipLongest2Sequence
as a back-up.