Add Various Zip-Related Types and Operations
- Proposal: SE-NNNN
- Authors: Dennis Vennink
- Review Manager: To be determined
- Implementation: dennisvennink/bdf3c52ae1d9a3750ec3507ab3262921
- Previous Discussions: 1, 2, 3
- Status: Awaiting implementation
Introduction
This proposal adds various zip-related types and operations to the standard library.
Motivation
In functional programming, zip and unzip are irreplaceable operations. zip maps an n-tuple of sequences to a sequence of n-tuples. It exhausts the input sequences up to the length of either the shortest or the longest sequence. In the case of shortest sequence exhaustion, all mapped elements have a value, but the rest of the elements of the longer sequences are not included. In the case of longest sequence exhaustion, the latter elements of the shorter sequences in the n-tuples have no value, but all elements are included. unzip is the inverse of zip; it maps a sequence of n-tuples to an n-tuple of sequences.
Currently, the standard library only contains the generic function zip(_:_:) that takes two Sequence parameters and returns a Sequence of 2-tuples using shortest sequence exhaustion. An operation using longest sequence exhaustion is oft-requested on the mailing lists and forums. An unzip operation complements both and completes the zip-related API for 2-tuples.
Proposed Solution
This proposal adds: the generic namespace Zip containing the types ShortestSequence and LongestSequence, the generic function zipLongest(_:_:), the method extension unzip() on Sequence where Element is equal to 2-tuples and the typealiases Zip2Sequence and Zip2Iterator.
Detailed Design
A new generic namespace Zip is introduced:
public struct Zip <Sequence1: Sequence, Sequence2: Sequence> { ... }
Zip contains two optimised Sequence types. The first is ShortestSequence (which is almost identical to Zip2Sequence):
public struct ShortestSequence: Sequence {
internal let sequence1: Sequence1
internal let sequence2: Sequence2
internal init (_ sequence1: Sequence1, _ sequence2: Sequence2) {
(self.sequence1, self.sequence2) = (sequence1, sequence2)
}
public struct Iterator: IteratorProtocol {
internal var iterator1: Sequence1.Iterator
internal var iterator2: Sequence2.Iterator
internal var reachedEnd = false
internal init (_ iterator1: Sequence1.Iterator, _ iterator2: Sequence2.Iterator) {
(self.iterator1, self.iterator2) = (iterator1, iterator2)
}
public mutating func next() -> (Sequence1.Iterator.Element, Sequence2.Iterator.Element)? {
if self.reachedEnd {
return nil
}
guard let element1 = self.iterator1.next(), let element2 = self.iterator2.next() else {
self.reachedEnd.toggle()
return nil
}
return (element1, element2)
}
}
public func makeIterator () -> Iterator {
return Iterator(self.sequence1.makeIterator(), self.sequence2.makeIterator())
}
}
And LongestSequence, which is a new type capable of longest sequence exhaustion:
public struct LongestSequence: Sequence {
internal let sequence1: Sequence1
internal let sequence2: Sequence2
internal init (_ sequence1: Sequence1, _ sequence2: Sequence2) {
(self.sequence1, self.sequence2) = (sequence1, sequence2)
}
public struct Iterator: IteratorProtocol {
internal var iterator1: Sequence1.Iterator
internal var iterator2: Sequence2.Iterator
internal var reachedEnd = false
internal init (_ iterator1: Sequence1.Iterator, _ iterator2: Sequence2.Iterator) {
(self.iterator1, self.iterator2) = (iterator1, iterator2)
}
public mutating func next() -> (Sequence1.Iterator.Element?, Sequence2.Iterator.Element?)? {
if self.reachedEnd {
return nil
}
let element1 = self.iterator1.next()
let element2 = self.iterator2.next()
if element1 == nil && element2 == nil {
self.reachedEnd.toggle()
return nil
}
return (element1, element2)
}
}
public func makeIterator () -> Iterator {
return Iterator(self.sequence1.makeIterator(), self.sequence2.makeIterator())
}
}
zip(_:_:) will now return a ShortestSequence:
public func zip <Sequence1: Sequence, Sequence2: Sequence> (_ sequence1: Sequence1, _ sequence2: Sequence2) -> Zip<Sequence1, Sequence2>.ShortestSequence {
return Zip.ShortestSequence(sequence1, sequence2)
}
zipLongest(_:_:) is a new generic function that returns a LongestSequence:
public func zipLongest <Sequence1: Sequence, Sequence2: Sequence> (_ sequence1: Sequence1, _ sequence2: Sequence2) -> Zip<Sequence1, Sequence2>.LongestSequence {
return Zip.LongestSequence(sequence1, sequence2)
}
Finally, unzip() is an extension on Sequence where Element is equal to 2-tuples:
extension Sequence {
public func unzip <Type1, Type2> () -> ([Type1], [Type2]) where Self.Element == (Type1, Type2) {
return self.reduce(into: ([Type1](), [Type2]())) { (result, element) in
result.0.append(element.0)
result.1.append(element.1)
}
}
}
Source Compatibility
To prevent code breakage, typealiases for Zip2Sequence and Zip2Iterator will be redirected to Zip.ShortestSequence and Zip.ShortestSequence.Iterator, respectively:
@available(*, deprecated, renamed: "Zip.ShortestSequence")
public typealias Zip2Sequence<Sequence1, Sequence2> = Zip<Sequence1, Sequence2>.ShortestSequence where Sequence1: Sequence, Sequence2: Sequence
@available(*, deprecated, renamed: "Zip.ShortestSequence.Iterator")
public typealias Zip2Iterator<Sequence1, Sequence2> = Zip<Sequence1, Sequence2>.ShortestSequence.Iterator where Sequence1: Sequence, Sequence2: Sequence
Effect on ABI Stability
Not applicable.
Effect on API Resilience
Not applicable.
Alternatives Considered
The main points of focus were the naming and return type of zipLongest(_:_:); would it return Optionals or not? If not, then variants with defaults or a closure would have to be introduced. Using default values meant that dummy placeholder values had to be used, which was not ideal. The variant with a closure makes using the function harder. These were all quickly dismissed as they're all derivatives of the Optional-returning variant. It turned out that .lazy.map(_:) can be used to efficiently implement all other variants.
While tempted, implementations for arities higher than 2 have been deferred until variadic generics have been introduced.