Introduction
This proposal addresses the limitations of the Standard Library's current zip API by adding improvements in the form of enhancements and additions.
It's comprised of:
- the types
Zip[2–6]Index, Zip[2–6]Sequence and ZipLongest[2–6]Sequence;
- the global
functions zip(_:_:[_:[_:[_:[_:]]]]), zipLongest(_:_:[_:[_:[_:[_:]]]]);
- and the
extension method unzip() on Sequence and LazyCollectionProtocol.
Discussions
It has been discussed in the following Evolution forum threads:
Its development has been discussed in the following Development forum thread:
Motivation
The Standard Library's current zip API consists solely of the global function zip(_:_:), which creates a Sequence of pairs from a pair of Sequences by zipping their respective elements until the shortest Sequence is exhausted, truncating the latter elements of the longest Sequence.
It's primarily used to iterate over two Sequences simultaneously:
func dotProduct <T: Numeric> (_ vector1: [T], _ vector2: [T]) -> T {
return zip(vector1, vector2).map(*).reduce(0, +)
}
print(dotProduct([-1.5, 2], [4, 3]))
// Prints "0.0"
And since it's lazily implemented we can also zip over (pseudo-)infinite Sequences; which is a common functional pattern:
extension Sequence {
func enumerated () -> Zip2Sequence<PartialRangeFrom<Int>, Self> {
return zip(0..., self)
}
}
print(Array("abc".enumerated()))
// Prints "[(0, "a"), (1, "b"), (2, "c")]"
However, it's missing certain protocol conformances, and features found in many other programming languages, like non-truncating and inverse zip operations and support for higher arities.
Missing protocol Conformances
Collection, BidirectionalCollection and RandomAccessCollection
Currently, Zip2Sequence's only conformance is to Sequence. Since SE-0234 removed SubSequence as an associatedtype from Sequence, we can extend Zip2Sequence to conditionally conform to protocols higher up the hierarchy without introducing a new type.
Adding these additional conformances has a number of benefits:
- It makes their properties and methods available for use.
- It makes existing type inferred code that uses operations with improved default implementations higher up the hierarchy more capable or performant.
- It guarantees that the instance can be non-destructively consumed by iteration when the underlying
Sequences conform to Collection.
However, the granularity of Sequence's protocol hierarchy makes a correct implementation of these conformances non-trivial.
Each Collection places additional conditional constraints on the underlying Sequences to comply with its semantics. These constraints hinder members lower in the protocol hierarchy from using a Collection's more capable or performant default implementations in a generic context. This means that these members will have to be reimplemented on an unconstrained extension. But which members specifically can only be found out by manually tracking code paths in the Standard Library's source code as the compiler doesn't provide fixits.
LazySequenceProtocol
Because Zip2Sequence doesn't conform to LazySequenceProtocol when its underlying Sequences conform to LazySequenceProtocol, some subsequent operations for which lazy variants exist are not used, even though Zip2Sequence itself is implemented lazily:
func id <T> (_ x: T) -> T {
return x
}
print(type(of: zip([0, 1, 2].lazy, "abc".lazy).map(id)))
// Prints "Array<(Int, Character)>"
Here, we workaround this limitation by calling lazy on zip(_:_:)'s result to make consequent operations lazy again:
print(type(of: zip([0, 1, 2].lazy, "abc".lazy).lazy.map(id)))
// Prints "LazyMapSequence<Zip2Sequence<LazySequence<Array<Int>>, LazySequence<String>>, (Int, Character)>"
While this doesn't have a negative impact on performance—as LazyMapSequence will unwrap LazySequence—it's not exactly elegant.
Non-Truncating zip Operation
Some successive operations require all elements of the underlying Sequences, including the latter elements of the longest Sequence. Such an operation is useful when the lengths of the underlying Sequences are indeterminable at compile time and helps to prevent potential correctness traps. It's also versatile in that it composes nicely with other operations.
Both eager and lazy concrete type-returning implementations of this operation can be created from existing operations and language constructs.
Here, we use state and a while loop to create an eager imperative implementation:
func zipLongest <Sequence1: Sequence, Sequence2: Sequence> (_ sequence1: Sequence1, _ sequence2: Sequence2) -> [(Sequence1.Element?, Sequence2.Element?)] {
var (iterator1, iterator2) = (sequence1.makeIterator(), sequence2.makeIterator())
var result = [(Sequence1.Element?, Sequence2.Element?)]()
result.reserveCapacity(Swift.max(sequence1.underestimatedCount, sequence2.underestimatedCount))
while true {
let (element1, element2) = (iterator1.next(), iterator2.next())
if element1 == nil && element2 == nil {
break
} else {
result.append((element1, element2))
}
}
return result
}
print(zipLongest([0, 1, 2], [0]))
// Prints "[(Optional(0), Optional(0)), (Optional(1), nil), (Optional(2), nil)]"
And here, we use sequence(state:next:) and lazy to create a lazy functional implementation:
func zipLongest <Sequence1: Sequence, Sequence2: Sequence> (_ sequence1: Sequence1, _ sequence2: Sequence2) -> LazySequence<UnfoldSequence<(Sequence1.Element?, Sequence2.Element?), (Sequence1.Iterator, Sequence2.Iterator)>> {
return sequence(state: (sequence1.makeIterator(), sequence2.makeIterator())) {
let (element1, element2) = ($0.0.next(), $0.1.next())
return element1 == nil && element2 == nil ? nil : (element1, element2)
}.lazy
}
print(Array(zipLongest([0, 1, 2], [0])))
// Prints "[(Optional(0), Optional(0)), (Optional(1), nil), (Optional(2), nil)]"
However, these implementations won't scale along with the capabilities of the underlying Sequences. A type that does, similar to Zip2Sequence, is tricky to implement for the aforementioned reasons.
Inverse zip Operation
Most programming languages that include zip also include some variant of its inverse operation, usually named unzip, as an idiom. Swift should follow suit.
Fortunately, this operation doesn't require any new types. But implementing it is tricky as some common performance and correctness traps exist.
Here, reduce(into:_:) can't reserve the capacity up-front:
func unzip <Base: Sequence, Element1, Element2> (_ sequence: Base) -> ([Element1], [Element2]) where Base.Element == (Element1, Element2) {
return sequence.reduce(into: ([Element1](), [Element2]())) { (result, element) in
result.0.append(element.0)
result.1.append(element.1)
}
}
And a lazy variant can't be implemented on a type that only conforms to LazySequenceProtocol since the operation requires multiple passes.
Support for Higher Arities
zip is currently limited to a pair of Sequences.
Here, we workaround it by nesting multiple calls to zip(_:_:) and calling map(_:) on its result to realign its elements:
import UIKit
let components = (
red: sequence(state: 0.0...1.0) { CGFloat.random(in: $0) },
green: sequence(state: 0.0...1.0) { CGFloat.random(in: $0) },
blue: sequence(state: 0.0...1.0) { CGFloat.random(in: $0) },
alpha: repeatElement(CGFloat(1.0), count: 3)
)
zip(components.red, zip(components.green, zip(components.blue, components.alpha)))
.map { UIColor(red: $0, green: $1.0, blue: $1.1.0, alpha: $1.1.1) }
.forEach { print($0) }
// Prints "UIExtendedSRGBColorSpace 0.357433 0.784562 0.903349 1"
// Prints "UIExtendedSRGBColorSpace 0.411219 0.622004 0.842976 1"
// Prints "UIExtendedSRGBColorSpace 0.615279 0.888622 0.651203 1"
However, this becomes less readable and increasingly more complicated for higher arities and is less performant.
For non-truncating zip operations this becomes a readability issue as both the values and tuples will be wrapped in Optionals.
Read more.